Branchless UTF-8 Length

I recently read an article about Aha! – A Hacker’s Assistant, a superoptimizer used to find branchless algorithms with brute force. There's a problem for which I always wanted a short branchless solution: finding the length of a UTF-8 byte sequence without a lookup table. So I gave Aha a try.

The length of a UTF-8 byte sequence is determined by its first byte. The possible sequences are:

  • 1-byte sequences start with a byte in the range 0x00-0x7F
  • 2-byte sequences start with a byte in the range 0xC0-0xDF
  • 3-byte sequences start with a byte in the range 0xE0-0xEF
  • 4-byte sequences start with a byte in the range 0xF0-0xF7

(There are a couple of other restrictions but I'm only interested in valid UTF-8 strings and don't care about the results for invalid sequences.)

Aha is straight-forward to use. You write a simple C fragment that contains the function you want to make branchless, compile Aha using that fragment, and run the resulting program. You only have to provide a command-line parameter with the number of operations Aha should try. Other options can be set in a C header file. Since Aha uses brute force, it's only useful for short algorithms. Exhausting all programs with three operations takes under a second. Four operations about 1-10 minutes. Five operations probably in the range of hours (I didn't have to try).

Since the length of a UTF-8 byte sequence is determined by the upper four bits of the first byte, I figured it would be better to provide only these bits as input to my function. So I ended up with the following C fragment:

#include "aha.h"

int userfun(int x) {
    if (x == 0xC || x == 0xD) return 2;
    if (x == 0xE) return 3;
    if (x == 0xF) return 4;
    return 1;

Via the config.h file I told Aha to only try values 0-7 and 0xC-0xF. As intermediates I chose the range -16 to 16, for shift intermediates values 1-3. Unfortunately, Aha didn't find any programs with three operations, but a couple with four operations. Many of the programs are essentially the same, but here are the different classes of solutions (I use >>> for signed (arithmetic) right shift):

((x - 10) >>> (16 - x)) + 2

This is probably the nicest result. Only a single signed right shift and three adds/subtracts. Two operations can also be executed in parallel. The final addition of a constant can sometimes come for free. The "undefined" input values 0x8 and 0x9 yield 1, 0xA and 0xB yield 2.

((10 >> (15 - x)) + 7) >> 2

Two right shifts and two adds/subs. Values 0x08-0x0B all yield 1 which might be useful in some situations. All operations must run sequentially.

(x >> 3) - (-3 >>> (15 - x))

Two right shifts and two adds/subs. Values 0x08-0x0B all yield 2. Two operations can be executed in parallel.

It's interesting that all solutions rely on right-shifting by a value derived from x. Try the above formulas with a few sample values and you'll see how it works.

The expression 15 - x can also be computed as x ^ 15 which should be slightly faster on platforms without reverse subtract. Here's small Perl script that verifies the results and adjusts the expressions to work directly with the first byte of a sequence:

use strict;
use warnings;
use integer; # Enables signed right shift.

for my $c (0..255) {
    my $hi   = $c >> 4;
    my $len1 = ((10 >> ($hi ^ 15)) + 7) >> 2;
    my $len2 = (($c - 160) >> (20 - $hi)) + 2;
    my $len3 = ($c >> 7) - (-3 >> ($hi ^ 15));
    printf("%02X: %d %d %d\n", $c, $len1, $len2, $len3);

Howto: XS Coverage Reports on

The excellent Travis Perl helpers make it easy submit automated coverage reports to Coveralls whenever you commit to GitHub. All that's needed is to set the COVERAGE environment variable.

Unfortunately, this doesn't include the XS portions of a module. But since Devel::Cover already supports XS modules out of the box, only a slight change is needed to make it work. Simply make the Travis Perl helpers compile and link your XS …

Writing XS Like a Pro - PERL_NO_GET_CONTEXT and Static Functions

The perlxs man page recommends to define the PERL_NO_GET_CONTEXT macro before including EXTERN.h, perl.h, and XSUB.h. If this macro is defined, it is assumed that the interpreter context is passed as a parameter to every function. If it's undefined, the context will typically be fetched from thread-local storage when calling the Perl API, which incurs a performance overhead.

Unfortunately, many XS modules still ship without defining PERL_NO_GET_CONTEXT. This is probably due to the f…

Writing XS Like a Pro - The INTERFACE Keyword

Like many Perl hackers with a C background, I did my fair share of XS programming. But mostly, I contributed bug fixes and other minor changes. In the rare case that I had to add a new XSUB, I typically used another XSUB from the same project as template. Unfortunately, this cargo cult is common among XS authors.

When I wrote my first public XS module, CommonMark, from scratch, I decided to read the perlxs documentation front to back to get a better understanding of the features XS …

About Nick Wellnhofer

user-pic I'm a freelancer looking for contract work in Munich, Germany, or remotely. Check out my profiles on MetaCPAN, GitHub, LinkedIn and StackOverflow.