Base conversion

Earlier this year I added some commands to the ntheory module, inspired by Pari/GP and Mathematica, to split numbers into digits and put them together again. Of course this isn't terribly exciting by itself since Perl has split and join, but with optional bases it gets more interesting.

I recently took a look at the various modules that do base conversion (at least 9 modules, plus various standalone subroutines). Each has slightly different features and interfaces, and the performance at the extremes differs by over 10,000x. I've made some internal changes to ntheory based on my tests, which should show up in the next release.

Add: This is for Perl 5. Perl6 has native support for base conversions for bases 2-36 and seamless Bigint support. It Just Works. For larger bases or alternate encodings, bbkr's TinyID module can be used.

Base conversion is used for different reasons. My interest has been purely for math reasons, rather than the more (in my view) web-oriented things such as base-64, base-85, and ASCII. ntheory will do base conversions independent of encoding if desired (an array of digit values), while most of the other modules here only deal with encoded numbers. Some modules are very flexible with encoding character sets, while others (including ntheory) stick with 0..9,a..z. Bigint handling varies greatly.

Since I'm always interested in performance, here are some benchmarks. 1000 random 60-bit integers are created, then converted to the input base. The benchmark sub does @output = map { convert } @input, which helps remove overhead vs. converting only one in a sub. The output arrays are verified to match. ntheory is an XS module, the rest (other than builtin and POSIX) are all pure Perl.

Base 6 to base 26:

Mastering Alg     3.35e-02/s
simple               0.602/s
Math::BaseConvert    0.605/s
Math::BaseCnv        0.611/s
Convert::AnyBase      57.4/s
Math::NumberBase      78.4/s
Math::BaseCalc        89.3/s
Math::Int2Base         105/s
ntheory               1627/s

Convert to base 2:

Mastering Alg       0.113/s
Math::BaseConvert   0.299/s
Math::BaseCnv       0.301/s
Math::Base::Convert  27.9/s
Convert::AnyBase     36.4/s
Math::NumberBase     46.7/s
simple               64.0/s
Math::BaseCalc       72.7/s
Math::Int2Base       91.7/s
builtin              1953/s
ntheory              2424/s

Convert from base 16:

Mastering Alg       0.271/s
Math::BaseConvert    1.92/s
simple               1.95/s
Math::BaseCnv        1.97/s
Math::Base::Convert  22.5/s
Math::BaseCalc        180/s
Math::NumberBase      190/s
Math::Int2Base        216/s
Convert::AnyBase      380/s
POSIX                3099/s
ntheory              3709/s

There are some builtins for common conversions. For instance:

sub to2  { sprintf "%b", shift; }
sub to16 { sprintf "%x", shift; }
sub from2  { unpack("N", pack("B32", substr("0" x 32 . shift, -32))); }
sub from16 { hex(shift); }

though they are limited to bases 2,8,10,16 and limited to the native word size (hex will warn if given a 33+bit input). The POSIX core module includes strtoul that will work with bases 2-36, but limited to the size of an unsigned long (which may or may not be the same size as Perl's UV integers!). These are quite fast.

There is some simple Perl code on Rosetta Code: Non-decimal radices/Convert. It works for bases 2-36, though not terribly quickly. It should be clear how to adjust it to handle other bases / encodings.

The book "Mastering Algorithms with Perl" (1999, errata1, errata2) has a base conversion routine. It's interesting, it works, but it's very slow -- over 10x slower than the much shorter RosettaCode routines. Bases 2-36, no bigints.

ntheory (2012-2016) is the one XS module in this list, so it's no surprise that it is the fastest, typically over 10x faster than the next fastest, and about the same speed as the builtins. It supports bases 2-231 for arrays, but string encodings only 2-36 (lower case on output, either case for input).

fromdigits( $n );                  # implied base 10, number/string/bigint
fromdigits( "1c8", 16 );           # bases 2-36, any length string
fromdigits( [1,2,0,0,2,1], 3 );    # bases 2-2^31, digits of number

This takes a number in the given base and turns it into a single base-10 integer, possibly a bigint.

For todigits, the input $n is always an integer, hence base 10. It can be an actual int, a string, or a bigint. The result is an array of digits in the optional base, which is 2-231.

todigits( $n );          # implied base 10, returns array of digits
todigits( $n, 2);        # turns $n into an array of binary digits
todigits( $n, 23, 14);   # exactly 14 base-23 digits

The optional third argument will either pad with leading zeros or truncate leading digits. It would be most common to use for things like needing exactly 32 binary digits.

Rather than rely on context (which too often goes wrong or requires wrapping in scalar()), I decided to use a separate function to return a string.

todigitstring( $n );       # implied base 10, basically a no-op.
todigitstring( $n, 23 );   # bases 2-36, encoded lower case
todigitstring( $n, 2, 4);  # bases 2-36, pads or truncates to 4 characters

Putting them together, we can convert strings in base 6 to base 27 like:

$n27 = todigitstring( fromdigits( $n6, 6 ), 27 );

Math::Int2Base (2008-2011) is a nice simple module that does base conversions with bases 2-62. In my benchmarks it is typically one of the fastest (barring ntheory and builtins), and the interface is very simple.

$s = int2base( $n, $b );   # converts $n (base-10) to $s (base-$b)
$n = base2int( $s, $b );   # converts $s (base-$b) to $n (base-10)

An optional third argument for int2base will pad with leading zeros if desired. Input and output are hard-coded to 0..9,A..Z,a..z, so bases under 36 must use upper case for input, and upper case will be output.

BigInts are supported for int2base if the input number is a bigint, and for base2int if the base is a bigint, with a caveat. Math::BigInt and Math::Pari work, but Math::GMP and Math::GMPz will both give incorrect results due to some weird interaction with int(). v0.59 and earlier of ntheory has the exact same issue.

With v5.21 and newer, int2base outputs a warning. I've filed an RT.

Math::BaseCnv (2003-2016) is also easy to use, and supports bases 2-64 by default, and many more encodings by name as well as custom settings if desired. By default upper case must be used for input and will be used in output.

$n27 = cnv( $n6, 6, 27 );   # convert input from first base into second.

As mentioned, really easy to use. The downside is that it is quite slow. If speed wasn't an issue and I wanted web-ish or arbitrary encodings, this is probably the module I'd use.

Math::BaseConvert is a fork of Math::BaseCnv to fix version/metadata issues that seem to now be resolved in the original. No surprise, the performance and functionality are almost identical. While the description says "fast functions [...]", it is not fast (Math::BaseCnv changed to "simple functions" a while back).

Math::BaseCalc (1999-2013) creates a conversion object that is then used in conversions to/from the base. There are few presets, but typically an array ref of characters is given to denote the encoding.

my $cnv7 = Math::BaseCalc->new(digits=>[0..6]);
$n = $cnv7->from_base("6543210");

Two objects are needed to convert between two non-decimal bases. Speed isn't bad. Bigints are supported by to_base but not from_base.

Math::NumberBase (2009) also uses a conversion object. Bases 2-36 use default lower-case symbols, but alternate sets can be given. There is a way to wrap two conversions together, though still using two objects.

my $cnv7 = Math::NumberBase->new(7);
$n = $cnv7->to_decimal("6543210");

Math::Base::Convert (2012-2015) has quite a few features, including both function and object interfaces. The object method performs pre-optimizations, and it has a lot of layers trying to map to fast conversion functions.

Strangely it does not seem to support bases other than powers of 2 and its named bases. I have filed an RT, because this is not the documented functionality. The cnv function returns either a value or an array depending on context, meaning I had to remember to wrap scalar() around it in most uses. I'm not a fan of using context here.

My benchmarks did not show it as particularly fast in most conversions, whether using the function or object method. For Bigints it does look better, and I made changes to ntheory based on this.

Convert::AnyBase (2009) uses Moose, which makes it a monster for dependencies compared to anything else. On the other hand, it uses this for some unique features. In particular, a user-defined sub can be applied to any input string, which allows things like case adjustment, handling negative inputs, invalid input, transforms (e.g. o to 0), etc.

A convenience downside is that a full symbol set must be given when creating the object. I used a shortcut thus:

my $fb=Convert::AnyBase->new(set => substr("0123456789abcdefghijklmnopqrstuvwxyz", 0, $base));

and as usual with most OO modules, two objects are needed for conversion between non-decimal bases.

Convert::BaseN (2008) is quite different than the other modules here, as it expects input and output as binary strings. I did not use it, as it really doesn't fit the use model here.

Math::Fleximal (2001-2005) is another module that doesn't really fit with the others. One creates flex objects that have a particular base, then one can do math with them, or convert to flexes with different bases. I didn't use this much, as it is overkill for simple individual number base conversions.


For Perl 6 you can use
to convert to any base using provided alphabet.

Leave a comment

About Dana Jacobsen

user-pic I've been using Perl since 1991 (4.010 or so). At some point I realized that while I have used many languages, I enjoy using Perl more than any others.