Base conversion
Earlier this year I added some commands to the ntheory module, inspired by Pari/GP and Mathematica, to split numbers into digits and put them together again. Of course this isn't terribly exciting by itself since Perl has split and join, but with optional bases it gets more interesting.
I recently took a look at the various modules that do base conversion (at least 9 modules, plus various standalone subroutines). Each has slightly different features and interfaces, and the performance at the extremes differs by over 10,000x. I've made some internal changes to ntheory based on my tests, which should show up in the next release.
Add: This is for Perl 5. Perl6 has native support for base conversions for bases 2-36 and seamless Bigint support. It Just Works. For larger bases or alternate encodings, bbkr's TinyID module can be used.
Base conversion is used for different reasons. My interest has been purely for math reasons, rather than the more (in my view) web-oriented things such as base-64, base-85, and ASCII. ntheory will do base conversions independent of encoding if desired (an array of digit values), while most of the other modules here only deal with encoded numbers. Some modules are very flexible with encoding character sets, while others (including ntheory) stick with 0..9,a..z. Bigint handling varies greatly.
Since I'm always interested in performance, here are some benchmarks. 1000 random 60-bit integers are created, then converted to the input base. The benchmark sub does @output = map { convert } @input, which helps remove overhead vs. converting only one in a sub. The output arrays are verified to match. ntheory
is an XS module, the rest (other than builtin and POSIX) are all pure Perl.
Base 6 to base 26:
Mastering Alg 3.35e-02/s
simple 0.602/s
Math::BaseConvert 0.605/s
Math::BaseCnv 0.611/s
Convert::AnyBase 57.4/s
Math::NumberBase 78.4/s
Math::BaseCalc 89.3/s
Math::Int2Base 105/s
ntheory 1627/s
Convert to base 2:
Mastering Alg 0.113/s
Math::BaseConvert 0.299/s
Math::BaseCnv 0.301/s
Math::Base::Convert 27.9/s
Convert::AnyBase 36.4/s
Math::NumberBase 46.7/s
simple 64.0/s
Math::BaseCalc 72.7/s
Math::Int2Base 91.7/s
builtin 1953/s
ntheory 2424/s
Convert from base 16:
Mastering Alg 0.271/s
Math::BaseConvert 1.92/s
simple 1.95/s
Math::BaseCnv 1.97/s
Math::Base::Convert 22.5/s
Math::BaseCalc 180/s
Math::NumberBase 190/s
Math::Int2Base 216/s
Convert::AnyBase 380/s
POSIX 3099/s
ntheory 3709/s
There are some builtins for common conversions. For instance:
sub to2 { sprintf "%b", shift; }
sub to16 { sprintf "%x", shift; }
sub from2 { unpack("N", pack("B32", substr("0" x 32 . shift, -32))); }
sub from16 { hex(shift); }
though they are limited to bases 2,8,10,16 and limited to the native word size (hex will warn if given a 33+bit input). The POSIX core module includes strtoul that will work with bases 2-36, but limited to the size of an unsigned long (which may or may not be the same size as Perl's UV integers!). These are quite fast.
There is some simple Perl code on Rosetta Code: Non-decimal radices/Convert. It works for bases 2-36, though not terribly quickly. It should be clear how to adjust it to handle other bases / encodings.
The book "Mastering Algorithms with Perl" (1999, errata1, errata2) has a base conversion routine. It's interesting, it works, but it's very slow -- over 10x slower than the much shorter RosettaCode routines. Bases 2-36, no bigints.
ntheory (2012-2016) is the one XS module in this list, so it's no surprise that it is the fastest, typically over 10x faster than the next fastest, and about the same speed as the builtins. It supports bases 2-231 for arrays, but string encodings only 2-36 (lower case on output, either case for input).
fromdigits( $n ); # implied base 10, number/string/bigint
fromdigits( "1c8", 16 ); # bases 2-36, any length string
fromdigits( [1,2,0,0,2,1], 3 ); # bases 2-2^31, digits of number
This takes a number in the given base and turns it into a single base-10 integer, possibly a bigint.
For todigits
, the input $n
is always an integer, hence base 10. It can be an actual int, a string, or a bigint. The result is an array of digits in the optional base, which is 2-231.
todigits( $n ); # implied base 10, returns array of digits
todigits( $n, 2); # turns $n into an array of binary digits
todigits( $n, 23, 14); # exactly 14 base-23 digits
The optional third argument will either pad with leading zeros or truncate leading digits. It would be most common to use for things like needing exactly 32 binary digits.
Rather than rely on context (which too often goes wrong or requires wrapping in scalar()
), I decided to use a separate function to return a string.
todigitstring( $n ); # implied base 10, basically a no-op.
todigitstring( $n, 23 ); # bases 2-36, encoded lower case
todigitstring( $n, 2, 4); # bases 2-36, pads or truncates to 4 characters
Putting them together, we can convert strings in base 6 to base 27 like:
$n27 = todigitstring( fromdigits( $n6, 6 ), 27 );
Math::Int2Base (2008-2011) is a nice simple module that does base conversions with bases 2-62. In my benchmarks it is typically one of the fastest (barring ntheory and builtins), and the interface is very simple.
$s = int2base( $n, $b ); # converts $n (base-10) to $s (base-$b)
$n = base2int( $s, $b ); # converts $s (base-$b) to $n (base-10)
An optional third argument for int2base
will pad with leading zeros if desired. Input and output are hard-coded to 0..9,A..Z,a..z, so bases under 36 must use upper case for input, and upper case will be output.
BigInts are supported for int2base
if the input number is a bigint, and for base2int
if the base is a bigint, with a caveat. Math::BigInt and Math::Pari work, but Math::GMP and Math::GMPz will both give incorrect results due to some weird interaction with int()
. v0.59 and earlier of ntheory
has the exact same issue.
With v5.21 and newer, int2base
outputs a warning. I've filed an RT.
Math::BaseCnv (2003-2016) is also easy to use, and supports bases 2-64 by default, and many more encodings by name as well as custom settings if desired. By default upper case must be used for input and will be used in output.
$n27 = cnv( $n6, 6, 27 ); # convert input from first base into second.
As mentioned, really easy to use. The downside is that it is quite slow. If speed wasn't an issue and I wanted web-ish or arbitrary encodings, this is probably the module I'd use.
Math::BaseConvert is a fork of Math::BaseCnv to fix version/metadata issues that seem to now be resolved in the original. No surprise, the performance and functionality are almost identical. While the description says "fast functions [...]", it is not fast (Math::BaseCnv changed to "simple functions" a while back).
Math::BaseCalc (1999-2013) creates a conversion object that is then used in conversions to/from the base. There are few presets, but typically an array ref of characters is given to denote the encoding.
my $cnv7 = Math::BaseCalc->new(digits=>[0..6]);
$n = $cnv7->from_base("6543210");
Two objects are needed to convert between two non-decimal bases. Speed isn't bad. Bigints are supported by to_base
but not from_base
.
Math::NumberBase (2009) also uses a conversion object. Bases 2-36 use default lower-case symbols, but alternate sets can be given. There is a way to wrap two conversions together, though still using two objects.
my $cnv7 = Math::NumberBase->new(7);
$n = $cnv7->to_decimal("6543210");
Math::Base::Convert (2012-2015) has quite a few features, including both function and object interfaces. The object method performs pre-optimizations, and it has a lot of layers trying to map to fast conversion functions.
Strangely it does not seem to support bases other than powers of 2 and its named bases. I have filed an RT, because this is not the documented functionality. The cnv
function returns either a value or an array depending on context, meaning I had to remember to wrap scalar()
around it in most uses. I'm not a fan of using context here.
My benchmarks did not show it as particularly fast in most conversions, whether using the function or object method. For Bigints it does look better, and I made changes to ntheory
based on this.
Convert::AnyBase (2009) uses Moose, which makes it a monster for dependencies compared to anything else. On the other hand, it uses this for some unique features. In particular, a user-defined sub can be applied to any input string, which allows things like case adjustment, handling negative inputs, invalid input, transforms (e.g. o to 0), etc.
A convenience downside is that a full symbol set must be given when creating the object. I used a shortcut thus:
my $fb=Convert::AnyBase->new(set => substr("0123456789abcdefghijklmnopqrstuvwxyz", 0, $base));
and as usual with most OO modules, two objects are needed for conversion between non-decimal bases.
Convert::BaseN (2008) is quite different than the other modules here, as it expects input and output as binary strings. I did not use it, as it really doesn't fit the use model here.
Math::Fleximal (2001-2005) is another module that doesn't really fit with the others. One creates flex objects that have a particular base, then one can do math with them, or convert to flexes with different bases. I didn't use this much, as it is overkill for simple individual number base conversions.
For Perl 6 you can use
https://github.com/bbkr/TinyID/blob/master/README.md
to convert to any base using provided alphabet.
Thanks! I added a small paragraph about Perl 6. The example on RosettaCode (http://rosettacode.org/wiki/Non-decimal_radices/Convert#Perl_6) shows how easy it is for standard bases 2-36, and I love how there's no wonkiness with int/double/bigint.