Module Algorithm |
Stat Quality |
Interval | Bits in output |
Auto Seed |
Predictable | Speed |
---|---|---|---|---|---|---|

CORE::rand drand48 |
Bad | [0,1) | 48 | OK | Yes | 29000 k/s |

Math::Random::MTwist MT XS |
OK | [0,1) | 53 | OK | Yes | 14000 k/s |

ntheory ChaCha20 XS |
Good | [0,1) | 53-113 | Full | No | 12000 k/s |

Math::Prime::Util::GMP ISAAC XS |
Good | [0,1) | 53-64 | Good | No | 14000 k/s |

Crypt::PRNG Fortuna XS |
Good | [0,1) | 53 | Good | No | 700 k/s |

Math::Random::MT::Auto MT XS |
OK | [0,1) | 52 | Full | Yes | 5000 k/s |

Math::Random::MT MT XS |
OK | [0,1) | 32 | rand() | Yes | 2200 k/s |

Math::Random::Secure Math::Random::ISAAC |
Good | [0,1) | 32 | Full | No | 460 k/s |

Math::Random::ISAAC ISAAC XS or PP |
Good | [0,1] | 32 | None | No | 5700 k/s 480 k/s |

Math::Random::Xorshift Xorshift XS |
OK | [0,1] | 32 | Time | Yes | 16000 k/s |

**Statistically random.**Any good CSPRNG will qualify, as will a much faster modern PRNG such as PCG or Xoroshiro128+. The Mersenne Twister is quite good though inferior to the newer choices in a few ways. Xorshift has been found to not be very good but the latest in its family, Xoroshiro128+ is considered excellent. drand48, used by Perl for CORE::rand since 5.19.4, has*numerous*issues. It is, however, better than the crap-shoot we had in older Perls. Both are listed as having "serious defects" in Jones' guide.**Consistent semantics.**`rand()`returns a uniform floating point number in the half-open interval [0,1). This means we can get 0 but will not every get a 1. If given an argument, then our range goes up to but not including it -- we will get a random representable real from 0 up to but not including the number. This makes things like`int(rand(6))`return 0-5 and not occasionally give 6. This also lets us combine adjacent intervals knowing there is no overlap. Two of the modules get this wrong. This has nothing to do with the underlying generator, but with the method chosen to create the return value.**Quality outputs.**An NV in Perl is, by default, a double which can store up to 53 bits in the range [0,1). Some modules only fill the result with 32 bits. While this is fine for rolling dice, it won't give good results in some other scenarios. A Perl build with long double support may have 64 bits of mantissa, and with Quadmath support in 5.22 and later we get a full 113 bits to fill. I decided in my module that the best solution was to fill up the NV, giving them the maximum number of bits. It's no extra work for double and most long double builds. It is a little slower for quadmath, but clearly the customer wants lots of precision so we should honor that. I think anything less than 52 bits is substandard.**Good seeding available.**Some modules strive to seed with an O/S entropy source including Windows and other fallbacks if /dev/urandom isn't available. Perl's internal seed function tries to use /dev/urandom as well as a bit of random process state. Its main issue is that it only uses 32 bits. A couple modules use the current time, which is generally not a good method. Math::Random::ISAAC punts the problem to the caller on the theory that no seeding is better than bad seeding, so modules using it need a seeding layer. There are some subtleties in seeding ISAAC that some ignore.**Repeatable.**For testing or simulations we need to be able to repeat the result, therefore we have to be able to manually seed. All the modules except Crypt::PRNG support this. For true cryptographic use we really ought to have a forced-system-entropy-seed mode plus periodic reseeding. Crypt::PRNG operates in this way with no ability for user seeding.**Unpredictibility.**For many uses, such as simulations, this isn't necessary at all. For others it is absolutely critical (breaking online Poker in 1999). This is one of the things that separates CSPRNGs from their lesser (but faster) brethren. It is analagous to using a cryptographic hash (e.g. SHA2) vs. generic hash (e.g. SipHash). The former has stronger guarantees but is often an order of magnitude or more slower. For drand48 or Xorshift, after seeing a single output we know every number that will come out in the future as well as all the previous outputs. With Mersenne Twister it takes 624 results. Methods like PCG and Xoroshiro128+ are much harder to predict though it is not impossible. This feature is a CSPRNG requirement so ISAAC, Salsa20, ChaCha20, and Fortuna completely meet it.**Performant.**All other things being equal, we'd like it to be fast. A vast amount of time can be spent in the Perl / Module interface, making some of the solutions much slower than the underlying algorithm requires. Even when the interface is well optimized, it masks a lot of the difference, allowing algorithms like ChaCha20 and ISAAC to have not-dissimilar speeds from some fast regular PRNGs and substantially outperform much faster underlying algorithms in some modules. With a bulk interface such as`random_bytes()`things look slightly different. Note that the worst performing solution is still 100,000 calls per second, so this may be moot for your application if you just need a few numbers.

**CORE::rand**. Very fast. Separate context for each thread. 32-bit seed, 48-bit state. Very poor quality -- for example the 48th bit will always alternate 1,0,1,0,1,0,... while the 47th goes 0,0,1,1,0,0,1,1,... etc. The bottom bits are not at all random, which is a fundamental property of the type of generator using a non-prime modulus (2^48 in this case). It would be very interesting to see some alternatives (e.g. PCG and Xoroshiro128+) in the Perl core, similar to what we do with hash functions.

**Math::Random::MTwist**. Excellent alternative to core. It's fast and good quality. Random quality is what one would expect from Mersenne Twister. Lots of options. Threading should be done with OO interface.

**ntheory**. My module, which I added a ChaCha20 CSPRNG to. It was originally intended for speeding up large random prime construction vs. using Bytes::Random::Secure, but it is useful for a standard rand interface as well (srand, rand, irand, irand64, random_bytes, plus some more). ChaCha20 is the CSPRNG which is being used in new `arc4random` functions in *BSD, is the chosen method for /dev/urandom in Linux 4.18, is used in some new TLS methods, and is increasingly popular as a CSPRNG or stream cipher. The standard implementation isn't as fast as the ISAAC core, but there are AVX2 implementations that are a lot faster. Performance here doesn't suffer much vs. other alternatives, and it's in the top performance tier (albeit at the bottom of the leaders, but 2x faster than many other modules). Seeds at startup using a number of possible sources, and allows easy reseeding with a decent amount of system entropy. It also contains a full ChaCha20 implementation in Perl, as well as matching XS and Perl implementations of ISAAC, though using ISAAC is a build-time configuration option likely to be removed. Uses a single state and mutex lock for threads, which will be replaced by per-thread state in a later release.

**Math::Prime::Util::GMP**. Uses ISAAC in XS and runs quite a bit faster than the other ISAAC modules. Seeds at startup using /dev/urandom if possible. It wasn't really intended to be a primary module for this work, but it's an interesting comparison with the other ISAAC modules, especially the performance. No threading support.

**Crypt::PRNG**. An API for LibTomCrypt and part of the large CryptX package. It has a few choices, with the default being Fortuna -- an entropy pool framework with multiple entropy input streams, using AES in counter mode for the actual stream. Very high quality results, well done interface. The only downside is the slow performance, which is typical of the AES-CTR algorithm (when implemented without specific hardware support).

**Math::Random::MT::Auto**Not a bad module, but it seems to be superceded by MTwist, in that the latter is over 2x faster and doesn't seem to be missing many (any?) features.

**Math::Random::MT**. An earlier module than the above. Simpler, slower, and seeds from CORE::rand. Only allows 32-bits of state to be set by the user. Only fills returned doubles with 32-bits.

**Math::Random::Secure**. Gives a (srand,irand,rand) interface to Math::Random::ISAAC, including proper seeding. Large dependency list. Only fills returned doubles with 32-bits. Very slow, and even slower if Math::Random::ISAAC::XS isn't installed.

**Math::Random::ISAAC**. The first pure Perl implementation of ISAAC, widely copied or used by other modules. There is also an XS back end. Creates double output for rand() by dividing a random 32-bit int by 2^32-1 which makes the non-standard interval [0,1] and only uses 32-bits. No auto seeding is done.

**Math::Random::Xorshift**. A thin wrapper around Marsaglia's 2003 Xorshift PRNG. As the module author points out, this has some statistical flaws, though the later xorshift* algorithm fixed most of them and xoroshiro128+ is significantly better. Per thread context. Creates double output for rand() by dividing a random 32-bit int by 2^32-1 which makes the non-standard interval [0,1] and only uses 32-bits. Seeds with time. The primary advantage is that it runs faster than the other modules.

I recently took a look at the various modules that do base conversion (at least 9 modules, plus various standalone subroutines). Each has slightly different features and interfaces, and the performance at the extremes differs by over 10,000x. I've made some internal changes to ntheory based on my tests, which should show up in the next release.

Add: This is for Perl 5. Perl6 has native support for base conversions for bases 2-36 and seamless Bigint support. It Just Works. For larger bases or alternate encodings, bbkr's TinyID module can be used.

]]> Base conversion is used for different reasons. My interest has been purely for math reasons, rather than the more (in my view) web-oriented things such as base-64, base-85, and ASCII. ntheory will do base conversions independent of encoding if desired (an array of digit values), while most of the other modules here only deal with encoded numbers. Some modules are very flexible with encoding character sets, while others (including ntheory) stick with 0..9,a..z. Bigint handling varies greatly.Since I'm always interested in performance, here are some benchmarks. 1000 random 60-bit integers are created, then converted to the input base. The benchmark sub does @output = map { convert } @input, which helps remove overhead vs. converting only one in a sub. The output arrays are verified to match. `ntheory`

is an XS module, the rest (other than builtin and POSIX) are all pure Perl.

Base 6 to base 26:

```
Mastering Alg 3.35e-02/s
simple 0.602/s
Math::BaseConvert 0.605/s
Math::BaseCnv 0.611/s
Convert::AnyBase 57.4/s
Math::NumberBase 78.4/s
Math::BaseCalc 89.3/s
Math::Int2Base 105/s
ntheory 1627/s
```

Convert to base 2:

```
Mastering Alg 0.113/s
Math::BaseConvert 0.299/s
Math::BaseCnv 0.301/s
Math::Base::Convert 27.9/s
Convert::AnyBase 36.4/s
Math::NumberBase 46.7/s
simple 64.0/s
Math::BaseCalc 72.7/s
Math::Int2Base 91.7/s
builtin 1953/s
ntheory 2424/s
```

Convert from base 16:

```
Mastering Alg 0.271/s
Math::BaseConvert 1.92/s
simple 1.95/s
Math::BaseCnv 1.97/s
Math::Base::Convert 22.5/s
Math::BaseCalc 180/s
Math::NumberBase 190/s
Math::Int2Base 216/s
Convert::AnyBase 380/s
POSIX 3099/s
ntheory 3709/s
```

There are some builtins for common conversions. For instance:

```
sub to2 { sprintf "%b", shift; }
sub to16 { sprintf "%x", shift; }
sub from2 { unpack("N", pack("B32", substr("0" x 32 . shift, -32))); }
sub from16 { hex(shift); }
```

though they are limited to bases 2,8,10,16 and limited to the native word size (hex will warn if given a 33+bit input). The POSIX core module includes strtoul that will work with bases 2-36, but limited to the size of an unsigned long (which may or may not be the same size as Perl's UV integers!). These are quite fast.

There is some simple Perl code on Rosetta Code: Non-decimal radices/Convert. It works for bases 2-36, though not terribly quickly. It should be clear how to adjust it to handle other bases / encodings.

The book "Mastering Algorithms with Perl" (1999, errata1, errata2) has a base conversion routine. It's interesting, it works, but it's **very** slow -- over 10x slower than the much shorter RosettaCode routines. Bases 2-36, no bigints.

ntheory (2012-2016) is the one XS module in this list, so it's no surprise that it is the fastest, typically over 10x faster than the next fastest, and about the same speed as the builtins. It supports bases 2-2^{31} for arrays, but string encodings only 2-36 (lower case on output, either case for input).

```
fromdigits( $n ); # implied base 10, number/string/bigint
fromdigits( "1c8", 16 ); # bases 2-36, any length string
fromdigits( [1,2,0,0,2,1], 3 ); # bases 2-2^31, digits of number
```

This takes a number in the given base and turns it into a single base-10 integer, possibly a bigint.

For `todigits`

, the input `$n`

is always an integer, hence base 10. It can be an actual int, a string, or a bigint. The result is an array of digits in the optional base, which is 2-2^{31}.

```
todigits( $n ); # implied base 10, returns array of digits
todigits( $n, 2); # turns $n into an array of binary digits
todigits( $n, 23, 14); # exactly 14 base-23 digits
```

The optional third argument will either pad with leading zeros or truncate leading digits. It would be most common to use for things like needing exactly 32 binary digits.

Rather than rely on context (which too often goes wrong or requires wrapping in `scalar()`

), I decided to use a separate function to return a string.

```
todigitstring( $n ); # implied base 10, basically a no-op.
todigitstring( $n, 23 ); # bases 2-36, encoded lower case
todigitstring( $n, 2, 4); # bases 2-36, pads or truncates to 4 characters
```

Putting them together, we can convert strings in base 6 to base 27 like:

```
$n27 = todigitstring( fromdigits( $n6, 6 ), 27 );
```

Math::Int2Base (2008-2011) is a nice simple module that does base conversions with bases 2-62. In my benchmarks it is typically one of the fastest (barring ntheory and builtins), and the interface is very simple.

```
$s = int2base( $n, $b ); # converts $n (base-10) to $s (base-$b)
$n = base2int( $s, $b ); # converts $s (base-$b) to $n (base-10)
```

An optional third argument for `int2base`

will pad with leading zeros if desired. Input and output are hard-coded to 0..9,A..Z,a..z, so bases under 36 must use upper case for input, and upper case will be output.

BigInts are supported for `int2base`

if the input number is a bigint, and for `base2int`

if the base is a bigint, with a caveat. Math::BigInt and Math::Pari work, but Math::GMP and Math::GMPz will both give incorrect results due to some weird interaction with `int()`

. v0.59 and earlier of `ntheory`

has the exact same issue.

With v5.21 and newer, `int2base`

outputs a warning. I've filed an RT.

Math::BaseCnv (2003-2016) is also easy to use, and supports bases 2-64 by default, and many more encodings by name as well as custom settings if desired. By default upper case must be used for input and will be used in output.

```
$n27 = cnv( $n6, 6, 27 ); # convert input from first base into second.
```

As mentioned, really easy to use. The downside is that it is quite slow. If speed wasn't an issue and I wanted web-ish or arbitrary encodings, this is probably the module I'd use.

Math::BaseConvert is a fork of Math::BaseCnv to fix version/metadata issues that seem to now be resolved in the original. No surprise, the performance and functionality are almost identical. While the description says "fast functions [...]", it is not fast (Math::BaseCnv changed to "simple functions" a while back).

Math::BaseCalc (1999-2013) creates a conversion object that is then used in conversions to/from the base. There are few presets, but typically an array ref of characters is given to denote the encoding.

```
my $cnv7 = Math::BaseCalc->new(digits=>[0..6]);
$n = $cnv7->from_base("6543210");
```

Two objects are needed to convert between two non-decimal bases. Speed isn't bad. Bigints are supported by `to_base`

but not `from_base`

.

Math::NumberBase (2009) also uses a conversion object. Bases 2-36 use default lower-case symbols, but alternate sets can be given. There is a way to wrap two conversions together, though still using two objects.

```
my $cnv7 = Math::NumberBase->new(7);
$n = $cnv7->to_decimal("6543210");
```

Math::Base::Convert (2012-2015) has quite a few features, including both function and object interfaces. The object method performs pre-optimizations, and it has a lot of layers trying to map to fast conversion functions.

Strangely it does not seem to support bases other than powers of 2 and its named bases. I have filed an RT, because this is not the documented functionality. The `cnv`

function returns either a value or an array depending on context, meaning I had to remember to wrap `scalar()`

around it in most uses. I'm not a fan of using context here.

My benchmarks did not show it as particularly fast in most conversions, whether using the function or object method. For Bigints it does look better, and I made changes to `ntheory`

based on this.

Convert::AnyBase (2009) uses Moose, which makes it a monster for dependencies compared to anything else. On the other hand, it uses this for some unique features. In particular, a user-defined sub can be applied to any input string, which allows things like case adjustment, handling negative inputs, invalid input, transforms (e.g. o to 0), etc.

A convenience downside is that a full symbol set must be given when creating the object. I used a shortcut thus:

```
my $fb=Convert::AnyBase->new(set => substr("0123456789abcdefghijklmnopqrstuvwxyz", 0, $base));
```

and as usual with most OO modules, two objects are needed for conversion between non-decimal bases.

Convert::BaseN (2008) is quite different than the other modules here, as it expects input and output as binary strings. I did not use it, as it really doesn't fit the use model here.

Math::Fleximal (2001-2005) is another module that doesn't really fit with the others. One creates flex objects that have a particular base, then one can do math with them, or convert to flexes with different bases. I didn't use this much, as it is overkill for simple individual number base conversions.

]]>

Module | Impl | Comb | Perm | Comb w/rep | Perm w/rep | Derange | Speed | Order | Comments |
---|---|---|---|---|---|---|---|---|---|

Algorithm::Combinatorics | XS | yes | yes | yes | yes | yes | + | Lexico | Fast iterator or array |

ntheory | XS | yes | yes | no | no | no | ++ | Lexico | Fast block call |

Math::Combinatorics | Perl | yes | yes | no | no | yes | - - | Impl | Iterator or array |

Algorithm::FastPermute | XS | no | yes | no | no | no | +++ | Impl | Fast block call |

Algorithm::Permute | XS | no | yes | no | no | no | + | Impl | Iterator or fast block call |

Algorithm::Loops | Perl | no | yes | no | no | no | + | Impl | Iterator |

List::Permutor | Perl | no | yes | no | no | no | - | Lexico | Iterator |

Iterator::Misc | Perl | no | yes | no | no | no | - - | Lexico | Iterator |

Math::Permute::Array | Perl | no | yes | no | no | no | - - | Impl | Iterator or index |

Math::Permute::List | Perl | no | yes | no | no | no | Impl | Block call | |

Math::GSL::Permutation | XS | no | yes | no | no | no | - | Lexico | function interface |

Math::Disarrange::List | Perl | no | no | no | no | yes | Impl | Block call | |

Math::GSL::Combination | XS | yes | no | no | no | no | + | Lexico | iterator or by index |

Some modules such as Algorithm::Combinatorics, ntheory, and List::Permutor give results in guaranteed lexicographic order. The other modules return data in an order corresponding to whatever internal algorithm is used. For an example unsorted 7 element array, each of the "Impl"-order modules gave a unique sequence (that is, each modules gave a different sequence from any other), while all "Lexico"-order modules gave identical sequences.

The speed is an approximate rating of how fast the permutations or combinations are generated with a relatively large set. Looping over the 479 million permutations of a 12 item set takes only 12 seconds for Algorithm::FastPermute, 1 minute for ntheory, 6 minutes for Algorithm::Permute, 12 minutes for Algorithm::Combinatorics and Algorithm::Loops, 30 minutes for List::Permutor, 37 minutes for Math::Combinatorics, 39 minutes for Math::GSL::Permutation, 42 minutes for the common example tsc-permute, 48 minutes for Iterator::Misc.

The perlfaq recommends List::Permutor, Algorithm::Permute, and Algorithm::Loops. I believe Algorithm::Combinatorics to be a better choice to point people to, as it is likely to cover all needs and calling styles, not just permutations. The results come in lexicographic order rather than implementation defined. It also has excellent documentation.

In all examples, assume we have done something like this setup, and wish to see all permutations of the data, or all combinations of 3 elements.

```
use feature 'say';
my @data = (qw/apple bread curry donut éclair/);
```

**Algorithm::Combinatorics**. This is probably what you're looking for. It has nearly everything you need and is pretty fast. Recommended. If you need the highest speed, Algorithm::FastPermute and ntheory are faster.

`use Algorithm::Combinatorics qw/combinations permutations/;`

my $citer = combinations(\@data, 3);

while (my $c = $citer->next) { say "@$c"; }

`my $piter = permutations(\@data);`

while (my $p = $piter->next) { say "@$p"; }

**ntheory**. XS and Perl block calls for permutations and combinations. Note that the source array isn't directly used -- each block invocation is given an array of indices rather than a direct permutation/combination of the source array.

`use ntheory qw/forcomb forperm/;`

forcomb { say "@data[@_]" } scalar(@data),3;

`forperm { say "@data[@_]" } scalar(@data);`

**Math::Combinatorics**. One of the slowest of the modules tested, but it does combinations, permutations, and derangements all without XS.

`use Math::Combinatorics; my $comb = Math::Combinatorics->new(count => 3, data => [@data]);`

while (my @c = $comb->next_combination) { say "@c" }

`while (my @p = $comb->next_permutation) { say "@p" }`

**Algorithm::FastPermute**. The fastest permutation generator -- for large arrays it is about 10x faster than ntheory, 50x faster than Algorithm::Permute, 60-500x faster than the other modules. Modifies the source array, but after a full permutation it will be in the original order.

```
use Algorithm::FastPermute qw/permute/;
permute { say "@data" } @data;
```

**Algorithm::Permute**. Decent permutation iterator. Also includes the FastPermute block generator that works just like that example.

```
use Algorithm::Permute;
my $perm = Algorithm::Permute->new(\@data);
while (my @set = $perm->next) { say "@set" }
```

**Algorithm::Loops**. Permutations through an iterator that modifies the source array. Combinations possible with some code. The array __must__ be sorted to work correctly, and if using sorted numbers, you must use NextPermuteNum.

```
use Algorithm::Loops qw/NextPermute/;
do { say "@data" } while NextPermute(@data);
```

**List::Permutor**. Yet another permutation iterator.

```
use List::Permutor;
my $perm = List::Permutor->new(@data);
while (my @set = $perm->next) { say "@set" }
```

**Iterator::Misc**. Yet another permutation iterator, also includes various other iteration functions.

```
use Iterator::Misc;
my $iter = ipermute(@data);
while (!$iter->is_exhausted) { say "@{$iter->value}" }
```

**Math::Permute::Array**. Yet another permutation iterator. Has a rather different syntax than most other modules. Also has a block call. Also allows direct access to permutation index, though without defined order.

```
use Math::Permute::Array;
my $perm = Math::Permute::Array->new(\@data);
say "@{$perm->cur()}";
for (1 .. $perm->cardinal()-1) { say "@{$perm->next()}" }
```

**Math::Permute::List**. A block permutation generator. Permission issue recently fixed, so it installs correctly now.

```
use Math::Permute::List;
permute { say "@_" } @data;
```

**Math::GSL::Permutation**. Uses the GSL permutation API, which is function based, with a few (incomplete) helper methods. Not recommended for this task unless you're already using GSL. Uses a different API than Combination. Permutes reasonably fast, but no quick way to retrieve the array of permutations (calling `as_list` takes 95% of the time). Also note we have to use private class data.

```
use Math::GSL::Permutation qw/:all/;
my $p = Math::GSL::Permutation->new(scalar(@data));
do {
say "@data[$p->as_list]";
} while !gsl_permutation_next($p->{_permutation});
```

**Math::Disarrange::List**. A block derangement generator. Permission issue recently fixed, so it installs correctly now.

```
use Math::Disarrange::List;
disarrange { say "@_" } @data;
```

**Math::GSL::Combination**. Documentation a bit incomplete. Not recommended for this task unless you're already using GSL. Inconsistent API with Permutation.

```
use Math::GSL::Combination qw/:all/;
my $c = Math::GSL::Combination->new(scalar(@data),3);
do {
say "@data[$c->as_list]";
$c->next()
} while !$c->status();
```

]]>

Solution | Impl | Order anti-lex |
Order lexico |
Restrict count |
Restrict size |
Max in 10s | Count in 10s |
---|---|---|---|---|---|---|---|

ntheory 0.45 | XS | yes | no | yes | yes | 87 | 223,000 |

ntheory 0.45 | Perl | yes | no | yes | yes | 72 | 7,300 |

Integer::Partition 0.05 | Perl | yes | yes | no | no | 67 | - |

(unreleased, from Limbic 2004) |
Perl | no | yes | no | no | 62 | 6,000 |

MJD 2013 | Perl | no | no | no | no | 71 | - |

blokhead 2007 | Perl | yes | no | no | no | 63 | - |

kvale 2004 | Perl | yes | no | no | no | 62 | - |

sfink 2004 | Perl | yes | no | no | no | 58 | - |

tye 2001 | Perl | no | no | no | no | 58 | - |

(golfed, 73 chrs) |
Perl | no (73) yes(90) |
no | no | no | 21 | - |

Pari/GP 2.8.0 (not a Perl module!) |
C/Pari | no | no | yes | yes | 100 | 34,000,000 |

For counting, the fastest solutions use the Hardy-Ramanujan-Rademacher formula. The state of the art is Johansson's ARB, which is thousands of times faster than Pari. Pari also uses the Rademacher formula and is quite fast. Jonathan Bober has GPL code using GMP and MPFR that is a little faster than Pari, but MPFR isn't installed on most platforms (meaning it's hard to incorporate into a portable library). I'm using a triangle solution, which isn't too bad in C+GMP compared to Perl's bigints, but way off the fast formula. Integer::Partitions doesn't have a counting method.

ntheory (aka Math::Prime::Util) has a function taking a block (aka an anonymous sub), the number, and an optional hash of restriction parameters. The block is called for each partition with the argument list set to the partition. The restriction parameters are similar to Pari/GP's, with min/max count and min/max element size. This can save quite a bit of time doing filtering (sometimes with shortcuts) inside the XS code.

I debated call by value (a new argument list for each call) vs. call by reference (giving the caller access to internal data). Some XS iterators, e.g. Algorithm::Permute's fast permute, do the latter, and it is faster. I decided on the former because I like the idea that the caller can manipulate its arguments as desired without worrying about segfaults, incorrect iteration, infinite iteration, etc.

Typically the XS code would be used, but there are also pure Perl
implementations for everything. They are used if the XS Loader fails, the
environment variable `MPU_NO_XS` exists and is true, or in cases
where the arguments would overflow or are not properly parseable.

```
use ntheory qw/forpart partitions/;
forpart { say "@_"; } 8; # All partitions of 8
forpart { say "@_"; } 10,{nmin=>5,amax=>4}; # Only 5+ parts, all <= 4
say partitions(2000); # Counting
```

The Integer::Partition module has been on CPAN for a number of years, and is the only solution giving both lexicographic and anti-lexicographic ordering choices. It's reasonably fast unless you need larger values with restrictions, or want counts.

```
use Integer::Partition;
my $iter = Integer::Partition->new(8);
while (my $p = $iter->next) {
say "@$p";
}
```

Math::Pari
isn't in the table because it builds with Pari 2.1.7 by default.
`numbpart` (partition count) was added in version 2.2.5 (Nov 2003), and
`forpart` was added in 2.6.1 (Sep 2013). It's possible to build
Math::Pari with version 2.3.5 so we could get `numbpart` but not
`forpart`.

Pari's `forpart` is quite fast, and has some nice optimizations
for restrictions as well. The ordering is by number of elements rather than
a lexicographic ordering.
The only way to use this from Perl would be a system call to gp.

There are also golfed solutions in under 100 characters. We can even add a callback sub and anti-lexicographic sort and still come in at 93 characters. As usual with heavily golfed code, these are quite slow, managing only 21 partitions in under 10 seconds. This also uses an internal hash for the partitions, which means memory use will grow (though time grows faster so this isn't really an issue).

Here is my simple modification to the golfed solutions, taking 90 characters

for integer partitions in anti-lexicographic order with a callback sub. It's

very slow however, so just for fun.

sub p{my($s,@e,$o,@f)=@_;@f=sort{$b<=>$a}@e;$_{"@f"}||=$s->(@f);p($s,++$o,@e)while--$e[0]}

p(5);

]]>

Memory | Time | Solution |
---|---|---|

2096k | 72.6s | Perl trial division mod 6 |

2100k | 124.8s | Perl trial division |

3652k | 36.2s | Math::GMP |

3940k | 14.8s | Math::GMPz |

4040k | 1.9s | Math::Prime::Util (no GMP backend) |

4388k | 1.9s | Math::Prime::Util (with GMP backend) |

4568k | 1.4s | Math::Prime::Util (GMP + precalc) |

4888k | 4.4s | Math::Prime::XS |

5316k | 245.1s | Math::Primality |

5492k | 29.8s | Math::Pari |

6260k | 1.5s | Math::Prime::FastSieve |

~16MB | >1 year | regex |

Times are with perl 5.20.0 on an Ubuntu Core2 E7500 machine. I used `/usr/bin/time perl -E '`*...*`'` to get the time and memory use. With this system just starting up Perl on the command line takes about 2MB.

The first two entries are simple Perl trial division routines:

`# mod-6 wheel`

sub isprime { my($n) = @_; return ($n >=2) if $n < 4; return if ($n%2 == 0) || ($n%3 == 0); my $sn=int(sqrt($n)); for (my $i = 5; $i <= $sn; $i += 6) { return unless $n % $i && $n % ($i+2); } 1; }

$s += isprime($_) for 1..1e7; say $s;

```
# Standard method, from RosettaCode
sub isprime { my($n) = @_; $n % $_ or return for 2 .. sqrt $n; $n > 1; }
$s += isprime($_) for 1..1e7; say $s;
```

These have essentially no memory use, but are pretty slow especially as the input increases. However, very low memory and gets the job done for small inputs.

The modules Math::GMP and Math::GMPz have calls to GMP's mpz_probab_prime_p function. They are the lowest memory of the module solutions by a small margin, but not the fastest. They shouldn't slow down much with larger inputs.

Math::GMPz has a bit clunkier interface but is faster and exports the entire GMP integer API (which makes it use a little more memory to load). The speed will differ based on the number of tests: only 3 are required for this size, but we must have at minimum 11 for 64-bit inputs and probably more to avoid false results. Math::GMP has much more object overhead though I believe this is being worked on.

Like some other modules, the result of the primality test is either 2 (definitely prime), 1 (probably prime), or 0 (definitely composite). Using the double negation is an easy and fast way to make the result either 1 or 0.

`use Math::GMP; $i=Math::GMP->new(0); for (1..1e7) { $i++; $s += !!$i->probab_prime(15) } ; say $s;`

`use Math::GMPz; $i=Math::GMPz->new(0); for (1..1e7) { $i++; $s += !!Math::GMPz::Rmpz_probab_prime_p($i,15) }; say $s`

Math::Prime::Util is my number theory module. After working on cutting down memory use it's reasonably small even with many exportable functions. It is the fastest solution as well.

By default it will load the GMP back-end if available. This uses a little more memory, but can be turned off either by not having it installed or setting the environment variable `MPU_NO_GMP` to a non-zero value.

`use Math::Prime::Util "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

To match the behavior of Math::Prime::FastSieve, we can precalculate the primes, making `is_prime` a simple bit-set lookup.

`use Math::Prime::Util qw/is_prime prime_precalc/; prime_precalc(1e7); $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Prime::XS does mod-6 trial division in C. It's fast for small inputs, but as expected from an exponential-time algorithm, will slow down a lot with large inputs. It uses more memory than I'd expect.

`use Math::Prime::XS "is_prime"; $s += is_prime($_) for 1..1e7; say $s;`

Math::Primality implements the BPSW algorithm in Perl using Math::GMPz. It really is better suited for bigint inputs, being both slow and memory intensive for this simple task.

`use Math::Primality "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Pari is the Perl interface to the old Pari 2.1.7 library. It has lots of functionality, but does take up almost 1.5MB more startup memory. While PARI/GP is reasonably efficient (about 4 seconds), the module returns the boolean result as a Math::Pari object which sucks up lots of time. The double negation is a little faster than directly summing the result.

`use Math::Pari qw/isprime/; $s += !!isprime($_) for 1..1e7; say $s;`

Math::Prime::FastSieve takes a little different approach. It's written using Inline::CPP and sieves a range into a bit vector, after which operations (such as `isprime`) can be efficiently performed. That does limit its range, but the time shows it's quite fast at this operation.

`use Math::Prime::FastSieve; my $sieve = Math::Prime::FastSieve::Sieve->new(1e7); $s += $sieve->isprime($_) for 1..1e7; say $s;`

Lastly we come to Abigail's regex. Very popular for code golfing and showing awesome regex hacks, it occasionally is seen as a practical recommendation from people who have clearly never used it for non-toy-size inputs. It's very cool. It's also not practical for larger inputs. For this task it took 6.8 seconds and 2684k for the first 10k, but 2607 seconds and 7144k for isprime for the first 100k integers. It takes over 1 minute to verify 999,983 is prime, and for 9,999,991 I killed it after 40 minutes. Hence my estimate of over a year to finish the sum-to-10M example.

`sub isprime { ('1' x shift) !~ /^1?$|^(11+?)\1+$/ } $s += isprime($_) for 1..1e7; say $s`

Conclusions

If you're using Moose or a long-running process, the memory use for any of the reasonable solutions here probably doesn't matter -- use what is easy and fast. For command-line programs or processes that are spun up just for a single task, the memory use can matter. My module was hitting 9MB before I finally had enough and reduced it substantially (a big chunk of that was by having functions go straight to XS and load up the thousands of lines of PP only if required). Even for such a simple task as this we can see sizes ranging from 2MB to 6MB, with over 2MB difference even between modules.

Another subject that is important, especially for making utility scripts, is startup time. This task did not measure that, but it can also be a bottleneck especially if comparing vs. standalone C programs that have essentially no startup cost.

]]>** invmod(a,n)** computes the modular inverse. Similar to Pari's

** vecsum(...)** returns the sum of a list of integers. What about List::Util::sum! I was using that, until I started wondering why my results were sometimes incorrect. Try:

my $n = int(2**53); die unless $n+13 == int(sum($n,13));

The problem is that List::Util turns everything into an NV, which means it lops off the lower 11 or so bits of 64-bit integers. Oops. min and max have the same problem, so the next release of MPU will probably add vecmin and vecmax functions.

** binomial(n,k)** computes a binomial coefficient. I discussed this in a previous post. It uses the full Kronenburg definition for negative arguments, and as much computation in C as possible for speed.

** forpart { ... } n[,{...}]** loops over integer partitions. Pari 2.6 added this, and I thought I would as well. In its simplest form, it gives the additive partitions like the Integer::Partition module, just much faster (albeit not using an iterator). It also allows restrictions to be given, such as

forpart { say "@_" } 10,{n=>5}to only show the partitions with exactly 5 elements. We can use amin and amax to restrict by element size; nmin and nmax to restrict by number of elements. Of course any restriction can be done inside the block, but using the passed-in restriction means the block doesn't get called at all -- important when the number of unrestricted partitions is in the tens or hundreds of millions.

Performance

Wojchiech Izykowski has been working on fast Miller-Rabin testing for some time, including, for a few years now, hosting the Best known SPRP bases collection. He's also been working on Fast Miller-Rabin primality test code. I sent him a 1994 paper by Arazi a while back, which he managed to turn into a performance improvement to his 2^64 modular inverse, and some more back and forth led to even faster code. That combined with a few other changes to the Montgomery math resulted in a close to 50% speedup in the primality tests, which were already blazingly fast. On my 4770k it's now less than a microsecond at any 64-bit size, and 2-5x faster than Pari/GP. The caveat being that the fastest Montgomery path is only used on x86_64. Regardless of platform, the results for any 64-bit number are deterministic -- there are no false results because we use the BPSW test.

I also made a small speedup for small integer factoring, which helps speed up a few functions that call it, e.g. euler_phi, divisor_sum, moebius, etc. Useful for shaving off a little time from naive Project Euler solutions perhaps. I had a few tasks that did a lot of calling of these functions for small-ish values, and while they're already pretty fast, every little bit helps.

What's next?

For minor updates, I already mentioned vecmin and vecmax. I have some improvements to legendre_phi that should be done. I'm thinking is_power may get an optional third argument like Pari that gets set to the root.

Math::Prime::Util::GMP has implementations of valuation, invmod, is_pseudoprime, and binomial now, to help speed those up. I'll probably add vecsum as well. I have a speedup coming for primality testing of numbers in the 40 to 5000 digit range, which is important for the applications I'm running.

Lastly, I'm really hoping to get time to make an XS bigint module, which should give a big boost to the cases where we need bigint processing but don't have GMP. Math::BigInt is a lot slower than I'd like.

]]>]]> Since the base of Math::Prime::Util is in C, I first needed a solution in C. MJD has an old blog post and followup from 2007 related to overflow. This mostly works, but we need a way to detect overflow. RosettaCode has an idea, though overall the code is worse than MJD's.

Experience has shown that once I have to leave the C code, performance takes a huge hit. Hence it would be best to handle everything possible here. I did a little experiment, calculating the 5151 cases of binomials with n in 0..100 and k in 0..n. All but 1355 of them have a 64-bit result, so this is (without going too crazy) the best we can get.

- RosettaCode's example bails on 1990 cases, including (38,13).
- MJD's base code bails on 1617 cases, including (63,29).
- Adding a gcd bails on 1389 cases, including (76,21).
- Adding a second gcd bails on 1355 cases, all having a result > 2^64.

The single gcd is exactly what MJD suggests in his followup blog and gets most of the cases. However, for `r = r * n/d`, if we first reduce `n/d` then `r/d`, that handles the other cases. The cost is an extra gcd in C (only when it looks like we might overflow), to save us a call to either a GMP binomial turned into a Math::BigInt (not too bad), or to Perl's Math::BigInt bnok (a *lot* slower).

static UV gcd_ui(UV x, UV y) { if (y < x) { UV t = x; x = y; y = t; } while (y > 0) { UV t = y; y = x % y; x = t; /* y1 <- x0 % y0 ; x1 <- y0 */ } return x; } UV binomial(UV n, UV k) { UV d, g, r = 1; if (k >= n) return (k == n); if (k > n/2) k = n-k; for (d = 1; d <= k; d++) { if (r >= UV_MAX/n) { /* Possible overflow */ UV nr, dr; /* reduced numerator / denominator */ g = gcd_ui(n, d); nr = n/g; dr = d/g; g = gcd_ui(r, dr); r = r/g; dr = dr/g; if (r >= UV_MAX/nr) return 0; /* Unavoidable overflow */ r *= nr; r /= dr; n--; } else { r *= n--; r /= d; } } }

Negative Arguments

Once the C code overflows (or the XS layer decides it can't understand the arguments), I go to GMP if available, and Math::BigInt's *bnok* if not. When I started testing negative arguments, things got interesting. First let's look at the well defined case: `n < 0, k >= 0`. Knuth 1.2.6.G is quite standard: `binomial(-n,k) = (-1)^k * binomial(n+k-1,k)`. This is handled correctly by Mathematica, Pari, and GMP, but not by Math::BigInt. An RT has been filed (the worst part is that it gives different answers depending on which back end is used).

Things get murky when looking at negative k. Knuth 1.2.6.B indicates that for a positive n, the binomial is 0 when `k < 0` or `k > n`. But what about negative n? Pari says it is 0 in this case as well. GMP's API doesn't allow negative k at all. Math::BigInt also says it is always zero. But Mathematica references Kronenburg 2011 and defines it as `(-1)^(n-k) * binomial(-k-1,n-k)` when `n < 0` and `k <= n`.

I've decided to follow the full Kronenburg definition for negative arguments. This means doing some additional work in the XS and GMP wrappers, as well as wrapping up Math::BigInt's function.

For GMP, this is relatively easy to handle, given char* strn and strk:

mpz_init_set_str(n, strn, 10);

mpz_init_set_str(k, strk, 10);

if (mpz_sgn(k) < 0) { /* Handle negative k */

if (mpz_sgn(n) >= 0 || mpz_cmp(k,n) > 0) mpz_set_ui(n, 0);

else mpz_sub(k, n, k);

}

mpz_bin_ui(n, n, mpz_get_ui(k));

/* ... return result n as appropriate ... */

mpz_clear(k); mpz_clear(n);

For XS and native-int Perl it's slightly trickier, because we have to overflow in the case where the unsigned binomial succeeded and we need to negate the result but the high bit is set -- hence it can't be represented as a signed integer. Wrapping Math::BigInt isn't too different than GMP. I've put a patch in the RT to do the modifications inside Math::BigInt, but it remains to be reviewed and further tested.

]]>The usual speed improvements in various areas, some approximation improvements, and new functions. Primality testing benchmarks also show Perl has state of the art solutions.

]]> New functions:`twin_prime_count`and`nth_twin_prime`, similar to the regular`prime_count`and`nth_prime`functions, give the count or value for twin primes.`twin_prime_count_approx`and`nth_twin_prime_approx`, also similar to the standard functions, give fast approximations, which are especially useful for very large inputs.`random_shawe_taylor_prime`generates random proven primes using the FIPS 186-4 method. A*...*`_with_cert`version is also available that returns a primality certificate. This is somewhat faster than the random Maurer primes, but returns a smaller subset, so is not used for the generic`random_proven_prime`function. It's nice to have if one wants FIPS behavior.

- GMP versions of
`is_power`and`exp_mangoldt`, so these run faster for large inputs.

Two big speedups in GMP primality tests. ECPP has updated polynomial data and some performance updates to keep its lead as the fastest open source primality proof for up to 500 digits. It remains competitive with Pari and the newest mpz_aprcl for quite a while after. With the larger poly set doesn't do too bad compared with Primo up to 2000 digits.

AKS has an improved algorithm using changes from Bernstein and Voloch, with a nice r/s heuristic from Bornemann. It runs about 200x faster now, making it, by quite a bit, the fastest publicly available AKS implementation. Even this updated version is still millions of times slower than ECPP. Repeat after me: AKS is important for the theory, but is not a practical algorithm...

Shows primality timing measurements for a number of open source solutions. All but APRCL and Primo are included in Math::Prime::Util.

]]>If you want to factor 30+ digit numbers I recommend additionally installing Math::BigInt::GMP and Math::Prime::Util::GMP.

factor.pl 2**63-1

factor.pl 'nth_prime(100000)*pn_primorial(10)*random_nbit_prime(90)'

This is just a brief look so I chose some easy numbers:

2**38-1 (3,174763,524287)

2**90-1 (3,3,3,7,11,19,31,73,151,331,631,23311,18837001)

2**150-1 (3,3,7,11,31,151,251,331,601,1801,4051,100801,

10567201,1133836730401)

and some "hard" numbers:

2**62-1 (3,715827883,2147483647)

2**95-1 (31,191,524287,420778751,30327152671)

2**195-1 (7,31,79,151,8191,121369,145295143558111,

134304196845099262572814573351)2**250-1 (3,11,31,251,601,1801,4051,229668251,269089806001,

4710883168879506001,5519485418336288303251)

An aside: *everyone's thoughts on what "small", "big", "easy", "hard" mean differ. To some, a 19-digit number is big, while to many people working on modern factoring that size is completely trivial, and anything under 100-digits is a yawn. My goal is to make it easy to factor 50-80 digit numbers in a reasonable time from Perl. If you are seriously looking for larger or better methods, I recommend yafu, msieve, GMP-ECM, and GGNFS.*

So first let's look at the modules:

- Math::Big::Factors. I believe this was intended as an example of using Math::BigInt, but there are some modules actually using it. The main problem is that is super slow (slower than a simple trial division loop in many cases).
- Math:Factor::XS. Nicely coded trial division in C. Great for easy native size integers, slower than need be for ones with two large factors, and doesn't support bigints.
- Math::Factoring. Leto's placeholder for factoring algorithms in Perl+Math::GMPz. Unfortunately it currently only does trial division, so it only works well on easy numbers. Also be careful to give it only native or Math::GMPz inputs or it blows up.
- Math::Pari. Perl interface to the Pari library (2.1 by default, 2.3 possible if hand-built, 2.5+ not supported). Downside: licensing and portability issues on some platforms. Upside:
**lots**of functions, integer factoring is fast and has a relatively predictable slowdown as the input gets larger. Uses SQUFOF, Pollard Rho, ECM, and MPQS. - Math::Prime::Util. The module I've been working on lately. Small numbers are factored in C using Pollard Rho, SQUFOF, Pollard p-1, and HOLF. For BigInts we see if Math::Prime::Util::GMP is installed and use that if so, otherwise various Perl methods are used: Trial, HOLF, Pollard's Rho, Pollard's p-1, and 2-stage ECM.
- Math::Prime::Util::GMP. This module contains C+GMP code for various tasks and is meant to be a backend for Math::Prime::Util -- install it if you have GMP and most big integer functions speed up. For factoring this is
**much**faster than Perl+Math::BigInt::GMP. It uses a combination of Trial, native SQUFOF, 2-stage Pollard p-1, 2-stage ECM, Pollard Rho, and a simple Quadratic Sieve.

I took a fresh 5.16.2 perlbrew on an x86 Ubuntu machine and installed Math::BigInt::GMP to start, then tried various modules on some simple examples. I installed fresh versions of each module with CPAN before running. Timing results, in seconds, from the command line:

38 90 150 62 95 195 250 ------------ ------ ------ ------ ------ ------ ------ ------ Math::Big 13.7 369 >600 >600 >600 MF::XS 0.02 1.78 M::Factoring 0.22 0.12 7.86 493 302 >600 MPU 0.05 0.16 1.39 0.05 2.03 120+ 600+ MPU::GMP 0.05 0.05 0.06 0.05 0.06 0.40 1.57 Math::Pari 0.05 0.06 0.06 0.06 0.06 0.35 5.65

This is just a gross look, factoring a single number, so anything under 0.1 seconds is reflecting the overhead of starting Perl more than the actual time taken. Both Math::Pari and Math::Prime::Util do some startup work such as creating a small set of primes used for other functions, which the other modules don't do.

Math::Big::Factor: I tried different sizes for the factor wheel and 5 was the fastest for these examples. It took over 6 minutes to factor 2^90-1, which a trial division loop can do in under a second. I stopped it after 15 minutes on 2^150-1, which a trial division loop can do in a bit over 5 minutes.

Math::Factor::XS is super fast for the small example, but certainly slower than need be for 2^62-1. Its maximum size is MAX(sizeof(unsigned long), sizeof(UV)), so either 32-bit or 64-bit.

Math::Factoring will be pretty interesting once algorithms get added, and I'm thinking of submitting a patch to support some of the simpler methods from MPU like Pollard Rho (with Brent's improvements), 2-stage p-1, Fermat, HOLF, and possibly ECM. Using Math::GMPz directly should make it faster than using Math::BigInt objects.

Math::Prime::Util using pure Perl does a pretty good job on most of the numbers. The two largest examples are found using ECM, which is why the '+' sign on the time -- exactly how fast they get found depends on the random curves selected.

Math::Pari and Math::Prime::Util::GMP are the clear time winners, finding all the factors of the small examples basically instantly. Both of them factor random 30-digit numbers in under 10ms, and average under a second for random 60 digit numbers.

Edit for reference: command lines used:

]]>

perl -Mbigint=lib,GMP -MMath::Big::Factors=factors_wheel -E 'my $n = 2**38-1; say join ",", factors_wheel($n, 5);'

perl -MMath::Factor::XS=prime_factors -E 'my $n = 2**38-1; say join ",", prime_factors($n);'

perl -MMath::Factoring=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor(Math::GMPz->new("$n"));'

perl -MMath::Prime::Util=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor($n);

perl -MMath::Pari=factorint -Mbigint=lib,GMP -E 'my $n = 2**38-1; my ($pn,$pc) = @{factorint($n)}; say join ",", map { ($pn->[$_]) x $pc->[$_] } 0 .. $#$pn'

I decided to go ahead and do the Pari removal, and it's on CPAN as Alt::Crypt::RSA::BigInt. This uses Ingy's Alt framework -- install the Alt version and you get a new shiny Crypt::RSA. Install the old version and you're back again. There are no Pari dependencies, and if Math::BigInt::GMP is used, it's **5 to 11 times faster**. I decided to use Math::BigInt directly, as that makes it very portable, though the speed without either the GMP or Pari backend leaves much to be desired.

This also fixes, to the best of my knowledge, all 11 of the open RT issues with Crypt::RSA, and adds some new features (e.g. SHA256, SHA224, SHA384, SHA512 support, fix Key::Private::SSH, and add new ciphers). It also fixes some serious issues with the key generation caused by Crypt::Primes. I intentionally left the API and internal structure as identical as possible. There's an argument to be made for a new RSA module, but that wasn't the point for this change.

The conversion of the math inside Crypt::RSA was relatively straightforward. The big issues were the dependencies. Math::Prime::Util has fast random prime generation and primality testing, which solves that portion. Fortuitously, David Oswald released Bytes::Random::Secure right before I started on the conversion, and it met all my needs for replacing Crypt::Random.

]]>