Module | Impl | Comb | Perm | Comb w/rep | Perm w/rep | Derange | Speed | Order | Comments |
---|---|---|---|---|---|---|---|---|---|
Algorithm::Combinatorics | XS | yes | yes | yes | yes | yes | + | Lexico | Fast iterator or array |
ntheory | XS | yes | yes | no | no | no | ++ | Lexico | Fast block call |
Math::Combinatorics | Perl | yes | yes | no | no | yes | - - | Impl | Iterator or array |
Algorithm::FastPermute | XS | no | yes | no | no | no | +++ | Impl | Fast block call |
Algorithm::Permute | XS | no | yes | no | no | no | + | Impl | Iterator or fast block call |
Algorithm::Loops | Perl | no | yes | no | no | no | + | Impl | Iterator |
List::Permutor | Perl | no | yes | no | no | no | - | Lexico | Iterator |
Iterator::Misc | Perl | no | yes | no | no | no | - - | Lexico | Iterator |
Math::Permute::Array | Perl | no | yes | no | no | no | - - | Impl | Iterator or index |
Math::Permute::List | Perl | no | yes | no | no | no | | Impl | Block call |
Math::GSL::Permutation | XS | no | yes | no | no | no | - | Lexico | Function interface |
Math::Disarrange::List | Perl | no | no | no | no | yes | | Impl | Block call |
Math::GSL::Combination | XS | yes | no | no | no | no | + | Lexico | Iterator or by index |

Some modules such as Algorithm::Combinatorics, ntheory, and List::Permutor give results in guaranteed lexicographic order. The other modules return data in an order corresponding to whatever internal algorithm is used. For an example unsorted 7 element array, each of the "Impl"-order modules gave a unique sequence (that is, each module gave a different sequence from every other), while all "Lexico"-order modules gave identical sequences.
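To make "lexicographic order" concrete, here is a minimal pure-Perl sketch of the textbook next-permutation step. It is not taken from any of the modules above. The name `next_lex_perm` is made up for illustration, and it compares elements as strings.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: rearrange @$p into the next permutation in
# lexicographic (string) order. Returns 0 if @$p was already the last one.
sub next_lex_perm {
    my ($p) = @_;
    # Find the rightmost position whose element is smaller than its successor.
    my $i = $#$p - 1;
    $i-- while $i >= 0 && $p->[$i] ge $p->[$i + 1];
    return 0 if $i < 0;
    # Find the rightmost element larger than the one at $i, and swap them.
    my $j = $#$p;
    $j-- while $p->[$j] le $p->[$i];
    @$p[$i, $j] = @$p[$j, $i];
    # Reverse the tail so it is in ascending order again.
    @$p[$i + 1 .. $#$p] = reverse @$p[$i + 1 .. $#$p];
    return 1;
}

my @items = sort qw/c a b/;      # must start sorted to see every permutation
my @seen;
do { push @seen, "@items" } while next_lex_perm(\@items);
say for @seen;
```

Starting from a sorted array, every "Lexico" module in the table should produce this same sequence, which is why their outputs agree with each other.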

The speed is an approximate rating of how fast the permutations or combinations are generated with a relatively large set. Looping over the 479 million permutations of a 12 item set takes only 12 seconds for Algorithm::FastPermute, 1 minute for ntheory, 6 minutes for Algorithm::Permute, 12 minutes for Algorithm::Combinatorics and Algorithm::Loops, 30 minutes for List::Permutor, 37 minutes for Math::Combinatorics, 39 minutes for Math::GSL::Permutation, 42 minutes for the common example tsc-permute, and 48 minutes for Iterator::Misc.

The perlfaq recommends List::Permutor, Algorithm::Permute, and Algorithm::Loops. I believe Algorithm::Combinatorics to be a better choice to point people to, as it is likely to cover all needs and calling styles, not just permutations. The results come in lexicographic order rather than implementation defined. It also has excellent documentation.

In all examples, assume we have done something like this setup, and wish to see all permutations of the data, or all combinations of 3 elements.

```
use feature 'say';
my @data = (qw/apple bread curry donut éclair/);
```

**Algorithm::Combinatorics**. This is probably what you're looking for. It has nearly everything you need and is pretty fast. Recommended. If you need the highest speed, Algorithm::FastPermute and ntheory are faster.

```
use Algorithm::Combinatorics qw/combinations permutations/;
my $citer = combinations(\@data, 3);
while (my $c = $citer->next) { say "@$c"; }
my $piter = permutations(\@data);
while (my $p = $piter->next) { say "@$p"; }
```

**ntheory**. XS and Perl block calls for permutations and combinations. Note that the source array isn't directly used -- each block invocation is given an array of indices rather than a direct permutation/combination of the source array.

```
use ntheory qw/forcomb forperm/;
forcomb { say "@data[@_]" } scalar(@data),3;
forperm { say "@data[@_]" } scalar(@data);
```
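The index-based calling style can be illustrated without any module. The sketch below is a pure-Perl stand-in, not ntheory's implementation. The name `for_comb_indices` is hypothetical. It calls a sub with each k-subset of the indices 0..n-1 in lexicographic order, and the caller slices its own array.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: call $cb with each k-combination of the indices
# 0 .. $n-1, in lexicographic order. The source data is never touched.
sub for_comb_indices {
    my ($cb, $n, $k) = @_;
    return if $k > $n;
    my @c = (0 .. $k - 1);                  # first combination
    while (1) {
        $cb->(@c);                          # pass a copy of the index list
        # Find the rightmost index that can still be incremented.
        my $i = $k - 1;
        $i-- while $i >= 0 && $c[$i] == $n - $k + $i;
        last if $i < 0;
        $c[$i]++;
        $c[$_] = $c[$_ - 1] + 1 for $i + 1 .. $k - 1;
    }
}

my @fruit = qw/apple bread curry donut eclair/;
my @combos;
for_comb_indices(sub { push @combos, "@_" }, scalar(@fruit), 3);
for_comb_indices(sub { say "@fruit[@_]" }, scalar(@fruit), 3);
```

Because the callback only sees indices, the same loop can drive several parallel arrays at once, which is one reason a block interface might hand out indices rather than elements.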

**Math::Combinatorics**. One of the slowest of the modules tested, but it does combinations, permutations, and derangements all without XS.

```
use Math::Combinatorics;
my $comb = Math::Combinatorics->new(count => 3, data => [@data]);
while (my @c = $comb->next_combination) { say "@c" }
while (my @p = $comb->next_permutation) { say "@p" }
```

**Algorithm::FastPermute**. The fastest permutation generator -- for large arrays it is about 10x faster than ntheory, 50x faster than Algorithm::Permute, 60-500x faster than the other modules. Modifies the source array, but after a full permutation it will be in the original order.

```
use Algorithm::FastPermute qw/permute/;
permute { say "@data" } @data;
```

**Algorithm::Permute**. Decent permutation iterator. Also includes the FastPermute block generator that works just like that example.

```
use Algorithm::Permute;
my $perm = Algorithm::Permute->new(\@data);
while (my @set = $perm->next) { say "@set" }
```

**Algorithm::Loops**. Permutations through an iterator that modifies the source array. Combinations possible with some code. The array __must__ be sorted to work correctly, and if using sorted numbers, you must use NextPermuteNum.

```
use Algorithm::Loops qw/NextPermute/;
do { say "@data" } while NextPermute(@data);
```

**List::Permutor**. Yet another permutation iterator.

```
use List::Permutor;
my $perm = List::Permutor->new(@data);
while (my @set = $perm->next) { say "@set" }
```

**Iterator::Misc**. Yet another permutation iterator, also includes various other iteration functions.

```
use Iterator::Misc;
my $iter = ipermute(@data);
while (!$iter->is_exhausted) { say "@{$iter->value}" }
```

**Math::Permute::Array**. Yet another permutation iterator. Has a rather different syntax than most other modules. Also has a block call. Also allows direct access to permutation index, though without defined order.

```
use Math::Permute::Array;
my $perm = Math::Permute::Array->new(\@data);
say "@{$perm->cur()}";
for (1 .. $perm->cardinal()-1) { say "@{$perm->next()}" }
```

**Math::Permute::List**. A block permutation generator. Permission issue recently fixed, so it installs correctly now.

```
use Math::Permute::List;
permute { say "@_" } @data;
```

**Math::GSL::Permutation**. Uses the GSL permutation API, which is function based, with a few (incomplete) helper methods. Not recommended for this task unless you're already using GSL. Uses a different API than Combination. Permutes reasonably fast, but no quick way to retrieve the array of permutations (calling `as_list` takes 95% of the time). Also note we have to use private class data.

```
use Math::GSL::Permutation qw/:all/;
my $p = Math::GSL::Permutation->new(scalar(@data));
do {
say "@data[$p->as_list]";
} while !gsl_permutation_next($p->{_permutation});
```

**Math::Disarrange::List**. A block derangement generator. Permission issue recently fixed, so it installs correctly now.

```
use Math::Disarrange::List;
disarrange { say "@_" } @data;
```

**Math::GSL::Combination**. Documentation a bit incomplete. Not recommended for this task unless you're already using GSL. Inconsistent API with Permutation.

```
use Math::GSL::Combination qw/:all/;
my $c = Math::GSL::Combination->new(scalar(@data),3);
do {
say "@data[$c->as_list]";
$c->next()
} while !$c->status();
```

That said, I like the human-readable bit more than random hex digits, but it's something to consider.

I've seen a few places that use something like: Pi(n) that gives n digits of Pi. I added that to the Perl5 ntheory library a couple months ago, though I'm sure a clean Perl6 implementation wouldn't be as fractured. The performance difference is minor at a few digits, but at larger sizes it's quite dramatic (e.g. for 1M digits it can be a couple seconds vs. hours vs. weeks).

Chudnovsky / Ramanujan binary splitting. Looks like 4-5x faster than AGM, though more complicated. Pari uses this (they have AGM commented out after doing both and comparing). This is also what is used in the example on the GMP library site. I've tried it with Math::BigFloat and it was slower than AGM for me (probably due to more but smaller operations, which kills you with overhead).

AGM (Gauss-Brent-Salamin). Fast and easy (~15 lines of Perl 5), not too many lines with GMP. Obviously the Perl5 implementation is oodles slower than C+GMP, but it still has the good growth rate. MPFR uses this, as does ntheory's GMP and Perl5 code. There's a patch for Math::BigFloat to use this for larger inputs, but it hasn't been accepted. There's a Perl6 implementation on RosettaCode, but pretty slow.
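As a rough illustration of how short the AGM approach is, here is a sketch of the Gauss-Brent-Salamin iteration using the core Math::BigFloat module. It is not the ntheory code. The name `pi_agm`, the 10 guard digits, and the choice to temporarily set the class-wide accuracy are all assumptions made for this example.

```perl
use strict;
use warnings;
use feature 'say';
use Math::BigFloat;

# Hypothetical helper: return pi as a string with $digits significant digits.
sub pi_agm {
    my ($digits) = @_;
    my $saved = Math::BigFloat->accuracy();      # remember the global setting
    Math::BigFloat->accuracy($digits + 10);      # work with some guard digits
    my $x = Math::BigFloat->new(1);
    my $y = Math::BigFloat->new("0.5")->bsqrt;   # 1/sqrt(2)
    my $t = Math::BigFloat->new("0.25");
    my $p = 1;
    # Each pass roughly doubles the number of correct digits.
    for (my $good = 1; $good < $digits + 10; $good *= 2) {
        my $mean = ($x + $y) / 2;
        $y = ($x * $y)->bsqrt;
        my $diff = $x - $mean;
        $t -= $p * $diff * $diff;
        $x = $mean;
        $p *= 2;
    }
    my $pi = (($x + $y) ** 2) / (4 * $t);
    Math::BigFloat->accuracy($saved);            # restore the global setting
    return $pi->bround($digits)->bstr;
}

say pi_agm(40);
```

The loop runs only about log2(digits) times, which is where the good growth rate comes from. Nearly all the time goes into the big multiplications and square roots, so the bigint backend matters a great deal.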

Machin type formulas. Math::BigFloat uses this. It's pretty good for small amounts but AGM kills it for many digits because the growth rate is better. With the default Calc backend there are faster ways at all sizes.

Spigot style. The nice thing is that it doesn't need bigints and can be easily done in standard C or Perl. There is something similar for Perl6 on RosettaCode. Terrible growth rate vs. the others, but it's pretty fast for small sizes: at 2000 digits the pure Perl version is over 10x faster than either Machin or AGM with Math::BigFloat's Calc backend.
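For comparison, here is a sketch of one well-known spigot, the Rabinowitz-Wagon algorithm, using only native integers. This is not necessarily the spigot variant used by ntheory or RosettaCode. The name `pi_spigot` and the three spare rounds are assumptions for this example, and the digits come back without a decimal point.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the first $n decimal digits of pi as a string
# ("31415..."), using only small native integers.
sub pi_spigot {
    my ($n) = @_;
    my $rounds = $n + 3;                      # a few spare rounds (assumption)
    my $len    = int(10 * $rounds / 3) + 1;
    my @r      = (2) x ($len + 1);            # 1-based; $r[0] is unused
    my ($predigit, $nines, $out) = (0, 0, '');
    for (1 .. $rounds) {
        my $q = 0;
        for (my $i = $len; $i >= 1; $i--) {
            my $x = 10 * $r[$i] + $q * $i;
            $r[$i] = $x % (2 * $i - 1);
            $q     = int($x / (2 * $i - 1));
        }
        $r[1] = $q % 10;
        $q    = int($q / 10);
        if ($q == 9) {
            $nines++;                         # hold: a later carry may change it
        } elsif ($q == 10) {
            $out .= ($predigit + 1) . ('0' x $nines);   # carry ripples through
            ($predigit, $nines) = (0, 0);
        } else {
            $out .= $predigit . ('9' x $nines);         # held digits are final
            ($predigit, $nines) = ($q, 0);
        }
    }
    $out =~ s/^0//;                           # first released value is a placeholder
    return substr($out, 0, $n);
}

say pi_spigot(30);
```

Every round touches the whole array, and the array grows with the digit count, so the work is quadratic. That matches the poor growth rate described above. If the requested digits end right before a long run of 9s, this sketch can return fewer digits than asked for. A more careful version would keep iterating until enough digits have been released.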

As for a rational, it would be nice to see something that produced a Rational or FatRat of the desired accuracy. Rationals are limited to 64-bit denominators (S02) so the fixed continued fractions would seem fine. I think using the Chudnovsky formula plus the RC trick for square roots could do this for FatRats.

As an aside, I'm happy that Perl6 is using arbitrary size ints, but it seems Rat vs. FatRat is akin to the native vs. bigint that drives me nuts with Perl5 and Perl6 gets rid of. For a library that expects exact answers, now I have to constantly deal with input and output conversion, and performance vs. correctness tradeoffs.


Solution | Impl | Order anti-lex | Order lexico | Restrict count | Restrict size | Max in 10s | Count in 10s |
---|---|---|---|---|---|---|---|
ntheory 0.45 | XS | yes | no | yes | yes | 87 | 223,000 |
ntheory 0.45 | Perl | yes | no | yes | yes | 72 | 7,300 |
Integer::Partition 0.05 | Perl | yes | yes | no | no | 67 | - |
(unreleased, from Limbic 2004) | Perl | no | yes | no | no | 62 | 6,000 |
MJD 2013 | Perl | no | no | no | no | 71 | - |
blokhead 2007 | Perl | yes | no | no | no | 63 | - |
kvale 2004 | Perl | yes | no | no | no | 62 | - |
sfink 2004 | Perl | yes | no | no | no | 58 | - |
tye 2001 | Perl | no | no | no | no | 58 | - |
(golfed, 73 chrs) | Perl | no (73) yes (90) | no | no | no | 21 | - |
Pari/GP 2.8.0 (not a Perl module!) | C/Pari | no | no | yes | yes | 100 | 34,000,000 |

For counting, the fastest solutions use the Hardy-Ramanujan-Rademacher formula. The state of the art is Johansson's ARB, which is thousands of times faster than Pari. Pari also uses the Rademacher formula and is quite fast. Jonathan Bober has GPL code using GMP and MPFR that is a little faster than Pari, but MPFR isn't installed on most platforms (meaning it's hard to incorporate into a portable library). I'm using a triangle solution, which isn't too bad in C+GMP compared to Perl's bigints, but way off the fast formula. Integer::Partition doesn't have a counting method.
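To show what counting means here, the following sketch uses the standard dynamic-programming table over allowed part sizes. It is neither the Rademacher formula nor the triangle method mentioned above. The name `partition_count` is made up for this example, and native numbers limit it to modest inputs.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: number of integer partitions of $n.
# $ways[$m] holds the number of partitions of $m using the parts seen so far.
sub partition_count {
    my ($n) = @_;
    my @ways = (1, (0) x $n);
    for my $part (1 .. $n) {
        $ways[$_] += $ways[$_ - $part] for $part .. $n;
    }
    return $ways[$n];
}

say partition_count($_) for 5, 10, 100;
```

This takes on the order of n^2 additions, so it is fine for small inputs but nowhere near the formula-based counters, and larger inputs would need bigints.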

ntheory (aka Math::Prime::Util) has a function taking a block (aka an anonymous sub), the number, and an optional hash of restriction parameters. The block is called for each partition with the argument list set to the partition. The restriction parameters are similar to Pari/GP's, with min/max count and min/max element size. This can save quite a bit of time doing filtering (sometimes with shortcuts) inside the XS code.

I debated call by value (a new argument list for each call) vs. call by reference (giving the caller access to internal data). Some XS iterators, e.g. Algorithm::Permute's fast permute, do the latter, and it is faster. I decided on the former because I like the idea that the caller can manipulate its arguments as desired without worrying about segfaults, incorrect iteration, infinite iteration, etc.
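The call-by-value design can be sketched in pure Perl. This is a simple recursive illustration, not the ntheory implementation. The name `for_partitions` is hypothetical, and it has none of the restriction parameters. Each callback receives a fresh list, so the caller can modify its arguments freely.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: call $cb once per partition of $n, in
# anti-lexicographic order. $max and @prefix are internal to the recursion.
sub for_partitions {
    my ($cb, $n, $max, @prefix) = @_;
    $max = $n if !defined $max || $max > $n;
    if ($n == 0) {
        $cb->(@prefix);          # fresh argument list on every call
        return;
    }
    for my $k (reverse 1 .. $max) {
        for_partitions($cb, $n - $k, $k, @prefix, $k);
    }
}

for_partitions(sub { say "@_" }, 5);

# The callback may freely modify its arguments without disturbing iteration.
my @sorted_up;
for_partitions(sub { push @sorted_up, join("+", reverse @_) }, 4);
say for @sorted_up;
```

Copying the list on every call costs time, which is the tradeoff described above: the by-reference style is faster, but a caller that changes the shared array could corrupt the iteration.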

Typically the XS code would be used, but there are also pure Perl
implementations for everything. They are used if the XS Loader fails, the
environment variable `MPU_NO_XS` exists and is true, or in cases
where the arguments would overflow or are not properly parseable.

```
use ntheory qw/forpart partitions/;
forpart { say "@_"; } 8; # All partitions of 8
forpart { say "@_"; } 10,{nmin=>5,amax=>4}; # Only 5+ parts, all <= 4
say partitions(2000); # Counting
```

The Integer::Partition module has been on CPAN for a number of years, and is the only solution giving both lexicographic and anti-lexicographic ordering choices. It's reasonably fast unless you need larger values with restrictions, or want counts.

```
use Integer::Partition;
my $iter = Integer::Partition->new(8);
while (my $p = $iter->next) {
say "@$p";
}
```

Math::Pari
isn't in the table because it builds with Pari 2.1.7 by default.
`numbpart` (partition count) was added in version 2.2.5 (Nov 2003), and
`forpart` was added in 2.6.1 (Sep 2013). It's possible to build
Math::Pari with version 2.3.5 so we could get `numbpart` but not
`forpart`.

Pari's `forpart` is quite fast, and has some nice optimizations
for restrictions as well. The ordering is by number of elements rather than
a lexicographic ordering.
The only way to use this from Perl would be a system call to gp.

There are also golfed solutions in under 100 characters. We can even add a callback sub and anti-lexicographic sort and still come in at 93 characters. As usual with heavily golfed code, these are quite slow, managing only 21 partitions in under 10 seconds. This also uses an internal hash for the partitions, which means memory use will grow (though time grows faster so this isn't really an issue).

Here is my simple modification to the golfed solutions, taking 90 characters for integer partitions in anti-lexicographic order with a callback sub. It's very slow however, so just for fun.

```
sub p{my($s,@e,$o,@f)=@_;@f=sort{$b<=>$a}@e;$_{"@f"}||=$s->(@f);p($s,++$o,@e)while--$e[0]}
p(sub { say "@_" }, 5);
```


Using the 'for $in -> $x { ... }' style was going quite slowly, but the helpful people on #perl6 got me to try .get in a loop, e.g. 'while (my $x = $in.get) { ... }', which turns out to be much faster. Not only does it use almost no memory, it's 50% faster than the latest @d = "file".IO.lines.

BTW, this was meant to share my experience with using .get for reading a large file. Big thanks to Liz and others for speeding up Str.lines!

132.1 Perl trial division mod 6

291.7 Perl trial division

9.8 Math::Prime::Util

2.5 Math::Prime::Util with precalc

6.7 Math::Prime::XS

On this machine, Math::Prime::XS's simple trial division loop is faster than the non-cached routine I use in MPU until 3e7. Part of this is that MPU uses UV internally while MPXS uses "unsigned long". On this machine UV is "unsigned long long" (64-bit) and unsigned long is only 32-bit. That means MPXS is 32-bit, so doesn't work past 2^32 and probably explains the speed difference as well.

These aren't huge numbers, but from the Math::Prime::Util documentation:

is_prime from 10^100 to 10^100 + 0.2M:

Time | Solution |
---|---|
2.2s | Math::Prime::Util (BPSW + 1 random M-R) |
2.7s | Math::Pari w/2.3.5 (BPSW) |
13.0s | Math::Primality (BPSW) |
35.2s | Math::Pari (10 random M-R) |
38.6s | Math::Prime::Util w/o GMP (BPSW) |
70.7s | Math::Prime::Util (n-1 or ECPP proof) |
102.9s | Math::Pari w/2.3.5 (APR-CL proof) |

Math::Prime::Util with the GMP backend will support hundreds of thousands of digits, and is probably the fastest code for large numbers other than OpenPFGW's Fermat test, and is substantially faster than any of the other Perl modules. See this stackexchange challenge, or Nicely's list of first occurrence prime gaps where I used this module.

Caveat being that without Math::Prime::Util::GMP installed, it uses Math::BigInt (with GMP or Pari backend), which is super slow. My todo list has some sort of replacement to get a bigint solution that is both (1) portable assuming XS, and (2) reasonably fast. Also, there are some nice optimizations for x86_64 as well as 64-bit in general. It is still fast on non-x86 machines, but it will miss some of the better optimizations (asm mulmod, montgomery math).

Math::Pari, Math::GMP, Math::GMPz, and Math::Primality will support bigints pretty well. For the two GMP methods you'll have to decide how many tests to use. Math::Pari really needs to be updated to use a newer Pari by default -- the current version will do 10 M-R tests and is quite a bit slower than when built with Pari 2.3.5.

Math::Prime::XS does not support bigints. For 64-bit primes it is about 3-4 million times slower than MPU on my machine (but should be fast for most composites).

Math::Prime::FastSieve is going to eat a lot of memory and time making the sieve once we're past 10^8 or so. The answers are fast once done, but it's not the best solution. It took me 2 minutes to sieve to 10^10, and beyond that will take GB of memory.

Trial division is exponential time so even with C+GMP is not going to be practical past 25-30 digits (and is hideously slow at those sizes). The Perl code is just going to get worse.

Time for primality proofs is another discussion -- I'm writing some web pages on that since I realized I keep writing the same thing on forums.

For the largest known primes, we'd want to use a Lucas-Lehmer test since they are Mersenne primes. I have not added any special form tests (nor have the other modules), but the LL test is pretty straightforward. They would still take a long time. The largest currently known prime has 17,425,170 digits. Using code specifically made for this, it took 6 days on a 32-core server and 3.6 days on a GPU.
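To show how simple the Lucas-Lehmer test itself is, here is a sketch using the core Math::BigInt module. The name `is_mersenne_prime` is made up for this example, and the caller is assumed to pass a prime exponent. It is only practical for small exponents, because real record-size tests rely on FFT-based squaring.

```perl
use strict;
use warnings;
use feature 'say';
use Math::BigInt;

# Hypothetical helper: Lucas-Lehmer test for M_p = 2^p - 1.
# Assumes $p is itself prime; returns 1 if M_p is prime, else 0.
sub is_mersenne_prime {
    my ($p) = @_;
    return 1 if $p == 2;                       # M_2 = 3; the loop needs odd p
    my $m = Math::BigInt->new(2)->bpow($p)->bdec;
    my $s = Math::BigInt->new(4);
    $s = ($s * $s - 2) % $m for 1 .. $p - 2;   # p-2 modular squarings
    return $s->is_zero ? 1 : 0;
}

my @exponents = grep { is_mersenne_prime($_) } (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
say "@exponents";
```

The whole test is p-2 squarings modulo a p-bit number. The code is short, but for exponents in the tens of millions each squaring is enormous, which is why the record runs still take days.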

For general form numbers, last year some people ran tests on a couple Wagstaff PRPs with ~4 million digits. OpenPFGW took 4-70 hours to show they were Fermat PRPs, and 5 days for the Lucas test. A fast Frobenius test implemented with GMP took slightly over one month.


Memory | Time | Solution |
---|---|---|
2096k | 72.6s | Perl trial division mod 6 |
2100k | 124.8s | Perl trial division |
3652k | 36.2s | Math::GMP |
3940k | 14.8s | Math::GMPz |
4040k | 1.9s | Math::Prime::Util (no GMP backend) |
4388k | 1.9s | Math::Prime::Util (with GMP backend) |
4568k | 1.4s | Math::Prime::Util (GMP + precalc) |
4888k | 4.4s | Math::Prime::XS |
5316k | 245.1s | Math::Primality |
5492k | 29.8s | Math::Pari |
6260k | 1.5s | Math::Prime::FastSieve |
~16MB | >1 year | regex |

Times are with perl 5.20.0 on an Ubuntu Core2 E7500 machine. I used `/usr/bin/time perl -E '...'` to get the time and memory use. With this system just starting up Perl on the command line takes about 2MB.

The first two entries are simple Perl trial division routines:

```
# mod-6 wheel
sub isprime {
  my($n) = @_;
  return ($n >= 2) if $n < 4;
  return if ($n%2 == 0) || ($n%3 == 0);
  my $sn = int(sqrt($n));
  for (my $i = 5; $i <= $sn; $i += 6) {
    return unless $n % $i && $n % ($i+2);
  }
  1;
}
$s += isprime($_) for 1..1e7; say $s;
```

```
# Standard method, from RosettaCode
sub isprime { my($n) = @_; $n % $_ or return for 2 .. sqrt $n; $n > 1; }
$s += isprime($_) for 1..1e7; say $s;
```

These add essentially no memory beyond Perl itself, but they are pretty slow, especially as the input increases. Still, they get the job done for small inputs.

The modules Math::GMP and Math::GMPz have calls to GMP's mpz_probab_prime_p function. They are the lowest memory of the module solutions by a small margin, but not the fastest. They shouldn't slow down much with larger inputs.

Math::GMPz has a bit clunkier interface but is faster and exports the entire GMP integer API (which makes it use a little more memory to load). The speed will differ based on the number of tests: only 3 are required for this size, but we must have at minimum 11 for 64-bit inputs and probably more to avoid false results. Math::GMP has much more object overhead though I believe this is being worked on.

Like some other modules, the result of the primality test is either 2 (definitely prime), 1 (probably prime), or 0 (definitely composite). Using the double negation is an easy and fast way to make the result either 1 or 0.

`use Math::GMP; $i=Math::GMP->new(0); for (1..1e7) { $i++; $s += !!$i->probab_prime(15) } ; say $s;`

`use Math::GMPz; $i=Math::GMPz->new(0); for (1..1e7) { $i++; $s += !!Math::GMPz::Rmpz_probab_prime_p($i,15) }; say $s`

Math::Prime::Util is my number theory module. After working on cutting down memory use it's reasonably small even with many exportable functions. It is the fastest solution as well.

By default it will load the GMP back-end if available. This uses a little more memory, but can be turned off either by not having it installed or setting the environment variable `MPU_NO_GMP` to a non-zero value.

`use Math::Prime::Util "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

To match the behavior of Math::Prime::FastSieve, we can precalculate the primes, making `is_prime` a simple bit-set lookup.

`use Math::Prime::Util qw/is_prime prime_precalc/; prime_precalc(1e7); $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Prime::XS does mod-6 trial division in C. It's fast for small inputs, but as expected from an exponential-time algorithm, will slow down a lot with large inputs. It uses more memory than I'd expect.

`use Math::Prime::XS "is_prime"; $s += is_prime($_) for 1..1e7; say $s;`

Math::Primality implements the BPSW algorithm in Perl using Math::GMPz. It really is better suited for bigint inputs, being both slow and memory intensive for this simple task.

`use Math::Primality "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Pari is the Perl interface to the old Pari 2.1.7 library. It has lots of functionality, but does take up almost 1.5MB more startup memory. While PARI/GP is reasonably efficient (about 4 seconds), the module returns the boolean result as a Math::Pari object which sucks up lots of time. The double negation is a little faster than directly summing the result.

`use Math::Pari qw/isprime/; $s += !!isprime($_) for 1..1e7; say $s;`

Math::Prime::FastSieve takes a little different approach. It's written using Inline::CPP and sieves a range into a bit vector, after which operations (such as `isprime`) can be efficiently performed. That does limit its range, but the time shows it's quite fast at this operation.

`use Math::Prime::FastSieve; my $sieve = Math::Prime::FastSieve::Sieve->new(1e7); $s += $sieve->isprime($_) for 1..1e7; say $s;`
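The sieve-then-lookup idea can be sketched in pure Perl with `vec` as the bit vector. This is only an illustration of the approach, not how Math::Prime::FastSieve or `prime_precalc` are implemented. The name `make_sieve` is hypothetical, and a real implementation would at least skip even numbers.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: sieve 0 .. $limit once, then return a closure that
# answers primality queries in that range with a single bit lookup.
sub make_sieve {
    my ($limit) = @_;
    my $composite = '';
    vec($composite, $limit, 1) = 0;            # preallocate the bit string
    for (my $i = 2; $i * $i <= $limit; $i++) {
        next if vec($composite, $i, 1);
        for (my $j = $i * $i; $j <= $limit; $j += $i) {
            vec($composite, $j, 1) = 1;        # mark multiples of each prime
        }
    }
    return sub {
        my ($n) = @_;
        return ($n >= 2 && $n <= $limit && !vec($composite, $n, 1)) ? 1 : 0;
    };
}

my $is_prime = make_sieve(100_000);
my $count = 0;
$count += $is_prime->($_) for 1 .. 100_000;
say $count;
```

All the cost is paid up front, and memory is one bit per integer in the range. That is why this style is very fast for dense queries over a small range but stops being attractive once the range gets large.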

Lastly we come to Abigail's regex. Very popular for code golfing and showing awesome regex hacks, it occasionally is seen as a practical recommendation from people who have clearly never used it for non-toy-size inputs. It's very cool. It's also not practical for larger inputs. For this task it took 6.8 seconds and 2684k for the first 10k, but 2607 seconds and 7144k for isprime for the first 100k integers. It takes over 1 minute to verify 999,983 is prime, and for 9,999,991 I killed it after 40 minutes. Hence my estimate of over a year to finish the sum-to-10M example.

`sub isprime { ('1' x shift) !~ /^1?$|^(11+?)\1+$/ } $s += isprime($_) for 1..1e7; say $s`

Conclusions

If you're using Moose or a long-running process, the memory use for any of the reasonable solutions here probably doesn't matter -- use what is easy and fast. For command-line programs or processes that are spun up just for a single task, the memory use can matter. My module was hitting 9MB before I finally had enough and reduced it substantially (a big chunk of that was by having functions go straight to XS and load up the thousands of lines of PP only if required). Even for such a simple task as this we can see sizes ranging from 2MB to 6MB, with over 2MB difference even between modules.

Another subject that is important, especially for making utility scripts, is startup time. This task did not measure that, but it can also be a bottleneck especially if comparing vs. standalone C programs that have essentially no startup cost.

Math::Pari probably needs a co-maintainer with lots of time. So far I don't think anyone qualified has wanted to step up. It's a lot of work. On the plus side, the RT situation isn't quite so bad -- it would look a lot better with some pruning of duplicates and closing of fixed issues. There are a lot of issues that look like they're fixed but just haven't been closed.

This leads to the digression of how it would be nice to wean the remainder of the Perl crypto modules off of Math::Pari, but they're often in the same boat. The authors are around but don't have time to work on the modules and aren't ready to give them up. There's also the issue of the alternatives: Math::BigInt is core and portable, but super slow for this work without one of the backends, and the backends also have long-standing critical bugs. Math::GMP or Math::GMPz would be the obvious and best choices, but then we're requiring platforms to have GMP installed. I'm still trying to find time to get an alternative out, but it won't be ready in time for CPAN day.

In many cases I found that just using string storage was faster than Bit::Vector, merely because Perl optimizes the heck out of things like substr. Once the vector grows large (e.g. for Unary codes) then Bit::Vector is better. Using 32-bit vectors with bit twiddling in Perl was pretty close to Bit::Vector's speed for my operations. Of course it will differ based on your operations.

Using an XS back end for the bit manipulation results in ~70x speedup for this application (and another 2x speedup if I go straight to XS and skip the Perl Moo class entirely, but then you give up some extensibility).
