## A comparison of memory use for primality modules

Performance and memory use are two areas of concern for many libraries. Last December I made a number of changes in my Math::Prime::Util module to reduce memory use, and bulk88 helped really tweak some of the XS. I thought I'd pick a simple task and look at the speed and memory use of a number of solutions.

Memory | Time | Solution |
---|---|---|

2096k | 72.6s | Perl trial division mod 6 |

2100k | 124.8s | Perl trial division |

3652k | 36.2s | Math::GMP |

3940k | 14.8s | Math::GMPz |

4040k | 1.9s | Math::Prime::Util (no GMP backend) |

4388k | 1.9s | Math::Prime::Util (with GMP backend) |

4568k | 1.4s | Math::Prime::Util (GMP + precalc) |

4888k | 4.4s | Math::Prime::XS |

5316k | 245.1s | Math::Primality |

5492k | 29.8s | Math::Pari |

6260k | 1.5s | Math::Prime::FastSieve |

~16MB | >1 year | regex |

Times are with perl 5.20.0 on an Ubuntu Core2 E7500 machine. I used `/usr/bin/time perl -E '`*...*`'` to get the time and memory use. With this system just starting up Perl on the command line takes about 2MB.

The first two entries are simple Perl trial division routines:

```
# mod-6 wheel
sub isprime { my($n) = @_; return ($n >=2) if $n < 4; return if ($n%2 == 0) || ($n%3 == 0); my $sn=int(sqrt($n)); for (my $i = 5; $i <= $sn; $i += 6) { return unless $n % $i && $n % ($i+2); } 1; }
$s += isprime($_) for 1..1e7; say $s;
```

```
# Standard method, from RosettaCode
sub isprime { my($n) = @_; $n % $_ or return for 2 .. sqrt $n; $n > 1; }
$s += isprime($_) for 1..1e7; say $s;
```

These have essentially no memory use, but are pretty slow especially as the input increases. However, very low memory and gets the job done for small inputs.

The modules Math::GMP and Math::GMPz have calls to GMP's mpz_probab_prime_p function. They are the lowest memory of the module solutions by a small margin, but not the fastest. They shouldn't slow down much with larger inputs.

Math::GMPz has a bit clunkier interface but is faster and exports the entire GMP integer API (which makes it use a little more memory to load). The speed will differ based on the number of tests: only 3 are required for this size, but we must have at minimum 11 for 64-bit inputs and probably more to avoid false results. Math::GMP has much more object overhead though I believe this is being worked on.

Like some other modules, the result of the primality test is either 2 (definitely prime), 1 (probably prime), or 0 (definitely composite). Using the double negation is an easy and fast way to make the result either 1 or 0.

`use Math::GMP; $i=Math::GMP->new(0); for (1..1e7) { $i++; $s += !!$i->probab_prime(15) } ; say $s;`

`use Math::GMPz; $i=Math::GMPz->new(0); for (1..1e7) { $i++; $s += !!Math::GMPz::Rmpz_probab_prime_p($i,15) }; say $s`

Math::Prime::Util is my number theory module. After working on cutting down memory use it's reasonably small even with many exportable functions. It is the fastest solution as well.

By default it will load the GMP back-end if available. This uses a little more memory, but can be turned off either by not having it installed or setting the environment variable `MPU_NO_GMP` to a non-zero value.

`use Math::Prime::Util "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

To match the behavior of Math::Prime::FastSieve, we can precalculate the primes, making `is_prime`a simple bit-set lookup.

`use Math::Prime::Util qw/is_prime prime_precalc/; prime_precalc(1e7); $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Prime::XS does mod-6 trial division in C. It's fast for small inputs, but as expected from an exponential-time algorithm, will slow down a lot with large inputs. It uses more memory than I'd expect.

`use Math::Prime::XS "is_prime"; $s += is_prime($_) for 1..1e7; say $s;`

Math::Primality implements the BPSW algorithm in Perl using Math::GMPz. It really is better suited for bigint inputs, being both slow and memory intensive for this simple task.

`use Math::Primality "is_prime"; $s += !!is_prime($_) for 1..1e7; say $s;`

Math::Pari is the Perl interface to the old Pari 2.1.7 library. It has lots of functionality, but does take up almost 1.5MB more startup memory. While PARI/GP is reasonably efficient (about 4 seconds), the module returns the boolean result as a Math::Pari object which sucks up lots of time. The double negation is a little faster than directly summing the result.

`use Math::Pari qw/isprime/; $s += !!isprime($_) for 1..1e7; say $s;`

Math::Prime::FastSieve takes a little different approach. It's written using Inline::CPP and sieves a range into a bit vector, after which operations (such as `isprime`) can be efficiently performed. That does limit its range, but the time shows it's quite fast at this operation.

`use Math::Prime::FastSieve; my $sieve = Math::Prime::FastSieve::Sieve->new(1e7); $s += $sieve->isprime($_) for 1..1e7; say $s;`

Lastly we come to Abigail's regex. Very popular for code golfing and showing awesome regex hacks, it occasionally is seen as a practical recommendation from people who have clearly never used it for non-toy-size inputs. It's very cool. It's also not practical for larger inputs. For this task it took 6.8 seconds and 2684k for the first 10k, but 2607 seconds and 7144k for isprime for the first 100k integers. It takes over 1 minute to verify 999,983 is prime, and for 9,999,991 I killed it after 40 minutes. Hence my estimate of over a year to finish the sum-to-10M example.

`sub isprime { ('1' x shift) !~ /^1?$|^(11+?)\1+$/ } $s += isprime($_) for 1..1e7; say $s`

Conclusions

If you're using Moose or a long-running process, the memory use for any of the reasonable solutions here probably doesn't matter -- use what is easy and fast. For command-line programs or processes that are spun up just for a single task, the memory use can matter. My module was hitting 9MB before I finally had enough and reduced it substantially (a big chunk of that was by having functions go straight to XS and load up the thousands of lines of PP only if required). Even for such a simple task as this we can see sizes ranging from 2MB to 6MB, with over 2MB difference even between modules.

Another subject that is important, especially for making utility scripts, is startup time. This task did not measure that, but it can also be a bottleneck especially if comparing vs. standalone C programs that have essentially no startup cost.

I'm also curious what the range limitations are for each method. Sometimes a slow program that has a longer range is better than a speedy one that stops at 127. And, how long does it take for the last prime (largest known) instead of the first 100k? :)

I'd also like to see how this plays out on different platforms. When I get a chance I'll try it on my boxen.

Brian, that's really a different question. This started out as just a "how much memory is used just to load the module without doing anything" but then I figured I should call a single function. Then figured I may as well do a toy task.

These aren't huge numbers, but from the Math::Prime::Util documentation:

Math::Prime::Util with the GMP backend will support hundreds of thousands of digits, and is probably the fastest code for large numbers other than OpenPFGW's Fermat test, and is substantially faster than any of the other Perl modules. See this stackexchange challenge, or Nicely's list of first occurrence prime gaps where I used this module.

Caveat being that without Math::Prime::Util::GMP installed, it uses Math::BigInt (with GMP or Pari backend), which is super slow. My todo list has some sort of replacement to get a bigint solution that is both (1) portable assuming XS, and (2) reasonably fast. Also, there are some nice optimizations for x86_64 as well as 64-bit in general. It is still fast on non-x86 machines, but it will miss some of the better optimizations (asm mulmod, montgomery math).

Math::Pari, Math::GMP, Math::GMPz, and Math::Primality will support bigints pretty well. For the two GMP methods you'll have to decide how many tests to use. Math::Pari really needs to be updated to use a newer Pari by default -- the current version will do 10 M-R tests and is quite a bit slower than when built with Pari 2.3.5.

Math::Prime::XS does not support bigints. For 64-bit primes it is about 3-4 million times slower than MPU on my machine (but should be fast for most composites).

Math::Prime::FastSieve is going to eat a lot of memory and time making the sieve once we're past 10^8 or so. The answers are fast once done, but it's not the best solution. It took me 2 minutes to sieve to 10^10, and beyond that will take GB of memory.

Trial division is exponential time so even with C+GMP is not going to be practical past 25-30 digits (and is hideously slow at those sizes). The Perl code is just going to get worse.

Time for primality proofs is another discussion -- I'm writing some web pages on that since I realized I keep writing the same thing on forums.

For the largest known primes, we'd want to use a Lucas-Lehmer test since they are Mersenne primes. I have not added any special form tests (nor have the other modules), but the LL test is pretty straightforward. They would still take a long time. The largest currently known prime has 17,425,170 digits. Using code specifically made for this, it took 6 days on a 32-core server and 3.6 days on a GPU.

For a general form numbers, last year some people ran tests on a couple Wagstaff PRPs with ~4 million digits. OpenPFGW took 4-70 hours to show they were Fermat PRPs, and 5 days for the Lucas test. A fast Frobenius test implemented with GMP took slightly over one month.

Perl 5.21.3 built on AIX 7.1 Power 7, 64-bit. Unfortunately building things on AIX is a PITA -- I gave up trying to get GMP, Math::Pari, or ExtUtils::CppGuess (used by Math::Prime::FastSieve) to work. Measuring memory use is non-obvious for fast-running tasks. I'll just do some times for the same 1-10M isprime to at least show the different architecture.

132.1 Perl trial division mod 6

291.7 Perl trial division

9.8 Math::Prime::Util

2.5 Math::Prime::Util with precalc

6.7 Math::Prime::XS

On this machine, Math::Prime::XS's simple trial division loop is faster than the non-cached routine I use in MPU until 3e7. Part of this is that MPU uses UV internally while MPXS uses "unsigned long". On this machine UV is "unsigned long long" (64-bit) and unsigned long is only 32-bit. That means MPXS is 32-bit, so doesn't work past 2^32 and probably explains the speed difference as well.

I had tried adding trial division with no better performance, but that last was the clue. For small sizes, when not using Montgomery math, I have trial division using unsigned ints. This almost halves the time on the AIX/Power7 machine -- down to 5.0s. As I expected, trial division is slower on x86_64. I'm interested in hearing how your tests went.