Factoring integers in Perl
Recently I've been working on factoring and primality proofs (with certificates) for Math::Prime::Util. I thought I'd give a brief summary and comparison of the modules I know of for factoring integers from Perl.
Let me take a moment to promote the factor.pl utility installed by Math::Prime::Util. It works just like the BSD or GNU factor utility, except it supports bigints on all platforms and parses simple expressions. So you can run commands like:
If you want to factor 30+ digit numbers I recommend additionally installing Math::BigInt::GMP and Math::Prime::Util::GMP.
factor.pl 2**63-1
factor.pl 'nth_prime(100000)*pn_primorial(10)*random_nbit_prime(90)'
This is just a brief look so I chose some easy numbers:
2**38-1 (3,174763,524287)
2**90-1 (3,3,3,7,11,19,31,73,151,331,631,23311,18837001)
2**150-1 (3,3,7,11,31,151,251,331,601,1801,4051,100801,
10567201,1133836730401)
and some "hard" numbers:
2**62-1 (3,715827883,2147483647)
2**95-1 (31,191,524287,420778751,30327152671)
2**195-1 (7,31,79,151,8191,121369,145295143558111,
134304196845099262572814573351)2**250-1 (3,11,31,251,601,1801,4051,229668251,269089806001,
4710883168879506001,5519485418336288303251)
An aside: everyone's thoughts on what "small", "big", "easy", "hard" mean differ. To some, a 19-digit number is big, while to many people working on modern factoring that size is completely trivial, and anything under 100-digits is a yawn. My goal is to make it easy to factor 50-80 digit numbers in a reasonable time from Perl. If you are seriously looking for larger or better methods, I recommend yafu, msieve, GMP-ECM, and GGNFS.
So first let's look at the modules:
- Math::Big::Factors. I believe this was intended as an example of using Math::BigInt, but there are some modules actually using it. The main problem is that is super slow (slower than a simple trial division loop in many cases).
- Math:Factor::XS. Nicely coded trial division in C. Great for easy native size integers, slower than need be for ones with two large factors, and doesn't support bigints.
- Math::Factoring. Leto's placeholder for factoring algorithms in Perl+Math::GMPz. Unfortunately it currently only does trial division, so it only works well on easy numbers. Also be careful to give it only native or Math::GMPz inputs or it blows up.
- Math::Pari. Perl interface to the Pari library (2.1 by default, 2.3 possible if hand-built, 2.5+ not supported). Downside: licensing and portability issues on some platforms. Upside: lots of functions, integer factoring is fast and has a relatively predictable slowdown as the input gets larger. Uses SQUFOF, Pollard Rho, ECM, and MPQS.
- Math::Prime::Util. The module I've been working on lately. Small numbers are factored in C using Pollard Rho, SQUFOF, Pollard p-1, and HOLF. For BigInts we see if Math::Prime::Util::GMP is installed and use that if so, otherwise various Perl methods are used: Trial, HOLF, Pollard's Rho, Pollard's p-1, and 2-stage ECM.
- Math::Prime::Util::GMP. This module contains C+GMP code for various tasks and is meant to be a backend for Math::Prime::Util -- install it if you have GMP and most big integer functions speed up. For factoring this is much faster than Perl+Math::BigInt::GMP. It uses a combination of Trial, native SQUFOF, 2-stage Pollard p-1, 2-stage ECM, Pollard Rho, and a simple Quadratic Sieve.
I took a fresh 5.16.2 perlbrew on an x86 Ubuntu machine and installed Math::BigInt::GMP to start, then tried various modules on some simple examples. I installed fresh versions of each module with CPAN before running. Timing results, in seconds, from the command line:
38 90 150 62 95 195 250 ------------ ------ ------ ------ ------ ------ ------ ------ Math::Big 13.7 369 >600 >600 >600 MF::XS 0.02 1.78 M::Factoring 0.22 0.12 7.86 493 302 >600 MPU 0.05 0.16 1.39 0.05 2.03 120+ 600+ MPU::GMP 0.05 0.05 0.06 0.05 0.06 0.40 1.57 Math::Pari 0.05 0.06 0.06 0.06 0.06 0.35 5.65
This is just a gross look, factoring a single number, so anything under 0.1 seconds is reflecting the overhead of starting Perl more than the actual time taken. Both Math::Pari and Math::Prime::Util do some startup work such as creating a small set of primes used for other functions, which the other modules don't do.
Math::Big::Factor: I tried different sizes for the factor wheel and 5 was the fastest for these examples. It took over 6 minutes to factor 2^90-1, which a trial division loop can do in under a second. I stopped it after 15 minutes on 2^150-1, which a trial division loop can do in a bit over 5 minutes.
Math::Factor::XS is super fast for the small example, but certainly slower than need be for 2^62-1. Its maximum size is MAX(sizeof(unsigned long), sizeof(UV)), so either 32-bit or 64-bit.
Math::Factoring will be pretty interesting once algorithms get added, and I'm thinking of submitting a patch to support some of the simpler methods from MPU like Pollard Rho (with Brent's improvements), 2-stage p-1, Fermat, HOLF, and possibly ECM. Using Math::GMPz directly should make it faster than using Math::BigInt objects.
Math::Prime::Util using pure Perl does a pretty good job on most of the numbers. The two largest examples are found using ECM, which is why the '+' sign on the time -- exactly how fast they get found depends on the random curves selected.
Math::Pari and Math::Prime::Util::GMP are the clear time winners, finding all the factors of the small examples basically instantly. Both of them factor random 30-digit numbers in under 10ms, and average under a second for random 60 digit numbers.
Edit for reference: command lines used:
perl -Mbigint=lib,GMP -MMath::Big::Factors=factors_wheel -E 'my $n = 2**38-1; say join ",", factors_wheel($n, 5);'
perl -MMath::Factor::XS=prime_factors -E 'my $n = 2**38-1; say join ",", prime_factors($n);'
perl -MMath::Factoring=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor(Math::GMPz->new("$n"));'
perl -MMath::Prime::Util=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor($n);
perl -MMath::Pari=factorint -Mbigint=lib,GMP -E 'my $n = 2**38-1; my ($pn,$pc) = @{factorint($n)}; say join ",", map { ($pn->[$_]) x $pc->[$_] } 0 .. $#$pn'
For clarification, this is looking at prime factors. A table for modules and prime factors vs. divisors:
Pari includes 1 and n , MFXS and MPU do not (it's easy enough to add or remove them). As an example, 2^90-1 has 13 prime factors (11 unique), and 4094 divisors (plus 1 and n).