**invmod(a,n)** computes the modular inverse, similar to Pari's.

**vecsum(...)** returns the sum of a list of integers. What about List::Util::sum? I was using that, until I started wondering why my results were sometimes incorrect. Try:

```perl
my $n = int(2**53); die unless $n+13 == int(sum($n,13));
```

The problem is that List::Util turns everything into an NV, which means it lops off the lower 11 or so bits of 64-bit integers. Oops. `min` and `max` have the same problem, so the next release of MPU will probably add `vecmin` and `vecmax` functions.

**binomial(n,k)** computes a binomial coefficient. I discussed this in a previous post. It uses the full Kronenburg definition for negative arguments, and does as much computation in C as possible for speed.

**forpart { ... } n[,{...}]** loops over integer partitions. Pari 2.6 added this, and I thought I would as well. In its simplest form it gives the additive partitions, like the Integer::Partition module, just much faster (albeit not using an iterator). It also allows restrictions to be given, such as

```perl
forpart { say "@_" } 10, {n=>5}
```

to only show the partitions with exactly 5 elements. We can use `amin` and `amax` to restrict by element size, and `nmin` and `nmax` to restrict by number of elements. Of course any restriction can be done inside the block, but using the passed-in restriction means the block doesn't get called at all -- important when the number of unrestricted partitions is in the tens or hundreds of millions.

Performance

Wojciech Izykowski has been working on fast Miller-Rabin testing for some time, including, for a few years now, hosting the Best known SPRP bases collection. He's also been working on fast Miller-Rabin primality test code. I sent him a 1994 paper by Arazi a while back, which he managed to turn into a performance improvement to his 2^64 modular inverse, and some more back and forth led to even faster code. That, combined with a few other changes to the Montgomery math, resulted in close to a 50% speedup in the primality tests, which were already blazingly fast. On my 4770K it's now less than a microsecond at any 64-bit size, and 2-5x faster than Pari/GP. The caveat is that the fastest Montgomery path is only used on x86_64. Regardless of platform, the results for any 64-bit number are deterministic -- there are no false results because we use the BPSW test.

I also made a small speedup for small integer factoring, which helps speed up a few functions that call it, e.g. euler_phi, divisor_sum, moebius, etc. Useful for shaving off a little time from naive Project Euler solutions perhaps. I had a few tasks that did a lot of calling of these functions for small-ish values, and while they're already pretty fast, every little bit helps.

What's next?

For minor updates, I already mentioned `vecmin` and `vecmax`. I have some improvements to `legendre_phi` that should be done. I'm thinking `is_power` may get an optional third argument, like Pari's, that gets set to the root.

Math::Prime::Util::GMP has implementations of valuation, invmod, is_pseudoprime, and binomial now, to help speed those up. I'll probably add vecsum as well. I have a speedup coming for primality testing of numbers in the 40 to 5000 digit range, which is important for the applications I'm running.

Lastly, I'm really hoping to get time to make an XS bigint module, which should give a big boost to the cases where we need bigint processing but don't have GMP. Math::BigInt is a lot slower than I'd like.

Since the base of Math::Prime::Util is in C, I first needed a solution in C. MJD has an old blog post and followup from 2007 related to overflow. This mostly works, but we need a way to detect overflow. RosettaCode has an idea, though overall the code is worse than MJD's.

Experience has shown that once I have to leave the C code, performance takes a huge hit. Hence it would be best to handle everything possible here. I did a little experiment, calculating the 5151 cases of binomials with n in 0..100 and k in 0..n. All but 1355 of them have a 64-bit result, so this is (without going too crazy) the best we can get.

- RosettaCode's example bails on 1990 cases, including (38,13).
- MJD's base code bails on 1617 cases, including (63,29).
- Adding a gcd bails on 1389 cases, including (76,21).
- Adding a second gcd bails on 1355 cases, all having a result > 2^64.

The single gcd is exactly what MJD suggests in his followup blog and gets most of the cases. However, for `r = r * n/d`, if we first reduce `n/d` then `r/d`, that handles the other cases. The cost is an extra gcd in C (only when it looks like we might overflow), to save us a call to either a GMP binomial turned into a Math::BigInt (not too bad), or to Perl's Math::BigInt bnok (a *lot* slower).

```c
static UV gcd_ui(UV x, UV y) {
  if (y < x) { UV t = x;  x = y;  y = t; }
  while (y > 0) {
    UV t = y;  y = x % y;  x = t;   /* y1 <- x0 % y0 ; x1 <- y0 */
  }
  return x;
}

UV binomial(UV n, UV k) {
  UV d, g, r = 1;
  if (k >= n) return (k == n);
  if (k > n/2) k = n-k;
  for (d = 1; d <= k; d++) {
    if (r >= UV_MAX/n) {             /* Possible overflow */
      UV nr, dr;                     /* reduced numerator / denominator */
      g = gcd_ui(n, d);   nr = n/g;  dr = d/g;
      g = gcd_ui(r, dr);  r = r/g;   dr = dr/g;
      if (r >= UV_MAX/nr) return 0;  /* Unavoidable overflow */
      r *= nr;
      r /= dr;
      n--;
    } else {
      r *= n--;
      r /= d;
    }
  }
  return r;
}
```

Negative Arguments

Once the C code overflows (or the XS layer decides it can't understand the arguments), I go to GMP if available, and Math::BigInt's *bnok* if not. When I started testing negative arguments, things got interesting. First let's look at the well defined case: `n < 0, k >= 0`. Knuth 1.2.6.G is quite standard: `binomial(-n,k) = (-1)^k * binomial(n+k-1,k)`. This is handled correctly by Mathematica, Pari, and GMP, but not by Math::BigInt. An RT has been filed (the worst part is that it gives different answers depending on which back end is used).

Things get murky when looking at negative k. Knuth 1.2.6.B indicates that for a positive n, the binomial is 0 when `k < 0` or `k > n`. But what about negative n? Pari says it is 0 in this case as well. GMP's API doesn't allow negative k at all. Math::BigInt also says it is always zero. But Mathematica references Kronenburg 2011 and defines it as `(-1)^(n-k) * binomial(-k-1,n-k)` when `n < 0` and `k <= n`.

I've decided to follow the full Kronenburg definition for negative arguments. This means doing some additional work in the XS and GMP wrappers, as well as wrapping Math::BigInt's function.

For GMP, this is relatively easy to handle, given `char*` strings `strn` and `strk`:

```c
mpz_init_set_str(n, strn, 10);
mpz_init_set_str(k, strk, 10);
if (mpz_sgn(k) < 0) {                        /* Handle negative k */
  if (mpz_sgn(n) >= 0 || mpz_cmp(k,n) > 0)  mpz_set_ui(n, 0);
  else                                      mpz_sub(k, n, k);
}
mpz_bin_ui(n, n, mpz_get_ui(k));
/* ... return result n as appropriate ... */
mpz_clear(k);  mpz_clear(n);
```

For XS and native-int Perl it's slightly trickier, because we can overflow in the case where the unsigned binomial succeeded but we need to negate the result and the high bit is set -- hence it can't be represented as a signed integer. Wrapping Math::BigInt isn't too different from GMP. I've put a patch in the RT to do the modifications inside Math::BigInt, but it remains to be reviewed and further tested.

The usual speed improvements in various areas, some approximation improvements, and new functions. Primality testing benchmarks also show Perl has state of the art solutions.

New functions:

- `twin_prime_count` and `nth_twin_prime`, similar to the regular `prime_count` and `nth_prime` functions, give the count or value for twin primes.
- `twin_prime_count_approx` and `nth_twin_prime_approx`, also similar to the standard functions, give fast approximations, which are especially useful for very large inputs.
- `random_shawe_taylor_prime` generates random proven primes using the FIPS 186-4 method. A `..._with_cert` version is also available that returns a primality certificate. This is somewhat faster than the random Maurer primes, but returns a smaller subset, so is not used for the generic `random_proven_prime` function. It's nice to have if one wants FIPS behavior.

- GMP versions of `is_power` and `exp_mangoldt`, so these run faster for large inputs.

Two big speedups in GMP primality tests. ECPP has updated polynomial data and some performance updates to keep its lead as the fastest open source primality proof for up to 500 digits. It remains competitive with Pari and the newest mpz_aprcl for quite a while after that. With the larger poly set it doesn't do too badly compared with Primo up to 2000 digits.

AKS has an improved algorithm using changes from Bernstein and Voloch, with a nice r/s heuristic from Bornemann. It runs about 200x faster now, making it, by quite a bit, the fastest publicly available AKS implementation. Even this updated version is still millions of times slower than ECPP. Repeat after me: AKS is important for the theory, but is not a practical algorithm...

[Chart: primality timing measurements for a number of open source solutions. All but APRCL and Primo are included in Math::Prime::Util.]

If you want to factor 30+ digit numbers I recommend additionally installing Math::BigInt::GMP and Math::Prime::Util::GMP.

```
factor.pl 2**63-1
factor.pl 'nth_prime(100000)*pn_primorial(10)*random_nbit_prime(90)'
```

This is just a brief look so I chose some easy numbers:

```
2**38-1   (3,174763,524287)
2**90-1   (3,3,3,7,11,19,31,73,151,331,631,23311,18837001)
2**150-1  (3,3,7,11,31,151,251,331,601,1801,4051,100801,
           10567201,1133836730401)
```

and some "hard" numbers:

```
2**62-1   (3,715827883,2147483647)
2**95-1   (31,191,524287,420778751,30327152671)
2**195-1  (7,31,79,151,8191,121369,145295143558111,
           134304196845099262572814573351)
2**250-1  (3,11,31,251,601,1801,4051,229668251,269089806001,
           4710883168879506001,5519485418336288303251)
```

An aside: *everyone's thoughts on what "small", "big", "easy", and "hard" mean differ. To some, a 19-digit number is big, while to many people working on modern factoring that size is completely trivial, and anything under 100 digits is a yawn. My goal is to make it easy to factor 50-80 digit numbers in a reasonable time from Perl. If you are seriously looking for larger inputs or better methods, I recommend yafu, msieve, GMP-ECM, and GGNFS.*

So first let's look at the modules:

- Math::Big::Factors. I believe this was intended as an example of using Math::BigInt, but there are some modules actually using it. The main problem is that it is super slow (slower than a simple trial division loop in many cases).
- Math::Factor::XS. Nicely coded trial division in C. Great for easy native size integers, slower than need be for ones with two large factors, and doesn't support bigints.
- Math::Factoring. Leto's placeholder for factoring algorithms in Perl+Math::GMPz. Unfortunately it currently only does trial division, so it only works well on easy numbers. Also be careful to give it only native or Math::GMPz inputs or it blows up.
- Math::Pari. Perl interface to the Pari library (2.1 by default, 2.3 possible if hand-built, 2.5+ not supported). Downside: licensing and portability issues on some platforms. Upside: **lots** of functions; integer factoring is fast and has a relatively predictable slowdown as the input gets larger. Uses SQUFOF, Pollard Rho, ECM, and MPQS.
- Math::Prime::Util. The module I've been working on lately. Small numbers are factored in C using Pollard Rho, SQUFOF, Pollard p-1, and HOLF. For BigInts we see if Math::Prime::Util::GMP is installed and use that if so, otherwise various Perl methods are used: Trial, HOLF, Pollard's Rho, Pollard's p-1, and 2-stage ECM.
- Math::Prime::Util::GMP. This module contains C+GMP code for various tasks and is meant to be a backend for Math::Prime::Util -- install it if you have GMP and most big integer functions speed up. For factoring this is **much** faster than Perl+Math::BigInt::GMP. It uses a combination of Trial, native SQUFOF, 2-stage Pollard p-1, 2-stage ECM, Pollard Rho, and a simple Quadratic Sieve.

I took a fresh 5.16.2 perlbrew on an x86 Ubuntu machine and installed Math::BigInt::GMP to start, then tried various modules on some simple examples. I installed fresh versions of each module with CPAN before running. Timing results, in seconds, from the command line:

```
                  38     90    150     62     95    195    250
                ------ ------ ------ ------ ------ ------ ------
Math::Big        13.7    369   >600   >600   >600
MF::XS           0.02                 1.78
M::Factoring     0.22   0.12   7.86    493    302   >600
MPU              0.05   0.16   1.39   0.05   2.03   120+   600+
MPU::GMP         0.05   0.05   0.06   0.05   0.06   0.40   1.57
Math::Pari       0.05   0.06   0.06   0.06   0.06   0.35   5.65
```

This is just a gross look, factoring a single number, so anything under 0.1 seconds is reflecting the overhead of starting Perl more than the actual time taken. Both Math::Pari and Math::Prime::Util do some startup work such as creating a small set of primes used for other functions, which the other modules don't do.

Math::Big::Factors: I tried different sizes for the factor wheel and 5 was the fastest for these examples. It took over 6 minutes to factor 2^90-1, which a trial division loop can do in under a second. I stopped it after 15 minutes on 2^150-1, which a trial division loop can do in a bit over 5 minutes.

Math::Factor::XS is super fast for the small example, but certainly slower than need be for 2^62-1. Its maximum size is MAX(sizeof(unsigned long), sizeof(UV)), so either 32-bit or 64-bit.

Math::Factoring will be pretty interesting once algorithms get added, and I'm thinking of submitting a patch to support some of the simpler methods from MPU like Pollard Rho (with Brent's improvements), 2-stage p-1, Fermat, HOLF, and possibly ECM. Using Math::GMPz directly should make it faster than using Math::BigInt objects.

Math::Prime::Util using pure Perl does a pretty good job on most of the numbers. The two largest examples are found using ECM, which is why the '+' sign on the time -- exactly how fast they get found depends on the random curves selected.

Math::Pari and Math::Prime::Util::GMP are the clear time winners, finding all the factors of the small examples basically instantly. Both of them factor random 30-digit numbers in under 10ms, and average under a second for random 60 digit numbers.

Edit for reference: command lines used:


```
perl -Mbigint=lib,GMP -MMath::Big::Factors=factors_wheel -E 'my $n = 2**38-1; say join ",", factors_wheel($n, 5);'
perl -MMath::Factor::XS=prime_factors -E 'my $n = 2**38-1; say join ",", prime_factors($n);'
perl -MMath::Factoring=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor(Math::GMPz->new("$n"));'
perl -MMath::Prime::Util=factor -Mbigint=lib,GMP -E 'my $n = 2**38-1; say join ",", factor($n);'
perl -MMath::Pari=factorint -Mbigint=lib,GMP -E 'my $n = 2**38-1; my ($pn,$pc) = @{factorint($n)}; say join ",", map { ($pn->[$_]) x $pc->[$_] } 0 .. $#$pn'
```

I decided to go ahead and do the Pari removal, and it's on CPAN as Alt::Crypt::RSA::BigInt. This uses Ingy's Alt framework -- install the Alt version and you get a new shiny Crypt::RSA. Install the old version and you're back again. There are no Pari dependencies, and if Math::BigInt::GMP is used, it's **5 to 11 times faster**. I decided to use Math::BigInt directly, as that makes it very portable, though the speed without either the GMP or Pari backend leaves much to be desired.

This also fixes, to the best of my knowledge, all 11 of the open RT issues with Crypt::RSA, and adds some new features (e.g. SHA256, SHA224, SHA384, SHA512 support, fix Key::Private::SSH, and add new ciphers). It also fixes some serious issues with the key generation caused by Crypt::Primes. I intentionally left the API and internal structure as identical as possible. There's an argument to be made for a new RSA module, but that wasn't the point for this change.

The conversion of the math inside Crypt::RSA was relatively straightforward. The big issues were the dependencies. Math::Prime::Util has fast random prime generation and primality testing, which solves that portion. Fortuitously, David Oswald released Bytes::Random::Secure right before I started on the conversion, and it met all my needs for replacing Crypt::Random.
