## Math::Prime::Util, May update

Math::Prime::Util 0.41 was released, with new functions (valuation, invmod, vecsum, binomial, forpart), and the usual crop of performance improvements. In particular, primality testing on x86_64 got another big speedup.

valuation(n,k) is similar to the Pari function, though MPU only supports integers. This returns the number of times k evenly divides n. This sounds rather boring, but we can use it for a very common task: stripping off trailing zeros in a given base. E.g. \$n >>= valuation(\$n,2) will remove all the trailing binary zeros. For decimal: \$n /= 10**valuation(\$n,10).

invmod(a,n) computes the modular inverse. Similar to Pari's lift(Mod(1/a,n)) and Math::BigInt's bmodinv, though faster than either with native values and returns an undef when not defined instead of raising an exception (Pari) or returning NaN (Math::BigInt).

vecsum(...) returns the sum of a list of integers. What about List::Util::sum! I was using that, until I started wondering why my results were sometimes incorrect. Try:

`    my \$n = int(2**53); die unless \$n+13 == int(sum(\$n,13));`

The problem is that List::Util turns everything into an NV, which means it lops off the lower 11 or so bits of 64-bit integers. Oops. min and max have the same problem, so the next release of MPU will probably add vecmin and vecmax functions.

binomial(n,k) computes a binomial coefficient. I discussed this in a previous post. It uses the full Kronenburg definition for negative arguments, and as much computation in C as possible for speed.

forpart { ... } n[,{...}] loops over integer partitions. Pari 2.6 added this, and I thought I would as well. In its simplest form, it gives the additive partitions like the Integer::Partition module, just much faster (albeit not using an iterator). It also allows restrictions to be given, such as

`    forpart { say "@_" } 10,{n=>5}`
to only show the partitions with exactly 5 elements. We can use amin and amax to restrict by element size; nmin and nmax to restrict by number of elements. Of course any restriction can be done inside the block, but using the passed-in restriction means the block doesn't get called at all -- important when the number of unrestricted partitions is in the tens or hundreds of millions.

Performance

Wojchiech Izykowski has been working on fast Miller-Rabin testing for some time, including, for a few years now, hosting the Best known SPRP bases collection. He's also been working on Fast Miller-Rabin primality test code. I sent him a 1994 paper by Arazi a while back, which he managed to turn into a performance improvement to his 2^64 modular inverse, and some more back and forth led to even faster code. That combined with a few other changes to the Montgomery math resulted in a close to 50% speedup in the primality tests, which were already blazingly fast. On my 4770k it's now less than a microsecond at any 64-bit size, and 2-5x faster than Pari/GP. The caveat being that the fastest Montgomery path is only used on x86_64. Regardless of platform, the results for any 64-bit number are deterministic -- there are no false results because we use the BPSW test.

I also made a small speedup for small integer factoring, which helps speed up a few functions that call it, e.g. euler_phi, divisor_sum, moebius, etc. Useful for shaving off a little time from naive Project Euler solutions perhaps. I had a few tasks that did a lot of calling of these functions for small-ish values, and while they're already pretty fast, every little bit helps.

What's next?

For minor updates, I already mentioned vecmin and vecmax. I have some improvements to legendre_phi that should be done. I'm thinking is_power may get an optional third argument like Pari that gets set to the root.

Math::Prime::Util::GMP has implementations of valuation, invmod, is_pseudoprime, and binomial now, to help speed those up. I'll probably add vecsum as well. I have a speedup coming for primality testing of numbers in the 40 to 5000 digit range, which is important for the applications I'm running.

Lastly, I'm really hoping to get time to make an XS bigint module, which should give a big boost to the cases where we need bigint processing but don't have GMP. Math::BigInt is a lot slower than I'd like.