## Math::Prime::Util, May update

Math::Prime::Util 0.41 was released, with new functions (valuation, invmod, vecsum, binomial, forpart), and the usual crop of performance improvements. In particular, primality testing on x86_64 got another big speedup.

**valuation(n,k)** is similar to the Pari function, though MPU only supports integers. It returns the number of times `k` evenly divides `n`. This sounds rather boring, but we can use it for a very common task: stripping off trailing zeros in a given base. E.g. `$n >>= valuation($n,2)` will remove all the trailing binary zeros. For decimal: `$n /= 10**valuation($n,10)`.
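To make the behavior concrete, here is a pure-Perl sketch of what valuation computes (the name `valuation_sketch` is hypothetical; MPU's real `valuation` is implemented in C and handles bigints):

```perl
use strict;
use warnings;

# Sketch of valuation(n, k): the exponent of the largest power of k
# that divides n, i.e. the number of trailing zeros of n in base k.
sub valuation_sketch {
    my ($n, $k) = @_;
    my $v = 0;
    while ($n != 0 && $n % $k == 0) {
        $n /= $k;
        $v++;
    }
    return $v;
}

print valuation_sketch(80, 2), "\n";     # 80 = 2^4 * 5, so 4
print valuation_sketch(9000, 10), "\n";  # three trailing decimal zeros, so 3
```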

**invmod(a,n)** computes the modular inverse. It is similar to Pari's `lift(Mod(1/a,n))` and Math::BigInt's `bmodinv`, though faster than either with native values, and it returns undef when the inverse is not defined instead of raising an exception (Pari) or returning NaN (Math::BigInt).
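A minimal extended-Euclid sketch of the same result (`invmod_sketch` is a hypothetical name; MPU's `invmod` is the real, much faster implementation):

```perl
use strict;
use warnings;

# Sketch of invmod(a, n): find x with (a * x) % n == 1 via the
# extended Euclidean algorithm, or undef when gcd(a, n) != 1.
sub invmod_sketch {
    my ($a, $n) = @_;
    my ($t, $newt, $r, $newr) = (0, 1, $n, $a % $n);
    while ($newr != 0) {
        my $q = int($r / $newr);
        ($t, $newt) = ($newt, $t - $q * $newt);
        ($r, $newr) = ($newr, $r - $q * $newr);
    }
    return undef if $r != 1;   # no inverse exists
    return $t % $n;            # normalize into 0 .. n-1
}

print invmod_sketch(3, 7), "\n";                              # 5, since 3*5 = 15 = 1 mod 7
print defined(invmod_sketch(4, 8)) ? "defined" : "undef", "\n";  # undef, gcd(4,8) = 4
```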

**vecsum(...)** returns the sum of a list of integers. What about List::Util::sum? I was using that, until I started wondering why my results were sometimes incorrect. Try:

```perl
use List::Util qw/sum/;
my $n = int(2**53);
die unless $n + 13 == int(sum($n, 13));   # dies: sum() accumulated in an NV
```

The problem is that List::Util turns everything into an NV, which means it lops off the lower 11 or so bits of 64-bit integers. Oops. min and max have the same problem, so the next release of MPU will probably add vecmin and vecmax functions.
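For contrast, a plain integer accumulation stays exact on a 64-bit Perl. This sketch (hypothetical name; vecsum itself is C code, and also promotes to bigints on overflow) shows the property vecsum preserves:

```perl
use strict;
use warnings;

# Sum a list with ordinary += so IVs stay IVs; no detour through an NV.
# (Assumes a 64-bit Perl and no intermediate overflow.)
sub ivsum_sketch {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

my $n = int(2**53);
print ivsum_sketch($n, 13) - $n, "\n";   # 13, exactly
```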

**binomial(n,k)** computes a binomial coefficient. I discussed this in a previous post. It uses the full Kronenburg definition for negative arguments, and does as much computation in C as possible for speed.
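A pure-Perl sketch of the Kronenburg extension (hypothetical name; MPU's `binomial` does this in C, and the multiplicative loop below is only exact for modest arguments):

```perl
use strict;
use warnings;

# Sketch of binomial(n, k) with Kronenburg's extension to negative n:
#   C(n, k) = (-1)^k     * C(-n+k-1, k)    for n < 0, k >= 0
#   C(n, k) = (-1)^(n-k) * C(-k-1,  n-k)   for k <= n < 0
#   0 otherwise outside the usual 0 <= k <= n range.
sub binomial_sketch {
    my ($n, $k) = @_;
    if ($n < 0) {
        return 0 if $k < 0 && $k > $n;
        return (($k % 2) ? -1 : 1) * binomial_sketch(-$n + $k - 1, $k) if $k >= 0;
        return ((($n - $k) % 2) ? -1 : 1) * binomial_sketch(-$k - 1, $n - $k);
    }
    return 0 if $k < 0 || $k > $n;
    my $r = 1;
    $r = $r * ($n - $_ + 1) / $_ for 1 .. $k;   # each step divides exactly
    return $r;
}

print binomial_sketch(10, 3), "\n";   # 120
print binomial_sketch(-5, 3), "\n";   # (-1)^3 * C(7,3) = -35
```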

**forpart { ... } n[,{...}]** loops over integer partitions. Pari 2.6 added this, and I thought I would as well. In its simplest form, it gives the additive partitions like the Integer::Partition module, just much faster (albeit not using an iterator). It also allows restrictions to be given, such as

```perl
forpart { say "@_" } 10, { n => 5 };
```

to only show the partitions with exactly 5 elements. We can use amin and amax to restrict by element size, and nmin and nmax to restrict by number of elements. Of course any restriction can be done inside the block, but using the passed-in restriction means the block doesn't get called at all for excluded partitions -- important when the number of unrestricted partitions is in the tens or hundreds of millions.
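A recursive sketch of what forpart iterates (hypothetical name and interface; MPU does this in C, and actually prunes the recursion for restrictions rather than filtering at the end as this sketch does):

```perl
use strict;
use warnings;

# Sketch of forpart: call $code for each partition of $n, parts in
# non-increasing order, optionally restricted to exactly $parts parts.
sub forpart_sketch {
    my ($code, $n, $parts) = @_;
    my @cur;
    my $rec;
    $rec = sub {
        my ($left, $max) = @_;
        if ($left == 0) {
            $code->(@cur) if !defined $parts || @cur == $parts;
            return;
        }
        for my $x (reverse 1 .. ($left < $max ? $left : $max)) {
            push @cur, $x;             # next part, no larger than the last
            $rec->($left - $x, $x);
            pop @cur;
        }
    };
    $rec->($n, $n);
}

forpart_sketch(sub { print "@_\n" }, 10, 5);   # the 7 partitions of 10 into 5 parts
```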

**Performance**

Wojciech Izykowski has been working on fast Miller-Rabin testing for some time, including, for a few years now, hosting the best known SPRP bases collection. He's also been working on fast Miller-Rabin primality test code. I sent him a 1994 paper by Arazi a while back, which he managed to turn into a performance improvement for his 2^64 modular inverse, and some more back and forth led to even faster code. That, combined with a few other changes to the Montgomery math, resulted in close to a 50% speedup in the primality tests, which were already blazingly fast. On my 4770K it's now less than a microsecond at any 64-bit size, and 2-5x faster than Pari/GP. The caveat is that the fastest Montgomery path is only used on x86_64. Regardless of platform, the results for any 64-bit number are deterministic -- there are no false results, because we use the BPSW test.

I also made a small speedup for small integer factoring, which helps speed up the functions that call it, such as euler_phi, divisor_sum, and moebius. Useful for shaving off a little time from naive Project Euler solutions, perhaps. I had a few tasks that did a lot of calling of these functions for small-ish values, and while they're already pretty fast, every little bit helps.
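To see why factoring speed dominates these functions, here is a trial-division sketch of euler_phi (hypothetical name; MPU's version uses its fast C factoring):

```perl
use strict;
use warnings;

# Sketch of euler_phi(n): factor n by trial division, applying
# phi *= (1 - 1/p) for each distinct prime p found. Almost all the
# work is the factoring, which is what the release sped up.
sub euler_phi_sketch {
    my $n = shift;
    my $phi = $n;
    for (my $p = 2; $p * $p <= $n; $p++) {
        next if $n % $p;
        $phi -= $phi / $p;          # exact: p divides phi here
        $n /= $p while $n % $p == 0;
    }
    $phi -= $phi / $n if $n > 1;    # leftover prime factor
    return $phi;
}

print euler_phi_sketch(36), "\n";   # 36 * (1/2) * (2/3) = 12
```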

**What's next?**

For minor updates, I already mentioned vecmin and vecmax. I have some improvements to legendre_phi that should be done soon. I'm thinking is_power may get an optional third argument, like Pari's, that gets set to the root.

Math::Prime::Util::GMP has implementations of valuation, invmod, is_pseudoprime, and binomial now, to help speed those up. I'll probably add vecsum as well. I have a speedup coming for primality testing of numbers in the 40 to 5000 digit range, which is important for the applications I'm running.

Lastly, I'm really hoping to get time to make an XS bigint module, which should give a big boost to the cases where we need bigint processing but don't have GMP. Math::BigInt is a lot slower than I'd like.

An XS BigInt module would be great. I have been maintaining Algorithm::AM, which uses custom 128-bit integers in XS code. I'd love to be able to outsource it and simplify the whole thing.

Nate, if your platform has GMP, then Math::GMPz or Math::GMP are great. Similarly, if you're using the right platform then Math::Int128 is nice. But those modules don't work on lots of machines out there.

I am going to do a LibTomMath-based module that's self-contained (see CryptX, for instance). The interface will be similar to Math::GMP's, probably with some additions (e.g. modular inverse and exponentiation, similar to Math::BigInt). It certainly won't be as fast as GMP for the big stuff, but it will be fine for most uses, and I will be able to include it as a dependency and still work on all platforms.

For Algorithm::AM, I'm not sure if any of this would help or not. LTM will be tucked away inside the Math::LTM module where you couldn't get to it (let me know if there is a way, but given that we don't install the C headers as part of the module, for instance, I don't see how it would work). If the concern is performance and readability, you could use some generic functions such as a subset of the GMP API (e.g. mpz_init, mpz_set_ui, mpz_add, mpz_mul, mpz_get_d, mpz_get_str, mpz_clear), then have Makefile.PL / the compiler decide which implementation to use: (1) gcc's __int128_t type, (2) GMP (no functions needed, just include gmp.h), or (3) by-hand. Good for performance, not so good for testing. GMP is installed on quite a few systems these days, but definitely not all.

Ooh, I wasn't aware of the trouble with using List::Util::sum like that. Can you report it as a bug? I'll look into fixing it.