One thing I ended up adding to the test suite for one package was an "examples.t" file, which contains tests for all the examples in the synopsis and throughout the documentation (including the small per-method examples and the EXAMPLES section). Unlike Test::Synopsis, this compares against the expected output. I wanted to make sure I caught any regressions in anything used in the documentation, because those would be particularly embarrassing. The downside is keeping things in sync.

* Longest chain of monthly references to a module in other sources (e.g. Stack Overflow, non-Perl blogs, RosettaCode, etc.) -- modules that show how Perl can be usefully extended outside of the echo chamber. Hard to tally and to prevent gaming, though.

* Longest chain of an author contributing to unique non-owned modules (that is, each week/month they have to submit an RT ticket, issue, patch, pull request, etc. to a module they don't own, with each module allowed only once in a chain).

My taste runs to increasing the quality of what we have, versus putting out yet another variation on some module we already have. Measuring that is certainly harder than tracking new CPAN submissions, though.

I am going to do a LibTomMath-based module that's self-contained (see CryptX, for instance). The interface will be similar to Math::GMP, probably with some additions (e.g. modular inverse, exponentiation, etc., similar to Math::BigInt). It certainly won't be as fast as GMP for the big stuff, but it will be fine for most uses, and I will be able to include it as a dependency and still work on all platforms.

For Algorithm::AM, I'm not sure if any of this would help or not. LTM will be tucked away inside the Math::LTM module where you couldn't get at it (let me know if there is a way, but given that we don't install the C headers as part of the module, for instance, I don't see how it would work). If the concern is performance and readability, you could use some generic functions covering a subset of the GMP API (e.g. mpz_init, mpz_set_ui, mpz_add, mpz_mul, mpz_get_d, mpz_get_str, mpz_clear), then have Makefile.PL and the compiler decide which implementation to use: (1) gcc's __int128_t type, (2) GMP (no wrapper functions needed, just include gmp.h), or (3) by hand. Good for performance, not so good for testing. GMP is installed on quite a few systems these days, but definitely not all.

**invmod(a,n)** computes the modular inverse. Similar to Pari's

**vecsum(...)** returns the sum of a list of integers. What about List::Util::sum? I was using that, until I started wondering why my results were sometimes incorrect. Try:

`my $n = int(2**53); die unless $n+13 == int(sum($n,13));`

The problem is that List::Util turns everything into an NV, which means it lops off the lower 11 or so bits of 64-bit integers. Oops. min and max have the same problem, so the next release of MPU will probably add vecmin and vecmax functions.

**binomial(n,k)** computes a binomial coefficient. I discussed this in a previous post. It uses the full Kronenburg definition for negative arguments, and does as much computation in C as possible for speed.

**forpart { ... } n[,{...}]** loops over integer partitions. Pari 2.6 added this, and I thought I would as well. In its simplest form, it gives the additive partitions like the Integer::Partition module, just much faster (albeit not using an iterator). It also allows restrictions to be given, such as

`forpart { say "@_" } 10,{n=>5}` to show only the partitions with exactly 5 elements. We can use amin and amax to restrict by element size, and nmin and nmax to restrict by number of elements. Of course, any restriction can be done inside the block, but using the passed-in restriction means the block doesn't get called at all -- important when the number of unrestricted partitions is in the tens or hundreds of millions.

Performance

Wojciech Izykowski has been working on fast Miller-Rabin testing for some time, including, for a few years now, hosting the Best known SPRP bases collection. He has also been working on fast Miller-Rabin primality test code. I sent him a 1994 paper by Arazi a while back, which he managed to turn into a performance improvement for his 2^64 modular inverse, and some more back and forth led to even faster code. That, combined with a few other changes to the Montgomery math, resulted in close to a 50% speedup in the primality tests, which were already blazingly fast. On my 4770K it's now less than a microsecond at any 64-bit size, and 2-5x faster than Pari/GP. The caveat is that the fastest Montgomery path is only used on x86_64. Regardless of platform, the results for any 64-bit number are deterministic -- there are no false results because we use the BPSW test.

I also made a small speedup for small integer factoring, which helps speed up a few functions that call it, e.g. euler_phi, divisor_sum, moebius, etc. Useful for shaving off a little time from naive Project Euler solutions perhaps. I had a few tasks that did a lot of calling of these functions for small-ish values, and while they're already pretty fast, every little bit helps.

What's next?

For minor updates, I already mentioned vecmin and vecmax. I have some improvements to legendre_phi that should be done. I'm thinking is_power may get an optional third argument, like Pari's, that gets set to the root.

Math::Prime::Util::GMP now has implementations of valuation, invmod, is_pseudoprime, and binomial, which helps speed those up. I'll probably add vecsum as well. I have a speedup coming for primality testing of numbers in the 40 to 5000 digit range, which is important for the applications I'm running.

Lastly, I'm really hoping to find time to make an XS bigint module, which should give a big boost to the cases where we need bigint processing but don't have GMP. Math::BigInt is a lot slower than I'd like.

Since the base of Math::Prime::Util is in C, I first needed a solution in C. MJD has an old blog post and followup from 2007 related to overflow. This mostly works, but we need a way to detect overflow. RosettaCode has an idea, though overall that code is worse than MJD's.

Experience has shown that once I have to leave the C code, performance takes a huge hit, hence it would be best to handle everything possible there. I did a little experiment, calculating the 5151 binomials with n in 0..100 and k in 0..n. All but 1355 of them have a 64-bit result, so bailing on only those 1355 is (without going too crazy) the best we can do.

- RosettaCode's example bails on 1990 cases, including (38,13).
- MJD's base code bails on 1617 cases, including (63,29).
- Adding a gcd bails on 1389 cases, including (76,21).
- Adding a second gcd bails on 1355 cases, all having a result > 2^64.

The single gcd is exactly what MJD suggests in his followup blog post and gets most of the cases. However, for `r = r * n/d`, if we first reduce `n/d` and then `r/d`, that handles the remaining cases. The cost is an extra gcd in C (only when it looks like we might overflow), which saves us a call to either a GMP binomial converted to a Math::BigInt (not too bad) or to Perl's Math::BigInt bnok (a *lot* slower).

static UV gcd_ui(UV x, UV y) {
  if (y < x) { UV t = x;  x = y;  y = t; }
  while (y > 0) {
    UV t = y;  y = x % y;  x = t;   /* y1 <- x0 % y0 ; x1 <- y0 */
  }
  return x;
}

UV binomial(UV n, UV k) {
  UV d, g, r = 1;
  if (k >= n) return (k == n);
  if (k > n/2) k = n-k;
  for (d = 1; d <= k; d++) {
    if (r >= UV_MAX/n) {             /* Possible overflow */
      UV nr, dr;                     /* reduced numerator / denominator */
      g = gcd_ui(n, d);   nr = n/g;  dr = d/g;
      g = gcd_ui(r, dr);  r = r/g;   dr = dr/g;
      if (r >= UV_MAX/nr) return 0;  /* Unavoidable overflow */
      r *= nr;
      r /= dr;
      n--;
    } else {
      r *= n--;
      r /= d;
    }
  }
  return r;
}

Negative Arguments

Once the C code overflows (or the XS layer decides it can't understand the arguments), I go to GMP if available, and Math::BigInt's *bnok* if not. When I started testing negative arguments, things got interesting. First let's look at the well defined case: `n < 0, k >= 0`. Knuth 1.2.6.G is quite standard: `binomial(-n,k) = (-1)^k * binomial(n+k-1,k)`. This is handled correctly by Mathematica, Pari, and GMP, but not by Math::BigInt. An RT has been filed (the worst part is that it gives different answers depending on which back end is used).

Things get murky when looking at negative k. Knuth 1.2.6.B indicates that for a positive n, the binomial is 0 when `k < 0` or `k > n`. But what about negative n? Pari says it is 0 in this case as well. GMP's API doesn't allow negative k at all. Math::BigInt also says it is always zero. But Mathematica references Kronenburg 2011 and defines it as `(-1)^(n-k) * binomial(-k-1,n-k)` when `n < 0` and `k <= n`.

I've decided to follow the full Kronenburg definition for negative arguments. This means doing some additional work in the XS and GMP wrappers, as well as wrapping up Math::BigInt's function.

For GMP, this is relatively easy to handle, given char* strn and strk:

mpz_init_set_str(n, strn, 10);
mpz_init_set_str(k, strk, 10);
if (mpz_sgn(k) < 0) {                         /* Handle negative k */
  if (mpz_sgn(n) >= 0 || mpz_cmp(k,n) > 0)  mpz_set_ui(n, 0);
  else                                      mpz_sub(k, n, k);
}
mpz_bin_ui(n, n, mpz_get_ui(k));
/* ... return result n as appropriate ... */
mpz_clear(k);  mpz_clear(n);

For XS and native-int Perl it's slightly trickier, because we have to overflow in the case where the unsigned binomial succeeded but we need to negate the result and the high bit is set -- hence it can't be represented as a signed integer. Wrapping Math::BigInt isn't too different from GMP. I've put a patch in the RT ticket to make the modifications inside Math::BigInt, but it remains to be reviewed and further tested.

The usual speed improvements in various areas, some approximation improvements, and new functions. Primality testing benchmarks also show Perl has state-of-the-art solutions.

New functions:

- `twin_prime_count` and `nth_twin_prime`, similar to the regular `prime_count` and `nth_prime` functions, give the count or value for twin primes.
- `twin_prime_count_approx` and `nth_twin_prime_approx`, also similar to the standard functions, give fast approximations, which are especially useful for very large inputs.
- `random_shawe_taylor_prime` generates random proven primes using the FIPS 186-4 method. A *..._with_cert* version is also available that returns a primality certificate. This is somewhat faster than the random Maurer primes, but returns a smaller subset, so it is not used for the generic `random_proven_prime` function. It's nice to have if one wants FIPS behavior.

- GMP versions of `is_power` and `exp_mangoldt`, so these run faster for large inputs.

Two big speedups in the GMP primality tests. ECPP has updated polynomial data and some performance updates to keep its lead as the fastest open source primality proof for up to 500 digits. It remains competitive with Pari and the newest mpz_aprcl for quite a while after that. With the larger poly set, it doesn't do too badly compared with Primo up to 2000 digits.

AKS has an improved algorithm using changes from Bernstein and Voloch, with a nice r/s heuristic from Bornemann. It runs about 200x faster now, making it, by quite a bit, the fastest publicly available AKS implementation. Even this updated version is still millions of times slower than ECPP. Repeat after me: AKS is important for the theory, but is not a practical algorithm...

The chart shows primality timing measurements for a number of open source solutions. All but APRCL and Primo are included in Math::Prime::Util.

The last time I tried that, someone removed *all* the implementation links, claiming "Wikipedia is not a code repository". I get the point, but sometimes having a link to some actual working code is useful. It's a very inconsistently applied standard.

Oh, and thank you very much for doing this. I have an edit queued up for after work.

This came to mind when I wondered why Math::TrulyRandom wasn't on the list. It's mentioned in the core documentation for rand. It's also broken in many ways and has no owner. However, it has no dependent distributions, so it ends up with a score of 6 (my estimate). In this particular case I think the correct answer is to patch the core documentation to remove any mention of the module.

Said module brings up another point, which is that arguably the module should go away entirely.

Having the ability for a tester to retract results would be nice. I once tried smoking on a Windows box, which went fine until it hit the too-many-characters-on-a-command-line issue and then started spewing fail reports that had nothing to do with the modules being tested. Not only might one want to retract individual reports, but it would be nice to have a "remove all reports for configuration ID XYZ: that platform/configuration was borked" option.

Having the ability to attach comments to smoke results (both by the tester and module owner) seems useful, but I'm not sure how many people would really dig that deep.

Re: rating of testers, I'd like to see it separated into (1) tester and (2) configuration. #1 is purely how responsive the person is. #2 is a rating of the results coming from that smoker process.

Reason one for splitting: some people do smoking on multiple devices, so indicating how good the results are per configuration is useful. They may be running a 5.16-on-x86-gcc-Linux setup as well as a 5.19.2-on-ARM-clangdev-BSD one, where the former is doing well but the latter has some failures that turn out to be related to the dev Perl or dev compiler or whatever. This does require some way to identify a smoking configuration in addition to the tester ID, and to get a new configuration ID once the tester decides something has changed enough to justify it.

Reason two for splitting: let's say I did something dumb and one of my smokers sent out bad reports. Immediately 10 people give me -1 votes. Well, I may as well quit smoking forever, because I'll never get +1 votes back without pestering people to upvote me (at which point those old bad results start showing up again and I get -1'd again).

Another example of this is the change to SvUPGRADE(sv), which used to return a value and changed to a void function around 5.17.8 (it had been documented as a void function since 2005). This broke a few crypto modules, albeit the fix is really easy. But for a user who just wanted some upstream module, this means upgrading Perl made things stop working.

The module dependency issue is one reason why, given the choice of two equal-for-my-use modules, one having 2 total non-core dependencies in its chain and the other having over 40, I prefer the smaller.

Broken dependency: Mirod already said it, but it's so common.

Incompatible operating system I might generalize to incompatible platform. That includes weirdness like different tmpfile behavior, NFS tmpdirs, classic O/S issues, Win32 issues, big- vs. little-endian, and compiler differences (e.g. the MSVC compiler).

Something similar to the above two but slightly different is wacky broken dependencies already installed. When I look at ActiveState build logs, some of their old systems have the right version of one of my dependencies, but for some reason it acts differently on their system than on any other. Put another way, if my system has a broken/non-standard Math::BigInt installed, then I may get what seem like mysterious test failures when trying to install a module or its dependencies.

This may qualify as one of the others, including broken module, but I've seen a number of cases where, in my environment, 999/1000 test cases pass but one doesn't. This is often something like "testing interoperability with Obscure::Module", and Obscure::Module has changed something that invalidates the test. (Danger ahead.) When I want the module for a one-off non-production experiment that has nothing to do with Obscure::Module, I sometimes just force the install, and typically everything works fine.

P.S. CPAN Testers, yay.

Pari includes 1 and n in the divisors, MFXS and MPU do not (it's easy enough to add or remove them). As an example, 2^90-1 has 13 prime factors (11 unique) and 4094 divisors (plus 1 and n).

Module               prime factors   divisors
------------------   -------------   -----------
Math::Big::Factors   factors_wheel   -
Math::Factor::XS     prime_factors   factors
Math::Factoring      factor          -
Math::Prime::Util    factor          all_factors
Math::Pari           factorint       divisors