Fun with binomials

I decided to add a binomial(n,k) function to Math::Prime::Util, and found some interesting things while doing it. Overflow detection and mitigation in C and Perl were the first thing. Next was looking at negative arguments, which led to finding some differences in various solutions as well as filing a bug report for Math::BigInt.

Since the base of Math::Prime::Util is in C, I first needed a solution in C. MJD has an old blog post and followup from 2007 related to overflow. This mostly works, but we need a way to detect overflow. RosettaCode has an idea, though overall the code is worse than MJD's.

Experience has shown that once I have to leave the C code, performance takes a huge hit. Hence it would be best to handle everything possible here. I did a little experiment, calculating the 5151 cases of binomials with n in 0..100 and k in 0..n. All but 1355 of them have a 64-bit result, so this is (without going too crazy) the best we can get.

  • RosettaCode's example bails on 1990 cases, including (38,13).
  • MJD's base code bails on 1617 cases, including (63,29).
  • Adding a gcd bails on 1389 cases, including (76,21).
  • Adding a second gcd bails on 1355 cases, all having a result > 2^64.

The single gcd is exactly what MJD suggests in his followup blog and gets most of the cases. However, for r = r * n/d, if we first reduce n/d then r/d, that handles the other cases. The cost is an extra gcd in C (only when it looks like we might overflow), to save us a call to either a GMP binomial turned into a Math::BigInt (not too bad), or to Perl's Math::BigInt bnok (a lot slower).

static UV gcd_ui(UV x, UV y) {
  if (y < x) { UV t = x; x = y; y = t; }
  while (y > 0) {
    UV t = y;  y = x % y;  x = t;  /* y1 <- x0 % y0 ; x1 <- y0 */
  }
  return x;
}
UV binomial(UV n, UV k) {
  UV d, g, r = 1;
  if (k >= n) return (k == n);
  if (k > n/2) k = n-k;
  for (d = 1; d <= k; d++) {
    if (r >= UV_MAX/n) {  /* Possible overflow */
      UV nr, dr;  /* reduced numerator / denominator */
      g = gcd_ui(n, d);  nr = n/g;  dr = d/g;
      g = gcd_ui(r, dr);  r = r/g;  dr = dr/g;
      if (r >= UV_MAX/nr) return 0;  /* Unavoidable overflow */
      r *= nr;
      r /= dr;
      n--;
    } else {
      r *= n--;
      r /= d;
    }
  }
}

Negative Arguments

Once the C code overflows (or the XS layer decides it can't understand the arguments), I go to GMP if available, and Math::BigInt's bnok if not. When I started testing negative arguments, things got interesting. First let's look at the well defined case: n < 0, k >= 0. Knuth 1.2.6.G is quite standard: binomial(-n,k) = (-1)^k * binomial(n+k-1,k). This is handled correctly by Mathematica, Pari, and GMP, but not by Math::BigInt. An RT has been filed (the worst part is that it gives different answers depending on which back end is used).

Things get murky when looking at negative k. Knuth 1.2.6.B indicates that for a positive n, the binomial is 0 when k < 0 or k > n. But what about negative n? Pari says it is 0 in this case as well. GMP's API doesn't allow negative k at all. Math::BigInt also says it is always zero. But Mathematica references Kronenburg 2011 and defines it as (-1)^(n-k) * binomial(-k-1,n-k) when n < 0 and k <= n.

I've decided to follow the full Kronenburg definition for negative arguments. This means doing some additional work in the XS and GMP wrappers, as well as wrapping up Math::BigInt's function.

For GMP, this is relatively easy to handle, given char* strn and strk:


mpz_init_set_str(n, strn, 10);
mpz_init_set_str(k, strk, 10);
if (mpz_sgn(k) < 0) { /* Handle negative k */
if (mpz_sgn(n) >= 0 || mpz_cmp(k,n) > 0) mpz_set_ui(n, 0);
else mpz_sub(k, n, k);
}
mpz_bin_ui(n, n, mpz_get_ui(k));
/* ... return result n as appropriate ... */
mpz_clear(k); mpz_clear(n);

For XS and native-int Perl it's slightly trickier, because we have to overflow in the case where the unsigned binomial succeeded and we need to negate the result but the high bit is set -- hence it can't be represented as a signed integer. Wrapping Math::BigInt isn't too different than GMP. I've put a patch in the RT to do the modifications inside Math::BigInt, but it remains to be reviewed and further tested.

Leave a comment

About Dana Jacobsen

user-pic I've been using Perl since 1991 (4.010 or so). At some point I realized that while I have used many languages, I enjoy using Perl more than any others.