Predicting Football with Math

One of my little obsessions is attempting to figure out a way to create a money-spinner (also known as a cash cow) by predicting the outcome of football matches (also known as soccer matches to those in North America). I did some research and I found that there is a frequentist (I don’t want to get into the argument of Bayesian vs Frequentist approaches to Statistics) method for doing this. Most of the research is from econometrics and has to do with predicting crowd sizes rather than actual outcomes of matches.

To predict the outcome of matches, you can use the Poisson Distribution. One thing that you will notice immediately, however, is that you must input the mean goals per game for each team independently. This means that your prediction assumes that the other team does not exist on the field of play during the match (they are not cross-correlated). To cross-correlate the teams you must you a derivative of the Poisson Distribution called the Skellam Distribution. This distribution cross-correlates two Poisson Distributions (for home and away teams).

Now, you will notice that the formula for the Skellam Distribution needs a Modified Bessel Function of the First Kind. If you look closely (and you will have noticed it in the Poisson Distribution as well), there are two problems with this function. First, it is infinite over a sum of m. Second, m is used in a factorial which causes the computational complexity to explode.

First thing I needed to do was get a gamma function, which I helpfully found on Rosetta Code. The second thing I needed and thanks to a friend in my department, was to figure out how to sum over an infinite. In Haskell this would be simple but since I use Ocaml as my language of choice these days, I needed to do this finitely. So what I decided to do was map across the formula for an arbitrary number of times. I first decided to do this for 1000 times but Ocaml started throwing NaN so I just dialed back the number of times I map’ed across the formula. I ended up settling on 50 as this was a nice round number and I got good results. I will throw the code below:

Module Bessel = struct

let factorial n =
  let rec fact_aux n a =
    match n with
    | 0 -> a
    | _ -> fact_aux (n-1) (n*a)
  fact_aux n 1

let mod_bessel_first m a =
  (1. /. ((float (factorial m)) *. (Lanczos.gamma (float (m + a + 1)))))

let mod_bessel_second m x a =
  (x /. 2.) ** (float (2 * m + a))

let mod_bessel m a x =
  (mod_bessel_first m a) *. (mod_bessel_second m x a)

let run_mod_bessel a x =
  let range = (BatList.of_enum (BatEnum.range 0 ~until:50)) in
  let ans_list = (fun m -> mod_bessel m a x) range in
    List.fold_left (+.) 0. ans_list

There are probably numerous things that I could do to make this code a bit easier to read but the main points are there (also, if you find any bugs, please let me know). So, now it was just a matter of plugging it all together:

let skellam k u1 u2 =
  let first = Lanczos.e ** -.(u1 +. u2) in
  let second = (u1 /. u2) ** ((float k) /. 2.) in
  let third = 2. *. sqrt (u1 +. u2) in
    first *. second *. (Bessel.run_mod_bessel (abs k) third)

So, there we have it. I now have written a program to predict point spreads in football. If you have any interest in the code, I have put it up on github and I would be more than happy for patches and/or comments. Also, it would be interesting to see how accurate it is (and thus how much money you could theoretically make).

Leave a comment

About cyocum

user-pic Celticist, Computer Scientist, Nerd, sometimes a poet…