December 2010 Archives

Currying

Until recently, while I thought currying was kind of cool, I did not really understand what the use of it was. I call a function and I am done with it. It was not until I was cleaning up some code in my football predictor project that I started to understand how currying could make your life easier.

Let us say that you have a function that takes an input and based on that input does some calculations. Now, this is normal in Perl. For instance,

sub foo {
    my ($choice) = @_;
    my @float_lst;
    my $total;

    if($choice eq "win") {
        @float_lst = map { run_prediction($_) } (1..10);
        $total += $_ for @float_lst;
    } elsif($choice eq "lose")
        @float_lst = map { run_prediction($_) } (-1..-10);
        $total += $_ for @float_lst;
    } elsif($choice eq "draw") {
        @float_lst = map { run_prediction($_) } (0);
        $total += $_ for @float_lst;
    }
    return $total;
}

You probably notice that you repeat a rather long line of statements. In Ocaml, you can use currying to cut down on those (I am sure there are ways of doing this in Perl as well).

 type outcome = Win | Lose | Draw

 let wins = [1;2;3;4;5;6;7;8;9;10]
 let draw = [0]
 let lose = [-1;-2;-3;-4;-5;-6;-7;-8;-9;-10]

 let calc_outcome outcome =
     let calc_outcome_aux = BatList.map run_prediction in
     let fold_float = List.fold_left (+.) 0. in
         match outcome with
             | Win ->
                 fold_float (calc_outcome_aux wins)
             | Lose ->
                 fold_float (calc_outcome_aux lose)
             | Draw ->
                 fold_float (calc_outcome_aux draw)

Notice how I have called BatList.map and List.fold_left without the required number of arguments. This is not an “off by one error”. What the compiler does is break down the function into a set of single argument functions that it then stores the next in the chain in the lexical bindings. Once you have fulfilled the required number of arguments, it will return its answer. Of course, you can imagine using map over a function that has more than one argument that returns a list of functions that have the other required arguments left to fulfill. In addition, you can call functions where you know one argument will take a long time to compute so you can do the quick arguments first then compute the heavy argument.

In this case, I was able to cut down the length of a line and repeated statements and reuse them based on user input.

More Football Prediction with Math

I had an interesting talk with a friend over the weekend about the math behind the prediction. The outcome of which is that to come up with win, lose, draw predictions, all you need to do is add up the predictions for all point differences. Let me give you an example, if the point difference for the home team is 1 then they have one by one goal (1-0, 2-1, 3-2, …). If you take the draw then all draws are equal thus the goal difference of 0 is the chance of all draws (0-0,1-1, 2-2,…). If you take the lose position then the goal difference for the home time is -1 which covers (0-1, 1-2, 2-3,….). So, you do this up to a goal difference of 5. This makes it much easier to make a match prediction.

Let me leave you with a prediction. Today Liverpool is playing versus Aston Villa. Here is how it breaks down: Liverpool has a goals per game at home average of 1.71 and Aston Villa has a goals per game away average of 0.71 according to Soccer Stats. You can probably already predict the outcome just from that but here is the percentage breakdown:

  • home win —> 60.92%
  • draw —> 23.44%
  • away win —> 15.15%

Predicting Football with Math

One of my little obsessions is attempting to figure out a way to create a money-spinner (also known as a cash cow) by predicting the outcome of football matches (also known as soccer matches to those in North America). I did some research and I found that there is a frequentist (I don’t want to get into the argument of Bayesian vs Frequentist approaches to Statistics) method for doing this. Most of the research is from econometrics and has to do with predicting crowd sizes rather than actual outcomes of matches.

To predict the outcome of matches, you can use the Poisson Distribution. One thing that you will notice immediately, however, is that you must input the mean goals per game for each team independently. This means that your prediction assumes that the other team does not exist on the field of play during the match (they are not cross-correlated). To cross-correlate the teams you must you a derivative of the Poisson Distribution called the Skellam Distribution. This distribution cross-correlates two Poisson Distributions (for home and away teams).

Now, you will notice that the formula for the Skellam Distribution needs a Modified Bessel Function of the First Kind. If you look closely (and you will have noticed it in the Poisson Distribution as well), there are two problems with this function. First, it is infinite over a sum of m. Second, m is used in a factorial which causes the computational complexity to explode.

First thing I needed to do was get a gamma function, which I helpfully found on Rosetta Code. The second thing I needed and thanks to a friend in my department, was to figure out how to sum over an infinite. In Haskell this would be simple but since I use Ocaml as my language of choice these days, I needed to do this finitely. So what I decided to do was map across the formula for an arbitrary number of times. I first decided to do this for 1000 times but Ocaml started throwing NaN so I just dialed back the number of times I map’ed across the formula. I ended up settling on 50 as this was a nice round number and I got good results. I will throw the code below:

Module Bessel = struct

let factorial n =
  let rec fact_aux n a =
    match n with
    | 0 -> a
    | _ -> fact_aux (n-1) (n*a)
in
  fact_aux n 1

let mod_bessel_first m a =
  (1. /. ((float (factorial m)) *. (Lanczos.gamma (float (m + a + 1)))))

let mod_bessel_second m x a =
  (x /. 2.) ** (float (2 * m + a))

let mod_bessel m a x =
  (mod_bessel_first m a) *. (mod_bessel_second m x a)

let run_mod_bessel a x =
  let range = (BatList.of_enum (BatEnum.range 0 ~until:50)) in
  let ans_list = BatList.map (fun m -> mod_bessel m a x) range in
    List.fold_left (+.) 0. ans_list
end

There are probably numerous things that I could do to make this code a bit easier to read but the main points are there (also, if you find any bugs, please let me know). So, now it was just a matter of plugging it all together:

let skellam k u1 u2 =
  let first = Lanczos.e ** -.(u1 +. u2) in
  let second = (u1 /. u2) ** ((float k) /. 2.) in
  let third = 2. *. sqrt (u1 +. u2) in
    first *. second *. (Bessel.run_mod_bessel (abs k) third)

So, there we have it. I now have written a program to predict point spreads in football. If you have any interest in the code, I have put it up on github and I would be more than happy for patches and/or comments. Also, it would be interesting to see how accurate it is (and thus how much money you could theoretically make).

About cyocum

user-pic Celticist, Computer Scientist, Nerd, sometimes a poet…