## Beating memoization

Memoize is a great module. It's one of those fantastic modules that, when you need it, does exactly what you need. The trouble is reaching for it when you don't need it, like reaching for a crutch when you're perfectly healthy and just want to beg for money. That's how many people use Memoize.

Don't get me wrong: I've been lazy and abused this module myself. It's quick, it's easy, and as it says on the tin: "Make functions faster by trading space for time". So you know there's a trade-off: you save time, but your memory overhead increases. That's OK. Memory is cheap. No argument I might put down will dissuade you on that point.

But is it really faster?

Let's look at how Fibonacci is defined in the documentation:

```
sub fib {
    my $n = shift;
    return $n if $n < 2;
    fib( $n - 1 ) + fib( $n - 2 );
}
```

Now it's pointed out, quite rightly, that this function is very slow. Sit down and work it out on paper and you'll find that its running time grows exponentially. (Dominus is a smart guy and I'm pretty damned certain he knows better solutions and chose this as an example.) So if you wanted to compute the 200th Fibonacci number, you can't use this function ... unless you cheat and do this:

```
use Memoize 'memoize';
memoize('fib');
....
```

On my box, the wait starts to get annoyingly long when N hits the mid-thirties. Without caching, there's no way I could compute the 200th Fibonacci number. Setting aside the number itself, the memoized time is around 0.004 seconds. Not bad, eh? Four one-thousandths of a second. Pretty darn good. Of course, I'm trading memory for this, but so what? I want to be fast, right?
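To see where that uncached pain comes from, you can count how many times the naive recursive version actually gets called (a quick sketch of my own, not from the original post):

```perl
use strict;
use warnings;
use feature 'say';

# Count invocations of the naive recursive Fibonacci.
my $calls;

sub fib {
    my $n = shift;
    $calls++;
    return $n if $n < 2;
    return fib( $n - 1 ) + fib( $n - 2 );
}

for my $n ( 10, 20, 25 ) {
    $calls = 0;
    my $result = fib($n);
    say "fib($n) = $result after $calls calls";
}
```

The call count satisfies calls(n) = 1 + calls(n-1) + calls(n-2), so it grows just as fast as the Fibonacci numbers themselves; by n = 25 you're already in the hundreds of thousands of calls.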

Except that you might be familiar with this algorithm:

```
sub fib2 {
    my $n = shift;
    my ( $last, $fib ) = ( -1, 1 );
    for ( 0 .. $n ) {
        my $new = $fib + $last;
        $last = $fib;
        $fib  = $new;
    }
    return $fib;
}
```

Now, instead of computing it recursively, you're computing it iteratively ... and forward. There's no duplication of results (this also tends to be how you work it out by hand).

The time for the 200th number? Around 0.00008 seconds on my box, or almost 50 times faster (with less memory consumption to boot. In this example it's too trivial to be worth discussing, but this is not always the case!).

But maybe I cheated? After all, Memoize has lots of code to handle special cases. Isn't the study of algorithms really about choosing the right tool for the right job? Let me hand-craft my cache:

```
{
    my %cache;

    sub fib {
        no warnings 'recursion';
        my $n = shift;
        return $n if $n < 2;
        return $cache{$n} //= fib( $n - 1 ) + fib( $n - 2 );
    }
}
```

Sadly, that takes about 0.0005 seconds, almost ten times as fast as Memoize, but still far slower than our non-memoized solution.

Understanding algorithms is about choosing the right tool for the job, but if you don't know your tools, you can't choose them. The next time you want to memoize something, ask yourself if you really came up with the most efficient implementation. You might be surprised.

Full code in case you want to play with it yourself:

```
#!/usr/bin/perl

use v5.14.0;
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

use Memoize 'memoize';

sub timefor(&$) {
    my $start = [gettimeofday];
    $_[0]->();
    say sprintf "$_[1]: took %s time" => tv_interval($start);
}

memoize('fib1');

sub fib1 {
    my $n = shift;
    return $n if $n < 2;
    fib1( $n - 1 ) + fib1( $n - 2 );
}

sub fib2 {
    my $n = shift;
    my ( $last, $fib ) = ( -1, 1 );
    for ( 0 .. $n ) {
        my $new = $fib + $last;
        $last = $fib;
        $fib  = $new;
    }
    return $fib;
}

{
    my %cache;

    sub fib3 {
        no warnings 'recursion';
        my $n = shift;
        return $n if $n < 2;
        return $cache{$n} //= fib3( $n - 1 ) + fib3( $n - 2 );
    }
}

timefor { say fib1(200) } 'backwards';
timefor { say fib2(200) } 'forward';
timefor { say fib3(200) } 'our cache';
```

Update: The "forward" (fib2) algorithm runs in Θ(n) time, but you can actually calculate Fibonacci numbers in Θ(log n) time with matrix multiplication.
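For the curious, here's a sketch of that Θ(log n) approach using the "fast doubling" identities that fall out of the matrix form (my own illustration, not from the original post; for n much past 90 you would want Math::BigInt to keep exact integers):

```perl
use strict;
use warnings;
use feature 'say';

# "Fast doubling": returns ( fib(n), fib(n+1) ) in O(log n) steps,
# derived from the matrix identity [[1,1],[1,0]]^n.
#   fib(2k)   = fib(k) * ( 2*fib(k+1) - fib(k) )
#   fib(2k+1) = fib(k)^2 + fib(k+1)^2
sub fib_pair {
    my $n = shift;
    return ( 0, 1 ) if $n == 0;
    my ( $a, $b ) = fib_pair( int( $n / 2 ) );    # a = fib(k), b = fib(k+1)
    my $c = $a * ( 2 * $b - $a );                 # fib(2k)
    my $d = $a * $a + $b * $b;                    # fib(2k+1)
    return $n % 2 ? ( $d, $c + $d ) : ( $c, $d );
}

sub fib_log {
    my ($fib) = fib_pair(shift);
    return $fib;
}

say fib_log($_) for 0 .. 10;    # 0 1 1 2 3 5 8 13 21 34 55
```

Only about log2(n) doublings are needed, so fib_log(200) takes roughly eight recursive steps rather than two hundred iterations.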

Ha, I'm not alone. :)

I've honestly never encountered a situation where Memoize actually gave me an advantage. What's worse: the one time I was in a situation where Memoize SHOULD have given me an advantage (caching date calculations), it was slower than the original code, and only caching manually with a hash and simple string concatenation gave me a speed-up.
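A hand-rolled cache of the kind that comment describes might look like this (a hypothetical sketch: day_of_week stands in for the real date calculation, and the string-concatenation key is the trick the commenter mentions):

```perl
use strict;
use warnings;
use feature 'say';
use Time::Local 'timegm';

# Manual cache keyed on a cheap concatenated string, instead of
# Memoize's generic argument normalizer.
my %dow_cache;

sub day_of_week {
    my ( $y, $m, $d ) = @_;
    my $key = "$y-$m-$d";    # simple string concatenation as the cache key
    return $dow_cache{$key} //= do {
        # The "costly" calculation: build an epoch and ask gmtime.
        my $epoch = timegm( 0, 0, 12, $d, $m - 1, $y );
        ( gmtime $epoch )[6];    # 0 = Sunday .. 6 = Saturday
    };
}

say day_of_week( 2011, 7, 29 );    # computed once
say day_of_week( 2011, 7, 29 );    # served from %dow_cache
```

The point is that a purpose-built key is much cheaper than the general-purpose normalization Memoize has to do for arbitrary argument lists.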

I don’t know, I see memoization as a convenient band-aid.

You have some complex algorithm, there is a costly step… if you are just gunning to make things fast right now (let’s say your web service just got hit hard and it’s straining to keep up), then I’d try memoizing the function first thing and check if it’ll help buy time.

I don’t see memoization as a permanent or Right solution to anything. It’s just a tool to make things go faster while putting in no effort, or maybe to buy time against rewriting a simple slow algorithm into an uglier faster form – since readability is king. (Fibonacci does not count.)

Even in those cases where caching is the Right solution, you want to write your own instead of leaving it to Memoize.

This is essentially my Benchmarking chapter in Mastering Perl. Better algorithms work better, but you can't assume that your program will only call your subroutine once. Even with the better algorithm, why should you recompute the answers every time? If you're trying to save memory you might want to redo the work, but I bet most people care more about speed.

I think you took way too long to make this point: "optimizing with a better algorithm is often better than optimizing with a cache".

Then everything with Memoize and fib() is just an example, and not a terribly good one because you could make the point with only your hand-crafted cache algorithm. (And then go on to point out briefly how much worse Memoize is than even the hand-crafted cache.)

Judging Memoize itself based solely on fib() makes as little sense as judging Moose based solely on "package Point".

As a simple example: Memoize makes it easy to try caching quickly and see if it helps, which requires less programmer time than rewriting an algorithm or implementing a hand-crafted cache. If it helps "enough" for particularly expensive function, that might be sufficient and let the programmer move on to other productive tasks.

I think Ovid's basic point is that we often go through "premature" optimization of poor algorithms instead of choosing the proper algorithm in the first place. Eric Wilhem gave a nice talk on this exact issue at OSCON 2011: http://www.oscon.com/oscon2011/public/schedule/detail/19249. Hopefully, he will post his slides.

I believe a good counter-argument here is that it's often cheaper in terms of human time (which is often $$$) to just throw memory/CPU/hardware at the problem. What is the business case for saving a few microseconds? Well, if the job runs once a minute, probably nothing. If it's run millions of times a second, that's another story.

Lastly, in defence of Memoize, memcache, etc., these types of caching facility are enormously useful for hiding the latency of disk and network I/O. They can also have nice benefits like taking some of the load off your database server.
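As a sketch of that idea, here is a minimal read-through cache in front of disk I/O (slurp_cached is a hypothetical helper of my own; the same shape applies to network calls or database queries):

```perl
use strict;
use warnings;

# Read-through cache: the expensive part is the disk read, not CPU,
# which is exactly where caching layers like Memoize or memcached shine.
my %file_cache;

sub slurp_cached {
    my $path = shift;
    return $file_cache{$path} //= do {
        open my $fh, '<', $path or die "Can't read $path: $!";
        local $/;    # slurp mode: read the whole file at once
        <$fh>;
    };
}

# First call for a given path hits the disk; every later call is
# just a hash lookup.
```

Unlike the Fibonacci examples, there is no clever algorithm that makes the I/O itself faster, so here the cache really is the right tool.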

If I knew that the domain of my function was natural numbers, I'd use @cache rather than %cache. It's an order of magnitude faster than even the iterative algorithm on my machine, and it's just as much of a quick fix as Memoize.

Better yet, with knowledge of the algorithm you can conclude that it's not possible for fib(n-1) to be cached without fib(n-2) also being cached (because the latter must be calculated in order to calculate the former), and for the same reason it's not possible for fib(n-2) to be uncached without fib(n-1) also being uncached. So you can run the calculation out of the cache, and you only ever need to check whether fib(n-1) is cached:

```
use Carp 'croak';

my @fib = ( undef, 1, 1 );

sub fib4 {
    my $n = shift;
    croak "fib($n) undefined" if $n <= 0;
    $fib[$n] //= ( $fib[ $n - 1 ] // fib4( $n - 1 ) ) + $fib[ $n - 2 ];
}
```

This only ever recurses linearly and only recurses just as far as necessary, never redoing any work.

The nice thing, of course, is that a cache amortises all calculations over all subsequent calls. If you change the benchmark to run many calculations, e.g.:

``timefor { $a = fib1(180+$_) for 1..30 } 'backwards';``

you will find that it takes essentially identical time to call `fib4` once or thirty times. It's hard to make the needle move at all on `fib4`, even as the other caching versions accumulate runtime. In fact, if you compare

``timefor { $a = fib1 $_ for 1..200 } 'backwards';``

with

``timefor { $a = fib1 $_ for reverse 1..200 } 'backwards';``

you will find that calculating progressively larger Fibonacci values in a loop is faster for `fib4` than calculating them in a single call, since it’s not being forced to recurse even once ever. In that case it beats the other implementations by a very long shot.

If you prime the caches before benchmarking, you will find that `fib4` is 50% faster than `fib3`, which is 6× faster than `fib1` in turn.

(Needless to say, `fib2` is not even a contender after timing a handful of repeated calls.)

All this goes to the point I made above: if you use your knowledge of the algorithm to weave caching into the right place, you get otherwise inconceivable speed-ups.