April 2019 Archives

Memoising - standardisation of normalisation

Hopefully, the sesquipedalian polysyllabalisation of the title will have made your eyes glaze over. Now to wake you up: MASSIVE PERFORMANCE GAINZ!

Memoising, as any fule kno, is storing answers after they're calculated, against each set of inputs, so you don't have to keep re-calculating the same thing. If the calculating process is expensive, or even just more expensive than normalising (see below) and a lookup, this can make your programs run faster. Possibly much faster!

This is a thing that works great for computing factorials or Fibonacci numbers. It's a terrible idea for calculating the lengths of simple lists, since normalising would take longer than simply recalculating, and worse would be very unlikely to give repeated hits.

A hybrid answer for that case would be to only store the lengths, but that would still not add value in that specific case because if the normalised input is the answer, this guarantees that normalise+lookup will be slower than normalise=calculate. It does however give a hint that calculating(ha!) whether to normalise or not depends in part on the relationship between normalising and calculating, together with the likelihood of cache hits in that domain.

The memoising process works like so:

  • normalise the inputs: getting the right thing to store answers against
  • look to see if an answer for those inputs exists, and return it if so
  • calculate the answer
  • store that
  • return it

The problem here, such as it is, is that while the Perl module obviously has a facility for normalising inputs, I would say it's most orientated towards functions, rather than methods. My claim is that what's needed here is a convention, or even just a module (probably based heavily on Attribute::Memoize, to make this easy. It would probably just default to a method in the existing package called normalise.

A remaining problem would be what I am currently thinking of as "partial memoisation". This is inspired by considering SQL::Abstract's select method. This call:

$sql->select('tickets', '*', { id => 3 });

returns:

("SELECT * FROM tickets WHERE ( id = ? )", 3)

There is no absolutely general way of knowing from the outside that one should normalise based on the keys of the third parameter, then include the memoised output (here, the SQL) as part of the returns, together with the remaining stuff. Here, one would:

  • split the internals of the SQL-generator out into a memoisable method
  • make a _fractionate_where method that could be used by the above, plus the public select as below, to return in suitable order array-refs with the normalisable keys, and the values
  • have the normalise method use its knowledge of the relevant object state (e.g. quote_char) plus the "fractionated" keys of \%where to produce a normalised input
  • lookup / calculate the SQL using normal memoisation as already discussed
  • have the select method use that, then return the SQL plus the "fractionated" values

"Someone should" (which means I will) make an Attribute::Memoize::Defaults (or a better name - weigh in!) that implements the above, then pull-request its use onto SQL::Abstract.

Perl community, your thoughts are welcome!

About Mohawk

user-pic I blog about Perl.