Improve My Perl 6!

I was playing around with the Slope One collaborative filtering algorithm. Collaborative filtering is a way of trying to guess what users may like based on past choices. There are roughly two approaches. One is the "neighbor-to-neighbor" approach. In this approach, we try to find people who have expressed similar preferences to your own and we use preferences that they've expressed but you haven't to guess what you might like in the future. This has a few problems. One, it tends to be computationally expensive. Two, you might have identical preferences to many people but if you've not expressed any preferences or the ones expressed have no overlap, you lose.

The other main approach is "item-to-item". Here we go through every user and figure out what they've expressed preferences for and correlate all of their preferences. Thus, instead of "people similar to you like Buck Rogers and thus you'll like Buck Rogers too", we have "people who like brain-dead sci-fi like Buck Rogers and thus since you like brain-dead sci-fi ...". "Item-to-item" is less computationally intensive and can also do a better job of protecting user privacy since the identity of users is not relevant. The "Slope One" algorithm is a very simple but powerful example of this concept and I came across an implementation in Python. Naturally I thought "I want to see that in Perl 6". There's nothing really amazing about my Perl 6 code and I fear I'll be writing "baby Perl 6" for a while, so if you see any "natural" improvements, I'd love to hear about them.

class SlopeOne { has %!diffs; has %!freqs; method update (%userdata) { for %userdata.kv -> $user, $ratings { for $ratings.kv -> $item1, $rating1 { %!freqs{$item1} //= {}; %!diffs{$item1} //= {}; for $ratings.kv -> $item2, $rating2 { %!freqs{$item1}{$item2} //= 0; %!diffs{$item1}{$item2} //= 0; %!freqs{$item1}{$item2} += 1; %!diffs{$item1}{$item2} += ( $rating1 - $rating2 ); } } } for %!diffs.kv -> $item1, $ratings { for $ratings.keys -> $item2 { $ratings{$item2} /= %!freqs{$item1}{$item2}; } } } method predict(%userprefs) { my ( %preds, %freqs ); for %userprefs.kv -> $item, $rating { for %!diffs.kv -> $diffitem, $diffratings { my $freq = %!freqs{$diffitem}{$item}; %preds{$diffitem} //= 0; %freqs{$diffitem} //= 0; %preds{$diffitem} += $freq * ( $diffratings{$item} + $rating ); %freqs{$diffitem} += $freq; } } return gather for %preds.keys { if ! %userprefs{$_} && %freqs{$_} { take $_ => ( %preds{$_} / %freqs{$_} ); } } } } my SlopeOne $s .= new; my %userdata = alice => { squid => 1.0, cuttlefish => 0.5, octopus => 0.2 }, bob => { squid => 1.0, octopus => 0.5, nautilus => 0.2 }, carole => { squid => 0.2, octopus => 1.0, cuttlefish => 0.4, nautilus => 0.4 }, dave => { cuttlefish => 0.9, octopus => 0.4, nautilus => 0.5 }; $s.update(%userdata); say $s.predict( { squid => 0.4 } ).perl;

That prints out something almost identical to the Python version:

["cuttlefish" => 0.25, "nautilus" => 0.1, "octopus" => 0.233333333333333]

This post cheerfully written from Calais, France, where I'm spending Christmas with my fiancée and her mother.

6 Comments

You can force coercion to a hash with

return %( gather for %preds.kv ...)

But I'm not sure if you like that bettter.

Also a perfect Perl 6 implementation wouldn't need %!freqs{$item1}{$item2} //= 0; and similar lines. I'm pretty sure you can leave it out with current Rakudo if you use prefix ++:

++%!freqs{$item1}{$item2}

Unfortunately Rakudo does not yet support enough Perl 6 to actually write this idiomatically. A real Perl 6 version would declare the fields like this:

has %!diffs{Pair};
has %!freqs{Pair};

And the first loop in update would look something like this:

my @rating = %userdata.values».pairs;
for @rating X @rating -> ( $item1, $rating1 ), ( $item2, $rating2 ) {
    %!freqs{$item1 => $item2} ++;
    %!diffs{$item1 => $item2} += ( $rating1 - $rating2 );
}

Nice, huh?

Woops, that’s broken. I confused myself while trying to follow what’s going on – all those similarly-named variables bound in different levels of immediately nested loops were kinda dizzying to follow. The outer loop is necessary:

for %userdata.values -> %user_ratings {
    my @rating = %user_ratings.pairs;
    for @rating X @rating -> ( $item1, $rating1 ), ( $item2, $rating2 ) {
        %!freqs{$item1 => $item2}++;
        %!diffs{$item1 => $item2} += ( $rating1 - $rating2 );
    }
}

Also, masak says indexing hashes on pairs may not work the way I thought. But my reading of the spec seems to indicate that declaring the hashes as something like my %!freqs{Str, Str} is supposed to work. The changes to the loop code would be minimal – I think the hash access would then be written %!freqs{$item1; $item2}. That just so happens to be shorter and less noisy too.

Leave a comment

About Ovid

user-pic Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.