Improve My Perl 6!
I was playing around with the Slope One collaborative filtering algorithm. Collaborative filtering is a way of trying to guess what users may like based on past choices. There are roughly two approaches. One is the "neighbor-to-neighbor" approach. In this approach, we try to find people who have expressed similar preferences to your own and we use preferences that they've expressed but you haven't to guess what you might like in the future. This has a few problems. One, it tends to be computationally expensive. Two, you might have identical preferences to many people but if you've not expressed any preferences or the ones expressed have no overlap, you lose.
The other main approach is "item-to-item". Here we go through every user and figure out what they've expressed preferences for and correlate all of their preferences. Thus, instead of "people similar to you like Buck Rogers and thus you'll like Buck Rogers too", we have "people who like brain-dead sci-fi like Buck Rogers and thus since you like brain-dead sci-fi ...". "Item-to-item" is less computationally intensive and can also do a better job of protecting user privacy since the identity of users is not relevant. The "Slope One" algorithm is a very simple but powerful example of this concept and I came across an implementation in Python. Naturally I thought "I want to see that in Perl 6". There's nothing really amazing about my Perl 6 code and I fear I'll be writing "baby Perl 6" for a while, so if you see any "natural" improvements, I'd love to hear about them.
class SlopeOne {
has %!diffs;
has %!freqs;
method update (%userdata) {
for %userdata.kv -> $user, $ratings {
for $ratings.kv -> $item1, $rating1 {
%!freqs{$item1} //= {};
%!diffs{$item1} //= {};
for $ratings.kv -> $item2, $rating2 {
%!freqs{$item1}{$item2} //= 0;
%!diffs{$item1}{$item2} //= 0;
%!freqs{$item1}{$item2} += 1;
%!diffs{$item1}{$item2} += ( $rating1 - $rating2 );
}
}
}
for %!diffs.kv -> $item1, $ratings {
for $ratings.keys -> $item2 {
$ratings{$item2} /= %!freqs{$item1}{$item2};
}
}
}
method predict(%userprefs) {
my ( %preds, %freqs );
for %userprefs.kv -> $item, $rating {
for %!diffs.kv -> $diffitem, $diffratings {
my $freq = %!freqs{$diffitem}{$item};
%preds{$diffitem} //= 0;
%freqs{$diffitem} //= 0;
%preds{$diffitem} += $freq * ( $diffratings{$item} + $rating );
%freqs{$diffitem} += $freq;
}
}
return gather for %preds.keys {
if ! %userprefs{$_} && %freqs{$_} {
take $_ => ( %preds{$_} / %freqs{$_} );
}
}
}
}
my SlopeOne $s .= new;
my %userdata = alice => {
squid => 1.0,
cuttlefish => 0.5,
octopus => 0.2
},
bob => {
squid => 1.0,
octopus => 0.5,
nautilus => 0.2
},
carole => {
squid => 0.2,
octopus => 1.0,
cuttlefish => 0.4,
nautilus => 0.4
},
dave => {
cuttlefish => 0.9,
octopus => 0.4,
nautilus => 0.5
};
$s.update(%userdata);
say $s.predict( { squid => 0.4 } ).perl;
That prints out something almost identical to the Python version:
["cuttlefish" => 0.25, "nautilus" => 0.1, "octopus" => 0.233333333333333]
This post cheerfully written from Calais, France, where I'm spending Christmas with my fiancée and her mother.
One tiny change I made was to have the update return this:
And that allows me to do this:
say $s.predict( { squid => 0.4 } ).fmt("Preference for %s is %0.2f");
But I couldn't figure out how to do that without the intermediate %predictions variable. I'm unsure if this is a limitation in Perl 6 or myself :)
You can force coercion to a hash with
But I'm not sure if you like that bettter.
Also a perfect Perl 6 implementation wouldn't need
%!freqs{$item1}{$item2} //= 0;
and similar lines. I'm pretty sure you can leave it out with current Rakudo if you use prefix ++:@moritz: I like that I can force coercion to a hash, but I had assumed (incorrectly?) that I could return a hashref by wrapping the gather/take in "{}". Is this invalid?
Unfortunately Rakudo does not yet support enough Perl 6 to actually write this idiomatically. A real Perl 6 version would declare the fields like this:
And the first loop in
update
would look something like this:Nice, huh?
@Aristotle: Thank you, and yes, that is nice. Looking forward to Perl 6 handling that.
Woops, that’s broken. I confused myself while trying to follow what’s going on – all those similarly-named variables bound in different levels of immediately nested loops were kinda dizzying to follow. The outer loop is necessary:
Also, masak says indexing hashes on pairs may not work the way I thought. But my reading of the spec seems to indicate that declaring the hashes as something like
my %!freqs{Str, Str}
is supposed to work. The changes to the loop code would be minimal – I think the hash access would then be written%!freqs{$item1; $item2}
. That just so happens to be shorter and less noisy too.