CPAN module recommendation system

A little confession and backstory: I love CPAN surfing. You know, watching the latest releases, browsing module dependencies and other modules by the same authors... And favoriting the interesting stuff. Stats suggest that I'm not the only one. If so, wouldn't it be nice to provide crowdsourced recommendations for CPAN modules? Think: "People who favorite Mojolicious often favorite: AnyEvent, Data::Printer, Devel::NYTProf, Dist::Zilla...". Plus, given a user's favorites, own releases, and the dependencies of those releases, a custom-tailored module suggestion list could be built for any PAUSE ID registered on MetaCPAN. Enter the CPAN::U experiment.

It uses Amazon's item-to-item collaborative filtering algorithm: for each CPAN module, it finds the modules whose "liker" vectors (which users favorited them) have the greatest cosine similarity to its own.
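To make the idea concrete, here is a minimal sketch of item-to-item similarity over binary "favorited" vectors. It is not the actual CPAN::U code: the function name `similar_modules`, the sample PAUSE IDs, and the sample favorites are all made up for illustration. With binary vectors, the cosine similarity of two modules reduces to the number of shared likers divided by the square root of the product of their liker counts.

```python
from collections import defaultdict
from math import sqrt


def similar_modules(favorites, target, top_n=5):
    """Rank modules by cosine similarity of their liker sets to `target`.

    `favorites` maps a user ID to the set of modules that user favorited
    (a hypothetical input format, assumed for this sketch).
    """
    # Invert the mapping: module -> set of users who favorited it.
    likers = defaultdict(set)
    for user, modules in favorites.items():
        for module in modules:
            likers[module].add(user)

    target_likers = likers.get(target, set())
    if not target_likers:
        return []

    scores = []
    for module, users in likers.items():
        if module == target:
            continue
        common = len(target_likers & users)
        if common == 0:
            continue
        # Cosine similarity of two binary vectors.
        score = common / sqrt(len(target_likers) * len(users))
        scores.append((module, score))

    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Made-up sample data, not real MetaCPAN favorites.
favorites = {
    "ALICE": {"Mojolicious", "AnyEvent", "Data::Printer"},
    "BOB": {"Mojolicious", "AnyEvent"},
    "CAROL": {"Mojolicious", "Dist::Zilla"},
    "DAVE": {"Dist::Zilla", "Moose"},
}

for module, score in similar_modules(favorites, "Mojolicious"):
    print(f"{module}: {score:.3f}")
```

In this toy data set, AnyEvent ranks first for Mojolicious because two of Mojolicious's three likers also favorited it, while Moose does not appear at all since it shares no likers with the target.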

tl;dr: a poor man's Cinematch.

Not a big deal, but it addresses, at least partially, an issue raised in the recent Categorizing CPAN modules post (which actually inspired me to wrap up a public release of some quick & dirty code written the day before that post was published). And it is fun to explore, after all!

The next logical step is to tweak my fork of metacpan-web to incorporate the recommendation API. But first, the API needs to be tested. That's the main purpose of the CPAN::U experiment: to be a crash test dummy for a later "return to the source". And this is why I'm kindly asking for your help. There are still too many unanswered questions (you can reply directly on the project's landing page):

  • Query your PAUSE ID and/or a few not-so-ubiquitous modules you know. Rate the results, from 0 for "complete nonsense" to 5 for "the module I was long searching for".
  • Is it slow? Does it crash? Does it work at all in your browser? (I suck at frontend, sorry 'bout the Bootstrap thingie)
  • Are you aware of any collaborative filtering algorithms more appropriate for this task?
  • Could it be implemented as part of the MetaCPAN API at all? (Currently, a Perl fetch/process script populates a CouchDB database, which is queried via Ajax.)

As always, pull requests are welcome!

4 Comments

This looks really good. It has already pointed me towards App::p as a module I'd like to try out. :) I think there's stuff in here which would be great to incorporate into MetaCPAN at some point. It works really well as a stand-alone concept, too, so maybe it would be a matter of incorporating some of the ideas into metacpan-web and also linking back to the CPAN::U from metacpan-web. I look forward to seeing how this progresses. I'll have a look at your MetaCPAN fork as well.

This looks very nice. I like the discussions already attached to the modules, although that should probably be merged with http://cpanforum.com

Another type of recommendation system that I've been pondering for the longest time is a way to say "instead of module X, use module Y", which might help with the typical problem of finding which modules in a given problem space are recommended, maintained, or considered best practice. I should make implementing a prototype of that one of my holiday projects. :-)

Hi stas

It recommended 'perl' for my CPANID, so it can't be all bad ;-).

Actually, if you eliminate Dancer, Mojolicious, and Catalyst::Runtime (all roughly equivalent) from the list, I use all the other modules recommended, which is a good sign indeed.

If you want to make it really clever, scan the Build.PLs, or Makefile.PLs, of each of the CPANID owner's distros, and don't bother recommending modules they already use.

Which of course leads to the other suggestion above, recommend X to replace Y when they already use Y.

But that suffers from the same hopefully-small problem of the whole concept. It's all based on personal preference, rather than some rigorous and consistent evaluation of the modules. But I see no way around that - it seems that's the best we can do...

Just don't extend the idea to wine - we all have our own taste buds!

Cheers
Ron


About stas

Just another lazy, impatient and arrogant IT guy.