Improving the CPAN experience (a GSoC summer tale)

What will MetaCPAN offer that other services don’t?

  • Instant availability (new uploads are indexed within a minute)
  • Personalisation - “follow your favourites”
  • Searchable metadata
  • Mashup of other CPAN related services
  • Unified (REST) API
  • Back-end for Android/iPhone apps, command line tools etc.
  • MetaCPAN.local for companies
  • Includes BackPAN as well
  • Open-source and free

Now what?

Apply for GSoC to get this thing up and running

MetaCPAN is being developed by a group of Perl coders who have jobs and all kinds of other things on their minds, which makes it hard to build momentum. I was quickly infected by the idea of having an API to CPAN that everyone could use and a front-end that could eventually replace search.cpan.org. So I joined the MetaCPAN group and started coding. And since I’m still a student, GSoC is a great opportunity to delve even deeper into the guts of MetaCPAN and do some serious work.

Community feedback to complete proposal

To finish my GSoC application I want to collect as much input as possible from the community. I have compiled a list of features that I feel would be nice to have and would improve the CPAN experience, though not all of them may be feasible or even desirable.

My application will consist of two subprojects: improving the back-end and writing a state-of-the-art front-end. While search.metacpan.org is nice, it doesn’t offer any functionality beyond what search.cpan.org already provides. I’d like to change that and leverage the power of MetaCPAN.

Proposed Features

Personalization

  • Follow your favorite Modules / Authors
  • Get instant notifications on updates, with a diff of the Changes file
  • Add discussions to modules
  • Tag modules as installed, broken, author unresponsive etc.
  • Add metadata to your own distribution (e.g. “Looking for maintainer”, deprecated etc.)
  • “CPAN of trust”

Improved search results

Currently search.cpan.org does a decent job of searching. However, it can be improved. For example, it doesn’t show previews of the search results, and the relevance of the returned results is sometimes questionable.

Evaluate third-party data

The following resources can be used to adjust the scoring of search results:

  • cpanvote
  • Kwalitee
  • CPAN Testers
  • CPAN Ratings
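
To make the idea concrete, here is a minimal Python sketch of how such signals might be blended into a search score. Everything in it is an assumption for illustration: the field names, the equal weights, the `boost` factor, and the premise that each signal has already been normalised to the range 0..1. It does not describe how MetaCPAN or any of the services above actually expose or weight their data.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Third-party quality signals for one distribution.

    Hypothetical structure: each value is assumed to be normalised to 0..1.
    """
    votes: float            # e.g. derived from cpanvote
    kwalitee: float         # e.g. derived from Kwalitee
    tests_pass_rate: float  # e.g. derived from CPAN Testers
    rating: float           # e.g. derived from CPAN Ratings


# Illustrative weights only; they sum to 1 so the blended quality stays in 0..1.
WEIGHTS = {
    "votes": 0.25,
    "kwalitee": 0.25,
    "tests_pass_rate": 0.25,
    "rating": 0.25,
}


def adjusted_score(text_relevance: float, signals: Signals, boost: float = 0.5) -> float:
    """Scale the full-text relevance by a bounded quality bonus."""
    quality = sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())
    return text_relevance * (1.0 + boost * quality)


if __name__ == "__main__":
    well_kept = Signals(votes=0.9, kwalitee=0.8, tests_pass_rate=0.95, rating=0.85)
    neglected = Signals(votes=0.1, kwalitee=0.4, tests_pass_rate=0.5, rating=0.2)
    # Same text relevance, different quality signals.
    print(adjusted_score(2.0, well_kept))
    print(adjusted_score(2.0, neglected))
```

Multiplying rather than adding keeps the text match in charge: a distribution that does not match the query at all stays at zero, no matter how good its quality signals are.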

PageRank-like scoring

Using the dependency chain, one can build a graph of modules and calculate a PageRank-style score for each one. This should improve search results, because modules that many other modules depend on (those with a high degree of centrality) will be ranked higher.
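
A minimal sketch of that calculation in Python, using plain power iteration. The damping factor of 0.85, the fixed iteration count, and the tiny dependency graph in the usage example are assumptions chosen for illustration; a real index would read the dependencies from each distribution's metadata.

```python
def pagerank(deps, damping=0.85, iterations=50):
    """Compute a PageRank-style score for every module.

    deps maps a module name to the list of modules it depends on.
    A dependency counts as a "link" from the dependent module to its
    prerequisite, so widely used modules collect the most rank.
    """
    modules = set(deps)
    for targets in deps.values():
        modules.update(targets)
    modules = sorted(modules)
    n = len(modules)
    if n == 0:
        return {}

    rank = {m: 1.0 / n for m in modules}
    for _ in range(iterations):
        # Modules without dependencies have no outgoing links; their rank
        # is redistributed evenly so the total always stays at 1.
        dangling = sum(rank[m] for m in modules if not deps.get(m))
        new_rank = {m: (1.0 - damping) / n + damping * dangling / n for m in modules}
        for m in modules:
            targets = deps.get(m, [])
            for t in targets:
                new_rank[t] += damping * rank[m] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical dependency graph, not real CPAN metadata.
    deps = {
        "Catalyst": ["Moose", "Plack"],
        "Dancer": ["Plack"],
        "Moose": ["Try::Tiny"],
        "Plack": ["Try::Tiny"],
        "Try::Tiny": [],
    }
    for name, score in sorted(pagerank(deps).items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {score:.3f}")
```

In this toy graph the module at the bottom of the dependency chain ends up with the highest score, which is the behaviour described above. The resulting score could then be used as one more factor when ordering search results.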

Front-end

  • A full-text search that previews the relevant segments of the document
  • Optionally limit search to a release / distribution
  • Search for exact matches in the module name (autocompletion)
  • Search for authors based on email, name and pauseid
  • Exclude results with certain dependencies (e.g. modules using Moose or XS code)
  • Keyboard navigation and shortcuts for super fast and mouse-less browsing
  • Integrate grep.cpan.me
  • Rate distributions from inside the new front-end, with no need to leave the page and log in again
  • and many many more features

MetaCPAN for companies

minicpan has made it easy for companies to take control of their local CPAN requirements, but they can search neither their local minicpan nor their own internal code.

MetaCPAN.local:

  • Will be a distribution that can be installed in your company network
  • With all the features of MetaCPAN
  • Add internal company modules to the index
  • Either index the company’s minicpan or fall back to the live CPAN
  • Every front-end developed for MetaCPAN will just work for MetaCPAN.local too

Documentation

Nobody is going to use the MetaCPAN back-end if there is no documentation that guides you through the basic steps of querying the metabase or setting up your own front-end.

Your Turn

I’m very excited to hear your ideas. Please don’t think too much about implementation details. Let the developer in you rest for a moment and ask yourself:

  • What do I need to access CPAN more easily?
  • What information do I want to access through MetaCPAN?
  • What data is required to further improve tools like cpanm?
  • What am I missing from search.cpan.org?
  • Basically, what can MetaCPAN and its front-end do for you?

3 Comments

We have brainstormed some possible ways to better help in finding modules on the "Finding a Module on CPAN" page as part of the Rethinking CPAN effort.

For the record, I think that the front page of search.cpan.org should not have the categories there because they are based on the long module list, which is heavily under-maintained and often out-of-date. Maybe something better can be found like a tag cloud. I also could use a somewhat better search than search.cpan.org.

There are lots of interesting features listed here, and I know you've read my blog post where I talk about what a search.cpan replacement should do.

I'd encourage you to think of this from a high level before you get too caught up in features.

Right now, we have two related problems. First, it can be very difficult to find a module that does what you want. Second, when you find many modules, it's hard to know which ones to use.

Most of the time, there are people in the community who already know the answer to the question of "what module does X?"

The trick, then, is to find a way to harness that knowledge. There are lots of ways to do this, including tags, web of trust, module reviews/comments, annocpan, etc.

Combine human input with automated input (test results, kwalitee, downstream dep counts) and you could have something very useful.

I'd strongly suggest focusing on enabling user input and using that data for searching and sorting, as opposed to allowing for "power searches" or anything like that.
