Improving the CPAN experience (a GSoC summer tale)

By mo on March 20, 2011 10:33 AM

What will MetaCPAN offer that other services don’t?

Instant availability (new uploads are indexed within a minute)
Personalisation - “follow your favourites”
Searchable metadata
Mashup of other CPAN-related services
Unified (REST) API
Back-end for Android/iPhone apps, command line tools etc.
MetaCPAN.local for companies
Includes BackPAN as well
Open-source and free

Now what?

Apply for GSoC to get this thing up and running

MetaCPAN is being developed by a group of Perl coders who have jobs and all kinds of other things on their minds, which makes it hard to build momentum. I got hooked on the idea of having an API to CPAN that everyone could use and a front-end that could eventually replace search.cpan.org, so I joined the MetaCPAN group and started coding. Since I’m still a student, GSoC is a great opportunity to delve even deeper into the guts of MetaCPAN and do some serious work.

Community feedback to complete proposal

In order to finish my GSoC application, I want to collect as much input as possible from the community. I compiled a list of features that I feel would be nice to have and would improve the CPAN experience, though not all of them may be feasible or even desirable.

My application will consist of two subprojects: improving the back-end and writing a state-of-the-art front-end. While search.metacpan.org is nice, it doesn’t offer any functionality beyond what search.cpan.org already provides. I’d like to change that and leverage the power of MetaCPAN.

Proposed Features

Personalization

Follow your favorite Modules / Authors
Get instant notifications on updates, with a diff of the Changes file
Add discussions to modules
Tag modules as installed, broken, author unresponsive etc.
Add metadata to your own distribution (e.g. “Looking for maintainer”, deprecated etc.)
“CPAN of trust”
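
To make the "diff of the Changes file" idea concrete, here is a minimal sketch in Python using the standard difflib module. The function name changes_diff, the version labels, and the sample Changes entries are made up for illustration; this is not how MetaCPAN implements notifications.

```python
import difflib


def changes_diff(old_text, new_text, old_version, new_version):
    """Return a unified diff between two versions of a Changes file.

    Hypothetical helper: the text of both files is assumed to be
    available already (for example from two indexed releases).
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"Changes ({old_version})",
        tofile=f"Changes ({new_version})",
        lineterm="",
    )
    return "\n".join(diff)


# Invented sample data: new entries are prepended, as in most Changes files.
OLD_CHANGES = "0.01  2011-01-01\n  - First release\n"
NEW_CHANGES = (
    "0.02  2011-03-20\n  - Fix a bug\n\n"
    "0.01  2011-01-01\n  - First release\n"
)

if __name__ == "__main__":
    print(changes_diff(OLD_CHANGES, NEW_CHANGES, "0.01", "0.02"))
```

A notification would then only need to carry the lines that start with "+", which are the entries added in the new release.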

Improved search results

Currently, search.cpan.org does a decent job of searching, but it can be improved. For example, it doesn’t show previews of the search results, and the relevance of the returned results is sometimes questionable.

Evaluate third-party data

The following resources can be used to adjust the scoring of search results:

cpanvote
Kwalitee
CPAN Testers
CPAN Ratings
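
One possible way to fold these signals into a search score is sketched below in Python. Everything in it is an assumption made for illustration: the signal names, the weights, the idea that each signal has already been normalised to the range 0 to 1 (for example a CPAN Testers pass rate, or a rating divided by five), and the choice to multiply the text relevance by a quality boost.

```python
# Hypothetical weights; real values would have to be tuned against real searches.
DEFAULT_WEIGHTS = {
    "cpanvote": 0.2,
    "kwalitee": 0.2,
    "cpan_testers": 0.3,
    "cpan_ratings": 0.3,
}


def adjusted_score(text_score, signals, weights=None, max_boost=1.0):
    """Boost a full-text relevance score by third-party quality signals.

    signals maps a signal name to a value in [0, 1]. Signals that are
    missing (or None) are ignored and the remaining weights are
    renormalised, so a distribution is not punished for lacking data.
    """
    if weights is None:
        weights = DEFAULT_WEIGHTS
    available = {
        name: value
        for name, value in signals.items()
        if name in weights and value is not None
    }
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return text_score  # no usable signals: leave the score unchanged
    quality = sum(
        weights[name] * min(max(value, 0.0), 1.0)
        for name, value in available.items()
    ) / total_weight
    return text_score * (1.0 + max_boost * quality)
```

With max_boost set to 1.0, a distribution with perfect signals can at most double its text score, so a poor text match cannot be pushed to the top by popularity alone.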

PageRank-like scoring

Using the dependency chain, one can build a graph of modules and calculate a PageRank-style score for each module. This should improve search results, since modules that many other modules depend on (those with a high degree of centrality) would be ranked higher.
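
As a rough sketch of the idea, the following Python function runs a plain power iteration over a dependency graph. It assumes that a dependency counts as a "link" from the dependent module to the module it depends on, and that the graph is given as a simple dictionary; the damping factor of 0.85 is the conventional PageRank default, not a value chosen for CPAN. The module names in the example are placeholders.

```python
def pagerank(depends_on, damping=0.85, iterations=50):
    """Compute a PageRank-style score for every module.

    depends_on maps a module name to the list of modules it depends on.
    Rank flows from a module to its dependencies, so modules that many
    others rely on end up with a high score.
    """
    # Collect every module that appears as a dependent or as a dependency.
    modules = set(depends_on)
    for deps in depends_on.values():
        modules.update(deps)
    n = len(modules)
    rank = {module: 1.0 / n for module in modules}

    for _ in range(iterations):
        # Modules without dependencies have no outgoing links; their rank
        # is spread evenly over all modules so the total stays at 1.
        dangling = sum(rank[m] for m in modules if not depends_on.get(m))
        new_rank = {
            m: (1.0 - damping) / n + damping * dangling / n for m in modules
        }
        for module, deps in depends_on.items():
            for dep in deps:
                new_rank[dep] += damping * rank[module] / len(deps)
        rank = new_rank
    return rank


# Placeholder graph, invented for illustration only.
EXAMPLE_GRAPH = {
    "Web::Framework": ["OO::System", "HTTP::Glue"],
    "Web::Micro": ["HTTP::Glue"],
    "OO::System": ["Util::Try"],
    "HTTP::Glue": ["Util::Try"],
    "Util::Try": [],
}

if __name__ == "__main__":
    for name, score in sorted(pagerank(EXAMPLE_GRAPH).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {name}")
```

In this toy graph the utility module that everything else depends on, directly or indirectly, comes out on top, while the two frameworks that nothing depends on share the lowest score.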

Front-end

A full-text search that previews the relevant segments of the document
Optionally limit search to a release / distribution
Search for exact matches in the module name (autocompletion)
Search for authors based on email, name and pauseid
Exclude results with certain dependencies (e.g. modules using Moose or XS code)
Keyboard navigation and shortcuts for super fast and mouse-less browsing
Integrate grep.cpan.me
Rate distributions from inside the new front-end, with no need to leave the page and log in again
and many many more features
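
The "exclude results with certain dependencies" item could work roughly like the Python sketch below. It assumes each search result already carries a list of its declared dependencies; the field names "name" and "dependencies" are hypothetical, and only direct dependencies are checked here.

```python
def exclude_by_dependency(results, banned):
    """Drop search results that depend on any of the banned modules.

    results is a list of dicts with a "dependencies" list (assumed shape).
    The original order of the remaining results is preserved.
    """
    banned = set(banned)
    return [
        result
        for result in results
        if banned.isdisjoint(result.get("dependencies", []))
    ]
```

Excluding modules that pull in a banned dependency indirectly would need the full dependency graph rather than the per-result list used here.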

MetaCPAN for companies

minicpan has made it easy for companies to take control of their local CPAN requirements, but they can search neither their local minicpan nor their own internal code.

MetaCPAN.local:

Will be a distribution that can be installed in your company network
With all the features of MetaCPAN
Add internal company modules to the index
Either index the company’s minicpan or fall back to the live CPAN
Every front-end developed for MetaCPAN will just work for MetaCPAN.local too

Documentation

Nobody is going to use the MetaCPAN back-end if there is no documentation that guides them through the basic steps of querying the metabase or setting up their own front-end.

Your Turn

I’m very excited to hear your ideas. Please don’t think too much about implementation details. Let the developer in you rest for a moment and ask yourself:

What do I need to access CPAN more easily?
What information do I want to access through MetaCPAN?
What data is required to further improve tools like cpanm?
What am I missing from search.cpan.org?
Basically, what can MetaCPAN and its front-end do for you?

3 comments

Shlomi Fish | March 20, 2011 9:15 PM

We have brainstormed some possible ways to better help in finding modules on the "Finding a Module on CPAN" page as part of the Rethinking CPAN effort.

For the record, I think that the front page of search.cpan.org should not have the categories there because they are based on the long module list, which is heavily under-maintained and often out-of-date. Maybe something better can be found like a tag cloud. I also could use a somewhat better search than search.cpan.org.

mo replied to comment from Shlomi Fish | March 20, 2011 10:31 PM

Thanks, Shlomi. Those are some useful resources!

autarch.urth.org | March 21, 2011 3:57 AM

There are lots of interesting features listed here, and I know you've read my blog post where I talk about what a search.cpan replacement should do.

I'd encourage you to think of this from a high level before you get too caught up in features.

Right now, we have two related problems. First, it can be very difficult to find a module that does what you want. Second, when you find many modules, it's hard to know which ones to use.

Most of the time, there are people in the community who already know the answer to the question of "what module does X?"

The trick, then, is to find a way to harness that knowledge. There are lots of ways to do this, including tags, web of trust, module reviews/comments, annocpan, etc.

Combine human input with automated input (test results, kwalitee, downstream dep counts) and you could have something very useful.

I'd strongly suggest focusing on enabling user input and using that data for searching and sorting, as opposed to allowing for "power searches" or anything like that.

About mo

I blog about Perl.
