search.metacpan.org: Building a Sexier CPAN Search

As we began working on iCPAN, we became aware of how problematic it can be to figure out exactly what is in the CPAN. More importantly, we became aware of things we really wanted to do when interacting with CPAN. Last month, Dave Rolsky posted some comments on a next generation CPAN search. He has a fairly extensive list of things which a CPAN search could offer, and I'm more than inclined to think that he's on the right track.

A CPAN search site should:


  • be available to the wider community to clone, fork, patch, pull etc

  • let you upvote/downvote modules

  • have tighter integration with reviews, dependency reports etc

  • allow for complex searches

There are really so many things it could be. The problem, as I see it, is that this is too big a job for just one person and that may be where projects have stalled in the past.

There are many valuable, individual efforts out there for improving CPAN searching, but there is no one service which has the scope of search.cpan.org. I'm personally not here to tell you that I'm building a better search.cpan site. What I am saying is that I think it's about more than building a better search. I think the CPAN needs an extensive web service. The web service I envision has info on modules, distributions, authors and ratings. It's RESTful, but it allows for complex queries. It's distributed and the source is open. It could be expanded to include information from CPANTS and GitHub issues. Author info would link directly to GitHub accounts, blogs and Twitter feeds. It can be expanded to do much more than was originally intended.


Introducing the CPAN-API

This service does exist, to an extent. Toronto.pm got together at our October meeting and decided to put something together. What we've come up with is the CPAN-API project, an expanding web service for CPAN information.


Introducing search.metacpan.org

As we were building this web service, we came to realize that it wouldn't be useful without (at the very least) a proof of concept search site. ioncache took it upon himself to write such a site, but he gave himself a special challenge. His search site would be written entirely in JavaScript. I'm actually quite surprised at how far he has gotten over the last 7 or so weeks. His work can be found at search.metacpan.org.

search.metacpan.org is pure JavaScript. It queries our web service, which is built on ElasticSearch. We're using Clinton Gormley's ElasticSearch CPAN module to index the data which is hosted in our Rackspace Cloud instance.
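
To give a rough idea of what a client request to an ElasticSearch-backed service can look like, here is a minimal sketch in Python. It only builds the request and does not send it. The base URL, the index name "cpan" and the type names are all assumptions made for illustration; they are not taken from the CPAN-API project. The only parts that come from ElasticSearch itself are the /<index>/<type>/_search path convention and the query_string query.

```python
import json
from urllib.parse import quote

# Hypothetical base URL: the real service address is not given in this post.
BASE_URL = "http://api.example.org"


def build_search(doc_type, term, size=10):
    """Return (url, body) for an ElasticSearch-style search request.

    The index name "cpan" and the doc_type values are assumptions.
    """
    # ElasticSearch exposes search at /<index>/<type>/_search.
    url = f"{BASE_URL}/cpan/{quote(doc_type)}/_search"
    # query_string lets the caller pass a free-text query expression.
    body = {
        "query": {"query_string": {"query": term}},
        "size": size,
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # The type names here ("module", "dist", "author") are hypothetical.
    for doc_type, term in [("module", "mojolicious"),
                           ("dist", "dancer"),
                           ("author", "FREW")]:
        url, body = build_search(doc_type, term)
        print("POST", url)
        print(body)
```

Because the request is just a URL plus a JSON body, any HTTP client can issue it, which is part of what makes a pure JavaScript front end feasible.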

So far, it's just beta and there are lots of tweaks and feature additions taking place. If you have a moment, take a look at the following searches:

Module search: mojolicious
Distribution search: dancer
Author search: FREW

So, this should give you an idea of what we're up to. Our hope is that this doesn't remain the project of a few folks in Toronto, but that there will be lots of clones, forks and pull requests to really flesh this out. All ideas and comments are welcome. Please get in touch with us with any feedback you may have.

15 Comments

When do you expect to actually complete the project? I'd like to see this implemented and tested over a wider range of sites so we can work out any bugs or issues on the code side.

Bonnie Smith
COO/Director FXP
http://www.forexpulse.com

The CPAN API looks promising. As to upvoting/downvoting modules, where do you plan to store the data?

Yes, and one idea for such a site that I had been thinking about a while back might be some kind of social networking site where we can "befriend" (or "like", or "follow", and why not also "hate" or "avoid") modules that we like, and later we can draw "social graphs" of these modules.

You're fscking kiddin' me!

This is beautiful. I wish I could contribute to it (and perhaps, with time) and I hope you'll be successful in this attempt.

Great job!!

This is awesome!

I'm particularly cheering for the upvote/downvote functionality (I've soliloquized about some very preliminary draft work toward a service that would do that at http://babyl.dyndns.org/techblog/entry/cpanvote-a-perl-mini-project eons ago), but the whole thing is very, very exciting.

And I didn't know Toronto.pm was that active. Next time I'm around, I'll have to try and drop in at a meeting, just for giggles. :-)

> If I can convince you to contribute some code and/or get involved in mapping out how some of this stuff should work, that would be great. :)

Twist my arm. :-) I've joined the mailing list and will see what I can contribute to the party.

> Toronto.pm generally meets on the last Thursday of each month.

I'm not very often in the GTA, but we do hop there from time to time for mini-vacations in the Big City(tm). Next time we do that, I'll make sure it's on the last week of a month. :-)

Great work.

But why ElasticSearch? Can I help you get it working with a native Perl search, like KinoSearch?

There's Search::OpenSearch::Engine::KSx with which you can use Search::OpenSearch::Server.

There's no REST API yet, but it's been on my todo list and if you were interested in collaborating on the design, it could rise to the top sooner.

I have nothing against ElasticSearch, or Lucene in general. It just seems like a top-to-bottom Perl solution would be somehow philosophically fitting for a CPAN-focused project.

You can find me on #lucy_dev on freenode if you are interested in pursuing this idea.

Makes total sense. Keep up the good work!


About Olaf Alders

I hack on MetaCPAN, CPAN modules and other fun stuff.