An Overview of MetaCPAN

This week a small group of dedicated Perl developers are gathering in Chicago for meta::hack, the first MetaCPAN hackathon. The primary goal is to complete the transition to Elasticsearch v2, a major undertaking that was started more than a year ago.

Because all the participants are volunteers, this was only possible with sponsorship. Over the next few days we'll be sharing information about MetaCPAN and the work going on, and acknowledging some of the key sponsors.

This post is brought to you by FastMail, a gold sponsor for meta::hack. FastMail is a stalwart supporter of the Perl community; they also sponsored the QA Hackathon this year.

MetaCPAN's beginnings

Like so many good things, MetaCPAN arose out of a conversation in a pub, in October of 2010. Mark Jubenville and Olaf Alders had been working on an iPhone application for browsing CPAN documentation. At some point they realised that the underlying information could be provided by a web service, and such a service might be useful for other people as well. A group of local mongers provided $20 bills to cover cloud hosting and they were off!

One technology suggested on that first night was Elasticsearch, and it looked to be a good fit. Fired up, Olaf spent evenings and weekends for the next six weeks implementing a CPAN web service built on Elasticsearch. The MetaCPAN.org domain name was registered in November of 2010 and the API was made public. The project began to snowball, picking up users and enthusiastic contributors, and has slowly grown into a resource on which many Perl programmers lean quite heavily.

The Goal

MetaCPAN’s primary purpose is to provide a free web service (API) for querying meta-information about CPAN releases, distributions, and modules.

Our secondary purpose is to provide a web interface for end users to search and browse the same information.

Components of MetaCPAN

There are two main services provided by the MetaCPAN project: the search interface (metacpan.org) and the API (api.metacpan.org).

Although the need for the API is what started this all off, for many users MetaCPAN is the search interface. For most Perl programmers, the search interface is the Google of CPAN: it provides a simple search box, into which you can type module names, a description of what you want, or concepts. The search interface uses the API to query not only the metadata of CPAN distributions, but also the full text of all modules' documentation.

For many of us, MetaCPAN is also the way we read documentation for modules, even though we have the documentation installed locally. This is mainly due to the good formatting, but is also made possible by the simple URL structure used.
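As a rough illustration of that URL structure, here is a minimal Python sketch. It assumes documentation pages live at https://metacpan.org/pod/ followed by the module name; the helper name pod_url is made up for this example and is not part of any MetaCPAN tooling.

```python
from urllib.parse import quote

# Assumed base for documentation pages; check it against the live site.
POD_BASE = "https://metacpan.org/pod/"


def pod_url(module_name: str) -> str:
    """Return the (assumed) documentation URL for a module name."""
    # Keep the "::" separators readable rather than percent-encoding them.
    return POD_BASE + quote(module_name, safe=":")


if __name__ == "__main__":
    print(pod_url("Test::DependentModules"))
```

If the pattern holds, knowing a module's name is enough to reach its documentation without searching first.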

The API lets you run queries against the database of information about CPAN releases, distributions, modules, authors, and more. This information is aggregated from a range of sources, including PAUSE's indexes, CPAN Testers, CPANTS, and CPAN ratings. The core information is updated frequently, making the API a key resource for people developing additional CPAN tools. I regularly write tools where I want to process "all modules currently on CPAN", and the MetaCPAN API makes this kind of query easy.
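To give a flavour of what a simple lookup might look like, here is a minimal Python sketch that builds (but does not send) a request for one module's metadata. The base URL and the /module/<name> path are assumptions (this post only names api.metacpan.org), and module_request is a hypothetical helper, so check the API documentation before relying on any of it.

```python
import urllib.request
from urllib.parse import quote

# Assumed API root; adjust to whatever the API documentation specifies.
API_BASE = "https://fastapi.metacpan.org/v1"


def module_request(module_name: str, base: str = API_BASE) -> urllib.request.Request:
    """Build a GET request for one module's metadata (assumed /module/<name> path)."""
    url = f"{base}/module/{quote(module_name, safe=':')}"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


if __name__ == "__main__":
    req = module_request("Moose")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping the base URL as a parameter means the same helper can be pointed at a different host or API version without changes.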

Why Elasticsearch?

At its root, MetaCPAN is the result of a hobby project which was meant to be a proof of concept for a web service. Elasticsearch was used for the project not only because it allows for powerful and arbitrarily complex searches (against both metadata and full text) but also because it provides a RESTish API for free. This allowed MetaCPAN to get up and running incredibly quickly, while immediately giving users the ability to run complex queries themselves.

If the project had required a REST API to be designed from the ground up, on top of implementing the powerful kinds of search that Elasticsearch provides, it would never have gotten off the ground. Elasticsearch “just worked” right out of the box, which allowed the problem at hand to be solved without having to design all of the elements of an API first.
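To make "complex queries" a little more concrete, here is a sketch of the kind of Elasticsearch query body a client could build and send. The bool/must/term structure is standard Elasticsearch query DSL; the field names (author, status, distribution, version) and the function name are assumptions about the release index for illustration only.

```python
import json


def latest_releases_query(author: str, size: int = 10) -> dict:
    """Build an Elasticsearch bool query: releases by one author, latest only.

    The field names ("author", "status", "distribution", "version") are
    assumptions about the release index, not taken from the post.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"author": author}},
                    {"term": {"status": "latest"}},
                ]
            }
        },
        "_source": ["distribution", "version"],
        "size": size,
    }


if __name__ == "__main__":
    # The JSON below would be POSTed to a search endpoint (for example a
    # hypothetical .../release/_search); nothing is sent here.
    print(json.dumps(latest_releases_query("RJBS"), indent=2))
```

Because the query is plain JSON, a client can combine further conditions in the must list without the service needing a purpose-built endpoint for each combination.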

MetaCPAN and the CPAN ecosystem

Because the MetaCPAN API aggregates information about CPAN distributions from a number of sources, it is steadily becoming the one-stop-shop for CPAN tool builders. The following are just some of the services and tools that make use of the API:

  • cpanminus - one of the most widely-used CPAN clients.
  • rt.cpan.org - the free bug tracking service (written in Perl) that provides bug queues for all CPAN distributions.
  • alfred-metacpan - an Alfred workflow for finding module documentation.
  • Perlmodules.net - a web site that lets you track releases of your favourite distributions.
  • DuckDuckGo MetaCPAN Instant Answer
  • rpmcpan - a Modern Perl rpm Packager.
  • matrix.cpantesters.org - provides a tabular view that lets you see how cleanly a distribution tests across different operating systems and versions of Perl. A valuable resource for both CPAN authors and end-users of CPAN modules.
  • mapofcpan.org - a funky interactive visualisation of CPAN namespaces and the modules in them.
  • OrePAN2 - lets you build your own mini CPAN, for managing releases of the modules you use.
  • Test::DependentModules - a module and test script that let you test all distributions on CPAN that rely on your distribution. Invaluable to check that changes to your module(s) aren't going to break anything else on CPAN.
  • Git::CPAN::Patch - allows you to patch CPAN modules using Git.

Hopefully this illustrates how the MetaCPAN API has become a cornerstone of the CPAN ecosystem, and why we've organised the hackathon this week.

MetaCPAN also provides the ++ system (like Facebook's "like", but for CPAN distributions), allowing users to upvote modules which are in line with current best practices. It also allows authors to list numerous contact methods on their profiles, making it easier to locate CPAN authors when you need to find them in a hurry. Because it is open source, user contributions are welcomed and encouraged.

About FastMail

FastMail is a commercial hosted email service founded in 1999, which has established a reputation for technical leadership in the hosted email space, with a focus on security, privacy, and reliability. It is run by FastMail Pty Ltd, an Australian company based in Melbourne. From their early days they've been users and supporters of Perl, and several of their developers are CPAN authors: BRONG, ROBN, and ROBM (one of the founders). In late 2015 they acquired pobox.com, another hosted mail company and longtime user and supporter of Perl. Pobox's tech team includes RJBS and WOLFSAGE.

About Neil Bowers

Perl hacker since 1992.