The Perl Toolchain: PAUSE and CPAN

This is the first in a series of blog posts about the Perl toolchain and the collection of tools and modules around it that are central to the CPAN we have today. These posts will illustrate the scope of things worked on at the QA Hackathon. We'll start with the core lifecycle of CPAN modules, focusing on PAUSE and CPAN.

This post is brought to you by FastMail, a gold sponsor for this year's QA Hackathon (QAH). It is only with the support of companies like FastMail that we're able to bring together the lead developers of these tools at the QAH.

Introduction

CPAN is a collection of more than 33 thousand distributions, containing more than 163 thousand modules.

In many languages the equivalent of CPAN is a single central repository: you upload releases to it, and other people go to that site to look for modules / packages. Having found one, they download it from the same site.

CPAN isn't like that. There isn't a single CPAN site; it's more akin to a content distribution network (CDN). There are 230+ CPAN mirrors, each of which has a full copy of 'CPAN'. When you download something "from CPAN", you're talking to a CPAN mirror, and most of the time you don't need to know which one. Very often, the things people think of as part of CPAN are actually systems that build on the core PAUSE and CPAN infrastructure.

In this post we'll look at the basics of PAUSE and CPAN, and then in later posts we'll learn about the other pieces of the ecosystem that build on and around the basics.

Modules, distributions, and releases

The element of re-use in Perl is the module. Most of the time a module contains a single class, or a collection of functions. One or more related modules are bundled together in a distribution. A release is a versioned instance of a distribution. Releases are what go on CPAN.

You can learn more about core CPAN terminology in this CPAN Glossary, and more about packages and modules here. The perlmod documentation gives a lot more detail on how modules work.

Here's a simple view of the lifecycle of a CPAN release:

Releases are uploaded to PAUSE, which is the way things get onto CPAN. People search for modules on CPAN, and if they find something of interest, then they download and install it.

Releasing a release

To get your release on CPAN you have to use PAUSE, the Perl Authors Upload Server:

Anyone can register to get a PAUSE account. PAUSE usernames are alphanumeric, and by convention given in uppercase. For example, Rob Mueller is one of the founders of FastMail, and his PAUSE id is ROBM. Once you've got an account, you can configure various things, but the main thing you can do is upload files.

The following shows how PAUSE and CPAN are related. Rob has created a module called Mail::IMAPTalk, which he releases in the Mail-IMAPTalk distribution. Here's what (probably) happened when ROBM released version 4.03 of the Mail-IMAPTalk distribution:

Rob might have uploaded his release via the PAUSE website directly, but he's more likely to have used the cpan-upload script. As long as PAUSE hasn't seen that filename before, the release tarball is added to ROBM's author directory on the CPAN Master. The CPAN Master is mirrored by a number of sites, and some of those are mirrored in turn.

All registered users of PAUSE have an author directory. For user ROBM you can see his author directory:

When you look at Rob's author directory, notice that it contains every release Rob has made, apart from any he has explicitly deleted via the PAUSE web interface.

It is these author directories that make up the bulk of CPAN: every author's directory is copied onto all of the mirrors. From an author's perspective the main purpose of PAUSE is to manage your author directory: adding tarballs by uploading them, and deleting old releases when they've been superseded. Unless you delete them, everything you release will remain on CPAN.

If ROBM has permission to release all of the modules in the release, they'll be added to the CPAN Index. Various indexes are maintained by PAUSE, and periodically written to the CPAN Master, which means you can grab them from any mirror.

The main index tells you which modules appear in which releases. Here's the relevant line for Rob's release:

Mail::IMAPTalk  4.03  R/RO/ROBM/Mail-IMAPTalk-4.03.tar.gz

This says the latest version of module Mail::IMAPTalk is version 4.03 and is contained in ROBM's Mail-IMAPTalk-4.03.tar.gz release. We'll cover indexes in a later article, maybe.
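To make the shape of that index line concrete, here is a minimal sketch in Python (chosen purely for illustration; a real CPAN client would do this in Perl). It splits the line into its three fields and checks that the path begins with the author directory for ROBM. The function names are made up for this example, and the first-letter / first-two-letters layout is inferred from the path shown in the line above.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    module: str   # e.g. Mail::IMAPTalk
    version: str  # kept as a string, exactly as written in the index
    path: str     # release tarball, relative to the author directories


def parse_index_line(line: str) -> IndexEntry:
    """Split one whitespace-separated index line into its three fields."""
    module, version, path = line.split()
    return IndexEntry(module, version, path)


def author_dir(pause_id: str) -> str:
    """Author directory as it appears in index paths, e.g. ROBM -> R/RO/ROBM."""
    pause_id = pause_id.upper()
    return f"{pause_id[0]}/{pause_id[:2]}/{pause_id}"


entry = parse_index_line(
    "Mail::IMAPTalk  4.03  R/RO/ROBM/Mail-IMAPTalk-4.03.tar.gz"
)
print(entry.module)   # Mail::IMAPTalk
print(entry.version)  # 4.03
print(entry.path.startswith(author_dir("ROBM") + "/"))  # True
```

The version is deliberately left as a string rather than converted to a number, so it is reported exactly as the index gives it.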

Finding modules

If you know who released a particular module you could just go directly to their author directory and download the release tarball. But most sane people don't do that; they use one of the search engines. The two main ones are MetaCPAN and search.cpan.org. The search engines use the indexes generated by PAUSE to keep track of what's "on CPAN", and inspect the release tarballs to get additional information (so searches will include documentation for modules as well).

The search engines aren't really like typical search engines: they provide a lot of additional information and links for each module and distribution. For many Perl programmers MetaCPAN's page for a module is the module's home page. Many of us read documentation via the search engines, for example.

MetaCPAN is an open source project, developed by a number of Perl programmers for the benefit of the rest of us. You can use it to find modules, read the documentation for them, and also to get at a whole load of other information besides. It also provides an API, which you can use to trawl CPAN programmatically.
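As a rough illustration of what "programmatically" can look like, here is a small Python sketch that looks up a module by name. It assumes MetaCPAN's public v1 API is reachable at fastapi.metacpan.org and exposes a /module/ lookup; check the MetaCPAN API documentation for the current endpoints before relying on this. The helper names are invented for the example, and nothing is assumed about the fields in the JSON that comes back.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of MetaCPAN's public v1 API.
API_BASE = "https://fastapi.metacpan.org/v1"


def module_url(module: str) -> str:
    """Build the lookup URL for a module name such as Mail::IMAPTalk."""
    return f"{API_BASE}/module/{quote(module, safe=':')}"


def fetch_module(module: str) -> dict:
    """Fetch the JSON document for a module (needs network access)."""
    with urlopen(module_url(module), timeout=10) as response:
        return json.load(response)


print(module_url("Mail::IMAPTalk"))
# https://fastapi.metacpan.org/v1/module/Mail::IMAPTalk
```

Calling fetch_module("Mail::IMAPTalk") would perform the actual request; it is left uncalled here so the sketch runs without a network connection.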

Installing modules

Having found a module using a search engine, you could download it from there and install it manually. But most people don't. Typically you will use a CPAN client to download and install modules that are on CPAN. Perl comes with the CPAN module, and a cpan script, which you can use to install a module. For example, if I wanted to install Rob's Mail::IMAPTalk module I could just run:

cpan Mail::IMAPTalk

It would look for the module in the main index, and find Rob's 4.03 release. It would download that file and install it for me. If Mail::IMAPTalk relies on other CPAN modules, it will install those as well, if they're not already installed. There's a lot more going on behind the scenes, which we might cover in a later article.

Another popular client these days is cpanm, which is short for "cpanminus". This is a play on CPANPLUS, another CPAN client, which isn't so popular these days.
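The "install the dependencies too, unless they're already there" behaviour can be sketched as a depth-first walk over the index. This is only a toy model in Python, not how the cpan client is actually implemented: the function name, the data structures, and every module and release in the sample data are invented for illustration.

```python
def install_order(module, index, requires, installed):
    """Return the releases to install, dependencies first.

    index:     module name -> release path (as in the main index)
    requires:  release path -> list of module names it depends on
    installed: set of module names already present locally
    """
    order = []
    seen = set()

    def visit(name):
        if name in installed:
            return              # already installed: nothing to do
        release = index[name]   # look the module up in the index
        if release in seen:
            return              # already queued (also guards against cycles)
        seen.add(release)
        for dep in requires.get(release, []):
            visit(dep)          # queue dependencies before the release itself
        order.append(release)

    visit(module)
    return order


# Entirely made-up data; these are not real CPAN releases.
index = {
    "My::App": "E/EX/EXAMPLE/My-App-1.00.tar.gz",
    "My::Util": "E/EX/EXAMPLE/My-Util-0.50.tar.gz",
    "My::Log": "E/EX/EXAMPLE/My-Log-2.10.tar.gz",
}
requires = {
    "E/EX/EXAMPLE/My-App-1.00.tar.gz": ["My::Util", "My::Log"],
    "E/EX/EXAMPLE/My-Util-0.50.tar.gz": ["My::Log"],
}

print(install_order("My::App", index, requires, installed={"My::Log"}))
# ['E/EX/EXAMPLE/My-Util-0.50.tar.gz', 'E/EX/EXAMPLE/My-App-1.00.tar.gz']
```

Because My::Log is treated as already installed, only the other two releases are queued, with the dependency ahead of the release that needs it.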

About FastMail

FastMail is a commercial hosted email service, founded in 1999, which has established a reputation for technical leadership in the hosted email space, with a focus on security, privacy, and reliability. It is run by FastMail Pty Ltd, an Australian company based in Melbourne. From their early days they've been users and supporters of Perl, and several of their developers are CPAN authors: BRONG, ROBN, and ROBM (one of the founders). In late 2015 they acquired pobox.com, another hosted mail company and longtime user and supporter of Perl. Pobox's tech lead is RJBS, and they're currently looking to hire an experienced developer.

Thank you to FastMail for their support of the QA Hackathon and Perl.

www.fastmail.com

4 Comments

Hi Neil,

Long time no see (Oli here from back-in-the-day at SpinVox).

As you might remember I've never really got to grips with Perl, but that's not to say I'm not interested...

Anyway, reading this I was struck by one thought:

Is there any way to have the modules and dependencies one might need for an app in a sandboxed or separated fashion, like virtualenv on Python or gemsets in Ruby etc.?

Congrats on QA Hackathon and the sponsorships you have got - if I was in the UK I'd be coming along :-)

Cheers!

Hi Oli,

I think what you're looking for is Perlbrew for Unix, or Berrybrew for Windows.

Cheers,

Steve

ps. Nice writeup Neil!

Sounds like local::lib to me, not Perlbrew/Berrybrew.

This actually sounds like the job for cpanfile. See https://metacpan.org/pod/Carton for managing it all.

About Neil Bowers

Perl hacker since 1992.