The 'right' name for your CPAN distribution

When you release your module to CPAN, you should make sure that the distribution has the right name. Doing so makes it more likely that all the different tools and systems in the CPAN ecosystem can process your distribution. I'll outline what I mean by that, and how you can get it right. If one or more of your distributions doesn't follow the conventions, maybe you could release a fix on CPAN day?

David Farrell recently posted a brief definition of distributions, modules and packages, but I'll work through an example here.

I released my module Module::Path to CPAN. The distribution name is Module-Path, which you can see is derived from the module name. The most recent version is 0.13, which was uploaded to PAUSE as the release Module-Path-0.13.tar.gz. The path to that release, relative to the root of CPAN, is authors/id/N/NE/NEILB/Module-Path-0.13.tar.gz.

If you're using ExtUtils::MakeMaker in Makefile.PL, the key bits are:

use ExtUtils::MakeMaker;

WriteMakefile(
    NAME => 'Module::Path',    # the lead module name
    # other stuff
);

The NAME key gives the package that identifies the distribution; this is often referred to as the 'lead module name'. There's only one module in this dist, but many dists have multiple modules. You can specify the distribution name explicitly, by including a DISTNAME key, but if you don't, the dist name will be generated from the lead module name.
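To make that derivation concrete, here is a minimal sketch of the rule, assuming the default is simply the lead module name with each '::' replaced by '-'. The function name dist_name_for is made up for illustration; it isn't part of ExtUtils::MakeMaker.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a lead module name into the
# distribution name you'd get by default.
sub dist_name_for {
    my ($module) = @_;
    (my $dist = $module) =~ s/::/-/g;    # Module::Path -> Module-Path
    return $dist;
}

print dist_name_for('Module::Path'), "\n";    # prints "Module-Path"
```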

If you look in the metadata file for your distro, you'll see the dist name in there, under the name key. For example, in the META.yml for Module-Path:

name: Module-Path
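If you want to pull the name out of the metadata yourself, something like the following should work. It's only a sketch: the YAML is an abbreviated stand-in for a real META.yml, the helper name meta_dist_name is hypothetical, and it assumes Parse::CPAN::Meta (which ships with recent versions of Perl) is available.

```perl
use strict;
use warnings;
use Parse::CPAN::Meta;

# Hypothetical helper: return the value of the 'name' key
# from the text of a META.yml file.
sub meta_dist_name {
    my ($yaml) = @_;
    my $meta = Parse::CPAN::Meta->load_yaml_string($yaml);
    return $meta->{name};
}

# Abbreviated stand-in for a real META.yml
my $yaml = <<'END_YAML';
---
name: Module-Path
version: '0.13'
END_YAML

print meta_dist_name($yaml), "\n";
```

For a file on disk, Parse::CPAN::Meta->load_file('META.yml') gives you the same kind of data structure.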

Why should I care?

Many tools and services in the CPAN ecosystem deal with distributions, rather than modules. Hopefully you can see that where you release a dist containing a single module, it really makes sense for the dist name to be based on the module name.

CPAN::DistnameInfo

Lots of CPAN-related systems keep themselves up to date by watching one of the lists of recent releases. These often just give the path to each release, in the relevant author's directory. For example:

N/NE/NEILB/Module-Path-0.13.tar.gz

You'll see this type of path turning up everywhere. The official index of packages on CPAN is the file 02packages.details.txt, which you can get from CPAN. The file has a line for every module (or, more accurately, every package) that is on CPAN, giving the most recent release that contains it. For example, for Module::Path the entry is:

Module::Path                       0.13  N/NE/NEILB/Module-Path-0.13.tar.gz
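An entry like that is just three whitespace-separated fields, so picking it apart is straightforward. Here's a minimal sketch; the function name parse_packages_line is made up, and it only handles a single entry line, not the header block at the top of the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split one index entry into
# package name, version, and release path.
sub parse_packages_line {
    my ($line) = @_;
    my ($package, $version, $path) = split ' ', $line;
    return ($package, $version, $path);
}

my $line = 'Module::Path                       0.13  N/NE/NEILB/Module-Path-0.13.tar.gz';
my ($package, $version, $path) = parse_packages_line($line);

print "package: $package\n";
print "version: $version\n";
print "path:    $path\n";
```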

When you're writing tools that process modules and distributions, you want to get the distribution name. One way to do that would be to go grab that tarball from CPAN, unpack it, and extract the distname from the metadata.

But that's a lot of work, particularly since the dist name is staring you right in the face. Instead you can use Graham Barr's module CPAN::DistnameInfo, which parses the path and gives you various bits of info. So instead of the shenanigans with the tarball you can just write:

use CPAN::DistnameInfo;

my $path     = 'N/NE/NEILB/Module-Path-0.13.tar.gz';
my $distinfo = CPAN::DistnameInfo->new($path);
my $distname = $distinfo->dist;    # 'Module-Path'

Ok, so that's a lot easier, and a heck of a lot quicker.

But what if ...

When you're using CPAN::DistnameInfo, you're taking a shortcut: you're assuming that the dist name embedded in the release file name is the same as the dist name in the metadata.

What if those two things are in fact different? Surely that never happens? I thought so, until I wrote a script to check, and discovered 170 dists where it did happen. I'm slowly working through that list submitting bugs and pull requests.
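This isn't the script I mentioned, just a minimal sketch of the idea behind such a check. To keep it self-contained it uses a made-up helper, dist_from_path, which is a deliberately simplified stand-in for CPAN::DistnameInfo (the real module handles far more cases). The second metadata name in the loop is a hypothetical mistake, included to show a mismatch.

```perl
use strict;
use warnings;

# Hypothetical helper: a rough approximation of what
# CPAN::DistnameInfo does, good enough for simple file names.
sub dist_from_path {
    my ($path) = @_;
    (my $file = $path) =~ s{^.*/}{};                  # drop the author directory
    $file =~ s{\.(?:tar\.gz|tgz|tar\.bz2|zip)$}{};    # drop the archive extension
    $file =~ s{-v?\d[\d._]*$}{};                      # drop a trailing -version
    return $file;
}

# True if the name derived from the path equals the name in the metadata.
sub name_matches {
    my ($path, $meta_name) = @_;
    return dist_from_path($path) eq $meta_name;
}

my $path = 'N/NE/NEILB/Module-Path-0.13.tar.gz';
for my $meta_name ('Module-Path', 'Module::Path') {
    printf "%-12s %s\n", $meta_name,
        name_matches($path, $meta_name) ? 'ok' : 'MISMATCH';
}
```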

Also, CPANTS (which I covered yesterday) has a check for this. For example, if you look at LEEJO's author page, you'll notice that the first red cross for CGI is 'distname_matches_name_in_meta'. The distname in META.yml is CGI.pm, but CPAN::DistnameInfo says it's CGI (it's a special case in CPAN::DistnameInfo, but it's still wrong).

Is this post ever going to finish?!

The TL;DR for this post is:

  • The distribution name should be based on the module name.
  • Make sure the release file name matches the dist name in the metadata.

If you've got more than one module in your distribution, make sure you have a lead module, and base the dist name on that. In the future that's likely to be enforced, as documented in the Lancaster Consensus, which is a set of standards and practices for QA and toolchain authors, agreed at the QA Hackathon 2013.

10 Comments

PAUSE has some naming guidelines in 'On the naming of modules' for choosing the actual module name.

Maybe you can roll some of this into it (the PAUSE GitHub repo). :)

Great points, Neil!

Also - if your distribution could be named better, why not perform that renaming on CPAN Day (August 16)? :)

CGI.pm is a special case as it pre-dates a lot of the infrastructure built around CPAN, including CPAN itself. So before it was this, now it's that, but should be something else. There is a ticket to get the distribution name changed, which i will look into soon, but my feeling is it will break a lot of stuff outside of the CPAN network. Maybe. I don't know. History, bah!

For reference: https://github.com/leejo/CGI.pm/issues/109

Thanks Neil! I will probably get this done for the next release.

Heh, sadly not as i will be away. Certainly within the next couple of months however.

Well it took a while (pesky Windows...) but i've just uploaded the latest version that fixes this.

About Neil Bowers

Perl hacker since 1992.