Dependency phases in CPAN distribution metadata

In the previous article in this series we gave a general introduction to the distribution metadata that is included in releases as the files META.json and/or META.yml. In this article I'll look in more detail at one critical component of a distribution's metadata: dependencies, also known as prerequisites (usually shortened to "prereqs"). This is how you specify the other CPAN modules that your distribution depends on.

This post is brought to you by Booking.com, a platinum sponsor for the Perl Toolchain Summit. Booking.com is one of the largest Perl shops in the world, and so depends heavily on the toolchain. Thank you to Booking.com for supporting the summit.

Background

In the early days of CPAN, distributions had a Makefile.PL that used ExtUtils::MakeMaker. Its PREREQ_PM key takes a hash reference, whose keys are module names and whose values are either the minimum required version of that module, or 0 if any version will do. Here's an example from Data::Direct:

use ExtUtils::MakeMaker;
WriteMakefile(
    'NAME'         => 'Data::Direct',
    'VERSION_FROM' => 'Direct.pm', # finds $VERSION
    'PREREQ_PM'    => {'DBI' => '1.10'},
);

All modules required by your distribution had to be listed in that one hash, whether they were needed to run the Makefile.PL, to build your module, to run it, or just for the tests. And what about optional requirements? If you listed one, and it couldn't be installed, that would prevent your distribution from being installed.

These issues, and others, led to the development of distribution metadata, and ultimately to version 2 of the metadata spec, which is the version used by META.json. As a reminder, META.yml only supports version 1.4 of the spec.

There are two factors to consider when defining your prereqs:

  • The phase(s) of installation for which the module is required, and
  • How hard the requirement is -- can your distribution still be installed if the module isn't available?

In this article we're going to look at the phases, some of which are only supported in META.json; we'll cover the rest of the dependency picture in the next article.

Phases

There are five phases that a module can be required for: configure, build, runtime, test, and develop. We'll look at each of these in turn.
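To make the five phases concrete, here is a minimal sketch of the shape they take in a version 2 prereqs structure, written as a Perl hash that mirrors the "prereqs" section of a META.json. The module names and version numbers are purely illustrative, and only the "requires" relationship is shown; the other relationships are left for the next article.

```perl
use strict;
use warnings;

# Illustrative prereqs, mirroring the "prereqs" section of a META.json
# (meta-spec version 2): phase => relationship => module => minimum version.
# All module names and versions here are example values only.
my %prereqs = (
    configure => { requires => { 'ExtUtils::MakeMaker' => '6.52' } },
    build     => { requires => { 'Pod::Markdown'       => '0'    } },
    runtime   => { requires => { 'perl' => '5.008001', 'DBI' => '1.10' } },
    test      => { requires => { 'Test::More'          => '0.88' } },
    develop   => { requires => { 'Test::Pod::Coverage' => '1.08' } },
);

# Walk the phases in the order this article covers them.
my @phase_order = qw(configure build runtime test develop);
for my $phase (@phase_order) {
    my $reqs = $prereqs{$phase}{requires} || {};
    for my $module (sort keys %$reqs) {
        printf "%-10s %-22s %s\n", $phase, $module, $reqs->{$module};
    }
}
```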

Configure dependencies

A configure dependency is a module that must be installed before you can run either of these:

perl Makefile.PL
perl Build.PL

If your distribution includes a Makefile.PL based on ExtUtils::MakeMaker, then ExtUtils::MakeMaker is a configure dependency. Likewise, distributions with a Build.PL based on Module::Build would have Module::Build as a configure dependency.

Most Makefile.PLs don't use any modules beyond ExtUtils::MakeMaker, but if yours does, then those modules are configure dependencies as well. For example, if you look at the Makefile.PL for BDFOY's Business::xISBN, you'll see that it uses File::Spec::Functions, so it's listed in the configure section of the prereqs in the META.json file.
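In a Makefile.PL, ExtUtils::MakeMaker (6.52 and later) lets you declare configure dependencies with the CONFIGURE_REQUIRES key, which ends up in the configure section of the generated metadata. Here's a minimal sketch for a hypothetical My::Module distribution whose Makefile.PL uses File::Spec::Functions; the names and versions are assumptions. To keep it runnable on its own it only prints the configure prereqs, where a real Makefile.PL would pass the arguments to WriteMakefile().

```perl
use strict;
use warnings;

# This Makefile.PL itself loads File::Spec::Functions, so that module must be
# installed before "perl Makefile.PL" can run: a configure dependency.
use File::Spec::Functions qw(catfile);

# Hypothetical distribution; names and versions are illustrative only.
my %WriteMakefileArgs = (
    NAME               => 'My::Module',
    VERSION_FROM       => catfile('lib', 'My', 'Module.pm'),
    CONFIGURE_REQUIRES => {
        'ExtUtils::MakeMaker'   => '6.52',
        'File::Spec::Functions' => '0',
    },
    PREREQ_PM => {    # runtime dependencies
        'DBI' => '1.10',
    },
);

# A real Makefile.PL would finish with:
#   use ExtUtils::MakeMaker;
#   WriteMakefile(%WriteMakefileArgs);
for my $module (sort keys %{ $WriteMakefileArgs{CONFIGURE_REQUIRES} }) {
    print "configure requires $module\n";
}
```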

Some distributions (typically older ones) have a Makefile.PL based on Module::Install. For these distributions, Module::Install and associated modules (such as plugins) are typically bundled in the release, in the inc/ directory (for example, look at the top directory of Catalyst-Devel). For such distributions, Module::Install and any plugins used aren't configure dependencies, because they don't need to be installed before you run perl Makefile.PL. That's the advantage of Module::Install. One of the disadvantages is that CPAN has distributions bundling various old versions of Module::Install, which contain bugs that have since been fixed in later releases.

Most distributions that specify configure dependencies have just one: only 46 distributions have 4 or more configure prereqs. For example, Florian Ragwitz's Cond-Expr has 5 configure dependencies, though notice that "perl" is one of them, and IO::File is another. Generally "perl" is only listed as a runtime requirement, and only when you want to specify the earliest version of Perl that your distribution supports.

Build dependencies

A build dependency is a module that must be installed before you can run make or ./Build.

If the module is already a configure dependency, you don't need to list it again, and it's better practice not to. For example, in a distribution with a Makefile.PL, you don't need to list ExtUtils::MakeMaker as a build dependency if it's already in the configure prereqs.

Only 4% of distributions (just over 1500) have build dependencies specified. For example, the META.json for App-Memcached-CLI (generated by Minilla) lists Pod::Markdown::Github as a build dependency.

While writing this article, I wrote a script to look for distributions with dependencies in the less-common phases. It turned out that quite a few of those 1500 distributions list test dependencies as build dependencies. For example, look at the META.json for LinkEmbedder, which lists Test::More and Test::Deep as build dependencies. This is because historically there was no way to specify test dependencies separately, so they were often listed as build dependencies.
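That script isn't reproduced here, but a minimal sketch of the same idea, using the CPAN::Meta module (in core since Perl 5.14), might look like this. It flags build prereqs whose names suggest they are really test modules; the "Test::" prefix heuristic and the sample META.json content are assumptions for illustration only.

```perl
use strict;
use warnings;
use CPAN::Meta;

# Return build-phase prereqs whose names suggest they are really test modules.
# The "Test::" prefix check is a rough heuristic, not an official rule.
sub suspicious_build_prereqs {
    my ($meta) = @_;
    my $build = $meta->effective_prereqs
                     ->requirements_for('build', 'requires')
                     ->as_string_hash;
    my @hits = grep { /^Test::/ } keys %$build;
    return sort @hits;
}

# Illustrative META.json content (not taken from any real distribution).
my $json = <<'END_JSON';
{
   "name" : "Example-Dist",
   "version" : "0.01",
   "abstract" : "example",
   "author" : [ "A. N. Author" ],
   "license" : [ "perl_5" ],
   "dynamic_config" : 0,
   "generated_by" : "hand",
   "release_status" : "stable",
   "meta-spec" : { "version" : 2 },
   "prereqs" : {
      "build" : { "requires" : { "Test::More" : "0.88", "Test::Deep" : "0" } },
      "runtime" : { "requires" : { "DBI" : "1.10" } }
   }
}
END_JSON

my $meta = CPAN::Meta->load_json_string($json);
print "build prereq that looks like a test prereq: $_\n"
    for suspicious_build_prereqs($meta);
```

Scanning a whole CPAN mirror would just repeat this over each distribution's META.json, for example with CPAN::Meta->load_file.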

Runtime dependencies

A runtime dependency is a module that must be installed before you can use the distribution in your Perl code. So runtime dependencies have to be installed before your distribution can be installed, and they almost certainly have to be installed before your test suite can be run as well.

When talking about your distribution's dependencies, it's very easy to only think about the runtime dependencies, but hopefully you can already see that it's a more complex picture.

Test dependencies

A test dependency is a module that must be installed before someone can run your distribution's regular test suite, for example with make test or ./Build test.

All of your runtime dependencies are automatically test dependencies, so you don't need to list those again.
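Because runtime prereqs carry over into the test phase, an installer getting ready to run your tests has to satisfy the union of the two lists. Here's a minimal sketch of that merge in plain Perl, assuming version.pm (a core module) for comparing minimum versions; the prereq lists and the merge_prereqs helper are hypothetical.

```perl
use strict;
use warnings;
use version;

# Merge several "module => minimum version" hashes, keeping the higher
# minimum wherever a module appears in more than one of them.
sub merge_prereqs {
    my @lists = @_;
    my %merged;
    for my $list (@lists) {
        for my $module (keys %$list) {
            my $want = $list->{$module};
            if (!exists $merged{$module}
                || version->parse($want) > version->parse($merged{$module})) {
                $merged{$module} = $want;
            }
        }
    }
    return \%merged;
}

# Illustrative prereqs for a single distribution.
my %runtime = ('DBI' => '1.10', 'Try::Tiny' => '0');
my %test    = ('Test::More' => '0.88', 'Test::Deep' => '0');

# Everything that must be installed before "make test" can run.
my $for_testing = merge_prereqs(\%runtime, \%test);
print "$_ $for_testing->{$_}\n" for sort keys %$for_testing;
```

Note that version.pm compares decimal versions numerically, so "1.2" counts as a higher minimum than "1.10".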

A classic test dependency is Test::More, but any other test modules you use, like Test::Deep or Test::RequiresInternet, should be listed as well.

Most CPAN distributions have just a few test dependencies, or none. And then there are a small number of distributions like Web::Machine, which has 17 test dependencies. Notice, though, that some listed there shouldn't be, such as ExtUtils::MakeMaker, which is also listed as a configure dependency.

Develop dependencies

A develop dependency is a module that is "needed to work on the distribution's source code as the author does". This can cover a multitude of sins:

  • You may generate the module, or its documentation, using the Template Toolkit or one of the many other templating modules on CPAN.
  • You may have author tests or release tests: any modules used in those that aren't also used in your regular tests (those run when you type make test, and covered by test dependencies) are develop dependencies.
  • You may do something non-standard when building a release tarball (e.g. with make dist) that relies on a CPAN module.
  • You may have additional tools that you use during development, for example to benchmark your working version against previous releases.

A classic example of a test that should be a release test rather than a regular test (run when someone's installing your distribution) is checking the validity of your pod documentation. If you've made a mistake in your pod, that shouldn't mean that someone can't install your distribution.
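A common way to keep such a check out of the regular test suite is to skip it unless the RELEASE_TESTING environment variable is set. Here's a minimal sketch of such a test file, assuming Test::Pod as the POD-checking module (the minimum version shown is illustrative). Because Test::Pod is only loaded when release tests are run, it belongs in the develop prereqs rather than the test prereqs.

```perl
use strict;
use warnings;
use Test::More;

# Regular installs (make test) don't set RELEASE_TESTING, so they skip this
# file entirely and never need Test::Pod.
plan skip_all => 'these tests are for release testing'
    unless $ENV{RELEASE_TESTING};

# Load Test::Pod at run time, so a missing module skips rather than fails.
eval { require Test::Pod; Test::Pod->VERSION('1.22'); 1 }
    or plan skip_all => 'Test::Pod 1.22 required for testing POD';

# Checks every POD file it finds in blib/ (or lib/) and plans its own tests.
Test::Pod::all_pod_files_ok();
```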

For example, App-DBI-Loader has a test t/release-pod-coverage.t, which is skipped unless you're running release tests. As a result, its develop prereqs list Test::Pod::Coverage, and it isn't listed as a test dependency.
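If you maintain a Makefile.PL by hand, one way to get develop prereqs into the generated metadata with recent versions of ExtUtils::MakeMaker is the META_MERGE key with a version 2 meta-spec. Here's a minimal sketch for a hypothetical My::Module distribution; the names and versions are assumptions. To keep it self-contained it only prints the develop prereqs, where a real Makefile.PL would pass %WriteMakefileArgs to WriteMakefile().

```perl
use strict;
use warnings;

# Hypothetical distribution; names and versions are illustrative only.
my %WriteMakefileArgs = (
    NAME         => 'My::Module',
    VERSION_FROM => 'lib/My/Module.pm',
    PREREQ_PM    => { 'DBI' => '1.10' },
    META_MERGE   => {
        # Declare meta-spec 2 so the "prereqs" structure is understood.
        'meta-spec' => { version => 2 },
        prereqs     => {
            develop => {
                requires => { 'Test::Pod::Coverage' => '1.08' },
            },
        },
    },
);

# A real Makefile.PL would end with:
#   use ExtUtils::MakeMaker;
#   WriteMakefile(%WriteMakefileArgs);
my $develop = $WriteMakefileArgs{META_MERGE}{prereqs}{develop}{requires};
print "develop requires $_ $develop->{$_}\n" for sort keys %$develop;
```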

Repeating dependencies in multiple phases

You don't always need to repeat a dependency in every phase it applies to. In general:

  • List configure requirements
  • List any build requirements that weren't already listed as configure requirements
  • List all runtime requirements, even if they were already listed as configure and/or build requirements
  • List test requirements not already listed
  • List develop requirements not already listed
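Those rules are mechanical enough to automate. Here's a minimal sketch that prunes a prereqs structure accordingly, assuming that "already listed" means listed in an earlier phase (in the order configure, build, runtime, test, develop) with at least the same minimum version; runtime prereqs are always kept in full. The prune_prereqs helper and the module lists are hypothetical.

```perl
use strict;
use warnings;
use version;

# Phases in the order used by the rules above.
my @PHASES = qw(configure build runtime test develop);

# Prune prereqs that an earlier phase already covers. Runtime is never
# pruned: runtime prereqs must always be listed in full.
sub prune_prereqs {
    my ($prereqs) = @_;    # phase => { module => minimum version }
    my (%seen, %pruned);
    for my $phase (@PHASES) {
        my $reqs = $prereqs->{$phase} or next;
        for my $module (sort keys %$reqs) {
            my $want    = $reqs->{$module};
            my $covered = exists $seen{$module}
                && version->parse($seen{$module}) >= version->parse($want);
            $pruned{$phase}{$module} = $want
                if $phase eq 'runtime' || !$covered;
            # Remember the highest minimum seen so far for this module.
            $seen{$module} = $want
                if !exists $seen{$module}
                || version->parse($want) > version->parse($seen{$module});
        }
    }
    return \%pruned;
}

my $pruned = prune_prereqs({
    configure => { 'ExtUtils::MakeMaker' => '6.52' },
    build     => { 'ExtUtils::MakeMaker' => '6.52' },   # already in configure
    runtime   => { 'DBI' => '1.10' },
    test      => { 'Test::More' => '0.88', 'DBI' => '1.10',
                   'ExtUtils::MakeMaker' => '0' },       # only Test::More is new
    develop   => { 'Test::Pod' => '1.22' },
});

for my $phase (@PHASES) {
    my $reqs = $pruned->{$phase} or next;
    print "$phase: ", join(', ', sort keys %$reqs), "\n";
}
```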

Conclusion

A well-behaved modern CPAN distribution classifies its dependencies according to when they're required. If your distribution relies on other CPAN modules, try to ensure they're associated with the right phase.

How do you get the dependencies into your distribution's metadata? Remember that in the previous article we said that the metadata files should always be automatically generated. We'll cover this in more detail in the next article, but in the meantime:

  • If you have a Makefile.PL based on ExtUtils::MakeMaker, look at the Makefile.PL for Business-xISBN.
  • If you have a Build.PL based on Module::Build, look at the Build.PL for Alien-CMake.
  • Dist::Zilla uses Makefile.PL/ExtUtils::MakeMaker, with the Makefile.PL auto-generated. Have a look at the Makefile.PL for JSON-Typist, which shows how to check whether the local version of ExtUtils::MakeMaker supports test, build, and configure dependencies. For old versions of ExtUtils::MakeMaker, it just rolls those dependencies into PREREQ_PM.
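The idea behind that fallback can be sketched as a small function. This is a hand-written illustration, not Dist::Zilla's actual template: it takes a hypothetical %WriteMakefileArgs-style hash plus the installed ExtUtils::MakeMaker version, drops CONFIGURE_REQUIRES if that version doesn't support it, and folds unsupported BUILD_REQUIRES and TEST_REQUIRES into PREREQ_PM. The version thresholds (6.52, 6.5503 and 6.64) are assumed here to be the releases that introduced each key; check the ExtUtils::MakeMaker documentation before relying on them.

```perl
use strict;
use warnings;
use version;

# Assumed first ExtUtils::MakeMaker versions supporting each key.
my %SUPPORTED_FROM = (
    CONFIGURE_REQUIRES => '6.52',
    BUILD_REQUIRES     => '6.5503',
    TEST_REQUIRES      => '6.64',
);

# Return a copy of %$args adjusted for the given MakeMaker version.
sub fallback_args {
    my ($args, $eumm_version) = @_;
    my %out  = %$args;
    my $have = version->parse($eumm_version);

    # Once Makefile.PL is running it's too late to install configure
    # prereqs, so with an old MakeMaker they are simply dropped.
    delete $out{CONFIGURE_REQUIRES}
        if $have < version->parse($SUPPORTED_FROM{CONFIGURE_REQUIRES});

    # Build and test prereqs are folded into PREREQ_PM instead, keeping
    # the higher minimum version if a module appears twice.
    for my $key (qw(BUILD_REQUIRES TEST_REQUIRES)) {
        next unless $out{$key};
        next if $have >= version->parse($SUPPORTED_FROM{$key});
        my %prereq_pm = %{ $out{PREREQ_PM} || {} };    # copy, don't mutate
        my $extra     = delete $out{$key};
        for my $module (keys %$extra) {
            $prereq_pm{$module} = $extra->{$module}
                if !exists $prereq_pm{$module}
                || version->parse($extra->{$module})
                   > version->parse($prereq_pm{$module});
        }
        $out{PREREQ_PM} = \%prereq_pm;
    }
    return \%out;
}

my %args = (
    NAME               => 'My::Module',    # hypothetical distribution
    CONFIGURE_REQUIRES => { 'ExtUtils::MakeMaker' => '0' },
    PREREQ_PM          => { 'DBI' => '1.10' },
    TEST_REQUIRES      => { 'Test::More' => '0.88' },
);
my $old = fallback_args(\%args, '6.42');
print join(', ', sort keys %{ $old->{PREREQ_PM} }), "\n";
# A real Makefile.PL would then call something like:
#   WriteMakefile(%{ fallback_args(\%args, $ExtUtils::MakeMaker::VERSION) });
```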

Thanks to David Golden for his help with this post.

About Booking.com

Booking.com B.V., part of the Priceline Group (Nasdaq: PCLN), owns and operates Booking.com, the world leader in booking accommodations online. Each day, over 1,200,000 room nights are reserved on Booking.com. Booking.com has supported Perl in countless ways over the years, and employs many well-known CPAN authors, including Sawyer X, Steffen Mueller, Philippe Bruhat, Mickey Nasriachi, Graham Knop, Rafaël Garcia-Suarez, Yves Orton, Stevan Little, and an awful lot more (you can see a hopefully complete list in the source of ACME::CPANAuthors::Booking).

3 Comments

Nice post, thank you!

I didn't get where the development dependencies come from. Are they autogenerated? Do they come from a specific file? Is there a way to specify them if I'm using MakeMaker?

Excellent article!

Can you briefly describe how you package the Perl 5 ecosystem for a _production_ environment? In my case it goes like this:

1. Build Perl itself. Holy cow, 55MB! Worst bundle ever for slim Docker containers, because it throws everything into one bucket. Develop dependencies (like Pod::Perldoc, Devel::, Benchmark, TAP::Parser::* or CPAN::*), build dependencies (ExtUtils::*, Module::*), runtime dependencies (Unicode stuff and pragmas) and of course tons of stuff no one really uses nowadays (NDBM, ODBM, SDBM interfaces, ptar, zipdetails binaries).

I truly hate this "distribution approach". Unbelievable bloatware.

2. Clone it and trim excessive fat. To do so I use manifest from perl-base debian package as a base of what's crucial and what's not. That gives me 7MB clean, minimalistic Perl.

3. On Perl from 1 (full installation) I do "cpanm Whatever Is Used In My Code". That does all the module testing and bumps ecosystem to enormous 300MB package.

4. Now the hard part is to install _runtime_ packages only on the minimalistic Perl from 2. I haven't found a good way to do it. It's a mix of horrible hackery with symlinks (build dependencies must be "borrowed" from the full installation) and partially parsing META files. I usually install package X used by my code, without dependencies. Then I try "use X" to see what it REALLY needs (META phases are often not correct, or the package is too old to have distinct phases configured) and install dependencies until X can be loaded and my code that uses X passes tests.

After a few hours of pain I have a 300MB "devel" ecosystem and a 90MB "production" ecosystem that I can trust and deploy waaaay faster on multiple machines.

Do you go a similar way? Or maybe you have some cool tricks to debloat Perl itself and the too-greedy dependency chains of CPAN modules?


About Neil Bowers

Perl hacker since 1992.