The difference between distros, modules, and programs; and how that affects indexing

Programs, modules, and distributions are different things, but we are often loose with the language. In CPANizing Behavior and Democratizing Publishing, chromatic conflates two issues that are really only loosely connected: making modules and releasing code as a Perl distribution.

Distributions are merely the unit of things we give away. A Perl distribution has a conventional structure and usually contains modules, but nothing requires it to contain any. It doesn't even have to contain Perl code. It's quite easy to distribute a non-module Perl program in a Perl distribution, and the major installers handle it just fine. You can even extend the installers right from the distribution to do almost anything you want.
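As a rough sketch of a distribution that ships only a program, a Makefile.PL for ExtUtils::MakeMaker might look like the following. The distribution name, version, and script path are hypothetical; EXE_FILES is the MakeMaker parameter that lists programs to install.

```perl
# Makefile.PL for a distribution that installs a program and no modules.
# NAME, VERSION, and the script path are made-up values for illustration.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Local::Frobnicate',       # hypothetical distribution name
    VERSION   => '0.01',                    # literal version, since there is no module to read it from
    EXE_FILES => [ 'script/frobnicate' ],   # programs to install alongside any modules
);
```

With a file like this, the usual `perl Makefile.PL && make && make install` sequence should install the script even though there is nothing under lib/.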

Code re-use, in the form of modules, is a different issue. For that, you want to separate the parts that you're forced to run from those that you choose to run. You also want to give people the ability to extend what you've done and share those changes. Modulinos, the term I use for modules that act as both traditional modules and programs at the same time, are an easy way to do that. This has nothing to do with the packaging of a distribution though.
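A minimal modulino sketch, assuming a made-up package name and a trivial task, looks something like this. The `unless caller` test is the whole trick: the file runs as a program when you execute it directly and behaves like an ordinary module when something else loads it.

```perl
#!/usr/bin/perl
package Local::Greeter;    # hypothetical package name
use strict;
use warnings;

# When this file is executed directly there is no caller, so run() fires.
# When it is loaded with require or use, caller() is true and run() is skipped.
__PACKAGE__->run(@ARGV) unless caller;

# The "program" part: parse arguments and do the work.
sub run {
    my ( $class, @args ) = @_;
    my $name = @args ? $args[0] : 'world';
    print $class->greeting($name), "\n";
    return 0;
}

# The "module" part: a method other code can call or override in a subclass.
sub greeting {
    my ( $class, $name ) = @_;
    return "Hello, $name!";
}

1;
```

Because the work lives in methods, someone else can subclass the package and override `greeting` without touching the original file.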

The trick, which chromatic is implicitly attempting, is the proper indexing. This is where his CPAN::Dark still needs improvement, and where I think its approach is ultimately flawed. He wraps CPAN::Mini::Inject, which as a module is dumb: you have to tell it everything and it discovers nothing. If you give it the wrong information, it happily uses it. CPAN::Dark, not yet on CPAN, provides some of this information to CPAN::Mini::Inject by looking in the META.yml file of a distribution. He takes the name of the distribution and assumes it's a module name. Instead, he should look in the provides section. How, for instance, would you properly inject libwww-perl? So far, CPAN::Dark assumes that the distribution name is the main module name, and that it is the only name you'll use to install the distribution (and hence all that it provides). The distribution and the modules are separate things. It's a common case that the names overlap, but you shouldn't rely on that. Indeed, the best solution will make as few assumptions as possible.
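To make the libwww-perl problem concrete, here is a small sketch. The hash stands in for whatever a META.yml parser would hand back, the module list is illustrative rather than copied from a real release, and both function names are hypothetical. The dash-to-colons conversion is only my assumption about what a name-based guess looks like.

```perl
use strict;
use warnings;

# Stand-in for parsed META.yml data; the provides entries are illustrative.
my $meta = {
    name     => 'libwww-perl',
    provides => {
        'LWP'            => { file => 'lib/LWP.pm' },
        'LWP::UserAgent' => { file => 'lib/LWP/UserAgent.pm' },
        'LWP::Simple'    => { file => 'lib/LWP/Simple.pm' },
    },
};

# The guess to avoid: treat the distribution name as if it were a module name.
sub guess_module_from_dist {
    my ($dist_name) = @_;
    ( my $module = $dist_name ) =~ s/-/::/g;
    return $module;
}

# Prefer the declared provides section; fall back to the guess only if it is missing.
sub modules_for_index {
    my ($meta) = @_;
    my $provides = $meta->{provides};
    if ( ref $provides eq 'HASH' && %$provides ) {
        my @modules = sort keys %$provides;
        return @modules;
    }
    return guess_module_from_dist( $meta->{name} );
}

print "guessed:  ", guess_module_from_dist( $meta->{name} ), "\n";
print "provided: ", join( ', ', modules_for_index($meta) ), "\n";
```

The guess produces `libwww::perl`, which is not a module anyone can load, while the provides section names the modules people actually ask an installer for. This still trusts the META file, which is its own problem.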

This is why my DPAN stuff doesn't use CPAN::Mini::Inject and doesn't ask any human to supply the information. The customers who paid me to develop DPAN were frustrated with the dumbness of CPAN::Mini::Inject's assumptions and wanted to properly inject distributions without having to know anything and without necessarily trusting the META.* files. For instance, you might forget to include a file in provides, or not want to list it for some reason. Your reasons don't matter to the user who gets the "Can't locate Foo.pm" message, since nothing in the index leads them to the right distribution to install to get the missing module. If you don't index all of the modules, regardless of what the META.yml files say, eventually you will have this problem. That's why my DPAN tool discovers information from the module files instead of relying on or trusting anything else. Even without a META.yml file, DPAN still works. That might matter to you when you want to use an old distribution that is no longer maintained or is no longer even on CPAN.
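The idea of discovering packages from the files themselves can be sketched with a naive scanner. This is not how DPAN or the PAUSE indexer actually does it; the function name, the file contents, and the distribution name are all hypothetical, and a real indexer handles many more cases.

```perl
use strict;
use warnings;

# Hypothetical helper: list the packages declared in a chunk of Perl source.
# It skips POD and stops at __END__ or __DATA__, but is otherwise naive.
sub packages_in_source {
    my ($source) = @_;
    my @packages;
    my $in_pod = 0;
    for my $line ( split /\n/, $source ) {
        last if $line =~ /^__(?:END|DATA)__\s*$/;
        if ( $line =~ /^=cut\b/ )    { $in_pod = 0; next }
        if ( $line =~ /^=[a-zA-Z]/ ) { $in_pod = 1; next }
        next if $in_pod;
        # Matches "package Foo;", "package Foo 1.23;", and "package Foo {"
        if ( $line =~ /^\s*package\s+([A-Za-z_]\w*(?:::\w+)*)\s*(?:[\d._v]+\s*)?[;{]/ ) {
            push @packages, $1;
        }
    }
    return @packages;
}

# Made-up files from a made-up distribution, keyed by path.
my %files = (
    'lib/Foo.pm'        => "package Foo;\nour \$VERSION = '1.00';\n1;\n",
    'lib/Foo/Helper.pm' => "package Foo::Helper;\n1;\n\n=head1 NAME\n\npackage Not::Real;\n\n=cut\n",
);

# Map every discovered package to the distribution that contains it.
my %index;
for my $path ( sort keys %files ) {
    $index{$_} = 'Foo-Bundle-1.00.tar.gz' for packages_in_source( $files{$path} );
}

print "$_ => $index{$_}\n" for sort keys %index;
```

Nothing here consults a META file, so a module left out of provides still ends up in the index, and a "Can't locate Foo/Helper.pm" error can be traced back to a distribution.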

And this reminds me that I should update my scriptdist distribution, which I haven't used in ages now that I make things as modulinos.

7 Comments

I have been working on Dist::Metadata to augment CPAN::Mini::Inject so that it could index more by default.

After a recent change in structure my tests are passing again. Perhaps I should release it so people can start using it.

I had considered adding a "trust_meta" attribute that would likely default to true but could be set to false in the constructor. It sounds like you would support the attribute. Would you default it to false?

I tried using dpan a few months back but there must have been something I didn't understand about using it... but I got CPAN::Mini::Inject to work for me.

Thanks for the bug report. That's one reason CPAN::Dark isn't on the CPAN and might not ever be. I haven't decided if it's better off as a part of CPAN::Mini::Inject or if there are enough similar projects that attempting to simplify one of those would do everyone more good.

modulinos are cool. I do that all the time now as well. :)

Thanks for the post. You reminded me that I wrote this ( https://gist.github.com/670826 ) a longish while ago and made it inject Dist::Zilla-created dist packages posted at a faux PAUSE URL into a CPAN-Mini mirror.

CPAN::ParseDistribution made it pretty easy to automatically pick out those details from the tarball.

I'll have to fold at least the automatic functionality into CMI.

Dist::Metadata uploaded and arriving soon at a mirror near you...

CPAN::Mini::Inject pull request to follow...

That sounds good to me, thanks in advance. :)

One thing though: You don't need to guess what the PAUSE indexer does. You can look at it here: https://github.com/andk/pause/blob/master/cron/mldistwatch

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).