What's wrong with CPAN?

This is an attempt to succinctly list all the different problems perceived with CPAN, and give each of them a name. I'm not trying to propose solutions, or to structure a taxonomy or priority list; this is just data gathering.

Following Brendan's post on the four major problems with CPAN, and other recent posts, I've been thinking a lot about the general topic, and clearly so have many others. I thought it might be useful to try to come up with a shared list of "what's wrong with CPAN", so we have common terms of reference. Giving things a label helps: for example, Yanick and Tim have both proposed solutions for the best fit problem.

Here's a first cut, from trawling various blog posts, comments, and personal mulling. A blog isn't the best format for this sort of exercise, so I've created a Google doc, which you can all edit. I'll sync any updates in both directions.

I'm deliberately not describing possible solutions here, just shooting for a succinct list. There will be overlap, and some that are really just symptoms of others. You may not agree with some, but at this point I'm hoping more for the union than intersection. I'd be happy to hear better names for some of these too!

  • Discoverability: It's hard to find all the modules that are relevant for a particular need.
  • Whipupitude: People don't look for existing competing modules before releasing a new module, as it's often quicker to just write the module you need than to find one (or people do look, but miss some of them because of the discoverability problem).
  • Best fit: Hard to tell which module is the "right one" for a need.
    Yanick on alternative dist recommendations; TIMB on suggested alternatives.
  • Me too: When people perceive a problem with an existing module on CPAN, they're often quicker to release their own module than to work on "fixing" the existing one. This may be down to the orphan, communication, or resistance to change problems.
  • Orphans: Too many modules are essentially unmaintained, but not explicitly so.
    brian d foy on ADOPTME user.
  • Communication: Contacting authors of modules is sometimes near impossible if they've changed email address but not updated PAUSE. This is different from the Orphans problem: I've had situations where, once I eventually tracked down the author, they were happy to give me co-maint.
  • Barrier to contribution: It can be hard (or at least feel hard) to contribute to a module, and it feels even more so if you just want to submit a small change (e.g. a documentation fix). [annocpan]
  • Maturity: Hard to identify the maturity of a module (and maturity is in the eye of the beholder). There can be massive differences in performance, but you've no way of telling that unless you benchmark all the alternatives, with representative data.
  • Resistance to change: It's hard, or at least feels hard, to rename or delete a module, partly because of the unknown usage problem.
  • Unknown usage: There's no way to really know whether your CPAN module is being used, unless it's used by other CPAN modules or someone emails you.
  • Land-grab: Namespaces are first-come, and once you've grabbed a namespace, it's essentially forever.
  • Convergence: multiple solutions to a problem can inspire new approaches, but there's not much force for convergence, and attempts at convergence can often just add one more module to the mix.
  • Duplicated dependencies: If you use a number of dists from CPAN in the same app, you can easily pull in multiple implementations of the same thing, e.g. base vs parent vs superclass, exception handling, constants, OO frameworks, etc. You might argue this is a Perl problem, not a CPAN problem :-)
  • Private: Many modules are used mainly / only for personal / semi-private needs.

Added

  • Poor design: Many modules are poorly designed or over-engineered, solving problems in ways specific to the author's needs that hinder more general reuse (thus prompting fragmentation).
  • Dependency indifference: not caring about what your dependencies are, and as a result how many modules might be ultimately pulled in by someone using your module.
  • Community tools: lack of community tools and therefore engagement, reviewing etc.
  • Lack of cooperation: Besides a few large / well-known dists, most modules are developed by a single author with possibly a small number of contribut{ers,ions}. (As of November 2012, 79% of modules had a single PAUSE id against them; another 10% had an owner and 1 co-maint.)
  • Contribution undervalued: Uploading your own module is often given disproportionately higher value than contributing to someone else's module (fixing a bug, submitting an addition, writing tests, documentation, etc).

What else?

If it's not clear, I'm not trying to say that CPAN's shit: most of these problems are born out of CPAN's success.

24 Comments

  • Poor design: Many modules are poorly designed or over-engineered, solving problems in a specific way for the author that hinder more general reuse (thus prompting fragmentation)

Side note -- I'm not sure private modules are really a problem. Personal Task modules act like a curated list of good modules. Even Dzil author bundles show how to do things and make it easier for people to contribute to a project. The most useless offenders are the Acme::AUTHOR ones, and those are part of an explicit training program to get people over the barrier of contributing.

Whipupitude: Your "SEE ALSO" section should read like the "prior work" in an academic paper. It should list similar things, and explain why yours is different or better.

Poor design: Too often these are ideological problems, so it's not worth worrying about them. One man's "over-engineered" is another's "properly Object-Oriented."

Duplicated dependencies: IMHO this is a "not caring about my dependencies" problem. Authors should be aware of what they force people to (transitively) install.

I think it would be a big improvement to CPAN if there was some objective way to know whether one's modules were being downloaded or used. For example, it would be useful to know the pageviews of the various documentation pages. My guess is that a lot of modules get abandoned before they are developed into something useful, because the author never receives any kind of feedback or encouragement. Once the module is uploaded, who knows what happens to it?

I would add Lack of community tools and therefore engagement, reviewing, etc. Of course there is a review system on CPAN, but it is unpractical and obsolete. As a result, even a widespread module such as LWP has had only 8 reviews in its many years of existence.

If we compare with Github, we can see all the features that we can desire: starring modules, following people, forking, issues (open and closed) -- all these provide very nice intuitive metrics when examining a new module.

One of the most valuable features in my eyes is the Pull Request. Much better than reporting a bug: suggest a solution.

By the way, I see that more and more CPAN authors do have a parallel Github repository. Why not encourage this movement and use Github as the next CPAN? (or should we reinvent the wheel?).

Neil, I am not sure whether these are on a par with your points, or whether they are themselves reasons that cause the problems.

Lack of cooperation. IMHO, besides a few relatively large distributions, most have only one author/maintainer with a few contributors.

Lack of perceived value in contribution: I think there is a feeling that "I have uploaded my own module to CPAN" has disproportionately more value than "I fixed a bug in a CPAN module", even if "my module" is not used by anyone and the bug I fixed was in a module used by millions (e.g. DBI).

IMHO these all lead people to prefer to upload a new module instead of fixing an existing one.

Lack of usage statistics
You can see how many and which other CPAN modules use a certain module, but you don't have any indication what other open source projects (that are not on CPAN) and what corporate code uses specific modules.

Convergence reminds me of Standards [xkcd].

Some of these are comparatively easy problems to solve.

Duplicated dependencies: release a trial version; use the CPAN dependencies tool to see a recursive list of your project's dependencies; and note down any duplicates. For example, if your project directly depends on Moo and Catalyst, you might notice that Catalyst already uses Moose, so you could switch from Moo to Moose and drop one direct and a couple of indirect dependencies. This is something that needs to be done on a per-project basis.
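The manual check described above can be automated with a short graph walk. This is a minimal sketch, assuming you already have each distribution's direct prerequisites as a map (for example, exported from a dependency tool); the `DEPS` and `ALTERNATIVES` data below are hypothetical and heavily simplified, not real CPAN metadata, and deciding which modules count as "the same thing" still needs human judgement.

```python
from collections import deque

# Hypothetical, simplified prerequisite map: dist -> direct prerequisites.
# Not real CPAN data; real dependency trees are much larger.
DEPS = {
    "MyApp": ["Moo", "Catalyst"],
    "Catalyst": ["Moose", "Plack"],
    "Moose": ["Class::Load"],
    "Moo": ["Class::Method::Modifiers"],
    "Plack": [],
    "Class::Load": [],
    "Class::Method::Modifiers": [],
}

# Hypothetical grouping of modules that do the same job.
ALTERNATIVES = {
    "OO framework": {"Moo", "Moose", "Mouse"},
}


def transitive_deps(root, deps):
    """Return every dist reachable from root (excluding root itself)."""
    seen = set()
    queue = deque(deps.get(root, []))
    while queue:
        dist = queue.popleft()
        if dist in seen:
            continue
        seen.add(dist)
        queue.extend(deps.get(dist, []))
    return seen


def duplicated_roles(root, deps, alternatives):
    """Map each role to the alternatives that all end up installed."""
    installed = transitive_deps(root, deps)
    report = {}
    for role, members in alternatives.items():
        found = sorted(members & installed)
        if len(found) > 1:
            report[role] = found
    return report


if __name__ == "__main__":
    print(duplicated_roles("MyApp", DEPS, ALTERNATIVES))
```

With these toy inputs the report flags Moo and Moose as two OO frameworks in the same install; replacing Moo with Moose in `MyApp`'s direct prerequisites makes the report empty.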

Barrier to contribution: many projects (CPAN projects, non-CPAN projects, and non-Perl projects too!) distribute a file called CONTRIBUTING in the root directory of their tarball. Neither search.cpan.org nor metacpan.org displays this file in its list of documentation; you need to browse the tarball contents to find it. metacpan.org should display this file on a distribution's front page. Ideally it should also display a "contributing" link in the sidebar, which links to the CONTRIBUTING file if it exists, and otherwise to a page containing general advice about how to contribute to CPAN projects.
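The fallback logic proposed above is simple to express. This is a minimal sketch, assuming the tarball unpacks into a single top-level directory as CPAN dists usually do; `GENERIC_ADVICE_URL` is a hypothetical placeholder, not an existing MetaCPAN page, and this is not how metacpan.org is actually implemented.

```python
import tarfile

# Hypothetical placeholder for a general "how to contribute" page.
GENERIC_ADVICE_URL = "/how-to-contribute"


def find_contributing(tarball_path):
    """Return the archive path of a top-level CONTRIBUTING file, or None.

    CPAN tarballs normally unpack into one directory such as Foo-Bar-1.23/,
    so "root" here means directly inside that directory.
    """
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            parts = name.split("/")
            # Accept CONTRIBUTING, CONTRIBUTING.md, CONTRIBUTING.pod, etc.
            if len(parts) == 2 and parts[1].split(".")[0] == "CONTRIBUTING":
                return member.name
    return None


def contributing_link(tarball_path):
    """The dist's own CONTRIBUTING file if present, else generic advice."""
    path = find_contributing(tarball_path)
    return path if path is not None else GENERIC_ADVICE_URL
```

A CONTRIBUTING file deeper in the tree (say under lib/) is deliberately ignored, matching the "root directory" convention described above.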

Whipupitude: what educated_foo said! (I can't believe I just typed that.) Also SEE ALSO should contain links to modules that are not alternatives but natural partners to the current module. For example, a parser for a particular file format should link to a serialiser for the same format. Ideally the documentation should give clear examples using the partner modules hand-in-hand - in pod, or by distributing a separate "examples" directory. And metacpan should accept pull req 741 :-)

Your "lack of cooperation" point made me think of another metric that could be used for measuring relative "quality" (discussed frequently as a problem being worked on from many angles): a "cooperation index" can be assigned to an author based on how many dists they have with no other maintainers, or how many other maintainers are on dists they comaint.

This is one metric where I'd score quite highly, as I'm comaint on a vast number of dists, a large number of which have 5+ other comaintainers (yes, I've been "rafl"ed) ;) -- and fewer than 1% of my dists have no other maintainers at all.

I don't know how useful that is as a metric. libwww-perl for example has two maintainers on PAUSE (one of whom seems to be a dummy user), but the latest release alone includes patches from seven people, including yourself.

DBI is similar - two maintainers, but tonnes of contributors.

App::cpanminus, Dist::Zilla and Dancer2 each seem to have good communities of contributors behind them, but each only has a single PAUSE maintainer.

Measuring cooperation in terms of co-maint status seems dubious. The number of people who can release a module to CPAN does not really reflect the number of contributors.

For modules with GitHub repos, reporting the number of stars and forks on MetaCPAN would be useful. It already reports open issues, so adding these shouldn't be too much of a stretch, @oalders?
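For reference, these counters are exposed by GitHub's public REST API: `GET /repos/{owner}/{repo}` returns `stargazers_count`, `forks_count` and `open_issues_count`. The sketch below only shows where the numbers come from; `fetch_repo` and `repo_summary` are hypothetical helper names, and nothing here reflects how metacpan-web would actually wire it in.

```python
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}"


def fetch_repo(owner, repo):
    """Fetch repository metadata from GitHub's REST API.

    Unauthenticated, so subject to GitHub's low anonymous rate limit.
    """
    req = urllib.request.Request(
        API.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def repo_summary(data):
    """Pick out the counters that could be shown next to a dist."""
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        # GitHub's open_issues_count also includes open pull requests.
        "open_issues": data["open_issues_count"],
    }
```

Usage would be `repo_summary(fetch_repo(owner, repo))`, with the owner and repository name taken from the dist's repository metadata.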

> Of course there is a review system on CPAN, but it is unpractical and obsolete.

I completely agree. I don't think I've ever submitted a review for a module. I should, but I don't. I'm not really sure why.

Why do you think the review system is "unpractical and obsolete"?

> Lack of usage statistics

That is something that I have planned for Stratopan. I'm working on computing a score for each module based on its location in the dependency tree. Modules at the top of the tree (i.e. those your app depends on directly) get the most weight. As you go further down the tree, modules get less and less weight.

Assuming that I can get a sufficient number of users on Stratopan, the score might be a reasonable indicator of how many people actually use a module.
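The comment doesn't say how the weights fall off with depth, so the sketch below simply assumes a geometric decay (weight 1.0 for a direct dependency, halved at each level below) and keeps each module's shallowest position within an app. The function names and the `decay` parameter are illustrative, not Stratopan's actual scoring.

```python
from collections import deque


def depth_weights(direct_deps, deps, decay=0.5):
    """Weight every module in one app's dependency tree by its depth.

    Direct dependencies (depth 1) get weight 1.0; each level further down
    multiplies the weight by `decay`. A module reachable by several paths
    keeps its shallowest (heaviest) position.
    """
    weights = {}
    queue = deque((dist, 1) for dist in direct_deps)
    while queue:
        dist, depth = queue.popleft()
        if dist in weights:
            continue  # breadth-first, so the first visit is the minimum depth
        weights[dist] = decay ** (depth - 1)
        queue.extend((child, depth + 1) for child in deps.get(dist, []))
    return weights


def usage_scores(apps, deps, decay=0.5):
    """Sum each module's weight across every app's dependency tree."""
    totals = {}
    for direct in apps.values():
        for dist, weight in depth_weights(direct, deps, decay).items():
            totals[dist] = totals.get(dist, 0.0) + weight
    return totals
```

Taking the shallowest position avoids double-counting a module that an app both uses directly and pulls in indirectly.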

> what educated_foo said! (I can't believe I just typed that.)

Even a stopped clock... Frighteningly, I found myself agreeing with you in another thread.

> I don't think I've ever submitted a review for a module. I should, but I don't. I'm not really sure why.

I'm a bit stumped as well. For some reason, people will buy something on Amazon, get it in the mail a few days later, use it for a while, then revisit the site and leave a review. But they won't do the same for a module from CPAN. A few possible reasons:

* the module was pulled in as a dependency, so they don't even know they have it.
* reviewing the module requires using a different interface from installing it.
* since they didn't pay for it, they are less motivated to encourage or discourage others from making the same choice.

Maybe something like Debian's popcon would be useful. I have one module packaged for Debian by some kind soul, and I know that maybe 8-10 people use it thanks to popcon.

> I completely agree. I don't think I've ever submitted a review for a module. I should, but I don't. I'm not really sure why.

JFDI? :-)

> Why do you think the review system is "unpractical and obsolete"?

Hard to say, but the fact is that almost nobody uses it. I can still see a few points that are annoying.

- The login system (Bitcard) is too restrictive. One should be able to log in with a Twitter, Google or Github account.

- The starring system is probably too detailed. Nowadays people don't seem to take the time: you just RT, Like, or Star (as on Github).

- The text field for the review should be optional (again: people don't take the time). You should be able to give stars without writing a textual comment; it's still useful information.

Well, I don't want to be a pain, but Github simply works ;-)

One thing is that module owners should be able, and encouraged, to deprecate their own modules. There are quite a few examples where a module is flavour of the month for a while, and is then superseded by a subtly better one that the community gets behind. I'm no longer even certain where the consensus lies on Mouse, Moo, Mo and Any::Moose. (Actually, I see that http://search.cpan.org/~sartak/Any-Moose-0.21/ has hacked around this problem. That is what we need to see, possibly without having to make a release!)

So let a module owner mark it as "deprecated" and possibly to "recommend instead" another module.

This module has been deprecated by its author. The author recommends "Foo::Tiny" instead.
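If the flag lived in a distribution's metadata, rendering such a notice could be as simple as the sketch below. The key names `x_deprecated` and `x_recommend_instead` are assumptions for illustration (the CPAN::Meta spec reserves `x_`-prefixed keys for custom fields), not an agreed convention. Note that metadata travels with a release, so this approach still needs one, which the comment hoped to avoid.

```python
def deprecation_notice(meta):
    """Build a deprecation banner from a dist's parsed META data, or None.

    Assumes two hypothetical custom keys:
      x_deprecated         -- true if the author has deprecated the dist
      x_recommend_instead  -- optional name of the suggested replacement
    """
    if not meta.get("x_deprecated"):
        return None
    notice = "This module has been deprecated by its author."
    replacement = meta.get("x_recommend_instead")
    if replacement:
        notice += f' The author recommends "{replacement}" instead.'
    return notice
```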

I think the module information could be crowd-sourced. Users (using Google/Twitter OpenID/OAuth accounts) log in to the system, and post annotations to distributions:
- reviews (using cpanratings)
- bug reports (using rt)
- annotations (using annocpan)
- links to relevant questions on trusted forums like Stack Exchange, or mailing list archives (some of this might be machine-generatable)
- similar modules

Other users can vote up/down on these annotations.
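A minimal data model for this proposal might look like the sketch below. It assumes one vote per user per annotation and ranks by net score; the class and field names are hypothetical, and login, storage and importing existing cpanratings/RT/AnnoCPAN data are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One crowd-sourced note attached to a distribution."""
    dist: str
    kind: str  # e.g. "review", "bug", "annotation", "link", "similar"
    text: str
    votes: dict = field(default_factory=dict)  # user id -> +1 or -1

    def vote(self, user, up=True):
        # One vote per user; voting again replaces the earlier vote.
        self.votes[user] = 1 if up else -1

    @property
    def score(self):
        return sum(self.votes.values())


def ranked(annotations, dist):
    """Annotations for `dist`, highest net score first."""
    return sorted((a for a in annotations if a.dist == dist),
                  key=lambda a: a.score, reverse=True)
```

Keying votes by user means repeated clicks from one account cannot inflate a score.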

Not a stretch at all. That's a feature request that would belong in metacpan-web. :)

Another metric of the "CPAN community engagement" would be some statistics on rt.cpan.org tickets.

Check this gist for some tools to query RT.

There are quite a lot of modules on CPAN which are over ten years old and which haven't seen an update in that time. That doesn't mean that they need an update, but it does show that the timelines are quite long.

I think it would be worth adding a heartbeat function. Every user who maintains at least one module should get an email once per year asking them to confirm their details. (That's one per author per year, not one per author per module per year.)

If they fail to reply for five years then they should be removed from the list of maintainers of their modules. Any modules left with no maintainers should be marked as "adopt me".

That will also help people who want to be more proactive in taking over modules. They will be able to see that the author has been uncontactable for three years (or whatever it is) when they start the official process of broadcasting that they would like to take over a module.
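The policy in this comment translates fairly directly into code. A minimal sketch, assuming confirmation dates are already recorded somewhere; sending the yearly emails and updating PAUSE permissions are out of scope, and the function and parameter names are hypothetical. The grace period defaults to the five years proposed above.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def heartbeat_sweep(last_confirmed, maintainers, today, grace_years=5):
    """Drop lapsed authors and list dists left with no maintainer.

    last_confirmed: PAUSE ID -> date the author last confirmed their details
    maintainers:    dist name -> set of PAUSE IDs
    Returns (updated maintainers map, sorted dists to mark "adopt me").
    """
    cutoff = years_before(today, grace_years)
    # Authors with no recorded confirmation are left alone here; a real
    # policy would need to decide how to treat them.
    lapsed = {pid for pid, seen in last_confirmed.items() if seen < cutoff}
    updated = {dist: set(ids) - lapsed for dist, ids in maintainers.items()}
    adopt_me = sorted(dist for dist, ids in updated.items() if not ids)
    return updated, adopt_me
```

Because the sweep works per author rather than per module, it matches the "one email per author per year" idea.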


About Neil Bowers

Perl hacker since 1992.