Don't release experiments to CPAN

I'm proposing an explicit community convention where experimental code isn't released to CPAN, but is shared on github, perhaps with an associated blog post, or discussion on PrePAN.

This addresses just one of CPAN's problems, which have also been raised today by Brendan Byrd.

Motivation

There are many modules on CPAN which appear to be the result of some experimentation. Once the author has demonstrated their point, (s)he loses interest, and the module lurks on CPAN, waiting to catch out the unwary.

I've reviewed a number of module categories now, and in a number of them I've hit such experiments. I've exchanged email with some of the authors, who've admitted the heritage, and often commented "oh, I forgot about that module".

The problem with these modules is that they just reduce the signal-to-noise ratio of CPAN, and make life harder for users, particularly those new to Perl and CPAN. Consider the following:

  • If you want to define some constants, there are 21 modules.
  • If you want to perform some kind of run-time loading, for example of plugins, there are at least 43 modules you might consider, for about a dozen use cases.
  • If you want a module to find dependencies of your code there are 26 modules, but only about 5 distinct use cases.

In all of these categories there are modules that were experiments that never went anywhere.

Let me be clear: I don't want to suppress experimentation - it's a key contributor to progress. I imagine Moose started off as a wee experiment.

Proposal

If you've been experimenting and would like to share, either (a) to show "hey, look what I did", or (b) because you're just not sure whether anyone else would use it:

  • Put the module on github. Sure, there are other places you could put it, but github is fairly well linked into the Perl ecosphere, and cpanminus 1.6+ can install modules direct from github:
            cpanm git://github.com/miyagawa/Plack.git
  • Write a blog post about it. If you don't have a blog, or your blog doesn't have a wide readership, then consider posting to blogs.perl.org as well. If you link to your blog, you might even gain some more readers.
  • Discuss your module on PrePAN.
  • If there are existing modules on CPAN which are similar, you could email their authors, asking whether they might link to your github repository in the SEE ALSO section of their doc. Ok, I admit your success rate might not be good here.
  • You could annotate these modules on annocpan.

But if many modules start as experiments, how do you decide whether / when to release to CPAN? Just apply some common sense. For example, if you or anyone else starts relying on the module, then it's time for CPAN.

How might this idea evolve?

  • If the name weren't already taken, PrePAN could be a place for uploading experimental modules. Maybe PrePAN could evolve to become that as well?
  • If MetaCPAN indexed all perl dists on github, it might not include them in search results by default, but could say "N additional things on github matched your query, click here to include them".

You could also consider deleting any experiments you already have on CPAN: cpanminus can install from backpan. Check with the reverse dependencies service first. Deleting modules from CPAN is worthy of a separate post, but Brendan Byrd might beat me to it.

Revisited, 2 days later

I think I should have more clearly defined what I mean(t) by experiment!

There are a number of situations when I'd consider a module to be an experiment, but the classic example (for me) is where you're writing a module with no intention to use it in any real code. This might be to see whether something is possible, possibly trying to (ab)use Perl in some unexpected way. Such an experiment may obviously lead to something unexpected and useful.

Another category, but less clear-cut for me, is when you're creating something, but you're not sure exactly what it is you're creating, and whether there might be a module on CPAN already. Often the namespace(s) will change, and you may drop it anyway. Typically at this stage I just don't share it, because I don't want to worry about namespaces (even though you can free them after you rename, I know), but I've had a couple of cases where people wanted to play with the code anyway. See below.

Some examples might clarify, and might help me refine what the hell I'm talking / thinking about!

Not experiments

I do not see the following as experiments:
  • Someone new to Perl writes a module which they're using at work. They proudly upload it to CPAN. They might have no idea whether it would be of interest to anyone else.
    As an aside, one problem (with the current toolchain) is that authors aren't encouraged / helped to find modules on CPAN that might serve their needs, or which they might be able to take over and evolve (commented on by brian d foy in a comment on Brendan Byrd's post on problems with CPAN). A topic for another day.
  • Karen's post describing a dev release of Test::Warnings. Karen describes the implementation as experimental, but I don't see the module as an experiment — it's clearly written to meet a need, and will be used.
  • Someone writes a module that's a complete hack (deadlines, we've all been there), but which they imagine they'll probably get around to doing a 'proper' version of eventually.
    Aside: it's not very easy to tag your dist with maturity, as others have noted. You can't just look at the version: you'd miss Net::HTTP::Tiny 0.001. 10.7% of dists on CPAN have version 0.01 or 0.001. And you can't just look at reverse dependencies: Net::HTTP::Tiny has none — I tend to use it in some of my scripts, and HTTP::Tiny in modules.
  • Someone has a (possibly slightly crazy) idea for a module, which addresses something they see / have as a very real need, and which they think might pan out. I think this comes down to personal definition of experimental, and how you like to play that out, but if there are others already interested in joining in, then I'd always err on the side of CPAN.

Experiments

Where you draw the line between experimental and not is a personal call. From now on before I upload a new module (and as you can see, it's not something I've done many times), I'll just ask myself whether this is experimental, by my personal definition. If so, I'll release it to GitHub, and possibly describe it in a blog post. And if someone else starts using it, I'll put it on CPAN.

I consider the following to be experiments:

  • I was looking for a tool to graph dependencies, and started searching for modules. I didn't find anything that met my needs, so knocked up a module. I'd already found a handful of modules, so started writing a review, and decided I wouldn't put my module on CPAN until I finished the review, in case I found a module I was essentially duplicating. As I progressed I kept finding more modules, and others pointed out modules in namespaces I'd not even considered. Still I found nothing in direct competition with my module. But the 23rd module was, and so I may not ever release my module, but either submit changes, or refactor it as a helper module. If I find myself in this situation again, I'll put it on GitHub.
  • I have various modules and scripts I use when writing reviews. At some point I plan on releasing them to CPAN, but until recently, I seemed to tear them apart every time I wrote a review, including a change of namespace. I'm hoping to stabilise them soon, and will then put them on GitHub, and then ask Leo (and anyone else interested) to comment on them. I probably won't put them on CPAN unless / until someone else writes a review using them, which may be never, and that's fine.
  • I suspect that Module::Hash is an experiment. Someone new to Perl might come across this module and think "This Toby guy seems to know what he's doing, the module is recent, it's well documented, so maybe this is the modern/new way to load modules at runtime". To me this seems like a good candidate for "GitHub and blog post", and I can see the blog post generating comments, which may in turn lead to a CPAN release, or not. But if it doesn't meet Toby's definition of experiment, then fine, and my apologies to Toby.
  • A number of modules in the Acme namespace :-)

Furthermore, I'm not imagining that such code would be forever banished to GitHub, never allowed to sully CPAN. Once code isn't an experiment, then I see it being uploaded to CPAN as well.

My process

This "experiments on GitHub not CPAN" idea addresses just one small part of the "problems with CPAN"; I was thinking about a much smaller percentage of CPAN modules than many of you seemed to think. Mea culpa.

My personal process when creating modules will now be something like:

  • Put the module on GitHub.
  • Possibly register the namespace. I haven't always done this, but when I have, I've had thoughtful and helpful comments.
  • Search for similar modules, and link to them in the SEE ALSO section.
  • If there's an existing module close enough, see if I can contribute to that rather than release my module, or if it's gone stale, whether I can take it over.
  • If it's not experimental, and I got to this point, then release to CPAN.
  • Possibly write a blog post on it.

Most (if not all?) of the people who've commented on this, and related posts, are not part of the demographic I'm worrying about (read: screw the lot of you! ;-). That demographic is new or casual Perl programmers and CPAN users. CPAN is currently a seriously sub-optimal experience for such users.

A few years ago I was a born-again Perl newbie, and often when I turned to CPAN for "a module to do X", I'd find a handful of modules and no easy way to determine which was the right one to use. I decided I'd do a quick (ha!) review whenever I hit this, so (a) I'd make an informed decision, (b) it might help others, and (c) the peanut gallery might point out gaps / flaws in my reviews, and improve the end result. After doing a few reviews, I gave a talk on CPAN Curation at LPW 2011, where I listed some of the problems I saw with CPAN, and thoughts for how they might be addressed. I'll revisit that in a separate post, as I've ended up thinking about it a lot over the last 2 or 3 days...

21 Comments

Issues with this approach:
1) If you improve the infrastructure (cpanm, metacpan) support of github modules, you get the same problem again; and if you don't improve it, you put experimental module users at a disadvantage ("author has forgotten or is too shy to upload to the big CPAN" can be an issue too!)
2) There are a lot of experimental / low-quality modules on CPAN already. Cleaning them all up will take a lot of work, and also can lead to ugly conflicts.
3) Where will you draw the line?

I think this is the question of better search ranking.
Do we have enough signal (+1's, usage stats, keywords) to improve it?

Maybe we should add an "experimental" flag to META.(yml|json), penalize such distros in search results, and attach a big red "experimental" badge to them?

Not many people are responding on PrePAN at the moment; perhaps it would be nice if some of the people submitting their modules got a comment or two.

One reason to upload "experiments" to CPAN is to get them tested by CPAN Testers.

Because everything on CPAN is open source, it's possible to fork the modules.

There's already a "maturity" flag on CPAN modules. Does anything use that?

I feel so strongly about this issue, I posted on my blog in response:

CPAN is for experimentation and I hope that never changes.

If it's experimental, I put EXPERIMENTAL (in all caps) in the =head1 NAME abstract section - i.e.

=head1 NAME

Module::Name - EXPERIMENTAL frobnicator

and I think that's useful, just like Net::IRC having 'DEAD SINCE 2004' in the same space is.

If it's more experimental than that, I make it dev release only.

But I still put it on CPAN, because CPAN is how we share our work. Devel::Declare went up onto there just so that a few people could install it to try and those people (a) wrote the docs (b) rewrote most of the code.

I wouldn't want to discourage that.
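
The "EXPERIMENTAL in the NAME abstract" convention described in this comment is easy for a tool to pick up. As a rough sketch only: the two shell functions below pull the abstract line out of a module's POD and test it for the marker. The function names pod_abstract and is_flagged_experimental are invented for this example, and the parsing assumes the abstract is the first non-blank line after =head1 NAME.

```shell
#!/bin/sh
# Hypothetical helper: print the abstract line of a module's "=head1 NAME" section.
pod_abstract() {
    awk '
        /^=head1 +NAME/ { in_name = 1; next }   # entered the NAME section
        /^=/            { in_name = 0 }          # any other POD command ends it
        in_name && NF   { print; exit }          # first non-blank line is the abstract
    ' "$1"
}

# Hypothetical helper: succeed if the abstract carries the EXPERIMENTAL marker.
is_flagged_experimental() {
    pod_abstract "$1" | grep -q 'EXPERIMENTAL'
}
```

Something like `is_flagged_experimental lib/Module/Name.pm && echo "flagged"` would then report modules that follow the convention; modules that don't use it simply won't match.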

I meant the D field of the DSLIP. Maybe they're the same.

The problem is that the Perl toolchain isn't really geared up for tracking dependencies on non-CPAN projects. And without that people will be reluctant to depend on non-CPAN projects. Do you think Moose/Class::MOP would have emerged out of its experimental phase without people releasing useful projects that depend on it? The way projects go from experimental to stable is through usage and feedback.

OK, so you could change the entire Perl toolchain to install and track GitHub projects as easily as it does CPAN projects. But then GitHub effectively becomes part of CPAN, so experimental projects are "on CPAN" again.

It would be better to have a clear status flag beside each distribution. (Like DSLIP but actually used!) Tools like http://deps.cpantesters.org/ could then show you which experimental modules you are relying on. People aiming to write stable releases could avoid non-stable deps, the same way people writing "tiny" releases already avoid non-core deps.

> People aiming to write stable releases could avoid non-stable deps, the same way people writing "tiny" releases already avoid non-core deps
This sounds like a good idea. One place to start would be to add DSLIP filtering to MetaCPAN, search.cpan.org, and the various command-line clients. This would have the added benefit of encouraging module authors to pay attention to (and, yes, possibly lie about) the various fields.

I'm reminded of the fact that one can register a namespace on PAUSE (although many people do not bother to register anything -- after all, what is the benefit currently?), and it has various categories such as "stable", "experimental", etc. I would propose that we make wider use of these categories, e.g. making them visible on metacpan.org.

It would be one more way of determining "code quality" and "suitability for use", which I hear a number of people are working on from different angles.

Why not simply ask people to submit experimental modules under development version numbers? This seems to me (after a grand total of maybe five minutes' thought) to have two advantages over the original proposal:

  1. The modules get tested by CPANTESTERS
  2. The modules can be installed using only core tools

Additionally, Module::Install-based modules can be installed from CPAN without further tweaking (see The Main Problem With CPAN Modules On Github).

Yes, the modules will still be in CPAN, but they will not be indexed, and I would think that anyone searching for a module and finding a distribution with only development releases will realize they are looking at something that is, in the opinion of the author, not ready for prime time.
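
To make the "development version" idea concrete: by PAUSE convention, a release whose version contains an underscore (for example 0.01_01), or whose filename carries a -TRIAL suffix, is treated as a developer release and is left out of the index. The shell sketch below checks a version string for either marker. The function name is_dev_release is made up for this illustration, and the check is deliberately simplistic.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the version part of a
# distribution filename marks it as a developer release.
is_dev_release() {
    case "$1" in
        *_*|*-TRIAL*) return 0 ;;   # underscore or -TRIAL: developer release
        *)            return 1 ;;   # anything else: normal, indexed release
    esac
}

# Classify a few sample version strings.
for v in 0.01 0.01_01 1.002-TRIAL; do
    if is_dev_release "$v"; then
        echo "$v: developer release"
    else
        echo "$v: normal release"
    fi
done
```

The function is meant to be given only the version part; passing a whole distribution name would misfire on names that themselves contain an underscore.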

I agree with David. I release experiments to CPAN. Sometimes I don't do anything with it anymore, but someone else might. Because letting code rot in a corner is a waste of effort.

That said, most of my experiments are fairly unique snowflakes that don't already have a dozen implementations, they tend to be the more interesting ones.

Neil, thanks for clarifying.

Personally, I think either dev versions or Acme are good places to put experimental code that one isn't sure has a long-term future, but that one wants to get exposed to the community.

For example, when I wrote Acme::Module::Build::Tiny, I put it in Acme because while it worked, it was a crazy-ass experiment to do most of Module::Build in 3% of the lines of code. Later, Leon took the idea and turned it into something actually useful with Module::Build::Tiny.

Acme has the advantage that it's indexed and easily shared.

In the medium-term, I hope to help the toolchain support alternate indexes, so you could select the "unstable" toolchain and easily try out modules, which might make releasing dev versions or "-TRIAL" versions more appealing to authors for experimental code.

Thank you for the clarification. I have no particular confidence that I could apply your distinction between an experimental module and an experimental implementation to an arbitrary module -- which is not to find fault with you. The fault (if any) may lie with me, or it may simply be the nature of the beast. And you yourself said in your clarification that the distinction was to an extent personal (or at least, so I understood you). But I fear that unless the distinction is seen the same way by a majority of CPAN authors, any attempt to achieve something sweeping in this way may be doomed.

I was unaware that name spaces were allocated only on production releases, but that seems to me to be an advantage, not a disadvantage. I have great faith in my ability to name a module badly, and if I am not sure what it should do or how it should do it, coming up with a good name is even more chancy. The fact that the name space is not allocated makes a development-version-only module even more like a module released only to Github.

I think you are right that we need better ways to distinguish the wheat from the chaff. Or at least we need to make better use of the ways we have (CPAN ratings, forums, etc.)

Guys, I think what Neil is talking about here is the modules we've all run across before: modules that don't even have any code in them, or only pseudo-code, or some quickly hacked code with no tests--this sort of thing. The modules that are really nothing more than placeholders for nifty ideas that may or may not ever go anywhere. But, in the meantime, they use up names and confuse n00bs.

mst says:

> ... because CPAN is how we share our work.

This is the heart of the matter right here. Some people want CPAN to be a place where we're all slinging code around to each other and everything is fast and loose. Others want CPAN to be the place where settled, mature modules live and can be downloaded for immediate use. Those two goals are slightly at odds, and, as long as CPAN is trying to be both, it's always going to be unsatisfying for large swaths of the expectations.

I read Neil's post as saying, "look GitHub is a better place to share code that's still undergoing rapid changes." Which, honestly, it is; not leastwise because it's a hell of a lot easier for new people to get a module up on GitHub than on CPAN. But that goes against what a lot of people expect out of CPAN.

In a perfect world, I think I'd love to have one of each. Two code repositories that allow free movement back and forth, possibly based on community consensus or voting or something like that. But I don't know that that's practical at this point in CPAN's evolution.

If you want this, first make the guys writing books about perl not tell new developers to submit their code to CPAN; I've read 2 such books. I am sure there are more books out there saying the same thing.
I may be confused; however, I was under the impression that Acme::yourname:: existed for that reason: to separate experimental/developmental/unfinished work from the "real*" modules.

I am also unclear on your reasoning for the problem. Is it:
1) Because you don't like how long the module list is with the extra entries?
2) The submissions of Acme:: and related modules are NOT going into Acme:: like they should.
3) Bandwidth or storage space problem for the administration.
4) They are annoying to look at when looking for "real*" modules.

If it is just the 'noise' of looking at Acme::yourname modules, or the annoyance thereof; then should there not be some sort of filter put in place? (more than what exists today) This would be better than to just annihilate the whole category.
If it is the problem of Acme::yourname:: not being used when it should be, then some new policies may be in order.
If it is the problem of bandwidth, perhaps more mirrors are in order, or stricter checks for duplicate or redundant data. Also checking for data that is not to be uploaded. I realize this is being done to a certain degree already, but there is always more that can be done. I would gladly volunteer my time for such a purpose.
To banish experimentation to some other place when it has been done this way for years is something you can only wish for, it's like wishing for developers to all test their modules correctly, it will never happen. There are deadlines, workloads, and time itself to consider.
To quote Buddy Burden's comment, "In a perfect world, I think I'd love to have one of each" of those.
You are always going to have people that will color outside the lines. I could go on and on but I must cut this short, if you require more information from me, don't hesitate to write me an e-mail message.

* The term, '"real" modules' means something different to each of us. Because of this dynamic, it is not possible to please everyone, all of the time.

About Neil Bowers

Perl hacker since 1992.