Identifying CPAN distributions you could help out with

The other day Andy Lester posed the question "Where can someone find Perl modules to contribute to?" My first answer was to look at the dists with the most bugs. I kept thinking about it, wondering how you could identify a module that is ripe for help.

This post outlines my next idea and lists the top 20 dists according to my first implementation.

If you're going to contribute, it's most motivating to do something that's going to be used. So the idea is to look for dists that are still getting bugs raised against them, but that haven't seen a release for a good while.

Dist                                  Released    Bug days    Gap     Score
Perl6-Parameters                      2002-08-17         2   3769   1884.50
Crypt-Primes                          2003-01-16         2   3617   1808.50
TAP-Formatter-HTML                    2010-03-21         1    997    997.00
SOAP-WSDL                             2010-03-28         1    990    990.00
Acme-Brainfuck                        2004-04-06         4   3169    792.25
POE-Component-CPAN-SQLite-Info        2008-10-14         2   1519    759.50
IO-Digest                             2004-09-11         4   3011    752.75
Net-CIDR-Set                          2009-01-30         2   1411    705.50
IO-Async-SSL                          2011-02-28         1    653    653.00
Proc-ParallelLoop                     2003-03-13         7   3556    508.00
Log-SelfHistory                       2010-08-07         2    857    428.50
CGI-Application-Plugin-LinkIntegrity  2006-05-18         6   2395    399.17
Catalyst-Authentication-Store-LDAP    2010-10-05         2    798    399.00
XiaoI                                 2008-08-18         4   1574    393.50
IO-Plumbing                           2008-08-21         4   1571    392.75
Data-Transform-SAXBuilder             2008-08-27         4   1565    391.25
PITA-POE-SupportServer                2008-09-02         4   1559    389.75
Config-Tiny                           2011-03-24         2    628    314.00
Pod-Spell                             2001-10-27        13   4052    311.69
Text-Identify-BoilerPlate             2005-08-22         9   2661    295.67

  • Released is the date of the last release of the dist.
  • Bug days is the number of days since the most recent still-open bug was raised.
  • Gap is the number of days between the last release and the most recent open bug.
  • Score is Gap / Bug days.
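To make those definitions concrete, here is a minimal Python sketch of the calculation. It is not the code used to produce the tables; it assumes you already have, for each dist, the date of its last release and the date its most recent open bug was raised. `DistStats` and `score_row` are hypothetical names, and clamping the denominator to 1 for a bug raised today is an assumption. The example dates are illustrative, chosen so the arithmetic reproduces the Perl6-Parameters row.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class DistStats:
    """Hypothetical per-dist summary; field names are illustrative."""
    name: str
    released: date         # date of the dist's last release
    newest_open_bug: date  # date the most recently raised open bug was reported


def score_row(dist: DistStats, today: date) -> tuple[int, int, float]:
    """Return (bug_days, gap, score) following the definitions above."""
    bug_days = (today - dist.newest_open_bug).days
    gap = (dist.newest_open_bug - dist.released).days
    # Score is Gap / Bug days. A bug raised today gives bug_days == 0,
    # so this sketch clamps the denominator to at least 1 (an assumption).
    score = gap / max(bug_days, 1)
    return bug_days, gap, round(score, 2)


# Illustrative dates that reproduce the Perl6-Parameters row.
row = DistStats("Perl6-Parameters", date(2002, 8, 17), date(2012, 12, 11))
print(score_row(row, today=date(2012, 12, 13)))  # (2, 3769, 1884.5)
```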

Here are the top 20 for a slightly different measure. In the table below, Gap is the number of days between the oldest still-open bug and the most recently reported still-open bug; Score is still Gap / Bug days. If a dist has only one open bug, its gap will be 1, so its score is too low for it to appear here.

Dist                Released    Bug days    Gap     Score
SOAP-WSDL           2010-03-28         1   1596   1596.00
TAP-Formatter-HTML  2010-03-21         1   1500   1500.00
Math-BigInt         2011-09-04         2   2568   1284.00
Params-Util         2012-03-11         1   1174   1174.00
RT-Client-REST      2012-01-09         2   2251   1125.50
DBI                 2012-11-20         2   2148   1074.00
Crypt-Primes        2003-01-16         2   2147   1073.50
IPC-Run             2012-08-30         4   3937    984.25
Path-Class          2012-12-09         3   2752    917.33
Net-DNS             2012-12-12         3   2593    864.33
Filesys-SmbClient   2012-12-04         4   3315    828.75
Config-Tiny         2011-03-24         2   1635    817.50
SQL-Translator      2012-10-09         4   3261    815.25
Authen-Captcha      2012-08-14         4   3051    762.75
libwww-perl         2012-02-18         5   3634    726.80
Storable            2012-09-11         6   3892    648.67
SQL-Interp          2012-02-08         3   1824    608.00
Net-CIDR-Set        2009-01-30         2   1187    593.50
Perl-Tidy           2012-12-09         3   1771    590.33
HTML-TagCloud       2011-06-18         5   2230    446.00
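The second measure only changes how Gap is computed. Here is a minimal sketch, again in Python and again hypothetical (`spread_score` is an illustrative name), that takes the report dates of a dist's open bugs. Note that it gives a dist with a single open bug a gap of 0 rather than 1; either way, such a dist scores too low to make the list.

```python
from __future__ import annotations

from datetime import date


def spread_score(open_bug_dates: list[date], today: date) -> float:
    """Score using the spread between the oldest and newest open bugs.

    gap      = days between the oldest and the most recent still-open bug
    bug days = days since the most recent still-open bug was reported
    """
    if not open_bug_dates:
        return 0.0  # nothing open, nothing to score
    newest = max(open_bug_dates)
    oldest = min(open_bug_dates)
    gap = (newest - oldest).days
    bug_days = max((today - newest).days, 1)  # assumed clamp to avoid dividing by 0
    return round(gap / bug_days, 2)


# Hypothetical open-bug dates for one dist.
bugs = [date(2010, 1, 1), date(2012, 6, 1), date(2012, 12, 1)]
print(spread_score(bugs, today=date(2012, 12, 3)))  # 532.5
```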

Note: these are really identifying modules that are potentially worthwhile candidates for taking over (getting co-maint), rather than modules where you could contribute without having to take over maintenance. That's a separate list!

Some thoughts for improving this:

  • It's skewed too heavily towards bugs raised within the last few days. Maybe I should use log10 of bug days as the denominator, to smooth things out.
  • A dist may have been bug-free until yesterday, and so hasn't needed any releases. I could look at the number of still-outstanding bugs reported since the last release.
  • Even further, I could look at the elapsed time between the oldest open bug and the most recently reported open bug.
  • Factor in the number of dists that depend on each dist, and weight the score accordingly.
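Here is a rough sketch of how the first, second and fourth ideas might look, with hypothetical function names throughout. Because log10(1) is 0, a plain log10 denominator would divide by zero for a bug raised yesterday, so this version uses log10(1 + bug days), clamping bug days to at least 1. That choice and the reverse-dependency weighting are assumptions; neither has been tested against the real data.

```python
from __future__ import annotations

import math
from datetime import date


def smoothed_score(gap_days: int, bug_days: int) -> float:
    """Gap divided by log10(1 + bug days) instead of raw bug days.

    bug_days is clamped to at least 1, so the denominator is never
    smaller than log10(2) and a bug raised today cannot divide by zero.
    """
    return gap_days / math.log10(1 + max(bug_days, 1))


def open_bugs_since_release(open_bug_dates: list[date], released: date) -> int:
    """Count still-open bugs reported after the last release date."""
    return sum(1 for d in open_bug_dates if d > released)


def weighted_score(score: float, reverse_deps: int) -> float:
    """Weight a score by the number of dists that depend on this one.

    Hypothetical weighting: multiply by 1 + log10(1 + reverse_deps), so a
    dist with no dependents keeps its score and heavily used dists are
    boosted without completely swamping the gap-based part of the score.
    """
    return score * (1 + math.log10(1 + reverse_deps))


# Pod-Spell (gap 4052, 13 bug days) vs TAP-Formatter-HTML (gap 997, 1 bug day)
print(smoothed_score(4052, 13) > smoothed_score(997, 1))  # True
```

With the figures from the first table, the smoothing lifts Pod-Spell above TAP-Formatter-HTML, reversing their order in the raw ranking.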

What else should be factored in? I'll play a bit more, then put a longer sortable list online. I love the fact that I could get hold of all the metadata needed to create this. Now I feel like I should find a bug to fix!

18 Comments

What are the odds that my help would actually be merged, though? I've submitted patches to popular Perl modules before, but I'm not sure they ever got looked at, much less merged.

Perhaps look at which modules are still getting commits as well?

Hmm, I just submitted two bugs for Crypt::Primes, which probably made it go to the top. One would be pretty easy to fix, the other is a critical breakage and would require more thought, but isn't huge.

But I already have a module that duplicates most of the functionality, so fixing this one doesn't seem the best use of my time. I found the issues in Crypt::Primes while testing mine, reading the paper it's based on and wondering why the code didn't match.

OK, so you shamed me into submitting patches. Well done! :) Based on the comments on Crypt::RSA, however, I'm not sure the author is still around.

Overall, I like the idea of something like this list. I like most of your additional suggestions. It would be nice to weight based on "importance", which the reverse dependencies partly take into account. It'd also be nice to give negative weight to modules that have unanswered patches, though I'm not sure how one would get that programmatically. After all, if someone has already solved an issue but it's being ignored, it's probably a waste of time submitting patches for other issues.

preaction:

That differs from maintainer to maintainer, no? If you want some level of confidence before you invest your time, I'd check the module's issue tracker to get an idea of how responsive the author is. (NB: recent activity should be weighted much more heavily than past periods of inactivity.)

A more interesting way of scoring bugs by importance would be to take into account the number of other modules that depend on a given one, directly as a first approximation, or even across the whole dependency tree. From this point of view, CGI is probably one of the most critical modules, and, well, Acme::Brainfuck not so much.
It's not clear how to submit patches to CGI, however, other than by emailing them. Where is it developed?

With regard to CGI, the latest release isn't from 2001 but from a month ago.

MetaCPAN shows the GitHub repo for it, so contributing to CGI is easy.

@Neil: I'm interested in the tool you probably wrote to build this list. Share it!

@Neil: There's an unfortunate situation on rt.cpan.org, not yet resolved: there is both a CGI.pm queue and a CGI queue. This originally stemmed from differences in distname parsing. The maintainers currently direct you to CGI.pm, but other sources direct you to CGI. The CGI.pm queue should be merged into the CGI queue, and that's on my list of things to tackle in rt.cpan.org. It may be one source of confusion in the data I pointed you at originally.

If the author doesn't respond to your patches, you could aim to take over the module. Fork it on GitHub and fold your patches in. Put a pointer to your repo in the RT comments. Two weeks from now, email the author again and say that you'd be happy to take over the module and release fixes. Then in a month's time, when you apply for co-maint, you can point the PAUSE admins to your fork and patches, and to your attempts to engage the author. Cc the author on that email as well.

Perhaps this should be (or already has been?) formalized into a standard procedure.

Second request for a list based on impact, e.g. reverse deps.

What code did you use to determine the number and age of outstanding bugs? I was thinking of creating a Kwalitee data point on these stats.


About Neil Bowers

Perl hacker since 1992.