The return of CPANDB and the (alpha) Top 100 website

It's been a while since I've posted anything about Perl.

It's been a while since I've written much Perl as well. A look at my CPAN page shows a long gap since I moved over to America (and the Microsoft stack) to work at Kaggle.

The break has caused quite a few problems in terms of maintainership of various things. Padre's progress towards 1.00 has suffered quite a bit, and I've handed off a few modules where people showed interest in taking them over.

The time away from Perl has also given me a chance to reassess my work and the CPAN ecosystem, and to think about which parts of it are actually important and which are in desperate need of a shake-up.

The first project that badly needs some love is CPANDB, which is a single, relatively small SQLite database (and ORM layer) that aggregates all the most important data about CPAN authors, distributions and modules together in one place.

The original one was trapped using the 2GB download version of the CPAN Testers data, failed to install on Windows, merged data poorly without tolerance for delays or degradation in the data, and did not deal with dependencies properly after the introduction of CPAN::Meta.

The new CPANDB 0.18 fixes all of these problems.

The new version of the client removes some problematic dependencies (and reduces the dependency load generally), while the generator on the server side fixes a number of problems.

The data it needs is reduced to the point where I can now safely put it on a twice-daily cron to keep it properly up to date.

With the return of CPANDB comes the return of my old prototype CPAN Top 100 website.

The Top 100 website integrates the dependency graph with other data from the CPAN cloud to produce interesting rankings of CPAN modules.

The most notable change since the last time the Top 100 was working is the immense dependency bloat that seems to have occurred at the top end of the scale.

MojoMojo is now up to 411 dependencies. Catalyst with ExtJS support takes 311 dependencies!

It used to require approximately 140 dependencies to make it onto the Heavy 100 list. Now it takes 240.

The method used to count dependencies on the current Top 100 website is intentionally crude. Fortunately it doesn't need to stay that way for long, as the CPANDB graph integration supports describing dependencies in terms of starting from a modern 5.16 version rather than from 5.005 (the default).
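To make the difference between those two ways of counting concrete, here is a minimal sketch in Python. It is not the CPANDB implementation. The distribution names, the graph and the "core" set are all invented for illustration, and it assumes that anything shipped with the baseline perl only depends on other things shipped with it.

```python
from collections import deque

# Hypothetical dependency graph: distribution -> direct dependencies.
# Every name and edge here is invented for illustration only.
DEPENDS = {
    "My-App": ["Web-Framework", "Core-Testing"],
    "Web-Framework": ["Template-Engine", "Core-Paths"],
    "Template-Engine": ["Core-Paths"],
    "Core-Testing": [],
    "Core-Paths": [],
}

# Hypothetical set of distributions assumed to ship with the baseline perl.
CORE = {"Core-Testing", "Core-Paths"}


def count_dependencies(root, depends, core=frozenset()):
    """Count distinct transitive dependencies of root, skipping anything in core."""
    seen = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in depends.get(current, []):
            # Core distributions are neither counted nor traversed (see assumption above).
            if dep in core or dep in seen:
                continue
            seen.add(dep)
            queue.append(dep)
    return len(seen)


# Crude count: everything reachable, as if starting from a bare old perl.
crude = count_dependencies("My-App", DEPENDS)

# Baseline-aware count: ignore whatever the assumed modern perl already ships.
modern = count_dependencies("My-App", DEPENDS, CORE)
```

In this toy graph the crude count is 4 and the baseline-aware count is 2, which shows how much of a headline dependency number can come from modules a modern perl already includes.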

But more on the new version of the Top 100 website later...


You might want to look at an alternative to the cpanstats.db, as it will be decommissioned at some point. It currently has far too many errors, which don't seem to disappear even when recreating the SQLite DB from scratch.

The new distro, CPAN-Testers-WWW-Reports-Query-Reports, is now the recommended way to load the cpanstats data.

Feature request: link the distribution name in CPAN Top 100 reports to results where we could see the details.

Just a tiny thought, but perhaps excluding author plugin bundle namespaces (e.g. Dist::Zilla::PluginBundle) would give a more interesting list.


About Adam Kennedy

I blog about Perl.