CPAN Testers Summary - October 2010 - Nine In A Pond Is Here
Back in January 2008 we were celebrating the one millionth post submitted to CPAN Testers. Although that article proclaimed it to be the one millionth report, many initial posts to the mailing list also included discussions and announcements of uploads. It wasn't until I created the Interesting Stats page that we started to see the true picture. However, we only had to wait until March 2008 for the real one millionth report to be posted. Now, some 2 years and 7 months later, we've had the nine millionth report submitted. It took 9 years to produce 1 million reports, but only a further 2½ years to produce another 8 million reports. The rate at which CPAN Testers has been able to get people involved in the project has been phenomenal. We are now submitting over 500,000 reports a month, so I have no doubt we will pass the 10 millionth mark before the end of the year, probably just before Christmas :)
In the comments to my nine millionth post on Perl Blogs, John Napiorkowski asked how CPAN Testers compares with package testing in other languages, particularly Python and Ruby. Chris Williams provided some links to the testing setups for those languages, and the sites prove rather interesting. From the perspective of trying to find information about test results, CPAN Testers wipes the floor with both, as I found both the Cheesecake and Firebrigade sites awkward to follow. The Cheesecake site seems to be aiming more for a site like CPANTS, which is probably a good first step to encourage a testing culture. The Firebrigade site seems to have tried to take on the idea of CPAN Testers, but in trying to also be different they've actually made things hard for themselves. I also dispute the Ruby claim of "Firebrigade tests every gem ever made on every platform under the sun". The front page lists tests on only 45 platforms. CPAN Testers would never make such a bold or false claim, but with over 100 platforms, and 74 alone during October 2010, I think Perl's sun must be much bigger than Ruby's, and CPAN Testers are still only scratching the surface. It will be a long time before any other language can compete with CPAN Testers, and with only 20405 reports in nearly 4 years, the Ruby team have a long way to go to catch up. CPAN Testers should be immensely proud of the work they have put into the project, whether as a developer, tester or even those with just the odd suggestion to help improve the eco-system. Every contribution has helped to make CPAN Testers worthwhile and valued by the Perl community, as well as respected, imitated and/or envied by other language communities. And we're still improving.
Talking of improvements, there have been several performance improvements to the applications which produce the web pages for several sites. The CPAN Testers Statistics site was suffering from the vast amount of number crunching it performed, and was previously using up as much as 3GB of RAM, and often taking over an hour to produce its results. With a rework to save a snapshot of the data, and restart each time from where we left off, the application now uses less than 1GB RAM and processing takes about 40 minutes. The CPAN Testers Reports page builder has also seen some tweaks, and again the pages have seen a dramatic improvement in build times. Some author pages were taking over an hour to build, but now even RJBS and ADAMK only take 5-15 minutes. There is still room for improvement, and better use of on disk storage is planned.
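The snapshot-and-resume idea behind the Statistics rework can be sketched roughly as follows. This is only an illustration in Python, not the actual CPAN Testers code: the function names, the JSON snapshot file and the (report_id, grade) tuples are all assumptions made for the example, and it assumes reports arrive in ascending id order.

```python
import json
import os
from collections import Counter


def load_snapshot(path):
    """Return (last_id, counts) saved by a previous run, or an empty state."""
    if not os.path.exists(path):
        return 0, Counter()
    with open(path) as fh:
        state = json.load(fh)
    return state["last_id"], Counter(state["counts"])


def save_snapshot(path, last_id, counts):
    """Write the snapshot to a temp file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"last_id": last_id, "counts": dict(counts)}, fh)
    os.replace(tmp, path)


def update_stats(reports, path):
    """Count grades for reports newer than the snapshot, then save it.

    `reports` is a hypothetical iterable of (report_id, grade) pairs,
    assumed to be sorted by ascending report_id.
    """
    last_id, counts = load_snapshot(path)
    for report_id, grade in reports:
        if report_id <= last_id:
            continue  # already counted in an earlier run
        counts[grade] += 1
        last_id = report_id
    save_snapshot(path, last_id, counts)
    return last_id, counts
```

Each run then only has to process the reports that arrived since the previous snapshot, rather than recounting everything from the first report.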
Still the biggest drain on resources is bzip2. It is a memory and CPU hog at the best of times, and with it holding IO on occasions, it often has a significant impact on other applications. As such I am taking time to review how the bzip2 files are produced. Part of that is to review how often they need to be generated. Tellingly, any one IP address grabs the two most popular archive files (uploads.db.bz and cpanstats.db.bz2) just once a day. The gzip archive, cpanstats.db.gz, currently has similar popularity. As such, over the next few days expect the archive creations to happen in the early hours of the morning (CET), gathering up the previous day's stats. Initially I was planning to move the archiving to another server, but with the archives not being in high demand, I will now look at running one complete archive process a day and see how that affects the server performance. If the change in timestamp is likely to cause problems, please let me know and I'll see what I can do to help.
In the next few weeks David Golden and I are planning to chat about the future of CPAN Testers. Now that CT2.0 is live, where do we go from here? There are some obvious improvements we can now start to look at, such as expanding the metadata we record, but we have other plans to make CPAN Testers even more reliable and current. Once we've had a chance to discuss the ideas and point them in the right direction, we'll let you know more.
In other news, it is likely that the Preferences site's SSL certificate will fail very soon. For the past 2 years we've been able to qualify for GoDaddy's OpenSource scheme which donates a 1 year certificate for any verified Open Source project. Sadly, despite them considering CPAN Testers an Open Source project for the last 2 years, we have now been rejected for not being an Open Source project! Yes, the response surprised me too, but despite attempts to understand why we no longer qualify, they've now closed the request ticket and have effectively ended the discussion. As such, I'll be looking to purchase a new SSL certificate from another vendor shortly, who hopefully have a better support policy.
I was intrigued to see Yanick Champoux's recent blog post: Generating RT bugs out of CPAN Testers' Reports. Yanick was looking for an effective way to submit a test report into his RT queue. Unfortunately I can't add a button to the site as suggested, as at the current time the site doesn't verify that you are the author of a distribution, and opening it up to all would be a nightmare waiting to happen. I did wonder whether this was something that could be added to the Preferences site, but with potentially hundreds of reports coming in, trying to decide whether they are applicable for RT or not could also turn into a nightmare. As such, if you're interested in doing this yourself for your own RT queue, read Yanick's post and see how you get on.
Moving away from CPAN Testers and looking at CPAN, we passed another milestone last month. On 8th October 2010 ETHER became the 5,000th PAUSE user to upload a distribution to CPAN. Although we currently have 8482 PAUSE users (as of 02/11/2010), it is surprising how many have used their ID for other CPAN related activities. After holding the top spot for some considerable time, Adam Kennedy has now been overtaken by Ricardo Signes for the most current distributions attributed to a single PAUSE user. Some years ago it was considered quite a feat to reach 100 distributions, but with 230 active distributions currently to his name, I'm not surprised Ricardo created Dist::Zilla to help him manage them all :)
Finally, we have some more mappings, with 40 new address mappings, of which 22 are for new testers. Until next time, happy testing :)
Cross-posted from the CPAN Testers Blog