Party In Paris - 2012 QA Hackathon (part 1)
I'm currently at the 2012 QA Hackathon working on CPAN Testers servers, sites, databases and code. It has already been very productive, and I have two new module releases to show for it.
CPAN::Testers::WWW::Reports::Query::AJAX
This module was originally written in response to a question by Leo Lapworth about how the summary information is produced. As a consequence, he wrote CPAN::Testers::WWW::Reports::Query::JSON, which takes its data from the stored JSON file. In most cases this data is sufficient, but the module has to parse the JSON file, which may be slow for distributions with a large number of reports. On the CPAN Testers Reports site, in the side panel of the distribution page, you will see the temperature graphs showing the percentage of PASS, FAIL, NA and UNKNOWN reports a particular release has. This data is gleaned from an AJAX call to the server.
But what if you don't want an HTML/JavaScript styled response? What if you wanted the results in plain text or XML? Enter CPAN::Testers::WWW::Reports::Query::AJAX. Now you can use it to query the live data for a particular distribution, and optionally a specific version, and get all the result values back as a simple hash to do with as you please.
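As an illustration of what a consumer might do with such a hash, here is a minimal Python sketch that turns raw result counts into the kind of percentages shown in the temperature graphs. The key names (`pass`, `fail`, `na`, `unknown`) and the function name are assumptions for this example, not the module's documented return structure.

```python
def grade_percentages(counts):
    """Convert per-grade report counts into percentages of the total.

    `counts` is assumed to map grade names (hypothetical keys such as
    'pass', 'fail', 'na', 'unknown') to integer report counts.
    """
    total = sum(counts.values())
    if total == 0:
        # No reports yet: avoid dividing by zero and report 0% for each grade.
        return {grade: 0.0 for grade in counts}
    return {grade: round(100.0 * n / total, 1) for grade, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for a single release.
    example = {"pass": 180, "fail": 15, "na": 3, "unknown": 2}
    print(grade_percentages(example))
    # → {'pass': 90.0, 'fail': 7.5, 'na': 1.5, 'unknown': 1.0}
```

A project website could feed these numbers into whatever bar, table or badge style suits it, rather than relying on the HTML/JavaScript widget.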
I anticipate this might be most useful to project websites that wish to display their latest results from CPAN Testers in some way. They can now get the data and present it however they wish.
CPAN::Testers::WWW::Reports::Query::Reports
Now we get to perhaps the bigger module, even though it's smaller than the one above. This module is perhaps most useful to all those who are trying to maintain a copy of the cpanstats metadata from the SQLite database. As mentioned previously, the SQLite database has been giving us grief over the past year, and we haven't gotten to the bottom of it. Andreas suspects there is some unusual textual data in some reports that causes SQLite problems when it tries to store it. I'm not quite convinced by this, but as I'm only inserting records, I'm at a loss as to what else could be the cause.
The SQLite file now clocks in at over 1GB compressed and over 8GB uncompressed, and is starting to take a notable amount of disk space (though considerably smaller than the 250GB+ Metabase database ;) ). It is also a significant bandwidth consumer each day, which can slow processing and page displays, as disk access is our limiting factor now.
Enter CPAN::Testers::WWW::Reports::Query::Reports. This module uses the same principles as the AJAX module above, but accesses a new API on the CPAN Testers Reports site that lets consumers get either a specific record or a whole range of report metadata records. Currently the maximum number of records that can be returned in a single request is 2500, but this may be increased once the system has been proven to work well. Typically we have around 30,000 reports submitted each day, so to allow consumers to make best use of this API, I will look at increasing the limit to maybe 50,000 or 100,000. I want to keep a limit, as I don't want accidental requests consuming the full database in one go, which would again put a strain on disk access.
The aim of the module is to allow those who currently consume the SQLite database to request smaller updates more regularly and store the results in any database they choose, even a NoSQL-style database. It will ultimately reduce the bandwidth, the data stored and the processing spent compressing the SQLite file with gzip and bzip2, which means we can reallocate effort to more useful tasks.
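To make the incremental approach concrete, here is a rough Python sketch of a consumer that keeps a local copy up to date by requesting only records newer than the highest id it already holds, never asking for more than the 2500-record cap in one request. It assumes report ids are contiguous integers and that ranges are requested by id; `fetch_range` is a hypothetical stand-in for whatever call actually hits the Reports API, `latest_id` is assumed to be known, and the four-column table is an illustrative subset, not the real cpanstats schema.

```python
import sqlite3
from typing import Callable, Dict, List

# Per-request cap mentioned above; it may be raised in future.
MAX_RECORDS_PER_REQUEST = 2500

Record = Dict[str, object]


def sync_reports(conn: sqlite3.Connection,
                 fetch_range: Callable[[int, int], List[Record]],
                 latest_id: int) -> int:
    """Copy report metadata with ids above the locally stored maximum.

    `fetch_range(start, end)` is HYPOTHETICAL: it should return the records
    with start <= id <= end. `latest_id` is the highest id assumed to exist
    on the server. Returns the number of records stored.
    """
    # Illustrative subset of columns, not the real cpanstats schema.
    conn.execute("CREATE TABLE IF NOT EXISTS cpanstats "
                 "(id INTEGER PRIMARY KEY, dist TEXT, version TEXT, state TEXT)")
    (last_seen,) = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM cpanstats").fetchone()

    stored = 0
    start = last_seen + 1
    while start <= latest_id:
        # Never ask for more than the per-request limit.
        end = min(start + MAX_RECORDS_PER_REQUEST - 1, latest_id)
        rows = fetch_range(start, end)
        conn.executemany(
            "INSERT OR REPLACE INTO cpanstats (id, dist, version, state) "
            "VALUES (:id, :dist, :version, :state)",
            rows,
        )
        conn.commit()  # commit per batch so an interrupted sync can resume
        stored += len(rows)
        start = end + 1
    return stored


if __name__ == "__main__":
    def fake_fetch(start, end):
        # Hypothetical server data: every id exists and passed.
        return [{"id": i, "dist": "Example-Dist", "version": "1.00",
                 "state": "pass"} for i in range(start, end + 1)]

    conn = sqlite3.connect(":memory:")
    print(sync_reports(conn, fake_fetch, latest_id=6000))  # 6000, in 3 requests
    print(sync_reports(conn, fake_fetch, latest_id=6010))  # 10, in 1 request
```

Run from cron every hour or so, a loop like this would only ever move a few thousand records at a time, instead of downloading the full compressed SQLite file.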
If you currently consume the SQLite database, please take a look at this module and see how you can use it. I plan to include some example scripts that could be drop-in replacements for your current processes, but if you get there first, please feel free to submit them to me too, and I will include them with full credit. If you spot any issues or improvements, please also let me know.
CPAN Testers Platform Metabase Facts
This morning we had a CPAN Testers presentation and discussion hosted by David Golden. As there is plenty of interest in CPAN Testers from a variety of parties, it was a good opportunity to highlight an area that needs work, but which David and I, as well as other key developers in the CPAN Testers community, just don't have time to do. Breno de Oliveira (garu on IRC) has very kindly stepped forward to look at one particular task, which we have been wanting to write since the QA Hackathon in Birmingham, back in 2009!
Breno has written a CPAN Testers client for cpanminus. At the moment it's a stand-alone application, but it may well be included within cpanminus in the future. As part of writing the application, Breno asked David and me about how the CPAN::Reporter and CPANPLUS::YACSmoke clients create the report. Due to the legacy systems we came from (email and NNTP), we still use an email-style presentation of the reports. However, it has always been our intention to produce structured data. A CPAN Testers Report currently has only two required facts: a Legacy Report and a Test Summary. However, there are other facts that we have already scoped but have not yet implemented.
Last year the Birmingham Perl Mongers produced the CPAN::Testers::Fact::PlatformInfo fact, which consumes the data from Devel::Platform::Info (which we'd written the previous year). The problem with the way test reports are currently created is that we don't always know the definitive information for the platform the test suite was run on. Reports, particularly in the Perl Config section, can lie. Not big lies necessarily, but enough to disguise why a particular OS may have problems with a particular distribution.
Breno is now looking to produce a module that abstracts all the metadata creation parts from CPAN::Reporter, CPANPLUS::YACSmoke and Test::Reporter, as well as his own new application, into a single library that can create all the appropriate facts before submitting the report to the Metabase. Hopefully he can get this done during the Hackathon, but even if he doesn't, we're hopeful that he will get enough done to make it easy to complete soon after. Once we patch the respective clients to use the new library, we will be able to start doing interesting things with how we present reports.
The CPAN Testers Reports site only displays the legacy style report, which for most is sufficient, but it really would be nice to have some specially styled presentations for particular sections, or even allow user preferences to show/hide sections automatically when a user reads a report.
CPAN Testers Admin site
This is a site that I have been working on, on and off, for about 4 years, since before we even had a Metabase. As a consequence, it has been promised at various points and I've always failed to deliver. Now that I have released the modules above, and there have already been several comments about having such functionality, I think I need to focus on it again. I have shown Breno the site running on my laptop and he has given me some more ideas to make it even more useful. It'll still be a while before it's released, but this will likely be down to running it with some beta testers first before a major launch, just so it doesn't break the ecosystem too badly!
Essentially the site was written to help authors and testers highlight dubious reports and have them deleted from the system. Although the reports won't actually be deleted, they will be marked as ignored, so that they can be excluded from JSON files and summary requests, as well as from the CPAN Testers Reports site. This will hopefully give us more accurate data, as bogus reports about running out of memory or disk space can be disregarded.
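The "mark, don't delete" approach is a classic soft delete. A small Python sketch of the idea, assuming a hypothetical `ignored` flag on each report record (the real schema and flag name may differ), shows how flagged reports can stay stored while dropping out of the summary counts:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Report:
    id: int
    state: str              # e.g. 'pass', 'fail', 'na', 'unknown'
    ignored: bool = False   # HYPOTHETICAL flag set when a report is flagged


def summarise(reports: Iterable[Report]) -> Dict[str, int]:
    """Count reports per state, skipping any flagged as ignored.

    Flagged reports remain in storage (a soft delete); they are only
    left out of the counts.
    """
    return dict(Counter(r.state for r in reports if not r.ignored))


if __name__ == "__main__":
    reports = [
        Report(1, "pass"),
        Report(2, "fail", ignored=True),  # e.g. an out-of-disk-space report
        Report(3, "pass"),
        Report(4, "fail"),
    ]
    print(summarise(reports))  # → {'pass': 2, 'fail': 1}
```

The same filter would apply wherever summaries are produced, whether for the JSON files or for the AJAX-style summary requests.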
However, following Breno's suggestions, I will look at making the site more public, so that authors can more easily see reporting patterns without having to log in. Logging in will still be needed to flag reports, but browsing reports by tester will be much more accessible.
Thanks
I would like to thank a few people who have helped to get me here, and have enabled these QA projects, not just CPAN Testers, to advance further.
Firstly, I would like to single out ShadowCat Systems, who have very kindly paid for my flight here. Thanks to BooK and Laurent for organising the event, and to all the sponsors and the Perl community who have provided the funding for the venue, accommodation and food for the event. It has already been very much appreciated, and hopefully the significant submissions to GitHub and PAUSE are evidence of just how worthwhile this event is.
Thanks also to all those who are here, and are helping out in all shapes and forms to help Perl QA be even better than it already is.
Cross-posted from Memoirs of a Roadie