
My Love/Hate Relationship with CPAN Testers

The Great Part

I really like the idea of a CPAN testing service where individuals volunteer their computers to test CPAN packages and those results are accumulated and shared.

The accumulated results are then tallied with other results. People can use this information to help decide whether to use a package, or, when a package fails, to see whether others have hit a similar problem.

Comparing the CPAN Testers to Travis (which I also use on the GitHub repository), the CPAN Testers service covers far more OSes, OS distributions, and releases of Perl than I could ever hope to try with Travis.

And that’s the good part.

The Not-so-Good Part

While there is a lot of emphasis on rating a Perl module, there is little to no effort spent on detecting false blame, assessing the quality of individual CPAN testers, the management of the smokers, or the quality of the Perl releases themselves.

Yet the information to do this is already there. It is just a matter of cross-tabulating the data, analyzing it, and presenting it.

Suppose a particular smoker reports a failure for a Perl module on a particular OS, distribution, and Perl version, but several other smokers do not report that error. It could be that there is an obscure or intermittent bug in the Perl module being tested, but it might also be a bug in the smoker’s environment or setup, or in the Perl release.

Even if it is an obscure bug that many other smokers don’t encounter, as a person assessing whether a Perl module will work on my systems, I’d like to know whether a failure seems to be anomalous to that smoker or to a few smokers. If it is anomalous, the Perl module will probably work for me.
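To make that concrete, here is a minimal sketch in Perl of the kind of cross-tabulation I have in mind. The report fields, the sample data, and the pass-to-fail threshold are all my own assumptions for illustration; the CPAN Testers service doesn’t expose anything packaged quite this way.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical reports; real CPAN Testers reports carry more fields,
    # but these are enough to illustrate the cross-check.
    my @reports = (
        { smoker => 'smoker-a', os => 'linux', perl => '5.18.0', grade => 'PASS' },
        { smoker => 'smoker-b', os => 'linux', perl => '5.18.0', grade => 'PASS' },
        { smoker => 'smoker-c', os => 'linux', perl => '5.18.0', grade => 'PASS' },
        { smoker => 'smoker-d', os => 'linux', perl => '5.18.0', grade => 'FAIL' },
    );

    # Tally grades per (OS, Perl version) cell.
    my %tally;
    for my $r (@reports) {
        $tally{"$r->{os}/$r->{perl}"}{ $r->{grade} }++;
    }

    # Call a failure anomalous when most other smokers in the same cell
    # pass; the 2-to-1 threshold is an arbitrary choice for illustration.
    for my $r ( grep { $_->{grade} eq 'FAIL' } @reports ) {
        my $cell   = "$r->{os}/$r->{perl}";
        my $passes = $tally{$cell}{PASS} // 0;
        my $fails  = $tally{$cell}{FAIL} // 0;
        if ( $passes > 2 * $fails ) {
            print "$r->{smoker}: FAIL on $cell looks anomalous ",
                "($passes passes vs. $fails fails)\n";
        }
    }

Run over real report data, a check like this could annotate each FAIL on a package’s summary page with how isolated that failure is.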

Rating a Perl Release

Going further, there is plenty of data there to rate the overall Perl release itself.

Consider this exchange:

Me:

This Perl double free or corruption crash doesn’t look good for Perl 5.19.0. Comments?

Tester:

5.19.0 was just a devel snapshot, I wouldn’t overrate it. Current git repository has 973 commits on top of 5.19.0.

Well and good, but isn’t that failure permanently marked in the testing service as a problem with my package when it isn’t one? If 5.19.0 was flaky, isn’t the decent thing to do to retract the report? Or maybe, in the summary for this package, the line listing 5.19.0 should note that this release was more unstable than the others?

Again, what sucks is that, to me, it feels like blame will likely be put on the package forever. In those cases where the report is proved faulty: well, tough, live with it. It is the hypocrisy that bothers me the most: the service tries to be so rigorous about how a Perl module should work with everything, but is so lax when it comes to ensuring that what it reports is really accurate.

And this gets to the question of how well or poorly the smokers are managed.

I mentioned before that if a particular smoker is the only one that reports a failure for a particular OS, distro, and Perl release, the smoker might be suspect. And if that happens with several packages, it suggests even more strongly that the smoker (or a set of smokers managed by one person) is at fault. There may still be legitimate bugs in all of the packages; perhaps the smoker has uncommon LANG and LOCALE settings. But as things stand, there is no way to determine that this smoker, or set of smokers managed by a single person, exhibits such characteristics.
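As a sketch of what detecting that could look like: suppose the reports had already been reduced to a list of “lone failures” (a smoker failing where everyone else on the same OS and Perl version passes, as in the sketch above). Counting those per smoker across packages is then trivial. The data and the cutoff below are invented for illustration.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical cross-package data: each entry records that a smoker
    # was the *only* one to fail a package on its OS/Perl combination.
    my @lone_failures = (
        { package => 'Foo::Bar',  smoker => 'smoker-x' },
        { package => 'Baz::Quux', smoker => 'smoker-x' },
        { package => 'App::Frob', smoker => 'smoker-x' },
        { package => 'Foo::Bar',  smoker => 'smoker-y' },
    );

    # Count the distinct packages on which each smoker is the lone failure.
    my %suspect;
    $suspect{ $_->{smoker} }{ $_->{package} } = 1 for @lone_failures;

    # Flag smokers whose lone failures span several packages; the cutoff
    # of 3 is arbitrary and stands in for a real statistical threshold.
    for my $smoker ( sort keys %suspect ) {
        my $n = keys %{ $suspect{$smoker} };
        print "$smoker is the lone failure on $n packages -- worth a look\n"
            if $n >= 3;
    }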

Knowing that is helpful both to the person writing the failing package(s) and to those who want to assess the overall failures of a particular package.

Rating the Testers and Responsiveness of Testers

There is an asymmetry in the way testing is done. Testers can choose Perl modules, but Perl module authors, short of opting out of the testing service entirely, can’t choose testers. This is a matter of basic fairness: the premise that Perl modules will get better if they are tested and rated applies to the testers as well.

I think Perl module authors should be able to rate the responsiveness of the testers whose reports they receive (reports that are unsolicited, except at the too-coarse scale of opting out of everything).

Let’s say I get a report that says my Perl module fails its tests on some smoker. Unless the error message clearly shows what the problem is (and again, cross-checks to ensure validity are lacking), or unless I can reproduce the problem in an environment I control, I’m at the mercy of the person running the smoker.

As you might expect, some are very, very helpful, and some I’ve sent email to and simply never hear back from. Having a simple mechanism where I could +1 or -1 a tester, with the tester’s accumulated score sent along with each report, would be great. That way, if I get several failure reports, I can pick which tester to work with first.
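Nothing like this exists in the service today, so the following is purely a sketch of the +1/-1 mechanism I mean; the function name, the tester names, and the sample votes are all made up.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical accumulated responsiveness scores, one per tester,
    # built up from +1/-1 votes by module authors.
    my %score;

    sub rate_tester {
        my ( $tester, $vote ) = @_;    # $vote is +1 or -1
        $score{$tester} += $vote;
    }

    # An author votes after each interaction.
    rate_tester( 'smoker-a', +1 );    # sent a helpful follow-up
    rate_tester( 'smoker-b', -1 );    # never answered email
    rate_tester( 'smoker-a', +1 );

    # With several failure reports in hand, contact the most
    # responsive tester first.
    my ($best) = sort { $score{$b} <=> $score{$a} } keys %score;
    print "Contact first: $best (score $score{$best})\n";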

Given that there is no effort to keep smokers from duplicating each other’s work, in theory, if the problem really is in the Perl module rather than in the tester’s setup or the Perl version, I should get multiple reports.

Alternatives to the CPAN Testing Service?

I believe there are alternatives to the CPAN testing system. Any comments on them and on how they compare to the CPAN testing system? Is there a way to have their results show up on metacpan.org or search.cpan.org?
