Analysing CPAN Testers' Reports

The task

In the second round of the Pull Request Challenge, I was assigned Olson::Abbreviations. At work, we've been bitten by the ambiguity of the EST timezone, so I liked the general idea of the module. Moreover, there were some test failures reported at CPAN Testers Reports, so the task was clear: Make all the tests pass!

FAIL or PASS

The module had only 5 failures in non-development versions of Perl, contrasting with more than 1500 good test results. Therefore, I was able to analyse the failure reports one by one easily.

The first error reported was usually a variation of
Odd number of elements in hash assignment at C:/strawberry163/perl/vendor/lib/MooseX/ClassAttribute.pm line 37.

What all the failed reports shared was the version of the dependent MooseX::ClassAttribute: 0.26. And indeed, when I checked its Changes, it said:

0.27   2013-03-28

- The latest Moose release (2.08) broke this module. This release
  fixes MooseX::ClassAttribute to work with both new and old
  Mooses. Reported by Jonathan Stowe. RT #84263.

At that point, I was almost sure the fix would be simple: just require version 0.27 of the dependency. However, I wanted to be sure I didn't miss something, so I decided to analyse the PASS reports as well.

Getting the reports

How does one fetch 1500 reports from CPAN Testers? I looked for an API, but the only advice I got was to download the whole 2GB database. So, I decided to scrape the site gently.

I used WWW::Mechanize to get the report list. From there, I extracted individual reports using HTML::TableExtract. I only downloaded reports for stable Perl versions.

After downloading the first few reports, I discovered they were coming in two flavours. In one of them, the dependencies are reported in the following way:

Module Name                        Have     Want
Moose                            2.1403        0
MooseX::ClassAttribute             0.27     0.25

while, in the other, the order of the columns is different:

Module                 Need Have    
---------------------- ---- --------
Moose                  0    2.1403  
MooseX::ClassAttribute 0.25 0.27    

I also noticed that version 0.26 appeared in one of the PASS reports. Therefore, I extracted the Moose version from the reports, too, to verify that Moose older than 2.08 works with 0.26 without problems.
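To illustrate the two layouts in isolation, here is a minimal sketch that reads the installed version from either of them. The two excerpts are made-up samples modelled on the tables above, and installed_version is a hypothetical helper name, not something taken from the scraper.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Two made-up report excerpts, one per layout described above.
my $have_want = <<'END';
Module Name                        Have     Want
Moose                            2.1403        0
MooseX::ClassAttribute             0.27     0.25
END

my $need_have = <<'END';
Module                 Need Have
---------------------- ---- --------
Moose                  0    2.1403
MooseX::ClassAttribute 0.25 0.27
END

# Return the installed ("Have") version of $module, or undef if the
# layout is not recognised or the module is not listed.
sub installed_version {
    my ($report, $module) = @_;

    # Index of the "Have" column among the two version columns.
    my $have = $report =~ /Module\s+Need\s+Have/      ? 1
             : $report =~ /Module Name\s+Have\s+Want/ ? 0
                                                      : undef;
    return undef unless defined $have;

    # Anchor at the line start so "Moose" doesn't match "MooseX::...".
    my @columns = $report =~ /^\Q$module\E\s+(\S+)\s+(\S+)/m;
    return $columns[$have];
}

for my $report ($have_want, $need_have) {
    say join ' + ', map installed_version($report, $_),
        'MooseX::ClassAttribute', 'Moose';
}
```

Both samples should yield the same pair of versions, because only the position of the "Have" column differs between the layouts.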

Here is the full code:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use List::Util qw{ sum };
use WWW::Mechanize;
use HTML::TableExtract;

use Data::Dumper;


my $url = 'http://static.cpantesters.org/distro/O/Olson-Abbreviations.html';

my $mech = 'WWW::Mechanize'->new();
$mech->get($url);

my $te = 'HTML::TableExtract'->new( keep_html => 1,
                                    headers   => [ 'Grade',
                                                   'Perl version',
                                                   'OS name',
                                                   'OS version',
                                                   'Architecture',
                                                 ]);
$te->parse($mech->content);
my $table = ($te->tables)[0];     # Recent module version only.

my %reports;

for my $row ($table->rows) {
    my $perl_version = $row->[1];
    my $major = (split /\./, $perl_version)[1];
    next if $perl_version !~ /^5\.[0-9]+\.[0-9]+$/
         or 1 == $major % 2;      # Stable Perls only.

    $mech->update_html($row->[0]);
    my $link = ($mech->links)[0];
    push @{ $reports{ $link->text } }, $link->url;
}

my %version;
my $count = 0;
my $all = sum(map scalar @$_, values %reports);
for my $result (sort keys %reports) {
    for my $url (@{ $reports{$result} }) {
        $mech->get($url);
        my $report = $mech->content;
        # Index of the "Have" column among the two version columns.
        my $pos = $report =~ /Module\s+Need\s+Have/      ? 1
                : $report =~ /Module Name\s+Have\s+Want/ ? 0
                                                         : undef;
        unless (defined $pos) {
            warn "Can't find module version for $url.\n";
            next;
        }

        my $mxca  = ($report =~
                     /MooseX::ClassAttribute\s+(\S+)\s+(\S+)/)[$pos];
        my $moose = ($report =~ /Moose\s+(\S+)\s+(\S+)/)[$pos];
        my $deps  = "$mxca + $moose";
        ++$version{$result}{$deps};
        print STDERR ++$count, ' / ', $all,
            " ($result $deps $version{$result}{$deps})    \r";
        sleep 1;                  # Let the server breathe.
    }
}
say STDERR q();
say "MooseX::ClassAttribute + Moose";
print Dumper \%version;

And here is the result I got after 45 minutes:

MooseX::ClassAttribute + Moose
$VAR1 = {
          'FAIL' => {
                      '0.26 + 2.0800' => 2,
                      '0.26 + 2.1005' => 2,
                      '0.26 + 2.1204' => 1
                    },
          'PASS' => {
                      '0.26 + 2.0402' => 1,
                      '0.26 + 2.0602' => 1,
                      '0.26 + 2.0603' => 3,
                      '0.26 + 2.0604' => 309,
                      '0.27 + 2.0604' => 4,
                      '0.27 + 2.0801' => 70,
                      '0.27 + 2.0802' => 16,
                      '0.27 + 2.0900' => 2,
                      '0.27 + 2.1001' => 2,
                      '0.27 + 2.1004' => 2,
                      '0.27 + 2.1005' => 231,
                      '0.27 + 2.1100' => 4,
                      '0.27 + 2.1101' => 5,
                      '0.27 + 2.1102' => 7,
                      '0.27 + 2.1103' => 5,
                      '0.27 + 2.1105' => 1,
                      '0.27 + 2.1106' => 2,
                      '0.27 + 2.1107' => 2,
                      '0.27 + 2.1201' => 3,
                      '0.27 + 2.1202' => 191,
                      '0.27 + 2.1203' => 2,
                      '0.27 + 2.1204' => 148,
                      '0.27 + 2.1205' => 33,
                      '0.27 + 2.1206' => 84,
                      '0.27 + 2.1209' => 2,
                      '0.27 + 2.1210' => 1,
                      '0.27 + 2.1211' => 3,
                      '0.27 + 2.1213' => 234,
                      '0.27 + 2.1303' => 1,
                      '0.27 + 2.1304' => 8,
                      '0.27 + 2.1400' => 6,
                      '0.27 + 2.1401' => 13,
                      '0.27 + 2.1402' => 28,
                      '0.27 + 2.1403' => 18
                    }
        };

Only one report didn't contain the versions at all. As you can see, my assumption was correct: 0.27 works with every Moose version in the reports, but 0.26 needs a pre-2.08 Moose to work.
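The same conclusion can be checked mechanically rather than by eye. The following is only a sketch: it uses a handful of entries copied from the tally instead of the whole hash, expected_to_fail is a hypothetical helper, and it assumes the versions are plain decimals that can be compared numerically (no underscores or other development-release suffixes).

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# A few entries copied from the tally; the structure is the same as
# %version in the scraper.
my %version = (
    FAIL => { '0.26 + 2.0800' => 2,
              '0.26 + 2.1005' => 2,
              '0.26 + 2.1204' => 1 },
    PASS => { '0.26 + 2.0604' => 309,
              '0.27 + 2.0604' => 4,
              '0.27 + 2.1403' => 18 },
);

# The combination expected to break: MooseX::ClassAttribute older
# than 0.27 together with Moose 2.08 or newer.
sub expected_to_fail {
    my ($deps) = @_;
    my ($mxca, $moose) = split / \+ /, $deps;
    return $mxca < 0.27 && $moose >= 2.08 ? 1 : 0;
}

# Collect every combination whose grade contradicts the prediction.
my @surprises;
for my $result (sort keys %version) {
    for my $deps (sort keys %{ $version{$result} }) {
        my $predicted = expected_to_fail($deps) ? 'FAIL' : 'PASS';
        push @surprises, "$result $deps" if $predicted ne $result;
    }
}

say @surprises ? "Unexpected: @surprises"
               : 'Every report matches the hypothesis.';
```

An empty list of surprises means that no FAIL came from a combination predicted to pass, and no PASS from one predicted to fail.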

Conclusion

My pull request just changed the required version in Makefile.PL:

requires 'MooseX::ClassAttribute' => '0.27';

And the same in the module itself:

use MooseX::ClassAttribute 0.27;

I considered allowing the old version of MooseX::ClassAttribute when Moose was old, too, but I decided against it. It would just duplicate the work already done in 0.27.
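For readers unfamiliar with the version number on a use line: Perl passes it to the module's VERSION method, which dies when the installed version is too old. The sketch below shows that check in isolation. My::Dependency is a hypothetical stand-in package defined inline so the example runs without MooseX::ClassAttribute installed.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

{   package My::Dependency;       # Hypothetical stand-in module.
    our $VERSION = '0.26';
}

# "use My::Dependency 0.27;" would perform this check at compile
# time; calling VERSION directly does the same at run time.
my $ok = eval { 'My::Dependency'->VERSION(0.27); 1 };

say $ok ? 'Dependency is new enough.'
        : "Dependency is too old: $@";
```

With a version requirement in Makefile.PL, the installer can upgrade the dependency before the tests run; the check in the module itself catches an outdated dependency at load time.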

7 Comments

Good work!

As well as specifying the version in Makefile.PL, you should specify the constraint in the code as well:


use MooseX::ClassAttribute 0.27;

Did you know about analyses.cpantesters.org? I think it would have provided you with the information you needed, without the need of a scraper.

45 minutes via web scraping? My script gets the reports in 2 minutes, with some YAML caching, and files them under t/reports/$version

https://github.com/rurban/perl-compiler/blob/master/t/download-reports

Open the page showing the combination of mod:MooseX::ClassAttribute and mod:Moose, and sort by "state".

About E. Choroba

I blog about Perl.