A Date with CPAN: And Now, a Word from Our Sponsor
[This is a post in my latest long-ass series. You may want to begin at the beginning. I do not promise that the next post in the series will be next week. Just that I will eventually finish it, someday. Unless I get hit by a bus.]
Today’s blog post is brought to you by CPAN Testers. CPAN Testers: testing your code on every version of Perl on every operating system in every possible circumstance ... so you don’t have to.
I’ve talked about CPAN Testers before. If you’ve read that, you probably know how awesome I think they are already. And, with this foray into creating a date module, they’ve stepped up again.
Now, you will imagine that I made sure all my tests passed on my machine before I dared upload Date::Easy to CPAN. But that doesn’t mean they’ll pass on everyone else’s machines, so I watched CPAN Testers with some trepidation. Remember that dates are annoying to get right, and, even though I’m trying to mess with the underlying date code as little as possible, there’s still chances aplenty for things to go tragically wrong. Which is pretty much exactly what happened.
Oh, I got a few successes, sure. But a lot more failures. Most of the failures seemed to fit one particular pattern, and I had a sneaking suspicion I knew what was causing it ... and, as it turned out, I was completely wrong. But we’ll dive into that next week. This week, I want to tell you the saga of how I tracked it down. Or, to be more precise, how I attempted to track it down, and failed, and how the good folks on the CPAN Testers mailing list graciously set me straight.
The trickiest thing when you have a raft of failures from CPAN Testers is figuring out exactly what aspect of a smoker’s build is causing the failure. So you start looking for patterns. Sometimes the pattern is obvious, or you can use some of the great tools that CPAN Testers hosts for you, like this handy Perl version by OS matrix. Particularly if it’s a certain OS (say, it always fails on Windows machines) or a certain version or versions of Perl (perhaps everything from 5.18.0 and up fails). Once you’ve recognized what the failing systems have in common, it’s much easier to figure out what the problem is and how to fix it. But what if you can’t figure out what the point of commonality is?
Well, one thing that I remembered was a little script that tobyink put together a while back. tobyink is one of my favorite CPAN authors: he has innovative and usually elegant solutions to problems I have every day. A few years back, he wrote this little gem which gives you a command-line interface to the JSON data underlying CPAN Testers and allows you to slice and dice it in new and interesting ways. I had downloaded the script and done some minor mods on it a while back, like adding a switch to break things down by threaded vs unthreaded Perls. Now I went back to my version of the script and started hacking on it in earnest, trying to track down the actual point of failure. And, while in the end it was the back-and-forth on the mailing list that eventually tipped me to the problem, I was able to learn a lot from fiddling with the script, and I added a lot of bits to it that I think others will find useful. So I wanted to share that with you today.
First let me just give you the updated script in its entirety. Since code samples here tend to degrade over time,1 I’m just going to link you to where it’s checked in on GitHub. That makes it easier to download, too, plus if anyone gets a wild urge to throw me a pull request, now you can. Some of the changes I made to tobyink’s original just reflect our different programming styles. But let’s look at some of the new features I added.
First, let’s look at the output of tobyink’s original version:
[cibola:~/docs/blog] gists/cpan-testers.pl Date::Easy
CPAN Testers results for Date-Easy version 0.01
PASS FAIL ETC
Perl 5.008 0 12 0
Perl 5.010 0 17 0
Perl 5.012 3 34 0
Perl 5.014 2 29 0
Perl 5.016 2 27 0
Perl 5.018 3 36 0
Perl 5.020 6 58 0
Perl 5.022 5 37 0
Perl 5.023 26 21 0
As you can see, the default breakdown is by Perl version. Or we can break it down by OS:
[cibola:~/docs/blog] gists/cpan-testers.pl --os Date::Easy
CPAN Testers results for Date-Easy version 0.01
PASS FAIL ETC
Perl 5.008, Debian GNU/kFreeBSD 0 4 0
Perl 5.008, FreeBSD 0 4 0
Perl 5.008, GNU/Linux 0 2 0
Perl 5.008, Mac OS X 0 1 0
Perl 5.008, NetBSD 0 1 0
Perl 5.010, Debian GNU/kFreeBSD 0 2 0
Perl 5.010, FreeBSD 0 7 0
Perl 5.010, GNU/Linux 0 3 0
Perl 5.010, NetBSD 0 5 0
Perl 5.012, FreeBSD 0 6 0
Perl 5.012, GNU/Linux 3 2 0
Perl 5.012, NetBSD 0 26 0
Perl 5.014, FreeBSD 0 2 0
Perl 5.014, GNU/Linux 2 5 0
Perl 5.014, Mac OS X 0 2 0
Perl 5.014, NetBSD 0 20 0
Perl 5.016, FreeBSD 0 5 0
Perl 5.016, GNU/Linux 2 3 0
Perl 5.016, Mac OS X 0 3 0
Perl 5.016, NetBSD 0 16 0
Perl 5.018, FreeBSD 0 12 0
Perl 5.018, GNU/Linux 3 7 0
Perl 5.018, Mac OS X 0 2 0
Perl 5.018, NetBSD 0 15 0
Perl 5.020, Debian GNU/kFreeBSD 0 16 0
Perl 5.020, FreeBSD 0 16 0
Perl 5.020, GNU/Linux 6 14 0
Perl 5.020, Mac OS X 0 4 0
Perl 5.020, NetBSD 0 8 0
Perl 5.022, Debian GNU/kFreeBSD 0 8 0
Perl 5.022, FreeBSD 0 7 0
Perl 5.022, GNU/Linux 5 16 0
Perl 5.022, Mac OS X 0 2 0
Perl 5.022, NetBSD 0 4 0
Perl 5.023, FreeBSD 0 2 0
Perl 5.023, GNU/Linux 26 18 0
Perl 5.023, Mac OS X 0 1 0
So now we have it broken down by Perl version and OS. What about other breakdowns?
The first thing I added, a couple of years back, was an option to break down by threaded Perls vs non-threaded Perls. To do that, I had to tweak the run method of the class. There’s 3 major places you have to update to add a new switch: you have to have a local variable for it, you have to bind that variable to the switch name in the call to Getopt::Long’s GetOptionsFromArray, and then you have to pass the variable into the class constructor. Of course, you also have to make a new attribute to hold it, so I suppose it’s really 4 places.2 After all that, the class now knows that you passed in a new switch when you use it on the command line, but now we have to actually tweak the code to do something with it.
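A minimal sketch of those four places, not the script's actual code: My::Report and parse_args are hypothetical stand-ins invented for illustration, and only the Getopt::Long call is a real API.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical stand-in for the script's class; only the bits needed to
# show where a new switch has to be threaded through.
package My::Report {
    sub new {
        my ($class, %attr) = @_;
        # place 4: an attribute to hold the new setting
        return bless { os_data => 0, threaded => 0, %attr }, $class;
    }
    sub os_data  { $_[0]{os_data}  }
    sub threaded { $_[0]{threaded} }
}

sub parse_args {
    my (@argv) = @_;
    my ($os, $threaded) = (0, 0);       # place 1: a local variable
    GetOptionsFromArray(\@argv,
        'os'       => \$os,
        'threaded' => \$threaded,       # place 2: bind it to the switch name
    ) or die "bad options\n";
    my $report = My::Report->new(
        os_data  => $os,
        threaded => $threaded,          # place 3: pass it to the constructor
    );
    return ($report, @argv);            # leftover args, e.g. the dist name
}

my ($report, @rest) = parse_args('--threaded', 'Date::Easy');
printf "threaded=%d os=%d dist=%s\n",
    $report->threaded, $report->os_data, $rest[0];
```

GetOptionsFromArray strips the switches it recognizes out of the array, so whatever is left over is the positional argument list.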
This involves mods to the version_data method, which is what actually populates the numbers for that table up there. Once the data is populated correctly, the version_report method does the actual printing, but that won’t need any mods at all. We just have to get the key in the hash returned by version_data right and we’re golden.
tobyink’s original code looked like this:
my $key = $self->os_data
    ? sprintf("Perl 5.%03d, %s", $pv, $_->{ostext})
    : sprintf("Perl 5.%03d", $pv);
If we want to add another breakdown, we just need to extend that ternary ... and, honestly, that’s exactly how I did it at first. But you see how it can get messy real fast: we’re taking 2 possibilities (version, or version + OS) and turning it into 4: version, version + OS, version + threaded, or all three. So I eventually just created two new arrays and built the key like so:
push @key_fmt, "Perl 5.%03d";
push @args, $pv;
if ($self->os_data)
{
    push @key_fmt, "%s";
    push @args, $_->{ostext};
}
if ($self->threaded)
{
    push @key_fmt, "%s";
    push @args, $_->{platform} =~ /thread/ ? 'threaded' : 'non-thread';
}
my $key = sprintf(join(', ', @key_fmt), @args);
Now we can break down by threaded or not:
[cibola:~/proj/date-easy] cpan-testers --threaded Date::Easy
CPAN Testers results for Date-Easy version 0.01
PASS FAIL OTHER TOTAL
Perl 5.008, non-thread 0 6 0 6
Perl 5.008, threaded 0 6 0 6
Perl 5.010, non-thread 0 11 0 11
Perl 5.010, threaded 0 6 0 6
Perl 5.012, non-thread 2 18 0 20
Perl 5.012, threaded 1 16 0 17
Perl 5.014, non-thread 2 14 0 16
Perl 5.014, threaded 0 15 0 15
Perl 5.016, non-thread 1 13 0 14
Perl 5.016, threaded 1 14 0 15
Perl 5.018, non-thread 2 19 0 21
Perl 5.018, threaded 1 17 0 18
Perl 5.020, non-thread 4 29 0 33
Perl 5.020, threaded 2 29 0 31
Perl 5.022, non-thread 3 19 0 22
Perl 5.022, threaded 2 18 0 20
Perl 5.023, non-thread 17 11 0 28
Perl 5.023, threaded 9 10 0 19
And the best part is, the code is now super-easy to extend. How about if we actually don’t care about the Perl version? Simple: just slap a conditional around those first two lines:
if ($self->perl_ver)
{
    push @key_fmt, "Perl 5.%03d";
    push @args, $pv;
}
and add a new switch, and voilà:
[cibola:~/proj/date-easy] cpan-testers --no-perlver --threaded Date::Easy
CPAN Testers results for Date-Easy version 0.01
PASS FAIL OTHER TOTAL
non-thread 31 140 0 171
threaded 16 131 0 147
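To see the key-building approach end to end, here is a small self-contained sketch. The sample rows and the tally function are made up for illustration (the real script works from the CPAN Testers JSON summary), but the format-and-args technique is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical sample rows, shaped loosely like the summary records the post
# describes (perl minor version, OS text, platform string, status).
my @results = (
    { pv => 20, ostext => 'GNU/Linux', platform => 'x86_64-linux',               status => 'PASS' },
    { pv => 20, ostext => 'FreeBSD',   platform => 'amd64-freebsd-thread-multi', status => 'FAIL' },
    { pv => 22, ostext => 'GNU/Linux', platform => 'x86_64-linux-thread-multi',  status => 'FAIL' },
    { pv => 22, ostext => 'NetBSD',    platform => 'amd64-netbsd',               status => 'FAIL' },
);

# Build a tally keyed on whichever breakdowns are switched on.
sub tally {
    my ($opt, @rows) = @_;
    my %count;
    for my $r (@rows) {
        my (@key_fmt, @args);
        if ($opt->{perl_ver}) { push @key_fmt, "Perl 5.%03d"; push @args, $r->{pv} }
        if ($opt->{os})       { push @key_fmt, "%s";          push @args, $r->{ostext} }
        if ($opt->{threaded}) {
            push @key_fmt, "%s";
            push @args, $r->{platform} =~ /thread/ ? 'threaded' : 'non-thread';
        }
        my $key = sprintf(join(', ', @key_fmt), @args);
        # column 0 = PASS, 1 = FAIL, 2 = anything else
        my $col = { PASS => 0, FAIL => 1 }->{ $r->{status} } // 2;
        $count{$key} //= [0, 0, 0];
        $count{$key}[$col]++;
    }
    return \%count;
}

my $by_thread = tally({ threaded => 1 }, @results);
printf "%-12s %d %d %d\n", $_, @{ $by_thread->{$_} } for sort keys %$by_thread;
```

Each new breakdown costs one more `if` block, and any combination of switches produces a sensible key without touching the others.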
Okay, but what if we want to get really wacky about how to break down things? For a while I was fixated on a line in the reports called config_args: this tells you what arguments were passed to ./Configure before building the Perl in question.3 It turns out that this was a red herring, but, by making my script capable of looking at this variable, I was able to come up with some convincing (if wrong) statistics.
The trick was that tobyink’s original was just pulling down the JSON summary of the reports; it wasn’t looking at individual reports themselves. So I wrote some code to do that:
sub _get_web_data
{
    my ($self, $uri) = @_;
    my $data;
    for (1..5) { $data = LWP::Simple::get($uri) and last }
    die "Failed to retrieve URI $uri\n" unless $data;
    return $data;
}

sub get_report
{
    my ($self, $result) = @_;
    my $guid = $result->{guid};
    my $uri = $self->report_uri($guid);
    my $file = $self->_cache_filename($guid);
    unless (-r $file)
    {
        $file->spew($self->_get_web_data($uri));
    }
    return scalar $file->slurp;
}

sub report_uri
{
    my ($self, $guid) = @_;
    return "http://cpantesters.org/cpan/report/$guid";
}
Pretty basic. I give the retrieval a few tries, because sometimes the CPAN Testers server gets a little overwhelmed. (Not that I blame it: it’s processing a massive quantity of these reports, and they’re coming in pretty much constantly.) I stick the results in a cachefile so I don’t ever bother retrieving the same report more than once per run.4 The URL for the report is a pretty simple one, based on the GUID in the JSON summary data.
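The retry-then-cache pattern can be tried out offline with a sketch like the one below. It is an illustration only: fetch_with_retry and cached_report are hypothetical names, the fetcher is a fake code ref standing in for the real HTTP call, and the cache is a throwaway temp directory using only core modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Try a fetcher up to $tries times; return the first true result.
sub fetch_with_retry {
    my ($fetch, $uri, $tries) = @_;
    for (1 .. $tries) {
        my $data = $fetch->($uri);
        return $data if $data;
    }
    die "Failed to retrieve URI $uri\n";
}

# Return the cached copy if there is one; otherwise fetch and store it.
sub cached_report {
    my ($cache_dir, $guid, $fetch) = @_;
    my $file = File::Spec->catfile($cache_dir, "$guid.txt");
    unless (-r $file) {
        my $data = fetch_with_retry($fetch, "report/$guid", 5);
        open my $out, '>', $file or die "can't write $file: $!";
        print {$out} $data;
        close $out or die "can't close $file: $!";
    }
    open my $in, '<', $file or die "can't read $file: $!";
    local $/;                       # slurp mode
    return scalar <$in>;
}

# A fake fetcher that fails twice before succeeding, to exercise the retry.
my $calls = 0;
my $flaky = sub { ++$calls < 3 ? undef : "report body for $_[0]" };

my $dir    = tempdir(CLEANUP => 1);
my $first  = cached_report($dir, 'abc123', $flaky);
my $second = cached_report($dir, 'abc123', $flaky);   # served from the cache
print "fetcher was called $calls times\n";
```

Passing the fetcher in as a code ref is a choice made for this sketch so it runs without a network; the script itself calls LWP::Simple::get directly.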
Now I can make a “special” breakdown ... I knew that this would be a one-off thing, so I wanted to make a generic framework for adding new breakdowns. So I added a --special switch whose argument would just fire a method with a corresponding name:
sub get_special_val
{
    my ($self, $result) = @_;
    my $spec = "special_" . $self->special =~ s/-/_/gr;
    $self->_mark_progress;
    return $self->$spec($result);
}
and, back in version_data:
if ($self->special)
{
    push @key_fmt, "%s";
    push @args, $self->get_special_val($_);
}
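Here is a stripped-down sketch of that name-based dispatch. My::Breakdown and special_os_family are hypothetical, and the `can` check is an addition that is not in the script: it turns a mistyped switch value into a readable error instead of a "can't locate object method" failure.

```perl
use strict;
use warnings;

package My::Breakdown {
    sub new     { my ($class, %attr) = @_; return bless {%attr}, $class }
    sub special { $_[0]{special} }

    # Turn a switch value such as "os-family" into a method name and call it.
    sub get_special_val {
        my ($self, $result) = @_;
        (my $name = $self->special) =~ s/-/_/g;
        my $method = $self->can("special_$name")
            or die "no such special breakdown: " . $self->special . "\n";
        return $self->$method($result);
    }

    # Hypothetical breakdown: lump the BSDs together.
    sub special_os_family {
        my ($self, $result) = @_;
        return $result->{ostext} =~ /BSD/ ? 'BSD' : 'other';
    }
}

my $bd = My::Breakdown->new(special => 'os-family');
print $bd->get_special_val({ ostext => 'NetBSD' }),    "\n";
print $bd->get_special_val({ ostext => 'GNU/Linux' }), "\n";
```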
Now, every time I write a function like this:
sub special_config_pthread
{
    my ($self, $result) = @_;
    my $report = $self->get_report($result);
    my ($args) = $report =~ /^\s*config_args='(.*)'\s*$/m;
    return "config_args missing" unless defined $args;
    return "config_args empty" unless $args;
    return ( grep { $_ eq 'pthread' } split(' ', $args) ) ? 'pthread' : 'no pthread';
}
I can invoke it like so:
[cibola:~/proj/date-easy] cpan-testers --cache --no-perlver --special config-pthread Date::Easy
processing reports ............................... done
CPAN Testers results for Date-Easy version 0.01
PASS FAIL OTHER TOTAL
config_args empty 47 4 0 51
no pthread 0 241 0 241
pthread 0 26 0 26
Of course, I might forget what special breakdown methods I’ve written. No worries: I’ll just add a line of code to the show_help method:
say foreach map { " $_" } apply { s/_/-/g } grep { s/^special_// } keys %App::CpanTesters::;
The keys %App::CpanTesters:: thing is a way to rifle through the namespace of my class. I’m cheating a bit by not bothering to distinguish methods from any other types of variables, but I know I didn’t name anything “special_” something other than those methods. So I try to remove that prefix and, assuming it’s successful, change all the underscores to dashes, toss in a few leading spaces, and print the lot. If you’re not familiar with apply (from List::MoreUtils), think of it as just a nicer way of saying map { s/_/-/g; $_ }.5
This is functional programming, and some people find it very easy to grasp. Your mileage may vary, of course, but I’m starting to enjoy its concision of expression more and more as I do more and more with it.6
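For anyone who would rather not cheat, a variant of the same symbol-table trick can check that each name really is a sub. This is only a sketch: My::App and special_names are hypothetical, and it uses plain map with /r rather than apply so that it needs nothing outside core Perl 5.14.

```perl
use strict;
use warnings;

package My::App {
    sub special_config_pthread { }
    sub special_os_family      { }
    sub show_help              { }
    our $special_note = 'not a sub';    # shares the prefix but is not a method
}

# List the switch-style names of every sub in a package whose name starts
# with "special_", checking the CODE slot so plain variables are skipped.
sub special_names {
    my ($pkg) = @_;
    no strict 'refs';
    return sort
           map  { s/_/-/gr }
           map  { /^special_(.+)/ ? $1 : () }
           grep { defined &{"${pkg}::$_"} }
           keys %{"${pkg}::"};
}

print "  $_\n" for special_names('My::App');
```

Read the pipeline bottom to top: take every name in the package, keep the ones that are subs, keep the part after the prefix, and swap underscores for dashes.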
One final feature I want to mention here: I found as I dug in that not all CPAN Testers failures are equal. Specifically, my failures came in 3 distinct groups, and one group was much larger than the others. But how to isolate just the one failure I was interested in? All the breakdowns show us totals for passing reports and failing reports. But what if I only care about certain failing reports?
So I added yet another switch: --failure, whose argument is an arbitrary string to search for.7 As long as you can isolate some string that the failures you care about have, and the failures you don’t care about don’t have, you can use that to show breakdowns by which reports have the string (and are therefore presumably fails) and which don’t (whether they’re fails or passes or what-have-you). To achieve this, we just need to turn this line of code:
my $num = { PASS => 0, FAIL => 1 }->{$_->{status}} // 2;
into this:
my $num = $self->failure_string
    ? $self->check_expected_failure($_)
    : { PASS => 0, FAIL => 1 }->{$_->{status}} // 2;
and then add this method:
sub check_expected_failure
{
    my ($self, $result) = @_;
    state $regex = qr/${\($self->failure_string)}/;
    my $report = $self->get_report($result);
    $self->_mark_progress;
    return $report =~ $regex ? 1 : 0;
}
And we’re set.8 And here’s what it looks like in action:
[cibola:~/proj/date-easy] cpan-testers --cache --failure "can still use parsedate normally" Date::Easy
processing reports ............................... done
CPAN Testers results for Date-Easy version 0.01
Does report have failure: can still use parsedate normally
W/O WITH TOTAL
Perl 5.008 0 12 12
Perl 5.010 0 17 17
Perl 5.012 3 34 37
Perl 5.014 3 28 31
Perl 5.016 3 26 29
Perl 5.018 3 36 39
Perl 5.020 45 19 64
Perl 5.022 6 36 42
Perl 5.023 27 20 47
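The W/O and WITH columns boil down to a two-bucket count per key. A tiny sketch of that, using invented report bodies and a hypothetical with_without function in place of the real reports:

```perl
use strict;
use warnings;

# Hypothetical report bodies, each paired with its Perl version label.
my @reports = (
    [ 'Perl 5.020', "not ok 3 - can still use parsedate normally\n" ],
    [ 'Perl 5.020', "All tests successful.\n" ],
    [ 'Perl 5.022', "not ok 7 - some other test\n" ],
);

# Count, per key, how many reports lack (column 0) or contain (column 1)
# the pattern. The pattern is compiled once, before the loop.
sub with_without {
    my ($pattern, @rows) = @_;
    my $regex = qr/$pattern/;
    my %count;
    for my $row (@rows) {
        my ($key, $body) = @$row;
        $count{$key} //= [0, 0];
        $count{$key}[ $body =~ $regex ? 1 : 0 ]++;
    }
    return \%count;
}

my $c = with_without('can still use parsedate normally', @reports);
printf "%-10s %d %d\n", $_, @{ $c->{$_} } for sort keys %$c;
```

Since the argument is treated as a regex, a search string containing characters like `(` or `.` would need escaping (quotemeta does that) if you mean them literally.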
So, despite the fact that this didn’t actually help me track down what was causing my failures, it was an interesting yak to shave, and I think it may be helpful to some of you folks out there as well. Which is why I decided to take a little time out from fiddling with Date::Easy to tell you about it.
Next time, we’ll talk about what really went wrong, and how I fixed it.
(If you want to see the great feedback I got from the excellent folks on the CPAN Testers mailing list, the entire discussion is here.)
__________
1 Which sounds like an impossibility, but I’ve seen it. It’s kinda freaky.
2 This is where you start to appreciate systems like MooseX::App::Cmd. But I suppose tobyink felt it was overkill for this simple little script, and I haven’t found the time or energy to want to rewrite it yet. Although it’s grown to the point where something along those lines is not so crazy any more.
3 Thanks to Perl master Tony Cook for explaining this to me on the CPAN Testers mailing list.
4 In tobyink’s original, all cachefiles were deleted at the end of every run. My version adds an option to make the cache permanent, which can save you loads of time, if you’re running breakdowns that need to look at individual reports and you have to run them several times. But right now it never cleans up that cache, so I didn’t think it wise to make that the default until I addressed that issue.
5 Of course, since I upped my minimum required version of Perl to 5.14, I could have used the super-cool new /r modifier like so: map { s/_/-/gr } and it would have had the same effect. But using apply in this case just reads a little more naturally, to me.
6 For an even wackier use of functional programming, check out the version_report method. Once you know that $self->_full_report_info is a hash where the values are arrayrefs containing lists of two-element arrayrefs, the code is fairly easy to follow. Or at least I find it to be so.
7 Technically, it’s a regex. So you can get a bit fancy if you need to.
8 The $self->_mark_progress bit is the part that prints the dots so I don’t get bored while waiting for it to churn through all those reports.
The matrix is only the first step. You can try the more detailed and sophisticated analysis of the reports at (probably still experimental)
http://analysis.cpantesters.org/solved?distv=Date-Easy-0.01
Good tip. Not being a serious stats person, some of that is Greek to me.[1] But I bet lots of other folks out there will find it useful. :-)
[1] Like, what's a "theta"? Oh, hey: that really is Greek. Heheheheheh.
This tool looks nice. Will you put it on CPAN?
It actually already is on CPAN, although obviously I can't recommend you get the current version. I hope to have the corrected version up within the next few days or so.
But you can get it now if you really want to: it's searchable on search.cpan.org, but not on MetaCPAN for some reason ... I plan to address that, but I figured I ought to concentrate on making it work first. ;-> But you can get to it on MetaCPAN if you know the URL: https://metacpan.org/pod/release/BAREFOOT/Date-Easy-0.01/lib/Date/Easy.pm
> It actually already is on CPAN
I think preaction was referring to the tester analysis tool.
But the reason why metacpan isn't showing your distribution is because it's not in the index. Why/how it didn't get indexed should be revealed by the email you got back from PAUSE when you did the upload.
Ah, my bad then. Will I put the cpan-testers script on CPAN? I suppose I might, although right now, since it's completely self-contained in a single file, having it on CPAN doesn't make it easier to install; plus I'm not the original author so I feel like I should probably talk to tobyink about it first. But, maybe at some point, yes.
Sure. Like I said, I'm going to work on that aspect after I get things working a bit better. Priorities, don'cha know. :-)