A Random Story

For my first (real) tale, I thought I’d tell the story of how I came to be the maintainer for Data::Random.

I first came across this module when I was trying to find some way to test some date routines I was doing for $work.  Funnily enough (or not, depending on your perspective), the date routines were themselves for testing—I’m working on introducing TDD to my department, and so I try to maintain a decent library to make testing easier.  To that end, I decided to make it easier to set up records with certain dates:

    # go far back in time to avoid conflicting with real rates
    my $initial_date = '1/1/1990';
    $prop->create_billing_period(
        start_date => days_from($initial_date, 30),
        end_date => days_from($initial_date, 60)
    );

That sort of thing.  And of course you must use TDD on your test function helpers: if you can’t trust your testing code, you’re in pretty big trouble.  So I wanted to test my days_from method very thoroughly.
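For the curious, here is a minimal sketch of what a helper like days_from could look like. This is not the actual $work code; it assumes dates are M/D/YYYY strings (as in the snippet above) and that the offset is a whole number of days, and it uses only the core Time::Local module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical stand-in for the days_from helper: takes an M/D/YYYY
# string and a day offset, returns the shifted date in the same format.
sub days_from {
    my ($date, $days) = @_;
    my ($mon, $mday, $year) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or die "unrecognized date: $date";

    # Work in UTC so daylight-saving changes cannot shorten or lengthen a day.
    my $epoch = timegm(0, 0, 0, $mday, $mon - 1, $year) + $days * 86_400;
    my ($d, $m, $y) = (gmtime $epoch)[3, 4, 5];
    return sprintf '%d/%d/%d', $m + 1, $d, $y + 1900;
}

print days_from('1/1/1990', 30), "\n";
print days_from('1/1/1990', 60), "\n";
```

Even something this small has month lengths and leap years hiding in it, which is why throwing lots of random dates at it is appealing.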


So I thought, why not come up with a function to create random dates?  And then I thought, I’m sure someone’s already done that.  So I looked on CPAN.  And I found Data::Random.  So I used it, and everyone lived happily ever after.

No, wait, that’s not right ...

Well, we all were living happily ever after for a while.  Then, one day, I’m running some tests and I start getting these weird Moose errors in a new module one of my colleagues has written.  I grabbed said colleague and we started debugging.  My code called something which had a use Test::Rent::Config in it.  Test::Rent::Config is a test object designed to replace a Rent::Config, naturally enough, so it subclasses that class and overrides a few methods here and there.  But the code kept complaining about a method being missing.  There it was, right there in Rent::Config, but the code couldn’t find it.  What the heck??  Well, after nearly two hours of staring at the code, and digging into internal Moose code (’cause it had to be something tricky going on, and we all know there’s few things trickier than Moose guts), and putting debugging statements all over, and fiddling with the debugger (always dicey when trying to catch compile-time errors), we finally figured it out: Data::Random contained this line:

    use lib qw(..);

And I just happened to be running my tests in the directory lib/Test/t, where .. resolves to lib/Test.  Which means that when Test::Rent::Config tried to load Rent::Config, Perl found lib/Test/Rent/Config.pm first, so it actually loaded ... itself.
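If that seems too strange to be true, the lookup can be reproduced with a throwaway directory tree. The sketch below is only an illustration: first_match is a made-up helper that imitates the way require walks a list of directories in order, and the tree it builds merely has the same shape as the one in the story.

```perl
use strict;
use warnings;
use Cwd qw(getcwd abs_path);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Imitate the search that require performs: return the first file found
# by trying each @INC-style directory in order.
sub first_match {
    my ($rel_path, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $rel_path);
        return abs_path($candidate) if -f $candidate;
    }
    return;
}

# Build a throwaway tree shaped like the one in the story.
my $root = tempdir(CLEANUP => 1);
my $lib  = File::Spec->catdir($root, 'lib');
make_path(
    File::Spec->catdir($lib, 'Rent'),
    File::Spec->catdir($lib, 'Test', 'Rent'),
    File::Spec->catdir($lib, 'Test', 't'),
);
for my $file ([qw(Rent Config.pm)], [qw(Test Rent Config.pm)]) {
    my $path = File::Spec->catfile($lib, @$file);
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} "1;\n";
    close $fh;
}

my $start = getcwd();
chdir File::Spec->catdir($lib, 'Test', 't') or die "chdir failed: $!";

# With '..' searched first, Rent/Config.pm resolves to lib/Test/Rent/Config.pm.
my $shadowed = first_match('Rent/Config.pm', '..', $lib);
# Without it, the real lib/Rent/Config.pm is found.
my $intended = first_match('Rent/Config.pm', $lib);

chdir $start or die "chdir failed: $!";
print "with '..': $shadowed\nwithout:   $intended\n";
```

The two paths differ only because of the current directory, which is exactly why the bug stayed hidden for so long.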


Crap.

Obviously the fix was simple: remove that line.  What was it doing in there anyway?  Probably something that had been put in during testing, and just never taken out.  No one had noticed it in the past 8 years, apparently, because it takes a very special set of circumstances to trigger this bug.  Lucky me.

So I put in a hack fix (something along the lines of shift @INC if $INC[0] eq '..') and decided to open an RT bug.  Didn’t expect it to do me a lot of good, of course, since a) there had been no CPAN release in 8 years, b) the author only had one other module and it hadn’t seen a new release in 10 years, and c) there were 5 open RT bugs, ranging from 4 to 8 years old, none of which had any responses from the author.

Not that I’m casting any aspersions on the author, of course.  People move on; their lives get complicated.  Sometimes shit (and real life) happens.  In fact, I thought I’d poke around Google and see if I could see what this particular author was up to these days, just for fun.  (And also because I figured an 8-year-old email address didn’t have much chance of working.) And I found him: he’s currently the CEO of a successful company in San Francisco.  Well, heck: no wonder he didn’t have time to fiddle with CPAN distributions any more.  He’s busy building a business empire and whatnot.

So I sent him an email.  I used 3 addresses: first, the one in the module’s POD, even though I felt pretty sure it was outdated; and secondly, two wild guesses based on the fact that I knew the author’s full name, and I now knew the domain name for his new company.  My email said, hey, I see you haven’t been around CPAN for a while; perhaps you’re interested in passing on maintainership of this module?  Lo and behold, he responded.  Sure, he said, take it over; I don’t have much time for Perl any more.  By the way ... how do I even do that?

Well, heck, I dunno ... but I figured I ought to figure it out for him.  He was a busy CEO, after all; I was the one who wanted my bug fixed.  So I got to poking around PAUSE.  Didn’t take long to find it: “Change Permissions,” it’s called.  So I explained to him how to do it (and prayed he could remember his PAUSE password), and, next thing you know, I was the primary maintainer.

Great, now to fix the bug.  First thing I need, of course, is source code.  I could ask the author if he had a repository I could start from, but it certainly wouldn’t be in Git (on account of, you know, Git didn’t exist 8 years ago), so best case scenario is I’d have to import it from another VCS.  Alternatively, I could take advantage of the awesomeness that is gitPAN.  So I started a discussion about it, and eventually just forked the gitPAN repo and called it a day.

Now I was ready to fix the bug.  First, I created a failing test, then I simply removed the offending line.  Simple enough.  I made a few packaging changes for easier releasing and was ready to start.

First, a developer release.  I don’t know much about how to release stuff on CPAN, but one thing I do know: first thing you do is make a developer release (i.e. a release number with an underscore in it), then you see how many FAIL reports you get back from CPAN Testers.  This also helps make sure you’ve got all your packaging stuff just right and don’t end up with a half-broken official CPAN release ... which is exactly what I did on my first try.  My MANIFEST file was wrong and the install failed for everyone.  Okay, second try.

This time I got a more subtle problem.  A few of the CPAN Testers reports showed a test failing, but that same test passed on all the others.  I looked for patterns in versions of Perl, operating systems, etc., but found none.  What could it be?  I looked more closely at the test in question.  It was a test of rand_time, which was a method I personally had not been using, but, looking at the test, I could easily see what was going on.  The test generated a random time between now and now, which was essentially guaranteed to return the current time.  It then compared the results against the current time.  But the problem with that is that there were two different places where “the current time” was being captured, and, every once in a great while, it was possible for the clock to roll over to a new second in between the two places.  Thus, intermittent failures.  I’d seen this sort of error before when messing around with datetimes, luckily, so I recognized it fairly easily.
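Here is a small sketch of that race and one common way around it. Nothing in it comes from the Data::Random test suite: current_hms is a hypothetical stand-in for code that reads the clock internally, and the fake clock deliberately ticks over on every read to simulate the unlucky case. The workaround shown (read the clock before and after the call, then accept anything in between) is just one option, not necessarily the rewrite I ended up with.

```perl
use strict;
use warnings;

# Format an epoch value as HH:MM:SS (UTC keeps the example deterministic).
sub hms {
    my ($epoch) = @_;
    my ($s, $m, $h) = (gmtime $epoch)[0, 1, 2];
    return sprintf '%02d:%02d:%02d', $h, $m, $s;
}

# Hypothetical code under test: reads the clock itself and formats it.
sub current_hms {
    my ($clock) = @_;
    return hms($clock->());
}

# Fake clock that advances one second on every read, simulating the
# case where the real clock rolls over between two readings.
my $tick  = 1_000_000_000;
my $clock = sub { return $tick++ };

# Fragile: the expectation comes from a second, later reading of the clock.
my $got_fragile = current_hms($clock);
my $fragile_ok  = $got_fragile eq hms($clock->()) ? 1 : 0;

# More robust: bracket the call with two readings and accept any second
# that falls between them.
my $before     = $clock->();
my $got_robust = current_hms($clock);
my $after      = $clock->();
my %acceptable = map { hms($_) => 1 } $before .. $after;
my $robust_ok  = exists $acceptable{$got_robust} ? 1 : 0;

print "fragile check passed: $fragile_ok\n";
print "robust check passed:  $robust_ok\n";
```

With a real clock the fragile version passes almost every time, which is what makes this kind of failure so hard to spot in CPAN Testers reports.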

Okay, fine: rewrite that test.  But there was another test having intermittent failures too: it turned out that the test was making sure that the random time was defined by testing its truth or falsehood.  But what if the random time returned 0 seconds after midnight?  That is, 00:00:00, also known as: 0 seconds.  A.k.a., “false.” The test was only generating a very small number of random samples to test, so it was very rare that it just so happened to get 0 back as an answer.
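The difference between "is it true?" and "is it defined?" is easy to see in isolation. The sketch below assumes the value being checked is a plain number of seconds after midnight; both helper names are made up for the illustration.

```perl
use strict;
use warnings;

# The check the old test effectively made: is the value true?
sub passes_truth_check   { my ($v) = @_; return $v ? 1 : 0 }

# The check it meant to make: is the value defined at all?
sub passes_defined_check { my ($v) = @_; return defined $v ? 1 : 0 }

# Hypothetical results, as seconds after midnight; undef is a real failure.
my @samples = (0, 1, 86_399, undef);

for my $secs (@samples) {
    my $label = defined $secs ? $secs : 'undef';
    printf "%-6s truthy=%d defined=%d\n",
        $label, passes_truth_check($secs), passes_defined_check($secs);
}
```

Only the defined check treats 0 as a legitimate answer while still rejecting undef.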

Obviously the whole test file needed to be rewritten.  I decided to generate 5 times as many random numbers as there were possible values, figuring that that would practically guarantee me generating all possible values.  I left the undefined/false bug to make sure I tripped it, then ran my new test several times.  The current time bug was fixed, but I wasn’t getting that 0 seconds failure reliably enough.  I kept on upping it, until I got the failure every time; I ended up at generating 10 random times for every possible value.  Of course, now the problem was that the test took forever to complete.  Don’t want people to have to sit and stare at the CPAN shell for 2 minutes while my tests are running, right?  But isn’t there a way to have long-running tests only run sometimes?  Well, sure there is!  Just check $ENV{AUTOMATED_TESTING}.  Rock on.  So now my test does 10x as many tests as possible values when AUTOMATED_TESTING is on, but only half as many tests as possible values when it’s off.  Cool.
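As a rough sketch of that arrangement: sample_count picks the multiplier from the AUTOMATED_TESTING environment variable, and miss_probability estimates how likely it is that one particular value (such as 0) never turns up. Both function names are invented for this example, and the estimate assumes independent, uniformly distributed draws, which may not match the real generator exactly.

```perl
use strict;
use warnings;

# Number of random samples to draw for a range with $possible distinct
# values. The 10x and 0.5x multipliers are the ones from the story.
sub sample_count {
    my ($possible, $automated) = @_;
    my $multiplier = $automated ? 10 : 0.5;
    return int($possible * $multiplier);
}

# Chance that one particular value never appears in $samples independent
# uniform draws over $possible values (an assumed, simplified model).
sub miss_probability {
    my ($possible, $samples) = @_;
    return (1 - 1 / $possible) ** $samples;
}

my $possible = 86_400;    # seconds in a day
my $samples  = sample_count($possible, $ENV{AUTOMATED_TESTING});
printf "drawing %d samples\n", $samples;

for my $factor (5, 10) {
    printf "chance of never seeing 0 at %2dx: %.5f\n",
        $factor, miss_probability($possible, $factor * $possible);
}
```

Under that model the miss chance is roughly e^-5 at 5x and e^-10 at 10x, so each bump in the multiplier makes a missed value dramatically less likely.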

My third try had another packaging failure.  When I rewrote the test file, I naturally updated it to use the done_testing feature of modern Test::More versions, but I forgot to change the Test::More version number (to 0.88) under test_requires in the Makefile.PL.  D’oh!

My fourth try had a number of insane failures from CPAN Testers.  You can read the full discussion if you really care about the details, but let me summarize it for you: There are 86,400 seconds in a day.  If you run a test file that generates 10x as many random time values as there are possible values, that’s 864,000 tests (and that’s only the first range I was testing).  If you have a test file, using Test::More, that actually generates around a million tests, you will run out of memory.  Bummer.

So I rewrote the test file such that every range of tests was one big test.  And, finally, on dev release number 5, it worked.  Thank <insert deity of choice>.
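The idea of "one big test per range" can be sketched like this. It is an illustration rather than the actual test file: out_of_range is a made-up helper, and the generator is a stand-in for the code under test.

```perl
use strict;
use warnings;

# Check every sample against an inclusive range and return the offenders,
# so the caller can make a single assertion for the whole batch.
sub out_of_range {
    my ($min, $max, @values) = @_;
    return grep { !defined $_ || $_ < $min || $_ > $max } @values;
}

# Hypothetical generator standing in for the code under test:
# random seconds after midnight.
my @samples = map { int rand 86_400 } 1 .. 100_000;

my @bad = out_of_range(0, 86_399, @samples);
printf "%d of %d samples out of range\n", scalar @bad, scalar @samples;
```

In a Test::More file, the last step would become a single assertion on the size of @bad, so the harness records one result per range instead of hundreds of thousands.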

Then I changed nothing but the version number, to something without an underscore, and I was done.  Except I still need to go back and remove all those extraneous dev versions from PAUSE so they’re not eating up disk space for no reason.

Anyhow, that’s the story of how I took over Data::Random, and I hope there’s something instructive in there for folks.  Taking over a module can be a bit of a chore, but I think it’s worthwhile in the end.  Perhaps if we all did that, we could clean up some of the cruft that’s out there and make our CPAN a better place to visit.

3 Comments

Nice story. I admit, I usually (about 75% of the time) take the easy route of just forking the module because I need something usable and on CPAN rather quickly. Perhaps I should be more like you.

A library should not mess with "use lib", ever.

On a rather similar note, lately I've been pissed with PHP web applications that mess around with php.ini settings (usually error_reporting or memory_limit). Changing an application setting is now not as simple as editing php.ini; you have to chase it down in the application code. To application developers: Please don't change settings or environment outside your application!

About Buddy Burden

5 years in California, 15 years in Perl, 25 years in computers, 45 years in bare feet.