A Date with CPAN, Part 5: Everything's Relative

[This is a post in my latest long-ass series.  You may want to begin at the beginning.  I do not promise that the next post in the series will be next week.  Just that I will eventually finish it, someday.  Unless I get hit by a bus.

IMPORTANT NOTE!  When I provide you links to code on GitHub, I’m giving you links to particular commits.  This allows me to show you the code as it was at the time the blog post was written and ensures that the code references will make sense in the context of this post.  Just be aware that the latest version of the code may be very different.]


Last time I actually got down to it and wrote some code: Date::Easy now has a date constructor for turning arbitrary human-readable strings into date objects.  Now it’s time to expand on that and allow even more formats.

But first, we’ll do a bit of math.

As you may recall from some of my other posts, I’m generally a fan of TDD, and I’m definitely using that here.  Since one of the things that Time::ParseDate gives us is the ability to parse relative date formats (such as “next Thursday”), for my very first test, I thought it would be cool to check that today + 1 == date("tomorrow").1  Of course, that involves making sure that == would do the right thing here, and I wasn’t quite ready for that yet.  But, more importantly,2 it meant making sure that today + 1 would do the right thing, and, while I wasn’t quite ready for that yet either, I figured I could easily enough get ready.  It’s just a (very) SMOP, right?


use overload
        '+' => \&add,
        '-' => \&subtract;
 
sub add
{
    my ($self, $rhs) = @_;
 
    return $self->_mktime($self->epoch + 86_400 * $rhs);
}
 
sub subtract
{
    my ($self, $rhs) = @_;
 
    return $self->_mktime($self->epoch - 86_400 * $rhs);
}

Really pretty simple, when you get down to it.
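
To see the date math in isolation, here is a minimal, self-contained sketch.  Toy::Day is a hypothetical stand-in (not Date::Easy::Date) that stores nothing but epoch seconds, and the `==` overload and the check on overload's third ("swapped") argument are assumptions added for illustration, not something the code above does:

```perl
use strict;
use warnings;

package Toy::Day;    # hypothetical stand-in, not Date::Easy::Date

use Carp;
use overload
    '+'  => \&add,
    '-'  => \&subtract,
    '==' => sub { $_[0]->epoch == $_[1]->epoch };

sub new   { my ($class, $epoch) = @_; return bless { epoch => $epoch }, $class }
sub epoch { $_[0]{epoch} }

sub add
{
    # Addition commutes, so overload's "swapped" flag can be ignored here.
    my ($self, $rhs) = @_;
    return Toy::Day->new($self->epoch + 86_400 * $rhs);
}

sub subtract
{
    # For "1 - $day", overload still passes the object first and sets $swapped.
    my ($self, $rhs, $swapped) = @_;
    croak "can't subtract a date from a number" if $swapped;
    return Toy::Day->new($self->epoch - 86_400 * $rhs);
}

package main;

my $day = Toy::Day->new(0);                # 1970-01-01 00:00:00 UTC
print(($day + 1)->epoch, "\n");            # 86400
print((($day + 3) - 2)->epoch, "\n");      # 86400
print "equal\n" if $day + 1 == 1 + $day;   # prints "equal"
```

Multiplying by 86_400 is only safe because the value is treated as UTC, where every day is exactly that long.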

With that out of the way, I started looking at how to integrate this with the parsing from Date::Parse.  The first thing I noticed was that Date::Parse can’t understand any format that doesn’t have a digit in it.  Therefore if you’re parsing something like “yesterday” or “last month,” Date::Parse is never going to be successful.  So why even bother trying?  So I can do this:

    elsif ( $date !~ /\d/ )
    {
        my $time = _parsedate($date);
        return Date::Easy::Date->new($time);
    }

and then just change this:

        else
        {
            die "Illegal date: $date";
        }

to this:

        else
        {
            my $time = _parsedate($date);
            # test definedness, not truth: 0 epoch seconds is a legal result
            die "Illegal date: $date" unless defined $time;
            return Date::Easy::Date->new($time);
        }
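
Put together, the dispatch amounts to "skip the strict parser when there are no digits, and fall back to the relaxed one whenever the strict one gives up."  The following is only a sketch of that control flow: `parse_strict`, `parse_relaxed`, and `parse_any` are hypothetical names, and the two lookup tables stand in for the real parsers so that the example runs without any CPAN modules.  It tests the result with `defined`, so that a result of 0 epoch seconds is not mistaken for a failure.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two real parsers: each returns epoch
# seconds on success and undef on failure.
my %strict  = ('1970-01-02' => 86_400);
my %relaxed = ('tomorrow' => 172_800, '2 days after the epoch' => 172_800);

sub parse_strict  { return $strict{ $_[0] } }
sub parse_relaxed { return $relaxed{ $_[0] } }

sub parse_any
{
    my ($string) = @_;
    my $time;

    # Without a digit the strict parser cannot succeed, so don't even ask.
    $time = parse_strict($string) if $string =~ /\d/;
    # If it was skipped, or it gave up, try the relaxed parser.
    $time = parse_relaxed($string) unless defined $time;

    die "Illegal date: $string\n" unless defined $time;
    return $time;
}

print parse_any('1970-01-02'), "\n";               # strict parser handles it
print parse_any('tomorrow'), "\n";                 # no digits: relaxed only
print parse_any('2 days after the epoch'), "\n";   # strict fails, relaxed succeeds
print "error: $@" unless eval { parse_any('gibberish'); 1 };
```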

Of course the main event in this case is the implementation of _parsedate itself.  You may recall from last time that there were some issues with just calling Date::Parse’s str2time: I needed to parse the date string as if it were expressed in the local timezone, even if there’s an explicit timezone specified in the string.  This may sound a bit “off” at first, but remember this example:

    Mon, 21 Nov 1994 00:47:58 -0500

You want that to come back as 11/21/1994, right?  Regardless of what timezone you happen to be in ... right?  Right.
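
To make the problem concrete, here is a small sketch using only core modules.  It converts that example to an absolute instant and then asks what calendar date that instant falls on at two different UTC offsets; `date_at_offset` is a hypothetical helper, and fixed offsets are used instead of real timezones so the result doesn't depend on the machine running it.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# "Mon, 21 Nov 1994 00:47:58 -0500" as an absolute instant: read the
# wall-clock fields as UTC, then add back the five hours of the -0500 offset.
my $instant = timegm(58, 47, 0, 21, 10, 1994) + 5 * 3600;

# Calendar date of an instant as seen from a fixed UTC offset (in hours).
sub date_at_offset
{
    my ($epoch, $hours) = @_;
    return strftime('%m/%d/%Y', gmtime($epoch + $hours * 3600));
}

print date_at_offset($instant, -5), "\n";   # 11/21/1994
print date_at_offset($instant, -8), "\n";   # 11/20/1994
```

Someone eight hours behind UTC who honored the timezone in the string would get the 20th, which is the surprise that parsing "as if local" avoids.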


Well, I have the same problem with calling Time::ParseDate’s parsedate.  Now, recall that last time I outlined my choices for how to handle this problem as follows:

I decided I had 3 options:3
  1. call strptime and str2time
  2. copy some code from str2time into Date::Easy::Date
  3. do Something Devious (like monkeypatching Date::Parse with a wrapper around strptime)

And I eventually settled on #2, because #1 was radically inefficient (although correct) and #3 just felt like it would introduce too much icky.

We have a similar problem with parsedate, but the analogy is not exact.  For one thing, parsedate doesn’t call a subfunction which returns the majority of the interesting bits.  So solution #1 isn’t really relevant here.4  So that just leaves us with #2 and #3.

Oh, and I managed to think of another option ...  See, it occurred to me that, if push came to shove, I could always steal my testing methodology.  When writing unit tests, one often has to come up with an alternate method of arriving at the same destination; otherwise, you end up testing that A == A, which is obviously not very helpful.  For the unit tests for date that I stole from Date::Parse (and of course I’ll be doing the same thing for Time::ParseDate), the proper answers are provided for me ... except that they’re the proper answers for datetime, which will be very nice when I get to that part, but not so useful here.  So either I have to calculate the proper answer for a whole lot of tests by hand,5 or I have to get code to calculate the proper answers, preferably in some other way than the code under test calculates them.6

What I hit upon was just to remove all timezone markers from the test string.  I did this by going through the code and picking out the regexes it used to identify timezones.  Then I removed anything matching those same regexes.  The code (for str2time-like parsing) looks like this:

    lives_ok { $t = date($_) } "parse survival: $_";
    # figure out what the proper date *should* be by dropping any timezone specifier
    (my $proper = $_) =~ s/ ( [+-] \d{4} | [A-Z]{3} | [+-] \d{4} \h \( [A-Z]{3} \) | Z ) $//x;
    is $t->strftime($FMT), Time::Piece->_mktime(str2time($proper), 1)->strftime($FMT), "successful parse: $_"
            or diag("compared against parse of: $proper");
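
To get a feel for what that substitution does, here is the same pattern wrapped in a function and run over a few strings.  `strip_tz` is a hypothetical name, and the sample strings are made up for illustration rather than taken from the stolen tests.

```perl
use strict;
use warnings;

# Same pattern as in the test above, wrapped in a function for experimenting.
sub strip_tz
{
    my ($string) = @_;
    (my $proper = $string) =~ s/ ( [+-] \d{4} | [A-Z]{3} | [+-] \d{4} \h \( [A-Z]{3} \) | Z ) $//x;
    return $proper;
}

for my $string (
    'Mon, 21 Nov 1994 00:47:58 -0500',
    '21 Nov 1994 00:47:58 EST',
    '21 Nov 1994 00:47:58 +0100 (BST)',
    '1994-11-21T00:47:58Z',
    '21 NOV',
) {
    printf "%-36s => '%s'\n", "'$string'", strip_tz($string);
}
```

Two things stand out when you run it.  The whitespace in front of the zone is left behind, so the first three strings come back with a trailing space.  And the pattern is happy to eat any three capital letters at the end of the string, so '21 NOV' loses its month.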

So, looking back at this once I got to the parsedate-style parsing, it occurred to me that I could just use this method for the implementation.7  So my choices this time boiled down to:

  1. [not applicable]
  2. copy some code from parsedate into Date::Easy::Date
  3. do Something Devious (like monkeypatching Time::ParseDate with a wrapper around ... something)
  4. attempt to remove any timezone specifier in the passed-in string

Now, I put up #4 because it would work, but I considered it a last-ditch option.  Actually changing the string that the user asked me to parse is okay in a unit test, which is a very controlled circumstance: if I accidentally change the string in a way that breaks something, I’m going to know about it (because I’ll have an incorrectly failing test).  If I do it in the wild, I could end up borking something in creative and unexpected ways.  Also, it creates a dependency between my code and the code I’m integrating with.  That is, if Time::ParseDate were to ever add a new timezone specifier, then my code wouldn’t work until I added a corresponding regex on my side.  That feels wrong.  I’m sure new timezone specifiers don’t get added all that often, but on the other hand Time::ParseDate does have a tendency to change more often than Date::Parse.  Which, by the way, makes #2 a pretty hard sell as well.

Which leaves us with #3.  Now, I said last time that the reason monkey-patching felt icky was not, in my case, because I feel icky about monkey-patching in general.  In fact, I think it’s cool that Perl can do these sorts of things.  I felt it was icky because I would have to monkey-patch something in the middle of the callstack: that is, I would need my code to call str2time, which would need to call my monkey-patched code, which would then need to call strptime.  And my monkey-patched code would need to somehow communicate the results of strptime back to my actual code before passing them on to str2time, and there wasn’t any good way to do that, other than global variables ... which means there wasn’t any good way to do that at all.

But parsedate is a very different situation.  When I started looking at the code and figuring out just how to make it do what I wanted, I realized that all I really needed to do was to stop it adjusting the number of epoch seconds it was calculating for timezones.  Looking at the code, I discovered that it does this adjustment in eight places, using two different functions: Time::ParseDate::mkoff and Time::Timezone::tz_offset (Time::Timezone is part of the Time::ParseDate distro).  After some experimentation, I figured out that I didn’t actually need these functions to return 0, but rather to return the number of seconds offset for the local timezone (remember: Date::Easy::Date’s are parsed as localtime, then stored as UTC8).  Happily, Time::Timezone broke out figuring out the local timezone offset into a separate function, which meant I could just call that.  So in the end this works:

sub _parsedate
{
    require Time::ParseDate;
    require Time::Timezone;

    # Save the original functions so the patch can be reverted afterwards.
    no warnings 'redefine';
    *orig_mkoff     = \&Time::ParseDate::mkoff;
    *orig_tz_offset = \&Time::ParseDate::tz_offset;

    # Make every timezone adjustment use the local offset instead.
    *Time::ParseDate::mkoff = sub { Time::Timezone::tz_local_offset() };
    # Note: *not* *Time::Timezone::tz_offset!  Since it was already exported to the
    # Time::ParseDate namespace, that's the one we care about.
    *Time::ParseDate::tz_offset = sub { Time::Timezone::tz_local_offset() };

    my $t = scalar Time::ParseDate::parsedate(shift, DATE_REQUIRED => 1);

    # Put the originals back.
    *Time::ParseDate::mkoff     = \&orig_mkoff;
    *Time::ParseDate::tz_offset = \&orig_tz_offset;
    return $t;
}

It threw me off for a while that I needed to change Time::ParseDate::tz_offset and not Time::Timezone::tz_offset, but I got it once I thought it through.  Also note that my use of scalar is completely unnecessary, but I left it in to help remind myself that you can get funky results if you end up calling parsedate in a list context.9
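
The namespace point is easier to see with a toy.  In this sketch, Toy::Zone and Toy::Parser are hypothetical packages playing the roles of Time::Timezone and Time::ParseDate, and the glob assignment in the BEGIN block imitates what an Exporter-style import does.  It also uses `local` rather than a manual save and restore, which is a different technique from the code above: the override goes away on its own when the enclosing block exits, even if the parser dies.

```perl
use strict;
use warnings;

package Toy::Zone;       # hypothetical; plays the role of Time::Timezone
sub offset { return 3600 }

package Toy::Parser;     # hypothetical; plays the role of Time::ParseDate
BEGIN { *offset = \&Toy::Zone::offset }   # roughly what importing the function does
sub parse { return 1000 + offset() }      # calls whatever Toy::Parser::offset is right now

package main;

# Temporarily replace the named function with one returning 0, then parse.
sub patched_parse
{
    my ($name) = @_;
    no strict 'refs';
    no warnings 'redefine';
    local *{$name} = sub { 0 };   # undone automatically when this sub returns
    return Toy::Parser::parse();
}

print patched_parse('Toy::Zone::offset'),   "\n";   # 4600: parser never looks there
print patched_parse('Toy::Parser::offset'), "\n";   # 1000: this is the one it calls
print Toy::Parser::parse(), "\n";                   # 4600: patch is gone again
```

Once a function has been imported, the importing package holds its own reference to it, so replacing the entry in the package it came from changes nothing for code that was compiled against the imported name.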


Now, it’s possible that this makes something non-re-entrant, somehow.  Certainly if parsedate were to contain any calls to itself which expected to be getting back timezone-adjusted epoch seconds it would be surprised (and disappointed).  But a) it doesn’t, and b) even if it did, in this particular circumstance it’s likely that I wouldn’t want the timezone to be adjusted anyway.  Maybe there’s some multithreaded scenario where simultaneous calls to Date::Easy::Date::_parsedate and Time::ParseDate::parsedate cause the latter to get the wrong answer?  Well, maybe ... but so far I haven’t been able to produce one in my limited number of attempts.  If any of you readers who are smarter than I10 want to throw me some code that causes such a failure, I’ll use it as a unit test and see if I can come up with a solution.  But so far it seems pretty safe to me, so I’m sticking with it.

There are two other bits of intriguery that I’ll share with you.

Firstly, while running through the unit tests I stole from Time::ParseDate, I accidentally discovered a bug in the code I wrote last time.  As it turns out, completely ignoring the time portion of the parsed datetime is not quite correct.11  Remember that what I’m reproducing in my hacked-up copy of str2time is its defaulting and its validation.  Sometimes Date::Parse::strptime can return an illegal month, like -1, and then you need to throw an error rather than proceeding.  Well, it can also return an hour of -1, or something similar, so by throwing away the time portion altogether I was letting through some otherwise illegal dates.  I discovered this because there are, of course, some formats that str2time and parsedate have in common.  Since Date::Easy prefers str2time, I wanted to skip any unit test of a string that str2time would handle and test only those that were actually making it to parsedate.  I figured out which were which by simply passing the string to str2time and only continuing if that returned undef.  But of course failing to validate the time meant that str2time might fail but Date::Easy::Date’s _strptime might succeed.  Tricky.  So I simply made a quick change to actually validate the seconds, minutes, and hours (but otherwise ignore them).
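
As a rough illustration of "validate but otherwise ignore," here is a sketch of a range check over the fields in the order strptime returns them (seconds, minutes, hours, day, zero-based month).  `fields_are_sane` is a hypothetical helper and the exact ranges are assumptions; this is not the check that Date::Parse or Date::Easy actually performs.

```perl
use strict;
use warnings;

# Fields arrive in strptime's order; undef means "not present in the string"
# and is left for the defaulting logic to deal with.
sub fields_are_sane
{
    my ($ss, $mm, $hh, $day, $month) = @_;
    my @checks = (
        [ $ss,    0, 59 ],
        [ $mm,    0, 59 ],
        [ $hh,    0, 23 ],
        [ $day,   1, 31 ],
        [ $month, 0, 11 ],    # months are zero-based
    );
    for my $check (@checks)
    {
        my ($value, $min, $max) = @$check;
        next unless defined $value;
        return 0 if $value < $min or $value > $max;
    }
    return 1;
}

print fields_are_sane(58, 47,  0, 21, 10) ? "ok\n" : "rejected\n";            # ok
print fields_are_sane( 0,  0, -1, 21, 10) ? "ok\n" : "rejected\n";            # rejected
print fields_are_sane(undef, undef, undef, 21, 10) ? "ok\n" : "rejected\n";   # ok
```

The second call is the interesting one: the date part is fine, and only looking at the hour reveals that the parse went wrong.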

Secondly, I got almost everything working on my laptop, but I still had a few odd failures that I couldn’t work out.  One night I had a little extra time at work and decided to poke at these lingering failures,12 so I ran my test suite ... and everything passed.  WTF?  After some head scratching and deeper investigation, I found the culprit: different versions of Time::ParseDate on the two different computers.  When I went to steal the unit tests, I took them from the latest version of Time::ParseDate.  Which the version on my work machine was able to handle.  But my laptop had an older version and it naturally failed the newer tests.  This made me ponder something that I probably should have been thinking of before now: what versions of Date::Parse and Time::ParseDate did I want to depend on?  In an ideal world, it wouldn’t matter, but at the very least I needed to throw up versions for CPAN clients to work with.  And, by stealing unit tests, I made it so that the unit tests for Date::Easy would fail if your versions of Date::Parse or Time::ParseDate were too old.  (I suppose that’s a bit of a downside to stealing unit tests, but overall that seems like a pretty minor disadvantage.)  So I needed to pick versions and then make sure the unit tests I was stealing were from that version.  For Date::Parse, I just went back to the version which was the last time my stolen unit tests changed: 1.17, back in 2009.  For Time::ParseDate, it was a tougher call, but I eventually decided to go with the next-to-latest version (at time of writing): 2015.0925.  It may force people to upgrade an older version, but at least they won’t stumble on the bug that it fixes.


The full code for Date::Easy so far is here.  Of special note:

  • unit test for the super-simple date math above
  • the stolen unit tests from Time::ParseDate
  • I added a unit test to make sure I wasn’t loading on-demand modules unless demanded
  • the method I use to make sure I’m not “cheating” in my str2time unit tests by using parsedate to get the right answer (I actually had this happen to me a couple of times)
  • the parsedate version of the (str2time) code above which removes timezone specifiers for unit tests
  • yes, I added a test to make sure I properly revert my monkey-patch

Next time, we’ll look at a first-cut implementation for Date::Easy::Datetime.



__________

1 Note that I get to leave the parens off of the call to today(), because of the prototype I gave it.  This is one of the very few instances in which prototypes are actually useful.


2 In my opinion, at least.


3 Feel free to point out any I might have missed in the comments.


4 That means that there’s a certain amount of inefficiency involved in using parsedate.  But I think it’s acceptable for the large number of additional date formats it converts.


5 That’s somewhere in the neighborhood of 400 tests, at the moment.  And, who knows: Date::Parse and/or Time::ParseDate may add more unit tests in the future, which I might also steal.  Starting down the manual calculation road is bound to lead to far more work than I want to put into this.  Remember: laziness.


6 Remember last time in that list of GitHub links at the bottom of the post, when I said “we’ll probably get into more detail on this later”?  Well, now it’s later.


7 Of course, that would necessitate finding a whole different way to test it.  But it could be worth it, in the right circumstances.


8 See the discussion last time for full details.


9 In list context, parsedate returns a two-element list consisting of the epoch seconds that you were really after, and whatever bits of the input string it had leftover after parsing out the date.  The latter usually being nothing.  So a call like Time::Piece->_mktime(parsedate($proper), 1) really does not do what you think it does.
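
A quick way to watch this go wrong without installing anything: `toy_parse` below is a hypothetical function that merely imitates the two return shapes just described, and `describe` is a hypothetical consumer that takes an epoch and a flag.

```perl
use strict;
use warnings;

# Hypothetical parser imitating the behavior described above: epoch seconds
# in scalar context, (epoch seconds, leftover text) in list context.
sub toy_parse
{
    my ($string) = @_;
    return wantarray ? (86_400, '') : 86_400;
}

# Hypothetical consumer that expects exactly two arguments.
sub describe
{
    my ($epoch, $is_local) = @_;
    return sprintf 'epoch=%s local=%s args=%d', $epoch, ($is_local ? 'yes' : 'no'), scalar @_;
}

print describe(toy_parse('tomorrow'), 1), "\n";          # epoch=86400 local=no args=3
print describe(scalar toy_parse('tomorrow'), 1), "\n";   # epoch=86400 local=yes args=2
```

In the first call the empty leftover string lands in the slot meant for the flag, and the real flag is pushed out to a third argument that nobody reads.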


10 Which means most of you, I suspect.


11 As I noted in footnote 11 of the previous post.


12 ‘Cause I gotta get ready for these blog posts, amiright?


About Buddy Burden

9 years in California, 20 years in Perl, 29 years in computers, 50 years in bare feet.