A Date with CPAN, Part 4: Construction Time Again

[This is a post in my latest, probably long-ass, series.  You may want to begin at the beginning.  I do not promise that the next post in the series will be next week.  Just that I will eventually finish it, someday.  Unless I get hit by a bus.

IMPORTANT NOTE!  When I provide you links to code on GitHub, I’m giving you links to particular commits.  This allows me to show you the code as it was at the time the blog post was written and ensures that the code references will make sense in the context of this post.  Just be aware that the latest version of the code may be very different.]


Last time I babbled on for a while about my general plans, and finally came up with a name other than “my perfect date module.” This time we stop screwing around and finally write some code.

I decided to start out with the date class.  In retrospect, I sort of wish I’d started with the datetime class, as that would’ve been a lot simpler.  But the date class was more interesting, and more immediately useful, so that’s where I started, and that’s where we’ll start as well.

Now, for datetimes, I’ve already said that I want to deal with two timezones: local, and GMT.  This mirrors what Perl handles, and also what Time::Piece handles, which is what I decided to use for a base class back in part 2.  But what about dates?  If the date class is also derived from Time::Piece, then we either have to choose local, or choose GMT, or allow the user to specify which they want.  What’s the right answer?

Well, making the user choose is definitely wrong.  The whole point of a date class is so the user doesn’t have to think about times and timezones: making them choose between local and GMT totally defeats that goal.  So I should pick one.  But which one?  Initially I figured that UTC was the right answer, just on general principle.  But then I realized that “today” in UTC is (or at least can be) a whole different date than “today” in local time, and that’s obviously wrong.  So reluctantly I switched to using local time.  But then I realized that that screws up my date math: adding a day (that is, 86,400 seconds) to midnight on the day daylight savings ends gives you the same day, unless you take extreme measures to ensure that it doesn’t.  Even doing it on the day daylight savings starts gives you a time other than midnight (1am, to be exact), which breaks our contract with the user.  So, as it turns out, neither one is right.  Sad face emoji.
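Here’s a small sketch of both problems, using only core modules.  It assumes US Pacific rules written as a POSIX TZ string (`PST8PDT,M3.2.0,M11.1.0`), so it doesn’t depend on the system’s timezone database, and it uses November 6, 2016, the day US daylight saving time ended that year.  It only illustrates the pitfalls; it isn’t code from the module.

```perl
use strict;
use warnings;
use POSIX qw(tzset);
use Time::Local qw(timelocal);

# Assumption: US Pacific rules as a POSIX TZ string, so no tz database is needed
$ENV{TZ} = 'PST8PDT,M3.2.0,M11.1.0';
tzset();

# Problem 1: "today" depends on which zone you ask.
# 8pm Pacific on Nov 6, 2016 is already 4am on Nov 7 in UTC.
my $evening   = timelocal(0, 0, 20, 6, 10, 2016);
my $local_day = (localtime $evening)[3];    # 6
my $utc_day   = (gmtime $evening)[3];       # 7

# Problem 2: date math on local midnights.
# Nov 6, 2016 had 25 hours here (clocks fell back at 2am),
# so midnight plus 86,400 seconds is 11pm on the same day.
my $midnight = timelocal(0, 0, 0, 6, 10, 2016);
my ($hour, $mday) = (localtime($midnight + 86_400))[2, 3];    # (23, 6)
```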

What I eventually worked out is that I need to use both, as counter-intuitive as that may seem.  That is, I need to parse a datestring in the context of local time, but then store it in my object as UTC.  That way I get the best of both worlds, and everything will pretty much DWIM.  And DWIM is one of my goals here—it’s part of the “easy” mantra.
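A minimal sketch of the “parse as local, store as UTC” idea, under the same assumed Pacific TZ string: read the calendar date from localtime, pin it to midnight UTC with timegm, and do the day arithmetic in UTC.  This illustrates the approach; it is not Date::Easy’s actual implementation.

```perl
use strict;
use warnings;
use POSIX qw(tzset);
use Time::Local qw(timelocal timegm);

$ENV{TZ} = 'PST8PDT,M3.2.0,M11.1.0';    # assumed: US Pacific rules
tzset();

# Parse relative to local time: take the *local* calendar date ...
my $evening = timelocal(0, 0, 20, 6, 10, 2016);    # 8pm Pacific, Nov 6 2016
my ($d, $m, $y) = (localtime $evening)[3, 4, 5];

# ... but store it as midnight UTC on that calendar date.
my $stored = timegm(0, 0, 0, $d, $m, $y + 1900);

# Day arithmetic on the UTC value is plain addition: no DST surprises.
my @next = gmtime($stored + 86_400);    # hour 0, day 7, month 10 (Nov)
```

Note that the stored value is November 6 (the local “today”), even though the UTC date at that moment was already November 7.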

Okay, armed with this decision, we have enough info to write ourselves a basic constructor.  We’ll accept either 3 integers, meaning year, month, and day (note that this is the same format many functions in Date::Calc accept, not accidentally), or one integer, meaning epoch seconds.  No arguments at all means use the current time.  We’ll lop off the hours, minutes, and seconds part and feed the result into timegm to get a truncated date.  Then we have to turn that into a Time::Piece object, except blessed into our class.  After some study of the Time::Piece source code, this looks pretty easy: we can just use _mktime: it’s a class method, handles being subclassed properly, and allows us to specify either local or GMT.1  So our basic new method will look like so:

use Time::Local;                                    # exports timegm (and timelocal)

sub new
{
    my $class = shift;
    my ($y, $m, $d);
    if (@_ == 3)
    {
        ($y, $m, $d) = @_;
        --$m;                                       # timegm will expect month as 0..11
    }
    else
    {
        my ($time) = @_;
        $time = time unless defined $time;
        ($d, $m, $y) = (localtime $time)[3..5];     # `Date`s are parsed relative to local time ...
        $y += 1900;                                 # (timelocal/timegm does odd things w/ 2-digit dates)
    }

    my $truncated_date =
            eval { timegm( 0,0,0, $d,$m,$y ) };     # ... but stored as UTC
    # test definedness: 1/1/1970 is a legal date whose epoch value is 0 (false)
    die "Illegal date: $y/", $m + 1, "/$d" unless defined $truncated_date;
    return $class->_mktime($truncated_date, 0);
}

Nothing too tricky going on here.  I have to adjust the month number if a human passed it in to me (as opposed to the case where I derived it myself from localtime), and I need to adjust the year if it came back from localtime, because of the way timegm (and timelocal, for that matter) handle two-digit dates:2 I might get the right answer if I don’t do the adjustment, but then again I might not.
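To see why the year adjustment matters, here’s a quick sketch with Time::Local.  Feeding timegm the raw offset that localtime reports for a 1966 date (66) triggers the two-digit-year window; adding 1900 first gives an unambiguous four-digit year.  The 2066 result assumes the code runs sometime in the 2020s, since the window moves with the current year.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# localtime() reports years as an offset from 1900, so 1966 comes back as 66.
# timegm treats 0..99 as a two-digit year and resolves it with a
# rolling window around the current year.
my $raw   = timegm(0, 0, 0, 5, 10, 66);
my $fixed = timegm(0, 0, 0, 5, 10, 66 + 1900);

my $raw_year   = (gmtime $raw)[5] + 1900;      # 2066 when run in the 2020s
my $fixed_year = (gmtime $fixed)[5] + 1900;    # 1966
```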

So this gives us some pretty simple options for turning numbers into date objects.  But we want to be able to handle lots more formats than this, right?  How do we go about adding that?  Well, remember that I already said3 that I didn’t want you to have to do this:

my $date = Date::Easy::Date->new($secs);

I’d much prefer it if you could do this:

my $date = date($secs);

This is the style offered by modules such as Path::Class and Path::Tiny, as well as the interface of Date::Piece, which I already said I wanted to mostly steal.  Now, all those modules make the global constructor function equivalent to calling new as a class method.4  I’ve chosen to go a different route here: the class method new will provide a few simple options for construction, whereas the global function will be where the serious parsing action will occur, trying anything it can think of to turn whatever you throw at it into a date.  This decision is not set in stone; I may well change my mind at some point down the road.  But let me explain my thinking here.  My date function will contain a fair number of heuristics in order to be as DWIMmy as I want it to be.  But some people don’t like DWIMmy.  So I like having the new method provide a simpler, more predictable interface that people can fall back to if they’re ever surprised by the behavior of date.

So what will our date function look like?  Here’s a first cut:

sub date ($)
{
    my $date = shift;
    if ( $date =~ /^-?\d+$/ )
    {
        if ($date < 29000000 and $date >= 10000000)
        {
            my @time = $date =~ /^(\d{4})(\d{2})(\d{2})$/;
            return Date::Easy::Date->new(@time);
        }
        return Date::Easy::Date->new($date);
    }
    else
    {
        my ($d, $m, $y) = _strptime($date);
        if (defined $y)                                                 # they're either all defined, or it's bogus
        {
            return Date::Easy::Date->new($y, $m, $d);
        }
        else
        {
            die "Illegal date: $date";
        }
    }
    die("reached unreachable code");
}

So let’s break this down a bit.  First, we can see that I gave it a prototype.  My initial thought was that you might want to do something like this:

say "On ", date $input_date, " processed ", scalar @files, " files";

but being able to ditch the parens means giving up the flexibility of having additional (optional) arguments to date, and my brain is already percolating with ideas along those lines, so that prototype is likely to fall by the wayside pretty soon.
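For the curious, here’s what the `($)` prototype buys you, sketched with two hypothetical stand-in functions (`unary_date` and `listy_date` aren’t part of any module): the prototyped one parses like a named unary operator, while the unprototyped one swallows everything to its right.

```perl
use strict;
use warnings;

# Hypothetical stand-in for date(): with a ($) prototype it parses like
# a named unary operator, so it takes only the next term.
sub unary_date ($) { return "D<$_[0]>" }

# Without a prototype, a paren-less call swallows the rest of the list.
sub listy_date { return scalar @_ }

my @parts  = ("On ", unary_date "2016-05-01", " processed ", 3, " files");
my $greedy = listy_date "2016-05-01", " processed ", 3;
# @parts has 5 elements, with "D<2016-05-01>" second; $greedy is 3
```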


Next we see that if you pass date an integer (positive or negative, but no fancy notation), it gets treated as a number of epoch seconds unless it’s a number between 10000000 and 28999999, in which case it gets treated as a compact datestring—that is, you could pass in the (American-style) date 11/5/66 as 19661105.  This is a fairly common date format: many people who don’t like using their RDBMS’s native date storage5 favor this format because it’s compact, mostly human-readable, and still sorts properly.  Date::Parse won’t handle it, but Time::ParseDate will, and my initial thought was to just pass anything that looked like it was meant to be a compact datestring to TPD.  But then I realized that I already had a way to turn 3 numbers into a date object, so it was silly not to just break it up myself right there.
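Breaking up the compact datestring yourself takes only a regex and timegm.  The sketch below uses a hypothetical helper, `compact_to_epoch`, that isn’t in Date::Easy; like the post’s constructor, it subtracts one from the month and returns UTC midnight, and it wraps timegm in an eval so impossible dates come back undef instead of dying.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper (not from Date::Easy): split an 8-digit compact
# datestring into year/month/day and turn it into UTC midnight.
sub compact_to_epoch {
    my ($str) = @_;
    my ($y, $m, $d) = $str =~ /^(\d{4})(\d{2})(\d{2})$/
        or return;                                   # not 8 digits
    # timegm wants a 0-based month; eval traps out-of-range values
    my $epoch = eval { timegm(0, 0, 0, $d, $m - 1, $y) };
    return $epoch;
}

my $ok  = compact_to_epoch('19661105');    # Nov 5, 1966 at 00:00 UTC
my $bad = compact_to_epoch('28991232');    # December 32nd: undef
```

The second call shows the kind of input from the 28991232 to 28999999 block that can never be a real date.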

If you’re wondering about those bounds, they don’t come from me: it’s the range that timegm/timelocal seem to be capable of handling.  From my experimentation, anything from 1/1/1000 to 12/31/2899 gets converted just fine; anything outside that range barfs.6  Now, that range is going to include a lot of numbers that can’t be converted to valid dates, of course.  For instance, every number from 28991232 to 28999999 is going to produce an error.  Still, I think it makes more sense to say, any positive 8-digit integer whose first 2 digits are between 10 and 28 is taken to mean a compact datestring, vs having some of them be datestrings and others be epoch seconds ... that would be really confusing, to me.  This decision does mean that there’s a block of about 7 months in 1970 where you can’t pass in epoch seconds to date and have it interpreted as epoch seconds.7  But I think that’s an acceptable sacrifice for the proper DWIMmery.
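The numbers in footnote 7 are easy to check with core POSIX::strftime.  Reading the two bounds as epoch seconds in UTC gives the times below; subtracting seven hours (Pacific daylight time in late April 1970) and eight hours (Pacific standard time in December) yields the footnote’s times, assuming those are US Pacific local times.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# The smallest and largest integers that date() treats as compact
# datestrings, read instead as epoch seconds and shown in UTC.
my $low  = strftime('%Y-%m-%d %H:%M:%S', gmtime 10_000_000);
my $high = strftime('%Y-%m-%d %H:%M:%S', gmtime 28_999_999);
# $low  is '1970-04-26 17:46:40'
# $high is '1970-12-02 15:33:19'
```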

Anything that’s not an integer gets passed on to Date::Parse via internal function _strptime (which we’ll talk about in a second).  Anything unparseable throws an exception, because I find it annoying when date conversion routines fail in a non-obvious way and I realize 6 months later that I’ve been using 1/1/1970 all along.  Besides, if I die, the user can wrap that in an eval block (or Try::Tiny try block, if that’s more your speed) and then handle it however they like.  But perhaps more sophisticated error handling is an area to be explored for future expansion.
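Here’s a sketch of what that looks like from the caller’s side.  `strict_ymd` is a hypothetical stand-in for a constructor that dies on bad input (it is not Date::Easy’s API); the caller traps the exception with a plain eval and decides what to do with it.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical stand-in for a strict constructor: die on bad input
# instead of quietly returning something like 1970-01-01.
sub strict_ymd {
    my ($y, $m, $d) = @_;
    my $epoch = eval { timegm(0, 0, 0, $d, $m - 1, $y) };
    die "Illegal date: $y/$m/$d\n" unless defined $epoch;
    return $epoch;
}

# The caller decides what failure means, e.g. by falling back to undef.
my $when = eval { strict_ymd(2015, 2, 30) };
my $err  = $@;
# $when is undef and $err is "Illegal date: 2015/2/30\n"
```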

Now we can get into the details of how to interface with Date::Parse.  First, a super-brief tutorial on the guts of Date::Parse’s str2time.8  str2time calls another public function in Date::Parse, strptime, which returns a seven-element list: seconds, minutes, hours, day, month, year, and timezone.  All str2time itself does is validate the values coming out of strptime, default any missing date values to the current day, and then stuff them into either timegm or timelocal.  Now, my first thought here was, great: I can just call strptime directly.  After all, I don’t really need the stuffing into timegm/timelocal, because I’m going to do that myself.  If I let str2time do it, we’ll end up calling timelocal (or timegm), then localtime, then timegm again.  Totally unnecessary.  So I’ll avoid the middleman.

But this turned out to be a bad idea.  See, str2time is doing two other things besides calling timegm/timelocal: it’s doing a bit of defaulting, which I could maybe live without, and a fair amount of validation, which it turns out I can’t.  Because strptime is perfectly happy to return, say, a month value of -1.  str2time would then convert that to an undef for epoch seconds, but I was blithely passing it along to timegm, which promptly barfed.  (Interesting side note: I found this out because I decided to steal part of Date::Parse’s test suite—this is a very good thing to do if you’re planning to try to maintain some level of compatibility with another module.)  So that wasn’t working.
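A simplified sketch of the defaulting and range checks described above.  `fill_and_check` is hypothetical and deliberately loose: it is not a copy of Date::Parse’s code (str2time’s exact rules, such as how it picks a default year, may differ), just an illustration of catching a month of -1 before it ever reaches timegm.  The optional `$today` argument exists only to make the example deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper: default missing fields to "today", then reject
# values timegm would choke on. $today is [day, month (0..11), year].
sub fill_and_check {
    my ($d, $m, $y, $today) = @_;
    $today ||= [ (localtime)[3, 4], (localtime)[5] + 1900 ];
    $d = $today->[0] unless defined $d;
    $m = $today->[1] unless defined $m;
    $y = $today->[2] unless defined $y;
    return if $m < 0 || $m > 11;     # e.g. a month of -1
    return if $d < 1 || $d > 31;     # per-month limits left to timegm
    return ($d, $m, $y);
}

my @bad  = fill_and_check(21, -1, 1994);                       # rejected
my @full = fill_and_check(undef, undef, undef, [5, 10, 1966]); # defaulted
```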

Briefly, I toyed with just calling str2time and saying screw the inefficiency (no premature optimization and all that), but that turned out to have a bigger problem than being slower than necessary: it was often just plain wrong.  See, I’m making the decision to simply ignore whatever time info you passed in.  If you tell me you want to make a date out of, say:9
Mon, 21 Nov 1994 00:47:58 -0500
then I assume you want to get back 11/21/1994.  But, if you happen to be in the Pacific Time Zone, as I am, and you run that through str2time and then take the result and run it through localtime, you get 11/20/1994 ... you’re a day off.  So that wasn’t working.  Obviously I needed the validation (at least) of str2time without the munging into epoch seconds.
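The day-off problem is easy to reproduce with core modules alone.  The sketch assumes the same US Pacific POSIX TZ string (`PST8PDT,M3.2.0,M11.1.0`) and converts the example timestamp by hand (timegm on the wall-clock fields plus five hours for the -0500 offset) rather than calling str2time, so it shows the round-trip effect without depending on Date::Parse.

```perl
use strict;
use warnings;
use POSIX qw(tzset);
use Time::Local qw(timegm);

$ENV{TZ} = 'PST8PDT,M3.2.0,M11.1.0';    # assumed: US Pacific rules
tzset();

# "Mon, 21 Nov 1994 00:47:58 -0500", converted by hand for this sketch:
# the wall-clock fields are 5 hours behind UTC, so add 5 hours.
my $epoch = timegm(58, 47, 0, 21, 10, 1994) + 5 * 3600;

# Round-tripping through epoch seconds and localtime loses a day here.
my $via_epoch = (localtime $epoch)[3];    # 20

# Keeping only the calendar fields and ignoring the time gives the 21st.
my $date_only = (gmtime timegm(0, 0, 0, 21, 10, 1994))[3];    # 21
```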


I decided I had 3 options:10

  1. call strptime and str2time
  2. copy some code from str2time into Date::Easy::Date
  3. do Something Devious (like monkeypatching Date::Parse with a wrapper around strptime)

Now, #1 is a reasonable fallback position: it’s stupidly inefficient, but at least it’s correct.  If none of the other options are workable, that would be the way to go.  I’m not actually opposed to #3 on general principle (although I’m sure some of my readers would be).  Perl’s awesome ability to do tricky things like that is one of its strengths, in my opinion.  However, in this case it’s kind of icky.  I’d be inserting my custom function into the middle of the callstack, with no way to communicate back to my originating function other than global variables.  Which would cause me no end of heartache.  So I started thinking about option #2 ... would it really be so bad?  Well, the first thing to consider is that Date::Parse doesn’t really change that often: the particular chunk of code I’m looking at hasn’t changed since 2003.  And not because it’s abandoned or any such thing; it’s just very stable and doesn’t need much updating at this point.  And the other advantage to copying the code would be that I only have to copy the parts that relate to dates ... the parts that deal with times I don’t care about at all.11  So I can simplify the code I copy quite a bit.  So in the end that’s what I decided to go with.  If you like, you can compare my version of str2time’s guts with the original from Date::Parse.

The full code for Date::Easy so far is here.  Of special note:


So that’s a basic first cut at a date class.  Next time, we’ll look at adding a Time::ParseDate fallback for when Date::Parse isn’t enough.



__________

1 Yes, it’s a private method, so it’s imperfect in that sense.  But it’s so perfect otherwise that I’m sticking with it.


2 Which is, if you’ve never looked into it, a funky little piece of DWIMmery of its own.  It uses a floating window based on the current year to figure out whether a two-digit year is likely to be in the past or the future.  It works pretty well when you’re dealing with a two-digit year that came from a human.  When you’re dealing with one that came from localtime ... not so much.


3 In part 2, footnote 6, to be precise.


4 By different means: Path::Class and Date::Piece have the global constructors call the class method constructor, whereas Path::Tiny has the class method call the global constructor.


5 There are various good reasons for doing this, as it happens.  Whenever you select a date without applying a specific conversion function, you get a string, using the default date conversion specifier for your database server.  Which means your DBA could change one global setting and break all your code.  If the default conversion spec is session-based, you could be borked by rogue code without the involvement of a DBA at all.  If you wrap every single selected column in an explicit date-to-string conversion function, then a) you will almost certainly miss some, and even if you don’t, eventually someone will forget, and b) you’ve probably destroyed any hope of converting to a different RDBMS, because all conversion functions (all SQL functions of any kind, really) seem to be RDBMS-specific.  It’s not really all that unreasonable to say, screw it: I’m just storing all my dates as 8-digit strings.  I’ve never been a fan of it personally, but I certainly understand the impulse.


6 I presume this is tied to using a 64-bit signed time_t, so, if your machine’s time_t is different, your range might be as well.  But it seemed like a reasonable range for my heuristic.


7 Specifically, from 4/26/1970 10:46:40 to 12/2/1970 07:33:19.


8 In case, you know, you’ve never studied that.  Because, why would you, really?


9 This is an actual example from Date::Parse’s test suite.


10 Feel free to point out any I might have missed in the comments.


11 This turns out to be not quite 100% correct.  But we’ll address that issue in a future post.


12 Remember I talked about that in part 3.



About Buddy Burden

9 years in California, 20 years in Perl, 29 years in computers, 50 years in bare feet.