A Date with CPAN, Part 6: Time Won't Give Me Time

[This is a post in my latest long-ass series.  You may want to begin at the beginning.  I do not promise that the next post in the series will be next week.  Just that I will eventually finish it, someday.  Unless I get hit by a bus.

IMPORTANT NOTE!  When I provide you links to code on GitHub, I’m giving you links to particular commits.  This allows me to show you the code as it was at the time the blog post was written and ensures that the code references will make sense in the context of this post.  Just be aware that the latest version of the code may be very different.]


Last time I added Time::ParseDate support to our date class, which made it fairly usable, if still incomplete.  This time I decided to concentrate on getting a first cut at our datetime class.

In many ways, the datetime class is simpler than the date class, because it doesn’t need to do anything fancy like truncate to midnight or try to ignore times and timezones when parsing.  Of course, datetimes do have to consider timezones, but I decided to defer that thorny issue until next time.

The datetime class parallels the date class in many ways.  Where Date::Easy::Date->new accepts 0, 1, or 3 args (meaning either the current day, a number of epoch seconds, or year/month/day), Date::Easy::Datetime->new will accept 0, 1, or 6 args: use the current time, a number of epoch seconds, or year/month/day/hours/minutes/seconds.  Any other number of arguments is an error.

sub new
{
    my $class = shift;
 
    my $t;
    if (@_ == 0)
    {
        $t = time;
    }
    elsif (@_ == 6)
    {
        my ($y, $m, $d, $H, $M, $S) = @_;
        --$m;                                       # timelocal/timegm will expect month as 0..11
        $t = timelocal($S, $M, $H, $d, $m, $y);
    }
    elsif (@_ == 1)
    {
        $t = shift;
    }
    else
    {
        die("Illegal number of arguments to datetime()");
    }
 
    return scalar $class->_mktime($t, 1);
}

Pretty basic.  Notice that I’m sending Time::Piece’s _mktime a second argument of 1, indicating local time instead of UTC.  This is the part we’ll make variable next time, but local time is good enough for now.
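To make the month adjustment concrete, here is a minimal standalone sketch using only the core Time::Local module.  The helper name epoch_from_parts is made up for illustration and is not part of Date::Easy; it just mirrors the six-argument branch above.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper (not part of Date::Easy): takes human-style parts,
# with the month as 1..12, and returns epoch seconds in the local timezone.
sub epoch_from_parts
{
    my ($y, $m, $d, $H, $M, $S) = @_;
    return timelocal($S, $M, $H, $d, $m - 1, $y);    # timelocal wants month as 0..11
}

my $t = epoch_from_parts(2016, 1, 15, 13, 30, 45);

# Round-trip through localtime to show the parts come back unchanged.
my @p = localtime $t;
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
    $p[5] + 1900, $p[4] + 1, @p[3, 2, 1, 0];         # prints 2016-01-15 13:30:45
```

Forgetting the decrement would silently land you in February, which is why the comment in the constructor is worth keeping.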

Also note the use of scalar.  Something I missed until this time around is that Time::Piece->_mktime will return a list of datepart values when called in list context, which is something that’s not useful in this application.  Of course I had to go back and do this in the date class as well.
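The reason scalar matters is that the expression after return is evaluated in whatever context the sub itself was called in.  Here is a small sketch of that rule; the sub names are invented for the demonstration and inner merely stands in for a context-sensitive function such as _mktime.

```perl
use strict;
use warnings;

# Stand-in for a context-sensitive function: reports how it was called.
sub inner { return wantarray ? 'list' : 'scalar' }

# The return expression inherits the caller's context ...
sub without_scalar { return inner() }

# ... unless we force scalar context explicitly.
sub with_scalar { return scalar inner() }

my @a = without_scalar();    # called in list context
my @b = with_scalar();       # also called in list context

print "without scalar: $a[0]\n";    # prints "without scalar: list"
print "with scalar:    $b[0]\n";    # prints "with scalar:    scalar"
```

So a constructor called as my @things = (Class->new, ...) would get a pile of dateparts instead of an object if the scalar were left off.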

Where ::Date had today, ::Datetime will have now:

sub now () { Date::Easy::Datetime->new }

Again, super-trivial.  The prototype allows you to do cool things like now + 30 to mean “thirty seconds from now.” And, unlike with ::Date, that works right out of the box, because Time::Piece already defines addition (and subtraction) the way that we want it.  Nifty.
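If you want to see the prototype trick without installing anything, the following sketch uses plain Time::Piece (a core module) as a stand-in for Date::Easy::Datetime.  The now defined here is my own approximation for the demo, not the module's code.

```perl
use strict;
use warnings;
use Time::Piece;    # exports a localtime that returns objects in scalar context

# Stand-in for Date::Easy::Datetime's now.  The empty prototype tells the
# parser that now takes no arguments, so "now + 30" is now() + 30 and not
# now(+30).
sub now () { scalar localtime(time) }

my $start = now;
my $later = $start + 30;    # Time::Piece overloads + to add seconds
print $later->epoch - $start->epoch, "\n";    # prints 30

my $soon = now + 30;        # only parses this way because of the prototype
print $soon->epoch >= $start->epoch + 30 ? "ok\n" : "not ok\n";    # prints ok
```

Without the () prototype, Perl would treat + 30 as the argument list, and you would just get the current time back.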

Parallel to ::Date’s date function, for parsing human-readable strings, we’ll need a datetime version:

sub datetime
{
    my $datetime = shift;
    if ( $datetime =~ /^-?\d+$/ )
    {
        return Date::Easy::Datetime->new($datetime);
    }
    else
    {
        return Date::Easy::Datetime->new( _str2time($datetime) );
    }
    die("reached unreachable code");
}

As you can see, it works pretty much the same, only a bit simpler: where I wanted date to handle things like compact datestrings, I didn’t have that need here.  So right now it just handles an integer, which it interprets as a number of epoch seconds, and everything else it tosses off to Date::Parse’s str2time function, like so:

sub _str2time
{
    require Date::Parse;
    return &Date::Parse::str2time;
}

Much easier than the version in ::Date.  Also note that I haven’t yet added the fallback to Time::ParseDate, but that should be trivial to do for next time.
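In case the &Date::Parse::str2time; line looks odd: calling a sub with a leading & and no parentheses hands the callee the current @_ as is.  Here is the same lazy-load-and-pass-through idiom sketched against the core Time::Local module, so it runs without Date::Parse installed.  The wrapper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper: load Time::Local only when first needed, then call
# timegm with whatever arguments the wrapper itself received.
sub _lazy_timegm
{
    require Time::Local;
    return &Time::Local::timegm;    # & and no parens: callee sees our @_
}

# sec, min, hour, mday, month (0..11), year
print _lazy_timegm(0, 0, 0, 1, 0, 1970), "\n";    # prints 0
```

The require at runtime means people who never parse a string never pay for loading the parser.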

One of the reasons this round took me longer than I’d anticipated[1] was a bit of trickiness I hadn’t considered.  Remember those unit tests I stole from Date::Parse?  They consist of a string and a corresponding number of epoch seconds.  For dates, I had to finagle that string quite a bit, and the number of epoch seconds wasn’t useful at all.  For datetimes, though, I can just blast the string straight through and I should end up with the epoch seconds value, right?  Well, if I had started with UTC, then yes.  (In hindsight, I should have done just that and added local time afterwards, just as I should have started with datetimes and done dates after that.  But for some reason I like doing things the hard way.)  But, since I’m doing the local time version, the epoch value from Date::Parse is not what I actually expect to get.

Well, how does the Date::Parse unit test do it then?  First of all, it has a hard-coded number (5, in this case) of strings which don’t have timezones in them and therefore should be parsed as relative to the current timezone.  For the first 5 strings it sees, it adjusts the epoch value by an amount based on running localtime and gmtime against the same value and then calculating a delta.  Time to steal some more code, I suppose.  So I copied over the code to do the adjustment (with a few small tweaks), but I didn’t like hardcoding a number of tests which required that adjustment.  So I decided to reuse my method of identifying which strings have timezones and which don’t: a regex to pick out all the timezone patterns which Date::Parse knows about.  And I already have such a regex from last time.  So all I needed to do is share it between the date unit tests and the datetime unit tests.  Which I did.
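The general shape of that adjustment can be sketched as follows.  To be clear, this is my own reconstruction of the idea and not the code from the Date::Parse test suite; both helper names are invented, and the has-timezone flag stands in for the result of the shared regex.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: how many seconds local time is ahead of UTC at
# epoch $t.  We read the local wall-clock parts and reinterpret them as UTC.
sub local_offset
{
    my $t  = shift;
    my @lt = localtime $t;
    $lt[5] += 1900;                       # pass timegm a full four-digit year
    return timegm(@lt[0 .. 5]) - $t;
}

# Hypothetical helper: strings with an explicit timezone keep the table
# value; strings without one are read as local time, so shift the value.
# (Assumption: the offset at the table value is close enough; this can be
# off right around a DST change.)
sub expected_epoch
{
    my ($utc_epoch, $has_tz) = @_;
    return $has_tz ? $utc_epoch : $utc_epoch - local_offset($utc_epoch);
}

printf "local offset right now: %+d seconds\n", local_offset(time);
```

For example, in a zone one hour ahead of UTC, a zone-less string whose table value is E should come back as E - 3600.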

I also spent way too long fiddling with my regex, until I eventually realized I’d hit some sort of weird behavior in Perl ... I won’t say it’s a bug, but I sure don’t understand it.  See, the vast majority of timezone specifiers that Date::Parse knows about are at the end of the string, so naturally I used a $ anchor.  But, every once in a while, you get one just before the date, like so:

Jul 22 10:00:00 UTC 2002

My first attempt for this was just to use an optional look-ahead, something like this:

[A-Z]{3} (?= \h+ \d{4} )?

(Remember this is part of a much larger pattern which uses /x.)  Which worked perfectly when I was just matching.  But when I tried to share my regex with the date class, which actually has to remove the timezone code, I ran into a problem.  At first I thought it was something relating to an optional look-ahead.  Then I thought it had to do with the whitespace somehow.  Eventually, though, I narrowed it down to this:

[absalom:~] perl -le 'print "A B C" =~ s/B (?=C)//r'
A C
[absalom:~] perl -le 'print "A B C" =~ s/B (?=C)$//r'
A B C

Doesn’t matter whether the whitespace is inside the look-ahead or not, and it doesn’t make a difference if you replace $ with \Z.  If anyone wants to take a crack at explaining to me why this might actually be correct, I welcome the input.  It sure looks wonky to me.


The full code for Date::Easy so far is here.  Of special note:

  • export code; as usual, Date::Easy exports everything, while Date::Easy::Datetime exports only what you ask for
  • unit test which tries to verify that Date::Easy::Datetime->new, now, and time all return the same thing, even though we face the unpleasant reality that the system clock might roll over to a new second in between the assignments
  • how I fixed the problem of calling _mktime in list context in Date::Easy::Date
  • the stolen code from Date::Parse to do the unit test adjustment for local times
  • the shared timezone-identifying regex code for Date::Parse
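For the clock-rollover problem mentioned in the unit test bullet above, one possible approach is to retry until everything was sampled inside a single second.  This is only a sketch of that idea, with an invented helper name; the linked test may well do it differently.

```perl
use strict;
use warnings;

# Hypothetical helper: run each code ref once, and accept the results only
# if the clock did not tick over to a new second while we were sampling.
sub sample_within_one_second
{
    my @getters = @_;
    for my $attempt (1 .. 10)
    {
        my $before  = time;
        my @results = map { $_->() } @getters;
        return @results if time == $before;    # no rollover: results are comparable
    }
    die("clock kept rolling over; giving up");
}

my ($x, $y) = sample_within_one_second(sub { time }, sub { time });
print $x == $y ? "same second\n" : "different seconds\n";    # prints "same second"
```

In a real test the code refs would call Date::Easy::Datetime->new and now and return their epoch values.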

Next time, I’ll add in the Time::ParseDate fallback, figure out how to handle the UTC version of datetimes, and hopefully slap some POD in here.  At that point, Date::Easy won’t be done, but it will be sufficiently useful to put up on CPAN for all you folks to start beating up.  I’m looking forward to it!



__________

[1] Other than the interposition of Christmas and New Year’s, of course.






14 Comments

To answer your question: in "A B C", you can remove "B " that's followed by C, but you can't remove "B " at the end of the string, that's followed by C. The look-ahead is zero-length, so the $ must match after the space, not after the C. If you want to just look at it, include it in the parentheses:

$ perl -le 'print "A B C" =~ s/B (?=C$)//r'
A C
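Here are all three variants side by side, plus a rough sketch of how the same idea applies to stripping a timezone code that sits either at the end of the string or just before a four-digit year.  The strip_tz helper and its simplified pattern are hypothetical, and far less thorough than the real regex in Date::Easy.  The /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

print "A B C" =~ s/B (?=C)//r,  "\n";    # prints "A C"
print "A B C" =~ s/B (?=C)$//r, "\n";    # prints "A B C": $ is tested right after "B "
print "A B C" =~ s/B (?=C$)//r, "\n";    # prints "A C": the anchor is inside the look-ahead

# Hypothetical helper: drop a three-letter zone code that is followed by
# either the end of the string or a four-digit year and then the end.
sub strip_tz
{
    my $str = shift;
    return $str =~ s/ [A-Z]{3}(?= \d{4}\z|\z)//r;
}

print strip_tz("Jul 22 10:00:00 UTC 2002"), "\n";    # prints "Jul 22 10:00:00 2002"
print strip_tz("Jul 22 2002 10:00:00 EST"), "\n";    # prints "Jul 22 2002 10:00:00"
```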

> if two consecutive zero-length tokens need to match, they must both match at the same point?

Yes.

> Consecutive tokens should match consecutively, it seems to me, regardless of their length.

That wouldn’t make any sense. Then (?: and (?= would mean the exact same thing. If you need (?: then use (?: instead of using (?= and complaining that it doesn’t do what you need. :-)

> not to mention the repetition of "B," which in reality is a more complicated subpattern

No problem.

/(?:X|Y)$|B(?= C$|$)/

Or if the $ was actually a more complicated subpattern too, you could DRY this out further:

/(?:X|Y)$|B(?=(?: C)?$)/

> I can't think of a case where this would actually be desired behavior from a usage perspective.

Easy. Consider when foo(?=.*bar)(?=.*quux) will match.
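A quick sketch of that case: because both look-aheads are tested at the same position (right after "foo"), the pattern demands that "bar" and "quux" both appear somewhere later, in either order.  The sample strings are made up for the demonstration.

```perl
use strict;
use warnings;

my $re = qr/foo(?=.*bar)(?=.*quux)/;

for my $s ('foo bar quux', 'foo quux bar', 'foo bar', 'bar quux foo')
{
    printf "%-14s %s\n", $s, $s =~ $re ? 'match' : 'no match';
}
# prints:
# foo bar quux   match
# foo quux bar   match
# foo bar        no match
# bar quux foo   no match
```

If the look-aheads had to match one after another instead, there would be no simple way to say "both of these, in any order".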

> if two consecutive zero-length tokens need to match, they must both match at the same point? Consecutive tokens should match consecutively, it seems to me, regardless of their length

If they are zero-length, then "at the same point" is the same as "consecutive". If they weren't zero-length, then they wouldn't be at the same point.

> A(?=B) means an "A" followed by a "B," except don't include the "B" in the match string. I'm not sure how the consecutive-ness of anything would impact that.

Then if you could write 'ABC' =~ /(A(?=B)C)/ and have it match because consecutive, which portion of the string would $1 be supposed to contain?

That surprise is ignorance leaving your mind. :-) Please do yourself a favour and read Mastering Regular Expressions. You seem to have a mistaken mental model of how regexps work; that book will beat you into shape.

> Why, "AC" of course. If I tell you to match "ABC" but leave the "B" out, what else could you possibly get?

And what would @- and @+ contain?

Another try:

>> Then (?: and (?= would mean the exact same thing.
> Wait ... what? I don't follow that at all. A(?:B) means an "A" followed by a "B." A(?=B) means an "A" followed by a "B," except don't include the "B" in the match string. I'm not sure how the consecutive-ness of anything would impact that.

What would be the difference between (?=A(?=B)) and (?=A(?:B))?

> Meanwhile the way I actually need it to work right now is not even an option. :-) […] I see how to work around it, thanks to your continued efforts to enlighten me.

You’re aware that sentence 2 there directly contradicts sentence 1, yes? (Not that I agree with the characterisation as a “workaround”.)

In fact, the situation is strictly opposite of what you claim: the way it does work allows you to achieve everything you need, and the way you think it ought to work would disallow many other things that are possible with the current model.

Which is just why it’s defined this way around, and not like you think it should be.

Which is why I completely disagree that this is simply a matter of “this way is obvious to you and that way would be obvious to me”, or at least the implication in how you say it, that just because people spontaneously generate different mental models, all of them must be equally valid. Not every discipline is product design.

In particular, the way it does work allows you to say things like

/^(?!foobar)\w+\s+/

i.e. “match a sequence of whitespace-separated words except if the first word starts with foobar”, which you would be unable to express otherwise.** And (?= has a similar role, except in a “but only if” capacity.

If you fail to imagine how that could be useful to at least somebody else (and almost certainly even yourself), I’m afraid that’s not a failure of empathy on my part.

(Also, just technically, I’d be curious about how negative look-behind fits into this consecutive matching mental model.)

** For me it would “merely” be excessively painful. It’s not impossible, but you’d have to fix your mental model to be able to figure it out, and even so you wouldn’t want to have to resort to that way of doing it.


About Buddy Burden

14 years in California, 25 years in Perl, 34 years in computers, 55 years in bare feet.