Time::Str - Time Zones and Leap Seconds

Time::Str parses and formats date/time strings across 20+ standard formats, with an optional C/XS backend and nanosecond precision. The previous post, Introducing Time::Str, covered parsing and formatting. This one covers two additions, time zones and leap seconds, and ends with a note on the new C parsers.

Time Zones

Time::Str already understood the offset in a timestamp (+01:00, Z), but it could not turn a zone name, or a local time with no offset, into a precise instant. Two new modules add that.

Example

use feature 'say';
use Time::Str::TimeZone qw(timezone);
use Time::Str           qw(str2time time2str);

$str  = 'Dec. 24, 2012 12:30 p.m.';
$tz   = timezone('America/New_York');
$time = str2time($str, format => 'DateTime', timezone => $tz);

say time2str($time, timezone => $tz);
# 2012-12-24T12:30:00-05:00

The input string has no offset. The timezone object resolves the local wall-clock time to UTC (New York in December is EST, -05:00), and time2str renders the result back in the same zone. The result is DST-aware, so a July date prints -04:00.

Resolving abbreviations

The first post showed that Time::Str captures abbreviations such as IST in tz_abbrev without deciding what they mean, since one abbreviation can name several zones. A timezone_map supplies the meaning for your data:

$map = {
  EST => timezone('America/New_York'),
  EDT => timezone('America/New_York'),
};

$str  = 'Dec. 24, 2012 12:30 p.m. EST';
$time = str2time($str, format => 'DateTime', timezone_map => $map);

The abbreviation selects an object, and the object determines the offset for that instant.

Time::TZif and Time::TZif::POSIX

Time::TZif reads compiled TZif files (RFC 8536), the binary zoneinfo the operating system ships. Time::TZif::POSIX evaluates a POSIX TZ rule string (IEEE Std 1003.1), for zones expressed as rules such as EST5EDT,M3.2.0,M11.1.0:

use Time::TZif;

$tz = Time::TZif->new(
  path => "$dir/America/New_York",
  name => 'America/New_York',
);
$off = $tz->offset_for_utc($epoch);  # seconds east


use Time::TZif::POSIX;

$tz = Time::TZif::POSIX->new(
  tz_string => 'EST5EDT,M3.2.0,M11.1.0',
);

Both provide the same two methods, offset_for_utc and offset_for_local, which str2time and time2str use. Either object can be passed as timezone =>.
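For readers new to the rule syntax: in EST5EDT,M3.2.0,M11.1.0, each Mm.w.d field names month m, week w (5 meaning the last) and weekday d (0 is Sunday). Here is a minimal sketch of that date arithmetic in core Perl. rule_day is a hypothetical helper written for this illustration, not part of Time::TZif::POSIX, and it ignores the transition time of day (02:00 local when the rule does not give one).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Day of month selected by a POSIX "Mm.w.d" rule in a given year.
# $m: month 1..12, $w: week 1..5 (5 = last), $d: weekday 0..6 (0 = Sunday).
# Hypothetical helper, for illustration only.
sub rule_day {
    my ($year, $m, $w, $d) = @_;

    # Weekday of the 1st of the month.
    my $first_wday = (gmtime(timegm(0, 0, 0, 1, $m - 1, $year)))[6];

    # Last day of the month: one day before the 1st of the next month.
    my ($ny, $nm) = $m == 12 ? ($year + 1, 1) : ($year, $m + 1);
    my $last_day  = (gmtime(timegm(0, 0, 0, 1, $nm - 1, $ny) - 86400))[3];

    # First matching weekday, then step forward by whole weeks.
    my $day = 1 + ($d - $first_wday) % 7 + 7 * ($w - 1);

    # Week 5 means "last": step back if we ran past the end of the month.
    $day -= 7 while $day > $last_day;
    return $day;
}

print rule_day(2012, 3, 2, 0),  "\n";   # 11: DST began March 11, 2012
print rule_day(2012, 11, 1, 0), "\n";   # 4:  DST ended November 4, 2012
```
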

Implementation notes

  • TZ handling follows POSIX. It honors the TZ environment variable: a database name, a POSIX rule, or the system default via /etc/localtime.
  • It uses the system IANA database. OS vendors keep the tzdb current, so on a current POSIX system there is nothing to bundle or upgrade separately.
  • Maintenance. DateTime::TimeZone ships its own copy of the tzdb and is re-released when the rules change; Time::TZif reads the system copy instead.
  • Memory. For America/New_York, Time::TZif uses 17,765 bytes against 87,174 for DateTime::TimeZone.
  • Speed. An offset lookup is about 7 times as fast as DateTime's; see Benchmarks below.
  • Runtime reload. Clearing the cache (Time::Str::TimeZone->reset) lets a long-running process pick up an updated database or a changed TZ without restarting.

Choosing how local times resolve

A spring-forward gap and a fall-back overlap have no single answer: one wall-clock time does not exist, another occurs twice. Time::TZif resolves these by policy.

Defaults are set on the constructor and apply to every lookup the object makes:

$tz = Time::TZif->new(
  path           => $path,
  name           => 'America/New_York',
  gap_policy     => 'later',    # spring-forward
  overlap_policy => 'earlier',  # fall-back
);

Either default can be overridden for a single call, by passing the policy to offset_for_local:

# resolve this overlap as daylight time, this once
$off = $tz->offset_for_local($time, overlap_policy => 'dst');

The five values are reject (the constructor default, which croaks on a nonexistent or ambiguous local time), earlier and later (pick by clock order), and std and dst (pick the standard-time or daylight-time side). By choosing the policies, you can make a local time resolve the way your libc or DateTime.pm would, per object or per call.
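To make the gap and overlap cases concrete, here is a minimal sketch of the arithmetic around a single transition, in core Perl. It is not Time::TZif's implementation: offset_for_local_sketch and its argument list are hypothetical, only the reject, earlier and later policies are modelled, and a gap is always rejected, because how each policy resolves a gap is not spelled out here.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Time::Local qw(timegm);

# One transition at UTC instant $t, from offset $before to offset $after
# (both in seconds east). $local is the wall-clock time encoded as if it
# were UTC. Hypothetical helper, for illustration only.
sub offset_for_local_sketch {
    my ($local, $t, $before, $after, $policy) = @_;
    my ($lo, $hi) = sort { $a <=> $b } ($t + $before, $t + $after);

    return $before if $local <  $lo;    # unambiguous, before the transition
    return $after  if $local >= $hi;    # unambiguous, after the transition

    # Spring-forward: no offset maps this wall-clock time to a real instant.
    croak "local time does not exist (gap)" if $after > $before;

    # Fall-back: both offsets are valid. The pre-transition offset
    # yields the earlier of the two instants.
    return $before if $policy eq 'earlier';
    return $after  if $policy eq 'later';
    croak "local time is ambiguous (overlap)";
}

# New York, 2012-11-04: clocks fall back at 06:00 UTC, EDT to EST.
my $t     = timegm(0, 0, 6, 4, 10, 2012);
my $local = timegm(0, 30, 1, 4, 10, 2012);    # 01:30 happens twice
print offset_for_local_sketch($local, $t, -14400, -18000, 'earlier'), "\n";  # -14400
print offset_for_local_sketch($local, $t, -14400, -18000, 'later'),   "\n";  # -18000
```
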

Benchmarks

Memory held for America/New_York:

                        bytes
Time::TZif              17765
DateTime::TimeZone      87174

UTC offset lookup:

                      Rate DateTime::Lite   DateTime  Time::TZif
DateTime::Lite     64306/s             --       -91%        -99%
DateTime          709978/s          1004%         --        -86%
Time::TZif       5093119/s          7820%       617%          --

Time::TZif is roughly 7 times as fast as DateTime on this lookup (617% faster), and about 79 times as fast as DateTime::Lite.

Limitation: Windows

The current catalog targets POSIX systems, where the zoneinfo files and TZ semantics are standard. A Windows fallback is planned on top of Time::OlsonTZ::Data, originally by the late Zefram and now maintained by Dan Book, which packages the Olson database for platforms that do not ship it.

Leap Seconds

When I asked for feedback on IRC, Karen Etheridge ran her JSON Schema tests against str2time. It accepted any timestamp whose seconds field was 60.

A leap second can appear only as 23:59:60 UTC, and only on June 30 or December 31. 12:30:60 is not a leap second.

str2time now checks this without a table:

str2time('2016-12-31T23:59:60Z');
# ok: folded onto 23:59:59

str2time('2016-12-31T12:30:60Z');
# croaks: "Unable to convert: a leap second must occur at 23:59:60 UTC"

str2time('2016-06-15T23:59:60Z');
# croaks: "Unable to convert: no leap second on this UTC date"

POSIX time cannot represent a 61st second, so str2time folds 23:59:60 onto the preceding 23:59:59, then confirms the UTC instant is the last second of June 30 or December 31 (since 1972). The check runs after the offset is applied, so it also holds for non-UTC offsets:

str2time('2017-01-01T05:29:60+05:30');  # ok
# the same instant as 2016-12-31 23:59:60 UTC

The check is table-free: it accepts every June 30 and December 31 slot, whether or not a leap second was inserted that day.
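The fold-then-check logic described above can be sketched in a few lines of core Perl. This is an illustration under assumptions, not str2time's code: fold_leap_second and its argument list are hypothetical, and the error messages are borrowed from the examples above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert broken-down fields plus a UTC offset (seconds east) to an epoch.
# A seconds value of 60 is folded onto :59 and accepted only if the
# resulting UTC instant is 23:59:59 on June 30 or December 31, 1972 or later.
# Hypothetical helper, for illustration only.
sub fold_leap_second {
    my ($y, $mo, $d, $h, $mi, $s, $offset) = @_;
    return timegm($s, $mi, $h, $d, $mo - 1, $y) - $offset if $s < 60;

    # Fold :60 onto :59, then apply the offset to get the UTC instant.
    my $utc = timegm(59, $mi, $h, $d, $mo - 1, $y) - $offset;
    my ($us, $umi, $uh, $ud, $umo, $uy) = gmtime($utc);
    $uy += 1900;    # gmtime returns years since 1900, months from 0

    die "a leap second must occur at 23:59:60 UTC\n"
        unless $uh == 23 && $umi == 59 && $us == 59;
    die "no leap second on this UTC date\n"
        unless $uy >= 1972
            && (($umo == 5 && $ud == 30) || ($umo == 11 && $ud == 31));
    return $utc;
}

# Both calls name the same instant, 2016-12-31 23:59:59 UTC after folding.
print fold_leap_second(2016, 12, 31, 23, 59, 60, 0),     "\n";  # 1483228799
print fold_leap_second(2017,  1,  1,  5, 29, 60, 19800), "\n";  # 1483228799
```
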

Time::LeapSecond

That gap led to Time::LeapSecond, which knows the leap seconds that were actually scheduled. It reads the system data file (TZDB leap-seconds or IERS leap-seconds.list), falls back to a built-in table, and exposes the instants, the TAI-UTC offset, and the file's expiration, $EXPIRES (the POSIX epoch up to which the data is complete).

The plan is to use it in str2time: while time() is before $EXPIRES the table is authoritative, so an unlisted :60 is rejected; once past it the file may be stale, so str2time falls back to the table-free check and accepts any valid slot. Better to accept a wrong leap second than reject a real one.
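As a rough sketch of that decision, assuming the :60 has already been folded and has passed the table-free slot check: the two-entry table, the $EXPIRES value and the leap_second_ok name below are made up for illustration and are not Time::LeapSecond's interface.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical stand-ins for the data Time::LeapSecond would supply.
# Keys are the folded instants (23:59:59 UTC) of two real leap seconds;
# the real table is longer.
my %LISTED = map { $_ => 1 } (
    timegm(59, 59, 23, 30,  5, 2015),    # 2015-06-30
    timegm(59, 59, 23, 31, 11, 2016),    # 2016-12-31
);
my $EXPIRES = timegm(0, 0, 0, 28, 11, 2025);    # made-up expiry

# $utc: folded instant that already passed the table-free slot check.
# $now: the current time, passed in so the logic is easy to test.
sub leap_second_ok {
    my ($utc, $now) = @_;
    return 1 if $now >= $EXPIRES;            # table may be stale: accept the slot
    return exists $LISTED{$utc} ? 1 : 0;     # table is authoritative
}
```

Passing the current time as an argument, instead of calling time() inside, keeps the sketch deterministic.
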

Native C parsers

An update on the first post's "what's next": several parsers are now generated C, built with Ragel, which removes the regex step from the parse path. Parsing 2012-12-24T12:30:45.123456+01:00:

                     Rate DT8601 DT3339 D::Parse T::Moment T::Str
DT::F::ISO8601    23442/s     --   -50%     -78%     -100%  -100%
DT::F::RFC3339    46804/s   100%     --     -56%      -99%  -100%
D::Parse         105263/s   349%   125%       --      -99%   -99%
T::Moment       7030938/s 29893% 14922%    6579%        --   -36%
T::Str         11020543/s 46912% 23446%   10369%       57%     --

On this input Time::Str is about 57% faster than my other module Time::Moment.


Time::Str, Time::TZif, Time::TZif::POSIX, Time::Str::TimeZone and Time::LeapSecond are on CPAN and GitHub.

cpanm Time::Str

About Christian Hansen

I blog about Perl.