Time::Str - Time Zones and Leap Seconds
Time::Str parses and formats date/time strings across 20+ standard formats, with an optional C/XS backend and nanosecond precision. The previous post, Introducing Time::Str, covered parsing and formatting. This one covers two additions, time zones and leap seconds, and ends with a note on the new C parsers.
Time Zones
Time::Str already understood the offset in a timestamp (+01:00,
Z), but it could not turn a zone name, or a local time with no offset,
into a precise instant. Three new modules add that:
- Time::Str::TimeZone resolves names to objects and caches them.
- Time::TZif and Time::TZif::POSIX compute the UTC offset of a zone at a given instant.
Example
use v5.10; # enables say
use Time::Str::TimeZone qw(timezone);
use Time::Str qw(str2time time2str);
$str = 'Dec. 24, 2012 12:30 p.m.';
$tz = timezone('America/New_York');
$time = str2time($str, format => 'DateTime', timezone => $tz);
say time2str($time, timezone => $tz);
# 2012-12-24T12:30:00-05:00
The input string has no offset. The timezone object resolves the local
wall-clock time to UTC (New York in December is EST, -05:00), and
time2str renders the result back in the same zone. The result is
DST-aware, so a July date prints -04:00.
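The arithmetic behind that resolution can be sketched with core modules only. This is an illustration, not the Time::Str internals: `local_to_utc` is a hypothetical helper, and the offsets are hard-coded here, whereas the timezone object looks them up for the instant in question.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Convert wall-clock fields plus a known UTC offset (seconds east) to an epoch.
# local_to_utc is a hypothetical helper for illustration, not part of Time::Str.
sub local_to_utc {
    my ($y, $mon, $d, $h, $min, $s, $offset) = @_;
    my $as_if_utc = timegm($s, $min, $h, $d, $mon - 1, $y);
    return $as_if_utc - $offset;    # local = UTC + offset, so UTC = local - offset
}

my $dec = local_to_utc(2012, 12, 24, 12, 30, 0, -5 * 3600);    # EST
my $jul = local_to_utc(2012,  7,  4, 12, 30, 0, -4 * 3600);    # EDT

print strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $dec), "\n";    # 2012-12-24T17:30:00Z
print strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $jul), "\n";    # 2012-07-04T16:30:00Z
```

The same wall-clock time lands an hour apart in UTC depending on the season, which is why a fixed offset is not enough and a zone object is needed.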
Resolving abbreviations
The first post showed that Time::Str captures abbreviations such as
IST in tz_abbrev without deciding what they mean, since one
abbreviation can name several zones. A timezone_map supplies the
meaning for your data:
$map = {
    EST => timezone('America/New_York'),
    EDT => timezone('America/New_York'),
};
$str = 'Dec. 24, 2012 12:30 p.m. EST';
$time = str2time($str, format => 'DateTime', timezone_map => $map);
The abbreviation selects an object, and the object determines the offset for that instant.
Time::TZif and Time::TZif::POSIX
Time::TZif reads compiled TZif files (RFC 8536), the binary zoneinfo
the operating system ships. Time::TZif::POSIX evaluates a POSIX TZ
rule string (IEEE Std 1003.1), for zones expressed as rules such as
EST5EDT,M3.2.0,M11.1.0:
use Time::TZif;
$tz = Time::TZif->new(
    path => "$dir/America/New_York",
    name => 'America/New_York',
);
$off = $tz->offset_for_utc($epoch); # seconds east

use Time::TZif::POSIX;
$tz = Time::TZif::POSIX->new(
    tz_string => 'EST5EDT,M3.2.0,M11.1.0',
);
Both provide the same two methods, offset_for_utc and
offset_for_local, which str2time and time2str use. Either object
can be passed as timezone =>.
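To make the rule string concrete, here is a minimal sketch of what evaluating EST5EDT,M3.2.0,M11.1.0 involves. It is not the Time::TZif::POSIX implementation: `mwd_to_mday` and `offset_for_utc_sketch` are hypothetical names, the rule is hard-coded rather than parsed, and only the Mm.w.d form with the default 02:00 transition time is handled.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Day of month selected by a POSIX "Mm.w.d" rule:
# weekday d (0 = Sunday) in week w (1..5, where 5 means "last") of month m.
sub mwd_to_mday {
    my ($year, $m, $w, $d) = @_;
    my $first      = timegm(0, 0, 0, 1, $m - 1, $year);
    my $first_wday = (gmtime $first)[6];
    my $mday       = 1 + ($d - $first_wday) % 7 + 7 * ($w - 1);

    # Length of the month: midnight on the 1st of the next month minus $first
    my ($ny, $nm) = $m == 12 ? ($year + 1, 1) : ($year, $m + 1);
    my $days = (timegm(0, 0, 0, 1, $nm - 1, $ny) - $first) / 86400;

    $mday -= 7 while $mday > $days;    # week 5 falls back to the last occurrence
    return $mday;
}

# UTC offset in seconds east under EST5EDT,M3.2.0,M11.1.0 (hard-coded here).
# Using the UTC year is adequate for this rule: both transitions fall
# well inside the year, and the days around New Year are standard time.
sub offset_for_utc_sketch {
    my ($epoch) = @_;
    my $year = (gmtime $epoch)[5] + 1900;
    my ($std, $dst) = (-5 * 3600, -4 * 3600);

    # 02:00 local standard time is 07:00 UTC; 02:00 local daylight time is 06:00 UTC
    my $start = timegm(0, 0, 2, mwd_to_mday($year,  3, 2, 0),  2, $year) - $std;
    my $end   = timegm(0, 0, 2, mwd_to_mday($year, 11, 1, 0), 10, $year) - $dst;

    return ($epoch >= $start && $epoch < $end) ? $dst : $std;
}

printf "DST 2012 starts on March %d and ends on November %d\n",
    mwd_to_mday(2012, 3, 2, 0), mwd_to_mday(2012, 11, 1, 0);
```

For 2012 the rule selects March 11 and November 4, so an instant in July gets -14400 and one in December gets -18000.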
Implementation notes
- TZ handling follows POSIX. It honors the TZ environment variable: a database name, a POSIX rule, or the system default via /etc/localtime.
- It uses the system IANA database. OS vendors keep the tzdb current, so on a current POSIX system there is nothing to bundle or upgrade separately.
- Maintenance. DateTime::TimeZone ships its own copy of the tzdb and is re-released when the rules change; Time::TZif reads the system copy instead.
- Memory. For America/New_York, Time::TZif uses 17,765 bytes against 87,174 for DateTime::TimeZone.
- Offset lookup speed is shown below.
- Runtime reload. Clearing the cache (Time::Str::TimeZone->reset) lets a long-running process pick up an updated database or a changed TZ without restarting.
Choosing how local times resolve
A spring-forward gap and a fall-back overlap have no single answer: one wall-clock time does not exist, another occurs twice. Time::TZif resolves these by policy.
Defaults are set on the constructor and apply to every lookup the object makes:
$tz = Time::TZif->new(
    path => $path,
    name => 'America/New_York',
    gap_policy => 'later', # spring-forward
    overlap_policy => 'earlier', # fall-back
);
Either default can be overridden for a single call, by passing the
policy to offset_for_local:
# resolve this overlap as daylight time, this once
$off = $tz->offset_for_local($time, overlap_policy => 'dst');
The five values are reject (the constructor default, which croaks on
an impossible or ambiguous local time), earlier and later (pick by
clock order), and std and dst (pick the standard-time or
daylight-time side). With these you can make a local time resolve the
way your libc or DateTime.pm would, per object or per call.
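A rough sketch of how a gap or an overlap can be detected in the first place, using only core modules. It is not how Time::TZif works internally: `offset_at` and `classify_local` are hypothetical names, and the 2012 transitions for America/New_York are hard-coded as an assumption instead of being read from a TZif file.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# America/New_York in 2012, hard-coded for illustration:
# daylight time runs from 2012-03-11T07:00:00Z to 2012-11-04T06:00:00Z.
my ($STD, $DST) = (-5 * 3600, -4 * 3600);
my $DST_START = timegm(0, 0, 7, 11,  2, 2012);
my $DST_END   = timegm(0, 0, 6,  4, 10, 2012);

# UTC offset (seconds east) in effect at a UTC instant
sub offset_at {
    my ($utc) = @_;
    return ($utc >= $DST_START && $utc < $DST_END) ? $DST : $STD;
}

# $local is the wall-clock time encoded as if its fields were UTC.
# Try each possible offset and keep the UTC instants that map back to $local.
# Returns the kind of local time, then the candidate instants, earliest first.
sub classify_local {
    my ($local) = @_;
    my @utc = sort { $a <=> $b }
              grep { $_ + offset_at($_) == $local }
              map  { $local - $_ } ($STD, $DST);
    my $kind = @utc == 0 ? 'gap' : @utc == 2 ? 'overlap' : 'unique';
    return ($kind, @utc);
}

my ($gap)     = classify_local(timegm(0, 30,  2, 11,  2, 2012));    # 02:30, Mar 11
my ($overlap) = classify_local(timegm(0, 30,  1,  4, 10, 2012));    # 01:30, Nov 4
my ($unique)  = classify_local(timegm(0, 30, 12, 24, 11, 2012));    # 12:30, Dec 24
print "$gap $overlap $unique\n";    # gap overlap unique
```

In the overlap, the earlier candidate is the daylight-time reading and the later one is the standard-time reading. A gap has no candidate at all, so a policy has to decide what instant, if any, stands in for it.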
Benchmarks
Memory held for America/New_York:
bytes
Time::TZif 17765
DateTime::TimeZone 87174
UTC offset lookup:
Rate DateTime::Lite DateTime Time::TZif
DateTime::Lite 64306/s -- -91% -99%
DateTime 709978/s 1004% -- -86%
Time::TZif 5093119/s 7820% 617% --
Time::TZif handles roughly 7 times as many lookups per second as DateTime, and about 79 times as many as DateTime::Lite.
Limitation: Windows
The current catalog targets POSIX systems, where the zoneinfo files and
TZ semantics are standard. A Windows fallback is planned on top of
Time::OlsonTZ::Data,
originally by the late Zefram and now maintained by Dan Book, which
packages the Olson database for platforms that do not ship it.
Leap Seconds
When I asked for feedback on IRC, Karen Etheridge ran her JSON Schema
tests against str2time. It accepted any timestamp whose seconds field
was 60.
A leap second can appear only as 23:59:60 UTC, and only on June 30 or
December 31. 12:30:60 is not a leap second.
str2time now checks this without a table:
str2time('2016-12-31T23:59:60Z');
# ok: folded onto 23:59:59
str2time('2016-12-31T12:30:60Z');
# croaks: "Unable to convert: a leap second must occur at 23:59:60 UTC"
str2time('2016-06-15T23:59:60Z');
# croaks: "Unable to convert: no leap second on this UTC date"
POSIX time cannot represent a 61st second, so str2time folds
23:59:60 onto the preceding 23:59:59, then confirms the UTC instant
is the last second of June 30 or December 31 (since 1972). The check
runs after the offset is applied, so it also holds for non-UTC offsets:
str2time('2017-01-01T05:29:60+05:30'); # ok
# the same instant as 2016-12-31 23:59:60 UTC
The check is table-free: it accepts every June 30 and December 31 slot, whether or not a leap second was inserted that day.
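The fold-then-check order described above can be sketched with core modules. This is an assumption-laden illustration, not the str2time source: `fold_leap_second` is a hypothetical name, it takes already-parsed fields, and its messages only echo the ones shown earlier.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Validate a timestamp whose seconds field is 60 and return the folded epoch.
# Fields are as written in the string; $offset is seconds east of UTC.
# fold_leap_second is a hypothetical name, not the Time::Str internals.
sub fold_leap_second {
    my ($y, $mon, $d, $h, $min, $offset) = @_;

    # Fold :60 onto the preceding :59, then apply the offset to reach UTC
    my $utc = timegm(59, $min, $h, $d, $mon - 1, $y) - $offset;
    my ($S, $M, $H, $D, $MON, $Y) = gmtime $utc;
    $MON += 1;
    $Y   += 1900;

    die "a leap second must occur at 23:59:60 UTC\n"
        unless $H == 23 && $M == 59 && $S == 59;
    die "no leap second on this UTC date\n"
        unless $Y >= 1972
            && (($MON == 6 && $D == 30) || ($MON == 12 && $D == 31));
    return $utc;
}

print fold_leap_second(2016, 12, 31, 23, 59, 0), "\n";                     # 1483228799
print fold_leap_second(2017,  1,  1,  5, 29, 5 * 3600 + 30 * 60), "\n";    # 1483228799
```

Both calls return the same epoch because the check looks at the UTC instant, not at the fields as written.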
Time::LeapSecond
That gap in the table-free check led to
Time::LeapSecond, which
knows the leap seconds that were actually scheduled. It reads the system
data file (TZDB leap-seconds or IERS leap-seconds.list), falls back
to a built-in table, and exposes the instants, the TAI-UTC offset, and
the file's expiration $EXPIRES (the POSIX epoch the data is complete
until).
The plan is to use it in str2time: while time() is before
$EXPIRES the table is authoritative, so an unlisted :60 is rejected;
once past it the file may be stale, so str2time falls back to the
table-free check and accepts any valid slot. Better to accept a wrong
leap second than reject a real one.
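The planned rule is small enough to sketch. Nothing here is the Time::LeapSecond API: `accept_leap_second` is a hypothetical name, the table is a three-entry excerpt of real leap seconds, and the current time and expiration are passed in as parameters so the rule can be exercised without a data file.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# A few real leap seconds, keyed by the folded epoch (23:59:59 UTC that day).
# A real table would come from the system file; this excerpt is illustrative.
my %LEAP;
$LEAP{ timegm(59, 59, 23, $_->[2], $_->[1] - 1, $_->[0]) } = 1
    for [2012, 6, 30], [2015, 6, 30], [2016, 12, 31];

# $folded has already passed the table-free slot check.
# $now and $expires are POSIX epochs.
sub accept_leap_second {
    my ($folded, $now, $expires) = @_;
    return 1 if $now >= $expires;            # file may be stale: accept any valid slot
    return exists $LEAP{$folded} ? 1 : 0;    # table is authoritative
}

my $expires  = timegm(0, 0, 0, 1, 0, 2030);          # hypothetical expiration
my $listed   = timegm(59, 59, 23, 31, 11, 2016);     # a real leap second
my $unlisted = timegm(59, 59, 23, 30,  5, 2017);     # valid slot, none inserted

print accept_leap_second($listed,   $expires - 1, $expires), "\n";    # 1
print accept_leap_second($unlisted, $expires - 1, $expires), "\n";    # 0
print accept_leap_second($unlisted, $expires + 1, $expires), "\n";    # 1
```

The third call shows the trade-off: once the data has expired, an unlisted slot is accepted rather than risk rejecting a leap second announced after the file was written.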
Native C parsers
An update on the first post's "what's next": several parsers are now
generated C, built with Ragel,
which removes the regex step from the parse path. Parsing
2012-12-24T12:30:45.123456+01:00:
Rate DT8601 DT3339 D::Parse T::Moment T::Str
DT::F::ISO8601 23442/s -- -50% -78% -100% -100%
DT::F::RFC3339 46804/s 100% -- -56% -99% -100%
D::Parse 105263/s 349% 125% -- -99% -99%
T::Moment 7030938/s 29893% 14922% 6579% -- -36%
T::Str 11020543/s 46912% 23446% 10369% 57% --
On this input Time::Str is about 57% faster than my other module Time::Moment.
Time::Str, Time::TZif, Time::TZif::POSIX, Time::Str::TimeZone and Time::LeapSecond are on CPAN and GitHub.
cpanm Time::Str