Let's talk about Time::Moment and round-trip of strings

In my previous blog post I wrote a lot more about Time::Moment than actually appeared in the post (it could have been my mistake, due to a preview error and an incomplete copy and paste, but it was still very inconvenient). So I have decided to break my original blog post into several blog posts.

Time::Moment implements a subset of ISO 8601 (known as 4.3 Date and time of day, 4.3.2 Complete representatio…
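The round trip in the title can be sketched as follows, using Time::Moment's `from_string` and `to_string` methods (the timestamp value is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Moment;

# Parse an ISO 8601 "complete representation" string into an instant
my $tm = Time::Moment->from_string('2014-02-01T10:02:03.123456789+01:00');

# Accessors reflect the parsed components
printf "year: %d, nanosecond: %d, offset: %+d minutes\n",
    $tm->year, $tm->nanosecond, $tm->offset;

# to_string() produces an ISO 8601 string that from_string() accepts again,
# so no information is lost going string -> object -> string -> object
my $copy = Time::Moment->from_string($tm->to_string);
print $tm->epoch == $copy->epoch && $tm->nanosecond == $copy->nanosecond
    ? "round trip ok\n" : "round trip lost information\n";
```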

A bit more about Time::Moment

In my previous blog post I mentioned the bottlenecks of DateTime and why I had to develop Time::Moment and its underlying C library, c-dt.

In this blog post I'll talk a bit about the design decisions I made for Time::Moment.

Time::Moment supports a finite range of timestamps: 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z, with nanosecond precision regardless …
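The boundaries of that range can be exercised directly with `from_string`; a small sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Moment;

# The earliest and latest representable instants (at UTC),
# with nanosecond precision on the fractional seconds
my $min = Time::Moment->from_string('0001-01-01T00:00:00Z');
my $max = Time::Moment->from_string('9999-12-31T23:59:59.999999999Z');

printf "min: %04d-%02d-%02d\n",
    $min->year, $min->month, $min->day_of_month;
printf "max: %04d-%02d-%02d, nanosecond: %d\n",
    $max->year, $max->month, $max->day_of_month, $max->nanosecond;
```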

Time::Moment vs DateTime

In December last year I released the first version of Time::Moment. I don't foresee any major changes in the Time::Moment 0.16 API, so in the next release I'll remove the "early preview release" notice from the description. I have been using 0.16 in production in two different deployments with great success: by removing DateTime from the ORM we have significantly reduced memory and CPU usage and increased throughput when inflating timestamps from a RDBM, previously DateTime was one of …

What if sv_utf8_upgrade() used heuristic encoding?

Heuristic: decode as native unless it is well-formed UTF-X:

sub heuristic_utf8_upgrade {
    # Decode in place if the octets are well-formed UTF-8;
    # otherwise upgrade from the native (Latin-1) encoding.
    utf8::upgrade($_[0])
      unless utf8::decode($_[0]);
    return !!0;
}
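To illustrate the heuristic, here is a small usage sketch: a Latin-1 byte string is ill-formed UTF-8 and gets upgraded from the native encoding, while the same text as UTF-8 octets is decoded, so both end up as the same character string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub heuristic_utf8_upgrade {
    utf8::upgrade($_[0])
      unless utf8::decode($_[0]);
    return !!0;
}

my $latin1 = "caf\xE9";       # "café" in Latin-1; \xE9 alone is ill-formed UTF-8
my $utf8   = "caf\xC3\xA9";   # "café" as well-formed UTF-8 octets

heuristic_utf8_upgrade($latin1);  # upgraded from native (Latin-1)
heuristic_utf8_upgrade($utf8);    # decoded as UTF-8

# Both strings now hold the same four characters, "caf\x{E9}"
print $latin1 eq $utf8 ? "same\n" : "different\n";
```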

Here is some code to play with:

#!/usr/bin/perl
use strict;
use warnings;

{
    package encoding::heuristic;

    our $Encoding;

    BEGIN {
        require Encode;
        $Encoding = Encode::find_encoding('utf8');
    }

    sub import {
        ${^ENCODING} = bless \my $x, __PACKAGE__;
    }

    sub decode : lvalue {
        local ${^ENCO…

Coping with double encoded UTF-8

A few months ago a client asked if I could help them with a "double encoded UTF-8 data problem"; they had managed to store several GB of data with "corrupted" UTF-8 (technically it's not corrupt UTF-8, since it's well-formed UTF-8). During the process I developed several regexes that I would like to share, as they may prove useful to you someday.

Due to UTF-8's use of prefix codes it's easy to spot a double encoded UTF-8 sequence; the prefix code is within the range of …
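The actual regexes from this post are elided above, but the shape of the problem can be sketched with the core Encode module: text that was UTF-8 encoded twice can be repaired by decoding once, mapping the resulting U+0080–U+00FF characters back to bytes via Latin-1, and decoding again. The byte values and the detection pattern below are my own illustration, not the post's regexes:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" encoded as UTF-8 twice: the bytes C3 A9 ("é" in UTF-8) were
# treated as Latin-1 characters and re-encoded, yielding C3 83 C2 A9
my $double = "caf\xC3\x83\xC2\xA9";

# A rough detector for double encoded two-byte sequences: a re-encoded
# C2/C3 lead byte becomes C3 82 or C3 83, followed by a re-encoded
# continuation byte C2 80 .. C2 BF (illustrative, not exhaustive)
print "looks double encoded\n"
    if $double =~ /\xC3[\x82\x83]\xC2[\x80-\xBF]/;

# Repair: decode once, map the chars back to their Latin-1 bytes, decode again
my $fixed = decode('UTF-8', encode('ISO-8859-1', decode('UTF-8', $double)));
print $fixed eq "caf\x{E9}" ? "repaired\n" : "still broken\n";
```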