Sereal v2 Finally Released as Stable

Just now, I've uploaded three distributions to PAUSE: Sereal-Encoder-2.01, Sereal-Decoder-2.01, and Sereal-2.01.

This means that version 2 of the Sereal serialization protocol is finally considered public, stable, and ready for production use. There was a long article about Sereal v2 on the Booking.com blog a few months ago. Since then, at least one major new feature has been added to Sereal version 2: object serialization hooks (FREEZE/THAW). Let me summarize the two major user-visible new features.

  • Encoding data in document headers. This feature is useful primarily for large and/or compressed Sereal documents: it allows you to embed metadata in the document header, which can then be inspected without deserializing the entire document. Think routing information in a high-traffic server application.

  • FREEZE/THAW hooks. This much-requested feature led to a lot of discussion, and a fair amount of disagreement, between the other Sereal contributors and myself. The end result: if you enable the corresponding option on the Sereal encoder object, the encoder checks every object it encounters for a FREEZE method and, if one exists, calls it with the string "Sereal" as its first argument. Whatever FREEZE returns (as its first return value) is serialized in place of the object. On the decoding side, any object that underwent FREEZE requires a THAW method in the object's class. THAW is the obvious reverse of FREEZE, and whatever it returns is included in the output of the decoder run. The feature slows down both encoder and decoder because of the extra method lookups and calls, but it's optional: the whole thing is designed to cost you something only if you actually use it. The best part about FREEZE/THAW, however, is that I didn't have to come up with the interface. Marc Lehmann wrote Types::Serialiser, which includes a specification of how these FREEZE/THAW hooks can work for multiple serialization libraries (basically, the name of the serializer is passed in so the class can dispatch on it). Thus, you can now often write a single hook that supports all of Sereal, JSON(::XS), and CBOR(::XS).
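To make the hook interface a bit more concrete, here is a minimal sketch of a class that provides FREEZE and THAW. The class My::Point and its fields are hypothetical and exist only for illustration. The sketch assumes that THAW is called as a class method with the serializer name followed by the data that FREEZE returned, which is how I read the Types::Serialiser convention. The last few lines call the hooks by hand to mimic what an encoder and decoder would do; with Sereal itself you would instead enable the encoder's hook option (freeze_callbacks, as far as I know) and let the library make these calls.

```perl
use strict;
use warnings;

package My::Point;    # hypothetical example class

sub new {
    my ($class, %args) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

# Invoked by an encoder as $obj->FREEZE($serializer_name).
# Returns plain data that is serialized in place of the object.
sub FREEZE {
    my ($self, $serializer) = @_;

    # One representation for every serializer this class supports.
    die "unsupported serializer: $serializer"
        unless $serializer =~ /\A(?:Sereal|JSON|CBOR)\z/;

    return [ $self->{x}, $self->{y} ];
}

# Invoked by a decoder as Class->THAW($serializer_name, $frozen_data)
# (assumed calling convention). Rebuilds the object from the plain data.
sub THAW {
    my ($class, $serializer, $data) = @_;
    return $class->new(x => $data->[0], y => $data->[1]);
}

package main;

# Mimic by hand what an encoder and a decoder would do with the hooks.
my $point  = My::Point->new(x => 3, y => 4);
my $frozen = $point->FREEZE('Sereal');
my $copy   = My::Point->THAW('Sereal', $frozen);
```

Returning a single array reference from FREEZE keeps the hook independent of whether a given serializer stores only the first return value or the whole list.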

All in all, Sereal v2 was long in the making. The Perl/XS implementation of the decoder is fully backward-compatible and the encoder can optionally be made to emit v1 documents. We took great care in designing the changes and that simply took some time to percolate.

The Perl/XS implementation is live. I'd be surprised if we didn't see full, production-grade v2 implementations from Borislav (Ruby), Damian (Go), and Andrea (Objective-C) released within a few days.

Test ExtUtils::ParseXS 3.18_04 before it's too late!

ExtUtils::ParseXS is the module that translates XS to C. There have been development releases (newest: 3.18_04) for some time now, and so far there are no CPAN Testers failures. But for something as low-level as this, tests can only cover so much. I've been wanting to build a full CPAN against it, but the infrastructure for that has been in maintenance for months.

Because the release contains a few important fixes, please test your XS against the new development release soon. Otherwise, I will take a bit of a risk and cut a production release.

Announcement for Sereal, a binary data serialization format, finally live!

It's been long in the making, but finally, I've gotten the Sereal announcement article into a shape that I felt somewhat comfortable publishing. Designing and implementing Sereal was a true team effort and we really hope to see non-Perl implementations of it in the future. We're virtually committed to finishing the Java decoder, at least for our data-warehousing infrastructure. Any help and cooperation is welcome, as are patches to improve the actual text of the specification (which is still kind of a weak point).

By the way, for those who worried about the lack of a comment system on the Booking.com dev blog before, we've added Disqus support.

But now, I'm just glad it's out there!

Booking.com dev blog goes live!

I'm proud to echo the announcement that the Booking.com dev blog has just gone live. Quoting the announcement:

Booking.com is an online hotel reservations company founded during the hey-days of the dot com era in the 90s. The product offering was initially limited to just the Dutch market. We grew rapidly to expand our offerings to include 240,000+ accommodations in 171 countries used by millions of unique visitors every month - numbers which continue to grow every single day. With such growth come interesting problems of scalability, design and localisation which we love solving every day.

The blog is kicked off with just a quick, humble article of mine on a debugging module that I published after needing the functionality at work. In a given code location, it allows you to find where in the code base the current set of signal handlers was set up. We plan to publish new content regularly and have a few interesting stories already lined up. So stay tuned!

New Data::Dumper release: 50% faster

Data::Dumper version 2.136 was just uploaded to CPAN. It's been over a year since the latest stable release of the module. Generally, I just synchronize changes to the module from the Perl core to CPAN releases and do so very carefully with lots of development releases.

Recently, however, there was a reason to look at Data::Dumper performance critically. A very simple change meant a speed-up of the order of 50% on my test data set. In a nutshell, Data::Dumper used to track each and every value in the data structure just in case you were going to want to use the Seen functionality. That pertains to a tiny fraction of all Data::Dumper uses, yet everybody was having to pay for it. For example, if you're using the functional interface (like most people), you never even get access to that information, yet everything was being tracked instead of just things with high reference counts.

With Data::Dumper 2.136, the functional interface has become faster unconditionally. If you use the OO interface, you may be one of the few people who care about the old Seen feature. That means you have to opt in to the new optimization by setting the Sparseseen option of the object. If you do, the Seen hash will be useless. Alternatively, you can globally enable the optimization by setting $Data::Dumper::Sparseseen = 1.
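As a rough illustration of the three cases described above, the following sketch dumps the same structure through the functional interface, through an object that opts in via Sparseseen, and with the global variable set. The data structure and variable names are made up for the example, and Sortkeys is set only to keep the output stable. It assumes Data::Dumper 2.136 or later.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample data, purely for illustration.
my $data = { name => 'example', list => [ 1, 2, 3 ] };

# Functional interface: nothing to configure.
my $functional = Dumper($data);

# OO interface: opt in to the optimization on this object only.
# After this, the object's Seen hash is no longer useful.
my $dumper = Data::Dumper->new([$data], ['data']);
$dumper->Sortkeys(1);      # stable key order, for readability only
$dumper->Sparseseen(1);
my $per_object = $dumper->Dump;

# Global opt-in: affects Data::Dumper objects created from here on.
$Data::Dumper::Sparseseen = 1;
my $dumper2 = Data::Dumper->new([$data], ['data']);
$dumper2->Sortkeys(1);
my $global = $dumper2->Dump;
```

If you only want the global setting for a limited stretch of code, wrapping the assignment in a block with local is one way to keep it from leaking into unrelated callers.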

At the same time, the new release ports several bug fixes from the Perl core. Unfortunately, some of those changes turned out to be incompatible with older versions of Perl. More specifically, it appears that there is one vstring-related change that breaks some vstring tests on 5.8. I don't currently have the time to investigate. If you are affected by this, why don't you step up and help restore full compatibility?

A big thanks to my employer, Booking.com, for letting me spend work time on this optimization.

About Steffen Mueller

Physicist turned software developer. Working on Infrastructure Development at Booking.com. Apparently the only person who thinks the perl internals aren't nearly as bad as people make them. See also: the Booking.com tech blog, CPAN.