Serializers for Perl: when to use what
This is a moderately edited (primarily rearranged) version of a comment on the Perl 5 issue tracker written by Yves Orton (demerphq). I thought it would be useful to a wider audience so I am reposting here with permission.
Note: this was written off the cuff and is not comprehensive. In correspondence, Yves noted that such an article could/should cover more formats, e.g. MsgPack.
I [feel strongly] that Data::Dumper is generally unsuitable as a serialization tool. The reasons are as follows:
- It cannot correctly serialize many of the data structures Perl can produce.
- It cannot properly round-trip many data structures, even ones as simple as strings.
- It requires eval to deserialize, making its use as a serialization format insecure. Data::Undump somewhat addresses this, but only to a certain extent.
- It is generally inefficient, both in implementation and in terms of output size, and eval is slow as well.
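To make the eval point concrete, here is a minimal sketch using only core modules. The variable names ($side_effect, $hostile) are invented for illustration, and the "hostile" string merely sets a flag; a real attacker could put any Perl code there.

```perl
use strict;
use warnings;
use Data::Dumper;

our $side_effect = '';    # hypothetical flag, only here to show that eval runs code

$Data::Dumper::Sortkeys = 1;    # stable key order in the dumped text

my $data = { name => 'example', list => [ 1, 2, 3 ] };
my $text = Dumper($data);       # Perl source of the form '$VAR1 = { ... };'

# Reading the dump back means compiling and running it as Perl.
# Declaring $VAR1 keeps the evaluated assignment legal under strict.
my $VAR1;
my $copy = eval $text;
die "round trip failed: $@" if $@;

# The same mechanism will happily run "data" that is really code.
my $hostile = '$side_effect = "ran"; +{ name => "x" }';
my $result  = eval $hostile;

print "copy list: @{ $copy->{list} }\n";
print "side effect: $side_effect\n";
```

The round trip works for this simple structure, but the second eval shows why the approach is unsafe for any input you did not produce yourself.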
As a debugging/diagnostics tool Data::Dumper is decent, but even for that purpose there are better tools like Data::Dump::Streamer, assuming performance is not an issue (even when debugging, performance can matter).
If you want to serialize Perl data then you have a number of other choices with different trade-offs, all of which I consider superior to Data::Dumper:
- Storable
Storable can represent any Perl data structure possible and is fast.
It suffers from the disadvantage that its serialization and deserialization code is in one module, meaning one cannot upgrade the decoder independently of the encoder. Compounding that problem is the fact that it is distributed with the Perl core. This leads to the “Storable upgrade trap”: if you use it for persistent storage, the only way you can upgrade the module or Perl is to upgrade every perl installation that uses that persistent store at once. If you do not, you run the risk that the Storable bundled with a newer perl produces documents that cannot be read by the Storable bundled with older perls.
On top of that, the Storable protocol and implementation have various security issues, which some might consider to be fairly serious. Storable does not have a well-defined format.
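As a rough illustration of what Storable handles, the sketch below round-trips a blessed object that contains a reference to itself. The class name My::Node is made up for the example, and thaw is only called on data produced in the same script, which is the only case where it can be considered safe.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Scalar::Util qw(refaddr);

# A blessed object with a cycle: something plain JSON cannot express.
my $node = bless { name => 'root' }, 'My::Node';    # My::Node is hypothetical
$node->{self} = $node;

my $blob = freeze($node);    # opaque binary string
my $copy = thaw($blob);      # never do this with untrusted input

printf "class: %s\n", ref $copy;
printf "cycle preserved: %s\n",
    ( refaddr( $copy->{self} ) == refaddr($copy) ? 'yes' : 'no' );
```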
- JSON
JSON is a relatively uncomplicated encoding format incapable of representing many Perl data structures. Nevertheless its simplicity, wide interoperability, and efficiency (in particular via Marc Lehmann’s excellent JSON::XS) make it a good choice for many applications. JSON is standardized and is supported by many languages.
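A small sketch of both sides of that trade-off follows. It uses the core JSON::PP module so that it runs without extra installs; JSON::XS offers the same encode_json and decode_json functions and would normally be the faster choice. The data and key names are invented for the example.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

my $data = { name => 'example', list => [ 1, 2, 3 ] };

my $json = encode_json($data);    # UTF-8 encoded JSON text
my $copy = decode_json($json);

# Code references have no JSON representation, so encoding dies.
my $encoded = eval { encode_json( { callback => sub { 1 } } ) };
my $error   = $@;

print "round trip list: @{ $copy->{list} }\n";
print "code ref rejected: ", ( $error ? 'yes' : 'no' ), "\n";
```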
- YAML
YAML appears to be able to represent most if not all (of the sane) Perl data structures. It is portable to other languages and includes a more or less secure deserialization infrastructure. YAML is standardized and is supported by many languages.
[Ed.: in our correspondence, Yves also noted:] [This] needs more analysis, for instance I am not certain that YAML does not have security issues, and I am not sure about its performance envelope.
- Sereal
Sereal is designed as a replacement for Storable. (In the interest of full disclosure I should note I am one of the authors of Sereal.)
It is more efficient both in terms of speed and size, includes built-in compression (fast Snappy and slow Gzip), allows for the decoder and encoder to be upgraded independently (and thus does not suffer from the upgrade trap) and is capable of representing nearly all “sane” Perl data structures. (It does not support dualvars or globs, but does support pretty much everything else: aliases, regexps, etc.) It also includes a wide variety of other features not worth mentioning here.
It defends against all Storable attack vectors known to its authors at the time it was written. For instance it is robust against DESTROY-based attacks, forbids duplicate keys in serialized hashes, delays blessing items until it knows it has deserialized the entire data structure, and will never auto-require a module.
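For comparison, a minimal Sereal sketch might look like the following. It assumes the Sereal::Encoder and Sereal::Decoder distributions are installed from CPAN (they are not in core); the class name My::Thing is invented, and refuse_objects is used here as one example of tightening the decoder for less trusted input.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Encoder and decoder are separate modules and separate objects.
my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

my $data = { name => 'example', list => [ 1, 2, 3 ] };
my $blob = $encoder->encode($data);
my $copy = $decoder->decode($blob);

# A stricter decoder that refuses any blessed object in the input.
my $strict      = Sereal::Decoder->new( { refuse_objects => 1 } );
my $object      = bless { id => 1 }, 'My::Thing';    # My::Thing is hypothetical
my $object_blob = $encoder->encode($object);
my $refused     = !eval { $strict->decode($object_blob); 1 };

print "round trip list: @{ $copy->{list} }\n";
print "object refused by strict decoder: ", ( $refused ? 'yes' : 'no' ), "\n";
```

Because the two halves are separate, a consumer can upgrade or configure its decoder without touching whatever produced the data.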
So from my point of view:
- For pure serialization purposes Sereal is the best general choice by a wide margin.
- If you also want human-readable output then you should pick YAML or JSON.
- If you just want to debug stuff then Data::Dumper is plenty fine although Data::Dump::Streamer is probably better, albeit much slower.
Small carp: Storable does not do Regexps, at least out of the box.