Comparison of Perl serialization modules
A while ago I needed a Perl data serializer that met a few requirements: support for circular references and Regexp objects out of the box, and consistent (canonical) output, because the output will be hashed. Here's my rundown of the currently available Perl data serialization modules. One note: the fast/slow labels are relative to each other and are not the result of extensive benchmarking.
Data::Dumper. The grand-daddy of Perl serialization modules. Produces Perl code with an adjustable indentation level (the default is lots of indentation, so output is verbose). Slow. Available in core since the early days of Perl 5 (5.005 to be exact). To unserialize, you need to eval() the output, which is a security risk if the data comes from an untrusted source. The first choice of many Perl programmers when it comes to serialization, and arguably the most popular module for that purpose.
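A minimal sketch of a Data::Dumper round trip covering the three requirements above (circular references, Regexp objects, canonical output). It assumes only the core Data::Dumper module. Sortkeys, Indent and Purity are real Data::Dumper configuration variables, while the sample data and the variable name $copy are made up for illustration. The string eval at the end is exactly the security concern just mentioned.

```perl
use strict;
use warnings;
use Data::Dumper;

# Settings for canonical, round-trippable output.
local $Data::Dumper::Sortkeys = 1;   # sort hash keys: same data, same text
local $Data::Dumper::Indent   = 1;   # fixed indentation, less verbose than the default (2)
local $Data::Dumper::Purity   = 1;   # emit extra statements that rebuild circular references

my $data = { name => 'foo', re => qr/^ab+c$/i };
$data->{self} = $data;               # circular reference

# Name the dumped variable 'copy' so the generated code assigns to $copy.
my $code = Data::Dumper->Dump([$data], ['copy']);
print $code;

# Unserializing means running the dump as Perl code.
# Never do this with input you do not trust.
my $copy;
eval $code;
die "eval failed: $@" if $@;

print "cycle restored\n"  if $copy->{self} == $copy;
print "regexp restored\n" if 'ABBC' =~ $copy->{re};
```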
Storable. Fast. Produces compact, portable binary output. Also available in the core distribution. Does not support Regexp objects out of the box (though adding support for them requires only a few lines). The binary format has changed several times in the past, and newer versions of the module could not always read data written by older ones, giving people a major PITA. It has supposedly stabilized now.
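A minimal sketch of the canonical-output requirement with Storable, assuming the core Storable and Digest::SHA modules (the SHA-1 digest just stands in for whatever hash you compute over the output). $Storable::canonical is Storable's documented switch for storing hash entries in sorted key order; the sample hashes are hypothetical. Regexp handling is deliberately left out here.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Digest::SHA qw(sha1_hex);

# Store hash entries sorted by key so equal data gives identical bytes.
$Storable::canonical = 1;

# Same contents, inserted in opposite orders.
my (%first, %second);
$first{$_}  = length $_ for qw(apple banana cherry durian);
$second{$_} = length $_ for reverse qw(apple banana cherry durian);

my $digest_first  = sha1_hex(freeze(\%first));
my $digest_second = sha1_hex(freeze(\%second));
print "digests match\n" if $digest_first eq $digest_second;

# Circular references survive a freeze/thaw round trip.
my $node = { name => 'root' };
$node->{parent} = $node;
my $copy = thaw(freeze($node));
print "cycle kept\n" if $copy->{parent} == $copy;
```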
YAML::XS. Fast. Verbose YAML output (currently it doesn't seem to have an option to output inline YAML). In my past experience this module sometimes behaved weirdly and died with cryptic errors, but I guess it's pretty stable now.
There are other YAML implementations, like YAML::Syck (also pretty speedy), the older pure-Perl YAML.pm and the partial implementation YAML::Tiny. The last two might not be a good choice for general serialization needs.
Data::Dump. Very slow. Produces nicely indented Perl output. The strength of this module is its pretty output and the flexibility to customize the formatting process. Based on Data::Dump, I've hacked together two specialized modules: Data::Dump::PHP for producing PHP code, and Data::Dump::Partial for producing compact, partial Perl output for logging.
XML::Dumper. Produces *very* verbose XML output (as is always the case with XML). Slow. Apart from the XML format, I don't think there's any reason to choose this over the others.
JSON::XS. Fast. Produces fairly compact but still readable output, but does not support circular references or Regexp objects.
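JSON::XS is not in core, so this sketch uses JSON::PP, the pure-Perl core module that follows the same interface. The assumption is that JSON::XS behaves the same way in these cases. It shows canonical (sorted-key) output and how both limitations surface: the encoder dies instead of producing output.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical(1) sorts object keys, giving stable output.
my $json = JSON::PP->new->canonical(1);

my $text = $json->encode({ b => 2, a => [1, 'x'] });
print "$text\n";    # {"a":[1,"x"],"b":2}

# A Regexp is a blessed object, which the encoder refuses by default.
my $regexp_ok = eval { $json->encode({ re => qr/x/ }); 1 };
print "Regexp rejected: $@" unless $regexp_ok;

# A cycle is not detected as such: encoding recurses until
# the nesting limit (max_depth, 512 by default) is exceeded.
my $cyclic = {};
$cyclic->{self} = $cyclic;
my $cycle_ok = eval { $json->encode($cyclic); 1 };
print "cycle rejected: $@" unless $cycle_ok;
```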
JSYNC. Slow. Outputs JSON and, in addition, supports circular references, but not (yet) Regexp objects.
FreezeThaw. Slow, produces compact output but not as compact as Storable. Does not support Regexp objects out of the box.
Apart from these there are many other choices, but I personally don't think any of them is interesting enough to be a favorite. For example, last time I checked, PHP::Serialization (and all the other PHP-related modules) did not support circular references. There's also Data::Pond: a cute concept, but of little practical use since it is even more limited than JSON.
There are also numerous alternatives to Data::Dumper/Data::Dump that produce Perl or Perl-like code or indented, formatted output, but they either cannot be unserialized back into data structures (so they are really formatting modules rather than serialization modules) or focus on pretty printing instead of speed. In general I think most Data::Dumper-like modules are slow at serializing data.
In conclusion, choice is good, but I have not found my perfect general serialization module yet. My two favorites are Storable and YAML::XS. If JSYNC were faster and supported Regexp objects, or if YAML::XS or YAML::Syck could output inline/compact YAML, that would be about as close to perfect as I'd like.
Hope this comparison is useful. Corrections and additions welcome.
The main advantage of modules like Data::Dumper is that they output Perl code. If you use them to save the user's preferences and configuration, the saved file can be loaded back with a plain do (or require) statement. It loads faster than any other method since it uses Perl's own parser rather than one that needs to be loaded, compiled and run first.
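A minimal sketch of that approach, assuming the core Data::Dumper and File::Temp modules; the preferences hash is hypothetical. Terse output is a bare Perl expression, so do returns the rebuilt structure directly. As noted in the reply below, this runs whatever code is in the file.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

my $prefs = { theme => 'dark', recent => ['a.txt', 'b.txt'] };

# Save: Terse drops the "$VAR1 =" prefix, leaving a bare expression.
# Quoted keys (the default) let Perl parse the leading { as an anonymous hash.
my ($fh, $file) = tempfile(UNLINK => 1);
{
    local $Data::Dumper::Terse    = 1;
    local $Data::Dumper::Sortkeys = 1;
    print {$fh} Dumper($prefs);
}
close $fh or die "close failed: $!";

# Load: do() compiles and runs the file with Perl's own parser
# and returns the value of its last expression.
my $loaded = do $file;
die "could not load $file: ", ($@ || $!) unless ref $loaded;
print "theme is $loaded->{theme}\n";
```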
Nice post. I also tend to use Data::Serializer because, as its manual says, it:
"Provides a unified interface to the various serializing modules currently available. Adds the functionality of both compression and encryption."
@shawnhcorey: You need to be extra careful, however, that the Perl code does not contain anything malicious (and that users who are not programmers can still edit the configuration by hand). In short, unless speed is really crucial (as you mentioned above), it's usually better to pick a simpler format like INI or JSON/YAML (or a database) for user configuration.
Well, for startup speed, yes. But parsing 1MB of JSON is surely faster than parsing 1MB of Perl code, since Perl's grammar is more complex and slower to parse.
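Which side wins is easy to measure rather than guess. This is a rough benchmark sketch, not a result. It uses the core Benchmark module and prefers JSON::XS when it is installed, falling back to JSON::PP otherwise. JSON::PP is pure Perl and much slower, so the fallback does not represent JSON::XS. The 1000-record data set is hypothetical, and numbers will vary with data shape and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Data::Dumper;

# Prefer the XS parser if installed; fall back to the pure-Perl core module.
my $json_class = eval { require JSON::XS; 'JSON::XS' }
    || do { require JSON::PP; 'JSON::PP' };
my $json = $json_class->new->canonical(1);

# Hypothetical test data: 1000 small records.
my $data = [ map { +{ id => $_, name => "item$_", tags => [qw(a b c)] } } 1 .. 1000 ];

# The same data as a compact Perl expression and as JSON text.
my $perl_text = do {
    local $Data::Dumper::Terse  = 1;
    local $Data::Dumper::Indent = 0;
    Dumper($data);
};
my $json_text = $json->encode($data);

# Sanity check: both texts parse back into the same records.
my $from_perl = eval $perl_text or die "eval failed: $@";
my $from_json = $json->decode($json_text);
printf "%d records via eval, %d via %s\n",
    scalar @$from_perl, scalar @$from_json, $json_class;

# Run each parser for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    perl_eval   => sub { my $r = eval $perl_text; die $@ if $@ },
    json_decode => sub { my $r = $json->decode($json_text) },
});
```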
I would mention the Sereal module here.
http://search.cpan.org/~smueller/Sereal/
It's faster than Storable, promises format compatibility between versions, produces more compact binary output, and has nice features like sort_keys
http://search.cpan.org/~smueller/Sereal-Encoder-0.37/lib/Sereal/Encoder.pm#sort_keys
which makes it possible to get identical binary output for logically identical hashes. This matters because the key order of a Perl hash is unspecified (and deliberately randomized in modern Perls):
http://perldoc.perl.org/functions/keys.html
Yup, Sereal looks like the overall winner. I will be trying it out in future projects. I'm especially annoyed by Storable's continuing lack of Regexp support. It should probably be retired and replaced by Sereal.
I should also probably add other formats like MessagePack, BSON, etc.
And I should probably subcategorize according to some criteria, e.g. binary vs. human-readable, cross-language vs. Perl-specific (supporting arcane Perl features), etc.