October 2012 Archives

New Data::Dumper release: 50% faster

Data::Dumper version 2.136 was just uploaded to CPAN. It's been over a year since the last stable release of the module. Usually, I simply sync changes made to the module in the Perl core back to CPAN, and I do so very carefully, with plenty of development releases first.

Recently, however, there was a reason to take a critical look at Data::Dumper's performance. A very simple change yielded a speed-up on the order of 50% on my test data set. In a nutshell, Data::Dumper used to track each and every value in the data structure, just in case you wanted to use the Seen functionality. That matters for only a tiny fraction of Data::Dumper uses, yet everybody was paying for it. For example, if you use the functional interface (as most people do), you never even get access to that information. Even so, every value was being tracked, instead of just those with a reference count greater than one.
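To see why Data::Dumper has to remember values at all, consider a structure that refers to the same array twice. The sketch below (data and variable names are illustrative only) dumps it with sorted keys: the array is printed once under 'a', and the second occurrence is emitted as a back-reference. Roughly speaking, only a value with more than one reference can be reached twice like this, which is the intuition behind tracking just those values.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump order is deterministic.
$Data::Dumper::Sortkeys = 1;

# One anonymous array, referenced from two places in the same structure.
my $shared = [1, 2];
my $data   = { a => $shared, b => $shared };

# Data::Dumper remembers the array when it first dumps it under 'a',
# so under 'b' it can emit a back-reference instead of a second copy.
my $out = Dumper($data);
print $out;
# The 'b' entry is printed as:  'b' => $VAR1->{'a'}
```

Without that bookkeeping, Data::Dumper could not detect shared or circular references, so the optimization is about skipping it where it cannot matter, not about dropping it entirely.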

With Data::Dumper 2.136, the functional interface has become faster unconditionally. If you use the OO interface, you may be one of the few people who care about the old Seen feature, so you have to opt in to the new optimization by setting the object's Sparseseen option. If you do, you can no longer rely on the contents of the Seen hash. Alternatively, you can enable the optimization globally by setting $Data::Dumper::Sparseseen = 1.
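A minimal sketch of the three cases above, assuming Data::Dumper 2.136 or later is installed (the sample data and the `data` variable name are placeholders). Sparseseen only reduces internal bookkeeping, so the dumped text should be the same with or without it; what you give up is a reliable Seen table.

```perl
use strict;
use warnings;
use Data::Dumper 2.136;   # the Sparseseen option is available from 2.136 on

my $data = { name => 'example', list => [1, 2, 3] };

# Functional interface: nothing to configure.
my $functional = Dumper($data);

# OO interface: opt in per object. Called with an argument, each setter
# returns the object itself, so configuration calls can be chained.
my $sparse = Data::Dumper->new([$data], ['data'])->Sortkeys(1)->Sparseseen(1)->Dump;

# Same configuration without the optimization, for comparison.
my $tracked = Data::Dumper->new([$data], ['data'])->Sortkeys(1)->Dump;

print $sparse;
print "same output\n" if $sparse eq $tracked;

# Global opt-in: Data::Dumper objects created after this point
# default to Sparseseen.
$Data::Dumper::Sparseseen = 1;
```

The per-object setting keeps the choice local to code that knows it never calls Seen; the global variable is simpler but also affects any other code in the process that uses Data::Dumper.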

At the same time, the new release ports several bug fixes from the Perl core. Unfortunately, some of those changes turned out to be incompatible with older versions of Perl. Specifically, one vstring-related change appears to break some vstring tests on Perl 5.8. I don't currently have the time to investigate, so if you are affected, why not step up and help restore full compatibility?

A big thanks to my employer, Booking.com, for letting me spend work time on this optimization.

About Steffen Mueller

Physicist turned software developer. Working on Infrastructure Development at Booking.com. Apparently the only person who thinks the perl internals aren't nearly as bad as people make them out to be. See also: the Booking.com tech blog, CPAN.