New Data::Dumper release: 50% faster
Data::Dumper version 2.136 was just uploaded to CPAN. It's been over a year since the last stable release of the module. Normally, I just synchronize changes to the module from the Perl core to CPAN releases, and I do so very carefully, with lots of development releases.
Recently, however, there was a reason to look critically at Data::Dumper performance. A very simple change yielded a speed-up on the order of 50% on my test data set. In a nutshell, Data::Dumper used to track each and every value in the data structure just in case you were going to use the Seen functionality. That pertains to a tiny fraction of all Data::Dumper uses, yet everybody was paying for it. For example, if you use the functional interface (as most people do), you never even get access to that information, yet everything was being tracked instead of just the values that are referenced more than once.
As of Data::Dumper 2.136, the functional interface has become faster unconditionally. If you use the OO interface, you may be one of the few people who care about the old Seen feature. That means you have to opt in to the new optimization by setting the Sparseseen option of the object. If you do, the Seen hash will no longer be useful. Alternatively, you can enable the optimization globally by setting $Data::Dumper::Sparseseen = 1.
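As a rough sketch of the three cases just described, something like the following should work. The data structure and the variable names ($shared, $data, $dumper and so on) are made up for illustration, and the `use Data::Dumper 2.136` line assumes you want the script to refuse to run against an older release that lacks the option.

```perl
use strict;
use warnings;
use Data::Dumper 2.136;    # Sparseseen is not available before this release

# Hypothetical sample data: one array referenced from two hash slots.
my $shared = [ 1, 2, 3 ];
my $data   = { first => $shared, second => $shared };

# Functional interface: nothing to change, it is simply faster.
my $plain = Dumper($data);

# OO interface: opt in per object. Do not rely on Seen afterwards.
my $dumper = Data::Dumper->new( [$data], ['data'] );
$dumper->Sparseseen(1);
my $sparse = $dumper->Dump;

# Global opt-in, limited to this block with local.
my $global;
{
    local $Data::Dumper::Sparseseen = 1;
    $global = Data::Dumper->new( [$data], ['data'] )->Dump;
}

print $sparse;
```

The array that is referenced twice should still be emitted once and then cross-referenced, since it is exactly the kind of value that keeps being tracked.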
At the same time, the new release ports several bug fixes from the Perl core. Unfortunately, some of those changes turned out to be incompatible with older versions of Perl. More specifically, it appears that one vstring-related change breaks some vstring tests on 5.8. I don't currently have the time to investigate. If you are affected by this, why don't you step up and help restore full compatibility?
A big thanks to my employer, Booking.com, for letting me spend work time on this optimization.