Storable-like Modules
I need to store some big matrices (Math::MatrixReal objects) that will ship with a module, which has to load them every time it is loaded itself. Save time is therefore not important, but I need fast loading. It would also be nice if the format were compressed (zlib or anything else). Finally, and less importantly, if the format is binary, it would be nice if it were platform independent.
Every time I needed something like this I used Storable and/or Data::Dumper. What other interesting options are out there?
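For reference, the Storable baseline mentioned above can be made platform independent by using nstore, which writes in network byte order, instead of store. The following is only a minimal sketch: the sub names save_portable and load_portable are made up for illustration, and a plain nested array stands in for a Math::MatrixReal object.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper: write a data structure in network byte order,
# so the file can be read on a machine with a different architecture.
sub save_portable {
    my ($data, $file) = @_;
    nstore($data, $file) or die "Cannot store to $file";
    return 1;
}

# Hypothetical helper: read the structure back.
sub load_portable {
    my ($file) = @_;
    return retrieve($file);
}
```

Usage would be along the lines of save_portable(\@rows, 'matrices.sto') at build time and load_portable('matrices.sto') at module load time. Storable itself does not compress its output.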
I think Booking.com's Sereal is the best option available.
See comparison graphs for performance and compression size here: https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs
@Mike B: Thank you for pointing out Sereal. We put a lot of work into it. As for performance and compression size: we believe we're better than the competition in both, but ultimately it's a trade-off. For example, there's an option to tune how aggressively Sereal de-duplicates strings. Or you could compress the whole serialized document with LZMA manually, etc. If in doubt, it's best to test multiple options on the data at hand. I'd love to hear about the results of any such tests, too!
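As a rough illustration of the options discussed here, the sketch below enables Snappy compression and string de-duplication on the encoder and writes the result to a file. The sub names save_sereal and load_sereal are hypothetical, and plain nested arrays stand in for the matrices. For blessed objects such as Math::MatrixReal, the class would also have to be loaded before the decoded objects are used.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Hypothetical helper: serialize $data with Snappy compression and
# string de-duplication, then write the bytes to $file.
sub save_sereal {
    my ($data, $file) = @_;
    my $encoder = Sereal::Encoder->new({
        compress       => Sereal::Encoder::SRL_SNAPPY(),
        dedupe_strings => 1,
    });
    my $blob = $encoder->encode($data);
    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $blob;
    close $out or die "Cannot close $file: $!";
    return length $blob;
}

# Hypothetical helper: slurp the file and decode it in one go.
sub load_sereal {
    my ($file) = @_;
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $blob = do { local $/; <$in> };
    close $in;
    return Sereal::Decoder->new->decode($blob);
}
```

Whether dedupe_strings helps depends on the data: it makes encoding slower and only pays off when the same strings occur many times, so it is worth timing both settings on the real matrices.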
Hey.
Using the snappy option (I have not tried LZMA yet), the data goes from a 20MB text file (not in Data::Dumper format, just an unrolled version of the matrices to load) to a 15MB file. The loading time goes from 12 seconds (the time to read the text file line by line and build the structure) to as little as 0.2 seconds.
So, for now, I will stick with it. Cheers
That's great, that sounds like fast loading (or at least a vast improvement) to me!
It sounds like in all practical situations, your disk is going to be the bottleneck. :)
I hit a similar issue in Munin. Writing was no issue, but reading it back in a CGI was very slow, as it was synchronous, and it needed to read everything.
Using Sereal was quite a perf improvement, but not enough. So I just bit the bullet and used SQLite, as I didn't find any tied implementation that avoided reading the whole hash-of-hashes data structure into memory.
As your need is quite specific, you might be better off using a custom Tie implementation (maybe by deriving from another one).
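The SQLite approach mentioned above could look roughly like the following sketch, which stores a hash of hashes one value per row, so that a reader can fetch a single entry without loading everything. This is not the actual Munin code. The table name kv, its columns, and the sub names are all assumptions made for illustration, and it relies on DBI with DBD::SQLite being installed.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: open (or create) the SQLite file and its table.
sub open_store {
    my ($path) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$path", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS kv (
            outer_key TEXT NOT NULL,
            inner_key TEXT NOT NULL,
            value     TEXT,
            PRIMARY KEY (outer_key, inner_key)
        )
    });
    return $dbh;
}

# Hypothetical helper: write the whole hash of hashes in one transaction.
sub save_hoh {
    my ($dbh, $hoh) = @_;
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO kv (outer_key, inner_key, value) VALUES (?, ?, ?)');
    $dbh->begin_work;
    for my $outer (keys %$hoh) {
        for my $inner (keys %{ $hoh->{$outer} }) {
            $sth->execute($outer, $inner, $hoh->{$outer}{$inner});
        }
    }
    $dbh->commit;
    return 1;
}

# Hypothetical helper: read one value; only the matching row is touched.
sub fetch_value {
    my ($dbh, $outer, $inner) = @_;
    my ($value) = $dbh->selectrow_array(
        'SELECT value FROM kv WHERE outer_key = ? AND inner_key = ?',
        undef, $outer, $inner);
    return $value;
}
```

The primary key on (outer_key, inner_key) is what lets a lookup go through an index instead of scanning the table. A custom Tie class could wrap fetch_value in its FETCH method to keep the hash-like interface.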
@Steve: One of the potential Sereal features that could have helped your case hit the chopping block. We considered partial deserialization using something like dpath. Fundamentally, I think that's still implementable, but I don't think I'd want to add it to the main decoder code: too much extra complexity, and likely to slow down the general case.
@Stefen Yes, an mmap-like approach with just-in-time deserialization would be awesome, even if it came as a separate override, something like a Sereal::JIT, with a big performance penalty for reading the whole structure.