Using bzipped Perl as storage

In a research project we are using a Perl hash stored in a dump file (generated by a tool similar to Dumper, but with some sorting mechanisms that are relevant to the project). Unfortunately, some of those dump files take more than 200MB of disk space. But, fortunately, text is easy to compress, and bzip2 does a fairly good job of compressing these files. But, unfortunately, Perl's 'do' function does not handle bzipped files. But, fortunately, we can make our own.

The solution is quite simple. In any case, I decided to share it with you.

First, use the appropriate decompressing module:

use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

Then, do the job:

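  sub bz2do {
     my $file = shift;
     my $out;
     # decompress the whole file into the scalar $out
     bunzip2 $file => \$out or die "Failed to bunzip: $Bunzip2Error.";
     # the dump assigns to a variable that is never declared
     no strict;
     # reuse $out for the result, so the big string is no longer referenced
     $out = eval $out;
     die $@ if $@;
     return $out;
  }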

A few notes on this code. I am using no strict because our dump files include a variable before the hash that is never declared, so the eval would fail under strict. I also reuse the same variable for the result of the eval: that way no reference to the big string remains and, hopefully, Perl will release its memory as soon as possible.
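To try the function without a 200MB file at hand, a small round trip is enough. The sketch below is only an illustration: the sample hash, the temporary file name, and the use of plain Data::Dumper with Sortkeys are assumptions standing in for our own dump tool.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Same loader as in the post.
sub bz2do {
    my $file = shift;
    my $out;
    bunzip2 $file => \$out or die "Failed to bunzip: $Bunzip2Error.";
    no strict;
    $out = eval $out;
    die $@ if $@;
    return $out;
}

# Hypothetical sample data; the real dumps come from our own tool.
my %data = (alpha => [1, 2, 3], beta => { nested => 'value' });

# Plain Data::Dumper produces text of the form "$VAR1 = { ... };"
local $Data::Dumper::Sortkeys = 1;
my $dump = Dumper(\%data);

# Compress the dump text into a temporary .bz2 file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/dump.pl.bz2";
bzip2 \$dump => $file or die "Failed to bzip2: $Bzip2Error.";

# Load it back with bz2do and look at one value.
my $loaded = bz2do($file);
print "beta.nested = $loaded->{beta}{nested}\n";   # prints "beta.nested = value"
```

The eval returns the value of the last statement in the dump, which is the assignment to $VAR1, so bz2do hands back the hash reference directly.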

Probably you can think of better implementations. But this one works, and that is good enough for me at the moment...

3 Comments

handy, thanks for that idea

IIRC because $out is lexical in that function, it is stored in the function's scratch pad and the memory is never released. I don't think this applies to local variables or the constituent values of data structures.

Perhaps this will work better:

my @out = '';
bunzip2 $file => \$out[0] or die ...;

You get the idea. A quick google found the following link, but I don't think it's where I originally read about it: http://www.perlmonks.org/?node_id=803515

I'd be interested to know if this helps.
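Written out as a complete function, the suggestion above might look like the following sketch. The name bz2do_array and the tiny test file are hypothetical, and whether this variant really lowers memory use has not been measured here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical variant of bz2do: the decompressed text is kept in an
# array element instead of directly in a lexical scalar.
sub bz2do_array {
    my $file = shift;
    my @out = ('');
    bunzip2 $file => \$out[0] or die "Failed to bunzip: $Bunzip2Error.";
    no strict;
    my $data = eval $out[0];
    die $@ if $@;
    @out = ();    # drop the big string before returning
    return $data;
}

# Build a tiny compressed dump so the function can be tried out.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/tiny.pl.bz2";
my $text = q{$VAR1 = { answer => 42 };};
bzip2 \$text => $file or die "Failed to bzip2: $Bzip2Error.";

my $data = bz2do_array($file);
print "answer = $data->{answer}\n";   # prints "answer = 42"
```

Comparing the process size after loading a large dump with each variant would be one way to check whether the difference matters in practice.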

On a somewhat unrelated note, I see that CPAN now also has a module for XZ compression and decompression, so that's another alternative (XZ basically has a higher compression ratio and faster decompression than bzip2, though that might not matter in many cases).


About Alberto Simões

I blog about Perl. D'uh!