Grant idea - pack and unpack on streams

Merijn Brand gave me this proposal. As it is too long for our grant ideas list, I am posting it here.

Currently, pack and unpack work on a string, which means that you have to advance through the data string yourself when the full data format is not known in advance but depends on the data seen so far.
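To illustrate the bookkeeping this implies, here is a minimal sketch using a made-up record layout (a 1-byte count followed by that many 16-bit big-endian values). The layout and the variable names are assumptions chosen for the example; they do not come from the proposal.

```perl
use strict;
use warnings;

# Hypothetical layout: a 1-byte count, then that many 16-bit big-endian values.
my $data = pack("C n*", 3, 10, 20, 30) . pack("C n*", 1, 99);

my $pos = 0;        # current offset into $data, maintained by hand
my @records;
while ($pos < length $data) {
    # The first picture is known: read the count.
    my $count = unpack "C", substr $data, $pos;
    $pos += 1;

    # The next picture depends on the data just seen.
    my @values = unpack "n$count", substr $data, $pos;
    $pos += 2 * $count;

    push @records, \@values;
}
```

Every step has to advance $pos by exactly the number of bytes the picture consumed, which is the part a stream-aware unpack would take over.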

If one could unpack from a stream, one could unpack the picture that is known, have the stream pointer advance by the amount of data read for that picture, and be ready to read the next data using the next (possibly different) picture.

The biggest win comes where the size of the data read for a given picture differs per architecture, as with native floats.

From PerlMonks thread http://www.perlmonks.org/?node_id=1104462

Perl Monks, I humbly seek your wisdom: I wish to unpack a series of structures from a binary file. I have used:

@array = unpack ("f*", join ("", <$filehandle>));

to load an array of floats into memory from a smaller binary file, but a) This file is too big to fit in memory b) It has a complex structure, like “int, int, float, float, float” repeating.

How shall I iteratively unpack the next structure from the file into a list of scalars, without loading the whole file into memory?

[ikegami] came up with this solution:

my $template = "iifff";
my $rec_size = template_len ($template);
while (1) {
    # Read exactly one record; read returns undef on error and 0 at EOF
    my $rv = read ($fh, my $rec, $rec_size);
    defined $rv or die "$!\n";
    $rv or last;
    $rv < $rec_size and die "Premature EOF\n";

    my @fields = unpack $template, $rec;

}

This does not show the required template_len function, which has to calculate the actual data length for a given picture (possibly by using the picture to pack data into a string and asking for its length).
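A minimal sketch of such a function, following the pack-and-measure idea, might look like the following. It assumes a fixed-width picture (no "*" repeat counts or variable-length fields), and the 64 dummy zero arguments are an arbitrary upper bound chosen for this example; pack ignores arguments the picture does not need.

```perl
use strict;
use warnings;

# Sketch only: returns the packed size in bytes of a fixed-width picture.
# Not suitable for pictures containing "*" or other variable-length items.
sub template_len {
    my ($template) = @_;
    # Pack dummy zeros; surplus arguments beyond what the picture needs are ignored.
    return length pack $template, (0) x 64;
}

my $rec_size = template_len ("iifff");
```

Because the size is measured on the running perl, it automatically follows the native int and float sizes of that build.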

With the suggested functionality, that code could be simplified to:

while (my @fields = unpack "iifff" => $fh) {

}
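Since unpack does not accept a filehandle today, the intended behaviour can only be approximated with a helper. The sketch below uses a hypothetical function named unpack_stream (not an existing Perl builtin or module function) and assumes a fixed-width picture; it is meant to show the semantics being asked for, not how a core implementation would work.

```perl
use strict;
use warnings;

# Hypothetical helper emulating "unpack on a stream" for fixed-width pictures.
sub unpack_stream {
    my ($template, $fh) = @_;
    # Measure the record size by packing dummy zeros (surplus arguments are ignored).
    my $rec_size = length pack $template, (0) x 64;
    my $rv = read $fh, my $rec, $rec_size;
    defined $rv or die "$!\n";
    $rv or return;                              # clean EOF: empty list ends the loop
    $rv < $rec_size and die "Premature EOF\n";
    return unpack $template, $rec;
}

# Demo on an in-memory filehandle holding two records.
my $data = pack "iifff" x 2, 1, 2, 0.5, 1.5, 2.5, 3, 4, 3.5, 4.5, 5.5;
open my $fh, "<", \$data or die "$!\n";
binmode $fh;

my @records;
while (my @fields = unpack_stream ("iifff", $fh)) {
    push @records, [@fields];
}
close $fh;
```

The loop has the same shape as the proposed syntax; the difference is that the read, the size calculation, and the EOF checks are hidden in the helper rather than handled by unpack itself.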

See also this thread on PerlMonks: http://www.perlmonks.org/?node_id=1099544 where the attached PM-1099544-0.pl was first rewritten as PM-1099544-1.pl to make the code simpler and more readable, and then, assuming unpack on streams were possible, rewritten as PM-1099544-2.pl. That example merely eliminates 3 read calls.

If you need the attached *.pl files, contact Merijn or me.


About Makoto Nozaki

Secretary, The Perl Foundation.