Why you don't need File::Slurp…

#! /usr/bin/env perl

use strict;
use warnings;
use Benchmark 'cmpthese';
use File::Slurp 'read_file';

my $filename = shift or die "No argument given";
my $count = shift || 10;

cmpthese($count, {
    'Unix'  => sub { open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!"; read $fh, my $buffer, -s $fh or die "Couldn't read $filename: $!" },
    'Slurp' => sub { read_file($filename, buffer_ref => \my $buffer, binmode => ':raw') },
});

For large files, it's just as fast as File::Slurp is:

        Rate Slurp  Unix
Slurp 2.28/s    --   -0%
Unix  2.29/s    0%    --

For small files, it's actually significantly faster:

          Rate Slurp  Unix
Slurp  51020/s    --  -66%
Unix  151515/s  197%    --

So why use File::Slurp, when a two-liner will actually perform better?

16 Comments

In general, in a shop with both newbs and monks it's preferable to use modules as much as possible. In particular, canning common functions like slurping a file reduces the possibility of errors.

use File::Slurp leaves no doubt in anyone's mind as to what the code is doing.

If speed is your game...roll your own. ;-)

  1. Because File::Slurp is portable to non-Unix systems.
  2. Because File::Slurp can be rewritten in XS if speed is the overriding concern (and it will be faster than your 2-lines).
  3. Because File::Slurp is self-documenting.
  4. Because File::Slurp handles errors better.

In other words, because of all the gains we've tried to make in software development for the past 20 years.

Oh, and your benchmark is flawed. You're not comparing the cost of slurping the file - you're comparing the cost of calling a function. Wrap your 2-liner in a function and see if things change any.
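
For instance, something along these lines (just a sketch; the name slurp_unix is made up here), dropped into the original script in place of the existing cmpthese call:

sub slurp_unix {
    my $filename = shift;
    open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!";
    read $fh, my $buffer, -s $fh or die "Couldn't read $filename: $!";
    return $buffer;
}

cmpthese($count, {
    'Unix'  => sub { my $buffer = slurp_unix($filename) },
    'Slurp' => sub { read_file($filename, buffer_ref => \my $buffer, binmode => ':raw') },
});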

Indeed, self-documenting.

Functional vs imperative style.

I would be curious for you to add Damian Conway's Perl6::Slurp to your comparison.
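
If anyone wants to try, it should only take a couple of extra lines in the script above; a sketch, assuming Perl6::Slurp's documented slurp interface:

use Perl6::Slurp;

cmpthese($count, {
    'Unix'      => sub { open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!"; read $fh, my $buffer, -s $fh or die "Couldn't read $filename: $!" },
    'Slurp'     => sub { read_file($filename, buffer_ref => \my $buffer, binmode => ':raw') },
    'P6::Slurp' => sub { my $buffer = slurp '<:raw', $filename },
});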

Because most of the time my time as a programmer is more important than a second or two of runtime.

Because I'm not going to screw up "read_file($filename)" like I am more likely to screw up doing the file open/read/close that you recreated above.

There's more to life than raw speed.

mmap should be faster than sysread or read with -s, and there exists a portable module for it.
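
File::Map would be one such portable option; a minimal sketch (untested here, assuming its map_file interface):

use File::Map 'map_file';

# Map the file into memory; $contents then behaves like a read-only
# string, without copying the data into a separate Perl buffer first.
map_file my $contents, $filename, '<';
print length($contents), "\n";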

I use a one-liner slurp at $WORK. I am so tired of copy-pasting it into scripts when I start them (and of having to remember which version is the most recent if I've changed it lately). I would much prefer it to be as easy as a module.
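
For what it's worth, turning the two-liner into a tiny in-house module is not much work; a sketch (the package name My::Slurp is made up):

package My::Slurp;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = ('slurp');

# The same two lines as in the post, wrapped in a sub so scripts can
# simply 'use My::Slurp "slurp"'.
sub slurp {
    my $filename = shift;
    open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!";
    read $fh, my $buffer, -s $fh or die "Couldn't read $filename: $!";
    return $buffer;
}

1;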

If I recall, File::Slurp had issues with the PerlIO :encoding layer and required manual decoding of the slurped data instead, which is why I have avoided it. Not sure if this is still an issue.
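
The workaround, as I remember it, was to slurp the raw bytes and decode them yourself afterwards; roughly this (a sketch):

use Encode 'decode';
use File::Slurp 'read_file';

# Read raw octets, then decode explicitly instead of relying on an
# ':encoding(...)' layer during the read.
my $octets = read_file($filename, binmode => ':raw');
my $text   = decode('UTF-8', $octets);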

There's a flaw IMHO: read() may not return the whole file content (and actually it won't with large files), depending on OS buffering. The usual boilerplate includes a while() and buffer concatenation.

With a small fix it could still be a one-liner, though.
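
Spelled out rather than golfed into one line, the boilerplate I mean looks roughly like this (a sketch; the 64 KiB chunk size is arbitrary):

open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!";
my $buffer = '';
while (1) {
    # read()'s OFFSET argument appends each chunk at the end of $buffer.
    my $bytes = read $fh, $buffer, 65536, length $buffer;
    die "Couldn't read $filename: $!" if not defined $bytes;
    last if $bytes == 0;    # end of file
}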

"For small files, it's actually significantly faster"

Hm... Skimming over the code of File::Slurp, it seems to use the very same technique that you present here, in the case of small files.

So rob.kinyon may be right, it might just be the overhead of the function call (and processing of parameters and error handling etc. within that function) that you're measuring.

"There's a flaw IMHO: read() may not return the whole file content (and actually it won't with large files), depending on OS buffering. The usual boilerplate includes a while() and buffer concatenation."

AFAIK, sysread() might return partial data, but read() should return the whole file (if there are no problems with memory allocation in Perl). Can you point to a place where the behaviour you describe is documented?

