File::Slurp is broken and wrong

If you are using File::Slurp, you should probably reconsider. There are three reasons to do so:

It is wrong in a lot of cases.

File::Slurp predates IO layers, and as such doesn't take them into account well. A few years ago, after some complaints, an attempt was made to make it handle encodings. The result was nothing short of wrong.

The best-known bug in this area is #83126, which means that :encoding() layers are always interpreted as :utf8. This not only means that UTF-8 encoded text is not validated (which can be a security risk), but also that files in other encodings (such as UTF-16) will be read as if they were UTF-8, which will surely give an incorrect result.

Likewise, it doesn't handle :crlf correctly; in particular, explicitly asking for :crlf will always disable it, even on Windows.

Basically, it handles every binmode incorrectly except the one you shouldn't be using anyway (:utf8), and since you should pretty much always be using a binmode, there's really no way to win.
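
For reference, slurping a UTF-8 text file correctly doesn't need a module at all. Here's a minimal sketch using only core Perl and the strict :encoding(UTF-8) layer, which, unlike :utf8, actually validates its input:

    use strict;
    use warnings;

    # Slurp a UTF-8 encoded text file using only core Perl.
    # The :encoding(UTF-8) layer validates the bytes it decodes,
    # unlike :utf8, which merely flags them as UTF-8 unchecked.
    sub slurp_utf8 {
        my ($filename) = @_;
        open my $fh, '<:encoding(UTF-8)', $filename
            or die "Could not open $filename: $!";
        local $/;                  # enable slurp mode
        my $content = <$fh>;
        close $fh or die "Could not close $filename: $!";
        return $content;
    }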

The interface is poorly huffmanized.

Huffmanization is the process of making commonly used operations shorter. File::Slurp fails to huffmanize in the Unicode world of 2015. Text files are usually UTF-8 nowadays, which in File::Slurp would typically be read as read_file($filename, binmode => ':raw:utf8'). The shortest option, read_file($filename), does something most people don't really want anymore: it reads latin-1 encoded files with platform-specific line endings.
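
To make the comparison concrete (both calls are the ones from File::Slurp's interface as described above; the filename is just an example):

    use File::Slurp qw(read_file);

    my $filename = 'notes.txt';

    # What you usually want nowadays: a decoded UTF-8 string
    # (and even this is affected by bug #83126, since :utf8
    # performs no validation).
    my $text = read_file($filename, binmode => ':raw:utf8');

    # The shortest call: latin-1 with platform-specific line
    # endings, which is rarely what anyone wants anymore.
    my $legacy = read_file($filename);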

This is mainly the fault of perl itself (backwards compatibility is a PITA), but a library can work around this to make the programmer's life easier.

It is poorly maintained.

The critical bug mentioned above has been known for about two years, yet the author hasn't even bothered to respond to it, let alone fix it. There hasn't been a release in four years despite an increasingly long list of issues. Worse yet, this isn't the first time such a thing has happened; before his last maintenance surge in the spring of 2011, the author was also missing in action for years. This negligence is inexcusable for a module that is so commonly depended upon.

Recommendations

Instead of File::Slurp, I recommend you use one of these modules depending on your needs:

If your needs are minimal, I'd recommend my File::Slurper. It provides correct, fast, and easy-to-use slurping and spewing functions.
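
A quick sketch of its interface (function names as in File::Slurper's documentation):

    use File::Slurper qw(read_text write_text read_binary);

    # Text functions decode from UTF-8 by default; a different
    # encoding can be passed as a second argument.
    my $content = read_text('notes.txt');
    my $latin1  = read_text('legacy.txt', 'latin-1');

    # Binary functions deal in raw bytes, with no layers applied.
    my $bytes = read_binary('image.png');

    write_text('notes.txt', $content);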

If your needs are average (which is the case for most people), I'd recommend Path::Tiny. This provides a well-balanced set of functions for dealing with file paths and contents.
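
For example, the slurping and spewing part of its interface looks roughly like this:

    use Path::Tiny;

    my $file = path('/tmp/example.txt');

    # UTF-8 aware reading and writing.
    my $text  = $file->slurp_utf8;
    my @lines = $file->lines_utf8;
    $file->spew_utf8("new content\n");

    # Raw bytes, no encoding layer.
    my $bytes = $file->slurp_raw;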

If you want to go for maximal overkill, try IO::All. It will do everything you can imagine and more.
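
A small taste of what that looks like (method names taken from IO::All's documentation; treat this as a sketch rather than a full tour):

    use IO::All;

    # Slurp and spew through IO::All's all-in-one object interface.
    my $content = io('notes.txt')->utf8->slurp;
    io('copy.txt')->utf8->print($content);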

7 Comments

+1 for Path::Tiny

I believe that the default encoding used by File::Slurp is due to a Perl bug with threads: https://rt.perl.org/Public/Bug/Display.html?id=41121

Any comments/opinions on

use Mojo::Util qw(slurp);

Thanks, Fritz

It's not particularly relevant to slurping, as the handle won't be shared between threads.

It doesn't matter if the filehandle is shared. Just calling binmode will lead to a segfault.

Why slurp at all?

Before slurping I ask myself why. Not what module or how.

  • Do I actually know the possible file size? Many times when processing large files I have had modules break (and needed to replace them) because they use slurping, which does not scale to large files (megabytes to gigabytes plus).

  • Am I applying premature optimisation? In my testing the increase in speed (if any!) was outweighed by needing to take into account encodings and binmode issues - pretty much the issues described here. Just using the native methods might be better.

Slurping is a (useful) hack. There, I said it....:-)
