File::Slurp is broken and wrong

If you are using File::Slurp, you should possibly reconsider. Basically, there are three reasons to do so:

It is wrong in a lot of cases.

File::Slurp predates IO layers, and as such doesn't take them into account well. A few years ago, after some complaints, an attempt was made to make it handle encodings. The result was nothing short of wrong.

The best known bug in this area is #83126, which means that :encoding() layers are always interpreted as :utf8. This not only means that UTF-8 encoded text is not validated (which can be a security risk), but also that files in other encodings (such as UTF-16) will be read as UTF-8, which surely will give an incorrect result.

Likewise, it doesn't handle :crlf correctly. In particular, explicitly asking for :crlf will always disable it, even on Windows.

Basically, it's doing all binmodes wrong except the one you shouldn't be using anyway (:utf8), and you should pretty much always be using a binmode, so there's no way to win really.
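For comparison, here is a minimal core-Perl sketch of what honouring an :encoding() layer looks like. The helper name slurp_text is hypothetical (it is not part of File::Slurp or any other module), and the UTF-16 round trip is only there to show that the requested encoding is actually used:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a text file through an explicit, validating
# :encoding() layer instead of the unchecked :utf8 layer.
sub slurp_text {
    my ($filename, $encoding) = @_;
    $encoding //= 'UTF-8';
    open my $fh, "<:raw:encoding($encoding)", $filename
        or die "Cannot open $filename: $!";
    local $/;                       # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh or die "Cannot close $filename: $!";
    return $content;
}

# Write a small UTF-16 file, then read it back with the matching layer.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':raw:encoding(UTF-16)' or die "Cannot set layers: $!";
my $text = "caf\x{e9} \x{263a}\n";
print {$out} $text;
close $out or die "Cannot close $filename: $!";

my $roundtrip = slurp_text($filename, 'UTF-16');
print $roundtrip eq $text ? "round trip ok\n" : "round trip failed\n";
```

Reading the same file through a plain :utf8 layer would not give back the original text, which is the kind of silent misreading described above.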

The interface is poorly huffmanized.

Huffmanization is the process of making commonly used operations shorter. File::Slurp is failing to huffmanize in the Unicode world of 2015. Text files are usually UTF-8 nowadays, which in File::Slurp would typically be read_file($filename, binmode => ':raw:utf8'). The shortest option, read_file($filename), does something that most people don't really want anymore: latin-1 encoded files with platform-specific line endings.

This is mainly the fault of perl itself (backwards compatibility is a PITA), but a library can work around this to make the programmer's life easier.

It is poorly maintained

The critical bug mentioned above has been known for about two years, yet the author hasn't even bothered to respond to it, let alone fix it. There hasn't been a release in 4 years despite an increasingly long list of issues. Worse yet, this isn't the first time such a thing has happened; before his last maintenance surge in the spring of 2011 the author was also missing in action for years. This negligence is inexcusable for a module that is so commonly depended upon.


Instead of File::Slurp, I recommend you use one of these modules depending on your needs:

If your needs are minimal, I'd recommend my File::Slurper. It provides correct, fast and easy to use slurping and spewing functions.

If your needs are average (which is the case for most people), I'd recommend Path::Tiny. This provides a well-balanced set of functions for dealing with file paths and contents.

If you want to go for maximal overkill, try IO::All. It will do everything you can imagine and more.
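As a rough illustration of the first two recommendations, the sketch below writes and reads the same UTF-8 file with File::Slurper and with Path::Tiny. It assumes both modules are installed from CPAN; the temporary directory and the sample text are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Slurper qw(read_text write_text read_lines);
use Path::Tiny qw(path);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.txt";
my $text = "na\x{ef}ve\nr\x{e9}sum\x{e9}\n";

# File::Slurper: the text functions encode and decode as UTF-8 by default.
write_text($file, $text);
my $slurped = read_text($file);
my @lines   = read_lines($file);    # lines come back chomped

# Path::Tiny: the same file through the object interface.
my $path = path($file);
my $tiny = $path->slurp_utf8;
$path->append_utf8("caf\x{e9}\n");
my @tiny_lines = $path->lines_utf8({ chomp => 1 });

printf "%d lines via File::Slurper, %d lines via Path::Tiny\n",
    scalar @lines, scalar @tiny_lines;
```

In both cases the shortest call is also the UTF-8 one, which is the huffmanization point made earlier.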

2 weeks of perl

It all started with the summer meeting on the 9th of August. I happened to be around there, so I popped in. It is a refreshingly young perl monger group (I might even have been older than the average age there; that's a first for me). At first I didn't know anyone other than the guest speaker, Mark Keating, but after my presentation lots of people approached me and I had a brilliant evening.

A short week later I flew to Germany, for the Perl Reunification Summit in Perl. Like Schwern I arrived a day earlier than most, so I had a calm start of the meetup. It was mostly a gathering of faces familiar to me, though there were a significant number of people I hadn't really spoken to before, especially the Perl 6 guys; -Ofun attracts awesome people. I spent most of the PRS talking to people, and doing a little coding (both related and unrelated). It was a very enlightening meetup.

Lastly, there was YAPC::EU. Despite the sometimes unbearable heat, it was awesome. At some points it seemed a bit less organized than my previous YAPCs, but that may also be me noticing more of what's going on. I spent most of my time in the hallway track, which extended into the pub track, and I spent enough time discussing (and occasionally ranting) that it's a miracle that I still have a voice left. In between I found enough time to attend some talks; interestingly, I attended most of them on the day I gave one myself. After doing threads last year I could only top it with signals this year. It will be a challenge to come up with something crazier; I think I'll have to look in a vastly different direction (I have ideas already). After a full week of conferencing, I was relieved to be going home though.

So in all I met Mark Keating in 3 different places in 2 weeks' time. I'd almost accuse him of stalking me!

What you should know about signal based timeouts

The problem

I think we've all seen code like this example from perlipc:

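my $ALARM_EXCEPTION = "alarm clock restart";
eval {
    # (deferred) handler: turn the alarm into an exception
    local $SIG{ALRM} = sub { die $ALARM_EXCEPTION };
    alarm 10;                                   # time limit in seconds
    flock(FH, 2)  || die "cannot flock: $!";    # 2 is LOCK_EX, blocks until locked
    alarm 0;                                    # cancel the pending alarm
};
alarm 0;    # also cancel it if the eval died for another reason
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }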

Here, signals are used to put a time limit on some action. However, sometimes this doesn't work as wanted. In particular, some C libraries used in XS modules don't cooperate with deferred signaling: the signal is effectively ignored until the C function has finished, which is unlikely to be what you want.

Therefore, people resort to unsafe signals:

use Sys::SigAction qw( set_sig_handler );
my $ALARM_EXCEPTION = "alarm clock restart";
my $h;
eval {
    # no "safe" attribute requested, so this installs a real (unsafe) handler
    $h = set_sig_handler('ALRM', sub { die $ALARM_EXCEPTION }, { });
    alarm 10;
    flock $fh, 2 or die "cannot flock: $!";
    alarm 0;
};
alarm 0;
undef $h;    # destroying the Sys::SigAction object restores the previous handler
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }

This works as expected, mostly, but there is a serious problem with doing this; serious enough for CERT's secure coding guide to contain an explicit and specific high-severity recommendation against it (and it also happens to violate most other secure coding advice regarding signaling).

Signal handlers (or at least the real, unsafe ones) have a highly restricted set of operations they can safely perform; doing anything that's not allowed means risking segfaults and data loss. This is why we needed "safe" signaling in the first place. By longjumping out of the unsafe/real signal handler (which is what die does), those restrictions are carried over into the rest of the program. That means that anything from that point on can (and at some point probably will) cause segfaults and other bugs.


The way out?

That's the harsh part: sometimes there isn't any easy way out. If a piece of C code doesn't have its own timeout support, there may be no alternative. The real solution is to write blocking/computationally intensive software in such a way that it can handle this more gracefully, for example by using an event loop, but often one has to deal with the tools one has.
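For the specific flock case from the examples, one signal-free approach is to poll with a non-blocking lock until a deadline passes. This is only a sketch: flock_with_timeout is a hypothetical helper, the poll interval is arbitrary, it assumes a POSIX-like system where a held lock is reported as EWOULDBLOCK or EAGAIN, and it does nothing for a C library call that has no non-blocking mode:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use Time::HiRes qw(time sleep);

# Hypothetical helper: try to take a lock for at most $timeout seconds.
# Returns true on success, false on timeout, and dies on any other error.
sub flock_with_timeout {
    my ($fh, $mode, $timeout) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        return 1 if flock($fh, $mode | LOCK_NB);
        # Assumption: "somebody else holds the lock" shows up as one of these.
        die "cannot flock: $!" unless $!{EWOULDBLOCK} || $!{EAGAIN};
        return 0 if time() >= $deadline;
        sleep(0.05);                # poll interval, chosen arbitrarily
    }
}

# Demo: a second handle on the same file competes with the first one.
my ($holder, $filename) = tempfile(UNLINK => 1);
open my $waiter, '+<', $filename or die "Cannot open $filename: $!";

flock($holder, LOCK_EX) or die "cannot flock: $!";
my $while_held = flock_with_timeout($waiter, LOCK_EX, 0.2);

flock($holder, LOCK_UN) or die "cannot unlock: $!";
my $after_release = flock_with_timeout($waiter, LOCK_EX, 0.2);

print "while held: ", ($while_held ? "locked" : "timed out"), "\n";
print "after release: ", ($after_release ? "locked" : "timed out"), "\n";
```

No signal handler is involved, so none of the restrictions above apply; the price is some latency and busy-waiting.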

So, I'm not saying everyone is wrong for using unsafe signal timeouts, but you should be aware of and accept the risks that come with it.

Looking for Ilja Tabachnik

I'm looking for Ilja Tabachnik.

I want to fix his only module on CPAN (POSIX::RT::MQ), but his public email address no longer exists. If I cannot reach him, I will ask the PAUSE admins for permission to take over this module.

Why do you want new major features in core?

I've heard some people complain about 5.12 and 5.14 not adding many new major features. Compared to 5.10 that's certainly true, but is that a bad thing?

Let's be honest, many (most) prominent new features of 5.10 are failures:

  • Smartmatching? I think everyone agrees it is broken.
  • given/when is even worse as it's almost impossible to predict if it will use smartmatching or not.
  • Lexical $_? Mostly a new source of bugs, and the _ prototype is merely a hack to work around lexical $_ issues.
  • MAD? It never reached any usable form.
  • etcetera…

A lot of others are not flawed, but are so uncommon that I haven't seen them being used in any code. UNITCHECK, stacked filetests, no VERSION, the list goes on…

In the end, there are only two new features of 5.10 that I end up using all the time: say and defined-or. These two features have one thing in common: they are small and simple features that make daily programming easier. Likewise my favorite new feature in 5.14 is the /r modifier on s/// and tr///. I don't know how we managed to do without that, it makes so much code so much simpler. I want more of those features.
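A small sketch of those three features together; the variable names and values are made up for illustration:

```perl
use v5.14;      # enables say; /r needs at least 5.14
use warnings;

my %config = (retries => 0);

# defined-or keeps a defined but false value such as 0, unlike ||
my $retries = $config{retries} // 3;
my $timeout = $config{timeout} // 30;

# /r returns the modified copy and leaves the original untouched
my $raw     = "  File::Slurp  ";
my $trimmed = $raw =~ s/^\s+|\s+$//gr;
my $upper   = $trimmed =~ tr/a-z/A-Z/r;

say "retries=$retries timeout=$timeout";
say "trimmed='$trimmed' upper='$upper'";
```

Without /r, each of the last two assignments would need a separate copy followed by an in-place modification.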

On the other hand, there's another group of new features that is just as important, but not nearly as visible: the ones under the hood, or right on top of it. Few people know how $^H and %^H changed in 5.10, but if you're writing pragmas you'll appreciate them. These are expert features that few people will use, but those few use them to write the modules on CPAN that everyone else uses. These features are just as important, if not more so: they lay the foundation for progress by enabling (competitive) evolution on CPAN.
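To give an idea of what %^H makes possible, here is a minimal sketch of a lexically scoped pragma in a single file. My::Pragma and its hint key are hypothetical names; a real pragma would live in its own module and be loaded with use:

```perl
use v5.14;
use warnings;

package My::Pragma {
    # Hypothetical pragma: store a flag in the compile-time hints hash.
    sub import   { $^H{'My::Pragma/enabled'} = 1 }
    sub unimport { $^H{'My::Pragma/enabled'} = 0 }

    # At run time, caller() exposes the hints hash that was in effect
    # where the calling statement was compiled.
    sub enabled {
        my $hints = (caller(0))[10];
        return $hints && $hints->{'My::Pragma/enabled'} ? 1 : 0;
    }
}

my $before = My::Pragma::enabled();
my $inside;
{
    BEGIN { My::Pragma->import }    # roughly what "use My::Pragma" would do
    $inside = My::Pragma::enabled();
}
my $after = My::Pragma::enabled();

say "before=$before inside=$inside after=$after";
```

The flag is visible only to code compiled inside the block, which is what lets a CPAN module behave like a built-in pragma.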

It is important for this progress to happen not in core but on CPAN. Because if modules screw up they can be discarded and we can try to come up with something better (which you can't do with the core). Because modules easily allow an ecosystem of TIMTOWTDI. But most of all because the Perl community is awesome at creating modules.

Few people outside the echo chamber know Stevan's work on a perl MOP, but that may become the most important development in Perl OO since the arrival of Moose (also written by him). Likewise few people are using Zefram's awesome keywords API, but that is what will allow us to do Devel::Declare kind of stuff in a sane way, and may one day open up doors to macros. Few people know of the Unicode improvements by Karl Williamson and others that make Perl hands down the best language for Unicode processing.

We can't always know end users' requirements in advance. That's why big core features should be open-ended. Maybe that doesn't make for spectacular perldeltas, but it does lead to a better end result.