Why you don't need File::Slurp…

#! /usr/bin/env perl

use strict;
use warnings;
use Benchmark 'cmpthese';
use File::Slurp 'read_file';

my $filename = shift or die "No argument given";
my $count = shift || 10;

cmpthese($count, {
    'Unix'  => sub {
        open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!";
        # read returns 0 for an empty file and undef on error, so test definedness
        defined(read $fh, my $buffer, -s $fh) or die "Couldn't read $filename: $!";
    },
    'Slurp' => sub { read_file($filename, buffer_ref => \my $buffer, binmode => ':raw') },
});

For large files, the plain open/read version is just as fast as File::Slurp:

        Rate Slurp  Unix
Slurp 2.28/s    --   -0%
Unix  2.29/s    0%    --

For small files, it's actually significantly faster:

          Rate Slurp  Unix
Slurp  51020/s    --  -66%
Unix  151515/s  197%    --

So why use File::Slurp, when a two-liner will actually perform better?
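For reference, the two-liner from the benchmark can be wrapped in a small helper. This is only a minimal sketch: the name slurp_raw is hypothetical, and it assumes a regular file whose size -s can report up front and which a single read call returns in full (so it is not suitable for pipes or sockets).

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole regular file into a scalar as raw bytes.
sub slurp_raw {
    my ($filename) = @_;
    open my $fh, '<:unix', $filename or die "Couldn't open $filename: $!";
    # read returns undef on error and 0 for an empty file
    defined(read $fh, my $buffer, -s $fh) or die "Couldn't read $filename: $!";
    return $buffer;
}
```

Calling my $content = slurp_raw($filename) then does the same work as the 'Unix' case in the benchmark.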

2 weeks of perl

It all started with the Cluj.pm summer meeting on the 9th of August. I happened to be in the area, so I popped in. Cluj.pm is a refreshingly young Perl Mongers group (I might even have been older than the average age there, which is a first for me). At first I didn't know anyone other than the guest speaker, Mark Keating, but after my presentation lots of people approached me and I had a brilliant evening.

A short week later I flew to Germany for the Perl Reunification Summit in the town of Perl. Like Schwern, I arrived a day earlier than most, so I had a calm start to the meetup. It was mostly a gathering of faces familiar to me, though there were a significant number of people I hadn't really spoken to before, especially the Perl 6 guys; -Ofun attracts awesome people. I spent most of the PRS talking to people and doing a little coding (both related and unrelated). It was a very enlightening meetup.

Lastly, there was YAPC::EU. Despite the sometimes unbearable heat, it was awesome. At some points it seemed a bit less organized than my previous YAPCs, but that may also be me noticing more of what's going on. I spent most of my time in the hallway track, which extended into the pub track, and I spent enough time discussing (and occasionally ranting) that it's a miracle that I still have a voice left. In between I found enough time to attend some talks; interestingly, I attended most of them on the day I gave one myself. After doing threads last year I could only top it with signals this year. It will be a challenge to come up with a crazier topic; I think I'll have to look in a vastly different direction (I have ideas already). After a full week of conferencing, I was relieved to be going home, though.

So in all I met Mark Keating in three different places in two weeks' time; I'd almost accuse him of stalking me!

What you should know about signal based timeouts

The problem

I think we've all seen code like this example from perlipc:

my $ALARM_EXCEPTION = "alarm clock restart";
eval {
    local $SIG{ALRM} = sub { die $ALARM_EXCEPTION };
    alarm 10;
    flock(FH, 2)  || die "cannot flock: $!";
    alarm 0;
};
alarm 0;
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }

Here, a signal is used to put a time limit on some action. However, sometimes this doesn't work as intended. In particular, some C libraries used in XS modules never give perl a chance to dispatch the deferred ("safe") signal, so the handler doesn't run until the C function has finished, which is unlikely to be what you want.

Therefore, people resort to unsafe signals:

use Sys::SigAction qw( set_sig_handler );
my $ALARM_EXCEPTION = "alarm clock restart";
my $h;
eval {
    $h = set_sig_handler('ALRM', sub { die $ALARM_EXCEPTION }, { });
    alarm 10;
    flock $fh, 2 or die "cannot flock: $!";
    alarm 0;
};
alarm 0;
undef $h;    # destroying the Sys::SigAction object restores the previous handler
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }

This mostly works as expected, but there is a serious problem with doing this; serious enough that CERT's secure coding guide carries an explicit, specific, high-severity recommendation against it (and it also happens to violate most other secure coding advice regarding signal handling).

Signal handlers (or at least the real, unsafe ones) have a highly restricted set of operations they can safely perform; doing anything that's not allowed means risking segfaults and data loss. This is why we needed "safe" signaling in the first place. When you longjump out of the real, unsafe signal handler (which is what die does), those restrictions carry over into the rest of the program. That means that anything from that point on can (and at some point probably will) cause segfaults and other bugs.

Ouch!

The way out?

That's the harsh part: sometimes there isn't any easy way out. If a piece of C code doesn't have its own timeout support, there may be no alternative. The real solution is to write blocking or computationally intensive software in such a way that it can handle this more gracefully, for example by using an event loop, but often one has to deal with the tools one has.
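For the flock example specifically, one signal-free alternative is to request the lock in non-blocking mode and poll until a deadline. The following is a rough sketch rather than a general answer: lock_with_timeout is a hypothetical helper (not part of any module), and it assumes that a short polling interval is acceptable and that the platform reports a held lock as EWOULDBLOCK or EAGAIN.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(time sleep);

# Hypothetical helper: try to take an exclusive lock without signals.
# Returns 1 on success and 0 on timeout; dies on any other error.
sub lock_with_timeout {
    my ($fh, $timeout, $interval) = @_;
    $interval //= 0.05;                     # polling interval in seconds (assumed)
    my $deadline = time() + $timeout;
    until (flock($fh, LOCK_EX | LOCK_NB)) {
        # Anything other than "someone else holds the lock" is a real error
        die "cannot flock: $!" unless $!{EWOULDBLOCK} || $!{EAGAIN};
        return 0 if time() >= $deadline;
        sleep($interval);
    }
    return 1;
}
```

A caller would write something like lock_with_timeout($fh, 10) or die "timed out". The trade-off is that polling adds latency and offers no fairness between waiters, and it does nothing for C code that offers no non-blocking mode at all.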

So, I'm not saying everyone is wrong for using unsafe signal timeouts, but you should be aware of and accept the risks that come with them.

Looking for Ilja Tabachnik

I'm looking for Ilja Tabachnik.

I want to fix his only module on CPAN (POSIX::RT::MQ), but his public email address no longer exists. If I cannot reach him, I will ask the PAUSE admins for permission to take over this module.

Why do you want new major features in core?

I've heard some people complain about 5.12 and 5.14 not adding many new major features. Compared to 5.10 that's certainly true, but is that a bad thing?

Let's be honest, many (most) prominent new features of 5.10 are failures:

  • Smartmatching? I think everyone agrees it is broken.
  • given/when is even worse as it's almost impossible to predict if it will use smartmatching or not.
  • Lexical $_? Mostly a new source of bugs, and the _ prototype is merely a hack to work around lexical $_ issues.
  • MAD? It never reached any usable form.
  • etcetera…

A lot of others are not flawed, but are so uncommon that I haven't seen them being used in any code. UNITCHECK, stacked filetests, no VERSION, the list goes on…

In the end, there are only two new features of 5.10 that I end up using all the time: say and defined-or. These two features have one thing in common: they are small and simple features that make daily programming easier. Likewise, my favorite new feature in 5.14 is the /r modifier on s/// and tr///. I don't know how we managed to do without it; it makes so much code so much simpler. I want more of those features.
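To illustrate how small these features are, here is a short sketch that uses all three. The %config hash and its contents are made up for the example.

```perl
use strict;
use warnings;
use v5.14;    # enables say; // needs 5.10 and /r needs 5.14

my %config = (name => 'World');    # hypothetical data for illustration

# defined-or: fall back only when the value is undefined (0 and '' are kept)
my $greeting = $config{greeting} // 'Hello';

# /r returns the modified copy and leaves the original untouched
my $shouted = "$greeting, $config{name}!" =~ tr/a-z/A-Z/r;
my $trimmed = "  padded  " =~ s/^\s+|\s+$//gr;

say $shouted;        # HELLO, WORLD!
say "[$trimmed]";    # [padded]
```

Without /r, each of the last two assignments would need a temporary copy and a separate statement to modify it.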

On the other hand, there's another group of new features that is just as important, but not nearly as visible: the changes under the hood, or right on top of it. Few people know how $^H and %^H changed in 5.10, but if you're writing pragmas you'll appreciate them. These are expert features that few people will use, but those few use them to write the modules on CPAN that everyone else uses. These features are just as important, if not more so: they lay the foundation for progress by enabling (competitive) evolution on CPAN.
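As an illustration of what %^H makes possible, here is a minimal sketch of a lexically scoped pragma, following the pattern described in perlpragma. The pragma name shout and the key 'shout/enabled' are hypothetical, and the package is defined inline (with %INC set by hand) only so that the example fits in one file.

```perl
use strict;
use warnings;

package shout;    # hypothetical pragma name, for illustration only

# Called at compile time by "use shout" / "no shout". %^H is lexically
# scoped, so the setting vanishes at the end of the enclosing block.
sub import   { $^H{'shout/enabled'} = 1 }
sub unimport { $^H{'shout/enabled'} = 0 }

# At run time, element 10 of caller() is the hint hash that the calling
# code was compiled with.
sub enabled {
    my $hints = (caller(0))[10];
    return $hints && $hints->{'shout/enabled'} ? 1 : 0;
}

package main;

BEGIN { $INC{'shout.pm'} = __FILE__ }    # pretend shout.pm is already loaded

sub outside { shout::enabled() }
{
    use shout;
    sub inside { shout::enabled() }
}

print outside(), inside(), "\n";    # prints "01"
```

The pragma's effect is confined to the block containing use shout, which is exactly what a module on CPAN needs in order to offer behaviour that can be switched on per scope.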

It is important for this progress to happen not in core but on CPAN. Because if modules screw up they can be discarded and we can try to come up with something better (which you can't do with the core). Because modules easily allow an ecosystem of TIMTOWTDI. But most of all because the Perl community is awesome at creating modules.

Few people outside the echo chamber follow or even know about Stevan's work on a Perl MOP, but that may become the most important development in Perl OO since the arrival of Moose (also written by him). Likewise, few people are using Zefram's awesome keywords API, but that is what will allow us to do Devel::Declare kind of stuff in a sane way, and may one day open up doors to macros. Few people know of the Unicode improvements by Karl Williamson and others that make Perl hands down the best language for Unicode processing.

We can't always know end users' requirements in advance. That's why big core features should be open-ended. Maybe that doesn't make for spectacular perldeltas, but it does lead to a better end result.