What you should know about signal based timeouts

The problem

I think we've all seen code like this example from perlipc:

my $ALARM_EXCEPTION = "alarm clock restart";
eval {
    local $SIG{ALRM} = sub { die $ALARM_EXCEPTION };
    alarm 10;
    flock(FH, 2)  || die "cannot flock: $!";
    alarm 0;
};
alarm 0;
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }

Here, signals are used to put a time limit on some action. However sometimes this doesn't work as wanted. In particular, some C libraries used in XS modules don't honor the deferred signaling resulting in it being ignored until the C function has finished, which is unlikely to be what you want.

Therefore, people resort to unsafe signals

use Sys::SigAction qw( set_sig_handler );
my $ALARM_EXCEPTION = "alarm clock restart";
my $h;
eval {
    $h = set_sig_handler('ALRM', sub { die $ALARM_EXCEPTION }, { });
    alarm 10;
    flock $fh, 2 or die "cannot flock: $!";
    alarm 0;
};
alarm 0;
$SIG{ALRM} = $h;
if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }

This works as expected, mostly, but there is a serious problem with doing this; serious enough to have an explicit and specific high severity advise against it in CERT's secure coding guide (and it also happens to violate most other secure coding advises regarding signaling).

Signal handlers (or at least the real, unsafe ones) have a highly restricted set of operations they can safely to perform, doing anything that's not allowed means risking segfaults and data loss. This is why we needed "safe" signaling in the first place. By longjumping out of the unsafe/real signal hander (which is what die does), those restrictions are continued into the rest of the program. That means that anything from that point on can (and at some point probably will) cause segfaults and other bugs.

Ouch!

The way out?

That's the harsh part, sometimes there isn't any easy way out. If a piece of C code doesn't have it's own timeout support, there may be no alternative. The real solution is to write blocking/computationally intensive software in such a way that it can handle this more graciously, for example by using an event loop, but often one has to deal with the tools one has.

So, I'm not saying everyone is wrong for using unsafe signal timeouts, but you should be aware of and accept the risks that come with it.

Leave a comment

About Leon Timmermans

user-pic