Gaming FAIL

Many years ago, when I was... roughly 15 (I think), I met Theodore Ts'o (one of the first hardcore Linux kernel hackers, since version 0.90, I believe) at a Linux event IBM organized in Israel. I should note that he is a very nice person.

After the event, we got to talk a bit. We talked about our favorite games. Mine was "avoiding segfaults". This was back when I was programming in C.

Today I wrote the following regex:
qr/^([\w|\.]+)\s+(?(\d+)\s+)?(\w+)\s+(?:(\d+)\s+)?([\w|\d|\.]+)$/;

Then I got a segmentation fault.

Can you spot the error?
Here's a hint: it is missing a colon (:).
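A quick way to see what the colon changes is to compile both patterns from strings inside `eval` and look at `$@`. This is a sketch: `try_compile` is a helper name made up here, and the sample line is invented. On a newer perl the broken pattern should die with a regex compile error instead of crashing; on the 5.10.0 build above it is the pattern that segfaulted (though one commenter could not reproduce that), so running it there may crash instead.

```perl
use strict;
use warnings;

# The pattern from the post, and the same pattern with the missing colon
# added. Single quotes keep the backslashes intact for qr//.
my $broken = '^([\w|\.]+)\s+(?(\d+)\s+)?(\w+)\s+(?:(\d+)\s+)?([\w|\d|\.]+)$';
my $fixed  = '^([\w|\.]+)\s+(?:(\d+)\s+)?(\w+)\s+(?:(\d+)\s+)?([\w|\d|\.]+)$';

# Hypothetical helper: compile at run time so a bad pattern becomes a
# catchable error. Returns the compiled regex (or undef) and the error.
sub try_compile {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return ($re, $@);
}

my ($broken_re, $broken_err) = try_compile($broken);
if ($broken_re) { print "broken: compiled\n" } else { print "broken: $broken_err" }

my ($fixed_re, $fixed_err) = try_compile($fixed);
die "fixed pattern failed to compile: $fixed_err" unless $fixed_re;

# Made-up sample line with both optional numeric fields present.
my @fields = ('example.com 3600 MX 10 mail.example.com' =~ $fixed_re);
print join(' | ', map { defined $_ ? $_ : '(none)' } @fields), "\n";
```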

This is perl, v5.10.0 built for i486-linux-gnu-thread-multi

11 Comments

"Can you spot the error?"

Yup, trying to do too much in a single regular expression. That gibberish you wrote cannot be maintained and is therefore useless. Try breaking it into parts with meaningful names.

This regex is really not bad at all. The trick is that it generates a segfault, which is not terribly helpful when you're trying to spot the problem.

No lookaheads/lookbehinds, only basic character classes like word, digit, and space, no executable code, no recursion, no nested captures, and it's only 53 chars long. Breaking it into smaller pieces won't make it any easier to maintain if you can't read it the way it is now; that's purely a matter of style.
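To illustrate the style point, here is a sketch of the same pattern (colon fixed) kept as one regex but laid out with the `/x` modifier, which ignores unescaped whitespace and allows `#` comments. The field descriptions in the comments are guesses, and `$line_re` is a made-up name. The character classes are copied unchanged; note that `|` inside `[...]` is a literal pipe, not alternation, so those classes also accept `|` characters, which may not be what was intended.

```perl
use strict;
use warnings;

my $line_re = qr/
    ^
    ( [\w|\.]+ )         # 1: first field (word chars, '|' and '.')
    \s+
    (?: ( \d+ ) \s+ )?   # 2: optional number
    ( \w+ )              # 3: a single word
    \s+
    (?: ( \d+ ) \s+ )?   # 4: optional number
    ( [\w|\d|\.]+ )      # 5: last field
    $
/x;

# '|' inside [...] is literal, so this (probably unintended) line matches too.
print "pipe accepted\n" if 'a|b X c' =~ $line_re;
```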

He's got five groups: three of them are capture and not optional, and two of them are optional and (should be) capture. It looks like the first optional one is missing the colon, so it's actually capturing even though it's optional, which is probably the source of the segfault.


s/(should be) capture/(should be) non-capture/ ;)
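Strictly speaking, without the colon Perl does not read that group as a capture: `(?(` starts a conditional, `(?(condition)yes-pattern|no-pattern)`, and the condition must be something like a group number such as `(1)`, a group name such as `(<name>)`, or a lookaround assertion. `(\d+)` is none of those, so it is not a valid condition. A minimal valid conditional, in the spirit of the optional-parentheses example in perlre (`$maybe_parens` is a made-up name):

```perl
use strict;
use warnings;

# Optionally match an opening paren as group 1; then (?(1)\)) means
# "if group 1 participated in the match, require a closing paren here".
my $maybe_parens = qr/^(\()?\d+(?(1)\))$/;

for my $s ('42', '(42)', '(42', '42)') {
    printf "%-5s %s\n", $s, ($s =~ $maybe_parens ? 'match' : 'no match');
}
```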

Can you try again with 5.10.1, or better yet a 5.11.5 or blead perl? Just to make sure the bug isn't going to be in 5.12 (or 5.14).

my $name_re         = qr/( [\w|\.]+ )/x;
my $space_re        = qr/\s+/;
my $opt_ttl_re      = qr/(?: (\d+) \s+ )?/x;
my $type_re         = qr/(\w+)/;
my $opt_priority_re = qr/(?: (\d+) \s+ )?/x;
my $content_re      = qr/( [\w|\d|\.]+ )/x;

my $line_re = qr/
^
$name_re
$space_re
$opt_ttl_re
$type_re
$space_re
$opt_priority_re
$content_re
$
/x;

$space_re is overdoing it a bit.

"a bit"? The whole comment is pretty insane, not least because it obscures capture group numbers.

How are you running this exactly? I can't get it to segfault on 5.10.0, 5.10.1 or blead.
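On the objection above that the split-up version obscures capture group numbers: perl 5.10.0 added named captures, `(?<name>...)` in the pattern and the `%+` hash after a match, so each piece can carry its own name. This is a sketch reusing the piece names from the comment; the capture names (`name`, `ttl`, `type`, `priority`, `content`) are only guesses derived from those variable names, and the sample line is made up.

```perl
use strict;
use warnings;

# Each piece names its own capture, so the composed regex no longer
# relies on counting parentheses. Character classes copied as-is.
my $name_re         = qr/(?<name> [\w|\.]+ )/x;
my $opt_ttl_re      = qr/(?: (?<ttl> \d+ ) \s+ )?/x;
my $type_re         = qr/(?<type> \w+ )/x;
my $opt_priority_re = qr/(?: (?<priority> \d+ ) \s+ )?/x;
my $content_re      = qr/(?<content> [\w|\d|\.]+ )/x;

my $line_re = qr/
    ^ $name_re \s+ $opt_ttl_re $type_re \s+ $opt_priority_re $content_re $
/x;

if ('example.com 3600 MX 10 mail.example.com' =~ $line_re) {
    # %+ only has keys for named groups that actually captured.
    for my $field (qw(name ttl type priority content)) {
        printf "%-8s %s\n", $field, $+{$field} // '(none)';
    }
}
```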


About Sawyer X

Gots to do the bloggingz