Regex /m modifier bug in Perl 5.8.8 and older
It’s 2016, but the CPAN Pull Request Challenge continues. Motivated by my 100% in 2015, I signed up for the second year as well. Unfortunately, I didn’t have time to blog about my January PR, but it would have been more about Git than Perl anyway.
My March assignment was Plack::Middleware::ReverseProxyPath. The module had several CPAN Testers failures, and the matrix showed Perl 5.8.8 all red on both Linux and Darwin, so I decided to have a look at that.
Most of the failures were just variations of the same problem:
# Failed test at t/basics.t line 71.
# 'psgi.multiprocess:
# SERVER_NAME: localhost
# SCRIPT_NAME: /env_inner
# PATH_INFO: /path
# CONTENT_LENGTH: 0
# REQUEST_METHOD: GET
# psgi.multithread:
# REMOTE_PORT: 3315
# QUERY_STRING:
# SERVER_PORT: 80
# REMOTE_ADDR: 127.0.0.1
# SERVER_PROTOCOL: HTTP/1.1
# psgi.streaming: 1
# REQUEST_URI: /env_inner/path
# psgi.errors: *main::STDERR
# psgi.nonblocking:
# psgi.version: ARRAY(0x2955640)
# REMOTE_HOST: localhost
# psgi.url_scheme: http
# psgi.run_once: 1
# HTTP_HOST: localhost
# psgi.input: GLOB(0x28b6ac0)
# '
# doesn't match '(?mx-is: ^ REQUEST_URI: \s /env_inner/path $ )'
Now I needed to reproduce the problem. I spent half an hour watching Perlbrew install 5.8.8 on my machine. I had to turn off local::lib in the terminal in order to run it with the correct modules (see below), then I spent another 30 minutes watching cpan install all the dependencies (manual intervention was sometimes needed for missing dependencies, probably modules that are now in core, but weren’t back then).
Here’s how you can turn off local::lib easily in bash:
unset ${!PERL*}
Once all the dependencies were installed, I tried to install the module itself—and got the expected failure. I created a simplified version of the code to make it possible to run it faster over and over again, and verified it works in 5.20.1:
#!/usr/bin/perl
use warnings;
use strict;
my $s = 'psgi.multiprocess:
psgi.streaming: 1
REQUEST_URI: /env_inner/path
psgi.errors: *main::STDERR
';
my $r = qr{(?mx-is: ^ REQUEST_URI: \s /env_inner/path $ )};
print $s =~ $r;
It worked smoothly. To my surprise, though, the above code printed 1 in 5.8.8 as well. I looked into the test source and noticed it uses the qr{...}xm syntax, so I modified the code accordingly and finally got the failure in 5.8.8, while 5.20.1 was still passing.
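To compare the two spellings side by side, something like the following minimal sketch can be run under each perl. The names %variant and %matched are only illustrative, and the sample string is a shortened stand-in for the real test output. According to the observations above, the "trailing" variant is the one that fails on 5.8.8.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Shortened sample of the test output; only the REQUEST_URI line matters.
my $s = join "\n",
    'psgi.streaming: 1',
    'REQUEST_URI: /env_inner/path',
    'psgi.errors: *main::STDERR',
    '';

# The same pattern, written with inline and with trailing modifiers.
my %variant = (
    inline   => qr{(?mx-is: ^ REQUEST_URI: \s /env_inner/path $ )},
    trailing => qr{ ^ REQUEST_URI: \s /env_inner/path $ }xm,
);

# Report the interpreter version, the result, and the stringified pattern.
my %matched;
for my $name (sort keys %variant) {
    $matched{$name} = $s =~ $variant{$name} ? 1 : 0;
    printf "%s  %-8s  %-8s  %s\n", $], $name,
        ($matched{$name} ? 'match' : 'NO MATCH'), $variant{$name};
}
```

Printing the stringified pattern makes it easy to see how each perl represents the modifiers internally.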
Maybe it’s some kind of whitespace issue, I thought to myself, and added \s* into the regex:
my $r = qr{(?mx-is: ^ REQUEST_URI: \s /env_inner/path \s* $ )};
And the code started to pass in both versions! However, a hexdump of the string showed no whitespace at that position. Indeed, after changing the asterisk to a plus, the code failed in both Perls.
I tried some simple math: what could be the result of \s* - \s+? It should match whitespace repeated zero or more times, but it shouldn’t match whitespace repeated at least once. The answer was \s{0}, and it satisfied 5.8.8. It worked in 5.20.1 too, but I didn’t want to make unnecessary changes, so the final fix of the test was:
my $empty = q();
$empty = qr/\s{0}/ if $] <= 5.008008; # Workaround for a 5.8.8 bug.
# ...
my @tests = (
    (GET "/env_inner/path") => sub {
        like $_->content, qr{ ^ SCRIPT_NAME: \s /env_inner $empty $ }xm;
        like $_->content, qr{ ^ PATH_INFO: \s /path $empty $ }xm;
        like $_->content, qr{ ^ REQUEST_URI: \s /env_inner/path $empty $ }xm;
    },
# ...
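The two ideas in the fix can be checked in isolation. The sketch below first shows that, when anchored, \s{0} accepts exactly the strings that \s* accepts and \s+ rejects, and then applies the same $] guard to a single pattern. The accepts helper and the $content string are made up for illustration; the real test reads the body via $_->content.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Which anchored quantifiers accept a given string?
sub accepts {
    my ($str) = @_;
    return +{
        star => ($str =~ /\A\s*\z/   ? 1 : 0),   # zero or more
        plus => ($str =~ /\A\s+\z/   ? 1 : 0),   # one or more
        zero => ($str =~ /\A\s{0}\z/ ? 1 : 0),   # exactly zero
    };
}

for my $str ('', ' ', "\t ") {
    my $r = accepts($str);
    printf "length %d: star=%d plus=%d zero=%d\n",
        length($str), @{$r}{qw(star plus zero)};
}

# Same guard as in the fix: only old interpreters get the extra atom.
my $empty = q();
$empty = qr/\s{0}/ if $] <= 5.008008;

# Stand-in for the response body the real test reads via $_->content.
my $content = "SCRIPT_NAME: /env_inner\nPATH_INFO: /path\n";
my $re      = qr{ ^ PATH_INFO: \s /path $empty $ }xm;

printf "perl %s: %s\n", $], ($content =~ $re ? 'match' : 'NO MATCH');
```

Only the empty string gets star=1, plus=0, zero=1. On newer perls $empty stays an empty string, so the interpolated pattern is the same as it was before the workaround.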
In case the author didn’t like the change, I also submitted a simple PR to replace a deprecated Dist::Zilla plugin with the recommended new one.
One tip for building perl on modern hardware: pass it a parallel make command (use those cores!).
Example:
perlbrew install perl-5.23.8 -j8
That passes -j8 to make when building perl, resulting in a much faster build and less time spent watching perlbrew.
The problem is that qr{} has a bug in older perls in that it does not propagate the modifiers when they are set outside (e.g. mxs in qr{/whatever}mxs). See here about this. My personal "mental discipline" is to never use regexp modifiers outside, but always set them inside through a non-capturing group, e.g. qr{(?mxs:/whatever)}. In this way I don't have to worry about special tricks in 5.8.x.
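A minimal sketch of that discipline, with a made-up pattern and input: the same regex is written once with the modifiers outside and once inside a non-capturing group. On a current perl both should behave identically; the claim above is that only the second form is safe on 5.8.x.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up input: the pattern has to cross the newline between the two lines.
my $text = "header\nstart\nend\nfooter\n";

# Modifiers set outside the pattern.
my $outside = qr{ ^ start . end $ }mxs;

# The same modifiers carried inside a non-capturing group.
my $inside = qr{(?mxs: ^ start . end $ )};

for my $pair ([outside => $outside], [inside => $inside]) {
    my ($name, $re) = @$pair;
    printf "%-7s on perl %s: %s\n", $name, $],
        ($text =~ $re ? 'match' : 'NO MATCH');
}
```

Here /m lets ^ and $ match at the inner line boundaries, /s lets the dot match the newline, and /x allows the spaces used for readability.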