Native Variable-Length Lookbehind

No, I'm not talking about Dr. Regex's emulated variable-length lookbehinds, which frankly make my head hurt. Beginning with Perl 5.29.9, Perl has honest-to-heaven, really truly variable-length lookbehinds. (5.29.9 is a development release; the first stable release to carry the feature is 5.30.0.)

Now, there is at least one restriction: no lookbehind assertion can match more than 255 characters. This limit has been around, as nearly as I can tell, ever since lookaround assertions were introduced in 5.005, but it has been only lightly documented until now. This restriction means you cannot use the unbounded quantifiers * or +. But bracketed quantifiers with an explicit upper bound are OK, as is ?.
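To see where the line falls, here is a small sketch, assuming Perl 5.30 or later. The helper compiles() is my own invention for illustration, not anything Perl provides; it just compiles a pattern at run time inside an eval and reports whether perl accepted it.

```perl
use 5.030;                        # needs 5.30+; also turns on strict and say
use warnings;
no warnings 'experimental::vlb';  # quiet the "experimental" warning

# Hypothetical helper: true if perl will compile the pattern, false if not.
sub compiles {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };   # a compile failure dies; eval traps it
    return defined $re ? 1 : 0;
}

for my $pattern (
    '(?<=fo?)bar',         # ? is bounded: lookbehind is 1 or 2 characters
    '(?<=fo{2,20})bar',    # explicit upper bound, well under 255
    '(?<=fo*)bar',         # unbounded
    '(?<=fo+)bar',         # unbounded
    '(?<=fo{1,300})bar',   # bounded, but could run past 255 characters
) {
    say sprintf '%-20s %s', $pattern,
        compiles($pattern) ? 'compiles' : 'rejected';
}
```

On a 5.30 perl I would expect the first two to compile and the last three to be rejected. Note that the last one uses a bracketed quantifier and still fails, because it is the 255-character ceiling that matters, not the style of quantifier.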

As Perl tracks Unicode's case-folding rules, variable-length lookbehinds are becoming increasingly hard to avoid, and can crop up in non-obvious places. Witness Perl Porters thread 245323 and its associated RT ticket.

What happened here is that Unicode decided that (e.g.) /ss/i should match the German sharp s (ß), a single character. This is not the only example -- ligatures are treated the same way. So a regular expression with no quantifiers at all suddenly becomes variable-length simply by making it case-blind. Discussion in the RT ticket seemed to be leaning toward special-casing the problem characters, but it was the general case that got released.
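A short sketch of the effect, again assuming Perl 5.30 or later (which also gives us Unicode string semantics via the feature bundle). The sub names folds_to() and e_after_ss() are hypothetical, made up for this example. The first shows the same quantifier-free pattern matching text of two different lengths; the second puts that pattern inside a lookbehind, which is the case that used to refuse to compile.

```perl
use 5.030;                        # needs 5.30+; also turns on strict and say
use warnings;
no warnings 'experimental::vlb';  # quiet the "experimental" warning

my $sharp_s = "\N{LATIN SMALL LETTER SHARP S}";   # one character
my $fi      = "\N{LATIN SMALL LIGATURE FI}";      # one character

# Hypothetical helper: does the whole text match the pattern, ignoring case?
sub folds_to {
    my ($text, $pattern) = @_;
    return $text =~ /\A$pattern\z/i ? 1 : 0;
}

# Hypothetical helper: is there an "e" preceded by something that folds to "ss"?
sub e_after_ss {
    my ($text) = @_;
    return $text =~ /(?<=ss)e/i ? 1 : 0;
}

say 'two-character "ss" vs /ss/i: ', folds_to('ss',     'ss');
say 'one-character sharp s vs /ss/i: ', folds_to($sharp_s, 'ss');
say 'one-character fi ligature vs /fi/i: ', folds_to($fi,  'fi');

# The lookbehind has no quantifier, yet it may need to look back
# either one character or two.
say 'Strasse: ',             e_after_ss('Strasse');
say 'Stra + sharp s + e: ',  e_after_ss("Stra${sharp_s}e");
```

I print labels rather than the characters themselves to sidestep output-encoding warnings.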

So now you can match things like /(?<=fo{2,20})bar/. It's still marked experimental, though, so using it raises a warning in the experimental::vlb category unless you turn that warning off.
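Here is that pattern at work, as a minimal sketch assuming Perl 5.30 or later. The sub name bar_after_foo() is just something I made up for the example.

```perl
use 5.030;                        # needs 5.30+; also turns on strict and say
use warnings;
no warnings 'experimental::vlb';  # quiet the "experimental" warning

# Hypothetical helper: is there a "bar" preceded by "f" and 2 to 20 "o"s?
sub bar_after_foo {
    my ($text) = @_;
    return $text =~ /(?<=fo{2,20})bar/ ? 1 : 0;
}

for my $text (qw(fobar foobar fooooobar)) {
    say "$text: ", bar_after_foo($text) ? 'match' : 'no match';
}

# The lookbehind is an assertion, so it is not part of the matched text.
if ('foobar' =~ /(?<=fo{2,20})bar/) {
    say "matched text: $&";   # prints "matched text: bar"
}
```

Only "fobar" fails to match: it has a single "o", and the lookbehind wants at least two.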

About Tom Wyant

I blog about Perl.