My Favorite Warnings: regexp
'A fair jaw-cracker dwarf-language must be.' -- Samwise Gamgee, The Lord of the Rings, II/iii: "The Ring Goes South", as quoted in regcomp.c, the Perl regular expression compiler.
As you would expect, this category gets you warnings about possibly-problematic regular expression constructions. A couple of specific examples are:
Assuming NOT a POSIX class
- This warning is about things that look kind of like POSIX character classes, but do not parse that way. The full diagnostic gives examples like [[:alnum]] (missing colon) and [[:digit:xyz] (missing right square bracket). These parse like simple character classes ([:[almnu]\] and [:[dgitxyz] respectively), so without the warning you get a hard-to-diagnose bug.
Unescaped left brace in regex is passed through
- Efforts to eliminate unescaped left braces so that they are available for new syntax have been underway since 5.17.0, released May 2012. As I recall, this effort turned out to be much harder than originally anticipated because at least one toolchain external to Perl (autoconf if memory serves) relied on this behavior.
Using /u for
- The /a and /aa regular expression modifiers cause built-in character classes such as \d to match ASCII only. But some regular expression constructions such as \b{...} are explicitly Unicode. Perl interprets these as written, but warns you. Note that \b{...} is an example of the new functionality added by re-purposing curly brackets.
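The first and third diagnostics above can be observed by compiling patterns at run time and trapping whatever Perl warns about. What follows is a minimal sketch, not a definitive recipe: regexp_warnings() is a hypothetical helper written for this illustration, and the sketch assumes a Perl recent enough to have both the POSIX-class diagnostic and \b{wb}.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): compile a pattern at run time
# and return every warning the regular expression compiler emitted.
sub regexp_warnings {
    my ($pattern) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $re = eval { qr/$pattern/ };
    push @caught, "compile failed: $@" unless defined $re;
    return @caught;
}

# A well-formed class, the two malformed classes quoted from the
# diagnostic, and a Unicode-only boundary under the /a modifier
# (written inline as (?a) because the pattern is built from a string).
for my $pattern ( '[[:alnum:]]', '[[:alnum]]', '[[:digit:xyz]', '(?a)\b{wb}' ) {
    my @warnings = regexp_warnings($pattern);
    printf "%-16s %d warning(s)\n", $pattern, scalar @warnings;
    print "    $_" for @warnings;
}

# With the warning silenced, the malformed class quietly behaves as an
# ordinary bracketed class followed by a literal right square bracket.
{
    no warnings 'regexp';
    print "'a]' matches\n"       if 'a]' =~ /^[[:alnum]]$/;
    print "'7' does not match\n" if '7' !~ /^[[:alnum]]$/;
}
```

The last block is the hard-to-diagnose bug in miniature: the pattern compiles, but it matches a string like 'a]' rather than a single alphanumeric character.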
The above list is far from exhaustive. There are diagnostics for superfluous quantifiers (on zero-width assertions) and greediness specifications (on fixed-width items), since regular expressions are already "A fair jaw-cracker" without the unnecessary cruft. In addition, there are diagnostics for invalid or meaningless uses of the /c, /g, and /p modifiers.
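As a rough illustration of the quantifier and greediness diagnostics, the sketch below compiles two patterns at run time and prints any warnings. Both patterns are modeled on the examples perldiag gives for these diagnostics; quantifier_warnings() is a hypothetical helper, and the exact wording of the warnings may vary between Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern at run time and return its warnings.
sub quantifier_warnings {
    my ($pattern) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $re = eval { qr/$pattern/ };
    push @caught, "compile failed: $@" unless defined $re;
    return @caught;
}

# 'a{3}?'         : a greediness modifier on an exact repeat count
# 'abc(?=xyz){3}' : a quantifier applied to a zero-width lookahead
for my $pattern ( 'a{3}?', 'abc(?=xyz){3}' ) {
    print "$pattern\n";
    print "    $_" for quantifier_warnings($pattern);
}
```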
Within the scope of a use re 'strict'; pragma, additional diagnostics are possible. This pragma was the subject of last week's blog, My Favorite Modules: re, which was written as background for this blog entry.
Note that use re 'strict'; is documented as experimental, with the warning that even the interface to the functionality may change. Too bad, because I would kind of like to enable some of the additional diagnostics:
Empty (?) without any modifiers in regex
- This is of note because one of the diagnostics enabled by use warnings 'ambiguous'; recommends the use of this construction as a way of removing the ambiguity. See My Favorite Warnings: ambiguous for details.
"%s" is more clearly written simply as "%s"
- This is about representations of single characters. I imagined from the text of the diagnostic that it was about something like writing \x07 versus \a or \N{ALERT}, but I was unable to get this diagnostic after a grueling 2-3 minutes of playing with it.
Unescaped literal right square brackets and braces
- Makes sense to me. I did not quote the diagnostic because in this context the '%c' that represents the character is too opaque to be helpful.
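For anyone who wants to experiment despite the experimental status, here is a minimal sketch of confining use re 'strict'; to a small lexical scope and comparing the result with an ordinary compile. compile_report() is a hypothetical helper invented for this illustration, the sketch assumes Perl 5.22 or later, and because the pragma is experimental the behavior shown may change.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $pattern with or without re 'strict'
# and report whether it compiled and what Perl had to say about it.
sub compile_report {
    my ( $pattern, $use_strict ) = @_;
    my @messages;
    local $SIG{__WARN__} = sub { push @messages, $_[0] };
    my $re;
    if ($use_strict) {
        # Silence the "experimental" notice, then enable the pragma.
        # Both statements are lexically scoped to this block.
        no warnings 'experimental::re_strict';
        use re 'strict';
        $re = eval { qr/$pattern/ };
    }
    else {
        $re = eval { qr/$pattern/ };
    }
    push @messages, "error: $@" unless defined $re;
    return { compiled => defined $re ? 1 : 0, messages => \@messages };
}

# An unescaped literal right square bracket, with and without 'strict'.
for my $mode ( 0, 1 ) {
    my $report = compile_report( 'a]', $mode );
    printf "strict=%d compiled=%d messages=%d\n",
        $mode, $report->{compiled}, scalar @{ $report->{messages} };
    print "    $_" for @{ $report->{messages} };
}
```

Keeping the pragma inside a block limits the experimental behavior to the patterns compiled there, which seems the prudent way to use an interface that may change.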