My Favorite Warnings: regexp
'A fair jaw-cracker dwarf-language must be.' -- Samwise Gamgee, The Lord of the Rings, II/iii: "The Ring Goes South", as quoted in regcomp.c, the Perl regular expression compiler.
As you would expect, this category gets you warnings about possibly-problematic regular expression constructions. A couple specific examples are:
Assuming NOT a POSIX class... - This warning is about things that look kind of like POSIX character classes, but do not parse that way. The full diagnostic gives examples like [[:alnum]] (missing colon) and [[:digit:xyz] (missing right square bracket). These parse like simple character classes ([:[almnu]\] and [:[dgitxyz] respectively), so without the warning you get a hard-to-diagnose bug.
Unescaped left brace in regex is passed through... - Efforts to eliminate unescaped left braces so that they are available for new syntax have been underway since 5.17.0, released May 2012. As I recall, this effort turned out to be much harder than originally anticipated because at least one toolchain external to Perl (autoconf, if memory serves) relied on this behavior.
Using /u for... - The /a and /aa regular expression modifiers cause built-in character classes such as \d to match ASCII only. But some regular expression constructions such as \b{...} are explicitly Unicode. Perl interprets these as written, but warns you. Note that \b{...} is an example of the new functionality added by re-purposing curly brackets.
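To make the first item concrete, here is a minimal sketch that compiles the look-alike pattern at run time, collects whatever warnings the compile produces, and checks what the pattern really matches. The helper name regexp_warnings_for and the sample strings are made up for illustration. The exact warning text (and whether you get one at all) depends on the Perl version, so the sketch prints it rather than assuming it.

```perl
use strict;
use warnings;    # includes the 'regexp' category

# Hypothetical helper: compile a pattern at run time and return the
# compiled regexp followed by any warnings raised while compiling it.
sub regexp_warnings_for {
    my ($pattern) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    my $re = qr/$pattern/;
    return ($re, @caught);
}

my ($lookalike, @lookalike_warnings) = regexp_warnings_for('^[[:alnum]]$');
my ($intended)                       = regexp_warnings_for('^[[:alnum:]]$');

# The look-alike is an ordinary class (any one of "[", ":", "a", "l",
# "n", "u", "m") followed by a literal "]".
print "'a]' matches the look-alike\n"       if 'a]' =~ $lookalike;
print "'b' does not match the look-alike\n" if 'b'  !~ $lookalike;
print "'b' matches the real POSIX class\n"  if 'b'  =~ $intended;

# Show whatever this Perl reported for the look-alike.
print "warning: $_" for @lookalike_warnings;
```

The matching behavior is the point: the malformed pattern compiles and quietly matches something quite different from what was meant.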
The above list is far from exhaustive. There are diagnostics for superfluous quantifiers (on zero-width assertions) and greediness specifications (on fixed-width items), since regular expressions are already "A fair jaw-cracker" without the unnecessary cruft. In addition, there are diagnostics for invalid or meaningless uses of the /c, /g, and /p modifiers.
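As a rough illustration of the "unnecessary cruft" point, the sketch below pairs two patterns of the kind perldiag shows for the greediness diagnostic with their plainer spellings, and checks that each pair matches the same text. The subject string and variable names are hypothetical. The warning count is printed rather than assumed, since it can differ between Perl releases.

```perl
use strict;
use warnings;

# Each pair: a spelling with a redundant greediness marker, then the
# plain spelling that should behave identically.
my @pairs = (
    [ 'a{3}?', 'a{3}' ],    # non-greedy marker on a fixed count
    [ 'b{1}+', 'b{1}' ],    # possessive marker on a fixed count
);

my $subject = 'aaaab';      # hypothetical sample text
my @report;

for my $pair (@pairs) {
    my ($crufty, $plain) = @$pair;
    my (@caught, $got_crufty, $got_plain);
    {
        # Patterns are interpolated, so they compile (and warn) at run time.
        local $SIG{__WARN__} = sub { push @caught, @_ };
        ($got_crufty) = $subject =~ /($crufty)/;
        ($got_plain)  = $subject =~ /($plain)/;
    }
    push @report, [ $crufty, $got_crufty, $got_plain, scalar @caught ];
    printf "%-6s matched '%s', plain form matched '%s', %d warning(s)\n",
        $crufty, $got_crufty, $got_plain, scalar @caught;
}
```

Since the marker changes nothing about what is matched, dropping it is always safe for patterns like these.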
Within the scope of a use re 'strict'; pragma, additional diagnostics are possible. This pragma was the subject of last week's blog, My Favorite Modules: re, which was written as background for this blog entry.
Note that use re 'strict'; is documented as experimental, with the warning that even the interface to the functionality may change. Too bad, because I would kind of like to enable some of the additional diagnostics:
Empty (?) without any modifiers in regex... - This is of note because one of the diagnostics enabled by use warnings 'ambiguous'; recommends the use of this construction as a way of removing the ambiguity. See My Favorite Warnings: ambiguous for details.
"%s" is more clearly written simply as "%s"... - This is about representations of single characters. I imagined from the text of the diagnostic that it was about something like writing \x07 versus \a or \N{ALERT}, but I was unable to get this diagnostic after a grueling 2-3 minutes of playing with it.
Unescaped literal right square brackets and braces - Makes sense to me. I did not quote the diagnostic because in this context the '%c' that represents the character is too opaque to be helpful.
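The last two items are about spelling rather than meaning, which a small sketch can show: the three spellings of U+0007 match the same character, and right brackets and braces match literally whether escaped or not. The final part probes use re 'strict' inside a string eval, so that a Perl without it (before 5.22) or a change in the experimental interface only makes the probe fail instead of breaking the script. All names and sample strings are hypothetical, \N{ALERT} assumes Perl 5.16 or later, and what the strict probe reports is printed rather than assumed.

```perl
use strict;
use warnings;

# Three spellings of the same single character, U+0007.
my $bel      = "\x07";
my %spelling = (
    '\x07'      => qr/^\x07$/,
    '\a'        => qr/^\a$/,
    '\N{ALERT}' => qr/^\N{ALERT}$/,
);
my @non_matching = grep { $bel !~ $spelling{$_} } sort keys %spelling;
print "all three spellings match U+0007\n" unless @non_matching;

# Outside strict mode, a stray "]" or "}" is just a literal character.
my $escaped_ok   = 'a]' =~ /^a\]$/ && 'a}' =~ /^a\}$/;
my $unescaped_ok = 'a]' =~ /^a]$/  && 'a}' =~ /^a}$/;
print "escaped and unescaped forms both match\n"
    if $escaped_ok && $unescaped_ok;

# Probe the experimental pragma in a string eval; collect anything it says.
my @caught;
my $result = do {
    local $SIG{__WARN__} = sub { push @caught, @_ };
    eval q{
        no warnings 'experimental::re_strict';
        use re 'strict';
        'a]' =~ /^a]$/ ? 'matched' : 'no match';
    };
};
print defined $result
    ? "under re 'strict': $result\n"
    : "re 'strict' probe failed: $@";
print "strict-mode diagnostic: $_" for @caught;
```

Writing the escaped forms everywhere costs nothing and stays valid whatever happens to the experimental pragma.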