Regexp Delimiters

Perl lets you use almost anything as a regular expression delimiter. Punctuation of some sort is usual, but characters that match /\w/ can be used provided there is white space between the operator and the delimiter: m X foo Xsmx compiles and matches 'foobar' (the x modifier makes the regular expression ignore the spaces around foo). In the presence of use utf8; you can go wild.
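Here is a minimal sketch of the same match written with several delimiters. The variable names and the choice of delimiters are mine, picked only for illustration.

```perl
use strict;
use warnings;

my $text = 'foobar';

# Each entry performs the same match with a different delimiter.
my %matched = (
    slash  => ( $text =~ m/foo/ ? 1 : 0 ),
    braces => ( $text =~ m{foo} ? 1 : 0 ),
    bang   => ( $text =~ m!foo! ? 1 : 0 ),
    # A word character works as a delimiter only if white space
    # separates it from the operator. The x modifier makes the
    # regular expression ignore the spaces around foo.
    word   => ( $text =~ m X foo Xsmx ? 1 : 0 ),
);

foreach my $name ( sort keys %matched ) {
    printf "%-6s %s\n", $name, $matched{$name} ? 'matched' : 'did not match';
}
```

All four forms should report a match. Whether the last one is readable is another question.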

A query on the Perl 5 Porters Mailing List (a.k.a. 'p5p') a few days ago asked for opinions about appropriating the colon (':') as a delimiter for modifiers to the regular expression operators. This got me wondering which regular expression delimiters were actually in use.

I scratched that itch by plowing through my local Mini CPAN, running everything that looked like Perl through PPI, and checking anything that parsed to an object of one of the relevant classes. A summary of the results is appended.
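The counting step might look something like the sketch below. This is not the script that produced the numbers in this post. It assumes PPI is installed from CPAN, it parses a made-up three-line sample instead of a Mini CPAN, and it assumes that PPI::Token::Regexp and PPI::Token::QuoteLike::Regexp are the relevant classes.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical sample input, standing in for a file from a Mini CPAN.
my $source = <<'EOT';
my $found = $text =~ m/foo/;
$text =~ s{bar}{baz}g;
my $re = qr!quux!i;
EOT

my $doc = PPI::Document->new( \$source )
    or die 'PPI could not parse the sample source';

my %count;
foreach my $class ( qw{ PPI::Token::Regexp PPI::Token::QuoteLike::Regexp } ) {
    # find() returns an array reference, or a false value if nothing matched.
    foreach my $token ( @{ $doc->find( $class ) || [] } ) {
        # get_delimiters() returns strings such as '//' or '{}'.
        # Only the opening character of the first one is counted.
        my ( $delim ) = $token->get_delimiters();
        my $opener = substr $delim, 0, 1;
        $count{$opener}++;
    }
}

# Report in decreasing order of frequency, ties broken by character.
foreach my $delim ( sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count ) {
    print "$delim\t$count{$delim}\n";
}
```

Matching on the class name also picks up subclasses, so m//, s/// and tr/// tokens are all found under PPI::Token::Regexp, while qr// is found under the quote-like class.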

It was no surprise that "/" was the overwhelming favorite. The colon (":") came in 13th. I was a little surprised (after I thought about it) that "'" (7th) was not more popular, since a pattern delimited by single quotes is not interpolated. After all, why write m/[\@\$]/ when you can write m'[@$]'?
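A small sketch of the two spellings side by side, using a sample string of my own invention:

```perl
use strict;
use warnings;

# Sample text invented for this example.
my $text = 'mail user@example.com about the $5 fee';

# With "/" the sigils must be escaped to keep Perl from interpolating.
my @escaped = $text =~ m/[\@\$]/g;

# With "'" the pattern is not interpolated, so no escapes are needed.
my @unescaped = $text =~ m'[@$]'g;

print 'escaped:   ', join( ' ', @escaped ), "\n";
print 'unescaped: ', join( ' ', @unescaped ), "\n";
```

Both lists should contain the same two characters, "@" and "$".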

You made it to the end of this post. Your prize (if you want to call it that) is the threatened list of regular expression delimiters, in decreasing order of frequency. The delimiters themselves were formatted by running them through B::perlstring(). I suspect most of the single-digit ones are the result of mis-parses, but believe it or not, some of the instances of "\\" are real regular expression delimiters.

"/"       1420735
"{"       128788
"!"       36081
"|"       23932
"#"       14893
"("       7369
"'"       5180
"["       4220
","       3376
"<"       2926
"%"       2308
"\@"      1302
":"       1232
"\""      828
"."       349
"~"       313
"-"       249
";"       194
"?"       182
"="       109
"^"       59
"0"       43
"`"       35
"+"       29
")"       18
"&"       17
"o"       15
"n"       14
"]"       14
"r"       13
"*"       11
"\\"      11
"\036"    8
"i"       6
"\$"      6
"\a"      6
""        5
"e"       4
">"       4
"1"       4
"8"       3
"S"       3
"6"       3
"9"       3
"_"       2
"f"       2
"a"       2
"}"       2
"g"       2
"m"       2
"5"       2
"v"       1
"q"       1
"l"       1
"I"       1
"d"       1
"M"       1
"c"       1
"s"       1
"t"       1
"H"       1
"\247"    1
"u"       1
"x"       1
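For anyone puzzled by entries such as "\@" or "\036" in the list, here is a minimal sketch of the B::perlstring() formatting. B ships with Perl. The sample delimiters are a handful picked from the list above.

```perl
use strict;
use warnings;
use B ();

# A few delimiters from the list, including a control character.
my @delimiters = ( '/', '@', '$', '\\', "\036" );

# B::perlstring() returns a double-quoted Perl literal for its argument,
# escaping sigils, backslashes and non-printing characters.
my %formatted = map { $_ => B::perlstring( $_ ) } @delimiters;

foreach my $delim ( @delimiters ) {
    print $formatted{$delim}, "\n";
}
```

This is why the at sign shows up with a backslash in front of it and the ASCII record separator shows up in octal.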

1 Comment

A neat idea, thanks for this, and the handy tip regarding the use of "'" as a delimiter. Illustrates how such flexibility can make code easier to read or more difficult, depending on the intentions of the coder.

About Tom Wyant

I blog about Perl.