Stupid Unicode Trick: Noncharacters

Perl RT 133292 is a request to expose the internal perl subroutine that does string interpolation, so that one does not need to figure out how to double-quote a string (escaping where necessary) when feeding it to eval() as a quick-and-dirty templating system.

In fact, there is no such subroutine (the parser turns an interpolated string into a concatenation), and anyway why not just use sprintf()?
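A minimal sketch of the sprintf() route, for comparison. The helper name greet() and the template text are made up for illustration; only sprintf() itself is real Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a printf-style template with the given values.
# Nothing is eval()ed, so the template cannot run arbitrary code.
sub greet {
    my ( $template, @args ) = @_;
    return sprintf $template, @args;
}

print greet( 'Hello %s!', 'world' ), "\n";
```

The trade-off is that the template has to be written with %s placeholders rather than with Perl variable names.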

For someone determined to dive down this rabbit hole, though, Unicode offers another way out: use generic quotes; that is, qq, but delimit the string with a noncharacter. The idea came to me when I read the Incompatible Changes section of the Perl 5.29.0 perldelta (1). They're making unassigned code points illegal delimiters? They were legal before??? What else is legal that I did not know about????? (2)

At this point, a couple of definitions are in order. A noncharacter is a legal Unicode code point which the Unicode Consortium promises never to assign to a character. An unassigned code point is a legal code point which is not currently assigned to a character, but which may be in the future.
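The set of noncharacters is small and fixed: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, and so on up to U+10FFFF), for 66 in all. A rough sketch of a membership test follows; the name is_noncharacter() is hypothetical, not something Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $cp is one of the 66 Unicode noncharacters.
sub is_noncharacter {
    my ( $cp ) = @_;
    # Code points above U+10FFFF are outside Unicode, so not noncharacters.
    return 0 if $cp > 0x10FFFF;
    # The contiguous block U+FDD0 .. U+FDEF.
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    # The last two code points of every plane: low 16 bits FFFE or FFFF.
    return ( $cp & 0xFFFE ) == 0xFFFE ? 1 : 0;
}

printf "U+%04X is %s\n", $_,
    is_noncharacter( $_ ) ? 'a noncharacter' : 'not a noncharacter'
    for 0xFFFD, 0xFFFF, 0xFDD0, 0x10FFFE;
```

Telling an unassigned code point from an assigned one is a different matter: that depends on the version of the Unicode database in use, so it cannot be done with arithmetic like this.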

The relevant section of perldeprecation appears to say that noncharacters will work as generic quote delimiters, though the perlop section "Quote and Quote-like Operators" is silent on just which delimiters are allowed. The Unicode Private-Use Characters, Noncharacters & Sentinels FAQ seems to say that this kind of thing is actually what noncharacters are for. A one-liner example, which works from about Perl 5.8.3, is:

$ perl -le 'print eval "qq \N{U+FFFF}$ARGV[0]\N{U+FFFF}"' \
    'Hello $ARGV[1]!' world

The space after the qq seems not to be required, since the delimiter is not a word character. It is simply there for clarity.

If you have warnings enabled (as you should) you may need a no warnings qw{ utf8 }; to prevent the code from being noisy.
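In a script, as opposed to a one-liner, the same trick might look something like the sketch below. The interpolate() helper and the @arg array are assumptions made for illustration, and the check that the template does not already contain the delimiter is an added precaution, not part of the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: interpolate $template as though it were the body of
# a qq string, using the noncharacter U+FFFF as the delimiter. The template
# may refer to the lexical @arg, e.g. 'Hello $arg[0]!'.
sub interpolate {
    my ( $template, @arg ) = @_;
    # Keep any noncharacter warnings quiet within this sub only.
    no warnings qw{ utf8 };
    my $delim = "\N{U+FFFF}";
    # A template containing the delimiter would end the string early.
    die "Template contains the delimiter\n"
        if index( $template, $delim ) >= 0;
    my $result = eval "qq $delim$template$delim";
    # Propagate compile or run-time errors from the eval.
    die $@ unless defined $result;
    return $result;
}

print interpolate( 'Hello $arg[0]!', 'world' ), "\n";
```

As with any string eval, the template can execute arbitrary Perl, so this is only reasonable for templates you trust.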

For the truly adventurous, illegal code points (those above U+10FFFF) also work.

Me? I'm sticking with sprintf().

1) Yes, this blog post was drafted a year ago, and then sat gathering dust.

2) Five question marks is a sure sign of a diseased mind. -- Terry Pratchett, kinda sorta (3).

3) The exact quote involves exclamation points.

About Tom Wyant

Fine Perl code for over 0.005 centuries.