Quick and dirty string dumping
Sometimes, when you're trying to debug encoding issues in Perl code, it is useful to quickly get an idea of which code points Perl thinks are in your string. The obvious approach, say $string (or print $string), is a poor way to look at an unknown string: The result depends on which encoding layers (if any) are set on STDOUT and on how your terminal renders the resulting bytes. For example, "\xe2\x82\xac" (a three-character string: the UTF-8 bytes of the euro sign) can look identical to "\x{20ac}" (a one-character string: the euro sign as Unicode text) when printed. On top of that, some control characters are invisible, while others can clear (parts of) the screen, reposition the cursor, and so on.
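Here's a minimal sketch of why the two euro strings can look alike. It uses the core Encode module, which the post itself doesn't mention: as far as Perl is concerned the strings have different lengths, but UTF-8-encoding the one-character string (which is what an :encoding(UTF-8) layer on STDOUT does on output) yields exactly the three bytes of the other one, so the terminal receives the same bytes either way.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $bytes = "\xe2\x82\xac";   # three characters: the UTF-8 encoding of the euro sign
my $text  = "\x{20ac}";       # one character: U+20AC EURO SIGN

# Perl sees two different strings ...
printf "length(\$bytes) = %d, length(\$text) = %d\n", length $bytes, length $text;

# ... but encoding $text as UTF-8 (what an :encoding(UTF-8) output layer does)
# produces exactly the bytes stored in $bytes.
my $encoded = encode('UTF-8', $text);
print $bytes eq $encoded ? "same bytes on the wire\n" : "different bytes\n";
```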
Data::Dumper isn't great, either: For one thing, you have to load it first, and for another, a naive use Data::Dumper; print Dumper($string) can still dump non-printable control characters onto your terminal. You have to remember to set $Data::Dumper::Useqq = 1 first, which is a hassle (and takes another line of code).
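For comparison, this is roughly what the Data::Dumper route looks like once you remember Useqq. It's a sketch, not a recommendation: the naive dump single-quotes the value, so the raw newline and BEL bytes would go straight to the terminal (which is why it isn't printed here), while the Useqq dump uses double quotes and escapes them. The local keeps the global setting from leaking into the rest of the program.

```perl
use strict;
use warnings;
use Data::Dumper;

my $string = "abc\n\a";   # example string with a newline and a BEL control character

# Naive dump: single-quoted, so the raw control characters are kept as-is.
my $naive = Dumper($string);

# With Useqq the value is double-quoted and control characters are escaped.
# `local` restores the previous global value when the do block ends.
my $escaped = do {
    local $Data::Dumper::Useqq = 1;
    Dumper($string);
};

print $escaped;   # safe to print: no raw control characters inside
```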
A lesser-known feature of sprintf, the vector flag v, can help here: sprintf('%vd', $string) gives the decimal values of all code points in $string, separated by dots. For example:
sprintf('%vd', "abc\n\a") is 97.98.99.10.7
sprintf('%vd', "\xe2\x82\xac") is 226.130.172
sprintf('%vd', "\x{20ac}") is 8364
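Since the point is quick logging, a tiny wrapper can be handy. The helper below, log_codepoints, is hypothetical (it isn't from the post); it just labels the %vd dump, sends it to STDERR via warn, and returns it, applied here to the three example strings.

```perl
use strict;
use warnings;

# Hypothetical helper: log a label together with the decimal code points of
# $string (as produced by sprintf's %vd) to STDERR, and return the dump.
sub log_codepoints {
    my ($label, $string) = @_;
    my $dump = sprintf '%vd', $string;
    warn "$label: $dump\n";
    return $dump;
}

my @dumps = (
    log_codepoints('ascii + controls', "abc\n\a"),
    log_codepoints('UTF-8 bytes',      "\xe2\x82\xac"),
    log_codepoints('Unicode text',     "\x{20ac}"),
);
```

Note how the last two lines in the log immediately tell the byte string and the text string apart, even though they may print identically.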
Don't like decimal character codes? Use hex instead:
sprintf('%vx', "abc\n\a") is 61.62.63.a.7
sprintf('%vx', "\xe2\x82\xac") is e2.82.ac
sprintf('%vx', "\x{20ac}") is 20ac
Don't like dots? Use whatever separator you want:
sprintf('%*vd', '~', "abc\n\a") is 97~98~99~10~7
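The hex and custom-separator variants combine as you'd expect. In this sketch the separator for %*v is taken from the argument list just before the string, so a space-separated hex dump of the UTF-8 bytes is simply '%*vx' with ' '.

```perl
use strict;
use warnings;

my $string = "abc\n\a";

my $hex    = sprintf '%vx', $string;               # hex code points, dot-separated
my $tilde  = sprintf '%*vd', '~', $string;         # decimal, separator from the argument list
my $spaced = sprintf '%*vx', ' ', "\xe2\x82\xac";  # hex, space-separated

print "$hex\n$tilde\n$spaced\n";
```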
In short, sprintf('%vd', $string) works like join('.', map { ord } split(//, $string)), only a lot more compact. I don't use it for user-visible output, but for quickly logging unknown strings and tracking down encoding bugs, it can be very handy.
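If you want to convince yourself of the equivalence, a quick check is to compare %vd against a long-hand version that splits the string into characters, takes ord of each, and joins the results with dots. The function name vd_longhand is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical long-hand equivalent of sprintf('%vd', $string):
# split into characters, take each code point with ord, join with dots.
sub vd_longhand {
    my ($string) = @_;
    return join '.', map { ord } split //, $string;
}

for my $s ("abc\n\a", "\xe2\x82\xac", "\x{20ac}", "caf\x{e9}") {
    my ($short, $long) = (sprintf('%vd', $s), vd_longhand($s));
    print $short eq $long ? "match: $short\n" : "MISMATCH: $short vs $long\n";
}
```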
I blog about Perl.