What's In That String?
One of the steps of debugging Perl can be to find out what is actually in a string. There are a number of more-or-less informative ways to do this, and I thought I would compare them.
For this I used two short strings. The first was just the concatenation of the characters whose ordinals are 24 through 39; that is, 16 ASCII characters straddling the divide between control characters and printable characters. The second was a small variation on the first, made by removing the last character and appending "\N{U+100}" (a.k.a. "\N{LATIN CAPITAL LETTER A WITH MACRON}") to force the string's internal representation to be upgraded.
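For readers who want to reproduce the results, here is a minimal sketch of how the two strings described above could be built. The variable names $ascii and $wide are my own for illustration; the snippets in this post simply operate on $_.

```perl
use strict;
use warnings;

# Characters 24 through 39: eight control characters, then eight printable ones.
my $ascii = join '', map { chr } 24 .. 39;

# Drop the last character and append U+0100, which forces an internal upgrade.
my $wide = substr( $ascii, 0, -1 ) . "\N{U+100}";

for ( $ascii, $wide ) {
    printf "%d characters, %s\n",
        length( $_ ),
        utf8::is_utf8( $_ ) ? 'upgraded' : 'not upgraded';
}
```

Both strings are 16 characters long; only the second reports as upgraded.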
The results given below include the version of the module used, the actual code snippet that generated the output, the output itself, and any comments I thought relevant. All subroutines used to dump strings are exportable except for those called as methods. The sample code makes fully-qualified calls because of duplication of subroutine names between different modules.
Data::Dumper
2.183 (core since 5.005)
local $Data::Dumper::Useqq = 1; print Data::Dumper::Dumper( $_ );
$VAR1 = "\30\31\32\e\34\35\36\37 !\"#\$%&'";
$VAR1 = "\30\31\32\e\34\35\36\37 !\"#\$%&\x{100}";
Data::Dumper is probably the default debug output tool. One of its goals is the ability to recover the original data by eval()-ing the output of Dumper(). But note the need to set $Data::Dumper::Useqq to a true value to actually see all characters in the dumped string. If this is not done, the control characters are not converted into escape sequences, and you are reduced to something like piping your output through hexdump -C to see them. For more general-purpose debugging you may also want to set $Data::Dumper::Sortkeys to 1 so that hash keys come out in sorted rather than random order.
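The following is a small sketch of the eval() round trip together with both settings. The %sample hash and its keys are hypothetical, made up only to have something with more than one key to sort.

```perl
use strict;
use warnings;
use Data::Dumper ();

my $ascii = join '', map { chr } 24 .. 39;
my $wide  = substr( $ascii, 0, -1 ) . "\N{U+100}";

# Hypothetical data structure, just for illustration.
my %sample = ( wide => $wide, ascii => $ascii );

my $dumped = do {
    local $Data::Dumper::Useqq    = 1;    # escape control and wide characters
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    Data::Dumper::Dumper( \%sample );
};
print $dumped;

# The dump assigns to $VAR1, so relax strict 'vars' for the eval only.
my $restored = do { no strict 'vars'; eval $dumped };
die $@ if $@;

print "round trip ok\n"
    if $restored->{ascii} eq $ascii && $restored->{wide} eq $wide;
```

Because Useqq escapes the wide character, the dump itself is plain ASCII and can be printed without setting an output encoding.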
B
1.82 (core since 5.005)
print B::perlstring( $_ ), "\n";
"\030\031\032\033\034\035\036\037 !\"#\$%&'"
"\x{18}\x{19}\x{1a}\e\x{1c}\x{1d}\x{1e}\x{1f} !\"#\$%&\x{100}"
The primary purpose of the B module is to support rummaging around in Perl's internals. This use as a casual debugging tool is more a happy accident than the actual intent of the module. If you prefer the C language representation of a string, this module also provides cstring().
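As a rough sketch, the two B functions can be put side by side like this. I have not shown the cstring() output because I did not include it in my comparison above; the eval() check is only my own way of confirming that perlstring() produces a usable Perl literal for these two strings.

```perl
use strict;
use warnings;
use B ();

my $ascii = join '', map { chr } 24 .. 39;
my $wide  = substr( $ascii, 0, -1 ) . "\N{U+100}";

for my $string ( $ascii, $wide ) {
    my $literal = B::perlstring( $string );
    print "perl: $literal\n";
    print "c:    ", B::cstring( $string ), "\n";

    # A Perl string literal should evaluate back to the original string.
    my $back = eval $literal;
    print "perlstring round trip ", ( $back eq $string ? 'ok' : 'FAILED' ), "\n";
}
```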
Devel::Peek
1.3 (core since 5.006)
Devel::Peek::Dump( $_ );
SV = PV(0x7f7c1222a2b0) at 0x7f7c1200fee0
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x600003e3d760 "\30\31\32\33\34\35\36\37 !\"#$%&'"\0
  CUR = 16
  LEN = 18
  COW_REFCNT = 0
SV = PVMG(0x7f7c135c75e0) at 0x7f7c1100ac48
  REFCNT = 2
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x600003e3d580 "\30\31\32\33\34\35\36\37 !\"#$%&\304\200"\0 [UTF8 "\x{18}\x{19}\x{1a}\e\x{1c}\x{1d}\x{1e}\x{1f} !"#$%&\x{100}"]
  CUR = 17
  LEN = 18
  MAGIC = 0x60000307e310
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1
Devel::Peek tells you much more than you probably need to know about a string for casual debugging. Unlike the other modules presented here, it does its output directly to STDERR instead of just returning another string.
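If you want the Devel::Peek output as a string anyway, one approach is to point STDERR at a temporary file for the duration of the call. This is only a sketch: peek_to_string() is a hypothetical helper of my own, not part of Devel::Peek, and it assumes a real file is needed because the dump is written at a lower level than an in-memory filehandle would see.

```perl
use strict;
use warnings;
use Devel::Peek ();
use File::Temp ();

# Hypothetical helper: run Dump() with STDERR redirected to a temp file
# and return whatever was written there.
sub peek_to_string {
    my ( $value ) = @_;
    my $tmp = File::Temp->new;    # file is removed when $tmp goes away

    open my $saved, '>&', \*STDERR or die "Cannot dup STDERR: $!";
    open STDERR, '>', $tmp->filename or die "Cannot redirect STDERR: $!";
    Devel::Peek::Dump( $value );
    open STDERR, '>&', $saved or die "Cannot restore STDERR: $!";

    open my $in, '<', $tmp->filename or die "Cannot read capture: $!";
    local $/;                     # slurp the whole file
    return scalar <$in>;
}

my $wide = join( '', map { chr } 24 .. 38 ) . "\N{U+100}";
my $peek = peek_to_string( $wide );

print "string is stored as UTF-8 internally\n"
    if $peek =~ /^\s*FLAGS = \(.*\bUTF8\b/m;
```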
Data::Dump
1.25 (not in core)
print Data::Dump::dump( $_ ), "\n";
"\30\31\32\e\34\35\36\37 !\"#\$%&'"
"\30\31\32\e\34\35\36\37 !\"#\$%&\x{100}"
Data::Dump is a non-core module written as an alternative to Data::Dumper. Its focus is more on ease of configuration and readability of output.
JSON
4.05 (not in core)
state $json = JSON->new->allow_nonref; print $json->encode( $_ ), "\n";
"\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'"
"\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&Ā"
JSON is a general-purpose serializer whose output can be made fairly readable.
Note the need to turn on allow_nonref to dump a string, and to turn on pretty and canonical to get indented structures with hash keys in order. Note also that the "\N{U+100}" is represented literally; you will need to set the encoding on whichever handle you print to (say, by binmode STDOUT, ':encoding(utf-8)';) to avoid the dreaded Wide character in print warning.
There are a number of JSON modules available. Output of untried modules may differ from the output I have presented here.
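Here is a sketch of the pretty and canonical options applied to a small structure. To keep it runnable on a stock Perl it uses the core JSON::PP module rather than JSON, on the assumption that these options behave the same way in both; the %sample hash is hypothetical. It also turns on the ascii option, which escapes non-ASCII characters as \uXXXX and so is one alternative to setting an output encoding.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical data, just for illustration.
my %sample = ( zebra => 1, apple => [ 2, 3 ], mango => "\N{U+100}" );

my $json = JSON::PP->new
    ->allow_nonref    # permit bare strings and numbers at top level
    ->pretty          # indent nested structures
    ->canonical       # sort hash keys
    ->ascii;          # escape non-ASCII characters as \uXXXX

my $text = $json->encode( \%sample );
print $text;
```

With ascii turned on, the macron character comes out as \u0100 and the output is safe to print to any handle.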
YAML
1.30 (not in core)
print YAML::Dump( $_ );
--- "\x18\x19\x1a\e\x1c\x1d\x1e\x1f !\"#$%&'"
--- "\x18\x19\x1a\e\x1c\x1d\x1e\x1f !\"#$%&Ā"
YAML is a general-purpose serializer whose output is fairly readable with minimal to no configuration. Note that the "\N{U+100}" is represented literally; you will need to set the encoding on whichever handle you print to (say, by binmode STDOUT, ':encoding(utf-8)';) to avoid the dreaded Wide character in print warning.
There are a number of YAML modules available. Output of untried modules may differ from the output I have presented here.
unpack()
(Perl built-in)
print unpack( 'H*', $_ ), "\n";
18191a1b1c1d1e1f2021222324252627
Character in 'H' format wrapped in unpack at (eval 28) line 1 (#1)
    (W unpack) You tried something like

        unpack("H", "\x{2a1}")

    where the format expects to process a byte (a character with a value
    below 256), but a higher value was provided instead. Perl uses the
    value modulus 256 instead, as if you had provided:

        unpack("H", "\x{a1}")

18191a1b1c1d1e1f2021222324252600
The unpack() built-in is included so I can say I think it is a bad idea unless you know your string is bytes, not characters. The big, fat warning (courtesy of the diagnostics module) makes this perfectly clear. In this specific case, the output of "\N{U+100}" is the same as the output of "\N{U+00}", and suppressing the warning does not change this.
It is possible to use the bytes pragma to force byte semantics on the unpack and get the whole string. But what you get is the internal representation, subject to change without notice.
My best advice is to avoid this one unless you really, really know what you are doing.
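If what you actually want is a hex view of a character string, one way around the internal-representation problem is to encode the string explicitly before unpacking it, or to skip bytes altogether and list code points. The sketch below is a suggestion of mine rather than something I included in the comparison above; it relies only on the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

my $wide = join( '', map { chr } 24 .. 38 ) . "\N{U+100}";

# Encode to a known, documented encoding first, then dump the bytes.
my $bytes = Encode::encode( 'UTF-8', $wide );
my $hex   = unpack 'H*', $bytes;
print "$hex\n";    # 18191a1b1c1d1e1f20212223242526c480

# Or show characters rather than bytes: one code point per character.
my @code_points = map { sprintf 'U+%04X', ord } split //, $wide;
print "@code_points\n";
```

The trailing c480 is the UTF-8 encoding of U+0100, and no warning is issued because unpack() now sees only bytes.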
If you must use this method (and I did warn you) you can make it a little easier on yourself by using
say unpack( 'H*', $_ ) =~ s/..\K/ /gr;
which produces (for the ASCII string)
18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27
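The /r causes the substitution to return the modified string rather than modifying it in-place, and requires Perl 5.14. Since I knew I was requiring 5.14 I replaced print() with say(), which is itself only available once you enable it (for example with use feature 'say'; or a use v5.10; or later declaration).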
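A variation that avoids the regular expression, and so does not need Perl 5.14, is to let unpack() do the grouping itself. This is a sketch of an alternative rather than the code I used for the output above; it also avoids the trailing space that the substitution leaves after the last pair of digits.

```perl
use strict;
use warnings;

my $ascii = join '', map { chr } 24 .. 39;

# '(H2)*' returns one two-digit hex string per byte; join them with spaces.
my $spaced = join ' ', unpack '(H2)*', $ascii;
print "$spaced\n";    # 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27
```

The same caveat applies as before: this is only meaningful when the string is known to contain bytes.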