What's In That String?

By Tom Wyant on May 25, 2022 4:54 PM

One of the steps of debugging Perl can be to find out what is actually in a string. There are a number of more-or-less informative ways to do this, and I thought I would compare them.

For this I used two short strings. The first was just the concatenation of the characters whose ordinals are 24 through 39; that is, 16 ASCII characters straddling the divide between control characters and printable characters. The second was a small variation on the first, made by removing the last character and appending "\N{U+100}" (a.k.a. "\N{LATIN CAPITAL LETTER A WITH MACRON}") to force the string's internal representation to be upgraded.
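For anyone who wants to follow along, the two strings described above could be built roughly like this. The variable names are mine, not taken from the original test script, and utf8::is_utf8() is used only to confirm that the second string really was upgraded.

```perl
use strict;
use warnings;

# First string: the 16 characters with ordinals 24 (0x18) through 39 (0x27).
my $ascii = join '', map { chr } 24 .. 39;

# Second string: drop the last character, then append U+0100.
# A character above 0xFF forces Perl to upgrade the string internally.
my $wide = substr( $ascii, 0, -1 ) . "\N{U+100}";

# Report the length in characters and whether the internal UTF8 flag is set.
for my $pair ( [ ascii => $ascii ], [ wide => $wide ] ) {
    my ( $name, $string ) = @{ $pair };
    printf "%-5s length=%d upgraded=%s\n",
        $name, length( $string ), utf8::is_utf8( $string ) ? 'yes' : 'no';
}
```

Both strings are 16 characters long; only the second one should report itself as upgraded.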

The results given below include the version of the module used, the actual code snippet that generated the output, the output itself, and any comments I thought relevant. All subroutines used to dump strings are exportable except for those called as methods. The sample code makes fully-qualified calls because of duplication of subroutine names between different modules.

`Data::Dumper` 2.183 (core since 5.005)

local $Data::Dumper::Useqq = 1;
print Data::Dumper::Dumper( $_ );

$VAR1 = "\30\31\32\e\34\35\36\37 !\"#\$%&'";

$VAR1 = "\30\31\32\e\34\35\36\37 !\"#\$%&\x{100}";

Data::Dumper is probably the default debug output tool. One of its goals is the ability to recover the original data by eval()-ing the output of Dumper(). But note the need to set $Data::Dumper::Useqq true to actually see all characters in the dumped string. If this is not done, the control characters are not converted into escape sequences, so the only way to see them is to pipe your output through hexdump -C. For more general-purpose debugging you may also want to set $Data::Dumper::Sortkeys to 1 so that hash keys come out in non-random order.
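As a rough sketch of the configuration discussed above, the settings can be wrapped in a small helper. The name dump_string() is hypothetical; $Data::Dumper::Terse and $Data::Dumper::Indent are documented settings that I have added here for compactness, and they are not part of the snippet that produced the output above.

```perl
use strict;
use warnings;
use Data::Dumper ();

# Hypothetical helper; the variables it sets are documented
# Data::Dumper configuration variables.
sub dump_string {
    my ( $data ) = @_;
    local $Data::Dumper::Useqq    = 1;    # escape control and wide characters
    local $Data::Dumper::Sortkeys = 1;    # stable hash key order
    local $Data::Dumper::Terse    = 1;    # omit the "$VAR1 = " prefix
    local $Data::Dumper::Indent   = 1;    # simple fixed indentation
    return Data::Dumper::Dumper( $data );
}

my $wide = join( '', map { chr } 24 .. 38 ) . "\N{U+100}";
my $text = dump_string( $wide );
print $text;

# The dump is Perl source, so eval() should give back the original string.
my $copy = eval $text;
print 'round trip ', ( defined $copy && $copy eq $wide ? 'ok' : 'FAILED' ), "\n";
```

Using local keeps the settings from leaking into other code that calls Dumper().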

`B` 1.82 (core since 5.005)

print B::perlstring( $_ ), "\n";

"\030\031\032\033\034\035\036\037 !\"#\$%&'"

"\x{18}\x{19}\x{1a}\e\x{1c}\x{1d}\x{1e}\x{1f} !\"#\$%&\x{100}"

The primary purpose of the B module is to support rummaging around in Perl's internals. This use as a casual debugging tool is more a happy accident than the actual intent of the module. If you prefer the C language representation of a string, this module also provides cstring().
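To compare the two representations side by side, something like the following should do. This is only an illustration; I have left out the expected cstring() output because the exact escapes can vary between versions of B.

```perl
use strict;
use warnings;
use B ();

my $ascii = join '', map { chr } 24 .. 39;

my $perl = B::perlstring( $ascii );    # Perl double-quoted literal
my $c    = B::cstring( $ascii );       # C string literal
print "perlstring: $perl\n";
print "cstring:    $c\n";

# perlstring() output is valid Perl source, so it can be eval()-ed back.
my $copy = eval $perl;
print 'round trip ', ( defined $copy && $copy eq $ascii ? 'ok' : 'FAILED' ), "\n";
```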

`Devel::Peek` 1.3 (core since 5.006)

Devel::Peek::Dump( $_ );

SV = PV(0x7f7c1222a2b0) at 0x7f7c1200fee0
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x600003e3d760 "\30\31\32\33\34\35\36\37 !\"#$%&'"\0
  CUR = 16
  LEN = 18
  COW_REFCNT = 0

SV = PVMG(0x7f7c135c75e0) at 0x7f7c1100ac48
  REFCNT = 2
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x600003e3d580 "\30\31\32\33\34\35\36\37 !\"#$%&\304\200"\0 [UTF8 "\x{18}\x{19}\x{1a}\e\x{1c}\x{1d}\x{1e}\x{1f} !"#$%&\x{100}"]
  CUR = 17
  LEN = 18
  MAGIC = 0x60000307e310
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1

Devel::Peek tells you much more than you probably need to know about a string for casual debugging. Unlike the other modules presented here, it writes its output directly to STDERR instead of returning a string.

`Data::Dump` 1.25 (not in core)

print Data::Dump::dump( $_ ), "\n";

"\30\31\32\e\34\35\36\37 !\"#\$%&'"

"\30\31\32\e\34\35\36\37 !\"#\$%&\x{100}"

Data::Dump is a non-core module written as an alternative to Data::Dumper. Its focus is more on ease of configuration and readability of output.

`JSON` 4.05 (not in core)

state $json = JSON->new->allow_nonref;
print $json->encode( $_ ), "\n";

"\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'"

"\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&Ā"

JSON is a general-purpose serializer whose output can be made fairly readable.

Note the need to turn on allow_nonref to dump a string, and to turn on pretty and canonical to get indented structures with hash keys in order. Note also that the "\N{U+100}" is represented literally; you will need to set an encoding on whichever handle you print to (say, by binmode STDOUT, ':encoding(utf-8)';) to avoid the dreaded Wide character in print warning.
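Here is a sketch of those options in use. It assumes JSON::PP, which ships with Perl and accepts the same method names used here, as a stand-in for JSON; the ascii option at the end is an extra I have added, which escapes non-ASCII characters so that no output encoding is needed.

```perl
use strict;
use warnings;
use JSON::PP ();

# JSON::PP is used here only as a core stand-in for JSON.
my $json = JSON::PP->new->allow_nonref->canonical->pretty;

# Encode to UTF-8 on the way out so that U+0100 does not trigger
# "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

my $wide = join( '', map { chr } 24 .. 38 ) . "\N{U+100}";
print $json->encode( { string => $wide, length => length $wide } );

# ascii() escapes every character above 0x7F as \uXXXX.
my $escaped = JSON::PP->new->allow_nonref->ascii->encode( $wide );
print "$escaped\n";
```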

There are a number of JSON modules available. Output of untried modules may differ from the output I have presented here.

`YAML` 1.30 (not in core)

print YAML::Dump( $_ );

--- "\x18\x19\x1a\e\x1c\x1d\x1e\x1f !\"#$%&'"

--- "\x18\x19\x1a\e\x1c\x1d\x1e\x1f !\"#$%&Ā"

YAML is a general-purpose serializer whose output is fairly readable with minimal to no configuration. Note that the "\N{U+100}" is represented literally; you will need to set an encoding on whichever handle you print to (say, by binmode STDOUT, ':encoding(utf-8)';) to avoid the dreaded Wide character in print warning.

There are a number of YAML modules available. Output of untried modules may differ from the output I have presented here.

`unpack()` (Perl built-in)

print unpack( 'H*', $_ ), "\n";

18191a1b1c1d1e1f2021222324252627

Character in 'H' format wrapped in unpack at (eval 28) line 1 (#1)
    (W unpack) You tried something like

       unpack("H", "\x{2a1}")

    where the format expects to process a byte (a character with a value
    below 256), but a higher value was provided instead.  Perl uses the
    value modulus 256 instead, as if you had provided:

       unpack("H", "\x{a1}")

18191a1b1c1d1e1f2021222324252600

The unpack() built-in is included so I can say I think it is a bad idea unless you know your string is bytes, not characters. The big, fat warning (courtesy of the diagnostics module) makes this perfectly clear. In this specific case, the output of "\N{U+100}" is the same as the output of "\N{U+00}", and suppressing the warning does not change this.

It is possible to use the bytes pragma to force byte semantics on the unpack and get the whole string. But what you get is the internal representation, subject to change without notice.

My best advice is to avoid this one unless you really, really know what you are doing.

If you must use this method (and I did warn you) you can make it a little easier on yourself by using

say unpack( 'H*', $_ ) =~ s/..\K(?=.)/ /gr;

which produces (for the ASCII string)

18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27

The /r modifier makes the substitution return the modified string rather than modifying it in place, and requires Perl 5.14. Since I was already requiring 5.14, I replaced print() with say() (which itself needs use feature 'say'; or a use v5.10 or later declaration).
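If what you want is a hex view of a string that may contain characters above 0xFF, one alternative not benchmarked above is to encode the string explicitly before unpacking it, or to skip bytes altogether and print one code point per character. The following is a minimal sketch; hex_bytes() and hex_chars() are hypothetical helper names, and the choice of UTF-8 is my assumption.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: hex dump of the UTF-8 encoding of a string.
sub hex_bytes {
    my ( $string ) = @_;
    my $bytes = Encode::encode( 'UTF-8', $string );
    return join ' ', unpack '(H2)*', $bytes;
}

# Hypothetical helper: one hex code point per character, no encoding step.
sub hex_chars {
    my ( $string ) = @_;
    return join ' ', map { sprintf( '%02x', ord $_ ) } split //, $string;
}

my $wide = join( '', map { chr } 24 .. 38 ) . "\N{U+100}";
print hex_bytes( $wide ), "\n";    # last character appears as "c4 80"
print hex_chars( $wide ), "\n";    # last character appears as "100"
```

Because the encoding is explicit, the result does not depend on how Perl happens to store the string internally.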
