Quick and dirty string dumping

Sometimes, when you're trying to debug encoding issues in Perl code, it is useful to quickly get an idea of what code points Perl thinks are in your string. The straightforward approach of say $string (or print $string) isn't a good way to look at an unknown string: The results depend on what encoding layers (if any) are set on STDOUT and how your terminal renders the resulting bytes. For example, "\xe2\x82\xac" (a three-character string, UTF-8 bytes) may look identical to "\x{20ac}" (a one-character string, Unicode text) when printed. (And some control characters are invisible, or can clear parts of the screen, reposition the cursor, etc.)

Data::Dumper isn't great, either: For one thing, you have to load it first; for another, a naive use Data::Dumper; print Dumper($string) can still dump non-printable control characters on your terminal. You have to remember to set $Data::Dumper::Useqq = 1 first, which is a hassle (and takes another line of code).
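For comparison, here is a minimal sketch of the Data::Dumper route with Useqq set. The helper name dump_escaped is made up for this illustration, and Terse is only set to drop the "$VAR1 = " prefix:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: dump a string with control characters escaped.
sub dump_escaped {
    my ($string) = @_;
    local $Data::Dumper::Useqq = 1;   # double-quoted output with escapes
    local $Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix
    return Dumper($string);
}

# "\e[2J" would clear the screen on many terminals if printed raw.
print dump_escaped($_) for "abc\n\a", "\e[2J", "\x{20ac}";
```

It works, but it is noticeably more ceremony than a one-liner.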

A non-obvious feature of sprintf can help here: sprintf('%vd', $string) gives the decimal values of all code points in $string separated by dots. For example:

  • sprintf('%vd', "abc\n\a") is 97.98.99.10.7
  • sprintf('%vd', "\xe2\x82\xac") is 226.130.172
  • sprintf('%vd', "\x{20ac}") is 8364

Don't like decimal character codes? Use hex instead:

  • sprintf('%vx', "abc\n\a") is 61.62.63.a.7
  • sprintf('%vx', "\xe2\x82\xac") is e2.82.ac
  • sprintf('%vx', "\x{20ac}") is 20ac

Don't like dots? Use whatever separator you want:

  • sprintf('%*vd', '~', "abc\n\a") is 97~98~99~10~7

In short, sprintf('%vd', $string) works like join('.', map { sprintf('%d', ord $_) } split(//, $string)), only a lot more compact. I don't use it for user-visible output, but when it comes to quickly logging unknown strings and tracking down encoding bugs, it can be very nice.
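The equivalence can be checked with a short script. This is only a sketch; codepoints_longhand is a name invented for the example:

```perl
use strict;
use warnings;

# Long-hand equivalent of sprintf('%vd', $string), spelled out with ord.
sub codepoints_longhand {
    my ($string) = @_;
    return join '.', map { sprintf '%d', ord $_ } split //, $string;
}

for my $string ("abc\n\a", "\xe2\x82\xac", "\x{20ac}") {
    my $short = sprintf '%vd', $string;
    my $long  = codepoints_longhand($string);
    printf "%-14s %s\n", $short, ($short eq $long ? 'same' : 'DIFFERENT');
}
```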

A polymorphic quine

I thought it might be fun to write a "polymorphic virus", but that would involve learning executable file formats and machine code, which sounds like a lot of effort. So instead I went for the next best thing: Perl instead of machine code, and self-replicating code in the form of a quine rather than "infecting" other executables.

When you run this code, it will output a new version of itself, which you can run again to get another version, etc. Unlike a regular quine, which will produce an exact copy of its source code, this program is polymorphic: Its output is functionally identical to the original, but the code might look completely different. This is just a proof-of-concept, but it does try to introduce variations in all parts that a simple static scanner might use for detection (such as signatures in an antivirus). There are still fixed parts, but they're small and generic (such as ; or =) and can be found in most any perl script.

(my $Edw70Kp = "|ygn`hjs\nmo\\yjc]eyuy}YUl[h^y:YWyw\nmo\\ym`gnyu\nyyyygsy\"}mnl#y7y:Y5\nyyyyl_nolhymjlchn`\"{p~p^{&y}mnl#\nyyyyyyyyc`yl[h^\"#y6y*(,5\nyyyy}mnly7xymu\"UXy'xW#v\"U{V}V:VVW#wu\nyyyyyyyy^_`ch_^y},y9y{VV},{y4\nyyyyyyyy}+y_ky{Vh{y9y{VVh{y4\nyyyyyyyymjlchn`\"{VVr~*,r{&yil^y}+#\nyyyyw_a5\nyyyy!{!y(y}mnly(y!{!\nw\n\ngsy}p[ly7y!}!y(ydichy!!&yjc]e\"![!y((y!t!&y!;!y((y!T!&y!Y!#&yg[jyjc]e\"![!y((y!t!&y!;!y((y!T!&y!Y!&y!*!y((y!3!#&y'-y((yl[h^y++5\ngsy\"}ip[l#y7y}?^q1*Ejy7xy)7yV}\"Vq%#y7)5\n}?^q1*Ejy7xym)V}}ip[lV\\)}p[l)a5\n\ngsy}h_rny7yjc]e\"!!&y!gsy!&y!ioly!#y(y}p[ly(y!y!y(yjc]e\"!7!&y!(7!&y!vv7!#y(y!y!5\n\nc`y\"l[h^\"#y6y*(/#yu\nyyyygsy}ey7ydichy!!&yg[jyuy]blyl[h^y,/0ywy+y((yf_hanb\"}?^q1*Ej#5\nyyyy}h_rny(7ym`gn\"}e#y(y!yXy!y(ym`gn\"}eyXy}?^q1*Ej#5\nwy_fm_yu\nyyyygsy}lmy7yil^y{y{5\nyyyygsy}l_y7yil^y{x{5\nyyyygsy}lqy7y}l_y'y}lm5\nyyyygsy}^y7y+y%ychnyl[h^\"}lq#5\nyyyy\"gsy}^j_`y7y}?^q1*Ej#y7xym)\"Uy'xW#)]bl\"\"il^\"}+#y'y}lmy%y}lqy%y+y'y}^#y~y\"}lqy%y+#y%y}lm#)_a5\nyyyy}h_rny(7ym`gn\"}^j_`#5\nyyyygsy}hy7y]bl\"}lmy%y}^#5\nyyyy\"gsy}gy7y}h#y7xynl)z'xy)y'x)5\nyyyym)\"UV)V'VVW#)VV}+)ay`ily}g&y}h5\nyyyy}h_rny7y{\"}h_rn#y7xy{y(yjc]e\"!nl!&y!s!#y(y{)y'x)}h'xy'}g){5\nw\n}h_rny(7y!5y!5\n\n}h_rny(7yjc]e\"{_p[fy}p[l{&y{m)X)}p[l)__{#y(y{Vh{5\n\njlchny}h_rn5\n") =~ tr/ -~/&-~ -%/; eval $Edw70Kp
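In the version shown above, the payload is stored with every printable ASCII character rotated, and the trailing tr/ -~/&-~ -%/ undoes that rotation before the eval. The following sketch isolates just that step; decode uses the same transliteration as the program above, while encode (its inverse) and both sub names are assumptions added for the demonstration:

```perl
use strict;
use warnings;

# Rotate printable ASCII (space .. ~) forward by 6 positions, wrapping around.
# This is the transliteration applied to the payload above.
sub decode { (my $s = shift) =~ tr/ -~/&-~ -%/; return $s }

# The inverse rotation (back by 6), used here only to build a sample payload.
sub encode { (my $s = shift) =~ tr/&-~ -%/ -~/; return $s }

my $plain   = 'print 6';
my $encoded = encode($plain);

print "$encoded\n";             # the obfuscated form
print decode($encoded), "\n";   # round-trips to the original text
```

Characters outside the space-to-tilde range, such as newlines, pass through unchanged.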

A palindromic polyglot program in x86 machine code, Perl, shell, and make

https://binary.golf/6:

Binary Golf Grand Prix is an annual small file format competition, currently in its sixth year. The goal is to make the smallest possible file that fits the criteria of the challenge.

This year's BGGP challenge was to output or display 6. I always wanted to work with actual machine code, so I decided to submit a DOS COM executable. Why? Because the COM format has no headers or other metadata; you can just put some x86 instructions in a file and run it directly.

Having no experience with DOS, I started by looking up a "hello world" example and found https://github.com/susam/hello:

MOV AH, 9
MOV DX, 108
INT 21
RET
DB 'hello, world', D, A, '$'

This loads 9 into the AH register (the upper byte of AX) and executes interrupt 0x21, which triggers the DOS "display string" routine. The address of the string is given directly in DX; $ is used as an in-band string terminator because DOS is weird.

Adapting this snippet to output 6 instead is trivial, but I discovered something better: Function 2 of interrupt 0x21 outputs a character (code given in DL) directly. That gives us:

MOV AH, 2
MOV DL, '6'
INT 21
RET

Or in binary:

b4 02 b2 36 cd 21 c3

If you write these 7 bytes to a .COM file, the result is already a fully functional DOS executable. And since the RET instruction terminates the program, we can append whatever bytes we want, for example to create a palindrome:

b4 02 b2 36 cd 21 c3 21 cd 36 b2 02 b4
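If you would rather not type bytes into a hex editor, a few lines of Perl can generate the file. This is a sketch: the sub name is made up, and the output file name is whatever you pass on the command line.

```perl
use strict;
use warnings;

# Build the palindromic COM program byte by byte.
sub palindromic_com {
    # Machine code for: MOV AH, 2 / MOV DL, '6' / INT 21 / RET
    my $code = pack 'H*', 'b402b236cd21c3';
    # Mirror everything before the final RET byte.
    return $code . scalar reverse substr($code, 0, -1);
}

my $bytes = palindromic_com();
print join(' ', unpack '(H2)*', $bytes), "\n";

# Optionally write the program to the file named on the command line.
if (@ARGV) {
    open my $fh, '>:raw', $ARGV[0] or die "$ARGV[0]: $!";
    print {$fh} $bytes;
    close $fh or die "$ARGV[0]: $!";
}
```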

However, I've always liked polyglots. It turns out that the byte sequence 23 de corresponds to the x86 instruction AND BX, SI (which modifies BX, but is harmless otherwise). And byte 23 happens to be character # in ASCII, which means anything that follows will be ignored as a comment when the binary file is read by an interpreter that understands # as a comment marker (which includes Perl, Python, Ruby, PHP, make, and the shell). This leads to the following x86/Perl polyglot:

#<DE><B4><02><B2>6<CD>!<C3>
print 6;

And with a few modifications, we get a palindrome again:

#<DE><B4><02><B2>6<CD>!<C3>#
print 6#6 tnirp
#<C3>!<CD>6<B2><02><B4><DE>#

This is also a valid shell script, but it tries to run a program called print with arguments '6#6' and 'tnirp'. We can make the shell recognize # as a comment marker by putting a space in front, but there is no print command, so how do we make the shell use echo while retaining print for perl? Fortunately we don't need to if we're willing to use 6 as a "format string" and switch to printf:

#<DE><B4><02><B2>6<CD>!<C3>#
printf 6 # 6 ftnirp
#<C3>!<CD>6<B2><02><B4><DE>#

We can do one better and add make to the mix. We just need some form of dummy target and an empty list of prerequisites; the rest will be the shell command we already have. Normally that would look like this (with a literal tab before printf):

foo:
        printf 6

However, at least GNU make lets you write it all in one line without using tabs:

foo: ; printf 6

This form happens to be valid Perl already: foo: ; is just a label attached to a null statement. But the shell would try to run a program called foo: and we don't want that. A creative choice of label name and spacing takes care of this problem as well:

true :;printf 6

To make this means: The target true can be created/updated (with no prerequisites) by running printf 6. Since it is the first target in our "makefile", true automatically becomes the default target.

To perl this means: A label (named true) is attached to a null statement, followed by printf(6) (the final semicolon being optional because we're at the end of the file). 6 is implicitly converted to the format string "6", which simply outputs 6.

To the shell this means: Run the true command (with an argument of ':'), then run the printf command (with an argument of 6).

In final 55-byte palindrome + x86 machine code form:

#<DE><B4><02><B2>6<CD>!<C3>#
true :;printf 6 # 6 ftnirp;: eurt
#<C3>!<CD>6<B2><02><B4><DE>#
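The byte count and the palindrome property are easy to double-check by generating the file programmatically. A sketch (build_polyglot is a name chosen for this example):

```perl
use strict;
use warnings;

# Assemble the file from its first half, the central "#", and the mirror image.
sub build_polyglot {
    my $first_half = "#\xDE\xB4\x02\xB2" . "6" . "\xCD!\xC3#\n"   # x86 code as a comment line
                   . "true :;printf 6 ";                          # make / Perl / shell part
    return $first_half . "#" . scalar reverse $first_half;
}

my $file = build_polyglot();
printf "%d bytes, palindrome: %s\n",
    length($file), ($file eq scalar reverse($file)) ? 'yes' : 'no';
```

Note that the file has no trailing newline; adding one would break the symmetry.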

That's it!

Automated testing on Windows with AppVeyor

AppVeyor is a continuous integration service similar to Travis CI, just on Windows. If you have a Perl module on GitHub, it's not that hard to have it run tests automatically on Windows; it's just not well documented.

(The following information was taken from https://blogs.perl.org/users/eserte/2016/04/testing-with-appveyor.html, the AppVeyor documentation, and random trial and error.)

First you need to sign in to AppVeyor with your GitHub account and let it access your repositories, as described on https://www.appveyor.com/docs/.

Then you need to add a .appveyor.yml file to your repository. Mine looks like this:

cache:
  - C:\strawberry

install:
  - if not exist "C:\strawberry" choco install strawberryperl -y
  - set PATH=C:\strawberry\c\bin;C:\strawberry\perl\site\bin;C:\strawberry\perl\bin;%PATH%
  - cd %APPVEYOR_BUILD_FOLDER%
  - cpanm --quiet --installdeps --with-develop --notest .

build_script:
  - perl Makefile.PL
  - gmake

test_script:
  - gmake test

The cache part tells AppVeyor to save the contents of C:\strawberry after every successful build and to restore C:\strawberry (if available) before starting a fresh build. See https://www.appveyor.com/docs/build-cache/.

The install script checks for the existence of C:\strawberry. If it's not there, Chocolatey (a Windows package manager) is used to install the Strawberry Perl package (currently using Strawberry Perl 5.26.1.1). Then the relevant program directories are added to the PATH.

The next commands switch to the build directory and install any module dependencies (I run author tests on AppVeyor, so I include developer dependencies).

The build_script and test_script parts are just the usual perl Makefile.PL && make && make test step. Strawberry Perl comes with GNU make now, so we can use gmake instead of the older dmake.

And that's it. My module (including development branches and pull requests) is now automatically tested on Windows.


This post is not directly related to core perl, but AppVeyor was discussed at the 2017 Perl 5 Hackathon and this is what got me to take a closer look at the system (and write up the results). We had a good time.

Sponsors for the Perl 5 Hackathon 2017

This conference could not have happened without the generous contributions from the following companies:

Booking.com, cPanel, craigslist, bluehost, Assurant, Grant Street Group

/Fizz|Buzz/

use v5.12.0;
use warnings;

s/\A(?:[0369]|[147][0369]*(?:[147][0369]*[258][0369]*)*(?:[147][0369]*[147]|[258])|[258][0369]*(?:[258][0369]*[147][0369]*)*(?:[258][0369]*[258]|[147]))*(?:0|[147][0369]*(?:[147][0369]*[258][0369]*)*5|[258][0369]*(?:[258][0369]*[147][0369]*)*[258][0369]*5)\z/FizzBuzz/,
s/\A(?:[0369]|[147][0369]*(?:[147][0369]*[258][0369]*)*(?:[147][0369]*[147]|[258])|[258][0369]*(?:[258][0369]*[147][0369]*)*(?:[258][0369]*[258]|[147]))+\z/Fizz/,
s/\A[0-9]*[05]\z/Buzz/,
say
for 1 .. 100
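
The long patterns track the running digit sum modulo 3: digits in [0369] leave the remainder unchanged, [147] add 1, and [258] add 2. As a sanity check, the "divisible by 3" pattern from the second substitution can be compared against plain arithmetic. This is a sketch; the names $div3 and mismatches are invented for the test:

```perl
use strict;
use warnings;

# The "Fizz" pattern from above, copied verbatim.
my $div3 = qr/\A(?:[0369]|[147][0369]*(?:[147][0369]*[258][0369]*)*(?:[147][0369]*[147]|[258])|[258][0369]*(?:[258][0369]*[147][0369]*)*(?:[258][0369]*[258]|[147]))+\z/;

# Return every number up to $limit where the regex and % 3 disagree.
sub mismatches {
    my ($limit) = @_;
    return grep { ($_ =~ $div3 ? 1 : 0) != ($_ % 3 == 0 ? 1 : 0) } 1 .. $limit;
}

my @bad = mismatches(10_000);
print @bad ? "mismatches: @bad\n" : "regex agrees with % 3\n";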