May 2014 Archives

Spelunking: why 'while(){ }' is my new favorite perl-ism

Today I saw a post by sartak mentioning a feature I didn't know: while () { ... } is legal and acts as an infinite loop. This is awesome! I occasionally have need for a while(true) loop, and I like while() better than while(1).

But I couldn't stop there: I had to find out why, and whether it's a feature I can count on.

(If you think you'll go cross-eyed looking at parsing details, please still check out the commit message I found at the end, it's a neat bit of history).

Checking the docs

First things first, I went to check the docs to see if I could find this mentioned. perldoc -f while points to perldoc perlsyn, but I couldn't find anything referring to this on that page. I did find one wrinkle, though: while() behaves exactly like while(1), but while(()) is equivalent to while(0). In the second case there actually is an expression inside the parentheses; that expression is an empty list, and the docs point out that an empty list evaluates as false. Fun.
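The difference between the two forms is easy to check directly. This is a minimal sketch; the counter names are made up for illustration, and the `last` statements are only there so the script terminates.

```perl
use strict;
use warnings;

# Count how many times each loop body runs before we bail out.
my $empty_parens = 0;
while () {                        # no condition at all: behaves like while (1)
    $empty_parens++;
    last if $empty_parens >= 3;   # without this, the loop would never end
}

my $empty_list = 0;
while (()) {                      # the condition is an empty list, which is false
    $empty_list++;
    last;
}

print "while ():   body ran $empty_parens times\n";
print "while (()): body ran $empty_list times\n";
```

The first loop only stops because of the explicit `last`; the second body never runs at all.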

Since I didn't see anything in the docs, I figured I'd check the other docs - the code base.

Checking the deparse

I started my investigation by checking to see what my system perl thinks I wrote when I ask it to do while(){}. To do that I used the module B::Deparse, which is conveniently part of the base install:

bash $ perl -MO=Deparse -e 'while(){}'
while (1) {
    ();
}
-e syntax OK

Compare that to our false case above:

bash $ perl -MO=Deparse -e 'while(()){}'
while (()) {
    ();
}
-e syntax OK

The Deparse module (really B::Deparse, a compiler backend module) turns the compiled form of your program (the op tree) back into Perl code. while() without any arguments decompiles as if I'd passed in a 1, while while(()) comes back exactly as I'd written it. Interesting.
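The same check can be done from inside a script rather than from the command line. This is a sketch that uses B::Deparse's coderef2text method on an anonymous sub; the variable name is an assumption for illustration, and the exact text produced may vary between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Deparse an anonymous sub whose body is the empty-condition loop.
# The "last" is only there so the sub would terminate if it were called.
my $deparser  = B::Deparse->new;
my $loop_text = $deparser->coderef2text(sub { while () { last } });

print "$loop_text\n";
```

Printing the result shows how perl itself understood the loop, which is handy when comparing several installed versions.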

Checking the parse

Where does it do that?

I've spent enough time with the parser and interpreter to have a copy (copies, really) of perl already built with DEBUGGING enabled, and finger memory of which flags to pass to see what's going on. So I ran this:

bash $ perl5.19.9 -Dp -e 'while(){}'
... lots of lines snipped ...
Shifting token '(', Entering state 199
Reducing stack by rule 15 (line 227), -> remember
Entering state 295
Reading a token: Next token is token ')' (0x1)
Reducing stack by rule 69 (line 646), -> texpr
Entering state 356
Next token is token ')' (0x1)
... lots more lines snipped ...
Entering state 449
Reducing stack by rule 39 (line 435), WHILE '(' remember texpr ')' mintro mblock cont -> barestmt
Entering state 102
... and just a few more snipped ...

Pretty obvious why this is buried under DEBUGGING! The interesting parts are the ')' token, since whatever preceded it had to be the empty expression, and texpr, which is presumably that empty expression. This line:

Reducing stack by rule 69 (line 646), -> texpr

may be hard to read, but it tells us exactly where to look - line 646 of that version's perly.y:

 644 /* Boolean expression */
 645 texpr   :       /* NULL means true */
 646                         { YYSTYPE tmplval;
 647                           (void)scan_num("1", &tmplval);
 648                           $$ = tmplval.opval; }
 649         |       expr
 650         ;

The comment specifically says NULL means true, and the code of texpr says texpr is either an EXPR (like the docs say) or, if it's NULL, is considered true (by injecting the number 1). But it doesn't give me much insight into why that is.

Checking the history

Since the official docs don't have much on it, and the code comments don't either, it's time to check the commit history:

bash $ git blame perly.y
... lines snipped ...
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  645) texpr :       /* NULL means true 
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  646)                       { YYSTYPE t
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  647)                         (void)sca
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  648)                         $$ = tmpl
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  649)       |       expr
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  650)       ;
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  651) 

The comment that NULL means true dates from 1987, and checking the logs shows it to be the very first commit! I'd never seen that commit before, but I like it so much I want to quote the whole thing here:

     a "replacement" for awk and sed

    [  Perl is kind of designed to make awk and sed semi-obsolete.  This posting
       will include the first 10 patches after the main source.  The following
       description is lifted from Larry's manpage. --r$  ]

       Perl is a interpreted language optimized for scanning arbitrary text
       files, extracting information from those text files, and printing
       reports based on that information.  It's also a good language for many
       system management tasks.  The language is intended to be practical
       (easy to use, efficient, complete) rather than beautiful (tiny,
       elegant, minimal).  It combines (in the author's opinion, anyway) some
       of the best features of C, sed, awk, and sh, so people familiar with
       those languages should have little difficulty with it.  (Language
       historians will also note some vestiges of csh, Pascal, and even
       BASIC-PLUS.) Expression syntax corresponds quite closely to C
       expression syntax.  If you have a problem that would ordinarily use sed
       or awk or sh, but it exceeds their capabilities or must run a little
       faster, and you don't want to write the silly thing in C, then perl may
       be for you.  There are also translators to turn your sed and awk
       scripts into perl scripts.

Updating the docs

This could use an update to perlsyn, I think, if it's not mentioned already. I don't have any plans to attempt that part of it, so consider this an open invitation.

Prototypes and the call checker

Perl prototypes are fascinating. They enable: making argument count enforceable at compile time; adding an implicit argument; changing how a list of arguments is parsed; changing how individual arguments are parsed; and even allowing Perl to optimize away a call to that function. Unfortunately, it's also impossible to count on any of these things: those changes only take effect if the sub can be reliably looked up when the call to it is being compiled, and calling a sub with a & specifically prevents that lookup in the first place.
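Here is a small sketch of the & bypass. The sub name pair is hypothetical, and string eval is used only so that the compile-time failure can be captured at run time instead of aborting the whole script.

```perl
use strict;
use warnings;

# Hypothetical sub whose prototype demands exactly two scalars.
sub pair($$) { return scalar @_ }

# The prototype is enforced while this string is compiled, so the
# three-argument call never runs and eval returns undef.
my $checked = eval q{ pair(1, 2, 3) };
my $error   = $@;

# The & sigil skips the prototype, so all three arguments go through.
my $bypassed = eval q{ &pair(1, 2, 3) };

print "pair(1, 2, 3) failed to compile: $error";
print "&pair(1, 2, 3) returned $bypassed\n";
```

The first call is rejected before any code runs, while the second happily counts three arguments.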

That's relatively common knowledge. There's another way to disable prototype handling, or rather, to replace prototype handling: the call checker. The prototype handler is the default call checker for any sub when it's created, and it's the only one that comes built into Perl, but it's possible to provide a replacement and attach it to individual subs of your choosing. The call checker takes effect when determining what to do when each sub is called - subject to the existing limitations of prototypes - and that enables a new class of optimizations.

The last example given for prototypes above is generally thought of as a way to give a name to a constant, but what it's actually doing is declaring a sub that's constrained enough that it's legal for it to be inlined. In other words, because sub CONSTANT() { 5; } has a prototype that makes it illegal to give it arguments, and because the body of the sub is just a single constant, there's no way for that sub to have any side-effects; and since it can't have any side effects, it's possible to skip calling the sub altogether and replace any calls to it with the number 5.
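One way to watch the inlining happen is to deparse a sub that calls the constant. This is a sketch, assuming B::Deparse is available (it ships with perl); ANSWER and the variable names are made up for illustration, and the exact deparsed text may differ between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Empty prototype plus a constant body: eligible for inlining.
sub ANSWER() { 5 }

# By the time this anonymous sub is compiled, ANSWER is already known,
# so the call can be replaced with its value.
my $uses_answer  = sub { my $x = ANSWER; return $x };
my $inlined_text = B::Deparse->new->coderef2text($uses_answer);

print "$inlined_text\n";
```

If the call was inlined, the printed code mentions the literal value rather than the sub name.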

The core limits that optimization to the case of an empty prototype and a sub with only a constant (roughly...) in its call checker, but since that call checker can be overridden, it's possible for a CPAN module to do the same thing but with a more expansive definition for what can be inlined. I'm working on a module to do this now and will post more on it as it develops.

It also enables having something like sub debug { print STDERR ... }, where all calls to debug are not just no-ops, but don't even exist in the compiled version of the program that's being executed. I don't know of a CPAN module for this, and would be more than happy if someone could point me to one that does this in the comments!

Examples

  1. sub foo($$){} foo 1, 2, 3; # Errors during compilation
  2. sub foo(_){} foo; # Equivalent to foo($_);
  3. sub foo($){} foo bar, baz; # Equivalent to foo(bar), baz();
  4. sub foo(\@\@){} foo @bar, @baz; # Equivalent to foo(\@bar, \@baz) instead of push @arglist, @bar; push @arglist, @baz; foo(@arglist)
  5. sub foo(){ 5;} $bar = foo; # Equivalent to writing "$bar = 5"
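A runnable sketch of the implicit-argument and reference-passing cases from the list above. The sub names show and sizes are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# (_) prototype: when no argument is given, $_ is passed instead.
sub show(_) { return "got: $_[0]" }

# (\@\@) prototype: each array arrives as a reference, not flattened.
sub sizes(\@\@) {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

my @bar = (1, 2, 3);
my @baz = (4, 5);
my @counts = sizes(@bar, @baz);     # called as if sizes(\@bar, \@baz)

my $shown;
for ('hello') { $shown = show() }   # show() picks up $_ here

print "sizes: @counts\n";
print "$shown\n";
```

Without the (\@\@) prototype, the two arrays would be flattened into one list of five elements and the sub could not tell where one ends and the other begins.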

About Peter Martini

I like thinking about machines, especially virtual machines like Perl's VM, the Java VM, and kvm/qemu.