Spelunking: why 'while(){ }' is my new favorite perl-ism

Today I saw a post by sartak mentioning a feature I didn't know about: while () { ... } is legal and acts as an infinite loop. This is awesome! I occasionally need a while(true) loop, and I like while() better than while(1).

But I couldn't stop there: I had to find out why, and whether it's a feature I can count on.

(If you think you'll go cross-eyed looking at parsing details, please still check out the commit message I found at the end; it's a neat bit of history.)

Checking the docs

First things first, I went to check the docs to see if I could find this mentioned. perldoc -f while points to perldoc perlsyn, but I couldn't find anything referring to this on that page. I did find one wrinkle, though: although while() is exactly while(1), while(()) is equivalent to while(0). In the second case there actually is an expression inside the parentheses; that expression is an empty list, and the docs point out that the empty list evaluates as false. Fun.

Since I didn't see anything in the docs, I figured I'd check the other docs: the code base.

Checking the deparse

I started my investigation by checking what my system perl thinks I wrote when I ask it to run while(){}. To do that I used the Deparse module, which is conveniently part of the base install:

$ perl -MO=Deparse -e 'while(){}'
while (1) {
    ();
}
-e syntax OK

Compare that to our false case above:

$ perl -MO=Deparse -e 'while(()){}'
while (()) {
    ();
}
-e syntax OK

The Deparse module (really B::Deparse, a compiler backend module) turns the compiled form of your program (the op tree) back into perl code. while() without any arguments decompiles as if I'd passed in a 1, whereas while(()) comes back exactly as I'd written it. Interesting.
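To double-check that while() and while(1) really end up as the same thing after compilation, you can deparse both and compare the text. This is only a sketch: the variable names are mine, and it assumes your perl ships B::Deparse (some minimal installs split it out into a separate package).

```shell
# Deparse both spellings; the "-e syntax OK" line goes to stderr and is discarded.
deparsed_empty=$(perl -MO=Deparse -e 'while(){}' 2>/dev/null)
deparsed_one=$(perl -MO=Deparse -e 'while(1){}' 2>/dev/null)

if [ "$deparsed_empty" = "$deparsed_one" ]; then
    echo "identical after compilation"
else
    echo "different after compilation"
fi
```

Because -MO=Deparse only compiles the program and never runs it, neither infinite loop is actually executed here.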

Checking the parse

Where does it do that?

I've spent enough time with the parser and interpreter to have a copy (copies, really) of perl already built with DEBUGGING enabled, and finger memory of which flags to pass to see what's going on. So I ran this:

$ perl5.19.9 -Dp -e 'while(){}'
... lots of lines snipped ...
Shifting token '(', Entering state 199
Reducing stack by rule 15 (line 227), -> remember
Entering state 295
Reading a token: Next token is token ')' (0x1)
Reducing stack by rule 69 (line 646), -> texpr
Entering state 356
Next token is token ')' (0x1)
... lots more lines snipped ...
Entering state 449
Reducing stack by rule 39 (line 435), WHILE '(' remember texpr ')' mintro mblock cont -> barestmt
Entering state 102
... and just a few more snipped ...

Pretty obvious why this is buried under DEBUGGING! The parts that are interesting are the ')', since whatever preceded it had to be the empty expression, and texpr, which is presumably that empty expression. That line ("Reducing stack by rule 69 (line 646), -> texpr") may be hard to read, but it tells us exactly where to look - line 646 of that version's perly.y:

 644 /* Boolean expression */
 645 texpr   :       /* NULL means true */
 646                         { YYSTYPE tmplval;
 647                           (void)scan_num("1", &tmplval);
 648                           $$ = tmplval.opval; }
 649         |       expr
 650         ;

The comment specifically says NULL means true, and the code says texpr is either an expr (like the docs say) or, if it's empty (NULL), is considered true (by injecting the number 1). But it doesn't give me much insight into why that is.
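Since the empty alternative lives in texpr specifically, only statements whose grammar rule uses texpr should accept an empty condition. The comments below report that if () {} and until () {} are syntax errors, which fits that picture (presumably those statements use rules that demand a real expression; I haven't shown those rules here, so treat that as an assumption). A small compile-only check, with a hypothetical helper named check, might look like this:

```shell
# check: report whether perl can compile a snippet (-c compiles without running it).
check() {
    if perl -c -e "$1" >/dev/null 2>&1; then
        echo "compiles"
    else
        echo "syntax error"
    fi
}

while_result=$(check 'while(){}')
if_result=$(check 'if(){}')
until_result=$(check 'until(){}')

echo "while(){}: $while_result"
echo "if(){}:    $if_result"
echo "until(){}: $until_result"
```

Using -c matters here: it stops after compilation, so the while(){} case is checked without ever entering the infinite loop.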

Checking the history

Since the official docs don't have much on it, and the code comments don't either, it's time to check the commit history:

$ git blame perly.y
... lines snipped ...
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  645) texpr :       /* NULL means true 
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  646)                       { YYSTYPE t
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  647)                         (void)sca
f05e27e5 perly.y (Dave Mitchell           2006-12-04 15:38:05 +0000  648)                         $$ = tmpl
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  649)       |       expr
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  650)       ;
^8d063cd perl.y  (Larry Wall              1987-12-18 00:00:00 +0000  651) 

The comment that NULL means true dates from 1987, and checking the logs shows it to be the very first commit! I'd never seen that commit before, but I like it so much I want to quote the whole thing here:

     a "replacement" for awk and sed

    [  Perl is kind of designed to make awk and sed semi-obsolete.  This posting
       will include the first 10 patches after the main source.  The following
       description is lifted from Larry's manpage. --r$  ]

       Perl is a interpreted language optimized for scanning arbitrary text
       files, extracting information from those text files, and printing
       reports based on that information.  It's also a good language for many
       system management tasks.  The language is intended to be practical
       (easy to use, efficient, complete) rather than beautiful (tiny,
       elegant, minimal).  It combines (in the author's opinion, anyway) some
       of the best features of C, sed, awk, and sh, so people familiar with
       those languages should have little difficulty with it.  (Language
       historians will also note some vestiges of csh, Pascal, and even
       BASIC-PLUS.) Expression syntax corresponds quite closely to C
       expression syntax.  If you have a problem that would ordinarily use sed
       or awk or sh, but it exceeds their capabilities or must run a little
       faster, and you don't want to write the silly thing in C, then perl may
       be for you.  There are also translators to turn your sed and awk
       scripts into perl scripts.

Updating the docs

This could use an update to perlsyn, I think, if it's not mentioned already. I don't have any plans to attempt that part of it, so consider this an open invitation.

6 Comments

As

if () {}

gives a syntax error, there must be a bug in while or if, or the docs are wrong.

Compare this:

$ perl  -e '$c = () ? 1 : 2;print $c,"\n";'
2

From the docs I would expect an empty expression to evaluate to undef.

until(){} gives a syntax error.

I wonder if Larry intended to imitate the behaviour of another language. Or if this is Larryish syntactic sugar. Or a bug.

The commits from before Perl 5 or so aren't real git commits of course; they're perl releases gathered from Usenet and maybe other places and imported into git after the fact, and the commit messages are the associated release announcements.

(And the commits from ??? until 2008 are real enough, but they were originally done in Perforce, of all things, and imported into git in 2008, when git became the master repo.)

Well, they're real git commits, for sure. But they were cloned from all the things that preceded git for Perl's version control, in an immense effort to accurately capture as much of Perl's history as was still possible.

About Peter Martini

I like thinking about machines, especially virtual machines like Perl's VM, the Java VM, and kvm/qemu.