Spelunking: why 'while(){ }' is my new favorite perl-ism
Today I saw a post by sartak mentioning a feature I didn't know about: while () { ... } is legal and acts as an infinite loop. This is awesome! I occasionally need a while(true) loop, and I like while() better than while(1).
But I couldn't stop there: I had to find out why it works, and whether it's a feature I can count on.
(If you think you'll go cross-eyed looking at parsing details, please still check out the commit message I found at the end; it's a neat bit of history.)
Checking the docs
First things first, I went to the docs to see whether this was mentioned. perldoc -f while points to perldoc perlsyn, but I couldn't find anything about it on that page. I did find that whereas while() is exactly while(1), while(()) is equivalent to while(0): in the second case there actually is an expression inside the while, namely an empty list, and the docs point out that the empty list evaluates as false. Fun.
Since I didn't see anything in the docs, I figured I'd check the other docs: the code base.
Checking the deparse
I started my investigation by checking what my system perl thinks I wrote when I ask it to run while(){}. To do that I used the Deparse module, which is conveniently part of the base install:
    bash $ perl -MO=Deparse -e 'while(){}'
    while (1) {
        ();
    }
    -e syntax OK
Compare that to our false case above:
    bash $ perl -MO=Deparse -e 'while(()){}'
    while (()) {
        ();
    }
    -e syntax OK
The Deparse module (really B::Deparse, a compiler backend module) simply turns the compiled form of your program (perl's internal op tree) back into perl code. while() without any arguments decompiles as if I'd passed in a 1, whereas while(()) comes back exactly as I'd written it. Interesting.
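To check this on your own perl, a small sketch can deparse both spellings and compare the results. It assumes B::Deparse is installed (it normally ships with perl). Identical output is consistent with the empty condition being compiled as a plain 1, though Deparse output alone can't prove the two op trees match in every detail.

```shell
#!/bin/sh
# Deparse both spellings and compare the reconstructed source.
# B::Deparse prints the code on stdout; the "-e syntax OK" note goes to
# stderr, which is discarded here.
empty=$(perl -MO=Deparse -e 'while(){}' 2>/dev/null)
one=$(perl -MO=Deparse -e 'while(1){}' 2>/dev/null)
printf '%s\n' "$empty"
if [ "$empty" = "$one" ]; then
    echo "while(){} and while(1){} deparse identically"
else
    echo "while(){} and while(1){} deparse differently"
fi
```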
Checking the parse
Where does it do that?
I've spent enough time with the parser and interpreter to have a copy (copies, really) of perl already built with DEBUGGING enabled, and finger memory of which flags to pass to see what's going on. So I ran this:
    bash $ perl5.19.9 -Dp -e 'while(){}'
    ... lots of lines snipped ...
    Shifting token '(', Entering state 199
    Reducing stack by rule 15 (line 227), -> remember
    Entering state 295
    Reading a token: Next token is token ')' (0x1)
    Reducing stack by rule 69 (line 646), -> texpr
    Entering state 356
    Next token is token ')' (0x1)
    ... lots more lines snipped ...
    Entering state 449
    Reducing stack by rule 39 (line 435), WHILE '(' remember texpr ')' mintro mblock cont -> barestmt
    Entering state 102
    ... and just a few more snipped ...
Pretty obvious why this is buried under DEBUGGING! The interesting parts are the ')', since whatever preceded it had to be the empty expression, and texpr, which is presumably that empty expression. That line (Reducing stack by rule 69 (line 646), -> texpr) may be hard to read, but it tells us exactly where to look: line 646 of that version's perly.y:
    644 /* Boolean expression */
    645 texpr   :       /* NULL means true */
    646                         { YYSTYPE tmplval;
    647                           (void)scan_num("1", &tmplval);
    648                           $$ = tmplval.opval; }
    649         |       expr
    650         ;
The comment says it outright: NULL means true. The texpr rule is either an expr (the EXPR the docs talk about) or, if it's NULL (empty), it is treated as true by injecting the number 1. But that doesn't give me much insight into why.
Checking the history
Since the official docs don't have much on it, and neither do the code comments, it's time to check the commit history:
    bash $ git blame perly.y
    ... lines snipped ...
    ^8d063cd perl.y  (Larry Wall 1987-12-18 00:00:00 +0000 645) texpr : /* NULL means true
    f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 646) { YYSTYPE t
    f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 647) (void)sca
    f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 648) $$ = tmpl
    ^8d063cd perl.y  (Larry Wall 1987-12-18 00:00:00 +0000 649) | expr
    ^8d063cd perl.y  (Larry Wall 1987-12-18 00:00:00 +0000 650) ;
    ^8d063cd perl.y  (Larry Wall 1987-12-18 00:00:00 +0000 651)
The comment that NULL means true dates from 1987, and checking the logs shows it to be the very first commit! I'd never seen that commit before, but I like it so much I want to quote the whole thing here:
a "replacement" for awk and sed

[ Perl is kind of designed to make awk and sed semi-obsolete. This posting will include the first 10 patches after the main source. The following description is lifted from Larry's manpage. --r$ ]

Perl is a interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks.

The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.)

Expression syntax corresponds quite closely to C expression syntax. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts.
Updating the docs
This could use an update to perlsyn, I think, if it's not mentioned already. I don't have any plans to attempt that part of it, so consider this an open invitation.
As
gives a syntax error, there must be a bug in while or if, or the docs are wrong.
Compare this:
From the docs I would expect an empty expression to evaluate to undef.
The docs say WHILE takes an EXPR, but the grammar actually has it as a texpr, which is either an EXPR or, if none is supplied, a true value (1). It's the same block of code that allows for(;;){} to mean an infinite loop (and that's the only other place where it applies).
until(){} gives a syntax error.
I wonder if Larry intended to imitate the behaviour of another language. Or if this is Larryish syntactic sugar. Or a bug.
The commits from before Perl 5 or so aren't real git commits of course; they're perl releases gathered from Usenet and maybe other places and imported into git after the fact, and the commit messages are the associated release announcements.
(And the commits from ??? until 2008 are real enough, but they were originally done in Perforce, of all things, and imported into git in 2008, when git became the master repo.)
Well, they're real git commits, for sure. But they were imported from all the systems that preceded git for Perl's version control, through an immense effort to accurately capture as much of Perl's history as was still possible.