Copy the script as quine.pl, run it, and you'll get a quine.pdf which can also be mounted as a filesystem:
bash$ file quine.pdf
quine.pdf: # ISO 9660 CD-ROM filesystem data 'CDROM'
bash$ sudo mount -o ro quine.pdf /mnt
bash$ ls /mnt
quine.pdf
bash$
quine.pl:
#!/usr/bin/perl
use strict;
use warnings;
our $/ = undef;
open STDERR, ">", "/dev/null";
open STDOUT, ">", "/dev/null";
my $name = "quine";
system "enscript","-p","$name.ps","$name.pl";
system "ps2pdf", "$name.ps", "$name.pdf";
system "genisoimage", "-o", "$name.iso", "$name.pdf";
unlink "$name.ps";
my ($iso, $pdf);
{ open my $FH, "<", "$name.iso"; $iso = <$FH>; close $FH; unlink "$name.iso"; }
{ open my $FH, "<", "$name.pdf"; $pdf = <$FH>; close $FH; unlink "$name.pdf"; }
substr($iso, 0, length($pdf)) = $pdf;
open my $FH, ">", "$name.pdf";
print $FH $iso;
close $FH;
The magic trick? ISO 9660 ignores the first 32K, so we can stuff anything at all we want in there without affecting the integrity of the filesystem.
(Assumes requisite tools are installed and /dev/null exists)
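To make the trick concrete, here is a minimal sketch of how one might check that a blob of bytes looks like both formats at once. It assumes the standard ISO 9660 layout, where the first volume descriptor sits right after the 32K system area and carries the identifier "CD001". The helper name detect_formats and the in-memory stand-in for quine.pdf are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Size of the ignored system area: 16 sectors of 2048 bytes each.
my $SYSTEM_AREA = 16 * 2048;

# Hypothetical helper: returns the names of the formats whose
# magic bytes are present in $data.
sub detect_formats {
    my ($data) = @_;
    my @found;
    # PDF files begin with the "%PDF-" marker.
    push @found, "pdf" if substr($data, 0, 5) eq "%PDF-";
    # A volume descriptor is one type byte followed by "CD001".
    push @found, "iso9660"
        if length($data) >= $SYSTEM_AREA + 6
        && substr($data, $SYSTEM_AREA + 1, 5) eq "CD001";
    return @found;
}

# Build a tiny in-memory stand-in instead of reading a real file.
my $pdf_head = "%PDF-1.4\n";
my $fake = $pdf_head
         . ("\0" x ($SYSTEM_AREA - length($pdf_head)))
         . "\x01CD001";

print join(",", detect_formats($fake)), "\n";   # prints "pdf,iso9660"
```

The same check could be pointed at a real quine.pdf by reading the file in binary mode and passing its contents to the helper.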

while () { ... } is legal and acts as an infinite loop. This is awesome! I occasionally have need for a while(true) loop, and I like while() better than while(1).
But I couldn't stop there: I had to find out why, and whether it's a feature I can count on.
(If you think you'll go cross-eyed looking at parsing details, please still check out the commit message I found at the end; it's a neat bit of history.)
First things first, I went to check the docs to see if I could find this mentioned. perldoc -f while points to perldoc perlsyn, but I couldn't find anything referring to this on that page - although I did find that while while() is exactly while(1), while(()) is equivalent to while(0) - since in the second case, there actually is an expression inside that while, that expression is an empty list, and the docs point out the empty list evaluates as false. Fun.
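The difference between the two forms is easy to see in a small sketch. The counter names are made up for illustration, and the first loop bails out by hand since it would otherwise never end.

```perl
use strict;
use warnings;

# while() has no condition at all, so it behaves like while(1).
my $empty_cond = 0;
while () {
    $empty_cond++;
    last if $empty_cond >= 3;   # leave the otherwise infinite loop
}

# while(()) tests an empty list, which is false, so the body never runs.
my $empty_list = 0;
while (()) {
    $empty_list++;
    last;
}

print "while() ran $empty_cond times; while(()) ran $empty_list times\n";
```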
Since I didn't see anything in the docs, I figured I'd check the other docs - the code base.
I started my investigation by checking to see what my system perl thinks I wrote when I ask it to do while(){}. To do that I used the module Deparse, which is conveniently part of the base install:
bash $ perl -MO=Deparse -e 'while(){}'
while (1) {
    ();
}
-e syntax OK
Compare that to our false case above:
bash $ perl -MO=Deparse -e 'while(()){}'
while (()) {
    ();
}
-e syntax OK
The Deparse module (really B::Deparse, a compiler backend module) simply turns the compiled form of your program (the op tree) back into perl code. while() without any arguments decompiles as if I'd passed in a 1, while while(()) comes back exactly as I'd written it. Interesting.
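The same check can be run from inside a program rather than from the command line. This is a small sketch using B::Deparse's coderef2text method on an anonymous sub; the exact whitespace of the output may differ between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# Deparse an anonymous sub whose body is a condition-less while loop.
my $deparser = B::Deparse->new;
my $text = $deparser->coderef2text(sub { while () { last } });

# The regenerated source should show the loop as "while (1)".
print "$text\n";
```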
Where does it do that?
I've spent enough time with the parser and interpreter to have a copy (copies, really) of perl already built with DEBUGGING enabled, and finger memory of which flags to pass to see what's going on. So I ran this:
bash $ perl5.19.9 -Dp -e 'while(){}'
... lots of lines snipped ...
Shifting token '(', Entering state 199
Reducing stack by rule 15 (line 227), -> remember
Entering state 295
Reading a token: Next token is token ')' (0x1)
Reducing stack by rule 69 (line 646), -> texpr
Entering state 356
Next token is token ')' (0x1)
... lots more lines snipped ...
Entering state 449
Reducing stack by rule 39 (line 435), WHILE '(' remember texpr ')' mintro mblock cont -> barestmt
Entering state 102
... and just a few more snipped ...
Pretty obvious why this is buried under DEBUGGING! The parts that are interesting are the ')', since whatever preceded it had to be the empty expression, and texpr, which is presumably that empty expression. That line (Reducing stack by rule 69 (line 646), -> texpr) may be hard to read, but it tells us exactly where to look - line 646 of that version's perly.y:
644 /* Boolean expression */
645 texpr   :       /* NULL means true */
646                         { YYSTYPE tmplval;
647                           (void)scan_num("1", &tmplval);
648                           $$ = tmplval.opval; }
649         |       expr
650         ;
The comment specifically says NULL means true, and the code of texpr says texpr is either an EXPR (like the docs say) or, if it's NULL, is considered true (by injecting the number 1). But it doesn't give me much insight into why that is.
Since the official docs don't have much on it, and the code comments don't either, it's time to check the commit history:
bash $ git blame perly.y
... lines snipped ...
^8d063cd perl.y (Larry Wall 1987-12-18 00:00:00 +0000 645) texpr : /* NULL means true
f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 646) { YYSTYPE t
f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 647) (void)sca
f05e27e5 perly.y (Dave Mitchell 2006-12-04 15:38:05 +0000 648) $$ = tmpl
^8d063cd perl.y (Larry Wall 1987-12-18 00:00:00 +0000 649) | expr
^8d063cd perl.y (Larry Wall 1987-12-18 00:00:00 +0000 650) ;
^8d063cd perl.y (Larry Wall 1987-12-18 00:00:00 +0000 651)
The comment that NULL means true dates from 1987, and checking the logs shows it to be the very first commit! I'd never seen that commit before, but I like it so much I want to quote the whole thing here:
a "replacement" for awk and sed

[ Perl is kind of designed to make awk and sed semi-obsolete. This posting will include the first 10 patches after the main source. The following description is lifted from Larry's manpage. --r$ ]

Perl is a interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds quite closely to C expression syntax. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts.
This could use an update to perlsyn, I think, if it's not mentioned already. I don't have any plans to attempt that part of it, so consider this an open invitation.

Thanks!

& specifically prevents doing that look up in the first place.
That's relatively common knowledge. There's another way to disable prototype handling, or rather, to replace prototype handling: the call checker. The prototype handler is the default call checker for any sub when it's created, and it's the only one that comes built into Perl, but it's possible to provide a replacement and attach it to individual subs of your choosing. The call checker takes effect when determining what to do when each sub is called - subject to the existing limitations of prototypes - and that enables a new class of optimizations.
The last example given for prototypes is generally thought of as a way to give a name to a constant, but what it's actually doing is declaring a sub that's constrained enough that it's legal for it to be inlined. In other words, because sub CONSTANT() { 5; } has a prototype that makes it illegal to give it arguments, and because the body of the sub is just a single constant, there's no way for that sub to have any side effects; and since it can't have any side effects, it's possible to skip calling the sub altogether and replace any calls to it with the number 5.
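One way to watch the inlining happen is to deparse a sub that uses the constant. This is a rough sketch relying on B::Deparse; the variable names are made up, and the exact formatting of the regenerated code may vary by perl version.

```perl
use strict;
use warnings;
use B::Deparse;

# Empty prototype plus a constant body: eligible for inlining.
sub CONSTANT() { 5 }

# Compile a sub that "calls" CONSTANT, then turn it back into source.
my $inlined = B::Deparse->new->coderef2text(sub {
    my $bar = CONSTANT;
    return $bar;
});

# The regenerated source assigns the literal 5; no call remains.
print "$inlined\n";
```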
The core limits that optimization to the case of an empty prototype and a sub with only a constant (roughly...) in its call checker, but since that call checker can be overridden, it's possible for a CPAN module to do the same thing but with a more expansive definition for what can be inlined. I'm working on a module to do this now and will post more on it as it develops.
It also enables having something like sub debug { print STDERR ... }, where all calls to debug are not just no-ops, but don't even exist in the compiled version of the program that's being executed. I don't know of a CPAN module for this, and would be more than happy if someone could point me to one that does this in the comments!
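For comparison, here is the long-standing idiom that gets a similar effect without a custom call checker. It is a different technique from the one described above: the caller has to guard each call with a constant by hand, and the compiler then drops the guarded statement. The names DEBUG and debug are illustrative only.

```perl
use strict;
use warnings;
use B::Deparse;

# A compile-time false constant; flip to 1 to keep the debug calls.
use constant DEBUG => 0;

sub debug { print STDERR @_, "\n" }

# The guarded call is folded away when the sub is compiled.
my $compiled = B::Deparse->new->coderef2text(sub {
    debug("expensive diagnostics") if DEBUG;
    return 42;
});

# The regenerated source no longer contains a call to debug().
print "$compiled\n";
```

The drawback, and presumably the motivation for doing this in a call checker instead, is that every call site needs the explicit "if DEBUG" guard.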
sub foo($$){}
foo 1, 2, 3; # Errors during compilation
sub foo(_){}
foo; # Equivalent to foo($_);
sub foo($){}
foo bar, baz; # Equivalent to foo(bar), baz();
sub foo(\@\@){}
foo @bar, @baz; # Equivalent to foo(\@bar, \@baz) instead of push @arglist, @bar; push @arglist, @baz; foo(@arglist)
sub foo(){ 5;}
$bar = foo; # Equivalent to writing "$bar = 5"
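Two of the prototype behaviours listed here can be tried out directly. This is a small runnable sketch; the sub names sizes and double are made up for illustration.

```perl
use strict;
use warnings;

# (\@\@): the caller writes plain arrays, the sub receives two references.
sub sizes(\@\@) {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

# ($): parsed like a named unary operator, so it takes exactly one argument.
sub double($) { return 2 * $_[0]; }

my @bar = (1, 2, 3);
my @baz = (4, 5);

my @sizes = sizes @bar, @baz;   # same as sizes(\@bar, \@baz)
my @list  = (double 10, 20);    # same as (double(10), 20)

print "@sizes | @list\n";       # prints "3 2 | 20 20"
```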
The root idea, which I should have been clearer about, was to have $/ control that kind of matching, and the reason it got shut down (or at least deferred) is the fact that it would mean pattern matching rather than sequence matching, and the list was worrying about going down the rabbit hole.

Less well known is Perl's support for the escape sequence '\R'.
What's \R?
It's definitely not the inverse of \r.
It's a pattern (so for now it's only useful in regexes) that matches whatever Unicode's TR-13, the Unicode Consortium's guidelines for what counts as a newline, treats as a line ending. It's useful in a regular expression to match \r, \n, \r\n, or a few other character sequences that are used to represent newlines, so you don't have to remember them. It's worth noting that it is a character sequence, not a character, so it doesn't really make sense in a bracketed character class.
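A quick sketch of \R in action, splitting one string that mixes several line-ending conventions. The sample text is made up for illustration.

```perl
use strict;
use warnings;

# One string using four different line-ending conventions.
my $text = "alpha\r\nbeta\ngamma\rdelta\x{2028}epsilon";

# \R treats "\r\n" as a single newline, so no empty fields
# show up between "alpha" and "beta".
my @lines = split /\R/, $text;

print scalar(@lines), " lines: @lines\n";   # prints "5 lines: alpha beta gamma delta epsilon"
```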
There has been talk on perl5-porters over the course of 5.20's development cycle about how to expand that out of regular expressions into stream processing, and while nothing has happened in core yet, the discussion inspired a module I've posted to CPAN: PerlIO::unicodeeol. Using that module will add a PerlIO layer that will convert anything that matches \R into a simple \n on input, so that text can be processed in a uniform way without regard to whether it had \r, \r\n, \n, or any of the rest as the line ending. All that's necessary to use it is to add ":unicodeeol" with binmode or when opening the file (perldoc PerlIO has far more details), and now everything considered a line ending in Unicode looks like \n.
The process is not reversible though, so this is not suitable if you want to preserve the actual line ending.
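The effect of such a layer, and why it can't be undone, can be approximated with a plain substitution. This is only a sketch of the idea on an in-memory string, not how PerlIO::unicodeeol is implemented; normalize_eol is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every Unicode line ending as "\n".
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\R/\n/g;
    return $text;
}

my $dos = normalize_eol("one\r\ntwo\r\n");
my $mac = normalize_eol("one\rtwo\r");

# Both inputs collapse to the same output, so the original
# line-ending style can't be recovered afterwards.
print "identical after normalization\n" if $dos eq $mac;
```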
Thanks to Karl Williamson for his work on Unicode in Perl, and Audrey Tang, whose PerlIO::eol I cribbed from.

All of this is what I'm working on, but I don't have a commit bit, so it's not going anywhere without getting thoroughly vetted and blessed first.
I'm posting my code at http://github.com/PeterMartini/perl, in the peter/signatures branch (meant to be kept in tandem with doy/subroutine-signatures).
GOAL:
Part 1:
In the scope of use feature "signatures" (or whatever)
sub foo($bar,$baz) {}
equivalent to sub foo { my ($bar, $baz) = @_; }
sub foo($bar,@baz) {}
equivalent to sub foo { my ($bar, @baz) = @_;}
sub foo($bar,%baz) {}
equivalent to sub foo { my ($bar, %baz) = @_;}
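For reference, the three long-hand forms that the proposed syntax would stand in for behave like this today. A small runnable sketch; the sub names are made up for illustration.

```perl
use strict;
use warnings;

# Two scalars: each parameter takes one argument.
sub scalars {
    my ($bar, $baz) = @_;
    return "$bar,$baz";
}

# Scalar plus array: the array slurps everything that is left.
sub slurpy_array {
    my ($bar, @baz) = @_;
    return scalar @baz;
}

# Scalar plus hash: the remaining arguments are read as key/value pairs.
sub slurpy_hash {
    my ($bar, %baz) = @_;
    return join ",", map { "$_=$baz{$_}" } sort keys %baz;
}

print scalars(1, 2), "\n";                      # prints "1,2"
print slurpy_array("a", 1, 2, 3), "\n";         # prints "3"
print slurpy_hash("a", x => 1, y => 2), "\n";   # prints "x=1,y=2"
```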
Part 2:
Add a *new* way to access this information. This does not replace prototype, and is not meant to. I'd vote for 'signature', either as a keyword or in some namespace (Scalar::Util?)
I haven't thought this one through yet, but the big concern will be making sure it doesn't block future growth.
Part 3:
For backwards compatibility, add a proto attribute to allow for the old behavior.
sub foo($bar,$baz) : proto($$) {}
equivalent to sub foo($$) { my ($bar,$baz) = @_;}
I don't plan to protect this by a feature.
I don't care either way whether:
sub foo($$);
sub foo($$) : proto($$){}
is legal or not, but will allow it as long as all of the prototypes match.
Note that:
sub foo($$);
sub foo($bar,$baz) : proto($$) {}
MUST have a proto attribute, or it will die due to the prototype mismatch (prototype would be none)
Part 4:
In the check phase of compilation, implicitly convert:
sub foo { my ($bar, $baz) = @_;}
to (internally) sub foo($bar,$baz){}
If and only if the assignment is the first statement of the sub, the named variables are the first entries in the PAD (for technical reasons) and @_ is not modified.
SELECTED POINTS OF CONTENTION:
1. Q: What is the value, if it's just saving keystrokes?
Possible A: It's wanted often enough that several CPAN variants exist
Possible A: It *may* improve performance
Possible A: It adds another method of introspection, allowing the code to formally declare its own parameters.
2. Q: Will this prevent custom CPAN modules from implementing their own syntax?
My A: No. The signature will be controlled by a feature, which means if a 3rd party module is used, it can simply shut off the built in signature.
3. Q: Should a named parameter list *be* the prototype (accessible through prototype(\&sub), though not necessarily affecting parsing)?
My A: For backwards compatibility reasons, no. The current prototype is a simple string; I think it may make much more sense to expose the signature information as a string or an array of hashes (depending on wantarray / GIMME), to allow room for growth.
4. Q: Do we really want to create lexical variables without an explicit ‘my’?
My A: I originally allowed for an explicit my, but there seemed to be no other reasonable options so I'd dropped it as redundant.
5. Q: What about @_?
My A: At this stage, I have no intention of touching it. It will continue to be available for read-write access, and it is necessary if a sub wants to check the count of arguments it was passed (sub foo($bar) would have no way of seeing the second, third, etc. argument).
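Both uses of @_ mentioned in that answer can be shown with today's syntax. A minimal sketch; extra_args and bump are made-up names.

```perl
use strict;
use warnings;

# Reports how many arguments arrived beyond the one named parameter.
sub extra_args {
    my ($bar) = @_;
    return @_ - 1;
}

# @_ aliases the caller's variables, so writing through it
# changes the variable that was passed in.
sub bump {
    $_[0]++;
    return;
}

my $counter = 1;
bump($counter);

print extra_args("a", "b", "c"), " extra, counter is $counter\n";   # prints "2 extra, counter is 2"
```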
I'm sure I missed plenty of questions and answers - I'll add them either here or in future posts as I spot them.