A widespread and broken Perl idiom
The following code uses a widespread Perl idiom, taking advantage of features designed for one-liners:
my $content = do { local ( @ARGV, $/ ) = ("$file"); <> };
The diamond operator <> reads the files named in @ARGV. Another trick here is the list assignment, which slurps all elements of the list into @ARGV, leaving $/ undefined; that way, the diamond operator reads the entire file at once.
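Here is a minimal Perl sketch of those mechanics. It assumes the core File::Temp module to create a throwaway file, and the file contents are made up for illustration. The single <> call returns the whole file, and $/ is back to its default value once the do block exits.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway input file (illustrative contents), removed at program exit.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "line 1\nline 2\nline 3\n";
close $fh or die "Cannot close $file: $!";

# @ARGV swallows the whole right-hand list, so $/ gets no value (undef)
# and <> returns the entire file as a single string.
my $content = do { local ( @ARGV, $/ ) = ("$file"); <> };

print "slurped ", length($content), " characters\n";
```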
I've been using and publicizing this local @ARGV idiom for years. And last week I spent several days chasing a bug caused by this very line of code.
The bug was hard to find
I'm using wallflower to generate a static version of a dynamic site. This site has several URLs that can't be reached from the root of the site, so I use the --filter option to feed it all the non-crawlable URLs.
In filter mode, the code ends up reading all files on the command line (or STDIN if no files are given) using the same idiom:
local @ARGV = @_;
while (<>) {
...
}
In my case, the pages were not generated, even when explicitly listed. In fact, only the first link from the first file fed to the script was actually read (and recursively crawled).
It took me a while to figure out that Perl believed the ARGV filehandle (the one read by <>) had reached end-of-file, and stopped the while loop after reading a single line. (I have to thank the Perl debugger for this, especially its { command, which allowed me to keep track of the value of eof ARGV while stepping through my program.)
I fixed the issue in a small commit. (And yes, that means the --filter feature has been broken from the moment it was introduced, in October 2012.)
The cause is well documented
The issue is that the magical <> touches all three of @ARGV, $ARGV and the ARGV filehandle, so localizing @ARGV is not enough.
The inner code that slurped an entire file using <> and an undef $/ left ARGV at end-of-file. This caused the outer while loop to stop far too early. Talk about action at a distance!
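The failure can be reproduced in a few lines. This is a hypothetical reconstruction, not wallflower's code: make_file, $list and $other are illustrative names, and the core File::Temp module provides the temporary files. The outer loop reads a three-line file, and each iteration slurps a second file with the broken idiom. Because the ARGV handle is still open on the outer file, the inner <> keeps reading from it: it swallows the remaining lines of $list instead of opening $other, and the next outer <> finds ARGV at end-of-file with nothing left in @ARGV.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a temporary file and return its name.
sub make_file {
    my ($text) = @_;
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    return $name;
}

my $list  = make_file("first\nsecond\nthird\n");  # outer input, one item per line
my $other = make_file("other content\n");         # file slurped in each iteration

my ( @seen, @slurped );
{
    local @ARGV = ($list);
    while (<>) {
        chomp;
        push @seen, $_;

        # Broken idiom: @ARGV and $/ are localized, but the ARGV handle is not.
        # ARGV is still open on $list, so this <> reads the rest of $list.
        my $content = do { local ( @ARGV, $/ ) = ($other); <> };
        push @slurped, $content;
    }
}

print "outer loop saw: @seen\n";
print "inner slurp got: $slurped[0]";
```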
This bit from perlop about <> says it all:
It really does shift the @ARGV array and put the current filename into the $ARGV variable. It also uses filehandle ARGV internally. <> is just a synonym for <ARGV>, which is magical.
perlvar has the details about those three variables:
@ARGV: The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself. See $0 for the command name.
$ARGV: Contains the name of the current file when reading from <>.
ARGV: The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing "*ARGV" as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in @ARGV.
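A small sketch of two of those behaviours: <> shifts each name off @ARGV as it opens the corresponding file, and $ARGV holds the name of the file currently being read. The two temporary files (created with the core File::Temp module) and their one-line contents are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two illustrative input files with one line each.
my @files;
for my $text ( "alpha\n", "beta\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

my ( @names, $left );
{
    local @ARGV = @files;
    while (<>) {
        push @names, $ARGV;    # $ARGV: name of the file <> is reading now
    }
    $left = @ARGV;             # <> shifted each name off @ARGV as it opened it
}

print "lines came from: @names\n";
print "names left in \@ARGV: $left\n";
```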
The fix is simple
Whenever using local @ARGV in combination with <>, one should always localize the entire *ARGV glob.
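A minimal sketch of the fix, assuming a setup where an outer while (<>) loop reads a list file and each iteration slurps another file. The names make_file, $list and $other are illustrative, and the core File::Temp module provides the temporary files. Both the outer and the inner code localize *ARGV before filling @ARGV, so the inner <> works on a fresh ARGV handle, @ARGV and $ARGV, and the outer ones are restored untouched when the do block exits.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a temporary file and return its name.
sub make_file {
    my ($text) = @_;
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    return $name;
}

my $list  = make_file("first\nsecond\nthird\n");  # outer input, one item per line
my $other = make_file("other content\n");         # file slurped in each iteration

my ( @seen, @slurped );
{
    local *ARGV;              # fresh $ARGV, @ARGV and ARGV handle for this block
    @ARGV = ($list);
    while (<>) {
        chomp;
        push @seen, $_;

        # Fixed idiom: localize the entire *ARGV glob, then fill the new @ARGV.
        my $content = do { local ( *ARGV, $/ ); @ARGV = ($other); <> };
        push @slurped, $content;
    }
}

print "outer loop saw: @seen\n";
print "inner slurps: ", scalar(@slurped), "\n";
```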
So could you show us the correct idiom, please?
This is explained in the last section, and the linked patch shows an example. The first code excerpt became:
my $content = do { local ( *ARGV, $/ ); @ARGV = ("$file"); <> };
To be convinced of how widespread the technique is, just check: https://grep.metacpan.org/search?q=local.*@ARGV
I became aware of this as far back as my PerlMonks days, so at least 15 years ago. Namely, I memorised that I should be writing the idiom like this:
my $content = do { local ( *ARGV, $/ ) = [ ... ]; <> };
This differs from the incorrect version by exactly 3 character swaps.
Sadly, this elegant approach doesn't solve the original problem.
Localized assignment to a typeglob only localizes the slot being assigned to. The rest of the typeglob remains unlocalized, which means the magic <> still messes up the global $ARGV and the global *ARGV{IO} filehandle. For example:
produces:
Oh, wow.
I had to go all the way to
to make it work in a single statement. Even something like
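wouldn’t work, despite the fact that circumfix deref is effectively a do { } block with a whole separate inner scope (e.g. *{ my $x = 'hi'; local *ARGV } = [ __FILE__ ]; say $x; is a strict vars violation).
I don’t think local is that super-intelligent, which means what’s going on must be something like this: local only schedules the localisation, and it doesn’t actually happen until the next point at which the temps stack gets cleaned up… or something like that. (I’m not actually a guts hacker, unfortunately. It would help to read the actual implementation of localisation…)
PS: There is of course no point in writing it that way once it becomes that subtle and that much of a mouthful. The two-statement version is simple and obvious.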
only schedules localisation but that it doesn’t actually happen until the next point at which the temps stack gets cleaned up… or something like that. (I’m not actually a guts hacker, unfortunately. It would help to read the actual implementation of localisation…)PS: There is of course no point in writing it that way once it becomes that subtle and that much of a mouthful. The two-statement version is simple and obvious.