Mystery Buglet #2
Hey! I know, I know: long time, no blog. I would love to blame the pandemic, but the truth is, I just haven’t been inspired with any sufficiently Perl-y topics lately. Until recently, when I ran into this.
Now, once upon a time, I wrote a post about a small buglet I had encountered. The post presented the problem, then asked you if you saw the problem, then explained what was going on. So let’s do that again. First, the code:
    sub generate_temp_file
    {
        state $TMPFILES = [];
        END { unlink @$TMPFILES }
        my $tmpfile = `script-which-generates-tempfile`;
        chomp $tmpfile;
        push @$TMPFILES, $tmpfile;
        return $tmpfile;
    }
As before, the actual code does a bit more, but I promise I’ve omitted nothing that’s relevant to the bug. Do you see it? If not, click to find out more.
You might be able to spot the bug just from reading the code, but I can also offer you a big hint by telling you what the actual error was. First off, I should note that there was no error in my testing. Then I committed the code and pushed it out to the repo, where all my fellow developers promptly downloaded it and started using it ... and there was still no error. Then, one coworker (our sysadmin, as it happened) ran the script which used this code in a particular way, and reported this error:
    Can't use an undefined value as an ARRAY reference at ...
and the line number that contained unlink @$TMPFILES.
To see what’s going on here, it’s worth taking a brief detour into what a state variable is. If you’re familiar with the C-based family of languages (e.g. C and C++), it’s what they would refer to as a static local variable. I don’t know if I agree that state was the best name for it, but it sure beats the hell out of static.
So what’s a state variable? Well, it’s a bit like a global variable ... and also not. To explain that apparent contradiction, and also to address why state variables are awesome while everyone knows that globals are bad, we need to pick apart that stereotype. Are all global variables bad? Well ... depends on what you mean by “global.” See, when we talk about a “global variable,” we’re talking about a variable with global scope. And scope actually consists of two distinct parts: visibility, and lifetime. Most of the variables that we refer to as “globals” are those which have global visibility and global lifetime. And those are definitely bad. But they’re bad because the global visibility part is bad. That’s what causes all the trouble. But the global lifetime part ... nothing wrong with that at all. And a state variable is one which has a global lifetime, but only block visibility.1 And that’s not bad at all. It’s quite useful, in fact.
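To make that concrete, here’s a minimal sketch of the lifetime-versus-visibility split. The next_ticket function is made up purely for illustration; it isn’t part of the code in question.

```perl
use strict;
use warnings;
use feature 'state';

sub next_ticket
{
    # Initialized once, on the first call; the value then survives
    # between calls (global lifetime) ...
    state $count = 0;
    return ++$count;
}

my @tickets = map { next_ticket() } 1 .. 3;
print "@tickets\n";    # prints "1 2 3"

# ... but $count is not visible out here (block visibility). Under
# strict, mentioning it at this point would be a compile-time error.
```

A plain my variable in the same spot would be reset on every call, and every ticket would come back as 1.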
Now, if my tempfile were being generated by Perl, I would of course use File::Temp (or something which in turn used it, such as Path::Tiny), and that would handle the cleanup for me. But, since the file is being generated by some script, I need to arrange that cleanup myself. How do I do that? Simple: by using an END block, which will always get called when my program exits.2 Admittedly, my simplistic use of it assumed that unlink is fine with receiving no arguments (e.g. in the case where the function hasn’t actually been called yet, and thus @$TMPFILES is empty). By the way, if you’re wondering why I’m using state $TMPFILES = [] and @$TMPFILES instead of state @TMPFILES and @TMPFILES, it’s because one of state’s quirks is that (at least on Perls older than 5.28) it will only work with scalar variables.
But, as it turns out, unlink is fine with getting an empty list (I tested it). So that wasn’t the problem. Still, the last two sentences of the previous paragraph, when combined, contain the answer to the mystery. If you haven’t spotted it by now, you may want to take a moment to reread them carefully and see if you spot it before proceeding further.
Still stumped?
Very well, then. Read on.
What does happen if the function is never called? Well, $TMPFILES still exists: it’s a (sorta kinda) global, so the END block can access it perfectly fine even if the function is never executed. And that’s important, because END blocks are set up at compile time, even though they don’t run until the program exits. In fact, that very sticking point is why I’m using the state variable in the first place. That is, why can’t I just do it this way?
    sub generate_temp_file
    {
        my $tmpfile = `script-which-generates-tempfile`;
        chomp $tmpfile;
        END { unlink $tmpfile }
        return $tmpfile;
    }
It’s because END gets set up at compile-time, when $tmpfile hasn’t been set yet. Not to mention what happens if generate_temp_file is called multiple times: the END block only gets added to the chain of END blocks for the program once, so it only gets called once. If there’s a possibility of this function happening multiple times, I need to store my tempfiles in an array. Neither of those would be an issue if END happened at run-time, of course. But that ain’t the way it works.3
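If you’d like to see the “registered once, at compile time” behavior for yourself, here’s a rough sketch. Both run_perl and noisy are names I made up for the demonstration: run_perl just runs a snippet in a fresh perl (via $^X) so that we can capture what its END block prints on the way out.

```perl
use strict;
use warnings;

# Run a snippet in a fresh perl and return everything it prints.
sub run_perl
{
    my ($code) = @_;
    open(my $pipe, '-|', $^X, '-e', $code) or die "cannot run perl: $!";
    my $out = do { local $/; <$pipe> };
    close $pipe;
    return $out;
}

# The END block lives inside the sub, but it is registered when the
# sub is compiled, not each time the sub runs.
my $sub_def = 'sub noisy { END { print "END ran\n" } }' . "\n";

my $called = run_perl($sub_def . 'noisy() for 1 .. 3; print "main done\n";');
my $never  = run_perl($sub_def . 'print "main done\n";');

print $called;    # "main done", then "END ran" exactly once
print $never;     # the same two lines, even though noisy() never ran
```

Three calls or zero calls, the END block fires exactly once, after the main program is finished.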
So I’ve set it up to handle all that, by using a (semi-)global state var, which will always exist, and can contain multiple things, and can get processed once by the END block. Except that I had to use $TMPFILES = [] instead of @TMPFILES, like I really wanted, and that’s where it all fell apart. See, the variable certainly exists whether the function is executed or not ... but it only gets assigned the first time the function is called. So, before the function is called for the first time, $TMPFILES is not [] ... it’s undef. And that’s what triggered the error message, of course. If state would let me declare an array instead of an arrayref, I wouldn’t have had the problem, but, once again: that ain’t the way it works.
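Here’s a small sketch that reproduces the situation, under the assumption that a stripped-down stand-in behaves like the real thing. The names remember, $LIST, and run_perl are all hypothetical; the END block simply reports what it finds in the state variable.

```perl
use strict;
use warnings;

# Run a snippet in a fresh perl and return everything it prints.
sub run_perl
{
    my ($code) = @_;
    open(my $pipe, '-|', $^X, '-e', $code) or die "cannot run perl: $!";
    my $out = do { local $/; <$pipe> };
    close $pipe;
    return $out;
}

# A stand-in for generate_temp_file: same state-plus-END shape, but the
# END block only reports what it sees in the state variable.
my $sub_def = <<'PERL';
use strict;
use feature 'state';
sub remember
{
    state $LIST = [];
    END { print( defined($LIST) ? ref($LIST) : 'undef' ) }
    return;
}
PERL

my $never_called = run_perl($sub_def);
my $called_once  = run_perl($sub_def . 'remember();');

print "never called: $never_called\n";    # never called: undef
print "called once:  $called_once\n";     # called once:  ARRAY
```

The variable is there either way, but the = [] part only happens once the function actually runs.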
The fix is a one-line change:
- END { unlink @$TMPFILES }
+ END { unlink @$TMPFILES if $TMPFILES }
And now it works whether the function is called (which it always was in my testing, and always was in most of my coworkers’ usages), or whether it’s never called at all (which was the case when my sysadmin ran it). And it taught me a valuable lesson about the interaction of seemingly unrelated implementation details of the language. And now I’ve shared it with you.
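For anyone who wants to try the guarded version without having my generator script handy, here’s a self-contained sketch. The run_generator_script function is a made-up stand-in for the real external script: it uses File::Temp only to create a file that nothing will clean up automatically, which is the situation the real script leaves me in.

```perl
use strict;
use warnings;
use feature 'state';
use File::Temp ();

# Hypothetical stand-in for `script-which-generates-tempfile`: creates a
# file that will NOT be removed automatically, and returns its name.
sub run_generator_script
{
    my ($fh, $name) = File::Temp::tempfile(UNLINK => 0);
    close $fh;
    return $name;
}

sub generate_temp_file
{
    state $TMPFILES = [];
    # Guarded: $TMPFILES is still undef if this sub was never called.
    END { unlink @$TMPFILES if $TMPFILES }

    my $tmpfile = run_generator_script();
    push @$TMPFILES, $tmpfile;
    return $tmpfile;
}

my $file = generate_temp_file();
print "created $file\n";
```

Comment out the last two lines and the program should still exit cleanly, because the guard skips the dereference entirely.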
Hopefully it’s been helpful.
__________
1 Assuming you declare it inside a block. I suppose a state var outside any block would have file visibility, or package visibility, or somesuch. But that’s a more esoteric usage that we don’t really need to get into.
2 Well, not always ... in fact, the (unflattering) comparison between Perl’s END and bash’s trap ... EXIT was one of the points I made in my post on Perl vs shell scripts (see the “Commands on Exit” section).
3 Although, it occurs to me that, if I were using Perl 5.36+, I could probably work around this by using defer instead of END. I think.
Note that in v5.28 and onwards, you can declare arrays and hashes as state variables (state @TMPFILES). Of course, that doesn't help you if you need this to run on older versions.