Mystery Buglet #2

Hey! I know, I know: long time, no blog.  I would love to blame the pandemic, but the truth is, I just haven’t been inspired with any sufficiently Perl-y topics lately.  Until recently, when I ran into this.

Now, once upon a time, I wrote a post about a small buglet I had encountered.  The post presented the problem, then asked you if you saw the problem, then explained what was going on.  So let’s do that again.  First, the code:
    sub generate_temp_file
    {
        state $TMPFILES = [];
        END { unlink @$TMPFILES }
        my $tmpfile = `script-which-generates-tempfile`;
        chomp $tmpfile;
        push @$TMPFILES, $tmpfile;
        return $tmpfile;
    }

As before, the actual code does a bit more, but I promise I’ve omitted nothing that’s relevant to the bug.  Do you see it?  If not, click to find out more.

You might be able to spot the bug just from reading the code, but I can also offer you a big hint by telling you what the actual error was.  First off, I should note that there was no error in my testing.  Then I committed the code and pushed it out to the repo, where all my fellow developers promptly downloaded it and started using it ... and there was still no error.  Then, one coworker (our sysadmin, as it happened) ran the script which used this code in a particular way, and reported this error: Can't use an undefined value as an ARRAY reference at ... and the line number that contained unlink @$TMPFILES.

To see what’s going on here, it’s worth taking a brief detour into what a state variable is.  If you’re familiar with the C-based family of languages (e.g. C and C++), it’s what they would refer to as a static local variable.  I don’t know if I agree that state was the best name for it, but it sure beats the hell out of static.

So what’s a state variable?  Well, it’s a bit like a global variable ... and also not.  To explain that apparent contradiction, and also to address why state variables are awesome while everyone knows that globals are bad, we need to pick apart that stereotype.  Are all global variables bad?  Well ... depends on what you mean by “global.” See, when we talk about a “global variable,” we’re talking about a variable with global scope.  And scope actually consists of two distinct parts: visibility, and lifetime.  Most of the variables that we refer to as “globals” are those which have global visibility and global lifetime.  And those are definitely bad.  But they’re bad because the global visibility part is bad.  That’s what causes all the trouble.  But the global lifetime part ... nothing wrong with that at all.  And a state variable is one which has a global lifetime, but only block visibility.1  And that’s not bad at all.  It’s quite useful, in fact.
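Here's a minimal sketch of that "global lifetime, block visibility" idea.  The sub name next_id and the variable $counter are made up for illustration; they're not from the real code.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical example: a counter that survives between calls.
sub next_id {
    # The assignment runs only the first time this line is executed;
    # after that, $counter keeps whatever value the last call left in it.
    state $counter = 0;
    return ++$counter;
}

print next_id(), "\n" for 1 .. 3;

# $counter is not visible out here: mentioning it at file level
# would be a compile-time error under "use strict".
```

Each call sees the value left behind by the previous call, yet nothing outside the sub can touch the variable.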

Now, if my tempfile were being generated by Perl, I would of course use File::Temp (or something which in turn used it, such as Path::Tiny), and that would handle the cleanup for me.  But, since the file is being generated by some script, I need to arrange that cleanup myself.  How do I do that?  Simple: by using an END block, which will always get called when my program exits.2  Admittedly, my simplistic use of it assumed that unlink is fine with receiving no arguments (e.g. in the case where the function hasn’t actually been called yet, and thus @$TMPFILES is empty).  By the way, if you’re wondering why I’m using state $TMPFILES = [] and @$TMPFILES instead of state @TMPFILES and @TMPFILES, it’s because one of state’s quirks is that it will only work with scalar variables.

But, as it turns out, unlink is fine with getting an empty list (I tested it).  So that wasn’t the problem.  Still, the last two sentences of the previous paragraph, when combined, contain the answer to the mystery.  If you haven’t spotted it by now, you may want to take a moment to reread them carefully and see if you see it before proceeding further.

Still stumped?

Very well, then.  Read on.

What does happen if the function is never called?  Well, $TMPFILES still exists: it’s a (sorta kinda) global, so the END block can access it perfectly fine even if the function is never executed.  And that’s important, because END blocks are compiled and registered at compile time, even though they don’t run until the program exits.  In fact, that very sticking point is why I’m using the state variable in the first place.  That is, why can’t I just do it this way?
    sub generate_temp_file
    {
        my $tmpfile = `script-which-generates-tempfile`;
        chomp $tmpfile;
        END { unlink $tmpfile }
        push @$TMPFILES, $tmpfile;
        return $tmpfile;
    }

It’s because END happens at compile-time, when $tmpfile hasn’t been set yet.  Not to mention what happens if generate_temp_file is called multiple times: the END block only gets added to the chain of END blocks for the program once, so it only gets called once: if there’s a possibility of this function happening multiple times, I need to store my tempfiles in an array.  None of those would be an issue if END happened at run-time, of course.  But that ain’t the way it works.3
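If you want to see that behavior for yourself, here's a rough sketch.  It writes a tiny child script to a temp file and runs it with the current perl, once with zero calls to the function and once with three.  The child script, the run_child helper, and the messages it prints are all invented for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical child script: the END block sits inside a sub, as in the post.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'CHILD';
use strict;
use warnings;
sub generate_temp_file {
    END { print "END block ran\n" }
    return;
}
my $calls = shift @ARGV;
generate_temp_file() for 1 .. $calls;
print "main program done after $calls call(s)\n";
CHILD
close $fh or die "Cannot close $script: $!";

# Run the child under the current perl and return everything it printed.
sub run_child {
    my ($calls) = @_;
    open my $pipe, '-|', $^X, $script, $calls
        or die "Cannot run $script: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return $output;
}

print run_child($_) for 0, 3;
```

Both runs should report the END block exactly once, after the main program finishes: it was queued when the file was compiled, so the number of calls to the enclosing sub makes no difference.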

So I’ve set it up to handle all that, by using a (semi-)global state var, which will always exist, and can contain multiple things, and can get processed once by the END block.  Except that I had to use $TMPFILES = [] instead of @TMPFILES, like I really wanted, and that’s where it all fell apart.  See, the variable certainly exists whether the function is executed or not ... but it only gets assigned the first time it’s called.  So, before the function is called for the first time, $TMPFILES is not [] ... it’s undef.  And that’s what triggered the error message, of course.  If state would let me declare an array instead of an arrayref, I wouldn’t have had the problem, but, once again: that ain’t the way it works.

So, in the end, once I finally realized the problem, the fix was trivial:
-    END { unlink @$TMPFILES }
+    END { unlink @$TMPFILES if $TMPFILES }

And now it works whether the function is called (which it always was in my testing, and always was in most of my coworkers’ usages), or whether it’s never called at all (which was the case when my sysadmin ran it).  And it taught me a valuable lesson about the interaction of seemingly unrelated implementation details of the language.  And now I’ve shared it with you.
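For the curious, here's a small sketch of the difference the guard makes.  The try_cleanup helper is hypothetical: it just runs the same unlink expression inside an eval, with whatever value $TMPFILES might hold at exit, and hands back the error text (or an empty string if nothing went wrong).

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the body of the END block, guarded or not.
sub try_cleanup {
    my ($tmpfiles, $guarded) = @_;
    my $ok = eval {
        if ($guarded) { unlink @$tmpfiles if $tmpfiles }
        else          { unlink @$tmpfiles }
        1;
    };
    return $ok ? '' : $@;
}

# Function never called: $TMPFILES is still undef at exit.
print "unguarded, undef:    ", (try_cleanup(undef, 0) || "ok\n");
print "guarded, undef:      ", (try_cleanup(undef, 1) || "ok\n");

# Function called, or at least initialized: an arrayref, possibly empty.
print "unguarded, empty []: ", (try_cleanup([], 0) || "ok\n");
```

Only the first case should die, with the same "undefined value as an ARRAY reference" complaint my sysadmin saw; the empty arrayref is harmless either way.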

Hopefully it’s been helpful.


1 Assuming you declare it inside a block.  I suppose a state var outside any block would have file visibility, or package visibility, or somesuch.  But that’s a more esoteric usage that we don’t really need to get into.

2 Well, not always ... in fact, the (unflattering) comparison between Perl’s END and bash’s trap ... EXIT was one of the points I made in my post on Perl vs shell scripts (see the “Commands on Exit” section).

3 Although, it occurs to me that, if I were using Perl 5.36+, I could probably work around this by using defer instead of END.  I think.

1 Comment

Note that in v5.28 and onwards, you can declare arrays and hashes as state variables (state @TMPFILES). Of course, that doesn't help you if you need this to run on older versions.

About Buddy Burden

16 years in California, 27 years in Perl, 36 years in computers, 57 years in bare feet.