Do not use each
The each
hash iterator has two problems which almost nobody is aware
of, and which might lead to unexpected and hard to detect problems.
We were using the perl-compiler since 12 years in production and just
recently found out that the B function walksymtable
will miss random
symbols from time to time. It entirely depends on hash randomization.
I cannot predict what each
did to your code.
First problem: Do not change the data
Adding or deleting a key might lead to rehashing, which will re-order the keys. The iterator will just continue and then either miss keys or get keys again a second time.
perldoc -f each
:
If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated--so don't do that. Exception: It is always safe to delete the item most recently returned by "each()", so the following code works properly:
while (($key, $value) = each %hash) {
print $key, "\n";
delete $hash{$key}; # This is safe
}
Below is the bug fixed with http://perl5.git.perl.org/perl.git/commit/5cc8528c900964306cba9b53c6eaa27af540eaea?f=ext/B/B.pm but even the p5p committer was not able to understand the underlying problem, why it failed. This bug was in core perl since 5.005 (1998) until 5.18.
sub walksymtable {
my ($symref, $method, $recurse, $prefix) = @_;
my $sym;
my $ref;
my $fullname;
no strict 'refs';
$prefix = '' unless defined $prefix;
while (($sym, $ref) = each %$symref) {
$fullname = "*main::".$prefix.$sym;
if ($sym =~ /::$/) {
$sym = $prefix . $sym;
if (svref_2object(\*$sym)->NAME ne "main::"
&& $sym ne "<none>::"
&& &$recurse($sym))
{
walksymtable(\%$fullname, $method, $recurse, $sym);
}
} else {
svref_2object(\*$fullname)->$method();
}
}
}
The fix is to replace
while (($sym, $ref) = each %$symref)
with
foreach my $sym (keys %$symref) {
my $ref = $symref->{$sym};
walksymtable walks all entries in stashes (symbol table hashes) and
all its children, and calls a method on each symbol. There is no
apparent destructive code in walksymtable which could change the
stash, but calling methods might lead to changes. Here it's &$recurse
and ->$method()
. It could be totally unexpected, e.g. loading a module
which changes the symboltable or @ISA
hierarchy, which might lead to
skipped symbols. Or just parsing normal perl code, such as attributes
or named regex references will magically load code and change the
symbol table.
each
is a bit more performant then foreach, because foreach
stores the list
to iterate over and just advances the index into it.
But you are still able to change entries in the hash with for/foreach, you
are safe to use recursion and you are safe from unexpected
side-effects as in the walksymtable
function.
Second problem: Not re-entrant
while (($key, $value) = each %hash) {
use Data::Dumper;
print $key, "\n";
delete $hash{$key};
print Dumper(\%hash); # This is a bug
}
The problems is that Data::Dumper
internally uses each to iterate over
the hash, which resets the iterator of the hash to the end. The first
while loop will not be able continue, the iterator cannot even be
stored and reset manually.
This is a Data::Dumper bug not detected and fixed in core perl for 16
years, since perl 5.005 (1998).
each not being re-entrant is not even documented.
The hash iterators should have really been stored in the op, not the hash. This is one of the oldest perl5 bugs which will never be fixed.
So each should be treated as in php: Avoid it like a plague. Only use it in optimized cases where you know what you are doing, but not in libraries as you saw with Data::Dumper.
Not using each() seems like good advice in general.
I think that second problem unfairly singles out Data::Dumper, though - the bug is in the example code, not Data::Dumper itself. The iterator would also be reset if Dumper() used keys() instead of each(). Your example only masks that because it's unconditionally deleting each element - I'd expect something like delete $hash{$key} if rand > 0.5; should start showing duplicate output, for example.
So it's the same as the first issue - "don't call other code that can access the hash you're using each() on, unless you can guarantee it's not going to touch the iterator now or in future".
This shared iterator behaviour is documented:
although that could really do with a more explicit warning!
Yeah, the rule is to use
each
only if nobody else anywhere is looking at the hash. You got a hash passed in? Don’t useeach
. You’re passing the hash to anyone? Don’t useeach
. If you’re looping over the hash in the same scope it was declared in, and only that scope ever looks at it, theneach
is fine. That leaves quite a few places where you can useeach
, but it also means there are tons of places where you can’t.I would hope this isn’t something that “almost nobody is aware of” though! I realised this almost as soon as I first learned about
each
, not more than a few months after I started learning Perl. The part aboutkeys
resetting the iterator should be a dead give-away.(I also realised immediately that for a very similar reason, the flip-flop operator is essentially useless outside of one-liners – only much worse, because there is no way to reset its state.)
I've also avoided 'each' since a few years ago. Is there a Perl::Critic policy to forbid using each? There should be. And perhaps perl can emit experimental warning for usage of 'each' in version 5.2x? :-)
All in all, each seems so brittle that it's just much safer to avoid it entirely. Even when you know what you're doing and your original usage was entirely safe, there's no guarantee that a less careful or less knowledgeable future maintainer won't inadvertently extend the code in one of those many subtly dangerous ways.
Incidentally, the Var::Pairs module provides a(n expensive) simulation of per-op each, if you really like the iterative approach, but don't want to use the built-in per-hash each.
I only hit a problem with each recently (see http://www.martin-evans.me.uk/node/159). Using a hash in list context will use the iterator internally so simply assigning the hash you are iterating over to another hash will cause an infinite loop.
I now never use each. Even if you find the narrow safe cases someone else can come along and easily break your code.
I've also seen situations where
each
was called in a loop that short-circuits:Then later on,
each
is used again without resetting the internal state, even though the programmer expected to iterate over the entire hash. With a hash stored globally (yes, this is bad in itself, but it happens) in a persistent environment like mod_perl, this can even happen across different requests in the same process.I wonder if there's some way to make the each op warn if the iterator isn't where it's expected to be.
Unfortunately,
glob
is similarly broken:Sorry, should have been: