Do not use each

The each hash iterator has two problems which almost nobody is aware of, and which might lead to unexpected and hard to detect problems. We were using the perl-compiler since 12 years in production and just recently found out that the B function walksymtable will miss random symbols from time to time. It entirely depends on hash randomization. I cannot predict what each did to your code.

First problem: Do not change the data

Adding or deleting a key might lead to rehashing, which will re-order the keys. The iterator will just continue and then either miss keys or get keys again a second time.

perldoc -f each:

If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated--so don't do that. Exception: It is always safe to delete the item most recently returned by "each()", so the following code works properly:

while (($key, $value) = each %hash) {
    print $key, "\n";
    delete $hash{$key};   # This is safe
}

Below is the bug fixed with http://perl5.git.perl.org/perl.git/commit/5cc8528c900964306cba9b53c6eaa27af540eaea?f=ext/B/B.pm but even the p5p committer was not able to understand the underlying problem, why it failed. This bug was in core perl since 5.005 (1998) until 5.18.

sub walksymtable {
    my ($symref, $method, $recurse, $prefix) = @_;
    my $sym;
    my $ref;
    my $fullname;
    no strict 'refs';
    $prefix = '' unless defined $prefix;
    while (($sym, $ref) = each %$symref) {
        $fullname = "*main::".$prefix.$sym;
        if ($sym =~ /::$/) {
            $sym = $prefix . $sym;
            if (svref_2object(\*$sym)->NAME ne "main::"
                && $sym ne "<none>::"
                && &$recurse($sym))
            {
                walksymtable(\%$fullname, $method, $recurse, $sym);
            }
        } else {
            svref_2object(\*$fullname)->$method();
        }
    }
}

The fix is to replace

while (($sym, $ref) = each %$symref)

with

foreach my $sym (keys %$symref) {
    my $ref = $symref->{$sym};

walksymtable walks all entries in stashes (symbol table hashes) and all its children, and calls a method on each symbol. There is no apparent destructive code in walksymtable which could change the stash, but calling methods might lead to changes. Here it's &$recurse and ->$method(). It could be totally unexpected, e.g. loading a module which changes the symboltable or @ISA hierarchy, which might lead to skipped symbols. Or just parsing normal perl code, such as attributes or named regex references will magically load code and change the symbol table.

each is a bit more performant then foreach, because foreach stores the list to iterate over and just advances the index into it. But you are still able to change entries in the hash with for/foreach, you are safe to use recursion and you are safe from unexpected side-effects as in the walksymtable function.

Second problem: Not re-entrant

while (($key, $value) = each %hash) {
    use Data::Dumper;
    print $key, "\n";
    delete $hash{$key};
    print Dumper(\%hash);  # This is a bug
}

The problems is that Data::Dumper internally uses each to iterate over the hash, which resets the iterator of the hash to the end. The first while loop will not be able continue, the iterator cannot even be stored and reset manually. This is a Data::Dumper bug not detected and fixed in core perl for 16 years, since perl 5.005 (1998).

each not being re-entrant is not even documented.

The hash iterators should have really been stored in the op, not the hash. This is one of the oldest perl5 bugs which will never be fixed.

So each should be treated as in php: Avoid it like a plague. Only use it in optimized cases where you know what you are doing, but not in libraries as you saw with Data::Dumper.

9 Comments

Not using each() seems like good advice in general.

I think that second problem unfairly singles out Data::Dumper, though - the bug is in the example code, not Data::Dumper itself. The iterator would also be reset if Dumper() used keys() instead of each(). Your example only masks that because it's unconditionally deleting each element - I'd expect something like delete $hash{$key} if rand > 0.5; should start showing duplicate output, for example.

So it's the same as the first issue - "don't call other code that can access the hash you're using each() on, unless you can guarantee it's not going to touch the iterator now or in future".

This shared iterator behaviour is documented:

Each hash or array has its own internal iterator, accessed by "each", "keys", and "values". The iterator is implicitly reset when "each" has reached the end as just described; it can be explicitly reset by calling "keys" or "values" on the hash or array.

although that could really do with a more explicit warning!

Yeah, the rule is to use each only if nobody else anywhere is looking at the hash. You got a hash passed in? Don’t use each. You’re passing the hash to anyone? Don’t use each. If you’re looping over the hash in the same scope it was declared in, and only that scope ever looks at it, then each is fine. That leaves quite a few places where you can use each, but it also means there are tons of places where you can’t.

I would hope this isn’t something that “almost nobody is aware of” though! I realised this almost as soon as I first learned about each, not more than a few months after I started learning Perl. The part about keys resetting the iterator should be a dead give-away.

(I also realised immediately that for a very similar reason, the flip-flop operator is essentially useless outside of one-liners – only much worse, because there is no way to reset its state.)

I've also avoided 'each' since a few years ago. Is there a Perl::Critic policy to forbid using each? There should be. And perhaps perl can emit experimental warning for usage of 'each' in version 5.2x? :-)

If you’re looping over the hash in the same scope it was declared in, and only that scope ever looks at it, then each is fine.
...provided you don't modify the scoped hash while you're looping, don't nest loops over the same hash, and (if you're looping through the hash more than once in the scope) that you don't terminate one of the loops prematurely (i.e. last out of it) leaving the iterator half way through the hash at the start of the next loop.

All in all, each seems so brittle that it's just much safer to avoid it entirely. Even when you know what you're doing and your original usage was entirely safe, there's no guarantee that a less careful or less knowledgeable future maintainer won't inadvertently extend the code in one of those many subtly dangerous ways.

Incidentally, the Var::Pairs module provides a(n expensive) simulation of per-op each, if you really like the iterative approach, but don't want to use the built-in per-hash each.

I only hit a problem with each recently (see http://www.martin-evans.me.uk/node/159). Using a hash in list context will use the iterator internally so simply assigning the hash you are iterating over to another hash will cause an infinite loop.

I now never use each. Even if you find the narrow safe cases someone else can come along and easily break your code.

I've also seen situations where each was called in a loop that short-circuits:

while( my ($key, $val) = each %hash ) {
    last if $key =~ /foo/;
}

Then later on, each is used again without resetting the internal state, even though the programmer expected to iterate over the entire hash. With a hash stored globally (yes, this is bad in itself, but it happens) in a persistent environment like mod_perl, this can even happen across different requests in the same process.

I wonder if there's some way to make the each op warn if the iterator isn't where it's expected to be.

Unfortunately, glob is similarly broken:


perl -E 'for my $x (qw(* )) { print "$x: ", scalar glob($x), "\n"}'

Sorry, should have been:

perl -E 'for my $x (qw(* < >)) { print "$x: ", scalar glob($x), "\n"}'

Leave a comment

About Reini Urban

user-pic Working at cPanel on B::C (the perl-compiler), p2, types, parrot, B::Generate, cygwin perl and more guts (LLVM, jit, optimizations).