Closures

A casual remark about closures which I made in My Favorite Warnings: redefine touched off a long off-topic exchange with Aristotle that I thought ought to be promoted to a top-level blog entry. The big thing I learned was that any Perl subroutine can be a closure. The rest of this blog post will try to make clear why I now believe this. The words are my own, as are any errors or misconceptions.

The second sentence of Wikipedia's definition of a closure says "Operationally, a closure is a record storing a function together with an environment." This makes it sound a lot like an object, and therefore of little additional interest in an O-O environment.

But I came to closures pragmatically through Perl, and to me they were a magic way to make data available somewhere else. All I had to do was get a code reference where it needed to be, and any external lexical variables got the values at the time the reference was taken. So much I understood up to the fatal blog post, and it sufficed for my simple needs.

A fairly typical Perl example of closures in action is:

use v5.10;    # enables say

my $add_one = make_adder( 1 );
my $add_two = make_adder( 2 );
say $add_one->( 5 );  # 6
say $add_two->( 8 );  # 10

sub make_adder {
    my ( $addend ) = @_;
    return sub { $addend + $_[0] };
}

A deeper understanding of this magic requires a look at how it works. In this case, the man behind the curtain is the lexical variable $addend.

The usual argument for using lexical variables is that they are only accessible in a given scope -- specifically from the point they come into existence with my to the end of the enclosing scope. Compared to global variables this is a big win in itself. But closures make use of another property of lexical variables: that each entry into the scope results in a new instance of the variable. The returned code reference incorporates not only the code itself but the current instance of any relevant lexicals, in this case $addend. In fact if you add to the above code example

say '$add_one is ', $add_one;
say '$add_two is ', $add_two;

you will see that although both variables nominally refer to the same subroutine, different values are printed, because each call to make_adder() returned a distinct code reference.
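The "new instance on each entry into the scope" property is easy to see in a loop as well. The following is only a minimal sketch; the names @subs, $n, and @results are mine, not part of the example above.

```perl
use strict;
use warnings;
use feature 'say';

# Build three closures in a loop. Each pass through the loop body
# creates a fresh instance of $n, so each closure keeps its own value.
my @subs;
for my $i ( 1 .. 3 ) {
    my $n = $i * 10;
    push @subs, sub { return $n + $_[0] };
}

# Call each closure with the same argument.
my @results = map { $_->( 1 ) } @subs;
say "@results";    # 11 21 31
```

If all three closures shared one $n, the three results would be identical.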

To differentiate the behaviour of closures from a simple object, consider:

use v5.10;    # enables say

my ( $add_two, $mul_two, $chg_two ) = make_closures( 2 );
my ( $add_four, $mul_four, $chg_four ) = make_closures( 4 );

say $add_two->( 3 );  # 5
say $mul_two->( 3 );  # 6
say $add_four->( 3 ); # 7
say $mul_four->( 3 ); # 12

$chg_two->( 5 );      # What will happen now?

say $add_two->( 3 );  # 8 !!!
say $mul_two->( 3 );  # 15 !!!
say $add_four->( 3 ); # Still 7
say $mul_four->( 3 ); # Still 12

sub make_closures {
    my ( $operand ) = @_;
    return(
        sub { $operand + $_[0] },
        sub { $operand * $_[0] },
        sub { $operand = $_[0] },
    );
}

The point of this example is that all three closures share not only the value of $operand but the actual instance. If the value of the instance is changed by one closure, the other closures over that instance of the variable see the new value; but closures over other instances of that same variable do not. This was new to me, but not really startling.
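A more everyday use of the same sharing is a pair of closures that keep private state between them. This is just a sketch to illustrate the idea; make_counter() and the variable names are hypothetical and do not come from the example above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns two closures over one instance of $count.
sub make_counter {
    my $count = 0;
    return (
        sub { return ++$count },    # bump the shared counter
        sub { return $count },      # peek at the shared counter
    );
}

# Two calls give two independent instances of $count.
my ( $bump_a, $peek_a ) = make_counter();
my ( $bump_b, $peek_b ) = make_counter();

$bump_a->() for 1 .. 3;
$bump_b->();

say $peek_a->();    # 3
say $peek_b->();    # 1
```

Nothing outside the two closures can reach $count, which is about as private as Perl data gets.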

The big surprise to me was the realization that any Perl subroutine can be a closure. A normal named subroutine simply closes over the first instance of the variable -- the one that was current when it was compiled.

To see this, add to the end of sub make_closures, just before the right curly bracket,

    no warnings 'closure';
    sub add_first { $operand + $_[0] }

Subroutine add_first() will only see the first instance of $operand, which is the point of this example. But we need to silence a warning since nested named subroutines making use of the outer subroutine's arguments are so far from normal Perl that they may well represent an error on the part of the programmer. (Translation of the previous sentence: Don't try this at home.)

If we then insert calls to add_first( 3 ) both before and after the call to $chg_two->( 5 ); we see that add_first() returns the same value as $add_two->(), which shows that it accesses the same instance of $operand as $add_two and friends.
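Put together, the experiment might look like the sketch below. The @seen array is my own addition for collecting the results, and the second call to make_closures() keeps only its first closure; otherwise the code follows the description above.

```perl
use strict;
use warnings;
use feature 'say';

my ( $add_two, $mul_two, $chg_two ) = make_closures( 2 );
my ( $add_four ) = make_closures( 4 );    # keep only the adder

my @seen;
push @seen, add_first( 3 ), $add_two->( 3 );    # 5, 5
$chg_two->( 5 );
push @seen, add_first( 3 ), $add_two->( 3 );    # 8, 8
push @seen, $add_four->( 3 );                   # 7, other instance
say "@seen";    # 5 5 8 8 7

sub make_closures {
    my ( $operand ) = @_;
    return(
        sub { $operand + $_[0] },
        sub { $operand * $_[0] },
        sub { $operand = $_[0] },
    );

    # Defined at compile time, so it exists even though it follows
    # the return. It only ever sees the first instance of $operand.
    no warnings 'closure';
    sub add_first { $operand + $_[0] }
}
```

With the no warnings line removed, Perl warns that the variable "$operand" will not stay shared, which is a fair summary of what happens on the second call to make_closures().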

After all the above, it should come as no surprise that in the following, sub increment{} and sub decrement{} close over $number.

{
    my $number = 0;
    sub increment {
        return $number++;
    }
    sub decrement {
        return --$number;
    }
}

No warning is generated because there is only ever one instance of $number. Of course, if there were only one subroutine above, it could also be written with a state variable. But the whole question of my, our, state, and use vars is a subject for another time.
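For the single-subroutine case, a state variable version might look like this minimal sketch. The name next_id() is made up for illustration, and state needs Perl 5.10 or later with the feature enabled.

```perl
use strict;
use warnings;
use feature qw( say state );

# A single subroutine with a private, persistent counter.
# The state variable is initialized only once, the first time
# the statement is executed, and keeps its value between calls.
sub next_id {
    state $number = 0;
    return ++$number;
}

my @ids = map { next_id() } 1 .. 3;
say "@ids";    # 1 2 3
```

As soon as a second subroutine needs the same variable, the enclosing block with a my variable is the way to go.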

A demonstration of all the points discussed in this blog post is available on GitHub as closures.t.

2 Comments

Perl internally stores lexicals for a scope in what it calls pads (an array of variables, essentially). There is a lot of trickery going on to reuse whole pads, individual entries, and generally as much of any allocated internal data structure as possible. If a variable could be deallocated at the end of a scope, it is instead kept around and used again next time the scope is entered, to avoid throwing all of it away only to create a whole new largely identical variable next time around. In my own testing relating to our conversations on the topic, I also found other weird aspects of this, e.g. relating to compile time vs runtime.

But note that all of this trickery is only implementation details at the guts level. At the Perl language level, the behaviour you get is semantically equivalent to what you would get if every entry into the scope did create a new instance – just with less overhead than a naïve implementation would entail.

(At least in older times. I don’t know how helpful these optimisations really are today. For a long time, Perl was the fastest of the “P languages”. Nowadays it is eclipsed by other languages and instead has the rather more questionable distinction of being one of the most power-hungry languages (in CPU power consumption per unit of computation). All the little extra internal checks and heavy reuse of memory regions with mixed access patterns seem a good candidate for why… though take this more as uneducated speculation than a real guess.)

About Tom Wyant

I blog about Perl.