Think globally, act local-ly

Here’s a pattern that took me a while to figure out this past week, and therefore seemed worth sharing with you guys.

First, let’s set the scene.  (These are certainly not the only conditions under which you could use this pattern, but it’ll probably be easier to grasp with a concrete example.)  Let’s say you have a script that you use to launch your other Perl scripts.  This script (in my example, it’s a bash script, but it could also be a super-minimal Perl script) has exactly one job: set up the Perl environment for all your personal or company modules.  That may even include pointing your $PATH at a different Perl (e.g. one installed via perlbrew or plenv) so you’re not reliant on the system Perl.  Here’s a typical example of what such a (bash) script might look like:

export PERL_LOCAL_LIB_ROOT=/my/perl/dir
export PATH=$PERL_LOCAL_LIB_ROOT/bin:$PATH
export PERL5LIB=$PERL_LOCAL_LIB_ROOT/lib/perl5:/my/company/modules/lib
export PERL_MM_OPT=
export PERL_MB_OPT=
exec perl "$@"

This is just an example; your environment variables could be totally different, or you might have more, or fewer, or what-have-you.  The point is, you use this “launcher” script to get everything exactly the way you want it, environment-wise, then you fire up the right Perl, pointing at the right set of libraries, and you’re all set.

Now, let’s pretend that, somewhere in the actual Perl script that was launched by the example script above, you need to launch a subcommand.  There are various subcommands that might need to be launched, and maybe some of them are other Perl scripts.  And not your Perl scripts, but Perl scripts that depend on the system Perl and the modules installed for it.  Suddenly your carefully crafted environment, which was perfect for your personal or company scripts, is all wrong.
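The reason the environment leaks through is that a child process starts with a copy of its parent’s environment, and %ENV is Perl’s view of exactly that environment.  Here’s a minimal sketch showing a child perl picking up whatever PERL5LIB the parent has (the path is a hypothetical, simplified stand-in for what the launcher exports, and child_perl5lib is a made-up helper name):

```perl
use strict;
use warnings;

# Stand-in for what the launcher exported (hypothetical path).
$ENV{PERL5LIB} = '/my/company/modules/lib';

# Ask a child perl which PERL5LIB it sees.  The list form of open runs the
# child directly (no shell); $^X is the perl binary running this script.
sub child_perl5lib
{
    open(my $fh, '-|', $^X, '-e', 'print defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : q{}')
        or die "can't run child perl: $!";
    local $/;                       # slurp mode: read all of the output at once
    my $output = <$fh>;
    close($fh);
    return defined $output ? $output : '';
}

print child_perl5lib(), "\n";       # /my/company/modules/lib
```

Any system Perl script launched this way would search /my/company/modules/lib ahead of its own library directories, since PERL5LIB entries are prepended to @INC.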

Perhaps your function to run commands looks something like this:
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}

This is just a rough idea of the sorts of extra things you might do before or while running a command, but a routine like this is handy: it means all your subcommands are probably being routed through this one routine, instead of having random system calls sprinkled all over the place.  And that’s good, because what we really need to do to fix this problem is to save the original values of all those environment vars we twiddled, then temporarily put them back when running commands via the shell.  And now we have a central place to do it.

Fixing the launcher script side is trivial:
# save existing values
export PERL_LOCAL_LIB_ROOT_ORIGINAL=$PERL_LOCAL_LIB_ROOT
export PATH_ORIGINAL=$PATH
export PERL5LIB_ORIGINAL=$PERL5LIB
export PERL_MM_OPT_ORIGINAL=$PERL_MM_OPT
export PERL_MB_OPT_ORIGINAL=$PERL_MB_OPT

# set to our specific environment
export PERL_LOCAL_LIB_ROOT=/my/perl/dir
export PATH=$PERL_LOCAL_LIB_ROOT/bin:$PATH
export PERL5LIB=$PERL_LOCAL_LIB_ROOT/lib/perl5:/my/company/modules/lib
export PERL_MM_OPT=
export PERL_MB_OPT=

# run using our environment
exec perl "$@"

Fixing the subcommand-runner side is a bit trickier.  We want to put the env vars back the way we found them before running the shell command, but we want them to go back to our “real” values afterwards.  We could do that manually, but then we couldn’t return from inside those if branches any more, which makes the code messier, and that’s a bummer.  Too bad Perl doesn’t give us some magical way to temporarily change the value of something and then automatically put it back when we leave a certain scope.
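For comparison, a rough sketch of that manual approach might look something like this.  The name run_subcommand_manually is made up for illustration, the variable list is the one from the launcher above, and it assumes the launcher has set every *_ORIGINAL var:

```perl
use strict;
use warnings;

my @ENV_VARS = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;

# Hypothetical "manual" version: save, swap, run, restore.
# Assumes every *_ORIGINAL var was set by the launcher.
sub run_subcommand_manually
{
    my ($type, $command) = @_;
    die('bad type') unless $type eq 'capture' || $type eq 'tty';

    my %saved = map { $_ => $ENV{$_} } @ENV_VARS;           # save our values
    $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach @ENV_VARS;     # swap in originals

    # No "return" inside these branches any more, or the restore below
    # would be skipped.  (And if anything in between dies, it's skipped anyway.)
    my $result;
    if ($type eq 'capture')
    {
        $result = `$command`;
    }
    else
    {
        $result = system($command) == 0;
    }

    # Put our values back, without resurrecting vars that didn't exist.
    foreach (@ENV_VARS)
    {
        if (defined $saved{$_}) { $ENV{$_} = $saved{$_} }
        else                    { delete $ENV{$_} }
    }
    return $result;
}
```

It works, but every exit path has to be steered past the restore by hand, which is exactly why it would be nice if Perl could put things back for us automatically.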

But, wait: Perl does give us that.  It’s called local, and of course as dutiful, modern Perl programmers, we’ve all learned that we should never use it.  You see, local works on global variables, and global variables are bad.  Everyone knows that.  So never use local.

Except ...

Except that your program’s environment is already global—that’s just the way process environments work.  %ENV is a global hash, as it pretty much has to be.  So, since we’re using a global variable anyway, no matter what, may as well use local to make our lives easier ... right?

In fact, local is so smart that we don’t even have to localize the entire %ENV hash.  Perl is perfectly happy to localize individual hash key/value pairs for us.  So we can actually do something like this:
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    local $ENV{PERL_LOCAL_LIB_ROOT} = $ENV{PERL_LOCAL_LIB_ROOT_ORIGINAL};
    local $ENV{PATH} = $ENV{PATH_ORIGINAL};
    local $ENV{PERL5LIB} = $ENV{PERL5LIB_ORIGINAL};
    local $ENV{PERL_MM_OPT} = $ENV{PERL_MM_OPT_ORIGINAL};
    local $ENV{PERL_MB_OPT} = $ENV{PERL_MB_OPT_ORIGINAL};

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
And that works great.  There ya go, case closed.  (Although note that, in a real-world example, you should really take more effort to make sure some malicious entity hasn’t been setting those environment variables to something bad and then calling your script without going through the launcher.)
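If you want to convince yourself of what local is doing here, this small sketch (the paths and sub names are hypothetical) shows that a localized %ENV key is visible to anything called while it’s in effect, and snaps back as soon as the enclosing sub returns:

```perl
use strict;
use warnings;

$ENV{PERL5LIB} = '/my/company/modules/lib';     # "our" value (hypothetical)

sub current_perl5lib { return $ENV{PERL5LIB} }

sub perl5lib_during_subcommand
{
    # Localize just this one key of %ENV; every other key is untouched.
    local $ENV{PERL5LIB} = '/usr/share/perl5';  # hypothetical system path
    return current_perl5lib();                  # sees the temporary value
}

print perl5lib_during_subcommand(), "\n";       # /usr/share/perl5
print current_perl5lib(), "\n";                 # /my/company/modules/lib
```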


Except ...

It bugs me.  It’s 5 lines of remarkably similar code.  If I have to add another environment variable to restore, I’ll likely copy-paste one of those existing lines, and then change the env var name ... in two places.  I’d better remember to change it in both places, or else I’ll get some really hard-to-find bugs.  Nope, I just don’t like it.  I know; I’ll just do it in a loop:
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    my @env_vars = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;
    foreach (@env_vars)
    {
        local $ENV{$_} = $ENV{$_ . '_ORIGINAL'}
    }

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
Except that doesn’t actually work, because my foreach loop introduces a new scope, so the original values of my env vars are now restored before they’re ever even needed.  That’s pretty useless.  Looks like I’m stuck with doing it the long way.
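Here’s that failure boiled down to a single variable so you can watch it happen (DEMO_VAR and DEMO_VAR_ORIGINAL are hypothetical stand-ins for PATH, PERL5LIB, and friends):

```perl
use strict;
use warnings;

# Hypothetical variable names, standing in for PATH, PERL5LIB, etc.
$ENV{DEMO_VAR}          = 'ours';
$ENV{DEMO_VAR_ORIGINAL} = 'original';

sub value_after_loop
{
    foreach (qw< DEMO_VAR >)
    {
        local $ENV{$_} = $ENV{$_ . '_ORIGINAL'};
        # here, and only here, $ENV{DEMO_VAR} is 'original'
    }
    # the loop body's scope has ended, so the local has already been undone
    return $ENV{DEMO_VAR};
}

print value_after_loop(), "\n";     # ours
```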


Except ... no.  I refuse to accept that.  I hate doing things the long way.

How about this?
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    my @env_vars = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;
    local $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach @env_vars;

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
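This seemed like a promising idea when I first thought of it: by eliminating the curly braces, I’ve eliminated the new scope, right?  Yeah, well, Perl disagreed with that assessment.  And, when Perl and I disagree, Perl mostly wins.  The statement-modifier form of foreach is still a loop, and Perl undoes any localization made inside a loop at the end of each iteration, braces or no braces.  So it turns out that doesn’t work either.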
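The same boiled-down example, this time with the brace-free statement modifier (again, DEMO_VAR is a hypothetical stand-in):

```perl
use strict;
use warnings;

# Hypothetical variable names, standing in for PATH, PERL5LIB, etc.
$ENV{DEMO_VAR}          = 'ours';
$ENV{DEMO_VAR_ORIGINAL} = 'original';

sub value_after_modifier_loop
{
    # No braces, but this is still a loop: Perl undoes the local at the
    # end of the iteration, before the next statement ever runs.
    local $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach qw< DEMO_VAR >;
    return $ENV{DEMO_VAR};
}

print value_after_modifier_loop(), "\n";    # ours
```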

But then I was struck with an inspiration: hash slices.
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    my @env_vars = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;
    local @ENV{@env_vars} = @ENV{map { $_ . '_ORIGINAL' } @env_vars};

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
This was so simple and elegant that I nearly squeed with joy when it came to me.  (Actually it came to me right as I was on the way to bed one night; needless to say, I didn’t make it to bed for another hour or so.)  And it works beautifully.
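To see the slice version in isolation, here’s a minimal sketch with two hypothetical variables standing in for the real list.  Because the whole slice is localized in one statement, with no loop around it, the temporary values last until the enclosing sub returns:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the launcher's env vars and their saved originals.
my @env_vars = qw< DEMO_LIB DEMO_ROOT >;
@ENV{@env_vars}                          = ('ours-lib', 'ours-root');
@ENV{map { $_ . '_ORIGINAL' } @env_vars} = ('orig-lib', 'orig-root');

sub values_for_subcommand
{
    # A single statement, no loop: every key in the slice is localized
    # together, and stays localized until this sub returns.
    local @ENV{@env_vars} = @ENV{map { $_ . '_ORIGINAL' } @env_vars};
    return join(',', @ENV{@env_vars});
}

print values_for_subcommand(), "\n";        # orig-lib,orig-root
print join(',', @ENV{@env_vars}), "\n";     # ours-lib,ours-root
```

Adding another variable now means adding one word to the qw list, and nothing else.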


One last twist I’ll give you.  What if your script (the one that runs the subcommands, that is) might not be run by the launcher?  That is, what if there’s a chance that all those _ORIGINAL env vars might not exist?  The code as is would then be setting $PATH and all those to nothingness, which is obviously not a great idea.  So we need to set the vars conditionally.  Of course, an if introduces a new scope just like a foreach does, so we have to make sure the local isn’t inside the if.  Our first naive attempt might be something like this:

sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    my @env_vars = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;
    local @ENV{@env_vars};
    if (exists $ENV{PATH_ORIGINAL})     # assume that, if one is set, they're all set
    {
        $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach @env_vars;
    }

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
But it turns out that’s a bad idea.  See, when you localize a variable, it actually “throws away” the previous value (temporarily, of course).  It’ll be back later, but the point is: during the time it’s localized, the previous value is not accessible any more.  At all.  So that’s a bummer.  The end result is, in the code above, if the _ORIGINAL versions are set, everything works fine (but of course we had that much before).  In the case where they’re not set, all our Perl env vars end up undefined, which is often even worse than having them pointing at the wrong directories.  In short, this:
local $SOMEVAR;
is exactly the same as:
local $SOMEVAR = undef;
Now, I’ve personally long thought that this was not the right design choice, but that ship sailed a hell of a long time ago, so there’s no point in crying over milk under the bridge.  Or ... something.
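A tiny sketch with a package variable makes the point: inside the scope of a bare local, the old value is simply gone, and it only comes back when that scope ends (peek_while_localized is a made-up name for illustration):

```perl
use strict;
use warnings;

our $SOMEVAR = 'before';

sub peek_while_localized
{
    local $SOMEVAR;                 # exactly the same as: local $SOMEVAR = undef;
    return defined $SOMEVAR ? $SOMEVAR : 'undef';
}

print peek_while_localized(), "\n"; # undef
print $SOMEVAR, "\n";               # before
```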

Happily, this particular problem has a very simple solution: set your local variables to what they already were.
sub run_subcommand
{
    my ($type, $command) = @_;
    say STDERR "running command: $command" if DEBUG;

    my @env_vars = qw< PERL_LOCAL_LIB_ROOT PATH PERL5LIB PERL_MM_OPT PERL_MB_OPT >;
    local @ENV{@env_vars} = @ENV{@env_vars};
    if (exists $ENV{PATH_ORIGINAL})     # assume that, if one is set, they're all set
    {
        $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach @env_vars;
    }

    $command = expand_variables($command);
    if ($type eq 'capture')
    {
        return `$command`;
    }
    elsif ($type eq 'tty')
    {
        return system($command) == 0;
    }
    else
    {
        die('bad type');
    }
}
That looks a bit funky, but, believe it or not, it totally works.
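It works because Perl evaluates the right-hand side of the assignment before local saves and clears the keys on the left, so the current values get copied into the localized slots.  Here’s a sketch exercising both cases, with hypothetical variable names standing in for the real list:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the launcher's env vars.
my @env_vars = qw< DEMO_LIB DEMO_ROOT >;
@ENV{@env_vars} = ('ours-lib', 'ours-root');
delete @ENV{map { $_ . '_ORIGINAL' } @env_vars};    # as if not run via the launcher

sub values_for_subcommand
{
    # The right-hand side is evaluated first, so this copies the current
    # values into the localized slots instead of leaving them undef.
    local @ENV{@env_vars} = @ENV{@env_vars};
    if (exists $ENV{DEMO_LIB_ORIGINAL})     # assume that, if one is set, they're all set
    {
        $ENV{$_} = $ENV{$_ . '_ORIGINAL'} foreach @env_vars;
    }
    return join(',', @ENV{@env_vars});
}

print values_for_subcommand(), "\n";        # ours-lib,ours-root

@ENV{map { $_ . '_ORIGINAL' } @env_vars} = ('orig-lib', 'orig-root');
print values_for_subcommand(), "\n";        # orig-lib,orig-root
print join(',', @ENV{@env_vars}), "\n";     # ours-lib,ours-root
```

Note that the assignment inside the if is a plain assignment, not another local: it just changes the already-localized values, which still get restored when the sub returns.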


So that’s a little more about how to use local, and why you might want to use it even though you agree that global variables are bad.  Note that this technique isn’t helpful when localizing a batch of global scalars as opposed to certain keys of a global hash, but then again, if you had a batch of global scalars, you wouldn’t be trying to set them in a loop in the first place.  Also, you wouldn’t have a batch of global scalars, because globals are bad ... right?

Still, you’re stuck with global vars sometimes, and you may as well make the best of it.  Hopefully this helps.  A little.

About Buddy Burden

10 years in California, 21 years in Perl, 30 years in computers, 51 years in bare feet.