Why I try to avoid Perl's punctuation variables

Over on Perlmonks I wrote that you should probably use this:

say join '', @array[2,4];

Instead of this:

local $" = '';
say "@array[2,4]";

My reasoning was:

Why is that better? Because nobody knows what $" is, but everyone knows what join() is. Always write your software to be as readable as possible :)
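A minimal sketch of the two forms side by side. The sample data and sub names are made up for illustration and are not from the Perlmonks thread. Both produce the same string, but only the join version shows its separator at the point of use:

```perl
use strict;
use warnings;
use feature 'say';

my @array = qw(a b c d e);

# Explicit: the separator is right there in the call.
sub with_join {
    return join '', @array[ 2, 4 ];
}

# Implicit: the list separator variable controls how arrays interpolate
# into double-quoted strings. local restores the old value on scope exit.
sub with_list_separator {
    local $" = '';
    return "@array[2,4]";
}

say with_join();              # ce
say with_list_separator();    # ce
say "@array[2,4]";            # c e  (separator is back to its default, a space)
```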

I received a couple of upset replies along the lines of "Perl devs should be allowed to write Perl!" While I generally agree with that sentiment -- I had no problem with the array slice, for example -- I think the replies came, in part, because I had answered too quickly. I clarified what I meant, but I thought I should explain it here, too, because too many people reach for those punctuation variables.

First of all, there are still punctuation variables that I'm OK with, such as $_, @_, $$ and $^X. Those are so ubiquitous (or unavoidable) that I have no problem with them. However, as mentioned, I want code that is readable and safe, and it's the latter part that I forgot to write about.

To see what I mean by safe, visit a few of your @INC directories and run this:

ack '^\s*\$\w+::.*='
# or
ack '^\s*\$[[:punct:]] .*='|grep -v '$_'

Then see just how many package or punctuation variables are assigned to without local, which means the change applies to every other piece of code in the process. These are very, very hard bugs to track down. I remember once spending many hours trying to find a bug in some production code only to eventually discover this in someone's CPAN module (which has since been fixed):

$Data::Dumper::Terse = 1;

Hey, that's just changing something we're printing, right? How could that possibly break anything? But it did and it was a devil to track that down.
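To illustrate how a line like that can reach unrelated code, here is a small sketch. The subs dump_globally and dump_locally are hypothetical and are not the code from the CPAN module in question. The second one uses Data::Dumper's object interface, which keeps the setting on one dumper object instead of the package variable:

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = { answer => 42 };

# Hypothetical module code: sets the package variable for everyone.
sub dump_globally {
    $Data::Dumper::Terse = 1;
    return Dumper( $_[0] );
}

# Per-object configuration: nothing outside this call is affected.
sub dump_locally {
    return Data::Dumper->new( [ $_[0] ] )->Terse(1)->Dump;
}

my $before = Dumper($data);     # starts with '$VAR1 = '
dump_locally($data);
my $between = Dumper($data);    # unchanged
dump_globally($data);
my $after = Dumper($data);      # the '$VAR1 = ' prefix is gone

print "same after OO call: ",     ( $before eq $between ? "yes" : "no" ), "\n";
print "same after global call: ", ( $before eq $after   ? "yes" : "no" ), "\n";
```

Any caller that logs, compares, or parses the default `$VAR1 = ...;` form sees different text after dump_globally has run once, even though it never touched the setting itself.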

Package variables are frequently a very bad idea since they're so hard to validate, but Perl's built-in punctuation variables? If you rely on them, you have very little control over them. Yes, you should generally assign to them with local, but they are still much harder to read, and combining that obfuscation with the risk of action at a distance is more of a price than I'm willing to pay nowadays.
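The same action at a distance is easy to reproduce with a built-in. The sketch below uses the input record separator and three hypothetical helpers (careless_slurp, careful_slurp and count_lines are illustrative names, not from any real module). The only difference between the two slurp subs is the word local:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Hypothetical helper that slurps input but forgets local.
sub careless_slurp {
    my ($string) = @_;
    open my $fh, '<', \$string or die $!;
    $/ = undef;                  # global: every later readline is affected
    return scalar <$fh>;
}

# Same helper with the change confined to this sub.
sub careful_slurp {
    my ($string) = @_;
    open my $fh, '<', \$string or die $!;
    local $/;                    # undef only until the sub returns
    return scalar <$fh>;
}

# Unrelated code that expects to read line by line.
sub count_lines {
    my ($string) = @_;
    open my $fh, '<', \$string or die $!;
    my @lines = <$fh>;
    return scalar @lines;
}

careful_slurp($text);
print count_lines($text), "\n";    # 3

careless_slurp($text);
print count_lines($text), "\n";    # 1
```

Nothing in count_lines changed between the two calls, which is what makes this kind of bug so slow to find.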

This, incidentally, is part of the reason why we're going to start transitioning to subroutine signatures for Veure. I mean, come on! Subroutine signatures have existed longer than most of us have been alive and we're still waiting for them?

So yeah, Veure's going to bite that bullet and hope it works out. As an example of why, here's (the skeleton of) some client code I rewrote today:

sub do_something {
    my $self = shift;
    my $object = __PACKAGE__->new(
        value => shift,
        type => 'foo',
    );
    my $thing = pop @_ > 1;
    $object->set_thing($thing) if $thing;
    my $next = shift;
    ... # and so on
}

Do you see what's going on with those arguments? Do you really want to debug that? Neither do I. However, the client doesn't use subroutine signatures, so that became:

sub do_something {
    my ( $self, $value, $next, $thing ) = @_;
    my $object = (ref $self)->new(
        value => $value,
        type  => 'foo',
    );
    $object->set_thing($thing) if $thing;
    ... # and so on
}

That's much clearer. Perfect? No, but clearer. The original wasn't impossible to read, but it was harder than it needed to be. Every time something is a little "harder than it needs to be", it adds up, but many devs are comfortable with forgetting that.
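For comparison, here is roughly what the same skeleton could look like with Perl's signatures feature (available from Perl 5.20, where it is still marked experimental). This is only a sketch: the Widget class, its constructor and set_thing are hypothetical stand-ins, and treating $thing as optional is an assumption, since the original snippet doesn't say.

```perl
use v5.20;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

package Widget;    # hypothetical class, for illustration only

sub new ( $class, %args ) {
    return bless {%args}, $class;
}

sub set_thing ( $self, $thing ) {
    $self->{thing} = $thing;
    return $self;
}

# The argument list is declared once, up front, and Perl checks the count.
sub do_something ( $self, $value, $next, $thing = undef ) {
    my $object = ( ref $self )->new(
        value => $value,
        type  => 'foo',
    );
    $object->set_thing($thing) if $thing;
    return $object;
}

package main;

my $widget = Widget->new->do_something( 'v', 'n', 'extra' );
say $widget->{thing};    # extra
```

Calling do_something with too few or too many arguments dies at the call, rather than quietly leaving a variable undefined.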

Write your code so that it's easy for even new devs to read. Yes, I know we love Perl. Yes, I know there are times you can't write code that's easy to read. But to default to obfuscated code? Not a good idea.

13 Comments

We've been using subroutine signatures for a while at $WORK and it's a huge improvement. I really wish I could use them with my CPAN code too.

I use $" a lot. Writing it using join is just awkward. Sure, in your simple example, it doesn't make much of a difference. But it does for:

my $query = do {local $" = ", "; <<"--"};
    SELECT  foo, bar, baz
      FROM  SomeTable
     WHERE  id IN (@{[("?") x @ids]})
--

It all depends on what you are used to. When I started learning Perl, I found the use of ‘and’ and ‘or’ for control flow confusing. Now I’m used to it. Would you really suggest that one avoid those so as not to confuse new devs? I’m sure your answer is no. But where do you draw the line? That’s what nobody agrees on.

FWIW, if you don't want to create another lexical, you can still use @{[ ]} with a join inside there, no?

my $query = <<"END_SQL";
    SELECT  foo, bar, baz
      FROM  SomeTable
     WHERE  id IN (@{[ join ', ' => ('?') x @ids ]})
END_SQL

Also, in Perl 6 there's a .join method whose delimiter parameter is optional.

say @array[2,4].join;

You got a semicolon in there that should be a comma.

What's @. ? I can't find it in perlvar.

my $query = <<"";
    SELECT foo, bar, baz
      FROM sometable
     WHERE id IN (${\ substr ',?' x @ids, 1 })

What? Oh, we weren’t golfing?

More seriously, global variables bad, because action at a distance bad, and to the extent that Abigail’s version of the code is less readable than other formulations I find it’s mostly owing to the distance between setting the punctuation variable and the effect that that assignment has.

There's at least one valid use of $" that can't be replaced with join:

my @regexes = list_of_qr_objects();
my $combined = do { local $" = '|'; qr/@regexes/ };

If any of the regex objects contain (?{ }) or (??{ }) constructs, using join would decompile and reparse the code. That in turn would require use re 'eval' in scope for the final qr, and even then it would break the use of any lexical variables that those code blocks might close over.

However, qr/@regexes/ uses the compiled form of each of the @regexes directly, including the ordinary Perl code inside their code blocks, so this works fine.

Of course, this doesn't provide a direct argument either way about whether it's good to reach for punctuation variables by default, but I think it's interesting regardless, as well as an occasionally useful technique.

This is one of the few examples where I know an even better trick, at least as long as you are using DBD::Pg:

my $query = <<"END_SQL";
    SELECT  foo, bar, baz
      FROM  SomeTable
     WHERE  id = ANY(?)
END_SQL

And then pass an array reference as the value. \o/

I'm with Ovid. I love that perl will let me do very concise things and has all these features, but they should be used appropriately. Invoking perl with -e on the command line to do some one-off task is often only really possible this way, and knocking up a quick helper utility for something is much quicker and easier with these features.

But Abigail's example wouldn't make it past review onto a production server. I'm on a multi-decade project and there's an inevitable turnover of staff, so maintainability of code is critical; more so even than having it work properly (maintainable but faulty code is fixable; working but unmaintainable code that needs updating is useless). Other people who are new to my code (and that really includes me more than a couple of weeks after writing it :/ ) must be able to maintain it as easily as possible. It is increasingly difficult to find experienced perl devs and much of the resistance from others comes from all that "perl is line-noise" nonsense. I think it behooves us to write perl that is not line-noise, and is as far from it as is reasonable.

*frowns at Liz* :)

I applaud this article and others like it. At $work we are constantly fighting bugs in code that is "too clever" and not explicit. It's also fascinating to see that what may seem obvious to some (e.g. the Babycart operator) is quite confusing to newbies.

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/