Is This a Hashref Which I See Before Me? (no, wait, it's an arrayref ...)
As you may know, Perl 5.14 (and up) will allow you to use references for container functions (by which they mean functions such as push or keys that operate directly on arrays or hashes). I’d never thought much about this feature, except to think that it was, in general, a nice way to save three keystrokes on constructs such as (to steal an example from perldelta):
keys %{$hoh->{genres}{artists}}
Now, saving three keystrokes doesn’t mean much if it makes your code harder to read. But, in this case
keys $hoh->{genres}{artists}
is actually easier to read, so I call that a win-win.
This feature came up at work recently, and a couple of people whose opinions I greatly respect disparaged its use. Curious, I asked why. They pointed me at three blog posts that came out at the time the feature was first announced: two from chromatic, and one from brian d foy (again, people whose opinions I greatly respect). I hadn’t noticed these (or, if I had, I’d forgotten them), so I took some time to read through the articles and their comments.
However, at the end of it all, despite strong agreement amongst at least 5 fellow programmers whose opinions, as I say, I give strong weight to, I’m still going to disagree with them here. I don’t see anything wrong with the auto-dereferencing, or at least not enough wrong to warrant a policy of full-stop banning it from my code.
In exploring why not, the first thing to note is that these are not three articles which point out three different things wrong with the feature. They’re just more and more detailed explanations of the one thing that’s wrong with the feature, which is that it creates an ambiguity with another new-ish feature of Perl: being able to use the hash container functions (i.e. each, keys, and values) on arrays. And, I’m going to quote chromatic here:
The real problem was making each, keys, and values work on arrays as well as hashes.
That is, there’s a clash between two features. It might be tempting to blame the later-released of the two, since that’s when the ambiguity was created. However, in my opinion (and, apparently, chromatic’s), it was actually the first one that was the bad call. To my way of thinking, the convenience of auto-dereferencing hashrefs and arrayrefs is something I’m likely to use all the time (and have used many times since its release). Whereas the convenience of being able to use each on arrays ... I’ve never wanted that, nor have I ever used it since it came out (with 5.12).
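For anyone who, like me, has never reached for the 5.12 feature, here is a minimal sketch of what it does on a plain array. The array name and its contents are made up for illustration.

```perl
use 5.012;      # each/keys/values on arrays arrived in Perl 5.12
use strict;
use warnings;

my @genres = qw< jazz blues folk >;

# On an array, keys returns the indices ...
my @indices = keys @genres;

# ... values returns the elements ...
my @elements = values @genres;

# ... and each walks index/element pairs in order.
my @pairs;
while (my ($index, $genre) = each @genres) {
    push @pairs, "$index=$genre";
}

print "@indices\n";     # 0 1 2
print "@elements\n";    # jazz blues folk
print "@pairs\n";       # 0=jazz 1=blues 2=folk
```

Note that keys applied to an array hands back numbers, while keys applied to a hash hands back strings. That difference is the root of the ambiguity once a bare reference is allowed in the same slot.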
But, in many ways, that’s a moot point. The ambiguity is there now, and it doesn’t matter which feature’s “fault” it is: if the problem it creates is bad enough, you should avoid it. Right?
Well, let’s look at that problem. What chromatic notes is that you’re sort of boned if you want to have an object that overloads both array and hash dereferencing (which is something I do quite a bit, actually). But, as he also notes, the feature implementors solved that problem (well, avoided it, anyway) by not allowing auto-dereferencing on anything that’s blessed. I tend to agree with chromatic that that’s not necessarily a great solution, but it is a solution. So that’s not a problem. brian d foy goes on to point out that if you should happen to accidentally auto-deref a hashref when you thought you were getting an arrayref, you might end up trying to use strings as numbers. (Or vice versa, although the problem of treating numbers as strings seems less problematic.) And that would be bad, right? Treating “123Buster” as 123, as he notes in an example.
But ... if you’re using use warnings (and you are using use warnings ... right?), then you’re going to hear about that tootsweet. Better yet, if you’re using use warnings FATAL => 'all' (which you really should, in my humble opinion), it becomes a run-time error, which is exactly what each %{$arrayref} would have been. For that matter, if you take your keys that you thought were array indices but really were hash keys and try to use them as array indices, that’s a run-time error whether you have warnings enabled or not. That is, to take a very simple/stupid example:
my $arrayref = $hashref_that_I_thought_was_really_an_arrayref;
my $reversed = { map { $arrayref->[$_] => $_ } keys $arrayref };
That’s still going to blow up nice and good, and no later than if you’d used @$arrayref instead.
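For anyone on a perl where autoderef is gone (it was always marked experimental, and it was removed in 5.24), here is a minimal sketch of the same failure. It assumes an explicit %$ dereference as a stand-in for what keys $arrayref would have done at run time with a hashref in hand; the variable names are only illustrative.

```perl
use strict;
use warnings;

# A hashref sitting in a variable that the code believes is an arrayref.
my $arrayref = { some => 'stuff' };

my %reversed;
my $ok = eval {
    # Stand-in for "keys $arrayref": the hash keys come back ...
    for my $index (keys %$arrayref) {
        # ... and using the ref as an array dies right here.
        $reversed{ $arrayref->[$index] } = $index;
    }
    1;
};
my $error = $@;

print "died: $error" unless $ok;
```

The die comes from the array dereference itself ("Not an ARRAY reference"), so it fires with or without warnings enabled.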
But even with the best example I could think of, where you didn’t trigger trying to deref the wrong sort of reference, I still couldn’t get around use warnings letting me know where I was going wrong.
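Here is a rough sketch of that warning turning into a hard stop. The hash and its single "123Buster" key are invented to mirror brian’s example, and the explicit %$-style access means it does not depend on autoderef at all.

```perl
use strict;
use warnings FATAL => 'all';

# Keys that some other code mistakenly believes are array indices.
my %scores = ( '123Buster' => 1 );

my $sum = 0;
my $ok  = eval {
    # Using a hash key as a number: "123Buster" is not numeric,
    # and with FATAL warnings that is an exception, not a note on STDERR.
    $sum += $_ for keys %scores;
    1;
};
my $error = $@;

print "died: $error" unless $ok;
```

With plain use warnings you would get the same "isn't numeric" message as a warning and the code would carry on, quietly treating the key as 123.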
Now, in one of the comments, Joel Berger (yet another coder whose opinion I greatly respect) says that he had a piece of code that auto-dereferenced a variable when it shouldn’t have:
My problem was that when writing a module (targeting back at least as far as 5.10 if not farther), I ACCIDENTALLY auto-deferenced a variable ($var rather than %$var). One little character, and the kicker was that the code ran fine on my box.
But I can’t figure out how to reproduce this problem, because, prior to 5.14, this sort of code:
my $ref = { some => 'stuff' };
print join(' ', keys $ref);
would produce a compile-time error:
Type of arg 1 to keys must be hash (not private variable) at -e line 1, near "$ref)"
So I’m having difficulty envisioning the problem Joel is reporting. He mentions that he blogged about the issue, but I’ve spent a bit of time searching his blogs and couldn’t find it, so perhaps he’ll be kind enough to mention it in the comments.
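If you want to see how your own perl treats that construct, one option is to push the compile into a string eval so the failure can be caught. This is only a sketch, and the outcome depends on the version: perls older than 5.14 reject it at compile time as shown above, and so do perls from 5.24 on, where the feature was removed.

```perl
use strict;
use warnings;

# String eval defers compilation to run time, so a compile-time
# failure lands in $@ instead of aborting the whole script.
my $compiled = eval q{
    no warnings;                      # silence any "experimental" warning
    my $ref  = { some => 'stuff' };
    my @keys = keys $ref;
    1;
};
my $error = $compiled ? '' : $@;

printf "perl %vd: keys \$ref %s\n", $^V,
    $compiled ? 'compiled and ran' : "was rejected: $error";
```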
Thus, after spending a bit of time working through the problem, it seems to me that the only downside might be that you somehow manage to trigger the warning or error (from using a hash key as an array index) somewhere in your code that’s far removed from where you actually dereferenced the ref. That is, in all my examples, the error is triggered on the same line as, or a line or two later than, the auto-deref line. That’s because examples are always over-simplified; in actual, complex code, the two sites could be completely disconnected in both time and space. But that means that, for this to be an actual problem that causes me actual grief, the following stars would have to align:
- I’d have to have a situation where I actually did manage to get a hashref into a variable that I thought held an arrayref.
- I’d have to auto-dereference that variable (using the new feature of 5.14).
- I’d have to be using keys or each on that variable (using the new feature of 5.12).
  - (Which means that I’d have to be writing code that I didn’t care about prior compatibility with, which immediately eliminates all my CPAN modules.)
- I’d have to use the results of my keys or each auto-dereferencing somewhere far away from the dereferencing itself.
- It would have to, somehow, be a situation where a single warn didn’t make me go “oh, I see what happened.”
That’s a lot of conditions. None of them are particularly unlikely (except perhaps the last one)—in fact, a couple are very common—but the combination of them all seems pretty far-fetched. Perhaps one day I’ll get bitten by exactly this set of circumstances and then I’ll change my mind.
Now, some might say, “look, if there’s even a possibility that I might have a strange error that takes me forever to track down, that’s enough to make me want to avoid it.” Which is perfectly reasonable, for some people. It’s a bit too conservative an approach for me, though. I think I’d be missing out on a whole lot of great programming things by taking that attitude. (Moose, for one: its baroque error messages often lead to spending an inordinate amount of time debugging simple errors. But that doesn’t outweigh its immense utility.) I’m much more of a “take a chance” sort of coder. For instance, take the common::sense module. It gets a lot of press about how bad it is, but most of that is theoretical. When I first heard about it, I read the press (both good and bad) and just decided to throw it into my personal scripts for a while and check it out. The first time I spent a few days trying to track down an error that would have been instantly recognizable if the uninitialized warnings had been enabled, I figured out it wasn’t worth it and never looked back. That experiment caused me a lot of frustration, and cost me quite a few of my off-time hours. But I consider it worthwhile nonetheless.
But perhaps the biggest reason I like the auto-dereferencing feature is that it’s very DWIMmy, which is very Perl. Without it, you get the compilation error I quoted above. Which is one of those errors that makes you look at Perl and go: “dude! you know what I meant.” And, now that we have the feature, we know that Perl really did know what we meant. Why shouldn’t it just go ahead and do that? I enjoy this aspect of Perl. Others don’t, I’m sure, but as I always say: difference of opinion is what makes the world a beautiful place.
One final thought: Most likely the biggest reason I’m never going to hit this ambiguity is that I’m never likely to use keys or each on an array, much less an array reference. But even for those people who are likely to want to use the first of the conflicting features, I wonder how likely it would be that they want to use both at once? chromatic lists five possible ways that the ambiguity could be removed (or at least could have been avoided). But I have a sixth: make it illegal to use both features at once. That is:
my @array = qw< some stuff >;
my $arrayref = \@array;
my $hashref = { map { $_ => 1 } @array };
my @indices = keys @array; # this would still work
my @keys = keys $hashref; # and so would this
@indices = keys $arrayref; # but this would now be a run-time error
@indices = keys @$arrayref; # so just do this instead
Wouldn’t that satisfy everyone? I wonder if it would, and if the folks over in core ever considered it.
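To make the proposal concrete, here is a sketch of the same rule enforced in user code rather than in core. The helper keys_of_hashref is hypothetical (nothing by that name ships with Perl); it simply refuses anything that isn’t an unblessed hashref, which is roughly the run-time error I’m suggesting.

```perl
use strict;
use warnings;
use Carp qw< croak >;

# Hypothetical helper, not a core function: accepts a hashref only.
sub keys_of_hashref {
    my ($ref) = @_;
    croak "keys_of_hashref: expected an unblessed HASH reference"
        unless ref($ref) eq 'HASH';
    return keys %$ref;
}

my @array    = qw< some stuff >;
my $arrayref = \@array;
my $hashref  = { map { $_ => 1 } @array };

# A hashref is accepted (sorted only to get a stable order).
my @keys = sort { $a cmp $b } keys_of_hashref($hashref);

# An arrayref is refused at run time ...
my $ok    = eval { keys_of_hashref($arrayref); 1 };
my $error = $@;

# ... so array indices need the explicit dereference.
my @indices = keys @$arrayref;

print "keys: @keys\n";
print "indices: @indices\n";
print "refused: $error" unless $ok;
```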
I think your sixth option is chromatic's fourth option in his list of five. I agree that it seems to be the best.
Apparently this inconsistency doesn't come up in other languages because they don't allow overloading a single "object" with both hash and array behavior, so it's always unambiguous which behavior is meant. Perl allows that kind of overloading at the price of needing different "operators" (e.g. {} and []) for hashes and arrays.