On defined(@array) and defined(%hash)

Perl is somewhat broken as a language, as it autovivifies symbol values when accessing them.

Clarification because this post has technical errors:

The following is from a naive understanding of the hypothetical defined operator as it is known from other computer languages or the C preprocessor. Perl's defined was invented to check for the undef value, but is often wrongly used to check whether a symbol is defined.

My understanding, coming from a CS background, was that defined should check for the existence of the symbol type slot without creating the symbol and slot: "Does this symbol exist?" This is wrong. To check whether a symbol is defined, use exists on the symbol table hash, the "stash".
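A minimal sketch of checking for a package symbol without creating it. The package Demo and its variable are hypothetical, made up only for this illustration: exists on the stash entry reports whether the glob is there, and asking does not add it.

```perl
use strict;
use warnings;

# Hypothetical package used only for this illustration.
package Demo;
our @present = (1, 2, 3);

package main;

# %Demo:: is the stash; exists on one of its keys does not create a glob.
my $has_present = exists $Demo::{present}    ? 1 : 0;
my $has_absent  = exists $Demo::{never_used} ? 1 : 0;

# Asking a second time shows the first exists() did not create *Demo::never_used.
my $still_absent = exists $Demo::{never_used} ? 0 : 1;

print "present=$has_present absent=$has_absent still_absent=$still_absent\n";
```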

There were horrible errors in the wild, on CPAN and even more of them findable via Google Code Search, where defined was meant to check symbol existence but instead created the symbol. Typical defined(@array) errors have been found in the Linux kernel, in OpenSSL, everywhere. So the deprecation warning now in 5.15.7 is a very good thing; it opened my eyes.

====================================

defined is supposed to be a non-destructive operator that does not autovivify the accessed element.

Originally, autovivification was only meant to help with accessing hash chains by creating intermediate hash values, e.g. $hash->{parent}->{value} would create the parent key and not fail. The safe version would be defined $hash->{parent} and $hash->{parent}->{value}. You can blame Chip for that design. But it did not stay with hash keys only; it went on to symbols.
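A short sketch of both behaviours, using plain lexical hashes (the names are illustrative only): an rvalue read of a nested element creates the intermediate key, while testing each link with defined first leaves the hash untouched.

```perl
use strict;
use warnings;

my %hash;
# Reading a nested element autovivifies the intermediate $hash{parent} = {}.
my $value = $hash{parent}{value};
my $parent_created = exists $hash{parent} ? 1 : 0;

my %safe;
# Test each link in turn; && stops before $safe{parent}{value} is evaluated.
my $found = defined $safe{parent} && defined $safe{parent}{value};
my $parent_untouched = exists $safe{parent} ? 0 : 1;

print "created=$parent_created untouched=$parent_untouched\n";
```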

Perl also went an interesting way and invented exists: exists $hash->{parent} and $hash->{parent}->{value};

This was the next spec problem, as defined is supposed to return false if the key exists but the value is undef.
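A minimal illustration of that distinction, with a made-up %config hash: a key can exist while holding undef, so exists and defined answer different questions.

```perl
use strict;
use warnings;

my %config = (timeout => undef, retries => 3);

my $timeout_exists  = exists  $config{timeout} ? 1 : 0;  # key is present
my $timeout_defined = defined $config{timeout} ? 1 : 0;  # but its value is undef
my $retries_defined = defined $config{retries} ? 1 : 0;  # present and defined
my $missing_exists  = exists  $config{verbose} ? 1 : 0;  # never stored

print "$timeout_exists $timeout_defined $retries_defined $missing_exists\n";
```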

So defined(*sub) got broken too. It will always return true; you have to use defined(*{"sub"}).

The next, bigger spec error just happened with 5.16, as the language police decided that defined(@array) and defined(%hash) are illegal and will warn.

defined(@array) checks whether the symbol *array exists without creating it, and then checks whether the AV slot of the symbol is defined, i.e. not empty. It does not check whether the @array value itself is empty.

OK, now with 5.16 p5p decided that if (@array) is semantically the same as if defined(@array), because if does not autovivify. How does one know when if does not autovivify? defined is the only keyword that is sure not to autovivify.

The same goes for defined(%hash), defined(&sub) or defined($scalar). It is just that defined(%hash) and defined(@array) are now forbidden.

Or should defined(&sub) now create the *sub symbol just by looking at it? That would be the next design error I can foresee: defined(&sub) or defined(*{"sub"}) => always true.

Note that defined(&sub) or defined(*sub) will already return true, as defined(*sub) autovivifies.

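Also note that checking for the type slot is done by testing the glob slots *symbol{ARRAY}, *symbol{HASH}, *symbol{CODE} and so on with defined, not with exists; *symbol{SCALAR} is the odd one out, since it always returns a reference.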
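A sketch of slot inspection, again with a hypothetical package Demo and a hypothetical helper demo_glob. The stash lookup comes first so that the check itself never creates the glob; *{...}{ARRAY} and *{...}{HASH} are undef for an unused slot, while *{...}{SCALAR} always yields a reference (as perlref documents), so for scalars the value behind it has to be tested.

```perl
use strict;
use warnings;

# Hypothetical package used only for this illustration.
package Demo;
our @list = (1, 2);

package main;

# Hypothetical helper: return the glob for Demo::$name, or undef without creating it.
sub demo_glob {
    my ($name) = @_;
    return exists $Demo::{$name} ? $Demo::{$name} : undef;
}

my $glob = demo_glob('list');
my $has_array  = defined(*{$glob}{ARRAY})  ? 1 : 0;  # @Demo::list was used
my $has_hash   = defined(*{$glob}{HASH})   ? 1 : 0;  # %Demo::list never was
my $has_scalar = defined(*{$glob}{SCALAR}) ? 1 : 0;  # always true: a scalar ref
my $scalar_set = defined(${ *{$glob}{SCALAR} }) ? 1 : 0;  # $Demo::list is undef

my $none = demo_glob('nothing_here');  # undef, and no glob was created

print "$has_array $has_hash $has_scalar $scalar_set\n";
```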

BTW: I just fixed a similar issue in the perl-compiler, where checking a symbol (a scalar in this case) autovivified it. Fixed by this code

The impact of this change: codesearch for defined

CLARIFICATION

I was completely wrong with my definedness assumption, as were other authors. They now have to fix their code, because it is invalid.

Yesterday, after I wrote this blog post, I talked on the #p5p IRC channel for about 5 hours to defend my wrong position, and got persuaded that I was completely off. Thanks. I analyzed my own and other code and found multiple errors caused by my wrong assumption about what defined does and does not do.

How did I come to this odd assumption? The docs (perldoc -f defined) are pretty clear on this.

I mainly do maintenance programming, and in most of my modules defined was used to check for symbol existence in the symbol table hash: defined $symtab{$name}, which only makes sense if the hash value could be undef in some cases, but it never is. The right code would be exists $symtab{$name}. In my case it made no difference because there was never an undef hash value, but in other cases it was just a hard-to-detect bug.

defined(@array) and defined(%hash) have been deprecated in the docs since 5.6, but this is only now enforced, with a warning, in 5.16. It never did anything useful; it just worked by accident.

mst came up with a good example of where it breaks: if you create an array and then delete all of its elements with delete, the array will be empty, but defined will still return true.

$ perl -e'$a[0]=undef; print defined(@a); delete $a[0]; print defined(@a); print "empty" unless @a'

=> 11empty
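That one-liner no longer runs on current perls: since 5.22, defined(@array) and defined(%hash) are compile-time errors. A sketch of the same experiment using only the test the post now recommends (the array in boolean or scalar context), with a lexical @a standing in for the package array:

```perl
use strict;
use warnings;

my @a;
$a[0] = undef;                  # one element, holding undef
my $count_before = scalar @a;   # 1: the element is there even though it is undef

delete $a[0];                   # deleting the last element shrinks the array
my $count_after = scalar @a;    # 0

print "empty\n" unless @a;      # boolean context: true only if @a has elements
```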

Thanks #p5p!

Well, and if you are worried about asking questions or complaining about 'stupid' decisions on the perl5-porters mailing list, where such decisions are made:

Do not worry! The perl community has the thickest skin ever. You might hear some intimidating technical slang you will not understand and which will turn you off. Don't let that stop you; the explanation should be simple to understand. Do complain, and you might get enlightened.

Sorry for the trouble.

29 Comments

Could you be less rude, please?

You may have some good points, but your nastiness makes reading your blog posts really tiring.

I don't know why you take this tone. I find it very off-putting, and wish you would just state your case. Did you do so on the development mailing lists? I don't remember these arguments being made, although it has been a long time.

That is, it wasn't just decided by "the language police" for 5.16. The use of "defined" on arrays and hashes has been marked deprecated in perlfunc since perl 5.005_02! That is, since 1998. That is, for about fourteen years. If it performs some important job that needs to be picked up, then *by all means* bring this to the attention of perl5-porters.

Excuse me, that pod change was not released until 5.6.0, a mere twelve years ago.

That’s just Reini. You have to filter out the persecution complex in his writing in order to get to the meat, but I’ve found it worth the effort. He has done some great work that I’m frankly surprised gets so little attention. Maybe it’s also because in some ways he seems to want Perl to be a language that it’s not, but I find that his ideas there do not detract from his other work. And frankly (again), no one engages him on those questions on the level at which he looks at them – he might well be open to reconsideration if people did do that.

I didn't realize any of this was rude until I read the comments. I think it's nasty because some people start by thinking it's nasty. I sense a lot of defensiveness here to deflect from the real problem that Perl has some dubious and confusing design decisions. I find Dave and Ric's tone off-putting. Why can't either of you just state your case?

Frankly, I think the attitude of "take it to p5p" perfectly illustrates the "language police" jab (and it's just a jab) which Dave and Ric probably find objectionable. There's an attitude that these discussions should be buried in a mailing list that virtually no one reads (compared to the size of the user base) rather than be discussed in "public". The users don't matter, so what's the point of making people aware of these issues?

> But at least we can now officially declare our language specs broken by stupidity.
Hm, isn't stupidity a bit harsh?
If I were to introduce a new feature and have it called stupidity, I would quite certainly take it personally.
I have learned that such things are better discussed by telephone or even face-to-face.

I find the informative part of the post interesting to read.

> I didn't realize any of this was rude until I read the comments. I think it's nasty because some people start by thinking it's nasty. I sense a lot of defensiveness here to deflect from the real problem that Perl has some dubious and confusing design decisions.
I think it's mainly his way of calling people he disagrees with language police. It suggests people aren't willing to listen to him, when he didn't even try to speak up.

Calling people names creates an atmosphere of antagonism, and in this case I don't think anything happened that warrants that. It's not a constructive thing to do.

> Frankly, I think the attitude of "take it to p5p" perfectly illustrates the "language police" jab (and it's just a jab) which Dave and Ric probably find objectionable. There's an attitude that these discussions should be buried in a mailing list that virtually no one reads (compared to the size of the user base) rather than be discussed in "public". The users don't matter, so what's the point of making people aware of these issues?
Blogs may be read by more users than p5p, but the people who actually make the language are on p5p, not on blogs. For better or worse, the mailing list is the place where decisions are made.

That doesn't mean these discussions can't also be held on blogs (I certainly agree we should do such things more often, our communication to the outside sucks), but that should be complementary to discussing them on p5p, not a replacement.

More on topic, I see Reini's use-case, but I don't think defined @foo is a good syntax for it at all. I'd find it more intuitive to have defined *foo{ARRAY} or some such do the same (though that currently doesn't seem to work).

> defined is supposed to be non-destructive operator to not autovivify the accessed element.

> Originally autovivification should just help accessing hash chains by creating intermediate hash values, e.g. $hash->{parent}->{value} would create the parent key and not fail. The safe version would be defined $hash->{parent} and $hash->{parent}->{value}.

defined has nothing to do with autovivification (though it's a common misconception).

doing

  if ($foo->{bar})

will not autovivify the bar key.

  if (defined $foo->{bar})

also won't. (same with exists).

doing

  if ($foo->{bar} and $foo->{bar}->{boo})

won't autovivify bar or boo.

  if (defined $foo->{bar} and $foo->{bar}->{boo})

Neither. But the defined is useless here: avoiding autovivification is done by testing each "chain link" itself. defined doesn't make a difference there.

defined is for testing if something is not undef.

There is always the autovivification pragma from CPAN (usually used as no autovivification 'something') to avoid autovivification...

This behavior would actually be much less annoying if defined *foo{SCALAR} worked like all other defined *foo{THING}, though all of them still autovivify a nonexistent *foo glob.

I agree that porters doesn't have enough discussion. It needs more discussion and review. I hope that in the future you'll consider getting involved in the discussions that do start, or starting new ones.

I don't think that the "inside circle" holds, though. It's true that there is a large group of posters who are more active than others, but the amount of disagreement is large, and *usually* they provide rational arguments to one another. (Sometimes, unfortunately, it's just a mess.) Part of the solution is for more evidence to be provided during these discussions, because they *are* what drive the actual changes.

That is: p5p might be some kind of chamber, but I don't think it's an echo chamber. It's a chamber with a lot of discordant shouting. I hope we can keep the discord and lose the shouting. :)

brian says that *my* tone was unpleasant, above. I apologize. Insanely, I decided it was a good idea to post a reply about thirty seconds after reading your message. Next time, I will go to bed first and reply later. I was frustrated because to me it feels like you have an informed opinion to offer, but have done so in a place that (I assumed) you know is less likely to help change what will happen in perl. Feeling like you were calling me the "stupid" "language police" did not help. (I know that other people say this, and worse, but repetition doesn't lessen the sting.)

I don't know whether closer to "virtually no one" reads p5p or blogs.perl.org, but I know that you're on both, so I implore you: please do let the list know of your thoughts on these things, even if only by posting a link to relevant posts on your blog!

My 5 cents here: I personally couldn't make any use of discussions on p5p, and my very subjective view is that Reini couldn't either, hence the rant. Imagine you're trying to get your point across, cannot, feel ignored, and vent on your blog. But then the police come and say the blog is the wrong place for discussion, and appeal to the tone instead of the message... That's just cheap, people; let's hear the message, not silence it. I do respect people on p5p, but discussions there so often get buried in threads leading nowhere.

All of the use cases for defined except defined($scalar) are better done with exists, scalar, !!, or just removing defined.

  • defined &func === exists &func
  • defined func === exists &func
  • if defined @array === if @array
  • $def = defined @array === $def = !!@array
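A small sketch of these replacements as they run on a current perl; the arrays and the subs helper and declared_only are hypothetical. It also shows where exists &func and defined &func differ: a sub that has only been declared (sub name;) exists but is not defined.

```perl
use strict;
use warnings;

my @items = (1, 2);
my @empty;

sub helper { return 42 }   # hypothetical, fully defined
sub declared_only;         # hypothetical, declared but without a body

my $has_items = @items ? 1 : 0;   # instead of: if defined @items
my $as_bool   = !!@empty;         # instead of: $def = defined @empty ('' here)

my $helper_exists  = exists(&helper)         ? 1 : 0;  # 1
my $helper_defined = defined(&helper)        ? 1 : 0;  # 1
my $decl_exists    = exists(&declared_only)  ? 1 : 0;  # 1: it was declared
my $decl_defined   = defined(&declared_only) ? 1 : 0;  # 0: no body yet

print join(' ', $has_items, ($as_bool ? 1 : 0), $helper_exists,
                $helper_defined, $decl_exists, $decl_defined), "\n";
```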

I find it very sad that this thread is taking this tone while the "why I left Perl" stuff is going on. Please. Everyone. Let's take this down a notch.

Although it got heated, I'm glad there has been an amicable resolution.

Is there something that can be done to make the CPAN authors aware of the now broken code?

Reini's clarification mentions Google Code Search; however, it was unfortunately shut down on 15 January 2012. The Google Code Search Team listed some alternatives.

If the test suite does anything at all, it will let both authors and users know when there is a problem. This is, by the way, an argument for running tests.

When I find such problems I file a bug report. With fix if I can work it out, and for the defined() bug it's pretty easy. The bug reports are either acted on or ignored. I suppose we could spam the CPAN authors, but authors that ignore bug reports and test results may not be likely to respond to mail.

FWIW (perhaps little enough) I've always used

__PACKAGE__->can( 'foo' )
when I wanted to know if a function existed.
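A minimal sketch of that approach, with hypothetical classes Greeter and LoudGreeter: can returns a code reference when the method is found, including through @ISA inheritance, and undef otherwise, so the result can be tested and then called.

```perl
use strict;
use warnings;

# Hypothetical classes used only for this illustration.
package Greeter;
sub hello { return "hello" }

package LoudGreeter;
our @ISA = ('Greeter');   # inherits hello()

package main;

my $direct    = Greeter->can('hello');      # code ref
my $inherited = LoudGreeter->can('hello');  # found via @ISA
my $missing   = Greeter->can('goodbye');    # undef

if ($inherited) {
    my $greeting = $inherited->();
    print "$greeting\n";                    # prints "hello"
}
```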

Worse than a problem of tone is a lack of effective argument where listening is evident so things don't devolve into repeating the same claims back and forth.

Most of the uses of defined(@array) that I have seen have been people who really meant !!@array and had code such that they never ended up with an allocated @array that was empty so defined(@array), for their code, happened to always (or at least usually) match !!@array.

So "fixing" defined(@array) to mean "the named variable has been declared or came into being by having been initialized" would actually break a bunch of Perl code. And it would /silently/ break a bunch of Perl code.

So it is much better to noisily break the code that sometimes or often or always (so far) "worked" so that it can be replaced with code that always works.

If defined(@array) hadn't become used incorrectly, then it might be reasonable to "fix" it to be useful for people doing introspection.

But the fact that defined(@array) has often been in use /not/ for introspection means that it may be an inappropriate tool for that. Even if all of the existing misuse of defined(@array) for checking "@array isn't empty" were eliminated, leaving defined(@array) around will lead to it being misused again unless it almost never works.

Granted, if defined(@array) were to mean "variable has been declared or created", then it would less often match "@array isn't empty". But I think it would match often enough that there would still be people using defined(@array) incorrectly when they meant "@array is not empty". That may not be big enough of a problem to justify never getting to the point when defined(@array) is introspection. But it should be enough reason to cause caution in pursuing that.

But I personally think it is clearly a bad idea to jump to defined(@array) as introspection without first making it fatal for a while in order to prevent a lot of "silent change" problems.

I'm not convinced that defined(@array) /looks/ like introspection, but that's a weaker argument against it. I prefer advanced things like introspection to not look so mundane.

If there is existing module code that manages to use defined(@array) for introspection effectively, I actually have less sympathy for such advanced code containing the less accurate test (over testing via the glob).

The best improvement I could see on that front would be to officially document how to do introspection. Even provide a core module dedicated to introspection, blessed by p5p.

The case of introspecting scalars, I believe, has long been acknowledged as slightly unfortunate but "fixing" it would require adding at least a boolean to every single glob. I don't know if that can be done without a space penalty. The CPU penalty would probably be minimal. But those still mean there should be some caution when it comes to "just fix it".

I don't have a great feel for how important "don't be forced to /assume/ that $foo hasn't been declared/created just because it currently contains undef and @foo or %foo has already been declared/created" is compared to the cost of enabling such.

I can see how people on either side of these issues could have a narrower view that results in them seeing the other side as "stupid". Demonstrating one's ignorance or narrowed focus by declaring such things seems less effective...

Clinging to the assessment of "that's stupid" is even worse (even if you never state it), because it prevents you from understanding the larger picture which prevents you from addressing other people's concerns which prevents you from improving things.

> Worse than a problem of tone is a lack of effective argument where listening is evident so things don’t devolve into repeating the same claims back and forth.

Thank you for doing a much better job of saying what I tried to (re: people rarely engaging Reini).

> The case of introspecting scalars, I believe, has long been acknowledged as slightly unfortunate but “fixing” it would require adding at least a boolean to every single glob. I don’t know if that can be done without a space penalty. The CPU penalty would probably be minimal. But those still mean there should be some caution when it comes to “just fix it”.

Actually, there has been a patch, which as far as I remember had no downside at all. (I do not have the thread(s) handy I’m afraid, it is resisting my attempts to find it.) I have advocated this for a long time, because I firmly believe that all code which would be broken by this change (of which I have some) is only code that works around the current behaviour. But people (Nick, I think? who wrote the patch) were (quite understandably) nervous and cautious about it, so it didn’t make it in, so far.

I would be really happy to see this happen – no matter what the cost in breakage even, because the current behaviour makes a number of things quite simply impossible.

Reini, I don't know how to get at codesearch now. I see that I can search Google Code hosted projects; is that all of it, or can you get to more somehow? Searching GitHub directly might help supplement.

One major source of this might be Advanced Perl Programming with the following:

Example 6.2: Dumping All Symbols in a Package

package DUMPVAR;
sub dumpvar {
    my ($packageName) = @_;
    local (*alias);                  # a local typeglob
    # We want to get access to the stash corresponding to the package name
    *stash = *{"${packageName}::"};  # Now %stash is the symbol table
    $, = " ";                        # Output separator for print
    # Iterate through the symbol table, which contains glob values
    # indexed by symbol names.
    while (($varName, $globValue) = each %stash) {
        print "$varName ============================= \n";
        *alias = $globValue;
        if (defined ($alias)) {
            print "\t \$$varName $alias \n";
        }
        if (defined (@alias)) {
            print "\t \@$varName @alias \n";
        }
        if (defined (%alias)) {
            print "\t \%$varName ",%alias," \n";
        }
    }
}
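For comparison, a sketch of how the same dump could be written for perls since 5.22, where defined(@alias) and defined(%alias) are compile-time errors. This is not the book's code: the package Sample and its variables are hypothetical, the report is returned as a string rather than printed piecemeal, and glob slots (*{$glob}{ARRAY}, *{$glob}{HASH}) replace the removed tests. Because *{$glob}{SCALAR} always returns a reference, the scalar's value is tested instead.

```perl
use strict;
use warnings;

# Hypothetical package used only to exercise the dumper.
package Sample;
our $name  = 'demo';
our @items = (1, 2, 3);
our %opts  = (verbose => 1);

package main;

# Report the scalar, array and hash slots of every glob in a package,
# inspecting glob slots instead of defined(@x) / defined(%x).
sub dumpvar {
    my ($package_name) = @_;
    my $stash = do { no strict 'refs'; \%{"${package_name}::"} };
    my $out   = '';
    for my $sym (sort keys %$stash) {
        my $glob = $stash->{$sym};
        # Newer perls may keep plain references (e.g. for subs) in a stash;
        # only real globs have slots to inspect.
        next unless ref \$glob eq 'GLOB';
        $out .= "$sym =============================\n";
        my $sref = *{$glob}{SCALAR};   # always a reference, so test the value
        $out .= "\t\$$sym $$sref\n" if defined $$sref;
        if (my $aref = *{$glob}{ARRAY}) {
            $out .= "\t\@$sym @$aref\n";
        }
        if (my $href = *{$glob}{HASH}) {
            $out .= "\t%$sym "
                  . join(' ', map { "$_=$href->{$_}" } sort keys %$href) . "\n";
        }
    }
    return $out;
}

print dumpvar('Sample');
```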

Hi Reini, hi all, very sorry to jump in this way, but even after reading this article many times I can't find a way to solve my problem with defined.
Most web articles say defined(%hash) is useless, but how can one test whether the value under a key of a hash of hashes is itself a hash or just an end value, before running "keys %hash" to get the unknown subkey names? (Meaning I cannot do exists $hoh{$key}->{$subkey}, since the subkey is unknown.)

With perl 5.10, defined does the trick, but now that I have updated to 5.16, my script is broken.

My script take a hash of hash and dump its architecture into a file, like Dump::hash but in a custom way.

Any suggestion to replace defined ?
Thank you very much for your help !

sub dump_hash
{ # dump_hash_osa(hashname, filename)
    my $hashname=shift;
    my $filename=shift;
    my $string;
    open(CHAR, ">$filename");
    $string=print_hash_elmt(\%{$hashname}, "");

    print CHAR $string;
    close CHAR;
}
sub print_hash_elmt
{
    my $hashname=shift;
    my $tmp_line=shift;
    my $elm,$elm2,$elm3='';
    my $string='';
    my $stringr='';
    foreach $elm ( sort sort {$a $b} keys %{$hashname}) {
        if ($elm =~ m,\s,) {$elm2="\"$elm\"";}else {$elm2=$elm;}
        if (defined %{$$hashname{$elm}}){
            ##--- KEY IS HASH ----
            $string = print_hash_elmt(\%{$$hashname{$elm}}, $tmp_line."$elm2\t");
        } else {
            ##-- KEY IS VALUE --
            if (($$hashname{$elm} =~ m,\s,)&&($$hashname{$elm} !~ m,^\".*\"$,))
            { $elm3="\"$$hashname{$elm}\"" ;}
            else {$elm3=$$hashname{$elm};}

            $string = $tmp_line. "$elm2\t$elm3\n";
        }
        $stringr .= $string;
    }


blogs.perl.org is a blogging platform, not a support forum. You should generally consider asking questions on other platforms, like Stack Overflow or PerlMonks.

To determine what type a reference is (in Perl 5, all nested data structures consist of references), you need to use the "ref" builtin (see perldoc -f ref or here: http://perldoc.perl.org/functions/ref.html ).
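A minimal sketch of the ref check on a small, made-up hash of hashes: ref returns 'HASH' for a hash reference, 'ARRAY' for an array reference, and the empty string for a plain value, so it can decide whether to recurse before keys is ever called.

```perl
use strict;
use warnings;

my %hoh = (
    server => { host => 'localhost', port => 8080 },  # nested hash
    name   => 'demo',                                 # plain value
    tags   => [ 'a', 'b' ],                           # array ref
);

my %kind;
for my $key (sort keys %hoh) {
    my $value = $hoh{$key};
    $kind{$key} = ref $value eq 'HASH'  ? 'hash'
                : ref $value eq 'ARRAY' ? 'array'
                : ref $value eq ''      ? 'value'
                :                         'other';
}

# Only entries classified as 'hash' are safe to pass to keys %$value.
for my $key (sort keys %kind) {
    print "$key: $kind{$key}\n";
}
```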

I took the liberty of rewriting your code a bit, so it conforms to recent best practices a little more:


use strict;
use warnings;

sub dump_hash {
    my ($hoh, $filename) = @_;
    open(my $file, '>', $filename) or die "Cannot open $filename: $!";
    print {$file} print_hoh($hoh);
    close($file) or die "Cannot close $filename: $!";
}

sub print_hoh {
    my ($hoh, $line_so_far) = @_;
    $line_so_far //= '';
    my $lines = '';

    for my $key (sort keys %$hoh) {
        my $value = $hoh->{$key};

        # ref() tells a nested hash apart from a plain value,
        # replacing the old defined(%{...}) test
        if (ref $value eq 'HASH') {
            $lines .= print_hoh($value, $line_so_far . quote($key) . "\t");
        } else {
            $lines .= $line_so_far . quote($key) . "\t" . quote($value) . "\n";
        }
    }
    return $lines;
}

sub quote {
    my $string = shift;
    return '' unless defined $string;
    # Wrap strings containing whitespace in double quotes
    return $string =~ m/\s/ ? qq{"$string"} : $string;
}

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.