On defined(@array) and defined(%hash)
Perl is somewhat broken as a language, as it autovivifies symbol values when accessing them.
Clarification because this post has technical errors:
The following is from a naive understanding of the hypothetical defined operator as it is known from other computer languages or the C preprocessor. Perl's defined was invented to check for the undef value, but is often and falsely used to check for definedness of a symbol.
My understanding, coming from a CS background, was that defined should check for the existence of the symbol type slot, without creating the symbol and slot. "Does this symbol exist?" This is wrong. To check for the symbol being defined, use exists in the symbol hash, the "stash".
There were horrible errors in the wild, which you can see not so much on CPAN but more via Google Code Search, where defined was meant to check symbol existence but instead created the symbol. Typical defined(@array) errors have been found in the Linux kernel, in OpenSSL, everywhere. So the deprecation warning now with 5.15.7 is a very good thing; it opened my eyes.
====================================
defined is supposed to be a non-destructive operator that does not autovivify the accessed element.
Originally autovivification should just help accessing hash chains by creating intermediate hash values, e.g. $hash->{parent}->{value} would create the parent key and not fail. The safe version would be defined $hash->{parent} and $hash->{parent}->{value}.
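The difference between the plain chain access and the guarded version can be sketched as follows. This is a minimal illustration, not code from the original post; the variable names are made up for the example.

```perl
use strict;
use warnings;

# A plain read of a nested element autovivifies the intermediate hash.
my $hash  = {};
my $value = $hash->{parent}->{value};               # read only, yields undef
my $vivified = exists $hash->{parent} ? 1 : 0;      # 'parent' was created by the read

# Guarding the first link leaves the structure untouched.
my $safe    = {};
my $guarded = defined $safe->{parent} && $safe->{parent}->{value};
my $untouched = exists $safe->{parent} ? 0 : 1;     # nothing was created

print "unguarded read created parent: $vivified\n";
print "guarded read left hash empty: $untouched\n";
```

The guard works because the second operand is never evaluated when the first link is missing, so no dereference of an undefined value takes place.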
You can blame Chip for that design. But it did not stay with hash keys only; it went on to symbols.
Also perl went an interesting way and invented exists:
exists $hash->{parent} and $hash->{parent}->{value};
This was the next spec problem, as defined is supposed to return false if the key exists but the value is undef.
So defined(*sub) got broken also. It will always return true; you have to use defined(*{"sub"}).
The next, bigger spec error here just happened with 5.16, as the language police decided that defined(@array) and defined(%hash) are illegal and will warn.
defined(@array) checks if the symbol *array exists without creating it, and then checks if the AV slot of the symbol is defined, is not empty. It does not check if the @array value itself is empty.
OK, now with 5.16 p5p ruled that if (@array) is semantically the same as if defined(@array), because if does not autovivify. How does one know when if does not autovivify? Only defined is the keyword which does not autovivify for sure.
Same for defined(%hash) or defined(&sub) or defined($scalar). Just that defined(%hash) and defined(@array) are now forbidden.
Or should defined(&sub) now create the *sub symbol just by looking at it? That would be the next design error, which I can foresee.
defined(&sub) or defined(*{"sub"})
=> always true
Note that defined(&sub) or defined(*sub) will return true already, as defined(*sub) autovivifies.
Also note that checking for the typeslot is done by checking exists of *symbol{SCALAR}, *symbol{ARRAY}, *symbol{CODE} and so on.
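A small sketch of both checks, looking a name up in the stash with exists and then testing a single type slot of a glob. The helper symbol_exists and the variable names are hypothetical, chosen only for this illustration, and the example assumes package main.

```perl
use strict;
use warnings;

our $known = 42;        # mentioning $known creates the glob *main::known
our @list  = (1, 2);    # fills the ARRAY slot of *main::list

# Hypothetical helper: true if $name has an entry in the stash of package main.
# exists on the stash hash does not create the entry.
sub symbol_exists {
    my ($name) = @_;
    return exists $main::{$name} ? 1 : 0;
}

# *glob{ARRAY} yields undef when the ARRAY slot was never used.
my $list_has_array  = defined(*list{ARRAY})  ? 1 : 0;
my $known_has_array = defined(*known{ARRAY}) ? 1 : 0;

print "known in stash: ",   symbol_exists('known'), "\n";
print "missing in stash: ", symbol_exists('no_such_symbol'), "\n";
print "list has ARRAY slot: $list_has_array\n";
print "known has ARRAY slot: $known_has_array\n";
```

Note that the SCALAR slot behaves differently from the others, a point that comes up again in the comments, so this sketch only tests the ARRAY slot.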
BTW: I just fixed a similar issue in the perl-compiler, where checking a symbol - a scalar in this case - autovivified it. Fixed by this code
The impact of this change: codesearch for defined
CLARIFICATION
I was completely wrong with my defined'ness assumption, as were other authors. They now have to fix their code, because it is invalid.
Yesterday after I wrote this blog post, I talked on the #p5p IRC channel for about 5 hours to defend my wrong position, and got persuaded that I was completely off. Thanks. I analyzed my and other code and found multiple errors stemming from my wrong assumption about what defined does and does not do.
How did I come to this odd assumption?
The docs (perldoc -f defined) are pretty clear on this.
I mainly do maintenance programming, and in most of my modules defined was used to check for symbol existence in the symbol table hash:
defined $symtab{$name}
which only makes sense if the hash key would return undef in some cases, but it does not. The right code would be exists $symtab{$name}.
In my case it made no difference because there was never an undef as hash value, but in other cases it was just a hard-to-detect bug.
defined(@array) and defined(%hash) have been deprecated in the docs since 5.6, but only now enforced with 5.16. It never did anything useful, it just worked by accident.
mst came with a good example when it was broken. If you create an array, and then delete all array elements with delete, the array will be empty, but defined will return true.
$ perl -e'$a[0]=undef; print defined(@a); delete $a[0]; print defined(@a); print "empty" unless @a'
=> 11empty
Thanks #p5p!
Well, and if you are worried to ask questions or complain about 'stupid' decisions on the perl5-porters mailing list where such decisions are made:
Do not worry! The perl community has the thickest skin ever. You might hear some intimidating technical slang you will not understand and which will turn you off. Do not care, the explanation should be simple to understand. Do complain and you might get enlightened.
Sorry for the trouble.
Could you be less rude, please?
You may have some good points, but your nastiness makes reading your blog posts really tiring.
I don't know why you take this tone. I find it very off-putting, and wish you would just state your case. Did you do so on the development mailing lists? I don't remember these arguments being made, although it has been a long time.
That is, it wasn't just decided by "the language police" for 5.16. The use of "defined" on arrays and hashes has been marked deprecated in perlfunc since perl 5.005_02! That is, since 1998. That is, for about fourteen years. If it performs some important job that needs to be picked up, then *by all means* bring this to the attention of perl5-porters.
Excuse me, that pod change was not released until 5.6.0, a mere twelve years ago.
That’s just Reini. You have to filter out the persecution complex in his writing in order to get to the meat, but I’ve found it worth the effort. He has done some great work that I’m frankly surprised gets so little attention. Maybe it’s also because in some ways he seems to want Perl to be a language that it’s not, but I find that his ideas there do not detract from his other work. And frankly (again), no one engages him on those questions on the level at which he looks at them – he might well be open to reconsideration if people did do that.
I didn't realize any of this was rude until I read the comments. I think it's nasty because some people start by thinking it's nasty. I sense a lot of defensiveness here to deflect from the real problem that Perl has some dubious and confusing design decisions. I find Dave and Ric's tone off-putting. Why can't either of you just state your case?
Frankly, I think the attitude of "take it to p5p" perfectly illustrates the "language police" jab (and it's just a jab) which Dave and Ric probably find objectionable. There's an attitude that these discussions should be buried in a mailing list that virtually no one reads (compared to the size of the user base) rather than be discussed in "public". The users don't matter, so what's the point of making people aware of these issues?
> But at least we can now officially declare our language specs broken by stupidity.
Hm, isn't stupidity a bit harsh?
If I were to introduce a new feature and have it called stupidity, I would quite certainly take it personally.
I have learned that such things are better discussed by telephone or even face-to-face.
I find the informative part of the post interesting to read.
Calling people names creates an atmosphere of antagonism, and in this case I don't think anything happened that warrants that. It's not a constructive thing to do.
Blogs may be read by more users than p5p, but the people who actually make the language are on p5p, not on blogs. For better or worse the mailing list is the place where decisions are made. That doesn't mean these discussions can't also be held on blogs (I certainly agree we should do such things more often, our communication to the outside sucks), but that should be complementary to discussing them on p5p, not a replacement.
More on topic, I see Reini's use-case, but I don't think defined @foo is a good syntax for it at all. I'd find it more intuitive to have defined *foo{ARRAY} or some such do the same (though that currently doesn't seem to work).
defined has nothing to do with autovivification (though it's a common misconception).
doing
will not autovivify the bar key.
also won't. (same with exists).
doing
won't autovivify bar or boo.
neither. but the defined is useless here. avoiding autovivification is done by testing each "chain link" itself. defined doesn't make a difference there.
defined is for testing if something is not undef.
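The commenter's snippets did not survive in this copy, so the following is only a sketch of the general point, with made-up hash names: wrapping a nested lookup in exists (or defined) does not stop the intermediate level from being created, while testing each link in turn does.

```perl
use strict;
use warnings;

# exists on a nested element still autovivifies the intermediate level.
my %h;
my $found       = exists $h{bar}{boo} ? 1 : 0;   # boo is not there
my $bar_created = exists $h{bar}      ? 1 : 0;   # but bar now is

# Testing each chain link short-circuits before any dereference happens.
my %g;
my $found_safely = (exists $g{bar} && exists $g{bar}{boo}) ? 1 : 0;
my $g_untouched  = %g ? 0 : 1;                   # %g is still empty

print "nested exists: found=$found, bar created=$bar_created\n";
print "link by link:  found=$found_safely, untouched=$g_untouched\n";
```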
There is always autovivify pragma (usually used as no autovivify 'something') to avoid autovivification...
This behavior would actually be much less annoying if defined *foo{SCALAR} worked like all other defined *foo{THING}, though all of them still autovivify a nonexistent *foo glob.
Sorry for the rudeness in calling these decisions stupid, but that is what they are called from outside. And I heard stronger words. I did not call people names, I question the process. You need to step outside of the p5p view and see the broader picture.
There's no discussion culture on the porters list, just an inside circle, so nobody spoke up against it, not to disturb the circle. Certainly not me anymore.
And as the specs are already broken, why bother.
Well, CPAN authors bother now because their tests break now for no apparent reasons.
Technically:
defined *foo{ARRAY} should have been the same as defined @foo. But only the first worked as it should, and the second was never fixed but rather discouraged. Why bother with the simple syntax when the obscure works?
Should defined check for not undef or for defined'ness?
I leave that to you. There are enough old blog posts about this also.
BTW: My main point was that defined(*foo) autovivifies but that is already broken, so they broke two more and gave up on defined, which is needed for defined(&sub).
Giving up on defined means breaking a lot and creating a very bad atmosphere.
I agree that porters doesn't have enough discussion. It needs more discussion and review. I hope that in the future you'll consider getting involved in the discussions that do start, or starting new ones.
I don't think that the "inside circle" holds, though. It's true that there is a large group of posters who are more active than others, but the amount of disagreement is large, and *usually* they provide rational arguments to one another. (Sometimes, unfortunately, it's just a mess.) Part of the solution is for more evidence to be provided during these discussions, because they *are* what drive the actual changes.
That is: p5p might be some kind of chamber, but I don't think it's an echo chamber. It's a chamber with a lot of discordant shouting. I hope we can keep the discord and lose the shouting. :)
brian says that *my* tone was unpleasant, above. I apologize. Insanely, I decided it was a good idea to post a reply about thirty seconds after reading your message. Next time, I will go to bed first and reply later. I was frustrated because to me it feels like you have an informed opinion to offer, but have done so in a place that (I assumed) you know is less likely to help change what will happen in perl. Feeling like you were calling me the "stupid" "language police" did not help. (I know that other people say this, and worse, but repetition doesn't lessen the sting.)
I don't know whether closer to "virtually no one" reads p5p or blogs.perl.org, but I know that you're on both, so I implore you: please do let the list know of your thoughts on these things, even if only by posting a link to relevant posts on your blog!
Oh, I forgot the most important point.
"In Perl there is now only one way to do it."
We have a police, take care.
Alternative ways are actively discouraged, although they worked for decades.
My 5 cents here: I personally couldn't make any use of discussions on p5p, and my very subjective view is that Reini couldn't either, therefore the rant. Imagine you're trying to get your point through, cannot, feel ignored, and venting it out to your blog. But then comes police and says the blog is the wrong place for discussion, and appeals to the tone instead of the message ... it's just cheap people, let's hear the message, not silence it. I do respect people on p5p but discussions there so often get buried in discussions leading nowhere.
All of the use cases for defined except defined($scalar) are better done with exists, scalar, !! or just removing defined.
defined &func === exists &func
defined func === exists &func
if defined @array === if @array
$def = defined @array === $def = !!@array
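A short sketch of these replacements, with hypothetical sub and variable names. One caveat worth seeing in running code: exists &func and defined &func are close but not identical, since a sub that was only forward-declared exists without being defined.

```perl
use strict;
use warnings;

sub declared_only;                  # forward declaration without a body
sub with_body { return 1 }

my @empty;
my @filled = (0);

# An array in boolean context is true when it has at least one element.
my $empty_is_true  = @empty  ? 1 : 0;
my $filled_is_true = @filled ? 1 : 0;   # one element, even though that element is 0
my $def            = !!@filled;         # replacement for "$def = defined @array"

my $body_defined     = defined(&with_body)     ? 1 : 0;
my $declared_exists  = exists(&declared_only)  ? 1 : 0;
my $declared_defined = defined(&declared_only) ? 1 : 0;

print "empty: $empty_is_true, filled: $filled_is_true\n";
print "with_body defined: $body_defined\n";
print "declared_only exists: $declared_exists, defined: $declared_defined\n";
```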
So I was told that I was completely wrong, and so were all the authors with their usage of defined(@array) and defined(%hash), so please fix it.
I found bugs in the linux kernel, in openssl, god knows where else.
I find it very sad that this thread is taking this tone while the "why I left Perl" stuff is going on. Please. Everyone. Let's take this down a notch.
Although it got heated, I'm glad there has been an amicable resolution.
Is there something that can be done to make the CPAN authors aware of the now broken code?
Reini's clarification mentions Google Code Search; however, it was unfortunately shut down on 15 January 2012. The Google Code Search Team listed some alternatives.
If the test suite does anything at all, it will let both authors and users know when there is a problem. This is, by the way, an argument for running tests.
When I find such problems I file a bug report. With fix if I can work it out, and for the defined() bug it's pretty easy. The bug reports are either acted on or ignored. I suppose we could spam the CPAN authors, but authors that ignore bug reports and test results may not be likely to respond to mail.
FWIW (perhaps little enough) I've always used
when I wanted to know if a function existed.
Worse than a problem of tone is a lack of effective argument where listening is evident so things don't devolve into repeating the same claims back and forth.
Most of the uses of defined(@array) that I have seen have been people who really meant !!@array and had code such that they never ended up with an allocated @array that was empty so defined(@array), for their code, happened to always (or at least usually) match !!@array.
So "fixing" defined(@array) to mean "the named variable has been declared or came into being by having been initialized" would actually break a bunch of Perl code. And it would /silently/ break a bunch of Perl code.
So it is much better to noisily break the code that sometimes or often or always (so far) "worked" so that it can be replaced with code that always works.
If defined(@array) hadn't become used incorrectly, then it might be reasonable to "fix" it to be useful for people doing introspection.
But the fact that defined(@array) has often been in use /not/ for introspection means that it may be an inappropriate tool for that. Even if all of the existing misuse of defined(@array) for checking "@array isn't empty" were eliminated, leaving defined(@array) around will lead to it being misused again unless it almost never works.
Granted, if defined(@array) were to mean "variable has been declared or created", then it would less often match "@array isn't empty". But I think it would match often enough that there would still be people using defined(@array) incorrectly when they meant "@array is not empty". That may not be big enough of a problem to justify never getting to the point when defined(@array) is introspection. But it should be enough reason to cause caution in pursuing that.
But I personally think it is clearly a bad idea to jump to defined(@array) as introspection without first making it fatal for a while in order to prevent a lot of "silent change" problems.
I'm not convinced that defined(@array) /looks/ like introspection, but that's a weaker argument against it. I prefer advanced things like introspection to not look so mundane.
If there is existing module code that manages to use defined(@array) for introspection effectively, I actually have less sympathy for such advanced code containing the less accurate test (over testing via the glob).
The best improvement I could see on that front would be to officially document how to do introspection. Even provide a core module dedicated to introspection, blessed by p5p.
The case of introspecting scalars, I believe, has long been acknowledged as slightly unfortunate but "fixing" it would require adding at least a boolean to every single glob. I don't know if that can be done without a space penalty. The CPU penalty would probably be minimal. But those still mean there should be some caution when it comes to "just fix it".
I don't have a great feel for how important "don't be forced to /assume/ that $foo hasn't been declared/created just because it currently contains undef and @foo or %foo has already been declared/created" is compared to the cost of enabling such.
I can see how people on either side of these issues could have a narrower view that results in them seeing the other side as "stupid". Demonstrating one's ignorance or narrowed focus by declaring such things seems less effective...
Clinging to the assessment of "that's stupid" is even worse (even if you never state it), because it prevents you from understanding the larger picture which prevents you from addressing other people's concerns which prevents you from improving things.
@Nick: google codesearch is still working and they (Russ Cox) even open sourced it. Try cindex and csearch and you will be amazed. Much faster than book's search_cpan.pl
Generally:
I think we should adopt `exists` for checking existence of global, lexical and local symbols ("variables") and all types, with non-autovivification of symbols. This would clean up the difficulties with defined and help in the case of autovivification of lexicals. Think of lexical gensyms.
See http://article.gmane.org/gmane.comp.lang.perl.perl5.porters/105972
Thank you for doing a much better job of saying what I tried to (re: people rarely engaging Reini).
Actually, there has been a patch, which as far as I remember had no downside at all. (I do not have the thread(s) handy I’m afraid, it is resisting my attempts to find it.) I have advocated this for a long time, because I firmly believe that all code which would be broken by this change (of which I have some) is only code that works around the current behaviour. But people (Nick, I think? who wrote the patch) were (quite understandably) nervous and cautious about it, so it didn’t make it in, so far.
I would be really happy to see this happen – no matter what the cost in breakage even, because the current behaviour makes a number of things quite simply impossible.
Reini, I don't know how to get at codesearch now. I see that I can search googlecode hosted projects, is that all of it or can you get to more somehow? Searching github directly might help supplement.
One major source of this might be Advanced Perl Programming with the following:
Example 6.2: Dumping All Symbols in a Package
package DUMPVAR;
sub dumpvar {
    my ($packageName) = @_;
    local (*alias);             # a local typeglob
    # We want to get access to the stash corresponding to the package
    # name
    *stash = *{"${packageName}::"};  # Now %stash is the symbol table
    $, = " ";                   # Output separator for print
    # Iterate through the symbol table, which contains glob values
    # indexed by symbol names.
    while (($varName, $globValue) = each %stash) {
        print "$varName ============================= \n";
        *alias = $globValue;
        if (defined ($alias)) {
            print "\t \$$varName $alias \n";
        }
        if (defined (@alias)) {
            print "\t \@$varName @alias \n";
        }
        if (defined (%alias)) {
            print "\t \%$varName ",%alias," \n";
        }
    }
}
Hi Reini, hi all, very sorry to jump in this way, but even after reading this article many times I can't find a way to solve my problem with defined.
Most web articles say defined(%hash) is useless, but how can one test if a key of a hash of hashes is itself a hash or just an end value, before running a "keys %hash" to get the unknown subkey names? (Meaning I cannot do exists $hoh{$key}->{$subkey}, since the subkey is unknown.)
With perl 5.10, defined does the trick, but now that I have updated to 5.16, my script is broken.
My script takes a hash of hashes and dumps its architecture into a file, like Dump::hash but in a custom way.
Any suggestion to replace defined ?
Thank you very much for your help !
sub dump_hash
{ # dump_hash_osa(hashname, filename)
    my $hashname=shift;
    my $filename=shift;
    my $string;
    open(CHAR, ">$filename");
    $string=print_hash_elmt(\%{$hashname}, "");
    print CHAR $string;
    close CHAR;
}
sub print_hash_elmt
{
    my $hashname=shift;
    my $tmp_line=shift;
    my $elm,$elm2,$elm3='';
    my $string='';
    my $stringr='';
    foreach $elm ( sort sort {$a <=> $b} keys %{$hashname}) {
        if ($elm =~ m,\s,) {$elm2="\"$elm\"";}else {$elm2=$elm;}
        if (defined %{$$hashname{$elm}}){
            ##--- KEY IS HASH ----
            $string = print_hash_elmt(\%{$$hashname{$elm}}, $tmp_line."$elm2\t");
        } else {
            ##-- KEY IS VALUE --
            if (($$hashname{$elm} =~ m,\s,)&&($$hashname{$elm} !~ m,^\".*\"$,))
                { $elm3="\"$$hashname{$elm}\"" ;}
            else {$elm3=$$hashname{$elm};}
            $string = $tmp_line. "$elm2\t$elm3\n";
        }
        $stringr .= $string;
    }
blogs.perl.org is a blogging platform, not a support forum. You should generally consider asking questions on other platforms, like stackoverflow or perlmonks.
To determine what type a reference is (in Perl5 all nested data structures consist of references) you need to use the "ref" builtin (see perldoc -f ref or here: http://perldoc.perl.org/functions/ref.html ).
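As a minimal sketch of the ref-based test (not the rewrite of the script itself; flatten_hash and the sample data are hypothetical names for this illustration), a recursive walk can branch on ref instead of defined %{...}:

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a hash of hashes into tab-separated lines,
# one line per end value, prefixed by the chain of keys leading to it.
sub flatten_hash {
    my ($node, $prefix) = @_;
    my $out = '';
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        if (ref $value eq 'HASH') {
            # The value is itself a hash: recurse with a longer prefix.
            $out .= flatten_hash($value, "$prefix$key\t");
        }
        else {
            # The value is an end value (undef is printed as empty).
            $out .= "$prefix$key\t" . (defined $value ? $value : '') . "\n";
        }
    }
    return $out;
}

my %config = (
    server => { host => 'localhost', port => 8080 },
    name   => 'demo',
);
print flatten_hash(\%config, '');
```

ref returns the string HASH for a hash reference and an empty string for a plain scalar, so the branch never dereferences an end value.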
I took the liberty of rewriting your code a bit, so it conforms to recent best practices a little more: