On defined(@array) and defined(%hash)

Perl is somewhat broken as a language, because it autovivifies symbol values when accessing them.

Clarification because this post has technical errors:

The following comes from a naive understanding of a hypothetical defined operator as it is known from other computer languages or the C preprocessor. Perl's defined was invented to check for the undef value, but is often and wrongly used to check whether a symbol is defined.

My understanding, coming from a CS background, was that defined should check for the existence of the symbol's type slot without creating the symbol and slot: "Does this symbol exist?" This is wrong. To check whether a symbol exists, use exists on the symbol table hash, the "stash".
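A minimal sketch of such a stash lookup, assuming package main. The variable @known, the name never_declared_anywhere and the helper symbol_exists are made up for illustration:

```perl
use strict;
use warnings;

our @known = (1, 2, 3);    # hypothetical package variable, creates *main::known

# Hypothetical helper: look the name up in the main:: stash.
# exists on the stash hash does not create the symbol.
sub symbol_exists {
    my ($name) = @_;
    return exists $main::{$name} ? 1 : 0;
}

print "known: ", symbol_exists('known'), "\n";
print "never_declared_anywhere: ", symbol_exists('never_declared_anywhere'), "\n";
```

The second lookup should stay false however often it is repeated, because the name is only ever used as a hash key.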

There were horrible errors in the wild, visible not so much on CPAN but more via Google Code Search, where defined was meant to check symbol existence but instead created the symbol. Typical defined(@array) errors have been found in the Linux kernel, in OpenSSL, everywhere. So the deprecation warning that now comes with 5.15.7 is a very good thing; it opened my eyes.

====================================

defined is supposed to be a non-destructive operator that does not autovivify the accessed element.

Originally autovivification was just meant to help accessing hash chains by creating intermediate hash values, e.g. $hash->{parent}->{value} would create the parent key and not fail. The safe version would be defined $hash->{parent} and $hash->{parent}->{value}. You can blame Chip for that design. But it did not stay with hash keys only; it went on to symbols.
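A small sketch of the difference, using two hypothetical hashes %tree and %safe. A plain read of a nested key creates the intermediate key, while a guarded read leaves the hash alone:

```perl
use strict;
use warnings;

my %tree;                                       # hypothetical, empty hash
my $value    = $tree{parent}{value};            # plain read of a nested key
my $vivified = exists $tree{parent} ? 1 : 0;    # the read created 'parent'

my %safe;                                       # second hypothetical hash
my $guarded   = $safe{parent} && $safe{parent}{value};   # check the parent first
my $untouched = exists $safe{parent} ? 0 : 1;            # 'parent' was not created

print "plain read created parent: $vivified\n";
print "guarded read left the hash alone: $untouched\n";
```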

Also perl went an interesting way and invented exists: exists $hash->{parent} and $hash->{parent}->{value};

This was the next spec problem as defined is supposed to return false if the key exists but the value is undef.

So defined(*sub) got broken also. It will always return true, you have to use defined(*{"sub"}).

The next, bigger spec error just happened with 5.16, as the language police decided that defined(@array) and defined(%hash) are illegal and will warn.

defined(@array) checks if the symbol *array exists without creating it, and then checks if the AV slot of the symbol is defined, is not empty. It does not check if the @array value itself is empty.

OK, now with 5.16 p5p ruled that if (@array) is semantically the same as if defined(@array), because if does not autovivify. But how does one know that if does not autovivify? Only defined is the keyword which does not autovivify for sure.

Same for defined(%hash) or defined(&sub) or defined($scalar). Just that defined(%hash) and defined(@array) are now forbidden.

Or should defined(&sub) now create the *sub symbol just by looking at it? That would be the next design error, which I can foresee. defined(&sub) or defined(*{"sub"}) => always true

Note that defined(&sub) or defined(*sub) will return true already as defined(*sub) autovivifies.

Also note that checking for the typeslot is done by checking exists of *symbol{SCALAR}, *symbol{ARRAY}, *symbol{CODE} and so on.
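A hedged sketch of such a type slot check. It assumes the test is done with defined on *glob{THING} after an exists lookup in the stash; has_slot and @list are made-up names. The SCALAR slot is left out, because perlref documents that *glob{SCALAR} always returns a reference.

```perl
use strict;
use warnings;

our @list = (1, 2);    # hypothetical: gives *main::list an ARRAY slot

# Hypothetical helper: is the slot $slot (ARRAY, HASH, ...) filled for $name?
sub has_slot {
    my ($name, $slot) = @_;
    return 0 unless exists $main::{$name};     # no such symbol at all
    my $entry = $main::{$name};
    return 0 unless ref(\$entry) eq 'GLOB';    # stash entry is not a plain glob
    return defined(*{$entry}{$slot}) ? 1 : 0;
}

printf "list ARRAY=%d HASH=%d\n", has_slot('list', 'ARRAY'), has_slot('list', 'HASH');
printf "no_such_symbol_here ARRAY=%d\n", has_slot('no_such_symbol_here', 'ARRAY');
```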

BTW: I just fixed a similar issue in the perl-compiler, where checking a symbol - a scalar in this case - autovivified it. Fixed by this code

The impact of this change: codesearch for defined

CLARIFICATION

I was completely wrong with my definedness assumption, as were other authors. They now have to fix their code, because it is invalid.

Yesterday, after I wrote this blog post, I talked on the #p5p IRC channel for about 5 hours to defend my wrong position, and got persuaded that I was completely off. Thanks. I analyzed my own and other code and found multiple errors caused by my wrong assumption about what defined does and does not do.

How did I come to this odd assumption? The docs perldoc -f defined are pretty clear on this.

I mainly do maintenance programming, and in most of my modules defined was used to check for symbol existence in the symbol table hash: defined $symtab{$name}, which only makes sense if the hash key could return undef in some cases, but it does not. The right code would be exists $symtab{$name}. In my case it made no difference because there was never an undef as hash value, but in other cases it was just a hard to detect bug.
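To make the difference concrete, here is a small sketch with a made-up %symtab that does contain an undef value. The hash contents and the helper check are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical symbol table hash; 'bar' exists but holds undef.
my %symtab = (foo => 1, bar => undef);

# Hypothetical helper: return (exists, defined) as 0/1 flags for a name.
sub check {
    my ($name) = @_;
    return (exists $symtab{$name}  ? 1 : 0,
            defined $symtab{$name} ? 1 : 0);
}

for my $name (qw(foo bar baz)) {
    my ($exists, $defined) = check($name);
    printf "%-3s exists=%d defined=%d\n", $name, $exists, $defined;
}
```

Only for bar do the two tests disagree, which is exactly the case where defined gives the wrong answer to "is this name in the table?".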

defined(@array) and defined(%hash) have been deprecated in the docs since 5.6, but only now enforced with 5.16. It never did anything useful, it just worked by accident.

mst came up with a good example of how it was broken. If you create an array, and then delete all array elements with delete, the array will be empty, but defined will return true.

$ perl -e'$a[0]=undef; print defined(@a); delete $a[0]; print defined(@a); print "empty" unless @a'

=> 11empty
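For comparison, a sketch of the same steps using only the element count, which is what if (@array) looks at. The variable names are made up, and it avoids defined(@a) altogether, since later perls reject it:

```perl
use strict;
use warnings;

my @a;
$a[0] = undef;              # one element, holding undef
my $before = scalar @a;     # element count before the delete

delete $a[0];               # deleting the last element shrinks the array
my $after = scalar @a;      # element count after the delete

print "before=$before after=$after ", (@a ? "non-empty" : "empty"), "\n";
```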

Thanks #p5p!

Well, and if you are worried to ask questions or complain about 'stupid' decisions on the perl5-porters mailing list where such decisions are made:

Do not worry! The perl community has the thickest skin ever. You might hear some intimidating technical slang you will not understand and which will turn you off. Do not care, the explanation should be simple to understand. Do complain and you might get enlightened.

Sorry for the trouble.

ExtUtils::MakeMaker make release

I often wonder why people praise Dist::Zilla for ease of use. Recently I heard this argument: 'It was never easier to make a release. You cannot do that with EUMM'.

So here is my little make release snippet from one of my Makefile.PLs.

There is more in it: make README, make gcov and make gprof for XS extensions.

sub depend {
  "
README : \$(VERSION_FROM)
    pod2text \$(VERSION_FROM) > README

release : dist
    git commit -a -m\"release \$(VERSION)\"
    git tag \$(VERSION)
    cpan-upload \$(DISTVNAME).tar\$(SUFFIX)
    git push
    git push --tags

gcov : \$(BASEEXT).c.gcov \$(BASEEXT).xs.gcov cover_db/\$(BASEEXT)-xs.html

\$(BASEEXT).c.gcov \$(BASEEXT).xs.gcov : \$(BASEEXT).xs
    \$(MAKE) CCFLAGS=\"\$(CCFLAGS) -fprofile-arcs -ftest-coverage\" \
      LDDLFLAGS=\"\$(LDDLFLAGS) -fprofile-arcs -ftest-coverage\"
    gcov \$(BASEEXT).c \$(BASEEXT).xs

cover_db/\$(BASEEXT)-xs.html : \$(BASEEXT).xs.gcov
    PERL5OPT=-MDevel::Cover make test
    -$^X -S gcov2perl \$(BASEEXT).c.gcov \$(BASEEXT).xs.gcov
    $^X -S cover

gprof :
    \$(MAKE) CCFLAGS=\"\$(CCFLAGS) -pg\" LDDLFLAGS=\"\$(LDDLFLAGS) \
      -pg\"
"
}
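One way to avoid the backslash escapes is to build the fragment from single-quoted strings. This is only a sketch: it assumes the override lives in package MY, as ExtUtils::MakeMaker overrides usually do, and that recipe lines must start with a literal tab, as make requires. The @rules layout is my own invention.

```perl
use strict;
use warnings;

package MY;

# Each entry: a "target : prerequisites" line followed by its recipe lines.
# Single quotes keep $(VAR) untouched, so no backslashes are needed.
my @rules = (
    [ 'README : $(VERSION_FROM)',
      'pod2text $(VERSION_FROM) > README' ],
    [ 'release : dist',
      'git commit -a -m"release $(VERSION)"',
      'git tag $(VERSION)',
      'cpan-upload $(DISTVNAME).tar$(SUFFIX)',
      'git push',
      'git push --tags' ],
);

sub depend {
    my $text = '';
    for my $rule (@rules) {
        my ($head, @recipe) = @$rule;
        $text .= "$head\n";
        $text .= "\t$_\n" for @recipe;    # make wants a literal tab here
        $text .= "\n";
    }
    return $text;
}

package main;

# Print the fragment for inspection; a real Makefile.PL would not print it.
print MY->depend;
```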

The unexpected case of -Mblib

I'm in constant worry of unnecessary bloat because our perls have to run fast. "Bloat" means added dependencies, loading new files at startup, needing more time.
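A rough way to see such added files is to compare %INC before and after loading a module. This is a sketch with a hypothetical helper files_loaded_by; it is not how the compiler does its analysis:

```perl
use strict;
use warnings;

# Hypothetical helper: require a module and report which files it pulled in.
sub files_loaded_by {
    my ($module) = @_;
    my %before = map { $_ => 1 } keys %INC;     # files loaded so far
    (my $file = "$module.pm") =~ s{::}{/}g;     # Foo::Bar -> Foo/Bar.pm
    require $file;
    return grep { !$before{$_} } sort keys %INC;
}

my @added = files_loaded_by('Carp');
print "Carp added ", scalar(@added), " file(s): @added\n";
```

If the module was already loaded, the list is simply empty.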

With the perl compiler I can analyze code and dependencies at compile-time and strip unneeded packages.
My recent concerns have been:

1. Carp being used in the DynaLoader AUTOLOAD fallback, to give a full stack trace when an XS module is not loaded.
2. Carp checking for B being loaded and checking CALLER_OVERRIDE_CHECK_OK. This outsmarts the compilers which pulls in B at…

What to avoid in BEGIN blocks

I came across this crazy code.

But first a short explanation. The perl compiler B::C saves the state of any program at CHECK time, and runs it later. That means every action in BEGIN blocks is frozen and then thawed, potentially on another machine at a later time. Not all actions can be frozen/thawed, as we know from the common modules Data::Dumper or Storable.

The compiler is better than those. It can restore regular expressions which Storable cannot. It can save the state of some IO object, which Data::Dumper can not. It can save the whole dependency tree of code. Data::Dumper or Storable can only save data, not code along with it.

But some actions are really a bad idea to restore at run-time.

1st sample:

my $f;open($f,">&STDOUT");print $f "ok"

This is trivial. The fileno 1 (STDOUT) is dup'ed to $f. All at run-time. Nevertheless you will not be able to freeze/thaw $f. The compiler can.

my $f;BEGIN{open($f,">&STDOUT");}print $f "ok"

This is not so trivial. Again the fileno 1 is dup'ed to $f. But the compiler must restore the $f->IoIFP and IoOFP handles to stdout. Done. This is a typical problem from Test::Builder and from testing the core testsuite with the compiler. Test::NoWarnings e.g. failed here. The old perlcc had a --testsuite command-line switch to cope with that.

my $f;BEGIN{open($f,"<README");}read $f,my $in, 2; print "ok"

This is hard or next to impossible. An IO::File handle must be restored. But as everybody knows, only on Windows is it easy to restore a filename from a filehandle. It's pretty hard for the compiler to save the filename from open. It would have to override CORE::open or override IO::File. So this cannot be generally compiled. There can also be pipes or sockets to be restored.

Right now B::C warns and leaves a line in the resulting C code, so a tool can fill in the missing filename. Of course it is better to fix the sourcecode to avoid this.
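One possible source fix, as a sketch: keep only the filename, which is plain data, in the BEGIN block and open the handle at run-time. The filename example_input.txt and the helper open_input are made up, and the file is created here only so the demo can run:

```perl
use strict;
use warnings;

my $path;
BEGIN { $path = "example_input.txt" }    # hypothetical name; just a string

# Run-time helper: the handle is created only when the program runs.
sub open_input {
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    return $fh;
}

# Demo setup at run-time: create the file, read it back, clean up.
open(my $out, '>', $path) or die "cannot create $path: $!";
print $out "ok\n";
close $out;

my $in = open_input();
read $in, my $buf, 2;
close $in;
unlink $path;

print "$buf\n";
```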

And now think of this:

my ($pid, $out, $in);
BEGIN {
  local(*FPID);
  $pid = open(FPID, "echo <<EOF |");    # DIE
  open($out, ">&STDOUT");       # EASY
  open(my $tmp, ">", ".tmpfile");   # HARD to get filename, WARN
  print $tmp "test\n";
  close $tmp;               # OK closed
  open($in, "<", ".tmpfile");       # HARD to get filename, WARN
}
# === run-time ===
print $out "ok";
kill 1, $pid;           # DIE, if $pid is set at BEGIN only
read $in, my $x, 4;
unlink ".tmpfile";

Killing a pid at run-time saved at compile-time is a really bad idea. I tried the example with the $pid being -1. All windows gone. Restart.

On the right are comments from the compiler point of view.

How to check the kill problem at compile-time? Where does the $pid come from? One must analyze the code where its data came from. From user input or command-line is okay, this is run-time. From compiled code executed at run-time, okay. From compiled code from BEGIN blocks, very bad!

BEGIN { $myself = $$; }

Whoa. Good idea, but it does not work yet. Avoid BEGIN here. Fixable though: the compiler could add code to restore the pid at run-time and not store the compile-time pid. But do we want that? No. The user has to decide.

There are many more examples which are a really bad idea to do. Beware. Not always are BEGIN blocks the best idea.

At cPanel where we use compiled perl we put common init stuff into the INIT block if the compiled code should init it at run-time, and into the BEGIN block if it should be initialized at compile-time.

How to install into 5.6.2

Contrary to popular belief you can install almost all CPAN packages under 5.6.2.

First Step:

Set the urllist in your ~/.cpan/CPAN/MyConfig.pm to http://cp5.6.2an.barnyard.co.uk/ and rm ~/.cpan/Metadata and the 3 outdated index files: sources/authors/01mailrc.txt.gz, sources/modules/02packages.details.txt.gz and sources/modules/03modlist.data.gz

Then start installing.

There are a couple of authors who used to aggressively boycott 5.6 in their dependencies (schwern, dagolden, kenwilliams, rsignes), but it is quite easy to fix this. So after a couple of refused installations do this:

cd .cpan/build
grep 'use ExtUtils::MakeMaker' */Makefile.PL
sed -i 's/use ExtUtils::MakeMaker 6.3[0-9];/use ExtUtils::MakeMaker;/' \
   */Makefile.PL

Module::Build can also be installed (0.34), DBI (1.604), Moose (0.40), DateTime (0.66) and many more.

If your CPAN Metadata fails to find the package, this is because nobody submitted a PASS report for this package on 5.6.2 (I will not, because I patched the packages) and so the dynamic cp5.6.2an.barnyard.co.uk does not offer it. Fall back to your default CPAN url and try again then.

For Makefile.PL writers: it is easy to allow older perls to install by checking the EUMM version at run-time, e.g. for the META and LICENSE checks.
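A hedged sketch of such a run-time check. The distribution name My::Module and the helpers eumm_at_least and makefile_args are hypothetical. The minimum versions 6.31 for LICENSE and 6.46 for META_MERGE are taken from the EUMM documentation as I remember it, so treat them as assumptions:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Hypothetical helper: true if the installed EUMM is at least $min.
sub eumm_at_least {
    my ($min) = @_;
    return eval { ExtUtils::MakeMaker->VERSION($min); 1 } ? 1 : 0;
}

# Build the WriteMakefile arguments, adding newer keys only when supported.
sub makefile_args {
    my %args = (
        NAME         => 'My::Module',          # hypothetical distribution
        VERSION_FROM => 'lib/My/Module.pm',    # hypothetical path
    );
    $args{LICENSE}    = 'perl' if eumm_at_least('6.31');
    $args{META_MERGE} = { no_index => { directory => ['t'] } }
        if eumm_at_least('6.46');
    return %args;
}

my %args = makefile_args();
print join(', ', sort keys %args), "\n";
# WriteMakefile(%args);    # a real Makefile.PL would call this here
```

With this pattern the use line carries no version number, so an old EUMM still loads and just skips the keys it does not know.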

Enjoy the speed of an unbloated and fast perl5.6.2!

BTW: If you are concerned about the 5.6 vulnerability for CGI scripts on oCERT-2011-003 style DOS attacks apply this patch to 5.6.2: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2011-12/msg01082.html

p5p in the last years made such things possible: http://www.reddit.com/r/perl/comments/odt7m/perlisnowtheslowestprogramminglanguage/