January 2015 Archives

A little warning to EUMM and shell-script users

I sometimes need to write test scripts in shell and not in perl, to be able to test perl scripts without interfering with perl itself, and also for performance and easier IO reasons.

In order to find out which perl this distro was built with, we need to parse the generated Makefile.

Recent EUMM 7.0x introduced a new feature which broke all my scripts: it started double-quoting PERL and FULLPERL in the generated Makefile. The damage is already done. The only thing you can do is to remove the quotes.

PERL=`grep "^PERL =" Makefile|cut -c8-`    # the perl this Makefile was generated with
PERL=${PERL:-perl}                         # fall back to perl in PATH
PERL=`echo $PERL|sed -e's,^",,; s,"$,,'`   # strip the new EUMM 7.0x double quotes

They obviously were afraid of spaces in Windows paths. Only cmd.exe accepts "cmd", no other shell. So the obvious fix would be to add double quotes on Win32 only, and only if a space appears in the NAME or the PATH. Same as we have to do with $^X in system calls, where we have to double-quote $^X explicitly in string context. Like with:

my $X = $^X =~ / / ? qq("$^X") : $^X; system("$X ...");

The initial feedback from the maintainers was not positive; they don't care. EUMM needs to write Makefiles, nothing else. The second reply was: just use sh -c $PERL $args. Yeah. Exactly.

So I fear the toolchain is also starting to rot now, with the newbies taking over. Test::Builder is also in great danger with a newbie maintainer: the initial trials were twice as slow, in order to support streaming. Given that p5p has similar technical problems, it doesn't look too good for 5.2x being usable any time soon. I'm still forced to use 5.14.4.

Let's just hope CPAN will not get new maintainers.

My fix: https://github.com/rurban/perl-compiler/commit/16379cf29cbffdf8ffce9d0822af0548cfb65051

The sad story of pseudohash criticism

I just had to endure MJD’s horrible pseudohash explanation at the Pittsburgh Workshop. “A new, never-before-seen talk on Perl’s catastrophic experiment with “pseudohashes”, which wasted everyone’s time for nine years between 1998 and 2007”

https://www.youtube.com/watch?v=-HlGQtAuZuY

Watch it; you can fast-forward through it. I honestly had a higher opinion of Mark-Jason.

So let’s see what’s wrong with the popular and uninformed pseudohash criticism:

The main points are that storing a hash in array slot 0 for run-time lookup is too complicated, that the exists and delete ops now also need to check for arrays / pseudohashes, and that all the pseudohash checks slowed down general hash usage by 15%, which basically levelled the advantage of the faster compile-time accelerated array lookup on those pseudohashes, which was ~15%.

package Critter;
use fields qw(NAME TYPE);

my Critter $h;    # compile-time optimization: href NAME => aref 1
$h->{NAME};     # ==> $h->[1]

but:

$key = "NAME";  # defer to run-time lookup of href in aref 0
$h->{$key};       # ==> $h->[ $h->[0]->{$key} ]

So by allowing the slow run-time access, you need to preserve the hash semantics of the array. Still, the compiler knows the type of $h, and can still compile it to a href lookup of $key in aref 0.
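
For reference, this is all a pseudohash was: an array whose element 0 maps the field names to array indices. A minimal sketch of the Critter layout from above, with made-up field values:

my $h = [ { NAME => 1, TYPE => 2 }, "Bob", "camel" ];   # slot 0 is the index map

my $key = "NAME";
$h->[ $h->[0]->{$key} ];   # run-time path: look the index up in slot 0
$h->[1];                   # compile-time path for the constant $h->{NAME}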

Same problem with exists and delete.

exists $h->{NAME} is compile-time constant foldable to YES or NO.

delete $h->{NAME} needs to store a sentinel in aref 1, as hashes do. This only slows down aref for pseudohashes, but should not slow down href or aref for plain arrays.
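
Roughly what those two ops map to on the array representation argued for here; a sketch of the semantics, not of the historical implementation, with undef standing in as the sentinel value:

my $key = "NAME";
exists $h->[0]->{$key};            # run-time exists: check the index map in slot 0
$h->[ $h->[0]->{$key} ] = undef;   # delete: keep the slot, store a sentinel in it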

Of course this was not how it was implemented. In good old perl5 fashion $h was kept as a hash, and all the hash ops were extended to check for pseudohashes at run-time. Yes, at run-time, in the ops.

What should have been done instead was either to reject the pseudohash optimization when a run-time key is parsed, maybe with a warning under use warnings.

Or, if you really don’t want to punish the bad behaviour of mixing computed keys with explicitly requested compile-time keys, compile $h to an array and not to a hash.

As I said before, perl5 is just badly implemented, but still fixable. use fields could still be a hint for the compiler to change such hashes to arrays.

Just don’t expect anything from the current maintainers and the old-timers.

Horrible talk.

New perl5 Porting/bench.pl

Dave Mitchell finally got fed up with the lack of stable perl benchmarks, and the impossibility of catching performance regressions.

This was his announcement, with the sample:

$ Porting/bench.pl -j 8 --raw --fields=Ir,Dr,Dw \
   --tests=expr::assign::scalar_lex \
   perl5210o perl5211o perl5212o perl5213o perl5214o perl5215o perl5216o

....

expr::assign::scalar_lex
lexical $x = 1

   perl5210o perl5211o perl5212o perl5213o perl5214o perl5215o perl5216o
   --------- --------- --------- --------- --------- --------- ---------
Ir     155.0     155.0     155.0     155.0     161.0     161.0     161.0
Dr      54.0      54.0      54.0      54.0      57.0      57.0      57.0
Dw      30.0      30.0      30.0      30.0      31.0      31.0      31.0

and the bisect usage sample:

D=/home/davem/perl5/git/bleed

$D/Porting/bisect.pl              \
 --start=v5.21.3                  \
 --end=v5.21.4                    \
 -Doptimize=-O2                   \
 -j 16                            \
 --target=miniperl                \
 -- perl5201o $D/Porting/bench.pl \
      -j 8                             \
      --benchfile=$D/t/perf/benchmarks \
      --tests=expr::assign::scalar_lex \
      --perlargs='-Ilib'               \
      --bisect='Ir,153,157'            \
      ./miniperl

p5p had universal praise for it, probably because nobody did proper benchmarks before. Well, it's at least better than nothing.

It uses cachegrind, which means it is much slower than linux perf, but works on all platforms. It does not display error rates and runs each sample only once, so you can either trust it or not, e.g. in case of high load. Dave said the results are trustworthy, even with -j8, probably because everything runs through cachegrind.
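
The kind of error rate I am missing is cheap to produce by just repeating the sample; a rough wallclock sketch, not part of bench.pl, with a made-up loop body:

use strict; use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my @t;
for (1 .. 10) {
    my $t0 = [gettimeofday];
    my $x; $x = 1 for 1 .. 1_000_000;      # stand-in for the benchmark body
    push @t, tv_interval($t0);
}
my $mean = 0; $mean += $_ for @t; $mean /= @t;
my $var  = 0; $var  += ($_ - $mean) ** 2 for @t; $var /= @t - 1;
printf "%.6fs +/- %.6fs over %d runs\n", $mean, sqrt($var), scalar @t;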

Currently it only measures micro ops, not the big picture, i.e. a sample over the most used ops, as I argued before in On simple benchmarks, and for the right op distribution in Idea - costmodel for B::Stats.

My parrot-bench uses linux perf on linux instead of just cachegrind. It uses bigger samples which work across all versions, it is much faster since it runs at native speed, and it is more reliable IMHO. I'd rather see a bad error rate than none, and I'd rather see the absolute timings down to 6 digits.

For the API you can compare it to this similar existing microbenchmark library by google: https://github.com/google/benchmark

There are now two different benchmark tests in p5p t/: The old t/benchmark/rt26188-speed-up-keys-on-empty-hash.t, and the new t/perf directory with the various micro benchmarks.

So with this bench, you can say this op got faster, but you cannot say that perl got faster.

What can you expect? We don't even have a proper realtime clock in our Time::HiRes module yet, i.e. asm rdtsc on Intel or any other realtime clock sources for the various platforms, as in Devel::NYTProf.
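
Reading the TSC from Perl is not hard either; a sketch via Inline::C, assuming x86/x86_64 with gcc or clang, and returning the tick count as a double because the standard typemap has no 64-bit unsigned type:

use strict; use warnings;
use Inline C => <<'END_C';
double rdtsc_ticks() {
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));   /* read the time stamp counter */
    return (double)(((unsigned long long)hi << 32) | lo);
}
END_C

my $t0 = rdtsc_ticks();
my $x; $x = 1 for 1 .. 1000;             # something small to measure
printf "%.0f ticks\n", rdtsc_ticks() - $t0;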

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.