New perl5 Porting/bench.pl

Dave Mitchell finally got fed up with the lack of stable perl benchmarks, and the impossibility of catching performance regressions.

This was his announcement, with a sample run:

$ Porting/bench.pl -j 8 --raw --fields=Ir,Dr,Dw \
   --tests=expr::assign::scalar_lex \
   perl5210o perl5211o perl5212o perl5213o perl5214o perl5215o perl5216o

....

expr::assign::scalar_lex
lexical $x = 1

   perl5210o perl5211o perl5212o perl5213o perl5214o perl5215o perl5216o
   --------- --------- --------- --------- --------- --------- ---------
Ir     155.0     155.0     155.0     155.0     161.0     161.0     161.0
Dr      54.0      54.0      54.0      54.0      57.0      57.0      57.0
Dw      30.0      30.0      30.0      30.0      31.0      31.0      31.0
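
The test names map to entries in t/perf/benchmarks, a plain Perl file holding an arrayref of name => spec pairs with desc, setup and code keys; Ir, Dr and Dw are cachegrind's counts of instructions executed, data reads and data writes. The entry behind the numbers above should look roughly like this (the setup and code bodies are my reconstruction from the desc shown):

[
    'expr::assign::scalar_lex' => {
        desc  => 'lexical $x = 1',
        setup => 'my $x',
        code  => '$x = 1',
    },
];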

and the bisect usage sample. --bisect='Ir,153,157' makes bench.pl exit with a zero status only as long as the Ir count stays within the range 153-157, which gives bisect.pl the pass/fail signal it needs to find the commit where the count jumped:

D=/home/davem/perl5/git/bleed

$D/Porting/bisect.pl              \
 --start=v5.21.3                  \
 --end=v5.21.4                    \
 -Doptimize=-O2                   \
 -j 16                            \
 --target=miniperl                \
 -- perl5201o $D/Porting/bench.pl \
      -j 8                             \
      --benchfile=$D/t/perf/benchmarks \
      --tests=expr::assign::scalar_lex \
      --perlargs='-Ilib'               \
      --bisect='Ir,153,157'            \
      ./miniperl

p5p praised it universally, probably because nobody had done proper benchmarks before. Well, it's at least better than nothing.

It uses cachegrind, which makes it much slower than linux perf, but it works on all platforms. It does not display error rates, and it runs each sample only once, so you can only trust the numbers or not trust them, e.g. under high load. Dave said the results are trustworthy even with -j8, probably because everything runs through cachegrind, which counts instructions deterministically instead of measuring wall-clock time.
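
You can check that determinism yourself: the same one-liner run twice under cachegrind reports near-identical instruction counts, while wall-clock timings would jitter from run to run. A minimal sketch, assuming valgrind is installed; the Ir count is the "I refs" summary line:

$ valgrind --tool=cachegrind perl -e 'my $x; $x = 1' 2>&1 | grep 'I *refs'
$ valgrind --tool=cachegrind perl -e 'my $x; $x = 1' 2>&1 | grep 'I *refs'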

Currently it only measures micro-ops, not the big picture: a sample weighted over the most-used ops, as I argued before in On simple benchmarks, and for the right op distribution in Idea - costmodel for B::Stats.
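
To illustrate the costmodel idea: a hypothetical sketch (op names and all numbers invented for illustration) that weights per-op costs, e.g. the Ir counts from bench.pl, by how often each op occurs in real code, e.g. counted via B::Stats, yielding one big-picture number per perl build instead of isolated micro-op results:

# hypothetical relative frequency of each op in typical code (from B::Stats)
my %op_freq = (const => 0.25, padsv => 0.18, sassign => 0.12);
# hypothetical per-op cost, e.g. the Ir counts reported by bench.pl
my %op_cost = (const => 90, padsv => 155, sassign => 161);
# the weighted sum is one comparable number per perl build
my $total = 0;
$total += $op_freq{$_} * $op_cost{$_} for keys %op_freq;
printf "weighted op cost: %.1f\n", $total;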

My parrot-bench uses linux perf on Linux instead of just cachegrind. It uses bigger samples, which work across all versions; it is much faster, since it runs at native speed, and it is more reliable IMHO. I'd rather see a bad error rate than none, and I'd rather see absolute timings down to 6 digits.
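
For comparison, linux perf times the same code at native speed, and with --repeat it even prints the error rate that bench.pl lacks. A sketch (the generic instructions and cycles events should exist on most kernels and CPUs):

$ perf stat -r 5 -e instructions,cycles perl -e 'my $x; $x = 1 for 1..1_000_000'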

For the API you can compare it to a similar existing microbenchmark library by Google: https://github.com/google/benchmark

There are now two different benchmark tests in the p5p t/ directory: the old t/benchmark/rt26188-speed-up-keys-on-empty-hash.t, and the new t/perf directory with the various micro benchmarks.
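
Both should run as normal core tests from a built perl tree, with the usual invocation documented in perlhack (paths relative to t/):

$ cd t
$ ./perl -I../lib harness perf/*.t benchmark/*.t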

So with this bench you can say that a particular op got faster, but you cannot say that perl got faster.

What can you expect? We don't even have a proper realtime clock in our Time::HiRes module yet, i.e. asm rdtsc on Intel or any of the other realtime clock sources for the various platforms, as used in Devel::NYTProf.
