On simple benchmarks

use Benchmark is not good enough. At all.

- You can specify -2 as count, which means "run for at least 2 seconds". Good.
- If you specify the test code as a string instead of a coderef, you also benchmark the parsing time on every iteration count, not just the plain run-time. Coderefs should be used. The string result is entirely unrealistic, as in real life you compile once and run often.
- The iteration results are not used at all to check the statistical quality of the test.
- Without using :hireswallclock you get time(2) precision, which is integer seconds.
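To illustrate, a minimal example of using Benchmark the less-wrong way — coderefs, a negative count, and :hireswallclock (the test names and bodies here are just made up for the sketch):

```perl
use strict;
use warnings;
use Benchmark qw(:hireswallclock cmpthese);

# :hireswallclock switches wallclock timing to Time::HiRes
# instead of the integer-second time(2) precision.
# A count of -2 means: run each candidate for at least 2 CPU seconds.
cmpthese(-2, {
    # coderefs are compiled once; only run-time is measured
    concat  => sub { my $s = "a" . "b" . "c" },
    sprintf => sub { my $s = sprintf "%s%s%s", "a", "b", "c" },
});
```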

Benchmark::Perl::Formance is too good and too slow. It's good to have a single special and reliable machine for this, but I see no useful results. And I miss simple tests with good op coverage; in fact I do not see any op coverage at all.

How fast is my perl, how good is my test and how good is my test result?

Dumbbench reports at least some statistical quality, but needs too many args. initial_runs and target_rel_precision should not be mandatory.

I need a Benchmark package which automatically selects the number of iterations to get a reliable and reproducible result, automatically rejects high system load prior to start, filters outliers during the testing (warmup oscillations, random spikes), and rejects statistically bad results (low precision or low accuracy). Do not print bad results, reject them.
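A sketch of the core loop I have in mind — the function name, option names, and thresholds here are all my own invention, not an existing API: keep sampling until the relative standard error of the mean is small enough, drop outliers beyond 3 sigma of the median first, and return nothing at all instead of a bad result:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use List::Util qw(sum);

# Hypothetical auto-tuning runner: iterate until the relative
# standard error of the mean drops below the target precision,
# discarding samples more than 3 sigma from the median
# (warmup oscillations, random spikes).
sub auto_bench {
    my ($code, %opt) = @_;
    my $target = $opt{target_precision} // 0.01;    # 1% relative error
    my $max    = $opt{max_runs}         // 200;
    my @t;
    for my $run (1 .. $max) {
        my $t0 = [gettimeofday];
        $code->();
        push @t, tv_interval($t0);
        next if $run < 10;                          # need some samples first
        my @s    = sort { $a <=> $b } @t;
        my $med  = $s[$#s / 2];
        my $mean = sum(@t) / @t;
        my $sd   = sqrt(sum(map { ($_ - $mean) ** 2 } @t) / (@t - 1));
        my @good = grep { abs($_ - $med) <= 3 * $sd } @t;   # drop outliers
        $mean = sum(@good) / @good;
        my $sem = $sd / sqrt(scalar @good);         # std error of the mean
        return { mean => $mean, rel_err => $sem / $mean, runs => scalar @good }
            if $mean > 0 and $sem / $mean < $target;
    }
    return undef;    # reject: never converged, print nothing
}
```

A real implementation would additionally check the load average before starting, and reject means close to the timer resolution as null-ops.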

This is no high art, as reading other documentation suggests. Reproducing measurements is basic work done daily. In my previous work at AVL our customers demanded good-quality measurement results, repeatability, and statistical verification. The user should not be bothered with mandatory arguments which heavily influence the results. E.g. graphics benchmarks give us a single FPS number, which tells us everything on any CPU, GPU, and architecture. Results should be comparable on different machines. 31,161/s on one machine means what on a slower machine? 2,164,227/s means what? Probably that the test is just a null-op. This should be rejected right away.

Furthermore it should be possible to see a graph 1. for a single benchmark run (to verify its quality), and 2. to compare results over time for different perl versions. Similar to http://speed.perlformance.net/timeline/ but with more than 3 results in the graph. At least one result for every major perl version, but optionally for every single commit.

I'll give it a try. At least for the questions "how fast is it" and "how good is my test result".

The profiling stats for "how good is my test" are another problem, which e.g. cannot be tackled with a simple line-level profiler like this (a simplified version from brian d foy's Mastering Perl):

package Devel::prof;
# Usage: perl -d:prof script.pl
our (@c, $i);

sub DB::DB {
  # called by the debugger hook before every statement
  my ($file, $line) = (caller)[1, 2];
  return unless $file eq $0;   # only profile the main script
  $c[$line]++;                 # count hits per line
}

END {
  open my $f, '<', $0 or die "can't reopen $0: $!";
  print "\nEND - Linecount for $0:\n";
  while (<$f>) {
    # print "count: lineno source" for every executed line
    printf "%5d: %5d %s", $c[$i], $i, $_ if $c[++$i];
  }
}
1;

which is just a simplified version of NYTProf. Rather, this needs an op-level profiler like B::Stats, which tells you which ops were called how often, so we can see which ops were missed, how the ops are distributed, and compare that to typical perl programs.

Also needed would be knowledge of typical op costs, which can be extracted from system profiling tools. How slow is divide compared to i_divide, helem vs hslice, keys vs values, method vs method_named, leavesub vs leavesublv, enteriter vs grepwhile vs mapwhile?
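Relative costs of some of those op pairs can at least be eyeballed from Perl itself with Benchmark — this times whole statements, not isolated ops, so take it as a rough approximation only:

```perl
use strict;
use warnings;
use Benchmark qw(:hireswallclock cmpthese);

my %h = (1 .. 1000);

# Each candidate is dominated by the op in question,
# so the rates give a rough relative cost ordering.
cmpthese(-1, {
    keys_op   => sub { my @k = keys %h },
    values_op => sub { my @v = values %h },
    helem     => sub { my @v = ($h{2}, $h{4}, $h{6}) },
    hslice    => sub { my @v = @h{2, 4, 6} },
});
```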

My initial take on perl-core testing was:

sub f{my($n)=@_;$n==8 and bless{1..4}and$a=~s/$/../;$n<2 and return$n;
f($n-1)+f($n-2)}f(33)

but what ops does this use, and is this a fair example? See the thread starting at http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2010-09/msg00403.html

Did you see spark BTW? https://github.com/holman/spark


About Reini Urban

Working at cPanel on B::C (the perl-compiler), p2, types, parrot, B::Generate, cygwin perl and more guts (LLVM, jit, optimizations).