Optimizing compiler benchmarks (part 1)

By Reini Urban on September 20, 2012 11:39 PM

Since my goal is to improve the compiler optimizer (statically with B::CC, but also the perl compiler in op.c), I produced these interesting benchmarks.

I took the regex-dna example from "The Computer Language Benchmarks Game" at shootout.alioth.debian.org/

$ time perl t/regex-dna.pl <t/regexdna-input
agggtaaa|tttaccct 0
[cgt]gggtaaa|tttaccc[acg] 3
a[act]ggtaaa|tttacc[agt]t 9
ag[act]gtaaa|tttac[agt]ct 8
agg[act]taaa|ttta[agt]cct 10
aggg[acg]aaa|ttt[cgt]ccct 3
agggt[cgt]aa|tt[acg]accct 4
agggta[cgt]a|t[acg]taccct 3
agggtaa[cgt]|[acg]ttaccct 5

101745
100000
133640

real    0m0.130s  (varying from 0.125 to 0.132)
user    0m0.120s
sys     0m0.008s
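
Since the timing varies from run to run, it can help to repeat the measurement and look at the spread. The following is a minimal sketch, not something from the original benchmark setup: `minmax` and `real_times` are hypothetical helper names, and `real_times` assumes bash, because it relies on the `time` keyword and `TIMEFORMAT`.

```shell
# minmax: read one number per line on stdin, print "min max".
minmax() {
    awk 'NR == 1 { min = max = $1 + 0 }
         { v = $1 + 0
           if (v < min) min = v
           if (v > max) max = v }
         END { if (NR) printf "%s %s\n", min, max }'
}

# real_times N CMD...: print the wall-clock seconds of N runs of CMD.
# Assumption: bash (uses the `time` keyword and TIMEFORMAT).
real_times() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        ( TIMEFORMAT='%R'; { time "$@" >/dev/null 2>&1; } 2>&1 )
        i=$((i + 1))
    done
}

# Demo with the three values mentioned above.
printf '0.130\n0.125\n0.132\n' | minmax
```

With the benchmark files in place, something like `real_times 10 sh -c 'perl t/regex-dna.pl <t/regexdna-input' | minmax` would give the fastest and slowest of ten runs.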

t/regexdna-input contains about 100KB (1671 lines) of DNA sequence data, which is used to match DNA 8-mers and substitute nucleotides for IUB codes.

$ wc t/regexdna-input 
1671   1680 101745 t/regexdna-input
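
To make the three steps of the benchmark concrete (strip headers and newlines, count pattern matches, expand IUB codes), here is a rough shell sketch on a tiny made-up sample. It is not t/regex-dna.pl: the helper names and the sample are hypothetical, and only two of the IUB substitutions are shown.

```shell
# clean_sequence: drop FASTA header lines (">...") and all newlines.
clean_sequence() {
    sed -e '/^>/d' | tr -d '\n'
}

# count_matches PATTERN: count case-insensitive, non-overlapping matches
# of an extended regex in the text on stdin.
count_matches() {
    grep -oiE "$1" | wc -l | tr -d ' '
}

# expand_iub: replace two IUB codes with their alternatives (partial table).
expand_iub() {
    sed -e 's/B/(c|g|t)/g' -e 's/N/(a|c|g|t)/g'
}

# Made-up sample, not taken from t/regexdna-input.
sample='>ONE demo sequence
agggtaaacc
TTTACCCTgg
cgggtaaatN'

dna=$(printf '%s\n' "$sample" | clean_sequence)
expanded=$(printf '%s\n' "$dna" | expand_iub)

for pat in 'agggtaaa|tttaccct' '[cgt]gggtaaa|tttaccc[acg]'; do
    printf '%s %s\n' "$pat" "$(printf '%s\n' "$dna" | count_matches "$pat")"
done

# Lengths after cleaning and after substitution, like the last two
# numbers in the benchmark output.
printf '%s\n%s\n' "${#dna}" "${#expanded}"
```

The real program prints the raw input length first, then the cleaned length, then the length after all substitutions, which matches the 101745 / 100000 / 133640 lines above.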

Perl does pretty well in this benchmark; it is actually the fastest scripting language. But the compiler should do better, and I had some ideas to try out for the optimizing compiler. Or so I thought.

First the simple and stable B::C compiler with -O3:

$ perlcc -O3 -o regex-dna-c -S t/regex-dna.pl
$ time ./regex-dna-c <t/regexdna-input
agggtaaa|tttaccct 0
[cgt]gggtaaa|tttaccc[acg] 3
a[act]ggtaaa|tttacc[agt]t 9
ag[act]gtaaa|tttac[agt]ct 8
agg[act]taaa|ttta[agt]cct 10
aggg[acg]aaa|ttt[cgt]ccct 3
agggt[cgt]aa|tt[acg]accct 4
agggta[cgt]a|t[acg]taccct 3
agggtaa[cgt]|[acg]ttaccct 5

101745
100000
133640

real    0m0.285s
user    0m0.272s
sys     0m0.004s

0.130s vs 0.285s compiled? What's going on? B::C promises faster startup time and equal run-time. With -S we keep the intermediate C source to study it. Let's try B::CC, via -O. Here you don't need -O3, as B::CC already contains all B::C -O3 optimizations.

$ perlcc -O -o regex-dna-cc t/regex-dna.pl
$ time ./regex-dna-cc <t/regexdna-input
...
real    0m0.267s
user    0m0.256s
sys     0m0.008s

Hmm? Let's see what's going on with -v5.

$ perlcc -O3 -v5 -S -oregex-dna-c -v5 t/regex-dna.pl

script/perlcc: Compiling t/regex-dna.pl
script/perlcc: Writing C on regex-dna-c.c
script/perlcc: Calling /usr/local/bin/perl5.14.2d-nt -Iblib/arch -Iblib/lib -MO=C,-O3,-Dsp,-v,-oregex-dna-c.c t/regex-dna.pl
Starting compile
 Walking tree
 done main optree, walking symtable for extras
 Prescan 0 packages for unused subs in main::
 %skip_package: B::Stackobj B::Section B::FAKEOP B::C B::C::Section::SUPER B::C::Flags
 B::Asmdata O DB B::CC Term::ReadLine B::Shadow B::C::Section B::Bblock B::Pseudoreg
 B::C::InitSection B::C::InitSection::SUPER
 descend_marked_unused: 
...
%INC and @INC:
 Delete unsaved packages from %INC, so run-time require will pull them in:
 Deleting IO::Handle from %INC
 Deleting XSLoader from %INC
 Deleting B::C::Flags from %INC
 Deleting B::Asmdata from %INC
 Deleting Tie::Hash::NamedCapture from %INC
 Deleting B::C from %INC
 Deleting SelectSaver from %INC
 Deleting IO::Seekable from %INC
 Deleting base from %INC
 Deleting Config from %INC
 Deleting B from %INC
 Deleting Fcntl from %INC
 Deleting IO from %INC
 Deleting Symbol from %INC
 Deleting O from %INC
 Deleting Carp from %INC
 Deleting mro from %INC
 Deleting File::Spec::Unix from %INC
 Deleting FileHandle from %INC
 Deleting Exporter::Heavy from %INC
 Deleting strict from %INC
 Deleting Exporter from %INC
 Deleting vars from %INC
 Deleting Errno from %INC
 Deleting File::Spec from %INC
 Deleting IO::File from %INC
 Deleting DynaLoader from %INC
 %include_package: warnings warnings::register
 %INC: warnings.pm warnings/register.pm
 amagic_generation = 1
 Writing output
 Total number of OPs processed: 323
 NULLOP count: 8

%include_package contains: warnings warnings::register. These two cost a lot of time. Carp is also a nice example of code bloat for the static compiler.

Let's try without:

$ perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1  t/regex-dna.pl
$ wc regex-dna-c.c
2293  16084 128953 regex-dna-c.c
$ wc regex-dna-c1.c
1201  7488 57236 regex-dna-c1.c

128953 bytes down to 57236 bytes: warnings more than doubles the size of the generated C source. So a lot of startup-time overhead.
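
The size ratio can be checked quickly from the shell. A small sketch, with `ratio` and `bytes` as hypothetical helper names:

```shell
# ratio A B: print A/B rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# bytes FILE: size of FILE in bytes, without wc's padding.
bytes() {
    wc -c <"$1" | tr -d ' '
}

# Byte counts as reported by wc above; prints 2.25.
ratio 128953 57236
```

With both generated files present, `ratio "$(bytes regex-dna-c.c)" "$(bytes regex-dna-c1.c)"` should give the same figure.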

$ perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1 t/regex-dna.pl

$ time ./regex-dna-c1 <t/regexdna-input
...
real    0m0.284s
user    0m0.271s
sys     0m0.004s

$ time ./regex-dna-cc1 <t/regexdna-input
...
real    0m0.266s
user    0m0.255s
sys     0m0.008s

Not much gain from stripping warnings, since most of the time is run-time; startup time is usually 0.010s (uncompiled) to 0.001s (compiled).

Wait, which perl is perlcc calling at all? Hopefully the same as perl. Nope. As it turns out, perlcc was built with a debugging perl, and comparing a debugging perl with a non-debugging one explains the doubled run-time. You can see it in the -v output above: /usr/local/bin/perl5.14.2d-nt, which is my perlall-derived naming convention for debugging, non-threaded.
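
A check like this could be done up front. The sketch below assumes that a debugging perl carries -DDEBUGGING in its `$Config{ccflags}`, which is the usual result of configuring with -DDEBUGGING but is not guaranteed for every build; the function names are hypothetical.

```shell
# has_debugging_flag FLAGS: succeed if the flags string contains -DDEBUGGING.
has_debugging_flag() {
    case " $1 " in
        *" -DDEBUGGING "*) return 0 ;;
        *) return 1 ;;
    esac
}

# perl_is_debugging PERL: succeed if the given perl binary reports
# -DDEBUGGING in its ccflags; return 2 if the binary cannot be run.
perl_is_debugging() {
    flags=$("$1" -MConfig -e 'print $Config{ccflags}' 2>/dev/null) || return 2
    has_debugging_flag "$flags"
}

if perl_is_debugging perl; then
    echo "perl: debugging build"
else
    echo "perl: non-debugging build (or check failed)"
fi
```

To see which perl perlcc itself runs under, the first line of the script is usually enough: `head -n 1 "$(command -v perlcc)"`. Passing that path to `perl_is_debugging` and comparing it with the plain perl shows whether the two timings are comparable.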

Recompiling the compiler with normal perl, and re-testing:

$ perl -S perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1  t/regex-dna.pl
$ perl -S perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1  t/regex-dna.pl

$ time ./regex-dna-c1 <t/regexdna-input
...
real    0m0.127s
user    0m0.124s
sys     0m0.000s

$ time ./regex-dna-cc1 <t/regexdna-input
...
real    0m0.121s
user    0m0.120s
sys     0m0.008s

0.130s vs 0.127s (compiled) vs 0.121s (optimizing compiled) now makes sense. But there is not much room to improve here: the regex engine already has a pretty good DFA (not the fastest; re::engine::RE2 would be faster), and it is not optimizable by the optimizing compiler.
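
Expressed as percentages, the gains are small. A quick sketch, with `pct_faster` as a hypothetical helper:

```shell
# pct_faster REF CUR: percentage by which CUR undercuts REF, one decimal.
pct_faster() {
    awk -v ref="$1" -v cur="$2" \
        'BEGIN { printf "%.1f\n", (ref - cur) / ref * 100 }'
}

pct_faster 0.130 0.127   # B::C vs. perl, prints 2.3
pct_faster 0.130 0.121   # B::CC vs. perl, prints 6.9
```

Given that the uncompiled run itself varied between 0.125 and 0.132, the B::C figure is within the measurement noise.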

Better to optimize numbers. Tomorrow. I want to improve stack smashing in B::CC: getting rid of copying intermediate C values from the C stack and back to the perl heap.

See the arithmetic part 2

Tagged as:

benchmark

3 Comments

Steven Haryanto | September 21, 2012 3:36 AM

BTW, are there people here using B::C or B::CC for their applications? I'd like to hear about their experiences.

Isaac Gouy | September 24, 2012 5:21 PM

Nice article, but please stop saying "alioth shootout" -- the project was renamed over 5 years ago.

Isaac Gouy | October 5, 2012 9:07 AM

from "The Computer Language Benchmarks Game" at

Thank you.

(part 2) "the shootout precision tests"

The phrase - the benchmarks game precision tests - seems to work.

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.
