The alioth shootout Computer Language Benchmarks Game

By Reini Urban on July 3, 2011 6:10 PM

I'm tired of hearing from people that perl is dead or slow.

But basically they are right. If you check out a popular language comparison,
perl ranks near the bottom, if it is listed at all: behind python, php and ruby. And we all
know that at least ruby is slower than perl.

This is partly because those perl scripts are pretty lame and not as optimized as the comparable scripts in other languages, and partly because perl is slower than C or LISP.

I started improving some of the slowest scripts, the ones which give us a bad reputation.
E.g. I optimized fasta from 7 sec to 2.3 sec
without any algorithmic improvement. Maybe others want to help out, too.

Worthwhile targets are fasta and binary-trees.

As I found out, adjusting the algorithm helps much more.
The official tracker contains a version which is 20x faster than fasta.perl by using python's algorithm, and which is therefore 2x faster than python3. This is the range we are used to.

And I'm also working on the compiled perlcc versions with several optimizations.

And I'll try out the new ideas which I talked about at YAPC::NA. So far my typed perlcc versions are slower, because I copy too many lexicals back and forth. They need more optimization work.

Tagged as:

benchmark

17 Comments

:m) | July 4, 2011 7:52 AM

Thanks for taking care of this issue. :-)

Is
print 'hello', $world;
a major speed improvement over
print "hello $world";
?

If so, I'm not too worried. Things like these can be optimized really easily. But then, don't suboptimize.
For a vast majority of programs "good code" is somewhat equal to "readable code", not "fast code".
The abilities of the print statement in Perl are one of the features which set Perl ahead of other languages in terms of readability and ease of use.
It's been said often, but let me repeat: Most of the time the productivity of the programmer is more important than the speed of the code. Otherwise we would be hacking assembler or C all the time, wouldn't we?

Naveed Massjouni | July 4, 2011 9:16 AM

It's hard to tell what actually changed by looking at the diff on github. I think it would have been better if you did not modify the whitespace on each line.

zgrim | July 4, 2011 1:09 PM

I lost all faith in the benchmarks game when, puzzled a bit by the regex results, I tried regex-dna on my machine and observed that perl was almost twice as fast as python. On the website this is reversed: python is said to be faster. So, not only orders-of-magnitude errors but actual reversed results. Btw, on "interesting" results, there is one submission that's faster than c++ (the fastest regex-dna implementation, using boost AFAIR), but that's not yet taken into account. So I wouldn't be so quick to trust the "game" too much with these kinds of "errors". As a side note, the C code there seems a joke, it uses tcl.h (lol ?!). That's not to say perl itself does not require further optimizations: your work (and others', ofc) proves there are still many spots for much better behaviour, work which we're thankful for and watch closely. :)

Isaac Gouy | July 4, 2011 6:43 PM

> I started improving some of the slowest scripts...

Here are step by step instructions for contributing your better scripts - http://shootout.alioth.debian.org/help.php#contribute

Isaac Gouy | July 4, 2011 7:00 PM

@zgrim - on my machine and observed that perl was almost twice as fast as python

Maybe you made a mistake :-)

Or maybe you didn't measure the same programs, the same workloads, the same versions, the same OS, the same quad-core ...

@zgrim - on "interesting" results, there is one submission...

On "interesting alternative" programs, there are 2 Perl programs which split the pattern at | rather than use the regex that includes |

@zgrim - As a side note, the C code...

So is that program a C program or a Tcl program?

Meanwhile there are 2 other C regex-dna programs.

dagolden.com | July 4, 2011 11:32 PM

I played around with optimizing some of them a while ago, e.g. fannkuch. I found that many of them mostly tested data manipulation primitives and that short of Inline-C or XS or XS modules on CPAN, it's hard to optimize much further.

My work on fannkuch led me to recommend the optimization for reversing an array to itself that was added to Perl 5.12 (e.g. @a = reverse @a), so I think that looking at what's slow to figure out what we could make faster in Perl is still a worthwhile exercise.
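As a minimal sketch of the idiom mentioned above, the snippet below reverses an array by assigning it back to itself and, for comparison, does the same with a hand-written swap loop. The helper name `reverse_in_place` is made up for illustration and is not taken from any fannkuch submission.

```perl
use strict;
use warnings;

my @a = (1 .. 5);

# Assign the reversed list back to the same array; this is the form
# that, per the comment above, was optimized in Perl 5.12.
@a = reverse @a;
print "@a\n";

# Hypothetical hand-written equivalent: swap elements from both ends inward.
sub reverse_in_place {
    my ($array) = @_;
    my ($i, $j) = (0, $#{$array});
    while ($i < $j) {
        @{$array}[$i, $j] = @{$array}[$j, $i];
        $i++;
        $j--;
    }
    return $array;
}

my @b = (1 .. 5);
reverse_in_place(\@b);
print "@b\n";
```

Both arrays end up in the same order; the first form is simply the shorter way to write it.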

zgrim | July 5, 2011 12:11 AM

@Isaac Gouy - Maybe you made a mistake :-)

Sure, maybe i did, but then, maybe - just maybe - i didn't.

Let me be more clear about what I did, so people don't think my measurements were somehow influenced by phases-of-the-moon. I verified my results with a friend who incidentally runs x86_64 Debian; my machine ran 32bit at the time. Also, perlbrew made it trivial for me to test across threads/no-threads/10.x/12.x/14.x/etc. The differences between the perls were insignificant compared with what was kinda constant: all of them ran at about 2x (give or take, depending on threads/no-threads, but around 2x) the speed of the python3 implementation. I generated the input with fasta 5000000 and ... obviously gave the same input to all programs, I surely did not make a larger file for the python3 program :). And sure, it all must've gone very wrong somewhere, because on the regex-dna python3 vs perl, perl is said to be 2x slower, which is, well, about the reverse of what me and my friend both got. As you so bluntly expect me to have made a mistake somewhere, I also expect that there's a bug in the shootout measurement code and that anyone who runs this particular benchmark should get similar results to mine. All with a "maybe" and a smiley. :)

Reini Urban replied to comment from Naveed Massjouni | July 5, 2011 2:33 AM

Sure. Changed back the indentation.

I also over-micro-optimized a bit. Avoiding stringification in print only saves a few microseconds outside the loop.
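For anyone who wants to measure the two print styles themselves, here is a rough sketch using the core Benchmark module. The helper names (`emit_list`, `emit_interp`, `capture`) are made up for this example, and the space is moved into the literal so that both styles emit the same text. Results will vary by perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical helpers: each prints the same greeting in one of the two styles.
sub emit_list   { my ($fh, $world) = @_; print {$fh} 'hello ', $world }
sub emit_interp { my ($fh, $world) = @_; print {$fh} "hello $world" }

# Capture what a helper prints, to confirm both styles emit identical text.
sub capture {
    my ($emit) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "in-memory open failed: $!";
    $emit->($fh, 'world');
    close $fh;
    return $buffer;
}

print "same output\n" if capture(\&emit_list) eq capture(\&emit_interp);

# Time both styles against the null device so terminal I/O is not measured.
open(my $null, '>', File::Spec->devnull) or die "cannot open null device: $!";
cmpthese(-1, {
    list   => sub { emit_list($null, 'world') },
    interp => sub { emit_interp($null, 'world') },
});
close $null;
```

A negative count tells cmpthese to run each variant for at least that many CPU seconds, which is usually enough to see whether the difference is above the noise.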

I'm more worried that all my compiled versions are slower than the pure perl versions.
Esp. the perlcc -O typed versions. They are not ready for primetime yet.

Reini Urban replied to comment from :m) | July 5, 2011 2:34 AM

You are right.
I started too ambitiously :)

Isaac Gouy | July 5, 2011 2:49 AM

@zgrim - You still haven't told us which Perl regex-dna program you're talking about. Is it one of the 4 Perl regex-dna programs currently shown? Which program?

@zgrim - As you so bluntly expect me to have made a mistake somewhere, I also expect that there's a bug in the shootout measurement code...

You already did "bluntly" claim - "So, not only orders-of-magnitude errors but actual reversed results" but still haven't given the minimum information required to investigate a bug.

Reini Urban | July 5, 2011 3:39 AM

@Isaac Gouy:
The tracker contains a much better fasta5.pl by Rodrigo de Oliveira, which uses a faster algorithm.

Some more micro-optimization:
multi + threads is usually not used on linux/bsd.
If something like threads is wanted, then usually Coro is used.
Is this the default on debian?
The fastest is usually multi without ithreads; some distros also have just no ithreads, without multi.
Only windows uses ithreads, to emulate fork.

Isaac Gouy | July 5, 2011 4:31 AM

@Reini Urban - The tracker contains...

I know, and I know the program it's based on: a program that probably changed the algorithm to remove most of the work, and I'll have to work through it line-by-line :(

Apart from the 2 benchmarks game tasks that explicitly use pre-emptive kernel threads or pre-emptive lightweight threads - several Perl programmers contributed programs to use all 4 cores on other tasks, for example -

http://shootout.alioth.debian.org/u32q/program.php?test=mandelbrot&lang=perl&id=1

Isaac Gouy | July 5, 2011 6:07 AM

@Reini Urban - some distros also have

The website says which distro is used - Ubuntu.

The distro Perl is

$ /usr/bin/perl -V
Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

Platform:
osname=linux, osvers=2.6.24-27-server, archname=i686-linux-gnu-thread-multi
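The same build settings can also be read from inside a script. A small sketch, assuming only the core Config module; the three keys shown are a selection relevant to the threads/multi question, not a complete list.

```perl
use strict;
use warnings;
use Config;

# %Config holds the build-time settings that `perl -V` reports.
for my $key (qw(archname useithreads usemultiplicity)) {
    my $value = defined $Config{$key} ? $Config{$key} : 'undef';
    printf "%-16s %s\n", $key, $value;
}
```

On a threaded build such as the one shown above, the thread-related keys are set to `define`; on a build without them they are undefined or empty.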

Isaac Gouy | July 7, 2011 2:20 AM

@Reini Urban - The fastest is usually multi without ithreads

Here are measurements for i686-linux-multi

http://shootout.alioth.debian.org/u32/measurements.php?lang=perl

What makes you think they would be faster than measurements for i686-linux? (I think they're slower.)

Reini Urban replied to comment from Isaac Gouy | July 12, 2011 1:17 PM

You are right.

I compared
http://shootout.alioth.debian.org/u32q/measurements.php?lang=perl
vs.
http://shootout.alioth.debian.org/u32/measurements.php?lang=perl

CPU-secs (lower number is faster)

benchmark                multi   non-multi
binary-trees            761.37      662.98
fannkuch-redux        3,210.13    2,756.98
fasta                     2.38        2.18
k-nucleotide            239.10      226.75
mandelbrot            4,080.26    3,911.21
meteor-contest           63.30       55.92
n-body                1,733.08    1,563.23
pidigits                  6.34        6.04
regex-dna                34.97       30.59
reverse-complement        4.88        4.72
spectral-norm         1,094.59      939.56
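To make the gap easier to read, here is a small sketch that turns the CPU seconds from the table above into a multi/non-multi ratio per benchmark. The numbers are copied from the table; the helper name `slowdown` is made up for this example.

```perl
use strict;
use warnings;

# CPU seconds copied from the table above: [multi, non-multi].
my %cpu_secs = (
    'binary-trees'       => [ 761.37,  662.98],
    'fannkuch-redux'     => [3210.13, 2756.98],
    'fasta'              => [   2.38,    2.18],
    'k-nucleotide'       => [ 239.10,  226.75],
    'mandelbrot'         => [4080.26, 3911.21],
    'meteor-contest'     => [  63.30,   55.92],
    'n-body'             => [1733.08, 1563.23],
    'pidigits'           => [   6.34,    6.04],
    'regex-dna'          => [  34.97,   30.59],
    'reverse-complement' => [   4.88,    4.72],
    'spectral-norm'      => [1094.59,  939.56],
);

# A ratio above 1.0 means the multi measurement used more CPU time.
sub slowdown {
    my ($multi, $plain) = @_;
    return $multi / $plain;
}

for my $name (sort keys %cpu_secs) {
    printf "%-20s %.2fx\n", $name, slowdown(@{ $cpu_secs{$name} });
}
```

Every ratio comes out above 1.0, i.e. the multi column is slower on all eleven benchmarks.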

Why I thought that multi was faster (it is not):

If I compare the assembly code for both versions I see that the multi version uses fast stack access for PL_op compared to slow heap access for PL_op.

Compare
http://cpansearch.perl.org/src/RURBAN/Jit-0.04_09/i386thr.c
against
http://cpansearch.perl.org/src/RURBAN/Jit-0.04_09/i386.c

Thanks for updating the benchmarks!

Reini Urban replied to comment from Reini Urban | January 13, 2012 10:23 PM

I used github rather than alioth for a pull request with an improved binary-trees:

https://github.com/kragen/shootout/pull/1

Isaac Gouy | January 17, 2012 8:51 PM

> I used github not alioth for a pull request

Why? github is an out-of-date snapshot.