Perl performance evolution over the last decade

I was recently reading about some significant Python 3.11 performance improvements, and I wondered whether Perl 5 still gets significant performance improvements with each version, even though it is more mature and thus more optimized in the first place.

I thought I'd compare the final releases of alternating versions, starting with 5.12.5 released 10 years ago, using a benchmark I made for a cloud VM comparison. As is the case with any benchmark, it might not be representative of your own workloads: it benchmarks things that are relevant to me, and also some things that I would avoid but that are used by many modules and are notoriously slow (mainly DateTime and Moose). However, it is more representative of "real-life" use, with results that are not lost in noise, than, say, PerlBench (which has a different purpose, of course).
Here is the list of the tested Perl releases:

Major Ver.   Release   Date

Performance on ARM64

I ran the benchmarks on an Apple M1, as I've found it is currently the fastest CPU type at running Perl (and not only Perl):

Perl Version:           5.12    5.16    5.20    5.24    5.28    5.32    5.36
BioPerl Codons:        12.40   12.86    7.51    7.18    7.60    6.48    7.23
BioPerl Monomers:       8.91    9.38    7.03    7.39    7.88    7.16    7.10
Regex/Replace utf8:    10.77   10.97    9.23    8.50    9.81    9.91    9.36
Test Moose:             8.21    8.68    8.31    8.06    7.94    8.52    8.30
Total time:           107.02  103.22   91.50   84.97   84.16   83.71   82.29
*Malformed UTF-8 character warnings with perl 5.12.5

Or in a chart:
[Chart: benchmark results per Perl version, Apple M1]

From the chart we see that until 5.24 we had some very generous gains, over 25% overall. After that the gains are smaller: it seems that either there were no more easy gains due to the maturity of the code, or the focus was more on fixes and features.
It should be noted that from 5.12 to 5.24 there were quite a few improvements related to Unicode, so e.g. the Regex/Subst UTF8 test, which gained 49% in performance, is probably also doing more work than before while being much faster.
I was very surprised to see that the biggest gains were in an XS module (Math::DCT), so I had to profile that to see what was going on. It turns out unpack was much slower in older Perl versions; the C code is running at the same speed, as expected.
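To check the unpack observation on your own installed perls, a micro-benchmark along these lines could be used. It is only a sketch: the 'd*' template, the 64x64 payload size and the unpack_count helper are assumptions made for illustration, since the post does not show how Math::DCT calls unpack.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Hypothetical payload: 64x64 doubles packed into one binary string.
# The template Math::DCT actually uses is not shown in the post.
my $packed = pack 'd*', map { $_ / 7 } 1 .. 64 * 64;

# Unpack the whole string into a list and return the element count.
sub unpack_count {
    my ($bin) = @_;
    my @values = unpack 'd*', $bin;
    return scalar @values;
}

# Run for at least 1 CPU second; the reported rate is the number
# to compare between interpreters.
timethis( -1, sub { unpack_count($packed) } );
```

Saved to a file (any name will do), it can be run under every installed version with perlbrew exec perl followed by the file name.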

Performance on AMD64

While arm64 is gaining market share, it's not yet the most popular architecture, so I repeated the benchmarks on an AMD EPYC Milan (Google Cloud's n2d type) just to verify:

Perl Version:           5.12    5.16    5.20    5.24    5.28    5.32    5.36
BioPerl Codons:        16.50   15.40    9.85    9.82    9.70    9.53    9.70
BioPerl Monomers:      11.87   12.52    9.81    9.79    9.54    9.13    9.35
Regex/Replace utf8:    17.24   15.64   11.46   10.04   11.16   10.56   10.76
Test Moose:            13.71   14.28   14.62   14.36   14.60   15.23   15.16
Total time:           149.60  139.87  125.07  117.05  112.80  111.69  111.99

[Chart: benchmark results per Perl version, AMD EPYC Milan]

The results are not very different, except for a small performance regression in the Moose tests. As that test uses prove, most of the runtime goes to loading the interpreter and the modules for each test. If you want to speed that up by orders of magnitude, try using Test2::Aggregate. Moose itself seems to have gained speed but, of course, if you really want to speed things up, just switch away from Moose. Corinna is not ready (and not directly compatible), but there is Mouse, Moo, etc.


Here is a table with the average of the two architectures and 5.12 as the 100% base:
Perl Version:      5.12   5.16   5.20   5.24   5.28   5.32   5.36
BioPerl Codons     100%   102%   166%   170%   167%   182%   171%
BioPerl Mono       100%    95%   124%   121%   119%   127%   126%
Reg/Subst utf8     100%   104%   134%   149%   132%   136%   138%
Test Moose         100%    95%    96%    99%    99%    93%    95%
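The post does not spell out how these percentages were derived. A minimal sketch, assuming each figure is the speed relative to 5.12 (baseline time divided by the version's time), computed per architecture and then averaged, reproduces the BioPerl Codons row. The sub names relative_speed and averaged_percent are made up for this example.

```perl
use strict;
use warnings;

# BioPerl Codons times copied from the two result tables (5.12 first).
my @m1  = ( 12.40, 12.86, 7.51, 7.18, 7.60, 6.48, 7.23 );
my @amd = ( 16.50, 15.40, 9.85, 9.82, 9.70, 9.53, 9.70 );

# Speed relative to the first column: baseline time / time, in percent.
sub relative_speed {
    my @times = @_;
    return map { 100 * $times[0] / $_ } @times;
}

# Average the per-architecture percentages column by column and round.
sub averaged_percent {
    my ( $first, $second ) = @_;
    my @a = relative_speed(@$first);
    my @b = relative_speed(@$second);
    return map { sprintf '%.0f', ( $a[$_] + $b[$_] ) / 2 } 0 .. $#a;
}

# Prints: 100% 102% 166% 170% 167% 182% 171%
print join( ' ', map { "$_%" } averaged_percent( \@m1, \@amd ) ), "\n";
```

Because a lower time means a higher speed, the time ratio is inverted, so a value above 100% means faster than 5.12.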

I think a fair conclusion is that you should not be afraid of performance getting worse due to newer Perls adding more features to the core. It's the opposite, really: especially if you are using a Perl older than 5.24, it is probably worth upgrading just for the performance.

Of course you should benchmark your own workload, as there are special cases. E.g. we found out above that unpack is much faster from 5.28 onwards, but there could be specific things that are slower and might affect you.


Thanks for the comparisons. Indeed, perl is getting faster.

Note that there's also Benchmark::Perl::Formance, which yields similar results.

Note that 5.38 will include some nice performance improvements at a very basic level (variable assignment, clearing, anon subroutines etc.). Hopefully these changes will be very visible in the benchmarks. 5.37.5 already includes a lot of them. It hypes me up a lot, as someone who loves it when stuff runs fast.

Thanks for the hint! That's great to know.

But how does Perl behave when compared to other languages like Python? Does make sense, or are there other resources/articles you recommend?

I like the general idea of speeding things up regardless of how it does against other interpreted languages since they were not intended for time-sensitive tasks or really high-throughput; in such cases, I would use the suitable languages for that specific purpose with enough abstraction to the problem (C, C++, Go,...).

BenchmarksGame isn't bad, but its perl comparisons are a bit flawed:

  • Perl threads are not as good as threads in other languages. Perldoc says: [The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for]. Most of those benchmarks are implemented to use multiple CPU cores. The Perl versions run with threads or with forks (syncing data afterwards), so we take a performance hit up front.
  • Since threads are used, the perl interpreter they use for the results is compiled with threads support (program versions are listed below). When perl is compiled with threads, it runs noticeably slower, even when threads are not used by the program. I heard the speed difference is 5 - 15%.
  • These types of comparisons don't matter much, as you could just implement the calculation-heavy part of your code in something else (C? C++? Pascal?) and use it directly in your program with XS, FFI, Inline etc. It would run almost as fast as a native solution in those languages. Pure Perl tends to be slow at calculations, but fast at string manipulation. Remember it's a great glue language.
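On the second point, it is easy to check whether a given perl was built with thread support before comparing timings. A small sketch using the core Config module (the built_with_threads helper is just a name picked for this example):

```perl
use strict;
use warnings;
use Config;

# %Config records the options the running perl was built with.
# 'useithreads' is set to 'define' on a threaded build.
sub built_with_threads {
    return ( $Config{useithreads} // '' ) eq 'define' ? 1 : 0;
}

printf "perl %s: %s\n", $Config{version},
    built_with_threads() ? 'threaded build' : 'non-threaded build';
```

Running it under each interpreter shows which ones pay the threading overhead mentioned above.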

Overall, speed is an important factor. Obviously features are more important, but lately a lot of stuff has been rewritten just to gain some speed. Ack was rewritten in C as Ag, and people were moving away from Ranger (a Python file manager application) to vifm / lf just for the speed. I hope the trend of making Perl faster continues. Obviously there's a ceiling somewhere, but the latest improvements made me very optimistic.

Tracking times for smaller things would also help, since they probably contribute to the changes in the larger, real-world examples. I noticed that the performance of the lc() function changes drastically between perl versions. Just between 5.34 and 5.36 it gets more than 2x slower. I thought at first this might actually be measuring speed differences in memory allocation or copying, but it's probably related to string internals or UTF-8. Here's the simple test:
perlbrew exec perl -MBenchmark=timethis -e '$s=(join"","A".."Z")x1_000; timethis(-1, sub { my $c= lc $s })'
timethis for 1:  1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 192752.38/s (n=202390)

timethis for 1:  1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 192752.38/s (n=202390)

timethis for 1:  2 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 424769.44/s (n=458751)

timethis for 1:  1 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 72403.70/s (n=78196)

timethis for 1:  1 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 424769.44/s (n=458751)

timethis for 1:  1 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 91995.45/s (n=101195)

timethis for 1:  1 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 91995.45/s (n=101195)

timethis for 1:  1 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 91995.45/s (n=101195)

timethis for 1:  1 wallclock secs ( 1.04 usr +  0.01 sys =  1.05 CPU) @ 91021.90/s (n=95573)

timethis for 1:  1 wallclock secs ( 1.10 usr +  0.01 sys =  1.11 CPU) @ 91166.67/s (n=101195)

timethis for 1:  1 wallclock secs ( 1.08 usr +  0.01 sys =  1.09 CPU) @ 87681.65/s (n=95573)

timethis for 1:  1 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU) @ 87681.65/s (n=95573)

timethis for 1:  1 wallclock secs ( 1.08 usr +  0.01 sys =  1.09 CPU) @ 87681.65/s (n=95573)

timethis for 1:  1 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 910222.22/s (n=983040)

timethis for 1:  1 wallclock secs ( 1.22 usr +  0.01 sys =  1.23 CPU) @ 932422.76/s (n=1146880)

timethis for 1:  1 wallclock secs ( 1.04 usr +  0.01 sys =  1.05 CPU) @ 936228.57/s (n=983040)

timethis for 1:  1 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 735965.45/s (n=809562)
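One way to probe the "string internals or UTF-8" guess would be to time lc on the same text twice, once as a plain byte string and once with the internal UTF-8 flag switched on. This is only a sketch of that idea, not something taken from the results above; the test_strings helper is a name invented for the example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build the 26,000-character ASCII string from the one-liner, plus a copy
# whose internal UTF-8 flag is switched on (the characters are unchanged).
sub test_strings {
    my $bytes    = join( '', 'A' .. 'Z' ) x 1_000;
    my $upgraded = $bytes;
    utf8::upgrade($upgraded);
    return ( $bytes, $upgraded );
}

my ( $bytes, $upgraded ) = test_strings();

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    bytes    => sub { my $c = lc $bytes },
    upgraded => sub { my $c = lc $upgraded },
} );
```

If the two rates diverge on some versions but not on others, that would point at the UTF-8 code path rather than at allocation or copying.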


About Dimitrios Kechagias

Computer scientist, physicist, amateur astronomer.