Benchmarking index() and regex in Perl 6

By Aaron Baugher on August 7, 2015 1:55 PM

I noticed that Perl 6 already has a Benchmark module and wanted to try it. Liz's suggestion to use index() rather than a regex in my last script gave me an excuse. The results were striking.

The script and results are below. Benchmark.pm6 doesn't have a cmpthese() routine, but timethese() does well enough. Each number in the table is the average time required for one grep through an array of about 150 lines. I ran the script five times, and the bottom row of the table averages those five runs.

What did I learn?

Well, for starters, index() is roughly 7 to 10 times faster than the best regex solution (regex2), and more than 100 times faster than my first attempt (regex1). So that's the way to go whenever possible.
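As a minimal sketch of the index() test on its own, outside the benchmark: the sample lines below are hypothetical stand-ins for the ps output, and the regex is passed straight to grep as the matcher rather than wrapped in a block as in the script.

```perl6
use v6;

# Hypothetical sample data standing in for the ps output used in the post.
my @lines = 'root 1 /sbin/init', 'aaron 2201 xterm -bg black', 'aaron 2305 emacs';
my $string = 'xterm';

# index() returns the position of the substring, or an undefined value when
# it is absent. Position 0 is a valid hit but is false as a number, so the
# test uses .defined rather than plain truthiness.
my @by-index = grep { .index($string).defined }, @lines;

# The quoted-variable regex (as in regex3), for comparison.
my @by-regex = @lines.grep(/ "$string" /);

say @by-index;
say @by-index.join("\n") eq @by-regex.join("\n");
```

Both forms should pick out the same lines; the only difference this post is concerned with is how long each takes.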

Comparing the regexes was interesting too, though, so I ended up trying several things. Putting the regex in the grep with a bare variable (regex1) was terrible. Replacing the variable with a constant (regex2) was much faster, but that's not usually an option in a real program. The next thing I tried was actually regex4, creating a regex object outside the loop. I was a little surprised that this didn't gain anything over regex1. I guess that since Perl 6 can't know for sure that $string will never change, it still has to reinterpolate the variable on every match.
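The guess about reinterpolation fits how a regex object treats a variable: it looks the variable up when it matches, not when it is created. A small sketch, assuming a hypothetical sample line, that changes $string after the regex has been built:

```perl6
use v6;

my $string = 'xterm';
my $r = rx{ $string };            # built once, like $r4 in the post

my $line = 'aaron 2305 emacs';    # hypothetical sample line

my $before = ($line ~~ $r).so;    # $string is 'xterm' at this point

$string = 'emacs';                # change the variable after the regex exists
my $after = ($line ~~ $r).so;     # same regex object, matched again

say "before: $before, after: $after";
```

This only shows that the lookup happens at match time. It says nothing about how much that lookup costs, which is what the benchmark measures.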

So then I tried regex5, and wasn't surprised to see it fast again with the constant. Then I thought of regex6: putting quotes around the variable in the regex object, so it would go ahead and interpolate it. That sped it up a lot, though not as much as the constant. And if I print out $r6, it shows the regex exactly as written above, so it hasn't forgotten that it's a variable.

That led me to try regex3 and discover that quoting the variable inside the grep test gains the same thing.

So creating the rx// object in advance didn't gain anything; in fact regex{456} are slightly slower than regex{123}. What made the difference was putting quotes around $string. And I'm a little puzzled why that would be. Maybe it's time to read some more of the Synopsis on regexes and see if it enlightens me.
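One thing that can be checked without a benchmark is that the bare and quoted forms match the same text: in both, the variable's contents are treated as a literal string rather than as regex syntax. A minimal sketch, using a hypothetical value that contains a regex metacharacter:

```perl6
use v6;

# Hypothetical value with a '.' in it, which would match any character
# if the contents were interpreted as regex source.
my $string = 'a.c';
my @lines  = 'xx a.c xx', 'xx abc xx';

my @bare   = @lines.grep(/  $string  /);   # bare variable, as in regex1
my @quoted = @lines.grep(/ "$string" /);   # quoted variable, as in regex3

say @bare;
say @quoted;
```

Both greps should keep only the line containing the literal text a.c, so the quotes change the speed but not the result. This sketch does not explain why the quoted form is faster.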

#!/usr/bin/env perl6
use v6;
use Benchmark;

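# Capture the process list, drop the header line, and keep the rest.
my $p = run 'ps', 'auxww', :out;
my $header = $p.out.get;
my @lines = $p.out.lines;

my $string = 'xterm';
# Regex objects built once, outside the benchmarked subs.
my $r4 = rx{  $string  };   # bare variable
my $r5 = rx{   xterm   };   # constant
my $r6 = rx{ "$string" };   # quoted variable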

my %h = timethese 1000, {
    'regex1' => &regex1, 'regex2' => &regex2, 'regex3' => &regex3,
    'regex4' => &regex4, 'regex5' => &regex5, 'regex6' => &regex6,
    'index1' => &index1,
};
# Print each test name with element [3] of its result, the average time per run.
say map { $_ => %h{$_}[3] }, sort keys %h;

sub regex1 { my @new = grep { /  $string  /           }, @lines }
sub regex2 { my @new = grep { /   xterm   /           }, @lines }
sub regex3 { my @new = grep { / "$string" /           }, @lines }
sub regex4 { my @new = grep { $r4                     }, @lines }
sub regex5 { my @new = grep { $r5                     }, @lines }
sub regex6 { my @new = grep { $r6                     }, @lines }
sub index1 { my @new = grep { .index($string).defined }, @lines }

# results (one row per run of the script; the bottom row is the average of the five runs)
| index | regex1 | regex2 | regex3 | regex4 | regex5 | regex6 |
|-------+--------+--------+--------+--------+--------+--------|
| 0.003 |  0.435 |  0.028 |  0.068 |  0.485 |  0.032 |  0.069 |
| 0.008 |  0.446 |  0.029 |  0.070 |  0.517 |  0.033 |  0.072 |
| 0.003 |  0.437 |  0.028 |  0.068 |  0.474 |  0.031 |  0.074 |
| 0.003 |  0.449 |  0.028 |  0.070 |  0.593 |  0.034 |  0.070 |
| 0.003 |  0.472 |  0.030 |  0.078 |  0.508 |  0.031 |  0.087 |
|-------+--------+--------+--------+--------+--------+--------|
| 0.004 |  0.448 |  0.029 |  0.071 |  0.515 |  0.032 |  0.074 |
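To put the bottom row in relative terms, here is a short sketch that divides each average by the index1 average. The numbers are copied from the table above; nothing here is measured again.

```perl6
use v6;

# Bottom-row averages copied from the results table in this post.
my %avg =
    index1 => 0.004, regex1 => 0.448, regex2 => 0.029, regex3 => 0.071,
    regex4 => 0.515, regex5 => 0.032, regex6 => 0.074;

# Divide each average by the index1 average to see how many times longer
# each approach took than index().
for %avg.keys.sort -> $name {
    printf "%-6s %7.2f\n", $name, %avg{$name} / %avg<index1>;
}
```

With these averages, regex1 works out to 112 times the index1 time and regex2 to 7.25 times. The index1 average is rounded to a single significant digit, so the ratios are rough.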

Tagged as:

Benchmark, Perl 6, regex

About Aaron Baugher

I'm a programmer and Unix sysadmin who uses Perl as much as possible, operating from the Midwest USA. To hire me for sysadmin or programming work, contact me at aaron.baugher @ gmail.com or as 'abaugher' on #perl6.

Aaron's Perl 6 Blog