Grepping exact values

In my view, Perl's grep syntax doesn't help you write fast code when searching for exact values. Let me explain. It is not that common, but what do you do when you need to grep an array for an exact string?

Perl makes it easy to use pattern matching in grep:

@selected = grep { /^foo$/ } @list;

You can argue about how useful this is. But trust me, every once in a while some strange constructs turn out to be really useful. Unfortunately, the expression above is not efficient. If you replace it with

@selected = grep { $_ eq "foo" } @list;

you get code that runs about twice as fast (see the Benchmark results at the bottom).
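A rough way to reproduce this kind of comparison is sketched below with the core Benchmark module. The test data, the iteration count and the sub names by_match and by_equal are my own assumptions for illustration, not the setup behind the numbers at the bottom.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 strings with a single "foo" in the middle.
my @list = map { "item$_" } 1 .. 1000;
$list[500] = 'foo';

# Each sub runs one of the two grep variants and returns the match count.
sub by_match { my @selected = grep { /^foo$/ } @list;    return scalar @selected }
sub by_equal { my @selected = grep { $_ eq 'foo' } @list; return scalar @selected }

# Run both variants for a fixed number of iterations and print a comparison table.
cmpthese(2000, {
    match => \&by_match,
    equal => \&by_equal,
});
```

Note that the two variants are not strictly equivalent: /^foo$/ also matches the string "foo\n", because $ can match before a trailing newline. A pattern of /\Afoo\z/ would be the exact counterpart of eq.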

Following the example of split, which accepts a string and uses it to do the splitting, I think grep could accept a string as well (at least in the grep EXPR, LIST form):

@selected = grep "foo", @list;

What kind of inconveniences would this raise?
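One inconvenience, as far as I can tell, is that this form already means something today: the string is just an expression evaluated for its truth value, so a true constant selects every element. A small sketch with made-up data:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @list = qw(foo bar baz foo);

# Today "foo" is evaluated as a plain (always true) expression for each
# element, so nothing is filtered out.
my @selected = grep "foo", @list;

print scalar(@selected), "\n";    # prints 4
```

Changing that behaviour would therefore alter the meaning of existing code, even if such code is probably rare.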

Benchmark: timing 50000 iterations of equal, match...
equal: 22 wallclock secs (21.25 usr + 0.14 sys = 21.39 CPU) @ 2337.54/s (n=50000)
match: 53 wallclock secs (51.50 usr + 0.29 sys = 51.79 CPU) @ 965.44/s (n=50000)


In Perl 6 this form works already, because each list item is simply smart-matched against the first argument (aka matcher).

If the matcher is a block/closure, the smart-matching executes the block, handing the list item as a parameter.

If it's a string, it does string comparison. If it's a number, it does numeric comparison. If it's a range, it checks if the list item is within the range. If it's a type object, it does a type check. And so on, you get the picture.

Perl 5 could do the same thing, at least if the first argument is a literal.

Maybe there should be a "smartgrep" or "sgrep" in Perl 5? It would work just like grep, but use smart matching instead.

Probably doable right now with Devel::Declare.
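Without any syntax extension, an ordinary function gets part of the way there. The sketch below is only an illustration: sgrep is a hypothetical helper, not an existing core function or module that I know of, and it covers just three matcher kinds (code reference, compiled regex, plain string) rather than full smart matching.

```perl
use strict;
use warnings;

# Hypothetical helper: dispatch on the type of the matcher.
sub sgrep {
    my ($matcher, @items) = @_;
    my $type = ref $matcher;

    # Code reference: call it with the item as its argument.
    return grep { $matcher->($_) } @items if $type eq 'CODE';

    # Compiled regex (qr//): pattern match.
    return grep { $_ =~ $matcher } @items if $type eq 'Regexp';

    # Anything else: plain string equality.
    return grep { defined($_) && $_ eq $matcher } @items;
}

my @list = qw(foo bar baz foo);

my @exact   = sgrep('foo', @list);                          # string comparison
my @pattern = sgrep(qr/^ba/, @list);                        # regex match
my @custom  = sgrep(sub { length($_[0]) == 3 }, @list);     # arbitrary test
```

Because the function ends in a grep, calling it in scalar context returns the number of matches instead of the list.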

Hmm... maybe I'm missing something here, but why not use List::Util 'first'?

The problem with grep is that it does not stop going over the list once it has found what it's looking for. That is because it doesn't work as a boolean (though people use it as such).

Grep returns all matching results. If you're trying to match a complete string, there's no point in continuing after finding it (at least 90% of the time), especially if you want a boolean comparison.

List::Util::first stops after the first find. So, it's much faster than grep and does proper matching with a code block:

if ( first { $_ eq $wanted } @possible ) {}
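One caveat with using first as a boolean: it returns the matching element itself, so the test fails when that element is a false value such as "0" or the empty string. List::Util also exports any (in versions 1.33 and later), which returns a plain true or false. A small sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first any);

# Hypothetical data in which the wanted value is itself false.
my @possible = ('alpha', '0', 'beta');
my $wanted   = '0';

# first returns the element found, here the string '0', which is false.
my $found_first = first { $_ eq $wanted } @possible;

# any returns a true value as soon as one element passes the test.
my $found_any = any { $_ eq $wanted } @possible;

print "first: ", ($found_first ? "true" : "false"), "\n";    # prints "first: false"
print "any: ",   ($found_any   ? "true" : "false"), "\n";    # prints "any: true"
```

Both functions stop at the first hit, so the early-exit advantage over grep is the same.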

Combining Nilson's and Sawyer's observations: if you're only testing for existence of a fixed string or number -- there's no need to return the values, since you already know what they are, and it's only rarely that you're looking for a count rather than presence -- then you don't need anything that's not already in perl.

Just do "foo" ~~ @list. It's simple and it's faster than either of the grep-based alternatives or List::Util::first (which, in fact, manages to be slower than grep with an eq, if the thing to be found is near the end of the list.)

Did you benchmark:

@selected = grep /^foo$/, @list;

It gave a surprising result...

I guess ~~ is really all you need, if you're just testing in boolean context. However, you may want to count occurrences, so I still think a "smart grep" would be useful.
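For the counting case, plain grep already works when it is evaluated in scalar context, where it returns the number of matching elements instead of the list. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @list = qw(foo bar foo baz);

# Assigning to a scalar puts grep in scalar context, which yields a count.
my $count = grep { $_ eq 'foo' } @list;

print "$count\n";    # prints 2
```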

About Alberto Simões

I blog about Perl. D'uh!