Here we go loop de loop

Well looks like old fiend smartypants was up to her old tricks again.

You know the type, spends the day at work in some IRC, never forgets the minutia of page 67 of the help file of a sys-admin command, has memorized and and has a better version every regex ever written, knows she is always right and in the bosses eyes can never make a poor choice when designing code.

Anyway let's get on with the code example. Sometimes the ugly way to do
things is the best. In this case processing a large array where lets say the following has to be extracted,

  1. The maximum value
  2. The minimum value
  3. Clean out any duplicates
  4. Group the data into 3 sets

Simple enough really. But then I saw the code (changed to protect the guilty and save space)

use List::MoreUtils qw(minmax uniq part);
my ($set_min, $set_max) = minmax @set;
my @unique_set = uniq @set;
my $not_used;
my ($not_used,$sub_set_a) = part { $_ <= $a } @set;
my ($not_used,$sub_set_b) = part { $_ > $a and $_<=$b } @set;
my ($not_used,$sub_set_c) = part { $_ > $b } @set;

Ok not much here to complain about Yes I agree this does read well and does everything correctly but is it good code?

In most cases I would say yes but in this case the @set happened to have several years worth of data and thus thousands of values.

So would not this be better??

my ($set_min,$set_max,%unique,@sub_set_a,@sub_set_b,@sub_set_c) foreach my $item (@set){ $set_max = $item if ($item > $set_max); $set_min = $item if ($item < $set_min || !$set_min ); $unique{$item}=1; push(@sub_set_a,$item) if ($item <= $a); push(@sub_set_b,$item) if (($item > $a) and ($item <= $b)); push(@sub_set_c,$item) if ($item > $b); }

From max of 5x (assuming maxmin does only 1 iteration) down to 1x. I don't care the
the List::MoreUtils using 'C' with the 'XS' you still have the possibility of a
huge amount of extra iteration. It doesn't take a higher maths degree to argue that one.

The point being that Smartypants wanted show off by using some nifty-cool module whiteout actually looking at problem.

Don't get me wrong the List::MoreUtils are great when you have one thing to do a to a list.

Have more than one task? Take a few seconds and think a bout it.


Your snark is unfortunate, doubly so because you're wrong. Your smartypants apparently wears the smart pants for a reason. Their version is faster.

With 10,000 entries in @set.

Benchmark: timing 1000 iterations of ForLoop, ListMoreUtils...
ForLoop: 11 wallclock secs (10.16 usr + 0.04 sys = 10.20 CPU) @ 98.04/s (n=1000)
ListMoreUtils: 4 wallclock secs ( 4.37 usr + 0.01 sys = 4.38 CPU) @ 228.31/s (n=1000)
ok 1 - Do they match

With 100,000 entries in @set.

Benchmark: timing 100 iterations of ForLoop, ListMoreUtils...
ForLoop: 11 wallclock secs (11.18 usr + 0.04 sys = 11.22 CPU) @ 8.91/s (n=100)
ListMoreUtils: 4 wallclock secs ( 3.49 usr + 0.02 sys = 3.51 CPU) @ 28.49/s (n=100)
ok 1 - Do they match

With 1,000,000 entries in @set.

Benchmark: timing 10 iterations of ForLoop, ListMoreUtils...
ForLoop: 14 wallclock secs (13.17 usr + 0.15 sys = 13.32 CPU) @ 0.75/s (n=10)
ListMoreUtils: 5 wallclock secs ( 4.42 usr + 0.13 sys = 4.55 CPU) @ 2.20/s (n=10)
ok 1 - Do they match

So yeah, the smarty pants version scales better at every scale.

Though I think the part version could be better.

my ($sub_set_a, $sub_set_b, $sub_set_c, $not_used) = part {
$_ <= $a ? 0
: $_ > $a and $_<= $b ? 1
: $_ > $b ? 2
: 3
} @set;

(not tested)

Funny, there are other things that could be improved. For a start, the $not_used is unnecessary, you can just write it like this:

my (undef,$sub_set_a) = part { $_ <= $a } @set;
my (undef,$sub_set_b) = part { $_ > $a and $_<=$b } @set;
my (undef,$sub_set_c) = part { $_ > $b } @set;

More importantly though, if $a < $b is a precondition here (but only then!), then this is a misunderstanding of the part function and you can reduce these three passes to a single one:

my ($sub_set_a, $sub_set_b, $sub_set_c) = part {
    $_ <= $a ? 0 :
    $_ <= $b ? 1 :
} @set;

Joel: if $a < $b doesn’t always hold, then the subsets can overlap, which cannot be produced from a single part pass – so your version is broken and the original is necessary. If it does always hold, then your version is unnecessarily complex.

Ah, yes, I see it. Well, in my defense (maybe?) I was going fast (evident in my "not tested" warning). Thanks for the catch.

Leave a comment

About byterock

user-pic Long time Perl guy, a few CPAN mods allot of work on DBD::Oracle and a few YAPC presentations