Here we go loop de loop

By byterock on January 25, 2014 8:55 PM

Well looks like old fiend smartypants was up to her old tricks again.

You know the type, spends the day at work in some IRC, never forgets the minutia of page 67 of the help file of a sys-admin command, has memorized and and has a better version every regex ever written, knows she is always right and in the bosses eyes can never make a poor choice when designing code.

Anyway let's get on with the code example. Sometimes the ugly way to do
things is the best. In this case processing a large array where lets say the following has to be extracted,

The maximum value
The minimum value
Clean out any duplicates
Group the data into 3 sets

Simple enough really. But then I saw the code (changed to protect the guilty and save space)

use List::MoreUtils qw(minmax uniq part);
my ($set_min, $set_max) = minmax @set;
my @unique_set = uniq @set;
my $not_used;
my ($not_used,$sub_set_a) = part { $_ <= $a } @set;
my ($not_used,$sub_set_b) = part { $_ > $a and $_<=$b } @set;
my ($not_used,$sub_set_c) = part { $_ > $b } @set;

Ok not much here to complain about Yes I agree this does read well and does everything correctly but is it good code?

In most cases I would say yes but in this case the @set happened to have several years worth of data and thus thousands of values.

So would not this be better??

my ($set_min,$set_max,%unique,@sub_set_a,@sub_set_b,@sub_set_c) foreach my $item (@set){ $set_max = $item if ($item > $set_max); $set_min = $item if ($item < $set_min || !$set_min ); $unique{$item}=1; push(@sub_set_a,$item) if ($item <= $a); push(@sub_set_b,$item) if (($item > $a) and ($item <= $b)); push(@sub_set_c,$item) if ($item > $b); }

From max of 5x (assuming maxmin does only 1 iteration) down to 1x. I don't care the
the List::MoreUtils using 'C' with the 'XS' you still have the possibility of a
huge amount of extra iteration. It doesn't take a higher maths degree to argue that one.

The point being that Smartypants wanted show off by using some nifty-cool module whiteout actually looking at problem.

Don't get me wrong the List::MoreUtils are great when you have one thing to do a to a list.

Have more than one task? Take a few seconds and think a bout it.

6 comments

6 Comments

Rebecca | January 25, 2014 11:53 PM | Reply

Your snark is unfortunate, doubly so because you're wrong. Your smartypants apparently wears the smart pants for a reason. Their version is faster.

https://gist.github.com/iarna/8625596

With 10,000 entries in @set.
1..1 Benchmark: timing 1000 iterations of ForLoop, ListMoreUtils... ForLoop: 11 wallclock secs (10.16 usr + 0.04 sys = 10.20 CPU) @ 98.04/s (n=1000) ListMoreUtils: 4 wallclock secs ( 4.37 usr + 0.01 sys = 4.38 CPU) @ 228.31/s (n=1000) ok 1 - Do they match

With 100,000 entries in @set.
1..1 Benchmark: timing 100 iterations of ForLoop, ListMoreUtils... ForLoop: 11 wallclock secs (11.18 usr + 0.04 sys = 11.22 CPU) @ 8.91/s (n=100) ListMoreUtils: 4 wallclock secs ( 3.49 usr + 0.02 sys = 3.51 CPU) @ 28.49/s (n=100) ok 1 - Do they match

With 1,000,000 entries in @set.
1..1 Benchmark: timing 10 iterations of ForLoop, ListMoreUtils... ForLoop: 14 wallclock secs (13.17 usr + 0.15 sys = 13.32 CPU) @ 0.75/s (n=10) ListMoreUtils: 5 wallclock secs ( 4.42 usr + 0.13 sys = 4.55 CPU) @ 2.20/s (n=10) ok 1 - Do they match

So yeah, the smarty pants version scales better at every scale.

Joel Berger | January 26, 2014 12:16 AM | Reply

Though I think the part version could be better.



my ($sub_set_a, $sub_set_b, $sub_set_c, $not_used) = part { 

   $_ <= $a ? 0 

 : $_ > $a and $_<= $b ? 1

 : $_ > $b ? 2

 : 3

} @set;

(not tested)

Aristotle | January 26, 2014 12:20 AM | Reply

Funny, there are other things that could be improved. For a start, the $not_used is unnecessary, you can just write it like this:

my (undef,$sub_set_a) = part { $_ <= $a } @set;
my (undef,$sub_set_b) = part { $_ > $a and $_<=$b } @set;
my (undef,$sub_set_c) = part { $_ > $b } @set;

More importantly though, if $a < $b is a precondition here (but only then!), then this is a misunderstanding of the part function and you can reduce these three passes to a single one:

my ($sub_set_a, $sub_set_b, $sub_set_c) = part {
    $_ <= $a ? 0 :
    $_ <= $b ? 1 :
               2
} @set;

Aristotle replied to comment from Joel Berger | January 26, 2014 12:26 AM | Reply

Joel: if $a < $b doesn’t always hold, then the subsets can overlap, which cannot be produced from a single part pass – so your version is broken and the original is necessary. If it does always hold, then your version is unnecessarily complex.

Joel Berger replied to comment from Aristotle | January 26, 2014 2:33 AM | Reply

Ah, yes, I see it. Well, in my defense (maybe?) I was going fast (evident in my "not tested" warning). Thanks for the catch.

byterock | January 27, 2014 11:26 PM | Reply

Ok will have to check to see if that works out funky data set we had not just a simple array of scalars.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About byterock

Long time Perl guy, a few CPAN mods allot of work on DBD::Oracle and a few YAPC presentations

More info »

byterock