distributed data analysis and reporting in perl, ...
so I was asked a question about our users and our products,
and, wanting a quantitative answer,
- I quickly (in under 5 minutes) wrote up a map-reduce (+ reduce) job,
- submitted it to our array of servers,
- and in under 10 seconds, I had the answer.
actual job code (names changed):
<%map>
  my $data = shift;

  # keep only records in the category/type we care about
  return unless $data->{category} eq 'CATEGORY';
  return unless $data->{design};
  return unless $data->{type} eq 'TYPE';

  # emit a combined "design status" key with a count of 1
  return ("$data->{design} $data->{status}", 1);
</%map>
<%reduce>
  my $key  = shift;
  my $iter = shift;

  # count how many records were emitted for this "design status" key
  my $total = 0;
  $iter->foreach(sub { $total++ });

  # re-key by design alone, carrying status and count along
  my ($design, $status) = split /\s/, $key;
  return ($design, { status => $status, total => $total });
</%reduce>
<%reduce>
  my $key  = shift;
  my $iter = shift;

  # fold the per-status counts for one design into a single hash,
  # tracking a grand total under 'all'
  my %totals;
  $iter->foreach(sub {
      my $r = shift;
      $totals{ $r->{status} } = $r->{total};
      $totals{all} += $r->{total};
  });
  return ($key, \%totals);
</%reduce>
<%final>
  my $key = shift;
  my $tot = shift;

  # print one line per status: design, status, count, percentage
  my $all = $tot->{all};
  for my $status (keys %$tot) {
      my $pct = $all ? sprintf('%.2f', 100 * $tot->{$status} / $all) : 0;
      print "$key\t$status\t$tot->{$status}\t%$pct\n";
  }
</%final>
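if you want to poke at the data flow without the framework, here's a plain-perl sketch that runs the same four stages in memory over a few made-up records (the field values are hypothetical, and the stage wiring here is just my simulation, not the framework's actual plumbing):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical sample records; the fourth one fails the category filter
my @records = (
    { category => 'CATEGORY', type => 'TYPE', design => 'A', status => 'live' },
    { category => 'CATEGORY', type => 'TYPE', design => 'A', status => 'live' },
    { category => 'CATEGORY', type => 'TYPE', design => 'A', status => 'dead' },
    { category => 'OTHER',    type => 'TYPE', design => 'A', status => 'live' },
);

# map stage: emit ("design status", 1) for matching records
my %stage1;
for my $data (@records) {
    next unless $data->{category} eq 'CATEGORY';
    next unless $data->{design};
    next unless $data->{type} eq 'TYPE';
    push @{ $stage1{"$data->{design} $data->{status}"} }, 1;
}

# first reduce: count per "design status" key, then re-key by design
my %stage2;
for my $key (keys %stage1) {
    my $total = scalar @{ $stage1{$key} };
    my ($design, $status) = split /\s/, $key;
    push @{ $stage2{$design} }, { status => $status, total => $total };
}

# second reduce: fold per-status counts into one hash per design
my %result;
for my $design (keys %stage2) {
    my %totals;
    for my $r (@{ $stage2{$design} }) {
        $totals{ $r->{status} } = $r->{total};
        $totals{all} += $r->{total};
    }
    $result{$design} = \%totals;
}

# final: one tab-separated line per design/status with a percentage
for my $design (sort keys %result) {
    my $tot = $result{$design};
    my $all = $tot->{all};
    for my $status (sort keys %$tot) {
        my $pct = $all ? sprintf('%.2f', 100 * $tot->{$status} / $all) : 0;
        print "$design\t$status\t$tot->{$status}\t%$pct\n";
    }
}
```

with these sample records, design A ends up with 2 live, 1 dead, and a grand total of 3, so the percentages come out to 66.67 / 33.33 / 100.00.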
we are a perl + mason shop, so one of the design goals was to keep
the learning curve gentle for our junior staff.
since it is common with other m/r frameworks to run a job and then
feed its results into another job (and so on), I wanted to be able
to specify multiple stages, from start to finish, all in one place.
did I mention that we plan to open-source this?
oh yeah, that....