June 2012 Archives

Adapting PDL to a Big Data Landscape

Note: although this article is directed at current PDL users, I would particularly appreciate the opinion of Perl users who are considering using PDL. Does my assessment seem accurate to you?

I was just watching a few of the YAPC::NA talks on YouTube that I had wanted to attend in Madison but could not, because I was either busy writing my own talks or attending other talks. It reminded me of the revelation that I had at YAPC. Although I am not looking for a job, I spoke with the sponsors at their job booths, just to get a feel for what's out there. Is it possible for a Perl programmer to get a job doing real data crunching? The answer, happily, was "yes".

Almost immediately, I began to realize that there is a whole world of data analysis on the horizon for which PDL is well suited. PDL was written by and for scientists, but there's no reason it couldn't be applied to the analysis of Big Data (something made possible in large part by Chris Marshall's work on fully cross-platform memory mapping and 64-bit cleanups). Analyses of large data sets are already happening at many private corporations using languages such as SAS, SPSS, S, and R. Some of them might use Matlab; a rare few might use Python or Perl. Given our limited marketing budget (ha!), the only corporations that will choose to use Perl and PDL are those that already use Perl in some significant capacity. We PDL folks have two major things to take away from this. First, we must engage with the wider Perl community. Second, we must make it easy for PDL outsiders to learn about and use the full breadth of PDL.

Yet Another YAPC::NA Report

Finally, I can sit down to write my report!

This was my first YAPC, and it was fantastic! For many years my knowledge of the Perl community came through the PDL mailing list, the many Perl blogs, and occasional IRC. I attended a few Chambana.pm meetings, but they were social and didn't really get me fired up for Perl. My first experience with a collection of Perl programmers was joining Chicago.pm last fall, and my first Perl conference was the DC/Baltimore Perl Workshop this spring. But wow, 400+ Perl programmers in one place!

I gave two talks at the conference: an introduction to the Perl Data Language (PDL) and an introduction to my new plotting library, PDL::Graphics::Prima. Both were well attended and well received, and I have had a handful of follow-up email and IRC discussions as a result of both talks. Building the PDL community was one reason I attended YAPC::NA, and I get the impression that it's paying off.

About David Mertens

This is my blog about numerical computing with Perl.