Adapting PDL to a Big Data Landscape
Note: although this article is directed at current PDL users, I would particularly appreciate the opinion of Perl users who are considering using PDL. Does my assessment seem accurate to you?
I was just watching a few of the YAPC::NA talks on YouTube, ones that I wanted to attend in Madison but could not because I was busy (writing my talks) or attending other talks. It reminded me of the revelation that I had at YAPC. Although I am not looking for a job, I spoke with the sponsors at their job booths, just to get a feel for what's out there. Is it possible for a Perl programmer to get a job doing real data crunching? The answer, happily, was "yes".
Almost immediately, I began to realize that there is a whole world of data analysis on the horizon for which PDL is well suited. PDL was written by and for scientists, but there's no reason it couldn't be applied to the analysis of Big Data (made possible in large part by Chris Marshall's work on fully cross-platform memory mapping and 64-bit cleanups). Analyses of large data sets are already happening at many private corporations using languages such as SAS, SPSS, S, and R. Some of them might use Matlab; a rare few might use Python or Perl. Due to our limited marketing budget (ha!), the only corporations that will choose to use Perl and PDL are those that already use Perl in some significant capacity. We PDL folks have two major things to take away from this. First, we must engage with the wider Perl community. Second, we must make it easy for PDL outsiders to learn about and use the full breadth of PDL.