Adapting PDL to a Big Data Landscape
Note: although this article is directed at current PDL users, I would particularly appreciate the opinion of Perl users who are considering using PDL. Does my assessment seem accurate to you?
I was just watching a few of the talks from YAPC::NA on YouTube, talks that I wanted to attend in Madison but could not because I was busy (writing my talks) or attending other talks. They reminded me of a revelation that I had at YAPC. Although I am not looking for a job, I spoke with the sponsors at their job booths, just to get a feel for what's out there. Is it possible for a Perl programmer to get a job doing real data crunching? The answer, happily, was "yes".
Almost immediately, I began to realize that a whole world of data analysis is on the horizon, and that PDL is well suited to it. PDL was written by and for scientists, but there's no reason it couldn't be applied to the analysis of Big Data (made possible in large part by Chris Marshall's work on fully cross-platform memory mapping and 64-bit cleanups). Analyses of large data sets are already happening at many private corporations using languages such as SAS, SPSS, S, and R. Some of them might use Matlab; a rare few might use Python or Perl. Due to our limited marketing budget (ha!), the only corporations that will choose to use Perl and PDL are those that already use Perl in some significant capacity. We PDL folks have two major things to take away from this. First, we must engage with the wider Perl community. Second, we must make it easy for PDL outsiders to learn about and use the full breadth of PDL.
Engaging the wider community
I am happy to report that my Introduction to PDL was very well attended. In other words, Perl people care about and are interested in PDL. We PDL people simply need to make ourselves better known and more accessible to the other Perl people who live and work in our midst. I highly recommend attending your local Perl Mongers group. If there is no such group and you're the outgoing type, try searching on LinkedIn for other Perl folks in your neck of the woods and contact them if you can. A sysadmin who knows about PDL is one thing; a sysadmin who can put his coworker in touch with you, a PDL user that he sees once a month, is a much more powerful thing. If you're less outgoing, join the #pdl channel at irc.perl.org. If you don't have an IRC client or don't know how to use IRC, just use the in-browser mibbit client.
Making it easy for outsiders
With the release of the PDL::Book, we finally have a single comprehensive resource for learning PDL. This is great. However, both the core docs and the Book can be improved. As the need to analyze Big Data grows, new users will come to PDL needing new functionality, and they will need to be able to learn how to implement that functionality. Do you understand the intricacies of PDL threading? At the very least, do you feel like you could sit down with another programmer and hack at it until you got it right? Or, going further, have you used PDL::PP? There's a chapter in the book on that, too. (I should know, I wrote it. :-) If you answered no to any of these, read selected chapters from the book and give your feedback. (Credits are listed in the back of the book, or just email the mailing list. New users, you have to sign up to send mail.) The better we can make the book and the docs, the better we will be able to accommodate newcomers. The more people in our little community who understand these things, the more responsive we can be when newcomers arrive and ask questions, and the more of them will stay and start contributing, making PDL even better.
Finally, yes, I am talking to you, Jane PDL Hacker. I know that some of you, even some of the PDL Big Wigs, do not attend your local Perl Mongers meetings. You should. Furthermore, only one person gave me thorough feedback on the PDL::PP chapter, and I must shamefully admit that I have yet to read most of the rest of the book. If you do not help, you should not be surprised if PDL slowly bitrots into oblivion. But if you tell others about PDL and give useful feedback on the docs, PDL will grow and improve, and your efforts will pay off in the form of an even more awesome tool.
The tide of Big Data is coming. Do Your Part: help make PDL awesome, help other Perlers discover how awesome it is, and maybe even help them make it better.