Putting Perl Back on Top in the Fields of Scientific and Financial Computing
As a bioinformatician and software developer of many years, and an avid Perl programmer and supporter, one thing I've noticed over the past few years is that Perl has been needlessly losing ground to Python in the major areas of scientific and financial computing, areas where it used to be *the* high-level interpreted language of choice. I am constantly having to correct people on blogs and forums who cite Perl shortcomings relative to Python that are either simply wrong or were fixed many years ago and no longer exist in the current language and ecosystem. If they spent two seconds researching Modern Perl and Enlightened Perl, they would say WOW, look how far Perl has come!!!
All of us know there is absolutely no technical reason for this. Python as a language is not "better" than Perl; choosing one over the other is simply a matter of personal preference for the style of the language. I program in Python as well, and I definitely think that Perl has far more to offer in terms of CPAN and its community.
To illustrate what I've been seeing, let's look at the following. In scientific and financial computing, Python has a great set of libraries and toolkits that people commonly use together in their research and work:
- NumPy - N-dimensional array object (the foundation SciPy builds on) and tools to integrate C/C++ code
- SciPy - scientific computing libraries for science, mathematics, engineering
- Rpy2 - tightly integrated low-level interface between Python and R for statistical computing
- Matplotlib - 2D plotting library
- IPython - Enhanced interactive Python shell
- Boost.Python - Seamless interoperability between C++ and Python
In Perl we have the same capabilities and tools, if not more:
- PDL - The Perl Data Language, which has:
  - N-dimensional array objects
  - integrated scientific computing libraries for science, mathematics, and engineering
  - integrated 2D plotting libraries via PGPLOT and PLplot
  - integrated 3D graphics libraries via OpenGL and TriD
  - and much more...
- Statistics::R, Statistics::useR - basic integration between Perl and R for statistical computing
- ExtUtils::XSpp and SWIG - interoperability between C++ and Perl
- Countless other libraries on CPAN for math, science, engineering (just look at Math::*, Statistics::* namespaces for example)
The problem, it seems to me, is that hardly anyone knows about these tools, and/or that for the average scientist or quant, installing, using, and integrating them is more difficult than their equivalents in Python. For example, PDL is well known only in the astrophysics community, even though it is perfectly suited to any science, math, or engineering work! Compared to Python, we don't make it easy enough for newcomers to get going, and these are the people who need this help the most. This shouldn't be happening, and it is bad for the community, because the way to keep a language thriving, growing, and dynamic is to bring new programmers into it. When newcomers reach the point, at work or in school, where they have to decide which language to choose, we want them to see that Perl has an equal if not better platform to offer them!
I think we really need to:
- Communicate to the public in a clear, exciting, and attractive manner what we have to offer (why are there very few, if any, Perl books in the pipeline? Python has tons... why?)
- Make our tools and libraries much easier to install and integrate
The Python community seems to bundle its tools together, make them easier to install and use, and do a better job of telling the public that they exist. That is a shame, because, again, I believe Perl has so much more to offer than Python in terms of CPAN and its community.
I would really appreciate any input, feedback, or criticism anyone has... I really care about Perl and want the public to see all it has to offer!
1. identify a couple (or more) common (bioinformatics) tasks
2. write example apps that help
3. profit!
Hi minty, thank you for your comments... I don't quite understand: would you like me to write out at least point 1 that you mentioned?
I think that matplotlib is a bit easier to work with than just about anything in Perl.
To be fair, it has been a year or three since I last compared plotting libraries.
As a sysadmin at a university, I actually recommend Python (the NumPy/SciPy/matplotlib stack) over Perl (PDL plus any plotting package), because almost everyone uses MATLAB (for good or for bad), and its plotting is rather easy to use.
I still think Perl is probably better overall, but without a nicer plotting library (I suppose I should compare things again), Python is my recommendation.
Chart::Clicker looked really nice and easy to use but was a bit of a pain to get installed properly last I checked (a few years ago).
gizmo
If I understand correctly, Chart::Clicker depends on Graphics::Primitive::Driver::Cairo (which depends on the Cairo graphics library), but can work with any other Graphics::Primitive driver.
P.S. GD::Graph is also easy to use.
I don't think Chart::Clicker and GD::Graph are suited to scientific plotting.
Perl needs visually attractive showcases like these, with their own independent project websites, not just text-only documentation on CPAN:
http://www.rstudio.org/
http://matplotlib.sourceforge.net/gallery.html
http://addictedtor.free.fr/graphiques/allgraph.php
Is http://groups.google.com/group/perl-scientific-computing dead?
If you have MATLAB users who know MATLAB and want to stick with it but don't have a license, then maybe they should consider GNU Octave, as the languages and functionality are quite similar.
As for plotting, were you able to compare the PDL-integrated plotting libraries PGPLOT and PLplot with matplotlib? I haven't been able to do a head-to-head comparison myself, but I've read pretty much every review on the web comparing the Perl PDL, Python SciPy/matplotlib, GNU Octave, R, and MATLAB stacks, and no review stated that matplotlib is any better. Please correct me if I am wrong.
For people interested in using PDL, here is some information. The homepage is http://pdl.perl.org/.
For a quick full installation using just packages (with Fedora as an example):
yum install libX11-devel libXi-devel libXmu-devel libXext-devel freeglut-devel mesa-libGLU-devel libICE-devel plplot-devel hdf-devel proj-devel proj-nad compat-gcc-34-g77 compat-libf2c netpbm-progs gd-devel gsl-devel fftw2-devel plplot-libs plplot-perl perl-OpenGL perl-Devel-REPL perl-Term-ReadLine-Gnu perl-ExtUtils-F77 perl-PDL perl-PDL-Graphics-PLplot
If you want to include PGPLOT: on newer Linux distros the pgplot and pgplot-devel packages no longer exist, so I had to download, unpack, and build PGPLOT myself from ftp://ftp.astro.caltech.edu/pub/pgplot/pgplot5.2.tar.gz into, for example, /usr/local/pgplot. Make sure to build it with at least the /XSERVE and /XWINDOW devices, and if you aren't building directly in the install directory, copy the files the instructions list into the install directory. Then, to have PDL install with PGPLOT support, *don't* install the last two packages above (perl-PDL, perl-PDL-Graphics-PLplot); instead, install PDL from CPAN/CPANPLUS:
Start cpan like this:
PGPLOT_DIR=/usr/local/pgplot cpan
Then in the terminal:
cpan> install PGPLOT PDL
or in CPANPLUS:
PGPLOT_DIR=/usr/local/pgplot cpanp
Then in the terminal:
CPAN Terminal> i PGPLOT PDL
With the CPAN/CPANPLUS method you get the latest versions of PDL, etc., rather than the versions packaged for RPM/APT.
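Once everything is installed, a quick way to confirm which pieces actually ended up on your system is a small Perl script that tries to load each module. This is only a sketch: the module list (PDL, PGPLOT, PDL::Graphics::PGPLOT, PDL::Graphics::PLplot) is my assumption of what the steps above provide, and the helper name module_version is hypothetical, so adjust both to taste.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Modules the installation steps are assumed to provide; edit as needed.
my @modules = qw(PDL PGPLOT PDL::Graphics::PGPLOT PDL::Graphics::PLplot);

# Return the module's version string, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    # String eval is needed because the module name is held in a variable.
    return unless eval "require $module; 1";
    return $module->VERSION // 'unknown';
}

for my $module (@modules) {
    my $version = module_version($module);
    printf "%-24s %s\n", $module,
        defined $version ? "ok ($version)" : "NOT INSTALLED";
}
```

A module that fails to load (for example PDL::Graphics::PGPLOT when the PGPLOT library wasn't built) is reported as NOT INSTALLED instead of aborting the script, so you see the whole picture in one run.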
If anyone has any problems or questions just ask!!
Here are some interesting links:
Numerical Programming Performance Comparison in Perl and Python
http://heim.ifi.uio.no/~erikd/pdf/hlarray.pdf
PDL, R, Octave Reviews
http://www.linuxgoodies.com/review_pdl.html
http://www.linuxgoodies.com/review_R.html
http://www.linuxgoodies.com/review_octave.html
PDL vs MATLAB
http://use.perl.org/~david+m/journal/39838
PDL for MATLAB Users
http://sourceforge.net/apps/mediawiki/pdl/index.php?title=PDL_for_Matlab_users
Neat Performance Comparison of PDL, MATLAB, Octave Using Fractals as an Example
http://www.freesoftwaremagazine.com/articles/cool_fractals_with_perl_pdl_a_benchmark
"The benchmarks on the examples (see figure 3) provided here show that some high-level, array-oriented languages like IDL or PDL, when properly coded to avoid array index loops and if statements, are only about three to four times slower than the faster, low-level languages like C or FORTRAN77. MATLAB and Octave perform similarly and they are clearly slower than IDL or PDL.
From my personal point of view, PDL wins hands down. It provides nearly the same or better capabilities than other more expensive proprietary solutions. In particular, the mathematical and I/O functions provided by PDL are nearly equivalent to the ones from proprietary solutions and, in the case of array manipulation and language syntax, PDL is better. This comparison has no colour when price is considered, free vs. expensive. With PDL you will certainly never get a message like “No license available to run this software”, not to mention the risks of basing your programming projects on the decisions of the owners of the proprietary languages."
Looks like I have a homework assignment this month. :-)
Hopefully I'll have time to digest the informative comments, do a comparison (or at least get a better handle on PDL's plotting) and do a write up.
Also, it sounds like a short talk for my Perl Mongers group as well.
SciPy has its own plotting library (matplotlib).
R has its own plotting library and enhancements (ggplot2).
PDL has no de facto standard, and too many choices.
PGPLOT? Its last update was in 2002, and it has an old Fortran code base. Is it promising?
PLplot? Can PDL/Perl influence its future roadmap?
If PDL can't afford to build a new plotting library, I think PDL should integrate Gnuplot instead.
I was told on the PDL mailing list recently that work is proceeding to implement standard cross-platform graphics (2D and 3D).
Gnuplot has also been mentioned to the PDL developers. It would be fairly straightforward to integrate, and since it is the plotting library behind Octave, it would be a great addition, giving PDL Gnuplot, PLplot, and PGPLOT as choices.
In addition, the PDL developers are working with the Padre developers to develop a PDL interactive shell for the Padre IDE and, I hope, embedded wxPerl plotting in the future.
For those of you who are interested in using Gnuplot from within Perl, it looks to me that you have three libraries with different capabilities to choose from:
Your best option for using Gnuplot is probably to just print out raw gnuplot commands. You have to tweak the heck out of publication-quality plots, and what are the odds that a Perl wrapper will support the options you need? e.g.
laboff = 0.9
unset log; set auto
set key bot right;
set ylab 'True positives (same superfamily)' laboff,0 font "cmr10,32";
set xlab 'False positives' 0,laboff font "cmr10,32";
set term aqua 1 fname "cmr10" fsize 24; set pointsize 1;set linest lw 5.0
set xtics 0,0.5,1;set ytics 0,0.5,1 # set ylab 1
set xran [0:1];set yran [0:1]
p 'roc-prof-poster.txt' ev 100 t 'Profiles' w linesp pt 9 lt -1, \
'roc-ib-poster.txt' ev 100 t 'IB' w linesp pt 1 lt 1, \
'../poster/roc-cons.txt' ev 100 t 'Consensus' w linesp pt 3 lt 3, x t '' lt 8
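If you go the raw-commands route, Perl can still do the bookkeeping: build the gnuplot script as a string, then pipe it into gnuplot. The sketch below assumes gnuplot is on your PATH and was built with the pngcairo terminal; the helper gnuplot_script, the output file, and the option layout are hypothetical and only loosely mirror the hand-written script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a raw gnuplot script as a string. The data files are hypothetical;
# each is expected to hold whitespace-separated x y columns.
sub gnuplot_script {
    my (%args) = @_;
    my @lines = (
        qq{set terminal pngcairo size 800,600 font "Helvetica,14"},
        qq{set output "$args{output}"},
        qq{set key bottom right},
        qq{set xlabel "$args{xlabel}"},
        qq{set ylabel "$args{ylabel}"},
        qq{set xrange [0:1]; set yrange [0:1]},
    );
    # One "file every 100 title ... with linespoints" clause per data set.
    my @series = map {
        sprintf q{'%s' every 100 title '%s' with linespoints}, $_->{file}, $_->{title}
    } @{ $args{series} };
    # The trailing "x" term draws the y = x diagonal with no key entry.
    push @lines, 'plot ' . join(', ', @series, q{x title '' lt 8});
    return join("\n", @lines) . "\n";
}

my $script = gnuplot_script(
    output => 'roc.png',
    xlabel => 'False positives',
    ylabel => 'True positives (same superfamily)',
    series => [
        { file => 'roc-prof-poster.txt', title => 'Profiles' },
        { file => 'roc-ib-poster.txt',   title => 'IB' },
    ],
);

if (@ARGV && $ARGV[0] eq '--plot') {
    # Pipe the commands straight into gnuplot (assumed to be on the PATH).
    open my $gp, '|-', 'gnuplot' or die "Cannot start gnuplot: $!";
    print {$gp} $script;
    close $gp or warn "gnuplot exited with status $?\n";
} else {
    print $script;    # default: just show the generated commands
}
```

Run it with no arguments to inspect the generated commands, or with `--plot` to send them to gnuplot. Keeping the script as plain text means any gnuplot option you need for publication tweaking can go straight into the list, with no wrapper in the way.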
What I would like to see are some examples that do some graph work like the following:
I can do this with matplotlib in about a dozen measly lines of code.
Because of this, I have been looking at using matplotlib, even though I would rather do it in Perl.
I've never used Perl, so I can't comment on Perl vs. Python, but you missed two critical libraries/tools that Python also brings to scientific computing:
pandas (http://pandas.pydata.org/): offers efficient R-like DataFrames with tons of built-in capability
IPython Notebook (http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html): provides a web-based notebook that enables scientists to log their research in HTML-like cells; embed images, figures, videos, and external web pages; run Python/R/octave code in code cells; and do all kinds of shell commands, data management, data exploration, plotting, etc. -- all in the same tool.
A couple people in my lab were fairly religious about Perl until I showed them IPython Notebook and all the other tools Python has to offer. They switched over without even thinking about it and haven't looked back.
Just a thought from an end-user.
I am not sure what exactly (3) means. (1) and (2) can also be done in Perl with just a few lines. For example, with Chart::Gnuplot, it would look like this: