Putting Perl Back on Top in the Fields of Scientific and Financial Computing

As a bioinformatician and software developer of many years, and an avid Perl programmer and supporter, one thing I've noticed over the past few years is that Perl has been needlessly losing ground to Python in the major areas of scientific and financial computing, areas where it used to be *the* high-level interpreted language of choice. I am constantly having to correct people on blogs and forums who cite Perl shortcomings relative to Python that are either simply incorrect or date from many years ago and no longer exist in the current language and ecosystem. If they spent two seconds researching Modern Perl and Enlightened Perl, they would say WOW, look how far Perl has come!!!

All of us know there is absolutely no technical reason for this. Python as a language is not "better" than Perl for any reason; choosing one over the other is simply a matter of personal preference for the style of the language. I program in Python as well, and I definitely think that Perl has far more to offer in terms of CPAN and its community.

To illustrate what I've been seeing let's look at the following. In scientific and financial computing, Python has a great set of libraries and toolkits that users commonly use together in their research and work:

  • NumPy - N-dimensional array object container for SciPy and tools to integrate C/C++ code
  • SciPy - scientific computing libraries for science, mathematics, engineering
  • Rpy2 - tightly integrated low-level interface between Python and R for statistical computing
  • Matplotlib - 2D plotting library
  • IPython - Enhanced interactive Python shell
  • Boost.Python - Seamless interoperability between C++ and Python

In Perl we have these same capabilities and tools if not more:

  • PDL - The Perl Data Language, which has:
    • N-dimensional array objects
    • integrated scientific computing libraries for science, mathematics, engineering
    • integrated 2D plotting libraries via PGPLOT and PLplot
    • integrated 3D graphics libraries via OpenGL and TriD
    • and much more...
  • Statistics::R, Statistics::useR - basic integration between Perl and R for statistical computing
  • ExtUtils::XSpp and SWIG - interoperability between C++ and Perl
  • Countless other libraries on CPAN for math, science, engineering (just look at Math::*, Statistics::* namespaces for example)

The problem, it seems to me, is simply that no one knows about these tools, and/or that for the average scientist or quant, installing, using, and integrating them is more difficult than their equivalents in Python. For example, PDL is well known only in the astrophysics community, when it is perfectly suited to and written for any science, math, or engineering work! Compared to Python, we don't make it easy enough for newcomers to get going, and these are the people who need this help the most. This shouldn't be happening, and it is bad for the community, because the way to keep a language thriving, growing, and dynamic is to get new programmers into that language. When they reach those points in their working life or in school where they have to decide which language to choose, they should see that Perl has an equal if not better platform to offer them!

I think we really need to:

  1. Communicate to the public in a clear, exciting, and attractive manner what we have to offer (why are there few, if any, Perl books in the pipeline? Look at Python, they have tons... why?)
  2. Make our tools and libraries much easier to install and integrate

The Python community seems to package their tools together, make them easier to install and use, and communicate their existence to the public better, which is a shame because, again, I believe Perl has so much more to offer than Python in terms of CPAN and its community.

I would really appreciate any input, feedback, criticism anyone has... I really care about Perl and want the public to see all it has to offer!

16 Comments

1. identify a couple (or more) common (bioinformatics) tasks
2. write example apps that help
3. profit!

I think that matplotlib is a bit easier to work with than just about anything in Perl.

To be fair, it has been a year or three since I last compared plotting libraries.

As a sysadmin at a university I actually recommend Python (numpy/scipy/matplotlib stack) over Perl (PDL/any plotting package) because most everyone uses Matlab (for good or for bad) and its plotting is rather easy to use.

I still think Perl is probably better overall, but without a nicer plotting library (though I suppose I should compare things again) Python is my recommendation.

Chart::Clicker looked really nice and easy to use but was a bit of a pain to get installed properly last I checked (a few years ago).

gizmo

If I understand correctly, Chart::Clicker depends on Graphics::Primitive::Driver::Cairo (which depends on Cairo graphics library), but can work with any other Graphics::Primitive driver.


P.S. GD::Graph is also easy to use.

I don't think Chart::Clicker and GD::Graph are meant for scientific plotting.

Perl needs visually attractive showcases like these, on independent project websites of their own, not only text-only documentation on CPAN.

http://www.rstudio.org/
http://matplotlib.sourceforge.net/gallery.html
http://addictedtor.free.fr/graphiques/allgraph.php

Is http://groups.google.com/group/perl-scientific-computing dead?

Looks like I have a homework assignment this month. :-)

Hopefully I'll have time to digest the informative comments, do a comparison (or at least get a better handle on PDL's plotting) and do a write up.

Also, this sounds like a short talk for my mongers group as well.

SciPy has its own plotting library (Matplotlib).
R has its own plotting library and an enhancement (ggplot2).
PDL doesn't have a de facto standard, and it has too many choices.

PGPLOT? Its latest update was in 2002, and it has an old Fortran code base. Is it promising?
PLplot? Can PDL/Perl control its future roadmap?

If PDL can't afford to make a new plotting library, I think PDL should instead integrate Gnuplot.

Your best option for using Gnuplot is probably to just print out raw gnuplot commands. You have to tweak the heck out of publication-quality plots, and what are the odds that a Perl wrapper will support the options you need? e.g.

laboff = 0.9
unset log; set auto
set key bot right;
set ylab 'True positives (same superfamily)' laboff,0 font "cmr10,32";
set xlab 'False positives' 0,laboff font "cmr10,32";
set term aqua 1 fname "cmr10" fsize 24; set pointsize 1;set linest lw 5.0
set xtics 0,0.5,1;set ytics 0,0.5,1 # set ylab 1
set xran [0:1];set yran [0:1]
p 'roc-prof-poster.txt' ev 100 t 'Profiles' w linesp pt 9 lt -1, \
'roc-ib-poster.txt' ev 100 t 'IB' w linesp pt 1 lt 1, \
'../poster/roc-cons.txt' ev 100 t 'Consensus' w linesp pt 3 lt 3, x t '' lt 8

What I would like to see are some examples that do some graph work like the following:

  1. set up axis with ranges and annotation text
  2. plot several data point arrays with a choice of line color
  3. put multiple cross hairs on the screen in both x and y dirs

I can do this with matplotlib in about a dozen measly lines of code.

Because of this I have been looking at using matplotlib, even though I would rather do it in Perl.

I've never used Perl so I can't comment on Perl vs. Python, but you missed two critical libraries/tools that Python also brings to scientific computing:

pandas (http://pandas.pydata.org/): offers efficient R-like DataFrames with tons of built-in capability

IPython Notebook (http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html): provides a web-based notebook that enables scientists to log their research in HTML-like cells; embed images, figures, videos, and external web pages; run Python/R/octave code in code cells; and do all kinds of shell commands, data management, data exploration, plotting, etc. -- all in the same tool.

A couple people in my lab were fairly religious about Perl until I showed them IPython Notebook and all the other tools Python has to offer. They switched over without even thinking about it and haven't looked back.

Just a thought from an end-user.

I am not sure what (3) means exactly. For (1) and (2), this can also be done in Perl with just a few lines. For example, with Chart::Gnuplot, it would be like:


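use strict;
use warnings;
use Chart::Gnuplot;

# Illustrative sample data (not from the original comment): lists of [x, y] points
my @data1 = ([1, 2], [3, 5], [6, 4], [9, 8]);
my @data2 = ([1, 7], [4, 3], [7, 6], [9, 1]);

my $chart = Chart::Gnuplot->new(
    output => "test.png",
    xlabel => "x-axis",    # annotation text
    xrange => [0, 10],     # axis range
);

# Data point array 1
my $dataSet1 = Chart::Gnuplot::DataSet->new(
    points => \@data1,
    color  => "blue",        # line color chosen
);

# Data point array 2
my $dataSet2 = Chart::Gnuplot::DataSet->new(
    points => \@data2,
    color  => "dark-red",    # line color chosen
);

# Draw both data sets on the same chart
$chart->plot2d($dataSet1, $dataSet2);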

About hermidalc

I'm a bioinformatician, software developer, avid Perl programmer and supporter and I'm here to blog about the language that is Perl!