Data analysis and visualization in Perl

[Figure: position_stack_02_02.png, the plot produced by the example code below]

Hello everybody, this is my first post here, so forgive me if I screw it up.

Let me first introduce the background of my work. Several years ago I landed a Perl job. It also involved some other languages like Python and R, but it was mainly Perl until last year, when the focus of my role switched; I still do some Perl, but much less than before. I was a little bit sad. Perl is a good language, but it is usually underrated outside its community. I am quite good at several programming languages, but Perl is the one I feel most comfortable and productive in. So I thought I might write something in my after-work time and use my Perl knowledge to give something back to the Perl community.

I remembered reading the post "Putting Perl Back on Top in the Fields of Scientific and Financial Computing" a couple of years ago. I know Perl has had PDL for a long time, so why does it not have a bigger market share? IMHO there are a few reasons, and one of them is that it needs a whole ecosystem: several strong libraries that target different aspects of scientific or financial computing, so that users' needs are fulfilled. Python has had numpy/scipy and matplotlib since the early 2000s, and on top of numpy people built pandas, which is widely used today for dataframe-based data analysis. R itself has out-of-the-box dataframe and plotting features, and Hadley Wickham's ggplot2 is so powerful that I have heard of people not switching from R to Python solely because of ggplot2. Perl, IMHO, lacks a great plotting library, and it lacks a library that handles dataframes well.

Then I decided to do something in this area: to improve ZMUGHAL's Data::Frame and to port R's ggplot2 to Perl. It took me several months, but it has finally reached a roughly usable state. In the past couple of weeks I released two new libraries onto CPAN: Chart::GGPlot and Alt::Data::Frame::ButMore. The latter is released under the Alt:: name because I have not been able to reach ZMUGHAL.

For those who have not heard of R's ggplot2, it is an implementation of Leland Wilkinson's The Grammar of Graphics. Basically, it lets you define a plot at a high level by specifying components such as geometry layers and scales, and by mapping columns of your data onto the plot. For example, the piece of code below produces the plot at the top of this post.

#!/usr/bin/env perl

use strict;
use warnings;

use Chart::GGPlot qw(:all);
use Data::Frame::Examples qw(diamonds);

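# Histogram of diamond prices with bins 500 wide. Each bar is split by
# cut, and position => 'fill' rescales every bar to the same height, so
# it shows the proportion of each cut within that price bin.
my $p = ggplot(
    data => diamonds(),
    mapping => aes( x => 'price', fill => 'cut' )
)->geom_histogram( binwidth => 500, position => 'fill' );

# Display the plot (e.g. in a web browser).
$p->show();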
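To make `position => 'fill'` a bit more concrete: in ggplot2 this position adjustment stacks the bars in each bin and rescales every stack to the same height, so each bar shows the share of each `cut` rather than raw counts. The core-Perl sketch below computes those per-bin shares by hand. It is only an illustration under stated assumptions: `fill_proportions` is a hypothetical helper, not part of Chart::GGPlot; the rows are made-up toy values, not the real diamonds data; and bins are taken as [k*w, (k+1)*w), which may not match where Chart::GGPlot actually places its bin edges.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper (NOT part of Chart::GGPlot): for each bin of width
# $binwidth, return the share of each category within that bin, i.e. what
# the bars of a histogram with position => 'fill' represent.
# Assumption: bins are [k*w, (k+1)*w); the real library may differ.
sub fill_proportions {
    my ( $rows, $binwidth ) = @_;

    # Count rows per (bin, category).
    my %counts;
    for my $row (@$rows) {
        my ( $value, $category ) = @$row;
        my $bin = floor( $value / $binwidth ) * $binwidth;
        $counts{$bin}{$category}++;
    }

    # Divide each count by its bin total so every bin sums to 1.
    my %props;
    for my $bin ( keys %counts ) {
        my $total = 0;
        $total += $_ for values %{ $counts{$bin} };
        for my $category ( keys %{ $counts{$bin} } ) {
            $props{$bin}{$category} = $counts{$bin}{$category} / $total;
        }
    }
    return \%props;
}

# Made-up toy rows of [price, cut], not the real diamonds data.
my @rows = (
    [ 326, 'Ideal' ],
    [ 400, 'Premium' ],
    [ 450, 'Ideal' ],
    [ 720, 'Good' ],
);

my $props = fill_proportions( \@rows, 500 );
for my $bin ( sort { $a <=> $b } keys %$props ) {
    for my $category ( sort keys %{ $props->{$bin} } ) {
        printf "%d %s %.2f\n", $bin, $category, $props->{$bin}{$category};
    }
}
```

For the toy rows this prints `0 Ideal 0.67`, `0 Premium 0.33` and `500 Good 1.00`: every bin adds up to 1, which is why all the bars in a 'fill' histogram have the same height.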

More of my Chart::GGPlot examples are here.

So that's it. Thanks for reading. Comments and ideas are welcome :-)

11 Comments

I have one question.

How do I output the chart as a PNG?

Thank you for your comment.

I installed Chart::GGPlot on my CentOS server.

And I ran the example program in this entry.

Here are some frank comments to improve the user experience.

1. I hope installation becomes faster.

When I installed Chart::GGPlot, 155 distributions were installed.

I feel the installation time is too long.

2. I hope chart creation becomes faster.

I ran this example and waited over 30 hours.

In R, chart creation is much faster.

3. Chart::GGPlot depends on a JavaScript library.

When I first read the blog entry, I thought Chart::GGPlot was a pure Perl/XS library.

And I thought the chart could easily be output to a PNG image.

Currently, Chart::GGPlot depends on a JavaScript library.

The goal and design of Chart::GGPlot are very attractive!

Sorry, I meant 30 seconds, not 30 hours.

Is it difficult to draw points, lines, colors, and text from a data frame with Cairo?

I think one thing Perl lacks is a Perl/XS graphics library that can create 2D/3D charts easily.

If installation becomes easier and chart-creation performance improves to the level of R, it will become more attractive to me.



About Stephan Loyd

I blog about Perl.