Machine learning in Perl: Kyuubi goes to a (Model)Zoo during The Starry Night.

Hello all, this is the fourth blog post in the Machine learning in Perl series, focusing on AI::MXNet, a Perl interface to Apache MXNet, a modern and powerful machine learning library.

If you're interested in refreshing your memory or are just new to the series, please check the previous entries over here: 1 2 3

If you're following ML research then you're probably well aware of the two most popular libraries out there: Google's TensorFlow and Facebook's PyTorch, a relative newcomer to the field that is rapidly gaining widespread acceptance.

The reason PyTorch has gained so much ground on TensorFlow lies in the dynamic nature of that library. TensorFlow started as a static graph library (which is easier to optimize), while PyTorch went with dynamically built graphs and a NumPy (read: PDL) style of programming, with robust GPU support and automatic differentiation, that is as easy to debug as ordinary Python code.

Of course neither TF nor PyTorch is against common-sense compromises: TF recently added a dynamic style via its eager execution mode (tf.eager), and PyTorch is working on getting to a 'low memory usage, high speed, ready for production' kind of state.

Apache MXNet, which in its turn is supported by Amazon (but started as a free software project in academia), is relatively less popular. I am not sure what the reason for that is; I'd guess herd mentality, perception over reality, that kind of stuff. Perl has suffered from an undeserved, overly negative perception as a 'write only' language, and whatever the actual reality is does not seem to matter much.

MXNet stands for MiXNet, that is, a mix of static (symbolic) and dynamic (tensor) styles of programming from the get-go; this is the philosophy of the lib. You can be flexible and code absolutely whatever you want with raw tensors (NDArrays in MXNet), or you can go with the strict symbolic style and the lib will mercilessly optimize the execution of your graph, reducing memory usage and making training/inference as efficient and speedy as possible. Check it out on Google; MXNet consistently beats other libs in various benchmarks.

However, the raw tensor dynamic style of ML programming is not for everyone. It's hard, and you need to know a lot about the details. Besides, due to the popularity of ML, the topic is overrun with new, inexperienced people (me included) looking for a quick, easy, fool-proof solution. Hence the popularity of Keras (now tf.keras, an official front-end to TF), which covers the raw complexities with a layer of syntactic sugar, allowing for a more inclusive environment and wider enterprise adoption. That layer of sugar tends to hurt performance, but it seems to be a wise choice, judging by the wide adoption of the lib.

MXNet's answer to Keras and PyTorch is Gluon. Essentially it is a layer very similar in syntax and capabilities to PyTorch and Keras: dynamically created graphs, full flexibility to do any kind of dynamic tensor operation with automatic differentiation, transparent multi-GPU and multi-machine training, etc. But with an MXNet twist. Gluon stays true to the MiX roots of the library, allowing for an extremely easy and transparent conversion of dynamic graphs to static ones, with all the optimization-related benefits, while not taking away (there are some caveats) the freedom of dynamic programming.

Okay, so far so good. But why is this wall of text being published on a Perl blog?

The reason is simple: AI::MXNet fully supports Gluon with all its capabilities, and state of the art networks from very recent ML papers can be implemented in Perl very easily, efficiently and painlessly.

To demonstrate this fact, and hopefully to spark an interest in the ML topic among the Perl community, I recently ported the Gluon ModelZoo (a set of pretrained vision networks), with state of the art models for the ImageNet dataset, to Perl as AI::MXNet::Gluon::ModelZoo, and added two new, really cool Gluon examples.

This new module and the examples are the main topics of this post.

Let's start with AI::MXNet::Gluon::ModelZoo itself. It's a collection of seven different deep neural networks capturing the effort to 'solve' the ImageNet dataset in the time span between 2012 and 2017 (it's effectively solved, and the state of the art has surpassed human performance).

For the sake of brevity we'll concentrate our attention on the smallest network, AlexNet, the granddaddy of today's ML craze. AlexNet started it all in 2012 by beating its closest competitor by a whopping 41%.

Here is how it's defined in Gluon.

package AI::MXNet::Gluon::ModelZoo::Vision::AlexNet;
use strict;
use warnings;
use AI::MXNet::Function::Parameters;
use AI::MXNet::Gluon::Mouse;
extends 'AI::MXNet::Gluon::HybridBlock';

has 'classes' => (is => 'ro', isa => 'Int', default => 1000);
method python_constructor_arguments() { ['classes'] }

sub BUILD
{
    my $self = shift;
    $self->name_scope(sub {
        ## feature extractor: five convolutions followed by two dense layers
        $self->features(nn->HybridSequential(prefix=>''));
        $self->features->name_scope(sub {
            $self->features->add(nn->Conv2D(64, kernel_size=>11, strides=>4,
                                            padding=>2, activation=>'relu'));
            $self->features->add(nn->MaxPool2D(pool_size=>3, strides=>2));
            $self->features->add(nn->Conv2D(192, kernel_size=>5, padding=>2,
                                            activation=>'relu'));
            $self->features->add(nn->MaxPool2D(pool_size=>3, strides=>2));
            $self->features->add(nn->Conv2D(384, kernel_size=>3, padding=>1,
                                            activation=>'relu'));
            $self->features->add(nn->Conv2D(256, kernel_size=>3, padding=>1,
                                            activation=>'relu'));
            $self->features->add(nn->Conv2D(256, kernel_size=>3, padding=>1,
                                            activation=>'relu'));
            $self->features->add(nn->MaxPool2D(pool_size=>3, strides=>2));
            $self->features->add(nn->Flatten());
            $self->features->add(nn->Dense(4096, activation=>'relu'));
            $self->features->add(nn->Dropout(0.5));
            $self->features->add(nn->Dense(4096, activation=>'relu'));
            $self->features->add(nn->Dropout(0.5));
        });
        ## classifier: one score per class
        $self->output(nn->Dense($self->classes));
    });
}

method hybrid_forward(GluonClass $F, GluonInput $x)
{
    $x = $self->features->($x);
    $x = $self->output->($x);
    return $x;
}

A few things here warrant attention. AI::MXNet::Gluon::Mouse is just a subclass of Mouse that implicitly adds an internal trigger to every attribute, so that the user does not have to add that trigger explicitly.

The network itself is a subclass of AI::MXNet::Gluon::HybridBlock, a class that allows Gluon to be true to its MiX roots, hence Hybrid in its name. By simply calling the ->hybridize method on the net object, a user signifies that the work on the graph creation is complete and the lib is now allowed to optimize all the innards to its liking.

sub BUILD is the heart of the net creation, the place where the magic happens. In order to make conversions of Python examples to Perl as simple as possible, I added auto-vivification of new attributes (via AUTOLOAD, which just calls Mouse's 'has' the first time the attribute is mentioned).
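To make the auto-vivification idea more concrete, here is a minimal plain-Perl sketch. It is not the AI::MXNet implementation: the package name My::Net is hypothetical, and it installs a simple hash-based accessor instead of calling Mouse's 'has'. It only illustrates how AUTOLOAD can create an attribute the first time it is mentioned.

```perl
package My::Net;    # hypothetical class, for illustration only
use strict;
use warnings;

our $AUTOLOAD;

sub new { return bless {}, shift }

# Called for any method that does not exist yet.
sub AUTOLOAD {
    my $self = $_[0];
    my ($name) = $AUTOLOAD =~ /::(\w+)$/;
    return if $name eq 'DESTROY';    # never auto-create a destructor
    die "No such class method: $name" unless ref $self;

    # Install a real read/write accessor so AUTOLOAD is hit only once per name.
    no strict 'refs';
    *{ __PACKAGE__ . "::$name" } = sub {
        my $obj = shift;
        $obj->{$name} = shift if @_;
        return $obj->{$name};
    };
    goto &{ __PACKAGE__ . "::$name" };    # re-dispatch with the original @_
}

package main;

my $net = My::Net->new;
$net->features('HybridSequential');    # attribute is created on first mention
print $net->features, "\n";
print My::Net->can('features') ? "accessor installed\n" : "no accessor\n";
```

After the first call, features is an ordinary method of the class, so later calls no longer go through AUTOLOAD.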

The hybrid_forward method is essentially what happens during the forward phase of the net execution. At this point you can think of it as something that executes the network, converting the input into the output: the image of a cat into the answer 'this picture contains a cat'.

You may be wondering what nn-> means. It's also there to allow converting Python examples to Perl in the least painful manner. To Perl it's just 'AI::MXNet::Gluon::NN', a module that houses a vast collection of predefined blocks from which deep nets are built like Lego.
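The mechanism behind such a shortcut can be sketched in a few lines of plain Perl. The package My::Gluon::NN and its Dense method are hypothetical stand-ins, not the real AI::MXNet::Gluon::NN; the point is only that a sub returning a package name lets you write nn->Something.

```perl
use strict;
use warnings;

package My::Gluon::NN {    # hypothetical stand-in for a layer collection
    # Stand-in "layer" constructor; a real block would build a layer object.
    sub Dense {
        my ($class, $units, %args) = @_;
        return "Dense($units, activation=" . ($args{activation} // 'linear') . ")";
    }
}

# The shortcut: a sub that simply returns the package name.
sub nn { 'My::Gluon::NN' }

# Perl calls nn() first, then invokes Dense as a class method on its result,
# so this line is the same as My::Gluon::NN->Dense(4096, activation => 'relu').
my $layer = nn->Dense(4096, activation => 'relu');
print "$layer\n";
```

This works because Perl, when it sees a bareword before ->, first checks whether a subroutine with that name exists in the current package and, if so, calls it and uses the return value as the invocant.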

Want to look deeper? Easy. Let's stringify the net and let it tell us its structure.

use AI::MXNet qw(mx);
print mx->gluon->model_zoo->vision->alexnet;

AlexNet(
  (features): HybridSequential(
    (0): Conv2D(64, kernel_size=(11,11), stride=(4,4), padding=(2,2))
    (1): MaxPool2D(size=(3,3), stride=(2,2), padding=(0,0), ceil_mode=0)
    (2): Conv2D(192, kernel_size=(5,5), stride=(1,1), padding=(2,2))
    (3): MaxPool2D(size=(3,3), stride=(2,2), padding=(0,0), ceil_mode=0)
    (4): Conv2D(384, kernel_size=(3,3), stride=(1,1), padding=(1,1))
    (5): Conv2D(256, kernel_size=(3,3), stride=(1,1), padding=(1,1))
    (6): Conv2D(256, kernel_size=(3,3), stride=(1,1), padding=(1,1))
    (7): MaxPool2D(size=(3,3), stride=(2,2), padding=(0,0), ceil_mode=0)
    (8): Flatten
    (9): Dense(4096 -> 0, Activation(relu))
    (10): Dropout(p = 0.5)
    (11): Dense(4096 -> 0, Activation(relu))
    (12): Dropout(p = 0.5)
  )
  (output): Dense(1000 -> 0, linear)
)

Want to see how the input dimensions change layer by layer? Easy. Let's call the summary method for that.

use AI::MXNet qw(mx);
use AI::MXNet::Gluon qw(gluon);
my $net = mx->gluon->model_zoo->vision->alexnet;
## parameters must be initialized before data can flow through the net
$net->initialize;
$net->summary(nd->random->uniform(shape => [1,3,224,224]));
        Layer (type)                                Output Shape         Param #
               Input                            (1, 3, 224, 224)               0
        Activation-1                             (1, 64, 55, 55)               0
            Conv2D-2                             (1, 64, 55, 55)           23296
         MaxPool2D-3                             (1, 64, 27, 27)               0
        Activation-4                            (1, 192, 27, 27)               0
            Conv2D-5                            (1, 192, 27, 27)          307392
         MaxPool2D-6                            (1, 192, 13, 13)               0
        Activation-7                            (1, 384, 13, 13)               0
            Conv2D-8                            (1, 384, 13, 13)          663936
        Activation-9                            (1, 256, 13, 13)               0
           Conv2D-10                            (1, 256, 13, 13)          884992
       Activation-11                            (1, 256, 13, 13)               0
           Conv2D-12                            (1, 256, 13, 13)          590080
        MaxPool2D-13                              (1, 256, 6, 6)               0
          Flatten-14                                   (1, 9216)               0
       Activation-15                                   (1, 4096)               0
            Dense-16                                   (1, 4096)        37752832
          Dropout-17                                   (1, 4096)               0
       Activation-18                                   (1, 4096)               0
            Dense-19                                   (1, 4096)        16781312
          Dropout-20                                   (1, 4096)               0
            Dense-21                                   (1, 1000)         4097000
          AlexNet-22                                   (1, 1000)               0
Total params: 61100840
Trainable params: 61100840
Non-trainable params: 0
Shared params: 0
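The numbers in this table can be checked by hand. The sketch below is plain Perl with no MXNet involved; it assumes the usual formulas, output size = floor((in + 2*pad - kernel) / stride) + 1 for convolution and pooling windows, and weights plus biases for the parameter counts. The layer list is copied from the AlexNet definition above.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Output size along one spatial dimension for a convolution or pooling window.
sub out_size {
    my ($in, $kernel, $stride, $pad) = @_;
    return floor(($in + 2 * $pad - $kernel) / $stride) + 1;
}

# [type, output channels/units, kernel, stride, padding]
my @layers = (
    [conv => 64, 11, 4, 2], [pool => 0, 3, 2, 0],
    [conv => 192, 5, 1, 2], [pool => 0, 3, 2, 0],
    [conv => 384, 3, 1, 1], [conv => 256, 3, 1, 1], [conv => 256, 3, 1, 1],
    [pool => 0, 3, 2, 0],
    [dense => 4096], [dense => 4096], [dense => 1000],
);

my ($channels, $size) = (3, 224);    # input: 3 x 224 x 224
my $features;                        # set once the feature map is flattened
my $total = 0;

for my $layer (@layers) {
    my ($type, $out, $kernel, $stride, $pad) = @$layer;
    my $params = 0;
    if ($type eq 'conv') {
        $params   = $out * $channels * $kernel**2 + $out;    # weights + biases
        $size     = out_size($size, $kernel, $stride, $pad);
        $channels = $out;
    }
    elsif ($type eq 'pool') {
        $size = out_size($size, $kernel, $stride, $pad);
    }
    else {
        $features //= $channels * $size**2;                  # Flatten
        $params   = $out * $features + $out;
        $features = $out;
    }
    my $shape = defined $features ? "(1, $features)" : "(1, $channels, $size, $size)";
    printf "%-6s %-20s %10d\n", $type, $shape, $params;
    $total += $params;
}
print "Total params: $total\n";
```

The total comes out at 61100840, matching the 'Total params' line of the summary.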

We can do even better: convert the net into a static graph and print out an image of its structure.

OK, that may be cool, but I think it's a little dry. Let's add some cute pictures to the mix. You may have wondered why the title of the blog post is what it is.

Kyuubi is my dog (love him soooo much :-)), a four-year-old Pembroke Welsh Corgi.

It's easy for networks with 95% accuracy on ImageNet to identify a Corgi. I do, however, have a photo that obscures his doggy features a bit. In this photo Kyuubi is enjoying the total solar eclipse of Aug 21, 2017 in Salem, OR. Naturally he took some precautions in order to protect his eyes.

Let's see if one of the ModelZoo networks will be able to correctly classify what's in the picture. Below you can see the example that I included with the AI::MXNet::Gluon::ModelZoo module.

use strict;
use warnings;
use AI::MXNet::Gluon::ModelZoo 'get_model';
use AI::MXNet::Gluon::Utils 'download';
use Getopt::Long qw(HelpMessage);

GetOptions(
    ## my Pembroke Welsh Corgi Kyuubi, enjoying the solar eclipse of August 21, 2017
    'image=s' => \(my $image = ''),
    'model=s' => \(my $model = 'resnet152_v2'),
    'help' => sub { HelpMessage(0) },
) or HelpMessage(1);

## get a pretrained model (download parameters file if necessary)
my $net = get_model($model, pretrained => 1);

## ImageNet classes
my $fname = download('');
my @text_labels = map { chomp; s/^\S+\s+//; $_ } IO::File->new($fname)->getlines;

## get the image from the disk or net
if($image =~ /^https/)
{
    eval { require IO::Socket::SSL; };
    die "Need to have IO::Socket::SSL installed for https images" if $@;
}
$image = $image =~ /^https?/ ? download($image) : $image;

# Following the conventional way of preprocessing ImageNet data:
# Resize the short edge to 256 pixels,
# and then perform a center crop to obtain a 224-by-224 image.
# The following code uses the image processing functions provided
# in the AI::MXNet::Image module.

$image = mx->image->imread($image);
$image = mx->image->resize_short($image, $model =~ /inception/ ? 330 : 256);
($image) = mx->image->center_crop($image, [($model =~ /inception/ ? 299 : 224)x2]);

## CV that is used to read image is column major (as PDL)
$image = $image->transpose([2,0,1])->expand_dims(axis=>0);

## normalizing the image
my $rgb_mean = nd->array([0.485, 0.456, 0.406])->reshape([1,3,1,1]);
my $rgb_std = nd->array([0.229, 0.224, 0.225])->reshape([1,3,1,1]);
$image = ($image->astype('float32') / 255 - $rgb_mean) / $rgb_std;

# Now we can recognize the object in the image.
# We perform an additional softmax on the output to obtain probability scores.
# And then print the top-5 recognized objects.
my $prob = $net->($image)->softmax;
for my $idx (@{ $prob->topk(k=>5)->at(0) })
{
    my $i = $idx->asscalar;
    printf(
        "With prob = %.5f, it contains %s\n",
        $prob->at(0)->at($i)->asscalar, $text_labels[$i]
    );
}

The core of the script is quite simple. Convert the input (image) to the output (one dimensional array of 1k size for 1000 ImageNet classes). The array is essentially a probability distribution (sum of all values adds to 1) and the value at the index for a specific class is the probability that the object of that class is present in the picture.
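For readers who want to see what softmax and top-k do without any ML library, here is a small plain-Perl sketch. The scores and labels are made up for illustration and do not come from a real network; softmax and topk here are local helper subs, not the AI::MXNet methods of the same name.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Turn raw scores into probabilities that sum to 1.
sub softmax {
    my @scores = @_;
    my $max = max @scores;                      # subtract max for numerical stability
    my @exp = map { exp($_ - $max) } @scores;
    my $sum = sum @exp;
    return map { $_ / $sum } @exp;
}

# Indices of the k largest values, best first.
sub topk {
    my ($k, @values) = @_;
    my @order = sort { $values[$b] <=> $values[$a] } 0 .. $#values;
    return @order[0 .. $k - 1];
}

# Made-up scores and labels, purely for illustration.
my @labels = ('Pembroke', 'Cardigan', 'beagle', 'basset', 'tabby cat');
my @scores = (9.1, 8.3, 1.7, 1.4, -2.0);

my @prob = softmax(@scores);
printf "With prob = %.5f, it contains %s\n", $prob[$_], $labels[$_]
    for topk(3, @prob);
```

Softmax preserves the ordering of the raw scores, so the top-k indices are the same before and after it; it only rescales the values into a probability distribution.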

To convert the input into the output we just need to pretend that our network is a reference to a subroutine and feed it our input as we would to any other Perl sub. Everything else is merely busy work: read the image, convert it to an appropriate format (people never seem to agree on which dimension ordering is superior) and then print out the five largest probability values along with their text labels.
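How can an object pretend to be a subroutine reference? One way, sketched below in plain Perl, is to overload the code dereference operator '&{}'. The class My::Block and its toy forward method are hypothetical and serve only as an illustration; this is a sketch of the general technique rather than a copy of the AI::MXNet internals.

```perl
use strict;
use warnings;

package My::Block {    # hypothetical class, for illustration only
    # Dereferencing the object as code returns a closure that calls ->forward.
    use overload '&{}' => sub {
        my $self = shift;
        return sub { $self->forward(@_) };
    };

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Toy "network": scale every input value.
    sub forward {
        my ($self, @input) = @_;
        return map { $_ * $self->{scale} } @input;
    }
}

my $net = My::Block->new(scale => 2);
my @out = $net->(1, 2, 3);    # looks like calling a code reference
print "@out\n";               # prints "2 4 6"
```

Running the classification script itself prints the following: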

Downloading synset.txt from ...
Downloading kyuubi.jpg from ...
With prob = 0.69273, it contains Pembroke, Pembroke Welsh corgi
With prob = 0.30584, it contains Cardigan, Cardigan Welsh corgi
With prob = 0.00041, it contains beagle
With prob = 0.00029, it contains basset, basset hound
With prob = 0.00010, it contains Eskimo dog, husky

As we can see, the 152-layer-deep residual network (one where the input and output of a layer are summed up and fed to the next layer) had no trouble seeing through Kyuubi's shenanigans.
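The 'summed up' part of a residual network is simple enough to show in plain Perl. In this sketch the residual helper and the toy halving layer are hypothetical; a real residual block wraps convolutions, but the skip connection is the same x + f(x).

```perl
use strict;
use warnings;

# Wrap any layer so that its input is added back to its output.
sub residual {
    my ($layer) = @_;
    return sub {
        my @x  = @_;
        my @fx = $layer->(@x);
        return map { $x[$_] + $fx[$_] } 0 .. $#x;
    };
}

# Toy layer standing in for the convolutions of a real residual block.
my $halve = sub { map { $_ / 2 } @_ };

my $block = residual($halve);
my @out = $block->(2, 4, 6);    # x + f(x)
print "@out\n";                 # prints "3 6 9"
```

Because the input is always passed through unchanged alongside the layer's output, each layer only has to learn a correction to its input, which is commonly credited with making very deep networks trainable.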

But this is not all; there's one more Gluon example: artistic style transfer. You may have seen examples of this technique, and there are even mobile apps that allow you to convert your photos into 'timeless art'.

Now you can just read well-formatted Perl code and see for yourself what's under the hood. The example implements state of the art style transfer, as good as it gets, and a real-time one to boot, so not much waiting is needed. Feed it any style picture and your image/photo and prepare to be amused by the result in 10 to 15 seconds.

I'll refrain from explaining the code; the actual 'network' part is still something that I do not fully understand (working on it), and everything else is busy work with images.

I hope you'll have fun with this example and produce a lot of nice pictures with it. Below are the images produced by the example script from Kyuubi's photo and different classic paintings.

Style image: Kazimir Malevich, Black Square
Style image: random ornate stone wall image
Style image: Salvador Dali, The Enigma of Desire
Style image: Vincent van Gogh, The Starry Night

That's all that I wanted to share today, and now it's time for a little rant.

When I started porting MXNet to Perl in Dec 2016, my motivation was to give Perl's community exposure to a modern ML lib, get people to use it, and mitigate in some way the severe lack of ML tools in the public Perl sphere.

Now, it worked pretty well for me personally on many levels. Exposure to complex Python taught me a lot about that language, and porting the lib itself taught me a lot about the under-the-hood details of the ML process. Using AI::MXNet I was able to introduce ML at my day job (we are mostly a Perl shop) with considerable success.

But I am having second thoughts now about the actual value of this work to you guys. Does Perl's community actually need ML? What is it? Does everybody who needs ML just switch to Python, and that's it?

In these two years I have remained the sole contributor to the codebase (with the exception of a bit of help from a work colleague), almost no issues have been submitted, and the number of docker downloads is at about 130 or so.

When I started to learn Perl in 1998 I wrote a module called Net::RawIP to help myself get a hold of the language. That module is an absolute disaster of crappy code, like really horrible. But in the days after releasing it I got dozens of emails and patches; people contacted me and helped me to port it to Solaris, *BSD, etc. The work felt needed.

With AI::MXNet the feeling is the opposite. It looks like I am writing the module for myself and it's not really needed or asked for. Hopefully not. I am looking for more contributors to the Perl part of the MXNet codebase; there's a lot of work and I could use some help.

Thank you for reading this far.


All good stuff. I did a quick audit of the Perl AI modules and they all seemed pretty dead. Is MxNet being supported well? Is there much need for it to be?

Also, I recently used Net::RawIP for a quick prototype :)

Hello Sergey,
Excellent work as always! You may recall we corresponded a while back. I continue to use AI::MXNet and both myself and a colleague continue to use MXNet, in general, for work. This is a work project for which the upside for our employer is potentially many many millions per year. So, be assured that your work is not going unused! Other than a small thing or two I contacted you about I just haven't had any problems, things work very well! Perhaps I am not fully using the whole API enough to find bugs? Or perhaps you did a rock solid job in development and testing!

To the broader problem of getting more AI work in Perl I made some small effort toward this by showing off some AI::MXNet project work at the meeting last November. It was one of the best attended Boston Perl Monger meetings in some time so there is definitely interest! I'd think that a similar talk to an even bigger audience at this years Perl Conference would have been nice and given more people a head start at doing AI in Perl. I half considered doing so myself but personal commitments prevented me from pursuing that this time around. Maybe next year? Have you considered presenting at the Perl Conference?

Perl programmers still tend to be very best programmers out there, with a solid "hacker mentality" and willingness to roll their sleeves up and go deep into a problem and helpful culture. Sometimes though, a lot of great things happen outside the lens of social media and so there is a greater appearance of disinterest than is actually the case.

>It was one of the best attended Boston Perl Monger meetings in some time...

I can confirm that. (I'm a co-organizer of

ML with Perl is definitely a popular topic, and seemed to be one that got people interested in taking another look at the language.

Thanks to Adam for giving the talk, and keep up the good work Sergey.

I am in the process of learning ML and was reluctantly learning python even though we are big perl users. I have just come across this blog and its really made my day. Going to give this library a go!

Thank you for your selfless work


You did wonderful work, Sergey!!
I cannot do stats for you. But at least I am one (French) human enjoying your work. Really excellent.

I followed your tutorial, everything compiles, works super easy. Time for me to discover the API, make some slides, graphs, images ...

Currently, I am not trained enough to make pull requests on GitHub. But I'll try to correct some errors when they appear.

This is contributing to AI diversity. I'd like to imitate you (softly), making a GEcode (constraint programming) wrapper. But I am scared of the overabundance of C++ templates (macro) in all those AI libs.

Thank you soooo much and congrats for making all that working as a package on your own.


About Sergey Kolychev

I blog about Perl.