June 2013 Archives

Flux: new streaming data processing framework

Flux is the framework I've been meaning to release for a very long time [1].

What's it good for? Message queues; organizing your data processing scripts in a scalable way; decoupling the elements of your processing pipeline, making them reusable and testable; seeing your system as a collection of Lego-like blocks which can be combined and replaced as you like. With Flux, your code is a series of tubes.

Flux is a rewrite of the Stream framework, which we wrote and used at Yandex for many years. The Stream:: namespace on CPAN is taken, though, which gave me a reason to do a cleanup before uploading it, as well as a chance to rewrite everything with Moo/Moose.

I'm planning to release Flux in small chunks, explaining them along the way in separate blog posts as time allows. Today I'll explain the main ideas behind it, some core classes, and how all its parts work together.

Don't copy "use autodie" in every module

Every use autodie; line adds a roughly constant cost to your app's startup time.

Here's a quick benchmark:

$ time perl -E 'say "package X$_; use autodie qw(:all);" for 1..100;' | perl

real	0m1.482s
user	0m1.431s
sys	0m0.047s

Compare with Moose:

$ time perl -E 'say "package X$_; use Moose;" for 1..100;' | perl

real	0m0.343s
user	0m0.328s
sys	0m0.016s

It doesn't get much better without qw(:all):

$ time perl -E 'say "package X$_; use autodie;" for 1..100;' | perl

real	0m1.212s
user	0m1.169s
sys	0m0.047s

But it gets significantly better if you import only a small number of functions:

$ time perl -E 'say "package X$_; use autodie qw(open close);" for 1..100;' | perl

real	0m0.175s
user	0m0.166s
sys	0m0.011s

Basically, you pay for every function you import, and you pay again in each module that imports it. That's different from, for example, Moose, where 99% of the import cost is paid on first use, when perl compiles all the code, and each subsequent import() is almost free. Because of this, if your app has many modules, autodie can easily become the biggest bottleneck in its startup time.
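If you want to check how the cost scales on your own machine, a small harness along these lines might help. It's only a sketch: time_imports is a hypothetical helper name, the use strict; row is there purely as a cheap baseline, and it assumes a perl where autodie, File::Temp and Time::HiRes are available. Each measurement includes the startup time of the child perl, so compare the rows with each other rather than reading the absolute numbers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical helper: write $count packages, each containing $import_line,
# to a temporary script, run it in a child perl and return the elapsed
# wall-clock time in seconds.
sub time_imports {
    my ($import_line, $count) = @_;

    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} "package X$_; $import_line\n" for 1 .. $count;
    close($fh) or die "close $path: $!";

    my $start = time;
    system($^X, $path) == 0 or die "child perl failed: $?";
    return time - $start;
}

# Compare a cheap baseline with two autodie import styles.
for my $import_line ('use strict;', 'use autodie;', 'use autodie qw(open close);') {
    printf "%-30s %.3fs\n", $import_line, time_imports($import_line, 100);
}
```

Changing the second argument (10, 100, 1000) shows whether the time grows with the number of packages, which is the per-module cost described above.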

So it's a bad idea to thoughtlessly add use autodie qw(:all) to your boilerplate next to use strict; use warnings; use 5.0xx;. If you do need autodie, it might be a good idea to explicitly list only the functions you want to replace.
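As a minimal sketch of what an explicit list can look like in practice: the module below only calls open and close, so those are the only builtins it asks autodie to wrap. My::LineReader and first_line are hypothetical names made up for this illustration.

```perl
package My::LineReader;    # hypothetical module name
use strict;
use warnings;
use autodie qw(open close);    # wrap only the builtins this module calls

# Return the first line of a file, without the trailing newline.
sub first_line {
    my ($path) = @_;
    open(my $fh, '<', $path);    # autodie makes this die on failure
    my $line = <$fh>;
    close($fh);
    chomp $line if defined $line;
    return $line;
}

package main;

# A failed open is reported as an exception object instead of a false return.
my $line = eval { My::LineReader::first_line('/no/such/file') };
print "open failed with: ", ref($@), "\n" unless defined $line;
```

Other builtins such as unlink or rename keep their normal return-value behaviour in this module, so add them to the list only when the module actually starts using them.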

PS: I don't know why it should be that way. I know autodie 2.18 does more caching and is significantly faster than previous versions, but it still doesn't cache much, apparently.

PPS: This post was brought to you by questhub, as usual :)

About Vyacheslav Matyukhin

I wrote Ubic. I worked at Yandex for many years, and now I'm building my own startup, questhub.io (formerly PlayPerl). I'm also working on Flux, a streaming data processing framework. CPAN ID: MMCLERIC.