Writing Non-Blocking Applications with Mojolicious: Part 3

This is the third part in an ongoing series about using Mojolicious to write non-blocking applications (with an eye towards the web, obviously). In part 1 I demonstrated how it can improve the number of requests/clients served when the application uses high-latency backends (in that case a database). In part 2 I showed how each request can be sped up when that request needs multiple resources from a high-latency service (e.g. external web services).

In each, I showed a blocking example, then a non-blocking example. I then gave the usual warning that you had to use a Mojolicious server for the non-blocking version. While it's true that you need a Mojolicious server to get the benefits of the non-blocking architecture, in this post I will show how, with a little care in construction, you can build your application so that it will run correctly on any supported server and the non-blocking benefits will be evident where possible.

For this example, I was inspired by a recent blog post by Charlie Harvey. In it he describes a web service he created, using Perl and CGI, which turns your Twitter stream into an Atom feed. His service got more attention than expected, so the post continues by describing how he rewrote the service using FCGI to improve throughput (amongst other optimizations).

Naturally my thoughts went to Mojolicious. The non-blocking user agent could easily be leveraged to get the data, and the non-blocking server would support a much higher throughput, as my previous examples demonstrate. So, yesterday I sat down and ported the code to Mojolicious. This made for a good real-world example, so I decided to make it part of the series.

Blocking

As usual when porting a piece of code to Mojo, I first ported it more or less directly, in a standard blocking style. I moved some of the logic to helpers, translated the XPath to CSS selectors, and externalized the XML template, but otherwise it is practically a straight port.

The important part of the code is this:

my $tx = $self->ua->get($url);
unless ($tx->success) { 
  return $self->render_exception(scalar $tx->error);
}

my $tweets = $tx->res->dom('li.js-stream-item .tweet .content');
my $items  = $tweets->map(sub{$self->parse_tweet($_)});  

$self->res->headers->cache_control("max-age=$max_age");
$self->render( 'atom', format => 'rss', user => $user, items => $items );

The logic is easy enough to follow:

  1. request the stream url
  2. find all the tweets
  3. parse them into data structures (@$items) using a helper
  4. render the xml from the template.

Non-Blocking

From there it is simple enough to arrange for the rendering to be done in the callback for the user-agent request, like so:

$self->render_later;
my $tx = $self->ua->get($url, sub {
  my ($ua, $tx) = @_;
  unless ($tx->success) { 
    return $self->render_exception(scalar $tx->error);
  }

  my $tweets = $tx->res->dom('li.js-stream-item .tweet .content');
  my $items  = $tweets->map(sub{$self->parse_tweet($_)});  

  $self->res->headers->cache_control("max-age=$max_age");
  $self->render( 'atom', format => 'rss', user => $user, items => $items );
});

As I have shown before, it is important to call render_later to prevent automatic rendering. From there it is easy to see that most of the logic is simply moved into the callback.

So far, so good, now what?

When I pushed these scripts to my repo fork on GitHub, the biggest problem I had was describing to the user, presumably one who did not have experience with Mojolicious, how to run the different examples. The non-blocking script requires the Mojolicious servers, while the blocking script can run under others as well. Once I had it written I saw that the biggest issue here was deployment. But it doesn’t have to be.

The portability problem

To restate the underlying problem: a non-blocking script, if run without an event loop, will simply start the first request and then never call the callback. If you know that the script will be run on a server without an appropriate event loop, the same code will work if you simply start such a loop after the request is made, then stop it again when done.
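As a minimal sketch of that "start a loop, then stop it when done" idea, assuming no event loop is running yet: a short Mojo::IOLoop timer stands in for the non-blocking request (an assumption made purely so the example needs no network access), and the variable name $result is hypothetical.

```perl
use strict;
use warnings;
use Mojo::IOLoop;

my $result;

# A short timer stands in for the non-blocking request (an assumption
# made here so the sketch runs without any network access).
Mojo::IOLoop->timer(0.1 => sub {
  $result = 'response received';

  # The work is done: stop the loop so that start() below returns.
  Mojo::IOLoop->stop;
});

# Without this call nothing drives the timer and the callback never fires.
Mojo::IOLoop->start;

print "$result\n";
```

This version starts and stops the loop unconditionally, so it is only suitable when the script really is the one in charge of the loop.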

While this is indeed how we will solve the portability issue, it presents another problem. When this code is run on a server that already has an event loop, you might get a problem when telling the loop to start (again), and you certainly will run into problems when you then tell it to stop, because the loop you are stopping is the one the server itself depends on.

We need some way to tell the code to start a loop if necessary, and if it was necessary, then stop it again when done. Challenge accepted!

Using Mojo::IOLoop::Delay to provide portability

In fact Mojo::IOLoop::Delay (seen previously in part 2 to manage parallel requests) has a mechanism to do just what we need, as seen here:

$self->render_later;
my $delay = Mojo::IOLoop->delay(
  sub { 
    my $delay = shift;
    $self->ua->get($url, $delay->begin);
  },
  sub {
    my ($delay, $tx) = @_;
    return $self->render_exception(scalar $tx->error)
      unless $tx->success;

    my $tweets = $tx->res->dom('li.js-stream-item .tweet .content');
    my $items  = $tweets->map(sub{$self->parse_tweet($_)});  

    $self->res->headers->cache_control("max-age=$max_age");
    $self->render( 'atom', format => 'rss', user => $user, items => $items );
  }
);
$delay->wait unless $delay->ioloop->is_running;

The delay constructor, if passed a list of subrefs, will use these as so-called “steps”, each of which will be performed sequentially. Any number of parallel non-blocking tasks may be started (using $delay->begin) and only when all are finished will the next step be started. The step itself will not block the main loop (assuming you don’t block in the step).

Interestingly, as described in part 2, the call to the begin method returns a special callback, one which will take the arguments passed to it and pass them forward to the next step. In this way the second step gets the result of the get request, namely the transaction object $tx. From there the rendering proceeds as before.
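To make the step and begin behaviour concrete, here is a small sketch that uses timers in place of real requests. It assumes a Mojolicious release that provides Mojo::IOLoop->delay, as the code in this post does; the task names, the timings and the @log array are made up for illustration.

```perl
use strict;
use warnings;
use Mojo::IOLoop;

my @log;

my $delay = Mojo::IOLoop->delay(
  sub {
    my $delay = shift;

    # Start two parallel tasks; each begin() call hands back a callback
    # that marks its task as finished when invoked.
    for my $task ([slow => 0.2], [fast => 0.1]) {
      my ($name, $seconds) = @$task;
      my $end = $delay->begin;
      Mojo::IOLoop->timer($seconds => sub {
        my $loop = shift;
        push @log, "$name finished";

        # The first argument is dropped (like $ua in the request above);
        # the remaining arguments are passed on to the next step.
        $end->($loop, $name);
      });
    }
  },
  sub {
    # Runs only after both callbacks from the first step have been called.
    my ($delay, @names) = @_;
    push @log, "next step got: @names";
  },
);
$delay->wait unless $delay->ioloop->is_running;

print "$_\n" for @log;
```

The second step does not run until both timers have fired, and it receives the forwarded values in the order the begin calls were made, not the order in which the tasks finished.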

The secret sauce is the last line,

$delay->wait unless $delay->ioloop->is_running;

(Remember that the steps are now queued up to be run using the event loop, but this line occurs before those steps begin.)

The conditional checks whether the loop is already running (the case when using a Mojolicious server); if it is, nothing special is done. If the loop is not running, we call $delay->wait. Under the hood the wait method does two things:

  1. it attaches a finish event handler to the delay which will stop the loop
  2. it starts the loop

This is exactly the kind of logic we needed. When the steps are all completed, and thus the page is rendered, we will only stop the loop if we had to start it.

One more note about using steps

While I didn’t really use the power of steps to do multiple sequential non-blocking requests, and I hope to in a future post, you can get an idea of the process already. If more subrefs were used to create the steps, those would all be run, in order, when ready, before completing. Maybe you want to fetch data from several web services, then process the data (map-reduce style), then insert it into a database, then render a confirmation message. You want to be sure to get and process all of the data before inserting it into the database, and you want to be sure it is inserted before rendering. Using the steps concept, doing these sequential non-blocking steps is easier than ever! But like I said, more about that in a future example.

In conclusion

Admittedly I haven’t done any benchmarks on Charlie’s code vs my non-blocking version. This is due mostly to the fact that I don’t want to set up a CGI server (ever again!). I’m going to trust the metrics that I have laid out before on the benefits of non-blocking vs blocking and call it a day. What I am happy about is that Charlie can still deploy under CGI (with a couple of path tweaks since I moved some files), or via PSGI/Plack, or via a Mojolicious non-blocking server, all using the same portable application script.

This post is the third in an on-going series, beginning with part 1

3 Comments

Hey Joel,

Thanks for a great post, I’m going to go back and read the rest of the series too.

I’ll have a crack at benchmarking the mojo version, I’m especially interested in what the memory usage is like when there are lots of requests coming in.

Charlie

This way you’ll have to add extra code to “provide portability” in each controller#action. There is a better way - add global hook for cgi mode in your app’s startup script: https://gist.github.com/powerman/5456484

About Joel Berger

As I delve into the deeper Perl magic I like to share what I can.