Writing Non-Blocking Applications with Mojolicious: Part 2
Last time, I showed you how to write non-blocking (web) applications using Mojolicious. In those examples, each route handler only had to perform a single non-blocking call. In this article I’m going to take things a little further, introducing you to Mojo::IOLoop::Delay. I will use a delay object to perform multiple simultaneous non-blocking actions and wait until they all complete before moving on, all without blocking the server for other requests.
The Application
The application I will be demonstrating can be seen in its entirety here, though I’m going to build it up slowly over the course of this article. The purpose of the application will be to scrape the titles from a list of websites. A modest task to be sure, but one that can be illustrative of much more complex tasks.
#!/usr/bin/env perl
use Mojolicious::Lite;
app->ua->max_redirects(10);
my @urls = qw/mojolicio.us mojocasts.com/;
helper 'render_dumper' => sub {
my $self = shift;
$self->render( text => $self->dumper( \@_ ) );
};
# The routes will go here!
app->start;
Of course we need to import Mojolicious::Lite, which imports strict, warnings, utf8, and all of the v5.10 features.
Every application has a default Mojo::UserAgent stashed away in the ua helper. We will use it later, so in the line above I tell it to allow up to 10 redirects per request. Then I define the list of @urls that we will be scraping. In the final example I will encourage you to add to that list, but for now, we will start with these two.
More interesting is the rendering helper I build next. This application doesn’t need a pretty web interface, so I am just going to have it render all of its arguments as dumped via the built-in Data::Dumper helper. (If you would prefer, you may replace the calls to render_dumper with $self->render( json => [...] ) to render the items with the built-in Mojo::JSON renderer just as easily.)
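For instance, a hypothetical JSON-rendering variant of the helper might look like the sketch below (the helper name is just an illustration and is not part of the app):

helper 'render_json_list' => sub {
  my $self = shift;
  # render the argument list with the built-in Mojo::JSON renderer
  $self->render( json => [ @_ ] );
};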
We will see the routes later, so I’ll skip those. Then finally we start the app. We won’t need any templates for this example.
If you save the file as titles.pl, you can then run it as
perl titles.pl get [page]
You can also start it as a persistent app using the usual
perl titles.pl daemon
or while developing you might want
morbo titles.pl
which will automatically reload the app if you change the script.
One Non-Blocking Request
Much as I showed in the previous post, we can make a simple route which makes a request of a high-latency service, here another website. Since we don’t want to block the server while we wait for the response, we employ the non-blocking style that Mojolicious makes available.
any '/one' => sub {
my $self = shift;
$self->render_later;
$self->ua->get($urls[0] => sub {
my ($ua, $tx) = @_;
$self->render_dumper($tx->res->dom->at('title')->text);
});
};
Here we can see that we use the user agent to request an external web resource.
We pass a callback to get
so that the server will move on to handle other requests for other clients while it waits on this response.
Once the response comes back from that server, the callback is invoked and the page is rendered and served to the client.
The careful observer will see that I employ Mojo::DOM to parse the response and extract the title text.
If I had wanted to do something even more clever, I could use the CSS3 selectors provided by Mojo::DOM::CSS, and its handy integration with the blessed arrayref container class Mojo::Collection to really dig through and transform the results. Frequent users of jQuery will feel at home using these classes. For now however, extracting titles will suffice.
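As a quick, standalone taste of that style (separate from the app, and using a blocking one-off request just to show the DOM API; the URL and selectors are only placeholders), pulling the text of every heading from a page might look something like this:

use Mojo::Base -strict;
use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new(max_redirects => 10);
my $dom = $ua->get('http://mojolicio.us')->res->dom;

# find() returns a Mojo::Collection of matching elements,
# which can be mapped and joined much like a jQuery chain
say $dom->find('h1, h2, h3')->map(sub { $_->text })->join("\n");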
Try it out by running
perl titles.pl get /one
Two Non-Blocking Requests
One request was simple enough, but what would two such requests look like?
any '/two' => sub {
my $self = shift;
$self->render_later;
my @titles;
$self->ua->get($urls[0] => sub {
my ($ua, $tx) = @_;
push @titles, $tx->res->dom->at('title')->text;
$self->ua->get($urls[1] => sub {
my ($ua, $tx) = @_;
push @titles, $tx->res->dom->at('title')->text;
$self->render_dumper(@titles);
});
});
};
Yikes! I mean it runs, and it works, but … ick.
Three problems are immediately evident when I look at this code.
First, it’s starting to suffer from arrow code or callback hell problems.
Second, it doesn’t easily generalize from 2 items to n items.
Sure, you could write some recursive callback which might “help”, but that still doesn’t fix the biggest problem.
Although this code doesn’t block the server, within the scope of rendering this one response, each request does block the next! This is easy to see: a callback is only invoked once its response has been received, and since the next request is started inside the callback of the previous one, you cannot kick off all the requests simultaneously and then collect the results as they each finish.
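To make that concrete, here is a hedged sketch of the obvious n-item generalization using a recursive callback (the route name '/serial' is just for illustration). It cleans up the nesting, but each request still starts only after the previous one has finished:

any '/serial' => sub {
  my $self = shift;
  $self->render_later;

  my @queue  = @urls;
  my @titles;
  my $next; $next = sub {
    my $url = shift @queue;
    # out of URLs: render everything we have collected so far
    return $self->render_dumper(@titles) unless defined $url;
    $self->ua->get($url => sub {
      my ($ua, $tx) = @_;
      push @titles, $tx->res->dom->at('title')->text;
      $next->();   # the next request begins only once this one has finished
    });
  };
  $next->();
};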
But this would be a bad tutorial if I left it there :-)
Using Mojo::IOLoop::Delay to Manage Non-Blocking Flow
As I hinted in the introduction, the solution is contained in the remarkably useful module Mojo::IOLoop::Delay. An instance of this class can be used to manage the flow of both parallel and sequential non-blocking steps, and then take some action on completion. When written using a delay object, the above example looks like this:
any '/all' => sub {
my $self = shift;
$self->render_later;
my $delay = Mojo::IOLoop->delay;
$delay->on(finish => sub{
my $delay = shift;
my @titles = map { $_->res->dom->at('title')->text } @_;
$self->render_dumper(@titles);
});
$self->ua->get( $_ => $delay->begin ) for @urls;
};
At first glance, the flow of this route handler is very different from the previous one, so I will walk you through it.
As with all of the non-blocking handlers, we first call render_later to prevent automatic rendering. We then create a delay object and attach a callback to be used as the “on finish” handler. That callback receives the delay object itself as its first argument, possibly followed by more arguments; we’ll talk more about those in a minute. As you can see, @_ contains the transaction objects from before (the $tx objects), and we then extract the titles and render the result. But how did they get there? Remember, this is the “on finish” handler; before we worry about how those arguments work, let’s look at the meat of the process.
The bulk of the magic happens with the begin method on the delay object. This method does two things. First, it increments an internal counter, which keeps track of the number of parallel tasks to wait on before invoking the finish handler (or moving on to the next step, shoot, I shouldn’t mention that yet :-) ). Second, it returns a callback suitable to be passed to some non-blocking call. (Of course you may wrap that callback in your own subroutine reference if you want to do some other processing; I sketch a taste of that below, but the full story is for another post.)
That generated callback itself does two things when invoked. First, it decrements that internal counter. Second, it stores the arguments passed to it (though by default it strips the first argument, since you usually don’t need the callback’s invocant). When the finish handler (or the next step) is called, what it gets is the list of these stored arguments: all the arguments passed to the first begin callback, then all the arguments passed to the second, and so on, no matter in which order the processes actually complete.
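To see that ordering in isolation, here is a minimal standalone sketch using timers in place of web requests. It assumes a Mojolicious version that ships Mojo::IOLoop::Delay and supports begin(0) to keep all of a callback’s arguments:

use Mojo::Base -strict;
use Mojo::IOLoop;

my $delay = Mojo::IOLoop->delay;
$delay->on(finish => sub {
  my ($delay, @args) = @_;
  say "@args";   # prints "first second", the order of the begin calls
});

# begin(0) keeps every argument passed to the generated callback
my $cb1 = $delay->begin(0);
my $cb2 = $delay->begin(0);

Mojo::IOLoop->timer(2 => sub { $cb1->('first')  });
Mojo::IOLoop->timer(1 => sub { $cb2->('second') });

$delay->wait;   # run the event loop until the finish event has been emitted

Even though the second timer fires before the first, the finish handler sees the results in the order the begin callbacks were created.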
Now that you know this, you can see how the finish handler works. When we kick off the ua->get calls, we know that the resulting transactions will be passed to the finish handler via the begin callbacks. In fact, now that you look at it, you can see that it doesn’t matter how many @urls you have; this will work exactly as expected for any number of parallel calls.
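As promised above, here is a small taste of wrapping the begin callback in your own subroutine (the full treatment can wait for another post). One hedged way it could look in this app is to extract the title inside your own callback and pass only that along; begin(0) is used so that the arguments we pass ourselves are kept rather than stripped:

for my $url (@urls) {
  my $end = $delay->begin(0);   # keep all arguments passed to $end
  $self->ua->get($url => sub {
    my ($ua, $tx) = @_;
    # per-request processing happens here, before the delay collects results
    $end->($tx->res->dom->at('title')->text);
  });
}

With that change, the finish handler would receive the titles themselves rather than the transaction objects.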
Try adding a few more sites and run:
perl titles.pl get /all
to see the results.
The Proof in the Pudding
As I did last time, I do want to show a comparison of some metrics. I hope I have convinced you that writing non-blocking applications will help serve many clients, so this metric is only about the speed of serving one client who needs these two high-latency data sources to respond.
When I run
time perl titles.pl get /two
I see the expected results, and the amount of time taken
real 0m1.121s
user 0m0.222s
sys 0m0.012s
Versus running
time perl titles.pl get /all
which requests the same information, but only takes
real 0m0.670s
user 0m0.202s
sys 0m0.019s
The explanation for this time difference is easy to understand. From the data it appears that each website takes about 0.5-0.6s to respond. If the server makes these requests sequentially, the times add up: roughly 0.55s + 0.55s, or just over a second, which matches the /two timing. By performing the requests in parallel, the time it takes to respond to the client is limited only by the slowest response, roughly 0.55s plus a little overhead, which matches the /all timing.
If you can now serve more clients simultaneously by writing the application in a non-blocking style and their requests are each coming back faster because you manage the high-latency requests that your response depends on, you are making real progress in my book :-)
Conclusion
The example I presented revolved around web scraping, but in reality this technique will work for any high-latency dependencies in your rendering chain. This might include database reads and writes, file system reads and writes, or other long-running processes, though you need to have a non-blocking mechanism for making these requests.
In the example from the last post I used Mango to do non-blocking database interaction. In this one I used Mojolicious’ built-in non-blocking user agent to make web requests. In a future post in this series I will present ideas for how to handle blocking processes within a non-blocking framework.
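To make that concrete, here is a rough, hedged sketch of an extra route that could be bolted onto titles.pl, mixing a Mango query and a user agent request under one delay. It assumes Mango is installed (and loaded with use Mango;) and a MongoDB server is running locally; the connection string, database, collection names, and route name are placeholders, and the callback signature is the one documented by Mango.

helper mango => sub { state $mango = Mango->new('mongodb://localhost:27017') };

any '/mixed' => sub {
  my $self = shift;
  $self->render_later;

  my $delay = Mojo::IOLoop->delay;
  $delay->on(finish => sub {
    my ($delay, $docs, $tx) = @_;
    # arguments arrive in begin order: the documents first, then the transaction
    $self->render_dumper($docs, $tx->res->dom->at('title')->text);
  });

  # non-blocking database query; Mango passes ($cursor, $err, $docs) to the callback
  my $db_end = $delay->begin(0);
  $self->mango->db('test')->collection('sites')->find({})->all(sub {
    my ($cursor, $err, $docs) = @_;
    $db_end->($docs);
  });

  # non-blocking web request, exactly as in the '/all' route
  $self->ua->get($urls[0] => $delay->begin);
};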
I hope I have shown how you can use delays to make parallel non-blocking calls. For now, I leave it as an exercise for the reader to use multiple steps to perform sequential non-blocking calls, though I plan that as the topic of a future post as well.
In the meantime, happy Perling!
P.S. YAPC::Brazil
If you happen to be in Brazil in a few weeks for YAPC::Brazil, I will be speaking about some of the design patterns I have used in my scientific modeling. However, I will also be happy to talk with anyone about Mojolicious or any of the other projects that I am involved in. Hope to see you there!
This post is the second in an ongoing series, beginning with part 1 and continued in part 3.
Comments
Good job, man!
I would be very glad to see an example of using DBI in a non-blocking application (DBD::mysql, SQLite, or Pg), or any other relational database, with or without an ORM.
It would also be interesting to see how easily a blocking Mojolicious (non-Lite) application can be turned into a non-blocking one.
For example, is it feasible to leverage the “around_action” hook and wrap all of your otherwise blocking actions in a delay object, instantly turning your blocking application into a non-blocking one?
Thank you for sharing the knowledge!!!!
The closest thing I know of is sri’s Coro hack, which lets you write your code in a style almost like blocking: https://gist.github.com/kraih/6082061 This is still not purely automatic, though.
As for non-blocking DBI examples/apps, remember that you must be using a non-blocking database driver, and few of these exist. Mango works for MongoDB and Mojo::Redis works too. I’m not sure what the state of non-blocking DBI is.