Poor Man's Job Queue for Catalyst Apps

By davewood on September 17, 2013 10:15 AM

Handling long-running or heavy tasks inside a request is something you should avoid.

It ties up processes that would otherwise be available for other requests.
The browser request may time out.
...

Instead of using one of the job queue implementations available for Perl

Resque
ActiveMQ
ZeroMQ
Gearman
TheSchwartz
...

I decided to reuse/abuse my database.

Add a 'job' table


CREATE TABLE "job" (
  "id" serial NOT NULL,
  "test_id" integer NOT NULL,
  "type" character varying,
  "status" character varying DEFAULT 'pending' NOT NULL,
  "created" timestamp NOT NULL,
  "data" character varying,
  PRIMARY KEY ("id")
);

Schedule a job

In our Catalyst Application users create and run tests. Running a test can take minutes to complete. So when the user clicks on the "run" button only a single insert is executed and the request returns.


INSERT INTO "job" ( "created", "test_id", "type")
VALUES ( '2013-09-17 09:53:54+0000', '220', 'test_foo');
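
From Perl, the same insert could go through DBIx::Class. This is only a minimal sketch: schedule_job is a hypothetical helper name, and it assumes a connected schema with a 'Job' result source whose columns match the table above. The "status" column is left to its database default of 'pending'.

```perl
use strict;
use warnings;
use POSIX qw/strftime/;

# Hypothetical helper: queue a job for a test and return the new row.
# $schema is assumed to be a connected DBIx::Class schema with a 'Job' source.
sub schedule_job {
    my ( $schema, $test_id, $type ) = @_;

    return $schema->resultset('Job')->create(
        {
            test_id => $test_id,
            type    => $type,

            # "created" is NOT NULL and has no default, so set it here (UTC).
            created => strftime( '%Y-%m-%d %H:%M:%S+0000', gmtime ),
        }
    );
}
```

Inside a Catalyst action this might be called as schedule_job( $c->model('DB')->schema, $test->id, 'test_foo' ), assuming the model is named 'DB'.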

Users can see job status

The next thing the user sees is a list of jobs and their status, which can be "pending", "in_progress" or "finished".

Job Daemon

A daemon process takes care of processing the jobs. I am using the excellent module Daemon::Control.

script/myapp_job_daemon.initd


#!/usr/bin/env perl

use warnings;
use strict;
use Daemon::Control;
use FindBin qw/$Bin/;

# 1) configure user, group and perl path
my $user  = 'www-data';
my $group = 'www-data';
my $perl  = '/usr/bin/perl';

my $root     = $Bin;
my $program  = "$perl $root/myapp_job_daemon.pl";
my $name     = 'MyAppJobDaemon';
my $pid_file = $root . '/myapp_job_daemon.pid';

Daemon::Control->new(
    {
        name      => $name,
        lsb_start => '$all',
        lsb_stop  => '$all',
        lsb_sdesc => $name,
        lsb_desc  => $name,
        path      => $root . '/myapp_job_daemon.initd',

        user      => $user,
        group     => $group,
        directory => $root,
        program   => $program,

        pid_file    => $pid_file,
        stderr_file => $root . '/myapp_job_daemon.out',
        stdout_file => $root . '/myapp_job_daemon.out',

        fork => 2,    # Default: 2
    }
)->run;

Create the init.d file: ./script/myapp_job_daemon.initd get_init_file > foo
Copy it to /etc/init.d: cp foo /etc/init.d/myapp_job_daemon
Install to runlevels: update-rc.d myapp_job_daemon defaults

script/myapp_job_daemon.pl

The code for job processing:


#!/usr/bin/env perl

use strict;
use warnings;
use 5.010;
use FindBin qw/$Bin/;
use lib "$Bin/../lib";
use MyApp::Util;

my $verbose  = 1;
my $interval = 5;
my $schema   = MyApp::Util::get_schema();
run();

sub run {
    while (1) {
        my $job = check_queue();
        if ( $job ) {
            process_job( $job );
        }
        else {
            sleep($interval);
        }
    }
}

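sub check_queue {
    # Uses the file-level $schema; run() calls this without arguments,
    # so shifting a schema off @_ here would yield undef.
    return $schema->resultset('Job')->search( { status => 'pending' } )->first;
}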

sub process_job {
    my $job = shift;

    say "Processing job: " . $job->id
        if $verbose;

    my $test = $job->test;
    die "no test found.\n" unless $test;

    say "Processing test: " . $test->name . '(' . $test->id . ')'
        if $verbose;

    my $test_env = MyApp::Util::get_test_env( $schema );

    $job->update( { status => 'in_progress' } );

    my @results = $test->run( $test_env );

    $job->update(
        {
            data   => \@results,
            status => 'finished',
        }
    );

    say "Finished processing " . @results . " steps."
        if $verbose;
}
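
One thing to watch in the daemon code: a job is fetched while it is still 'pending' and only marked 'in_progress' later, so two daemons polling at the same moment could pick up the same row. Below is a minimal sketch of claiming a job with a conditional UPDATE instead. claim_next_job is a hypothetical name, and the code assumes the DBIx::Class 'Job' source used above.

```perl
use strict;
use warnings;

# Try to claim the oldest pending job. Returns the job row on success,
# or undef if the queue is empty or another worker claimed the row first.
sub claim_next_job {
    my ($schema) = @_;
    my $jobs = $schema->resultset('Job');

    my $job = $jobs->search(
        { status   => 'pending' },
        { order_by => 'id', rows => 1 },
    )->single or return;

    # Conditional UPDATE: it only matches while the row is still 'pending'.
    # ResultSet->update returns the affected row count ('0E0' for none).
    my $claimed = $jobs->search( { id => $job->id, status => 'pending' } )
        ->update( { status => 'in_progress' } );

    return unless $claimed > 0;

    $job->discard_changes;    # reload so $job->status reflects the update
    return $job;
}
```

With this in place, run() would call claim_next_job($schema) instead of check_queue(), and process_job would no longer need its own update to 'in_progress'. A worker that loses the race simply gets undef and sleeps until the next poll.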

Disadvantages

client polls results every n seconds
daemon polls database for pending jobs (could use DB triggers)
doesn't scale so well

But it works and I can come up with a better solution if required.
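
The "could use DB triggers" idea might look roughly like the following with PostgreSQL's LISTEN/NOTIFY. This is a sketch under assumptions: the database is PostgreSQL (the serial column suggests it), the driver is DBD::Pg, and the names job_added, notify_job_added and wait_for_job_notification are made up for illustration. The daemon still loops, but it checks its own connection for notifications instead of querying the job table.

```perl
use strict;
use warnings;
use 5.010;

# One-time setup, run once against the database (hypothetical names):
my $TRIGGER_SQL = <<'SQL';
CREATE FUNCTION notify_job_added() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('job_added', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER job_added AFTER INSERT ON job
    FOR EACH ROW EXECUTE PROCEDURE notify_job_added();
SQL

# Wait up to $timeout seconds for a NOTIFY on a channel that $dbh is
# LISTENing on. Returns [ $channel, $pid, $payload ] or undef on timeout.
sub wait_for_job_notification {
    my ( $dbh, $timeout, $poll ) = @_;
    $poll //= 1;
    my $deadline = time() + $timeout;

    while (1) {
        my $notify = $dbh->pg_notifies;    # DBD::Pg: undef if nothing queued
        return $notify if $notify;
        return if time() >= $deadline;
        sleep $poll;
    }
}
```

The daemon would issue $dbh->do('LISTEN job_added') once after connecting (with DBIx::Class the handle is available as $schema->storage->dbh) and then call wait_for_job_notification( $dbh, $interval ) in place of sleep($interval). As far as I know, with AutoCommit off the LISTEN only takes effect after a commit.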

Tagged as:

Catalyst, DBIx::Class, Heavy long running task, Job Queue

8 Comments

Jerome Eteve | September 17, 2013 2:22 PM

As far as I know, gearman by itself will manage your three needed states "pending", "in_progress" or "finished" just by using the handler it gives you back when you launch a job asynchronously.

If you need extra info (like started date, data , result, error maybe? ..) it makes sense to store it somewhere (in a table).

But it seems that here you're mixing storing extra job attributes (which is perfectly legitimate) and managing the job dispatching itself (which the mentioned systems are designed to do).

From your code, it also looks like having more than one daemon will put you at risk of processing the same jobs more than once. Could be quite embarrassing/damaging.

rob.kinyon | September 17, 2013 2:28 PM

"Abuse" is the right word. The moment you hit more than three concurrent jobs, you will see a nasty slowdown because your table will have contention. A relational table is the WORST data structure you could use (other than a stack) to implement a queue. Tables are designed to store data in a way that optimizes for multiple reads per row. Ideally, each row is read 10x or more in its lifetime. A queue, on the other hand, is meant to be write-once-read-once-AND-DELETE. The delete is what kills the database.

Don't be stupid - just use a real queue. You will thank me.

Miguel Prz | September 17, 2013 2:38 PM

I use Queue::DBI for these tasks

Mike Friedman | September 17, 2013 8:11 PM

I've implemented this anti-pattern more times than I care to admit. I like Redis for maintaining ephemeral queues these days. Easy and dirt simple. As it happens, I have an old YAPC presentation about this exact topic. (Video, Slides)

lajandy | September 19, 2013 1:35 AM

Have you taken a look at Helios? I admit it may not quite do what you want out of the box (yet); it currently is more of a "fire-and-forget" job management system rather than one that tightly integrates with your front-end webapp. It does however have a reliable job agent daemon that manages multiple worker processes and extensible APIs for logging and configuration management. There are also a lot of hooks and extension capabilities to tailor the system to work with your environment. We have a lot of new features and APIs coming up in the next major release, so even if it can't quite do what you want yet, it may be able to very soon.

davewood | September 19, 2013 8:00 AM

Thanks for the input. I will look into improving the job queue as soon as there is time. For the moment it just works but certainly doesn't appear as scalable as before I wrote this posting.

davewood | January 18, 2014 5:01 PM

reporting back

need more processes now.

david@nio:~/dev/myapp$ ./script/myapp_job_daemon.pl
Found job: 25
Found job: 25
Found job: 25
Processing test_id: 1355
Processing test_id: 1355
Processing test_id: 1355

looking for a job queue now, currently checking out ZeroMQ

davewood | August 1, 2014 10:55 AM

Update: https://blogs.perl.org/users/davewood/2014/07/asynchronuous-task-distribution-with-anyevent-and-zeromq.html
