I am attempting to reinvent a wheel, one that has many variants on CPAN: the lowly job queue. I know that there is Gearman, POE::Component::JobQueue, POEx::WorkerPool, Helios, Swarmage, TheSchwartz, and probably a couple more I don't know about, but none of them really did what I need. So I am starting from scratch.

My Goals:

  • Reliable - Jobs must not get lost

  • Data store independent - true data store independence with support for multiple backends

  • Simple - No complicated features built into the core

  • Multiple Client Support

  • Documented - Well defined and documented APIs

  • Tested - Generic tests for APIs that are independent of the implementation details
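To make the "data store independent" and "generic tests" goals a bit more concrete, here is a minimal sketch of the kind of thing I have in mind. Everything in it is hypothetical: the package name MyQueue::Backend::Memory, the methods enqueue, reserve, complete and pending, and the check_backend helper are illustration only, not a settled API.

```perl
use strict;
use warnings;

# Hypothetical in-memory backend. A database- or file-backed class would
# expose the same methods, so the core never needs to know which one it has.
package MyQueue::Backend::Memory;

sub new { my ($class) = @_; return bless { jobs => [], next_id => 1 }, $class }

# Store a job and return its id.
sub enqueue {
    my ($self, $type, $args) = @_;
    my $id = $self->{next_id}++;
    push @{ $self->{jobs} },
        { id => $id, type => $type, args => $args, state => 'ready' };
    return $id;
}

# Hand out the oldest ready job. It is marked reserved rather than deleted,
# so a worker that crashes mid-job does not lose it.
sub reserve {
    my ($self) = @_;
    for my $job (@{ $self->{jobs} }) {
        next unless $job->{state} eq 'ready';
        $job->{state} = 'reserved';
        return $job;
    }
    return;
}

# Remove a job only once the worker reports success.
sub complete {
    my ($self, $id) = @_;
    $self->{jobs} = [ grep { $_->{id} != $id } @{ $self->{jobs} } ];
    return;
}

# Number of jobs still stored, whether ready or reserved.
sub pending { my ($self) = @_; return scalar @{ $self->{jobs} } }

package main;

# Generic check that touches only the interface, so the same code
# can be pointed at any backend.
sub check_backend {
    my ($backend) = @_;
    my $id  = $backend->enqueue('SendWelcomeEmail', { to => 'a@example.com' });
    my $job = $backend->reserve;
    return 0 unless $job && $job->{id} == $id;
    return 0 unless $backend->pending == 1;    # reserved, not lost
    $backend->complete($id);
    return $backend->pending == 0 ? 1 : 0;
}

print check_backend(MyQueue::Backend::Memory->new) ? "ok\n" : "not ok\n";
```

The point of check_backend is that it never looks inside the object, so a new backend can be validated by passing it to the same function.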

Initially, TheSchwartz seemed to do everything I wanted, but the documentation is lacking and it relies on a SQL database as the store. Additionally, the tests failed when I attempted to install it, and it's pretty clear that Brad is no longer maintaining it, so I guess it's time to do it myself.

At this point, I'm kind of lost, but I think that if I pay close attention to the design of the API, I will come up with something worth putting on the CPAN. For now, you can see the progress at:


Random thoughts:

One thing that I liked about TheSchwartz, and that was also written into the proprietary job queue used at my last company, is a feature to 'coalesce' jobs that are similar in some way and let them be handled by a single job. For example, you may have a job called "SendWelcomeEmail" which takes an email address and sends a message. With coalescing you'd group a bunch of email addresses and send the message in one go. This feature can reduce load for certain types of jobs.
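A rough sketch of the coalescing idea, assuming each pending job is a hash with a type and a single arg. The function name coalesce_jobs, the job layout and the batch-size limit are all made up for illustration; this is not how TheSchwartz exposes the feature.

```perl
use strict;
use warnings;

# Group pending jobs by type so one worker call can handle a whole batch.
# Types keep the order in which they were first seen.
sub coalesce_jobs {
    my ($jobs, $max_batch) = @_;
    my (%batches, @order);
    for my $job (@$jobs) {
        my $key = $job->{type};
        push @order, $key unless exists $batches{$key};
        push @{ $batches{$key} }, $job->{arg};
    }

    # Split each group into chunks of at most $max_batch arguments.
    my @out;
    for my $key (@order) {
        my @args = @{ $batches{$key} };
        while (my @chunk = splice(@args, 0, $max_batch)) {
            push @out, { type => $key, args => [@chunk] };
        }
    }
    return \@out;
}

my $batches = coalesce_jobs([
    { type => 'SendWelcomeEmail', arg => 'a@example.com' },
    { type => 'ResizeImage',      arg => 'cat.png' },
    { type => 'SendWelcomeEmail', arg => 'b@example.com' },
], 50);

# One line per batch: the job type and how many arguments it carries.
printf "%s x%d\n", $_->{type}, scalar @{ $_->{args} } for @$batches;
```

Here the two SendWelcomeEmail jobs end up in one batch and ResizeImage in another, so the mail worker would be invoked once with both addresses.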

Something I've seen in some job queues is the ability to set a job to 'retry' if it fails. Last time I looked at Gearman this was missing, but I find it valuable for handling retries of common problems, such as an email server being temporarily down.
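The retry behaviour could look something like the sketch below. The name run_with_retry and its options (work, max_attempts, base_delay, sleep) are assumptions for the sake of the example, as is the choice of doubling the delay after each failure; it needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Run $opt{work} up to max_attempts times, waiting longer after each failure.
# The sleep callback is injectable so the delay can be stubbed out in tests.
sub run_with_retry {
    my (%opt) = @_;
    my $work  = $opt{work};
    my $max   = $opt{max_attempts} // 3;
    my $delay = $opt{base_delay}   // 1;
    my $sleep = $opt{sleep}        // sub { sleep $_[0] };

    for my $attempt (1 .. $max) {
        my $result;
        my $ok = eval { $result = $work->(); 1 };
        return { ok => 1, attempts => $attempt, result => $result } if $ok;

        my $error = $@ || 'unknown error';
        return { ok => 0, attempts => $attempt, error => $error }
            if $attempt == $max;

        # Wait base_delay, then twice that, then four times that, and so on.
        $sleep->($delay * 2 ** ($attempt - 1));
    }
    return;
}

# A job that fails twice (say, the mail server is down) and then succeeds.
my $calls = 0;
my $outcome = run_with_retry(
    work  => sub { die "smtp down\n" if ++$calls < 3; return 'sent' },
    sleep => sub { },    # skip the real delay in this demo
);
print "ok after $outcome->{attempts} attempts\n" if $outcome->{ok};
```

A real queue would more likely put the job back in the store with a run-after time instead of sleeping inside the worker, but the bookkeeping is the same.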

I really liked the distributed nature of Gearman and also the fact it supports languages other than Perl. In some companies this would be relevant. I'd like to see something with crazy easy distributed support, like if you started up an extra job server and said 'you belong to this queue' it would all just start working.

Some sort of RESTful API would also be nice :) Oh, and lots of console admin tools ;)

I'll be watching your progress on github. For the storage side, I wonder if somehow you could leverage the work of KiokuDB to provide a bunch of storage backends with little effort?


Have you looked at Beanstalkd? It now offers queue persistence (and there are Perl bindings). I looked into this briefly a few months ago and came to the conclusion that between TheSchwartz, Beanstalkd, Redis, and the Starling-alikes*, I wasn't likely to be breaking new ground. But queues are great! What's going to distinguish your guy from TS, other than back-end independence?

* Key features were job buckets and run_after, which TS and Beanstalkd both support, and queue persistence, which Beanstalkd didn't support until recently.


About Guillermo Roditi

I am attempting to blog about perl