Thread pool for a boss/worker model

This is a pretty simple idea: a boss thread assigns work to a pool of worker threads that sit idle until some work enters their queue. This way the boss can fill a queue very quickly, and you have multiple back-end workers that can consume that queue.

I'm using threading and not an async approach because some of the work I'll be assigning to threads is long-polling operations. The workers will hit some REST API route on some other application, and some of those routes take up to 30 seconds to complete, or have dependencies or follow-up work. Rather than block and spin in an async call, for my tasks it's easier to have a queue of work and workers that execute it.

Each task may have multiple ordered sub-tasks associated with it, so each worker will be assigned a "task group". This way the worker can manage its own dependencies, and I can manage the total load on the server it's sitting on, the cloud, and the database. It's not about "speed" in the sense of less time to execute; it's more about keeping a small pipe filled and not blocking other workers during a long blocking call.

There aren't a lot of thread-pooling modules that I could see on CPAN. It's not a complex task, but there are a lot of things to think about. The Thread::Pool module causes problems with the Perl MongoDB module, and that's really the only recent option that seems to fit my problem set.

I ended up rolling my own module using threads::shared, Thread::Queue, and Thread::Semaphore. I spawn X threads, using the semaphore to hand out tokens that keep the number of threads at the right level. Each worker gets its work through non-blocking checks against shared queues. A shared variable per worker thread controls its loop and kills it when it's time. This also allows for things like stopping the boss and waiting for the workers to drain the queue before killing the pool. You can add workers and stop individual workers. You can keep the workers running and kill the boss, or kill the workers and let the boss run. The boss and the workers have callback functions for executing their tasks. It's shiny and runs great.
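For readers who want to picture the moving parts, here is a minimal sketch of that kind of pool. It is not the module described above: run_pool and every variable name are made up for illustration, it uses one shared stop flag for the whole pool instead of one per worker, and it assumes a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Thread::Semaphore;

# Hypothetical helper: run each task group on a pool of $size workers,
# applying $callback to every sub-task, and return one result per group.
sub run_pool {
    my ($size, $groups, $callback) = @_;

    my $work    = Thread::Queue->new;              # boss -> workers
    my $results = Thread::Queue->new;              # workers -> boss
    my $slots   = Thread::Semaphore->new($size);   # one token per allowed worker
    my $running :shared = 1;                       # workers loop while true

    my @workers;
    for (1 .. $size) {
        $slots->down;                              # take a token before spawning
        push @workers, threads->create(sub {
            while (1) {
                my $group = $work->dequeue_nb;     # non-blocking check
                if (defined $group) {
                    # Sub-tasks of one group run in order on this worker.
                    my @out = map { $callback->($_) } @$group;
                    $results->enqueue(join ',', @out);
                    next;
                }
                my $keep_going = do { lock($running); $running };
                last unless $keep_going;
                select(undef, undef, undef, 0.01); # idle briefly, then re-check
            }
            $slots->up;                            # hand the token back on exit
        });
    }

    # Boss: fill the queue, wait for it to drain, then stop the workers.
    $work->enqueue($_) for @$groups;
    select(undef, undef, undef, 0.01) while $work->pending;
    { lock($running); $running = 0; }
    $_->join for @workers;

    my @done;
    while (defined(my $r = $results->dequeue_nb)) { push @done, $r }
    return @done;
}

# Usage: three task groups, each sub-task doubled by the callback.
my @done = run_pool(3, [[1, 2], [3], [4, 5, 6]], sub { $_[0] * 2 });
print "$_\n" for sort { $a cmp $b } @done;
```

The boss only clears the flag once the queue is empty, so a worker that is still busy finishes its group and exits on its next pass. Results arrive in whatever order the workers finish, which is why the usage line sorts them.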

I'm 99% certain my code will pass any test case I throw against it and it will work fine for what I'll be using it for (work queue for a back-office cloud controller).

Did I miss a module? Is there something better out there that is less intrusive than Thread::Pool? If not, I'll clean up the POD on this and submit it to CPAN, but I'd rather not publish Yet Another CPAN Module if 10 variations of it already exist.

9 Comments

I can propose a very different way, using ZeroMQ.

It's a powerful yet simple library for messaging between processes. You can implement everything you need with simple processes that communicate easily with each other.
By using only messaging, you drop a whole class of concurrent-access issues.

There is a Perl module for it.

Hi Kal

What about these Perl modules:
1) Gearman
2) Hopkins
3) TheSchwartz
4) Helios (which uses TheSchwartz)

In fact, it'd be marvellous if you'd try all of them and write a comparative review for those of us who haven't actually tried any...

Another suggestion: beanstalkd and the Perl module Beanstalk::Client.

I use this by forking off a number of workers per machine. Then I add more machines as load increases. Having isolated processes that only talk to the beanstalkd server means I don't have to deal with synchronisation or blocking issues. This makes for very simple and efficient code.

Have you seen Parallel::Workers or Thread::Pool on CPAN?

Gratuitous self promotion: Thread::Apartment provides a pooling interface, and should allow use of direct method calls to threaded objects.

Hi Kal

Beanstalk::Client is written by Graham Barr, which is a big plus in my book.

BTW: I haven't used any of these, but would be v-e-r-y confused if confronted with the need to do so.

As for overlooking those modules I listed, I'm not surprised. Many modules have obscurantistic names.

That's why I keep a TiddlyWiki (check them out!) for Perl, one of many I maintain, with a section dedicated to 'Interesting Modules' I stumble across.

I guess it's time to carve out that list into its own web page and put it on my site. I'll do that over the next week and blog about it on blogs.perl.org.

Right now I'm chasing an embarrassingly huge bug on GraphViz2::Marpa........

Cheers
Ron

About Kal

I stuff with perl and stuff.