Preventing Collisions with Perl cron jobs

Imagine you are a brilliant developer who has just created a Perl script that takes form submissions from your website and imports them into your ticketing system. Well done! Now you want this script to run periodically so that new requests are automatically submitted to your ticketing system as they come in. Working in a Linux environment, you quickly add a line to the crontab so that the script runs every 5 minutes.

Perfect! You're done.

Time passes …

One day you start receiving complaints that duplicate tickets are being submitted to your ticketing system. After some investigation, you discover that forms are now arriving faster than your script can process them, so cron starts a new instance before the previous one has finished. As a result, the same form gets processed more than once!

[09:00:00 Instance 1] Searching for all cases … found 8!
[09:01:03 Instance 1] Processing case A
[09:02:12 Instance 1] Processing case B
[09:02:55 Instance 1] Processing case C
[09:03:20 Instance 1] Processing case D
[09:04:32 Instance 1] Processing case E
[09:05:00 Instance 2] Searching for all cases … found 3!
[09:05:02 Instance 2] Processing case F
[09:05:12 Instance 1] Processing case F
[09:06:10 Instance 2] Processing case G
[09:06:28 Instance 1] Processing case G
[09:07:42 Instance 2] Processing case H
[09:07:50 Instance 1] Processing case H

One way to fix this is to prevent your script from running again until the previous instance has finished. Using the Proc::ProcessTable module, you can retrieve a list of running processes, check whether your script is already running, and if so, exit and let the other instance finish.

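use Proc::ProcessTable;

# Count the running processes whose command line mentions this script.
my $count = 0;
my $table = Proc::ProcessTable->new;
for my $process ( @{ $table->table } ) {
  next unless $process->{cmndline};
  # \Q...\E quotes regex metacharacters in the script name, such as ".".
  if ($process->{cmndline} =~ /\Q$0\E/) {
    $count++;
    # The first match is this process itself; a second means another instance.
    exit if $count > 1;
  }
}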

In the above code, we create an instance of Proc::ProcessTable and, by calling its table method, iterate through the list of running processes to check whether the script is running. Because the current process is itself in that list, one match is always expected. If the total number of instances is greater than one, another instance is already running and the script simply exits.

Checking for the existence of the script is accomplished with the special Perl variable $0, which holds the name of the program being executed, as it was invoked. So, for example, if your script was started as “process_website_forms.pl”, that is what $0 would contain.

Simply place this block of code at the start of your script, and overlapping runs will no longer submit redundant cases to your ticketing system.
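
One caveat when matching on $0: a script name usually contains a dot, which is a regex metacharacter, so interpolating the name into a pattern as-is can match more than intended. The following is a minimal sketch of the difference, not part of the original script; the script name, the command lines, and the helper names matches_raw and matches_quoted are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical script name; in a real job this would come from $0.
my $script = 'process_website_forms.pl';

# Name interpolated as a raw pattern: "." matches any character.
sub matches_raw    { my ($cmd) = @_; return $cmd =~ /$script/     ? 1 : 0 }

# Name quoted with \Q...\E: every character is treated as literal text.
sub matches_quoted { my ($cmd) = @_; return $cmd =~ /\Q$script\E/ ? 1 : 0 }

# Made-up command lines: the real script and a near miss.
for my $cmd (
    '/usr/bin/perl /opt/jobs/process_website_forms.pl',
    '/usr/bin/perl /opt/jobs/process_website_formsXpl',
) {
    printf "%s\n  raw: %d  quoted: %d\n",
        $cmd, matches_raw($cmd), matches_quoted($cmd);
}
```

The raw pattern accepts both command lines, while the quoted one accepts only the real script name.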

6 Comments

or:

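use Proc::ProcessTable;
use List::MoreUtils 'any';
# Skip our own entry ($$); otherwise the script would always match itself and exit.
exit
    if any { $_->{pid} != $$ && ($_->{cmndline} // '') =~ /\Q$0\E/ }
    @{ Proc::ProcessTable->new->table };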

We do this at $work with a command-line tool called lockrun. You can find the source here: http://www.unixwiz.net/tools/lockrun.html

It uses file-based locking with the flock system call. Very effective.
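
The same idea can be sketched directly in Perl with the built-in flock function. This is only an illustration of file-based locking, not how lockrun itself is implemented; the lock file path and the helper name acquire_lock are assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Hypothetical lock file path; any location writable by the cron user would do.
my $lock_path = File::Spec->catfile(File::Spec->tmpdir, 'process_website_forms.lock');

# Try to take an exclusive, non-blocking lock on the file. Returns the open
# handle on success (keep it in scope for as long as the lock is needed),
# or undef if another holder already has the lock.
sub acquire_lock {
    my ($path) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

my $lock = acquire_lock($lock_path)
    or exit 0;    # a previous run is still working; let it finish

# ... process the form submissions here ...
# The lock is released when the handle is closed or the process exits.
```

Because the operating system drops the lock when the process ends, a crashed run does not leave a stale lock behind.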

As Alex pointed out, aside from looking at the process table, you can also use other (arguably simpler) methods to make sure that only one instance of your program is running, e.g. file locking, PID file, shared memory, etc.

Some sample CPAN modules for these: Sys::RunAlone, Proc::PID::File.
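
For comparison, here is a rough core-Perl sketch of the PID file approach. It does not use the API of either module above; the file path and the helper names already_running and write_pid are made up for illustration. Note two limits of this sketch: there is a small window between the check and the write, and kill with signal 0 also reports failure for a live process owned by another user.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical PID file path.
my $pid_path = File::Spec->catfile(File::Spec->tmpdir, 'process_website_forms.pid');

# Returns 1 if the PID file names a process that is still alive, else 0.
sub already_running {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;    # no PID file: nothing is running
    my $pid = <$fh>;
    close $fh;
    return 0 unless defined $pid && $pid =~ /^(\d+)$/ && $1 > 0;
    return kill(0, $1) ? 1 : 0;             # signal 0 only checks for existence
}

# Record the current process ID in the PID file.
sub write_pid {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "$$\n";
    close $fh or die "Cannot close $path: $!";
}

exit 0 if already_running($pid_path);
write_pid($pid_path);

# ... process the form submissions here ...

unlink $pid_path;    # clean up so the next run starts immediately
```

If a run dies without removing the file, the next run still proceeds as long as the recorded process no longer exists.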

Cool solution, thanks. An option for jobs which run on > 1 server is DBIx::Locker.

From 12 years ago, but it still works great: the Highlander solution.

I find myself simply moving things to daemons rather than cron jobs.

About Jonathan Lloyd

I blog about Perl.