Monitoring failed minion tasks with minion-check

Minion-check is a sparrow plugin to monitor failed minion tasks.

You can easily verify whether any minion jobs have failed within a certain period of time. This can be important if you need to know about failures in long-running tasks executed by minion.

The installation:

$ sparrow plg install minion-check 
$ sparrow project create myhost
$ sparrow check add myhost minion
$ sparrow check set myhost minion minion-check

And the configuration:

Here you need to provide a bash command that runs your minion command and, optionally, the period of time over which to look up failed minion tasks:

$ export EDITOR=nano
$ sparrow check ini myhost minion


# bash script to run minion command
command = cd /foo/bar/app/path && carton exec ./app.pl minion

# check failed tasks for the last 5 minutes, 10 hours, 2 days, etc.
history = 10 minutes
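
To make the history setting a bit more concrete, here is a minimal sketch of how a value like "10 minutes" could be turned into a number of seconds. This is not how minion-check does it internally; the function name period_to_seconds is hypothetical, and the set of accepted units is an assumption based on the examples in the comment above.

```shell
# Hypothetical helper, not part of minion-check.
# Converts a period such as "10 minutes" into seconds.
period_to_seconds() {
  # Split "10 minutes" into a count ("10") and a unit ("minutes").
  count=${1%% *}
  unit=${1#* }

  # The count must be a non-empty string of digits.
  case "$count" in
    ''|*[!0-9]*) echo "bad count: $count" >&2; return 1 ;;
  esac

  # Assumed unit names, taken from the examples in the config comment.
  case "$unit" in
    second|seconds) echo "$count" ;;
    minute|minutes) echo $(( count * 60 )) ;;
    hour|hours)     echo $(( count * 3600 )) ;;
    day|days)       echo $(( count * 86400 )) ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, period_to_seconds "10 minutes" prints 600, the size of the window used in the rest of this post.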

That's it. Now let's run the plugin on the host where your minion workers run:

$ sparrow check run  myhost minion 
# running cd /home/vagrant/sparrow/plugins/public/minion-check && carton exec 'strun --root ./  --ini /home/vagrant/sparrow/projects/myhost/checkpoints/minion/suite.ini ' ...

/tmp/.outthentic/18157/home/vagrant/sparrow/plugins/public/minion-check/failed-tasks/story.t ..
ok 1 - stdout is already set
ok 2 - stdout saved to /tmp/.outthentic/18157/3r2uXbq_kM
ok 3 - output match /Q=(1|0)/
ok 4 - output match /(\d+)\s+failed/
ok 5 - stdout is already set
ok 6 - stdout saved to /tmp/.outthentic/18157/HpQ2V5mxm4
# foo_task (default, failed, p0, r0)
# []
# "100 at app.pl line 11, <DATA> line 742.\n"
# 2016-04-01T19:06:24Z (created)
# 2016-04-01T19:08:52Z (started)
# 2016-04-01T19:08:53Z (finished)
ok 7 - output match /(.*)/
ok 8 - output match /finished/
ok 9 - '2016-04-01T19:08:53Z (finished)' match /(\d\d\d\d-\d\d-\d\d)T(\S+)Z.*/
not ok 10 - 0 failed jobs found for period 10 minutes
1..10

#   Failed test '0 failed jobs found for period 10 minutes'
#   at /home/vagrant/sparrow/plugins/public/minion-check/local/lib/perl5/Outthentic.pm line 130.
# Looks like you failed 1 test of 10.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/10 subtests

Test Summary Report
-------------------
/tmp/.outthentic/18157/home/vagrant/sparrow/plugins/public/minion-check/failed-tasks/story.t (Wstat: 256 Tests: 10 Failed: 1)
  Failed test:  10
  Non-zero exit status: 1
Files=1, Tests=10,  1 wallclock secs ( 0.03 usr  0.00 sys +  0.60 cusr  0.05 csys =  0.68 CPU)
Result: FAIL
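
The report above boils down to one question: did a failed job finish within the configured period? Here is a rough sketch of that comparison in shell. It is not the plugin's actual code; iso_to_epoch and failed_recently are hypothetical names, and the sketch assumes a date command that supports -d (GNU coreutils or BusyBox, not BSD date).

```shell
# Hypothetical helpers, not part of minion-check.

# Convert a timestamp like 2016-04-01T19:08:53Z into epoch seconds.
# Assumes a `date` that understands -d "YYYY-MM-DD hh:mm:ss".
iso_to_epoch() {
  # Drop the trailing Z and turn the T separator into a space.
  clean=$(printf '%s' "${1%Z}" | tr 'T' ' ')
  date -u -d "$clean" +%s
}

# Succeeds if the job finished no more than <period> seconds before <now>.
# Usage: failed_recently <finished-timestamp> <now-epoch> <period-seconds>
failed_recently() {
  finished=$(iso_to_epoch "$1") || return 2
  now=$2
  period=$3
  [ $(( now - finished )) -le "$period" ]
}
```

With the finished timestamp from the report, a call might look like this:

```shell
failed_recently '2016-04-01T19:08:53Z' "$(date -u +%s)" 600 && echo "recent failure"
```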

Running from cron

So you can monitor minion task failures from cron every 10 minutes:

$ crontab -l

*/10 * * * * sparrow check run  myhost minion --cron

Running sparrow check with the --cron option means output is produced only when tests fail.
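
If the idea of "output only on failure" is new to you, the following sketch shows the general pattern with a plain shell wrapper. It is only an illustration of the behaviour, not how sparrow implements --cron; run_quiet is a hypothetical name.

```shell
# Hypothetical wrapper, not part of sparrow.
# Runs a command, stays silent on success, and prints the captured
# output (stdout and stderr) only when the command fails.
run_quiet() {
  if out=$("$@" 2>&1); then
    status=0
  else
    status=$?
  fi

  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$out"
  fi

  # Pass the original exit status on to the caller.
  return "$status"
}
```

Silence on success is what makes this kind of check convenient under cron: the job only has something to report when a task has actually failed.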

About melezhik

Dev & Devops --- Then I beheld all the work of God, that a man cannot find out the work that is done under the sun: because though a man labour to seek it out, yet he shall not find it; yea further; though a wise man think to know it, yet shall he not be able to find it. (Ecclesiastes 8:17)