Monitoring failed minion tasks with minion-check
Minion-check is a sparrow plugin to monitor failed minion tasks.
With it you can easily verify whether any minion jobs have failed within a given period of time. This matters when you need to know about failures in long-running tasks executed by minion.
The installation:
$ sparrow plg install minion-check
$ sparrow project create myhost
$ sparrow check add myhost minion
$ sparrow check set myhost minion minion-check
And the configuration:
Here you need to provide a bash command that runs your minion command and, optionally, the period of time over which to look up failed minion tasks:
$ export EDITOR=nano
$ sparrow check ini myhost minion
# bash script to run minion command
command = cd /foo/bar/app/path && carton exec ./app.pl minion
# check failed tasks for the last 5 minutes, 10 hours, 2 days, etc.
history = 10 minutes
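To make the history setting more concrete, here is a minimal sketch of how a period such as "10 minutes" can be read as a number of seconds. The period_to_seconds helper is hypothetical and is not part of minion-check. It only covers the units mentioned in the comment above, and the plugin may well parse the value differently.

```shell
# Hypothetical helper, not part of minion-check: turn a period such as
# "10 minutes" into a number of seconds.
period_to_seconds() {
  count=${1%% *}   # text before the first space, e.g. "10"
  unit=${1#* }     # text after the first space, e.g. "minutes"
  case "$count" in
    ''|*[!0-9]*) echo "unsupported period: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    minute|minutes) echo $(( count * 60 )) ;;
    hour|hours)     echo $(( count * 3600 )) ;;
    day|days)       echo $(( count * 86400 )) ;;
    *) echo "unsupported period: $1" >&2; return 1 ;;
  esac
}

period_to_seconds "10 minutes"   # prints 600
period_to_seconds "2 days"       # prints 172800
```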
That's it. Now let's run the plugin on the host where your minion workers run:
$ sparrow check run myhost minion
# running cd /home/vagrant/sparrow/plugins/public/minion-check && carton exec 'strun --root ./ --ini /home/vagrant/sparrow/projects/myhost/checkpoints/minion/suite.ini ' ...
/tmp/.outthentic/18157/home/vagrant/sparrow/plugins/public/minion-check/failed-tasks/story.t ..
ok 1 - stdout is already set
ok 2 - stdout saved to /tmp/.outthentic/18157/3r2uXbq_kM
ok 3 - output match /Q=(1|0)/
ok 4 - output match /(\d+)\s+failed/
ok 5 - stdout is already set
ok 6 - stdout saved to /tmp/.outthentic/18157/HpQ2V5mxm4
# foo_task (default, failed, p0, r0)
# []
# "100 at app.pl line 11, <DATA> line 742.\n"
# 2016-04-01T19:06:24Z (created)
# 2016-04-01T19:08:52Z (started)
# 2016-04-01T19:08:53Z (finished)
ok 7 - output match /(.*)/
ok 8 - output match /finished/
ok 9 - '2016-04-01T19:08:53Z (finished)' match /(\d\d\d\d-\d\d-\d\d)T(\S+)Z.*/
not ok 10 - 0 failed jobs found for period 10 minutes
1..10
# Failed test '0 failed jobs found for period 10 minutes'
# at /home/vagrant/sparrow/plugins/public/minion-check/local/lib/perl5/Outthentic.pm line 130.
# Looks like you failed 1 test of 10.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/10 subtests
Test Summary Report
-------------------
/tmp/.outthentic/18157/home/vagrant/sparrow/plugins/public/minion-check/failed-tasks/story.t (Wstat: 256 Tests: 10 Failed: 1)
Failed test: 10
Non-zero exit status: 1
Files=1, Tests=10, 1 wallclock secs ( 0.03 usr 0.00 sys + 0.60 cusr 0.05 csys = 0.68 CPU)
Result: FAIL
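In the report above, the last assertion ("0 failed jobs found for period 10 minutes") appears to fail because the failed task finished inside the configured window. The idea can be sketched in shell as follows. This is an illustration only, not the plugin's actual code: finished_within is a hypothetical name, and parsing the timestamp with date -d assumes GNU date.

```shell
# Hypothetical sketch; assumes GNU date for parsing ISO 8601 timestamps.
# Usage: finished_within <timestamp> <window_seconds> <now_epoch>
# Succeeds when the timestamp is at most <window_seconds> older than <now_epoch>.
finished_within() {
  finished=$(date -u -d "$1" +%s) || return 2   # timestamp as epoch seconds
  age=$(( $3 - finished ))                      # how long ago the task finished
  [ "$age" -ge 0 ] && [ "$age" -le "$2" ]
}

# Example with a fixed "now" five minutes after the finish time:
ts="2016-04-01T19:08:53Z"
now=$(( $(date -u -d "$ts" +%s) + 300 ))
if finished_within "$ts" 600 "$now"; then
  echo "task failed within the last 10 minutes"
fi
```

Passing the current time as an argument keeps the function easy to test; in real use it would be $(date -u +%s).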
Running from cron.
You can then monitor minion task failures from cron, for example every 10 minutes:
$ crontab -l
*/10 * * * * sparrow check run myhost minion --cron
Running sparrow check with the --cron option produces output only when tests fail.
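Since output appears only on failure, any output at all can be treated as an alert. A minimal sketch of such a wrapper follows. The alert_if_output function and its "ALERT" message are hypothetical; the only thing it relies on is the behaviour of --cron described above.

```shell
# Hypothetical wrapper: run a command and treat any output as a failure report.
# Usage: alert_if_output <command> [args...]
alert_if_output() {
  out=$("$@" 2>&1)          # capture both stdout and stderr
  if [ -n "$out" ]; then
    printf 'ALERT: check reported failures\n%s\n' "$out"
    return 1
  fi
  return 0                  # silent run: nothing to report
}

# Intended use (requires sparrow, so it is left commented out here):
# alert_if_output sparrow check run myhost minion --cron
```

When the job runs straight from crontab, cron itself normally mails any output to the crontab owner, so a wrapper like this is only needed if you want to route the report elsewhere.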