Merry Christmas! Parallel testing with Test::Class::Moose has arrived
You'll want to check out the forks branch to see it in action. Read the docs for Test::Class::Moose::Role::Parallel to see how to use it (you'll probably need to create your own schedule).
What follows is a very naïve benchmark in which I reduced a 12-minute test suite down to 30 seconds.
Before we start, the following script is what I used to generate benchmarking material. It generates about 300 test classes. We simulate an average load time of one second for the lib/ directory and, for added fun, we threw a sleep into a test in the base class (which is overridden in child classes, so they don't experience that slowdown).
use 5.10.1;
use strict;
use warnings;
use autodie ':all';
use File::Path 'make_path';
make_path('lib');
make_path('t/lib/TestsFor');
make_path('tcm');
#
# Create a slow loading class
#
open my $fh, '>', 'lib/SlowLoader.pm';
print $fh <<END;
package SlowLoader;
use strict;
use warnings;
BEGIN { sleep 1 }
1;
END
close $fh;
#
# Create our test base class
#
open my $base_class, '>', 't/lib/MyBaseClass.pm';
print $base_class <<'END';
package MyBaseClass;
use Test::Class::Moose;
with 'Test::Class::Moose::Role::Parallel';
1;
END
#
# Create our test classes and their driver .t files
#
my $module = 'A';
for ( 1 .. 100 ) {
    foreach my $sub ( \&parent, \&child, \&grandchild ) {
        my ( $name, $code ) = $sub->($module);
        my $filename;
        if ( $name =~ /TestsFor::(.*)/ ) {
            $filename = $1;
            open my $fh, '>', "t/lib/TestsFor/$filename.pm";
            print $fh $code;
            open my $driver, '>', "t/$filename.t";
            print $driver <<"END";
use lib 't/lib';
use $name;
Test::Class::Moose->new->runtests;
END
        }
    }
    $module++;
}
#
# Create a Test::Class::Moose single process driver
#
open my $tcm, '>', 'tcm/tcm_standard.t';
print $tcm <<'END';
use Test::Class::Moose::Load 't/lib';
MyBaseClass->new(
    jobs       => ( $ENV{NUM_JOBS} // 0 ),
    statistics => 1,
)->runtests;
END
sub parent {
    my $module = shift;
    my $name   = "TestsFor::$module";
    my $code   = <<"END";
package $name;
use Test::Class::Moose extends => 'MyBaseClass';
use SlowLoader;
sub test_this { sleep 1; ok 1, "test \$_" for 1 .. 5; }
sub test_that { ok 1, "test \$_" for 1 .. 5; }
1;
END
    return $name, $code;
}
sub child {
    my $module = shift;
    my $name   = "TestsFor::Child$module";
    my $code   = <<"END";
package $name;
use Test::Class::Moose extends => 'TestsFor::$module';
sub test_this { ok 1, "test \$_" for 1 .. 3; }
1;
END
    return $name, $code;
}
sub grandchild {
    my $module = shift;
    my $name   = "TestsFor::GrandChild$module";
    my $code   = <<"END";
package $name;
use Test::Class::Moose extends => 'TestsFor::Child$module';
1;
END
    return $name, $code;
}
We can run the test suite with prove:
$ prove -l t
t/A.t ............. ok
t/AA.t ............ ok
t/AB.t ............ ok
t/AC.t ............ ok
t/AD.t ............ ok
...
t/O.t ............. ok
t/P.t ............. ok
t/Q.t ............. ok
t/R.t ............. ok
t/S.t ............. ok
t/T.t ............. ok
t/U.t ............. ok
t/V.t ............. ok
t/W.t ............. ok
t/X.t ............. ok
t/Y.t ............. ok
t/Z.t ............. ok
All tests successful.
Files=300, Tests=1200, 704 wallclock secs
Result: PASS
So that's almost 12 minutes. As you know from my previous post, there are a lot of duplicated tests in there due to test inheritance.
So now let's run them in a single process, using Test::Class::Moose::Load (note: this program is also created by the script above):
$ prove -l tcm/
tcm/tcm_standard.t .. 301/301 # Test classes: 301
# Test methods: 600
# Total tests run: 2600
tcm/tcm_standard.t .. ok
All tests successful.
Files=1, Tests=301, 106 wallclock secs
Result: PASS
Not bad! Less than two minutes.
We can do even better with forkprove:
$ forkprove -Ilib -MSlowLoader -j8 t
...
t/Z.t ............. ok
All tests successful.
Files=300, Tests=1200, 58 wallclock secs
Result: PASS
Unfortunately, forkprove doesn't offer scheduling, a key need of large test suites.
So now let's try again with 8 jobs, using Test::Class::Moose::Role::Parallel features:
$ NUM_JOBS=8 prove -l tcm/
tcm/tcm_standard.t .. ok
All tests successful.
Files=1, Tests=8, 31 wallclock secs
Result: PASS
Down to half a minute. Awesome!
There are some caveats:

- You cannot (currently) use Test::Class::Moose reporting features with parallel tests
- The tests will currently appear to hang until they're finished
- The scheduler is naïve and you'll probably need to write your own
While those numbers look impressive, it's important to remember that your results are very unlikely to be this good. Even though you've forked off separate processes, your code will likely be fighting over shared resources (bandwidth, databases, etc.).
Also, you'll probably need to provide your own schedule() method because:
- Not all tests can be run in parallel
- Some tests can be run in parallel, but only with a subset of other tests
- You'll want to distribute long-running methods across separate jobs
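To make that last point concrete, here's what a hand-rolled schedule might look like: split your classes into one bucket per job, round-robin, perhaps after sorting long-running classes first. Everything here is hypothetical: the build_schedule name and the one-list-per-job return shape are my own inventions for illustration, so check the Test::Class::Moose::Role::Parallel docs for the real schedule() interface.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sketch only: the helper name and the "arrayref of per-job
# class lists" return shape are assumptions for illustration, not the
# actual Role::Parallel API.
sub build_schedule {
    my ( $num_jobs, @classes ) = @_;
    my @schedule = map { [] } 1 .. $num_jobs;

    # Round-robin the classes across jobs. In real life you might sort
    # by historical run time first so slow classes start early, or keep
    # classes that can't run concurrently in the same bucket.
    my $i = 0;
    push @{ $schedule[ $i++ % $num_jobs ] }, $_ for @classes;

    return \@schedule;    # $schedule->[$job] = classes for that job
}

my $schedule = build_schedule( 3, map {"TestsFor::$_"} 'A' .. 'G' );
print scalar @{ $schedule->[0] }, "\n";    # job 0 gets 3 of the 7 classes
```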
For those curious how I pulled this off, this is all subject to wild change, but surprisingly, I didn't have to do any monkey-patching of code. It works like this:
I use Parallel::ForkManager to create our jobs. For each job, I grab the schedule for that job number, and the test_classes and test_methods methods return only the classes and methods in the current job's schedule. Then I run only those tests, but capture the output like this:
my $builder = Test::Builder->new;
my $output;
$builder->output( \$output );
$builder->failure_output( \$output );
$builder->todo_output( \$output );
$self->runtests;
# $output contains the TAP
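Stripped down, the fork-and-capture dance looks something like the sketch below. The forks branch uses Parallel::ForkManager for this; my sketch uses core fork and a pipe per job purely to show the shape of the idea, and run_jobs is a name I made up.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Core-Perl sketch of the fork-and-capture pattern. Each child runs its
# job's tests, captures the TAP as a string, and ships it back to the
# parent over a pipe. (For large TAP chunks you'd read before reaping,
# or let Parallel::ForkManager's run_on_finish hand you the data.)
sub run_jobs {
    my ( $num_jobs, $run_job ) = @_;    # $run_job->($n) returns job $n's TAP
    my ( %tap_for, %pipe_for );
    for my $job ( 0 .. $num_jobs - 1 ) {
        pipe my $read, my $write or die "pipe: $!";
        my $pid = fork // die "fork: $!";
        if ( !$pid ) {                  # child: run the job, send TAP home
            close $read;
            print {$write} $run_job->($job);
            close $write;
            exit 0;
        }
        close $write;                   # parent keeps only the read end
        $pipe_for{$pid} = [ $job, $read ];
    }
    while (%pipe_for) {
        my $pid = wait;
        last if $pid < 0;
        my $entry = delete $pipe_for{$pid} or next;
        my ( $job, $read ) = @$entry;
        local $/;                       # slurp everything the child wrote
        $tap_for{$job} = <$read>;
    }
    return \%tap_for;                   # job number => captured TAP
}

# Toy "jobs" that each emit one line of TAP:
my $tap_for = run_jobs( 2, sub { "ok 1 - job $_[0]\n" } );
print $tap_for->{0};    # prints "ok 1 - job 0"
```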
Afterwards, if there are any sequential tests, I run them using the above procedure.
All output is assembled using the experimental TAP::Stream module bundled with this one. If it works, I may break it into a separate distribution later. That module allows you to combine multiple TAP streams into a single stream using subtests.

Then I simply print the resulting combined TAP to the current Test::Builder output handle (which defaults to STDOUT) and prove can read the output as usual.
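I won't show TAP::Stream's actual interface here (it's experimental and bundled with the forks branch), but the subtest trick itself is easy to sketch: indent each job's TAP four spaces, follow it with a summary ok line at the parent level, and finish with a plan. Something like this made-up combine_tap helper:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up helper, not TAP::Stream's API: merge several TAP chunks into
# one stream by turning each chunk into a subtest.
sub combine_tap {
    my @pairs = @_;    # list of [ $label => $tap_string ], in order
    my $out   = '';
    my $n     = 0;
    for my $pair (@pairs) {
        my ( $label, $tap ) = @$pair;
        $n++;
        ( my $indented = $tap ) =~ s/^/    /mg;    # child TAP, 4-space indent
        $out .= $indented;
        # Summary line for the subtest. A real implementation would emit
        # "not ok" here if the child stream contained failures.
        $out .= "ok $n - $label\n";
    }
    return $out . "1..$n\n";    # trailing plan for the parent stream
}

print combine_tap(
    [ 'job 0' => "ok 1\n1..1\n" ],
    [ 'job 1' => "ok 1\n1..1\n" ],
);
```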
Note that because we're merging the regular output, failure output, and TODO output into a single stream, there could be side effects if your failure output or TODO output resembles TAP (and doesn't have a leading '#' mark to indicate that it should be ignored).
Have fun and let me know what you think!