Single process versus parallel tests
Whenever I present a talk on Test::Class or one of its variants, invariably someone asks me about parallelization. The reason is simple: I advocate running your xUnit tests in a single process instead of multiple processes, but it's hard to run tests in parallel when they're already forced into a single process.
For Test::Class, this means using separate *.t tests for every test class versus using a single *.t test and Test::Class::Load.
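As a sketch of the single-*.t approach: Test::Class::Load's documented interface takes a list of directories to search, so one tiny driver file can replace hundreds of *.t files (the t/lib path and file name here are just illustrative conventions, not anything from the post):

```perl
# t/all_classes.t - a single .t file that drives every test class.
use strict;
use warnings;

# Test::Class::Load recursively finds and loads every test class
# under the listed directories; Test::Class's INIT block then runs
# them all in this one process.
use Test::Class::Load 't/lib';
```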
I am working on making parallel tests possible with Test::Class::Moose, and while I have test classes running in parallel, the confused is output (yes, that was deliberate). I know how to solve this using only publicly exposed APIs, but there are some tricky bits. I thought about asking for a TPF grant, but since most Perl developers don't use xUnit-style testing, the value seems marginal. Plus, I'm on the Board of Directors of The Perl Foundation, and that could look like a conflict of interest. Hence my slow work in this area.
That being said, it's worth doing the math and asking ourselves where we get the greatest gain.
The problem requires some explanation. Recently, while writing a proposal for a new contract (ah, the joys of being freelance), I found myself trying to explain, yet again, the trade-offs involved. So here's some backstory.
I was working on a large system that had approximately 500 *.t files. The system used Catalyst, Moose, DBIx::Class, many other CPAN modules and, of course, the massive code base itself, and it took three seconds to load everything. While many developers would be ecstatic at a system that "builds" in only three seconds, we weren't. Many of our tests spent an extra three seconds loading the code, plus extra time connecting to and disconnecting from the database, or performing other "one-shot" tasks that are repeated only because each *.t test runs separately.
Let's keep the math simple and assume that each test takes an average of 2 seconds to run after the load time. And we'll assume, for the sake of simplicity, that every test loads everything. In this case, the total test suite run time is:
$num_tests * ( $load_time + $run_time )
That's 2500 seconds, or almost 42 minutes. Not good! However, in theory you can get that test suite running in under 3 minutes. We'll explain that later, but practice and theory, of course, are not friends.
So now you decide that you need to speed up your test suite. Assuming that every test must load everything and the average run time won't change, you have two basic choices:
- Run all tests in a single process
- Run tests in parallel
Which of these is going to get you the greatest bang for the buck?
Well, that turns out to be tricky.
We'll take the easy route first. Let's say you want to go the single-process route. That involves using something like Test::Aggregate or Test::Aggregate::Nested (part of the former's distribution) to gradually move all tests over to an "aggregate" test directory. The theoretical run time for this is:
( $num_tests * $run_time ) + $load_time
The reason $load_time is only added once is that Test::Aggregate and friends run all tests in a single process, thus avoiding reloading the modules. As a result, your 42-minute test suite has now dropped to about 17 minutes. That's a huge win! Naturally, not all tests play nicely with this. Tests that damage global state play havoc with this approach, but in our experience, most tests can be run in a single process with a bit of fine-tuning. (In the real world, I generally find test suite run times dropping by one-half to two-thirds using this approach.)
You can also use Test::Class or one of its variants (mostly written by me, I must confess), but that's beyond the scope of this post.
If you want to run tests in parallel, you can read TAP::Parser::Scheduler to understand how to create a schedule. We're going to be very, very kind and assume that all tests take the same amount of time to run and can all be run in parallel. However, keep in mind what's actually happening. Let's say you have a wimpy box and you can only run two jobs in parallel. That means you have 250 tests per job, but those tests run sequentially and are loaded separately. Thus, your 42-minute test suite now runs in 21 minutes, which is actually slower than the single-process approach. The math looks like this:
( $num_tests / $num_jobs ) * ( $load_time + $run_time );
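If you just want to experiment with parallel runs using stock tools, TAP::Harness (which ships with the core Test::Harness distribution) accepts a jobs argument. This self-contained sketch generates two trivial test files in a temp directory and runs them concurrently; the file names and test bodies are made up for illustration:

```perl
use strict;
use warnings;
use TAP::Harness;
use File::Temp qw(tempdir);

# Write two trivial .t files so the example is self-contained.
my $dir = tempdir( CLEANUP => 1 );
for my $n ( 1, 2 ) {
    open my $fh, '>', "$dir/$n.t" or die "open: $!";
    print {$fh} "use Test::More tests => 1; ok(1, 'test $n');\n";
    close $fh;
}

# 'jobs' is how many tests TAP::Harness may run concurrently;
# verbosity -3 suppresses the usual per-test output.
my $harness    = TAP::Harness->new( { jobs => 2, verbosity => -3 } );
my $aggregator = $harness->runtests( sort glob "$dir/*.t" );
print $aggregator->all_passed ? "all passed\n" : "failures\n";
```

Each job is still a separate perl process, though, so every job pays the full load time; that is exactly the limitation discussed below.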
In reality, many of us today have multi-processor machines and can run more than just two jobs. However, the gain isn't as great as we might hope, because each job, for each test, must still load perl, the CPAN modules, and your code base before running the tests. It would be nice if that load could happen only once per job, but we'll come back to that in a moment.
If you can run 4 jobs, your test suite drops to about ten and a half minutes; with six jobs you get a seven-minute run, and with eight jobs just over five minutes (diminishing marginal returns, anyone?).
This sounds ideal, but in reality, parallelism causes all sorts of issues, such as lock contention and global state maintenance nightmares. In practice, what usually winds up happening is that you run most tests in parallel, but you have a set of tests which must run sequentially after the parallel tests, thus limiting much of the gain vis-à-vis single-process tests.
So in reality, given the amount of work it might take to make tests run in parallel, running them in a single process might actually be a better option.
But as Al Gore once said, "you win some, you lose some, and then there's the little-known third option."
I have a forking branch of Test::Class::Moose that runs tests in parallel. Currently, though, the various test classes spit out TAP that gets intermixed with output from other tests. My current plan is this:
- Track which classes have which tests
- Run the tests according to the schedule, using something similar to my prove progress bar hack
- Capture the test output using the publicly exposed API from Test::Builder
- Reassemble the output per class
- Print the output for Test::Harness to read
- Profit!
(Note that test reporting wouldn't work in that scenario, unless someone uses MooseX::Storage or something similar to recreate the report objects.)
In the future, it should be a simple matter of consuming the Test::Class::Moose::Role::Parallel role and using its naïve schedule or writing your own schedule() method.
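The scheduling half of that, at least, is plain Perl. A naïve round-robin schedule could be sketched like this; only the schedule() name comes from the post, and everything else here (the function signature, the class names) is hypothetical:

```perl
use strict;
use warnings;

# A naive round-robin scheduler: split test class names into
# $num_jobs batches of roughly equal size. One batch would go
# to each forked job.
sub schedule {
    my ( $num_jobs, @test_classes ) = @_;
    my @jobs;
    my $i = 0;
    push @{ $jobs[ $i++ % $num_jobs ] }, $_ for sort @test_classes;
    return @jobs;
}

my @jobs = schedule( 2, map {"TestsFor::Class$_"} 1 .. 5 );
print scalar @jobs, " jobs\n";    # 2 jobs
print "@{ $jobs[0] }\n";  # TestsFor::Class1 TestsFor::Class3 TestsFor::Class5
```

A real schedule would weight batches by historical run time rather than class count, but round-robin is a reasonable first cut.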
What this would do is let you parallelize your test suite while loading perl, your CPAN modules and your code base only once per job. Running 6 jobs for the hypothetical 42-minute test suite above, you could run the test suite in under three minutes! Can you imagine how productive your team could be if your test suite were over ten times faster? Unfortunately, that's a fair amount of work, and given the time I'm spending chasing contracts and spending time with my wife and daughter, I simply haven't had the time to get to it. However, I'm sure someone could (hint, hint).
Please note that I've made it clear repeatedly that the above data is for a hypothetical scenario that loosely models a real-world problem I solved (I got an hour-plus test suite down to about 12 minutes). If you want to play around with my assumptions, here's the code I used to create the timing information:
#!/usr/bin/env perl

use strict;
use warnings;
use 5.10.0;
use Getopt::Long;

# Yes, this works, but damn, it looks crazy to some people.
GetOptions(
    'load_time=i' => \( my $load_time = 3 ),
    'run_time=i'  => \( my $run_time  = 2 ),
    'num_jobs=i'  => \( my $num_jobs  = 6 ),
    'num_tests=i' => \( my $num_tests = 500 ),
);

my @args = ( $load_time, $run_time, $num_tests );

say sprintf "Standard .t tests: %s"  => minutes( standard(@args) );
say sprintf "Single process: %s"     => minutes( single_process(@args) );
say sprintf "Multiple processes: %s" =>
  minutes( multiple_processes( $num_jobs, @args ) );
say sprintf "Test::Class::Moose forks: %s" =>
  minutes( test_class_moose_forks( $num_jobs, @args ) );

sub minutes {
    my $seconds = shift;
    return sprintf "%d minutes %d seconds" => int( $seconds / 60 ),
      $seconds % 60;
}

sub standard {
    my ( $load_time, $run_time, $num_tests ) = @_;
    return $num_tests * ( $load_time + $run_time );
}

sub single_process {
    my ( $load_time, $run_time, $num_tests ) = @_;
    return ( $num_tests * $run_time ) + $load_time;
}

sub multiple_processes {
    my ( $jobs, $load_time, $run_time, $num_tests ) = @_;
    $num_tests = int( $num_tests / $jobs );
    return standard( $load_time, $run_time, $num_tests );
}

sub test_class_moose_forks {
    my ( $jobs, $load_time, $run_time, $num_tests ) = @_;
    $num_tests = int( $num_tests / $jobs );
    return single_process( $load_time, $run_time, $num_tests );
}
Obviously it's too simplistic and doesn't account for real-world parallel schedules or any number of other caveats, but it does show the general concepts. Here are the numbers for the data above, run with --num_jobs=4:
Standard .t tests: 41 minutes 40 seconds
Single process: 16 minutes 43 seconds
Multiple processes: 10 minutes 25 seconds
Test::Class::Moose forks: 4 minutes 13 seconds
Note: you may be able to get most of these benefits today using Test::Class::Moose and TAP::Harness::Remote, but that does require extra servers.
Steven at work has suggested App::Forkprove https://metacpan.org/release/forkprove
I use forkprove almost all the time. Yet there are one or two of my distributions that fail under forkprove, and not due to concurrency issues: by default, forkprove runs tests serially.