Eating my own dogfood (parallel tests)
I'm relatively pleased with my work in creating parallel testing for Test::Class::Moose, but I wanted to make sure it worked with a real-world example. So today I took a small but real test suite, converted it, and tried out my parallel testing code. The results were interesting.
The tests were for a personal project of mine that I've hacked on for a while and they were originally written using Test::Class::Most. The test suite is small and has a total of 24 test classes, 53 test methods and 469 tests. It takes around 8 seconds to run on my box. That's very small, but real.
The code is a standard Catalyst, DBIx::Class, Template, and Moose stack. I consistently found that just loading the core modules takes about .5 seconds on my iMac and about a second on my MacBook Pro.
$ time perl -MCatalyst -MMoose -MDBIx::Class -MDateTime -e 1
real 0m0.483s
user 0m0.455s
sys 0m0.025s
For 24 test classes, that would add about 12 seconds if I ran them in separate processes. So I expected that my test suite would now take a little over 20 seconds to run.
Boy was I wrong. You see, considering class loading time isn't enough. You also have to consider that when you actually use (not just load) those classes, they do a lot of things internally that don't always need to be done more than once. Here, in a real test suite, running the test classes with separate *.t files ballooned the run time of the test suite from 8 seconds to 57 seconds. In other words, running separate *.t files slowed the test suite by a factor of seven.
(Note: because I don't use inheritance in my code, the above slowdown is not due to accidentally duplicated tests).
After seeing that, I converted the tests to use Test::Class::Moose. I had expected, given the overhead of Moose, to see the tests run a bit slower. That was a concern because I could see people pointing to that and saying "Moose is too slow for production work" (yes, I still hear this). Instead, the test suite ran marginally faster. It wasn't much of a gain, but it was consistently about half a second faster. I was very surprised, not to mention pleased!
Plus, because I could now use roles, some common code was easy to refactor out into a role to share a test fixture (the slight runtime gain was with the roles, I might add).
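As a sketch of what I mean by sharing a fixture through a role, here is the general shape; the role name, attribute, and schema class below are invented for illustration, not the actual code from my suite.

package TestsFor::Role::WithSchema;

# Illustrative only: a Moose role that provides a shared fixture to any test
# class that consumes it.
use Moose::Role;
use MyApp::Schema;    # placeholder for whatever schema class the suite uses

has schema => (
    is      => 'ro',
    lazy    => 1,
    builder => '_build_schema',
);

sub _build_schema {
    # however the suite normally gets at its DBIx::Class schema
    return MyApp::Schema->connect('dbi:Pg:dbname=myapp_test');
}

1;

A test class (or the base class) then just adds with 'TestsFor::Role::WithSchema'; and gets the fixture for free.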
Next, it was time to see how parallel testing did.
I was pretty careful when I designed the original test suite to ensure that it would run in parallel when I could figure out how to make that happen. I did that with the following test control methods in my base class:
sub test_setup {
    my $test = shift;
    $test->app->schema->txn_begin;
}

sub test_teardown {
    my $test = shift;
    $test->app->schema->txn_rollback;
}
With that, every test method will run in its own transaction. It was a simple matter of adding the Test::Class::Moose::Role::Parallel role to my base class and running my tests:
prove -l t/tcm.t :: -j 2 # using Getopt::Long to capture the number of jobs
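For context, here's roughly what that t/tcm.t driver looks like. The runner interface has changed across Test::Class::Moose releases (newer versions use Test::Class::Moose::Runner), so the constructor argument for the job count is an assumption here rather than gospel.

# t/tcm.t -- rough sketch of the driver script
use strict;
use warnings;
use Getopt::Long;
use Test::Class::Moose::Load 't/lib';    # load every test class under t/lib

# prove passes everything after "::" to the test script, so "-j 2" shows up
# in @ARGV and Getopt::Long can capture it
GetOptions( 'jobs|j=i' => \my $jobs );

Test::Class::Moose->new( jobs => $jobs // 1 )->runtests;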
That only brought the test suite down to 6 seconds (from 8). For a test suite this small, the benefit of forking off extra processes is marginal, at best. Of course, over time, transaction contention via locked tables may prove to be an issue, too.
I stepped it up to 4 and 8 jobs and got the same results, but with intermittent test failures. Hmm.
I added -v to make my tests verbose and realized my error. One of my classes, TestsFor::App::DBIx::Migration, requires a test database without a certain table. Oops! So I added a noparallel tag to all tests that could not safely run in parallel. For example:
sub test_migrate : Tags(noparallel) {
    my $test = shift;
    my $m    = $test->migrator;

    for my $level (qw/1 2 1 0 2 0/) {
        my $old = $m->version // 0;
        ok $m->migrate($level),
            "We should be able to migrate from level $old to level $level";
        is $m->version, $level,
            '... and have our database at the correct level';
    }
}
And then in my base class, I had to write my own schedule:
with qw(
    Test::Class::Moose::Role::Parallel
    Test::Class::Moose::Role::AutoUse
);
use aliased 'Test::Class::Moose::TagRegistry';

# skip some code

sub schedule {
    my $self   = shift;
    my $config = $self->test_configuration;
    my $jobs   = $config->jobs;

    my @schedule;
    my $current_job = 0;
    my %sequential;

    foreach my $test_class ( $self->test_classes ) {
        my $test_instance = $test_class->new( $config->args );

        METHOD: foreach my $method ( $test_instance->test_methods ) {
            if ( TagRegistry->method_has_tag( $test_class, $method, 'noparallel' ) ) {
                $sequential{$test_class}{$method} = 1;
                next METHOD;
            }
            $schedule[$current_job] ||= {};
            $schedule[$current_job]{$test_class}{$method} = 1;
            $current_job++;
            $current_job = 0 if $current_job >= $jobs;
        }
    }
    unshift @schedule => \%sequential;
    return @schedule;
}
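To make the resulting data structure concrete, here's roughly what schedule() returns with two jobs. Only the Migration class is from the actual suite; the Foo and Bar classes and their methods are invented for the example.

# Illustrative schedule for jobs => 2: methods are dealt out round-robin
# across the job buckets, and anything tagged "noparallel" is collected into
# a single hashref that gets unshifted onto the front of the list.
my @schedule = (
    { 'TestsFor::App::DBIx::Migration' => { test_migrate => 1 } },    # noparallel bucket
    {                                                                  # first job
        'TestsFor::App::Bar' => { test_render => 1 },
        'TestsFor::App::Foo' => { test_save   => 1 },
    },
    {                                                                  # second job
        'TestsFor::App::Foo' => { test_load => 1 },
    },
);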
And then my tests blew up due to a bug in the Test::Class::Moose forks branch. I fixed a couple of issues and then pushed the fixes.
Now I can safely run all of my tests in parallel, and it will be more of a win as the test suite grows. If a test can't run in parallel, I just add the noparallel tag and forget about it.
Interestingly, chromatic wrote about his preference for One test class per file. I didn't comment there as he's had to disable comments due to blog spam, so I'll comment here.
chromatic wrote that he prefers separate test drivers per test class because he likes:
- The ability to run an individual class's tests apart from the entire suite
- The knowledge that each test's environment is isolated at the process level
For the first, it's because he doesn't like to type this to run an individual test class:
prove -l t/test_class_runner.t :: Name::Of::Class::To::Run
He states that this is laziness and concedes that it's not that big of a deal. For me, with my mappings in vim, I never notice this. I just hit ,t and run the individual class.
His next concern is the more serious one (and is the most valid objection):
Second--and this is more important--I like the laziness of knowing that each individual test class I write will run in its own process. No failing test in another class will modify the environment of my test class. No garbage left in the database will be around in my process's view of the database. Maybe that's laziness on my part for not writing copious amounts of cleanup code to recover from possible failures in tests, but it is what it is.
I can understand that concern, and I wondered why people using JUnit don't seem to worry about this. Then I realized that Java is far less of a dynamic language and the quick 'n easy hacks we use to just get stuff done are less common.
I don't have to worry about garbage in the database due to my use of transactions and I generally avoid nasty hacks that impact global state. Maybe it's just me, but speeding up my test suite by a factor of seven seems like enough of a win that I'm willing to pay the price. Plus, if my application is naughtily munging state, running my tests in separate processes is less likely to catch that, but running them in the same process increases the odds of finding that tremendously.
So far, everything here appears to be a huge win. Test::Class::Moose is shaping up to be (in my humble opinion) the best testing framework for Perl. Roles make it easy to share fixtures. Running tests in a single process is a huge win for performance. Running tests in parallel works, but it remains to be seen what the impact will be in the long run.
I'm currently converting a test suite to use TCM and your example of the transactional approach is pure genius. What happens when there's a transaction within a transaction, e.g. when the application code already uses transactions? Thank you
Ovid, very nice. I have some questions regarding the use of transactions for your tests and making sure not to step on any other tests. Have you used DBICx::TestDatabase and, if so, is there a reason for using transactions instead of it? Does using the TestDatabase module add too much time to make it impractical? I'm wondering only because I've started to use it for my projects and I'm curious if I should switch to transactions vs using an in-memory database.
Very interesting. Thanks for posting this.
Roman, if your application code already uses transactions, you'll have to deal with that on a "per app" basis because there are simply too many variables: how it's used, which database you have, whether nested transactions are allowed, and so on. The main point is catching that prior to the database handle committing, to ensure you have test isolation, or else using a separate test database that you can throw away, though there's been some discussion on Perl QA about allowing commits to the test database on the theory that tests need to deal with "real" data the same way the application does. I was dead set against that at first, but am rethinking the idea now.
Joel, I would never use DBICx::TestDatabase because it hard-codes the SQLite driver. I know many people say "use SQLite for a test database", but frankly, I find that if I'm testing on a different database than the one I'm using, I'll get too many divergent behaviors, particularly for larger test suites for code bases where the devs are really taking advantage of the database. For example, here's a custom domain I've created for my PostgreSQL database:
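Something along these lines, to illustrate the idea (the domain name and check constraint below are invented for the example rather than taken from my actual schema):

use strict;
use warnings;
use DBI;

# connect to the test database (DSN and credentials are placeholders)
my $dbh = DBI->connect( 'dbi:Pg:dbname=myapp_test', '', '', { RaiseError => 1 } );

# a domain is a named, constrained type ...
$dbh->do(<<'SQL');
CREATE DOMAIN email AS TEXT
    CHECK ( VALUE ~ '^[^@]+@[^@]+$' )
SQL

# ... which can then be used as a column type like any built-in type
$dbh->do(<<'SQL');
CREATE TABLE customer (
    customer_id SERIAL PRIMARY KEY,
    email       email NOT NULL
)
SQL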
I can use that just like a brand-new datatype in my tables and if I fat-finger my business logic, my database catches it for me. Can't do that too easily in SQLite.
Otherwise, I've often considered having another process create a "pool" of test databases that other workers can check out and discard. However, my experience (at least for larger MySQL databases) is that the overhead of recreating a database is significant. For the BBC, on one system I worked on, I lowered the test suite run time from 80 minutes to 22 minutes just by avoiding dropping and recreating the test database for every test file.
Also, isolation, in this case, should be done per test method, not per test class, because you don't want the results of one method to potentially impact the behavior of another. Do you really want to drop and recreate the database for every test method?
Note that there are no perfect solutions here and many of our struggles are simply adjusting to artifacts of slow computers and databases.
Based on experience on a relatively small project, using transactions is faster than recreating the database for every single test. Dropping/creating a database is quite a bit faster on an SSD or in a ramdisk than on an HDD.
For code that already uses transactions, I create a separate test database. That is, all the regular tests operate on the same database with transactions (at the beginning of the test I start a transaction, at the end I do a rollback), except for the special ones which use transactions themselves. Those each operate on a separate database and do not have the transaction/rollback at the start/end of the test file. So far it is working quite well for me.
I have figured it out. MySQL supports nested transactions via savepoints; you can set "auto_savepoint => 1" on the DBIx::Class connection to enable that functionality.
More info: https://metacpan.org/pod/DBIx::Class::Manual::Cookbook#Nested-transactions-and-auto-savepoints
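For reference, enabling that looks roughly like this; the schema class, DSN, and credentials below are placeholders.

use MyApp::Schema;    # stands in for your DBIx::Class schema class

# With auto_savepoint set, nested txn_begin/txn_rollback calls in DBIx::Class
# are implemented with SAVEPOINT / ROLLBACK TO SAVEPOINT, so a failed inner
# transaction doesn't throw away the outer transaction's work.
my $schema = MyApp::Schema->connect(
    'dbi:mysql:dbname=myapp_test',
    'username',
    'password',
    {
        RaiseError     => 1,
        auto_savepoint => 1,
    },
);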