Speeding up the test suite with subtests
The BBC team I'm currently working on has a very, very slow test suite. On my box, it generally took about 2h10m to complete. Between us, Johan Lindström and I have shaved half an hour from that. Johan's part used transactional savepoints, which allowed us to roll back some database changes rather than rebuild the database tables. My part involved subtests and a very strange use of Test::Class.
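For the curious, the savepoint trick looks roughly like this (a hypothetical DBI sketch, not Johan's actual code; it assumes a database with SAVEPOINT support, such as PostgreSQL, and a handle with AutoCommit off):

use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=test_db', 'user', 'password',
    { RaiseError => 1, AutoCommit => 0 },    # one big transaction
);

$dbh->do('SAVEPOINT before_test');
# ... the test mutates some tables here ...
$dbh->do('ROLLBACK TO SAVEPOINT before_test');    # undo the changes
                                                  # instead of rebuilding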
The first thing I did was instrument Test::Class to tell me:
- The length of time each test class takes to run
- The number of test methods each test class runs
- The length of time each setup takes to run
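That instrumentation looked roughly like this (a minimal sketch, not our production code; My::Test::Class is a hypothetical shared base class, and isolating pure setup time takes a little more plumbing than shown here):

package My::Test::Class;
use strict;
use warnings;
use base 'Test::Class';
use Test::More;
use Time::HiRes qw(time);

my %stats;

sub _time_class : Test(startup) {
    my $test = shift;
    $stats{ ref $test }{class_start} = time;
}

sub _time_method : Test(setup) {
    my $test = shift;
    # Runs before each test method; the elapsed time up to teardown
    # includes the subclass's own (possibly expensive) setup.
    $test->{_method_start} = time;
    $stats{ ref $test }{methods}++;
}

sub _stop_method : Test(teardown) {
    my $test = shift;
    $stats{ ref $test }{method_time} += time - $test->{_method_start};
}

sub _report : Test(shutdown) {
    my $test  = shift;
    my $class = ref $test;
    my $s     = $stats{$class};
    diag sprintf '%s: %.1fs total, %d test methods',
        $class, time - $s->{class_start}, $s->{methods} || 0;
}

1;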
Multiplying the setup time by the number of test methods is critical, because that shows where you can get interesting savings. For example, I found that Dynamite::Test::Block::Fetcher::Recommendations took 256 seconds to run 16 test methods. As it turned out, the setup time for each of those was roughly 12 to 14 seconds; at about 13 seconds a method, setup alone accounted for over 200 of those 256 seconds. The tests themselves were read-only. As a result, if I put all of the tests into a single test method, I could save a huge amount of time.
The problem with this is how our tests are named: each test method covers a particular bit of functionality, and we did not want to lose that granularity. But we didn't have to. I immediately thought of subtests, from Test::More in the Test-Simple distribution (http://search.cpan.org/dist/Test-Simple/), and converted the above to something like:
sub recommendations : Tests {
    my $test = shift;

    subtest basic_fetcher   => sub { ... };
    subtest personalisation => sub { ... };
    subtest alba_cymru      => sub { ... };
    # and so on
}
In fact, I wrote a vim macro which handled most of that for me.
Non-verbose output is the same. Verbose output is similar to:
# Dynamite::Test::Block::Fetcher::Recommendations->recommendations
    ok 1 - Four blocks back.
    ok 2 - We don't get the block we based our search on back.
    ok 3 - No non-childrens cats came back.
    1..3
ok 1 - alba_cymru
    ok 1 - fetcher object isa Dynamite::Interface::Block::Fetcher::Recommendation
    1..1
ok 2 - basic_fetcher
    ok 1 - only one block
    ok 2 - episode from choicestream isa Dynamite::Interface::Block::Episode
    ok 3 - rec id is correct on blocklist
    1..3
ok 3 - personalisation
    ok 1 - New blocklist used isa Dynamite::Interface::BlockList::Recommendations
    ok 2 - only one block
    ok 3 - is_personalised is set
    ... and so on
We still have our granularity, but because we're only running a single test method, our time has dropped from 256 seconds down to 36. After applying this to enough test classes, I shaved another 15 minutes from the test suite.
This really isn't the sort of fundamental improvement we'd like to see in our test suite's performance, but every little bit helps.
Your title gave me an idea, although it might not be a good idea (or if it is, it might not work for you).
My normal process of testing involves getting all the little bits right in unit tests, building up integration tests based on those, and so on up to acceptance tests.
I wonder what would happen if I changed that around so I ran the integration tests first, and only ran associated unit tests when those fail.
This doesn't mean that I'd always do it that way, but that I'd have some mechanism that lets me. The nested TAP would work nicely for that, I think. If the first nested test passes, no worries, move on. If it fails, start drilling down into tests to see what low-level bit is failing.
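A toy sketch of that drill-down idea (everything below is invented; it just relies on subtest's documented behaviour of returning true when the subtest passes):

use strict;
use warnings;
use Test::More;

# Invented stand-in for a real integration point.
sub fetch_recommendations { return { blocks => [ 1, 2, 3 ] } }

my $passed = subtest 'integration: fetch recommendations' => sub {
    my $result = fetch_recommendations();
    is scalar @{ $result->{blocks} }, 3, 'three blocks back';
};

unless ($passed) {
    # Only on failure do we pay for the fine-grained unit checks,
    # to find which low-level bit broke.
    subtest 'unit: fetcher returns a result' => sub {
        ok fetch_recommendations(), 'fetcher returned something';
    };
}

done_testing;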
This makes me think about a tree of tests where we can test any part of the tree. We have to do a lot of human work to construct the tree (like, you know, plan things and so on), but then testing in the large might be much more flexible.
Oh, this is giving me a lot of other interesting ideas I don't have time to type at the moment.
brian: that sounds fascinating. I like where you're going with this. Hope something interesting comes out of this :)
Great ideas, both of you. Having higher-level tests representing multiple lower-level tests would definitely enable faster builds.
We'll need additional tests to notify us when a higher-level test passes despite a failure in one of its corresponding lower-level tests.
Thus, it would still be important to run the entire set of tests as often as we are practically able to.
Okay, the tree idea isn't going to work, because multiple higher-level tests might all depend on the same lower-level tests. I think that means it's not a tree in the mathematical sense, but a more general graph. I think we could do it without cycles, but I'm not sure how we'd be able to enforce that.
Curiously, I also remembered that I've already programmed such a system. It's called Brick. It's no good for this testing issue, though.
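On brian's question about enforcing "no cycles": one standard trick is a depth-first search over the dependency graph that dies on a back edge. A self-contained toy (the graph below is invented):

use strict;
use warnings;

my %depends_on = (
    'integration.search' => [ 'unit.fetcher', 'unit.blocklist' ],
    'integration.recs'   => ['unit.fetcher'],    # shared dependency: a DAG, not a tree
    'unit.fetcher'       => [],
    'unit.blocklist'     => [],
);

sub assert_acyclic {
    my ($graph) = @_;
    my %state;    # undef = unvisited, 1 = in progress, 2 = done
    my $visit;
    $visit = sub {
        my ($node) = @_;
        die "cycle detected at $node\n" if ( $state{$node} // 0 ) == 1;
        return if ( $state{$node} // 0 ) == 2;
        $state{$node} = 1;
        $visit->($_) for @{ $graph->{$node} // [] };
        $state{$node} = 2;
    };
    $visit->($_) for keys %$graph;
    return 1;
}

assert_acyclic( \%depends_on );
print "dependency graph is acyclic\n";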