Why Are Fast Tests Important?
I recently made the following comment on Twitter:
EVERY tech company I've worked with has had a test suite broken in some fundamental way, even if all tests pass. No exceptions.
Unfortunately, Twitter is not a great medium for exchanging ideas like this, but when challenged on this by one person, I mentioned:
Code + path coverage? Test suite runtime. Duplicate code. Procedural tests for OO code? Cut-n-paste? Skipped tests? Organization?
There's a huge amount to cover there, but I want to touch on test suite runtime for just a moment.
One presentation I gave a long time ago was about Turbo-charged test suites.
The information in that is somewhat out of date and also assumes you're limited to a single process (something I'm planning on fixing with Test::Class::Moose). That being said, the idea is still very relevant.
A case in point is the BBC. When I first joined them as a Senior Software Engineer, I joined the PIPs team. Before I go further, I should explain something about this team: they were awesome. There is not a single person on that team I wouldn't be happy to have working with me. They were very talented and conscientious developers. However, like the proverbial story of the frog in water whose temperature is slowly increasing, they had allowed their test suite runtime to increase to one hour and twenty minutes. One developer, Rob, had (IIRC) done some great work to improve their test suite runtime, but it was still too slow. What's worse, some tests were failing and the test suite also spit out many warnings, some of which obscured real bugs in their software.
So let's say you fix the failing tests and eliminate the warnings (and the related bugs). Is a test suite that slow acceptable?
Here's the real problem. I'm a conscientious developer and in my last contract, I did my tasks in a separate branch and ran the tests for what I was working on. When I was done, I'd rerun the full test suite. Then I'd rebase my branch on top of the integration branch and rerun the test suite (still in my branch). Then I'd merge my code back into integration and rerun the test suite, just to be paranoid. For every finished task, I'd run the full test suite at least three times. If the test suite took an hour and twenty minutes to run, that's four hours per task, over half a working day, gone on rerunning that test suite.
At the BBC, in theory, because the test suite took so long to run, many developers ran that hour+ test suite only once, when they merged their code into trunk, rather than the two or three times minimum that you would expect.
So let's say you can accomplish five tasks in a two-week sprint. That's, at minimum, five test suite runs. That's almost seven hours of waiting on the test suite, or a full day of work lost. Sure, you could go to meetings (if they're conveniently scheduled when you need to run the test suite), read documentation (until your eyes glaze over), or interrupt other developers who are, well, developing, to chat about technical problems, but there's only so much unscheduled work you can actually accomplish during the test suite runs.
But what actually happens is that you don't stop reading Facebook the second the test suite finishes (I remember the BBC sending out an email asking employees to take it easy on Facebook while they upgraded their network capacity). You keep reading it for a while, responding, and then you notice your test suite has finished. So in reality, you're losing more than that full day of work per sprint.
That's a day per sprint, per developer. Do you have five devs? Congratulations, you've just thrown away a week of development time for that sprint. What could you do if you had an extra week of development time per sprint?
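The arithmetic is easy to check. Here's a minimal Python sketch using only the figures from this post (an eighty-minute suite, five tasks per sprint, five developers). The function and variable names are made up for illustration, and it assumes every run passes the first time, so treat the results as a lower bound:

```python
SUITE_MINUTES = 80      # one hour and twenty minutes, from the post
TASKS_PER_SPRINT = 5    # five tasks in a two-week sprint, from the post
DEVELOPERS = 5          # team size used in the post's example


def hours_waiting(runs_per_task, tasks=TASKS_PER_SPRINT, suite_minutes=SUITE_MINUTES):
    """Hours one developer spends on full suite runs in a single sprint."""
    return runs_per_task * tasks * suite_minutes / 60


one_run = hours_waiting(1)        # suite run only once per task, at merge time
three_runs = hours_waiting(3)     # run after finishing, after rebase, after merge
team_one_run = one_run * DEVELOPERS

print(f"{one_run:.1f} hours per developer per sprint (one run per task)")
print(f"{three_runs:.1f} hours per developer per sprint (three runs per task)")
print(f"{team_one_run:.1f} hours per team per sprint (one run per task)")
```

Even the single-run case costs the team about thirty-three hours per sprint, which is where the "week of development time" comes from.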
No, I lied. The above math only works if the full test suite run always passes. Of course it doesn't. The developers need to fix the failures and rerun the test suite, so you're wasting more time than you thought. By not allowing your developers to improve their test suite time, you're burning money. You're throwing it away. And you're annoying your developers. How much money does lost morale cost?
So I fixed the problem and got most test suite runs down to under twenty minutes. In fact, in one sample run (with code that didn't get merged), I got the test suite down to twelve minutes, and that was with more tests, not deleting them!
So how did a talented, conscientious team of developers get to an eighty-minute test suite with broken tests and lots of warnings?
They got there because they were conscientious. They knew they needed to keep pushing their product forward and they didn't want failed sprints, so for many small changes, they'd run a subset of hopefully relevant tests and hope they didn't break things, merge their code and move on to the next task. "Fixing" their test suite simply wasn't something they were really given the license to do. To be fair, I wasn't either. I just went ahead and did it (it's often better to ask forgiveness than permission).
As an aside, if you're not allowed to improve your test suite's performance and find that running a relevant subset of tests is your only option, check out Johan Lindström's Devel::CoverX::Covered module. Near the end of my Turbo-charged test suites presentation, I present vim mappings that allow you to use Devel::CoverX::Covered to see exactly which test files test the code you're actually editing, allowing you to run only the tests which are likely to break.
Oh, and hire me to fix your test suite! (ovid at cpan dot org).
Great article Ovid - thank you.
I would love to read a little more on how you set up / tear down databases to avoid using transactions or drop / recreate - and why.
That's another article unto itself :) There are a variety of trade-offs in using transactions. One of the biggest is that test failures leave the data an impenetrable black box. However, if you go the multi-process route for testing, you want the isolation of transactions. I'll try to write more later when I have more time.
Yeah, I'd love to read more details about your setup. I'm working on a smaller project, and right now my tests run in 1 min (we are at the beginning of the project), but I feel like most of that is setting up the database. Right now I'm using MySQL (the database is in tmpfs in RAM, which gave a nice speed boost). I drop the whole DB once in the very first test; after that I truncate all the tables and reload my DBIC fixtures. Making tests run in parallel is also an interesting idea.