Your Test Suite is Broken

Here are some indicators that your test suite is broken:

  • It has any failing tests.
  • It emits anything other than test information.
  • It takes too long to run.

People are going to have issues with the second item on that list but they're definitely going to argue with that last one. I don't care. It's time to throw the gauntlet down. Let's look at these three points in detail.

Any failing tests

If a test suite has any failing tests, even "known" failures, developers learn to overlook failing tests. Those who've been testing for any length of time will generally agree with this. It's a bad thing. In fact, it's enough of a bad thing that every developer (well, except one) that I've spoken with agrees that leaving broken tests is bad, so I won't belabour this point.
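
As an aside, if a failure genuinely can't be fixed right away, Perl's Test::More at least lets you record it as a TODO test instead of leaving the suite red: the harness keeps passing, the test is flagged as an expected failure, and it gets called out if it ever unexpectedly starts passing. A minimal sketch (the test itself is made up):

    use Test::More tests => 2;

    ok( 1, 'the part that works' );

    TODO: {
        local $TODO = 'known failure, tracked in our bug queue';

        # This fails, but because it's inside a TODO block the harness
        # reports it as an expected failure rather than breaking the suite.
        ok( 0, 'the part we know is broken' );
    }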

Any "non-test" output

This one throws a few people off and I confess I'm guilty of it myself; it's easy to shrug at a warning which is hard to suppress. However, like the "any failing tests" rule, this should be dealt with quickly. Specifically, if you get accustomed to output which isn't test related, sooner or later some of that non-test output will indicate a real bug, and you'll have learned to ignore it. I still remember cleaning up a test suite warning which had existed for months, only to discover that the warning was indicative of a bug in the test program, making those tests useless.
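
In Perl, one cheap way to enforce this is to make stray warnings fail the test file itself, for example with Test::NoWarnings (a rough sketch; note that the extra check it adds has to be counted in the plan):

    use strict;
    use warnings;
    use Test::More tests => 3;
    use Test::NoWarnings;    # adds a final test that fails if any warning fired

    ok( 1, 'first real test' );
    ok( 1, 'second real test' );

    # If anything in the code under test had called warn(), the third,
    # automatically added test would now fail instead of the warning
    # quietly scrolling past in the test output.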

Long running tests

People will agree that this is annoying, but they won't necessarily agree that their test suite is broken. I argue that it is. I strongly argue that it is. The first problem is the "downtime" problem:

[Image: Enterprise Perl]

Sure, many argue "but the developer can choose another task to handle at that point", but it's often very hard to find a meaningful task that only takes an hour or two. Plus, many developers are simply going to wander off for a bit. If you expect seven hours of productivity a day and you have seven developers who run their full one-hour test suite once a day, you've lost a day of work every day. It's like having six developers!

No, I'm lying. It's like having fewer than six developers, because they won't come back the second the test suite is done, and even when they do, they won't be mentally rarin'-to-go. A single failure means rerunning that suite, so that's more time lost again. Long-running test suites cause developers to lose productivity, and serious amounts of it. (I really should start tracking this at work so I can show management the costs involved. They're staggering!)
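
Here's the kind of back-of-the-envelope arithmetic I mean; every number below is an assumption, so plug in your own:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $developers      = 7;     # people running the full suite
    my $runs_per_day    = 1;     # full runs per developer per day
    my $minutes_per_run = 60;    # wall-clock time of the suite
    my $refocus_minutes = 15;    # guess at time lost getting back up to speed

    my $lost_minutes = $developers * $runs_per_day
                     * ( $minutes_per_run + $refocus_minutes );

    printf "About %.1f developer-hours lost per day (%.2f seven-hour days)\n",
        $lost_minutes / 60, $lost_minutes / ( 60 * 7 );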

So what happens then? Forget for a moment what you think should happen. I think employers should double my paycheck every year. Ain't gonna happen. Instead, what happens is reality. Even if management is not pushing the programmers to be more productive, the developers themselves will want to be more productive and they've found a trick to make this happen. I personally have experienced this trick at five teams across three companies with long-running test suites. I've had many developers in other companies tell me that their teams use the same trick themselves (remember, I'm "that testing guy". People talk to me about this stuff all the time).

Here's the trick: don't always run the full test suite.

Remember: I've had this happen at every company I've worked at with long-running test suites, and many developers have told me their teams do the same thing. They don't always run those tests. Forget what "should" be done. This is reality. When you're under pressure to deliver, the deadlines are looming, and you just know that X only affects Y, that X is well-tested, and that you can slip this in really quickly without anyone noticing, it's easy to commit now rather than wait an hour or two. I've seen it eventually happen to every team with a long-running test suite. I'm sure there are exceptions, but I've never seen one first-hand. Sooner or later a failing test is going to creep in.

On my current team, you can't commit to trunk unless you have a stuffed camel on your desk. One of the most common questions on our team is "who has the camel?"

Possibly the next most popular question is "do you mind if I commit this small change to trunk anyway? It doesn't affect anything."

It doesn't affect anything? Really? So why is it being committed? Of course, I've asked this question, too.

Mind you, these aren't lazy developers. These aren't bad or unconscientious developers. These are developers who have deadlines and have to get things done. They have to make individual value calls on whether or not the risk is worth the reward and usually they're right. When they're wrong, the effect cascades across everyone's work. As a result, I've lost a huge amount of time today trying to debug test failures that I didn't realize were in trunk. This is not the first time this has happened to me.

Maybe your team is different. Maybe your developers are so incredibly careful and meticulous that your three-hour test suite is run every single time it should be. Maybe your developers are so conscientious that they immediately find a task which should take two hours and fifty minutes when they run that three hour test suite. And maybe your developers are so anal-retentive that I'd want to hang myself after working with them for more than a week.

For the rest of us, a long-running test suite means a significant loss in productivity and a drop in quality. I've seen this too many times to think that my experiences are anomalous. There's a lot of interesting stuff which needs to be done around testing, but I think more stuff needs to be done around speeding up test suites. For the vast majority of Perl users, this is not a problem. For those of us suffering from this problem, it's pretty damned serious. I've done a lot of work in this area, but more needs to be done. Much more.

15 Comments

Does a long-running test suite imply that your system needs to be broken up into components (sub-projects)? The hope being that the component you are currently working on can have its own (smaller) set of low-level tests and that the other components with which it interfaces can be treated as black boxes? This is sort of making "not running all the tests" official by creating a clear boundary between unit testing and system testing.

I think it's a bit too black and white to suggest that a test suite that takes too long to run is "broken". If you have a large complex system it's going to take a long time to test.


In my $work project our test suite takes around 40 minutes to run. Obviously that means a developer can't run all the tests on every commit but we've addressed that in two ways.


Our test suite is modular. We have a tree of test directories and a developer can run individual tests, a directory full of tests or a subtree of directories of tests. This would typically be done as part of getting a code change ready to commit.
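
(For what it's worth, the stock prove utility handles exactly this kind of slicing; the paths below are made up:)

    # run one test file, one directory, or a whole subtree
    # (-l adds lib/ to @INC, -r recurses into subdirectories)
    prove -l  t/orders/basic.t
    prove -lr t/orders/
    prove -lr t/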


We use continuous integration. Nothing fancy, we just have a script running on a dedicated test 'server' looping through the whole test suite repeatedly. When the tests pass, the script builds a set of .deb packages and deploys them to our staging servers.
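
In outline, the script is no more complicated than something like this (the 'svn up' step and the packaging command are simplified stand-ins):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Dedicated test box: update the checkout, run the suite, and only
    # build packages when everything passes.
    while (1) {
        system( 'svn', 'up' ) == 0
            or die "update failed: $?";

        if ( system( 'prove', '-lr', 't/' ) == 0 ) {
            system('./build-debs.sh') == 0
                or warn "package build failed: $?\n";
        }
        else {
            warn "tests failed; not building packages\n";
        }

        sleep 60;    # don't hammer the repository
    }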


So developers run a subset of the tests before committing and then if the commit broke something outside that subset, they'll get notified within about an hour. That's typically a small enough time frame to make apportioning blame easy :-)

My personal rule is that any test suite which takes longer than ten minutes to run in full is unusable and any core tests which take longer than a minute to run are too cumbersome.

I dislike the "We'll just use continuous integration!" approach because I tend to get notifications only after I've moved on to something else. Context switching is even worse for wetware than hardware.

Seems like there are 2 problems compounded into one.


1) Monolithic codebase

So you have DataBridge which depends on Dynamite. Sounds like there should be a common library that both projects share.


How does making a 3rd dependency help?


The common library consists of commonly used components (DBIC classes, middleware interface classes, etc.) and will therefore have the majority of the tests, since it interfaces with all of the other systems. It will also take the longest to run, but it should be quite stable, as core interfaces should change less often than external ones.


On the other hand, the test suites for the main projects (where the majority of the business logic resides) will be much faster.


Only at integration will the entire suite need to run.


2) Branching and merging strategy

This is commonly overlooked as one of the major causes of delay.


If a developer needs to "wait" for a camel before merging, maybe there is a flaw in your branching strategy.


If a developer can branch from broken (tests failing) code, maybe there is a flaw in your branching strategy.


Poor branching strategies REALLY affect productivity.


There is a trade-off between "continuous integration" and "known stable code". I think you'll find your team has chosen the former as the more important.

Wouldn't branchy development help there, i.e. branching from code known to pass tests, and having CI or a maintainer merge branches and test the results of the merge?

Can't developers run tests on their branches (on their machines) before submitting them for inclusion in trunk (master branch)? This should reduce CI failures.

I just can't see #3 as so black and white. Yes at my $job we have a large test suite that tests some things that take a long time to run (they even take a long time to run in production, so I don't see why the tests shouldn't emulate what's in production). And yes we use CI to help developers so that they don't have to run the full test suite on each commit.

There are ways to work around lots of people investigating CI failures. But if you have a really large project with some things that just take a long time to run, you reach a point where you can't make your tests any faster. Sure, you could buy beefier hardware and try to make things run in parallel (our tests already run our two cores at about 80%), but if the project is large enough, at some point you will hit a wall. I see CI as a "good enough" solution for getting over that wall.
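
(For anyone in the same boat: prove can at least spread test files across cores, assuming your tests don't trip over shared fixtures:)

    # run up to four test files at once; only safe if the tests are isolated
    prove -lr -j 4 t/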

I think you've got a bit of a chicken-and-egg problem. To make your test suite smaller, the only way to go is to break up the monolithic project into several independent projects, each with their own test suite. You don't re-run the tests on all your CPAN dependencies every time you commit (I hope!) so why run tests on DataBridge for a change to Dynamite? The problem is the tight coupling: you can't separate the dependencies from the code, so you can't separate the test suites. I think that if you want to fix this, at some point there's going to have to be a lot of pain while you take a chainsaw to the code base.

About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/