How to be agile without testing

What's a bug?

Fair warning: if you're someone who has the shining light of the converted in your eyes and you've discovered the One True Way of writing software, you might feel a bit challenged by this post.

Your newest developer just pushed some code, but it has a bug. She screwed up the CSS and one of the links is a bright, glaring red instead of the muted blue that your company requires. While you're sitting with her, counseling her on the importance of testing, you get a call from marketing congratulating you on the last release. Sales have jumped 50%.

You know that the only change is the link color.

Was that code really a bug? Are you honestly going to roll it back?

More importantly, and this is the question that many people get wrong: what are you going to learn from this?

Why we write tests

Why do we go agile? Because we believe we can improve the software process. Because we believe we can share information better and, very importantly for agile, get feedback early and often (hint: that theme is going to recur). Another interesting thing about agile is that every agile methodology, without exception, says that you need to adjust that methodology to meet your particular needs. I've said this before and it bears repeating: you can't be agile unless you're, well, agile. So some agile teams don't have fixed iterations. Others (most?) don't do pair programming. Code review is only done on the tricky bits or maybe for newer programmers. And guess what? These companies often do very well, despite doing things differently.

But none of them talk about getting rid of testing. They just don't.

And yet in our "red link" example above, testing might well have made it harder to discover a 50% increase in sales.

For most of us, we learned about testing years ago and it was good. Then we learned about TDD and realized we had found testing Nirvana. FIT testing was a nifty idea that's heavily evangelized by those who offer FIT testing consulting services — and pretty much no one else. And now BDD leaves some breathless while others yawn. It's just the next craze, right?

But what is testing for? From a technical perspective, we might argue that it's to make sure the software does what we want it to do. But is that the most important thing? Remember that over a size of mumble lines of code, all software has bugs. All of it.

Instead, I think it's better to say that all software has unexpected behavior. We write tests because we hope the software will do what we want the software to do, but instead, isn't it better if the customers do what we want them to do? You can build a better mousetrap, but there's no guarantee they will come. And if there's anything to learn from Digg or other tech disasters it's this: customers are going to do what they damned well please, regardless of whether or not your software "works", the experts you've consulted or how many focus groups you've held.

So rather than introduce software testing as some proxy for customer behavior, let's think about the consumers of our software for a moment.

A list of undesirable things

Considering our "bright red link" example above, I ask again: is it a bug? In that example (which was not chosen at random), it's easy to argue that it's a software bug, but that's only because the software exhibited unexpected behavior. In this case, it was a 50% increase in sales.

So now, instead of bugs — always bad! — we can think in terms of "unexpected behavior", sometimes good, sometimes bad.

So how do you know which is which?

You make lists of undesirable things. 500 errors on your Web site are bad, but are 302s? Tough to say. Maybe you want to keep RAM usage below a certain level, or not see a significant drop in sales. And you probably want to make sure that responses never take more than $x milliseconds.

Make a list of everything that's unequivocally undesirable (for example, a Facebook "like" button going away doesn't count as unequivocally undesirable) and add monitoring for all of those behaviors. Every time you change or add technologies, go over your list of undesirable things again. Are they up to date? Is there anything you need to change?
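To make that concrete, here's a minimal sketch of such a list in Python. The metric names (`http_500_rate`, `ram_used_fraction`, `max_response_ms`) and the thresholds are invented for illustration; your own list, and wherever your metrics actually come from, will look different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class UndesirableThing:
    """One entry on the list: a name plus a rule that flags it."""
    name: str
    is_violated: Callable[[Metrics], bool]


# Hypothetical entries; only things that are unequivocally bad belong here.
UNDESIRABLE_THINGS: List[UndesirableThing] = [
    UndesirableThing("500 errors", lambda m: m["http_500_rate"] > 0.001),
    UndesirableThing("RAM usage too high", lambda m: m["ram_used_fraction"] > 0.80),
    UndesirableThing("slow responses", lambda m: m["max_response_ms"] > 500),
]


def check(metrics: Metrics) -> List[str]:
    """Return the names of every undesirable thing currently happening."""
    return [t.name for t in UNDESIRABLE_THINGS if t.is_violated(metrics)]
```

Note what isn't on the list: link colors, or a "like" button going away. Revisiting the list then means editing one data structure rather than hunting through alerting code.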

Some of those undesirable things are reversible (dropping sales) and the alternative is good. So monitor those, too. Maybe you want to get notified when a release improves response time by 10%.
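Watching in both directions might look something like the sketch below. The function names and the 10% default threshold are my own assumptions for illustration; the point is only that the same comparison that flags a regression can flag a pleasant surprise.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (e.g. -0.15 means 15% lower)."""
    if before == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (after - before) / before


def classify_response_time(before_ms: float, after_ms: float,
                           threshold: float = 0.10) -> str:
    """Label a release by how its response time moved.

    For response time, lower is better, so a large negative change
    is the good kind of unexpected behavior.
    """
    change = relative_change(before_ms, after_ms)
    if change <= -threshold:
        return "improved"
    if change >= threshold:
        return "regressed"
    return "unchanged"
```

For a metric where higher is better, such as sales, you'd flip the labels.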

And then?

Well, it's great, but it doesn't replace testing. Not by a long shot. You've made your bi-weekly release, RAM consumption has skyrocketed, you're swapping like mad and now you have a 3,000 line diff to go through. Finding a memory leak can be hard at the best of times and normal testing often misses them (but check out Test::LeakTrace), but now you have to roll back a huge change and go through 3,000 lines of code to find your problem.

So you don't do that. Instead, you're switching to continuous deployment. With this model, you push code to production the moment it's ready. Of course, it's good if you actually push it to a box, watch it, push it to a cluster, watch it, and then push it to all servers. With your extensive monitoring, undesirable things usually show up pretty quickly and your memory leak is a 30 line diff instead of a 3,000 line diff.
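The "push it, watch it, push it wider" loop can be sketched roughly as follows. Everything here is hypothetical: `deploy`, `is_healthy` and `rollback` stand in for whatever your deployment tooling and monitoring actually provide, and the stage names are just labels.

```python
from typing import Callable, List, Sequence


def staged_rollout(stages: Sequence[str],
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   rollback: Callable[[str], None]) -> bool:
    """Deploy to each stage in order, watching before going wider.

    If monitoring reports something undesirable at any stage, roll back
    every stage deployed so far (most recent first) and stop.
    Returns True only if every stage was deployed and stayed healthy.
    """
    deployed: List[str] = []
    for stage in stages:
        deploy(stage)
        deployed.append(stage)
        if not is_healthy(stage):
            for done in reversed(deployed):
                rollback(done)
            return False
    return True
```

You might call it with stages like `["one box", "one cluster", "all servers"]`, where `is_healthy` consults your list of undesirable things after a suitable watching period.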

Which one do you want to deal with?

(Naturally, I used a memory leak as an example even though that's one of the things which often takes longer to show up, but I'm too lazy to change that example. Pretend I wrote "5% increase in 404s.")

In my experience with this, customers are fairly forgiving about minor quirks and most unexpected behaviors are things like "this image isn't showing up" or "these search results are ordered incorrectly." Those tend to not be catastrophic. In fact, many times this unexpected behavior goes unnoticed. Most of the time the unexpected behaviors will turn out to be neutral or bad, in terms of undesirable things, but sometimes they turn out to be the good unexpected behavior. You'll never know if you don't try.

As you may expect, this technique works very well with A/B testing and if you have the courage to look for unexpected behaviors instead of bugs, A/B testing is the next logical step.
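As a rough sketch of what that next step involves, here's one common way to compare two variants (say, the blue link and the red link): a two-proportion z-test on conversion rates. This is my choice of statistic for illustration, not the only way to evaluate an A/B test, and the visitor and sale counts in the usage note are invented.

```python
import math


def conversion_z_score(sales_a: int, visitors_a: int,
                       sales_b: int, visitors_b: int) -> float:
    """Two-proportion z-score for variant B's conversion rate versus A's.

    A positive score means B converted better than A. As a rule of thumb,
    an absolute value above about 1.96 corresponds to significance at the
    5% level for a two-sided test.
    """
    rate_a = sales_a / visitors_a
    rate_b = sales_b / visitors_b
    # Pooled conversion rate under the assumption that A and B don't differ.
    pooled = (sales_a + sales_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 0.0  # no variation at all, so no evidence of a difference
    return (rate_b - rate_a) / std_err
```

With 100 sales from 1,000 visitors on the blue link and 150 from 1,000 on the red one, the score comes out well above 1.96, which suggests the "bug" deserves a closer look before anyone rolls it back.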

Note: None of the above precludes writing tests. None of it. I've seen the "monitoring undesirable things" strategy work extremely well and I firmly believe that it can work in conjunction with software testing. However, it's a different way of testing software, one that's more reliant on customer behavior than exacting specifications. So the title of this post is actually a bit of a lie; it's just a different way of looking at testing.

And that's really the most interesting idea of this entire post: your customer's behavior is more important than your application's behavior.

See also: when must you test your code?.

11 Comments

Isn't this the same argument some people use against using safety belts in cars?

"I read about an accident where the person could not escape the car because of the safety belt, so I won't use it."

You're conflating two totally different ideas. When we talk about testing we're talking about correctness, or what you call "testing for unequivocally undesirable things".

Everything else you're talking about in this post is called analytics.

And, yes, good analytics and A/B testing are important.

For me, there are two very good reasons for writing tests.

  1. I am lazy and I write web applications. Clicking my way through an application until I reach the feature I'm working on, just to be greeted by a stupid error such as "Cannot use an undefined value as a hash-ref", is way too exhausting for me.

  2. I will have to add features and modify existing ones in the future. You make it sound like tests are written so that we can "prove" that our existing code works. That's not the point. I can use those tests to check whether what I wrote a couple of weeks/months ago still does what it's supposed to do although I added features and you can now sign up for another corporate newsletter.

I don't test automatically. I have precisely 0 unit tests, yet I have a successful website getting 2 million requests daily. I'm not worried about adding new features or breaking existing code. I add a feature, manually test that it works, and if it does, deploy to live.

What I do is live by these rules. 1) The system must be fast to deploy / restart. It is essential that when I add a change, I can see that change within seconds, without having to wait for an ant/maven rebuild or for caches to rebuild. Instant feedback.

2) Steer clear of anything complex. If I write something complex that I can't understand fully, or that won't fit into my mind without paging, it could be considered a prime candidate for testing to verify it's doing as required. I don't want to spend time writing tests, so instead I find another, less complex way of doing what I want.

3) If things break (and they do occasionally break!), then fix them. When I put things live, sometimes bugs do get out. My users tell me about bugs within minutes. Because of the first 2 points, I can get a fix out very quickly, usually within the following few minutes. The bugs that have arisen have typically been because of things that I didn't anticipate, and so I wouldn't have written tests for them.

I'm happy. I have a clean codebase and very fast turnarounds. I kind of believe that unit tests would get in the way, or encourage code complexity. Though that's just my opinion.

Monitoring is a kind of testing. Application monitoring is doubly so. We select how to monitor our applications from the set of behaviors exhibited by the system and write code that validates the presence of the behavior. Suppose we choose, for example, to monitor that our style guide is adhered to. If we test that no glaring red appears in links, we lose the opportunity to realize the serendipitous increase in sales due to a violation of the style guide. The opportunity would never have occurred.

When we test and when we monitor and when we do all the other good things that make our software stable, usable and predictable we must realize that we are also locking down dimensions of variability that could lead to serendipitous outcomes. We choose to test and monitor and automate so that we avoid common bad outcomes. In the process we throw out a few potentially good outcomes.

I do so love the continual delivery stuff. It makes building a culture of experimentation so much easier.

That said - you seem to be assuming that the only reason to write tests is to find bugs / behaviour that the customer doesn't want. There are other reasons to write tests.

For example:

  • I write the vast majority of my code using TDD. Here I'm writing tests to help my design.

  • Sometimes I write story level tests / acceptance tests / customer tests. Here I'm writing tests to help me figure out when I've done something.

... and so on...

So I don't dispute what you say about tests, but I will cheerfully dispute whether or not currently recommended best practices in testing are the elusive Holy Grail of performant software.

Hmmm.. whose best practices are you reading ;-)

I've been running workshops on the advantages of a more experimental metric-driven approach for a couple of years now. Jez Humble's CD book was published nearly three years ago. The classic "The Deployment Production Line" paper from Agile 2006 is more than six years old now.

We've got the Lean Startup folk ranting about CD, metrics, split testing, etc. for the last three years.

Maybe it's just that I spend more time with new product development and startups - but this sort of stuff is best practice now.

It's just orthogonal to testing.


About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.