Code Evolution Versus Intelligent Design
I didn't actually intend for this to be a series of posts, but hey, that's the consequence of going with the flow rather than rigidly planning everything out beforehand, and it nicely mirrors the theme of:
If you have not read those, I strongly recommend that you do so before continuing with this post. The comments have mostly been positive, but Adrian Howard has offered some interesting counter-points and some good resources for further reading. I will not say that he's wrong, but there is a different way of looking at this situation.
First, the summary of what I am arguing for:
- Use continuous deployment
- Monitor any behaviors that directly impact the bottom line
- Write integration tests to catch all fatal errors
- Write integration tests to validate conversion funnels
- Write tests to prevent any "harmful" behavior
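To make the funnel-validation point above concrete, here is a minimal sketch of a funnel check. Everything in it (the step names, the drop-off threshold, the helper names) is hypothetical and mine, not part of any real shop's tooling; in practice you would wire something like this into your analytics pipeline and test runner:

```python
# Hypothetical conversion-funnel check: given (user, step) events, count how
# many users reach each step and flag any step where the drop-off from the
# previous step exceeds a threshold. Step names and threshold are illustrative.

FUNNEL = ["landing", "product", "cart", "checkout", "purchase"]

def funnel_counts(events, funnel=FUNNEL):
    """events: iterable of (user_id, step). Returns {step: users reaching it}."""
    reached = {step: set() for step in funnel}
    for user, step in events:
        if step in reached:
            reached[step].add(user)
    return {step: len(users) for step, users in reached.items()}

def broken_steps(counts, funnel=FUNNEL, max_drop=0.9):
    """Flag steps that lose more than max_drop of the users from the prior step."""
    flagged = []
    for prev, cur in zip(funnel, funnel[1:]):
        if counts[prev] and 1 - counts[cur] / counts[prev] > max_drop:
            flagged.append(cur)
    return flagged
```

An integration test would feed this recorded events from a staging run and fail the build if `broken_steps` returns anything: a blocked funnel is exactly the kind of "harm" the list above says must be caught.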
So far that's pretty uncontroversial and, as Adrian points out, much of this is best practice in the agile community today, but this is where he and I diverge slightly.
First and foremost, I want to be clear that I am only outlining this approach as one of many possible approaches. Second, this is only for customer-facing code (specifically, for customers whose behavior can be "improved" by a different design, such as on e-commerce sites).
If you just follow what I outline above, you can have plenty of untested code. Oh noes! Quelle horreur! The sky is falling!
Why Do We Test?
At this point, if you want to test, be my guest. Test away! I am certainly not anti-testing by any stretch of the imagination. I largely argue for tests as a preventative measure, but Adrian (and many others) stresses different goals. Adrian wrote:
- I write the vast majority of my code using TDD. Here I'm writing tests to help my design.
- Sometimes I write story level tests / acceptance tests / customer tests. Here I'm writing tests to help me figure out when I've done something.
Driving the design and knowing when you've finished something are certainly ways you can use tests, and I've done both quite a bit, though tests do not have to be the only way of accomplishing these goals. However, if you want to use tests for that (and for non-customer-facing code I applaud this approach), be my guest.
So why do I think that another approach is worthwhile? Because intelligent design often isn't very intelligent (sorry Adrian).
One of the biggest issues with introducing A/B testing is ego. I've seen this time and time again (and have fallen victim to it myself). Many people assume that they "know" what the optimal design is, and others refuse to work in an environment where customer behavior is seen as more valuable than developer opinion. Overcoming ego is hard. I mean it's almost NP-Complete hard. Watching your "expert" opinion turn out to be wrong over and over is a painful lesson, but certainly one that I feel everyone should be exposed to. Humble pie isn't delicious while you're eating it, but it sure is filling.
Case in point: I used to manage the owner of one company by deliberately leaving glaring flaws in my interfaces, because he always had to change something. He'd spot the flaw and I'd go back to my desk and commit the fix I already had ready. Somehow I'd always get rave reviews from him, even though he was always "correcting" my work. Egos make it harder to get things done.
All Code Has Bugs
So Adrian's using tests for, amongst other things, driving design, but I'm actually happy to allow a grey area with a little bit of sloppiness in it. Why? Is it because I want my code to have bugs? Not exactly. All code has bugs. We know this, so rather than fight against the inevitable, can we get it to work in our favor?
Think for a moment about what would happen in biology if mutations stopped: evolution would end, species would no longer be able to adapt to their environment, and there would likely be mass extinction. Mutation, however, sometimes produces a positive result, even though most mutations are neutral in effect and a handful are negative. This is why I argue that we need to at least test for immediate harm and conversion funnels: think of these tests as your immune system. Any behavior which harms your customer, has a strong potential to harm your customer, or blocks your conversion funnels (er, your "fitness function") should be tested.
So what happens with the rest of the code? Every time there is a new release of the code, you can think of this as the child of the parent(s). This child is going to have a combination of intelligent design (new features) and evolution. The evolution might simply be emergent behavior of the new design or it may be what I referred to as an unexpected behavior. The latter is the "mutation" that I have been referring to, though unexpected behavior and emergent behavior can overlap.
In my experience with this methodology, unexpected behaviors (mutations) seem to be mostly neutral in terms of "fitness", though sometimes they're negative. But rather than talk in the abstract, let's get real and see how this works (though I'll be deliberately vague due to an NDA).
I was hacking on some code that returned a collection of "results" to users when I noticed that, under a particular condition, no results were returned even though results existed. Curious, I dug in a little bit and discovered that the discarded results appeared to be relevant. So I "fixed" this unexpected behavior by wrapping it in an A/B test and pushed it live. It took about two hours from start to finish to fix and push this code. A/B tests can easily take a while (days or even weeks) to run, but this one had a strong result in only a couple of days: the unexpected behavior of discarded results strongly outperformed the "correct" behavior. I could speculate all day long (and did!) as to why this was the case, but sometimes people act in mysterious ways.
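The "wrap it in an A/B test" step can be sketched as a deterministic bucketing gate. This is a toy illustration under my own assumptions, not the framework I actually used; the function names, the experiment name, and the fifty-percent split are all made up, and a real experiment framework would also log exposures and compute statistical significance:

```python
import hashlib

def variant(user_id, experiment="discarded-results", treatment_pct=50):
    """Deterministically assign a user to 'control' or 'treatment' by hashing
    the experiment name and user id, so each user always sees one variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"

def search_results(user_id, results, discarded):
    """Control keeps the 'correct' behavior; treatment ships the mutation."""
    if variant(user_id) == "treatment":
        return results + discarded  # include the formerly discarded results
    return results
```

Hashing (rather than random assignment per request) is what makes the experiment clean: a given customer never flips between behaviors mid-session.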
Had this code been thoroughly tested up front, this very counter-intuitive positive mutation would never have occurred. By using testing to rigidly lock ourselves into a particular set of known behaviors, we lose the ability to let our code occasionally discover better ways of getting things done, though we gain some comfort that our code is doing more or less what we tell it to do. (Note: except for those cases where I argue that tests should be mandatory, I think many unexpected behaviors should first be investigated via an A/B test, with a strict test added later if warranted.)
So if we accept the premise that most mutations are "neutral" (yes, that's a big "if"!) and we can hopefully catch most (if not all) negative mutations through vigilant monitoring, we have the potential to let our guard down a little bit and start pushing out features and not worrying too much that we've done things "perfectly". We already know that we're going to have unexpected behaviors so why not embrace them and let them work for us?
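The "vigilant monitoring" that catches negative mutations can be as simple as comparing key metrics after a deploy against their baselines. A minimal sketch, assuming higher-is-better metrics and an arbitrary five-percent tolerance (both assumptions are mine, not a recommendation):

```python
def regressions(baseline, current, tolerance=0.05):
    """Return the names of metrics whose current value fell more than
    `tolerance` (as a fraction) below baseline. Assumes higher is better."""
    bad = []
    for name, old in baseline.items():
        new = current.get(name, 0.0)
        if old > 0 and (old - new) / old > tolerance:
            bad.append(name)
    return bad
```

If this comes back non-empty after a release, roll back or investigate: that's a negative mutation your "immune system" just caught, and it's what lets you push features without demanding perfection up front.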
Again, this is not "anti-testing" at all. Do you feel more comfortable testing? Then do so! Absolutely do so! Have a tricky algorithm? Test it. Want to flesh out a design issue? Test it. Want a programmatic definition of "done"? Test it. Have mission critical code? Test it. Feel uncomfortable with unexpected behavior? Test it.
However, if you have a solid team and well-thought-out monitoring, and you're comfortable with trading some technical debt for a bit of flexibility, consider letting your code evolve. I was not comfortable with this at all when I first encountered it, but it's astonishing the leaps that evolution can make.