Failing your way to success with A/B Testing
So I've read and very carefully considered the words of Douglas Bowman about his decision to leave Google, and much of it was about Google's use of A/B testing ... Google's extensive use of A/B testing. Google engineers clearly valued function over form. He complains about a situation where 41 shades of blue were tested for optimum response (see page 3) and makes it clear that he felt constrained: no matter what he did, or how simply he did it, he had to "prove" it worked. He simply wanted to build something beautiful, without the extreme micromanagement of every aspect of his work.
I can understand that. Many times I want to just "make stuff happen", secure in the knowledge that I know what I'm doing. I look at how foolishly some code is implemented and I think "what our customers really want is X!" And A/B testing shows me, over and over again, that I'm wrong. It's humbling. It's humiliating. And if you really care about being as effective as you can be, you'll love it, but only if your pride can handle being proved wrong on a regular basis.
The core of A/B testing is rather simple. You have your current behavior (the "control") and one or more "variants" of that behavior. In the context of a Web site, you randomly show each visitor exactly one variant of the site and ensure that if they return, they see the same variant. For example, if you sell shoes, does providing more than one picture of a shoe lead people to buy more, or does the slower download time kill the sale? If you can provide a matching belt to go with those shoes, does suggesting it lead to better sales, or does it distract customers from buying? Over time you can use fairly basic statistics to test the null hypothesis that all the variants perform the same, and to verify whether one of them is (statistically) better than the others (like many things in this post, that's a gross oversimplification).
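To make the "same variant on return" part concrete, here's a minimal sketch of sticky assignment: hash a stable visitor ID (say, from a long-lived cookie) so a returning visitor always lands in the same bucket. The visitor ID and variant names below are made up for illustration; real sites often push this into middleware or a hosted testing service.

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Hypothetical variants: the control plus one alternative layout.
    my @variants = ( 'control', 'three_column' );

    # Deterministically assign a visitor to a variant. Hashing a
    # stable visitor ID means a returning visitor always sees the
    # same variant, without storing the assignment anywhere.
    sub variant_for {
        my ($visitor_id) = @_;
        my $bucket = hex( substr( md5_hex($visitor_id), 0, 8 ) ) % @variants;
        return $variants[$bucket];
    }

    print variant_for('visitor-42'), "\n";   # same ID, same variant, every time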
In my personal experience, and in reading about others' successes and failures, I've discovered something amazing: most A/B tests perform the same as, or worse than, the original behavior. These aren't people making things up: these are people saying "I think this can improve our conversion". These are often experts in their field insisting that the two-column layout will make more money than the three-column layout. At most companies, whoever has the most important title or argues most persuasively wins, a particular layout is chosen, and everyone moves on, completely unaware that they might have improved their conversion rate by 25% with the other layout. In contrast, companies which use A/B testing will fail and fail and fail until they have one test which succeeds and makes them plenty of money; then they throw away the rest and start over (another gross oversimplification). They keep accumulating these small successes (and occasionally huge ones), and they add up over time. Instead of guessing what works, they know what works. Instead of launching a bunch of features and hoping their customers like them, they know their customers like them (and don't forget that just because customers want something doesn't mean the enormous expense involved will induce them to spend more money).
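For the curious, here's what the "fairly basic statistics" might look like: a two-proportion z-test comparing the conversion rates of a control and a variant. The counts are invented, and this glosses over real-world issues like fixing your sample size in advance instead of peeking at the results every day.

    use strict;
    use warnings;

    # Hypothetical results: (visitors, conversions) for each variant.
    my ( $n_a, $c_a ) = ( 5000, 400 );    # control: 8.0% conversion
    my ( $n_b, $c_b ) = ( 5000, 465 );    # variant: 9.3% conversion

    my ( $p_a, $p_b ) = ( $c_a / $n_a, $c_b / $n_b );

    # Pooled proportion under the null hypothesis (no real difference).
    my $p  = ( $c_a + $c_b ) / ( $n_a + $n_b );
    my $se = sqrt( $p * ( 1 - $p ) * ( 1 / $n_a + 1 / $n_b ) );
    my $z  = ( $p_b - $p_a ) / $se;

    printf "control %.1f%%, variant %.1f%%, z = %.2f\n",
        100 * $p_a, 100 * $p_b, $z;

    # |z| > 1.96 means a difference this large would be surprising
    # if the two variants really performed the same.
    print "significant at the 95% level\n" if abs($z) > 1.96;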
A/B testing slays egos. Like the developer who learns to stop "optimizing" their code and start benchmarking it, A/B (and multivariate, or MVT) testing teaches you to stop saying "this is better" in favor of "let's find out what's better". Instead of benchmarking your code, you're benchmarking your customers.
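To belabor the analogy: Perl's core Benchmark module is happy to tell you your clever optimization is slower, just as an A/B test is happy to tell you your beautiful redesign converts worse. A toy comparison (the two summing approaches are arbitrary examples, not a recommendation):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @numbers = ( 1 .. 1_000 );

    # Two ways to sum a list; intuition about which is faster is
    # exactly the kind of thing worth measuring instead of asserting.
    cmpthese( -2, {
        for_loop => sub {
            my $sum = 0;
            $sum += $_ for @numbers;
        },
        while_shift => sub {
            my @copy = @numbers;
            my $sum  = 0;
            $sum += shift @copy while @copy;
        },
    });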
It's no wonder that the market for A/B and MVT testing is exploding. Not only is it proving dramatically successful for many companies (read some of the case studies from one A/B testing solution provider), but for people who truly care about producing results rather than repeating dogma, A/B testing is where it's at. No longer do you have to worry about killing your business with a redesign; you gradually evolve the new design based on what people actually do, rather than on what they say they do or on your personal hunches.
Sadly, because we have an exploding market in this area, we also have snake oil salesmen (a little voice in the back of my head says "you can have snake oil saleswomen too!"). I'm finding online A/B testing calculators which get their basic math wrong ("maths", Dave. Are you happy now? :) I'm finding "A/B testing consultant" web sites which confuse standard deviation with standard error, an error which does not inspire confidence. I'm seeing some people say "A/B testing is the only way to go" and others say "MVT is the one true path", with neither mentioning the strengths and weaknesses of the two approaches. It's a big, scary and confusing world out there, and for those of you who don't believe me, here's a picture of my wife:
[Image: my wife with an orange thing on her head.]
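As for that standard deviation versus standard error mix-up, it's worth spelling out the difference: standard deviation measures the spread of individual outcomes and does not shrink as you collect data, while standard error measures the uncertainty in your estimated conversion rate and does. A quick sketch with made-up numbers:

    use strict;
    use warnings;

    # Hypothetical data: 2,000 visitors, 170 conversions.
    my ( $n, $conversions ) = ( 2_000, 170 );
    my $p = $conversions / $n;    # estimated conversion rate

    # Standard deviation of a single Bernoulli outcome (convert or
    # not): this does NOT shrink as you gather more visitors.
    my $sd = sqrt( $p * ( 1 - $p ) );

    # Standard error of the estimated rate: this DOES shrink with n,
    # which is why confusing the two wrecks a significance calculation.
    my $se = $sd / sqrt($n);

    printf "rate %.3f, sd %.3f, se %.4f\n", $p, $sd, $se;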
A/B (and MVT) testing is not exactly Perl, but I suspect enough developers are close to this area that some of you might be interested. If you are (or at least if people don't scream "stop!"), I might write more about this later.