Friday, November 29, 2013

Testing in the Wild



There is much to be said for testing user experience design in the wild – it has a number of advantages when used properly, though using it properly is something of a delicate matter. For the moment, I’d like to focus on the various kinds of tests that are run in a production environment, as there seems to be a great deal of confusion among them.

Aside from being persnickety about calling things by their proper names (many people say “A:B test” when what they really mean is a multivariate or champion-challenger test), it’s also important to make the distinction so that you can plan the right test to get the information you need and relate the results in a meaningful and accurate way, using each method to its best advantage.

Champion-Challenger Test

A champion-challenger test is the most commonly used variety. It compares an as-is design (the champion) to a proposed to-be design (the challenger) to confirm that the desired effect on user behavior actually occurs. Consider these two examples:

[Image: the champion and challenger versions of a page element]

The “challenger” version of the element differs from the “champion” version in many ways: the headline uses different text, font, and color; the call-to-action button is a different color, phrase, and size, and sits in a different position; the image is different; and the copy is different. As such, if you notice a difference in their performance, you have no idea which of these changes caused it.

That’s not necessarily a bad thing: since the ultimate goal is to increase click-through, you likely do not care which of these elements deserves the credit. A champion-challenger test effects a one-time improvement, but it does not “teach” you anything that could be used to improve any future design.
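
To make the mechanics concrete, here is a minimal sketch in Python of how a champion-challenger split might be run and read. Every name and number in it is invented for illustration; it is not drawn from a real test.

    # Hypothetical sketch of a champion-challenger test: each visitor is
    # bucketed into the as-is design ("champion") or the proposed redesign
    # ("challenger"), and overall click-through rates are compared.

    def assign_variant(visitor_id: int) -> str:
        """Deterministically bucket a visitor into one of the two designs."""
        return "champion" if visitor_id % 2 == 0 else "challenger"

    print(assign_variant(42))  # -> "champion"

    # Illustrative tallies of impressions and clicks per variant.
    results = {
        "champion":   {"impressions": 10_000, "clicks": 420},
        "challenger": {"impressions": 10_000, "clicks": 510},
    }

    for variant, counts in results.items():
        ctr = counts["clicks"] / counts["impressions"]
        print(f"{variant}: CTR = {ctr:.2%}")

    # The comparison tells you which whole design won; because the designs
    # differ in many ways at once, it cannot tell you which change mattered.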

A:B Test

The A:B test is the one most often claimed, but often mistakenly so. It is misunderstood because the parameters of an A:B test are very stringent: only one thing, and one very specific thing, can be tested. Consider these two examples:

[Image: two versions of the same layout, identical except for the color of the button]

The only difference between them is the color of the button.  If you change anything else about the layout, or even another aspect of the button (the words, the size, or the position), you are no longer running an A:B test but a champion-challenger test, with the combined impact of all changes commingled and inseparable.

The greatest benefit of an A:B test is that it supports not merely a one-time decision (this layout is better than that one) but a general observation (green buttons perform better than blue ones) that is very likely to be applicable across all such layouts on your site, and possibly across multiple sites.
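
Because only one factor differs, the result of a true A:B test can be evaluated with a simple two-proportion comparison. Here is a minimal sketch for a blue-versus-green button test; the traffic counts are invented for illustration, not real data.

    import math

    # Evaluate a single-factor A:B test (button color only) with a
    # standard two-proportion z-test.

    def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
        """Z statistic for the difference between two click-through rates."""
        p_pool = (clicks_a + clicks_b) / (n_a + n_b)            # pooled rate
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        return (clicks_b / n_b - clicks_a / n_a) / se

    z = two_proportion_z(clicks_a=400, n_a=10_000,   # blue button
                         clicks_b=470, n_b=10_000)   # green button

    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    print(f"z = {z:.2f}, p = {p_value:.4f}")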

Multivariate Test

A multivariate test combines the best of both worlds: it enables you to make multiple changes to a layout, but to test them in such a way that you can identify the degree of influence each change has on the outcome. Consider these examples:

[Image: four variations of the layout, crossing two button colors with two button wordings]

These four variations test two elements: the wording and the color of the button. By doing so, you learn not only which combination does better than the rest, but also how much influence each factor has. That is to say that A:B testing might suggest a green button is better than a blue one and “get info” is better than “learn more,” leading you to conclude that a green “get info” button is the best of all – but when you actually test every possible combination, you may discover that the blue “learn more” button outperforms them all.

This arises due to interaction among the factors (often loosely called covariance), the statistical phenomenon in which the combined effect of several factors differs from the sum of their individual effects. It is ultimately what is significant to the user, who does not mentally dissect your layout into its component elements but experiences it as a whole, whose impact may be greater or lesser than the sum of its parts.

Multivariate testing yields the same general result as a champion-challenger test, identifying the combination of elements that has the greatest impact for a very specific instance.   And because the influence of each factor is examined, it also has the benefit of providing the general insight that would be yielded by multiple A:B tests of each factor.
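
As a rough sketch of how a four-cell result might be read, the following uses click-through rates invented to reproduce the scenario above: each single-factor comparison points toward the green “get info” button, yet the blue “learn more” cell is in fact the best.

    # A 2x2 multivariate test: two factors (button color, button wording),
    # four cells. CTRs are invented so that single-factor comparisons are
    # misleading, illustrating the interaction effect described above.

    ctr = {
        ("blue",  "learn more"): 0.052,   # best cell, missed by A:B tests
        ("blue",  "get info"):   0.040,
        ("green", "learn more"): 0.045,
        ("green", "get info"):   0.048,
    }

    # Single-factor views: a color test run with "get info" wording, and a
    # wording test run on the green button, each point to green + "get info"...
    assert ctr[("green", "get info")] > ctr[("blue", "get info")]
    assert ctr[("green", "get info")] > ctr[("green", "learn more")]

    # ...but the full grid reveals a different winner.
    best = max(ctr, key=ctr.get)
    print(f"best combination: {best} at {ctr[best]:.1%}")

    # Interaction: how much the effect of the wording change differs between
    # the two colors. A nonzero value means the factors do not act
    # independently, which is exactly what separate A:B tests cannot see.
    interaction = (
        (ctr[("green", "get info")] - ctr[("green", "learn more")])
        - (ctr[("blue", "get info")] - ctr[("blue", "learn more")])
    )
    print(f"interaction between color and wording: {interaction:+.3f}")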


Informed Decisions, Informed Interpretations

It is important to make the distinction between types of tests in order to decide which kind of test is appropriate to the information you wish to learn, as well as to know whether you can rely on test results to guide future decisions.

For example, if you run a champion-challenger test in which the champion has a blue button and the challenger (which wins) has a green button, you cannot accurately or reliably state that green buttons outperform blue ones, because you do not really know which of the several changes effected the difference in outcome. Sadly, claims of this sort are made very frequently, and the result is teams proceeding boldly on bad information.

The important thing is to consider (and report upon) each test for what it’s actually worth:

  • A champion-challenger test enables you to make an accurate one-time decision but yields no reusable observations
  • An A:B test enables you to make a precise observation that may be too limited or granular to support a practical decision
  • A multivariate test supports both one-time decisions and general observations, plus it tests for covariance among the factors

Thus considered, the multivariate test would seem to be the best of the three in terms of outcomes – but depending on the decision with which you are faced, one of the other two may be more relevant, and so long as you are aware of what the results actually mean, it may be entirely adequate for a given decision-making scenario.
