There is much to be said for testing user experience design in the wild: it has a number of advantages if used properly, though using it properly is something of a delicate matter. For the moment, I’d like to focus on the various kinds of tests that are run in a production environment, as there seems to be a great deal of confusion among them.
Aside from being persnickety about calling things by their proper names (many people say “A:B test” when what they really mean is a multivariate or champion-challenger test), it’s also important to make the distinction so that you can plan the right test to get the information you need and relate the results in a meaningful and accurate way, using each method to its best advantage.
Champion-Challenger Test
A champion-challenger test is the most commonly used variety. It compares an as-is design to a proposed to-be design to confirm that the desired effect on user behavior is actually achieved.
Consider these two examples:
The “challenger” version of the element differs from the “champion” version in many ways: the headline uses different text, font, and color; the call-to-action button is a different color, a different phrase, a different size, and in a different position; the image is different; and the copy is different. As such, if you notice a difference in their performance, you have no idea which of these changes caused it.
That’s not necessarily a bad thing: since the ultimate goal is to increase click-through, you likely do not care which of these elements deserves the credit. A champion-challenger test effects a one-time improvement, but it does not “teach” you anything that could be used to improve any future design.
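To make the mechanics concrete, here is a minimal sketch of how such a test might be instrumented, assuming you simply split incoming traffic at random and tally impressions and clicks per variant (the function names, in-memory tallies, and 50/50 split are illustrative assumptions, not a reference to any particular tool):

```python
import random

# Tallies for each variant (hypothetical in-memory storage).
counts = {"champion":   {"shown": 0, "clicked": 0},
          "challenger": {"shown": 0, "clicked": 0}}

def assign_variant(split=0.5):
    """Randomly assign an incoming visitor to a variant."""
    return "challenger" if random.random() < split else "champion"

def record_impression(variant):
    counts[variant]["shown"] += 1

def record_click(variant):
    counts[variant]["clicked"] += 1

def click_through_rate(variant):
    shown = counts[variant]["shown"]
    return counts[variant]["clicked"] / shown if shown else 0.0
```

Whichever variant ends the test with the higher click-through rate wins – but, as noted above, the result tells you nothing about why it won.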
A:B Test
The A:B test is the one most often claimed, but often mistakenly so. The mistake arises because the parameters of an A:B test are very stringent: only one thing, and one very specific thing, can be tested. Consider these two examples:
The only difference between
them is the color of the button. If you
change anything else about the layout, or even another aspect of the button
(the words, the size, or the position), you are no longer running an A:B test
but a champion-challenger test, with the combined impact of all changes
commingled and inseparable.
The greatest benefit of an A:B
test is that it supports not merely a one-time decision (this layout is better
than that one) but a general observation (green buttons perform better than
blue ones) that is very likely to be applicable across all such layouts on your
site, and possibly across multiple sites.
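Because exactly one element differs, the analysis can be correspondingly simple. Here is a rough sketch of a two-proportion z-test on click-through rates, using invented counts for a blue button versus a green one (the numbers exist only for illustration):

```python
from math import erf, sqrt

def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Two-sided z-test for a difference in click-through rate
    between two variants that differ in exactly one element."""
    p_a = clicks_a / shown_a
    p_b = clicks_b / shown_b
    # Pooled proportion under the null hypothesis of no difference.
    p = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = sqrt(p * (1 - p) * (1 / shown_a + 1 / shown_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: blue button vs. green button, all else identical.
z, p = two_proportion_z_test(clicks_a=120, shown_a=4000,   # blue
                             clicks_b=156, shown_b=4000)   # green
print(f"z = {z:.2f}, p = {p:.4f}")
```

If the single changed element is truly the only difference, a significant result here supports the kind of general observation described above.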
Multivariate Test
A multivariate test combines
the best of both worlds: it enables you to make multiple changes to a layout,
but test them in such a way that you can identify the degree of influence each
change has on the outcome. Consider
these examples:
These four variations test two elements: the wording and the color of the button. By testing all four, you learn not only which variation performs best, but also how much influence each factor has. That is to say, A:B testing might suggest that a green button is better than a blue one and that “get info” is better than “learn more,” leading you to conclude that a green “get info” button is the best of all – but when you actually test every possible combination, you may discover that the blue “learn more” button outperforms it.
This arises due to covariance, a statistical concept best explained as the combined effect of many different factors. That combined effect is ultimately what matters to the user, who does not mentally dissect your layout into its component elements but experiences it as a whole – a whole whose impact may be greater or lesser than the sum of its parts.
Multivariate testing yields
the same general result as a champion-challenger test, identifying the
combination of elements that has the greatest impact for a very specific
instance. And because the influence of
each factor is examined, it also has the benefit of providing the general
insight that would be yielded by multiple A:B tests of each factor.
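As a sketch of what that extra insight looks like, assume hypothetical click-through rates for all four combinations of color and wording. You can then compute each factor’s average (main) effect alongside the combined effect described above; in the invented numbers below, neither factor matters much on average, yet specific combinations differ considerably – exactly the situation in which extrapolating from single-factor results would mislead you:

```python
# Hypothetical click-through rates for a 2x2 multivariate test
# of two factors: button color and button wording.
ctr = {
    ("blue",  "learn more"): 0.050,
    ("blue",  "get info"):   0.038,
    ("green", "learn more"): 0.036,
    ("green", "get info"):   0.048,
}

def main_effect(factor_index, level_a, level_b):
    """Average CTR change from level_a to level_b, pooled over the other factor."""
    a = [v for k, v in ctr.items() if k[factor_index] == level_a]
    b = [v for k, v in ctr.items() if k[factor_index] == level_b]
    return sum(b) / len(b) - sum(a) / len(a)

# Interaction: how much the wording effect depends on the color.
interaction = ((ctr[("green", "get info")] - ctr[("green", "learn more")])
               - (ctr[("blue", "get info")] - ctr[("blue", "learn more")]))

print("color main effect (blue -> green):", main_effect(0, "blue", "green"))
print("wording main effect (learn more -> get info):",
      main_effect(1, "learn more", "get info"))
print("interaction:", interaction)
# With these numbers, both main effects are near zero while the
# interaction is large: each factor "matters" only in combination.
```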
Informed Decisions, Informed Interpretations
It is important to make the distinction between types of tests, both to decide which kind of test is appropriate to the information you wish to learn and to know whether you can rely on the results to guide future decisions.
For example, if you run a champion-challenger test in which the champion has a blue button and the challenger (which wins) has a green button, you cannot accurately or reliably state that green buttons outperform blue ones, because you do not really know which of the several changes caused the difference in outcome. Sadly, claims of this sort are made very frequently, and the result is proceeding boldly on bad information.
The important thing is to consider (and report on) each test for what it is actually worth:
- A champion-challenger test enables you to make an accurate one-time
decision but yields no reusable observations
- An A:B test enables you to make a precise observation that may be too
limited or granular to make a practical decision
- A multivariate test supports both one-time decisions and general
observations, plus it tests for covariance among the factors
Thus considered, the multivariate test would seem to be the best of the three in terms of outcomes – but depending on the decision you face, one of the other two may be more relevant, and so long as you are aware of what the results actually mean, it may be entirely acceptable for a given decision-making scenario.