What are your concerns with A/B testing on UX design?

Rachel Zheng Business Development Manager at Honyee Media

February 24th, 2015

While working with designers, I find they often get concerned about using A/B testing as a product decision-making model. They want the design to be holistic, but companies also make decisions based on isolated tests. Is anyone else having similar doubts?

I came across this cartoon on A/B testing in UX design. It's pretty funny, but it speaks very truthfully to this question.

Is there a balance between data-driven design and designing from the gut?

Josh Orum CMO, Operating Partner at Spotlight Equity

February 24th, 2015

This is a good question, and highlights a common misperception: that designs that aren't based on specific tests are "from the gut."

Good designs solve problems. The process may emphasize creativity and intuition, but it's still about solving problems. A design based on the whim of a designer (or pointy haired boss) is not a good design. 

Interface designs often have to solve a variety of problems, such as promoting conversion, being easy to use, reinforcing the company's brand, and giving users a sense of delight. These are not always aligned, and often they are in opposition. Testing - whether through A/B tests or other means - can add data, but ultimately it's about weighing priorities and making tradeoffs.

Tests are a fantastic resource and should be part of any UX designer's toolkit, but there are several challenges with relying heavily on testing.

First, tests are only as valuable as the skill of the test designer. Because tests seem scientific, a poorly designed test can be as damaging as (or more damaging than) a bad design.

Second, tests are naturally myopic: they can only measure what they've been designed to measure, and often only measure within narrow frames (which button gets the most clicks? which layout gets the most downloads?). Making only ground-level decisions can lead to a disjointed and bad overall design. I recently read a study (I don't have the link available) that tested two designs: one was easier to use, but ugly, while the other was nicer looking, but users took longer to accomplish tasks (i.e., it was harder to use). Users, however, reported that the better looking design felt easier to use and was more satisfying. An organization relying solely on task-completion tests would have green-lighted a design that actually reduced customer satisfaction.

Increasingly, companies are able to use big data to measure holistic performance of a website or application, but these tests must be carefully designed and account for many factors, and many organizations simply don't have the resources to do it effectively.

Finally, tests are good for optimizing designs, but not maximizing designs. A test will not tell you to try something completely different; it can only optimize between existing designs. Over-reliance on testing leads to optimized designs and reduced innovation.

In the end, Dilbert's pointy-haired boss may have been right. The orange button may get 13% more clicks, but that result may be unique to that page, and changing buttons throughout the site may lead to an overall reduction in clicks. Or having different colored buttons on each page may initially maximize clicks, but eventually lead to user confusion. Or maybe the company's brand is green and their competitor's colors are orange. Or maybe it just makes the page really ugly, and while people are clicking more, they are leaving with a bad taste and repeat business suffers. 

To answer your question: good designs should not come solely from the gut, and tests are very useful, but be careful when relying on them and be aware of their limitations. Hope this helps.

Benjamin Olding Co-founder, Board Member at Jana

February 26th, 2015

A/B tests are only as good as your A idea or your B idea.  I think it's somehow tempting to think we can "randomly" vary the user interface and have something "emerge" from this process.  This is an especially tempting thought for non-designers.

However, mathematically (sorry, I am a statistician), this is a pretty flawed idea.  Steven pointed out the idea that you'll only climb to the peak of the hill you are on (and there could be Mt. Everest a short ride *down* the hill from you), but it's even worse than that: any time you engage in multiple testing, you are all but guaranteed false positives.  So, at least some of your tests - even if you perform them correctly - are going to reach the opposite of the conclusion they should.  The more tests you run, the more this occurs.
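The multiple-testing point can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the tests are independent, none of the variants has any real effect, and each test uses the conventional 0.05 significance level. The function names are hypothetical and do not come from any library.

```python
def familywise_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when no variant has any real effect (every null hypothesis is true)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Stricter per-test threshold (Bonferroni correction) that keeps the
    overall false positive rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    # Assumed test counts, chosen only for illustration.
    for n in (1, 5, 20, 100):
        rate = familywise_false_positive_rate(n)
        print(f"{n:>3} tests: {rate:.1%} chance of at least one false positive")
```

Under these assumptions the chance of a spurious "winner" climbs quickly with the number of tests and approaches certainty. A correction such as Bonferroni is one common way to compensate, at the cost of needing more data per test.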

Other issues include bugs in the way you implement the test.  I have noticed that, culturally, data is sacrosanct... people don't question results as often as they should, and bugs occur more often than you might expect, as engineers have a hard time getting their heads around testing code that implements randomness.  Dilbert loses faith in humanity before he loses faith in the code that brought him this silly answer.  The cult of A/B testing is a tough one to fight against, and I sympathize with every designer who kind of loses the ability to calmly communicate in the face of it.  

Finally, A/B testing assumes you can draw a random sample from all your users - both current and future.  For a large company like AOL I buy into this... they have massive traffic, and it's likely very similar day-to-day, so today's users are tomorrow's users.  For a startup, your user base is in flux.  Even if random interface "A" is preferable to "B" today, I'm skeptical you know for sure it will be in the future.  A/B tests can't be enough to make final decisions on.  Call it "gut" if you want...  I call it experience-based knowledge.

Ok, now having said all that, I'm not some kind of wacky philistine.  I studied stats for years and love data.  What I'm trying to say is I absolutely use A/B tests, but I use them sparingly and carefully.  Here's how: rather than hoping they'll cause user interfaces to "emerge" from chaos, I use them to keep the designers from getting too full of themselves, as a way to guide research, and as a way to remind designers of our actual business objectives.  

Instead of making the null hypothesis that A is no different than B (common approach), and celebrating any "improvement," I insist anyone who is proposing a new interface make a claim about how much better it should be than the current one.  So, if I am told that B should improve our click through by 5%, then the null hypothesis is B is at least 5% better than A.  (If they tell me it will be 1% better, I tell them to work on something else this week - that's usually well in the noise of my testing precision anyway.)
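To see why a claimed 1% improvement can sit "in the noise," here is a rough sample-size sketch using the standard normal approximation for comparing two proportions. Everything specific in it is an assumption for illustration: the 10% baseline click-through rate, reading "5%" as a relative lift, the two-sided 0.05 significance level, 80% power, and the function name `users_per_arm`.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate: float, relative_lift: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each arm to detect a relative lift
    with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Assumed 10% baseline click-through rate.
    for lift in (0.05, 0.01):
        print(f"{lift:.0%} relative lift: {users_per_arm(0.10, lift):,} users per arm")
```

Because the required sample grows with the inverse square of the difference, detecting a 1% lift needs roughly 25 times as many users as detecting a 5% lift under these assumptions.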

If I can run data for a while and disprove the null that B is at least 5% better than A (note that this is still A/B testing - it's just not the way people talk about it colloquially, at least in my experience), then I go back to the designer and ask what went wrong.  If he or she is stumped, I get ideas from other people and we start researching these ideas.
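One possible way to write down this "prove your claimed lift" test is a one-sided Wald-style z-test, sketched below. It is not necessarily the exact procedure used above. It assumes the claim is a relative lift in click-through rate, uses a normal approximation, and the function name, parameter names, and sample counts are all hypothetical.

```python
import math
from statistics import NormalDist


def claimed_lift_test(clicks_a: int, users_a: int,
                      clicks_b: int, users_b: int,
                      claimed_lift: float, alpha: float = 0.05):
    """Null hypothesis: rate_b >= (1 + claimed_lift) * rate_a.
    Returns (z, p_value, reject_null); rejecting means the data
    contradict the claimed improvement."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    factor = 1 + claimed_lift
    # Negative when B falls short of the promised improvement.
    shortfall = rate_b - factor * rate_a
    std_err = math.sqrt(rate_b * (1 - rate_b) / users_b
                        + factor ** 2 * rate_a * (1 - rate_a) / users_a)
    z = shortfall / std_err
    p_value = NormalDist().cdf(z)  # one-sided: small when B clearly falls short
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    # Made-up counts: B performs the same as A despite a claimed 5% lift.
    z, p, reject = claimed_lift_test(10_000, 100_000, 10_000, 100_000, 0.05)
    print(f"z = {z:.2f}, p = {p:.5f}, claim contradicted: {reject}")
```

Framing the test this way puts the burden of proof on the claim: a design that merely matches the current one is counted as a miss, not as "no significant difference."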

If this consistently happens and the designer just yells at me that I don't know what I'm doing using A/B tests to question their genius, I get a new designer.  However, I've found good designers are eager to engage in this way: they still control the UI and UX; I'm just asking them to be open to the idea that they might be wrong and that there is something to be learned from mistakes.  I am not using A/B tests to replace designers or to create a culture where anyone's crackpot idea can be validated by the users somehow.

I do not apply A/B testing (of any kind) to overhauls of the whole funnel; not that it can't be done, but implementing the test is pretty hard.  Usually you can use other metrics to tell if things are better, rather than randomly assigning your users to one experience or the other.

Danny Sung

February 24th, 2015

I went to a really good talk by Teresa Torres on Hypothesis Testing.  One of the things she said about A/B testing is that you should always have a key goal and reasoning behind why you're testing for something.  This gets rid of cases where you decide to go green, just because you like it.

Chris Shayan Head of Engineering

February 25th, 2015

A/B testing can give us the what to the user testing's why, by providing a framework to validate a hypothesis. Our framework on A/B Testing has simple steps:

  • Start with a strong hypothesis. A hypothesis that is backed by data.
  • Ensure your hypothesis is testable. A good A/B testing candidate should be a noticeable change, something that can create an impact.
  • Define the Acceptance Criteria up front. Don't leave it open to interpretation.

We do all of the above steps and more, such as focus groups, CAB, 5 Second Tests, recording the mouse flow, studying the bounce rate, and a lot more. However, we do not make only data-driven decisions.  Sometimes we just trust our guts, especially those of our UX experts.

Steven Schkolne Computer Scientist on a Mission

February 25th, 2015

I strongly agree with everything Josh Orum says above. One way to think about A/B testing is optimization - we could call it hill climbing. It can point the way to get up the hill, but once you're at the top - you can't get any further up.

A good designer can see a spot in the distance, and take you to a new hill. A new metaphor, etc., that enhances usability. A/B testing is mechanical and can't make these leaps to new paradigms. However, A/B testing can often increase key factors by 50% or more, so you're a fool if you don't use it as some part of the process.
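The hill-climbing picture can be sketched in a few lines. The landscape below is entirely made up: a one-dimensional "design space" with a small hill at 10 and a much taller one at 40. The names `hill_climb` and `design_score` are hypothetical. Each step stands in for an A/B test that compares the current design with its nearest variants and keeps the winner.

```python
def design_score(x: int) -> float:
    """Made-up landscape: a small hill peaking at x=10 (score 5)
    and a tall hill peaking at x=40 (score 20)."""
    return max(5 - abs(x - 10), 20 - 2 * abs(x - 40))


def hill_climb(score, start: int, step: int = 1, max_steps: int = 1000) -> int:
    """Repeatedly move to the better neighbouring design; stop when
    neither neighbour beats the current one (a local peak)."""
    position = start
    for _ in range(max_steps):
        best_neighbor = max((position - step, position + step), key=score)
        if score(best_neighbor) <= score(position):
            return position
        position = best_neighbor
    return position


if __name__ == "__main__":
    for start in (5, 30):
        peak = hill_climb(design_score, start)
        print(f"start at {start}: stops at {peak} with score {design_score(peak)}")
```

Starting on the small hill, the incremental search stops at its local peak and never sees the taller one. Getting there takes a jump to a different starting point, which is the role the designer plays in this metaphor.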

In-person user testing - in my experience - is barely mentioned in online forums where everyone extols A/B virtues. In my practice I've found this form invaluable. This isn't a focus group. People aren't being asked their opinions. Instead, an experienced, quite human designer silently observes the interface being used, reads the user, and utilizes their intuition to create new hypotheses for possible improvements. No entrepreneur wants to hire a usability lab to do this for them. I feel in the industry usability labs are stigmatized as a gratuitous expense. However the creative entrepreneur can get many of the benefits of this kind of testing by being resourceful. And the big company, imho, should spend the cash to hire a real firm because the ROI can be significant.

Daniel Turner Interaction Designer, Xerox PARC

February 26th, 2015

Good points above -- one quick addition.

A/B testing can, if the questions are good and based on data already (think about how Alberto Cairo teaches his journalism students to interrogate the data), answer certain questions.

What A/B testing is really poor at, if it can do it at all, is answering any "Why" questions. So you may find that 13% more users click on orange than green. Why? It could be that the contrast with the button text is better in one than the other, it could be that your population has certain color blindness (Color Oracle is your friend), it could be cultural (Molly Holzschlag has a great study on this) -- but A/B testing won't tell you.