Test & Learn is more than just A vs B
20th May 2019
Author Marketing Automation Team
A/B testing is ubiquitous in business today. Marketing and Customer Experience design teams are no strangers to the idea of putting two solutions head to head and deciding on a “winner” based on which performs better along specific Key Performance Indicators. A/B testing has gained popularity in recent years because of the clear business value it offers to practitioners. Good A/B/(n) testing methodology can indeed enable teams to rapidly test ideas while mitigating the impact of failure, know which ideas work best, and gain valuable knowledge about customer behaviour that makes future efforts even easier. The problem is that most Marketers and Customer Experience practitioners unknowingly impair their own ability to gain these benefits.
The sources of impairment usually manifest along four different themes:
1. Skipping hypotheses
i.e. Designing multiple solutions to test without a clear hypothesis about which should perform best and why.
2. Comparing apples and oranges
i.e. Pitting two solutions head to head that have too many differences between them.
3. Being out of control (groups)
i.e. Neglecting to answer the question “what if we did nothing?”
4. Committing statistical sins
i.e. Failure to follow the rules of statistical analysis.
I’ll address each source in turn and provide advice for how to avoid them.
Skipping hypotheses
Getting the most from your Test & Learn efforts starts with a sound hypothesis. Without one, you will do a lot of testing but learn very little, and you risk wasting time and resources testing something that is unlikely to move the metrics that truly matter to your organisation.
A good hypothesis in Test & Learn frameworks has four components.
1. First, it identifies a problem or opportunity for improvement: “Our display ads have a very high click-through rate, but many customers are put off by needing to log in or sign up to our website before getting to the product they saw advertised.”
2. Second, it must articulate evidence for the problem: “Our goal conversion funnels show a high proportion of people who come from digital ads for products dropping off at the login landing page.”
3. Third, it must suggest a solution: “We can increase the proportion of people who purchase by landing them directly on the product page and having them log in or sign up just before purchase.”
4. Finally, a good hypothesis must say why the proposed solution should work: “Customers can fall in love with the product as quickly as possible, increasing their likelihood to overcome obstacles such as logging in to buy it.”

Once you’ve formed a good hypothesis, you can design variations that test it and learn from the results. If you’re missing even one piece of a good hypothesis, you could be wasting your time and money.
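The four components above lend themselves to a simple checklist that can gate whether a test is designed at all. A minimal Python sketch, where the `Hypothesis` class and its field names are my own illustration rather than part of any standard tool:

```python
from dataclasses import dataclass, fields

@dataclass
class Hypothesis:
    """A Test & Learn hypothesis is only complete when all four parts are filled in."""
    problem: str    # the problem or opportunity for improvement
    evidence: str   # the data supporting the problem
    solution: str   # the proposed change
    rationale: str  # why the proposed change should work

    def is_complete(self) -> bool:
        # Every component must be non-empty before variations are designed.
        return all(getattr(self, f.name).strip() for f in fields(self))

h = Hypothesis(
    problem="Customers drop off at the login page before seeing the product.",
    evidence="Conversion funnels show high drop-off on the login landing page.",
    solution="Land ad clicks directly on the product page.",
    rationale="Seeing the product first increases willingness to log in later.",
)
print(h.is_complete())  # True: all four components are present
```

Refusing to run a test while `is_complete()` is false is one lightweight way to enforce the discipline described above.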
Comparing apples to oranges
Designing a good hypothesis is trickier than it looks, but properly testing it can be trickier still. It is very common for practitioners to decide they’ll run an A/B test, form a hypothesis (or not), and then put two wildly different variations of a customer experience into market. Sure, at the end of the test they’ll know which version “won,” but they will have no idea why. And it’s the “why” that counts. Worse, if the newly-designed version actually performs below current state, the team is left with no insight into what to try next.

The correct way to test a hypothesis is to design variations that differ only along the dimensions within the scope of that hypothesis. For example, to test the hypothesis formed in the section above, you’ll need a version of the ad that links directly to the product page and a version that does not. However, if you expand that hypothesis to suggest that making the login process optional would deliver the highest uplift in purchases, because it lets people skip that tedious step, you now need to create a test that covers both suggested improvements.
You are going to vary two elements at once, each with two possible states. Therefore, you’ll need four variations:
1) Ad links to login, login is mandatory;
2) Ad links to login, login is optional;
3) Ad links to product page, login is mandatory; and
4) Ad links to product page, login is optional.
Incidentally, you’ll also need to increase your experiment audience size to ensure you have a large enough sample to test all your variations. Without any one of these four variations, you would have trouble knowing why any particular variation outperformed any of the others. Did variation 4 win because the ad linked to the product page, or because the login was optional, or both?
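The full set of variations for a multi-factor test like this is just the Cartesian product of each factor’s levels, which is easy to enumerate programmatically. A short sketch (the factor and level names are illustrative):

```python
from itertools import product

# Two factors, each with two levels, as described above.
factors = {
    "ad_destination": ["login_page", "product_page"],
    "login": ["mandatory", "optional"],
}

# Every combination of factor levels is one variation to put into market.
variations = [dict(zip(factors, combo)) for combo in product(*factors.values())]
for i, variation in enumerate(variations, start=1):
    print(i, variation)
print(len(variations))  # 4 variations for a 2x2 design
```

Adding a third factor with two levels would double the count to eight, which is why multi-factor tests demand a much larger audience.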
Being out of control (groups)
The third source of impairment often occurs because many practitioners leap straight into testing ideas on how to improve the conversations they’re having with their customers. The problem is, practitioners often aren’t sure how effective those conversations are in the first place.
Causal attribution of value in marketing is difficult to achieve. But if you have never established how effective the current state is, it doesn’t matter whether a hypothesised improvement beats it: both could still be worse than simply leaving your customers alone.
Luckily, A/B testing is a remarkably effective way to establish a causal link between your design decisions and any additional value your company sees. To gain this benefit, randomly allocate a subset of the audience to be left out of your solution (i.e. create a control group). You can save time and still test new ideas by designing A/B/C(ontrol) tests. Once you have done this a few times, you’ll begin to see consistent differences in effectiveness (or not) between your solutions and your control group. Now you can confidently say that the specific marketing solution you designed typically contributes a certain amount of value to your company. Then, when you improve upon that performance, you can be sure your marketing efforts are contributing a net positive value.
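Random allocation to a control group can be as simple as hashing each customer ID into a bucket, which keeps every customer in the same group no matter when or where they are seen. A stdlib-only sketch, where the group names and split proportions are assumptions for illustration rather than a recommendation:

```python
import hashlib

def assign_group(customer_id: str,
                 groups=(("control", 0.10), ("A", 0.45), ("B", 0.45))) -> str:
    """Deterministically map a customer ID to a group via a hash bucket."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, share in groups:
        cumulative += share
        if bucket < cumulative:
            return name
    return groups[-1][0]  # guard against floating-point rounding

# Simulate allocating 100,000 customers to check the split.
counts = {"control": 0, "A": 0, "B": 0}
for i in range(100_000):
    counts[assign_group(f"customer-{i}")] += 1
print(counts)  # roughly 10% control, 45% A, 45% B
```

Because the assignment is a pure function of the ID, the same customer always lands in the same group, so the control group stays untouched for the life of the experiment.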
Committing statistical sins
The final source of self-impairment is a mistake made by business and science practitioners alike. Failure to understand and respect the limitations of statistical analysis results in biased decision-making and makes the whole Test & Learn process no better (and possibly much worse) than relying on gut feeling.
If you are using traditional null-hypothesis significance tests for your analyses, make sure you plan sample size in advance using a power calculation tool, select the correct statistical test for the experiment you’re running, pay attention to confidence intervals (not just whether or not you can reject the null hypothesis), and meta-analyse experiments that examine the same hypotheses and metrics. If you don’t, you run the risk of making a disproportionate number of decisions based on “winners” and “losers” that should never have been declared. It is most prudent to engage a data analyst with a strong foundation in statistics to analyse your Test & Learn experiments.
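Planning sample size in advance only requires the baseline conversion rate, the minimum uplift you care about detecting, and your chosen error rates. A stdlib-only sketch of the standard two-proportion power calculation (normal approximation; the 5% and 6% figures are illustrative, not from the article):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate customers needed per variant to detect a shift from p1 to p2
    with a two-sided z-test on proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# How many customers per variant to detect a lift from 5% to 6% conversion?
print(sample_size_per_variant(0.05, 0.06))
```

Note how quickly the required sample grows as the detectable effect shrinks: halving the minimum uplift roughly quadruples the audience you need, which is exactly why the four-variation test above demands a larger experiment audience than a simple A/B.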
If any of these sources of Test & Learn impairment sound familiar to you, you may want to consider reviewing your Test & Learn programme and upskilling your team on how to effectively apply the scientific method to design processes. Get in touch with Davanti Consulting to learn more about establishing a robust and repeatable Test & Learn framework for your organisation.