As data-driven marketers, we would ideally test every single change before rolling it out globally. This is essential to be sure that your changes really have a lasting positive effect, and that it is not just your personal opinion of what works best, or coincidence. But how do you create A/B tests that really give you the results you need? How do you know how long an experiment has to run? And what alternatives are there when you simply have too little data? This blog post answers all of those questions, so let’s get started.
Not only big companies but also small and medium-sized ones want to continually improve their performance, and many turn to A/B testing to do so. The problem is that the conversion volume is often far too small to reach statistical significance in a reasonable time, as you usually don’t want to run a single experiment for half a year. Search Engine Journal states (among many others) that you need at least 1,000 conversions per month and 250 per test for your results to be significant. In the beginning, it often looks like one variant or the other is performing better, sometimes even for weeks.
But what often happens is that, as more and more data comes in, the results regress to the mean.
So, to estimate how long your test will have to run and how much traffic you will need, it is essential to do a sample size calculation BEFORE you start your test. Evanmiller.org offers a great, free online tool. You need to know your baseline conversion rate and the minimum detectable effect: the smallest difference from the current conversion rate that you expect your change to produce. Naturally, to detect a very small effect you will need to collect a lot of data to be sure it is not pure coincidence. To detect big changes you will need much less data, but the changes you test will probably have to be more substantial. Based on your input, the tool will give you the sample size PER VARIATION.
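If you prefer to script this calculation yourself, the standard two-proportion sample size formula can be sketched in a few lines of Python. This is an approximation; the exact numbers from Evanmiller.org may differ slightly depending on the formula variant its calculator uses.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_variation(baseline, mde, alpha=0.05, power=0.8):
    """Approximate sample size per variation for a two-proportion test.

    baseline: current conversion rate (e.g. 0.03 for 3%)
    mde:      minimum detectable effect, absolute (e.g. 0.01 for +1 point)
    """
    p1 = baseline
    p2 = baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    pooled = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * pooled * (1 - pooled))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde ** 2
    return ceil(n)

# Hypothetical example: 3% baseline, detect an absolute lift of 1 point
print(sample_size_per_variation(0.03, 0.01))  # roughly 5,300 per variation
```

Note how quickly the number grows as the minimum detectable effect shrinks: halving the effect you want to detect roughly quadruples the required sample.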
Once you have an estimate of your sample size, you can start your test. At some point, you will want to know whether your results are significant or not. An excellent online tool for this is provided by VWO. It is very tempting to test for significance every day, because you hope your experiment will work. But keep in mind that at a 95% significance level, about 5 out of every 100 checks can come up significant out of pure coincidence (not literally 5 in every 100, but if you test very often, on average you will see about 5 false significant results per 100 checks).
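Under the hood, significance calculators like this typically run a two-proportion z-test. A minimal Python sketch of that test follows; the conversion and session numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

# Hypothetical data: 150/5,000 conversions (A) vs 190/5,000 (B)
p = ab_test_p_value(150, 5000, 190, 5000)
print(f"p = {p:.4f}", "-> significant at 95%" if p < 0.05 else "-> not significant")
```

A p-value below 0.05 corresponds to the 95% significance level discussed above; remember that checking this every day inflates your chance of a false positive.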
If you have enough traffic and data and your result is significant (hopefully showing better results for your new variant), you have a real A/B test and a real result.
But what can you do if you can never reach the required traffic and conversion numbers, because you have too little traffic or too low a conversion rate? The solution is not to take your final sale or sign-up as the conversion to test, but to use micro-conversions. A micro-conversion can be the click-through rate to the first step of the sign-up or sales funnel, the use of the site search, or even the bounce rate, all of which typically occur at much higher rates than your final conversion. With this, you can still get significant results for your A/B tests, even if you don’t have a huge number of conversions every month.
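To see why micro-conversions help, you can compare the required sample sizes for the same relative lift at two different base rates. A rough sketch using the standard two-proportion formula; the 2% purchase rate and 30% funnel click-through rate are hypothetical examples.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate sample size per variation to detect a relative lift in a rate."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * pooled * (1 - pooled))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

# Same 10% relative lift, two different metrics:
print(sample_size(0.30, 0.10))  # 30% funnel click-through rate (micro-conversion)
print(sample_size(0.02, 0.10))  # 2% purchase rate (final conversion)
```

With these assumed rates, the micro-conversion needs on the order of a few thousand sessions per variation, while the final conversion needs tens of thousands, which is exactly why switching metrics can make testing feasible on a low-traffic site.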
So, if you have a lot of traffic, just make sure you know beforehand how many sessions you need per test variation, and only test for significance once you have reached that number. If you don’t have that much traffic, start thinking about which alternative metrics you can use to still get significant results. With all of this said, you are prepared to implement and evaluate real A/B tests and hopefully keep improving your performance.