The Seven Steps to a Split Test That Gets Results
OK, you’ve heard me pound the table about split, or A/B, testing. And, yessir, I’m at it again. Because it really is that important.
But I’m going to go you one better in this week’s sermon: I’m going to give you seven steps to take so you can run and benefit from your own split test.
Come forward, brothers and sisters, and hear the Word. Ye shall conduct split testing, and ye shall heal thy conversion optimization wounds.
Pardon my fervor. If you’ve heard me before, you know that split testing alone is not the answer to conversion optimization. But A/B testing is the foundation of optimization. Any part of your website or marketing program can – and should – be split tested to determine user preference.
And, lo: the seven signs. Steps. Below I outline the seven steps to valid and usable A/B split testing results.
Pope Francis is a chemist by training. You can bet he’s a believer in A/B testing.
The Beauty of the Classic A/B Split
“A/B testing” is one of those perfect, self-descriptive names. The test is between two variables, A or B, and it’s a split; each will be the choice of a certain percentage of your test participants. For example, you test two versions of your homepage that have different headlines. Which one works better, A or B? If 70 percent of your respondents pick A and 30 percent pick B, you know which way to go.
In a conversion optimization split test, A and B are each shown to half (50 percent) of users, and if A results in a 25 percent conversion rate and B has a 12 percent conversion rate, Option A is your winner.
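The arithmetic behind that call can be sketched in a few lines of Python. The visitor and conversion counts below are hypothetical, chosen to mirror the 25-versus-12-percent example above:

```python
# Score a classic 50/50 A/B split. All counts here are hypothetical.

def conversion_rate(conversions, visitors):
    """Fraction of visitors who completed the goal action."""
    return conversions / visitors

# Each version shown to half the traffic: 1,000 visitors apiece.
rate_a = conversion_rate(250, 1000)  # 0.25, i.e. 25 percent
rate_b = conversion_rate(120, 1000)  # 0.12, i.e. 12 percent

winner = "A" if rate_a > rate_b else "B"
lift = (rate_a - rate_b) / rate_b    # relative improvement of A over B
print(winner, round(lift, 2))        # prints: A 1.08
```

That "lift" number, A's relative improvement over B, is usually what a testing tool reports alongside the raw rates.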
Courtesy of Optimizely, a classic A/B split test for the ACME company:
It’s pretty simple, but it’s also more than, “Here they are. Pick one.” For one thing, split testing can go beyond A/B and offer C, D, E … Strictly speaking, testing several versions of a single element is an “A/B/n” test, while “multivariate testing” varies more than one element at once. Either way, you have more than one split.
The thing is, whether A/B or multivariate, if you do your split test correctly, it can yield valid, actionable information. Or you can do it wrong and wind up with invalid, misleading data. You want to make sure you are acting on valid data from your split testing.
Lemme tell you how to wind up with valid data from your split test:
1. Establish a baseline. Before you can make comparisons, you need to know what your expectations are. Gather analytical data for your site or campaign as it exists before you do any testing. If your site, campaign, etc., is new, you’ll need to let it run as is for a while to make sure you have enough data to paint a picture of how you are doing. You have to know how well A is being received before you can learn much about how A performs when it’s presented alongside B. This is sometimes referred to as A/A testing. You’re testing what you have against itself.
2. Decide what you want to test. Right, that’s so simple; of course, you have to know what you want to test. But you could split test anything, down to fonts, colors, phrases or single words.
Below (from Which Test Won) the test is a comparison of CTA copy: “See Product Video” vs. “Watch Demo” in the orange button under “It’s time to thrive.”
So you need to set some priorities. You should test features near the end of the conversion funnel first. If you’re setting up a new site or campaign, test your call to action text first. The last place you want to lose someone is at the CTA. Headlines on your pages are crucial to making people want to read further once they land there. They’re good to test.
If you’re working with an established site or campaign, ask yourself where you are having problems. If you get a good click-through rate but users abandon you once they get to your landing page, you should test various elements of your landing page. Your analytics data will show you where people are abandoning you. The problem is there somewhere. You need to test to pinpoint it.
Defining what to test requires a hypothesis: We’re losing conversions because A is wrong. And to this proposition, you add: Let’s see how it compares to B.
3. Choose a limited number of elements to test. In a classic A/B test you test two specific items and one gets a thumbs-up and the other, by process of elimination, gets a thumbs-down. It’s pretty clear. But there could be many, let’s just say dozens, of elements to a single landing page. As the number of variables in a test grows, the potential combinations (splits) grow multiplicatively. The easiest split test for obtaining the clearest results is the A/B test. Unless you’re well-versed in statistical theory, resist the temptation to test a bunch of ideas at once, and just run an A/B split.
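To see how fast that multiplication gets away from you, here is a quick sketch. The page elements and option counts are hypothetical:

```python
# How page versions multiply in a multivariate test.
# The elements and option counts below are made up for illustration.
from math import prod

options_per_element = {"headline": 3, "cta_text": 2, "hero_image": 2}
versions = prod(options_per_element.values())
print(versions)  # prints: 12
```

Twelve versions means each one sees only a twelfth of your traffic, so reaching a valid sample takes roughly six times as long as a two-way split on the same site.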
4. Create a variation. Here, I’m going to tell you to reinvent the wheel. Or to make someone else do it. Your copywriter, designer, whoever, has given you their best idea, and now you need another that’s just as good to see whether their first best idea is actually any good. Don’t worry. They’re used to it. In fact they probably have several pretty good ideas that didn’t get used.
Anyhow, you need a B to pair with your A.
Below is an illustration of the classic A/B split test for the position of a signup form, courtesy of Smashing Magazine. Except (in addition to a repeated typo), the illustration indicates two other differences between A and B. Version A has a flush-left page title and B has a flush-right title, and the position of the navigation bar differs, as well. Did users respond to the signup form’s position or the page title’s, or the nav bar’s? As you deploy your variation, make sure the only differences between versions are the ones you mean to test.
Early in the game, test against a variable that represents a markedly different approach. As you move more toward fine tuning a campaign (testing should be constant), the variance between A and B should get smaller. This may make creating the variation easier. Maybe you just need a synonym for the keyword you want to test in the headline. Or you want to see whether adding the name of your city or state makes a measurable difference. Maybe you’re testing a green CTA button against a red one.
But everyone knows green means “go” and red means “stop,” so why test that? Because HubSpot said that in their A/B test on button color, the red button scored a 21 percent increase in CTRs over the green button, that’s why. Don’t assume. Test.
5. Get help. Split testing is easy, but not something you want to try on your own. That sounds contradictory but it’s not really. You could do it, but it’s not an efficient use of your time. If you’re going to get a statistically valid sample, that could take a significant chunk of time that you could put toward other uses. There are services that can run your test for you while you take care of other pressing needs, which I’m sure you have.
Among the better ones available (IMHO) are:
- Optimizely, which has an easy-to-use dashboard and a variety of useful tools.
- Convert, which features easy test setup and runs tests automatically. It also supports multivariate tests.
- Visual Website Optimizer, which has a strong suite of metrics and many other features.
- Google Analytics “Content Experiments,” which is a part of Google’s package of tools and free, but not necessarily the easiest to use.
- Adobe Target, which is integrated with Omniture analytics, can help you run A/B and multivariate testing, and has other optimization assistance features.
There are others you can find with a simple search. You may need to look at a few to find the one that fits you best.
6. Run the test long enough. Just as you need enough data to establish a baseline, you need test data that truly represents traffic over a substantial period of time. How long that takes depends on several variables, such as traffic and the number of elements being tested. An A/B test is faster than a multivariate test because there are fewer variables in the A/B split. Your testing service’s software will include a statistical significance tool, which will track responses and let you know when you have a statistically valid sample.
A split test should cover at least two full, seven-day weeks to account for natural buying cycles according to day of the week and time of day. If there was a holiday or an impactful news event during one of your test weeks, that could have significantly skewed results. Add another week.
The necessary length of a split test will also depend on the length of your customers’ purchasing cycle. It needs to cover at least two, but preferably more, purchasing cycles. For example, in Google Analytics, the Multi-Channel Funnels section’s Time Lag and Path Length reports show how long (in days and in interactions) it takes users to ultimately become customers. If you’re selling big-ticket items, you’re likely working with a longer purchasing cycle as customers research, consider, hem, haw and come to your site again and again before deciding to buy.
Opinions and testing tools differ, but a rule of thumb to use is that you want at least 100 conversions per variation. The statistical confidence, which your testing tool will report, should be 95 percent or better. If you have 100 conversions per variation and a confidence of 85 percent, you may need to keep testing until you have 200 to 250 conversions per variation. Don’t be fooled by a 95 percent confidence with a small sample of conversions, which some tools will report. Above all, get a proper sample.
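For the curious, the confidence number your tool reports typically comes from a two-proportion z-test, which can be sketched as below. The conversion counts are hypothetical, and in practice you should lean on your tool’s report rather than rolling your own:

```python
# Sketch of the significance check behind a split-test report:
# a pooled two-proportion z-test. All counts here are hypothetical.
import math

def confidence(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided confidence (0..1) that A and B genuinely differ."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = abs(p_a - p_b) / se
    return math.erf(z / math.sqrt(2))  # equals 2 * Phi(z) - 1

# Hypothetical: 120 of 1,000 visitors converted on A, 150 of 1,000 on B.
c = confidence(120, 1000, 150, 1000)
print(round(c * 100, 1), "percent confidence")
```

If the number comes in under 95 percent, keep the test running rather than calling a winner early.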
7. Keep on testing. If you are a marketer, “testing” is part of your job description. You should be testing constantly to gain a clearer understanding of your customer base and what kinds of marketing material work, from keywords to ad copy, headlines, images and so on. I’ve seen one blog post that suggests more than 150 email campaign elements you can split test. That’s overdone, maybe, but a list that long is instructive in the sense that it reminds us that, yeah, you can test that, too.
Those Who Test Are Those Who Know
So there you are, a split-testing plan you can put to use right away. To boil it down, run the data so you know what’s going on with whatever part of your marketing campaign you’re looking at, pinpoint a problem area and develop a hypothesis about why it is a problem. Then create a variant and test A and B to see what your users say. And then test some more.
Trust me, you’ll love having test results and knowing whether something does or doesn’t work for you.