Why Should We Use Controlled Experiments for A/B Testing?
Published on Apr 2, 2024
by Jonas Alves
Using controlled experimentation for A/B testing on your websites and apps is a powerful way to make informed decisions, leading to improved user experiences, higher conversion rates, and overall better website performance. By adopting a test-and-learn approach to online experimentation, you can continuously refine and enhance your website or app to meet the evolving needs of your audience.
Improved User Experience (UX): Experimentation allows you to test different elements of your website—from layout, content, images and navigation to algorithms, infrastructure and libraries—to see what works best for your users. By analyzing how changes impact user behavior, you can make informed decisions that enhance the user experience, leading to higher satisfaction and engagement rates.
Data-Driven Decisions: Instead of relying on guesswork or subjective opinions, experimentation provides concrete data on what works and what doesn't. This data-driven approach helps in making decisions that are more likely to improve your website's effectiveness and achieve your business goals.
Increased Conversion Rates: Through A/B testing or multivariate testing, you can identify the website variations that lead to higher conversion rates, whether it's signing up for a newsletter, making a purchase, or any other action you want your visitors to take. Experimentation helps in optimising these key performance indicators (KPIs) effectively.
Personalisation: Online experimentation lets you tailor the website experience to different segments of your audience. By A/B testing personalised content, offers, and messages in your app or on your website, you can better meet your users' needs, leading to increased engagement and loyalty.
Risk Management: Before making changes to your website, experimentation allows you to test those changes and see the impact they have on your guardrail metrics (revenue, page load times, errors, …). This approach helps mitigate the risks associated with website redesigns or functionality updates, identifying and addressing any potential negative impacts before a full rollout.
Figure 1: A/B testing framework from ABsmartly experimentation platform
When planning your product experiments, you need to be clear on what you’re hoping to achieve and which business and customer needs you’re addressing. Ensure that you have a clear hypothesis in mind when designing the experiment because this will allow you to predict what result you’re expecting to see. An important part of the process is understanding how you’re going to measure the results—which metrics are you looking at to prove your hypothesis?
You’ll also need a plan for the details of how you’re going to implement the experiment; make sure you calculate how long the experiment will need to run to detect the impact you expect. If it would run too long, you might need to work on something bolder with a higher chance of a bigger impact, or use a surrogate metric that gives you more statistical power.
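To make that runtime calculation concrete, here is a minimal power-analysis sketch in Python. The baseline conversion rate, expected lift, and traffic figures are illustrative assumptions, not values from any real experiment:

```python
# Minimal sample-size and runtime estimate for a two-variant conversion test.
# The baseline rate, lift, and traffic figures below are illustrative assumptions.
from scipy.stats import norm

def required_sample_size(p_base, rel_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift in conversion rate."""
    p_var = p_base * (1 + rel_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # power requirement
    p_bar = (p_base + p_var) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p_base * (1 - p_base) + p_var * (1 - p_var)) ** 0.5) ** 2
    return numerator / (p_var - p_base) ** 2

n = required_sample_size(p_base=0.04, rel_lift=0.05)  # 4% baseline, +5% relative lift
daily_visitors = 20_000                               # traffic entering the experiment
print(f"~{n:,.0f} visitors per variant, ~{2 * n / daily_visitors:.0f} days to run")
```

If the projected runtime comes out in months rather than weeks, that is the signal to aim for a bolder change or a more sensitive surrogate metric.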
If the product experiment takes a long time to implement, it might be a good idea to gather some early feedback to understand whether users are interested in the feature. This is sometimes called the painted door test (a.k.a. smoke test). That’s how user accounts were introduced at Booking.com: the login functionality was initially implemented client-side only, with everything stored in client-side cookies; no server-side implementation was done until we found a way to display it to users without decreasing the conversion rate.
The final step in the online experiment design process is to plan what to do if your results are significantly positive, significantly negative, or inconclusive. After all, you’re conducting experiments to improve customer experience or business results and processes.
Culture of Experimentation
When I joined Booking.com in 2008, just two of us were running experiments: Luciano as the designer and myself as the developer. It wasn’t a sophisticated process: we had a single metric (booking conversion rate) and used a Chi-square test for significance.
That was all we needed at the time, but as we grew and became more sophisticated, we felt the need to refine our processes and to grow that culture across more teams and departments in the organisation. As David Vismans, Chief Product Officer at Booking.com, said in a Harvard Business Review article(1) in 2020:
A/B testing is a really powerful tool; in our industry, you have to embrace it or die. If I had any advice for CEOs, it's this: large-scale testing is not a technical thing; it's a cultural thing that you need to fully embrace.
You need to ask yourself two big questions: How willing are you to be confronted every day by how wrong you are? And how much autonomy are you willing to give to the people who work for you?
And if the answer is that you don't like to be proven wrong and don't want employees to decide the future of your products, it's not going to work. You will never reap the full benefits of experimentation.
In a nutshell, the message is that A/B testing is very powerful, but for a culture of experimentation to really bed in, leadership must be open to being wrong and to giving their teams the autonomy to follow a general direction rather than mindlessly sticking to specific instructions. I believe leaders play a critical role in shaping the organisation's mindset and approach to experimentation. Leadership with a growth mindset encourages curiosity and iteration and makes it safe to treat failure as a learning experience.
How to Democratise Experimentation
If you want to democratise experimentation, tests need to be easy to set up, and decision-making needs to be cascaded down to the team. Aleksander Fabijan (Microsoft), Benjamin Arai (Microsoft), Pavel Dmitriev (Outreach.io), and Lukas Vermeer (Booking.com) discuss this notion of the A/B testing flywheel(2).
What we did at Booking.com was decentralise control; individual teams could set up experiments with no external help, so as not to lose time or momentum with meetings or clashing egos. Ultimately, the best way to learn is to actually do something. As Lukas Vermeer frequently says, initially the objective should not be to run the best tests but simply to run tests. You’re trying to get teams comfortable executing experiments.
My advice is to start with one team, let that team learn by doing and slowly iterate, and then, once they get into a rhythm, rinse and repeat with the next team. Anyone, even without a statistics background, should be able to make a decision. In the beginning, you’ll likely show just a p-value for each experiment, and that’s fine; not everyone will feel comfortable making decisions independently at first. What matters is that people ask questions and communicate with each other.
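As an example of that early stage, a single p-value can come from a simple chi-square test on the conversion counts, the same kind of test we used in the early Booking.com days. A minimal sketch, with counts invented for illustration:

```python
# A single p-value from a chi-square test on conversion counts.
# The counts below are made up for illustration.
from scipy.stats import chi2_contingency

#            [converted, not converted]
control   = [1_020, 24_480]   # variant A: 4.0% conversion
treatment = [1_135, 24_365]   # variant B: 4.5% conversion

chi2, p_value, dof, expected = chi2_contingency([control, treatment])
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A p-value below your significance level (commonly 0.05) suggests the
# difference in conversion rate is unlikely to be chance alone.
```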
Over time, you’ll want to lower the decision-making cost and run more agile experiments.
You’ll get to the stage where you’re automating your reports, maybe even building a sequential testing engine with efficacy and futility boundaries. Then, as soon as one of those boundaries is crossed, it is immediately clear that the test should end and what the decision is.
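To illustrate the idea in a deliberately simplified form, the sketch below uses an O'Brien-Fleming-style efficacy boundary and a crude futility rule. A production engine would use proper alpha- and beta-spending functions; the boundary shapes and thresholds here are illustrative assumptions only:

```python
# Toy sequential decision rule with efficacy and futility boundaries.
# Deliberately simplified: a real engine would use alpha/beta-spending
# functions; these boundary shapes are for illustration only.
import math

def sequential_decision(z, t, z_final=1.96):
    """Decide at an interim look.

    z -- current z-statistic for the treatment effect
    t -- information fraction: share of the planned sample seen so far (0 < t <= 1)
    """
    efficacy_bound = z_final / math.sqrt(t)  # O'Brien-Fleming-like: very strict early on
    if z >= efficacy_bound:
        return "stop: efficacy boundary crossed, ship the winner"
    if t >= 0.5 and z <= 0.0:                # crude futility rule: no positive trend by halfway
        return "stop: futility boundary crossed, abandon"
    return "continue collecting data"

print(sequential_decision(z=1.8, t=0.25))  # bound ~3.92 -> continue
print(sequential_decision(z=2.9, t=0.64))  # bound ~2.45 -> stop for efficacy
```

The key property is that the efficacy boundary is very high early on, so only overwhelming evidence stops a test in its first days, and it relaxes toward the usual fixed-horizon threshold as the planned sample fills up.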
As the number of teams running experiments grows, they might start stepping on each other's toes.
This is where the communication piece starts becoming more important than ever. It’s easy to know which tests are running on the website when you have a team of two, like Luciano and me in the beginning, sitting at adjacent desks at Booking.com. But when you grow the organisation to dozens, hundreds, or even thousands of people running experiments, it’s a completely different beast.
You need to broadcast every change to the whole organisation. That’s why at ABsmartly, as at Booking.com, we’ve embedded a social network into the tool. Every action or decision is documented and broadcast, giving full transparency and allowing people to challenge each other's decisions, even if they were not part of the team implementing the test. Using the collaboration tool, you can embed graphs in the comments and upload screenshots, and soon you’ll be able to tag people and even deep-link to specific results, or parts of the results page, that might interest the wider team. You can also integrate with Slack, one of the most common business collaboration tools in use today.
The tool should also let teams build a knowledge base for running better experiments in the future: a full-text search that can find results by their impact, by the platform or the place on the page that was changed, by the team that ran them, or by the discussion they created.
Scale Your Experimentation
We help organisations improve their experimentation culture and run better experiments because informed, experiment-backed decisions improve user experience, convert more visitors, make businesses more efficient, and make the internet a better place for everyone. By adopting a culture of experimentation and encouraging the whole organisation to get involved in the process, you can continuously refine your website and app products to meet customers’ needs more effectively.
(1) https://hbr.org/2020/03/building-a-culture-of-experimentation
(2) https://medium.com/booking-product/it-takes-a-flywheel-to-fly-b79ad69a62ee