Optimising A/B Testing for Low-Traffic Sites or Apps

Published on Oct 24, 2024

by Jonas Alves

A/B testing is incredibly valuable for businesses looking to refine their websites and apps and understand the user experience better. But experimentation is significantly more challenging for low-traffic sites and apps, by which we mean those with a few thousand visitors or users per month. Drawing statistically significant conclusions is much more difficult with fewer visitors: when traffic is low and you're working with small numbers, it's harder to collect meaningful data that confirms the success (or otherwise) of your tests. Despite these challenges, experiments on low-traffic sites and apps can still be effective if businesses think creatively, make a few key strategic adjustments, and use a reliable A/B testing platform.

Understanding Statistical Significance with Low Traffic

Businesses that are running experiments need to make sure that their A/B test outcomes are real, not just random fluctuations. In high-traffic environments, achieving significance is relatively straightforward because large sample sizes make it easier to detect even small changes. However, low-traffic websites and apps face an uphill battle. A product with only a few thousand visitors or users each month will struggle to detect small effects, such as a 1% change in conversion rate, within the standard two-to-four-week experiment window.

One solution for low-traffic sites and apps is to make bolder changes. Instead of testing minor tweaks, such as small text changes, companies with low traffic should focus on more impactful alterations—such as redesigning key parts of the user journey or significantly changing the call-to-action buttons. Larger adjustments usually yield more dramatic results, making statistical significance easier to achieve in a shorter time frame.
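To make this concrete, here is a rough sketch of how the required sample size shrinks as the effect gets bigger, using a standard two-proportion power calculation from statsmodels. The 5% baseline conversion rate and the lift values are illustrative assumptions, not figures from any real experiment.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.05          # assumed 5% baseline conversion rate (illustrative)
analysis = NormalIndPower()

for relative_lift in (0.01, 0.10, 0.30):   # 1%, 10% and 30% relative lifts
    treated = baseline * (1 + relative_lift)
    effect = proportion_effectsize(treated, baseline)   # Cohen's h
    n_per_arm = analysis.solve_power(effect_size=effect, alpha=0.05,
                                     power=0.8, alternative="two-sided")
    print(f"{relative_lift:>4.0%} lift -> ~{n_per_arm:,.0f} users per variant")
```

On these assumptions, a 1% relative lift needs well over a million users per variant, while a 30% lift needs only a couple of thousand, which is exactly why bolder changes are more testable on low-traffic products.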

Effective Strategies for Low-Traffic A/B Testing

Businesses with low traffic need to adapt their testing approaches to account for the constraints of limited sample sizes. These are some of the strategies we recommend to ensure effective experimentation, even with fewer visitors.

Run Experiments Over Longer Periods

High-traffic sites and apps can often complete tests within a few weeks, but for low-traffic sites and apps, you need to be patient. Extending the duration of experiments allows more data to accumulate, increasing the likelihood of detecting statistically significant changes. There are downsides to running long tests, though: user behaviour can change over time, and users might switch devices or browsers, complicating the tracking of interactions and conversions.

Why Group Sequential Testing Is Well-Suited for Low-Traffic Sites and Apps

Not all A/B testing methods are created equal. For low-traffic products, group sequential testing (GST) allows businesses to make interim decisions throughout the experiment without sacrificing statistical integrity. This means that if a test reaches significance earlier than expected, it can be stopped, saving valuable time and resources.

This methodology is much more effective than the traditional fixed-horizon approach, where businesses have to wait until the predetermined sample size is reached before drawing any conclusions. Group sequential testing provides a pragmatic balance between speed and accuracy, making it an excellent option for low-traffic experimentation.
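The simulation below illustrates why those interim looks need adjusted thresholds. It assumes a normally distributed metric, four evenly spaced looks, and a Pocock boundary, which is just one of several GST boundary families and not necessarily what any particular tool uses. Peeking with the naive fixed-horizon threshold of 1.96 inflates the false-positive rate well above 5%; the adjusted boundary restores it.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, looks, n_per_look = 20_000, 4, 500
POCOCK_Z = 2.361   # Pocock critical value for 4 looks, overall alpha = 0.05

naive_fp = pocock_fp = 0
for _ in range(n_sims):
    # Simulate an A/A test: both variants draw from the same distribution.
    a = rng.normal(size=looks * n_per_look)
    b = rng.normal(size=looks * n_per_look)
    naive_hit = pocock_hit = False
    for k in range(1, looks + 1):
        n = k * n_per_look
        z = (b[:n].mean() - a[:n].mean()) / np.sqrt(2 / n)
        naive_hit |= abs(z) > 1.96        # peeking with the fixed threshold
        pocock_hit |= abs(z) > POCOCK_Z   # peeking with the GST boundary
    naive_fp += naive_hit
    pocock_fp += pocock_hit

print(f"false positives, naive peeking: {naive_fp / n_sims:.3f}")   # ~0.13
print(f"false positives, Pocock bounds: {pocock_fp / n_sims:.3f}")  # ~0.05
```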

Use Surrogate Metrics for Faster Results

Using surrogate metrics or proxy metrics—closely related to, but not exactly, the ultimate goal (or primary metric)—is a great way to speed up decision-making. For example, if the primary objective of an experiment is to increase revenue, it will take too long to gather enough data to make an informed decision when a business has low traffic. Instead, focus on intermediary actions, such as how many users add an item to their cart.

By concentrating on metrics that are closer to the behaviour being tested, businesses can get quicker feedback and adjust their strategies more rapidly. While surrogate metrics should not replace the final business objective, they offer an early indication of how users interact with the change, giving valuable insights along the way. A good surrogate metric should be strongly correlated with the original metric, and a good tool will still let you check the impact on the main metric after the decision has been made.
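One quick sanity check is to look at how well the surrogate has tracked the primary metric across past experiments. The sketch below uses invented lift numbers purely for illustration; real inputs would come from your own experiment archive.

```python
import numpy as np

# Hypothetical lifts observed in seven past experiments: the lift on the
# surrogate (add-to-cart rate) vs. the lift on the primary metric
# (revenue per user). All numbers are invented for illustration.
surrogate_lift = np.array([0.08, -0.02, 0.15, 0.01, 0.11, -0.05, 0.04])
primary_lift   = np.array([0.05, -0.01, 0.09, 0.00, 0.08, -0.04, 0.02])

r = np.corrcoef(surrogate_lift, primary_lift)[0, 1]
print(f"correlation across past experiments: {r:.2f}")
# A correlation close to 1 suggests the surrogate is a reasonable
# stand-in; a weak correlation means decisions based on it may mislead.
```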

How Power Calculations Help in A/B Testing

Power calculations are extremely useful, especially for low-traffic sites and apps, as they help determine how long a test needs to run with the traffic you have. Without an adequate sample size, tests risk being underpowered, making it impossible to tell whether an observed effect is genuine or just random noise.

They show how many users are needed and how long the test should run to achieve statistical significance, ensuring businesses don't conclude experiments too early or rely on inconclusive data due to insufficient traffic.
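As a rough sketch of what such a calculation looks like, here is a fixed-horizon duration estimate for a two-proportion test using the usual normal approximation. It assumes a 50/50 traffic split, and the baseline rate, expected lift, and weekly traffic are all made-up inputs; your testing tool's built-in calculator should be preferred, since it matches the statistics the tool actually uses.

```python
import math
from scipy.stats import norm

def weeks_needed(baseline, relative_lift, weekly_visitors,
                 alpha=0.05, power=0.80):
    """Rough fixed-horizon duration for a two-proportion test (50/50 split)."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    pooled = (p1 + p2) / 2
    n_per_arm = ((z_a * math.sqrt(2 * pooled * (1 - pooled))
                  + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
                 / (p2 - p1) ** 2)
    return 2 * n_per_arm / weekly_visitors

# 4% baseline conversion, hoping for a 15% relative lift, 3,000 visitors
# per week -- all illustrative numbers.
print(f"~{weeks_needed(0.04, 0.15, 3_000):.0f} weeks")   # roughly 12 weeks
```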

At ABsmartly, we have a built-in power calculator, so teams don't skip this step or have to jump out of the A/B testing software to an external calculator, which often assumes a different statistical method than the one the A/B testing tool will actually use.

Use Cases for A/B Testing on Low-Traffic Sites

For businesses with low traffic, A/B testing isn't always about improving conversion rates or boosting revenue directly. Sometimes, small-scale experiments are used for risk mitigation and quality assurance, ensuring that changes made to the product don't negatively impact user experience or system performance.

For example, a company might run tests to confirm that a new feature or bugfix functions correctly across different browsers or mobile devices and doesn't break other parts of the website, safeguarding the user experience and preventing issues before they spread. Using A/B tests in your development processes, whether to enhance your continuous integration or canary testing efforts or as another step of risk mitigation, not only speeds up the time to ship features but also gives you more sensitivity and specificity in detecting problems. See our webinar on real-time data for more details.
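One simple way to frame such a risk-mitigation test is as a non-inferiority check: instead of asking whether the new version is better, ask whether it is no worse than the old one by more than an acceptable margin. Below is a minimal sketch using a normal approximation for two proportions; the function name, margin, and all counts are invented for illustration.

```python
from math import sqrt
from scipy.stats import norm

def non_inferiority_pvalue(conv_old, n_old, conv_new, n_new, margin):
    """One-sided test of H0: new is worse than old by at least `margin`
    (in absolute conversion-rate points). A small p-value suggests the
    new version is non-inferior. Simple normal approximation."""
    p_old, p_new = conv_old / n_old, conv_new / n_new
    se = sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)
    z = (p_new - p_old + margin) / se
    return norm.sf(z)

# Did the refactor keep checkout conversion within 1 point of baseline?
p = non_inferiority_pvalue(conv_old=412, n_old=8_000,
                           conv_new=405, n_new=8_000, margin=0.01)
print(f"non-inferiority p-value: {p:.4f}")
```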

Running Multiple Experiments on Low-Traffic sites and apps

A common misconception is that low traffic limits the ability to run multiple experiments simultaneously. However, companies can run several tests at the same time, on multiple areas of the website or app and across multiple platforms (web app, mobile app), as long as these experiments don't interfere with each other. It's unwise to run two tests on the same element, for instance changing both its colour and its text simultaneously, since these variables could interact with each other and make the results of the experiment unclear.
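As a toy illustration of what such an interaction looks like in data, the sketch below fits a linear-probability regression with an interaction term on simulated users assigned to two concurrent tests. The effect sizes are invented, and this is just one simple way to surface interactions, not a description of any particular platform's detector.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 8_000

# Simulated users independently randomised into two concurrent experiments.
df = pd.DataFrame({
    "exp_a": rng.integers(0, 2, n),   # variant of experiment A (e.g. colour)
    "exp_b": rng.integers(0, 2, n),   # variant of experiment B (e.g. text)
})
# Invented "true" effects: each change helps alone, but together they clash.
p = 0.05 + 0.01 * df.exp_a + 0.01 * df.exp_b - 0.02 * df.exp_a * df.exp_b
df["converted"] = (rng.random(n) < p).astype(float)

# The exp_a:exp_b coefficient estimates the interaction; a significant
# negative estimate means the combined variants underperform expectations.
model = smf.ols("converted ~ exp_a * exp_b", data=df).fit()
print(model.summary().tables[1])
```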

At ABsmartly, we offer interaction detection, which notifies users when multiple tests might be interfering with each other. Although unplanned interactions are very rare, experimenters are frequently concerned about running dozens of experiments simultaneously on the same page. A platform with this feature gives them peace of mind: they can run many concurrent tests knowing that if there is an issue, the tool will alert them.

Democratising Experimentation

Effective A/B testing is still possible for companies with limited traffic by using strategies like bolder changes, longer tests, surrogate metrics, and experimentation as risk mitigation. Using a power calculator to check whether an experiment is feasible in a reasonable amount of time tells experimenters ahead of time whether they are good to go, whether they need to switch to a surrogate metric closer to the change, or whether they should run a non-inferiority test for risk mitigation instead. Interaction detection is also a valuable feature: it allows businesses to run multiple experiments, sometimes dozens or even hundreds simultaneously, without fear of interference, leading to faster, data-driven decisions.

While A/B testing on low-traffic sites and apps has some challenges, it’s certainly not beyond reach when approached intentionally and with the right tactics. 

Platforms like ABsmartly support experimentation for businesses of all sizes, with features such as group sequential testing, interaction detection and power calculations, making the process of experimentation simpler, smoother and more reliable.

