A 5-Step Guide to Set Up a Reliable Experiment (and Avoid the Most Common Pitfalls)
Published on Jul 22, 2025
by Christophe Perrin
High-quality experiments always start with a strong experiment design that allows you to learn, act, and make better decisions. But setting up a reliable experiment is harder than it looks, and many teams make the same avoidable mistakes. That is why we have put together this guide to take you through the required steps for setting up a trustworthy experiment, whilst also highlighting pitfalls to avoid along the way.
Step 1: Start with a Clear Business Question
Before thinking about variants, metrics, or technical details, ask yourself: what question are you trying to answer? Experiments are tools for decision-making, so the more precise your question, the more meaningful your experiment will be.
Step 2: Ideate Thoughtfully
Don’t default to testing the first idea that comes to mind. Instead, generate multiple hypotheses, ideally grounded in past data, user research, or earlier experiments, and then prioritise those ideas.
Your priors, i.e. what you already know or suspect, are the best predictors of whether your test will be successful. Hypothesis ideation and prioritisation are important factors for reducing false positives and increasing your learning rate.
Step 3: Formulate a Clear, Testable Hypothesis
A good hypothesis includes at the very least:
A specific change being introduced,
The expected direction of impact, and
A measurable outcome tied to that impact.
Example: “We believe that moving the pricing information above the fold on the product detail page will increase conversions by at least 4%.”
Keep it simple, actionable, and testable.
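To make that structure concrete, here is a minimal sketch of a hypothesis captured as a structured record. The field names are illustrative, not part of any particular tool:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str            # the specific change being introduced
    metric: str            # the measurable outcome tied to the impact
    direction: str         # expected direction of impact
    minimum_effect: float  # smallest relative change worth acting on

pricing_hypothesis = Hypothesis(
    change="Move pricing information above the fold on the product detail page",
    metric="conversion_rate",
    direction="increase",
    minimum_effect=0.04,  # we expect at least a 4% lift
)
```

If you struggle to fill in one of these fields, that is usually a sign the hypothesis is not yet testable.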
Step 4: Choose the Right Metrics
Metrics are the key to the success of your experiment. You should define:
Primary Metric: The one metric that answers your hypothesis. This drives the decision.
Secondary Metrics: Provide supporting insights but shouldn’t dictate action on their own.
Guardrail Metrics: Ensure that you’re not hurting critical areas like performance, retention, or revenue while testing.
A fourth category, Exploratory Metrics, can be useful for debugging or hypothesis generation, but these should not be used to make decisions. If you find yourself relying on them, consider promoting them to primary, secondary, or guardrail status.
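To make the roles concrete, here is a minimal sketch of a metric plan for a single experiment. The metric names and groupings are assumptions for this example, not taken from any specific tool:

```python
# Illustrative metric plan for one experiment.
metric_plan = {
    "primary":     ["checkout_conversion_rate"],                   # drives the decision
    "secondary":   ["add_to_cart_rate", "avg_items_per_order"],    # supporting insight only
    "guardrails":  ["page_load_time_p95", "revenue_per_visitor"],  # must not degrade
    "exploratory": ["scroll_depth"],                               # debugging / idea generation
}
```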
Metrics generally fall into two types:
Binomial Metrics: Yes/No or Success/Failure (e.g., “Did the user convert?”). These are easier to interpret and compute, and they’re great for clear business actions.
Continuous Metrics: Capture values on a scale (e.g., time on site, revenue per visitor). These allow for more nuance, but require more careful interpretation.
When choosing a metric, pick the type that matches your business question and experiment goals, and remember that binomial metrics often reach adequate statistical power at smaller sample sizes than noisy continuous metrics.
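As a rough illustration of how metric type feeds into planning, the sketch below estimates the sample size needed to detect a small lift in a binomial conversion metric using statsmodels. The baseline rate, lift, and thresholds are made-up planning numbers:

```python
# Rough sample-size estimate for a binomial metric.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10           # assumed current conversion rate
target = baseline * 1.04  # a 4% relative lift, as in the example hypothesis

effect_size = proportion_effectsize(target, baseline)  # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # acceptable false-positive rate
    power=0.80,   # chance of detecting the lift if it is real
    ratio=1.0,    # equal traffic split between control and variant
)
print(f"Need roughly {n_per_variant:,.0f} users per variant")
```

A noisy continuous metric such as revenue per visitor would typically require a variance estimate from historical data, and often a considerably larger sample.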
Equally, you need to consider metric quality. A good metric should be:
Aligned with the hypothesis you're testing.
Measurable so that you can reliably track it across platforms and users.
Sensitive so that it can detect meaningful changes from your test.
Useful so that the results are interpretable and drive clear decisions.
You should avoid using metrics just because they are easy to track. If your metric does not help you to answer your business question, you’re tracking the wrong metric.
Step 5: Pre-Register Your Next Steps
One of the most overlooked (but essential) parts of experimentation is defining what you’ll do next, before you see the results.
This process is sometimes referred to as pre-registering next steps. It helps you to:
Avoid decision-making bias
Increase transparency
Create consistency across experiments
Build trust in your experimentation culture
Examples of pre-registered actions:
“If the primary metric increases by more than 5% with no red flags, we will roll out the feature.”
“If the result is not statistically significant, we’ll revisit the hypothesis or test design.”
“If the guardrail metric dips beyond acceptable limits, we will pause and investigate.”
You can document this in tools like ABsmartly using a custom field, or simply agree on it as a team. What matters is that it’s explicit and visible to stakeholders.
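To illustrate, here is a minimal sketch of what pre-registered rules might look like once written down as code, so the decision becomes mechanical when results arrive. The function and thresholds are hypothetical:

```python
# Pre-registered decision rules, written down before looking at results.
def decide(primary_lift: float, significant: bool, guardrails_ok: bool) -> str:
    """Return the pre-registered action for an observed outcome."""
    if not guardrails_ok:
        return "pause and investigate"
    if not significant:
        return "revisit the hypothesis or test design"
    if primary_lift > 0.05:
        return "roll out the feature"
    return "significant but below the rollout threshold: discuss before acting"

print(decide(primary_lift=0.062, significant=True, guardrails_ok=True))
# -> "roll out the feature"
```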
Common Pitfalls to Avoid
Even with a solid plan, there are plenty of ways experimentation can go wrong. Here are some of the most common mistakes:
Skipping the hypothesis – Testing without a clear reason leads to unclear results.
Using exploratory metrics to justify decisions – These are not designed for decision-making.
Choosing misaligned or low-sensitivity metrics – Your metrics must help answer your hypothesis and detect meaningful changes.
Switching metrics mid-experiment – This destroys validity and introduces bias.
Changing decisions after seeing the results – Always define next steps ahead of time.
Using inconsistent metrics across experiments – Makes it hard to compare outcomes and measure long-term impact.
Not validating data instrumentation – If the event tracking is broken or inconsistent, your results are meaningless.
Focusing only on statistical significance – Don’t ignore practical significance and decision context.
Over-relying on secondary or guardrail metrics – They’re for support, not primary decision-making.
Not documenting learnings or iterating – A test that generates no documented learning is wasted effort.
Final Thoughts
Great experiments should provide you with a decision framework. When done right, they can help you learn, reduce uncertainty, and move forward with sound data.
To summarize, setting up a strong experiment means:
Asking the right question
Forming a testable hypothesis
Choosing metrics that are aligned, sensitive, and actionable
Planning your decisions before results come in
Avoiding the temptation to change course mid-stream
Whilst it may take some time to get things right, what matters most is that you’re systematic, transparent, and consistent. Over time, you’ll build much better experiments and lay the groundwork for an experimentation culture in your organization.