Choosing the Right Metric for Every Test

Published on Jun 30, 2022

by Carolyn Campbell-Baldwin

When you’re leading a team through scaling experimentation, it’s easy to focus on the volume of tests or the speed at which they’re shipped. But how do you know if those experiments are actually delivering business value?

Running tests is the easy part. Knowing whether they’re worth acting on is where teams often get stuck.

This is where Overall Evaluation Criteria (OECs) come in. Think of them as your experimentation compass: a single, strategic measure that keeps your team aligned on what “good” really looks like. Done well, OECs shape how your entire organization makes decisions.

What Are Overall Evaluation Criteria (OECs)?

An Overall Evaluation Criterion (OEC) is the single most important metric you use to judge the success of an experiment. It’s not just another KPI on the dashboard - it’s the answer to this question: If this test wins, what does “winning” mean for our business?

Early pioneers of online experimentation realized that without a shared yardstick, different teams could draw wildly different conclusions from the same test. One team might celebrate a spike in click-through rate, while another worries about a dip in long-term retention. Without an OEC, both could technically be “right”, and that’s the problem.

A good OEC unifies your testing strategy. It translates your high-level business goals into a concrete, measurable outcome. It allows product teams, engineers, and leadership to speak the same language when assessing impact. Crucially, your OEC isn’t necessarily the flashiest metric. It’s the one that best captures sustained value creation for your users and your business.

Here’s how to tell you’ve found the right one: when you win an experiment based on your OEC, you’re confident it’s a step forward, not just a spike on a chart.

Why OECs Are Essential for Scaling Experimentation

If you’ve ever reviewed a stack of test results and thought, “These all look promising - but which ones actually moved the needle?” you’re not alone. As organizations mature in their experimentation practices, the challenges shift. Early on, the biggest hurdle is just getting people to test. However, once testing becomes a habit across multiple teams and surfaces, the next challenge is consistency. How do you ensure that all those experiments are driving the business forward in a meaningful, measurable way?

That’s where Overall Evaluation Criteria earn their keep.

OECs align your experiments with strategy, not vanity

Let’s say your mobile team ships an experiment that increases the tap-through rate on a promotional banner. Meanwhile, your retention team’s test nudges users into a slower onboarding flow that ultimately improves 30-day activity. Both results look good in isolation. But are they pointing in the same direction? Are they helping the business grow sustainably, or just pushing different metrics around?

Without a unifying OEC, teams risk optimizing for their own local goals, sometimes at the expense of broader business outcomes. Think of it like a rowing crew: each person is rowing at full power… but not in sync. A clear, shared OEC ensures that all teams are rowing in the same direction. It acts as a strategic filter, helping teams prioritize the experiments that create the most lasting value.

OECs create trust in experimentation at the leadership level

For leadership, A/B testing can sometimes feel like a black box. Tests are run and charts are shared, but it’s not always clear how those tests map to business KPIs. That gap in understanding can lead to hesitation, skepticism, or even pushback, especially when results are ambiguous or counterintuitive.

An agreed-upon OEC builds trust. It lets leaders know, “This test is a win because it improves the metric we all care about most.” And when that metric is clearly tied to business value (revenue per user, subscription renewal, or user retention, for example), it’s much easier to green-light rollouts and double down on successful strategies.

OECs prevent ‘false wins’ and optimize for long-term impact

One of the most common pitfalls in A/B testing is declaring victory too early, often based on short-term spikes that don’t sustain over time. Clicks go up, engagement looks promising, but two months later, user churn creeps up or revenue drops.

Anchoring on your OEC, especially when paired with guardrail metrics, shifts attention from short-term gains to long-term, reliable outcomes. It encourages teams to ask smarter questions: Does this test move us closer to our strategic goals? Or does it just game the surface metrics?

OECs are more than a reporting tool. They’re a leadership tool. They give you visibility, alignment, and confidence across teams, experiments, and time horizons.

Choosing the Right OEC: What to Consider

Defining your Overall Evaluation Criterion isn’t a one-size-fits-all exercise, and it shouldn’t be. The right OEC depends on your business model, your growth stage, and what you’re optimizing for as a company. But regardless of your domain, one principle holds: your OEC should reflect what success really looks like, not just what’s easiest to measure.

What Makes a Strong OEC?

A strong OEC is:

  • Aligned with business value: It should track something that matters to the health of your business, not just a surface-level engagement stat.

  • Measurable: You need to be able to track it cleanly and consistently, across experiments and over time.

  • Sensitive to change: It should respond to the kinds of changes you're testing, so you’re not waiting six months to see impact.

  • Resilient: It shouldn’t be too noisy or volatile - otherwise, you’ll end up chasing false positives or overreacting to small shifts.

Often, this means striking a balance between precision and practicality. You want something close enough to your long-term goals (like customer lifetime value) but measurable on a realistic testing horizon (like week-two retention or repeat purchase rate).
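To make that trade-off concrete, here’s a minimal sketch in Python of computing revenue per user as the OEC for each variant and checking whether its movement is distinguishable from noise. The data is simulated; in practice, these per-user revenue arrays would come from your analytics warehouse.

```python
import numpy as np
from scipy import stats

# Simulated per-user revenue for each variant over the test window.
rng = np.random.default_rng(42)
control = rng.gamma(shape=2.0, scale=5.00, size=10_000)
treatment = rng.gamma(shape=2.0, scale=5.25, size=10_000)

# The OEC here is mean revenue per user in each variant.
oec_control = control.mean()
oec_treatment = treatment.mean()
lift = (oec_treatment - oec_control) / oec_control

# Welch's t-test: is the OEC movement distinguishable from noise?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"OEC (control):   {oec_control:.2f}")
print(f"OEC (treatment): {oec_treatment:.2f}")
print(f"Relative lift:   {lift:+.1%} (p = {p_value:.4f})")
```

The same skeleton works for week-two retention or repeat purchase rate; only the per-user value being averaged changes.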

Examples Across Industries

Let’s bring this to life with a few industry-specific examples:

  • E-commerce: Instead of just tracking conversion rate, consider metrics like revenue per user or repeat purchase rate; they tell you more about long-term value than one-off clicks.

  • Media & Content: Clicks and scrolls are easy to inflate. Try focusing on engaged time on site or subscriptions started per session.

  • SaaS: Feature adoption might seem tempting, but active usage over time, retention at day 30, or trial-to-paid conversion often paint a clearer picture of product value.

  • Marketplaces: You’ll likely want a blend, such as a successful transaction rate per session or a buyer-seller match rate, to ensure both sides of the platform are growing in harmony (see the sketch after this list).
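As a sketch of the marketplace case (the session records, field names, and weights below are purely illustrative), a blended OEC might combine transaction rate and match rate per session:

```python
# Hypothetical session-level marketplace log (field names are illustrative).
sessions = [
    {"variant": "control",   "transaction": True,  "new_match": False},
    {"variant": "control",   "transaction": False, "new_match": True},
    {"variant": "treatment", "transaction": True,  "new_match": True},
    {"variant": "treatment", "transaction": False, "new_match": False},
]

def blended_oec(rows, w_txn=0.7, w_match=0.3):
    """Weighted blend of transaction rate and buyer-seller match rate.
    The weights are illustrative and should reflect business priorities."""
    n = len(rows)
    txn_rate = sum(r["transaction"] for r in rows) / n
    match_rate = sum(r["new_match"] for r in rows) / n
    return w_txn * txn_rate + w_match * match_rate

for variant in ("control", "treatment"):
    rows = [r for r in sessions if r["variant"] == variant]
    print(f"{variant}: blended OEC = {blended_oec(rows):.3f}")
```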

Even with the perfect OEC, context matters. A test that wins in one user segment, season, or channel may not translate elsewhere; that’s why audience segmentation and real-time monitoring are critical to trustworthy decisions.

Don’t Forget the Guardrails

Your OEC is your North Star, but guardrail metrics act as checks and balances. They alert you if your “winning” experiment is having unintended side effects elsewhere (say, increasing revenue but hurting customer satisfaction or operational costs). Think of them as your safety net: invisible when all’s going well, but invaluable when things go sideways (see the sketch below).

Choosing the right OEC is about enabling better decisions. When teams have clarity on what “success” means, they move faster, with more confidence, and far less second-guessing.
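To make the guardrail idea concrete, here’s a minimal decision-rule sketch in Python. The metric names, tolerances, and readout format are all hypothetical; the point is that a test only “ships” when the OEC improves and no guardrail degrades beyond an agreed tolerance.

```python
# Hypothetical experiment readout: relative change per metric, plus a flag
# for whether that movement is statistically significant.
results = {
    "revenue_per_user":  {"lift": +0.031, "significant": True},   # OEC
    "support_tickets":   {"lift": +0.004, "significant": False},  # guardrail
    "page_load_time_ms": {"lift": +0.012, "significant": False},  # guardrail
    "unsubscribe_rate":  {"lift": +0.090, "significant": True},   # guardrail
}

OEC = "revenue_per_user"
# Guardrails where an increase is bad, with the maximum tolerated lift.
GUARDRAIL_TOLERANCE = {
    "support_tickets": 0.02,
    "page_load_time_ms": 0.05,
    "unsubscribe_rate": 0.01,
}

def verdict(results: dict) -> str:
    """Ship only if the OEC wins and no guardrail is significantly breached."""
    oec_win = results[OEC]["significant"] and results[OEC]["lift"] > 0
    breached = [
        name for name, tol in GUARDRAIL_TOLERANCE.items()
        if results[name]["significant"] and results[name]["lift"] > tol
    ]
    if breached:
        return "hold: guardrail(s) breached: " + ", ".join(breached)
    if oec_win:
        return "ship"
    return "no decision: OEC did not move significantly"

print(verdict(results))  # -> hold: guardrail(s) breached: unsubscribe_rate
```

In practice, the tolerances should be agreed with stakeholders before the test starts, not chosen by the analyst at readout time.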

ABsmartly’s Approach to Evaluation Criteria

At ABsmartly, we know that defining a good Overall Evaluation Criterion isn’t just a nice-to-have; it’s a fundamental part of building a scalable experimentation culture. And yet, too often, teams are left to figure this out alone. Our platform is designed to help you not only run experiments, but run the right experiments - the ones that align with your business goals and give you trustworthy answers, fast.

Define Your OECs with Intention - and Flexibility

We built our system to enable you to define and track your key metrics, including your primary OEC and any guardrails, in a way that suits your business. Whether your North Star is revenue per user, time to conversion, or week-three retention, you can measure what matters in real time. And because we support full metric versioning and metadata, you get consistency without losing context. Every test can be evaluated with the same rigor, even as your business evolves.
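As an illustration of why versioning matters (this is a hypothetical sketch, not ABsmartly’s actual schema or API), consider the minimum a metric definition has to carry to keep old results interpretable:

```python
from dataclasses import dataclass

# Illustrative only: NOT ABsmartly's actual schema; field names are made up.
@dataclass(frozen=True)
class MetricDefinition:
    name: str         # e.g. "revenue_per_user"
    version: int      # bump whenever the definition changes
    aggregation: str  # e.g. "mean", "sum", "ratio"
    role: str         # "oec" or "guardrail"
    description: str  # what exactly is counted, and over what window

REVENUE_PER_USER_V1 = MetricDefinition(
    name="revenue_per_user", version=1, aggregation="mean", role="oec",
    description="Gross revenue divided by exposed users, 14-day window.",
)
REVENUE_PER_USER_V2 = MetricDefinition(
    name="revenue_per_user", version=2, aggregation="mean", role="oec",
    description="Net of refunds; window unchanged.",
)

# Pinning a specific version to each experiment keeps old results
# interpretable even after the business definition evolves.
```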

Real-Time Monitoring, Smarter Decisions

Our platform features real-time monitoring, keeping you informed without overwhelming your team. You’ll see how your OEC is trending as data rolls in - and be alerted when anything drifts off course. That’s especially powerful when combined with Group Sequential Testing (GST) - a statistically robust method that allows you to check in on results midstream without sacrificing rigor. If a test is clearly underperforming or safely conclusive, you can stop early, saving time, reducing risk, and speeding up learning loops.
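To give a flavor of how group sequential checking works (a simplified illustration, not ABsmartly’s implementation), the sketch below compares a z-statistic at five equally spaced interim looks against standard O’Brien-Fleming-type boundaries for an overall two-sided α of 0.05, stopping as soon as a boundary is crossed. All data is simulated.

```python
import math
import numpy as np

# O'Brien-Fleming-type boundaries for 5 equally spaced looks at an overall
# two-sided alpha of 0.05 (standard tabulated values; real deployments
# derive these from an alpha-spending function).
BOUNDARIES = [4.562, 3.226, 2.634, 2.281, 2.040]

def z_statistic(treat: np.ndarray, ctrl: np.ndarray) -> float:
    """Two-sample z-statistic for a difference in means."""
    se = math.sqrt(treat.var(ddof=1) / len(treat) + ctrl.var(ddof=1) / len(ctrl))
    return (treat.mean() - ctrl.mean()) / se

# Simulated OEC values per user; the treatment has a small genuine effect.
rng = np.random.default_rng(7)
ctrl_all = rng.normal(10.0, 4.0, size=50_000)
treat_all = rng.normal(10.1, 4.0, size=50_000)

for look, bound in enumerate(BOUNDARIES, start=1):
    n = 10_000 * look  # data accumulated by this interim look
    z = z_statistic(treat_all[:n], ctrl_all[:n])
    print(f"look {look}: n={n:>6} per arm, z={z:+.2f}, boundary=±{bound}")
    if abs(z) >= bound:
        print("boundary crossed: stop early and declare a result")
        break
else:
    print("no boundary crossed: the test ends without a significant result")
```

Widening boundaries like these are what let you peek midstream without inflating the false-positive rate; naive repeated checks against a fixed threshold would.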

Building Company-Wide Alignment

Defining an OEC isn’t just a technical decision. It’s a cultural one. That’s why we don’t just hand you tools - we help you use them well. Whether you’re setting your first OEC or refining one across multiple product lines, our team works alongside you to guide metric selection, align stakeholders, and build shared understanding. This is part of our hands-on support model because we believe statistical integrity and usability should go hand in hand.

Make Every Test Count

Scaling experimentation isn’t about running more tests; it’s about making more decisions you can trust. And that starts with clarity. Clarity about what you’re measuring. Clarity about why it matters. And clarity about what a “win” really looks like. Overall Evaluation Criteria give your teams that clarity. They help product, engineering, and leadership stay aligned on what success means - across experiments, teams, and time.

At ABsmartly, we’ve seen the impact of getting this right. Teams that define strong OECs move faster, make fewer mistakes, and build far more confidence in the value of their experiments. They don’t just test more; they test smarter. So, if you’re serious about building a high-velocity, high-integrity experimentation culture - one that drives real business outcomes - it’s time to take a closer look at the metrics guiding your decisions.

Let’s work together to define what success looks like for your business - and make every test a step in the right direction. Book a demo to see how ABsmartly helps you set smarter metrics and scale experimentation with confidence.
