Experiment Stopping Rules

When should you call a test, and how much speed can you trade for honesty?


How Long Should You Run Each Test?

Imagine the same stream of ideas run through six stopping rules at once. The rules differ only in when they call each test. Some wait for a full, well-powered horizon: accurate, but slow to bank a win. Some decide on a short fixed duration: faster, but underpowered, so they miss real wins. And some peek often and stop early once a result looks conclusive, either with error control built in (mSPRT, or AGILE group-sequential testing) or with none (naive peeking, which over-claims wildly). The chart banks each shipped test's true lift, and a tick at each rule's end shows how much that rule over-claimed: what it believed versus what it delivered.
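To make the cost of naive peeking concrete, here is a minimal sketch in Python. It is not the simulation behind the chart. It assumes A/A tests (no true difference), a 10% base conversion rate, ten equally spaced looks of 100 visitors per arm, and a two-sided z-test at the 5% level; the function names and all of these numbers are illustrative choices. Both rules see exactly the same data: the fixed-horizon rule tests once at the end, while the naive rule calls a "win" the first time any look crosses the threshold.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate_aa_test(rng, base_rate=0.10, batch=100, looks=10, z_crit=1.96):
    """Run one A/A test and report what each stopping rule would conclude.

    Returns (naive_hit, fixed_hit):
      naive_hit -> some look had |z| > z_crit (peek and stop early)
      fixed_hit -> the final look had |z| > z_crit (single planned analysis)
    """
    conv_a = conv_b = n = 0
    naive_hit = False
    z = 0.0
    for _ in range(looks):
        # Both arms share the same true rate, so any "win" is a false positive.
        conv_a += sum(rng.random() < base_rate for _ in range(batch))
        conv_b += sum(rng.random() < base_rate for _ in range(batch))
        n += batch
        z = z_stat(conv_a, conv_b, n)
        if abs(z) > z_crit:
            naive_hit = True
    fixed_hit = abs(z) > z_crit
    return naive_hit, fixed_hit


def false_positive_rates(sims=1000, seed=0):
    """Estimate the false-positive rate of each rule over many A/A tests."""
    rng = random.Random(seed)
    naive = fixed = 0
    for _ in range(sims):
        naive_hit, fixed_hit = simulate_aa_test(rng)
        naive += naive_hit
        fixed += fixed_hit
    return naive / sims, fixed / sims


naive_rate, fixed_rate = false_positive_rates()
print(f"naive peeking: {naive_rate:.1%}   fixed horizon: {fixed_rate:.1%}")
```

Under these assumptions the fixed-horizon rate should sit near the nominal 5%, while the naive-peeking rate should come out several times higher. Sequential methods such as mSPRT and AGILE exist to keep the early-stopping speed while bringing that error rate back under control; they are not implemented in this sketch.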

[Interactive simulation. Controls: win / flat / loss mix (%), base rate (conversion), power (fixed horizons), effect scale (typical size), monthly traffic (visitors per test), velocity (tests per month).]

Takeaways Over Simulations

References

Georgi Z. Georgiev. (2017). Efficient A/B Testing in Conversion Rate Optimization: The AGILE Statistical Method.
Georgi Z. Georgiev. (2022, updated 2023). Comparison of the Statistical Power of Sequential Tests: SPRT, AGILE, and Always Valid Inference.