Prioritized Experiments
Does it matter which experiments you run first?
Perceived Winners First?
Imagine you have a finite set of 100 A/B tests to run. Some will be winners, some losers, and many will be flat. You reject stat-sig negatives (don't ship them) and ship stat-sig positives.
If a human can correctly guess the direction of a test even 59% of the time, should you let them order the backlog with predicted winners first? The simulation below shows what happens.
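A minimal sketch of this setup, not the page's actual simulation code: Gaussian true effects stand in for the winner/loser/flat mix, a guesser is right 59% of the time, and as a simplification every true winner is assumed to be detected and shipped while every loser is rejected. The helper names are illustrative only.

```python
import random

def make_ideas(n=100, seed=0):
    """Finite idea set with Gaussian true effects (illustrative model)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]

def guess_direction(effect, accuracy, rng):
    """Human guesses the sign correctly with probability `accuracy`."""
    return (effect > 0) if rng.random() < accuracy else (effect <= 0)

def run(effects, order):
    """Run tests in `order`, shipping only true winners; return the
    cumulative-lift curve slot by slot."""
    curve, total = [], 0.0
    for i in order:
        if effects[i] > 0:
            total += effects[i]
        curve.append(total)
    return curve

rng = random.Random(1)
effects = make_ideas()
guesses = [guess_direction(e, 0.59, rng) for e in effects]

# Stable sort: predicted winners (guess == True) move to the front.
winners_first = sorted(range(len(effects)), key=lambda i: not guesses[i])
random_order = list(range(len(effects)))
rng.shuffle(random_order)

a = run(effects, winners_first)
b = run(effects, random_order)
# Same finite set, same shipping rule: the final lift is identical.
# Ordering only changes *when* the gains arrive along the curve.
assert abs(a[-1] - b[-1]) < 1e-9
```

The final assertion is the whole point of the lesson: with a fixed set and a fixed ship rule, reordering cannot change the endpoint, only the shape of the path to it.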
Simulation
Takeaways Over 1,000 Simulations
Ordering does not magically create more winners in a finite set. What it does change is when you reach them. Front-loading likely winners compounds gains sooner, so the product gets better faster.
Only Test Positive Estimates From A Finite Set?
Let's see what happens when we apply our intuition to gate experiments believed to be negative.
Imagine the same finite set of 100 ideas and compare two policies: run only the tests with positive estimates, or run all test ideas, scheduling the negatively estimated ones in the second half.
Simulation
In this simulation, a shipped loss persists for two test slots and then returns to zero, as if the mistake is caught and rolled back.
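The two policies can be sketched as follows, under the simplifying assumption that shipped losses are always rolled back, so only shipped winners count toward final lift. The model (Gaussian effects, 59%-accurate sign guesses) is illustrative, not the page's actual simulation.

```python
import random

def simulate(gate_negatives, n=100, accuracy=0.59, seed=7):
    """Finite set of n ideas. A human estimate guesses each idea's sign
    correctly with probability `accuracy`. Shipped losers are rolled
    back after two slots, so they contribute nothing to final lift."""
    rng = random.Random(seed)
    ideas = [rng.gauss(0, 1) for _ in range(n)]
    guessed_positive = [(e > 0) == (rng.random() < accuracy) for e in ideas]
    final = 0.0
    for effect, pos in zip(ideas, guessed_positive):
        if gate_negatives and not pos:
            continue          # gated: this idea is never run
        if effect > 0:
            final += effect   # shipped winner sticks
        # shipped losers revert after two slots: no lasting impact
    return final

gated = simulate(gate_negatives=True)
run_all = simulate(gate_negatives=False)

# Gating only ever removes ideas, including some mis-guessed real
# winners, so it can never end above the run-everything policy here.
assert run_all >= gated
```

Note the asymmetry this toy model bakes in: because losses revert, the only lasting cost of gating is the winners it filters out.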
Takeaways Over 1,000 Simulations
If you use intuition to decide whether a test gets run at all, you may avoid waste. But at 59% accuracy you will still filter out some real winners, and you may also remove the failures that would have generated valuable follow-up ideas.
What If The Idea Backlog Is Infinite?
When new ideas keep coming in, skipping a negative-estimated test doesn't waste a slot — a fresh candidate fills it immediately. So the question shifts: is there still a case for running negative-estimated ideas at all?
Here we compare test positives only and random ordering against two strategies that also continue into negative-estimated ideas: adding them within the main prioritized queue, or running them in parallel alongside positives.
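A rough sketch of these queue shapes, with the same illustrative assumptions as before (Gaussian effects, a 59%-accurate estimator, shipped losses rolled back). The `draw_idea` helper is hypothetical; it models the infinite backlog by drawing fresh candidates until one matches the requested estimate sign.

```python
import random

ACCURACY = 0.59

def draw_idea(rng, want_positive_estimate):
    """Draw from the infinite backlog until the human estimate has the
    requested sign. Estimates are right ACCURACY of the time."""
    while True:
        effect = rng.gauss(0, 1)
        guessed_positive = (effect > 0) == (rng.random() < ACCURACY)
        if guessed_positive == want_positive_estimate:
            return effect

def run_policy(slots, slot_wants_positive, parallel_negatives=False, seed=3):
    """Winners ship and stick; losers are caught and rolled back, so
    only positive true effects accumulate."""
    main, side = random.Random(seed), random.Random(seed + 1)
    total = 0.0
    for t in range(slots):
        effect = draw_idea(main, slot_wants_positive(t))
        if effect > 0:
            total += effect
        if parallel_negatives:
            extra = draw_idea(side, False)  # free parallel capacity
            if extra > 0:
                total += extra
    return total

positives_only = run_policy(100, lambda t: True)
negatives_in_queue = run_policy(100, lambda t: t < 50)  # back half: negatives
with_parallel = run_policy(100, lambda t: True, parallel_negatives=True)

# Parallel negatives never displace a main-queue slot, so they can
# only add lift on top of the positives-only policy.
assert with_parallel >= positives_only
```

The assertion captures the lesson's caveat: parallel negatives compound freely only because they are modeled as truly extra capacity, while `negatives_in_queue` spends real slots on lower-expectation ideas.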
Simulation
Takeaways Over 1,000 Simulations
With an infinite backlog, skipping a negative-estimated idea costs nothing — a fresh positive candidate takes its place. Running negatives in parallel (line 3) adds tests without slowing the main queue, so the lift compounds freely — but only if those extra tests truly run in the same time frame. Slotting negatives into the main queue after positives (line 2) means the queue gets slower, and that drag shows up in the results.
Direction vs. Impact Magnitude Sorting?
One sorting method uses predicted direction to both filter and order: only positive-estimated tests run, sorted positives-first. The other replaces the binary guess with a noisy numeric estimate: it drops anything whose estimated effect is negative and runs the largest estimated effects first.
This lesson isolates the two signals to show which carries more value on its own.
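The contrast between the two signals can be sketched like this, with the usual illustrative simplifications (Gaussian effects, winners always ship, losses revert). The `est_error` parameter plays the role of the impact estimate error slider; all helper names are assumptions, not the page's code.

```python
import random

def sorted_lift(n=100, accuracy=0.59, est_error=1.0, seed=5):
    rng = random.Random(seed)
    effects = [rng.gauss(0, 1) for _ in range(n)]
    # Binary direction signal: right `accuracy` of the time.
    direction = [(e > 0) == (rng.random() < accuracy) for e in effects]
    # Numeric impact estimate: true effect plus noise (the error slider).
    estimates = [e + rng.gauss(0, est_error) for e in effects]

    def curve(order):
        total, out = 0.0, []
        for i in order:
            if effects[i] > 0:  # ship winners; losses revert
                total += effects[i]
            out.append(total)
        return out

    # Direction signal: filter to predicted winners, order arbitrary.
    by_direction = [i for i in range(n) if direction[i]]
    # Magnitude signal: drop negative estimates, biggest estimates first.
    by_magnitude = sorted(
        (i for i in range(n) if estimates[i] > 0),
        key=lambda i: -estimates[i],
    )
    return curve(by_direction), curve(by_magnitude)

dir_curve, mag_curve = sorted_lift(est_error=0.5)
```

Raising `est_error` degrades both the filter and the ordering of the magnitude policy at once, which is exactly the sensitivity the slider is meant to expose.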
Simulation
Takeaways Over 1,000 Simulations
Direction sorting uses a binary signal — predicted win or loss — and ignores magnitude entirely. Magnitude-only sorting uses estimated effect size but has no directional filter, so it can front-load large losers alongside large winners. The impact estimate error slider shows how quickly the magnitude signal degrades as estimates get noisier.
Does Iterating On Results Help?
What happens when we follow up on statistically significant results with one additional test?
We compare three strategies: test positives only, iterate on all results, or iterate only on results above an effect size threshold.
Follow-up tests retain a configurable fraction of the source test's absolute effect (the relative follow-up gain). A follow-up on a stat-sig loser is treated as flipping the original loss into a smaller positive gain.
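A minimal sketch of the three strategies, under assumed simplifications: fresh ideas have Gaussian effects, winners ship and losses revert, and a follow-up is worth `follow_gain` times the source's absolute effect, so a loser's follow-up becomes a smaller win, as described above.

```python
import random

def run(slots, iterate, threshold=0.0, follow_gain=0.5, seed=9):
    """Each slot runs either a queued follow-up or a fresh idea.
    Follow-ups are queued for results whose |effect| clears `threshold`."""
    rng = random.Random(seed)
    total, followups = 0.0, []
    for _ in range(slots):
        if iterate and followups:
            total += followups.pop(0)  # follow-up slot: smaller, surer win
            continue
        effect = rng.gauss(0, 1)
        if effect > 0:
            total += effect            # ship the winner
        if iterate and abs(effect) >= threshold:
            # Keep a fraction of the source's absolute effect; a
            # loser's follow-up flips the loss into a smaller win.
            followups.append(follow_gain * abs(effect))
    return total

no_iter  = run(100, iterate=False)
iter_all = run(100, iterate=True, threshold=0.0)
iter_big = run(100, iterate=True, threshold=1.5)
```

The trade-off is visible in the structure: every follow-up slot is a slot not spent on a fresh idea, which is why the threshold variant exists at all.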
Simulation
Takeaways Over 1,000 Simulations
Does Iterating On Selected Higher Impact Results Help?
Lesson 5 showed that iterating on large effects can beat iterating on all results. This lesson asks: how does threshold-based iteration compare to simply running negatives in parallel as free follow-ups?
The cutoff applies to absolute effect size, so both strong winners and strong losers clear it: a big loss is treated as a signal worth following up, since a follow-up on a loser can flip it into a smaller win.
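The comparison can be sketched as below, reusing the same illustrative model (Gaussian fresh ideas, losses revert, follow-ups worth a fraction of the source's absolute effect). The `"parallel"` mode is the assumption that follow-ups run on extra capacity at no slot cost.

```python
import random

def final_lift(slots, mode, threshold=1.5, follow_gain=0.5, seed=11):
    """mode: 'sequential' spends a main-queue slot on each follow-up
    above the absolute-effect cutoff; 'parallel' runs follow-ups
    alongside the main queue without consuming a slot."""
    rng = random.Random(seed)
    total, queue, t = 0.0, [], 0
    while t < slots:
        if mode == "sequential" and queue:
            total += queue.pop(0)  # follow-up takes this slot
            t += 1
            continue
        effect = rng.gauss(0, 1)
        if effect > 0:
            total += effect
        if abs(effect) >= threshold:
            fu = follow_gain * abs(effect)  # loser flips to a smaller win
            if mode == "parallel":
                total += fu  # free capacity: no slot consumed
            else:
                queue.append(fu)
        t += 1
    return total

seq = final_lift(100, "sequential")
par = final_lift(100, "parallel")

# In this toy model parallel follow-ups are strictly free, so they
# can never fall behind the sequential version of the same policy.
assert par >= seq
```

That inequality is baked in by construction; the interesting question the lesson asks is how realistic the "free parallel capacity" assumption is in practice.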
Simulation
Takeaways Over 1,000 Simulations
Sequential iteration has to earn its slot. Larger source effects can survive decay better, while smaller source effects often get diluted enough that the next fresh positive-estimated idea is a stronger bet.
Cumulative Gains After Running 100 Tests
This final view stacks the policies from earlier lessons into a single progression, from a random roadmap to prioritized, filtered, open-ended, and iterative experimentation.
The bar chart compares average final cumulative lift after the same number of experiment slots, using the same batch-of-10 candidate rule and the same relative follow-up gain setting for the open-ended and iterative steps.
Average Final Lift
Takeaway
Each added layer changes a different part of the system: ordering changes timing, filtering changes selection, an open-ended backlog changes opportunity cost batch by batch, and iteration changes what happens after you learn something. How much it pays depends both on hit rate and on the relative follow-up gain you assume.