Most A/B tests are too small to be meaningful. Here is how to run one properly.
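Sample size. Calculate the minimum sample size for the lift you want to detect. Detecting a 5% relative lift on a 25% baseline open rate (25% to 26.25%) at 95% confidence and 80% power takes roughly 19,000 sends per variant; even a 5-point lift (25% to 30%) needs about 1,250. Smaller tests cannot tell a real lift from noise.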
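The calculation can be sketched with Python's standard library. This is a minimal sketch, not Inbox OSS's own calculator. It assumes a two-sided two-proportion z-test with unpooled variances, 80% power by default, and a lift given as a relative change (0.05 means 25% to 26.25%). `sample_size_per_variant` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline: float, lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Sends needed in EACH variant to detect a relative lift in a rate.

    Normal-approximation formula for a two-sided, two-proportion z-test
    with unpooled variances (an assumption; other formulas differ slightly).
    """
    p1 = baseline
    p2 = baseline * (1 + lift)                     # relative lift: 0.05 -> +5%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)                                 # round up: partial sends don't exist
```

Under these assumptions, `sample_size_per_variant(0.25, 0.05)` returns 19144 and `sample_size_per_variant(0.25, 0.10)` returns 4859. Halving the detectable lift roughly quadruples the required sample, because the size grows with the inverse square of the difference.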
One variable. Test the subject line only OR the send time only OR the preview text only. Multivariate tests split the audience across every combination of changes, so they need much larger samples and are harder to interpret.
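Random assignment. Inbox OSS randomly assigns recipients to variants. Do not split the list into segments first and give each segment its own variant; any difference between those audiences gets mixed into the result and biases the test.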
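Run to completion. Do not stop early because one variant is "winning." Random variation can show 20-30% differences early that collapse to nothing at scale, and checking repeatedly until one variant looks significant pushes the false-positive rate well above the nominal 5%.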
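The cost of stopping early can be simulated. The sketch below runs A/A tests, where both arms share the same true open rate, and counts how often each approach falsely declares a winner. One approach checks once at the planned size. The other stops at the first of 20 interim peeks where a pooled two-proportion z-test crosses the 95% threshold. All function names and parameters (25% rate, 2,000 sends per arm, 20 looks, 500 runs) are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 95% threshold, ~1.96


def z_stat(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-proportion z statistic with pooled variance."""
    p = (x1 + x2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (x1 / n1 - x2 / n2) / se


def aa_false_positive_rates(rate: float = 0.25, n: int = 2000, looks: int = 20,
                            sims: int = 500, seed: int = 1) -> tuple[float, float]:
    """Return (false-positive rate checking once at the end,
               false-positive rate stopping at the first significant peek)."""
    rng = random.Random(seed)
    checkpoints = {n * k // looks for k in range(1, looks + 1)}
    fixed_hits = peek_hits = 0
    for _ in range(sims):
        a = b = 0
        stopped = False
        for i in range(1, n + 1):
            a += rng.random() < rate      # both arms have the SAME true rate
            b += rng.random() < rate
            if i in checkpoints and not stopped and abs(z_stat(a, i, b, i)) > Z_CRIT:
                stopped = True            # an impatient tester would stop here
        fixed_hits += abs(z_stat(a, n, b, n)) > Z_CRIT
        peek_hits += stopped
    return fixed_hits / sims, peek_hits / sims
```

With these settings, the check-once rate should land near 5%, while the stop-at-first-peek rate comes out several times higher, even though neither variant is better.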
Pick the right metric. Open rate is what subject lines affect. Click rate is what content affects. Conversion rate is what the entire funnel affects. Match metric to variable.
Beware clickbait wins. A subject line that lifts opens but lowers clicks is probably clickbait. Watch downstream metrics such as clicks and conversions, not just the headline number.
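A downstream check can be as simple as comparing a second rate alongside the first. This minimal sketch flags a challenger that beats the control on open rate but loses on clicks per send. `VariantStats` and `clickbait_flag` are hypothetical names, and the example numbers are made up. It compares raw rates only; with real data, test each difference for significance before acting on it.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    sends: int
    opens: int
    clicks: int

    @property
    def open_rate(self) -> float:
        return self.opens / self.sends

    @property
    def click_rate(self) -> float:
        # clicks per send, so a variant can't look better just by having more opens
        return self.clicks / self.sends


def clickbait_flag(control: VariantStats, challenger: VariantStats) -> bool:
    """True when the challenger wins on opens but loses on clicks per send."""
    return (challenger.open_rate > control.open_rate
            and challenger.click_rate < control.click_rate)


# Made-up numbers for illustration only.
control = VariantStats(sends=10_000, opens=2_500, clicks=300)
challenger = VariantStats(sends=10_000, opens=2_900, clicks=250)
print(clickbait_flag(control, challenger))  # True: more opens, fewer clicks
```

Measuring clicks per send rather than clicks per open is deliberate. Click-to-open rate falls automatically when a subject line attracts more casual openers, which would make almost any open-rate win look suspicious.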
