A Practical Guide to A/B Testing
Methods, Pitfalls, and Ethics • Practitioner Notes
This practical guide explains how to design, run, and analyze controlled experiments to
make data-informed decisions.
Abstract
We summarize a robust A/B testing workflow, common statistical traps (p-hacking,
peeking), and reporting templates.
1. Experiment Design
• Define a single primary metric (e.g., conversion rate) and guardrail metrics (e.g., churn).
• Run a power analysis to estimate the required sample size per arm for a chosen minimum detectable effect, significance level, and statistical power (see the sketch after this list).
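A minimal power-analysis sketch using Python and statsmodels is shown below; the baseline rate, minimum detectable effect, and 80% power are illustrative assumptions, not values prescribed by this guide.

# Sketch: sample size per arm for a two-proportion z-test
# (assumes two-sided alpha = 0.05 and 80% power; values are illustrative).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05            # current conversion rate
relative_lift = 0.10       # minimum detectable effect, relative to baseline
target = baseline * (1 + relative_lift)

effect = proportion_effectsize(baseline, target)   # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)
print(f"Required samples per arm: {n_per_arm:,.0f}")

Because the required sample size scales roughly with the inverse square of the effect size, halving the minimum detectable effect roughly quadruples the sample you need, which is why this choice dominates test duration.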
2. Running the Test
• Use deterministic randomization and bucketing so each user consistently sees the same variant (a hash-based sketch follows this list).
• Keep exposure balanced across arms.
• Avoid mid-test parameter changes, and avoid stopping early based on interim results (peeking), which inflates the false-positive rate.
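One common bucketing approach, sketched below, is to hash the user ID together with the experiment name so assignment is stable across sessions and independent across experiments; the function and variant names here are hypothetical.

# Sketch: deterministic hash-based bucketing (an assumed scheme, not any specific platform's).
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")) -> str:
    # Hashing user + experiment keeps assignment stable for a user within an
    # experiment while decorrelating assignments across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

print(assign_variant("user-123", "checkout-button-color"))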
3. Analysis
• Report confidence intervals alongside point estimates.
• Consider Bayesian posteriors for decision support.
• Correct for multiple comparisons when testing several metrics or variants (see the sketch after this list).
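The sketch below combines both views on hypothetical counts: a normal-approximation 95% confidence interval for the absolute lift, and a Beta-Binomial posterior probability that the treatment beats control. All numbers are illustrative.

# Sketch: frequentist CI for the lift plus a Bayesian posterior comparison
# (conversion counts are hypothetical).
import numpy as np

conv_a, n_a = 520, 10_000   # control conversions / exposures
conv_b, n_b = 580, 10_000   # treatment conversions / exposures

p_a, p_b = conv_a / n_a, conv_b / n_b
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lo, hi = (p_b - p_a) - 1.96 * se, (p_b - p_a) + 1.96 * se
print(f"95% CI for absolute lift: [{lo:.4f}, {hi:.4f}]")

# Bayesian view: Beta(1, 1) priors, Monte Carlo draws from each posterior.
rng = np.random.default_rng(0)
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, 100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, 100_000)
print(f"P(treatment > control) ≈ {(post_b > post_a).mean():.3f}")

When several metrics or variants are compared at once, a simple Bonferroni adjustment (dividing alpha by the number of comparisons) is a conservative way to control the family-wise error rate.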
4. Reporting Template
• Hypothesis • Design • Metrics • Results • Interpretation • Decision • Next Steps
Appendix A: Sample Size (Quick Reference)
Baseline Conv.   Min. Detectable Effect   Alpha   Samples/Arm (approx.)
5%               10% relative             0.05    ≈ 15,000
10%              5% relative              0.05    ≈ 50,000
20%              3% relative              0.05    ≈ 120,000
Further Reading
• Google Causal Impact (overview)
• Optimizely Stats Engine (concepts)
• Microsoft: Controlled Experiments on the Web