A/B Test Sample Size Calculator
Calculate how many visitors you need for statistically significant A/B test results. Never run underpowered tests again.
Calculator inputs:
- Baseline Conversion Rate: your current conversion rate (control variant)
- Minimum Detectable Effect: the smallest improvement you want to detect (e.g., a 0.5% absolute increase)
- Confidence Level: how confident you want to be in your results
- Statistical Power: the probability of detecting an effect if it exists
What Is Sample Size?
Sample size is the number of visitors or observations needed in each variant (control and treatment) of your A/B test to achieve statistical significance. It's a critical calculation that determines whether your test results are reliable or just due to random chance.
Without proper sample size, you risk two costly mistakes: (1) declaring a "winning" variant when the difference is really just random variation (Type I error), or (2) missing a real improvement because you didn't collect enough data (Type II error). The correct sample size balances these risks based on your specific test parameters.
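For readers who want to see the math, here is a minimal sketch of one common sample-size formula for comparing two conversion rates; the exact formula a given calculator uses may differ slightly. Each input maps directly to one of the calculator fields described below.

```typescript
// Minimal sketch of one common sample-size formula for comparing two
// conversion rates (a normal-approximation two-proportion test).
// The exact formula a given calculator uses may differ slightly.
function sampleSizePerVariant(
  p1: number,     // baseline conversion rate, e.g. 0.025 for 2.5%
  p2: number,     // expected treatment rate = baseline + MDE, e.g. 0.030
  zAlpha: number, // z-score for the confidence level (two-sided), 1.96 for 95%
  zBeta: number   // z-score for statistical power, 0.84 for 80%
): number {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1; // the minimum detectable effect, in absolute terms
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (effect * effect));
}

// Example: 2.5% baseline, 0.5-point MDE, 95% confidence, 80% power
console.log(sampleSizePerVariant(0.025, 0.030, 1.96, 0.84)); // ≈ 16,770 per variant
```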
Why Sample Size Matters for A/B Tests
- Avoid False Positives: Adequate sample size reduces the risk of declaring a winner when the difference is just random fluctuation. At 95% confidence, you accept only a 5% chance of calling a winner when no real difference exists.
- Detect Real Improvements: Proper power (typically 80%) ensures you have a high probability of catching real improvements when they exist, avoiding missed opportunities.
- Business Confidence: Statistically significant results give you confidence to implement changes with real revenue impact, rather than relying on gut feeling.
- Efficient Resource Allocation: Knowing your sample size upfront prevents wasting time and resources running tests longer than needed or stopping early out of impatience.
Understanding Statistical Significance
Statistical significance answers the question: "Is this result real, or just random chance?" A result is statistically significant when the probability of observing it by chance alone (if there were truly no difference) is very small - typically less than 5%.
- P-Value: The p-value is the probability of seeing a result at least as extreme as yours if there were actually no difference. A p-value of 0.03 (3%) means a difference this large would occur by chance only 3% of the time. With 95% confidence, you need p < 0.05 (a worked computation follows this list).
- Confidence Interval: A 95% confidence interval around your observed effect means that if you repeated the test 100 times, about 95 of the resulting intervals would contain the true effect. Wider intervals indicate less certainty; narrower ones indicate more precision.
- Statistical vs. Practical Significance: A result can be statistically significant (real difference exists) but practically insignificant (too small to matter). Conversely, a practically significant result might not reach statistical significance without enough data. Both matter for good business decisions.
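To make the p-value and confidence-interval bullets concrete, here is a minimal, illustrative sketch of how both could be computed from raw test counts. Production testing tools handle more edge cases and may use different test statistics.

```typescript
// Illustrative two-proportion z-test: p-value and 95% confidence interval
// for an observed A/B result.
function twoProportionTest(convA: number, nA: number, convB: number, nB: number) {
  const pA = convA / nA;
  const pB = convB / nB;
  const pPooled = (convA + convB) / (nA + nB);

  // z statistic under the null hypothesis of no difference
  const sePooled = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  const z = (pB - pA) / sePooled;

  // Two-sided p-value from the standard normal distribution
  const pValue = 2 * (1 - standardNormalCdf(Math.abs(z)));

  // 95% confidence interval for the difference in conversion rates
  const seDiff = Math.sqrt((pA * (1 - pA)) / nA + (pB * (1 - pB)) / nB);
  const ci: [number, number] = [pB - pA - 1.96 * seDiff, pB - pA + 1.96 * seDiff];

  return { difference: pB - pA, z, pValue, ci };
}

// Polynomial approximation of the standard normal CDF (valid for x >= 0)
function standardNormalCdf(x: number): number {
  const t = 1 / (1 + 0.2316419 * x);
  const d = Math.exp((-x * x) / 2) / Math.sqrt(2 * Math.PI);
  const poly =
    t * (0.31938153 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  return 1 - d * poly;
}

// Example: 200/10,000 conversions in control vs. 250/10,000 in treatment
console.log(twoProportionTest(200, 10_000, 250, 10_000));
// difference ≈ 0.005, p-value ≈ 0.017, 95% CI ≈ [0.001, 0.009] -> significant at the 95% level
```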
Confidence Levels and Statistical Power
Confidence Level (Type I Error Protection)
- 90% - 10% false positive risk: More lenient, detects effects with fewer visitors. Use when speed matters and you can tolerate higher false positive risk.
- 95% - 5% false positive risk: Industry standard. Recommended for most A/B tests. Provides good balance between safety and efficiency.
- 99% - 1% false positive risk: Very strict, requires more visitors. Use only when false positives are extremely costly.
Statistical Power (Type II Error Protection)
- 80% - 20% false negative risk: Standard in A/B testing. Recommended for most tests. Good balance of detecting real effects while managing sample size.
- 90% - 10% false negative risk: More strict, requires more visitors. Use when missing improvements is very costly.
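The confidence level and power you choose enter the sample-size formula as z-scores, so stricter settings increase the required traffic in a predictable way. A small sketch (same formula shape as above; the z-values are standard normal quantiles):

```typescript
// How confidence level and power map to the z-scores used in the
// sample-size formula, and how stricter choices scale the required traffic.
const zForConfidence: Record<number, number> = {
  90: 1.645, // 10% false positive risk, two-sided
  95: 1.96,  // 5% false positive risk, two-sided
  99: 2.576, // 1% false positive risk, two-sided
};

const zForPower: Record<number, number> = {
  80: 0.842, // 20% false negative risk
  90: 1.282, // 10% false negative risk
};

// Sample size is proportional to (zAlpha + zBeta)^2, so stricter settings
// grow the required traffic quickly:
const standard = (1.96 + 0.842) ** 2;  // 95% confidence, 80% power
const strict = (2.576 + 1.282) ** 2;   // 99% confidence, 90% power
console.log((strict / standard).toFixed(2)); // ≈ 1.90x more visitors needed
```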
Minimum Detectable Effect (MDE)
Minimum detectable effect is the smallest change in conversion rate (or other metric) that your test is powered to detect reliably. It's expressed as an absolute percentage point change.
- 0.1% MDE: 2.5% → 2.6% conversion - Very large sample size (hundreds of thousands)
- 0.5% MDE: 2.5% → 3.0% conversion - Large sample size (tens of thousands)
- 1.0% MDE: 2.5% → 3.5% conversion - Moderate sample size (thousands)
- 5.0% MDE: 2.5% → 7.5% conversion - Small sample size (hundreds)
Smaller MDEs require dramatically larger sample sizes: the required traffic grows roughly with the inverse square of the MDE, so halving the MDE roughly quadruples the visitors you need. Define your MDE based on business impact: what's the smallest improvement worth implementing? If a 0.5% improvement would save $100K/year, that's worth detecting; if a 0.1% improvement would only save $5K, you might not need to detect it.
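To see how the MDE examples above translate into numbers, here is a rough sketch reusing the sampleSizePerVariant() function from earlier, with a 2.5% baseline, 95% confidence, and 80% power; exact figures depend on the formula a calculator uses.

```typescript
// Sample size per variant for the MDE examples above
// (2.5% baseline, 95% confidence, 80% power).
const baselineRate = 0.025;
for (const mde of [0.001, 0.005, 0.010, 0.050]) {
  const n = sampleSizePerVariant(baselineRate, baselineRate + mde, 1.96, 0.84);
  console.log(`MDE ${(mde * 100).toFixed(1)} points -> ~${n.toLocaleString()} visitors per variant`);
}
// Halving the MDE roughly quadruples the required sample size,
// since n scales with 1 / MDE^2.
```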
How to Use This Calculator
- Enter Your Baseline Conversion Rate: Get this from your analytics (Google Analytics, Rybbit, etc.). If 100 visitors arrive and 3 convert, your baseline is 3%. Use your most recent representative data.
- Define Your Minimum Detectable Effect: What's the smallest improvement worth detecting? If your baseline is 2%, a 0.5% MDE means you want to reliably detect moving to 2.5% or higher. Think about business impact, not just percentage points.
- Choose Your Confidence Level (Default: 95%): 95% is standard and recommended. This means you accept a 5% chance that your result is a false positive. Only use 90% if time is critical, or 99% if false positives are extremely costly.
- Select Your Statistical Power (Default: 80%): 80% power is standard, meaning you accept a 20% chance of missing a real improvement. Use 90% if missing improvements is very costly (requires larger sample size).
- Calculate and Plan Your Test: The calculator shows the required sample size per variant, total visitors needed, and estimated test duration. Use Rybbit Analytics to monitor progress and track when you reach statistical significance.
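Putting the steps together, here is a minimal end-to-end sketch of a hypothetical test plan: a 2% baseline, a 0.5-point MDE, 95% confidence, 80% power, and a made-up traffic figure of 1,000 visitors per day split across both variants. It reuses the sampleSizePerVariant() sketch from earlier; the calculator's own math may differ slightly.

```typescript
// Hypothetical end-to-end test plan using the sketch above.
const baselineRate = 0.02;   // 2% baseline conversion rate
const mde = 0.005;           // 0.5-point minimum detectable effect
const zAlpha = 1.96;         // 95% confidence
const zBeta = 0.84;          // 80% power

const perVariant = sampleSizePerVariant(baselineRate, baselineRate + mde, zAlpha, zBeta);
const totalVisitors = perVariant * 2;  // control + treatment
const dailyVisitors = 1_000;           // hypothetical traffic across both variants
const estimatedDays = Math.ceil(totalVisitors / dailyVisitors);

console.log({ perVariant, totalVisitors, estimatedDays });
// perVariant ≈ 13,791, totalVisitors ≈ 27,582, estimatedDays ≈ 28
```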