A/B Test Sample Size Calculator
Calculate how many visitors you need for statistically significant A/B test results. Never run underpowered tests again.
Calculator inputs:
- Baseline Conversion Rate: your current conversion rate (control variant)
- Minimum Detectable Effect: the smallest improvement you want to detect (e.g., a 0.5% absolute increase)
- Confidence Level: how confident you want to be in your results
- Statistical Power: the probability of detecting an effect if it exists
What Is Sample Size?
Sample size is the number of visitors or observations needed in each variant (control and treatment) of your A/B test to achieve statistical significance. It's a critical calculation that determines whether your test results are reliable or just due to random chance.
Without proper sample size, you risk two costly mistakes: (1) declaring a "winning" variant when the difference is really just random variation (Type I error), or (2) missing a real improvement because you didn't collect enough data (Type II error). The correct sample size balances these risks based on your specific test parameters.
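For readers who want to see the math, here is a minimal sketch of one common sample-size formula for comparing two conversion rates; the exact formula a given calculator uses may differ slightly. Each input maps directly to one of the calculator fields described below.

```typescript
// Minimal sketch of one common sample-size formula for comparing two
// conversion rates (a normal-approximation two-proportion test).
// The exact formula a given calculator uses may differ slightly.
function sampleSizePerVariant(
  p1: number,     // baseline conversion rate, e.g. 0.025 for 2.5%
  p2: number,     // expected treatment rate = baseline + MDE, e.g. 0.030
  zAlpha: number, // z-score for the confidence level (two-sided), 1.96 for 95%
  zBeta: number   // z-score for statistical power, 0.84 for 80%
): number {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1; // the minimum detectable effect, in absolute terms
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (effect * effect));
}

// Example: 2.5% baseline, 0.5-point MDE, 95% confidence, 80% power
console.log(sampleSizePerVariant(0.025, 0.030, 1.96, 0.84)); // ≈ 16,770 per variant
```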
Why Sample Size Matters for A/B Tests
- Avoid False Positives: Adequate sample size reduces the risk of declaring a winner when the difference is just random fluctuation. At 95% confidence, you accept only a 5% chance of calling a winner when no real difference exists.
- Detect Real Improvements: Proper power (typically 80%) ensures you have a high probability of catching real improvements when they exist, avoiding missed opportunities.
- Business Confidence: Statistically significant results give you confidence to implement changes with real revenue impact, rather than relying on gut feeling.
- Efficient Resource Allocation: Knowing your sample size upfront prevents wasting time and resources running tests longer than needed or stopping early out of impatience.
Understanding Statistical Significance
Statistical significance answers the question: "Is this result real, or just random chance?" A result is statistically significant when the probability of observing it by chance alone (if there were truly no difference) is very small - typically less than 5%.
- P-Value: The p-value is the probability of seeing a result at least as extreme as yours if there were actually no difference. A p-value of 0.03 (3%) means a difference this large would occur by chance only 3% of the time. With 95% confidence, you need p < 0.05 (a worked computation follows this list).
- Confidence Interval: A 95% confidence interval around your observed effect means that if you repeated the test 100 times, about 95 of the resulting intervals would contain the true effect. Wider intervals indicate less certainty; narrower ones indicate more precision.
- Statistical vs. Practical Significance: A result can be statistically significant (real difference exists) but practically insignificant (too small to matter). Conversely, a practically significant result might not reach statistical significance without enough data. Both matter for good business decisions.
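To make the p-value and confidence-interval bullets concrete, here is a minimal, illustrative sketch of how both could be computed from raw test counts. Production testing tools handle more edge cases and may use different test statistics.

```typescript
// Illustrative two-proportion z-test: p-value and 95% confidence interval
// for an observed A/B result.
function twoProportionTest(convA: number, nA: number, convB: number, nB: number) {
  const pA = convA / nA;
  const pB = convB / nB;
  const pPooled = (convA + convB) / (nA + nB);

  // z statistic under the null hypothesis of no difference
  const sePooled = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  const z = (pB - pA) / sePooled;

  // Two-sided p-value from the standard normal distribution
  const pValue = 2 * (1 - standardNormalCdf(Math.abs(z)));

  // 95% confidence interval for the difference in conversion rates
  const seDiff = Math.sqrt((pA * (1 - pA)) / nA + (pB * (1 - pB)) / nB);
  const ci: [number, number] = [pB - pA - 1.96 * seDiff, pB - pA + 1.96 * seDiff];

  return { difference: pB - pA, z, pValue, ci };
}

// Polynomial approximation of the standard normal CDF (valid for x >= 0)
function standardNormalCdf(x: number): number {
  const t = 1 / (1 + 0.2316419 * x);
  const d = Math.exp((-x * x) / 2) / Math.sqrt(2 * Math.PI);
  const poly =
    t * (0.31938153 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  return 1 - d * poly;
}

// Example: 200/10,000 conversions in control vs. 250/10,000 in treatment
console.log(twoProportionTest(200, 10_000, 250, 10_000));
// difference ≈ 0.005, p-value ≈ 0.017, 95% CI ≈ [0.001, 0.009] -> significant at the 95% level
```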
Confidence Levels and Statistical Power
Confidence Level (Type I Error Protection)
- 90% - 10% false positive risk: More lenient, detects effects with fewer visitors. Use when speed matters and you can tolerate higher false positive risk.
- 95% - 5% false positive risk: Industry standard. Recommended for most A/B tests. Provides good balance between safety and efficiency.
- 99% - 1% false positive risk: Very strict, requires more visitors. Use only when false positives are extremely costly.
Statistical Power (Type II Error Protection)
- 80% - 20% false negative risk: Standard in A/B testing. Recommended for most tests. Good balance of detecting real effects while managing sample size.
- 90% - 10% false negative risk: More strict, requires more visitors. Use when missing improvements is very costly.
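The confidence level and power you choose enter the sample-size formula as z-scores, so stricter settings increase the required traffic in a predictable way. A small sketch (same formula shape as above; the z-values are standard normal quantiles):

```typescript
// How confidence level and power map to the z-scores used in the
// sample-size formula, and how stricter choices scale the required traffic.
const zForConfidence: Record<number, number> = {
  90: 1.645, // 10% false positive risk, two-sided
  95: 1.96,  // 5% false positive risk, two-sided
  99: 2.576, // 1% false positive risk, two-sided
};

const zForPower: Record<number, number> = {
  80: 0.842, // 20% false negative risk
  90: 1.282, // 10% false negative risk
};

// Sample size is proportional to (zAlpha + zBeta)^2, so stricter settings
// grow the required traffic quickly:
const standard = (1.96 + 0.842) ** 2;  // 95% confidence, 80% power
const strict = (2.576 + 1.282) ** 2;   // 99% confidence, 90% power
console.log((strict / standard).toFixed(2)); // ≈ 1.90x more visitors needed
```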
Minimum Detectable Effect (MDE)
Minimum detectable effect is the smallest change in conversion rate (or other metric) that your test is powered to detect reliably. It's expressed as an absolute percentage point change.
- 0.1% MDE: 2.5% → 2.6% conversion - Very large sample size (hundreds of thousands)
- 0.5% MDE: 2.5% → 3.0% conversion - Large sample size (tens of thousands)
- 1.0% MDE: 2.5% → 3.5% conversion - Moderate sample size (thousands)
- 5.0% MDE: 2.5% → 7.5% conversion - Small sample size (hundreds)
Smaller MDEs require dramatically larger sample sizes: the required traffic grows roughly with the inverse square of the MDE, so halving the MDE roughly quadruples the visitors you need. Define your MDE based on business impact: what's the smallest improvement worth implementing? If a 0.5% improvement would save $100K/year, that's worth detecting; if a 0.1% improvement would only save $5K, you might not need to detect it.
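To see how the MDE examples above translate into numbers, here is a rough sketch reusing the sampleSizePerVariant() function from earlier, with a 2.5% baseline, 95% confidence, and 80% power; exact figures depend on the formula a calculator uses.

```typescript
// Sample size per variant for the MDE examples above
// (2.5% baseline, 95% confidence, 80% power).
const baselineRate = 0.025;
for (const mde of [0.001, 0.005, 0.010, 0.050]) {
  const n = sampleSizePerVariant(baselineRate, baselineRate + mde, 1.96, 0.84);
  console.log(`MDE ${(mde * 100).toFixed(1)} points -> ~${n.toLocaleString()} visitors per variant`);
}
// Halving the MDE roughly quadruples the required sample size,
// since n scales with 1 / MDE^2.
```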
How to Use This Calculator
- Enter Your Baseline Conversion Rate: Get this from your analytics (Google Analytics, Rybbit, etc.). If 100 visitors arrive and 3 convert, your baseline is 3%. Use your most recent representative data.
- Define Your Minimum Detectable Effect: What's the smallest improvement worth detecting? If your baseline is 2%, a 0.5% MDE means you want to reliably detect moving to 2.5% or higher. Think about business impact, not just percentage points.
- Choose Your Confidence Level (Default: 95%): 95% is standard and recommended. This means you accept a 5% chance that your result is a false positive. Only use 90% if time is critical, or 99% if false positives are extremely costly.
- Select Your Statistical Power (Default: 80%): 80% power is standard, meaning you accept a 20% chance of missing a real improvement. Use 90% if missing improvements is very costly (requires larger sample size).
- Calculate and Plan Your Test: The calculator shows the required sample size per variant, total visitors needed, and estimated test duration. Use Rybbit Analytics to monitor progress and track when you reach statistical significance.
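Putting the steps together, here is a minimal end-to-end sketch of a hypothetical test plan: a 2% baseline, a 0.5-point MDE, 95% confidence, 80% power, and a made-up traffic figure of 1,000 visitors per day split across both variants. It reuses the sampleSizePerVariant() sketch from earlier; the calculator's own math may differ slightly.

```typescript
// Hypothetical end-to-end test plan using the sketch above.
const baselineRate = 0.02;   // 2% baseline conversion rate
const mde = 0.005;           // 0.5-point minimum detectable effect
const zAlpha = 1.96;         // 95% confidence
const zBeta = 0.84;          // 80% power

const perVariant = sampleSizePerVariant(baselineRate, baselineRate + mde, zAlpha, zBeta);
const totalVisitors = perVariant * 2;  // control + treatment
const dailyVisitors = 1_000;           // hypothetical traffic across both variants
const estimatedDays = Math.ceil(totalVisitors / dailyVisitors);

console.log({ perVariant, totalVisitors, estimatedDays });
// perVariant ≈ 13,791, totalVisitors ≈ 27,582, estimatedDays ≈ 28
```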