
A/B Test Calculator

Two questions most dashboards won’t answer straight: is B actually better than A, and how sure can you be? Enter the results to get lift, a p-value, and a verdict. Then size the next test before you launch it.

Did B beat A?

Variant A (control)

Variant B (challenger)

Rate A
Rate B
Relative lift
p-value

The test: a two-tailed, two-proportion z-test. The p-value is the chance of seeing a gap at least this large if A and B were truly equal; below 0.05 is the usual bar for “real.” The interval is the 95% confidence interval for the true difference in rates. Both assume a fixed sample size decided in advance: peeking at a live test and stopping when it looks good inflates false positives.
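For readers who want to check the arithmetic by hand, here is a minimal sketch of a two-tailed, two-proportion z-test in plain Python. It is not the calculator's own source: the function name, argument names, and the choice of a pooled standard error for the test and an unpooled one for the interval are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-tailed z-test for the gap between two conversion rates.

    Hypothetical helper: assumes both arms have visitors and that
    variant A has at least one conversion (so lift is defined).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # Pooled rate: the shared rate we would expect if A and B were truly equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled

    # Two-tailed p-value: probability of a gap at least this large in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval on the true difference (assumption).
    se_diff = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    interval = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "relative_lift": diff / rate_a,
        "z": z,
        "p_value": p_value,
        "interval": interval,
    }


# Example with made-up counts: 80 of 1,000 convert on A, 100 of 1,000 on B.
result = two_proportion_z_test(80, 1000, 100, 1000)
print(result)
```

With these made-up counts the relative lift is 25%, yet the p-value lands above 0.05 and the interval straddles zero, which is the kind of result that looks like a win on a dashboard and is not yet one.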

How many do I need?

Baseline conversion rate (%)
Minimum detectable lift (% relative)

A 10% relative lift on an 8% baseline means catching B at 8.8%.

Visitors needed per variant

Standard two-proportion power calculation at a 5% significance level (95% confidence). Don’t halve it: the count is per variant, so a two-arm test needs double that in total. Smaller effects cost dramatically more traffic, because the required sample grows roughly with the inverse square of the effect; that trade is the whole planning conversation.
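A minimal sketch of that sizing calculation, using the common normal-approximation formula for two proportions. The page does not state the power it targets, so the 80% default below is an assumption, as are the function and parameter names; a different power setting will give a different count.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed in EACH arm to detect a given relative lift.

    Hypothetical helper. `power=0.80` is an assumed default, not a value
    taken from the calculator. `baseline` is a fraction (0.08 for 8%).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # rate B must reach to count as the lift
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed significance
    z_beta = NormalDist().inv_cdf(power)           # desired power

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# The page's example: a 10% relative lift on an 8% baseline (B at 8.8%).
n = visitors_per_variant(0.08, 0.10)
print(f"{n} per variant, {2 * n} in total for a two-arm test")

# Halving the target lift roughly quadruples the traffic required.
print(visitors_per_variant(0.08, 0.05))
```

Under these assumptions the 8% baseline example needs on the order of 19,000 visitors per variant, and chasing a 5% lift instead of 10% pushes that close to four times higher.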

Measurement before celebration. Knowing whether a number is real — and how much traffic it takes to find out — is the same discipline I bring to an AI feature in production.

Email me · Eval set sizer