The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one computed from your sample, assuming the null hypothesis is true. A small p-value (< 0.05) suggests the data is inconsistent with H₀.

What does p < 0.05 mean?

There is less than a 5% probability of seeing a result this extreme if the null hypothesis were true. By convention, p < 0.05 is the threshold for statistical significance, though this threshold is arbitrary and context-dependent.

Can a p-value prove the null hypothesis?

No. A large p-value means you failed to reject H₀, not that H₀ is true. 'Absence of evidence is not evidence of absence.' The test may lack statistical power to detect a real effect, especially with small samples.
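To make the point about power concrete, here is a minimal sketch for a right-tailed, one-sample z-test with known σ. The function name `power_one_sided_z`, the standardized effect size `d` (true mean shift divided by σ), and the example values are assumptions chosen for illustration; they are not part of the calculator.

```python
from statistics import NormalDist

def power_one_sided_z(d, n, alpha=0.05):
    """Probability of rejecting H0 in a right-tailed z-test when the true
    standardized effect is d (mean shift / sigma) and the sample size is n."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = d * n ** 0.5                      # mean of the z statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)

small = power_one_sided_z(d=0.3, n=10)    # small sample, same true effect
large = power_one_sided_z(d=0.3, n=100)   # larger sample, same true effect
print(f"n=10: power={small:.2f}, n=100: power={large:.2f}")
```

With these assumed inputs the small sample detects the effect in only about a quarter of studies, so a large p-value there says little about whether H₀ holds.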

P-Value Calculator

Convert a z-score, t-statistic, or chi-square statistic to a p-value. Supports one-tailed and two-tailed tests.


Understanding P-Values

The p-value is one of the most cited (and misunderstood) statistics in science. It is the probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis H₀ is true. It does not tell you the probability that H₀ is true or false.

Common Significance Thresholds

p < 0.05: significant (5% threshold). p < 0.01: very significant. p < 0.001: highly significant. These are conventions, not laws. Some fields are far stricter: in particle physics, a p-value below about 0.0000003 (5σ, one-tailed) is required for discovery claims.
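The link between a "σ" level and a p-value can be sketched with the standard normal upper tail. The helper name `one_tailed_p_from_sigma` is hypothetical, and the sketch assumes the one-tailed convention.

```python
import math

def one_tailed_p_from_sigma(n_sigma):
    """Upper-tail probability P(Z > n_sigma) for a standard normal Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n_sigma in (2, 3, 5):
    print(f"{n_sigma} sigma -> p = {one_tailed_p_from_sigma(n_sigma):.3g}")
```

At 5σ this gives roughly 2.9e-7, which is where the 0.0000003 figure comes from.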

One-Tailed vs Two-Tailed

Two-tailed p-value: probability of a result this extreme in either direction. Use this unless you have a strong, pre-specified directional hypothesis. For symmetric distributions (z, t), the one-tailed p is half the two-tailed p when the observed effect lies in the hypothesized direction; if it lies in the opposite direction, the one-tailed p is 1 minus that half. Chi-square tests are inherently one-tailed (right tail only) since χ² ≥ 0.
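The tail conventions above might be implemented along these lines. This is a sketch using only the Python standard library, not the calculator's actual code; the names `p_from_z` and `p_from_chi2` are hypothetical, and the chi-square routine uses a plain series expansion that is not tuned for very large statistics.

```python
import math

def p_from_z(z, tail="two"):
    """p-value for a z statistic; tail is 'two', 'right', or 'left'."""
    if tail == "right":
        return 0.5 * math.erfc(z / math.sqrt(2))     # P(Z > z)
    if tail == "left":
        return 0.5 * math.erfc(-z / math.sqrt(2))    # P(Z < z)
    if tail == "two":
        return math.erfc(abs(z) / math.sqrt(2))      # 2 * P(Z > |z|)
    raise ValueError("tail must be 'two', 'right', or 'left'")

def p_from_chi2(x, df):
    """Right-tail p-value P(chi2 > x), via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:      # all terms are positive
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

print(p_from_z(2.10, "two"), p_from_z(2.10, "right"))
print(p_from_chi2(3.841, 1))   # close to 0.05, the usual df=1 cutoff
```

A t-statistic needs the t-distribution's tail instead of the normal one and is left out of this sketch.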

Worked Example

A researcher calculates a z-score of 2.10 from a one-sample z-test. For a two-tailed test, the p-value = 2 × P(Z > 2.10) = 2 × 0.01786 ≈ 0.0357. Since 0.0357 < 0.05, the result is statistically significant at the 5% level. For a one-tailed (right) test, p ≈ 0.0179, which is significant at both the 5% and 2% levels.
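The arithmetic in this example can be checked with Python's standard library. This is only a verification sketch; the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.10
upper_tail = 1 - NormalDist().cdf(z)   # P(Z > 2.10)
two_tailed = 2 * upper_tail

print(round(upper_tail, 4))   # 0.0179
print(round(two_tailed, 4))   # 0.0357
```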

P-Value Interpretation Reference

P-Value Range	Common Interpretation	Convention
p < 0.001	Highly significant	Reported as p < 0.001
0.001 ≤ p < 0.01	Very significant	Strong evidence against H₀
0.01 ≤ p < 0.05	Significant	Standard threshold
0.05 ≤ p < 0.10	Marginally significant	Weak evidence, report actual p
p ≥ 0.10	Not significant	Fail to reject H₀
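The table can be expressed as a simple lookup. The function name `interpret_p` is hypothetical, and the labels are the conventional ones from the table, not a substitute for reporting the actual p-value.

```python
def interpret_p(p):
    """Map a p-value to the conventional label from the reference table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "Highly significant"
    if p < 0.01:
        return "Very significant"
    if p < 0.05:
        return "Significant"
    if p < 0.10:
        return "Marginally significant"
    return "Not significant"

print(interpret_p(0.0357))   # Significant
```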

Frequently Asked Questions

A p-value is P(data | H₀): the probability of observing data at least as extreme as yours, assuming H₀ is true. A small p does not by itself mean H₀ is false; it means data this extreme would be unlikely under H₀. Multiple testing, sample size, and effect size all matter.

With p < 0.05, fewer than 5% of random samples drawn from a population where H₀ holds would produce a test statistic this extreme. By convention, such a result is labeled "statistically significant." It is not a statement about the probability that H₀ is true.

The z (standard normal) and t distributions are both bell-shaped and symmetric. The t-distribution has heavier tails, especially for small degrees of freedom. As df → ∞, the t-distribution converges to the standard normal (z). Use z when the population standard deviation σ is known; use t when σ is estimated from the sample.
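The heavier tails can be seen by computing the same two-tailed p-value under both distributions. The sketch below integrates the t density numerically with Simpson's rule so that it needs only the standard library; the function names and the `steps` parameter are assumptions for illustration, and a statistics library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p_from_t(t, df, steps=2000):
    """Two-tailed p-value, integrating the density over [0, |t|]
    with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)

z_p = math.erfc(2.10 / math.sqrt(2))   # two-tailed z p-value for comparison
for df in (3, 10, 30, 1000):
    print(df, round(two_tailed_p_from_t(2.10, df), 4), "vs z:", round(z_p, 4))
```

For the same statistic of 2.10, the t-based p-value is larger at small df and approaches the z-based value as df grows.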

P-values do not prove H₀ or H₁. A significant result says only that the data would be unlikely under H₀. Always consider effect size, confidence intervals, replication, and domain knowledge alongside p-values.

If you run 20 independent tests each at α=0.05, you expect about 1 false positive by chance even if all null hypotheses are true (0.05×20=1). To correct for this, researchers apply the Bonferroni correction (divide α by the number of tests) or other adjustments. For example, running 10 tests requires each individual p-value to be below 0.005 to keep the overall (family-wise) error rate at or below 5%.
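The numbers above can be reproduced in a few lines. The function names are hypothetical, and the family-wise error rate formula assumes independent tests with every null hypothesis true.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when all nulls are true."""
    return n_tests * alpha

def family_wise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) for independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold under the Bonferroni correction."""
    return alpha / n_tests

print(expected_false_positives(20))                # about 1
print(round(family_wise_error_rate(20), 3))        # chance of at least one
print(bonferroni_alpha(10))                        # per-test cutoff for 10 tests
print(round(family_wise_error_rate(10, bonferroni_alpha(10)), 4))
```

Note that an expected count of 1 is not the same as certainty: with 20 uncorrected tests the chance of at least one false positive is roughly 64%, and the Bonferroni cutoff brings it back under 5%.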

P-hacking (or data dredging) is the practice of running many analyses and selectively reporting only those with p < 0.05. Because random chance produces false positives, cherry-picking results inflates the apparent discovery rate. Pre-registering hypotheses before data collection and reporting all analyses are standard safeguards against p-hacking in scientific research.
