P-Value Calculator
Convert a z-score, t-statistic, or chi-square statistic to a p-value. Supports one-tailed and two-tailed tests.
Understanding P-Values
The p-value is one of the most cited (and misunderstood) statistics in science. It is the probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis H₀ is true. It does not tell you the probability that H₀ is true or false.
Common Significance Thresholds
Conventional cutoffs are p < 0.05 (significant, the 5% threshold), p < 0.01 (very significant), and p < 0.001 (highly significant). These are conventions, not laws. Some fields are far stricter: particle physics requires p below about 0.0000003 (5σ) for discovery claims.
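As a rough illustration of how a "number of sigmas" maps to a p-value, the sketch below converts a σ level to a normal tail probability. The function name `sigma_to_p` is made up for this example, and it assumes the one-sided convention, which is what yields roughly 0.0000003 at 5σ.

```python
import math

def sigma_to_p(n_sigma, two_sided=False):
    """Tail probability of a standard normal beyond n_sigma.

    Assumption: one-sided by default, matching the ~3e-7 figure quoted for 5 sigma.
    """
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2))
    return 2 * one_sided if two_sided else one_sided

for n in (2, 3, 5):
    print(f"{n} sigma -> one-sided p = {sigma_to_p(n):.3g}")
```

Under the two-sided convention the same 5σ threshold corresponds to twice that probability, so it is worth stating which convention a reported σ level uses.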
One-Tailed vs Two-Tailed
A two-tailed p-value is the probability of a result this extreme in either direction. Use it unless you have a strong, pre-specified directional hypothesis. For symmetric distributions (z, t), the one-tailed p is half the two-tailed p when the observed effect lies in the hypothesized direction; if it lies in the opposite direction, the one-tailed p is 1 minus that half. Chi-square tests are inherently one-tailed (right tail only) since χ² ≥ 0.
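The tail choices can be sketched in a few lines using only the standard library. This is an illustrative sketch, not the calculator's own implementation: the names `z_to_p` and `chi2_df1_to_p` are invented here, and the chi-square helper is deliberately limited to 1 degree of freedom, where the right-tail probability has a simple closed form.

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_to_p(z, tail="two"):
    """Convert a z-score to a p-value for the chosen tail."""
    if tail == "two":
        return 2 * normal_sf(abs(z))
    if tail == "right":
        return normal_sf(z)
    if tail == "left":
        return normal_sf(-z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

def chi2_df1_to_p(x):
    """Right-tail p-value for a chi-square statistic with df = 1 only."""
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2))

print(z_to_p(2.10, "two"), z_to_p(2.10, "right"), z_to_p(-2.10, "right"))
print(chi2_df1_to_p(3.841))
```

A right-tailed test on z = -2.10 returns a p-value close to 1 rather than half the two-tailed value. This is why the halving shortcut only applies when the effect points in the hypothesized direction.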
Worked Example
A researcher calculates a z-score of 2.10 from a one-sample z-test. For a two-tailed test, the p-value = 2 × P(Z > 2.10) = 2 × 0.01786 = 0.0357. Since 0.0357 < 0.05, the result is statistically significant at the 5% level. For a one-tailed (right) test, p = 0.0179 (0.01786 rounded to four decimal places), which is significant at both the 5% and 2% levels.
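The arithmetic in this example can be checked with Python's standard `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

z = 2.10
std_normal = NormalDist()  # mean 0, standard deviation 1

p_right = 1 - std_normal.cdf(z)  # one-tailed (right) p-value
p_two = 2 * p_right              # two-tailed p-value

print(round(p_right, 4))  # 0.0179
print(round(p_two, 4))    # 0.0357
```

Doubling the unrounded tail area gives 0.0357. Doubling the already-rounded 0.0179 would give 0.0358, so it is safer to round only at the end.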
P-Value Interpretation Reference
| P-Value Range | Common Interpretation | Convention |
|---|---|---|
| p < 0.001 | Highly significant | Reported as p < 0.001 |
| 0.001 ≤ p < 0.01 | Very significant | Strong evidence against H₀ |
| 0.01 ≤ p < 0.05 | Significant | Standard threshold |
| 0.05 ≤ p < 0.10 | Marginally significant | Weak evidence, report actual p |
| p ≥ 0.10 | Not significant | Fail to reject H₀ |
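The ranges in the table translate directly into a small lookup function. The sketch below is one possible encoding, with a hypothetical function name `interpret_p`. The labels are copied from the table and carry the same caveat: they are conventions, not rules.

```python
def interpret_p(p):
    """Map a p-value to the conventional label from the reference table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "Highly significant"
    if p < 0.01:
        return "Very significant"
    if p < 0.05:
        return "Significant"
    if p < 0.10:
        return "Marginally significant"
    return "Not significant"

print(interpret_p(0.0357))  # Significant
```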
Frequently Asked Questions
A p-value is the probability of observing data at least as extreme as yours, assuming H₀ is true; it is a statement about the data given H₀, not about H₀ given the data. A small p doesn't mean H₀ is false; it means the data would be unlikely if H₀ were true. Multiple testing, sample size, and effect size all matter.
A result with p < 0.05 means that fewer than 5% of random samples from a population in which H₀ is true would produce a test statistic this extreme. By convention, we label this "statistically significant." It is not a statement about the probability that H₀ is true.
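The z (standard normal) and t distributions are both bell-shaped and symmetric. The t-distribution has heavier tails, especially for small degrees of freedom, so the same statistic gives a larger p-value under t than under z. As df → ∞, the t-distribution converges to the standard normal (z). Use z when σ is known; use t otherwise.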
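To see the heavier tails and the convergence numerically, the sketch below computes two-tailed p-values for the same statistic under t with increasing degrees of freedom and under z. It avoids third-party libraries by integrating the t density with Simpson's rule. That is a choice made for this illustration, not necessarily how the calculator works, and the function names are invented.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma for stability)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_half)

def z_two_tailed_p(z):
    """Two-tailed p-value under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

for df in (3, 10, 30, 1000):
    print(f"df={df:>4}: p = {t_two_tailed_p(2.10, df):.4f}")
print(f"z      : p = {z_two_tailed_p(2.10):.4f}")
```

The printed p-values shrink toward the z result as df grows, which matches the convergence described above.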
No. P-values do not prove H₀ or H₁. A significant result says the data is unlikely under H₀. Always consider effect size, confidence intervals, replication, and domain knowledge alongside p-values.
If you run 20 independent tests each at α=0.05, you expect about 1 false positive by chance even if all null hypotheses are true (0.05×20=1). To correct for this, researchers apply the Bonferroni correction (divide α by the number of tests) or other adjustments. For example, running 10 tests requires each individual p-value to be below 0.005 to keep the overall (family-wise) error rate at or below 5%.
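The numbers in this answer can be reproduced with a few one-line helpers. The function names and the example p-values are hypothetical. The family-wise error rate formula assumes the tests are independent, as the text does.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold under the Bonferroni correction."""
    return alpha / n_tests

def bonferroni_reject(p_values, alpha=0.05):
    """For each p-value, True if it stays significant after Bonferroni."""
    threshold = bonferroni_alpha(len(p_values), alpha)
    return [p < threshold for p in p_values]

print(expected_false_positives(20))         # 1.0
print(round(familywise_error_rate(20), 3))  # 0.642
print(bonferroni_alpha(10))                 # 0.005
print(bonferroni_reject([0.001, 0.004, 0.03]))  # hypothetical p-values
```

With 20 uncorrected tests, the chance of at least one false positive is about 64%, which is why a per-test 5% threshold does not imply an overall 5% error rate.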
P-hacking (or data dredging) is the practice of running many analyses and selectively reporting only those with p < 0.05. Because random chance produces false positives, cherry-picking results inflates the apparent discovery rate. Pre-registering hypotheses before data collection and reporting all analyses are standard safeguards against p-hacking in scientific research.
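A small simulation can make the effect of selective reporting concrete. It relies on the fact that, when H₀ is true and the test statistic is continuous, p-values are uniformly distributed between 0 and 1, so each "analysis" is modeled as a uniform random draw. The study counts, the seed, and the function name are arbitrary choices for illustration.

```python
import random

def share_with_false_positive(n_studies=2000, analyses_per_study=20,
                              alpha=0.05, seed=0):
    """Fraction of null studies with at least one analysis below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null, each analysis yields a uniform(0, 1) p-value.
        p_values = [rng.random() for _ in range(analyses_per_study)]
        if min(p_values) < alpha:  # the "reportable" result a p-hacker would pick
            hits += 1
    return hits / n_studies

print(share_with_false_positive(analyses_per_study=1))   # single pre-specified test
print(share_with_false_positive(analyses_per_study=20))  # best of 20 analyses
```

With one pre-specified analysis, the share of "significant" null studies stays near α. When only the best of 20 analyses is reported, it is far higher, even though no real effect exists in any simulated study.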