T-Test Calculator
One-sample or two-sample (independent) t-test. Enter raw data to get t-statistic, degrees of freedom, and two-tailed p-value.
The T-Test
The t-test is among the most widely used statistical tests. It compares means and accounts for sampling variability using the t-distribution (which has heavier tails than the normal distribution — especially important for small samples).
One-Sample T-Test
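Tests whether a sample mean differs from a hypothesized population mean μ₀. Formula: t = (x̄ − μ₀) / (s/√n), where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The test has n − 1 degrees of freedom. The larger |t|, the stronger the evidence against the hypothesis that the population mean is μ₀.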
Two-Sample (Welch's) T-Test
Tests whether two independent group means are equal. Uses Welch's formula (unequal variance version): t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂). Degrees of freedom are computed via the Welch-Satterthwaite equation.
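The Welch statistic and the Welch-Satterthwaite degrees of freedom can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the calculator's own code; the function name `welch_t_test` and the sample data are made up for the example.

```python
import math
import statistics

def welch_t_test(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    # s^2 / n for each group, using the sample variance (n - 1 denominator)
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; the result is usually not an integer
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical measurements for two independent groups
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4]
t, df = welch_t_test(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom always fall between the smaller of n₁ − 1 and n₂ − 1 and the pooled value n₁ + n₂ − 2.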
Worked Example: One-Sample T-Test
A manufacturer claims their bolts have a mean diameter of 10 mm. You measure a sample of 8 bolts: 10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.4, 10.0 mm. Is the true mean significantly different from 10?
x̄ = 81.2/8 = 10.15 mm. Sample std dev s = √(0.42/7) ≈ 0.245. Standard error = 0.245/√8 ≈ 0.087. t = (10.15 − 10)/0.087 ≈ 1.73 with df = 7. For a two-tailed test, p ≈ 0.13. Since p > 0.05, we do not reject the null hypothesis; there is insufficient evidence that the true mean differs from 10 mm.
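The arithmetic in this example can be checked with a short script. The sketch below uses only the Python standard library; because the standard library has no t-distribution, the two-tailed p-value is approximated by integrating the t density with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are illustrative, and a statistics package would normally be used instead.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value using Simpson's rule on [0, |t|].

    steps must be even. math.gamma overflows for very large df, so this
    sketch is only meant for small and moderate samples.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_half

diameters = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.4, 10.0]
mu0 = 10.0  # claimed mean diameter in mm

n = len(diameters)
mean = statistics.mean(diameters)
s = statistics.stdev(diameters)   # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)             # standard error of the mean
t_stat = (mean - mu0) / se
p_value = two_tailed_p(t_stat, n - 1)

print(f"mean = {mean:.2f}, s = {s:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.2f}, df = {n - 1}, p = {p_value:.2f}")
```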
T-Test Assumptions and When to Use Each Type
| Test Type | Use When | Key Formula |
|---|---|---|
| One-Sample | Comparing sample mean to a known standard | t = (x̄ − μ₀) / (s/√n) |
| Two-Sample (Welch's) | Comparing two independent group means | t = (x̄₁−x̄₂)/√(s₁²/n₁+s₂²/n₂) |
| Paired | Same subjects measured twice (before/after) | Run one-sample t-test on differences |
Frequently Asked Questions
A t-test tests whether sample mean(s) are consistent with a null hypothesis (H₀). The t-statistic measures how many standard errors the observed mean is from the hypothesized mean. A large |t| gives a small p-value.
The p-value is the probability of observing a t-statistic at least this extreme (in either direction for a two-tailed test) if H₀ were true. It is not the probability that H₀ itself is true. p < 0.05 is the common threshold for rejecting H₀ and calling the difference statistically significant.
Independent (two-sample): two separate groups, different people. Paired: same subjects measured twice (before/after). For paired data, subtract the pairs first, then run a one-sample t-test on the differences.
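The "subtract, then run a one-sample test against zero" recipe for paired data might look like this in standard-library Python. The function name `paired_t` and the before/after readings are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired t-test: a one-sample test of the differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired data must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores for the same five subjects, measured twice
before = [72, 65, 80, 58, 77]
after = [75, 70, 79, 64, 82]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Note that the degrees of freedom depend on the number of pairs, not the total number of measurements.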
The minimum sample size is n = 2 per group (so that df ≥ 1), but for reliable results you generally want at least 10 to 30 observations per group. Very small samples have wide confidence intervals and low power to detect real differences.
Statistical power is the probability of correctly rejecting a false null hypothesis. Larger sample sizes increase power — they reduce the standard error, making it easier to detect real differences. A power of 0.80 (80%) is the common target, meaning an 80% chance of detecting a true effect. Use a power analysis to determine the minimum sample size needed before collecting data.
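As a rough illustration of a power analysis, the sketch below estimates the sample size for a two-tailed test from a standardized effect size d, using the normal approximation n ≈ ((z₁₋α/₂ + z_power) / d)², doubled per group for two independent samples. This is an assumption-laden shortcut: it ignores the heavier tails of the t-distribution, so it tends to come out a little below what exact t-based power software reports. The function name `approx_sample_size` is made up for the example.

```python
import math
from statistics import NormalDist

def approx_sample_size(d, alpha=0.05, power=0.80, two_sample=False):
    """Approximate n (per group) for a two-tailed test, via the normal approximation."""
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the two-tailed test
    z_power = z.inv_cdf(power)            # quantile matching the target power
    n = ((z_alpha + z_power) / d) ** 2
    if two_sample:
        n *= 2                            # each of the two groups needs this many
    return math.ceil(n)

for d in (0.2, 0.5, 0.8):
    print(d, approx_sample_size(d), approx_sample_size(d, two_sample=True))
```

Smaller effects need far more data: halving d roughly quadruples the required sample size.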
Almost always use a two-tailed test unless you have a strong theoretical reason to predict the direction of the difference before collecting data. One-tailed tests have more power to detect effects in one direction, but using them selectively after seeing the data inflates your false positive rate. Most published research uses two-tailed tests.
Cohen's d expresses the difference in standard-deviation units, which speaks to the practical significance of a t-test result: d = (x̄ − μ₀) / s for one-sample, or d = (x̄₁ − x̄₂) / pooled_s for two-sample, where pooled_s = √(((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)). By convention, |d| values around 0.2 are small, 0.5 medium, and 0.8 large. A result can be statistically significant (small p-value) with a tiny effect size if the sample is very large, so always report effect size alongside p-values.
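Both versions of Cohen's d can be computed directly from raw data. The sketch below uses standard-library Python; the function names are illustrative, the pooled standard deviation uses the usual weighting by n − 1 per group, and the two-group data are made up.

```python
import math
import statistics

def cohens_d_one_sample(sample, mu0):
    """d = (mean - mu0) / s for a single sample."""
    return (statistics.mean(sample) - mu0) / statistics.stdev(sample)

def cohens_d_two_sample(a, b):
    """d = (mean_a - mean_b) / pooled_s for two independent samples."""
    n1, n2 = len(a), len(b)
    # Pooled variance: each group's sample variance weighted by its n - 1
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Bolt diameters from the worked example, compared with the claimed 10 mm
bolts = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.4, 10.0]
d_bolts = cohens_d_one_sample(bolts, 10.0)
print(f"one-sample d = {d_bolts:.2f}")

# Hypothetical two-group data
d_groups = cohens_d_two_sample([5.1, 4.9, 5.6, 5.8, 5.0], [4.2, 4.8, 4.5, 4.1, 4.9, 4.4])
print(f"two-sample d = {d_groups:.2f}")
```

For the bolt data, d comes out a little above 0.6, a medium-sized effect by the usual convention, even though the t-test on eight bolts was not significant.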