A t-test is a statistical test that determines whether there is a significant difference between means. The one-sample t-test compares a sample mean to a known value. The two-sample t-test compares the means of two independent groups.

When do I use a t-test vs z-test?

Use a t-test when the population standard deviation is unknown (which is almost always). Use a z-test when the population standard deviation is known and the sample is large (n > 30). In practice, t-tests are far more common.

What is a two-tailed vs one-tailed t-test?

Two-tailed: tests whether the mean differs in either direction. One-tailed: tests whether the mean is specifically higher or lower. Use two-tailed unless you have a specific directional hypothesis established before seeing the data.

What is Welch's t-test?

Welch's t-test is a version of the two-sample t-test that doesn't assume equal variances. It uses a more complex degrees-of-freedom calculation (Welch-Satterthwaite equation) and is more robust. It's the default in most software.

T-Test Calculator

One-sample or two-sample (independent) t-test. Enter raw data to get t-statistic, degrees of freedom, and two-tailed p-value.

Results

The T-Test

The t-test is among the most widely used statistical tests. It compares means and accounts for sampling variability using the t-distribution (which has heavier tails than the normal distribution — especially important for small samples).

One-Sample T-Test

Tests whether a sample mean differs from a hypothesized population mean μ₀. Formula: t = (x̄ − μ₀) / (s/√n). The larger |t|, the less likely the sample came from a population with mean μ₀.

Two-Sample (Welch's) T-Test

Tests whether two independent group means are equal. Uses Welch's formula (unequal variance version): t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂). Degrees of freedom are computed via the Welch-Satterthwaite equation.

Worked Example: One-Sample T-Test

A manufacturer claims their bolts have a mean diameter of 10 mm. You measure a sample of 8 bolts: 10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.4, 10.0 mm. Is the true mean significantly different from 10?

x̄ = 81.2/8 = 10.15 mm. Sample std dev s ≈ 0.232. Standard error = 0.232/√8 ≈ 0.082. t = (10.15 − 10)/0.082 ≈ 1.83 with df = 7. For a two-tailed test, p ≈ 0.11. Since p > 0.05, we do not reject the null hypothesis — there is insufficient evidence the true mean differs from 10 mm.

T-Test Assumptions and When to Use Each Type

Test Type	Use When	Key Formula
One-Sample	Comparing sample mean to a known standard	t = (x̄ − μ₀) / (s/√n)
Two-Sample (Welch's)	Comparing two independent group means	t = (x̄₁−x̄₂)/√(s₁²/n₁+s₂²/n₂)
Paired	Same subjects measured twice (before/after)	Run one-sample t-test on differences

Frequently Asked Questions

A t-test tests whether sample mean(s) are consistent with a null hypothesis (H₀). The t-statistic measures how many standard errors the observed mean is from the hypothesized mean. A large |t| gives a small p-value.

The p-value is the probability of observing a t-statistic this extreme (in either direction for two-tailed) if H₀ were true. p < 0.05 is the common threshold for rejecting H₀ and concluding a significant difference exists.

Independent (two-sample): two separate groups, different people. Paired: same subjects measured twice (before/after). For paired data, subtract the pairs first, then run a one-sample t-test on the differences.

Minimum is n=2 per group (df≥1), but for reliable results you generally want n≥10–30. Very small samples have wide confidence intervals and low power to detect real differences.

Statistical power is the probability of correctly rejecting a false null hypothesis. Larger sample sizes increase power — they reduce the standard error, making it easier to detect real differences. A power of 0.80 (80%) is the common target, meaning an 80% chance of detecting a true effect. Use a power analysis to determine the minimum sample size needed before collecting data.

Almost always use a two-tailed test unless you have a strong theoretical reason to predict the direction of the difference before collecting data. One-tailed tests have more power to detect effects in one direction, but using them selectively after seeing the data inflates your false positive rate. Most published research uses two-tailed tests.

Cohen's d measures the practical significance of a t-test result: d = (x̄ − μ₀) / s for one-sample, or d = (x̄₁ − x̄₂) / pooled_s for two-sample. Values of 0.2 are small, 0.5 medium, and 0.8 large. A result can be statistically significant (small p-value) with a tiny effect size if the sample is very large — always report effect size alongside p-values.

Formula sources & accuracy standards: Calculator Methodology · Editorial Policy