Binomial distribution versus Normal distribution comparison chart
Binomial distribution | Normal distribution
Introduction (from Wikipedia)
  Binomial: In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
  Normal: In probability theory and statistics, the normal (or Gaussian) distribution is a continuous probability distribution characterized by a symmetric bell-shaped curve and fully specified by its mean μ and variance σ².
Type of distribution
  Binomial: Discrete
  Normal: Continuous
History / discoverer
  Binomial: Formalized by Jacob Bernoulli in Ars Conjectandi, published posthumously in 1713. The underlying binomial coefficients were studied much earlier (e.g., by Pascal and Pingala).
  Normal: First derived by Abraham de Moivre in 1733 as the limiting form of the binomial distribution. Later developed independently by Carl Friedrich Gauss (1809, in the context of measurement error) and by Pierre-Simon Laplace, who extended it via the central limit theorem.
Notation
  Binomial: B(n, p)
  Normal: N(μ, σ²)
Parameters
  Binomial: n ∈ N₀ (number of trials); p ∈ [0, 1] (success probability in each trial)
  Normal: μ ∈ R (mean, i.e., location); σ² > 0 (variance, i.e., squared scale)
Support
  Binomial: k ∈ {0, 1, …, n}, the number of successes (integer-valued)
  Normal: x ∈ R, all real numbers (real-valued)
Bounded vs. unbounded
  Binomial: Bounded. Values are confined to the integers in the closed interval [0, n]: there is a hard floor (zero successes) and a hard ceiling (n successes).
  Normal: Unbounded. Values extend from −∞ to +∞, although the probability far from the mean becomes vanishingly small. A practical consequence: using a normal to model an inherently non-negative quantity (e.g., counts, prices) can yield nonsensical negative values.
Probability function
  Binomial: Probability mass function (PMF): P(X = k) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ, where C(n, k) = n! / (k!(n − k)!) is the binomial coefficient. The PMF gives an actual probability: P(X = k) is the probability of exactly k successes.
  Normal: Probability density function (PDF): f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). Important: f(x) is a density, not a probability; for a continuous distribution P(X = x) = 0 for every x. Probabilities are obtained only by integrating the density over an interval, e.g., P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.
Shape
  Binomial: Symmetric only when p = 0.5; right-skewed when p < 0.5 and left-skewed when p > 0.5. Becomes increasingly bell-shaped as n grows, for any fixed p ∈ (0, 1).
  Normal: Symmetric and bell-shaped for all parameter values. The shape is the same up to location (μ) and scale (σ): every normal distribution is a shifted and scaled standard normal.
Mean
  Binomial: np
  Normal: μ
Median
  Binomial: ⌊np⌋ or ⌈np⌉. Any median lies in the interval [⌊np⌋, ⌈np⌉], and the median is exactly np when np is an integer.
  Normal: μ (mean, median, and mode all coincide)
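Mode
  Binomial: ⌊(n + 1)p⌋ or ⌊(n + 1)p⌋ − 1. Usually the unique mode is ⌊(n + 1)p⌋; when (n + 1)p is an integer and 0 < p < 1, both (n + 1)p and (n + 1)p − 1 are modes.
  Normal: μ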
Variance
  Binomial: np(1 − p), maximized (for fixed n) at p = 0.5
  Normal: σ²
Standard deviation
  Binomial: √(np(1 − p))
  Normal: σ
Skewness
  Binomial: (1 − 2p) / √(np(1 − p)). Equals 0 only when p = 0.5; positive (right-skewed) when p < 0.5, negative (left-skewed) when p > 0.5.
  Normal: 0 (always perfectly symmetric)
Excess kurtosis
  Binomial: (1 − 6p(1 − p)) / (np(1 − p))
  Normal: 0
Typical examples / use cases
  Binomial: Number of heads in n coin flips; number of defective units in a batch of n; number of click-throughs in n ad impressions; pass/fail counts in n independent trials; number of voters preferring a candidate in a sample of n; in general, any count of yes/no outcomes.
  Normal: Heights, weights, and IQ scores in a population; measurement and instrument errors; residuals in linear regression; aggregate returns in finance (often assumed); biological traits influenced by many small additive factors. Used as the asymptotic distribution …
Related distributions
  Binomial: The Bernoulli distribution is the special case n = 1. Generalized by the multinomial distribution (more than two outcomes per trial). Approximated by the Poisson distribution with λ = np when n is large and p is small. Approximated by the normal distribution when np and n(1 − p) are both large (see "Relationship between the two").
  Normal: The standard normal N(0, 1) is obtained by the transformation Z = (X − μ)/σ. Sums and averages of independent normals are normal. Closely related to the t-distribution (small samples), the chi-squared distribution (sum of squared independent standard normals), and the F-distribution.
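Relationship between the two
  Binomial: For fixed p ∈ (0, 1), approaches the normal distribution as n grows large (the De Moivre–Laplace theorem, a special case of the central limit theorem). Rule of thumb: N(np, np(1 − p)) is a good approximation when both np ≥ 10 and n(1 − p) ≥ 10. When using the normal approximation, apply a continuity correction: P(X ≤ k) ≈ Φ((k + 0.5 − np) / √(np(1 − p))), where Φ is the standard normal CDF.
  Normal: By the central limit theorem, the normal is the limiting distribution of standardized sums of i.i.d. random variables with finite variance. As a consequence, the normal arises as the large-parameter approximation to many other distributions, including the binomial, the Poisson (large λ), and the chi-squared (large degrees of freedom).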
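The "Bounded vs. unbounded" row warns that a normal model assigns probability to negative values. The Python sketch below quantifies that, assuming the normal is matched to the binomial's mean np and variance np(1 − p); the helper names `normal_cdf` and `negative_mass_of_normal_model` are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(x: float, mu: float, sigma2: float) -> float:
    """P(Y <= x) for Y ~ N(mu, sigma2); erfc keeps far lower tails accurate."""
    z = (x - mu) / math.sqrt(sigma2)
    return 0.5 * math.erfc(-z / math.sqrt(2))

def negative_mass_of_normal_model(n: int, p: float) -> float:
    """P(Y < 0) when a count restricted to 0..n is modeled as Y ~ N(np, np(1 - p)).

    Under the binomial itself this probability is exactly 0.
    """
    return normal_cdf(0.0, n * p, n * p * (1 - p))

if __name__ == "__main__":
    for n, p in [(10, 0.1), (30, 0.2), (100, 0.5)]:
        mass = negative_mass_of_normal_model(n, p)
        print(f"n={n:3d}, p={p}: P(Y < 0) under the normal model = {mass:.3g}")
```

The negative mass is substantial when np is small and shrinks rapidly as the mean moves several standard deviations away from zero.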
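To make the PMF/PDF distinction in the "Probability function" row concrete, here is a small standard-library Python sketch. The function names are illustrative assumptions, not any library's API; `normal_cdf` uses the standardization Z = (X − μ)/σ together with the complementary error function.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): a genuine probability."""
    if k < 0 or k > n:
        return 0.0  # outside the bounded support {0, ..., n}
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density f(x) for X ~ N(mu, sigma2); sigma2 is the variance."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x: float, mu: float, sigma2: float) -> float:
    """P(X <= x), computed by standardizing to Z = (X - mu) / sigma."""
    z = (x - mu) / math.sqrt(sigma2)
    return 0.5 * math.erfc(-z / math.sqrt(2))

if __name__ == "__main__":
    n, p = 10, 0.3
    # PMF values are probabilities, so they sum to 1 over the support.
    print("sum of PMF:", sum(binom_pmf(k, n, p) for k in range(n + 1)))
    # A density is not a probability: with a small variance it exceeds 1.
    print("f(0) for N(0, 0.01):", normal_pdf(0.0, 0.0, 0.01))
    # For a continuous variable, probabilities come from intervals (CDF differences).
    print("P(-1 < Z < 1):", normal_cdf(1, 0, 1) - normal_cdf(-1, 0, 1))
```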
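The "Median" and "Mode" rows state bounds rather than a single closed form. A brute-force check in Python can confirm them. It assumes exact rational arithmetic via the standard `fractions` module, which avoids floating-point ties between equal PMF values. The sketch finds the modes and the lower median directly from the PMF; the helper names are hypothetical.

```python
import math
from fractions import Fraction

def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = k) for X ~ B(n, p) using rational arithmetic."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_modes(n: int, p: Fraction) -> list[int]:
    """Every k in {0, ..., n} at which the PMF attains its maximum."""
    probs = [binom_pmf(k, n, p) for k in range(n + 1)]
    top = max(probs)
    return [k for k, q in enumerate(probs) if q == top]

def binom_lower_median(n: int, p: Fraction) -> int:
    """Smallest k with P(X <= k) >= 1/2, which is always a median."""
    cumulative = Fraction(0)
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= Fraction(1, 2):
            return k
    return n  # not reached: the exact PMF sums to 1

if __name__ == "__main__":
    for n, p in [(9, Fraction(1, 2)), (10, Fraction(3, 10)), (4, Fraction(1, 5))]:
        print(f"B({n}, {p}): modes {binom_modes(n, p)}, "
              f"floor((n+1)p) = {math.floor((n + 1) * p)}, "
              f"lower median {binom_lower_median(n, p)}, np = {n * p}")
```

The cases B(9, 1/2) and B(4, 1/5) have (n + 1)p equal to an integer, so they show the two-mode situation.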
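The closed forms in the "Mean" through "Excess kurtosis" rows can be checked numerically. This Python sketch computes the moments by direct summation over the binomial PMF and compares them with the listed formulas. It uses hypothetical helper names and assumes 0 < p < 1, so that the variance is positive.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def moments_by_summation(n: int, p: float) -> tuple[float, float, float, float]:
    """Mean, variance, skewness, excess kurtosis computed directly from the PMF."""
    support = range(n + 1)
    probs = [binom_pmf(k, n, p) for k in support]
    mean = sum(k * q for k, q in zip(support, probs))
    # Central moments E[(X - mean)^r] for r = 2, 3, 4.
    m2, m3, m4 = (sum((k - mean) ** r * q for k, q in zip(support, probs)) for r in (2, 3, 4))
    return mean, m2, m3 / m2**1.5, m4 / m2**2 - 3

def moments_closed_form(n: int, p: float) -> tuple[float, float, float, float]:
    """The formulas from the comparison chart (requires 0 < p < 1)."""
    var = n * p * (1 - p)
    return n * p, var, (1 - 2 * p) / math.sqrt(var), (1 - 6 * p * (1 - p)) / var

if __name__ == "__main__":
    for n, p in [(10, 0.3), (25, 0.5), (40, 0.9)]:
        pairs = zip(moments_by_summation(n, p), moments_closed_form(n, p))
        print(f"B({n}, {p}):", [f"{a:.6f} vs {b:.6f}" for a, b in pairs])
```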
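The "Related distributions" row says the binomial is approximated by a Poisson with λ = np when n is large and p is small. The sketch below holds λ = 3 fixed and reports the largest pointwise PMF gap for a small-n, large-p case and for progressively larger n with smaller p. It uses only the standard library, and the helper names are hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n: int, p: float, k_max: int = 15) -> float:
    """Largest |P_binomial(k) - P_Poisson(k)| over k = 0..k_max, with lambda = n * p."""
    lam = n * p
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(min(k_max, n) + 1))

if __name__ == "__main__":
    for n, p in [(10, 0.3), (100, 0.03), (1000, 0.003)]:
        print(f"n={n:5d}, p={p}: max PMF gap = {max_pmf_gap(n, p):.5f}")
```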
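The rule of thumb and the continuity correction in the "Relationship between the two" row can be tried out directly. The following sketch compares the exact binomial CDF with the normal approximation N(np, np(1 − p)), both with and without the +0.5 correction. The helper names are hypothetical, and the threshold of 10 is the rule of thumb quoted in the chart, not a universal constant.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_cdf(x: float, mu: float, sigma2: float) -> float:
    z = (x - mu) / math.sqrt(sigma2)
    return 0.5 * math.erfc(-z / math.sqrt(2))

def normal_approx_cdf(k: int, n: int, p: float, continuity: bool = True) -> float:
    """Approximate P(X <= k) by N(np, np(1 - p)); the correction evaluates at k + 0.5."""
    x = k + 0.5 if continuity else k
    return normal_cdf(x, n * p, n * p * (1 - p))

def rule_of_thumb_ok(n: int, p: float, threshold: float = 10) -> bool:
    """True when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

if __name__ == "__main__":
    for n, p, k in [(100, 0.3, 25), (100, 0.5, 50), (20, 0.1, 1)]:
        exact = binom_cdf(k, n, p)
        with_cc = normal_approx_cdf(k, n, p)
        without_cc = normal_approx_cdf(k, n, p, continuity=False)
        print(f"B({n}, {p}), P(X <= {k}): exact {exact:.4f}, "
              f"normal+cc {with_cc:.4f}, normal {without_cc:.4f}, "
              f"rule of thumb met: {rule_of_thumb_ok(n, p)}")
```

The third case, B(20, 0.1), fails the rule of thumb (np = 2). It is included to show how far the approximation can drift when the distribution is strongly skewed.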
