Binomial distribution vs Normal distribution - Difference and Comparison

Binomial distribution versus Normal distribution comparison chart
	Binomial distribution	Normal distribution
Introduction (from Wikipedia)	In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.	In probability theory and statistics, the normal (or Gaussian) distribution is a continuous probability distribution characterized by a symmetric bell-shaped curve and fully specified by its mean μ and variance σ².
Type of distribution	Discrete	Continuous
History / Discoverer	Formalized by Jacob Bernoulli in Ars Conjectandi, published posthumously in 1713. The underlying binomial coefficients were studied much earlier (e.g., by Pascal and Pingala).	First derived by Abraham de Moivre in 1733 as the limiting form of the binomial distribution. Later developed independently by Carl Friedrich Gauss (1809, in the context of measurement error) and Pierre-Simon Laplace, who extended it via the central limit theorem.
Notation	B(n, p)	N(μ, σ²)
Parameters	n ∈ N₀ — number of trials; p ∈ [0,1] — success probability in each trial	μ ∈ R — mean (location); σ² > 0 — variance (squared scale)
Support	k ∈ {0, 1, …, n} — number of successes (integer-valued)	x ∈ R — all real numbers (real-valued)
Bounded vs unbounded	Bounded. Values are constrained to the closed integer interval [0, n]: there is a hard floor (zero successes) and a hard ceiling (n successes).	Unbounded. Values extend from −∞ to +∞, although the tail probability becomes vanishingly small far from the mean. A practical consequence: using a normal to model an inherently non-negative quantity (e.g., counts, prices) can yield nonsensical negative values.
Probability function	Probability mass function (PMF): P(X = k) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ. The PMF gives an actual probability: P(X = k) is the probability of exactly k successes.	Probability density function (PDF): f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). Important: f(x) is a density, not a probability; for a continuous distribution P(X = x) = 0 for every x. Probabilities are obtained only by integrating the density over an interval.
Shape	Symmetric bell shape only when p = 0.5. Right-skewed when p < 0.5, left-skewed when p > 0.5. Becomes more bell-shaped as n grows for any fixed p ∈ (0, 1).	Symmetric bell-shaped curve for all parameter values. The shape is the same up to location (μ) and scale (σ) — every normal distribution is a shifted and scaled standard normal.
Mean	np	μ
Median	⌊np⌋ or ⌈np⌉ (and exactly np when np is an integer)	μ (mean, median, and mode all coincide)
Mode	⌊(n + 1)p⌋ in general; when (n + 1)p is an integer and 0 < p < 1, both (n + 1)p and (n + 1)p − 1 are modes	μ
Variance	np(1 − p) — maximized at p = 0.5	σ²
Standard deviation	√(np(1 − p))	σ
Skewness	(1 − 2p) / √(np(1 − p)). Equals 0 only when p = 0.5; positive (right-skewed) when p < 0.5, negative when p > 0.5.	0 (always perfectly symmetric)
Excess kurtosis	(1 − 6p(1 − p)) / (np(1 − p))	0
Typical examples / use cases	Number of heads in n coin flips; number of defective units in a batch of n; number of click-throughs in n ad impressions; pass/fail counts in n independent trials; number of voters preferring a candidate in a sample of n; any count of yes/no outcomes.	Heights, weights, and IQ scores in a population; measurement and instrument errors; residuals in linear regression; aggregate returns in finance (often assumed); biological traits influenced by many small additive factors. Also widely used as an asymptotic (large-sample) distribution.
Related distributions	Reduces to the Bernoulli distribution when n = 1 (the Bernoulli is the special case, not the other way round). Generalized by the multinomial distribution (more than two outcomes per trial). Approximated by the Poisson distribution when n is large and p is small (with λ = np). Approximated by the normal distribution when n is large.	The standard normal N(0, 1) is obtained by the transformation Z = (X − μ)/σ. Sums and averages of independent normals are normal. Closely related to the t-distribution (small samples), chi-squared (sum of squared standard normals), and F-distribution.
Relationship between the two	Approaches the normal distribution as n grows large (the De Moivre–Laplace theorem, a special case of the central limit theorem). Rule of thumb: N(np, np(1 − p)) is a good approximation when both np ≥ 10 and n(1 − p) ≥ 10.	The central limit theorem makes the normal the limiting distribution of standardized sums of i.i.d. random variables with finite variance. As a consequence, the normal arises as the large-parameter approximation to many other distributions, including the binomial.
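
The formulas in the chart can be checked numerically. What follows is a minimal Python sketch, not part of the original comparison: the function names (binom_pmf, normal_pdf, normal_cdf, binom_moments, normal_approx) and the values n = 100, p = 0.3 are illustrative assumptions, and the ±0.5 continuity correction is a standard refinement that the chart itself does not spell out. It uses only the standard library.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), using the PMF from the chart."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x) of N(mu, sigma^2). This is a density, not a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def binom_moments(n: int, p: float) -> dict:
    """Mean, variance, skewness, excess kurtosis of B(n, p); needs 0 < p < 1."""
    q = 1 - p
    var = n * p * q
    return {
        "mean": n * p,
        "variance": var,
        "skewness": (1 - 2 * p) / math.sqrt(var),
        "excess_kurtosis": (1 - 6 * p * q) / var,
    }


def normal_approx(a: int, b: int, n: int, p: float) -> float:
    """Approximate P(a <= X <= b) for X ~ B(n, p) by N(np, np(1 - p)).

    The interval is widened by 0.5 on each side (continuity correction),
    an assumption of this sketch rather than something stated in the chart.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(b + 0.5, mu, sigma) - normal_cdf(a - 0.5, mu, sigma)


# Illustrative (hypothetical) parameters: np = 30 and n(1 - p) = 70, both >= 10.
n, p = 100, 0.3
exact = sum(binom_pmf(k, n, p) for k in range(25, 36))  # exact P(25 <= X <= 35)
approx = normal_approx(25, 35, n, p)

print(f"moments: {binom_moments(n, p)}")
print(f"exact binomial P(25 <= X <= 35) = {exact:.4f}")
print(f"normal approximation            = {approx:.4f}")
```

With these parameters the rule of thumb is satisfied, so the two printed probabilities should agree closely. Lowering n or pushing p toward 0 or 1 (so that np or n(1 − p) drops below 10) is a quick way to see the approximation degrade as the binomial becomes visibly skewed.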

