Law of Large Numbers

Run enough trials and the average becomes the expectation — the reason casinos always win.

The brief

Run an experiment with random outcomes (a coin flip, a die roll, a draw from any probability distribution with a finite mean) repeatedly, and average the results. As the number of trials grows, the average converges to the expectation. This is the law of large numbers. It was first proved rigorously by Jacob Bernoulli in his posthumous Ars Conjectandi (1713) for the special case of binary outcomes; Khinchin (1929) gave the weak form for general distributions with a finite mean; Kolmogorov (1933) the strong form. The theorem is the conceptual foundation of frequentist statistics: probabilities are observable as long-run frequencies precisely because the law of large numbers makes them so. It is also the reason casinos reliably come out ahead over many bets.
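The convergence described above can be watched directly. The following is a minimal Python sketch, assuming a fair six-sided die (expectation 3.5) and a fixed seed for repeatability; `running_average` and `roll_die` are illustrative names, not part of any library.

```python
import random

def running_average(draw, n_trials, seed=0):
    """Return the sample averages after 1, 2, ..., n_trials draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0.0
    averages = []
    for n in range(1, n_trials + 1):
        total += draw(rng)
        averages.append(total / n)
    return averages

def roll_die(rng):
    """One roll of a fair six-sided die; its expectation is 3.5."""
    return rng.randint(1, 6)

averages = running_average(roll_die, 100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: average = {averages[n - 1]:.4f}")
```

Early averages can sit well away from 3.5; the later ones should cluster increasingly tightly around it, though the exact figures depend on the seed.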

Let X₁, X₂, X₃, … be independent and identically distributed random variables with finite expectation μ = 𝔼[Xᵢ], and let X̄ₙ = (X₁ + … + Xₙ)/n be the sample average. The weak law of large numbers says X̄ₙ converges to μ in probability: for any ε > 0, P(|X̄ₙ − μ| > ε) → 0 as n → ∞. The strong law says X̄ₙ converges to μ almost surely, that is, P(X̄ₙ → μ) = 1. When the Xᵢ also have finite variance σ², the weak law follows from Chebyshev's inequality applied to the sample average: Var(X̄ₙ) = σ²/n, so P(|X̄ₙ − μ| > ε) ≤ σ²/(nε²), which tends to 0 as n grows. Khinchin's version needs only a finite mean, at the cost of a less elementary proof.

What the law does not say: it does not say that the running total settles down. The deviation of the sum, Σ Xᵢ − nμ, keeps fluctuating, with typical size of order √n by the central limit theorem (when the variance is finite); it is the deviation of the average, smaller by a factor of n and so of order 1/√n, that shrinks. The gambler's fallacy treats the law as enforcing balance, as if a string of bad luck were "due to be balanced" by good luck. But each trial is independent, and no mechanism catches up on past results: earlier deviations are not cancelled, only diluted as n grows.

Casinos exploit the law as their business model: each game is a small expected loss for the customer; aggregated over millions of games, the law makes the casino's gross income statistically near-deterministic. Insurance operates on the same principle: individual loss outcomes are unpredictable, but the aggregate loss across a large pool of policyholders is predictable enough to price.

The failure modes are correlated samples (when the Xᵢ are not independent, the law can fail), distributions with no finite mean (the Cauchy distribution is the classical example: the average of n Cauchy draws is exactly as spread out as a single draw), and fat-tailed distributions with a finite mean but infinite or very large variance, where convergence is technically valid but very slow.
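The Cauchy failure mode can be checked numerically. The sketch below is an illustration under stated assumptions, not a standard recipe: it repeats the experiment 400 times for each sample size and reports the interquartile range (IQR) of the resulting sample means. The IQR is used rather than the standard deviation because the Cauchy distribution has no finite mean or variance, so outliers would swamp a standard deviation. The helper names (`sample_mean`, `draw_cauchy`, `iqr_of_means`), the repeat count, and the seed are all hypothetical choices.

```python
import math
import random
import statistics

def sample_mean(draw, n, rng):
    """Average of n independent draws."""
    return sum(draw(rng) for _ in range(n)) / n

def draw_normal(rng):
    """Standard normal draw: finite mean and variance, so the law applies."""
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: no finite mean."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(draw, n, repeats=400, seed=0):
    """Interquartile range of the sample mean across independent repeats."""
    rng = random.Random(seed)
    means = [sample_mean(draw, n, rng) for _ in range(repeats)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {}
for n in (10, 1_000):
    results[n] = (iqr_of_means(draw_normal, n), iqr_of_means(draw_cauchy, n))
    normal_iqr, cauchy_iqr = results[n]
    print(f"n = {n:>5}: IQR of normal means = {normal_iqr:.3f}, "
          f"IQR of Cauchy means = {cauchy_iqr:.3f}")
```

The spread of the normal sample means should shrink roughly tenfold between n = 10 and n = 1,000, in line with the 1/√n scaling, while the spread of the Cauchy sample means should stay about the same: averaging more Cauchy draws does not help.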

Why now

Statistical estimation (means, proportions, regression coefficients, machine-learning training metrics) is justified by the law of large numbers: as the sample grows, sample-based estimates converge to population values. Monte Carlo methods in physics, finance, and engineering rely on the law to make averages over random samples converge to the true integrals. Insurance prices risk on the assumption that aggregated losses are LLN-stable; catastrophe insurance is the difficult tail where the law fails, because losses arrive together rather than independently. Portfolio diversification averages out idiosyncratic risk by the LLN; systemic risk (correlated downturns) is the failure mode that 2008 demonstrated. AI training relies on the LLN at every scale: stochastic gradient descent over mini-batches assumes that batch averages of gradients approximate the true gradient. They do when the examples are drawn i.i.d., and they do so less reliably when the data is correlated.
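The Monte Carlo use of the law can be sketched in a few lines. The integrand here is an assumption chosen because its exact value is known: the integral of √(1 − x²) over [0, 1] equals π/4. The estimate is simply the sample average of the integrand at uniformly random points, so the law of large numbers is what makes it converge. `mc_integral` and `quarter_circle` are illustrative names.

```python
import math
import random

def mc_integral(f, n_samples, seed=0):
    """Estimate the integral of f over [0, 1] as the average of f at uniform random points."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples

def quarter_circle(x):
    """Integrand whose integral over [0, 1] is pi / 4."""
    return math.sqrt(1.0 - x * x)

exact = math.pi / 4
estimates = {n: mc_integral(quarter_circle, n) for n in (100, 10_000, 1_000_000)}
for n, estimate in estimates.items():
    print(f"n = {n:>9}: estimate = {estimate:.5f}, "
          f"abs error = {abs(estimate - exact):.5f}")
```

The error typically shrinks like 1/√n, so each extra decimal digit of accuracy costs about a hundred times more samples. That slow rate is the price of a method that needs nothing from the integrand beyond a finite variance.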

Further reading

Ars Conjectandi (Bernoulli, 1713). Foundations of the Theory of Probability (Kolmogorov, 1933). Probability and Measure (Billingsley, 3rd ed., 1995). Probability: Theory and Examples (Durrett, 5th ed., 2019).