Expectation

The long-run average: the value that the sample mean of a random variable settles toward as the number of independent trials grows.

The brief

In 1718, Abraham de Moivre published The Doctrine of Chances, a textbook for English gentlemen who wanted to win at gambling. The book introduced — among many other innovations — the systematic computation of what de Moivre called the expectation of a wager: the long-run average a player can expect to win or lose per game if the experiment is repeated indefinitely. The arithmetic was not new, but treating the expectation as the central numerical summary of a random variable was. Three centuries later, every decision-theoretic calculation in economics, statistics, machine learning, and operations research is, at root, an expectation — and the discipline of correctly computing one is most of what it means to think probabilistically.

For a discrete random variable X taking values xᵢ with probabilities pᵢ, the expectation is E[X] = Σ xᵢ · pᵢ, the probability-weighted average of the possible outcomes. For a continuous variable with density ƒ(x), the sum becomes an integral: E[X] = ∫ x · ƒ(x) dx. The number is also called the mean of the distribution, usually written μ, and it is defined only when the sum or integral converges absolutely.

Two properties make expectation more powerful than its definition suggests. First, linearity: E[aX + bY] = a·E[X] + b·E[Y], which holds whether or not X and Y are independent, provided both expectations exist. This is genuinely surprising (the mean of a sum is the sum of the means even when the variables are tangled together), and it is the workhorse of practical probability calculations. Second, expectation extends naturally to functions of random variables: E[g(X)] = Σ g(xᵢ) · pᵢ in the discrete case, or ∫ g(x) · ƒ(x) dx in the continuous case. This lets you compute moments (E[X²], E[X³]), the variance (E[(X − μ)²]), and many other distributional summaries without first working out the distribution of g(X).

The law of large numbers (Jacob Bernoulli, 1713; rigorized by Khinchin and Kolmogorov in the twentieth century) is the rigorous statement that, as the number of independent, identically distributed samples grows, the sample average converges to the expectation whenever that expectation is finite. It is the formal basis for the casino's house edge and for the frequentist interpretation of probability.

Conditional expectation, E[X | Y], is a refinement that is itself a random variable (a function of Y). It underlies most of modern stochastic analysis: martingales, filtrations, and the theory of optimal prediction.
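The definitions above can be checked on a small case. The following is a minimal sketch, not a reference implementation: the fair six-sided die, the helper names `expectation` and `conditional_expectation`, the choice Y = X² as a deliberately dependent variable, and the parity grouping are all illustrative assumptions that do not come from the text. Exact fractions are used so that the identities hold without rounding error.

```python
from fractions import Fraction
import random


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


def conditional_expectation(pmf, label):
    """Return ({y: E[X | label(X) = y]}, {y: P(label(X) = y)})."""
    mass, weighted = {}, {}
    for x, p in pmf.items():
        y = label(x)
        mass[y] = mass.get(y, 0) + p
        weighted[y] = weighted.get(y, 0) + x * p
    return {y: weighted[y] / mass[y] for y in mass}, mass


# Illustrative assumption: X is one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Definition: E[X] = sum of x_i * p_i.
mean = expectation(die)                                   # 7/2

# Functions of X: E[g(X)] = sum of g(x_i) * p_i.
second_moment = expectation(die, lambda x: x * x)         # 91/6
variance = expectation(die, lambda x: (x - mean) ** 2)    # 35/12

# Linearity with dependent variables: Y = X^2 is fully determined by X,
# yet E[2X + 3Y] still equals 2*E[X] + 3*E[Y].
lhs = expectation(die, lambda x: 2 * x + 3 * x * x)
rhs = 2 * mean + 3 * second_moment

# Conditional expectation given the parity of the roll (1 = odd, 0 = even).
# Averaging E[X | parity] over the parity distribution recovers E[X].
cond, mass = conditional_expectation(die, lambda x: x % 2)
tower = sum(cond[y] * mass[y] for y in cond)

# Law of large numbers: the average of many simulated rolls should be
# close to E[X]. The seed and sample size are arbitrary choices.
rng = random.Random(0)
n = 200_000
sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n

print("E[X] =", mean)
print("Var(X) =", variance)
print("linearity holds:", lhs == rhs)
print("E[X | parity] =", cond)
print("E[E[X | parity]] =", tower)
print("sample mean over", n, "rolls =", sample_mean)
```

The linearity check is the instructive part: X and X² are as dependent as two variables can be, and the two sides still agree exactly. The simulated average will differ slightly from 7/2 on any finite run; the law of large numbers only says the gap shrinks as n grows.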

Why now

Expected utility is the foundational object of economic decision theory. Expected loss is what most machine-learning training loops minimize, in practice through an empirical average over the training data. Financial derivatives are priced as discounted expectations under a risk-neutral measure. Insurance premiums are expectations plus a margin. Reinforcement learning picks actions to maximize expected long-run reward. Markov chains are defined by their transition probabilities, and the quantities of interest computed from them, such as expected hitting times and expected rewards, are expectations. The little de Moivre formula is now baked into nearly every quantitative practice that involves uncertainty, and the gap between intuition about averages and correctly computed expectations is one of the most reliable sources of avoidable error in expert judgment.

Further reading

Feller (vol. I) and Ross both develop expectation rigorously alongside the distribution families. For the philosophical depth (what an expected value is and when the long run is meaningful), Ian Hacking's The Emergence of Probability (1975) and An Introduction to Probability and Inductive Logic (2001) are essential. Persi Diaconis and Brian Skyrms's Ten Great Ideas about Chance (2018) is a warmer companion. For the financial misuse of expectation under fat tails, Taleb's Fooled by Randomness (2001) remains the popular classic.