In 1718, Abraham de Moivre published The Doctrine of Chances, a textbook for English gentlemen who wanted to win at gambling. Among many other innovations, the book introduced the systematic computation of what de Moivre called the expectation of a wager: the long-run average a player can expect to win or lose per game if the game is repeated indefinitely. The arithmetic was not new, but treating the expectation as the central numerical summary of a random variable was. Three centuries later, nearly every decision-theoretic calculation in economics, statistics, machine learning, and operations research is, at root, an expectation, and the discipline of computing one correctly is most of what it means to think probabilistically.
For a discrete random variable X taking values xᵢ with probabilities pᵢ, the expectation is E[X] = Σ xᵢ · pᵢ, the probability-weighted average of the possible outcomes. For a continuous variable with density ƒ(x), the sum becomes an integral: E[X] = ∫ x · ƒ(x) dx. The number is also called the mean of the distribution, usually written μ. Two properties make expectation more powerful than its definition suggests. First, linearity: E[aX + bY] = a·E[X] + b·E[Y], which holds whether or not X and Y are independent. This is genuinely surprising, because the mean of a sum is the sum of the means even when the variables are tangled together, and it is the workhorse of practical probability calculations. Second, expectation extends naturally to functions of random variables: E[g(X)] = Σ g(xᵢ) · pᵢ, which lets you compute moments (E[X²], E[X³]), the variance (E[(X − μ)²], with μ = E[X]), and many other distributional summaries. The law of large numbers (Jacob Bernoulli, 1713; rigorized by Khinchin and Kolmogorov in the twentieth century) is the rigorous statement that, as the number of independent, identically distributed samples grows, the sample average converges to the expectation, provided that expectation is finite. It is the formal justification for the casino's edge and the mathematical basis of the frequentist interpretation of probability. Conditional expectation, E[X | Y], is a refinement that is itself a random variable (a function of Y), and it underlies most of modern stochastic analysis: martingales, filtrations, and the theory of optimal prediction.
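The definitions above can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: the fair six-sided die, the helper names expectation and sample_mean, and the coefficients a and b are all illustrative assumptions that do not come from the text. It computes E[X], E[X²], and the variance exactly, confirms linearity for two variables that are fully dependent (Y = X²), and simulates a sample average for comparison with the exact mean.

```python
from fractions import Fraction
import random


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete distribution given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Illustrative assumption: X is one roll of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(die)                                  # E[X]   = 7/2
second_moment = expectation(die, lambda x: x ** 2)       # E[X^2] = 91/6
variance = expectation(die, lambda x: (x - mean) ** 2)   # E[(X - mu)^2] = 35/12

# Linearity with dependent variables: Y = X^2 is completely determined by X,
# yet E[aX + bY] still equals a*E[X] + b*E[Y].
a, b = 2, 3
lhs = expectation(die, lambda x: a * x + b * x ** 2)
rhs = a * mean + b * second_moment


def sample_mean(n, seed=0):
    """Average of n simulated independent die rolls (law of large numbers)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n


if __name__ == "__main__":
    print("E[X] =", mean, " E[X^2] =", second_moment, " Var[X] =", variance)
    print("linearity holds:", lhs == rhs)
    for n in (10, 1_000, 100_000):
        print(n, "rolls, sample mean:", sample_mean(n))
```

Using Fraction keeps the exact values free of floating-point rounding, so the linearity check is an exact equality rather than an approximate one. The simulated averages depend on the seed, but they should drift toward 3.5 as n grows.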
Expected utility is the foundational object of economic decision theory. Expected loss is what most machine-learning training loops are built to minimize, in practice by averaging the loss over training data. Under standard no-arbitrage models, the price of a financial derivative is computed as a discounted expectation of its payoff under the risk-neutral measure. Insurance premiums are expectations plus a margin. Reinforcement learning picks actions to maximize expected long-run reward. Markov chains are specified by their transition probabilities, from which expected hitting times and long-run averages are computed. De Moivre's little formula is now baked into nearly every quantitative practice that involves uncertainty, and the gap between intuition about averages and correctly computed expectations is one of the most reliable sources of avoidable error in expert judgment.