Probability Distributions

The shape of a random variable — uniform, normal, Poisson, each capturing a different kind of randomness.

The brief

Probability theory has a bestiary. Before any creature in it can be named, we need the random variable — a quantity whose value is decided by chance — and its distribution, the full map from outcomes to probabilities. Each named distribution is a particular shape of randomness, fitted to a particular kind of process. The uniform spreads its weight evenly, the picture of pure ignorance. The binomial counts successes in a fixed number of trials. The Poisson counts arrivals in a window of time. The exponential measures the wait between arrivals. The normal is what averages tend to look like, the shape that emerges whenever many small independent effects add up. The power law describes inequality, the world where a few extremes outweigh everything else. Each distribution has its own iconography — the bell, the long tail, the thin spike — and learning the bestiary is most of what "having statistical intuition" actually means.
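The claim that averages tend toward the bell can be checked numerically. The sketch below is an illustration under assumed settings rather than anything prescribed by the brief: it averages 30 uniform draws (an arbitrary choice), repeats that 20,000 times with a fixed seed, and compares the share of averages lying within one theoretical standard deviation of 1/2 against the normal benchmark of roughly 68%.

```python
import random
import statistics


def sample_means(n_terms, n_samples, rng):
    """Return n_samples averages, each taken over n_terms uniform(0, 1) draws."""
    return [
        statistics.fmean([rng.random() for _ in range(n_terms)])
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_terms = 30            # assumed number of small effects per average
means = sample_means(n_terms=n_terms, n_samples=20_000, rng=rng)

# Theory: one uniform(0, 1) draw has mean 1/2 and variance 1/12,
# so an average of n_terms draws has standard deviation sqrt(1/12/n_terms).
mu = 0.5
sigma = (1 / 12 / n_terms) ** 0.5

# A normal distribution puts about 68.3% of its mass within one sigma of the mean.
within_one_sigma = sum(abs(m - mu) <= sigma for m in means) / len(means)

print(f"mean of the averages:   {statistics.fmean(means):.3f}")
print(f"share within one sigma: {within_one_sigma:.3f}")
```

The individual draws are flat, not bell-shaped; only their averages should land near the normal benchmark, which is the point of the exercise.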

A probability distribution is a complete description of a random variable's behavior: a recipe that says how likely each possible value is. For discrete random variables, the description is a probability mass function p(x) giving the probability of each value, summing to one across all possibilities. For continuous random variables, the description is a probability density function f(x). Its value is not the probability of any single point (which is zero) but a density whose integral over an interval gives the probability of falling in that interval. Either way the same information lives in the cumulative distribution function, F(x), the probability of landing at or below x, which rises from zero to one and carries the whole shape.

The major families form a kind of periodic table. Uniform distributions describe complete ignorance over a bounded range. Bernoulli describes a single yes/no trial. Binomial generalizes Bernoulli to the number of successes in N independent trials that share one success probability. Poisson describes rare independent events arriving at a constant rate (radioactive decays, customer arrivals, mutations per generation). Exponential gives the waiting time between Poisson events. Normal (the bell curve) describes the sum of many small independent contributions; the Central Limit Theorem guarantees its appearance whenever those contributions have finite variance. Power-law distributions describe phenomena where one big thing dominates many small things (wealth, city sizes, earthquake energies) and have fat tails that the normal distribution dramatically underestimates.

Each distribution has parameters that fit it to data, and is summarized by its moments: the mean fixing its center, the variance its spread, with higher moments capturing skew and the weight of the tails. Statistical inference is largely the practice of choosing a family from this catalog and estimating the handful of numbers that pin down which member of it the data came from.
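The three descriptions and the first two moments can be made concrete with a small sketch. It uses only the Python standard library; the binomial parameters (10 trials, success probability 0.3) are hypothetical values chosen for illustration. It checks that the mass function sums to one, builds the cumulative distribution as a running total, recovers the mean and variance from the mass function, and, for the continuous case, obtains an interval probability as a difference of CDF values.

```python
import itertools
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k): probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical parameters, for illustration only
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                               # 1 up to rounding error
mean = sum(k * pk for k, pk in enumerate(pmf))  # first moment, n * p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # n * p * (1 - p)

# Discrete CDF: running total of the mass function, F(k) = P(X <= k).
cdf = list(itertools.accumulate(pmf))

# Continuous case: an interval's probability is a difference of CDF values.
z = NormalDist(mu=0.0, sigma=1.0)
p_interval = z.cdf(1.0) - z.cdf(-1.0)  # P(-1 <= Z <= 1)
density_at_zero = z.pdf(0.0)           # a density, not a probability

print(f"total mass {total:.6f}, mean {mean:.3f}, variance {var:.3f}")
print(f"F(3) = {cdf[3]:.4f}, P(-1 <= Z <= 1) = {p_interval:.4f}")
print(f"f(0) = {density_at_zero:.4f}")
```

Note that `density_at_zero` is a height of the curve rather than a probability; only areas under the density, such as `p_interval`, are probabilities.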

Why now

Distributions are the basic vocabulary of uncertainty across science and finance, and every applied statistical model picks one as its noise model or generative process: linear regression assumes normal residuals, logistic regression Bernoulli outcomes. Insurance pricing increasingly uses heavy-tailed distributions because catastrophic risks turn out not to be normally distributed, a lesson the 2008 financial crisis taught the hard way to firms whose risk models had assumed they were. Even modern generative AI is at bottom a distribution-fitting machine, learning to sample plausible text or images from a vast estimated distribution over data. The recurring discipline is the same everywhere: name the right family, summarize it through its moments, then let the data speak by estimating its parameters, so that uncertainty becomes legible enough to compute with.
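That discipline can be sketched end to end on simulated data. Everything in the example is an assumption made for illustration: the waits are drawn from an exponential distribution with a hypothetical true rate of 2 arrivals per unit time, the family is taken as known, and the rate is estimated by matching the first moment (for the exponential, this coincides with the maximum-likelihood estimate). The same waits are then turned into arrival counts per unit window, whose mean and variance should both sit near the rate if the counts are Poisson.

```python
import itertools
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
true_rate = 2.0         # hypothetical arrivals per unit time
waits = [rng.expovariate(true_rate) for _ in range(50_000)]

# Step 1, name the family: exponential waits (known here by construction).
# Step 2, use a moment: the mean wait of an exponential is 1 / rate.
# Step 3, estimate the parameter by inverting the sample mean.
rate_hat = 1.0 / statistics.fmean(waits)

# Turn waits into arrival times, then count arrivals in unit-length windows.
arrival_times = list(itertools.accumulate(waits))
n_windows = int(arrival_times[-1])  # complete windows only
counts = [0] * n_windows
for t in arrival_times:
    window = int(t)
    if window < n_windows:          # drop the incomplete final window
        counts[window] += 1

# A Poisson count has mean and variance both equal to the rate.
count_mean = statistics.fmean(counts)
count_var = statistics.fmean([(c - count_mean) ** 2 for c in counts])

print(f"estimated rate:       {rate_hat:.3f}")
print(f"count mean, variance: {count_mean:.3f}, {count_var:.3f}")
```

With real data the family is not known in advance, so the first step would involve checking the fit rather than assuming it.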

Further reading

Wasserman's All of Statistics (2004) gives the working data-scientist's compact tour; for deeper mathematical grounding, Casella and Berger's Statistical Inference (2001) is the standard graduate text. Ross's A First Course in Probability (2019) is the gentlest entry. The Poisson, exponential, and gamma families are taught most memorably in Pitman's Probability (1993). For a modern Bayesian framing of the same families, Gelman et al.'s Bayesian Data Analysis (2013) is the reference.