Mathematics

Probability Distributions

The shape of a random variable — uniform, normal, Poisson, each capturing a different kind of randomness.

Probability theory has a bestiary. Each named distribution is a particular shape of randomness, fitted to a particular kind of process. The binomial counts successes in a fixed number of trials. The Poisson counts arrivals in a window of time. The exponential measures the wait between arrivals. The normal is what averages tend to look like. The power law describes inequality. Each distribution has its own iconography — the bell, the long tail, the thin spike — and learning the bestiary is most of what "having statistical intuition" actually means.

A probability distribution is a complete description of a random variable's behavior: a recipe that says how likely each possible value is. For discrete random variables, the description is a probability mass function p(x) giving the probability of each value, summing to one across all possibilities. For continuous random variables, the description is a probability density function f(x). The density is not the probability of any single value (which is zero); its integral over an interval gives the probability of falling in that interval, and its integral over the whole range is one.

The major families form a kind of periodic table. Uniform distributions describe complete ignorance over a bounded range. Bernoulli describes a single yes/no trial. Binomial generalizes Bernoulli to N independent trials with the same success probability. Geometric counts trials until the first success. Poisson describes rare independent events arriving at a constant average rate (radioactive decays, customer arrivals, mutations per generation). Exponential gives the waiting time between Poisson events. Normal (the bell curve) describes the sum of many small independent contributions; the Central Limit Theorem guarantees its appearance whenever those contributions have finite variance. Gamma and beta generalize exponential and uniform respectively, and serve as conjugate priors in Bayesian inference. Power-law distributions describe phenomena where one big thing dominates many small things (wealth, city sizes, earthquake energies), and their fat tails put far more weight on extreme values than a normal distribution would. Each distribution has parameters that fit it to data; statistical inference is largely the practice of choosing a family and estimating those parameters.
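The definitions above can be checked numerically. The following is a minimal sketch using only the Python standard library, not a reference implementation. The function names, the parameter values (10 trials with p = 0.3, rates of 2.0 and 3.0), the midpoint-rule integrator, and the random seed are all illustrative assumptions. It shows three things: a mass function summing to one, a density integrating to an interval probability, and exponential waits producing Poisson-like counts.

```python
import math
import random
import statistics


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exponential_pdf(x: float, rate: float) -> float:
    """Density of an Exponential(rate) variable at x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def counts_per_window(rate: float, windows: int, rng: random.Random) -> list[int]:
    """Count arrivals in unit-length windows when gaps are Exponential(rate)."""
    counts = [0] * windows
    t = rng.expovariate(rate)  # time of the first arrival
    while t < windows:
        counts[int(t)] += 1  # int(t) is the index of the window containing t
        t += rng.expovariate(rate)  # next arrival after an exponential wait
    return counts


# Discrete case: the mass function sums to one over all possible values.
total_mass = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# Continuous case: the probability of an interval is the area under the density.
rate = 2.0
numeric = integrate(lambda x: exponential_pdf(x, rate), 0.5, 1.5)
exact = math.exp(-rate * 0.5) - math.exp(-rate * 1.5)  # F(1.5) - F(0.5)

# Exponential waits between arrivals give Poisson-like counts per window:
# for a Poisson(rate) count, the mean and the variance both equal rate.
counts = counts_per_window(rate=3.0, windows=20_000, rng=random.Random(0))
mean_count = statistics.fmean(counts)
var_count = statistics.pvariance(counts)

print(f"binomial total mass: {total_mass:.6f}")
print(f"P(0.5 < X < 1.5): numeric={numeric:.6f}, exact={exact:.6f}")
print(f"arrivals per window: mean={mean_count:.3f}, variance={var_count:.3f}")
```

The simulated mean and variance should both land close to the chosen rate. That equality is the signature of a Poisson count, though a simulation like this only suggests the link and does not prove it.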

Why it matters now

Every applied statistical model picks a distribution as its noise model or generative process. Classical linear regression assumes normally distributed errors. Logistic regression assumes Bernoulli outcomes. Survival analysis uses Weibull or exponential distributions. Reinforcement learning uses categorical distributions over discrete actions. Statistical mechanics uses the Boltzmann distribution to describe energy levels in a thermal system. Insurance pricing increasingly uses heavy-tailed distributions because catastrophic risks turn out not to be normally distributed, a lesson the 2008 financial crisis taught the hard way to firms whose risk models had assumed otherwise.
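To make the point about heavy tails concrete, here is a small sketch comparing how fast tail probabilities shrink under a standard normal and under a Pareto (power-law) distribution. The Pareto parameters (scale 1, shape 2) and the thresholds are hypothetical choices for illustration. The two distributions are not matched in mean or variance, so only the rate of decay is meaningful here, not the absolute numbers.

```python
import math


def normal_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto variable with scale x_min and shape alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


thresholds = [2, 5, 10]  # hypothetical sizes of an extreme outcome
for x in thresholds:
    print(f"x={x:>2}: normal tail={normal_tail(x):.3e}, Pareto tail={pareto_tail(x):.3e}")
```

The normal tail falls off faster than exponentially, while the Pareto tail falls off only polynomially. A model that assumes the former will therefore treat events as practically impossible that the latter regards as merely uncommon.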

Further reading

Wasserman's All of Statistics (2004) gives the working data-scientist's compact tour; for deeper mathematical grounding, Casella and Berger's Statistical Inference (2001) is the standard graduate text. Ross's A First Course in Probability (2019) is the gentlest entry. The Poisson, exponential, and gamma families are taught most memorably in Pitman's Probability (1993). For a modern Bayesian framing of the same families, Gelman et al.'s Bayesian Data Analysis (2013) is the reference.
Polymathic — a curated catalogue of the ideas worth keeping across twelve disciplines. polymathic.app