Computer Science & AI

Backpropagation

Blame flows backward along the chain rule.

In 1986, a paper by Rumelhart, Hinton, and Williams in Nature gave the most influential treatment of backpropagation, an algorithm for efficiently computing how a neural network's weights should change to reduce its error. The algorithm itself had been worked out independently several times in the 1960s and 70s; the 1986 paper was the cultural moment at which the neural-network research community recognized that the problem of training multi-layer networks had been solved in principle. Forty years later, nearly every modern AI system is trained with backpropagation, and the consequences have rearranged industries.

Backpropagation is the chain rule applied at industrial scale. A neural network is a chain of differentiable transformations from input to output, followed by a loss function that measures how wrong the output is. The chain rule lets you compute the gradient of the loss with respect to every parameter by propagating error signals backward through the network, layer by layer, multiplying local Jacobians. With the gradient in hand, gradient descent (or a stochastic variant) updates the parameters slightly in the direction of lower loss. Repeat for billions of training examples. The genius of the technique is computational efficiency: a forward pass and a backward pass each cost time roughly proportional to the size of the network, so the gradient for every parameter costs only a small constant multiple of one evaluation of the network. Estimating the same gradient by nudging each parameter in turn would need a separate forward pass per parameter. That efficiency is what makes training possible on networks with hundreds of billions of parameters. The neural-net winters were not failures of the algorithm itself. The winter of the 1970s came before backpropagation was widely known, when there was no accepted way to train multi-layer networks; the lull of the 1990s had related but different causes (vanishing gradients in deep networks, insufficient data, insufficient compute) that took years to resolve. The 2012 AlexNet result on ImageNet, which dramatically beat hand-engineered computer-vision pipelines, was the empirical demonstration that deep networks trained with backpropagation could outperform classical methods on hard real-world problems. Nearly everything since, including image generation, voice assistants, AlphaGo, GPT, Claude, and AlphaFold, has been an application or extension of the same paradigm.
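The forward pass, backward pass, and gradient-descent update described above can be sketched on the smallest network that still has a chain to walk back along. This is an illustrative toy, not the 1986 paper's formulation: the two-weight network y = w2 * tanh(w1 * x), the squared-error loss, and the names forward, backward, numeric_grad, and train are all assumptions chosen for the example. The finite-difference check is included only to show that the chain-rule gradient agrees with the brute-force one.

```python
import math


def forward(w1, w2, x, target):
    """Forward pass through y = w2 * tanh(w1 * x) with squared-error loss.

    Returns the loss and the intermediate values the backward pass needs.
    """
    z = w1 * x                      # first layer (pre-activation)
    h = math.tanh(z)                # nonlinearity
    y = w2 * h                      # second layer (output)
    loss = 0.5 * (y - target) ** 2  # how wrong the network is
    return loss, (x, h, y, target)


def backward(w2, cache):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    x, h, y, target = cache
    dloss_dy = y - target                 # d(loss)/dy
    dloss_dw2 = dloss_dy * h              # y = w2 * h, so dy/dw2 = h
    dloss_dh = dloss_dy * w2              # pass the error back through layer 2
    dloss_dz = dloss_dh * (1.0 - h * h)   # derivative of tanh(z) is 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x              # z = w1 * x, so dz/dw1 = x
    return dloss_dw1, dloss_dw2


def numeric_grad(w1, w2, x, target, eps=1e-6):
    """Brute-force gradient by central differences: two forward passes per weight."""
    g1 = (forward(w1 + eps, w2, x, target)[0]
          - forward(w1 - eps, w2, x, target)[0]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, target)[0]
          - forward(w1, w2 - eps, x, target)[0]) / (2 * eps)
    return g1, g2


def train(w1, w2, x, target, lr=0.1, steps=200):
    """Plain gradient descent on a single training example."""
    losses = []
    for _ in range(steps):
        loss, cache = forward(w1, w2, x, target)
        g1, g2 = backward(w2, cache)
        w1 -= lr * g1  # step each weight in the direction of lower loss
        w2 -= lr * g2
        losses.append(loss)
    return w1, w2, losses


if __name__ == "__main__":
    w1, w2, x, target = 0.5, 0.5, 1.5, 0.5
    _, cache = forward(w1, w2, x, target)
    print("backprop gradient:", backward(w2, cache))
    print("numeric gradient: ", numeric_grad(w1, w2, x, target))
    _, _, losses = train(w1, w2, x, target)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

With two weights the brute-force check is cheap. With billions of weights it would need billions of forward passes per update, whereas the backward pass still delivers every gradient in a single sweep. That gap is the efficiency argument in miniature.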

Why it matters now

Backpropagation is arguably the most economically consequential algorithm of the twenty-first century so far. The current frontier of large language models, diffusion models, multimodal systems, and robotics policies is all backpropagation at increasing scale. The biological-plausibility critique (real neurons probably do not implement backprop) remains an active research question in theoretical neuroscience, but the pragmatic AI community treats it as a non-issue: whatever the brain does, backprop works.

Polymathic — a curated catalogue of the ideas worth keeping across twelve disciplines. polymathic.app