Compilers & Interpreters

How text becomes a running program — lex, parse, walk. Every programmer relies on it; few look inside.

The brief

In 1957, an IBM team led by John Backus released FORTRAN and settled an argument most programmers thought unwinnable. Until then, to make a computer fast you wrote its raw instructions by hand; the notion that a machine could translate human-friendly formulas into code as tight as an expert's own struck many as fanciful. Backus's team spent three years proving otherwise, and the result ran close enough to hand-written assembly that the objection simply collapsed. Programming was never the same afterward: you could write what you meant — a mathematical formula — and let a program, the compiler, work out the machine's end of it. Nearly seventy years later, under all the sophistication, every compiler is still doing that one job.

A compiler is best understood as a sequence of translations, each lowering the program one rung closer to the machine. It begins by reading the raw text and grouping the characters into words; then it works out the grammatical structure, much as one diagrams a sentence, turning the flat stream of words into a branching tree that records what nests inside what. From that tree it builds an inner representation deliberately stripped of human conveniences (a plain, regular form that is easy to rearrange), and here the real labor of optimization happens: constants are computed ahead of time, unreachable code is thrown away, repeated calculations are shared, loops are reshaped, all to make the eventual output faster without changing what it does. Only at the very end does it emit the actual instructions for a particular processor.

The old dividing line between compiled and interpreted languages has largely dissolved: most so-called interpreted languages quietly compile to a compact intermediate code first, and the fastest systems watch a program as it runs, notice which stretches are hot, and compile just those to native code on the fly.

Running through all of it is the type system, the compiler's deepest source of power: the body of rules deciding which mistakes it can catch before the program is ever run. A strict type system refuses to build a program that tries to add a number to a piece of text; a permissive one lets that error wait until the code is live and a user is watching it fail. That single choice, far more than raw speed, is what gives one language its cautious character and another its freewheeling one.
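To make the stages concrete, here is a minimal sketch of the pipeline in Python, assuming a toy language of integers, variable names, `+`, `*` and parentheses. Everything in it is hypothetical and chosen for illustration: the function names (`lex`, `parse`, `fold`, `emit`, `run`, `compile_source`), the tuple-shaped tree, and the three-instruction stack machine that stands in for a real processor. Of the optimizations mentioned above, it shows only the simplest one, computing constants ahead of time.

```python
import re

# Toy grammar (an assumption for this sketch): integers, names, + and *, parentheses.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def lex(text):
    """Group raw characters into (kind, value) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("name", name))
        elif op in "+*()":
            tokens.append(("op", op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    return tokens

def parse(tokens):
    """Turn the flat token list into a nested tuple tree; * binds tighter than +."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        kind, value = advance()
        if kind in ("num", "name"):
            return (kind, value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def binary(operand, symbol):
        node = operand()
        while peek() == ("op", symbol):
            advance()
            node = (symbol, node, operand())
        return node

    def term():
        return binary(atom, "*")

    def expr():
        return binary(term, "+")

    tree = expr()
    if peek()[0] != "end":
        raise SyntaxError("trailing input")
    return tree

def fold(node):
    """Constant folding: replace an operator node with a number when both operands are numbers."""
    if node[0] in ("num", "name"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", left[1] + right[1] if op == "+" else left[1] * right[1])
    return (op, left, right)

def emit(node, code=None):
    """Lower the tree to instructions for a hypothetical stack machine."""
    code = [] if code is None else code
    if node[0] == "num":
        code.append(("PUSH", node[1]))
    elif node[0] == "name":
        code.append(("LOAD", node[1]))
    else:
        emit(node[1], code)
        emit(node[2], code)
        code.append(("ADD",) if node[0] == "+" else ("MUL",))
    return code

def run(code, env):
    """Execute the stack code; env supplies values for variable names."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()

def compile_source(text):
    """Chain the stages: text, tokens, tree, folded tree, instructions."""
    return emit(fold(parse(lex(text))))
```

With these definitions, `compile_source("2 * 3 + x")` produces `[("PUSH", 6), ("LOAD", "x"), ("ADD",)]`: the multiplication has been done at compile time and never appears in the output. Running that code with `run(..., {"x": 4})` gives `10`. The `run` loop is also a small picture of what a bytecode interpreter does with the compact intermediate code mentioned above.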
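The difference between catching a type error before the program runs and letting it wait can be sketched the same way. The example below assumes a hypothetical mini-language whose trees are `("num", 3)`, `("str", "hi")`, or `("+", left, right)`; the names `check` and `evaluate` are illustrative, not taken from any real compiler.

```python
def check(node):
    """Strict path: work out each node's type and reject a mismatch before anything runs."""
    kind = node[0]
    if kind in ("num", "str"):
        return kind
    left, right = check(node[1]), check(node[2])
    if left != right:
        raise TypeError(f"cannot add {left} to {right}")
    return left

def evaluate(node):
    """Permissive path: no check up front, so a mismatch surfaces only during execution."""
    if node[0] in ("num", "str"):
        return node[1]
    return evaluate(node[1]) + evaluate(node[2])

good = ("+", ("num", 1), ("num", 2))
bad = ("+", ("num", 1), ("str", "a"))
```

Calling `check(bad)` raises a `TypeError` without evaluating anything, which is the strict behavior described above. Calling `evaluate(bad)` directly skips the check and fails only when Python itself attempts `1 + "a"`, the point at which, in a live system, a user would be watching.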

Why now

The compiler stack has quietly become the substrate of the AI boom. The open compiler infrastructure LLVM, begun around 2000 as a graduate-school project, now underlies the toolchains of language after language, while a specialized layer of compilers translates the high-level Python of machine-learning models into the dense GPU instructions a training run actually executes, and how well they do it is a large part of why one lab's models train faster than another's. WebAssembly lets code written in almost any language run at near-native speed inside a browser. And a new twist has appeared at the top of the stack: AI assistants now write much of the source code that the traditional compiler then turns into machine instructions, leaving genuinely open the question of whether the human-readable language in the middle will always be needed at all.