Big-O notation originated in 1894 with the German mathematician Paul Bachmann, was popularized by Edmund Landau, and was imported into computer science by Donald Knuth in the 1970s. It captures one of the most useful intuitions in algorithm analysis: what matters is not how fast a program runs on a given input, but how its running time grows as the input grows. An O(n²) sort is catastrophically slower than an O(n log n) sort on a million items, regardless of which is faster on ten items. Cost hides in the shape of the curve, not the coefficient.
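To put rough numbers on that comparison, the sketch below evaluates n² and n log₂ n at ten items and at a million items. The function names are illustrative, and the values are idealized step counts with every constant factor taken as 1, not measured running times.

```python
import math

def quadratic_steps(n: int) -> int:
    """Idealized step count for an O(n^2) algorithm (constant factor assumed to be 1)."""
    return n * n

def linearithmic_steps(n: int) -> float:
    """Idealized step count for an O(n log n) algorithm (base-2 log, constant factor 1)."""
    return n * math.log2(n)

for n in (10, 1_000_000):
    q = quadratic_steps(n)
    l = linearithmic_steps(n)
    # The ratio shows how much more work the quadratic curve implies at this size.
    print(f"n={n:>9,}  n^2={q:.3g}  n*log2(n)={l:.3g}  ratio={q / l:,.0f}x")
```

Under these assumptions the quadratic curve implies about three times the work at ten items, a gap that a better constant factor can easily erase, and roughly fifty thousand times the work at a million items, a gap that no realistic constant factor can close.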
Big-O notation classifies algorithms by an asymptotic upper bound on their growth rate: O(1) constant, O(log n) logarithmic, O(n) linear, O(n log n) linearithmic, O(n²) quadratic, O(2ⁿ) exponential. The notation deliberately throws away constant factors and lower-order terms; it asks only how the algorithm scales when the inputs become large. This is usually the right first question for predicting performance on real workloads, where input sizes routinely vary by many orders of magnitude. The discipline trains a particular kind of algorithmic taste: to look for representations that allow logarithmic search rather than linear; to avoid nested loops that produce quadratic time; to use hashing where possible to reduce O(n) lookups to expected O(1); to recognize when a problem is NP-hard, so that no polynomial-time exact algorithm is known and exact solutions should not be expected to scale. The constants ignored by Big-O are not always negligible in practice (cache effects, memory hierarchy, branch prediction, and parallelization can change absolute performance by orders of magnitude), but the asymptotic shape is almost always the right starting point for thinking about which approach will work as data grows.
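The difference between linear search, logarithmic search, and hashing can be made concrete with a small sketch. The helper names below are hypothetical, and the code counts elements examined rather than timing anything, so it shows the shape of the growth and says nothing about constant factors.

```python
def linear_search_steps(items, target):
    """Scan left to right and return how many elements were examined."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Halve the candidate range on every probe and return the number of probes."""
    lo, hi = 0, len(sorted_items)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be to the right of mid
        else:
            hi = mid       # target is at mid or to its left
    return steps

n = 1_000_000
data = list(range(n))      # already sorted, which binary search requires
target = n - 1             # worst case for the left-to-right scan
lookup = set(data)         # hash-based structure for membership tests

print("linear probes:", linear_search_steps(data, target))
print("binary probes:", binary_search_steps(data, target))
print("found via hash set:", target in lookup)
```

With a million sorted integers the linear scan examines every element in this worst case, while the binary search needs at most 20 probes, since 2²⁰ exceeds a million. The set membership test runs in expected constant time, at the price of building the set first and keeping it in memory.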
Big-O is the first thing taught in any algorithms course, the first thing asked in any technical interview, and the first thing forgotten by working programmers who do not exercise the muscle. The current scaling debates — about training large language models, about graph algorithms on social networks, about cryptographic primitives in a post-quantum world, about the energy cost of computation — all run on Big-O reasoning, sometimes explicitly, often implicitly. The simple discipline of thinking in growth rates is one of the more durable cognitive upgrades a software career provides.