Big-O Notation

Cost hides in the shape, not the coefficient.

The brief

Big-O notation originated in 1894 with the German mathematician Paul Bachmann, was popularized by Edmund Landau, and was imported into computer science by Donald Knuth in the 1970s. It captures one of the most useful intuitions in algorithm analysis: what matters is not how fast a program runs on one given input, but how its running time grows as the input grows. An O(n²) sort is catastrophically slower than an O(n log n) sort on a million items, regardless of which is faster on ten items. The behaviour at ten items is a rounding error; the behaviour as the data balloons is destiny. Put numbers to it: for n = 1,000,000, an O(n) single pass takes about a million steps, while an O(n²) double loop takes about a trillion. At a billion simple steps per second, that is the difference between a millisecond and roughly a quarter of an hour, and the gap keeps widening as n grows. Cost hides in the shape of the curve, not the coefficient.
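The million-versus-trillion arithmetic can be checked with a small sketch. The function names and the machine speed of one billion steps per second are assumptions made for illustration, not measurements; the loops are run only at small n, and the n = 1,000,000 case is computed rather than executed.

```python
def single_pass_steps(n: int) -> int:
    """Count the iterations of one loop over n items: grows as O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def double_loop_steps(n: int) -> int:
    """Count the iterations of a loop nested in a loop: grows as O(n^2)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


# Assumed machine speed, for illustration only: one billion simple steps per second.
ASSUMED_STEPS_PER_SECOND = 1_000_000_000


def estimated_seconds(steps: int) -> float:
    """Convert a step count to seconds under the assumed machine speed."""
    return steps / ASSUMED_STEPS_PER_SECOND


if __name__ == "__main__":
    # Small inputs: actually run both loops and count the iterations.
    for n in (10, 100, 1_000):
        print(n, single_pass_steps(n), double_loop_steps(n))

    # n = 1,000,000: compute the counts instead of running a trillion iterations.
    n = 1_000_000
    print("single pass:", estimated_seconds(n), "seconds")
    print("double loop:", estimated_seconds(n * n), "seconds")
```

Under that assumed speed the single pass finishes in about a millisecond and the double loop in roughly seventeen minutes. The exact figures depend on the machine, but the ratio between them (a factor of n) does not.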

Big-O notation classifies algorithms by their asymptotic growth rate: O(1) constant, O(log n) logarithmic, O(n) linear, O(n log n) linearithmic, O(n²) quadratic, O(2ⁿ) exponential. The notation deliberately throws away constant factors and lower-order terms: it asks how the algorithm scales when the inputs become large, because in that limit the dominant term swamps everything else. On the same million-item input, a binary search over sorted data settles in roughly twenty steps (log₂ 1,000,000 ≈ 20) where the linear scan needs up to a million and the quadratic loop a trillion; the gaps between the rungs of the ladder dwarf any constant a careful coder might shave. That is why an expression like 3n² + 7n + 200 is simply written O(n²): for large n the quadratic part outgrows the linear and the constant so completely that keeping them is false precision. This makes Big-O a good first tool for predicting performance on real workloads, where input sizes routinely vary by many orders of magnitude.

The discipline trains a particular kind of algorithmic taste: to look for representations that allow logarithmic search rather than linear; to avoid nested loops that produce quadratic time; to use hashing where possible to reduce O(n) lookups to expected O(1); to recognize when a problem is NP-hard, so that no known exact algorithm will scale. The open P vs NP question is, at bottom, whether a whole class of such problems hides a polynomial algorithm no one has yet found.

The constants ignored by Big-O are not always negligible in practice. Cache effects, memory hierarchy, branch prediction, and parallelization can change the absolute performance by orders of magnitude, and a clean O(n log n) method can lose to a brute-force O(n²) one at small n or under a large hidden constant. Even so, the asymptotic shape is almost always the right starting point for thinking about which approach will work as data grows.
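Two of the claims above lend themselves to a quick check: the step counts of linear versus binary search on a million items, and how completely the quadratic term takes over 3n² + 7n + 200. The following is a minimal sketch, assuming Python 3.9 or later; the function names are invented for illustration, and the binary search assumes its input is sorted in ascending order.

```python
import math


def linear_search_steps(items: list[int], target: int) -> int:
    """Scan left to right; return how many elements were examined."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items: list[int], target: int) -> int:
    """Halve the search range each round; return how many midpoints were examined.

    Assumes sorted_items is in ascending order.
    """
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


def quadratic_share(n: int) -> float:
    """Fraction of 3n^2 + 7n + 200 contributed by the 3n^2 term alone."""
    return (3 * n * n) / (3 * n * n + 7 * n + 200)


if __name__ == "__main__":
    n = 1_000_000
    data = list(range(n))  # already sorted
    target = n - 1         # last element: worst case for the left-to-right scan
    print("linear scan steps:  ", linear_search_steps(data, target))
    print("binary search steps:", binary_search_steps(data, target))
    print("log2(n):            ", math.log2(n))

    # The quadratic term's share of the total approaches 1 as n grows.
    for size in (10, 1_000, 1_000_000):
        print(size, quadratic_share(size))
```

At n = 10 the lower-order terms still make up nearly half of the total, which is one reason small-input behaviour says little about the growth class.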

Why now

Big-O is the first thing taught in any algorithms course, the first thing asked in any technical interview, and the first thing forgotten by working programmers who do not exercise the muscle. The current scaling debates (about training large language models, about graph algorithms on social networks, about cryptographic primitives in a post-quantum world, about the energy cost of computation) all run on Big-O reasoning, sometimes explicitly, often implicitly. The mature version of the discipline knows its own edges: asymptotics can mislead at small n or behind a large constant, which is why a "galactic" algorithm, asymptotically superior but with a constant that runs to the millions, can be useless against the naive method everyone actually ships. The profiler, not the textbook, has the final word on a real machine, and the seasoned engineer reads the growth class first and then measures. Still, the simple discipline of thinking in growth rates is one of the more durable cognitive upgrades a software career provides.
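The point about large constants can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a benchmark: `naive_cost`, `fancy_cost`, and `crossover` are hypothetical names, the constant of one million is chosen only to echo the text, and real crossover points have to be measured on the machine in question.

```python
import math


def naive_cost(n: int) -> float:
    """Hypothetical cost model for a simple quadratic method: n^2 steps."""
    return float(n) * n


def fancy_cost(n: int, constant: float) -> float:
    """Hypothetical cost model for an O(n log n) method with a hidden constant."""
    return constant * n * math.log2(n)


def crossover(constant: float) -> int:
    """Smallest n above 4 at which the O(n log n) model stops being more costly.

    This sketch assumes constant > 2, so the O(n log n) model is still the
    more costly one at n = 4; n / log2(n) increases from there on, so the
    comparison flips exactly once and a binary search can find the flip.
    """
    if constant <= 2:
        raise ValueError("this sketch assumes constant > 2")
    lo, hi = 4, 8
    # Double hi until the quadratic model is at least as costly.
    while fancy_cost(hi, constant) > naive_cost(hi):
        lo, hi = hi, hi * 2
    # Invariant: the fancy model is more costly at lo, and not more costly at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fancy_cost(mid, constant) > naive_cost(mid):
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for constant in (10, 1_000, 1_000_000):
        print(constant, crossover(constant))
```

With a constant of one million, the model puts the crossover in the tens of millions of items: below that size the quadratic method is the cheaper one, despite its worse growth class.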