Rectangles of numbers had been used to organize calculations for two thousand years before anyone gave them a name. The Chinese Nine Chapters on the Mathematical Art (around 100 BCE) presented systems of linear equations in matrix-like tableaux and solved them by what we now call Gaussian elimination, nearly two millennia before Gauss. Carl Friedrich Gauss himself, in 1809, used the technique without naming the object. The name arrived in 1850, when James Joseph Sylvester called such an array a "matrix"; the abstraction arrived in 1858, when the British mathematician Arthur Cayley, a lawyer by profession and a mathematician by inclination, published A Memoir on the Theory of Matrices. Cayley took the rectangle out of its calculational context and made it a thing in its own right: an object with its own algebra, its own multiplication rule, and its own surprises.
A matrix is a rectangular grid of numbers with m rows and n columns. Matrix addition is component-wise and is defined only when the dimensions agree. Scalar multiplication multiplies every entry by the same number. Matrix multiplication, Cayley's innovation, is the rule that makes matrices interesting: the product AB is defined when A's column count equals B's row count, and (AB)ᵢⱼ is the dot product of A's i-th row with B's j-th column. The rule looks arbitrary until you see its purpose: multiplying matrices corresponds to composing the transformations they represent, so applying B and then A to a vector gives the same result as applying AB once.
Two surprises follow. Matrix multiplication is not commutative: AB generally differs from BA, even when both products are defined. And AB can be the zero matrix even when neither A nor B is, so matrices have zero divisors; for n ≥ 2, the ring of n×n matrices is therefore not a field. The determinant, a single number assigned to each square matrix, records whether the matrix is invertible (det A ≠ 0), and its absolute value is the factor by which the associated transformation scales volumes (a negative sign means orientation is reversed). The inverse A⁻¹ exists exactly when the determinant is nonzero, and then AA⁻¹ = A⁻¹A = I, the identity matrix.
Special families (symmetric, orthogonal, diagonal, triangular, sparse) admit efficient algorithms tailored to their structure. The deeper observation, however, is that a matrix is a recipe for transforming a vector: the product Av acts on v and sends it to a new vector. This is the bridge to linear transformations and the rest of linear algebra.
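To make the multiplication rule and its surprises concrete, here is a minimal sketch in plain Python, assuming matrices are stored as lists of rows with integer entries. The helper names matmul, det2, and inverse2 are hypothetical, chosen only for illustration, and the determinant and inverse are restricted to the 2×2 case, where the closed-form formulas are short. Numerical libraries use more general and more stable methods (such as LU factorization) rather than these formulas.

```python
from fractions import Fraction


def matmul(A, B):
    """Return AB for matrices stored as lists of rows.
    (AB)[i][j] is the dot product of row i of A with column j of B."""
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("A's column count must equal B's row count")
    n_inner, n_cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
            for row in A]


def det2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


def inverse2(M):
    """Inverse of a 2x2 integer matrix via the adjugate formula.
    Fraction keeps the result exact; raises when det(M) == 0,
    which is exactly when no inverse exists."""
    det = det2(M)
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    (a, b), (c, d) = M
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]   # swaps the two coordinates
N = [[0, 1], [0, 0]]   # nonzero, yet N times N is the zero matrix
v = [[1], [2]]         # a column vector, i.e. a 2x1 matrix

print(matmul(A, B))    # [[2, 1], [4, 3]]
print(matmul(B, A))    # [[3, 4], [1, 2]], so AB != BA
print(matmul(N, N))    # [[0, 0], [0, 0]], so N is a zero divisor
# Composition: applying B and then A equals applying AB once.
print(matmul(A, matmul(B, v)) == matmul(matmul(A, B), v))  # True
print(det2(A))         # -2: A reverses orientation and doubles areas
print(matmul(A, inverse2(A)) == [[1, 0], [0, 1]])  # True
```

Because Fraction keeps the inverse exact, the check A·A⁻¹ = I can be an exact equality rather than a floating-point tolerance test.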
Modern machine learning is, structurally, matrix arithmetic at scale: a transformer's parameters are organized into weight matrices, and its forward pass is largely a sequence of products between those weights and the activations flowing through the network; backpropagation differentiates through these multiplications; modern GPUs include dedicated hardware for fast matrix-matrix products. Computer graphics renders 3D scenes by multiplying every vertex, in homogeneous coordinates, through a chain of 4×4 transformation matrices. PageRank, the algorithm that made Google, finds the dominant eigenvector of an enormous matrix built from the web's link graph. Spreadsheets are matrices with formulas attached. The little rectangle that Cayley turned into an object of study in 1858 is now arguably the most-stored, most-multiplied data structure in the history of computing.
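The PageRank claim can be illustrated with power iteration, the simplest way to find a dominant eigenvector: start from any probability vector and keep multiplying it by the matrix until the result stops changing. This is a textbook sketch, not Google's production system. The four-page link graph is hypothetical, every page is assumed to have at least one outgoing link (real implementations must handle pages with none), the helper names google_matrix and dominant_eigenvector are illustrative, and the damping factor d = 0.85 is the value commonly quoted for the original algorithm.

```python
def google_matrix(links, n, d=0.85):
    """Build the n x n column-stochastic matrix G = d*M + (1 - d)/n * (all ones),
    where M[i][j] = 1/outdeg(j) if page j links to page i.
    Assumes every page has at least one outgoing link (no dangling pages)."""
    G = [[(1 - d) / n] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            G[i][j] += d / len(targets)
    return G


def dominant_eigenvector(G, tol=1e-12, max_iter=1000):
    """Power iteration: repeat r <- G r until successive vectors differ by < tol.
    For a column-stochastic G with all-positive entries this converges to the
    eigenvector for eigenvalue 1; starting from a uniform vector keeps sum(r) == 1."""
    n = len(G)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(G[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Hypothetical four-page web: key j lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
G = google_matrix(links, n=4)
ranks = dominant_eigenvector(G)
print([round(x, 3) for x in ranks])
```

On this small graph, page 2, which receives links from every other page, ends up with the highest rank, while page 3, which nothing links to, keeps only the baseline (1 - d)/n = 0.0375. At web scale the link matrix would be stored sparsely rather than as a dense n×n grid, but each step is still one matrix-vector product.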