Linear Transformations

Stretches, rotations, projections: every matrix is a map like these, or a combination of them, acting on space.

The brief

In 1872, the German mathematician Felix Klein, twenty-three years old and newly appointed to a professorship at Erlangen, marked his inauguration with a written programme proposing a radical reorganization of geometry. Geometry, Klein argued, is the study of which properties are invariant under which group of transformations. Euclidean geometry is what survives under rotations, reflections, and translations. Affine geometry is what survives under affine maps, that is, invertible linear maps combined with translations, which keep parallel lines parallel. Projective geometry is what survives under the still larger group of projective transformations. The transformations come first; the geometry is what's left over. The Erlangen Programme came to reorganize much of nineteenth-century geometry, and it elevated transformations themselves, particularly linear ones, into the central object of study.

A linear transformation T: V → W is a function between vector spaces that preserves the vector-space operations: T(au + bv) = a·T(u) + b·T(v) for any vectors u, v in V and any scalars a, b. The condition is mild but powerful. It says the transformation respects both addition and scaling, which is enough to determine T on the entire space from its values on a basis: every vector is a combination of basis vectors, so by linearity its image is the same combination of their images.

This is the deep connection between linear transformations and matrices. If V has dimension n and W has dimension m, then once a basis is chosen for each, every linear transformation between them is captured exactly by an m×n matrix whose j-th column holds the coordinates of the image of V's j-th basis vector. The matrix and the transformation are two views of the same thing, with the caveat that a different choice of basis gives a different matrix for the same transformation. Composition of transformations corresponds to multiplication of matrices, which is why matrix multiplication has the rule it does: applying S and then T is represented by the product of T's matrix with S's matrix, in that order.

Geometric examples in 2D are easy to visualize: rotation by an angle θ about the origin, reflection across a line through the origin, projection onto an axis, scaling by different factors in different directions, and shearing (slanting one axis relative to another). Each is a different matrix, and composing the maps multiplies the matrices.

The kernel of T is the set of vectors that get sent to zero, the directions T destroys. The image is the set of vectors T can produce. For finite-dimensional V, the rank-nullity theorem says these two pieces account for all of V: dim(V) = dim(kernel) + dim(image), one of the cleanest balance equations in mathematics.
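The ideas above can be checked numerically. What follows is a minimal sketch in plain Python (standard library only, to keep it dependency-free), not a definitive implementation. The helper names matrix_from_images, apply, compose, rank and rotation are hypothetical, introduced here for illustration, and the specific vectors and matrices are arbitrary examples. It builds matrices from the images of the standard basis vectors, checks linearity, confirms that composition matches matrix multiplication, and illustrates rank-nullity on a small map from a 3-dimensional space to a 2-dimensional one.

```python
import math
from fractions import Fraction


def matrix_from_images(images):
    """Return the matrix whose j-th column is images[j], the image of basis vector e_j."""
    n_rows = len(images[0])
    return [[img[i] for img in images] for i in range(n_rows)]


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def compose(A, B):
    """Matrix of 'apply B first, then A', which is the product A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def rank(A):
    """Rank by Gaussian elimination over exact fractions (intended for integer entries)."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def rotation(theta):
    """Rotation by theta radians: e1 -> (cos, sin), e2 -> (-sin, cos)."""
    c, s = math.cos(theta), math.sin(theta)
    return matrix_from_images([[c, s], [-s, c]])


# A shear sends e1 to (1, 0) and e2 to (1, 1); projection onto the x-axis
# sends e1 to (1, 0) and e2 to (0, 0).
shear = matrix_from_images([[1, 0], [1, 1]])      # [[1, 1], [0, 1]]
project_x = matrix_from_images([[1, 0], [0, 0]])  # [[1, 0], [0, 0]]

# Linearity: T(a u + b v) equals a T(u) + b T(v).
u, v, a, b = [2, 3], [-1, 4], 3, -2
combo = [a * x + b * y for x, y in zip(u, v)]
lhs = apply(shear, combo)
rhs = [a * x + b * y for x, y in zip(apply(shear, u), apply(shear, v))]

# Composition: shearing and then projecting matches the product project_x * shear.
shear_then_project = compose(project_x, shear)
step_by_step = apply(project_x, apply(shear, u))
in_one_go = apply(shear_then_project, u)

# Two rotations compose to a rotation by the summed angle (up to rounding).
r_sum = compose(rotation(0.3), rotation(0.5))
r_direct = rotation(0.8)

# Rank-nullity for a map from a 3-dimensional space to a 2-dimensional one.
A = [[1, 2, 3], [2, 4, 6]]
kernel_vectors = [[-2, 1, 0], [-3, 0, 1]]  # both are sent to zero by A
dim_image = rank(A)
dim_kernel = rank(kernel_vectors)  # number of independent kernel vectors listed

print(lhs, rhs)
print(step_by_step, in_one_go)
print(dim_image, dim_kernel, len(A[0]))
```

The rank-nullity part only confirms that the two listed kernel vectors are independent and that their count plus the rank equals the dimension of the domain; it does not compute the kernel from scratch. Exact fractions are used in rank so that integer examples are free of rounding error; for floating-point matrices a tolerance-based method would be the safer choice.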

Why now

Much of modern image processing is linear: blurs and sharpens are convolutions, which are linear in the pixel values; rotations are linear maps of pixel coordinates; and basic color correction applies a matrix to each pixel's color vector. 3D rendering pipelines compose linear transformations, written in homogeneous coordinates, to move from world coordinates to camera coordinates to screen pixels. Principal component analysis in statistics is a linear transformation that projects high-dimensional data onto the directions of greatest variance. Quantum mechanics describes physical observables as self-adjoint linear operators on Hilbert space, whose eigenvalues are the possible measured values. Neural-network layers, ignoring their bias terms and nonlinear activations, are linear transformations of feature vectors; the entire field of deep learning is, very loosely, the study of how to compose linear transformations with simple nonlinearities into models that can approximate a very wide class of functions.

Further reading

Strang's Introduction to Linear Algebra (6th ed., 2023) is the standard text, and pairs well with his MIT OCW lectures; together they are among the field's most widely used resources. For an abstract perspective that puts linear maps ahead of matrices, Axler's Linear Algebra Done Right (4th ed., 2024) is the modern alternative. 3Blue1Brown's Essence of Linear Algebra video series is the visual companion many learners report as decisive. For the applied side, including the SVD (the machinery behind PCA) and projections, Trefethen and Bau's Numerical Linear Algebra (1997) remains a standard reference.