Visual Perception

Neurons in V1 fire for edges at specific orientations; the cortical hierarchy composes these simple features into complex ones.

The brief

In 1959, two postdoctoral researchers at Johns Hopkins, David Hubel and Torsten Wiesel, pushed a microelectrode into the primary visual cortex (area V1) of an anesthetized cat. They tried spots, dots, and complex patterns, and got almost no response. Then, by accident, the edge of a slide swept across the projector while they were changing slides, and a neuron exploded with activity: it was selective for a moving edge at a specific orientation. In a series of papers over the following decade, Hubel and Wiesel mapped the cortical hierarchy (simple cells responding to oriented edges, complex cells combining them, hypercomplex cells responding to specific configurations) and received the 1981 Nobel Prize in Physiology or Medicine. The hierarchy they described later became an architectural template for convolutional neural networks.

Visual perception begins with photoreceptors in the retina: ~120 million rods (high sensitivity, low resolution, achromatic, mostly peripheral) and ~6 million cones (low sensitivity, high resolution, color, concentrated centrally). The retina itself performs the first stage of processing: bipolar and ganglion cells compute center-surround differences that sharpen edges. The optic nerve carries ~1 million axons per eye to the lateral geniculate nucleus of the thalamus, which relays to primary visual cortex (V1). V1 is retinotopically mapped (the central fovea gets disproportionate cortical real estate) and contains the orientation columns that Hubel and Wiesel described.

From V1, two major processing streams diverge. The dorsal stream (V1 → MT → posterior parietal cortex) is the where/how pathway for motion, spatial location, and action guidance. The ventral stream (V1 → V4 → inferior temporal cortex) is the what pathway for form, color, and object recognition. Higher areas become increasingly invariant and selective: inferotemporal cortex tolerates changes in size and rotation, the fusiform face area responds selectively to faces, and the parahippocampal place area to scenes. Very late in the hierarchy come concept cells (Quian Quiroga 2005), like the Halle Berry cell that fired to photographs of the actress from many angles, and even to her printed name.

The cortical hierarchy is not strictly feedforward. Massive feedback connections run from higher to lower areas and are thought to implement predictive processing: the cortex constantly predicts what it expects to see and propagates prediction errors back up. Visual illusions (the Müller-Lyer illusion, the Kanizsa triangle, the dress) reveal where the cortex's powerful priors go wrong when reconstructing a 3D world from 2D retinal images.
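
The center-surround computation attributed to retinal bipolar and ganglion cells is often idealized as a difference of Gaussians: a narrow excitatory center minus a broad inhibitory surround. The sketch below is a minimal 1D illustration under that assumption, not a model of real retinal circuitry; the function names, kernel widths, and the toy step-edge signal are all hypothetical choices made for the example.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def center_surround_kernel(sigma_center=1.0, sigma_surround=3.0, radius=9):
    """Difference of Gaussians: narrow center minus broad surround.

    Both Gaussians are normalized, so the kernel weights sum to zero
    and uniform illumination produces no response.
    """
    center = gaussian_kernel(sigma_center, radius)
    surround = gaussian_kernel(sigma_surround, radius)
    return [c - s for c, s in zip(center, surround)]

def respond(signal, kernel):
    """Slide the kernel over the signal, repeating the border values."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

# Toy 1D "image": dark on the left, bright on the right, step at index 20.
step_edge = [0.0] * 20 + [1.0] * 20
kernel = center_surround_kernel()
response = respond(step_edge, kernel)

# Print the response around the step; flat regions stay near zero.
for i in range(15, 25):
    print(i, round(response[i], 3))
```

In the flat dark and bright regions the response is essentially zero, while just on either side of the step it swings negative and then positive. That is the sense in which a center-surround difference "sharpens" edges: it discards uniform brightness and signals only local contrast.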

Why now

Convolutional neural networks (CNNs, the technology behind a generation of commercial computer-vision systems, starting with 2012's AlexNet) are biologically inspired by the Hubel-Wiesel hierarchy: early layers compute oriented-edge filters, middle layers compute texture and part features, and late layers compute object-level features. The learned layers of trained CNNs also predict the firing patterns of macaque inferotemporal-cortex neurons better than earlier handcrafted models. Vision transformers (~2020+) are a partial alternative, but they still process images through a stack of layers. Diffusion models (Stable Diffusion, DALL-E, Midjourney) generate images from text by iterative denoising, which loosely inverts the visual hierarchy by going from an abstract description down to pixels; their failure modes (hands, text within images, multi-object scenes) reveal where this approach still has limits. The cortical-hierarchy architecture Hubel and Wiesel mapped in cat V1 has, six decades later, been corroborated across species, methods, and even artificial systems as the basic logic of how the brain, and the brain-modeled machine, sees.
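
The oriented-edge filters attributed above to early CNN layers and to V1 simple cells can be illustrated with a small bank of Gabor-like kernels. This is a hedged sketch: the Gabor form is a common idealization rather than something a trained network is guaranteed to learn, and the names `gabor_kernel`, `filter_response`, and `edge_patch`, along with the kernel size and wavelength, are assumptions made for the example.

```python
import math

def gabor_kernel(theta, size=9, sigma=2.0, wavelength=6.0):
    """Odd-phase Gabor patch tuned to orientation `theta` (radians).

    `theta` is the direction along which brightness changes, so
    theta=0 prefers a vertical edge (dark left, bright right).
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the filter's preferred direction of change.
            u = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.sin(2.0 * math.pi * u / wavelength))
        kernel.append(row)
    return kernel

def filter_response(patch, kernel):
    """Dot product of one image patch with one kernel, followed by a ReLU."""
    total = sum(p * k
                for prow, krow in zip(patch, kernel)
                for p, k in zip(prow, krow))
    return max(total, 0.0)

def edge_patch(theta, size=9):
    """Step edge through the patch center; brightness rises along `theta`."""
    half = size // 2
    return [[1.0 if x * math.cos(theta) + y * math.sin(theta) > 1e-9 else 0.0
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

# A tiny "layer" of four filters at 0, 45, 90, and 135 degrees.
orientations = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
bank = [gabor_kernel(t) for t in orientations]

vertical_edge = edge_patch(0.0)
responses = [filter_response(vertical_edge, k) for k in bank]
preferred = max(range(len(bank)), key=lambda i: responses[i])

for t, r in zip(orientations, responses):
    print(round(math.degrees(t)), round(r, 3))
```

For the vertical edge, the 0-degree filter responds most strongly, the 45-degree filter responds more weakly, and the 90-degree filter is essentially silent. This is the same orientation tuning the text describes for V1 neurons, here produced by nothing more than a fixed kernel and a rectifier.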