Life Sciences

AlphaFold & The Protein-Folding Problem

A fifty-year grand challenge solved: protein structure predicted from sequence at near-experimental accuracy.

In July 2021, DeepMind published AlphaFold 2 in Nature. The paper described a neural network that predicts the three-dimensional structure of a protein from its amino-acid sequence alone, with near-experimental accuracy. The protein-folding problem, sometimes called biology's grand challenge, had stood unsolved for more than fifty years. Progress across the biennial CASP benchmarks, held since 1994, had been measured in fractions of an Ångström, and AlphaFold 2 jumped a generation in a single round (CASP14, 2020). DeepMind, working with EMBL-EBI, went on to release predictions for nearly every known protein, over 200 million entries, in a public database. The 2024 Nobel Prize in Chemistry was shared: one half went to David Baker for computational protein design, and the other half jointly to Demis Hassabis and John Jumper for protein structure prediction with AlphaFold.

The protein-folding problem is to predict, from a linear sequence of amino acids alone, the three-dimensional structure the protein adopts in water. It matters because function follows structure: enzymes catalyse through the geometry of their active sites, antibodies recognise antigens through complementary shape, and membrane channels gate ions through conformational change.

The problem was hard for the reason Cyrus Levinthal pointed out in 1969: the conformational space is astronomical, with a hundred-residue protein admitting roughly 10⁶⁰ possible arrangements, yet real proteins find a unique low-energy state in seconds or less. Random search cannot be the answer. Real folding proceeds by cooperative collapse on funnel-shaped energy landscapes, and five decades of theoretical work improved predictions only incrementally, by fractions of an Ångström per biennial CASP round once that benchmark began in 1994.

What AlphaFold 2 changed was how the input is used. A transformer-based network trained on the Protein Data Bank's roughly 170,000 experimentally solved structures was given two things at once: the target sequence, and a multiple-sequence alignment of evolutionarily related sequences from across all of life. The alignment is the key. Residues that co-evolve across homologous proteins, meaning their mutations are correlated across species, tend to be physically close in the folded structure: when one residue of a contacting pair mutates, selection favours a compensating mutation in its partner, which preserves the contact. AlphaFold's Evoformer extracts this evolutionary signal as implicit contact information and feeds it through a structure module that refines three-dimensional coordinates iteratively. On the 2020 CASP14 benchmark the result was a median backbone accuracy of around one Ångström, within experimental error for many proteins and a generation's worth of progress in a single year.
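The co-evolution idea can be illustrated with a much older and simpler statistic than anything inside the Evoformer: mutual information between alignment columns. The sketch below is not AlphaFold's method. It uses a made-up six-sequence alignment (`TOY_MSA`, hypothetical) in which columns 1 and 4 were deliberately constructed to co-vary, and it ignores the corrections for phylogenetic bias and indirect coupling that real covariation analyses need.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment: one row per homologous sequence.
# Columns 1 and 4 are built to co-vary (K pairs with E, R pairs with D).
TOY_MSA = [
    "AKGLEV",
    "ARGLDV",
    "AKGIEV",
    "ARGIDV",
    "AKGLEV",
    "ARGLDV",
]


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """Score every pair of columns and sort from most to least co-varying."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_column_pairs(TOY_MSA)[:3]:
        print(f"columns {i} and {j}: {score:.3f} bits")
```

In this toy alignment the top-ranked pair is columns 1 and 4, the only pair whose residues change together. In a real protein family, high-scoring pairs like this are candidate contacts. AlphaFold 2 learns a far richer version of this signal end to end instead of computing a fixed statistic.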
AlphaFold 3, released in 2024, extended the model with a diffusion-based generative head capable of predicting protein-ligand, protein-DNA, and protein-RNA complexes. Limitations remain: AlphaFold predicts a single most-likely state rather than a conformational ensemble. Even so, a fifty-year-old grand challenge has been transformed into a working tool.
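To see why brute-force search was never a route to that tool, the Levinthal estimate quoted earlier can be reproduced as back-of-envelope arithmetic. The numbers below are assumptions chosen for illustration: four conformations per residue (picked because 4¹⁰⁰ is roughly 10⁶⁰, matching the figure in the text) and a sampling rate of 10¹³ conformations per second. Neither value comes from the source.

```python
from math import log10

RESIDUES = 100              # chain length used in the text
STATES_PER_RESIDUE = 4      # assumption: chosen so 4**100 is about 10**60
SAMPLES_PER_SECOND = 1e13   # assumption: one conformation tried every 0.1 ps
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10


def conformation_count(residues, states_per_residue):
    """Number of chain conformations if each residue picks a state independently."""
    return states_per_residue ** residues


def exhaustive_search_years(n_conformations, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = conformation_count(RESIDUES, STATES_PER_RESIDUE)
    years = exhaustive_search_years(n, SAMPLES_PER_SECOND)
    print(f"conformations: about 10^{log10(n):.1f}")
    print(f"exhaustive search: about 10^{log10(years):.1f} years")
    print(f"ages of the universe: about 10^{log10(years / AGE_OF_UNIVERSE_YEARS):.1f}")
```

Under these assumptions an exhaustive search would take many orders of magnitude longer than the age of the universe, while real proteins fold in seconds or less. That gap is the paradox, and it is why both nature and AlphaFold must be doing something other than enumeration.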

Why it matters now

AlphaFold has reorganised structural biology in roughly three years. Crystallography and cryo-EM remain essential for novel folds and high-resolution active-site geometry, but many structural questions in cell biology, pharmacology, and protein engineering now start from an AlphaFold prediction. Structure-based drug design that previously required years of crystallography can start from a predicted structure on day one. Antibody design uses predicted antibody-antigen complexes to guide therapeutic engineering; vaccine design (notably for the SARS-CoV-2 spike protein) increasingly draws on AlphaFold-derived geometry alongside experimental structures. Synthetic biology uses AlphaFold-validated structures for novel industrial enzymes. The 2024 Nobel Prize recognised that the field's intellectual centre of gravity has shifted toward computational and AI-assisted methods. The open frontiers are protein dynamics, conformational ensembles, membrane proteins, and de novo enzymes for arbitrary chemistry.

Polymathic — a curated catalogue of the ideas worth keeping across twelve disciplines. polymathic.app