Karl Popper, an Austrian philosopher who left Vienna for New Zealand in 1937 ahead of the Nazi annexation of Austria and moved to London after the war, published Logik der Forschung (The Logic of Scientific Discovery) in 1934. It proposed a single criterion for distinguishing science from non-science: a theory is scientific only if it makes claims that could in principle be falsified by empirical observation. Marxism, psychoanalysis, and astrology, theories that in Popper's view could accommodate any observation by reinterpretation, were not scientific by this criterion, however interesting they might be otherwise. Einstein's general relativity, by contrast, made a risky prediction (starlight passing the edge of the sun is deflected by about 1.75 arcseconds) that could have failed and did not, which Popper took as the canonical scientific virtue. Falsifiability became the most-cited demarcation criterion in the philosophy of science.
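The 1.75 arcsecond figure can be reproduced from the standard weak-field formula for a light ray grazing a mass, δ = 4GM/(c²R). The sketch below is only an illustration: the formula and the constants (`GM_SUN`, `C`, `R_SUN`) are standard textbook values assumed here, not taken from the text above, and the function name is made up for the example.

```python
import math

# Assumed standard constants (not given in the text above).
GM_SUN = 1.32712440018e20   # solar gravitational parameter, m^3 / s^2
C = 299_792_458.0           # speed of light, m / s
R_SUN = 6.957e8             # nominal solar radius, m


def grazing_deflection_arcsec(gm=GM_SUN, r=R_SUN, c=C):
    """Weak-field deflection of a light ray grazing a mass, in arcseconds."""
    radians = 4.0 * gm / (c ** 2 * r)        # delta = 4GM / (c^2 R)
    return math.degrees(radians) * 3600.0    # radians -> degrees -> arcseconds


predicted = grazing_deflection_arcsec()
print(f"{predicted:.2f} arcseconds")
```

With these assumed constants the result comes out close to 1.75 arcseconds. A purely Newtonian treatment gives half that value, which is part of what made the relativistic prediction a risky one.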
Popper's core argument rests on an asymmetry: scientific theories are universal claims ("all swans are white", "F = Gm₁m₂/r²"), and no finite number of confirming observations can verify a universal claim, while a single disconfirming observation can falsify it. So a good scientific theory makes risky predictions that could be wrong and survives those tests, while a theory with no observable implications fails to be scientific. Three corollaries follow. First, good theories forbid more: a theory that says "A at time t" rules out almost everything else, and the more a theory forbids, the better it is if it survives. Second, ad hoc rescues that modify a theory to accommodate falsifying observations preserve it at the cost of its scientific status. Third, theories are never verified, only corroborated. Popper's framework was a clean answer to Hume's problem of induction: scientists don't induce, they propose, test, and discard or retain provisionally.
The framework has nonetheless been endlessly contested. Thomas Kuhn (1962) argued that scientists don't actually behave that way: faced with disconfirming evidence they typically protect the dominant paradigm until anomalies accumulate to a crisis, and then shift wholesale. Imre Lakatos refined the picture into research programmes in which a hard core is shielded by a protective belt of auxiliary hypotheses, with progress measured by whether modifications are progressive (predicting new facts) or degenerating (merely patching anomalies). The Duhem-Quine thesis holds that no hypothesis is tested in isolation, so any apparent falsification can be deflected onto auxiliary assumptions or the experimental setup. Modern statistical hypothesis testing (Fisher, Neyman-Pearson) is often read as operationalizing Popperian falsification, since the null hypothesis is what you try to reject, and it is the working version of "try to falsify" in most empirical fields.
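Two ideas from this discussion can be sketched in code: the asymmetry between confirming and refuting a universal claim, and the reject-the-null logic of a significance test. This is a minimal illustration under stated assumptions, not a reconstruction of anyone's method: the function names, the swan data, the 16-of-20 example, and the 0.05 threshold are all hypothetical choices made for the example.

```python
from math import comb


def first_counterexample(observations, predicate):
    """Return the first observation that violates a universal claim, or None."""
    for obs in observations:
        if not predicate(obs):
            return obs
    return None


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value: the total probability, under the
    null hypothesis, of outcomes no more likely than the observed one."""
    pmf = [
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small tolerance so that equally likely outcomes are counted despite rounding.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


def is_white(colour):
    return colour == "white"


# A thousand white swans never verify "all swans are white" ...
swans = ["white"] * 1000 + ["black"]
print(first_counterexample(swans[:1000], is_white))  # None
# ... but a single black swan refutes it.
print(first_counterexample(swans, is_white))         # black

# Hypothetical data: 16 heads in 20 flips, null hypothesis "the coin is fair".
p_value = binomial_two_sided_p(successes=16, trials=20)
ALPHA = 0.05  # conventional threshold, assumed for illustration
print(p_value < ALPHA)  # True
```

The two halves differ in an important way. The black swan refutes the universal claim outright, whereas a small p-value only makes the null hypothesis hard to maintain at a chosen error rate, so the statistical version of "try to falsify" is probabilistic rather than logically conclusive.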
"Is it falsifiable?" has become the standard methodological question, applied (often glibly) to claims in psychology, economics, climate science, vaccines, conspiracy theories, and AI capability claims. The question is useful but blunt. Many genuinely scientific theories require sophisticated machinery to test, and some are out of practical reach for now (string theory's predicted particles lie at energies no foreseeable accelerator can reach). Conversely, many unscientific claims are technically falsifiable but are protected in practice by their adherents' willingness to explain away any contrary result. The replication crisis has put falsification back on the agenda. Pre-registration, which means specifying hypotheses and analysis plans before collecting data, is a prominent methodological response: it commits researchers to a test that can genuinely fail instead of leaving room for post-hoc rationalization.
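The commit-first idea behind pre-registration can be sketched as fixing a fingerprint of the analysis plan before any data exist and checking later that the plan actually used still matches it. The sketch below is a toy illustration only: the plan fields and the `fingerprint` helper are hypothetical, and it does not describe how any particular registry works.

```python
import hashlib
import json


def fingerprint(plan):
    """Return a SHA-256 hex digest of a plan, serialized with sorted keys."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical plan, written down before any data are collected.
registered_plan = {
    "hypothesis": "treatment group mean exceeds control group mean",
    "test": "one-sided Welch t-test",
    "alpha": 0.05,
    "sample_size": 200,
}
registered_digest = fingerprint(registered_plan)  # made public in advance

# Later: the plan as actually run, with the test switched after the fact.
plan_as_run = dict(registered_plan, test="two-sided Welch t-test")

print(fingerprint(registered_plan) == registered_digest)  # True
print(fingerprint(plan_as_run) == registered_digest)      # False
```

Serializing with sorted keys makes the digest independent of the order in which the fields were written, so only a change to the content of the plan changes the fingerprint.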