In 1967 Philippa Foot, an English philosopher at Oxford and a granddaughter of US President Grover Cleveland, published a paper titled "The Problem of Abortion and the Doctrine of the Double Effect" containing, almost in passing, a brief example: a runaway trolley is heading toward five workmen on the track, and you can divert it onto a spur where it will kill one workman instead. Most people say divert. Judith Jarvis Thomson (1976, 1985) elaborated: now you are on a footbridge above the track beside a large stranger whose body would stop the trolley if you pushed him, and most people refuse. The two cases involve the same arithmetic, one death to save five, but produce opposite intuitions, and the literature that grew from this thought experiment now spans more than fifty years.
The cases line up along a scale of declining approval: the standard trolley (pull the lever to kill one and save five) is mostly endorsed; the footbridge (push a heavy stranger off the bridge to stop the trolley) is mostly refused; the loop variant splits intuitions; and the transplant case (kill a healthy patient to harvest his organs for five dying ones) is almost universally rejected. Simple utilitarian arithmetic treats all four as equivalent: one death against five. Three philosophical attempts to systematize the gap stand out. The first is the doctrine of double effect, derived from medieval scholastic ethics, which says it is permissible to foresee an evil consequence as a side effect of an act aimed at a good end but not to intend the evil as a means (the workman's death is foreseen in the standard trolley, intended in the footbridge). The second is the personal-versus-impersonal distinction (Joshua Greene, 2001 onward), on which the footbridge differs because it requires physically pushing a person; Greene's neuroimaging studies reported that personal dilemmas activate emotion-associated regions, including the medial prefrontal and posterior cingulate cortex, more strongly than impersonal ones, which he takes to support his dual-process account. The third is Thomson's later view that the asymmetries may not track anything morally significant but instead reflect cognitive habits that do not survive scrutiny. The empirical literature (Greene's lab, Fiery Cushman's work, and the Moral Machine experiment, with millions of participants judging autonomous-vehicle analogs) has documented a robust personal/impersonal distinction, real but secondary cultural variation, and strong framing effects. Critics (Barbara Fried, David Edmonds) argue that fictional perfect-information dilemmas poorly model real situations involving uncertainty, repeated interactions, and richer agency; defenders argue that the artifice is the point, since it isolates the moral variables from real-world noise.
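
As a toy illustration of the gap described above, the following Python sketch encodes the four cases with the same body count and a single hypothetical flag, `harm_is_means`, then compares a bare utilitarian count with a crude double-effect rule. The `Case` class, the flag, and both functions are assumptions made for illustration and do not come from the literature. In particular, marking the loop variant as a harm-as-means case follows one common reading and is contested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    deaths_if_act: int
    deaths_if_refrain: int
    # Hypothetical flag: is the one death the *means* by which the five
    # are saved (True), or only a foreseen side effect (False)?
    harm_is_means: bool


# Same arithmetic in every case: one death if you act, five if you do not.
CASES = [
    Case("standard trolley", 1, 5, harm_is_means=False),
    Case("footbridge", 1, 5, harm_is_means=True),
    Case("loop", 1, 5, harm_is_means=True),  # contested classification
    Case("transplant", 1, 5, harm_is_means=True),
]


def utilitarian_permits(case: Case) -> bool:
    """Bare body count: act whenever acting leaves fewer people dead."""
    return case.deaths_if_act < case.deaths_if_refrain


def double_effect_permits(case: Case) -> bool:
    """Crude double-effect rule: the act must do more good than harm,
    and the harm must not be the means to the good."""
    return utilitarian_permits(case) and not case.harm_is_means


def verdicts() -> dict[str, tuple[bool, bool]]:
    """Map each case name to (utilitarian verdict, double-effect verdict)."""
    return {
        c.name: (utilitarian_permits(c), double_effect_permits(c))
        for c in CASES
    }


if __name__ == "__main__":
    for name, (util, dde) in verdicts().items():
        print(f"{name:17s} utilitarian={util!s:5s} double_effect={dde}")
```

Under these assumptions the body count permits all four acts, while the crude rule permits only the standard trolley. Neither function reproduces the split intuitions about the loop case, which is part of why that variant is debated.
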
Autonomous-vehicle ethics has made the trolley problem practically operational: when a self-driving car must choose between hitting one pedestrian and swerving to hit two, the algorithm encodes some answer to a trolley-style dilemma, and the Moral Machine experiment (MIT; launched in 2016, with results published in 2018) collected roughly 40 million decisions from people in 233 countries and territories. The practical engineering question turns out to be less consequential than the political question of who decides the algorithm's values, and AI alignment research increasingly frames in trolley terms the question of how powerful systems should trade off welfare, rights, autonomy, and aggregation. Military ethics, medical triage, and machine-learning fairness all instantiate trolley-shaped reasoning at higher stakes.