The derivative was discovered twice, almost simultaneously, by men who would spend the rest of their lives accusing each other of theft. Isaac Newton worked it out around 1666 in the plague year, calling it the fluxion — the rate at which a fluent quantity changes. Gottfried Wilhelm Leibniz worked it out independently in the 1670s, called it the differential, and gave it the notation dy/dx and the integral sign ∫ that we still use. The priority dispute that followed poisoned English mathematics for over a century — Newton's notation was used by his loyalists, Leibniz's by everyone on the continent, and the continental mathematicians proceeded to leave the English behind. The notation won the argument, even if Newton won the politics.
The derivative of a function ƒ at a point x measures how fast ƒ is changing at x — geometrically, the slope of the tangent line to the graph at that point. Formally it is the limit of the difference quotient: ƒ′(x) = lim h→0 [ƒ(x+h) − ƒ(x)] / h, when that limit exists. A function that has a derivative at x is differentiable there; differentiability implies continuity but not vice versa (a continuous function can have corners or cusps where no tangent exists). The mechanics of taking derivatives become, with practice, mechanical: the power rule (d/dx of xⁿ is n·xⁿ⁻¹), the product rule, the quotient rule, and — most consequentially — the chain rule, which says that the derivative of a composition ƒ(g(x)) is ƒ′(g(x)) · g′(x). The chain rule is the workhorse of every later application: it is literally the algorithm that backpropagation runs through neural networks. The derivative also gives linear approximation: near any point where ƒ is differentiable, ƒ(x + h) ≈ ƒ(x) + ƒ′(x)·h, the cheapest model of the function and the basis of Newton's method, gradient descent, and Taylor series. The second derivative measures curvature; higher derivatives capture finer local structure. The deep reason every law of physics is a differential equation is that physics is mostly local: forces, fields, fluxes, and rates of reaction all act on infinitesimal neighborhoods, and the derivative is exactly the language for what infinitesimal neighborhoods do.
Derivatives are the foundation of optimization: every gradient-descent algorithm, every neural-network training loop, every solver in operations research is computing derivatives and following them downhill. Backpropagation — the algorithm that trains modern AI models — is the chain rule applied recursively across a computation graph. Automatic differentiation (the technology behind PyTorch, TensorFlow, JAX) is industrialized derivative-taking, fast enough to run on billion-parameter models. Marginal everything in economics is a derivative. Control theory shapes physical systems by manipulating their derivatives. The little limit Newton and Leibniz invented in the 1670s is, three and a half centuries later, the engine that runs most of quantitative civilization.