By 1969, Edgar F. Codd, a British computer scientist at IBM's San Jose Research Laboratory, had spent years watching application programmers fight pointer navigation through hierarchical and network databases (IBM's IMS and the CODASYL standard), where a change in physical storage could silently break working code. His response was an eleven-page paper in Communications of the ACM in June 1970, "A Relational Model of Data for Large Shared Data Banks": data represented as relations (sets of tuples, pictured as tables of rows and columns), queries written declaratively, and the access plan delegated to the engine, a job that fell to the query optimizer. Logical structure was decoupled from physical storage. IBM was initially unenthusiastic; Codd pushed the work through against institutional resistance and received the 1981 Turing Award.
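The decoupling of logical structure from physical storage can be seen in miniature with Python's built-in sqlite3 module. The sketch below is only an illustration: the `employee` table, its rows, and the index name `idx_employee_dept` are invented for the example, and `EXPLAIN QUERY PLAN` is SQLite-specific, with wording that varies between versions. The same query text is run before and after a purely physical change (adding an index); the answer stays the same while the engine is free to pick a different access path.

```python
import sqlite3

# The query names WHAT is wanted, not HOW to find it.
QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"


def make_db():
    """Build a small in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (name, dept) VALUES (?, ?)",
        [("Ada", "research"), ("Grace", "research"), ("Linus", "ops")],
    )
    conn.commit()
    return conn


def run(conn):
    """Return the names the query selects for the 'research' department."""
    return [row[0] for row in conn.execute(QUERY, ("research",))]


def plan(conn):
    """Return the engine's chosen access plan as text.

    The human-readable step is the last column of each plan row.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("research",)).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = make_db()
before_rows, before_plan = run(conn), plan(conn)

# Change the physical design only: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after_rows, after_plan = run(conn), plan(conn)

print("same answer:", before_rows == after_rows)
print("plan before:", before_plan)
print("plan after: ", after_plan)
```

In a pointer-navigation system the equivalent storage change would typically have meant rewriting the traversal code; here only the plan chosen by the engine differs.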
The relational model rests on a small set of abstractions. A relation is a set of tuples conforming to a schema; a primary key uniquely identifies each row, and a foreign key references a primary key in another table. The relational algebra (selection, projection, join, union, difference, intersection, Cartesian product, rename) is the formal core. Codd's normal forms (first through Boyce-Codd) reduce data redundancy and the update anomalies it causes. SQL, developed at IBM by Donald Chamberlin and Raymond Boyce in the mid-1970s and standardized by ANSI in 1986, is the dominant declarative query language for relational databases; the query optimizer, which converts a SQL statement into an execution plan using table statistics, cardinality estimation, and join-order enumeration, is one of the deepest pieces of practical computer science. ACID transactions (Atomicity, Consistency, Isolation, Durability) are the correctness guarantees: a transaction commits or rolls back in full, it takes the database from one valid state to another, concurrent transactions behave as if run serially, and committed effects survive crashes. Jim Gray's 1981 paper "The Transaction Concept" synthesized the framework; Gray won the 1998 Turing Award. The CAP theorem (Eric Brewer's 2000 conjecture, proved by Gilbert and Lynch in 2002) showed that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance; since network partitions cannot be ruled out, real systems must choose between C and A whenever a partition occurs. The NoSQL movement of the late 2000s (MongoDB, Redis, Cassandra, DynamoDB) arose for web-scale workloads where the cost of full ACID guarantees across many machines was judged too high; the field has since largely returned to SQL through NewSQL systems (Spanner, CockroachDB) that combine horizontal scalability with ACID transactions.
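The key and transaction guarantees described above can be exercised in a few lines with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the `account` and `transfer` tables and the helper names `make_db`, `transfer`, `balances`, and `history` are hypothetical, and SQLite stands in for any relational engine. A transfer that would overdraw an account, or that references a nonexistent account, is rolled back as a whole, and a join reassembles the transfer history from the two tables.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE account (
            id      INTEGER PRIMARY KEY,               -- primary key
            owner   TEXT NOT NULL,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE transfer (
            id      INTEGER PRIMARY KEY,
            from_id INTEGER NOT NULL REFERENCES account(id),  -- foreign key
            to_id   INTEGER NOT NULL REFERENCES account(id),  -- foreign key
            amount  INTEGER NOT NULL
        );
        INSERT INTO account VALUES (1, 'alice', 100), (2, 'bob', 50);
    """)
    return conn


def transfer(conn, from_id, to_id, amount):
    """Move money atomically: either every statement takes effect or none does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
            conn.execute(
                "INSERT INTO transfer (from_id, to_id, amount) VALUES (?, ?, ?)",
                (from_id, to_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # A CHECK or FOREIGN KEY violation undoes the earlier updates as well.
        return False


def balances(conn):
    """Return {owner: balance} for every account."""
    return dict(conn.execute("SELECT owner, balance FROM account"))


def history(conn):
    """Join transfer to account twice to show owner names instead of ids."""
    return conn.execute("""
        SELECT src.owner, dst.owner, t.amount
        FROM transfer AS t
        JOIN account AS src ON src.id = t.from_id
        JOIN account AS dst ON dst.id = t.to_id
        ORDER BY t.id
    """).fetchall()


conn = make_db()
print(transfer(conn, 1, 2, 30), balances(conn))   # valid transfer
print(transfer(conn, 1, 2, 500), balances(conn))  # would overdraw: rolled back
print(transfer(conn, 1, 99, 10), balances(conn))  # no account 99: rolled back
print(history(conn))
```

The credit is deliberately applied before the debit so that a failed debit leaves a partial update to undo; after the rollback the balances are exactly what they were before the attempt.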
Most of the world's application data is stored in relational databases; the SQL market is roughly $60 billion annually as of 2024, with PostgreSQL the default open-source choice and Oracle, SQL Server, and Snowflake dominating the commercial high end. Analytical workloads run on columnar OLAP systems (Snowflake, BigQuery, ClickHouse, DuckDB) queried via SQL, in most cases on a distributed engine (DuckDB, by contrast, runs in-process). Vector stores (Pinecone, Milvus, and the pgvector extension for PostgreSQL) came to prominence after 2021 to hold the high-dimensional embeddings on which retrieval-augmented generation runs. SQLite, D. Richard Hipp's embedded library first released in 2000, is by some measures the most widely deployed software in history, present in every iOS and Android device. Natural-language-to-SQL interfaces are starting to call into question whether SQL syntax will remain the lingua franca, but the relational model itself remains where the engineering depth lives.
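The core operation behind embedding retrieval is nearest-neighbour search over vectors. The sketch below is a brute-force illustration in plain Python, not how any of the named products is implemented: production systems generally rely on approximate indexes to avoid scanning every vector. The three-dimensional "embeddings", the document ids, and the helper names `cosine_similarity` and `top_k` are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    Brute force: every stored vector is scored, then the scores are sorted.
    """
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in items.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical values; real ones have
# hundreds or thousands of dimensions and come from a trained model).
documents = {
    "sql-joins": [0.9, 0.1, 0.0],
    "acid": [0.1, 0.9, 0.1],
    "cooking": [0.0, 0.1, 0.9],
}

query_vector = [0.8, 0.3, 0.0]
nearest = top_k(query_vector, documents, k=2)
print(nearest)
```

With these toy values the query vector lies closest to "sql-joins", then "acid", and far from "cooking"; a retrieval-augmented pipeline would pass the text behind the top matches to the language model.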