On June 21, 1970, a researcher at IBM named Edgar F. Codd published a paper called "A Relational Model of Data for Large Shared Data Banks." At the time, databases stored information in rigid hierarchical trees - to find a piece of data, you had to know exactly which branch it was hanging from. Codd proposed something different: store everything in flat tables connected by shared values. His idea was largely ignored by IBM for years. Within two decades it had become the dominant model for storing structured data on the planet, and it still is today.
You are working with that model every time you open a database. Understanding why it is built this way is not background reading - it is the reason every SQL query you write will make sense instead of feeling like memorizing syntax.
Tables Are Not Spreadsheets
When you first see a database table, it looks like a spreadsheet. Rows, columns, a grid. That resemblance is misleading. A spreadsheet will let you put anything in any cell - a number in one row, a name in the next, a formula in the one after that. A database table refuses this. Every value in a column must conform to a declared type: integer, text, date, boolean. The column price will only accept numbers. The column email will only accept text strings. Try to violate either and the database rejects the entry before it is written.
This rigidity is a feature. It means you can trust what comes back when you query. If you ask for all orders over $100, the database will not surprise you with a row where price contains the word "pending."
Key Point: A database table enforces its own structure through data types and constraints. This means every row in a table has the same shape - you can depend on that shape when writing queries.
Primary Keys: The Unique Identifier
Every table needs a way to point at exactly one row. That is what a primary key does. It is a column - or occasionally a combination of columns - whose value is unique across every row in the table and is never empty.
Think of it like a parcel tracking number. Two packages can have the same sender, the same destination, the same weight, and still be different packages. The tracking number is what distinguishes them. In a users table, the primary key might be a column called user_id that the database automatically increments by one for every new user added. User 1, user 2, user 3. No two users share an ID. No user has a blank ID. The primary key makes every row unambiguous.
Foreign Keys: The Thread That Connects Tables
Codd's insight was that you do not need to store everything about a subject in one place. You store related information in separate tables and connect them by reference.
An orders table does not need to store a customer's name, address, and phone number in every row. It only needs to store the customer's ID - a foreign key that points to the customer's row in a separate customers table. When you need the full picture, you combine the two tables using that shared ID.
This design eliminates repetition. If a customer changes their address, you update one row in customers and every order automatically reflects the new address. If you had copied the address into every order row, you would need to update hundreds of rows and would almost certainly miss some. The relational model makes the correct approach the easy approach.
Referential Integrity: What the Database Enforces for You
When a foreign key relationship exists, the database enforces referential integrity. That means it will not let you create an order for a customer ID that does not exist in the customers table. It also will not let you delete a customer who still has orders attached to their ID, unless you tell the database explicitly what to do in that situation.
This sounds restrictive until you consider the alternative: a database full of orders pointing at customer IDs that were deleted years ago, with no way to know whose orders they were. Referential integrity is the database refusing to let you create that mess in the first place.