Table Normalization Basics in SQL

Mar 07, 2026

Normalization structures tables so one fact lives in one place, which helps inserts, updates, and deletes avoid conflicting copies of the same data. It starts with spotting repeated groups and relationships that are hiding inside wide rows, then splitting that data into tables that match how the rows relate to each other, so changes stay consistent without side effects.

Why Normalization Exists

In short, normalization keeps facts in one place so the database does not store conflicting copies of the same information. Tables start drifting when one row tries to represent more than one thing at the same time, such as an order, the items on that order, and customer details. That drift usually appears in two ways repeated groups that grow sideways into extra columns, and anomalies where routine edits create mismatched data.

Repeated Groups Show Up as Extra Columns or Mixed Lists

Wide rows can hide repeated groups when the same set of fields repeats side by side, like item1_*, item2_*, and so on. Another form shows up when a table stores a list inside one column, like SKUs joined by commas. Both styles hide the same relationship. One order can contain multiple items, yet the table is laid out like an order can only hold a fixed number of items.

An early draft order table looks like this because it’s normal to keep everything together in one record:

CREATE TABLE order_sheet (
  order_id        BIGINT PRIMARY KEY,
  order_date      DATE NOT NULL,
  customer_id     BIGINT NOT NULL,
  customer_name   VARCHAR(255) NOT NULL,
  customer_phone  VARCHAR(32),
  item1_sku       VARCHAR(50),
  item1_name      VARCHAR(255),
  item1_price     NUMERIC(10,2),
  item1_qty       INTEGER,
  item2_sku       VARCHAR(50),
  item2_name      VARCHAR(255),
  item2_price     NUMERIC(10,2),
  item2_qty       INTEGER
);

Pressure shows up quickly. First, a hard ceiling appears on how much one order can hold, so extra items push you toward adding item3_* columns or splitting one order into multiple rows, both of which blur what order_id is supposed to mean. Second, the layout stops matching the questions you want to ask. Searching for orders that contain a product turns into a scan across multiple columns, and the query gets longer every time more item columns are added:

SELECT order_id, order_date
FROM order_sheet
WHERE item1_sku = 'SKU-1007'
   OR item2_sku = 'SKU-1007';

Total calculations run into similar trouble. Once line counts vary, totals turn into repeated expressions with lots of null handling, which is a sign the table is carrying a repeating structure that wants rows instead of columns.

Lists inside one column cause the same problems in a different disguise. One column that holds values like SKU-1007,SKU-2040 cannot be validated with a normal foreign key, cannot be joined safely to a product table, and turns searches into string work. That pushes business rules into ad hoc parsing logic, and different queries start parsing the same column in different ways.

Anomalies Signal Structural Trouble

Problems show up fast when routine changes force edits across more rows than expected, when recording one fact requires inventing unrelated facts, or when deleting one row removes data you did not mean to remove.

Update anomalies appear when one fact is stored in multiple places. In the order_sheet table, customer data is copied into each order row. If a customer changes their phone number, the edit needs to touch every row that repeated that value. Missing one row leaves the database claiming two phone numbers for the same customer.

Two updates that do not hit the same set of rows can create a mismatch fast:

UPDATE order_sheet
SET customer_phone = '715-555-0199'
WHERE customer_id = 501;

UPDATE order_sheet
SET customer_phone = '715-555-0188'
WHERE customer_id = 501
  AND order_date >= DATE '2026-02-01';

After those statements, customer_id 501 has two phone numbers stored in the same table. Nothing in the table structure prevents it, because the layout treats customer details as if they were order details.

Insert anomalies show up when the table layout blocks you from recording something that should stand on its own. If product details only exist inside item1_name, item1_price, and similar columns, adding a new product to a catalog becomes impossible without also creating an order row, because the table has no separate home for product facts. People work around that by inserting placeholder orders or placeholder customers, which pollutes data quality.

Delete anomalies show up when removing one row also removes facts that were not part of the deletion decision. If the only place a product name exists is inside old order rows, deleting old orders can erase the last record of what that product was called. That is a side effect of storing multiple concepts in one row, where each concept should have its own table.

First Normal Form Through Third Normal Form With One Running Example

This part takes the order story and reshapes it step by step so the tables match the relationships in the data. The same order can contain multiple line items, product details belong to a product catalog, and customer details belong to the customer record. Normal forms give names to those separations so you can reason about them and spot trouble early.

First Normal Form Make Rows Atomic

Row structure comes first. Values stay atomic, meaning one value per column, and repeating groups stop living as extra column sets like item1_* and item2_*. Instead of widening the table each time an order has more items, line items become rows in their own table, tied back to the order. The table split below keeps order level facts in orders and item level facts in order_item. line_no gives each item a stable identity inside its order, which matters when the same SKU appears twice for separate lines or when line order matters on invoices.

CREATE TABLE orders (
  order_id     BIGINT PRIMARY KEY,
  order_date   DATE NOT NULL,
  customer_id  BIGINT NOT NULL
);

CREATE TABLE order_item (
  order_id    BIGINT NOT NULL,
  line_no     INTEGER NOT NULL,
  sku         VARCHAR(50) NOT NULL,
  qty         INTEGER NOT NULL,
  unit_price  NUMERIC(10,2) NOT NULL,
  PRIMARY KEY (order_id, line_no),
  FOREIGN KEY (order_id) REFERENCES orders(order_id),
  CHECK (qty > 0),
  CHECK (unit_price >= 0)
);

After that split, any order can carry as many items as needed without altering the schema. Searches also get easier because the SKU lives in one column across all line rows, rather than spread across item1_sku, item2_sku, and future columns.

Data entry also starts to be a bit more natural. Insert the order once, then insert one row per line item:

INSERT INTO orders (order_id, order_date, customer_id)
VALUES (90001, DATE '2026-02-10', 501);

INSERT INTO order_item (order_id, line_no, sku, qty, unit_price)
VALUES
  (90001, 1, 'SKU-1007', 2, 19.99),
  (90001, 2, 'SKU-2040', 1, 8.50),
  (90001, 3, 'SKU-1007', 1, 17.99);

That last line item repeats the same SKU on a separate line with a different unit_price. That can happen with discounts, price changes, or partial returns. A wide table with one SKU slot per column set struggles with that case because the repeated group is trying to represent a list, not a fixed set of columns.

Second Normal Form Remove Partial Dependencies From Composite Identifiers

Composite identifiers deserve extra attention. order_item identifies a row by (order_id, line_no), which means any non identifying column in that table should depend on the full pair, not only part of it. Quantities and invoice prices belong to that line within that order, so they fit naturally.

Order level values like order_date and customer_id depend on order_id alone, so they stay in orders instead of being repeated on every order_item row identified by (order_id, line_no). That keeps one edit from turning into many edits, and it prevents line rows from disagreeing about the same order.

After that, you can split catalog facts into product so the name and current list_price live with the sku, while order_item keeps the invoice unit_price that reflects what was charged on that order.

CREATE TABLE product (
  sku         VARCHAR(50) PRIMARY KEY,
  name        VARCHAR(255) NOT NULL,
  list_price  NUMERIC(10,2) NOT NULL,
  CHECK (list_price >= 0)
);

ALTER TABLE order_item
  ADD CONSTRAINT order_item_product_fk
  FOREIGN KEY (sku) REFERENCES product(sku);

order_item can keep sku, qty, and the invoice unit_price, while the current catalog name comes from joining to product when needed. Put a foreign key on order_item.sku so the database can validate that every line item references a real catalog row. That division also supports a common business rule. list_price can change over time, while unit_price on the order line reflects what was charged on that date.

Third Normal Form Remove Transitive Dependencies

Transitive dependencies show up when a non identifying column depends on another non identifying column. Customer attributes in an order table are the classic case. When orders stores customer_id, any customer name or phone stored in orders depends on customer_id, not on order_id. That creates duplicated customer facts across order rows.

Customer attributes belong in a customer table, with orders referencing it:

CREATE TABLE customer (
  customer_id  BIGINT PRIMARY KEY,
  name         VARCHAR(255) NOT NULL,
  phone        VARCHAR(32)
);

ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customer(customer_id);

With that structure, changing a phone number becomes one update in one place, and all orders that reference the customer still point at the current contact data. Joins in reports pull the related rows back into one result set so you can view an order with its customer details in a single query:

SELECT
  o.order_id,
  o.order_date,
  c.name AS customer_name,
  c.phone AS customer_phone
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2026-01-01'
ORDER BY o.order_date, o.order_id;

Some data looks like customer data but is actually order data. Shipping address is a good example. Customers can have multiple addresses, orders can ship to a one time destination, and historical invoices usually need the exact address that was used at the time. Normalization does not force one choice there. It pushes you to decide what the column represents. If the value represents the customer profile, store it with the customer. If the value represents what happened on the order, store it with the order.

Spotting Junction Table Needs

Junction tables come up when two entity tables relate in a many to many way. Promotions and orders fit that structure well, because an order can have multiple promotions applied, and a promotion can apply to multiple orders. Repeated columns like promo_code1 and promo_code2 run into the same repeated group problem as item columns, and a comma separated list runs into the same validation and query problems as any list stored in a single column.

With a junction table, the relationship is stored as rows instead of being packed into columns or text:

CREATE TABLE promotion (
  promo_code  VARCHAR(50) PRIMARY KEY,
  description VARCHAR(255) NOT NULL
);

CREATE TABLE order_promotion (
  order_id    BIGINT NOT NULL,
  promo_code  VARCHAR(50) NOT NULL,
  PRIMARY KEY (order_id, promo_code),
  FOREIGN KEY (order_id) REFERENCES orders(order_id),
  FOREIGN KEY (promo_code) REFERENCES promotion(promo_code)
);

That structure supports questions that are awkward in a wide column layout, such as finding all orders that used a given promotion or counting promotion usage over a date range. It also supports validation. Foreign keys can confirm that every promo_code recorded on an order exists in the promotion table, and the composite primary identifier prevents duplicate application of the same promotion to the same order unless you intentionally model repeated application as its own concept.

Data entry follows the same rhythm as line items. Insert promotions into promotion, then record pairings in order_promotion as they happen on orders:

INSERT INTO promotion (promo_code, description)
VALUES
  ('WELCOME10', 'New customer discount'),
  ('FREESHIP', 'Free shipping promotion');

INSERT INTO order_promotion (order_id, promo_code)
VALUES
  (90001, 'WELCOME10'),
  (90001, 'FREESHIP');

After the relationship is stored as rows, the database can answer relationship questions through joins and filtering, rather than through column scanning or string parsing.

Conclusion

Normalization works by turning hidden repetition into explicit relationships that the database can store and join reliably. First normal form moves repeating groups out of wide rows so line items become their own rows tied back to an order. Second normal form keeps order level facts off line item rows when the line item table is identified by a multi column identifier, so values like order_date and customer_id stay on the single order row in orders instead of being repeated across lines. Third normal form separates customer facts from order facts, so customer attributes change in one place while orders keep the history of order facts. If you need historical customer name or phone as it was at purchase time, store a snapshot on the order or in a separate history table. When the relationship is many to many, the junction table keeps each pairing in its own row, so constraints can validate the data and queries stay join based instead of scanning extra columns or splitting strings.

PostgreSQL Documentation on CREATE TABLE
PostgreSQL Documentation on Constraints
PostgreSQL Documentation on Foreign Keys
SQL Server Documentation on CREATE TABLE
MySQL Documentation on InnoDB Foreign Keys
SQLite Documentation on Foreign Key Support

Share Alexander Obregon's Substack

Alexander Obregon's Substack

Discussion about this post

Ready for more?