Lab 10: Database Design & Normalization

Time: 30 minutes | Level: Foundations | DB: MySQL 8 / PostgreSQL 15

Overview

Learn 1NF, 2NF, and 3NF normalization, understand denormalization trade-offs, ER diagram concepts, and design a fully normalized schema from scratch.


Step 1: Why Normalization?

Normalization is the process of organizing data to reduce redundancy and improve integrity.

Problems with un-normalized data:

  • Insert anomaly: Can't record a professor without a course

  • Update anomaly: Changing a city name requires updating thousands of rows

  • Delete anomaly: Deleting the last student drops course information

-- Un-normalized table (everything in one place)
-- order_details: order_id, customer_name, customer_email, customer_city,
--                product_sku, product_name, product_category, quantity, price

-- Problems:
-- 1. Customer info repeated for every order item
-- 2. Change customer city → update ALL their order rows
-- 3. Delete all items → lose customer record
-- 4. Can't store product without an order

Step 2: First Normal Form (1NF)

1NF requires:

  1. Each cell contains a single (atomic) value

  2. Each row is unique (has a primary key)

  3. No repeating groups or arrays

VIOLATION — Not in 1NF:

FIXED — 1NF:


Step 3: Second Normal Form (2NF)

2NF requires:

  1. Already in 1NF

  2. Every non-key column must depend on the entire primary key (no partial dependencies)

Only applies when you have a composite primary key.

VIOLATION — Not in 2NF:

student_name depends only on student_id → partial dependency! course_name depends only on course_id → partial dependency!

FIXED — 2NF:


Step 4: Third Normal Form (3NF)

3NF requires:

  1. Already in 2NF

  2. No transitive dependencies (non-key column must not depend on another non-key column)

VIOLATION — Not in 3NF:

dept_name → depends on dept_id (not on emp_id)! dept_location → depends on dept_id (not on emp_id)!

This is a transitive dependency: emp_id → dept_id → dept_name

FIXED — 3NF:


Step 5: Build Normalized Schema from Raw Data

Raw un-normalized data:

Normalized design:


Step 6: Query Normalized Schema

📸 Verified Output:


Step 7: Denormalization Trade-offs

Scenario
Normalized
Denormalized

Storage

Less data, no redundancy

More data, some redundancy

Writes

Clean (update in one place)

Complex (update multiple places)

Reads

Requires JOINs

Fast flat reads

Integrity

Enforced by FK

Application must maintain

Use case

OLTP (transactional)

OLAP (analytics), caching layers

When to denormalize:

  • Read-heavy reporting tables (data warehouses)

  • Known performance bottleneck after measuring

  • Cache tables for dashboards

  • When JOIN cost exceeds redundancy cost

💡 Rule of thumb: Start normalized. Denormalize only when you have a measured performance problem that normalization is causing.


Step 8: Capstone — Verify Schema Integrity

📸 Verified Output:

Cleanup:


Summary

Normal Form
Requirement
Fixes

1NF

Atomic values, unique rows, no repeating groups

Multi-valued cells, arrays in columns

2NF

1NF + no partial dependencies on composite PK

Attributes depending on part of PK

3NF

2NF + no transitive dependencies

A→B→C where B is non-key

BCNF

3NF + every determinant is a candidate key

Rare edge cases in 3NF

Concept
Description

Insert anomaly

Can't add data without adding other unrelated data

Update anomaly

Must update same fact in multiple rows

Delete anomaly

Deleting data inadvertently removes other facts

Denormalization

Intentional redundancy for read performance

Next: Lab 11 — Primary Keys, Foreign Keys, and Constraints

Last updated