Lab 08: Data Warehousing

Time: 50 minutes | Level: Architect | DB: PostgreSQL 15


🎯 Objective

Build a star schema data warehouse in PostgreSQL. Implement fact/dimension tables, slowly changing dimensions (SCD Type 1/2/3), GROUP BY ROLLUP/CUBE for OLAP queries, and MATERIALIZED VIEWs for performance.


📚 Background

OLTP vs OLAP

Property
OLTP
OLAP / Data Warehouse

Purpose

Transactional (order, pay)

Analytics (trends, reporting)

Schema

Normalized (3NF)

Denormalized (star/snowflake)

Query type

Simple, point lookups

Complex aggregations

Data volume

Current state

Historical (years)

Row size

Small

Wide (100+ columns)

Optimization

Index on PK/FK

Column-store, partitioning

Examples

PostgreSQL, MySQL

Redshift, BigQuery, Snowflake

Star Schema

         dim_customer

dim_date ─── fact_sales ─── dim_product

         dim_store

Fact table: measurable events (sales, clicks, transactions). Contains foreign keys + metrics. Dimension table: descriptive context (who, what, when, where).

SCD Types

Type
Strategy
History

SCD 1

Overwrite old value

Lost

SCD 2

Add new row with version/date

Full history

SCD 3

Add "previous_value" column

One version back


Step 1: Start PostgreSQL & Create Star Schema

📸 Verified Output:


Step 2: Populate Dimensions

📸 Verified Output:


Step 3: Load Facts & OLAP Queries

📸 Verified Output:


Step 4: GROUP BY ROLLUP and CUBE

📸 Verified Output:

💡 ROLLUP vs CUBE: ROLLUP creates subtotals along a hierarchy (year → category → total). CUBE creates subtotals for ALL combinations of dimensions.


Step 5: Slowly Changing Dimensions (SCD)

📸 Verified Output:


Step 6: Materialized View for Aggregates

📸 Verified Output:


Step 7: ETL vs ELT

📸 Verified Output:


Step 8: Capstone — Warehouse Design Review

📸 Verified Output:


Summary

Concept
Key Takeaway

Star schema

Fact table at center; dimensions as spokes; simple joins

Snowflake schema

Normalized dimensions; more joins but less storage

Fact table

Contains metrics (revenue, qty) + FK references to dimensions

Dimension table

Descriptive context (customer, product, date)

SCD Type 1

Overwrite — simple, no history

SCD Type 2

New row per change — full history; use surrogate keys

SCD Type 3

Previous value column — one version back only

GROUP BY ROLLUP

Hierarchical subtotals (year → month → day)

GROUP BY CUBE

All combinations of dimensions

Materialized View

Pre-computed aggregates; REFRESH to update

ETL

Transform outside warehouse (Spark, Glue)

ELT

Transform inside warehouse (dbt, SQL)

💡 Architect's insight: Star schemas intentionally violate 3NF. Denormalization is a performance feature in analytics: fewer joins = faster aggregations. Use MATERIALIZED VIEW for your most expensive recurring queries.

Last updated