Lab 14: Zero-Downtime Migrations

Time: 50 minutes | Level: Architect | DB: PostgreSQL 15, MySQL 8.0


🎯 Objective

Master zero-downtime migration patterns: expand-contract, online schema changes with pt-online-schema-change and pg_repack, blue-green database deployment, and feature flags for data migrations.


📚 Background

Zero-Downtime Migration Patterns

Pattern 1: Expand-Contract

Phase 1 (Expand):   Add new_column (nullable), deploy code that writes both old+new
Phase 2 (Migrate):  Backfill existing rows in batches
Phase 3 (Switch):   Deploy code that reads new_column only
Phase 4 (Contract): Drop old_column (after all code deployed)

Pattern 2: Online Schema Change Tools

  • pt-online-schema-change (Percona): MySQL — shadow table copy + trigger sync

  • gh-ost (GitHub): MySQL — binlog-based, no triggers

  • pg_repack: PostgreSQL — rewrite table without exclusive lock

Pattern 3: Blue-Green


Step 1: Set Up PostgreSQL

📸 Verified Output:


Step 2: Expand Phase — Add Column Safely

📸 Verified Output:


Step 3: Backfill in Batches

📸 Verified Output:


Step 4: Contract Phase — Drop Old Columns

📸 Verified Output:


Step 5: CREATE INDEX CONCURRENTLY

📸 Verified Output:


Step 6: NOT VALID Constraint Pattern

📸 Verified Output:


Step 7: pg_repack (Reclaim Space, No Lock)

📸 Verified Output:


Step 8: Capstone — Migration Runbook Template

📸 Verified Output:


Summary

Concept
Key Takeaway

Expand-Contract

Add nullable → backfill → make NOT NULL → drop old (4 phases)

Batch backfill

UPDATE in batches of 1K-10K rows with sleep to reduce I/O pressure

CREATE INDEX CONCURRENTLY

Builds index without blocking reads/writes

NOT VALID constraint

Add constraint instantly (new rows only); VALIDATE separately

pt-osc

Shadow table + triggers; industry standard for MySQL DDL

gh-ost

Binlog-based MySQL schema change; no triggers; GitHub's tool

pg_repack

Reclaim table bloat without VACUUM FULL (no exclusive lock)

Feature flags

Toggle new code path without redeployment; instant rollback

Blue-green

Two environments; switch traffic at load balancer/DNS

💡 Architect's insight: The most dangerous migration is the one without a tested rollback plan. Every migration should have a documented rollback procedure AND a tested "Phase 5: Contract" timeline. Never drop columns in the same deployment as the business logic change.

Last updated