Lab 13: Multi-Region Database Architecture

Time: 50 minutes | Level: Architect | DB: PostgreSQL 15 (Logical Replication)


🎯 Objective

Design multi-region database architectures: active-passive vs active-active, conflict resolution, and geo-routing. Implement PostgreSQL logical replication with CREATE PUBLICATION/SUBSCRIPTION. Compare latency trade-offs.


📚 Background

Multi-Region Topologies

Active-Passive:
  Primary (us-east) ─sync─► Standby (eu-west)  [reads always hit primary]
  Failover: manual or semi-automatic (~2-5 min)

Active-Active:
  Primary-A (us-east) ◄──────► Primary-B (eu-west)
  [each region accepts writes; conflicts must be resolved]
  
Active-Active with CockroachDB/Spanner:
  All regions accept writes; consensus-based (Raft/Paxos)
  Automatic conflict-free transactions

Latency Reality Check

Route
Latency

Same datacenter

0.5 ms

Same city (different DC)

1-5 ms

US East ↔ US West

60-70 ms

US East ↔ Europe

80-120 ms

US East ↔ Asia Pacific

150-200 ms

US West ↔ Asia Pacific

100-150 ms

Implication: Cross-region synchronous replication adds ~80-200ms to every write. This is why most multi-region systems use asynchronous replication for read replicas.


Step 1: Set Up Two PostgreSQL Instances (Simulating Two Regions)

📸 Verified Output:


Step 2: Configure Logical Replication — Publisher

📸 Verified Output:


Step 3: Configure Logical Replication — Subscriber

📸 Verified Output:


Step 4: Test Logical Replication

📸 Verified Output:


Step 5: Conflict Resolution Strategies

📸 Verified Output:


Step 6: Geo-Routing Architecture

📸 Verified Output:


Step 7: Active-Active Architecture

📸 Verified Output:


Step 8: Capstone — Multi-Region Design Checklist

📸 Verified Output:


Summary

Concept
Key Takeaway

Active-passive

Single primary; replicas read-only; failover is manual/semi-auto

Active-active

Multiple writeable primaries; conflict resolution required

Logical replication

CREATE PUBLICATION / SUBSCRIPTION — table-level replication

Physical replication

Block-level streaming replication (full server copy)

Replication lag

Async = lag; sync = latency; choose based on RTO/RPO

LWW

Last-Write-Wins: simple but can lose concurrent updates

CRDTs

Conflict-free data types: counters, sets auto-merge

Geo-routing

Route53/Cloudflare → nearest read replica

Data sovereignty

GDPR requires EU data stay in EU → partition by region

RTT impact

US-EU = 100ms; async replication adds 100ms replica lag

💡 Architect's insight: Most applications don't need active-active multi-region writes — they need active-passive with fast failover and read replicas in each region. Only go active-active when you have specific use cases and a plan for every conflict scenario.

Last updated