Lab 01: CAP Theorem & Consistency

Time: 50 minutes | Level: Architect | DB: Distributed Systems (Python3 simulation)


🎯 Objective

Understand the CAP theorem, its practical implications, and the PACELC extension. Simulate partition scenarios to observe consistency vs. availability trade-offs. Map real databases (MySQL, Cassandra, ZooKeeper) to CAP categories.


📚 Background

The CAP Theorem (Brewer's conjecture, 2000; proved by Gilbert and Lynch in 2002) states that a distributed data store can guarantee at most two of three properties simultaneously:

  • C — Consistency: Every read receives the most recent write or an error

  • A — Availability: Every request receives a (non-error) response, though it may not be the most recent

  • P — Partition Tolerance: The system continues operating despite network partitions (message loss between nodes)

💡 Key insight: Network partitions are unavoidable in real distributed systems. You must choose: during a partition, do you sacrifice C (stay available with potentially stale data) or A (go offline to maintain consistency)?

CAP Categories

| Category | Description | Examples |
| --- | --- | --- |
| CP | Consistent + Partition Tolerant (sacrifices availability) | MySQL (with sync replication), ZooKeeper, HBase, etcd |
| AP | Available + Partition Tolerant (sacrifices consistency) | Cassandra, DynamoDB, CouchDB, DNS |
| CA | Consistent + Available (no partition tolerance) | Single-node RDBMS (not truly distributed) |

PACELC Extension

PACELC extends CAP to cover normal operation: if a Partition occurs, choose Availability or Consistency (the CAP choice); Else, choose between lower Latency and stronger Consistency:

| System | Partition choice | Else choice |
| --- | --- | --- |
| MySQL (sync replication) | PC (consistency) | EC (consistency over latency) |
| Cassandra | PA (availability) | EL (latency over consistency) |
| DynamoDB (strong) | PC (consistency) | EC (consistency over latency) |
| DynamoDB (eventual) | PA (availability) | EL (latency over consistency) |

Consistency Models (from strongest to weakest)

  1. Strong Consistency — All reads see the latest write. Requires coordination. (ZooKeeper, etcd)

  2. Sequential Consistency — All nodes observe operations in a single total order that respects each process's program order.

  3. Causal Consistency — Causally related operations are seen in order. (MongoDB sessions)

  4. Eventual Consistency — All nodes converge to the same value given no new updates. (Cassandra default)

  5. Read-Your-Writes — After writing, a client always sees its own write (a session guarantee rather than a point on the strong-to-eventual spectrum).
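The gap between the strongest and weakest models above can be seen in a few lines. This is a toy sketch: plain dictionaries stand in for replicas, and all names are illustrative.

```python
# A strongly consistent read always returns the latest write; an eventually
# consistent read from a lagging replica may return stale data until
# replication catches up.

primary = {}
lagging_replica = {}  # replication to this node has not run yet

primary["balance"] = 100  # client writes to the primary

strong_read = primary["balance"]             # always the latest write
stale_read = lagging_replica.get("balance")  # may be stale: here, None

lagging_replica.update(primary)              # replication eventually applies it
converged_read = lagging_replica["balance"]  # now converged to 100
```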


Step 1: Set Up Python Environment
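No setup snippet survives in this copy, so here is a minimal sketch of an environment check. The lab only needs the standard library; the 3.8 version floor and the function name are assumptions of this sketch.

```python
import sys

MIN_VERSION = (3, 8)  # assumed minimum for this lab's simulation code

def check_environment() -> str:
    """Raise if the interpreter is too old, else return a short report."""
    if sys.version_info < MIN_VERSION:
        raise RuntimeError("Python %d.%d+ required" % MIN_VERSION)
    return "Python %d.%d OK" % (sys.version_info.major, sys.version_info.minor)

print(check_environment())
```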

📸 Verified Output:


Step 2: Simulate a Distributed Key-Value Store

Create the simulation file:
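A sketch of what the simulation file might contain, assuming the lab's toy store exposes CP-style and AP-style write paths. The `Cluster`/`Node` names are invented for this sketch, not a real client API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One replica in the toy key-value store."""
    name: str
    data: dict = field(default_factory=dict)
    reachable: bool = True  # False while cut off by a partition

class Cluster:
    """Toy cluster whose write path can be CP-style (all replicas must
    respond) or AP-style (any reachable replica accepts the write)."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}

    def partition(self, *names):
        """Cut the named nodes off from the rest of the cluster."""
        for n in names:
            self.nodes[n].reachable = False

    def write(self, key, value, require_all=True):
        """CP write (require_all=True): refuse unless every replica is
        reachable. AP write (require_all=False): write to whoever answers."""
        reachable = [n for n in self.nodes.values() if n.reachable]
        if require_all and len(reachable) < len(self.nodes):
            return False  # sacrifice availability to keep replicas identical
        for node in reachable:
            node.data[key] = value
        return True

cluster = Cluster(["n1", "n2", "n3"])
cluster.write("x", 1)                              # healthy cluster: succeeds
cluster.partition("n3")
cp_ok = cluster.write("x", 2)                      # CP: refused during partition
ap_ok = cluster.write("x", 2, require_all=False)   # AP: n1/n2 accept it
```

After the AP write, `n3` still holds the old value: the cluster stayed available at the cost of a temporarily inconsistent replica.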

📸 Verified Output:


Step 4: MySQL as CP System

📸 Verified Output:

💡 MySQL CP characteristics: With synchronous replication (rpl_semi_sync enabled), MySQL waits for at least one replica to acknowledge before committing. This makes it CP — during partition, writes may block/timeout rather than returning stale data.
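The blocking behavior described above can be sketched as a toy model: the primary reports success only after at least one replica acknowledges, and during a partition the commit is refused (real MySQL blocks until `rpl_semi_sync_master_timeout`, then may degrade to async). Class and function names here are illustrative, not MySQL's actual interfaces.

```python
class Replica:
    def __init__(self):
        self.reachable = True
        self.relay_log = []

    def ack(self, txn):
        """Append to the relay log and acknowledge, unless partitioned."""
        if not self.reachable:
            return False
        self.relay_log.append(txn)
        return True

def semi_sync_commit(replicas, txn):
    """Commit succeeds iff at least one replica acknowledged the write."""
    acks = sum(1 for r in replicas if r.ack(txn))
    return acks >= 1

replicas = [Replica(), Replica()]
healthy = semi_sync_commit(replicas, "INSERT 1")      # True: a replica acked
for r in replicas:
    r.reachable = False                               # network partition
partitioned = semi_sync_commit(replicas, "INSERT 2")  # False: CP refusal
```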


Step 5: Cassandra as AP System Concepts
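A sketch of Cassandra-style tunable consistency: an operation succeeds when enough replicas respond for the requested level. The helper names are invented for this sketch; only the level semantics mirror Cassandra's.

```python
def required_acks(level, replication_factor):
    """How many replicas must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown level: {level}")

def succeeds(level, replication_factor, alive):
    return alive >= required_acks(level, replication_factor)

# RF=3, but a partition leaves only 1 replica reachable:
one = succeeds("ONE", 3, 1)        # True: available, possibly stale (AP)
quorum = succeeds("QUORUM", 3, 1)  # False: needs 2 of 3 (CP-leaning)
all_ = succeeds("ALL", 3, 1)       # False: needs 3 of 3
```

With QUORUM reads and QUORUM writes, R + W > RF, so every read set overlaps every write set and reads observe the latest acknowledged write.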

📸 Verified Output:


Step 6: PACELC in Practice
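The "Else" branch of PACELC can be made concrete with a toy latency model: an EC system waits for every replica (consistency over latency), while an EL system acknowledges after the fastest one (latency over consistency). The delay numbers are made up for illustration.

```python
replica_delays_ms = [5, 8, 120]  # one slow replica; no partition in effect

def write_latency(else_choice, delays):
    """Client-visible write latency under the PACELC 'Else' choice."""
    if else_choice == "EL":  # latency-first, Cassandra-style
        return min(delays)
    if else_choice == "EC":  # consistency-first, sync-replication-style
        return max(delays)
    raise ValueError(else_choice)

el_latency = write_latency("EL", replica_delays_ms)  # fastest replica: 5 ms
ec_latency = write_latency("EC", replica_delays_ms)  # slowest replica: 120 ms
```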

📸 Verified Output:


Step 7: Eventual Consistency Deep Dive
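A sketch of eventual consistency with last-write-wins (LWW) resolution, the mechanism noted in the summary: each value carries a timestamp, replicas accept writes independently, and an anti-entropy merge converges every replica on the highest timestamp. The `lww_merge` helper is invented for this sketch.

```python
def lww_merge(state_a, state_b):
    """Merge two replica states; for each key the later (ts, value) wins."""
    merged = dict(state_a)
    for key, (ts, value) in state_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Concurrent writes to the same key on two partitioned replicas:
replica_a = {"cart": (100, ["book"])}
replica_b = {"cart": (105, ["pen"])}  # later wall-clock timestamp

converged = lww_merge(replica_a, replica_b)
# Both replicas converge, but the ["book"] update is silently discarded:
# exactly the concurrent-update loss LWW is known for.
```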

📸 Verified Output:


Step 8: Capstone — Consistency Model Decision Framework
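One way to frame the capstone is a decision helper that maps workload requirements to a suggested consistency level. The rules below are illustrative heuristics for discussion, not an industry standard.

```python
def choose_consistency(lost_update_tolerable, stale_read_tolerable,
                       latency_critical):
    """Pick the weakest model that still satisfies the requirements."""
    if not lost_update_tolerable:
        return "strong (CP store, or QUORUM reads + QUORUM writes)"
    if not stale_read_tolerable:
        return "read-your-writes (route a session to the same replica)"
    if latency_critical:
        return "eventual (AP store, e.g., Cassandra at ONE)"
    return "eventual with periodic anti-entropy repair"

# A bank balance tolerates neither lost updates nor stale reads:
balance = choose_consistency(False, False, False)
# A page-view counter tolerates both and wants low latency:
counter = choose_consistency(True, True, True)
```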

📸 Verified Output:


Summary

| Concept | Key Takeaway |
| --- | --- |
| CAP Theorem | Can't have C + A + P simultaneously; P is unavoidable → choose C or A |
| CP Systems | MySQL, ZooKeeper, etcd — refuse requests during a partition to stay consistent |
| AP Systems | Cassandra, DynamoDB, DNS — serve requests with potentially stale data |
| PACELC | Extends CAP: normal operation also forces a Latency vs. Consistency choice |
| Strong Consistency | All reads see the latest write; requires coordination overhead |
| Eventual Consistency | Nodes converge given time; high availability and low latency |
| Read-Your-Writes | Your writes are visible to you immediately; other clients may lag |
| Tunable Consistency | Cassandra: ONE = AP-leaning; QUORUM and ALL = CP-leaning (ALL is strongest but least available) |
| LWW Conflict Resolution | Last-Write-Wins; simple but may silently lose concurrent updates |

💡 Architect's insight: "CP vs AP" is not a binary choice — Cassandra lets you tune per-operation. Design systems with the RIGHT consistency level for each data type, not a one-size-fits-all approach.
