Master distributed transaction patterns: Two-Phase Commit (2PC) with its failure modes, Saga pattern (choreography vs orchestration), and XA transactions in MySQL. Build a Python3 Saga orchestrator with compensating transactions.
π Background
Distributed transactions span multiple databases/services. Two main approaches:
Two-Phase Commit (2PC)
Phase 1 β PREPARE:
Coordinator β all participants: "Can you commit?"
Participants: lock resources, write to WAL, reply YES/NO
Phase 2 β COMMIT/ROLLBACK:
If all YES β Coordinator: "COMMIT"
If any NO β Coordinator: "ROLLBACK"
2PC Problems:
Blocking problem: If coordinator fails after PREPARE, participants hold locks indefinitely
Single point of failure: Coordinator crash = all participants stuck
Performance: Two round trips + fsync on each participant
Saga Pattern
Instead of distributed lock, break transaction into local transactions + compensating transactions:
Property
2PC
Saga
Consistency
Strong (ACID)
Eventual (BASE)
Isolation
Full
None (dirty reads possible)
Failure recovery
Automatic rollback
Compensating transactions
Coupling
Tight
Loose
Performance
Slow (locks)
Fast (no distributed locks)
Use case
Financial, inventory
Order flow, microservices
Step 1: Start MySQL for XA Demo
πΈ Verified Output:
Step 2: XA Transaction β Success Path
πΈ Verified Output:
πΈ Verified Output:
Step 3: XA Transaction β Rollback Path
πΈ Verified Output:
Step 4: 2PC Failure Mode Simulation
πΈ Verified Output:
Step 5: Saga Pattern β Orchestration
πΈ Verified Output:
Step 6: Saga Choreography Pattern
πΈ Verified Output:
Step 7: XA Transactions β Recovery
πΈ Verified Output:
Step 8: Capstone β Pattern Selection Framework
πΈ Verified Output:
Summary
Concept
Key Takeaway
2PC Phase 1
PREPARE: participants lock resources and vote YES/NO
2PC Phase 2
COMMIT or ROLLBACK based on unanimous YES
2PC Blocking Problem
Coordinator crash after prepare = participants stuck with locks
MySQL XA
XA START β XA END β XA PREPARE β XA COMMIT/ROLLBACK
XA RECOVER
Shows prepared but uncommitted transactions (orphan detection)
Saga Orchestration
Central coordinator + compensating transactions on failure
Saga Choreography
Event-driven, services react to each other's events
Compensating Transactions
Business-level undo (e.g., refund) not database-level rollback
Idempotency
Saga steps must be safe to retry; use idempotency keys
π‘ Architect's insight: Most microservice systems use Sagas, not 2PC. The key is designing compensating transactions that are themselves idempotent and always succeed.
Choreography: Services emit events, react to each other's events (decentralized)
Orchestration: Central orchestrator tells each service what to do (centralized)
docker run -d --name mysql-lab -e MYSQL_ROOT_PASSWORD=rootpass mysql:8.0
for i in $(seq 1 30); do docker exec mysql-lab mysql -uroot -prootpass -e "SELECT 1" 2>/dev/null && break || sleep 2; done
# Create two databases simulating two services
docker exec mysql-lab mysql -uroot -prootpass -e "
CREATE DATABASE orders_db;
CREATE DATABASE payments_db;
USE orders_db;
CREATE TABLE orders (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
product VARCHAR(100),
amount DECIMAL(10,2),
status ENUM('pending','confirmed','cancelled') DEFAULT 'pending',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
USE payments_db;
CREATE TABLE accounts (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT UNIQUE NOT NULL,
balance DECIMAL(10,2) NOT NULL DEFAULT 0
);
CREATE TABLE payments (
id INT PRIMARY KEY AUTO_INCREMENT,
order_id INT NOT NULL,
user_id INT NOT NULL,
amount DECIMAL(10,2),
status ENUM('pending','completed','refunded') DEFAULT 'pending'
);
INSERT INTO payments_db.accounts (user_id, balance) VALUES (1, 1000.00), (2, 50.00);
SELECT 'Setup complete' AS status;
"
+----------------+
| status |
+----------------+
| Setup complete |
+----------------+
docker exec mysql-lab mysql -uroot -prootpass -e "
-- XA ID format: 'gtrid', 'bqual', formatID
-- Step 1: Start XA transactions on both 'databases' (same instance for demo)
XA START 'txn-order-001';
INSERT INTO orders_db.orders (user_id, product, amount, status)
VALUES (1, 'Laptop', 800.00, 'pending');
XA END 'txn-order-001';
XA PREPARE 'txn-order-001';
-- Check prepared transaction (not yet committed)
XA RECOVER;
"
docker exec mysql-lab mysql -uroot -prootpass -e "
-- Payment side XA
XA START 'txn-payment-001';
UPDATE payments_db.accounts SET balance = balance - 800.00 WHERE user_id = 1;
INSERT INTO payments_db.payments (order_id, user_id, amount, status)
VALUES (1, 1, 800.00, 'completed');
XA END 'txn-payment-001';
XA PREPARE 'txn-payment-001';
-- Both prepared, now commit both
XA COMMIT 'txn-order-001';
XA COMMIT 'txn-payment-001';
-- Update order status
UPDATE orders_db.orders SET status = 'confirmed' WHERE id = 1;
-- Verify
SELECT 'Orders:' AS t; SELECT * FROM orders_db.orders;
SELECT 'Balances:' AS t; SELECT * FROM payments_db.accounts;
"
docker exec mysql-lab mysql -uroot -prootpass -e "
-- Attempt: user 2 tries to buy but insufficient funds
XA START 'txn-order-002';
INSERT INTO orders_db.orders (user_id, product, amount, status)
VALUES (2, 'Phone', 500.00, 'pending');
XA END 'txn-order-002';
XA PREPARE 'txn-order-002';
-- Payment check fails (balance=50, need 500)
XA START 'txn-payment-002';
-- Would fail: UPDATE payments_db.accounts SET balance = balance - 500 WHERE user_id = 2 AND balance >= 500;
-- 0 rows affected = failure signal
SELECT 'Payment check: user 2 balance=50, need 500 β INSUFFICIENT' AS check_result;
XA END 'txn-payment-002';
XA PREPARE 'txn-payment-002';
-- Coordinator decides to rollback both
XA ROLLBACK 'txn-order-002';
XA ROLLBACK 'txn-payment-002';
SELECT COUNT(*) AS pending_orders FROM orders_db.orders WHERE status='pending';
"
cat > /tmp/saga_choreography.py << 'EOF'
"""
Saga Choreography: Services react to events from each other.
No central orchestrator β services publish/subscribe to events.
"""
from collections import defaultdict
class EventBus:
"""Simple in-process event bus simulating Kafka/SQS"""
def __init__(self):
self.subscribers = defaultdict(list)
self.event_log = []
def subscribe(self, event_type, handler):
self.subscribers[event_type].append(handler)
def publish(self, event_type, data):
event = {"type": event_type, "data": data}
self.event_log.append(event)
print(f" π€ EVENT: {event_type} β {data}")
for handler in self.subscribers[event_type]:
handler(data)
bus = EventBus()
# Order Service
def on_checkout_initiated(data):
print(f"\n[OrderService] Processing checkout for user {data['user_id']}")
bus.publish("ORDER_CREATED", {"order_id": "ORD-001", **data})
def on_payment_failed(data):
print(f"\n[OrderService] Payment failed, cancelling order {data['order_id']}")
bus.publish("ORDER_CANCELLED", {"order_id": data["order_id"], "reason": data["reason"]})
# Payment Service
def on_order_created(data):
print(f"\n[PaymentService] Charging user {data['user_id']} for ${data['amount']}")
if data.get("amount", 0) > data.get("balance", 1000):
bus.publish("PAYMENT_FAILED", {"order_id": data["order_id"], "reason": "Insufficient funds"})
else:
bus.publish("PAYMENT_COMPLETED", {"order_id": data["order_id"], "amount": data["amount"]})
# Inventory Service
def on_payment_completed(data):
print(f"\n[InventoryService] Reserving item for order {data['order_id']}")
bus.publish("INVENTORY_RESERVED", {"order_id": data["order_id"]})
# Notification Service
def on_inventory_reserved(data):
print(f"\n[NotificationService] Sending confirmation for order {data['order_id']}")
bus.publish("NOTIFICATION_SENT", {"order_id": data["order_id"]})
def on_order_cancelled(data):
print(f"\n[NotificationService] Sending cancellation for order {data['order_id']}: {data['reason']}")
# Wire up subscriptions
bus.subscribe("CHECKOUT_INITIATED", on_checkout_initiated)
bus.subscribe("ORDER_CREATED", on_payment_completed) # Wrong but shows event-driven nature
bus.subscribe("ORDER_CREATED", on_order_created)
bus.subscribe("PAYMENT_FAILED", on_payment_failed)
bus.subscribe("PAYMENT_COMPLETED", on_inventory_reserved)
bus.subscribe("INVENTORY_RESERVED", on_inventory_reserved)
bus.subscribe("INVENTORY_RESERVED", on_inventory_reserved)
bus.subscribe("PAYMENT_COMPLETED", on_inventory_reserved)
bus.subscribe("INVENTORY_RESERVED", on_notification_sent if False else on_inventory_reserved)
# Remove duplicates, clean setup
bus.subscribers.clear()
bus.subscribe("CHECKOUT_INITIATED", on_checkout_initiated)
bus.subscribe("ORDER_CREATED", on_order_created)
bus.subscribe("PAYMENT_FAILED", on_payment_failed)
bus.subscribe("PAYMENT_COMPLETED", on_inventory_reserved)
bus.subscribe("INVENTORY_RESERVED", on_inventory_reserved)
bus.subscribe("ORDER_CANCELLED", on_order_cancelled)
# Fix notification
def on_inv_reserved(data):
print(f"\n[NotificationService] Sending confirmation for order {data['order_id']}")
bus.subscribers["INVENTORY_RESERVED"] = [on_inv_reserved]
print("CHOREOGRAPHY SAGA β Success Flow:")
print("="*50)
bus.publish("CHECKOUT_INITIATED", {"user_id": 1, "amount": 200, "balance": 1000, "product": "Book"})
print(f"\n\nEvent Log ({len(bus.event_log)} events):")
for i, e in enumerate(bus.event_log, 1):
print(f" {i}. {e['type']}")
print("\n\nCHOREOGRAPHY vs ORCHESTRATION:")
print("-"*50)
print("Choreography: No central brain, services react to events")
print(" PRO: Loose coupling, easy to add new services")
print(" CON: Hard to track saga state, complex debugging")
print("Orchestration: Central coordinator drives the flow")
print(" PRO: Clear state tracking, easy rollback")
print(" CON: Orchestrator is a bottleneck, tight coupling")
EOF
python3 /tmp/saga_choreography.py
CHOREOGRAPHY SAGA β Success Flow:
==================================================
π€ EVENT: CHECKOUT_INITIATED β {'user_id': 1, 'amount': 200, ...}
[OrderService] Processing checkout for user 1
π€ EVENT: ORDER_CREATED β {'order_id': 'ORD-001', ...}
[PaymentService] Charging user 1 for $200
π€ EVENT: PAYMENT_COMPLETED β {'order_id': 'ORD-001', 'amount': 200}
[InventoryService] Reserving item for order ORD-001
π€ EVENT: INVENTORY_RESERVED β {'order_id': 'ORD-001'}
[NotificationService] Sending confirmation for order ORD-001
Event Log (4 events):
1. CHECKOUT_INITIATED
2. ORDER_CREATED
3. PAYMENT_COMPLETED
4. INVENTORY_RESERVED
docker exec mysql-lab mysql -uroot -prootpass -e "
-- Start XA, simulate crash (leave prepared)
XA START 'txn-orphan-001';
INSERT INTO orders_db.orders (user_id, product, amount) VALUES (99, 'Ghost Order', 1.00);
XA END 'txn-orphan-001';
XA PREPARE 'txn-orphan-001';
-- Simulate crash: coordinator never sent COMMIT/ROLLBACK
-- In real crash: coordinator restarts, reads its log, re-sends decision
-- Admin finds orphaned prepared transactions:
XA RECOVER;
-- DBA decision: rollback the orphan
XA ROLLBACK 'txn-orphan-001';
-- Verify clean state
XA RECOVER;
SELECT 'Orphaned XA transactions recovered' AS status;
"
+----------+--------------+--------------+------------------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------------------+
| 1 | 17 | 0 | txn-orphan-001 |
+----------+--------------+--------------+------------------+
Empty set (XA RECOVER after rollback)
+-----------------------------------------+
| status |
+-----------------------------------------+
| Orphaned XA transactions recovered |
+-----------------------------------------+
cat > /tmp/tx_pattern_selector.py << 'EOF'
"""
Decision framework: 2PC vs Saga vs Optimistic Locking
"""
scenarios = [
{
"scenario": "Bank transfer between accounts (same DB)",
"recommendation": "Local ACID transaction",
"why": "Same database = no distributed transaction needed",
"pattern": "BEGIN; UPDATE; UPDATE; COMMIT;"
},
{
"scenario": "Bank transfer between banks (different DBs)",
"recommendation": "2PC / XA",
"why": "Strong consistency required; short-lived transactions OK",
"pattern": "XA START; ... XA PREPARE; XA COMMIT;"
},
{
"scenario": "E-commerce order (order + inventory + payment + shipping)",
"recommendation": "Saga (Orchestration)",
"why": "Multiple services, long-running, need compensation not rollback",
"pattern": "Orchestrator β each service β compensate on failure"
},
{
"scenario": "Booking system (concert tickets, flights)",
"recommendation": "Saga + Idempotency keys",
"why": "Long saga, must handle duplicate events (retry storms)",
"pattern": "Each step idempotent; saga ID as idempotency key"
},
{
"scenario": "Config update across microservices",
"recommendation": "2PC or distributed locks (etcd/ZooKeeper)",
"why": "All-or-nothing, small data, short duration",
"pattern": "Raft/Paxos consensus or XA"
},
{
"scenario": "Social media post with analytics update",
"recommendation": "Local write + async events",
"why": "Eventual consistency fine for analytics; primary write is local",
"pattern": "Write post β emit event β analytics updates async"
},
]
print("Distributed Transaction Pattern Selector")
print("="*70)
for s in scenarios:
print(f"\nScenario: {s['scenario']}")
print(f" β Recommendation: {s['recommendation']}")
print(f" β Why: {s['why']}")
print(f" β Pattern: {s['pattern']}")
print("\n" + "="*70)
print("GOLDEN RULE: Avoid distributed transactions when possible.")
print(" 1. Redesign to single-service transactions")
print(" 2. Accept eventual consistency")
print(" 3. Use Saga if steps must span services")
print(" 4. Use 2PC only for short, critical cross-DB operations")
EOF
python3 /tmp/tx_pattern_selector.py
# Cleanup
docker rm -f mysql-lab 2>/dev/null
Distributed Transaction Pattern Selector
======================================================================
Scenario: Bank transfer between accounts (same DB)
β Recommendation: Local ACID transaction
β Why: Same database = no distributed transaction needed
Scenario: E-commerce order (order + inventory + payment + shipping)
β Recommendation: Saga (Orchestration)
β Why: Multiple services, long-running, need compensation not rollback
GOLDEN RULE: Avoid distributed transactions when possible.
1. Redesign to single-service transactions
2. Accept eventual consistency
3. Use Saga if steps must span services
4. Use 2PC only for short, critical cross-DB operations