Master DynamoDB data modeling: partition key design, sort keys, GSI/LSI, single-table design, access-patterns-first methodology, and avoiding hot partitions. Build a complete e-commerce model with boto3.
📚 Background
DynamoDB Core Concepts
| Concept | Description |
|---|---|
| Partition Key (PK) | Hash key; determines which physical partition stores the item |
| Sort Key (SK) | Range key; enables range queries within a partition |
| Item | Row equivalent (up to 400 KB) |
| GSI | Global Secondary Index: its own PK and SK, stored and scaled separately from the base table |
| LSI | Local Secondary Index: same PK as the table, different SK, same partition |
| On-demand | Pay per request; scales automatically with traffic |
| Provisioned | Fixed RCU/WCU; can use auto-scaling |
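To make the concepts above concrete, here is a minimal sketch of a table definition that uses each of them. The table name `ECommerce`, the index names `GSI1`/`LSI1`, and the attribute name `LSI1SK` are assumptions chosen for illustration; the dictionary keys follow the shape of DynamoDB's CreateTable request. The block only builds and inspects the parameters, so it runs without AWS credentials.

```python
def build_table_definition(table_name="ECommerce"):
    """Return keyword arguments shaped for DynamoDB's CreateTable operation."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
            {"AttributeName": "GSI1SK", "AttributeType": "S"},
            {"AttributeName": "LSI1SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # An LSI shares the table's partition key and can only be declared here,
        # at table creation time.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "LSI1",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "PK", "KeyType": "HASH"},
                    {"AttributeName": "LSI1SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
    }


params = build_table_definition()
for index in params["GlobalSecondaryIndexes"] + params["LocalSecondaryIndexes"]:
    keys = ", ".join(f"{k['AttributeName']} ({k['KeyType']})" for k in index["KeySchema"])
    print(f"{index['IndexName']}: {keys}")
```

With credentials configured, the same dictionary could be passed along as `boto3.client("dynamodb").create_table(**params)`.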
Access-Patterns-First Design
DynamoDB design rule: define your access patterns BEFORE designing your schema.
Unlike an RDBMS, where you normalize first and write queries later, DynamoDB modeling is query-driven:
1. List all access patterns
2. Design the partition key to spread data evenly
3. Use the sort key to support range queries
4. Add GSIs for additional access patterns
Single-Table Design
Store multiple entity types in ONE table using composite keys:

| PK | SK | Entity |
|---|---|---|
| USER#alice | PROFILE | User profile |
| USER#alice | ORDER#2024-001 | Order header |
| USER#alice | ORDER#2024-002 | Order header |
| ORDER#2024-001 | ITEM#001 | Order item |
| PRODUCT#laptop | INFO | Product info |
| PRODUCT#laptop | REVIEW#alice | Review |

Key points:

| Concept | Key point |
|---|---|
| Partition Key | Must have high cardinality; distributes data across partitions |
| Sort Key | Enables range queries and efficient data organization |
| Single-table design | Multiple entity types in one table with composite keys |
| GSI | Separate partition key for alternative access patterns; reads are eventually consistent |
| LSI | Same partition key, different sort key; can only be created together with the table |
| Hot partition | One partition sustains roughly 1,000 WCU/s (and 3,000 RCU/s); a key that exceeds this is throttled, so use write sharding |
| On-demand | Auto-scales; best for unpredictable or new applications |
| Provisioned + auto-scaling | Better cost for predictable workloads |
| DynamoDB Streams | CDC: INSERT/MODIFY/REMOVE events can trigger Lambda |
| Access-patterns-first | Design schema for your queries, not normalization |

💡 Architect's insight: DynamoDB rewards you for knowing your access patterns upfront. Single-table design is counterintuitive at first, but it replaces joins with pre-joined item collections that a single query can fetch, and it keeps latency in the single-digit milliseconds as the table grows.
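Composite keys such as `USER#alice` or `ITEM#001` are easier to keep consistent when they are built in one place. The helpers below are a small sketch of that idea; the function names `make_key`, `split_key`, and `order_item_keys` are hypothetical and not part of boto3.

```python
def make_key(entity, *parts):
    """Join an entity prefix and its identifiers with '#', e.g. USER#alice."""
    return "#".join([entity.upper(), *map(str, parts)])


def split_key(key):
    """Split a composite key into (entity prefix, remainder)."""
    entity, _, rest = key.partition("#")
    return entity, rest


def order_item_keys(order_id, line_no):
    """Primary key for one line item stored in the order's partition."""
    return {
        "PK": make_key("ORDER", order_id),
        "SK": make_key("ITEM", f"{line_no:03d}"),  # zero-padded so items sort numerically
    }


print(make_key("USER", "alice"))          # USER#alice
print(order_item_keys("2024-001", 1))     # {'PK': 'ORDER#2024-001', 'SK': 'ITEM#001'}
print(split_key("PRODUCT#laptop"))        # ('PRODUCT', 'laptop')
```

Zero-padding the numeric part matters because sort keys of string type are ordered lexicographically: without padding, `ITEM#10` would sort before `ITEM#2`.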
cat > /tmp/dynamodb_access_patterns.py << 'EOF'
"""
Step 1: Define access patterns before writing any schema.
E-commerce platform example.
"""
access_patterns = [
    # User operations
    {"id": "AP-01", "entity": "User", "operation": "Get user profile", "query": "PK=USER#{userId} SK=PROFILE"},
    {"id": "AP-02", "entity": "User", "operation": "Get all orders for user", "query": "PK=USER#{userId} SK begins_with ORDER#"},
    # Upper bound is the first key AFTER the range: March keys sort above the bare prefix ORDER#2024-03
    {"id": "AP-03", "entity": "User", "operation": "Get orders in date range", "query": "PK=USER#{userId} SK between ORDER#2024-01 and ORDER#2024-04"},
    # Order operations
    {"id": "AP-04", "entity": "Order", "operation": "Get order by ID", "query": "PK=ORDER#{orderId} SK=INFO"},
    {"id": "AP-05", "entity": "Order", "operation": "Get all items in order", "query": "PK=ORDER#{orderId} SK begins_with ITEM#"},
    # GSI1SK also carries a timestamp, so match on the status prefix
    {"id": "AP-06", "entity": "Order", "operation": "Get pending orders by user", "query": "GSI1: PK=USER#{userId} SK begins_with STATUS#pending"},
    # Product operations
    {"id": "AP-07", "entity": "Product", "operation": "Get product info", "query": "PK=PRODUCT#{productId} SK=INFO"},
    {"id": "AP-08", "entity": "Product", "operation": "Get products by category", "query": "GSI2: PK=CATEGORY#{cat} SK begins_with PRODUCT#"},
    {"id": "AP-09", "entity": "Product", "operation": "Get all reviews for product", "query": "PK=PRODUCT#{productId} SK begins_with REVIEW#"},
    {"id": "AP-10", "entity": "Product", "operation": "Get top-rated products", "query": "GSI3: PK=CATEGORY#{cat} SK=rating (read in descending order)"},
]

print("E-Commerce DynamoDB Access Patterns")
print("=" * 80)
print(f"{'ID':<7} {'Entity':<10} {'Operation':<40} {'Query Pattern'}")
print("-" * 80)
for ap in access_patterns:
    print(f"{ap['id']:<7} {ap['entity']:<10} {ap['operation']:<40} {ap['query']}")

print("\n\nTable Design from Access Patterns:")
print("-" * 50)
print("Main Table: PK (partition) + SK (sort)")
print("GSI1: User + Status#Timestamp")
print("GSI2: Category + ProductId")
print("GSI3: Category + Rating")
EOF
python3 /tmp/dynamodb_access_patterns.py
E-Commerce DynamoDB Access Patterns
================================================================================
ID      Entity     Operation                                Query Pattern
--------------------------------------------------------------------------------
AP-01   User       Get user profile                         PK=USER#{userId} SK=PROFILE
AP-02   User       Get all orders for user                  PK=USER#{userId} SK begins_with ORDER#
...
AP-04   Order      Get order by ID                          PK=ORDER#{orderId} SK=INFO
...
AP-06   Order      Get pending orders by user               GSI1: PK=USER#{userId} SK begins_with STATUS#pending
...
AP-08   Product    Get products by category                 GSI2: PK=CATEGORY#{cat} SK begins_with PRODUCT#
...
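The query patterns listed above map fairly directly onto query parameters. The sketch below builds keyword arguments in the shape accepted by the `query` method of a boto3 DynamoDB `Table` resource; the function names and the index name `GSI1` are assumptions for illustration, and the block only constructs dictionaries, so no AWS call is made.

```python
def user_orders_query(user_id, start=None, end=None):
    """AP-02 / AP-03: all orders for a user, optionally limited to a sort-key range."""
    values = {":pk": f"USER#{user_id}"}
    if start is not None and end is not None:
        condition = "PK = :pk AND SK BETWEEN :lo AND :hi"
        values[":lo"] = f"ORDER#{start}"
        values[":hi"] = f"ORDER#{end}"
    else:
        condition = "PK = :pk AND begins_with(SK, :prefix)"
        values[":prefix"] = "ORDER#"
    return {"KeyConditionExpression": condition, "ExpressionAttributeValues": values}


def pending_orders_query(user_id):
    """AP-06: pending orders for a user via the (assumed) GSI1 index, newest first."""
    return {
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :status)",
        "ExpressionAttributeValues": {
            ":pk": f"USER#{user_id}",
            ":status": "STATUS#pending",
        },
        "ScanIndexForward": False,  # descending sort-key order
    }


print(user_orders_query("alice"))
print(user_orders_query("alice", start="2024-01", end="2024-04"))
print(pending_orders_query("alice"))
```

Against a real table this might be used as `boto3.resource("dynamodb").Table("ECommerce").query(**user_orders_query("alice"))`, where the table name is again only a placeholder.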
cat > /tmp/dynamodb_items.py << 'EOF'
"""
DynamoDB single-table design: multiple entities in one table.
Simulates boto3 put_item, query, scan operations.
"""
from decimal import Decimal


# Simulate a DynamoDB table (in-memory)
class DynamoDBSimulator:
    def __init__(self):
        self.items = {}  # key: (PK, SK) -> item

    def put_item(self, item):
        key = (item["PK"], item["SK"])
        self.items[key] = item
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}

    def get_item(self, pk, sk):
        return self.items.get((pk, sk))

    def query(self, pk_value, sk_prefix=None, sk_between=None):
        # One partition key, optionally narrowed by SK prefix or inclusive SK range
        results = []
        for (pk, sk), item in self.items.items():
            if pk != pk_value:
                continue
            if sk_prefix is not None and not sk.startswith(sk_prefix):
                continue
            if sk_between is not None and not (sk_between[0] <= sk <= sk_between[1]):
                continue
            results.append(item)
        return sorted(results, key=lambda x: x["SK"])

    def scan(self, filter_expr=None):
        # Reads every item; filter_expr is accepted but ignored in this simulation
        return list(self.items.values())


table = DynamoDBSimulator()

# === User Profile ===
table.put_item({
    "PK": "USER#alice",
    "SK": "PROFILE",
    "entityType": "USER",
    "userId": "alice",
    "email": "alice@example.com",
    "name": "Alice Chen",
    "createdAt": "2024-01-15T10:00:00Z",
    "tier": "premium",
})

# === Orders (related to user AND stored as order entity) ===
order1 = {
    "PK": "USER#alice",            # User partition: supports AP-02
    "SK": "ORDER#2024-03-01#001",  # Sort by date
    "GSI1PK": "USER#alice",
    "GSI1SK": "STATUS#pending#2024-03-01T12:00:00Z",
    "entityType": "ORDER",
    "orderId": "2024-03-01#001",
    "status": "pending",
    "total": Decimal("850.00"),
    "currency": "USD",
    "createdAt": "2024-03-01T12:00:00Z",
}
table.put_item(order1)
# Duplicate under ORDER# PK for AP-04 (get order by ID).
# GSI attributes are dropped so the order is indexed in GSI1 only once.
order_by_id = {k: v for k, v in order1.items() if not k.startswith("GSI")}
table.put_item({**order_by_id, "PK": "ORDER#2024-03-01#001", "SK": "INFO"})

# === Order Items ===
table.put_item({
    "PK": "ORDER#2024-03-01#001",
    "SK": "ITEM#001",
    "entityType": "ORDER_ITEM",
    "productId": "laptop-pro-2024",
    "productName": "Laptop Pro 2024",
    "quantity": 1,
    "unitPrice": Decimal("800.00"),
    "subtotal": Decimal("800.00"),
})
table.put_item({
    "PK": "ORDER#2024-03-01#001",
    "SK": "ITEM#002",
    "entityType": "ORDER_ITEM",
    "productId": "usb-hub-4port",
    "productName": "USB Hub 4-Port",
    "quantity": 1,
    "unitPrice": Decimal("50.00"),
    "subtotal": Decimal("50.00"),
})

# === Products ===
table.put_item({
    "PK": "PRODUCT#laptop-pro-2024",
    "SK": "INFO",
    "GSI2PK": "CATEGORY#laptops",          # AP-08: products by category
    "GSI2SK": "PRODUCT#laptop-pro-2024",
    "GSI3PK": "CATEGORY#laptops",          # AP-10: top-rated products
    "GSI3SK": Decimal("4.8"),              # rating
    "entityType": "PRODUCT",
    "productId": "laptop-pro-2024",
    "name": "Laptop Pro 2024",
    "price": Decimal("800.00"),
    "category": "laptops",
    "rating": Decimal("4.8"),
    "stockCount": 50,
})

# Product Review
table.put_item({
    "PK": "PRODUCT#laptop-pro-2024",
    "SK": "REVIEW#alice#2024-03-01",
    "entityType": "REVIEW",
    "rating": 5,
    "text": "Excellent performance!",
    "verified": True,
})

# === Query demos ===
print("Single-Table DynamoDB Queries:")
print("=" * 60)

print("\n[AP-01] Get user profile:")
profile = table.get_item("USER#alice", "PROFILE")
print(f"  name={profile['name']}, tier={profile['tier']}")

print("\n[AP-02] Get all orders for alice:")
orders = table.query("USER#alice", sk_prefix="ORDER#")
for o in orders:
    print(f"  orderId={o['orderId']}, status={o['status']}, total={o['total']}")

print("\n[AP-04] Get order INFO:")
order_info = table.get_item("ORDER#2024-03-01#001", "INFO")
print(f"  order={order_info['orderId']}, total={order_info['total']}")

print("\n[AP-05] Get all items in order:")
items = table.query("ORDER#2024-03-01#001", sk_prefix="ITEM#")
for item in items:
    print(f"  {item['productName']} x{item['quantity']} = ${item['subtotal']}")

print("\n[AP-09] Get all reviews for laptop:")
reviews = table.query("PRODUCT#laptop-pro-2024", sk_prefix="REVIEW#")
for r in reviews:
    print(f"  rating={r['rating']}, text={r['text'][:30]}")

print(f"\n\nAll {len(table.items)} items in single table:")
entities = {}
for (pk, sk), item in table.items.items():
    et = item.get("entityType", "?")
    entities[et] = entities.get(et, 0) + 1
for et, count in sorted(entities.items()):
    print(f"  {et}: {count} items")
EOF
python3 /tmp/dynamodb_items.py
Single-Table DynamoDB Queries:
============================================================

[AP-01] Get user profile:
  name=Alice Chen, tier=premium

[AP-02] Get all orders for alice:
  orderId=2024-03-01#001, status=pending, total=850.00

[AP-04] Get order INFO:
  order=2024-03-01#001, total=850.00

[AP-05] Get all items in order:
  Laptop Pro 2024 x1 = $800.00
  USB Hub 4-Port x1 = $50.00

[AP-09] Get all reviews for laptop:
  rating=5, text=Excellent performance!


All 7 items in single table:
  ORDER: 2 items
  ORDER_ITEM: 2 items
  PRODUCT: 1 items
  REVIEW: 1 items
  USER: 1 items
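One detail of the date-range pattern (AP-03) is easy to get wrong: sort keys of string type compare lexicographically, and `BETWEEN` is inclusive on both ends. A full key such as `ORDER#2024-03-01#001` sorts after the bare prefix `ORDER#2024-03`, so using that prefix as the upper bound silently drops March. The sketch below reproduces the comparison with plain Python strings; the helper name `sk_between` and the sample keys are made up for illustration.

```python
def sk_between(sort_keys, low, high):
    """Mimic an inclusive BETWEEN over string sort keys."""
    return sorted(k for k in sort_keys if low <= k <= high)


orders = [
    "ORDER#2024-01-15#001",
    "ORDER#2024-03-01#001",
    "ORDER#2024-04-02#001",
]

# Upper bound is a prefix of the March keys, so they compare as greater and fall outside.
too_narrow = sk_between(orders, "ORDER#2024-01", "ORDER#2024-03")

# Using the first prefix after the range keeps March and still excludes April.
first_quarter = sk_between(orders, "ORDER#2024-01", "ORDER#2024-04")

print(too_narrow)      # ['ORDER#2024-01-15#001']
print(first_quarter)   # ['ORDER#2024-01-15#001', 'ORDER#2024-03-01#001']
```

The same reasoning applies to the simulator's `sk_between` argument, since it uses the same inclusive string comparison.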
cat > /tmp/dynamodb_hot_partition.py << 'EOF'
"""
DynamoDB hot partition problem and solutions.
"""
import random


def show_hot_partition_problem():
    print("HOT PARTITION PROBLEM")
    print("=" * 55)
    print("""
  Bad partition key: status (only 'pending'/'completed'/'cancelled')
  3 million orders → 3 partition key values handling all traffic:
    pending:   ████████████████████ (2M writes/day)
    completed: ████ (800K writes/day)
    cancelled: ██ (200K writes/day)
  DynamoDB limit: 1,000 WCU/s (and 3,000 RCU/s) per partition
  → Hot partition throttling! 🔥
""")


def good_partition_design():
    print("SOLUTIONS TO HOT PARTITIONS")
    print("=" * 55)
    solutions = {
        "Write Sharding": {
            "problem": "Low-cardinality or hot keys (status, one very popular ID)",
            "solution": "Add random suffix: USER#{userId}#SHARD#{0-9}",
            "tradeoff": "Reads must query all shards and merge",
            "code": 'import random\npk = f"USER#{user_id}#SHARD#{random.randint(0, 9)}"',
        },
        "Calculated Partition": {
            "problem": "Random suffixes force every read to fan out across all shards",
            "solution": "Derive the suffix from an attribute: STATUS#pending#SHARD#{hash(orderId) % 10}",
            "tradeoff": "One item is found in one shard; listing everything still fans out",
            "code": 'shard = int(hashlib.md5(order_id.encode()).hexdigest(), 16) % 10\npk = f"STATUS#pending#SHARD#{shard}"',
        },
        "Composite Sort Key": {
            "problem": "All user activity in single partition",
            "solution": "USER#{userId} + SK: TYPE#TIMESTAMP",
            "tradeoff": "PK stays high-cardinality (one per user); time-range queries work well",
            "code": 'pk = f"USER#{user_id}"\nsk = f"ORDER#{datetime.now().isoformat()}"',
        },
        "Table Design Change": {
            "problem": "Single table receiving all writes",
            "solution": "Partition by time period: orders_2024_03, orders_2024_04",
            "tradeoff": "App must know which table to query; more tables to manage",
            "code": 'table_name = f"orders_{datetime.now().strftime(\'%Y_%m\')}"',
        },
    }
    for solution_name, details in solutions.items():
        print(f"\n[{solution_name}]")
        print(f"  Problem:  {details['problem']}")
        print(f"  Solution: {details['solution']}")
        print(f"  Tradeoff: {details['tradeoff']}")
        print(f"  Code:     {details['code']}")


def demonstrate_sharding():
    print("\n\nWRITE SHARDING DEMO")
    print("-" * 40)
    # Bad: every write targets the same partition key
    print("BAD: Single global counter item")
    pk = "COUNTER#global"
    print(f"  PK={pk} (all 10 writes → same partition! 🔥)")
    print("  ... (same partition for all writes)")
    # Good: sharded counter
    print("\nGOOD: Sharded counter (10 shards)")
    for _ in range(5):
        shard = random.randint(0, 9)
        pk = f"COUNTER#global#SHARD#{shard}"
        print(f"  PK={pk} (distributed across 10 partitions ✓)")
    print("\n  To read total: query all 10 shards and sum")


show_hot_partition_problem()
good_partition_design()
demonstrate_sharding()
EOF
python3 /tmp/dynamodb_hot_partition.py
HOT PARTITION PROBLEM
=======================================================

  Bad partition key: status (only 'pending'/'completed'/'cancelled')
  ...
    pending:   ████████████████████ (2M writes/day)
  ...
  → Hot partition throttling! 🔥

SOLUTIONS TO HOT PARTITIONS
=======================================================

[Write Sharding]
  Problem:  Low-cardinality or hot keys (status, one very popular ID)
  Solution: Add random suffix: USER#{userId}#SHARD#{0-9}
  ...

[Composite Sort Key]
  ...
  Solution: USER#{userId} + SK: TYPE#TIMESTAMP
  ...
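The read side of write sharding is where the tradeoff shows up, so here is a small in-memory sketch of a sharded counter: each increment lands on a randomly chosen shard key, and reading the total means visiting every shard and summing. The dictionary `store` stands in for the table, and `NUM_SHARDS`, `increment`, and `read_total` are hypothetical names.

```python
import random
from collections import defaultdict

NUM_SHARDS = 10
store = defaultdict(int)  # stands in for the table: shard key -> count


def shard_key(counter_id, shard):
    return f"COUNTER#{counter_id}#SHARD#{shard}"


def increment(counter_id, amount=1, rng=random):
    """Write path: pick one shard at random so writes spread over NUM_SHARDS keys."""
    shard = rng.randrange(NUM_SHARDS)
    store[shard_key(counter_id, shard)] += amount


def read_total(counter_id):
    """Read path: fan out to every shard key and sum the partial counts."""
    return sum(store.get(shard_key(counter_id, s), 0) for s in range(NUM_SHARDS))


for _ in range(1000):
    increment("global")

print(f"shards used: {len(store)}")
print(f"total: {read_total('global')}")  # total: 1000
```

Against a real table the write would typically be an atomic `ADD` update on the shard item and the read would be one request per shard, so the number of shards is a direct trade between write headroom and read cost.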
cat > /tmp/dynamodb_capacity.py << 'EOF'
"""
DynamoDB capacity planning: RCU/WCU calculation.
"""
import math


def calculate_capacity():
    print("DynamoDB Capacity Calculation")
    print("=" * 55)
    # Scenario: e-commerce during peak
    workload = {
        "reads_per_second": 10000,
        "writes_per_second": 500,
        "avg_item_size_kb": 2,
        "strong_reads_percent": 20,    # 20% need strong consistency
        "eventual_reads_percent": 80,  # 80% can be eventual
    }
    print("\nWorkload:")
    for k, v in workload.items():
        print(f"  {k}: {v}")

    # RCU calculation
    # 1 RCU = 1 strongly consistent read per second of up to 4 KB
    # 1 RCU = 2 eventually consistent reads per second of up to 4 KB
    item_size_units = max(1, math.ceil(workload["avg_item_size_kb"] / 4))
    strong_reads = workload["reads_per_second"] * (workload["strong_reads_percent"] / 100)
    eventual_reads = workload["reads_per_second"] * (workload["eventual_reads_percent"] / 100)
    strong_rcus = strong_reads * item_size_units
    eventual_rcus = eventual_reads * item_size_units / 2  # eventual = half cost
    total_rcus = strong_rcus + eventual_rcus

    # WCU calculation
    # 1 WCU = 1 write per second of up to 1 KB
    write_size_units = max(1, math.ceil(workload["avg_item_size_kb"]))
    total_wcus = workload["writes_per_second"] * write_size_units

    # Illustrative us-east-1 list prices; verify against the current pricing page
    provisioned_rcu_cost = 0.00013           # per RCU per hour
    provisioned_wcu_cost = 0.00065           # per WCU per hour
    on_demand_read_cost = 0.25 / 1_000_000   # per read request unit
    on_demand_write_cost = 1.25 / 1_000_000  # per write request unit
    hours_per_month = 730
    seconds_per_month = 3600 * hours_per_month

    provisioned_monthly = (total_rcus * provisioned_rcu_cost + total_wcus * provisioned_wcu_cost) * hours_per_month
    # On-demand bills reads and writes separately; assumes peak traffic is sustained all month
    on_demand_monthly = (total_rcus * on_demand_read_cost + total_wcus * on_demand_write_cost) * seconds_per_month

    print("\nRequired Capacity:")
    print(f"  Strong RCUs:   {strong_rcus:,.0f}")
    print(f"  Eventual RCUs: {eventual_rcus:,.0f}")
    print(f"  Total RCUs:    {total_rcus:,.0f}")
    print(f"  Total WCUs:    {total_wcus:,.0f}")
    print("\nMonthly Cost Estimate:")
    print(f"  Provisioned: ${provisioned_monthly:,.2f}/month")
    print(f"  On-demand:   ${on_demand_monthly:,.2f}/month (estimate)")
    print("\n  → Use provisioned + auto-scaling for predictable workloads")
    print("  → Use on-demand for unpredictable/bursty workloads")
    print("  → Reserved capacity: a 1- or 3-year commitment lowers provisioned cost further")


calculate_capacity()
print("\n\nRCU/WCU Quick Reference:")
print("-" * 55)
print("  1 RCU = 1 strong read of up to 4 KB (or 2 eventual reads)")
print("  1 WCU = 1 write of up to 1 KB")
print("  Transactional reads:  2x RCU")
print("  Transactional writes: 2x WCU")
print("  GSI writes: extra WCUs for every GSI the write touches")
print("  Limits: BatchGetItem 100 items/16 MB; BatchWriteItem 25 items/16 MB; transactions 100 items/4 MB")
EOF
python3 /tmp/dynamodb_capacity.py
DynamoDB Capacity Calculation
=======================================================
...
Required Capacity:
  Strong RCUs:   2,000
  Eventual RCUs: 4,000
  Total RCUs:    6,000
  Total WCUs:    1,000

Monthly Cost Estimate:
  Provisioned: $1,043.90/month
  On-demand:   $7,227.00/month (estimate)

  → Use provisioned + auto-scaling for predictable workloads
...
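The rounding rules behind these numbers are worth isolating, because capacity is charged in whole 4 KB read blocks and whole 1 KB write blocks per item. The helper names `read_units` and `write_units` below are hypothetical; the multipliers (0.5 for eventually consistent, 1 for strongly consistent, 2 for transactional) are the ones listed in the quick reference.

```python
import math

READ_FACTORS = {"eventual": 0.5, "strong": 1, "transactional": 2}


def read_units(item_kb, consistency="strong"):
    """Capacity units consumed by one read of an item of the given size."""
    blocks = math.ceil(item_kb / 4)  # reads are metered in 4 KB blocks
    return blocks * READ_FACTORS[consistency]


def write_units(item_kb, transactional=False):
    """Capacity units consumed by one write of an item of the given size."""
    blocks = math.ceil(item_kb)  # writes are metered in 1 KB blocks
    return blocks * (2 if transactional else 1)


print(read_units(2, "eventual"))         # 0.5
print(read_units(9, "strong"))           # 3
print(read_units(9, "transactional"))    # 6
print(write_units(2.5))                  # 3
print(write_units(2.5, transactional=True))  # 6
```

A 9 KB item costs three read blocks rather than 2.25, which is one reason trimming items to just under a block boundary can noticeably change the bill.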
cat > /tmp/dynamodb_design_review.py << 'EOF'
"""
Complete DynamoDB design checklist and anti-patterns.
"""
print("DynamoDB Design Checklist")
print("="*60)
checklist = [
    ("✓", "Access patterns defined BEFORE schema design"),
    ("✓", "Partition key has high cardinality (thousands+ of distinct values)"),
    ("✓", "No hot partitions (no single PK gets >1000 WCU/s)"),
    ("✓", "Sort key enables range queries where needed"),
    ("✓", "Single-table design considered (related items fetched in one query)"),
    ("✓", "GSI projection type chosen: ALL vs INCLUDE vs KEYS_ONLY"),
    ("✓", "DynamoDB Streams enabled for CDC/event-driven patterns"),
    ("✓", "TTL configured for session/cache/temporary data"),
    ("✓", "Point-in-time recovery enabled"),
    ("✓", "Encryption at rest: key type chosen (AWS owned vs KMS-managed)"),
    ("✓", "Backup strategy (PITR + on-demand backups)"),
    ("✓", "Capacity mode: on-demand vs provisioned based on traffic pattern"),
]
anti_patterns = [
    "❌ Using timestamp as partition key (all writes go to same second's partition)",
    "❌ Using status as partition key (low cardinality = hot partitions)",
    "❌ Storing large items (>50KB; hard limit is 400 KB) in DynamoDB (use S3 + store reference)",
    "❌ Scan on large tables (reads ENTIRE table, expensive)",
    "❌ Too many GSIs (each GSI = additional WCU cost per write)",
    "❌ Using DynamoDB like a relational DB (JOINs don't exist)",
    "❌ Ignoring eventual consistency window (read-after-write may be stale)",
]

print("\nBest Practices:")
for status, practice in checklist:
    print(f"  {status} {practice}")

print("\nAnti-Patterns to Avoid:")
for ap in anti_patterns:
    print(f"  {ap}")

print("\n\nWhen DynamoDB is the RIGHT choice:")
print("  ✓ Predictable single-digit millisecond latency at any scale")
print("  ✓ Massive scale: millions of requests/second")
print("  ✓ Access patterns known and limited in number")
print("  ✓ Serverless / Lambda-based applications")
print("  ✓ Gaming, IoT, mobile backends, session stores")

print("\nWhen DynamoDB is the WRONG choice:")
print("  ✗ Complex queries, ad-hoc analytics → use Aurora/Redshift")
print("  ✗ Heavily relational data queried through ad-hoc joins → use RDBMS")
print("  ✗ Unknown access patterns → use PostgreSQL first")
print("  ✗ Transactions across multiple tables → DynamoDB transactions help but complex")
EOF
python3 /tmp/dynamodb_design_review.py
DynamoDB Design Checklist
============================================================

Best Practices:
  ✓ Access patterns defined BEFORE schema design
  ✓ Partition key has high cardinality (thousands+ of distinct values)
  ✓ No hot partitions (no single PK gets >1000 WCU/s)
  ...
  ✓ DynamoDB Streams enabled for CDC/event-driven patterns
  ...

Anti-Patterns to Avoid:
  ❌ Using timestamp as partition key (all writes go to same second's partition)
  ...
  ❌ Scan on large tables (reads ENTIRE table, expensive)
  ...
  ❌ Using DynamoDB like a relational DB (JOINs don't exist)
  ...
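The "no hot partitions" item on the checklist can be sanity-checked before launch by looking at how a sample of write traffic spreads over partition key values. The sketch below is one rough way to do that, assuming you can export the partition keys of recent or simulated writes as a list; the function name `key_skew` and the sample data are made up for illustration.

```python
from collections import Counter


def key_skew(partition_keys, top_n=3):
    """Return the top_n partition keys with their share of all sampled writes."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(key, count / total) for key, count in counts.most_common(top_n)]


# Low-cardinality key: one value takes most of the traffic.
by_status = ["STATUS#pending"] * 80 + ["STATUS#completed"] * 15 + ["STATUS#cancelled"] * 5

# High-cardinality key: traffic is spread over many users.
by_user = [f"USER#{i % 50}" for i in range(100)]

for label, sample in (("status as PK", by_status), ("userId as PK", by_user)):
    top_key, share = key_skew(sample, top_n=1)[0]
    print(f"{label}: hottest key {top_key} receives {share:.0%} of writes")
```

A hottest-key share close to 1 divided by the number of distinct keys suggests an even spread; a share that stays large as traffic grows is the signal to reconsider the key or add write sharding.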