Master MongoDB Atlas architecture: cluster tiers, Atlas Search, Atlas Triggers, Data Federation, and connection management. Simulate Atlas patterns using local mongo:7 with the same pymongo code.
📚 Background
Atlas Cluster Tiers
Tier     RAM     vCPU    Storage  Cost/month  Use Case
M0 Free  512 MB  Shared  512 MB   Free        Dev/prototyping
M10      2 GB    2       10 GB    ~$57        Small production
M20      4 GB    2       20 GB    ~$115       Medium workloads
M30      8 GB    2       40 GB    ~$230       Production baseline
M40      16 GB   4       80 GB    ~$460       High traffic
M50      32 GB   8       160 GB   ~$920       Heavy workloads
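As a rough illustration of how the tier table might feed a sizing decision, the sketch below picks the smallest dedicated tier whose RAM covers an estimated working set. The RAM figures are copied from the table; the `smallest_tier` helper and the 1.5x headroom factor are assumptions made for this example, not Atlas sizing guidance.

```python
# RAM per dedicated tier in GB, copied from the tier table.
TIERS = [("M10", 2), ("M20", 4), ("M30", 8), ("M40", 16), ("M50", 32)]


def smallest_tier(working_set_gb, headroom=1.5):
    """Return the first tier whose RAM covers working set * headroom.

    The headroom factor is an assumed safety margin, not an Atlas rule.
    Returns None when no listed tier is large enough.
    """
    needed_gb = working_set_gb * headroom
    for name, ram_gb in TIERS:
        if ram_gb >= needed_gb:
            return name
    return None


if __name__ == "__main__":
    for size in (1, 5, 30):
        print(f"working set {size} GB -> {smallest_tier(size)}")
```

Real sizing also depends on IOPS, connection limits, and storage growth, so treat the result as a starting point only.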
Atlas Features
Feature            Description
Atlas Search       Lucene-based full-text search, built into Atlas
Atlas Triggers     Database, Scheduled, and Authentication triggers run serverless functions
Data Federation    Query S3, Atlas, and HTTP sources with a unified aggregation pipeline
Connection string  mongodb+srv:// scheme; a DNS SRV record handles host discovery
w=majority         Write acknowledged by a majority of replica set members; safe default
PITR               Point-in-time restore to any second within a 7-day window (M10+)
Atlas Charts       Built-in BI/dashboard tool; no additional cost for M10+
💡 Architect's insight: Atlas Search removes the need for a separate Elasticsearch deployment in most use cases, so if you're already on Atlas, use it. The biggest saving with Atlas is operations time, not just money.
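To make the Atlas Search point concrete, here is a minimal sketch of a `$search` aggregation pipeline built as plain Python data. The `build_search_pipeline` helper, the `description` field, and the `default` index name are assumptions for illustration. The pipeline only builds the query document; running it would need a collection with an Atlas Search index, which a plain local mongo:7 container is not expected to provide.

```python
def build_search_pipeline(term, path="description", index="default", limit=10):
    """Build an Atlas Search pipeline as plain dicts (nothing is executed here).

    `path` and `index` are hypothetical names; adjust them to your own schema.
    """
    return [
        # Full-text match on one field using the Atlas Search `text` operator.
        {"$search": {"index": index, "text": {"query": term, "path": path}}},
        {"$limit": limit},
        # Expose the relevance score next to the matched field.
        {"$project": {"_id": 1, path: 1, "score": {"$meta": "searchScore"}}},
    ]


if __name__ == "__main__":
    for stage in build_search_pipeline("wireless headphones"):
        print(stage)
```

With pymongo, the list would be passed to `collection.aggregate(...)` on an Atlas-backed collection.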
Atlas Triggers Simulation
==================================================
1. Database Trigger (INSERT on orders):
[Trigger: ORDER_INSERTED]
orderId: order-demo-001
→ Send confirmation email to user alice
→ Decrement inventory for 1 items
→ Create invoice record
2. Scheduled Trigger:
[Trigger: DAILY_REPORT] 2024-03-01
→ Generate daily sales report
→ Archive orders older than 90 days to cold storage
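The script that produced the output above is not shown in this section. A minimal sketch of how such a simulation could look follows, assuming the trigger receives an event shaped like a MongoDB change event (`operationType`, `fullDocument`). The handler names, the `HANDLERS` registry, and the demo document are hypothetical.

```python
from datetime import date


def on_order_inserted(event):
    """Simulated database trigger: react to an insert on the orders collection."""
    order = event["fullDocument"]
    return [
        "[Trigger: ORDER_INSERTED]",
        f"orderId: {order['_id']}",
        f"→ Send confirmation email to user {order['userId']}",
        f"→ Decrement inventory for {len(order['items'])} items",
        "→ Create invoice record",
    ]


def on_daily_schedule(run_date):
    """Simulated scheduled trigger: the work a daily job would kick off."""
    return [
        f"[Trigger: DAILY_REPORT] {run_date.isoformat()}",
        "→ Generate daily sales report",
        "→ Archive orders older than 90 days to cold storage",
    ]


# Map change-event operation types to handlers (only inserts are handled here).
HANDLERS = {"insert": on_order_inserted}


def dispatch(event):
    """Route an event to its handler; unhandled operation types produce no actions."""
    handler = HANDLERS.get(event["operationType"])
    return handler(event) if handler else []


demo_event = {
    "operationType": "insert",
    "ns": {"db": "shop", "coll": "orders"},
    "fullDocument": {"_id": "order-demo-001", "userId": "alice", "items": ["sku-1"]},
}

if __name__ == "__main__":
    for line in dispatch(demo_event) + on_daily_schedule(date(2024, 3, 1)):
        print(line)
```

Against a local replica set, the same `dispatch` function could be fed from `collection.watch()`; that wiring is left out so the sketch runs without a database.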
cat > /tmp/atlas_federation.py << 'EOF'
"""
Atlas Data Federation: Query S3, MongoDB, and HTTP sources with one aggregation pipeline.
"""
print("Atlas Data Federation")
print("="*55)
print("""
Architecture:
┌─────────────────────────────────────────┐
│ Atlas Federated Query │
│ (Virtual Namespace Abstraction) │
└─────┬───────────┬────────────┬──────────┘
│ │ │
Atlas DB S3 Bucket HTTP URL
(real-time) (archive) (API data)
Query unifies data from multiple sources:
""")
# Atlas Data Federation query examples
federation_examples = [
{
"scenario": "Query current + archived orders",
"query": """
db.getSiblingDB("VirtualDB").getCollection("AllOrders").aggregate([
// Federated query across Atlas (recent) + S3 (archived)
{"$match": {"userId": "alice"}},
{"$sort": {"createdAt": -1}},
{"$limit": 100}
])
// Atlas routes: createdAt within the last 90 days → MongoDB cluster
// createdAt older than 90 days → S3 parquet files
"""
},
{
"scenario": "Join MongoDB users with S3 CSV analytics",
"query": """
db.getSiblingDB("FederatedDB").getCollection("UserAnalytics").aggregate([
{"$lookup": {
"from": "s3_bucket.analytics_csv", // S3 data source
"localField": "_id",
"foreignField": "user_id",
"as": "analytics"
}}
])
"""
}
]
for ex in federation_examples:
print(f"[{ex['scenario']}]")
print(ex['query'])
print("Atlas Backup Strategy:")
print("-"*40)
backup_tiers = [
("M0 Free", "No automated backup"),
("M2/M5", "Daily snapshots, no PITR"),
("M10+", "Continuous backup + PITR up to 7 days"),
("Dedicated", "Snapshots + PITR, configurable retention"),
]
for tier, backup in backup_tiers:
print(f" {tier:<12}: {backup}")
print("\nAtlas Backup Concepts:")
print(" Continuous backup: Oplog captured between snapshots for replay")
print(" PITR: Restore to any second within retention window")
print(" Cross-region backup: Automatic copy to secondary region")
print(" Encryption: Atlas-managed or customer-managed KMS keys")
print(" Cost: ~$2.50/GB/month for backup storage")
EOF
python3 /tmp/atlas_federation.py
Atlas Data Federation
=======================================================
Architecture:
┌─────────────────────────────────────────┐
│ Atlas Federated Query │
└─────┬───────────┬────────────┬──────────┘
│ │ │
Atlas DB S3 Bucket HTTP URL
Atlas Backup Strategy:
M0 Free : No automated backup
M10+ : Continuous backup + PITR up to 7 days
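The 90-day split between the live cluster and the S3 archive can be pictured with a plain-Python sketch. This is only an illustration of the tiering idea, not how Atlas implements federated queries; the source labels, `ARCHIVE_AFTER`, and both helper functions are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed archival policy: orders older than 90 days live in S3.
ARCHIVE_AFTER = timedelta(days=90)


def source_for(created_at, now):
    """Return which store would hold a single order, given its creation time."""
    return "s3_archive" if now - created_at > ARCHIVE_AFTER else "atlas_cluster"


def sources_for_range(start, end, now):
    """Return the stores a query over [start, end] would need to touch."""
    cutoff = now - ARCHIVE_AFTER
    sources = []
    if end >= cutoff:
        sources.append("atlas_cluster")  # part of the range is recent
    if start < cutoff:
        sources.append("s3_archive")     # part of the range is archived
    return sources


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(source_for(now - timedelta(days=10), now))
    print(source_for(now - timedelta(days=200), now))
    print(sources_for_range(now - timedelta(days=200), now, now))
```

A query that filters on `createdAt` gives the federation layer a chance to skip a source entirely; a query without a date filter, like the `userId` match in the example above, would have to consult both.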
cat > /tmp/atlas_connection.py << 'EOF'
"""
MongoDB Atlas connection strings and best practices.
"""
# Atlas connection string anatomy
atlas_connection = "mongodb+srv://username:<password>@<cluster-host>/"
# Connection string options for production
production_options = {
"retryWrites": "true", # Auto-retry transient errors
"w": "majority", # Write concern: wait for majority
"readPreference": "secondaryPreferred", # Read from replicas when possible
"maxPoolSize": "100", # Connection pool size
"serverSelectionTimeoutMS": "5000", # 5s timeout for server selection
"connectTimeoutMS": "10000", # 10s for initial connection
"socketTimeoutMS": "45000", # 45s socket timeout
"tls": "true", # Atlas always uses TLS
}
print("Atlas Connection String Best Practices")
print("="*55)
print(f"\nBase URI: {atlas_connection}")
full_uri = atlas_connection + "?retryWrites=true&w=majority&readPreference=secondaryPreferred"
print(f"\nProduction URI:")
print(f" {full_uri}")
print("\nPymongo Client Options:")
client_code = '''
from pymongo import MongoClient
import certifi
client = MongoClient(
"mongodb+srv://user:<password>@<cluster-host>/",
# Connection pool
maxPoolSize=100,
minPoolSize=10,
maxIdleTimeMS=60000, # Close idle connections after 1 min
# Timeouts
serverSelectionTimeoutMS=5000,
connectTimeoutMS=10000,
socketTimeoutMS=45000,
# Write concern
w="majority",
wTimeoutMS=10000,
# TLS (Atlas requires TLS)
tls=True,
tlsCAFile=certifi.where(),
# Read preference
readPreference="secondaryPreferred",
# Retry
retryWrites=True,
retryReads=True,
)
'''
print(client_code)
print("Connection Pooling guidance:")
print(" Web servers: maxPoolSize = 10-50 per process")
print(" Lambda/FaaS: reuse client outside handler; min pool = 0")
print(" High traffic: maxPoolSize = 100-200, use Atlas proxy")
print(" Atlas Data API: HTTP REST API (no driver needed for simple ops)")
EOF
python3 /tmp/atlas_connection.py
Atlas Connection String Best Practices
=======================================================
Base URI: mongodb+srv://username:<password>@<cluster-host>/
Production URI:
mongodb+srv://...?retryWrites=true&w=majority&readPreference=secondaryPreferred
Connection Pooling guidance:
Web servers: maxPoolSize = 10-50 per process
Lambda/FaaS: reuse client outside handler; min pool = 0
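The script above concatenates the query string by hand and never reads its `production_options` dict. A small sketch of building the URI from an options dict with the standard library is shown below. The `build_atlas_uri` helper, the credentials, and the `cluster0.example.mongodb.net` host are placeholders invented for this example. Percent-encoding the username and password matters because characters such as `@` or `/` would otherwise break URI parsing.

```python
from urllib.parse import quote_plus, urlencode


def build_atlas_uri(user, password, host, options):
    """Assemble a mongodb+srv URI with escaped credentials and encoded options."""
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb+srv://{creds}@{host}/?{urlencode(options)}"


# Hypothetical values for illustration only.
options = {
    "retryWrites": "true",
    "w": "majority",
    "readPreference": "secondaryPreferred",
}
uri = build_atlas_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net", options)

if __name__ == "__main__":
    print(uri)
```

In practice the password would come from an environment variable or a secrets manager rather than a literal in source code.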