Lab 16: MongoDB Sharding

Time: 45 minutes | Level: Advanced | DB: MongoDB 7

Overview

MongoDB sharding horizontally distributes data across multiple replica sets (shards). A mongos router transparently directs queries to the correct shard(s). This lab demonstrates range sharding, hashed sharding, and the balancer.


Step 1: Architecture Overview and Setup

# MongoDB sharding requires:
# 1. Config servers (3-node replica set) - store cluster metadata
# 2. Shard servers (1+ replica sets) - store actual data
# 3. mongos router - application connection point

docker network create mongo-shard-net

# Start config server (single node for lab simplicity)
docker run -d \
  --name mongo-config \
  --network mongo-shard-net \
  --hostname mongo-config \
  mongo:7 \
  --configsvr \
  --replSet configRs \
  --bind_ip_all

# Start shard 1
docker run -d \
  --name mongo-shard1 \
  --network mongo-shard-net \
  --hostname mongo-shard1 \
  mongo:7 \
  --shardsvr \
  --replSet shard1Rs \
  --bind_ip_all

# Start shard 2
docker run -d \
  --name mongo-shard2 \
  --network mongo-shard-net \
  --hostname mongo-shard2 \
  mongo:7 \
  --shardsvr \
  --replSet shard2Rs \
  --bind_ip_all

# Wait for all to start
sleep 10
for svc in mongo-config mongo-shard1 mongo-shard2; do
  for i in $(seq 1 30); do
    docker exec $svc mongosh --quiet --eval "db.runCommand({ping:1})" 2>/dev/null | grep -q "ok.*1" && break || sleep 2
  done
  echo "$svc ready"
done

📸 Verified Output:


Step 2: Initialize Config Server and Shards as Replica Sets
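
The commands for this step are missing here; a minimal sketch, assuming the container names from Step 1 and the defaults that --configsvr (port 27019) and --shardsvr (port 27018) imply:

```shell
# Initialize the config server replica set (single member for lab simplicity)
docker exec mongo-config mongosh --quiet --eval '
  rs.initiate({_id: "configRs", configsvr: true,
               members: [{_id: 0, host: "mongo-config:27019"}]})'

# Initialize each shard as its own single-member replica set
docker exec mongo-shard1 mongosh --quiet --eval '
  rs.initiate({_id: "shard1Rs", members: [{_id: 0, host: "mongo-shard1:27018"}]})'

docker exec mongo-shard2 mongosh --quiet --eval '
  rs.initiate({_id: "shard2Rs", members: [{_id: 0, host: "mongo-shard2:27018"}]})'
```

In production each of these replica sets would have three members; a single member is enough for the lab.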

📸 Verified Output:


Step 3: Add Shards to the Cluster
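
No commands are shown for this step; a sketch under the Step 1 setup. The router container name mongo-router is an illustrative choice, not from the original:

```shell
# Start a mongos router pointed at the config server replica set.
# mongos listens on 27017 by default; publish it for local clients.
docker run -d \
  --name mongo-router \
  --network mongo-shard-net \
  --hostname mongo-router \
  -p 27017:27017 \
  mongo:7 \
  mongos \
  --configdb configRs/mongo-config:27019 \
  --bind_ip_all

sleep 5

# Register both shards, then list them back
docker exec mongo-router mongosh --quiet --eval '
  sh.addShard("shard1Rs/mongo-shard1:27018");
  sh.addShard("shard2Rs/mongo-shard2:27018");
  printjson(db.adminCommand({listShards: 1}))'
```

Note that sh.addShard() takes the replica set name, a slash, and at least one member's host:port.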

📸 Verified Output:


Step 4: Enable Sharding on Database and Collection
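
A sketch of this step, assuming a mongos container named mongo-router; the database and collection (shopdb.orders) and the hashed shard key (customerId) are illustrative names, not from the original:

```shell
# Enable sharding on the database, then shard a collection with a
# hashed shard key for even write distribution.
docker exec mongo-router mongosh --quiet --eval '
  sh.enableSharding("shopdb");
  sh.shardCollection("shopdb.orders", {customerId: "hashed"})'
```

With a hashed key, MongoDB pre-splits the hash range into initial chunks spread across both shards.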

📸 Verified Output:


Step 5: Insert Data and Observe Shard Distribution
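
The insert commands are not shown; a sketch, reusing the illustrative shopdb.orders collection and assuming a mongos container named mongo-router:

```shell
# Bulk-insert documents through mongos, then report how the data
# is spread across shards.
docker exec mongo-router mongosh --quiet --eval '
  const docs = [];
  for (let i = 0; i < 10000; i++) {
    docs.push({customerId: i, total: Math.random() * 100});
  }
  const orders = db.getSiblingDB("shopdb").orders;
  orders.insertMany(docs);
  orders.getShardDistribution()'
```

getShardDistribution() prints per-shard document and data-size percentages; with a hashed key each shard should hold roughly half the documents.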

📸 Verified Output:


Step 6: sh.status() — Detailed Cluster Status
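
A sketch of inspecting the cluster, assuming a mongos container named mongo-router:

```shell
# Full cluster report: shards, databases, and per-collection chunk layout
docker exec mongo-router mongosh --quiet --eval 'sh.status()'

# Is the balancer enabled?
docker exec mongo-router mongosh --quiet --eval 'sh.getBalancerState()'
```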

📸 Verified Output:

💡 Hashed sharding divides the hash space (MinKey to MaxKey) into chunks. MongoDB's balancer automatically moves chunks between shards to maintain even distribution.


Step 7: Range vs Hashed Shard Key Analysis
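
The analysis commands are missing; a sketch using explain() to contrast query targeting, again assuming the illustrative shopdb.orders collection (hashed key customerId) and a mongos container named mongo-router:

```shell
docker exec mongo-router mongosh --quiet --eval '
  const orders = db.getSiblingDB("shopdb").orders;

  // Equality match on the hashed shard key: mongos can target one shard,
  // so the top-level plan stage is typically SINGLE_SHARD.
  const point = orders.find({customerId: 42}).explain();
  print("point lookup:", point.queryPlanner.winningPlan.stage);

  // Range predicate on a hashed key: hashing destroys ordering, so the
  // query broadcasts to all shards (typically SHARD_MERGE).
  const range = orders.find({customerId: {$gt: 100, $lt: 200}}).explain();
  print("range scan:  ", range.queryPlanner.winningPlan.stage)'
```

A range-sharded collection would show the opposite trade-off: shard-key range queries stay targeted, but monotonically increasing keys funnel all inserts to one chunk.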

📸 Verified Output:

💡 Range shard key: great for range queries but risk of hotspots. Hashed shard key: great for even distribution and point lookups, but range queries must hit all shards.


Step 8: Capstone — Add a Third Shard and Watch Balancer
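
A sketch of the capstone, following the same pattern as Steps 1–3 and assuming a mongos container named mongo-router:

```shell
# Start and initialize a third shard
docker run -d \
  --name mongo-shard3 \
  --network mongo-shard-net \
  --hostname mongo-shard3 \
  mongo:7 \
  --shardsvr \
  --replSet shard3Rs \
  --bind_ip_all

sleep 10
docker exec mongo-shard3 mongosh --quiet --eval '
  rs.initiate({_id: "shard3Rs", members: [{_id: 0, host: "mongo-shard3:27018"}]})'

# Register it with the cluster
docker exec mongo-router mongosh --quiet --eval '
  sh.addShard("shard3Rs/mongo-shard3:27018")'

# Watch the balancer migrate chunks: re-run this and the per-shard
# counts should converge toward equal.
docker exec mongo-router mongosh --quiet --eval '
  db.getSiblingDB("config").chunks.aggregate([
    {$group: {_id: "$shard", chunks: {$sum: 1}}}
  ]).forEach(printjson)'
```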

📸 Verified Output:


Summary

Component            | Role                                      | Command / Notes
---------------------|-------------------------------------------|-----------------------------------------------
Config servers       | Cluster metadata (sharding topology)      | --configsvr --replSet configRs
Shards               | Actual data storage (each a replica set)  | --shardsvr --replSet shardNRs
mongos               | Query router (application connects here)  | mongos --configdb configRs/host:port
sh.addShard()        | Register shard with cluster               | sh.addShard('rsName/host:port')
sh.enableSharding()  | Enable sharding on database               | sh.enableSharding('dbname')
sh.shardCollection() | Shard a collection                        | sh.shardCollection('db.col', {key: 1})
sh.status()          | Full cluster status                       | Shows shards, chunks, databases
Balancer             | Automatic chunk redistribution            | Runs in background; use sh.getBalancerState()

Key Takeaways

  • mongos is stateless — add more mongos instances to scale query routing horizontally

  • Hashed shard key = even distribution, good for high-write workloads

  • Range shard key = efficient range queries, risk of hotspots with sequential keys

  • The balancer runs automatically — chunks move between shards for even distribution

  • Choose your shard key early — resharding an existing collection (reshardCollection, MongoDB 5.0+) is possible but resource-intensive and slow
