Enterprise Linux clustering requires more than just failover — it requires coordinated access to shared storage. In this lab you will explore cluster storage architectures including GFS2 and OCFS2 cluster filesystems, DRBD block-level replication, DLM distributed lock management, cluster LVM, and fencing devices that protect data integrity in split-brain scenarios.
Learning Objectives:
Distinguish shared-nothing vs shared-disk cluster architectures
Understand GFS2/OCFS2 cluster filesystem concepts and configuration
💡 Tip: In a real production environment, you would also install drbd-utils, gfs2-utils, ocfs2-tools, and dlm-controld. These require kernel modules and real block devices, which this lab's containers cannot provide, since Docker containers share the host kernel rather than loading their own modules.
💡 Tip: DRBD Protocol C is synchronous — the write is only acknowledged to the application when BOTH nodes have written the data to disk. This guarantees zero data loss on failover but adds write latency proportional to network RTT. Use Protocol A for geo-replication where some data loss is acceptable.
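The replication protocol is set in the DRBD resource file. Below is a minimal sketch of such a resource definition (DRBD 8.4-style syntax; the resource name r0, node names, IPs, and backing device are placeholders for illustration — adapt them to your cluster):

```
# /etc/drbd.d/r0.res — illustrative resource definition
resource r0 {
    protocol C;               # synchronous: write is acked only after both disks commit
    device    /dev/drbd0;     # the replicated device applications use
    disk      /dev/sdb;       # node-local backing device
    meta-disk internal;       # store DRBD metadata at the end of the backing disk
    on node1 {
        address 10.0.1.11:7789;
    }
    on node2 {
        address 10.0.1.12:7789;
    }
}
```

Changing `protocol C;` to `protocol A;` switches to asynchronous replication for latency-tolerant geo-replication setups.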
Step 4: GFS2 — Global Filesystem 2
GFS2 is a cluster filesystem allowing simultaneous read/write from multiple nodes on shared storage:
📸 Verified Output:
💡 Tip: GFS2 requires fencing (STONITH). Without proper fencing, if a node fails mid-write, surviving nodes cannot be sure the failed node stopped writing. The filesystem could be corrupted. Never disable STONITH on a GFS2 cluster.
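On a real cluster, creating and mounting a GFS2 filesystem looks roughly like the sketch below. This is illustrative only: it assumes a running Pacemaker/Corosync cluster named mycluster, gfs2-utils installed, and a shared block device (here /dev/vdb, a placeholder) — do not run it against a disk you care about.

```bash
# -p lock_dlm : use DLM for cluster-wide locking
# -t cluster:fsname : lock table; the first part must match the cluster name
# -j 2 : one journal per node that will mount the filesystem
mkfs.gfs2 -p lock_dlm -t mycluster:shared_fs -j 2 /dev/vdb

# Mount on each node (normally managed as a Pacemaker Filesystem resource)
mount -t gfs2 /dev/vdb /mnt/shared

# Inspect the superblock, including the lock table and protocol
tunegfs2 -l /dev/vdb
```

Note the journal count: if you later add a third node, you must add a journal with `gfs2_jadd` before that node can mount the filesystem.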
Step 5: OCFS2 — Oracle Cluster Filesystem 2
OCFS2 is Oracle's cluster filesystem, commonly used for VM image storage and Oracle database files:
📸 Verified Output:
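The OCFS2 workflow parallels GFS2 but uses Oracle's O2CB cluster stack instead of Pacemaker/DLM. A rough sketch, assuming ocfs2-tools is installed, the cluster is defined in /etc/ocfs2/cluster.conf, and /dev/vdb is a placeholder shared device:

```bash
# Bring the O2CB cluster stack online (cluster name from cluster.conf)
o2cb register-cluster mycluster

# -L : volume label; -N 2 : allocate node slots for two nodes
mkfs.ocfs2 -L shared_vol -N 2 /dev/vdb

# Mount on each node
mount -t ocfs2 /dev/vdb /mnt/shared

# List detected OCFS2 volumes and which nodes have them mounted
mounted.ocfs2 -f
```

The node-slot count (-N) plays the same role as GFS2's journal count: it caps how many nodes can mount the volume concurrently.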
Step 6: DLM — Distributed Lock Manager
DLM provides distributed locking services for GFS2, OCFS2, and cluster LVM:
📸 Verified Output:
💡 Tip: DLM lockspace names must be unique and must match across all cluster nodes. GFS2 uses lock_dlm as its locking protocol; in the lock table set with mkfs.gfs2 -t <cluster>:<fs>, the first component must match the cluster name, and the filesystem name becomes the DLM lockspace name.
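On a live cluster you can inspect DLM directly with dlm_tool (from the dlm package; requires dlm_controld to be running, so these commands are illustrative here and <lockspace> is a placeholder):

```bash
# Overall daemon and cluster membership status
dlm_tool status

# List active lockspaces — one per mounted GFS2 filesystem,
# plus lockspaces for cluster LVM if in use
dlm_tool ls

# Dump the locks held in a given lockspace (debugging lock contention)
dlm_tool lockdump <lockspace>
```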
Step 7: Cluster LVM (clvmd) and Fencing Devices
Cluster LVM:
📸 Verified Output:
💡 Tip: fence_ipmilan requires the lanplus option for IPMI 2.0 (needed by most modern servers with iDRAC/iLO). Test fencing manually before relying on it: fence_ipmilan -a <ip> -l admin -p secret --lanplus -o status. Never configure fencing that you haven't tested!
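Putting the two together, a sketch of preparing a shared volume group and verifying fencing (device names, VG name, IPMI address, and credentials are all placeholders; the first form assumes the modern lvmlockd stack, the commented alternative the legacy clvmd stack):

```bash
# Modern stack (lvmlockd + DLM, RHEL 8+ style):
vgcreate --shared vg_cluster /dev/vdb   # create a shared (cluster-aware) VG
vgchange --lock-start vg_cluster        # start its lockspace on each node

# Legacy stack (clvmd, RHEL 6/7 style):
# lvmconf --enable-cluster              # sets locking_type=3 in lvm.conf

# Verify the fence device responds BEFORE trusting it in production
fence_ipmilan -a 10.0.0.11 -l admin -p secret --lanplus -o status
```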
Step 8: Capstone — Design a Shared Storage Cluster
Scenario: Design a 2-node database cluster using DRBD + GFS2 + Pacemaker with proper fencing for a PostgreSQL HA solution.
cat > /tmp/cluster-storage-blueprint.md << 'EOF'
# Two-Node PostgreSQL HA Cluster Blueprint
## Architecture Decision: DRBD (shared-nothing)
Chosen because:
- No SAN required (cost savings)
- Node-local SSDs for optimal IOPS
- Protocol C ensures zero data loss
- Works in any datacenter/cloud
## Hardware Layout
node1: 10.0.1.11, 32GB RAM, 2x NVMe (OS+Data), IPMI at 10.0.0.11
node2: 10.0.1.12, 32GB RAM, 2x NVMe (OS+Data), IPMI at 10.0.0.12
## Storage Layer: DRBD
- Device: /dev/sdb (500GB NVMe) → /dev/drbd0
- Protocol: C (synchronous)
- Sync rate: 200 MB/s (limited to protect prod workload)
- Filesystem: XFS (PostgreSQL data directory)
## Cluster Layer: Pacemaker + Corosync
Resources (in order):
1. StonithIPMI-node2 (fence_ipmilan)
2. StonithIPMI-node1 (fence_ipmilan)
3. DrbdData (ocf:linbit:drbd) — DRBD promotable clone
4. DrbdDataFS (ocf:heartbeat:Filesystem) — XFS mount
5. ClusterVIP (ocf:heartbeat:IPaddr2) — floating 10.0.1.100
6. PostgreSQL (ocf:heartbeat:pgsql) — database service
## Resource Constraints
- DrbdDataFS REQUIRES DrbdData Primary
- ClusterVIP REQUIRES DrbdDataFS on same node
- PostgreSQL REQUIRES ClusterVIP on same node
- All resources COLOCATED on same node
## Failover Procedure (Automatic)
1. node1 crashes / network partition
2. Corosync detects node1 loss (token timeout, ~10s)
3. Pacemaker triggers STONITH: fence_ipmilan powers off node1
4. STONITH confirmed → proceed with failover
5. DRBD promotes node2 to Primary
6. XFS mounts /var/lib/postgresql on node2
7. VIP 10.0.1.100 added to node2
8. PostgreSQL starts on node2
9. Total failover time: ~45-90 seconds
## Monitoring Commands
crm_mon -r # Real-time cluster status
drbdadm status # DRBD sync status
xfs_admin -l /dev/drbd0 # XFS label check
pg_isready -h 10.0.1.100 # PostgreSQL health
EOF
cat /tmp/cluster-storage-blueprint.md
# Two-Node PostgreSQL HA Cluster Blueprint
## Architecture Decision: DRBD (shared-nothing)
Chosen because:
- No SAN required (cost savings)
- Node-local SSDs for optimal IOPS
- Protocol C ensures zero data loss
- Works in any datacenter/cloud
...
# Validate LVM is ready for cluster use
lvm version 2>&1 | head -3
lvmconfig --type default locking 2>&1 | head -10 || lvm config locking_type 2>&1 | head -5
echo "Storage cluster blueprint complete"
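The blueprint above could be wired into Pacemaker with pcs commands along these lines. This is a sketch only: it assumes pcs 0.10+ syntax, a DRBD resource named r0, and fence-agents parameter names (ip/username/password) that vary between versions — check pcs(8) and your agent's metadata before use, and create the node2 fence device symmetrically.

```bash
# Fencing first: a STONITH device per node (credentials are placeholders)
pcs stonith create StonithIPMI-node1 fence_ipmilan ip=10.0.0.11 \
    username=admin password=secret lanplus=1 pcmk_host_list=node1

# DRBD as a promotable clone (exactly one Primary at a time)
pcs resource create DrbdData ocf:linbit:drbd drbd_resource=r0 \
    promotable promoted-max=1 clone-max=2 notify=true

# Filesystem, VIP (the pgsql resource would follow the same pattern)
pcs resource create DrbdDataFS ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/var/lib/postgresql fstype=xfs
pcs resource create ClusterVIP ocf:heartbeat:IPaddr2 \
    ip=10.0.1.100 cidr_netmask=24

# Ordering and colocation: mount only after promotion, on the Primary node
pcs constraint order promote DrbdData-clone then start DrbdDataFS
pcs constraint colocation add DrbdDataFS with DrbdData-clone \
    INFINITY with-rsc-role=Promoted
```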