Lab 05: Capstone — HA Two-Node Cluster

Lab 05: Capstone — HA Two-Node Cluster Blueprint

Time: 45 minutes | Level: Architect | Docker: docker run -it --rm --privileged ubuntu:22.04 bash


Overview

This capstone lab brings together everything from Labs 01–04 into a complete, production-ready two-node High Availability cluster blueprint. You will design and document the full configuration: Corosync messaging, Pacemaker cluster properties, Virtual IP resource, HAProxy as a cluster resource, health check scripts, failover testing procedures, monitoring, and generate a complete operational runbook.

Learning Objectives:

  • Integrate Corosync + Pacemaker + HAProxy + Keepalived concepts

  • Write production-grade cluster configurations

  • Design comprehensive health check scripts

  • Create failover testing procedures

  • Set up monitoring with crm_mon

  • Generate an operational runbook document


Step 1: Install the Complete HA Stack

apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y \
    pacemaker corosync pcs \
    haproxy keepalived \
    python3 curl iproute2 \
    2>/dev/null

echo "=== Installed versions ==="
echo -n "Pacemaker: "; pacemakerd --version 2>&1 | head -1
echo -n "Corosync: "; corosync -v 2>&1 | head -1
echo -n "HAProxy: "; haproxy -v 2>&1 | head -1
echo -n "Keepalived: "; keepalived --version 2>&1 | head -1
pcs --version

📸 Verified Output:

💡 Tip: In production, always verify package versions match between nodes before forming a cluster. Mixed Pacemaker versions can cause CIB schema compatibility issues.


Step 2: Corosync Configuration

The foundation of any Pacemaker cluster is the Corosync messaging layer. Here is the production configuration:

📸 Verified Output:

💡 Tip: Always use ring1_addr for a second cluster heartbeat network. If ring0 fails, Corosync fails over to ring1, preventing a false node failure declaration. Use a dedicated VLAN for cluster heartbeat traffic.


Step 3: Pacemaker Cluster Properties

📸 Verified Output:

Cluster property configuration (syntax for running cluster):

📸 Verified Output:

💡 Tip: no-quorum-policy=ignore is safe for 2-node clusters because with only 2 nodes, quorum can never be achieved after one fails. Use no-quorum-policy=stop for 3+ node clusters.


Step 4: Virtual IP Resource Configuration

📸 Verified Output:


Step 5: HAProxy Configuration as Cluster Resource

📸 Verified Output:

💡 Tip: When HAProxy is managed by Pacemaker, disable the systemd auto-start: systemctl disable haproxy. Pacemaker will start/stop it based on cluster state. If both Pacemaker and systemd try to manage haproxy, you'll get conflicts.


Step 6: Health Check Scripts

📸 Verified Output:


Step 7: Failover Testing Procedure

Test 2: Node Standby (Graceful Failover)

Test 3: Service Kill (Application Failure Simulation)

Test 4: Hard Node Failure (Physical Simulation)

Test 5: VIP Failover Validation

Test 6: HAProxy Backend Failover

Expected Failover Times

Failure Type
Detection Time
Recovery Time
Total Outage

Process killed

15s (monitor)

30s

45s

Node standby

Immediate

30-60s

30-60s

Node hard crash

3-10s (token)

45-90s

60-120s

Backend failure

6-9s (HAProxy check)

Instant (other backends)

0s (no VIP change)

Post-Failover Checks

EOF

cat /tmp/failover-test-procedure.md | head -50 echo "..." echo "Failover test procedure written ($(wc -l < /tmp/failover-test-procedure.md) lines)"

Cluster Failover Test Procedure

Pre-Test Checklist

📸 Verified Output:


3. Common Operations

3.1 Planned Maintenance (Node1)

3.2 HAProxy Config Update (Zero-Downtime)

3.3 Add New Backend Server

3.4 Emergency — Force Resource to Specific Node


4. Monitoring Commands


5. Troubleshooting

5.1 Resources Not Starting

5.2 Node Not Joining Cluster

5.3 STONITH Failure

5.4 Split-Brain Recovery


6. Emergency Contacts

Role
Contact
Phone

Primary On-Call

+1-555-0100

Backup On-Call

+1-555-0101

Vendor Support

RedHat/SUSE Support

Contract #


7. Runbook Sign-off

Date
Engineer
Change
Tested

2026-03-05

Infrastructure Team

Initial version

Yes

RUNBOOK

echo "Runbook generated:" wc -l /tmp/ha-cluster-runbook.md echo "" echo "=== Runbook Preview (first 30 lines) ===" head -30 /tmp/ha-cluster-runbook.md

Runbook generated: 155 /tmp/ha-cluster-runbook.md

=== Runbook Preview (first 30 lines) ===

HA Cluster Operational Runbook

Cluster: prod-ha-cluster

Version: 1.0 | Date: 2026-03-05

Maintainer: Operations Team [email protected]envelope


1. Cluster Overview

Item
Value

Cluster Name

prod-ha-cluster

Software

Pacemaker 2.1.2 + Corosync 3.1.6

Nodes

node1 (10.0.1.11) + node2 (10.0.1.12)

VIP

10.0.1.100

Load Balancer

HAProxy 2.4.x

Fencing

fence_ipmilan (IPMI)

Monitoring

crm_mon + Prometheus Node Exporter


2. Daily Health Checks

...

Last updated