Lab 01: High Availability & Pacemaker/Corosync
Time: 45 minutes | Level: Architect | Docker: docker run -it --rm --privileged ubuntu:22.04 bash
Overview
In this lab you will explore enterprise High Availability (HA) concepts and the Pacemaker/Corosync stack — the industry-standard open-source clustering solution used in RHEL, SUSE, and Ubuntu Server environments. You will understand SPOF elimination, quorum, fencing, and how the cluster resource manager orchestrates service failover.
Learning Objectives:
Understand HA concepts: SPOF, failover, quorum, fencing/STONITH
Explore Pacemaker architecture: CRM, LRM, Policy Engine
Configure Corosync messaging layer
Work with cluster resources: primitive, group, clone
Use pcs and crm_mon for cluster management
Step 1: Install the HA Stack
Install Pacemaker, Corosync, and the pcs management tool:
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y pacemaker corosync pcs
Verify versions:
📸 Verified Output:
💡 Tip: In production, nodes must have identical software versions. Mixed-version clusters can cause split-brain or unexpected resource behaviour.
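Assuming the packages installed cleanly, a quick way to confirm that all nodes run matching versions (run these on every node and compare):

```shell
pacemakerd --version    # Pacemaker release
corosync -v             # Corosync version and build flags
pcs --version           # pcs management tool version
```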
Step 2: HA Concepts — SPOF, Failover, Quorum
Single Point of Failure (SPOF): Any component whose failure causes total service outage.
SPOF: Component with no redundancy (single NIC, single node)
Failover: Automatic service restart on a surviving node
Quorum: Majority vote to determine cluster health (N/2+1)
Fencing/STONITH: "Shoot The Other Node In The Head"; isolate a failed node
VIP: Virtual IP that floats between nodes
Two-node quorum problem:
A 2-node cluster cannot achieve quorum after one failure without special configuration:
Solution: Set two_node: 1 in Corosync's votequorum configuration (preferred over the legacy Pacemaker property no-quorum-policy=ignore), or add a quorum device (qdevice).
📸 Verified Output:
💡 Tip: In 3+ node clusters, always use an odd number of nodes. A 4-node cluster has the same fault tolerance as a 3-node cluster (both can lose 1 node).
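The quorum arithmetic behind that tip can be checked directly. A small sketch computing the votes required (floor(N/2) + 1) and the resulting fault tolerance for various cluster sizes:

```shell
# Votes required for quorum in an N-node cluster: floor(N/2) + 1.
# Fault tolerance = N minus the quorum threshold.
quorum() { echo $(( $1 / 2 + 1 )); }

for n in 2 3 4 5; do
  q=$(quorum "$n")
  echo "nodes=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

Note that 3-node and 4-node clusters both tolerate exactly one failure, which is why the extra fourth node buys nothing.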
Step 3: Pacemaker Architecture
Pacemaker consists of three key subsystems:
📸 Verified Output:
💡 Tip: Resource agents follow the OCF (Open Cluster Framework) standard. They accept start, stop, monitor, meta-data, and validate-all actions.
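To make the action contract concrete, here is a minimal sketch of an OCF-style agent for a hypothetical "dummy" service. Real agents live under /usr/lib/ocf/resource.d/ and source ocf-shellfuncs; this only illustrates the action names and exit codes Pacemaker's LRM relies on:

```shell
# Hypothetical state file standing in for a real service's running state.
STATEFILE="/tmp/dummy-ra.$$"

dummy_ra() {
  case "$1" in
    start)        touch "$STATEFILE"; return 0 ;;                 # OCF_SUCCESS
    stop)         rm -f "$STATEFILE"; return 0 ;;
    monitor)      [ -f "$STATEFILE" ] && return 0 || return 7 ;;  # 7 = OCF_NOT_RUNNING
    meta-data)    echo '<resource-agent name="dummy"/>'; return 0 ;;
    validate-all) return 0 ;;
    *)            return 3 ;;                                     # OCF_ERR_UNIMPLEMENTED
  esac
}

# Exercise the lifecycle the LRM would drive:
rc=0; dummy_ra monitor || rc=$?
echo "before start: rc=$rc"      # 7: not running yet
dummy_ra start
dummy_ra monitor
echo "after start: rc=$?"        # 0: running
dummy_ra stop
```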
Step 4: Corosync Configuration
Corosync provides the messaging and membership layer. Examine the default config:
📸 Verified Output:
Production 2-node corosync.conf:
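A representative sketch for a two-node cluster, with placeholder node names and addresses:

```
totem {
    version: 2
    cluster_name: hacluster
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        name: node1
        nodeid: 1
        ring0_addr: 192.168.1.11
    }
    node {
        name: node2
        nodeid: 2
        ring0_addr: 192.168.1.12
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
```

The two_node: 1 setting is the votequorum workaround for the two-node quorum problem described in Step 2.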
💡 Tip: Use corosync-keygen to generate the /etc/corosync/authkey file for encrypted cluster communication. Copy it to all nodes before starting the cluster.
Step 5: PCS Cluster Management Commands
pcs (Pacemaker/Corosync Configuration System) is the unified management tool:
📸 Verified Output:
Key pcs commands for architects:
📸 Verified Output:
💡 Tip: Run
pcs resource listto see all available OCF/LSB/systemd resource agents.pcs resource describe ocf:heartbeat:IPaddr2gives full documentation for a specific agent.
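A representative command set for the core lifecycle (hostnames are placeholders; these require pcsd running on every node):

```shell
pcs host auth node1 node2 node3 -u hacluster    # authenticate nodes (prompts for password)
pcs cluster setup mycluster node1 node2 node3   # generate and sync corosync.conf
pcs cluster start --all                         # start corosync + pacemaker everywhere
pcs status                                      # overall cluster state
pcs quorum status                               # votes, expected votes, quorum
pcs cluster stop --all
```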
Step 6: Cluster Resources — Primitives, Groups, Clones
Resource Types:
Creating cluster resources with pcs (syntax — requires running cluster):
📸 Verified Output (pcs resource list sample):
💡 Tip: Always create a colocation constraint between your VIP and the service it fronts. Without it, HAProxy might run on node1 while the VIP is on node2.
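Putting the tip into practice, a floating VIP fronting HAProxy might be defined like this (the IP address and resource names are placeholders; these commands require a running cluster):

```shell
# VIP managed by the OCF IPaddr2 agent
pcs resource create webvip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=10s

# HAProxy via its systemd unit
pcs resource create haproxy systemd:haproxy op monitor interval=15s

# Pin HAProxy to wherever the VIP runs, and start the VIP first
pcs constraint colocation add haproxy with webvip INFINITY
pcs constraint order webvip then haproxy
```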
Step 7: Monitoring with crm_mon and STONITH
crm_mon provides real-time cluster status:
📸 Verified Output:
Sample crm_mon output (from a running cluster):
STONITH (fencing) configuration:
💡 Tip: Never disable STONITH in production! Without fencing, a failed node may continue running services, causing data corruption (split-brain). Use fence_ipmilan, fence_apc, or fence_aws depending on your environment.
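A fencing setup along these lines, using the IPMI agent (BMC address and credentials are placeholders):

```shell
pcs stonith create fence-node1 fence_ipmilan \
    ip=192.168.2.11 username=admin password=secret \
    pcmk_host_list=node1 op monitor interval=60s
pcs property set stonith-enabled=true
pcs stonith status
```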
Step 8: Capstone — Architect a 3-Node HA Cluster Blueprint
Scenario: Your company runs a critical internal web application. Design a fault-tolerant 3-node Pacemaker cluster that:
Survives loss of any single node
Has proper fencing to prevent split-brain
Provides a floating VIP for the application
Runs HAProxy for load balancing
Capstone Solution Blueprint:
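One way to sketch the blueprint as a pcs command sequence (all names, addresses, and credentials are placeholders to adapt to your environment):

```shell
# 1. Form the 3-node cluster (odd node count: survives any single node loss)
pcs host auth node1 node2 node3 -u hacluster
pcs cluster setup webcluster node1 node2 node3
pcs cluster start --all

# 2. Fencing first: one IPMI fence device per node, then enable STONITH
for n in 1 2 3; do
  pcs stonith create fence-node$n fence_ipmilan \
      ip=192.168.2.1$n username=admin password=secret pcmk_host_list=node$n
done
pcs property set stonith-enabled=true

# 3. Floating VIP + HAProxy, grouped so they fail over together
pcs resource create webvip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=10s
pcs resource create haproxy systemd:haproxy op monitor interval=15s
pcs resource group add webstack webvip haproxy
```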
Verify the design:
📸 Verified Output:
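Typical verification commands against the live cluster:

```shell
crm_verify --live-check -V   # validate the running CIB; exit 0 = valid
pcs status --full            # nodes, resources, and fence devices
pcs quorum status            # expected vs. actual votes
```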
Summary
Cluster setup: pcs cluster setup (initialize Pacemaker/Corosync)
Node auth: pcs host auth (authenticate cluster nodes)
Resource create: pcs resource create (define managed services)
Constraints: pcs constraint (placement, ordering, colocation)
Monitoring: crm_mon -r (real-time cluster status)
Fencing: pcs stonith create (STONITH fence agents)
Quorum: pcs quorum config (view/set quorum options)
CIB XML: crm_verify (validate cluster config)
Corosync config: /etc/corosync/corosync.conf (messaging layer config)
Resource agents: /usr/lib/ocf/resource.d/ (OCF RA scripts)