Lab 01: High Availability & Pacemaker/Corosync

Time: 45 minutes | Level: Architect | Docker: docker run -it --rm --privileged ubuntu:22.04 bash


Overview

In this lab you will explore enterprise High Availability (HA) concepts and the Pacemaker/Corosync stack — the industry-standard open-source clustering solution used in RHEL, SUSE, and Ubuntu Server environments. You will understand SPOF elimination, quorum, fencing, and how the cluster resource manager orchestrates service failover.

Learning Objectives:

  • Understand HA concepts: SPOF, failover, quorum, fencing/STONITH

  • Explore Pacemaker architecture: CRM, LRM, Policy Engine

  • Configure Corosync messaging layer

  • Work with cluster resources: primitive, group, clone

  • Use pcs and crm_mon for cluster management


Step 1: Install the HA Stack

Install Pacemaker, Corosync, and the pcs management tool:

apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y pacemaker corosync pcs

Verify versions:
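The following commands report the installed component versions (exact version strings will vary by distribution and release):

```shell
pacemakerd --version
corosync -v
pcs --version
```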

📸 Verified Output:

💡 Tip: In production, nodes must have identical software versions. Mixed-version clusters can cause split-brain or unexpected resource behaviour.


Step 2: HA Concepts — SPOF, Failover, Quorum

Single Point of Failure (SPOF): Any component whose failure causes total service outage.

| HA Concept | Description |
|---|---|
| SPOF | Component with no redundancy (single NIC, single node) |
| Failover | Automatic service restart on a surviving node |
| Quorum | Majority vote to determine cluster health (N/2+1) |
| Fencing/STONITH | "Shoot The Other Node In The Head" — isolate failed node |
| VIP | Virtual IP that floats between nodes |

Two-node quorum problem:

A 2-node cluster cannot achieve quorum after one failure without special configuration: a single surviving vote out of two is not a majority.

Solution: set two_node: 1 in the corosync quorum section, configure no-quorum-policy=ignore in Pacemaker, or add a quorum device (qdevice) as a tie-breaker.
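Corosync's votequorum service has a dedicated two-node mode, enabled in the quorum section of corosync.conf. An illustrative fragment:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

Note that two_node: 1 also enables wait_for_all by default, so both nodes must be seen at least once at startup before the cluster becomes quorate.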

📸 Verified Output:

💡 Tip: In 3+ node clusters, always use an odd number of nodes. A 4-node cluster has the same fault tolerance as a 3-node cluster (both can lose 1 node).
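The arithmetic behind this tip can be checked directly: an N-node cluster needs floor(N/2) + 1 votes for quorum, so it tolerates N minus that many node failures.

```shell
# Quorum for an N-node cluster is a strict majority: floor(N/2) + 1 votes.
quorum() { echo $(( $1 / 2 + 1 )); }

for n in 2 3 4 5; do
  echo "$n nodes: quorum=$(quorum "$n"), tolerates $(( n - $(quorum "$n") )) failure(s)"
done
# 3 nodes and 4 nodes both tolerate exactly 1 failure
```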


Step 3: Pacemaker Architecture

Pacemaker consists of three key subsystems:

  • CRM (Cluster Resource Manager): the coordinating daemon (pacemaker-controld, formerly crmd) that drives cluster state transitions

  • LRM (Local Resource Manager): executes resource-agent actions (start, stop, monitor) on each node

  • Policy Engine (pacemaker-schedulerd, formerly pengine): computes which actions are needed from the current CIB state

📸 Verified Output:

💡 Tip: Resource agents follow the OCF (Open Cluster Framework) standard. They accept start, stop, monitor, meta-data, and validate-all actions.
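A resource agent is essentially a script that dispatches on those action names and reports status via its exit code. A minimal sketch follows; the "dummy" agent and its messages are hypothetical, and a real agent would source the OCF shell functions and return proper OCF exit codes:

```shell
# Hypothetical OCF-style dispatcher: Pacemaker's LRM invokes the agent
# with one action argument and interprets the exit code.
ra_dummy() {
  case "$1" in
    start)        echo "dummy: started" ;;            # bring the service up
    stop)         echo "dummy: stopped" ;;            # bring the service down
    monitor)      echo "dummy: running" ;;            # health check; 0 = running
    meta-data)    echo '<resource-agent name="dummy"/>' ;;
    validate-all) echo "dummy: config ok" ;;
    *)            echo "dummy: unimplemented"; return 3 ;;  # OCF_ERR_UNIMPLEMENTED
  esac
}

ra_dummy monitor   # prints "dummy: running"
```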


Step 4: Corosync Configuration

Corosync provides the messaging and membership layer. Examine the default config:

📸 Verified Output:

Production 2-node corosync.conf:
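A minimal example of what such a configuration might look like (cluster name, node names, and addresses are placeholders — adapt to your environment):

```
totem {
    version: 2
    cluster_name: webcluster
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: 10.0.0.11
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.12
        name: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
```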

💡 Tip: Use corosync-keygen to generate the /etc/corosync/authkey file for encrypted cluster communication. Copy it to all nodes before starting the cluster.


Step 5: PCS Cluster Management Commands

pcs (Pacemaker/Corosync Configuration System) is the unified management tool:

📸 Verified Output:

Key pcs commands for architects:
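Commonly used status and inspection commands (all require a running cluster; subcommand names follow pcs 0.10+, where older releases used `show` instead of `config`):

```shell
pcs status                  # overall cluster, node, and resource status
pcs cluster status          # cluster and daemon status
pcs resource status         # resource state only
pcs constraint config       # location/order/colocation constraints
pcs stonith status          # fencing device status
pcs quorum status           # votequorum runtime information
pcs property config         # cluster-wide properties
```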

📸 Verified Output:

💡 Tip: Run pcs resource list to see all available OCF/LSB/systemd resource agents. pcs resource describe ocf:heartbeat:IPaddr2 gives full documentation for a specific agent.


Step 6: Cluster Resources — Primitives, Groups, Clones

Resource Types:

  • Primitive: a single resource instance (e.g. one VIP)

  • Group: an ordered set of primitives that start, stop, and move together on one node

  • Clone: a resource that runs simultaneously on multiple nodes

Creating cluster resources with pcs (syntax — requires running cluster):
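For illustration, creating a floating VIP and an HAProxy service, then grouping them so they move together (the resource names and IP address here are placeholders):

```shell
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.100 cidr_netmask=24 op monitor interval=10s
pcs resource create haproxy systemd:haproxy op monitor interval=15s
pcs resource group add web-group vip haproxy   # group: ordered, colocated set
pcs resource clone some-resource               # convert a primitive into a clone
```

Group members start in the listed order (vip, then haproxy) and always run on the same node.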

📸 Verified Output (pcs resource list sample):

💡 Tip: Always create a colocation constraint between your VIP and the service it fronts. Without it, HAProxy might run on node1 while the VIP is on node2.
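A sketch of such constraints, reusing the hypothetical vip/haproxy resource names (a resource group achieves the same effect implicitly):

```shell
pcs constraint colocation add vip with haproxy INFINITY   # keep VIP where HAProxy runs
pcs constraint order vip then haproxy                     # start the VIP first
```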


Step 7: Monitoring with crm_mon and STONITH

crm_mon provides real-time cluster status:
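Useful invocations:

```shell
crm_mon -1      # one-shot snapshot instead of continuous refresh
crm_mon -r      # include inactive resources
crm_mon -Afr    # also show node attributes and fail counts
```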

📸 Verified Output:

Sample crm_mon output (from a running cluster):

STONITH (fencing) configuration:
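For example, with IPMI-based fencing (the BMC address and credentials below are placeholders; parameter names can differ between fence-agents versions):

```shell
pcs stonith create fence-node1 fence_ipmilan \
    ip=10.0.0.101 username=admin password=changeme \
    pcmk_host_list=node1 op monitor interval=60s
pcs property set stonith-enabled=true
```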

💡 Tip: Never disable STONITH in production! Without fencing, a failed node may continue running services, causing data corruption (split-brain). Use fence_ipmilan, fence_apc, or fence_aws depending on your environment.


Step 8: Capstone — Architect a 3-Node HA Cluster Blueprint

Scenario: Your company runs a critical internal web application. Design a fault-tolerant 3-node Pacemaker cluster that:

  1. Survives loss of any single node

  2. Has proper fencing to prevent split-brain

  3. Provides a floating VIP for the application

  4. Runs HAProxy for load balancing

Capstone Solution Blueprint:
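One possible blueprint, sketched as a command sequence (cluster name, node names, addresses, and credentials are placeholders):

```shell
# 1. Authenticate nodes and create the 3-node cluster (pcs 0.10+ syntax)
pcs host auth node1 node2 node3 -u hacluster
pcs cluster setup webcluster node1 node2 node3
pcs cluster start --all
pcs cluster enable --all

# 2. Fencing first: one IPMI fence device per node
pcs stonith create fence-node1 fence_ipmilan ip=10.0.0.101 \
    username=admin password=changeme pcmk_host_list=node1
# ...repeat for node2 and node3...
pcs property set stonith-enabled=true

# 3. Floating VIP + HAProxy, grouped so they fail over together
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.100 \
    cidr_netmask=24 op monitor interval=10s
pcs resource create haproxy systemd:haproxy op monitor interval=15s
pcs resource group add web-group vip haproxy
```

With three nodes, quorum is 2 of 3 votes, so the loss of any single node leaves the cluster quorate.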

Verify the design:
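Typical verification commands (run on any cluster node):

```shell
crm_verify -L -V        # validate the live CIB
pcs status              # all nodes online, resources started
pcs quorum status       # expected votes: 3, quorum: 2
pcs stonith status      # fence devices configured and running
```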

📸 Verified Output:


Summary

| Concept | Tool/Command | Purpose |
|---|---|---|
| Cluster setup | pcs cluster setup | Initialize Pacemaker/Corosync |
| Node auth | pcs host auth | Authenticate cluster nodes |
| Resource create | pcs resource create | Define managed services |
| Constraints | pcs constraint | Placement, ordering, colocation |
| Monitoring | crm_mon -r | Real-time cluster status |
| Fencing | pcs stonith create | STONITH fence agents |
| Quorum | pcs quorum config | View/set quorum options |
| CIB XML | crm_verify | Validate cluster config |
| Corosync config | /etc/corosync/corosync.conf | Messaging layer config |
| Resource agents | /usr/lib/ocf/resource.d/ | OCF RA scripts |
