Lab 17: cgroups — Resource Control

Time: 40 minutes | Level: Advanced | Docker: docker run -it --rm --privileged --cgroupns=host ubuntu:22.04 bash

Control groups (cgroups) are the Linux kernel mechanism for limiting, accounting, and isolating resource usage of process groups. They're the other half of container technology alongside namespaces. In this lab you'll work with cgroup v2 (the unified hierarchy), set memory and CPU limits, throttle I/O, and understand how Docker's --memory and --cpus flags map to cgroup knobs.

⚠️ Note: This lab requires --cgroupns=host so the container can write to the cgroup hierarchy. Run: docker run -it --rm --privileged --cgroupns=host ubuntu:22.04 bash


Step 1: cgroup v1 vs v2 — Understanding the Hierarchy

# Check which version is running
stat -f /sys/fs/cgroup/ | grep Type

📸 Verified Output:

    ID: a70c354cb5b8ef16 Namelen: 255     Type: cgroup2fs

cgroup2fs = cgroup v2 (unified hierarchy). Most modern Linux systems (Ubuntu 22.04+, RHEL 9+) use v2 by default.

# v1 had separate trees per controller:
# /sys/fs/cgroup/memory/   /sys/fs/cgroup/cpu/   /sys/fs/cgroup/blkio/
# v2 has ONE unified tree:
ls /sys/fs/cgroup/

# See available controllers
cat /sys/fs/cgroup/cgroup.controllers

| Feature | cgroup v1 | cgroup v2 |
|---|---|---|
| Hierarchy | Multiple trees (one per controller) | Single unified tree |
| Controller attachment | Per-controller | Unified: all controllers per cgroup |
| Writeback attribution | Limited | Full process-level writeback tracking |
| Pressure stall info | No | Yes (memory.pressure, cpu.pressure) |
| BPF integration | Limited | Full |

💡 Docker added native cgroup v2 support in Docker 20.10. Podman supported it earlier. Check with docker info | grep "Cgroup Version".


Step 2: Enable Controllers and Create a cgroup

💡 In cgroup v2, once you enable a controller in cgroup.subtree_control, all child cgroups automatically get that controller's files.
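A minimal sketch of this step (root shell assumed; the child-cgroup name `lab17` is an arbitrary choice invented for these examples):

```shell
# Enable controllers for children of the root cgroup
echo "+cpu +memory +pids" > /sys/fs/cgroup/cgroup.subtree_control

# Creating a cgroup is just a mkdir
mkdir /sys/fs/cgroup/lab17

# Each enabled controller contributes its interface files to the child
ls /sys/fs/cgroup/lab17/ | grep -E '^(cpu|memory|pids)\.'
```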


Step 3: Memory Limits

💡 Docker mapping: docker run --memory=100m sets memory.max = 104857600. The old v1 memory.limit_in_bytes is gone in v2.
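As a concrete sketch of setting memory limits (root shell assumed; `lab17` is an example cgroup name used throughout these sketches):

```shell
# Make sure the memory controller is enabled for children, then create the cgroup
echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/lab17

# Hard cap: 100 MiB = 104857600 bytes (what docker run --memory=100m writes)
echo $((100 * 1024 * 1024)) > /sys/fs/cgroup/lab17/memory.max
# Soft cap: above this, the kernel applies reclaim pressure but does not kill
echo $((80 * 1024 * 1024)) > /sys/fs/cgroup/lab17/memory.high

cat /sys/fs/cgroup/lab17/memory.max
cat /sys/fs/cgroup/lab17/memory.current   # current usage in bytes
cat /sys/fs/cgroup/lab17/memory.events    # low/high/max/oom/oom_kill counters
```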


Step 4: CPU Limits and Shares

💡 cpu.weight.nice maps cpu.weight to a nice(1) priority value (-20 to 19). This bridges cgroups and the traditional Unix scheduling API.
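A sketch of CPU bandwidth and weight settings (root shell assumed; `lab17` is an example cgroup name):

```shell
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/lab17

# cpu.max takes "QUOTA PERIOD" in microseconds.
# 50000/100000 = half a CPU, which is what docker run --cpus=0.5 writes.
echo "50000 100000" > /sys/fs/cgroup/lab17/cpu.max

# cpu.weight: range 1-10000, default 100 (v1 cpu.shares defaulted to 1024)
echo 50 > /sys/fs/cgroup/lab17/cpu.weight

cat /sys/fs/cgroup/lab17/cpu.stat   # usage_usec, nr_throttled, throttled_usec
```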


Step 5: I/O Throttling (blkio / io controller)

💡 Docker mapping: docker run --device-read-bps=/dev/sda:10mb sets io.max rbps=10485760 in the container's cgroup.
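A sketch of a byte-rate throttle (root shell assumed; `lab17` is an example cgroup name, and 8:0 is sda's typical major:minor; verify yours with lsblk first):

```shell
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/lab17

# io.max is keyed by MAJOR:MINOR, not device path -- look it up
lsblk -d -o NAME,MAJ:MIN

# 10 MiB/s = 10485760 bytes/s read cap on device 8:0
echo "8:0 rbps=10485760" > /sys/fs/cgroup/lab17/io.max

cat /sys/fs/cgroup/lab17/io.max
cat /sys/fs/cgroup/lab17/io.stat   # rbytes/wbytes/rios/wios per device
```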


Step 6: PID Limits and Process Assignment

💡 Moving between cgroups: write the PID to the destination's cgroup.procs; the kernel removes the process from its old cgroup automatically. In cgroup v2 a process belongs to exactly one cgroup at any time.
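A sketch of capping process count and moving a shell in (root shell assumed; `lab17` is an example cgroup name):

```shell
echo "+pids" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/lab17

# At most 50 processes in this cgroup; further forks fail with EAGAIN
echo 50 > /sys/fs/cgroup/lab17/pids.max

# Move the current shell in; it leaves its old cgroup automatically,
# and future children inherit the membership
echo $$ > /sys/fs/cgroup/lab17/cgroup.procs

cat /proc/self/cgroup                  # v2 record, e.g. 0::/lab17
cat /sys/fs/cgroup/lab17/pids.current  # live process count
```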


Step 7: systemd Slices and Scopes

On systemd systems, cgroups are managed through the slice/scope/service hierarchy:

💡 systemctl set-property myservice.service MemoryMax=100M at runtime creates a drop-in in /etc/systemd/system/myservice.service.d/ and updates the live cgroup.
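A sketch of the systemd side (requires a systemd host, not the lab container; `myservice` is a placeholder unit name):

```shell
# View the slice/scope/service cgroup tree and a usage snapshot
systemd-cgls --no-pager | head -n 20
systemd-cgtop -n 1

# Hard memory cap on a unit, persisted as a drop-in and applied live
systemctl set-property myservice.service MemoryMax=100M

# Or run an ad-hoc command in a transient scope with limits attached
systemd-run --scope -p MemoryMax=100M -p CPUQuota=50% sleep 60
```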


Step 8: Capstone — Container Resource Accounting from Scratch

Scenario: You're a platform team member debugging resource contention. A "container" is using too much CPU and memory. You need to: (1) set limits manually, (2) run a stress test, (3) observe the kernel enforcing limits, and (4) read the accounting data.

The kernel enforced your memory limit by sending SIGKILL to the over-allocating process.
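One way the capstone can be run end to end (a sketch; root shell assumed, `lab17` is an example name, and `head | tail` stands in for a leaky workload because tail buffers its entire input in memory):

```shell
# 1. Set a 50 MiB limit and move this shell into the cgroup
echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/lab17
echo $((50 * 1024 * 1024)) > /sys/fs/cgroup/lab17/memory.max
echo $$ > /sys/fs/cgroup/lab17/cgroup.procs

# 2. Stress: try to hold 100 MiB inside a 50 MiB cgroup
head -c $((100 * 1024 * 1024)) /dev/zero | tail > /dev/null
echo "exit status: $?"   # 137 = 128 + SIGKILL if the OOM killer fired

# 3-4. Observe enforcement and read the accounting
grep -E 'max|oom' /sys/fs/cgroup/lab17/memory.events   # oom_kill should increment
cat /sys/fs/cgroup/lab17/memory.current
```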


Summary

| Concept | cgroup v1 | cgroup v2 | What It Controls |
|---|---|---|---|
| Memory hard limit | memory.limit_in_bytes | memory.max | OOM trigger threshold |
| Memory soft limit | memory.soft_limit_in_bytes | memory.high | Reclaim pressure |
| CPU relative weight | cpu.shares (1024=default) | cpu.weight (100=default) | Scheduling priority |
| CPU hard limit | cpu.cfs_quota_us | cpu.max | Bandwidth cap |
| I/O throttle | blkio.throttle.* | io.max | BPS/IOPS limits |
| Process count | pids.max | pids.max | Fork bomb protection |
| Assign process | echo PID > tasks | echo PID > cgroup.procs | Move to cgroup |
| Docker --memory | memory.limit_in_bytes | memory.max | Container memory cap |
| Docker --cpus | cpu.cfs_quota_us/period_us | cpu.max | Container CPU cap |
| systemd unit | MemoryLimit= | MemoryMax= | Service resource limit |

Key insight: Every container runtime (Docker, containerd, Podman, CRI-O) is ultimately writing numbers into /sys/fs/cgroup/. There is no magic — it's just files.
