Control groups (cgroups) are the Linux kernel mechanism for limiting, accounting for, and isolating the resource usage of groups of processes. Alongside namespaces, they are the other half of container technology. In this lab you'll work with cgroup v2 (the unified hierarchy), set memory and CPU limits, throttle I/O, and see how Docker's --memory and --cpus flags map to cgroup knobs.
⚠️ Note: This lab requires `--cgroupns=host` so the container can write to the cgroup hierarchy. Run: `docker run -it --rm --privileged --cgroupns=host ubuntu:22.04 bash`
Step 1: cgroup v1 vs v2 — Understanding the Hierarchy
```bash
# Check which cgroup version is running
stat -f /sys/fs/cgroup/ | grep Type
```
📸 Verified Output:
```
ID: a70c354cb5b8ef16 Namelen: 255 Type: cgroup2fs
```
cgroup2fs = cgroup v2 (unified hierarchy). Most modern Linux systems (Ubuntu 22.04+, RHEL 9+) use v2 by default.
```bash
# v1 had separate trees per controller:
#   /sys/fs/cgroup/memory/  /sys/fs/cgroup/cpu/  /sys/fs/cgroup/blkio/
# v2 has ONE unified tree:
ls /sys/fs/cgroup/

# See available controllers
cat /sys/fs/cgroup/cgroup.controllers
```
| Feature | cgroup v1 | cgroup v2 |
| --- | --- | --- |
| Hierarchy | Multiple trees (one per controller) | Single unified tree |
| Controller attachment | Per-controller | Unified: all controllers per cgroup |
| Writeback attribution | Limited | Full process-level writeback tracking |
| Pressure stall info | No | Yes (memory.pressure, cpu.pressure) |
| BPF integration | Limited | Full |
💡 Docker added native cgroup v2 support in Docker 20.10. Podman supported it earlier. Check with `docker info | grep "Cgroup Version"`.
Step 2: Enable Controllers and Create a cgroup
💡 In cgroup v2, once you enable a controller in cgroup.subtree_control, all child cgroups automatically get that controller's files.
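The commands for this step can be sketched as follows. This is a minimal sketch assuming a writable cgroup v2 mount at /sys/fs/cgroup (as in the lab's privileged container); the writes are guarded so it degrades gracefully elsewhere, and the `mylab` name matches the cgroup the later steps use.

```shell
# Enable controllers for child cgroups, then create a child cgroup.
CG=/sys/fs/cgroup
if [ -w "$CG/cgroup.subtree_control" ]; then
  # Enabling a controller here makes its files appear in every child
  echo '+cpu +memory +pids' > "$CG/cgroup.subtree_control" 2>/dev/null \
    || echo "some controllers unavailable on this kernel"
  mkdir -p "$CG/mylab"
  # The enabled controllers' interface files show up automatically:
  ls "$CG/mylab" | grep -E '^(cpu|memory|pids)\.' | head -5
  echo "mylab created"
else
  echo "cgroup v2 hierarchy not writable here; skipping"
fi
```

If the `mkdir` succeeds, the kernel populates the new directory with one file per enabled controller; no further setup is needed before writing limits.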
Step 3: Memory Limits
💡 Docker mapping: `docker run --memory=100m` sets memory.max = 104857600. The old v1 memory.limit_in_bytes is gone in v2.
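The mapping above is just unit conversion plus a file write; a sketch (the `mylab` paths assume the cgroup from Step 2, and the writes are guarded so the arithmetic still runs without it):

```shell
# 100m as Docker parses it: 100 * 1024 * 1024 bytes
BYTES=$((100 * 1024 * 1024))
echo "$BYTES"   # the value Docker writes to memory.max

CG=/sys/fs/cgroup/mylab
if [ -w "$CG/memory.max" ]; then
  echo "$BYTES" > "$CG/memory.max"                  # hard limit
  echo "$((80 * 1024 * 1024))" > "$CG/memory.high"  # soft limit below the cap
  cat "$CG/memory.max" "$CG/memory.high"
else
  echo "mylab cgroup not writable; showing the arithmetic only"
fi
```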
Step 4: CPU Limits and Shares
💡 cpu.weight.nice maps cpu.weight to a nice(1) priority value (-20 to 19). This bridges cgroups and the traditional Unix scheduling API.
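Docker's `--cpus` flag maps onto cpu.max the same way `--memory` maps onto memory.max: the fractional CPU count becomes a quota over the default 100000us period. A sketch of the conversion (the 1.5 example value is illustrative; integer math is done in tenths to stay shell-safe):

```shell
# docker run --cpus=1.5  ->  "150000 100000" in cpu.max
CPUS_TENTHS=15                        # 1.5 CPUs, expressed in tenths
PERIOD=100000                         # default CFS period, microseconds
QUOTA=$((CPUS_TENTHS * PERIOD / 10))  # quota = cpus * period
echo "$QUOTA $PERIOD"                 # 150000 100000
```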
Step 5: I/O Throttling (blkio / io controller)
💡 Docker mapping: `docker run --device-read-bps=/dev/sda:10mb` sets io.max rbps=10485760 in the container's cgroup.
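io.max is keyed by the device's major:minor numbers, not its path, so a runtime has to resolve the path first. A guarded sketch of that resolution (the /dev/sda path is illustrative; on systems without it, only the byte arithmetic is shown):

```shell
DEV=/dev/sda                           # illustrative; any block device works
if [ -b "$DEV" ]; then
  # GNU stat prints major/minor in hex via %t/%T; convert to decimal
  MAJ=$((0x$(stat -c %t "$DEV")))
  MIN=$((0x$(stat -c %T "$DEV")))
  LINE="$MAJ:$MIN rbps=$((10 * 1024 * 1024))"
  echo "$LINE"                         # what --device-read-bps writes
  [ -w /sys/fs/cgroup/mylab/io.max ] && echo "$LINE" > /sys/fs/cgroup/mylab/io.max
else
  echo "$DEV not present; rbps value would be $((10 * 1024 * 1024))"
fi
```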
Step 6: PID Limits and Process Assignment
💡 Moving between cgroups: Write PID to the destination cgroup.procs. The process automatically leaves its old cgroup. You cannot be in two cgroups simultaneously for the same controller.
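The one-write move can be demonstrated directly. A guarded sketch, assuming a writable cgroup v2 hierarchy (the labA/labB names are illustrative; each `cat /proc/self/cgroup` shows the current membership, and the second write implicitly removes the shell from the first cgroup):

```shell
A=/sys/fs/cgroup/labA; B=/sys/fs/cgroup/labB
if mkdir -p "$A" "$B" 2>/dev/null && [ -w "$A/cgroup.procs" ]; then
  bash -c "
    echo \$\$ > '$A'/cgroup.procs; echo 'after first write:'; cat /proc/self/cgroup
    echo \$\$ > '$B'/cgroup.procs; echo 'after second write:'; cat /proc/self/cgroup
  "
  rmdir "$A" "$B" 2>/dev/null   # cleanup; only succeeds once both are empty
else
  echo "cgroup hierarchy not writable; skipping"
fi
```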
Step 7: systemd Slices and Scopes
On systemd systems, cgroups are managed through the slice/scope/service hierarchy:
💡 `systemctl set-property myservice.service MemoryMax=100M` updates the live cgroup immediately and persists the change as a drop-in file (current systemd writes it under /etc/systemd/system.control/myservice.service.d/; add --runtime to keep the change non-persistent).
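What such a drop-in looks like on disk can be sketched safely by writing it to a temp directory instead of the real systemd paths (the 50-MemoryMax.conf name mirrors systemd's per-property naming, but treat the exact filename as illustrative):

```shell
d=$(mktemp -d)
mkdir -p "$d/myservice.service.d"
cat > "$d/myservice.service.d/50-MemoryMax.conf" <<'EOF'
[Service]
MemoryMax=100M
EOF
cat "$d/myservice.service.d/50-MemoryMax.conf"
rm -r "$d"
```

systemd translates `MemoryMax=100M` into a memory.max write when the unit's cgroup is created, which is exactly the file-level operation this lab performs by hand.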
Step 8: Capstone — Container Resource Accounting from Scratch
Scenario: You're a platform team member debugging resource contention. A "container" is using too much CPU and memory. You need to: (1) set limits manually, (2) run a stress test, (3) observe the kernel enforcing limits, and (4) read the accounting data.
The kernel enforced your memory limit by sending SIGKILL to the over-allocating process.
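A guarded sketch of that capstone sequence (the 50 MB limit is illustrative, and the `head | tail` pipeline is a common memory-hog trick: tail buffers its entire newline-free input in RAM; the block skips cleanly where no writable cgroup v2 hierarchy exists):

```shell
CG=/sys/fs/cgroup/mylab
if mkdir -p "$CG" 2>/dev/null && [ -w "$CG/memory.max" ]; then
  echo $((50 * 1024 * 1024)) > "$CG/memory.max"   # hard limit: 50 MB
  # Try to hold ~200 MB inside the cgroup; the kernel should OOM-kill it
  bash -c "echo \$\$ > $CG/cgroup.procs; head -c $((200 * 1024 * 1024)) /dev/zero | tail" \
    && echo "allocation survived (limit not enforced?)" \
    || echo "killed by the kernel; memory.events now shows:"
  grep -E 'oom|max' "$CG/memory.events" 2>/dev/null
else
  echo "no writable cgroup v2 hierarchy; skipping"
fi
```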
Summary
| Concept | cgroup v1 | cgroup v2 | What It Controls |
| --- | --- | --- | --- |
| Memory hard limit | memory.limit_in_bytes | memory.max | OOM trigger threshold |
| Memory soft limit | memory.soft_limit_in_bytes | memory.high | Reclaim pressure |
| CPU relative weight | cpu.shares (1024=default) | cpu.weight (100=default) | Scheduling priority |
| CPU hard limit | cpu.cfs_quota_us | cpu.max | Bandwidth cap |
| I/O throttle | blkio.throttle.* | io.max | BPS/IOPS limits |
| Process count | pids.max | pids.max | Fork bomb protection |
| Assign process | echo PID > tasks | echo PID > cgroup.procs | Move to cgroup |
| Docker --memory | memory.limit_in_bytes | memory.max | Container memory cap |
| Docker --cpus | cpu.cfs_quota_us/period_us | cpu.max | Container CPU cap |
| systemd unit | MemoryLimit= | MemoryMax= | Service resource limit |
Key insight: Every container runtime (Docker, containerd, Podman, CRI-O) is ultimately writing numbers into /sys/fs/cgroup/. There is no magic — it's just files.
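That insight can be checked from the reading side: any process can resolve its own cgroup path and list the raw files a runtime wrote there. A read-only sketch, safe to run anywhere with cgroup v2 (the `.max/.weight/.high` filter is just illustrative):

```shell
# Resolve this process's cgroup (v2: a single "0::<path>" line)
CGPATH=$(cut -d: -f3- /proc/self/cgroup | head -1)
echo "my cgroup: ${CGPATH}"
# The "knobs" are plain files in that directory
ls "/sys/fs/cgroup${CGPATH}" 2>/dev/null | grep -E '\.(max|weight|high)$' \
  || ls /sys/fs/cgroup 2>/dev/null | head -10
```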
Step 3 output: the `memory.events` counters, all zero until a limit event occurs.
```
low 0
high 0
max 0
oom 0
oom_kill 0
oom_group_kill 0
```
Step 4 commands (CPU weight and bandwidth):
```bash
# Set CPU weight (v2 replacement for cpu.shares in v1)
# Default weight is 100; range is 1-10000
echo '200' > /sys/fs/cgroup/mylab/cpu.weight
cat /sys/fs/cgroup/mylab/cpu.weight
```
📸 Verified Output:
```
200
```
```bash
# cpu.max: hard CPU bandwidth limit
# Format: "quota period" in microseconds
# Allow 50% of one CPU: 50000us quota / 100000us period
echo '50000 100000' > /sys/fs/cgroup/mylab/cpu.max
cat /sys/fs/cgroup/mylab/cpu.max
```
Step 6 commands (PID limits and process assignment):
```bash
# Limit process count in the cgroup
echo '50' > /sys/fs/cgroup/mylab/pids.max
cat /sys/fs/cgroup/mylab/pids.max
```
📸 Verified Output:
```
50
```
```bash
# Assign a process to the cgroup by writing its PID
bash -c '
echo $$ > /sys/fs/cgroup/mylab/cgroup.procs
echo "My PID is $$, now in mylab cgroup"
cat /proc/self/cgroup
echo "pids.current: $(cat /sys/fs/cgroup/mylab/pids.current)"
'
```
📸 Verified Output:
```
My PID is 42, now in mylab cgroup
0::/mylab
pids.current: 0
```
```bash
# All child processes inherit the cgroup
bash -c '
echo $$ > /sys/fs/cgroup/mylab/cgroup.procs
for i in 1 2 3; do
  sleep 10 &
done
echo "Spawned 3 background sleeps"
cat /sys/fs/cgroup/mylab/pids.current
kill %1 %2 %3 2>/dev/null
'
```
Step 7 commands (systemd hierarchy and transient scopes):
```bash
apt-get install -y -qq systemd 2>/dev/null

# systemd hierarchy:
#   -.slice (root)
#   ├── system.slice (system services)
#   │   ├── sshd.service
#   │   └── nginx.service
#   ├── user.slice (user sessions)
#   │   └── user-1000.slice
#   └── machine.slice (VMs, containers)
#       └── docker-CONTAINERID.scope

# Show the hierarchy in /sys/fs/cgroup
find /sys/fs/cgroup -maxdepth 2 -type d | head -20

# Create a systemd transient scope (resource-limited process group)
# systemd-run creates a transient scope/service:
systemd-run --scope --slice=mylab.slice -p MemoryMax=100M -p CPUWeight=50 \
  bash -c 'echo "Running in managed cgroup"; cat /proc/self/cgroup' 2>/dev/null \
  || echo "systemd not running as PID 1 (expected in container)"
```
📸 Verified Output:
```
/sys/fs/cgroup
/sys/fs/cgroup/mylab
/sys/fs/cgroup/init.scope
/sys/fs/cgroup/system.slice
...
systemd not running as PID 1 (expected in container)
```
```bash
# Simulate what systemd does: create the slice directory structure
mkdir -p /sys/fs/cgroup/mylab.slice/myservice.service
echo '209715200' > /sys/fs/cgroup/mylab.slice/myservice.service/memory.max
echo '50' > /sys/fs/cgroup/mylab.slice/myservice.service/cpu.weight
echo 'Created slice → service cgroup hierarchy'
ls /sys/fs/cgroup/mylab.slice/myservice.service/ | head -5
```