Lab 05: CPU & Memory Profiling

Time: 40 minutes | Level: Advanced | Docker: docker run -it --rm --privileged ubuntu:22.04 bash


Overview

Understanding memory and CPU resource consumption at the OS level is essential for performance tuning, capacity planning, and debugging memory leaks. This lab covers /proc/meminfo, /proc/vmstat, OOM killer behavior, CPU affinity with taskset, numactl, and load simulation with stress-ng.


Step 1: Analyze /proc/meminfo

The primary interface for memory statistics is /proc/meminfo:

cat /proc/meminfo | head -20

📸 Verified Output:

MemTotal:       127539180 kB
MemFree:        95401012 kB
MemAvailable:   121376588 kB
Buffers:          911832 kB
Cached:         24433200 kB
SwapCached:            0 kB
Active:          4436600 kB
Inactive:       24802688 kB
Active(anon):    3941000 kB
Inactive(anon):        0 kB
Active(file):     495600 kB
Inactive(file): 24802688 kB
Unevictable:       26512 kB
Mlocked:           26512 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:            101072 kB
Writeback:             0 kB

Key fields explained:

  • MemTotal: Total physical RAM

  • MemFree: Completely unused RAM

  • MemAvailable: RAM available for new allocations (incl. reclaimable cache)

  • Buffers: Raw block device cache

  • Cached: Page cache (file data)

  • Active(anon): Anonymous (heap/stack) memory recently used

  • Inactive(anon): Anonymous memory not recently used (swap candidate)

  • SwapTotal/SwapFree: Total and available swap space

  • Dirty: Pages modified but not yet written to disk

💡 MemAvailable ≠ MemFree. The kernel can reclaim Cached pages under pressure, and MemAvailable accounts for this. Use MemAvailable, not MemFree, to estimate how much memory is actually available.
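The difference is easy to see directly; a minimal awk sketch that reports MemAvailable as a share of MemTotal:

```shell
# Report usable memory from /proc/meminfo using MemAvailable,
# which includes reclaimable page cache, unlike MemFree.
awk '/^MemTotal:/     {total = $2}
     /^MemAvailable:/ {avail = $2}
     END {printf "available: %d kB (%.1f%% of %d kB total)\n",
                 avail, 100 * avail / total, total}' /proc/meminfo
```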


Step 2: Analyze /proc/vmstat

/proc/vmstat provides virtual memory activity counters since boot:
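To pull out just the counters discussed below, a minimal sketch:

```shell
# Extract the swap, fault, and OOM counters from /proc/vmstat.
# The oom_kill counter is absent on kernels older than 4.13.
grep -E '^(pswpin|pswpout|pgfault|pgmajfault|oom_kill) ' /proc/vmstat
```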


Critical counters:

  • pswpin: Pages swapped in (non-zero = memory pressure)

  • pswpout: Pages swapped out (high = system under pressure)

  • pgmajfault: Major page faults (each one required disk I/O)

  • pgfault: Total page faults (mostly minor, resolved in RAM)

  • oom_kill: Processes killed by the OOM killer (should stay 0)

💡 Monitor vmstat over time with watch -n 1 'grep oom_kill /proc/vmstat'. A non-zero and growing oom_kill indicates serious memory exhaustion.


Step 3: Monitor Real-Time Memory with vmstat


Memory pressure indicators:

  • si/so > 0: swap activity — memory is exhausted

  • b > 0: processes blocked waiting for I/O

  • wa CPU% > 20%: I/O bottleneck

  • free decreasing + si increasing: OOM imminent
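The wa figure can also be derived from /proc/stat; a sketch that computes the cumulative iowait share since boot:

```shell
# The "cpu" line of /proc/stat holds jiffies in the order:
# user nice system idle iowait irq softirq (fields 2-8).
# Field 6 is iowait.
awk '/^cpu / {total = $2+$3+$4+$5+$6+$7+$8
              printf "iowait: %.1f%% of CPU time since boot\n",
                     100 * $6 / total}' /proc/stat
```

Note this is a since-boot average; vmstat's per-interval samples are better for spotting a bottleneck that is happening right now.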


Step 4: OOM Killer and Memory Overcommit

The OOM (Out-of-Memory) killer terminates processes when memory is exhausted. Control its behavior:


Overcommit policies:

  • 0 (default): Heuristic, allow reasonable overcommit

  • 1: Always allow overcommit (no limits)

  • 2: Never overcommit beyond overcommit_ratio% of RAM + swap

💡 Set oom_score_adj = -1000 for critical daemons like databases. Set it to +500 for non-critical batch jobs to make them OOM-killed first, protecting your database.
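A minimal sketch of both knobs, exercised against the current shell ($$):

```shell
# Current overcommit policy: 0, 1, or 2.
cat /proc/sys/vm/overcommit_memory

# The kernel's current OOM badness score for this shell
# (higher means killed sooner).
cat /proc/$$/oom_score

# Volunteer this shell as a preferred OOM victim. Raising the value
# is unprivileged; lowering it (e.g. to -1000) needs CAP_SYS_RESOURCE.
echo 500 > /proc/$$/oom_score_adj
cat /proc/$$/oom_score_adj
```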


Step 5: CPU Affinity with taskset

Bind processes to specific CPU cores to reduce cache thrashing and improve predictability:
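A minimal sketch, assuming the stock util-linux taskset and at least two cores:

```shell
# Start a short-lived worker pinned to cores 0 and 1.
taskset -c 0,1 sleep 30 &
PID=$!

# Show the affinity mask of the running process.
taskset -p "$PID"

# Re-pin it to core 0 only, then clean up.
taskset -pc 0 "$PID"
kill "$PID"
```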


Use cases for CPU affinity:

  • Real-time applications: Pin to isolated CPUs to avoid scheduling jitter

  • NUMA optimization: Pin processes to cores local to their memory

  • Cache isolation: Prevent different workloads from evicting each other's cache

💡 For persistent affinity settings in systemd services, use CPUAffinity=0,1 in the [Service] section of the unit file.


Step 6: Inspect Per-Process Memory with /proc/PID/smaps

/proc/PID/smaps provides detailed memory mapping information:


Key smaps fields:

  • Size: Total mapping size

  • Rss: Resident set size (currently in RAM)

  • Pss: Proportional set size (shared pages counted fractionally)

  • Private_Dirty: Modified private pages, the real memory cost

  • Swap: Pages swapped out for this mapping


💡 For detecting memory leaks: monitor Private_Dirty growth over time. A process with steadily growing Private_Dirty in anonymous mappings is leaking heap memory.
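Per-mapping numbers can be totaled with awk; a sketch run here against the current shell (substitute the PID under investigation for $$):

```shell
# Sum the cost-relevant fields across every mapping of a process.
awk '/^Pss:/           {pss  += $2}
     /^Private_Dirty:/ {pd   += $2}
     /^Swap:/          {swap += $2}
     END {printf "Pss: %d kB  Private_Dirty: %d kB  Swap: %d kB\n",
                 pss, pd, swap}' /proc/$$/smaps
```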


Step 7: numactl and NUMA Topology

On multi-socket servers, NUMA (Non-Uniform Memory Access) affects performance. Memory access is faster when using RAM local to the CPU socket.


💡 High numa_miss in /proc/vmstat means processes are accessing remote NUMA memory — a performance penalty of 2–4x latency. Fix with numactl --membind or kernel-level NUMA balancing (kernel.numa_balancing=1).
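A sketch of the relevant checks; numactl may need installing first (apt-get install numactl):

```shell
# Nodes the kernel knows about; a single "node0" means NUMA placement
# cannot matter on this machine.
ls -d /sys/devices/system/node/node* 2>/dev/null

# Cross-node allocation counters (present only on NUMA-enabled kernels).
grep -E '^numa_(hit|miss|foreign) ' /proc/vmstat

# Run a command with CPU and memory both confined to node 0.
numactl --cpunodebind=0 --membind=0 sh -c 'echo running on node 0'
```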


Step 8: Capstone — Load Simulation and Memory Analysis

Scenario: Simulate load and measure system behavior under stress.


A stable Private_Dirty across samples = no memory leak. A growing value = potential leak.
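The whole experiment can be scripted; a sketch assuming stress-ng is installed (apt-get install -y stress-ng):

```shell
# Apply memory pressure: one worker repeatedly touching 512 MiB.
stress-ng --vm 1 --vm-bytes 512M --timeout 30s &
STRESS=$!
sleep 2   # let the worker map its memory

# Sample total Private_Dirty of the stress-ng supervisor three times;
# steady growth across samples would suggest a leak.
for i in 1 2 3; do
    awk '/Private_Dirty/ {sum += $2} END {print sum " kB"}' \
        "/proc/$STRESS/smaps" 2>/dev/null
    sleep 5
done
wait "$STRESS"
```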


Summary

  • /proc/meminfo: System-wide memory statistics

  • /proc/vmstat: Virtual memory event counters

  • /proc/PID/smaps: Per-process detailed memory mappings

  • /proc/PID/status: Process VmRSS, VmSize, VmSwap summary

  • /proc/sys/vm/overcommit_memory: Memory overcommit policy

  • /proc/PID/oom_score_adj: OOM kill priority adjustment

  • vmstat 1 3: Real-time memory + CPU + swap overview

  • taskset -c 0,1 <cmd>: Pin process to CPU cores

  • taskset -p PID: Check current CPU affinity

  • numactl --hardware: Show NUMA topology

  • numactl --membind=0 <cmd>: Bind process memory to NUMA node

  • stress-ng --cpu N: CPU load simulation

  • stress-ng --vm 1 --vm-bytes 512M: Memory pressure simulation

  • awk '/Private_Dirty/ {sum+=$2} END {print sum " kB"}' /proc/PID/smaps: Total private memory cost
