Understanding system resource usage is essential for performance tuning, capacity planning, and troubleshooting. In this lab you'll use top, ps, free, vmstat, uptime, and the /proc filesystem to monitor CPU, memory, I/O, and load average in real time.
Step 1: uptime and Load Average
Load average is the single most important quick-glance health metric.
# Check system uptime and load
uptime
# Read /proc/loadavg directly
cat /proc/loadavg
💡 Trend matters more than snapshots: Compare 1-min vs 15-min load. If 1-min >> 15-min: sudden spike (investigate now). If 1-min ≈ 15-min and both high: sustained overload (need more capacity). If 1-min << 15-min: load is decreasing (situation resolving).
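The comparison above can be sketched as a small shell function. This is a minimal sketch, not part of the lab: the function name `classify_load_trend` is hypothetical, and the 1.5x ratio standing in for ">>" and "<<" and the "15-minute load above core count" test for "high" are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# classify_load_trend ONE_MIN FIFTEEN_MIN CORES
# Assumptions (not from the lab): ">>" / "<<" mean a factor of more than 1.5,
# and "high" means the 15-minute load exceeds the core count.
classify_load_trend() {
  awk -v one="$1" -v fifteen="$2" -v cores="$3" 'BEGIN {
    ratio = 1.5
    if (one > fifteen * ratio)       print "spike"
    else if (fifteen > one * ratio)  print "decreasing"
    else if (fifteen > cores)        print "sustained-overload"
    else                             print "steady"
  }'
}

# Live usage on Linux: fields 1 and 3 of /proc/loadavg are the
# 1-minute and 15-minute averages.
if [ -r /proc/loadavg ]; then
  read -r one _ fifteen _ < /proc/loadavg
  echo "load trend: $(classify_load_trend "$one" "$fifteen" "$(nproc)")"
fi
```

awk handles the comparison because load averages are fractional and plain shell arithmetic is integer-only.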
Step 2: Memory Monitoring with free
📸 Verified Output:
💡 "available" is the real metric: Ignore the free column. Linux aggressively uses RAM for disk cache (buff/cache), which it releases on demand. The available column shows how much RAM is actually usable for new processes. As a rule of thumb, if available falls below 10% of total, you have a real memory problem.
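The 10% rule of thumb can be checked directly from /proc/meminfo. The sketch below assumes a kernel that exposes MemAvailable (3.14 or later); the function name `available_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# available_pct [MEMINFO_FILE]
# Prints MemAvailable as a whole-number percentage of MemTotal.
# Defaults to /proc/meminfo; a file argument makes it easy to try on saved data.
available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  pct=$(available_pct)
  if [ "${pct:-100}" -lt 10 ]; then
    echo "WARNING: only ${pct}% of RAM is available"
  else
    echo "OK: ${pct}% of RAM is available"
  fi
fi
```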
Step 3: ps aux with Sorting - Finding Resource Hogs
📸 Verified Output:
💡 RSS vs VSZ: RSS (Resident Set Size) = actual physical RAM in use right now. VSZ (Virtual Size) = all virtual memory including mapped files and shared libraries. RSS is what you care about for memory pressure. A process with high VSZ but low RSS is fine.
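Since RSS is the figure that matters for memory pressure, one way to use it is to total RSS per command name, which helps when a service runs as many small processes. This is a rough sketch with a hypothetical helper name (`rss_by_command`); summed RSS counts shared pages once per process, so the totals overstate real usage somewhat.

```shell
#!/usr/bin/env bash
# rss_by_command: reads "RSS_KB COMMAND" lines on stdin and prints the
# total RSS in MiB per command, largest first.
rss_by_command() {
  LC_ALL=C awk '{
         kb = $1
         $1 = ""                 # drop the RSS field, keep the command name
         sub(/^ +/, "")
         rss[$0] += kb
       }
       END { for (c in rss) printf "%.1f MiB\t%s\n", rss[c] / 1024, c }' |
    LC_ALL=C sort -rn
}

# Live usage: "-o rss= -o comm=" prints both columns without a header line.
if command -v ps >/dev/null 2>&1; then
  ps -e -o rss= -o comm= | rss_by_command | head -5
fi
```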
Step 4: vmstat - Virtual Memory Statistics
vmstat gives a system-wide snapshot of processes, memory, swap, I/O, and CPU.
📸 Verified Output:
vmstat column guide:
📸 Verified Output:
💡 Key warning signs in vmstat: r > CPU count = CPU bound. b > 0 persistently = I/O bottleneck. si/so > 0 = swapping (RAM pressure). wa > 20% = disk I/O bottleneck. st > 5% = noisy neighbor on VM host.
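These thresholds can be applied mechanically to a vmstat data row. The sketch below assumes the default column order shown in this lab (r, b, swpd, free, buff, cache, si, so, bi, bo, in, cs, us, sy, id, wa, st); the function name `vmstat_warnings` is hypothetical. A single row cannot show that b > 0 is persistent, so treat that warning as a hint to keep sampling.

```shell
#!/usr/bin/env bash
# vmstat_warnings CPUS   (reads vmstat output on stdin)
# Header lines are skipped because their first field is not a number.
vmstat_warnings() {
  awk -v cpus="$1" '
    NF >= 17 && $1 ~ /^[0-9]+$/ {
      if ($1 > cpus)         print "r=" $1 " exceeds CPU count " cpus ": CPU bound"
      if ($2 > 0)            print "b=" $2 ": tasks blocked on I/O"
      if ($7 > 0 || $8 > 0)  print "si/so=" $7 "/" $8 ": swapping"
      if ($16 > 20)          print "wa=" $16 "%: disk I/O bottleneck"
      if ($17 > 5)           print "st=" $17 "%: CPU stolen by hypervisor"
    }'
}

# Live usage: skip the since-boot row by checking only the last sample.
if command -v vmstat >/dev/null 2>&1 && command -v nproc >/dev/null 2>&1; then
  vmstat 1 2 | tail -n 1 | vmstat_warnings "$(nproc)"
fi
```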
Step 5: top - Interactive Process Monitor
top is a live, updating process monitor. Key interactive commands:
📸 Verified Output (top -bn1):
top interactive key reference:
📸 Verified Output:
💡 htop is better: Install htop (apt-get install htop) for a much more user-friendly alternative with color coding, mouse support, and easy tree view. In production environments where htop isn't available, top is your fallback.
Step 6: /proc/meminfo Deep Dive
📸 Verified Output:
💡 Dirty pages: Dirty in /proc/meminfo shows data waiting to be written to disk. High dirty pages + slow disk = data loss risk during a crash. Kernel writes dirty pages to disk periodically (controlled by vm.dirty_ratio and vm.dirty_background_ratio in /proc/sys/vm/).
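To see how much data is currently waiting for writeback, and which thresholds apply, the relevant values can be read straight from /proc. A minimal sketch; the function name `dirty_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# dirty_summary [MEMINFO_FILE]
# Prints the Dirty and Writeback counters (both reported in kB).
dirty_summary() {
  awk '/^Dirty:/     { d = $2 }
       /^Writeback:/ { w = $2 }
       END { printf "dirty=%d kB writeback=%d kB\n", d, w }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  dirty_summary
fi

# The two tunables named above, as percentages of memory.
for knob in dirty_background_ratio dirty_ratio; do
  if [ -r "/proc/sys/vm/$knob" ]; then
    echo "vm.$knob = $(cat "/proc/sys/vm/$knob")"
  fi
done
```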
Step 7: sar and iostat - I/O and Historical Stats
📸 Verified Output:
💡 Enabling historical sar: On Ubuntu/Debian, edit /etc/default/sysstat and set ENABLED="true", then restart the sysstat service. Data is collected every 10 minutes and stored in /var/log/sysstat/. Invaluable for "what happened last Tuesday at 3 AM" forensics.
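Once collection is enabled, a past time window can be queried with sar's -f (data file), -s (start time), and -e (end time) options. The sketch below assumes the default saDD file naming (day of month) under /var/log/sysstat/; some configurations use a different naming scheme. The helper name `sa_file_for_date` and the date are hypothetical.

```shell
#!/usr/bin/env bash
# sa_file_for_date YYYY-MM-DD [DIR]
# Maps a date to the daily sysstat data file, assuming saDD naming.
sa_file_for_date() {
  local day="${1##*-}"   # keep only the day-of-month part
  echo "${2:-/var/log/sysstat}/sa${day}"
}

# Example: CPU usage between 03:00 and 03:30 on an illustrative date.
f=$(sa_file_for_date 2024-05-14)
if command -v sar >/dev/null 2>&1 && [ -r "$f" ]; then
  sar -u -f "$f" -s 03:00:00 -e 03:30:00
else
  echo "no readable sar data at $f"
fi
```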
Step 8: Capstone - System Health Dashboard Script
Scenario: Create a comprehensive system health check script that monitors CPU, memory, disk, and processes, useful for cron-based alerting or quick triage.
📸 Verified Output:
💡 Schedule it with cron: Add to cron for proactive alerting: */15 * * * * /usr/local/bin/system_health.sh | grep -E 'WARNING|CRITICAL' | mail -s "Alert: $(hostname)" [email protected]. The grep passes only WARNING and CRITICAL lines, so a healthy run produces an empty body. Some mail implementations still send a message in that case; if yours does, look for an option that skips empty bodies (for example, -E in bsd-mailx).
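For the cron filter to work, the health script has to print WARNING or CRITICAL on the lines that breach a threshold. The following is a minimal sketch of that pattern, not the lab's script: the helper name `level_for` and the thresholds (80%/90% disk usage, load at 1x/2x core count) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# level_for VALUE WARN CRIT
# Prints OK, WARNING, or CRITICAL; awk is used so fractional values work.
level_for() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v >= c)      print "CRITICAL"
    else if (v >= w) print "WARNING"
    else             print "OK"
  }'
}

# Root filesystem usage (df -P gives one stable line per filesystem).
disk_pct=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
echo "disk /: ${disk_pct}% [$(level_for "$disk_pct" 80 90)]"

# 1-minute load relative to core count (Linux only).
if [ -r /proc/loadavg ]; then
  read -r load1 _ < /proc/loadavg
  cores=$(nproc)
  echo "load1: ${load1} [$(level_for "$load1" "$cores" "$((cores * 2))")]"
fi
```

Keeping the level in a fixed, greppable token is what lets the cron pipeline stay a one-liner.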
# Interpret load average values
echo "=== Load Average Interpretation ==="
echo "Format: 1-minute, 5-minute, 15-minute averages"
echo ""
echo "Load average = average number of processes running or waiting for CPU (Linux also counts uninterruptible I/O sleep)"
echo ""
echo "On a 4-CPU system:"
echo " Load 1.0 = 25% capacity (1 of 4 CPUs busy)"
echo " Load 4.0 = 100% capacity (all CPUs busy)"
echo " Load 8.0 = 200% capacity (overloaded!)"
echo ""
echo "Quick rule: Load should be <= number of CPU cores"
nproc
echo "CPU cores on this system (above)"
echo "Ideal load: <= $(nproc).0"
=== Load Average Interpretation ===
Format: 1-minute, 5-minute, 15-minute averages
Load average = average number of processes running or waiting for CPU (Linux also counts uninterruptible I/O sleep)
On a 4-CPU system:
Load 1.0 = 25% capacity (1 of 4 CPUs busy)
Load 4.0 = 100% capacity (all CPUs busy)
Load 8.0 = 200% capacity (overloaded!)
Quick rule: Load should be <= number of CPU cores
32
CPU cores on this system (above)
Ideal load: <= 32.0
# Human-readable memory overview
free -h
# More detailed with totals
free -h -t
# Show memory in megabytes
free -m
# /proc/meminfo for granular detail
head -20 /proc/meminfo
# Top 8 processes by CPU usage
ps aux --sort=-%cpu | head -8
echo "---"
# Top 8 processes by memory usage
ps aux --sort=-%mem | head -8
echo "---"
# Custom format: show PID, CPU, MEM, RSS, command
ps -eo pid,%cpu,%mem,rss,comm --sort=-%mem | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 12.0 0.0 4364 3364 ? Ss 05:49 0:00 bash
root 7 0.0 0.0 7064 3096 ? R 05:49 0:00 ps aux
root 8 0.0 0.0 2804 1476 ? S 05:49 0:00 head
---
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 12.0 0.0 4364 3368 ? Ss 05:49 0:00 bash
root 9 0.0 0.0 7064 3040 ? R 05:49 0:00 ps aux
root 10 0.0 0.0 2804 1552 ? S 05:49 0:00 head
---
PID %CPU %MEM RSS COMMAND
1 0.0 0.0 3364 bash
9 0.0 0.0 3040 ps
10 0.0 0.0 1552 head
# Single snapshot
vmstat
echo "---"
# 3 samples, 1 second apart (first row = since boot)
vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 0 96123024 893376 25630892 0 0 0 2 1 1 3 0 97 0 0
---
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 0 96123024 893376 25630892 0 0 0 2 1 1 3 0 97 0 0
3 0 0 96064688 893376 25673184 0 0 0 30488 11234 4326 8 2 90 0 0
2 1 0 96022744 893680 25695648 0 0 0 7864 13004 6338 8 3 89 0 0
echo "=== vmstat Column Reference ==="
echo "procs:"
echo " r = processes RUNNING or waiting for CPU (runqueue)"
echo " b = processes in UNINTERRUPTIBLE sleep (usually I/O wait)"
echo ""
echo "memory (kB):"
echo " swpd = swap in use"
echo " free = free RAM"
echo " buff = buffer cache (metadata)"
echo " cache= page cache (file data)"
echo ""
echo "swap:"
echo " si = swap IN (reading from swap to RAM) - high = memory pressure!"
echo " so = swap OUT (writing RAM to swap) - high = memory pressure!"
echo ""
echo "io (blocks/s):"
echo " bi = blocks read from disk"
echo " bo = blocks written to disk"
echo ""
echo "cpu (%):"
echo " us = user space CPU"
echo " sy = kernel/system CPU"
echo " id = idle CPU"
echo " wa = I/O wait (high = disk bottleneck)"
echo " st = stolen (by hypervisor, VMs only)"
=== vmstat Column Reference ===
procs:
r = processes RUNNING or waiting for CPU (runqueue)
b = processes in UNINTERRUPTIBLE sleep (usually I/O wait)
memory (kB):
swpd = swap in use
free = free RAM
buff = buffer cache (metadata)
cache= page cache (file data)
swap:
si = swap IN (reading from swap to RAM) - high = memory pressure!
so = swap OUT (writing RAM to swap) - high = memory pressure!
io (blocks/s):
bi = blocks read from disk
bo = blocks written to disk
cpu (%):
us = user space CPU
sy = kernel/system CPU
id = idle CPU
wa = I/O wait (high = disk bottleneck)
st = stolen (by hypervisor, VMs only)
# Start top (interactive; press 'q' to quit)
# top
# Non-interactive: take 1 snapshot (useful in scripts)
top -bn1 | head -20
top - 05:49:22 up 6 days, 7:19, 0 users, load average: 3.23, 1.94, 1.59
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 0.3 sy, 0.0 ni, 96.2 id, 0.2 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 124550.0 total, 93956.1 free, 4604.5 used, 25989.5 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 112328.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 4364 3364 3072 S 0.0 0.0 0:00.01 bash
7 root 20 0 7064 3096 2816 R 0.0 0.0 0:00.00 top
echo "=== top Interactive Keys ==="
echo "q = quit"
echo "h = help"
echo "k = kill a process (prompts for PID)"
echo "r = renice a process"
echo ""
echo "Sorting:"
echo "P = sort by CPU usage (default)"
echo "M = sort by Memory usage"
echo "T = sort by Time (accumulated CPU)"
echo "N = sort by PID"
echo ""
echo "Display toggles:"
echo "1 = toggle per-CPU stats"
echo "m = toggle memory display format"
echo "t = toggle task/CPU display format"
echo "c = toggle full command path"
echo "V = forest view (process tree)"
echo ""
echo "Filtering:"
echo "u = filter by username"
echo "L = locate (search for) a string; & = find next (htop uses / for search)"
echo "o = add filter (e.g., COMMAND=python3)"
=== top Interactive Keys ===
q = quit
h = help
k = kill a process (prompts for PID)
r = renice a process
Sorting:
P = sort by CPU usage (default)
M = sort by Memory usage
...