Lab 18: System Monitoring & Performance
Time: 30 minutes | Level: Practitioner | Docker: docker run -it --rm ubuntu:22.04 bash
Overview
Performance analysis is a critical sysadmin skill. In this lab you will use vmstat, iostat, sar, top, uptime, and /proc files to measure CPU, memory, and disk I/O, and to identify system bottlenecks with real data from a running system.
Prerequisites: Docker installed, Labs 01–15 completed.
Step 1: uptime & Load Averages
Load average is usually the first metric to check: a single command shows whether work is queuing for the CPU or for I/O.
docker run -it --rm ubuntu:22.04 bash
uptime
cat /proc/loadavg
📸 Verified Output:
05:50:18 up 6 days, 7:20, 0 users, load average: 3.51, 2.26, 1.72
3.51 2.26 1.72 3/789 1031
Interpreting load average:
Load average = average number of runnable + uninterruptible-sleep processes over 1/5/15 minutes.
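| System | CPUs | Load 1.0 | Load 2.0 | Overloaded at 2.0? |
|---|---|---|---|---|
| Single CPU | 1 | 100% busy | 200% (queued) | ✅ Yes |
| Quad core | 4 | 25% busy | 50% busy | ❌ Fine |
| 32 CPUs | 32 | 3% busy | 6% busy | ❌ Fine |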
💡 The 15-minute load average tells the trend. If 1min > 15min, load is increasing. If 1min < 15min, load is decreasing. A 1min spike might be a cron job; sustained high 15min average means a real problem.
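The per-CPU reading and the 1-minute versus 15-minute trend rule can be sketched as a small helper. This is a minimal illustration, not part of the original lab; `load_report` is a hypothetical name.

```shell
# load_report LOAD1 LOAD15 NCPU   (hypothetical helper)
# Prints the 1-minute load per CPU and whether load is rising or falling.
load_report() {
  awk -v l1="$1" -v l15="$2" -v n="$3" 'BEGIN {
    trend = (l1 > l15) ? "rising" : ((l1 < l15) ? "falling" : "steady")
    printf "load/cpu=%.2f trend=%s\n", l1 / n, trend
  }'
}

# Feed it live values: fields 1 and 3 of /proc/loadavg are the 1- and 15-minute averages.
if [ -r /proc/loadavg ]; then
  read -r l1 l5 l15 rest < /proc/loadavg
  load_report "$l1" "$l15" "$(nproc)"
fi
```

A load/cpu value near or above 1.00 means every CPU has work queued. With the sample output above (3.51 against 1.72), the trend would read as rising.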
Step 2: top — Real-time Process Monitoring
CPU line breakdown (%Cpu(s)):
| Field | Meaning | What it suggests |
|---|---|---|
| us | User space CPU | Normal workload |
| sy | System/kernel CPU | Too many syscalls? |
| ni | Nice (low-priority) processes | Usually fine |
| id | Idle | Low idle = busy |
| wa | I/O wait | Disk bottleneck! |
| hi | Hardware interrupts | Network/device overload |
| si | Software interrupts | Usually network |
| st | Steal time | VM CPU being taken by hypervisor |
top interactive keys:
| Key | Action |
|---|---|
| P | Sort by CPU usage |
| M | Sort by memory usage |
| k | Kill a process (enter PID) |
| r | Renice a process |
| 1 | Show per-CPU stats |
| H | Show threads |
| q | Quit |
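The interactive keys need a terminal. For scripts and logs, the batch-mode commands listed in this lab's summary give the same ranking without one. This is a sketch; `process_snapshot` is a hypothetical wrapper name, and `--sort` assumes the procps version of `ps` that Ubuntu ships.

```shell
# process_snapshot: one-shot, non-interactive view of the busiest processes.
process_snapshot() {
  if command -v top >/dev/null 2>&1; then
    # -b = batch mode (plain text), -n 1 = a single iteration
    top -bn1 | head -n 12
  fi
  if command -v ps >/dev/null 2>&1; then
    ps aux --sort=-%cpu | head -n 6   # header plus top 5 by CPU
    ps aux --sort=-%mem | head -n 6   # header plus top 5 by memory
  fi
}

process_snapshot
```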
💡 %st (steal time) above 5% in a VM means the hypervisor is running other guests while your VM is ready to run, which usually indicates an oversubscribed host. Your VM is waiting for CPU that was promised to it. Consider upgrading the instance type or moving to a dedicated host.
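The percentages on top's CPU line are differences between kernel counters. A minimal sketch of the same calculation from two samples of the first line of /proc/stat follows. `cpu_breakdown` is a hypothetical helper, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for /proc/stat.

```shell
# cpu_breakdown LINE1 LINE2   (hypothetical helper)
# Each argument is the aggregate "cpu ..." line from /proc/stat at a different time.
cpu_breakdown() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i; next }
    {
      total = 0
      for (i = 2; i <= 9; i++) { d[i] = $i - first[i]; total += d[i] }
      if (total == 0) total = 1          # avoid dividing by zero on identical samples
      printf "us=%.1f sy=%.1f id=%.1f wa=%.1f st=%.1f\n",
        100 * d[2] / total, 100 * d[4] / total, 100 * d[5] / total,
        100 * d[6] / total, 100 * d[9] / total
    }'
}

# Sample one second apart, the way top does between refreshes.
if [ -r /proc/stat ]; then
  s1=$(head -n 1 /proc/stat)
  sleep 1
  s2=$(head -n 1 /proc/stat)
  cpu_breakdown "$s1" "$s2"
fi
```

A `st` value that stays above 5 over repeated samples is the steal-time condition described in the tip above.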
Step 3: vmstat — Virtual Memory Statistics
vmstat gives a compact view of processes, memory, I/O, and CPU.
vmstat column guide:
| Group | Column | Meaning |
|---|---|---|
| procs | r | Runnable processes (in CPU queue) |
| procs | b | Blocked (waiting for I/O) |
| memory | swpd | Swap space in use (KB) |
| memory | free | Idle memory (KB) |
| memory | buff | Memory used as buffers (KB) |
| memory | cache | Memory used as cache (KB) |
| swap | si | Swap-in per second (KB/s) |
| swap | so | Swap-out per second (KB/s) |
| io | bi | Blocks read from devices per second |
| io | bo | Blocks written to devices per second |
| system | in | Interrupts per second |
| system | cs | Context switches per second |
| cpu | us/sy/id/wa/st | CPU percentages |
Red flags in vmstat:
- r consistently > number of CPUs → CPU bottleneck
- b > 0 regularly → I/O bottleneck
- si/so > 0 → Swapping (memory pressure!)
- wa > 20% → Disk I/O wait
💡 The first vmstat line shows averages since boot. Start reading from the second line for current activity. Use vmstat -s for a full memory summary, and vmstat -d for disk statistics.
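The red flags above can be checked mechanically. The sketch below reads vmstat output on stdin, skips the headers and the since-boot line, and prints a message for each sample that crosses a threshold. `flag_vmstat` is a hypothetical helper, and the column positions assume the standard layout in the column guide. It flags single samples, so "consistently" and "regularly" remain a judgment over several lines of output.

```shell
# flag_vmstat NCPU   (hypothetical helper; reads `vmstat <interval> <count>` on stdin)
flag_vmstat() {
  awk -v ncpu="${1:-1}" '
    $1 !~ /^[0-9]+$/ { next }        # skip the two header lines
    ++n == 1         { next }        # skip the first data line (averages since boot)
    $1 > ncpu        { print "CPU queue: r=" $1 }
    $2 > 0           { print "blocked on I/O: b=" $2 }
    $7 > 0 || $8 > 0 { print "swapping: si=" $7 " so=" $8 }
    $16 > 20         { print "I/O wait: wa=" $16 "%" }
  '
}

# Three samples one second apart; only the second and third are evaluated.
if command -v vmstat >/dev/null 2>&1; then
  vmstat 1 3 | flag_vmstat "$(nproc)"
fi
```

No output means no sample crossed a threshold during the window.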
Step 4: free — Memory Analysis
Memory concepts:
| Field | Meaning | Interpretation |
|---|---|---|
| used | Allocated by processes | Normal; monitor the trend |
| buff/cache | Kernel disk cache | Normal; the kernel reclaims it when needed |
| available | What apps can actually use | Low available → add RAM |
| Swap used | Overflow to disk | Investigate memory leak! |
💡 MemAvailable is more accurate than MemFree for determining actual free memory. MemFree excludes cache, but the kernel will reclaim cache when needed. MemAvailable accounts for this and shows what's truly available for new processes.
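To see the gap between the two figures on a live system, the values can be read straight from /proc/meminfo. This is a sketch; `mem_summary` is a hypothetical helper, and the "reclaimable" number is only the difference MemAvailable minus MemFree, a rough indication rather than a kernel-reported value.

```shell
# mem_summary [FILE]   (hypothetical helper; FILE defaults to /proc/meminfo)
mem_summary() {
  awk '
    $1 == "MemTotal:"     { total = $2 }   # values are in kB
    $1 == "MemFree:"      { free  = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0)
        printf "free=%d%% available=%d%% reclaimable=%d MiB\n",
          100 * free / total, 100 * avail / total, (avail - free) / 1024
    }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  mem_summary
fi
```

A system with a low free percentage but a high available percentage is healthy: most of its memory is serving as cache.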
Step 5: iostat — Disk I/O Analysis
Key iostat columns:
| Column | Meaning | Warning sign |
|---|---|---|
| r/s, w/s | Reads/writes per second | Very high = busy disk |
| rkB/s, wkB/s | Throughput (KB/s) | Near disk max = saturated |
| r_await, w_await | Average I/O latency (ms) | > 20 ms for HDD, > 1 ms for SSD |
| %util | Disk utilization | > 80% = potential bottleneck |
| %iowait (CPU) | CPU waiting for I/O | > 20% = disk bottleneck |
💡 %util near 100% means the disk is saturated. For SSDs, %util can be misleading since they handle parallel I/O, so look at await latency instead. High r_await with low %util can indicate a slow SAN or NFS mount.
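The 80% utilization threshold from the table can be applied to `iostat -x` output with a short filter. Because the column order differs between sysstat versions, this sketch locates `%util` by its header name instead of a fixed position. `flag_busy_disks` is a hypothetical helper. `iostat` comes from the sysstat package, which may need to be installed first.

```shell
# flag_busy_disks [LIMIT]   (hypothetical helper; reads `iostat -x` output on stdin)
flag_busy_disks() {
  awk -v limit="${1:-80}" '
    NF == 0        { col = 0; next }                    # blank line ends a device table
    $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && $col + 0 > limit { printf "%s %%util=%s\n", $1, $col }
  '
}

# LC_ALL=C keeps the decimal separator a dot so the numeric comparison works.
if command -v iostat >/dev/null 2>&1; then
  LC_ALL=C iostat -x 1 3 | flag_busy_disks 80
fi
```

As the tip above notes, a flagged SSD still deserves a look at r_await and w_await before it is called saturated.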
Step 6: sar — System Activity Reporter
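sar collects and reports system activity. Historical reports require sadc, the system activity data collector from the sysstat package, to be run periodically; without saved data, sar can only sample live.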
sar command flags:
| Command | Reports |
|---|---|
| sar -u 1 5 | CPU utilization (5 samples, 1s interval) |
| sar -r 1 5 | Memory utilization |
| sar -d 1 5 | Disk I/O activity |
| sar -n DEV 1 5 | Network interface statistics |
| sar -q 1 5 | Load average and queue |
| sar -b 1 5 | I/O and transfer rate |
| sar -f /var/log/sa/sa15 | Read saved data (15th of month); on Debian/Ubuntu the directory is /var/log/sysstat/ |
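💡 Enable sadc collection for historical data. On Ubuntu: systemctl enable --now sysstat (this needs a host or VM running systemd, so it does not work inside the lab's plain Docker container). This runs sadc every 10 minutes, storing data in /var/log/sysstat/ on Debian and Ubuntu (/var/log/sa/ on RHEL-family systems). After 24h you can run sar -u without arguments to see today's history, or name the file explicitly, for example sar -u -f /var/log/sysstat/sa$(date +%d).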
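Because the data directory differs between distribution families, a script that reads saved history may want to look in both places. A minimal sketch follows. `find_sa_file` and the `SA_DIRS` override are hypothetical names introduced here; the two default directories are the usual Debian/Ubuntu and RHEL-family locations.

```shell
# find_sa_file [DAY]   (hypothetical helper; DAY is a two-digit day of month, default today)
# Prints the path of the first matching daily data file, or returns 1 if none exists.
find_sa_file() {
  day="${1:-$(date +%d)}"
  for dir in ${SA_DIRS:-/var/log/sysstat /var/log/sa}; do
    if [ -f "$dir/sa$day" ]; then
      echo "$dir/sa$day"
      return 0
    fi
  done
  return 1
}

if sa_file=$(find_sa_file) && command -v sar >/dev/null 2>&1; then
  sar -u -f "$sa_file"          # CPU history from today's file
else
  echo "no saved sar data for today (is collection enabled?)"
fi
```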
Step 7: Bottleneck Identification Methodology
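💡 Performance tuning order: Always check in this sequence: CPU → Memory → Disk I/O → Network. A disk bottleneck often masquerades as a CPU problem: processes blocked on I/O count toward the load average, and the time they spend waiting is reported as %iowait, not as idle, even though the CPU is doing no work. Check %iowait first when CPU looks high.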
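The check order can be written down as a small decision function. This is a sketch of the reasoning, not a definitive diagnostic. `triage` is a hypothetical name; the load-versus-CPU-count and 20% iowait thresholds come from earlier steps, while the 10% available-memory threshold is an assumption chosen for illustration.

```shell
# triage LOAD1 NCPU IOWAIT_PCT AVAIL_MEM_PCT   (hypothetical helper)
triage() {
  awk -v load="$1" -v ncpu="$2" -v wa="$3" -v avail="$4" 'BEGIN {
    if (load > ncpu && wa > 20) print "disk I/O (high load driven by iowait)"
    else if (load > ncpu)       print "CPU"
    else if (avail < 10)        print "memory"    # 10% is an assumed threshold
    else if (wa > 20)           print "disk I/O"
    else                        print "no obvious bottleneck; check network next"
  }'
}

# Example: load 6 on 4 CPUs with 35% iowait points at the disk, not the CPU.
triage 6 4 35 50
```

The first branch encodes the tip above: when load is high, iowait is examined before the CPU is blamed.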
Step 8: Capstone — Comprehensive Performance Dashboard
Scenario: Your manager asks for a 1-page performance snapshot to baseline a new server.
💡 Schedule this dashboard with cron for shift handover reports. Run every 8 hours and email the output:
0 */8 * * * /usr/local/bin/perf-dashboard.sh | mail -s "Server Status $(hostname)" [email protected]
Over time, you build a performance baseline that makes anomalies obvious.
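The body of perf-dashboard.sh is not reproduced in this text. As a stand-in, here is a hypothetical minimal sketch of what such a script might contain, built only from tools and /proc files used in this lab plus `df`. The section layout and the `perf_dashboard` function name are assumptions.

```shell
#!/bin/sh
# perf-dashboard.sh: hypothetical sketch of a one-page snapshot, not the lab's original script.
perf_dashboard() {
  echo "== Snapshot: $(uname -n) $(date '+%Y-%m-%d %H:%M:%S') =="

  echo "== Load (1/5/15 min) and CPU count =="
  if [ -r /proc/loadavg ]; then cut -d' ' -f1-3 /proc/loadavg; fi
  if command -v nproc >/dev/null 2>&1; then nproc; fi

  echo "== Memory =="
  if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
  fi

  echo "== Top 5 processes by CPU =="
  if command -v ps >/dev/null 2>&1; then ps aux --sort=-%cpu | head -n 6; fi

  echo "== Root filesystem =="
  df -h /
}

perf_dashboard
```

Saving each run to a dated file as well as mailing it would make later comparisons against the baseline easier.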
Summary
| Tool | Purpose | Key flags |
|---|---|---|
| uptime | Load average overview | - |
| /proc/loadavg | Raw load average data | - |
| top -bn1 | Process list snapshot | -b batch, -n iterations |
| free -h | Memory overview | -h human-readable |
| /proc/meminfo | Detailed memory stats | - |
| vmstat 1 5 | System-wide stats | 1 interval, 5 count |
| iostat -x 1 3 | Disk I/O detail | -x extended stats |
| sar -u 1 5 | CPU history | -r memory, -d disk |
| ps aux --sort=-%cpu | Process CPU ranking | --sort=-%mem for memory |
| nproc | CPU count | - |
