The Unix text processing triad — grep, awk, and sed — is your Swiss Army knife for log analysis, data transformation, and automation. In this lab you will master extended regular expressions, field-based processing, and stream editing, and build real-world log analysis pipelines.
💡 Keep a test dataset for practicing text processing. Real log files can be huge. Before running sed -i (in-place edit) on a production log, always test on a copy. Use cp /var/log/nginx/access.log /tmp/test.log to make a safe copy.
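That test-first workflow might look like this (the file names and sample lines are illustrative, not a real log):

```shell
# Create a throwaway file to practice on (sample data, not a real log)
printf 'GET /index.html 200\nGET /missing 404\n' > /tmp/test.log

# Dry run: without -i, sed prints the result and leaves the file untouched
sed 's/404/NOT_FOUND/' /tmp/test.log

# Once the output looks right, edit in place, keeping a .bak backup
sed -i.bak 's/404/NOT_FOUND/' /tmp/test.log
```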
💡 grep -o extracts just the matching part. Combined with sort and uniq, it's powerful: grep -oE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' access.log | sort | uniq -c | sort -rn extracts and counts all IP addresses.
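You can see the extract-and-count pattern work on a few made-up lines (the file path is just for illustration):

```shell
# Toy input: each line mentions an IP address somewhere
printf 'client 10.0.0.1 ok\nclient 10.0.0.2 ok\nclient 10.0.0.1 err\n' > /tmp/ips.txt

# -o prints only the matched text, one match per line;
# sort | uniq -c | sort -rn then counts and ranks the matches
grep -oE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' /tmp/ips.txt |
  sort | uniq -c | sort -rn
```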
Step 3: awk — Field-Based Processing
awk processes text field by field. It's a complete programming language built for columnar data.
📸 Verified Output:
💡 awk uses $0 for the entire line, $1–$NF for fields, NR for line number, NF for field count. Change the field separator with -F ':' for colon-delimited files (like /etc/passwd): awk -F: '{print $1, $3}' /etc/passwd prints usernames and UIDs.
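For instance, on a small colon-delimited file shaped like /etc/passwd (the rows are invented for the demo):

```shell
# Sample colon-delimited data in the shape of /etc/passwd
printf 'root:x:0:0\ndaemon:x:1:1\nalice:x:1000:1000\n' > /tmp/users.txt

# -F: sets the field separator; $1 and $3 pick out name and UID
awk -F: '{print $1, $3}' /tmp/users.txt

# NR is the current line number, NF the field count on that line
awk -F: '{print NR": "$1" has "NF" fields"}' /tmp/users.txt
```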
Step 4: awk Advanced — Aggregation & Reporting
📸 Verified Output:
💡 awk arrays are associative (hash maps). You can accumulate any key-value data: awk '{sum[$1]+=$10} END{for(ip in sum) print ip, sum[ip]}' access.log gives total bytes per IP. Arrays are automatically created when first referenced — no declaration needed.
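A self-contained sketch of that bytes-per-IP aggregation, using three toy lines in common log format (where $1 is the client IP and $10 the response size):

```shell
# Toy access-log lines; real logs follow the same field layout
cat > /tmp/mini.log <<'EOF'
10.0.0.1 - - [15/Jan/2024:08:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [15/Jan/2024:08:00:02 +0000] "GET /a HTTP/1.1" 200 1024
10.0.0.1 - - [15/Jan/2024:08:00:03 +0000] "GET /b HTTP/1.1" 404 256
EOF

# sum[] springs into existence on first reference; END runs after the last line
awk '{sum[$1] += $10} END {for (ip in sum) print ip, sum[ip]}' /tmp/mini.log
# 10.0.0.1 totals 768 bytes, 10.0.0.2 totals 1024 (output order may vary)
```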
Step 5: sed — Stream Editor for Transformations
sed edits text streams line by line using commands.
📸 Verified Output:
💡 sed -i edits files in-place — always test first without -i. On macOS, sed -i requires an extension argument: sed -i '' 's/old/new/' file. On Linux, sed -i 's/old/new/' file works directly. Use sed -i.bak to create a backup before editing.
Step 6: sed Advanced — In-place Editing & Config Management
📸 Verified Output:
💡 Use | as a delimiter in sed when the pattern contains /. For example, sed 's|/old/path|/new/path|g' avoids escaping slashes. You can use any character: sed 's#old#new#g' works too. This is essential when editing file paths or URLs.
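Side by side, the two delimiter choices look like this (the config line is invented):

```shell
printf 'root=/old/path/app\n' > /tmp/conf.txt

# With the default / delimiter, every slash in the path needs escaping
sed 's/\/old\/path/\/new\/path/' /tmp/conf.txt

# With | as the delimiter, the paths stay readable
sed 's|/old/path|/new/path|' /tmp/conf.txt
# both print: root=/new/path/app
```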
Step 7: Combining grep + awk + sed in Pipelines
📸 Verified Output:
💡 Build pipelines incrementally. Start with cat file, add | grep pattern, check output, add | awk ..., check again. Never write a 5-stage pipeline from scratch — build and verify each stage. Use | head -5 to preview without processing everything.
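The incremental build-up might look like this (the sample log is invented for the demo):

```shell
cat > /tmp/pipe.log <<'EOF'
10.0.0.1 - - [15/Jan/2024:08:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [15/Jan/2024:08:00:02 +0000] "GET /x HTTP/1.1" 404 256
10.0.0.1 - - [15/Jan/2024:08:00:03 +0000] "GET /y HTTP/1.1" 404 128
10.0.0.2 - - [15/Jan/2024:08:00:04 +0000] "GET /z HTTP/1.1" 404 64
EOF

# Stage 1: eyeball the data
head -5 /tmp/pipe.log
# Stage 2: filter to error lines only
grep ' 404 ' /tmp/pipe.log
# Stage 3: keep just the IP column
grep ' 404 ' /tmp/pipe.log | awk '{print $1}'
# Stage 4: count and rank offenders
grep ' 404 ' /tmp/pipe.log | awk '{print $1}' | sort | uniq -c | sort -rn
```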
Step 8: Capstone — Complete Log Analysis Script
Scenario: Build a production-ready log analyzer that generates an HTML-friendly report.
📸 Verified Output:
💡 This script is a foundation for a real log monitoring tool. Add --since filtering with awk '$4 > "[15/Jan/2024:08:00:05"' (note this is a plain string comparison, so it only works reliably within a single day, because the month name is alphabetic), email output with | mail -s "Log Report" [email protected], or schedule with cron. The grep+awk+sed combination handles most structured text files — Apache logs, nginx logs, custom app logs.
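As a starting point, a stripped-down sketch of such a report generator might look like this (file names, field positions, and report sections are illustrative assumptions, not the lab's actual script):

```shell
# Sample data stands in for a real access log
cat > /tmp/demo.log <<'EOF'
10.0.0.1 - - [15/Jan/2024:08:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [15/Jan/2024:08:00:02 +0000] "GET /x HTTP/1.1" 404 256
10.0.0.2 - - [15/Jan/2024:08:00:03 +0000] "GET /y HTTP/1.1" 200 128
EOF

{
  echo "<h2>Top client IPs</h2><pre>"
  awk '{hits[$1]++} END {for (ip in hits) print hits[ip], ip}' /tmp/demo.log | sort -rn
  echo "</pre>"

  echo "<h2>Status codes</h2><pre>"
  awk '{codes[$9]++} END {for (c in codes) print c, codes[c]}' /tmp/demo.log | sort
  echo "</pre>"
} > /tmp/report.html
```

Wrapping each table in &lt;pre&gt; keeps the report HTML-friendly without needing any templating; each section is just one awk aggregation piped through sort.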