Lab 14: Regular Expressions

🎯 Objective

Master Python's re module — pattern matching, extraction, validation, and text transformation using regular expressions.

📚 Background

Regular expressions (regex) are a mini-language for describing text patterns. They're used everywhere: validating emails, parsing log files, extracting data from HTML, reformatting dates, and sanitizing user input. Python's re module implements Perl-style regex syntax (similar to, but not identical to, PCRE). While regex can look cryptic, a handful of patterns covers most everyday tasks.

⏱️ Estimated Time

35 minutes

📋 Prerequisites

  • Lab 13: Debugging & Testing

🛠️ Tools Used

  • Python 3.12 (re module — no install needed)

🔬 Lab Instructions

Step 1: Regex Basics — Matching Patterns
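
A minimal sketch of the basic matching functions. The sample string and variable names are illustrative assumptions, not taken from the lab's own listing:

```python
import re

text = "Order 66 shipped on 2024-03-15 to Alice"  # made-up sample text

# re.search scans the whole string and returns the first match (or None)
first_number = re.search(r"\d+", text)

# re.match only succeeds if the pattern matches at the very start
at_start = re.match(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r"\d+", text)

# re.fullmatch requires the entire string to match the pattern
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-03-15")

print(first_number.group())  # 66
print(at_start)              # None (the text starts with "Order")
print(all_numbers)           # ['66', '2024', '03', '15']
print(bool(is_date))         # True
```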

📸 Verified Output:

Step 2: Core Regex Syntax
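
A short sketch of the core building blocks, one per line. The sample string is an assumption chosen so that each pattern has something to match:

```python
import re

sample = "cat bat rat 42 cats colour color"  # illustrative sample text

digits = re.findall(r"\d+", sample)             # \d digit, + one or more
rhymes = re.findall(r"\b[cbr]at\b", sample)     # [...] character class, \b word boundary
spellings = re.findall(r"colou?r", sample)      # ? makes the preceding "u" optional
three_chars = re.findall(r"\b\w{3}\b", sample)  # \w word character, {3} exactly three
gaps = re.findall(r"\s+", sample)               # \s whitespace
any_char = re.findall(r"c.t", sample)           # . any character except newline
starts_with_cat = re.search(r"^cat", sample)    # ^ anchors to the start
ends_with_color = re.search(r"color$", sample)  # $ anchors to the end

print(digits)       # ['42']
print(rhymes)       # ['cat', 'bat', 'rat']  ("cats" fails the trailing \b)
print(spellings)    # ['colour', 'color']
print(three_chars)  # ['cat', 'bat', 'rat']
print(len(gaps))    # 6
print(any_char)     # ['cat', 'cat']  (the second comes from "cats")
```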

📸 Verified Output:

Step 3: Capturing Groups
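
A sketch of numbered, named, and non-capturing groups, assuming a made-up sentence containing two ISO-style dates:

```python
import re

date_text = "Released on 2024-03-15, patched on 2024-04-02"  # illustrative

# Numbered groups: each (...) captures one piece of the match
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", date_text)
year, month, day = m.groups()

# Named groups: (?P<name>...) lets you read captures by name
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date_text)
parts = named.groupdict()

# With several groups, findall returns a list of tuples
year_months = re.findall(r"(\d{4})-(\d{2})-\d{2}", date_text)

# (?:...) groups without capturing, so only the month is returned
months = re.findall(r"(?:\d{4})-(\d{2})", date_text)

print(year, month, day)  # 2024 03 15
print(parts)             # {'year': '2024', 'month': '03', 'day': '15'}
print(year_months)       # [('2024', '03'), ('2024', '04')]
print(months)            # ['03', '04']
```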

📸 Verified Output:

Step 4: Common Validation Patterns
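
A sketch of a few validation patterns. These are deliberately simplified assumptions for illustration: real email addresses and phone numbers are far more varied, and the date pattern checks only the shape, not whether the date exists. The helper name is_valid is hypothetical.

```python
import re

# Simplified, illustrative patterns (not exhaustive validators)
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid(pattern, value):
    """Return True only if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


print(is_valid(EMAIL, "alice@example.com"))  # True
print(is_valid(EMAIL, "alice@example"))      # False (no dot and TLD)
print(is_valid(US_PHONE, "555-123-4567"))    # True
print(is_valid(US_PHONE, "5551234567"))      # False (hyphens required here)
print(is_valid(ISO_DATE, "2024-13-45"))      # True (shape only, not a real date)
print(is_valid(HEX_COLOR, "#1a2B3c"))        # True
```

Using fullmatch rather than search matters for validation: search would accept any string that merely contains a valid-looking fragment.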

📸 Verified Output:

Step 5: re.sub() — Search and Replace
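
A sketch of the main ways to use re.sub(): a plain replacement string, backreferences, and a replacement function. The input strings and the helper name double are illustrative assumptions:

```python
import re

# Plain replacement: collapse runs of whitespace into one space
clean = re.sub(r"\s+", " ", "too    many   spaces")

# Backreferences: \1, \2, \3 refer to the captured groups
us_date = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "Due 2024-03-15")


# Function replacement: called once per match, returns the new text
def double(match):
    return str(int(match.group()) * 2)


doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# re.subn also reports how many replacements were made
masked, count = re.subn(r"\d", "#", "PIN 1234")

print(clean)          # too many spaces
print(us_date)        # Due 03/15/2024
print(doubled)        # 6 apples and 20 pears
print(masked, count)  # PIN #### 4
```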

📸 Verified Output:

Step 6: re.compile() — Pre-compiled Patterns
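
A sketch of pre-compiled patterns, assuming a small made-up list of lines. It also shows the re.IGNORECASE and re.VERBOSE flags, which are often set at compile time:

```python
import re

# Compile once, then reuse the pattern object's methods
WORD = re.compile(r"\b[a-z]+\b", re.IGNORECASE)

lines = ["Hello world", "Regex IS fun", "123 456"]  # illustrative input
counts = [len(WORD.findall(line)) for line in lines]

# re.VERBOSE ignores whitespace and allows comments inside the pattern
VERSION = re.compile(
    r"""
    (?P<major>\d+) \.   # major version
    (?P<minor>\d+) \.   # minor version
    (?P<patch>\d+)      # patch level
    """,
    re.VERBOSE,
)
version = VERSION.fullmatch("3.12.1").groupdict()

print(counts)   # [2, 3, 0]
print(version)  # {'major': '3', 'minor': '12', 'patch': '1'}
```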

📸 Verified Output:

Step 7: re.split() and Lookahead/Lookbehind
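
A sketch of re.split() followed by lookahead and lookbehind assertions. All input strings are made up for illustration:

```python
import re

# Split on any run of commas, semicolons, or whitespace
fields = re.split(r"[,;\s]+", "a, b;c   d")

# A capturing group in the pattern keeps the delimiters in the result
tokens = re.split(r"([+-])", "1+2-3")

# Lookahead (?=...): digits only when followed by " USD"
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Lookbehind (?<=...): digits only when preceded by "$"
dollar_values = re.findall(r"(?<=\$)\d+", "cost $15, tax $3, qty 7")

# Negative lookbehind (?<!...): whole numbers NOT preceded by "$"
plain_numbers = re.findall(r"(?<!\$)\b\d+", "cost $15, tax $3, qty 7")

print(fields)         # ['a', 'b', 'c', 'd']
print(tokens)         # ['1', '+', '2', '-', '3']
print(usd_amounts)    # ['10', '30']
print(dollar_values)  # ['15', '3']
print(plain_numbers)  # ['7']
```

The lookaround assertions check the surrounding text without including it in the match, which is why " USD" and "$" do not appear in the results.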

📸 Verified Output:

Step 8: Real-World Log Parser
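
A sketch of a small log parser that combines named groups with a compiled pattern. The log lines and the "date time [LEVEL] message" layout are assumptions invented for this example, as are the names LOG, LINE, and parse:

```python
import re
from collections import Counter

# Hypothetical log data in an assumed "date time [LEVEL] message" layout
LOG = """\
2024-03-15 10:00:01 [INFO] Server started on port 8080
2024-03-15 10:00:05 [WARNING] Disk usage at 91%
2024-03-15 10:01:12 [ERROR] Connection refused: db01
2024-03-15 10:02:47 [INFO] Request GET /health 200
2024-03-15 10:03:30 [ERROR] Timeout after 30s: db01
"""

LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.*)"
)


def parse(log_text):
    """Return one dict per line that matches LINE; skip anything else."""
    entries = []
    for line in log_text.splitlines():
        match = LINE.fullmatch(line)
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse(LOG)
level_counts = Counter(entry["level"] for entry in entries)
errors = [entry["message"] for entry in entries if entry["level"] == "ERROR"]

print(len(entries))        # 5
print(level_counts["INFO"], level_counts["ERROR"])  # 2 2
print(errors)  # ['Connection refused: db01', 'Timeout after 30s: db01']
```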

📸 Verified Output:

✅ Verification

Expected output:

🚨 Common Mistakes

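  1. Not using raw strings: re.search('\d', text) happens to work because \d is not a recognized string escape (Python 3.12 emits a SyntaxWarning for it), but '\b' in a normal string is a backspace character, not a word boundary. Always use raw strings such as r'\d'.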

  2. Greedy vs non-greedy: .* is greedy (matches as much as possible); use .*? for minimal matching.

  3. re.match vs re.search: match only checks the START; search scans the entire string.

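  4. Forgetting to compile for repeated use: Passing the same pattern string to re functions inside a loop adds overhead on every call (the re module caches recently used patterns, so the cost is a cache lookup rather than a full recompile). Use re.compile() to build the pattern once and give it a name.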

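  5. Catastrophic backtracking: Nested quantifiers like (a+)+ can take exponential time on long strings that almost match, so the program appears to hang. Python's re has no built-in timeout; simplify the pattern instead.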
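
The first three mistakes can be reproduced in a few lines. This is a sketch with made-up input strings; catastrophic backtracking is left out on purpose because a demonstration would stall the interpreter:

```python
import re

# 1. Raw strings: "\b" is one backspace character, r"\b" is a word boundary
not_raw = re.search("\bcat\b", "a cat")  # looks for backspace + "cat" + backspace
raw = re.search(r"\bcat\b", "a cat")     # looks for the whole word "cat"

# 2. Greedy vs non-greedy
html = "<b>one</b> <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)  # .* runs to the LAST </b>
lazy = re.findall(r"<b>.*?</b>", html)   # .*? stops at the FIRST </b>

# 3. re.match vs re.search
matched = re.match(r"world", "hello world")    # only tries position 0
searched = re.search(r"world", "hello world")  # scans the whole string

print(not_raw, raw.group())      # None cat
print(greedy)                    # ['<b>one</b> <b>two</b>']
print(lazy)                      # ['<b>one</b>', '<b>two</b>']
print(matched, searched.span())  # None (6, 11)
```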

📝 Summary

  • re.search() finds first match; re.findall() returns all; re.match() matches at start

  • Core syntax: \d digit, \w word character, \s whitespace, . any character except newline, + one or more, * zero or more, ? optional

  • Groups () capture; named groups (?P<name>...) for readability

  • re.sub(pattern, replacement, text) — search and replace

  • re.compile() — compile once for repeated use (performance)

  • Lookahead (?=...) / lookbehind (?<=...) — match positions without consuming

🔗 Further Reading
