Lab 03: Generators & itertools

Objective

Build lazy data pipelines using generators, yield from, custom iterators, and the itertools module for memory-efficient data processing.

Time

30 minutes

Prerequisites

  • Python Foundations Lab 04 (Comprehensions)

Tools

  • Docker image: zchencow/innozverse-python:latest


Lab Instructions

Step 1: Generators & yield

docker run --rm zchencow/innozverse-python:latest python3 -c "
import sys

# Generator function — lazy evaluation
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def take(n, iterable):
    for i, val in enumerate(iterable):
        if i >= n: break
        yield val

gen = fibonacci()
first_10 = list(take(10, gen))
print('First 10 fib:', first_10)

# Memory comparison: list vs generator
def range_gen(n):
    i = 0
    while i < n:
        yield i
        i += 1

N = 1_000_000
lst = list(range(N))
gen = range_gen(N)
print(f'List size:      {sys.getsizeof(lst):,} bytes')
print(f'Generator size: {sys.getsizeof(gen)} bytes')  # small constant size, regardless of N

# Generator expression
squares_gen = (x**2 for x in range(10))
print('Squares:', list(squares_gen))

# send() — two-way communication
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None: break
        total += value

acc = accumulator()
next(acc)  # prime the generator
for val in [10, 20, 30, 40]:
    total = acc.send(val)
    print(f'  sent {val:3d}, running total: {total}')
"

💡 Generators are lazy — they compute values one at a time, only when asked. Calling a generator function returns a generator object without running any of its body. This is why a generator for a million items occupies only a small constant amount of memory (on the order of 100–200 bytes, varying by Python version): it stores only the current state (local variables, instruction pointer), not all the values.
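You can observe the deferred execution directly. This quick sketch (not one of the lab's verified commands) shows that calling a generator function runs none of its body — the `print` inside fires only on the first `next()`:

```python
def noisy():
    print('body started')   # runs on the first next(), not at call time
    yield 1
    yield 2

g = noisy()                  # no output yet — just a generator object
print(type(g).__name__)      # generator
print(next(g))               # prints 'body started', then 1
print(next(g))               # 2
```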

📸 Verified Output:


Step 2: yield from & Custom Iterators
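
The step's docker command and verified output aren't reproduced here. As a rough sketch of the two techniques this step covers — `yield from` delegation and the `__iter__`/`__next__` protocol — something like the following:

```python
# yield from — delegate to a sub-generator (here, recursively)
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # hand control to the sub-generator
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]

# Custom iterator via the __iter__ / __next__ protocol
class Countdown:
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        return self                     # an iterator returns itself
    def __next__(self):
        if self.current <= 0:
            raise StopIteration         # signals exhaustion to for-loops
        self.current -= 1
        return self.current + 1

print(list(Countdown(3)))  # [3, 2, 1]
```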

📸 Verified Output:


Steps 3–8: itertools, Data Pipeline, CSV Processing, Chunking, Windowing, Capstone
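
The commands for these steps aren't reproduced here. A condensed, illustrative sketch of the techniques the step titles name — `itertools` basics plus hand-rolled chunking and windowing generators — might look like:

```python
import itertools as it

# chain — concatenate iterables without copying
print(list(it.chain([1, 2], (3, 4))))            # [1, 2, 3, 4]

# islice — lazy slice, even of an infinite iterator
evens = it.count(0, 2)
print(list(it.islice(evens, 5)))                 # [0, 2, 4, 6, 8]

# groupby — groups *consecutive* equal keys, so sort first if needed
words = ['apple', 'ant', 'bee', 'bat', 'cat']
for letter, group in it.groupby(words, key=lambda w: w[0]):
    print(letter, list(group))

# accumulate — running totals
print(list(it.accumulate([1, 2, 3, 4])))         # [1, 3, 6, 10]

# Chunking: fixed-size batches from any iterable (Python 3.8+ walrus)
def chunked(iterable, size):
    iterator = iter(iterable)
    while batch := list(it.islice(iterator, size)):
        yield batch

print(list(chunked(range(7), 3)))                # [[0, 1, 2], [3, 4, 5], [6]]

# Windowing: overlapping sliding windows
def windowed(iterable, size):
    window = []
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield tuple(window)
            window.pop(0)                        # slide forward by one

print(list(windowed([1, 2, 3, 4, 5], 3)))        # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```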

📸 Verified Output:


Summary

Tool                     Use case
----                     --------
yield                    Produce values lazily, one at a time
yield from               Delegate to a sub-generator
__iter__ / __next__      Custom iterator protocol
itertools.chain          Concatenate iterables without copying
itertools.islice         Lazy slice of any iterable
itertools.groupby        Group consecutive equal items
itertools.accumulate     Running total/max/min
Generator pipeline       Chain transformations lazily
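
A generator pipeline composes the other tools: each stage is itself a lazy iterable, so no work happens until the final consumer pulls values. A minimal sketch (illustrative, with simulated input rather than the lab's data):

```python
import itertools

lines = ['10', ' 20', '', 'x', '30']            # simulated raw input lines

# Each stage wraps the previous one; nothing runs until list() pulls
stripped = (ln.strip() for ln in lines)
nonempty = (ln for ln in stripped if ln)
numbers  = (int(ln) for ln in nonempty if ln.isdigit())
running  = itertools.accumulate(numbers)         # running totals, still lazy

print(list(running))  # [10, 30, 60]
```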
