Build lazy data pipelines using generators, yield from, custom iterators, and the itertools module for memory-efficient data processing.
Time
30 minutes
Prerequisites
Python Foundations Lab 04 (Comprehensions)
Tools
Docker image: zchencow/innozverse-python:latest
Lab Instructions
Step 1: Generators & yield
```shell
docker run --rm zchencow/innozverse-python:latest python3 -c "
import sys

# Generator function — lazy evaluation
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def take(n, iterable):
    for i, val in enumerate(iterable):
        if i >= n:
            break
        yield val

gen = fibonacci()
first_10 = list(take(10, gen))
print('First 10 fib:', first_10)

# Memory comparison: list vs generator
def range_gen(n):
    i = 0
    while i < n:
        yield i
        i += 1

N = 1_000_000
lst = list(range(N))
gen = range_gen(N)
print(f'List size: {sys.getsizeof(lst):,} bytes')
print(f'Generator size: {sys.getsizeof(gen)} bytes')  # ~120 bytes always

# Generator expression
squares_gen = (x**2 for x in range(10))
print('Squares:', list(squares_gen))

# send() — two-way communication
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)  # prime the generator
for val in [10, 20, 30, 40]:
    total = acc.send(val)
    print(f'  sent {val:3d}, running total: {total}')
"
```
💡 Generators are lazy — they compute values one at a time, only when asked. A generator function returns a generator object without running any code. This is why a generator for a million items uses ~120 bytes: it stores only the current state (local variables, instruction pointer), not all the values.
📸 Verified Output:
Step 2: yield from & Custom Iterators
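The lab's own Step 2 code is not reproduced here. As a minimal sketch of the two techniques the heading names, the example below uses a hypothetical `flatten` generator (delegating to itself with `yield from`) and a hypothetical `Countdown` class (a custom iterator implementing the `__iter__`/`__next__` protocol); both names are illustrations, not part of the lab.

```python
# yield from: delegate to a sub-generator (here, a recursive call)
def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # hand control to the inner generator
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]

# Custom iterator: any class with __iter__ and __next__ works in a for loop
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals "no more values"
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(5)))  # [5, 4, 3, 2, 1]
```

`yield from` also forwards `send()` and exceptions to the sub-generator, which a plain `for item in sub: yield item` loop does not.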
📸 Verified Output:
Steps 3–8: itertools, Data Pipeline, CSV Processing, Chunking, Windowing, Capstone
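The remaining steps are summarized rather than shown. As a hedged sketch of two of the named topics (chunking and windowing) built on `itertools`, the helpers below — `chunked` and `windowed` are illustrative names, not the lab's code — stay lazy end to end: nothing is materialized until the consumer asks.

```python
from itertools import islice, tee

# Chunking: break an iterable into fixed-size lists (last chunk may be short)
def chunked(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):  # islice pulls at most `size` items
        yield chunk

# Windowing: sliding windows of width n via tee'd, staggered iterators
def windowed(iterable, n):
    iters = tee(iterable, n)          # n independent copies of the stream
    for i, it in enumerate(iters):
        for _ in range(i):            # advance copy i by i positions
            next(it, None)
    return zip(*iters)                # zip stops at the shortest copy

print(list(chunked(range(10), 4)))    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(windowed(range(5), 3)))    # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```

Because both helpers accept any iterable and yield lazily, they compose directly with the Step 1 generators, e.g. `chunked(take(100, fibonacci()), 10)`.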