Lab 04: Memory Allocator

Time: 60 minutes | Level: Architect | Docker: docker run -it --rm python:3.11-slim bash

Overview

CPython uses a layered memory allocator (pymalloc) on top of the system allocator, with a cyclic garbage collector for reference cycles. This lab covers tracemalloc for memory profiling, the gc module internals, and weakref for cache patterns.

Step 1: CPython Memory Architecture

import sys

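# CPython pymalloc: arenas → pools → blocks
# - Arenas: large chunks requested from the OS (256 KiB before Python 3.10;
#   1 MiB on 64-bit builds of 3.10 and later, including this lab's 3.11 image)
# - Pools: fixed-size regions within arenas (4 KiB before 3.10, 16 KiB since on 64-bit),
#   one size class per pool
# - Blocks: fixed-size slots within pools, up to 512 bytes
#   (size classes are multiples of 16 bytes on 64-bit builds)
# Requests larger than 512 bytes go directly to the system allocator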

# getsizeof returns the object's own memory (not referenced objects)
data = [1, 2, 3, 4, 5]
print(f"List (5 ints) getsizeof: {sys.getsizeof(data)} bytes")
print("  Note: doesn't include the ints themselves")

# Recursive size calculation
def deep_sizeof(obj, seen=None):
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

nested = {'users': [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]}
print(f"\nNested dict getsizeof: {sys.getsizeof(nested)} bytes")
print(f"Nested dict deep_sizeof: {deep_sizeof(nested)} bytes")

💡 sys.getsizeof counts only the container's own memory, not the objects it references. A recursive traversal such as deep_sizeof gives a closer estimate, although it still ignores instance attributes and allocator overhead.

Step 2: tracemalloc — Memory Profiling
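
A minimal sketch of basic tracemalloc usage, not the lab's original listing. The `payload` list is a hypothetical workload chosen only to give the tracer something to measure, and the exact byte counts depend on the interpreter build.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: 10,000 small strings held in a list.
payload = [f"item-{i}" for i in range(10_000)]

# get_traced_memory() returns (current, peak) in bytes since start().
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")

# A snapshot groups live allocations; "lineno" groups them by source line.
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)

tracemalloc.stop()
```

Take the snapshot before calling `tracemalloc.stop()`, since stopping discards the collected traces.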

📸 Verified Output:

Step 3: Snapshot Comparison
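
Comparing two snapshots shows which source lines gained memory between them. The sketch below assumes a hypothetical `retained` list as the growth to detect; it illustrates `Snapshot.compare_to` rather than reproducing the lab's own code.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical workload: keep 50,000 integers alive between the snapshots.
retained = list(range(50_000))

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# compare_to returns StatisticDiff objects, largest absolute change first.
# A positive size_diff means the line holds more memory in `after`.
diffs = after.compare_to(before, "lineno")
for diff in diffs[:3]:
    print(f"{diff.size_diff / 1024:+.1f} KiB  {diff.count_diff:+d} blocks  {diff.traceback}")
```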

Step 4: gc Module — Cyclic Garbage Collector

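Reference counting alone cannot reclaim reference cycles: objects in a cycle keep each other's counts above zero even when nothing else can reach them. The gc module detects and frees such cycles.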
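
A minimal sketch of a cycle being reclaimed, assuming a hypothetical `Node` class that is not part of the lab. The exact count returned by `gc.collect()` varies across Python versions, so it is printed rather than stated.

```python
import gc

class Node:
    """Hypothetical class used only to build a reference cycle."""
    def __init__(self, name):
        self.name = name
        self.partner = None

gc.collect()                  # start clean so the next count reflects this cycle

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a   # a -> b -> a: a reference cycle

del a, b                      # the names are gone, but each refcount is still 1

# gc.collect() runs a full collection and returns the number of
# unreachable objects it found.
unreachable = gc.collect()
print(f"unreachable objects found: {unreachable}")
```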

💡 The three-generation GC relies on the generational hypothesis: most objects die young, so an object that survives a generation-0 collection is likely to be long-lived. Generation 0 is collected frequently, generation 2 rarely.
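
The generation counters and thresholds can be inspected directly. This is a small sketch using standard gc functions; the printed numbers depend on what the interpreter has allocated so far.

```python
import gc

# Thresholds that trigger collection of generations 0, 1 and 2.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# Current counters for each generation.
print("counts before:", gc.get_count())

gc.collect(0)   # collect only the youngest generation
print("counts after gen-0 collection:", gc.get_count())

# get_stats() returns one dict per generation with cumulative totals.
stats = gc.get_stats()
for gen, stat in enumerate(stats):
    print(f"gen {gen}: collections={stat['collections']} collected={stat['collected']}")
```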

Step 5: gc.get_referrers — Finding What Holds References
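
A hedged sketch of `gc.get_referrers`, assuming a hypothetical `Session` object held by a dict and a list. The function reports only containers tracked by the garbage collector, and at module level the result also includes the module's globals dict, so the example filters for the two holders of interest.

```python
import gc

class Session:
    """Hypothetical object whose owners we want to find."""

session = Session()
registry = {"active": session}   # first holder: a dict
queue = [session]                # second holder: a list

referrers = gc.get_referrers(session)

# Filter out unrelated referrers such as the module namespace.
holders = [r for r in referrers if r is registry or r is queue]
for holder in holders:
    print(f"{type(holder).__name__} holds a reference to the session")
```

Dropping every holder found this way (for example `registry.clear()` and `queue.clear()`) is what finally allows the object to be freed.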

Step 6: weakref — Cache Without Preventing Collection
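
A minimal sketch of a weak cache, assuming a hypothetical `Image` class and `load` helper. A `weakref.WeakValueDictionary` drops an entry once no strong reference to its value remains, so the cache never keeps an object alive by itself.

```python
import gc
import weakref

class Image:
    """Hypothetical expensive object; plain class instances support weak references."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

def load(name):
    """Return the cached Image if it is still alive, otherwise build a new one."""
    image = cache.get(name)
    if image is None:
        image = Image(name)
        cache[name] = image
    return image

first = load("logo")
same_object = load("logo") is first      # second call is served from the cache
print("cache hit returned same object:", same_object)
print("entries while referenced:", len(cache))

del first        # drop the only strong reference
gc.collect()     # not required on CPython, but makes the intent explicit
print("entries after release:", len(cache))
```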

Step 7: Memory Leak Detection Pattern
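
One possible leak-detection pattern: run the suspect operation many times and check whether traced memory keeps growing. The names `_leaky_store`, `leaky_operation`, `clean_operation`, and `measure_growth` are hypothetical, and any pass/fail threshold would need tuning for a real workload.

```python
import gc
import tracemalloc

_leaky_store = []   # hypothetical module-level list that grows on every call

def leaky_operation():
    _leaky_store.append(bytearray(10_000))   # retained forever: a leak

def clean_operation():
    bytearray(10_000)                        # freed as soon as the call ends

def measure_growth(func, iterations=50, warmup=5):
    """Return the net traced bytes retained after `iterations` calls of func."""
    for _ in range(warmup):      # let caches and lazy initialisation settle
        func()
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        func()
    gc.collect()                 # exclude garbage that is merely uncollected
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

leaky_growth = measure_growth(leaky_operation)
clean_growth = measure_growth(clean_operation)
print(f"leaky: {leaky_growth / 1024:.1f} KiB retained")
print(f"clean: {clean_growth / 1024:.1f} KiB retained")
```

In a regression test, the same helper could assert that growth stays below a budget chosen for the code under test.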

Step 8: Capstone — Memory Profiler Decorator
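
A sketch of what such a decorator might look like, not the lab's original capstone code. The names `memory_profile` and `build_squares` are hypothetical, and `tracemalloc.reset_peak` requires Python 3.9 or later.

```python
import functools
import tracemalloc

def memory_profile(func):
    """Print retained and peak traced memory for each call of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        was_tracing = tracemalloc.is_tracing()
        if not was_tracing:
            tracemalloc.start()
        tracemalloc.reset_peak()             # measure the peak of this call only
        start, _ = tracemalloc.get_traced_memory()
        try:
            return func(*args, **kwargs)
        finally:
            current, peak = tracemalloc.get_traced_memory()
            if not was_tracing:              # leave tracing as we found it
                tracemalloc.stop()
            print(f"{func.__name__}: retained {(current - start) / 1024:+.1f} KiB, "
                  f"peak {(peak - start) / 1024:.1f} KiB above start")
    return wrapper

@memory_profile
def build_squares(n):
    return [i * i for i in range(n)]

squares = build_squares(100_000)
```

The wrapper restores the previous tracing state, so it can be nested inside code that already uses tracemalloc.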

📸 Verified Output (tracemalloc):

Summary

| Concept | API | Use Case |
|---|---|---|
| Object size | sys.getsizeof | Memory estimation |
| Deep size | Custom recursive traversal | True allocation cost |
| Memory profiling | tracemalloc.start/take_snapshot | Find memory hogs |
| Snapshot comparison | snapshot.compare_to | Detect memory growth |
| Cycle collection | gc.collect, gc.get_count | Force GC, diagnose cycles |
| Reference tracing | gc.get_referrers | Find what keeps objects alive |
| Weak caches | weakref.WeakValueDictionary | Cache without preventing GC |
| Leak detection | tracemalloc + multiple iterations | CI/CD memory regression tests |
