Lab 13: Serialization & Protocols

Objective

Master Python's serialization ecosystem: json with custom encoders/decoders, pickle for arbitrary objects, struct for binary protocols, base64 encoding, schema validation with dataclasses, and a lightweight binary frame format for high-throughput data pipelines.

Background

Serialization converts in-memory objects to bytes for storage or transmission. Each format makes a different tradeoff: JSON is human-readable and supported everywhere, but it is verbose and comparatively slow to parse; pickle handles nearly any Python object and is fast, but it is Python-only and unsafe to load from untrusted sources; struct produces compact, fixed-layout binary packets that any language can read. The choice of format bounds both message size and throughput at scale.
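
To make the tradeoff concrete, here is a minimal sketch that serializes one record three ways and prints the byte counts. The `reading` record and its field names are made up for illustration; only the `>HBBi` layout comes from this lab. The JSON and pickle sizes depend on the Python version and pickle protocol, so no numbers are quoted for them.

```python
import json
import pickle
import struct

# Hypothetical sensor reading, used only for illustration.
reading = {"sensor_id": 513, "kind": 2, "flags": 1, "value": -40}

# Text format: field names travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Python-native format: includes opcodes and the dict structure.
as_pickle = pickle.dumps(reading, protocol=pickle.HIGHEST_PROTOCOL)

# Fixed binary layout: unsigned short, two unsigned bytes, signed 32-bit int.
# The field order is agreed on out of band, so no names are stored.
as_struct = struct.pack(
    ">HBBi",
    reading["sensor_id"], reading["kind"], reading["flags"], reading["value"],
)

sizes = {"json": len(as_json), "pickle": len(as_pickle), "struct": len(as_struct)}
for name, size in sizes.items():
    print(f"{name:>6}: {size} bytes")
```

The struct encoding is always 8 bytes for this layout. The cost is that both sides must already know the field order and types, which is why it suits fixed-format protocols rather than ad hoc data.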

Time

30 minutes

Prerequisites

  • Python Advanced Lab 06 (ctypes & Binary Protocols)

Tools

  • Docker: zchencow/innozverse-python:latest


Lab Instructions

Steps 1–8: JSON custom encoder/decoder, pickle with hooks, struct binary frames, base64, dataclass schema, round-trip tests, size comparison, Capstone
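
The JSON encoder/decoder, dataclass schema, and round-trip steps might look roughly like the sketch below. The lab's original code is not reproduced here, so `Event`, `EventEncoder`, `event_hook`, and the `"__type__"` tag are all hypothetical names chosen for this example. Only `json.JSONEncoder.default`, the `object_hook` parameter, and `dataclasses` are real standard-library features.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical schema; __post_init__ performs the validation."""
    name: str
    count: int
    at: datetime

    def __post_init__(self):
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.count, int) or self.count < 0:
            raise ValueError("count must be a non-negative int")
        if not isinstance(self.at, datetime):
            raise ValueError("at must be a datetime")


class EventEncoder(json.JSONEncoder):
    """Teach json how to emit Event and datetime values."""

    def default(self, o):
        if isinstance(o, Event):
            # Tag the object so the decoder can recognise it later.
            # The nested datetime is passed back through default().
            return {"__type__": "Event", "name": o.name,
                    "count": o.count, "at": o.at}
        if isinstance(o, datetime):
            return {"__type__": "datetime", "iso": o.isoformat()}
        return super().default(o)  # raises TypeError for unknown types


def event_hook(obj):
    """object_hook: called for every JSON object, innermost first."""
    kind = obj.get("__type__")
    if kind == "datetime":
        return datetime.fromisoformat(obj["iso"])
    if kind == "Event":
        # obj["at"] is already a datetime because inner objects decode first.
        return Event(name=obj["name"], count=obj["count"], at=obj["at"])
    return obj


original = Event("login", 3, datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
text = json.dumps(original, cls=EventEncoder)
restored = json.loads(text, object_hook=event_hook)
print(restored == original)  # True
```

Because the decoder rebuilds an `Event` through its constructor, the same validation runs on incoming data, so a malformed payload fails at load time instead of later in the pipeline.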

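💡 Prefer a pre-compiled struct.Struct over calling struct.pack/unpack directly in hot loops. struct.Struct(">HBBi") parses the format string once, so repeated pack/unpack calls skip the per-call format lookup. The module-level functions also cache recently used format strings, so the gain is usually modest; measure it with timeit on your own workload rather than assuming a fixed speedup. The > prefix selects big-endian (network byte order) with standard sizes and no alignment padding, so the byte layout is identical on every CPU architecture. Always give a wire format an explicit byte-order prefix; big-endian is the convention for most Internet protocols.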
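
As a minimal sketch of this tip, the block below compiles the `>HBBi` format once and reuses it to pack, unpack, and walk a buffer of back-to-back frames. The field names (`msg_id`, `msg_type`, `flags`, `value`) are assumptions made for the example; the lab only specifies the format string.

```python
import struct

# Compiled once at import time and reused for every frame.
# Assumed field meanings: msg_id (u16), msg_type (u8), flags (u8), value (i32).
HEADER = struct.Struct(">HBBi")


def pack_frame(msg_id, msg_type, flags, value):
    """Return one fixed-size frame as bytes."""
    return HEADER.pack(msg_id, msg_type, flags, value)


def unpack_frame(data):
    """Return (msg_id, msg_type, flags, value) from exactly one frame."""
    return HEADER.unpack(data)


frame = pack_frame(0x0102, 7, 0, -1)
print(HEADER.size)   # 8
print(frame.hex())   # 01020700ffffffff

# Fixed-size frames can be concatenated and decoded without delimiters.
stream = b"".join(pack_frame(i, 1, 0, i * 10) for i in range(3))
frames = list(HEADER.iter_unpack(stream))
print(frames)        # [(0, 1, 0, 0), (1, 1, 0, 10), (2, 1, 0, 20)]
```

`iter_unpack` requires the buffer length to be an exact multiple of `HEADER.size`, so a real receiver would buffer incoming bytes until at least one whole frame is available.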



Summary

| Format | Size | Speed | Portable | Use for |
|--------|------|-------|----------|---------|
| JSON | Large | Medium | Universal | APIs, config |
| Pickle | Medium | Fast | Python only | ML models, caches |
| struct | Small | Fastest | Any language | Network protocols, IoT |
| Base64 | +33% | Fast | Text-safe binary | Email, JSON embedding |
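
The +33% figure for Base64 follows from its encoding rule: every 3 input bytes become 4 ASCII characters, with up to two `=` padding characters at the end. A short sketch, using an arbitrary 300-byte payload chosen so that no padding is needed, shows the overhead and the usual pattern for embedding binary data in JSON:

```python
import base64
import json

raw = bytes(range(256)) + bytes(44)    # 300 arbitrary bytes
encoded = base64.b64encode(raw)        # ASCII-only bytes object

print(len(raw), len(encoded))          # 300 400

# JSON has no bytes type, so the payload travels as a Base64 string.
doc = json.dumps({"payload": encoded.decode("ascii")})
restored = base64.b64decode(json.loads(doc)["payload"])
print(restored == raw)                 # True
```

For inputs whose length is not a multiple of 3, padding pushes the overhead slightly above one third; a single byte, for example, encodes to 4 characters.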
