Lab 15: Capstone — DataPipeline CLI

Objective

Build a complete, production-quality data pipeline CLI tool (datapipe) that combines async I/O, FastAPI, SQLite, pandas, rich output, type hints, and design patterns — everything from Labs 01–14.

Time

45 minutes

Prerequisites

  • Labs 01–14

Tools

  • Docker image: zchencow/innozverse-python:latest


Lab Instructions

Step 1: Architecture Overview & Data Models

docker run --rm zchencow/innozverse-python:latest python3 -c "
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

# --- Domain Models ---
class Status(str, Enum):
    ACTIVE       = 'active'
    OUT_OF_STOCK = 'out_of_stock'
    DISCONTINUED = 'discontinued'

class Category(str, Enum):
    LAPTOP    = 'Laptop'
    ACCESSORY = 'Accessory'
    SOFTWARE  = 'Software'
    HARDWARE  = 'Hardware'

@dataclass(frozen=True)
class ProductID:
    value: int
    def __str__(self): return f'P{self.value:04d}'

@dataclass
class Product:
    id: ProductID
    name: str
    price: float
    stock: int
    category: Category
    rating: float = 0.0
    created_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        if not self.name.strip(): raise ValueError('name required')
        if self.price <= 0:       raise ValueError(f'price must be positive: {self.price}')
        if self.stock < 0:        raise ValueError(f'stock cannot be negative: {self.stock}')
        if not 0 <= self.rating <= 5: raise ValueError(f'rating must be 0-5: {self.rating}')

    @property
    def status(self) -> Status:
        return Status.ACTIVE if self.stock > 0 else Status.OUT_OF_STOCK

    @property
    def value(self) -> float: return self.price * self.stock

    def apply_discount(self, pct: float) -> Product:
        if not 0 <= pct <= 1: raise ValueError(f'discount must be 0-1: {pct}')
        return Product(self.id, self.name, round(self.price * (1 - pct), 2),
                       self.stock, self.category, self.rating, self.created_at)

    def sell(self, qty: int) -> Product:
        if qty <= 0: raise ValueError(f'qty must be positive: {qty}')
        if self.stock < qty: raise ValueError(f'insufficient stock: have {self.stock}, need {qty}')
        return Product(self.id, self.name, self.price, self.stock - qty,
                       self.category, self.rating, self.created_at)

    def __repr__(self): return f'Product({self.id}, {self.name!r}, \${self.price})'

# Demo
products = [
    Product(ProductID(1), 'Surface Pro 12\"', 864.00, 15, Category.LAPTOP,    4.8),
    Product(ProductID(2), 'Surface Pen',      49.99,  80, Category.ACCESSORY, 4.6),
    Product(ProductID(3), 'Office 365',       99.99,  999,Category.SOFTWARE,  4.5),
    Product(ProductID(4), 'USB-C Hub',        29.99,  0,  Category.HARDWARE,  4.2),
    Product(ProductID(5), 'Surface Book 3',   1299.0, 5,  Category.LAPTOP,    4.9),
]

for p in products:
    disc = p.apply_discount(0.1)
    print(f'{str(p.id):6s} {p.name:20s} \${p.price:8.2f} → \${disc.price:8.2f}  {p.status.value}')

print()
total_value = sum(p.value for p in products)
print(f'Total inventory value: \${total_value:,.2f}')
in_stock = [p for p in products if p.status == Status.ACTIVE]
print(f'Products in stock: {len(in_stock)}/{len(products)}')
"

💡 frozen=True on @dataclass makes instances immutable: assigning to an attribute after __init__ raises dataclasses.FrozenInstanceError. This suits value objects like ProductID. Product itself is not frozen, but apply_discount() and sell() follow the same style and return new instances instead of mutating in place, which keeps transformations predictable and avoids many bugs caused by shared mutable state.
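To make the immutability point concrete, here is a minimal sketch of what a frozen dataclass does at runtime. ProductID mirrors the class from Step 1; PricePoint is a hypothetical value object invented for this illustration and is not part of the lab code. dataclasses.replace is the standard-library helper for building a modified copy.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class ProductID:
    value: int


@dataclass(frozen=True)
class PricePoint:
    # Hypothetical value object, used only for this illustration.
    product_id: ProductID
    price: float


pid = ProductID(1)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    pid.value = 2
    blocked = False
except FrozenInstanceError:
    blocked = True
print(f'assignment blocked: {blocked}')

# Instead of mutating, build a new instance with one field changed.
original = PricePoint(pid, 864.00)
discounted = replace(original, price=round(original.price * 0.9, 2))
print(original.price, discounted.price)

# Frozen dataclasses with the default eq=True are hashable,
# so they can be used as dict keys or set members.
stock_by_id = {pid: 15}
print(stock_by_id[ProductID(1)])
```

Note that frozen=True is shallow: it blocks attribute assignment, but a mutable field such as a list could still be changed in place.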

📸 Verified Output:


Step 2: Repository with SQLite Backend
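The repository pattern named in this step hides SQL behind a small class, so the rest of the pipeline works with objects instead of queries. The block below is only a minimal sketch under stated assumptions, not the lab's own solution: ProductRow, SQLiteProductRepository, the products table layout, and the method names are all hypothetical. It uses only the standard-library sqlite3 module and an in-memory database.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductRow:
    # Hypothetical flat record; the Step 1 Product class is richer.
    id: int
    name: str
    price: float
    stock: int
    category: str


class SQLiteProductRepository:
    """Hypothetical repository: callers never see SQL."""

    _COLUMNS = 'id, name, price, stock, category'

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS products ('
            'id INTEGER PRIMARY KEY, name TEXT NOT NULL, '
            'price REAL NOT NULL, stock INTEGER NOT NULL, '
            'category TEXT NOT NULL)'
        )

    def save(self, p: ProductRow) -> None:
        # "with conn" commits on success and rolls back on an exception.
        with self._conn:
            self._conn.execute(
                f'INSERT OR REPLACE INTO products ({self._COLUMNS}) '
                'VALUES (?, ?, ?, ?, ?)',
                (p.id, p.name, p.price, p.stock, p.category),
            )

    def get(self, product_id: int) -> Optional[ProductRow]:
        row = self._conn.execute(
            f'SELECT {self._COLUMNS} FROM products WHERE id = ?',
            (product_id,),
        ).fetchone()
        return ProductRow(*row) if row else None

    def list_in_stock(self) -> list[ProductRow]:
        rows = self._conn.execute(
            f'SELECT {self._COLUMNS} FROM products '
            'WHERE stock > 0 ORDER BY id'
        ).fetchall()
        return [ProductRow(*r) for r in rows]


repo = SQLiteProductRepository(sqlite3.connect(':memory:'))
repo.save(ProductRow(1, 'Surface Pro 12"', 864.00, 15, 'Laptop'))
repo.save(ProductRow(4, 'USB-C Hub', 29.99, 0, 'Hardware'))

print(repo.get(1))
print([p.name for p in repo.list_in_stock()])
```

Passing the connection into the constructor keeps the class easy to test: a test can hand it an in-memory database, while the CLI could hand it a file-backed one.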

📸 Verified Output:


Steps 3–8: Async Pipeline, FastAPI Integration, Analytics, CLI, Tests, Full Run
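For the async part of these steps, the core idea is to start many I/O-bound calls at once and wait for all of them together with asyncio.gather. The following is a minimal sketch under stated assumptions, not the lab's implementation: fetch_price and enrich are hypothetical names, and asyncio.sleep stands in for a real network request.

```python
from __future__ import annotations

import asyncio


async def fetch_price(product_id: int) -> tuple[int, float]:
    # Hypothetical stand-in for a network call; sleeps instead of fetching.
    await asyncio.sleep(0.05)
    return product_id, 100.0 + product_id


async def enrich(product_ids: list[int]) -> dict[int, float]:
    # gather runs the coroutines concurrently and returns their results
    # in the same order as the inputs.
    pairs = await asyncio.gather(*(fetch_price(i) for i in product_ids))
    return dict(pairs)


prices = asyncio.run(enrich([1, 2, 3, 4, 5]))
print(prices)
```

Because the five simulated fetches overlap, the total wait is close to one 0.05 s delay rather than five in sequence.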

📸 Verified Output:


What You Built

A production-quality DataPipeline CLI combining:

Component            Lab   Technology
Domain models        01    @dataclass(frozen=True), Enum, validation
Async enrichment     05    asyncio.gather, concurrent price fetch
Persistent storage   08    SQLite, Repository pattern
REST API             09    FastAPI, Pydantic, TestClient
Analytics            10    pandas groupby, aggregation
CLI output           11    rich Table, Panel, progress
Design patterns      12    Repository, Strategy, Command
Type safety          07    TypedDict, Protocol, Generic
Test suite           06    pytest, fixtures, parametrize
Packaging            13    pyproject.toml, __all__
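Two rows of the table, Strategy and Protocol, combine naturally: a Protocol describes the interface a strategy must satisfy, and any class with a matching method qualifies without inheriting from it. The block below is a small sketch of that combination; PricingStrategy, NoDiscount, PercentOff, and quote are hypothetical names chosen for illustration and do not come from the lab code.

```python
from __future__ import annotations

from typing import Protocol


class PricingStrategy(Protocol):
    # Structural interface: any object with this method is accepted.
    def apply(self, price: float) -> float: ...


class NoDiscount:
    def apply(self, price: float) -> float:
        return price


class PercentOff:
    def __init__(self, pct: float) -> None:
        if not 0 <= pct <= 1:
            raise ValueError(f'discount must be 0-1: {pct}')
        self.pct = pct

    def apply(self, price: float) -> float:
        return round(price * (1 - self.pct), 2)


def quote(price: float, strategy: PricingStrategy) -> float:
    # The caller picks the behavior; quote only relies on the interface.
    return strategy.apply(price)


print(quote(49.99, NoDiscount()))
print(quote(49.99, PercentOff(0.1)))
```

A static type checker such as mypy can verify that NoDiscount and PercentOff satisfy PricingStrategy, even though neither class inherits from it.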
