Lab 01: MLOps Platform Architecture

Time: 50 minutes | Level: Architect | Docker: docker run -it --rm zchencow/innozverse-ai:latest bash

Overview

MLOps (Machine Learning Operations) bridges the gap between model development and production systems. This lab covers the MLOps maturity model, end-to-end ML pipeline architecture, experiment tracking, model registry patterns, and CI/CD for ML systems.

Architecture

┌───────────────────────────────────────────────────────────────────┐
│                       MLOps Platform                              │
├───────────────────────────────────────────────────────────────────┤
│  Data Layer        │  Training Layer        │  Serving Layer      │
│  ─────────────     │  ──────────────        │  ─────────────      │
│  Feature Store     │  Experiment Tracking   │  Model Server       │
│  Data Catalog      │  Distributed Training  │  A/B Testing        │
│  Data Validation   │  HPO                   │  Canary Deploy      │
├───────────────────────────────────────────────────────────────────┤
│  Model Registry → Staging → Production → Archived                 │
├───────────────────────────────────────────────────────────────────┤
│  CI/CD Pipeline → Build → Test → Deploy → Monitor                 │
└───────────────────────────────────────────────────────────────────┘

Step 1: MLOps Maturity Model

Understand the three maturity levels before designing any ML platform.

| Level | Name | Characteristics | Deployment Frequency |
|-------|------|-----------------|----------------------|
| L0 | Manual | Jupyter notebooks, manual steps, no CI/CD | Monthly/quarterly |
| L1 | ML Pipeline Automation | Automated retraining, model monitoring, pipeline orchestration | Weekly |
| L2 | CI/CD Automation | Automated pipeline deployment, feature store, A/B testing | On-demand (hours) |

💡 Most enterprises start at L0-L1. The jump to L2 requires significant platform investment, but it cuts the time to deploy a new model from weeks to hours.

L0 Pain Points:

  • Models live in notebooks, not production

  • Reproducibility is near-impossible

  • Manual retraining triggered by gut feeling

  • No performance monitoring

L2 Benefits:

  • New models deploy in hours, not weeks

  • Automated drift detection triggers retraining

  • Full audit trail for compliance

  • Experiments reproducible across teams


Step 2: ML Pipeline Stages

A production ML pipeline has seven core stages:

Stage Details:

| Stage | Purpose | Key Tools | Failure Mode |
|-------|---------|-----------|--------------|
| Data Ingestion | Collect raw data | Kafka, Airflow, Spark | Schema drift, data gaps |
| Data Validation | Check data quality | Great Expectations, Deequ | Silent corruption |
| Feature Engineering | Transform features | Feast, Tecton | Training-serving skew |
| Model Training | Fit model | sklearn, PyTorch, TF | Underfitting/overfitting |
| Model Evaluation | Measure performance | MLflow, custom metrics | Wrong evaluation split |
| Deployment | Serve predictions | FastAPI, BentoML, KServe | Latency regression |
| Monitoring | Detect drift | Evidently, Prometheus | Silent degradation |

💡 Training-serving skew is one of the most common production ML bugs. Your feature store must apply the same feature definitions, and return the same values, at training and inference time.
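One way to avoid training-serving skew is to route both the training job and the serving endpoint through a single feature function. The sketch below illustrates the idea with the standard library only. The feature names (`income`, `debt`) and the helper names are hypothetical, chosen for illustration.

```python
import math
from typing import Dict, List


def compute_features(raw: Dict[str, float]) -> List[float]:
    """Single source of truth for feature logic (hypothetical features)."""
    income = raw["income"]
    debt = raw["debt"]
    return [
        math.log1p(income),                      # log-scaled income
        debt / income if income > 0 else 0.0,    # debt-to-income ratio
    ]


def build_training_matrix(rows: List[Dict[str, float]]) -> List[List[float]]:
    """Offline path: transform historical rows for training."""
    return [compute_features(row) for row in rows]


def serve_features(request: Dict[str, float]) -> List[float]:
    """Online path: transform one incoming request with the same code."""
    return compute_features(request)


# Made-up sample rows, for illustration only.
history = [
    {"income": 50000.0, "debt": 10000.0},
    {"income": 0.0, "debt": 500.0},
]
train_X = build_training_matrix(history)
online_x = serve_features(history[0])

# The same raw input yields identical features on both paths.
print(train_X[0] == online_x)
```

A feature store generalizes this pattern: the transformation is defined once and both stores are populated from it, rather than being re-implemented in the serving code.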


Step 3: Experiment Tracking with MLflow

MLflow provides four components: Tracking, Projects, Models, and Registry.

Tracking Server Architecture:

Experiment Hierarchy:

Key MLflow Concepts:

💡 Always log: (1) git commit hash, (2) data version, (3) all hyperparameters. This is the minimum for reproducibility.
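The three items in the tip above can be collected into one record before a run starts. The following is a minimal sketch using only the standard library. The helper names (`current_git_commit`, `data_version`, `build_run_record`) are hypothetical, and hashing the raw bytes is only one possible way to version data.

```python
import hashlib
import json
import subprocess
from typing import Any, Dict, Optional


def current_git_commit() -> str:
    """Return the HEAD commit hash, or 'unknown' when git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def data_version(data: bytes) -> str:
    """Content hash of the training data (assumed versioning scheme)."""
    return hashlib.sha256(data).hexdigest()[:12]


def build_run_record(
    params: Dict[str, Any], data: bytes, commit: Optional[str] = None
) -> Dict[str, Any]:
    """Bundle the minimum metadata needed to reproduce a run."""
    return {
        "git_commit": commit if commit is not None else current_git_commit(),
        "data_version": data_version(data),
        "params": dict(params),
    }


# Made-up hyperparameters and data, for illustration only.
record = build_run_record(
    {"learning_rate": 0.1, "max_depth": 6},
    b"age,income\n34,52000\n",
    commit="abc1234",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

With MLflow installed, a record like this would typically be written inside a `mlflow.start_run()` block, for example with `mlflow.log_params(record["params"])` for the hyperparameters and `mlflow.set_tag(...)` for the commit hash and data version.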


Step 4: Model Registry Workflow

The model registry tracks model lifecycle from development to retirement.

Registry States:

| State | Description | Gate Criteria |
|-------|-------------|---------------|
| Registered | Model exists in registry | Successfully logged in MLflow |
| Staging | Undergoing validation | Passes unit tests, performance threshold |
| Production | Serving live traffic | Canary test passed, stakeholder approval |
| Archived | Retired/superseded | Replaced by better model |

Promotion Checklist:
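One way to make the gate criteria enforceable is to encode them as checks that must pass before a state change. The sketch below is a standalone illustration, not MLflow's registry API. `ModelVersion`, `ALLOWED`, `GATES`, and `promote` are hypothetical names, and the allowed transitions (including archiving from any active state) are an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed transition rules between registry states.
ALLOWED: Dict[str, Set[str]] = {
    "Registered": {"Staging", "Archived"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}

# Checks required to enter each state, taken from the gate criteria.
GATES: Dict[str, List[str]] = {
    "Staging": ["unit_tests_passed", "meets_performance_threshold"],
    "Production": ["canary_passed", "stakeholder_approved"],
    "Archived": [],
}


@dataclass
class ModelVersion:
    name: str
    version: int
    state: str = "Registered"
    checks: Dict[str, bool] = field(default_factory=dict)


def promote(model: ModelVersion, target: str) -> ModelVersion:
    """Move a model to `target` only if the transition and gates allow it."""
    if target not in ALLOWED[model.state]:
        raise ValueError(f"cannot move from {model.state} to {target}")
    missing = [c for c in GATES[target] if not model.checks.get(c, False)]
    if missing:
        raise ValueError(f"gate checks not satisfied: {missing}")
    model.state = target
    return model


model = ModelVersion(
    "credit_scoring", 3,
    checks={"unit_tests_passed": True, "meets_performance_threshold": True},
)
promote(model, "Staging")

try:
    promote(model, "Production")      # canary and approval are still missing
except ValueError as err:
    print(f"blocked: {err}")
```

Keeping the gates in data rather than in scattered `if` statements makes the checklist auditable and easy to extend.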


Step 5: CI/CD for ML Pipelines

ML CI/CD differs from software CI/CD — you're testing data and models, not just code.

What to Test in ML CI:

| Test Type | What to Check | Example |
|-----------|---------------|---------|
| Data tests | Schema, distributions, nulls | `assert df['age'].between(0,120).all()` |
| Feature tests | Value ranges, cardinality | Feature store output validation |
| Model tests | Performance > threshold | `assert accuracy > 0.80` |
| Integration tests | Pipeline end-to-end | Run full pipeline on 1% sample |

💡 Use pytest + great_expectations for data tests. A broken data pipeline will silently degrade your model.
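The data and model tests above can be written as plain pytest-style functions. The sketch below uses only the standard library so that it runs anywhere. The schema, sample rows, and predictions are made up for illustration, and a real suite would load them from the pipeline rather than hard-code them.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical schema: column name -> expected Python type.
REQUIRED_COLUMNS = {"age": int, "income": float}

# Made-up sample rows standing in for a batch of pipeline data.
SAMPLE_ROWS: List[Dict[str, Any]] = [
    {"age": 34, "income": 52000.0},
    {"age": 61, "income": 18000.0},
]


def schema_errors(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a description of every missing, null, or mistyped value."""
    errors = []
    for i, row in enumerate(rows):
        for col, typ in REQUIRED_COLUMNS.items():
            if row.get(col) is None:
                errors.append(f"row {i}: {col} is missing or null")
            elif not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and aligned")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def test_schema_and_nulls():
    assert schema_errors(SAMPLE_ROWS) == []


def test_age_range():
    assert all(0 <= row["age"] <= 120 for row in SAMPLE_ROWS)


def test_model_accuracy_above_threshold():
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # one mistake out of ten
    assert accuracy(y_true, y_pred) > 0.80
```

Saved in a file such as `test_pipeline.py` (a hypothetical name), these run with `pytest` and fail the CI job when data or model quality regresses.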


Step 6: Feature Store Architecture

Feature stores solve training-serving skew by centralizing feature computation.

Feature Store Components:

| Component | Technology Options | Purpose |
|-----------|--------------------|---------|
| Feature pipeline | Spark, Flink, dbt | Compute features at scale |
| Offline store | S3 + Parquet, BigQuery | Historical features for training |
| Online store | Redis, DynamoDB, Bigtable | Low-latency feature serving |
| Registry | Feast, Tecton, Hopsworks | Feature discovery and versioning |
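The split between the offline and online stores can be illustrated with a toy in-memory version. This is a sketch under simplifying assumptions, not how Feast or any other product is implemented: `TinyFeatureStore` is a hypothetical class, timestamps are plain integers, and writes are assumed to arrive in timestamp order per entity.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Features = Dict[str, float]


class TinyFeatureStore:
    """Toy store: full history offline, latest value online."""

    def __init__(self) -> None:
        self._offline: Dict[str, List[Tuple[int, Features]]] = defaultdict(list)
        self._online: Dict[str, Features] = {}

    def write(self, entity_id: str, ts: int, features: Features) -> None:
        """Append to the history and refresh the latest value."""
        self._offline[entity_id].append((ts, dict(features)))
        self._online[entity_id] = dict(features)

    def get_online(self, entity_id: str) -> Optional[Features]:
        """Serving path: return the most recent features."""
        return self._online.get(entity_id)

    def get_historical(self, entity_id: str, as_of: int) -> Optional[Features]:
        """Training path: latest features with timestamp <= as_of."""
        history = self._offline.get(entity_id, [])
        idx = bisect_right([ts for ts, _ in history], as_of)
        return history[idx - 1][1] if idx > 0 else None


# Made-up entity and values, for illustration only.
store = TinyFeatureStore()
store.write("cust_1", 100, {"avg_balance": 1200.0})
store.write("cust_1", 200, {"avg_balance": 900.0})

print(store.get_online("cust_1"))            # latest value
print(store.get_historical("cust_1", 150))   # value as it was at time 150
```

The point-in-time lookup matters for training: joining labels to the latest feature values instead of the values known at label time leaks future information into the model.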


Step 7: MLOps Platform Design Patterns

Pattern 1: Scheduled Retraining

Pattern 2: Triggered Retraining (Reactive)

Pattern 3: Continuous Training (Advanced)

💡 Start with scheduled retraining (simplest). Move to triggered when you have drift monitoring. Only implement continuous training if your data truly changes that fast.
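Triggered retraining needs a drift signal to act on. The sketch below uses the Population Stability Index (PSI) on a single numeric feature as one possible signal. The choice of metric, the 10 equal-width bins, and the 0.2 threshold (a commonly cited rule of thumb) are all assumptions here, and `should_retrain` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant reference data: everything falls in one bin

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(reference: Sequence[float], live: Sequence[float],
                   threshold: float = 0.2) -> bool:
    """Trigger retraining when drift exceeds the assumed threshold."""
    return psi(reference, live) > threshold


# Synthetic data: a uniform reference and a live sample shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

print(should_retrain(reference, reference))
print(should_retrain(reference, shifted))
```

In a scheduled setup the same check can run as the first task of each job, so the pipeline retrains only when the signal fires.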


Step 8: Capstone — Design MLOps Platform for Financial Services

Scenario: You're the AI Architect at a bank deploying a credit scoring model used for 50,000 loan decisions per day. Design the MLOps platform.

Requirements:

  • Regulatory compliance (Basel III, GDPR, ECOA)

  • Model explainability for every decision

  • Bias monitoring across protected groups

  • Full audit trail for 7 years

  • Retraining SLA: < 24 hours when drift detected
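The bias monitoring requirement can be made concrete with a simple group-level metric. The sketch below computes approval rates per group and the ratio of each rate to a reference group's rate. The metric choice, the 0.8 cut-off (a commonly cited rule of thumb, not a threshold stated in this scenario), the group labels, and the function names are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def approval_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of approved decisions per group."""
    totals: Dict[str, int] = defaultdict(int)
    approved: Dict[str, int] = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def impact_ratios(rates: Dict[str, float], reference_group: str) -> Dict[str, float]:
    """Each group's approval rate relative to the reference group."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group has no approvals")
    return {g: r / ref for g, r in rates.items()}


def flag_groups(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose ratio falls below the assumed threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Synthetic decisions for two placeholder groups.
decisions = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 50 + [("B", False)] * 50
)
rates = approval_rates(decisions)
ratios = impact_ratios(rates, reference_group="A")
print(flag_groups(ratios))
```

In the platform this check would run on a schedule over logged decisions, with flagged groups raising an alert for review.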

Architecture Exercise:

📸 Verified Output:

Platform Design Decisions:

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Orchestration | Airflow/Kubeflow Pipelines | Enterprise support, audit trail |
| Experiment tracking | MLflow on-prem | Data sovereignty requirements |
| Feature store | Feast + Redis | Open source, GDPR compliant |
| Model serving | KServe on K8s | Scalable, explainability hooks |
| Monitoring | Evidently + Prometheus | Drift detection + alerting |
| Audit storage | S3 + Glacier | 7-year retention, cost-optimized |


Summary

| Concept | Key Points |
|---------|------------|
| MLOps Maturity | L0 (manual) → L1 (automated pipeline) → L2 (CI/CD automation) |
| Pipeline Stages | Data → Feature → Train → Evaluate → Registry → Deploy → Monitor |
| Experiment Tracking | Log params + metrics + artifacts; enable reproducibility |
| Model Registry | Staging → Production → Archived lifecycle with gates |
| CI/CD for ML | Test data + models + code; canary deployments |
| Feature Store | Offline (training) + Online (serving) = eliminate training-serving skew |
| Retraining Triggers | Schedule (simple) → Drift-triggered (reactive) → Continuous (advanced) |

Next Lab: Lab 02: Model Serving at Scale →
