Lab 17: Real-Time AI Inference
Overview
Architecture
┌──────────────────────────────────────────────────────────────┐
│ Real-Time AI Inference Pipeline │
├──────────────────────────────────────────────────────────────┤
│ Event Source │
│ (Kafka Topic) → Feature Computation → Online Feature Store │
│ ↓ (Flink, < 5ms) (Redis, < 1ms) │
│ Feature Assembly │
│ (join event + stored features, < 2ms) │
│ ↓ │
│ Model Inference (ML server, < 20ms) │
│ ↓ │
│ Post-processing → Action/Response (< 5ms) │
│ Total budget: < 50ms │
├──────────────────────────────────────────────────────────────┤
│ CIRCUIT BREAKER: fallback to rule-based if model fails │
└──────────────────────────────────────────────────────────────┘
Step 1: Streaming Inference Architecture
Dimension | Apache Flink | Spark Streaming
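The core architectural choice the comparison above points at is event-at-a-time processing (Flink) versus micro-batching (Spark Streaming). A minimal sketch of the event-at-a-time path through the diagram, with a stub `Event`, an in-memory dict standing in for the Redis feature store, and a toy threshold model (all names here are illustrative, not part of the lab's codebase):

```python
from dataclasses import dataclass

# Hypothetical event shape; in the lab this would arrive from a Kafka topic.
@dataclass
class Event:
    user_id: str
    amount: float

# Stand-in for the online feature store (Redis in the diagram).
FEATURE_STORE = {"u1": {"avg_amount_7d": 40.0}}

def assemble_features(event):
    # Join real-time event fields with precomputed, stored features.
    stored = FEATURE_STORE.get(event.user_id, {})
    return {"amount": event.amount, **stored}

def predict(features):
    # Stub model: flag amounts far above the user's 7-day average.
    avg = features.get("avg_amount_7d", 0.0)
    return 1 if avg and features["amount"] > 3 * avg else 0

def handle_event(event):
    # Event-at-a-time processing (Flink style): one event in, one score out,
    # with no micro-batch buffering delay.
    return predict(assemble_features(event))
```

With micro-batching, the same logic would run over small buffered batches, adding the batch interval to the latency floor; event-at-a-time keeps the path inside the diagram's per-stage budgets.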
Step 2: Latency Budget Design
Technique | Latency Reduction | Implementation
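Whatever techniques you apply, you need to measure where the budget goes. A small sketch of a per-stage latency tracker, using the stage budgets from the architecture diagram (the stage names and the context-manager API are assumptions of this sketch):

```python
import time
from contextlib import contextmanager

# Per-stage budgets in ms, taken from the pipeline diagram above.
STAGE_BUDGETS_MS = {
    "feature_computation": 5,
    "feature_store_read": 1,
    "feature_assembly": 2,
    "model_inference": 20,
    "post_processing": 5,
}
TOTAL_BUDGET_MS = 50

class LatencyTracker:
    def __init__(self):
        self.timings = {}  # stage name -> measured latency in ms

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = (time.perf_counter() - start) * 1000

    def over_budget(self):
        # Stages that exceeded their individual budget.
        return [n for n, ms in self.timings.items()
                if ms > STAGE_BUDGETS_MS.get(n, TOTAL_BUDGET_MS)]

    def total_ms(self):
        return sum(self.timings.values())
```

Usage: wrap each pipeline stage in `with tracker.stage("model_inference"): ...` and alert when `over_budget()` is non-empty or `total_ms()` exceeds 50 ms. Note the stage budgets sum to 33 ms, leaving headroom under the 50 ms total for queueing and network jitter.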
Step 3: Feature Freshness Requirements
Feature Type | Example | Max Staleness | Update Mechanism
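A freshness policy is only useful if it is enforced at read time. A minimal sketch of a staleness check, with illustrative feature-type categories and staleness budgets (the specific values are assumptions, not the lab's official table):

```python
import time

# Hypothetical max-staleness budgets per feature type, in seconds.
MAX_STALENESS_S = {
    "real_time": 1,        # e.g. clicks in the current session
    "near_real_time": 60,  # e.g. 5-minute transaction velocity
    "batch": 86400,        # e.g. 30-day average spend
}

def is_fresh(feature_type, updated_at, now=None):
    """Return True if a feature value is within its staleness budget."""
    now = time.time() if now is None else now
    return (now - updated_at) <= MAX_STALENESS_S[feature_type]
```

A serving layer can use this to decide whether to use a stored value, recompute it inline, or fall back to a default when a real-time feature has gone stale.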
Step 4: Model Warm-Up
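A cold model server pays for lazy initialization, JIT compilation, and empty caches on its first live requests. One way to sketch a warm-up gate: push synthetic requests through the model and only mark the replica ready once median latency is inside the inference budget (the function name and readiness threshold here are illustrative):

```python
import statistics
import time

def warm_up(model, sample_inputs, target_p50_ms=20.0):
    """Run synthetic requests through `model` so one-time costs are paid
    before live traffic arrives. Returns (ready, median_latency_ms)."""
    latencies = []
    for x in sample_inputs:
        start = time.perf_counter()
        model(x)  # discard output; only the side effects matter
        latencies.append((time.perf_counter() - start) * 1000)
    p50 = statistics.median(latencies)
    # Gate readiness on the warm median meeting the inference budget.
    return p50 <= target_p50_ms, p50
```

In a real deployment this would run in the readiness probe, with `sample_inputs` drawn from recorded production traffic so the warm-up exercises realistic code paths.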
Step 5: Blue-Green Model Deployment
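The essence of blue-green model deployment is a router in front of two model versions: "blue" serves production while "green" is validated on a slice of traffic, then promoted. A minimal sketch (class and method names are assumptions of this sketch):

```python
import random

class BlueGreenRouter:
    """Route a fraction of traffic to the candidate (green) model,
    the rest to the current production (blue) model."""
    def __init__(self, blue, green, green_fraction=0.0):
        self.blue, self.green = blue, green
        self.green_fraction = green_fraction

    def predict(self, x, rng=random.random):
        # rng is injectable so routing is testable/deterministic.
        model = self.green if rng() < self.green_fraction else self.blue
        return model(x)

    def promote(self):
        # Cut all traffic over to green once it has been validated.
        self.blue, self.green_fraction = self.green, 0.0
```

Validation typically means comparing green's latency and prediction quality against blue on the shadow fraction before calling `promote()`; rollback is just setting `green_fraction` back to zero.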
Step 6: Circuit Breaker Pattern
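The diagram's circuit breaker falls back to rule-based logic when the model fails. A sketch of the pattern: after a run of consecutive model errors the breaker "opens" and serves the fallback directly, then retries the model after a cool-down (thresholds and names are illustrative):

```python
import time

class CircuitBreaker:
    """After `max_failures` consecutive model errors, open the circuit
    and serve the rule-based fallback until `reset_after` seconds pass."""
    def __init__(self, model, fallback, max_failures=3, reset_after=30.0):
        self.model, self.fallback = model, fallback
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, x, now=None):
        now = time.time() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return self.fallback(x)              # open: skip the model
            self.opened_at, self.failures = None, 0  # half-open: retry model
        try:
            result = self.model(x)
            self.failures = 0  # success closes the circuit fully
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now
            return self.fallback(x)
```

The key property is that an open circuit never touches the failing model, so the fallback path stays inside the latency budget even while the model server is timing out.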
Step 7: Online Feature Computation
Pattern | Example | Implementation
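One common online-feature pattern is a sliding-window aggregate maintained incrementally as events arrive. A sketch of a per-key event count over a time window, e.g. "transactions in the last 5 minutes" (the class name and window choice are assumptions):

```python
from collections import deque

class SlidingWindowCount:
    """Incrementally maintain, per key, the count of events seen in the
    last `window_s` seconds."""
    def __init__(self, window_s=300):
        self.window_s = window_s
        self.events = {}  # key -> deque of event timestamps

    def update(self, key, ts):
        q = self.events.setdefault(key, deque())
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q)
```

In the pipeline, a streaming job (Flink in the diagram) would run this per key and write the result to the online store, so the serving path only pays a sub-millisecond read rather than recomputing the aggregate.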
Step 8: Capstone — Inference Pipeline Simulator
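As a starting point for the capstone, here is a toy simulator that draws a latency per stage for each request and reports what fraction of requests fit the 50 ms end-to-end budget. The stage means come from the architecture diagram; the Gaussian jitter model and 30% relative spread are assumptions of this sketch:

```python
import random

def simulate_pipeline(n_requests=1000, seed=17):
    """Simulate end-to-end request latency and return the fraction of
    requests that meet the 50 ms total budget."""
    rng = random.Random(seed)
    # Mean per-stage latencies (ms) from the pipeline diagram.
    stage_means = {"features": 5, "store": 1, "assembly": 2,
                   "inference": 20, "post": 5}
    within = 0
    for _ in range(n_requests):
        # Assumed jitter model: Gaussian with sigma = 30% of the mean.
        total = sum(rng.gauss(m, m * 0.3) for m in stage_means.values())
        if total <= 50:
            within += 1
    return within / n_requests
```

A fuller simulator would add queueing between stages, long-tail latency distributions, and circuit-breaker behavior under injected model failures, then report p50/p99 rather than a single hit rate.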
Summary
Concept | Key Points
