Lab 11: AI Cost Optimization
Overview
Architecture
┌──────────────────────────────────────────────────────────────┐
│                      AI Cost Landscape                       │
├───────────────┬──────────────────┬───────────────────────────┤
│    COMPUTE    │       DATA       │          PEOPLE           │
│ ───────────   │ ───────────────  │ ─────────────────         │
│ GPU training  │ Storage (hot)    │ ML engineers              │
│ GPU inference │ Storage (cold)   │ Data engineers            │
│ CPU serving   │ Data transfer    │ MLOps engineers           │
│ Spot savings  │ Feature store    │ AI product managers       │
├───────────────┴──────────────────┴───────────────────────────┤
│                     OPTIMIZATION LEVERS                      │
│  Spot (70% off) | Distillation (60% off) | Cache (30% off)   │
│  Quantization (50% off) | Batching | Right-sizing            │
└──────────────────────────────────────────────────────────────┘

Step 1: AI Cost Components
| Category | Component | Typical % of Total | Optimization Potential |
|----------|-----------|--------------------|------------------------|
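Before optimizing anything, put the spend in one place and see which category dominates. A minimal sketch of such a breakdown — the dollar figures below are illustrative assumptions, not benchmarks:

```python
# Sketch of an AI cost breakdown. All dollar amounts are illustrative
# assumptions for the exercise, not real quotes or benchmarks.
monthly_costs = {
    "compute": {"gpu_training": 40_000, "gpu_inference": 25_000, "cpu_serving": 5_000},
    "data":    {"hot_storage": 4_000, "cold_storage": 1_000, "transfer": 3_000},
    "people":  {"ml_engineers": 60_000, "data_engineers": 30_000},
}

total = sum(v for cat in monthly_costs.values() for v in cat.values())
for category, components in monthly_costs.items():
    subtotal = sum(components.values())
    print(f"{category:8s} ${subtotal:>8,}  ({subtotal / total:.1%} of total)")
```

Running this makes the usual surprise visible: people costs often rival or exceed compute, which is why automation levers (MLOps tooling, self-serve platforms) belong in the cost model alongside spot and quantization.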
Step 2: GPU Utilization Optimization
| Technique | GPU Utilization Impact | Complexity |
|-----------|------------------------|------------|
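The core arithmetic behind every utilization technique is the same: you pay for wall-clock GPU hours, but you only get value from busy GPU hours, so the effective price of useful work is the hourly rate divided by utilization. A small sketch (the $4/hr rate is an assumed placeholder, not a quoted price):

```python
def effective_cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    """Cost of one hour of *useful* GPU work at a given utilization (0-1].

    At 30% utilization you effectively pay for every useful hour three
    times over; raising utilization is a direct price cut.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization

rate = 4.00  # assumed on-demand GPU rate, $/hr (illustrative)
low = effective_cost_per_useful_hour(rate, 0.30)   # ≈ $13.33 per useful hour
high = effective_cost_per_useful_hour(rate, 0.85)  # ≈ $4.71 after batching/right-sizing
```

This is why moving a fleet from 30% to 85% utilization can cut effective compute cost by roughly two thirds without touching the cloud contract.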
Step 3: Spot and Preemptible Instances
| Cloud | On-demand (A100 80G) | Spot (A100 80G) | Savings |
|-------|----------------------|-----------------|---------|
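Headline spot discounts overstate the real saving, because preemptions waste some compute on re-running work since the last checkpoint. A sketch of the net saving — the prices and 10% rework figure are illustrative assumptions, not current cloud quotes:

```python
def spot_effective_savings(on_demand: float, spot: float,
                           interruption_overhead: float) -> float:
    """Net savings fraction from spot, discounted by preemption rework.

    interruption_overhead: fraction of spot compute wasted re-running work
    after preemptions (e.g. 0.10 if checkpointing loses ~10% of progress).
    """
    effective_spot_rate = spot * (1 + interruption_overhead)
    return 1 - effective_spot_rate / on_demand

# Assumed A100 80G hourly prices for the exercise (not real quotes):
net = spot_effective_savings(on_demand=4.10, spot=1.23, interruption_overhead=0.10)
print(f"net spot savings: {net:.1%}")
```

The takeaway: frequent checkpointing is what turns a nominal 70% discount into a real one — without it, `interruption_overhead` climbs and eats the margin.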
Step 4: Model Distillation
| Variant | Method | Best For |
|---------|--------|----------|
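All distillation variants share one objective: train a small student to match a large teacher's output distribution, then serve only the student. A minimal sketch of the classic temperature-scaled soft-label loss (Hinton-style KL divergence); the logits here are toy values:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T exposes more of the
    teacher's 'dark knowledge' about near-miss classes."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in the standard formulation. Sketch only — a real
    training loop would mix this with the hard-label cross-entropy."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

loss = distillation_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.3])
```

The cost win comes entirely at serving time: the student runs on cheaper hardware at higher throughput, while the teacher is only paid for once, during training.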
Step 5: Query Caching
| Layer | Type | Hit Rate | Implementation |
|-------|------|----------|----------------|
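The cheapest inference is the one you never run. The first caching layer is usually exact-match: normalize the prompt, hash it, and serve the stored response on a hit. A self-contained sketch using an LRU eviction policy (class name and normalization rule are illustrative choices, not a prescribed design):

```python
import hashlib
from collections import OrderedDict

class ExactQueryCache:
    """Exact-match query cache: hash the normalized prompt, evict LRU.
    Semantic (embedding-based) caching would sit behind this layer."""

    def __init__(self, max_entries: int = 10_000):
        self.max_entries = max_entries
        self._store: OrderedDict[str, str] = OrderedDict()
        self.hits = self.misses = 0

    def _key(self, prompt: str) -> str:
        # Light normalization so trivially different phrasings collide.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get(self, prompt: str):
        k = self._key(prompt)
        if k in self._store:
            self._store.move_to_end(k)  # mark as recently used
            self.hits += 1
            return self._store[k]
        self.misses += 1
        return None

    def put(self, prompt: str, response: str):
        k = self._key(prompt)
        self._store[k] = response
        self._store.move_to_end(k)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # drop least recently used
```

Even a modest hit rate compounds: every hit avoids a full GPU forward pass, so a 30% hit rate is roughly a 30% cut in inference compute for cached-eligible traffic.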
Step 6: Batching for Cost Optimization
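Batching amortizes one GPU forward pass across many requests: the GPU-second costs the same whether it serves 1 request or 16, so cost per request falls almost linearly with batch size until latency hits the SLO. A back-of-the-envelope sketch with assumed prices and latencies:

```python
def cost_per_1k_requests(gpu_hourly: float, batch_size: int,
                         batch_latency_s: float) -> float:
    """GPU cost per 1,000 requests when serving fixed-size batches.

    Larger batches spread the same GPU time across more requests; the
    trade-off is the added latency of waiting to fill the batch.
    Numbers fed in below are illustrative assumptions.
    """
    requests_per_hour = batch_size * 3600 / batch_latency_s
    return gpu_hourly / requests_per_hour * 1000

unbatched = cost_per_1k_requests(gpu_hourly=4.0, batch_size=1,  batch_latency_s=0.05)
batched   = cost_per_1k_requests(gpu_hourly=4.0, batch_size=16, batch_latency_s=0.20)
print(f"per 1k requests: ${unbatched:.4f} unbatched vs ${batched:.4f} batched")
```

With these assumed numbers, a batch of 16 that takes 4x the latency of a single request still serves requests at a quarter of the cost — the classic dynamic-batching trade of tail latency for throughput.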
Step 7: FinOps Practices for AI
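The foundational FinOps practice is cost allocation: tag every GPU workload with an owning team, then roll usage up into a showback report so each team sees its share of the bill. A minimal sketch — the teams, hours, and rates are illustrative records, not real data:

```python
# FinOps showback sketch: allocate GPU spend to teams via resource tags.
# All records below are illustrative assumptions for the exercise.
from collections import defaultdict

usage_records = [
    {"team": "search", "gpu_hours": 1200, "rate": 4.10},  # on-demand
    {"team": "recs",   "gpu_hours": 800,  "rate": 4.10},  # on-demand
    {"team": "search", "gpu_hours": 300,  "rate": 1.23},  # spot
]

showback = defaultdict(float)
for r in usage_records:
    showback[r["team"]] += r["gpu_hours"] * r["rate"]

for team, spend in sorted(showback.items(), key=lambda kv: -kv[1]):
    print(f"{team:8s} ${spend:>10,.2f}")
```

Untagged spend is the usual failure mode in practice; enforcing tags at provisioning time (rejecting untagged jobs) is what keeps a report like this trustworthy.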
Step 8: Capstone — AI Cost Model with ROI
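A capstone cost model has to combine the levers correctly: savings fractions compound multiplicatively, not additively — 70% from spot plus 30% from caching is about 79% off, not 100%. A sketch tying the diagram's lever figures to an ROI calculation (baseline spend and implementation cost are assumed placeholders):

```python
def combined_savings(*lever_fractions: float) -> float:
    """Compound independent savings levers multiplicatively.

    Each lever reduces the *remaining* spend, so 0.70 + 0.30 in
    lever terms leaves 0.30 * 0.70 = 21% of the original bill.
    """
    remaining = 1.0
    for f in lever_fractions:
        remaining *= 1 - f
    return 1 - remaining

def roi(annual_savings: float, implementation_cost: float) -> float:
    """Return on the one-time cost of implementing the optimizations."""
    return (annual_savings - implementation_cost) / implementation_cost

baseline = 1_200_000     # assumed annual AI spend (illustrative)
impl_cost = 150_000      # assumed engineering cost to implement (illustrative)
saved_frac = combined_savings(0.70, 0.30)  # spot + caching, per the diagram
savings = baseline * saved_frac
print(f"combined savings: {saved_frac:.0%}, ROI: {roi(savings, impl_cost):.2f}x")
```

The same structure extends to the other levers (distillation, quantization), with the caveat that real levers overlap — e.g. caching reduces the traffic that distillation would have made cheaper — so treat compounded figures as an upper bound.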
Summary
| Concept | Key Points |
|---------|------------|
