Build a complete, production-ready ML pipeline from raw data to deployed model: ingest raw security logs → feature engineering → train multiple models → evaluate + select → explain predictions → deploy as API → monitor in production. The full lifecycle in one lab.
Real ML projects are not single notebooks. They are pipelines:
Raw Data → Cleaning → Feature Eng → Train → Evaluate → Explain → Deploy → Monitor
    ↑                                  ↓                            ↓
data quality                    model registry                    alerts
This capstone integrates every technique from Labs 01–19:
Feature engineering (Lab 04)
Model evaluation (Lab 05)
Gradient boosting (Lab 03)
Anomaly detection (Lab 17)
SHAP-style explainability (Lab 04)
FastAPI serving (Lab 19)
Monitoring (Lab 19)
Step 1: Raw Data Ingestion and Quality Checks
📸 Verified Output:
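The ingestion stage can be sketched as below — a minimal example using a hypothetical raw-log sample (the column names `src_ip`, `failed_logins`, `bytes_sent` and the `-1` sentinel are illustrative assumptions, not the lab's actual schema). The audit reports missing-value share and invalid negatives per column:

```python
import pandas as pd
import numpy as np

# Hypothetical messy security-log sample (illustrative columns only)
raw = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.2", None, "10.0.0.4"],
    "failed_logins": [3, -1, 7, np.nan],   # -1 is an invalid sentinel
    "bytes_sent": [1200, 540, 99999, 300],
    "label": ["attack", "benign", "attack", "benign"],
})

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column audit: % missing, and negative counts for numeric fields."""
    rows = []
    for col in df.columns:
        missing = df[col].isna().mean()
        negatives = (df[col] < 0).sum() if pd.api.types.is_numeric_dtype(df[col]) else 0
        rows.append({
            "column": col,
            "missing_pct": round(100 * missing, 1),
            "invalid_negatives": int(negatives),
        })
    return pd.DataFrame(rows)

report = quality_report(raw)
print(report)
```

Running the audit before any cleaning means every downstream fix is justified by a recorded quality issue rather than guesswork.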
Step 2: Data Cleaning Pipeline
📸 Verified Output:
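A reproducible cleaning step might look like this sketch: invalid negatives are masked to NaN, numeric gaps are median-imputed, and the label is encoded as 0/1. The columns and sentinel values are the same hypothetical ones as above, not the lab's real data:

```python
import pandas as pd
import numpy as np

# Same hypothetical raw sample (illustrative values)
raw = pd.DataFrame({
    "failed_logins": [3, -1, 7, np.nan],
    "bytes_sent": [1200.0, np.nan, 99999.0, 300.0],
    "label": ["attack", "benign", "attack", "benign"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Deterministic cleaning: same input always yields the same output."""
    out = df.copy()
    num_cols = ["failed_logins", "bytes_sent"]
    # Turn invalid negative sentinels into NaN, then impute with the column median
    out[num_cols] = out[num_cols].mask(out[num_cols] < 0)
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Encode the target: attack -> 1, benign -> 0
    out["label"] = (out["label"] == "attack").astype(int)
    return out

cleaned = clean(raw)
```

Keeping the cleaning logic in a single pure function is what makes the pipeline reproducible: it can be re-run on tomorrow's logs and version-controlled alongside the model.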
Step 3: Feature Engineering
📸 Verified Output:
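The three feature families from the summary — ratios, log transforms, and interactions — can be sketched as follows. Column names (`total_logins`, `unique_ports`, etc.) are assumed for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned log sample
df = pd.DataFrame({
    "failed_logins": [0, 12, 3],
    "total_logins": [10, 15, 50],
    "bytes_sent": [500, 120000, 2400],
    "unique_ports": [2, 40, 5],
})

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Ratio feature: share of logins that failed (clip avoids divide-by-zero)
    out["fail_ratio"] = out["failed_logins"] / out["total_logins"].clip(lower=1)
    # Log transform: tames the heavy-tailed bytes distribution
    out["log_bytes"] = np.log1p(out["bytes_sent"])
    # Interaction: many failures across many ports is the scanning signature
    out["fails_x_ports"] = out["failed_logins"] * out["unique_ports"]
    return out

feats = engineer(df)
```

Each feature encodes domain knowledge: `fail_ratio` normalises brute-force signals by activity volume, and the interaction term captures behaviour neither raw column shows alone.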
Step 4: Multi-Model Tournament
📸 Verified Output:
Step 5: Model Explainability (SHAP-Style)
📸 Verified Output:
💡 failed_logins and unique_ports are the top predictors — this directly maps to brute force and port scanning. Feature importance validates our model is learning real attack patterns, not noise.
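The summary table names permutation importance as the SHAP-style method used here; a minimal sketch with scikit-learn (synthetic data standing in for the engineered features) looks like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix
X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn; the score drop is that feature's importance
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("feature ranking (most important first):", ranking)
```

Unlike single-tree feature importances, permutation importance is measured on held-out data, so a feature only ranks highly if shuffling it actually hurts real predictions — the same "is it learning attack patterns or noise?" check the insight above relies on.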
Step 6: Threshold Optimisation
📸 Verified Output:
💡 Lower threshold (0.30) catches more attacks (higher TP, fewer FN) and delivers £328,500 more business value despite more false alarms — because the cost of a missed attack dwarfs false alarm costs.
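The cost-based threshold sweep behind that result can be sketched as below. The probabilities and the £50,000 / £500 cost figures are toy assumptions chosen only to show the asymmetry; the lab's actual numbers differ:

```python
import numpy as np

# Hypothetical business costs: a missed attack dwarfs a false alarm
COST_FN, COST_FP = 50_000, 500

# Toy held-out labels and model probabilities
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])
proba  = np.array([0.1, 0.2, 0.35, 0.05, 0.6, 0.45, 0.55, 0.9, 0.15, 0.25])

def total_cost(threshold: float) -> int:
    """Business cost of classifying at a given probability threshold."""
    pred = (proba >= threshold).astype(int)
    fn = int(((y_true == 1) & (pred == 0)).sum())  # missed attacks
    fp = int(((y_true == 0) & (pred == 1)).sum())  # false alarms
    return fn * COST_FN + fp * COST_FP

thresholds = np.arange(0.05, 0.95, 0.05)
best_t = min(thresholds, key=total_cost)
print(f"best threshold ~{best_t:.2f}, cost £{total_cost(best_t):,}")
```

Note that the optimum sits well below the default 0.5: with these costs, tolerating extra false alarms to avoid a single missed attack is always the cheaper trade, which is exactly the effect described above.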
Step 7: Model Serialisation and Deployment Package
📸 Verified Output:
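A versioned deployment artifact can be sketched with joblib plus a metadata sidecar. The directory layout, version string, and metadata fields here are illustrative conventions, not a fixed standard:

```python
import json
import os
import tempfile
import time

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a stand-in model to package
X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Write the model binary and a human-readable metadata sidecar together
artifact_dir = tempfile.mkdtemp()
model_path = os.path.join(artifact_dir, "model.joblib")
joblib.dump(model, model_path)

metadata = {
    "model_version": "1.0.0",                       # illustrative scheme
    "trained_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "features": [f"f{i}" for i in range(X.shape[1])],
}
with open(os.path.join(artifact_dir, "metadata.json"), "w") as f:
    json.dump(metadata, f, indent=2)

# Round-trip check: the restored model must reproduce the original's outputs
restored = joblib.load(model_path)
```

Shipping the feature list and version alongside the binary is what lets the serving layer reject requests with the wrong schema and lets you trace any prediction back to a specific training run.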
Step 8: Full Production Capstone — Live Pipeline with Monitoring
📸 Verified Output:
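The live stage can be sketched as a scoring loop that records per-request latency and the rolling prediction rate — the two monitoring signals the summary reports. The alert rule and window size are illustrative assumptions:

```python
import time
from collections import deque

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in deployed model
X, y = make_classification(n_samples=300, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

latencies: list[float] = []
recent_preds: deque[int] = deque(maxlen=100)  # rolling monitoring window

def score(event: np.ndarray) -> int:
    """Score one incoming event, recording latency and the prediction."""
    start = time.perf_counter()
    pred = int(model.predict(event.reshape(1, -1))[0])
    latencies.append((time.perf_counter() - start) * 1000)  # ms
    recent_preds.append(pred)
    return pred

# Simulate live traffic
for row in X[:50]:
    score(row)

attack_rate = sum(recent_preds) / len(recent_preds)
p95_latency = float(np.percentile(latencies, 95))
# Hypothetical alert rule: an attack rate this high suggests drift or an incident
alert = attack_rate > 0.5
print(f"attack_rate={attack_rate:.1%}, p95 latency={p95_latency:.2f} ms, alert={alert}")
```

In a real deployment the same two counters would feed the FastAPI service's metrics endpoint, so a sudden jump in attack rate (data drift or a live incident) pages a human instead of sitting unnoticed in a log file.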
Capstone Summary
You've built a complete production ML pipeline:
| Stage | What You Did | Key Skill |
|---|---|---|
| Ingest | Raw messy logs → quality report | Pandas, data audit |
| Clean | Missing values, invalid data, encoding | Reproducible pipelines |
| Features | Ratio features, log transforms, interactions | Domain knowledge |
| Train | 3-model tournament | Model selection |
| Explain | Permutation importance | Interpretability |
| Threshold | Business-cost optimisation | Risk management |
| Package | Versioned deployment artifact | MLOps |
| Deploy | Live inference with monitoring | Production readiness |
Pipeline performance: ROC-AUC 0.9913, 0.83 ms latency, 7.8% attack detection rate, zero false negatives on critical alerts.
What's Next: Architect Level
MLflow / DVC: Full experiment tracking and data versioning
Kubernetes + Seldon: Scalable model serving at millions of requests per second
Feature Store: Feast / Tecton for shared, real-time features
Continual Learning: Model retraining on new attack patterns
Full SHAP: shap library for accurate Shapley value attribution