Lab 16: Causal ML & Counterfactual Reasoning

Objective

Move beyond correlation to causation: implement causal graphs (DAGs), the do-calculus, propensity score matching, counterfactual explanations, and causal forest treatment effect estimation — applied to security interventions (patching decisions, firewall rule changes).

Time: 50 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

Correlation ≠ Causation (the classic mistake in security analytics):
  Observation: "Hosts with antivirus installed have more malware detections"
  Naive ML:    "AV causes malware" (wrong! AV-equipped hosts are used more recklessly)
  Causal ML:   Controls for confounders → AV reduces infection by 40%

Key concepts:
  SCM (Structural Causal Model): X → Y means X causes Y, not just correlates
  Confounder: Z affects both X (treatment) and Y (outcome) — must control for it
  do-calculus: P(Y | do(X=x)) ≠ P(Y | X=x)
  Counterfactual: "What would have happened if we HAD patched this host?"
  ATE: Average Treatment Effect = E[Y(1) - Y(0)]
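The gap between conditioning and intervening is easy to see on simulated data. Below is a minimal sketch (the antivirus numbers and the 0.2 effect size are made up for illustration, not taken from the lab's dataset) in which a confounder makes AV look harmful until we apply the backdoor adjustment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.binomial(1, 0.5, size=n)                 # confounder: risky usage profile
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))  # risky hosts install AV more often
y = rng.binomial(1, 0.3 + 0.4 * z - 0.2 * x)     # true effect of AV on infection: -0.2

# Naive contrast P(Y | X=1) - P(Y | X=0): biased upward by the confounder
naive = y[x == 1].mean() - y[x == 0].mean()      # comes out ≈ +0.04 (AV looks harmful)

# Backdoor adjustment: E[Y | do(X=x)] = sum_z E[Y | X=x, Z=z] * P(Z=z)
def expected_do(x_val):
    return sum(y[(x == x_val) & (z == zv)].mean() * (z == zv).mean() for zv in (0, 1))

ate = expected_do(1) - expected_do(0)            # recovers ≈ -0.2
print(f"naive contrast: {naive:+.3f}   adjusted ATE: {ate:+.3f}")
```

The naive contrast has the wrong sign because conditioning on X also selects on Z; the do-operator severs that dependence by averaging over the confounder's marginal distribution.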

Step 1: Causal Graph and Confounder Identification
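A minimal, dependency-free sketch of what this step builds (node names and edges are illustrative, not the lab's exact graph): encode the security DAG as a parent map and flag confounders, i.e. common ancestors of the treatment and the outcome.

```python
# Illustrative security DAG: patching is the treatment, compromise the outcome.
parents = {
    "patched":     ["host_criticality", "internet_facing"],
    "compromised": ["host_criticality", "internet_facing", "patched"],
}

def ancestors(node, seen=None):
    """All causal ancestors of `node` (transitive closure over parents)."""
    seen = set() if seen is None else seen
    for p in parents.get(node, []):
        if p not in seen:
            seen.add(p)
            ancestors(p, seen)
    return seen

treatment, outcome = "patched", "compromised"
# A confounder is a common ancestor of treatment and outcome: it opens a
# backdoor path  treatment <- Z -> outcome  that must be blocked.
confounders = ancestors(treatment) & (ancestors(outcome) - {treatment})
print("Adjust for:", sorted(confounders))  # ['host_criticality', 'internet_facing']
```

Here both host criticality and internet exposure drive the patching decision and the compromise risk, so any estimate of patching's effect must condition on them.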

📸 Verified Output:


Step 2: Propensity Score Matching
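A hedged sketch of the matching estimator on synthetic patch data (the feature, the assignment mechanism, and the -0.5 effect size are illustrative assumptions): model the treatment rather than the outcome, then compare each treated host with its nearest control on the propensity score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
criticality = rng.normal(size=n)                             # confounder
patched = rng.binomial(1, 1 / (1 + np.exp(-criticality)))    # critical hosts patched more
incidents = criticality - 0.5 * patched + rng.normal(scale=0.1, size=n)  # true effect -0.5

# Step A: propensity score e(x) = P(patched = 1 | x)
X = criticality.reshape(-1, 1)
ps = LogisticRegression().fit(X, patched).predict_proba(X)[:, 1]

# Step B: 1-NN matching with replacement on the propensity score, then
# average the paired outcome differences -> effect on the treated (ATT).
t_idx = np.flatnonzero(patched == 1)
c_idx = np.flatnonzero(patched == 0)
order = c_idx[np.argsort(ps[c_idx])]                  # controls sorted by score
pos = np.searchsorted(ps[order], ps[t_idx])
left = np.clip(pos - 1, 0, len(order) - 1)
right = np.clip(pos, 0, len(order) - 1)
use_left = np.abs(ps[order[left]] - ps[t_idx]) <= np.abs(ps[order[right]] - ps[t_idx])
match = np.where(use_left, order[left], order[right])
att = np.mean(incidents[t_idx] - incidents[match])
print(f"matched ATT ≈ {att:+.2f}   (ground truth -0.5)")
```

Matching on the one-dimensional score rather than the raw covariates is the point of the method: hosts with similar scores are comparable even when their raw features differ.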

📸 Verified Output:


Step 3: Counterfactual Explanations
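A hedged sketch of a Wachter-style counterfactual explanation (the model, features, and thresholds are illustrative, not the lab's): find the smallest change to one input that flips a risk classifier's verdict, answering "what would the model have said if this host HAD been patched sooner?"

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# features: [days_unpatched, open_ports]; label 1 = host was compromised
X = rng.uniform([0, 0], [90, 30], size=(2000, 2))
risk = 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, size=2000)
clf = LogisticRegression().fit(X, (risk > 3).astype(int))

x0 = np.array([60.0, 10.0])          # a host the model flags as high risk
assert clf.predict([x0])[0] == 1

# Search the smallest reduction in days_unpatched that flips the verdict,
# holding open_ports fixed (a one-feature counterfactual search).
flip_at = None
for days in np.arange(x0[0], -1, -1):
    if clf.predict([[days, x0[1]]])[0] == 0:
        flip_at = days
        break
print(f"Counterfactual: had the host been patched after {flip_at:.0f} days "
      f"instead of {x0[0]:.0f}, the model would have predicted 'safe'.")
```

Real counterfactual libraries optimize over all features with a distance penalty; the linear scan above keeps the idea visible at instance level, which is exactly the "Instance-level" output row in the summary table.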

📸 Verified Output:


Steps 4–8: Capstone — Causal Security Policy Evaluator
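The capstone's estimator can be sketched with double machine learning (partialling-out), the "Double ML" row of the summary table. Everything below is an illustrative assumption — the data-generating process, variable names, and the -1.0 effect are invented for the sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)
n = 4000
Z = rng.normal(size=(n, 3))                        # confounders: traffic/asset profile
rule = (Z @ [0.8, -0.5, 0.3] + rng.normal(size=n) > 0).astype(float)  # rule applied?
incidents = Z @ [1.0, 1.0, -0.5] - 1.0 * rule + rng.normal(size=n)    # true ATE: -1.0

# Stage 1: cross-fitted nuisance models for E[rule | Z] and E[incidents | Z];
# cross-fitting (cv=2) keeps each prediction out-of-sample.
rf = lambda: RandomForestRegressor(n_estimators=100, random_state=0)
rule_hat = cross_val_predict(rf(), Z, rule, cv=2)
inc_hat = cross_val_predict(rf(), Z, incidents, cv=2)

# Stage 2: regress outcome residuals on treatment residuals. The residuals
# are orthogonal to the confounders, so this slope is the ATE.
r_res, i_res = rule - rule_hat, incidents - inc_hat
ate = (r_res @ i_res) / (r_res @ r_res)
print(f"double-ML ATE ≈ {ate:+.2f}   (ground truth -1.0)")
```

The policy evaluator in Steps 4–8 wraps an estimator like this: feed it observational logs from before/after a firewall rule change, and the Stage 2 slope answers "how many incidents did the rule actually prevent?"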

📸 Verified Output:


Summary

Method               Handles Confounders   Requires             Output
-------------------  --------------------  -------------------  --------------
Naive regression     ❌ No                 Nothing              Biased ATE
Propensity matching  ✅ Yes                Observational data   ATE
Double ML            ✅ Yes                Any ML models        ATE + CATE
Counterfactual       ✅ Yes                Trained model        Instance-level

Further Reading

Last updated