Lab 19: AI Red Teaming & Security Audit

Objective

Systematically attack ML systems to find vulnerabilities before adversaries do: run membership inference and model inversion attacks, detect data poisoning, demonstrate supply chain threats (pickle injection), generate adversarial patches, and build a comprehensive AI security audit framework.

Time: 55 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

AI systems introduce a new attack surface beyond traditional software:

  Training phase:
    Data poisoning:        inject malicious samples → backdoor or degrade the model
    Supply chain:          malicious pre-trained weights (pickle injection)

  Inference phase:
    Model inversion:       recover training data from model predictions
    Membership inference:  determine whether a specific sample was in the training set
    Adversarial examples:  perturb an input → wrong prediction
    Model extraction:      clone the model via API queries

  Deployment phase:
    Prompt injection:      (see Lab 13)
    Evasion:               craft inputs that evade detection
    Sponge attacks:        maximise compute/latency

Step 1: Membership Inference Attack
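
A minimal sketch of the attack, assuming a deliberately overfit scikit-learn classifier on synthetic data (the dataset, model, and attack statistic here are illustrative choices, not a fixed recipe). Members of the training set tend to receive higher confidence on their true class than unseen samples, so confidence alone can separate members from non-members:

```python
# Confidence-threshold membership inference against an overfit model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Deliberately overfit: unbounded tree depth memorises the training set.
model = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
model.fit(X_train, y_train)

def true_class_confidence(model, X, y):
    """Model's predicted probability for each sample's true label."""
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

conf_members = true_class_confidence(model, X_train, y_train)
conf_nonmembers = true_class_confidence(model, X_test, y_test)

# Attack AUC: how well confidence separates members from non-members.
scores = np.concatenate([conf_members, conf_nonmembers])
labels = np.concatenate([np.ones(len(conf_members)),
                         np.zeros(len(conf_nonmembers))])
print(f"Train acc: {model.score(X_train, y_train):.3f}  "
      f"Test acc: {model.score(X_test, y_test):.3f}")
print(f"Membership inference AUC: {roc_auc_score(labels, scores):.3f}")
```

An AUC near 0.5 means the attack does no better than coin-flipping; the further above 0.5, the more the model leaks membership, which is exactly what the regularisation tip below addresses.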

📸 Verified Output:

💡 Regularisation is a privacy defence! A smaller train/test gap means the model memorises less, making membership inference harder.


Step 2: Data Poisoning Detection
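
A sketch of the detection approach named in the summary below: fit an Isolation Forest over the training data and treat anomalies as poisoning candidates. The injected poison distribution and the contamination rate are illustrative assumptions.

```python
# Unsupervised poisoning detection with IsolationForest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(950, 10))   # benign samples
poison = rng.normal(loc=6.0, scale=0.5, size=(50, 10))   # injected outliers
X = np.vstack([clean, poison])

detector = IsolationForest(contamination=0.05, random_state=0)
flags = detector.fit_predict(X)          # -1 = anomaly, 1 = inlier

caught = np.sum(flags[950:] == -1)       # poisons correctly flagged
false_pos = np.sum(flags[:950] == -1)    # clean samples wrongly flagged
print(f"Poisoned samples caught: {caught}/50")
print(f"Clean samples wrongly flagged: {false_pos}/950")
```

Note the limitation this demonstrates: anomaly detection only catches poison that is statistically far from the clean distribution; a careful attacker can craft in-distribution poison that passes.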

📸 Verified Output:


Step 3: Supply Chain — Pickle Injection Demo
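
A minimal sketch of both sides of this threat: a booby-trapped pickle whose payload runs the moment it is loaded, and a static opcode scan (the "static bytecode scan" from the summary) that flags it without ever unpickling. The payload here is a harmless echo; an attacker's would not be.

```python
import os
import pickle
import pickletools

class MaliciousWeights:
    """Stand-in for a booby-trapped 'model weights' file."""
    def __reduce__(self):
        # Whatever __reduce__ returns is CALLED during unpickling.
        return (os.system, ("echo '[!] arbitrary code executed on model load'",))

blob = pickle.dumps(MaliciousWeights())

# Static defence: walk the pickle opcodes WITHOUT executing them.
# GLOBAL / STACK_GLOBAL / REDUCE are needed to resolve and call
# arbitrary objects, so their presence in "weights" is a red flag.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}
hits = [op.name for op, _, _ in pickletools.genops(blob) if op.name in SUSPICIOUS]
print("Suspicious opcodes found:", hits)

# The attack itself: a single load runs the payload (harmless echo here).
pickle.loads(blob)
```

This is why the defence column below says safetensors: the format stores raw tensors and metadata only, with no opcode stream to execute on load.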

📸 Verified Output:


Steps 4–8: Capstone — AI Security Audit Report
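
One possible shape for the capstone deliverable, sketched below: collect each attack's outcome as a structured finding and render them all as a single report. The Finding schema, severity scale, and example evidence strings are hypothetical placeholders, not prescribed output.

```python
from dataclasses import dataclass, field

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

@dataclass
class Finding:
    attack: str
    severity: str      # one of SEVERITY_ORDER's keys
    evidence: str
    mitigation: str

@dataclass
class AuditReport:
    target: str
    findings: list = field(default_factory=list)

    def add(self, finding: Finding) -> None:
        self.findings.append(finding)

    def render(self) -> str:
        lines = [f"AI Security Audit: {self.target}", "=" * 40]
        # Most severe findings first.
        for f in sorted(self.findings, key=lambda f: SEVERITY_ORDER[f.severity]):
            lines.append(f"[{f.severity.upper():>8}] {f.attack}")
            lines.append(f"    evidence:   {f.evidence}")
            lines.append(f"    mitigation: {f.mitigation}")
        return "\n".join(lines)

report = AuditReport(target="demo-classifier-v1")
report.add(Finding("Pickle injection", "critical",
                   "REDUCE/STACK_GLOBAL opcodes in model file",   # made-up evidence
                   "ship weights as safetensors"))
report.add(Finding("Membership inference", "high",
                   "attack AUC well above 0.5 on held-out split", # made-up evidence
                   "regularisation / differential privacy"))
print(report.render())
```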

📸 Verified Output:


Summary

| Attack               | Threat                 | Detection              | Defence                               |
|----------------------|------------------------|------------------------|---------------------------------------|
| Membership inference | Privacy leak           | Large train/test gap   | Regularisation, differential privacy  |
| Data poisoning       | Backdoor / degradation | Isolation Forest       | Data sanitisation                     |
| Pickle injection     | RCE on model load      | Static bytecode scan   | Use safetensors                       |
| Adversarial examples | Evasion                | Adversarial evaluation | Adversarial training                  |
| Model extraction     | IP theft               | Rate limiting          | Prediction throttling                 |
