Lab 05: Adversarial ML & Model Robustness

Objective

Understand and implement adversarial attacks on ML models: FGSM, PGD, query-based black-box attacks, and data poisoning. Then apply defensive techniques: adversarial training, input preprocessing, and certified robustness bounds.

Time: 55 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

Normal ML pipeline:  train on clean data → deploy → assume inputs are benign

Adversarial reality: 
  - Attacker adds imperceptible noise to input → model misclassifies
  - Spam filter evasion: slightly alter email to bypass detector
  - Malware evasion: add benign-looking bytes → bypass ML AV
  - Intrusion detection bypass: craft network traffic to evade classifier

Step 1: Setup and Victim Model

docker run -it --rm zchencow/innozverse-ai:latest bash

Then, inside the container, start a Python session:

python3

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings; warnings.filterwarnings('ignore')

np.random.seed(42)

# Malware classification dataset (features from PE file analysis)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=12,
                             weights=[0.7, 0.3], random_state=42)
feature_names = [
    'pe_size', 'section_count', 'import_count', 'export_count', 'entropy',
    'has_tls', 'has_resources', 'debug_size', 'reloc_size', 'timestamp_delta',
    'string_count', 'url_count', 'ip_count', 'suspicious_api', 'packed',
    'crypto_api', 'network_api', 'process_api', 'registry_api', 'file_api',
]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_te_s  = scaler.transform(X_te)

model = GradientBoostingClassifier(n_estimators=200, max_depth=4, random_state=42)
model.fit(X_tr_s, y_tr)
clean_acc = accuracy_score(y_te, model.predict(X_te_s))
print(f"Victim model (malware classifier): accuracy={clean_acc:.4f}")
print(f"Features: {len(feature_names)} PE-file features")

📸 Verified Output:


Step 2: FGSM — Fast Gradient Sign Method
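The lab's FGSM script isn't reproduced above, so here is a minimal self-contained sketch (it rebuilds a small Step 1 victim). One assumption is stated loudly: gradient boosting exposes no input gradients, so this sketch computes the FGSM step on a differentiable logistic-regression surrogate and transfers the perturbation to the tree-based victim — a standard transfer-attack workaround, not necessarily the exact code shipped in the lab image.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)

# Differentiable surrogate: for logistic regression, the loss gradient w.r.t.
# a malware (class 1) input has sign -sign(w), so FGSM steps eps that way.
surrogate = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)
eps = 0.5
malware = X_te_s[y_te == 1]
X_adv = malware - eps * np.sign(surrogate.coef_[0])

base_ev = (victim.predict(malware) == 0).mean()   # false negatives on clean
adv_ev = (victim.predict(X_adv) == 0).mean()      # evasion after FGSM
print(f"clean evasion={base_ev:.2%}  FGSM(eps={eps}) evasion={adv_ev:.2%}")
```

The 85% figure in the verified output below comes from the lab's own script and epsilon sweep; exact numbers from this reduced sketch will differ.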

📸 Verified Output:

💡 At ε=0.5, 85% of malware samples evade detection. In practice, malware authors use similar techniques to craft PE files that bypass ML-based antivirus.


Step 3: PGD — Projected Gradient Descent (Stronger Attack)
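PGD is the iterative refinement of FGSM: many small signed steps, each followed by a projection back into the eps-ball around the original input. The self-contained sketch below reuses the logistic-regression surrogate assumption from Step 2 (the tree ensemble itself exposes no gradients); note that with a linear surrogate the gradient never changes, so the loop saturates at the FGSM corner — against a differentiable victim the gradient would be recomputed from the current iterate each step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)
surrogate = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)

eps, alpha, steps = 0.5, 0.1, 20
malware = X_te_s[y_te == 1]
X_adv = malware.copy()
for _ in range(steps):
    X_adv = X_adv - alpha * np.sign(surrogate.coef_[0])   # small signed step
    X_adv = np.clip(X_adv, malware - eps, malware + eps)  # project into eps-ball

base_ev = (victim.predict(malware) == 0).mean()
adv_ev = (victim.predict(X_adv) == 0).mean()
print(f"PGD eps={eps}, {steps} steps: evasion={adv_ev:.2%} (clean {base_ev:.2%})")
```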

📸 Verified Output:


Step 4: Black-Box Query Attack
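In the black-box setting the attacker has no gradients at all — only the model's query interface. A simple sketch (the greedy coordinate search below is an illustrative choice, not necessarily the lab's algorithm): nudge one feature at a time, keep any change that lowers the returned malware score, and count every query spent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)

def greedy_query_attack(clf, x, step=0.5, rounds=3):
    """Perturb one feature at a time, keeping changes that lower the
    malware score; only predict_proba() calls are used (black-box)."""
    x_adv, queries = x.copy(), 0
    for _ in range(rounds):
        for j in range(x_adv.size):
            best = clf.predict_proba(x_adv[None])[0, 1]; queries += 1
            for delta in (step, -step):
                cand = x_adv.copy(); cand[j] += delta
                p = clf.predict_proba(cand[None])[0, 1]; queries += 1
                if p < best:
                    best, x_adv = p, cand
            if best < 0.5:                        # crossed the boundary
                return x_adv, queries
    return x_adv, queries

successes, total_q = 0, 0
samples = X_te_s[y_te == 1][:20]
for x0 in samples:
    x_adv, q = greedy_query_attack(victim, x0)
    total_q += q
    successes += int(victim.predict(x_adv[None])[0] == 0)
print(f"evaded {successes}/{len(samples)}, avg queries {total_q / len(samples):.0f}")
```

The query counter is the point: this attack pattern is exactly what rate limiting and query monitoring (see the Summary table) are designed to catch.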

📸 Verified Output:


Step 5: Data Poisoning Attack
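A backdoor-style poisoning sketch, self-contained and with its assumptions labeled: the trigger pattern (pinning two features to extreme values — indices 4 and 14, 'entropy' and 'packed' in Step 1's feature list) and the 5% poison rate are made-up examples, not the lab's exact recipe. A small fraction of malware samples is stamped with the trigger, mislabeled as benign, and the model is retrained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the Step 1 training setup
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

def stamp(X):
    """Apply the (hypothetical) backdoor trigger: pin two features
    to extreme standardized values."""
    X = X.copy()
    X[:, 4], X[:, 14] = 3.0, -3.0
    return X

rng = np.random.default_rng(0)
mal_idx = np.where(y_tr == 1)[0]
poison = rng.choice(mal_idx, size=int(0.05 * len(y_tr)), replace=False)

X_p, y_p = X_tr_s.copy(), y_tr.copy()
X_p[poison] = stamp(X_p[poison])
y_p[poison] = 0                      # triggered malware mislabeled as benign

backdoored = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                        random_state=42).fit(X_p, y_p)
clean_acc = (backdoored.predict(X_te_s) == y_te).mean()
trigger_ev = (backdoored.predict(stamp(X_te_s[y_te == 1])) == 0).mean()
print(f"clean accuracy={clean_acc:.3f}  trigger evasion={trigger_ev:.2%}")
```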

📸 Verified Output:

💡 The poisoned model looks almost identical on clean data (0.936 vs 0.938) — undetectable without knowing the trigger. Yet 84% of triggered malware samples evade detection. This is why supply chain attacks on ML models are so dangerous.


Step 6: Adversarial Training (Defence)
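Adversarial training augments the training set with adversarial examples that keep their true label, so the model learns to classify them correctly. The sketch below (an assumption: it generates the adversarial malware with the same surrogate-FGSM trick as Step 2, which is also the attack it is then tested against) retrains the victim and compares evasion rates before and after.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)
surrogate = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)

# Augment training data with FGSM-perturbed malware, still labeled malware
eps = 0.5
adv_tr = X_tr_s[y_tr == 1] - eps * np.sign(surrogate.coef_[0])
X_aug = np.vstack([X_tr_s, adv_tr])
y_aug = np.concatenate([y_tr, np.ones(len(adv_tr), dtype=int)])
robust = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_aug, y_aug)

# Same attack against both models
atk = X_te_s[y_te == 1] - eps * np.sign(surrogate.coef_[0])
base_ev = (victim.predict(atk) == 0).mean()
rob_ev = (robust.predict(atk) == 0).mean()
print(f"evasion vs baseline={base_ev:.2%}  vs adversarially trained={rob_ev:.2%}")
```

One caveat worth stating: training against the exact attack you evaluate with overstates the defence; a stronger or different attack may still get through.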

📸 Verified Output:


Step 7: Input Preprocessing Defence
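A common input-preprocessing defence for tabular features is quantisation (feature squeezing): round every input to a coarse grid before classification, so small adversarial perturbations are partially destroyed. The sketch below is an assumption-laden illustration — the bin width is a made-up hyperparameter, and squeezing may only blunt the attack rather than eliminate it; the printed comparison shows whether it helps at this width.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)
surrogate = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)

def squeeze(X, width=1.0):
    return np.round(X / width) * width   # quantise features to a coarse grid

# Defended model: trained and queried on squeezed inputs only
defended = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                      random_state=42).fit(squeeze(X_tr_s), y_tr)

atk = X_te_s[y_te == 1] - 0.5 * np.sign(surrogate.coef_[0])  # Step 2 attack
sq_clean_acc = (defended.predict(squeeze(X_te_s)) == y_te).mean()
raw_ev = (victim.predict(atk) == 0).mean()
def_ev = (defended.predict(squeeze(atk)) == 0).mean()
print(f"clean acc (squeezed)={sq_clean_acc:.3f}  "
      f"evasion: undefended={raw_ev:.2%}  defended={def_ev:.2%}")
```

The bin width trades robustness against clean accuracy: coarser grids destroy more of the perturbation but also more legitimate signal.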

📸 Verified Output:


Step 8: Capstone — ML Security Audit Report
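The capstone combines the measurements from the previous steps into one audit summary. The report layout below is an assumption (the lab's exact report format isn't shown); the sketch re-runs the surrogate-FGSM attack at several epsilons and prints the results alongside clean accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild a small victim like the one in Step 1
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
victim = GradientBoostingClassifier(n_estimators=100, max_depth=4,
                                    random_state=42).fit(X_tr_s, y_tr)
surrogate = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)

mal = X_te_s[y_te == 1]
sgn = np.sign(surrogate.coef_[0])
report = {
    "clean accuracy":          (victim.predict(X_te_s) == y_te).mean(),
    "FGSM evasion (eps=0.25)": (victim.predict(mal - 0.25 * sgn) == 0).mean(),
    "FGSM evasion (eps=0.50)": (victim.predict(mal - 0.50 * sgn) == 0).mean(),
    "FGSM evasion (eps=1.00)": (victim.predict(mal - 1.00 * sgn) == 0).mean(),
}
print("=" * 44)
print("ML SECURITY AUDIT REPORT")
print("=" * 44)
for name, value in report.items():
    print(f"{name:<28}{value:>8.2%}")
```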

📸 Verified Output:


Summary

Attack           Type          Threat Level   Defence
FGSM             White-box     Medium         Adversarial training
PGD              White-box     High           Adversarial training + input defence
Black-box query  Black-box     Medium         Rate limiting, query monitoring
Data poisoning   Supply chain  Critical       Training data provenance, anomaly detection

Further Reading
