Lab 09: Reinforcement Learning for Security

Objective

Implement three core RL algorithms from scratch (tabular Q-learning, DQN, and the REINFORCE policy gradient) and apply them to security scenarios: adaptive network defence, automated intrusion response, and optimal patch scheduling.

Time: 55 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

Supervised ML: learn f(x) → y from labelled examples
RL:            learn policy π(s) → a to maximise cumulative reward

Security RL applications:
  - Network defence: which firewall rule to apply next?
  - Patch scheduling: which vulnerability to patch first?
  - Incident response: sequence of containment actions to minimise damage
  - Red team automation: which attack vector to try next?
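The "cumulative reward" a policy maximises is concretely the discounted return. A minimal helper makes the definition precise (the function name and gamma value are just for this sketch):

```python
def discounted_return(rewards, gamma=0.99):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ...  (computed right-to-left)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# e.g. three +1 rewards with gamma=0.5 -> 1 + 0.5 + 0.25 = 1.75
```

Every algorithm in this lab is a different strategy for finding a policy whose trajectories score highly under this quantity.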

Step 1: Network Defence Environment

docker run -it --rm zchencow/innozverse-ai:latest bash

📸 Verified Output:
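The captured output for this step is not reproduced above, so for reference here is a minimal sketch of what a network-defence environment can look like, using a gym-style reset/step interface. All names, dynamics, and reward values below are illustrative assumptions, not the lab's actual environment:

```python
import random

class NetworkDefenseEnv:
    """Toy network-defence environment (illustrative sketch).

    State:  number of currently compromised hosts (0..n_hosts).
    Action: 0 = monitor only, 1 = block suspicious IP, 2 = isolate host.
    Reward: penalty per compromised host plus a cost for active responses.
    """

    def __init__(self, n_hosts=5, seed=0):
        self.n_hosts = n_hosts
        self.n_actions = 3
        self.rng = random.Random(seed)

    def reset(self):
        self.compromised = 1          # the attack starts on one host
        self.t = 0
        return self.compromised

    def step(self, action):
        # Attacker tries to spread each step; stronger responses slow it down.
        spread_p = {0: 0.6, 1: 0.3, 2: 0.1}[action]
        if self.rng.random() < spread_p:
            self.compromised = min(self.n_hosts, self.compromised + 1)
        if action == 2 and self.compromised > 0:
            self.compromised -= 1     # isolation may clean one host
        action_cost = {0: 0.0, 1: 0.1, 2: 0.3}[action]
        reward = -float(self.compromised) - action_cost
        self.t += 1
        done = self.t >= 20 or self.compromised == 0
        return self.compromised, reward, done
```

Any of the algorithms in the later steps only needs the reset/step/n_actions surface of such an environment, so a random or scripted policy can be rolled out against it before any learning happens.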


Step 2: Q-Learning (Tabular)

📸 Verified Output:
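Since the step's captured output is not shown, here is a self-contained tabular Q-learning sketch on a toy "patch or defer" chain MDP. The environment, rewards, and hyperparameters are invented for illustration; only the update rule is the standard algorithm:

```python
import random

def q_learning(n_states=6, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy patch-scheduling chain (illustrative).

    State s = index of the most critical unpatched vuln (0..n_states-1).
    Action 0 = patch it (reward +1, advance to the next vuln).
    Action 1 = defer (reward -1 for ongoing exploit risk, stay put).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            if a == 0:
                s2, r = s + 1, 1.0
            else:
                s2, r = s, -1.0
            # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy argmax_a Q[s][a] should choose "patch" in every non-terminal state, which is the optimal schedule for this toy MDP.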


Step 3: Deep Q-Network (DQN)

📸 Verified Output:
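As a stand-in for the missing output, the following sketch shows the DQN training loop's moving parts: epsilon-greedy acting, an experience-replay buffer, and a periodically synced target network. To keep it dependency-free, a linear approximator stands in for the neural network (the real lab would use a torch/keras model); the toy "patch or defer" environment and all hyperparameters are assumptions:

```python
import random

def dqn_sketch(episodes=300, gamma=0.9, lr=0.05, eps=0.2, seed=0):
    """DQN-style loop with replay + target weights (illustrative sketch)."""
    rng = random.Random(seed)
    n_states, n_actions = 6, 2

    def features(s):                                  # tiny state encoding
        return [1.0, s / (n_states - 1)]

    def q(weights, s, a):
        f = features(s)
        return weights[a][0] * f[0] + weights[a][1] * f[1]

    w = [[0.0, 0.0] for _ in range(n_actions)]        # online "network"
    w_target = [row[:] for row in w]                  # frozen target copy
    buffer, buf_cap, batch = [], 200, 16
    step_count = 0

    for _ in range(episodes):
        s, ep_steps = 0, 0
        while s != n_states - 1 and ep_steps < 100:   # cap episode length
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q(w, s, x))
            s2, r = (s + 1, 1.0) if a == 0 else (s, -1.0)
            done = s2 == n_states - 1
            buffer.append((s, a, r, s2, done))
            if len(buffer) > buf_cap:
                buffer.pop(0)
            if len(buffer) >= batch:                  # replay a minibatch
                for bs, ba, br, bs2, bd in rng.sample(buffer, batch):
                    tgt = br if bd else br + gamma * max(
                        q(w_target, bs2, x) for x in range(n_actions))
                    td = tgt - q(w, bs, ba)           # TD error
                    f = features(bs)
                    w[ba][0] += lr * td * f[0]        # SGD step on td^2
                    w[ba][1] += lr * td * f[1]
            step_count += 1
            if step_count % 50 == 0:                  # sync target network
                w_target = [row[:] for row in w]
            s, ep_steps = s2, ep_steps + 1
    return w
```

The target copy is what makes the bootstrapped targets stable: the online weights chase a fixed quantity between syncs instead of their own moving estimate.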


Step 4: Policy Gradient — REINFORCE

📸 Verified Output:
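With the step's output unavailable, here is a self-contained REINFORCE sketch: a softmax policy over per-state logits, updated after each full episode by pushing up the log-probability of taken actions in proportion to the discounted return. The toy chain environment and hyperparameters are illustrative assumptions:

```python
import math
import random

def reinforce(episodes=3000, gamma=0.95, lr=0.05, seed=0):
    """Monte-Carlo policy gradient (REINFORCE) on the toy patch chain."""
    rng = random.Random(seed)
    n_states, n_actions = 6, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # policy logits

    def policy(s):
        m = max(theta[s])                      # softmax, numerically stable
        exps = [math.exp(t - m) for t in theta[s]]
        z = sum(exps)
        return [e / z for e in exps]

    for _ in range(episodes):
        # Roll out one episode under the current stochastic policy.
        s, traj = 0, []
        while s != n_states - 1 and len(traj) < 50:       # cap length
            p = policy(s)
            a = 0 if rng.random() < p[0] else 1
            s2, r = (s + 1, 1.0) if a == 0 else (s, -1.0)
            traj.append((s, a, r))
            s = s2
        # Walk backwards accumulating the return, then take gradient steps:
        # grad log softmax = indicator(a == a_t) - p[a]
        G = 0.0
        for s_t, a_t, r_t in reversed(traj):
            G = r_t + gamma * G
            p = policy(s_t)
            for a in range(n_actions):
                theta[s_t][a] += lr * G * ((1.0 if a == a_t else 0.0) - p[a])
    return theta
```

Note there is no value function and no bootstrapping: the whole-episode return G is the learning signal, which is exactly why REINFORCE is unbiased but high-variance (cf. the summary table).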


Step 5–8: Capstone — Automated Incident Response Agent

📸 Verified Output:


Summary

Algorithm        Type                  Best For                   Convergence
Q-Learning       Model-free, tabular   Small discrete spaces      Slow but guaranteed
DQN              Model-free, neural    Large state spaces         Moderate
REINFORCE        Policy gradient       Continuous actions         High variance
PPO (next step)  Actor-critic          Most production use cases  Fast, stable

Further Reading

Last updated