Lab 01: PyTorch Deep Dive — Custom Training Loops

Objective

Master PyTorch fundamentals by building everything from scratch: custom datasets, DataLoaders, training loops with gradient accumulation, learning rate schedulers, early stopping, and mixed-precision training — applied to a network intrusion detection classifier.

Time: 60 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

Practitioner labs used scikit-learn's .fit(). Advanced ML requires manual control of the training loop:

sklearn:    model.fit(X, y)          → black box, convenient
PyTorch:    for X, y in loader:      → full control
                optimizer.zero_grad()
                loss = criterion(model(X), y)
                loss.backward()
                optimizer.step()

Why manual loops?
  - Custom loss functions (focal loss, contrastive loss)
  - Gradient accumulation (simulate large batch on small GPU)
  - Mixed precision (FP16/BF16 for up to ~2× speedup on supported GPUs)
  - Per-step monitoring and early stopping
  - Multi-task learning (multiple losses combined)

Step 1: Environment and Data
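
Since the lab's intrusion dataset isn't reproduced here, the sketch below generates a synthetic stand-in with the same shape of problem: numeric flow features and a heavily imbalanced binary label (~6% attacks, matching the rate quoted later in the lab). The array sizes and feature count are illustrative assumptions, not the real dataset's.

```python
import numpy as np
import torch

# Hypothetical stand-in for the lab's intrusion dataset: 10 000 flows,
# 20 numeric features, ~6% labelled as attacks (heavy class imbalance).
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 20)).astype(np.float32)
y = (rng.random(10_000) < 0.06).astype(np.float32)   # 1 = attack, 0 = normal

X_t = torch.from_numpy(X)
y_t = torch.from_numpy(y)
print(X_t.shape, y_t.mean().item())   # feature matrix shape and attack rate
```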

📸 Verified Output:


Step 2: Custom Dataset and DataLoader
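
A minimal sketch of the Dataset/DataLoader pattern this step builds. The class name FlowDataset and the synthetic tensors are placeholders; the real lab wraps the intrusion data from Step 1.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FlowDataset(Dataset):
    """Wraps pre-extracted flow features and binary labels as float tensors."""
    def __init__(self, X, y):
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Synthetic stand-in data, 256 samples × 20 features
ds = FlowDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)).float())
loader = DataLoader(ds, batch_size=64, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)   # one shuffled mini-batch of 64
```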

📸 Verified Output:


Step 3: Neural Network Architecture
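
A sketch of the kind of architecture this step builds: a small MLP with the BatchNorm and Dropout layers listed in the summary table. The class name, layer widths, and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IntrusionNet(nn.Module):
    """Small MLP binary classifier with BatchNorm and Dropout regularisation."""
    def __init__(self, in_features=20, hidden=64, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.BatchNorm1d(hidden),   # normalises layer inputs per batch
            nn.ReLU(),
            nn.Dropout(p_drop),       # randomly deactivates neurons in training
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, 1),     # one logit; pair with a sigmoid-based loss
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = IntrusionNet()
logits = model(torch.randn(8, 20))
print(logits.shape)   # one logit per sample
```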

📸 Verified Output:


Step 4: Custom Loss — Focal Loss for Class Imbalance
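
The standard binary focal loss (Lin et al., RetinaNet) can be sketched in a few lines on top of BCE-with-logits. The alpha/gamma defaults below are the paper's common choices; tune them for your data.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: BCE reweighted so easy examples contribute less."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)   # model's probability for the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([4.0, -4.0, 0.0])
targets = torch.tensor([1.0, 0.0, 1.0])
# The two easy, confident examples are down-weighted; the hard third
# example dominates the loss.
print(focal_loss(logits, targets).item())
```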

📸 Verified Output:

💡 Focal loss was developed for object detection (RetinaNet) but is invaluable for any heavily imbalanced dataset. With a 6% attack rate, 94% of examples are easy normal traffic, so they dominate the BCE loss; focal loss down-weights them so the rare attacks drive learning.


Step 5: Optimiser with Learning Rate Scheduling
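
A minimal sketch of the AdamW + cosine-annealing combination from the summary table, using a placeholder linear model. Since PyTorch 1.1 the scheduler must be stepped after the optimizer.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(20, 1)            # placeholder model
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
sched = CosineAnnealingLR(opt, T_max=50)  # anneal over 50 epochs

lrs = []
for epoch in range(50):
    # ... the epoch's training steps would go here ...
    opt.step()      # optimizer.step() before scheduler.step()
    sched.step()    # one scheduler step per epoch
    lrs.append(sched.get_last_lr()[0])

print(lrs[0], lrs[-1])   # LR decays smoothly from ~1e-3 toward 0
```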

📸 Verified Output:


Step 6: Full Training Loop with Early Stopping
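
The loop this step builds can be sketched as follows: train per batch, evaluate on the validation set each epoch, snapshot the best weights, and stop after `patience` epochs without improvement. The function signature is an illustrative assumption.

```python
import copy
import torch

def train(model, train_loader, val_loader, criterion, optimizer,
          max_epochs=100, patience=5):
    """Full training loop with early stopping and best-weight restoration."""
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():   # no autograd graph during validation
            val_loss = sum(criterion(model(xb), yb).item()
                           for xb, yb in val_loader) / len(val_loader)

        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # snapshot best
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break   # early stop: no improvement for `patience` epochs

    model.load_state_dict(best_state)   # restore best weights, not last
    return model, best_loss
```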

📸 Verified Output:

💡 Early stopping prevents overfitting — the model stopped improving at epoch 30 and we saved the best weights. Without it, training to 100 epochs would degrade performance as the model memorises training noise.


Step 7: Gradient Accumulation (Simulate Large Batches)
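
The core trick is to divide each micro-batch loss by the accumulation count, call backward() every step (gradients sum into .grad), and only step the optimizer every `accum_steps` micro-batches. A sketch with synthetic batches:

```python
import torch

model = torch.nn.Linear(20, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.BCEWithLogitsLoss()
accum_steps = 4   # micro-batch 32 × 4 → effective batch 128

opt.zero_grad()
for step in range(8):
    xb = torch.randn(32, 20)
    yb = torch.randint(0, 2, (32, 1)).float()
    loss = criterion(model(xb), yb) / accum_steps  # scale so grads average
    loss.backward()                                # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()       # one optimizer update per effective batch
        opt.zero_grad()
```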

📸 Verified Output:


Step 8: Capstone — Production Training Pipeline
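
One piece the earlier steps haven't shown in code is mixed precision. A sketch of an AMP training step combined with gradient accumulation, assuming a CUDA GPU (the scaler and autocast degrade to no-ops on CPU); the real capstone would plug in the lab's model, focal loss, scheduler, and early stopping around this inner loop.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(20, 1).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = torch.nn.BCEWithLogitsLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
accum_steps = 4

opt.zero_grad()
for step in range(8):
    xb = torch.randn(32, 20, device=device)
    yb = torch.randint(0, 2, (32, 1), device=device).float()
    # Forward pass in FP16 where safe; loss scaled for accumulation
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = criterion(model(xb), yb) / accum_steps
    scaler.scale(loss).backward()   # scale loss to avoid FP16 underflow
    if (step + 1) % accum_steps == 0:
        scaler.step(opt)    # unscales grads; skips the step on inf/NaN
        scaler.update()
        opt.zero_grad()
```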

📸 Verified Output:


Summary

Technique          | What It Does                    | When to Use
-------------------|---------------------------------|-------------------------------
Custom Dataset     | Type-safe data loading          | Any PyTorch project
BatchNorm          | Normalises layer inputs         | Deep networks (>3 layers)
Dropout            | Random neuron deactivation      | Overfit reduction
Focal Loss         | Down-weights easy examples      | Class imbalance
AdamW              | Adam + decoupled weight decay   | Standard choice for most tasks
Cosine LR          | Smooth LR annealing             | Long training runs
Early stopping     | Stops when val metric plateaus  | Always
Grad accumulation  | Simulates large batches         | Memory-constrained GPU

Further Reading
