Lab 06: Neural Networks from Scratch (NumPy)

Objective

Build a multi-layer neural network from scratch using only NumPy, with no frameworks. By implementing the forward pass, backpropagation, and gradient descent yourself, you will understand exactly what PyTorch and TensorFlow do under the hood.

Time: 55 minutes | Level: Practitioner | Docker Image: zchencow/innozverse-ai:latest


Background

A neural network is a chain of matrix multiplications and non-linear activations:

Input x → [W1, b1] → ReLU → [W2, b2] → ReLU → [W3, b3] → Softmax → Output ŷ

Forward pass:  compute ŷ from x
Backward pass: compute ∂Loss/∂W for every weight W
Update:        W := W - learning_rate * ∂Loss/∂W

The magic: backpropagation is just the chain rule from calculus applied systematically.
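As a tiny concrete check that backpropagation really is just the chain rule, here is a hypothetical single-neuron example (all values made up) that compares the analytic gradient against a finite-difference estimate:

```python
# One neuron: y = relu(w*x + b), loss = (y - target)^2
x, w, b, target = 2.0, 0.5, 0.1, 1.0

z = w * x + b                  # pre-activation
y = max(z, 0.0)                # ReLU
loss = (y - target) ** 2

# Backward pass: multiply the chain rule factors together
dloss_dy = 2 * (y - target)
dy_dz = 1.0 if z > 0 else 0.0  # ReLU derivative
dz_dw = x
dloss_dw = dloss_dy * dy_dz * dz_dw

# Sanity check against a numerical gradient (finite differences)
eps = 1e-6
z2 = (w + eps) * x + b
loss2 = (max(z2, 0.0) - target) ** 2
numerical = (loss2 - loss) / eps
print(dloss_dw, numerical)     # both ≈ 0.4
```

If the analytic and numerical gradients disagree, the backward pass has a bug; this trick (gradient checking) scales to full networks.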


Step 1: Environment Setup

docker run -it --rm zchencow/innozverse-ai:latest bash

Then, inside the container, start a Python session and verify the environment:

import numpy as np
print(f"NumPy: {np.__version__}")
np.random.seed(0)  # fixed seed so every run is reproducible

📸 Verified Output:


Step 2: Activation Functions and Their Derivatives

Every neuron applies an activation function. The derivative is needed for backprop:
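The lab's own listing is not reproduced in this copy; a minimal sketch of the three activations used here, with their derivatives, might look like:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_deriv(z):
    # 1 where the input was positive, 0 elsewhere
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)

def softmax(z):
    # subtract the row max first for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))           # [[0. 0. 3.]]
print(relu_deriv(z))     # [[0. 0. 1.]]
print(softmax(z).sum())  # rows sum to 1
```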

📸 Verified Output:

💡 ReLU has zero gradient for negative inputs, so a neuron pushed into the negative region for every input stops learning entirely (the "dying ReLU" problem). This is one reason careful weight initialisation matters.


Step 3: Weight Initialisation

Poor initialisation leads to vanishing or exploding gradients:
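The original listing is missing here; a sketch of the comparison, with hypothetical layer widths and depth, might look like this: push a random signal through 10 ReLU layers and watch what each scheme does to its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialisation: variance 2/fan_in preserves signal
    # variance when each layer is followed by ReLU
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def tiny_init(fan_in, fan_out):
    # naive small-random init: std 0.01 regardless of width
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

def depth_test(init_fn, layers=10, width=256):
    a = rng.normal(size=(64, width))
    for _ in range(layers):
        a = np.maximum(0, a @ init_fn(width, width))
    return a.std()

print(f"tiny init std after 10 layers: {depth_test(tiny_init):.2e}")  # collapses toward 0
print(f"He init std after 10 layers:   {depth_test(he_init):.3f}")    # stays healthy
```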

📸 Verified Output:

💡 Zero init → every neuron receives identical gradients and learns the same thing (the network never breaks symmetry). He init maintains activation variance through deep networks with ReLU activations.


Step 4: Forward Pass
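The step's listing is not included in this copy; a sketch of a forward pass matching the three-layer diagram from the Background section (layer sizes are made up) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(layer_sizes):
    # He-initialised weights, zero biases, one (W, b) pair per layer
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(x, params):
    # ReLU on hidden layers, softmax on the output layer;
    # cache activations for the backward pass
    cache = [x]
    a = x
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        if i < len(params) - 1:
            a = np.maximum(0, z)                      # hidden: ReLU
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            a = e / e.sum(axis=1, keepdims=True)      # output: softmax
        cache.append(a)
    return a, cache

params = init_network([4, 16, 16, 3])   # [W1,b1] → [W2,b2] → [W3,b3]
y_hat, _ = forward(rng.normal(size=(8, 4)), params)
print(y_hat.shape, y_hat.sum(axis=1))   # (8, 3), each row sums to 1
```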

📸 Verified Output:


Step 5: Loss Function and Backpropagation
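The original listing is missing; a self-contained sketch of cross-entropy loss plus backpropagation (network shapes are made up), verified against a finite-difference gradient, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(y_hat, y_true):
    # y_true holds integer class labels; clip to avoid log(0)
    n = y_hat.shape[0]
    p = np.clip(y_hat[np.arange(n), y_true], 1e-12, 1.0)
    return -np.log(p).mean()

def forward(x, params):
    zs, activations = [], [x]
    a = x
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        zs.append(z)
        if i < len(params) - 1:
            a = np.maximum(0, z)
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            a = e / e.sum(axis=1, keepdims=True)
        activations.append(a)
    return zs, activations

def backward(params, zs, activations, y_true):
    # softmax + cross-entropy has the famously simple gradient: y_hat - y
    n = len(y_true)
    delta = activations[-1].copy()
    delta[np.arange(n), y_true] -= 1
    delta /= n
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        grads[i] = (activations[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            # chain rule through the ReLU of the previous layer
            delta = (delta @ params[i][0].T) * (zs[i - 1] > 0)
    return grads

# gradient-check one weight against central finite differences
params = [(rng.normal(0, 0.5, (4, 5)), np.zeros(5)),
          (rng.normal(0, 0.5, (5, 3)), np.zeros(3))]
x = rng.normal(size=(6, 4))
y = rng.integers(0, 3, size=6)
zs, acts = forward(x, params)
grads = backward(params, zs, acts, y)

eps = 1e-6
params[0][0][0, 0] += eps
loss_plus = cross_entropy(forward(x, params)[1][-1], y)
params[0][0][0, 0] -= 2 * eps
loss_minus = cross_entropy(forward(x, params)[1][-1], y)
numerical = (loss_plus - loss_minus) / (2 * eps)
print(grads[0][0][0, 0], numerical)  # analytic vs numerical: should agree
```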

📸 Verified Output:


Step 6: Mini-Batch Training and Validation
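The lab's listing is missing from this copy; a sketch of a mini-batch training loop on a made-up 2-feature toy dataset (XOR-like labels, hypothetical hyperparameters) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class problem: label 1 when the two features share a sign
X = rng.normal(size=(1000, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)
X_train, y_train, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

W1 = rng.normal(0, np.sqrt(2 / 2), (2, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, np.sqrt(2 / 32), (32, 2)); b2 = np.zeros(2)
lr, batch_size = 0.1, 64

for epoch in range(100):
    order = rng.permutation(len(X_train))        # reshuffle every epoch
    for start in range(0, len(X_train), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_train[idx], y_train[idx]
        # forward
        z1 = xb @ W1 + b1
        a1 = np.maximum(0, z1)
        z2 = a1 @ W2 + b2
        e = np.exp(z2 - z2.max(axis=1, keepdims=True))
        probs = e / e.sum(axis=1, keepdims=True)
        # backward (softmax + cross-entropy gradient)
        delta2 = probs.copy()
        delta2[np.arange(len(yb)), yb] -= 1
        delta2 /= len(yb)
        dW2, db2 = a1.T @ delta2, delta2.sum(axis=0)
        delta1 = (delta2 @ W2.T) * (z1 > 0)
        dW1, db1 = xb.T @ delta1, delta1.sum(axis=0)
        # SGD update: W := W - lr * dL/dW
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

def accuracy(X, y):
    a1 = np.maximum(0, X @ W1 + b1)
    return ((a1 @ W2 + b2).argmax(axis=1) == y).mean()

print(f"train acc: {accuracy(X_train, y_train):.3f}, "
      f"val acc: {accuracy(X_val, y_val):.3f}")
```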

📸 Verified Output:

💡 Mini-batch training (batch_size=64) is faster than full-batch gradient descent and less noisy than stochastic (batch_size=1). PyTorch uses the exact same approach internally.


Step 7: Adding Dropout Regularisation
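The step's listing is missing here; a sketch of an inverted-dropout layer (the drop probability is an assumption) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p_drop, training):
    # Inverted dropout: scale survivors by 1/(1-p) at train time
    # so no rescaling is needed at test time. The backward pass
    # must multiply gradients by the same mask.
    if not training or p_drop == 0.0:
        return a
    mask = (rng.random(a.shape) >= p_drop) / (1.0 - p_drop)
    return a * mask

a = np.ones((4, 10))
train_out = dropout(a, p_drop=0.5, training=True)
eval_out = dropout(a, p_drop=0.5, training=False)
print(train_out)        # roughly half zeros, survivors scaled to 2.0
print(eval_out.mean())  # identity at evaluation time
```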

📸 Verified Output:

💡 Dropout reduced the overfitting gap from 0.087 to 0.033 while slightly improving test accuracy. Dropout randomly deactivates neurons during training, forcing the network to learn redundant representations.


Step 8: Real-World Capstone — Network Intrusion Detection Neural Network
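The capstone's data and model are not reproduced in this copy; the confidence-triage pattern described in the tip below can be sketched as follows (the threshold and the example probabilities are made up):

```python
import numpy as np

def triage(probs, threshold=0.9):
    # Route predictions below the confidence threshold to human review.
    # `probs` is the network's softmax output, one row per traffic flow.
    confidence = probs.max(axis=1)
    auto_handled = confidence >= threshold
    return probs.argmax(axis=1), auto_handled

probs = np.array([[0.97, 0.03],    # confident: benign
                  [0.55, 0.45],    # ambiguous -> human review
                  [0.08, 0.92]])   # confident: intrusion
labels, auto_handled = triage(probs)
print(labels)                                            # [0 0 1]
print(f"flagged for review: {(~auto_handled).mean():.0%}")  # 1 of 3
```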

📸 Verified Output:

💡 Flagging low-confidence predictions for human review is a key production pattern. The model handles the easy cases; humans focus on the ambiguous 14%.


Summary

What you built from scratch:

  • Activation functions (ReLU, sigmoid, softmax) + their gradients

  • Weight initialisation (He init for ReLU networks)

  • Forward pass through N layers

  • Backpropagation using the chain rule

  • Mini-batch gradient descent

  • Dropout regularisation

Core equations:
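The equation list itself is missing from this copy; in the notation of the diagram above, the relations the lab implements are:

```latex
z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}            % pre-activation
a^{(l)} = \mathrm{ReLU}\bigl(z^{(l)}\bigr)       % hidden activation
\hat{y} = \mathrm{softmax}\bigl(z^{(L)}\bigr)    % output layer
\mathcal{L} = -\sum_k y_k \log \hat{y}_k         % cross-entropy loss
\delta^{(L)} = \hat{y} - y                        % output-layer error
\delta^{(l)} = \delta^{(l+1)} \bigl(W^{(l+1)}\bigr)^{\top} \odot \mathrm{ReLU}'\bigl(z^{(l)}\bigr)
\partial \mathcal{L} / \partial W^{(l)} = \bigl(a^{(l-1)}\bigr)^{\top} \delta^{(l)}
W := W - \eta \, \partial \mathcal{L} / \partial W   % gradient descent update
```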

Further Reading
