Lab 07: Convolutional Neural Networks

Objective

Understand how CNNs process images through convolutional filters, pooling, and fully-connected layers. Implement a CNN from scratch with NumPy and understand why convolutions are the right tool for spatial data.

Time: 50 minutes | Level: Practitioner | Docker Image: zchencow/innozverse-ai:latest


Background

A fully-connected neural network treats each pixel as an independent input — a 224×224 grayscale image already has 50,176 inputs (three times that in RGB). That ignores structure: nearby pixels are correlated, and the same edge can appear anywhere in the image.

CNNs solve this with three key ideas:

  1. Local connectivity: each filter looks at a small patch (e.g., 3×3)

  2. Parameter sharing: the same filter slides across the whole image

  3. Hierarchy: early layers detect edges; deeper layers detect shapes, then objects


Step 1: Environment Setup

```shell
docker run -it --rm zchencow/innozverse-ai:latest bash
```

Inside the container, start Python and verify NumPy is available:

```python
import numpy as np
print("NumPy:", np.__version__)
```

📸 Verified Output:


Step 2: The Convolution Operation
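The operation can be sketched in plain NumPy: slide a small filter over the image and take the dot product at every position. The toy rectangle image and the hand-coded horizontal edge filter below are illustrative choices, not the lab's exact data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what deep learning frameworks call 'convolution')."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output cell = dot product of the filter with one image patch
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: a bright rectangle on a dark background
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

# Hand-coded horizontal edge filter (responds to vertical intensity changes)
horizontal_edge = np.array([[ 1,  1,  1],
                            [ 0,  0,  0],
                            [-1, -1, -1]], dtype=float)

fmap = conv2d(img, horizontal_edge)
print(fmap)
```

The feature map responds with one sign at the rectangle's top edge and the opposite sign at its bottom edge, and is exactly zero in flat regions where the +1 and −1 rows cancel.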

📸 Verified Output:

💡 The horizontal edge filter detected the top and bottom edges of the rectangle (positive and negative values). The vertical filter detected left and right edges. CNNs learn these filters automatically during training.


Step 3: Multi-Channel Convolution

Real images have 3 channels (RGB). Each filter has depth=3:
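As a sketch (random values standing in for a real RGB image and a learned filter), a depth-3 filter multiplies all three channels of each patch and sums everything into a single number per position:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_multichannel(image, kernel):
    """image: (H, W, C); kernel: (kh, kw, C).
    One filter sums over all channels, producing a single 2-D feature map."""
    kh, kw, _ = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw, :] * kernel)
    return out

rgb = rng.random((8, 8, 3))            # toy RGB image
filt = rng.standard_normal((3, 3, 3))  # one filter with depth 3

fmap = conv2d_multichannel(rgb, filt)
print(fmap.shape)
```

A conv layer with N such filters stacks N of these 2-D maps, so its output has depth N regardless of the input's channel count.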

📸 Verified Output:


Step 4: Pooling — Spatial Downsampling

Pooling reduces spatial dimensions while keeping the most important information:
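A minimal max-pooling sketch (2×2 windows, stride 2; the input values are chosen for readability rather than taken from the lab):

```python
import numpy as np

def max_pool(fmap, pool=2, stride=2):
    oh = (fmap.shape[0] - pool) // stride + 1
    ow = (fmap.shape[1] - pool) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            # Keep only the strongest activation in each window
            out[i, j] = fmap[r:r+pool, c:c+pool].max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool(fmap)
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

A 4×4 map becomes 2×2: spatial resolution drops by 4× while the peak activation of each region survives.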

📸 Verified Output:

💡 Max pooling preserves the strongest activations (the most prominent features detected by each filter). This also provides translation invariance — the feature is detected regardless of its exact position.


Step 5: Full CNN Architecture
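A sketch of the whole forward pass with untrained random filters; the layer sizes here are illustrative and not necessarily the lab's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(0, x)

def max_pool(x, p=2):
    oh, ow = x.shape[0] // p, x.shape[1] // p
    return x[:oh*p, :ow*p].reshape(oh, p, ow, p).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Tiny untrained CNN: conv -> ReLU -> pool -> flatten -> FC -> softmax
image = rng.random((8, 8))
filters = rng.standard_normal((4, 3, 3))                    # 4 random 3x3 filters
maps = [max_pool(relu(conv2d(image, f))) for f in filters]  # 4 maps of shape (3, 3)
features = np.concatenate([m.ravel() for m in maps])        # flatten -> 36 features
W = rng.standard_normal((4, features.size)) * 0.1           # 4-class linear head
probs = softmax(W @ features)
print(probs)  # 4 class probabilities summing to 1
```

Because every weight is random, the predicted distribution carries no real signal — which is exactly the point made in the insight below the output.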

📸 Verified Output:

💡 Without training, predictions are near-random (~0.25 per class). This is expected — the filters are random. In a trained CNN, filters become meaningful edge/texture detectors.


Step 6: CNN Architecture Zoo

Modern CNNs you should know:
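For quick reference, the landmark architectures usually covered in this step can be summarized programmatically (depths count weighted layers; the one-line descriptions are simplifications):

```python
# Landmark CNN architectures, oldest to newest
architectures = {
    "LeNet-5 (1998)":   {"depth": 5,  "idea": "first practical CNN (digit recognition)"},
    "AlexNet (2012)":   {"depth": 8,  "idea": "ReLU + dropout + GPU training; won ImageNet"},
    "VGG-16 (2014)":    {"depth": 16, "idea": "uniform stacks of small 3x3 convolutions"},
    "ResNet-50 (2015)": {"depth": 50, "idea": "skip connections make very deep nets trainable"},
}

for name, info in architectures.items():
    print(f"{name:<18} depth={info['depth']:<3} {info['idea']}")
```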

📸 Verified Output:


Step 7: ResNet Skip Connections

The key innovation of ResNet — skip connections prevent gradient vanishing in deep networks:
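The effect can be demonstrated with a toy experiment: push the same input through 20 random linear+ReLU layers, once without and once with an identity skip path (layer width and weight scale below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def layer(x, W):
    return np.maximum(0, W @ x)  # linear transform + ReLU

dim, depth = 64, 20
x0 = rng.standard_normal(dim)
weights = [rng.standard_normal((dim, dim)) * 0.05 for _ in range(depth)]

# Plain deep network: the signal is re-transformed at every layer
x = x0.copy()
for W in weights:
    x = layer(x, W)
plain_norm = np.linalg.norm(x)

# Residual network: x + F(x) always carries the input forward on the identity path
x = x0.copy()
for W in weights:
    x = x + layer(x, W)
res_norm = np.linalg.norm(x)

print(f"plain:    {plain_norm:.2e}")
print(f"residual: {res_norm:.2e}")
```

With small weights, the plain stack contracts the signal at every layer so the activation norm decays geometrically, while the residual form always carries the input forward unchanged alongside the learned update — the "gradient highway" the insight below describes.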

📸 Verified Output:

💡 Standard deep network: activations collapse to ~0 by layer 10 (vanishing gradient). ResNet maintains strong activations through 20 layers because the skip connection always provides a direct path for gradient flow.


Step 8: Real-World Capstone — Malware Screenshot Classifier
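As a self-contained sketch of the pattern this capstone uses: a frozen feature extractor (here just a fixed random projection, standing in for a pretrained CNN backbone such as ResNet-50) plus a lightweight classifier head on top. The synthetic brightness-separated "screenshot" data and the nearest-centroid head are illustrative assumptions, not the lab's actual dataset or classifier:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen pretrained CNN: a fixed random projection.
# In practice you would use e.g. ResNet-50's penultimate-layer activations.
W_frozen = rng.standard_normal((128, 64 * 64))

def extract_features(images):
    flat = images.reshape(len(images), -1)
    return np.maximum(0, flat @ W_frozen.T)  # frozen features, never trained

# Two toy classes of 64x64 "screenshots" with different mean brightness
benign  = rng.random((20, 64, 64)) * 0.4
malware = rng.random((20, 64, 64)) * 0.4 + 0.5
X = extract_features(np.concatenate([benign, malware]))
y = np.array([0] * 20 + [1] * 20)

# Lightweight head: nearest class centroid in frozen feature space
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(images):
    F = extract_features(images)
    d = np.linalg.norm(F[:, None, :] - centroids[None], axis=2)
    return d.argmin(axis=1)

acc = (predict(np.concatenate([benign, malware])) == y).mean()
print("train accuracy:", acc)
```

Only the tiny head is fit to the data; the extractor's weights never change — which is what makes the pattern cheap and effective with few labeled examples.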

📸 Verified Output:

💡 This is the transfer learning pattern: use a pretrained CNN (like ResNet-50 trained on ImageNet) to extract features from your images, then train a simple classifier on top. You get excellent results without training a deep network from scratch.


Summary

| Component | Purpose | Key Parameter |
| --- | --- | --- |
| Convolutional layer | Detect spatial features | filter size, n_filters |
| ReLU | Non-linearity | (none) |
| Max pooling | Downsample, translation invariance | pool_size, stride |
| Skip connection | Gradient highway in deep networks | (none) |
| Flatten + FC | Classification head | hidden_dim |

Key Takeaways:

  • Filters learn to detect edges → textures → parts → objects hierarchically

  • Parameter sharing: a 3×3 filter has only 9 weights regardless of image size

  • ResNet skip connections solved the vanishing gradient problem for 50-150+ layer networks

  • Transfer learning (CNN features + linear classifier) works well with limited data
