Lab 12: Convolutional Operations

Objective

Implement convolution from scratch using only NumPy: 2D convolution, edge detection with Sobel and Prewitt filters, Gaussian blur, max and average pooling, stride and padding, and feature map visualisation. The operations are applied to Surface device spec "images" and simulated product image tensors.

Background

Convolutional Neural Networks (CNNs) apply learned filters to detect local patterns in images. A filter (kernel) is a small weight matrix that slides over the input, computing a dot product between the kernel and the input patch at each position. Early layers typically learn edge detectors; deeper layers learn textures and shapes. The conv2d operation in PyTorch and TensorFlow is essentially the function you implement here, applied across many input and output channels with learned weights. Strictly speaking, these libraries compute cross-correlation (the kernel is not flipped), which makes no practical difference when the weights are learned.
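The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal, unoptimised illustration for a single-channel input, not the lab's reference solution; the function name `conv2d`, its `stride` and `padding` parameters, and the test image are assumptions made for this example.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D cross-correlation (the operation DL libraries call convolution)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if padding > 0:
        # Zero padding on all four sides.
        image = np.pad(image, padding, mode="constant")
    H, W = image.shape
    kh, kw = kernel.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Dot product between the kernel and the patch under it.
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # illustrative 6x6 input
box = np.ones((3, 3)) / 9.0                       # 3x3 averaging kernel

valid = conv2d(image, box)                        # no padding
same = conv2d(image, box, padding=1)              # padding = k // 2
strided = conv2d(image, box, stride=2, padding=1)
print(valid.shape, same.shape, strided.shape)     # (4, 4) (6, 6) (3, 3)
```

The two nested loops make the cost explicit: one kernel-sized multiply-and-sum per output position. Real libraries vectorise this, but the arithmetic is the same.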

Time

35 minutes

Prerequisites

  • Lab 03 (Neural Network) — weight matrices

  • Lab 07 (PCA) — matrix operations

Tools

  • Docker: zchencow/innozverse-python:latest


Lab Instructions

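💡 Max-pooling provides invariance to small translations. If an edge is detected at position (2,2) or (3,3), both positions fall in the same 2×2 window, so after 2×2 max-pooling with stride 2 they give the same pooled output. Combined with weight sharing (the same filter is applied at every position), this helps CNNs recognise a cat in the centre of an image and in the corner. stride controls the downsampling: stride=2 roughly halves each dimension. padding controls output size: with stride 1 and an odd kernel size, padding=kernel_size//2 ("same" padding) keeps the output the same size as the input, as in PyTorch's Conv2d(padding='same'). In general, the output size along each dimension is floor((H + 2*padding - kernel_size) / stride) + 1.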
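To make the pooling behaviour concrete, here is a small sketch of 2×2 pooling with stride 2. The function name `pool2d` and the activation positions are illustrative assumptions. It shows that two activations inside the same pooling window produce identical pooled outputs, while an activation in a neighbouring window does not, so the invariance only covers small shifts.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over a single-channel 2D array."""
    x = np.asarray(x, dtype=float)
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce_fn(x[r:r + size, c:c + size])
    return out

# Single activations at different positions of a 6x6 feature map.
a = np.zeros((6, 6)); a[2, 2] = 1.0
b = np.zeros((6, 6)); b[3, 3] = 1.0  # same 2x2 window as (2, 2)
c = np.zeros((6, 6)); c[4, 3] = 1.0  # falls in a different window

pooled_a = pool2d(a)
pooled_b = pool2d(b)
pooled_c = pool2d(c)

print(pooled_a.shape)                       # (3, 3)
print(np.array_equal(pooled_a, pooled_b))   # True
print(np.array_equal(pooled_a, pooled_c))   # False
print(pool2d(a, mode="avg")[1, 1])          # 0.25
```

Average pooling dilutes a single strong activation (0.25 here), whereas max pooling preserves it. This is one reason max pooling is often preferred after edge-like filters.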

📸 Verified Output:


Summary

| Operation | Output size | Purpose |
|---|---|---|
| Conv2d, s=1, p=0 | (H-k+1, W-k+1) | Feature extraction |
| Conv2d, s=1, p=k//2 | (H, W) | Same-size output |
| Max Pool 2×2 s=2 | (H/2, W/2) | Downsample, invariance |
| Sobel | Edge map | Gradient magnitude |
| Gaussian | Blurred | Noise reduction |
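The Sobel and Gaussian rows can be illustrated with fixed (not learned) kernels. The sketch below is self-contained and uses hypothetical helper names (`correlate2d_same`, `sobel_magnitude`, `gaussian_kernel`). It assumes zero padding and odd square kernels, and the step-edge test image is made up for the example.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Same-size cross-correlation with zero padding (odd square kernels)."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(image.astype(float), p, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

# Horizontal-gradient kernels; the vertical ones are their transposes.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)

def sobel_magnitude(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) from the two Sobel responses."""
    gx = correlate2d_same(image, SOBEL_X)
    gy = correlate2d_same(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def gaussian_kernel(size=5, sigma=1.0):
    """Normalised 2D Gaussian kernel (weights sum to 1)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

# Vertical step edge: left half 0, right half 1.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

edges = sobel_magnitude(img)
blurred = correlate2d_same(img, gaussian_kernel(5, 1.0))

print(edges[3, 1], edges[3, 3], edges[3, 4])  # 0.0 4.0 4.0
```

Away from the image border the edge map is non-zero only next to the step, and the blurred image ramps smoothly from 0 to 1 across it. Zero padding introduces spurious responses along the border, so interior pixels are the ones to inspect.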
