Lab 02: Computer Vision Pipelines — Detection and Segmentation

Objective

Build advanced computer vision pipelines: custom image augmentation, feature pyramid networks, anchor-based object detection, and semantic segmentation — applied to security camera and screenshot analysis scenarios.

Time: 60 minutes | Level: Advanced | Docker Image: zchencow/innozverse-ai:latest


Background

Practitioner labs covered basic CNN classification. Advanced CV addresses harder tasks:

Classification:   "Is this image an attack screenshot?"     → single label
Detection:        "Where in this screenshot are the threats?" → boxes + labels
Segmentation:     "Which pixels belong to each region?"      → pixel masks

Architecture evolution:
  2012: AlexNet  — deep CNN classification
  2015: ResNet   — skip connections, 152 layers possible
  2017: FPN      — multi-scale feature pyramids for detection
  2018: YOLOv3   — real-time detection in one pass
  2021: DINO     — self-supervised ViT features that transfer across tasks

Step 1: Image Augmentation Pipeline
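
A minimal sketch of an augmentation pipeline with random crop, brightness jitter, and Gaussian noise. It assumes NumPy is available in the lab image and uses a synthetic array in place of a real screenshot; the function names and parameter values are illustrative, not taken from the lab.

```python
import numpy as np

def random_crop(img, size, rng):
    """Cut a random size x size window out of an H x W x C image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def brightness_jitter(img, max_delta, rng):
    """Shift every pixel by one random offset in [-max_delta, max_delta]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise, then clip back to the valid range."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def augment(img, rng, crop=56, max_delta=0.2, sigma=0.05):
    """Apply the three augmentations in sequence (values are assumptions)."""
    out = random_crop(img, crop, rng)
    out = brightness_jitter(out, max_delta, rng)
    return gaussian_noise(out, sigma, rng)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))   # synthetic stand-in for a screenshot in [0, 1]
augmented = augment(image, rng)
print(augmented.shape)            # (56, 56, 3)
```

Passing the random generator explicitly keeps each augmented sample reproducible from a seed, which helps when debugging a training run.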

📸 Verified Output:


Step 2: ResNet Skip Connections
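
A small sketch of the skip-connection idea, y = ReLU(F(x) + x). To stay short it assumes fully connected layers instead of convolutions and omits batch normalisation; NumPy and all names here are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two linear layers with a ReLU between."""
    f = relu(x @ w1) @ w2
    return relu(f + x)

rng = np.random.default_rng(0)
dim = 8
x = relu(rng.normal(size=(4, dim)))   # non-negative input, as after an earlier ReLU

# With F's weights at zero, the block passes its input through unchanged.
zeros = np.zeros((dim, dim))
identity_out = residual_block(x, zeros, zeros)
print(np.allclose(identity_out, x))   # True

# Stack 16 blocks; the shape is preserved because every block adds x back in.
blocks = [(rng.normal(scale=0.1, size=(dim, dim)),
           rng.normal(scale=0.1, size=(dim, dim))) for _ in range(16)]
h = x
for w1, w2 in blocks:
    h = residual_block(h, w1, w2)
print(h.shape)                        # (4, 8)
```

The zero-weight check shows why deep stacks remain trainable: a block only has to learn a correction to the identity, so adding blocks does not force the signal through an untrained transformation.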

📸 Verified Output:


Step 3: Feature Pyramid Network (FPN)
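
A sketch of the FPN top-down pathway: each backbone map goes through a 1x1 lateral convolution and is added to the upsampled coarser level. The backbone shapes, channel counts, and function names are assumptions, and the 3x3 smoothing convolution used in the original FPN design is left out for brevity.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: mixes channels of a (C_in, H, W) map with a (C_out, C_in) weight."""
    return np.einsum("oc,chw->ohw", weight, x)

def upsample2x(x):
    """Nearest-neighbour upsampling of the two spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features, lateral_weights):
    """features: backbone maps ordered fine to coarse, each half the size of the previous.
    Returns pyramid maps in the same order, all with the same channel count."""
    pyramid = [conv1x1(features[-1], lateral_weights[-1])]   # coarsest level first
    for feat, w in zip(features[-2::-1], lateral_weights[-2::-1]):
        pyramid.append(conv1x1(feat, w) + upsample2x(pyramid[-1]))
    return pyramid[::-1]

rng = np.random.default_rng(0)
# Hypothetical backbone outputs as (channels, height, width)
c3 = rng.normal(size=(64, 32, 32))
c4 = rng.normal(size=(128, 16, 16))
c5 = rng.normal(size=(256, 8, 8))
out_channels = 32
weights = [rng.normal(scale=0.01, size=(out_channels, f.shape[0])) for f in (c3, c4, c5)]

p3, p4, p5 = fpn_top_down([c3, c4, c5], weights)
print(p3.shape, p4.shape, p5.shape)   # (32, 32, 32) (32, 16, 16) (32, 8, 8)
```

Every pyramid level ends up with the same channel count, so one detection head can be shared across scales.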

📸 Verified Output:


Step 4: Anchor-Based Object Detection
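
A sketch of anchor generation and matching, assuming the common layout of 3 scales x 3 aspect ratios (9 anchors per location) and boxes in (x1, y1, x2, y2) form. The scale values and the ground-truth box are made up for illustration.

```python
import math

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return 9 boxes centred on (cx, cy): 3 scales x 3 aspect ratios."""
    anchors = []
    for s in scales:
        for r in ratios:                 # r = width / height; area stays s * s
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

anchors = make_anchors(100, 100)
gt = (70, 70, 130, 130)                  # hypothetical 60 x 60 ground-truth box
best = max(anchors, key=lambda a: iou(a, gt))
print(len(anchors))                      # 9
print(best, round(iou(best, gt), 3))     # (68.0, 68.0, 132.0, 132.0) 0.879
```

During training, anchors whose IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice) are treated as positives, and the network regresses offsets from the anchor to the box.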

📸 Verified Output:


Step 5: Semantic Segmentation
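
A deliberately tiny sketch of a U-Net-style forward pass: one encoder level, one bottleneck, and a decoder that concatenates the encoder features back in (the skip connection). It assumes NumPy, uses 1x1 convolutions with random untrained weights, and only demonstrates the data flow and shapes; all names are illustrative.

```python
import numpy as np

def conv1x1(x, weight):
    """Channel-mixing convolution on a (C_in, H, W) map; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def max_pool2x(x):
    """2x2 max pooling; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour upsampling of the two spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(image, w_enc, w_mid, w_out):
    enc = np.maximum(conv1x1(image, w_enc), 0)            # encoder, full resolution
    mid = np.maximum(conv1x1(max_pool2x(enc), w_mid), 0)  # bottleneck, half resolution
    dec = np.concatenate([upsample2x(mid), enc], axis=0)  # skip: reuse encoder detail
    return conv1x1(dec, w_out)                            # (num_classes, H, W) logits

rng = np.random.default_rng(0)
num_classes = 4
image = rng.random((3, 16, 16))
w_enc = rng.normal(size=(8, 3))
w_mid = rng.normal(size=(16, 8))
w_out = rng.normal(size=(num_classes, 16 + 8))

logits = tiny_unet(image, w_enc, w_mid, w_out)
mask = logits.argmax(axis=0)            # one class index per pixel
print(logits.shape, mask.shape)         # (4, 16, 16) (16, 16)
```

The concatenation is what separates this from a plain encoder-decoder: fine spatial detail lost by pooling is handed directly to the decoder.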

📸 Verified Output:


Step 6: Video Understanding (Frame-Level)
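
A sketch of two frame-level ideas: scene-change detection by frame differencing, and a GRU cell that carries state across per-frame features. It assumes NumPy, synthetic grayscale frames, a hand-picked threshold, random untrained GRU weights, and no bias terms; none of these values come from the lab.

```python
import numpy as np

def scene_changes(frames, threshold=0.3):
    """Indices of frames whose mean absolute difference from the previous frame exceeds threshold."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return [int(i) + 1 for i in np.flatnonzero(diffs > threshold)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, p):
    """One GRU update (biases omitted for brevity)."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])                  # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])                  # reset gate
    candidate = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])    # proposed new state
    return (1.0 - z) * h + z * candidate

rng = np.random.default_rng(0)
dark = 0.1 + 0.01 * rng.normal(size=(3, 8, 8))     # three frames of a dark scene
bright = 0.9 + 0.01 * rng.normal(size=(3, 8, 8))   # three frames of a bright scene
frames = np.concatenate([dark, bright])            # shape (6, 8, 8)
print(scene_changes(frames))                       # [3]

feat_dim, hidden = 2, 4
params = {k: rng.normal(scale=0.5, size=(feat_dim if k[0] == "W" else hidden, hidden))
          for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = np.zeros(hidden)
for frame in frames:
    x = np.array([frame.mean(), frame.std()])      # toy per-frame feature vector
    h = gru_step(h, x, params)
print(h.shape)                                     # (4,)
```

In a fuller pipeline the toy mean/std features would typically be replaced by CNN embeddings of each frame, with the final hidden state feeding a classifier.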

📸 Verified Output:


Step 7: Model Evaluation — COCO Metrics
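
A simplified sketch of COCO-style average precision for one class on one image: detections are matched greedily by score, precision is interpolated at 101 recall levels, and the result is averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05. The full COCO protocol also aggregates over categories and the whole dataset and handles crowd regions and object-size ranges, which this sketch omits. Boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thr):
    """preds: list of (score, box); gts: list of boxes."""
    matched, hits = set(), []
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):          # best still-unmatched ground truth
            v = iou(box, gt)
            if j not in matched and v >= best_iou:
                best_j, best_iou = j, v
        hits.append(best_j >= 0)
        if best_j >= 0:
            matched.add(best_j)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):   # running precision and recall
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    total = 0.0
    for i in range(101):                      # 101-point interpolation
        level = i / 100
        total += max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0)
    return total / 101

gts = [(0, 0, 100, 100), (200, 200, 300, 300)]
preds = [(0.9, (0, 0, 100, 100)),             # exact hit
         (0.8, (400, 400, 500, 500)),         # false positive
         (0.7, (200, 200, 300, 287))]         # IoU 0.87 with the second box

ap50 = average_precision(preds, gts, 0.50)
ap95 = average_precision(preds, gts, 0.95)
coco_map = sum(average_precision(preds, gts, (50 + 5 * k) / 100) for k in range(10)) / 10
print(round(ap50, 3), round(ap95, 3), round(coco_map, 3))   # 0.835 0.505 0.769
```

The third detection counts as correct up to a threshold of 0.85 and as a miss above it, which is why the averaged score sits between the two single-threshold values.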

📸 Verified Output:


Step 8: Capstone — Security Screenshot Analysis System
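
One possible shape for the capstone, sketched under heavy assumptions: a function that runs a detector and a segmenter over a screenshot and assembles a report. `toy_detector` and `toy_segmenter` are hypothetical stand-ins that simply threshold bright pixels; in the lab they would be replaced by the models from the earlier steps. The report fields are also illustrative.

```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class Detection:
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
    label: str
    score: float

def toy_detector(image):
    """Hypothetical stand-in: one box around all pixels brighter than 0.8."""
    ys, xs = np.nonzero(image > 0.8)
    if len(xs) == 0:
        return []
    box = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    return [Detection(box, "suspicious_region", float(image[ys, xs].mean()))]

def toy_segmenter(image):
    """Hypothetical stand-in: binary mask of bright pixels."""
    return (image > 0.8).astype(np.uint8)

def analyze_screenshot(image, detector, segmenter, score_thr=0.5):
    """Run both models and summarise the result as a plain dict."""
    detections = [d for d in detector(image) if d.score >= score_thr]
    mask = segmenter(image)
    return {
        "num_detections": len(detections),
        "detections": detections,
        "flagged_pixel_fraction": float(mask.mean()),
        "alert": len(detections) > 0,
    }

screenshot = np.zeros((32, 32))      # synthetic grayscale screenshot
screenshot[8:16, 4:12] = 0.95        # bright patch standing in for a threat region
report = analyze_screenshot(screenshot, toy_detector, toy_segmenter)
print(report["num_detections"], report["detections"][0].box)   # 1 (4, 8, 12, 16)
print(report["flagged_pixel_fraction"])                        # 0.0625
```

Passing the detector and segmenter in as arguments keeps the analysis function independent of any particular model, so the toy stand-ins can be swapped for trained networks without changing the reporting logic.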

📸 Verified Output:


Summary

Technique                 Architecture                    Use Case
Augmentation              Random crop, jitter, noise      Prevent overfitting
ResNet                    Skip connections, 16+ blocks    Backbone feature extraction
FPN                       Multi-scale pyramid             Objects at different scales
Anchor-based detection    9 anchors/location              Object localisation
U-Net segmentation        Encoder-decoder + skip          Pixel-level labelling
Video analysis            Temporal GRU                    Motion + scene change detection
COCO metrics                                              Standard detection evaluation

