Lab 2: How AI Actually Works — Symbols, Statistics, Neural Nets
Objective
Strip away the magic and understand the three fundamental approaches to building intelligent systems. By the end you will be able to explain:
Why symbolic AI works in closed worlds but fails in open ones
How statistical machine learning learns patterns from data
What a neural network actually computes
Why scale changed everything
The Three Paradigms
There is no single "AI." There are three fundamentally different philosophies about what intelligence is and how to build it:
| Paradigm | Core idea | Examples |
| --- | --- | --- |
| Symbolic AI | Intelligence = manipulating symbols by rules | Chess engines, Prolog, expert systems |
| Statistical ML | Intelligence = patterns extracted from data | Spam filters, recommendation engines |
| Neural Networks | Intelligence = learned representations in layers | GPT-4, image classifiers, AlphaGo |
Paradigm 1: Symbolic AI (The Logic Approach)
Symbolic AI — also called GOFAI (Good Old-Fashioned AI) — treats intelligence as rule-following. Knowledge is encoded explicitly:
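A minimal sketch of what explicit encoding looks like, in plain Python rather than a dedicated logic language like Prolog (the facts and rules are invented for illustration):

```python
# A toy symbolic "expert system": all knowledge is explicit, hand-written rules.
facts = {"has_fur", "gives_milk", "has_hooves"}

rules = [
    # (preconditions, conclusion)
    ({"has_fur", "gives_milk"}, "is_mammal"),
    ({"is_mammal", "has_hooves"}, "is_ungulate"),
]

# Forward chaining: keep applying rules until no new fact can be derived.
changed = True
while changed:
    changed = False
    for preconditions, conclusion in rules:
        if preconditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # derives "is_mammal", then "is_ungulate"
# Any situation the rules don't anticipate simply cannot be reasoned about:
# the brittleness described below.
```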
Strengths:
Transparent and explainable — you can inspect every rule
Guaranteed correct within its domain
Requires no training data
Weaknesses:
Rules must be written by hand — doesn't scale
Brittle: any situation not covered by a rule causes failure
The "frame problem" — maintaining a consistent world-model as things change is computationally intractable
💡 Real example: Early GPS navigation used symbolic rules: "turn left at junction X." When a road was closed, the system had no idea what to do — it had no ability to reason about novel situations.
Paradigm 2: Statistical Machine Learning
Instead of writing rules, machine learning finds patterns in data automatically.
The key insight: given enough examples of inputs and correct outputs, an algorithm can learn the mapping between them — without being told the rules.
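A minimal sketch of that insight, assuming a toy spam task with two invented numeric features per message. Notice the classifier contains no rules about spam, only labelled examples and a generic distance calculation:

```python
# Supervised learning in miniature: 1-nearest-neighbour classification.
# Each message is summarised by two invented features:
# (number of exclamation marks, count of words like "free" or "winner")
training_data = [
    ((5, 3), "spam"),
    ((4, 2), "spam"),
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
]

def classify(features):
    """Predict the label of the closest training example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training_data, key=lambda example: distance(example[0], features))
    return nearest[1]

print(classify((6, 4)))  # -> "spam"
print(classify((0, 1)))  # -> "not spam"
```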
Types of ML:
Supervised — labelled data (spam/not-spam, cat/dog)
Unsupervised — find structure in unlabelled data (clustering customers)
Reinforcement — learn from rewards (game playing, robotics)
Weaknesses:
Requires huge amounts of labelled data
Black box — hard to explain why a prediction was made
Learns correlations, not causation ("ice cream sales predict drownings")
Paradigm 3: Neural Networks (Deep Learning)
Neural networks were inspired by the brain but are better understood as function approximators. A neural network takes numbers in, applies layers of mathematical transformations, and produces numbers out.
The Perceptron (1957)
The simplest unit is a single neuron:

output = σ(w₁x₁ + w₂x₂ + w₃x₃ + b)

The weights (w₁, w₂, w₃) and the bias (b) are learned from data. The activation function (σ) introduces non-linearity; without it, stacking layers would be pointless, because a composition of linear functions is itself just linear.
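The same computation as a minimal sketch in Python, using a sigmoid activation as a modern stand-in for the 1957 perceptron's step function (all the numbers are invented for illustration; in practice the weights come from training):

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, then a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

print(neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, -0.5], bias=0.1))
# -> roughly 0.33
```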
Deep Neural Networks
Stack many layers of neurons, and each layer learns increasingly abstract features. In an image classifier, for example, early layers detect edges, middle layers combine them into textures and shapes, and the final layers respond to whole objects.
How Training Works
1. Forward pass — input flows through the network and a prediction comes out
2. Loss calculation — how wrong was the prediction? (e.g. cross-entropy loss)
3. Backpropagation — calculate the gradient of the loss with respect to every weight
4. Update — nudge each weight slightly in the direction that reduces the loss (gradient descent)
5. Repeat millions of times (the sketch below runs this loop in miniature)
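A minimal sketch of the whole loop, training a single neuron on an invented toy task (learning logical OR). For one neuron, backpropagation reduces to a single hand-derived gradient; real frameworks compute these gradients automatically across millions of weights:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented toy dataset: two inputs and the target output of logical OR.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias, lr = [0.0, 0.0], 0.0, 0.5  # lr = learning rate

for epoch in range(1000):
    for x, y in data:
        # 1. Forward pass: compute the prediction.
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        # 2. Loss: cross-entropy measures how wrong the prediction was.
        loss = -(y * math.log(pred) + (1 - y) * math.log(1 - pred))
        # 3. Backpropagation: for this loss and activation, dLoss/dz = pred - y.
        grad = pred - y
        # 4. Update: nudge each weight in the direction that reduces the loss.
        weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
        bias -= lr * grad

for x, y in data:
    pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    print(x, y, round(pred, 3))  # predictions end up close to the targets
```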
Why Scale Changed Everything
For decades, neural networks underperformed simpler methods. Three things changed:
1. Data — the internet produced billions of labelled examples (text, images, clicks) that didn't exist before.
2. Compute — GPUs, originally designed for video games, turned out to be perfect for the matrix multiplications that neural networks require. Training time dropped from years to days.
3. Architecture — the Transformer (2017) processes every position in a sequence in parallel during training, unlike earlier recurrent networks that read one token at a time.
The "bitter lesson" (Rich Sutton, 2019): every time researchers added domain knowledge to AI systems, general-purpose learning methods trained on more data eventually surpassed them. Scale beats cleverness.
What Neural Networks Are Really Doing
A neural network is a compressed statistical summary of its training data, encoded as billions of floating-point numbers (weights).
When GPT-4 "knows" that Paris is the capital of France, it's not because there's a database lookup. It's because the relationship between tokens "Paris", "capital", and "France" appeared together in patterns across millions of documents, and the weights learned to encode that relationship.
This is why LLMs:
Hallucinate — they generate statistically plausible text, not verified facts
Work across domains — the same weights encode everything they saw during training
Improve with scale — more parameters = more capacity to encode patterns
Summary Comparison
| | Symbolic AI | Statistical ML | Neural Networks |
| --- | --- | --- | --- |
| Data needed | None (rules written by hand) | Moderate | Massive |
| Explainability | High (inspect rules) | Medium | Low (black box) |
| Generalisation | Poor (brittle) | Good | Excellent |
| Development cost | High (manual rules) | Medium | High (compute) |
| Performance ceiling | Low | Medium | Very high |
| Best for | Closed-world logic | Structured tabular data | Unstructured data: text, images, audio |
Further Reading
Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. MIT Press. (free online)