Implement linear regression completely from scratch using only NumPy: mean squared error loss, gradient descent optimiser, R² score, multi-feature regression, and a learning-rate sensitivity experiment, then apply it to predict Surface device prices from specs.
Background
Linear regression models a relationship between input features X and a continuous output y as ŷ = Xw + b. Training means finding weights w and bias b that minimise the Mean Squared Error (MSE) loss. Gradient descent iteratively moves the weights in the direction of steepest loss decrease: w = w - α·∂L/∂w. Understanding this from scratch reveals why learning rate, feature scaling, and data quality matter: lessons that apply to every ML algorithm.
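The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's reference solution; the synthetic data, learning rate, and iteration count are invented for the example.

```python
import numpy as np

# Synthetic data (illustrative): 100 samples, 2 features, known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y = X @ true_w + true_b          # noiseless targets for clarity

# Gradient descent on MSE loss L = mean((Xw + b - y)^2)
w = np.zeros(2)
b = 0.0
lr = 0.1                          # the α in w = w - α·∂L/∂w
for _ in range(500):
    err = X @ w + b - y           # prediction error, shape (100,)
    grad_w = 2 * X.T @ err / len(y)   # ∂L/∂w
    grad_b = 2 * err.mean()           # ∂L/∂b
    w -= lr * grad_w              # step against the gradient
    b -= lr * grad_b

mse = np.mean((X @ w + b - y) ** 2)
print(w, b, mse)                  # w ≈ [3, -2], b ≈ 0.5, mse ≈ 0
```

On this noiseless data the loop recovers the true parameters almost exactly; with real, noisy specs it converges to the least-squares fit instead.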
Time
30 minutes
Prerequisites
Python Practitioner Lab 10 (numpy/pandas)
Tools
Docker: zchencow/innozverse-python:latest
Lab Instructions
💡 Feature normalisation is not optional for gradient descent. If RAM ranges 8–32 and Storage ranges 64–512, the gradient for Storage is ~16× larger, so gradient descent oscillates wildly and may never converge. After z-score normalisation, all features have mean=0 and std=1, so gradients are comparable in magnitude and convergence is smooth. This is why StandardScaler is almost always the first step in a scikit-learn pipeline.