Lab 09: numpy Advanced

Objective

Master numpy's advanced features: n-dimensional array reshaping, broadcasting rules, fancy/boolean indexing, np.einsum for tensor operations, np.vectorize, structured arrays, and performance-critical patterns for data-intensive pipelines.

Background

numpy's speed comes from two sources: optimized BLAS/LAPACK libraries for linear algebra, and vectorization, which replaces Python-level loops with loops in compiled C. The key mental model: think in arrays, not loops. On a 10M-element array, arr * 0.9 runs as a single compiled loop over a contiguous buffer with no per-element interpreter overhead, which is often about two orders of magnitude faster than for x in arr.
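As a rough illustration of the array-versus-loop contrast, the sketch below applies the same 0.9 multiplier both ways and times each. The array size (1M rather than 10M, to keep the run short), the variable names, and the use of time.perf_counter are assumptions made for this example; the measured ratio will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(1.0, 100.0, size=1_000_000)  # hypothetical data

# Python-level loop: the interpreter handles every element one at a time
t0 = time.perf_counter()
looped = np.array([x * 0.9 for x in prices])
t_loop = time.perf_counter() - t0

# Vectorized: one call, and the loop runs in compiled code
t0 = time.perf_counter()
vectorized = prices * 0.9
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Both versions produce the same values; only the place where the loop runs differs.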

Time

35 minutes

Prerequisites

  • Practitioner Lab 10 (pandas & numpy basics)

Tools

  • Docker: zchencow/innozverse-python:latest


Lab Instructions

Step 1: Reshaping, Stacking & Broadcasting

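💡 Broadcasting rule: numpy aligns shapes from the right and pads any missing leading dimensions with 1. Two dimensions are compatible when they are equal or when one of them is 1. Example: (5,) and (4,1) → align right → (1,5) and (4,1) → expand 1s → (4,5). The inputs are not copied; numpy treats each size-1 axis as a virtual expanded view, and only the result array is allocated. This lets you combine arrays of compatible shapes without explicit loops; incompatible shapes raise a ValueError.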
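The rule above can be checked directly. This is a minimal sketch rather than the lab's original listing; the array names are made up for illustration, and it also touches the reshaping and stacking topics named in the step title.

```python
import numpy as np

row = np.arange(5)                 # shape (5,)
col = np.arange(4).reshape(4, 1)   # shape (4, 1)

# Shapes align from the right: (5,) -> (1, 5), then both stretch to (4, 5)
grid = row + col
print(grid.shape)                  # (4, 5)

# broadcast_to exposes the virtual view: the stretched axis has stride 0
view = np.broadcast_to(row, (4, 5))
print(view.strides[0])             # 0, so no rows were copied

# Incompatible trailing dimensions raise instead of guessing
try:
    np.arange(5) + np.arange(4)
except ValueError as exc:
    print("cannot broadcast:", exc)

# Reshaping and stacking
m = np.arange(12).reshape(3, 4)    # same 12 values, new shape
print(m.reshape(-1, 6).shape)      # (2, 6); -1 lets numpy infer that axis
print(np.vstack([m, m]).shape)     # (6, 4)
print(np.hstack([m, m]).shape)     # (3, 8)
```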



Step 2: Fancy Indexing & Boolean Masking
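The following is a small sketch of the two indexing styles this step names, not the lab's original code. The scores array and all variable names are hypothetical.

```python
import numpy as np

scores = np.array([55, 82, 47, 91, 68, 73])   # hypothetical data

# Boolean mask: a same-shaped True/False array selects matching elements
passed = scores >= 60
print(scores[passed])            # [82 91 68 73]

# Combine conditions with & and |, wrapping each comparison in parentheses
mid = scores[(scores >= 60) & (scores < 80)]
print(mid)                       # [68 73]

# Fancy indexing: select by a list (or array) of integer positions
print(scores[[0, 2, 4]])         # [55 47 68]

# Both forms return copies, so editing the result leaves `scores` untouched
picked = scores[[0, 2, 4]]
picked[0] = -1
print(scores[0])                 # 55

# Assigning through a mask does modify the indexed array in place
capped = scores.copy()
capped[capped > 80] = 80
print(capped)                    # [55 80 47 80 68 73]

# np.where is the element-wise if/else counterpart
labels = np.where(scores >= 60, "pass", "fail")
print(labels)

# On 2-D arrays, a pair of index lists picks individual (row, col) cells
grid = np.arange(12).reshape(3, 4)
print(grid[[0, 2], [1, 3]])      # [ 1 11]
```

The copy-versus-view distinction is the main thing to remember: basic slices such as scores[1:4] return views, while mask and fancy indexing return new arrays.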



Steps 3–8: einsum, vectorize, structured arrays, linear algebra, ufuncs, Capstone
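As a hedged sketch of three of the topics listed for these steps (einsum, np.vectorize, and ufunc methods), the example below uses small made-up arrays. The shipping_fee rule is hypothetical and exists only to show how a scalar function gets wrapped.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# 'ij,jk->ik': the repeated index j is summed over, i.e. matrix multiplication
C = np.einsum("ij,jk->ik", A, B)
print(np.array_equal(C, A @ B))          # True

# Other common patterns
print(np.einsum("ij->ji", A).shape)      # (3, 2), a transpose
S = np.arange(9).reshape(3, 3)
print(np.einsum("ii->", S))              # 12, the trace (0 + 4 + 8)

# Batched matrix product over a leading batch axis b
X = np.ones((5, 2, 3))
Y = np.ones((5, 3, 4))
print(np.einsum("bij,bjk->bik", X, Y).shape)   # (5, 2, 4)

# np.vectorize wraps a scalar Python function so it broadcasts like a ufunc.
# It is a convenience: the function is still called once per element in Python.
@np.vectorize
def shipping_fee(weight_kg):             # hypothetical business rule
    return 0.0 if weight_kg < 1 else 5.0 + 2.0 * weight_kg

weights = np.array([0.5, 1.0, 3.0])
print(shipping_fee(weights))             # [ 0.  7. 11.]

# The same rule written with np.where stays in compiled code
fees = np.where(weights < 1, 0.0, 5.0 + 2.0 * weights)
print(fees)

# ufunc methods: reduce, accumulate, and outer
v = np.array([1, 2, 3, 4])
print(np.add.reduce(v))                  # 10
print(np.multiply.accumulate(v))         # [ 1  2  6 24]
print(np.multiply.outer(v, v).shape)     # (4, 4)
```

When a rule can be expressed with np.where or arithmetic on whole arrays, that form is usually preferable to np.vectorize, which mainly saves you from writing the loop yourself.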



Summary

Feature            API                        When to use
Reshape            arr.reshape(m, n)          Change dims, same data
Broadcasting       automatic                  Apply scalar/vector to array
Boolean mask       arr[arr > 0]               Filter elements by condition
Fancy index        arr[[0, 2, 4]]             Select by integer array
np.where           np.where(cond, x, y)       Element-wise if/else
np.einsum          'ij,jk->ik'                Tensor contractions
np.vectorize       @np.vectorize              Broadcast Python functions
Structured array   np.dtype([('f', type)])    Mixed-type records
np.linalg          lstsq, inv, eig            Linear algebra
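To make the last two summary entries concrete, here is a minimal sketch of a structured array and of the three np.linalg calls listed. The product records, field names, and the fitted line are all invented for illustration.

```python
import numpy as np

# Structured array: each record has named, typed fields
product = np.dtype([("name", "U10"), ("price", "f8"), ("stock", "i4")])
items = np.array(
    [("pen", 1.5, 200), ("laptop", 899.0, 12), ("desk", 249.0, 0)],
    dtype=product,
)  # hypothetical records

print(items["price"].mean())               # field access returns a plain array
in_stock = items[items["stock"] > 0]       # boolean mask over whole records
print(in_stock["name"])
by_price = np.sort(items, order="price")   # sort records by one field
print(by_price["name"])

# np.linalg.lstsq: least-squares fit of y = a*x + b
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                          # exact line, so the fit recovers a=2, b=1
design = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
print(round(a, 6), round(b, 6))

# inv and eig on a small symmetric matrix
M = np.array([[2.0, 1.0], [1.0, 2.0]])
print(np.allclose(M @ np.linalg.inv(M), np.eye(2)))   # True
eigvals, eigvecs = np.linalg.eig(M)
print(np.sort(eigvals))                    # eigenvalues 1 and 3
```

For solving a linear system, np.linalg.solve is generally a better choice than multiplying by an explicit inverse; inv appears here only because the summary lists it.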

