Week 3 Part I · Foundations

MLPs & Backpropagation

Multilayer perceptrons; the forward pass; backpropagation mechanics via PyTorch autograd.

Learning goals

Build and train a multilayer perceptron.
Understand the forward pass and backpropagation.
Use autograd correctly and verify a gradient by hand.

This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded learning is in Predict and Explain. See the AI-use policy and a fully worked sample submission.

⚙ Exercise

Part A · Build · AI assistant welcome

Implement an MLP (nn.Module) and train it on a small classification task with autograd.
Implement the backward pass for one linear layer by hand (manual tensor ops).
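As a starting point for the Build, here is a minimal sketch of an MLP trained with autograd. The synthetic dataset, the layer sizes, the learning rate, and the step count are all assumptions chosen for illustration; the lab does not prescribe them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic two-class data (an assumption for illustration): the label is 1
# when the point lies outside the unit circle, so a linear model cannot fit it.
X = torch.randn(512, 2)
y = (X.pow(2).sum(dim=1) > 1.0).long()


class MLP(nn.Module):
    def __init__(self, in_dim=2, hidden=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        # Returns raw logits; CrossEntropyLoss applies log-softmax internally.
        return self.net(x)


model = MLP()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(300):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # backpropagation via autograd
    optimizer.step()             # gradient-descent update
    losses.append(loss.item())

with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

Replacing `nn.ReLU()` with `nn.Identity()` in this sketch is one way to set up the "nonlinearity removed" experiment from Part B.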

Part B · Predict & probe · student reasoning

Predict the sign and rough magnitude of one weight's gradient after a single step on a toy example.
Predict how training changes if the nonlinearity is removed.
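To show what a toy example for the gradient prediction might look like, the sketch below uses a single weight with a squared-error loss. The numbers are assumptions picked so the arithmetic is easy to do by hand; your own example will differ.

```python
import torch

# Toy setup (illustrative values): one weight, one input, squared error.
w = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(3.0)

pred = w * x                 # 0.5 * 2 = 1.0
loss = (pred - target) ** 2  # (1 - 3)^2 = 4.0
loss.backward()

# Chain rule by hand: dL/dw = 2 * (w*x - target) * x = 2 * (-2) * 2 = -8
by_hand = 2 * (0.5 * 2.0 - 3.0) * 2.0
print(w.grad.item(), by_hand)
```

The gradient is negative, so a gradient-descent update increases `w`, which moves the prediction toward the target.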

Part C · Explain & defend · in plain language

Verify the hand-computed gradient matches autograd within tolerance and explain any difference.
Explain in words what each gradient tells its weight to do.
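One possible shape for the hand-versus-autograd comparison is sketched below for a single linear layer. The placeholder loss (sum of squared outputs), the tensor shapes, and the tolerances are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)

# Shapes follow nn.Linear's convention: weight is (out_features, in_features).
x = torch.randn(4, 3, requires_grad=True)  # batch of 4 inputs
W = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

y = x @ W.T + b          # forward pass, shape (4, 2)
loss = y.pow(2).sum()    # placeholder scalar loss (an assumption)
loss.backward()          # autograd fills x.grad, W.grad, b.grad

# Backward pass by hand with plain tensor ops.
with torch.no_grad():
    grad_y = 2 * y                 # dL/dy for this particular loss
    grad_W = grad_y.T @ x          # dL/dW, shape (2, 3)
    grad_b = grad_y.sum(dim=0)     # dL/db, shape (2,)
    grad_x = grad_y @ W            # dL/dx, shape (4, 3), passed to the previous layer

w_ok = torch.allclose(grad_W, W.grad, rtol=1e-4, atol=1e-5)
b_ok = torch.allclose(grad_b, b.grad, rtol=1e-4, atol=1e-5)
x_ok = torch.allclose(grad_x, x.grad, rtol=1e-4, atol=1e-5)
print(w_ok, b_ok, x_ok)
```

Small nonzero differences in float32 typically come from the order in which sums are accumulated, which is why the comparison uses a tolerance rather than exact equality.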

✓ Deliverables

An MLP notebook with training results.
The hand-derived gradient plus the autograd check.

Hints.

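Zero the gradients before each backward pass. To validate a gradient, use torch.autograd.gradcheck, which compares autograd's result with a finite-difference estimate; pass float64 inputs, because its default tolerances are designed for double precision.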
Without a nonlinearity an MLP collapses to a linear model.

❓ Self-check

Answer each question before reading the answer beneath it. If one is unclear, revisit the lab and the references.

Why does an MLP need nonlinear activations?

Without them, stacked linear layers collapse into a single linear map.
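The collapse can be checked numerically. The sketch below folds two stacked `nn.Linear` layers into one weight matrix and one bias and compares the outputs; the layer sizes are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between.
f1 = nn.Linear(3, 5)
f2 = nn.Linear(5, 2)

with torch.no_grad():
    # Fold them into a single linear map: W = W2 W1, b = W2 b1 + b2.
    W = f2.weight @ f1.weight
    b = f2.weight @ f1.bias + f2.bias

    x = torch.randn(8, 3)
    stacked = f2(f1(x))
    merged = x @ W.T + b

same = torch.allclose(stacked, merged, atol=1e-5)
print(same)
```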

What does backpropagation compute?

Gradients of the loss with respect to each parameter, via the chain rule.

Why call optimizer.zero_grad() each step?

PyTorch accumulates gradients, so they must be cleared before each backward pass.
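A small sketch of the accumulation behavior, using a single scalar parameter. Here the gradient is cleared by hand with `w.grad = None`; in a training loop `optimizer.zero_grad()` plays that role for the optimizer's parameters.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()    # 3.0

(3 * w).backward()       # no clearing in between
second = w.grad.item()   # 6.0: the new gradient was added to the old one

w.grad = None            # clear the stored gradient by hand
(3 * w).backward()
third = w.grad.item()    # 3.0 again

print(first, second, third)
```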

What does .backward() do?

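Backpropagates from the tensor it is called on and accumulates the result into .grad for every leaf tensor with requires_grad=True that contributed to it. Intermediate (non-leaf) tensors do not keep a .grad unless retain_grad() is called on them.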
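The sketch below illustrates which tensors end up with a populated `.grad` after `.backward()`. The values are arbitrary and chosen so the result can be verified by hand.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor created by the user
x = torch.tensor(3.0)                       # requires_grad defaults to False

h = w * x          # intermediate result: requires grad, but is not a leaf
loss = h ** 2
loss.backward()

print(w.grad)                      # tensor(36.) since dL/dw = 2 * (w*x) * x = 2 * 6 * 3
print(x.grad)                      # None: x does not require grad
print(h.requires_grad, h.is_leaf)  # True False
```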

How can a computed gradient be checked for correctness?

Compare it to a finite-difference (numerical) estimate, e.g. with gradcheck.
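A minimal sketch of both checks, assuming a simple scalar function `f` invented for this example. It computes a central finite difference by hand and then calls `torch.autograd.gradcheck` on the same function, using float64 inputs.

```python
import torch


def f(w):
    # Hypothetical scalar function used only for this check.
    return (w ** 3).sum()


w = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64, requires_grad=True)
f(w).backward()
analytic = w.grad.clone()  # autograd's gradient, equal to 3 * w**2

# Central finite difference, one coordinate at a time.
eps = 1e-6
numeric = torch.zeros_like(analytic)
with torch.no_grad():
    for i in range(w.numel()):
        step = torch.zeros_like(w)
        step[i] = eps
        numeric[i] = (f(w + step) - f(w - step)) / (2 * eps)

max_abs_diff = (analytic - numeric).abs().max().item()

# gradcheck runs a comparable numerical-versus-analytical comparison.
passed = torch.autograd.gradcheck(f, (w,), eps=1e-6, atol=1e-4)
print(max_abs_diff, passed)
```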

Instructor lesson plan (with references)