Week 3 Part I · Foundations
MLPs & Backpropagation
Multilayer perceptrons; the forward pass; backpropagation mechanics via PyTorch autograd.
Learning goals
- Build and train a multilayer perceptron.
- Understand the forward pass and backpropagation.
- Use autograd correctly and verify a gradient by hand.
This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded learning is in Predict and Explain. See the AI-use policy and a fully worked sample submission.
⚙ Exercise
Part A · Build (AI assistant welcome)
- Implement an MLP (nn.Module) and train it on a small classification task with autograd.
- Implement the backward pass for one linear layer by hand (manual tensor ops).
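A minimal sketch of the first Build item follows. It assumes a synthetic two-class task (is a 2-D point inside the unit circle?), a hypothetical `MLP` class with two ReLU hidden layers, and Adam with `lr=1e-2`; all of these are illustrative choices, not the course's reference solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy task: label 2-D points by whether they lie inside the unit circle.
X = torch.randn(512, 2)
y = (X.pow(2).sum(dim=1) < 1.0).long()   # class 1 inside, class 0 outside

class MLP(nn.Module):
    def __init__(self, in_dim=2, hidden=32, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)   # raw logits; CrossEntropyLoss applies log-softmax itself

model = MLP()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(300):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    logits = model(X)          # forward pass
    loss = loss_fn(logits, y)
    loss.backward()            # backpropagation: fills p.grad for every parameter
    optimizer.step()           # update parameters using those gradients
    losses.append(loss.item())

with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, train accuracy {accuracy:.2%}")
```

Full-batch training keeps the sketch short; a real submission would typically also hold out a test split.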
Part B · Predict & probe (student reasoning)
- Predict the sign and rough magnitude of one weight's gradient after a single step on a toy example.
- Predict how training changes if the nonlinearity is removed.
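Once a prediction is written down, a probe like the one below can check it. It assumes a hypothetical one-weight model ŷ = w·x with squared-error loss; the values of `x`, `target` and `w` are arbitrary and should be replaced by your own toy example.

```python
import torch

# Hypothetical toy example: one weight, one input, squared-error loss.
x = torch.tensor(2.0)
target = torch.tensor(1.0)
w = torch.tensor(0.25, requires_grad=True)

my_prediction = None   # record your predicted dL/dw here before running

loss = (w * x - target) ** 2   # forward pass
loss.backward()                # autograd fills w.grad with dL/dw

# One plain gradient-descent step: w <- w - lr * dL/dw.
lr = 0.1
with torch.no_grad():
    w_after = w - lr * w.grad

print("predicted dL/dw:", my_prediction)
print("autograd dL/dw: ", w.grad.item())
print(f"w moved from {w.item():.3f} to {w_after.item():.3f}")
```

A negative gradient means the descent step increases the weight; a positive one means it decreases it.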
Part C · Explain & defend (in plain language)
- Verify that the hand-computed gradient matches autograd within tolerance, and explain any difference.
- Explain in words what each gradient tells its weight to do.
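One possible way to run the check for a single linear layer is sketched below. It assumes `nn.Linear`'s convention y = x Wᵀ + b (W has shape [out, in]) and an illustrative loss L = Σ y², chosen only because ∂L/∂y = 2y is easy to write by hand. Any nonzero differences that show up would most likely come from floating-point rounding.

```python
import torch

torch.manual_seed(0)

# A linear layer y = x @ W.T + b, following nn.Linear's weight layout.
N, D_in, D_out = 4, 3, 2
x = torch.randn(N, D_in, requires_grad=True)
W = torch.randn(D_out, D_in, requires_grad=True)
b = torch.randn(D_out, requires_grad=True)

# Forward pass and an illustrative scalar loss.
y = x @ W.T + b
loss = (y ** 2).sum()
loss.backward()   # autograd reference gradients land in x.grad, W.grad, b.grad

# Manual backward pass with plain tensor ops (no autograd involved).
with torch.no_grad():
    grad_y = 2 * y               # dL/dy for L = sum(y^2)
    grad_W = grad_y.T @ x        # [D_out, N] @ [N, D_in] -> [D_out, D_in]
    grad_b = grad_y.sum(dim=0)   # b is broadcast over the batch, so sum over it
    grad_x = grad_y @ W          # [N, D_out] @ [D_out, D_in] -> [N, D_in]

for name, manual, auto in [("W", grad_W, W.grad), ("b", grad_b, b.grad), ("x", grad_x, x.grad)]:
    ok = torch.allclose(manual, auto, rtol=1e-4, atol=1e-5)
    max_diff = (manual - auto).abs().max().item()
    print(f"{name}: match={ok}, max abs diff={max_diff:.2e}")
```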
✓ Deliverables
- An MLP notebook with training results.
- The hand-derived gradient plus the autograd check.
Hints.
- Zero the gradients each step (optimizer.zero_grad()); verify gradients with torch.autograd.gradcheck, using float64 inputs so the finite-difference error stays small.
- Without a nonlinearity an MLP collapses to a linear model.
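`torch.autograd.gradcheck` compares an autograd backward against finite differences, so one way to apply it to a hand-derived gradient is to wrap the manual backward in a `torch.autograd.Function`. The class name `ManualLinear` is hypothetical. The inputs are float64 because float32 finite differences are usually too imprecise for gradcheck's default tolerances.

```python
import torch
from torch.autograd import gradcheck

class ManualLinear(torch.autograd.Function):
    """y = x @ W.T + b with a hand-written backward pass (illustrative)."""

    @staticmethod
    def forward(ctx, x, W, b):
        ctx.save_for_backward(x, W)
        return x @ W.T + b

    @staticmethod
    def backward(ctx, grad_y):
        x, W = ctx.saved_tensors
        grad_x = grad_y @ W          # dL/dx
        grad_W = grad_y.T @ x        # dL/dW
        grad_b = grad_y.sum(dim=0)   # dL/db: sum over the batch dimension
        return grad_x, grad_W, grad_b   # one gradient per forward input

torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
W = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, dtype=torch.float64, requires_grad=True)

# Returns True on success; raises an error describing the mismatch otherwise.
passed = gradcheck(ManualLinear.apply, (x, W, b))
print("gradcheck passed:", passed)
```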
❓ Self-check
Answer each question before reading its answer. If one is unclear, revisit the lab and the references.
Why does an MLP need nonlinear activations?
Without them, stacked linear layers collapse into a single linear map.
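The collapse can be checked numerically. The sketch below (layer sizes chosen arbitrarily) builds the single map W = W₂W₁, b = W₂b₁ + b₂ that two stacked `nn.Linear` layers reduce to, then shows that a ReLU between them breaks the equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between.
layer1 = nn.Linear(3, 5)
layer2 = nn.Linear(5, 2)

with torch.no_grad():
    # (x W1^T + b1) W2^T + b2 = x (W2 W1)^T + (W2 b1 + b2)
    W = layer2.weight @ layer1.weight
    b = layer2.weight @ layer1.bias + layer2.bias

    x = torch.randn(10, 3)
    stacked = layer2(layer1(x))
    collapsed = x @ W.T + b
    # With a ReLU in between, the single-layer formula no longer holds in general.
    with_relu = layer2(torch.relu(layer1(x)))

print("stacked == single linear map:", torch.allclose(stacked, collapsed, atol=1e-5))
print("with ReLU == single linear map:", torch.allclose(with_relu, collapsed, atol=1e-5))
```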
What does backpropagation compute?
Gradients of the loss with respect to each parameter, via the chain rule.
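For a single linear layer the chain rule gives the gradients that the lab asks you to derive by hand. With the notation chosen here (following nn.Linear's weight layout), y = xWᵀ + b for a batch x of N rows, so y_{ni} = Σ_j x_{nj} W_{ij} + b_i, and G = ∂L/∂y is the upstream gradient:

```latex
\frac{\partial L}{\partial W_{ij}}
  = \sum_{n=1}^{N} \frac{\partial L}{\partial y_{ni}}\,\frac{\partial y_{ni}}{\partial W_{ij}}
  = \sum_{n=1}^{N} G_{ni}\,x_{nj}
  \;\Longrightarrow\;
  \frac{\partial L}{\partial W} = G^{\top} x,
\qquad
\frac{\partial L}{\partial b} = \sum_{n=1}^{N} G_{n,:},
\qquad
\frac{\partial L}{\partial x} = G\,W
```

Backpropagation applies this layer by layer: ∂L/∂x becomes the upstream gradient G of the layer before.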
Why call optimizer.zero_grad() each step?
PyTorch adds each new gradient to the existing .grad instead of overwriting it, so the previous step's gradients must be cleared before the next backward pass.
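A small demonstration of the accumulation, using a hypothetical scalar parameter `w` and a plain SGD optimizer:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# Two backward passes without clearing: the gradients add up in w.grad.
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()   # 3 + 3

# zero_grad() clears w.grad (recent PyTorch versions set it to None by default).
opt.zero_grad()
(3 * w).backward()
fresh = w.grad.item()

print("without clearing:", accumulated, "after zero_grad():", fresh)
```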
What does .backward() do?
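Computes the gradient of the tensor it is called on (usually the scalar loss) through the computation graph and accumulates it into .grad of every leaf tensor with requires_grad=True that the result depends on; intermediate (non-leaf) tensors keep a .grad only if .retain_grad() was called on them.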
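The leaf/non-leaf distinction in a few lines (tensor names are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)   # leaf tensor that requires grad
h = x * 3                                          # non-leaf (intermediate) tensor
h.retain_grad()                                    # opt in to keeping h.grad
z = torch.tensor([5.0, 5.0])                       # leaf that does not require grad
loss = (h * z).sum()

loss.backward()

print("x.grad:", x.grad)   # dL/dx = 3 * z
print("h.grad:", h.grad)   # dL/dh = z, kept only because of retain_grad()
print("z.grad:", z.grad)   # None: z does not require gradients
```

Without the `retain_grad()` call, reading `h.grad` would return None and PyTorch would emit a warning about accessing .grad on a non-leaf tensor.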
How can a computed gradient be checked for correctness?
Compare it to a finite-difference (numerical) estimate, e.g. with torch.autograd.gradcheck.
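The idea behind such a check can be sketched by hand with central differences, (L(w + εeᵢ) − L(w − εeᵢ)) / 2ε for each coordinate i. The loss below is a hypothetical tanh layer chosen only for illustration; float64 keeps truncation and rounding error small.

```python
import torch

torch.manual_seed(0)

def loss_fn(w, x):
    # Hypothetical scalar loss: a tiny tanh layer followed by a sum of squares.
    return torch.tanh(x @ w).pow(2).sum()

x = torch.randn(5, 3, dtype=torch.float64)
w = torch.randn(3, dtype=torch.float64, requires_grad=True)

loss_fn(w, x).backward()
autograd_grad = w.grad.clone()

# Central differences, one coordinate of w at a time.
eps = 1e-6
numeric_grad = torch.zeros_like(w)
with torch.no_grad():
    for i in range(w.numel()):
        step = torch.zeros_like(w)
        step[i] = eps
        numeric_grad[i] = (loss_fn(w + step, x) - loss_fn(w - step, x)) / (2 * eps)

max_diff = (autograd_grad - numeric_grad).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```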
Instructor lesson plan (with references)