Introduction to Deep Learning · HIT

Week 3 · Part I · Foundations

MLPs & Backpropagation

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval: Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation: From a single linear layer to a universal function approximator, and how learning actually happens.
0:25–1:10 · 45 min · The multilayer perceptron
  • An MLP stacks linear layers with nonlinear activations (ReLU, sigmoid, tanh).
  • The nonlinearity is essential: without it, stacked linear layers collapse to a single linear (affine) map, however many you stack.
  • nn.Module holds the parameters and defines the forward pass; calling the module runs forward.
  • Width and depth set capacity; more is not always better.
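A minimal sketch for this block, assuming PyTorch is installed. The class name TinyMLP and the layer sizes (4, 8, 2) are hypothetical choices for illustration, not part of the course materials. It shows an nn.Module that holds its parameters and defines forward, then checks numerically that the two linear layers with the ReLU removed reduce to one affine map.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TinyMLP(nn.Module):
    """Hypothetical two-layer MLP; the sizes are illustrative only."""

    def __init__(self, in_dim=4, hidden=8, out_dim=2):
        super().__init__()
        # nn.Linear layers register their weights and biases as parameters.
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        # linear -> ReLU -> linear
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyMLP()
x = torch.randn(5, 4)   # batch of 5 inputs with 4 features each
out = model(x)          # calling the module runs forward()
n_params = sum(p.numel() for p in model.parameters())

# Drop the ReLU and the two layers collapse to one affine map x @ W.T + b.
with torch.no_grad():
    W = model.fc2.weight @ model.fc1.weight
    b = model.fc2.weight @ model.fc1.bias + model.fc2.bias
    stacked = model.fc2(model.fc1(x))
    collapsed = x @ W.T + b
    linear_collapses = torch.allclose(stacked, collapsed, atol=1e-5)

print(out.shape, n_params, linear_collapses)
```

Changing hidden (width) or adding further layers (depth) changes n_params, which is one concrete way to open the capacity discussion.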
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Backpropagation and autograd
  • The forward pass builds a computational graph of the operations applied.
  • Backpropagation applies the chain rule backward through that graph to get each parameter's gradient.
  • Autograd records the graph automatically; .backward() fills .grad; optimizer.step() updates the weights.
  • Gradients accumulate, so call zero_grad() each iteration; verify a gradient against a finite-difference estimate.
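The four calls above can be sketched as one training loop. This is an illustrative example under stated assumptions: the toy data (y = 2x + 1), the network size, the learning rate, and the step count are all made up for the sketch and are not taken from the course notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression data (an assumption for illustration): y = 2x + 1.
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * x + 1

model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass; autograd records the graph
    loss.backward()              # backprop fills .grad on every parameter
    optimizer.step()             # update the weights using .grad
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Commenting out the zero_grad() line and rerunning is a quick way to let the class see what accumulated gradients do to training.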
2:05–2:35 · 30 min · Live demo (predict, then run): Ask the class to predict .grad after the second backward (with no zero_grad) before revealing it. Build an MLP, inspect .grad before and after zero_grad, and check a hand-derived gradient against autograd.
2:35–2:50 · 15 min · Wrap-up & practice preview: Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
Common misconception to confront.

Students often think: Backpropagation is a separate, mysterious algorithm.
Set it straight: Backprop is exactly the chain rule applied backward over the computational graph; autograd just records the graph and applies it automatically.
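One way to make this concrete is to walk a tiny graph by hand and compare with autograd. The sketch below assumes PyTorch; the scalar model (a sigmoid unit with a squared-error loss) and all the numbers are hypothetical values picked for illustration.

```python
import torch

x = torch.tensor(1.5)   # input
t = torch.tensor(0.0)   # target
w = torch.tensor(0.8, requires_grad=True)
b = torch.tensor(-0.3, requires_grad=True)

# Forward pass: each line is one node in the computational graph.
z = w * x + b
y = torch.sigmoid(z)
loss = (y - t) ** 2

loss.backward()  # autograd walks the recorded graph backward

# The same walk by hand: multiply local derivatives from the loss back to w and b.
with torch.no_grad():
    dloss_dy = 2 * (y - t)   # d/dy of (y - t)^2
    dy_dz = y * (1 - y)      # derivative of the sigmoid
    dz_dw = x                # d/dw of w*x + b
    dz_db = torch.tensor(1.0)
    manual_dw = dloss_dy * dy_dz * dz_dw
    manual_db = dloss_dy * dy_dz * dz_db

matches = bool(torch.allclose(w.grad, manual_dw) and torch.allclose(b.grad, manual_db))
print(matches)
```

The hand computation and w.grad, b.grad agree because both are the chain rule applied to the same three nodes.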

Check for understanding (pose during the concept blocks; let students answer before revealing).
After two backward() calls on the same loss (recomputed each time, or with retain_graph=True on the first call) and no zero_grad(), what is in .grad?
Twice the single-step gradient: gradients accumulate, which is why you zero them each step.
How would you sanity-check that autograd is correct?
Compare against a finite-difference estimate, or use torch.autograd.gradcheck.
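Both checks can be run in a few lines. The following is a sketch, assuming PyTorch; the function f and the values in w are hypothetical and chosen only so the gradient is easy to reason about. The loss is recomputed before the second backward(), because calling backward() twice on the very same loss tensor generally requires retain_graph=True. Double precision is used since gradcheck expects it.

```python
import torch


def f(w):
    # Hypothetical scalar loss, used only for illustration.
    return (w ** 3).sum() + torch.sin(w).sum()


w = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64, requires_grad=True)

# 1) Accumulation: two backward passes with no zeroing in between.
f(w).backward()
single = w.grad.clone()
f(w).backward()            # fresh forward pass, gradients are added to .grad
doubled = w.grad.clone()
accumulates = bool(torch.allclose(doubled, 2 * single))

w.grad.zero_()             # manual reset of this one tensor's gradient

# 2) Central finite differences, one coordinate at a time.
eps = 1e-6
fd = torch.zeros_like(w)
with torch.no_grad():
    for i in range(w.numel()):
        step = torch.zeros_like(w)
        step[i] = eps
        fd[i] = (f(w + step) - f(w - step)) / (2 * eps)
fd_matches = bool(torch.allclose(fd, single, atol=1e-6))

# 3) The built-in numerical check.
gradcheck_ok = torch.autograd.gradcheck(f, (w,))

print(accumulates, fd_matches, gradcheck_ok)
```

Asking students to predict doubled before printing it mirrors the predict-then-run demo in the lecture.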
Key takeaways.

💻Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10 · 10 min · Setup & recap: Recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Build an MLP with nn.Module live and train it on a small task.
  • Inspect .grad after a backward pass and show the effect of zero_grad.
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Compare a hand-computed gradient with autograd on a tiny example.
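A possible tiny example for this demonstration, assuming PyTorch: linear least squares, where the gradient has a closed form that fits on the board. For the loss mean((Xw - y)^2) over N rows, the hand-derived gradient is (2/N) Xᵀ(Xw - y). The shapes and random data are illustrative assumptions, not the notebook's contents.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: 6 samples, 3 features, double precision for a tight comparison.
X = torch.randn(6, 3, dtype=torch.float64)
y = torch.randn(6, dtype=torch.float64)
w = torch.randn(3, dtype=torch.float64, requires_grad=True)

# Autograd gradient of the mean squared error.
loss = ((X @ w - y) ** 2).mean()
loss.backward()

# Hand-derived gradient: (2 / N) * X^T (X w - y).
with torch.no_grad():
    hand_grad = (2.0 / X.shape[0]) * (X.T @ (X @ w - y))

max_abs_diff = (w.grad - hand_grad).abs().max().item()
print(max_abs_diff)
```

The difference should sit at floating-point rounding level, which gives students a sense of what "matches" means numerically.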
1:45–2:00 · 15 min · Wrap-up & lab brief: Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)
