Introduction to Deep Learning · HIT

Week 7 · Part II · Training Infrastructure

Regularization & Generalization

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓 Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval: Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation: Fitting the training set is easy; generalizing is the actual job.
0:25–1:10 · 45 min · Overfitting and capacity
  • Training error tends to keep improving with more capacity and more epochs; the test error is what matters.
  • Overfitting shows as a widening gap between low training loss and higher validation loss.
  • Capacity (width, depth) and dataset size set how easily a model overfits.
  • The bias-variance trade-off: too simple underfits, too complex overfits.
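
The capacity bullets above can be made concrete with a toy example. The sketch below is only an illustration, not the course's own demo: the data generator, the 1-nearest-neighbour "memorizer", and the constant predictor are all assumptions chosen so that it runs with the Python standard library alone.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Sample n noisy points from y = sin(x) on [0, 2*pi] (toy data, assumed)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(30, rng)
val_x, val_y = make_data(200, rng)

def memorizer(x):
    """High capacity: return the label of the nearest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

mean_y = sum(train_y) / len(train_y)

def constant(x):
    """Low capacity: always predict the training mean."""
    return mean_y

for name, model in [("memorizer", memorizer), ("constant", constant)]:
    tr = mse(model, train_x, train_y)
    va = mse(model, val_x, val_y)
    print(f"{name:10s} train={tr:.3f} val={va:.3f} gap={va - tr:.3f}")
```

The memorizer reproduces every training label, noise included, so its training error is zero while its validation error is not. The constant predictor cannot follow the sine curve at all, so it does poorly on both splits.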
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Regularizers
  • Weight decay (L2) penalizes large weights, reducing variance.
  • Dropout randomly zeros activations during training (off at eval), forcing redundancy.
  • Early stopping halts when the validation metric stops improving.
  • Data augmentation enlarges the effective training set; apply it to the training split only.
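
Two of the regularizers above fit in a few lines each. The following is a minimal sketch under stated assumptions: `sgd_step` and `EarlyStopping` are hypothetical helper names, the penalty is written as (weight_decay / 2) · ||w||², and the validation losses are made-up numbers for illustration.

```python
def sgd_step(weights, grads, lr, weight_decay=0.0):
    """One gradient step on loss + (weight_decay / 2) * ||w||^2.

    The penalty adds weight_decay * w to each gradient component, so every
    step also pulls the weights toward zero.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# With a zero data gradient, weight decay alone shrinks the weights.
print(sgd_step([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5))

# Illustrative (made-up) validation losses: improve, then drift upward.
val_history = [1.00, 0.80, 0.70, 0.72, 0.75, 0.90]
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate(val_history):
    if stopper.step(epoch, val_loss):
        print(f"stop at epoch {epoch}; best was epoch {stopper.best_epoch}")
        break
```

In a real training loop the weights from the best epoch would typically be saved and restored after stopping; that bookkeeping is left out here to keep the sketch short.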
2:05–2:35 · 30 min · Live demo (predict, then run): Ask the class to predict what happens to the validation curve after dropout is added, before running the regularized model. Force a model to overfit, then close the train/validation gap with dropout, weight decay, and augmentation.
2:35–2:50 · 15 min · Wrap-up & practice preview: Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
Common misconception to confront.

Students often think: Low training loss means the model is good.
Set it straight: Low training loss with high validation loss is overfitting; the train-minus-validation gap, not training loss, is the signal that matters.

Check for understanding (pose during the concept blocks; let students answer before revealing).
Q: Validation loss rises while training loss keeps falling. What is happening, and what is one fix?
A: Overfitting; fix it with early stopping, weight decay, dropout, or more (and augmented) data.
Q: Is dropout active at evaluation time?
A: No. Dropout is active during training and disabled at evaluation (model.eval()), when the full network is used.
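
The train/eval distinction in the last answer can be shown without a framework. This is a sketch of "inverted" dropout under assumed names (`dropout`, `training`); the `training` flag here plays the role that switching a model between train and eval mode plays in a framework.

```python
import random

def dropout(activations, p, training, rng):
    """Inverted dropout on a list of activations.

    Training: zero each activation with probability p and scale the
    survivors by 1 / (1 - p) so the expected value is unchanged.
    Evaluation: return the activations untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
h = [1.0] * 10_000
train_out = dropout(h, p=0.5, training=True, rng=rng)
eval_out = dropout(h, p=0.5, training=False, rng=rng)

zeroed = sum(1 for a in train_out if a == 0.0) / len(h)
mean_train = sum(train_out) / len(h)
print(f"training: {zeroed:.1%} zeroed, mean activation {mean_train:.3f}")
print(f"eval: unchanged = {eval_out == h}")
```

Because the surviving activations are rescaled during training, nothing needs to be rescaled at evaluation time, which is why the evaluation path is a plain pass-through.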
Key takeaways.

💻 Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10 · 10 min · Setup & recap: Recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Force a model to overfit and show the train-minus-validation gap.
  • Add dropout and weight decay live and watch the gap close.
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Demonstrate data augmentation on a few images.
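
For the augmentation demonstration, a framework-free sketch may help show the mechanics before moving to real images. Everything here is an assumption made for illustration: images are small nested lists, a random horizontal flip stands in for a fuller augmentation pipeline, and `get_batch` is a hypothetical helper that applies augmentation to the training split only.

```python
import random

def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def augment(image, rng, flip_prob=0.5):
    """Randomly flip the image; otherwise return it unchanged."""
    return hflip(image) if rng.random() < flip_prob else image

def get_batch(images, training, rng):
    """Augment only when drawing from the training split."""
    if training:
        return [augment(img, rng) for img in images]
    return list(images)

image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)
train_batch = get_batch([image] * 4, training=True, rng=rng)
val_batch = get_batch([image] * 4, training=False, rng=rng)

for i, img in enumerate(train_batch):
    print(f"train sample {i}: {'flipped' if img != image else 'original'}")
print(f"validation untouched: {val_batch == [image] * 4}")
```

Keeping the validation images untouched means the validation metric measures performance on the real data distribution rather than on the augmented one.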
1:45–2:00 · 15 min · Wrap-up & lab brief: Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)
