Week 7 Part II · Training Infrastructure

Regularization & Generalization

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

Diagnose overfitting from the train/validation gap.
Apply dropout, weight decay, early stopping, and augmentation.
Attribute a generalization gain to a specific cause.

🎓 Lecture · 3 hours

0:00–0:10	10 min	Recap & retrieval	Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25	15 min	Motivation	Fitting the training set is easy; generalizing is the actual job.
0:25–1:10	45 min	Overfitting and capacity	Training error keeps falling as training continues; error on unseen data is what matters. Overfitting shows up as a widening gap between low training loss and higher validation loss. Capacity (width, depth) relative to dataset size determines how easily a model overfits. The bias-variance trade-off: a model that is too simple underfits (high bias); one that is too complex overfits (high variance).
1:10–1:20	10 min	Break
1:20–2:05	45 min	Regularizers	Weight decay (L2) penalizes large weights, reducing variance. Dropout randomly zeros activations during training (it is off at eval), forcing the network to learn redundant features. Early stopping halts training when the validation metric stops improving, typically restoring the best checkpoint. Data augmentation enlarges the effective training set; apply it to the training split only.
2:05–2:35	30 min	Live demo (predict, then run)	Force a model to overfit. Before running the regularized model, ask the class to predict what will happen to the validation curve once dropout is added; then close the train/validation gap with dropout, weight decay, and augmentation.
2:35–2:50	15 min	Wrap-up & practice preview	Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00	10 min	Buffer & questions
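The weight-decay point in the Regularizers block can be made concrete with ridge regression, where the L2 penalty has a closed-form solution. This is a minimal NumPy sketch on synthetic data (the data, the helper name ridge_weights, and the λ values are assumptions for illustration); it shows the weight norm shrinking as the penalty strength λ grows.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 in closed form: w = (X^T X + lam I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))                             # synthetic design matrix
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=30)   # noisy linear targets

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_weights(X, y, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:6.1f}   ||w|| = {norm:.3f}")
```

In a network the same penalty usually goes through the optimizer (for example the weight_decay argument of torch.optim.SGD). Writing the penalty as (λ/2)‖w‖², its gradient is λw, so each plain SGD step becomes w ← (1 − ηλ)w − η∇L(w): the weights shrink by a constant factor every step, hence the name.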

Common misconception to confront.

Students often think: Low training loss means the model is good.
Set it straight: Low training loss with high validation loss is overfitting; the train/validation gap, not training loss alone, is the signal that matters.
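To show why low training loss alone misleads, here is a minimal NumPy sketch of a toy setup (assumed for illustration, not taken from the lesson materials). It fits polynomials of increasing degree, i.e. increasing capacity, to 20 noisy samples of a sine curve and reports training and validation MSE. Because each higher degree contains the lower ones, training MSE cannot increase with degree, while validation error typically turns upward once the model starts fitting noise.

```python
import numpy as np

def true_fn(x):
    """Assumed ground-truth signal for the toy problem."""
    return np.sin(2 * np.pi * x)

def capacity_sweep(degrees, n_train=20, n_val=200, noise=0.2, seed=0):
    """Fit one polynomial per degree; return (degree, train_mse, val_mse) tuples."""
    rng = np.random.default_rng(seed)
    x_tr = rng.uniform(0, 1, n_train)
    y_tr = true_fn(x_tr) + noise * rng.normal(size=n_train)
    x_va = rng.uniform(0, 1, n_val)
    y_va = true_fn(x_va) + noise * rng.normal(size=n_val)
    rows = []
    for d in degrees:
        # Polynomial.fit maps x onto [-1, 1] internally, which improves conditioning
        p = np.polynomial.Polynomial.fit(x_tr, y_tr, deg=d)
        train_mse = float(np.mean((p(x_tr) - y_tr) ** 2))
        val_mse = float(np.mean((p(x_va) - y_va) ** 2))
        rows.append((d, train_mse, val_mse))
    return rows

rows = capacity_sweep([1, 3, 9, 15])
for d, tr, va in rows:
    print(f"degree {d:2d}: train MSE {tr:.3f}   val MSE {va:.3f}   gap {va - tr:.3f}")
```

The degree-1 row illustrates underfitting (both errors high). The high-degree rows show the pattern students should learn to spot: the lowest training error paired with a large gap.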

Check for understanding (pose during the concept blocks; let students answer before revealing).

Validation loss rises while training loss keeps falling. What is happening, and one fix?

Overfitting. Fix it with early stopping, weight decay, dropout, or more (or augmented) data.
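Early stopping, one of the fixes above, is usually implemented with a patience counter. This is a minimal sketch, assuming a hypothetical EarlyStopping helper written here (not a library class) and a made-up validation curve.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch  # in a real loop, save a model checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

val_curve = [1.05, 0.70, 0.55, 0.58, 0.66, 0.71, 0.80]  # hypothetical validation losses
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_curve):
    if stopper.step(epoch, val_loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}; best epoch {stopper.best_epoch}, val loss {stopper.best}")
```

Here training stops at epoch 5, three epochs after the best validation loss at epoch 2. Restoring the epoch-2 checkpoint gives the model used for evaluation.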

Is dropout active at evaluation time?

No. Dropout is active only in training mode; calling model.eval() switches it off, so the full network is used at evaluation.
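A quick PyTorch check of the answer above (assumes PyTorch is installed). nn.Dropout zeros entries only in training mode. PyTorch implements inverted dropout: during training the surviving entries are scaled by 1/(1 − p), so at eval the module is simply the identity and no rescaling is needed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()          # training mode: dropout active
y_train = drop(x)     # each entry is either 0 or scaled to 1 / (1 - 0.5) = 2.0

drop.eval()           # eval mode: dropout is the identity
y_eval = drop(x)

print(y_train)
print(y_eval)
```

Calling model.eval() on a parent module switches every nested nn.Dropout to eval mode as well, which is why a single call before validation is enough.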

Key takeaways.

Watch the train/validation gap, not validation loss alone.
Augmentation applies to the training split only.
Place dropout after hidden-layer activations, not on the output layer.
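The dropout-placement takeaway can be sketched as a PyTorch MLP. The helper make_mlp, the layer sizes, and p = 0.5 are illustrative assumptions. The usual reasoning for keeping dropout off the output layer is that dropping output units would randomly zero the logits the loss is computed on, adding noise to the prediction itself rather than regularizing the hidden representation.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, hidden, out_dim, p=0.5):
    """Hypothetical helper: dropout follows each hidden ReLU; the output layer has none."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Dropout(p),               # after the activation of hidden layer 1
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Dropout(p),               # after the activation of hidden layer 2
        nn.Linear(hidden, out_dim),  # output logits: no dropout here
    )

model = make_mlp(in_dim=20, hidden=64, out_dim=3)
model.eval()                         # also switches both nn.Dropout layers to eval mode
x = torch.randn(4, 20)
with torch.no_grad():
    out1, out2 = model(x), model(x)
print(out1.shape, torch.equal(out1, out2))  # eval mode: repeated passes give identical outputs
```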

💻 Practice · 2 hours

In the practice lesson, the instructor demonstrates implementations, runs code, and works through examples using the practice notebook linked below. The weekly lab is then set as homework, in which students apply these techniques themselves.

0:00–0:10	10 min	Setup & recap	Recap the lecture's key ideas and open the working notebook.
0:10–1:00	50 min	Instructor demonstrations	Force a model to overfit and show the train/validation gap. Add dropout and weight decay live and watch the gap close.
1:00–1:05	5 min	Break
1:05–1:45	40 min	Instructor demonstrations (continued)	Demonstrate data augmentation on a few images.
1:45–2:00	15 min	Wrap-up & lab brief	Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
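The demonstration blocks (force overfitting, then add dropout and weight decay) can be condensed into a short PyTorch sketch. The synthetic task, the 20% label noise, the network size, and all hyperparameters are assumptions, chosen so that a wide MLP can memorize 100 noisy training points. Both losses are measured in eval mode so that dropout does not distort the comparison.

```python
import torch
import torch.nn as nn

def make_data(n, seed, noise=0.2):
    """Synthetic binary task: label = [x0 + 0.5*x1 > 0], with a fraction of labels flipped."""
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(n, 20, generator=g)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
    flip = torch.rand(n, generator=g) < noise   # flipped labels: memorizing them cannot generalize
    y[flip] = 1 - y[flip]
    return X, y

def run(dropout, weight_decay, epochs=400, lr=3e-3):
    """Train a wide MLP full-batch; return (train_loss, val_loss) measured in eval mode."""
    torch.manual_seed(0)
    X_tr, y_tr = make_data(100, seed=1)         # deliberately tiny training set
    X_va, y_va = make_data(1000, seed=2)
    model = nn.Sequential(
        nn.Linear(20, 256), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(256, 2),
    )
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    model.train()                               # dropout active while fitting
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    model.eval()                                # dropout off for measurement
    with torch.no_grad():
        return loss_fn(model(X_tr), y_tr).item(), loss_fn(model(X_va), y_va).item()

results = {}
for name, p, wd in [("baseline", 0.0, 0.0), ("dropout + weight decay", 0.5, 0.1)]:
    results[name] = run(p, wd)
    tr, va = results[name]
    print(f"{name:>24}: train {tr:.3f}  val {va:.3f}  gap {va - tr:.3f}")
```

Weight decay here goes through AdamW's decoupled weight_decay argument; with plain torch.optim.SGD, weight_decay instead adds λw to the gradient (the L2 form). How much the gap closes depends on the seed and hyperparameters, so this is a good place to have the class predict the outcome before running.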

Common pitfalls to pre-empt.

Watch the train/validation gap, not validation loss alone.
Augmentation applies to the training split only.
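This is a minimal NumPy sketch of the "training split only" rule. It applies a random horizontal flip plus a random crop from a zero-padded copy, but only when building training batches. The function names, the flip-and-crop choice, and the random stand-in images are assumptions for illustration. In torchvision, transforms.RandomHorizontalFlip and transforms.RandomCrop(size, padding=4) play the same roles and would likewise appear only in the training pipeline.

```python
import numpy as np

def augment(img, rng, pad=4):
    """Random horizontal flip plus a random crop from a zero-padded copy; img is (H, W, C)."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                                  # flip left-right
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))     # zero padding (mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]               # crop back to the original size

def make_batch(images, rng, train):
    """Augment only training batches; validation images pass through unchanged."""
    if train:
        return np.stack([augment(im, rng) for im in images])
    return np.stack(images)

rng = np.random.default_rng(0)
images = [rng.random((32, 32, 3)) for _ in range(4)]          # stand-ins for real images
train_batch = make_batch(images, rng, train=True)
val_batch = make_batch(images, rng, train=False)
print(train_batch.shape, val_batch.shape)
```

Augmenting validation data would make the validation metric noisy and no longer comparable across runs, which defeats its purpose as the signal for early stopping and model selection.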

Open the practice notebook in Colab · Curated references · Lab (homework)