Week 5 Part II · Training Infrastructure

Loss Functions & Metrics

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

Choose a task-appropriate loss.
Track metrics that reveal real performance.
Write a clean train and evaluation loop.

🎓 Lecture · 3 hours

0:00–0:10	10 min	Recap & retrieval: Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25	15 min	Motivation: The loss defines what the model optimizes; the metric defines what matters in practice. The two are not the same.
0:25–1:10	45 min	Loss functions: Cross-entropy for classification takes raw logits and integer class indices. Use BCE for binary or multi-label tasks and MSE (or MAE) for regression. Pass logits, not softmax probabilities, to CrossEntropyLoss: it applies log-softmax internally, which is also the numerically stable way to compute it. The loss must be differentiable; the reported metric need not be.
1:10–1:20	10 min	Break
1:20–2:05	45 min	Metrics and evaluation: Accuracy is intuitive but misleading under class imbalance. Precision, recall, and F1 expose minority-class behavior; the confusion matrix shows the full picture. For evaluation, model.eval() puts layers such as dropout and batch normalization into inference mode, and torch.no_grad() disables gradient tracking. Track training and validation metrics together to catch overfitting early.
2:05–2:35	30 min	Live demo (predict, then run): Ask the class to predict the accuracy of an always-negative classifier on the imbalanced set before computing it. Then demonstrate a training loop with metric logging, MSE failing on a classification task, and accuracy versus F1 on an imbalanced set.
2:35–2:50	15 min	Wrap-up & practice preview: Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00	10 min	Buffer & questions
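
The train and evaluation pattern from the metrics block can be sketched as follows. This is a minimal illustration, not the course notebook: the toy data, the small network, the full-batch updates, and the helper names `train_one_epoch` and `evaluate` are all assumptions made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: 200 samples, 4 features, 3 classes.
X = torch.randn(200, 4)
y = torch.randint(0, 3, (200,))  # integer class indices (int64), not one-hot
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

# The last layer outputs raw logits; there is no softmax in the model.
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(), nn.Dropout(0.2), nn.Linear(16, 3)
)
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_one_epoch():
    """One full-batch update (a real loop would iterate over mini-batches)."""
    model.train()  # dropout is active during training
    optimizer.zero_grad()
    logits = model(X_train)
    loss = loss_fn(logits, y_train)
    loss.backward()
    optimizer.step()
    return loss.item()


def evaluate(inputs, targets):
    """Return (loss, accuracy) without updating the model."""
    model.eval()  # dropout off, batch norm (if any) uses running statistics
    with torch.no_grad():  # no gradient tracking during evaluation
        logits = model(inputs)
        loss = loss_fn(logits, targets).item()
        accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
    return loss, accuracy


history = []
for epoch in range(5):
    train_loss = train_one_epoch()
    val_loss, val_acc = evaluate(X_val, y_val)
    history.append((train_loss, val_loss, val_acc))
    print(f"epoch {epoch}: train_loss={train_loss:.3f} "
          f"val_loss={val_loss:.3f} val_acc={val_acc:.3f}")
```

Logging training and validation values side by side in `history` is what makes overfitting visible: training loss keeps falling while validation loss stalls or rises. The labels here are random, so no real learning is expected; the point is the structure of the loop.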

Common misconception to confront.

Students often think: The loss and the evaluation metric should be the same thing.
Set it straight: The loss must be differentiable to train on; the metric (accuracy, F1) need not be and reflects what you care about. You optimize one and report the other.

Check for understanding (pose during the concept blocks; let students answer before revealing).

Why pass raw logits, not softmax probabilities, to CrossEntropyLoss?

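CrossEntropyLoss applies log-softmax internally. Passing softmax outputs applies softmax twice: the inputs are squashed into [0, 1], which distorts the loss and its gradients. Computing log(softmax(x)) by hand is also less numerically stable than the fused log-softmax.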
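
A small numeric sketch of the double-softmax effect, written in plain Python rather than PyTorch so the arithmetic is visible. The function names and the example logits are illustrative assumptions; `cross_entropy_from_logits` mirrors the log-softmax plus negative log-likelihood computation that CrossEntropyLoss performs for a single sample.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """-log_softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


# A confident and correct prediction for class 0 (hypothetical values).
logits = [8.0, 0.0, -2.0]
target = 0

correct = cross_entropy_from_logits(logits, target)
# Mistake: softmax first, then a loss that applies log-softmax again.
double = cross_entropy_from_logits(softmax(logits), target)

# With 3 classes, inputs confined to [0, 1] cannot push the loss below this.
floor = math.log(math.e + 2) - 1

print(f"logits in:        {correct:.4f}")
print(f"probabilities in: {double:.4f}")
print(f"floor:            {floor:.4f}")
```

With raw logits the loss for this confident, correct prediction is close to zero. With probabilities as input it stays above roughly 0.55, because values confined to [0, 1] can never look confident to the internal log-softmax.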

A model is 95% accurate on a 95%-negative dataset. Is it good?

Not necessarily: always predicting negative also scores 95%. Check precision, recall, and F1 on the positive class.
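
The always-negative case can be checked by hand. The sketch below uses plain Python and a made-up dataset of 5 positives and 95 negatives; the helper names are hypothetical, and reporting 0.0 when a denominator is zero is an assumed convention, not the only one in use.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 on the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a metric is undefined (0 / 0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced set: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

report = summarize(y_true, always_negative)
print(report)
```

Accuracy comes out at 0.95 while recall and F1 on the positive class are 0: the classifier never finds a single positive example.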

Key takeaways.

Optimize the right loss and report the right metric.
Accuracy hides minority-class failure.
Use logits with CrossEntropyLoss.

💻 Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10	10 min	Setup & recap: Recap the lecture's key ideas and open the working notebook.
0:10–1:00	50 min	Instructor demonstrations: Run a training loop with loss and metric logging. Show MSE on a classification task failing, then cross-entropy working.
1:00–1:05	5 min	Break
1:05–1:45	40 min	Instructor demonstrations (continued): Compute accuracy versus F1 on an imbalanced example.
1:45–2:00	15 min	Wrap-up & lab brief: Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.

Common pitfalls to pre-empt.

CrossEntropyLoss expects raw logits and integer class indices. Do not apply softmax first, and do not substitute MSE on a classification task.
Accuracy hides minority-class failure; check per-class metrics or F1.

Open the practice notebook in Colab · Curated references · Lab (homework)