Introduction to Deep Learning · HIT

Week 5 · Part II · Training Infrastructure

Loss Functions & Metrics

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓 Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval: open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation: the loss defines what the model optimizes; the metric defines what matters in practice, and the two are not the same.
0:25–1:10 · 45 min · Loss functions
  • Cross-entropy for classification takes raw logits and integer class indices.
  • BCE for binary or multi-label tasks; MSE (or MAE) for regression.
  • Pass logits, not softmax probabilities, to CrossEntropyLoss for numerical stability.
  • The loss must be differentiable; the reported metric need not be.
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Metrics and evaluation
  • Accuracy is intuitive but misleading under class imbalance.
  • Precision, recall, and F1 expose minority-class behavior; the confusion matrix shows the full picture.
  • model.eval() switches layers such as dropout and batch norm to inference behavior; torch.no_grad() turns off gradient tracking. Use both when evaluating.
  • Track training and validation metrics together to catch overfitting early.
2:05–2:35 · 30 min · Live demo (predict, then run): a training loop with metric logging, MSE failing on a classification task, and accuracy versus F1 on an imbalanced set. Before computing it, ask the class to predict the accuracy of an always-negative classifier on the imbalanced set.
2:35–2:50 · 15 min · Wrap-up & practice preview: revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
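The loss-function points above can be sketched in PyTorch as follows: a minimal example on hand-picked toy tensors rather than a real model. It swaps in `nn.BCEWithLogitsLoss`, the logit-taking form of BCE, so that every loss receives raw outputs, and it uses `F.nll_loss` only to spell out what `CrossEntropyLoss` computes internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class classification: raw logits of shape (batch, num_classes) and
# integer class indices of shape (batch,) with dtype torch.long.
logits = torch.tensor([[4.0, 1.0, -2.0],
                       [0.5, 3.0, 0.0]])
targets = torch.tensor([0, 1])
ce = nn.CrossEntropyLoss()(logits, targets)

# CrossEntropyLoss is log-softmax followed by negative log-likelihood.
ce_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Common mistake: feeding softmax probabilities applies softmax a second time.
probs = F.softmax(logits, dim=1)
ce_on_probs = nn.CrossEntropyLoss()(probs, targets)

# Binary or multi-label: one logit per label and float 0/1 targets.
# BCEWithLogitsLoss applies the sigmoid internally, so it also takes logits.
ml_logits = torch.randn(2, 4)
ml_targets = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0, 0.0]])
bce = nn.BCEWithLogitsLoss()(ml_logits, ml_targets)

# Regression: real-valued predictions and targets of the same shape.
preds = torch.randn(2, 1)
y = torch.randn(2, 1)
mse = nn.MSELoss()(preds, y)
mae = nn.L1Loss()(preds, y)  # L1Loss with the default mean reduction is MAE

print(f"CE on logits {ce.item():.4f} (manual {ce_manual.item():.4f}), "
      f"CE on probs {ce_on_probs.item():.4f}")
print(f"BCE {bce.item():.4f}  MSE {mse.item():.4f}  MAE {mae.item():.4f}")
```

Both rows are predicted correctly, yet the loss on probabilities comes out several times larger than the loss on logits. Because probabilities lie in [0, 1], the second softmax can never become confident: even a perfect three-class prediction keeps that loss above about 0.55, and its gradients shrink accordingly.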
Common misconception to confront.

Students often think: The loss and the evaluation metric should be the same thing.
Set it straight: The loss must be differentiable to train on; the metric (accuracy, F1) need not be and reflects what you care about. You optimize one and report the other.
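A small sketch of why the two differ, assuming random logits in place of a real model's outputs: cross-entropy yields gradients for the logits, while accuracy passes through `argmax` and ends up outside the autograd graph.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for model outputs; requires_grad makes them leaves of the autograd graph.
logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))

# Cross-entropy is a smooth function of the logits, so backward() fills in gradients.
loss = F.cross_entropy(logits, targets)
loss.backward()
print(f"loss {loss.item():.4f}; gradient shape {tuple(logits.grad.shape)}")

# Accuracy goes through argmax and ==, a step function of the logits.
# argmax returns integer indices outside the autograd graph, so there is
# nothing to backpropagate: fine for reporting, useless for training.
accuracy = (logits.argmax(dim=1) == targets).float().mean()
print(f"accuracy {accuracy.item():.3f}; requires_grad={accuracy.requires_grad}")
```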

Check for understanding (pose during the concept blocks; let students answer before revealing).
Why pass raw logits, not softmax probabilities, to CrossEntropyLoss?
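It applies log-softmax internally, so softmax outputs get a second softmax: the loss no longer measures what you intend and its gradients shrink. Taking the log of separately computed softmax probabilities is also less numerically stable than the fused log-softmax.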
A model is 95% accurate on a 95%-negative dataset. Is it good?
Not necessarily: always predicting negative also scores 95%. Check precision, recall, and F1 on the positive class.
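A dependency-free sketch of this check on a hypothetical 95/5 dataset. The helper names `confusion_counts` and `binary_metrics` are illustrative, not from a library (in practice `sklearn.metrics` provides equivalents), and defining precision and recall as 0 when their denominator is 0 is an assumed convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are the true class (0, 1); columns are the predicted class (0, 1).
        "confusion": [[tn, fp], [fn, tp]],
    }


# Hypothetical imbalanced set: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100

print(binary_metrics(y_true, always_negative))
```

The always-negative baseline reaches 0.95 accuracy, while precision, recall, and F1 on the positive class are all 0; the confusion matrix shows that all five positives were missed.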
Key takeaways.

💻 Practice · 2 hours

In the practice lesson, the instructor demonstrates implementations, runs code, and works through examples using the practice notebook linked below. The weekly lab is then set as homework, in which students apply these ideas themselves.

0:00–0:10 · 10 min · Setup & recap: recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Run a training loop with loss and metric logging.
  • Show MSE on a classification task failing, then cross-entropy working.
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Compute accuracy versus F1 on an imbalanced example.
1:45–2:00 · 15 min · Wrap-up & lab brief: summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
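A compact sketch of the demonstrated training loop with metric logging, assuming a hypothetical toy dataset (2-D points labeled by the sign of `x0 + x1`) and an arbitrary small MLP; the architecture, learning rate, batch size, and epoch count are illustrative choices, not those of the practice notebook. Evaluation runs under `model.eval()` and `torch.no_grad()`, and training and validation metrics are logged together each epoch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


# Hypothetical toy task: label a 2-D point 1 if x0 + x1 > 0, else 0.
def make_split(n):
    x = torch.randn(n, 2)
    y = (x.sum(dim=1) > 0).long()
    return x, y


x_train, y_train = make_split(512)
x_val, y_val = make_split(256)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Dropout(0.1), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def evaluate(x, y):
    """Return (mean loss, accuracy) with dropout off and no gradient tracking."""
    model.eval()
    with torch.no_grad():
        logits = model(x)
        loss = criterion(logits, y).item()
        acc = (logits.argmax(dim=1) == y).float().mean().item()
    return loss, acc


init_loss, _ = evaluate(x_train, y_train)
history = []
for epoch in range(20):
    model.train()  # re-enable dropout after the previous evaluate() call
    perm = torch.randperm(len(x_train))  # reshuffle each epoch
    for i in range(0, len(x_train), 64):
        idx = perm[i:i + 64]
        optimizer.zero_grad()
        loss = criterion(model(x_train[idx]), y_train[idx])  # raw logits in
        loss.backward()
        optimizer.step()
    train_loss, train_acc = evaluate(x_train, y_train)
    val_loss, val_acc = evaluate(x_val, y_val)
    history.append((train_loss, train_acc, val_loss, val_acc))
    if epoch % 5 == 0 or epoch == 19:
        print(f"epoch {epoch:2d} | train loss {train_loss:.3f} acc {train_acc:.3f} "
              f"| val loss {val_loss:.3f} acc {val_acc:.3f}")
```

Reading the two columns side by side is the point of the logging: if training loss keeps falling while validation loss starts to rise, the model is beginning to overfit.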
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)
