Introduction to Deep Learning · HIT

Week 6 · Part II · Training Infrastructure

Optimization

Gradient descent; SGD, momentum, and Adam; learning rates and optimization dynamics.

Learning goals

This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded work is in Predict & probe and Explain & defend. See the AI-use policy and a fully worked sample submission.

Exercise

Part A · Build (AI assistant welcome)

  1. Build an optimizer-comparison harness that trains the same model with SGD, SGD with momentum, and Adam.
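A minimal stdlib-only sketch of such a harness follows. It assumes a toy linear-regression task, and the data, model, learning rates, and the names `make_data`, `loss_and_grad`, `train`, `SGD`, and `Adam` are all hypothetical choices for illustration, not part of the assignment. Every optimizer starts from the same initialization and sees the same minibatch sequence, so any difference in the loss curves comes from the optimizer and its hyperparameters rather than from the data or the seed.

```python
import math
import random


# Hypothetical toy task (an assumption, not specified by the lab): fit y = 3x - 1
# with a two-parameter linear model under mean squared error.
def make_data(n=64, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x - 1.0 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def loss_and_grad(params, xs, ys):
    """Mean squared error of w*x + b and its gradient with respect to [w, b]."""
    w, b = params
    n = len(xs)
    loss, gw, gb = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        loss += err * err / n
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return loss, [gw, gb]


class SGD:
    """SGD with optional heavy-ball momentum; momentum=0.0 gives plain SGD."""

    def __init__(self, lr, momentum=0.0):
        self.lr, self.momentum, self.v = lr, momentum, None

    def step(self, params, grads):
        if self.v is None:
            self.v = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.v[i] = self.momentum * self.v[i] + g  # velocity accumulates past gradients
            params[i] -= self.lr * self.v[i]


class Adam:
    """Adam with bias-corrected first- and second-moment estimates."""

    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g      # running mean of gradients
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g  # running mean of squared gradients
            m_hat = self.m[i] / (1 - self.b1 ** self.t)              # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def train(optimizer, steps=200, batch_size=16, seed=0):
    """Train from a fixed init on fixed data; record full-dataset loss before each step."""
    xs, ys = make_data(seed=seed)
    rng = random.Random(seed + 1)  # same minibatch sequence for every optimizer
    params = [0.0, 0.0]            # same initialization for every optimizer
    history = []
    for _ in range(steps):
        history.append(loss_and_grad(params, xs, ys)[0])
        idx = rng.sample(range(len(xs)), batch_size)
        _, grads = loss_and_grad(params, [xs[i] for i in idx], [ys[i] for i in idx])
        optimizer.step(params, grads)
    return params, history


if __name__ == "__main__":
    runs = {
        "sgd": SGD(lr=0.1),
        "sgd+momentum": SGD(lr=0.1, momentum=0.9),
        "adam": Adam(lr=0.05),
    }
    for name, opt in runs.items():
        params, hist = train(opt)
        print(f"{name:13s} loss {hist[0]:.4f} -> {hist[-1]:.5f}  w={params[0]:.3f} b={params[1]:.3f}")
```

In the lab you would swap in the course's model and your framework's built-in optimizers. What carries over is the controlled comparison: fixed data, fixed seed, fixed initialization, and a fixed step budget.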

Part B · Predict & probe (student reasoning)

  1. For three learning rates (too small, good, too large), predict the loss-curve shape before running.
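One way to sanity-check predictions before running the real model is a one-dimensional quadratic, where the loss curve has a closed form. The sketch below assumes f(w) = ½·λ·w² with λ = 1, which stands in for the curvature along one direction of a real loss. The functions `loss_curve` and `describe`, the learning rates, and the "slow" threshold are hypothetical choices for illustration.

```python
def loss_curve(lr, curvature=1.0, w0=1.0, steps=30):
    """Gradient descent on f(w) = 0.5 * curvature * w**2; returns the loss before each step."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        w -= lr * curvature * w  # the gradient of f is curvature * w
    return losses


def describe(losses):
    """Crude shape label (hypothetical thresholds): did the loss grow, barely drop, or collapse?"""
    if losses[-1] > losses[0]:
        return "diverging"
    if losses[-1] > 0.1 * losses[0]:
        return "slow"
    return "fast"


if __name__ == "__main__":
    for lr in (0.01, 0.5, 2.5):  # too small, good, too large for curvature 1
        curve = loss_curve(lr)
        print(lr, describe(curve), [round(v, 4) for v in curve[:5]])
```

On this quadratic, η = 0.01 gives a slow, shallow decline, η = 0.5 gives a fast geometric drop, and η = 2.5 gives a loss that grows geometrically. When ηλ lies between 1 and 2, the weight flips sign every step while the loss still shrinks.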

Part C · Explain & defend (in plain language)

  1. Explain divergence versus slow convergence in terms of step size, then tune to hit a target validation accuracy.
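A worked derivation makes the step-size argument concrete. It assumes a quadratic loss with curvature λ > 0 along a single direction, which you can read as a local second-order approximation of a real loss. With a full Hessian, the largest eigenvalue sets the divergence threshold, while directions with small eigenvalues are the ones that converge slowly.

```latex
\begin{aligned}
f(w) &= \tfrac{1}{2}\lambda w^2, \qquad \nabla f(w) = \lambda w, \\
w_{t+1} &= w_t - \eta\,\nabla f(w_t) = (1 - \eta\lambda)\,w_t, \\
f(w_t) &= (1 - \eta\lambda)^{2t}\, f(w_0), \\
\text{convergence} &\iff |1 - \eta\lambda| < 1 \iff 0 < \eta < 2/\lambda, \\
\eta\lambda \ll 1 &: \quad f(w_t) \approx e^{-2\eta\lambda t} f(w_0)
  \quad \text{(slow: about } 1/(2\eta\lambda) \text{ steps per factor of } e\text{)}, \\
\eta > 2/\lambda &: \quad |1 - \eta\lambda| > 1, \ \text{so } f(w_t) \text{ grows geometrically (divergence).}
\end{aligned}
```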

Deliverables


Self-check

Answer each question before reading its answer. If one is unclear, revisit the lab and the references.

What is the single most important optimization hyperparameter?
The learning rate.
What does momentum do?
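It accumulates a velocity, an exponentially decaying sum of past gradients, which smooths the updates and accelerates descent along directions where successive gradients agree.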
How does Adam differ from plain SGD?
It adapts a per-parameter step size using running estimates of the gradient's first moment (mean) and second moment (uncentered variance).
A rising (diverging) loss usually means what?
The learning rate is too high.
What is a learning-rate schedule?
A rule that changes the learning rate over training (e.g. decay or warmup).
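A schedule is just a function from the training step to a learning rate. The sketch below assumes linear warmup followed by cosine decay, which is one common combination and not the only one. `lr_at` and all of its default values are hypothetical.

```python
import math


def lr_at(step, base_lr=0.1, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay (hypothetical defaults)."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped so the rate stays at min_lr afterwards.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine curve from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 99, 100, 550, 1000):
        print(step, round(lr_at(step), 5))
```

One common motivation for warmup is to keep early steps small while optimizer statistics, such as Adam's moment estimates, are still noisy. Decay then shrinks the step later in training so the parameters can settle.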

Instructor lesson plan (with references)
