Week 6 Part II · Training Infrastructure
Gradient descent; SGD, momentum, and Adam; learning rates and optimization dynamics.
Curated, free, canonical references for this week: a course or lecture, a book chapter, a video, and an authoritative blog post or official tutorial.
Covers SGD, momentum/Nesterov, Adagrad/RMSProp/Adam, and learning-rate annealing.
SGD, dynamic learning-rate schedules, and convergence behavior.
Visual, intuition-first explanation of gradient descent and the negative-gradient update.
A detailed comparison of SGD variants (momentum, Adagrad, RMSProp, and Adam), with practical guidance on choosing among them.
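As a companion to the references above, here is a minimal sketch of three of the update rules they cover: plain gradient descent, momentum, and Adam. The toy objective f(x) = x**2, the function names, and the hyperparameter defaults are assumptions chosen for illustration; they are not taken from any of the listed resources. The gradient here is exact, so this shows the update rules only, not the minibatch noise that makes SGD stochastic.

```python
import math

def grad(x):
    # Gradient of the toy objective f(x) = x**2 (illustrative assumption).
    return 2.0 * x

def gradient_descent(x, lr=0.1, steps=100):
    # Negative-gradient update: step against the gradient, scaled by lr.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, lr=0.1, beta=0.9, steps=100):
    # Heavy-ball momentum: v accumulates a decaying sum of past gradients.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x -= lr * v
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    # Adam: bias-corrected running averages of the gradient (m) and of the
    # squared gradient (v); the step is rescaled by sqrt(v_hat).
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

if __name__ == "__main__":
    start = 5.0
    print("gradient descent:", gradient_descent(start))
    print("momentum:        ", momentum(start))
    print("adam:            ", adam(start))
```

Changing `lr` in each function is a quick way to see the optimization dynamics the references discuss: on this objective, plain gradient descent diverges once `lr` exceeds 1.0, while Adam's first step has a magnitude close to `lr` regardless of the gradient's scale.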