Introduction to Deep Learning · HIT

Week 6 · Part II · Training Infrastructure

Optimization

Gradient descent; SGD, momentum, and Adam; learning rates and optimization dynamics.
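As a rough orientation before the readings, the three update rules named above can be sketched on a toy problem. This is a minimal illustration, not taken from any of the references: the quadratic loss, the function names (`loss`, `grad`, `run_gd`, `run_momentum`, `run_adam`), and all hyperparameter values are assumptions chosen for the example. The gradient here is exact; SGD would replace `grad` with a noisy minibatch estimate.

```python
import numpy as np

# Hypothetical toy loss: f(theta) = 0.5 * (theta[0]**2 + 10 * theta[1]**2).
# The two curvatures (1 and 10) make the problem mildly ill-conditioned.
CURVATURE = np.array([1.0, 10.0])

def loss(theta):
    return 0.5 * float(np.sum(CURVATURE * np.asarray(theta, dtype=float) ** 2))

def grad(theta):
    # Exact gradient of the toy loss (no minibatch noise).
    return CURVATURE * theta

def run_gd(theta0, lr=0.05, steps=200):
    # Plain gradient descent: step along the negative gradient.
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

def run_momentum(theta0, lr=0.05, beta=0.9, steps=200):
    # Heavy-ball momentum: accumulate a decaying sum of past gradients.
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        velocity = beta * velocity + grad(theta)
        theta -= lr * velocity
    return theta

def run_adam(theta0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    # Adam: bias-corrected running means of the gradient and its square.
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

if __name__ == "__main__":
    start = [1.0, 1.0]
    for name, fn in [("gd", run_gd), ("momentum", run_momentum), ("adam", run_adam)]:
        print(name, loss(fn(start)))
```

Changing `lr` is the quickest way to see the dynamics the readings discuss: on this toy loss, plain gradient descent diverges once `lr` exceeds 2 divided by the largest curvature (0.2 here), while the other two methods respond differently to the same change.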

Curated, free, canonical references for this week: a course or lecture, a book chapter, a video, and an authoritative blog post or official tutorial.

Course
Stanford CS231n: Neural Networks Part 3 (Learning and Evaluation) cs231n.github.io

Covers SGD, momentum/Nesterov, Adagrad/RMSProp/Adam, and learning-rate annealing.

Book
Dive into Deep Learning, 12.4 Stochastic Gradient Descent d2l.ai

SGD, dynamic learning-rate schedules, and convergence behavior.

Video
3Blue1Brown: Gradient descent, how neural networks learn youtube.com

Visual, intuition-first explanation of gradient descent and the negative-gradient update.

Blog / Docs
Sebastian Ruder: An overview of gradient descent optimization algorithms ruder.io

A widely cited comparison of SGD variants, momentum, Adagrad/RMSProp/Adam, with practical guidance.
