Introduction to Deep Learning · HIT

Week 1 · Part I · Foundations

Deep Learning Overview & ML-to-Network Framing

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓 Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval. Open with two quick questions on prerequisite material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation. Why deep learning now: representation learning, scale, and one framework that spans vision, language, and more.
0:25–1:10 · 45 min · What a neural network is
  • A neural network is a parametric function: it maps an input tensor to an output tensor through learned weights.
  • It is built from layers (linear transforms) interleaved with nonlinear activations; without the nonlinearity it collapses to a single linear map.
  • Learning means adjusting the weights to reduce a loss that scores how wrong the outputs are.
  • Board work: a single neuron computing w · x + b, then stacking neurons into a layer.
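
Instructor sketch for the board work above, assuming NumPy is available. The helper names (`neuron`, `layer`, `relu`) and the numbers are illustrative choices, not part of the course materials. It shows one neuron computing w · x + b, then two neurons stacked into a layer followed by a nonlinearity.

```python
import numpy as np

def neuron(x, w, b):
    """Single neuron pre-activation: dot product of weights and input, plus bias."""
    return np.dot(w, x) + b

def layer(x, W, b):
    """A layer stacks neurons: each row of W holds one neuron's weights."""
    return W @ x + b

def relu(z):
    """Elementwise nonlinearity: negative values become zero."""
    return np.maximum(z, 0.0)

x = np.array([1.0, 2.0, 3.0])       # input vector with 3 features
w = np.array([0.5, -1.0, 2.0])      # weights of one neuron
b = 0.1                             # bias of that neuron

single = neuron(x, w, b)            # 0.5 - 2.0 + 6.0 + 0.1 = 4.6

W = np.stack([w, -w])               # two neurons -> weight matrix of shape (2, 3)
b_vec = np.array([0.1, 0.1])        # one bias per neuron
hidden = relu(layer(x, W, b_vec))   # shape (2,): the second neuron is clipped to 0

print(single, hidden)
```

The weights here are fixed by hand; in a trained network they are the parameters that learning adjusts.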
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Framing an ML task as a network
  • Decide the input representation and its tensor shape (a vector, an image, a sequence).
  • Choose the output layer: one value for regression, or one logit per class for classification.
  • Match the loss to the output: MSE for regression, cross-entropy for classification.
  • Assemble the loop: forward pass, compute loss, backward pass, optimizer step, repeat over the data.
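
One way to make the output-to-loss pairing concrete is to write both losses out by hand. This is a minimal NumPy sketch: the function names `mse` and `cross_entropy` and the example values are assumptions for illustration, and the cross-entropy takes raw logits plus integer class labels.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error for regression outputs."""
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy.

    logits: array of shape (N, C), one raw score per class.
    labels: array of shape (N,), integer class indices.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Regression: one value per example, scored with MSE.
pred = np.array([2.0, 4.0])
target = np.array([1.0, 6.0])
reg_loss = mse(pred, target)                 # (1 + 4) / 2 = 2.5

# Classification: one logit per class (2 examples, 3 classes).
logits = np.zeros((2, 3))                    # all-equal logits -> uniform prediction
labels = np.array([0, 2])
clf_loss = cross_entropy(logits, labels)     # equals ln(3) for a uniform prediction

print(reg_loss, clf_loss)
```

The uniform case is a useful sanity check in class: an untrained classifier over C classes should start near a loss of ln(C).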
2:05–2:35 · 30 min · Live demo (predict, then run). Build a minimal training loop on a toy dataset and watch the loss fall. Then ask the class to predict what the loss curve does when the learning rate is set 10× too high, rerun with that learning rate, and compare the divergence against their predictions.
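
A possible shape for this demo, sketched in NumPy with hand-derived gradients so every step of the loop is visible. The toy data (y = 3x + 1), the function name `train`, and the two learning rates (0.1 and 1.0) are assumptions chosen for illustration; the actual notebook may differ.

```python
import numpy as np

def train(lr, steps=50):
    """Fit y = w*x + b by gradient descent on MSE; return the loss at each step."""
    x = np.linspace(-2.0, 2.0, 21)        # toy inputs (hypothetical dataset)
    y = 3.0 * x + 1.0                     # noiseless targets
    w, b = 0.0, 0.0                       # initial parameters
    losses = []
    for _ in range(steps):
        pred = w * x + b                  # forward pass
        err = pred - y
        losses.append(np.mean(err ** 2))  # compute loss (MSE)
        grad_w = 2.0 * np.mean(err * x)   # backward pass, derived by hand
        grad_b = 2.0 * np.mean(err)
        w -= lr * grad_w                  # optimizer step (plain gradient descent)
        b -= lr * grad_b
    return losses

good = train(lr=0.1)   # loss shrinks toward zero
bad = train(lr=1.0)    # 10x larger step: loss grows instead of shrinking

print(f"lr=0.1 first/last loss: {good[0]:.3g} / {good[-1]:.3g}")
print(f"lr=1.0 first/last loss: {bad[0]:.3g} / {bad[-1]:.3g}")
```

Plotting `good` and `bad` on a log-scale axis makes the contrast easy to read from the back of the room.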
2:35–2:50 · 15 min · Wrap-up & practice preview. Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
Common misconception to confront.

Students often think: Stacking more linear layers makes a more powerful model.
Set it straight: Without a nonlinearity between them, any stack of linear layers is equivalent to a single linear layer W x + b; the activation is what gives depth its power.
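
The collapse can be checked numerically. This is a small NumPy sketch with hand-picked weights (illustrative values, not from the course notebook): two stacked linear layers match a single layer with W = W2 W1 and b = W2 b1 + b2, and inserting a ReLU breaks that equivalence.

```python
import numpy as np

# Two linear layers with hand-picked (hypothetical) parameters.
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
b1 = np.array([0.5, -0.5])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([1.0])

x = np.array([2.0, 0.0])

# Stack without any activation in between.
stacked = W2 @ (W1 @ x + b1) + b2

# The same function written as ONE linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
collapsed = W @ x + b

# Same stack, but with a ReLU between the layers.
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

print(stacked, collapsed, with_relu)   # [1.] [1.] [3.5]
```

With these weights the collapsed matrix W is all zeros, so the activation-free stack ignores its input entirely, while the ReLU version still responds to x.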

Check for understanding (pose during the concept blocks; let students answer before revealing).
Q: Remove every activation from a 5-layer MLP. What function class can it still represent?
A: Only linear functions: the whole stack collapses to one linear map.
Q: You predict house price from features. What output layer and loss, and why?
A: One linear output unit and MSE (or MAE): the target is a continuous value, so no softmax and no cross-entropy.
Key takeaways.

💻 Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10 · 10 min · Setup & recap. Recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Set up PyTorch live and confirm the device (GPU or CPU).
  • Walk through a minimal training loop on a toy dataset, run it, and read the loss curve.
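
A minimal sketch of what these two demonstrations might look like in PyTorch, assuming `torch` is installed. The toy data (y = 2x - 1), the model size, the learning rate, and the step count are illustrative assumptions, not the contents of the practice notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Confirm the device: use the GPU when one is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

# Toy dataset (hypothetical): 64 points on the line y = 2x - 1.
x = torch.linspace(-1.0, 1.0, 64, device=device).unsqueeze(1)   # shape (64, 1)
y = 2.0 * x - 1.0

model = nn.Linear(1, 1).to(device)       # one weight, one bias
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()                # clear gradients from the previous step
    pred = model(x)                      # forward pass
    loss = loss_fn(pred, y)              # compute loss
    loss.backward()                      # backward pass (autograd)
    optimizer.step()                     # optimizer step
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

Reading the loss curve then amounts to plotting `losses` against the step index and checking that it falls and flattens.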
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Vary the learning rate live to show divergence versus convergence.
  • Frame a classification and a regression example as tensors-in, loss-out, in code.
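
For the framing demonstration, a short PyTorch sketch of the tensor shapes involved may help. The batch size, feature count, class count, and random data below are assumptions for illustration only. The point is the shape contract: `nn.MSELoss` compares tensors of the same shape, while `nn.CrossEntropyLoss` takes raw logits of shape (N, C) and integer class indices of shape (N,).

```python
import torch
from torch import nn

torch.manual_seed(0)

batch, n_features, n_classes = 8, 4, 3      # hypothetical sizes
x = torch.randn(batch, n_features)          # tensors in: shape (8, 4)

# Regression: one output value per example, scored with MSE.
reg_head = nn.Linear(n_features, 1)
y_reg = torch.randn(batch, 1)               # continuous targets, shape (8, 1)
reg_loss = nn.MSELoss()(reg_head(x), y_reg)

# Classification: one logit per class, scored with cross-entropy.
clf_head = nn.Linear(n_features, n_classes)
y_clf = torch.randint(0, n_classes, (batch,))   # class indices, shape (8,)
logits = clf_head(x)                            # raw scores, shape (8, 3); no softmax here
clf_loss = nn.CrossEntropyLoss()(logits, y_clf)

print(reg_loss.shape, clf_loss.shape)       # both losses are scalars: loss out
```

`nn.CrossEntropyLoss` applies log-softmax internally, which is why the classification head ends in a plain linear layer.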
1:45–2:00 · 15 min · Wrap-up & lab brief. Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)

Next · Week 2: Tensors & Data Representation