Introduction to Deep Learning · HIT

Week 8 · Part III · Architectures & Representation Learning

Convolutional Networks I

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓 Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval. Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation. Why fully-connected nets waste parameters on images, and how convolution exploits the spatial structure of images.
0:25–1:10 · 45 min · Convolution and pooling
  • A convolution slides a small learned filter over the input, sharing weights across positions.
  • Local connectivity plus weight sharing makes convolutions far cheaper than dense layers on images.
  • Stride s and padding p set the output size for an input of size in and a filter of size k: out = floor((in + 2p - k) / s) + 1.
  • Pooling downsamples and adds a little translation invariance.
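
The output-size formula and the sliding-filter idea above can be checked with a minimal pure-Python sketch. It assumes a single-channel input, a square filter, and zero padding. The names conv_out_size, conv2d, and max_pool2d are illustrative and are not taken from the course notebook.

```python
def conv_out_size(n, k, s=1, p=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Naive single-channel convolution (cross-correlation form)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Zero-pad the input on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = conv_out_size(h, k, stride, padding)
    out_w = conv_out_size(w, k, stride, padding)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # The same k*k weights are reused at every output position.
            out[i][j] = sum(
                kernel[a][b] * padded[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride equals the window size)."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(
                fmap[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 8x8 input and an identity filter (assumed values, for illustration only).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

same = conv2d(image, kernel, stride=1, padding=1)     # padded, stride 1
strided = conv2d(image, kernel, stride=2, padding=0)  # unpadded, stride 2
pooled = max_pool2d(same, size=2)

print(len(same), len(same[0]))        # 8 8
print(len(strided), len(strided[0]))  # 3 3
print(len(pooled), len(pooled[0]))    # 4 4
```

The loops are written for readability, not speed; a real framework layer also handles multiple channels and batches.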
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Building a CNN classifier
  • Stack conv, activation, and pooling blocks, growing channels while shrinking spatial size.
  • Flatten the feature maps and add a linear classifier head.
  • Track the shape through each layer and verify the parameter count by hand.
  • Early layers tend to learn edges; later layers tend to learn parts and whole objects.
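
One way to track shapes and parameter counts by hand is a small bookkeeping script. The architecture below is hypothetical: two conv-pool blocks and a linear head on a 1x28x28 input (the size of a FashionMNIST image) with 10 output classes. The course demo may use different layer sizes, and the helper names are illustrative.

```python
def conv_out_size(n, k, s=1, p=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# Hypothetical architecture, chosen only to illustrate the bookkeeping.
layers = [
    ("conv", {"out_ch": 16, "k": 3, "s": 1, "p": 1}),
    ("relu", {}),
    ("pool", {"k": 2}),
    ("conv", {"out_ch": 32, "k": 3, "s": 1, "p": 1}),
    ("relu", {}),
    ("pool", {"k": 2}),
    ("flatten", {}),
    ("linear", {"out": 10}),
]


def trace(shape, layers):
    """Return (rows, total); each row is (layer name, output shape, params)."""
    rows, total = [], 0
    for name, cfg in layers:
        params = 0
        if name == "conv":
            c, h, w = shape
            # One k*k*c filter per output channel, plus one bias each.
            params = cfg["out_ch"] * c * cfg["k"] ** 2 + cfg["out_ch"]
            shape = (
                cfg["out_ch"],
                conv_out_size(h, cfg["k"], cfg["s"], cfg["p"]),
                conv_out_size(w, cfg["k"], cfg["s"], cfg["p"]),
            )
        elif name == "pool":
            c, h, w = shape
            # Non-overlapping pooling: stride equals window size, no weights.
            shape = (
                c,
                conv_out_size(h, cfg["k"], cfg["k"]),
                conv_out_size(w, cfg["k"], cfg["k"]),
            )
        elif name == "flatten":
            c, h, w = shape
            shape = (c * h * w,)
        elif name == "linear":
            (n,) = shape
            params = n * cfg["out"] + cfg["out"]
            shape = (cfg["out"],)
        # "relu" leaves the shape unchanged and has no parameters.
        rows.append((name, shape, params))
        total += params
    return rows, total


rows, total = trace((1, 28, 28), layers)
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):14s} {params:6d}")
print("total parameters:", total)  # 20490
```

In this sketch the linear head holds 15,690 of the 20,490 parameters, which is a useful prompt for discussing why spatial size is reduced before flattening.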
2:05–2:35 · 30 min · Live demo (predict, then run). Build a small CNN and ask the class to predict the output shape after each conv and pool layer, then print the per-layer shapes to check the predictions. Train on FashionMNIST and visualize a few feature maps.
2:35–2:50 · 15 min · Wrap-up & practice preview. Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
Common misconception to confront.

Students often think: A convolution layer has a separate weight for every pixel position.
Set it straight: A filter shares one small set of weights across all positions; that weight sharing is why CNNs use far fewer parameters than dense layers on images.
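
A quick count makes the gap concrete. The sketch below compares a dense layer with a single 3x3 convolution, both mapping a 32x32 single-channel image to a 32x32 output. The sizes are chosen for illustration, and the function names are hypothetical.

```python
def dense_params(in_features, out_features):
    """One weight per input-output pair, plus one bias per output."""
    return in_features * out_features + out_features


def conv_params(in_ch, out_ch, k):
    """One k*k*in_ch filter per output channel, plus one bias each."""
    return out_ch * in_ch * k * k + out_ch


h = w = 32
dense = dense_params(h * w, h * w)  # every pixel connects to every output
conv = conv_params(1, 1, 3)         # nine shared weights and one bias

print(dense)          # 1049600
print(conv)           # 10
print(dense // conv)  # 104960
```

The convolution's count does not depend on the image size: the same 10 parameters would apply to a larger input, while the dense layer's count grows with the product of input and output sizes.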

Check for understanding (pose during the concept blocks; let students answer before revealing).
Q: Input 32x32, a 3x3 filter, stride 1, padding 1. Output spatial size?
A: 32x32: out = floor((32 + 2·1 - 3)/1) + 1 = 32. A 3x3 stride-1 conv with padding 1 preserves size.
Q: Why are CNNs more parameter-efficient than MLPs on images?
A: Local connectivity plus weight sharing: one filter reused everywhere, not a unique weight per input-output pair.
Key takeaways.

💻 Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10 · 10 min · Setup & recap. Recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Build a CNN and print the layer-by-layer output shapes.
  • Compute output sizes and parameter counts by hand and verify against a summary.
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Train on FashionMNIST and visualize a few feature maps.
1:45–2:00 · 15 min · Wrap-up & lab brief. Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)
