Week 8 · Part III · Architectures & Representation Learning

Convolutional Networks I

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

Build and train a CNN image classifier.
Understand convolution, pooling, and feature maps.
Compute how shapes and parameters change layer by layer.

🎓Lecture · 3 hours

0:00–0:10	10 min	Recap & retrieval: Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25	15 min	Motivation: Why fully-connected nets waste parameters on images, and how convolution exploits structure.
0:25–1:10	45 min	Convolution and pooling: A convolution slides a small learned filter over the input, sharing weights across positions. Local connectivity plus weight sharing makes convolutions far cheaper than dense layers on images. Stride and padding set the output size: out = floor((in + 2p - k) / s) + 1. Pooling downsamples and adds a little translation invariance.
1:10–1:20	10 min	Break
1:20–2:05	45 min	Building a CNN classifier: Stack conv, activation, and pooling blocks, growing channels while shrinking spatial size. Flatten the feature maps and add a linear classifier head. Track the shape through each layer and verify the parameter count by hand. Early layers learn edges; later layers learn parts and whole objects.
2:05–2:35	30 min	Live demo (predict, then run): Build a small CNN and ask the class to predict the output shape after each conv and pool layer before printing the per-layer shapes. Then train on FashionMNIST and visualize a few feature maps.
2:35–2:50	15 min	Wrap-up & practice preview: Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00	10 min	Buffer & questions
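
The shape and parameter tracking described in the "Building a CNN classifier" block can be rehearsed without any framework. The sketch below is illustrative only: the architecture (two 3x3 conv blocks with 16 and 32 channels, 2x2 pooling, and a 10-class linear head on a 1x28x28 FashionMNIST-sized input) and the helper names `out_size` and `trace` are assumptions, not the demo notebook's actual model.

```python
def out_size(n_in, k, s=1, p=0):
    """floor((in + 2p - k) / s) + 1 along one spatial axis."""
    return (n_in + 2 * p - k) // s + 1


# Hypothetical architecture for a 1x28x28 grayscale input (FashionMNIST-sized).
# Activations are omitted: they change neither the shape nor the parameter count.
layers = [
    ("conv", dict(c_out=16, k=3, s=1, p=1)),
    ("pool", dict(k=2, s=2)),
    ("conv", dict(c_out=32, k=3, s=1, p=1)),
    ("pool", dict(k=2, s=2)),
    ("linear", dict(n_out=10)),  # flatten, then a 10-class linear head
]


def trace(c, h, w, layers):
    """Return a list of (layer kind, output shape, parameter count)."""
    rows = []
    for kind, cfg in layers:
        if kind == "conv":
            # (k * k * Cin + 1) * Cout, where the +1 is the bias per filter
            params = (cfg["k"] * cfg["k"] * c + 1) * cfg["c_out"]
            h = out_size(h, cfg["k"], cfg["s"], cfg["p"])
            w = out_size(w, cfg["k"], cfg["s"], cfg["p"])
            c = cfg["c_out"]
            shape = (c, h, w)
        elif kind == "pool":
            params = 0  # pooling has no learned weights
            h = out_size(h, cfg["k"], cfg["s"])
            w = out_size(w, cfg["k"], cfg["s"])
            shape = (c, h, w)
        else:  # linear head on the flattened feature maps (assumed last)
            n_in = c * h * w
            params = (n_in + 1) * cfg["n_out"]
            shape = (cfg["n_out"],)
        rows.append((kind, shape, params))
    return rows


rows = trace(1, 28, 28, layers)
for kind, shape, params in rows:
    print(f"{kind:<6} {str(shape):<14} {params:>6}")
print("total parameters:", sum(p for _, _, p in rows))
```

For this assumed stack the spatial size goes 28, 14, 7 while the channels go 1, 16, 32, and most of the parameters sit in the linear head rather than in the conv layers.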

Common misconception to confront.

Students often think: A convolution layer has a separate weight for every pixel position.
Set it straight: A filter shares one small set of weights across all positions; that weight sharing is why CNNs use far fewer parameters than dense layers on images.
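
One way to make the weight sharing concrete is a naive convolution written with plain loops. This is a minimal sketch, assuming a single-channel image, a square filter, stride 1, and no padding; the function name `conv2d_single` and the toy image are made up for illustration. As in deep-learning frameworks, the operation is technically a cross-correlation (the filter is not flipped).

```python
def conv2d_single(image, kernel):
    """Stride-1, no-padding cross-correlation of one 2D image with one square filter."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            # The same k*k kernel weights are reused at every position (i, j).
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# A 4x5 image with a vertical edge between columns 1 and 2 (0 -> 1).
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# A 3x3 vertical-edge filter: 9 weights in total, regardless of image size.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_single(image, kernel)
for row in feature_map:
    print(row)
```

The filter responds where its window straddles the edge and gives zero on the flat region, and a larger image would change the size of the feature map but not the number of weights.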

Check for understanding (pose during the concept blocks; let students answer before revealing).

Input 32x32, a 3x3 filter, stride 1, padding 1. Output spatial size?

32x32: out = floor((32 + 2*1 - 3) / 1) + 1 = 32. A padded 3x3 stride-1 conv preserves size.
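
The arithmetic in this check can be reproduced in a few lines of Python. This is a minimal sketch; the helper name `conv_out_size` is an assumption made for illustration, not a library function.

```python
def conv_out_size(n_in: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output length along one spatial axis: floor((in + 2p - k) / s) + 1."""
    return (n_in + 2 * p - k) // s + 1


# The concept-check case: 32x32 input, 3x3 filter, stride 1, padding 1.
print(conv_out_size(32, k=3, s=1, p=1))

# With stride 1 and an odd kernel, padding p = (k - 1) // 2 preserves the size.
for k in (1, 3, 5, 7):
    assert conv_out_size(32, k=k, s=1, p=(k - 1) // 2) == 32
```

The loop generalizes the answer: the 3x3, padding-1 case is one instance of choosing p = (k - 1) / 2 for an odd kernel at stride 1.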

Why are CNNs more parameter-efficient than MLPs on images?

Local connectivity plus weight sharing: one filter reused everywhere, not a unique weight per input-output pair.
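
A quick count makes the gap tangible. The sketch below compares a dense layer with a 3x3 conv layer that both map a 32x32 RGB image to 16 feature maps of the same spatial size; that scenario and the helper names `dense_params` and `conv_params` are assumptions chosen for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Conv layer with k x k filters: (k * k * Cin + 1) * Cout, bias included."""
    return (k * k * c_in + 1) * c_out


# Illustrative case (an assumption, not from the lesson): map a 32x32 RGB image
# to 16 feature maps of the same 32x32 spatial size.
h = w = 32
c_in, c_out = 3, 16

dense = dense_params(h * w * c_in, h * w * c_out)
conv = conv_params(3, c_in, c_out)

print(f"dense: {dense:,} parameters")  # dense: 50,348,032 parameters
print(f"conv:  {conv:,} parameters")   # conv:  448 parameters
```

The conv count does not depend on the image height or width at all, which is the weight-sharing argument in numeric form.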

Key takeaways.

Convolution means local connectivity with shared weights.
Output size = floor((in + 2p - k) / s) + 1.
Feature maps detect patterns at increasing abstraction.

💻Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10	10 min	Setup & recap: Recap the lecture's key ideas and open the working notebook.
0:10–1:00	50 min	Instructor demonstrations: Build a CNN and print the layer-by-layer output shapes. Compute output sizes and parameter counts by hand and verify against a summary.
1:00–1:05	5 min	Break
1:05–1:45	40 min	Instructor demonstrations (continued): Train on FashionMNIST and visualize a few feature maps.
1:45–2:00	15 min	Wrap-up & lab brief: Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.

Common pitfalls to pre-empt.

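Miscomputing the output size: out = floor((in + 2p - k) / s) + 1. Keep the floor when the stride does not divide evenly.
Miscounting conv parameters: params = (k * k * Cin + 1) * Cout, where the + 1 is the bias of each filter; verify with a summary tool.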
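
The two formulas above tend to go wrong in two specific places: the floor and the bias term. The following sketch shows one case of each; the helper names and the `bias` flag are hypothetical and are not taken from the practice notebook.

```python
def out_size(n_in, k, s=1, p=0):
    return (n_in + 2 * p - k) // s + 1  # // performs the floor


def conv_params(k, c_in, c_out, bias=True):
    # One bias per output channel when bias is enabled.
    return (k * k * c_in + (1 if bias else 0)) * c_out


# Floor case: (28 - 3) / 2 = 12.5 floors to 12, so the output is 13, not 13.5.
print(out_size(28, k=3, s=2))

# Bias case: a 3x3 conv from 16 to 32 channels, with and without bias.
print(conv_params(3, 16, 32))
print(conv_params(3, 16, 32, bias=False))
```

If a hand count is off from a summary tool by exactly Cout, a forgotten or disabled bias is the first thing to check.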

Open the practice notebook in Colab · Curated references · Lab (homework)