Week 12 · Part III · Architectures & Representation Learning

Representation Learning

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

Train an autoencoder and a contrastive embedding.
Probe and interpret a learned latent space.
Reason about what makes a representation useful.

🎓 Lecture · 3 hours

0:00–0:10	10 min	Recap & retrieval	Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25	15 min	Motivation	Good representations make downstream tasks easy, and they can be learned with or without labels.
0:25–1:10	45 min	Autoencoders	An autoencoder compresses the input into a latent code through a bottleneck (encoder) and reconstructs the input from that code (decoder). An undercomplete bottleneck forces it to learn salient structure rather than copy the input. The latent code is a learned, lower-dimensional representation. Denoising variants reconstruct a clean input from a corrupted one.
1:10–1:20	10 min	Break
1:20–2:05	45 min	Contrastive and self-supervised learning	Self-supervised learning creates its own labels from unlabeled data. Contrastive methods pull augmented views of the same example together and push different examples apart. The augmentation policy defines what counts as similar. A linear probe on the frozen features measures representation quality.
2:05–2:35	30 min	Live demo (predict, then run)	Ask the class to predict what a latent interpolation between two examples will look like before decoding it. Train an autoencoder, visualize reconstructions and a latent interpolation, and sketch a contrastive setup.
2:35–2:50	15 min	Wrap-up & practice preview	Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00	10 min	Buffer & questions
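
The linear probe mentioned in the contrastive block can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes the frozen features are already available as NumPy arrays, and it uses a ridge-regularised least-squares classifier where logistic regression would be the more common choice. The function name `linear_probe_accuracy` and the synthetic feature sets are hypothetical.

```python
import numpy as np

def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes, ridge=1e-3):
    """Fit a linear classifier on frozen features and return test accuracy.

    Assumption: a least-squares fit to one-hot targets stands in for the
    logistic-regression probe that is more common in practice.
    """
    def add_bias(x):
        return np.hstack([x, np.ones((len(x), 1))])

    A = add_bias(train_x)
    targets = np.eye(num_classes)[train_y]  # one-hot labels, shape (n, num_classes)
    # Closed-form ridge regression; the features themselves are never updated.
    W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ targets)
    preds = np.argmax(add_bias(test_x) @ W, axis=1)
    return float(np.mean(preds == test_y))

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=600)

# Hypothetical "good" frozen features: each class clusters around its own mean.
class_means = 3.0 * rng.normal(size=(3, 8))
good_features = class_means[labels] + rng.normal(size=(600, 8))
# Hypothetical "poor" frozen features: pure noise, unrelated to the labels.
noise_features = rng.normal(size=(600, 8))

split = 400  # first 400 rows train the probe, the rest evaluate it
acc_good = linear_probe_accuracy(good_features[:split], labels[:split],
                                 good_features[split:], labels[split:], num_classes=3)
acc_noise = linear_probe_accuracy(noise_features[:split], labels[:split],
                                  noise_features[split:], labels[split:], num_classes=3)
print(f"probe accuracy, structured features: {acc_good:.2f}")
print(f"probe accuracy, noise features:      {acc_noise:.2f}")
```

Because the probe is only a linear map, a high score indicates that the classes are already linearly separable in the frozen features, which is the sense in which the probe measures representation quality.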

Common misconception to confront.

Students often think: An autoencoder is useful because it reconstructs its input accurately.
Set it straight: Perfect reconstruction is trivial if the bottleneck is wide enough. The value lies in the constrained latent code: an undercomplete bottleneck forces the model to capture salient structure.

Check for understanding (pose during the concept blocks; let students answer before revealing).

Why does an autoencoder need an undercomplete (narrow) bottleneck?

Otherwise it can learn the identity function and simply copy the input; the narrow bottleneck forces compression to salient features.
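
The effect of bottleneck width can be shown without training a network. The sketch below assumes a purely linear autoencoder, for which the best rank-k solution under squared error is given by PCA (the top-k right singular vectors). The synthetic data (10-dimensional observations driven by 2 latent factors) and all function names are illustrative assumptions, not part of the course notebook.

```python
import numpy as np

def linear_autoencoder(X, k):
    """Return (encode, decode) for the best rank-k linear autoencoder (PCA)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k]  # (k, d) encoder weights; the decoder uses the transpose

    def encode(x):
        return (x - mean) @ W.T

    def decode(z):
        return z @ W + mean

    return encode, decode

def reconstruction_mse(X, k):
    encode, decode = linear_autoencoder(X, k)
    return float(np.mean((X - decode(encode(X))) ** 2))

rng = np.random.default_rng(0)
# Hypothetical data: 10-D observations generated from 2 latent factors plus small noise.
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = factors @ mixing + 0.05 * rng.normal(size=(500, 10))

mse_wide = reconstruction_mse(X, k=10)        # bottleneck as wide as the input: a copy
mse_narrow = reconstruction_mse(X, k=2)       # undercomplete: keeps the 2 real factors
mse_too_narrow = reconstruction_mse(X, k=1)   # too narrow: loses one real factor
print(mse_wide, mse_narrow, mse_too_narrow)
```

With the full-width code the reconstruction is essentially exact, yet nothing has been learned. The 2-dimensional code discards only the noise, and the 1-dimensional code starts to lose real structure.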

In contrastive learning, what defines which examples are similar?

The augmentation policy: two augmented views of the same example form a positive pair and are pulled together, while other examples are pushed apart.
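
The pull-together, push-apart idea can be written as a loss. The sketch below uses one common formulation, an InfoNCE-style loss with cosine similarity and a temperature; the lesson plan does not say which loss the course uses, so treat this choice, the function name `info_nce`, and the random stand-in embeddings as assumptions.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style contrastive loss.

    Row i of view_a and row i of view_b are a positive pair; every other
    row of view_b acts as a negative for row i.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                # (n, n) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the "correct class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
# Stand-in for embeddings of a second augmented view that agree with the first.
aligned = embeddings + 0.01 * rng.normal(size=(8, 16))
# Stand-in for embeddings that carry no shared information.
unrelated = rng.normal(size=(8, 16))

loss_aligned = info_nce(embeddings, aligned)
loss_unrelated = info_nce(embeddings, unrelated)
print(f"aligned views:   {loss_aligned:.3f}")
print(f"unrelated views: {loss_unrelated:.3f}")
```

The loss is low only when each example's two views are closer to each other than to the other examples in the batch, which is why the augmentations that generate those views determine what the encoder treats as similar.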

Key takeaways.

A bottleneck forces useful compression.
Augmentation choices define similarity.
Learned representations transfer to new tasks.

💻 Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, in which students apply these techniques themselves.

0:00–0:10	10 min	Setup & recap	Recap the lecture's key ideas and open the working notebook.
0:10–1:00	50 min	Instructor demonstrations	Train an autoencoder and visualize reconstructions and the latent space. Interpolate between two points in latent space live.
1:00–1:05	5 min	Break
1:05–1:45	40 min	Instructor demonstrations (continued)	Sketch a contrastive setup and show the augmentation views.
1:45–2:00	15 min	Wrap-up & lab brief	Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
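
For the latent interpolation demonstration, the interpolation step itself is small enough to show in isolation. This is a sketch assuming straight-line (linear) interpolation between two latent codes; the helper name `interpolate_latents` and the example codes are hypothetical, and in the notebook the codes would come from the trained encoder, with each row of the result passed to the trained decoder.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps=5):
    """Linearly interpolate between two latent codes, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]  # (steps, 1) mixing weights
    return (1.0 - alphas) * z_start + alphas * z_end

# Hypothetical 4-D latent codes for two examples.
z_a = np.array([0.0, 0.0, 1.0, 2.0])
z_b = np.array([4.0, 2.0, 1.0, 0.0])

path = interpolate_latents(z_a, z_b, steps=5)
print(path)  # one latent code per row, from z_a to z_b
```

Decoding each row in order gives the sequence the class is asked to predict before the demo is run.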

Common pitfalls to pre-empt.

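A bottleneck that is too wide lets the autoencoder copy the input instead of learning structure.
In contrastive learning, the augmentation choice defines what counts as similar, so a poorly chosen policy teaches the wrong notion of similarity.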

Open the practice notebook in Colab · Curated references · Lab (homework)