Introduction to Deep Learning · HIT

Week 11   Part III · Architectures & Representation Learning

LSTMs, GRUs & Sequence Tasks

Instructor lesson plan: lecture (3 h) and practice (2 h).

Learning objectives

🎓Lecture · 3 hours

0:00–0:10 · 10 min · Recap & retrieval. Open with two quick questions on last week's material (retrieval practice), then state this week's objectives.
0:10–0:25 · 15 min · Motivation. Gates: a learned mechanism to keep or forget information across long sequences.
0:25–1:10 · 45 min · LSTM and GRU
  • An LSTM adds a cell state plus input, forget, and output gates that control information flow.
  • The cell state provides a near-linear path that preserves gradients over long sequences.
  • A GRU merges the cell and hidden states and uses two gates (update and reset), giving a lighter unit that often performs comparably.
  • The gates are learned, so the network decides what to keep and what to forget.
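For the board-to-code walkthrough, a single LSTM time step can be sketched in NumPy as below. This is a minimal illustration, not a reference implementation: the stacked weight layout, the gate order (input, forget, output, candidate), and the names `lstm_step`, `W`, `U`, `b` are assumptions of this sketch, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single example.

    x: (input_size,); h_prev, c_prev: (hidden_size,)
    W: (4*hidden_size, input_size); U: (4*hidden_size, hidden_size); b: (4*hidden_size,)
    Assumed gate order in the stacked weights: input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate content
    c = f * c_prev + i * g       # additive, near-linear cell-state update
    h = o * np.tanh(c)           # hidden state is the gated view of the cell
    return h, c, (i, f, o)

# Run five steps on random inputs with small random (untrained) weights.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c, gates = lstm_step(x, h, c, W, U, b)
```

Point out that the only place `c_prev` enters the update is the elementwise product with `f`, which is the near-linear path the bullets describe.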
1:10–1:20 · 10 min · Break
1:20–2:05 · 45 min · Sequence tasks
  • Sequence classification produces one label for a whole sequence.
  • Sequence-to-sequence uses an encoder and a decoder for variable-length outputs.
  • Teacher forcing feeds the true previous token to the decoder during training.
  • Match the architecture (many-to-one, many-to-many) to the task.
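Teacher forcing from the list above can be shown as a simple shift of the target sequence. The sketch below is illustrative only: the special tokens `<bos>` and `<eos>`, the helper name `teacher_forcing_pairs`, and the example tokens are hypothetical and not taken from the course notebook.

```python
BOS, EOS = "<bos>", "<eos>"  # hypothetical start/end tokens for this sketch

def teacher_forcing_pairs(target_tokens):
    """Build decoder inputs and targets for one training example.

    At step t the decoder is fed the TRUE token from step t-1
    (not its own prediction) and is trained to predict the token at step t.
    """
    decoder_inputs = [BOS] + list(target_tokens)
    decoder_targets = list(target_tokens) + [EOS]
    return decoder_inputs, decoder_targets

inputs, targets = teacher_forcing_pairs(["le", "chat", "dort"])
for step, (fed, expected) in enumerate(zip(inputs, targets)):
    print(step, fed, "->", expected)
```

At inference time there is no true previous token, so the decoder would be fed its own previous prediction instead.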
2:05–2:35 · 30 min · Live demo (predict, then run). Ask the class to predict whether the LSTM or the plain RNN holds the long-range signal better before comparing them. Swap an RNN for an LSTM, inspect the gates and cell state, and compare long- versus short-sequence gradients.
2:35–2:50 · 15 min · Wrap-up & practice preview. Revisit the misconception and concept checks below, recap the takeaways, and preview the practice lesson.
2:50–3:00 · 10 min · Buffer & questions
Common misconception to confront.

Students often think: LSTMs beat plain RNNs because they are bigger and have more parameters.
Set it straight: It is the cell state’s near-linear, gated path, not the parameter count, that preserves gradients across long sequences; the gates learn what to keep and forget.
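If it helps to put the argument on the board, the two step-to-step derivatives can be written side by side. The notation is an assumption of this sketch (standard LSTM symbols: forget gate f_t, input gate i_t, candidate \tilde{c}_t, elementwise product \odot), and the LSTM expression keeps only the direct cell-state path, ignoring the indirect dependence of the gates on earlier states.

```latex
% Plain tanh RNN: every step multiplies the gradient by the recurrent
% weight matrix and the tanh derivative.
h_t = \tanh(W_h h_{t-1} + W_x x_t + b), \qquad
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\left(1 - h_t^{2}\right) W_h

% LSTM cell state: the update is additive, so along the direct path each
% step only rescales the gradient by the forget gate.
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)
```

Over many steps the first expression is a product of matrices that tends to shrink or blow up, while the second is a product of gate values that the network can learn to hold close to 1.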

Check for understanding (pose during the concept blocks; let students answer before revealing).
What is the role of the cell state versus the hidden state in an LSTM?
The cell state is a protected, near-linear memory carried across steps (a gradient highway); the hidden state is the gated output exposed to the next step or layer.
Sentiment classification of a whole sentence: many-to-one or many-to-many?
Many-to-one: a whole sequence in, a single label out.
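The difference between the two patterns shows up directly in the output shapes. The sketch below uses random numbers as a stand-in for the hidden states of a recurrent layer; the sizes and the names `hidden_states` and `W_out` are assumptions made for illustration, and the many-to-many case shown is the aligned, one-output-per-step variant.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, num_classes = 6, 8, 2                 # sequence length, hidden size, labels
hidden_states = rng.normal(size=(T, H))     # stand-in for recurrent layer outputs
W_out = rng.normal(size=(H, num_classes))   # shared readout weights

# Many-to-one (e.g. sentence sentiment): read out only the last hidden state.
many_to_one_logits = hidden_states[-1] @ W_out   # shape (num_classes,)

# Many-to-many (aligned): read out every time step.
many_to_many_logits = hidden_states @ W_out      # shape (T, num_classes)

print(many_to_one_logits.shape, many_to_many_logits.shape)
```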
Key takeaways.

💻Practice · 2 hours

In the practice lesson the instructor demonstrates implementations, runs code, and works through examples, using the practice notebook linked below. The weekly lab is then set as homework, where students apply this themselves.

0:00–0:10 · 10 min · Setup & recap. Recap the lecture's key ideas and open the working notebook.
0:10–1:00 · 50 min · Instructor demonstrations
  • Swap the RNN for an LSTM or GRU on the same task and compare.
  • Walk through the gates and the cell state on the board and in code.
1:00–1:05 · 5 min · Break
1:05–1:45 · 40 min · Instructor demonstrations (continued)
  • Show behavior on long versus short sequences.
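Before running the full models, a scalar toy version can make the long-versus-short contrast concrete. This is a simplified sketch under stated assumptions, not the notebook's demo: the recurrent weight `w=0.9`, the zero input, and the constant forget gate `0.99` are hand-picked values rather than learned ones, and the LSTM figure covers only the direct cell-state path.

```python
import math

def rnn_gradient(T, w=0.9, h0=0.5):
    """|d h_T / d h_0| for a scalar tanh RNN h_t = tanh(w * h_{t-1}) with zero input."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: weight times tanh derivative
    return abs(grad)

def lstm_cell_gradient(T, forget_gate=0.99):
    """d c_T / d c_0 along the direct cell-state path with a constant forget gate."""
    return forget_gate ** T

for T in (10, 100):
    print(f"T={T:3d}  rnn={rnn_gradient(T):.2e}  lstm_cell={lstm_cell_gradient(T):.2e}")
```

Ask students to predict how each number changes from T=10 to T=100 before running the cell.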
1:45–2:00 · 15 min · Wrap-up & lab brief. Summarize the patterns shown and brief the weekly lab (homework), which students complete on their own.
Common pitfalls to pre-empt.

Open the practice notebook in Colab · Curated references · Lab (homework)
