Week 11 Part III · Architectures & Representation Learning
LSTMs, GRUs & Sequence Tasks
Gated recurrent units; how gates restore gradient flow; LSTM versus GRU; sequence classification and sequence-to-sequence tasks.
Learning goals
- Build an LSTM or GRU and compare it to the plain RNN.
- Understand how gates restore gradient flow.
- Apply gated networks to a sequence task.
This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded learning is in Predict and Explain. See the AI-use policy and a fully worked sample submission.
⚙ Exercise
Part A · Build (AI assistant welcome)
- Build an LSTM or GRU on the same task as Week 10 and compare it to the plain RNN.
Part B · Predict & probe (student reasoning)
- Predict how each model behaves on long versus short sequences, and which gates matter most.
Part C · Explain & defend (in plain language)
- Ablate the gates, explain how gating preserves the gradient signal where the RNN failed, and compare LSTM with GRU.
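For Part C, a scalar toy calculation can make the gradient argument concrete. The sketch below is an illustration under simplifying assumptions, not the lab solution: it uses a one-unit tanh RNN, follows only the direct cell-state path of an LSTM (where c_t = f_t * c_{t-1} + i_t * g_t, so the step-to-step factor is f_t), and treats the forget gate as a fixed number rather than a learned function. The function names and the values 0.9, 0.99, and 0.5 are arbitrary choices for illustration.

```python
import math

def rnn_gradient_factor(w, inputs, u=1.0):
    """Return d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + u * x_t)."""
    h = 0.0
    factor = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2), at most |w| in magnitude.
        factor *= w * (1.0 - h * h)
    return factor

def cell_path_gradient_factor(forget_gates):
    """Return d c_T / d c_0 along the direct cell-state path only."""
    factor = 1.0
    for f in forget_gates:
        # c_t = f_t * c_{t-1} + (new content), so each step multiplies by f_t.
        factor *= f
    return factor

steps = 100  # assumed sequence length
rnn_factor = rnn_gradient_factor(w=0.9, inputs=[0.1] * steps)
open_gate_factor = cell_path_gradient_factor([0.99] * steps)  # gate nearly open
ablated_factor = cell_path_gradient_factor([0.5] * steps)     # gate forced half shut

print(f"plain RNN:               {rnn_factor:.3e}")
print(f"cell path, f = 0.99:     {open_gate_factor:.3e}")
print(f"cell path, f = 0.50:     {ablated_factor:.3e}")
```

With the forget gate near 1, a sizable fraction of the gradient survives 100 steps, while the plain RNN and the half-shut gate both collapse toward zero. In the real ablation the gates are learned and depend on the input, so measure the effect in the notebook rather than relying on this toy.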
✓ Deliverables
- An LSTM or GRU notebook with an RNN-versus-gated comparison.
- A gate ablation with an explanation.
Hints.
- Keep the task identical to Week 10 for a fair comparison.
- A GRU is lighter than an LSTM (three gated blocks instead of four, so fewer parameters); check whether its accuracy holds up on long sequences.
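To put a number on "lighter", the parameter counts of one recurrent layer can be compared directly. This is a rough sketch that assumes each block has one input weight matrix, one recurrent weight matrix, and one bias vector; the function name and the sizes 32 and 64 are made up for illustration. Some frameworks (PyTorch, for example) store two bias vectors per block, so their reported counts are slightly higher.

```python
def recurrent_param_count(input_size, hidden_size, n_blocks):
    """Count weights in one recurrent layer built from n_blocks affine blocks."""
    # Each block: input weights (hidden x input), recurrent weights
    # (hidden x hidden), and one bias vector (hidden). Single bias is an assumption.
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block

input_size, hidden_size = 32, 64  # assumed sizes for illustration

counts = {
    "RNN": recurrent_param_count(input_size, hidden_size, 1),   # one tanh block
    "GRU": recurrent_param_count(input_size, hidden_size, 3),   # reset, update, candidate
    "LSTM": recurrent_param_count(input_size, hidden_size, 4),  # input, forget, output, candidate
}

for name, n in counts.items():
    print(f"{name:5s} {n:6d} parameters")
```

Under these assumptions the GRU has three quarters of the LSTM's recurrent parameters at the same hidden size, which is worth keeping in mind when comparing accuracy.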
❓ Self-check
Answer each before expanding it. If one is unclear, revisit the lab and the references.
What do the gates in an LSTM control?
How much of the cell state to forget, how much new information to add to it, and how much of it to expose as output.
How does a GRU differ from an LSTM?
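It merges the forget and input gates into a single update gate and folds the cell state into the hidden state, giving a simpler, lighter unit with fewer parameters.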
How does gating help with vanishing gradients?
The cell state is updated additively, scaled by the forget gate, which provides a near-linear path that preserves the gradient signal across many time steps.
What is teacher forcing?
Feeding the ground-truth previous token as the decoder input during training.
How does sequence classification differ from seq2seq?
Sequence classification produces one label for the whole sequence; seq2seq produces an output sequence, typically with an encoder-decoder.
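The teacher-forcing answer above can be sketched as a simple shift of the target sequence. This is a minimal illustration, assuming hypothetical "<bos>" and "<eos>" marker tokens and a made-up helper name; it shows only how decoder inputs and targets line up during training, not a full encoder-decoder.

```python
BOS, EOS = "<bos>", "<eos>"  # hypothetical start and end markers

def teacher_forcing_pairs(target_tokens):
    """Build decoder inputs and targets for one training example."""
    tokens = list(target_tokens)
    # At step t the decoder is fed the ground-truth token from step t - 1
    # (or BOS at the first step), regardless of what it predicted itself.
    decoder_inputs = [BOS] + tokens
    # At step t the decoder is trained to predict the ground-truth token t,
    # and finally the end marker.
    decoder_targets = tokens + [EOS]
    return decoder_inputs, decoder_targets

inputs, targets = teacher_forcing_pairs(["le", "chat", "dort"])
for fed, expected in zip(inputs, targets):
    print(f"fed {fed!r:10} -> should predict {expected!r}")
```

At inference time no ground truth is available, so the decoder is fed its own previous prediction instead. A sequence classifier needs none of this: it reads the whole input and emits a single label.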
Instructor lesson plan (with references)