Introduction to Deep Learning · HIT

Week 11 · Part III · Architectures & Representation Learning

LSTMs, GRUs & Sequence Tasks

Gated recurrent units; how gates restore gradient flow; LSTM versus GRU; sequence classification and sequence-to-sequence tasks.

Learning goals

This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded learning is in Predict and Explain. See the AI-use policy and a fully worked sample submission.

Exercise

Part A · Build (AI assistant welcome)

  1. Build an LSTM or GRU on the same task as Week 10 and compare it to the plain RNN.

Part B · Predict & probe (student reasoning)

  1. Predict how the model will behave on long versus short sequences, and which gates will matter most.

Part C · Explain & defend (in plain language)

  1. Ablate the gates, explain how gating preserves the gradient signal where the RNN failed, and compare LSTM with GRU.

Deliverables

Hints.

Self-check

Answer each before expanding it. If one is unclear, revisit the lab and the references.

What do the gates in an LSTM control?
How much of the cell state to forget, how much new information to add to it, and how much of it to expose as output.
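The three gates can be made concrete with a minimal sketch of one step of a scalar LSTM cell. It uses the standard LSTM equations; the weights in `params` are hypothetical values picked by hand for illustration (a forget gate near 1 and an input gate near 0), not trained values or anything prescribed by the lab.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. p maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))     # forget gate: fraction of the old cell state kept
    i = sigmoid(pre("i"))     # input gate: fraction of the candidate written
    o = sigmoid(pre("o"))     # output gate: fraction of the cell state exposed
    g = math.tanh(pre("g"))   # candidate value to write into the cell
    c = f * c_prev + i * g    # additive cell-state update
    h = o * math.tanh(c)      # hidden state seen by the rest of the network
    return h, c, {"f": f, "i": i, "o": o}

# Hypothetical hand-picked weights (assumption): biases push f toward 1, i toward 0.
params = {
    "f": (0.0, 0.0, 4.0),
    "i": (0.0, 0.0, -4.0),
    "o": (0.0, 0.0, 4.0),
    "g": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for x in [0.5, -0.3, 0.8]:
    h, c, gates = lstm_step(x, h, c, params)

print(f"cell state after 3 steps: {c:.3f}, gates: {gates}")
```

With these settings the cell state stays close to its initial value of 1.0 because almost nothing is forgotten or written. Changing the biases of `f` and `i` is a quick way to see each gate's effect.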
How does a GRU differ from an LSTM?
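It merges the forget and input gates into a single update gate and merges the cell state into the hidden state, giving a simpler, lighter unit with no separate output gate.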
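One way to quantify "lighter" is to count parameters. The sketch below assumes the textbook formulation with one bias vector per block; the layer sizes 32 and 64 are arbitrary illustrative choices. Some frameworks (PyTorch, for example) keep two bias vectors per block, so their reported counts are slightly higher.

```python
def lstm_params(input_size, hidden_size):
    # 4 blocks (forget, input, and output gates plus the candidate), each with
    # input weights, recurrent weights, and one bias vector.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size, hidden_size):
    # 3 blocks (update and reset gates plus the candidate).
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

# Arbitrary example sizes (assumption).
n_lstm = lstm_params(32, 64)
n_gru = gru_params(32, 64)
print(n_lstm, n_gru, n_gru / n_lstm)
```

Under this formulation a GRU layer has three quarters of the parameters of an LSTM layer of the same size.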
How does gating help with vanishing gradients?
The cell state provides a near-linear path that preserves the gradient signal.
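A small numeric sketch can make the contrast visible. It compares the gradient through 100 steps of a scalar tanh RNN with the gradient along the direct cell-state path of an LSTM. Two simplifying assumptions apply: the recurrent weight 0.9, the constant input 0.5, and the forget-gate value 0.99 are hypothetical, and the LSTM figure treats the gates as constants, ignoring the indirect paths through the hidden state.

```python
import math

def rnn_gradient(w, inputs, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # chain rule: one factor per time step
    return grad

def cell_state_gradient(forget_gates):
    """d c_T / d c_0 along the direct path of c_t = f_t * c_{t-1} + i_t * g_t."""
    grad = 1.0
    for f in forget_gates:
        grad *= f                   # each step contributes only the forget gate
    return grad

T = 100
rnn_grad = rnn_gradient(0.9, [0.5] * T)        # hypothetical weight and inputs
lstm_grad = cell_state_gradient([0.99] * T)    # hypothetical forget-gate values

print(f"RNN gradient over {T} steps:        {rnn_grad:.3e}")
print(f"cell-state gradient over {T} steps: {lstm_grad:.3f}")
```

Each RNN factor is at most 0.9 here and shrinks further as tanh saturates, so the product collapses toward zero. The cell-state product depends only on the forget gates, which the network can learn to hold near 1.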
What is teacher forcing?
Feeding the ground-truth previous token as the decoder input during training.
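In practice teacher forcing amounts to shifting the target sequence by one position. The sketch below builds the decoder inputs and targets for one training example; the `<bos>` and `<eos>` token names and the example sentence are hypothetical.

```python
BOS, EOS = "<bos>", "<eos>"   # hypothetical special tokens

def teacher_forcing_pairs(target_tokens):
    """Return (decoder_inputs, decoder_targets) for one training example."""
    decoder_inputs = [BOS] + target_tokens    # ground-truth previous tokens
    decoder_targets = target_tokens + [EOS]   # token to predict at each step
    return decoder_inputs, decoder_targets

inputs, targets = teacher_forcing_pairs(["le", "chat", "dort"])
for step, (inp, tgt) in enumerate(zip(inputs, targets)):
    print(f"step {step}: decoder sees {inp!r}, should predict {tgt!r}")
```

At inference time no ground truth is available, so the decoder is fed its own previous prediction instead.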
How does sequence classification differ from seq2seq?
Sequence classification produces one label for the whole input sequence; seq2seq produces an output sequence, typically with an encoder-decoder.

Instructor lesson plan (with references)
