Week 10 Part III · Architectures & Representation Learning

Recurrent Networks (RNNs)

Sequence data and recurrence; the RNN cell; backpropagation through time; vanishing and exploding gradients.

Learning goals

Build a plain RNN for sequence data.
Understand recurrence and backpropagation through time.
Observe the vanishing-gradient problem directly.
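A plain (Elman-style) RNN cell is commonly written as below. The symbol names are a widespread convention assumed here, not notation fixed by this lab: x_t is the input at step t, h_t the hidden state, y_t the output, and W_xh, W_hh, W_hy, b_h, b_y the parameters.

```latex
\begin{aligned}
h_t &= \tanh\!\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad h_0 = 0,\\
y_t &= W_{hy}\, h_t + b_y, \qquad t = 1, \dots, T.
\end{aligned}
```

The same parameters are used at every step t. Unrolling this loop over t = 1, ..., T produces the computation graph that backpropagation through time differentiates.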

This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for Build; the graded learning happens in Predict & probe and Explain & defend. See the AI-use policy and a fully worked sample submission.

⚙Exercise

Part A · Build (AI assistant welcome)

Build a plain RNN on a character-level or short time-series task.
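A minimal sketch of the forward pass, assuming a tanh cell with no output layer, weights stored as plain Python lists, and toy one-hot vectors standing in for characters. The names rnn_forward, matvec and random_matrix are hypothetical helpers, not part of any library. A real notebook would more likely use NumPy or a framework, and would add an output layer, a loss and a training loop.

```python
import math
import random


def matvec(m, v):
    """Hypothetical helper: multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(xs, W_xh, W_hh, b_h, h0=None):
    """Hypothetical helper: run a plain tanh RNN over a sequence of input vectors.

    Returns the hidden states h_1..h_T. The same W_xh, W_hh and b_h are
    reused at every time step (weight sharing).
    """
    h = h0 if h0 is not None else [0.0] * len(b_h)
    states = []
    for x in xs:
        # Pre-activation: W_xh x_t + W_hh h_{t-1} + b_h
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]  # new hidden state h_t
        states.append(h)
    return states


def random_matrix(rows, cols, scale, rng):
    """Hypothetical helper: uniform random initialisation in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    input_size, hidden_size, T = 4, 8, 5
    W_xh = random_matrix(hidden_size, input_size, 0.5, rng)
    W_hh = random_matrix(hidden_size, hidden_size, 0.5, rng)
    b_h = [0.0] * hidden_size
    # Toy one-hot "characters" from a 4-symbol vocabulary (illustrative data only).
    xs = [[1.0 if i == t % input_size else 0.0 for i in range(input_size)] for t in range(T)]
    states = rnn_forward(xs, W_xh, W_hh, b_h)
    print(len(states), len(states[0]))  # 5 8
```

Starting from short sequences like this one makes it easy to check shapes and values by hand before moving to longer inputs.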

Part B · Predict & probe (student reasoning)

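Before running any code, predict how the gradient magnitude changes across time steps on long sequences: does it stay roughly constant, grow, or shrink as you move back toward the first step, and how quickly?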

Part C · Explain & defend (in plain language)

Measure gradient norms across time steps, demonstrate vanishing gradients on long sequences, and explain why long-range dependencies are hard for a plain RNN.
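One way to get the numbers for the plot, sketched under stated assumptions: the loss is placed only on the final hidden state (L = sum of the entries of h_T, an arbitrary choice for illustration), the recurrent weights are small, and backpropagation through time is written out by hand so each ∂L/∂h_t is visible. hidden_grad_norms and the other helpers are hypothetical, not library functions.

```python
import math
import random


def matvec(m, v):
    """Hypothetical helper: matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def hidden_grad_norms(xs, W_xh, W_hh, b_h):
    """Hypothetical helper: forward a tanh RNN, then backpropagate through time
    from the assumed loss L = sum(h_T). Returns ||dL/dh_t|| for t = 1..T."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)

    W_hh_T = transpose(W_hh)
    grad_h = [1.0] * len(b_h)  # dL/dh_T for L = sum(h_T)
    norms = [0.0] * len(states)
    for t in range(len(states) - 1, -1, -1):
        norms[t] = norm(grad_h)
        # Through tanh: dL/d(pre_t) = dL/dh_t * (1 - h_t^2), elementwise
        grad_pre = [g * (1.0 - s * s) for g, s in zip(grad_h, states[t])]
        # Through the recurrence: dL/dh_{t-1} = W_hh^T dL/d(pre_t)
        grad_h = matvec(W_hh_T, grad_pre)
    return norms


if __name__ == "__main__":
    rng = random.Random(0)
    input_size, hidden_size, T = 4, 16, 50
    W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
    # Small recurrent weights: the spectral radius of W_hh should typically be
    # well below 1 here, which puts the network in the vanishing regime.
    W_hh = [[rng.uniform(-0.25, 0.25) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    xs = [[rng.gauss(0.0, 1.0) for _ in range(input_size)] for _ in range(T)]
    norms = hidden_grad_norms(xs, W_xh, W_hh, b_h)
    for t in (T - 1, T - 10, T - 25, 0):
        print(f"t={t + 1:2d}  ||dL/dh_t|| = {norms[t]:.3e}")
```

Plotting norms against t on a logarithmic y-axis gives the deliverable plot; exponential decay shows up as a roughly straight line. Scaling W_hh up is a quick way to probe the exploding regime as well.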

✓Deliverables

An RNN notebook.
A gradient-norm-versus-time-step plot with an explanation.

Hints.

Clip gradients to keep them from exploding, and start with short sequences.
Log the gradient norm at the earliest time steps, where vanishing is most visible.
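The clipping hint can be sketched as global-norm clipping: compute one L2 norm over all gradient entries and, if it exceeds a threshold, rescale every gradient by the same factor so the update direction is preserved. This is one common variant (clipping each entry to a fixed range is another). clip_by_global_norm is a hypothetical helper that takes gradients as flattened lists; PyTorch ships an equivalent as torch.nn.utils.clip_grad_norm_. Returning the pre-clipping norm also gives you the value to log.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Hypothetical helper: rescale a list of flattened gradient vectors so that
    their combined L2 norm is at most max_norm.

    Returns (clipped_grads, total_norm_before_clipping).
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        # Already small enough: return an unchanged copy.
        return [list(vec) for vec in grads], total
    scale = max_norm / total  # one shared factor keeps the direction
    return [[g * scale for g in vec] for vec in grads], total


if __name__ == "__main__":
    grads = [[3.0, 0.0], [0.0, 4.0]]  # toy gradients for two parameter tensors
    clipped, total = clip_by_global_norm(grads, max_norm=1.0)
    print(total)  # 5.0
    print(clipped)
```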

❓Self-check

Answer each question before reading its answer. If one is unclear, revisit the lab and the references.

What does an RNN share across time steps?

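The same weights: the input-to-hidden and hidden-to-hidden matrices and their biases (and the output layer, if there is one) are reused at every time step.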

What is backpropagation through time?

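Ordinary backpropagation applied to the network unrolled across time steps; because the weights are shared, each weight's gradient is the sum of its contributions from every step.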

Why do long sequences cause vanishing gradients?

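The gradient reaching an early step is a product of one Jacobian per intervening step. When these factors are mostly smaller than 1 in norm, the product shrinks exponentially with the number of steps (vanishing); when they are larger than 1, it can grow exponentially (exploding).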
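To make this concrete, assume the tanh recurrence h_k = tanh(W_xh x_k + W_hh h_{k-1} + b_h). The chain rule gives the Jacobian of a late hidden state with respect to an earlier one as an ordered product (later steps on the left), and submultiplicativity of the spectral norm bounds it:

```latex
\begin{aligned}
\frac{\partial h_T}{\partial h_t} &= J_T\, J_{T-1} \cdots J_{t+1},
  \qquad J_k = \frac{\partial h_k}{\partial h_{k-1}} = \operatorname{diag}\!\left(1 - h_k^{2}\right) W_{hh},\\
\left\lVert \frac{\partial h_T}{\partial h_t} \right\rVert
  &\le \prod_{k=t+1}^{T} \left\lVert \operatorname{diag}\!\left(1 - h_k^{2}\right) \right\rVert \, \lVert W_{hh} \rVert
  \le \lVert W_{hh} \rVert^{\,T-t}.
\end{aligned}
```

Because every tanh derivative 1 - h_k^2 lies in (0, 1], the bound decays exponentially in T - t whenever the spectral norm of W_hh is below 1. When it is above 1, the bound no longer constrains the product, which can then grow exponentially instead.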

Name one remedy for exploding gradients.

Gradient clipping.

What is the hidden state?

The recurrent memory passed from one time step to the next.

Instructor lesson plan (with references)