Week 1 Part I · Foundations

Deep Learning Overview & ML-to-Network Framing

What deep learning is; framing a task as tensor inputs, model outputs, and a loss function.

Learning goals

Set up the PyTorch toolchain and confirm it runs.
Frame any ML task as tensors in, a model, and a loss out.
Run a first end-to-end training loop and read the loss curve.
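
A quick way to confirm the toolchain runs is sketched below. It assumes PyTorch is already installed and importable as torch; the helper name pick_device is illustrative and not part of the course materials.

```python
import torch


def pick_device() -> torch.device:
    """Return the CUDA device if one is visible, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
print("PyTorch version:", torch.__version__)
print("Using device:", device)

# Smoke test: a small matrix multiply on the chosen device.
x = torch.randn(4, 3, device=device)
w = torch.randn(3, 2, device=device)
y = x @ w
print("Output shape:", tuple(y.shape))
```

If this prints a version, a device, and an output shape of (4, 2), the basic install is working.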

This is the weekly homework lab, completed independently after the lecture and the practice lesson. It follows the course's Build / Predict & probe / Explain & defend model: use an AI assistant freely for the Build; the graded learning is in Predict and Explain. See the AI-use policy and a fully worked sample submission.

⚙ Exercise

Part A · Build (AI assistant welcome)

Set up Python, PyTorch, and Jupyter or Colab; confirm whether a GPU is visible and fall back to CPU if not.
With an AI assistant's help, scaffold a minimal training loop on a tiny dataset (a 2-class toy set or a small MNIST subset): model, loss, optimizer, and one train step.
Train for a few epochs and plot the loss going down.
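
The Build steps above might be scaffolded roughly as follows. This is a minimal sketch, not a reference solution: the two-blob dataset, the layer sizes, the learning rate of 0.1, and the 20 full-batch epochs are all assumptions chosen for illustration, and it runs on CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible toy data and initial weights

# Toy 2-class dataset (an assumption): two Gaussian blobs in 2-D.
n_per_class = 100
x0 = torch.randn(n_per_class, 2) + torch.tensor([-2.0, -2.0])
x1 = torch.randn(n_per_class, 2) + torch.tensor([2.0, 2.0])
X = torch.cat([x0, x1])  # inputs, shape (200, 2), float32
y = torch.cat([torch.zeros(n_per_class), torch.ones(n_per_class)]).long()  # class indices, shape (200,)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()  # takes raw logits and integer class indices
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(20):
    optimizer.zero_grad()      # clear gradients from the previous step
    logits = model(X)          # forward pass, shape (200, 2)
    loss = loss_fn(logits, y)  # scalar training loss
    loss.backward()            # compute gradients
    optimizer.step()           # update parameters
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")

# To plot in a notebook (assumes matplotlib is installed):
# import matplotlib.pyplot as plt
# plt.plot(losses); plt.xlabel("epoch"); plt.ylabel("training loss"); plt.show()
```

Each epoch here is a single full-batch step, which keeps the loop short; a mini-batch version with a DataLoader would follow the same five lines inside the loop.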

Part B · Predict & probe (student reasoning)

For three tasks (binary spam detection, house-price regression, 10-class digit recognition), write the input tensor shape, output-layer size, and loss, before any coding.
Predict whether the loss should reach near zero on the toy task and roughly how fast.

Part C · Explain & defend (in plain language)

Explain in a few sentences what the model, loss, and optimizer each do.
Justify why the shape-and-loss framing is correct for each of the three tasks.
Name one thing the AI assistant produced that needed correction or was not initially understood.

✓ Deliverables

A notebook with the running loop and a loss curve.
The three-task framing table (input shape, output size, loss).
A short reflection (5 to 8 sentences).

Hints.

Start on CPU with a tiny dataset; correctness first, speed later.
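If the loss is flat, check the learning rate and that the label format matches the loss (for example, CrossEntropyLoss typically takes integer class indices, while BCEWithLogitsLoss takes float targets shaped like the logits).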
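
As an illustration of the label-format hint, the sketch below pairs two common PyTorch losses with the target format each is usually given. The tensors are random stand-ins, and the batch size and class count are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Multi-class: logits of shape (N, C), targets are int64 class indices of shape (N,).
logits_multi = torch.randn(4, 3)
labels_idx = torch.tensor([0, 2, 1, 2])
loss_multi = nn.CrossEntropyLoss()(logits_multi, labels_idx)

# Binary with a single logit: targets are floats with the same shape as the logits.
logits_bin = torch.randn(4, 1)
labels_float = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
loss_bin = nn.BCEWithLogitsLoss()(logits_bin, labels_float)

print("CrossEntropyLoss targets:", labels_idx.dtype, tuple(labels_idx.shape))
print("BCEWithLogitsLoss targets:", labels_float.dtype, tuple(labels_float.shape))
```

Printing the dtype and shape of the labels next to the logits is a cheap first check when a loss refuses to move.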

❓ Self-check

Answer each question before reading the answer beneath it. If one is unclear, revisit the lab and the references.

What four components does every ML task reduce to?

Data, a model, a loss (objective), and optimization.

For 10-class digit classification, what output-layer size and loss are appropriate?

Ten output logits with cross-entropy loss.
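
A shape check for this answer could look like the sketch below. It assumes MNIST-sized 28 x 28 grayscale inputs and uses random tensors in place of real images; the single linear layer is only there to produce ten logits.

```python
import torch
from torch import nn

torch.manual_seed(0)

batch_size = 8
images = torch.randn(batch_size, 1, 28, 28)   # random stand-in for MNIST-sized images
labels = torch.randint(0, 10, (batch_size,))  # integer class indices in [0, 9]

# Flatten each image to 784 values, then map to ten logits (one per digit class).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
logits = model(images)
loss = nn.CrossEntropyLoss()(logits, labels)

print("logits shape:", tuple(logits.shape))
print("loss is a scalar:", loss.dim() == 0)
```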

Why pass logits, not softmax probabilities, to CrossEntropyLoss?

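CrossEntropyLoss applies log-softmax internally, so passing probabilities would apply softmax twice; computing log-softmax directly from logits is also more numerically stable.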
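
Both points can be checked numerically. The sketch below uses hand-picked logits (an assumption for illustration) to compare cross-entropy on logits with the explicit log-softmax route, and then shows what a naive log of softmax does with a very large logit.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])
target = torch.tensor([0])

# Cross-entropy on raw logits equals negative log-likelihood of log-softmax.
ce_from_logits = F.cross_entropy(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Incorrect usage: softmax is applied here and again inside cross_entropy.
ce_from_probs = F.cross_entropy(F.softmax(logits, dim=1), target)

print("logits -> CE:       ", ce_from_logits.item())
print("log_softmax -> NLL: ", manual.item())
print("probs -> CE (wrong):", ce_from_probs.item())

# Stability: with a large logit, softmax underflows to 0 and its log is -inf,
# while log_softmax stays finite.
big = torch.tensor([[1000.0, 0.0]])
naive = torch.log(torch.softmax(big, dim=1))
stable = F.log_softmax(big, dim=1)
print("naive log(softmax):", naive)
print("log_softmax:       ", stable)
```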

What is the difference in role between the loss and the optimizer?

The loss measures error; the optimizer updates parameters to reduce it.
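
The division of labor can be seen in a single training step. In this sketch (a one-parameter linear model on three hand-written points, all assumptions), the weight is recorded before the backward pass, after it, and after the optimizer step.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

w_before = model.weight.detach().clone()

loss_before = loss_fn(model(x), y)  # the loss measures the current error
optimizer.zero_grad()
loss_before.backward()              # fills .grad; parameters are unchanged
w_after_backward = model.weight.detach().clone()

optimizer.step()                    # the optimizer uses .grad to update parameters
w_after_step = model.weight.detach().clone()

loss_after = loss_fn(model(x), y)

print("weight changed by backward():", not torch.equal(w_before, w_after_backward))
print("weight changed by step():    ", not torch.equal(w_before, w_after_step))
print(f"loss before {loss_before.item():.4f}, after {loss_after.item():.4f}")
```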

Training loss stays flat. Name two things to check.

The learning rate, and that the label format matches the chosen loss.
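
To see how the learning rate alone can flatten a loss curve, the sketch below trains the same small regression model twice from the same seed. The helper name train, the synthetic line y = 3x + 0.5, and the two learning rates are assumptions chosen for illustration.

```python
import torch
from torch import nn


def train(lr, steps=50):
    """Train a 1-D linear model with full-batch SGD and return the loss per step."""
    torch.manual_seed(0)  # same data and initial weights for every call
    X = torch.linspace(-1, 1, 32).unsqueeze(1)  # shape (32, 1)
    y = 3.0 * X + 0.5                           # noiseless synthetic targets
    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


tiny = train(lr=1e-6)
reasonable = train(lr=0.1)
print(f"lr=1e-6: {tiny[0]:.4f} -> {tiny[-1]:.4f}")
print(f"lr=0.1 : {reasonable[0]:.4f} -> {reasonable[-1]:.4f}")
```

With the tiny learning rate the loss barely moves over 50 steps even though the code is otherwise correct, which is why the learning rate is worth checking before suspecting the model.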

Instructor lesson plan (with references)