| 0:00-0:10 | 10 min | Recap & objectives
- Retrieval: train/serve skew; when streaming wins.
- Today: the model lifecycle, from training run to live endpoint.
|
| 0:10-0:25 | 15 min | Motivation: which run made this model?
- Story: the production model whose training data, code version, and hyperparameters nobody could name during an audit.
- Training is the easy part; versioning, serving, and safely replacing models is where production ML lives.
- Everything today reuses weeks 3 to 5: the API contract, the canary, the data version.
|
| 0:25-0:50 | 25 min | Experiment tracking, properly
- What to log: parameters, metrics over time (curves, not just finals), artifacts, environment.
- The reproducibility triple: git SHA + data version (week 5's DVC snapshot) + environment, pinned to every run.
- Comparing runs in the MLflow UI; reading a learning curve versus a final number.
- Tracking applies to LLM work too: prompt versions and eval scores are runs (weeks 9 to 10 will reuse this).
|
| 0:50-1:10 | 20 min | The model registry & model cards
- A model version is an artifact plus its lineage, not a file on a laptop; the registry doubles as the model catalogue: discovery, ownership, and governance in one place.
- Registry stages: staging, production, archived; promotion as a governed, auditable transition.
- Rollback as a first-class operation: the previous version is one transition away.
- Model cards: intended use, training data, metrics, limitations; documentation that ships with the artifact.
|
| 1:10-1:20 | 10 min | Break |
| 1:20-1:40 | 20 min | Serving patterns
- Packaging: the model plus pinned dependencies, containerised like any service.
- Online (request/response), batch (precomputed), and streaming scoring; choosing by freshness need and cost.
- The serving endpoint is a REST API: week 3's contracts, validation, and health checks apply unchanged.
- Latency engineering: model warm-up, request batching, the p95 budget.
|
| 1:40-1:55 | 15 min | Safe rollout for models (predict, then run)
- Shadow: real traffic, responses unused; the free lunch of rollout safety.
- Canary and A/B for models; what online metric decides promotion.
- Predict: offline AUC up 2 points; what can still go wrong online? Then the offline-online gap, demonstrated with a latency regression.
|
| 1:55-2:00 | 5 min | Wrap-up & practice preview
- Practice takes the project model from tracked run to canaried endpoint.
|
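Instructor note: the reproducibility triple from the experiment-tracking block can be sketched in a few lines. This is a minimal illustration using only the standard library, not the course's actual tooling; the function names, tag keys, and sample values are hypothetical, and in the practice session the resulting tags would be attached to the tracked run rather than printed.

```python
from __future__ import annotations

import hashlib
import platform


def environment_fingerprint(pinned_requirements: list[str]) -> str:
    """Hash the interpreter version plus the sorted pinned requirements."""
    lines = [f"python=={platform.python_version()}", *sorted(pinned_requirements)]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()[:12]


def reproducibility_tags(
    git_sha: str, data_version: str, pinned_requirements: list[str]
) -> dict[str, str]:
    """Build the three tags every tracked run should carry (hypothetical key names)."""
    if not git_sha or not data_version:
        raise ValueError("git SHA and data version are both required")
    return {
        "git_sha": git_sha,
        "data_version": data_version,
        "env_fingerprint": environment_fingerprint(pinned_requirements),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    tags = reproducibility_tags(
        git_sha="3f2a9c1",
        data_version="snapshot-week5",
        pinned_requirements=["numpy==1.26.4", "scikit-learn==1.4.2"],
    )
    print(tags)
```

Refusing to build the tags when the SHA or data version is missing is the point of the design: a run without its lineage is the audit story from the motivation block.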
Common misconception to confront.
Students often think: A good offline metric guarantees a good online result.
Set it straight: Offline metrics use historical data and proxy objectives. Feedback loops, latency, and distribution shift mean online behaviour can differ, so validate with shadow, canary, or A/B.
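Instructor note: the offline-online gap can be made concrete with a small promotion gate. The sketch below is an assumption-laden illustration, not a prescribed procedure: the function names, the nearest-rank percentile, and the two-rule decision are choices made here for clarity, and a real gate would also check sample size and statistical significance.

```python
from __future__ import annotations


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


def canary_decision(
    canary_latency_ms: list[float],
    canary_metric: float,
    baseline_metric: float,
    p95_budget_ms: float,
) -> tuple[str, str]:
    """Return ("promote" | "rollback", reason) from online evidence only."""
    canary_p95 = p95(canary_latency_ms)
    if canary_p95 > p95_budget_ms:
        return "rollback", f"p95 {canary_p95:.0f} ms exceeds budget {p95_budget_ms:.0f} ms"
    if canary_metric < baseline_metric:
        return "rollback", "online metric is below the baseline"
    return "promote", "latency within budget and online metric holds"


if __name__ == "__main__":
    # Hypothetical canary: better online metric, but a slow tail.
    latencies = [40.0] * 90 + [400.0] * 10
    print(canary_decision(latencies, canary_metric=0.83,
                          baseline_metric=0.81, p95_budget_ms=200.0))
```

In the example the canary beats the baseline on its online metric yet is still rolled back, because the slowest tenth of requests pushes p95 past the budget; offline AUC would never have shown this.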
In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.
| 0:00-0:10 | 10 min | Setup & recap
- Open the project's training script and the MLflow UI.
- Recap the reproducibility triple.
|
| 0:10-0:35 | 25 min | Instrument and track
- Instrument the project's training: params, metric curves, artifacts.
- Pin the git SHA and the DVC data version to the run; show the lineage end to end.
- Train twice with different hyperparameters; compare runs in the UI.
|
| 0:35-1:00 | 25 min | Register, promote, roll back
- Register the better run as model v1; write its model card from the run metadata.
- Promote staging to production; then practice the rollback transition.
- Chatbot and document teams: the 'model' may be a prompt + model-tier configuration; track it the same way.
|
| 1:00-1:10 | 10 min | Break |
| 1:10-1:35 | 25 min | Serve it
- Wrap the registered model in a serving runtime behind the project's REST API.
- Health probes, input validation, and warm-up; deploy to the cluster from week 4.
- Load-test; read p95 on the RED dashboard; compare to the recorded baseline.
|
| 1:35-1:50 | 15 min | Canary the model
- Ship model v2 as a canary next to v1, exactly like week 4's service canary.
- Compare per-version predictions and latency; promote or roll back on evidence.
|
| 1:50-2:00 | 10 min | Project-integration brief
- The 'Project integration' card: tracked, registered, served, canaried; the model lifecycle is now demonstrable for Presentation 2.
|
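Instructor note: for teams that want to reason about the register, promote, and roll back steps before touching the real registry, the transitions can be modelled with a toy in-memory class. This is a hypothetical teaching sketch and not the MLflow API; the class name, method names, and audit-log shape are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry with auditable stage transitions."""

    stages: dict[int, str] = field(default_factory=dict)
    audit_log: list[tuple[int, str, str]] = field(default_factory=list)
    production_history: list[int] = field(default_factory=list)

    def _move(self, version: int, stage: str) -> None:
        # Every transition is recorded as (version, from_stage, to_stage).
        previous = self.stages.get(version, "none")
        self.stages[version] = stage
        self.audit_log.append((version, previous, stage))

    def register(self, version: int) -> None:
        self._move(version, "staging")

    def production(self) -> int | None:
        return self.production_history[-1] if self.production_history else None

    def promote(self, version: int) -> None:
        if self.stages.get(version) != "staging":
            raise ValueError(f"version {version} is not in staging")
        current = self.production()
        if current is not None:
            self._move(current, "archived")
        self._move(version, "production")
        self.production_history.append(version)

    def rollback(self) -> int:
        """Restore the previous production version in one call."""
        if len(self.production_history) < 2:
            raise ValueError("no previous production version to roll back to")
        failed = self.production_history.pop()
        restored = self.production_history[-1]
        self._move(failed, "archived")
        self._move(restored, "production")
        return restored


if __name__ == "__main__":
    registry = ToyRegistry()
    registry.register(1)
    registry.promote(1)
    registry.register(2)
    registry.promote(2)
    registry.rollback()
    for entry in registry.audit_log:
        print(entry)
```

Keeping the production history alongside the stage map is what makes rollback a single transition rather than a search through archived versions.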