Week 8 · Part IV · MLOps · 🎤 Student Project Presentation 2 · Interim
Monitoring, Model Drift & Governance
Instructor lesson plan: lecture (2 h) and practice (2 h).
Learning objectives
- Monitor a deployed model and its inputs.
- Distinguish data drift from concept drift and detect each.
- Define retraining triggers and an audit trail.
Tools this week
Evidently · PSI / KS tests · embedding-distance drift · registry stage gates
🎓 Lecture · 2 hours
| 0:00-0:10 | 10 min | Recap & objectives
- Retrieval: the reproducibility triple; shadow versus canary.
- Today: the day-two problem; presentations this afternoon.
|
| 0:10-0:25 | 15 min | Motivation: the model that rotted quietly
- Story: a recalibrated sensor fleet shifts the input distribution; the accuracy drop surfaces six weeks later, after the damage is done.
- Models decay because the world moves; the question is how early you can know.
- Labels are late or absent in production, so accuracy is usually unobservable in real time.
|
| 0:25-0:50 | 25 min | Drift, precisely
- Data drift: P(X) changes; the inputs no longer look like the training data.
- Concept drift: P(Y|X) changes; the world's mapping itself moved.
- Label shift, feature drift, prediction drift: what is observable when.
- Why input drift is the early-warning signal: it needs no labels and precedes the accuracy drop.
- The IoT case on the board: which drift is the recalibrated sensor, and what would you see first?
|
| 0:50-1:10 | 20 min | Detectors and their trade-offs
- Population stability index (PSI) and KS tests on features; thresholds and their false-alarm rates.
- Embedding-distance drift for unstructured inputs (text, images): the detector the chatbot teams need.
- Prediction-distribution monitoring and proxy metrics when labels lag.
- Window choices: reference versus rolling; seasonality returns in week 12.
|
| 1:10-1:20 | 10 min | Break |
| 1:20-1:40 | 20 min | Retraining & governance
- Retraining triggers: schedule, drift threshold, proxy degradation; cost of each false trigger.
- The retraining loop reuses everything: pipeline (week 5), tracking and registry (week 7), canary (weeks 4, 7).
- Human-in-the-loop approval for promotion; the audit trail an auditor will actually ask for.
- Every detector pairs with a documented action, or it is decoration.
|
| 1:40-1:55 | 15 min | Worked example (predict, then run)
- Predict: we shift one feature's distribution by 15%; which detector fires first, PSI or KS?
- Run the simulation; read the detectors; decide, as a class, whether the retraining trigger should fire.
|
| 1:55-2:00 | 5 min | Wrap-up & Student Project Presentation logistics. Final reminders for Student Project Presentation 2: the live CI/CD demo is mandatory; the running order is posted. |
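Sketch for the worked example. A minimal, standard-library-only version of the predict-then-run simulation, under stated assumptions: the "15%" shift is read as a 15% move in the feature's mean (the plan does not say which), the feature is Gaussian, PSI uses ten quantile bins taken from the reference window, and the 0.1 / 0.25 PSI levels mentioned in the comments are the common rule of thumb rather than anything fixed by this course. All function names are illustrative; Evidently, the course tooling, would normally replace these hand-rolled detectors.

```python
import bisect
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index over quantile bins taken from the reference window."""
    ref_sorted = sorted(reference)
    # Interior bin edges at the reference quantiles; the two end bins are open-ended.
    edges = [ref_sorted[len(ref_sorted) * i // bins] for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Clip empty bins so the logarithm below stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_frac, cur_frac = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    gap = 0.0
    for v in ref_sorted + cur_sorted:
        ref_cdf = bisect.bisect_right(ref_sorted, v) / len(ref_sorted)
        cur_cdf = bisect.bisect_right(cur_sorted, v) / len(cur_sorted)
        gap = max(gap, abs(ref_cdf - cur_cdf))
    return gap


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic at level alpha."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))


rng = random.Random(0)
reference = [rng.gauss(10.0, 2.0) for _ in range(2000)]
same = [rng.gauss(10.0, 2.0) for _ in range(2000)]
shifted = [rng.gauss(11.5, 2.0) for _ in range(2000)]  # assumption: mean moved by 15%

critical = ks_critical(len(reference), len(shifted))
for name, window in [("same", same), ("shifted", shifted)]:
    # Rule of thumb (an assumption here): PSI < 0.1 stable, PSI > 0.25 significant shift.
    print(f"{name:8s} PSI={psi(reference, window):.3f}  "
          f"KS={ks_statistic(reference, window):.3f}  (KS critical at 5%: {critical:.3f})")
```

Rerun with smaller shifts or shorter windows before the class votes on whether the trigger should fire; both detectors get noisier as the windows shrink, which is where the false-alarm discussion becomes concrete.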
Common misconception to confront.
Students often think: If accuracy looks fine, there is no drift.
Set it straight: Labels are often delayed, so live accuracy is usually unknown. Input drift can be detected before any accuracy drop and is the real early-warning signal.
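Input drift detection needs no labels, and that holds for unstructured inputs as well. The sketch below illustrates one simple form of embedding-distance drift: the cosine distance between the centroid of a reference window and the centroid of the current window. Everything concrete in it is an assumption for illustration: `fake_embeddings` stands in for a real text or image encoder, the two "topics" and the 40% traffic move are invented, and centroid distance is only one of several possible embedding-distance measures.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def fake_embeddings(rng, n, center, noise=0.5):
    """Hypothetical stand-in for an encoder: points scattered around a topic center."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]


rng = random.Random(0)
dim = 16
topic_a = [1.0] + [0.0] * (dim - 1)
topic_b = [0.0, 1.0] + [0.0] * (dim - 2)

reference = fake_embeddings(rng, 500, topic_a)
same = fake_embeddings(rng, 500, topic_a)
# Current window in which 40% of the traffic has moved to a new topic.
mixed = fake_embeddings(rng, 300, topic_a) + fake_embeddings(rng, 200, topic_b)

d_same = cosine_distance(centroid(reference), centroid(same))
d_mixed = cosine_distance(centroid(reference), centroid(mixed))
print(f"same-topic window: {d_same:.4f}   shifted window: {d_mixed:.4f}")
```

The distance between two same-topic windows gives a feel for the noise floor; a threshold would sit above it, and no label is needed at any point.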
Check for understanding (pose during the concept blocks; let students answer before revealing).
What is the difference between data drift and concept drift?
Data drift is a change in the input distribution; concept drift is a change in the input-to-output relationship itself.
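A toy simulation can make the distinction concrete. The sketch assumes a one-feature problem, a frozen model that learned the rule "y = 1 when x > 0", and invented shift sizes; none of these come from the lecture. It prints the input mean (observable without labels) and the accuracy (observable only once labels arrive) for a data-drift window and a concept-drift window.

```python
import random
import statistics


def model(x):
    """Frozen 'deployed' model: it learned the rule y = 1 when x > 0."""
    return 1 if x > 0.0 else 0


def make_window(rng, n, x_mean, boundary):
    """Draw n inputs from N(x_mean, 1) and label them with the true rule x > boundary."""
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [1 if x > boundary else 0 for x in xs]
    return xs, ys


def accuracy(xs, ys):
    """Share of inputs on which the frozen model agrees with the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train = make_window(rng, 5000, x_mean=0.0, boundary=0.0)
data_drift = make_window(rng, 5000, x_mean=1.0, boundary=0.0)     # P(X) moved
concept_drift = make_window(rng, 5000, x_mean=0.0, boundary=0.5)  # P(Y|X) moved

for name, (xs, ys) in [("train", train), ("data drift", data_drift),
                       ("concept drift", concept_drift)]:
    print(f"{name:14s} input mean={statistics.fmean(xs):+.2f}  "
          f"accuracy={accuracy(xs, ys):.2f}")
```

In the data-drift window the input mean moves while accuracy holds, but only because this toy model happens to be correct everywhere; a real model extrapolating into unfamiliar inputs would usually degrade. In the concept-drift window the inputs look unchanged and only the labels reveal the problem, which is why pure concept drift cannot be caught by input monitors alone.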
What do you monitor when ground-truth labels arrive weeks late?
Input and feature distributions and the prediction distribution as proxies, alongside business KPIs.
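One way to show the prediction-distribution proxy in class is a two-proportion z-score on the share of positive predictions in a reference window versus the current window. This is a sketch under assumptions: a binary classifier, simulated prediction logs, invented positive rates of 20% and 30%, and an alert level of |z| > 3 picked only for illustration.

```python
import math
import random


def positive_rate_z(reference_preds, current_preds):
    """Two-proportion z-score for the change in positive-prediction rate between windows."""
    n_ref, n_cur = len(reference_preds), len(current_preds)
    p_ref = sum(reference_preds) / n_ref
    p_cur = sum(current_preds) / n_cur
    pooled = (sum(reference_preds) + sum(current_preds)) / (n_ref + n_cur)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_ref + 1.0 / n_cur))
    z = (p_cur - p_ref) / se if se > 0 else 0.0
    return p_ref, p_cur, z


rng = random.Random(0)
# Hypothetical binary predictions logged by the served model; no labels are involved.
reference_preds = [1 if rng.random() < 0.20 else 0 for _ in range(3000)]
current_preds = [1 if rng.random() < 0.30 else 0 for _ in range(3000)]

p_ref, p_cur, z = positive_rate_z(reference_preds, current_preds)
ALERT_Z = 3.0  # assumption: alert level chosen for illustration only
print(f"positive rate {p_ref:.3f} -> {p_cur:.3f}, z={z:.1f}, alert={abs(z) > ALERT_Z}")
```

A shift in the prediction distribution does not say whether the model is wrong, only that its outputs have changed; it earns a place on the dashboard because it is available immediately while labels lag.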
Key takeaways.
- Models decay; monitoring is a day-two necessity, not optional.
- Input drift is detectable before accuracy drops.
- A detector without an attached action is just noise.
Common pitfalls to pre-empt.
- A drift alarm with no playbook is noise; attach an action to every detector.
- Watching only accuracy misses drift while labels are delayed.
📚 Reading & resources
- Designing Machine Learning Systems, ch. 8 to 9 (Huyen): data distribution shifts, monitoring, and continual learning.
- Evidently documentation: drift reports and monitors; the practice tooling.
- Reliable Machine Learning, the monitoring chapters (Chen, Murphy, Parisa, Sculley and Underwood): SRE principles applied to ML in production.
- Introducing MLOps, the governance chapters (Treveil et al.): approval workflows and audit trails for model changes.
🎤 Student Project Presentation · 2 hours
The full two-hour practice slot is given over to student project presentations (Student Project Presentation 2 · Interim). There is no instructor-prepared material: teams present and defend their work to the class, with peer and instructor questions after each talk. Each team has 12 to 15 minutes plus questions, and submits a short written report and a tagged release of the repository.
What each team presents.
- Working data pipeline through bronze and silver, with validation gates and the data contract in force.
- Experiment tracking and a versioned model (or tracked prompt configuration) in the registry.
- The model served behind the project's REST API, with the canary path and the RED dashboard shown live.
- CI/CD demonstrated live: a change lands through the gated pipeline during the talk.
- Monitoring and drift plan for the second half.
See the running-project brief for the full milestone description and the grading weight.
Project integration (this week)
- Apply the feedback from Presentation 2.
- Stand up the input-drift monitor on the project's served model (PSI or embedding distance, per use case).
- Write the retraining-trigger policy: which signal, which threshold, which action, who approves.
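The four parts of the retraining-trigger policy (signal, threshold, action, approver) can be written down as data, which also gives the audit trail a natural record format. The sketch below is one possible shape, not a prescribed one: `TriggerPolicy`, `evaluate`, the field names, and the example values (`psi:sensor_temp`, 0.25, `ml-lead`) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TriggerPolicy:
    signal: str       # which signal is watched
    threshold: float  # value at or above which the policy fires
    action: str       # documented action attached to the detector
    approver: str     # who signs off before a retrained model is promoted


def evaluate(policy, observed, audit_log):
    """Compare an observed signal value with the policy and append an audit record."""
    fired = observed >= policy.threshold
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "signal": policy.signal,
        "observed": observed,
        "threshold": policy.threshold,
        "fired": fired,
        "action": policy.action if fired else "none",
        "approver": policy.approver if fired else None,
    })
    return fired


# Hypothetical policy for one monitored feature.
policy = TriggerPolicy(
    signal="psi:sensor_temp",
    threshold=0.25,
    action="open retraining ticket and run the pipeline on fresh data",
    approver="ml-lead",
)
audit_log = []
evaluate(policy, 0.05, audit_log)  # quiet week: recorded, nothing fires
evaluate(policy, 0.31, audit_log)  # drift above threshold: action and approver recorded
for record in audit_log:
    print(record)
```

Recording the quiet checks as well as the firings means the log shows that the detector was running, not just when it raised an alarm.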