Week 8 · Part IV · MLOps · 🎤 Student Project Presentation 2 · Interim

Monitoring, Model Drift & Governance

Instructor lesson plan: lecture (2 h) and practice (2 h).

Learning objectives

Monitor a deployed model and its inputs.
Distinguish data drift from concept drift and detect each.
Define retraining triggers and an audit trail.

Tools this week

Evidently · PSI / KS tests · embedding-distance drift · registry stage gates

🎓Lecture · 2 hours

0:00-0:10	10 min	Recap & objectives Retrieval: the reproducibility triple; shadow versus canary. Today: the day-two problem; presentations this afternoon.
0:10-0:25	15 min	Motivation: the model that rotted quietly Story: a recalibrated sensor fleet shifts the input distribution; accuracy follows six weeks later, after the damage. Models decay because the world moves; the question is how early you can know. Labels are late or absent in production, so accuracy is usually unobservable in real time.
0:25-0:50	25 min	Drift, precisely Data drift: P(X) changes; the inputs no longer look like training. Concept drift: P(Y|X) changes; the world's mapping itself moved. Label shift, feature drift, prediction drift: what is observable when. Why input drift is the early-warning signal: it needs no labels and precedes the accuracy drop. The IoT case on the board: which drift is the recalibrated sensor, and what would you see first?
0:50-1:10	20 min	Detectors and their trade-offs Population stability index and KS tests on features; thresholds and their false-alarm rates. Embedding-distance drift for unstructured inputs (text, images): the detector the chatbot teams need. Prediction-distribution monitoring and proxy metrics when labels lag. Window choices: reference versus rolling; seasonality returns in week 12.
1:10-1:20	10 min	Break
1:20-1:40	20 min	Retraining & governance Retraining triggers: schedule, drift threshold, proxy degradation; cost of each false trigger. The retraining loop reuses everything: pipeline (week 5), tracking and registry (week 7), canary (weeks 4, 7). Human-in-the-loop approval for promotion; the audit trail an auditor will actually ask for. Every detector pairs with a documented action, or it is decoration.
1:40-1:55	15 min	Worked example (predict, then run) Predict: we shift one feature's distribution by 15%; which detector fires first, PSI or KS? Run the simulation; read the detectors; decide, as a class, whether the retraining trigger should fire.
1:55-2:00	5 min	Wrap-up & Student-Project-Presentation logistics Final reminders for Student Project Presentation 2: the live CI/CD demo is mandatory; running order posted.
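
Instructor sketch for the worked example (1:40-1:55). This is one possible way to run the PSI-versus-KS simulation with the standard library only; it is not the course's official notebook. Assumptions not in the plan: the feature is Gaussian with mean 10 and standard deviation 2, "shift by 15%" is read as a 15% increase in the mean, PSI uses ten equal-frequency bins cut on the reference sample, and the KS threshold is the usual large-sample critical value at alpha = 0.05.

```python
import math
import random
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population stability index: sum over bins of (c - r) * ln(c / r).

    Bins are equal-frequency, with edges taken from the reference sample.
    """
    ref_sorted = sorted(reference)
    edges = [ref_sorted[len(ref_sorted) * k // n_bins] for k in range(1, n_bins)]

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(sample), eps) for c in counts]

    r, c = bin_shares(reference), bin_shares(current)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


# Assumed setup: reference window versus a window whose mean moved by 15%.
rng = random.Random(0)
reference = [rng.gauss(10.0, 2.0) for _ in range(5000)]
shifted = [rng.gauss(10.0 * 1.15, 2.0) for _ in range(5000)]

print(f"PSI  = {psi(reference, shifted):.3f}")
print(f"KS D = {ks_statistic(reference, shifted):.3f} "
      f"(critical value {ks_critical(len(reference), len(shifted)):.3f})")
```

Have the class vary the sample size and the size of the shift before deciding which detector "fires first": the KS critical value shrinks as windows grow, while PSI is usually judged against fixed rule-of-thumb cut-offs (0.1 and 0.25 are commonly quoted), so the answer depends on the thresholds chosen, not only on the data.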

Common misconception to confront.

Students often think: If accuracy looks fine, there is no drift.
Set it straight: Labels are often delayed, so live accuracy is usually unknown. Input drift can be detected before any accuracy drop and is the real early-warning signal.
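
A label-free check of this kind for unstructured inputs can be sketched as below. It is a minimal illustration of embedding-distance drift, assuming the embeddings are already computed by whatever encoder the project uses; the function names and the choice of cosine distance between window centroids are assumptions for the sketch, not a prescribed method.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def embedding_drift(reference_embeddings, current_embeddings):
    """Distance between the centroid of the reference window and the current one."""
    return cosine_distance(centroid(reference_embeddings),
                           centroid(current_embeddings))


# Toy two-dimensional "embeddings": the current window points somewhere else.
reference_window = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
current_window = [[0.2, 1.0], [0.1, 0.9], [0.0, 1.0]]
print(f"centroid cosine distance = {embedding_drift(reference_window, current_window):.3f}")
```

A centroid comparison is cheap but blind to shifts that leave the mean in place, and the alert threshold has to be calibrated on drift-free windows from the same encoder; both points are worth raising with the chatbot teams.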

Check for understanding (pose during the concept blocks; let students answer before revealing).

What is the difference between data drift and concept drift?

Data drift is a change in the input distribution; concept drift is a change in the input-to-output relationship itself.
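
For the board, the same answer in symbols. The subscripts "train" and "prod" are notation introduced here for the training-time and production distributions; the data-drift line describes the pure case in which the mapping stays fixed.

```latex
\begin{align*}
P(X, Y) &= P(Y \mid X)\, P(X) \\
\text{data drift:}\quad
  & P_{\text{prod}}(X) \neq P_{\text{train}}(X),
  \qquad P_{\text{prod}}(Y \mid X) = P_{\text{train}}(Y \mid X) \\
\text{concept drift:}\quad
  & P_{\text{prod}}(Y \mid X) \neq P_{\text{train}}(Y \mid X)
\end{align*}
```

Only the first factor needs inputs alone to estimate; checking the second requires labels, which is why it is observed late.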

What do you monitor when ground-truth labels arrive weeks late?

Input and feature distributions and the prediction distribution as proxies, alongside business KPIs.
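
One way to make the prediction-distribution proxy concrete is sketched below: compare the share of each predicted class in a rolling window against a reference window. The class name, window size, threshold and the use of total variation distance are all assumptions chosen for illustration.

```python
from collections import Counter, deque


def shares(labels):
    """Fraction of the sample taken by each predicted class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Half the summed absolute difference between two class-share dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


class PredictionDriftMonitor:
    """Hypothetical rolling monitor over predicted classes; needs no labels."""

    def __init__(self, reference_predictions, window_size=500, threshold=0.1):
        self.reference = shares(reference_predictions)
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def observe(self, prediction):
        """Record one prediction; return (distance, alert) once the window is full."""
        self.window.append(prediction)
        if len(self.window) < self.window.maxlen:
            return None
        distance = total_variation(self.reference, shares(self.window))
        return distance, distance > self.threshold


# Reference window: 10% "fault" predictions. Live traffic: 30% "fault".
reference = ["ok"] * 90 + ["fault"] * 10
monitor = PredictionDriftMonitor(reference, window_size=100, threshold=0.1)
result = None
for prediction in ["ok"] * 70 + ["fault"] * 30:
    result = monitor.observe(prediction)
print(result)
```

A shift in predicted-class shares does not say whether the model is wrong, only that its outputs no longer look like the reference period; it is a prompt to investigate, to be read together with the input monitors and business KPIs.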

Key takeaways.

Models decay; monitoring is a day-two necessity, not optional.
Input drift is detectable before accuracy drops.
A detector without an attached action is just noise.

Common pitfalls to pre-empt.

A drift alarm with no playbook is noise; attach an action to every detector.
Watching only accuracy misses drift while labels are delayed.

📚Reading & resources

Designing Machine Learning Systems, ch. 8 to 9. Huyen; data distribution shifts, monitoring, and continual learning.
Evidently documentation. Drift reports and monitors; the practice tooling.
Reliable Machine Learning, the monitoring chapters. Chen, Murphy, Parisa, Sculley and Underwood; SRE principles applied to ML in production.
Introducing MLOps, the governance chapters. Treveil et al.; approval workflows and audit trails for model changes.

🎤Student Project Presentation · 2 hours

The full two-hour practice slot is given over to student project presentations (Student Project Presentation 2 · Interim). There is no instructor-prepared material: teams present and defend their work to the class, with peer and instructor questions after each talk. Each team has 12 to 15 minutes plus questions, and submits a short written report and a tagged release of the repository.

What each team presents.

Working data pipeline through bronze and silver, with validation gates and the data contract in force.
Experiment tracking and a versioned model (or tracked prompt configuration) in the registry.
The model served behind the project's REST API, with the canary path and the RED dashboard shown live.
CI/CD demonstrated live: a change lands through the gated pipeline during the talk.
Monitoring and drift plan for the second half.

See the running-project brief for the full milestone description and the grading weight.

Project integration (this week)

Apply the feedback from Presentation 2.
Stand up the input-drift monitor on the project's served model (PSI or embedding distance, per use case).
Write the retraining-trigger policy: which signal, which threshold, which action, who approves.
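
Teams that want the trigger policy in machine-readable form could start from something like the sketch below. It only illustrates the four fields named above (signal, threshold, action, approver) plus an audit record per evaluation; every name, threshold and role in it is hypothetical, and a real project would persist the records rather than print them.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TriggerRule:
    signal: str       # name of the monitored metric
    threshold: float  # the rule fires when the reading exceeds this value
    action: str       # documented action attached to the detector
    approver: str     # role that must sign off before promotion


def evaluate(rules, readings, now=None):
    """Return one audit record per rule that has a reading, fired or not."""
    now = now or datetime.now(timezone.utc)
    records = []
    for rule in rules:
        value = readings.get(rule.signal)
        if value is None:
            continue  # no reading for this signal in this evaluation cycle
        records.append({
            "timestamp": now.isoformat(),
            **asdict(rule),
            "value": value,
            "fired": value > rule.threshold,
        })
    return records


# Hypothetical policy and readings, for illustration only.
policy = [
    TriggerRule("psi_temperature", 0.25, "open retraining ticket", "ml-lead"),
    TriggerRule("embedding_distance", 0.15, "page on-call, review samples", "ml-lead"),
]
for record in evaluate(policy, {"psi_temperature": 0.31, "embedding_distance": 0.04}):
    print(json.dumps(record))
```

Logging the rules that did not fire alongside those that did gives the audit trail both halves of the story: what was checked and what was acted on.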
