Engineering of AI Systems · HIT

Week 8   Part IV · MLOps   🎤 Student Project Presentation 2 · Interim

Monitoring, Model Drift & Governance

Instructor lesson plan: lecture (2 h) and practice (2 h).

Learning objectives

Tools this week

Evidently · PSI / KS tests · embedding-distance drift · registry stage gates

🎓 Lecture · 2 hours

0:00-0:10 · 10 min · Recap & objectives
  • Retrieval: the reproducibility triple; shadow versus canary.
  • Today: the day-two problem; presentations this afternoon.
0:10-0:25 · 15 min · Motivation: the model that rotted quietly
  • Story: a recalibrated sensor fleet shifts the input distribution; accuracy follows six weeks later, after the damage.
  • Models decay because the world moves; the question is how early you can know.
  • Labels are late or absent in production, so accuracy is usually unobservable in real time.
0:25-0:50 · 25 min · Drift, precisely
  • Data drift: P(X) changes; the inputs no longer look like training.
  • Concept drift: P(Y|X) changes; the world's mapping itself moved.
  • Label shift, feature drift, prediction drift: what is observable when.
  • Why input drift is the early-warning signal: it needs no labels and precedes the accuracy drop.
  • The IoT case on the board: which drift is the recalibrated sensor, and what would you see first?
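
A minimal sketch of the two definitions above, on synthetic data. Everything here is an illustrative assumption: the one-dimensional feature, the threshold-style labelling rule, and the names `make_batch` and `frozen_model` are hypothetical and not part of the course materials. It contrasts a batch where only P(X) moves with a batch where only P(Y|X) moves, and prints what is visible without labels (the input mean) next to what needs labels (accuracy).

```python
import random
import statistics


def make_batch(n, x_mean, boundary, seed):
    """Draw n (x, y) pairs: x ~ Normal(x_mean, 1); y = 1 when x exceeds `boundary`."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [int(x > boundary) for x in xs]
    return xs, ys


def frozen_model(x):
    """Stand-in for a deployed model that learned the training-time boundary at 0."""
    return int(x > 0.0)


def summarize(name, batch):
    xs, ys = batch
    input_mean = statistics.fmean(xs)  # observable without labels
    accuracy = statistics.fmean(       # observable only once labels arrive
        [float(frozen_model(x) == y) for x, y in zip(xs, ys)]
    )
    print(f"{name:14s} input mean={input_mean:+.2f}  accuracy={accuracy:.2f}")
    return input_mean, accuracy


# Training-like data: inputs centred at 0, true boundary at 0.
train_stats = summarize("training", make_batch(5000, 0.0, 0.0, seed=1))
# Data drift: P(X) moves (inputs centred at 1), the labelling rule is unchanged.
data_stats = summarize("data drift", make_batch(5000, 1.0, 0.0, seed=2))
# Concept drift: inputs look the same, but the labelling rule moved to 0.5.
concept_stats = summarize("concept drift", make_batch(5000, 0.0, 0.5, seed=3))
```

In this toy setup the frozen model stays accurate under data drift only because it happens to match the true rule everywhere; a real model usually degrades once inputs leave the training range. The point of the sketch is narrower: the data-drift batch is detectable from inputs alone, while the concept-drift batch looks unchanged until labels arrive.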
0:50-1:10 · 20 min · Detectors and their trade-offs
  • Population stability index and KS tests on features; thresholds and their false-alarm rates.
  • Embedding-distance drift for unstructured inputs (text, images): the detector the chatbot teams need.
  • Prediction-distribution monitoring and proxy metrics when labels lag.
  • Window choices: reference versus rolling; seasonality returns in week 12.
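
One way to sketch embedding-distance drift, together with a threshold tied to a target false-alarm rate. This is an illustration under stated assumptions, not a reference implementation: `fake_embeddings` is a hypothetical stand-in for a real text or image encoder, the drift score is simply the cosine distance between the mean embedding of the reference set and that of the current window, and the threshold is calibrated by scoring random subsamples of the reference against itself.

```python
import math
import random


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def drift_score(reference_mean, window):
    """Cosine distance between the reference mean and the window's mean embedding."""
    return cosine_distance(reference_mean, mean_vector(window))


def calibrate_threshold(reference, window_size, false_alarm_rate, rng, trials=200):
    """Score no-drift subsamples of the reference and return a high quantile.

    Subsamples overlap with the reference, so the threshold is slightly
    optimistic; a held-out reference split would be stricter.
    """
    ref_mean = mean_vector(reference)
    scores = sorted(
        drift_score(ref_mean, rng.sample(reference, window_size))
        for _ in range(trials)
    )
    index = min(trials - 1, int((1.0 - false_alarm_rate) * trials))
    return scores[index]


def fake_embeddings(n, center, rng):
    """Hypothetical encoder output: Gaussian noise around a topic centre."""
    return [[c + rng.gauss(0.0, 1.0) for c in center] for _ in range(n)]


rng = random.Random(0)
old_topic = [1.0] * 16
new_topic = [1.0] * 8 + [-1.0] * 8  # half the dimensions flip sign

reference = fake_embeddings(1000, old_topic, rng)
same_window = fake_embeddings(200, old_topic, rng)
shifted_window = fake_embeddings(200, new_topic, rng)

reference_mean = mean_vector(reference)
threshold = calibrate_threshold(reference, 200, 0.05, rng)
same_score = drift_score(reference_mean, same_window)
shifted_score = drift_score(reference_mean, shifted_window)
print(f"threshold={threshold:.4f} same={same_score:.4f} shifted={shifted_score:.4f}")
```

Comparing mean embeddings is the simplest choice and misses drift that changes the spread but not the centre; it is used here only to make the threshold and false-alarm trade-off concrete.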
1:10-1:20 · 10 min · Break
1:20-1:40 · 20 min · Retraining & governance
  • Retraining triggers: schedule, drift threshold, proxy degradation; cost of each false trigger.
  • The retraining loop reuses everything: pipeline (week 5), tracking and registry (week 7), canary (weeks 4, 7).
  • Human-in-the-loop approval for promotion; the audit trail an auditor will actually ask for.
  • Every detector pairs with a documented action, or it is decoration.
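
The pairing of detectors with documented actions, plus the human approval gate, could be sketched as a small policy table. All names and numbers below are hypothetical (`Trigger`, `AuditEntry`, `POLICY`, the thresholds, the reviewer ID); the PSI threshold of 0.25 is a common rule of thumb rather than a value from this course. The sketch only illustrates the shape: each trigger carries its action, each firing leaves an audit entry, and promotion is blocked until a person signs off.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    name: str         # signal being watched
    threshold: float  # fires when the observed value is at or above this
    action: str       # the documented response; a trigger without one is decoration


@dataclass
class AuditEntry:
    timestamp: str
    trigger: str
    observed: float
    action: str
    approved_by: Optional[str] = None  # filled in by a human reviewer


# Hypothetical policy covering the three trigger types: drift, proxy, schedule.
POLICY = [
    Trigger("psi_feature_drift", 0.25, "open retraining ticket"),
    Trigger("proxy_metric_drop", 0.10, "retrain candidate and route to canary"),
    Trigger("days_since_last_train", 90, "scheduled retrain"),
]


def evaluate(policy, observations):
    """Return one audit entry per trigger whose observed value meets its threshold."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        AuditEntry(now, t.name, observations[t.name], t.action)
        for t in policy
        if observations.get(t.name, 0.0) >= t.threshold
    ]


def approve(entry, reviewer):
    entry.approved_by = reviewer
    return entry


def can_promote(entries):
    """Promotion needs at least one fired trigger and a reviewer on every entry."""
    return bool(entries) and all(e.approved_by for e in entries)


observations = {
    "psi_feature_drift": 0.31,
    "proxy_metric_drop": 0.02,
    "days_since_last_train": 12,
}
entries = evaluate(POLICY, observations)
blocked = can_promote(entries)        # nobody has approved yet
approve(entries[0], "reviewer-42")    # hypothetical reviewer ID
allowed = can_promote(entries)
print(entries[0].trigger, blocked, allowed)
```

In practice the audit entries would be persisted alongside the registry stage transition so that the approval and the model version can be traced together.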
1:40-1:55 · 15 min · Worked example (predict, then run)
  • Predict: we shift one feature's distribution by 15%; which detector fires first, PSI or KS?
  • Run the simulation; read the detectors; decide, as a class, whether the retraining trigger should fire.
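
A possible shape for the simulation, using only the standard library. Several things here are assumptions rather than the course's own setup: the feature is drawn from Normal(10, 2), "shift by 15%" is read as multiplying every value by 1.15, PSI uses ten bins cut at reference quantiles with the customary 0.1 / 0.25 rule-of-thumb levels, and KS is compared against the large-sample critical value at alpha = 0.05. The helper names (`psi`, `ks_statistic`, `ks_critical`, `run`) are hypothetical.

```python
import bisect
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index with bin edges at reference quantiles."""
    ref_sorted = sorted(reference)
    edges = [ref_sorted[len(ref_sorted) * k // bins] for k in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # eps keeps the logarithm finite when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(reference), fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def run(shift, n=2000, seed=0):
    """Compare a reference sample with a fresh sample scaled by (1 + shift)."""
    rng = random.Random(seed)
    reference = [rng.gauss(10.0, 2.0) for _ in range(n)]
    current = [rng.gauss(10.0, 2.0) * (1.0 + shift) for _ in range(n)]
    return psi(reference, current), ks_statistic(reference, current), ks_critical(n, n)


results = {}
for shift in (0.0, 0.05, 0.15):
    psi_value, ks_value, critical = run(shift)
    results[shift] = (psi_value, ks_value, critical)
    print(
        f"shift={shift:.0%}  PSI={psi_value:.3f} (alarm at 0.25)  "
        f"KS={ks_value:.3f} (critical {critical:.3f})"
    )
```

Sweeping a few shift sizes rather than running 15% alone lets the class see where each detector crosses its threshold, which is the basis for the discussion about whether the retraining trigger should fire. Results depend on the sample size and on the assumed distribution, so the numbers are specific to this setup.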
1:55-2:00 · 5 min · Wrap-up & Student-Project-Presentation logistics
  • Final reminders for Student Project Presentation 2: the live CI/CD demo is mandatory; running order posted.
Common misconception to confront.

Students often think: If accuracy looks fine, there is no drift.
Set it straight: Labels are often delayed, so live accuracy is usually unknown. Input drift can be detected before any accuracy drop and is the real early-warning signal.

Check for understanding (pose during the concept blocks; let students answer before revealing).
What is the difference between data drift and concept drift?
Data drift is a change in the input distribution; concept drift is a change in the input-to-output relationship itself.
What do you monitor when ground-truth labels arrive weeks late?
Input and feature distributions and the prediction distribution as proxies, alongside business KPIs.
Key takeaways.
Common pitfalls to pre-empt.

📚 Reading & resources

🎤 Student Project Presentation · 2 hours

The full two-hour practice slot is given over to student project presentations (Student Project Presentation 2 · Interim). There is no instructor-prepared material: teams present and defend their work to the class, with peer and instructor questions after each talk. Each team has 12 to 15 minutes plus questions, and submits a short written report and a tagged release of the repository.

What each team presents.

See the running-project brief for the full milestone description and the grading weight.

Project integration (this week)

Curated references · Project brief
