| 0:00-0:10 | 10 min | Welcome & objectives
- Course mechanics: 2 h lecture + 2 h practice, one running project, three presentations (weeks 5, 8, 13), no written exams.
- What this course is: the engineering around the model, not the model itself.
- This week's objectives on the board.
|
| 0:10-0:25 | 15 min | Motivation: the prototype-to-production gap
- A walk through a real incident: an accurate model, a stale feature pipeline, a silent 40% error rate for three weeks.
- The 90/10 inversion: the model is a small box in a large systems diagram.
- Discussion prompt: what could have caught this, and at which layer?
|
| 0:25-0:50 | 25 min | Reliability as a contract: SLIs, SLOs, error budgets
- SLI: a measured indicator (request latency, error rate, freshness).
- SLO: the target on that indicator; SLA: the contract with consequences.
- Error budgets: a quantified allowance of unreliability to spend on change.
- Board work: turn 'the chatbot should be fast and reliable' into two SLOs with numbers.
- The toil concept and why automation is an SRE obligation, not a luxury.
|
| 0:50-1:10 | 20 min | What 'production-ready' means
- Availability, latency (p95, not average), cost, recoverability, observability.
- Day-one versus day-two: shipping is the start, operating is the job.
- The on-call loop: detect, triage, mitigate, learn (blameless postmortems).
- Quick poll: which property does your favourite app most visibly fail on?
|
| 1:10-1:20 | 10 min | Break |
| 1:20-1:40 | 20 min | The five operational layers
- DevOps: code to running service (CI/CD, containers, observability).
- DataOps: trustworthy data (pipelines, quality, versioning).
- MLOps: the model lifecycle (tracking, registry, serving, drift).
- LLMOps: operating models you did not train (APIs, RAG, evaluation, cost).
- AgentOps: operating loops that act (tools, tracing, bounds).
- The stacking argument: each layer inherits the guarantees, and the debts, of those below.
|
| 1:40-1:55 | 15 min | The course use cases & the running project
- IoT telemetry: sensor streams, anomaly alerts, predictive maintenance.
- Document-QA chatbot: RAG over a real corpus, grounded answers with citations.
- Document processing: structured extraction from invoices and forms at scale.
- How one system will thread all five layers across thirteen weeks; what each Student Project Presentation must show.
|
| 1:55-2:00 | 5 min | Wrap-up & practice preview
Revisit the checks below; the practice session sets up Git, Docker, and the team repositories. |
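The error-budget idea from the reliability segment comes down to a little arithmetic. The sketch below is one possible way to express it, not a prescribed course tool: the names `ErrorBudget` and `allowed_downtime_minutes`, the 30-day window, and the sample numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget implied by a request-based SLO (hypothetical helper)."""

    slo_target: float     # e.g. 0.99 means 99% of requests must be good
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        if self.allowed_failures == 0:
            return 0.0
        return 1.0 - self.failed_requests / self.allowed_failures


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full outage a time-based availability SLO tolerates."""
    return (1.0 - slo_target) * window_days * 24 * 60


# Illustrative numbers only: a 99% SLO over 100,000 requests with 250 failures.
budget = ErrorBudget(slo_target=0.99, total_requests=100_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget remaining: {budget.remaining_fraction:.0%}")
print(f"99.9% over 30 days allows {allowed_downtime_minutes(0.999):.1f} min of downtime")
```

A team that has spent most of its budget has a quantified reason to slow down risky changes. A team with budget to spare has room to ship.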
Common misconception to confront.
Students often think: If the model is accurate, the system is done.
Set it straight: Accuracy is one property. Availability, latency, cost, data freshness, and recoverability are separate properties that usually dominate production outcomes.
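Latency is one of those separate properties, and the lecture reports it as p95 rather than as an average. A minimal sketch of why follows. The latency values and the 500 ms target are invented for illustration, and `percentile` is a hypothetical helper using the nearest-rank definition.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 90 fast requests, 10 slow ones.
latencies_ms = [100] * 90 + [2000] * 10

SLO_P95_MS = 500  # hypothetical target, not a course-mandated number

mean_ms = statistics.mean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)

# The average looks healthy while one request in ten takes two seconds.
print(f"mean = {mean_ms} ms, within target: {mean_ms <= SLO_P95_MS}")
print(f"p95  = {p95_ms} ms, within target: {p95_ms <= SLO_P95_MS}")
```

With these numbers the mean sits below the target while the p95 is well above it, so an SLO written on the average would hide the slow tail.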
In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.
| 0:00-0:10 | 10 min | Setup & recap
- Verify Git, Docker, and an editor on every machine; fix stragglers now.
- Recap: SLOs, error budgets, the five layers, the three use cases.
|
| 0:10-0:35 | 25 min | Git for teams, done properly
- Branch, commit, push, pull request, review, merge: the full loop live.
- Branch protection and required reviews on the shared repository.
- Repository layout for a service: src, tests, infra, docs; .gitignore and secrets hygiene from day one.
|
| 0:35-1:00 | 25 min | Containers from zero
- Write a Dockerfile for a minimal 'hello-service' line by line.
- Build, run, stop, and inspect; read the logs; map the port.
- Pin the base image and dependency versions; rebuild and show identical behaviour.
|
| 1:00-1:10 | 10 min | Break |
| 1:10-1:35 | 25 min | Team formation & project kickoff
- Form teams of three or four; create each team's repository from the course template.
- Walk the template: service skeleton, infra folder, README contract.
- Each team opens its first pull request (the README) and merges it through review.
|
| 1:35-1:50 | 15 min | Students drive
- Each team containerises the template service and runs it locally.
- Instructor circulates; common failures (port clashes, missing pins) fixed live.
|
| 1:50-2:00 | 10 min | Project-integration brief
Walk the 'Project integration' card below: what each team adds to its system before next week and how it builds toward Presentation 1. |
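The 'missing pins' failure from the container segment can be checked mechanically. The sketch below is one possible way to do it and is not part of the course template: it assumes pip-style requirements files, and both function names are hypothetical. The Dockerfile check is deliberately simple. It would also flag a later `FROM` that refers to an earlier build stage by name.

```python
import re


def unpinned_requirements(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            problems.append(line)
    return problems


def unpinned_base_images(dockerfile_text):
    """Return FROM images that have no tag or digest, or use the floating 'latest' tag."""
    problems = []
    pattern = re.compile(r"\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
    for raw in dockerfile_text.splitlines():
        match = pattern.match(raw)
        if not match:
            continue
        image = match.group(1)
        if image.lower() == "scratch" or "@sha256:" in image:
            continue  # the empty base image and digest-pinned images are fine
        name_and_tag = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
        tag = name_and_tag.split(":", 1)[1] if ":" in name_and_tag else ""
        if tag in ("", "latest"):
            problems.append(image)
    return problems


# Illustrative inputs only; package names and versions are placeholders.
sample_requirements = "flask==3.0.3\nrequests>=2.0\n# dev tools\nnumpy\n"
sample_dockerfile = "FROM python:3.12-slim\nFROM python\nFROM node:latest\n"

print(unpinned_requirements(sample_requirements))
print(unpinned_base_images(sample_dockerfile))
```

Run against a team's repository before a rebuild, a check like this shows which dependencies could change between two builds of the same commit.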