| 0:00-0:10 | 10 min | Recap & objectives
- Retrieval: the testing pyramid; why version an API.
- Today: from one container to a managed fleet, and seeing what it does.
|
| 0:10-0:25 | 15 min | Motivation: 3 a.m. and the pod is gone
- One container is easy; forty containers with deploys, crashes, and traffic spikes are why orchestration exists.
- Story: the deploy that took down checkout because all instances restarted at once; the rollout pattern that would have saved it.
- You cannot operate what you cannot see: the observability half of today.
|
| 0:25-0:50 | 25 min | Kubernetes: the mental model
- Desired state and reconciliation: you declare, the control loop converges.
- Pods, Deployments, Services; labels and selectors as the glue.
- Liveness and readiness probes: the /health endpoint from week 3 finds its purpose.
- Resource requests and limits; horizontal autoscaling.
- Board work: trace a request from DNS through the Service to a pod, and what happens when that pod dies.
|
| 0:50-1:10 | 20 min | Deployment patterns & GitOps
- Rolling update: the default, and its window of mixed versions.
- Blue-green: two full environments, one switch, instant rollback, double cost.
- Canary: a small slice of real traffic watched closely, then ramp; the pattern the project will use for models too.
- GitOps in one picture: the cluster state lives in Git, an agent (Argo CD) reconciles it; the audit log is the repo history.
|
| 1:10-1:20 | 10 min | Break |
| 1:20-1:40 | 20 min | Observability: logs, metrics, traces
- The three pillars and the question each answers: what happened, how much, and where in the chain.
- Monitoring answers known questions; observability lets you ask new ones without shipping code.
- Structured logs (JSON) versus grep archaeology; high-cardinality labels.
- The RED method: rate, errors, duration, per service; USE for resources.
|
| 1:40-1:55 | 15 min | Tail latency (predict, then run)
- Predict: average latency is 80 ms; what is p99, and who experiences it?
- Run a live load test; watch p50 stay flat while p99 explodes under saturation.
- Why SLOs are written on percentiles, never averages.
|
| 1:55-2:00 | 5 min | Wrap-up & practice preview. Practice deploys the project to a cluster, runs a canary, and builds the dashboard the rest of the course reads. |
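The tail-latency prediction exercise can be rehearsed offline before the live load test. The sketch below is a minimal illustration, not the course's load-test tooling: the latency values are made up so that the mean lands on the 80 ms from the prompt, and `percentile` is a hypothetical helper using the nearest-rank definition.

```python
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, which avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, invented for illustration (not measured data):
# most requests are fast, a few are slower, and a small tail is very slow.
latencies_ms = [50] * 930 + [150] * 50 + [1300] * 20

summary = {
    "mean": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With these invented numbers the mean is 80 ms and the median is 50 ms, yet p99 is 1300 ms: one request in fifty waits more than a second, and neither the mean nor the median shows it. That gap is the argument for writing SLOs on percentiles.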
Common misconception to confront.
Students often think: Observability just means more dashboards.
Set it straight: Observability is the ability to ask new questions of a running system without shipping new code. Well-structured, high-cardinality telemetry, not the number of dashboards, is what enables it.
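A small sketch can make the point concrete. It assumes logs are emitted as one JSON object per line; the field names (`route`, `status`, `user_id`, `version`) and the helpers `log_event` and `query` are hypothetical, chosen only for illustration.

```python
import json


def log_event(event, **fields):
    """Render one log record as a single JSON line."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


# Hypothetical records; user_id is a high-cardinality field.
lines = [
    log_event("request", route="/checkout", status=200, duration_ms=48,
              user_id="u-1017", version="v1"),
    log_event("request", route="/checkout", status=500, duration_ms=1310,
              user_id="u-2214", version="v2"),
    log_event("request", route="/cart", status=200, duration_ms=35,
              user_id="u-2214", version="v2"),
]


def query(log_lines, **where):
    """Return the parsed records whose fields match every key=value in `where`."""
    records = (json.loads(line) for line in log_lines)
    return [r for r in records if all(r.get(k) == v for k, v in where.items())]


# A question nobody planned a dashboard for: which users hit errors on v2?
failed_v2_users = [r["user_id"] for r in query(lines, version="v2", status=500)]
print(failed_v2_users)
```

Because every record carries its own fields, the new question is answered by a filter over existing data rather than by adding a log statement and redeploying.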
In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.
| 0:00-0:10 | 10 min | Setup & recap
- Start a local cluster (kind or minikube) on every team machine.
- Recap: desired state, probes, RED.
|
| 0:10-0:35 | 25 min | Deploy the project service
- Write the Deployment and Service manifests for the week-3 service.
- Wire the liveness and readiness probes to /health.
- Scale to three replicas; kill a pod; watch reconciliation bring it back.
|
| 0:35-1:00 | 25 min | Canary rollout, for real
- Ship v2 of the service next to v1; shift 10% of traffic.
- Watch the error rate per version; promote, then practice the rollback path.
- This exact pattern returns in week 7 for models.
|
| 1:00-1:10 | 10 min | Break |
| 1:10-1:35 | 25 min | The project dashboard
- Expose Prometheus metrics from the service; scrape them.
- Build the RED dashboard in Grafana: rate, errors, p50/p95/p99 duration.
- Run the load test; read the tail; record the baseline p95 for the project log.
|
| 1:35-1:50 | 15 min | Students drive
- Each team gets its service deployed, canaried, and on the dashboard.
- Instructor circulates on probe and scrape misconfigurations.
|
| 1:50-2:00 | 10 min | Project-integration brief. The 'Project integration' card: deployed service + canary path + RED dashboard; the baseline p95 goes into the Presentation-1 spec. |
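The promote-or-rollback step of the canary path can be written down as a small decision rule. The following is a rough sketch under stated assumptions, not the tooling used in the session: the function names, the `(errors, requests)` tuple shape, and both thresholds (`min_requests`, `max_excess`) are hypothetical and would need tuning per service.

```python
def error_rate(errors, requests):
    """Fraction of failed requests; 0.0 when no traffic has been seen yet."""
    return errors / requests if requests else 0.0


def canary_decision(stable, canary, min_requests=200, max_excess=0.01):
    """Compare per-version error rates and return 'wait', 'promote', or 'rollback'.

    stable and canary are (errors, requests) tuples.
    min_requests and max_excess are illustrative thresholds, not recommended values.
    """
    canary_errors, canary_requests = canary
    if canary_requests < min_requests:
        return "wait"  # too little canary traffic to judge either way
    if error_rate(*canary) > error_rate(*stable) + max_excess:
        return "rollback"  # canary is clearly worse than the stable version
    return "promote"


# Hypothetical counts for a 90/10 traffic split.
print(canary_decision(stable=(45, 9000), canary=(6, 1000)))
print(canary_decision(stable=(45, 9000), canary=(40, 1000)))
print(canary_decision(stable=(4, 900), canary=(3, 100)))
```

Comparing the canary against the stable version, rather than against a fixed number, keeps the rule meaningful when both versions degrade together, for example during a shared dependency outage.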