| 0:00-0:10 | 10 min | Recap & objectives- Retrieval: two judge biases; what an eval set is for.
- Today: from answering to acting. No prior agent experience assumed.
|
| 0:10-0:25 | 15 min | Motivation: the loop that would not stop- An agent is an LLM in a loop with tools: it can search, call APIs, write files, act.
- Story: the retry loop that mis-classified a code bug as a transient error and burned the budget overnight; bounds were the fix, not luck.
- Autonomy times tools equals a brand-new failure surface; that is the week.
|
| 0:25-0:50 | 25 min | Agents from first principles- The plan-act-observe loop drawn on the board, no framework, just the calls.
- Function calling precisely: the model emits a structured tool call (week 9's schemas); your code executes it and returns the observation.
- Tools: search, retrieval (week 10's RAG becomes a tool), calculators, internal APIs.
- Memory: the context window as short-term memory; external stores for long-term.
- Termination: the agent decides it is done, or the harness decides for it; why the second must exist.
|
| 0:50-1:10 | 20 min | MCP & the tool ecosystem- The integration explosion: N agents times M tools; why a standard interface was inevitable.
- MCP, the Model Context Protocol: servers expose tools and resources; any compliant agent can use them.
- Adoption across vendors; the server catalog as the live map of the ecosystem.
- Design lesson: a tool's description is a prompt; bad tool docs produce bad tool calls.
- Failure modes catalogued: hallucinated arguments, wrong tool choice, loops, cascade errors.
|
| 1:10-1:20 | 10 min | Break |
| 1:20-1:40 | 20 min | AgentOps: operating autonomy- Tracing as spans: every step, tool call, and decision is a recorded span; week 11's tracing, one level up.
- Step-level evaluation: judge the trajectory, not only the final answer; where did it first go wrong?
- Bounds, mandatory: max steps, wall-clock and cost budgets, an error classifier (config errors fail fast, transients retry).
- Least privilege on tools; human-in-the-loop checkpoints for irreversible actions.
|
| 1:40-1:55 | 15 min | Managed agents- Bedrock AgentCore and Foundry Agent Service: hosted loop, tool catalog, memory, tracing in one console.
- The same control-versus-toil axis from week 2, applied to autonomy.
- What you still own regardless: tool design, permissions, evaluation, and the bill.
|
| 1:55-2:00 | 5 min | Wrap-up & practice previewPractice builds, traces, attacks, and bounds a real agent, then wraps a project tool as an MCP server. |
Common misconception to confront.
Students often think: An agent loop will stop on its own.
Set it straight: Without explicit step, cost, and wall-clock bounds and a termination criterion, agents loop, retry, and burn budget. Bounds and an error classifier are mandatory: a retry-on-any-exception loop spins forever on a caller-side bug.
In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.
| 0:00-0:10 | 10 min | Setup & recap- Open the agent scaffold and the tracing dashboard.
- Recap: the loop, function calling, bounds.
|
| 0:10-0:35 | 25 min | Build the loop, no magic- Give the scaffold two tools: the project's retrieval (week 10) and a calculator or internal API.
- Run real tasks; read the raw model-emitted tool calls to demystify the loop.
- Watch a wrong-tool choice happen naturally; discuss why the description invited it.
|
| 0:35-1:00 | 25 min | Trace it, break it, find it- Extend the week-11 tracing to agent spans; read a successful run end to end.
- Trigger a failure (a hallucinated argument); localise it in the trace to the exact step.
- Step-level eval: write two checks on the trajectory, not the answer.
|
| 1:00-1:10 | 10 min | Break |
| 1:10-1:35 | 25 min | Bound it- Add the max-step cap, the cost budget, and the error classifier.
- Unleash a designed runaway task; watch the bounds catch it and the classifier fail fast on a config error.
- Add one human-in-the-loop checkpoint before an irreversible action.
|
| 1:35-1:50 | 15 min | MCP & the managed tour- Wrap one project tool as an MCP server; connect the agent to it.
- Stretch: connect to a neighbouring team's MCP server.
- Five-minute managed-agent console tour; map every concept you just built onto it.
|
| 1:50-2:00 | 10 min | Project-integration briefThe 'Project integration' card: one agentic capability, traced and bounded, lands in the project before the final hardening week. |