Engineering of AI Systems · HIT

Week 12   Part V · LLMOps & AgentOps

Agents & AgentOps: Tools, MCP & Managed Agents

Instructor lesson plan: lecture (2 h) and practice (2 h).

Learning objectives

Tools this week

function calling (SDKs)MCP SDKLangfuse step tracesBedrock AgentCore / Foundry (tour)

🎓Lecture · 2 hours

0:00-0:1010 minRecap & objectives
  • Retrieval: two judge biases; what an eval set is for.
  • Today: from answering to acting. No prior agent experience assumed.
0:10-0:2515 minMotivation: the loop that would not stop
  • An agent is an LLM in a loop with tools: it can search, call APIs, write files, act.
  • Story: the retry loop that mis-classified a code bug as a transient error and burned the budget overnight; bounds were the fix, not luck.
  • Autonomy times tools equals a brand-new failure surface; that is the week.
0:25-0:5025 minAgents from first principles
  • The plan-act-observe loop drawn on the board, no framework, just the calls.
  • Function calling precisely: the model emits a structured tool call (week 9's schemas); your code executes it and returns the observation.
  • Tools: search, retrieval (week 10's RAG becomes a tool), calculators, internal APIs.
  • Memory: the context window as short-term memory; external stores for long-term.
  • Termination: the agent decides it is done, or the harness decides for it; why the second must exist.
0:50-1:1020 minMCP & the tool ecosystem
  • The integration explosion: N agents times M tools; why a standard interface was inevitable.
  • MCP, the Model Context Protocol: servers expose tools and resources; any compliant agent can use them.
  • Adoption across vendors; the server catalog as the live map of the ecosystem.
  • Design lesson: a tool's description is a prompt; bad tool docs produce bad tool calls.
  • Failure modes catalogued: hallucinated arguments, wrong tool choice, loops, cascade errors.
1:10-1:2010 minBreak
1:20-1:4020 minAgentOps: operating autonomy
  • Tracing as spans: every step, tool call, and decision is a recorded span; week 11's tracing, one level up.
  • Step-level evaluation: judge the trajectory, not only the final answer; where did it first go wrong?
  • Bounds, mandatory: max steps, wall-clock and cost budgets, an error classifier (config errors fail fast, transients retry).
  • Least privilege on tools; human-in-the-loop checkpoints for irreversible actions.
1:40-1:5515 minManaged agents
  • Bedrock AgentCore and Foundry Agent Service: hosted loop, tool catalog, memory, tracing in one console.
  • The same control-versus-toil axis from week 2, applied to autonomy.
  • What you still own regardless: tool design, permissions, evaluation, and the bill.
1:55-2:005 minWrap-up & practice previewPractice builds, traces, attacks, and bounds a real agent, then wraps a project tool as an MCP server.
Common misconception to confront.

Students often think: An agent loop will stop on its own.
Set it straight: Without explicit step, cost, and wall-clock bounds and a termination criterion, agents loop, retry, and burn budget. Bounds and an error classifier are mandatory: a retry-on-any-exception loop spins forever on a caller-side bug.

Check for understanding (pose during the concept blocks; let students answer before revealing).
Why trace at the step level, not just the final output?
Failures hide in intermediate tool calls and decisions; step traces localise exactly where the agent went wrong.
Name two bounds every agent loop needs.
A maximum step or iteration cap and a wall-clock or cost budget, with an error classifier so caller-side bugs fail fast instead of retrying.
Key takeaways.

📚Reading & resources

💻Practice · 2 hours

In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.

0:00-0:1010 minSetup & recap
  • Open the agent scaffold and the tracing dashboard.
  • Recap: the loop, function calling, bounds.
0:10-0:3525 minBuild the loop, no magic
  • Give the scaffold two tools: the project's retrieval (week 10) and a calculator or internal API.
  • Run real tasks; read the raw model-emitted tool calls to demystify the loop.
  • Watch a wrong-tool choice happen naturally; discuss why the description invited it.
0:35-1:0025 minTrace it, break it, find it
  • Extend the week-11 tracing to agent spans; read a successful run end to end.
  • Trigger a failure (a hallucinated argument); localise it in the trace to the exact step.
  • Step-level eval: write two checks on the trajectory, not the answer.
1:00-1:1010 minBreak
1:10-1:3525 minBound it
  • Add the max-step cap, the cost budget, and the error classifier.
  • Unleash a designed runaway task; watch the bounds catch it and the classifier fail fast on a config error.
  • Add one human-in-the-loop checkpoint before an irreversible action.
1:35-1:5015 minMCP & the managed tour
  • Wrap one project tool as an MCP server; connect the agent to it.
  • Stretch: connect to a neighbouring team's MCP server.
  • Five-minute managed-agent console tour; map every concept you just built onto it.
1:50-2:0010 minProject-integration briefThe 'Project integration' card: one agentic capability, traced and bounded, lands in the project before the final hardening week.
Common pitfalls to pre-empt.

Project integration (this week)

Curated references Project brief

PreviousWeek 11: LLM Evaluation, Guardrails & ObservabilityNextWeek 13: Security, Governance & Synthesis