Week 12 Part V · LLMOps & AgentOps

Agents & AgentOps: Tools, MCP & Managed Agents

Instructor lesson plan: lecture (2 h) and practice (2 h).

Learning objectives

Explain the agent loop and tool use (function calling), from zero.
Build and trace a tool-using agent; evaluate it at the step level.
Bound agents with step, cost, and permission limits.
Compare self-built agents with managed agent services.

Tools this week

function calling (SDKs)MCP SDKLangfuse step tracesBedrock AgentCore / Foundry (tour)

🎓Lecture · 2 hours

0:00-0:10	10 min	Recap & objectives Retrieval: two judge biases; what an eval set is for. Today: from answering to acting. No prior agent experience assumed.
0:10-0:25	15 min	Motivation: the loop that would not stop An agent is an LLM in a loop with tools: it can search, call APIs, write files, act. Story: the retry loop that mis-classified a code bug as a transient error and burned the budget overnight; bounds were the fix, not luck. Autonomy times tools equals a brand-new failure surface; that is the week.
0:25-0:50	25 min	Agents from first principles The plan-act-observe loop drawn on the board, no framework, just the calls. Function calling precisely: the model emits a structured tool call (week 9's schemas); your code executes it and returns the observation. Tools: search, retrieval (week 10's RAG becomes a tool), calculators, internal APIs. Memory: the context window as short-term memory; external stores for long-term. Termination: the agent decides it is done, or the harness decides for it; why the second must exist.
0:50-1:10	20 min	MCP & the tool ecosystem The integration explosion: N agents times M tools; why a standard interface was inevitable. MCP, the Model Context Protocol: servers expose tools and resources; any compliant agent can use them. Adoption across vendors; the server catalog as the live map of the ecosystem. Design lesson: a tool's description is a prompt; bad tool docs produce bad tool calls. Failure modes catalogued: hallucinated arguments, wrong tool choice, loops, cascade errors.
1:10-1:20	10 min	Break
1:20-1:40	20 min	AgentOps: operating autonomy Tracing as spans: every step, tool call, and decision is a recorded span; week 11's tracing, one level up. Step-level evaluation: judge the trajectory, not only the final answer; where did it first go wrong? Bounds, mandatory: max steps, wall-clock and cost budgets, an error classifier (config errors fail fast, transients retry). Least privilege on tools; human-in-the-loop checkpoints for irreversible actions.
1:40-1:55	15 min	Managed agents Bedrock AgentCore and Foundry Agent Service: hosted loop, tool catalog, memory, tracing in one console. The same control-versus-toil axis from week 2, applied to autonomy. What you still own regardless: tool design, permissions, evaluation, and the bill.
1:55-2:00	5 min	Wrap-up & practice previewPractice builds, traces, attacks, and bounds a real agent, then wraps a project tool as an MCP server.

Common misconception to confront.

Students often think: An agent loop will stop on its own.
Set it straight: Without explicit step, cost, and wall-clock bounds and a termination criterion, agents loop, retry, and burn budget. Bounds and an error classifier are mandatory: a retry-on-any-exception loop spins forever on a caller-side bug.

Check for understanding (pose during the concept blocks; let students answer before revealing).

Why trace at the step level, not just the final output?

Failures hide in intermediate tool calls and decisions; step traces localise exactly where the agent went wrong.

Name two bounds every agent loop needs.

A maximum step or iteration cap and a wall-clock or cost budget, with an error classifier so caller-side bugs fail fast instead of retrying.

Key takeaways.

An agent is an LLM in a loop with tools; autonomy is a new failure surface.
MCP standardises the agent-tool interface across vendors.
Trace and evaluate at the step level; bound steps, cost, and permissions.

📚Reading & resources

AI Engineering, the agents part of ch. 6 Huyen; agent patterns, tool use, and their failure modes.
Building Effective Agents Anthropic; workflows versus agents, and when not to build an agent. Required.
Model Context Protocol documentation The MCP concepts and the build-a-server tutorial used in practice.
ReAct: Synergizing Reasoning and Acting in Language Models Yao et al., ICLR 2023; the loop most agents still run.
Toolformer Schick et al., 2023; how models learn tool use.
LangGraph documentation Optional: graph-based orchestration for multi-step agents.

💻Practice · 2 hours

In the practice session the instructor demonstrates the tooling live and teaches the hands-on topics that belong at the keyboard. There are no separate weekly labs: each session closes with the project-integration brief, the increment every team adds to its end-to-end system before next week.

0:00-0:10	10 min	Setup & recap Open the agent scaffold and the tracing dashboard. Recap: the loop, function calling, bounds.
0:10-0:35	25 min	Build the loop, no magic Give the scaffold two tools: the project's retrieval (week 10) and a calculator or internal API. Run real tasks; read the raw model-emitted tool calls to demystify the loop. Watch a wrong-tool choice happen naturally; discuss why the description invited it.
0:35-1:00	25 min	Trace it, break it, find it Extend the week-11 tracing to agent spans; read a successful run end to end. Trigger a failure (a hallucinated argument); localise it in the trace to the exact step. Step-level eval: write two checks on the trajectory, not the answer.
1:00-1:10	10 min	Break
1:10-1:35	25 min	Bound it Add the max-step cap, the cost budget, and the error classifier. Unleash a designed runaway task; watch the bounds catch it and the classifier fail fast on a config error. Add one human-in-the-loop checkpoint before an irreversible action.
1:35-1:50	15 min	MCP & the managed tour Wrap one project tool as an MCP server; connect the agent to it. Stretch: connect to a neighbouring team's MCP server. Five-minute managed-agent console tour; map every concept you just built onto it.
1:50-2:00	10 min	Project-integration briefThe 'Project integration' card: one agentic capability, traced and bounded, lands in the project before the final hardening week.

Common pitfalls to pre-empt.

Broad tool permissions are dangerous; grant least privilege.
A retry-on-any-exception loop spins on caller bugs; classify the error first.

Project integration (this week)

One agentic capability added to the project (e.g. the chatbot answers via tools, the IoT system triages an alert, the extraction pipeline self-checks).
Tracing on every step; a step cap, cost budget, and error classifier in force.
One project tool exposed as an MCP server; stretch: consume another team's server.

Curated references Project brief