An agent is a model that can decide to call tools — search, code execution, APIs, databases — and iterate until it completes a goal. The power is in the loop: observe, think, act, observe again. The danger is in the loop too: agents can compound errors, use tools incorrectly, and run indefinitely without guardrails.
Course: Advanced.
This lesson covers 5 concepts: What Makes an Agent, Tool Schemas, The Agent Loop, Multi-Step in Action, Agent Failure Modes.
An agent is a model wrapped in a loop: observe state, decide next action (including tool calls), execute, observe result, repeat. The model drives the loop — it decides when to use tools and when the task is done.
Agents can complete tasks that require multiple steps, real-time information, computation, and external system interaction — things that a single LLM call cannot do.
An LLM without tools is like a brilliant person locked in a room with no phone or computer. An agent is the same person with access to all of those.
Single call: "What is the weather in Tokyo?" → model guesses. Agent: model calls weather API → observes result → reports current weather. The loop is what makes it grounded and accurate.
Tool schemas describe available functions to the model: name, description, and parameter types. The model decides when to call a tool and generates the arguments — your code executes it and returns the result.
Vague tool descriptions lead to misuse. "Search the web for current information" is clear. "Search" alone is ambiguous — the model may confuse it with other search-like operations.
Tool schemas are the model's manual for its tools. A good manual leads to correct use. A bad manual leads to misuse, wrong arguments, and failed calls.
Tool with poor description: model calls it in wrong situations, passes malformed arguments, ignores it entirely. Well-written description + clear parameter names: model uses tool correctly 95%+ of the time.
The agent loop: (1) model receives task + tool schemas + prior observations. (2) model outputs a tool_call or a final response. (3) if tool_call: execute it, append result as tool message. (4) model sees the result and decides next step. Repeat until final response.
The loop must be implemented in your code — it is not automatic. You write the while loop, execute tool calls, append results, and call the model again until it stops.
Think of the loop as the scaffolding and the model as the contractor. The scaffolding provides the structure — what tools are available, what results came back. The contractor decides what to do next.
Research agent loop: model calls search(q1) → appends result → calls search(q2) → appends result → synthesises both → outputs final answer. Three model calls, two tool executions, one result.
The agent called get_stock_price with symbol="NVDA", received the current price and 52-week high, computed the percentage difference, and answered — all grounded in real data retrieved via tool use.
The final answer is grounded — every number in the response came from a tool call, not from model weights. This is what makes agents more reliable than raw LLM calls for factual questions.
Compare this to a non-agent LLM: it would guess the price from training data (wrong) or refuse to answer (unhelpful). The tool loop is what makes the answer both accurate and grounded.
Without tool use: model guesses NVDA's price from training data (months out of date, potentially wrong). With tool use: model retrieves the live price and computes the accurate comparison.
Agents fail in four characteristic ways: compounding errors from early wrong assumptions, tool misuse from bad schemas or ambiguous descriptions, infinite loops from missing stopping conditions, and context overflow from accumulating too many tool results.
Agent failures are harder to debug than single-call failures because errors compound across steps and the root cause may be 5 steps earlier. Logging every tool call and model decision is essential.
Unlike a single LLM call where the failure is visible immediately, agent failures are often silent — the agent keeps running while building on a wrong assumption from step 1.
A 10-step agent that gets step 2 wrong will fail on step 10 with an error that appears to be about step 10 — but was caused at step 2. Full agent trajectory logging is required to diagnose this.