The agent loop¶
Scope: the control flow that turns a one-shot model into an agent. An agent is a model plus a loop: the model decides what to do, a tool does it, the result re-enters the context, and the model decides whether to continue or stop. This page covers the cycle itself, the ReAct pattern it implements, how it terminates, and when a loop is the right structure versus a fixed workflow. The loop is driven by the harness and uses tools, context and memory, and planning.
Code here is illustrative reference; validate against your own stack before relying on it.
flowchart TB
START["Task + context"] --> THINK["Think: model evaluates context, decides"]
THINK -->|"tool call(s)"| ACT["Act: execute tool(s)"]
ACT --> OBS["Observe: append result to context"]
OBS --> CHECK{"final answer<br/>or max steps?"}
CHECK -->|"no"| THINK
THINK -->|"plain answer"| DONE["Stop: return result"]
CHECK -->|"yes"| DONE
Overview¶
An agent is built from three parts: the model is the brain that decides, tools are the action space, and the loop unfolds those decisions over time. The model makes autonomous decisions possible, tools expand what it can do, and the loop runs it until the task is done.1 The distinction from a plain model call is autonomy over control flow: the model, not the developer, decides the next step and when to stop.2
Core knowledge¶
The cycle¶
Each iteration is one think-act-observe step. The model reads the current context and either calls one or more tools or returns a final answer. If it calls tools, the harness executes them, appends the results to the context as observations, and loops. If it returns a plain answer, the loop stops. A concrete implementation makes this explicit: a run() builds an execution context, then repeatedly calls step() until a final result is set or a step cap is reached, where each step() prepares the request, makes one model call (think), executes any requested tools (act), and records the observation. Completion is detected when a turn produces an assistant message with no tool calls.2
It is ReAct¶
The loop is the ReAct pattern: reasoning and acting interleaved, so the model reasons about what it needs, acts to get it, observes, and reasons again.3 Early ReAct parsed a free-text Thought/Action format that broke on a missing bracket; modern tool-calling removes that fragility by having the model emit structured tool calls directly. The reasoning did not disappear, it moved inside the model.2 The practical upshot is that you rarely hand-write a reasoning format; you give the model tools and let the loop carry the reasoning.
Termination and loop guards¶
A loop that cannot stop is a bug waiting to bill. Three mechanisms bound it: a hard step cap (a maximum number of iterations), explicit completion detection (a turn with no tool call, or an emitted final-answer tool), and the budget gate in the control plane. The step cap is the universal guard against infinite loops; completion detection is what lets a well-behaved agent stop early.2
Loop or workflow¶
Not every task needs a loop. A workflow runs a developer-defined control flow (a single call, a fixed chain, or a router that picks among preset branches); an agent lets the model drive the flow. Use the loop when the path genuinely depends on intermediate results the developer cannot enumerate in advance, and prefer a workflow when the steps are known, because a fixed flow is cheaper, more predictable, and easier to test.4 A router that only selects among fixed branches is still a workflow, because the model is not choosing actions, only a path.2
Don't-miss checklist¶
- Drive the loop with explicit think, act, observe stages over a single mutable context.
- Always set a step cap; treat hitting it as a failure to investigate, not a normal exit.
- Detect completion explicitly (no tool call, or a final-answer tool) so good runs stop early.
- Append every tool result as an observation; never drop it silently.
- Choose a workflow over an agent when the control flow is known in advance.
Failure modes¶
- No step cap. The loop runs until it exhausts budget or context.
- Premature stop. Weak completion detection ends the run before the task is done.
- Lost direction. Over a long loop the model forgets the goal; add planning and reflection.
- Swallowed observations. A failed tool call returns nothing useful and the model proceeds on a false premise.
- Agent where a workflow would do. A loop is used for a fixed pipeline, adding cost and nondeterminism for no gain.
Open questions & validation¶
- The right step cap is task-dependent; measure the distribution of steps-to-completion on real tasks.
- Completion detection is heuristic; validate that the agent neither stops early nor loops on near-duplicate actions.
- Long-horizon coherence is unsolved; combine the loop with planning and external state for multi-step tasks.
References¶
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models: https://arxiv.org/abs/2210.03629
- Anthropic, Building effective agents (workflows versus agents): https://www.anthropic.com/research/building-effective-agents
- Song & Hur, Build an AI Agent (From Scratch), Manning Publications (MEAP), 2026.
- Model Context Protocol: https://modelcontextprotocol.io/
Related: Harness architecture · Tools & function calling · Context & memory · Planning & reasoning · Agentic & tool-use RL · Agentic systems
-
Song & Hur (Ch 1): the model makes autonomous decisions possible, tools expand the action space, and the loop unfolds them over time. ↩
-
Song & Hur, Build an AI Agent (From Scratch), Manning MEAP, 2026. An agent autonomously decides what actions to take and when to stop; the loop is
run()callingstep()(prepare, think, act, observe) until a final result or a step cap; modern tool-calling is ReAct with the reasoning moved inside the model; a router that picks among fixed branches is a workflow, not an agent. ↩↩↩↩↩ -
Yao et al., ReAct interleaves reasoning traces and actions so each improves the other. ↩
-
Anthropic, Building effective agents: prefer the simplest structure that works; use a workflow when steps are known and an agent only when the path must be decided at runtime. ↩