Planning and reasoning
Scope: the techniques that let an agent handle a task too complex for a single reason-act step. The plain agent loop loses direction, answers prematurely, and fails to recover from errors on long tasks; planning and reflection add structure by giving the agent time to think before and after acting. This page covers explicit planning, reflection-based recovery, and the reasoning patterns the field has accumulated. It builds on the loop and is observed and scored through evaluation.
Code here is illustrative reference; validate against your own stack before relying on it.
flowchart LR
TASK["Complex task"] --> PLAN["Plan: decompose into a task list"]
PLAN --> ACT["Act: execute next step (loop)"]
ACT --> REFLECT["Reflect: review progress, analyse errors"]
REFLECT -->|"on track"| ACT
REFLECT -->|"need_replan"| PLAN
ACT -->|"all tasks done"| DONE["Final answer"]
Core knowledge
Why the bare loop is not enough
A reason-act loop works well for short tasks and degrades on long ones in predictable ways: it loses the overall direction, commits to an answer too early, forgets work it already did, and loops on a failing action instead of trying another approach. The fix is to give the agent explicit time to think, separating the act of planning from the act of doing.[1] This is metacognition in a small form: planning is time to think before acting, reflection is time to look back after acting.[1]
Planning: set direction by decomposition
Planning turns a vague goal into an explicit, ordered task list the agent can track (pending, in progress, done) and keep in view as it works. The implementation can be trivial because the model does the real planning; the tool just records and re-presents the list, and the plan is regenerated as understanding improves. Keeping the plan visible at the end of the context also counters the lost-in-the-middle effect (context and memory).[1] The research lineage is decomposition-first prompting: Plan-and-Solve devises a plan before solving, Least-to-Most breaks a hard problem into easier sub-problems, and ReWOO separates planning from tool execution so the model plans the whole workflow before any tool runs.[2][3][4]
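The task list described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `Plan`, `Task`, `Status`, `replace`, and `render` are hypothetical, and the checkbox rendering is just one possible way to re-present the list at the end of the context.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in progress"
    DONE = "done"


@dataclass
class Task:
    description: str
    status: Status = Status.PENDING


@dataclass
class Plan:
    """Hypothetical planning tool: it only records and re-presents the list."""
    tasks: list[Task] = field(default_factory=list)

    def replace(self, descriptions: list[str]) -> None:
        # Regenerate the plan; steps that survive keep their old status.
        previous = {t.description: t.status for t in self.tasks}
        self.tasks = [Task(d, previous.get(d, Status.PENDING)) for d in descriptions]

    def set_status(self, index: int, status: Status) -> None:
        self.tasks[index].status = status

    def all_done(self) -> bool:
        return bool(self.tasks) and all(t.status is Status.DONE for t in self.tasks)

    def render(self) -> str:
        # Text block intended to be appended at the end of the context each turn.
        marks = {Status.PENDING: "[ ]", Status.IN_PROGRESS: "[~]", Status.DONE: "[x]"}
        lines = [f"{marks[t.status]} {i + 1}. {t.description}"
                 for i, t in enumerate(self.tasks)]
        return "Current plan:\n" + "\n".join(lines)


plan = Plan()
plan.replace(["Read the failing test", "Locate the bug", "Patch and re-run"])
plan.set_status(0, Status.DONE)
plan.set_status(1, Status.IN_PROGRESS)
print(plan.render())
```

The model would write the step descriptions itself; the tool holds no planning logic. Calling `replace` again with a revised list is how the plan gets regenerated as understanding improves.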
Reflection: recover from failure
Reflection pauses the loop to review progress, analyse what went wrong, synthesise results, and self-check before continuing. Its highest-value use is failure recovery: when a tool call fails, reflecting on why and choosing a different approach beats looping to the step cap. A reflection step can set a re-plan flag that sends the agent back to planning, so the two form a cycle (plan sets direction, reflection checks it). The pattern generalises published methods: Reflexion turns an outcome into verbal self-feedback stored in memory for the next attempt, and Self-Refine iterates generate-critique-revise on the model's own output.[1][5][6]
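The plan, act, reflect cycle can be sketched as a small driver loop. Everything here is an assumption made for illustration: `run_agent`, `StepResult`, `Reflection`, and the three stub callables stand in for model and tool calls, and a failed step that does not trigger a re-plan is simply dropped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    output: str


@dataclass
class Reflection:
    need_replan: bool  # the re-plan flag that sends the agent back to planning
    note: str          # verbal feedback carried into the next plan


def run_agent(
    task: str,
    make_plan: Callable[[str, list[str], list[str]], list[str]],
    act: Callable[[str], StepResult],
    reflect: Callable[[str, StepResult], Reflection],
    max_steps: int = 10,
) -> tuple[list[str], list[str]]:
    """Return (completed steps, reflection notes) when the plan empties or the cap hits."""
    notes: list[str] = []
    done: list[str] = []
    steps = make_plan(task, notes, done)
    for _ in range(max_steps):
        if not steps:
            break
        step = steps.pop(0)
        result = act(step)
        if result.ok:
            done.append(step)
        reflection = reflect(step, result)
        if reflection.need_replan:
            notes.append(reflection.note)
            steps = make_plan(task, notes, done)  # back to planning
    return done, notes


# Stubs standing in for model and tool calls (hypothetical behaviour).
def make_plan(task, notes, done):
    route = "fetch via cache" if notes else "fetch via api"
    return [s for s in (route, "summarise") if s not in done]


def act(step):
    if step == "fetch via api":
        return StepResult(False, "HTTP 503")
    return StepResult(True, "ok")


def reflect(step, result):
    if result.ok:
        return Reflection(False, "on track")
    return Reflection(True, f"{step!r} failed with {result.output}; try another route")


done, notes = run_agent("summarise the report", make_plan, act, reflect)
print(done)
print(notes)
```

In this run the first step fails, reflection sets `need_replan`, and the planner picks a different route instead of retrying the same call until the step cap.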
The reasoning toolbox
Underneath planning and reflection sits a set of reasoning techniques worth knowing:[7][8][9]
- Chain-of-thought: prompting the model to reason step by step before answering improves multi-step accuracy.
- Self-consistency: sample several reasoning paths and take the majority answer, trading compute for reliability.
- Tree of thoughts: explore and evaluate multiple branches with backtracking, for tasks where a single linear chain is fragile.
- ReAct: interleave reasoning with actions so the model can ground its reasoning in tool results, the pattern the loop already implements.
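Of the techniques above, self-consistency is the simplest to show in code. The sketch below assumes a hypothetical `sample` callable that performs one model call at non-zero temperature and returns only the final answer extracted from its reasoning path; here it is replaced by canned answers so the example runs offline.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> tuple[str, float]:
    """Sample n answers and return the majority answer with its vote share."""
    answers = [sample() for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n


# Canned answers standing in for five independent reasoning paths.
canned = iter(["42", "41", "42", "42", "17"])
answer, share = self_consistent_answer(lambda: next(canned), n=5)
print(answer, share)  # 42 0.6
```

The vote share is a rough confidence signal: a low share suggests the paths disagree and that more samples, or a different technique, may be worth the extra compute.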
Modern reasoning models internalise much of chain-of-thought, but explicit planning and reflection still help on long-horizon, tool-using tasks, because they impose structure the model does not maintain on its own.[1]
When not to plan
Planning and reflection cost extra calls. On simple, single-step tasks they add latency and tokens for no gain. Reach for them when the task is genuinely multi-step, when early actions constrain later ones, or when failure recovery matters; otherwise let the loop run.[1]
Don't-miss checklist
- Add explicit planning only for genuinely multi-step tasks; keep the plan visible in context.
- Regenerate the plan as understanding improves rather than committing to the first one.
- Use reflection chiefly for failure recovery: analyse the error, then change approach.
- Wire a re-plan signal from reflection back to planning so the agent can correct course.
- Match the reasoning technique to the task; do not pay for tree search on a linear problem.
Failure modes
- Planning everything. Trivial tasks get a planning overhead that only adds cost and latency.
- Stale plan. The plan is fixed at the start and never revised as facts change.
- Reflection theatre. The agent narrates reflection without changing behaviour; recovery never happens.
- Loop on failure. Without reflection the agent repeats a failing action to the step cap.
- Over-deliberation. Excess self-consistency or tree search burns budget past the point of return.
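The loop-on-failure mode in the list above can be caught mechanically before the step cap is reached. The guard below is a sketch under assumptions: `RepeatGuard` is a hypothetical helper, and an action is identified by a plain string signature (tool name plus arguments), which a real harness may represent differently.

```python
from collections import deque


class RepeatGuard:
    """Signals when the same failing action occurs `limit` times in a row."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.recent: deque[str] = deque(maxlen=limit)

    def record(self, action: str, ok: bool) -> bool:
        """Return True when a reflection step should be forced."""
        if ok:
            self.recent.clear()  # any success resets the streak
            return False
        self.recent.append(action)
        return len(self.recent) == self.limit and len(set(self.recent)) == 1


guard = RepeatGuard(limit=2)
first = guard.record("search(query='q3 revenue')", ok=False)
second = guard.record("search(query='q3 revenue')", ok=False)
third = guard.record("search(query='quarterly revenue')", ok=False)
print(first, second, third)  # False True False
```

A changed action after reflection resets the signal, which also gives a cheap check against reflection theatre: if the guard keeps firing after a reflection step, the reflection did not change behaviour.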
Open questions & validation
- Whether explicit planning beats a strong model's internal reasoning is task-dependent; A/B it on real tasks.
- Reflection quality is hard to measure; validate that it changes actions, not just text.
- Tree search and self-consistency costs scale fast; measure the accuracy-per-token trade-off.
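One possible way to frame the accuracy-per-token trade-off is marginal gain: accuracy points bought per extra 1,000 tokens per task, relative to a baseline. The metric, the `RunStats` type, and the numbers below are all assumptions for illustration; the figures are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    name: str
    correct: int  # tasks answered correctly
    total: int    # tasks attempted
    tokens: int   # total tokens spent across all tasks

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def tokens_per_task(self) -> float:
        return self.tokens / self.total


def marginal_gain(baseline: RunStats, variant: RunStats) -> float:
    """Accuracy points gained per extra 1,000 tokens per task."""
    extra_k = (variant.tokens_per_task - baseline.tokens_per_task) / 1000
    if extra_k <= 0:
        raise ValueError("variant must spend more tokens per task than baseline")
    return (variant.accuracy - baseline.accuracy) * 100 / extra_k


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
single = RunStats("single chain", correct=60, total=100, tokens=200_000)
voted = RunStats("self-consistency x5", correct=68, total=100, tokens=1_000_000)
print(round(marginal_gain(single, voted), 2))  # 1.0
```

Tracking this number as the sample count or tree width grows shows where added deliberation stops paying for itself.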
References
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models: https://arxiv.org/abs/2210.03629
- Wei et al., Chain-of-Thought Prompting Elicits Reasoning in LLMs: https://arxiv.org/abs/2201.11903
- Wang et al., Self-Consistency Improves Chain of Thought Reasoning: https://arxiv.org/abs/2203.11171
- Yao et al., Tree of Thoughts: Deliberate Problem Solving with LLMs: https://arxiv.org/abs/2305.10601
- Shinn et al., Reflexion: Language Agents with Verbal Reinforcement Learning: https://arxiv.org/abs/2303.11366
- Madaan et al., Self-Refine: Iterative Refinement with Self-Feedback: https://arxiv.org/abs/2303.17651
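- Wang et al., Plan-and-Solve Prompting: https://arxiv.org/abs/2305.04091
- Zhou et al., Least-to-Most Prompting Enables Complex Reasoning in Large Language Models: https://arxiv.org/abs/2205.10625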
- Xu et al., ReWOO: Decoupling Reasoning from Observations: https://arxiv.org/abs/2305.18323
- Song & Hur, Build an AI Agent (From Scratch), Manning Publications (MEAP), 2026.
Related: The agent loop · Context & memory · Evaluating agents · Harness architecture · Agentic & tool-use RL · Agentic systems
1. Song & Hur (Ch 7): planning and reflection give the agent time to think before and after acting (metacognition); planning decomposes a task into a tracked list regenerated as understanding improves, reflection's main value is failure recovery, and a re-plan signal links the two. Explicit planning helps on long, multi-step tasks and is overhead on simple ones.
2. Wang et al., Plan-and-Solve: devise a plan before solving to reduce missing-step errors.
3. Zhou et al., Least-to-Most: decompose a hard problem into progressively easier sub-problems.
4. Xu et al., ReWOO: plan the full workflow before executing tools, decoupling reasoning from observations.
5. Shinn et al., Reflexion: convert an outcome into verbal self-feedback stored in memory for the next attempt.
6. Madaan et al., Self-Refine: iterate generate, critique, revise on the model's own output.
7. Wei et al., chain-of-thought prompting elicits step-by-step reasoning that improves multi-step accuracy.
8. Wang et al., self-consistency samples multiple reasoning paths and votes.
9. Yao et al., Tree of Thoughts explores and evaluates multiple branches with backtracking.