Reasoning Techniques for AI Agents: Chain-of-Thought to Tree-of-Thought

Q: What is Chain-of-Thought prompting?

Chain-of-Thought (CoT) prompting asks a language model to reason through a problem step by step before giving a final answer, instead of jumping straight to a response. The intermediate steps give the model room to compute and let each step constrain the next, which sharply improves accuracy on math, logic, and other multi-step tasks. It is the foundational reasoning technique that ReAct, self-consistency, and Tree-of-Thought all build on.

Q: What is the difference between ReAct and Chain-of-Thought?

Chain-of-Thought reasons only over what the model already knows from training, producing a self-contained line of thought. ReAct interleaves reasoning with real actions. It runs a loop of Thought, Action, and Observation, calling tools or APIs and feeding the actual results back into its next thought. Use plain CoT for self-contained reasoning, and ReAct whenever the answer depends on live data, files, or actions outside the model.

Q: When should you use Tree-of-Thought instead of Chain-of-Thought?

Use Tree-of-Thought only for genuine search problems where the best first step becomes clear only after exploring several options, such as constraint planning or logic puzzles. It branches into multiple candidate steps, scores each one, and backtracks from dead ends, which can cost dozens of model calls per answer. If a single Chain-of-Thought or self-consistency already solves the task reliably, Tree-of-Thought is overkill.

Space & Story Team

Part of Agentic Design Patterns: The Complete Guide to Building Intelligent AI Systems

Based on Agentic Design Patterns by Antonio Gulli (Springer). All book royalties go to Save the Children.


Space & Story Team·June 15, 2026·11 min read


Key Takeaway

Chain-of-Thought, self-consistency, ReAct, and Tree-of-Thought are reasoning patterns that buy agent reliability with compute. CoT shows the work, self-consistency votes across paths, ReAct grounds thinking in tool results, and Tree-of-Thought searches and backtracks. The skill is picking the cheapest one a task actually needs.

Why This Matters for Enterprise AI

A model that answers instantly is a model that is guessing. For trivia and rephrasing, the guess is fine. For a refund decision, a diagnosis, or a multi-step database migration, the guess is a liability. The gap between a demo that wows a boardroom and an agent that survives a quarter of real traffic almost always comes down to reasoning architecture.

The techniques in this post are how you buy reliability with compute. Chain-of-Thought makes the model show its work. Self-consistency runs that work several times and takes a vote. ReAct interleaves thinking with real tool calls so the agent reasons over fresh facts instead of stale memory. Tree-of-Thought lets the agent branch, judge its own branches, and back out of dead ends. None of these is free, and the skill is knowing which one a given task needs. If you have read the post on AI agent planning, this is the layer underneath the plan: planning decides what steps to take, reasoning decides how the agent thinks through each one.

What Are Reasoning Techniques in AI Agents?

Reasoning techniques are prompting and control-flow patterns that force a language model to generate intermediate steps before it commits to an answer. Rather than map a question straight to a final token, the model works through the problem on the page. Antonio Gulli, in Agentic Design Patterns, treats reasoning as the pattern family that turns a fast pattern-matcher into something that can plan, check itself, and recover. The intermediate steps are the product. They are where the model catches its own errors, and they are what you log when something goes wrong at 2 a.m.

The mental model is the difference between a student who blurts the first number that comes to mind and one who works the problem on scratch paper. Same student, same knowledge. The one who writes out the steps gets the hard problems right far more often. Each written step constrains the next one and narrows the space of things that can go wrong. Anthropic's Building Effective Agents makes the same point in practical terms: give the model room to think before it acts, and reach for heavier machinery only when the task earns it.

Figure: Reasoning techniques range from a single linear chain of steps to a branching tree the agent can evaluate and prune. The heavier the method, the more compute it spends to get a hard answer right.

These methods stack on top of each other rather than competing. ReAct uses Chain-of-Thought for its "think" step. Self-consistency wraps several Chain-of-Thought runs. Tree-of-Thought generalizes the chain into a search. Learn them as a ladder, from cheap and linear to expensive and exploratory, and pick the lowest rung that solves your problem.

Chain-of-Thought: Make the Model Show Its Work

Chain-of-Thought (CoT) is the foundational technique, and most days it is the only one you need. Instead of asking for an answer, you ask the model to reason step by step toward the answer. The phrase "think step by step" is the canonical trigger, but the real mechanism is that you are giving the model tokens to compute with. A transformer does a fixed amount of work per token, so a problem that needs ten steps of arithmetic cannot be solved in zero steps of output. CoT gives it the steps.

The payoff shows up on anything compositional: math word problems, multi-hop questions, policy checks with several conditions, code that has to satisfy three constraints at once. On a question like "if the order shipped ground on the 3rd and the customer is in Zone 4, is it past the delivery SLA (service-level agreement) today?", a direct answer is a coin flip. The same model, asked to lay out the ship date, the transit window, and today's date first, gets it right because each step pins down the next.
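To make those steps concrete, here is the same chain written as ordinary Python. It is a sketch under stated assumptions: the transit table (five business days of ground transit to Zone 4), the June 2026 dates, and the helper names are all hypothetical, and weekends are treated as non-delivery days. The point is the order of the steps, which is the order a good written chain should follow: transit window, then deadline, then comparison with today.

```python
from datetime import date, timedelta

# Hypothetical transit table: business days of ground transit per zone.
GROUND_TRANSIT_DAYS = {1: 2, 2: 3, 3: 4, 4: 5}

def add_business_days(start: date, days: int) -> date:
    """Step forward one calendar day at a time, counting only Monday to Friday."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days -= 1
    return current

def past_sla(ship_date: date, zone: int, today: date) -> bool:
    # Step 1: pin down the transit window for the customer's zone.
    transit = GROUND_TRANSIT_DAYS[zone]
    # Step 2: ship date plus the window gives the delivery deadline.
    deadline = add_business_days(ship_date, transit)
    # Step 3: only now compare the deadline with today's date.
    return today > deadline

print(add_business_days(date(2026, 6, 3), GROUND_TRANSIT_DAYS[4]))  # 2026-06-10
print(past_sla(date(2026, 6, 3), zone=4, today=date(2026, 6, 15)))  # True
```

Each intermediate value (transit days, deadline) is exactly the kind of fact a direct answer skips and a chain writes down.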

CoT is cheap. It costs you the output tokens of the reasoning, and on a frontier model that is usually a rounding error against the value of a correct answer. The one real cost is that the reasoning is visible, which you may not want to show end users. The standard move is to generate the chain, then generate a clean final answer, and only surface the second one.
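A minimal sketch of that two-pass move, assuming `llm` is any function that takes a prompt string and returns the model's text. The prompt wording is illustrative, and `fake_llm` is a hypothetical stand-in that returns canned text so the example runs without an API.

```python
from typing import Callable, Dict

def answer_with_hidden_chain(question: str, llm: Callable[[str], str]) -> Dict[str, str]:
    # Pass 1: the model reasons step by step. Log this; don't show it to users.
    chain = llm(f"{question}\n\nThink step by step and show your reasoning.")
    # Pass 2: a clean answer conditioned on that reasoning. Show only this.
    final = llm(
        f"{question}\n\nReasoning:\n{chain}\n\n"
        "Using the reasoning above, reply with only the final answer."
    )
    return {"reasoning": chain, "answer": final.strip()}

# Stand-in model for illustration: returns canned text instead of calling an API.
def fake_llm(prompt: str) -> str:
    if "reply with only the final answer" in prompt:
        return " Yes, it is past the SLA. "
    return "Step 1: shipped June 3. Step 2: deadline June 10. Step 3: today is June 15."

result = answer_with_hidden_chain("Is order A-417 past its SLA?", fake_llm)
print(result["answer"])  # Yes, it is past the SLA.
```

The second call costs a few extra tokens; a cheaper variant asks for a marked answer line in a single call and parses it out.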

Self-Consistency: Sample Several Paths, Take a Vote

A single Chain-of-Thought is one walk down one path. If the model takes a wrong turn early, the rest of the chain follows it off the cliff, confidently. Self-consistency is the fix: run the same CoT prompt several times at a non-zero temperature so each run reasons differently, then majority-vote on the final answers. The answer that the most independent paths converge on is more likely to be right.

Think of it as asking five competent people the same hard question and going with the answer three of them gave. The wrong turns tend to be idiosyncratic and scatter; the right answer tends to be the attractor that independent reasoning paths land on. In the original research this single change lifted accuracy on hard math benchmarks by double digits, with no new training and no new data.

The cost is linear and obvious. Five samples cost roughly five times the tokens and, unless you parallelize the calls, five times the latency. So self-consistency earns its keep on high-stakes, verifiable answers where being wrong is expensive and the question has a discrete result you can vote over: a number, a category, a yes or no. It is a poor fit for open-ended generation, where there is no clean "same answer" to count, and overkill for anything a single chain already nails.
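The vote itself is a few lines. This sketch runs the samples in parallel with a thread pool to keep latency near one call, then counts final answers. The `sample(prompt, seed)` signature is an assumption of this example; a real sampler would call the model at non-zero temperature and extract the final answer from each chain, while `fake_sample` below just returns canned answers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

def self_consistency(
    sample: Callable[[str, int], str], prompt: str, n: int = 5
) -> Tuple[str, float]:
    """Run n chains in parallel and majority-vote their final answers.

    Returns the winning answer and the share of samples that agreed with it.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda seed: sample(prompt, seed), range(n)))
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n

# Stand-in sampler: five "chains" whose final answers disagree a little.
CANNED = ["42", "42", "41", "42", "40"]

def fake_sample(prompt: str, seed: int) -> str:
    return CANNED[seed]

print(self_consistency(fake_sample, "What is 6 * 7? Think step by step."))
# ('42', 0.6)
```

Returning the agreement share alongside the answer costs nothing and gives you the confidence signal the vote already computed.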

Enterprise reality: A claims-triage agent that decides approve, deny, or escalate is a textbook self-consistency case. Sample the reasoning five times, take the majority, and route the rare three-two split to a human instead of letting one unlucky chain auto-deny a valid claim. The extra compute is trivial next to the cost of a wrong denial, and the vote split doubles as a free confidence signal you can act on.
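Routing on the vote split might look like the sketch below. The decision labels, the `human_review` route, and the four-of-five agreement threshold are hypothetical choices for illustration; tune the threshold to what a wrong automatic decision costs you.

```python
from collections import Counter
from typing import List

def route_claim(votes: List[str], min_agreement: int = 4) -> str:
    """Auto-apply the majority decision only when enough chains agree.

    votes: final decisions ("approve", "deny", "escalate") from independent chains.
    min_agreement: hypothetical threshold; 4 of 5 sends a 3-2 split to a human.
    """
    decision, count = Counter(votes).most_common(1)[0]
    if count >= min_agreement:
        return decision
    return "human_review"

print(route_claim(["approve", "approve", "approve", "approve", "deny"]))  # approve
print(route_claim(["deny", "deny", "deny", "approve", "approve"]))        # human_review
```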

ReAct: Interleave Reasoning With Tool Actions

Chain-of-Thought reasons over what the model already knows. That is exactly the problem when the answer depends on something the model does not know, such as a live order status, a current price, or the contents of a file. ReAct, short for Reason + Act, closes that gap. The agent runs a loop of Thought, Action, Observation: it thinks about what it needs, takes one action (a tool or API call), reads the real result, and thinks again with that result in hand.

This is the pattern that turns tool use in AI agents from a single blind function call into genuine reasoning. The model is not predicting what the database probably says. It queries the database, reads the actual row, and reasons from the fact. Each observation grounds the next thought in reality, so the agent self-corrects against the world instead of compounding its own guesses.

A ReAct Loop (Abbreviated)

Here is the shape of a ReAct loop. The model emits a thought and an action; your code runs the tool and feeds the real observation back; the loop repeats until the model emits a final answer.

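# Abbreviated, illustrative ReAct loop, not production code.
# llm(prompt) -> str is your model client; web_search and order_db are your tools.
import re

def parse_action(step):
    """Pull ("tool", "input") out of text like 'Action: lookup_order[A-417]'."""
    match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
    if match is None:
        raise ValueError(f"no Action in model output: {step!r}")
    return match.group(1), match.group(2)

def react(question, llm, tools, max_steps=6):
    scratchpad = ""  # running Thought / Action / Observation log
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n{scratchpad}\n"
            "Think, then act. Reply 'Action: <tool>[<input>]' "
            "or 'Final: <answer>'."
        )
        step = llm(prompt)            # e.g. "Thought: ... Action: lookup_order[A-417]"
        scratchpad += step + "\n"
        if "Final:" in step:          # the model may write a Thought before answering
            return step[step.index("Final:"):]
        tool, arg = parse_action(step)        # -> ("lookup_order", "A-417")
        observation = tools[tool](arg)        # real call, real result
        scratchpad += f"Observation: {observation}\n"
    return "Final: stopped, step budget exhausted"

# react("Is order A-417 late?", llm,
#       {"search": web_search, "lookup_order": order_db})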

The same loop is what create_react_agent builds for you in LangGraph, and what a tool-equipped agent runs under the hood in Google's Agent Development Kit (ADK). The framework hides the string-wrangling; the Thought-Action-Observation cycle is the pattern. Note the max_steps budget. An agent that can loop needs a hard ceiling, or a confused one will think and act forever.

Tree-of-Thought: Branch, Evaluate, Backtrack

Some problems can't be solved by reasoning forward in a straight line, because the right first move only becomes obvious after you have explored a few wrong ones. Planning a multi-leg trip under constraints, a logic puzzle, a tricky refactor with several viable approaches: these need search, not a single chain. Tree-of-Thought (ToT) supplies it. Instead of one linear chain, the agent generates several candidate next steps, scores each one, expands the promising branches, and abandons the dead ends. It is Chain-of-Thought turned into a tree the agent can walk with backtracking.

Three pieces make it work. The agent branches by proposing multiple distinct next thoughts from the current state. It evaluates each branch, usually by asking the model itself to rate how promising a partial solution looks. And it backtracks, pruning low-scored branches and returning to a better fork when a path stalls. That self-evaluation step is the engine. It is the same instinct behind reflection and adaptation, applied mid-search to decide which line of thinking deserves more compute.
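The three pieces map directly onto code. This is a sketch of a depth-first variant (breadth-first beam search is another common way to walk the tree). In a real Tree-of-Thought agent, `propose` and `score` would be model calls; here they are deterministic stand-ins on a hypothetical toy puzzle, pick numbers that sum to 9, so the branching, pruning, and backtracking are easy to follow. The pruning threshold and depth limit are illustrative.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # indices of the numbers chosen so far

def tree_of_thought(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_solution: Callable[[State], bool],
    threshold: float = 0.0,
    max_depth: int = 3,
) -> Optional[State]:
    """Depth-first Tree-of-Thought: branch, evaluate, prune, backtrack."""

    def search(state: State, depth: int) -> Optional[State]:
        if is_solution(state):
            return state
        if depth == max_depth:
            return None  # out of depth: backtrack
        # Branch: propose candidate next thoughts, most promising first.
        for child in sorted(propose(state), key=score, reverse=True):
            if score(child) < threshold:
                continue  # Prune: not worth spending compute on.
            found = search(child, depth + 1)
            if found is not None:
                return found
        return None  # every branch stalled: back up to the parent fork

    return search(root, 0)

# Toy puzzle (hypothetical): pick numbers, each at most once, that sum to 9.
NUMBERS = [8, 6, 5, 4, 3]
TARGET = 9

def total(state: State) -> int:
    return sum(NUMBERS[i] for i in state)

def propose(state: State) -> List[State]:
    start = state[-1] + 1 if state else 0
    return [state + (i,) for i in range(start, len(NUMBERS))]

def score(state: State) -> float:
    # Stand-in for the model rating a partial solution: overshooting is hopeless.
    return -1.0 if total(state) > TARGET else total(state) / TARGET

result = tree_of_thought((), propose, score, lambda s: total(s) == TARGET)
print([NUMBERS[i] for i in result])  # [6, 3]
```

The highest-rated first move, 8, looks closest to the target but is a dead end: every extension overshoots and gets pruned. The search backs up to the root and finds 6 + 3 on the next branch, which is exactly the behavior a single forward chain cannot produce.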

ToT is powerful and expensive, and the expense is not linear. Exploring a branching tree can cost dozens of model calls for one answer, so the honest default is to not use it. Reach for Tree-of-Thought only when three things are all true: the problem really does need exploration, a single chain or self-consistency has demonstrably failed on it, and a partial solution is something the model can score. For the long tail of agent tasks, that combination is rare, and a plain chain is the right call.
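A rough call count shows why the expense is not linear. This sketch assumes one model call to generate each node and one to score it; real implementations often batch proposals into fewer calls, so treat the numbers as illustrative rather than measured.

```python
def cot_calls() -> int:
    return 1  # one chain, one call

def self_consistency_calls(n: int) -> int:
    return n  # linear: one call per sampled chain

def tot_calls(branching: int, depth: int, beam: int = 0) -> int:
    """Model calls for a Tree-of-Thought search, assuming one call to generate
    each node and one to score it. beam=0 means no pruning (the full tree)."""
    nodes = 0
    frontier = 1  # nodes expanded at the current level, starting from the root
    for _ in range(depth):
        generated = frontier * branching
        nodes += generated
        frontier = generated if beam == 0 else min(beam, generated)
    return 2 * nodes

print(cot_calls(), self_consistency_calls(5), tot_calls(3, 3), tot_calls(3, 3, beam=2))
# 1 5 78 30
```

With three branches per step and three levels, the full tree costs 78 calls against 5 for a five-way vote. Keeping only the best two nodes per level trims it to 30, still six times the vote, and each extra level multiplies the bill again.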

Scratchpads: Give Reasoning a Place to Live

Underneath all four techniques sits a humbler idea: the scratchpad. A scratchpad is just working memory in the context window, a place where the agent writes its intermediate reasoning, tool results, and notes-to-self as it goes. The ReAct loop above keeps one. Every CoT chain is a scratchpad that happens to be discarded after the answer.

Making the scratchpad explicit buys you three things. The agent can refer back to what it already figured out instead of re-deriving it. You get a complete, inspectable trace of how the agent reached its conclusion, which is what auditors and on-call engineers need when something breaks. And you control what carries forward. At each step you decide what stays in the scratchpad and what gets summarized or dropped, which is how you keep a long-running agent from drowning in its own history. The scratchpad is where reasoning meets memory management, and on long tasks that boundary is where agents most often fall over.
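A minimal sketch of an explicit, bounded scratchpad. The fold-the-oldest-half policy, the `max_entries` budget, and the `naive_summary` stand-in are assumptions for illustration; in practice the summary would come from a model call and the budget would depend on the model's context window.

```python
from dataclasses import dataclass, field
from typing import Callable, List

def naive_summary(entries: List[str]) -> str:
    # Stand-in for an LLM summarization call: keep the start of each entry.
    return "Summary: " + " | ".join(entry[:40] for entry in entries)

@dataclass
class Scratchpad:
    max_entries: int = 4
    summarize: Callable[[List[str]], str] = naive_summary
    entries: List[str] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        """Append a Thought/Action/Observation entry, folding old ones if needed."""
        self.entries.append(f"{kind}: {text}")
        if len(self.entries) > self.max_entries:
            # Fold the oldest half into one summary line so context stays bounded.
            cut = len(self.entries) // 2
            self.entries = [self.summarize(self.entries[:cut])] + self.entries[cut:]

    def render(self) -> str:
        """The text that goes back into the next prompt (and into your logs)."""
        return "\n".join(self.entries)

pad = Scratchpad(max_entries=4)
pad.add("Thought", "need the order status")
pad.add("Action", "lookup_order[A-417]")
pad.add("Observation", "shipped June 3, ground, Zone 4")
pad.add("Thought", "compare the deadline with today")
pad.add("Final", "past the SLA")
print(pad.render())
```

If you also need the full trace for auditing, append every entry to a separate log before folding; the scratchpad is what the model sees, not necessarily everything you keep.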

When Each Technique Is Worth Its Compute

The whole game is matching the method to the task, because every rung up this ladder trades latency and tokens for reliability you may not need.

Plain answer, no reasoning. Lookups, formatting, and classification that a small model nails. If a direct call is reliable, adding reasoning just burns money and time.
Chain-of-Thought. The default for anything with steps: math, multi-condition logic, code, analysis. Cheap, high-payoff, your first reach.
Self-consistency. High-stakes, verifiable answers with a discrete result, where a single chain is occasionally wrong and being wrong is costly. Pay the multiplier for the vote and the confidence signal.
ReAct. Anything that depends on facts outside the model: live data, files, search, actions in a system. Not optional here; it is the only honest way to answer.
Tree-of-Thought. The expensive specialist. Genuine search problems where simpler methods have failed and partial solutions can be scored. Most agents never need it.
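One way to turn that list into a checklist you can apply per task or per route. The trait names are hypothetical, and ReAct is checked first because the list calls it not optional when the answer lives outside the model. In a real agent the techniques combine (ReAct's think step is itself a chain), so read the result as the minimum the task needs, not the whole design.

```python
from dataclasses import dataclass

@dataclass
class Task:
    # Hypothetical traits you would judge for each task your agent handles.
    needs_outside_facts: bool = False
    multi_step: bool = False
    high_stakes_discrete: bool = False
    needs_search: bool = False
    simpler_methods_failed: bool = False
    partial_solutions_scorable: bool = False

def pick_technique(task: Task) -> str:
    """Return the lowest rung of the ladder that the task actually requires."""
    if task.needs_outside_facts:
        return "ReAct"  # the answer lives outside the model
    if (task.needs_search and task.simpler_methods_failed
            and task.partial_solutions_scorable):
        return "Tree-of-Thought"  # all three conditions, or not at all
    if task.high_stakes_discrete:
        return "Self-consistency"
    if task.multi_step:
        return "Chain-of-Thought"
    return "Plain answer"

print(pick_technique(Task()))                                  # Plain answer
print(pick_technique(Task(multi_step=True)))                   # Chain-of-Thought
print(pick_technique(Task(needs_search=True, multi_step=True)))  # Chain-of-Thought
```

The third call is the point: a task that looks like search still stays on a plain chain until simpler methods have actually failed.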

The failure mode in both directions is real. Under-reason and your agent guesses on problems it should have worked through; you will see confident, wrong answers that a chain would have caught. Over-reason and you ship an agent that takes nine seconds and a dollar to answer a question a single call would have nailed, which is its own way of failing in production. Start at the lowest rung that works and climb only when the task forces you to.

Key Takeaways

Reasoning techniques force a model to generate intermediate steps before answering, which is how agents catch their own errors and how you debug them when they fail.
Chain-of-Thought is the cheap, high-payoff default; "think step by step" gives the model the tokens it needs to work through compositional problems.
Self-consistency samples several chains and majority-votes, trading roughly N times the compute for higher accuracy and a confidence signal on high-stakes, verifiable answers.
ReAct interleaves Thought, Action, and Observation so the agent reasons over real tool results instead of stale memory. It is the right pattern whenever the answer lives outside the model.
Tree-of-Thought branches, self-evaluates, and backtracks for genuine search problems, but it is expensive and rarely needed; match the technique to the task and climb the ladder only when forced.




