Based on Agentic Design Patterns by Antonio Gulli (Springer). All book royalties go to Save the Children.

Key Takeaway
Prompt chaining decomposes a complex task into a sequence of LLM calls, where each step's output feeds the next. It trades one unreliable mega-prompt for several focused, inspectable steps, and it is the foundational orchestration pattern every other agentic technique builds on.
Why This Matters for Enterprise AI
Most teams reach for AI by writing one enormous prompt. They cram the instructions, the data, the formatting rules, and the edge cases into a single block of text, send it to the model, and hope the output holds together. It rarely does at production scale. The model drops a constraint, misreads the format, or skips a step you buried in paragraph four.
Prompt chaining is the fix, and it is the pattern every other agentic technique builds on. Instead of asking the model to do everything at once, prompt chaining breaks the work into a sequence of smaller calls, where each step's output feeds the next. The result is a workflow you can inspect, test, and trust. If you have read the foundations of agentic design, this is where the agent's "think it through" step gets real structure.
What Is Prompt Chaining?
Prompt chaining is the practice of decomposing a complex task into a sequence of LLM calls, where the output of one step becomes the input to the next. Antonio Gulli, in Agentic Design Patterns, frames it as the foundational orchestration pattern: a controlled pipeline that trades one unreliable mega-prompt for several reliable small ones.
The mental model is a relay race. Each runner carries the baton a short distance and hands it off. No single runner covers the whole track, and if one stumbles, you know exactly where the handoff failed. Anthropic's Building Effective Agents describes the same idea. It treats prompt chaining as the first workflow worth reaching for before anything more elaborate.

The distinction matters. A single prompt asks the model to plan, reason, retrieve, format, and check its own work in one pass, with no chance to course-correct. A chain gives each of those jobs its own call, its own focused instructions, and its own checkpoint.
How Prompt Chaining Works
A prompt chain runs as an ordered pipeline. The pattern is simple enough to describe in five steps.
- Decompose the task. Break the goal into discrete sub-tasks that have a natural order. "Summarize this contract and flag risky clauses" becomes "extract the clauses," then "classify each one," then "write the risk summary."
- Design each prompt. Write a focused prompt for each sub-task. Each one does a single job, so its instructions stay short and its output stays predictable.
- Pass the output forward. Take the result of step one and insert it into the prompt for step two. This handoff is the chain. The cleaner the output format at each step, the easier the handoff.
- Transform between steps when needed. Real chains rarely pass raw text straight through. You parse JSON, filter a list, or reshape a result before it becomes the next input. This glue code is where a lot of reliability lives.
- Return the final result. The last step produces the answer the user came for, built on the verified work of every step before it.
Each step is a checkpoint. You can log it, validate it, retry it, or swap the model behind it without touching the rest of the pipeline. That inspectability is the whole point of prompt chaining.
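The five steps above can be sketched in plain Python without any framework. This is a minimal illustration, not a reference implementation: `run_chain`, the `{input}` slot name, and `stub_llm` are all hypothetical names introduced here, and `stub_llm` stands in for whatever model client you actually use.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # any function that maps a prompt to a completion


def run_chain(
    templates: List[str], first_input: str, llm: LLM
) -> Tuple[str, List[Dict[str, str]]]:
    """Run prompts in order, feeding each output into the next template's {input} slot."""
    trace: List[Dict[str, str]] = []
    current = first_input
    for template in templates:
        prompt = template.format(input=current)  # pass the output forward
        current = llm(prompt).strip()  # minimal transform between steps
        trace.append({"prompt": prompt, "output": current})  # checkpoint for logging
    return current, trace


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports how many words it was given."""
    return f"response to {len(prompt.split())} words"


templates = [
    "Extract the clauses from this contract:\n{input}",
    "Classify each clause by type:\n{input}",
    "Write a risk summary from these classifications:\n{input}",
]
final, trace = run_chain(templates, "Vendor shall indemnify the customer.", stub_llm)
```

The `trace` list is the inspectable part: every entry holds the exact prompt a step received and the output it produced, so a bad final answer can be traced back to the step that caused it.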
A Concrete Walk-Through
Picture an agent that reviews vendor contracts. As one mega-prompt, the task is "read this contract, find the risky clauses, and write a summary for legal." That single call has to parse a dense document, judge what counts as risky, and produce polished prose all at once. When it misses an indemnification clause, you cannot tell whether it failed to find the clause or failed to flag it.
Broken into a chain, the same job becomes three clean steps. The first extracts every clause into a structured list, and nothing else. The second scores each clause for risk against a rubric, returning a label and a one-line reason per clause. The third takes the scored list and writes the summary a lawyer reads.
Now each step has a tight job, a checkable output, and a clear place to intervene. If the scoring step over-flags, you tune its rubric without touching extraction or drafting. That is the practical payoff of prompt chaining: the work becomes modular instead of monolithic.
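To make the modularity concrete, here is a rough sketch of the three-step shape. The function names, the rubric terms, and the sample contract are assumptions made up for illustration, and each function uses simple string rules where a real chain would prompt a model. The point is only the structure: each step takes a checkable input and returns a checkable output.

```python
from typing import Dict, List


def extract_clauses(contract: str) -> List[str]:
    """Step 1 stand-in: a real chain would prompt a model to return a clause list."""
    return [part.strip() for part in contract.split(".") if part.strip()]


def score_clauses(clauses: List[str], rubric: Dict[str, str]) -> List[Dict[str, str]]:
    """Step 2 stand-in: attach a risk label and a one-line reason to each clause."""
    scored = []
    for clause in clauses:
        hits = [term for term in rubric if term in clause.lower()]
        scored.append({
            "clause": clause,
            "risk": "high" if hits else "low",
            "reason": rubric[hits[0]] if hits else "no rubric term matched",
        })
    return scored


def draft_summary(scored: List[Dict[str, str]]) -> str:
    """Step 3 stand-in: turn the scored list into the text a lawyer reads."""
    flagged = [item for item in scored if item["risk"] == "high"]
    lines = [f"- {item['clause']}: {item['reason']}" for item in flagged]
    return f"{len(flagged)} of {len(scored)} clauses flagged.\n" + "\n".join(lines)


# Hypothetical rubric: tuning it changes scoring without touching steps 1 or 3.
RUBRIC = {
    "indemnif": "shifts liability to one party",
    "auto-renew": "locks in the term",
}
contract = (
    "Vendor shall indemnify Customer for all claims. "
    "Fees are due in 30 days. "
    "The term will auto-renew annually."
)
clauses = extract_clauses(contract)
scored = score_clauses(clauses, RUBRIC)
summary = draft_summary(scored)
```

If the scoring step over-flags, the change is confined to `RUBRIC` or `score_clauses`; extraction and drafting keep working against the same data shapes.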
Code Example (Abbreviated)
Here is a two-step chain in LangChain: summarize a document, then extract action items from that summary. The output of the first call is piped directly into the second.
# Abbreviated: illustrative two-step chain, not production code
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o")  # reads OPENAI_API_KEY from the environment

# Step 1 prompt: condense the document
summarize = ChatPromptTemplate.from_template(
    "Summarize the document in 3 sentences:\n\n{document}"
)
# Step 2 prompt: work only from the summary
extract = ChatPromptTemplate.from_template(
    "List the action items from this summary as bullets:\n\n{summary}"
)

# Step 1 output flows into Step 2 as summary
chain = (
    {"summary": summarize | llm | StrOutputParser()}
    | extract
    | llm
    | StrOutputParser()
)

long_text = "..."  # placeholder: replace with the document to process
result = chain.invoke({"document": long_text})
The same shape holds in Google ADK, where a SequentialAgent runs sub-agents in order and passes state between them through output keys. The framework changes; the pattern does not.
Why Chaining Beats One Mega-Prompt
The instinct to write one big prompt comes from a reasonable place. Fewer calls feel simpler and cheaper. Reliability does not scale that way, though, and prompt chaining wins on it for three concrete reasons.
It reduces cognitive load on the model. A model given one job at a time performs that job better than a model juggling six. Each focused prompt narrows the space of things that can go wrong. You stop asking the model to hold the entire problem in working memory.
It makes failure legible. When a mega-prompt produces a bad answer, you have no idea which part broke. When step three of a chain fails, you know it was step three. You can read its input, see its output, and fix that link without rewriting everything around it.
It lets you mix models and tools. A chain can run a cheap fast model for extraction, a stronger model for reasoning, and a tool call in the middle for live data. One prompt forces one model to do all of it. Prompt chaining is also the on-ramp to tool use in AI agents, since any step in the chain can be an API call instead of another LLM invocation.
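A small sketch of that mixing, under stated assumptions: the step functions below are stand-ins (a regex in place of a cheap extraction model, a dictionary in place of a live inventory API, an f-string in place of a stronger model), and every name and stock value is invented for illustration. What it shows is that the chain does not care whether a step is a model or a tool, as long as each step accepts and returns the same state shape.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Tuple[str, Callable[[State], State]]

# Hypothetical stock levels standing in for a live inventory API.
FAKE_INVENTORY = {"SKU-42": 17, "SKU-7": 0}


def extract_sku(state: State) -> State:
    """Stand-in for a small, fast model that pulls one field out of the request."""
    match = re.search(r"SKU-\d+", state["request"])
    if match is None:
        raise ValueError("extract: no SKU found in request")
    return {**state, "sku": match.group(0)}


def lookup_stock(state: State) -> State:
    """Tool step: no model involved, just a data lookup."""
    return {**state, "units": FAKE_INVENTORY[state["sku"]]}


def write_answer(state: State) -> State:
    """Stand-in for a stronger model that phrases the final reply."""
    return {**state, "answer": f"{state['sku']}: {state['units']} units in stock"}


def run_steps(steps: List[Step], state: State) -> State:
    """Run each step in order and record which ones executed."""
    ran: List[str] = []
    for name, step in steps:
        state = step(state)
        ran.append(name)
    return {**state, "ran": ran}


steps: List[Step] = [
    ("extract", extract_sku),
    ("lookup", lookup_stock),
    ("answer", write_answer),
]
result = run_steps(steps, {"request": "How many units of SKU-42 do we have?"})
print(result["answer"])  # SKU-42: 17 units in stock
```

Swapping the model behind one step means replacing one function in the `steps` list; the other steps are untouched.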
Enterprise reality: A contract-review chain that extracts clauses, scores each for risk, then drafts a summary as three separate steps is something your legal team can audit. They can see which clause was flagged, the reason it was flagged, and the step that flagged it. A single 2,000-word prompt that spits out a verdict is a black box, and black boxes do not pass compliance review.
Failure Modes to Plan For
Prompt chaining buys reliability, but it introduces its own risks. Two matter most.
Error propagation. Each step trusts the output of the one before it. If step one summarizes the wrong section, step two faithfully extracts action items from the wrong summary, and the mistake compounds down the chain. The defense is validation between steps. Check the output format, sanity-check the content, and fail loudly when a step returns something unexpected rather than passing bad data forward.
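One way to put validation between steps is a small gate that parses a step's raw output and raises before anything is passed forward. The sketch below assumes the upstream step was asked to return a JSON list of strings; `StepValidationError` and `parse_action_items` are hypothetical names, and the checks are examples rather than a complete policy.

```python
import json
from typing import List


class StepValidationError(ValueError):
    """Raised when a step's output is not safe to pass to the next step."""


def parse_action_items(raw: str, step: str = "extract_action_items") -> List[str]:
    """Check format and basic content of a step's output, or fail loudly."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StepValidationError(f"{step}: output is not valid JSON") from exc
    if not isinstance(data, list):
        raise StepValidationError(f"{step}: expected a JSON list")
    if not data:
        raise StepValidationError(f"{step}: list is empty")
    if not all(isinstance(item, str) and item.strip() for item in data):
        raise StepValidationError(f"{step}: every item must be a non-empty string")
    return data


# A well-formed output passes through unchanged.
items = parse_action_items('["Email the vendor", "Book the review"]')

# A chatty, non-JSON output stops the chain with the step name in the error.
try:
    parse_action_items("Sure! Here are the action items you asked for.")
    error_message = ""
except StepValidationError as err:
    error_message = str(err)
```

Because the error names the step, a failure here points at the link to fix, and the next step never sees the bad data.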
Latency and cost. A chain of five sequential calls takes roughly five times as long as one comparable call, and it usually costs more in tokens, because the context often grows at each step. For a user waiting on a response, that adds up fast. The fix is to chain only what needs sequencing, run independent steps concurrently where you can, and use smaller models for the simple links. The next pattern in this series, routing and parallelization, exists largely to claw back the latency that naive chaining gives away.
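The latency difference between sequential and concurrent execution can be seen with a toy timing sketch. Here `fake_call` is a hypothetical stand-in that sleeps for a fixed delay instead of calling a model, and the three step names are made up; it applies only to steps that do not depend on each other's output.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

NAMES = ["extract_dates", "extract_parties", "extract_amounts"]
DELAY = 0.1  # seconds per simulated call


async def fake_call(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one model call with fixed latency."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential() -> List[str]:
    """Await each call before starting the next, as a naive chain would."""
    return [await fake_call(name, DELAY) for name in NAMES]


async def run_concurrent() -> List[str]:
    """Start all independent calls together and wait for the slowest one."""
    return list(await asyncio.gather(*(fake_call(name, DELAY) for name in NAMES)))


def timed(fn: Callable[[], Awaitable[List[str]]]) -> Tuple[List[str], float]:
    """Run an async function to completion and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(fn())
    return result, time.perf_counter() - start


seq_result, seq_seconds = timed(run_sequential)
con_result, con_seconds = timed(run_concurrent)
print(f"sequential: {seq_seconds:.2f}s, concurrent: {con_seconds:.2f}s")
```

Both versions return the same results in the same order. The sequential run pays for every delay in turn, while the concurrent run pays roughly for the longest single call.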
A third, quieter risk is over-decomposition. Splitting a task into fifteen steps when three would do adds handoffs and latency and bug surface without buying any reliability. Chain for a reason, not for its own sake.
When to Use Prompt Chaining (and When Not To)
Prompt chaining is the right tool when the task has a clear sequence of dependent stages.
- Reach for it when each step depends on the previous one: extract then summarize, draft then critique then revise, translate then localize then format.
- It earns its keep when intermediate outputs need validation, logging, or human review before the work continues.
- The same goes for tasks where different stages call for different models, tools, or temperatures.
- And it is the right call when a single prompt has grown so long and conditional that you can no longer reason about what it will do.
Some tasks do not need a chain at all.
- Skip it for simple, single-shot requests. A one-line definition does not need a pipeline.
- Skip it when the sub-tasks are independent rather than sequential. If three things can happen at once, parallelize them instead.
- Skip it when latency is critical and one well-written prompt handles the task reliably on its own.
The honest test is whether the handoffs are earning their keep. If breaking the task into steps makes the output more reliable or more inspectable, chain it. If it just adds calls, do not.
Key Takeaways
- Prompt chaining decomposes a complex task into a sequence of LLM calls, where each step's output becomes the next step's input.
- It beats one mega-prompt on reliability because each focused step is easier for the model to get right, easier to debug, and easier to audit.
- The five-step shape: decompose the task, design each prompt, pass the output forward, transform between steps, and return the final result.
- The main risks are error propagation and added latency. Validate between steps, and chain only what needs sequencing.
- Prompt chaining is the foundational orchestration pattern. Routing, parallelization, reflection, and planning all build on top of it.