The Definitive Glossary of Agentic AI: 100+ Terms Explained

Q: What is agentic AI?

Agentic AI refers to systems that use a language model to pursue goals with initiative and autonomy, rather than only responding to prompts. An agentic system perceives a situation, decides on a course of action, takes that action using tools, observes the result, and repeats until the goal is met. The difference from a plain chatbot is that an agent does things in the world, not just generates text.

Q: What is the difference between MCP and A2A?

MCP (Model Context Protocol) is a standard for connecting a single agent to external tools and data sources through a uniform interface, so one integration works across compatible clients. A2A (Agent-to-Agent) is a standard for letting separate agents, often built by different teams or vendors, discover each other and collaborate. MCP connects an agent to tools; A2A connects agents to other agents.

Q: What is the difference between a context window and memory in an AI agent?

A context window is the fixed amount of text, measured in tokens, that a model can consider in a single call, and anything outside it is invisible to the model in that moment. Memory is the set of mechanisms an agent uses to retain and recall information across steps or sessions, stored outside the context window in something like a database or vector store. Memory is how an agent remembers a user between sessions; the context window is only what fits in the current request.

Space & Story Team

Part of Agentic Design Patterns: The Complete Guide to Building Intelligent AI Systems

Based on Agentic Design Patterns by Antonio Gulli (Springer). All book royalties go to Save the Children.

Space & Story Team · June 15, 2026 · 16 min read

Key Takeaway

A reference glossary of 100+ agentic AI terms, grouped by theme from Foundations and Orchestration to Memory, Reasoning, Reliability, and Protocols. Keep it open while you design, review, or settle which thing someone meant.

Why This Matters for Enterprise AI

Every team adopting agents ends up arguing past each other. One engineer says "memory" and means the context window, while the person across the table hears it as a vector store of past conversations. A product manager asks for "an agent" and receives a chatbot, because nobody agreed on what the word covers. The vocabulary of agentic AI moved faster than the shared understanding of it, and the gap shows up as rework, mis-scoped projects, and meetings that go in circles.

A glossary fixes the cheapest failure mode there is: people using the same word for different things. This page collects the terms that run through Space & Story's Agentic Design Patterns series and the wider field, grounded in Antonio Gulli's Agentic Design Patterns (Springer) and Anthropic's Building Effective Agents. Definitions are short on purpose. The goal is a reference you can keep open while you design, review a pull request, or settle which thing someone meant.

Figure (abstract grid of connected nodes representing a shared vocabulary of agentic AI terms): A shared vocabulary is the cheapest reliability investment a team building agents can make. The same word should mean the same thing across every conversation.

The terms below are grouped by theme so related ideas sit together. Skim a section to orient, or search the page for a specific word.

Foundations

The base vocabulary. If a term shows up in every other section, it lives here.

AI agent: A system that uses a language model to perceive a situation, decide on a course of action toward a goal, and take that action, often over multiple steps. Unlike a static model, an agent does things rather than only generating text. See what makes a system an agent.
LLM (large language model): A neural network trained on large text corpora to predict the next token, which lets it generate and transform language. The LLM is the reasoning core an agent is built around.
Agentic: A property of a system that acts with initiative and autonomy toward goals, rather than only responding to direct prompts. "Agentic" describes behavior; an "agent" is the system that exhibits it.
Agentic loop: The repeating cycle an agent runs: take in a goal, gather context, decide on a next step, act, observe the result, and repeat until the goal is met. Often summarized as perceive, reason, act, observe.
Autonomy: The degree to which a system acts without human intervention. More autonomy means fewer approval gates and more decisions the agent makes on its own.
Levels of autonomy: A progression from a bare model (Level 0), to a model plus tools (Level 1), to a planning and self-improving agent (Level 2), to coordinated multi-agent systems (Level 3). The levels describe how much an agent can do unassisted.
Foundation model: A large model trained on broad data that can be adapted to many downstream tasks. LLMs are the language-centered kind of foundation model, and many now accept images or audio as well.
Inference: The act of running a trained model to produce an output from an input. Every agent step that calls the model is an inference.
Token: The unit a model reads and writes: a word, sub-word, or character chunk. Token counts drive both cost and the limits of what fits in a single call.
Prompt: The input text given to a model, including instructions, context, and the task. Beyond what it learned in training, the prompt is the only thing a model knows about the current request.
Completion (response): The text a model generates in reply to a prompt. In an agent, one step's completion often becomes the next step's input.
System prompt: A persistent instruction block that sets the model's role, rules, and constraints for an entire session, separate from the user's per-turn input.
Chatbot: A conversational interface over an LLM that answers turn by turn. A chatbot becomes an agent only when it gains tools, memory, planning, or the ability to act.
Workflow vs. agent: A workflow follows a predefined path of LLM calls; an agent decides its own path at runtime. Anthropic draws this line to recommend the simplest design that works.
Determinism: Whether the same input reliably produces the same output. LLMs are non-deterministic by default, which is why agent design leans on validation and guardrails.
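
The agentic loop defined above can be sketched in a few lines. This is a minimal illustration rather than a production design: `stub_policy` is a hypothetical stand-in for a real LLM call, and the `lookup` tool and its canned data are invented for the example.

```python
from typing import Callable

# Hypothetical tool: a canned lookup standing in for a real API or database.
FACTS = {"capital of France": "Paris"}

def lookup(query: str) -> str:
    return FACTS.get(query, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"lookup": lookup}

def stub_policy(goal: str, history: list[tuple[str, str]]) -> tuple[str, str]:
    """Stand-in for the model's decision step (an assumption, not a real LLM).

    Returns (action, argument). With no observations yet it calls the lookup
    tool; once it has an observation it finishes with that observation.
    """
    if not history:
        return ("lookup", goal)
    return ("finish", history[-1][1])

def run_agent(goal: str, max_steps: int = 5) -> tuple[str, list[tuple[str, str]]]:
    history: list[tuple[str, str]] = []  # state carried between steps
    for _ in range(max_steps):
        action, argument = stub_policy(goal, history)  # reason and decide
        if action == "finish":
            return argument, history
        observation = TOOLS[action](argument)          # act
        history.append((action, observation))          # observe
    raise RuntimeError("step budget exhausted before the goal was met")

answer, steps = run_agent("capital of France")
print(answer, steps)
```

The `max_steps` cap is a design choice worth keeping in any real loop: it bounds cost and prevents an agent that never decides to finish from running forever.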

Orchestration & Workflows

How multiple model calls and steps get arranged into something reliable.

Orchestration: The coordination of multiple LLM calls, tools, and steps into a coherent workflow. Orchestration is the layer that turns single model calls into an agent.
Prompt chaining: Decomposing a task into a sequence of LLM calls, where each step's output feeds the next. It trades one unreliable mega-prompt for several focused, inspectable steps. See prompt chaining.
Routing: Classifying an input and directing it to the most appropriate model, prompt, or sub-agent. Routing sends a billing question and a technical question down different paths. See routing and parallelization.
Parallelization: Running independent sub-tasks at the same time instead of in sequence, then combining the results. It claws back the latency that naive chaining gives away. See routing and parallelization.
Sectioning: A parallelization style that splits a task into independent pieces handled concurrently, such as one call summarizing a document while a separate call checks it for policy violations.
Voting: A parallelization style that runs the same task several times and aggregates the answers, used to raise confidence or catch outliers.
Orchestrator-worker: A pattern where a lead agent breaks a goal into sub-tasks, delegates them to worker agents, and synthesizes their results. The orchestrator plans; the workers execute.
Evaluator-optimizer: A loop where one model generates an output and another critiques it, repeating until the critique is satisfied. It is reflection split across two roles.
Pipeline: An ordered series of processing steps where data flows from one stage to the next. A prompt chain is a pipeline of LLM calls.
State: The information an agent carries forward between steps: the goal, intermediate results, and history. Passing state cleanly between steps is where much of a workflow's reliability lives.
Handoff: The transfer of control or context from one step, tool, or agent to the next. Clean handoffs depend on predictable output formats.
Control flow: The logic that decides which step runs next: in sequence, in parallel, in a loop, or conditionally. In agents, the model itself often drives control flow.
Graph (agent graph): A representation of an agent as nodes (steps) connected by edges (transitions), used by frameworks like LangGraph to make control flow explicit and inspectable.
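
Prompt chaining and routing, as defined above, are easy to see in miniature. In this sketch every step function is a hypothetical stand-in for one focused LLM call, and the keyword router is a placeholder for a classifier or a model call. None of the names come from a real framework.

```python
from typing import Callable

Step = Callable[[str], str]

# Each function is a hypothetical stand-in for one focused LLM call.
def normalize(text: str) -> str:
    return text.strip().lower()

def first_sentence(text: str) -> str:
    return text.split(".")[0]

def label(text: str) -> str:
    return f"SUMMARY: {text}"

def run_chain(steps: list[Step], user_input: str) -> tuple[str, list[tuple[str, str]]]:
    """Prompt chaining: each step's output is the next step's input."""
    state = user_input
    trace = []
    for step in steps:
        state = step(state)
        trace.append((step.__name__, state))  # inspectable intermediate results
    return state, trace

def route(query: str) -> str:
    """Routing: classify the input, then pick a path.

    Keyword matching is a placeholder for a classifier or an LLM call.
    """
    billing_words = ("invoice", "refund", "charge")
    return "billing" if any(word in query.lower() for word in billing_words) else "technical"

result, trace = run_chain([normalize, first_sentence, label], "  Agents act. They also plan. ")
print(result)
print(route("Where is my refund?"), route("The API returns a 500 error"))
```

Keeping a trace of every intermediate state is what makes a chain inspectable: when the final output is wrong, the trace shows which step introduced the error.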

Tools & Integration

How an agent reaches beyond text to act on the world.

Tool use: An agent's ability to call external functions, APIs, databases, or services to gather information or take action beyond its training data. Tool use is what turns a text generator into a doer. See tool use in AI agents.
Function calling: The mechanism by which a model outputs a structured request to invoke a named function with arguments, which the surrounding code then executes. It is the concrete implementation of tool use.
Tool: A capability exposed to an agent, defined by a name, a description, and an input schema. The description teaches the model when and how to use it.
Tool schema: The structured definition (usually JSON Schema) of a tool's name, purpose, and parameters. A clear schema is what lets a model call a tool correctly.
API (application programming interface): A defined way for software to talk to other software. Most agent tools are wrappers around APIs.
MCP (Model Context Protocol): An open standard that defines how agents connect to external tools and data sources through a uniform interface, so an integration built once works across compatible clients. It is the USB-C port of agent tooling: one connector, many devices. See the Model Context Protocol.
MCP server: A service that exposes tools, resources, or prompts to agents over MCP. One server can serve many different agent clients.
MCP client: The agent-side component that connects to MCP servers and makes their tools available to the model.
A2A (Agent-to-Agent): An open protocol that lets agents built by different teams or vendors discover each other and collaborate. MCP connects an agent to tools; A2A connects agents to other agents. See the Agent-to-Agent protocol.
Code execution: Giving an agent a sandbox to write and run code, turning open-ended computation (math, data wrangling, plotting) into a single tool.
Sandbox: An isolated environment where an agent's tool calls or generated code run without access to production systems. Sandboxing is a core safety control for action-taking agents.
Computer use: A capability that lets an agent operate a graphical interface by reading the screen and controlling the mouse and keyboard, used when no API exists.
Retrieval tool: A tool that fetches relevant documents or records on demand, the bridge between an agent and a knowledge base.
Structured output: Model output constrained to a defined format such as JSON, so downstream code can parse it reliably. Structured output is what makes tool handoffs predictable.
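
Function calling, tool schemas, and structured output fit together as sketched below. The `get_weather` tool, its schema, and the JSON shape of the request are assumptions made for illustration and do not reproduce any specific provider's API. The dispatcher checks only that required keys are present, not their types.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool body: a real one would call a weather API.
    return f"Sunny in {city}"

# Tool registry: name, description, and a JSON Schema style input schema.
TOOLS = {
    "get_weather": {
        "description": "Return the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "function": get_weather,
    }
}

def dispatch(model_output: str) -> str:
    """Parse a structured tool request and execute it."""
    call = json.loads(model_output)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    arguments = call.get("arguments", {})
    missing = [key for key in tool["input_schema"]["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return tool["function"](**arguments)

# The string below plays the role of the model's structured output.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

The model never runs anything itself. It emits a request, and the surrounding code decides whether and how to execute it, which is also where sandboxing and permission checks belong.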

Memory & Knowledge

How agents access information they were not trained on and remember across time.

Context window: The maximum amount of text, measured in tokens, a model can consider in a single call. Everything the model "knows" in the moment must fit inside it.
Memory: The mechanisms an agent uses to retain and recall information across steps or sessions, distinct from the fixed context window. See memory in AI agents.
Short-term memory: Information held within the current context window or session, such as the running conversation. Older content drops out as the window fills, and the rest is gone when the session ends.
Long-term memory: Information persisted outside the context window, in a database or vector store, and retrieved when relevant. It is how an agent remembers a user across sessions.
Episodic memory: Memory of specific past events or interactions, such as "last week this user asked about refunds." It anchors an agent in its own history.
Semantic memory: Memory of general facts and concepts an agent holds about the world or a domain, independent of when they were learned.
Procedural memory: Memory of how to perform tasks, encoded as learned skills, routines, or reusable instructions rather than facts.
RAG (Retrieval-Augmented Generation): A pattern that retrieves relevant external documents and inserts them into the prompt so the model answers from current, specific information instead of training data alone. See RAG for AI agents.
Embedding: A numeric vector that captures the meaning of a piece of text, so that semantically similar texts sit close together in vector space. Embeddings power semantic search and retrieval.
Vector store (vector database): A database that indexes embeddings and returns the nearest matches to a query vector. It is the storage layer most RAG and long-term memory systems run on.
Semantic search: Search that ranks results by meaning rather than exact keyword match, using embedding similarity. It finds the relevant passage even when the words differ.
Chunking: Splitting documents into smaller passages before embedding them, so retrieval returns focused pieces rather than whole files. Chunk size is a key tuning knob in RAG quality.
Knowledge base: The corpus of documents, records, or facts an agent retrieves from. Its quality sets the ceiling on what RAG can return.
Grounding: Tying a model's output to verifiable source information, so claims trace back to retrieved evidence rather than the model's parameters. Grounding is the main defense against hallucination.
Context engineering: The discipline of selecting, formatting, and managing exactly the right information to put in the context window at each step. It is as decisive for agent quality as the model choice.
Prompt engineering: The practice of designing prompts and instructions to get reliable, accurate model behavior. It governs how a single call performs; context engineering governs what that call sees.
Retrieval: The step of fetching relevant information from a knowledge base in response to a query, the "R" in RAG.
Re-ranking: A second pass that reorders retrieved candidates by relevance using a stronger model, improving what reaches the prompt.
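
Embeddings, a vector store, retrieval, and RAG can be shown end to end with a toy example. The sketch below assumes a bag-of-words "embedding" purely so it runs without any external model. A real system would call an embedding model and a proper vector database, and the `VectorStore` class here is hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, store: VectorStore) -> str:
    """RAG: put the retrieved chunk into the prompt ahead of the question."""
    context = "\n".join(store.search(question, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

store = VectorStore()
store.add("refunds are processed within five business days")
store.add("the office is closed on public holidays")
print(build_prompt("how long do refunds take", store))
```

Word counts only match exact words, so this toy version is closer to keyword search than to true semantic search. Swapping `embed` for a learned embedding is what lets retrieval find a relevant passage when the wording differs.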

Reasoning & Planning

How agents think through problems and decide what to do.

Reasoning: The process by which a model works through a problem in intermediate steps to reach a conclusion, rather than answering in one leap. See reasoning techniques.
Chain-of-thought (CoT): Prompting a model to show its intermediate reasoning steps before the final answer, which improves accuracy on multi-step problems. The thinking becomes visible and checkable.
Tree-of-thought (ToT): An extension of chain-of-thought that branches into several candidate reasoning paths, evaluates them, and continues along the most promising while abandoning dead ends, trading compute for harder-problem performance.
Self-consistency: Sampling multiple independent reasoning paths for the same question and taking the majority answer, which smooths over any single bad chain.
ReAct: A pattern that interleaves reasoning and acting: the model thinks, takes a tool action, observes the result, and thinks again. It is the loop most tool-using agents run.
Reflection: An agent reviewing its own output against the goal and revising it, turning a first draft into a corrected one. Reflection is how agents self-correct without a human in the loop. See reflection and adaptation.
Self-critique: A model generating an explicit critique of its own work as the input to a revision step. It is the mechanism inside reflection.
Planning: Decomposing a high-level goal into an ordered set of steps before execution, then carrying them out. Planning is what lets an agent handle tasks too large for a single action. See planning in AI agents.
Task decomposition: Breaking a complex goal into smaller, tractable sub-tasks. It is the first move in both planning and orchestration.
Plan-and-execute: A two-phase approach where the agent first drafts a full plan, then executes the steps, optionally re-planning if reality diverges.
Goal: The objective an agent is trying to achieve, stated by a user or a parent agent. Everything in the agentic loop serves the goal.
Subgoal: An intermediate objective that contributes to the main goal, produced by task decomposition.
Zero-shot prompting: Asking a model to perform a task with instructions only and no worked examples. It relies entirely on the model's pretrained ability.
Few-shot prompting: Giving a model a handful of worked examples in the prompt to demonstrate the desired pattern before the real task. Examples often beat lengthy instructions.
In-context learning: A model's ability to pick up a new task from examples or instructions in the prompt, without any change to its weights. It is what makes few-shot prompting work.
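
Self-consistency reduces to a majority vote over sampled reasoning paths. The sketch below assumes the samples have already been collected from several model calls. The hard-coded `samples` list is invented for illustration and stands in for real model output.

```python
from collections import Counter

def majority_answer(samples: list[tuple[str, str]]) -> tuple[str, float]:
    """Self-consistency: keep the most common final answer across samples.

    Each sample is (reasoning, answer). Only the answers are compared,
    so differently worded reasoning chains can still agree.
    Returns the winning answer and the share of samples that voted for it.
    """
    counts = Counter(answer for _reasoning, answer in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)

# Hypothetical samples for the question "What is 17 + 25?"
samples = [
    ("17 + 25: 17 + 20 = 37, 37 + 5 = 42", "42"),
    ("17 + 25: 10 + 20 = 30, 7 + 5 = 11, so 41", "41"),
    ("25 + 17: 25 + 15 = 40, 40 + 2 = 42", "42"),
]
answer, agreement = majority_answer(samples)
print(answer, round(agreement, 2))
```

The agreement share doubles as a rough confidence signal: a low share is a reasonable trigger for asking a human or sampling more paths.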

Multi-Agent Systems

How several agents divide labor and coordinate.

Multi-agent system (MAS): A system of several agents, often specialized, that coordinate to accomplish a goal no single agent handles well alone. It mirrors how human organizations divide labor. See multi-agent systems.
Specialist agent: An agent scoped to one role or domain, such as a research agent or a coding agent, designed to do one thing well within a larger team.
Orchestrator agent: A coordinating agent that assigns work to specialists and integrates their outputs, the manager in an agent team.
Coordination: The mechanisms by which agents share information, divide tasks, and avoid stepping on each other. Poor coordination is the main reason multi-agent systems underperform a good single agent.
Agent communication: The messages agents exchange to delegate, report, or negotiate, increasingly over standards like A2A so agents from different vendors interoperate.
Role: The defined responsibility and scope assigned to an agent in a team, which shapes its prompt, its tools, and what it is allowed to decide.
Delegation: One agent assigning a sub-task to another and trusting it to return a usable result. Delegation is what makes the orchestrator-worker pattern scale.
Blackboard: A shared memory space where agents post and read intermediate results, used as a coordination medium in some multi-agent designs.
Swarm: A multi-agent approach where many lightweight agents follow simple rules and useful behavior emerges from their interaction, rather than from central control.
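
The orchestrator, specialist, delegation, and blackboard terms above combine as in the following sketch. The two specialist functions are hypothetical stand-ins for agents with their own prompts and tools, and the fixed two-step plan stands in for a plan that a model would normally produce.

```python
from typing import Callable

# Hypothetical specialists: each would normally wrap its own model call and tools.
def research_agent(task: str) -> str:
    return f"notes on {task}"

def writing_agent(task: str) -> str:
    return f"draft about {task}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "writing": writing_agent,
}

def orchestrate(goal: str) -> tuple[str, dict[str, str]]:
    """Orchestrator-worker: decompose, delegate, then synthesize."""
    # Fixed plan for illustration; a real orchestrator would ask a model to plan.
    plan = [("research", goal), ("writing", goal)]
    blackboard: dict[str, str] = {}  # shared space where workers post results
    for role, task in plan:
        blackboard[role] = SPECIALISTS[role](task)  # delegation
    summary = " | ".join(blackboard[role] for role, _ in plan)  # synthesis
    return summary, blackboard

summary, blackboard = orchestrate("vector stores")
print(summary)
```

Keying the blackboard by role keeps each specialist's output separate, so a later step can read exactly the result it depends on.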

Reliability & Safety

How agents are kept accurate, controlled, and safe to deploy.

Hallucination: A model generating output that is fluent but false or unsupported by any source. It is the central reliability risk in any LLM system, and grounding is the main mitigation.
Guardrails: Rules and checks that constrain what an agent can say or do, blocking unsafe, off-policy, or out-of-scope actions. Guardrails are what make an autonomous agent safe to deploy. See guardrails and safety.
Prompt injection: An attack where malicious instructions hidden in user input or retrieved content hijack an agent's behavior. It is the agent-era equivalent of a code-injection vulnerability.
Jailbreak: A crafted input that bypasses a model's safety training to elicit prohibited output. Defending against jailbreaks is part of any guardrail strategy.
Human-in-the-loop (HITL): A design where a person reviews or approves an agent's actions at defined checkpoints, especially before high-stakes or irreversible steps. See exception handling and human-in-the-loop.
Human-on-the-loop: A lighter-touch oversight model where a person monitors an agent running autonomously and intervenes only when needed, rather than approving every step.
Exception handling: How an agent detects and recovers from failures: a tool error, an invalid output, or an unexpected state. Handling these gracefully separates a demo from a production agent.
Fallback: A predefined safe alternative an agent takes when its primary path fails, such as escalating to a human or returning a cautious default.
Confidence: A signal of how sure an agent is about an answer or action, used to decide when to ask for help or trigger a fallback.
Validation: Checking an output against expected format or content before passing it forward, the between-steps defense against error propagation.
Error propagation: When a mistake early in a chain is faithfully carried forward and compounded by later steps. Validation between steps is the standard guard against it.
Alignment: How well a model's behavior matches human intent and values. Agent guardrails operate on top of a model's underlying alignment training.
Content moderation: Filtering inputs and outputs for unsafe, harmful, or non-compliant material, often as an explicit guardrail layer around the model.
Least privilege: Granting an agent only the permissions and tool access it strictly needs, so a compromised or confused agent can do limited damage.
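
Validation, exception handling, and fallback work as a unit. The sketch below assumes the agent must return JSON with a non-negative `amount` field. That schema, the `flaky_model` stub, and the `escalated_to_human` status label are all invented for illustration.

```python
import json
from typing import Callable, Optional

def validate(raw: str) -> Optional[dict]:
    """Return the parsed output if it matches the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    amount = data.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        return None
    return data

def call_with_fallback(generate: Callable[[int], str], max_attempts: int = 3) -> dict:
    """Retry until the output validates; otherwise fall back to a human."""
    for attempt in range(1, max_attempts + 1):
        data = validate(generate(attempt))
        if data is not None:
            return {"status": "ok", "data": data, "attempts": attempt}
    return {"status": "escalated_to_human", "data": None, "attempts": max_attempts}

def flaky_model(attempt: int) -> str:
    # Hypothetical model stub: free text on the first try, valid JSON afterwards.
    return "Sure! The refund is 12.50" if attempt == 1 else '{"amount": 12.5}'

print(call_with_fallback(flaky_model))
print(call_with_fallback(lambda attempt: "not json"))
```

Checking the output before it moves forward is what stops error propagation: an invalid result is retried or escalated instead of being handed to the next step.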

Operations & Evaluation

How agents are measured, observed, and run in production.

Evaluation (eval): The systematic measurement of how well an agent performs against a defined set of test cases or criteria. Without evals, "it seems better" is the only signal you have. See monitoring and evaluation.
LLM-as-judge: Using a separate model to score or grade another model's output against a rubric, which makes evaluating open-ended responses scalable. The judge prompt is itself something you tune and test.
Benchmark: A standardized dataset and scoring method used to compare models or agents on a defined task. Benchmarks measure general capability, not your specific use case.
Golden dataset: A curated set of inputs with known-good outputs, used as the reference for regression testing an agent over time.
Observability: The ability to understand an agent's internal behavior from the outputs it emits: logs, traces, and metrics. You cannot debug what you cannot see.
Tracing: Recording the full sequence of an agent's steps, tool calls, prompts, and responses for a single run, so a failure can be reconstructed end to end.
Logging: Capturing discrete events and data points during an agent's execution, the raw material of observability.
Monitoring: Tracking an agent's health, performance, and quality metrics over time in production, with alerts when they drift.
Latency: The time between a request and the agent's response. Each added step, tool call, or model invocation increases it, which is why parallelization matters.
Throughput: The number of requests or tasks an agent system can handle per unit of time, a capacity measure distinct from per-request latency.
Cost-per-task: The total spend (tokens, tool calls, compute) to complete one unit of work. It is the unit economics number that decides whether an agent is viable at scale. See resource optimization.
Token cost: The price charged per input and output token by a model provider, the dominant variable cost in most agent systems.
Caching: Reusing previously computed results, such as prompt prefixes or retrieval results, to cut latency and cost on repeated work.
Prompt caching: Caching the processed form of a repeated prompt prefix (a long system prompt or document) so it is not re-processed from scratch on every call. Providers that support it typically bill cached tokens at a reduced rate rather than for free.
Model cascade: Routing simple cases to a cheap, fast model and escalating only hard cases to an expensive one, a core cost-optimization tactic.
Regression: A drop in agent quality on previously passing cases after a change. Golden datasets and evals exist to catch regressions before users do.
Drift: Gradual degradation in agent performance as the world, the data, or the model behind it changes over time.
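
Cost-per-task is simple arithmetic once token counts are logged. The sketch below uses made-up prices, expressed per million tokens, purely to show the calculation. Real prices vary by provider and model, and the `Call` record is a hypothetical structure.

```python
from dataclasses import dataclass

@dataclass
class Call:
    """Token usage for one model call within a task."""
    input_tokens: int
    output_tokens: int

def cost_per_task(
    calls: list[Call],
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    tool_cost: float = 0.0,
) -> float:
    """Total spend for one task: token cost across all calls plus tool fees."""
    token_cost = sum(
        call.input_tokens * input_price_per_mtok + call.output_tokens * output_price_per_mtok
        for call in calls
    ) / 1_000_000
    return token_cost + tool_cost

# Hypothetical task: two model calls plus one paid tool call.
calls = [Call(input_tokens=2000, output_tokens=500), Call(input_tokens=3000, output_tokens=1000)]
total = cost_per_task(calls, input_price_per_mtok=3.0, output_price_per_mtok=15.0, tool_cost=0.01)
print(f"{total:.4f}")
```

With these assumed numbers, 5,000 input tokens and 1,500 output tokens cost 0.0375, and the tool fee brings the task to 0.0475. Multiplying that figure by expected task volume gives the number that decides whether an agent is viable at scale.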

Models & Tuning

The model layer agents are built on, and how it gets adapted.

Fine-tuning: Further training a base model on a focused dataset to specialize its behavior or style. It bakes knowledge into weights, whereas RAG keeps knowledge external.
Pretraining: The initial, large-scale training that gives a foundation model its general capabilities, before any task-specific adaptation.
RLHF (Reinforcement Learning from Human Feedback): A training stage that aligns a model to human preferences using human judgments of model outputs, typically through a reward model trained on those judgments. It is much of why instruction-following models feel helpful.
Temperature: A sampling setting that controls randomness in a model's output: low for focused and near-deterministic, high for varied and creative. Agents usually run low for predictability.
Top-p (nucleus sampling): A sampling setting that restricts choices to the smallest set of most-likely tokens whose cumulative probability reaches p, another lever on output randomness.
Parameters (weights): The learned numeric values that define a model's behavior. Parameter count is a rough, imperfect proxy for capability.
Quantization: Compressing a model's weights to lower numeric precision to shrink memory use and speed inference, usually with minor quality loss.
Distillation: Training a smaller model to imitate a larger one, yielding a cheaper model that retains much of the original's quality for a target task.
Multimodal model: A model that handles more than text, such as images, audio, or video, alongside language. Multimodal agents can read a screenshot or a chart.
Reasoning model: A model trained or configured to spend extra inference compute on internal deliberation before answering, trading latency for accuracy on hard problems.
Context length: The size of a model's context window, a headline spec that bounds how much an agent can consider at once.
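
Temperature and top-p are easier to reason about with the arithmetic in front of you. The sketch below is a plain-Python illustration of the two sampling settings on a toy three-token distribution. It assumes a temperature above zero and is not how any particular inference library implements them.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities. Lower temperature sharpens the distribution.

    Assumes temperature > 0.
    """
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def top_p_filter(probs: list[float], p: float) -> dict[int, float]:
    """Nucleus sampling filter: keep the most likely tokens until their
    cumulative probability reaches p, then renormalize the survivors."""
    order = sorted(range(len(probs)), key=lambda index: probs[index], reverse=True)
    kept: list[int] = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= p:
            break
    total = sum(probs[index] for index in kept)
    return {index: probs[index] / total for index in kept}

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more variety
print(top_p_filter([0.5, 0.3, 0.2], 0.7))     # the least likely token is dropped
```

Both settings shape the same distribution: temperature rescales it, and top-p then trims its tail before a token is drawn.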

Ecosystem & Protocols

The frameworks, standards, and surrounding pieces teams build with.

Agent framework: A library that provides the scaffolding for building agents: orchestration, tool integration, memory, and state. Examples include LangChain, LangGraph, CrewAI, and Google's Agent Development Kit.
LangChain: A widely used framework for composing LLM calls, tools, and chains into applications, often the first stop for building agentic workflows.
LangGraph: A framework for building agents as explicit graphs of nodes and edges, giving fine control over state and control flow.
CrewAI: A framework oriented around multi-agent teams, where role-based agents collaborate on a shared task.
Google ADK (Agent Development Kit): Google's toolkit for building and deploying agents, with primitives like sequential and parallel agents for orchestration.
MCP (protocol): The Model Context Protocol again, listed here as ecosystem plumbing: the standard that lets tools and agents interoperate without bespoke integrations.
llms.txt: A proposed root-level file that gives LLMs a concise, machine-readable brief on a site's content and structure, an emerging AI-discoverability convention.
Agentic RAG: RAG where an agent actively decides what to retrieve, when, and from where, rather than running a single fixed retrieval step. It is retrieval under the agent's control.
Agentic workflow: Any process where an LLM directs a multi-step sequence of reasoning and actions toward a goal, the umbrella term for the patterns in this series.
Toolkit (agent toolkit): A curated set of frameworks, protocols, and services for building agents in practice. See the agentic AI toolkit.
Token budget: The cap a team sets on tokens per task or session to control cost and keep context windows manageable, a practical lever in resource optimization.
Streaming: Returning a model's output incrementally as it is generated, rather than all at once, which improves perceived latency in interactive agents.
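
Streaming and a token budget can be combined in a few lines. In this sketch a generator stands in for a streaming model response, and whitespace-separated words stand in for tokens. Both are simplifying assumptions, since real tokenizers and streaming APIs differ.

```python
from typing import Iterable, Iterator

def stream_tokens(text: str) -> Iterator[str]:
    """Hypothetical streaming response: yields one word at a time."""
    for word in text.split():
        yield word

def consume_with_budget(stream: Iterable[str], token_budget: int) -> list[str]:
    """Handle output incrementally and stop once the budget is spent."""
    received: list[str] = []
    for token in stream:
        if len(received) >= token_budget:
            break  # budget reached: stop reading the stream
        received.append(token)  # a UI would render each token as it arrives
    return received

print(consume_with_budget(stream_tokens("agents plan act observe repeat"), 3))
```

Because the consumer sees output piece by piece, it can cut a response off early instead of paying for text it will never use.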

Key Takeaways

A shared vocabulary is the cheapest reliability investment a team can make: most early agent confusion is two people using one word for different things.
The core distinction to internalize is workflow versus agent. A workflow follows a fixed path; an agent decides its own path at runtime, and you should reach for the simplest design that works.
Memory, retrieval, and context are separate ideas. The context window is what fits in one call, memory persists across calls, and retrieval (RAG) pulls in external knowledge on demand.
Protocols draw the connections: MCP links an agent to tools and data, while A2A links agents to other agents.
Reliability is a stack, not a feature. Guardrails, validation, human-in-the-loop, observability, and evals each cover a different failure mode, and production agents use all of them.
Use this page as a companion to the full Agentic Design Patterns series, where each term that links out gets a full treatment.
