AI Agent Glossary 2026: 25 Terms Builders Need

Surya Koritala
22 Min Read

This glossary defines 25 core terms used in AI agent building and agent infrastructure in 2026, from agent loop and tool use to Model Context Protocol, A2A, guardrails, and evals. It is organized as a quick reference for builders who need concise, factual definitions of the concepts that shape modern agent systems, with related-term cross-links and pointers to our guides on Model Context Protocol and A2A.

A2A

A2A, short for Agent2Agent Protocol, is an open protocol introduced by Google and partners for communication between agents across different frameworks and vendors. It is designed to let one agent discover another agent’s capabilities, exchange messages, and coordinate task execution using a shared interoperability layer. For a deeper explainer, see our guide to what A2A is and Google’s announcement.

Example: A customer support agent can use A2A to hand an order-refund task to a separate finance agent operated by another team.

See also: Agent orchestrator, Multi-agent system, Model Context Protocol


Agent loop

An agent loop is the repeated cycle in which an AI system receives a goal or input, reasons about the next step, takes an action such as calling a tool, observes the result, and continues until it reaches a stopping condition. This loop is the operational core of many agent systems because it turns a single model response into a sequence of decisions and actions. Frameworks such as LangGraph describe this pattern explicitly in graph-based agent workflows.

Example: An agent receives a request to book travel, searches flights, checks policy rules, asks a clarifying question, and then submits the booking in successive loop iterations.

See also: Tool use, ReAct pattern, Agent orchestrator
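
The agent loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's implementation: `decide_next_step` is a hypothetical stand-in for a model call, and `search_flights` and `check_policy` are made-up tool stubs.

```python
from typing import Callable

# Hypothetical tools for illustration; a real system would call APIs here.
def search_flights(query: str) -> str:
    return f"3 flights found for {query}"

def check_policy(query: str) -> str:
    return "within travel policy"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "check_policy": check_policy,
}

def decide_next_step(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: returns (action, argument).

    A real agent would send the goal and observations to a model.
    """
    if len(observations) == 0:
        return ("search_flights", goal)
    if len(observations) == 1:
        return ("check_policy", goal)
    return ("finish", "Booking submitted")

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):  # stopping condition 1: step budget
        action, argument = decide_next_step(goal, observations)
        if action == "finish":  # stopping condition 2: task done
            return argument
        observations.append(TOOLS[action](argument))  # act, then observe
    return "Stopped: step limit reached"
```

The step budget matters as much as the "finish" action: without it, a model that never decides to stop would loop indefinitely.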

Agent orchestrator

An agent orchestrator is the control layer that manages how models, tools, prompts, memory, and sometimes multiple agents are coordinated to complete a task. It typically handles routing, state transitions, retries, error handling, and policy enforcement rather than generating the underlying model output itself. In production systems, orchestration often sits above the model API and below the application interface.

Example: A backend service decides whether a request should go to a planner, a retrieval step, or a browser automation tool before returning a final answer.

See also: Tool router, Multi-agent system, Agent loop

Chain-of-thought

Chain-of-thought refers to intermediate reasoning steps a model may generate while solving a problem. The term became widely used after research showed that prompting models to produce step-by-step reasoning can improve performance on some tasks. Many providers now recommend concise reasoning controls or hidden reasoning mechanisms rather than exposing detailed internal traces to end users. In product settings, the phrase often describes reasoning behavior more broadly than literal visible text.

Example: A math-solving system may internally break a word problem into smaller calculations before returning only the final answer.

See also: Reasoning model, ReAct pattern, Inference

Computer Use

Computer use refers to model-driven interaction with graphical user interfaces, such as clicking, typing, scrolling, and reading on-screen elements in software or web environments. It differs from ordinary API-based tool use because the model acts through the same interface a human user would, often using screenshots, accessibility trees, or browser automation layers. Anthropic and OpenAI have both documented computer-use capabilities in their developer materials.

Example: An agent opens a browser, logs into a dashboard, navigates menus, and downloads a report by interacting with the UI directly.

See also: Tool use, Sandbox, Agent loop

Context window

A context window is the amount of input and generated text, measured in tokens, that a model can process in a single request. It determines how much conversation history, retrieved material, code, or instructions can be included at once before older content must be truncated, summarized, or otherwise managed. Context limits vary by model and are documented by model providers.

Example: If a long support conversation exceeds the model’s context window, the system may summarize earlier turns before continuing.

See also: Token, Memory layer, RAG
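
One simple way to manage a context limit is to keep only the most recent turns that fit a token budget. The sketch below shows truncation rather than summarization, and it assumes a crude one-token-per-word count; real tokenizers behave differently, and `fit_to_window` is a hypothetical helper name.

```python
def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

def fit_to_window(turns: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent turns that fit within max_tokens."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "User: my order is late",
    "Agent: can you share the order number",
    "User: it is 12345",
]
recent = fit_to_window(history, max_tokens=11)  # drops the oldest turn
```

A production system would more likely summarize the dropped turns than discard them outright.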

Eval

An eval, short for evaluation, is a structured method for measuring how well a model or agent system performs on defined tasks, behaviors, or safety criteria. Evals can be automated or human-reviewed and may test accuracy, tool correctness, latency, policy compliance, hallucination rates, or task completion. OpenAI, Anthropic, and other vendors publish guidance on building evals for model and agent workflows.

Example: A team runs an eval suite of 500 customer-service tasks to compare whether a new agent version resolves tickets more accurately than the previous release.

See also: Guardrails, Tool use, Agent orchestrator
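
At its simplest, an automated eval is a set of input and expected-output pairs plus a scoring function. The sketch below compares two hypothetical agent versions (stubbed as plain functions) on a tiny made-up case list using exact-match scoring; real suites are larger and often use graded or human-reviewed scoring.

```python
from typing import Callable

def run_eval(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the agent's output matches exactly."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)

# Hypothetical agent versions, stubbed as keyword classifiers.
def agent_v1(prompt: str) -> str:
    return "refund" if "refund" in prompt else "unknown"

def agent_v2(prompt: str) -> str:
    if "refund" in prompt:
        return "refund"
    if "password" in prompt:
        return "reset_password"
    return "unknown"

CASES = [
    ("I want a refund", "refund"),
    ("I forgot my password", "reset_password"),
    ("refund my last order", "refund"),
    ("where is my package", "track_order"),
]

score_v1 = run_eval(agent_v1, CASES)  # 0.5
score_v2 = run_eval(agent_v2, CASES)  # 0.75
```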

Fine-tuning

Fine-tuning is the process of training a preexisting model further on a narrower dataset or task so that it better follows a desired style, domain, or behavior. It differs from prompting because the model weights are updated rather than only the request text. Providers such as OpenAI and Anthropic document fine-tuning for selected model families and use cases.

Example: A legal-tech company fine-tunes a model on contract annotation examples so it produces more consistent clause classifications.

See also: LoRA, Foundation model, Inference

Foundation model

A foundation model is a large model trained on broad data that can be adapted to many downstream tasks such as chat, coding, classification, summarization, or tool calling. The term was formalized by the Stanford Center for Research on Foundation Models to describe models that serve as general-purpose bases for many applications. In agent systems, the foundation model is often the reasoning and language engine behind planning and tool selection.

Example: A developer builds a support agent on top of a general-purpose language model and adds retrieval, tools, and policy controls around it.

See also: Frontier model, Reasoning model, Fine-tuning

Frontier model

Frontier model is a term commonly used for the most capable general-purpose models available at a given point in time. The phrase appears in policy, safety, and industry discussions to distinguish leading-edge systems from smaller or less capable models, though there is no single universal threshold. In practice, it usually refers to models at the top end of performance and scale.

Example: A safety team may apply stricter review and deployment controls to a frontier model than to a smaller task-specific model.

See also: Foundation model, Reasoning model, Guardrails

Function calling

Function calling is a model capability that lets a model return structured arguments for a predefined function or tool instead of only free-form text. The application then executes the function outside the model and can pass the result back into the conversation. OpenAI, Anthropic, Google, and others document variants of this pattern for tool integration.

Example: A model returns JSON arguments for get_weather(city="Chicago"), and the application calls the weather API before asking the model to summarize the result.

See also: Tool use, Tool router, Model Context Protocol

{
  "name": "get_weather",
  "arguments": {
    "city": "Chicago"
  }
}
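
The application side of function calling can be sketched as follows: parse the structured call, look up the function by name, and execute it outside the model. The `REGISTRY` dictionary and the `get_weather` stub are assumptions for illustration, and the exact JSON shape varies by provider.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical stub; a real implementation would call a weather API.
    return f"Weather for {city}: data unavailable in this sketch"

# Only functions listed here can be invoked by the model.
REGISTRY = {"get_weather": get_weather}

def execute_call(model_output: str) -> str:
    """Parse the model's structured call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"Unknown function: {name}")
    return REGISTRY[name](**call["arguments"])

model_output = '{"name": "get_weather", "arguments": {"city": "Chicago"}}'
result = execute_call(model_output)
```

The returned string would then be passed back to the model so it can summarize the result for the user.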

Guardrails

Guardrails are controls that constrain or monitor model and agent behavior to reduce unsafe, noncompliant, or low-quality outputs and actions. They can include input validation, output filtering, policy checks, tool permissioning, human approval steps, and runtime monitoring. Guardrails are usually implemented as system-level controls around the model rather than as a single feature inside the model itself.

Example: A finance agent may be allowed to draft a payment request but blocked from executing a transfer without an approval workflow.

See also: Sandbox, Eval, Agent orchestrator
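
A tool-permission guardrail like the one in the finance example can be sketched as a check that runs before any action executes. The action names, the `approved_by` field, and the `REQUIRES_APPROVAL` policy set are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    name: str
    amount: float
    approved_by: Optional[str] = None  # hypothetical approval field

# Hypothetical policy: actions that need a human approver.
REQUIRES_APPROVAL = {"execute_transfer"}

def check_guardrail(action: Action) -> tuple[bool, str]:
    """Return (allowed, reason) before the action is executed."""
    if action.name in REQUIRES_APPROVAL and action.approved_by is None:
        return (False, "blocked: human approval required")
    return (True, "allowed")

draft = check_guardrail(Action("draft_payment_request", 250.0))
transfer = check_guardrail(Action("execute_transfer", 250.0))
approved = check_guardrail(Action("execute_transfer", 250.0, approved_by="controller"))
```

Because the check lives outside the model, it holds even if the model is prompted into requesting the transfer.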

Inference

Inference is the process of running a trained model to generate outputs from new inputs. It is distinct from training or fine-tuning because the model weights are not being updated during normal inference requests. In agent systems, inference may happen many times within a single task as the agent plans, calls tools, and synthesizes results.

Example: Every time an agent decides whether to search documents or ask a follow-up question, it is making another inference call.

See also: Token, Context window, Fine-tuning

LoRA

LoRA, short for Low-Rank Adaptation, is a parameter-efficient fine-tuning method introduced by Microsoft researchers in 2021. It adapts a model by freezing the pretrained weights and training a relatively small number of additional parameters in the form of low-rank matrices. This reduces the compute and storage burden compared with updating all model weights. LoRA and related adapter methods are widely used in open model ecosystems.

Example: A team can adapt an open model for customer support tone using LoRA without retraining the full base model.

See also: Fine-tuning, Foundation model, Inference
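
The savings from LoRA are easiest to see as arithmetic. In the original formulation, the update to a d x k weight matrix is expressed as the product of a d x r matrix B and an r x k matrix A, with rank r much smaller than d and k. The sizes below are illustrative and not taken from any specific model.

```python
def full_update_params(d: int, k: int) -> int:
    """Parameters updated when fine-tuning a full d x k weight matrix."""
    return d * k

def lora_update_params(d: int, k: int, r: int) -> int:
    """Parameters trained by LoRA: B is d x r and A is r x k."""
    return r * (d + k)

d, k, r = 4096, 4096, 8  # illustrative sizes only
full = full_update_params(d, k)     # 16,777,216
lora = lora_update_params(d, k, r)  # 65,536
ratio = lora / full                 # 0.00390625, about 0.4 percent
```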

Memory layer

A memory layer is the part of an agent system that stores and retrieves information beyond the immediate context window. It can include conversation summaries, user preferences, task state, long-term facts, or external records in databases and vector stores. Memory layers help agents maintain continuity across sessions or long workflows.

Example: A sales assistant remembers that a user prefers weekly pipeline summaries and uses that preference in future interactions.

See also: RAG, Context window, Vector embedding
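
A memory layer can be as simple as a keyed store that is read at the start of each session. The sketch below uses an in-memory dictionary as a stand-in for a database or vector store; the `MemoryStore` class and the `summary_cadence` key are hypothetical.

```python
class MemoryStore:
    """In-memory stand-in for a database or vector store."""

    def __init__(self) -> None:
        self._facts: dict[str, dict[str, str]] = {}

    def remember(self, user_id: str, key: str, value: str) -> None:
        self._facts.setdefault(user_id, {})[key] = value

    def recall(self, user_id: str) -> dict[str, str]:
        return dict(self._facts.get(user_id, {}))

def build_prompt(memory: MemoryStore, user_id: str, message: str) -> str:
    """Prepend stored preferences so a new session can use them."""
    facts = memory.recall(user_id)
    lines = [f"- {key}: {value}" for key, value in sorted(facts.items())]
    if lines:
        header = "Known preferences:\n" + "\n".join(lines)
    else:
        header = "Known preferences: none"
    return f"{header}\nUser: {message}"

memory = MemoryStore()
memory.remember("user-1", "summary_cadence", "weekly")
prompt = build_prompt(memory, "user-1", "Send my pipeline summary")
```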

Model Context Protocol

Model Context Protocol, or MCP, is an open protocol introduced by Anthropic for connecting AI assistants and models to external data sources and tools through a standardized interface. MCP defines a client-server pattern for exposing resources, prompts, and tools so that applications can integrate context providers more consistently. For a fuller overview, see our guide to Model Context Protocol and Anthropic’s announcement.

Example: A desktop coding assistant can use MCP to access a local repository indexer, documentation source, and issue tracker through a common protocol.

See also: Function calling, Tool use, A2A
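
MCP messages are JSON-RPC 2.0 requests and responses. The sketch below builds two requests a client might send, one to list a server's tools and one to call a tool. The method names `tools/list` and `tools/call` follow the public MCP specification as we understand it; the tool name `search_issues` and its arguments are hypothetical, and the current specification should be treated as the authority on message shapes.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched

def make_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })

list_tools = make_request("tools/list", {})
call_tool = make_request(
    "tools/call",
    # Hypothetical tool exposed by an issue-tracker server.
    {"name": "search_issues", "arguments": {"query": "login bug"}},
)
```

In practice an MCP client library handles this framing and the transport, so application code rarely builds these messages by hand.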

Multi-agent system

A multi-agent system is an architecture in which two or more agents work together, either cooperatively or with specialized roles, to complete a broader task. Different agents may handle planning, retrieval, coding, verification, or domain-specific subtasks, with an orchestrator or protocol coordinating them. This pattern is common when tasks are too broad or heterogeneous for one agent design.

Example: One agent plans a research task, another gathers sources, and a third checks citations before a final answer is assembled.

See also: A2A, Agent orchestrator, Tool router

RAG

RAG, short for retrieval-augmented generation, is a pattern in which a system retrieves relevant external information and includes it in the model’s context before generation. It is used to ground responses in fresher or domain-specific data than the base model may contain. RAG typically combines search, ranking, chunking, and prompt assembly steps.

Example: A support bot retrieves the latest product documentation and release notes before answering a troubleshooting question.

See also: Vector embedding, Memory layer, Context window
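
The retrieve-then-assemble flow of RAG can be sketched without any external services. This toy version ranks chunks by word overlap instead of embeddings, and the documents are invented for illustration.

```python
def score(query: str, chunk: str) -> int:
    """Count distinct query words that appear in the chunk (toy ranking)."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words)

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

def assemble_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"

DOCS = [
    "Release 2.1 fixes the sync error on startup",
    "Billing runs on the first day of each month",
    "To reset sync settings open the preferences panel",
]
query = "how do I fix the sync error"
top = retrieve(query, DOCS, top_k=2)
prompt = assemble_prompt(query, top)  # this string is what the model would receive
```

Swapping `score` for an embedding-based similarity changes the ranking quality but not the overall shape of the pipeline.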

ReAct pattern

ReAct is a prompting and agent design pattern that interleaves reasoning and acting, allowing a model to alternate between thinking about the next step and using tools or taking actions. The term comes from research showing that combining reasoning traces with actions can improve task performance in interactive settings. Many modern agent frameworks implement variants of this pattern even when the exact prompt format differs.

Example: An agent reasons that it lacks enough information, performs a search, reads the result, and then decides on the next action.

See also: Agent loop, Tool use, Chain-of-thought

Reasoning model

A reasoning model is a model optimized for tasks that benefit from multi-step problem solving, planning, or deliberate analysis. Vendors use the term somewhat differently, but it generally refers to models tuned or architected to perform better on complex tasks than standard chat-oriented models. In agent systems, reasoning models are often used for planning, tool selection, or verification-heavy steps.

Example: A workflow may use a reasoning model to design a multi-step migration plan and a cheaper general model to draft the final user-facing summary.

See also: Chain-of-thought, Frontier model, Agent loop

Sandbox

A sandbox is an isolated execution environment used to run code, tools, or computer-use actions with constrained permissions. Sandboxes reduce risk by limiting network access, filesystem access, credentials, or system-level privileges during agent execution. They are a common safety control for coding agents and browser automation systems.

Example: A coding agent can test generated Python in a sandboxed container without gaining unrestricted access to the production environment.

See also: Guardrails, Computer Use, Tool use

Token

A token is a unit of text that a model processes, and it may correspond to a whole word, part of a word, punctuation, or whitespace depending on the tokenizer. Model pricing, context limits, and generation length are commonly measured in tokens rather than characters or words. Token counts matter because they affect both cost and what can fit into a single request.

Example: A long prompt with attached documentation may consume thousands of input tokens before the model generates any output.

See also: Context window, Inference, RAG
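
Token counts drive cost estimates. The sketch below approximates tokens as words plus punctuation marks, which is only a rough stand-in for a real tokenizer, and the per-1,000-token prices are made-up numbers.

```python
import re

def rough_token_count(text: str) -> int:
    """Approximate tokens as words plus standalone punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request given per-1,000-token prices."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)

n = rough_token_count("Hello, world! How are you?")  # 8 under this approximation
cost = estimate_cost(2000, 500, input_price_per_1k=0.5, output_price_per_1k=1.5)
```

For real budgeting, use the tokenizer and price sheet published by the model provider.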

Tool router

A tool router is the mechanism that decides which tool, function, API, or subsystem should handle a given request or subtask. Routing can be model-driven, rule-based, or hybrid, and it often considers task type, permissions, cost, latency, and reliability. In larger systems, tool routing is a key part of orchestration and failure handling.

Example: A request about account balance is routed to a banking API tool, while a request to summarize policy text is routed to retrieval plus generation.

See also: Function calling, Agent orchestrator, Tool use
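
A rule-based tool router with a permission check can be sketched as an ordered list of keyword rules. The tool stubs, keywords, and the `allowed_tools` parameter are hypothetical; a model-driven router would replace the keyword match with a model call.

```python
from typing import Callable

# Hypothetical tool stubs that just report which tool handled the request.
def banking_api(request: str) -> str:
    return "banking_api"

def retrieval_plus_generation(request: str) -> str:
    return "retrieval_plus_generation"

# Ordered rules: the first matching keyword wins.
RULES: list[tuple[str, Callable[[str], str]]] = [
    ("balance", banking_api),
    ("summarize", retrieval_plus_generation),
]

def route(request: str, allowed_tools: set[str]) -> str:
    text = request.lower()
    for keyword, tool in RULES:
        if keyword in text:
            if tool.__name__ not in allowed_tools:
                return "denied"  # permission check before dispatch
            return tool(request)
    return "fallback"  # no rule matched

everything = {"banking_api", "retrieval_plus_generation"}
a = route("What is my account balance?", everything)
b = route("Summarize the refund policy", everything)
c = route("What is my account balance?", {"retrieval_plus_generation"})
```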

Tool use

Tool use is the ability of a model or agent to invoke external capabilities such as APIs, databases, search systems, code execution environments, or browser controls. It extends the model beyond text generation by letting it act on the world or fetch information not contained in its weights. Tool use is one of the defining features that separates many agent systems from plain chat interfaces.

Example: Instead of guessing a shipment status, an agent calls the logistics API and returns the live tracking result.

See also: Function calling, Tool router, Computer Use

Vector embedding

A vector embedding is a numerical representation of text, images, or other data in a high-dimensional space where semantically similar items are placed closer together. Embeddings are widely used for semantic search, clustering, recommendation, and retrieval pipelines in agent systems. In RAG, embeddings help match a user query to relevant stored content.

Example: A documentation platform converts every article chunk into an embedding so a support agent can retrieve the most relevant passages for a question.

See also: RAG, Memory layer, Token
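
"Closer together" is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors, not output from any real embedding model.
EMBEDDINGS = {
    "reset your password": [0.9, 0.1, 0.0],
    "change login credentials": [0.8, 0.2, 0.1],
    "shipping rates by region": [0.0, 0.1, 0.9],
}

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda text: cosine_similarity(query_vec, EMBEDDINGS[text]),
        reverse=True,
    )
    return ranked[:k]

account_hits = nearest([1.0, 0.0, 0.0], k=2)
shipping_hit = nearest([0.0, 0.0, 1.0], k=1)
```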

Frequently asked questions

What is an AI agent glossary?

An AI agent glossary is a reference list of terms used to describe how agent systems work, including concepts such as tool use, orchestration, retrieval, memory, and evaluation. Readers who want protocol-specific context can also review Anthropic’s Model Context Protocol announcement and Google’s A2A introduction.

What is the difference between tool use and function calling?

Tool use is the broader concept of an agent invoking external capabilities, while function calling is one common implementation pattern in which the model returns structured arguments for a predefined function. OpenAI’s function calling guide documents this approach directly.

Why do builders need to understand MCP and A2A?

Builders need to understand MCP and A2A because both address interoperability, but at different layers: MCP standardizes how models and assistants connect to tools and context providers, while A2A is designed for communication between agents. See Anthropic’s MCP documentation and Google’s A2A overview.

How are RAG, memory, and context windows related?

A context window limits how much information a model can process in one request, while RAG and memory layers help bring in relevant information from outside that immediate window. Pinecone’s RAG explainer and OpenAI’s tokenizer page are useful starting points.

Last updated: May 20, 2026. Related: Agent Infrastructure.
