What Is Foundry IQ? Microsoft's Agentic Retrieval Layer

A vendor-neutral explainer: what Foundry IQ actually is, how it differs from Work IQ, Fabric IQ, Web IQ and plain Microsoft Graph, whether it is really ‘just RAG’, and the honest pricing and lock-in caveats Microsoft’s own docs leave out.

What is Foundry IQ, in one precise definition?

Foundry IQ is Microsoft’s managed knowledge-and-retrieval layer for AI agents: it turns scattered enterprise content into a reusable, permission-aware knowledge base that any agent can query through one endpoint, using an agentic retrieval engine instead of classic single-shot RAG. If you are asking what Foundry IQ is at a technical level, the cleanest mental model is: a configurable, multi-source knowledge base plus a reasoning-driven retriever that decomposes questions, searches several sources in parallel, reranks, and returns grounded answers with citations.

Microsoft’s own definition, from the Foundry Learn docs, calls it “a managed knowledge layer that turns enterprise data into reusable, permission-aware knowledge bases for AI agents.” A knowledge base is the top-level object; it bundles knowledge sources (connections to internal and external data stores) and the parameters that control retrieval behavior. Multiple agents can share one knowledge base, which is the part that matters operationally — you configure retrieval once and reuse it across an agent fleet.

Foundry IQ was first introduced at Microsoft Ignite 2025 and then substantially expanded at Build 2026 (June 2026) with serverless retrieval, new knowledge sources, Web IQ, and an SLA-backed endpoint. Under the hood it is powered by Azure AI Search’s agentic retrieval pipeline; Foundry IQ is the productized, agent-facing experience of that pipeline inside the new Microsoft Foundry portal.

[Figure: Illustration of an enterprise knowledge retrieval layer routing a query across multiple data sources.]

Foundry IQ = a multi-source, permission-aware knowledge base + an agentic (multi-query, reranking, iterating) retriever, exposed to agents behind one managed endpoint. It is the knowledge/grounding layer in Microsoft’s three-part IQ stack.

Foundry IQ explained: how does agentic retrieval actually work?

Foundry IQ’s agentic retrieval treats a query as a reasoning task: an LLM breaks one complex question into focused sub-queries, runs them in parallel across keyword, vector, and hybrid search, semantically reranks each result set, iterates when the signal is weak, and returns a unified answer with citations and an activity plan. That loop — plan, search, rerank, decide, search again or stop — is what separates it from a fixed retrieve-then-generate pipeline.

Walking the pipeline from the Azure AI Search docs: (1) your app calls the knowledge base with a query plus conversation history; (2) an LLM does query planning, analyzing the whole chat thread to find the underlying need and rewriting it into sub-queries — it even corrects spelling and expands synonyms; (3) all sub-queries execute simultaneously against your sources as keyword, vector, or hybrid searches, and each is semantically reranked (the L2 semantic ranker) to promote the best matches; (4) results are merged into a three-part response: grounding content, source references for citation, and an execution plan you can inspect.
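
The four steps above can be sketched in a few lines of Python. This is a toy illustration of the plan, parallel search, rerank, merge shape, not the Foundry IQ or Azure AI Search API: the corpus, the `plan`, `search`, and `retrieve` functions, and the term-overlap scoring are all hypothetical stand-ins for the LLM planner and the semantic ranker.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for indexed knowledge sources.
CORPUS = {
    "doc-1": "Refunds are issued within 14 days of a return request.",
    "doc-2": "Enterprise contracts renew annually unless cancelled.",
    "doc-3": "Return requests must include the original order number.",
}


def tokens(text: str) -> set[str]:
    """Lowercase word set with trailing punctuation stripped."""
    return {word.strip(".,?").lower() for word in text.split()}


def plan(question: str) -> list[str]:
    """Stand-in for LLM query planning: split a compound question on ' and '."""
    parts = [part.strip(" ?") for part in question.split(" and ")]
    return [part for part in parts if part]


def search(sub_query: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Stand-in for search plus reranking: score documents by term overlap."""
    terms = tokens(sub_query)
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & tokens(text))
        if overlap:
            scored.append((doc_id, overlap / len(terms)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def retrieve(question: str) -> dict:
    """Plan sub-queries, run them concurrently, merge into a three-part response."""
    sub_queries = plan(question)
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(search, sub_queries))
    best: dict[str, float] = {}
    for hits in result_sets:
        for doc_id, score in hits:
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    ranked = sorted(best, key=lambda doc_id: best[doc_id], reverse=True)
    return {
        "grounding": [CORPUS[doc_id] for doc_id in ranked],
        "references": ranked,
        "activity": [
            {"sub_query": query, "hits": len(hits)}
            for query, hits in zip(sub_queries, result_sets)
        ],
    }
```

Calling `retrieve("how long do refunds take and when do contracts renew")` produces two sub-queries, each matching a different document, which is the case where a single embedded query tends to miss one half of the question.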

Two design choices stand out. First, retrieval mode is index-driven: if your index has both text and vector fields, you get hybrid search automatically; if it only has a vector field, you get pure vector search. Second, the agentic loop adds latency on purpose — it invokes the full query pipeline multiple times — but runs those passes in parallel to keep response times reasonable. You trade a few hundred milliseconds and extra tokens for materially better recall on hard questions.
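
The index-driven rule can be written down as a small decision function. This is only a sketch of the behavior described above; the function name and its boolean parameters are hypothetical, and the text-only branch returning keyword search is an assumption, since the paragraph only spells out the hybrid and vector-only cases.

```python
def retrieval_mode(has_text_fields: bool, has_vector_fields: bool) -> str:
    """Pick the search mode implied by the fields present in an index."""
    if has_text_fields and has_vector_fields:
        return "hybrid"
    if has_vector_fields:
        return "vector"
    if has_text_fields:
        # Assumption: a text-only index falls back to keyword search.
        return "keyword"
    raise ValueError("index has no searchable text or vector fields")
```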

“Classic RAG does one retrieve-then-generate pass. Agentic retrieval turns retrieval into a control loop: plan, search in parallel, rerank, decide, and search again if the evidence is thin.”
Foundry IQ agentic retrieval, summarized

Is Foundry IQ just RAG? Agentic retrieval vs classic single-shot RAG

No — Foundry IQ is not ‘just RAG’, but it is not magic either: it is RAG upgraded from a single-shot pipeline into a multi-query control loop. Classic RAG embeds your question once, hits one index once, and generates. Foundry IQ’s agentic retrieval decomposes the question, searches multiple sources concurrently, reranks, iterates with follow-ups, and enforces permissions before answering. The honest framing: same core idea (ground the model in your data), meaningfully different machinery.

Microsoft’s published benchmarks put numbers on the gap. On complex, multi-hop queries, agentic retrieval improves response relevance by up to 36% versus traditional single-shot RAG, and Microsoft reports recall improvements of up to 54% over single-shot retrieval — roughly a +20-point lift in answer quality on the harder questions where single-shot RAG tends to miss a sub-fact. That delta is real, but it concentrates on compound and ambiguous questions; for a straightforward ‘what’s our refund policy’ lookup, single-shot RAG is often just as good and cheaper.

The table below is the side-by-side most marketing pages skip. Use it to decide honestly whether you need agentic retrieval or whether plain RAG already serves your traffic.

Dimension	Classic single-shot RAG	Foundry IQ agentic retrieval
Query handling	One query, embedded once	LLM decomposes into focused sub-queries
Search modes	Single mode per pass (usually vector)	Concurrent keyword + vector + hybrid
Reranking	Optional, often a single reranker	Semantic L2 reranking on every sub-query
Iteration / follow-ups	None — retrieve once, then generate	Iterates and reissues queries if signal is weak
Sources per query	Typically one index	Multiple sources fanned out in parallel
Citations	DIY — you assemble references	Built-in references + activity plan returned
Permissions	You enforce at the app layer	Identity- and document-level enforced at query time
Billing unit	Per query (uniform)	Per token (varies with reasoning effort)
Best for	Simple, high-volume doc lookups	Multi-hop, ambiguous, multi-source questions

Classic single-shot RAG vs Foundry IQ agentic retrieval — the honest side-by-side.

Rule of thumb: agentic retrieval earns its token cost when answers require stitching facts across sources or hops. For single-fact lookups at high volume, classic RAG is often the cheaper, equally good choice.

Foundry IQ vs Work IQ vs Fabric IQ vs Web IQ: where are the boundaries?

The IQ stack is three layers plus a web extension. Fabric IQ is the semantic data foundation over OneLake and Power BI; Foundry IQ is the knowledge/retrieval middle layer that grounds custom agents; Work IQ is the Microsoft 365 collaboration-context top layer; and Web IQ extends Foundry IQ’s reach to live web search. Each is standalone, but they compose to give an agent structured-data context, grounded knowledge, and human-collaboration context at once.

Concretely: Fabric IQ models business data — ontologies, semantic models, graphs, data agents — so an agent can reason over analytics in OneLake and Power BI. Foundry IQ connects structured and unstructured content across Azure, SharePoint, OneLake, and the web into permission-aware knowledge bases for custom agents you build in Foundry. Work IQ is a contextual layer for Microsoft 365 that captures collaboration signals from documents, meetings, chats, and workflows so agents understand how your org actually operates. Web IQ brings live web, news, image, and video context with sub-165 ms latency and zero data retention.

The one-line separation that keeps people unstuck: Fabric IQ is for your data, Foundry IQ is for your knowledge, Work IQ is for your work patterns, and Web IQ is for the open web. The Foundry IQ vs Work IQ vs Fabric IQ confusion almost always comes from treating them as competitors. They are layers, and at Build 2026 Microsoft began letting Foundry IQ consume Work IQ and Fabric IQ as knowledge sources.

Fabric IQ → structured data (OneLake/Power BI). Foundry IQ → knowledge grounding/retrieval. Work IQ → M365 collaboration context. Web IQ → live web. Microsoft Graph → the access/permission plumbing underneath all of them.

Foundry IQ vs Microsoft Graph: aren’t these the same thing?

No. Microsoft Graph is the access layer; the IQ layers are the understanding layer. The cleanest phrasing is ‘Graph enables access, IQ enables understanding.’ Graph is the unified API that answers ‘which files or emails exist and who can see them’; Foundry IQ sits above it and answers ‘what is the relevant, grounded knowledge for this question.’

Microsoft Graph is a unified API and permission-aware access layer across Microsoft 365 — it retrieves emails, files, calendars, Teams, and SharePoint content and enforces who can see what. What Graph does not do is interpret meaning, decompose a question, rank by relevance, or assemble grounded answers. That is exactly the job Foundry IQ (and Work IQ and Fabric IQ) add on top.

Importantly, the two are complementary, not alternatives. Foundry IQ leans on identity and permissions plumbing — it runs queries under the caller’s Microsoft Entra identity, synchronizes access control lists, and honors Microsoft Purview sensitivity labels — to enforce document-level access at query time. In other words, the IQ layers ride on the same access fabric Graph represents; they add reasoning and retrieval, not a parallel permission system.
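
To make "document-level access at query time" concrete, here is a minimal sketch of security trimming: candidates are filtered against the caller's group memberships before anything reaches the model. The `Doc` class, `trim_by_identity`, and the group names are hypothetical; the real service resolves identity through Entra and synchronized ACLs rather than an in-memory set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL: groups allowed to read the doc


def trim_by_identity(candidates: list[Doc], caller_groups: frozenset[str]) -> list[Doc]:
    """Keep only documents that share at least one group with the caller."""
    return [doc for doc in candidates if doc.allowed_groups & caller_groups]
```

The design point is ordering: trimming happens per query and per caller, so two users asking the same question can legitimately receive different grounding content.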

Pros

Removes a large amount of custom RAG plumbing: ingestion, chunking, embedding, reranking, citations
Multi-source fan-out behind one SLA-backed endpoint, including Work IQ, Fabric IQ, Azure SQL, File Search, and MCP
Identity- and document-level permission enforcement at query time via Entra, ACL sync, and Purview labels
Measurable relevance and recall gains on multi-hop questions over single-shot RAG
Serverless scale-to-zero suits bursty, event-driven agent workloads

Cons

Token-based billing can exceed per-query RAG on high-volume simple lookups with no relevance benefit
Deep coupling to Azure AI Search, Entra, Purview, and the Foundry portal raises switching cost
Serverless and several knowledge sources are in preview, single-region, without full production SLA
Agentic loop adds latency; reasoning effort must be tuned per workload
Portal access is preview-only; GA features require the latest Search REST API and code

Foundry IQ serverless pricing and lock-in: what’s the honest read?

36%: relevance lift on multi-hop questions, agentic retrieval vs single-shot RAG (Microsoft)
54%: recall improvement over single-shot retrieval (Microsoft)
$4.32: example query-execution cost for 2,000 retrievals (Microsoft worked example)
<165 ms: Web IQ latency for live web search, with zero data retention

Foundry IQ Serverless is in public preview with scale-to-zero pricing: you pay only for compute and storage you use, and the service idles to zero when no queries run. Microsoft’s published preview rates are about $0.24 per compute unit/hour and up to $0.29 per GB/month of indexed storage, with a 1 GB/index limit, 30 indexes per service, and 5 services per subscription per region. Billing is expected to begin in late 2026. For bursty agent traffic, scale-to-zero is the headline benefit — no idle clusters to pay for.
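
A rough estimator for the tier portion of the bill, using only the preview rates and limits quoted above. The function and constant names are hypothetical, and it assumes compute is metered only for active compute-unit hours (the scale-to-zero claim) and that storage is charged at the full "up to" rate, so treat the result as an upper-leaning estimate rather than a quote.

```python
COMPUTE_RATE_PER_CU_HOUR = 0.24   # USD, preview rate quoted above
STORAGE_RATE_PER_GB_MONTH = 0.29  # USD, the "up to" rate quoted above
MAX_GB_PER_INDEX = 1
MAX_INDEXES_PER_SERVICE = 30


def monthly_tier_cost(active_cu_hours: float, stored_gb: float) -> float:
    """Estimate one service's monthly compute-plus-storage cost in USD."""
    max_gb = MAX_GB_PER_INDEX * MAX_INDEXES_PER_SERVICE
    if stored_gb > max_gb:
        raise ValueError(f"a single service holds at most {max_gb} GB in preview")
    cost = (
        active_cu_hours * COMPUTE_RATE_PER_CU_HOUR
        + stored_gb * STORAGE_RATE_PER_GB_MONTH
    )
    return round(cost, 2)
```

For example, 40 active compute-unit hours and 10 GB of indexed storage come to 40 × 0.24 + 10 × 0.29 = $12.50 for the month, before any token charges.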

But serverless tier pricing is only half the bill. Agentic retrieval also bills per token across two services: Azure AI Search charges for retrieval and reranking tokens (a free monthly allowance, then pay-as-you-go), and Azure OpenAI charges for the query-planning and answer-synthesis tokens on whatever model you assign. Microsoft’s own worked example — 2,000 retrievals with three sub-queries each, reranking 50 chunks per sub-query — lands at roughly $3.30 for reranking in Search plus about $1.02 for query planning in Azure OpenAI, for $4.32 total. That scales linearly with sub-query fan-out, so high reasoning effort on high volume gets expensive fast.
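
The worked example reduces to simple arithmetic, sketched below. The constants come from the figures quoted above; `projected_cost` is a hypothetical helper that applies the linear-scaling claim to both the reranking and planning components, which is a simplification since planning tokens may not grow exactly in step with fan-out.

```python
RETRIEVALS = 2_000
SUB_QUERIES_PER_RETRIEVAL = 3
CHUNKS_PER_SUB_QUERY = 50
RERANK_COST_USD = 3.30    # Azure AI Search reranking, from the worked example
PLANNING_COST_USD = 1.02  # Azure OpenAI query planning, from the worked example

TOTAL_USD = RERANK_COST_USD + PLANNING_COST_USD  # about 4.32
PER_RETRIEVAL_USD = TOTAL_USD / RETRIEVALS       # about 0.00216
CHUNKS_RERANKED = RETRIEVALS * SUB_QUERIES_PER_RETRIEVAL * CHUNKS_PER_SUB_QUERY


def projected_cost(retrievals: int, sub_queries: int) -> float:
    """Scale the worked example linearly in volume and sub-query fan-out (assumption)."""
    scale = (retrievals / RETRIEVALS) * (sub_queries / SUB_QUERIES_PER_RETRIEVAL)
    return round(TOTAL_USD * scale, 2)
```

Under that assumption, 100,000 retrievals at the same three sub-queries cost about $216, and doubling the fan-out to six sub-queries doubles that to about $432.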

On lock-in: be clear-eyed. Foundry IQ is inseparable from Azure AI Search, leans on Entra and Purview for permissions, and its agent wiring lives in the Foundry portal. The moment you adopt Work IQ and Fabric IQ as knowledge sources, your retrieval logic, your permission model, and your data semantics are all Microsoft-native. That is fine if you are an Azure shop consolidating on Foundry — it is a meaningful caveat if you value portability. Price the exit (re-indexing, re-implementing permission-aware retrieval elsewhere) before you commit.

Build vs buy: Foundry IQ vs pgvector plus a reranker

Foundry IQ is real infrastructure, not a RAG rebrand — but buy it for the right reasons

Foundry IQ’s durable value is the permission-aware, multi-source knowledge base behind one SLA-backed endpoint, not the agentic loop itself. Adopt it when you have messy multi-source data and real document-level permissions, and when consolidating on Azure is acceptable. If your retrieval is one clean index with simple questions, pgvector plus a reranker stays cheaper and more portable. And price the per-token bill and the lock-in before the demo wins you over.

Build it yourself with pgvector plus a reranker when your corpus is one or two tidy indexes, your questions are simple, and portability matters. Buy Foundry IQ when you have many messy sources, real document-level permissions to enforce, and questions that genuinely need decomposition across sources. The decision is less about capability and more about how much retrieval plumbing you want to own.

The DIY path is well-trodden and cheap: Postgres with the pgvector extension for storage and similarity search, an open or hosted reranker (a cross-encoder or a Cohere/Voyage-style rerank API) to reorder candidates, and a thin agent loop if you want query decomposition. You keep full control, full portability, and predictable per-query cost. What you also keep: ingestion and chunking, embedding refresh, ACL synchronization, citation assembly, multi-source routing, and the eval harness to prove it all works. Foundry IQ folds those into a managed service — that is precisely what you are paying for.
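
The core of that DIY path, top-k similarity search followed by a rerank, fits in a short sketch. It runs in memory so it needs no database: `first_pass` stands in for the pgvector query shown in the comment (table and column names are hypothetical), and `rerank` uses term overlap as a placeholder for a real cross-encoder or rerank API.

```python
import math

# Roughly equivalent pgvector query for first_pass(), with hypothetical names:
#   SELECT id, content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s;
# where <=> is pgvector's cosine-distance operator.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_pass(query_vec: list[float], rows: list[dict], k: int) -> list[dict]:
    """Top-k rows by cosine similarity to the query embedding."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row["embedding"]), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[dict], top_n: int) -> list[dict]:
    """Placeholder reranker: reorder candidates by query-term overlap."""
    terms = set(query.lower().split())

    def overlap(row: dict) -> int:
        return len(terms & set(row["content"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_n]
```

Everything listed in the paragraph above as "what you also keep" sits outside this sketch: there is no ingestion, no ACL check, and no citation assembly here, which is the honest measure of the remaining work.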

A blunt heuristic: if you can describe your retrieval as ‘one index, top-k, optional rerank,’ pgvector plus a reranker will be cheaper and more portable, and you should not buy. If you find yourself building query planners, fanning out to SharePoint plus OneLake plus the web, syncing Purview labels, and enforcing per-document permissions under each user’s identity, you are re-implementing Foundry IQ — and the managed version, with its SLA-backed endpoint, is likely the better trade.

Builder’s take

I build retrieval into production agents at Cyntr and Loomfeed, so I read the Foundry IQ launch through one lens: does it remove plumbing I’d otherwise own? Here’s my honest read.

The real product is the multi-source knowledge base plus permission-aware retrieval behind one endpoint — not the agentic loop, which you can build yourself in an afternoon.
Agentic retrieval is genuinely better on multi-hop questions, but it bills per token, not per query. On high-volume, simple lookups it can cost more than a single-shot vector search for no relevance gain. Match the reasoning effort to the question.
Serverless scale-to-zero is the most underrated piece for bursty agent workloads — but it’s preview, single-region, and the moment you wire in Work IQ + Fabric IQ + Purview ACLs you are deep in the Microsoft stack. Price the exit before you price the entry.
If your corpus is one tidy index and your questions are simple, pgvector plus a reranker still wins on cost and portability. Foundry IQ earns its keep when you have many messy sources, real document-level permissions, and questions that need decomposition.

Frequently asked questions

What is Foundry IQ in simple terms?

Foundry IQ is Microsoft’s managed knowledge layer for AI agents. It turns scattered enterprise data into reusable, permission-aware knowledge bases and uses agentic retrieval — decomposing a question into sub-queries, searching multiple sources in parallel, reranking, and iterating — to return grounded answers with citations behind one endpoint.

Is Foundry IQ just RAG?

Not exactly. It is the same core idea — grounding a model in your data — but the machinery is upgraded from single-shot RAG into a multi-query control loop. Foundry IQ decomposes the query, runs concurrent keyword/vector/hybrid searches, semantically reranks, iterates when evidence is thin, and enforces permissions. Microsoft reports up to 36% better relevance and 54% better recall on multi-hop questions versus single-shot RAG.

What is the difference between Foundry IQ, Work IQ, and Fabric IQ?

They are three layers, not competitors. Fabric IQ is the semantic data foundation over OneLake and Power BI; Foundry IQ is the knowledge and retrieval middle layer that grounds custom agents; Work IQ is the Microsoft 365 collaboration-context top layer. Foundry IQ can even consume Work IQ and Fabric IQ as knowledge sources.

How is Foundry IQ different from Microsoft Graph?

Microsoft Graph is the access layer — a unified API that retrieves Microsoft 365 content and enforces who can see what. Foundry IQ is the understanding layer that sits above it: it decomposes questions, ranks by relevance, and returns grounded answers. Graph enables access; IQ enables understanding. Foundry IQ uses the same identity and permission fabric to enforce access at query time.

How much does Foundry IQ Serverless cost?

Foundry IQ Serverless is in public preview with scale-to-zero pricing — roughly $0.24 per compute unit/hour and up to $0.29 per GB/month of storage, with a 1 GB/index limit. Billing is expected to start in late 2026. On top of the tier, agentic retrieval bills per token in both Azure AI Search (reranking) and Azure OpenAI (query planning and answer synthesis); Microsoft’s worked example totals about $4.32 for 2,000 retrievals.

What is Web IQ in Foundry IQ?

Web IQ is the extension that gives Foundry IQ agents live web context — web, news, images, video, and shopping — with sub-165 ms latency and zero data retention, respecting publisher preferences. It lets agents ground answers in fresh public information alongside your private enterprise knowledge sources.

Primary sources

What is Foundry IQ? — Microsoft Learn
Agentic retrieval overview — Microsoft Learn
Build smarter agents faster with Foundry IQ — Microsoft Foundry Blog
Foundry IQ: boost response relevance by 36% with agentic retrieval — Microsoft Community Hub
Work IQ, Fabric IQ, Foundry IQ vs Microsoft Graph — VisualLabs
Making Sense of Microsoft’s AI Strategy: Work IQ, Fabric IQ, Foundry IQ — James Serra’s Blog

Last updated: June 3, 2026. Related: Agent Infrastructure.
