A vendor-neutral explainer: what Foundry IQ actually is, how it differs from Work IQ, Fabric IQ, Web IQ and plain Microsoft Graph, whether it is really ‘just RAG’, and the honest pricing and lock-in caveats Microsoft’s own docs leave out.
What is Foundry IQ, in one precise definition?
Foundry IQ is Microsoft’s managed knowledge-and-retrieval layer for AI agents. It turns scattered enterprise content into a reusable, permission-aware knowledge base that any agent can query through one endpoint, and it uses an agentic retrieval engine instead of classic single-shot RAG. At a technical level, the cleanest mental model is a configurable, multi-source knowledge base plus a reasoning-driven retriever that decomposes questions, searches several sources in parallel, reranks, and returns grounded answers with citations.
Microsoft’s own definition, from the Foundry Learn docs, calls it “a managed knowledge layer that turns enterprise data into reusable, permission-aware knowledge bases for AI agents.” A knowledge base is the top-level object; it bundles knowledge sources (connections to internal and external data stores) and the parameters that control retrieval behavior. Multiple agents can share one knowledge base, which is the part that matters operationally — you configure retrieval once and reuse it across an agent fleet.
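A minimal Python sketch of that object model makes the sharing point concrete. Every class and field name here (`KnowledgeSource`, `KnowledgeBase`, `reasoning_effort`, `max_subqueries`) is hypothetical and chosen for illustration; this is not the Foundry IQ or Azure AI Search SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeSource:
    """A connection to one data store (hypothetical shape, not the real SDK)."""
    name: str
    kind: str  # e.g. "search_index", "sharepoint", "onelake", "web"


@dataclass
class KnowledgeBase:
    """Top-level object: knowledge sources plus the parameters that control retrieval."""
    name: str
    sources: list[KnowledgeSource]
    reasoning_effort: str = "medium"  # hypothetical retrieval parameter
    max_subqueries: int = 3           # hypothetical retrieval parameter


@dataclass
class Agent:
    name: str
    knowledge_base: KnowledgeBase


kb = KnowledgeBase(
    name="hr-policies",
    sources=[
        KnowledgeSource("policies-index", "search_index"),
        KnowledgeSource("hr-sharepoint", "sharepoint"),
    ],
)

# Two agents share one knowledge base, so tuning retrieval once affects both.
helpdesk = Agent("helpdesk", kb)
onboarding = Agent("onboarding", kb)
kb.max_subqueries = 5
print(helpdesk.knowledge_base.max_subqueries, onboarding.knowledge_base.max_subqueries)  # 5 5
```

Because both agents hold a reference to the same knowledge base object, one configuration change applies across the whole fleet.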
Foundry IQ was first introduced at Microsoft Ignite 2025 and then substantially expanded at Build 2026 (June 2026) with serverless retrieval, new knowledge sources, Web IQ, and an SLA-backed endpoint. Under the hood it is powered by Azure AI Search’s agentic retrieval pipeline; Foundry IQ is the productized, agent-facing experience of that pipeline inside the new Microsoft Foundry portal.

Foundry IQ = a multi-source, permission-aware knowledge base + an agentic (multi-query, reranking, iterating) retriever, exposed to agents behind one managed endpoint. It is the knowledge/grounding layer in Microsoft’s three-part IQ stack.
Foundry IQ explained: how does agentic retrieval actually work?
Foundry IQ’s agentic retrieval treats a query as a reasoning task: an LLM breaks one complex question into focused sub-queries, runs them in parallel across keyword, vector, and hybrid search, semantically reranks each result set, iterates when the signal is weak, and returns a unified answer with citations and an activity plan. That loop — plan, search, rerank, decide, search again or stop — is what separates it from a fixed retrieve-then-generate pipeline.
Walking the pipeline from the Azure AI Search docs: (1) your app calls the knowledge base with a query plus conversation history; (2) an LLM does query planning, analyzing the whole chat thread to find the underlying need and rewriting it into sub-queries — it even corrects spelling and expands synonyms; (3) all sub-queries execute simultaneously against your sources as keyword, vector, or hybrid searches, and each is semantically reranked (the L2 semantic ranker) to promote the best matches; (4) results are merged into a three-part response: grounding content, source references for citation, and an execution plan you can inspect.
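The plan, parallel search, rerank, and merge steps, plus the iterate-when-weak decision from the previous paragraph, can be sketched as a toy control loop. Everything here is an assumption for illustration. The "planner" splits on " and " instead of calling an LLM. Synonym expansion uses a hard-coded hypothetical map. Threads stand in for concurrent calls to real sources. Search plus rerank is a single term-overlap score over an in-memory corpus. None of these functions are the Azure AI Search API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus standing in for the knowledge sources.
CORPUS = {
    "refunds": "refund policy allows returns within 30 days",
    "travel": "manager approval is required for flights",
    "expenses": "expense reports are due at month end",
}
SYNONYMS = {"reimbursement": "refund"}  # hypothetical synonym map for rewrites


def plan(question: str) -> list[str]:
    """Stand-in for LLM query planning: split a compound question into sub-queries."""
    return [part.strip() for part in question.lower().split(" and ") if part.strip()]


def score(query: str, text: str) -> float:
    """Toy relevance: fraction of query terms that appear in the chunk."""
    terms = set(query.split())
    return len(terms & set(text.split())) / len(terms) if terms else 0.0


def search_and_rerank(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Toy search plus rerank: score every chunk and keep the top k."""
    ranked = sorted(((doc_id, score(query, text)) for doc_id, text in CORPUS.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def rewrite(query: str) -> str:
    """Toy follow-up rewrite: swap in synonyms before searching again."""
    return " ".join(SYNONYMS.get(term, term) for term in query.split())


def retrieve(question: str, min_score: float = 0.75, max_rounds: int = 2) -> dict:
    pending, activity, best = plan(question), [], {}
    for round_no in range(max_rounds):
        with ThreadPoolExecutor() as pool:  # sub-queries run concurrently
            results = list(pool.map(search_and_rerank, pending))
        activity.append({"round": round_no, "subqueries": list(pending)})
        weak = []
        for query, ranked in zip(pending, results):
            strong = [(doc_id, s) for doc_id, s in ranked if s >= min_score]
            if not strong:
                weak.append(query)  # evidence too thin for this sub-query
            for doc_id, s in strong:
                best[doc_id] = max(s, best.get(doc_id, 0.0))
        if not weak:
            break  # every sub-query found strong evidence: stop
        pending = [rewrite(q) for q in weak]  # decide to search again
    refs = sorted(best, key=best.get, reverse=True)
    # Three-part response: grounding content, references, and the activity plan.
    return {"grounding": [CORPUS[d] for d in refs], "references": refs, "activity": activity}


result = retrieve("reimbursement policy and approval for flights")
print(result["references"])     # ['travel', 'refunds']
print(len(result["activity"]))  # 2
```

The first round finds strong evidence only for the flights sub-query. The loop therefore rewrites "reimbursement policy" to "refund policy" and searches once more before it stops.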
Two design choices stand out. First, retrieval mode is index-driven: if your index has both text and vector fields, you get hybrid search automatically; if it only has a vector field, you get pure vector search. Second, the agentic loop adds latency on purpose — it invokes the full query pipeline multiple times — but runs those passes in parallel to keep response times reasonable. You trade a few hundred milliseconds and extra tokens for materially better recall on hard questions.
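Both design points fit in a few lines: choosing the mode from the index schema, and merging keyword and vector result lists. For the merge, this sketch uses reciprocal rank fusion (RRF), the method Azure AI Search documents for hybrid ranking. Two details are assumptions here. The constant `k=60` comes from the original RRF formulation. The keyword-only fallback for text-only indexes is not covered by the paragraph above. The boolean parameters are hypothetical.

```python
def retrieval_mode(has_text_fields: bool, has_vector_fields: bool) -> str:
    """Pick the search mode from what the index contains."""
    if has_text_fields and has_vector_fields:
        return "hybrid"
    if has_vector_fields:
        return "vector"
    if has_text_fields:
        return "keyword"  # assumption: text-only indexes fall back to keyword search
    raise ValueError("index has no searchable fields")


def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
print(retrieval_mode(True, True))         # hybrid
print(rrf([keyword_hits, vector_hits]))   # ['b', 'a', 'd', 'c']
```

Documents found by both searches ("a" and "b") rise above documents that only one search returned, which is why hybrid mode tends to be robust when either signal alone is noisy.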
“Classic RAG does one retrieve-then-generate pass. Agentic retrieval turns retrieval into a control loop: plan, search in parallel, rerank, decide, and search again if the evidence is thin.”
Foundry IQ agentic retrieval, summarized
Is Foundry IQ just RAG? Agentic retrieval vs classic single-shot RAG
No — Foundry IQ is not ‘just RAG’, but it is not magic either: it is RAG upgraded from a single-shot pipeline into a multi-query control loop. Classic RAG embeds your question once, hits one index once, and generates. Foundry IQ’s agentic retrieval decomposes the question, searches multiple sources concurrently, reranks, iterates with follow-ups, and enforces permissions before answering. The honest framing: same core idea (ground the model in your data), meaningfully different machinery.
Microsoft’s published benchmarks put numbers on the gap. On complex, multi-hop queries, agentic retrieval improves response relevance by up to 36% versus traditional single-shot RAG, and Microsoft reports recall improvements of up to 54% over single-shot retrieval — roughly a +20-point lift in answer quality on the harder questions where single-shot RAG tends to miss a sub-fact. That delta is real, but it concentrates on compound and ambiguous questions; for a straightforward ‘what’s our refund policy’ lookup, single-shot RAG is often just as good and cheaper.
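Here is a toy illustration of where that gap comes from. A compound question needs two facts, but a single query spends its retrieval budget on whichever sub-topic dominates the wording. The corpus, the word-overlap scorer, and the naive split on " and " (standing in for LLM query planning) are all hypothetical. The top-1 budget is deliberately tiny to make the effect visible. This shows the mechanism, not Microsoft’s benchmark numbers.

```python
# Hypothetical corpus: answering the question below needs both "ceo" and "hq".
CORPUS = {
    "ceo": "the ceo of contoso is jane doe",
    "hq": "contoso headquarters are in seattle",
    "fiscal": "the contoso fiscal year starts in july",
}


def overlap(query: str, text: str) -> int:
    """Toy relevance: number of shared words."""
    return len(set(query.split()) & set(text.split()))


def top_k(query: str, k: int = 1) -> list[str]:
    return sorted(CORPUS, key=lambda d: overlap(query, CORPUS[d]), reverse=True)[:k]


question = "who is the ceo of contoso and where are contoso headquarters"
needed = {"ceo", "hq"}

single_shot = set(top_k(question))    # one query, one pass
decomposed = set()
for sub in question.split(" and "):   # naive stand-in for query planning
    decomposed.update(top_k(sub))

print(len(single_shot & needed) / len(needed), len(decomposed & needed) / len(needed))  # 0.5 1.0
```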
The table below is the side-by-side most marketing pages skip. Use it to decide honestly whether you need agentic retrieval or whether plain RAG already serves your traffic.
| Dimension | Classic single-shot RAG | Foundry IQ agentic retrieval |
|---|---|---|
| Query handling | One query, embedded once | LLM decomposes into focused sub-queries |
| Search modes | Single mode per pass (usually vector) | Concurrent keyword + vector + hybrid |
| Reranking | Optional, often a single reranker | Semantic L2 reranking on every sub-query |
| Iteration / follow-ups | None — retrieve once, then generate | Iterates and reissues queries if signal is weak |
| Sources per query | Typically one index | Multiple sources fanned out in parallel |
| Citations | DIY — you assemble references | Built-in references + activity plan returned |
| Permissions | You enforce at the app layer | Identity- and document-level enforced at query time |
| Billing unit | Per query (uniform) | Per token (varies with reasoning effort) |
| Best for | Simple, high-volume doc lookups | Multi-hop, ambiguous, multi-source questions |
Foundry IQ vs Work IQ vs Fabric IQ vs Web IQ: where are the boundaries?
The IQ stack is three layers plus a web extension. Fabric IQ is the semantic data foundation over OneLake and Power BI; Foundry IQ is the knowledge/retrieval middle layer that grounds custom agents; Work IQ is the Microsoft 365 collaboration-context top layer; and Web IQ extends Foundry IQ’s reach to live web search. Each is standalone, but they compose to give an agent structured-data context, grounded knowledge, and human-collaboration context at once.
Concretely: Fabric IQ models business data — ontologies, semantic models, graphs, data agents — so an agent can reason over analytics in OneLake and Power BI. Foundry IQ connects structured and unstructured content across Azure, SharePoint, OneLake, and the web into permission-aware knowledge bases for custom agents you build in Foundry. Work IQ is a contextual layer for Microsoft 365 that captures collaboration signals from documents, meetings, chats, and workflows so agents understand how your org actually operates. Web IQ brings live web, news, image, and video context with sub-165 ms latency and zero data retention.
The one-line separation that keeps people unstuck: Fabric IQ is for your data, Foundry IQ is for your knowledge, Work IQ is for your work patterns, and Web IQ is for the open web. The Foundry IQ vs Work IQ vs Fabric IQ confusion almost always comes from treating them as competitors. They are layers, and at Build 2026 Microsoft began letting Foundry IQ consume Work IQ and Fabric IQ as knowledge sources.
Fabric IQ → structured data (OneLake/Power BI). Foundry IQ → knowledge grounding/retrieval. Work IQ → M365 collaboration context. Web IQ → live web. Microsoft Graph → the access/permission plumbing underneath all of them.
Foundry IQ vs Microsoft Graph: aren’t these the same thing?
No. Microsoft Graph is the access layer; the IQ layers are the understanding layer. The cleanest phrasing is ‘Graph enables access, IQ enables understanding.’ Graph is the unified API that answers ‘which files or emails exist and who can see them’; Foundry IQ sits above it and answers ‘what is the relevant, grounded knowledge for this question.’
Microsoft Graph is a unified API and permission-aware access layer across Microsoft 365 — it retrieves emails, files, calendars, Teams, and SharePoint content and enforces who can see what. What Graph does not do is interpret meaning, decompose a question, rank by relevance, or assemble grounded answers. That is exactly the job Foundry IQ (and Work IQ and Fabric IQ) add on top.
Importantly, the two are complementary, not alternatives. Foundry IQ leans on identity and permissions plumbing — it runs queries under the caller’s Microsoft Entra identity, synchronizes access control lists, and honors Microsoft Purview sensitivity labels — to enforce document-level access at query time. In other words, the IQ layers ride on the same access fabric Graph represents; they add reasoning and retrieval, not a parallel permission system.
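Query-time enforcement amounts to security trimming. Each chunk carries the access list synchronized from its source, and anything the caller cannot see is dropped before reranking or answer synthesis. This sketch simplifies under stated assumptions. The `allowed_groups` and `label` fields and the ordered label ceiling are hypothetical stand-ins for Entra group membership and Purview label rules, and the sketch models neither of those services.

```python
from dataclasses import dataclass

# Hypothetical sensitivity scale, lowest to highest.
LABEL_RANK = {"public": 0, "general": 1, "confidential": 2, "highly_confidential": 3}


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # ACL synchronized from the source system
    label: str                      # sensitivity label on the source document


@dataclass(frozen=True)
class Caller:
    user_id: str
    groups: frozenset[str]          # groups resolved from the caller's identity
    max_label: str                  # highest label this caller may read


def trim(results: list[Chunk], caller: Caller) -> list[Chunk]:
    """Keep only chunks the caller shares a group with and is cleared to read."""
    ceiling = LABEL_RANK[caller.max_label]
    return [
        c for c in results
        if c.allowed_groups & caller.groups and LABEL_RANK[c.label] <= ceiling
    ]


results = [
    Chunk("d1", "holiday calendar", frozenset({"all-staff"}), "general"),
    Chunk("d2", "salary bands", frozenset({"hr"}), "confidential"),
    Chunk("d3", "m&a memo", frozenset({"all-staff", "execs"}), "highly_confidential"),
]
alice = Caller("alice", frozenset({"all-staff"}), "confidential")
print([c.doc_id for c in trim(results, alice)])  # ['d1']
```

Trimming happens before any ranking or generation, so a chunk the caller cannot see never reaches the model and cannot leak into an answer or a citation.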
Foundry IQ serverless pricing and lock-in: what’s the honest read?
| Figure | Metric | Context |
|---|---|---|
| 36% | Relevance lift on multi-hop queries | Agentic retrieval vs single-shot RAG (Microsoft) |
| 54% | Recall improvement | Over single-shot retrieval (Microsoft) |
| $4.32 | Example query-execution cost | 2,000 retrievals, Microsoft worked example |
| <165 ms | Web IQ latency | Live web search, zero data retention |
Foundry IQ Serverless is in public preview with scale-to-zero pricing: you pay only for compute and storage you use, and the service idles to zero when no queries run. Microsoft’s published preview rates are about $0.24 per compute unit/hour and up to $0.29 per GB/month of indexed storage, with a 1 GB/index limit, 30 indexes per service, and 5 services per subscription per region. Billing is expected to begin in late 2026. For bursty agent traffic, scale-to-zero is the headline benefit — no idle clusters to pay for.
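Here is a back-of-the-envelope estimator for the serverless tier, using the preview rates and limits quoted above. The workload inputs are hypothetical. The storage rate is treated as a flat $0.29 worst case, even though the quoted figure is "up to". Preview pricing may change before billing starts.

```python
import math

# Preview figures quoted above; verify against current pricing before relying on them.
COMPUTE_USD_PER_CU_HOUR = 0.24
STORAGE_USD_PER_GB_MONTH = 0.29  # "up to" rate, used here as a worst case
MAX_GB_PER_INDEX = 1.0
MAX_INDEXES_PER_SERVICE = 30
MAX_SERVICES_PER_REGION = 5      # per subscription, per region


def services_needed(index_sizes_gb: list[float]) -> int:
    """Validate the per-index size limit and count the services the indexes require."""
    for size in index_sizes_gb:
        if size > MAX_GB_PER_INDEX:
            raise ValueError(f"{size} GB index exceeds the {MAX_GB_PER_INDEX} GB limit")
    needed = math.ceil(len(index_sizes_gb) / MAX_INDEXES_PER_SERVICE)
    if needed > MAX_SERVICES_PER_REGION:
        raise ValueError(f"{needed} services exceed the {MAX_SERVICES_PER_REGION}-per-region limit")
    return needed


def monthly_cost_usd(cu_hours: float, index_sizes_gb: list[float]) -> float:
    """Compute plus storage; idle time adds nothing because the tier scales to zero."""
    services_needed(index_sizes_gb)  # fail fast if the plan breaks a quota
    return cu_hours * COMPUTE_USD_PER_CU_HOUR + sum(index_sizes_gb) * STORAGE_USD_PER_GB_MONTH


# Hypothetical bursty workload: 120 compute-unit hours in a month, three indexes.
print(f"${monthly_cost_usd(120, [0.5, 0.5, 1.0]):.2f}")  # $29.38
```

The quota check matters as much as the price. A corpus that does not fit under 1 GB per index has to be split across indexes before the tier is even an option.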
But serverless tier pricing is only half the bill. Agentic retrieval also bills per token across two services. Azure AI Search charges for retrieval and reranking tokens (a free monthly allowance, then pay-as-you-go), and Azure OpenAI charges for the query-planning and answer-synthesis tokens on whatever model you assign. Microsoft’s own worked example (2,000 retrievals with three sub-queries each, reranking 50 chunks per sub-query) lands at roughly $3.30 for reranking in Search plus about $1.02 for query planning in Azure OpenAI, for $4.32 total. The reranking line scales linearly with sub-query fan-out and with chunks per sub-query, so high reasoning effort on high volume gets expensive fast.
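The $4.32 total can be reproduced with plain arithmetic. The per-chunk token count, the per-retrieval planning tokens, and the per-million-token rates below are assumptions, chosen so that the arithmetic lands on the figures quoted above. Substitute current prices and your own planner model. The free monthly allowance is ignored.

```python
def agentic_retrieval_cost(
    retrievals: int,
    subqueries_per_retrieval: int,
    chunks_per_subquery: int,
    tokens_per_chunk: int = 500,            # assumption
    rerank_usd_per_million: float = 0.022,  # assumption
    planning_input_tokens: int = 2_000,     # assumption, per retrieval
    planning_output_tokens: int = 350,      # assumption, per retrieval
    input_usd_per_million: float = 0.15,    # assumption (planner model input rate)
    output_usd_per_million: float = 0.60,   # assumption (planner model output rate)
) -> dict[str, float]:
    # Reranking tokens grow with retrievals x sub-queries x chunks: the fan-out term.
    rerank_tokens = retrievals * subqueries_per_retrieval * chunks_per_subquery * tokens_per_chunk
    rerank = rerank_tokens / 1e6 * rerank_usd_per_million
    # In this model, query-planning tokens grow with the number of retrievals only.
    planning = (retrievals * planning_input_tokens / 1e6 * input_usd_per_million
                + retrievals * planning_output_tokens / 1e6 * output_usd_per_million)
    return {"rerank": round(rerank, 2), "planning": round(planning, 2),
            "total": round(rerank + planning, 2)}


print(agentic_retrieval_cost(2_000, 3, 50))
# {'rerank': 3.3, 'planning': 1.02, 'total': 4.32}
```

Doubling the fan-out to six sub-queries doubles only the reranking line (to about $6.60). The planning line stays flat, because in this model planning cost depends only on the number of retrievals.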
On lock-in: be clear-eyed. Foundry IQ is inseparable from Azure AI Search, leans on Entra and Purview for permissions, and its agent wiring lives in the Foundry portal. The moment you adopt Work IQ and Fabric IQ as knowledge sources, your retrieval logic, your permission model, and your data semantics are all Microsoft-native. That is fine if you are an Azure shop consolidating on Foundry — it is a meaningful caveat if you value portability. Price the exit (re-indexing, re-implementing permission-aware retrieval elsewhere) before you commit.
Build vs buy: Foundry IQ vs pgvector plus a reranker
Foundry IQ is real infrastructure, not a RAG rebrand — but buy it for the right reasons
Build it yourself with pgvector plus a reranker when your corpus is one or two tidy indexes, your questions are simple, and portability matters. Buy Foundry IQ when you have many messy sources, real document-level permissions to enforce, and questions that genuinely need decomposition across sources. The decision is less about capability and more about how much retrieval plumbing you want to own.
The DIY path is well-trodden and cheap: Postgres with the pgvector extension for storage and similarity search, an open or hosted reranker (a cross-encoder or a Cohere/Voyage-style rerank API) to reorder candidates, and a thin agent loop if you want query decomposition. You keep full control, full portability, and predictable per-query cost. What you also keep: ingestion and chunking, embedding refresh, ACL synchronization, citation assembly, multi-source routing, and the eval harness to prove it all works. Foundry IQ folds those into a managed service — that is precisely what you are paying for.
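For comparison, the core DIY retrieval path (vector top-k, then rerank) fits in a few lines. In production, the first stage would be a pgvector query such as `ORDER BY embedding <=> :query_vec LIMIT 50`, where `<=>` is pgvector’s cosine-distance operator. Here it is an in-memory cosine search over made-up 3-dimensional embeddings so the sketch runs standalone. `rerank_score` is a hypothetical stand-in for a cross-encoder or hosted rerank API.

```python
import math

# Hypothetical chunks with made-up 3-d embeddings (real ones have hundreds of dimensions).
CHUNKS = {
    "c1": ("refunds are accepted within 30 days", [0.9, 0.1, 0.0]),
    "c2": ("refund requests need a receipt", [0.8, 0.2, 0.1]),
    "c3": ("the cafeteria opens at 8am", [0.0, 0.1, 0.9]),
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_top_k(query_vec: list[float], k: int) -> list[str]:
    """Stage 1: in-memory equivalent of pgvector's ORDER BY embedding <=> query LIMIT k."""
    return sorted(CHUNKS, key=lambda cid: cosine(query_vec, CHUNKS[cid][1]), reverse=True)[:k]


def rerank_score(query: str, text: str) -> int:
    """Stage 2: hypothetical reranker stand-in that counts shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, query_vec: list[float], k: int = 2) -> list[str]:
    candidates = vector_top_k(query_vec, k)
    return sorted(candidates, key=lambda cid: rerank_score(query, CHUNKS[cid][0]), reverse=True)


print(retrieve("do refund requests need a receipt", [0.9, 0.1, 0.0]))  # ['c2', 'c1']
```

The rerank stage lifts the chunk that actually answers the question above the one that merely sits closest in embedding space. Everything around this core (ingestion, ACL sync, citations, multi-source routing) is what the paragraph above says you still own.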
A blunt heuristic: if you can describe your retrieval as ‘one index, top-k, optional rerank,’ pgvector plus a reranker will be cheaper and more portable, and you should not buy. If you find yourself building query planners, fanning out to SharePoint plus OneLake plus the web, syncing Purview labels, and enforcing per-document permissions under each user’s identity, you are re-implementing Foundry IQ — and the managed version, with its SLA-backed endpoint, is likely the better trade.
Builder’s take
I build retrieval into production agents at Cyntr and Loomfeed, so I read the Foundry IQ launch through one lens: does it remove plumbing I’d otherwise own? Here’s my honest read.
- The real product is the multi-source knowledge base plus permission-aware retrieval behind one endpoint — not the agentic loop, which you can build yourself in an afternoon.
- Agentic retrieval is genuinely better on multi-hop questions, but it bills per token, not per query. On high-volume, simple lookups it can cost more than a single-shot vector search for no relevance gain. Match the reasoning effort to the question.
- Serverless scale-to-zero is the most underrated piece for bursty agent workloads — but it’s preview, single-region, and the moment you wire in Work IQ + Fabric IQ + Purview ACLs you are deep in the Microsoft stack. Price the exit before you price the entry.
- If your corpus is one tidy index and your questions are simple, pgvector plus a reranker still wins on cost and portability. Foundry IQ earns its keep when you have many messy sources, real document-level permissions, and questions that need decomposition.
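"Match the reasoning effort to the question" can be made concrete with a cheap router in front of retrieval. The router sends short single-topic lookups to single-shot search and reserves agentic retrieval for compound or multi-source questions. The cue words and thresholds below are hypothetical heuristics, not a Foundry IQ feature. In practice you would tune them against your own traffic, or replace them with a small classifier.

```python
import re

# Hypothetical cues that a question needs decomposition.
COMPOUND_CUES = re.compile(r"\b(and|versus|vs|compare|both|across|while)\b", re.IGNORECASE)


def route(question: str, sources_touched: int = 1, max_simple_words: int = 12) -> str:
    """Return 'single_shot' for simple lookups and 'agentic' for multi-hop or multi-source ones."""
    if sources_touched > 1:
        return "agentic"
    if COMPOUND_CUES.search(question) or len(question.split()) > max_simple_words:
        return "agentic"
    return "single_shot"


print(route("What is our refund policy?"))                      # single_shot
print(route("Compare the EU and US refund policies"))           # agentic
print(route("Who approved the Q3 budget?", sources_touched=2))  # agentic
```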
Frequently asked questions
What is Foundry IQ?
Foundry IQ is Microsoft’s managed knowledge layer for AI agents. It turns scattered enterprise data into reusable, permission-aware knowledge bases and uses agentic retrieval — decomposing a question into sub-queries, searching multiple sources in parallel, reranking, and iterating — to return grounded answers with citations behind one endpoint.
Is Foundry IQ just RAG?
Not exactly. It is the same core idea — grounding a model in your data — but the machinery is upgraded from single-shot RAG into a multi-query control loop. Foundry IQ decomposes the query, runs concurrent keyword/vector/hybrid searches, semantically reranks, iterates when evidence is thin, and enforces permissions. Microsoft reports up to 36% better relevance and 54% better recall on multi-hop questions versus single-shot RAG.
What is the difference between Foundry IQ, Work IQ and Fabric IQ?
They are three layers, not competitors. Fabric IQ is the semantic data foundation over OneLake and Power BI; Foundry IQ is the knowledge and retrieval middle layer that grounds custom agents; Work IQ is the Microsoft 365 collaboration-context top layer. Foundry IQ can even consume Work IQ and Fabric IQ as knowledge sources.
How is Foundry IQ different from Microsoft Graph?
Microsoft Graph is the access layer — a unified API that retrieves Microsoft 365 content and enforces who can see what. Foundry IQ is the understanding layer that sits above it: it decomposes questions, ranks by relevance, and returns grounded answers. Graph enables access; IQ enables understanding. Foundry IQ uses the same identity and permission fabric to enforce access at query time.
How much does Foundry IQ cost?
Foundry IQ Serverless is in public preview with scale-to-zero pricing — roughly $0.24 per compute unit/hour and up to $0.29 per GB/month of storage, with a 1 GB/index limit. Billing is expected to start in late 2026. On top of the tier, agentic retrieval bills per token in both Azure AI Search (reranking) and Azure OpenAI (query planning and answer synthesis); Microsoft’s worked example totals about $4.32 for 2,000 retrievals.
What is Web IQ?
Web IQ is the extension that gives Foundry IQ agents live web context — web, news, images, video, and shopping — with sub-165 ms latency and zero data retention, respecting publisher preferences. It lets agents ground answers in fresh public information alongside your private enterprise knowledge sources.
Primary sources
- What is Foundry IQ? — Microsoft Learn
- Agentic retrieval overview — Microsoft Learn
- Build smarter agents faster with Foundry IQ — Microsoft Foundry Blog
- Foundry IQ: boost response relevance by 36% with agentic retrieval — Microsoft Community Hub
- Work IQ, Fabric IQ, Foundry IQ vs Microsoft Graph — VisualLabs
- Making Sense of Microsoft’s AI Strategy: Work IQ, Fabric IQ, Foundry IQ — James Serra’s Blog
Last updated: June 3, 2026.