The short answer is mostly no by default, but that changed on June 15, 2026 when Anthropic split Agent SDK usage into its own credit pool. Here is the provider-by-provider breakdown.
Do AI agents have separate rate limits? The short answer
By default, no: AI agents do not have separate rate limits, and your agent traffic draws from the same provider quota and API key as everything else you run. The big exception arrived on June 15, 2026, when Anthropic moved Agent SDK and third-party programmatic usage on Claude subscriptions into a separate monthly credit pool, distinct from your chat usage. So the honest 2026 answer to “do AI agents have separate rate limits” is: it depends on the provider, and it changed this month.
Here is the distinction that matters. There are two different meters in play. One is the API rate limit (RPM and TPM ceilings on your API key), and the other is the subscription usage limit (the cap on a consumer plan like Claude Pro or Max). For raw API access at OpenAI, Anthropic, Azure, and AWS Bedrock, agents and chat share whatever quota lives on the key, project, or account you point them at, unless you deliberately isolate them.
What Anthropic changed is the subscription side. If you use a paid Claude plan and run the Agent SDK, the headless claude -p command, Claude Code GitHub Actions, or a third-party app over the Agent Client Protocol, that programmatic traffic now bills against a dedicated per-user credit pool instead of the same allowance that powers your interactive chat. That is the closest thing to “agents have their own rate limit” any major provider currently ships, and the rest of this guide gives you the provider-by-provider table plus the gateway workaround to build the isolation yourself.

Agents share your quota by default. The exception is Anthropic’s Claude subscriptions: from June 15, 2026, Agent SDK and third-party programmatic usage draws from a separate per-user monthly credit pool, not your chat allowance.
Is agent traffic metered separately? Provider-by-provider table
Across the four major providers, only Anthropic’s consumer Claude subscriptions meter agent traffic separately as of June 2026. On every raw API surface (OpenAI, Anthropic’s own API, Azure OpenAI in Foundry, and AWS Bedrock), agent and chat requests share the same quota scope unless you split them across separate keys, projects, or deployments. The table below is the answer to “do agents count against my API quota” for each provider.
Read it as the meter scope. OpenAI counts at the organization and project level. Anthropic’s API counts per organization tier. Azure OpenAI in Microsoft Foundry counts per region, per subscription, per model or deployment. Bedrock counts per model, per region, per account. None of them know or care whether the caller is a chat UI or an autonomous agent. The only built-in separation is Anthropic’s new subscription credit pool, which sits above the API entirely.
| Provider | Shares chat quota by default? | RPM / TPM model | On limit | How to isolate agents |
|---|---|---|---|---|
| Anthropic (Claude subscription) | No, since June 15, 2026 | Separate per-user credit pool for Agent SDK / claude -p / GitHub Actions / 3rd-party ACP apps | SDK requests stop, or flow to usage credits at standard API rates if enabled | Already separate; chat stays on regular plan limits |
| Anthropic API (direct) | Yes | Tier 1: 50 RPM / 30k ITPM / 8k OTPM to Tier 4: 4,000 RPM / 2M ITPM / 400k OTPM | HTTP 429 rate_limit_error + retry-after header | Separate API key per agent, or a gateway virtual key |
| OpenAI API | Yes | RPM, TPM, RPD, TPD enforced at org + project level | HTTP 429 Too Many Requests | Separate project or separate API key per agent |
| Azure OpenAI (Foundry) | Yes | TPM + RPM per region, per subscription, per model/deployment; quota tiers 0-6 | HTTP 429 Too Many Requests | Separate deployment per agent; split TPM across deployments |
| AWS Bedrock | Yes | Token-based quotas per model/region/account (RPM no longer enforced on bedrock-runtime; e.g. Claude Sonnet 4.6 ~5M TPM) | HTTP 429 ThrottlingException | Separate AWS account or inference profile; gateway virtual key |
What changed with the Anthropic Agent SDK separate credit pool on June 15, 2026?
Starting June 15, 2026, programmatic Claude usage on a paid subscription draws from a new per-user monthly credit pool rather than your regular chat allowance. That covers the Claude Agent SDK, the headless claude -p command, Claude Code GitHub Actions, and third-party apps built on the Agent Client Protocol, while interactive chat and the Claude Code terminal keep using your normal subscription limits. This separate Agent SDK credit pool is the single biggest 2026 shift on this question.
The credit amounts are per-user and not pooled across a team. Reporting on the change lists Pro at $20 per month, Max 5x at $100, Max 20x at $200, Team premium at $100 per seat, and Enterprise premium at $200 per seat. Three mechanics matter most. First, you must complete a one-time opt-in to claim the credit, after which it refreshes automatically each billing cycle. Second, unused credit does not roll over. Third, when the credit runs out, Agent SDK requests either stop entirely or fall through to usage credits at standard API pricing, but only if you have usage credits explicitly enabled.
What still counts against your normal subscription quota: chatting with Claude on web, desktop, or mobile, the interactive Claude Code terminal and IDE experience, and Claude Cowork. In other words, Anthropic drew a line between “you, typing” and “your code, calling.” The practical upshot of the June 2026 programmatic credit pool: your nightly agent run no longer silently eats the allowance you wanted for your own coding session, but you now have a second meter to budget for and monitor.
“Anthropic drew a line between ‘you, typing’ and ‘your code, calling.’ That is the closest any major provider comes to giving agents their own rate limit.”
Alatirok analysis of Anthropic’s June 15, 2026 billing change
Do agents count against my API quota when I use the raw API?
Yes. On the raw API, agent traffic counts against the same quota as every other call on that key, project, or account. OpenAI enforces rate limits at the organization and project level; Anthropic’s API enforces per-organization tier limits; Azure OpenAI counts per region, subscription, and deployment; Bedrock counts per model, region, and account. None of them isolates agents automatically. So if you are asking “do agents count against my API quota,” the default answer on the API path is an unambiguous yes.
Concrete numbers make this real. On the Anthropic API, Tier 1 gives roughly 50 RPM with 30,000 input tokens per minute and 8,000 output tokens per minute, climbing to Tier 4 at 4,000 RPM, 2,000,000 ITPM, and 400,000 OTPM. Note that Anthropic meters input and output tokens separately (ITPM and OTPM), so a single combined TPM figure understates the picture. OpenAI’s tiers scale RPM and TPM with cumulative spend, and limits are hit across RPM, TPM, RPD, or TPD, whichever you breach first.
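Because input and output tokens are metered separately, it helps to check which ceiling a workload hits first. The sketch below is a rough illustration, not Anthropic's own accounting: it uses the Tier 1 and Tier 4 figures quoted above and assumes a uniform workload with positive per-call token counts. The names `TierLimits` and `max_calls_per_minute` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    rpm: int   # requests per minute
    itpm: int  # input tokens per minute
    otpm: int  # output tokens per minute


# Figures as quoted in this article; check the provider's docs for current values.
TIER_1 = TierLimits(rpm=50, itpm=30_000, otpm=8_000)
TIER_4 = TierLimits(rpm=4_000, itpm=2_000_000, otpm=400_000)


def max_calls_per_minute(limits, input_tokens_per_call, output_tokens_per_call):
    """Return (calls_per_minute, binding_limit) for identical calls.

    Assumes both per-call token counts are positive integers.
    """
    ceilings = {
        "RPM": limits.rpm,
        "ITPM": limits.itpm // input_tokens_per_call,
        "OTPM": limits.otpm // output_tokens_per_call,
    }
    binding = min(ceilings, key=ceilings.get)
    return ceilings[binding], binding


# Hypothetical agent step: 4,000 input tokens, 500 output tokens per call.
print(max_calls_per_minute(TIER_1, 4_000, 500))  # (7, 'ITPM')
# Hypothetical short chat turn: 200 input tokens, 100 output tokens per call.
print(max_calls_per_minute(TIER_1, 200, 100))    # (50, 'RPM')
```

Under these assumptions the agent step is capped at 7 calls per minute by input tokens long before the 50 RPM ceiling matters, while the short chat turn is capped by RPM.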
Azure OpenAI in Microsoft Foundry is the most explicitly scoped: TPM and RPM are defined per region, per subscription, and per model or deployment type, so a gpt-5.1 deployment listed at 1,000,000 TPM and 10,000 RPM gets that pool in each region for each subscription. Bedrock took the opposite simplifying step in 2026, dropping RPM enforcement on the bedrock-runtime endpoint and governing throughput purely by token-based quotas per model, region, and account, with newer models like Claude Sonnet 4.6 carrying around 5M TPM. The throughline: the meter follows the credential and scope, never the caller’s intent.
What is an AI agent 429 rate limit error and why do agents trigger it more?
An AI agent 429 rate limit error means the provider rejected your request because you exceeded an RPM or TPM ceiling. Agents trigger it far more than chat because a single autonomous task can fire dozens of multi-thousand-token calls in a tight loop, and traditional request-count limiting cannot see the difference between a cheap call and an expensive one. If you run agents, the 429 is the symptom you will fight most.
Every provider surfaces this as HTTP 429, with provider-specific bodies. OpenAI and Azure return 429 Too Many Requests. Anthropic returns a 429 with an error type of rate_limit_error, a message naming the limit you breached, and a retry-after header telling you how many seconds to wait. Bedrock raises a 429 ThrottlingException. The right client behavior is identical across all of them: exponential backoff with jitter, and respecting retry-after when present.
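The client behavior described above can be sketched in a few lines. This is a minimal illustration, not any provider SDK's retry logic: `RateLimited` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and the `base`, `cap`, and `max_attempts` defaults are arbitrary assumptions.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""

    def __init__(self, retry_after=None):
        super().__init__("429 rate limited")
        self.retry_after = retry_after  # seconds, if a retry-after header was sent


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)          # the provider's hint takes priority
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()                 # full jitter: uniform in [0, ceiling)


def call_with_retries(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.retry_after))
```

The jitter matters for agent fleets in particular: without it, several agents that were throttled at the same moment all retry at the same moment and get throttled again.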
The deeper reason agents 429 so easily is that request counting is the wrong unit. As Zuplo puts it, a prompt with 50 tokens and a prompt with 10,000 tokens both count as one request, but the compute cost, latency, and provider charges are drastically different, and one chat completion that burns 8,000 tokens gets the same single-request tick as a lightweight metadata lookup. A single agent request can cost 100x a typical human request. That is why token-based (TPM) limiting, not request-based (RPM) limiting, is what actually stops a runaway agent from exhausting your budget or your quota.
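A toy comparison makes the unit problem visible. The sketch below is an illustration under made-up numbers, not a production limiter: it replays one minute of traffic against either a request ceiling or a token ceiling, and `admitted_count` is a hypothetical helper.

```python
def admitted_count(token_sizes, rpm_limit=None, tpm_limit=None):
    """Replay one minute of calls; return (calls admitted, tokens consumed)."""
    admitted = 0
    tokens_used = 0
    for size in token_sizes:
        if rpm_limit is not None and admitted >= rpm_limit:
            continue  # rejected: request ceiling reached
        if tpm_limit is not None and tokens_used + size > tpm_limit:
            continue  # rejected: token ceiling would be crossed
        admitted += 1
        tokens_used += size
    return admitted, tokens_used


chat_minute = [50] * 20       # twenty 50-token calls
agent_minute = [10_000] * 20  # twenty 10,000-token calls

# A 60 RPM cap treats both workloads identically: all 20 calls pass.
print(admitted_count(chat_minute, rpm_limit=60))      # (20, 1000)
print(admitted_count(agent_minute, rpm_limit=60))     # (20, 200000)

# A 50,000 TPM cap stops the agent after 5 calls and leaves chat untouched.
print(admitted_count(agent_minute, tpm_limit=50_000)) # (5, 50000)
print(admitted_count(chat_minute, tpm_limit=50_000))  # (20, 1000)
```

Both workloads look the same to the request counter, yet the agent consumes 200 times the tokens; only the token ceiling tells them apart.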
Request-count limits are misleading for agents: a 50-token call and a 10,000-token call both tick as ‘1 request.’ Token-based (TPM) limiting is what actually stops a runaway agent from draining your quota.
How do I give my AI agents separate rate limits with a gateway?
To give agents their own rate limits when the provider won’t, put a self-hosted LLM gateway in front of your API keys and issue each agent a virtual key with its own TPM and RPM ceiling. Gateways like Portkey, TrueFoundry, and LiteLLM let one parent provider credential fan out into many isolated per-agent quotas, so a noisy agent can only 429 itself, not its siblings. This per-agent rate limiting pattern is the practical answer when you need isolation that the provider doesn’t offer natively.
The mechanism is a virtual account or virtual key. In TrueFoundry’s model, a rate-limit rule targets a subject such as virtualaccount:va-james, giving that agent a dedicated counter while every agent still shares a single parent API key upstream. You define the policy declaratively, for example a per-day request cap or a tokens-per-minute cap, and you can layer rules by user, team, model, environment, or customer ID. Because each virtual key keeps its own counter, an agent that exceeds its limit triggers 429s only for its own traffic, solving the noisy-neighbor problem where one runaway bot starves everyone sharing the pool.
The same gateway gives you token-aware limiting (requests_per_minute and tokens_per_minute side by side), automatic fallback to a secondary model on 429, and one observability surface to watch blocked requests by rule and by agent. If you run more than a handful of agents against shared keys, this is the cleanest way to enforce both throughput ceilings and spend ceilings without waiting for each provider to ship native per-agent quotas.
For deeper builds, pair this with our work on agent FinOps, token cost per task, the best LLM gateway shortlist, and enterprise agent pricing. Rate limiting and unit cost are two views of the same control problem: one caps throughput, the other caps spend.
```yaml
# TrueFoundry-style gateway rule: give each agent its own TPM/RPM ceiling
# so one runaway agent can only 429 itself, not the shared pool.
rate_limiting_rules:
  # Planner agent: token-heavy, cap on tokens not requests
  - id: planner-agent-tpm
    when:
      subjects: ["virtualaccount:agent-planner"]
      models: ["anthropic/claude-sonnet-4-6"]
    limit_to: 200000
    unit: tokens_per_minute
  # Tool-runner agent: chatty, cap on requests
  - id: tool-runner-rpm
    when:
      subjects: ["virtualaccount:agent-tool-runner"]
    limit_to: 600
    unit: requests_per_minute
  # Safety net: hard daily ceiling across all agent virtual keys
  - id: fleet-daily-cap
    when:
      subjects: ["team:agent-fleet"]
    limit_to: 5000000
    unit: tokens_per_day
```
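To show why a per-key counter solves the noisy-neighbor problem, here is a minimal in-process sketch of the idea. It is not how TrueFoundry, Portkey, or LiteLLM implement limiting; it assumes a simple fixed one-minute window, and `PerKeyTokenLimiter` and the key names are hypothetical.

```python
import time
from collections import defaultdict


class PerKeyTokenLimiter:
    """Fixed-window tokens-per-minute limiter with one counter per virtual key."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = dict(limits)  # virtual key -> TPM ceiling
        self.clock = clock
        # virtual key -> (window index, tokens used in that window)
        self.windows = defaultdict(lambda: (None, 0))

    def allow(self, key, tokens):
        """Return True if `key` may spend `tokens` now; unknown keys are denied."""
        window = int(self.clock() // 60)
        seen_window, used = self.windows[key]
        if seen_window != window:
            used = 0  # a new minute started, so reset this key's counter
        if used + tokens > self.limits.get(key, 0):
            self.windows[key] = (window, used)
            return False  # a gateway would answer 429 for this key only
        self.windows[key] = (window, used + tokens)
        return True


limiter = PerKeyTokenLimiter({"agent-planner": 200_000, "agent-tool-runner": 10_000})
print(limiter.allow("agent-planner", 150_000))    # True
print(limiter.allow("agent-planner", 60_000))     # False: planner is over its own cap
print(limiter.allow("agent-tool-runner", 5_000))  # True: sibling is unaffected
```

A fixed window is the simplest choice and allows short bursts at window boundaries; sliding windows or token buckets smooth that out at the cost of more state.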
What does the Anthropic credit pool change mean for your 2026 agent budget?
Verdict: agents share your quota by default, except on Claude subscriptions after June 15, 2026
It means you now budget two meters for Claude: your normal subscription for interactive work, and a separate, non-rolling, per-user credit pool for everything programmatic. Plan for the credit to run out mid-month on any serious agent workload, and decide in advance whether jobs should hard-stop or fall through to standard API pricing. This is the part of the Claude Agent SDK rate-limit story that hits your invoice, not just your throughput.
Do the math on your own usage. The per-user credit ranges from $20 (Pro) to $200 (Max 20x and Enterprise premium). Once you cross into usage credits, you pay standard API rates: for example, $3 per million input and $15 per million output tokens for Sonnet 4.6, and $5 per million input and $25 per million output for Opus 4.7. A single agent that processes a few million tokens a day will exhaust a $20 Pro credit quickly, which is by design: Anthropic is unbundling the programmatic subsidy from the flat-rate chat plan.
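The arithmetic is easy to run for your own workload. The sketch below uses the per-million-token prices quoted above and assumes, for illustration only, that the credit is drawn down at those same standard API rates. The daily token volumes are a made-up example, and `cost_usd` and `days_until_exhausted` are hypothetical helpers.

```python
def cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of a token volume at per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def days_until_exhausted(credit_usd, daily_input, daily_output, in_price_per_m, out_price_per_m):
    """How many days a monthly credit lasts at a steady daily token volume."""
    return credit_usd / cost_usd(daily_input, daily_output, in_price_per_m, out_price_per_m)


# Hypothetical agent: 2M input and 200k output tokens per day.
daily_in, daily_out = 2_000_000, 200_000

print(cost_usd(daily_in, daily_out, 3, 15))                            # 9.0 dollars/day on Sonnet 4.6
print(round(days_until_exhausted(20, daily_in, daily_out, 3, 15), 2))  # 2.22 days on a $20 credit
print(round(days_until_exhausted(200, daily_in, daily_out, 3, 15), 2)) # 22.22 days on a $200 credit
print(round(days_until_exhausted(200, daily_in, daily_out, 5, 25), 2)) # 13.33 days on Opus 4.7
```

At this volume, none of the credit tiers covers a full month, which is why the overflow decision (hard-stop or pay standard rates) needs to be made up front.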
Three action items. First, claim the credit through the one-time opt-in so your agents don’t break on June 15. Second, decide your overflow posture: enable usage credits if continuity matters, or leave them off if you’d rather a job fail loudly than run up a bill. Third, instrument the meter, because the credit does not roll over, so unused headroom is lost and overruns are silent unless you watch them. For teams running fleets, route programmatic traffic through a gateway so you can attribute every token to a specific agent and cap it before it touches the credit pool.
Builder’s take
I run Cyntr’s agent fleet against multiple frontier providers, and the question “do my agents draw from the same quota as my chat?” is the one that actually bites teams in production. Here is how I think about it as someone who pays the bills.
- Treat the API key as a blast radius, not a credential. The moment two agents share one key, a runaway loop on agent A can 429 agent B. We give every Cyntr persona its own virtual key with its own TPM ceiling so no single bot can starve the pool.
- Anthropic’s June 15 split is the first time a major provider drew a hard line between ‘human me’ and ‘programmatic me’ on the same subscription. Budget for it: that $20-$200 credit is per-user, doesn’t roll over, and your claude -p jobs stop dead when it’s gone unless you’ve enabled usage credits.
- Request-count limits are a lie for agents. A planning step that burns 12,000 tokens and a one-line tool call both tick ‘1 request.’ If you’re not limiting on tokens, you’re not limiting anything that matters to your invoice.
- Self-hosting a gateway (LiteLLM, Portkey, TrueFoundry) is the cleanest workaround. It gives you per-agent isolation on top of providers that only meter at the org or project level, plus one place to watch 429s.
- Cross-link this with your token-cost-per-task work. Rate limits and unit cost are the same FinOps problem viewed from two angles: one caps throughput, the other caps spend.
Frequently asked questions
Do AI agents have separate rate limits?
Not by default on raw APIs. OpenAI, the Anthropic API, Azure OpenAI, and AWS Bedrock all meter agent and chat traffic against the same key, project, or account quota, so they 429 together. The exception is Anthropic’s Claude subscriptions: from June 15, 2026, Agent SDK and third-party programmatic usage draws from a separate per-user monthly credit pool rather than your chat allowance.
What is the Anthropic Agent SDK separate credit pool?
It is a per-user monthly credit, separate from your chat usage, that covers programmatic Claude usage: the Agent SDK, the headless claude -p command, Claude Code GitHub Actions, and third-party apps over the Agent Client Protocol. Reported amounts are $20 (Pro), $100 (Max 5x), $200 (Max 20x), $100/seat (Team premium), and $200/seat (Enterprise premium). It requires a one-time opt-in, does not roll over, and when exhausted, requests stop or fall through to standard API pricing if usage credits are enabled.
Do agents count against my API quota?
Yes, on the raw API. Agent calls consume the same RPM and TPM as any other request on that credential. OpenAI counts at the organization and project level, Anthropic’s API counts per organization tier, Azure OpenAI counts per region/subscription/deployment, and Bedrock counts per model/region/account. To separate them, use a distinct project, key, or deployment per agent, or front everything with a gateway that issues per-agent virtual keys.
What is an AI agent 429 rate limit error?
A 429 is the provider rejecting a request because you exceeded an RPM or TPM ceiling. OpenAI and Azure return 429 Too Many Requests; Anthropic returns a 429 with error type rate_limit_error plus a retry-after header; Bedrock returns a 429 ThrottlingException. Agents hit 429s more often because a single task fires many large calls in a loop. The fix is exponential backoff with jitter and token-based (TPM) limiting rather than request counting.
Why are request-count limits misleading for agents?
Because a prompt with 50 tokens and a prompt with 10,000 tokens both count as one request, yet their cost, latency, and provider charges differ enormously. A single agent request can cost 100x a typical human request. Request counting lets a token-heavy agent drain your budget while staying under an RPM cap, so tokens-per-minute (TPM) limits are required to actually stop a runaway agent.
How do I give my AI agents separate rate limits?
Put a self-hosted LLM gateway such as Portkey, TrueFoundry, or LiteLLM in front of your provider keys and issue each agent a virtual key with its own TPM and RPM ceiling. Each virtual key keeps a separate counter, so an agent that exceeds its limit triggers 429s only for itself, not for sibling agents sharing the same upstream credential. You can layer the limits by agent, team, model, or environment and add token-based caps.
Primary sources
- Anthropic Claude Credit Overhaul – June 15, 2026 (plan-by-plan breakdown) — Digital Applied
- Anthropic splits billing again: Agent SDK gets separate credit pools — The New Stack
- Token-Based Rate Limiting: How to Manage AI Agent API Traffic in 2026 — Zuplo
- Rate Limiting in AI Gateway: virtual keys and per-agent isolation — TrueFoundry
- OpenAI API Rate Limits (org/project level, 429, RPM/TPM) — OpenAI
- Azure OpenAI in Microsoft Foundry Models Quotas and Limits — Microsoft Learn
- Quotas for Amazon Bedrock (per-model/region/account, ThrottlingException) — Amazon Web Services
- AI API Rate Limits 2026: OpenAI, Anthropic, Gemini RPM, TPM & 429 — DevTk.AI
Last updated: June 3, 2026.