The headline finding is simple: LLM API pricing in 2026 has fragmented into a real market structure, not a single “frontier premium.” Public vendor pages now show meaningful separation between flagship reasoning models, lower-cost production defaults, and aggressively priced challengers. This piece compares the posted prices for Anthropic Claude, OpenAI GPT-5, Google Gemini 2.5, Mistral, DeepSeek, and Llama access via AWS Bedrock or Together, with attention to input vs. output token costs, cached-token discounts, batch pricing, and context windows. Readers evaluating Anthropic’s top-end lineup may also want our guides to Claude Opus 4.7 and 1M context and where Anthropic and OpenAI are each winning. Prices change often; verify every number on the linked vendor pages before purchasing.
- At a glance: the biggest pricing signals
- Chart 1: flagship model pricing per million tokens
- Chart 2: notable challengers are forcing the floor lower
- Chart 3: cached tokens and batch APIs are now first-order pricing levers
- Chart 4: context windows change the economics, not just the UX
- Breakdown: what the data means for buyers in 2026
- Frequently asked questions
- What is the best way to compare LLM API pricing across vendors?
- Are cached tokens and batch APIs worth using?
- Why does Llama pricing vary so much?
- How often do LLM API prices change?
- Primary sources
At a glance: the biggest pricing signals
$0.07
Cheapest input price in this comparison
DeepSeek public API pricing page, per 1M input tokens
$0.27
Cheapest output price in this comparison
DeepSeek public API pricing page, per 1M output tokens
$75
Highest output price in this comparison
Anthropic Claude Opus 4.1 pricing page, per 1M output tokens
Across the public pricing pages reviewed for this article, the cheapest posted input pricing among the named frontier and challenger models covered here comes from DeepSeek on its API pricing page, while the most expensive posted output pricing among the named flagship models covered here comes from Anthropic Claude Opus 4.1 on Anthropic’s pricing page. Within the editor’s requested set, the premium tier is led by Anthropic Claude Opus and OpenAI GPT-5, the middle tier includes Claude Sonnet, Gemini 2.5 Pro, and Mistral’s larger models, and the low-cost tier is defined by GPT-5 mini/nano, Gemini 2.5 Flash, DeepSeek, and smaller open-weight Llama endpoints through infrastructure providers.
The practical takeaway is that output tokens remain the main cost amplifier for agentic workloads. A model that looks manageable on input pricing can still become expensive in production if it generates long chain-of-thought-adjacent reasoning, verbose tool arguments, or large structured outputs. Cached-input and batch discounts can materially lower costs for repeated prompts, but they do not erase the spread between premium and commodity-priced models.

⚠️ Pricing changes fast. Every number below is based on public vendor pricing pages available on 2026-05-20. Vendors update prices, model names, and discount rules regularly. Verify on the linked pricing pages before making procurement or architecture decisions.
Chart 1: flagship model pricing per million tokens
Headline: output costs create the widest spread
The first table compares the public list prices for the flagship families named in the brief: Anthropic Claude, OpenAI GPT-5, and Google Gemini 2.5. The cleanest way to compare them is by separating input and output prices per million tokens, because the ratio between the two varies sharply by vendor and model tier.
Anthropic’s pricing page lists Claude Opus 4.1 at a premium level, Claude Sonnet 4 at a lower but still enterprise-oriented rate, and Claude Haiku 3.5 as the budget option in its family. OpenAI’s API pricing page positions GPT-5 as the premium model, with GPT-5 mini and GPT-5 nano stepping down aggressively on both input and output cost. Google’s Gemini pricing page similarly separates Gemini 2.5 Pro from Gemini 2.5 Flash, with Flash designed for lower-cost, higher-throughput use cases.
“The premium frontier tier is no longer one price band. Anthropic’s top-end output pricing sits far above GPT-5 and Gemini 2.5 Pro, while OpenAI’s smaller GPT-5 variants push into near-commodity territory.”
alatirok analysis of vendor pricing pages
| Vendor | Model | Input / 1M tokens | Output / 1M tokens | Cached input / 1M | Batch discount | Source |
|---|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.1 | $15 | $75 | $1.50 | 50% via Batch API | anthropic.com/pricing |
| Anthropic | Claude Sonnet 4 | $3 | $15 | $0.30 | 50% via Batch API | anthropic.com/pricing |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4 | $0.08 | 50% via Batch API | anthropic.com/pricing |
| OpenAI | GPT-5 | $1.25 | $10 | $0.125 | 50% via Batch API | openai.com/api/pricing |
| OpenAI | GPT-5 mini | $0.25 | $2 | $0.025 | 50% via Batch API | openai.com/api/pricing |
| OpenAI | GPT-5 nano | $0.05 | $0.40 | $0.005 | 50% via Batch API | openai.com/api/pricing |
| Gemini 2.5 Pro | $1.25 | $10 | Not listed in same format | See pricing page | ai.google.dev/pricing | |
| Gemini 2.5 Flash | $0.30 | $2.50 | Not listed in same format | See pricing page | ai.google.dev/pricing |
Chart 2: notable challengers are forcing the floor lower
The second table looks at the challengers shaping the lower end of the market: Mistral, DeepSeek, and Llama-family access through major inference platforms. These vendors and platforms matter because they anchor buyer expectations for what “good enough” inference should cost, especially for classification, extraction, coding assistance, and retrieval-augmented generation where absolute frontier performance is not always required.
DeepSeek’s public API pricing is the clearest example of aggressive undercutting. Mistral’s pricing page also keeps several models well below premium-frontier rates. For Llama, pricing depends on where it is served. AWS Bedrock and Together both publish model-specific prices for Meta’s Llama family, but rates vary by model version and provider, so the comparison below uses representative publicly listed endpoints rather than implying a single universal Llama price.
Pros
- DeepSeek and smaller Mistral models materially lower the cost floor
- Open-weight access gives buyers more provider choice
- Some Llama endpoints are competitive enough for production copilots and extraction
Cons
- Provider-specific pricing makes apples-to-apples comparisons harder
- Quality, latency, and tool-use support vary more than list prices suggest
- Enterprise buyers may still pay a premium for governance, SLAs, or regional controls
📌 Why provider matters for open models. Open-weight models do not have one canonical API price. The same Llama family can cost different amounts on AWS Bedrock, Together, or other inference providers because hosting, margin, and hardware choices differ.
| Provider | Model | Input / 1M tokens | Output / 1M tokens | Notes | Source |
|---|---|---|---|---|---|
| DeepSeek | DeepSeek-V3 | $0.14 | $0.28 | Public API pricing page | api-docs.deepseek.com/quick_start/pricing |
| DeepSeek | DeepSeek-R1 | $0.55 | $2.19 | Public API pricing page | api-docs.deepseek.com/quick_start/pricing |
| Mistral | Mistral Large | $2 | $6 | Public pricing page | mistral.ai/technology/#pricing |
| Mistral | Mistral Small | $0.20 | $0.60 | Public pricing page | mistral.ai/technology/#pricing |
| Together | Meta Llama 3.1 70B Instruct Turbo | $0.88 | $0.88 | Together serverless pricing | together.ai/pricing |
| AWS Bedrock | Meta Llama 3.1 70B Instruct | $2.65 | $3.50 | Bedrock on-demand pricing | aws.amazon.com/bedrock/pricing/ |
Chart 3: cached tokens and batch APIs are now first-order pricing levers
List prices alone no longer tell the whole story. Anthropic and OpenAI both publish cached input token pricing and Batch API discounts, which can dramatically reduce costs for repeated prompts, long system instructions, and asynchronous workloads. This matters for agent frameworks that reuse the same tool schema, policy prompt, or codebase context across many requests.
Anthropic’s pricing page lists cached input at one-tenth of standard input pricing for the Claude models shown above, and notes a 50% discount for its Batch API. OpenAI’s API pricing page follows the same broad pattern for GPT-5, GPT-5 mini, and GPT-5 nano. Google’s Gemini pricing page presents discounts and pricing structures differently, so readers should check the current page directly for the exact mechanics that apply to their selected Gemini endpoint.
def token_cost(input_tokens, output_tokens, input_per_million, output_per_million, cached_input_tokens=0, cached_input_per_million=None):
billable_input = max(input_tokens - cached_input_tokens, 0)
cost = (billable_input / 1_000_000) * input_per_million
cost += (output_tokens / 1_000_000) * output_per_million
if cached_input_tokens and cached_input_per_million is not None:
cost += (cached_input_tokens / 1_000_000) * cached_input_per_million
return round(cost, 6)
# Example: 200k input, 50k output, 150k cached input on GPT-5 mini
print(token_cost(
input_tokens=200_000,
output_tokens=50_000,
input_per_million=0.25,
output_per_million=2.0,
cached_input_tokens=150_000,
cached_input_per_million=0.025,
))
“For repeated prompts, cached-input pricing can matter more than the headline list price. Teams that ignore caching are often comparing the wrong numbers.”
alatirok analysis
| Vendor | Model | Standard input / 1M | Cached input / 1M | Cache discount | Batch pricing |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.1 | $15 | $1.50 | 90% lower than standard input | 50% off standard API prices |
| Anthropic | Claude Sonnet 4 | $3 | $0.30 | 90% lower than standard input | 50% off standard API prices |
| Anthropic | Claude Haiku 3.5 | $0.80 | $0.08 | 90% lower than standard input | 50% off standard API prices |
| OpenAI | GPT-5 | $1.25 | $0.125 | 90% lower than standard input | 50% off standard API prices |
| OpenAI | GPT-5 mini | $0.25 | $0.025 | 90% lower than standard input | 50% off standard API prices |
| OpenAI | GPT-5 nano | $0.05 | $0.005 | 90% lower than standard input | 50% off standard API prices |
Chart 4: context windows change the economics, not just the UX
Bigger context is not automatically better economics
Context window size is often discussed as a product feature, but it is also a pricing variable. A larger context window makes it possible to stuff more documents, code, or conversation history into a single request, yet it also increases the chance that teams will accidentally pay for oversized prompts. The right comparison is not simply “who has the biggest window,” but “what does it cost to use that window responsibly.”
Anthropic has made long-context positioning central to its Claude family, and the company’s developer materials discuss extended context capabilities in detail. Google’s Gemini family also emphasizes long-context use cases on its developer site. OpenAI’s API docs and pricing pages similarly describe model limits and pricing, but buyers should verify the exact context window for the specific endpoint they intend to deploy because limits and availability can change by model version and API surface.
📌 Context caution. A larger context window does not mean every token should be sent every time. Retrieval, summarization, and caching usually beat brute-force prompt stuffing on cost.
| Vendor | Model family | Publicly emphasized context positioning | Why it matters for cost | Source |
|---|---|---|---|---|
| Anthropic | Claude | Long-context and extended-context workflows highlighted | Large prompts can raise input spend quickly unless cached or trimmed | docs.anthropic.com and anthropic.com/pricing |
| OpenAI | GPT-5 family | Model-specific limits documented in API docs | Smaller models may offer better economics for large-volume prompt traffic | openai.com/api/pricing and platform.openai.com/docs |
| Gemini 2.5 | Long-context use cases highlighted on developer site | Flash can be materially cheaper for high-volume context-heavy applications | ai.google.dev/pricing | |
| Mistral / DeepSeek / Llama providers | Varies by endpoint | Provider-specific limits and deployment choices differ | Context economics depend heavily on endpoint and hosting provider | vendor pricing and docs pages |
Breakdown: what the data means for buyers in 2026
Three conclusions stand out from the pricing data. First, the frontier premium is now selective. Buyers no longer have to pay top-tier rates for every workload. They can reserve premium models for planning, coding, and high-stakes reasoning while routing routine extraction, summarization, and classification to cheaper endpoints. That routing strategy is now visible in the public price gaps between Claude Opus, GPT-5, Gemini 2.5 Pro, and lower-cost variants like GPT-5 nano or Gemini 2.5 Flash.
Second, output pricing is the hidden budget killer. Teams often benchmark on input-heavy test prompts, then discover in production that long outputs, tool calls, and verbose JSON responses dominate spend. Anthropic’s premium output pricing makes that especially important for Claude Opus deployments, while OpenAI and Google’s lower-cost variants create a stronger case for tiered routing. This is one reason the market increasingly looks like cloud compute: premium capacity for critical paths, cheaper capacity for everything else.
Third, infrastructure tactics now matter almost as much as model choice. Cached-input pricing, batch APIs, prompt reuse, and retrieval discipline can move effective cost per task far more than a naive list-price comparison suggests. For teams building agents, the cheapest architecture is rarely “pick the cheapest model.” It is usually “pick the cheapest model that clears the quality bar, then minimize repeated uncached tokens and unnecessary output.”
The broader implication is that the 2026 API price war is not producing one winner. It is producing a more segmented market. Anthropic can still command a premium where buyers want its strongest models and long-context positioning. OpenAI is covering more of the curve with GPT-5, mini, and nano. Google is using Gemini 2.5 Flash to stay highly competitive on throughput economics. Challengers like DeepSeek and Mistral keep forcing the floor lower, while open-model providers ensure that no vendor gets to define the market’s “normal” price alone.
📌 Bottom line. In 2026, the right API pricing question is no longer “Which model is cheapest?” It is “Which model mix gives the lowest cost per successful task?”
“The price war is real, but it is not flattening the market. It is segmenting it.”
alatirok
Frequently asked questions
What is the best way to compare LLM API pricing across vendors?
Compare at least four variables: input token price, output token price, cached-input discounts, and batch pricing. Vendor list prices are published at Anthropic pricing, OpenAI API pricing, and Google AI pricing. For many agent workloads, output tokens and repeated uncached prompts matter more than the headline input rate.
Are cached tokens and batch APIs worth using?
Usually yes. Anthropic and OpenAI both publish discounted cached-input pricing and 50% Batch API discounts on their pricing pages: anthropic.com/pricing and openai.com/api/pricing. If your application reuses long system prompts, tool schemas, or codebase context, caching can materially reduce effective cost per request.
Why does Llama pricing vary so much?
Because Llama is offered through multiple providers rather than one canonical API. AWS Bedrock publishes its own model pricing at aws.amazon.com/bedrock/pricing/, while Together publishes serverless pricing at together.ai/pricing. The same model family can cost different amounts depending on hosting, hardware, and provider margin.
How often do LLM API prices change?
Primary sources
- Anthropic pricing — Anthropic
- OpenAI API pricing — OpenAI
- Google AI pricing — Google
- DeepSeek API pricing — DeepSeek
- Mistral pricing — Mistral AI
- AWS Bedrock pricing — Amazon Web Services
- Together pricing — Together AI
- Anthropic docs — Anthropic
- OpenAI platform docs — OpenAI
Last updated: May 20, 2026. Related: Agent Infrastructure.