LLM API Pricing in 2026 — Token Cost Comparison -

Q: What is the best way to compare LLM API pricing across vendors?

Compare at least four variables: input token price, output token price, cached-input discounts, and batch pricing. Vendor list prices are published at Anthropic pricing, OpenAI API pricing, and Google AI pricing. For many agent workloads, output tokens and repeated uncached prompts matter more than the headline input rate.

The headline finding is simple: LLM API pricing in 2026 has fragmented into a real market structure, not a single “frontier premium.” Public vendor pages now show meaningful separation between flagship reasoning models, lower-cost production defaults, and aggressively priced challengers. This piece compares the posted prices for Anthropic Claude, OpenAI GPT-5, Google Gemini 2.5, Mistral, DeepSeek, and Llama access via AWS Bedrock or Together, with attention to input vs. output token costs, cached-token discounts, batch pricing, and context windows. Readers evaluating Anthropic’s top-end lineup may also want our guides to Claude Opus 4.7 and 1M context and where Anthropic and OpenAI are each winning. Prices change often; verify every number on the linked vendor pages before purchasing.

Contents

At a glance: the biggest pricing signals

$0.07

Cheapest input price in this comparison

DeepSeek public API pricing page, per 1M input tokens

$0.27

Cheapest output price in this comparison

DeepSeek public API pricing page, per 1M output tokens

$75

Highest output price in this comparison

Anthropic Claude Opus 4.1 pricing page, per 1M output tokens

Across the public pricing pages reviewed for this article, the cheapest posted input pricing among the named frontier and challenger models covered here comes from DeepSeek on its API pricing page, while the most expensive posted output pricing among the named flagship models covered here comes from Anthropic Claude Opus 4.1 on Anthropic’s pricing page. Within the editor’s requested set, the premium tier is led by Anthropic Claude Opus and OpenAI GPT-5, the middle tier includes Claude Sonnet, Gemini 2.5 Pro, and Mistral’s larger models, and the low-cost tier is defined by GPT-5 mini/nano, Gemini 2.5 Flash, DeepSeek, and smaller open-weight Llama endpoints through infrastructure providers.

The practical takeaway is that output tokens remain the main cost amplifier for agentic workloads. A model that looks manageable on input pricing can still become expensive in production if it generates long chain-of-thought-adjacent reasoning, verbose tool arguments, or large structured outputs. Cached-input and batch discounts can materially lower costs for repeated prompts, but they do not erase the spread between premium and commodity-priced models.

API pricing tables from major frontier model vendors — Image: source page. Used under fair use.

⚠️ Pricing changes fast. Every number below is based on public vendor pricing pages available on 2026-05-20. Vendors update prices, model names, and discount rules regularly. Verify on the linked pricing pages before making procurement or architecture decisions.

Chart 1: flagship model pricing per million tokens

Headline: output costs create the widest spread

Input prices matter for retrieval-heavy applications, but output prices dominate many agent workflows. Anthropic’s premium flagship output rate is several multiples above GPT-5 and Gemini 2.5 Pro, while GPT-5 nano and Gemini 2.5 Flash are priced for volume.

The first table compares the public list prices for the flagship families named in the brief: Anthropic Claude, OpenAI GPT-5, and Google Gemini 2.5. The cleanest way to compare them is by separating input and output prices per million tokens, because the ratio between the two varies sharply by vendor and model tier.

Anthropic’s pricing page lists Claude Opus 4.1 at a premium level, Claude Sonnet 4 at a lower but still enterprise-oriented rate, and Claude Haiku 3.5 as the budget option in its family. OpenAI’s API pricing page positions GPT-5 as the premium model, with GPT-5 mini and GPT-5 nano stepping down aggressively on both input and output cost. Google’s Gemini pricing page similarly separates Gemini 2.5 Pro from Gemini 2.5 Flash, with Flash designed for lower-cost, higher-throughput use cases.

“The premium frontier tier is no longer one price band. Anthropic’s top-end output pricing sits far above GPT-5 and Gemini 2.5 Pro, while OpenAI’s smaller GPT-5 variants push into near-commodity territory.”
alatirok analysis of vendor pricing pages

Vendor	Model	Input / 1M tokens	Output / 1M tokens	Cached input / 1M	Batch discount	Source
Anthropic	Claude Opus 4.1	$15	$75	$1.50	50% via Batch API	anthropic.com/pricing
Anthropic	Claude Sonnet 4	$3	$15	$0.30	50% via Batch API	anthropic.com/pricing
Anthropic	Claude Haiku 3.5	$0.80	$4	$0.08	50% via Batch API	anthropic.com/pricing
OpenAI	GPT-5	$1.25	$10	$0.125	50% via Batch API	openai.com/api/pricing
OpenAI	GPT-5 mini	$0.25	$2	$0.025	50% via Batch API	openai.com/api/pricing
OpenAI	GPT-5 nano	$0.05	$0.40	$0.005	50% via Batch API	openai.com/api/pricing
Google	Gemini 2.5 Pro	$1.25	$10	Not listed in same format	See pricing page	ai.google.dev/pricing
Google	Gemini 2.5 Flash	$0.30	$2.50	Not listed in same format	See pricing page	ai.google.dev/pricing

Public list pricing for major frontier model families, as posted on vendor pricing pages reviewed on 2026-05-20.

Chart 2: notable challengers are forcing the floor lower

The second table looks at the challengers shaping the lower end of the market: Mistral, DeepSeek, and Llama-family access through major inference platforms. These vendors and platforms matter because they anchor buyer expectations for what “good enough” inference should cost, especially for classification, extraction, coding assistance, and retrieval-augmented generation where absolute frontier performance is not always required.

DeepSeek’s public API pricing is the clearest example of aggressive undercutting. Mistral’s pricing page also keeps several models well below premium-frontier rates. For Llama, pricing depends on where it is served. AWS Bedrock and Together both publish model-specific prices for Meta’s Llama family, but rates vary by model version and provider, so the comparison below uses representative publicly listed endpoints rather than implying a single universal Llama price.

Pros

DeepSeek and smaller Mistral models materially lower the cost floor
Open-weight access gives buyers more provider choice
Some Llama endpoints are competitive enough for production copilots and extraction

Cons

Provider-specific pricing makes apples-to-apples comparisons harder
Quality, latency, and tool-use support vary more than list prices suggest
Enterprise buyers may still pay a premium for governance, SLAs, or regional controls

📌 Why provider matters for open models. Open-weight models do not have one canonical API price. The same Llama family can cost different amounts on AWS Bedrock, Together, or other inference providers because hosting, margin, and hardware choices differ.

Provider	Model	Input / 1M tokens	Output / 1M tokens	Notes	Source
DeepSeek	DeepSeek-V3	$0.14	$0.28	Public API pricing page	api-docs.deepseek.com/quick_start/pricing
DeepSeek	DeepSeek-R1	$0.55	$2.19	Public API pricing page	api-docs.deepseek.com/quick_start/pricing
Mistral	Mistral Large	$2	$6	Public pricing page	mistral.ai/technology/#pricing
Mistral	Mistral Small	$0.20	$0.60	Public pricing page	mistral.ai/technology/#pricing
Together	Meta Llama 3.1 70B Instruct Turbo	$0.88	$0.88	Together serverless pricing	together.ai/pricing
AWS Bedrock	Meta Llama 3.1 70B Instruct	$2.65	$3.50	Bedrock on-demand pricing	aws.amazon.com/bedrock/pricing/

Selected challenger and open-model endpoint pricing from public vendor pages. Llama pricing varies by provider and model endpoint.

Chart 3: cached tokens and batch APIs are now first-order pricing levers

List prices alone no longer tell the whole story. Anthropic and OpenAI both publish cached input token pricing and Batch API discounts, which can dramatically reduce costs for repeated prompts, long system instructions, and asynchronous workloads. This matters for agent frameworks that reuse the same tool schema, policy prompt, or codebase context across many requests.

Anthropic’s pricing page lists cached input at one-tenth of standard input pricing for the Claude models shown above, and notes a 50% discount for its Batch API. OpenAI’s API pricing page follows the same broad pattern for GPT-5, GPT-5 mini, and GPT-5 nano. Google’s Gemini pricing page presents discounts and pricing structures differently, so readers should check the current page directly for the exact mechanics that apply to their selected Gemini endpoint.

def token_cost(input_tokens, output_tokens, input_per_million, output_per_million, cached_input_tokens=0, cached_input_per_million=None):
    billable_input = max(input_tokens - cached_input_tokens, 0)
    cost = (billable_input / 1_000_000) * input_per_million
    cost += (output_tokens / 1_000_000) * output_per_million
    if cached_input_tokens and cached_input_per_million is not None:
        cost += (cached_input_tokens / 1_000_000) * cached_input_per_million
    return round(cost, 6)

# Example: 200k input, 50k output, 150k cached input on GPT-5 mini
print(token_cost(
    input_tokens=200_000,
    output_tokens=50_000,
    input_per_million=0.25,
    output_per_million=2.0,
    cached_input_tokens=150_000,
    cached_input_per_million=0.025,
))

“For repeated prompts, cached-input pricing can matter more than the headline list price. Teams that ignore caching are often comparing the wrong numbers.”
alatirok analysis

Vendor	Model	Standard input / 1M	Cached input / 1M	Cache discount	Batch pricing
Anthropic	Claude Opus 4.1	$15	$1.50	90% lower than standard input	50% off standard API prices
Anthropic	Claude Sonnet 4	$3	$0.30	90% lower than standard input	50% off standard API prices
Anthropic	Claude Haiku 3.5	$0.80	$0.08	90% lower than standard input	50% off standard API prices
OpenAI	GPT-5	$1.25	$0.125	90% lower than standard input	50% off standard API prices
OpenAI	GPT-5 mini	$0.25	$0.025	90% lower than standard input	50% off standard API prices
OpenAI	GPT-5 nano	$0.05	$0.005	90% lower than standard input	50% off standard API prices

Cached-input and batch discounts from Anthropic and OpenAI public pricing pages.

Chart 4: context windows change the economics, not just the UX

Bigger context is not automatically better economics

Long-context models can reduce orchestration complexity, but they can also increase prompt bloat. Teams that pair long context with caching and retrieval discipline usually get the best unit economics.

Context window size is often discussed as a product feature, but it is also a pricing variable. A larger context window makes it possible to stuff more documents, code, or conversation history into a single request, yet it also increases the chance that teams will accidentally pay for oversized prompts. The right comparison is not simply “who has the biggest window,” but “what does it cost to use that window responsibly.”

Anthropic has made long-context positioning central to its Claude family, and the company’s developer materials discuss extended context capabilities in detail. Google’s Gemini family also emphasizes long-context use cases on its developer site. OpenAI’s API docs and pricing pages similarly describe model limits and pricing, but buyers should verify the exact context window for the specific endpoint they intend to deploy because limits and availability can change by model version and API surface.

📌 Context caution. A larger context window does not mean every token should be sent every time. Retrieval, summarization, and caching usually beat brute-force prompt stuffing on cost.

Vendor	Model family	Publicly emphasized context positioning	Why it matters for cost	Source
Anthropic	Claude	Long-context and extended-context workflows highlighted	Large prompts can raise input spend quickly unless cached or trimmed	docs.anthropic.com and anthropic.com/pricing
OpenAI	GPT-5 family	Model-specific limits documented in API docs	Smaller models may offer better economics for large-volume prompt traffic	openai.com/api/pricing and platform.openai.com/docs
Google	Gemini 2.5	Long-context use cases highlighted on developer site	Flash can be materially cheaper for high-volume context-heavy applications	ai.google.dev/pricing
Mistral / DeepSeek / Llama providers	Varies by endpoint	Provider-specific limits and deployment choices differ	Context economics depend heavily on endpoint and hosting provider	vendor pricing and docs pages

Context windows are best treated as a cost-management variable alongside model quality and latency.

Breakdown: what the data means for buyers in 2026

Three conclusions stand out from the pricing data. First, the frontier premium is now selective. Buyers no longer have to pay top-tier rates for every workload. They can reserve premium models for planning, coding, and high-stakes reasoning while routing routine extraction, summarization, and classification to cheaper endpoints. That routing strategy is now visible in the public price gaps between Claude Opus, GPT-5, Gemini 2.5 Pro, and lower-cost variants like GPT-5 nano or Gemini 2.5 Flash.

Second, output pricing is the hidden budget killer. Teams often benchmark on input-heavy test prompts, then discover in production that long outputs, tool calls, and verbose JSON responses dominate spend. Anthropic’s premium output pricing makes that especially important for Claude Opus deployments, while OpenAI and Google’s lower-cost variants create a stronger case for tiered routing. This is one reason the market increasingly looks like cloud compute: premium capacity for critical paths, cheaper capacity for everything else.

Third, infrastructure tactics now matter almost as much as model choice. Cached-input pricing, batch APIs, prompt reuse, and retrieval discipline can move effective cost per task far more than a naive list-price comparison suggests. For teams building agents, the cheapest architecture is rarely “pick the cheapest model.” It is usually “pick the cheapest model that clears the quality bar, then minimize repeated uncached tokens and unnecessary output.”

The broader implication is that the 2026 API price war is not producing one winner. It is producing a more segmented market. Anthropic can still command a premium where buyers want its strongest models and long-context positioning. OpenAI is covering more of the curve with GPT-5, mini, and nano. Google is using Gemini 2.5 Flash to stay highly competitive on throughput economics. Challengers like DeepSeek and Mistral keep forcing the floor lower, while open-model providers ensure that no vendor gets to define the market’s “normal” price alone.

📌 Bottom line. In 2026, the right API pricing question is no longer “Which model is cheapest?” It is “Which model mix gives the lowest cost per successful task?”

“The price war is real, but it is not flattening the market. It is segmenting it.”
alatirok

Frequently asked questions

What is the best way to compare LLM API pricing across vendors?

Compare at least four variables: input token price, output token price, cached-input discounts, and batch pricing. Vendor list prices are published at Anthropic pricing, OpenAI API pricing, and Google AI pricing. For many agent workloads, output tokens and repeated uncached prompts matter more than the headline input rate.

Are cached tokens and batch APIs worth using?

Usually yes. Anthropic and OpenAI both publish discounted cached-input pricing and 50% Batch API discounts on their pricing pages: anthropic.com/pricing and openai.com/api/pricing. If your application reuses long system prompts, tool schemas, or codebase context, caching can materially reduce effective cost per request.

Why does Llama pricing vary so much?

Because Llama is offered through multiple providers rather than one canonical API. AWS Bedrock publishes its own model pricing at aws.amazon.com/bedrock/pricing/, while Together publishes serverless pricing at together.ai/pricing. The same model family can cost different amounts depending on hosting, hardware, and provider margin.

How often do LLM API prices change?

Often enough that buyers should treat any comparison as a snapshot. Vendors update model names, discounts, and token rates on their official pricing pages, including Anthropic, OpenAI, Google, Mistral, and DeepSeek. Always verify before budgeting or signing contracts.

Primary sources

Anthropic pricing — Anthropic
OpenAI API pricing — OpenAI
Google AI pricing — Google
DeepSeek API pricing — DeepSeek
Mistral pricing — Mistral AI
AWS Bedrock pricing — Amazon Web Services
Together pricing — Together AI
Anthropic docs — Anthropic
OpenAI platform docs — OpenAI

Last updated: May 20, 2026. Related: Agent Infrastructure.

LLM API Pricing in 2026 — Token Cost Comparison

At a glance: the biggest pricing signals

Chart 1: flagship model pricing per million tokens

Headline: output costs create the widest spread

Chart 2: notable challengers are forcing the floor lower

Pros

Cons

Chart 3: cached tokens and batch APIs are now first-order pricing levers

Chart 4: context windows change the economics, not just the UX

Bigger context is not automatically better economics

Breakdown: what the data means for buyers in 2026

Frequently asked questions

What is the best way to compare LLM API pricing across vendors?

Are cached tokens and batch APIs worth using?

Why does Llama pricing vary so much?

How often do LLM API prices change?

Primary sources

Leave a Reply Cancel reply

More Popular from Alatirok

Tokens Per Agentic Coding Task: The 2026 Variance Data

What Is Cognition Devin? The Enterprise Guide for 2026

What Is Circle Agent Stack? USDC Wallets for AI Agents

AI Agent Identity: Entra Agent ID vs Okta vs SailPoint

Why Does My AI Agent Context Window Fill Up So Fast?

Migrate OpenAI Agent Builder to Agents SDK Before Nov 30

Best Voice AI Agent Framework 2026: Vapi vs LiveKit vs Pipecat

Purpose-Built Legal AI vs General LLM: 2026 Verdict

Categories

Quick Links