AI Voice Agent Cost Per Minute 2026: The Real All-In Math

Every vendor publishes a base rate and stops there. We stacked orchestration, STT, LLM tokens, TTS, and telephony into one neutral per-minute number per stack — with a worked 3-minute call and a sensitivity table.

What is the real AI voice agent cost per minute in 2026?

The real AI voice agent cost per minute in 2026 is $0.12–$0.20 for a typical production stack, not the $0.05–$0.07 base rate vendors advertise. The advertised number is only the orchestration layer — once you add speech-to-text, LLM tokens, text-to-speech, and telephony, the loaded cost is roughly 2.5–3x the sticker.

Here is the problem with every pricing page you will read. Retell ranks its own pricing post at #1; Famulor, Ringlyn, and Klariqo all sell agents and quote the layers that flatter their own bundle. Each one shows a base orchestration rate and lets you assume that is the bill. It is not. The base rate is the floor of a five-part stack, and the four parts they leave out are the ones that actually move the invoice.

Alatirok sells no voice product. We have nothing to upsell and no base rate to defend, so this piece does the thing none of the ranking pages do: it sums every layer — orchestration plus STT plus LLM plus TTS plus telephony — into one neutral all-in number per stack, shows the math on a real 3-minute call, and then stress-tests it for silence and concurrency. If you are comparing voice-agent stacks and want the blended price a finance team would actually sign off on, this is the decomposition.

Across the 2026 market the all-in cost per minute spans roughly $0.07 at the aggressive self-serve end to about $0.35 at premium enterprise — a 5x spread (Famulor, 2026). Where you land inside that band is decided less by which orchestrator you pick and more by which LLM and which TTS voice you wire behind it.

Figure: Stacked horizontal bar chart comparing the all-in per-minute cost of Vapi, Retell, and ElevenLabs voice agent stacks, broken into orchestration, STT, LLM, TTS, and telephony segments.

The five layers behind every voice agent per-minute price

Every AI voice agent cost per minute breaks into five stacked layers: orchestration, speech-to-text (STT), the LLM, text-to-speech (TTS), and telephony. Advertised rates almost always show only the orchestration layer; the other four are billed separately or passed through at provider cost.

Think of it as a stack of fees that all run against the same connected minute. The orchestration platform (Vapi, Retell) charges to manage turn-taking, latency, barge-in, and call state. STT transcribes the caller. The LLM generates the reply, billed by token but expressible per minute. TTS speaks the reply, billed by character. Telephony carries the call over the PSTN, billed per leg.

Bring-your-own-key platforms advertise ‘model costs at $0 if you bring your own API key’, but that only zeroes the platform’s markup, not the underlying provider bill. You still pay OpenAI, Deepgram, and ElevenLabs directly. ‘At cost’ helps only if your negotiated cost is actually good.

Here are the real 2026 component costs we will use throughout, drawn from the vendors’ own pages and a neutral calculator:

Layer	What it does	Representative 2026 cost / min	Source
Orchestration	Turn-taking, latency, barge-in, call state	$0.05 (Vapi) / $0.055 (Retell base)	Vapi, Retell AI
STT (speech-to-text)	Transcribes the caller in real time	~$0.004–$0.008 (Deepgram Nova-3)	Deepgram
LLM	Generates the agent’s reply (token-based)	~$0.01–$0.04 (GPT-4o); $0.003 nano → $0.16 frontier	Retell AI, Softcery
TTS (text-to-speech)	Speaks the reply (character-based)	~$0.03–$0.05 (ElevenLabs, Flash/Turbo through premium voices); from ~$0.015 for platform voices	ElevenLabs, Retell AI
Telephony	Carries the call over the PSTN	~$0.0085 inbound / $0.014 outbound (Twilio US)	Twilio

The five per-minute layers and their real 2026 cost ranges (sources: Vapi, Retell AI, Deepgram, ElevenLabs, Twilio, Softcery).

Vapi cost per minute: stacking the real number

Vapi cost per minute starts at a $0.05/min orchestration fee, but a realistic GPT-4o + Deepgram + ElevenLabs + Twilio stack lands at roughly $0.08–$0.15/min all-in. The $0.05 is hosting and orchestration only; Vapi passes STT, LLM, and TTS through ‘at cost’ (Vapi, 2026).

Vapi is the clearest example of the gap between sticker and bill because its pricing page is honest about being incomplete: it lists calls at $0.05/min and explicitly states model provider cost (STT, LLM, TTS) is billed ‘at cost — $0 if you bring your own API key.’ That last clause is the trap. Bringing your own key removes Vapi’s markup, not the provider’s invoice.

Here is the worked mid-point for a standard outbound stack: $0.05 orchestration + ~$0.008 Deepgram Nova-3 streaming STT + ~$0.02 GPT-4o LLM + ~$0.03 ElevenLabs Flash TTS + $0.014 Twilio outbound = about $0.122/min. Swap to a cheaper TTS voice and a GPT-4o-mini-class model and you can press toward the $0.08 floor; swap to a frontier model and a premium voice and you blow past $0.15.
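The mid-point above is nothing more than a sum of five per-minute line items. A minimal Python sketch of that arithmetic, using only the figures quoted in this section; the dictionary keys and the `all_in_per_minute` helper are illustrative names, not part of any vendor API:

```python
# Per-minute layer costs (USD) for the mid-point Vapi outbound stack,
# copied from the worked example in the text.
VAPI_MIDPOINT = {
    "orchestration": 0.05,   # Vapi platform fee
    "stt": 0.008,            # Deepgram Nova-3 streaming
    "llm": 0.02,             # GPT-4o, expressed per minute
    "tts": 0.03,             # ElevenLabs Flash
    "telephony": 0.014,      # Twilio US outbound
}


def all_in_per_minute(layers: dict[str, float]) -> float:
    """Sum every layer into one loaded per-minute rate (hypothetical helper)."""
    return round(sum(layers.values()), 4)


if __name__ == "__main__":
    total = all_in_per_minute(VAPI_MIDPOINT)
    print(f"All-in: ${total:.3f}/min")
```

Swapping any single value in the dictionary (a cheaper voice, a frontier model) is enough to see how far the total moves while the orchestration line stays fixed.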

The lesson: with Vapi, the platform fee is the most predictable line on the bill. Your variance lives entirely in the model and voice choices you make above it.

Retell AI pricing per minute: where the bundle hides

Retell AI pricing per minute is $0.055/min for the core voice infrastructure, building to roughly $0.11–$0.15/min once you add an LLM, a premium TTS voice, and telephony. Retell’s own page itemizes TTS at $0.015–$0.040/min and LLM from $0.003 (nano) to $0.16 (frontier) (Retell AI, 2026).

Retell’s base voice infrastructure is $0.055/min, and the company’s pricing page is more granular than most — it breaks out TTS separately ($0.015/min for platform/Cartesia/Fish voices up to $0.040/min for ElevenLabs), the LLM as a pass-through, and telephony by country. It also stacks optional per-minute add-ons that quietly inflate the bill: knowledge base (+$0.005), advanced denoising (+$0.005), safety guardrails (+$0.005), and PII removal (+$0.01).

Worked mid-point for a comparable stack: $0.055 base + ~$0.02 GPT-4o LLM + ~$0.04 ElevenLabs TTS + $0.014 Twilio = about $0.129/min. Note that Retell’s base rate already absorbs STT, which is why you do not see a separate STT line — but that also means you cannot shop STT independently the way you can on a pure-passthrough platform.
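The same sum can be extended with Retell's optional add-ons to see how quickly they stack. This is a sketch under the figures quoted in this section; `retell_per_minute` and the add-on keys are made-up names for illustration, not Retell API identifiers:

```python
# Retell per-minute figures quoted in the text (USD).
RETELL_BASE = 0.055       # voice infrastructure; STT is absorbed here
LLM_GPT4O = 0.02          # mid-point LLM estimate
TTS_ELEVENLABS = 0.040    # top of Retell's itemized TTS range
TWILIO_OUTBOUND = 0.014

# Optional per-minute add-ons as listed in the text.
ADD_ONS = {
    "knowledge_base": 0.005,
    "advanced_denoising": 0.005,
    "safety_guardrails": 0.005,
    "pii_removal": 0.01,
}


def retell_per_minute(enabled_add_ons: tuple[str, ...] = ()) -> float:
    """Loaded rate for the mid-point stack plus any enabled add-ons."""
    core = RETELL_BASE + LLM_GPT4O + TTS_ELEVENLABS + TWILIO_OUTBOUND
    extras = sum(ADD_ONS[name] for name in enabled_add_ons)
    return round(core + extras, 4)


if __name__ == "__main__":
    print(f"Mid-point: ${retell_per_minute():.3f}/min")
    print(f"All add-ons: ${retell_per_minute(tuple(ADD_ONS)):.3f}/min")
```

With all four add-ons enabled the four surcharges sum to $0.025/min, which is nearly half of the base rate again.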

Retell’s own blog quotes a realistic production range of $0.11–$0.15/min with all components included, and $0.13–$0.31/min once premium models and features stack — which matches our decomposition almost exactly (Retell AI, 2026).

ElevenLabs conversational AI cost: the bundled alternative

ElevenLabs conversational AI cost is effectively $0.08–$0.20/min, with model tiers at roughly $0.08 (standard), $0.10 (turbo), and $0.12 (premium) per minute before telephony (ElevenLabs / Softcery, 2026). It bundles STT, LLM, and ElevenLabs’ own TTS into one rate, so the per-minute number looks higher but hides fewer surprises.

ElevenLabs Agents (its Conversational AI product) is the inverse philosophy to Vapi. Instead of a thin orchestration fee plus four pass-through layers, it sells a single per-minute rate that already contains STT, the LLM, and ElevenLabs’ best-in-class TTS. You trade the ability to shop each layer for a number that is closer to all-in out of the box.

Worked mid-point: $0.12/min premium tier + $0.014 Twilio outbound = about $0.134/min, slightly above both the Vapi (~$0.122) and Retell (~$0.129) mid-points. The premium tier exists because TTS voice quality is ElevenLabs’ moat, and that quality is priced into every minute whether your script needs it or not.

The trade-off is real. If voice realism is the product — concierge, luxury, brand-sensitive lines — the bundle is worth it. If you are running high-volume routing where a synthetic-but-fine voice is acceptable, you are paying a premium-voice tax on every minute you may not need.

AI voice agent pricing comparison: one all-in number per stack

On a like-for-like GPT-4o + premium-TTS + US-telephony build, the all-in AI voice agent cost per minute is roughly $0.12 (Vapi), $0.13 (Retell), and $0.13 (ElevenLabs CAI) at the mid-point — far closer to each other than their advertised base rates ($0.05 vs $0.055 vs $0.12) suggest.

This is the chart no vendor publishes, because it erases the advantage their base rate is meant to imply. When you stack every layer, the orchestration fee — the number everyone competes on — turns out to be a minority of the bill. The LLM and TTS layers, which buyers treat as an afterthought, are where the money actually goes.

The stacked-bar view below is our original decomposition. Each bar is one stack, segmented by layer, summed to a single all-in number, with the market range band ($0.07 aggressive self-serve to $0.35 premium enterprise) drawn for context.

Pros

BYOK lets you shop each layer and exploit negotiated LLM/STT rates
Pure orchestration fees are predictable and easy to forecast
You can downgrade the LLM or TTS voice per use-case to cut cost
Bundled rates hide fewer line items — closer to all-in out of the box

Cons

BYOK’s true cost is opaque until you sum five separate invoices
‘At cost / $0 with your own key’ removes platform markup, not provider bills
Bundled rates bake in premium-voice cost even when you don’t need it
Per-feature add-ons (KB, PII, denoising) stack quietly on usage-based platforms

All-in voice agent cost per minute, 2026 — Mid-point all-in: Vapi ~$0.122, Retell ~$0.129, ElevenLabs CAI ~$0.134. The advertised base rates ($0.05 / $0.055 / $0.12) explain less than half the bill on the cheaper stacks.
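To check the claim that the base rate is a minority of the bill, each layer can be expressed as a share of the all-in total. A rough Python sketch using the two unbundled mid-point stacks from the worked examples; ElevenLabs is left out because its bundled rate cannot be split by layer from the figures given here, and `layer_shares` is a hypothetical helper name:

```python
# Per-minute layer costs (USD) for the two unbundled mid-point stacks.
# Retell has no separate STT entry because its base rate absorbs STT.
STACKS = {
    "Vapi": {"orchestration": 0.05, "stt": 0.008, "llm": 0.02,
             "tts": 0.03, "telephony": 0.014},
    "Retell": {"orchestration": 0.055, "llm": 0.02,
               "tts": 0.04, "telephony": 0.014},
}


def layer_shares(layers: dict[str, float]) -> dict[str, float]:
    """Return each layer's fraction of the all-in per-minute total."""
    total = sum(layers.values())
    return {name: cost / total for name, cost in layers.items()}


if __name__ == "__main__":
    for stack, layers in STACKS.items():
        shares = layer_shares(layers)
        model_and_voice = shares["llm"] + shares["tts"]
        print(f"{stack}: orchestration {shares['orchestration']:.0%}, "
              f"LLM + TTS {model_and_voice:.0%}")
```

On these inputs the orchestration share stays under half for both stacks, and the LLM and TTS layers together are at least as large as the platform fee.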

Worked example: what a 3-minute call actually costs

A typical 3-minute outbound call costs about $0.37 on a mid-point Vapi stack, $0.39 on Retell, and $0.40 on ElevenLabs Conversational AI, or roughly 12–13 cents per connected minute. At 10,000 minutes/month that is $1,220–$1,340, before per-feature add-ons and concurrency lines.

Multiply the mid-points out and the abstraction becomes a budget. The per-call delta between the cheapest and most expensive of the three stacks is under four cents, which is noise next to the LLM and TTS choices inside each one. This is exactly why base-rate shopping misleads: you optimize the 5-cent line and ignore the 5-cent-to-15-cent lines stacked on top of it.

The table below scales the mid-point all-in cost across common call volumes so you can sanity-check a vendor quote against the loaded math.

Volume	Vapi (~$0.122/min)	Retell (~$0.129/min)	ElevenLabs CAI (~$0.134/min)
3-minute call	$0.37	$0.39	$0.40
100 calls (3 min avg)	$36.60	$38.70	$40.20
1,000 min / mo	$122	$129	$134
10,000 min / mo	$1,220	$1,290	$1,340
100,000 min / mo	$12,200	$12,900	$13,400

Mid-point all-in cost scaled by volume (assumes ~$0.122–$0.134/min loaded; excludes phone-number rental, concurrency lines, and per-feature add-ons).
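The table is a straight multiplication of rate by connected minutes, so it is easy to regenerate for your own volumes or for a vendor quote you want to sanity-check. A minimal sketch; the `cost_for_minutes` helper and the dictionary names are illustrative:

```python
# Mid-point all-in rates (USD per connected minute) from the comparison.
RATES = {"Vapi": 0.122, "Retell": 0.129, "ElevenLabs CAI": 0.134}

# Volumes in connected minutes; "100 calls" assumes a 3-minute average.
VOLUMES = {
    "3-minute call": 3,
    "100 calls (3 min avg)": 300,
    "1,000 min / mo": 1_000,
    "10,000 min / mo": 10_000,
    "100,000 min / mo": 100_000,
}


def cost_for_minutes(rate_per_min: float, minutes: int) -> float:
    """Usage cost only: excludes number rental, concurrency lines, add-ons."""
    return round(rate_per_min * minutes, 2)


if __name__ == "__main__":
    for label, minutes in VOLUMES.items():
        row = ", ".join(f"{name} ${cost_for_minutes(rate, minutes):,.2f}"
                        for name, rate in RATES.items())
        print(f"{label}: {row}")
```

Replacing a value in `RATES` with a quoted rate shows at a glance whether the quote is a loaded number or only a base rate.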

Budget for $0.12–$0.20/min, then optimize the LLM and TTS, not the base rate

On a like-for-like 2026 build, Vapi (~$0.122), Retell (~$0.129), and ElevenLabs Conversational AI (~$0.134) land within roughly 1.2 cents per minute of each other all-in, even though their advertised base rates differ 2.4x. Choose the orchestrator for developer experience, control, and concurrency economics; choose your LLM and TTS voice deliberately, because that is where 40–60% of the loaded minute lives. Plan finance off the loaded minute (2.5–3x the sticker), watch silence and concurrency, and treat any vendor that quotes only a base rate as quoting you half the bill.

Sensitivity: how silence, barge-in, and concurrency change the bill

Because you pay for connected wall-clock time, silence and hold music cost the same per minute as productive speech, so a call with 30% dead air costs ~43% more per useful minute (the rate divided by the 70% that is productive). Concurrency lines and per-feature add-ons (knowledge base, PII redaction) can add $0.01–$0.025/min and a per-line monthly fee on top of usage.

Two variables that no base rate captures will swing your real number more than the vendor you choose. The first is talk-time efficiency. Voice agents bill the connected minute, not the productive minute, so long IVR menus, hold music, slow endpointing, and a caller who rambles all inflate cost without adding value. Tight endpointing and reliable barge-in (letting the caller interrupt) are cost levers, not just UX polish — they shorten the connected minute.
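The talk-time effect is a division rather than an addition: the connected-minute rate is spread over only the productive share of the call. A small sketch of that adjustment, where `cost_per_useful_minute` and `silence_fraction` are names chosen here for illustration:

```python
def cost_per_useful_minute(rate_per_connected_min: float,
                           silence_fraction: float) -> float:
    """Spread the connected-minute rate over the productive share of a call.

    silence_fraction is the share of connected time spent on dead air,
    hold music, or menus (0.30 means 30%).
    """
    if not 0.0 <= silence_fraction < 1.0:
        raise ValueError("silence_fraction must be in [0, 1)")
    return rate_per_connected_min / (1.0 - silence_fraction)


if __name__ == "__main__":
    nominal = 0.122  # Vapi mid-point, USD per connected minute
    effective = cost_per_useful_minute(nominal, 0.30)
    print(f"${effective:.3f} per useful minute "
          f"({effective / nominal - 1:.0%} above nominal)")
```

At 30% silence the $0.122 mid-point becomes about $0.174 per useful minute, roughly 43% above the nominal rate, which is why shaving dead air with tighter endpointing can be worth more than a cheaper orchestrator.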

The second is scale structure. Vapi includes 10 concurrent lines then charges per line; Retell includes 20 then charges monthly per line plus its per-minute add-ons. At 100,000 minutes/month those lines and add-ons can quietly add a four-to-five-figure layer that never appears in a per-minute comparison.

The sensitivity table shows how a nominal $0.122/min Vapi mid-point shifts under realistic conditions.

Scenario	Adjustment	Effective cost / connected min
Baseline (GPT-4o, Flash TTS)	—	$0.122
30% silence / hold per useful min	÷0.70 useful	~$0.174 per useful min
Downgrade to mini-class LLM	−$0.015 LLM	~$0.107
Premium frontier LLM	+$0.12 LLM	~$0.242
+ Knowledge base + PII redaction	+$0.015	~$0.137
Cheap synthetic TTS voice	−$0.025 TTS	~$0.097

Sensitivity of the Vapi mid-point ($0.122/min nominal) to silence, model, and feature choices.
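The model, voice, and feature rows in the table are additive adjustments to the baseline, so they can be combined to test a planned configuration. A sketch under that assumption; the adjustment keys and `adjusted_rate` are hypothetical names, and stacking several rows at once is our extrapolation rather than a figure from the table:

```python
BASELINE = 0.122  # Vapi mid-point, USD per connected minute

# Additive per-minute adjustments taken from the sensitivity table.
# mini_llm and frontier_llm are alternatives; do not apply both.
ADJUSTMENTS = {
    "mini_llm": -0.015,
    "frontier_llm": 0.12,
    "kb_plus_pii": 0.015,
    "cheap_tts": -0.025,
}


def adjusted_rate(*choices: str, baseline: float = BASELINE) -> float:
    """Apply the named adjustments to the baseline, rounded to 0.1 cent."""
    return round(baseline + sum(ADJUSTMENTS[c] for c in choices), 3)


if __name__ == "__main__":
    for name in ADJUSTMENTS:
        print(f"{name}: ${adjusted_rate(name):.3f}/min")
    # Combined scenario (an assumption, not a row in the table).
    print(f"mini_llm + cheap_tts: ${adjusted_rate('mini_llm', 'cheap_tts'):.3f}/min")
```

Combining the mini-class LLM and the cheap voice lands near $0.082/min on these inputs, close to the $0.08 floor quoted for Vapi earlier.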

Builder’s take

I have shipped voice into both Cyntr and Loomfeed, and the single most expensive mistake I see buyers make is budgeting off the headline number. Here is how I actually model it.

Budget off the loaded minute, not the base rate. A $0.05/min orchestration fee routinely lands at $0.12–$0.15 once STT, LLM, TTS, and telephony are added. Plan for 2.5–3x the sticker.
Your LLM and TTS choices move the bill more than your platform choice. Swapping a premium TTS voice for a Flash-tier voice can save more per minute than switching orchestrators entirely.
Silence is not free. You pay for connected wall-clock time, so a chatty IVR with long hold music or dead air burns the same per-minute rate as productive talk. Aggressive endpointing and barge-in are cost levers, not just UX.
Concurrency lines and per-feature add-ons (knowledge base, PII redaction, denoising) are the quiet four-to-five-figure surprise at 100,000 minutes a month, and they grow with scale. Price the whole bundle before you sign.
BYOK is a trap if you do not have negotiated model rates. ‘At cost’ only helps if your cost is good — otherwise the bundled platforms with volume discounts can be cheaper.

Frequently asked questions

What is the average AI voice agent cost per minute in 2026?

A typical production stack lands at $0.12–$0.20 per minute all-in. The full market spans about $0.07/min for aggressive self-serve builds to $0.35/min for premium enterprise — a roughly 5x spread (Famulor, 2026). The advertised base rate of $0.05–$0.07 covers only the orchestration layer, not the STT, LLM, TTS, and telephony stacked on top.

What is Vapi’s cost per minute really?

Vapi charges a $0.05/min orchestration fee, but a realistic GPT-4o + Deepgram STT + ElevenLabs TTS + Twilio telephony stack lands at about $0.08–$0.15/min all-in. Vapi passes model costs through ‘at cost’ — which removes its markup but not the underlying OpenAI, Deepgram, and ElevenLabs bills (Vapi, 2026).

How much does Retell AI cost per minute?

Retell AI’s base voice infrastructure is $0.055/min, building to roughly $0.11–$0.15/min once you add an LLM, a TTS voice, and telephony. Retell’s own page itemizes TTS at $0.015–$0.040/min, LLM from $0.003 (nano) to $0.16 (frontier), plus per-minute add-ons for knowledge base, denoising, and PII removal (Retell AI, 2026).

What does ElevenLabs Conversational AI cost per minute?

ElevenLabs Conversational AI (Agents) runs about $0.08/min standard, $0.10/min turbo, and $0.12/min premium, before telephony — effectively $0.08–$0.20/min all-in. Unlike Vapi or Retell, it bundles STT, the LLM, and ElevenLabs’ own premium TTS into one rate, so the number is higher but hides fewer line items (ElevenLabs / Softcery, 2026).

Why is the real cost higher than the advertised base rate?

Advertised base rates show only the orchestration layer. The four layers they exclude — STT (~$0.004–$0.008), LLM (~$0.01–$0.04), TTS (~$0.03–$0.05), and telephony (~$0.014) — typically add up to more than the base rate itself, pushing the loaded cost to roughly 2.5–3x the sticker. That is why a $0.05/min platform commonly bills $0.12–$0.15/min.

How do I lower my voice agent cost per minute?

Three levers, in order of impact: downgrade the LLM to a mini-class model where quality allows (saves ~$0.01–$0.04/min); pick a cheaper TTS voice when premium realism isn’t required (saves ~$0.02–$0.03/min); and shorten connected time with tight endpointing and barge-in so you pay for fewer silent or hold minutes. Switching orchestrators usually saves the least, because the base fee is the smallest variable layer.

Primary sources

Vapi Pricing (orchestration $0.05/min, BYOK at-cost) — Vapi
Retell AI Pricing ($0.055/min voice infra + components) — Retell AI
AI Voice Agent Pricing 2026: What 10 Platforms Actually Cost Per Minute — Famulor
AI Voice Agent Cost Calculator 2026 (component breakdown) — Softcery
Deepgram Pricing (Nova-3 STT per minute) — Deepgram
ElevenLabs Pricing (Conversational AI / Agents) — ElevenLabs
Programmable Voice Pricing in United States — Twilio
AI Voice Agent Pricing in 2026: Full Cost Breakdown & ROI — Retell AI

Last updated: June 2, 2026.
