Best LLM Gateway 2026: LiteLLM vs Portkey vs OpenRouter

Seven LLM gateways, ranked for engineers wiring multi-provider routing in 2026 — the real self-host-vs-SaaS tradeoff, latency overhead, fallback under outages, and a clear pick-when verdict.

Contents

What is the best LLM gateway in 2026?

The best LLM gateway in 2026 is LiteLLM if you self-host, OpenRouter if you want the fastest multi-provider access with no infrastructure, and Portkey if you need built-in guardrails and governance. Those three cover roughly 90% of teams, but the honest answer depends on one load-bearing decision you have to make before reading a single feature table: do you want to operate the proxy yourself, or pay someone to run it?

Everything else — model count, dashboards, prompt management — is downstream of that. A gateway sits on the hot path between your application and every model provider, normalizing them to one OpenAI-compatible API, handling retries and fallback when a provider 500s, and giving you one place to track spend and enforce policy. The problem with most “best LLM gateway” lists is that they’re written by gateway vendors who crown themselves, list features in a grid, and never tell you what the choice actually costs you in latency, lock-in, or operational toil.

This ranking is for the engineer wiring multi-provider routing right now and deciding self-host versus SaaS proxy. We rank seven real options on the four things that decide it in production: deployment model, latency overhead the proxy adds, fallback and retry behavior under a provider outage, and total cost. We also cover Cloudflare AI Gateway, Kong, and AWS Bedrock — three serious production options the vendor comparisons routinely omit because they don’t fit a tidy three-way fight.

Network routing diagram showing an LLM gateway proxy fanning requests out to multiple model providers with fallback paths — Image.

Do you want to run the proxy yourself? Yes, and you already have Postgres + Redis -> LiteLLM. No, you want zero infra and the most models fastest -> OpenRouter. You need PII redaction, jailbreak detection, and audit trails in the gateway layer -> Portkey. Everything else is a tiebreaker.

LLM gateway comparison table: LiteLLM vs Portkey vs OpenRouter and more

The fastest way to read the field: LiteLLM and Portkey’s gateway are open-source and self-hostable; OpenRouter and Cloudflare are SaaS-first; Kong and Bedrock are infrastructure you likely already run. The decision matrix below maps each gateway to deployment model, provider/model reach, pricing, built-in guardrails, and native fallback — plus the one situation each one wins.

A few numbers worth pinning down before you trust any grid. OpenRouter advertises 400+ models across 60+ providers at provider token prices plus a 5.5% fee on credit purchases, with 25+ free models and no monthly minimum. Portkey’s open-source gateway routes 1,600+ model variants with 50+ guardrails and went fully Apache 2.0 in March 2026; the managed control plane starts free and goes to $49/mo. Helicone’s free tier is 10K requests/month with Pro at $79/mo. These are the figures that move the decision, and they change — re-check the pricing pages before you commit.

“The gateway is a hop on your hot path. Every millisecond and every 500 it adds is yours to own.”
Surya Koritala, founder of Cyntr and Loomfeed

Gateway	Deploy model	Providers / models	Pricing model	Built-in guardrails	Fallback + retry	Pick it when
LiteLLM	Self-host (SDK + proxy); managed tier exists	100+ providers	Open-source, you operate (Postgres + Redis at scale)	Partial (hooks, integrations)	Yes — router order + retries + fallbacks	You self-host and want an OpenAI-compatible standard
OpenRouter	SaaS only	60+ providers / 400+ models	Provider price + 5.5% credit fee, no minimum	No	Yes — automatic provider fallback	You want the most models, fastest, with zero infra
Portkey	Self-host (Apache 2.0) or managed	250+ providers / 1,600+ models	OSS free; managed free -> $49/mo	Yes — 50+ guardrails, PII, jailbreak	Yes — conditional routing + fallbacks	You need guardrails and governance in the gateway
Helicone	Cloud or self-host	Provider-agnostic proxy	Free 10K req/mo -> $79/mo	Partial	Yes (gateway mode)	You want observability-first; note maintenance mode
Cloudflare AI Gateway	SaaS (edge)	Multi-provider passthrough	Free core; logs capped 100K/mo (Free)	Limited	Yes — caching + fallback	You want edge caching and rate limiting, free to start
Kong AI Gateway	Self-host (OSS data plane) / enterprise	Multi-LLM via plugins	OSS free; AI features need Enterprise	Yes — PII (20 categories), prompt guards	Yes — load balance + circuit breaking	You already run Kong for your API gateway
AWS Bedrock	Managed (AWS)	First-party + partner models	AWS usage-based	Guardrails for Bedrock	Yes — Intelligent Prompt Routing (GA)	You’re AWS-native and want in-cloud failover

LLM gateway decision matrix (2026): deployment, reach, price, guardrails, fallback

1. LiteLLM — best self-hosted LLM gateway

LiteLLM is the best LLM gateway for teams that self-host, because it is the de facto OpenAI-compatible standard: an open-source Python SDK and proxy that normalizes 100+ providers behind one endpoint, with routing, spend tracking, and access control you run in your own VPC. If you’ve ever seen a tool say “point it at any OpenAI-compatible base URL,” LiteLLM is what fills that slot for everything that isn’t literally OpenAI.

Operationally, it’s honest about being software you run. The router supports ordered deployments (order=1, order=2…), per-level retries that escalate to the next tier, and configured fallbacks when everything in a model group fails — the default is 3 retries per model group. Critically, LiteLLM exposes an x-litellm-overhead-duration-ms response header so you can measure exactly what the proxy adds to each request; published benchmarks put median overhead around 78ms on an EC2 g5.xlarge, and simple-shuffle routing is the recommended low-overhead default. At a few hundred requests per second, you’ll want Postgres and Redis behind it and a patching cadence for the Python proxy.

The catch is that those things are your job. There’s no SLA, no one to page, and tuning throughput past a few hundred RPS is real work. But you own the data plane completely, there’s no per-token markup, and the OpenAI-compatible surface means the rest of your stack — observability, eval harnesses, agent frameworks — just works.

Pros

Open-source, no per-token markup — cost is your infrastructure
OpenAI-compatible API that 100+ providers and downstream tools assume
Mature router: ordered fallbacks, per-level retries, load balancing
Exposes overhead-duration header so you can measure proxy latency
Self-hosted in your VPC — full control of the data plane

Cons

You operate it: Postgres + Redis + Python proxy patching at scale
No SLA or vendor support on the OSS path
Throughput tuning becomes real engineering past a few hundred RPS
Guardrails are integrations/hooks, not a polished built-in layer

2. OpenRouter — fastest multi-provider access (and its 5.5% markup)

OpenRouter is the best LLM gateway when you want the widest model access with zero infrastructure: one OpenAI-compatible endpoint to 400+ models across 60+ providers, billed at provider token prices plus a flat 5.5% fee on credit purchases, with no monthly minimum and 25+ free models. It is the fastest on-ramp in the category — sign up, buy credits, change a base URL, and you’re calling Claude, GPT, Gemini, and Llama through one key with automatic provider fallback.

The 5.5% is the number that decides whether OpenRouter stays in your stack. It’s charged when you purchase credits, not per request, and it’s genuinely cheap relative to the convenience: consolidated billing, a unified key, and failover across providers without you operating anything. For prototypes, spiky workloads, and anything where engineering time costs more than the markup, that math is overwhelmingly in OpenRouter’s favor.

Where it stops being optimal is steady, high-volume production. 5.5% on a six-figure annual inference bill is real money you could reclaim by self-hosting LiteLLM and going direct to providers — and OpenRouter is SaaS-only, so you can’t run it in your VPC for data-residency or air-gapped requirements. It also has no built-in guardrails layer. The clean pattern: start on OpenRouter, watch the bill, and migrate routing to LiteLLM when the markup, not the calendar, says it’s time.

OpenRouter’s 5.5% is on credit purchases, not a per-token surcharge on top of marked-up rates — token prices match going direct to the provider. That makes it the cheapest credit-purchase fee in the category, but it still compounds at high volume. Model the annualized number, not the per-call one.

3. Portkey — best LLM gateway for guardrails and governance

Portkey is the best LLM gateway when policy is the point: it bundles routing across 1,600+ model variants with 50+ built-in guardrails — PII redaction, jailbreak detection, content checks — plus audit trails and governance, and as of March 2026 the entire gateway is open-source under Apache 2.0. That last change matters: you can now self-host the core routing and guardrails for free and only pay for the managed control plane (analytics, prompt management, enterprise governance) if you want it.

This is the layer LiteLLM and OpenRouter don’t natively give you. If you’re in a regulated domain, shipping agents that touch customer data, or need to prove to a security review that prompts are screened and logged, Portkey’s guardrails being in the gateway — not bolted on in application code — is the differentiator. The company reports governing over 1 trillion tokens daily and $180M+ in annualized spend, so the routing path is battle-tested at scale.

Pricing on the managed side is a free prototyping tier, $49/mo for production, and custom enterprise. The trade-off versus LiteLLM is that you’re adopting Portkey’s worldview and its config surface; the trade-off versus OpenRouter is that you’re operating more (if self-hosting) or paying for a control plane. But if your blocker is “we can’t ship without guardrails,” nothing else on this list answers it as directly out of the box.

You don’t have to choose routing OR guardrails. A common 2026 pattern is LiteLLM for routing in the VPC with Portkey’s now-Apache-2.0 gateway or Kong’s PII plugins composed in for policy — keep the tw

4-7. Helicone, Cloudflare AI Gateway, Kong, and Bedrock — the options vendors omit

The three-way LiteLLM-vs-Portkey-vs-OpenRouter fight ignores four production-grade options, and for some teams one of them is the right answer. Vendor blogs skip these because they don’t fit a clean narrative — but if you already run Kong, live on Cloudflare’s edge, or are AWS-native, the best gateway might be infrastructure you already own.

Helicone (#4) leads with observability — one line of code and it captures everything your LLM app does, in gateway or proxy mode, cloud or self-hosted, free up to 10K requests/month and $79/mo for Pro. Caveat worth knowing: after Mintlify’s acquisition in March 2026, Helicone moved into maintenance mode, so weigh roadmap risk before betting a long-lived integration on it.

Cloudflare AI Gateway (#5) is free at its core — analytics, caching, rate limiting, and reliability features for a one-line change — with the Free plan logging capped at 100,000 events/month (1M on Workers Paid). Its semantic-aware caching can cut latency dramatically on repeated requests, and 2026 added unified billing so you can pay third-party model usage through your Cloudflare invoice. Best when you want edge caching and rate limiting first and routing second.

Kong AI Gateway (#6) extends Kong’s enterprise API gateway with 60+ AI plugins: semantic routing, semantic caching, prompt guards, PII sanitization across 20 categories and 9 languages, plus load balancing with circuit breaking. The OSS data plane is Apache 2.0 and free to self-host, but the advanced AI features (semantic caching, deep analytics, compliance tooling) require Kong Enterprise. The obvious pick if Kong already fronts your APIs. AWS Bedrock (#7) isn’t a third-party gateway, but with Intelligent Prompt Routing now generally available and managed inference profiles offering in-region/global failover, it’s a legitimate routing-and-fallback layer for AWS-native shops — often wired together with LiteLLM as the Bedrock-fronting fallback target.

Helicone

5 out of 5

Observability-first gateway, but check maintenance-mode roadmap risk.
Best for: Teams that want logging and analytics on the LLM path, fast

What works

Free 10K requests/month, one-line integration
Cloud or self-hosted; provider-agnostic
Strong out-of-the-box observability

Watch out for

In maintenance mode after Mintlify acquisition (Mar 2026)
Routing is less of a focus than analytics
Pro at $79/mo is pricier than peers’ entry tiers

Cloudflare AI Gateway

5 out of 5

Free, edge-native caching and rate limiting — routing is secondary.
Best for: Apps already on Cloudflare wanting caching and cost control

What works

Core features free; one line of code to start
Edge caching can slash latency on repeated requests
2026 unified billing for third-party model usage

Watch out for

Free plan logs capped at 100K/month
Limited built-in guardrails vs Portkey/Kong
Fewer native routing knobs than LiteLLM

Kong AI Gateway

5 out of 5

The right gateway if Kong already fronts your APIs.
Best for: Enterprises standardized on Kong’s API gateway

What works

60+ AI plugins: semantic routing, caching, PII sanitization
OSS data plane free (Apache 2.0); circuit breaking + fallback
Reuses existing Kong routes and ops

Watch out for

Key AI features (semantic caching, analytics) need Enterprise
Heavier to adopt if not already a Kong shop
More platform than purpose-built LLM router

AWS Bedrock

5 out of 5

A managed routing/fallback layer for AWS-native teams.
Best for: Shops standardized on AWS wanting in-cloud failover

What works

Intelligent Prompt Routing now GA
Managed inference profiles: in-region/global failover
Native to AWS billing, IAM, and guardrails

Watch out for

Limited to first-party + partner models on Bedrock
Not a neutral multi-vendor gateway
Often paired with LiteLLM for true multi-provider reach

What latency and fallback actually cost you (the part vendors skip)

A gateway adds latency and a new failure mode to every request, and choosing the best LLM gateway means budgeting for both — not just counting features. This is the analysis missing from self-crowning vendor posts, and it’s the part that bites you in production.

On latency: a self-hosted proxy is an extra network hop plus processing time. LiteLLM’s published median overhead is around 78ms on a g5.xlarge with simple-shuffle routing, and it exposes x-litellm-overhead-duration-ms so you can verify your own number rather than trust a benchmark. Edge gateways like Cloudflare can go the other direction — cached responses skip the provider entirely and cut latency by up to ~90% on repeats — but only for cacheable, repeated requests. SaaS proxies like OpenRouter add the round-trip to their infrastructure, which is usually negligible but is not zero and is outside your control.

On fallback: this is where a gateway earns its place. When a provider rate-limits or 500s mid-incident, you want automatic reroute, not a paged engineer. LiteLLM’s ordered-deployment fallback with per-level retries, OpenRouter’s automatic provider fallback, Portkey’s conditional routing, Kong’s circuit breaking, and Bedrock’s regional failover all solve this — but the quality differs. Test it deliberately: kill a provider in staging and watch whether requests reroute, how many retries fire, and what your p99 does under the storm. A gateway you haven’t failure-tested is a single point of failure wearing a reliability costume.

What it costs to start: LLM gateway entry pricing — Entry price only. OpenRouter shows $0 monthly minimum but adds a 5.5% fee on credit purchases; self-host options ($0 here) carry infrastructure and ops cost not shown.

Self-host vs SaaS proxy: the verdict

Self-host? -> LiteLLM. Fastest access? -> OpenRouter. Need guardrails? -> Portkey.

LiteLLM is the OpenAI-compatible self-host standard with the best price-per-control if you already run Postgres + Redis. OpenRouter wins zero-infra speed and the widest model reach at provider price + a 5.5% credit fee. Portkey wins when PII redaction, jailbreak detection, and audit trails must live in the gateway — and it’s now Apache 2.0, so you can self-host it too. If you’re already on Cloudflare, Kong, or AWS, evaluate their gateways before adding a new dependency. Measure proxy latency, failure-test fallback, and keep routing and policy as separate, composable layers.

Self-host an LLM gateway when you need data residency, want no per-token markup, and already run Postgres and Redis — that’s LiteLLM. Use a SaaS proxy when speed-to-first-call and zero ops matter more than the fee — that’s OpenRouter. Reach for a policy-heavy gateway when guardrails are a shipping blocker — that’s Portkey. The cost crossover is the thing to model: OpenRouter’s 5.5% is trivial at low volume and meaningful at high volume, while LiteLLM’s “free” carries real operational cost that’s trivial at high volume and annoying at low volume.

Don’t over-index on model count. 400 versus 1,600 models sounds decisive but almost no team uses more than a handful in production — routing quality, fallback behavior, and observability matter far more than the headline number. And remember the four omitted options: if you already live on Cloudflare, run Kong, or build on AWS, the best gateway may be the one you’re already paying for.

The composability point is the one I’d leave you with. These are not mutually exclusive. Routing and guardrails are separate decisions, and the strongest 2026 stacks treat them that way — LiteLLM or Bedrock for routing, Portkey or Kong for policy, Helicone or Cloudflare for caching and observability. Pick the load-bearing layer first, then compose.

Builder’s take

I run Cyntr’s orchestration engine and Loomfeed’s AI layer through a gateway, so this isn’t theoretical for me. The thing nobody on the vendor blogs says out loud: the gateway is a hop on your hot path. Every millisecond and every 500 it adds is yours to own. Here’s how I actually decide.

If you already run Postgres and Redis, self-hosting LiteLLM is close to free and gives you the OpenAI-compatible surface everything else assumes — that’s my default for anything past a prototype.
OpenRouter’s 5.5% is the cheapest credit-purchase fee in the category, and for the first month of a project, the time it saves dwarfs the markup. Migrate off it when the bill, not the calendar, tells you to.
Treat guardrails as a separate decision from routing. You can run LiteLLM for routing and bolt Portkey’s now-Apache-2.0 gateway or Kong’s PII plugins on for policy — don’t buy a SaaS tier just to get a regex.
Measure the overhead header on day one. LiteLLM exposes x-litellm-overhead-duration-ms; if you can’t see your proxy’s added latency, you’re flying blind into your own p99.
Helicone going into maintenance mode after the Mintlify acquisition is a reminder: pick a gateway whose project will outlive your integration, or pick one you can fully self-host and fork.

Frequently asked questions

What is an LLM gateway and do I actually need one?

An LLM gateway is a proxy that sits between your app and one or more model providers, normalizing them to a single (usually OpenAI-compatible) API while handling routing, retries, fallback, spend tracking, and policy. You need one the moment you call more than one provider, want automatic failover when a provider has an outage, or need a single place to track cost and enforce guardrails. For a single-provider prototype, you can skip it.

LiteLLM vs Portkey vs OpenRouter — which should I pick?

Pick LiteLLM if you want to self-host an open-source, OpenAI-compatible proxy and already run Postgres and Redis. Pick OpenRouter if you want the fastest access to the most models with zero infrastructure and accept a 5.5% fee on credit purchases. Pick Portkey if you need built-in guardrails, PII redaction, and governance in the gateway layer — and note its gateway went fully open-source (Apache 2.0) in March 2026, so you can self-host it as well.

How much latency does an LLM gateway add?

A self-hosted proxy adds a network hop plus processing time — LiteLLM’s published median overhead is around 78ms on an EC2 g5.xlarge with simple-shuffle routing, and it exposes an x-litellm-overhead-duration-ms header so you can measure your own number. SaaS proxies like OpenRouter add a round-trip to their infrastructure, usually small but outside your control. Edge gateways like Cloudflare can reduce latency on cached, repeated requests by up to ~90%.

Is OpenRouter’s 5.5% fee worth it?

At low or spiky volume, yes — the fee is charged on credit purchases (not as a per-token surcharge on marked-up rates), token prices match going direct to the provider, and the convenience of one key, consolidated billing, and automatic fallback usually outweighs the markup. At steady high volume, model the annualized cost: 5.5% on a six-figure inference bill is real money you could reclaim by self-hosting LiteLLM and calling providers directly.

What are the best OpenRouter alternatives in 2026?

For self-hosting without a markup, LiteLLM. For guardrails and governance, Portkey (now Apache 2.0). For observability-first, Helicone (free to 10K requests/month, though in maintenance mode after the Mintlify acquisition). For edge caching and rate limiting, Cloudflare AI Gateway (free core). For teams already on Kong or AWS, Kong AI Gateway or Amazon Bedrock with Intelligent Prompt Routing.

Should I self-host my LLM gateway or use a SaaS proxy?

Self-host (LiteLLM, or Portkey/Kong’s open-source data planes) when you need data residency, want no per-token markup, or already operate Postgres and Redis — you trade vendor fees for operational ownership. Use a SaaS proxy (OpenRouter, Cloudflare, managed Portkey) when time-to-first-call and zero ops matter more than the fee, or when you don’t want to patch and scale a proxy yourself. The cost crossover, not the feature list, usually decides it.

Primary sources

6 best LLM gateways for developers in 2026 — Braintrust
OpenRouter Pricing (5.5% fee, 400+ models, 60+ providers) — OpenRouter
Portkey’s Gateway is Now Fully Open Source — GlobeNewswire
Portkey-AI/gateway (1,600+ LLMs, 50+ guardrails, Apache 2.0) — GitHub
Latency Overhead Troubleshooting (x-litellm-overhead-duration-ms) — LiteLLM Docs
BerriAI/litellm (100+ LLM APIs, OpenAI format) — GitHub
Cloudflare AI Gateway Pricing — Cloudflare Docs
Kong AI Gateway — Kong
Helicone Pricing — Helicone
Use Amazon Bedrock Intelligent Prompt Routing — AWS Machine Learning Blog
Best AI LLM Routers and OpenRouter Alternatives in 2026 — Pinggy

Last updated: June 2, 2026. Related: Agent Infrastructure.

What is the best LLM gateway in 2026?

LLM gateway comparison table: LiteLLM vs Portkey vs OpenRouter and more

1. LiteLLM — best self-hosted LLM gateway

Pros

Cons

2. OpenRouter — fastest multi-provider access (and its 5.5% markup)

3. Portkey — best LLM gateway for guardrails and governance

4-7. Helicone, Cloudflare AI Gateway, Kong, and Bedrock — the options vendors omit

Helicone

What works

Watch out for

Cloudflare AI Gateway

What works

Watch out for

Kong AI Gateway

What works

Watch out for

AWS Bedrock

What works

Watch out for

What latency and fallback actually cost you (the part vendors skip)

Self-host vs SaaS proxy: the verdict

Self-host? -> LiteLLM. Fastest access? -> OpenRouter. Need guardrails? -> Portkey.

Builder’s take

Frequently asked questions

What is an LLM gateway and do I actually need one?

LiteLLM vs Portkey vs OpenRouter — which should I pick?

How much latency does an LLM gateway add?

Is OpenRouter’s 5.5% fee worth it?

What are the best OpenRouter alternatives in 2026?

Should I self-host my LLM gateway or use a SaaS proxy?

Primary sources

Leave a Reply Cancel reply

More Popular from Alatirok

Categories

Quick Links