A runtime-only buyer’s guide to the prompt-injection classifiers and firewalls builders actually deploy in front of agents — ranked by detection rate, added latency, license, and whether they catch indirect injection from RAG context.
What are the best prompt injection detection tools in 2026?
The best prompt injection detection tools 2026 for runtime defense are Lakera Guard (hosted, 98%+ claimed detection, sub-50ms), LLM Guard (open-source MIT, direct and indirect scanning), and StackOne Defender (Apache-2.0, purpose-built for indirect injection at under 10ms) — with Rebuff, Meta Prompt Guard 2, Vigil, and NVIDIA NeMo Guardrails rounding out the field for self-hosted, multilingual, and multi-turn use cases. Which one is best depends on three buying signals most roundups bury: your detection-rate floor, your latency budget on the live request path, and whether you need to catch indirect injection arriving through retrieved or tool-call context — not just the user’s typed message.
Before anything else, separate two layers people conflate. Offensive red-team scanners like Garak and Promptfoo generate attacks and probe your system in CI; they are evaluation tools, not defenses. The seven tools in this guide are runtime detectors — they sit in your request path and screen every prompt, retrieved document, or tool result as it flows through, live, in production. This list is runtime-only on purpose. If you want to attack your own stack before shipping, that is a separate (and complementary) red-teaming workflow.
A prompt injection firewall earns its place only if it improves at least one of detection rate, coverage, or latency without quietly breaking the others. A classifier that hits 99% but adds 800ms to every call is not a guardrail, it is a regression. A 5ms regex that misses every obfuscated attack is theater. The right answer for most builders is a layered runtime prompt injection scanner: cheap checks first, expensive checks only where they pay off.

Prompt injection detection tools compared: detection rate, latency, license
Use this table as your shortlist filter: hosted vs self-hosted decides where your prompt data lives, the latency band decides whether the tool fits the hot path, and the indirect-injection column tells you whether it protects RAG and tool-calling agents — the axis vendors most often omit. Figures below are vendor-claimed or drawn from public model cards and benchmarks; always re-measure detection and latency on your own traffic, because both move with your prompt distribution and hardware.
Two columns deserve emphasis. The latency band reflects an architectural fact, not a vendor quirk: lightweight rule-based and classifier guardrails typically add 10-50ms per request, while LLM-as-judge checks (a second model reasoning about the input) add 200-1000ms, per General Analysis. And the indirect-injection column is the real differentiator — most direct-input classifiers screen only the user message and are blind to instructions smuggled in through a retrieved web page, email, or support ticket.
Classifier-style detectors (Lakera, LLM Guard, StackOne, Prompt Guard) are fast enough to run synchronously on every request. LLM-as-judge and dialog-modeling approaches (NeMo’s LLM rails, Rebuff’s optional LLM check) are powerful but belong on flagged traffic or async pipelines, not the universal hot path.
| Tool | Hosted / Self-hosted | License | Claimed detection | p50 latency band | Direct injection | Indirect / RAG injection |
|---|---|---|---|---|---|---|
| Lakera Guard | Hosted (managed API) | Commercial (free tier) | 98%+ | Sub-50ms (classifier) | Yes | Yes |
| LLM Guard | Self-hosted | MIT | DeBERTa-v3 classifier | 10-50ms (classifier) | Yes | Yes (scans any text incl. retrieved) |
| StackOne Defender | Self-hosted (npm) | Apache-2.0 | ~88.7% acc / 90.8% F1 | <10ms (two-tier) | Partial | Yes (purpose-built) |
| Rebuff | Self-hosted or SaaS | Apache-2.0 | Multi-layer (no single rate) | 10-50ms + LLM step optional | Yes | Limited |
| Meta Prompt Guard 2 | Self-hosted model | MIT (base models) | Binary classifier | 10-50ms (classifier) | Yes | Yes (context-aware) |
| Vigil | Self-hosted | Apache-2.0 (alpha) | Ensemble (YARA + DeBERTa + vectors) | 10-50ms+ (ensemble) | Yes | Limited |
| NeMo Guardrails | Self-hosted | Apache-2.0 | Programmable (dialog + rails) | Varies; LLM rails 200-1000ms | Yes | Yes (retrieval rail) |
1. Lakera Guard — best hosted prompt injection firewall
Lakera Guard is the strongest turnkey choice for teams that want a maintained, multilingual prompt injection firewall as a hosted API and can accept sending prompt text to a third party. AppSecSanta’s 2026 review reports 98%+ prompt-injection detection, sub-50ms latency, a false-positive rate below 0.5%, and support for 100+ languages and scripts — and crucially it covers direct injection, indirect injection, jailbreaks, and system-prompt extraction, making it usable in front of RAG pipelines.
The trade-off is deployment and transparency. Lakera Guard is a managed cloud API with no general self-hosted tier, and pricing is sales-gated (a free tier exists for evaluation). You are buying a threat model that updates continuously against fresh adversarial samples — the appeal is exactly that you do not maintain the detector yourself. For regulated workloads where prompt data cannot leave your environment, that hosted posture is the dealbreaker, and you should look at LLM Guard or Prompt Guard instead.
Pros
Cons
2. LLM Guard — best open source LLM input output scanner
LLM Guard is the best open source prompt injection detection toolkit for teams that want to self-host and screen both inputs and outputs, including text retrieved from RAG. Built by Protect AI and MIT-licensed, it ships roughly 15 input scanners and 20 output scanners — its PromptInjection scanner is a fine-tuned DeBERTa-v3 model, alongside Anonymize (PII), Secrets, BanTopics, Toxicity, and more. It installs via pip, runs as a standalone API server or in-process, and has been downloaded over 2.5 million times.
Because LLM Guard scans arbitrary text, you are not limited to screening the user’s typed message — you can run the same classifier over a retrieved document or a tool result before it reaches the model, which is how you close the indirect-injection gap. It is the most complete LLM input output scanner here, and the natural anchor of a self-hosted stack. The cost is that you own tuning, model updates, and the false-positive budget yourself.
Pros
Cons
Lakera Guard vs LLM Guard is the hosted-vs-self-hosted decision in miniature: choose Lakera for a maintained, multilingual API you do not operate; choose LLM Guard when prompt data must stay in your environment and you can own the tuning.
3. StackOne Defender — best for indirect prompt injection protection
StackOne Defender is the best purpose-built tool for indirect prompt injection protection in tool-calling and agent workflows, where attacks hide in documents, emails, tickets, and API responses rather than the user’s message. It is an Apache-2.0 open-source npm package (npm install @stackone/defender) that wraps your tool calls and inspects results before they reach the LLM. Per its GitHub repo, it is 22MB, CPU-only, runs in under 10ms, and reports roughly 88.7% accuracy / 90.8% F1 across ~25k samples including adversarial sets.
Architecturally it is a two-tier detector: a ~1ms pattern layer plus a fine-tuned MiniLM ONNX classifier (~10ms) doing sentence-level analysis to catch attacks that evade regex. It plugs into Vercel AI SDK, LangChain, LangGraph, Pydantic AI, CrewAI, and MCP, and runs by default inside StackOne connectors. Because it focuses on the indirect vector, pair it with a direct-input classifier — it is the missing half of the stack for anyone whose agent reads untrusted external content.
Pros
Cons
4. Rebuff, Meta Prompt Guard 2, Vigil and NeMo Guardrails
These four cover the rest of the runtime field: Rebuff for self-hardening multi-layer detection, Meta Prompt Guard 2 for a tiny multilingual classifier, Vigil for an extensible ensemble scanner, and NVIDIA NeMo Guardrails for multi-turn dialog control that single-shot classifiers cannot do. Each fills a specific niche rather than competing head-on with the top three.
Rebuff (Protect AI, Apache-2.0) combines heuristics, a dedicated LLM detector, a vector database of known attacks, and canary tokens, and self-hardens by learning from detected attempts. The LLM-detection step is powerful but pushes you into the 200-1000ms band, so treat it as a flagged-traffic layer rather than an every-request gate.
Meta Prompt Guard 2 ships as MIT-licensed open models in two sizes — an 86M multilingual variant (mDeBERTa-base) and a 22M English-only variant (DeBERTa-xsmall) for resource-constrained deployments. It is a binary classifier detecting both injection and jailbreaks, and being context-aware it can screen retrieved content, not just user input.
Vigil (deadbits, Apache-2.0) is a Python module and REST API using a multi-layered ensemble: vector-DB similarity, YARA rules, a DeBERTa transformer (Protect AI’s deberta-v3-base-prompt-injection-v2), prompt-response similarity, and canary tokens. It is highly extensible but flagged alpha, so pilot it before betting production on it.
NVIDIA NeMo Guardrails (Apache-2.0) is the outlier — its Colang language models the full multi-turn dialog and offers five rail types (input, dialog, retrieval, execution, output). That makes it the only option here that can track multi-turn injection attempts a per-request classifier misses, with a retrieval rail for RAG. The cost is complexity and latency when LLM-backed rails run; NVIDIA itself flags the current release as beta and not production-ready as-is.
Picking by niche: Rebuff if you want self-hardening and already run a vector DB; Prompt Guard 2 (22M) when you need a near-free multilingual classifier on cheap hardware; Vigil when you want YARA-stylHow to choose a runtime prompt injection scanner
Choose by answering four questions in order: where can prompt data live, what is your latency budget, do you face indirect injection, and how much do you want to operate? That sequence eliminates most options fast and turns a seven-way comparison into a one- or two-tool decision.
First, data residency. If prompts cannot leave your environment, hosted Lakera Guard is out and you are choosing among LLM Guard, Prompt Guard 2, Rebuff, Vigil, and NeMo. Second, latency: anything synchronous on every request should stay in the 10-50ms classifier band; reserve the 200-1000ms LLM-judge tier (Rebuff’s LLM step, NeMo’s LLM rails) for flagged or async traffic. Third, indirect injection: if your agent retrieves or tool-calls untrusted content, you need StackOne Defender, LLM Guard, Prompt Guard 2, or NeMo’s retrieval rail in the path — a user-input-only classifier will not save you. Fourth, operational load: a hosted API trades control for a maintained threat model; self-hosting trades effort for sovereignty.
The pattern that holds up in production is layered, not single-vendor. Run a cheap regex/heuristic pre-filter, then a fast classifier (LLM Guard, Prompt Guard 2, or StackOne for tool results) on every request, and escalate only flagged traffic to a heavier judge or NeMo dialog model. As General Analysis notes, fast classifiers can handle 95%+ of traffic in real time while slower judges are reserved for the edge cases — that is how you get high coverage without paying the LLM-judge latency on every call.
“A classifier that hits 99% but adds 800ms to every call is not a guardrail — it is a regression. Detection rate, latency, and indirect coverage are one decision, not three.”
Surya Koritala, founder of Cyntr and Loomfeed
Verdict: which prompt injection detection tool should you deploy?
For most builders the answer in 2026 is a layered stack, not a single product: LLM Guard or Prompt Guard 2 as a self-hosted fast classifier on every request, StackOne Defender wrapping tool calls to close the indirect-injection gap, and Lakera Guard when you want a maintained hosted firewall and can send text out. Reserve Rebuff’s LLM detection and NeMo’s dialog rails for flagged or multi-turn traffic where their extra latency is justified, and use Vigil when you need custom YARA signatures.
Remember the layer split that the top search results blur: these are runtime detectors. Keep running offensive scanners (Garak, Promptfoo) in CI to attack your own stack before shipping — the two jobs are complementary, and the tools above will only ever be as good as the adversarial traffic you test them against. The figures here are claims; the measurement that matters is on your own prompt distribution and hardware.
Builder’s take
I run guardrails in production on Cyntr and Loomfeed, so I read these listicles the way a builder does: not ‘which tool is best’ but ‘which tool sits in my request path without wrecking my p95, and does it catch the attack vector that actually hits me.’ Most roundups fail that test because they mix offensive scanners with runtime defense. Here is how I actually reason about the shortlist.
- Pick your layer first. Garak and Promptfoo are red-team scanners you run in CI; the tools below run on every live request. Do not confuse the two — you need both, but they are different line items.
- Indirect injection is the real exposure for anyone doing RAG or tool-calling. A classifier that only screens the user’s typed message is blind to a poisoned support ticket or web page your agent retrieves. StackOne Defender and LLM Guard explicitly target that gap.
- Latency is a budget, not a footnote. A 10-50ms classifier is invisible; a 200-1000ms LLM-as-judge on the hot path is a product decision. Run the heavy check async or only on flagged traffic.
- Self-hosted MIT/Apache models (LLM Guard, Prompt Guard, Rebuff, Vigil) keep prompt data in your VPC. Hosted APIs like Lakera Guard buy you a maintained threat model in exchange for sending text out — fine for many teams, a non-starter for some.
- No single tool is a finished answer. The durable pattern is layered: a cheap regex/heuristic pre-filter, a fast classifier on every request, and a heavier judge or dialog model reserved for the traffic the cheap layers flag.
Frequently asked questions
There is no single best tool for every team. For a hosted firewall, Lakera Guard leads on claimed detection (98%+) and language coverage. For self-hosted, LLM Guard is the most complete open-source LLM input/output scanner, and StackOne Defender is the best dedicated tool for indirect (tool-result/RAG) injection. Most production stacks combine a fast classifier on every request with a heavier check on flagged traffic.
Lakera Guard is a hosted, managed API with a continuously updated threat model, 98%+ claimed detection, and 100+ language support, but prompt text leaves your environment. LLM Guard is open-source (MIT), self-hosted, and keeps data in your VPC with 15 input and 20 output scanners. Choose Lakera if you want zero maintenance and can send text out; choose LLM Guard if data residency or cost rules out a hosted API.
Indirect prompt injection hides malicious instructions in content an agent retrieves or tool-calls — a web page, email, support ticket, or API response — rather than in the user’s typed message. Direct-input classifiers are blind to it. Tools that explicitly address it include StackOne Defender (purpose-built for tool results), LLM Guard (it can scan any text, including retrieved context), Meta Prompt Guard 2, and NeMo Guardrails’ retrieval rail.
Lightweight rule-based and classifier guardrails typically add about 10-50ms per request, fast enough to run synchronously on every call. LLM-as-judge checks, where a second model reasons about the input, add roughly 200-1000ms, per General Analysis. The common pattern is to run fast classifiers on all traffic and reserve LLM judges for flagged or asynchronous analysis.
Yes. LLM Guard (MIT), StackOne Defender (Apache-2.0), Rebuff (Apache-2.0), Vigil (Apache-2.0), Meta Prompt Guard 2 (MIT base models), and NVIDIA NeMo Guardrails (Apache-2.0) are all free and self-hostable. Lakera Guard is the main commercial, hosted option, though it offers a free evaluation tier.
No, and conflating them is the most common mistake. Red-team scanners such as Garak and Promptfoo generate attacks and probe your system in CI to find weaknesses before you ship — they are evaluation tools. The detectors in this guide (Lakera Guard, LLM Guard, StackOne Defender, and others) run at runtime on live traffic to block attacks in the request path. You want both: offensive scanning to test, runtime detection to defend.
Primary sources
- Lakera Guard 2026 review — detection rate, latency, languages — AppSecSanta
- Introduction to Lakera Guard — API documentation — Lakera
- LLM Guard — protectai/llm-guard repository — Protect AI / GitHub
- LLM Guard — Prompt Injection input scanner docs — Protect AI
- StackOne Defender — open source indirect prompt injection protection — StackOne / GitHub
- StackOne Defender product page — StackOne
- Rebuff — protectai/rebuff prompt injection detector — Protect AI / GitHub
- Llama Prompt Guard 2 (86M) model card — Meta / Hugging Face
- Vigil — deadbits/vigil-llm LLM security scanner — deadbits / GitHub
- NeMo Guardrails — NVIDIA-NeMo/Guardrails toolkit — NVIDIA / GitHub
- Best AI Guardrails 2026 — latency by guardrail type — General Analysis
Last updated: June 2, 2026. Related: Identity Provenance.