Why Does My AI Agent Keep Looping? (And How to Stop It)

A decision-ordered diagnostic for the four root causes of agent loops, with a copy-pasteable duplicate-call detector, per-error-class retry budgets, and why max_turns alone is not enough.

Contents

Why does my AI agent keep looping?

Your AI agent keeps looping because it never receives a machine-readable signal that the task is done, so it interprets every ambiguous tool result as a reason to try again. The loop is not the model being ‘dumb’ — it is the harness failing to give the model a stop condition it can parse. In production this shows up as the same tool being called with the same arguments dozens of times in a row, burning tokens and budget until a turn cap finally kills the run. So why does my AI agent keep looping? Almost always because the success condition is never made explicit to the model.

Unlike a traditional infinite loop, which is a clear logical error in your code, an agent loop is a probabilistic misread. The LLM is choosing the next action one token at a time, and if the world it can see has not visibly changed — no success flag, no terminal status, no new information — the most likely next action is often the same action it just took. That is the whole bug in one sentence.

This article gives you the diagnostic that every other ranking page skips. Instead of ‘set max_turns and pray,’ you get the four root causes mapped to four specific fixes, a copy-pasteable duplicate-call detector with an exact 3-repeat threshold, a retry-budget table keyed to HTTP error class, and the typed tool-result enum that kills the most common loop of all. Work the four causes in order and you will find yours.

Diagram of an AI agent stuck in a reasoning loop, repeatedly calling the same tool with identical arguments — Image.

The four root causes of an AI agent stuck in a loop

Almost every agent loop in production traces to one of four root causes: missing completion states, retry amplification across layers, unmapped failure classes treated as transient, or non-idempotent side effects. Diagnose them in that order — completion states first, because they cause the majority of loops and are the cheapest to fix.

MatrixTrak’s production teardown of looping agents names these same four mechanisms, and the AWS engineering write-up on DEV Community confirms the first one is by far the most common. Here is what each looks like in a live trace, and which fix section to jump to.

Cause 1, missing completion states: your tool returns prose like ‘Found 3 results, more may be available’ instead of a status field the model can branch on. The agent has no parsable ‘done,’ so it keeps searching. Fix: typed tool results (section below).

Cause 2, retry amplification: you have retries in your HTTP client, again in your tool wrapper, and again in your agent policy. A single permanent failure becomes 2 x 2 x 2 = 8 attempts, and to the agent it looks like the environment is flaky rather than broken. Fix: one retry owner with a per-error-class budget.

Cause 3, unmapped failure classes: every error — including auth, validation, and permission denials — is treated as transient and retried. Those are stop conditions, not retry events; no number of retries will fix a 403. Fix: the retry-budget table.

Cause 4, non-idempotent side effects: the looping tool actually does something each time — sends an email, charges a card, files a ticket. Now the loop is not just wasting tokens, it is duplicating real-world actions. Fix: idempotency keys plus the duplicate-call detector.

Roughly four out of five loops we have triaged are cause #1 (missing completion states) or cause #3 (unmapped failure classes). Check those two before you touch your prompt or your model choice.

Root cause	What you see in the trace	The fix
Missing completion states	Same tool, same args, result text is ambiguous (‘more may be available’)	Typed tool results with a status enum
Retry amplification	Bursts of identical calls; retries stacked in HTTP + wrapper + policy	Single retry owner, one budget per error class
Unmapped failure classes	Endless retries on 401/403/422 that can never succeed	Retry-budget table: stop on auth/validation/permission
Non-idempotent side effects	Duplicate emails, charges, or tickets on each repeat	Idempotency keys + duplicate-call fingerprint detector

Four root causes of agent loops, each mapped to its symptom in the trace and its fix.

How do I detect when my AI agent is repeating the same tool call?

Detect a loop by fingerprinting each step as a hash of the last tool name plus its result, and trip a circuit breaker when the same fingerprint repeats 3 or more times. This is purely pattern-based — no LLM judgment, no heuristics about ‘progress’ — which makes it reliable enough to run in your hot path.

MatrixTrak’s production approach is exactly this: track last_tool + last_result_hash, and when that fingerprint repeats past a threshold of three, you have a confirmed loop. Three is the right number because two repeats can be legitimate (a retry after a transient blip), but a third identical (tool, result) pair means the agent has learned nothing and the environment has not changed. Here is a drop-in detector you can wire into a PreToolUse hook or wrap around your own tool dispatcher.

The detector below is deliberately stateless about the model’s reasoning. It does not try to decide whether the agent is ‘making progress’ — it only asks whether this exact (tool, result) pair has now happened three times. That is the whole point: the agent’s own narration about why it is retrying is the unreliable part, so we ignore it and watch behavior instead.

Do not silently drop the call. Return the BLOCKED message as the tool result so the model sees it in context. Agents recover well from an explicit ‘you already tried this, change approach’ — far better than from a hard exception that just ends the run.

import hashlib
from collections import Counter

class LoopDetector:
    """Trips when the same (tool, result) fingerprint repeats >= 3 times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: Counter[str] = Counter()
        self.iteration = 0

    @staticmethod
    def _fingerprint(tool: str, result: str) -> str:
        # Hash tool name + result so identical (call, outcome) pairs collide.
        # Note: fingerprint the RESULT, not the args -- two different argument
        # sets that both return "no_results" are still the same dead end.
        raw = f"{tool}::{result}".encode()
        return hashlib.sha256(raw).hexdigest()[:16]

    def check(self, tool: str, result: str) -> dict:
        self.iteration += 1
        fp = self._fingerprint(tool, result)
        self.counts[fp] += 1
        repeats = self.counts[fp]
        looping = repeats >= self.threshold
        return {
            "loop_iteration": self.iteration,
            "last_tool": tool,
            "last_result_hash": fp,
            "tool_calls_count": self.iteration,
            "repeats": repeats,
            "decision": "BREAK_LOOP" if looping else "continue",
        }


# Usage inside your tool-dispatch loop:
detector = LoopDetector(threshold=3)

def on_tool_result(tool_name: str, result_text: str):
    verdict = detector.check(tool_name, result_text)
    log.info("agent_step", **verdict)   # structured, one line per step
    if verdict["decision"] == "BREAK_LOOP":
        # Feed this back to the model as the tool result so it changes course.
        return (
            "BLOCKED: This exact tool call has produced the same result "
            f"{verdict['repeats']} times. Do not call {tool_name} again with "
            "the same approach. Either change strategy or report that you "
            "cannot complete the task."
        )
    return result_text

What retry budget should I set for each error class?

Set retries by error class, not a single global number: 0 retries for validation, auth, and permission errors; 3 retries with exponential backoff and jitter for rate limits (HTTP 429); 2 retries then escalate for transient 5xx and timeouts. The single biggest cause of retry loops is treating a permanent error as if it were transient.

This is the rule that the generic ‘just add retries’ advice gets dangerously wrong. A 401 or a 422 will return the identical error on attempt number fifty — retrying it is the loop. MatrixTrak’s framework codifies the budgets below, and pairing them with the duplicate-call detector above gives you defense in depth: the budget stops most retry storms at the source, and the fingerprint detector catches anything that slips through (including loops that are not even retry-driven).

Critically, this budget must live in exactly one layer. If your HTTP library retries 5xx three times, and your tool wrapper retries twice on top of that, you have not set a budget of 2 — you have set a budget of 8, and the agent sees a world that looks intermittently broken. Pick one owner for retries (we recommend the tool wrapper, where you know the error class) and disable retries everywhere else.

import random, time

# Map each error class to its budget. Stop-classes get 0 retries by design.
RETRY_BUDGET = {
    "validation":  {"retries": 0},
    "auth":        {"retries": 0},
    "permission":  {"retries": 0},
    "rate_limit":  {"retries": 3},   # 429
    "transient":   {"retries": 2},   # 5xx / timeout
}

def classify(status: int) -> str:
    if status in (400, 422):        return "validation"
    if status == 401:               return "auth"
    if status == 403:               return "permission"
    if status == 429:               return "rate_limit"
    if status >= 500 or status == 408: return "transient"
    return "transient"

def call_with_budget(do_call):
    """One retry owner. Backoff with full jitter; stop-classes never retry."""
    attempt = 0
    while True:
        status, result = do_call()
        if 200 <= status < 300:
            return result
        cls = classify(status)
        budget = RETRY_BUDGET[cls]["retries"]
        if attempt >= budget:
            # error_class + decision go straight into your structured logs
            log.warning("retry_exhausted", error_class=cls, status=status,
                        decision="escalate")
            raise RuntimeError(f"{cls} error ({status}) not retryable here")
        sleep = min(2 ** attempt, 30) * (0.5 + random.random())  # full jitter
        time.sleep(sleep)
        attempt += 1

“A 403 will be a 403 on the fiftieth try. Retrying a permanent error is not resilience — it is the loop.”
The single rule that prevents most retry-driven agent loops

Error class	Example status	Retries	Backoff	Then what
Validation	400 / 422	0	none	Stop — return the validation error to the model so it fixes its input
Auth	401	0	none	Stop and escalate — credentials will not self-heal
Permission	403	0	none	Stop and escalate — this is a policy block, not a glitch
Rate limit	429	3	exponential + jitter	Respect Retry-After; after 3, escalate or queue
Transient server	500 / 502 / 503	2	exponential + jitter	Escalate after 2 — do not retry forever
Timeout	408 / client timeout	2	exponential + jitter	Escalate after 2; check the 300s SDK default timeout

Retry budget by error class. Centralize this in one layer; do not stack it on top of HTTP-client retries.

How do typed tool results stop the ambiguous-feedback loop?

Return a typed status enum from every tool — ok | no_results | error — instead of human-readable prose, so the agent can branch deterministically on the outcome. Ambiguous feedback like ‘more results may be available’ is the most common loop trigger because it reads to the model as ‘keep trying,’ not ‘done.’

This is the root cause that no incumbent explainer puts front and center, and it is the one with the highest leverage. The AWS engineering team’s write-up on DEV Community measured it directly: a tool returning the ambiguous string ‘Found N flights… Note: More results may be available’ caused the agent to make 14 tool calls in 21 seconds. The exact same task, with the tool returning an explicit terminal status, took 2 tool calls in 4 seconds — a 7x reduction purely from clearer signaling.

The fix is not better prompt engineering. It is a contract. Define a small status enum, attach it to every tool result, and the model gets an unambiguous branch point. ‘no_results’ is a terminal success that means stop looking; it is categorically different from ‘error,’ which the model should not confuse with an empty-but-valid answer. Conflating those two is what produces the ‘search forever’ loop.

from enum import Enum
from dataclasses import dataclass, asdict
from typing import Any

class ToolStatus(str, Enum):
    OK = "ok"                 # request succeeded, payload present
    NO_RESULTS = "no_results" # request succeeded, nothing to return (TERMINAL)
    ERROR = "error"           # request failed; see error_class

@dataclass
class ToolResult:
    status: ToolStatus
    data: Any = None
    error_class: str | None = None   # validation | auth | permission | ...
    message: str = ""

    def to_model(self) -> dict:
        # The model sees status FIRST. no_results is a stop, not a retry.
        return asdict(self)

# Before (ambiguous prose -> the agent re-searches forever):
#   return f"Found {n} flights under ${cap}. More results may be available."

# After (typed -> the agent branches deterministically and stops):
def search_flights(cap: int) -> ToolResult:
    matches = run_search(cap)
    if not matches:
        return ToolResult(ToolStatus.NO_RESULTS,
                          message=f"No flights under ${cap}. This is final.")
    return ToolResult(ToolStatus.OK, data=matches,
                      message=f"Returned {len(matches)} flights. Complete.")

Typed results cut the AWS team’s loop from 14 tool calls to 2 — a 7x reduction — with zero changes to the prompt or the model. It is the highest-leverage fix in this article.

Why is max_turns alone not enough to stop an agent loop?

max_turns is a blunt instrument: it caps the damage but does not fix the loop, because an agent that hits its turn limit on a complex task has simply failed on that task. Pair it with max_budget_usd, the duplicate-call detector, and typed tool results so the loop is prevented and contained rather than just truncated.

The Claude Agent SDK documents this honestly. max_turns ‘counts tool-use turns only’ and defaults to no limit; max_budget_usd ‘caps turns based on a spend threshold,’ and the docs state plainly that ‘setting a budget is a good default for production agents.’ Both are real backstops — but they are the last line of defense, not the first. When either fires, the SDK returns a ResultMessage with subtype error_max_turns or error_max_budget_usd, which tells you the run was killed, not that the task succeeded.

Think of it as layered defense. The duplicate-call detector and typed results prevent the loop. The retry budget contains the most common cause. max_budget_usd is your spend ceiling, which matters because a runaway agent’s real cost is dollars, not turns — a single expensive turn can blow past a generous turn count. And max_turns is the final fuse. Relying on max_turns alone is like running a service with only an OOM killer and no memory management: technically it stops, but every stop is an incident.

One more SDK detail worth wiring in: the default external-call timeout is 300 seconds (5 minutes). A tool that hangs without timing out can stall a loop just as effectively as one that repeats, so confirm your tool calls inherit a sane timeout rather than blocking indefinitely.

Pros

Trivial to set — one parameter, hard upper bound on runaway sessions
Guarantees the process eventually terminates regardless of model behavior
A sensible final backstop once the real controls are in place

Cons

Does not prevent the loop — it only truncates it after the waste happens
A turn cap reached on 30% of complex tasks means 30% silent task failures
Counts turns, not dollars; one expensive turn can blow the real budget
Gives you no diagnostic — you learn a run died, not why it looped

Set max_budget_usd as your production default and keep max_turns as a fuse, but never ship them as your only defense. Without the duplicate-call detector and typed results, you are just choosing how much money to waste before the loop is killed.

What should I log to debug an agent loop fast?

Repeat threshold to trip the detector

Same (tool, result) fingerprint three times = confirmed loop

0 / 3 / 2

Retries: stop-class / rate-limit / transient

Validation, auth, permission get zero; 429 gets 3; 5xx/timeout get 2

Fewer tool calls from typed results

AWS measured 14 calls down to 2 with a status enum

max_budget_usd default in the SDK

No limit by default — set one; it is ‘a good default for production agents’

Stop the loop in layers, in order

Work the four root causes in sequence. First, give every tool a typed status enum (ok | no_results | error) — this alone fixes the most common loop and cut one team’s calls 7x. Second, set retry budgets by error class: 0 for validation/auth/permission, 3 for 429, 2 for 5xx/timeout, and centralize them in one layer. Third, run the duplicate-call detector that trips at 3 identical fingerprints and feeds a BLOCKED message back to the model. Fourth, log the six structured fields so the next loop is a one-line grep. Only then add max_budget_usd as your spend ceiling and max_turns as the final fuse. max_turns alone is not a fix — it is the OOM killer you never want to rely on.

Log a structured record per step with six fields: loop_iteration, last_tool, last_result_hash, tool_calls_count, error_class, and decision. These turn a 3am loop incident into a one-line grep instead of an archaeological dig through raw transcripts.

Most teams log the full agent transcript and nothing else, which means debugging a loop requires reading thousands of tokens of model narration to find the three identical calls buried inside. The fields above — the same schema MatrixTrak recommends — let you spot the loop instantly: filter on a repeating last_result_hash, and the loop_iteration count tells you exactly when the fingerprint started repeating and what decision the breaker made.

The decision field is the keystone. It records what your control layer did at each step — continue, BREAK_LOOP, escalate, retry — so when you review an incident you can see not just that the agent looped, but whether your defenses fired correctly. If you see five identical last_result_hash values and a decision of ‘continue’ on all of them, your detector threshold is mis-set. If you see error_class ‘permission’ with decision ‘retry,’ your retry budget is leaking. The logs tell you which of the four root causes you are looking at without ever opening the transcript.

Builder’s take

I have shipped enough agent infrastructure across Cyntr and Loomfeed to know that a looping agent is almost never a ‘bad model’ problem. It is a missing-feedback problem, and you fix it in the harness, not the prompt.

A loop is a missing state machine. If your agent cannot read a machine-parsable ‘done’, it will keep poking at the world hoping the world changed. Give every tool a typed status before you touch the system prompt.
max_turns is a fuse, not a fix. I run it as a backstop, but the real controls are a duplicate-call fingerprint detector and a retry budget keyed to the error class. Treat a 429 differently from a 403 differently from a 500.
Centralize retries in exactly one layer. The worst loops I have debugged were three retry mechanisms (HTTP client, tool wrapper, agent policy) multiplying into 2x2x2 attempts on a permanent failure. Pick one owner.
Log the fingerprint, not the transcript. loop_iteration, last_tool, last_result_hash, error_class, decision — five fields turn a 3am incident into a one-line grep instead of a transcript archaeology dig.

Frequently asked questions

Why does my AI agent keep calling the same tool repeatedly?

It is not getting a machine-readable signal that the call succeeded or that the task is done. When a tool returns ambiguous prose like ‘more results may be available,’ the model reads it as an invitation to try again. Return a typed status (ok | no_results | error) instead, and the agent gets an unambiguous branch point. This single fix cut one AWS team’s loop from 14 tool calls to 2.

What is the best threshold for detecting an agent loop?

Three repeats of the same fingerprint. Fingerprint each step as a hash of the last tool name plus its result; when that exact (tool, result) pair occurs three times, you have a confirmed loop. Two repeats can be a legitimate retry after a transient blip, but a third identical pair means the agent has learned nothing and the environment has not changed.

How many times should an AI agent retry a failed tool call?

It depends on the error class. Validation (400/422), auth (401), and permission (403) errors get zero retries — they are permanent and will return the same failure every time. Rate limits (429) get three retries with exponential backoff plus jitter. Transient server errors (5xx) and timeouts get two retries, then escalate. A single global retry count is what causes retry loops.

Is max_turns enough to stop an AI agent from looping?

No. max_turns caps the damage but does not prevent the loop, and an agent that hits its turn limit on a complex task has simply failed silently. The Claude Agent SDK documents it as counting tool-use turns only with no default limit. Pair it with max_budget_usd (a spend ceiling the SDK calls ‘a good default for production agents’), a duplicate-call detector, and typed tool results so the loop is prevented, not just truncated.

What is the difference between max_turns and max_budget_usd?

max_turns caps the number of tool-use round trips; max_budget_usd caps total spend. They protect against different failure shapes. A loop of cheap calls is bounded by max_turns, but a few very expensive turns can blow your real budget while staying under a generous turn count — that is what max_budget_usd catches. Set the budget as your production default and keep max_turns as a final fuse.

How do I stop an agent loop that has non-idempotent side effects?

This is the dangerous case — the looping tool sends emails, charges cards, or files tickets on each repeat. Two defenses: attach an idempotency key so duplicate calls with the same key are deduplicated server-side, and run the duplicate-call fingerprint detector so the second or third identical call is blocked before it executes the side effect again. Never rely on max_turns here, because even a small loop can duplicate real-world actions.

Primary sources

How the agent loop works — Claude Agent SDK (max_turns, max_budget_usd) — Anthropic / Claude Code Docs
How to Prevent AI Agent Reasoning Loops from Wasting Tokens — AWS on DEV Community
Agent keeps calling same tool: why autonomous agents loop forever in production — MatrixTrak
Why Your AI Agent Keeps Looping and How to Stop It — Meritshot
Why your AI agent loops forever (and how to break the cycle) — DEV Community

Last updated: June 2, 2026. Related: Observability.

Why Does My AI Agent Keep Looping? (And How to Stop It)

Why does my AI agent keep looping?

The four root causes of an AI agent stuck in a loop

How do I detect when my AI agent is repeating the same tool call?

What retry budget should I set for each error class?

How do typed tool results stop the ambiguous-feedback loop?

Why is max_turns alone not enough to stop an agent loop?

Pros

Cons

What should I log to debug an agent loop fast?

Stop the loop in layers, in order

Builder’s take

Frequently asked questions

Why does my AI agent keep calling the same tool repeatedly?

What is the best threshold for detecting an agent loop?

How many times should an AI agent retry a failed tool call?

Is max_turns enough to stop an AI agent from looping?

What is the difference between max_turns and max_budget_usd?

How do I stop an agent loop that has non-idempotent side effects?

Primary sources

Leave a Reply Cancel reply

More Popular from Alatirok

Tokens Per Agentic Coding Task: The 2026 Variance Data

What Is Cognition Devin? The Enterprise Guide for 2026

What Is Circle Agent Stack? USDC Wallets for AI Agents

AI Agent Identity: Entra Agent ID vs Okta vs SailPoint

Why Does My AI Agent Context Window Fill Up So Fast?

Migrate OpenAI Agent Builder to Agents SDK Before Nov 30

Best Voice AI Agent Framework 2026: Vapi vs LiveKit vs Pipecat

Purpose-Built Legal AI vs General LLM: 2026 Verdict

Categories

Quick Links