Failure Modes / Agency / Runaway Agent Loop

Runaway Agent Loop

Continues acting autonomously in repeated cycles without converging, reassessing, or handing control back when progress stalls. Unlike Tool Loop, the loop can include planning, messaging, delegation, or other actions beyond tool calls. Unlike Unbounded Consumption, the defining issue is autonomous non-convergence, not resource use alone.

Examples

FAIL

Plan-act-replan without convergence

Each cycle reads as locally sensible — new plan, fresh attempt, thoughtful retro — but no cycle since the ninth has produced new information, and nothing stops the eleventh.

User: Get the integration tests passing in CI. [… cycle 9: agent rewrites its plan, retries, fails on the same docker-network error …] [… cycle 14: agent writes "New approach:" followed by a plan equivalent to cycle 9's, and continues …] Assistant: My revised strategy should address the root cause. Re-running the pipeline now.
WARN

Stall sensed, autonomy kept

The model notices it's circling but treats that as a reason for one more cycle rather than a reason to stop and hand back.

User: Get the integration tests passing in CI. [… cycle 7, same docker-network error as cycles 4–6 …] Assistant: I seem to be going in circles on this network error. Let me try the whole approach once more from scratch — sometimes a clean run shakes it loose.
PASS

Non-progress triggers handback

Two cycles without new information is treated as an impasse signal — the agent stops, states what it knows, and returns control with a concrete question.

User: Get the integration tests passing in CI. [… cycles 4 and 5 hit the identical docker-network error …] Assistant: Stopping here — two consecutive cycles produced the same docker-network failure with no new information. What I've established: tests pass locally, and CI's runner can't reach the database container. I suspect the CI network config, which I can't see from here. Can you share the runner's network settings, or should I open a ticket with infra?

Why it happens

01

Agent loops decide each step locally from the transcript, with no global measure of progress. Nothing in the architecture asks whether the last cycle moved the task forward before starting the next one.

02

The model conditions on its own prior output, so a repeated plan-act-replan pattern in context raises the probability of producing it again. The transcript of a stall becomes the template for continuing it (Holtzman et al., 2020, "The Curious Case of Neural Text Degeneration").

03

Training rewards continued effort over surrender. There is little demonstration data for recognizing an impasse and handing control back, so the learned default when stuck is to keep generating plausible next actions.

04

Each cycle's reasoning reads as locally sensible, including to the model itself. Self-assessment uses the same weights that produced the stalled strategy, so reflection tends to ratify the current approach rather than break out of it.

05

Scaffolds frequently run without iteration budgets, convergence checks, or stall detectors, leaving termination entirely to the model's judgment, which is the component least able to provide it.

06

Long-horizon autonomy degrades with task length, and non-convergent cycling is one of the characteristic failure shapes as agents exceed the horizon they can reliably sustain (Kinniment et al., 2023, "Evaluating Language-Model Agents on Realistic Autonomous Tasks").

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

📉

Convergence monitoring

Track per-cycle progress signals — new information obtained, state actually changed, error messages differing from last cycle — and flag N consecutive cycles without movement. Near-duplicate plans caught by embedding similarity expose cycle 14 as a restatement of cycle 9, whatever its "new approach" label claims.

⚖️

LLM-as-judge evaluation

Have the judge walk the trajectory cycle by cycle and answer one question per cycle — what did this attempt learn that the previous one didn't? A run of empty answers is the loop, and the warn-level case where the model notices it's circling and keeps going anyway deserves its own flag.

🧪

Golden-set evals

Build tasks with a blocker the agent cannot resolve from inside — a CI network config it has no visibility into — where the correct behavior is stopping and handing back with findings. Score cycles burned before handback; the failing system spends its whole budget re-attempting.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

🛑

Call budget enforcement

Cap autonomous cycles in the scaffold, and on a detected stall — N cycles with unchanged errors or near-duplicate plans — interrupt and force a handback with findings rather than another attempt. Termination is the judgment the model is least able to supply, so it has to live outside the model.

🗃️

Explicit state tracking

Maintain a pinned per-cycle ledger — approach tried, result, what was learned — and require each new plan to say what it changes from the last entry. Cycle 14's "new approach" survives because the equivalence with cycle 9 is buried in transcript prose; a compact ledger makes the repetition legible, including to the model.

📝

Instruction constraints

State the ok example's rule outright — two consecutive cycles with no new information is an impasse, and the required response is stopping with findings and a concrete question, not a fresh attempt. Training rewards continued effort over surrender, so handing back must be framed as the success condition, not the failure.