Name your failure

Describe what the model or agent did, then learn more about the phenomenon and how to detect and mitigate it in production.

101 failure modes
Fabrication

Citation Hallucination

Invents or fabricates a source artifact such as a citation, URL, paper, author listing, or bibliography entry and presents it as real.

Fabrication

Unknown-Answer Fabrication

Gives a confident answer when the system lacks the evidence or access needed to know it, or has not resolved the uncertainty that remains.

Fabrication

Entity Hallucination

Introduces a named person, organization, product, place, dataset, model, or other entity that is not supported by the available evidence.

Fabrication

Quote Hallucination

Presents fabricated, paraphrased, or materially altered wording as an exact quote from a person, document, source, tool result, or prior conversation.
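
One way to catch this in evaluation is to check that every quoted span appears verbatim in the source. The sketch below assumes quotes are delimited by double quotation marks and that the source is available as a plain string; `normalize` and `unsupported_quotes` are hypothetical helper names, not part of any library.

```python
import re

def normalize(text: str) -> str:
    """Unify curly quotes and collapse whitespace so only wording differences remain."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()

def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return quoted spans in `answer` that do not occur verbatim in `source`."""
    norm_source = normalize(source)
    quotes = re.findall(r'"([^"]+)"', normalize(answer))
    return [q for q in quotes if q not in norm_source]

source = "The rollout was paused after two regions reported elevated error rates."
answer = 'The report says the rollout "was paused after two regions" and calls it "a total failure".'
print(unsupported_quotes(answer, source))  # ['a total failure']
```

An exact-match check like this flags paraphrases presented as quotes, but it cannot tell whether a verbatim quote is attributed to the right speaker.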

Fabrication

Code/API Hallucination

Invents or misstates code interfaces, libraries, methods, parameters, endpoint behavior, configuration keys, or platform capabilities.

Fabrication

Numerical Hallucination

Produces a number, metric, count, date, measurement, or quantitative claim that is not grounded in the input, sources, or a valid computation.

Fabrication

Authority Hallucination

Falsely strengthens a claim by attributing it to an expert, institution, official source, benchmark, policy, or consensus that does not actually support it.

Fabrication

Specificity Hallucination

Adds precise-looking details, qualifiers, names, settings, mechanisms, or examples that were not established by the input or evidence.

Faithfulness

Source Misrepresentation

Misstates, exaggerates, reverses, or selectively distorts what a cited, retrieved, uploaded, or tool-returned source actually says.

Faithfulness

Summarization Distortion

Compresses source material in a way that changes its meaning, emphasis, causal structure, uncertainty, or implications.

Faithfulness

Self-Contradiction

Makes mutually inconsistent claims within the same response or across closely related turns without resolving the conflict.

Faithfulness

Extrinsic Hallucination

Adds information that the provided source material neither supports nor contradicts, while making the answer appear source-grounded.

Faithfulness

Context-Conflicting Hallucination

States a claim that contradicts information available to the model: the user's explicit input or supplied data, or facts elsewhere in the active context such as prior turns, retrieved text, summaries, or tool outputs.

Faithfulness

Citation Span Mismatch

Attaches a citation to a claim, sentence, or paragraph that the referenced passage does not fully support.

Freshness

Outdated Source Reliance

Bases an answer on sources that are too old for the user's freshness requirement or for the domain's rate of change.

Freshness

Temporal Hallucination

Presents outdated or temporally wrong information as current, including incorrect present-day facts, timelines, sequence, recency, release status, or the current state of a system, organization, or event.

Freshness

Version Hallucination

Confuses, invents, or misapplies product, model, package, API, policy, dataset, or document versions.

Freshness

Date/Deadline Confusion

Misreads or mixes up dates, deadlines, time zones, relative dates, durations, recency windows, or scheduling boundaries in a task.
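
A common source of this failure is comparing timestamps that carry no time zone. The following is a minimal sketch, assuming a hypothetical deadline expressed in a fixed UTC-8 offset; `is_before_deadline` is an illustrative name, and a real system would likely use named zones with daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical deadline: 23:59 on 2024-03-01 in a fixed UTC-8 zone.
UTC_MINUS_8 = timezone(timedelta(hours=-8))
deadline = datetime(2024, 3, 1, 23, 59, tzinfo=UTC_MINUS_8)

def is_before_deadline(submitted_at: datetime, deadline: datetime) -> bool:
    """Compare two instants, refusing naive datetimes whose zone is unknown."""
    if submitted_at.tzinfo is None or deadline.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before comparing")
    return submitted_at <= deadline

# 06:30 UTC on March 2 is 22:30 on March 1 in UTC-8, so it is still on time.
on_time = datetime(2024, 3, 2, 6, 30, tzinfo=timezone.utc)
# 08:00 UTC on March 2 is 00:00 on March 2 in UTC-8, so it is late.
late = datetime(2024, 3, 2, 8, 0, tzinfo=timezone.utc)
print(is_before_deadline(on_time, deadline))  # True
print(is_before_deadline(late, deadline))     # False
```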

Retrieval

Retrieval Miss

Fails to retrieve relevant material that exists in the available corpus and should have been used.

Retrieval

Retrieval Distractor

Retrieves or elevates irrelevant, superficially similar, or misleading evidence that pulls the answer away from the user's actual need.

Retrieval

Partial Retrieval

Retrieves some relevant evidence but misses other required pieces, leading to incomplete or under-grounded answers.

Retrieval

Chunk Boundary Failure

Misses, fragments, or misinterprets evidence because relevant information was split across retrieval chunks or separated from needed context.
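
One common mitigation is to let adjacent chunks overlap, so that a fact and its qualifier are less likely to land in separate chunks. The sketch below is an illustration under simple assumptions (whitespace tokenization, fixed window size); `chunk_with_overlap` is a hypothetical helper, not a library function.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split `words` into windows of `size` that share `overlap` words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

words = "Refunds are accepted within 30 days unless the item was customized".split()
for chunk in chunk_with_overlap(words, size=6, overlap=2):
    print(" ".join(chunk))
```

With these settings the second chunk keeps "30 days" together with the "unless" clause that qualifies it. Window size and overlap are tuning choices that depend on the corpus.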

Retrieval

Query Rewrite Failure

Reformulates a user's search, retrieval, or tool query in a way that drops intent, adds false constraints, or searches the wrong concept.

Retrieval

Conflicting Source Failure

Fails to detect, compare, qualify, or reconcile retrieved sources that disagree with one another.

Retrieval

Metadata Filter Failure

Applies tags, permissions, tenancy, recency, jurisdiction, document type, or other metadata filters incorrectly, excluding needed records or including forbidden or irrelevant ones.
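
A tenancy filter can be written so that it fails closed: records with missing metadata are excluded, and a query without a tenant is rejected instead of running unfiltered. This is a minimal sketch; the `Record` type and `visible_to` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Record:
    record_id: str
    tenant: Optional[str]  # None models a record whose tenant metadata is missing
    text: str

def visible_to(records: list[Record], tenant: str) -> list[Record]:
    """Return only records explicitly tagged with `tenant`."""
    if not tenant:
        raise ValueError("tenant is required; refusing to run an unfiltered query")
    # A record with tenant=None never equals a non-empty tenant, so it is excluded.
    return [r for r in records if r.tenant == tenant]

records = [
    Record("r1", "acme", "Acme pricing sheet"),
    Record("r2", "globex", "Globex contract"),
    Record("r3", None, "Untagged upload"),
]
print([r.record_id for r in visible_to(records, "acme")])  # ['r1']
```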

Retrieval

Index Drift

Lets the retrieval index diverge from the source corpus, permissions, metadata, embeddings, or current document state.
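
Drift in document content can be detected by comparing a hash of each source document with the hash recorded at indexing time. The sketch below assumes the index stores one content hash per document ID; `content_hash` and `find_drift` are hypothetical helpers, and it does not cover permission or embedding drift.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_drift(source_docs: dict[str, str], indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Classify document IDs as missing from the index, orphaned in it, or stale."""
    source_ids, indexed_ids = set(source_docs), set(indexed_hashes)
    return {
        "missing": sorted(source_ids - indexed_ids),
        "orphaned": sorted(indexed_ids - source_ids),
        "stale": sorted(
            i for i in source_ids & indexed_ids
            if content_hash(source_docs[i]) != indexed_hashes[i]
        ),
    }

source_docs = {"a": "v2 of doc a", "b": "doc b", "c": "doc c"}
indexed_hashes = {
    "a": content_hash("v1 of doc a"),  # indexed before the document changed
    "b": content_hash("doc b"),
    "d": content_hash("deleted doc"),  # no longer present in the source corpus
}
print(find_drift(source_docs, indexed_hashes))
# {'missing': ['c'], 'orphaned': ['d'], 'stale': ['a']}
```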

Retrieval

RAG Poisoning

Uses retrieved content that is malicious, deceptive, corrupted, or intentionally crafted to manipulate the answer.

Context

Midsequence Neglect/Lost in the Middle

Overlooks or underuses information located in the middle of a long prompt, document set, or conversation context.

Context

Context Rot

Loses reliable use of earlier context as a long interaction progresses: facts, plans, constraints, state, or instructions lose force or are misremembered even though they remain nominally available.

Context

Context Dilution

Lets excess surrounding material weaken the influence of the most relevant context, causing important signals to be underweighted.

Context

Recency Bias

Overweights newer context while underweighting earlier information that remains valid and important.

Context

Summarization Loss

Drops important facts, constraints, uncertainty, or nuance when compressing earlier context into a summary.

Context

State Inconsistency

Tracks different parts of the active task state inconsistently, causing the response to use mutually incompatible assumptions about progress, variables, files, decisions, or environment.

Memory

Memory Omission

Fails to store, retrieve, or apply information that should have persisted across turns, sessions, tasks, or agent steps.

Memory

Memory Staleness

Uses remembered information that was once valid but has been superseded by newer state, preferences, facts, or instructions.

Memory

Memory Hallucination

Treats an unstored, unstated, or imagined detail as if it were a real memory.

Memory

Memory Contamination

Applies irrelevant, incorrect, or cross-task information from prior interactions as if it belonged to the current task.

Memory

Memory Overreach

Applies a valid memory beyond the user, task, project, role, time, or domain scope where it should influence behavior.

Memory

Memory Conflict

Mishandles competing memories: fails to notice that stored memories, preferences, prior decisions, or persisted state disagree, or resolves the conflict with the wrong precedence, freshness, authority, or specificity rule.

Memory

Memory Scope Leakage

Carries memory across users, tenants, sessions, roles, projects, or tasks that should remain isolated.
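
Isolation is easier to enforce when the scope is part of the storage key, so that a lookup cannot succeed without naming the scope it belongs to. The sketch below assumes a scope of (tenant, user); `ScopedMemory` is a hypothetical class for illustration, and real scopes may also include session, role, or project.

```python
class ScopedMemory:
    """In-memory store whose entries are only reachable through their own scope."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], dict[str, str]] = {}

    def remember(self, tenant: str, user: str, key: str, value: str) -> None:
        self._store.setdefault((tenant, user), {})[key] = value

    def recall(self, tenant: str, user: str, key: str):
        # Lookups never fall back to another scope; a miss returns None.
        return self._store.get((tenant, user), {}).get(key)

memory = ScopedMemory()
memory.remember("acme", "alice", "preferred_region", "eu-west")
print(memory.recall("acme", "alice", "preferred_region"))   # eu-west
print(memory.recall("acme", "bob", "preferred_region"))     # None
print(memory.recall("globex", "alice", "preferred_region")) # None
```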

Control

Instruction Noncompliance

Fails to follow an explicit, applicable instruction from the governing prompt, user request, or task procedure.

Control

Constraint Violation

Breaks a stated limit, requirement, policy, boundary, allowed action set, or output constraint that should govern the task, including dropping a constraint partway through multi-step reasoning or execution.

Control

Format Failure

Produces an answer in the wrong shape, organization, medium, style, or presentation format for the requested output.

Control

JSON/Schema Failure

Emits invalid JSON, malformed structured data, or output that does not satisfy the required schema.
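
A simple guard is to parse and validate model output before anything downstream consumes it. This is a minimal sketch using only the standard library; the schema in `REQUIRED_FIELDS` and the function name `parse_structured_output` are assumptions for illustration, and a production system might use a full schema validator instead.

```python
import json

# Hypothetical schema: field name -> required Python type after parsing.
REQUIRED_FIELDS = {"label": str, "confidence": float}

def parse_structured_output(raw: str) -> dict:
    """Parse model output and reject it unless it matches the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        # Strict type check: an integer such as 1 is rejected for a float field.
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name!r} should be {expected.__name__}")
    return data

print(parse_structured_output('{"label": "spam", "confidence": 0.93}'))
# {'label': 'spam', 'confidence': 0.93}
```

Raising on failure gives the caller a clear point at which to retry, repair, or fall back.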

Control

Refusal Overreach

Refuses, blocks, or safety-wraps a request more broadly than policy, risk, or context requires.

Control

Refusal Underreach

Fails to refuse, limit, redirect, or safety-constrain a request that requires stronger boundaries.

Control

Role Confusion

Misunderstands or drifts from its assigned role, persona, authority boundary, operating mode, or relationship to the user and other agents.

Control

Priority Confusion

Applies the wrong hierarchy among system, developer, user, tool, policy, memory, or task-level instructions.

Control

Clarification Underuse

Proceeds without asking when missing or ambiguous information materially affects correctness, safety, or user intent, committing to an interpretation that should have been confirmed first.

Control

Clarification Overuse

Asks the user for clarification when the task is already sufficiently specified, stalling on details the system could reasonably infer or safely proceed without.

Reasoning

Reasoning Error

Draws the wrong conclusion through invalid inference, faulty assumptions, mistaken causal reasoning, unsupported logical steps, or framing the problem with the wrong representation or abstraction.

Reasoning

Arithmetic Error

Computes or transforms numeric inputs incorrectly, including arithmetic, aggregation, unit conversion, comparison, or formula application.

Reasoning

Goal Misinterpretation

Solves the wrong problem because it misunderstood the user's objective, success condition, scope, or intended outcome.

Reasoning

Planning Failure

Builds an ineffective, unsafe, incomplete, or poorly ordered plan for achieving the user's goal.

Reasoning

Step Omission

Leaves out a necessary reasoning, verification, retrieval, tool, communication, or execution step needed for the task to succeed.

Reasoning

Compositional Failure

Fails to combine multiple facts, constraints, operations, sources, or subproblem results into a coherent answer.

Reasoning

Error Accumulation

Allows small mistakes, approximations, stale assumptions, or unverified intermediate results to compound across a multi-step task until the final output fails.

Reasoning

Verification Failure

Does not adequately check whether intermediate steps, tool results, cited evidence, assumptions, or the final answer are correct before relying on them.

Tools

Wrong Tool Selection

Chooses a tool that is inappropriate for the user's goal, data type, risk level, environment, or required operation.

Tools

Tool Argument Error

Calls a tool with arguments that are malformed, incomplete, unauthorized, stale, poorly scoped, or semantically wrong for the intended operation.

Tools

Missing Tool Invocation

Fails to call an available tool when tool use is necessary for correctness, freshness, computation, retrieval, verification, or task completion.

Tools

Tool Result Misread

Misinterprets, ignores, overgeneralizes, or incorrectly transforms the result returned by a tool.

Tools

Tool Loop

Repeats tool calls unnecessarily or redundantly without gaining new information, changing strategy, or progressing toward completion.
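
A loop of identical calls can be detected by counting each (tool, arguments) pair. The sketch below assumes tool arguments are JSON-serializable; `ToolLoopGuard` and its `max_repeats` threshold are hypothetical, and the guard does not catch loops whose arguments vary slightly on each call.

```python
import json
from collections import Counter

class ToolLoopGuard:
    """Counts identical tool calls and signals when one repeats too often."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self._seen: Counter = Counter()

    def allow(self, tool: str, args: dict) -> bool:
        """Return True if the call may proceed, False once it exceeds max_repeats."""
        # sort_keys makes the key independent of argument order.
        key = (tool, json.dumps(args, sort_keys=True))
        self._seen[key] += 1
        return self._seen[key] <= self.max_repeats

guard = ToolLoopGuard(max_repeats=2)
for _ in range(3):
    print(guard.allow("search", {"query": "refund policy"}))
# True, True, False
```

When `allow` returns False, the agent can change strategy or hand control back instead of issuing the same call again.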

Tools

Tool Recovery Failure

Responds poorly to a tool error, timeout, empty result, permission denial, rate limit, or unexpected output.
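
For transient errors such as timeouts or rate limits, one reasonable recovery is a bounded retry with exponential backoff. This is a sketch under stated assumptions: `TransientToolError` and `call_with_retry` are hypothetical names, and the `sleep` parameter is injectable only so the example runs without waiting.

```python
import time

class TransientToolError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, rate limits)."""

def call_with_retry(tool, *, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call `tool()` up to `attempts` times, doubling the delay after each transient failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return tool()
        except TransientToolError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)

# Usage: a tool that times out twice, then succeeds. Delays are recorded, not slept.
state = {"calls": 0}
def flaky_tool():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientToolError("timeout")
    return "ok"

delays = []
print(call_with_retry(flaky_tool, sleep=delays.append), delays)  # ok [0.5, 1.0]
```

Non-transient failures such as permission denials are deliberately not retried here; they propagate immediately so the agent can report or escalate them.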

Tools

Unsafe Tool Call

Invokes a tool in a way that creates avoidable security, privacy, financial, operational, data-integrity, or user-consent risk.

Tools

Idempotency Failure

Repeats, retries, or replays a side-effecting tool action without deduplication or idempotency safeguards, causing duplicate or inconsistent effects.
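
A standard safeguard is an idempotency key: the caller attaches a unique key to each logical action, and the tool returns the stored result when it sees the same key again. The sketch below uses a hypothetical in-memory `PaymentTool`; a real tool would persist keys durably.

```python
class PaymentTool:
    """Toy side-effecting tool that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges: list[int] = []  # stands in for the real side effect

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Replay of an earlier request: return the original receipt, charge nothing.
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        receipt = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._results[idempotency_key] = receipt
        return receipt

tool = PaymentTool()
first = tool.charge("order-42", 1999)
retry = tool.charge("order-42", 1999)  # e.g. the agent retries after a timeout
print(first == retry, len(tool.charges))  # True 1
```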

Tools

Tool Context Overload

Feeds the model so much tool output, intermediate state, logs, or scratch data that it loses track of the user's goal or relevant evidence.

Agency

Excessive Agency

Takes initiative, actions, decisions, or irreversible steps beyond what the task, permissions, risk, or user intent warrants.

Agency

Insufficient Agency

Fails to take obvious, low-risk next steps that are required or strongly implied by the task.

Agency

Premature Termination

Stops, summarizes, or hands back control before the user's task is actually complete, whether by simply halting early or by mistakenly treating unfinished work as done.

Agency

Runaway Agent Loop

Continues acting autonomously in repeated cycles without converging, reassessing, or handing control back when progress stalls.

Agency

Objective Gaming

Optimizes a proxy metric, literal instruction, benchmark target, or local reward while undermining the user's real objective.

Agency

Escalation Failure

Does not escalate, pause, ask for approval, or route to a human or higher-authority actor when risk, uncertainty, policy, permissions, or irreversible impact require it, including skipping a review or approval checkpoint that should gate the action.

Agency

Workflow Misalignment

Uses an execution pattern, cadence, handoff style, approval flow, or collaboration process that conflicts with the user's expected workflow or the task's operational structure.

Agency

Multi-Agent Coordination Failure

Multiple agents, roles, tools, or handoff stages duplicate work, conflict, drop context, misassign ownership, or fail to coordinate toward a shared goal.

Security

Prompt Injection

Lets untrusted input override, weaken, or redirect the system's intended instructions, policies, tool-use rules, or data boundaries.

Security

Jailbreak

Is manipulated into bypassing safety, policy, or behavioral controls that should remain enforced.

Security

Indirect Prompt Injection

Lets retrieved, browsed, uploaded, tool-supplied, or otherwise external content carry malicious instructions into the model's context.

Security

System Prompt Leakage

Reveals hidden system, developer, policy, tool, chain-of-thought, or other protected prompt content that should not be exposed.

Security

Sensitive Information Disclosure

Exposes secrets, credentials, personal data, confidential business information, private user content, or other protected information.

Security

Data Exfiltration

Enables unauthorized extraction, transfer, or reconstruction of protected data from tools, files, memory, retrieval systems, databases, or context.

Security

Insecure Output Handling

Produces output that is unsafe for downstream rendering, execution, storage, parsing, logging, or human trust without sanitization or validation.
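
For the rendering case, the usual mitigation is to escape model output before placing it in HTML. A minimal sketch using the standard library's `html.escape`; `render_answer` is a hypothetical function name, and other sinks (SQL, shell, logs) need their own context-specific handling.

```python
import html

def render_answer(model_output: str) -> str:
    """Wrap model output in a paragraph, escaping HTML-significant characters."""
    return f"<p>{html.escape(model_output)}</p>"

print(render_answer('<script>alert("x")</script>'))
# <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```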

Security

Unbounded Consumption

Consumes or triggers excessive tokens, compute, time, bandwidth, money, API quota, storage, or external resources without adequate limits or stopping conditions.
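
A hard budget that is checked before each expensive step gives the system a stopping condition. The sketch below tracks tokens only; `TokenBudget` and `BudgetExceeded` are hypothetical names, and the same pattern could be applied to time, money, or API calls.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a step would push usage past the configured limit."""

class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def spend(self, tokens: int) -> None:
        """Record usage, refusing any step that would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.used} + {tokens} exceeds limit {self.limit}")
        self.used += tokens

budget = TokenBudget(limit=1000)
budget.spend(600)
try:
    budget.spend(500)  # would bring the total to 1100
except BudgetExceeded as exc:
    print("stopped:", exc)  # stopped: 600 + 500 exceeds limit 1000
print(budget.used)  # 600
```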

Security

Supply Chain Vulnerability

Introduces or recommends risk through compromised, malicious, abandoned, typosquatted, untrusted, or poorly pinned dependencies, tools, plugins, models, datasets, or upstream content.

Alignment

Sycophancy

Abandons or reverses a well-supported answer when the user expresses disagreement, doubt, or pressure, conceding to keep the user comfortable rather than holding the correct position.

Alignment

Social Sycophancy

Mirrors, flatters, validates, or preserves the user's social self-image in a way that distorts judgment or answer quality.

Alignment

Belief Conformity

Adjusts factual claims, uncertainty, or interpretation to match the user's stated beliefs instead of the evidence.

Alignment

Preference Pandering

Optimizes for what the user appears to want, like, or prefer over what is accurate, useful, ethical, or safe.

Alignment

Unsafe Reassurance

Reassures the user despite meaningful uncertainty, danger, insufficient evidence, or a need for stronger caution.

Alignment

Bias/Stereotyping

Produces unfair, stereotyped, essentializing, or unsupported assumptions about people or groups based on protected or socially salient attributes.

Alignment

Manipulative Behavior

Uses coercive, deceptive, emotionally exploitative, or overly persuasive tactics to steer the user's choices or beliefs.

Alignment

Dependency Encouragement

Encourages unnecessary reliance on the model, discourages independent judgment, or positions the system as a substitute for appropriate human expertise, agency, or support.

Response Integrity

Verbosity Failure

Provides more detail, repetition, caveats, background, or explanation than the task, user, medium, or decision requires.

Response Integrity

Incompleteness

Leaves out information, constraints, caveats, steps, options, or outputs needed to satisfy the user's task.

Response Integrity

Irrelevance

Includes content that does not materially help answer the user's question, solve the task, or support the needed decision.

Response Integrity

Genericism

Gives vague, boilerplate, or template-like guidance that is too nonspecific or abstract for the user to act on, instead of concrete help grounded in their task.

Response Integrity

Audience Mismatch

Uses terminology, assumptions, depth, examples, tone, or framing that does not fit the intended reader's expertise, role, goals, or context.

Response Integrity

Concision Failure

Compresses the answer so aggressively that necessary context, reasoning, caveats, instructions, or operational detail is lost.

Response Integrity

Poor Structure

Organizes information in a way that makes the answer hard to scan, compare, execute, or verify.

Response Integrity

Calibration Failure

Misstates confidence, uncertainty, evidence strength, risk, tradeoffs, or likelihood in the final answer.

Response Integrity

Localization Failure

Ignores or misapplies locale-specific language, spelling, units, currencies, laws, formats, idioms, accessibility expectations, or cultural conventions.