Memory Omission

Fails to store, retrieve, or apply information that should have persisted across turns, sessions, tasks, or agent steps. Unlike Context Rot, the missing information belongs in memory rather than only in active context. Unlike Memory Conflict, the problem is the absence or non-use of a memory, not the mishandling of memories that compete with one another.

Examples

FAIL

Stored preference never surfaces

The dietary restriction is in the memory store, but retrieval misses it and the response violates it outright.

User: Plan the menu for Saturday's dinner party.
[… memory store contains "user is vegetarian"; retrieval returns nothing …]
Assistant: For the main course, a herb-roasted chicken is a crowd-pleaser that's hard to get wrong.

WARN

Re-asking what it should know

No harm done, but the user is asked to repeat information they explicitly told the system to remember in an earlier session.

User: Plan the menu for Saturday's dinner party.
[… memory store contains "user is vegetarian"; retrieval returns nothing …]
Assistant: Happy to. Any dietary restrictions I should plan around?

PASS

Memory retrieved and applied

The stored preference comes back and shapes the answer without the user restating it.

User: Plan the menu for Saturday's dinner party.
[… memory retrieval returns "user is vegetarian" …]
Assistant: Keeping it vegetarian as usual: a mushroom wellington centerpiece, with a burrata and roasted-pepper starter.

Why it happens

01

Persistence requires a write-time decision about what will matter later. Salience heuristics and "save this?" classifiers judge importance before the future task is known, so details that later prove critical are never stored.

02

Retrieval is typically embedding similarity against the current query. A memory phrased differently from how the need arises scores low and never enters context, even though it is exactly what the task requires.

03

Token budgets force top-k cutoffs and aggressive filtering at read time. A relevant memory that ranks just below the cutoff is silently absent, and the model has no signal that anything was withheld.

04

Even a retrieved memory can go unused. Information injected among other context tends to be underweighted when it lands in the middle of a long prompt rather than near the beginning or end (Liu et al., 2023, "Lost in the Middle: How Language Models Use Long Contexts").

05

Nothing in the training objective rewards consulting memory before answering. The model can produce a fluent response from parametric knowledge alone, so skipping the lookup is the path of least resistance.

06

Long-term conversational benchmarks show that both models and memory pipelines consistently degrade when recalling information across sessions, indicating that the failure is systemic rather than incidental (Maharana et al., 2024, "Evaluating Very Long-Term Conversational Memory of LLM Agents").
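
The read-time cutoff described in item 03 can be illustrated with a minimal sketch. The `ScoredMemory` type, the `top_k` helper, and the similarity scores are all made up for illustration; they are not taken from any particular memory system.

```python
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    text: str
    score: float  # hypothetical similarity score in [0, 1]


def top_k(memories: list[ScoredMemory], k: int) -> tuple[list[ScoredMemory], list[ScoredMemory]]:
    """Return (kept, dropped) after ranking by score, highest first."""
    ranked = sorted(memories, key=lambda m: m.score, reverse=True)
    return ranked[:k], ranked[k:]


# Invented scores for the query "Plan the menu for Saturday's dinner party."
candidates = [
    ScoredMemory("user hosted a dinner party in March", 0.81),
    ScoredMemory("user prefers Saturday events", 0.74),
    ScoredMemory("user is vegetarian", 0.52),
]

kept, dropped = top_k(candidates, k=2)
# Only `kept` reaches the prompt; nothing tells the model that `dropped` exists.
```

With these invented scores, the one memory the task depends on is the one that falls below the cutoff, and the prompt carries no trace that it was ever a candidate.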

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🔍

Memory pipeline tracing

Instrument each stage separately: was the fact written, did retrieval return it, did it survive the top-k cutoff into the prompt, and did the response use it? A fact can be lost at any of these four stages, and only stage-level logs reveal which one dropped it.
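
One way to picture stage-level tracing is a per-fact record with one flag per stage. The sketch below is an assumption about how such a record might look; the `MemoryTrace` schema and the stage names are hypothetical, and how each flag gets populated depends on the pipeline being instrumented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryTrace:
    """Stage-level record for one fact on one request (hypothetical schema)."""
    written: bool    # the fact exists in the store
    retrieved: bool  # retrieval returned it as a candidate
    in_prompt: bool  # it survived the top-k cutoff into the prompt
    used: bool       # the response reflects it


def first_failed_stage(trace: MemoryTrace) -> Optional[str]:
    """Name the earliest stage that dropped the fact, or None if it was applied."""
    stages = [
        ("write", trace.written),
        ("retrieve", trace.retrieved),
        ("cutoff", trace.in_prompt),
        ("use", trace.used),
    ]
    for name, ok in stages:
        if not ok:
            return name
    return None


# The FAIL example above: the fact is stored, but retrieval returns nothing.
trace = MemoryTrace(written=True, retrieved=False, in_prompt=False, used=False)
failed_at = first_failed_stage(trace)
```

Aggregating `failed_at` across many requests would show whether omissions cluster at write time, retrieval, the cutoff, or use, which in turn suggests which mitigation to try first.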

⚖️

LLM-as-judge evaluation

Run a judge with the store contents in view and flag responses that violate a stored fact or re-ask for information the user already told the system to remember. The re-asking case is the early warning that surfaces before an outright violation does.

🧪

Golden-set evals

Store known facts in one session and probe for them in later ones, phrasing the probes differently from how the memory was saved — paraphrase mismatch is where similarity-based retrieval fails first. Score whether the fact shapes the answer without the user restating it.
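
A golden-set harness along these lines might look like the sketch below. Everything here is illustrative: `GoldenCase`, `run_golden_set`, and the keyword-based `violates` check are hypothetical, and the keyword check is a crude stand-in for a real judge. The `answer` callable represents the full system under test, including its memory pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    stored_fact: str                 # what was saved in an earlier session
    probe: str                       # later request, phrased differently from the fact
    violates: Callable[[str], bool]  # hypothetical check: does the reply ignore the fact?


def run_golden_set(cases: list[GoldenCase], answer: Callable[[str], str]) -> float:
    """Return the fraction of probes whose reply respects the stored fact."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if not case.violates(answer(case.probe)))
    return passed / len(cases)


MEAT_WORDS = ("chicken", "beef", "pork", "fish")

cases = [
    GoldenCase(
        stored_fact="user is vegetarian",
        probe="Plan the menu for Saturday's dinner party.",
        violates=lambda reply: any(word in reply.lower() for word in MEAT_WORDS),
    )
]


def forgetful_assistant(probe: str) -> str:
    # Stub standing in for a system whose retrieval missed the stored fact.
    return "For the main course, a herb-roasted chicken is a crowd-pleaser."


score = run_golden_set(cases, forgetful_assistant)
```

Note that the probe never mentions diet, which is the point: the case only passes if the stored fact surfaces on its own. A fuller check would also flag replies that re-ask for the restriction.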

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

📌

Context pinning

Promote standing facts the user explicitly asked to be remembered — dietary restrictions, core preferences, hard constraints — into a small always-injected profile block instead of leaving them to per-query retrieval. A fact that must never be missed shouldn't depend on a similarity score clearing a top-k cutoff.
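
As a rough sketch of pinning, the prompt builder below always injects a profile block and treats retrieved memories as best-effort. The function name `build_prompt`, the section labels, and the prompt layout are assumptions made for illustration.

```python
def build_prompt(pinned_facts: list[str], retrieved: list[str], user_message: str) -> str:
    """Assemble a prompt in which pinned facts are always present.

    Pinned facts bypass retrieval entirely; retrieved memories are best-effort.
    """
    sections = []
    if pinned_facts:
        profile = "\n".join(f"- {fact}" for fact in pinned_facts)
        sections.append("Standing user profile (always applies):\n" + profile)
    if retrieved:
        recalled = "\n".join(f"- {memory}" for memory in retrieved)
        sections.append("Possibly relevant memories:\n" + recalled)
    sections.append("User: " + user_message)
    return "\n\n".join(sections)


prompt = build_prompt(
    pinned_facts=["user is vegetarian"],
    retrieved=[],  # retrieval came back empty, as in the FAIL example
    user_message="Plan the menu for Saturday's dinner party.",
)
```

Even with an empty retrieval result, the dietary restriction still reaches the model. The profile block should stay small, since every pinned fact spends tokens on every request.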

🧩

Retrieval tuning

Combine embedding search with lexical matching and query expansion so a memory phrased differently from the current need still surfaces — "user is vegetarian" has to come back for "plan the menu." Loosen top-k cutoffs for memory lookups specifically, since a silently dropped entry costs more here than an extra irrelevant one.
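
The combination of lexical matching and query expansion can be sketched as follows. This is a toy under stated assumptions: the `EXPANSIONS` table is hand-written, the embedding scorer is passed in as a callable rather than tied to any real library, and the equal weighting is arbitrary.

```python
import re
from typing import Callable

# Hypothetical expansion table; a real system might generate related terms with a model.
EXPANSIONS = {
    "menu": ["food", "diet", "dietary", "vegetarian", "vegan", "allergy"],
}


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens of the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(query: str) -> set[str]:
    """Add related terms for each query token that has an entry in EXPANSIONS."""
    expanded = tokens(query)
    for token in list(expanded):
        expanded.update(EXPANSIONS.get(token, []))
    return expanded


def hybrid_score(
    query: str,
    memory: str,
    embedding_score: Callable[[str, str], float],
    lexical_weight: float = 0.5,
) -> float:
    """Blend an embedding score with lexical overlap against the expanded query."""
    memory_tokens = tokens(memory)
    overlap = len(expand(query) & memory_tokens) / max(len(memory_tokens), 1)
    return (1 - lexical_weight) * embedding_score(query, memory) + lexical_weight * overlap


def no_signal(query: str, memory: str) -> float:
    # Worst case for illustration: the embedding model sees no similarity at all.
    return 0.0


diet_score = hybrid_score("Plan the menu", "user is vegetarian", no_signal)
jazz_score = hybrid_score("Plan the menu", "user likes jazz", no_signal)
```

Even when the embedding score contributes nothing, the expanded query shares the term "vegetarian" with the stored memory, so the dietary fact outranks the unrelated one.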

Self-check pass

Before finalizing, have the model run an explicit memory consultation step — query the store for preferences and constraints bearing on the task, and check the draft against what comes back. Nothing in the objective rewards the lookup, so the pipeline has to make it a required step rather than an option the model can skip.