Designing Agent Memory That Forgets: Time-Decay Scoring and Memory Consolidation for LLM Agents
The "remember everything" trap
When teams build their first LLM agent with persistent memory, the default instinct is to store every interaction, every retrieved fact, every intermediate reasoning step. Storage is cheap. More context means better answers. Right?
Wrong. Within a few hundred sessions, your agent's memory store becomes a liability. A user corrected their shipping address three times - which version is current? A project's requirements evolved over six meetings - the early versions are now actively misleading. A debugging session from two months ago references a bug that has long been fixed. When the retrieval layer pulls these stale memories into the context window, the agent treats them as just as valid as fresh information. The result is context pollution: the model's attention is diluted across outdated, contradictory, and irrelevant memories, and answer quality degrades.
Human memory solves this problem through forgetting. Not random forgetting - structured, purposeful forgetting where less-accessed and less-important memories gradually fade while frequently reinforced and emotionally significant memories strengthen. We can build the same mechanics into agent memory systems.
Exponential decay: the mathematical foundation
The most widely used model for memory decay is the exponential decay function, inspired by Ebbinghaus's forgetting curve from cognitive psychology. For a memory created at time t₀ with initial importance score S₀, the effective score at time t is:
S(t) = S₀ × e^(−λ × (t − t₀))
Here, λ (lambda) is the decay rate constant. A higher λ means faster forgetting. The elegance of exponential decay is that it is memoryless - the rate of decay at any point depends only on the current score, not on the history of how the score got there. This makes it computationally simple: you do not need to track access history to compute the current score, just the creation time, the initial score, and the decay rate.
In practice, you set λ based on the expected useful lifetime of different memory types. For conversational context ("the user prefers bullet-point summaries"), a half-life of 7-14 days is typical. For factual knowledge ("the API endpoint is /v2/users"), a half-life of 60-90 days works better. For identity-level information ("the user is the VP of Engineering at Acme Corp"), you may want a half-life of 6-12 months or no decay at all.
The half-life t½ relates to λ by: λ = ln(2) / t½. So a 14-day half-life gives λ ≈ 0.0495 per day. After 14 days, a memory's score is 50% of its original. After 28 days, 25%. After 42 days, 12.5%. The decay is aggressive enough to suppress stale information but gradual enough that recently relevant memories remain accessible.
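To make the arithmetic concrete, here is a minimal sketch of the decay computation, assuming scores are stored per memory and ages are measured in days (the function names are illustrative, not from any particular library):

```python
import math
from datetime import datetime, timedelta

def decay_rate(half_life_days: float) -> float:
    """Convert a half-life in days to the decay constant: lambda = ln(2) / t_half."""
    return math.log(2) / half_life_days

def decayed_score(initial_score: float, created_at: datetime,
                  half_life_days: float, now: datetime | None = None) -> float:
    """S(t) = S0 * exp(-lambda * (t - t0)), with time measured in days."""
    now = now or datetime.utcnow()
    age_days = (now - created_at).total_seconds() / 86400
    return initial_score * math.exp(-decay_rate(half_life_days) * age_days)

# A memory created 28 days ago with a 14-day half-life has decayed to ~25%.
created = datetime.utcnow() - timedelta(days=28)
print(round(decayed_score(1.0, created, half_life_days=14), 3))  # ~0.25
```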
Importance-weighted retention: not all memories are equal
Pure time decay treats all memories identically - a trivial "user said hello" message decays at the same rate as "user confirmed the production deployment window is Friday 2am UTC." This is clearly wrong. Important memories should decay more slowly.
The solution is to assign each memory an importance score at creation time and use it to modulate the decay rate. There are several practical approaches:
- LLM-assessed importance: When a memory is created, pass it through a lightweight LLM call (or a fine-tuned classifier) that rates its importance on a 0-1 scale. Memories scoring above 0.8 get a reduced decay rate (λ × 0.3). Memories below 0.3 get an accelerated decay rate (λ × 2.0). This is the most accurate approach but adds latency and cost to memory creation.
- Heuristic importance: Assign importance based on structural weights. Memories derived from explicit user instructions score higher than those from casual conversation. Memories containing named entities, dates, or numeric values score higher than generic statements. Memories that were referenced in subsequent interactions get a boost. This is cheaper than LLM-based scoring and works surprisingly well in practice.
- Access-reinforced importance: Borrow from the spacing effect in cognitive science. Every time a memory is retrieved and used in a response, reset its decay clock or boost its score. Memories that prove repeatedly useful resist decay naturally. Memories that are never accessed fade quickly. This creates an emergent system where the agent's working memory naturally reflects its most practically useful knowledge.
The most robust production systems combine all three: LLM-assessed base importance, heuristic modifiers, and access-based reinforcement. The combined score feeds into the decay function as a variable decay rate: S(t) = S₀ × e^(−λ(importance) × (t − t_last_access)), where t_last_access is the most recent time the memory was retrieved.
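A sketch of how these signals might combine, using the decay-rate multipliers from the list above; the thresholds, field names, and base half-life are illustrative assumptions rather than recommendations:

```python
import math
from dataclasses import dataclass, field
from datetime import datetime

BASE_LAMBDA = math.log(2) / 14  # assume a 14-day default half-life

@dataclass
class Memory:
    text: str
    initial_score: float = 1.0
    importance: float = 0.5  # LLM-assessed and/or heuristic, on a 0-1 scale
    last_accessed: datetime = field(default_factory=datetime.utcnow)

def effective_lambda(importance: float) -> float:
    """Important memories decay more slowly, trivial ones faster."""
    if importance >= 0.8:
        return BASE_LAMBDA * 0.3
    if importance <= 0.3:
        return BASE_LAMBDA * 2.0
    return BASE_LAMBDA

def decay_score(memory: Memory, now: datetime | None = None) -> float:
    """S(t) = S0 * exp(-lambda(importance) * (t - t_last_access))."""
    now = now or datetime.utcnow()
    age_days = (now - memory.last_accessed).total_seconds() / 86400
    return memory.initial_score * math.exp(
        -effective_lambda(memory.importance) * age_days)

def reinforce(memory: Memory) -> None:
    """Access-based reinforcement: retrieval resets the decay clock."""
    memory.last_accessed = datetime.utcnow()
```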
Memory consolidation: merging related memories into higher-fidelity records
Time decay handles the problem of stale memories. But there is a related problem it does not solve: memory fragmentation. Over many sessions, an agent accumulates dozens of small memories about the same topic. "User is building a React app." "User chose Next.js for the framework." "User is deploying to Vercel." "User switched from Pages Router to App Router." "User needs SSR for SEO." These are five separate memories that, taken together, describe a single coherent project context.
When retrieval pulls three of the five fragments, the agent has an incomplete picture. When it pulls all five, it wastes context window tokens on redundant information. The solution is memory consolidation - periodically merging related memories into fewer, richer records that capture the same information more efficiently.
Consolidation works in three steps:
- Cluster detection. Group memories by semantic similarity. Embed all active memories and run a clustering algorithm - HDBSCAN works well because it does not require a pre-specified number of clusters and handles noise (memories that do not belong to any cluster). Memories within a cluster are candidates for consolidation (a code sketch follows this list).
- Summary generation. For each cluster, pass the constituent memories to an LLM with instructions to produce a single consolidated memory that preserves all key facts, resolves any contradictions (preferring more recent information), and discards redundancy. The prompt should emphasize that the consolidated memory must be self-contained - it should make sense without reference to the original fragments.
- Replacement. Archive the original fragment memories (do not delete them - you may need them for audit or debugging) and insert the consolidated memory. The new memory inherits the highest importance score from its constituents and has its decay clock reset to the current time.
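A minimal sketch of the cluster-detection step, assuming memory embeddings have already been computed as a NumPy array and using the hdbscan package; min_cluster_size is an illustrative choice, not a recommendation:

```python
import numpy as np
import hdbscan  # pip install hdbscan

def consolidation_candidates(embeddings: np.ndarray,
                             min_cluster_size: int = 3) -> list[list[int]]:
    """Group memory embeddings into clusters; the label -1 marks noise,
    i.e. memories that belong to no cluster and are left untouched."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                                metric="euclidean")
    labels = clusterer.fit_predict(embeddings)
    clusters: dict[int, list[int]] = {}
    for idx, label in enumerate(labels):
        if label == -1:
            continue  # noise: not a consolidation candidate
        clusters.setdefault(label, []).append(idx)
    return list(clusters.values())  # each inner list feeds one summarization call
```

Each returned index list maps to one summarization call in step two; the replacement step then archives the originals before inserting the merged record.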
The Generative Agents paper from Stanford and Google demonstrated this reflection-and-consolidation pattern in a simulated environment, showing that agents with memory consolidation developed more coherent long-term behavior than agents with raw memory stores. The same principle applies to production LLM agents.
Implementing decay-aware retrieval
Having a decay score on each memory is useless if your retrieval layer ignores it. The retrieval query must combine semantic relevance (how well does this memory match the current query?) with temporal relevance (how fresh is this memory?).
The simplest approach is a weighted linear combination:
final_score = α × similarity_score + (1 − α) × decay_score
Where α controls the trade-off. A typical starting point is α = 0.7, weighting semantic relevance more heavily but allowing temporal relevance to break ties and suppress stale results. You should tune α based on your application: customer support agents need higher temporal weighting (recent tickets are more relevant), while knowledge management agents need lower temporal weighting (old policies may still be in effect).
A more sophisticated approach is to use the decay score as a pre-filter. Before running semantic search, exclude memories whose decay score has fallen below a minimum threshold (e.g., 0.05). This reduces the search space and ensures that deeply decayed memories never appear in results regardless of their semantic similarity. The pre-filter is especially important for agents with large memory stores - it keeps the ANN index lean and search latency low.
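A sketch of a decay-aware ranking pass that combines the pre-filter and the weighted blend, assuming similarity scores are already normalized to a 0-1 range; α = 0.7 and the 0.05 floor are just the starting points mentioned above:

```python
def rank_memories(candidates: list[tuple[str, float, float]],
                  alpha: float = 0.7, min_decay: float = 0.05) -> list[str]:
    """candidates: (memory_id, similarity_score, decay_score) tuples.
    Pre-filter deeply decayed memories, then blend relevance and freshness."""
    scored = []
    for memory_id, similarity, decay in candidates:
        if decay < min_decay:
            continue  # never surface deeply decayed memories
        final = alpha * similarity + (1 - alpha) * decay
        scored.append((final, memory_id))
    return [memory_id for _, memory_id in sorted(scored, reverse=True)]
```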
For agents that handle both time-sensitive conversations and stable knowledge retrieval, consider using separate memory pools with different decay configurations. Conversational memory decays aggressively (half-life of days). Factual knowledge decays slowly (half-life of months). Identity information may not decay at all. Each pool has its own retrieval parameters, and the agent's orchestration layer decides which pools to query based on the nature of the current request. Our post on context window assembly strategies covers how to efficiently pack memories from multiple sources into a single prompt.
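One way to express per-pool configuration; the pool names and fields below are illustrative assumptions, not a TypeGraph API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    half_life_days: float | None  # None means the pool does not decay
    alpha: float                  # relevance-vs-recency weighting for retrieval
    retrieval_floor: float        # decay-score pre-filter threshold

MEMORY_POOLS = {
    "conversational": PoolConfig(half_life_days=10, alpha=0.6, retrieval_floor=0.05),
    "factual":        PoolConfig(half_life_days=75, alpha=0.8, retrieval_floor=0.05),
    "identity":       PoolConfig(half_life_days=None, alpha=0.9, retrieval_floor=0.0),
}
```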
After implementing time-decay scoring with access-based reinforcement, our customer support agent stopped referencing resolved tickets in new conversations. The CSAT score for our AI-assisted responses went from 3.2 to 4.1 out of 5 - almost entirely because the agent stopped confusing current issues with historical ones.
The consolidation schedule: when and how often
Running consolidation too frequently wastes compute and risks over-compressing memories that are still accumulating detail. Running it too infrequently lets fragmentation build up. The right cadence depends on your agent's interaction volume.
A practical heuristic: trigger consolidation when any memory cluster exceeds a fragment threshold - typically 5-10 memories within a similarity radius of 0.85. You can check this asynchronously after each memory creation. When the threshold is exceeded, queue a consolidation job for that cluster.
Additionally, run a global consolidation sweep on a fixed schedule - daily for high-volume agents, weekly for lower-volume ones. The global sweep catches clusters that grow gradually and would not trigger the per-insertion check.
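A sketch of the per-insertion check, assuming a vector store that can return neighbors above a similarity cutoff; the store interface and parameter names are hypothetical:

```python
def should_consolidate(new_embedding, vector_store,
                       similarity_cutoff: float = 0.85,
                       fragment_threshold: int = 5) -> bool:
    """After each memory insert, count near neighbors of the new memory.
    If the surrounding cluster has grown past the fragment threshold,
    the caller queues an asynchronous consolidation job for it."""
    neighbors = vector_store.search(new_embedding,
                                    min_similarity=similarity_cutoff)
    return len(neighbors) >= fragment_threshold
```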
One critical implementation detail: consolidation must be idempotent. If the consolidation job fails halfway through, or runs twice due to a scheduling glitch, the result should be the same as if it ran once successfully. Archive-then-insert (rather than delete-then-insert) ensures that no memories are lost even if the process fails between steps.
Garbage collection: the final stage of the memory lifecycle
Even with decay and consolidation, your memory store will grow over time. Archived fragment memories accumulate. Consolidated memories themselves eventually decay. You need a garbage collection process that permanently removes memories below a floor threshold.
The garbage collection threshold should be significantly below your retrieval pre-filter threshold. If you stop retrieving memories at a decay score of 0.05, garbage collect at 0.01. This buffer ensures that a memory is not garbage collected just before a consolidation sweep would have boosted it back above the retrieval threshold.
For compliance-sensitive applications, garbage collection must respect retention policies. Some memories may need to be retained for audit purposes regardless of their decay score. Tag these memories as retention-exempt during creation, and exclude them from garbage collection while still allowing their decay scores to fall (so they do not pollute retrieval results).
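A sketch of the garbage-collection sweep under the thresholds above, reusing the decay_score function from the earlier sketch; the storage interface and the retention_exempt flag are hypothetical:

```python
GC_THRESHOLD = 0.01  # well below the 0.05 retrieval pre-filter

def garbage_collect(store) -> int:
    """Permanently remove memories whose decay score has fallen below the
    floor, skipping anything tagged retention-exempt for compliance."""
    removed = 0
    for memory in store.all_memories():   # hypothetical iteration API
        if memory.retention_exempt:
            continue                       # keep for audit, never delete
        if decay_score(memory) < GC_THRESHOLD:
            store.delete(memory.id)        # hypothetical delete API
            removed += 1
    return removed
```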
Where TypeGraph fits in
TypeGraph's agent memory layer implements configurable time-decay scoring with per-memory-type decay rates, access-based reinforcement, and automatic consolidation. The system supports multiple memory pools with independent decay configurations, decay-aware retrieval with tunable relevance-vs-recency weighting, and scheduled garbage collection with retention policy support. The goal is to give your agents the ability to forget gracefully - so they stay sharp, relevant, and trustworthy over thousands of sessions.