Memory Scoring: How the Brain (and Mem0) Decide What Matters

Memories are not equal

A "remember everything" memory store is functionally equivalent to a memory store that remembers nothing useful: with no priority signal, retrieval is random. Both biological and artificial memory systems need a way to rank memories. The brain runs this scoring in parallel across multiple subsystems, each contributing a different signal. Modern AI memory frameworks are starting to do the same.

The biology

Several systems converge to determine memory priority:

The amygdala tags emotionally charged events for enhanced encoding. This is why the day you got engaged or the day a loved one died is etched in detail, while the days around it are blurry.
Dopaminergic reward prediction errors from the ventral tegmental area (VTA) and substantia nigra signal when outcomes differ from expectations. Surprising events get a dopamine pulse and are remembered better.
The hippocampus detects novelty, giving priority to new information over information you already have.
The von Restorff (isolation) effect demonstrates that distinctive items, ones that stand out from their context, are remembered better. The canonical lab demonstration uses a list of seven words, six printed in black and one in red: the red word is recalled best.
The testing effect (sometimes called retrieval-enhanced learning) shows that retrieval itself strengthens memory more than passive re-study. Practice quizzes are more effective than re-reading the textbook.

What is striking about the brain's scoring is that it is multi-dimensional and continuous. There is not a single "importance" axis. There are many partial weights, and they combine over time into something like a stable priority for each memory.

The technology

Many production AI memory systems use composite scoring of the form:

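$$\text{score} = \alpha \cdot \text{recency} + \beta \cdot \text{relevance} + \gamma \cdot \text{importance} + \delta \cdot \text{access frequency}$$

where α, β, γ, and δ are tunable weights.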
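A minimal sketch of this composite score follows, assuming every component is already scaled to roughly [0, 1]. The `Memory` fields, the default weights, the per-hour decay of 0.995, and the log-scaled access count are illustrative assumptions, not Mem0's or any other framework's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    importance: float          # assumed in [0, 1], e.g. rated by an LLM at ingestion
    hours_since_access: float  # time since this memory was last used
    access_count: int


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composite_score(m: Memory, query_emb: list[float],
                    alpha: float = 1.0, beta: float = 1.0,
                    gamma: float = 1.0, delta: float = 0.5,
                    decay: float = 0.995, max_count: int = 100) -> float:
    """alpha*recency + beta*relevance + gamma*importance + delta*frequency."""
    recency = decay ** m.hours_since_access  # exponential decay per hour
    relevance = cosine(m.embedding, query_emb)
    # Log-scale the access count so a few heavily used memories do not dominate.
    frequency = min(1.0, math.log1p(m.access_count) / math.log1p(max_count))
    return alpha * recency + beta * relevance + gamma * m.importance + delta * frequency


memories = [
    Memory("User is allergic to peanuts", [0.9, 0.1, 0.0], 0.9, 72.0, 3),
    Memory("User said hello", [0.1, 0.9, 0.0], 0.1, 1.0, 1),
]
query = [1.0, 0.0, 0.0]
ranked = sorted(memories, key=lambda m: composite_score(m, query), reverse=True)
print(ranked[0].text)  # prints "User is allergic to peanuts"
```

The three-day-old but relevant and important memory outranks the fresh but trivial one. Shifting enough weight onto `alpha` would eventually push recency to the front.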

Mem0 implements importance filtering at ingestion (a small LLM identifies salient facts) and uses maximal marginal relevance (MMR) reranking at retrieval. Graphiti combines BM25 keyword matching, cosine semantic relevance, and graph structural importance with temporal versioning. Zep's older formulation used a similar structure.
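The retrieval-side reranking step can be illustrated with a generic MMR sketch. This is the textbook greedy formulation, written from scratch: each pick trades relevance to the query against similarity to results already chosen, with `lam` controlling the balance. It is not Mem0's code, and the function names and the `lam` default are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=3, lam=0.5):
    """Greedily pick k candidate indices by maximal marginal relevance.

    Each step selects the candidate c maximizing
        lam * sim(query, c) - (1 - lam) * max over selected s of sim(c, s)
    so near-duplicates of already chosen results are penalized.
    """
    selected, remaining = [], list(range(len(candidates)))

    def marginal(i):
        relevance = cosine(query, candidates[i])
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]  # index 1 nearly duplicates index 0
print(mmr(query, candidates, k=2, lam=0.5))  # prints [0, 2]
```

Pure relevance ranking (`lam=1.0`) would return `[0, 1]`, two nearly identical facts. MMR swaps in the more diverse candidate instead.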

The most cognitively grounded scoring formula available is loosely modeled on ACT-R (Adaptive Control of Thought, Rational), the cognitive architecture from John Anderson's lab at Carnegie Mellon. ACT-R's base-level activation is given by:

$$B(m) = \ln\left(\sum_{j} t_j^{-d}\right)$$

where $t_j$ is the time since the $j$-th use of memory $m$ and $d$ is a decay parameter (ACT-R's conventional default is $d = 0.5$). Frequent and recent uses both raise activation, and each use's contribution fades as a power law of its age. Combine this with associative weights ($W_j S_{jm}$, the spreading activation from each element $j$ of the current context to memory $m$), semantic similarity, and stochastic noise, and you have the activation model that quietly powers a lot of cognitive simulation. A 2025 ACM paper demonstrated this exact mechanism running inside an LLM dialogue agent with human-like memory reinforcement.
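The base-level equation and the other terms listed above fit in a few lines of code. This sketch makes several assumptions. Time is in seconds. Every recorded use lies strictly in the past. The similarity term is passed in precomputed. The noise scale `s = 0.25` is only an illustrative value: ACT-R draws activation noise from a logistic distribution whose scale is a model parameter. All function names are hypothetical.

```python
import math
import random


def base_level(use_times, now, d=0.5):
    """B(m) = ln(sum_j t_j ** -d), where t_j = now - (time of the j-th use)."""
    ages = [now - t for t in use_times]
    if not ages or min(ages) <= 0:
        raise ValueError("need at least one use strictly before `now`")
    return math.log(sum(age ** -d for age in ages))


def spreading(context_weights, strengths):
    """Sum over context elements j of W_j * S_jm.

    context_weights: {element: W_j}; strengths: {element: S_jm} for memory m.
    """
    return sum(w * strengths.get(elem, 0.0) for elem, w in context_weights.items())


def logistic_noise(s, rng):
    """Draw from a logistic distribution with mean 0 and scale s."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep log() finite
    return s * math.log(u / (1 - u))


def activation(use_times, now, context_weights, strengths,
               similarity=0.0, d=0.5, s=0.25, rng=random):
    """Total activation: base level + spreading activation + similarity + noise."""
    return (base_level(use_times, now, d)
            + spreading(context_weights, strengths)
            + similarity
            + logistic_noise(s, rng))


# A memory used 1, 10, and 100 seconds ago vs. one used once, 100 seconds ago.
frequent = base_level([99.0, 90.0, 0.0], now=100.0)
rare = base_level([0.0], now=100.0)
print(round(frequent, 3), round(rare, 3))  # prints 0.348 -2.303
```

The single most recent use dominates the sum, which is why the power-law form rewards both recency and repetition.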

Attention weights in transformers compute a kind of dynamic salience as a softmax over query-key dot products, which is formally equivalent to the retrieval (update) rule of modern Hopfield networks. TF-IDF weights a term by its frequency within a document relative to its rarity across the corpus, which loosely parallels the von Restorff effect (distinctive items score higher).
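Both salience signals are easy to sketch. The first helper below computes scaled dot-product attention weights (softmax of q·k/√d) over a few memory keys. The second computes plain TF-IDF with raw counts and idf = ln(N/df). Real libraries typically use smoothed variants, and both helper names are hypothetical.

```python
import math
from collections import Counter


def attention_weights(query, keys):
    """Scaled dot-product attention: softmax over q . k_i / sqrt(d)."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tf_idf(docs):
    """TF-IDF per document: raw term count times ln(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
            for doc in docs]


weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
docs = [["meeting", "monday"], ["meeting", "tuesday"], ["meeting", "engaged"]]
scores = tf_idf(docs)
print(scores[2])  # "meeting" scores 0.0 (it is in every doc); "engaged" scores ln(3)
```

The ubiquitous term "meeting" gets zero weight while the rare "engaged" stands out. That is the von Restorff parallel in miniature.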

Where the gap is

TF-IDF, BM25, and attention-based scoring are mature. Composite scoring in agent memory systems (Mem0, Graphiti) is production-ready. ACT-R-inspired scoring exists in research but is rarely deployed.

Emotional salience weighting has no implementation in any major agent system. That is a notable gap, because the amygdala's role in human memory is enormous. There is no production AI memory system that flags "this conversation is important to the user" the way amygdalar tagging does in the brain. Sentiment-aware memory weighting is an open opportunity.
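As a rough illustration of what sentiment-aware weighting could look like, the sketch below boosts a memory's importance by an emotional-salience score. The tiny `EMOTION_WORDS` lexicon is a hypothetical stand-in for a real emotion or sentiment classifier. The multiplicative boost capped at 1 is one design choice among many.

```python
# Hypothetical toy lexicon standing in for a real emotion classifier.
EMOTION_WORDS = {"engaged": 0.9, "died": 1.0, "furious": 0.8,
                 "worried": 0.6, "thrilled": 0.8}


def emotional_salience(text: str) -> float:
    """Strongest emotional cue found in the text, in [0, 1]; 0.0 if none."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return max((EMOTION_WORDS.get(w, 0.0) for w in words), default=0.0)


def weighted_importance(importance: float, text: str, boost: float = 1.0) -> float:
    """Scale base importance by (1 + boost * salience), capped at 1.0."""
    return min(1.0, importance * (1.0 + boost * emotional_salience(text)))


print(weighted_importance(0.4, "We got engaged last night!"))  # boosted to about 0.76
print(weighted_importance(0.4, "Ordered coffee."))             # unchanged: 0.4
```

The emotional tag amplifies base importance rather than replacing it. This loosely mirrors how amygdalar tagging enhances the encoding of events that are already being stored.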

Reward prediction error as a memory signal is also barely used. RL has it, in spades, but agent memory systems mostly do not. A memory framework that scored memories partly by "how surprising was this compared to what the agent expected" would map cleanly to dopaminergic memory tagging in biology.
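A surprise signal of this kind could be as simple as an unsigned prediction error. The sketch below keeps a running expected outcome per context and scores each new event by |outcome - expected|. It updates the expectation with the same delta rule that underlies Rescorla-Wagner and temporal-difference learning. The class name, the numeric `outcome` (for example, a user feedback score), and the learning rate are assumptions for illustration.

```python
from collections import defaultdict


class SurpriseScorer:
    """Score events by unsigned reward prediction error, |outcome - V(context)|."""

    def __init__(self, lr: float = 0.1):
        self.lr = lr
        self.expected = defaultdict(float)  # V(context); starts at 0.0

    def score(self, context: str, outcome: float) -> float:
        delta = outcome - self.expected[context]   # reward prediction error
        self.expected[context] += self.lr * delta  # delta-rule update toward the outcome
        return abs(delta)                          # surprise magnitude, either sign


scorer = SurpriseScorer(lr=0.5)
routine = [scorer.score("standup", 1.0) for _ in range(3)]
shock = scorer.score("standup", -1.0)
print(routine, shock)  # prints [1.0, 0.5, 0.25] 1.875
```

Repeated, expected outcomes earn less and less surprise, while a reversal spikes it. That score could then enter the composite formula as one more weighted term.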

Practical implication: composite scoring (recency + relevance + a simple importance signal) gets you 80% of the value with low effort. The next 20% comes from adding weights biology cares about: emotional weight, surprise, and explicit retrieval-based reinforcement.
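Explicit retrieval-based reinforcement falls out naturally from ACT-R-style activation. If every retrieval is logged as another use, memories that get recalled become easier to recall again, a rough analogue of the testing effect. The sketch below assumes numeric timestamps and a `now` that is always later than every logged use. `TracedMemory` and `retrieve` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TracedMemory:
    text: str
    use_times: list[float] = field(default_factory=list)  # encoding + every retrieval

    def activation(self, now: float, d: float = 0.5) -> float:
        """ACT-R-style base-level activation over all logged uses."""
        ages = [now - t for t in self.use_times]
        if not ages or min(ages) <= 0:
            raise ValueError("need at least one use strictly before `now`")
        return math.log(sum(age ** -d for age in ages))


def retrieve(memories: list[TracedMemory], now: float, k: int = 1) -> list[TracedMemory]:
    """Return the top-k memories by activation and log this retrieval as a new use."""
    top = sorted(memories, key=lambda m: m.activation(now), reverse=True)[:k]
    for m in top:
        m.use_times.append(now)  # retrieval itself strengthens the memory
    return top


a = TracedMemory("prefers window seats", [1.0])
b = TracedMemory("owns a bicycle", [0.0])
picked = retrieve([a, b], now=10.0)      # a is slightly more recent, so it is retrieved
print(picked[0].text, a.use_times)       # prints prefers window seats [1.0, 10.0]
print(a.activation(20.0) - b.activation(20.0))  # gap widened by the logged retrieval
```

Without the logged retrieval, the two memories would differ by only their one-second encoding gap. With it, the retrieved memory pulls clearly ahead.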

Series footer

← Previous: Forgetting · Series anchor · Next: Associative Memory →
