
Memory Encoding: Why Deep Processing Beats Keyword Search

Ryan Musser
Founder

Encoding is what determines what you can recall

You can read a paragraph and forget it instantly, or you can read the same paragraph and remember it for years. The difference is rarely about effort or attention. It is about how the information was processed at the moment it landed. Encoding is the process that turns perception into a stored memory trace, and the depth at which you encoded predicts almost everything about how well you will retrieve it later.

The biology

The Levels of Processing framework, proposed by Craik & Lockhart in 1972, was a quiet revolution in memory science. They argued that memory is not a separate "store" you push things into. Memory is a byproduct of how deeply you process information at perception. Three levels:

  1. Shallow (structural): "Is the word in capital letters?" Processes surface features; encodes poorly.
  2. Intermediate (phonetic): "Does the word rhyme with 'tree'?" Processes sound; encodes slightly better.
  3. Deep (semantic): "Is the word a type of fruit?" Processes meaning; encodes best, by a wide margin.

In experiments, semantic processing produces dramatically better recall than structural processing of the same words, even when subjects spend the same amount of time on each. The reason is that deeper processing creates richer, more interconnected traces with more potential retrieval pathways.

The hippocampal dentate gyrus performs pattern separation: it transforms similar inputs into distinct, non-overlapping neural representations, so that the trace for "your friend's birthday party last year" does not get confused with "your cousin's birthday party last year" even though they shared many features. The amygdala enhances encoding for emotionally significant events, which is why you remember vividly where you were during a major news event but cannot recall what you ate for lunch last Tuesday. Dual coding theory (Paivio) shows that information encoded through both verbal and imagery channels has richer representations and more retrieval pathways.

The technology

Embedding models are the direct technical analog of encoding. The landscape is mature: OpenAI text-embedding-3, Cohere embed-v4.0, BGE (BAAI), Google's EmbeddingGemma (under 15ms inference on Edge TPU), and many others. They all transform raw text input into high-dimensional vector representations that capture semantic similarity.

The Craik and Lockhart parallel is precise. Keyword and BM25 indexing are shallow encoding: they match surface features only. Semantic embedding is deep encoding: it captures meaning, synonymy, and context. On the MTEB benchmark, semantic embedding models outperform keyword baselines by wide margins on retrieval tasks, just as Craik and Lockhart's "deep" subjects outperformed "shallow" subjects on recall tests.
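The failure mode can be caricatured in a few lines of Python. This is a deliberate toy, not a real embedding model: the hand-built `SYNONYM_CONCEPTS` table (invented here) stands in for the semantic knowledge a trained encoder learns, but the gap it exposes is exactly the one keyword matching has.

```python
# Toy sketch: shallow (surface-token) matching vs. a crude stand-in for
# "deep" (concept-level) matching. SYNONYM_CONCEPTS is invented for
# illustration; a real encoder learns these relationships from data.
SYNONYM_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "sedan": "vehicle",
    "doctor": "physician", "physician": "physician",
}

def keyword_match(query: str, doc: str) -> bool:
    """Shallow encoding: compare surface tokens only."""
    return bool(set(query.lower().split()) & set(doc.lower().split()))

def concept_match(query: str, doc: str) -> bool:
    """'Deep' encoding: map tokens to concepts before comparing."""
    def to_concepts(text):
        return {SYNONYM_CONCEPTS.get(tok, tok) for tok in text.lower().split()}
    return bool(to_concepts(query) & to_concepts(doc))

print(keyword_match("automobile", "car maintenance"))  # False: no shared token
print(concept_match("automobile", "car maintenance"))  # True: shared concept
```

Swap the toy table for a trained embedding model and cosine similarity, and you have the semantic half of a modern retrieval stack.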

Chunking strategies parallel encoding decisions:

  • Fixed-size chunking is rote processing.
  • Semantic chunking, which splits at natural cognitive boundaries, mirrors how the brain segments perception into meaningful units.
  • Recursive chunking mirrors schema-based encoding: bigger units made of smaller meaningful units.
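The two extremes of that list can be sketched in a few lines. A minimal illustration, with arbitrary size thresholds and separators; production chunkers are token-aware and more careful about boundaries:

```python
def fixed_size_chunks(text: str, size: int = 40) -> list[str]:
    """Rote processing: cut every `size` characters, meaning ignored."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_chunks(text: str, max_len: int = 80) -> list[str]:
    """Schema-like processing: prefer paragraph breaks, then sentence
    breaks, and split only units that are still too large. (Sketch:
    splitting on '. ' drops the period, which a real chunker would keep.)"""
    if len(text) <= max_len:
        return [text.strip()]
    for sep in ("\n\n", ". "):
        parts = [p for p in text.split(sep) if p.strip()]
        if len(parts) > 1:
            out = []
            for part in parts:
                out.extend(recursive_chunks(part, max_len))
            return out
    return fixed_size_chunks(text, max_len)  # last resort: no boundary found
```

The recursive version degrades gracefully: it only falls back to rote character-splitting when no meaningful boundary exists, which is the same priority order the brain's perceptual segmentation implies.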

Metadata tagging (timestamps, user IDs, topics) implements the encoding specificity principle, which says that context becomes part of the stored trace and contributes to retrieval. Without metadata, you have a vector floating in space. With metadata, you have a vector with retrieval handles.
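A sketch of the idea, using nothing beyond the standard library (`MemoryStore` and its fields are invented for illustration, not any vendor's API): metadata is stored alongside each vector and applied as a hard filter before similarity ranking, giving the trace its retrieval handles.

```python
import math

class MemoryStore:
    """Minimal in-memory vector store with metadata 'retrieval handles'."""

    def __init__(self):
        self.items = []  # list of (vector, metadata) pairs

    def add(self, vector, **metadata):
        self.items.append((vector, metadata))

    def search(self, query_vec, top_k=3, **filters):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (math.sqrt(sum(x * x for x in a))
                    * math.sqrt(sum(x * x for x in b)))
            return dot / norm if norm else 0.0
        # Hard-filter on metadata first, then rank survivors by similarity.
        candidates = [(vec, meta) for vec, meta in self.items
                      if all(meta.get(k) == v for k, v in filters.items())]
        return sorted(candidates,
                      key=lambda pair: cosine(query_vec, pair[0]),
                      reverse=True)[:top_k]

store = MemoryStore()
store.add([1.0, 0.0], user_id="alice", topic="billing")
store.add([0.9, 0.1], user_id="bob", topic="billing")
hits = store.search([1.0, 0.0], user_id="alice")
```

Without the `user_id="alice"` filter, Bob's nearly identical vector would compete for the same slot; the metadata is what keeps the two traces separable.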

LLM-driven fact extraction (Mem0's approach) represents very deep semantic encoding: rather than storing the raw text, the system extracts meaning. Multi-modal encoders (CLIP, ImageBind, SigLIP 2) directly parallel Paivio's dual coding theory: a single object encoded through both visual and verbal channels has richer retrieval pathways.

Where the gap is

Encoding is the most mature memory component in production today. Production-grade embedding models exist from multiple providers, with standardized benchmarks (MTEB) and battle-tested deployment. The remaining frontiers are compositional understanding (current embeddings still struggle to compose meaning correctly: "the dog chased the cat" and "the cat chased the dog" are too close in vector space) and perfect cross-lingual alignment.
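The order-insensitivity is easiest to see in the limiting case. A bag-of-words "embedding" is a deliberate toy here (real embedding models are far better, but inherit a milder version of the same blindness), and it makes the two sentences literally indistinguishable:

```python
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Order-free token counts: the limiting case of compositional blindness."""
    return Counter(text.lower().split())

a = bow_vector("the dog chased the cat")
b = bow_vector("the cat chased the dog")
assert a == b  # identical representations: who chased whom is lost
```

Trained encoders do retain some word-order signal, but the residual closeness of such sentence pairs in vector space is the same phenomenon in attenuated form.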

Practical implication: if you are still doing keyword-only retrieval in 2026, switch. The science said "depth wins" five decades ago, and the technology has caught up. Hybrid retrieval (semantic + lexical) is the production default. Pure shallow encoding is a leftover from a less informed era.
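One standard way to fuse the lexical and semantic signals is reciprocal rank fusion, sketched below. The document ids are hypothetical; `k=60` is the constant from the original RRF paper (Cormack et al., 2009).

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. BM25 order and embedding order)
    into one: each document scores 1/(k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical  = ["doc_a", "doc_b", "doc_c"]   # hypothetical BM25 ranking
semantic = ["doc_b", "doc_d", "doc_a"]   # hypothetical embedding ranking
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)  # doc_b first: ranked highly by both signals
```

Documents endorsed by both rankers bubble to the top without any score normalization, which is why RRF is a common default for hybrid retrieval.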


← Previous: Procedural Memory · Series anchor · Next: Memory Consolidation →
