Insights on RAG pipelines, embeddings, retrieval strategies, and building production AI applications.
A short TypeGraph note on 512- vs 1024-dimensional embeddings, pgvector halfvec vs vector, and the practical storage-quality tradeoffs for RAG systems.
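The storage arithmetic behind that tradeoff is simple: pgvector's vector type stores float32 (4 bytes per dimension) while halfvec stores float16 (2 bytes), so halving both precision and dimensionality cuts the column to a quarter of its size. A minimal sketch of the math, with an illustrative corpus size (ignoring pgvector's small per-value header and Postgres row overhead):

```typescript
// Bytes consumed by one embedding column value.
function embeddingBytes(dims: number, bytesPerDim: number): number {
  return dims * bytesPerDim;
}

const v1024 = embeddingBytes(1024, 4); // vector(1024):  4096 bytes/row
const h1024 = embeddingBytes(1024, 2); // halfvec(1024): 2048 bytes/row
const h512 = embeddingBytes(512, 2);   // halfvec(512):  1024 bytes/row

// At a hypothetical 10M chunks, the embedding column alone costs roughly:
const rows = 10_000_000;
console.log((rows * v1024) / 1e9); // ~41 GB for vector(1024)
console.log((rows * h512) / 1e9);  // ~10 GB for halfvec(512)
```

The quality side of the tradeoff has to be measured per model and per corpus; the storage side is just this multiplication.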
Modern AI memory systems are quietly rebuilding the human brain, one subsystem at a time. This series walks through all 18 components, from the quarter-second sensory buffer to the hippocampal index, and shows what your favorite agent framework is (and is not) borrowing from neuroscience.
Before any "real" memory exists, your brain holds an ultra-brief, high-fidelity snapshot of raw sensory input. The streaming pipelines feeding modern AI systems do something almost identical, with one important difference.
An honest comparison of the leading open source Graph RAG frameworks - Microsoft GraphRAG, Graphiti, LightRAG, Cognee, and TypeGraph - with benchmarks, pros, cons, and where each one fits.
Humans juggle about four things at a time. Llama 4 juggles 10 million tokens. Yet both fall apart in the middle of long context for the same underlying reason. Here is what working memory actually is, and where modern context windows still fall short.
A minimal end-to-end RAG pipeline using Neon Postgres + pgvector, the Vercel AI SDK, and OpenAI embeddings. Copy-paste the snippets, ship it tonight.
Every cocktail party has a hundred conversations and you can only follow one. The mechanism that makes that possible was first sketched in 1958, and it shows up almost line for line in modern sparse attention.
Add a knowledge graph layer to your Vercel + Neon RAG stack - entity extraction, graph expansion, and hybrid retrieval, all in Postgres. No Neo4j required.
Where were you when you heard the news? Episodic memory is the system that lets you mentally travel back in time, and the AI version is finally getting good enough to do the same trick.
You know the capital of France. You almost certainly do not remember when or where you learned it. That stripping-away of context is what makes semantic memory powerful, and it is exactly what vector databases do too.
Riding a bike is not a fact you remember, it is a skill your body knows. The "how to do things" memory system is the least developed layer in modern AI agents, and the most interesting place to look right now.
A 1972 paper called "Levels of Processing" predicted, with eerie precision, why semantic embeddings would crush keyword search five decades later. The principle is simple: how you encode determines how you can retrieve.
While you sleep, your hippocampus quietly replays the day at faster-than-real-time speed, gradually transferring memories into long-term cortical storage. Letta, Graphiti, and Google Cloud are now building the same loop into agents.
Smell a familiar perfume and an entire scene reconstructs itself. The hippocampus is doing pattern completion. So is your vector database, and the parallel is more direct than almost anywhere else in the brain-to-AI map.
Pulling a memory out of storage temporarily makes it editable. Within hours, it goes back into storage, possibly different than before. The same is true of every fact your agent has ever stored, and most systems handle it badly.
Ebbinghaus measured the forgetting curve in 1885. Anderson and Schooler showed in 1991 that forgetting is not random failure, it is rational pruning. Production agents that "remember everything" are quietly violating both findings.
You will remember the day you got engaged. You will not remember last Tuesday. The brain runs a continuous priority calculation across recency, relevance, surprise, and emotion, and so do the better agent memory systems.
In 2021, a quiet ICLR paper proved that transformer attention is mathematically equivalent to a Hopfield network update rule. The associative memory mechanism inside every LLM has been an old physicist's trick all along.
You can usually feel when a name is "on the tip of your tongue." LLMs cannot, reliably. Metamemory is the brain monitoring its own memory, and it is the most honest gap in current AI memory architecture.
Divers who learned word lists underwater recall them better underwater. The same encoding-specificity principle is why your multi-tenant agent occasionally retrieves another customer's memories, and how to design around it.
Bartlett asked British students to memorize a Native American folktale in 1932. They reliably remembered an Anglicized version of it. Schemas shape every memory before, during, and after the fact, and they are the layer modern ontology systems are now starting to formalize.
You promised yourself you would email Jane when you got into the office. Then you got there and forgot. Prospective memory is the brain remembering future intentions, and AI agents are just starting to learn how to do it.
The hippocampus does not actually store your memories. It stores pointers to them. The 1986 theory that proposed this is also, almost beat for beat, the architecture every modern vector database uses.
A field guide to the retrieval-only and end-to-end RAG benchmarks worth your time, the metrics they report, and where to find the papers and datasets.
Your knowledge graph has 14 variants of "JPMorgan Chase" and none of them are connected. Here's how to build entity resolution that actually works at scale - covering fuzzy matching, alias tracking, transitive merges, and handling conflicting attributes.
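Transitive merges are the piece teams most often get wrong: if "JPMorgan Chase" matches "JPMorgan" and "JPMorgan" matches "JP Morgan Chase & Co.", all three must land in one cluster even though the outer pair never matched directly. A union-find sketch (the entity names are illustrative, and the pairwise matches stand in for a real fuzzy matcher):

```typescript
class UnionFind {
  private parent = new Map<string, string>();
  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    const p = this.parent.get(x)!;
    if (p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }
  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Pairwise matches emitted by a (hypothetical) fuzzy matcher:
const matches: [string, string][] = [
  ["JPMorgan Chase", "JPMorgan"],
  ["JPMorgan", "JP Morgan Chase & Co."],
];

const uf = new UnionFind();
for (const [a, b] of matches) uf.union(a, b);

// All three variants now resolve to one canonical root.
console.log(uf.find("JPMorgan Chase") === uf.find("JP Morgan Chase & Co."));
```

Alias tracking and conflicting-attribute handling layer on top of this clustering step; the union-find just guarantees the merge is transitive.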
"Remember everything" sounds like a feature until your agent's context is polluted with stale information from three weeks ago. Here's how to build memory systems that decay gracefully and consolidate intelligently.
"Which suppliers of Company X have had FDA recalls?" requires chaining facts across multiple documents. Standard vector search retrieves one hop. Here's how to build retrieval that walks the graph.
When an agent gives a wrong answer, post-hoc log analysis is not enough. You need to replay the full conversation trace - every memory read, every retrieval call, every tool invocation - to understand what went wrong and why.
One document says the API rate limit is 1,000 req/s. Another says 5,000. Your LLM confidently picks one at random. Here's how to detect, track, and surface contradictions instead of silently returning stale information.
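A first-pass detector just groups extracted claims by a normalized (subject, attribute) key and flags any group whose values disagree, surfacing all sources instead of letting the model pick one. A sketch, assuming claims have already been extracted into a simple shape (the Claim fields and example values are hypothetical):

```typescript
interface Claim {
  subject: string;
  attribute: string;
  value: string;
  source: string;
}

// Group claims by (subject, attribute); any group with more than one
// distinct value is a contradiction to surface, not silently resolve.
function findContradictions(claims: Claim[]): Claim[][] {
  const groups = new Map<string, Claim[]>();
  for (const c of claims) {
    const key = `${c.subject.toLowerCase()}::${c.attribute.toLowerCase()}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(c);
  }
  return [...groups.values()].filter(
    (g) => new Set(g.map((c) => c.value)).size > 1
  );
}

const conflicts = findContradictions([
  { subject: "API", attribute: "rate limit", value: "1,000 req/s", source: "docs/v1.md" },
  { subject: "API", attribute: "rate limit", value: "5,000 req/s", source: "docs/v2.md" },
]);
console.log(conflicts.length); // 1 conflicting group, with both sources attached
```

Real systems also normalize values (units, number formats) before comparing and attach document timestamps so "newer wins" is at least an explicit policy rather than a coin flip.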
Most RAG tutorials stop at "pass the top 5 chunks to the LLM." But how you assemble, order, and format those chunks determines whether your model actually uses them. Here are the strategies that matter.
When someone asks "why did the agent say that?" you need a complete causal chain from query to answer. Here is how to build a tracing system that makes agent reasoning auditable, explainable, and debuggable.
Static retrieval pipelines break down when agents need dynamic, multi-step reasoning. Here's how query decomposition, multi-signal routing, and iterative retrieval unlock the next level of RAG quality.
When multiple customers share a RAG infrastructure, "Can Customer A see Customer B's data?" needs a provable answer. Here's how to build tenant isolation that satisfies your security team.
Most RAG observability setups track the wrong things. Infrastructure metrics tell you if the system is up. Retrieval quality metrics tell you if the system is useful. Here is how to build dashboards that surface the signals that actually predict user-facing quality.
Most teams ship RAG with "vibes-based" evaluation. Here's how to build an automated retrieval quality pipeline that catches regressions before your users do.
Your agent's memory is a data store, and GDPR, HIPAA, and SOC 2 all apply to it. Here is how to build retention policies that keep your agent useful without turning it into a compliance liability.
Your nightly re-indexing job takes 8 hours and costs $200 in embedding API calls. Most documents haven't changed. Here's how to fix that.
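The usual fix is change detection: hash each document's content, compare against the hash stored at last index time, and re-embed only what changed. A minimal sketch (the store shape is hypothetical; in practice the hashes live in a column next to the embeddings):

```typescript
import { createHash } from "node:crypto";

const sha256 = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// Return only the docs whose content hash differs from the stored one;
// everything else skips the embedding API call entirely.
function docsToReindex(
  docs: { id: string; content: string }[],
  storedHashes: Map<string, string>
): string[] {
  return docs
    .filter((d) => storedHashes.get(d.id) !== sha256(d.content))
    .map((d) => d.id);
}

const stored = new Map([
  ["a", sha256("unchanged")],
  ["b", sha256("old text")],
]);
const changed = docsToReindex(
  [
    { id: "a", content: "unchanged" },
    { id: "b", content: "new text" },
  ],
  stored
);
console.log(changed); // ["b"]: only the edited doc is re-embedded
```

Hashing at the chunk level rather than the document level shrinks the bill further, since an edit to one section no longer re-embeds the whole file.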
When compliance asks "show me every piece of data this agent accessed to answer this customer's question," you need a complete audit trail - not a log file, but a structured, filterable, exportable record of every decision the agent made.
The Model Context Protocol is becoming the standard for agent-tool integration. Here's how to expose your RAG pipeline and memory system as an MCP server so any agent can search, remember, and recall.