Insights on RAG pipelines, embeddings, retrieval strategies, and building production AI applications.
A short TypeGraph note on 512- vs 1024-dimensional embeddings, pgvector halfvec vs vector, and the practical storage-quality tradeoffs for RAG systems.
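The storage arithmetic behind that tradeoff is simple: pgvector's vector type stores float32 (4 bytes per dimension) while halfvec stores float16 (2 bytes), so halving both precision and dimensionality cuts the column to a quarter of its size. A minimal sketch of the math, with an illustrative corpus size (ignoring pgvector's small per-value header and Postgres row overhead):

```typescript
// Bytes consumed by one embedding column value.
function embeddingBytes(dims: number, bytesPerDim: number): number {
  return dims * bytesPerDim;
}

const v1024 = embeddingBytes(1024, 4); // vector(1024):  4096 bytes/row
const h1024 = embeddingBytes(1024, 2); // halfvec(1024): 2048 bytes/row
const h512 = embeddingBytes(512, 2);   // halfvec(512):  1024 bytes/row

// At a hypothetical 10M chunks, the embedding column alone costs roughly:
const rows = 10_000_000;
console.log((rows * v1024) / 1e9); // ~41 GB for vector(1024)
console.log((rows * h512) / 1e9);  // ~10 GB for halfvec(512)
```

The quality side of the tradeoff has to be measured per model and per corpus; the storage side is just this multiplication.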
Modern AI memory systems are quietly rebuilding the human brain, one subsystem at a time. This series walks through all 18 components, from the quarter-second sensory buffer to the hippocampal index, and shows what your favorite agent framework is (and is not) borrowing from neuroscience.
Before any "real" memory exists, your brain holds an ultra-brief, high-fidelity snapshot of raw sensory input. The streaming pipelines feeding modern AI systems do something almost identical, with one important difference.
An honest comparison of the leading open source Graph RAG frameworks - Microsoft GraphRAG, Graphiti, LightRAG, Cognee, and TypeGraph - with benchmarks, pros, cons, and where each one fits.
Humans juggle about four things at a time. Llama 4 juggles 10 million tokens. Yet both fall apart in the middle of long context for the same underlying reason. Here is what working memory actually is, and where modern context windows still fall short.
A minimal end-to-end RAG pipeline using Neon Postgres + pgvector, the Vercel AI SDK, and OpenAI embeddings. Copy-paste the snippets, ship it tonight.
Every cocktail party has a hundred conversations and you can only follow one. The mechanism that makes that possible was first sketched in 1958, and it shows up almost line for line in modern sparse attention.
Add a knowledge graph layer to your Vercel + Neon RAG stack - entity extraction, graph expansion, and hybrid retrieval, all in Postgres. No Neo4j required.
Where were you when you heard the news? Episodic memory is the system that lets you mentally travel back in time, and the AI version is finally getting good enough to do the same trick.
You know the capital of France. You almost certainly do not remember when or where you learned it. That stripping-away of context is what makes semantic memory powerful, and it is exactly what vector databases do too.
Riding a bike is not a fact you remember, it is a skill your body knows. The "how to do things" memory system is the least developed layer in modern AI agents, and the most interesting place to look right now.
A 1972 paper called "Levels of Processing" predicted, with eerie precision, why semantic embeddings would crush keyword search five decades later. The principle is simple: how you encode determines how you can retrieve.
While you sleep, your hippocampus quietly replays the day at faster-than-real-time speed, gradually transferring memories into long-term cortical storage. Letta, Graphiti, and Google Cloud are now building the same loop into agents.
Smell a familiar perfume and an entire scene reconstructs itself. The hippocampus is doing pattern completion. So is your vector database, and the parallel is more direct than almost anywhere else in the brain-to-AI map.
Pulling a memory out of storage temporarily makes it editable. Within hours, it goes back into storage, possibly different than before. The same is true of every fact your agent has ever stored, and most systems handle it badly.
Ebbinghaus measured the forgetting curve in 1885. Anderson and Schooler showed in 1991 that forgetting is not random failure, it is rational pruning. Production agents that "remember everything" are quietly violating both findings.
You will remember the day you got engaged. You will not remember last Tuesday. The brain runs a continuous priority calculation across recency, relevance, surprise, and emotion, and so do the better agent memory systems.
In 2021, a quiet ICLR paper proved that transformer attention is mathematically equivalent to a Hopfield network update rule. The associative memory mechanism inside every LLM has been an old physicist's trick all along.
You can usually feel when a name is "on the tip of your tongue." LLMs cannot, reliably. Metamemory is the brain monitoring its own memory, and it is the most honest gap in current AI memory architecture.
Divers who learned word lists underwater recall them better underwater. The same encoding-specificity principle is why your multi-tenant agent occasionally retrieves another customer's memories, and how to design around it.
Bartlett asked British students to memorize a Native American folktale in 1932. They reliably remembered an Anglicized version of it. Schemas shape every memory before, during, and after the fact, and they are the layer modern ontology systems are now starting to formalize.
You promised yourself you would email Jane when you got into the office. Then you got there and forgot. Prospective memory is the brain remembering future intentions, and AI agents are just starting to learn how to do it.
The hippocampus does not actually store your memories. It stores pointers to them. The 1986 theory that proposed this is also, almost beat for beat, the architecture every modern vector database uses.
A field guide to the retrieval-only and end-to-end RAG benchmarks worth your time, the metrics they report, and where to find the papers and datasets.
Your knowledge graph has 14 variants of "JPMorgan Chase" and none of them are connected. Here's how to build entity resolution that actually works at scale - covering fuzzy matching, alias tracking, transitive merges, and handling conflicting attributes.
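Transitive merges are the piece teams most often get wrong: if "JPMorgan Chase" matches "JPMorgan" and "JPMorgan" matches "JP Morgan Chase & Co.", all three must land in one cluster even though the outer pair never matched directly. A union-find sketch (the entity names are illustrative, and the pairwise matches stand in for a real fuzzy matcher):

```typescript
class UnionFind {
  private parent = new Map<string, string>();
  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    const p = this.parent.get(x)!;
    if (p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }
  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Pairwise matches emitted by a (hypothetical) fuzzy matcher:
const matches: [string, string][] = [
  ["JPMorgan Chase", "JPMorgan"],
  ["JPMorgan", "JP Morgan Chase & Co."],
];

const uf = new UnionFind();
for (const [a, b] of matches) uf.union(a, b);

// All three variants now resolve to one canonical root.
console.log(uf.find("JPMorgan Chase") === uf.find("JP Morgan Chase & Co."));
```

Alias tracking and conflicting-attribute handling layer on top of this clustering step; the union-find just guarantees the merge is transitive.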
"Remember everything" sounds like a feature until your agent's context is polluted with stale information from three weeks ago. Here's how to build memory systems that decay gracefully and consolidate intelligently.
"Which suppliers of Company X have had FDA recalls?" requires chaining facts across multiple documents. Standard vector search retrieves one hop. Here's how to build retrieval that walks the graph.
When an agent gives a wrong answer, post-hoc log analysis is not enough. You need to replay the full conversation trace - every memory read, every retrieval call, every tool invocation - to understand what went wrong and why.
One document says the API rate limit is 1,000 req/s. Another says 5,000. Your LLM confidently picks one at random. Here's how to detect, track, and surface contradictions instead of silently returning stale information.
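A first-pass detector just groups extracted claims by a normalized (subject, attribute) key and flags any group whose values disagree, surfacing all sources instead of letting the model pick one. A sketch, assuming claims have already been extracted into a simple shape (the Claim fields and example values are hypothetical):

```typescript
interface Claim {
  subject: string;
  attribute: string;
  value: string;
  source: string;
}

// Group claims by (subject, attribute); any group with more than one
// distinct value is a contradiction to surface, not silently resolve.
function findContradictions(claims: Claim[]): Claim[][] {
  const groups = new Map<string, Claim[]>();
  for (const c of claims) {
    const key = `${c.subject.toLowerCase()}::${c.attribute.toLowerCase()}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(c);
  }
  return [...groups.values()].filter(
    (g) => new Set(g.map((c) => c.value)).size > 1
  );
}

const conflicts = findContradictions([
  { subject: "API", attribute: "rate limit", value: "1,000 req/s", source: "docs/v1.md" },
  { subject: "API", attribute: "rate limit", value: "5,000 req/s", source: "docs/v2.md" },
]);
console.log(conflicts.length); // 1 conflicting group, with both sources attached
```

Real systems also normalize values (units, number formats) before comparing and attach document timestamps so "newer wins" is at least an explicit policy rather than a coin flip.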
Most RAG tutorials stop at "pass the top 5 chunks to the LLM." But how you assemble, order, and format those chunks determines whether your model actually uses them. Here are the strategies that matter.
When someone asks "why did the agent say that?" you need a complete causal chain from query to answer. Here is how to build a tracing system that makes agent reasoning auditable, explainable, and debuggable.
Static retrieval pipelines break down when agents need dynamic, multi-step reasoning. Here's how query decomposition, multi-signal routing, and iterative retrieval unlock the next level of RAG quality.
When multiple customers share a RAG infrastructure, "Can Customer A see Customer B's data?" needs a provable answer. Here's how to build tenant isolation that satisfies your security team.
Most RAG observability setups track the wrong things. Infrastructure metrics tell you if the system is up. Retrieval quality metrics tell you if the system is useful. Here is how to build dashboards that surface the signals that actually predict user-facing quality.
Most teams ship RAG with "vibes-based" evaluation. Here's how to build an automated retrieval quality pipeline that catches regressions before your users do.
Your agent's memory is a data store, and GDPR, HIPAA, and SOC 2 all apply to it. Here is how to build retention policies that keep your agent useful without turning it into a compliance liability.
Your nightly re-indexing job takes 8 hours and costs $200 in embedding API calls. Most documents haven't changed. Here's how to fix that.
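The usual fix is change detection: hash each document's content, compare against the hash stored at last index time, and re-embed only what changed. A minimal sketch (the store shape is hypothetical; in practice the hashes live in a column next to the embeddings):

```typescript
import { createHash } from "node:crypto";

const sha256 = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// Return only the docs whose content hash differs from the stored one;
// everything else skips the embedding API call entirely.
function docsToReindex(
  docs: { id: string; content: string }[],
  storedHashes: Map<string, string>
): string[] {
  return docs
    .filter((d) => storedHashes.get(d.id) !== sha256(d.content))
    .map((d) => d.id);
}

const stored = new Map([
  ["a", sha256("unchanged")],
  ["b", sha256("old text")],
]);
const changed = docsToReindex(
  [
    { id: "a", content: "unchanged" },
    { id: "b", content: "new text" },
  ],
  stored
);
console.log(changed); // ["b"]: only the edited doc is re-embedded
```

Hashing at the chunk level rather than the document level shrinks the bill further, since an edit to one section no longer re-embeds the whole file.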
When compliance asks "show me every piece of data this agent accessed to answer this customer's question," you need a complete audit trail - not a log file, but a structured, filterable, exportable record of every decision the agent made.
The Model Context Protocol is becoming the standard for agent-tool integration. Here's how to expose your RAG pipeline and memory system as an MCP server so any agent can search, remember, and recall.