
Governance and Access Control for Multi-Tenant RAG: Preventing Data Leakage Across Tenants

Ryan Musser
Founder

The question every B2B security review asks

You've built a RAG-powered feature for your B2B SaaS product. It works beautifully in development. Then the security review arrives: "Demonstrate that Tenant A's proprietary documents can never appear in Tenant B's search results." If you can't answer this definitively, the feature doesn't ship.

Multi-tenant data isolation in RAG systems is harder than it looks. Row-level security in traditional databases is well understood, but vector stores introduce new attack surfaces. Embedding spaces don't respect tenant boundaries: similar documents from different tenants cluster together in vector space. Without explicit isolation mechanisms, a nearest-neighbor search returns whichever chunks are closest to the query, regardless of which tenant owns them.

Query-time filtering vs. physical isolation

There are two fundamental approaches to tenant isolation in RAG, and the right choice depends on your compliance requirements and scale.

  • Query-time filtering stores all tenants' data in a shared vector index and attaches a tenant identifier to every chunk. At query time, a mandatory filter restricts results to the requesting tenant's data. This approach is operationally simpler and more cost-effective: one index to manage, one set of infrastructure. The risk is implementation bugs: if any code path forgets to apply the tenant filter, data leaks silently. Defense in depth means enforcing the filter at multiple levels: application code, API middleware, and the vector store's native filtering.
  • Physical isolation gives each tenant its own vector index (or even its own database instance). Because there is no shared index, a missing filter cannot leak another tenant's data; the remaining risk is routing a request to the wrong tenant's index. The cost is operational complexity: provisioning, scaling, and managing potentially thousands of separate indices. For most teams, physical isolation is reserved for the highest-sensitivity tenants (healthcare, government, financial services) while query-time filtering handles the rest.
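
As a rough illustration of query-time filtering, here is a minimal in-memory sketch. The `Chunk` and `SharedIndex` names, the two-dimensional embeddings, and the sample documents are all hypothetical; a real deployment would rely on the vector store's native metadata filter rather than a Python list. The point it tries to show is that the tenant identifier is a required argument of the search call, so no caller can skip the filter by accident.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SharedIndex:
    """Toy shared index (hypothetical): every chunk carries a tenant_id."""

    def __init__(self):
        self._chunks = []

    def add(self, chunk):
        self._chunks.append(chunk)

    def search(self, query, tenant_id, k=3):
        # tenant_id has no default value, and empty values are rejected,
        # so every search is scoped to exactly one tenant.
        if not tenant_id:
            raise ValueError("tenant_id is required for every search")
        # Filter first, then rank: other tenants' chunks are never candidates.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        candidates.sort(key=lambda c: cosine(query, c.embedding), reverse=True)
        return candidates[:k]


index = SharedIndex()
index.add(Chunk("tenant_a", "A pricing memo", (0.9, 0.1)))
index.add(Chunk("tenant_b", "B pricing memo", (0.91, 0.09)))  # near-duplicate in vector space
index.add(Chunk("tenant_b", "B holiday schedule", (0.1, 0.9)))

results = index.search((0.9, 0.1), tenant_id="tenant_a")
print([c.text for c in results])
```

The two pricing memos sit almost on top of each other in vector space, which is the situation where an unfiltered nearest-neighbor search would cross tenants. Under physical isolation, the filter step would instead become a lookup of the tenant's own index.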

Graph overlays beyond tenant ID

Tenant-level isolation is the baseline, but production systems need finer-grained knowledge boundaries. Consider a support portal that should read only public docs, while internal employees can read public docs plus transcripts, customer issues, and engineering notes.

TypeGraph models this with graph overlays. The public graph is the default. A child graph such as internal can extend public, so internal reads include both internal and public knowledge while public reads never include internal records. Buckets route writes to graphs, and context identifies the actor for graph access checks.
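
The overlay idea can be sketched in a few lines. This is not TypeGraph's API: the `Graph` class, the `BUCKETS` mapping, the `read` and `write` helpers, and the shape of the context dictionary are all hypothetical names chosen for illustration. The sketch assumes that a read against a graph also walks up its `extends` chain, and that the context lists the graphs an actor may access.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Graph:
    name: str
    extends: Optional["Graph"] = None  # parent graph whose records are also readable
    records: list = field(default_factory=list)


def readable_graphs(graph):
    """Return the graph followed by every ancestor it extends."""
    chain, seen = [], set()
    while graph is not None and graph.name not in seen:  # guard against cycles
        chain.append(graph)
        seen.add(graph.name)
        graph = graph.extends
    return chain


def read(context, graph):
    """Read a graph plus its ancestors, if the actor may access that graph."""
    if graph.name not in context["graphs"]:
        raise PermissionError(f"{context['actor']} cannot read graph {graph.name!r}")
    return [record for g in readable_graphs(graph) for record in g.records]


public = Graph("public")
internal = Graph("internal", extends=public)

# Hypothetical bucket-to-graph routing: a write lands in the graph its bucket names.
BUCKETS = {"docs": public, "transcripts": internal, "engineering-notes": internal}


def write(bucket, record):
    BUCKETS[bucket].records.append(record)


write("docs", "How to reset a password")
write("transcripts", "Call with customer about outage")

portal = {"actor": "support-portal", "graphs": {"public"}}
employee = {"actor": "employee-17", "graphs": {"public", "internal"}}

portal_view = read(portal, public)        # public records only
employee_view = read(employee, internal)  # internal records plus public records
```

Inheritance runs in one direction only: `internal` extends `public`, so nothing in `public` ever points at internal records, and the portal's attempt to read `internal` is rejected by the context check.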

Declarative policy rules for RAG

Hard-coding access control logic into your retrieval pipeline is fragile and hard to audit. A better approach is declarative policies: structured rules that define who can access what, evaluated at query time by a policy engine.

A policy might specify: "Agents in the 'support' role can read memories tagged 'customer-facing' for their assigned tenant. Agents in the 'engineering' role can read all memories for their tenant. No agent can read memories belonging to a different tenant." These rules are stored as data, versioned alongside your application, and evaluated consistently across every retrieval path.

The policy engine produces a decision (allow or deny) for every memory access and, critically, logs that decision for audit purposes. When your compliance team asks "who accessed what," you have a complete, machine-readable trail. The underlying model, in which access decisions are computed from attributes of the subject, the resource, and the environment, is described in NIST SP 800-162, the Guide to Attribute Based Access Control (ABAC). It does not address RAG specifically, but it provides a solid theoretical foundation for RAG authorization models.
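
The example policy above could be expressed as data along these lines. This is a minimal sketch, not a specific policy engine's format: `Rule`, `PolicyEngine`, and the dictionary shapes for agents and memories are hypothetical. It assumes the tenant check runs before any rule and that every decision, allow or deny, is appended to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Rule:
    role: str
    required_tag: Optional[str] = None  # None means any tag is acceptable


@dataclass
class PolicyEngine:
    rules: list
    audit_log: list = field(default_factory=list)

    @staticmethod
    def _matches(rule, agent, memory):
        return rule.role == agent["role"] and (
            rule.required_tag is None or rule.required_tag in memory["tags"]
        )

    def decide(self, agent, memory):
        # The tenant boundary is checked first; no rule can override it.
        if agent["tenant"] != memory["tenant"]:
            decision, reason = "deny", "cross-tenant access"
        elif any(self._matches(rule, agent, memory) for rule in self.rules):
            decision, reason = "allow", "matched rule"
        else:
            decision, reason = "deny", "no matching rule"
        # Log every decision, not just denials, so the trail is complete.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent["id"],
            "memory": memory["id"],
            "decision": decision,
            "reason": reason,
        })
        return decision


engine = PolicyEngine(rules=[Rule("support", "customer-facing"), Rule("engineering")])

support = {"id": "agent-1", "role": "support", "tenant": "acme"}
engineer = {"id": "agent-2", "role": "engineering", "tenant": "acme"}
faq = {"id": "mem-1", "tenant": "acme", "tags": {"customer-facing"}}
notes = {"id": "mem-2", "tenant": "acme", "tags": {"internal"}}
foreign = {"id": "mem-3", "tenant": "globex", "tags": {"customer-facing"}}

support_decisions = [engine.decide(support, m) for m in (faq, notes, foreign)]
engineer_decisions = [engine.decide(engineer, m) for m in (faq, notes, foreign)]
```

Because the rules are plain data, they can live in version control next to the application and be reviewed like any other change, while the decision function stays small enough to audit.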

Common pitfalls in multi-tenant RAG

  • Embedding model leakage: If you're using a shared embedding model that was fine-tuned on one tenant's data, the model itself may encode tenant-specific knowledge. Use general-purpose embedding models for multi-tenant deployments, and only fine-tune per-tenant if you maintain separate models.
  • Cache poisoning: If your retrieval layer caches results, ensure the cache key includes the tenant context. An entry filled with Tenant A's results must never be served to Tenant B, even when both tenants submit the identical query.
  • LLM context contamination: Even with perfect retrieval isolation, if your LLM maintains conversation state across tenants (e.g., a shared inference server with session affinity bugs), context from one tenant's conversation can leak into another's. Ensure stateless inference or strict session isolation.
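
For the caching pitfall in particular, a minimal sketch of a tenant-scoped cache might look like the following. `TenantScopedCache` is a hypothetical name, and the in-memory dictionary stands in for whatever cache the retrieval layer actually uses. The only idea it demonstrates is that the tenant identifier is part of the key, so identical queries from different tenants never share an entry.

```python
import hashlib


class TenantScopedCache:
    """Toy retrieval cache (hypothetical) whose keys always include the tenant."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tenant_id, query):
        if not tenant_id:
            raise ValueError("tenant_id is required to build a cache key")
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        # A tuple key keeps tenant and query separate, so no string
        # concatenation of the two can collide across tenants.
        return (tenant_id, digest)

    def get(self, tenant_id, query):
        return self._store.get(self._key(tenant_id, query))

    def put(self, tenant_id, query, results):
        self._store[self._key(tenant_id, query)] = list(results)


cache = TenantScopedCache()
cache.put("tenant_a", "pricing policy", ["A pricing memo"])

hit_for_a = cache.get("tenant_a", "pricing policy")   # Tenant A sees its own entry
miss_for_b = cache.get("tenant_b", "pricing policy")  # Tenant B gets a miss, not A's data
```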

We passed our SOC 2 audit for the RAG feature specifically because we could demonstrate tenant isolation at the retrieval layer, policy-level access control, and a complete audit trail of every memory access. Without those three pieces, our auditor told us the feature would have been a finding.

Building tenant isolation from day one

Retrofitting multi-tenant isolation onto an existing RAG system is painful. If you're building a B2B product with RAG capabilities, design for tenant isolation and graph overlays from the start. At TypeGraph, tenant scoping, graph access, bucket write routing, and policy governance are built into the retrieval layer. This means you can ship RAG features to enterprise customers without a months-long security hardening sprint.
