How to Set Up Graph RAG on Vercel and Neon
Vanilla vector RAG breaks down on questions whose answers span multiple documents - "which of our customers in healthcare use the same vendor as Acme?" Vector similarity finds the closest chunks, but it doesn't connect them.
Graph RAG fixes this by extracting entities and relations from your corpus into a knowledge graph and expanding retrieval through the graph. Here's how to do it on Neon Postgres + Vercel AI SDK, with no separate graph database.
If you haven't shipped vanilla RAG yet, start with How to Set Up RAG on Vercel and Neon first - this guide builds on that schema.
1. Schema: graph beside the vectors
Same Neon database: the documents table you already have from the vanilla RAG setup, plus three new tables for the graph:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)
);

CREATE TABLE entities (
  id bigserial PRIMARY KEY,
  name text NOT NULL,
  type text NOT NULL,
  UNIQUE (name, type)
);

CREATE TABLE relations (
  subject_id bigint REFERENCES entities(id),
  predicate text NOT NULL,
  object_id bigint REFERENCES entities(id),
  PRIMARY KEY (subject_id, predicate, object_id)
);

CREATE TABLE mentions (
  doc_id bigint REFERENCES documents(id),
  entity_id bigint REFERENCES entities(id),
  PRIMARY KEY (doc_id, entity_id)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
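To see how one fact lands in these tables, consider the hypothetical sentence "Acme uses VendorX" appearing in document 1. It becomes two entity rows, one relation row, and two mention rows; a plain-TypeScript sketch of the same row shapes (the ids are illustrative, not real):

```typescript
// Row shapes mirroring the tables above; ids are made up for illustration.
type Entity = { id: number; name: string; type: string }
type Relation = { subjectId: number; predicate: string; objectId: number }
type Mention = { docId: number; entityId: number }

// "Acme uses VendorX" (found in document 1) as graph rows:
const entities: Entity[] = [
  { id: 1, name: 'Acme', type: 'company' },
  { id: 2, name: 'VendorX', type: 'vendor' },
]
const relations: Relation[] = [{ subjectId: 1, predicate: 'uses', objectId: 2 }]
const mentions: Mention[] = [
  { docId: 1, entityId: 1 },
  { docId: 1, entityId: 2 },
]
```

The mentions table is what lets retrieval hop from a chunk to its entities and back out to other chunks.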
2. Extract entities and relations
Use generateObject from the AI SDK to extract typed (subject, predicate, object) triples per chunk. Production-grade extraction also needs entity resolution so "Acme Corp" and "Acme" map to the same node:
```typescript
import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

const Schema = z.object({
  triples: z.array(
    z.object({
      subject: z.string(),
      subjectType: z.string(),
      predicate: z.string(),
      object: z.string(),
      objectType: z.string(),
    }),
  ),
})

// Shared triple type, reused when persisting in step 3
export type Triple = z.infer<typeof Schema>['triples'][number]

export async function extract(text: string): Promise<Triple[]> {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: Schema,
    prompt: `Extract entities and relations as (subject, predicate, object) triples from:\n\n${text}`,
  })
  return object.triples
}
```
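The section above flags entity resolution as a requirement. A minimal normalization pass catches the easy surface-form mismatches like "Acme Corp" vs "Acme" before upserting; the helper name and suffix list here are my own sketch, not part of the AI SDK, and real resolution usually needs embedding similarity or an LLM judge on top:

```typescript
// Hypothetical helper: canonicalize entity names before upserting,
// so trivially different surface forms collapse onto one graph node.
const CORPORATE_SUFFIX = /\s+(inc|corp|llc|ltd|gmbh)\.?$/i

export function normalizeEntity(name: string): string {
  return name
    .trim()
    .replace(/\s+/g, ' ')       // collapse internal whitespace
    .replace(CORPORATE_SUFFIX, '') // strip common legal suffixes
}
```

Call it on both `subject` and `object` before `upsertEntity` so the UNIQUE (name, type) constraint does the rest.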
3. Persist the graph
Upsert entities, then relations, then mentions linking the doc to its entities:
```typescript
import { neon } from '@neondatabase/serverless'
// Triple comes from the extraction step; adjust the path to your project
import type { Triple } from './extract'

const sql = neon(process.env.DATABASE_URL!)

async function upsertEntity(name: string, type: string) {
  // The no-op DO UPDATE makes RETURNING work on conflict too
  const [{ id }] = await sql`
    INSERT INTO entities (name, type) VALUES (${name}, ${type})
    ON CONFLICT (name, type) DO UPDATE SET name = EXCLUDED.name
    RETURNING id`
  return id as number
}

export async function persist(docId: number, triples: Triple[]) {
  for (const t of triples) {
    const s = await upsertEntity(t.subject, t.subjectType)
    const o = await upsertEntity(t.object, t.objectType)
    await sql`INSERT INTO relations VALUES (${s}, ${t.predicate}, ${o})
      ON CONFLICT DO NOTHING`
    await sql`INSERT INTO mentions VALUES (${docId}, ${s})
      ON CONFLICT DO NOTHING`
    await sql`INSERT INTO mentions VALUES (${docId}, ${o})
      ON CONFLICT DO NOTHING`
  }
}
```
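Extraction often emits the same triple more than once per chunk; ON CONFLICT absorbs the duplicates, but filtering them in memory first saves database round trips. A small sketch (the helper name and inline Triple type are mine):

```typescript
type Triple = {
  subject: string
  subjectType: string
  predicate: string
  object: string
  objectType: string
}

// Drop exact-duplicate triples before hitting the database.
export function dedupeTriples(triples: Triple[]): Triple[] {
  const seen = new Set<string>()
  return triples.filter((t) => {
    const key = `${t.subject}|${t.predicate}|${t.object}`
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}
```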
4. Hybrid retrieval: vectors + graph expansion
The trick: retrieve top-K by vector similarity, then pull in 1-hop neighbors of any entity mentioned in those chunks. This is what makes multi-hop reasoning work:
```typescript
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'
// `sql` is the Neon serverless client (see the vanilla RAG guide)

export async function graphRetrieve(query: string, k = 5) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  })
  const vec = JSON.stringify(embedding)

  // Seed: top-K chunks by cosine distance
  const seedDocs = await sql`
    SELECT id, content FROM documents
    ORDER BY embedding <=> ${vec} LIMIT ${k}`
  const ids = seedDocs.map((d) => d.id)

  // Expand: docs mentioning any 1-hop neighbor of the seed chunks' entities
  const expanded = await sql`
    SELECT DISTINCT d.content
    FROM mentions m1
    JOIN relations r ON r.subject_id = m1.entity_id OR r.object_id = m1.entity_id
    JOIN mentions m2 ON m2.entity_id IN (r.subject_id, r.object_id)
    JOIN documents d ON d.id = m2.doc_id
    WHERE m1.doc_id = ANY(${ids}) AND d.id <> ALL(${ids})
    LIMIT 5`

  return [...seedDocs, ...expanded]
}
```
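The three-way join is easier to audit against a tiny in-memory model. This pure-TypeScript sketch (all names hypothetical) replicates the same logic: collect the seed docs' entities, walk one hop along relations touching them, then return other docs that mention the neighborhood:

```typescript
type Rel = { subject: number; object: number }
type Mention = { docId: number; entityId: number }

// Pure re-implementation of the 1-hop expansion in graphRetrieve's SQL.
export function oneHopDocs(seedDocIds: number[], mentions: Mention[], relations: Rel[]): number[] {
  // Entities mentioned in the seed docs (mirrors m1)
  const seedEntities = new Set(
    mentions.filter((m) => seedDocIds.includes(m.docId)).map((m) => m.entityId),
  )
  // Both endpoints of any relation touching a seed entity (mirrors the relations join)
  const hood = new Set<number>()
  for (const r of relations) {
    if (seedEntities.has(r.subject) || seedEntities.has(r.object)) {
      hood.add(r.subject)
      hood.add(r.object)
    }
  }
  // Other docs mentioning the neighborhood (mirrors m2 + the <> ALL filter)
  return [...new Set(
    mentions
      .filter((m) => hood.has(m.entityId) && !seedDocIds.includes(m.docId))
      .map((m) => m.docId),
  )]
}
```

Note that entities with no relations contribute nothing, matching the SQL: the relations join filters them out.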
5. Wire it into the chat route
Same as vanilla RAG, just swap retrieve for graphRetrieve:
```typescript
const docs = await graphRetrieve(lastMessage)
const context = docs.map((d) => d.content).join('\n---\n')

const result = streamText({
  model: openai('gpt-4o'),
  system: `Answer using ONLY the context.\n\n${context}`,
  messages: convertToModelMessages(messages),
})
```
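Graph expansion can balloon the context, since every seed chunk can drag in neighbors. A simple character budget keeps the prompt bounded; the helper name and the default limit are my own assumptions, not tuned values:

```typescript
// Hypothetical guard: cap total context size before it reaches the prompt.
export function buildContext(chunks: string[], maxChars = 8000): string {
  const out: string[] = []
  let used = 0
  for (const c of chunks) {
    if (used + c.length > maxChars) break
    out.push(c)
    used += c.length + 5 // account for the '\n---\n' separator
  }
  return out.join('\n---\n')
}
```

Swap it in for the plain `join` above; a token-based budget via your model's tokenizer is the more precise version of the same idea.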
This is the minimum viable version
Production Graph RAG also needs entity resolution, schema constraints on predicates, contradiction handling for facts that change over time, and good evals. Each of those is a project on its own.
If you'd rather not own all of that, TypeGraph ships entity resolution, contradiction detection, and graph-augmented retrieval as a TypeScript SDK that runs on the same Neon Postgres - and it's MIT licensed.
Comparing options? See the 5 best open source Graph RAG tools.