How to Set Up RAG on Vercel and Neon (in 10 minutes)
If you're shipping an AI app on Next.js, the boring-but-correct stack for RAG is Neon Postgres + pgvector + Vercel AI SDK. Serverless, no extra vector DB, no new bills. Here's the whole thing in five steps.
1. Provision Neon and enable pgvector
Create a project at neon.tech, then in the SQL editor:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text NOT NULL,
  embedding vector(1536) NOT NULL,
  created_at timestamptz DEFAULT now()
);

-- approximate nearest-neighbor index for cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
HNSW is the fast approximate index: it trades a little recall for much faster search, and pgvector's build defaults (m = 16, ef_construction = 64) are fine to start with. Use 1536 dimensions for OpenAI's text-embedding-3-small; change the vector size if you pick another model.
2. Install dependencies
pnpm add ai @ai-sdk/openai @neondatabase/serverless
Add DATABASE_URL and OPENAI_API_KEY to your Vercel project env vars.
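For local development, vercel env pull .env.local fetches the same values from Vercel, or write the file by hand (placeholder values shown):

DATABASE_URL=postgresql://user:password@your-project-host.neon.tech/neondb?sslmode=require
OPENAI_API_KEY=sk-your-key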
3. Ingestion
Chunk your documents (a paragraph splitter works for most cases - a sketch follows the code below - and see our context window assembly guide for advanced strategies), then embed and insert:
import { embedMany } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function ingest(chunks: string[]) {
  // one API call embeds the whole batch
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  })
  // pgvector accepts a '[0.1,0.2,...]' string, which JSON.stringify produces
  await Promise.all(
    chunks.map((content, i) =>
      sql`INSERT INTO documents (content, embedding)
          VALUES (${content}, ${JSON.stringify(embeddings[i])})`
    )
  )
}
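The splitter itself can stay naive. Here's a sketch - the 1,000-character cap is an assumption, so tune it to your content and embedding model:

// lib/chunk.ts - greedy paragraph packer (sketch)
export function chunk(text: string, maxChars = 1000): string[] {
  const chunks: string[] = []
  let current = ''
  for (const para of text.split(/\n\s*\n/)) {
    // flush the current chunk before this paragraph would overflow the cap
    if (current && current.length + para.length > maxChars) {
      chunks.push(current.trim())
      current = ''
    }
    current += para + '\n\n'
  }
  if (current.trim()) chunks.push(current.trim())
  return chunks
}

Ingestion then becomes await ingest(chunk(rawText)).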
4. Retrieval
Embed the query, run a cosine-distance search (<=>), grab the top K:
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function retrieve(query: string, k = 5) {
  // the query must use the same embedding model as the documents
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  })
  // <=> is cosine distance: ascending order means most similar first
  return sql`SELECT content
             FROM documents
             ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
             LIMIT ${k}`
}
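HNSW is approximate, so if retrieval misses documents you know are in the table, you can raise pgvector's hnsw.ef_search (default 40) at query time. Neon's HTTP driver keeps no session state, so the setting has to ride in the same transaction as the query; a sketch of what the query inside retrieve would become:

// higher ef_search = better recall, slower queries (sketch)
const [, rows] = await sql.transaction([
  sql`SET LOCAL hnsw.ef_search = 100`,
  sql`SELECT content
      FROM documents
      ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
      LIMIT ${k}`,
])
return rows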
5. Wire it into a chat route
In app/api/chat/route.ts:
import { streamText, convertToModelMessages, type UIMessage } from 'ai'
import { openai } from '@ai-sdk/openai'
import { retrieve } from '@/lib/rag'

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json()

  // UI messages carry their text in parts, not a content string
  const question =
    messages
      .at(-1)
      ?.parts.map((p) => (p.type === 'text' ? p.text : ''))
      .join('') ?? ''

  const docs = await retrieve(question)
  const context = docs.map((d) => d.content).join('\n---\n')

  const result = streamText({
    model: openai('gpt-4o'),
    system: `Answer using ONLY the context below.\n\nCONTEXT:\n${context}`,
    messages: convertToModelMessages(messages),
  })
  return result.toUIMessageStreamResponse()
}
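On the client, the stream plugs straight into useChat from @ai-sdk/react (a separate install: pnpm add @ai-sdk/react). A minimal page, sketched:

'use client'
import { useChat } from '@ai-sdk/react'
import { useState } from 'react'

export default function Chat() {
  // the default transport posts to /api/chat
  const { messages, sendMessage } = useChat()
  const [input, setInput] = useState('')
  return (
    <>
      {messages.map((m) => (
        <p key={m.id}>
          {m.role}: {m.parts.map((p) => (p.type === 'text' ? p.text : '')).join('')}
        </p>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault()
          sendMessage({ text: input })
          setInput('')
        }}
      >
        <input value={input} onChange={(e) => setInput(e.target.value)} />
      </form>
    </>
  )
}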
Deploy. That's a working RAG app.
What's next
This minimal setup will carry you through a prototype. Before production:
- Evaluate it. Track Recall@K, MRR, and end-to-end answer quality. See our evaluation guide.
- Tune chunking and context assembly. Top-K alone leaves quality on the table - see context window assembly.
- Add a graph layer for multi-hop questions. Vector RAG can't answer "which of Alice's projects use the same dependency as Bob's?" Read our Graph RAG on Vercel + Neon guide next.
- Compare frameworks before scaling up. See the 5 best open source Graph RAG tools.
If you'd rather not assemble all this by hand, TypeGraph wraps ingestion, retrieval, evaluation, and a graph layer behind a TypeScript SDK that runs on the same Neon instance.