How to Set Up RAG on Vercel and Neon (in 10 minutes)
If you're shipping an AI app on Next.js, the boring-but-correct stack for RAG is Neon Postgres + pgvector + Vercel AI SDK. Serverless, no extra vector DB, no new bills. Here's the whole thing in five steps.
1. Provision Neon and enable pgvector
Create a project at neon.tech, then in the SQL editor:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text NOT NULL,
  embedding vector(1536) NOT NULL,
  created_at timestamptz DEFAULT now()
);

-- approximate nearest-neighbor index for cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
HNSW is the fast approximate index: it trades a little recall for much faster search, and pgvector's build defaults (m = 16, ef_construction = 64) are fine to start with. Use 1536 dimensions for OpenAI's text-embedding-3-small; change the vector size if you pick another model.
2. Install dependencies
pnpm add ai @ai-sdk/openai @neondatabase/serverless
Add DATABASE_URL and OPENAI_API_KEY to your Vercel project env vars.
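For local development, vercel env pull .env.local fetches the same values from Vercel, or write the file by hand (placeholder values shown):

DATABASE_URL=postgresql://user:password@your-project-host.neon.tech/neondb?sslmode=require
OPENAI_API_KEY=sk-your-key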
3. Ingestion
Chunk your documents (a paragraph splitter works for most cases - a sketch follows the code below - and see our context window assembly guide for advanced strategies), then embed and insert:
import { embedMany } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function ingest(chunks: string[]) {
  // one API call embeds the whole batch
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  })
  // pgvector accepts a '[0.1,0.2,...]' string, which JSON.stringify produces
  await Promise.all(
    chunks.map((content, i) =>
      sql`INSERT INTO documents (content, embedding)
          VALUES (${content}, ${JSON.stringify(embeddings[i])})`
    )
  )
}
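The splitter itself can stay naive. Here's a sketch - the 1,000-character cap is an assumption, so tune it to your content and embedding model:

// lib/chunk.ts - greedy paragraph packer (sketch)
export function chunk(text: string, maxChars = 1000): string[] {
  const chunks: string[] = []
  let current = ''
  for (const para of text.split(/\n\s*\n/)) {
    // flush the current chunk before this paragraph would overflow the cap
    if (current && current.length + para.length > maxChars) {
      chunks.push(current.trim())
      current = ''
    }
    current += para + '\n\n'
  }
  if (current.trim()) chunks.push(current.trim())
  return chunks
}

Ingestion then becomes await ingest(chunk(rawText)).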
4. Retrieval
Embed the query, run a cosine-distance search (<=>), grab the top K:
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function retrieve(query: string, k = 5) {
  // the query must use the same embedding model as the documents
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  })
  // <=> is cosine distance: ascending order means most similar first
  return sql`SELECT content
             FROM documents
             ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
             LIMIT ${k}`
}
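HNSW is approximate, so if retrieval misses documents you know are in the table, you can raise pgvector's hnsw.ef_search (default 40) at query time. Neon's HTTP driver keeps no session state, so the setting has to ride in the same transaction as the query; a sketch of what the query inside retrieve would become:

// higher ef_search = better recall, slower queries (sketch)
const [, rows] = await sql.transaction([
  sql`SET LOCAL hnsw.ef_search = 100`,
  sql`SELECT content
      FROM documents
      ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
      LIMIT ${k}`,
])
return rows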
5. Wire it into a chat route
In app/api/chat/route.ts:
import { streamText, convertToModelMessages, type UIMessage } from 'ai'
import { openai } from '@ai-sdk/openai'
import { retrieve } from '@/lib/rag'

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json()

  // UI messages carry their text in parts, not a content string
  const question =
    messages
      .at(-1)
      ?.parts.map((p) => (p.type === 'text' ? p.text : ''))
      .join('') ?? ''

  const docs = await retrieve(question)
  const context = docs.map((d) => d.content).join('\n---\n')

  const result = streamText({
    model: openai('gpt-4o'),
    system: `Answer using ONLY the context below.\n\nCONTEXT:\n${context}`,
    messages: convertToModelMessages(messages),
  })
  return result.toUIMessageStreamResponse()
}
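On the client, the stream plugs straight into useChat from @ai-sdk/react (a separate install: pnpm add @ai-sdk/react). A minimal page, sketched:

'use client'
import { useChat } from '@ai-sdk/react'
import { useState } from 'react'

export default function Chat() {
  // the default transport posts to /api/chat
  const { messages, sendMessage } = useChat()
  const [input, setInput] = useState('')
  return (
    <>
      {messages.map((m) => (
        <p key={m.id}>
          {m.role}: {m.parts.map((p) => (p.type === 'text' ? p.text : '')).join('')}
        </p>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault()
          sendMessage({ text: input })
          setInput('')
        }}
      >
        <input value={input} onChange={(e) => setInput(e.target.value)} />
      </form>
    </>
  )
}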
Deploy. That's a working RAG app.
What's next
This minimal setup will carry you through a prototype. Before production:
- Evaluate it. Track Recall@K, MRR, and end-to-end answer quality. See our evaluation guide.
- Tune chunking and context assembly. Top-K alone leaves quality on the table - see context window assembly.
- Add a graph layer for multi-hop questions. Vector RAG can't answer "which of Alice's projects use the same dependency as Bob's?" Read our Graph RAG on Vercel + Neon guide next.
- Compare frameworks before scaling up. See the 5 best open source Graph RAG tools.
If you'd rather not assemble all this by hand, TypeGraph wraps ingestion, retrieval, evaluation, and a graph layer behind a TypeScript SDK that runs on the same Neon instance.