
How to Set Up RAG on Vercel and Neon (in 10 minutes)

Ryan Musser
Founder

If you're shipping an AI app on Next.js, the boring-but-correct stack for RAG is Neon Postgres + pgvector + Vercel AI SDK. Serverless, no extra vector DB, no new bills. Here's the whole thing in five steps.

1. Provision Neon and enable pgvector

Create a project at neon.tech, then in the SQL editor:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id           bigserial PRIMARY KEY,
  content      text NOT NULL,
  embedding    vector(1536) NOT NULL,
  created_at   timestamptz DEFAULT now()
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

HNSW is the fast approximate nearest-neighbor index. Use 1536 dimensions for OpenAI's text-embedding-3-small; change the dimension if you pick another model.
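If recall matters more than build speed later on, the index can be tuned at build time. Here's the same statement with pgvector's documented defaults spelled out; raising m and ef_construction trades build time and memory for recall:

-- Same index, HNSW build parameters made explicit (these are pgvector's defaults).
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);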

2. Install dependencies

pnpm add ai @ai-sdk/openai @neondatabase/serverless

Add DATABASE_URL and OPENAI_API_KEY to your Vercel project env vars.
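For local development, the same two variables go in .env.local (placeholder values; substitute your own Neon connection string and OpenAI key):

DATABASE_URL=postgres://user:password@your-endpoint.neon.tech/neondb?sslmode=require
OPENAI_API_KEY=sk-...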

3. Ingestion

Chunk your documents first. A paragraph splitter works for most cases; see our context window assembly guide for advanced strategies.
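As a rough sketch, a naive paragraph splitter might look like this (the ~1,000-character cap is an arbitrary assumption, not a tuned value):

// Split on blank lines, then merge paragraphs until a chunk nears maxLen.
export function chunk(text: string, maxLen = 1000): string[] {
  const paragraphs = text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter(Boolean)

  const chunks: string[] = []
  let current = ''
  for (const p of paragraphs) {
    if (current && current.length + p.length > maxLen) {
      chunks.push(current)
      current = p
    } else {
      current = current ? `${current}\n\n${p}` : p
    }
  }
  if (current) chunks.push(current)
  return chunks
}

Then embed and insert the chunks: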

import { embedMany } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function ingest(chunks: string[]) {
  // One batched call embeds every chunk; results come back in input order.
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  })

  // pgvector accepts a JSON array literal ('[0.1, ...]') for vector columns.
  await Promise.all(
    chunks.map((content, i) =>
      sql`INSERT INTO documents (content, embedding)
          VALUES (${content}, ${JSON.stringify(embeddings[i])})`
    )
  )
}
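To run ingestion once, a hypothetical script (the file name and module paths are assumptions; npx tsx handles the top-level await):

import { readFile } from 'node:fs/promises'
import { chunk } from './chunk'   // the splitter sketched above
import { ingest } from './ingest'

const text = await readFile('./docs/handbook.txt', 'utf8')
await ingest(chunk(text))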

4. Retrieval

Embed the query, run a cosine-distance search (<=>), grab the top K:

import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'
import { neon } from '@neondatabase/serverless'

const sql = neon(process.env.DATABASE_URL!)

export async function retrieve(query: string, k = 5) {
  // Embed the query with the same model used at ingestion time.
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  })

  // <=> is cosine distance: smaller means more similar.
  return sql`SELECT content
             FROM documents
             ORDER BY embedding <=> ${JSON.stringify(embedding)}
             LIMIT ${k}`
}
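Sometimes you'd rather drop weak matches than always return k rows. A variant for the same module that filters on cosine similarity (1 minus the <=> distance); the 0.3 cutoff is a made-up starting point, not a tuned value:

export async function retrieveWithThreshold(
  query: string,
  k = 5,
  minSimilarity = 0.3
) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  })
  const vector = JSON.stringify(embedding)

  // similarity = 1 - cosine distance; higher is better.
  return sql`SELECT content, 1 - (embedding <=> ${vector}) AS similarity
             FROM documents
             WHERE 1 - (embedding <=> ${vector}) > ${minSimilarity}
             ORDER BY embedding <=> ${vector}
             LIMIT ${k}`
}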

5. Wire it into a chat route

In app/api/chat/route.ts:

import { streamText, convertToModelMessages, type UIMessage } from 'ai'
import { openai } from '@ai-sdk/openai'
import { retrieve } from '@/lib/rag'

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json()

  // UIMessages carry their text in parts, not a content string.
  const last = messages.at(-1)
  const question =
    last?.parts.map((p) => (p.type === 'text' ? p.text : '')).join('') ?? ''

  const docs = await retrieve(question)
  const context = docs.map((d) => d.content).join('\n---\n')

  const result = streamText({
    model: openai('gpt-4o'),
    system: `Answer using ONLY the context below.\n\nCONTEXT:\n${context}`,
    messages: convertToModelMessages(messages),
  })

  return result.toUIMessageStreamResponse()
}
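On the client, the AI SDK's useChat hook posts to /api/chat by default. A minimal page as a sketch, assuming you also install @ai-sdk/react (pnpm add @ai-sdk/react) and put this in app/page.tsx:

'use client'

import { useChat } from '@ai-sdk/react'
import { useState } from 'react'

export default function Chat() {
  const { messages, sendMessage } = useChat()
  const [input, setInput] = useState('')

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role}: {m.parts.map((p) => (p.type === 'text' ? p.text : '')).join('')}
        </div>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault()
          sendMessage({ text: input })   // hits /api/chat by default
          setInput('')
        }}
      >
        <input value={input} onChange={(e) => setInput(e.target.value)} />
      </form>
    </div>
  )
}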

Deploy. That's a working RAG app.

What's next

This minimal setup will carry you through a prototype. Before production, you'll want to revisit chunking strategy, retrieval quality, and evaluation.

If you'd rather not assemble all this by hand, TypeGraph wraps ingestion, retrieval, evaluation, and a graph layer behind a TypeScript SDK that runs on the same Neon instance.
