
What We Learned Testing Embedding Dimensions and pgvector halfvec for RAG

Ryan Musser
Founder

Our testing shows that halfvec beats vector when using pgvector on Postgres because it cuts raw vector storage roughly in half while preserving retrieval quality. Embedding dimension size is a quality-cost tradeoff: 1024 dimensions can significantly improve hard retrieval tasks, while 512 dimensions is cheaper and sufficient for basic use cases.

Here is the short version from three legal retrieval probes. Read each table left to right: nDCG@10 and Recall@10 measure retrieval quality, and time is wall-clock semantic search time for that benchmark's query set. Higher quality is better; lower time is better.

License TL;DR Retrieval

Configuration | Storage | nDCG@10 | Recall@10 | Time
512 dims, Large ingest + Lite search | vector | 0.7362 | 0.9231 | 5.30s
512 dims, Large ingest + Large search | vector | 0.8101 | 0.9385 | 5.26s
1024 dims, Large ingest + Large search | vector | 0.8066 | 0.9385 | 8.05s
1024 dims, Large ingest + Large search | halfvec | 0.8038 | 0.9385 | 5.69s

Contractual Clause Retrieval

Configuration | Storage | nDCG@10 | Recall@10 | Time
512 dims, Large ingest + Lite search | vector | 0.8929 | 0.9444 | 3.85s
512 dims, Large ingest + Large search | vector | 0.9167 | 0.9667 | 3.84s
1024 dims, Large ingest + Large search | vector | 0.9305 | 0.9778 | 3.81s
1024 dims, Large ingest + Large search | halfvec | 0.9287 | 0.9778 | 3.94s

Legal RAG Bench

Configuration | Storage | nDCG@10 | Recall@10 | Time
512 dims, Large ingest + Lite search | vector | 0.4307 | 0.6900 | 8.84s
512 dims, Large ingest + Large search | vector | 0.5969 | 0.8700 | 8.16s
1024 dims, Large ingest + Large search | vector | 0.6550 | 0.9100 | 9.35s
1024 dims, Large ingest + Large search | halfvec | 0.6580 | 0.9200 | 9.18s

The important pattern is not that halfvec directly improves quality. Treat tiny differences as normal run variance. The useful finding is that halfvec kept quality in the same range while cutting HNSW index footprint sharply.

Storage layout | Approx. raw vector bytes | Practical read
512 dims, vector | ~2 KB per embedding | Smaller and often strong enough for simpler corpora.
1024 dims, vector | ~4 KB per embedding | Higher recall potential, but roughly doubles raw vector storage.
1024 dims, halfvec | ~2 KB per embedding | Preserves the 1024-dimensional representation with about half the raw storage.

This storage math matters because HNSW search is fastest when the index stays hot in RAM. Once the index grows beyond your Postgres or Neon compute memory budget, cache hit rates fall and p95 latency gets harder to control. Smaller vectors usually mean smaller indexes, which lets you fit more chunks, tenants, or corpora on the same database tier before scaling up.
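The arithmetic behind that table is easy to sanity-check. This sketch computes only the raw vector bytes; real tables add per-row overhead, page headers, and the HNSW graph itself on top, so treat the totals as lower bounds:

```python
# Raw vector bytes per embedding: dimensions * bytes per element.
# float32 (pgvector's `vector`) is 4 bytes/dim; float16 (`halfvec`) is 2 bytes/dim.
BYTES_PER_DIM = {"vector": 4, "halfvec": 2}

def raw_vector_bytes(dims: int, storage: str) -> int:
    return dims * BYTES_PER_DIM[storage]

def corpus_vector_gb(n_embeddings: int, dims: int, storage: str) -> float:
    return n_embeddings * raw_vector_bytes(dims, storage) / 1024**3

print(raw_vector_bytes(512, "vector"))    # 2048 bytes, ~2 KB
print(raw_vector_bytes(1024, "vector"))   # 4096 bytes, ~4 KB
print(raw_vector_bytes(1024, "halfvec"))  # 2048 bytes, ~2 KB

# At 10M chunks and 1024 dims, the float32 -> float16 switch halves raw vector data:
print(round(corpus_vector_gb(10_000_000, 1024, "vector"), 2))   # ~38.15 GB
print(round(corpus_vector_gb(10_000_000, 1024, "halfvec"), 2))  # ~19.07 GB
```

That halved figure is what decides whether the HNSW index still fits in the RAM budget of a given database tier.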

What does embedding dimension size change?

Embedding dimensions are the number of values used to represent a chunk, document, or query. A 512-dimensional embedding has 512 numbers; a 1024-dimensional embedding has 1024.

More dimensions can preserve more semantic detail, which helps when retrieval depends on narrow distinctions: legal clauses, compliance language, product terms, or long technical passages. The cost is larger vectors, larger indexes, and more RAM pressure.
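To make "an embedding is just a list of numbers" concrete, here is a minimal cosine-similarity sketch over toy 4-dimensional vectors. Real embeddings have 512 or 1024 values and come from a model, but the comparison math is identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product of the two vectors divided by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.9, 0.0, 0.2]    # toy "query" embedding, 4 dims
chunk_a = [0.1, 0.8, 0.1, 0.3]  # points in nearly the same direction
chunk_b = [0.9, 0.0, 0.4, 0.0]  # points elsewhere

# The semantically closer chunk scores higher, so it ranks first at search time.
print(cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b))  # True
```

Each extra dimension is one more axis along which two passages can differ, which is why higher-dimensional embeddings can separate near-duplicate legal clauses that a smaller space would collapse together.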

Did 1024 dimensions improve retrieval quality?

In our legal retrieval testing, yes. The hardest corpora benefited from 1024-dimensional embeddings with Voyage 4 Large, especially when the retriever needed one precise supporting passage from thousands of similar legal passages. That does not mean 1024 is always right; simpler docs and smaller corpora may do fine at 512.

What is the tradeoff between 512 and 1024 dimensions?

512 dimensions is smaller, cheaper, faster to store, and often good enough. It is a reasonable default for lightweight RAG, general product documentation, and early prototypes.

1024 dimensions costs more storage and index memory, but can improve RAG retrieval quality on precision-sensitive workloads. For legal, compliance, financial, technical, and enterprise search, it is often worth testing first.

2048 dimensions should earn its keep in benchmarks before you pay the cost. 256 dimensions is likely too aggressive for hard legal or enterprise retrieval unless your own evals prove otherwise.

What is pgvector halfvec?

pgvector supports multiple vector storage types. The standard vector type stores float32 values. The halfvec type stores float16 values.

In plain English: halfvec uses less precision per number, so each embedding takes less space. A 1024-dimensional vector is roughly 4 KB before overhead; a 1024-dimensional halfvec is roughly 2 KB. That reduction also matters for HNSW index size, because less vector data needs to stay hot for fast search.
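Python's standard library can demonstrate the width and precision difference directly: struct format "f" is a 4-byte float32 and "e" is a 2-byte float16, the same element widths that vector and halfvec use per dimension.

```python
import struct

value = 0.123456789  # a typical embedding coordinate

f32 = struct.pack("<f", value)  # 4 bytes, like an element of pgvector's `vector`
f16 = struct.pack("<e", value)  # 2 bytes, like an element of `halfvec`

print(len(f32), len(f16))  # 4 2

# Round-trip each format to see how much precision it keeps.
(f32_back,) = struct.unpack("<f", f32)
(f16_back,) = struct.unpack("<e", f16)
print(f32_back)  # ~0.1234568 (float32 keeps roughly 7 significant digits)
print(f16_back)  # ~0.12347   (float16 keeps roughly 3)
```

The benchmark question is whether losing those trailing digits changes nearest-neighbor rankings. In our runs it did not in any way that mattered.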

For teams running pgvector at scale, this is not a theoretical optimization. Vector search storage and HNSW index size directly affect database size, RAM pressure, cache behavior, and serving cost.

Did halfvec hurt retrieval quality?

Not in any meaningful way in our benchmark runs. That was the important practical finding. We could reduce vector storage with halfvec without seeing a retrieval-quality penalty that justified keeping float32 vector as the TypeGraph default.

There is still a caveat: benchmark your own corpus if correctness is sensitive. Retrieval quality depends on the model, the corpus, the chunking strategy, and the query distribution. But based on our tests, halfvec is the better starting point for most pgvector RAG systems.

Why did TypeGraph choose halfvec as the default?

halfvec gives TypeGraph a cleaner starting point: lower storage, lower index pressure, and stable benchmark quality. Builders still choose the embedding model and dimension size that fit their workload, but the database storage default should be boring and cost-aware.
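As a concrete illustration, a halfvec-by-default schema might look like the DDL below. The table and column names are hypothetical, not TypeGraph's actual schema; pgvector 0.7+ provides the halfvec type and halfvec operator classes for HNSW. The sketch just assembles the SQL as strings so the shapes are visible:

```python
# Hypothetical pgvector schema sketch: halfvec column plus an HNSW index.
# Names (chunks, embedding) are illustrative only.
DIMS = 1024  # embedding dimension chosen for the workload

create_table = f"""
CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding halfvec({DIMS})  -- float16 storage, half the bytes of vector({DIMS})
);
""".strip()

# pgvector ships halfvec operator classes for HNSW, e.g. halfvec_cosine_ops
# for cosine distance (the usual choice for text embeddings).
create_index = (
    "CREATE INDEX chunks_embedding_idx ON chunks "
    "USING hnsw (embedding halfvec_cosine_ops);"
)

print(create_table)
print(create_index)
```

Switching the column type is the whole change; queries and distance operators work the same way against halfvec as against vector.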

What should RAG teams do next?

Start with 1024 dimensions for legal, compliance, technical, or high-stakes enterprise retrieval. Use 512 dimensions for lighter workloads where cost and latency dominate. Prefer pgvector halfvec over vector unless your benchmark proves float32 is worth the extra storage.

Measure quality with nDCG@10, MAP@10, Recall@10, and latency. We used this approach in the Legal RAG Bench, Contractual Clause Retrieval, and License TL;DR Retrieval runs. For a primer on the metrics, read our RAG retrieval evaluation guide.
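For intuition about what those metrics report, here is a minimal pure-Python computation of Recall@k and nDCG@k with binary relevance. It is a sketch of the standard formulas, not the harness used in these benchmarks:

```python
import math

def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    # Fraction of all relevant documents that appear in the top k results.
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    # DCG discounts each hit by log2(rank + 1), so early hits count more;
    # dividing by the best achievable DCG makes a perfect ranking score 1.0.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["c1", "c7", "c3", "c9"]  # retriever output, best first
relevant = {"c1", "c3"}            # gold passages for this query

print(recall_at_k(ranked, relevant))          # 1.0  (both gold passages in top 10)
print(round(ndcg_at_k(ranked, relevant), 4))  # 0.9197 (penalized: c3 ranked third)
```

This is why a configuration can tie on Recall@10 but lose on nDCG@10: both find the gold passages, but one ranks them lower.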

The short version: dimensions tune quality, halfvec tunes cost, and benchmarks tell you where the tradeoff is worth it.
