pgvector RAG Embeddings

traceo intermediate 5 min read

ELI5

Each requirement and document chunk gets a 1,536-number “fingerprint” that captures its meaning. To find similar text, Postgres compares fingerprints with cosine distance. An HNSW index keeps the search to milliseconds even with millions of rows, like a librarian who already knows which shelves are nearest to which.

Technical Deep Dive

Schema Additions (migration `009_pgvector_embeddings.sql`)

CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE requirements ADD COLUMN embedding vector(1536);

CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE,
  document_id TEXT NOT NULL,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB NOT NULL DEFAULT '{}',
  UNIQUE (workspace_id, document_id, chunk_index)
);

Indexes

Index	Table	Method	Operator
`idx_requirements_embedding_hnsw`	`requirements`	HNSW	`vector_cosine_ops`
`idx_document_chunks_embedding_hnsw`	`document_chunks`	HNSW	`vector_cosine_ops`

Both indexes use m=16, ef_construction=64. These are pgvector’s recommended starting defaults — higher m improves recall at index size cost; higher ef_construction improves recall at build-time cost. Query-time recall is tuned via SET hnsw.ef_search.

Dimensionality

vector(1536) matches OpenAI’s text-embedding-3-small / text-embedding-ada-002 output. AIConfig.embedding_model selects the model; switching to a different dimension requires a migration that drops and recreates both embedding columns and their indexes.

RLS

document_chunks uses both ENABLE and FORCE ROW LEVEL SECURITY, paired with policies:

users_view_workspace_chunks — SELECT WHERE workspace_id = auth.workspace_id()
members_create_chunks — INSERT for roles in ('owner','admin','editor','member') with workspace match

requirements.embedding inherits the existing requirements RLS — no separate policy needed for the column.

Query Path

flowchart LR
  Q[Tool query string] --> EMB[embedding model]
  EMB --> VEC[vector 1536]
  VEC --> SQL["SELECT ... ORDER BY embedding <=> $1 LIMIT k"]
  SQL --> HNSW[HNSW cosine scan]
  HNSW --> RLS[RLS predicate]
  RLS --> ROWS[Top-k rows]

The <=> operator is cosine distance under vector_cosine_ops. RLS is applied as part of the same plan, so viewer-role callers already see only their workspace’s matches.

Key Terms

HNSW → Hierarchical Navigable Small World; pgvector’s approximate-NN index.
<=> → cosine distance operator (smaller = more similar).
document_chunks → RAG-only table, separate from requirements, used by ask_documentation.

Q&A

Q: Why is document_chunks FORCEd while requirements isn’t? A: requirements was already RLS-enabled in 002_rls_policies.sql and the existing service-role bypass pattern was acceptable. document_chunks is newer and tightens the policy to also cover the table owner — a safer default introduced in migration 009.

Q: Can I store embeddings of a different dimension? A: Not without a migration. Both columns are vector(1536) and the HNSW index is bound to that dimension; mismatched inserts raise a Postgres error.

Q: What does ask_documentation do with these tables? A: It embeds the user question, runs an HNSW cosine search against document_chunks, takes the top-k passages within the caller’s workspace, and synthesises an answer with citations.

Examples

A 5,000-chunk corpus indexed with m=16, ef_construction=64 typically returns top-5 results in under 5 ms on a single core. Bumping hnsw.ef_search from the default 40 to 100 raises recall noticeably at the cost of perhaps 2–3 ms per query — a useful knob when answers feel “almost but not quite” relevant.

neighbors on the map

Multi-Layer Caching Strategy debugging stale link data
LORE+CAIRNET Shared Graph Primitive understanding how LORE and CAIRNET share the graph component