pgvector RAG Embeddings
traceo intermediate 5 min read
ELI5
Each requirement and document chunk gets a 1,536-number “fingerprint” that captures its meaning. To find similar text, Postgres compares fingerprints with cosine distance. An HNSW index keeps the search to milliseconds even with millions of rows, like a librarian who already knows which shelves are nearest to which.
Technical Deep Dive
Schema Additions (migration 009_pgvector_embeddings.sql)
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE requirements ADD COLUMN embedding vector(1536);
CREATE TABLE document_chunks ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, document_id TEXT NOT NULL, chunk_index INTEGER NOT NULL, content TEXT NOT NULL, embedding vector(1536), metadata JSONB NOT NULL DEFAULT '{}', UNIQUE (workspace_id, document_id, chunk_index));Indexes
| Index | Table | Method | Operator |
|---|---|---|---|
idx_requirements_embedding_hnsw | requirements | HNSW | vector_cosine_ops |
idx_document_chunks_embedding_hnsw | document_chunks | HNSW | vector_cosine_ops |
Both indexes use m=16, ef_construction=64. These are pgvector’s recommended starting defaults — higher m improves recall at index size cost; higher ef_construction improves recall at build-time cost. Query-time recall is tuned via SET hnsw.ef_search.
Dimensionality
vector(1536) matches OpenAI’s text-embedding-3-small / text-embedding-ada-002 output. AIConfig.embedding_model selects the model; switching to a different dimension requires a migration that drops and recreates both embedding columns and their indexes.
RLS
document_chunks uses both ENABLE and FORCE ROW LEVEL SECURITY, paired with policies:
users_view_workspace_chunks—SELECT WHERE workspace_id = auth.workspace_id()members_create_chunks—INSERTfor roles in('owner','admin','editor','member')with workspace match
requirements.embedding inherits the existing requirements RLS — no separate policy needed for the column.
Query Path
flowchart LR Q[Tool query string] --> EMB[embedding model] EMB --> VEC[vector 1536] VEC --> SQL["SELECT ... ORDER BY embedding <=> $1 LIMIT k"] SQL --> HNSW[HNSW cosine scan] HNSW --> RLS[RLS predicate] RLS --> ROWS[Top-k rows]The <=> operator is cosine distance under vector_cosine_ops. RLS is applied as part of the same plan, so viewer-role callers already see only their workspace’s matches.
Key Terms
- HNSW → Hierarchical Navigable Small World; pgvector’s approximate-NN index.
<=>→ cosine distance operator (smaller = more similar).document_chunks→ RAG-only table, separate from requirements, used byask_documentation.
Q&A
Q: Why is document_chunks FORCEd while requirements isn’t?
A: requirements was already RLS-enabled in 002_rls_policies.sql and the existing service-role bypass pattern was acceptable. document_chunks is newer and tightens the policy to also cover the table owner — a safer default introduced in migration 009.
Q: Can I store embeddings of a different dimension?
A: Not without a migration. Both columns are vector(1536) and the HNSW index is bound to that dimension; mismatched inserts raise a Postgres error.
Q: What does ask_documentation do with these tables?
A: It embeds the user question, runs an HNSW cosine search against document_chunks, takes the top-k passages within the caller’s workspace, and synthesises an answer with citations.
Examples
A 5,000-chunk corpus indexed with m=16, ef_construction=64 typically returns top-5 results in under 5 ms on a single core. Bumping hnsw.ef_search from the default 40 to 100 raises recall noticeably at the cost of perhaps 2–3 ms per query — a useful knob when answers feel “almost but not quite” relevant.
neighbors on the map
- Multi-Layer Caching Strategy debugging stale link data
- LORE+CAIRNET Shared Graph Primitive understanding how LORE and CAIRNET share the graph component