CRUMB a card from devarno-cloud

pgvector RAG Embeddings

traceo intermediate 5 min read

ELI5

Each requirement and document chunk gets a 1,536-number “fingerprint” that captures its meaning. To find similar text, Postgres compares fingerprints with cosine distance. An HNSW index keeps the search to milliseconds even with millions of rows, like a librarian who already knows which shelves are nearest to which.

Technical Deep Dive

Schema Additions (migration 009_pgvector_embeddings.sql)

CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE requirements ADD COLUMN embedding vector(1536);
CREATE TABLE document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE,
document_id TEXT NOT NULL,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB NOT NULL DEFAULT '{}',
UNIQUE (workspace_id, document_id, chunk_index)
);

Indexes

IndexTableMethodOperator
idx_requirements_embedding_hnswrequirementsHNSWvector_cosine_ops
idx_document_chunks_embedding_hnswdocument_chunksHNSWvector_cosine_ops

Both indexes use m=16, ef_construction=64. These are pgvector’s recommended starting defaults — higher m improves recall at index size cost; higher ef_construction improves recall at build-time cost. Query-time recall is tuned via SET hnsw.ef_search.

Dimensionality

vector(1536) matches OpenAI’s text-embedding-3-small / text-embedding-ada-002 output. AIConfig.embedding_model selects the model; switching to a different dimension requires a migration that drops and recreates both embedding columns and their indexes.

RLS

document_chunks uses both ENABLE and FORCE ROW LEVEL SECURITY, paired with policies:

  • users_view_workspace_chunksSELECT WHERE workspace_id = auth.workspace_id()
  • members_create_chunksINSERT for roles in ('owner','admin','editor','member') with workspace match

requirements.embedding inherits the existing requirements RLS — no separate policy needed for the column.

Query Path

flowchart LR
Q[Tool query string] --> EMB[embedding model]
EMB --> VEC[vector 1536]
VEC --> SQL["SELECT ... ORDER BY embedding <=> $1 LIMIT k"]
SQL --> HNSW[HNSW cosine scan]
HNSW --> RLS[RLS predicate]
RLS --> ROWS[Top-k rows]

The <=> operator is cosine distance under vector_cosine_ops. RLS is applied as part of the same plan, so viewer-role callers already see only their workspace’s matches.

Key Terms

  • HNSW → Hierarchical Navigable Small World; pgvector’s approximate-NN index.
  • <=> → cosine distance operator (smaller = more similar).
  • document_chunks → RAG-only table, separate from requirements, used by ask_documentation.

Q&A

Q: Why is document_chunks FORCEd while requirements isn’t? A: requirements was already RLS-enabled in 002_rls_policies.sql and the existing service-role bypass pattern was acceptable. document_chunks is newer and tightens the policy to also cover the table owner — a safer default introduced in migration 009.

Q: Can I store embeddings of a different dimension? A: Not without a migration. Both columns are vector(1536) and the HNSW index is bound to that dimension; mismatched inserts raise a Postgres error.

Q: What does ask_documentation do with these tables? A: It embeds the user question, runs an HNSW cosine search against document_chunks, takes the top-k passages within the caller’s workspace, and synthesises an answer with citations.

Examples

A 5,000-chunk corpus indexed with m=16, ef_construction=64 typically returns top-5 results in under 5 ms on a single core. Bumping hnsw.ef_search from the default 40 to 100 raises recall noticeably at the cost of perhaps 2–3 ms per query — a useful knob when answers feel “almost but not quite” relevant.

neighbors on the map