CRUMB a card from devarno-cloud

Operations & Versions Schema

chronicle intermediate 5 min read

ELI5

The relay’s database is a triple ledger: every keystroke is one row in operations, a running tally of “where the document is now” lives in document_versions, and occasional photos of the whole document live in document_snapshots so we don’t have to replay every keystroke from scratch.

Technical Deep Dive

Tables

Defined in relay/migrations/002_create_operations.sql. Initialised by OperationRepository::initialize via sqlx::raw_sql (executes every statement in the file, unlike query() which silently runs only the first).

erDiagram
DOCUMENTS ||--o{ OPERATIONS : has
DOCUMENTS ||--o| DOCUMENT_VERSIONS : tracks
DOCUMENTS ||--o{ DOCUMENT_SNAPSHOTS : snapshots
DOCUMENTS {
TEXT id PK
TEXT title
TEXT owner_id
TEXT status
TEXT created_at
TEXT updated_at
TEXT deleted_at
}
OPERATIONS {
TEXT id PK
TEXT document_id FK
TEXT author_id
TEXT operation_type "insert|delete|format"
TEXT position_id
TEXT end_position_id
TEXT content
TEXT attributes
INTEGER hlc_timestamp
INTEGER hlc_counter
INTEGER server_seq
TEXT dependencies "json array"
TEXT vest_proof_ref
TEXT created_at
}
DOCUMENT_VERSIONS {
TEXT document_id PK
INTEGER current_seq
INTEGER latest_hlc
INTEGER operation_count
TEXT updated_at
}
DOCUMENT_SNAPSHOTS {
TEXT id PK
TEXT document_id
TEXT content
INTEGER server_seq
INTEGER hlc_timestamp
TEXT created_at
}

Constraints & Indexes

ObjectWhy
UNIQUE (document_id, server_seq)guarantees per-doc total order; surfaced as OperationError::Duplicate
CHECK operation_type IN ('insert','delete','format')rejects unknown types at insert time (note: retain is wire-only, never persisted)
idx_operations_document_seqfast since_version sync queries
idx_operations_document_hlcfast timeline range scans
idx_operations_author_idauthor-based audit and timeline aggregation
idx_snapshots_document_seq DESClatest snapshot lookup

Pool Configuration

AppState::new configures SQLite for collaborative write load (relay/src/services/state.rs):

SettingValueWhy
journal_modeWALreaders don’t block writers
busy_timeout5 sabsorbs momentary write-lock contention
foreign_keysonprevents orphan rows if hard deletes are ever issued
synchronousNormalsafe with WAL; fewer fsyncs than Full
max_connections5SQLite serialises writes anyway, larger pool just adds overhead

Sync Read Flow

flowchart LR
A[client: SyncRequest since_version=N] --> B[OperationRepository.list_since]
B -- idx_operations_document_seq --> C[(operations rows where server_seq > N)]
C --> D[OperationSummary stream]
D --> E[SyncResponseMsg]

Key Terms

  • server_seq → monotonic per-document sequence; the source of truth for sync resume.
  • dependencies (JSON column) → string-encoded Vec<Uuid>; intentionally not normalised into a join table because it is read whole-row.
  • document_versions → one-row-per-document materialised summary updated on every store; cheap reads of “current version”.

Q&A

Q: Why does the persistence layer enumerate only insert/delete/format? A: OperationPayload in models/operation.rs has those three variants; retain is a wire-only signal that never reaches the table.

Q: Why TEXT timestamps instead of INTEGER? A: Per ADR-003, ISO-8601 strings keep the documents table human-readable; the operations table stores HLC wall time as INTEGER ms because it is queried numerically.

Q: Why are there commented-out FOREIGN KEY lines on operations.document_id? A: SQLite quirk — they live alongside foreign_keys = on. The migration leaves the constraint commented to keep the table independently creatable; FK enforcement still runs against the in-pragma setting.

Examples

A since_version=42 sync becomes SELECT ... FROM operations WHERE document_id = ? AND server_seq > 42 ORDER BY server_seq — covered by idx_operations_document_seq. Adding a column for branch_id would require an extra index to keep this query covering.

neighbors on the map