CRDT Sidecar WebSocket Transport
rocky advanced 7 min read
ELI5
Next.js still has no stable WebSocket primitive in route handlers, so collaborative prompt editing runs on a tiny Node sidecar that the dev server supervises as a child process. Edits are flushed to a .yaml.crdt file alongside the prompt five seconds after the last update, so killing the sidecar mid-typing loses only the edits that had not yet been flushed.
Technical Deep Dive
Why a sidecar (Phase 3e D1)
Next.js 16 still has no stable Route-Handler WebSocket primitive. Waiting for one would tie the close of this phase to the upstream framework's release cadence, which is open-ended; a bounded ~200-line Node sidecar at console/scripts/crdt-server.mjs is the cheapest path. The sidecar can be retired later without a schema change, because the crdt.ts registry contract is the seam.
Three transport modes
RALPH_CRDT_TRANSPORT={sidecar|remote|disabled} mirrors the SS-07 RALPH_TRANSPORT shape:
| Mode | Route returns | Editor behaviour |
|---|---|---|
| disabled | 501 (cloud default until managed CRDT lands) | offline banner; uncoordinated edits |
| sidecar | 307 to ws://127.0.0.1:${RALPH_CRDT_PORT ?? 8766}/<slug>/<encoded path> | live convergence |
| remote | 307 to ${RALPH_CRDT_URL}/<slug>/<encoded path> | live convergence |
The HTTP route (console/src/app/api/ralph/prompts/crdt/[...path]/route.ts) does the role gate up front — observer rejected with 403 — so an unauthorised client never even sees the redirect target.
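The mode table and the up-front role gate can be read as one small decision function. The sketch below is illustrative only: `resolveCrdtRoute`, `CrdtEnv`, and `RouteResult` are hypothetical names, and it assumes that the path is encoded with `encodeURIComponent` as a single segment, that an unset or unknown mode behaves like `disabled`, and that `remote` without `RALPH_CRDT_URL` also returns 501. None of those details are confirmed by the route itself.

```typescript
// Hypothetical sketch of the route's decision; not the actual route.ts.
type Role = "observer" | "operator" | "admin";

interface CrdtEnv {
  RALPH_CRDT_TRANSPORT?: string; // "sidecar" | "remote" | "disabled"
  RALPH_CRDT_PORT?: string;
  RALPH_CRDT_URL?: string;
}

interface RouteResult {
  status: 307 | 403 | 501;
  location?: string;
}

function resolveCrdtRoute(
  env: CrdtEnv,
  role: Role,
  slug: string,
  path: string,
): RouteResult {
  // Role gate first, so a rejected client never sees the redirect target.
  if (role === "observer") return { status: 403 };

  // Assumption: the whole prompt path is percent-encoded as one segment.
  const suffix = `/${slug}/${encodeURIComponent(path)}`;

  switch (env.RALPH_CRDT_TRANSPORT) {
    case "sidecar":
      return {
        status: 307,
        location: `ws://127.0.0.1:${env.RALPH_CRDT_PORT ?? "8766"}${suffix}`,
      };
    case "remote":
      // Assumption: a missing RALPH_CRDT_URL is treated like disabled.
      if (!env.RALPH_CRDT_URL) return { status: 501 };
      return { status: 307, location: `${env.RALPH_CRDT_URL}${suffix}` };
    default:
      // "disabled", unset, or unrecognised values.
      return { status: 501 };
  }
}
```

Keeping the decision in a pure function like this makes the three modes easy to unit-test without starting the dev server.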
Handshake
sequenceDiagram
    autonumber
    participant Browser as Browser (operator/admin)
    participant Route as Next.js route
    participant Sidecar as CRDT sidecar (Node)
    participant FS as <prompts>/<file>.yaml.crdt
    Browser->>Route: GET .../crdt/<path>
    Route->>Route: Airlock role gate
    alt observer
        Route-->>Browser: 403
    else operator/admin
        Route-->>Browser: 307 ws://.../<slug>/<encoded path>
    end
    Browser->>Sidecar: WS upgrade (Sec-WebSocket-Protocol: bearer.<token>)
    Sidecar->>Sidecar: verify bearer (HMAC from VAULT)
    Sidecar->>Sidecar: Airlock session → role
    alt observer
        Sidecar-->>Browser: 1008 close
    else accepted
        Sidecar->>FS: read .yaml.crdt (seed if exists)
        Sidecar->>Browser: bound to Y.Doc
        loop on update
            Browser->>Sidecar: Yjs update
            Sidecar->>Sidecar: schedule 5s debounce
        end
        Sidecar->>FS: encodeStateAsUpdate → temp + rename
    end

The bearer rides as Sec-WebSocket-Protocol: bearer.<token> because y-websocket supports subprotocol-as-bearer natively. No new auth surface; the same VAULT-resolved HMAC the HTTP routes use.
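The sidecar's side of the handshake starts by pulling the token out of the offered subprotocols. A minimal sketch of that step follows, with hypothetical helper names (`extractBearer`, `handshakeCloseCode`). The HMAC verification and Airlock lookup are deliberately left out, and closing with 1008 on a failed bearer check is an assumption, since the source only specifies 1008 for observers.

```typescript
// Hypothetical helpers; names and shapes are illustrative, not the sidecar's code.
type Role = "observer" | "operator" | "admin";

const BEARER_PREFIX = "bearer.";

// Pull the token out of a Sec-WebSocket-Protocol header value.
// The header carries a comma-separated list of offered subprotocols.
function extractBearer(header: string | undefined): string | null {
  if (!header) return null;
  for (const raw of header.split(",")) {
    const proto = raw.trim();
    if (proto.startsWith(BEARER_PREFIX) && proto.length > BEARER_PREFIX.length) {
      return proto.slice(BEARER_PREFIX.length);
    }
  }
  return null;
}

// Decide the handshake outcome once the bearer has been checked and the
// Airlock session resolved to a role (null when either step failed).
// Assumption: a failed check closes with 1008, the same code observers get.
function handshakeCloseCode(role: Role | null): number | null {
  if (role === null || role === "observer") return 1008;
  return null; // accepted: bind the socket to the Y.Doc
}
```

Because subprotocol values must be valid HTTP tokens, a scheme like this only works if the bearer avoids characters such as `/`, `=`, and whitespace; an unpadded base64url encoding would satisfy that.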
Persistence model
<workspace>/prompts/<file>.yaml.crdt joins .yaml and .yaml.fp as the third file kept per prompt in the prompts tree. Writes are atomic (temp file, then rename) and debounced to 5 s after the last update. On boot, getDoc() first attempts to seed the document from the .crdt file via Y.applyUpdate(doc, fs.readFileSync(...)). The file is gitignored by default because it is binary and transient.
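The per-document debounce can be sketched as deadline bookkeeping, independent of Yjs and the filesystem. `FlushScheduler` below is a hypothetical class, not the sidecar's implementation: it takes the clock as an argument so the behaviour is deterministic, and it leaves the actual write (encode state, write a temp file, rename) to the caller.

```typescript
// Hypothetical sketch: tracks which docs are due for a flush,
// 5 s after their most recent update.
const DEBOUNCE_MS = 5000;

class FlushScheduler {
  // docKey -> timestamp (ms) of the most recent update not yet flushed
  private readonly lastUpdate = new Map<string, number>();

  // Record an update; every call pushes that doc's deadline back.
  touch(docKey: string, nowMs: number): void {
    this.lastUpdate.set(docKey, nowMs);
  }

  // Return, and clear, the docs that have been idle for DEBOUNCE_MS.
  takeDue(nowMs: number): string[] {
    const due: string[] = [];
    this.lastUpdate.forEach((at, key) => {
      if (nowMs - at >= DEBOUNCE_MS) due.push(key);
    });
    for (const key of due) this.lastUpdate.delete(key);
    return due;
  }

  // Return, and clear, every dirty doc regardless of idle time,
  // as a drain on shutdown would need.
  takeAll(): string[] {
    const all = Array.from(this.lastUpdate.keys());
    this.lastUpdate.clear();
    return all;
  }
}
```

One consequence of a trailing debounce like this is that a document under continuous editing is never idle long enough to become due, so a real implementation may also want a maximum wait; whether the sidecar has one is not stated here.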
Lifecycle
stateDiagram-v2
    [*] --> Spawned: npm run dev
    Spawned --> Running: WS listening
    Running --> Backoff: process crash
    Backoff --> Running: exp backoff cap 30s
    Running --> Draining: SIGTERM
    Draining --> Persisted: flush all docs to .crdt
    Persisted --> [*]
    Running --> Persisted: 5s idle per doc
    Persisted --> Running

The dev process supervises the sidecar in parallel to the existing ralph serve supervisor: spawn on boot, restart on crash with exponential backoff capped at 30 s, kill on SIGTERM. Production deployment uses an external supervisor (systemd / PM2 / k8s); that is out of scope for 3e and documented as a deployment note.
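The restart policy reduces to a capped exponential delay. The sketch below uses a hypothetical `restartDelayMs` helper and assumes a 1 s base delay with doubling; only the 30 s cap comes from the lifecycle description above.

```typescript
// Hypothetical helper: delay before the Nth consecutive restart (0-based).
const BACKOFF_CAP_MS = 30000; // cap taken from the lifecycle description
const BACKOFF_BASE_MS = 1000; // assumption: the base delay is not documented

function restartDelayMs(attempt: number): number {
  const exponent = Math.max(0, Math.floor(attempt));
  const uncapped = BACKOFF_BASE_MS * Math.pow(2, exponent);
  return Math.min(uncapped, BACKOFF_CAP_MS);
}

// Attempts 0..5 yield 1000, 2000, 4000, 8000, 16000, 30000 ms.
```

A supervisor would typically reset the attempt counter once the sidecar has stayed up for a while, so a single crash after hours of uptime does not inherit a 30 s wait.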
HATCH events
| Event | When | Body content logged? |
|---|---|---|
| prompt.crdt.session.opened | after WS handshake accept | no |
| prompt.crdt.session.closed | on WS close (any reason) | no |
Body content is never logged (3d invariant). The existing prompt.edited event continues to fire on the HTTP save (the CRDT publish snapshot per gate G3); no change there. The sidecar reaches RELAY by HTTP loopback to the Next.js process with the same bearer.
Key Terms
- y-websocket → existing ^3.0.0 dep in console/package.json; ships setupWSConnection and a subprotocol-bearer client
- Yjs .crdt sidecar → binary state snapshot via Y.encodeStateAsUpdate; written debounced and gitignored
- Subprotocol bearer → HTTP-style bearer carried in Sec-WebSocket-Protocol, since the browser WebSocket API cannot set an Authorization header on the upgrade request
- Sidecar supervision → Next.js dev process spawning, restarting, and SIGTERM-draining the Node WS server
Q&A
Q: Why a separate sidecar rather than ws inside Next.js?
A: Next.js 16 has no stable Route-Handler WebSocket primitive. A bounded ~200-line sidecar is the cheapest path that doesn’t couple the phase close to upstream framework cadence; the crdt.ts registry remains the seam, so the sidecar is retire-able later without a schema change.
Q: What survives a sidecar restart?
A: All committed Y.Doc state, via the .yaml.crdt file. The five-second debounce is the upper bound on un-persisted edits; killing mid-keystroke loses at most that window.
Q: Why is the role gate enforced both at the HTTP route and at the WS handshake?
A: The HTTP route prevents the redirect from leaking to an unauthorised client; the WS handshake prevents a client that bypassed the route from connecting directly. Defence in depth — and the WS check is the only one that runs in remote mode.
Examples
A whiteboard in a meeting room (Y.Doc) with a janitor who photographs it every five idle seconds (.yaml.crdt). Anyone with a stamped pass (bearer) can walk in and edit; observers can peek through the window but the door (WS handshake) refuses them. If the janitor walks off shift mid-meeting, the next janitor restores the latest photograph and writing resumes.
neighbors on the map
- WebSocket Session Lifecycle: adding a new privileged WS handler
- CRDT Operation Message: adding a new operation type
- FNP CRDT Conflict-Free Merge Semantics: understanding CRDT properties and guarantees
- CRDT Merge Strategies: resolving a concurrent edit on a shared unit