CRUMB a card from devarno-cloud

CRDT Sidecar WebSocket Transport

rocky advanced 7 min read

ELI5

Next.js still has no stable WebSocket primitive in route handlers, so collaborative prompt editing runs on a tiny Node sidecar that the dev server supervises as a child process. Edits are snapshotted to a .yaml.crdt file alongside the prompt once a document has been idle for five seconds, so killing the sidecar mid-typing loses at most the edits made since the last snapshot.

Technical Deep Dive

Why a sidecar (Phase 3e D1)

Next.js 16 still has no stable Route-Handler WebSocket primitive. Waiting for one would tie the phase close to upstream framework cadence, which is open-ended; a bounded ~200-line Node sidecar at console/scripts/crdt-server.mjs is the cheapest path. The sidecar can be retired later without a schema change, because the crdt.ts registry contract is the seam.

Three transport modes

RALPH_CRDT_TRANSPORT={sidecar|remote|disabled} mirrors the SS-07 RALPH_TRANSPORT shape:

| Mode | Route returns | Editor behaviour |
| --- | --- | --- |
| disabled | 501 (cloud default until managed CRDT lands) | offline banner; uncoordinated edits |
| sidecar | 307 to ws://127.0.0.1:${RALPH_CRDT_PORT ?? 8766}/<slug>/<encoded path> | live convergence |
| remote | 307 to ${RALPH_CRDT_URL}/<slug>/<encoded path> | live convergence |

The HTTP route (console/src/app/api/ralph/prompts/crdt/[...path]/route.ts) does the role gate up front — observer rejected with 403 — so an unauthorised client never even sees the redirect target.
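The mode table and the up-front role gate can be sketched as one pure decision function. This is an illustration, not the actual route handler: the names decideCrdtRoute, Role, and RouteDecision are hypothetical, and three behaviours are assumptions (an unset RALPH_CRDT_TRANSPORT falls back to disabled, the path is encoded with encodeURIComponent, and remote mode without RALPH_CRDT_URL returns 501).

```typescript
// Hypothetical names throughout; only the env vars, status codes, and default port come from the text.
type Role = "observer" | "operator" | "admin";
type Env = Record<string, string | undefined>;
type RouteDecision =
  | { status: 403 }
  | { status: 501 }
  | { status: 307; location: string };

function decideCrdtRoute(env: Env, role: Role, slug: string, promptPath: string): RouteDecision {
  // Role gate first: an observer never learns the redirect target.
  if (role === "observer") return { status: 403 };

  // ASSUMPTION: an unset variable behaves like "disabled".
  const mode = env.RALPH_CRDT_TRANSPORT ?? "disabled";
  // ASSUMPTION: "<encoded path>" means encodeURIComponent of the prompt path.
  const suffix = `/${slug}/${encodeURIComponent(promptPath)}`;

  if (mode === "sidecar") {
    const port = env.RALPH_CRDT_PORT ?? "8766";
    return { status: 307, location: `ws://127.0.0.1:${port}${suffix}` };
  }
  const remoteUrl = env.RALPH_CRDT_URL;
  if (mode === "remote" && remoteUrl) {
    return { status: 307, location: `${remoteUrl}${suffix}` };
  }
  // "disabled", an unknown mode, or remote without a URL.
  return { status: 501 };
}
```

Keeping the decision free of framework types makes it straightforward to unit-test each row of the table without starting Next.js.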

Handshake

sequenceDiagram
    autonumber
    participant Browser as Browser (operator/admin)
    participant Route as Next.js route
    participant Sidecar as CRDT sidecar (Node)
    participant FS as <prompts>/<file>.yaml.crdt
    Browser->>Route: GET .../crdt/<path>
    Route->>Route: Airlock role gate
    alt observer
        Route-->>Browser: 403
    else operator/admin
        Route-->>Browser: 307 ws://.../<slug>/<encoded path>
    end
    Browser->>Sidecar: WS upgrade (Sec-WebSocket-Protocol: bearer.<token>)
    Sidecar->>Sidecar: verify bearer (HMAC from VAULT)
    Sidecar->>Sidecar: Airlock session → role
    alt observer
        Sidecar-->>Browser: 1008 close
    else accepted
        Sidecar->>FS: read .yaml.crdt (seed if exists)
        Sidecar->>Browser: bound to Y.Doc
        loop on update
            Browser->>Sidecar: Yjs update
            Sidecar->>Sidecar: schedule 5s debounce
        end
        Sidecar->>FS: encodeStateAsUpdate → temp + rename
    end

The bearer rides as Sec-WebSocket-Protocol: bearer.<token> because y-websocket supports subprotocol-as-bearer natively. This adds no new auth surface: the sidecar verifies the same VAULT-resolved HMAC that the HTTP routes use.
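The sidecar side of the handshake (extract the bearer, verify the HMAC, resolve the role, accept or close with 1008) might look roughly like the sketch below. The token layout `<sessionId>.<hex HMAC-SHA256>` is invented for illustration, as are the function names and the roleOf callback standing in for the Airlock session lookup; closing with 1008 on a bad or missing bearer is also an assumption, since the text only specifies 1008 for observers.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Role = "observer" | "operator" | "admin";
type Handshake =
  | { accept: true; sessionId: string; role: Role }
  | { accept: false; closeCode: 1008 };

// Pull the token out of a Sec-WebSocket-Protocol value such as "bearer.<token>",
// which may be one of several comma-separated subprotocols.
function extractBearer(header: string | undefined): string | null {
  if (!header) return null;
  for (const proto of header.split(",")) {
    const p = proto.trim();
    if (p.startsWith("bearer.") && p.length > "bearer.".length) {
      return p.slice("bearer.".length);
    }
  }
  return null;
}

// HYPOTHETICAL token layout: "<sessionId>.<hex HMAC-SHA256 of sessionId>".
function signSession(sessionId: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(sessionId).digest("hex");
  return `${sessionId}.${sig}`;
}

function verifyToken(token: string, secret: string): string | null {
  const dot = token.lastIndexOf(".");
  if (dot <= 0) return null;
  const sessionId = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1), "hex");
  const expected = createHmac("sha256", secret).update(sessionId).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  return timingSafeEqual(given, expected) ? sessionId : null;
}

// roleOf is a stand-in for the Airlock session -> role lookup.
function decideHandshake(
  header: string | undefined,
  secret: string,
  roleOf: (sessionId: string) => Role | undefined,
): Handshake {
  const token = extractBearer(header);
  const sessionId = token ? verifyToken(token, secret) : null;
  const role = sessionId ? roleOf(sessionId) : undefined;
  if (!sessionId || !role || role === "observer") {
    return { accept: false, closeCode: 1008 }; // 1008 = policy violation
  }
  return { accept: true, sessionId, role };
}
```

A caller would pass the raw header from the upgrade request, for example decideHandshake(req.headers["sec-websocket-protocol"], secret, lookupRole), where lookupRole is whatever the deployment uses to map a session to a role.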

Persistence model

<workspace>/prompts/<file>.yaml.crdt joins .yaml and .yaml.fp as a third sidecar in the prompts tree. Writes are atomic (temp file + rename) and debounced to 5 s after the last update. On boot, getDoc() first attempts to seed from the .crdt file via Y.applyUpdate(doc, fs.readFileSync(...)). The file is gitignored by default because it is binary and transient.
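A minimal sketch of the debounce plus temp-file-and-rename pattern, using only Node built-ins. It is not the sidecar's code: DebouncedSnapshot, writeAtomically, and readSeed are hypothetical names, and the encode callback stands in for Y.encodeStateAsUpdate(doc) so the example runs without Yjs installed.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write to a temp file next to the target, then rename over it. On POSIX a
// rename within one filesystem is atomic, so readers see the old or the new
// snapshot, never a half-written one.
function writeAtomically(target: string, bytes: Uint8Array): void {
  const tmp = `${target}.${process.pid}.tmp`;
  writeFileSync(tmp, bytes);
  renameSync(tmp, target);
}

class DebouncedSnapshot {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly target: string;
  private readonly encode: () => Uint8Array; // stand-in for Y.encodeStateAsUpdate(doc)
  private readonly delayMs: number;

  constructor(target: string, encode: () => Uint8Array, delayMs = 5_000) {
    this.target = target;
    this.encode = encode;
    this.delayMs = delayMs;
  }

  // Call on every document update; restarts the idle timer.
  touch(): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Persist immediately, for example while draining on SIGTERM.
  flush(): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = undefined;
    writeAtomically(this.target, this.encode());
  }
}

// Returns the stored snapshot bytes, or null when no .crdt file exists yet.
function readSeed(target: string): Uint8Array | null {
  return existsSync(target) ? readFileSync(target) : null;
}

// Usage: one update, then an explicit flush instead of waiting out the 5 s.
const demoFile = join(mkdtempSync(join(tmpdir(), "crdt-demo-")), "demo.yaml.crdt");
const seedBefore = readSeed(demoFile);
const snapshot = new DebouncedSnapshot(demoFile, () => new Uint8Array([1, 2, 3]));
snapshot.touch();
snapshot.flush();
const seedAfter = readSeed(demoFile);
```

One thing the sketch makes visible: a pure trailing debounce keeps restarting while updates keep arriving, so a real implementation that wants a hard bound on unpersisted edits would also need a maximum wait.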

Lifecycle

stateDiagram-v2
    [*] --> Spawned: npm run dev
    Spawned --> Running: WS listening
    Running --> Backoff: process crash
    Backoff --> Running: exp backoff cap 30s
    Running --> Draining: SIGTERM
    Draining --> Persisted: flush all docs to .crdt
    Persisted --> [*]
    Running --> Persisted: 5s idle per doc
    Persisted --> Running

The dev process supervises the sidecar alongside the existing ralph serve supervisor: it spawns the sidecar on boot, restarts it on crash with exponential backoff capped at 30 s, and stops it with SIGTERM, on which the sidecar drains and flushes all docs. Production deployment uses an external supervisor (systemd / PM2 / k8s), which is out of scope for 3e and documented as a deployment note.
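The restart policy can be sketched as a small supervisor around an injected spawn function. Only the 30 s cap comes from the text; the 1 s base delay, the doubling factor, and the Supervisor, Child, and Scheduler names are assumptions made for the example. The child is reduced to a two-method interface so the sketch runs without starting a process.

```typescript
// Only the 30 s cap is from the text; base delay and doubling are assumptions.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** Math.max(0, attempt));
}

// Minimal view of a child process (hypothetical), enough to express the lifecycle.
interface Child {
  onExit(cb: (code: number | null) => void): void;
  kill(signal: "SIGTERM"): void;
}

type Scheduler = (fn: () => void, ms: number) => void;

class Supervisor {
  private attempt = 0;
  private stopping = false;
  private child: Child | undefined;
  private readonly spawnChild: () => Child;
  private readonly schedule: Scheduler;

  constructor(
    spawnChild: () => Child,
    schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); },
  ) {
    this.spawnChild = spawnChild;
    this.schedule = schedule;
  }

  // Spawn the child and arrange a delayed restart if it exits unexpectedly.
  start(): void {
    const child = this.spawnChild();
    this.child = child;
    child.onExit(() => {
      if (this.stopping) return; // shutdown path: do not restart
      const delay = backoffDelayMs(this.attempt);
      this.attempt += 1;
      this.schedule(() => this.start(), delay);
    });
  }

  // Ask the child to drain; it is expected to flush all docs before exiting.
  stop(): void {
    this.stopping = true;
    this.child?.kill("SIGTERM");
  }
}
```

With these defaults, successive crashes wait 1 s, 2 s, 4 s, 8 s, 16 s, and then 30 s each time. The sketch never resets the attempt counter; a real supervisor would probably reset it once the sidecar has stayed up for a while.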

HATCH events

| Event | When | Body content logged? |
| --- | --- | --- |
| prompt.crdt.session.opened | after WS handshake accept | no |
| prompt.crdt.session.closed | on WS close (any reason) | no |

Body content is never logged (3d invariant). The existing prompt.edited event continues to fire on the HTTP save (the CRDT publish snapshot per gate G3); no change there. The sidecar reaches RELAY by HTTP loopback to the Next.js process with the same bearer.
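One way to make the "never log body content" invariant hard to break is to give the event type no field that could carry a body. The shape below is a guess for illustration only: the two event names come from the table, while sessionEvent and the slug, path, role, closeCode, and at fields are hypothetical.

```typescript
type CrdtSessionEventName =
  | "prompt.crdt.session.opened"
  | "prompt.crdt.session.closed";

// HYPOTHETICAL payload shape. There is deliberately no body/content field.
interface CrdtSessionEvent {
  name: CrdtSessionEventName;
  slug: string;
  path: string;
  role: "operator" | "admin";
  closeCode?: number; // only meaningful on "closed"
  at: string; // ISO 8601 timestamp
}

function sessionEvent(
  name: CrdtSessionEventName,
  ctx: { slug: string; path: string; role: "operator" | "admin" },
  closeCode?: number,
  now: Date = new Date(),
): CrdtSessionEvent {
  // Copy fields explicitly rather than spreading ctx, so extra properties on
  // the caller's object (for example document text) cannot leak into the log.
  const event: CrdtSessionEvent = {
    name,
    slug: ctx.slug,
    path: ctx.path,
    role: ctx.role,
    at: now.toISOString(),
  };
  if (name === "prompt.crdt.session.closed" && closeCode !== undefined) {
    event.closeCode = closeCode;
  }
  return event;
}
```

The sidecar would then send the resulting object to RELAY over the HTTP loopback described above; that call is omitted here because its endpoint is not given in the text.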

Key Terms

  • y-websocket → existing ^3.0.0 dep in console/package.json; ships setupWSConnection and a subprotocol-bearer client
  • Yjs .crdt sidecar → binary state snapshot via Y.encodeStateAsUpdate; written debounced and gitignored
  • Subprotocol bearer → HTTP-style bearer carried in Sec-WebSocket-Protocol, since the browser WebSocket API cannot set an Authorization header on the upgrade request
  • Sidecar supervision → Next.js dev process spawning, restarting, and SIGTERM-draining the Node WS server

Q&A

Q: Why a separate sidecar rather than ws inside Next.js? A: Next.js 16 has no stable Route-Handler WebSocket primitive. A bounded ~200-line sidecar is the cheapest path that doesn’t couple the phase close to upstream framework cadence; the crdt.ts registry remains the seam, so the sidecar is retire-able later without a schema change.

Q: What survives a sidecar restart? A: All committed Y.Doc state, via the .yaml.crdt file. The five-second debounce is the upper bound on un-persisted edits; killing mid-keystroke loses at most that window.

Q: Why is the role gate enforced both at the HTTP route and at the WS handshake? A: The HTTP route prevents the redirect from leaking to an unauthorised client; the WS handshake prevents a client that bypassed the route from connecting directly. Defence in depth — and the WS check is the only one that runs in remote mode.

Examples

A whiteboard in a meeting room (Y.Doc) with a janitor who photographs it whenever it has sat untouched for five seconds (.yaml.crdt). Anyone with a stamped pass (bearer) can walk in and edit; observers can peek through the window but the door (WS handshake) refuses them. If the janitor walks off shift mid-meeting, the next janitor restores the latest photograph and writing resumes.
