SSE Realtime Gateway Fanout
skyflow intermediate 6 min read
ELI5
The Realtime Gateway holds open SSE connections — long HTTP responses that keep streaming event: …\ndata: … frames. When NATS publishes a events.realtime.* event, the gateway looks up open channels for that user and writes the frame. With multiple gateway replicas, the gateway also fans out via Redis pub/sub so a user connected to instance A still receives events processed on instance B.
Technical Deep Dive
Connection Lifecycle
sequenceDiagram autonumber participant C as Browser participant CD as Caddy participant RT as realtime-gateway participant RD as Redis participant N as NATS REALTIME
C->>CD: GET /sse?token=<jwt> CD->>RT: forward RT->>RT: validateJWT(token) → user_id RT->>RT: ConnectionManager.Add(conn) RT->>RD: SADD connections:{user_id} {conn_id}, EXPIRE 1h RT-->>C: HTTP 200, SSE headers RT-->>C: event: connected loop every 30s RT-->>C: event: heartbeat end N->>RT: events.realtime.xp_earned (proto) RT->>RT: SendToUser(user_id, evt) RT-->>C: event: xp_earned\ndata: {...} C--xRT: client disconnect RT->>RD: SREM connections:{user_id} {conn_id}Connection Manager (in-memory)
type ConnectionManager struct { connections map[string][]*Connection // user_id → [conns] mu sync.RWMutex}type Connection struct { UserID, ConnID string EventChan chan SSEEvent // buffered 100 CloseChan chan struct{}}SendToUser does a non-blocking send. If the per-connection channel is full (slow client), the event is dropped and realtime_events_dropped_total is incremented — no head-of-line blocking is allowed.
Multi-Instance Fanout
flowchart LR N["NATS events.realtime.>"] --> A[gateway-A] N --> B[gateway-B] N --> C[gateway-C] A -->|publish realtime:events| RP((Redis pub/sub)) B -->|publish| RP C -->|publish| RP RP --> A RP --> B RP --> C A -->|local SendToUser| U1[user X conn on A] B -->|local SendToUser| U2[user X conn on B]Each gateway:
- Subscribes its own NATS consumer (load-balanced by JetStream’s consumer group).
- Re-publishes the parsed event onto Redis
realtime:eventsso peers also see it. - Listens to
realtime:eventsand writes to local connections only.
This trades double-delivery on Redis for guaranteed cross-instance reach, since NATS only delivers to one consumer in the group.
Heartbeats
A 30-second event: heartbeat keeps proxies (Caddy, browsers) from killing the idle TCP socket. X-Accel-Buffering: no disables nginx-style response buffering when present.
Key Terms
- SSE (Server-Sent Events) → text/event-stream HTTP response with
event:/data:frames; one-way server→client; nativeEventSourcein browsers - Drop-on-full channel → 100-buffer, non-blocking send; better to drop a notification than block all subscribers
connections:{user_id}Redis SET → cross-instance presence (TTL 1h auto-cleanup); used by other services to ask “is this user online?”- Redis pub/sub vs JetStream → JetStream load-balances across instances (one wins); Redis pub/sub fans out (all see)
Q&A
Q: Why use Redis pub/sub when NATS could broadcast? A: The NATS consumer is a durable consumer group — JetStream delivers each message to exactly one instance. To reach connections on the other instances, the receiver re-publishes via Redis pub/sub, which is fanout-by-default.
Q: What happens when a client’s network is too slow to drain events?
A: The per-connection EventChan (capacity 100) fills up and SendToUser falls into the default: case, logging “Event dropped (channel full)” and incrementing the dropped-events counter. The connection itself is preserved.
Q: How is the JWT validated on the SSE endpoint?
A: Token comes via ?token= query param (because EventSource cannot set headers). validateJWT checks the RS256 signature against the public key mounted at /secrets/jwt-public.pem.
Q: What is the maximum SSE connection lifetime? A: Not capped server-side. Caddy / load balancers and the 1h TTL on the Redis presence key are the practical limits; the JS client implements exponential-backoff reconnect.
Examples
Like a stadium PA system. The announcer (NATS) shouts each event into the booth (gateway A). Booth A re-broadcasts on the venue intercom (Redis) so booths B and C in other stands also hear it, then each booth speaks only to the fans physically near its speakers (local connections).
neighbors on the map
- WebSocket Session Lifecycle adding a new privileged WS handler
- Presence Broadcast Channels diagnosing missing cursor updates in the editor
- Onboarding Lifecycle Events wiring an analytics consumer to onboarding signals