CRUMB a card from devarno-cloud

SSE Realtime Gateway Fanout

skyflow intermediate 6 min read

ELI5

The Realtime Gateway holds open SSE connections — long HTTP responses that keep streaming event: …\ndata: … frames. When NATS publishes a events.realtime.* event, the gateway looks up open channels for that user and writes the frame. With multiple gateway replicas, the gateway also fans out via Redis pub/sub so a user connected to instance A still receives events processed on instance B.

Technical Deep Dive

Connection Lifecycle

sequenceDiagram
autonumber
participant C as Browser
participant CD as Caddy
participant RT as realtime-gateway
participant RD as Redis
participant N as NATS REALTIME
C->>CD: GET /sse?token=<jwt>
CD->>RT: forward
RT->>RT: validateJWT(token) → user_id
RT->>RT: ConnectionManager.Add(conn)
RT->>RD: SADD connections:{user_id} {conn_id}, EXPIRE 1h
RT-->>C: HTTP 200, SSE headers
RT-->>C: event: connected
loop every 30s
RT-->>C: event: heartbeat
end
N->>RT: events.realtime.xp_earned (proto)
RT->>RT: SendToUser(user_id, evt)
RT-->>C: event: xp_earned\ndata: {...}
C--xRT: client disconnect
RT->>RD: SREM connections:{user_id} {conn_id}

Connection Manager (in-memory)

type ConnectionManager struct {
connections map[string][]*Connection // user_id → [conns]
mu sync.RWMutex
}
type Connection struct {
UserID, ConnID string
EventChan chan SSEEvent // buffered 100
CloseChan chan struct{}
}

SendToUser does a non-blocking send. If the per-connection channel is full (slow client), the event is dropped and realtime_events_dropped_total is incremented — no head-of-line blocking is allowed.

Multi-Instance Fanout

flowchart LR
N["NATS events.realtime.>"] --> A[gateway-A]
N --> B[gateway-B]
N --> C[gateway-C]
A -->|publish realtime:events| RP((Redis pub/sub))
B -->|publish| RP
C -->|publish| RP
RP --> A
RP --> B
RP --> C
A -->|local SendToUser| U1[user X conn on A]
B -->|local SendToUser| U2[user X conn on B]

Each gateway:

  1. Subscribes its own NATS consumer (load-balanced by JetStream’s consumer group).
  2. Re-publishes the parsed event onto Redis realtime:events so peers also see it.
  3. Listens to realtime:events and writes to local connections only.

This trades double-delivery on Redis for guaranteed cross-instance reach, since NATS only delivers to one consumer in the group.

Heartbeats

A 30-second event: heartbeat keeps proxies (Caddy, browsers) from killing the idle TCP socket. X-Accel-Buffering: no disables nginx-style response buffering when present.

Key Terms

  • SSE (Server-Sent Events) → text/event-stream HTTP response with event: / data: frames; one-way server→client; native EventSource in browsers
  • Drop-on-full channel → 100-buffer, non-blocking send; better to drop a notification than block all subscribers
  • connections:{user_id} Redis SET → cross-instance presence (TTL 1h auto-cleanup); used by other services to ask “is this user online?”
  • Redis pub/sub vs JetStream → JetStream load-balances across instances (one wins); Redis pub/sub fans out (all see)

Q&A

Q: Why use Redis pub/sub when NATS could broadcast? A: The NATS consumer is a durable consumer group — JetStream delivers each message to exactly one instance. To reach connections on the other instances, the receiver re-publishes via Redis pub/sub, which is fanout-by-default.

Q: What happens when a client’s network is too slow to drain events? A: The per-connection EventChan (capacity 100) fills up and SendToUser falls into the default: case, logging “Event dropped (channel full)” and incrementing the dropped-events counter. The connection itself is preserved.

Q: How is the JWT validated on the SSE endpoint? A: Token comes via ?token= query param (because EventSource cannot set headers). validateJWT checks the RS256 signature against the public key mounted at /secrets/jwt-public.pem.

Q: What is the maximum SSE connection lifetime? A: Not capped server-side. Caddy / load balancers and the 1h TTL on the Redis presence key are the practical limits; the JS client implements exponential-backoff reconnect.

Examples

Like a stadium PA system. The announcer (NATS) shouts each event into the booth (gateway A). Booth A re-broadcasts on the venue intercom (Redis) so booths B and C in other stands also hear it, then each booth speaks only to the fans physically near its speakers (local connections).

neighbors on the map