CRUMB a card from devarno-cloud

NATS JetStream Stream Topology

skyflow intermediate 6 min read

ELI5

NATS JetStream is the conveyor belt that carries every Skyflow event. There are six belts (streams), each with its own size and how-long-to-keep policy: USER and GAMIFICATION are big and durable, TIMELINE is a work queue (delete after consumed), REALTIME lives only in RAM for 5 minutes. Subjects are addresses on the belt, like events.user.registered.

Technical Deep Dive

Streams (nats/topology.md)

| Stream | Subjects | Storage | Retention | Max age | Notes |
| USER | events.user.> | file | limits | 365d | 1M msgs / 10GB |
| SUBSCRIPTION | events.subscription.> | file | limits | 730d | 2y for compliance |
| TIMELINE | events.timeline.> | file | work_queue | 7d | delete-after-consume; 5MB max msg |
| GAMIFICATION | events.gamification.> events.xp.> events.achievement.> events.streak.> | file | limits | 365d | 5M msgs / 50GB |
| REALTIME | events.realtime.> | memory | limits | 5m | ephemeral push |
| DLQ | dlq.> | file | limits | 30d | failed events for inspection |
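
To make the subject-to-stream mapping concrete, here is a minimal Python sketch that encodes the table above as data and resolves a subject to its stream using NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). The names `StreamSpec`, `subject_matches`, and `stream_for` are illustrative assumptions, not part of the Skyflow codebase; in a real deployment the NATS server performs this matching itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical container for one row of the stream table."""
    name: str
    subjects: Tuple[str, ...]
    storage: str
    retention: str
    max_age: str


# Values copied from the table; the structure is an assumption.
STREAMS = (
    StreamSpec("USER", ("events.user.>",), "file", "limits", "365d"),
    StreamSpec("SUBSCRIPTION", ("events.subscription.>",), "file", "limits", "730d"),
    StreamSpec("TIMELINE", ("events.timeline.>",), "file", "work_queue", "7d"),
    StreamSpec(
        "GAMIFICATION",
        ("events.gamification.>", "events.xp.>", "events.achievement.>", "events.streak.>"),
        "file", "limits", "365d",
    ),
    StreamSpec("REALTIME", ("events.realtime.>",), "memory", "limits", "5m"),
    StreamSpec("DLQ", ("dlq.>",), "file", "limits", "30d"),
)


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches `pattern` under NATS wildcard rules."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must cover at least one remaining token.
            return len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the name of the first stream whose subjects match, else None."""
    for spec in STREAMS:
        if any(subject_matches(p, subject) for p in spec.subjects):
            return spec.name
    return None
```

With this sketch, `stream_for("events.user.registered")` resolves to `USER`, while a bare `events.user` matches nothing because `>` needs at least one trailing token.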

duplicate_window per stream: USER 2m, SUBSCRIPTION 5m, TIMELINE 1m, GAMIFICATION 30s, REALTIME 10s. NATS dedupes on the Nats-Msg-Id header within this window.
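
The deduplication behaviour can be pictured with a small in-memory model. This is only a sketch of the idea (remember each Nats-Msg-Id for the length of the window, reject repeats inside it); the `DedupeWindow` class and its `accept` method are hypothetical, and JetStream's actual implementation lives in the server.

```python
from typing import Dict

# duplicate_window values from the text above, converted to seconds.
DUPLICATE_WINDOW_S = {
    "USER": 120,
    "SUBSCRIPTION": 300,
    "TIMELINE": 60,
    "GAMIFICATION": 30,
    "REALTIME": 10,
}


class DedupeWindow:
    """Toy model of a per-stream duplicate window (not the server's code)."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._seen: Dict[str, float] = {}  # msg_id -> time first accepted

    def accept(self, msg_id: str, now: float) -> bool:
        """Return True if the publish is stored, False if rejected as a duplicate."""
        # Forget IDs whose window has elapsed.
        self._seen = {m: t for m, t in self._seen.items() if now - t < self.window_s}
        if msg_id in self._seen:
            return False
        self._seen[msg_id] = now
        return True
```

Under this model a GAMIFICATION publish retried after 10 seconds with the same ID is rejected, but the same ID sent after the 30-second window has passed is stored again, which is why producers should not rely on the window for long-delayed retries.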

Stream → Consumer Wiring

flowchart LR
subgraph Producers
CA[core-api]
TL[timeline-service]
GAM[gamification-service]
end
subgraph Streams
U[USER]
S[SUBSCRIPTION]
T[TIMELINE\nwork_queue]
G[GAMIFICATION]
R[REALTIME\nmemory]
D[DLQ]
end
subgraph Consumers
gamU[user-indexer]
caS[subscription-processor]
tlT[timeline-processor]
gamT[timeline-xp-awarder]
lbG[leaderboard-updater]
acG[achievement-checker]
rtR[realtime-gateway]
er[event-router]
end
CA --> U
CA --> S
CA --> T
TL --> T
TL --> G
GAM --> G
GAM --> R
U --> gamU
S --> caS
T --> tlT
T --> gamT
G --> lbG
G --> acG
R --> rtR
U --> er
S --> er
T --> er
G --> er
er --> D

Consumer Group Defaults

classDiagram
class ConsumerConfig {
+string durable_name
+string filter_subject
+AckPolicy explicit
+int max_deliver = 3
+duration ack_wait = 60s
+int max_ack_pending = 100
}

Replicas that share one durable_name form a load-balanced consumer group: each event is delivered to exactly one replica per group. An event is delivered at most 3 times (max_deliver = 3); after the final failed attempt the event-router publishes it to dlq.<subject>.
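
As a rough illustration of these defaults, the sketch below mirrors the ConsumerConfig fields in a Python dataclass and models the retry-then-DLQ flow. The `deliver` function, the `handler` callback, and the `dlq_subject` helper are assumptions made for this example; in Skyflow the retries are driven by JetStream and the republish is done by the event-router.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ConsumerConfig:
    """Defaults taken from the class diagram above; field names adapted."""
    durable_name: str
    filter_subject: str
    ack_policy: str = "explicit"
    max_deliver: int = 3
    ack_wait_s: int = 60
    max_ack_pending: int = 100


def dlq_subject(subject: str) -> str:
    """Build the dead-letter subject for a failed event (dlq.<subject>)."""
    return f"dlq.{subject}"


def deliver(subject: str, handler: Callable[[int], bool], cfg: ConsumerConfig) -> Optional[str]:
    """Try the handler up to cfg.max_deliver times.

    `handler` receives the attempt number and returns True to ack.
    Returns None if some attempt was acked, else the DLQ subject.
    """
    for attempt in range(1, cfg.max_deliver + 1):
        if handler(attempt):
            return None
    return dlq_subject(subject)
```

A handler that succeeds on its third attempt never reaches the DLQ; one that always fails ends up on `dlq.` followed by the original subject.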

Key Terms

  • Stream → durable, named append log on JetStream (file or memory backed)
  • Subject → topic address inside a stream (events.user.registered)
  • work_queue retention → message is deleted after the first consumer acks; only one logical consumer group can exist per subject
  • duplicate_window → time during which JetStream rejects re-publishes with the same Nats-Msg-Id
  • max_ack_pending → backpressure cap; delivery to the consumer is suspended once this many unacked msgs are outstanding, and resumes as acks arrive
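
The max_ack_pending cap can be modelled as a bounded set of in-flight sequence numbers. This is a simplified sketch, assuming a hypothetical `AckWindow` class; it ignores ack_wait timeouts and redelivery.

```python
class AckWindow:
    """Toy model of max_ack_pending backpressure (illustrative only)."""

    def __init__(self, max_ack_pending: int = 100) -> None:
        self.max_ack_pending = max_ack_pending
        self.pending = set()  # sequence numbers delivered but not yet acked

    def deliver(self, seq: int) -> bool:
        """Return True if the message may be delivered now, False if the cap is hit."""
        if len(self.pending) >= self.max_ack_pending:
            return False
        self.pending.add(seq)
        return True

    def ack(self, seq: int) -> None:
        """Acknowledge a message, freeing one slot."""
        self.pending.discard(seq)
```

With a cap of 2, a third message is held back until one of the first two is acked.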

Q&A

Q: Why does TIMELINE use work_queue retention? A: Each events.timeline.requested event should be processed exactly once by the analysis worker. Deleting the message after ack prevents accidental re-analysis and keeps the stream small even though messages can be up to 5MB.

Q: Can multiple consumer groups read from a work_queue stream? A: No. work_queue allows only one logical consumer group per subject. If both Gamification and a future analytics service need timeline events, route them through GAMIFICATION (which uses limits retention) or split the stream.
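
The "one consumer group per subject" rule amounts to requiring that no two consumers on the stream have overlapping filter subjects. The sketch below checks two filters for overlap under NATS wildcard rules; `filters_overlap` is a hypothetical helper written for this card, and `events.timeline.completed` is an invented subject used only for contrast.

```python
def filters_overlap(a: str, b: str) -> bool:
    """Return True if some subject could match both filter patterns."""
    ta, tb = a.split("."), b.split(".")
    for i in range(max(len(ta), len(tb))):
        if i >= len(ta) or i >= len(tb):
            # One pattern ended without '>' while the other has more tokens.
            return False
        if ta[i] == ">" or tb[i] == ">":
            # '>' absorbs the rest, and the other side has a token here.
            return True
        if ta[i] != "*" and tb[i] != "*" and ta[i] != tb[i]:
            return False
    return True
```

Two consumers filtering on `events.timeline.>` and `events.timeline.requested` overlap and could not coexist on a work_queue stream, whereas `events.timeline.requested` and `events.timeline.completed` are disjoint.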

Q: What happens when a REALTIME message reaches max_age=5m? A: Messages older than 5 minutes are dropped. This is acceptable because realtime notifications are useless if a client wasn’t connected to receive them in time.

Q: Where do failed events go after max_deliver=3? A: Event Router catches the terminal failure and republishes onto dlq.<original_subject>, also persisting the protobuf bytes to the dlq_events Postgres table for manual replay.

Examples

Think of streams as parcel-sorting belts at a depot. USER and SUBSCRIPTION are long-haul archive belts (years of records). TIMELINE is the same-day courier belt — once a parcel is picked, it’s gone. REALTIME is a pneumatic tube — five minutes and the message vanishes whether anyone caught it or not.
