Timeline Analysis Saga
skyflow intermediate 7 min read
ELI5
When a user hits POST /api/v1/timeline/analyze, Core API doesn’t do the work itself. It publishes events.timeline.requested to NATS and immediately replies 202 Accepted. The Python Timeline Service consumes that event, fetches Bluesky posts, runs the algorithm, writes results, then publishes events.timeline.completed — which Gamification picks up to award XP and Realtime Gateway picks up to push a notification.
Technical Deep Dive
End-to-End Sequence
sequenceDiagram autonumber participant C as Client participant API as core-api participant N as NATS (TIMELINE) participant TL as timeline-service (Py) participant BS as Bluesky / AT Proto participant PG as Postgres participant RD as Redis participant G as NATS (GAMIFICATION) participant GAM as gamification-service participant RT as realtime-gateway
C->>API: POST /api/v1/timeline/analyze {did, type} API->>API: tier check + rate limit (Redis) API->>N: publish events.timeline.requested API-->>C: 202 Accepted {analysis_id, estimated_completion} N->>TL: deliver (work_queue, max_deliver=3) TL->>RD: GET analysis:{did}:{type}:{date} alt cache hit RD-->>TL: cached TimelineAnalysis else cache miss TL->>BS: get_author_feed (token-bucket rate-limited) BS-->>TL: posts batch TL->>TL: calculate_flow / clout / kudos / pulse TL->>PG: INSERT timeline_analyses TL->>RD: SET analysis:... TTL=1h (DRIFT) / 30m (LIFT+) end TL->>N: publish events.timeline.completed TL->>N: ACK requested msg N->>GAM: deliver completed (consumer group) GAM->>PG: INSERT xp_transactions (+10 × multiplier) GAM->>G: publish events.xp.earned GAM->>G: publish events.realtime.xp_earned G->>RT: deliver realtime RT->>C: SSE event: xp_earnedState of an Analysis Row
flowchart LR Q["requested<br/>(NATS only)"] --> P["processing<br/>in-flight"] P --> C["completed<br/>row in timeline_analyses"] P --> F["failed<br/>events.timeline.failed"] C --> X["XP awarded<br/>xp_transactions row"]Tier-Aware Cache TTL (services/02-timeline-service.md § 4.1)
| Tier | Cache TTL |
|---|---|
| DRIFT | 60 minutes |
| LIFT / JET / ORBIT | 30 minutes |
The shorter TTL on paid tiers ensures fresher data for paying users, at the cost of more Bluesky API calls (rate-limited via internal token bucket of 5000 req/h).
Failure Modes
| Where it breaks | Symptom | Recovery |
|---|---|---|
| Bluesky 5xx | events.timeline.failed, NACK on requested msg | NATS redelivers up to 3× then DLQ |
| Postgres write fails | analysis lost; requested msg NACK’d | redelivery; idempotent re-analysis |
| Gamification down | XP not awarded; completed msg accumulates in consumer pending | drain when service returns; max_ack_pending=100 backpressure |
Key Terms
- Saga → multi-step async workflow coordinated by event hand-off rather than RPC orchestration
- work_queue stream → TIMELINE deletes a message once any consumer group acks it; the analysis runs once
- Token bucket → in-process rate limiter for outbound Bluesky API calls (5000 req/h)
- estimated_completion → returned in the 202 response; client may poll
GET /api/v1/timeline/analyses/:idor wait for SSE
Q&A
Q: Why does the API return 202 instead of 200 with the result?
A: Analysis fetches up to 500 posts from Bluesky (multi-second). Returning 202 lets the client subscribe to SSE or poll for the completed row, keeping HTTP latency under the 200ms p95 budget for core-api.
Q: What guarantees XP is awarded exactly once?
A: Two layers. (1) Timeline publishes events.timeline.completed with event_id matching the analysis_id; Gamification dedupes on event_id. (2) xp_transactions is append-only — even a duplicate would be detectable by querying WHERE event_id = $1.
Q: What happens to the cache hit path?
A: Even on cache hit, Timeline still publishes events.timeline.completed so Gamification awards XP and the user sees their reward — the cache only short-circuits the Bluesky fetch + algorithm.
Q: How is the SSE notification routed back to the right user?
A: events.realtime.xp_earned carries user_id. Realtime Gateway’s connection manager looks up open SSE channels for that user and writes the event: xp_earned frame.
Examples
Like ordering a custom-printed t-shirt online: the website (core-api) confirms the order instantly (202), the print shop (timeline-service) does the slow work, then a tracking notification (SSE xp_earned) hits your phone the moment the parcel ships.
neighbors on the map
- Site Provisioning Saga State Machine debugging a site stuck mid-provision
- Onboarding Lifecycle Events wiring an analytics consumer to onboarding signals
- Prompt-DAG Scheduler designing a graph.json for a new repo