Job Lifecycle State Machine
so1 beginner 3 min read
ELI5
A job is a parcel: it sits in the bin (pending), gets carried by a courier (running), and ends in exactly one of three trays — delivered (success), returned (failed), or recalled (cancelled). It never goes back into the bin.
Technical Deep Dive
States
Defined in so1-shared/src/jobs.ts as the JobState enum: pending, running, success, failed, cancelled.
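For illustration, the five values can be mirrored as a TypeScript string union with a terminal-state helper. This is a minimal sketch, not the contents of jobs.ts. It assumes the enum's values are the lowercase strings that appear in API payloads (`state: "pending"`), and `isTerminal` / `isJobState` are hypothetical helper names.

```typescript
// Sketch only: mirrors the JobState values as a string union.
// Assumption: the real enum in so1-shared/src/jobs.ts uses these exact strings.
const JOB_STATES = ["pending", "running", "success", "failed", "cancelled"] as const;
type JobState = (typeof JOB_STATES)[number];

// success, failed and cancelled are terminal: no outgoing transitions.
const TERMINAL_STATES: ReadonlySet<JobState> = new Set<JobState>(["success", "failed", "cancelled"]);

// Hypothetical helper: true when no further transitions are allowed.
function isTerminal(state: JobState): boolean {
  return TERMINAL_STATES.has(state);
}

// Hypothetical helper: narrows an untrusted string (e.g. a parsed API body) to JobState.
function isJobState(value: string): value is JobState {
  return (JOB_STATES as readonly string[]).includes(value);
}
```

A string union keeps the in-code values identical to what travels over the wire, so no mapping layer is needed when parsing responses.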
Transitions
```mermaid
stateDiagram-v2
    [*] --> pending : POST /api/jobs (createdAt set)
    pending --> running : worker picks up (startedAt set)
    pending --> cancelled : cancel before start
    running --> success : output produced (completedAt set)
    running --> failed : error captured (completedAt set)
    running --> cancelled : user cancels (completedAt set)
    success --> [*]
    failed --> [*]
    cancelled --> [*]
```
Timestamp Invariants
| Field | Set when |
|---|---|
| createdAt | always, on POST /api/jobs |
| startedAt | on entry to running |
| completedAt | on entry to any terminal state (success / failed / cancelled) |
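The diagram and the timestamp table can be enforced together in a single transition function. This is a minimal sketch under stated assumptions: the `JobRecord` shape and the `transition` name are hypothetical, and only the three timestamp fields and the allowed edges come from this page. It detects terminal states as "no outgoing transitions", matching the definition under Key Terms.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

// Hypothetical record shape; only the three timestamp fields come from the docs.
interface JobRecord {
  id: string;
  state: JobState;
  createdAt: Date; // set on POST /api/jobs
  startedAt?: Date;
  completedAt?: Date;
}

// Allowed transitions, read off the state diagram. Terminal states map to [].
const TRANSITIONS: Record<JobState, readonly JobState[]> = {
  pending: ["running", "cancelled"],
  running: ["success", "failed", "cancelled"],
  success: [],
  failed: [],
  cancelled: [],
};

// Returns an updated copy (the input is not mutated) and applies the timestamp rules.
function transition(job: JobRecord, to: JobState, now: Date = new Date()): JobRecord {
  if (!TRANSITIONS[job.state].includes(to)) {
    throw new Error(`illegal job transition: ${job.state} -> ${to}`);
  }
  const next: JobRecord = { ...job, state: to };
  if (to === "running") next.startedAt = now; // on entry to running
  if (TRANSITIONS[to].length === 0) next.completedAt = now; // on entry to any terminal state
  return next;
}
```

Note that a job cancelled from pending ends with completedAt set but startedAt unset, since it never entered running.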
Terminal States
success, failed, and cancelled are absorbing: once a job enters one, no further transitions occur. ADR-003 specifies that logs are immutable and append-only, and that they are retained through and after the transition into a terminal state; the SSE stream closes after the terminal status event is sent.
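The "stream closes after a terminal status event" rule can be sketched as a small server-side stream object that refuses to send once a terminal status has gone out. This is a hypothetical sketch: `JobEventStream` is an illustrative name, the `log` event name is an assumption (this page only names the `status` event), and log persistence is assumed to live in a separate store that closing the stream does not touch.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

const TERMINAL: ReadonlySet<JobState> = new Set<JobState>(["success", "failed", "cancelled"]);

// Hypothetical server-side handle for one job's SSE stream.
// Log persistence lives elsewhere; closing the stream does not delete logs.
class JobEventStream {
  private readonly frames: string[] = [];
  private closed = false;

  // One SSE message: an `event:` line, a `data:` line, then a blank line.
  private send(event: string, data: unknown): void {
    if (this.closed) throw new Error("stream closed after terminal status");
    this.frames.push(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
  }

  // Assumption: log lines use an event named "log".
  log(line: string): void {
    this.send("log", { line });
  }

  status(state: JobState, extra: Record<string, unknown> = {}): void {
    this.send("status", { state, ...extra });
    if (TERMINAL.has(state)) this.closed = true; // terminal status is the last event
  }

  get isClosed(): boolean {
    return this.closed;
  }

  get sent(): readonly string[] {
    return this.frames;
  }
}
```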
Why Cancellation Has Two Sources
A pending job can be cancelled before any worker touches it; this is cheap, just a metadata flip. Cancelling a running job must instead signal the worker that owns it. Cooperative cancellation is implementation-defined and deferred to Phase 2 per ADR-003.
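The two cancellation paths can be sketched as one request handler that branches on the current state. This is a hypothetical sketch: `requestCancel` and the `cancelRequested` flag are illustrative names, not documented fields, and returning `"noop"` for a job that is already terminal is an assumption (the real API may reject such requests instead).

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

// Hypothetical mutable job record; `cancelRequested` is an assumed field.
interface Job {
  state: JobState;
  startedAt?: Date;
  completedAt?: Date;
  cancelRequested?: boolean;
}

type CancelResult = "cancelled" | "signalled" | "noop";

function requestCancel(job: Job, now: Date = new Date()): CancelResult {
  switch (job.state) {
    case "pending":
      // No worker holds the job: flip the state directly (the cheap path).
      job.state = "cancelled";
      job.completedAt = now;
      return "cancelled";
    case "running":
      // A worker owns the job: only signal it. The worker performs
      // running -> cancelled itself when it next checks the flag.
      job.cancelRequested = true;
      return "signalled";
    default:
      // Terminal states are absorbing; treating this as a no-op is an assumption.
      return "noop";
  }
}
```

In a shared store, the pending branch would need an atomic compare-and-set (cancel only if the state is still pending), because a worker could claim the job between the read and the write.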
Key Terms
- Terminal state → state with no outgoing transitions.
- Cooperative cancellation → worker checks a cancel flag at safe points rather than being killed.
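Cooperative cancellation can be sketched as a worker that runs a job as discrete steps and checks a shared flag only between them, so it never stops mid-step. This is a minimal sketch: `CancelFlag` and `runCooperatively` are hypothetical names, and modelling "safe points" as the gaps between steps is an assumption about how a worker might be structured.

```typescript
// Hypothetical shared flag: the API sets it, the worker polls it.
interface CancelFlag {
  requested: boolean;
}

type Outcome = "success" | "failed" | "cancelled";

// Runs a job as a sequence of steps. The gap between steps is the safe point:
// the worker stops there instead of being killed mid-step.
async function runCooperatively(
  steps: ReadonlyArray<() => Promise<void>>,
  flag: CancelFlag,
): Promise<Outcome> {
  try {
    for (const step of steps) {
      if (flag.requested) return "cancelled"; // checked only between steps
      await step();
    }
    return "success";
  } catch {
    return "failed"; // error captured; the caller records it and sets completedAt
  }
}
```

One consequence of this design: a cancel that arrives during the final step is never observed, so the job ends in success. Whether that is acceptable is exactly the kind of implementation-defined detail the page defers.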
Q&A
Q: Can failed transition to success after a retry?
A: No. Retries are new jobs (with a new id). Idempotency-Key (so1-009) controls duplicate creation.
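The "retries are new jobs" rule and Idempotency-Key deduplication can be sketched with an in-memory registry. This is a hypothetical sketch, not the so1-009 design: `JobRegistry`, `create`, and `retry` are illustrative names, sequential ids are used only for readability, and the key is assumed to map to exactly one job.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

interface Job {
  id: string;
  action: string;
  state: JobState;
}

// Hypothetical in-memory registry. Ids are sequential here only for readability.
class JobRegistry {
  private seq = 0;
  private readonly jobs = new Map<string, Job>();
  private readonly idByKey = new Map<string, string>(); // Idempotency-Key -> job id

  create(action: string, idempotencyKey?: string): Job {
    if (idempotencyKey !== undefined) {
      const existingId = this.idByKey.get(idempotencyKey);
      const existing = existingId === undefined ? undefined : this.jobs.get(existingId);
      if (existing) return existing; // duplicate request: same job, nothing new created
    }
    const job: Job = { id: `job-${++this.seq}`, action, state: "pending" };
    this.jobs.set(job.id, job);
    if (idempotencyKey !== undefined) this.idByKey.set(idempotencyKey, job.id);
    return job;
  }

  // A retry never moves the old job out of its terminal state; it creates a new job.
  retry(previousId: string, idempotencyKey?: string): Job {
    const previous = this.jobs.get(previousId);
    if (!previous) throw new Error(`unknown job ${previousId}`);
    return this.create(previous.action, idempotencyKey);
  }
}
```

In this sketch, reusing the original key on a retry hands back the failed job, so a retry has to send a fresh key (or none) to get a new id.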
Q: Is completedAt set on cancelled?
A: Yes — cancellation is a terminal state, so completion timestamping rules apply.
Q: What does the UI show during pending?
A: Per ADR-003, “deferred work — delay may be 100ms-1s”; show a spinner, not log output.
Examples
POST /api/jobs { action: "trigger-github-workflow" } returns 202 with state: "pending". A worker picks the job up and moves it to running (setting startedAt), streams logs, and finishes by sending a single SSE message with event: status and data: { state: "success", output: { … } }, after which the stream closes.
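On the client, the final message from this walkthrough can be recognised by parsing the SSE frame and checking for a terminal state. This is a minimal sketch that assumes the data line carries JSON (the example above is written in JS object notation); it handles only the `event` and `data` fields of the SSE format, and `parseSseMessage` / `isTerminalStatus` are hypothetical names. In a browser, `EventSource` performs this parsing itself.

```typescript
interface SseMessage {
  event: string;
  data: string;
}

// Minimal parser for one SSE message block (the text between blank lines).
// Handles `event` and `data` only; `id`, `retry` and comment lines are ignored.
function parseSseMessage(block: string): SseMessage {
  let event = "message"; // SSE default event type
  const data: string[] = [];
  for (const line of block.split(/\r\n|\r|\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank or comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return { event, data: data.join("\n") }; // multiple data lines join with "\n"
}

const TERMINAL = new Set<string>(["success", "failed", "cancelled"]);

// True when the message is the final status event for the job.
function isTerminalStatus(msg: SseMessage): boolean {
  if (msg.event !== "status") return false;
  const payload = JSON.parse(msg.data) as { state?: string };
  return payload.state !== undefined && TERMINAL.has(payload.state);
}
```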
neighbors on the map
- Unit Lifecycle States: deciding whether a unit can be imported
- Run Outcome Classification: interpreting a History row's status pill
- Site Provisioning Saga State Machine: debugging a site stuck mid-provision