CRUMB a card from devarno-cloud

Job Lifecycle State Machine

so1 beginner 3 min read

ELI5

A job is a parcel: it sits in the bin (pending), gets carried by a courier (running), and ends in exactly one of three trays — delivered (success), returned (failed), or recalled (cancelled). It never goes back into the bin.

Technical Deep Dive

States

Defined in so1-shared/src/jobs.ts as the JobState enum: pending, running, success, failed, cancelled.

Transitions

stateDiagram-v2
[*] --> pending : POST /api/jobs (createdAt set)
pending --> running : worker picks up (startedAt set)
pending --> cancelled : cancel before start (completedAt set)
running --> success : output produced (completedAt set)
running --> failed : error captured (completedAt set)
running --> cancelled : user cancels (completedAt set)
success --> [*]
failed --> [*]
cancelled --> [*]
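
The diagram above can be sketched as a lookup table. This is a minimal illustration, not the contents of so1-shared/src/jobs.ts: the string-union form of JobState and the names TRANSITIONS, canTransition, and isTerminal are assumptions made for this example.

```typescript
// Hypothetical mirror of the JobState values listed above; the real enum
// lives in so1-shared/src/jobs.ts and may be declared differently.
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

// Allowed transitions, copied edge for edge from the diagram.
const TRANSITIONS: Record<JobState, readonly JobState[]> = {
  pending: ["running", "cancelled"],
  running: ["success", "failed", "cancelled"],
  success: [],
  failed: [],
  cancelled: [],
};

// True when the diagram has an edge from `from` to `to`.
function canTransition(from: JobState, to: JobState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// A state is terminal when it has no outgoing edges.
function isTerminal(state: JobState): boolean {
  return TRANSITIONS[state].length === 0;
}
```

Keeping the edges in one table means a rule such as "never back into the bin" is enforced by the absence of an entry, not by scattered conditionals.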

Timestamp Invariants

Field        Set when
createdAt    always, on POST /api/jobs
startedAt    on entry to running
completedAt  on entry to any terminal state (success / failed / cancelled)
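
A minimal sketch of how these timestamp rules could be applied on each state change. The Job shape and the createJob and enter helpers are hypothetical, and the sketch does not validate that the transition itself is legal.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

// Hypothetical job record; the timestamp field names follow the table above,
// everything else is assumed for illustration.
interface Job {
  id: string;
  state: JobState;
  createdAt: Date;
  startedAt?: Date;
  completedAt?: Date;
}

// createdAt is always set at creation time.
function createJob(id: string, now: Date): Job {
  return { id, state: "pending", createdAt: now };
}

// Returns a copy of the job in its next state with the matching timestamp set.
// Transition legality is deliberately not checked here.
function enter(job: Job, next: JobState, now: Date): Job {
  const updated: Job = { ...job, state: next };
  if (next === "running") {
    updated.startedAt = now;
  }
  if (next === "success" || next === "failed" || next === "cancelled") {
    updated.completedAt = now;
  }
  return updated;
}
```

One consequence worth noticing: a job cancelled while still pending ends up with completedAt set but no startedAt.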

Terminal States

success, failed, and cancelled are absorbing: no further transitions are possible. ADR-003 specifies that logs are immutable and append-only, and that they persist through and beyond terminal entry; the SSE stream closes after a terminal status event.

Why Cancellation Has Two Sources

A job in pending can be cancelled before any worker touches it, which is cheap: only the job's metadata changes. Cancelling a running job must signal the worker; cooperative cancellation is implementation-defined and deferred per ADR-003 Phase 2.
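
Since the card leaves running cancellation implementation-defined, the following is only one possible shape of cooperative cancellation, not so1's mechanism. CancelFlag and runSteps are hypothetical names, and the worker is modelled as a synchronous list of steps to keep the sketch short.

```typescript
// Hypothetical cancel flag shared between the API layer and a worker.
class CancelFlag {
  private requested = false;

  request(): void {
    this.requested = true;
  }

  isRequested(): boolean {
    return this.requested;
  }
}

type Outcome = "success" | "failed" | "cancelled";

// Runs the steps in order and checks the flag at a safe point before each one.
// A step that has already started is never interrupted.
function runSteps(steps: Array<() => void>, flag: CancelFlag): Outcome {
  for (const step of steps) {
    if (flag.isRequested()) {
      return "cancelled";
    }
    try {
      step();
    } catch {
      return "failed";
    }
  }
  return "success";
}
```

In this sketch a cancel request that arrives during the final step is not observed, so the job still ends in success; where the safe points sit is a design choice.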

Key Terms

  • Terminal state → state with no outgoing transitions.
  • Cooperative cancellation → worker checks a cancel flag at safe points rather than being killed.

Q&A

Q: Can failed transition to success after a retry? A: No. Retries are new jobs (with a new id). Idempotency-Key (so1-009) controls duplicate creation.
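
The "retries are new jobs" answer can be sketched as follows. The Job shape and retryAsNewJob are hypothetical, and id generation is left to the caller because this card does not describe it.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

// Hypothetical job shape, reduced to what this sketch needs.
interface Job {
  readonly id: string;
  readonly action: string;
  readonly state: JobState;
}

// Builds a fresh pending job for the same action. The failed job is left
// untouched, so it stays in its terminal state.
function retryAsNewJob(failedJob: Job, newId: string): Job {
  if (failedJob.state !== "failed") {
    throw new Error("only failed jobs are retried in this sketch");
  }
  if (newId === failedJob.id) {
    throw new Error("a retry needs a new id");
  }
  return { id: newId, action: failedJob.action, state: "pending" };
}
```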

Q: Is completedAt set on cancelled? A: Yes — cancellation is a terminal state, so completion timestamping rules apply.

Q: What does the UI show during pending? A: Per ADR-003, “deferred work — delay may be 100ms-1s”; show a spinner, not log output.

Examples

POST /api/jobs { action: "trigger-github-workflow" } returns 202 with state: "pending". The worker picks the job up, moves it to running (setting startedAt), streams logs, and finishes with a single SSE message: event: status, data: { state: "success", output: { … } }.
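
A client needs to recognise that final message to know the stream is about to close. The sketch below parses one raw SSE message and reports a terminal state if it carries one. It assumes the data field is JSON with a state property, as the example above suggests; parseSseMessage and terminalStateOf are hypothetical helpers, not part of so1.

```typescript
type JobState = "pending" | "running" | "success" | "failed" | "cancelled";

interface SseMessage {
  event: string;
  data: string;
}

// Parses the text of a single SSE message (the lines before the blank separator).
function parseSseMessage(raw: string): SseMessage {
  let event = "message"; // default event type when no event field is present
  const dataLines: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (line === "" || line.charAt(0) === ":") continue; // blank line or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // drop one leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  return { event, data: dataLines.join("\n") };
}

// Returns the terminal state carried by a status event, or undefined otherwise.
// Assumption: the data payload is JSON shaped like { state: ... }.
function terminalStateOf(msg: SseMessage): JobState | undefined {
  if (msg.event !== "status") return undefined;
  const payload = JSON.parse(msg.data) as { state?: string };
  if (payload.state === "success" || payload.state === "failed" || payload.state === "cancelled") {
    return payload.state;
  }
  return undefined;
}
```

A consumer could call terminalStateOf on each message and stop reading once it returns a value, which matches the rule that the stream closes after a terminal status event.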
