CRUMB a card from devarno-cloud

CI Transition Event Schema

kahn intermediate 6 min read

ELI5

A standardised parcel label: every package shows when it was sent, which conveyor belt it belongs to, and what kind of contents it carries — and the post office refuses any parcel without the right stickers in the right boxes.

Technical Deep Dive

contracts/schemas/transitions.schema.json defines four event variants. Every line in transitions.jsonl is one JSON object; all variants share {ts, run_id, event}.

Base Fields

Field    Type    Constraints
ts       string  RFC 3339 with millisecond precision and a trailing Z
run_id   string  ^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$
event    enum    run_start | node_transition | node_attempt | run_end
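
As a rough illustration of the base-field constraints, the sketch below checks the three shared fields by hand. The run_id pattern and the event values come from the table above; the full ts pattern is an assumption (the card only confirms the `\.\d{3}Z` tail), and the function name base_field_errors is hypothetical.

```python
import re

# Assumed full-timestamp pattern: the card only confirms the `\.\d{3}Z` tail.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")
# run_id pattern copied from the Base Fields table.
RUN_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")
EVENTS = {"run_start", "node_transition", "node_attempt", "run_end"}


def base_field_errors(obj: dict) -> list[str]:
    """Return a list of problems with the three shared fields (empty if none)."""
    errors = []
    ts = obj.get("ts")
    if not isinstance(ts, str) or not TS_RE.fullmatch(ts):
        errors.append("ts: expected exactly three fractional digits and a trailing Z")
    run_id = obj.get("run_id")
    if not isinstance(run_id, str) or not RUN_ID_RE.fullmatch(run_id):
        errors.append("run_id: does not match the run_id pattern")
    event = obj.get("event")
    if not isinstance(event, str) or event not in EVENTS:
        errors.append("event: not one of the four known variants")
    return errors
```

A hand-rolled check like this is only a convenience for quick scripts; the schema file remains the authority on what is valid.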

Variant Map

classDiagram
class TransitionEvent {
+string ts
+string run_id
+string event
}
class run_start {
+int total_nodes
}
class node_transition {
+string node_id
+NodeStatus from
+NodeStatus to
+int? attempt
+string? reason
}
class node_attempt {
+string node_id
+int attempt
+float duration_s
+bool converged
+float? backoff_s
+DoneWhenResult[] done_when_results
}
class run_end {
+Outcome outcome
+int done
+int failed
+int blocked
+float total_duration_s
+int? total_attempts
+int? flake_retries
+int? exit_code
}
class DoneWhenResult {
+string cmd
+int rc
+float duration_s
+string? tail
+bool? truncated
}
TransitionEvent <|-- run_start
TransitionEvent <|-- node_transition
TransitionEvent <|-- node_attempt
TransitionEvent <|-- run_end
node_attempt --> DoneWhenResult
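
The diagram can be read as a dispatch table keyed on event. The sketch below assumes that fields marked `?` are optional and all others are required; the schema's actual `required` lists are not shown in this card, so treat the REQUIRED mapping and the missing_fields helper as hypothetical.

```python
import json

# Required fields per variant, read off the diagram: names without `?`.
# Assumption: the schema's own `required` lists match this reading.
REQUIRED = {
    "run_start": {"total_nodes"},
    "node_transition": {"node_id", "from", "to"},
    "node_attempt": {"node_id", "attempt", "duration_s", "converged", "done_when_results"},
    "run_end": {"outcome", "done", "failed", "blocked", "total_duration_s"},
}
BASE = {"ts", "run_id", "event"}


def missing_fields(line: str) -> set[str]:
    """Parse one JSONL line and return the required fields it lacks."""
    obj = json.loads(line)
    if not isinstance(obj, dict):
        raise ValueError("each line must be a JSON object")
    event = obj.get("event")
    if not isinstance(event, str) or event not in REQUIRED:
        raise ValueError(f"unknown event: {event!r}")
    # Extra keys are ignored here, in line with the forward-compat rule.
    return (BASE | REQUIRED[event]) - set(obj)
```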

Enum Values

  • NodeStatus: pending | ready | running | done | failed | blocked
  • Outcome: clean | clean_with_flake | partial | stuck | catastrophic

Field Conditionality

  • node_transition.attempt is present iff to == "running".
  • node_attempt.backoff_s is omitted for attempt 1.
  • done_when_result.tail is present only when rc != 0; capped at 4096 chars and paired with truncated: true if cropped.
  • run_end.exit_code is omitted when it would be 0 and is present on stuck / catastrophic outcomes.
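
The four rules above can be expressed as a small checker. This is a sketch under stated assumptions: it treats "4096 chars" as Python string length, it does not require backoff_s on later attempts (the card only says it is omitted for attempt 1), and the name conditionality_errors is hypothetical.

```python
def conditionality_errors(ev: dict) -> list[str]:
    """Check one event against the conditional-field rules listed above."""
    errors = []
    kind = ev.get("event")
    if kind == "node_transition":
        # attempt is present if and only if the node is entering "running".
        if ("attempt" in ev) != (ev.get("to") == "running"):
            errors.append("attempt must be present iff to == 'running'")
    elif kind == "node_attempt":
        if ev.get("attempt") == 1 and "backoff_s" in ev:
            errors.append("backoff_s must be omitted for attempt 1")
        for i, result in enumerate(ev.get("done_when_results", [])):
            if "tail" not in result:
                continue
            if result.get("rc") == 0:
                errors.append(f"done_when_results[{i}]: tail present but rc == 0")
            # Assumption: the 4096 cap counts characters as Python's len() does.
            if isinstance(result["tail"], str) and len(result["tail"]) > 4096:
                errors.append(f"done_when_results[{i}]: tail exceeds 4096 chars")
    elif kind == "run_end":
        if ev.get("exit_code") == 0:
            errors.append("exit_code 0 must be omitted")
        if ev.get("outcome") in ("stuck", "catastrophic") and "exit_code" not in ev:
            errors.append("exit_code must be present on stuck / catastrophic")
    return errors
```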

Reason Grammar

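node_transition.reason has the form <kind>[:<detail>]. Known kinds: max_ralph_iters_reached (no detail) and ancestor_failed, whose detail is a comma-separated id list (ancestor_failed:<id>[,<id>...]). The field is otherwise free-form for forward-compat, so consumers should accept unknown kinds.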
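
A minimal parser for this grammar might look like the following. The function name parse_reason and the choice to return the detail as a list are assumptions made for illustration; unknown kinds are passed through rather than rejected, in keeping with the forward-compat note.

```python
def parse_reason(reason: str) -> tuple[str, list[str]]:
    """Split a reason string into (kind, details).

    Only the first ':' separates kind from detail, so a detail that itself
    contains ':' is kept intact.
    """
    kind, sep, detail = reason.partition(":")
    if kind == "ancestor_failed":
        # Detail is a comma-separated list of ancestor node ids.
        return kind, [part for part in detail.split(",") if part]
    # Unknown or detail-free kinds: keep any detail as a single opaque string.
    return kind, ([detail] if sep else [])
```

For example, parse_reason("ancestor_failed:a,b") yields the kind "ancestor_failed" with the two ids "a" and "b" (ids invented here for illustration).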

Key Terms

  • oneOf → JSON Schema keyword that requires exactly one subschema to match; here the event value acts as the discriminator that selects the variant $def.
  • forward-compat → Producers may add unknown fields; consumers tolerate them and the emitter passes them through unchanged.
  • done_when_result → Per-shell-command record inside node_attempt.

Q&A

Q: Can ts be 2026-04-22T10:00:00Z (no milliseconds)? A: No. The pattern requires \.\d{3}Z — exactly three digits of fraction.

Q: What ordering can a consumer rely on for ts? A: Monotonic non-decrease is preferred but not required. Tie-breaking is file-offset, not ts-sort (O-4).

Q: Why isn’t exit_code: 0 emitted on clean runs? A: Schema says “Optional. Present on non-clean outcomes.” It’s a wire-size optimisation — absence implies zero.
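
On the consumer side, the ordering and exit_code answers above could be handled roughly as follows. This sketch takes one reading of the ordering answer: file order is kept as-is and ts regressions are only counted, never used to re-sort. List position stands in for file offset, and both function names are hypothetical.

```python
import json


def read_in_file_order(lines):
    """Return (events, regressions): events in file order, plus a count of
    adjacent pairs where ts goes backwards."""
    events = [json.loads(line) for line in lines if line.strip()]
    # Fixed-width RFC 3339 strings ending in Z compare correctly as plain strings.
    regressions = sum(
        1 for prev, cur in zip(events, events[1:]) if cur["ts"] < prev["ts"]
    )
    return events, regressions


def effective_exit_code(run_end: dict) -> int:
    """Absence of exit_code implies zero."""
    return run_end.get("exit_code", 0)
```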

Examples

A flaky retry’s third attempt looks like:

{"ts":"2026-04-22T10:08:01.600Z","run_id":"run_flaky","event":"node_attempt",
"node_id":"user-table","attempt":3,"duration_s":4.2,"converged":true,"backoff_s":4.0,
"done_when_results":[{"cmd":"pnpm test","rc":0,"duration_s":4.1}]}
