CRUMB a card from devarno-cloud

Archive Directory Layout

kahn beginner 4 min read

ELI5

A library archive: each run gets its own folder, with one append-only logbook, one final stamped summary card, and a copy of the floor plan it used. There’s also a card catalog at the door, which you can throw away and rebuild from the folders.

Technical Deep Dive

flowchart TD
A[".kahn/archive/"] --> B["runs/"]
A --> C["index.json (cache, rebuildable)"]
B --> D["<run_id>/"]
D --> E["transitions.jsonl (append-only)"]
D --> F["summary.json (once at run_end)"]
D --> G["graph.json (once at run_start)"]

Write Discipline

| Operation | Allowed? | Why |
|---|---|---|
| Append to transitions.jsonl | yes | Core write path |
| Rewrite transitions.jsonl | no | Breaks I-4 append-only |
| Write summary.json | once per run | Second write is a bug |
| Write graph.json | once per run | Captured at run_start |
| Atomic-replace index.json | yes | tmp + rename() |
| Any write outside .kahn/archive/ | no | Breaks I-1 |
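The tmp + rename() rule for index.json can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name write_index_atomically and the {"runs": [...]} file shape are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def write_index_atomically(archive_dir: Path, runs: list[dict]) -> Path:
    """Replace index.json in one step so readers never see a partial file."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / "index.json"
    # The temp file is created in the same directory so the rename
    # stays on one filesystem (a cross-device rename is not atomic).
    fd, tmp_name = tempfile.mkstemp(dir=archive_dir, prefix="index.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump({"runs": runs}, tmp)  # assumed shape, for illustration only
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)  # atomic rename on POSIX
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Because the cache is rebuildable, a crash before the rename costs nothing: the old index.json (or none at all) is still a valid state.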

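transitions.jsonl is opened with O_WRONLY | O_APPEND | O_CREAT. With O_APPEND, POSIX requires the seek to end of file and the write to happen as one step, so concurrent appenders never overwrite each other's events. The PIPE_BUF atomicity guarantee formally covers pipes and FIFOs rather than regular files (and POSIX only requires PIPE_BUF to be at least 512 bytes; it is 4 KiB on Linux), so treat 4 KiB as a practical ceiling: emit each event in a single write() well under that size, which the largest realistic event comfortably is.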
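A minimal Python sketch of that open-and-append call is shown here. The helper name append_event, the 0o644 mode, and the "event" key in the usage are assumptions for illustration; only the three open flags come from the text above.

```python
import json
import os


def append_event(log_path: str, event: dict) -> int:
    """Append one JSON line using a single write() call; return bytes written."""
    line = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
    # Flags as named in the text; the 0o644 permission mode is an assumption.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write() per event keeps each line contiguous in the file.
        return os.write(fd, line)
    finally:
        os.close(fd)
```

Calling append_event("transitions.jsonl", {"event": "run_start"}) twice leaves two complete lines in the file, in call order. A production writer would likely keep the descriptor open across events rather than reopening per call.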

Sequencing Invariants

  1. run_start is the first event in any transitions.jsonl.
  2. run_end is the last event in a complete log.
  3. Every to == "running" transition has a matching node_attempt with the same attempt, unless the process died mid-attempt.
  4. Events with the same ts are ordered by file offset (O-4); never sort by ts alone.
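Invariants 1, 2, and 4 can be checked with a short reader. The sketch below assumes each line carries its event name under an "event" key, which the card does not specify; read_events and check_sequencing are hypothetical helper names. Invariant 3 is left out because it depends on field names not shown here.

```python
import json
from pathlib import Path


def read_events(log_path: Path) -> list[dict]:
    """Return events in append (file-offset) order; no sorting is applied."""
    with log_path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_sequencing(events: list[dict]) -> list[str]:
    """Return violated invariants; an empty list means the log looks complete."""
    problems = []
    # "event" is an assumed key name for the event type.
    if not events or events[0].get("event") != "run_start":
        problems.append("run_start is not the first event")
    if not events or events[-1].get("event") != "run_end":
        problems.append("run_end is not the last event (log may be incomplete)")
    return problems
```

Reading line by line preserves file-offset order for free, so two events sharing a ts keep the order in which they were appended.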

summary.json Shape

Mirrors a row of data/history_index.json exactly so the History view reads either fixture or archive without branching:

{"run_id":"run_flaky","started":"...","ended":"...","duration_s":481.6,
"outcome":"clean_with_flake","total_nodes":7,"done":7,"failed":0,"blocked":0,
"total_attempts":11,"flake_retries":4,"exit_code":0,"failed_nodes":[],
"node_attempts":{"schema-init":1,"auth-table":2}}

Key Terms

  • run_id → Filesystem-safe partition key matching ^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$.
  • index.json → Cached run list. Rebuildable from runs/*/summary.json — never source of truth.
  • I-4 → The “monotonic non-decrease” check on transitions.jsonl byte length.
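The run_id pattern above translates directly into a validator. The function name is_valid_run_id is hypothetical; the regular expression is copied from the key term.

```python
import re

# Pattern copied verbatim from the run_id definition.
RUN_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")


def is_valid_run_id(run_id: str) -> bool:
    """True if run_id is safe to use as a directory name under runs/."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return RUN_ID_RE.fullmatch(run_id) is not None
```

The pattern rejects path separators, dots, and empty strings, so a value such as "../etc" can never escape the runs/ directory.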

Q&A

Q: What happens if I delete index.json? A: Scope rebuilds it on next start by scanning runs/*/summary.json. No data loss.
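The rebuild scan could look something like this. It is a sketch under assumptions: rebuild_index is a hypothetical name, and skipping unreadable summaries is a choice made for the example rather than documented Scope behavior.

```python
import json
from pathlib import Path


def rebuild_index(archive_dir: Path) -> list[dict]:
    """Collect one row per run from runs/*/summary.json, in run_id order."""
    rows = []
    for summary_path in sorted((archive_dir / "runs").glob("*/summary.json")):
        try:
            rows.append(json.loads(summary_path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            # Assumption: a corrupt summary is skipped rather than aborting the scan.
            continue
    return rows
```

Runs that never reached run_end have no summary.json, so the glob leaves them out without any special handling.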

Q: Why is graph.json snapshotted into the archive? A: So replay stays renderable even if the live orchestrator’s graph.json changes after the run (closes O-3).

Q: How are same-ts events ordered? A: By file offset. Reducers must read in append order, not sort by timestamp.

Examples

Mounting .kahn/archive/runs/ as an S3 prefix gives the same partition shape — Scope’s read path is a pure prefix scan, so the only code change needed is the EventSource adapter behind archive.py.
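To make the adapter idea concrete, here is one possible shape for it. The card names an EventSource adapter but does not show its interface, so every method name below (list_runs, read_events) and the LocalEventSource class are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Iterator, Protocol


class EventSource(Protocol):
    """Hypothetical read interface; the real one in archive.py may differ."""

    def list_runs(self) -> list[str]: ...

    def read_events(self, run_id: str) -> Iterator[dict]: ...


class LocalEventSource:
    """Filesystem-backed source: one directory per run under runs_dir."""

    def __init__(self, runs_dir: Path) -> None:
        self.runs_dir = runs_dir

    def list_runs(self) -> list[str]:
        # A prefix scan: each child directory name is a run_id.
        return sorted(p.name for p in self.runs_dir.iterdir() if p.is_dir())

    def read_events(self, run_id: str) -> Iterator[dict]:
        log_path = self.runs_dir / run_id / "transitions.jsonl"
        with log_path.open(encoding="utf-8") as f:
            for line in f:  # append order is preserved
                if line.strip():
                    yield json.loads(line)
```

An S3-backed class would implement the same two methods with a prefix listing and an object read, leaving callers unchanged.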
