Archive Directory Layout
ELI5
A library archive: each run gets its own folder, with one append-only logbook, one final stamped summary card, and a copy of the floor plan it used. There’s also a card catalog at the door, which you can throw away and rebuild from the folders.
Technical Deep Dive
```mermaid
flowchart TD
    A[".kahn/archive/"] --> B["runs/"]
    A --> C["index.json (cache, rebuildable)"]
    B --> D["<run_id>/"]
    D --> E["transitions.jsonl (append-only)"]
    D --> F["summary.json (once at run_end)"]
    D --> G["graph.json (once at run_start)"]
```
Write Discipline
| Operation | Allowed? | Why |
|---|---|---|
| Append to `transitions.jsonl` | yes | Core write path |
| Rewrite `transitions.jsonl` | no | Breaks I-4 (append-only) |
| Write `summary.json` | once per run | A second write is a bug |
| Write `graph.json` | once per run | Captured at `run_start` |
| Atomic-replace `index.json` | yes | Write a tmp file, then `rename()` |
| Any write outside `.kahn/archive/` | no | Breaks I-1 |
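
The two write-once rows and the I-1 row can be enforced at the call site. Below is a minimal Python sketch, assuming a helper called `write_once` and a default archive root of `.kahn/archive` relative to the working directory; both names are hypothetical, not taken from `archive.py`:

```python
import json
from pathlib import Path

# Hypothetical default; the real root presumably comes from configuration.
ARCHIVE_ROOT = Path(".kahn/archive")


def _inside_archive(path: Path, root: Path) -> Path:
    """Resolve `path` and refuse it if it escapes `root` (the I-1 row)."""
    resolved = path.resolve()
    if not resolved.is_relative_to(root.resolve()):  # Python 3.9+
        raise PermissionError(f"refusing to write outside {root}: {path}")
    return resolved


def write_once(path: Path, payload: dict, root: Path = ARCHIVE_ROOT) -> None:
    """Write summary.json or graph.json exactly once per run.

    Mode "x" raises FileExistsError if the file already exists, so a
    second write surfaces as an error instead of silently overwriting.
    """
    target = _inside_archive(path, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "x", encoding="utf-8") as f:
        json.dump(payload, f)
```

Mode `"x"` turns "a second write is a bug" into an exception. A crash mid-write can still leave a partial file; a stricter variant would write a temp file first and only then move it into place.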
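
`transitions.jsonl` is opened with `O_WRONLY | O_APPEND | O_CREAT`. Because of `O_APPEND`, POSIX guarantees that each `write()` first moves the file offset to the current end of file, with no intervening modification between that seek and the write, so every event should be emitted as one complete line in a single `write()` call. The `PIPE_BUF` bound (4 KiB on Linux; POSIX only requires 512 bytes) is formally an atomicity guarantee for pipes and FIFOs rather than regular files, but it remains a useful ceiling: the largest realistic event is well under 4 KiB.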
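A minimal sketch of that append path, assuming events are plain dicts and that each call reopens the file (a long-lived writer would more likely keep the descriptor open). `append_event` and the `MAX_EVENT_BYTES` guard are illustrative names, not part of the kahn codebase:

```python
import json
import os

# Assumption: mirrors the "largest realistic event is under 4 KiB" bound above.
MAX_EVENT_BYTES = 4096


def append_event(path: str, event: dict) -> None:
    """Append one event to transitions.jsonl as a single line, in one write()."""
    # Compact JSON plus a trailing newline: one event per line.
    line = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
    if len(line) > MAX_EVENT_BYTES:
        raise ValueError(f"event is {len(line)} bytes, over {MAX_EVENT_BYTES}")
    # O_APPEND makes the kernel seek to end-of-file before every write.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, line)
        # A short write would leave a torn line; surface it rather than ignore it.
        if written != len(line):
            raise OSError(f"short write: {written} of {len(line)} bytes")
    finally:
        os.close(fd)
```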
Sequencing Invariants
- `run_start` is the first event in any `transitions.jsonl`.
- `run_end` is the last event in a complete log.
- Every `to == "running"` transition has a matching `node_attempt` with the same `attempt`, unless the process died mid-attempt.
- Same-`ts` events are resolved by file-offset order (O-4); never sort by `ts` alone.
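The first three invariants can be checked mechanically. The sketch below assumes a hypothetical event shape (an `"event"` field naming the kind, with `node` and `attempt` on both transitions and `node_attempt` records); the real field names may differ. The caller states whether the run should be complete (for example, because its `summary.json` exists), and the matching rule is only enforced then, on the assumption that an incomplete log may belong to a process that died mid-attempt:

```python
import json
from typing import Iterable


def check_sequence(lines: Iterable[str], complete: bool = True) -> list[str]:
    """Return the sequencing-invariant violations found in one transitions.jsonl.

    Hypothetical event shape: {"event": <kind>, ...}; transitions and
    node_attempt records both carry "node" and "attempt".
    """
    events = [json.loads(line) for line in lines if line.strip()]  # file order
    problems: list[str] = []
    if not events or events[0].get("event") != "run_start":
        problems.append("first event is not run_start")
    if complete and (not events or events[-1].get("event") != "run_end"):
        problems.append("complete log does not end with run_end")
    if complete:
        # Assumption: a complete run had no process die mid-attempt,
        # so every transition into "running" must be matched.
        attempts = {(e.get("node"), e.get("attempt"))
                    for e in events if e.get("event") == "node_attempt"}
        for i, e in enumerate(events):
            if e.get("event") == "transition" and e.get("to") == "running":
                if (e.get("node"), e.get("attempt")) not in attempts:
                    problems.append(f"event {i}: running without node_attempt")
    return problems
```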
summary.json Shape
Mirrors a row of data/history_index.json exactly so the History view reads either fixture or archive without branching:
```json
{"run_id":"run_flaky","started":"...","ended":"...","duration_s":481.6,
 "outcome":"clean_with_flake","total_nodes":7,"done":7,"failed":0,"blocked":0,
 "total_attempts":11,"flake_retries":4,"exit_code":0,"failed_nodes":[],
 "node_attempts":{"schema-init":1,"auth-table":2}}
```
Key Terms
- `run_id` → Filesystem-safe partition key matching `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$`.
- `index.json` → Cached run list. Rebuildable from `runs/*/summary.json`; never the source of truth.
- I-4 → The “monotonic non-decrease” check on `transitions.jsonl` byte length.
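Both the `run_id` pattern and the I-4 check are small enough to sketch directly. `is_valid_run_id` and `AppendOnlyGuard` are hypothetical names; the regex is the one given above:

```python
import os
import re

# The run_id pattern from the Key Terms list.
RUN_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")


def is_valid_run_id(run_id: str) -> bool:
    # fullmatch, not match: in Python's re, "$" also matches just before a
    # trailing newline, so match() would accept "run_flaky\n".
    return RUN_ID_RE.fullmatch(run_id) is not None


class AppendOnlyGuard:
    """I-4: a transitions.jsonl file's byte length must never decrease."""

    def __init__(self) -> None:
        self._last_size: dict[str, int] = {}

    def check(self, path: str) -> int:
        size = os.path.getsize(path)
        last = self._last_size.get(path, 0)
        if size < last:
            raise RuntimeError(f"I-4 violated: {path} shrank from {last} to {size} bytes")
        self._last_size[path] = size
        return size
```

Because the pattern excludes `/` and `.`, a valid `run_id` can never be a path like `../escape`, which is what makes it safe as a directory name under `runs/`.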
Q&A
Q: What happens if I delete index.json?
A: Scope rebuilds it on its next start by scanning `runs/*/summary.json`. No data is lost.
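A sketch of that rebuild, assuming `index.json` is simply a JSON array of the summary rows ordered by run directory name (the real cache format is not specified here). It also shows the tmp + `rename()` pattern from the write-discipline table, via Python's `os.replace`; `rebuild_index` is a hypothetical name:

```python
import json
import os
import tempfile
from pathlib import Path


def rebuild_index(archive_root: Path) -> list[dict]:
    """Recreate index.json from runs/*/summary.json, the source of truth."""
    rows = []
    # Runs still in flight have no summary.json yet, so the glob skips them.
    for summary in sorted(archive_root.glob("runs/*/summary.json")):
        with open(summary, encoding="utf-8") as f:
            rows.append(json.load(f))
    # tmp + rename: write a sibling temp file, then os.replace() it over
    # index.json so readers see the old index or the new one, never a torn file.
    fd, tmp = tempfile.mkstemp(dir=archive_root, prefix=".index.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(rows, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, archive_root / "index.json")
    except BaseException:
        os.unlink(tmp)
        raise
    return rows
```

The temp file lives in the same directory as `index.json` so the final `os.replace` is a same-filesystem rename.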
Q: Why is graph.json snapshotted into the archive?
A: So replay stays renderable even if the live orchestrator’s graph.json changes after the run (closes O-3).
Q: How are same-ts events ordered?
A: By file offset. Reducers must read in append order, not sort by timestamp.
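A sketch of a reducer that honors this, assuming each event carries `ts` and that transitions carry `node` and `to` (hypothetical field names; `reduce_status` and `merge_key` are illustrative too). It tracks byte offsets so that, if a sort is ever unavoidable, `(ts, offset)` rather than `ts` alone serves as the key:

```python
import json
from typing import Iterator


def read_in_append_order(path: str) -> Iterator[tuple[int, dict]]:
    """Yield (byte_offset, event) pairs in the order lines were appended."""
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            if raw.strip():
                yield offset, json.loads(raw)
            offset += len(raw)


def merge_key(item: tuple[int, dict]) -> tuple:
    """Sort key for the rare case a sort is unavoidable: ts, then file offset."""
    offset, event = item
    return (event["ts"], offset)


def reduce_status(path: str) -> dict[str, str]:
    """Fold transitions into the latest status per node, in file order."""
    status: dict[str, str] = {}
    for _, event in read_in_append_order(path):
        if "node" in event and "to" in event:
            status[event["node"]] = event["to"]
    return status
```

Python's `sorted` is stable, so sorting offset-ordered input by `ts` alone happens to keep ties in file order; the explicit offset in the key keeps that true even if the events were filtered, merged, or otherwise reordered before sorting.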
Examples
Mounting `.kahn/archive/runs/` as an S3 prefix gives the same partition shape. Because Scope’s read path is a pure prefix scan, the only code change needed is the `EventSource` adapter behind `archive.py`.
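A sketch of that seam, assuming a hypothetical `EventSource` protocol with two methods; the real adapter in `archive.py` may look different. Callers depend only on the protocol, so an S3-backed class could implement `list_run_ids` with a delimited `ListObjectsV2` listing under the `runs/` prefix and read objects by key, provided it maps a missing key to `FileNotFoundError`:

```python
import json
from pathlib import Path
from typing import Iterator, Protocol


class EventSource(Protocol):
    """Hypothetical adapter shape; not the actual interface in archive.py."""

    def list_run_ids(self) -> list[str]: ...

    def read_lines(self, run_id: str, name: str) -> Iterator[bytes]: ...


class LocalEventSource:
    """Reads .kahn/archive/runs/ from the local filesystem."""

    def __init__(self, runs_root: Path) -> None:
        self.runs_root = runs_root

    def list_run_ids(self) -> list[str]:
        # A prefix scan: every child directory of runs/ is one run partition.
        return sorted(p.name for p in self.runs_root.iterdir() if p.is_dir())

    def read_lines(self, run_id: str, name: str) -> Iterator[bytes]:
        with open(self.runs_root / run_id / name, "rb") as f:
            yield from f


def load_summaries(source: EventSource) -> list[dict]:
    """Depends only on the protocol, so any conforming source can drop in."""
    rows = []
    for run_id in source.list_run_ids():
        try:
            rows.append(json.loads(b"".join(source.read_lines(run_id, "summary.json"))))
        except FileNotFoundError:
            continue  # run still in progress: no summary.json yet
    return rows
```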
neighbors on the map
- Horizontal Scalability Seams: planning an S3-backed archive