Prompt-DAG Scheduler
kahn intermediate 6 min read
ELI5
Picture a kitchen where the head chef lets a cook start a dish only once every prep dish it needs is plated, and never lets two cooks share a chopping board at the same time. The chef can run several non-clashing dishes in parallel, up to a configured stove count.
Technical Deep Dive
core/orchestrator.py schedules Nodes loaded from graph.json (validated by contracts/schemas/graph.schema.json). Each node has depends_on, touches, parallel_safe, and done_when.
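As a rough illustration of the node shape described above, here is a minimal sketch. The field names come from the text; the types, defaults, the `status` field's initial value, the `load_nodes` helper, and the `{"nodes": [...]}` layout of the JSON are all assumptions, not taken from core/orchestrator.py or graph.schema.json.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    # Field names follow the text; types and defaults are assumptions.
    id: str
    depends_on: list[str] = field(default_factory=list)
    touches: list[str] = field(default_factory=list)
    parallel_safe: bool = True
    done_when: str = ""
    status: str = "pending"


def load_nodes(raw: str) -> dict[str, Node]:
    """Parse JSON of the assumed shape {"nodes": [...]} into Nodes keyed by id."""
    doc = json.loads(raw)
    return {entry["id"]: Node(**entry) for entry in doc["nodes"]}


example = json.dumps({
    "nodes": [
        {"id": "schema-init"},
        {
            "id": "auth-table",
            "depends_on": ["schema-init"],
            "touches": ["migrations/0012_auth.sql"],
        },
    ]
})
nodes = load_nodes(example)
```

A real loader would validate the document against the schema before building nodes; that step is omitted here.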
Ready Set
ready_nodes(nodes) returns nodes whose status is pending or ready and whose depends_on IDs are all in the done set.
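The rule above can be sketched in a few lines. This is an illustration only: the `Node` class is a stripped-down stand-in, and deriving the done set from node statuses is an assumption about how the orchestrator tracks completion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Minimal stand-in with only the fields this rule needs.
    id: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "pending"


def ready_nodes(nodes: list[Node]) -> list[Node]:
    """Return nodes that are pending or ready and whose dependencies are all done."""
    # Assumption: the done set is derived from each node's status.
    done = {n.id for n in nodes if n.status == "done"}
    return [
        n for n in nodes
        if n.status in ("pending", "ready") and set(n.depends_on) <= done
    ]


graph = [
    Node("schema-init", status="done"),
    Node("auth-table", depends_on=["schema-init"]),
    Node("auth-service", depends_on=["auth-table"]),
]
ready_ids = [n.id for n in ready_nodes(graph)]
```

Here only auth-table qualifies: schema-init is already done, and auth-service still waits on auth-table.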
Conflict & Batch Selection
flowchart TD
    A["ready_nodes()"] --> B{"len(in_flight)+batch < max_par?"}
    B -->|no| Z[stop]
    B -->|yes| C{"node.parallel_safe?"}
    C -->|false && others present| D[skip]
    C -->|true| E{"touches overlap any in_flight or batch?"}
    E -->|yes| D
    E -->|no| F[append to batch]

conflicts(a, b) is bool(set(a.touches) & set(b.touches)). A non-parallel_safe node forces solo execution.
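One way to read the flowchart as code is sketched below. The `pick_batch` signature is hypothetical, as is the "src/db.ts" path. Two behaviours are assumptions the flowchart does not spell out: a non-parallel_safe node is taken alone when nothing else is in flight or batched, and a non-parallel_safe node that is already running blocks all new starts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    touches: list[str] = field(default_factory=list)
    parallel_safe: bool = True


def conflicts(a: Node, b: Node) -> bool:
    """True when the two nodes share at least one touched path."""
    return bool(set(a.touches) & set(b.touches))


def pick_batch(ready: list[Node], in_flight: list[Node], max_par: int) -> list[Node]:
    batch: list[Node] = []
    # Assumption: a solo node that is already running blocks every new start.
    if any(not n.parallel_safe for n in in_flight):
        return batch
    for node in ready:
        if len(in_flight) + len(batch) >= max_par:
            break  # capacity reached
        others = in_flight + batch
        if not node.parallel_safe:
            if others:
                continue  # must run alone, so it waits for an empty scheduler
            return [node]  # runs solo; nothing else joins this batch
        if any(conflicts(node, other) for other in others):
            continue  # overlapping touches with something running or batched
        batch.append(node)
    return batch


a = Node("a", touches=["src/api.ts"])
b = Node("b", touches=["src/api.ts"])
c = Node("c", touches=["src/db.ts"])
solo = Node("solo", parallel_safe=False)

first = [n.id for n in pick_batch([a, b, c, solo], in_flight=[], max_par=3)]
second = [n.id for n in pick_batch([solo, a], in_flight=[], max_par=3)]
third = [n.id for n in pick_batch([b, c], in_flight=[a], max_par=3)]
```

In the first call, b is skipped because it overlaps with a, and solo is skipped because others are present. In the second, solo comes first and takes the whole batch. In the third, b is rejected against the in-flight a while c is admitted.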
Status Lifecycle
stateDiagram-v2
    [*] --> pending
    pending --> ready: deps satisfied
    ready --> running: pick_batch
    running --> done: ralph converged
    running --> failed: max_ralph_iters_reached
    pending --> blocked: ancestor_failed
    ready --> blocked: ancestor_failed
    failed --> [*]
    done --> [*]
    blocked --> [*]

ready is a synthesised state, emitted exactly once per node when its deps clear (closes O-2 in the schema).
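The two blocked transitions in the lifecycle could be implemented as a walk over the failed node's descendants. The sketch below is one possible approach, not the orchestrator's code: `block_descendants` and the `reason` field are hypothetical names, and stamping every descendant with the ID of the node that actually failed is an assumption about what gets appended to the reason string.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "pending"
    reason: str = ""  # hypothetical field holding the transition reason


def block_descendants(nodes: dict[str, Node], failed_id: str) -> list[str]:
    """Mark every pending/ready descendant of failed_id as blocked; return their ids."""
    # Invert depends_on into a parent -> children map.
    children: dict[str, list[str]] = {}
    for n in nodes.values():
        for dep in n.depends_on:
            children.setdefault(dep, []).append(n.id)

    blocked: list[str] = []
    seen = {failed_id}
    queue = deque([failed_id])
    while queue:
        current = queue.popleft()
        for child_id in children.get(current, []):
            if child_id in seen:
                continue
            seen.add(child_id)
            child = nodes[child_id]
            if child.status in ("pending", "ready"):
                child.status = "blocked"
                child.reason = f"ancestor_failed:{failed_id}"
                blocked.append(child_id)
            queue.append(child_id)
    return blocked


nodes = {
    "a": Node("a", status="failed"),
    "b": Node("b", depends_on=["a"]),
    "c": Node("c", depends_on=["b"], status="ready"),
    "d": Node("d"),
}
blocked_ids = block_descendants(nodes, "a")
```

Node d is unrelated to the failure and stays pending, so independent branches of the graph can still proceed.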
Cycle Guard
Before scheduling, run() requires at least one node with empty depends_on. Zero roots ⇒ exit code 2 with "graph has no roots — cycle or malformed deps".
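A minimal sketch of such a root check follows. The `check_roots` name and its dict-based input are hypothetical; only the exit code and the message text come from the description above.

```python
import sys


def check_roots(depends_on: dict[str, list[str]]) -> int:
    """Return 0 if at least one node has no dependencies, otherwise 2."""
    roots = [node_id for node_id, deps in depends_on.items() if not deps]
    if not roots:
        print("graph has no roots — cycle or malformed deps", file=sys.stderr)
        return 2
    return 0


ok_code = check_roots({"schema-init": [], "auth-table": ["schema-init"]})
cyclic_code = check_roots({"a": ["b"], "b": ["a"]})
```

A check of this kind is necessary but not sufficient: a graph with one valid root and a cycle further downstream still passes it, and the nodes on that cycle would simply never become ready.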
Key Terms
- touches → File-path strings; two nodes whose touches overlap cannot run concurrently.
- parallel_safe → If false, the node runs alone; nothing else may be in-flight or batched alongside it.
- ancestor_failed → The reason string emitted on a → blocked transition, with the failed parent IDs appended.
Q&A
Q: What happens when a node’s parent fails?
A: Every descendant in pending or ready transitions to blocked with reason="ancestor_failed:<parent>". They never run.
Q: Can two parallel-safe nodes that touch overlapping files run together?
A: No. pick_batch rejects them via conflicts() even if both are individually parallel-safe.
Q: How is ready emitted differently from other transitions?
A: It is synthesised by _Scribe.mark_ready and de-duplicated via a _readied set, so a node never gets two pending → ready events.
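The de-duplication described in the last answer could look roughly like this. The `Scribe` class is a hypothetical stand-in for _Scribe; the event tuple format and the boolean return value are assumptions made for illustration.

```python
class Scribe:
    """Hypothetical stand-in for _Scribe: records transitions, de-duplicating ready events."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, str]] = []
        self._readied: set[str] = set()

    def mark_ready(self, node_id: str) -> bool:
        """Emit pending -> ready once per node; return False on repeat calls."""
        if node_id in self._readied:
            return False
        self._readied.add(node_id)
        self.events.append((node_id, "pending", "ready"))
        return True


scribe = Scribe()
first_call = scribe.mark_ready("auth-table")
second_call = scribe.mark_ready("auth-table")
```

Because the guard lives in the scribe, the scheduler loop can call mark_ready on every pass without tracking which nodes it has already announced.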
Examples
For a graph [schema-init] → [auth-table, user-table] → [auth-service, user-service] → [api-gateway], with max_par=3 and auth-table.touches=["migrations/0012_auth.sql"], the scheduler runs auth-table and user-table together because their touches do not overlap. If user-service and auth-service both touch "src/api.ts", however, only one of them runs at a time.
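The example can be traced with a small simulation. This is a simplified sketch, not the orchestrator itself: each wave is assumed to finish completely before the next batch is picked, so in-flight tracking is omitted. The per-node dependencies (each service on its own table, the gateway on both services) and the user-table path "migrations/0013_user.sql" are assumptions filled in for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    depends_on: list[str] = field(default_factory=list)
    touches: list[str] = field(default_factory=list)
    parallel_safe: bool = True
    status: str = "pending"


def ready_nodes(nodes: list[Node]) -> list[Node]:
    done = {n.id for n in nodes if n.status == "done"}
    return [n for n in nodes
            if n.status in ("pending", "ready") and set(n.depends_on) <= done]


def conflicts(a: Node, b: Node) -> bool:
    return bool(set(a.touches) & set(b.touches))


def pick_batch(ready: list[Node], max_par: int) -> list[Node]:
    batch: list[Node] = []
    for node in ready:
        if len(batch) >= max_par:
            break
        if not node.parallel_safe:
            if batch:
                continue
            return [node]
        if any(conflicts(node, other) for other in batch):
            continue
        batch.append(node)
    return batch


def simulate(nodes: list[Node], max_par: int) -> list[list[str]]:
    """Run wave by wave, treating every picked node as converging immediately."""
    waves: list[list[str]] = []
    while True:
        batch = pick_batch(ready_nodes(nodes), max_par)
        if not batch:
            return waves
        for node in batch:
            node.status = "done"
        waves.append([node.id for node in batch])


graph = [
    Node("schema-init"),
    Node("auth-table", ["schema-init"], ["migrations/0012_auth.sql"]),
    Node("user-table", ["schema-init"], ["migrations/0013_user.sql"]),  # hypothetical path
    Node("auth-service", ["auth-table"], ["src/api.ts"]),
    Node("user-service", ["user-table"], ["src/api.ts"]),
    Node("api-gateway", ["auth-service", "user-service"]),
]
waves = simulate(graph, max_par=3)
```

Under these assumptions the run takes five waves: the two tables share a wave, while the two services are serialised by their shared "src/api.ts" entry even though max_par would allow both.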
neighbors on the map
- Run Outcome Classification: interpreting a History row's status pill
- End-to-End Chain Execution Request Flow: tracing a chain execution through the entire system
- Dependency DAG & Blast Radius: estimating the impact of changing a shared rule