CRUMB a card from devarno-cloud

Prompt-DAG Scheduler

kahn intermediate 6 min read

ELI5

A kitchen with a head chef who only lets a cook start a dish when all the prep dishes it needs are plated, and never lets two cooks share the same chopping board at once. The chef can run several non-clashing dishes in parallel up to a configured stove count.

Technical Deep Dive

core/orchestrator.py schedules Nodes loaded from graph.json (validated by contracts/schemas/graph.schema.json). Each node has depends_on, touches, parallel_safe, and done_when.

Ready Set

ready_nodes(nodes) returns nodes whose status is pending or ready and whose depends_on IDs are all in the done set.
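A minimal sketch of that filter follows. It assumes a trimmed-down, hypothetical Node dataclass with only id, depends_on and status. The real Node in core/orchestrator.py has more fields and may compute the done set differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, trimmed-down node: field names mirror the graph.json keys.
    id: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "pending"


def ready_nodes(nodes: list[Node]) -> list[Node]:
    """Return pending/ready nodes whose parents are all done."""
    done = {n.id for n in nodes if n.status == "done"}
    return [
        n for n in nodes
        if n.status in ("pending", "ready")
        and all(dep in done for dep in n.depends_on)
    ]


nodes = [
    Node("schema-init", status="done"),
    Node("auth-table", ["schema-init"]),
    Node("auth-service", ["auth-table"]),
]
print([n.id for n in ready_nodes(nodes)])  # ['auth-table']
```

Running and finished nodes drop out of the result by status alone, so repeated calls on each scheduling tick are safe.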

Conflict & Batch Selection

```mermaid
flowchart TD
A["ready_nodes()"] --> B{"len(in_flight) + len(batch) < max_par?"}
B -->|no| Z[stop]
B -->|yes| C{"node.parallel_safe?"}
C -->|false && others present| D[skip]
C -->|true| E{"touches overlap any in_flight or batch?"}
E -->|yes| D
E -->|no| F[append to batch]
```

conflicts(a, b) is bool(set(a.touches) & set(b.touches)). A non-parallel_safe node forces solo execution.
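The batch-selection flowchart and the conflicts() rule can be sketched roughly as below. This is an interpretation, not the orchestrator's code. Three choices are assumptions: the early return when a non-parallel_safe node is already in flight, letting a lone non-parallel_safe node start and then closing the batch, and walking the ready list in order. The flowchart does not show what happens to a non-parallel_safe node when nothing else is present. The sample nodes, including the migrate node, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, trimmed-down node: only the fields pick_batch needs.
    id: str
    touches: list[str] = field(default_factory=list)
    parallel_safe: bool = True


def conflicts(a: Node, b: Node) -> bool:
    # True when the two nodes share at least one touched file path.
    return bool(set(a.touches) & set(b.touches))


def pick_batch(ready: list[Node], in_flight: list[Node], max_par: int) -> list[Node]:
    batch: list[Node] = []
    # Assumption: a solo (non-parallel_safe) node already running blocks everything.
    if any(not n.parallel_safe for n in in_flight):
        return batch
    for node in ready:
        if len(in_flight) + len(batch) >= max_par:
            break  # concurrency cap reached: stop
        others = in_flight + batch
        if not node.parallel_safe:
            if others:
                continue  # must run alone, so skip while anything else is present
            batch.append(node)
            break  # assumption: close the batch so nothing joins the solo node
        if any(conflicts(node, o) for o in others):
            continue  # overlapping touches: skip
        batch.append(node)
    return batch


api_a = Node("auth-service", ["src/api.ts"])
api_u = Node("user-service", ["src/api.ts"])
migrate = Node("migrate", ["db/0001.sql"], parallel_safe=False)  # hypothetical node

print([n.id for n in pick_batch([api_a, api_u], [], max_par=3)])    # ['auth-service']
print([n.id for n in pick_batch([migrate, api_a], [], max_par=3)])  # ['migrate']
```

Skipped nodes are not lost. They stay pending or ready and come back through ready_nodes() on the next pick.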

Status Lifecycle

```mermaid
stateDiagram-v2
[*] --> pending
pending --> ready: deps satisfied
ready --> running: pick_batch
running --> done: ralph converged
running --> failed: max_ralph_iters_reached
pending --> blocked: ancestor_failed
ready --> blocked: ancestor_failed
failed --> [*]
done --> [*]
blocked --> [*]
```

ready is a synthesised state. It is emitted exactly once per node, when that node's dependencies clear (this closes O-2 in the schema).
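One way to enforce the lifecycle is a transition table transcribed from the state diagram. ALLOWED, IllegalTransition and transition are hypothetical names. The source does not say whether the orchestrator validates transitions this way.

```python
# Allowed status transitions, transcribed from the state diagram.
ALLOWED: dict[str, set[str]] = {
    "pending": {"ready", "blocked"},
    "ready": {"running", "blocked"},
    "running": {"done", "failed"},
    "done": set(),      # terminal
    "failed": set(),    # terminal
    "blocked": set(),   # terminal
}


class IllegalTransition(Exception):
    """Raised when a status change is not an edge in the lifecycle."""


def transition(current: str, new: str) -> str:
    if new not in ALLOWED.get(current, set()):
        raise IllegalTransition(f"{current} -> {new}")
    return new


print(transition("pending", "ready"))  # ready
```

Under this table a node cannot jump from pending straight to running, so every started node must first pass through the synthesised ready state.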

Cycle Guard

Before scheduling, run() requires at least one node with empty depends_on. Zero roots ⇒ exit code 2 with "graph has no roots — cycle or malformed deps".
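The root check is necessary but not sufficient for acyclicity. A graph where A is a root and B and C depend on each other still contains a cycle. Kahn's algorithm (the "kahn" tag on this card) catches that case too. It repeatedly removes nodes with in-degree zero, and any node never removed lies on a cycle or downstream of one. The sketch below uses a hypothetical kahn_order helper over plain depends_on lists. It is not a claim about what run() does beyond the root check.

```python
from collections import deque


def kahn_order(deps: dict[str, list[str]]) -> list[str]:
    """Topologically sort node ids with Kahn's algorithm.

    deps maps each node id to its depends_on list (parent ids).
    Raises ValueError for unknown parents, zero roots, or a cycle.
    """
    children: dict[str, list[str]] = {n: [] for n in deps}
    indegree = {n: len(parents) for n, parents in deps.items()}
    for n, parents in deps.items():
        for p in parents:
            if p not in children:
                raise ValueError(f"{n} depends on unknown node {p!r}")
            children[p].append(n)

    queue = deque(n for n, d in indegree.items() if d == 0)
    if not queue:
        raise ValueError("no roots")  # the case run()'s guard already catches

    order: list[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1  # one more parent of c is accounted for
            if indegree[c] == 0:
                queue.append(c)

    if len(order) < len(deps):
        # Nodes that never reached in-degree 0 sit on a cycle or below one.
        stuck = sorted(set(deps) - set(order))
        raise ValueError(f"cycle among or below: {stuck}")
    return order


print(kahn_order({"A": [], "B": ["A"], "C": ["B"]}))  # ['A', 'B', 'C']
try:
    kahn_order({"A": [], "B": ["A", "C"], "C": ["B"]})  # has a root, still cyclic
except ValueError as err:
    print(err)  # cycle among or below: ['B', 'C']
```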

Key Terms

  • touches → File-path strings; two nodes whose touches overlap cannot run concurrently.
  • parallel_safe → If false, the node runs alone; nothing else may be in-flight or batched alongside it.
  • ancestor_failed → The reason string emitted with a transition into blocked; the failed parent IDs are appended to it.

Q&A

Q: What happens when a node’s parent fails? A: Every descendant in pending or ready transitions to blocked with reason="ancestor_failed:<parent>". They never run.
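Failure propagation amounts to a breadth-first walk over the failed node's descendants. The sketch below uses a hypothetical block_descendants helper. It assumes a validated graph (every depends_on ID exists) and appends only the single failed node's ID to the reason. The card does not spell out the exact reason format when several parents fail.

```python
from collections import deque


def block_descendants(
    failed_id: str,
    deps: dict[str, list[str]],
    status: dict[str, str],
) -> dict[str, str]:
    """Set every pending/ready descendant of failed_id to 'blocked'.

    Mutates status in place and returns {node_id: reason} for blocked nodes.
    """
    children: dict[str, list[str]] = {n: [] for n in deps}
    for n, parents in deps.items():
        for p in parents:
            children[p].append(n)

    reasons: dict[str, str] = {}
    seen = {failed_id}
    queue = deque([failed_id])
    while queue:  # breadth-first walk over all descendants
        for child in children[queue.popleft()]:
            if child in seen:
                continue
            seen.add(child)
            if status[child] in ("pending", "ready"):
                status[child] = "blocked"
                reasons[child] = f"ancestor_failed:{failed_id}"
            queue.append(child)
    return reasons


deps = {
    "schema-init": [], "auth-table": ["schema-init"], "user-table": ["schema-init"],
    "auth-service": ["auth-table"], "user-service": ["user-table"],
    "api-gateway": ["auth-service", "user-service"],
}
status = {
    "schema-init": "done", "auth-table": "failed", "user-table": "running",
    "auth-service": "pending", "user-service": "pending", "api-gateway": "pending",
}
print(block_descendants("auth-table", deps, status))
# {'auth-service': 'ancestor_failed:auth-table', 'api-gateway': 'ancestor_failed:auth-table'}
```

api-gateway is blocked even though its other parent, user-service, is still viable. One failed ancestor is enough.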

Q: Can two parallel-safe nodes that touch overlapping files run together? A: No. pick_batch rejects them via conflicts() even if both are individually parallel-safe.

Q: How is ready emitted differently from other transitions? A: It is synthesised by _Scribe.mark_ready and de-duplicated via a _readied set, so a node never gets two pending → ready events.
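_Scribe's internals are not shown on this card. The de-duplication idea can be sketched with a hypothetical ReadyScribe class, whose events list stands in for wherever the real scribe records transitions.

```python
class ReadyScribe:
    """Hypothetical stand-in for _Scribe: records status-transition events."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, str]] = []  # (node_id, from, to)
        self._readied: set[str] = set()  # nodes already announced as ready

    def mark_ready(self, node_id: str) -> bool:
        """Emit pending -> ready once per node; return True if emitted."""
        if node_id in self._readied:
            return False  # already announced on an earlier tick
        self._readied.add(node_id)
        self.events.append((node_id, "pending", "ready"))
        return True


scribe = ReadyScribe()
for _ in range(3):  # e.g. three scheduling ticks that all see auth-table as ready
    scribe.mark_ready("auth-table")
print(scribe.events)  # [('auth-table', 'pending', 'ready')]
```

The set matters because ready_nodes() returns nodes in either pending or ready. The same node can therefore be seen on many ticks before pick_batch finally starts it.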

Examples

For a graph [schema-init] → [auth-table, user-table] → [auth-service, user-service] → [api-gateway], with max_par=3 and auth-table.touches=["migrations/0012_auth.sql"], the scheduler runs auth-table and user-table together, provided both are parallel_safe and user-table does not also touch that migration file. If auth-service and user-service both touch "src/api.ts", however, only one of them runs at a time.
