Synchronous Chain Execution Engine (DEC-002)
ELI5
A chain execution is like a relay race where each runner must finish their lap before the next runner starts. There’s a referee at the start line who can cancel the whole race if conditions are bad (before gate), and another referee at the finish line who can disqualify the team if they cheated (after gate). If a runner trips and falls, a medic checks them (on_error gate) before deciding whether the race continues.
Technical Deep Dive
DEC-002: Synchronous Chain Execution with Async I/O
The decision: Chains execute step-by-step synchronously; async I/O is used only for external calls. No async task queues, no fire-and-forget, no parallelism within a chain.
Rationale: Gates must be binding. A veto must halt execution immediately, not after downstream steps have already fired. Councils deliberate; they don’t batch-process. Async execution breaks auditability.
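A minimal sketch of what "synchronous steps, async I/O" can look like in Python, assuming an asyncio-based service. `call_sprite`, `run_steps`, and the action names are hypothetical stand-ins, not the engine's API. Each external call is awaited inside a plain loop, with no `asyncio.gather` or `asyncio.create_task`, so step N+1 cannot start until step N has returned.

```python
import asyncio


async def call_sprite(action: str, payload: dict) -> dict:
    """Hypothetical external call; the sleep stands in for network I/O."""
    await asyncio.sleep(0)
    return {"action": action, "echo": payload}


async def run_steps(actions: list[str], payload: dict) -> list[dict]:
    outputs = []
    for action in actions:
        # Awaiting inside the loop keeps the chain sequential: the next
        # iteration does not begin until this call has returned.
        outputs.append(await call_sprite(action, payload))
    return outputs


outputs = asyncio.run(run_steps(["draft", "review"], {"topic": "budget"}))
print([o["action"] for o in outputs])
```

The I/O is non-blocking for the rest of the process, but within the chain the order of effects stays linear.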
Execution Flow
    flowchart TD
        A["ChainExecutor.execute_chain()"] --> B["Parse & validate chain"]
        B --> C["Evaluate BEFORE gates"]
        C -->|Any veto| D["Return: status=vetoed"]
        C -->|All allow| E["Step 0: invoke sprite"]
        E -->|Success| F["Evaluate AFTER gate (step 0)"]
        E -->|Failure| G["Evaluate ON_ERROR gate"]
        F -->|Veto| D
        F -->|Allow| H["Step 1: invoke sprite"]
        G -->|Veto| D
        G -->|Allow| H
        H -->|Success| I["Step 2..."]
        H -->|Failure| J["Evaluate ON_ERROR gate"]
        I -->|All steps complete| K["Evaluate final AFTER gates"]
        K -->|Veto| D
        K -->|Allow| L["Return: status=completed"]
        J -->|Continue| I
        J -->|Veto| D
        D --> M["Persist execution history"]
        L --> M
        M --> N["Return ChainExecutionResult"]

Three-Phase Protocol
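As a rough sketch, the three phases (before gates, sequential steps with per-step gates, final gates) could look like this in plain Python. Everything here is an assumption made for illustration: the `execute_chain` signature, gates modelled as functions returning `True` to allow, and steps modelled as functions that raise on failure. The source does not say when a chain ends as `failed`, so the sketch only produces `completed` or `vetoed`.

```python
def execute_chain(steps, before=(), after=(), on_error=(), final=(), data=None):
    data = dict(data or {})
    record = {"status": None, "steps": []}

    def vetoed(gates):
        # Gates return True to allow; any False is a binding veto.
        return not all(gate(data) for gate in gates)

    def finish(status):
        record["status"] = status
        return record

    # Phase 1: a before-gate veto means no step ever runs.
    if vetoed(before):
        return finish("vetoed")

    # Phase 2: one step at a time, each followed by its gate check.
    for order, step in enumerate(steps):
        try:
            data.update(step(data))
        except Exception as exc:
            record["steps"].append(
                {"order": order, "status": "failed", "error": str(exc)}
            )
            if vetoed(on_error):
                return finish("vetoed")
            continue  # assumption: an allowed error moves on to the next step
        record["steps"].append({"order": order, "status": "completed"})
        if vetoed(after):
            return finish("vetoed")

    # Phase 3: final gates see the accumulated data.
    if vetoed(final):
        return finish("vetoed")
    return finish("completed")


ran = []


def draft(d):
    ran.append("draft")
    return {"risk": "high"}


def publish(d):
    ran.append("publish")
    return {}


result = execute_chain([draft, publish], after=[lambda d: d.get("risk") != "high"])
print(result["status"], ran)
```

The after-gate vetoes once `draft` has flagged high risk, so `publish` is never invoked while the completed `draft` step stays in the record.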
    sequenceDiagram
        autonumber
        participant C as Client
        participant R as Chain Router
        participant E as ChainExecutor
        participant G as GateEngine
        participant S as Sprite (placeholder)

        C->>R: POST /v1/chains/execute
        R->>E: execute_chain(council_id, chain, input)

        rect rgb(255, 240, 245)
            Note over E,G: Phase 1: Pre-execution gates
            E->>G: evaluate before-gates
            G-->>E: allow / veto
        end

        alt Before-gate veto
            E-->>R: status: vetoed
            R-->>C: 409 GATE_VETO
        else Before-gates allow
            rect rgb(240, 255, 240)
                Note over E,S: Phase 2: Step execution (sequential)
                loop For each step
                    E->>S: invoke(action, input_map)
                    S-->>E: output
                    E->>G: evaluate after-gate
                    G-->>E: allow / veto
                    alt After-gate veto
                        E-->>R: status: vetoed
                        R-->>C: 409 GATE_VETO
                    end
                end
            end

            rect rgb(240, 245, 255)
                Note over E,G: Phase 3: Post-execution gates
                E->>G: evaluate final after-gates
                G-->>E: allow / veto
            end

            alt Final gate veto
                E-->>R: status: vetoed
                R-->>C: 409 GATE_VETO
            else All gates allow
                E-->>R: status: completed
                R-->>C: 200 ChainExecutionResult
            end
        end

Execution Result Model
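One possible Python rendering of the result model in this section, as dataclasses. Field names and types follow the class diagram. The `StepStatus` and `GateDecision` values are not listed in the source, so they are kept as plain strings here, and the example values in the comments are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Optional
from uuid import UUID, uuid4


class ChainStatus(str, Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    VETOED = "vetoed"


@dataclass
class StepExecution:
    order: int
    sprite_id: UUID
    action: str
    status: str                 # StepStatus; values assumed, e.g. "completed"
    output: Any = None
    error: Optional[str] = None


@dataclass
class GateEvaluation:
    type: str                   # assumed values: "before", "after", "on_error"
    sprite_id: UUID
    decision: str               # GateDecision; assumed "allow" or "veto"
    reason: str = ""


@dataclass
class ChainExecutionResult:
    execution_id: UUID
    council_id: UUID
    chain_id: UUID
    status: ChainStatus
    started_at: datetime
    completed_at: datetime
    duration_ms: int
    steps: list[StepExecution] = field(default_factory=list)
    gates: list[GateEvaluation] = field(default_factory=list)


now = datetime.now()
example = ChainExecutionResult(
    execution_id=uuid4(), council_id=uuid4(), chain_id=uuid4(),
    status=ChainStatus.VETOED, started_at=now, completed_at=now, duration_ms=0,
)
example.gates.append(GateEvaluation("before", uuid4(), "veto", "budget exceeded"))
print(example.status.value, len(example.steps), len(example.gates))
```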
    classDiagram
        class ChainExecutionResult {
            +UUID execution_id
            +UUID council_id
            +UUID chain_id
            +ChainStatus status
            +datetime started_at
            +datetime completed_at
            +integer duration_ms
            +StepExecution[] steps
            +GateEvaluation[] gates
        }
        class StepExecution {
            +integer order
            +UUID sprite_id
            +string action
            +StepStatus status
            +object output
            +string error
        }
        class GateEvaluation {
            +string type
            +UUID sprite_id
            +GateDecision decision
            +string reason
        }
        class ChainStatus {
            <<enumeration>>
            completed
            failed
            vetoed
        }
        ChainExecutionResult --> StepExecution : contains
        ChainExecutionResult --> GateEvaluation : contains
        ChainExecutionResult --> ChainStatus : uses

CouncilExecutor Behaviour (SDK)
The Python SDK’s CouncilExecutor adds another layer on top of chain execution:
    class CouncilExecutor:
        def execute_all_chains(self):
            for chain in self.council.chains:
                result = self.execute_chain(chain.name)
                if result.status == "vetoed":
                    break  # Halt entire council on veto!
                elif result.status == "failed":
                    log.error(f"Chain {chain.name} failed")
                    continue  # Failed chains don't halt council

Key difference: A vetoed chain halts the entire council execution. A failed chain logs an error but the council continues with remaining chains.
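The halt-on-veto, continue-on-failure rule can be exercised without the SDK. This is a stand-alone sketch: `execute_all_chains` here is a free function taking a hypothetical `execute_chain` callable that returns a status string, and the chain names and canned statuses are invented for the demonstration.

```python
import logging

log = logging.getLogger("council")


def execute_all_chains(chain_names, execute_chain):
    """Run chains in order; stop at the first veto, keep going after a failure."""
    executed = []
    for name in chain_names:
        status = execute_chain(name)
        executed.append((name, status))
        if status == "vetoed":
            break  # nothing after a vetoed chain is attempted
        if status == "failed":
            log.error("Chain %s failed", name)
    return executed


# Canned statuses stand in for real chain runs.
statuses = {
    "intake": "completed",
    "enrich": "failed",
    "approve": "vetoed",
    "publish": "completed",
}
executed = execute_all_chains(list(statuses), statuses.__getitem__)
print(executed)
```

The failed `enrich` chain does not stop `approve` from running, but the veto on `approve` means `publish` is never attempted.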
State Machine
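The transitions in this section's state diagram can be encoded as a lookup table, which makes illegal moves easy to reject. State names are taken from the diagram. The event names (`structure_valid`, `allow`, `veto`, and so on) and the `advance` and `run` helpers are hypothetical labels chosen for this sketch.

```python
TRANSITIONS = {
    ("Validating", "structure_valid"): "BeforeGates",
    ("BeforeGates", "allow"): "Executing",
    ("BeforeGates", "veto"): "Vetoed",
    ("Executing", "step_succeeds"): "AfterGate",
    ("Executing", "step_fails"): "OnErrorGate",
    ("AfterGate", "allow"): "NextStep",
    ("AfterGate", "veto"): "Vetoed",
    ("OnErrorGate", "allow"): "NextStep",
    ("OnErrorGate", "veto"): "Vetoed",
    ("NextStep", "more_steps"): "Executing",
    ("NextStep", "all_steps_done"): "FinalGates",
    ("FinalGates", "allow"): "Completed",
    ("FinalGates", "veto"): "Vetoed",
}
TERMINAL = {"Completed", "Vetoed"}


def advance(state, event):
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, state="Validating"):
    for event in events:
        state = advance(state, event)
    return state


happy_path = ["structure_valid", "allow", "step_succeeds", "allow",
              "all_steps_done", "allow"]
print(run(happy_path))
print(run(["structure_valid", "allow", "step_fails", "veto"]))
```

Because terminal states have no outgoing entries, any attempt to advance past `Completed` or `Vetoed` raises instead of silently continuing.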
    stateDiagram-v2
        [*] --> Validating: Parse chain
        Validating --> BeforeGates: Structure valid
        BeforeGates --> Executing: All gates allow
        BeforeGates --> Vetoed: Gate vetoes
        Executing --> AfterGate: Step succeeds
        Executing --> OnErrorGate: Step fails
        AfterGate --> NextStep: Gate allows
        AfterGate --> Vetoed: Gate vetoes
        OnErrorGate --> NextStep: Gate allows (continue)
        OnErrorGate --> Vetoed: Gate vetoes
        NextStep --> Executing: More steps
        NextStep --> FinalGates: All steps done
        FinalGates --> Completed: Gates allow
        FinalGates --> Vetoed: Gate vetoes
        Completed --> [*]
        Vetoed --> [*]

Key Terms
- Synchronous execution → Steps run one at a time, in order, with no parallelism. Each step must complete before the next begins.
- Binding gate → A gate whose veto immediately halts all further execution (no rollback of completed steps, but no new steps start)
- Step execution → A single invocation of a sprite’s capability within a chain
- ChainExecutionResult → The complete audit record of a chain run: status, timing, per-step outputs, and gate decisions
- Veto → A gate decision that halts the chain immediately. Returns HTTP 409 with GATE_VETO code.
- Placeholder invocation → Current sprite invocation is simulated ({"status": "simulated"}). Real implementation would use RPC or REST calls to sprite endpoints.
Q&A
Q: Why not make chains async for better performance?
A: DEC-002 explicitly rejected async execution because:
- Gates must be truly binding — a veto must stop execution before downstream steps fire
- Async task queues introduce race conditions where a step might complete after a gate veto
- Auditability requires a linear, deterministic execution log (not a DAG)
- “Councils deliberate, they don’t batch-process”
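The race described in the list above can be reproduced in a few lines of asyncio. This is an illustration only: `step`, `slow_gate`, and the timings are invented, and the gate is hard-coded to veto. In the fire-and-forget variant the step is scheduled before the gate decides, so it finishes even though the gate vetoes. In the sequential variant the step is never started.

```python
import asyncio


async def step(name, ran):
    await asyncio.sleep(0)       # stands in for a fast external call
    ran.append(name)


async def slow_gate():
    await asyncio.sleep(0.01)    # the gate takes longer than the step
    return "veto"


async def fire_and_forget():
    ran = []
    # The downstream step is scheduled before the gate has decided.
    task = asyncio.create_task(step("step-1", ran))
    decision = await slow_gate()
    await task
    return decision, ran


async def sequential():
    ran = []
    decision = await slow_gate()
    if decision == "allow":      # the step starts only after an allow
        await step("step-1", ran)
    return decision, ran


raced = asyncio.run(fire_and_forget())
ordered = asyncio.run(sequential())
print(raced)
print(ordered)
```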
Q: What happens to completed steps when a later gate vetoes?
A: They remain in the result with status="completed". The chain returns status="vetoed" and includes all gate evaluations. There is no automatic rollback — side effects from completed steps may need manual cleanup.
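Since there is no rollback, a caller may want to list which steps already ran when a veto arrives. A small sketch, assuming the result has been deserialized into a plain dict with the `status`, `steps`, `order`, and `gates` fields from the result model. The helper name `steps_needing_cleanup` and the sample payload (including the action names) are hypothetical.

```python
def steps_needing_cleanup(result: dict) -> list[int]:
    """Return the order of every completed step in a vetoed run."""
    if result["status"] != "vetoed":
        return []
    return [s["order"] for s in result["steps"] if s["status"] == "completed"]


vetoed_run = {
    "status": "vetoed",
    "steps": [
        {"order": 0, "action": "reserve_funds", "status": "completed"},
        {"order": 1, "action": "notify", "status": "completed"},
    ],
    "gates": [{"type": "after", "decision": "veto", "reason": "limit exceeded"}],
}
print(steps_needing_cleanup(vetoed_run))
```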
Q: Can steps run in parallel within a chain?
A: No. DEC-002 mandates sequential execution. If you need parallel execution, model it as separate chains within the same council or use external orchestration.
Q: How is execution history persisted?
A: The ExecutionHistoryRegistry stores every ChainExecutionResult keyed by execution_id and indexed by chain_id. The GET /v1/chains/{id}/history endpoint provides paginated access with optional status filtering.
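A minimal in-memory sketch of the storage shape described above: a primary map keyed by `execution_id` plus a secondary index from `chain_id` to execution ids. The class name `ExecutionHistory`, its method names, and the `offset`/`limit` pagination parameters are assumptions, not the registry's actual interface.

```python
from collections import defaultdict


class ExecutionHistory:
    """In-memory stand-in: results keyed by execution_id, indexed by chain_id."""

    def __init__(self):
        self._by_execution = {}
        self._by_chain = defaultdict(list)

    def record(self, result: dict) -> None:
        self._by_execution[result["execution_id"]] = result
        self._by_chain[result["chain_id"]].append(result["execution_id"])

    def get(self, execution_id):
        return self._by_execution[execution_id]

    def history(self, chain_id, status=None, offset=0, limit=20):
        # .get avoids creating an empty index entry for unknown chains.
        ids = self._by_chain.get(chain_id, [])
        rows = [self._by_execution[i] for i in ids]
        if status is not None:
            rows = [r for r in rows if r["status"] == status]
        return rows[offset:offset + limit]


registry = ExecutionHistory()
registry.record({"execution_id": "e1", "chain_id": "c1", "status": "completed"})
registry.record({"execution_id": "e2", "chain_id": "c1", "status": "vetoed"})
registry.record({"execution_id": "e3", "chain_id": "c2", "status": "completed"})
print([r["execution_id"] for r in registry.history("c1", status="vetoed")])
```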
Q: What is the maximum chain execution time?
A: MAX_CHAIN_EXECUTION_TIME_SECONDS=300 (5 minutes) in iris-service config. The _parse_timeout() method converts ISO 8601 duration strings (e.g., PT5M) to seconds. Full timeout enforcement is planned but not yet implemented.
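For illustration, the duration conversion could be done with a small regular expression. This is not the service's `_parse_timeout()`; it is a hypothetical `parse_timeout` that handles only the time part of an ISO 8601 duration (whole hours, minutes, and seconds after `PT`) and rejects everything else.

```python
import re

# Accepts forms such as PT5M, PT90S, PT1H30M; date components are not handled.
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def parse_timeout(value: str) -> int:
    """Convert a PT-style ISO 8601 duration string to whole seconds."""
    match = _DURATION.fullmatch(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(parse_timeout("PT5M"))
print(parse_timeout("PT1H30M"))
```

`PT5M` converts to 300 seconds, which matches the configured maximum.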
Examples
Synchronous chain execution is like an airport security checkpoint:
- Before gate = TSA checks your ID and boarding pass before you enter the queue
- Step 0 = Remove shoes, belt, electronics → place in bins
- After gate (step 0) = X-ray operator checks the scan. If something looks suspicious → veto (full pat-down, no one else advances)
- Step 1 = Walk through metal detector
- After gate (step 1) = If detector beeps → veto (wanded inspection)
- Step 2 = Collect belongings from conveyor
- Final gate = Gate agent verifies your face matches your ID before you board
If any checkpoint says “stop,” the entire process halts immediately. You don’t proceed to the metal detector while TSA is still examining your bag.