CRUMB a card from devarno-cloud

Synchronous Chain Execution Engine (DEC-002)

iris advanced 8 min read

ELI5

A chain execution is like a relay race where each runner must finish their lap before the next runner starts. There’s a referee at the start line who can cancel the whole race if conditions are bad (before gate), and another referee at the finish line who can disqualify the team if they cheated (after gate). If a runner trips and falls, a medic checks them (on_error gate) before deciding whether the race continues.

Technical Deep Dive

DEC-002: Synchronous Chain Execution with Async I/O

The decision: Chains execute step-by-step synchronously; async I/O is used only for external calls. No async task queues, no fire-and-forget, no parallelism within a chain.

Rationale: Gates must be binding. A veto must halt execution immediately, not after downstream steps have already fired. Councils deliberate; they don’t batch-process. Asynchronous step execution would also break auditability, which depends on a linear, deterministic execution log.

Execution Flow

flowchart TD
A["ChainExecutor.execute_chain()"] --> B["Parse & validate chain"]
B --> C["Evaluate BEFORE gates"]
C -->|Any veto| D["Return: status=vetoed"]
C -->|All allow| E["Step 0: invoke sprite"]
E -->|Success| F["Evaluate AFTER gate (step 0)"]
E -->|Failure| G["Evaluate ON_ERROR gate"]
F -->|Veto| D
F -->|Allow| H["Step 1: invoke sprite"]
G -->|Veto| D
G -->|Allow| H
H -->|Success| I["Step 2..."]
H -->|Failure| J["Evaluate ON_ERROR gate"]
I -->|All steps complete| K["Evaluate final AFTER gates"]
K -->|Veto| D
K -->|Allow| L["Return: status=completed"]
J -->|Allow| I
J -->|Veto| D
D --> M["Persist execution history"]
L --> M
M --> N["Return ChainExecutionResult"]
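
The flow above can be sketched as a single coroutine that awaits each step before moving on. This is a minimal illustration, not the actual ChainExecutor: the `RunResult` type, the `gate(gate_type, payload)` callback, and the choice to pass the previous output forward after a tolerated failure are all assumptions, and the sketch does not model how a `failed` chain status is derived.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, List

# Hypothetical shapes: a gate returns "allow" or "veto"; a step is an async callable.
Gate = Callable[[str, Any], str]
Step = Callable[[Any], Awaitable[Any]]


@dataclass
class RunResult:
    status: str = "completed"
    steps: List[tuple] = field(default_factory=list)  # (order, status, output or error)
    gates: List[tuple] = field(default_factory=list)  # (gate_type, decision)


async def execute_chain(steps: List[Step], gate: Gate, chain_input: Any) -> RunResult:
    result = RunResult()

    def allowed(gate_type: str, payload: Any) -> bool:
        # Record every gate decision; anything other than "allow" is a binding veto.
        decision = gate(gate_type, payload)
        result.gates.append((gate_type, decision))
        if decision != "allow":
            result.status = "vetoed"
            return False
        return True

    if not allowed("before", chain_input):
        return result

    data = chain_input
    for order, step in enumerate(steps):
        try:
            data = await step(data)  # async I/O, but fully awaited before the next step
        except Exception as exc:
            result.steps.append((order, "failed", str(exc)))
            if not allowed("on_error", exc):
                return result
            continue  # assumption: the next step receives the last good output
        result.steps.append((order, "completed", data))
        if not allowed("after", data):
            return result

    allowed("final_after", data)
    return result
```

Because every `await` finishes before the loop advances, a veto returned by any gate stops the chain before a later step can start, which is the property DEC-002 is after.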

Three-Phase Protocol

sequenceDiagram
autonumber
participant C as Client
participant R as Chain Router
participant E as ChainExecutor
participant G as GateEngine
participant S as Sprite (placeholder)
C->>R: POST /v1/chains/execute
R->>E: execute_chain(council_id, chain, input)
rect rgb(255, 240, 245)
Note over E,G: Phase 1: Pre-execution gates
E->>G: evaluate before-gates
G-->>E: allow / veto
end
alt Before-gate veto
E-->>R: status: vetoed
R-->>C: 409 GATE_VETO
else Before-gates allow
rect rgb(240, 255, 240)
Note over E,S: Phase 2: Step execution (sequential)
loop For each step
E->>S: invoke(action, input_map)
S-->>E: output
E->>G: evaluate after-gate
G-->>E: allow / veto
alt After-gate veto
E-->>R: status: vetoed
R-->>C: 409 GATE_VETO
end
end
end
rect rgb(240, 245, 255)
Note over E,G: Phase 3: Post-execution gates
E->>G: evaluate final after-gates
G-->>E: allow / veto
end
alt Final gate veto
E-->>R: status: vetoed
R-->>C: 409 GATE_VETO
else All gates allow
E-->>R: status: completed
R-->>C: 200 ChainExecutionResult
end
end
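
The two responses shown in the sequence diagram can be expressed as a small mapping on the router side. This is a sketch under assumptions: the function name `to_http` is hypothetical, and because the diagram does not show a response for a `failed` chain, the sketch refuses to guess one.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def to_http(chain_status: str) -> Tuple[int, Optional[str]]:
    """Map a chain status to (HTTP status code, error code)."""
    if chain_status == "completed":
        return HTTPStatus.OK.value, None
    if chain_status == "vetoed":
        return HTTPStatus.CONFLICT.value, "GATE_VETO"
    # The diagram does not document a response for "failed"; left undefined here.
    raise ValueError(f"no documented HTTP mapping for status {chain_status!r}")
```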

Execution Result Model

classDiagram
class ChainExecutionResult {
+UUID execution_id
+UUID council_id
+UUID chain_id
+ChainStatus status
+datetime started_at
+datetime completed_at
+integer duration_ms
+StepExecution[] steps
+GateEvaluation[] gates
}
class StepExecution {
+integer order
+UUID sprite_id
+string action
+StepStatus status
+object output
+string error
}
class GateEvaluation {
+string type
+UUID sprite_id
+GateDecision decision
+string reason
}
class ChainStatus {
<<enumeration>>
completed
failed
vetoed
}
ChainExecutionResult --> StepExecution : contains
ChainExecutionResult --> GateEvaluation : contains
ChainExecutionResult --> ChainStatus : uses
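
As a rough Python rendering of the diagram above, the model could be written with dataclasses. Field names follow the diagram; everything else is an assumption. In particular, the members of `StepStatus` and `GateDecision` are not listed in the diagram, so they are kept as plain strings here, and `duration_ms` is derived from the two timestamps rather than stored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Any, List, Optional
from uuid import UUID, uuid4


class ChainStatus(str, Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    VETOED = "vetoed"


@dataclass
class StepExecution:
    order: int
    sprite_id: UUID
    action: str
    status: str                  # StepStatus members are not given in the diagram
    output: Any = None
    error: Optional[str] = None


@dataclass
class GateEvaluation:
    type: str
    sprite_id: UUID
    decision: str                # GateDecision members are not given in the diagram
    reason: str = ""


@dataclass
class ChainExecutionResult:
    execution_id: UUID
    council_id: UUID
    chain_id: UUID
    status: ChainStatus
    started_at: datetime
    completed_at: datetime
    steps: List[StepExecution] = field(default_factory=list)
    gates: List[GateEvaluation] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        # Assumption: duration is computed from the timestamps, in whole milliseconds.
        return (self.completed_at - self.started_at) // timedelta(milliseconds=1)
```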

CouncilExecutor Behaviour (SDK)

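The Python SDK’s CouncilExecutor adds a layer on top of chain execution: it runs every chain in a council, one after another.

import logging

log = logging.getLogger(__name__)


class CouncilExecutor:
    def execute_all_chains(self):
        # Run chains one at a time, in the order they appear in council.chains.
        for chain in self.council.chains:
            result = self.execute_chain(chain.name)
            if result.status == "vetoed":
                break  # Halt entire council on veto!
            elif result.status == "failed":
                log.error("Chain %s failed", chain.name)
                continue  # Failed chains don't halt council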

Key difference: A vetoed chain halts the entire council execution. A failed chain logs an error but the council continues with remaining chains.

State Machine

stateDiagram-v2
[*] --> Validating: Parse chain
Validating --> BeforeGates: Structure valid
BeforeGates --> Executing: All gates allow
BeforeGates --> Vetoed: Gate vetoes
Executing --> AfterGate: Step succeeds
Executing --> OnErrorGate: Step fails
AfterGate --> NextStep: Gate allows
AfterGate --> Vetoed: Gate vetoes
OnErrorGate --> NextStep: Gate allows (continue)
OnErrorGate --> Vetoed: Gate vetoes
NextStep --> Executing: More steps
NextStep --> FinalGates: All steps done
FinalGates --> Completed: Gates allow
FinalGates --> Vetoed: Gate vetoes
Completed --> [*]
Vetoed --> [*]
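
The state diagram above can be written down as a transition table, which makes it easy to check a sequence of events against it. The event names (`structure_valid`, `step_succeeds`, and so on) are hypothetical labels derived from the arrow captions; the states are taken from the diagram as-is.

```python
from typing import Iterable

TRANSITIONS = {
    ("Validating", "structure_valid"): "BeforeGates",
    ("BeforeGates", "allow"): "Executing",
    ("BeforeGates", "veto"): "Vetoed",
    ("Executing", "step_succeeds"): "AfterGate",
    ("Executing", "step_fails"): "OnErrorGate",
    ("AfterGate", "allow"): "NextStep",
    ("AfterGate", "veto"): "Vetoed",
    ("OnErrorGate", "allow"): "NextStep",
    ("OnErrorGate", "veto"): "Vetoed",
    ("NextStep", "more_steps"): "Executing",
    ("NextStep", "all_steps_done"): "FinalGates",
    ("FinalGates", "allow"): "Completed",
    ("FinalGates", "veto"): "Vetoed",
}
TERMINAL = {"Completed", "Vetoed"}


def run(events: Iterable[str], state: str = "Validating") -> str:
    """Replay events against the table and return the resulting state."""
    for event in events:
        if state in TERMINAL:
            raise ValueError(f"{state} is terminal; cannot apply {event!r}")
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"no transition from {state} on {event!r}") from None
    return state
```

The diagram defines no transition for an invalid chain structure and has no `Failed` state, so the sketch raises on any event the table does not cover rather than inventing one.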

Key Terms

  • Synchronous execution → Steps run one at a time, in order, with no parallelism. Each step must complete before the next begins.
  • Binding gate → A gate whose veto immediately halts all further execution (no rollback of completed steps, but no new steps start)
  • Step execution → A single invocation of a sprite’s capability within a chain
  • ChainExecutionResult → The complete audit record of a chain run: status, timing, per-step outputs, and gate decisions
  • Veto → A gate decision that halts the chain immediately. The API returns HTTP 409 with the GATE_VETO code.
  • Placeholder invocation → Sprite invocation is currently simulated and returns {"status": "simulated"}. A real implementation would use RPC or REST calls to sprite endpoints.

Q&A

Q: Why not make chains async for better performance? A: DEC-002 explicitly rejected async execution because:

  1. Gates must be truly binding — a veto must stop execution before downstream steps fire
  2. Async task queues introduce race conditions where a step might complete after a gate veto
  3. Auditability requires a linear, deterministic execution log (not a DAG)
  4. “Councils deliberate, they don’t batch-process”

Q: What happens to completed steps when a later gate vetoes? A: They remain in the result with status="completed". The chain returns status="vetoed" and includes all gate evaluations. There is no automatic rollback — side effects from completed steps may need manual cleanup.

Q: Can steps run in parallel within a chain? A: No. DEC-002 mandates sequential execution. If you need parallel execution, model it as separate chains within the same council or use external orchestration.

Q: How is execution history persisted? A: The ExecutionHistoryRegistry stores every ChainExecutionResult keyed by execution_id and indexed by chain_id. The GET /v1/chains/{id}/history endpoint provides paginated access with optional status filtering.
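
To make the "keyed by execution_id, indexed by chain_id" layout concrete, here is an in-memory sketch. It is not the real ExecutionHistoryRegistry: the class name `InMemoryHistory`, the `Record` stand-in, the method names, the `offset`/`limit` pagination parameters, and the oldest-first ordering are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """Stand-in for ChainExecutionResult with only the fields used here."""
    execution_id: str
    chain_id: str
    status: str


class InMemoryHistory:
    def __init__(self) -> None:
        self._by_execution: Dict[str, Record] = {}              # primary key
        self._by_chain: Dict[str, List[str]] = defaultdict(list)  # secondary index

    def save(self, record: Record) -> None:
        # Index each execution only once, even if it is saved again.
        if record.execution_id not in self._by_execution:
            self._by_chain[record.chain_id].append(record.execution_id)
        self._by_execution[record.execution_id] = record

    def get(self, execution_id: str) -> Optional[Record]:
        return self._by_execution.get(execution_id)

    def history(self, chain_id: str, status: Optional[str] = None,
                offset: int = 0, limit: int = 20) -> List[Record]:
        # Filter first, then paginate, so pages stay consistent for a given filter.
        ids = self._by_chain.get(chain_id, [])
        records = [self._by_execution[i] for i in ids]
        if status is not None:
            records = [r for r in records if r.status == status]
        return records[offset:offset + limit]
```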

Q: What is the maximum chain execution time? A: MAX_CHAIN_EXECUTION_TIME_SECONDS=300 (5 minutes) in iris-service config. The _parse_timeout() method converts ISO 8601 duration strings (e.g., PT5M) to seconds. Full timeout enforcement is planned but not yet implemented.
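
A conversion like the one described for _parse_timeout() might look roughly like this. The sketch is an assumption about its behaviour, not the actual method: it accepts only the day/hour/minute/second subset of ISO 8601 durations with integer values, and the clamping helper `effective_timeout` is hypothetical.

```python
import re

MAX_CHAIN_EXECUTION_TIME_SECONDS = 300

# Supports PnDTnHnMnS with integer components; years, months, and weeks are not handled.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration_seconds(value: str) -> int:
    """Convert an ISO 8601 duration such as 'PT5M' to whole seconds."""
    match = _DURATION.match(value)
    if match is None or all(part is None for part in match.groups()):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {name: int(text or 0) for name, text in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def effective_timeout(value: str) -> int:
    # Hypothetical policy: never exceed the configured maximum.
    return min(parse_duration_seconds(value), MAX_CHAIN_EXECUTION_TIME_SECONDS)
```

With this sketch, `parse_duration_seconds("PT5M")` returns 300, matching the configured maximum.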

Examples

Synchronous chain execution is like an airport security checkpoint:

  1. Before gate = TSA checks your ID and boarding pass before you enter the queue
  2. Step 0 = Remove shoes, belt, electronics → place in bins
  3. After gate (step 0) = X-ray operator checks the scan. If something looks suspicious → veto (full pat-down, no one else advances)
  4. Step 1 = Walk through metal detector
  5. After gate (step 1) = If detector beeps → veto (wanded inspection)
  6. Step 2 = Collect belongings from conveyor
  7. Final gate = Gate agent verifies your face matches your ID before you board

If any checkpoint says “stop,” the entire process halts immediately. You don’t proceed to the metal detector while TSA is still examining your bag.
