End-to-End Chain Execution Request Flow
iris advanced 8 min read
ELI5
This is the full journey of a chain execution request from the moment you press “go” to the moment you get a result. It’s like tracking a pizza order from your phone app through the restaurant’s kitchen, past the quality check, into the delivery driver’s car, and finally to your door — with a receipt showing exactly what happened at each step.
Technical Deep Dive
Full Request Lifecycle
sequenceDiagram
    autonumber
    participant Client as Client (SDK/Browser/MCP)
    participant TS as Transport Layer<br/>(HTTP/gRPC/MCP)
    participant FW as FastAPI App
    participant MW as Middleware<br/>(CORS, OTel, Request ID)
    participant RT as chains Router
    participant VAL as Validation<br/>(Pydantic)
    participant CRE as CouncilRegistry
    participant EXE as ChainExecutor
    participant GATE as GateEngine
    participant MET as Metrics Recorder
    participant HIS as ExecutionHistoryRegistry
    participant OT as OpenTelemetry
    Client->>TS: POST /v1/chains/execute
    TS->>FW: HTTP Request
    FW->>MW: Process middleware
    MW->>MW: Inject request_id, tracer, meter
    MW->>RT: Route to execute_chain handler
    RT->>VAL: Validate ChainExecuteRequest
    VAL-->>RT: Validated model
    RT->>CRE: get_by_id(council_id)
    alt Council not found
        CRE-->>RT: None
        RT-->>FW: 404 ApiError
        FW-->>TS: HTTP 404
        TS-->>Client: Error response
    else Council found
        CRE-->>RT: Council
        RT->>RT: Find chain in council.chains
        alt Chain not found
            RT-->>FW: 404 ApiError
            FW-->>TS: HTTP 404
            TS-->>Client: Error response
        else Chain found
            RT->>EXE: execute_chain(council_id, chain, input)
            rect rgb(255, 240, 245)
                Note over EXE,GATE: Phase 1: Before Gates (0-5ms)
                EXE->>GATE: evaluate_gate(before, ...)
                GATE-->>EXE: allow / veto
                alt Veto
                    EXE-->>RT: status=vetoed
                end
            end
            rect rgb(240, 255, 240)
                Note over EXE,GATE: Phase 2: Step Execution (varies)
                loop For each step
                    EXE->>EXE: _invoke_sprite(step)
                    Note over EXE: Placeholder: returns<br/>{"status": "simulated"}
                    EXE->>GATE: evaluate_gate(after, ...)
                    GATE-->>EXE: allow / veto
                    alt Veto
                        EXE-->>RT: status=vetoed
                    end
                end
            end
            rect rgb(240, 245, 255)
                Note over EXE,GATE: Phase 3: Final Gates (0-5ms)
                EXE->>GATE: evaluate_gate(after, ...)
                GATE-->>EXE: allow / veto
                alt Veto
                    EXE-->>RT: status=vetoed
                else Allow
                    EXE-->>RT: status=completed
                end
            end
            RT->>HIS: create(execution_history)
            RT->>MET: record_chain_execution(status, duration, steps)
            RT->>OT: Record spans + metrics
            alt status == vetoed
                RT-->>FW: 409 GATE_VETO
                FW-->>TS: HTTP 409
                TS-->>Client: Error with gate details
            else status == completed
                RT-->>FW: 200 ChainExecutionResult
                FW-->>TS: HTTP 200
                TS-->>Client: Full result
            end
        end
    end
Performance Characteristics
| Phase | Typical Duration | Bottleneck |
|---|---|---|
| HTTP transport | 1–10 ms | Network latency |
| Middleware | 0–1 ms | CORS header processing |
| Pydantic validation | 1–5 ms | Model complexity |
| Registry lookup | 0–1 ms | In-memory dict access |
| Gate evaluation | 0–5 ms | Condition complexity |
| Step execution | Highly variable | Sprite invocation (placeholder: ~0 ms; real RPC: 100 ms–5 s) |
| History persistence | 0–1 ms | In-memory dict insert |
| Metrics recording | 0–1 ms | Counter increment |
| OTel span export | Async (batched) | Network to collector |
- Total typical latency (placeholder steps): 10–50 ms
- Total typical latency (real sprite RPC): 500 ms–30 s, depending on chain length
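To make the phase breakdown concrete, here is a minimal sketch of the three-phase loop from the lifecycle diagram (before gate, per-step execution with an after gate, final gate) with a total duration measurement. It is not the actual `ChainExecutor`: the `Gate` callable, the `ExecutionResult` dataclass, the `"final"` phase label, and the free function `invoke_sprite` are all hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical gate signature: (phase, payload) -> True to allow, False to veto.
Gate = Callable[[str, dict], bool]


@dataclass
class ExecutionResult:
    status: str = "completed"   # "completed" or "vetoed"
    steps_run: int = 0
    duration_ms: float = 0.0


def invoke_sprite(step: str) -> dict:
    """Placeholder invocation, mirroring the simulated step in the diagram."""
    return {"status": "simulated", "step": step}


def execute_chain(steps: list[str], gate: Gate) -> ExecutionResult:
    result = ExecutionResult()
    start = time.perf_counter()

    # Phase 1: a veto here stops the chain before any step runs.
    if not gate("before", {"steps": steps}):
        result.status = "vetoed"
    else:
        # Phase 2: run each step, then ask the after gate about its output.
        for step in steps:
            output = invoke_sprite(step)
            result.steps_run += 1
            if not gate("after", output):
                result.status = "vetoed"
                break
        else:
            # Phase 3: the final gate is only reached when no step was vetoed.
            if not gate("final", {"steps_run": result.steps_run}):
                result.status = "vetoed"

    result.duration_ms = (time.perf_counter() - start) * 1000
    return result
```

With the placeholder invocation the measured duration is dominated by interpreter overhead, which is consistent with the near-zero step cost in the table; swapping `invoke_sprite` for a real RPC would make Phase 2 the dominant term.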
C4 Container View
---
title: "Container Diagram — Chain Execution Flow"
---
flowchart TD
    subgraph client ["**Client**"]
        py["<b>Python Script</b><br/><i>Python</i><br/>Uses iris-sdk"]:::container
        ts["<b>Web App</b><br/><i>TypeScript</i><br/>Uses @iris-hq/sdk"]:::container
        mcp["<b>Claude Desktop</b><br/><i>MCP</i><br/>Uses iris-mcp-server"]:::container
    end
    subgraph core ["**IRIS Core**"]
        fastapi["<b>FastAPI</b><br/><i>Python</i><br/>HTTP routing + validation"]:::container
        executor["<b>ChainExecutor</b><br/><i>Python</i><br/>Synchronous execution"]:::container
        gate["<b>GateEngine</b><br/><i>Python</i><br/>Condition evaluation"]:::container
        registry["<b>In-Memory Registries</b><br/><i>Python</i><br/>Sprite/Council/History stores"]:::container
        metrics["<b>Metrics Recorder</b><br/><i>Python</i><br/>Prometheus counters"]:::container
    end
    subgraph obs ["**Observability**"]
        otel["<b>OTel Collector</b><br/><i>Go</i><br/>Telemetry routing"]:::container
        jaeger["<b>Jaeger</b><br/><i>Go</i><br/>Trace storage"]:::container
        prom["<b>Prometheus</b><br/><i>Go</i><br/>Metrics storage"]:::container
    end
    py -- "POST /v1/chains/execute" --> fastapi
    ts -- "POST /v1/chains/execute" --> fastapi
    mcp -- "POST /v1/chains/execute" --> fastapi
    fastapi -- "Delegates execution" --> executor
    executor -- "Evaluates gates" --> gate
    executor -- "Reads councils/chains" --> registry
    executor -- "Writes execution history" --> registry
    executor -- "Records metrics" --> metrics
    fastapi -- "Exports spans/metrics" --> otel
    otel -- "Forwards traces" --> jaeger
    otel -- "Forwards metrics" --> prom
    classDef person fill:#1c1c24,stroke:#e85d3e,color:#f0ece6
    classDef system fill:#1c1c24,stroke:#d4a574,color:#f0ece6
    classDef ext fill:#141419,stroke:#8b7e74,color:#f0ece6,stroke-dasharray: 4 3
    classDef db fill:#1c1c24,stroke:#d4a574,color:#f0ece6
    classDef container fill:#1c1c24,stroke:#d4a574,color:#f0ece6
Error Paths
flowchart TD
    A["Client Request"] --> B{"Validation?"}
    B -->|Fails| C["400 Bad Request"]
    B -->|Passes| D{"Council exists?"}
    D -->|No| E["404 Council Not Found"]
    D -->|Yes| F{"Chain exists?"}
    F -->|No| G["404 Chain Not Found"]
    F -->|Yes| H{"Before gate?"}
    H -->|Veto| I["409 GATE_VETO"]
    H -->|Allow| J{"Step execution"}
    J -->|Step fails| K{"On-error gate?"}
    K -->|Veto| I
    K -->|Allow| L["200 status=failed"]
    J -->|Step succeeds| M{"After gate?"}
    M -->|Veto| I
    M -->|Allow| N{"More steps?"}
    N -->|Yes| J
    N -->|No| O{"Final gate?"}
    O -->|Veto| I
    O -->|Allow| P["200 status=completed"]
Observability Integration
Every phase of execution is observable:
| Layer | Trace Span | Metrics | Logs |
|---|---|---|---|
| HTTP | http.request | http_requests_total | Access log with request_id |
| Router | chain.execute | chain_executions_total | Operation start/end |
| Gate | gate.evaluate | gate_decisions_total | Decision + reason |
| Step | step.execute | chain_steps_executed_total | Sprite ID + action + status |
| History | history.create | — | Execution ID + status |
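As an illustration of how the span names in this table could relate to one another, the sketch below uses a toy, standard-library-only recorder in place of a real OpenTelemetry tracer. The `SpanRecorder` class is hypothetical, and the parent/child nesting shown (HTTP wrapping the router, which wraps gate, step, and history spans) is an assumption; the table itself only lists the span names.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Toy stand-in for a tracer: records name, parent, and duration per span."""

    def __init__(self):
        self.finished = []   # spans in the order they ended
        self._stack = []     # names of currently open spans

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.finished.append({
                "name": name,
                "parent": parent,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })


recorder = SpanRecorder()
with recorder.span("http.request"):
    with recorder.span("chain.execute"):
        with recorder.span("gate.evaluate"):
            pass  # gate decision would happen here
        with recorder.span("step.execute"):
            pass  # sprite invocation would happen here
        with recorder.span("history.create"):
            pass  # history record would be written here

for s in recorder.finished:
    print(f"{s['name']} (parent={s['parent']})")
```

Inner spans finish before their parents, so the recorder lists the gate, step, and history spans first and `http.request` last.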
Key Terms
- Request lifecycle → The complete path from client request through all system layers to response
- Placeholder invocation → Current step execution returns simulated data; real implementation would use RPC
- Bottleneck → The slowest phase of execution, typically real sprite invocation in production
- Trace span → A single timed operation within a distributed trace
- Error path → The alternative execution flow when validation fails, resources are missing, or gates veto
- Async metrics export → Metrics are recorded synchronously but exported to backends asynchronously via batch processors
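The "error path" term covers several distinct branches. A minimal sketch of how the outcomes in the Error Paths flowchart map to HTTP responses might look like the following; the function name `resolve_response` and its boolean parameters are hypothetical, and only the status codes and labels are taken from the flowchart.

```python
def resolve_response(valid: bool, council_found: bool, chain_found: bool,
                     execution_status: str) -> tuple[int, str]:
    """Map a request outcome to (HTTP status, label), checked in flowchart order."""
    if not valid:
        return 400, "Bad Request"
    if not council_found:
        return 404, "Council Not Found"
    if not chain_found:
        return 404, "Chain Not Found"
    if execution_status == "vetoed":
        # Any gate veto (before, after, on-error, or final) ends up here.
        return 409, "GATE_VETO"
    if execution_status in ("failed", "completed"):
        # A failed step that the on-error gate allows is still reported with 200.
        return 200, f"status={execution_status}"
    raise ValueError(f"unknown execution status: {execution_status!r}")


print(resolve_response(True, True, True, "vetoed"))
```

The ordering of the checks matters: a request with an invalid body never reaches the registry lookup, and a missing council is reported before a missing chain.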
Q&A
Q: Where is most of the execution time spent?
A: In a production system with real sprite invocation, the vast majority of time is spent in _invoke_sprite() — making RPC calls to sprite endpoints or executing AI model inference. With the current placeholder, execution is nearly instantaneous.
Q: How do I trace a specific execution across all layers?
A: Use the execution_id (UUID) returned in the ChainExecutionResult. This ID is also stored in ExecutionHistoryRegistry and logged with the request_id.
Q: What happens if the OTel Collector is down during execution?
A: Execution continues normally. Spans are batched and retried with exponential backoff. If the collector remains down, spans are dropped after retry exhaustion.
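The retry-then-drop behaviour described in this answer can be sketched in a few lines. This is a generic illustration of exponential backoff, not the OpenTelemetry exporter's actual code: `export_with_retry`, its parameters, and the choice of `ConnectionError` as the failure signal are assumptions made for the example.

```python
import time


def export_with_retry(send, batch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Try to send a batch; return True on success, False if the batch is dropped.

    Waits base_delay * 2**attempt between attempts (1 s, 2 s, 4 s by default).
    """
    for attempt in range(max_retries + 1):
        try:
            send(batch)
            return True
        except ConnectionError:
            if attempt == max_retries:
                break  # retries exhausted: give up and drop the batch
            sleep(base_delay * 2 ** attempt)
    return False
```

Because the export runs outside the request path, a `False` result here would mean lost telemetry rather than a failed chain execution, which matches the statement that execution continues normally.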
Q: Can I execute a chain without going through the REST API?
A: Yes. The Python SDK’s CouncilExecutor can execute chains directly without HTTP, using in-memory models. This is useful for testing and local development.
Q: How does the middleware inject the request_id?
A: The create_request_context_middleware() ASGI middleware reads X-Request-ID or X-Correlation-ID headers from the incoming request. If absent, it generates a new UUID. This ID is stored in request.state for access by handlers.
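The header-fallback logic described above can be sketched as a small pure-ASGI middleware using only the standard library. This is not the actual `create_request_context_middleware()` implementation: the names `resolve_request_id` and `RequestIdMiddleware` are hypothetical, and giving `X-Request-ID` precedence over `X-Correlation-ID` is an assumption. Writing into `scope["state"]` relies on Starlette-based frameworks exposing that dictionary as `request.state`.

```python
import uuid

# Assumed precedence: X-Request-ID first, then X-Correlation-ID.
HEADER_CANDIDATES = (b"x-request-id", b"x-correlation-id")


def resolve_request_id(headers):
    """Pick a request ID from ASGI-style (name, value) byte pairs, or mint a UUID."""
    lookup = {name.lower(): value for name, value in headers}
    for candidate in HEADER_CANDIDATES:
        value = lookup.get(candidate)
        if value:
            return value.decode("latin-1")
    return str(uuid.uuid4())


class RequestIdMiddleware:
    """Minimal ASGI middleware that stores the request ID in scope['state']."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            state = scope.setdefault("state", {})
            state["request_id"] = resolve_request_id(scope.get("headers", []))
        await self.app(scope, receive, send)
```

Reusing a client-supplied ID when one is present lets an upstream caller correlate its own logs with the service's logs, while the generated UUID guarantees every request is still traceable when no header is sent.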
Examples
The end-to-end flow is like ordering food delivery:
- You (Client) = Open the app, select “Sushi Platter,” pay
- App (Transport) = Sends order to restaurant’s tablet
- Host (Middleware) = Confirms the order format is valid, assigns order #12345
- Kitchen Manager (Router) = Checks if the restaurant is open (council exists), finds the sushi menu (chain exists)
- Chef (ChainExecutor) = Starts cooking: rice → fish → roll → cut
- Food Safety Inspector (GateEngine) = Checks rice temperature before serving → approves
- Receipt Printer (ExecutionHistory) = Records order #12345 with all items and timestamps
- Analytics (Metrics) = “One sushi order completed in 12 minutes”
- GPS Tracker (OpenTelemetry) = Shows the full journey from order placed to delivery
- Delivery Driver (HTTP Response) = Brings you the result: delicious sushi + receipt
If the inspector finds the fish is bad (gate veto), the order is cancelled immediately (409 GATE_VETO), you get a refund, and the kitchen stops cooking — no half-finished orders delivered.
neighbors on the map
- OpenTelemetry Instrumentation & Metrics: adding observability to iris-service code
- In-Memory Registry Architecture: understanding how iris-service stores data
- CI/CD Pipeline & Schema Propagation: understanding the build process