CRUMB a card from devarno-cloud

End-to-End Chain Execution Request Flow

iris advanced 8 min read

ELI5

This is the full journey of a chain execution request from the moment you press “go” to the moment you get a result. It’s like tracking a pizza order from your phone app through the restaurant’s kitchen, past the quality check, into the delivery driver’s car, and finally to your door — with a receipt showing exactly what happened at each step.

Technical Deep Dive

Full Request Lifecycle

sequenceDiagram
autonumber
participant Client as Client (SDK/Browser/MCP)
participant TS as Transport Layer<br/>(HTTP/gRPC/MCP)
participant FW as FastAPI App
participant MW as Middleware<br/>(CORS, OTel, Request ID)
participant RT as chains Router
participant VAL as Validation<br/>(Pydantic)
participant CRE as CouncilRegistry
participant EXE as ChainExecutor
participant GATE as GateEngine
participant MET as Metrics Recorder
participant HIS as ExecutionHistoryRegistry
participant OT as OpenTelemetry
Client->>TS: POST /v1/chains/execute
TS->>FW: HTTP Request
FW->>MW: Process middleware
MW->>MW: Inject request_id, tracer, meter
MW->>RT: Route to execute_chain handler
RT->>VAL: Validate ChainExecuteRequest
VAL-->>RT: Validated model
RT->>CRE: get_by_id(council_id)
alt Council not found
CRE-->>RT: None
RT-->>FW: 404 ApiError
FW-->>TS: HTTP 404
TS-->>Client: Error response
else Council found
CRE-->>RT: Council
RT->>RT: Find chain in council.chains
alt Chain not found
RT-->>FW: 404 ApiError
FW-->>TS: HTTP 404
TS-->>Client: Error response
else Chain found
RT->>EXE: execute_chain(council_id, chain, input)
rect rgb(255, 240, 245)
Note over EXE,GATE: Phase 1: Before Gates (0-5ms)
EXE->>GATE: evaluate_gate(before, ...)
GATE-->>EXE: allow / veto
alt Veto
EXE-->>RT: status=vetoed
end
end
rect rgb(240, 255, 240)
Note over EXE,GATE: Phase 2: Step Execution (varies)
loop For each step
EXE->>EXE: _invoke_sprite(step)
Note over EXE: Placeholder: returns<br/>{"status": "simulated"}
EXE->>GATE: evaluate_gate(after, ...)
GATE-->>EXE: allow / veto
alt Veto
EXE-->>RT: status=vetoed
end
end
end
rect rgb(240, 245, 255)
Note over EXE,GATE: Phase 3: Final Gates (0-5ms)
EXE->>GATE: evaluate_gate(after, ...)
GATE-->>EXE: allow / veto
alt Veto
EXE-->>RT: status=vetoed
else Allow
EXE-->>RT: status=completed
end
end
RT->>HIS: create(execution_history)
RT->>MET: record_chain_execution(status, duration, steps)
RT->>OT: Record spans + metrics
alt status == vetoed
RT-->>FW: 409 GATE_VETO
FW-->>TS: HTTP 409
TS-->>Client: Error with gate details
else status == completed
RT-->>FW: 200 ChainExecutionResult
FW-->>TS: HTTP 200
TS-->>Client: Full result
end
end
end
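
The three phases in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the actual ChainExecutor: `ChainResult`, `invoke_sprite`, the gate callback signature, and the phase labels `"before"`, `"after_step"`, and `"final"` are all hypothetical names chosen for readability (the diagram itself passes `after` for the final gate as well).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical gate signature: (phase, context) -> True to allow, False to veto.
Gate = Callable[[str, dict], bool]


@dataclass
class ChainResult:
    status: str = "completed"                  # "completed" or "vetoed"
    step_outputs: list = field(default_factory=list)
    vetoed_at: Optional[str] = None            # phase that produced the veto, if any


def invoke_sprite(step: str, payload: dict) -> dict:
    # Mirrors the placeholder noted in the diagram: no real RPC is made.
    return {"step": step, "status": "simulated"}


def execute_chain(steps: list, payload: dict, gate: Gate) -> ChainResult:
    result = ChainResult()

    # Phase 1: the before gate runs once, ahead of any step.
    if not gate("before", {"input": payload}):
        result.status, result.vetoed_at = "vetoed", "before"
        return result

    # Phase 2: run each step, then its gate; a veto stops the loop early.
    for step in steps:
        output = invoke_sprite(step, payload)
        result.step_outputs.append(output)
        if not gate("after_step", {"step": step, "output": output}):
            result.status, result.vetoed_at = "vetoed", f"after_step:{step}"
            return result

    # Phase 3: the final gate sees every step output.
    if not gate("final", {"outputs": result.step_outputs}):
        result.status, result.vetoed_at = "vetoed", "final"
    return result
```

Passing a gate that vetoes after step `"b"` in a chain of `["a", "b", "c"]` returns a result with two step outputs and `status="vetoed"`, which matches the early exit drawn in Phase 2.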

Performance Characteristics

| Phase | Typical Duration | Bottleneck |
|---|---|---|
| HTTP transport | 1–10 ms | Network latency |
| Middleware | 0–1 ms | CORS header processing |
| Pydantic validation | 1–5 ms | Model complexity |
| Registry lookup | 0–1 ms | In-memory dict access |
| Gate evaluation | 0–5 ms | Condition complexity |
| Step execution | Highly variable | Sprite invocation (placeholder = 0 ms; real RPC = 100 ms–5 s) |
| History persistence | 0–1 ms | In-memory dict insert |
| Metrics recording | 0–1 ms | Counter increment |
| OTel span export | Async (batched) | Network to collector |

Total typical latency (placeholder steps): 10–50 ms
Total typical latency (real sprite RPC): 500 ms–30 s, depending on chain length
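
As a rough cross-check of those totals, the per-phase ranges can be added up for a chain of a given length. The sketch below is back-of-the-envelope arithmetic only: `estimate_latency_ms` is a hypothetical helper, and it assumes one before gate, one gate after each step, and one final gate. OTel export is left out because it runs asynchronously.

```python
# Per-request overhead in milliseconds as (low, high), taken from the phase figures above.
OVERHEAD_MS = {
    "http_transport": (1, 10),
    "middleware": (0, 1),
    "pydantic_validation": (1, 5),
    "registry_lookup": (0, 1),
    "history_persistence": (0, 1),
    "metrics_recording": (0, 1),
}
GATE_MS = (0, 5)              # cost of a single gate evaluation
SPRITE_RPC_MS = (100, 5000)   # cost of a single step with a real sprite RPC


def estimate_latency_ms(n_steps, step_ms=SPRITE_RPC_MS):
    """Return a (low, high) latency estimate in milliseconds for one request."""
    # Assumption: one before gate + one gate per step + one final gate.
    n_gates = n_steps + 2
    low = sum(lo for lo, _ in OVERHEAD_MS.values())
    high = sum(hi for _, hi in OVERHEAD_MS.values())
    low += n_gates * GATE_MS[0] + n_steps * step_ms[0]
    high += n_gates * GATE_MS[1] + n_steps * step_ms[1]
    return low, high


print(estimate_latency_ms(3, step_ms=(0, 0)))  # placeholder steps: (2, 44)
print(estimate_latency_ms(5))                  # real RPC, five steps: (502, 25054)
```

Under these assumptions a five-step chain with real RPC lands at roughly 0.5 s to 25 s, in the same range as the quoted total, and the step term dominates everything else.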

C4 Container View

---
title: "Container Diagram — Chain Execution Flow"
---
flowchart TD
subgraph client ["**Client**"]
py["<b>Python Script</b><br/><i>Python</i><br/>Uses iris-sdk"]:::container
ts["<b>Web App</b><br/><i>TypeScript</i><br/>Uses @iris-hq/sdk"]:::container
mcp["<b>Claude Desktop</b><br/><i>MCP</i><br/>Uses iris-mcp-server"]:::container
end
subgraph core ["**IRIS Core**"]
fastapi["<b>FastAPI</b><br/><i>Python</i><br/>HTTP routing + validation"]:::container
executor["<b>ChainExecutor</b><br/><i>Python</i><br/>Synchronous execution"]:::container
gate["<b>GateEngine</b><br/><i>Python</i><br/>Condition evaluation"]:::container
registry["<b>In-Memory Registries</b><br/><i>Python</i><br/>Sprite/Council/History stores"]:::container
metrics["<b>Metrics Recorder</b><br/><i>Python</i><br/>Prometheus counters"]:::container
end
subgraph obs ["**Observability**"]
otel["<b>OTel Collector</b><br/><i>Go</i><br/>Telemetry routing"]:::container
jaeger["<b>Jaeger</b><br/><i>Go</i><br/>Trace storage"]:::container
prom["<b>Prometheus</b><br/><i>Go</i><br/>Metrics storage"]:::container
end
py -- "POST /v1/chains/execute" --> fastapi
ts -- "POST /v1/chains/execute" --> fastapi
mcp -- "POST /v1/chains/execute" --> fastapi
fastapi -- "Delegates execution" --> executor
executor -- "Evaluates gates" --> gate
executor -- "Reads councils/chains" --> registry
executor -- "Writes execution history" --> registry
executor -- "Records metrics" --> metrics
fastapi -- "Exports spans/metrics" --> otel
otel -- "Forwards traces" --> jaeger
otel -- "Forwards metrics" --> prom
classDef person fill:#1c1c24,stroke:#e85d3e,color:#f0ece6
classDef system fill:#1c1c24,stroke:#d4a574,color:#f0ece6
classDef ext fill:#141419,stroke:#8b7e74,color:#f0ece6,stroke-dasharray: 4 3
classDef db fill:#1c1c24,stroke:#d4a574,color:#f0ece6
classDef container fill:#1c1c24,stroke:#d4a574,color:#f0ece6

Error Paths

flowchart TD
A["Client Request"] --> B{"Validation?"}
B -->|Fails| C["400 Bad Request"]
B -->|Passes| D{"Council exists?"}
D -->|No| E["404 Council Not Found"]
D -->|Yes| F{"Chain exists?"}
F -->|No| G["404 Chain Not Found"]
F -->|Yes| H{"Before gate?"}
H -->|Veto| I["409 GATE_VETO"]
H -->|Allow| J{"Step execution"}
J -->|Step fails| K{"On-error gate?"}
K -->|Veto| I
K -->|Allow| L["200 status=failed"]
J -->|Step succeeds| M{"After gate?"}
M -->|Veto| I
M -->|Allow| N{"More steps?"}
N -->|Yes| J
N -->|No| O{"Final gate?"}
O -->|Veto| I
O -->|Allow| P["200 status=completed"]
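
The leaves of this flowchart can be read as a lookup from outcome to HTTP status and execution status. The sketch below encodes that lookup; `Outcome`, `RESPONSES`, and `respond` are hypothetical names, not the router's real code. One caveat: FastAPI answers request-validation failures with 422 by default, so the 400 shown here assumes a custom exception handler.

```python
from enum import Enum
from http import HTTPStatus


class Outcome(Enum):
    VALIDATION_FAILED = "validation_failed"
    COUNCIL_NOT_FOUND = "council_not_found"
    CHAIN_NOT_FOUND = "chain_not_found"
    GATE_VETO = "gate_veto"          # before, after, on-error, or final gate vetoed
    STEP_FAILED = "step_failed"      # step failed and the on-error gate allowed
    COMPLETED = "completed"


# (HTTP status, execution status in the body or None when no execution ran)
RESPONSES = {
    Outcome.VALIDATION_FAILED: (HTTPStatus.BAD_REQUEST, None),
    Outcome.COUNCIL_NOT_FOUND: (HTTPStatus.NOT_FOUND, None),
    Outcome.CHAIN_NOT_FOUND: (HTTPStatus.NOT_FOUND, None),
    Outcome.GATE_VETO: (HTTPStatus.CONFLICT, "vetoed"),
    Outcome.STEP_FAILED: (HTTPStatus.OK, "failed"),
    Outcome.COMPLETED: (HTTPStatus.OK, "completed"),
}


def respond(outcome: Outcome):
    """Return (status_code, execution_status) for a flowchart leaf."""
    http_status, execution_status = RESPONSES[outcome]
    return int(http_status), execution_status
```

Note that a failed step still returns 200 when the on-error gate allows it, so clients need to inspect the `status` field in the body rather than rely on the HTTP code alone.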

Observability Integration

Every phase of execution is observable:

| Layer | Trace Span | Metrics | Logs |
|---|---|---|---|
| HTTP | http.request | http_requests_total | Access log with request_id |
| Router | chain.execute | chain_executions_total | Operation start/end |
| Gate | gate.evaluate | gate_decisions_total | Decision + reason |
| Step | step.execute | chain_steps_executed_total | Sprite ID + action + status |
| History | history.create | (none listed) | Execution ID + status |
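
To make the span-plus-counter pairing concrete, here is a standard-library stand-in for what a tracer and a metrics recorder do around a gate evaluation. It is not the OpenTelemetry or Prometheus API: `span`, `spans`, `counters`, and `observe_gate` are hypothetical helpers that only reuse the span and metric names listed above.

```python
import time
from collections import Counter
from contextlib import contextmanager

spans = []            # finished spans as (name, duration_seconds, attributes)
counters = Counter()  # metric counters keyed by (metric_name, label_value)


@contextmanager
def span(name, **attributes):
    """Time the enclosed block and record it as a finished span."""
    start = time.perf_counter()
    try:
        yield attributes  # callers may add attributes while the span is open
    finally:
        spans.append((name, time.perf_counter() - start, dict(attributes)))


def observe_gate(decision, reason):
    """Record one gate evaluation as a span and a counter increment."""
    with span("gate.evaluate") as attrs:
        attrs["decision"] = decision
        attrs["reason"] = reason
    counters[("gate_decisions_total", decision)] += 1


with span("chain.execute", chain="demo"):
    observe_gate("allow", "no conditions matched")
    observe_gate("veto", "budget exceeded")
```

Inner spans finish first, so `spans` ends up holding the two `gate.evaluate` entries followed by the enclosing `chain.execute` entry, and `counters` holds one `allow` and one `veto`.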

Key Terms

  • Request lifecycle → The complete path from client request through all system layers to response
  • Placeholder invocation → Current step execution returns simulated data; real implementation would use RPC
  • Bottleneck → The slowest phase of execution, typically real sprite invocation in production
  • Trace span → A single timed operation within a distributed trace
  • Error path → The alternative execution flow when validation fails, resources are missing, or gates veto
  • Async metrics export → Metrics are recorded synchronously but exported to backends asynchronously via batch processors

Q&A

Q: Where is most of the execution time spent?
A: In a production system with real sprite invocation, the vast majority of time is spent in _invoke_sprite(), making RPC calls to sprite endpoints or executing AI model inference. With the current placeholder, execution is nearly instantaneous.

Q: How do I trace a specific execution across all layers?
A: Use the execution_id (UUID) returned in the ChainExecutionResult. This ID is also stored in ExecutionHistoryRegistry and logged with the request_id.

Q: What happens if the OTel Collector is down during execution?
A: Execution continues normally. Spans are batched and retried with exponential backoff. If the collector remains down, spans are dropped after retry exhaustion.

Q: Can I execute a chain without going through the REST API?
A: Yes. The Python SDK’s CouncilExecutor can execute chains directly without HTTP, using in-memory models. This is useful for testing and local development.

Q: How does the middleware inject the request_id?
A: The create_request_context_middleware() ASGI middleware reads X-Request-ID or X-Correlation-ID headers from the incoming request. If absent, it generates a new UUID. This ID is stored in request.state for access by handlers.
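
The request-ID behaviour described in the last answer can be sketched as a plain ASGI wrapper with no framework imports. This is an illustration, not the real create_request_context_middleware(): `request_id_middleware` and `request_id_for` are hypothetical names, preferring X-Request-ID over X-Correlation-ID is an assumption, and so is writing to `scope["state"]` (which is where Starlette-based apps back `request.state`).

```python
import asyncio
import uuid

# ASGI delivers header names as lowercased bytes.
HEADER_NAMES = (b"x-request-id", b"x-correlation-id")


def request_id_middleware(app):
    """Wrap an ASGI app so each HTTP request carries a request_id in scope['state']."""

    async def wrapped(scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            # Assumption: X-Request-ID wins when both headers are present.
            supplied = next(
                (headers[name].decode("latin-1") for name in HEADER_NAMES if name in headers),
                None,
            )
            request_id = supplied or str(uuid.uuid4())
            scope.setdefault("state", {})["request_id"] = request_id
        await app(scope, receive, send)

    return wrapped


def request_id_for(headers):
    """Drive the middleware once with raw headers and return the ID it stored."""
    captured = {}

    async def app(scope, receive, send):
        captured.update(scope["state"])

    scope = {"type": "http", "headers": headers}
    asyncio.run(request_id_middleware(app)(scope, None, None))
    return captured["request_id"]


print(request_id_for([(b"x-correlation-id", b"abc-123")]))  # abc-123
print(request_id_for([]))                                   # a freshly generated UUID
```

Reusing a client-supplied ID is what lets a single identifier follow the request from the caller's logs into the access log and the execution history.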

Examples

The end-to-end flow is like ordering food delivery:

  1. You (Client) = Open the app, select “Sushi Platter,” pay
  2. App (Transport) = Sends order to restaurant’s tablet
  3. Host (Middleware) = Confirms the order format is valid, assigns order #12345
  4. Kitchen Manager (Router) = Checks if the restaurant is open (council exists), finds the sushi menu (chain exists)
  5. Chef (ChainExecutor) = Starts cooking: rice → fish → roll → cut
  6. Food Safety Inspector (GateEngine) = Checks rice temperature before serving → approves
  7. Receipt Printer (ExecutionHistory) = Records order #12345 with all items and timestamps
  8. Analytics (Metrics) = “One sushi order completed in 12 minutes”
  9. GPS Tracker (OpenTelemetry) = Shows the full journey from order placed to delivery
  10. Delivery Driver (HTTP Response) = Brings you the result: delicious sushi + receipt

If the inspector finds the fish is bad (gate veto), the order is cancelled immediately (409 GATE_VETO), you get a refund, and the kitchen stops cooking — no half-finished orders delivered.
