End-to-End Chain Execution Request Flow


ELI5

This is the full journey of a chain execution request from the moment you press “go” to the moment you get a result. It’s like tracking a pizza order from your phone app through the restaurant’s kitchen, past the quality check, into the delivery driver’s car, and finally to your door — with a receipt showing exactly what happened at each step.

Technical Deep Dive

Full Request Lifecycle

sequenceDiagram
    autonumber
    participant Client as Client (SDK/Browser/MCP)
    participant TS as Transport Layer<br/>(HTTP/gRPC/MCP)
    participant FW as FastAPI App
    participant MW as Middleware<br/>(CORS, OTel, Request ID)
    participant RT as chains Router
    participant VAL as Validation<br/>(Pydantic)
    participant CRE as CouncilRegistry
    participant EXE as ChainExecutor
    participant GATE as GateEngine
    participant MET as Metrics Recorder
    participant HIS as ExecutionHistoryRegistry
    participant OT as OpenTelemetry

    Client->>TS: POST /v1/chains/execute
    TS->>FW: HTTP Request
    FW->>MW: Process middleware
    MW->>MW: Inject request_id, tracer, meter
    MW->>RT: Route to execute_chain handler
    RT->>VAL: Validate ChainExecuteRequest
    VAL-->>RT: Validated model

    RT->>CRE: get_by_id(council_id)
    alt Council not found
        CRE-->>RT: None
        RT-->>FW: 404 ApiError
        FW-->>TS: HTTP 404
        TS-->>Client: Error response
    else Council found
        CRE-->>RT: Council
        RT->>RT: Find chain in council.chains
        alt Chain not found
            RT-->>FW: 404 ApiError
            FW-->>TS: HTTP 404
            TS-->>Client: Error response
        else Chain found
            RT->>EXE: execute_chain(council_id, chain, input)

            rect rgb(255, 240, 245)
                Note over EXE,GATE: Phase 1: Before Gates (0-5ms)
                EXE->>GATE: evaluate_gate(before, ...)
                GATE-->>EXE: allow / veto
                alt Veto
                    EXE-->>RT: status=vetoed
                end
            end

            rect rgb(240, 255, 240)
                Note over EXE,GATE: Phase 2: Step Execution (varies)
                loop For each step
                    EXE->>EXE: _invoke_sprite(step)
                    Note over EXE: Placeholder: returns<br/>{"status": "simulated"}
                    EXE->>GATE: evaluate_gate(after, ...)
                    GATE-->>EXE: allow / veto
                    alt Veto
                        EXE-->>RT: status=vetoed
                    end
                end
            end

            rect rgb(240, 245, 255)
                Note over EXE,GATE: Phase 3: Final Gates (0-5ms)
                EXE->>GATE: evaluate_gate(after, ...)
                GATE-->>EXE: allow / veto
                alt Veto
                    EXE-->>RT: status=vetoed
                else Allow
                    EXE-->>RT: status=completed
                end
            end

            RT->>HIS: create(execution_history)
            RT->>MET: record_chain_execution(status, duration, steps)
            RT->>OT: Record spans + metrics

            alt status == vetoed
                RT-->>FW: 409 GATE_VETO
                FW-->>TS: HTTP 409
                TS-->>Client: Error with gate details
            else status == completed
                RT-->>FW: 200 ChainExecutionResult
                FW-->>TS: HTTP 200
                TS-->>Client: Full result
            end
        end
    end
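
The three phases in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the actual ChainExecutor: the `ExecutionResult` shape, the `gate` callable signature, and the `"final"` phase label are assumptions made for the example (the diagram itself passes `after` to the final gate call).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for GateEngine.evaluate_gate: returns True to allow.
GateFn = Callable[[str, Dict[str, Any]], bool]


@dataclass
class ExecutionResult:
    status: str                          # "completed" or "vetoed"
    vetoed_at: Optional[str] = None      # phase that stopped the run, if any
    step_outputs: List[Dict[str, Any]] = field(default_factory=list)


def invoke_sprite(step: str) -> Dict[str, Any]:
    """Placeholder invocation, mirroring the simulated step in the diagram."""
    return {"step": step, "status": "simulated"}


def execute_chain(steps: List[str], gate: GateFn) -> ExecutionResult:
    result = ExecutionResult(status="completed")

    # Phase 1: before gates run once, ahead of any step.
    if not gate("before", {"steps": steps}):
        return ExecutionResult(status="vetoed", vetoed_at="before")

    # Phase 2: each step is followed by its own after-gate check.
    for step in steps:
        output = invoke_sprite(step)
        result.step_outputs.append(output)
        if not gate("after", output):
            result.status = "vetoed"
            result.vetoed_at = f"after:{step}"
            return result

    # Phase 3: final gates see everything the steps produced.
    if not gate("final", {"outputs": result.step_outputs}):
        result.status = "vetoed"
        result.vetoed_at = "final"
    return result
```

A veto at any phase returns immediately, so later steps never run; that is the behaviour the `alt Veto` branches describe.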

Performance Characteristics

Phase	Typical Duration	Bottleneck
HTTP transport	1–10 ms	Network latency
Middleware	0–1 ms	CORS header processing
Pydantic validation	1–5 ms	Model complexity
Registry lookup	0–1 ms	In-memory dict access
Gate evaluation	0–5 ms	Condition complexity
Step execution	Highly variable	Sprite invocation (placeholder = 0 ms; real RPC = 100 ms–5 s)
History persistence	0–1 ms	In-memory dict insert
Metrics recording	0–1 ms	Counter increment
OTel span export	Async (batched)	Network to collector

Total typical latency (placeholder steps): 10–50 ms
Total typical latency (real sprite RPC): 500 ms–30 s depending on chain length
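
As a rough cross-check, the per-phase ranges in the table can be summed for a chain of a given length. The sketch below is arithmetic only, not a measurement. It assumes one before gate, one after gate per step, and one final gate, and it ignores the asynchronous OTel export.

```python
# Per-phase latency ranges in milliseconds, copied from the table above.
FIXED_PHASES_MS = {
    "http_transport": (1, 10),
    "middleware": (0, 1),
    "validation": (1, 5),
    "registry_lookup": (0, 1),
    "history": (0, 1),
    "metrics": (0, 1),
}
GATE_MS = (0, 5)
PLACEHOLDER_STEP_MS = (0, 0)
REAL_STEP_MS = (100, 5000)


def estimate_latency_ms(n_steps, step_ms):
    """Return (low, high) total latency in ms for a chain of n_steps."""
    # Assumption: 1 before gate + 1 after gate per step + 1 final gate.
    gate_evals = n_steps + 2
    low = sum(lo for lo, _ in FIXED_PHASES_MS.values())
    high = sum(hi for _, hi in FIXED_PHASES_MS.values())
    low += gate_evals * GATE_MS[0] + n_steps * step_ms[0]
    high += gate_evals * GATE_MS[1] + n_steps * step_ms[1]
    return low, high


print(estimate_latency_ms(3, PLACEHOLDER_STEP_MS))  # (2, 44)
print(estimate_latency_ms(5, REAL_STEP_MS))         # (502, 25054)
```

The upper bounds land close to the totals quoted above, and the real-RPC case shows how step execution dominates everything else once sprites are actually invoked.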

C4 Container View

---
title: "Container Diagram — Chain Execution Flow"
---
flowchart TD
  subgraph client ["**Client**"]
    py["<b>Python Script</b><br/><i>Python</i><br/>Uses iris-sdk"]:::container
    ts["<b>Web App</b><br/><i>TypeScript</i><br/>Uses @iris-hq/sdk"]:::container
    mcp["<b>Claude Desktop</b><br/><i>MCP</i><br/>Uses iris-mcp-server"]:::container
  end
  subgraph core ["**IRIS Core**"]
    fastapi["<b>FastAPI</b><br/><i>Python</i><br/>HTTP routing + validation"]:::container
    executor["<b>ChainExecutor</b><br/><i>Python</i><br/>Synchronous execution"]:::container
    gate["<b>GateEngine</b><br/><i>Python</i><br/>Condition evaluation"]:::container
    registry["<b>In-Memory Registries</b><br/><i>Python</i><br/>Sprite/Council/History stores"]:::container
    metrics["<b>Metrics Recorder</b><br/><i>Python</i><br/>Prometheus counters"]:::container
  end
  subgraph obs ["**Observability**"]
    otel["<b>OTel Collector</b><br/><i>Go</i><br/>Telemetry routing"]:::container
    jaeger["<b>Jaeger</b><br/><i>Go</i><br/>Trace storage"]:::container
    prom["<b>Prometheus</b><br/><i>Go</i><br/>Metrics storage"]:::container
  end
  py -- "POST /v1/chains/execute" --> fastapi
  ts -- "POST /v1/chains/execute" --> fastapi
  mcp -- "POST /v1/chains/execute" --> fastapi
  fastapi -- "Delegates execution" --> executor
  executor -- "Evaluates gates" --> gate
  executor -- "Reads councils/chains" --> registry
  executor -- "Writes execution history" --> registry
  executor -- "Records metrics" --> metrics
  fastapi -- "Exports spans/metrics" --> otel
  otel -- "Forwards traces" --> jaeger
  otel -- "Forwards metrics" --> prom


  classDef person fill:#1c1c24,stroke:#e85d3e,color:#f0ece6
  classDef system fill:#1c1c24,stroke:#d4a574,color:#f0ece6
  classDef ext fill:#141419,stroke:#8b7e74,color:#f0ece6,stroke-dasharray: 4 3
  classDef db fill:#1c1c24,stroke:#d4a574,color:#f0ece6
  classDef container fill:#1c1c24,stroke:#d4a574,color:#f0ece6

Error Paths

flowchart TD
    A["Client Request"] --> B{"Validation?"}
    B -->|Fails| C["400 Bad Request"]
    B -->|Passes| D{"Council exists?"}
    D -->|No| E["404 Council Not Found"]
    D -->|Yes| F{"Chain exists?"}
    F -->|No| G["404 Chain Not Found"]
    F -->|Yes| H{"Before gate?"}
    H -->|Veto| I["409 GATE_VETO"]
    H -->|Allow| J{"Step execution"}
    J -->|Step fails| K{"On-error gate?"}
    K -->|Veto| I
    K -->|Allow| L["200 status=failed"]
    J -->|Step succeeds| M{"After gate?"}
    M -->|Veto| I
    M -->|Allow| N{"More steps?"}
    N -->|Yes| J
    N -->|No| O{"Final gate?"}
    O -->|Veto| I
    O -->|Allow| P["200 status=completed"]
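
The decision tree above reduces to a small mapping from outcome to HTTP status. The function below is an illustrative sketch: only `GATE_VETO` and the status codes come from this page, while the function name and the other error labels are hypothetical.

```python
from typing import Optional, Tuple


def resolve_outcome(
    request_valid: bool,
    council_found: bool,
    chain_found: bool,
    execution_status: Optional[str],
) -> Tuple[int, str]:
    """Map the checks in the flowchart to (HTTP status, label)."""
    # Checks are ordered exactly as in the flowchart: the first failure wins.
    if not request_valid:
        return 400, "BAD_REQUEST"          # hypothetical label
    if not council_found:
        return 404, "COUNCIL_NOT_FOUND"    # hypothetical label
    if not chain_found:
        return 404, "CHAIN_NOT_FOUND"      # hypothetical label
    if execution_status == "vetoed":
        return 409, "GATE_VETO"
    if execution_status in ("completed", "failed"):
        # A failed step that the on-error gate allows still returns 200.
        return 200, execution_status
    raise ValueError(f"unexpected execution status: {execution_status!r}")
```

Note that `status=failed` is reported inside a 200 response, so clients need to inspect the body rather than rely on the HTTP status alone.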

Observability Integration

Every phase of execution is observable:

Layer	Trace Span	Metrics	Logs
HTTP	`http.request`	`http_requests_total`	Access log with request_id
Router	`chain.execute`	`chain_executions_total`	Operation start/end
Gate	`gate.evaluate`	`gate_decisions_total`	Decision + reason
Step	`step.execute`	`chain_steps_executed_total`	Sprite ID + action + status
History	`history.create`	—	Execution ID + status

Key Terms

Request lifecycle → The complete path from client request through all system layers to response
Placeholder invocation → Current step execution returns simulated data; real implementation would use RPC
Bottleneck → The slowest phase of execution, typically real sprite invocation in production
Trace span → A single timed operation within a distributed trace
Error path → The alternative execution flow when validation fails, resources are missing, or gates veto
Async metrics export → Metrics are recorded synchronously but exported to backends asynchronously via batch processors

Q&A

Q: Where is most of the execution time spent?
A: In a production system with real sprite invocation, the vast majority of time is spent in _invoke_sprite(), making RPC calls to sprite endpoints or executing AI model inference. With the current placeholder, execution is nearly instantaneous.

Q: How do I trace a specific execution across all layers?
A: Use the execution_id (UUID) returned in the ChainExecutionResult. This ID is also stored in ExecutionHistoryRegistry and logged with the request_id.
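
To make the correlation concrete, here is a minimal in-memory store keyed by execution ID. It is a sketch under stated assumptions: `InMemoryHistory`, `ExecutionRecord`, and their fields are hypothetical and do not reflect the real ExecutionHistoryRegistry API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExecutionRecord:
    execution_id: str   # returned to the client in the result
    request_id: str     # the ID the middleware attached to the request
    status: str


class InMemoryHistory:
    """Hypothetical dict-backed history store, keyed by execution_id."""

    def __init__(self) -> None:
        self._records: Dict[str, ExecutionRecord] = {}

    def create(self, request_id: str, status: str) -> ExecutionRecord:
        record = ExecutionRecord(str(uuid.uuid4()), request_id, status)
        self._records[record.execution_id] = record
        return record

    def get(self, execution_id: str) -> Optional[ExecutionRecord]:
        return self._records.get(execution_id)
```

Given an `execution_id` from a response, `get()` recovers the matching `request_id`, which can then be used to search logs and traces for the same request.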

Q: What happens if the OTel Collector is down during execution?
A: Execution continues normally. Spans are batched and retried with exponential backoff. If the collector remains down, spans are dropped after retry exhaustion.
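
The retry-then-drop behaviour can be illustrated with a small helper. This is not the OpenTelemetry SDK's exporter code; `export_with_backoff`, its parameters, and the choice of `ConnectionError` are assumptions made for the sketch.

```python
import time
from typing import Any, Callable, Sequence


def export_with_backoff(
    batch: Sequence[Any],
    send: Callable[[Sequence[Any]], None],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send a batch; return False if it was dropped after all retries."""
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # retries exhausted: give up on this batch
            # Delay doubles each time: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay_s * 2 ** attempt)
    return False
```

Because this would run on the exporter's background path rather than in the request handler, a failing collector costs dropped telemetry, not slower chain executions.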

Q: Can I execute a chain without going through the REST API?
A: Yes. The Python SDK’s CouncilExecutor can execute chains directly without HTTP, using in-memory models. This is useful for testing and local development.

Q: How does the middleware inject the request_id?
A: The create_request_context_middleware() ASGI middleware reads X-Request-ID or X-Correlation-ID headers from the incoming request. If absent, it generates a new UUID. This ID is stored in request.state for access by handlers.
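
The header-or-UUID logic fits in a small pure-ASGI middleware. The class below is a sketch rather than the body of create_request_context_middleware(): the class name, the header precedence, and the use of `scope["state"]` (where Starlette-based apps expose `request.state`) are assumptions.

```python
import asyncio
import uuid

# ASGI delivers header names as lowercased bytes.
REQUEST_ID_HEADERS = (b"x-request-id", b"x-correlation-id")


class RequestIdMiddleware:
    """Hypothetical ASGI middleware that attaches a request ID to each request."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            # Prefer X-Request-ID, then X-Correlation-ID, else mint a UUID.
            incoming = next(
                (headers[name].decode("latin-1")
                 for name in REQUEST_ID_HEADERS if name in headers),
                None,
            )
            request_id = incoming or str(uuid.uuid4())
            # Assumption: handlers read this back through request.state.
            scope.setdefault("state", {})["request_id"] = request_id
        await self.app(scope, receive, send)


def run_once(headers):
    """Drive the middleware with a fake HTTP scope and return the resolved ID."""
    seen = {}

    async def app(scope, receive, send):
        seen["request_id"] = scope["state"]["request_id"]

    async def noop(*args):
        return None

    scope = {"type": "http", "headers": headers}
    asyncio.run(RequestIdMiddleware(app)(scope, noop, noop))
    return seen["request_id"]
```

Calling `run_once([(b"x-correlation-id", b"abc-123")])` returns `"abc-123"`, while `run_once([])` returns a freshly generated UUID string.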

Examples

The end-to-end flow is like ordering food delivery:

You (Client) = Open the app, select “Sushi Platter,” pay
App (Transport) = Sends order to restaurant’s tablet
Host (Middleware and Validation) = Assigns order #12345, then confirms the order format is valid
Kitchen Manager (Router) = Checks if the restaurant is open (council exists), finds the sushi menu (chain exists)
Chef (ChainExecutor) = Starts cooking: rice → fish → roll → cut
Food Safety Inspector (GateEngine) = Checks rice temperature before serving → approves
Receipt Printer (ExecutionHistory) = Records order #12345 with all items and timestamps
Analytics (Metrics) = “One sushi order completed in 12 minutes”
GPS Tracker (OpenTelemetry) = Shows the full journey from order placed to delivery
Delivery Driver (HTTP Response) = Brings you the result: delicious sushi + receipt

If the inspector finds the fish is bad (gate veto), the order is cancelled immediately (409 GATE_VETO), you get a refund, and the kitchen stops cooking — no half-finished orders delivered.
