Provision-then-RALPH End-to-End
rocky advanced 7 min read
ELI5
The big test is a one-take play: provision a workspace from nothing, run a four-prompt RALPH job inside it, tear it down. If the audit log (HATCH) doesn’t show every act in the right order, the run failed. The whole thing must work without any cloud account so anyone self-hosting can run the same test.
Technical Deep Dive
What the test is and where it lives
The test lives at tests/e2e/test_provision_then_ralph.py (Phase 5 §12). It is written in Python because RALPH is Python and Python is the producer language for KAHN events. It is gated on ROCKY_E2E=1: the parent CI always runs it, and local runs are opt-in because they require a running Docker daemon.
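A minimal sketch of how the ROCKY_E2E gate could be evaluated, using only the standard library. The helper names (e2e_enabled, docker_cli_present, skip_reason) are hypothetical, and looking for a docker executable on PATH is only a cheap proxy for a running daemon; the real test may wire its gate differently.

```python
import os
import shutil
from typing import Mapping, Optional


def e2e_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the ROCKY_E2E gate is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("ROCKY_E2E") == "1"


def docker_cli_present() -> bool:
    """Assumption: a `docker` executable on PATH is a cheap local pre-check.

    It does not prove the daemon is running; it only catches the obvious miss.
    """
    return shutil.which("docker") is not None


def skip_reason(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a human-readable skip reason, or None when the e2e should run."""
    if not e2e_enabled(environ):
        return "ROCKY_E2E=1 not set; the e2e is opt-in locally"
    if not docker_cli_present():
        return "docker CLI not found on PATH; the e2e needs a Docker daemon"
    return None
```

In a pytest module, the result of skip_reason() could feed a pytest.mark.skipif marker so that an ungated run is reported as skipped rather than failed.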
The OSS-parity contract (verbatim)
The solo tier + LocalDocker driver + LocalAuth adapter path must pass the full e2e test with no Polar.sh network calls and no devarno-cloud-tenant credentials. If a self-hoster’s CI breaks on the e2e test, we broke OSS.
The Phase 5e CI runs the e2e in this exact configuration on every PR; the job MUST be required for merge.
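One way to make the "no Polar.sh network calls" half of the contract fail loudly is to refuse non-loopback connections while the test runs. The sketch below is an assumption, not the project's mechanism: forbid_external_network and ForbiddenNetworkCall are hypothetical names, and patching socket.socket.connect only covers sockets opened inside the Python test process. The hearth binary and the console run as separate processes and would need a different guard.

```python
import socket
from contextlib import contextmanager
from typing import Iterator

# Hosts treated as local for the purpose of this sketch.
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


class ForbiddenNetworkCall(AssertionError):
    """Raised when in-process code tries to reach a non-loopback host."""


@contextmanager
def forbid_external_network() -> Iterator[None]:
    """Temporarily reject TCP/UDP connects to anything but loopback.

    Unix-domain sockets (such as a local hearth socket) pass through untouched.
    """
    real_connect = socket.socket.connect

    def guarded_connect(self, address):
        if self.family in (socket.AF_INET, socket.AF_INET6):
            host = address[0]
            if host not in LOOPBACK_HOSTS:
                raise ForbiddenNetworkCall(f"external connect attempted: {host!r}")
        return real_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the original method, even if the body raised.
        socket.socket.connect = real_connect
```

The guard raises before any packet leaves the machine, so a violation shows up as an assertion failure at the call site instead of a timeout.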
The play, six acts
sequenceDiagram
  autonumber
  participant Test as pytest
  participant Hearth as hearth (Go binary, unix socket)
  participant Console as console (SS-08 + SS-07 routes)
  participant LocalAuth
  participant Worker as ralph serve
  participant Sink as fake HATCH sink

  Test->>Hearth: spawn binary
  Test->>Console: POST /api/hearth/workspaces { slug:"test", tier:"solo" }
  Console->>LocalAuth: resolve session → admin
  Console->>Sink: hearth.provisioning_started
  Console->>Hearth: Provision(slug, profile)
  Hearth-->>Console: DeploymentRef{ status: ready }
  Console->>Sink: hearth.provisioned

  Test->>Console: POST /api/ralph/runs (4-prompt mock)
  Console->>Worker: submit (HMAC bearer)
  Worker-->>Console: { run_id }
  Console->>Sink: ralph.run.started
  loop per prompt (KAHN node)
    Worker->>Console: SSE NodeAttempt → NodeTransition
    Console->>Sink: ralph.node.*
  end
  Console->>Sink: ralph.run.ended

  Test->>Console: DELETE /api/hearth/workspaces/test
  Console->>Sink: hearth.decommission_started
  Console->>Hearth: Teardown(ref)
  Hearth-->>Console: ok
  Console->>Sink: hearth.decommissioned
  Test->>Hearth: Status(ref)
  Hearth-->>Test: tier_torn_down
Asserted event chain (the test’s exit criterion)
hearth.provisioning_started → hearth.provisioned → ralph.run.started → ralph.node.* → ralph.run.ended → hearth.decommission_started → hearth.decommissioned
Image stand-ins (Phase 5 D6)
The LocalDocker driver pulls CAIRNET + LORE images by tag. Until those images are published, the e2e uses tiny stand-ins (nginx:alpine for both) and asserts only on the deployment lifecycle, not on CAIRNET / LORE service behaviour. A post-Phase-5 PR will swap the stand-ins for the real images once those services publish to a registry.
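With stand-in images, the deployment lifecycle is the only thing left to assert. A minimal sketch of a polling helper for that follows; get_status is a hypothetical callable standing in for hearth's Status(ref) call, and the timeout and interval defaults are illustrative rather than project values.

```python
import time
from typing import Callable, List


def wait_for_status(
    get_status: Callable[[], str],
    want: str,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> List[str]:
    """Poll get_status() until it returns `want`.

    Returns the distinct statuses observed, in order, so the caller can also
    assert on the path taken (for example tearing_down then tier_torn_down).
    Raises TimeoutError if `want` is not reached before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    seen: List[str] = []
    while True:
        status = get_status()
        # Record only changes, not every identical poll result.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == want:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status stuck at {status!r}; wanted {want!r}; saw {seen}"
            )
        time.sleep(interval_s)
```

Returning the observed path rather than a bare boolean keeps a stuck teardown distinguishable from a deployment that never left ready.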
What can fail and where you find out
flowchart TB
  A[hearth not on path] -->|spawn fails| Sa[pytest fixture error]
  B[Docker daemon not running] -->|Provision errors| Sb[hearth.failed event]
  C[VAULT bearer drift] -->|console→worker 401| Sc[ralph.run.started never fires]
  D[KAHN schema drift] -->|SSE frame rejected| Sd[ralph.node.* count under 4]
  E[teardown timeout] -->|Status stuck tearing_down| Se[hearth.decommissioned never fires]
Each failure is a precise gap in the asserted chain; there is no log-noise diagnosis step.
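The mapping from gap to cause can be sketched as a small lookup over the recorded event types. The function name diagnose and the message wording are hypothetical; the causes are the ones named in the flowchart, and the hearth.provisioned check is an extra guard for a gap the flowchart does not cover. The "hearth not on path" case never reaches this function, because it surfaces as a pytest fixture error before any event is recorded.

```python
from typing import List, Optional


def diagnose(event_types: List[str], prompts: int = 4) -> Optional[str]:
    """Return the first gap in the recorded chain, or None if nothing is missing.

    Checks run in chain order so the earliest failure wins.
    """
    if "hearth.failed" in event_types:
        return "hearth.failed recorded: Provision errored (Docker daemon not running?)"
    if "hearth.provisioned" not in event_types:
        return "hearth.provisioned missing: provisioning never completed"
    if "ralph.run.started" not in event_types:
        return "ralph.run.started never fired: console->worker 401 (VAULT bearer drift?)"
    node_count = sum(1 for t in event_types if t.startswith("ralph.node."))
    if node_count < prompts:
        return (
            f"{node_count} ralph.node.* events for {prompts} prompts: "
            "SSE frame rejected (KAHN schema drift?)"
        )
    if "hearth.decommissioned" not in event_types:
        return "hearth.decommissioned never fired: Status stuck at tearing_down (teardown timeout?)"
    return None
```

This only reports what is missing; it does not check ordering, which belongs to the chain assertion itself.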
Key Terms
- Fake HATCH sink → in-memory or JSONL recorder injected by the test fixture; lets the assertion compare the literal event sequence
- Image stand-in → tiny placeholder image (nginx:alpine) used until real CAIRNET / LORE images publish (Phase 5 D6)
- OSS parity invariant → solo + LocalDocker + LocalAuth must pass the full e2e with no Polar / cloud creds
- Required-for-merge → the e2e job is on branch protection; a red e2e blocks Phase 5 merges to main
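To make the fake HATCH sink concrete, here is a minimal sketch of an in-memory recorder with an optional JSONL mirror. The class name, the emit method, and the record fields (seq, type, payload) are assumptions for illustration; the source does not specify the sink's interface or how the fixture injects it into the console.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


class FakeHatchSink:
    """Records events in arrival order; optionally appends each one to a JSONL file."""

    def __init__(self, jsonl_path: Optional[Path] = None) -> None:
        self.events: List[Dict[str, Any]] = []
        self._jsonl_path = jsonl_path

    def emit(self, event_type: str, **payload: Any) -> None:
        """Store one event and mirror it to disk when a path was given."""
        record = {"seq": len(self.events), "type": event_type, "payload": payload}
        self.events.append(record)
        if self._jsonl_path is not None:
            with self._jsonl_path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(record, sort_keys=True) + "\n")

    def types(self) -> List[str]:
        """Return the literal event-type sequence the assertion compares against."""
        return [event["type"] for event in self.events]
```

Because types() returns a plain list, the test can compare it to the expected chain with ordinary equality and get a readable diff on failure.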
Q&A
Q: Why is the e2e written in Python rather than TypeScript?
A: RALPH is Python, and Python is the producer language for KAHN events. Driving the producer side from the same language keeps the test harness small; the console is exercised through HTTP, which any language can drive.
Q: What does the test assert about KAHN frames specifically?
A: That ralph.node.* fires once per prompt (four times for the mock run) and is bracketed by exactly one ralph.run.started and one ralph.run.ended. KILN-optional fields are tolerated when present.
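The assertion described in this answer, combined with the ordering of the full chain, might look like the sketch below. The function name and the prefix/suffix constants are hypothetical; the event names come from the asserted chain. It assumes exactly one ralph.node.* event per prompt, as stated above, so a sink that records several node events per prompt would need a different count.

```python
from typing import List

EXPECTED_PREFIX = [
    "hearth.provisioning_started",
    "hearth.provisioned",
    "ralph.run.started",
]
EXPECTED_SUFFIX = [
    "ralph.run.ended",
    "hearth.decommission_started",
    "hearth.decommissioned",
]


def assert_event_chain(event_types: List[str], prompts: int = 4) -> None:
    """Fail unless the recorded types match the asserted chain exactly."""
    assert event_types.count("ralph.run.started") == 1, "need exactly one run.started"
    assert event_types.count("ralph.run.ended") == 1, "need exactly one run.ended"
    assert event_types[:3] == EXPECTED_PREFIX, f"bad chain start: {event_types[:3]}"
    assert event_types[-3:] == EXPECTED_SUFFIX, f"bad chain end: {event_types[-3:]}"
    # Everything between run.started and run.ended must be a node event.
    middle = event_types[3:-3]
    assert all(t.startswith("ralph.node.") for t in middle), f"stray event in {middle}"
    assert len(middle) == prompts, f"expected {prompts} node events, got {len(middle)}"
```

For example, a recorded sequence of the three prefix events, four node events (say a hypothetical ralph.node.transition), and the three suffix events passes, while the same sequence with only three node events fails on the final count.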
Q: Why must the e2e use stand-in images instead of real CAIRNET / LORE?
A: Phase 5 explicitly does not move CAIRNET / LORE source. They already exist in devarno-cloud/cairnet and …/lore and ship via published images; until those are published, asserting on their service behaviour would couple the test to internals it must not know about.
Examples
A fire drill that runs every Friday: the alarm sounds, the building empties through specific stairwells in a specific order, the warden notes each floor passing the muster point, the building re-opens. If anyone is missing or the muster sheet has gaps, the drill failed — and the drill must work without the on-call inspector being present (OSS parity).
Neighbors on the map
- Site Provisioning Saga State Machine → debugging a site stuck mid-provision
- End-to-End Chain Execution Request Flow → tracing a chain execution through the entire system
- Ralph Convergence Loop → tuning max_ralph_iters for a flaky node