HEARTH Driver Protocol
rocky intermediate 6 min read
ELI5
Every way Rocky knows how to deploy a workspace (Docker, kustomize, devarno-cloud) speaks the same four-verb language: provision, status, upgrade, teardown. Calling provision twice with the same inputs gives the same answer without doing anything new; teardown is one-way and afterwards status reports a permanent terminal state.
Technical Deep Dive
The interface (locked surface — Phase 5 D3)
    package driver

    import "context"

    type Driver interface {
        Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
        Status(ctx context.Context, ref DeploymentRef) (Status, error)
        Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
        Teardown(ctx context.Context, ref DeploymentRef) error
    }

Cross-language types come from github.com/rocky-hq/contracts/go/hearth, generated from zod via quicktype (Phase 5 D5).
Contract guarantees
Locked by 5a’s protocol tests against FakeDriver:
| Guarantee | Method |
|---|---|
| Idempotent on the (slug, profile.tier, profile.driver) triple: same inputs return the same DeploymentRef without side effects | Provision |
| Read-only; never mutates state; safe to call any number of times | Status |
| May mutate the live deployment but MUST preserve DeploymentRef.workspace_slug; endpoint / secrets_vault_path / last_status may change | Upgrade |
| Irreversible; afterwards Status returns terminal tier_torn_down (a state, not an error) | Teardown |
| All four MUST honour ctx.Done() and return promptly with ctx.Err() when cancelled | all |
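To make the first and last rows concrete, here is a minimal sketch of a Provision that is idempotent on the triple and returns ctx.Err() when the context is already cancelled. It is not the real FakeDriver: memDriver, the trimmed struct fields, the `mem://` endpoint, and the demo helper are all assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ProvisioningProfile and DeploymentRef are trimmed stand-ins for the
// generated contract types (assumption: the real ones carry more fields).
type ProvisioningProfile struct {
	Tier   string
	Driver string
}

type DeploymentRef struct {
	WorkspaceSlug string
	Tier          string
	Driver        string
	Endpoint      string
}

// triple is the idempotence key from the guarantees table.
type triple struct{ slug, tier, driver string }

// memDriver is a hypothetical in-memory driver, for illustration only.
type memDriver struct {
	mu      sync.Mutex
	refs    map[triple]DeploymentRef
	created int // how many times Provision did real work
}

func newMemDriver() *memDriver {
	return &memDriver{refs: make(map[triple]DeploymentRef)}
}

func (d *memDriver) Provision(ctx context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	// Honour cancellation before touching any state.
	if err := ctx.Err(); err != nil {
		return DeploymentRef{}, err
	}
	d.mu.Lock()
	defer d.mu.Unlock()

	k := triple{slug, p.Tier, p.Driver}
	if ref, ok := d.refs[k]; ok {
		return ref, nil // same triple: same ref, no new side effects
	}
	d.created++
	ref := DeploymentRef{
		WorkspaceSlug: slug,
		Tier:          p.Tier,
		Driver:        p.Driver,
		Endpoint:      "mem://" + slug, // made-up endpoint scheme
	}
	d.refs[k] = ref
	return ref, nil
}

// demo provisions the same triple twice, then once with a cancelled context.
func demo() (sameRef bool, created int, cancelled bool) {
	d := newMemDriver()
	p := ProvisioningProfile{Tier: "starter", Driver: "mem"}

	first, _ := d.Provision(context.Background(), "acme", p)
	second, _ := d.Provision(context.Background(), "acme", p)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel up front so the third call must bail out
	_, err := d.Provision(ctx, "other", p)

	return first == second, d.created, errors.Is(err, context.Canceled)
}

func main() {
	sameRef, created, cancelled := demo()
	fmt.Println(sameRef, created, cancelled)
}
```

A real driver would also need to watch ctx.Done() during long-running work, not only at entry; the entry check is the smallest form of the guarantee.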
Status state machine
    stateDiagram-v2
        [*] --> provisioning: Provision called
        provisioning --> ready: success
        provisioning --> failed: error (D8: no auto-retry)
        ready --> upgrading: Upgrade called
        upgrading --> ready: success
        upgrading --> failed
        ready --> tearing_down: Teardown called
        failed --> tearing_down: admin re-run / cleanup
        tearing_down --> tier_torn_down
        tier_torn_down --> [*]

failed is not auto-retried. Decision D8 (Phase 5): “Roll back partial state, mark DeploymentRef failed, retain logs, no auto-retry. Provisioning failures usually mean credential or capacity drift; silent retry hides the real problem.”
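The diagram can be read as a transition table. The sketch below encodes exactly the edges drawn above and nothing more; it assumes the diagram's state names are also the Status string values (only tier_torn_down is confirmed as such elsewhere on this page), and the Go constant names and canTransition helper are hypothetical.

```go
package main

import "fmt"

// Status values, assuming they match the state names in the diagram.
type Status string

const (
	Provisioning Status = "provisioning"
	Ready        Status = "ready"
	Upgrading    Status = "upgrading"
	Failed       Status = "failed"
	TearingDown  Status = "tearing_down"
	TornDown     Status = "tier_torn_down"
)

// transitions lists the edges of the diagram, keyed by source state.
var transitions = map[Status][]Status{
	Provisioning: {Ready, Failed},
	Ready:        {Upgrading, TearingDown},
	Upgrading:    {Ready, Failed},
	Failed:       {TearingDown}, // no edge back to provisioning: no auto-retry
	TearingDown:  {TornDown},
	TornDown:     {}, // terminal
}

// canTransition reports whether the diagram has an edge from -> to.
func canTransition(from, to Status) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Ready, Upgrading))
	fmt.Println(canTransition(Failed, Provisioning))
	fmt.Println(canTransition(TornDown, Ready))
}
```

Keeping the edges in one table makes the "no auto-retry" rule checkable: any code path that tries to move failed straight back to provisioning fails the lookup.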
Class structure
    classDiagram
        class Driver {
            <<interface>>
            +Provision(ctx, slug, profile) DeploymentRef
            +Status(ctx, ref) Status
            +Upgrade(ctx, ref, profile) DeploymentRef
            +Teardown(ctx, ref) error
        }
        class FakeDriver {
            records all calls
            deterministic outputs
        }
        class LocalDocker {
            docker SDK client
            labels rocky.workspace_slug
            per-slug bridge network
            named volumes
        }
        class Kustomize {
            Phase 6
        }
        class DevarnoCloud {
            Phase 6
        }
        Driver <|.. FakeDriver
        Driver <|.. LocalDocker
        Driver <|.. Kustomize
        Driver <|.. DevarnoCloud

Why FakeDriver
internal/driver/fake/ records every call deterministically. The protocol contract tests run against it (5a) so the interface is locked before any real driver is implemented. Real drivers (LocalDocker in 5c, Kustomize and DevarnoCloud in Phase 6) are added by writing a new file under internal/driver/<name>/ that satisfies the same interface — no schema rewrite, no console-side changes.
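As a rough illustration of the idea (not the code in internal/driver/fake/), a recording fake plus a contract check written against the interface might look like the sketch below. recordingFake, checkTeardownTerminal, runContract, the trimmed types, and the "ready" status value are assumptions; only the four method signatures and tier_torn_down come from this page.

```go
package main

import (
	"context"
	"fmt"
)

// Trimmed stand-ins for the generated contract types (assumption).
type ProvisioningProfile struct{ Tier, Driver string }
type DeploymentRef struct{ WorkspaceSlug, Tier, Driver string }
type Status string

const (
	StatusReady    Status = "ready" // assumed value
	StatusTornDown Status = "tier_torn_down"
)

// Driver mirrors the four-method interface.
type Driver interface {
	Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
	Status(ctx context.Context, ref DeploymentRef) (Status, error)
	Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
	Teardown(ctx context.Context, ref DeploymentRef) error
}

// recordingFake is a hypothetical fake: it logs calls and owns no real resources.
type recordingFake struct {
	Calls    []string
	tornDown map[string]bool
}

// Compile-time check that the fake satisfies the interface.
var _ Driver = (*recordingFake)(nil)

// record honours cancellation, then appends the call to the log.
func (f *recordingFake) record(ctx context.Context, call string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	f.Calls = append(f.Calls, call)
	return nil
}

func (f *recordingFake) Provision(ctx context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	if err := f.record(ctx, "Provision:"+slug); err != nil {
		return DeploymentRef{}, err
	}
	return DeploymentRef{WorkspaceSlug: slug, Tier: p.Tier, Driver: p.Driver}, nil
}

func (f *recordingFake) Status(ctx context.Context, ref DeploymentRef) (Status, error) {
	if err := f.record(ctx, "Status:"+ref.WorkspaceSlug); err != nil {
		return "", err
	}
	if f.tornDown[ref.WorkspaceSlug] {
		return StatusTornDown, nil // a state, not an error
	}
	return StatusReady, nil
}

func (f *recordingFake) Upgrade(ctx context.Context, ref DeploymentRef, p ProvisioningProfile) (DeploymentRef, error) {
	if err := f.record(ctx, "Upgrade:"+ref.WorkspaceSlug); err != nil {
		return DeploymentRef{}, err
	}
	ref.Tier = p.Tier // WorkspaceSlug is left untouched
	return ref, nil
}

func (f *recordingFake) Teardown(ctx context.Context, ref DeploymentRef) error {
	if err := f.record(ctx, "Teardown:"+ref.WorkspaceSlug); err != nil {
		return err
	}
	f.tornDown[ref.WorkspaceSlug] = true
	return nil
}

// checkTeardownTerminal is a contract check that accepts any Driver.
func checkTeardownTerminal(ctx context.Context, d Driver) error {
	ref, err := d.Provision(ctx, "acme", ProvisioningProfile{Tier: "starter", Driver: "fake"})
	if err != nil {
		return err
	}
	if err := d.Teardown(ctx, ref); err != nil {
		return err
	}
	st, err := d.Status(ctx, ref)
	if err != nil {
		return fmt.Errorf("status after teardown must not error: %w", err)
	}
	if st != StatusTornDown {
		return fmt.Errorf("got %q, want %q", st, StatusTornDown)
	}
	return nil
}

// runContract runs the check against the fake and returns its call log.
func runContract() ([]string, error) {
	f := &recordingFake{tornDown: map[string]bool{}}
	err := checkTeardownTerminal(context.Background(), f)
	return f.Calls, err
}

func main() {
	calls, err := runContract()
	fmt.Println(calls, err)
}
```

Because checkTeardownTerminal takes the interface rather than the fake, the same check could later be pointed at a real driver without changes.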
The boundary
The console NEVER embeds Go; it talks to HEARTH over a small JSON-over-HTTP RPC surface (single binary; Unix socket in self-host, private port in cloud). As System redesign §“Why Go” puts it, the console doesn’t actually need to embed HEARTH; it talks to it over a small RPC surface, and that’s the right place for polyglot.
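A sketch of what one endpoint on that surface could look like on the Go side follows. The `/status` route, the request and response shapes, and the lookup callback are all hypothetical, since this page does not specify the wire format; only the JSON-over-HTTP transport and the Unix-socket option are taken from the text. The handler is exercised in-process so the example runs without opening a socket.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical wire shapes; the real RPC schema is not shown on this page.
type statusRequest struct {
	WorkspaceSlug string `json:"workspace_slug"`
}

type statusResponse struct {
	Status string `json:"status"`
}

// statusHandler decodes a JSON request and answers with a JSON status.
// lookup stands in for a call into a real driver's Status method.
func statusHandler(lookup func(slug string) string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var req statusRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad JSON", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(statusResponse{Status: lookup(req.WorkspaceSlug)})
	})
}

// serveUnix serves h on a Unix socket, as in the self-host setup.
// It blocks until the server stops; it is not called in this demo.
func serveUnix(path string, h http.Handler) error {
	l, err := net.Listen("unix", path)
	if err != nil {
		return err
	}
	return http.Serve(l, h)
}

// roundTrip sends one request through the handler without a network hop.
func roundTrip(slug string) (int, string) {
	h := statusHandler(func(s string) string {
		if s == "acme" {
			return "ready"
		}
		return "tier_torn_down"
	})
	payload, err := json.Marshal(statusRequest{WorkspaceSlug: slug})
	if err != nil {
		panic(err)
	}
	req := httptest.NewRequest(http.MethodPost, "/status", bytes.NewReader(payload))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := roundTrip("acme")
	fmt.Println(code, body)
}
```

Because the handler is a plain http.Handler, the same code could be served on a Unix socket or a private TCP port; only the listener would differ.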
Key Terms
- DeploymentRef → driver-returned handle persisted in the console DB; carries workspace_slug, tier, driver, endpoint, secrets_vault_path, created, last_status (rocky-010)
- ProvisioningProfile → resource caps + driver flags resolved from a tier (rocky-009)
- tier_torn_down → terminal Status value, a state and not an error
- Idempotence triple → (slug, tier, driver): same triple twice = same DeploymentRef, no new side effects
Q&A
Q: What does Status return immediately after Teardown succeeds?
A: tier_torn_down. It is a terminal state, not an error condition. Callers can distinguish “torn down” from “never existed” purely by the status value, with no special error handling.
Q: Why is there no Retry method?
A: Retrying is deliberately left to the operator (Decision D8). Failures are loud (failed state, retained logs, no auto-retry) and admins re-run Provision explicitly. Idempotence on the (slug, tier, driver) triple makes the re-run safe.
Q: How does adding a new driver in Phase 6 not break existing deployments?
A: The interface is locked by FakeDriver contract tests in 5a. New drivers add a new file under internal/driver/<name>/ and a new value in DriverNameSchema; they don’t touch the four method signatures.
Examples
A locksmith franchise where every shop offers the same four services: cut a key (provision), tell you if your existing key still works (status), re-key your lock (upgrade), and decommission the door entirely (teardown). Asking for the same key twice gives you the original, not a duplicate. Once a door is decommissioned, asking about it just confirms it’s gone — that is not an error, just a fact.
neighbors on the map
- Site Provisioning Saga State Machine: debugging a site stuck mid-provision
- Gate Engine & Veto Mechanics: designing gate conditions
- Prompt-DAG Scheduler: designing a graph.json for a new repo