HEARTH Driver Protocol

rocky intermediate 6 min read

ELI5

Every way Rocky knows how to deploy a workspace (Docker, kustomize, devarno-cloud) speaks the same four-verb language: provision, status, upgrade, teardown. Calling provision twice with the same inputs gives the same answer without doing anything new; teardown is one-way and afterwards status reports a permanent terminal state.

Technical Deep Dive

The interface (locked surface — Phase 5 D3)

package driver

import "context"

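type Driver interface {
    // Provision is idempotent on the (slug, profile.tier, profile.driver) triple.
    Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
    // Status is read-only and safe to call any number of times.
    Status(ctx context.Context, ref DeploymentRef) (Status, error)
    // Upgrade may mutate the live deployment but must preserve workspace_slug.
    Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
    // Teardown is irreversible; afterwards Status reports tier_torn_down.
    Teardown(ctx context.Context, ref DeploymentRef) error
}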

Cross-language types come from github.com/rocky-hq/contracts/go/hearth — generated from zod via quicktype (Phase 5 D5).

Contract guarantees

Locked by the Phase 5a protocol tests, which run against FakeDriver:

Guarantee	Method
Idempotent on the `(slug, profile.tier, profile.driver)` triple: same inputs return the same `DeploymentRef` without side effects	`Provision`
Read-only; never mutates state; safe to call any number of times	`Status`
May mutate the live deployment but MUST preserve `DeploymentRef.workspace_slug`; `endpoint` / `secrets_vault_path` / `last_status` may change	`Upgrade`
Irreversible; afterwards `Status` returns terminal `tier_torn_down` (a state, not an error)	`Teardown`
All four MUST honour `ctx.Done()` and return promptly with `ctx.Err()` when cancelled	all
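
To make three of these guarantees concrete, here is a minimal sketch of how a contract check might look. It is not the real FakeDriver or the real 5a test suite: `memDriver`, `checkContract`, the simplified `DeploymentRef` and `ProvisioningProfile` structs, and the tier name `starter` are all hypothetical stand-ins, and reporting an unknown ref as an error is an assumption the source does not specify. `Upgrade` is omitted for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Simplified stand-ins for the generated contract types (assumptions).
type Status string

const (
	StatusReady    Status = "ready"
	StatusTornDown Status = "tier_torn_down"
)

type ProvisioningProfile struct{ Tier, Driver string }

type DeploymentRef struct{ WorkspaceSlug, Tier, Driver string }

// memDriver is a hypothetical in-memory driver used only for illustration.
type memDriver struct {
	mu         sync.Mutex
	state      map[DeploymentRef]Status
	provisions int // counts real side effects
}

func newMemDriver() *memDriver {
	return &memDriver{state: map[DeploymentRef]Status{}}
}

func (d *memDriver) Provision(ctx context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	if err := ctx.Err(); err != nil {
		return DeploymentRef{}, err // honour cancellation before doing any work
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	ref := DeploymentRef{WorkspaceSlug: slug, Tier: p.Tier, Driver: p.Driver}
	if _, ok := d.state[ref]; ok {
		return ref, nil // same triple: same ref, no new side effects
	}
	d.provisions++
	d.state[ref] = StatusReady
	return ref, nil
}

func (d *memDriver) Status(ctx context.Context, ref DeploymentRef) (Status, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	s, ok := d.state[ref]
	if !ok {
		return "", errors.New("unknown deployment") // assumption, not from the source
	}
	return s, nil
}

func (d *memDriver) Teardown(ctx context.Context, ref DeploymentRef) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	d.state[ref] = StatusTornDown // terminal state, reported by Status without an error
	return nil
}

// checkContract exercises idempotent Provision, terminal Teardown, and cancellation.
func checkContract() error {
	ctx := context.Background()
	d := newMemDriver()
	p := ProvisioningProfile{Tier: "starter", Driver: "fake"}
	a, err := d.Provision(ctx, "acme", p)
	if err != nil {
		return err
	}
	b, err := d.Provision(ctx, "acme", p)
	if err != nil {
		return err
	}
	if a != b || d.provisions != 1 {
		return fmt.Errorf("provision not idempotent: side effects=%d", d.provisions)
	}
	if err := d.Teardown(ctx, a); err != nil {
		return err
	}
	if s, err := d.Status(ctx, a); err != nil || s != StatusTornDown {
		return fmt.Errorf("want tier_torn_down and nil error, got %q, %v", s, err)
	}
	cancelled, cancel := context.WithCancel(ctx)
	cancel()
	if _, err := d.Status(cancelled, a); !errors.Is(err, context.Canceled) {
		return fmt.Errorf("want context.Canceled, got %v", err)
	}
	return nil
}

func main() {
	if err := checkContract(); err != nil {
		fmt.Println("contract violated:", err)
		return
	}
	fmt.Println("contract holds")
}
```

Counting side effects separately from returned values is what lets the check tell "same answer" apart from "same answer, and nothing new was done".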

Status state machine

stateDiagram-v2
    [*] --> provisioning: Provision called
    provisioning --> ready: success
    provisioning --> failed: error (D8: no auto-retry)
    ready --> upgrading: Upgrade called
    upgrading --> ready: success
    upgrading --> failed
    ready --> tearing_down: Teardown called
    failed --> tearing_down: admin re-run / cleanup
    tearing_down --> tier_torn_down
    tier_torn_down --> [*]

failed is not auto-retried. Decision D8 (Phase 5): “Roll back partial state, mark DeploymentRef failed, retain logs, no auto-retry. Provisioning failures usually mean credential or capacity drift; silent retry hides the real problem.”
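
The diagram above can also be read as a transition table. The sketch below encodes exactly the edges shown in the diagram; the `State` type, the constant names, and `CanTransition` are hypothetical and are not taken from the HEARTH codebase.

```go
package main

import "fmt"

// State mirrors the status values in the diagram (type name is an assumption).
type State string

const (
	Provisioning State = "provisioning"
	Ready        State = "ready"
	Upgrading    State = "upgrading"
	Failed       State = "failed"
	TearingDown  State = "tearing_down"
	TornDown     State = "tier_torn_down"
)

// transitions lists the allowed next states for each state, one entry per diagram edge.
var transitions = map[State][]State{
	Provisioning: {Ready, Failed},
	Ready:        {Upgrading, TearingDown},
	Upgrading:    {Ready, Failed},
	Failed:       {TearingDown},
	TearingDown:  {TornDown},
	TornDown:     {}, // terminal: no outgoing edges
}

// CanTransition reports whether the diagram contains an edge from -> to.
func CanTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Ready, Upgrading)) // true
	fmt.Println(CanTransition(Failed, Ready))    // false: no auto-retry path back to ready
	fmt.Println(CanTransition(TornDown, Ready))  // false: teardown is one-way
}
```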

Class structure

classDiagram
    class Driver {
        <<interface>>
        +Provision(ctx, slug, profile) DeploymentRef
        +Status(ctx, ref) Status
        +Upgrade(ctx, ref, profile) DeploymentRef
        +Teardown(ctx, ref) error
    }
    class FakeDriver {
        records all calls
        deterministic outputs
    }
    class LocalDocker {
        docker SDK client
        labels rocky.workspace_slug
        per-slug bridge network
        named volumes
    }
    class Kustomize {
        Phase 6
    }
    class DevarnoCloud {
        Phase 6
    }
    Driver <|.. FakeDriver
    Driver <|.. LocalDocker
    Driver <|.. Kustomize
    Driver <|.. DevarnoCloud

Why FakeDriver

FakeDriver, in internal/driver/fake/, records every call and returns deterministic outputs. The Phase 5a protocol contract tests run against it, so the interface is locked before any real driver is implemented. Real drivers (LocalDocker in 5c, Kustomize and DevarnoCloud in Phase 6) are added by writing a new file under internal/driver/<name>/ that satisfies the same interface, with no schema rewrite and no console-side changes.
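
As an illustration of "records every call" and "satisfies the same interface", the sketch below shows a call-recording fake with a compile-time interface check. It is not the actual contents of internal/driver/fake/: `recordingFake`, `recordedCalls`, the stand-in structs, and the canned return values are assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
)

// Stand-in types; the real ones are generated into the contracts package.
type (
	Status              string
	ProvisioningProfile struct{ Tier, Driver string }
	DeploymentRef       struct{ WorkspaceSlug, Tier, Driver string }
)

type Driver interface {
	Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
	Status(ctx context.Context, ref DeploymentRef) (Status, error)
	Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
	Teardown(ctx context.Context, ref DeploymentRef) error
}

// recordingFake is a hypothetical fake that logs calls in order.
type recordingFake struct{ Calls []string }

// Compile-time check: the build fails if recordingFake stops satisfying Driver.
var _ Driver = (*recordingFake)(nil)

// record appends the call and surfaces cancellation, so every method honours ctx.
func (f *recordingFake) record(ctx context.Context, method, slug string) error {
	f.Calls = append(f.Calls, method+":"+slug)
	return ctx.Err()
}

func (f *recordingFake) Provision(ctx context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	if err := f.record(ctx, "Provision", slug); err != nil {
		return DeploymentRef{}, err
	}
	return DeploymentRef{WorkspaceSlug: slug, Tier: p.Tier, Driver: p.Driver}, nil
}

func (f *recordingFake) Status(ctx context.Context, ref DeploymentRef) (Status, error) {
	if err := f.record(ctx, "Status", ref.WorkspaceSlug); err != nil {
		return "", err
	}
	return "ready", nil // canned value for the sketch
}

func (f *recordingFake) Upgrade(ctx context.Context, ref DeploymentRef, _ ProvisioningProfile) (DeploymentRef, error) {
	if err := f.record(ctx, "Upgrade", ref.WorkspaceSlug); err != nil {
		return DeploymentRef{}, err
	}
	return ref, nil // workspace_slug preserved
}

func (f *recordingFake) Teardown(ctx context.Context, ref DeploymentRef) error {
	return f.record(ctx, "Teardown", ref.WorkspaceSlug)
}

// recordedCalls drives the fake through the Driver interface and returns its log.
func recordedCalls() ([]string, error) {
	ctx := context.Background()
	f := &recordingFake{}
	var d Driver = f
	ref, err := d.Provision(ctx, "acme", ProvisioningProfile{Tier: "starter", Driver: "fake"})
	if err != nil {
		return nil, err
	}
	if _, err := d.Status(ctx, ref); err != nil {
		return nil, err
	}
	if err := d.Teardown(ctx, ref); err != nil {
		return nil, err
	}
	return f.Calls, nil
}

func main() {
	calls, err := recordedCalls()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(calls) // [Provision:acme Status:acme Teardown:acme]
}
```

The `var _ Driver = (*recordingFake)(nil)` line is a common Go idiom: a new driver package can carry the same line so that drifting from the four signatures is caught at build time rather than at runtime.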

The boundary

The console NEVER embeds Go. It talks to HEARTH over a small JSON-over-HTTP RPC surface: HEARTH runs as a single binary, reached over a Unix socket when self-hosted and over a private port in cloud. The rationale is in System redesign §“Why Go”: the console doesn’t actually need to embed HEARTH, and a small RPC surface is the right place for the polyglot boundary.
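
To show what JSON-over-HTTP on a Unix socket looks like mechanically, here is a small self-contained sketch using only the Go standard library. The route `/v1/status`, the `statusReply` shape, the socket file name, and the canned `ready` value are all hypothetical; the source does not document the actual RPC routes or payloads. The client half is written in Go only to keep the example runnable, whereas the real console is a separate, non-Go program.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"net/url"
	"os"
	"path/filepath"
)

// statusReply is a hypothetical wire shape, not the real contract schema.
type statusReply struct {
	WorkspaceSlug string `json:"workspace_slug"`
	Status        string `json:"status"`
}

// newMux wires a hypothetical /v1/status route that answers with JSON.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(statusReply{
			WorkspaceSlug: r.URL.Query().Get("slug"),
			Status:        "ready", // canned value for the sketch
		})
	})
	return mux
}

// queryOverSocket serves the mux on a temporary Unix socket and makes one request to it.
func queryOverSocket(slug string) (statusReply, error) {
	var reply statusReply
	dir, err := os.MkdirTemp("", "hearth")
	if err != nil {
		return reply, err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "hearth.sock")

	ln, err := net.Listen("unix", sock)
	if err != nil {
		return reply, err
	}
	srv := &http.Server{Handler: newMux()}
	go srv.Serve(ln)
	defer srv.Close()

	// The HTTP client ignores the URL host and always dials the socket file.
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		},
	}}
	resp, err := client.Get("http://hearth/v1/status?slug=" + url.QueryEscape(slug))
	if err != nil {
		return reply, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&reply)
	return reply, err
}

func main() {
	reply, err := queryOverSocket("acme")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s is %s\n", reply.WorkspaceSlug, reply.Status)
}
```

Because the transport is plain HTTP with JSON bodies, a caller in any language that can open a Unix socket or TCP port can use the same surface, which is the point of placing the boundary here.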

Key Terms

DeploymentRef → driver-returned handle persisted in the console DB; carries workspace_slug, tier, driver, endpoint, secrets_vault_path, created, last_status (rocky-010)
ProvisioningProfile → resource caps + driver flags resolved from a tier (rocky-009)
tier_torn_down → terminal Status value, a state and not an error
Idempotence triple → (slug, tier, driver) — same triple twice = same DeploymentRef, no new side effects
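
For orientation, the fields listed for DeploymentRef could be modelled roughly as below. This is a sketch, not the quicktype-generated type: the Go field types (plain strings, including `created`), the `Triple` helper, `encodeRef`, and the sample values `starter` and `local-docker` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentRef mirrors the field names listed in Key Terms; field types are assumed.
type DeploymentRef struct {
	WorkspaceSlug    string `json:"workspace_slug"`
	Tier             string `json:"tier"`
	Driver           string `json:"driver"`
	Endpoint         string `json:"endpoint"`
	SecretsVaultPath string `json:"secrets_vault_path"`
	Created          string `json:"created"`
	LastStatus       string `json:"last_status"`
}

// Triple returns the idempotence key (slug, tier, driver) as a comparable value.
func (r DeploymentRef) Triple() [3]string {
	return [3]string{r.WorkspaceSlug, r.Tier, r.Driver}
}

// encodeRef serialises a ref to JSON using the snake_case field names.
func encodeRef(r DeploymentRef) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	ref := DeploymentRef{
		WorkspaceSlug: "acme",
		Tier:          "starter",      // hypothetical tier name
		Driver:        "local-docker", // hypothetical driver name
		LastStatus:    "ready",
	}
	out, err := encodeRef(ref)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	fmt.Println(ref.Triple())
}
```

Two refs that differ only in `endpoint`, `secrets_vault_path`, or `last_status` share the same triple, which matches the fields that Upgrade is allowed to change.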

Q&A

Q: What does Status return immediately after Teardown succeeds? A: tier_torn_down. It is a terminal state, not an error condition. Callers can distinguish “torn down” from “never existed” by the status value alone, without inspecting an error.

Q: Why is there no Retry method? A: By design, per decision D8. Failures are loud (failed state, retained logs, no auto-retry) and admins re-run Provision explicitly. Idempotence on the (slug, tier, driver) triple makes the re-run safe.

Q: How does adding a new driver in Phase 6 not break existing deployments? A: The interface is locked by FakeDriver contract tests in 5a. New drivers add a new file under internal/driver/<name>/ and a new value in DriverNameSchema; they don’t touch the four method signatures.

Examples

A locksmith franchise where every shop offers the same four services: cut a key (provision), tell you if your existing key still works (status), re-key your lock (upgrade), and decommission the door entirely (teardown). Asking for the same key twice gives you the original, not a duplicate. Once a door is decommissioned, asking about it just confirms it’s gone — that is not an error, just a fact.

Neighbors on the map

Site Provisioning Saga State Machine → debugging a site stuck mid-provision
Gate Engine & Veto Mechanics → designing gate conditions
Prompt-DAG Scheduler → designing a graph.json for a new repo