CRUMB a card from devarno-cloud

HEARTH Driver Protocol

rocky intermediate 6 min read

ELI5

Every way Rocky knows how to deploy a workspace (Docker, kustomize, devarno-cloud) speaks the same four-verb language: provision, status, upgrade, teardown. Calling provision twice with the same inputs gives the same answer without doing anything new; teardown is one-way and afterwards status reports a permanent terminal state.

Technical Deep Dive

The interface (locked surface — Phase 5 D3)

hearth/internal/driver/driver.go
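package driver

import "context"

// Driver is the four-verb contract; every method must honour ctx cancellation.
type Driver interface {
	// Provision is idempotent on (slug, profile.tier, profile.driver).
	Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
	// Status is read-only and safe to call any number of times.
	Status(ctx context.Context, ref DeploymentRef) (Status, error)
	// Upgrade may mutate the deployment but must preserve workspace_slug.
	Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
	// Teardown is irreversible; afterwards Status reports tier_torn_down.
	Teardown(ctx context.Context, ref DeploymentRef) error
}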

Cross-language types come from github.com/rocky-hq/contracts/go/hearth — generated from zod via quicktype (Phase 5 D5).

Contract guarantees

Locked by 5a’s protocol tests against FakeDriver:

| Guarantee | Method |
| --- | --- |
| Idempotent on the (slug, profile.tier, profile.driver) triple: same inputs return the same DeploymentRef without side effects | Provision |
| Read-only; never mutates state; safe to call any number of times | Status |
| May mutate the live deployment but MUST preserve DeploymentRef.workspace_slug; endpoint / secrets_vault_path / last_status may change | Upgrade |
| Irreversible; afterwards Status returns terminal tier_torn_down (a state, not an error) | Teardown |
| MUST honour ctx.Done() and return promptly with ctx.Err() when cancelled | all four |
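The guarantees above can be made concrete with a toy in-memory driver. This is a minimal sketch, not the real FakeDriver: memDriver, checkGuarantees, the simplified types, the "starter" tier and the "mem" driver name are all assumptions made for illustration, and the Status logic is reduced to "ready" or tier_torn_down.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Simplified stand-ins for the generated contract types (assumptions).
type ProvisioningProfile struct{ Tier, Driver string }

type DeploymentRef struct{ WorkspaceSlug, Tier, Driver string }

type Status string

// memDriver is a hypothetical in-memory driver used only for illustration.
type memDriver struct {
	refs     map[string]DeploymentRef // keyed by the idempotence triple
	tornDown map[string]bool          // workspace slugs that were torn down
	work     int                      // how many times Provision did real work
}

func newMemDriver() *memDriver {
	return &memDriver{refs: map[string]DeploymentRef{}, tornDown: map[string]bool{}}
}

func (d *memDriver) Provision(ctx context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	if err := ctx.Err(); err != nil {
		return DeploymentRef{}, err // honour cancellation before doing anything
	}
	key := slug + "|" + p.Tier + "|" + p.Driver
	if ref, ok := d.refs[key]; ok {
		return ref, nil // same triple: same ref, no new side effects
	}
	d.work++
	ref := DeploymentRef{WorkspaceSlug: slug, Tier: p.Tier, Driver: p.Driver}
	d.refs[key] = ref
	return ref, nil
}

func (d *memDriver) Status(ctx context.Context, ref DeploymentRef) (Status, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	if d.tornDown[ref.WorkspaceSlug] {
		return "tier_torn_down", nil // a terminal state, not an error
	}
	return "ready", nil
}

func (d *memDriver) Teardown(ctx context.Context, ref DeploymentRef) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	d.tornDown[ref.WorkspaceSlug] = true
	return nil
}

// checkGuarantees exercises the sketch and returns the first violated guarantee.
func checkGuarantees() error {
	d, ctx := newMemDriver(), context.Background()
	p := ProvisioningProfile{Tier: "starter", Driver: "mem"}

	first, _ := d.Provision(ctx, "acme", p)
	second, _ := d.Provision(ctx, "acme", p)
	if first != second || d.work != 1 {
		return errors.New("Provision is not idempotent")
	}
	if err := d.Teardown(ctx, first); err != nil {
		return err
	}
	if st, err := d.Status(ctx, first); err != nil || st != "tier_torn_down" {
		return errors.New("Status after Teardown is not tier_torn_down")
	}
	cancelled, cancel := context.WithCancel(ctx)
	cancel()
	if _, err := d.Provision(cancelled, "other", p); !errors.Is(err, context.Canceled) {
		return errors.New("cancelled context was not honoured")
	}
	return nil
}

func main() {
	fmt.Println("violations:", checkGuarantees())
}
```

Keying the stored refs by the triple is one simple way to get idempotence; a real driver would more likely look up existing resources (for LocalDocker, perhaps by the rocky.workspace_slug label) before creating anything.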

Status state machine

stateDiagram-v2
    [*] --> provisioning: Provision called
    provisioning --> ready: success
    provisioning --> failed: error (D8: no auto-retry)
    ready --> upgrading: Upgrade called
    upgrading --> ready: success
    upgrading --> failed
    ready --> tearing_down: Teardown called
    failed --> tearing_down: admin re-run / cleanup
    tearing_down --> tier_torn_down
    tier_torn_down --> [*]

failed is not auto-retried. Decision D8 (Phase 5): “Roll back partial state, mark DeploymentRef failed, retain logs, no auto-retry. Provisioning failures usually mean credential or capacity drift; silent retry hides the real problem.”
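The diagram's edges can be written down as a transition table, which makes the absence of a retry edge easy to check mechanically. This is a sketch only: the Go constant names and the helpers canTransition and isTerminal are hypothetical, and the string values are taken from the state names in the diagram (only tier_torn_down is confirmed above as a Status value).

```go
package main

import "fmt"

type Status string

// State names as they appear in the diagram.
const (
	Provisioning Status = "provisioning"
	Ready        Status = "ready"
	Failed       Status = "failed"
	Upgrading    Status = "upgrading"
	TearingDown  Status = "tearing_down"
	TierTornDown Status = "tier_torn_down"
)

// transitions mirrors the edges of the state diagram, one entry per state.
var transitions = map[Status][]Status{
	Provisioning: {Ready, Failed},
	Ready:        {Upgrading, TearingDown},
	Upgrading:    {Ready, Failed},
	Failed:       {TearingDown}, // no edge back to provisioning: no auto-retry
	TearingDown:  {TierTornDown},
	TierTornDown: {}, // terminal
}

// canTransition reports whether the diagram has an edge from -> to.
func canTransition(from, to Status) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// isTerminal reports whether s is a known state with no outgoing edges.
func isTerminal(s Status) bool {
	next, known := transitions[s]
	return known && len(next) == 0
}

func main() {
	fmt.Println(canTransition(Failed, Provisioning)) // false
	fmt.Println(canTransition(Failed, TearingDown))  // true
	fmt.Println(isTerminal(TierTornDown))            // true
}
```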

Class structure

classDiagram
    class Driver {
        <<interface>>
        +Provision(ctx, slug, profile) DeploymentRef
        +Status(ctx, ref) Status
        +Upgrade(ctx, ref, profile) DeploymentRef
        +Teardown(ctx, ref) error
    }
    class FakeDriver {
        records all calls
        deterministic outputs
    }
    class LocalDocker {
        docker SDK client
        labels rocky.workspace_slug
        per-slug bridge network
        named volumes
    }
    class Kustomize {
        Phase 6
    }
    class DevarnoCloud {
        Phase 6
    }
    Driver <|.. FakeDriver
    Driver <|.. LocalDocker
    Driver <|.. Kustomize
    Driver <|.. DevarnoCloud

Why FakeDriver

internal/driver/fake/ records every call deterministically. The protocol contract tests run against it (5a) so the interface is locked before any real driver is implemented. Real drivers (LocalDocker in 5c, Kustomize and DevarnoCloud in Phase 6) are added by writing a new file under internal/driver/<name>/ that satisfies the same interface — no schema rewrite, no console-side changes.
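A call-recording fake of the kind described above might look like the following. This is a hedged sketch rather than the contents of internal/driver/fake/: recordingDriver, runLifecycle, recordedCalls, the simplified types, and the "starter", "pro" and "fake" values are all hypothetical, and context handling is omitted for brevity even though the contract requires it. The compile-time assertion shows how a new driver package can prove it satisfies the locked interface.

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the generated contract types (assumptions).
type ProvisioningProfile struct{ Tier, Driver string }
type DeploymentRef struct{ WorkspaceSlug, Tier, Driver string }
type Status string

// Driver repeats the four method signatures from driver.go.
type Driver interface {
	Provision(ctx context.Context, slug string, profile ProvisioningProfile) (DeploymentRef, error)
	Status(ctx context.Context, ref DeploymentRef) (Status, error)
	Upgrade(ctx context.Context, ref DeploymentRef, profile ProvisioningProfile) (DeploymentRef, error)
	Teardown(ctx context.Context, ref DeploymentRef) error
}

// recordingDriver is a hypothetical stand-in for FakeDriver.
type recordingDriver struct {
	calls []string // "Method:slug" entries, in call order
}

// Compile-time check: the build fails if a method signature drifts.
var _ Driver = (*recordingDriver)(nil)

func (d *recordingDriver) record(method, slug string) {
	d.calls = append(d.calls, method+":"+slug)
}

func (d *recordingDriver) Provision(_ context.Context, slug string, p ProvisioningProfile) (DeploymentRef, error) {
	d.record("Provision", slug)
	return DeploymentRef{WorkspaceSlug: slug, Tier: p.Tier, Driver: p.Driver}, nil
}

func (d *recordingDriver) Status(_ context.Context, ref DeploymentRef) (Status, error) {
	d.record("Status", ref.WorkspaceSlug)
	return "ready", nil // deterministic output
}

func (d *recordingDriver) Upgrade(_ context.Context, ref DeploymentRef, p ProvisioningProfile) (DeploymentRef, error) {
	d.record("Upgrade", ref.WorkspaceSlug)
	ref.Tier = p.Tier // workspace_slug is left untouched
	return ref, nil
}

func (d *recordingDriver) Teardown(_ context.Context, ref DeploymentRef) error {
	d.record("Teardown", ref.WorkspaceSlug)
	return nil
}

// runLifecycle drives any Driver through the four verbs in order.
func runLifecycle(d Driver, slug string) error {
	ctx := context.Background()
	ref, err := d.Provision(ctx, slug, ProvisioningProfile{Tier: "starter", Driver: "fake"})
	if err != nil {
		return err
	}
	if _, err = d.Status(ctx, ref); err != nil {
		return err
	}
	if ref, err = d.Upgrade(ctx, ref, ProvisioningProfile{Tier: "pro", Driver: "fake"}); err != nil {
		return err
	}
	return d.Teardown(ctx, ref)
}

// recordedCalls runs the lifecycle against a fresh fake and returns its log.
func recordedCalls(slug string) []string {
	d := &recordingDriver{}
	if err := runLifecycle(d, slug); err != nil {
		return nil
	}
	return d.calls
}

func main() {
	fmt.Println(recordedCalls("acme"))
}
```

Because runLifecycle accepts the interface rather than a concrete type, the same function could be pointed at any later driver without modification.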

The boundary

The console NEVER embeds Go. It talks to hearth over a small JSON-over-HTTP RPC surface (single binary; Unix socket in self-host, private port in cloud). The system redesign, §“Why Go”, gives the reasoning: the console doesn’t actually need to embed HEARTH, and a small RPC surface is the right place for polyglot.
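To illustrate what one endpoint on that RPC surface could look like from hearth's side, here is a minimal sketch using only the Go standard library. The /status path, the request and response shapes, and the names statusHandler and callStatus are assumptions, not the actual hearth API; the placeholder "ready" stands in for a real Driver.Status call.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Hypothetical wire shapes; the real ones come from the shared contracts.
type statusRequest struct {
	WorkspaceSlug string `json:"workspace_slug"`
}

type statusResponse struct {
	Status string `json:"status"`
}

// statusHandler sketches hearth's side of the boundary: JSON in, JSON out.
func statusHandler(w http.ResponseWriter, r *http.Request) {
	var req statusRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.WorkspaceSlug == "" {
		http.Error(w, "workspace_slug is required", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	// A real handler would call Driver.Status here; "ready" is a placeholder.
	_ = json.NewEncoder(w).Encode(statusResponse{Status: "ready"})
}

// callStatus plays the console's role using an in-memory request, so the
// example runs without opening a socket.
func callStatus(slug string) (string, int) {
	payload, _ := json.Marshal(statusRequest{WorkspaceSlug: slug})
	req := httptest.NewRequest(http.MethodPost, "/status", bytes.NewReader(payload))
	rec := httptest.NewRecorder()
	statusHandler(rec, req)

	var out statusResponse
	_ = json.NewDecoder(rec.Body).Decode(&out) // non-JSON error bodies leave out empty
	return out.Status, rec.Code
}

func main() {
	status, code := callStatus("acme")
	fmt.Println(status, code)
}
```

In a self-hosted setup the same handler could plausibly be served with http.Serve on a listener obtained from net.Listen("unix", socketPath), where socketPath is whatever path the deployment chooses. Because the console only sees JSON, it can be written in any language.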

Key Terms

  • DeploymentRef → driver-returned handle persisted in the console DB; carries workspace_slug, tier, driver, endpoint, secrets_vault_path, created, last_status (rocky-010)
  • ProvisioningProfile → resource caps + driver flags resolved from a tier (rocky-009)
  • tier_torn_down → terminal Status value, a state and not an error
  • Idempotence triple → (slug, tier, driver); the same triple twice yields the same DeploymentRef with no new side effects
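As a rough picture of the DeploymentRef handle, the field list above maps naturally onto a Go struct with JSON tags. The real type is generated from zod via quicktype, so treat this as a sketch: the Go field types (all strings here), the helper names encodeRef, decodeRef and sameTriple, and the sample values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentRef lists the fields named in Key Terms. Field types are assumed.
type DeploymentRef struct {
	WorkspaceSlug    string `json:"workspace_slug"`
	Tier             string `json:"tier"`
	Driver           string `json:"driver"`
	Endpoint         string `json:"endpoint"`
	SecretsVaultPath string `json:"secrets_vault_path"`
	Created          string `json:"created"`
	LastStatus       string `json:"last_status"`
}

// encodeRef serialises a ref the way it might be persisted or sent over RPC.
func encodeRef(ref DeploymentRef) (string, error) {
	b, err := json.Marshal(ref)
	return string(b), err
}

// decodeRef is the inverse of encodeRef.
func decodeRef(s string) (DeploymentRef, error) {
	var ref DeploymentRef
	err := json.Unmarshal([]byte(s), &ref)
	return ref, err
}

// sameTriple compares two refs on the idempotence triple only.
func sameTriple(a, b DeploymentRef) bool {
	return a.WorkspaceSlug == b.WorkspaceSlug && a.Tier == b.Tier && a.Driver == b.Driver
}

func main() {
	ref := DeploymentRef{WorkspaceSlug: "acme", Tier: "starter", Driver: "fake", LastStatus: "ready"}
	encoded, err := encodeRef(ref)
	if err != nil {
		panic(err)
	}
	decoded, err := decodeRef(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	fmt.Println(decoded == ref, sameTriple(decoded, ref))
}
```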

Q&A

Q: What does Status return immediately after Teardown succeeds? A: tier_torn_down. It is a terminal state, not an error condition. Callers can distinguish “torn down” from “never existed” purely by status value, no exception handling needed.

Q: Why is there no Retry method? A: By design. Failures are loud (failed state, retained logs, no auto-retry), and admins re-run Provision explicitly. Idempotence on the (slug, tier, driver) triple makes the re-run safe.

Q: How does adding a new driver in Phase 6 not break existing deployments? A: The interface is locked by FakeDriver contract tests in 5a. New drivers add a new file under internal/driver/<name>/ and a new value in DriverNameSchema; they don’t touch the four method signatures.

Examples

A locksmith franchise where every shop offers the same four services: cut a key (provision), tell you if your existing key still works (status), re-key your lock (upgrade), and decommission the door entirely (teardown). Asking for the same key twice gives you the original, not a duplicate. Once a door is decommissioned, asking about it just confirms it’s gone — that is not an error, just a fact.
