Deployment Strategies & Rollback
sparki intermediate 5 min read
ELI5
When loco ships code, it picks one of four moving-truck plans: just swap the boxes (direct), set up a second house and forward the address (blue-green), open a side door for 5% of guests first (canary), or replace the furniture room by room (rolling). If the new place catches fire, the rollback flag drives the truck back.
Technical Deep Dive
The strategy, platform, and health-check types are defined in subsystems/loco/types.go (Go, engine-side) and mirrored in services/deploy-loco/src/adapters/ (Rust, worker-side).
Strategies
| Constant | String | Description |
|---|---|---|
| StrategyDirect | direct | Stop old, start new. Cheapest, has downtime. |
| StrategyBlueGreen | blue-green | Stand up parallel stack, flip router. |
| StrategyCanary | canary | Route a percentage of traffic, ramp on success. |
| StrategyRolling | rolling | Replace instances N at a time. |
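
The table above maps naturally onto a typed string enum. The following is a minimal sketch, not the contents of subsystems/loco/types.go: the constant names and string values come from the table, while the `Strategy` type name and the `ParseStrategy` and `NeedsRouter` helpers are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Strategy is a hypothetical type name; the constants mirror the table above.
type Strategy string

const (
	StrategyDirect    Strategy = "direct"
	StrategyBlueGreen Strategy = "blue-green"
	StrategyCanary    Strategy = "canary"
	StrategyRolling   Strategy = "rolling"
)

// ParseStrategy (hypothetical helper) validates a raw string from a request.
func ParseStrategy(s string) (Strategy, error) {
	switch st := Strategy(s); st {
	case StrategyDirect, StrategyBlueGreen, StrategyCanary, StrategyRolling:
		return st, nil
	default:
		return "", fmt.Errorf("unknown deployment strategy %q", s)
	}
}

// NeedsRouter reports whether the strategy assumes a router that can shift
// traffic between instances. Only direct does not.
func (s Strategy) NeedsRouter() bool {
	switch s {
	case StrategyBlueGreen, StrategyCanary, StrategyRolling:
		return true
	default:
		return false
	}
}

func main() {
	for _, raw := range []string{"direct", "canary", "bluegreen"} {
		st, err := ParseStrategy(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s needs router: %v\n", st, st.NeedsRouter())
	}
}
```

Rejecting unknown strings at parse time keeps a typo such as `bluegreen` from silently falling through to a default strategy.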
Platforms
| Constant | String | Adapter |
|---|---|---|
| PlatformRailway | railway | services/deploy-loco/src/adapters/railway.rs |
| PlatformRender | render | (planned) |
| PlatformFlyIO | flyio | (planned) |
| PlatformVercel | vercel | (planned) |
| PlatformCustom | custom | Generic webhook + script adapter |
Health Checks
| Type | Use |
|---|---|
| HealthCheckHTTP | GET an endpoint, expect 2xx |
| HealthCheckTCP | open a TCP port |
| HealthCheckScript | run a script, exit code 0 = pass |
Statuses: passing, warning, critical, unknown.
Decision Flow
flowchart TD
    REQ[CreateDeploymentRequest] --> CFG{strategy?}
    CFG -->|direct| D[stop old, start new]
    CFG -->|blue-green| BG[provision green, run health checks, flip]
    CFG -->|canary| CN[shift X% traffic, observe, ramp]
    CFG -->|rolling| RL[replace instance batches]
    D --> HC[health check]
    BG --> HC
    CN --> HC
    RL --> HC
    HC -->|passing| OK[status=success api=healthy]
    HC -->|critical| FAIL[status=failed]
    FAIL --> AR{auto_rollback?}
    AR -->|true| RB[restore rollback_target_id]
    AR -->|false| END[stay failed]
    RB --> RBOK[status=rolled_back]

State of an Auto-Rollback
stateDiagram-v2
    deploying --> health_check
    health_check --> success: probes green
    health_check --> failed: probes red
    failed --> rolled_back: auto_rollback=true
    rolled_back --> [*]

auto_rollback
DeploymentConfig.AutoRollback bool (Go) governs whether a failed health check triggers the rollback path. The Rust worker reads this from the row, looks up the previous successful deployment, and re-issues that adapter call. The original failed row’s rollback_target_id is set to that previous deployment.
Key Terms
- canary → a controlled minority traffic shift used to detect regressions before full rollout
- adapter → per-platform module under deploy-loco/src/adapters/ translating a DeploymentConfig to the platform’s API
- health check → HTTP/TCP/script probe gating success
- auto_rollback → boolean on the deployment config; on failure, restore the previous successful deployment
Q&A
Q: Which strategies require a load balancer in front of the service?
A: blue-green, canary, and rolling all assume a router that can shift traffic between instances. direct does not — it accepts downtime in exchange.
Q: Does loco itself implement canary traffic shifting?
A: No. Loco delegates to the platform adapter (e.g., Railway’s deploy API). Loco orchestrates the phases (current_phase) and reads health-check results; the platform owns the actual traffic split.
Q: What if rollback_target_id is null when auto-rollback fires?
A: There is no previous successful deployment to restore (e.g., this is the first deploy). The worker leaves status at failed and surfaces an error rather than rolling back to nothing.
Examples
A canary to Railway: deploy-loco creates a new Railway deploy at 5% traffic, polls the platform for current_phase=canary, and runs a 60s HTTP health check on the new pods. Probes pass → ramp to 50% → re-check → ramp to 100% → mark success/healthy. A 5xx spike during the 50% phase flips status to failed; if auto_rollback=true, Railway is asked to restore the prior deployment ID and the row records rolled_back.
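
The ramp in this example can be sketched as a loop over traffic steps. This is an illustration under stated assumptions, not the Railway adapter: the `Platform` interface, its `SetCanaryPercent` method, `runCanary`, and the `fakePlatform` test double are hypothetical, and the sketch runs the health check after every step, including the final 100% step.

```go
package main

import "fmt"

// Platform is a hypothetical abstraction over a platform adapter; the
// platform, not loco, owns the actual traffic split.
type Platform interface {
	SetCanaryPercent(pct int) error
}

// fakePlatform records the requested percentages so the ramp can be inspected.
type fakePlatform struct{ seen []int }

func (f *fakePlatform) SetCanaryPercent(pct int) error {
	f.seen = append(f.seen, pct)
	return nil
}

// runCanary asks the platform to shift traffic step by step and runs check
// after each shift. It stops at the first failing check and returns the
// resulting deployment status.
func runCanary(p Platform, steps []int, check func(pct int) bool) (string, error) {
	for _, pct := range steps {
		if err := p.SetCanaryPercent(pct); err != nil {
			return "failed", fmt.Errorf("shift to %d%%: %w", pct, err)
		}
		if !check(pct) {
			return "failed", nil
		}
	}
	return "success", nil
}

func main() {
	p := &fakePlatform{}
	// Simulate a 5xx spike that only shows up once 50% of traffic is shifted.
	status, err := runCanary(p, []int{5, 50, 100}, func(pct int) bool { return pct < 50 })
	fmt.Println(status, p.seen, err)
}
```

A `failed` result from this loop is what would then feed the auto-rollback decision.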
neighbors on the map
- Site Hosting Modes & Lifecycle Stages: adding a new fork branching on platform vs user_git
- Run Outcome Classification: interpreting a History row's status pill
- Site Provisioning Saga State Machine: debugging a site stuck mid-provision