RabbitMQ Queue Taxonomy
sparki intermediate 5 min read
ELI5
The engine talks to its workers through four labelled mailboxes in RabbitMQ: two live queues (builds, deployments) and two dead-letter queues (builds.failed, deployments.failed). Every letter (message) is signed for (publisher confirms), and delivery is attempted up to three times in total, with growing pauses between attempts.
Technical Deep Dive
internal/mq/config.go declares the four queue names as constants and ships defaults via DefaultConfig(). producer.go and consumer.go implement the publish/consume protocol against github.com/rabbitmq/amqp091-go.
Queue Constants
| Constant | Value | Role |
|---|---|---|
| QueueBuilds | builds | New build jobs from API → workers |
| QueueDeployments | deployments | Deployment notifications → deploy-loco |
| QueueBuildsFailed | builds.failed | DLQ for permanently-failed builds |
| QueueDeploymentsFailed | deployments.failed | DLQ for permanently-failed deployments |
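
As a rough sketch of how the table could look in Go: the four constant names and values are taken from the table, while `deadLetterQueueFor` is a hypothetical helper added only to show the live-queue to DLQ pairing. The real internal/mq/config.go may be laid out differently.

```go
package main

import "fmt"

// Queue names as listed in the table above.
const (
	QueueBuilds            = "builds"
	QueueDeployments       = "deployments"
	QueueBuildsFailed      = "builds.failed"
	QueueDeploymentsFailed = "deployments.failed"
)

// deadLetterQueueFor is a hypothetical helper (not from the source): it maps
// a live queue to its DLQ and reports false for any other queue name.
func deadLetterQueueFor(queue string) (string, bool) {
	switch queue {
	case QueueBuilds:
		return QueueBuildsFailed, true
	case QueueDeployments:
		return QueueDeploymentsFailed, true
	default:
		return "", false
	}
}

func main() {
	for _, q := range []string{QueueBuilds, QueueDeployments} {
		dlq, _ := deadLetterQueueFor(q)
		fmt.Printf("%s -> %s\n", q, dlq)
	}
}
```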
Default Connection Profile
| Field | Default |
|---|---|
| Host | rabbitmq.rabbitmq.svc.cluster.local |
| Port | 5672 |
| VHost | / |
| PoolSize | 10 |
| HeartbeatInterval | 60s |
| ConnectionTimeout | 30s |
| PublisherConfirms | true |
| ConfirmTimeout | 5s |
| MaxRetryAttempts | 3 |
| InitialBackoff | 1s |
| MaxBackoff | 10s |
| BackoffMultiplier | 2.0 |
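
A minimal sketch of what these defaults might look like as a Go struct. Field names and values come from the table; the field types and the exact shape of `Config` are assumptions, since the source only says that DefaultConfig() ships the defaults.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the fields in the table above. The Go types are assumed.
type Config struct {
	Host              string
	Port              int
	VHost             string
	PoolSize          int
	HeartbeatInterval time.Duration
	ConnectionTimeout time.Duration
	PublisherConfirms bool
	ConfirmTimeout    time.Duration
	MaxRetryAttempts  int
	InitialBackoff    time.Duration
	MaxBackoff        time.Duration
	BackoffMultiplier float64
}

// DefaultConfig returns the documented defaults.
func DefaultConfig() Config {
	return Config{
		Host:              "rabbitmq.rabbitmq.svc.cluster.local",
		Port:              5672,
		VHost:             "/",
		PoolSize:          10,
		HeartbeatInterval: 60 * time.Second,
		ConnectionTimeout: 30 * time.Second,
		PublisherConfirms: true,
		ConfirmTimeout:    5 * time.Second,
		MaxRetryAttempts:  3,
		InitialBackoff:    1 * time.Second,
		MaxBackoff:        10 * time.Second,
		BackoffMultiplier: 2.0,
	}
}

func main() {
	cfg := DefaultConfig()
	fmt.Printf("%s:%d vhost=%q confirms=%t retries=%d\n",
		cfg.Host, cfg.Port, cfg.VHost, cfg.PublisherConfirms, cfg.MaxRetryAttempts)
}
```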
Topology
flowchart LR
  API[api-engine REST handler] --> PROD[mq.Producer]
  PROD -->|PublishBuildJob| BQ[(builds)]
  PROD -->|PublishDeploymentNotification| DQ[(deployments)]
  BQ --> WC[Worker pool consumer]
  DQ --> LC[deploy-loco consumer]
  WC -- transient err NACK requeue --> BQ
  WC -- permanent err NACK no requeue --> BFQ[(builds.failed)]
  LC -- permanent err --> DFQ[(deployments.failed)]
Publish Sequence
sequenceDiagram
  autonumber
  participant H as REST handler
  participant P as Producer
  participant C as ConnectionPool
  participant R as RabbitMQ
  H->>P: PublishBuildJob(ctx, job)
  loop attempt 1..MaxRetryAttempts
    P->>C: borrow channel
    P->>R: Publish (mandatory + persistent + confirm)
    alt confirm received within ConfirmTimeout
      R-->>P: ack
      P-->>H: nil
    else timeout / nack
      P->>P: sleep backoff (×BackoffMultiplier, cap MaxBackoff)
    end
  end
  P-->>H: error after 3 attempts
Error Classification at Consumer
MessageHandler returns (ErrorType, error):
- nil → ACK
- ErrorTypeTransient → NACK + requeue (re-delivery)
- ErrorTypePermanent → NACK without requeue → routed to the *.failed DLQ
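
The three rules above can be sketched as a small dispatch function. This is an illustration under assumptions: the numeric values of `ErrorType`, the `settle` function, and the `recorder` stand-in are hypothetical. The `acknowledger` interface copies the `Ack(multiple)` and `Nack(multiple, requeue)` method shapes of amqp091-go's `Delivery`, so the sketch runs without a broker.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorType is the consumer-side classification; concrete values are assumed.
type ErrorType int

const (
	ErrorTypeTransient ErrorType = iota
	ErrorTypePermanent
)

var errBoom = errors.New("boom") // placeholder handler error for the demo

// acknowledger is the subset of a delivery's methods this sketch needs.
type acknowledger interface {
	Ack(multiple bool) error
	Nack(multiple, requeue bool) error
}

// settle turns a handler result into an ACK or NACK.
// Assumption: anything that is not transient is treated as permanent.
func settle(d acknowledger, kind ErrorType, err error) error {
	switch {
	case err == nil:
		return d.Ack(false)
	case kind == ErrorTypeTransient:
		return d.Nack(false, true) // requeue for re-delivery
	default:
		return d.Nack(false, false) // no requeue, so the broker can dead-letter it
	}
}

// recorder is a stand-in delivery that remembers the last decision.
type recorder struct{ last string }

func (r *recorder) Ack(bool) error { r.last = "ack"; return nil }

func (r *recorder) Nack(_, requeue bool) error {
	r.last = fmt.Sprintf("nack requeue=%t", requeue)
	return nil
}

func main() {
	r := &recorder{}
	_ = settle(r, ErrorTypeTransient, nil)
	fmt.Println(r.last)
	_ = settle(r, ErrorTypeTransient, errBoom)
	fmt.Println(r.last)
	_ = settle(r, ErrorTypePermanent, errBoom)
	fmt.Println(r.last)
}
```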
Key Terms
- publisher confirms → RabbitMQ extension to AMQP 0-9-1 where the broker acks each successfully persisted publish; on by default (PublisherConfirms: true)
- prefetch → consumer-side cap on unacked deliveries buffered locally
- DLQ → dead-letter queue, the .failed suffix in this taxonomy
- transient vs permanent → consumer’s classification of an error; transient requeues, permanent dead-letters
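
For the DLQ term, it may help to see how a rejected message can reach a *.failed queue. In RabbitMQ, a NACK without requeue only dead-letters a message if the live queue was declared with dead-letter arguments. The sketch below builds such arguments; `x-dead-letter-exchange` and `x-dead-letter-routing-key` are standard RabbitMQ queue arguments, but whether this project wires its DLQs this way (here through the default exchange) is an assumption, and `deadLetterArgs` is a hypothetical helper.

```go
package main

import "fmt"

// deadLetterArgs returns queue-declaration arguments that ask the broker to
// re-publish rejected messages via the default exchange ("") using the DLQ
// name as routing key, which delivers them to the queue with that name.
func deadLetterArgs(dlq string) map[string]any {
	return map[string]any{
		"x-dead-letter-exchange":    "",
		"x-dead-letter-routing-key": dlq,
	}
}

func main() {
	// Arguments that would be passed when declaring the live "builds" queue.
	args := deadLetterArgs("builds.failed")
	fmt.Println(args["x-dead-letter-routing-key"])
}
```

A map like this can be converted to the client library's table type when declaring the live queue; the DLQ itself is declared as an ordinary durable queue.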
Q&A
Q: How many publish attempts before a message is dropped?
A: 3 (MaxRetryAttempts: 3). Backoff starts at 1s, doubles, capped at 10s. After the third failure the producer returns an error and the caller decides what to do (typically respond 5xx).
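
The retry behaviour described in this answer can be sketched roughly as follows. Everything here is illustrative: `retryPolicy`, `publishWithRetry`, and `fakeClock` are hypothetical names, and `publishOnce` stands in for "publish and wait for the broker confirm". The sleep function is injected so the loop can be exercised without real waiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNoConfirm = errors.New("no broker confirm") // placeholder failure

// retryPolicy groups the retry knobs from the defaults table.
type retryPolicy struct {
	MaxRetryAttempts  int
	InitialBackoff    time.Duration
	MaxBackoff        time.Duration
	BackoffMultiplier float64
}

// publishWithRetry tries publishOnce up to MaxRetryAttempts times. After each
// failure it sleeps, then grows the backoff by the multiplier up to the cap.
// Like the document's example, it also sleeps after the final failed attempt.
func publishWithRetry(p retryPolicy, publishOnce func(attempt int) error, sleep func(time.Duration)) error {
	backoff := p.InitialBackoff
	var lastErr error
	for attempt := 1; attempt <= p.MaxRetryAttempts; attempt++ {
		if lastErr = publishOnce(attempt); lastErr == nil {
			return nil
		}
		sleep(backoff)
		backoff = time.Duration(float64(backoff) * p.BackoffMultiplier)
		if backoff > p.MaxBackoff {
			backoff = p.MaxBackoff
		}
	}
	return fmt.Errorf("publish failed after %d attempts: %w", p.MaxRetryAttempts, lastErr)
}

// fakeClock records requested sleeps instead of blocking.
type fakeClock struct{ slept []time.Duration }

func (c *fakeClock) Sleep(d time.Duration) { c.slept = append(c.slept, d) }

func main() {
	policy := retryPolicy{
		MaxRetryAttempts:  3,
		InitialBackoff:    time.Second,
		MaxBackoff:        10 * time.Second,
		BackoffMultiplier: 2.0,
	}
	clock := &fakeClock{}
	// Simulated broker that never confirms.
	err := publishWithRetry(policy, func(int) error { return errNoConfirm }, clock.Sleep)
	fmt.Println(clock.slept, err)
}
```

In production code the sleep argument would simply be `time.Sleep`, and the caller would map the returned error to a 5xx response.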
Q: Are publisher confirms required?
A: They are on by default (PublisherConfirms: true). Disabling them is allowed via env but not recommended — without confirms, a publish that the broker silently drops looks identical to a successful one.
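
To illustrate why confirms matter, here is a minimal sketch of waiting for the broker's verdict with a timeout. It deliberately avoids any client library: the `confirms` channel is a stand-in for whatever mechanism the client uses to deliver confirmations (true for ack, false for nack), and `awaitConfirm` and both error values are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errNacked         = errors.New("broker nacked the publish")
	errConfirmTimeout = errors.New("no confirm within ConfirmTimeout")
)

// awaitConfirm blocks until a verdict arrives on confirms or the timeout
// elapses. Without this step, a dropped publish is indistinguishable from a
// successful one.
func awaitConfirm(confirms <-chan bool, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case acked := <-confirms:
		if !acked {
			return errNacked
		}
		return nil
	case <-timer.C:
		return errConfirmTimeout
	}
}

func main() {
	confirms := make(chan bool, 1)

	confirms <- true // simulated broker ack
	fmt.Println(awaitConfirm(confirms, 5*time.Second))

	// Nothing arrives this time, so the wait ends with a timeout error.
	fmt.Println(awaitConfirm(confirms, 10*time.Millisecond))
}
```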
Q: What priority do build jobs use?
A: The producer uses job.Priority from the BuildJobMessage; deployment notifications use a fixed default of 5.
Examples
A 5xx from the cloud RabbitMQ during a PublishBuildJob triggers the full retry path: attempt 1 fails (1s sleep) → attempt 2 fails (2s sleep) → attempt 3 fails (4s sleep). The 10s cap is never reached with three attempts; it would only apply from the fifth backoff onward (1s, 2s, 4s, 8s, then 16s capped to 10s). The producer then returns a wrapped error that includes the message ID, the REST handler returns 503 to the client, and no builds row exists.
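
The backoff arithmetic in this example can be checked with a short sketch. `backoffSchedule` is a hypothetical helper, not part of the producer; it lists the sleep that follows each failed attempt for the documented defaults.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the sleep after each of n failed attempts: start at
// initial, multiply after every failure, and never exceed ceiling.
func backoffSchedule(initial, ceiling time.Duration, multiplier float64, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		out = append(out, d)
		d = time.Duration(float64(d) * multiplier)
		if d > ceiling {
			d = ceiling
		}
	}
	return out
}

func main() {
	// Five entries are shown to make the cap visible; the default policy
	// stops after three attempts.
	fmt.Println(backoffSchedule(time.Second, 10*time.Second, 2.0, 5))
	// Prints: [1s 2s 4s 8s 10s]
}
```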
neighbors on the map
- NATS Subject Taxonomy wiring a new consumer to the right stream
- NATS Event Bridge subscribing a Choco service to STRATT lifecycle events
- CI Transition Event Schema vendoring kahn_emit.py into a CI producer