CRUMB a card from devarno-cloud

In-Process Rate-Limit Bucket

kahn intermediate 4 min read

ELI5

A jar of marbles per tenant; each request takes one marble. A fresh marble drops in at a fixed pace, up to the jar’s capacity. If the jar is empty, the request is turned away at the door and the producer has to try again later. There is only one jar room: adding a second jar room without a shared counter would let everyone double-dip.

Technical Deep Dive

backend/kahn/rate_limit.py is an in-process per-tenant token bucket guarding cloud-mode ingest. It conforms to the RateLimiter Protocol from ingest.py:

class RateLimiter(Protocol):
    def allow(self, tenant_id: str) -> bool: ...
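As a rough illustration of what a class satisfying this Protocol might look like, here is a minimal token-bucket sketch. It is not the contents of rate_limit.py: the class name InProcessBucketLimiter, the default capacity, the injectable clock, and the lock are all assumptions made for the example. Only the allow() signature comes from the Protocol above.

```python
import threading
import time
from typing import Callable, Dict, Protocol, Tuple


class RateLimiter(Protocol):
    def allow(self, tenant_id: str) -> bool: ...


class InProcessBucketLimiter:
    """Hypothetical per-tenant token bucket; names and defaults are assumptions."""

    def __init__(
        self,
        capacity: float = 100.0,      # assumed burst size
        refill_per_s: float = 50.0,   # assumed steady-state rate
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._clock = clock
        self._lock = threading.Lock()
        # tenant_id -> (tokens available, clock reading at last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        with self._lock:
            now = self._clock()
            # A tenant seen for the first time starts with a full bucket.
            tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
            # Refill for the elapsed time, never exceeding the cap.
            tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
            allowed = tokens >= 1.0
            if allowed:
                tokens -= 1.0
            self._buckets[tenant_id] = (tokens, now)
            return allowed


# Structural typing: the class never inherits from RateLimiter.
limiter: RateLimiter = InProcessBucketLimiter(capacity=2, refill_per_s=1)
```

Injecting the clock keeps the sketch testable without sleeping. All state lives in a plain dict inside the process, which is exactly why the single-replica constraint matters.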

Soundness Constraint

This module is sound only under a single-replica topology. railway.toml declares no numReplicas, so the Railway host:api service runs one replica today. With two or more replicas, each process would enforce its own quota independently, so a tenant's effective aggregate limit would become REPLICAS × limit.

flowchart LR
    subgraph Now ["Single replica (sound)"]
        C1[client] --> R1[host:api]
        R1 --> B1[(in-process bucket)]
    end
    subgraph Future ["Multi-replica (UNSOUND with this module)"]
        C2[client] --> R2a[host:api #1]
        C2 --> R2b[host:api #2]
        R2a --> B2a[(bucket #1)]
        R2b --> B2b[(bucket #2)]
    end
    Future -.migration.-> PG[(rate_state in Postgres\nSELECT FOR UPDATE)]
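The REPLICAS × limit arithmetic can be checked with a toy simulation. Everything here is hypothetical: CountingBucket is a simplified bucket with no refill, and round-robin dispatch stands in for whatever load balancing would sit in front of the replicas.

```python
class CountingBucket:
    """Toy process-local bucket: admits `limit` requests, then denies (no refill)."""

    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def allow(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def admitted(replicas: int, limit: int, requests: int) -> int:
    """Count requests admitted for one tenant when each replica owns its own bucket."""
    buckets = [CountingBucket(limit) for _ in range(replicas)]
    ok = 0
    for i in range(requests):
        # Round-robin dispatch is an assumption about the load balancer.
        if buckets[i % replicas].allow():
            ok += 1
    return ok


print(admitted(replicas=1, limit=100, requests=1000))  # 100
print(admitted(replicas=2, limit=100, requests=1000))  # 200
```

With one replica the tenant is held to the configured limit. With two, the same traffic gets twice the quota, because neither bucket knows about the other.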

The migration target is shared state: either a rate_state row keyed on tenant_id in Postgres, locked with SELECT … FOR UPDATE, or Redis. A backlog entry in docs/backlog/phase-3-backlog.md tracks the trigger for that migration.

Operator Surface

def snapshot(tenant_id: str) -> RateLimitSnapshot: ...

/api/self/tenant calls this so an operator can answer “is my producer hitting the limit?” directly from the SPA, without grepping logs.

Key Terms

  • token bucket → Classical algorithm: tokens refill at a fixed rate up to a cap; each request consumes one.
  • single replica → The Railway-deployed host:api runs one process; rate state is process-local.
  • RateLimitSnapshot → Operator-facing struct (capacity, available, refill rate) returned by snapshot().

Q&A

Q: What happens when the bucket is empty? A: allow() returns False; ingest returns 429 to the caller. There’s no queueing — the producer is expected to back off.

Q: Why not put rate state in Postgres now? A: A database round trip on every ingest event would dominate the cost of handling it. The single-replica assumption is documented, and the migration is gated on that assumption changing.

Q: Does OSS mode rate-limit? A: No. OSS mode is single-operator localhost; there’s no tenant axis to limit on.

Examples

A producer issuing a 10k-event burst sees the first N events succeed, where N is roughly the bucket capacity plus whatever refills during the burst; the rest receive 429 once the bucket drains. /api/self/tenant then shows available: 0, refill_per_s: 50, so the operator immediately knows whether to slow the producer or request a quota bump.
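The burst can be replayed with a small simulation on a fake clock. The refill rate of 50 per second comes from the example above. The capacity of 100, the Bucket class, the send_burst helper, and the "accepted" label for successful requests are assumptions for illustration. The burst is modelled as instantaneous, so N equals the capacity exactly.

```python
from collections import Counter


class Bucket:
    """Toy single-tenant token bucket driven by an explicit timestamp."""

    def __init__(self, capacity: float, refill_per_s: float) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity  # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for elapsed time, capped at capacity, then try to spend one token.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def send_burst(bucket: Bucket, n: int, now: float) -> Counter:
    """Send n events at the same instant and tally the outcomes."""
    return Counter("accepted" if bucket.allow(now) else 429 for _ in range(n))


bucket = Bucket(capacity=100, refill_per_s=50)
print(send_burst(bucket, 10_000, now=0.0))  # Counter({429: 9900, 'accepted': 100})
print(send_burst(bucket, 100, now=1.0))     # one second later: 50 accepted, 50 rejected
```

A producer that backs off for one second regains refill_per_s tokens, which is the number to compare against its sustained send rate.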
