In-Process Rate-Limit Bucket
ELI5
Each tenant has a jar of marbles, and every request takes one. Fresh marbles drop in at a fixed pace until the jar is full. If the jar is empty, the request is turned away at the door and has to try again later. There is only one jar room: adding a second jar room without a shared counter would let everyone double-dip.
Technical Deep Dive
backend/kahn/rate_limit.py is an in-process per-tenant token bucket guarding cloud-mode ingest. It conforms to the RateLimiter Protocol from ingest.py:
    class RateLimiter(Protocol):
        def allow(self, tenant_id: str) -> bool: ...
Soundness Constraint
The bucket is sound only under a single-replica topology. railway.toml declares no numReplicas, so the Railway host:api service runs one replica today. With two replicas, each would enforce its own quota, so a tenant's effective aggregate limit would become REPLICAS × limit.
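The REPLICAS × limit arithmetic can be checked with a toy model. This is a minimal sketch, not the module's code: `CountingBucket` and `admitted` are hypothetical names, refill is ignored, and requests are assumed to be spread round-robin across replicas.

```python
class CountingBucket:
    """Fixed quota with no refill; enough to show the aggregation effect."""

    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def allow(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def admitted(replicas: int, limit: int, requests: int) -> int:
    """Round-robin `requests` across `replicas`, each with its own private bucket."""
    buckets = [CountingBucket(limit) for _ in range(replicas)]
    return sum(buckets[i % replicas].allow() for i in range(requests))


single = admitted(replicas=1, limit=100, requests=1_000)  # one process, one bucket
double = admitted(replicas=2, limit=100, requests=1_000)  # two private buckets
print(single, double)
```

With a limit of 100, the single-replica run admits 100 requests and the two-replica run admits 200, because neither bucket can see what the other has already spent.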
    flowchart LR
      subgraph Now ["Single replica (sound)"]
        C1[client] --> R1[host:api]
        R1 --> B1[(in-process bucket)]
      end
      subgraph Future ["Multi-replica (UNSOUND with this module)"]
        C2[client] --> R2a[host:api #1]
        C2 --> R2b[host:api #2]
        R2a --> B2a[(bucket #1)]
        R2b --> B2b[(bucket #2)]
      end
      Future -.migration.-> PG[(rate_state in Postgres\nSELECT FOR UPDATE)]
The migration target is a rate_state row keyed on tenant_id (Postgres SELECT … FOR UPDATE) or Redis. A backlog entry in docs/backlog/phase-3-backlog.md tracks the trigger.
Operator Surface
    def snapshot(tenant_id: str) -> RateLimitSnapshot: ...
/api/self/tenant calls this so an operator can answer “is my producer hitting the limit?” without grepping logs. The answer is visible directly in the SPA.
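One way such a snapshot could be computed is sketched here, under assumptions. The field names `available` and `refill_per_s` follow the Examples section; `capacity` as a field name, the integer rounding, and the helper `snapshot_from_state` are hypothetical and not taken from rate_limit.py. The point of the sketch is that a snapshot reads the refilled balance without spending a token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitSnapshot:
    capacity: int        # maximum tokens the bucket can hold (field name assumed)
    available: int       # whole tokens that could be spent right now
    refill_per_s: float  # tokens added per second


def snapshot_from_state(tokens: float, last_refill: float, now: float,
                        capacity: int, refill_per_s: float) -> RateLimitSnapshot:
    """Read-only view: computes the refilled balance without consuming anything."""
    refilled = min(float(capacity), tokens + (now - last_refill) * refill_per_s)
    return RateLimitSnapshot(capacity=capacity, available=int(refilled),
                             refill_per_s=refill_per_s)


# A bucket that was drained at t=10.0, inspected immediately and half a second later.
drained = snapshot_from_state(tokens=0.0, last_refill=10.0, now=10.0,
                              capacity=100, refill_per_s=50.0)
recovering = snapshot_from_state(tokens=0.0, last_refill=10.0, now=10.5,
                                 capacity=100, refill_per_s=50.0)
print(drained)
print(recovering)
```

Because the helper takes the bucket state as arguments and mutates nothing, polling it from an operator endpoint would not distort the tenant's quota.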
Key Terms
- token bucket → Classical algorithm: tokens refill at a fixed rate up to a cap; each request consumes one.
- single replica → The Railway-deployed host:api runs one process; rate state is process-local.
- RateLimitSnapshot → Operator-facing struct (capacity, available, refill rate) returned by snapshot().
Q&A
Q: What happens when the bucket is empty?
A: allow() returns False; ingest returns 429 to the caller. There’s no queueing — the producer is expected to back off.
Q: Why not put rate state in Postgres now?
A: Round-trip cost on every ingest event would dominate. The single-replica assumption is documented and gates the migration.
Q: Does OSS mode rate-limit?
A: No. OSS mode is single-operator localhost; there’s no tenant axis to limit on.
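The first answer above leaves backing off to the producer. A minimal producer-side sketch follows, assuming exponential backoff is acceptable. `send_with_backoff`, its parameters, and the 202 success status in the demo are hypothetical; the source only states that a drained bucket yields 429.

```python
import time
from typing import Callable


def send_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      base_delay_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns something other than 429 or attempts run out.

    Returns the last HTTP status seen.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Double the wait after each consecutive 429: 0.1s, 0.2s, 0.4s, ...
        sleep(base_delay_s * 2 ** (attempt - 1))
        status = send()
    return status


# Demo with a stubbed endpoint that rejects twice, then accepts.
responses = iter([429, 429, 202])
waits = []  # records the requested delays instead of actually sleeping
final_status = send_with_backoff(lambda: next(responses), sleep=waits.append)
print(final_status, waits)
```

Injecting `sleep` keeps the demo instantaneous; a real producer would use the default `time.sleep` and would probably add jitter so that many producers do not retry in lockstep.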
Examples
A producer issuing a 10k-event burst sees the first N events succeed; once the bucket drains, the rest receive 429. /api/self/tenant then shows available: 0, refill_per_s: 50, so the operator immediately knows whether to slow the producer or request a quota bump.
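The burst scenario can be reproduced with a small token bucket. This is a sketch under assumptions, not the contents of rate_limit.py: the class name `InProcessTokenBucket`, the capacity of 100, the lazy refill on each call, and the injectable clock are all illustrative choices, and there is no locking, which threaded code would likely need.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float       # tokens currently available
    last_refill: float  # clock reading at the last refill


class InProcessTokenBucket:
    """Per-tenant token bucket; state lives in this process only."""

    def __init__(self, capacity: int, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        bucket = self._buckets.setdefault(tenant_id, _Bucket(float(self.capacity), now))
        # Lazy refill: credit tokens earned since the last call, capped at capacity.
        elapsed = now - bucket.last_refill
        bucket.tokens = min(float(self.capacity), bucket.tokens + elapsed * self.refill_per_s)
        bucket.last_refill = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


# Frozen fake clock, so the whole burst happens at a single instant.
fake_now = [0.0]
limiter = InProcessTokenBucket(capacity=100, refill_per_s=50.0, clock=lambda: fake_now[0])

results = [limiter.allow("tenant-a") for _ in range(10_000)]
accepted = sum(results)
rejected = len(results) - accepted

# One second later the bucket has earned 50 tokens back.
fake_now[0] = 1.0
accepted_after_1s = sum(limiter.allow("tenant-a") for _ in range(10_000))
print(accepted, rejected, accepted_after_1s)
```

With these assumed numbers, N equals the capacity (100), the remaining 9,900 calls are rejected, and a second burst one second later admits exactly the 50 tokens refilled in the meantime.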
neighbors on the map
- Horizontal Scalability Seams: planning an S3-backed archive
- Tier-Based Rate Limiting: debugging 429 errors for specific users