CRUMB a card from devarno-cloud

Tier-Based Rate Limiting

tektree intermediate 4 min read

ELI5

Each tier is a different size cup that refills once a minute: free is a shot glass (100), pro is a pint (1000), team is a pitcher (5000). Drink it dry and you wait for the next refill — the 429 is the bartender saying “minute’s not up yet.”

Technical Deep Dive

Tier-based rate limiting is enforced only at the api-gateway, after the auth middleware has set user_tier in context. Downstream services do not implement their own limits.

Quota Table

Tier        Limit (req/min)  Source
free        100              gateway middleware default
pro         1000             gateway middleware
team        5000             gateway middleware
enterprise  unlimited        per API_CONTRACTS.md
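
As a rough illustration, the quota table could be expressed as a lookup in Go. This is a sketch, not the gateway's actual code: the names `tierQuotas`, `QuotaFor` and `Unlimited` are hypothetical, and the fallback to the free cap for unknown tiers is an assumption based on free being listed as the middleware default.

```go
package main

import "fmt"

// Unlimited is a hypothetical sentinel meaning "skip quota enforcement".
const Unlimited = -1

// tierQuotas mirrors the quota table; values are requests per minute.
var tierQuotas = map[string]int{
	"free": 100,
	"pro":  1000,
	"team": 5000,
}

// QuotaFor returns the per-minute cap for a tier.
// Assumption: an unknown or empty tier falls back to the free cap.
func QuotaFor(tier string) int {
	if tier == "enterprise" {
		return Unlimited // treated as a bypass, not a very large number
	}
	if limit, ok := tierQuotas[tier]; ok {
		return limit
	}
	return tierQuotas["free"]
}

func main() {
	for _, tier := range []string{"free", "pro", "team", "enterprise", ""} {
		fmt.Printf("%q -> %d\n", tier, QuotaFor(tier))
	}
}
```

Keeping enterprise out of the map makes the bypass explicit, so a later edit to the numbers cannot turn it into a finite cap by accident.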

Decision Flow

flowchart TD
A[Auth middleware sets user_id, user_tier] --> B[RateLimit middleware]
B --> C{tier ∈ free/pro/team?}
C -->|free| D[bucket key: rl:user_id:free, cap=100]
C -->|pro| E[cap=1000]
C -->|team| F[cap=5000]
D --> G{tokens > 0?}
E --> G
F --> G
G -->|yes| H[decrement, set X-RateLimit-Remaining, continue]
G -->|no| I[429 + X-RateLimit-Reset]
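
The flow can be sketched as a `net/http` middleware. This is an in-memory illustration under stated assumptions, not the code in middleware.go: the context keys, the `send` helper, the `/api/v1/ping` path and the literal `60` written to `X-RateLimit-Reset` are all hypothetical, and window expiry is deliberately left out so the decision branch stays visible.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// ctxKey and the two keys below are hypothetical; the real auth middleware may differ.
type ctxKey string

const (
	userIDKey   ctxKey = "user_id"
	userTierKey ctxKey = "user_tier"
)

var caps = map[string]int{"free": 100, "pro": 1000, "team": 5000}

// okHandler stands in for the downstream handler.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// RateLimit follows the flowchart: pick a cap by tier, check tokens, then continue or return 429.
func RateLimit(next http.Handler) http.Handler {
	var mu sync.Mutex
	remaining := map[string]int{} // tokens left per bucket key (no expiry in this sketch)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID, _ := r.Context().Value(userIDKey).(string)
		tier, _ := r.Context().Value(userTierKey).(string)
		if tier == "enterprise" {
			next.ServeHTTP(w, r) // bypass, per the quota table
			return
		}
		limit, ok := caps[tier]
		if !ok { // assumption: unknown tiers get the free cap
			tier, limit = "free", caps["free"]
		}
		key := "rl:" + userID + ":" + tier

		mu.Lock()
		left, seen := remaining[key]
		if !seen {
			left = limit
		}
		allowed := left > 0
		if allowed {
			left--
		}
		remaining[key] = left
		mu.Unlock()

		if !allowed {
			// Assumption: Reset is expressed in seconds; a real limiter would report time left in the window.
			w.Header().Set("X-RateLimit-Reset", "60")
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(left))
		next.ServeHTTP(w, r)
	})
}

// send issues one request with user_id and user_tier already in context, as the auth middleware would.
func send(h http.Handler, userID, tier string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/ping", nil)
	ctx := context.WithValue(req.Context(), userIDKey, userID)
	ctx = context.WithValue(ctx, userTierKey, tier)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req.WithContext(ctx))
	return rec
}

func main() {
	h := RateLimit(okHandler)
	var last *httptest.ResponseRecorder
	for i := 0; i < 101; i++ {
		last = send(h, "u1", "free")
	}
	fmt.Println(last.Code, last.Header().Get("X-RateLimit-Reset")) // 101st free request is rejected
}
```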

Bucket Lifecycle

stateDiagram-v2
[*] --> Idle
Idle --> Active: first request in window
Active --> Active: token decrement
Active --> Throttled: tokens == 0
Throttled --> Idle: window expires (60s TTL)
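
The three states can be modelled with a small in-memory bucket. This is a sketch only: `Bucket`, `State`, `Take` and `epoch` are hypothetical names, and the `expiresAt` field stands in for the 60 s TTL that a Redis key would carry.

```go
package main

import (
	"fmt"
	"time"
)

// State names follow the lifecycle diagram.
type State string

const (
	Idle      State = "Idle"
	Active    State = "Active"
	Throttled State = "Throttled"
)

const window = 60 * time.Second

// epoch is a fixed reference time so the demo is deterministic.
var epoch = time.Unix(0, 0)

// Bucket is a hypothetical stand-in for one rate-limit key with a 60 s TTL.
type Bucket struct {
	capacity  int
	tokens    int
	expiresAt time.Time // zero until the first request opens a window
}

func NewBucket(capacity int) *Bucket { return &Bucket{capacity: capacity} }

// State reports where the bucket sits in the lifecycle at time now.
func (b *Bucket) State(now time.Time) State {
	switch {
	case b.expiresAt.IsZero() || !now.Before(b.expiresAt):
		return Idle // never used, or the window has expired
	case b.tokens == 0:
		return Throttled
	default:
		return Active
	}
}

// Take consumes one token if any are left and reports whether the request may proceed.
func (b *Bucket) Take(now time.Time) bool {
	if b.State(now) == Idle {
		// First request in a window: refill and start the TTL.
		b.tokens = b.capacity
		b.expiresAt = now.Add(window)
	}
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := NewBucket(2)
	fmt.Println(b.State(epoch)) // before any request
	b.Take(epoch)
	fmt.Println(b.State(epoch)) // one token left
	b.Take(epoch)
	fmt.Println(b.State(epoch)) // no tokens left
	fmt.Println(b.State(epoch.Add(window)))
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the bucket, keeps the transitions testable without sleeping for a minute.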

Implementation State

services/api-gateway/internal/middleware/middleware.go:99-132 defines RateLimit(redisURL string) which is wired into the protected /api/v1 group. Today the middleware sets the X-RateLimit-* response headers and reads the configured RedisURL; the actual Redis-backed token-bucket counter is stubbed (no INCR / EXPIRE round-trip yet). The intended scheme per SECURITY_ARCHITECTURE.md is a token bucket keyed by (user_id, endpoint) in Redis. Anonymous (pre-auth) requests are not rate-limited at the gateway today.
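
One way the missing INCR / EXPIRE round-trip could look is sketched below. It is not the planned implementation: `Counter`, `Allow`, `memCounter` and the `rl:<user_id>:<endpoint>` key layout are hypothetical, and the logic is a fixed-window counter (the usual INCR-plus-EXPIRE pattern) rather than a strict token bucket. The in-memory fake exists only so the example runs without Redis; it records TTLs but does not enforce them.

```go
package main

import (
	"fmt"
	"time"
)

const window = 60 * time.Second

// Counter is a hypothetical abstraction over the two Redis commands named in the text.
type Counter interface {
	Incr(key string) (int64, error)
	Expire(key string, ttl time.Duration) error
}

// Allow counts one request against the (user_id, endpoint) key and reports whether it fits the limit.
// On a store error it returns false plus the error; whether to fail open or closed is left to the caller.
func Allow(c Counter, userID, endpoint string, limit int64) (bool, error) {
	key := "rl:" + userID + ":" + endpoint
	n, err := c.Incr(key)
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in the window: start the 60 s TTL.
		if err := c.Expire(key, window); err != nil {
			return false, err
		}
	}
	return n <= limit, nil
}

// memCounter is an in-memory fake of Counter for demonstration.
type memCounter struct {
	counts map[string]int64
	ttls   map[string]time.Duration
}

func newMemCounter() *memCounter {
	return &memCounter{counts: map[string]int64{}, ttls: map[string]time.Duration{}}
}

func (m *memCounter) Incr(key string) (int64, error) {
	m.counts[key]++
	return m.counts[key], nil
}

func (m *memCounter) Expire(key string, ttl time.Duration) error {
	m.ttls[key] = ttl
	return nil
}

func main() {
	c := newMemCounter()
	for i := 1; i <= 3; i++ {
		ok, _ := Allow(c, "u1", "/api/v1/items", 2)
		fmt.Printf("request %d allowed=%v\n", i, ok)
	}
}
```

Against real Redis, the INCR and EXPIRE pair is not atomic; a Lua script or a transaction is the common way to close that gap.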

Why Gateway-Only

Centralising at the ingress avoids cache-coherency problems across services and lets rate-limit decisions reuse the same Redis instance already used for caching and the event bus. It also means a misbehaving internal service cannot accidentally over-limit downstream calls.

Key Terms

  • Tier → resolved from JWT tier claim, propagated as X-User-Tier.
  • Token bucket → fixed capacity per tier, refilled to full at the start of each window.
  • Window → 60 s, as implied by the per-minute (req/min) quotas.
  • X-RateLimit-* headers → Limit, Remaining, Reset; returned on every limited response.

Q&A

Q: Two distinct users on the same Pro plan each see 1000/min, not 500/min split between them. Why? A: The bucket key includes user_id. Limits are per-user, not per-tier-pool — each Pro user gets their own 1000/min.

Q: A request without Authorization arrives at /api/v1/auth/login. What rate-limit applies? A: None at the gateway today — the middleware is wired on the protected group only, so unauthenticated /auth/* traffic is uncapped. This is a known gap and is the most common reason to add an IP-based pre-auth limiter.
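
A pre-auth limiter of the kind mentioned in this answer could be sketched as a per-IP fixed-window counter. Nothing here exists in the gateway today: `IPLimiter`, `NewIPLimiter`, and the demo limit of 2 per minute are hypothetical, and the client address is taken from a `host:port` string such as `http.Request.RemoteAddr`, which behind a proxy would be the proxy's address rather than the caller's.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// Demo values so the example is deterministic; both are illustrative.
const demoWindow = time.Minute

var demoStart = time.Unix(0, 0)

type ipWindow struct {
	count int
	start time.Time
}

// IPLimiter is a hypothetical per-IP fixed-window limiter for unauthenticated routes.
type IPLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	seen   map[string]*ipWindow
}

func NewIPLimiter(limit int, window time.Duration) *IPLimiter {
	return &IPLimiter{limit: limit, window: window, seen: map[string]*ipWindow{}}
}

// Allow counts one request from remoteAddr ("host:port") and reports whether it is within the limit.
func (l *IPLimiter) Allow(remoteAddr string, now time.Time) bool {
	ip, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		ip = remoteAddr // no port present; use the value as-is
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	w, ok := l.seen[ip]
	if !ok || now.Sub(w.start) >= l.window {
		// First request from this IP, or its window has expired: open a new one.
		w = &ipWindow{start: now}
		l.seen[ip] = w
	}
	w.count++
	return w.count <= l.limit
}

func main() {
	l := NewIPLimiter(2, demoWindow)
	for i := 1; i <= 3; i++ {
		fmt.Printf("login attempt %d allowed=%v\n", i, l.Allow("203.0.113.7:5000", demoStart))
	}
	fmt.Println("after window:", l.Allow("203.0.113.7:5000", demoStart.Add(demoWindow)))
}
```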

Q: Why is Enterprise listed as “unlimited” rather than a very large number? A: It is a contract value, not a code path: the middleware skips quota enforcement when the tier resolves to enterprise (see API_CONTRACTS.md). Treat it as a bypass, not a high cap.

Examples

Bumping Pro from 1000 to 2000 req/min: edit the tier-quota table in services/api-gateway/internal/middleware/middleware.go, then deploy. Existing Pro users see the new ceiling when their next bucket window starts (within 60 s).
