Tier-Based Rate Limiting
ELI5
Each tier is a different size cup that refills once a minute: free is a shot glass (100), pro is a pint (1000), team is a pitcher (5000). Drink it dry and you wait for the next refill — the 429 is the bartender saying “minute’s not up yet.”
Technical Deep Dive
Tier-based rate limiting is enforced only at the api-gateway, after the auth middleware has set user_tier in context. Downstream services do not implement their own limits.
Quota Table
| Tier | Limit (req/min) | Source |
|---|---|---|
| free | 100 | gateway middleware default |
| pro | 1000 | gateway middleware |
| team | 5000 | gateway middleware |
| enterprise | unlimited | per API_CONTRACTS.md |
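The quota table reduces to a small lookup that returns a cap and a flag saying whether to enforce it at all. The Go sketch below is only an illustration. tierLimits and quotaFor are hypothetical names, not the identifiers in middleware.go. Treating unknown or empty tiers as free is an assumption, based on the "gateway middleware default" note in the table.

```go
package main

// tierLimits mirrors the quota table (requests per 60 s window).
// Hypothetical name; the real tier-quota table in middleware.go may differ.
var tierLimits = map[string]int{
	"free": 100,
	"pro":  1000,
	"team": 5000,
}

// quotaFor returns the per-minute cap for a tier and whether the gateway
// should enforce it. Enterprise is a bypass (no cap), per API_CONTRACTS.md.
// Unknown or empty tiers fall back to the free default (an assumption).
func quotaFor(tier string) (limit int, enforce bool) {
	if tier == "enterprise" {
		return 0, false
	}
	if l, ok := tierLimits[tier]; ok {
		return l, true
	}
	return tierLimits["free"], true
}
```

Returning an explicit enforce flag instead of a sentinel such as a very large integer keeps enterprise a true bypass, which matches how the contract describes it.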
Decision Flow
```mermaid
flowchart TD
    A[Auth middleware sets user_id, user_tier] --> B[RateLimit middleware]
    B --> C{tier ∈ free/pro/team?}
    C -->|free| D[bucket key: rl:user_id:free, cap=100]
    C -->|pro| E[cap=1000]
    C -->|team| F[cap=5000]
    D --> G{tokens > 0?}
    E --> G
    F --> G
    G -->|yes| H[decrement, set X-RateLimit-Remaining, continue]
    G -->|no| I[429 + X-RateLimit-Reset]
```
Bucket Lifecycle
```mermaid
stateDiagram-v2
    [*] --> Idle
    Idle --> Active: first request in window
    Active --> Active: token decrement
    Active --> Throttled: tokens == 0
    Throttled --> Idle: window expires (60s TTL)
```
Implementation State
services/api-gateway/internal/middleware/middleware.go:99-132 defines RateLimit(redisURL string), which is wired into the protected /api/v1 group. Today the middleware sets the X-RateLimit-* response headers and reads the configured RedisURL. The Redis-backed token-bucket counter itself is still stubbed (no INCR / EXPIRE round-trip yet). The intended scheme, per SECURITY_ARCHITECTURE.md, is a token bucket keyed by (user_id, endpoint) in Redis. Anonymous (pre-auth) requests are not rate-limited at the gateway today.
Why Gateway-Only
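Centralising at the ingress avoids cache-coherency problems across services. It also lets rate-limit decisions reuse the Redis instance already used for caching and the event bus. Finally, a misbehaving internal service cannot accidentally over-limit downstream calls, because no limiter runs below the gateway. Service-to-service traffic is never throttled, and never counted a second time against the user's quota.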
Key Terms
- Tier → resolved from the JWT tier claim, propagated as X-User-Tier.
- Token bucket → fixed capacity, refilled completely at the end of each window (so it behaves like a fixed-window counter).
- Window → 60 s, by quota name.
- X-RateLimit-* headers → Limit, Remaining and Reset, returned on every response from a rate-limited route.
Q&A
Q: Two distinct users on the same Pro plan each see 1000/min, not 500/min split between them. Why?
A: The bucket key includes user_id. Limits are per-user, not per-tier-pool — each Pro user gets their own 1000/min.
Q: A request without Authorization arrives at /api/v1/auth/login. What rate-limit applies?
A: None at the gateway today — the middleware is wired on the protected group only, so unauthenticated /auth/* traffic is uncapped. This is a known gap and is the most common reason to add an IP-based pre-auth limiter.
Q: Why is Enterprise listed as “unlimited” rather than a very large number?
A: The value comes from the contract (API_CONTRACTS.md), not from a number in the gateway's quota table. When the tier resolves to enterprise, the middleware skips quota enforcement altogether. Treat it as a bypass, not a high cap.
Examples
Bumping Pro from 1000 to 2000 req/min: edit the tier-quota table in services/api-gateway/internal/middleware/middleware.go, then deploy. Existing Pro users see the new ceiling at the next bucket window (≤60 s).