Per-IP Fixed-Window Rate Limiter
traceo beginner 3 min read
ELI5
Each visiting IP gets a paper ticket with N punches. Every minute, the ticket is replaced with a fresh one. When a visitor’s ticket runs out before the minute ends, the door says 429 and tells them when the next ticket prints (X-RateLimit-Reset). Health-check robots get a free pass.
Technical Deep Dive
Configuration
| Setting | Service | Default |
|---|---|---|
| MCP_RATE_LIMIT | MCP server | 100/minute |
| ENGINE_RATE_LIMIT | Engine | 50/minute |
Both are read by traceo_mcp_server/config.py into SecurityConfig.
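As a rough illustration of how the two settings might be loaded, here is a minimal sketch. It is not the actual contents of traceo_mcp_server/config.py: the field names, the `from_env` helper, and the `parse_rate_limit` function are hypothetical, and it assumes the only supported unit is `minute`.

```python
import os
from dataclasses import dataclass


def parse_rate_limit(value: str) -> int:
    """Parse a '<count>/minute' string and return the per-minute count."""
    count, _, unit = value.strip().partition("/")
    if unit != "minute":
        raise ValueError(f"unsupported rate-limit unit: {unit!r}")
    return int(count)


@dataclass(frozen=True)
class SecurityConfig:
    # Hypothetical field names; only the env var names and defaults
    # come from the configuration table.
    mcp_rate_limit: int
    engine_rate_limit: int

    @classmethod
    def from_env(cls, env=None) -> "SecurityConfig":
        # Accept an explicit mapping so the loader can be tested without
        # touching the real process environment.
        env = os.environ if env is None else env
        return cls(
            mcp_rate_limit=parse_rate_limit(env.get("MCP_RATE_LIMIT", "100/minute")),
            engine_rate_limit=parse_rate_limit(env.get("ENGINE_RATE_LIMIT", "50/minute")),
        )
```

Calling `SecurityConfig.from_env({})` falls back to the documented defaults of 100 and 50 requests per minute.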
Algorithm
flowchart TD
    REQ[Request from IP X]
    EXEMPT{Path exempt?}
    PASS[Skip limiter]
    GET[bucket = _buckets X]
    RESET{now - window_start ≥ 60s?}
    NEW[Reset window: count=0, window_start=now]
    INC[count += 1]
    CHK{count ≤ limit?}
    OK[Allow, add X-RateLimit-* headers]
    DENY[429 Too Many Requests]
    REQ --> EXEMPT
    EXEMPT -- yes --> PASS
    EXEMPT -- no --> GET
    GET --> RESET
    RESET -- yes --> NEW --> INC
    RESET -- no --> INC
    INC --> CHK
    CHK -- yes --> OK
    CHK -- no --> DENY

Exempt Paths
| Service | Exempt |
|---|---|
| MCP :8000 | /health |
| Engine :8001 | /health, /ready, /metrics |
The engine exempts more paths because Prometheus scrapes /metrics on a tight interval and would otherwise consume the bucket of any IP it shares with real traffic.
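The exempt lists above could be expressed as a simple lookup. This is a sketch only: the `EXEMPT_PATHS` mapping, the service keys, and the `is_exempt` helper are hypothetical names, and it assumes paths are matched exactly rather than by prefix.

```python
# Exempt paths per service, taken from the table above.
# The "mcp" / "engine" keys are illustrative labels, not real identifiers.
EXEMPT_PATHS = {
    "mcp": frozenset({"/health"}),
    "engine": frozenset({"/health", "/ready", "/metrics"}),
}


def is_exempt(service: str, path: str) -> bool:
    """Return True when the limiter should be skipped for this path."""
    # Assumption: exact string match, so "/metrics/foo" is NOT exempt.
    return path in EXEMPT_PATHS[service]
```

With this shape, `/metrics` is skipped on the engine but counted on the MCP server, which matches the table.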
Response Headers (always emitted on non-exempt paths)
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Configured ceiling for this minute |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Unix timestamp when the window flips |
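One plausible way to derive the three header values from a bucket is sketched below. The `rate_limit_headers` function is hypothetical, and two details are assumptions rather than documented behavior: the remaining count is clamped at zero once the limit is exceeded, and the reset time is `window_start + 60` rounded up to a whole second.

```python
import math

WINDOW_SECONDS = 60  # fixed window length described in this document


def rate_limit_headers(limit: int, count: int, window_start: float) -> dict:
    """Build X-RateLimit-* headers for a bucket in its current window."""
    return {
        "X-RateLimit-Limit": str(limit),
        # Assumption: never report a negative remainder on a 429 response.
        "X-RateLimit-Remaining": str(max(0, limit - count)),
        # Assumption: window_start is a Unix timestamp in seconds.
        "X-RateLimit-Reset": str(math.ceil(window_start + WINDOW_SECONDS)),
    }
```

For example, a bucket with `count=3` and `window_start=1000.0` under a limit of 50 would report 47 remaining and a reset time of 1060.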
Storage
_buckets is an in-process dict keyed by IP. Buckets do not survive a process restart and do not coordinate across replicas, so a fleet of N MCP processes has an effective per-IP limit of up to N× the configured value.
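The flowchart in the Algorithm section and the in-process storage can be sketched together as follows. This is an illustrative reconstruction, not the project's actual middleware: the `FixedWindowLimiter` and `Bucket` names, the `allow` method, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time
from dataclasses import dataclass

WINDOW_SECONDS = 60


@dataclass
class Bucket:
    count: int
    window_start: float


class FixedWindowLimiter:
    """Per-IP fixed-window counter held in process memory (hypothetical sketch)."""

    def __init__(self, limit, exempt_paths=frozenset(), clock=time.time):
        self.limit = limit
        self.exempt_paths = exempt_paths
        self.clock = clock  # injectable so tests can control time
        self._buckets = {}  # ip -> Bucket; lost on restart, not shared across processes

    def allow(self, ip: str, path: str) -> bool:
        # Exempt paths skip the limiter entirely; the counter is untouched.
        if path in self.exempt_paths:
            return True
        now = self.clock()
        bucket = self._buckets.get(ip)
        # Start a fresh window on first sight of the IP or after 60 seconds.
        if bucket is None or now - bucket.window_start >= WINDOW_SECONDS:
            bucket = Bucket(count=0, window_start=now)
            self._buckets[ip] = bucket
        bucket.count += 1
        # False corresponds to the 429 branch of the flowchart.
        return bucket.count <= self.limit
```

Two instances of this class share nothing, which is the replica effect described above: each process counts the same IP independently.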
Key Terms
- Fixed window → a counter that hard-resets every 60 seconds; cheaper than a token bucket but allows bursts at window boundaries.
- Bucket → per-IP pair (count, window_start).
- Exempt → path skipped entirely; counter not incremented.
Q&A
Q: Two replicas, one user — what’s the effective limit?
A: 2× the configured limit, because each process holds its own _buckets dict and the load balancer round-robins.
Q: An attacker sends 200 requests at the 59s→61s boundary. Does the limiter catch it?
A: No. Fixed-window allows up to the full limit in each window, so with a limit of 100, a burst that drains the old window at 0:59 and 100 more requests at 1:00, after the counter resets, all pass. This is the known cost of fixed-window vs sliding-window.
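The boundary burst can be replayed with a few lines of simulation. The `count_allowed` helper is hypothetical and follows the reset rule from the flowchart (a new window starts at the first request after 60 seconds have elapsed), with the MCP default limit of 100 assumed.

```python
WINDOW_SECONDS = 60


def count_allowed(timestamps, limit):
    """Replay request times (in seconds) from one IP through a fixed-window counter."""
    allowed = 0
    count = 0
    window_start = None
    for now in timestamps:
        # Reset on the first request, or once the window is 60 seconds old.
        if window_start is None or now - window_start >= WINDOW_SECONDS:
            count, window_start = 0, now
        count += 1
        if count <= limit:
            allowed += 1
    return allowed


# One request opens the window at t=0; the burst then straddles the reset at t=60.
straddling = [0] + [59] * 99 + [60] * 100
print(count_allowed(straddling, limit=100))  # 200

# The same burst with no earlier window: the window opens at t=59 instead.
cold_start = [59] * 100 + [60] * 100
print(count_allowed(cold_start, limit=100))  # 100
```

The second case shows that the attack depends on the window already being open: under this reset rule, a burst that itself opens the window is capped at the limit.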
Q: Where does the limiter sit relative to auth?
A: On the engine it runs after AuthMiddleware; on the MCP server it runs before the per-route auth decorators. So an unauthenticated request is rejected before it reaches the engine’s limiter and does not consume the engine’s bucket, but it still consumes the MCP’s.
Examples
Prometheus scrapes the engine’s /metrics 30 times a minute. Without the exempt list, that single scraper IP would burn 30 of its 50 requests per minute on monitoring, leaving any real caller behind the same IP only 20 requests per minute (40% of the limit). Exempting /metrics keeps the scraper invisible to the limiter.
neighbors on the map
- Tier-Based Rate Limiting: debugging 429 errors for specific users