CRUMB a card from devarno-cloud

Per-IP Fixed-Window Rate Limiter

traceo beginner 3 min read

ELI5

Each visiting IP gets a paper ticket with N punches. Every minute, the ticket is replaced with a fresh one. When a visitor’s ticket runs out before the minute ends, the door says 429 and tells them when the next ticket prints (X-RateLimit-Reset). Health-check robots get a free pass.

Technical Deep Dive

Configuration

| Setting | Service | Default |
|---|---|---|
| MCP_RATE_LIMIT | MCP server | 100/minute |
| ENGINE_RATE_LIMIT | Engine | 50/minute |

Both are read by traceo_mcp_server/config.py into SecurityConfig.
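Here is a minimal sketch of that loading step. It assumes each variable holds a plain integer, with the "/minute" unit implied, and that SecurityConfig is a simple dataclass. The field names and the load_security_config helper are hypothetical. They are not the actual contents of config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SecurityConfig:
    # Hypothetical field names; only the two limits discussed here are modeled.
    mcp_rate_limit: int = 100     # MCP_RATE_LIMIT: requests per minute per IP
    engine_rate_limit: int = 50   # ENGINE_RATE_LIMIT: requests per minute per IP


def load_security_config(env: Optional[Mapping[str, str]] = None) -> SecurityConfig:
    """Build SecurityConfig from environment variables, using the documented defaults.

    Assumes each variable holds a bare integer such as "100"; the real parser may differ.
    """
    env = os.environ if env is None else env
    return SecurityConfig(
        mcp_rate_limit=int(env.get("MCP_RATE_LIMIT", "100")),
        engine_rate_limit=int(env.get("ENGINE_RATE_LIMIT", "50")),
    )
```

Passing a plain dict, for example `load_security_config({"ENGINE_RATE_LIMIT": "20"})`, makes the sketch easy to test without touching the real environment.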

Algorithm

flowchart TD
REQ[Request from IP X]
EXEMPT{Path exempt?}
PASS[Skip limiter]
GET[bucket = _buckets X]
RESET{now - window_start ≥ 60s?}
NEW[Reset window: count=0, window_start=now]
INC[count += 1]
CHK{count ≤ limit?}
OK[Allow, add X-RateLimit-* headers]
DENY[429 Too Many Requests]
REQ --> EXEMPT
EXEMPT -- yes --> PASS
EXEMPT -- no --> GET
GET --> RESET
RESET -- yes --> NEW --> INC
RESET -- no --> INC
INC --> CHK
CHK -- yes --> OK
CHK -- no --> DENY
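The flowchart translates into a small class. This is a sketch, not the traceo source. The class and method names, the injectable clock, and the (allowed, remaining, reset_at) return value are assumptions. The order of the steps (reset if expired, then increment, then compare) follows the flowchart, so denied requests still increment the counter.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable

WINDOW_SECONDS = 60.0


@dataclass
class Bucket:
    count: int
    window_start: float


class FixedWindowLimiter:
    """Per-IP fixed-window counter: one dict lookup and O(1) state per IP."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.clock = clock  # injectable so tests can control time
        self._buckets: dict[str, Bucket] = {}

    def check(self, ip: str) -> tuple[bool, int, float]:
        """Record one request from ip; return (allowed, remaining, reset_at)."""
        now = self.clock()
        bucket = self._buckets.get(ip)
        if bucket is None:
            bucket = self._buckets[ip] = Bucket(count=0, window_start=now)
        if now - bucket.window_start >= WINDOW_SECONDS:
            # Window expired: reset lazily, starting the new window at this request.
            bucket.count = 0
            bucket.window_start = now
        bucket.count += 1  # incremented before the check, so denied requests count too
        allowed = bucket.count <= self.limit
        remaining = max(0, self.limit - bucket.count)
        return allowed, remaining, bucket.window_start + WINDOW_SECONDS
```

A middleware would call check() once per non-exempt request and answer 429 when allowed is False. Because the reset happens lazily on the next request, each IP's window starts at its own first request rather than on wall-clock minute boundaries.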

Exempt Paths

| Service | Exempt paths |
|---|---|
| MCP (:8000) | /health |
| Engine (:8001) | /health, /ready, /metrics |

The engine exempts more paths because Prometheus scrapes /metrics on a tight interval. Without the exemption, every scrape would draw down the bucket of the scraper's IP, and with it the budget of any real traffic sharing that IP.
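An exempt check sits in front of the counter. In this sketch the path sets come from the table above. The constant names and the count_request helper are hypothetical, and exact path matching is assumed, with no prefixes and the query string already stripped.

```python
from collections import Counter

# Path sets from the exempt-paths table; the constant names are hypothetical.
MCP_EXEMPT_PATHS = frozenset({"/health"})
ENGINE_EXEMPT_PATHS = frozenset({"/health", "/ready", "/metrics"})


def count_request(path: str, ip: str, counts: Counter, exempt: frozenset) -> bool:
    """Charge one request to ip's counter unless the path is exempt.

    Assumes exact path matching. Returns True if the request was counted.
    """
    if path in exempt:
        return False  # skipped entirely: the counter is never touched
    counts[ip] += 1
    return True
```

Thirty /metrics scrapes against the engine leave the scraper's counter at zero. The same path on the MCP server would be counted, because only /health is exempt there.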

Response Headers (always emitted on non-exempt paths)

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Configured ceiling for the current window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Unix timestamp when the window flips |
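The three header values follow directly from a bucket's state. In this sketch the helper name is hypothetical. Two details are also assumptions: clamping Remaining at zero, and rounding Reset up to whole seconds.

```python
from __future__ import annotations

import math


def rate_limit_headers(limit: int, count: int, window_start: float,
                       window: float = 60.0) -> dict[str, str]:
    """Header values for a bucket that has seen count requests in the current window."""
    return {
        "X-RateLimit-Limit": str(limit),
        # Clamped at zero, so an over-limit bucket reports 0 rather than a negative number.
        "X-RateLimit-Remaining": str(max(0, limit - count)),
        # Window end as a Unix timestamp, rounded up to whole seconds (an assumption).
        "X-RateLimit-Reset": str(math.ceil(window_start + window)),
    }
```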

Storage

_buckets is an in-process dict keyed by IP. Buckets do not survive a process restart and are not shared across replicas, so a fleet of N MCP processes behind a load balancer effectively allows up to N× the configured limit per IP.
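The N× effect can be reproduced by giving each simulated process its own dict. This sketch assumes a perfectly round-robin load balancer and a single window, with no resets. Replica and accepted are hypothetical names.

```python
from itertools import cycle


class Replica:
    """One process with its own in-memory counters; window resets omitted for brevity."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._buckets: dict = {}  # ip -> count, never shared with other replicas

    def allow(self, ip: str) -> bool:
        self._buckets[ip] = self._buckets.get(ip, 0) + 1
        return self._buckets[ip] <= self.limit


def accepted(n_replicas: int, limit: int, attempts: int, ip: str = "203.0.113.7") -> int:
    """Spread attempts from one IP round-robin over n_replicas; return how many pass."""
    replicas = cycle([Replica(limit) for _ in range(n_replicas)])
    return sum(next(replicas).allow(ip) for _ in range(attempts))
```

With a limit of 100 and 500 attempts from one IP, one replica admits 100, two admit 200, and three admit 300. With uneven or IP-sticky balancing the multiplier is lower, which is why N× is an upper bound.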

Key Terms

  • Fixed window → a counter that hard-resets once 60 seconds have passed since the window opened; cheaper than a token bucket but allows bursts at window boundaries.
  • Bucket → per-IP pair (count, window_start).
  • Exempt → path skipped entirely; counter not incremented.

Q&A

Q: Two replicas, one user — what’s the effective limit? A: 2× the configured limit, because each process holds its own _buckets dict and the load balancer round-robins.

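Q: An attacker sends 200 requests in the two seconds around a window reset (59 s → 61 s into its window). Does the limiter catch it? A: No. A fixed window allows up to limit requests in each window, so with the MCP limit of 100, as many as 100 requests just before the reset and 100 more just after can all pass: nearly 2× the limit in about two seconds. This is the known cost of fixed-window versus sliding-window limiting.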
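Replaying one IP's request times through a lazily reset window shows the boundary effect. The helper name is hypothetical. As in the flowchart, it assumes the window opens on the IP's first request and resets on the first request at least 60 s later. In the scenario below, one request at t=0 opens the window, 99 more at t=59 s fill it, and 100 at t=60 s land in a fresh window.

```python
from __future__ import annotations


def passed_requests(limit: int, timestamps: list[float], window: float = 60.0) -> int:
    """Replay one IP's request times through a lazily reset fixed window; count passes."""
    count, window_start, passed = 0, None, 0
    for now in timestamps:
        if window_start is None or now - window_start >= window:
            count, window_start = 0, now  # first request, or window expired
        count += 1
        passed += count <= limit
    return passed


burst = [0.0] + [59.0] * 99 + [60.0] * 100
print(passed_requests(100, burst))  # 200
```

All 200 pass, and 199 of them arrive within about one second. The same 200 requests sent inside a single window would admit only 100.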

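Q: Where does the limiter sit relative to auth? A: On the engine it runs after AuthMiddleware; on the MCP server it runs before the per-route auth decorators. So an unauthenticated request consumes the MCP server's bucket (the limiter counts it before the route's auth decorator rejects it) but not the engine's (AuthMiddleware rejects it before the limiter sees it).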
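A toy pipeline, in execution order, shows how ordering decides whether a rejected request is counted. Everything here is hypothetical: the stage functions, the token check, and the status codes. It only demonstrates that a request rejected by auth is counted when the limiter runs first and never reaches the counter when auth runs first.

```python
from collections import Counter
from typing import Callable, Optional

Request = dict  # toy request: {"ip": ..., "token": ...}


def make_handler(limiter_first: bool, counts: Counter) -> Callable[[Request], int]:
    """Build a toy pipeline; each stage returns a status to short-circuit, or None to continue."""

    def limiter(req: Request) -> Optional[int]:
        counts[req["ip"]] += 1  # this sketch only counts; it never returns 429
        return None

    def auth(req: Request) -> Optional[int]:
        return None if req.get("token") else 401  # reject requests without a token

    stages = [limiter, auth] if limiter_first else [auth, limiter]

    def handle(req: Request) -> int:
        for stage in stages:
            status = stage(req)
            if status is not None:
                return status  # later stages never run
        return 200

    return handle
```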

Examples

Prometheus scrapes the engine's /metrics 30 times a minute (once every 2 seconds). Without the exempt list, those scrapes would use 30 of the scraper IP's 50 requests per minute, leaving only 20 (40% of the budget) for any real traffic from that IP. Exempting /metrics keeps the scraper invisible to the limiter.
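The same arithmetic as a quick check. The helper is hypothetical, and the figures are the engine default and the scrape rate from the example.

```python
ENGINE_LIMIT = 50        # ENGINE_RATE_LIMIT default, requests per minute per IP
SCRAPES_PER_MINUTE = 30  # Prometheus scrape rate from the example


def headroom(limit: int, monitoring_per_minute: int, metrics_exempt: bool) -> int:
    """Requests per minute left for real traffic that shares the scraper's IP."""
    if metrics_exempt:
        return limit  # exempt scrapes never touch the bucket
    return max(0, limit - monitoring_per_minute)


print(headroom(ENGINE_LIMIT, SCRAPES_PER_MINUTE, metrics_exempt=False))  # 20
print(headroom(ENGINE_LIMIT, SCRAPES_PER_MINUTE, metrics_exempt=True))   # 50
```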
