Multi-Layer Caching Strategy

smo1 intermediate 6 min read

ELI5

SMO1 stores copies of link data in four places, each one step further from the user. It is like having a photo of your ID: you carry one in your wallet (browser cache), the building security desk has a copy (edge KV), the HR office keeps another copy on file (Redis), and the government database has the master record (PostgreSQL). When you change your hairstyle, every copy needs updating, but the wallet photo is still the fastest to check.

Technical Deep Dive

Four-Layer Cache Hierarchy

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4f8', 'primaryTextColor': '#2d3748', 'primaryBorderColor': '#90cdf4', 'lineColor': '#718096', 'secondaryColor': '#f0fff4', 'tertiaryColor': '#fefcbf'}}}%%
flowchart TB
    subgraph User["User's Browser"]
        RC[React Query<br/>Client Cache]
    end
    subgraph Edge["Cloudflare Edge"]
        KV["Cloudflare KV<br/>link:{slug}"]
    end
    subgraph Backend["purr-api"]
        RD[Redis<br/>link URL cache]
        JW[JWKS Cache<br/>airlock keys]
    end
    subgraph Origin["Origin"]
        PG[(PostgreSQL)]
    end

    U[User] -->|Dashboard| RC
    RC -->|Stale after 5 min| P1[purr-api]
    U -->|Short link| KV
    KV -->|Miss| P2[purr-api]
    P1 --> RD
    P2 --> RD
    RD -->|Miss| PG
    P1 --> JW
    P2 --> JW
    JW -->|Miss| Airlock[Airlock JWKS]

Layer 1: React Query (Client-Side)

Where: meow-web browser tab
What: User data, link lists, dashboard stats
TTL / Stale time: 5 minutes (staleTime: 5 * 60 * 1000)
Invalidation: Manual via queryClient.invalidateQueries() after mutations

When a user edits a link, the UI optimistically updates the cache, then re-fetches from purr-api to confirm. This avoids flickering while ensuring consistency.
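The optimistic flow described here can be sketched without the library. This is a minimal illustration under stated assumptions, not meow-web's actual code: a plain Map stands in for the React Query cache, and LinkRow, CacheEntry, isStale, and applyOptimisticEdit are hypothetical names chosen for the example.

```typescript
// Hypothetical shape of a cached link row; field names are illustrative.
interface LinkRow {
  slug: string;
  url: string;
}

// Stand-in for one React Query cache entry, keyed by query key.
interface CacheEntry {
  data: LinkRow[];
  updatedAt: number; // epoch milliseconds of the last successful fetch
}

const STALE_TIME_MS = 5 * 60 * 1000; // the 5-minute stale time

// An entry counts as stale once the stale time has fully elapsed.
function isStale(entry: CacheEntry, nowMs: number): boolean {
  return nowMs - entry.updatedAt >= STALE_TIME_MS;
}

// Apply an edit to the cached list immediately and return a rollback
// function that restores the previous snapshot if the API call fails.
function applyOptimisticEdit(
  cache: Map<string, CacheEntry>,
  queryKey: string,
  slug: string,
  newUrl: string,
): () => void {
  const previous = cache.get(queryKey);
  if (previous === undefined) {
    return () => {}; // nothing cached, so nothing to roll back
  }
  cache.set(queryKey, {
    ...previous,
    data: previous.data.map((row) =>
      row.slug === slug ? { ...row, url: newUrl } : row,
    ),
  });
  return () => {
    cache.set(queryKey, previous);
  };
}
```

In React Query itself, the snapshot-and-restore step typically lives in a mutation's onMutate and onError callbacks, with queryClient.invalidateQueries() called in onSettled to trigger the confirming re-fetch.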

Layer 2: Cloudflare KV (Edge)

Where: Cloudflare’s global edge network (250+ cities)
What: Link resolution data (url, isActive, expiresAt, utm_*, redirectMode, protectionType)
Key format: link:{slug}
TTL: 300 seconds (5 minutes) for entries written by zoomies-edge on cache miss
Consistency: Eventually consistent (writes propagate globally within ~60 seconds)

Two sources of KV data:

zoomies-edge write-back — on cache miss, the worker fetches from purr-api and writes to KV with 5-minute TTL
KV Sync Service — purr-api actively pushes link changes to KV via REST API (see smo1-015)
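The write-back path can be sketched as a read-through lookup. This is an illustration under assumptions rather than the zoomies-edge source: KvLike mirrors only the get and put (with expirationTtl) calls of the Workers KV binding, LinkRecord is a trimmed-down record, storing the value as JSON is assumed, and fetchFromApi is a hypothetical stand-in for the call to purr-api.

```typescript
// Minimal subset of the Workers KV binding used here. The real binding
// offers more options; only get and put with expirationTtl are needed.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical subset of the link resolution data stored under link:{slug}.
interface LinkRecord {
  url: string;
  isActive: boolean;
}

const KV_TTL_SECONDS = 300; // 5-minute TTL for write-back entries

// Serve from KV when present; otherwise ask the origin and write the
// answer back so later requests for the same slug hit the edge cache.
async function resolveAtEdge(
  kv: KvLike,
  slug: string,
  fetchFromApi: (slug: string) => Promise<LinkRecord | null>, // assumed purr-api call
): Promise<LinkRecord | null> {
  const key = `link:${slug}`;
  const cached = await kv.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as LinkRecord; // assumes values are stored as JSON
  }
  const fresh = await fetchFromApi(slug);
  if (fresh !== null) {
    await kv.put(key, JSON.stringify(fresh), { expirationTtl: KV_TTL_SECONDS });
  }
  return fresh; // unknown slugs are not cached in this sketch
}
```

Entries pushed by the KV Sync Service land under the same key, so this lookup does not need to know which of the two sources wrote the value.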

Layer 3: Redis (Backend)

Where: purr-api server (or Redis cluster in production)
What:

Link URL cache (same data as KV, but for purr-api internal use)
Rate limit counters
JWKS public key cache

Link cache TTL: 300 seconds (5 minutes)
JWKS cache TTL: 300 seconds (5 minutes)
Rate limit TTL: 3600 seconds (1 hour)

Redis acts as a “hot cache” for purr-api, reducing PostgreSQL query load by ~80% for read-heavy endpoints like link resolution and dashboard stats.
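To make the TTL behaviour concrete, here is a small cache-aside sketch. It is not purr-api code: TtlCache is a synchronous in-memory stand-in for Redis (a real client would be asynchronous), the clock is injected so that expiry can be shown without waiting, and queryDb is a hypothetical PostgreSQL lookup.

```typescript
// In-memory stand-in for a Redis GET and SET-with-expiry pair.
class TtlCache {
  private readonly entries = new Map<string, { value: string; expiresAtMs: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAtMs) {
      this.entries.delete(key); // an expired entry behaves like a miss
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAtMs: this.now() + ttlSeconds * 1000 });
  }
}

const LINK_TTL_SECONDS = 300; // link cache TTL

// Cache-aside read: try the cache, fall back to the database on a miss,
// then store the result so later reads skip the database.
function getLinkUrl(
  cache: TtlCache,
  slug: string,
  queryDb: (slug: string) => string | null, // hypothetical PostgreSQL lookup
): string | null {
  const key = `link:${slug}`;
  const cached = cache.get(key);
  if (cached !== null) return cached;
  const url = queryDb(slug);
  if (url !== null) cache.set(key, url, LINK_TTL_SECONDS);
  return url;
}
```

With this shape, repeated reads of the same slug inside the 5-minute window reach the database once, which is the mechanism behind the reduced query load.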

Layer 4: PostgreSQL (Source of Truth)

Where: Primary database (Railway in production, Docker locally)
What: All persistent data
Cache behaviour: No application-level caching at this layer; a query reads from disk unless the pages it needs are already in PostgreSQL’s own shared buffers

Cache Invalidation Strategy

When a link is updated (e.g., destination URL changed, protection added):

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4f8', 'primaryTextColor': '#2d3748', 'primaryBorderColor': '#90cdf4', 'lineColor': '#718096', 'secondaryColor': '#f0fff4', 'tertiaryColor': '#fefcbf'}}}%%
flowchart LR
    A[Link updated<br/>via API] --> B[LinkService.Update]
    B --> C[UPDATE PostgreSQL]
    B --> D["DELETE Redis link:{slug}"]
    B --> E["PUT Cloudflare KV link:{slug}"]
    B --> F[Invalidate React Query<br/>via websocket / polling]
    C --> G[Source of truth updated]
    D --> H[Next read re-fills from PG]
    E --> I[Edge sees new data<br/>within ~60s]
    F --> J[Dashboard refreshes]

Key principle: Write-through to KV + Redis deletion. The next read will:

Miss in Redis → query PostgreSQL → re-fill Redis
Find updated data in KV (if sync succeeded) or fall back to API
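The update path can be sketched as one function that touches all three stores. This is a hedged illustration, not the real LinkService.Update: the interfaces LinkDb, RedisLike, and KvWriter are hypothetical, the sequential ordering is an assumption (the diagram only shows that all three writes happen), and tolerating a failed KV push by reporting it is also an assumption.

```typescript
// Hypothetical subset of the link data pushed to the edge.
interface LinkRecord {
  url: string;
  isActive: boolean;
}

// Narrow, illustrative interfaces for the three stores touched on update.
interface LinkDb {
  update(slug: string, record: LinkRecord): Promise<void>;
}
interface RedisLike {
  del(key: string): Promise<void>;
}
interface KvWriter {
  put(key: string, value: string): Promise<void>;
}

interface UpdateResult {
  kvSynced: boolean; // false means the edge keeps its old entry for now
}

async function updateLink(
  db: LinkDb,
  redis: RedisLike,
  kv: KvWriter,
  slug: string,
  record: LinkRecord,
): Promise<UpdateResult> {
  const key = `link:${slug}`;
  // 1. Write the source of truth first; a failure here aborts the update.
  await db.update(slug, record);
  // 2. Delete the Redis entry so the next read re-fills it from PostgreSQL.
  await redis.del(key);
  // 3. Push the new value to KV instead of deleting it.
  try {
    await kv.put(key, JSON.stringify(record));
    return { kvSynced: true };
  } catch {
    // Assumption: a failed push is reported rather than thrown. How the
    // real service retries is not covered here.
    return { kvSynced: false };
  }
}
```

Writing PostgreSQL before touching either cache means a crash midway leaves the caches stale at worst, never ahead of the source of truth.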

Stale Data Scenarios

Scenario	Impact	Mitigation
KV propagation delay (up to 60s)	Edge may serve old URL briefly	Acceptable for most use cases; critical updates can be made by changing the custom slug instead
Redis expiry race condition	Two requests both miss and query PG	No data loss; minor PG load spike
React Query stale data	User sees old link list for up to 5 min	Manual invalidation on mutation; optimistic updates

Key Terms

TTL → Time-To-Live; seconds until a cache entry expires automatically
Write-through → Writing to cache and database simultaneously on update
Cache invalidation → Deleting or updating cached entries when source data changes
Eventually consistent → KV property: reads may return slightly old data after a write
Optimistic update → UI assumes the mutation succeeds and updates cache immediately, rolling back on error

Q&A

Q: Why not use a single cache layer? A: Each layer serves a different purpose. React Query reduces API calls from the browser. KV reduces origin latency globally. Redis reduces database load. PostgreSQL ensures durability. Removing any layer would create a bottleneck.

Q: What is the maximum staleness a user can experience? A: Worst case: KV propagation delay (~60s) + React Query stale time (5 min) = ~6 minutes. In practice, link updates trigger immediate React Query invalidation, so dashboard users see changes within seconds.

Q: How does the system handle a link deletion? A: The link is soft-deleted (is_active = false). KV and Redis entries are deleted. The edge worker treats missing or inactive links as “not found” and proxies to the landing page.

Q: Why delete Redis but put KV on update? A: Redis is fast to re-fill (local to purr-api). KV is slow to propagate, so we actively push the new value rather than waiting for the edge worker to discover the deletion and re-fetch.
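The deletion answer above implies a small decision at the edge: a missing or inactive record is treated as not found. A minimal sketch of that decision follows. EdgeLinkRecord, Resolution, and decide are hypothetical names, and two points are assumptions rather than documented behaviour: that expiresAt is an ISO 8601 string or null, and that an expired link is handled the same way as an inactive one.

```typescript
// Hypothetical subset of the record stored under link:{slug}.
interface EdgeLinkRecord {
  url: string;
  isActive: boolean;
  expiresAt: string | null; // assumed ISO 8601 timestamp, or null for no expiry
}

type Resolution =
  | { kind: "redirect"; url: string }
  | { kind: "not_found" }; // the caller would proxy to the landing page

function decide(record: EdgeLinkRecord | null, nowMs: number): Resolution {
  // Missing (deleted from KV) and soft-deleted links look the same to the user.
  if (record === null || !record.isActive) {
    return { kind: "not_found" };
  }
  // Assumption: an expired link is treated like an inactive one.
  if (record.expiresAt !== null && Date.parse(record.expiresAt) <= nowMs) {
    return { kind: "not_found" };
  }
  return { kind: "redirect", url: record.url };
}
```

Keeping this as a pure function of the record and the current time makes the deletion cases easy to test without a KV binding.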

Examples

Think of caching like a city’s water supply:

React Query is the water tank on your roof — instant pressure, but only holds a small amount and needs refilling
Cloudflare KV is the neighbourhood water tower — shared by many houses, refilled from the main plant, and takes a few minutes to update when the city switches reservoirs
Redis is the pumping station — pressurises water for the neighbourhood and reduces load on the main pipes
PostgreSQL is the reservoir and treatment plant — the ultimate source, but too far away to rely on for every glass of water
Cache invalidation is the city switching water sources: they update the treatment plant, flush the pumping station, and fill the towers with new water — but the roof tank on your house might still have old water until you drain it

Neighbors on the Map

Edge Redirect Flow: debugging why a slug does not redirect
Click Tracking Pipeline: debugging missing or duplicate click counts
Database Architecture: designing a query that spans transactional and analytical data
Tier-Based Rate Limiting: debugging 429 errors for specific users