Database Architecture
smo1 intermediate 8 min read
ELI5
SMO1 uses three different databases, each with a specialty, like a kitchen with three chefs. PostgreSQL is the head chef who keeps the recipe book perfect and consistent. Redis is the prep cook who memorises the most common orders so meals come out fast. ClickHouse is the accountant who counts every ingredient used, every plate served, and every dollar earned — but does not touch the recipes.
Technical Deep Dive
Three-Database Strategy
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4f8', 'primaryTextColor': '#2d3748', 'primaryBorderColor': '#90cdf4', 'lineColor': '#718096', 'secondaryColor': '#f0fff4', 'tertiaryColor': '#fefcbf'}}}%%flowchart TB subgraph OLTP["OLTP: PostgreSQL 16"] PG[(PostgreSQL)] direction TB PG_USERS[users] PG_LINKS[links] PG_API_KEYS[api_keys] PG_ACH[achievements] PG_XP[xp_events] PG_WEB[webhook_events] end
subgraph Cache["Cache: Redis 7"] RD[(Redis)] direction TB RD_RATE[rate_limit buckets] RD_LINK[link URL cache] RD_SESSION[session cache] end
subgraph OLAP["OLAP: ClickHouse 24"] CH[(ClickHouse)] direction TB CH_CLICKS[link_clicks] CH_AGG[ materialized views ] end
API[purr-api] -->|CRUD| PG API -->|sliding window| RD_RATE API -->|read-through| RD_LINK API -->|insert only| CH_CLICKS CH -->|aggregations| APIPostgreSQL 16 (OLTP — Source of Truth)
Role: All transactional data — users, links, API keys, achievements, billing records, webhook events.
Key features:
- Extensions:
uuid-ossp(UUID generation),pg_trgm(trigram text search for slug lookups) - Triggers: Automatic
updated_attimestamp updates via trigger function - Enums:
subscription_tier,link_status,achievement_category,xp_event_type,protection_type - Indexes:
user_id,uid,custom_slug,created_at,total_clicksfor fast lookups - Connection pooling: pgxpool with configurable min/max pool size and connection lifetime
- Migrations: golang-migrate with SQL files embedded into the binary
Critical tables:
| Table | Purpose | Key indexes |
|---|---|---|
users | Identity, subscription, gamification | id (PK), email (unique) |
links | Short links with UTM and protection | uid (unique), custom_slug (unique, partial), user_id |
api_keys | Hashed API credentials | key_hash (unique), user_id |
achievements | Badge definitions | category |
user_achievements | Progress junction | user_id, achievement_id (composite) |
xp_events | XP audit trail | user_id, created_at |
webhook_events | Idempotency for Polar webhooks | event_id (unique) |
protection_attempts | Failed PIN/password attempts | link_id, ip_address |
Redis 7 (Cache + Rate Limiting)
Role: Ephemeral, high-speed operations that do not require ACID guarantees.
Two primary use cases:
-
Link URL cache
- Key format:
link:{slug} - TTL: 300 seconds (5 minutes)
- Value: serialized link data (URL, UTM params, protection settings)
- Purpose: Reduce PostgreSQL load on redirect resolution
- Key format:
-
Rate limiting (sliding window)
- Key format:
rate:{identifier}:{window_bucket} - TTL: 3600 seconds (1 hour)
- Value: integer counter
- Purpose: Enforce tier-based request limits per hour
- Key format:
Configuration:
- AOF persistence enabled (survives restarts)
- Password authentication
- Configurable pool size and connection timeouts
ClickHouse 24 (OLAP — Analytics Warehouse)
Role: Immutable, append-only event store for analytics queries.
Why ClickHouse over Postgres for analytics?
- Columnar storage: Aggregations (SUM, COUNT, AVG) are 10–100× faster than row-based Postgres
- Vectorised execution: Processes data in batches, not row-by-row
- Compression: Better compression ratios for repetitive event data
- Approximate aggregates:
uniqExactfor unique visitor counting without full sorting
Schema: link_clicks table with columns for event metadata, parsed User-Agent, GeoIP, and UTM inheritance.
Interface: HTTP protocol (8123) with JSONEachRow format for inserts. No persistent connection pool needed — each insert is an independent HTTP POST.
Query patterns:
- Time-series aggregation (
GROUP BY toStartOfDay(clicked_at)) - Country/device/browser breakdowns
- Unique visitor counts via
uniqExact(visitor_id) - Referrer analysis
Local Development (litter-box)
All three databases run in Docker Compose for local development:
| Service | Image | Ports | Init scripts |
|---|---|---|---|
| PostgreSQL | postgres:16-alpine | 5432 | /init/postgres |
| Redis | redis:7-alpine | 6379 | — |
| ClickHouse | clickhouse-server:24-alpine | 8123, 9000 | /init/clickhouse |
| MinIO | minio/minio | 9002, 9001 | Auto-creates buckets |
| Mailpit | axllent/mailpit | 1025, 8025 | — |
| Prometheus | prom/prometheus | 9090 | 15-day retention |
| Grafana | grafana/grafana | 3002 | Provisioned dashboards |
Key Terms
- OLTP → Online Transaction Processing; PostgreSQL handles ACID CRUD operations
- OLAP → Online Analytical Processing; ClickHouse handles large-scale aggregations on immutable event data
- pgxpool → Go PostgreSQL connection pooler with automatic health checks and connection recycling
- JSONEachRow → ClickHouse input format where each INSERT row is a JSON object on its own line
- AOF → Append-Only File; Redis persistence mode that logs every write command
- uniqExact → ClickHouse aggregate that counts distinct values exactly (not approximately like
uniq) - pg_trgm → PostgreSQL trigram extension enabling fast fuzzy text search on slugs
Q&A
Q: Why not use PostgreSQL for everything, including analytics? A: Postgres row-based storage and B-tree indexes are excellent for transactional queries but slow for columnar aggregations over billions of rows. ClickHouse is purpose-built for analytical workloads and is 10–100× faster for time-series dashboards.
Q: What happens if Redis loses all its data? A: Nothing critical. Redis is purely a cache and rate-limit store. Link data can be re-fetched from PostgreSQL, and rate limits reset (temporarily more permissive until the window refills). AOF persistence minimises data loss.
Q: How does purr-api handle ClickHouse being temporarily unavailable? A: ClickHouse inserts are fire-and-forget from the edge worker’s perspective. Within purr-api, the analytics service should catch insert errors and log them for retry or reconciliation. Operational alerting flags sustained failures.
Q: Are there foreign keys between ClickHouse and PostgreSQL?
A: No. ClickHouse does not enforce foreign keys, and the analytics schema is denormalised (e.g., user_id is stored as a plain String, not a foreign key). This is intentional: analytics data must remain queryable even if the transactional database is under maintenance.
Examples
Think of the three-database architecture like a hospital:
- PostgreSQL is the electronic health records system — every patient file is accurate, consistent, and immediately updatable
- Redis is the whiteboard in the nurses’ station — frequently accessed room numbers, allergy alerts, and “patients seen this hour” counters; fast to read and write, but if someone accidentally wipes it, the real records still exist
- ClickHouse is the hospital’s epidemiology research database — millions of anonymised visit records used to study disease trends, treatment outcomes, and seasonal patterns. No one edits a past visit; they only append new studies
neighbors on the map
- Click Tracking Pipeline debugging missing or duplicate click counts
- Core Data Models designing a new feature that touches users, links, or achievements
- Multi-Layer Caching Strategy debugging stale link data