Audience Materialized View Refresh
skyflow intermediate 5 min read
ELI5
The landing page and dashboards don’t query timeline_analyses directly — they read from precomputed materialised views (landing_hero_stats, top_active_users, etc.). A function refresh_audience_views() walks each view, runs REFRESH MATERIALIZED VIEW CONCURRENTLY, and writes a row into materialized_view_refreshes with timing and any error. CONCURRENTLY needs a UNIQUE INDEX, which every view declares.
Technical Deep Dive
Views (004_audience_schema.sql)
| View | Window | Purpose |
|---|---|---|
landing_hero_stats | 30d | total analyses / users / avg engagement |
analysis_type_stats | 30d | per-type (flow / clout / kudos / pulse) totals |
top_active_users | 30d | top 100 active users for “wall of fame” |
recent_activity | 24h | last 50 analyses for the live feed |
user_analysis_summary | all-time | per-user stats for dashboard |
daily_analysis_trends | 90d | daily counts for chart |
analysis_correlations | 30d | cross-metric Pearson CORR (premium) |
Each view declares a UNIQUE INDEX on its natural key — required for REFRESH … CONCURRENTLY to work without an exclusive lock.
Refresh Loop
flowchart TD Cron[scheduler] --> F[refresh_audience_views] F --> L[INSERT materialized_view_refreshes status=running] L --> R[EXECUTE 'REFRESH MATERIALIZED VIEW CONCURRENTLY view'] R --> OK{ok?} OK -->|yes| C[count rows; UPDATE status=success duration_ms row_count] OK -->|no| E[UPDATE status=failed error_message=SQLERRM] C --> N{more views?} E --> N N -->|yes| L N -->|no| Done[RETURN TABLE name,duration,status]Tracking Table
CREATE TABLE materialized_view_refreshes ( id BIGSERIAL PRIMARY KEY, view_name TEXT NOT NULL, started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), completed_at TIMESTAMPTZ, duration_ms INT, row_count INT, status TEXT NOT NULL DEFAULT 'running', -- running|success|failed error_message TEXT);View Dependency
classDiagram class timeline_analyses { +uuid id +uuid user_id +text target_did +jsonb flow_data +jsonb clout_data +jsonb kudos_data +jsonb pulse_data +timestamptz analyzed_at } class landing_hero_stats {<<MV>>} class analysis_type_stats {<<MV>>} class top_active_users {<<MV>>} class recent_activity {<<MV>>} class user_analysis_summary {<<MV>>} class daily_analysis_trends {<<MV>>} class analysis_correlations {<<MV>>} timeline_analyses <-- landing_hero_stats timeline_analyses <-- analysis_type_stats timeline_analyses <-- top_active_users timeline_analyses <-- recent_activity timeline_analyses <-- user_analysis_summary timeline_analyses <-- daily_analysis_trends timeline_analyses <-- analysis_correlationsThe leaderboard views (leaderboard_global, leaderboard_weekly) live in 002_gamification_schema.sql and have their own refresh_leaderboards() function — they are not driven by refresh_audience_views.
Scheduling
The scheduled_jobs table seeds entries like:
| name | schedule |
|---|---|
refresh_leaderboards | */15 * * * * |
cleanup_expired_sessions | 0 * * * * |
cleanup_profile_cache | 0 */6 * * * |
cleanup_rate_limits | 0 0 * * * |
A separate scheduler / cron service is expected to invoke refresh_audience_views(); the cron expression itself is service-owned and not committed in 004_audience_schema.sql.
Key Terms
- Materialized view → a query result physically stored on disk; reads are O(1) but the view is stale until refreshed
- REFRESH CONCURRENTLY → rebuilds without taking an exclusive lock on the view; requires a UNIQUE INDEX
materialized_view_refreshes→ audit table; one row per refresh attempt with status + error_message- EXCEPTION WHEN OTHERS → PL/pgSQL catch-all in
refresh_audience_viewsso one failing view does not abort the whole loop
Q&A
Q: Why does each view require a UNIQUE INDEX?
A: Postgres needs it to perform REFRESH MATERIALIZED VIEW CONCURRENTLY — the diff between old and new contents is computed by joining on the unique key.
Q: Where do I look when the dashboard shows yesterday’s numbers?
A: SELECT * FROM materialized_view_refreshes ORDER BY started_at DESC LIMIT 20; — confirms which view last ran, when, and whether it errored.
Q: Why is the leaderboard refresh function separate? A: Leaderboards live in Redis (sorted sets) and only fall back to the Postgres materialized views for cold-start. Their refresh cadence (15 min) and ownership are different from analytics views.
Q: What if a refresh takes longer than the cron interval?
A: The next tick will start while the previous still runs. CONCURRENTLY allows this without lock conflict, but compute cost compounds — watch duration_ms in materialized_view_refreshes and adjust schedule.
Examples
Like a newspaper that’s printed once an hour. The web edition (the view) is fast to read but only as fresh as the last print run. The print log on the wall (materialized_view_refreshes) tells you when the last edition went out and whether the press jammed.
neighbors on the map
- Multi-Layer Caching Strategy debugging stale link data
- LORE+CAIRNET Shared Graph Primitive understanding how LORE and CAIRNET share the graph component