CRUMB a card from devarno-cloud

Audience Materialized View Refresh

skyflow intermediate 5 min read

ELI5

The landing page and dashboards don’t query timeline_analyses directly — they read from precomputed materialised views (landing_hero_stats, top_active_users, etc.). A function refresh_audience_views() walks each view, runs REFRESH MATERIALIZED VIEW CONCURRENTLY, and writes a row into materialized_view_refreshes with timing and any error. CONCURRENTLY needs a UNIQUE INDEX, which every view declares.

Technical Deep Dive

Views (004_audience_schema.sql)

ViewWindowPurpose
landing_hero_stats30dtotal analyses / users / avg engagement
analysis_type_stats30dper-type (flow / clout / kudos / pulse) totals
top_active_users30dtop 100 active users for “wall of fame”
recent_activity24hlast 50 analyses for the live feed
user_analysis_summaryall-timeper-user stats for dashboard
daily_analysis_trends90ddaily counts for chart
analysis_correlations30dcross-metric Pearson CORR (premium)

Each view declares a UNIQUE INDEX on its natural key — required for REFRESH … CONCURRENTLY to work without an exclusive lock.

Refresh Loop

flowchart TD
Cron[scheduler] --> F[refresh_audience_views]
F --> L[INSERT materialized_view_refreshes status=running]
L --> R[EXECUTE 'REFRESH MATERIALIZED VIEW CONCURRENTLY view']
R --> OK{ok?}
OK -->|yes| C[count rows; UPDATE status=success duration_ms row_count]
OK -->|no| E[UPDATE status=failed error_message=SQLERRM]
C --> N{more views?}
E --> N
N -->|yes| L
N -->|no| Done[RETURN TABLE name,duration,status]

Tracking Table

CREATE TABLE materialized_view_refreshes (
id BIGSERIAL PRIMARY KEY,
view_name TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
completed_at TIMESTAMPTZ,
duration_ms INT,
row_count INT,
status TEXT NOT NULL DEFAULT 'running', -- running|success|failed
error_message TEXT
);

View Dependency

classDiagram
class timeline_analyses {
+uuid id
+uuid user_id
+text target_did
+jsonb flow_data
+jsonb clout_data
+jsonb kudos_data
+jsonb pulse_data
+timestamptz analyzed_at
}
class landing_hero_stats {<<MV>>}
class analysis_type_stats {<<MV>>}
class top_active_users {<<MV>>}
class recent_activity {<<MV>>}
class user_analysis_summary {<<MV>>}
class daily_analysis_trends {<<MV>>}
class analysis_correlations {<<MV>>}
timeline_analyses <-- landing_hero_stats
timeline_analyses <-- analysis_type_stats
timeline_analyses <-- top_active_users
timeline_analyses <-- recent_activity
timeline_analyses <-- user_analysis_summary
timeline_analyses <-- daily_analysis_trends
timeline_analyses <-- analysis_correlations

The leaderboard views (leaderboard_global, leaderboard_weekly) live in 002_gamification_schema.sql and have their own refresh_leaderboards() function — they are not driven by refresh_audience_views.

Scheduling

The scheduled_jobs table seeds entries like:

nameschedule
refresh_leaderboards*/15 * * * *
cleanup_expired_sessions0 * * * *
cleanup_profile_cache0 */6 * * *
cleanup_rate_limits0 0 * * *

A separate scheduler / cron service is expected to invoke refresh_audience_views(); the cron expression itself is service-owned and not committed in 004_audience_schema.sql.

Key Terms

  • Materialized view → a query result physically stored on disk; reads are O(1) but the view is stale until refreshed
  • REFRESH CONCURRENTLY → rebuilds without taking an exclusive lock on the view; requires a UNIQUE INDEX
  • materialized_view_refreshes → audit table; one row per refresh attempt with status + error_message
  • EXCEPTION WHEN OTHERS → PL/pgSQL catch-all in refresh_audience_views so one failing view does not abort the whole loop

Q&A

Q: Why does each view require a UNIQUE INDEX? A: Postgres needs it to perform REFRESH MATERIALIZED VIEW CONCURRENTLY — the diff between old and new contents is computed by joining on the unique key.

Q: Where do I look when the dashboard shows yesterday’s numbers? A: SELECT * FROM materialized_view_refreshes ORDER BY started_at DESC LIMIT 20; — confirms which view last ran, when, and whether it errored.

Q: Why is the leaderboard refresh function separate? A: Leaderboards live in Redis (sorted sets) and only fall back to the Postgres materialized views for cold-start. Their refresh cadence (15 min) and ownership are different from analytics views.

Q: What if a refresh takes longer than the cron interval? A: The next tick will start while the previous still runs. CONCURRENTLY allows this without lock conflict, but compute cost compounds — watch duration_ms in materialized_view_refreshes and adjust schedule.

Examples

Like a newspaper that’s printed once an hour. The web edition (the view) is fast to read but only as fresh as the last print run. The print log on the wall (materialized_view_refreshes) tells you when the last edition went out and whether the press jammed.

neighbors on the map