CRUMB a card from devarno-cloud

Auth Decision OTel Span Schema

meridian beginner 4 min read

ELI5

Every time the bouncer makes a yes/no decision, they log a tiny card with the reason. Dashboards stack the cards by colour to show patterns. Meridian and airlock use the same card shape so cards from both buildings stack in the same dashboard.

Technical Deep Dive

Both src/middleware.ts and src/pages/auth/callback.ts open spans named auth.decision on the tracer devarno.auth v1. The schema is pinned to atlas/data/auth-span-schema.yaml so dashboards can slice uniformly across .devarno.cloud apps and cross-apex apps like meridian.

Attribute Schema

| Attribute | Type | Values | Source |
|---|---|---|---|
| auth.flow | string | F9 | constant (cross-apex flow id) |
| auth.decision | string | allow, redirect | derived |
| auth.reason | string | valid_session, no_cookie, expired, invalid_session, handoff_error | branch label |
| auth.sub | string | user id | session payload (allow only) |
| auth.client | string | stratt-dev | constant |
| auth.truth_source | string | airlock | constant |
| auth.session_age_ms | int | 0..ttlMs | computed in middleware |
| auth.handoff_error | string | airlock error code | callback only |

Span status is OK on allow, ERROR (with message=reason) on redirect.
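
The attribute schema and status rule above can be sketched as plain TypeScript types and helpers. This is a minimal illustration, not the contents of src/middleware.ts: the names DecisionInput, decisionFor, buildAuthAttributes and spanStatusFor are hypothetical, and deriving the decision from the reason alone (only valid_session allows) is an assumption about what "derived" means in the table.

```typescript
// Hypothetical types mirroring the attribute table; not taken from the repo.
type AuthDecision = "allow" | "redirect";
type AuthReason =
  | "valid_session"
  | "no_cookie"
  | "expired"
  | "invalid_session"
  | "handoff_error";

interface DecisionInput {
  reason: AuthReason;
  sub?: string;          // user id from the session payload
  sessionAgeMs?: number; // computed in middleware
  handoffError?: string; // airlock error code, callback only
}

type AuthAttributes = Record<string, string | number>;

// Assumption: the decision is derived from the reason; only valid_session allows.
function decisionFor(reason: AuthReason): AuthDecision {
  return reason === "valid_session" ? "allow" : "redirect";
}

function buildAuthAttributes(input: DecisionInput): AuthAttributes {
  const decision = decisionFor(input.reason);
  const attrs: AuthAttributes = {
    "auth.flow": "F9",
    "auth.decision": decision,
    "auth.reason": input.reason,
    "auth.client": "stratt-dev",
    "auth.truth_source": "airlock",
  };
  // auth.sub is attached on allow only; a denied request has no trustworthy subject.
  if (decision === "allow" && input.sub !== undefined) {
    attrs["auth.sub"] = input.sub;
  }
  if (input.sessionAgeMs !== undefined) {
    attrs["auth.session_age_ms"] = Math.trunc(input.sessionAgeMs);
  }
  if (input.reason === "handoff_error" && input.handoffError !== undefined) {
    attrs["auth.handoff_error"] = input.handoffError;
  }
  return attrs;
}

// Span status: OK on allow, ERROR with message=reason on redirect.
function spanStatusFor(reason: AuthReason): { code: "OK" | "ERROR"; message?: string } {
  return decisionFor(reason) === "allow"
    ? { code: "OK" }
    : { code: "ERROR", message: reason };
}
```

Keeping the attribute builder free of any tracer dependency means the schema can be unit-tested without a collector; the resulting record would then be passed to whatever span API the call site uses.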

Decision Flow

flowchart TD
REQ[incoming request] --> PUB{public prefix?}
PUB -- yes --> SKIP[no span, next]
PUB -- no --> COOKIE{cookie present?}
COOKIE -- no --> R1[reason=no_cookie<br/>decision=redirect]
COOKIE -- yes --> VERIFY{verifyMeridianSession}
VERIFY -- null --> R2[reason=invalid_session<br/>decision=redirect]
VERIFY -- ok, expired --> R3[reason=expired<br/>decision=redirect]
VERIFY -- ok --> ALLOW[reason=valid_session<br/>decision=allow<br/>auth.sub=sub<br/>auth.session_age_ms=age]
R1 --> SPAN
R2 --> SPAN
R3 --> SPAN
ALLOW --> SPAN
SPAN[tracer.startSpan auth.decision]
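
The branches in the flowchart can be read as one pure function. The sketch below is an assumption-laden model, not the real middleware: decideAuth, DecideOptions and the SessionPayload shape (sub plus issuedAtMs) are hypothetical, the verify callback only stands in for verifyMeridianSession (whose real signature is not shown here), and expiry is assumed to mean "age greater than ttlMs". Attaching sessionAgeMs to the expired branch is also an assumption, based on the Examples section reading that attribute on expired spans.

```typescript
// Hypothetical session shape; the real payload may differ.
interface SessionPayload {
  sub: string;
  issuedAtMs: number;
}

interface DecisionOutcome {
  decision: "allow" | "redirect";
  reason: "valid_session" | "no_cookie" | "expired" | "invalid_session";
  sub?: string;
  sessionAgeMs?: number;
}

interface DecideOptions {
  publicPrefixes: string[];
  ttlMs: number;
  nowMs: number;
  // Stand-in for verifyMeridianSession: returns null when the cookie is invalid.
  verify: (cookie: string) => SessionPayload | null;
}

// Returns null for public paths (no span is opened), otherwise the decision to record.
function decideAuth(
  path: string,
  cookie: string | undefined,
  opts: DecideOptions,
): DecisionOutcome | null {
  if (opts.publicPrefixes.some((prefix) => path.startsWith(prefix))) {
    return null;
  }
  if (!cookie) {
    return { decision: "redirect", reason: "no_cookie" };
  }
  const session = opts.verify(cookie);
  if (session === null) {
    return { decision: "redirect", reason: "invalid_session" };
  }
  const sessionAgeMs = opts.nowMs - session.issuedAtMs;
  if (sessionAgeMs > opts.ttlMs) {
    return { decision: "redirect", reason: "expired", sessionAgeMs };
  }
  return { decision: "allow", reason: "valid_session", sub: session.sub, sessionAgeMs };
}
```

Every non-null outcome then feeds a single span-creation step, which matches the way all four branches converge on the SPAN node.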

Boot-Time No-Op

Until the OpenTelemetry SDK is initialised at boot, trace.getTracer returns a no-op tracer. This keeps the middleware safe to import in environments where OTel is not configured (tests, local dev without a collector), with no branching at the call site.
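
To make the "no branching at the call site" point concrete, here is a self-contained model of the fallback pattern. It is deliberately not the @opentelemetry/api source: SketchTracer, getSketchTracer and registerSketchTracer are hypothetical stand-ins that only imitate the idea of a tracer handle that does nothing until a real implementation is registered.

```typescript
// Simplified stand-ins for the tracer and span interfaces (not the OTel types).
interface SketchSpan {
  setAttribute(key: string, value: string | number): void;
  end(): void;
}
interface SketchTracer {
  startSpan(name: string): SketchSpan;
}

const noopSpan: SketchSpan = { setAttribute() {}, end() {} };
const noopTracer: SketchTracer = { startSpan: () => noopSpan };

// Set once at boot in this model; undefined means "SDK not initialised".
let delegate: SketchTracer | undefined;

// The handle resolves its delegate on every call, so a tracer obtained at
// import time starts recording as soon as a real tracer is registered.
function getSketchTracer(): SketchTracer {
  return { startSpan: (name) => (delegate ?? noopTracer).startSpan(name) };
}

function registerSketchTracer(tracer: SketchTracer): void {
  delegate = tracer;
}

// Call site: identical whether or not a real tracer has been registered.
const sketchTracer = getSketchTracer();
function recordDecision(reason: string): void {
  const span = sketchTracer.startSpan("auth.decision");
  span.setAttribute("auth.reason", reason);
  span.end();
}

// Demo: the first call is silently dropped, the second is captured.
const exportedSpans: string[] = [];
recordDecision("no_cookie");
registerSketchTracer({
  startSpan: (name) => ({
    setAttribute() {},
    end: () => {
      exportedSpans.push(name);
    },
  }),
});
recordDecision("valid_session");
```

In this model only the span ended after registration reaches exportedSpans, which is the behaviour the card relies on for tests and collector-less local development.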

Key Terms

  • Flow id (F9) → Cross-apex handoff is named F9 in the devarno-cloud auth catalogue; dashboards filter by it.
  • truth_source → Which authority minted the underlying credential. Airlock for meridian; CASA for .devarno.cloud apps.
  • No-op tracer → OTel API stub returned before SDK init; spans are created but never exported.

Q&A

Q: Why is auth.sub only set on allow? A: On redirect we either don’t have a cookie or the cookie is invalid, so there is no trustworthy subject to attribute the denial to. Setting it would either be a lie or leak the spoofed value.

Q: Why pin to a YAML schema instead of just using the OTel attribute names directly? A: Multiple apps emit auth.decision spans; the schema file is the cross-repo contract that lets Grafana queries fan out without per-app branches.

Q: Why does the callback emit handoff_error rather than reusing invalid_session? A: They are different failure populations — handoff_error is upstream (airlock did not even mint a token), invalid_session is local (token was minted but tampered).

Examples

A spike of auth.reason=expired with auth.session_age_ms clustering near MERIDIAN_SESSION_TTL_SECONDS * 1000 is the expected daily TTL boundary. An expired spike where auth.session_age_ms is much smaller than the TTL means an operator just lowered the TTL and existing cookies are now over-aged.
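
The two readings above can be approximated with a small helper over the auth.session_age_ms values of a batch of expired spans. This is only a sketch: readExpiredSpike is a hypothetical name, ttlMs is the TTL you expect to be in force (for example MERIDIAN_SESSION_TTL_SECONDS * 1000), and the tolerance and majority thresholds are illustrative defaults rather than values from any dashboard.

```typescript
type SpikeReading = "ttl_boundary" | "below_ttl" | "inconclusive";

// agesMs: auth.session_age_ms values from spans with auth.reason=expired.
// tolerance: fraction of ttlMs treated as "near" the boundary (illustrative).
// majority: share of samples that must agree before a reading is given (illustrative).
function readExpiredSpike(
  agesMs: number[],
  ttlMs: number,
  tolerance = 0.1,
  majority = 0.8,
): SpikeReading {
  if (agesMs.length === 0) {
    return "inconclusive";
  }
  const near = agesMs.filter((age) => Math.abs(age - ttlMs) <= tolerance * ttlMs).length;
  const below = agesMs.filter((age) => age < (1 - tolerance) * ttlMs).length;
  if (near / agesMs.length >= majority) {
    return "ttl_boundary"; // expected expiry at the TTL boundary
  }
  if (below / agesMs.length >= majority) {
    return "below_ttl"; // ages well under the expected TTL: check for a lowered TTL
  }
  return "inconclusive";
}
```

A "below_ttl" result is a prompt to check recent TTL configuration changes, not proof of one.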
