CRUMB a card from devarno-cloud

LatencyMetric EMA Algorithm

weave intermediate 5 min read

ELI5

The EMA latency tracker is like a taxi driver’s mental average of how long a route takes: each new trip updates the estimate, but old experience still counts for 80%. After about 85 trips the driver is considered “reliable” (confidence ≥ 80). The driver is considered “fast” if the route consistently takes ≤ 5 minutes.

Technical Deep Dive

EMA Formula

new_latency = (1 − α) × old_latency + α × measured_ms
= 0.8 × old_latency + 0.2 × measured_ms

Alpha α = 0.2 is hardcoded in LatencyMetric::update() (line 81, network.rs). This is a conservative smoothing factor: the weight left on the initial prior decays as 0.8^n, so it takes 8 samples before new measurements account for more than 80% of the estimate (0.8^8 ≈ 0.17).
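The decay of the prior's weight can be checked numerically. The following is a minimal floating-point sketch, not the code in network.rs; the names ema_step, prior_weight and samples_to_reach are hypothetical and introduced only for illustration.

```rust
/// Smoothing factor quoted in the text (alpha = 0.2).
const ALPHA: f64 = 0.2;

/// One EMA step: (1 - alpha) * old + alpha * measured.
fn ema_step(old_latency: f64, measured_ms: f64) -> f64 {
    (1.0 - ALPHA) * old_latency + ALPHA * measured_ms
}

/// Fraction of the estimate still owed to the initial prior after `n` updates.
fn prior_weight(n: u32) -> f64 {
    (1.0 - ALPHA).powi(n as i32)
}

/// Smallest number of updates after which measurements carry more than
/// `target` of the total weight. `target` must lie in [0, 1).
fn samples_to_reach(target: f64) -> u32 {
    assert!((0.0..1.0).contains(&target), "target must be in [0, 1)");
    let mut n = 0;
    while 1.0 - prior_weight(n) <= target {
        n += 1;
    }
    n
}
```

With these definitions, samples_to_reach(0.8) evaluates to 8, because 0.8^7 ≈ 0.21 still leaves more than 20% on the prior while 0.8^8 ≈ 0.17 does not.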

Confidence Growth

confidence = min(sample_count, 100) × 95 / 100
sample_count | confidence
0 | 0
1 | 0
85 | 80 (is_reliable threshold)
100 | 95 (maximum)

Confidence is capped at 95, not 100 — this leaves a permanent uncertainty margin so no link is treated as infallible.
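The table values only line up if the division truncates (1 × 95 / 100 gives 0, and 85 × 95 / 100 gives 80). A minimal sketch under that assumption follows; the u32 types and the free-function form are assumptions, not the signature used in network.rs.

```rust
/// Confidence derived from the sample count using integer arithmetic.
/// The count is clamped to 100, scaled by 95, then divided by 100;
/// integer division discards the fractional part.
fn confidence(sample_count: u32) -> u32 {
    sample_count.min(100) * 95 / 100
}
```

Under this formula 84 samples give 79 and 85 samples give 80, which is why the is_reliable threshold corresponds to 85 samples.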

Convergence to True Latency

xychart-beta
title "EMA convergence: true latency = 10ms, initial prior = 12ms"
x-axis "Sample count" [1, 5, 10, 20, 30, 50]
y-axis "Estimated latency (ms)" 8 --> 14
line [11.6, 10.7, 10.2, 10.02, 10.0, 10.0]

(Values follow from applying the EMA formula iteratively without rounding; after n samples the estimate is 10 + 2 × 0.8^n.)
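The convergence curve can be reproduced two ways: by applying the update rule repeatedly, or with the closed form true + (prior − true) × (1 − α)^n. The sketch below shows both in floating point; the function names are hypothetical, and it deliberately ignores any integer rounding the real implementation may apply.

```rust
/// Closed-form EMA estimate after `n` identical samples of `true_ms`,
/// starting from `prior_ms`: true + (prior - true) * (1 - alpha)^n.
fn ema_closed_form(prior_ms: f64, true_ms: f64, alpha: f64, n: u32) -> f64 {
    true_ms + (prior_ms - true_ms) * (1.0 - alpha).powi(n as i32)
}

/// The same quantity obtained by applying the update rule `n` times.
fn ema_iterated(prior_ms: f64, true_ms: f64, alpha: f64, n: u32) -> f64 {
    (0..n).fold(prior_ms, |est, _| (1.0 - alpha) * est + alpha * true_ms)
}
```

For the chart's parameters (prior 12 ms, true latency 10 ms, α = 0.2), both functions give 11.6 after one sample and about 10.2 after ten.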

Status Flags

Flag | Method | Condition
Fast | is_fast() | latency_ms <= 5
Reliable | is_reliable() | confidence >= 80 (≈ 85 samples)

The values behind these flags, latency_ms and confidence, are consumed by Transport::reselect_transport() via the score formula. A link can be fast but not reliable (few samples), or reliable but not fast (a well-measured slow link).
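The two flags are independent predicates over different fields, which is why all four combinations can occur. A minimal sketch follows, assuming a pared-down LatencyMetric with only the two fields the flags read; the field types are assumptions.

```rust
/// Pared-down stand-in for the real struct: only the fields the flags read.
/// Field types are assumed for illustration.
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
}

impl LatencyMetric {
    /// Fast: the smoothed latency is at most 5 ms.
    fn is_fast(&self) -> bool {
        self.latency_ms <= 5
    }

    /// Reliable: confidence has reached 80 (about 85 samples).
    fn is_reliable(&self) -> bool {
        self.confidence >= 80
    }
}
```

A fresh 3 ms link (confidence 0) is fast but not reliable, while a 30 ms link at confidence 80 is reliable but not fast.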

Integration with Transport Scoring

The score latency_ms as i32 − confidence as i32 means:

  • A fresh fast link (3 ms, confidence 0) scores 3.
  • A mature fast link (3 ms, confidence 80) scores −77.
  • A mature slow link (30 ms, confidence 80) scores −50.

Lower scores win, so the mature fast link beats both alternatives once it has accumulated samples. During cold start, confidence is near 0 and every link scores close to its latency prior.
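The three cases above can be reproduced with a few lines. This sketch assumes that the lowest score wins (consistent with the Q&A below) and represents each candidate as a hypothetical (latency_ms, confidence) tuple; the selection helper best is illustrative and is not Transport::reselect_transport() itself.

```rust
/// Score as described in the text: latency minus confidence, both as i32.
fn score(latency_ms: u32, confidence: u8) -> i32 {
    latency_ms as i32 - confidence as i32
}

/// Index of the candidate with the lowest score, or None if the slice is
/// empty. On ties the earliest candidate is returned.
fn best(candidates: &[(u32, u8)]) -> Option<usize> {
    (0..candidates.len()).min_by_key(|&i| score(candidates[i].0, candidates[i].1))
}
```

Calling best(&[(3, 0), (3, 80), (30, 80)]) selects index 1, the mature fast link, with scores 3, −77 and −50 respectively.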

Key Terms

  • EMA (Exponential Moving Average) → Smoothing filter: new = (1−α)×old + α×sample; weights recent samples more than old ones
  • alpha (α) → Smoothing factor; 0.2 in WEAVE — retains 80% of prior estimate per sample
  • confidence → Integer 0–95 derived from sample count; subtracted from latency_ms in transport scoring, so higher confidence improves a link’s score
  • is_fast → latency_ms <= 5; informational flag; not directly used in scoring
  • is_reliable → confidence >= 80; requires ≈ 85 samples; indicates stable measurement base

Q&A

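Q: How many samples until a link beats the cold-start prior of another transport? A: Fewer than the 85 needed for is_reliable. A BLE link starts at a 3 ms prior; after 1 sample (confidence 0) its score is 3 − 0 = 3. A QUIC link that stays at its 12 ms prior scores 12 − 9 = 3 after 10 samples (a tie) and 12 − 10 = 2 after 11, so from 11 samples on it outscores the fresh BLE link. At 85 samples (confidence 80) it scores 12 − 80 = −68. QUIC wins the score competition even though it is slower in absolute terms, which shows that sample count can outweigh raw latency while a competing link is still in cold start.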
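The crossover point can be found by brute force over the sample count. This is a sketch under two assumptions: the link's latency estimate stays fixed at the given value, and a strictly lower score is needed to win. The helper names are hypothetical.

```rust
/// Confidence from sample count (integer division truncates), as i32 so it
/// can be subtracted from a latency directly.
fn confidence(sample_count: u32) -> i32 {
    (sample_count.min(100) * 95 / 100) as i32
}

/// First sample count at which a link held at `latency_ms` scores strictly
/// lower than `rival_score`. Returns None if even 100 samples are not enough.
fn samples_to_beat(latency_ms: i32, rival_score: i32) -> Option<u32> {
    (0..=100).find(|&n| latency_ms - confidence(n) < rival_score)
}
```

For a 12 ms link against a rival scoring 3, samples_to_beat(12, 3) returns Some(11): confidence reaches 10 at 11 samples, giving a score of 2.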

Q: What prevents a temporary spike from permanently degrading a link’s preference? A: The EMA with α = 0.2 heavily smooths spikes. A single spike at 3× the true value shifts the estimate by only 20% of the spike’s deviation. Recovery follows the same EMA rate: each normal sample removes 20% of the remaining error, so the residual halves in about 3 samples (0.8^3 ≈ 0.51) and falls to roughly 10% after 10 samples (0.8^10 ≈ 0.11).
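Spike recovery is easy to simulate. The sketch below feeds one outlier into a settled estimate and then a run of normal samples, returning the remaining error; it uses floating point and hypothetical function names, so any integer rounding in the real update is not modelled.

```rust
/// One EMA step with an explicit smoothing factor.
fn ema_step(est: f64, sample: f64, alpha: f64) -> f64 {
    (1.0 - alpha) * est + alpha * sample
}

/// Start from an estimate equal to `true_ms`, feed a single spike, then
/// `recovery` samples at `true_ms`. Returns the estimate's remaining error.
fn error_after_spike(true_ms: f64, spike_ms: f64, alpha: f64, recovery: u32) -> f64 {
    let spiked = ema_step(true_ms, spike_ms, alpha);
    let est = (0..recovery).fold(spiked, |e, _| ema_step(e, true_ms, alpha));
    est - true_ms
}
```

With a true latency of 10 ms and a 30 ms spike, the error is 4 ms immediately after the spike (20% of the 20 ms deviation), about 2.05 ms three samples later, and about 0.43 ms after ten.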

Q: Why cap confidence at 95 instead of 100? A: The cap models inherent measurement uncertainty — network latency is never perfectly stable. Reaching confidence = 100 would mean the score formula could produce scores far into the negatives, potentially causing spurious transport flaps when a minor measurement fluctuation temporarily raises latency_ms.

Examples

Simulating 3 BLE measurements matching the test in network.rs lines 243–250:

Initial: latency_ms = 3 (prior), confidence = 0, sample_count = 0
After update(10): latency_ms = 0.8*3 + 0.2*10 = 4.4 → 4, confidence = 0, count = 1
After update(10): latency_ms = 0.8*4 + 0.2*10 = 5.2 → 5, confidence = 1, count = 2
After update(10): latency_ms = 0.8*5 + 0.2*10 = 6, confidence = 2, count = 3
assert!(latency_ms <= 12) ← test passes

This matches the assertion at line 250: after 3 samples converging toward 10 ms, the estimate is still ≤ 12 ms.
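The trace can be replayed with a small integer-only model. This is a sketch, not the code in network.rs: the field types, the with_prior constructor, and the truncating (8 × old + 2 × measured) / 10 form of the update are assumptions inferred from the rounding shown in the trace (4.4 → 4, 5.2 → 5).

```rust
/// Illustrative model of the metric; field types are assumed.
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

impl LatencyMetric {
    /// Hypothetical constructor: start from a latency prior with no samples.
    fn with_prior(prior_ms: u32) -> Self {
        Self { latency_ms: prior_ms, confidence: 0, sample_count: 0 }
    }

    /// EMA update with alpha = 0.2 in integer arithmetic (result truncates).
    /// The u64 intermediate avoids overflow for large latencies.
    fn update(&mut self, measured_ms: u32) {
        let blended = (self.latency_ms as u64 * 8 + measured_ms as u64 * 2) / 10;
        self.latency_ms = blended as u32;
        self.sample_count = self.sample_count.saturating_add(1);
        // min(count, 100) * 95 / 100 is at most 95, so it fits in a u8.
        self.confidence = (self.sample_count.min(100) * 95 / 100) as u8;
    }
}
```

Starting from with_prior(3) and calling update(10) three times moves latency_ms through 4, 5 and 6, which stays within the ≤ 12 ms bound that the test asserts.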
