LatencyMetric EMA Algorithm
weave intermediate 5 min read
ELI5
The EMA latency tracker is like a taxi driver’s mental average of how long a route takes: each new trip updates the estimate, but old experience still counts for 80%. After about 85 trips the driver is considered “reliable” (confidence ≥ 80). The driver is considered “fast” if the route consistently takes ≤ 5 minutes.
Technical Deep Dive
EMA Formula
new_latency = (1 − α) × old_latency + α × measured_ms = 0.8 × old_latency + 0.2 × measured_ms

Alpha (α = 0.2) is hardcoded in LatencyMetric::update() (line 81, network.rs). This is a conservative smoothing factor: the weight of the initial estimate decays as 0.8^n, so it takes 8 samples before new measurements account for more than 80% of the estimate (0.8^8 ≈ 0.17).
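The update step can be sketched as follows. This is a minimal illustration, assuming latency is stored as integer milliseconds and that the division truncates (which is consistent with the worked example on this page, but not confirmed against network.rs). The names `ema_update`, `ALPHA_NUM`, and `ALPHA_DEN` are hypothetical.

```rust
/// Smoothing factor as a fraction: alpha = ALPHA_NUM / ALPHA_DEN = 0.2.
/// Constant names are illustrative, not taken from network.rs.
const ALPHA_NUM: u32 = 2;
const ALPHA_DEN: u32 = 10;

/// One EMA step: new = (1 - alpha) * old + alpha * measured.
/// Assumes integer milliseconds and truncating integer division.
fn ema_update(old_latency_ms: u32, measured_ms: u32) -> u32 {
    ((ALPHA_DEN - ALPHA_NUM) * old_latency_ms + ALPHA_NUM * measured_ms) / ALPHA_DEN
}
```

Keeping α as a ratio of integers avoids floating point entirely, at the cost of truncating sub-millisecond fractions on every step.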
Confidence Growth
confidence = min(sample_count, 100) × 95 / 100

| sample_count | confidence |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 85 | 80 (is_reliable threshold) |
| 100 | 95 (maximum) |
Confidence is capped at 95, not 100 — this leaves a permanent uncertainty margin so no link is treated as infallible.
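The confidence values above are consistent with integer (floor) division. A minimal sketch under that assumption; the function name `confidence_for` and the `u32`/`u8` types are hypothetical:

```rust
/// Confidence from sample count: min(n, 100) * 95 / 100.
/// Integer division floors the result, so 1 sample yields 0 and 85 yields 80.
/// Name and types are assumptions for illustration.
fn confidence_for(sample_count: u32) -> u8 {
    (sample_count.min(100) * 95 / 100) as u8
}
```

Under this reading, 84 samples give a confidence of 79, so 85 is the first sample count that satisfies the is_reliable threshold.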
Convergence to True Latency
EMA convergence (true latency = 10 ms, initial prior = 12 ms):

| Sample count | Estimated latency (ms) |
|---|---|
| 1 | 11.60 |
| 5 | 10.66 |
| 10 | 10.21 |
| 20 | 10.02 |
| 30 | 10.00 |
| 50 | 10.00 |

(Inferred from the EMA formula applied iteratively; values computed analytically, without integer truncation.)
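The convergence figures follow from the closed form of the EMA recurrence. Assuming every measurement equals the true latency $L$ and writing $\hat{L}_0$ for the prior (symbols introduced here for illustration), unrolling the update $n$ times gives:

$$
\hat{L}_n = L + (1 - \alpha)^n \, (\hat{L}_0 - L)
$$

With $\alpha = 0.2$, $L = 10$ and $\hat{L}_0 = 12$ this becomes $\hat{L}_n = 10 + 2 \cdot 0.8^n$, so $\hat{L}_1 = 11.6$ and $\hat{L}_{10} \approx 10.21$. The error shrinks by a constant factor of 0.8 per sample, regardless of how far off the prior was.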
Status Flags
| Flag | Method | Condition |
|---|---|---|
| Fast | is_fast() | latency_ms <= 5 |
| Reliable | is_reliable() | confidence >= 80 (≈ 85 samples) |
The score formula used by Transport::reselect_transport() is built from the same underlying values (latency_ms and confidence) rather than from the boolean flags themselves. A link can be fast but not reliable (few samples), or reliable but not fast (well-measured slow link).
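The two flags are independent predicates, which is why all four combinations can occur. A minimal sketch, assuming a struct with the two fields named in the table; the struct layout and field types are hypothetical stand-ins for the real LatencyMetric:

```rust
/// Hypothetical stand-in for the real struct. Field names follow the table
/// above; field types are assumptions.
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
}

impl LatencyMetric {
    /// Fast: the smoothed latency is at most 5 ms.
    fn is_fast(&self) -> bool {
        self.latency_ms <= 5
    }

    /// Reliable: confidence has reached 80, i.e. roughly 85 samples.
    fn is_reliable(&self) -> bool {
        self.confidence >= 80
    }
}
```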
Integration with Transport Scoring
The score is latency_ms as i32 − confidence as i32, and the lowest score wins, so:
- A fresh fast link (3 ms, confidence 0) scores 3.
- A mature fast link (3 ms, confidence 80) scores −77.
- A mature slow link (30 ms, confidence 80) scores −50.
The mature fast link always wins once it accumulates samples. During cold-start, all links score near their priors.
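The three cases above can be checked with a short sketch. It assumes the lowest score is selected, as the examples imply; the helper `best` and the tuple layout `(name, latency_ms, confidence)` are hypothetical and do not mirror the real Transport::reselect_transport() signature.

```rust
/// Score as described above: latency minus confidence, in signed arithmetic.
fn score(latency_ms: u32, confidence: u8) -> i32 {
    latency_ms as i32 - confidence as i32
}

/// Returns the name of the lowest-scoring candidate, or None for an empty list.
/// Each candidate is (name, latency_ms, confidence). Hypothetical helper.
fn best<'a>(candidates: &[(&'a str, u32, u8)]) -> Option<&'a str> {
    candidates
        .iter()
        .min_by_key(|c| score(c.1, c.2))
        .map(|c| c.0)
}
```

Because confidence ranges up to 95 while typical latencies are a few tens of milliseconds, the confidence term can outweigh the latency term when sample counts differ widely.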
Key Terms
- EMA (Exponential Moving Average) → Smoothing filter: new = (1−α)×old + α×sample; weights recent samples more than old ones
- alpha (α) → Smoothing factor; 0.2 in WEAVE; retains 80% of the prior estimate per sample
- confidence → Integer 0–95 derived from sample count; subtracted from latency_ms in transport scoring
- is_fast → latency_ms <= 5; informational flag; not directly used in scoring
- is_reliable → confidence >= 80; requires ≈ 85 samples; indicates stable measurement base
Q&A
Q: How many samples until a link beats the cold-start prior of another transport?
A: A BLE link starts at a 3 ms prior. After 1 sample its confidence is still 0, so if the estimate stays at 3 ms its score is 3 − 0 = 3. A QUIC link holding its 12 ms prior after 85 samples (confidence 80) scores 12 − 80 = −68. QUIC wins the score comparison even though it is slower in absolute terms. This shows how strongly sample count, rather than measured latency, drives selection during a link’s first 85 or so measurements.
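To put a number on the question, the sketch below searches for the smallest sample count at which a link drops strictly below a given rival score. It assumes the link's latency estimate stays constant, the rival's score stays fixed, and confidence uses integer division; `confidence_for` and `samples_to_beat` are hypothetical names.

```rust
/// Confidence from sample count, assuming integer (floor) division.
fn confidence_for(sample_count: u32) -> i32 {
    (sample_count.min(100) * 95 / 100) as i32
}

/// Smallest sample count at which a link holding `latency_ms` scores strictly
/// below `target_score`. Returns None if even maximum confidence (95) is not
/// enough. Hypothetical helper for illustration.
fn samples_to_beat(latency_ms: i32, target_score: i32) -> Option<u32> {
    (0..=100).find(|&n| latency_ms - confidence_for(n) < target_score)
}
```

For the BLE link at 3 ms against the QUIC score of −68, `samples_to_beat(3, -68)` yields 76 samples (confidence 72, score −69).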
Q: What prevents a temporary spike from permanently degrading a link’s preference?
A: The EMA with α = 0.2 heavily smooths spikes. A single spike at 3× the true value shifts the estimate by only 20% of the spike’s deviation. Recovery follows the same EMA rate: each subsequent normal sample removes 20% of the remaining offset, so the residual halves in roughly 3 samples and falls to about 10% after 10.
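A small simulation makes the recovery concrete. It assumes integer millisecond storage with truncating division, which makes the estimate snap back to the baseline slightly sooner than the real-valued EMA would. Both function names are hypothetical.

```rust
/// One EMA step with alpha = 0.2, assuming truncating integer arithmetic.
fn ema_update(old_latency_ms: u32, measured_ms: u32) -> u32 {
    (8 * old_latency_ms + 2 * measured_ms) / 10
}

/// Applies a single spike to an estimate sitting at `baseline_ms`, then counts
/// how many baseline-valued samples it takes to return to the baseline.
/// Intended for spike_ms >= baseline_ms; returns 0 otherwise.
fn samples_to_recover(baseline_ms: u32, spike_ms: u32) -> u32 {
    let mut estimate = ema_update(baseline_ms, spike_ms);
    let mut clean_samples = 0;
    while estimate > baseline_ms {
        estimate = ema_update(estimate, baseline_ms);
        clean_samples += 1;
    }
    clean_samples
}
```

With a 10 ms baseline and one 30 ms spike, the estimate jumps to 14 ms and then steps through 13, 12, and 11 before returning to 10 ms on the fourth clean sample.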
Q: Why cap confidence at 95 instead of 100?
A: The cap models inherent measurement uncertainty — network latency is never perfectly stable. Reaching confidence = 100 would mean the score formula could produce scores far into the negatives, potentially causing spurious transport flaps when a minor measurement fluctuation temporarily raises latency_ms.
Examples
Simulating 3 BLE measurements matching the test in network.rs lines 243–250:
Initial: latency_ms = 3 (prior), confidence = 0, sample_count = 0
After update(10): latency_ms = 0.8*3 + 0.2*10 = 4.4 → 4, confidence = 0, count = 1
After update(10): latency_ms = 0.8*4 + 0.2*10 = 5.2 → 5, confidence = 1, count = 2
After update(10): latency_ms = 0.8*5 + 0.2*10 = 6, confidence = 2, count = 3
assert!(latency_ms <= 12) ← test passes

This matches the assertion at line 250: after 3 samples converging toward 10 ms, the estimate is still ≤ 12 ms.
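The trace can be replayed with a self-contained sketch. This is a hypothetical reconstruction for illustration, assuming integer fields and truncating division; the real LatencyMetric in network.rs may differ in field types, constructor, and method signatures. The names `with_prior` and `replay` are not taken from the source.

```rust
/// Hypothetical reconstruction; not the struct from network.rs.
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

impl LatencyMetric {
    /// Starts from a transport-specific prior with no samples recorded.
    fn with_prior(prior_ms: u32) -> Self {
        Self { latency_ms: prior_ms, confidence: 0, sample_count: 0 }
    }

    /// Applies one measurement: EMA with alpha = 0.2, then recomputes
    /// confidence as min(count, 100) * 95 / 100 using integer division.
    fn update(&mut self, measured_ms: u32) {
        self.latency_ms = (8 * self.latency_ms + 2 * measured_ms) / 10;
        self.sample_count += 1;
        self.confidence = (self.sample_count.min(100) * 95 / 100) as u8;
    }
}

/// Feeds `samples` into a fresh metric and records
/// (latency_ms, confidence, sample_count) after each update.
fn replay(prior_ms: u32, samples: &[u32]) -> Vec<(u32, u8, u32)> {
    let mut metric = LatencyMetric::with_prior(prior_ms);
    samples
        .iter()
        .map(|&ms| {
            metric.update(ms);
            (metric.latency_ms, metric.confidence, metric.sample_count)
        })
        .collect()
}
```

Calling `replay(3, &[10, 10, 10])` reproduces the three post-update states of the trace, and the final latency of 6 ms satisfies the ≤ 12 ms assertion.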
neighbors on the map
- Multi-Underlay Transport Selection diagnosing why a peer keeps falling back to WebRTC when BLE should be available
- Spanning Tree Election & Broadcast debugging why the broadcast root keeps changing unexpectedly under topology churn
- FNP Observability & Prometheus Metrics monitoring FNP systems
- In-Process Rate-Limit Bucket investigating ingest 429s