Multi-Underlay Transport Selection
ELI5
Transport selection is like a delivery company that always tries to send packages by bicycle courier first (BLE, fastest for short distances), then motorcycle (Wi-Fi Direct), then car (QUIC), then truck (WebRTC). As real delivery times come in, the company updates its estimates with an exponential moving average and re-ranks the options automatically.
Technical Deep Dive
Class Diagram
classDiagram
    class TransportType {
        <<enumeration>>
        WebRTC
        QUIC
        BLE
        WiFiDirect
    }
    class LatencyMetric {
        +TransportType transport
        +u32 latency_ms
        +u8 confidence
        +u32 sample_count
        +new(transport) LatencyMetric
        +update(measured_ms)
        +is_fast() bool
        +is_reliable() bool
    }
    class Transport {
        +PeerID peer
        +BTreeMap~TransportType LatencyMetric~ links
        +TransportType preferred
        +new(peer) Transport
        +update_latency(transport, measured_ms)
        +available() Vec~TransportType~
        +estimated_latency() u32
        +route_confidence() u8
    }
    class NetworkTopology {
        +BTreeMap~PeerID Transport~ peers
        +new() NetworkTopology
        +add_peer(peer)
        +update_latency(peer, transport, measured_ms)
        +peers_by_latency() Vec~(PeerID u32)~
        +nearest_peer() Option~(PeerID u32)~
        +peers_with_transport(transport) usize
    }
    Transport --> LatencyMetric
    Transport --> TransportType
    NetworkTopology --> Transport

Source: mesh-node/src/network.rs.
Initial Latency Priors
| Transport | Initial latency_ms | Documented range |
|---|---|---|
| BLE | 3 | 2–5 ms |
| WiFiDirect | 6 | 2–10 ms |
| QUIC | 12 | 5–20 ms |
| WebRTC | 30 | 10–50 ms |
All priors start with confidence = 0, sample_count = 0. The optimistic default preferred = BLE is set in Transport::new().
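The priors in the table can be sketched as a constructor. This is a minimal illustration built from the class diagram and the table, not a copy of `network.rs`; the derive list and the exact shape of `new` are assumptions.

```rust
/// The four underlays named in the class diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

/// Per-transport latency tracker (fields as listed in the class diagram).
#[derive(Debug, Clone, PartialEq)]
pub struct LatencyMetric {
    pub transport: TransportType,
    pub latency_ms: u32,
    pub confidence: u8,
    pub sample_count: u32,
}

impl LatencyMetric {
    /// Seed a metric with the prior from the table. Nothing has been
    /// measured yet, so confidence and sample_count both start at 0.
    pub fn new(transport: TransportType) -> Self {
        let latency_ms = match transport {
            TransportType::BLE => 3,
            TransportType::WiFiDirect => 6,
            TransportType::QUIC => 12,
            TransportType::WebRTC => 30,
        };
        LatencyMetric {
            transport,
            latency_ms,
            confidence: 0,
            sample_count: 0,
        }
    }
}
```

Because every metric starts at `sample_count = 0`, these priors only act as starting points for the EMA; they do not take part in reselection until a real measurement arrives.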
Transport Reselection Flow
flowchart TD
    UPD["update_latency(transport, measured_ms)"]
    UPD --> METRIC["LatencyMetric::update(measured_ms)\nEMA: new = 0.8*old + 0.2*measured"]
    METRIC --> RESEL["reselect_transport()"]
    RESEL --> SCORE["For each link with sample_count > 0:\nscore = latency_ms - confidence"]
    SCORE --> MIN["Pick transport with lowest score"]
    MIN --> PREF["Transport.preferred = winner"]

The score formula latency_ms - confidence penalises low-confidence links (many samples = high confidence = lower score, favouring the link). A link with zero samples is excluded from competition.
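The flow above can be sketched in a few lines of Rust. Two details are assumptions rather than facts from `network.rs`: the EMA is computed in integer arithmetic with round-to-nearest, and confidence is modelled as `sample_count - 1` capped at 95, a rule inferred from the worked example in the Examples section. The names `with_prior` and `reselect` are hypothetical stand-ins.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

#[derive(Debug, Clone)]
pub struct LatencyMetric {
    pub latency_ms: u32,
    pub confidence: u8,
    pub sample_count: u32,
}

impl LatencyMetric {
    /// Hypothetical helper: start from a prior with no samples.
    pub fn with_prior(latency_ms: u32) -> Self {
        LatencyMetric { latency_ms, confidence: 0, sample_count: 0 }
    }

    /// EMA with alpha = 0.2: new = 0.8 * old + 0.2 * measured.
    pub fn update(&mut self, measured_ms: u32) {
        // ASSUMPTION: integer math, rounded to the nearest millisecond.
        self.latency_ms = (8 * self.latency_ms + 2 * measured_ms + 5) / 10;
        self.sample_count += 1;
        // ASSUMPTION: growth rule inferred from the worked example;
        // only the cap of 95 is stated in the text.
        self.confidence = (self.sample_count - 1).min(95) as u8;
    }
}

/// Pick the measured link with the lowest score = latency_ms - confidence.
/// Returns None when no link has any samples yet.
pub fn reselect(links: &BTreeMap<TransportType, LatencyMetric>) -> Option<TransportType> {
    links
        .iter()
        .filter(|(_, m)| m.sample_count > 0)
        .min_by_key(|(_, m)| m.latency_ms as i32 - m.confidence as i32)
        .map(|(t, _)| *t)
}
```

Casting to `i32` before subtracting matters: with `u32` operands, a link whose confidence exceeds its latency would underflow instead of producing a negative score.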
available() Ordering
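Transport::available() returns only links with confidence > 0, sorted ascending by latency_ms. In the worked example below, confidence is still 0 after the first sample, so a link appears here only from its second sample onward. This differs from preferred, which uses the score formula and admits any link with sample_count > 0.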
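A short sketch shows how the two orderings can disagree. The free functions `available` and `preferred` are hypothetical stand-ins for the methods on `Transport`, and the struct is trimmed to the fields the comparison needs.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

#[derive(Debug, Clone)]
pub struct LatencyMetric {
    pub latency_ms: u32,
    pub confidence: u8,
    pub sample_count: u32,
}

/// Links with confidence > 0, sorted ascending by raw latency_ms.
pub fn available(links: &BTreeMap<TransportType, LatencyMetric>) -> Vec<TransportType> {
    let mut measured: Vec<(TransportType, u32)> = links
        .iter()
        .filter(|(_, m)| m.confidence > 0)
        .map(|(t, m)| (*t, m.latency_ms))
        .collect();
    // Stable sort: ties keep the BTreeMap key order.
    measured.sort_by_key(|&(_, latency)| latency);
    measured.into_iter().map(|(t, _)| t).collect()
}

/// Measured link with the lowest score = latency_ms - confidence.
pub fn preferred(links: &BTreeMap<TransportType, LatencyMetric>) -> Option<TransportType> {
    links
        .iter()
        .filter(|(_, m)| m.sample_count > 0)
        .min_by_key(|(_, m)| m.latency_ms as i32 - m.confidence as i32)
        .map(|(t, _)| *t)
}
```

With BLE at 9 ms and confidence 1 (score 8) and QUIC at 10 ms and confidence 5 (score 5), `available` lists BLE first while `preferred` picks QUIC.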
NetworkTopology
NetworkTopology is the mesh-wide view: a BTreeMap<PeerID, Transport>. Key operations:
- peers_by_latency(): list of peers sorted by the estimated_latency() of each peer's preferred link
- nearest_peer(): convenience wrapper that returns the first element of peers_by_latency()
- peers_with_transport(t): counts peers where transport t has confidence > 0
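The three queries can be sketched over a simplified model. The `PeerID` type is not shown in the source, so `u64` stands in for it here; `Transport` is reduced to a precomputed latency plus a per-transport confidence map, and `Transport::stub` is a hypothetical test constructor rather than the real `Transport::new(peer)`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

/// ASSUMPTION: the real PeerID type is not shown; u64 is a placeholder.
pub type PeerID = u64;

/// Simplified stand-in for `Transport`.
pub struct Transport {
    /// Estimated latency of the preferred link, in ms.
    pub estimated_latency_ms: u32,
    /// Confidence per transport; a missing key means "never seen".
    pub confidence: BTreeMap<TransportType, u8>,
}

impl Transport {
    /// Hypothetical constructor used only for this sketch.
    pub fn stub(estimated_latency_ms: u32, confidences: &[(TransportType, u8)]) -> Self {
        Transport {
            estimated_latency_ms,
            confidence: confidences.iter().copied().collect(),
        }
    }
}

pub struct NetworkTopology {
    pub peers: BTreeMap<PeerID, Transport>,
}

impl NetworkTopology {
    pub fn new() -> Self {
        NetworkTopology { peers: BTreeMap::new() }
    }

    /// Peers sorted ascending by the latency of their preferred link.
    pub fn peers_by_latency(&self) -> Vec<(PeerID, u32)> {
        let mut out: Vec<(PeerID, u32)> = self
            .peers
            .iter()
            .map(|(id, t)| (*id, t.estimated_latency_ms))
            .collect();
        out.sort_by_key(|&(_, ms)| ms);
        out
    }

    /// First element of peers_by_latency(), or None for an empty mesh.
    pub fn nearest_peer(&self) -> Option<(PeerID, u32)> {
        self.peers_by_latency().into_iter().next()
    }

    /// Number of peers where transport `t` has confidence > 0.
    pub fn peers_with_transport(&self, t: TransportType) -> usize {
        self.peers
            .values()
            .filter(|p| p.confidence.get(&t).map_or(false, |&c| c > 0))
            .count()
    }
}
```

Sorting on every call costs O(n log n) per query, which is a simplification chosen for readability; whether the real implementation caches the ordering is not stated in the source.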
Key Terms
- TransportType → Enum with four variants: WebRTC, QUIC, BLE, WiFiDirect
- LatencyMetric → Per-transport EMA tracker; confidence grows with sample_count, capped at 95
- preferred → The TransportType with the lowest score = latency_ms - confidence among measured links
- score formula → latency_ms as i32 - confidence as i32; lower is better; subtracting confidence offsets the bias toward lightly tested links
- NetworkTopology → Mesh-wide index of all peer transports; entry point for spanning-tree decisions
Q&A
Q: Why does reselect_transport() skip links with sample_count == 0?
A: Uninitialised links only have prior estimates. Including them would let an untested BLE link (prior 3 ms, confidence 0, score 3) beat a QUIC link that has real but still few measurements (e.g. 8 ms measured, confidence 2, score 6). The guard ensures only evidence-backed transports compete.
Q: Can preferred ever revert to a slower transport?
A: Yes. If the fast link degrades (measured_ms rises via EMA) while a slower link accumulates high confidence, the score of the previously fast link can exceed the slower link's score, causing reselection. This is intentional adaptive behaviour.
Q: What is the EMA alpha and can it be tuned?
A: Alpha is hardcoded to 0.2 in LatencyMetric::update() (line 81 of network.rs). Higher alpha reacts faster to spikes; lower alpha smooths more aggressively. There is currently no config knob for this.
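To see what tuning alpha would change, the EMA step can be written with alpha as a parameter. This is an illustrative sketch in floating point; `ema_step` and `ema_run` are hypothetical names, and the real `LatencyMetric::update()` keeps alpha fixed at 0.2.

```rust
/// One EMA step: new = (1 - alpha) * old + alpha * measured.
pub fn ema_step(old_ms: f64, measured_ms: f64, alpha: f64) -> f64 {
    (1.0 - alpha) * old_ms + alpha * measured_ms
}

/// Fold a series of measurements into an estimate, starting from `start_ms`.
pub fn ema_run(start_ms: f64, samples: &[f64], alpha: f64) -> f64 {
    samples
        .iter()
        .fold(start_ms, |estimate, &measured| ema_step(estimate, measured, alpha))
}
```

Starting from a 10 ms estimate, a single 50 ms spike moves the estimate to about 18 ms with alpha = 0.2 and to about 30 ms with alpha = 0.5, which is the trade-off between smoothing and responsiveness described above.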
Examples
Peer A connects to Peer B. Three BLE measurements arrive at 3, 4, 3 ms:
- After sample 1: latency_ms = 0.8*3 + 0.2*3 = 3, confidence = 0.
- After sample 2: latency_ms = 0.8*3 + 0.2*4 = 3.2, stored as 3 (u32), confidence = 1.
- After sample 3: latency_ms = 0.8*3 + 0.2*3 = 3, confidence = 2.
One WebRTC measurement at 40 ms arrives: latency_ms = 0.8*30 + 0.2*40 = 32, confidence = 0.
reselect_transport(): BLE score = 3 − 2 = 1. WebRTC now has sample_count = 1, so under the sample_count > 0 guard it competes with score = 32 − 0 = 32; with confidence = 0 it is still absent from available(). BLE wins either way. preferred = BLE.