CRUMB a card from devarno-cloud

Multi-Underlay Transport Selection

weave intermediate 6 min read

ELI5

Transport selection is like a delivery company that always tries to send packages by bicycle courier first (BLE, fastest for short distances), then motorcycle (Wi-Fi Direct), then car (QUIC), then truck (WebRTC). As real delivery times come in, the company updates its estimates with an exponential moving average and re-ranks the options automatically.

Technical Deep Dive

Class Diagram

classDiagram
class TransportType {
<<enumeration>>
WebRTC
QUIC
BLE
WiFiDirect
}
class LatencyMetric {
+TransportType transport
+u32 latency_ms
+u8 confidence
+u32 sample_count
+new(transport) LatencyMetric
+update(measured_ms)
+is_fast() bool
+is_reliable() bool
}
class Transport {
+PeerID peer
+BTreeMap~TransportType LatencyMetric~ links
+TransportType preferred
+new(peer) Transport
+update_latency(transport, measured_ms)
+available() Vec~TransportType~
+estimated_latency() u32
+route_confidence() u8
}
class NetworkTopology {
+BTreeMap~PeerID Transport~ peers
+new() NetworkTopology
+add_peer(peer)
+update_latency(peer, transport, measured_ms)
+peers_by_latency() Vec~(PeerID u32)~
+nearest_peer() Option~(PeerID u32)~
+peers_with_transport(transport) usize
}
Transport --> LatencyMetric
Transport --> TransportType
NetworkTopology --> Transport

Source: mesh-node/src/network.rs.

Initial Latency Priors

Transport | Initial latency_ms | Documented range
BLE | 3 | 2–5 ms
WiFiDirect | 6 | 2–10 ms
QUIC | 12 | 5–20 ms
WebRTC | 30 | 10–50 ms

All priors start with confidence = 0, sample_count = 0. The optimistic default preferred = BLE is set in Transport::new().
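
The priors can be sketched as a constructor that seeds each metric from a lookup. This is a minimal illustration, not the code in network.rs: the helper name initial_latency_ms is hypothetical, and only the field names and prior values are taken from this card.

```rust
/// The four underlays from the class diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

/// Per-transport latency tracker; field names follow the class diagram.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LatencyMetric {
    pub transport: TransportType,
    pub latency_ms: u32,
    pub confidence: u8,
    pub sample_count: u32,
}

/// Hypothetical helper: the prior latency in milliseconds for each transport.
pub fn initial_latency_ms(transport: TransportType) -> u32 {
    match transport {
        TransportType::BLE => 3,
        TransportType::WiFiDirect => 6,
        TransportType::QUIC => 12,
        TransportType::WebRTC => 30,
    }
}

impl LatencyMetric {
    /// Starts from the prior with no evidence behind it yet.
    pub fn new(transport: TransportType) -> Self {
        Self {
            transport,
            latency_ms: initial_latency_ms(transport),
            confidence: 0,
            sample_count: 0,
        }
    }
}
```

Keeping the priors in one match expression means adding a fifth transport fails to compile until a prior is chosen for it.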

Transport Reselection Flow

flowchart TD
UPD["update_latency(transport, measured_ms)"]
UPD --> METRIC["LatencyMetric::update(measured_ms)\nEMA: new = 0.8*old + 0.2*measured"]
METRIC --> RESEL["reselect_transport()"]
RESEL --> SCORE["For each link with sample_count > 0:\nscore = latency_ms - confidence"]
SCORE --> MIN["Pick transport with lowest score"]
MIN --> PREF["Transport.preferred = winner"]

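The score latency_ms - confidence favours well-sampled links: more samples mean higher confidence, which lowers the score, and the lowest score wins. A low-confidence link is therefore penalised relative to a high-confidence link with the same latency. A link with zero samples is excluded from the competition.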
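
The flow above can be sketched in a few lines of Rust. Several details are assumptions rather than facts from network.rs: the EMA is written in integer form with truncating division, confidence is taken to be sample_count - 1 capped at 95 (inferred from the worked example in this card), and with_prior and reselect are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

#[derive(Debug, Clone)]
pub struct LatencyMetric {
    pub latency_ms: u32,
    pub confidence: u8,
    pub sample_count: u32,
}

impl LatencyMetric {
    /// Hypothetical constructor: a prior estimate with no samples behind it.
    pub fn with_prior(prior_ms: u32) -> Self {
        Self { latency_ms: prior_ms, confidence: 0, sample_count: 0 }
    }

    /// EMA step: new = 0.8*old + 0.2*measured, computed as (4*old + measured) / 5.
    /// Assumption: the result is truncated to whole milliseconds.
    pub fn update(&mut self, measured_ms: u32) {
        let blended = (4 * self.latency_ms as u64 + measured_ms as u64) / 5;
        self.latency_ms = blended as u32;
        self.sample_count = self.sample_count.saturating_add(1);
        // Assumption: confidence = samples - 1, capped at 95.
        self.confidence = (self.sample_count - 1).min(95) as u8;
    }

    /// Lower is better: latency minus confidence, as signed integers.
    pub fn score(&self) -> i32 {
        self.latency_ms as i32 - self.confidence as i32
    }
}

/// Picks the measured link with the lowest score; None if nothing is measured yet.
pub fn reselect(links: &BTreeMap<TransportType, LatencyMetric>) -> Option<TransportType> {
    links
        .iter()
        .filter(|(_, metric)| metric.sample_count > 0)
        .min_by_key(|(_, metric)| metric.score())
        .map(|(transport, _)| *transport)
}
```

In this sketch ties go to whichever transport sorts first in the BTreeMap, which is an artifact of the illustration; the card does not say how the real selector breaks ties.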

available() Ordering

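Transport::available() returns only links with confidence > 0, sorted ascending by latency_ms. Going by the confidence values in the Examples section, confidence is still 0 after a link's first sample, so a link appears here only from its second sample onward. This differs from preferred, which is chosen by the score formula among links with sample_count > 0.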
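
A minimal sketch of that ordering, written as a free function over the links map rather than as the actual method; the trimmed-down LatencyMetric here carries only the two fields the filter and sort need.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

/// Reduced metric for illustration: only the fields this function reads.
#[derive(Debug, Clone)]
pub struct LatencyMetric {
    pub latency_ms: u32,
    pub confidence: u8,
}

/// Links with confidence > 0, fastest first by raw latency_ms.
pub fn available(links: &BTreeMap<TransportType, LatencyMetric>) -> Vec<TransportType> {
    let mut measured: Vec<(TransportType, u32)> = links
        .iter()
        .filter(|(_, metric)| metric.confidence > 0)
        .map(|(transport, metric)| (*transport, metric.latency_ms))
        .collect();
    // Stable sort: equal latencies keep the map's key order.
    measured.sort_by_key(|&(_, latency_ms)| latency_ms);
    measured.into_iter().map(|(transport, _)| transport).collect()
}
```

For instance, with QUIC at 10 ms and confidence 9 and BLE at 5 ms and confidence 1, this function lists BLE first, while the score formula gives QUIC 10 - 9 = 1 and BLE 5 - 1 = 4, so preferred would be QUIC.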

NetworkTopology

NetworkTopology is the mesh-wide view: a BTreeMap<PeerID, Transport>. Key operations:

  • peers_by_latency() — sorted list by estimated_latency() of preferred link
  • nearest_peer() — convenience: first element of peers_by_latency()
  • peers_with_transport(t) — counts peers where transport t has confidence > 0
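
The three operations can be sketched as follows. The PeerID alias (u64), the reduced structs, and the u32::MAX fallback for a missing preferred link are all assumptions made for illustration; the card does not specify them.

```rust
use std::collections::BTreeMap;

/// Assumption: the real PeerID type is not shown in the card; u64 stands in for it.
pub type PeerID = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

#[derive(Debug, Clone)]
pub struct LatencyMetric {
    pub latency_ms: u32,
    pub confidence: u8,
}

#[derive(Debug, Clone)]
pub struct Transport {
    pub links: BTreeMap<TransportType, LatencyMetric>,
    pub preferred: TransportType,
}

impl Transport {
    /// Latency of the preferred link.
    /// Assumption: u32::MAX when the preferred link has no entry.
    pub fn estimated_latency(&self) -> u32 {
        self.links.get(&self.preferred).map_or(u32::MAX, |m| m.latency_ms)
    }
}

#[derive(Debug, Default)]
pub struct NetworkTopology {
    pub peers: BTreeMap<PeerID, Transport>,
}

impl NetworkTopology {
    /// Peers sorted ascending by the estimated latency of their preferred link.
    pub fn peers_by_latency(&self) -> Vec<(PeerID, u32)> {
        let mut ranked: Vec<(PeerID, u32)> = self
            .peers
            .iter()
            .map(|(id, transport)| (*id, transport.estimated_latency()))
            .collect();
        ranked.sort_by_key(|&(_, latency_ms)| latency_ms);
        ranked
    }

    /// First element of peers_by_latency(), if any peer exists.
    pub fn nearest_peer(&self) -> Option<(PeerID, u32)> {
        self.peers_by_latency().into_iter().next()
    }

    /// Number of peers whose link over `t` has confidence > 0.
    pub fn peers_with_transport(&self, t: TransportType) -> usize {
        self.peers
            .values()
            .filter(|p| p.links.get(&t).map_or(false, |m| m.confidence > 0))
            .count()
    }
}
```

Because nearest_peer() sorts the full list on every call, a larger mesh might prefer a single min_by_key pass; the sketch keeps the "first element of peers_by_latency()" definition literal.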

Key Terms

  • TransportType → Enum with four variants: WebRTC, QUIC, BLE, WiFiDirect
  • LatencyMetric → Per-transport EMA tracker; confidence grows with sample_count, capped at 95
  • preferred → The TransportType with the lowest score = latency_ms - confidence among measured links
  • score formula → latency_ms as i32 - confidence as i32; lower is better; subtracting confidence biases selection toward well-sampled links
  • NetworkTopology → Mesh-wide index of all peer transports; entry point for spanning-tree decisions

Q&A

Q: Why does reselect_transport() skip links with sample_count == 0? A: Unmeasured links carry only prior estimates. If they competed, an untested BLE link (prior 3 ms, score 3 − 0 = 3) could beat a measured QUIC link whose confidence is still low (e.g. 8 ms measured, confidence 2, score 8 − 2 = 6), even though no evidence supports the BLE estimate. The guard ensures only evidence-backed transports compete.

Q: Can preferred ever revert to a slower transport? A: Yes — if the fast link degrades (measured_ms rises via EMA) while a slower link accumulates high confidence, the score of the previously-fast link can exceed the slower link’s score, causing reselection. This is intentional adaptive behaviour.

Q: What is the EMA alpha and can it be tuned? A: Alpha is hardcoded to 0.2 in LatencyMetric::update() (line 81 of network.rs). Higher alpha reacts faster to spikes; lower alpha smooths more aggressively. There is currently no config knob for this.
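
To see what tuning alpha would change, here is a hypothetical floating-point version of the EMA step with alpha as a parameter. It is not the code in network.rs, which the card describes as hardcoded and which stores latency as u32; ema_step and smooth are illustrative names.

```rust
/// One EMA step with a caller-supplied alpha in [0.0, 1.0].
/// alpha = 0.2 reproduces the card's 0.8*old + 0.2*measured split.
pub fn ema_step(old_ms: f64, measured_ms: f64, alpha: f64) -> f64 {
    assert!((0.0..=1.0).contains(&alpha), "alpha must be within [0, 1]");
    (1.0 - alpha) * old_ms + alpha * measured_ms
}

/// Feeds a sequence of samples through ema_step, starting from a prior.
pub fn smooth(prior_ms: f64, samples: &[f64], alpha: f64) -> f64 {
    samples
        .iter()
        .fold(prior_ms, |estimate, &sample| ema_step(estimate, sample, alpha))
}
```

With an estimate of 10 ms and a single 60 ms spike, alpha = 0.2 moves the estimate to 20 ms while alpha = 0.5 moves it to 35 ms, which is the faster-reaction versus smoother trade-off described above.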

Examples

Peer A connects to Peer B. Three BLE measurements arrive at 3, 4, 3 ms:

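  1. After sample 1 (3 ms): latency_ms = 0.8*3 + 0.2*3 = 3 (the old value is the 3 ms prior), confidence = 0.
  2. After sample 2 (4 ms): latency_ms = 0.8*3 + 0.2*4 = 3.2, stored as 3 in the u32 field, confidence = 1.
  3. After sample 3 (3 ms): latency_ms = 0.8*3 + 0.2*3 = 3, confidence = 2.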

One WebRTC measurement at 40 ms arrives: latency_ms = 0.8*30 + 0.2*40 = 32, confidence = 0.

reselect_transport(): BLE score = 3 − 2 = 1. WebRTC now has sample_count = 1, so under the sample_count > 0 guard it competes with score = 32 − 0 = 32. BLE has the lower score and wins: preferred = BLE.
