Multi-Underlay Transport Selection

weave intermediate 6 min read

ELI5

Transport selection is like a delivery company that always tries to send packages by bicycle courier first (BLE, fastest for short distances), then motorcycle (Wi-Fi Direct), then car (QUIC), then truck (WebRTC). As real delivery times come in, the company updates its estimates with an exponential moving average and re-ranks the options automatically.

Technical Deep Dive

Class Diagram

classDiagram
class TransportType {
    <<enumeration>>
    WebRTC
    QUIC
    BLE
    WiFiDirect
}

class LatencyMetric {
    +TransportType transport
    +u32 latency_ms
    +u8 confidence
    +u32 sample_count
    +new(transport) LatencyMetric
    +update(measured_ms)
    +is_fast() bool
    +is_reliable() bool
}

class Transport {
    +PeerID peer
    +BTreeMap~TransportType LatencyMetric~ links
    +TransportType preferred
    +new(peer) Transport
    +update_latency(transport, measured_ms)
    +available() Vec~TransportType~
    +estimated_latency() u32
    +route_confidence() u8
}

class NetworkTopology {
    +BTreeMap~PeerID Transport~ peers
    +new() NetworkTopology
    +add_peer(peer)
    +update_latency(peer, transport, measured_ms)
    +peers_by_latency() Vec~(PeerID u32)~
    +nearest_peer() Option~(PeerID u32)~
    +peers_with_transport(transport) usize
}

Transport --> LatencyMetric
Transport --> TransportType
NetworkTopology --> Transport

Source: mesh-node/src/network.rs.

Initial Latency Priors

Transport	Initial latency_ms	Documented range
BLE	3	2–5 ms
WiFiDirect	6	2–10 ms
QUIC	12	5–20 ms
WebRTC	30	10–50 ms

All priors start with confidence = 0, sample_count = 0. The optimistic default preferred = BLE is set in Transport::new().
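The priors can be pictured as a small lookup on the enum. This is a minimal sketch, not the code in network.rs: the helper name prior_latency_ms is hypothetical, and the field layout simply mirrors the class diagram above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

impl TransportType {
    /// Initial latency prior in milliseconds, copied from the table above.
    /// (Hypothetical helper; the real code may store these differently.)
    fn prior_latency_ms(self) -> u32 {
        match self {
            TransportType::BLE => 3,
            TransportType::WiFiDirect => 6,
            TransportType::QUIC => 12,
            TransportType::WebRTC => 30,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct LatencyMetric {
    transport: TransportType,
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

impl LatencyMetric {
    /// A fresh metric starts at the prior with no evidence behind it.
    fn new(transport: TransportType) -> Self {
        LatencyMetric {
            transport,
            latency_ms: transport.prior_latency_ms(),
            confidence: 0,
            sample_count: 0,
        }
    }
}

fn main() {
    for t in [
        TransportType::BLE,
        TransportType::WiFiDirect,
        TransportType::QUIC,
        TransportType::WebRTC,
    ] {
        println!("{:?}", LatencyMetric::new(t));
    }
}
```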

Transport Reselection Flow

flowchart TD
    UPD["update_latency(transport, measured_ms)"]
    UPD --> METRIC["LatencyMetric::update(measured_ms)\nEMA: new = 0.8*old + 0.2*measured"]
    METRIC --> RESEL["reselect_transport()"]
    RESEL --> SCORE["For each link with sample_count > 0:\nscore = latency_ms - confidence"]
    SCORE --> MIN["Pick transport with lowest score"]
    MIN --> PREF["Transport.preferred = winner"]

The score formula latency_ms - confidence favours well-sampled links: more samples raise confidence, which lowers the score, so between two links of equal latency the better-evidenced one wins. A link with zero samples is excluded from competition.
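The flow above can be sketched as a filter followed by a minimum over the score. This is an illustration under stated assumptions, not the body of reselect_transport() in network.rs: it returns an Option so the caller can keep the current preferred when nothing has been measured, and ties fall to the first link in BTreeMap order. Both choices are assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

#[derive(Debug, Clone, Copy)]
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

/// Returns the measured transport with the lowest score, or None when
/// no link has any samples yet.
fn reselect_transport(
    links: &BTreeMap<TransportType, LatencyMetric>,
) -> Option<TransportType> {
    links
        .iter()
        // Links that only carry a prior do not compete.
        .filter(|(_, m)| m.sample_count > 0)
        // score = latency_ms - confidence, computed as i32 so it can go negative.
        .min_by_key(|(_, m)| m.latency_ms as i32 - m.confidence as i32)
        .map(|(t, _)| *t)
}

fn main() {
    let mut links = BTreeMap::new();
    links.insert(
        TransportType::BLE,
        LatencyMetric { latency_ms: 3, confidence: 2, sample_count: 3 },
    );
    links.insert(
        TransportType::WebRTC,
        LatencyMetric { latency_ms: 32, confidence: 0, sample_count: 1 },
    );
    // Unmeasured QUIC prior: present in the map but skipped by the guard.
    links.insert(
        TransportType::QUIC,
        LatencyMetric { latency_ms: 12, confidence: 0, sample_count: 0 },
    );
    println!("preferred = {:?}", reselect_transport(&links));
}
```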

available() Ordering

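Transport::available() returns only links with confidence > 0, sorted ascending by latency_ms. In the worked example under Examples, confidence is still 0 after the first sample, so a link appears here only once a later sample has raised its confidence. This ordering differs from preferred, which uses the score formula.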
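The difference between the two orderings is easiest to see with numbers. The sketch below uses a hypothetical flat Link struct instead of the BTreeMap from the class diagram, purely to keep the example short; the filter and sort rules are the ones described in the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

// Hypothetical flattened view of one link, for illustration only.
struct Link {
    transport: TransportType,
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

/// Links with confidence > 0, sorted ascending by raw latency.
fn available(links: &[Link]) -> Vec<TransportType> {
    let mut usable: Vec<&Link> = links.iter().filter(|l| l.confidence > 0).collect();
    usable.sort_by_key(|l| l.latency_ms);
    usable.into_iter().map(|l| l.transport).collect()
}

/// Measured link with the lowest latency_ms - confidence.
fn preferred(links: &[Link]) -> Option<TransportType> {
    links
        .iter()
        .filter(|l| l.sample_count > 0)
        .min_by_key(|l| l.latency_ms as i32 - l.confidence as i32)
        .map(|l| l.transport)
}

fn main() {
    let links = [
        Link { transport: TransportType::BLE, latency_ms: 10, confidence: 1, sample_count: 2 },
        Link { transport: TransportType::QUIC, latency_ms: 12, confidence: 20, sample_count: 21 },
        Link { transport: TransportType::WebRTC, latency_ms: 30, confidence: 0, sample_count: 0 },
    ];
    // BLE has the lower raw latency, so it sorts first in available().
    println!("available = {:?}", available(&links));
    // QUIC has the lower score (12 - 20 = -8 versus 10 - 1 = 9), so it is preferred.
    println!("preferred = {:?}", preferred(&links));
}
```

Callers that want the lowest raw latency and callers that want the selector's choice can therefore get different answers for the same peer.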

NetworkTopology

NetworkTopology is the mesh-wide view: a BTreeMap<PeerID, Transport>. Key operations:

peers_by_latency() — sorted list by estimated_latency() of preferred link
nearest_peer() — convenience: first element of peers_by_latency()
peers_with_transport(t) — counts peers where transport t has confidence > 0
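The three operations above can be sketched over a simplified model. Everything structural here is an assumption made for brevity: PeerID is aliased to u64, each link is reduced to a (latency_ms, confidence) pair, and the peer helper is hypothetical.

```rust
use std::collections::BTreeMap;

type PeerID = u64; // assumption: the real PeerID type is not shown in the source

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum TransportType {
    WebRTC,
    QUIC,
    BLE,
    WiFiDirect,
}

struct Transport {
    preferred: TransportType,
    // transport -> (latency_ms, confidence); simplified stand-in for LatencyMetric
    links: BTreeMap<TransportType, (u32, u8)>,
}

impl Transport {
    /// Latency estimate of the preferred link (u32::MAX if it is missing).
    fn estimated_latency(&self) -> u32 {
        self.links.get(&self.preferred).map(|(ms, _)| *ms).unwrap_or(u32::MAX)
    }
}

struct NetworkTopology {
    peers: BTreeMap<PeerID, Transport>,
}

impl NetworkTopology {
    fn peers_by_latency(&self) -> Vec<(PeerID, u32)> {
        let mut v: Vec<(PeerID, u32)> =
            self.peers.iter().map(|(id, t)| (*id, t.estimated_latency())).collect();
        v.sort_by_key(|&(_, ms)| ms);
        v
    }

    fn nearest_peer(&self) -> Option<(PeerID, u32)> {
        self.peers_by_latency().into_iter().next()
    }

    fn peers_with_transport(&self, t: TransportType) -> usize {
        self.peers
            .values()
            .filter(|p| p.links.get(&t).map_or(false, |&(_, c)| c > 0))
            .count()
    }
}

// Hypothetical helper for building a peer entry in examples.
fn peer(preferred: TransportType, links: &[(TransportType, u32, u8)]) -> Transport {
    Transport {
        preferred,
        links: links.iter().map(|&(t, ms, c)| (t, (ms, c))).collect(),
    }
}

fn main() {
    let mut topo = NetworkTopology { peers: BTreeMap::new() };
    topo.peers.insert(1, peer(TransportType::QUIC, &[(TransportType::QUIC, 12, 5)]));
    topo.peers.insert(
        2,
        peer(TransportType::BLE, &[(TransportType::BLE, 3, 2), (TransportType::QUIC, 15, 0)]),
    );
    println!("by latency: {:?}", topo.peers_by_latency());
    println!("nearest: {:?}", topo.nearest_peer());
    println!("peers with QUIC: {}", topo.peers_with_transport(TransportType::QUIC));
}
```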

Key Terms

TransportType → Enum with four variants: WebRTC, QUIC, BLE, WiFiDirect
LatencyMetric → Per-transport EMA tracker; confidence grows with sample_count, capped at 95
preferred → The TransportType with the lowest score = latency_ms - confidence among measured links
score formula → latency_ms as i32 - confidence as i32; lower is better; subtracting confidence biases selection toward well-sampled links
NetworkTopology → Mesh-wide index of all peer transports; entry point for spanning-tree decisions

Q&A

Q: Why does reselect_transport() skip links with sample_count == 0? A: Unmeasured links carry only their prior estimates. If they competed, an untested BLE link (prior 3 ms, confidence 0, score 3) would beat a QUIC link that has actually been measured (e.g. 8 ms, confidence 2, score 6) until QUIC's confidence grew large enough to close the gap. The guard ensures only evidence-backed transports compete.

Q: Can preferred ever revert to a slower transport? A: Yes — if the fast link degrades (measured_ms rises via EMA) while a slower link accumulates high confidence, the score of the previously-fast link can exceed the slower link’s score, causing reselection. This is intentional adaptive behaviour.

Q: What is the EMA alpha and can it be tuned? A: Alpha is hardcoded to 0.2 in LatencyMetric::update() (line 81 of network.rs). Higher alpha reacts faster to spikes; lower alpha smooths more aggressively. There is currently no config knob for this.

Examples

Peer A connects to Peer B. Three BLE measurements arrive at 3, 4, 3 ms:

After sample 1: latency_ms = 0.8*3 + 0.2*3 = 3, confidence = 0.
After sample 2: latency_ms = 0.8*3 + 0.2*4 = 3.2, stored as 3 in the u32 field, confidence = 1.
After sample 3: latency_ms = 0.8*3 + 0.2*3 = 3, confidence = 2.

One WebRTC measurement at 40 ms arrives: latency_ms = 0.8*30 + 0.2*40 = 32, confidence = 0.

reselect_transport(): BLE score = 3 − 2 = 1; WebRTC score = 32 − 0 = 32 (its sample_count is now 1, so it passes the sample_count > 0 guard and competes despite confidence = 0). BLE wins. preferred = BLE.
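The example can be replayed in a few lines. This is a sketch under two labelled assumptions: the EMA is done in truncating integer arithmetic as (4*old + measured) / 5, which matches alpha = 0.2 for these inputs, and confidence is taken to be sample_count - 1 capped at 95, a rule inferred from the numbers above rather than read from network.rs. The with_prior and score helpers are hypothetical.

```rust
#[derive(Debug)]
struct LatencyMetric {
    latency_ms: u32,
    confidence: u8,
    sample_count: u32,
}

impl LatencyMetric {
    // Hypothetical constructor: start from a prior with no evidence.
    fn with_prior(prior_ms: u32) -> Self {
        LatencyMetric { latency_ms: prior_ms, confidence: 0, sample_count: 0 }
    }

    fn update(&mut self, measured_ms: u32) {
        // EMA with alpha = 0.2, in integer form: new = (4*old + measured) / 5.
        // Truncation is an assumption; the source only gives the real-valued formula.
        let blended = (self.latency_ms as u64 * 4 + measured_ms as u64) / 5;
        self.latency_ms = blended as u32;
        self.sample_count = self.sample_count.saturating_add(1);
        // Assumed rule inferred from the worked example: 0 after the first
        // sample, then +1 per sample, capped at 95.
        self.confidence = (self.sample_count - 1).min(95) as u8;
    }

    // Hypothetical helper exposing the selection score.
    fn score(&self) -> i32 {
        self.latency_ms as i32 - self.confidence as i32
    }
}

fn main() {
    let mut ble = LatencyMetric::with_prior(3);
    for ms in [3, 4, 3] {
        ble.update(ms);
    }

    let mut webrtc = LatencyMetric::with_prior(30);
    webrtc.update(40);

    println!("BLE: {:?}, score {}", ble, ble.score());
    println!("WebRTC: {:?}, score {}", webrtc, webrtc.score());
}
```

Under these assumptions BLE ends at latency_ms = 3 with confidence 2 and WebRTC at latency_ms = 32 with confidence 0, so BLE has the lower score.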

Neighbors on the Map

LatencyMetric EMA Algorithm → benchmarking how many samples are needed before a link reaches reliable status
Spanning Tree Election & Broadcast → debugging why the broadcast root keeps changing unexpectedly under topology churn
Clock Discipline & Peer Sync → diagnosing why Theorem 1 (≤8 ms delivery) is breached in a specific deployment
FNP Kubernetes Multi-Region Architecture → deploying FNP across multiple regions
In-Process Rate-Limit Bucket → investigating ingest 429s