CRUMB a card from devarno-cloud

Spanning Tree Election & Broadcast

weave advanced 8 min read

ELI5

A spanning tree election is like choosing who shouts fire-drill instructions in a building: the person closest to the exit (lowest latency) becomes the root; everyone else forms a chain from the root outward so the message travels the shortest total path. The whole seating plan is re-evaluated every 200 ms in case someone walks to a different room.

Technical Deep Dive

Class Diagram

classDiagram
class TreeNode {
+PeerID peer
+u32 latency_ms
+Option~PeerID~ parent
+Vec~PeerID~ children
+new(peer, latency_ms) TreeNode
}
class SpanningTree {
+Option~PeerID~ root
+BTreeMap~PeerID TreeNode~ nodes
+u64 last_computed
+new() SpanningTree
+add_peer(peer, latency_ms)
+update_latency(peer, latency_ms)
+recompute(current_time)
+children_of(peer) Vec~PeerID~
+depth(peer) usize
+max_path_latency() u32
}
SpanningTree --> TreeNode

Source: mesh-node/src/convergence.rs lines 93–268.

Recompute Algorithm (Prim’s)

flowchart TD
START["recompute(current_time)"]
START --> RESET["Reset parent/children for all nodes"]
RESET --> MINROOT["Root = node with min latency_ms"]
MINROOT --> PRIM["Prim loop: in_tree = [root]\nnot_in_tree = rest"]
PRIM --> EDGE["For each (tree_peer, outside_peer):\ncost = max(tree_peer.latency_ms, outside_peer.latency_ms)"]
EDGE --> MINADD["Pick (parent, child) with min cost\nAdd child to tree"]
MINADD --> DONE{not_in_tree empty?}
DONE -->|No| PRIM
DONE -->|Yes| STAMP["last_computed = current_time"]

Root election: node with the minimum latency_ms is elected root. If two nodes tie, BTreeMap ordering (by PeerID) breaks the tie deterministically.

Edge cost formula: max(tree_peer.latency_ms, outside_peer.latency_ms) — this minimises the worst-case latency on any tree edge, not the sum (which would be Dijkstra). For P2P broadcast, bottleneck minimisation is more important than total path length.

Key Operations

MethodDescription
add_peer(peer, latency_ms)Inserts a TreeNode; idempotent via or_insert_with
update_latency(peer, latency_ms)Updates latency_ms in existing node; does not recompute
recompute(ts)Full Prim’s rebuild; O(n²) in peer count
children_of(peer)Returns the broadcast fanout list for a given peer
depth(peer)Walk parent chain to count hops from root
max_path_latency()Maximum latency from root to any leaf (worst-case delivery estimate)

Rebuild Frequency

WeaveConfig::tree_recompute_interval_ms controls how often the application should call recompute():

  • default: 200 ms
  • strict: 100 ms
  • relaxed: 500 ms

SpanningTree does not self-schedule; the caller is responsible for invoking recompute() at the correct interval.

Key Terms

  • SpanningTree → Minimum spanning tree over mesh peers; root is the lowest-latency node; recomputed every tree_recompute_interval_ms
  • TreeNode → Per-peer struct storing latency_ms, parent, and children; parent/children reset at each recompute() call
  • rootOption<PeerID> — the peer with minimum latency_ms at last recompute time; elected deterministically
  • edge costmax(tree_peer.latency_ms, outside_peer.latency_ms) — minimises worst-case link on the broadcast path
  • max_path_latency → Recursive maximum latency from root to leaf; used to evaluate whether the tree satisfies Theorem 1 (≤ 8 ms)

Q&A

Q: What happens to the tree if the current root peer leaves the mesh? A: The root Option<PeerID> becomes stale — SpanningTree does not self-invalidate on peer departure. The next recompute() call will re-elect a new root from the remaining nodes. Callers must call recompute() when peer departure is detected.

Q: Why use max(latency_a, latency_b) as edge cost instead of latency_a + latency_b? A: WEAVE aims to minimise the broadcast bottleneck (Theorem 1 is about delivery time, not total bandwidth). Using max selects the tree that minimises the slowest single hop, which directly bounds the P99 delivery latency for messages flowing from root outward.

Q: Is recompute() safe to call concurrently with children_of()? A: No — recompute() mutably borrows self (through nodes.get_mut()) and rebuilds the entire tree. Concurrent calls require external synchronisation (e.g., a RwLock<SpanningTree>).

Examples

Three peers with latencies [5, 10, 8] ms (from convergence.rs test at lines 317–326):

After recompute():

  • Root = PeerID(1) (latency 5 ms — minimum).
  • PeerID(3) (8 ms) attaches to PeerID(1) via cost max(5, 8) = 8.
  • PeerID(2) (10 ms) attaches to PeerID(1) via cost max(5, 10) = 10.
  • Tree: 1 → [3, 2].

max_path_latency() = 10 (worst leaf). Under default WeaveConfig (p99 ≤ 8 ms), this tree would fail Theorem 1 on the PeerID(2) branch — the application should reject PeerID(2) or switch to a faster transport.

neighbors on the map