CRUMB a card from devarno-cloud

LCB DAG State Machine

weave intermediate 7 min read

ELI5

A DAG state machine is a bucket-sort line in a parcel depot: each parcel has a list of parcels it depends on arriving first. When a dependency clears, the dependent parcel slides into the “ready to dispatch” belt. DAGState is that depot — delivery_queue is the belt.

Technical Deep Dive

State Transitions

stateDiagram-v2
[*] --> Unknown
Unknown --> Pending: add_message (parents > 0)
Unknown --> Ready: add_message (parents = 0)
Pending --> Ready: satisfy_dependency() → dependencies_remaining = 0
Ready --> Delivered: deliver_message()
Delivered --> [*]

add_message Path (Algorithm 1, lines 1–10)

sequenceDiagram
participant Caller
participant DAGState
participant DAGNode
Caller->>DAGState: add_message(msg, received_at)
DAGState->>DAGState: check messages.contains_key(msg_id)
alt duplicate
DAGState-->>Caller: Err("Duplicate message")
else new
DAGState->>DAGNode: new(msg, received_at)
Note right of DAGNode: dependencies_remaining = parents.len()
DAGState->>DAGState: messages.insert(msg_id, node)
alt parents.len() == 0
DAGState->>DAGState: delivery_queue.push_back(msg_id)
end
DAGState->>DAGState: lamport_clock = max(lamport_clock, msg.lamport) + 1
DAGState-->>Caller: Ok(())
end

Source: mesh-node/src/proto.rs lines 148–175.

deliver_message Cascade

When deliver_message(msg_id) is called:

  1. Node is marked delivered = true.
  2. Every other node whose parents contains msg_id calls satisfy_dependency().
  3. Any node that reaches dependencies_remaining == 0 is pushed onto delivery_queue.

This cascade is O(n) in the number of messages currently in the DAG — acceptable for typical collaboration sessions but a performance consideration for very large, deep chains (see weave-005).

Lamport Clock Update Rule

lamport_clock = max(lamport_clock, msg.lamport) + 1

Applied inside add_message. The clock is used for causal ordering tiebreaking; it does not replace the physical clock used by convergence.rs for the 8 ms delivery guarantee.

Key API

MethodReturnsContract
add_message(msg, ts)Result<(), String>Idempotent reject on duplicate
deliver_message(id)Result<LCBMessage, String>Error if not ready
next_delivery()Option<LCBMessage>Pops queue, delivers head
get_deliverable()Vec<MsgID>All nodes with ready_to_deliver() true
size() / delivered_count()usizeDiagnostic counters

Key Terms

  • DAGNode → Wrapper around LCBMessage; tracks received_at, delivered flag, and dependencies_remaining
  • delivery_queueVecDeque<MsgID>; FIFO buffer of messages cleared for application delivery
  • dependencies_remaining → Counter initialised to parents.len(); decremented by satisfy_dependency() each time a parent is delivered
  • satisfy_dependency → Decrements dependencies_remaining; called by deliver_message on all child nodes of a newly delivered node
  • Algorithm 1 → The LCB core loop specified in lib.rs doc comment; DAGState is its direct implementation

Q&A

Q: What happens if a parent message never arrives (permanent network partition)? A: dependencies_remaining stays > 0 indefinitely and the child never enters delivery_queue. LC5 (Liveness, ≤ 300 ms bound) requires that the parent arrives within max_delivery_ms; the transport layer is responsible for triggering a timeout and delivering via alternate route. DAGState itself has no timeout logic — it trusts the caller.

Q: Can next_delivery() return None even when messages exist? A: Yes — if every message in the DAG has at least one undelivered parent. Callers should check delivery_queue.is_empty() before polling to avoid busy-looping.

Q: Why use BTreeMap instead of HashMap for messages? A: BTreeMap<MsgID, DAGNode> gives deterministic iteration order (important for the deliver_message cascade loop at lines 194–201), which prevents any ordering-dependent bugs across different runtime environments.

Examples

Three messages arrive in network order B, A, C where A is the root, B depends on A, and C depends on B:

  1. add_message(B)dependencies_remaining = 1, not queued.
  2. add_message(A)dependencies_remaining = 0, queued immediately.
  3. add_message(C)dependencies_remaining = 1, not queued.
  4. next_delivery() pops A, calls deliver_message(A), cascades to B → dependencies_remaining = 0, B queued.
  5. next_delivery() pops B, cascades to C → C queued.
  6. next_delivery() pops C.

Result: delivered order A → B → C regardless of arrival order.

neighbors on the map