CRUMB a card from devarno-cloud

Temporal DAG Data Model

tnp intermediate 7 min read

ELI5

The DAG is a family tree where every operation is a person, parents are earlier operations, and a “fork” is any parent who had two or more children — i.e., two competing branches of history both trace back to that ancestor.

Technical Deep Dive

Class Diagram

classDiagram
class NodeID {
+bytes: Vec~u8~
+new(bytes: Vec~u8~) NodeID
}
class DAGNode {
+id: NodeID
+operation: Vec~u8~
+state_hash: Vec~u8~
+parents: Vec~NodeID~
+timestamp_ms: u64
+replica_id: u64
+new(id, operation, state_hash, replica_id) DAGNode
}
class TemporalDAG {
-nodes: HashMap~NodeID, DAGNode~
-children: HashMap~NodeID, Vec~NodeID~~
-roots: Vec~NodeID~
+add_node(op, hash) Result~NodeID~
+add_edge(parent, child) Result
+topological_sort() Result~Vec~NodeID~~
+detect_forks() Vec~NodeID~
+longest_path_length() usize
+node_count() usize
+roots() Vec~NodeID~
}
TemporalDAG "1" --> "0..*" DAGNode : stores
DAGNode "1" --> "1" NodeID : has id
DAGNode "0..*" --> "0..*" NodeID : parents

NodeID Construction

add_node derives the NodeID directly from the operation bytes:

let node_id = NodeID::new(operation.clone());

This means two calls with identical operation bytes return the same NodeID. The DAG treats this as a duplicate and returns early (Ok(node_id)) without inserting a second node — a built-in idempotency guard.

Root Tracking

The first node ever inserted becomes a root. Subsequent nodes added via add_edge are removed from self.roots if they acquire a parent:

roots.retain(|id| id != &child)

Topological Sort (Kahn’s Algorithm)

flowchart LR
INIT["Compute in-degree\nfor every node"] --> QUEUE["Queue all\nzero-in-degree roots"]
QUEUE --> PROCESS["Pop node → append to sorted\nDecrement children's in-degree"]
PROCESS --> CHECK{"child.in_degree == 0?"}
CHECK -- yes --> QUEUE
CHECK -- no --> PROCESS
PROCESS --> DONE{"queue empty?"}
DONE -- "sorted.len == nodes.len" --> OK["Ok(sorted)"]
DONE -- "sorted.len < nodes.len" --> ERR["Err: Cyclic dependency"]

Fork Detection

pub fn detect_forks(&self) -> Vec<NodeID> {
self.children
.iter()
.filter(|(_, children)| children.len() > 1)
.map(|(parent, _)| parent.clone())
.collect()
}

Any node with two or more children in self.children is classified as a fork origin. The result feeds ForkAnalyzer::analyze to compute a fork-probability ratio.

Longest Path

Computed via depth-first search from each root with a HashSet<NodeID> visited guard. Returns 0 for an empty DAG. Used by ForkAnalyzer::estimate_storage_overhead to model canonical operation count.

Key Terms

  • NodeID → wrapper around Vec<u8>; derived from operation bytes; doubles as dedup key
  • DAGNode → graph vertex; carries operation, state_hash, parents, timestamp_ms, and replica_id
  • roots → nodes with no parents; maintained by add_edge via retain
  • detect_forks → returns nodes whose children map entry has len > 1

Q&A

Q: What happens if add_edge is called with a parent or child ID that is not in self.nodes? A: Returns Err("Node not found".to_string()).

Q: Can replica_id carry meaningful peer data? A: As of the current source, DAGNode::new hardcodes replica_id: 0; the field is present for future use but carries no peer identity today.

Q: Does longest_path_length handle diamond-shaped merges (a node with two parents)? A: The visited guard prevents revisiting a node, so the DFS may under-count depth on diamonds — it stops the second time it encounters an already-visited merge node. (Inferred from the DFS implementation; no test covers the diamond case.)

Examples

let mut dag = TemporalDAG::new();
let id1 = dag.add_node(b"op1".to_vec(), vec![1]).unwrap();
let id2 = dag.add_node(b"branch-a".to_vec(), vec![2]).unwrap();
let id3 = dag.add_node(b"branch-b".to_vec(), vec![3]).unwrap();
dag.add_edge(id1.clone(), id2).unwrap();
dag.add_edge(id1.clone(), id3).unwrap();
// id1 is a fork origin
assert_eq!(dag.detect_forks().len(), 1);
// topological sort visits id1 before id2 and id3
let sorted = dag.topological_sort().unwrap();
assert_eq!(sorted[0], id1);

neighbors on the map