Temporal DAG Data Model
tnp intermediate 7 min read
ELI5
The DAG is a family tree where every operation is a person, parents are earlier operations, and a “fork” is any parent who had two or more children — i.e., two competing branches of history both trace back to that ancestor.
Technical Deep Dive
Class Diagram
classDiagram class NodeID { +bytes: Vec~u8~ +new(bytes: Vec~u8~) NodeID } class DAGNode { +id: NodeID +operation: Vec~u8~ +state_hash: Vec~u8~ +parents: Vec~NodeID~ +timestamp_ms: u64 +replica_id: u64 +new(id, operation, state_hash, replica_id) DAGNode } class TemporalDAG { -nodes: HashMap~NodeID, DAGNode~ -children: HashMap~NodeID, Vec~NodeID~~ -roots: Vec~NodeID~ +add_node(op, hash) Result~NodeID~ +add_edge(parent, child) Result +topological_sort() Result~Vec~NodeID~~ +detect_forks() Vec~NodeID~ +longest_path_length() usize +node_count() usize +roots() Vec~NodeID~ } TemporalDAG "1" --> "0..*" DAGNode : stores DAGNode "1" --> "1" NodeID : has id DAGNode "0..*" --> "0..*" NodeID : parentsNodeID Construction
add_node derives the NodeID directly from the operation bytes:
let node_id = NodeID::new(operation.clone());This means two calls with identical operation bytes return the same NodeID. The DAG treats this as a duplicate and returns early (Ok(node_id)) without inserting a second node — a built-in idempotency guard.
Root Tracking
The first node ever inserted becomes a root. Subsequent nodes added via add_edge are removed from self.roots if they acquire a parent:
roots.retain(|id| id != &child)Topological Sort (Kahn’s Algorithm)
flowchart LR INIT["Compute in-degree\nfor every node"] --> QUEUE["Queue all\nzero-in-degree roots"] QUEUE --> PROCESS["Pop node → append to sorted\nDecrement children's in-degree"] PROCESS --> CHECK{"child.in_degree == 0?"} CHECK -- yes --> QUEUE CHECK -- no --> PROCESS PROCESS --> DONE{"queue empty?"} DONE -- "sorted.len == nodes.len" --> OK["Ok(sorted)"] DONE -- "sorted.len < nodes.len" --> ERR["Err: Cyclic dependency"]Fork Detection
pub fn detect_forks(&self) -> Vec<NodeID> { self.children .iter() .filter(|(_, children)| children.len() > 1) .map(|(parent, _)| parent.clone()) .collect()}Any node with two or more children in self.children is classified as a fork origin. The result feeds ForkAnalyzer::analyze to compute a fork-probability ratio.
Longest Path
Computed via depth-first search from each root with a HashSet<NodeID> visited guard. Returns 0 for an empty DAG. Used by ForkAnalyzer::estimate_storage_overhead to model canonical operation count.
Key Terms
- NodeID → wrapper around
Vec<u8>; derived from operation bytes; doubles as dedup key - DAGNode → graph vertex; carries
operation,state_hash,parents,timestamp_ms, andreplica_id - roots → nodes with no parents; maintained by
add_edgeviaretain - detect_forks → returns nodes whose
childrenmap entry haslen > 1
Q&A
Q: What happens if add_edge is called with a parent or child ID that is not in self.nodes?
A: Returns Err("Node not found".to_string()).
Q: Can replica_id carry meaningful peer data?
A: As of the current source, DAGNode::new hardcodes replica_id: 0; the field is present for future use but carries no peer identity today.
Q: Does longest_path_length handle diamond-shaped merges (a node with two parents)?
A: The visited guard prevents revisiting a node, so the DFS may under-count depth on diamonds — it stops the second time it encounters an already-visited merge node. (Inferred from the DFS implementation; no test covers the diamond case.)
Examples
let mut dag = TemporalDAG::new();let id1 = dag.add_node(b"op1".to_vec(), vec![1]).unwrap();let id2 = dag.add_node(b"branch-a".to_vec(), vec![2]).unwrap();let id3 = dag.add_node(b"branch-b".to_vec(), vec![3]).unwrap();dag.add_edge(id1.clone(), id2).unwrap();dag.add_edge(id1.clone(), id3).unwrap();
// id1 is a fork originassert_eq!(dag.detect_forks().len(), 1);// topological sort visits id1 before id2 and id3let sorted = dag.topological_sort().unwrap();assert_eq!(sorted[0], id1);neighbors on the map
- FNP Lamport Clocks & Causal Ordering understanding causal ordering in distributed systems
- Hybrid Logical Clock deciding ordering between concurrent operations
- Timeline Reconstruction implementing a new timeline UI control