CRUMB a card from devarno-cloud

Fork Detection & Storage Analysis

tnp intermediate 6 min read

ELI5

The fork analyzer is a building inspector who walks the operation family tree, counts every parent with two children (that’s a fork), and then tells the landlord: “you’re using 3× the space you need — time to demolish some abandoned branches.”

Technical Deep Dive

ForkAnalyzer API

pub struct ForkAnalyzer; // zero-cost; instantiate with ForkAnalyzer::new()
pub fn analyze(&self, dag: &TemporalDAG) -> f64
pub fn get_fork_info(&self, dag: &TemporalDAG) -> Vec<ForkInfo>
pub fn estimate_storage_overhead(&self, dag: &TemporalDAG, timeline_count: usize) -> f64
pub fn prune_recommendation(&self, dag: &TemporalDAG, max_timelines: usize, active_timelines: usize) -> bool

Fork Probability

P(fork) = forks.len() / node_count.max(1)
  • Returns 0.0 for an empty DAG (no division-by-zero).
  • Forks are nodes whose entry in children has len > 1 (from detect_forks).

ForkInfo Fields

FieldTypeDescription
fork_nodeNodeIDThe parent with multiple children
branch_countusizeNumber of children (always ≥ 2)
branch_depthsVec<usize>DFS depth of each child’s subtree
timestamp_msu64Wall-clock of the fork node

Storage Overhead Model

flowchart LR
CANON["canonical_ops\n= DAG.longest_path_length()"]
BRANCH["branch_ops\n= canonical × 0.3\n(30% delta estimate)"]
TOTAL["total_storage\n= canonical + branch_ops × (N-1)"]
RATIO["overhead_ratio\n= total / canonical"]
CANON --> BRANCH
CANON --> TOTAL
BRANCH --> TOTAL
TOTAL --> RATIO

For a single timeline (N == 1), the function returns 1.0. The 30% delta assumption models ideal delta-encoding; real overhead depends on branch divergence depth.

Pruning Decision Logic

pub fn prune_recommendation(&self, dag, max_timelines, active_timelines) -> bool {
if active_timelines >= max_timelines { return true; }
let fork_ratio = forks.len() as f64 / node_count as f64;
fork_ratio > 0.1 // >10% fork ratio triggers recommendation
}

Two independent triggers:

  1. Capacity: active_timelines >= max_timelines
  2. Shape: fork ratio exceeds 10%

Branch Depth DFS

branch_depth is a recursive DFS that measures the longest chain below each fork child. Combined with ForkInfo.branch_depths, callers can identify asymmetric forks where one branch is deep and another is a shallow dead-end.

Key Terms

  • ForkAnalyzer → stateless struct; all methods take a &TemporalDAG reference
  • fork probabilityforks / nodes ratio; used to surface divergence pressure
  • estimate_storage_overhead → models total bytes as canonical + 0.3 × canonical × (N-1) branches
  • prune_recommendation → boolean threshold function; true on capacity saturation OR fork_ratio > 0.1

Q&A

Q: Is ForkAnalyzer used during normal operation or only on demand? A: On demand — TemporalNavigationProtocol::fork_probability() creates a new ForkAnalyzer per call. There is no background monitoring loop in the current source.

Q: Does branch_depth visit shared subgraphs multiple times? A: Yes — there is no visited-set guard in branch_depth, only in TemporalDAG::dfs_depth. For a deep diamond graph this could be expensive. (Observed from fork.rs source.)

Q: What delta fraction does estimate_storage_overhead assume? A: 30% (0.3). This is a hardcoded constant in the function body; no configuration knob exists.

Examples

let mut dag = TemporalDAG::new();
let root = dag.add_node(b"root".to_vec(), vec![0]).unwrap();
let a = dag.add_node(b"branch-a".to_vec(), vec![1]).unwrap();
let b = dag.add_node(b"branch-b".to_vec(), vec![2]).unwrap();
dag.add_edge(root, a).unwrap();
dag.add_edge(root, b).unwrap();
let analyzer = ForkAnalyzer::new();
let prob = analyzer.analyze(&dag); // 1/3 ≈ 0.333
let info = analyzer.get_fork_info(&dag);
// info[0].branch_count == 2
// info[0].branch_depths == [1, 1]
let overhead = analyzer.estimate_storage_overhead(&dag, 3);
// canonical = 1 (longest path), total = 1 + 0.3×2 = 1.6, ratio = 1.6

neighbors on the map