CRUMB a card from devarno-cloud

Delta Compression & Merkle Chunking

tnp intermediate 7 min read

ELI5

Delta compression is like packing for a trip: instead of two full suitcases, you pack one complete bag (canonical) plus a small bag of only the extras (deltas). When you land, you unzip both bags together to reconstruct the full wardrobe.

Technical Deep Dive

Class Diagram

classDiagram
class DeltaEncoder {
-canonical_ops: Vec~NodeID~
-delta_cache: HashMap~String, Vec~NodeID~~
+new(canonical_ops: Vec~NodeID~) DeltaEncoder
+encode_delta(timeline_id: &str, ops: &[NodeID]) Vec~NodeID~
+decode_delta(timeline_id: &str) Result~Vec~NodeID~~
+compression_ratio(timeline_id: &str, original_size: usize) f64
}
class CompressionStats {
+timeline_id: String
+original_size_bytes: usize
+encoded_size_bytes: usize
+compression_ratio: f64
+delta_count: usize
+canonical_count: usize
+savings_percentage() f64
+bytes_saved() usize
}
class MerkleChunker {
-chunk_size: usize
-tree_depth: usize
+new(chunk_size: usize) MerkleChunker
+chunk(data: &[u8]) Vec~Vec~u8~~
+compute_tree_depth(leaf_count: usize)
+get_proof_path(chunk_index: usize) Vec~usize~
}

Encode/Decode Pipeline

flowchart LR
BRANCH["branch ops\n(full node list)"] --> ENC["encode_delta:\nfilter out canonical_ops"]
ENC --> CACHE["delta_cache\n{timeline_id → Vec<NodeID>}"]
CACHE --> DEC["decode_delta:\ncanonical_ops ++ deltas"]
DEC --> FULL["reconstructed full\nnode list"]

encode_delta retains only nodes not present in canonical_ops (linear scan via Vec::contains). The result is stored in delta_cache keyed by a caller-supplied timeline_id string.

compression_ratio Formula

let encoded_size = self.canonical_ops.len() * 32 + deltas.len() * 32;
encoded_size as f64 / original_size as f64
  • Assumes each NodeID occupies 32 bytes (SHA-256 length convention; actual NodeID.bytes can be any length).
  • Does not account for the overhead of the delta_cache HashMap itself.
  • Returns 1.0 if timeline_id is not in the cache (no savings).

CompressionStats Helpers

MethodFormula
compression_ratioencoded_size / original_size.max(1)
savings_percentage(1 - ratio) × 100, floored at 0
bytes_savedoriginal.saturating_sub(encoded)

MerkleChunker

Splits raw bytes into fixed-size chunks and computes a binary proof path:

// Default chunk size
MerkleChunker::default() → chunk_size: 4096
// Tree depth = ceil(log2(leaf_count))
pub fn compute_tree_depth(&mut self, leaf_count: usize) {
self.tree_depth = (leaf_count as f64).log2().ceil() as usize;
}
// Proof path: repeated right-shift
pub fn get_proof_path(&self, chunk_index: usize) -> Vec<usize> {
// idx >>= 1 at each tree level
}

The proof path returns sibling indices at each level, not hashes — callers supply the actual hash values.

Key Terms

  • delta_cache → in-memory HashMap<String, Vec<NodeID>>; not persisted across process restarts
  • encode_delta → returns branch-exclusive nodes (those not in canonical_ops)
  • decode_deltacanonical_ops ++ cached_deltas; errors if timeline_id absent
  • MerkleChunker → chunks bytes and generates binary proof paths; default chunk 4096 bytes

Q&A

Q: What happens if decode_delta is called for a timeline that was never encoded? A: Returns Err(format!("Delta not found for timeline: {}", timeline_id)).

Q: Does DeltaEncoder verify that canonical_ops are a true prefix of the branch, or just a set intersection? A: Set intersection — it filters by Vec::contains regardless of positional ordering. A branch node that matches a canonical node anywhere in the list is excluded from deltas.

Q: Is MerkleChunker used by DeltaEncoder or TemporalNavigationProtocol? A: Neither — it is defined in compression.rs alongside DeltaEncoder but not wired into any higher-level struct in the current source.

Examples

let canonical = vec![NodeID::new(b"c1".to_vec()), NodeID::new(b"c2".to_vec())];
let mut encoder = DeltaEncoder::new(canonical);
let branch_ops = vec![
NodeID::new(b"c1".to_vec()),
NodeID::new(b"branch-only".to_vec()),
];
let deltas = encoder.encode_delta("my-branch", &branch_ops);
// deltas = [NodeID("branch-only")] — c1 filtered out
let full = encoder.decode_delta("my-branch").unwrap();
// full = [c1, c2, branch-only]
let stats = CompressionStats::new("my-branch".to_string(), 3*32, 2*32+1*32, 1, 2);
// compression_ratio = 1.0 (encoded == original here)
// savings_percentage = 0.0

neighbors on the map