Delta Compression & Merkle Chunking
tnp intermediate 7 min read
ELI5
Delta compression is like packing for a trip: instead of two full suitcases, you pack one complete bag (canonical) plus a small bag of only the extras (deltas). When you land, you unzip both bags together to reconstruct the full wardrobe.
Technical Deep Dive
Class Diagram
classDiagram class DeltaEncoder { -canonical_ops: Vec~NodeID~ -delta_cache: HashMap~String, Vec~NodeID~~ +new(canonical_ops: Vec~NodeID~) DeltaEncoder +encode_delta(timeline_id: &str, ops: &[NodeID]) Vec~NodeID~ +decode_delta(timeline_id: &str) Result~Vec~NodeID~~ +compression_ratio(timeline_id: &str, original_size: usize) f64 } class CompressionStats { +timeline_id: String +original_size_bytes: usize +encoded_size_bytes: usize +compression_ratio: f64 +delta_count: usize +canonical_count: usize +savings_percentage() f64 +bytes_saved() usize } class MerkleChunker { -chunk_size: usize -tree_depth: usize +new(chunk_size: usize) MerkleChunker +chunk(data: &[u8]) Vec~Vec~u8~~ +compute_tree_depth(leaf_count: usize) +get_proof_path(chunk_index: usize) Vec~usize~ }Encode/Decode Pipeline
flowchart LR BRANCH["branch ops\n(full node list)"] --> ENC["encode_delta:\nfilter out canonical_ops"] ENC --> CACHE["delta_cache\n{timeline_id → Vec<NodeID>}"] CACHE --> DEC["decode_delta:\ncanonical_ops ++ deltas"] DEC --> FULL["reconstructed full\nnode list"]encode_delta retains only nodes not present in canonical_ops (linear scan via Vec::contains). The result is stored in delta_cache keyed by a caller-supplied timeline_id string.
compression_ratio Formula
let encoded_size = self.canonical_ops.len() * 32 + deltas.len() * 32;encoded_size as f64 / original_size as f64- Assumes each
NodeIDoccupies 32 bytes (SHA-256 length convention; actualNodeID.bytescan be any length). - Does not account for the overhead of the
delta_cacheHashMap itself. - Returns
1.0iftimeline_idis not in the cache (no savings).
CompressionStats Helpers
| Method | Formula |
|---|---|
compression_ratio | encoded_size / original_size.max(1) |
savings_percentage | (1 - ratio) × 100, floored at 0 |
bytes_saved | original.saturating_sub(encoded) |
MerkleChunker
Splits raw bytes into fixed-size chunks and computes a binary proof path:
// Default chunk sizeMerkleChunker::default() → chunk_size: 4096
// Tree depth = ceil(log2(leaf_count))pub fn compute_tree_depth(&mut self, leaf_count: usize) { self.tree_depth = (leaf_count as f64).log2().ceil() as usize;}
// Proof path: repeated right-shiftpub fn get_proof_path(&self, chunk_index: usize) -> Vec<usize> { // idx >>= 1 at each tree level}The proof path returns sibling indices at each level, not hashes — callers supply the actual hash values.
Key Terms
- delta_cache → in-memory
HashMap<String, Vec<NodeID>>; not persisted across process restarts - encode_delta → returns branch-exclusive nodes (those not in
canonical_ops) - decode_delta →
canonical_ops ++ cached_deltas; errors if timeline_id absent - MerkleChunker → chunks bytes and generates binary proof paths; default chunk 4096 bytes
Q&A
Q: What happens if decode_delta is called for a timeline that was never encoded?
A: Returns Err(format!("Delta not found for timeline: {}", timeline_id)).
Q: Does DeltaEncoder verify that canonical_ops are a true prefix of the branch, or just a set intersection?
A: Set intersection — it filters by Vec::contains regardless of positional ordering. A branch node that matches a canonical node anywhere in the list is excluded from deltas.
Q: Is MerkleChunker used by DeltaEncoder or TemporalNavigationProtocol?
A: Neither — it is defined in compression.rs alongside DeltaEncoder but not wired into any higher-level struct in the current source.
Examples
let canonical = vec![NodeID::new(b"c1".to_vec()), NodeID::new(b"c2".to_vec())];let mut encoder = DeltaEncoder::new(canonical);
let branch_ops = vec![ NodeID::new(b"c1".to_vec()), NodeID::new(b"branch-only".to_vec()),];let deltas = encoder.encode_delta("my-branch", &branch_ops);// deltas = [NodeID("branch-only")] — c1 filtered out
let full = encoder.decode_delta("my-branch").unwrap();// full = [c1, c2, branch-only]
let stats = CompressionStats::new("my-branch".to_string(), 3*32, 2*32+1*32, 1, 2);// compression_ratio = 1.0 (encoded == original here)// savings_percentage = 0.0neighbors on the map
- TNP System Architecture Overview orienting a new contributor to the tnp-node codebase
- Fork Detection & Storage Analysis deciding whether to prune stale branches before storage overhead exceeds the configured cap
- FNP Cost Optimization & Karpenter optimizing cloud infrastructure costs