Delta Compression & Merkle Chunking

tnp intermediate 7 min read

ELI5

Delta compression is like packing for a trip: instead of two full suitcases, you pack one complete bag (canonical) plus a small bag of only the extras (deltas). When you land, you unzip both bags together to reconstruct the full wardrobe.

Technical Deep Dive

Class Diagram

classDiagram
    class DeltaEncoder {
        -canonical_ops: Vec~NodeID~
        -delta_cache: HashMap~String, Vec~NodeID~~
        +new(canonical_ops: Vec~NodeID~) DeltaEncoder
        +encode_delta(timeline_id: &str, ops: &[NodeID]) Vec~NodeID~
        +decode_delta(timeline_id: &str) Result~Vec~NodeID~~
        +compression_ratio(timeline_id: &str, original_size: usize) f64
    }
    class CompressionStats {
        +timeline_id: String
        +original_size_bytes: usize
        +encoded_size_bytes: usize
        +compression_ratio: f64
        +delta_count: usize
        +canonical_count: usize
        +savings_percentage() f64
        +bytes_saved() usize
    }
    class MerkleChunker {
        -chunk_size: usize
        -tree_depth: usize
        +new(chunk_size: usize) MerkleChunker
        +chunk(data: &[u8]) Vec~Vec~u8~~
        +compute_tree_depth(leaf_count: usize)
        +get_proof_path(chunk_index: usize) Vec~usize~
    }

Encode/Decode Pipeline

flowchart LR
    BRANCH["branch ops\n(full node list)"] --> ENC["encode_delta:\nfilter out canonical_ops"]
    ENC --> CACHE["delta_cache\n{timeline_id → Vec<NodeID>}"]
    CACHE --> DEC["decode_delta:\ncanonical_ops ++ deltas"]
    DEC --> FULL["reconstructed full\nnode list"]

encode_delta retains only nodes not present in canonical_ops (linear scan via Vec::contains). The result is stored in delta_cache keyed by a caller-supplied timeline_id string.

compression_ratio Formula

let encoded_size = self.canonical_ops.len() * 32 + deltas.len() * 32;
encoded_size as f64 / original_size as f64

Assumes each NodeID occupies 32 bytes (SHA-256 length convention; actual NodeID.bytes can be any length).
Does not account for the overhead of the delta_cache HashMap itself.
Returns 1.0 if timeline_id is not in the cache (no savings).

CompressionStats Helpers

Method	Formula
`compression_ratio`	`encoded_size / original_size.max(1)`
`savings_percentage`	`(1 - ratio) × 100`, floored at 0
`bytes_saved`	`original.saturating_sub(encoded)`

MerkleChunker

Splits raw bytes into fixed-size chunks and computes a binary proof path:

// Default chunk size
MerkleChunker::default() → chunk_size: 4096

// Tree depth = ceil(log2(leaf_count))
pub fn compute_tree_depth(&mut self, leaf_count: usize) {
    self.tree_depth = (leaf_count as f64).log2().ceil() as usize;
}

// Proof path: repeated right-shift
pub fn get_proof_path(&self, chunk_index: usize) -> Vec<usize> {
    // idx >>= 1 at each tree level
}

The proof path returns sibling indices at each level, not hashes — callers supply the actual hash values.

Key Terms

delta_cache → in-memory HashMap<String, Vec<NodeID>>; not persisted across process restarts
encode_delta → returns branch-exclusive nodes (those not in canonical_ops)
decode_delta → canonical_ops ++ cached_deltas; errors if timeline_id absent
MerkleChunker → chunks bytes and generates binary proof paths; default chunk 4096 bytes

Q&A

Q: What happens if decode_delta is called for a timeline that was never encoded? A: Returns Err(format!("Delta not found for timeline: {}", timeline_id)).

Q: Does DeltaEncoder verify that canonical_ops are a true prefix of the branch, or just a set intersection? A: Set intersection — it filters by Vec::contains regardless of positional ordering. A branch node that matches a canonical node anywhere in the list is excluded from deltas.

Q: Is MerkleChunker used by DeltaEncoder or TemporalNavigationProtocol? A: Neither — it is defined in compression.rs alongside DeltaEncoder but not wired into any higher-level struct in the current source.

Examples

let canonical = vec![NodeID::new(b"c1".to_vec()), NodeID::new(b"c2".to_vec())];
let mut encoder = DeltaEncoder::new(canonical);

let branch_ops = vec![
    NodeID::new(b"c1".to_vec()),
    NodeID::new(b"branch-only".to_vec()),
];
let deltas = encoder.encode_delta("my-branch", &branch_ops);
// deltas = [NodeID("branch-only")]  — c1 filtered out

let full = encoder.decode_delta("my-branch").unwrap();
// full = [c1, c2, branch-only]

let stats = CompressionStats::new("my-branch".to_string(), 3*32, 2*32+1*32, 1, 2);
// compression_ratio = 1.0 (encoded == original here)
// savings_percentage = 0.0

neighbors on the map

TNP System Architecture Overview orienting a new contributor to the tnp-node codebase
Fork Detection & Storage Analysis deciding whether to prune stale branches before storage overhead exceeds the configured cap
FNP Cost Optimization & Karpenter optimizing cloud infrastructure costs