CRUMB a card from devarno-cloud

Automerge IPLD CRDT Integration

weave advanced 7 min read

ELI5

AutomergeState is a recipe card box where each recipe card (Operation) has a unique barcode (MsgID). Merging two boxes just means combining them without duplicates. The box has a running checksum (state_hash) that changes whenever a card is added, so two boxes that merged everything will have the same checksum.

Technical Deep Dive

Class Diagram

classDiagram
class Operation {
+MsgID id
+u8 op_type
+Vec~u8~ payload
+Option~MsgID~ parent
+u64 lamport
+new(id, op_type, payload, parent, lamport) Operation
+to_bytes() Vec~u8~
}
class AutomergeState {
+BTreeMap~MsgID Operation~ operations
+[u8; 32] state_hash
+u32 edit_count
+new() AutomergeState
+add_operation(op) Result
+merge(other) Result
+get_operation(id) Option~Operation~
+get_state() (u32, [u8;32])
+contains(id) bool
+operation_count() usize
}
AutomergeState --> Operation
Operation --> MsgID

Source: mesh-node/src/crdt.rs.

Operation Serialisation (to_bytes layout)

[0..31] id bytes (32 bytes, MsgID)
[32] op_type (1 byte)
[33..] payload (variable)
[+1] has_parent flag (0 or 1)
[+32?] parent bytes (32 bytes, present only if has_parent = 1)
[last 8] lamport (u64, little-endian)

Source: crdt.rs lines 37–51.

Merge & State Hash Sequence

sequenceDiagram
participant NodeA as AutomergeState A
participant NodeB as AutomergeState B
NodeA->>NodeA: add_operation(op1)
NodeA->>NodeA: recompute_state_hash()
NodeB->>NodeB: add_operation(op2)
NodeB->>NodeB: recompute_state_hash()
NodeA->>NodeA: merge(&NodeB)
Note right of NodeA: For each op in B.operations:\n if not in A → add_operation(op)
NodeA->>NodeA: recompute_state_hash() [called by add_operation]
NodeA-->>NodeB: (state_hash now matches after B.merge(&A))

state_hash Algorithm

recompute_state_hash() is NOT a standard cryptographic hash. It is an additive accumulator over BTreeMap iteration order:

for (i, (_id, op)) in self.operations.iter().enumerate() {
let op_bytes = op.to_bytes();
for (j, &byte) in op_bytes.iter().enumerate() {
hasher[(i + j) % 32] = hasher[(i + j) % 32].wrapping_add(byte);
}
}

Implications:

  • Not collision-resistant: different operation sets could produce the same hash.
  • Deterministic: BTreeMap iteration is sorted by MsgID (Ord-derived), so the same operation set always produces the same hash.
  • Intent: integrity fingerprint for quick equality checks, not a security primitive.

CRDT Merge Semantics

merge(&other) is a union merge: for each (id, op) in other.operations, if id is not in self.operations, call add_operation(op). Already-present operations are silently skipped (idempotent). This satisfies CRDT commutativity and idempotence properties.

Key Terms

  • Operation → A single Automerge change: id (MsgID), op_type (u8), payload, optional parent, lamport
  • AutomergeState → CRDT document state: union of all received operations indexed by MsgID
  • state_hash → 32-byte rolling accumulator over all operations in BTreeMap order; not cryptographically secure
  • edit_count → Monotonically increasing counter of total operations ever added (including from merges)
  • merge → Union merge: adds operations from other that are absent in self; idempotent and commutative

Q&A

Q: Is state_hash suitable for use as a Byzantine integrity check? A: No. The additive accumulator (wrapping_add) is not collision-resistant. Two different operation sets could produce the same hash. For Byzantine integrity, callers should use the MsgID set directly (compare operations.keys()) or apply BLAKE3 over the serialised operation set.

Q: What does op_type represent and where is it decoded? A: op_type is a u8 that is serialised into to_bytes() but has no enum definition in crdt.rs. It is passed opaquely from the application layer — the CRDT integration layer treats it as an identifier tag for the caller to interpret.

Q: Does merge() guarantee both peers reach the same state_hash? A: Yes, once both have merged all of each other’s operations. Since BTreeMap iterates in MsgID order (deterministic), and the accumulator formula is deterministic, any two AutomergeStates containing the same set of operations will compute identical state_hash values.

Examples

Round-trip merge test pattern (from crdt.rs lines 187–196):

state1.add_operation(create_test_op(4, 0, None)).unwrap(); // op id [4;32]
state2.add_operation(create_test_op(5, 0, None)).unwrap(); // op id [5;32]
state1.merge(&state2).unwrap();
assert_eq!(state1.operation_count(), 2);

After state2.merge(&state1), both states have operations {[4;32], [5;32]} and identical state_hash. The hash equality check in the determinism test (crdt.rs lines 177–183) confirms this property.

neighbors on the map