CRUMB a card from devarno-cloud

Blake3 5-Stage Canonical Fingerprinting (DEC-005)

iris advanced 7 min read

ELI5

Imagine you want to give every person a unique ID number based on everything about them — but you don’t include their ID card number or the date they got it, because those change. You write down their name, skills, and job description in the exact same order every time, convert it to a special code, and run it through a super-fast math machine called Blake3. The result is a 64-character fingerprint that proves “this is exactly this person.”

Technical Deep Dive

The 5-Stage Canonical Pipeline (DEC-005)

flowchart LR
A["Stage 1<br/>YAML/JSON Parse"] --> B["Stage 2<br/>Object Transform"]
B --> C["Stage 3<br/>Canonical JSON"]
C --> D["Stage 4<br/>UTF-8 Encode"]
D --> E["Stage 5<br/>Blake3 Hash"]
E --> F["64-char hex digest"]

Stage 1: YAML/JSON Parse

Input is parsed into a native Python object (dict/list/scalar) using strict YAML 1.2 or JSON parsing. This happens externally before FingerprintEngine.compute_hash() is called.
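The source does not spell out what "strict" parsing means. As a minimal sketch of one possible reading for JSON input, the helper below (hypothetical name `parse_strict`) rejects duplicate keys and the non-standard constants `NaN` and `Infinity`, both of which Python's `json.loads` accepts by default.

```python
import json


def _reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys instead of keeping the last."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def _reject_constant(name):
    """parse_constant hook: called for NaN, Infinity and -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_strict(text: str):
    """Parse JSON text into native Python objects, failing on ambiguous input."""
    return json.loads(
        text,
        object_pairs_hook=_reject_duplicates,
        parse_constant=_reject_constant,
    )


sprite = parse_strict('{"name": "demo", "version": 1}')
print(sprite)  # {'name': 'demo', 'version': 1}
```

Rejecting duplicate keys matters for fingerprinting because two parsers that resolve duplicates differently would hash different objects from the same text.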

Stage 2: Object Transform

This stage recursively normalizes the data structure so that hashing is deterministic:

def _normalize_object(obj):
    if isinstance(obj, dict):
        # Sort keys alphabetically
        # Exclude: id, fingerprint, $schema
        return {
            k: _normalize_object(v)
            for k, v in sorted(obj.items())
            if k not in ("id", "fingerprint", "$schema")
        }
    elif isinstance(obj, list):
        return [_normalize_object(item) for item in obj]
    else:
        return obj  # Scalars pass through

Stripped fields:

  • id — server-generated, non-deterministic
  • fingerprint — circular (hash of a hash)
  • $schema — metadata, not semantic content
  • created — timestamp, changes on every creation
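A minimal sketch of Stage 2 applied to a sample sprite. The function name `normalize`, the `EXCLUDED_KEYS` constant, and the sprite values are illustrative assumptions. The snippet above filters three keys while the list also names created, so the excluded set is a parameter here and defaults to all four listed fields.

```python
# Hypothetical stand-in for _normalize_object. The default excluded set is an
# assumption that follows the "Stripped fields" list, including created.
EXCLUDED_KEYS = frozenset({"id", "fingerprint", "$schema", "created"})


def normalize(obj, excluded=EXCLUDED_KEYS):
    """Recursively drop excluded keys and sort the remaining dict keys."""
    if isinstance(obj, dict):
        return {
            key: normalize(value, excluded)
            for key, value in sorted(obj.items(), key=lambda kv: kv[0])
            if key not in excluded
        }
    if isinstance(obj, list):
        # List order is preserved, as in the snippet above.
        return [normalize(item, excluded) for item in obj]
    return obj  # scalars pass through unchanged


sprite = {
    "name": "example-sprite",  # illustrative values only
    "id": "srv-123",
    "created": "2024-01-01T00:00:00Z",
    "capabilities": [{"id": "c1", "kind": "search"}],
    "fingerprint": "blake3:...",
}
normalized = normalize(sprite)
print(normalized)
# {'capabilities': [{'kind': 'search'}], 'name': 'example-sprite'}
```

Because the filter runs at every level of recursion, a nested key named id is stripped as well, not only the top-level one.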

Stage 3: Canonical JSON Serialization

canonical = json.dumps(
    normalized,
    separators=(",", ":"),  # No whitespace
    sort_keys=True,         # Deterministic key order
    ensure_ascii=True       # ASCII-only output
)

This produces a compact, fully deterministic JSON string with no formatting variation.
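To illustrate the determinism claim, the sketch below serializes two dicts that differ only in insertion order and shows that they yield the same string. The wrapper name `canonical_json` and the sample data are assumptions made for illustration.

```python
import json


def canonical_json(normalized):
    """Serialize with the same json.dumps options shown above."""
    return json.dumps(
        normalized,
        separators=(",", ":"),
        sort_keys=True,
        ensure_ascii=True,
    )


a = {"name": "café", "version": 1, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "version": 1, "name": "café"}

print(canonical_json(a))
# {"name":"caf\u00e9","tags":["x","y"],"version":1}
print(canonical_json(a) == canonical_json(b))  # True
```

Note that json.dumps does not unify 1 and 1.0, nor composed and decomposed Unicode forms of the same character. Whether DEC-005 constrains these cases is not stated here.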

Stage 4: UTF-8 Encoding

encoded = canonical.encode("utf-8") # No BOM, no trailing newline
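Since Stage 3 sets ensure_ascii=True, the canonical string should contain only ASCII characters, which makes the UTF-8 step a one-byte-per-character mapping. A small check under that assumption, with made-up sample data:

```python
import codecs
import json

canonical = json.dumps(
    {"name": "café"}, separators=(",", ":"), sort_keys=True, ensure_ascii=True
)
encoded = canonical.encode("utf-8")

is_ascii = canonical.isascii()                  # non-ASCII was escaped in Stage 3
same_length = len(encoded) == len(canonical)    # one byte per character
has_bom = encoded.startswith(codecs.BOM_UTF8)   # str.encode("utf-8") adds no BOM
print(is_ascii, same_length, has_bom)  # True True False
```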

Stage 5: Blake3 Hash

import blake3
digest = blake3.blake3(encoded).hexdigest() # 64-char lowercase hex

SHA-256 fallback: If the blake3 package is not installed, the engine falls back to hashlib.sha256(encoded).hexdigest(), which also yields a 64-character hex digest.
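One way the fallback could be wired up is an optional import, sketched below. The function name `hash_bytes` and the returned (algorithm, digest) pair are assumptions, not the engine's confirmed interface.

```python
import hashlib

try:
    import blake3  # third-party package, optional in this sketch
    _HAVE_BLAKE3 = True
except ImportError:
    _HAVE_BLAKE3 = False


def hash_bytes(encoded: bytes) -> tuple:
    """Return (algorithm, hex digest), preferring Blake3 when it is installed."""
    if _HAVE_BLAKE3:
        return "blake3", blake3.blake3(encoded).hexdigest()
    return "sha256", hashlib.sha256(encoded).hexdigest()


algorithm, digest = hash_bytes(b'{"name":"example"}')
print(f"{algorithm}:{digest}")
```

Both algorithms produce 64 hex characters, so the digest alone does not reveal which one was used. Returning the algorithm name alongside the digest keeps that information available to the caller, since a fingerprint computed under the fallback will not match one computed with Blake3.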

Performance Comparison

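| Algorithm | Speed (single-core) | Security | Setup |
|-----------|---------------------|----------|-------|
| Blake3 | ~3 GB/s | 128-bit | Trustless (no setup) |
| SHA-256 | ~1 GB/s | 128-bit | Trustless |
| SHA-1 | ~2 GB/s | Broken | Trustless |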

Blake3 was chosen because it is approximately 3× faster than SHA-256 while maintaining equivalent cryptographic strength. STRATT already uses Blake3 with 14 test vectors and 98 tests, enabling cross-system reuse.
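Throughput figures like these depend on hardware and library builds. A rough way to measure them locally is sketched below. The helper `throughput_mb_s` is hypothetical, and Blake3 is only measured when the blake3 package happens to be installed.

```python
import hashlib
import time


def throughput_mb_s(hash_fn, payload: bytes, rounds: int = 20) -> float:
    """Hash payload `rounds` times and return approximate megabytes per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        hash_fn(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return (len(payload) * rounds) / elapsed / 1e6


payload = b"\x00" * (1 << 20)  # 1 MiB of zeros
results = {
    "sha256": throughput_mb_s(lambda data: hashlib.sha256(data).digest(), payload)
}

try:
    import blake3  # optional third-party package
    results["blake3"] = throughput_mb_s(
        lambda data: blake3.blake3(data).digest(), payload
    )
except ImportError:
    pass  # only SHA-256 is measured when blake3 is not installed

for name, rate in results.items():
    print(f"{name}: {rate:.0f} MB/s")
```

Sprites are typically far smaller than 1 MiB, so per-call overhead may matter more than bulk throughput in practice. Measuring with realistic payload sizes gives a more relevant comparison.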

Fingerprint Format

packet-beta
title Fingerprint Structure
0-5: "prefix"
6: "colon"
7-70: "hash"

The stored fingerprint is a string with a colon separator:

  • blake3:a1b2c3d4... (64 hex chars after colon)
  • sha256:a1b2c3d4... (fallback)
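A stored fingerprint in this format can be split and validated with a short parser. The sketch below assumes the two prefixes listed above are the only valid ones. The function name `parse_fingerprint` is hypothetical.

```python
import re

# Assumed pattern: known prefix, colon, exactly 64 hex characters.
_FINGERPRINT_RE = re.compile(r"(blake3|sha256):([0-9a-fA-F]{64})")


def parse_fingerprint(value: str):
    """Split a stored fingerprint into (algorithm, lowercase digest)."""
    match = _FINGERPRINT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed fingerprint: {value!r}")
    return match.group(1), match.group(2).lower()


algorithm, digest = parse_fingerprint("blake3:" + "ab" * 32)
print(algorithm, len(digest))  # blake3 64
```

Upper-case hex is accepted and lowercased so that later comparisons do not depend on how the value was stored.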

Verification Flow

sequenceDiagram
participant Client
participant Service as iris-service
participant Engine as FingerprintEngine
participant SDK as iris-sdk
Client->>Service: GET /v1/sprites/{id}/fingerprint
Service->>Engine: verify_fingerprint(sprite, stored_hash)
Engine->>SDK: compute_hash(sprite)
SDK->>SDK: Stage 1: Parse
SDK->>SDK: Stage 2: Normalize (strip id/fingerprint/$schema)
SDK->>SDK: Stage 3: Canonical JSON
SDK->>SDK: Stage 4: UTF-8 encode
SDK->>SDK: Stage 5: Blake3 hash
SDK-->>Engine: computed_hash
Engine->>Engine: case-insensitive comparison
Engine-->>Service: verified: true/false
Service-->>Client: Fingerprint {verified, stored, computed, timestamp}
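The flow above can be sketched end to end in a few functions. This is an illustration under stated assumptions, not the FingerprintEngine source: SHA-256 stands in for Blake3 so the example runs on the standard library alone, and the response is modeled as a plain dict with the four fields named in the diagram.

```python
import hashlib
import json
from datetime import datetime, timezone

EXCLUDED = ("id", "fingerprint", "$schema")


def _normalize(obj):
    """Stage 2: drop excluded keys at every nesting level."""
    if isinstance(obj, dict):
        return {k: _normalize(v) for k, v in obj.items() if k not in EXCLUDED}
    if isinstance(obj, list):
        return [_normalize(item) for item in obj]
    return obj


def compute_hash(sprite) -> str:
    """Stages 3-5, with SHA-256 standing in for Blake3 (stdlib only)."""
    canonical = json.dumps(
        _normalize(sprite), separators=(",", ":"), sort_keys=True, ensure_ascii=True
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_fingerprint(sprite, stored_hash: str) -> dict:
    """Recompute the hash and compare it case-insensitively with the stored one."""
    computed = compute_hash(sprite)
    return {
        "verified": computed.lower() == stored_hash.lower(),
        "stored": stored_hash,
        "computed": computed,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


sprite = {"id": "srv-1", "name": "demo", "version": 1}
stored = compute_hash(sprite).upper()  # simulate an upper-cased stored value
result = verify_fingerprint(sprite, stored)
print(result["verified"])  # True
```

Key sorting is left to sort_keys=True here, which sorts nested dicts as well, so the normalize step only needs to filter.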

STRATT Compatibility

The 5-stage pipeline is identical to STRATT’s canonical hashing, enabling cross-system verification:

# IRIS computes fingerprint
iris_hash = FingerprintEngine.compute_hash(sprite)
# MERIDIAN computes fingerprint from same canonical data
meridian_hash = StrattFingerprintEngine.compute_hash(sprite)
# If both follow DEC-005: iris_hash == meridian_hash

This is the foundation of the iris-meridian-adapter’s VerifyFingerprint RPC.

Key Terms

  • Canonical serialization → A deterministic, unambiguous text representation of data (same input always produces same bytes)
  • Blake3 → A cryptographic hash function ~3× faster than SHA-256, trustless, no setup required
  • 5-stage pipeline → The canonical hash process: parse → normalize → canonical JSON → UTF-8 → hash
  • Fingerprint → A 256-bit (64 hex char) identity hash that uniquely identifies a sprite’s semantic content
  • STRATT-compatible → Uses the same canonical pipeline as the STRATT protocol, enabling cross-system verification
  • SHA-256 fallback → Alternative hash algorithm if Blake3 library is unavailable

Q&A

Q: If I change only whitespace in a YAML file, does the fingerprint change? A: No. Stage 2 normalizes the parsed object (not the raw text), and Stage 3 uses compact JSON with no whitespace. Formatting changes do not affect the fingerprint.

Q: What if I add a new field to the sprite? A: The fingerprint will change because the canonical representation now includes the new field. This is intentional — it proves the sprite’s content has semantically changed.

Q: Why exclude id and created from the hash? A: These are server-generated, non-deterministic fields. Two logically identical sprites created at different times should have the same fingerprint. Only semantic fields (name, version, capabilities, system_prompt, etc.) are hashed.
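The three answers above can be checked with a small experiment. The sketch assumes created is stripped alongside id, as this answer states, and uses SHA-256 as a stdlib stand-in for Blake3. The helper `fingerprint` and the sample documents are made up for illustration.

```python
import hashlib
import json

EXCLUDED = ("id", "fingerprint", "$schema", "created")  # assumption: includes created


def _normalize(obj):
    """Drop excluded keys at every nesting level."""
    if isinstance(obj, dict):
        return {k: _normalize(v) for k, v in obj.items() if k not in EXCLUDED}
    if isinstance(obj, list):
        return [_normalize(item) for item in obj]
    return obj


def fingerprint(text: str) -> str:
    """Parse, normalize, serialize canonically, encode, and hash."""
    canonical = json.dumps(
        _normalize(json.loads(text)),
        separators=(",", ":"),
        sort_keys=True,
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = '{"name":"demo","version":1,"id":"a"}'
spaced = '{\n  "version": 1,\n  "name": "demo",\n  "id": "b",\n  "created": "2024-01-01"\n}'
extended = '{"name":"demo","version":1,"id":"a","system_prompt":"hi"}'

same_after_reformat = fingerprint(compact) == fingerprint(spaced)
changed_by_new_field = fingerprint(compact) != fingerprint(extended)
print(same_after_reformat, changed_by_new_field)  # True True
```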

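Q: Can two different sprites produce the same fingerprint? A: In theory, yes. For any given pair of sprites the probability of a collision is about 2^-256, and deliberately finding one would take on the order of 2^128 hash evaluations (the birthday bound behind the 128-bit security level listed under Performance Comparison). In practice, this is cryptographically negligible: less likely than a cosmic ray flipping a bit in your CPU.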
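To put the collision odds in numbers, the sketch below uses the standard birthday-bound estimate: among n items there are n(n-1)/2 pairs, each colliding with probability 2^-256. The estimate is only valid while the result is far below 1. The function name is hypothetical.

```python
from math import log2


def collision_log2_probability(n_items: int, digest_bits: int = 256) -> float:
    """Return log2 of the estimate (n choose 2) / 2**digest_bits."""
    pairs = n_items * (n_items - 1) // 2
    return log2(pairs) - digest_bits


print(collision_log2_probability(2))                  # -256.0
print(round(collision_log2_probability(10**9), 1))    # about -197.2
```

Even with a billion stored sprites, the estimated chance of any two sharing a fingerprint is around 2^-197.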

Q: How does the TypeScript SDK compute fingerprints? A: The TypeScript SDK defines FingerprintData types but delegates fingerprint computation to the server. The Python SDK (iris-sdk-python) contains the canonical implementation used by both the SDK CLI and iris-service.

Examples

Fingerprinting is like a master bookbinder identifying a first edition:

  • You don’t look at the library checkout stamp (id) or the date it was acquired (created)
  • You examine the actual content: paper type, typeface, ink composition, page dimensions
  • You record these in the exact same order every time (canonical)
  • You run them through a precise chemical analysis (Blake3)
  • The result is a unique “fingerprint” that proves “this is a genuine 1925 first edition, not a 1970 reprint”
