Blake3 Canonical Serialisation Pipeline
grace advanced 8 min read
ELI5
Turning a YAML file into a fingerprint is a five-station assembly line. Parse, clean, sort, encode, hash. Skip a station or do it slightly wrong on one machine and the seal won’t match the seal stamped by another machine — same content, different fingerprint, FM-01.
Technical Deep Dive
Source: CANONICAL-SERIALISATION-SPEC.md (1215 lines), packages/fingerprint/src/{canonicalise,hash,verify}.ts.
Five Stages
| Stage | Input | Output | Operation |
|---|---|---|---|
| 1 | UTF-8 YAML bytes | JS object | YAML 1.2 core schema parse, merge:false, uniqueKeys:true |
| 2 | JS object | Cleaned object | Remove fingerprint (top-level only), recursive null removal, NFC normalise |
| 3 | Cleaned object | JSON string | Recursive key sort (UTF-16 code units), compact JSON.stringify |
| 4 | JSON string | UTF-8 bytes | TextEncoder.encode, no BOM, no trailing newline |
| 5 | UTF-8 bytes | 64-hex digest | Blake3-256, prefixed blake3: |
Pipeline
```mermaid
flowchart LR
    A[YAML source<br/>UTF-8 bytes] --> B[Stage 1<br/>parse YAML 1.2 core]
    B --> C[Stage 2a<br/>strip top-level fingerprint]
    C --> D[Stage 2b<br/>recursive null removal]
    D --> E[Stage 2c<br/>NFC unicode normalise]
    E --> F[Stage 3a<br/>sortKeysDeep UTF-16]
    F --> G[Stage 3b<br/>JSON.stringify no spaces]
    G --> H[Stage 4<br/>TextEncoder UTF-8]
    H --> I[Stage 5<br/>Blake3-256]
    I --> J[blake3:64-hex]
    style C fill:#fef3c7
    style D fill:#fef3c7
    style E fill:#fef3c7
```
Stage 1 Required Config
```typescript
parse(yamlString, {
  schema: 'core',
  version: '1.2',
  merge: false,
  uniqueKeys: true,
})
```
YAML 1.2 core schema means yes/no/on/off are strings, not booleans. Timestamp-like values are strings, not Date objects. Block scalars | preserve newlines; > folds them.
Stage 2 Critical Rules
- Field exclusion: only top-level `fingerprint` is removed. `modified` carries metadata and stays.
- Null removal: recursive at all nesting levels. Empty arrays `[]` and empty objects `{}` are preserved (TV-06).
- NFC normalisation: applied to all string keys and values. Decomposed and precomposed forms of `é` must produce identical fingerprints (TV-13a/b).
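The three Stage 2 rules can be sketched as a small recursive cleaner. This is a minimal sketch, not the code in packages/fingerprint/src/canonicalise.ts: the names `Json`, `cleanValue`, and `cleanForFingerprint` are hypothetical, and leaving `null` elements inside arrays untouched is an assumption that should be checked against the spec.

```typescript
// Hypothetical types and helper names, for illustration only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively drop null-valued object keys and NFC-normalise all string keys and values.
function cleanValue(value: Json): Json {
  if (typeof value === "string") return value.normalize("NFC");
  // Assumption: array elements are cleaned in place; null elements are not removed.
  if (Array.isArray(value)) return value.map((item) => cleanValue(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (child === null) continue; // null removal at every nesting level
      out[key.normalize("NFC")] = cleanValue(child);
    }
    return out; // empty objects and arrays survive as {} and []
  }
  return value; // booleans and numbers pass through unchanged
}

// Strip only the top-level fingerprint, then clean the rest.
function cleanForFingerprint(doc: { [key: string]: Json }): Json {
  const rest: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key !== "fingerprint") rest[key] = value;
  }
  return cleanValue(rest);
}
```

Because the exclusion happens once, before recursion, a nested key that happens to be called `fingerprint` stays in the canonical form, which matches the "top-level only" rule.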
Divergence from RFC 8785 (JCS)
| Aspect | RFC 8785 | STRATT |
|---|---|---|
| Input | JSON | YAML |
| Field exclusion | none | top-level fingerprint |
| Null handling | preserved | removed |
| Unicode normalisation | none | NFC |
| Key sort, compact, escaping | aligned | aligned |
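The "key sort, compact" row corresponds to Stages 3 and 4. A hedged sketch follows, with hypothetical names `canonicalJson` and `canonicalBytes`. It deliberately swaps in manual serialisation instead of the sortKeysDeep plus JSON.stringify pairing named in the pipeline: JavaScript objects always enumerate integer-like keys such as "2" and "10" in ascending numeric order, so rebuilding a sorted object would not yield UTF-16 code-unit order for those keys. Whether the real implementation handles that case the same way is not stated in the source.

```typescript
// Hypothetical types and helper names, for illustration only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Compare by UTF-16 code units; plain `<` on JS strings does exactly that (no locale rules).
function byCodeUnit(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Stage 3: emit compact JSON with keys sorted at every nesting level.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value).sort(([a], [b]) => byCodeUnit(a, b));
    const body = entries.map(([key, child]) => JSON.stringify(key) + ":" + canonicalJson(child));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value); // primitives use standard JSON escaping
}

// Stage 4: UTF-8 encode; TextEncoder emits no BOM and nothing is appended.
function canonicalBytes(value: Json): Uint8Array {
  return new TextEncoder().encode(canonicalJson(value));
}

// canonicalJson({ b: 1, a: { "2": "x", "10": true } })
// → {"a":{"10":true,"2":"x"},"b":1}
```

Note that "10" sorts before "2" here because the comparison is on code units ('1' is 0x31, '2' is 0x32), not on numeric value.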
Verification
packages/fingerprint/src/verify.ts:10-54:
- Extract stored `fingerprint`.
- Recompute pipeline on the object minus `fingerprint`.
- Compare full strings → `verified|tampered|error`.
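The three verification steps can be sketched as below. This is not the content of verify.ts: `verifyUnit`, `Doc`, and the injected `fingerprintOf` callback (standing in for Stages 2 to 5) are hypothetical, and the choice to return `error` for a missing or malformed stored value, as well as the lowercase-hex pattern, are assumptions.

```typescript
// Hypothetical names; `fingerprintOf` stands in for Stages 2-5 of the pipeline.
type VerifyResult = "verified" | "tampered" | "error";
type Doc = { [key: string]: unknown };

function verifyUnit(doc: Doc, fingerprintOf: (doc: Doc) => string): VerifyResult {
  // Step 1: extract the stored fingerprint.
  const stored = doc["fingerprint"];
  // Assumption: anything other than "blake3:" plus 64 lowercase hex digits is an error.
  if (typeof stored !== "string" || !/^blake3:[0-9a-f]{64}$/.test(stored)) {
    return "error";
  }
  try {
    // Step 2: recompute on the object minus the top-level fingerprint.
    const rest: Doc = {};
    for (const [key, value] of Object.entries(doc)) {
      if (key !== "fingerprint") rest[key] = value;
    }
    // Step 3: compare the full prefixed strings, not just the hex part.
    return fingerprintOf(rest) === stored ? "verified" : "tampered";
  } catch {
    // Assumption: a failure inside the pipeline is reported as "error".
    return "error";
  }
}
```

Injecting the hash function keeps the sketch independent of any particular Blake3 binding and makes the three outcomes easy to exercise in tests.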
Critical Constraint
blake3-wasm is pinned to exact 2.1.5. Version 3.0.0 introduces an async API that breaks deterministic synchronous hashing call sites.
Key Terms
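- NFC → Unicode Normalisation Form C (Canonical Decomposition followed by Canonical Composition); collapses canonically equivalent sequences, such as precomposed and decomposed é, to a single code-point sequence and therefore a single byte form.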
- TV-XX → One of 14 reference test vectors in Section 10 of `CANONICAL-SERIALISATION-SPEC.md`; conformance gate for any reimplementation.
- Tampered → Verification result when computed digest ≠ stored digest; surfaces as FM-01.
Q&A
Q: Why YAML 1.2 core specifically?
A: YAML 1.1 promotes yes/no/on/off to booleans, which would change the canonical form depending on how the file is written. Core schema makes them strings — bytes-in-bytes-out determinism.
Q: Why is the fingerprint field stripped before hashing?
A: A digest cannot contain itself. Stripping it lets the same canonical pipeline be used to compute and to verify.
Q: What if my YAML uses merge keys (<<)?
A: merge:false disables them, so a document that relies on << fails at Stage 1. Merge keys are forbidden because they introduce structural ambiguity that defeats canonicalisation.
Examples
Two contributors author the same role unit. One types é as U+00E9, the other as U+0065+U+0301. Without Stage 2c, their JSON strings differ in byte sequence and produce different Blake3 digests — FM-01 false positive. NFC collapses both to U+00E9 before serialisation.
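The scenario can be reproduced in a few lines. The variable names are illustrative only, and the snippet compares serialised strings rather than running the full pipeline.

```typescript
// Two spellings of the same visible character.
const precomposed = "\u00e9"; // é as a single code point, U+00E9
const decomposed = "e\u0301"; // e followed by combining acute accent, U+0065 U+0301

// Without normalisation the serialised forms differ, so the digests would differ too.
const rawEqual =
  JSON.stringify({ name: precomposed }) === JSON.stringify({ name: decomposed });

// After NFC both collapse to U+00E9 and serialise identically.
const nfcEqual =
  JSON.stringify({ name: precomposed.normalize("NFC") }) ===
  JSON.stringify({ name: decomposed.normalize("NFC") });

console.log(rawEqual, nfcEqual); // prints: false true
```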
Neighbors on the Map
- Blake3 5-Stage Canonical Fingerprinting (DEC-005): implementing or debugging fingerprint computation
- SPUH 16-byte Routing Header: writing a fast unit-routing path that cannot afford full Blake3 verification
- Canonical Serialisation Pipeline: reproducing a fingerprint mismatch across languages
- Blake3 Fingerprint API: verifying a unit hasn't been tampered with