CRUMB a card from devarno-cloud

LSEQ Position IDs

chronicle advanced 7 min read

ELI5

LSEQ assigns every character a “shelf coordinate” like aisle.shelf|sub-aisle.sub-shelf. To insert a new character between two existing ones, you find a free slot at the same depth; if there’s no room, you go one level deeper, where the address space doubles. Sorting by the coordinate string gives you the document in order.

Technical Deep Dive

String Format

alloc.siteId|alloc.siteId|...| separates levels, . separates fields within a level (app/src/services/lseq.ts:57). Example: 8.alice|72.bob → level 0 allocation 8 by alice, level 1 allocation 72 by bob.

Allocation Space

classDiagram
class LseqLevel {
+number allocation
+string siteId
}
class LseqPosition {
+LseqLevel[] levels
}
class Constants {
<<const>>
BASE_BITS = 4
BASE = 16
MAX_DEPTH = 32
BOUNDARY = 10
}
LseqPosition o-- LseqLevel

maxAllocationAtDepth(depth) = 2^(BASE_BITS + depth) - 1. So depth 0 has 16 slots (0–15), depth 1 has 32, doubling each level. MAX_DEPTH = 32 caps runaway growth (worst case ~64 chars per ID).

Comparison Order

comparePositions walks levels top-down: shorter prefix wins ties at a shared level only when allocations match (see lseq.ts:90). Within the same allocation, siteId lexicographic order breaks ties — this is what makes concurrent inserts at the “same” slot deterministic.

Generation Strategy

flowchart TD
Start[generatePositionBetween lower upper siteId] --> Loop{depth < MAX_DEPTH}
Loop -- no --> Final[push max/2 fallback]
Loop -- yes --> Bounds[lowerAlloc = lower.levels d ?? 0<br/>upperAlloc = upper.levels d ?? max d]
Bounds --> Gap{upperAlloc - lowerAlloc > 1?}
Gap -- yes --> Choose[chooseAllocation: boundary+ or boundary-<br/>by hashSiteId parity and depth%2]
Choose --> Done[push allocation, return]
Gap -- no --> Equal{lowerAlloc == upperAlloc?}
Equal -- yes --> Match[push lowerAlloc with lowerLevel.siteId; deepen]
Equal -- no --> Adjacent[push lowerAlloc with lowerLevel.siteId<br/>upper := null at next depth]
Match --> Loop
Adjacent --> Loop

The “boundary” strategy (BOUNDARY = 10) keeps allocations close to one end of the gap so future inserts on either side still have room. depth % 2 alternates the bias, producing the LSEQ characteristic that uniformly randomises which side fills first.

Site ID Hashing

hashSiteId is a 31-multiplier polynomial hash truncated to u32. Used only to deterministically pick boundary direction; not a security primitive.

Initial Position

generateInitialPosition(siteId) returns a single level at floor(max/2) = 7 so subsequent inserts have headroom on both sides.

Key Terms

  • boundary strategy → bias inserts close to one end of the free gap to leave room for future neighbours.
  • tombstone → not part of LSEQ itself; the CRDT layer marks deleted chars as deleted but keeps their position so concurrent inserts targeting that position resolve correctly.
  • siteId → caller-supplied client identifier, written verbatim into every level the client allocates; affects both ordering tiebreak and boundary-direction hash.

Q&A

Q: Why store siteId per level instead of once per position? A: Two clients can co-author the same position string at different depths; level-local siteIds preserve who created which prefix and break ordering ties locally.

Q: What stops position IDs from growing forever? A: MAX_DEPTH = 32 is a hard cap — beyond that the algorithm uses a midpoint fallback. The boundary strategy is what keeps typical ID length short in practice.

Q: Is the order purely lexicographic on the encoded string? A: No — comparePositionStrings decodes first and compares numerically per level, then by siteId. Direct string compare would put 10.x before 2.x.

Examples

Inserting between 8.alice and 9.alice finds gap = 0 → must deepen. The algorithm pushes 8.alice then descends with lower={} and upper=null, choosing a midpoint at depth 1 (max 31 → ~15) → final 8.alice|15.bob.

neighbors on the map