CRUMB a card from devarno-cloud

Promotion Lifecycle Gates

eva intermediate 5 min read

ELI5

A prompt grows up through three school grades. To leave kindergarten (draft) you need three signed homework slips. To leave middle school (tested) you need ten signed slips and a passing report card (eva eval green). The principal (--force) can sign anyone through, but logs a yellow warning.

Technical Deep Dive

State Machine

stateDiagram-v2
[*] --> draft
draft --> tested : promote (≥ min_uses successful runs)
tested --> ready : promote (≥ min_uses verified runs + green eval)
ready --> [*]
draft --> tested : --force (warning)
tested --> ready : --force (warning)

Gate Definitions (bin/eva:831-883)

TransitionDefault min_usesCounted asEval required?
draft → tested3rows in .usage.jsonl where sent=true and verified ≠ false (i.e. true OR null)no
tested → ready10rows where sent=true and verified == true (strict)yes — eval.yml present and last .eval.jsonl row all_passed=true

Per-prompt overrides live in meta.promotion.{min_uses, require_eval}. --force prints each unmet gate to stderr as a yellow warning then proceeds.

Decision Flow

flowchart TD
start["eva promote <id>"] --> cur{current status}
cur -- ready --> halt["die: cannot promote (already at top)"]
cur -- draft --> g1["successful = sent AND verified ≠ false"]
cur -- tested --> g2["verified = sent AND verified == true"]
g1 --> ok1{count ≥ min_uses?}
g2 --> ok2{count ≥ min_uses AND eval green?}
ok1 -- no --> fail["red: list failed gates"]
ok2 -- no --> fail
ok1 -- yes --> bump["set status=tested; updated=today"]
ok2 -- yes --> bump2["set status=ready; updated=today"]
fail --> force{--force?}
force -- yes --> bump
force -- no --> exit1["return 1"]

Versioning Convention

Per README: patch = wording, minor = section/constraint change, major = structural rewrite or first promotion to ready. Breaking changes to a ready prompt are forked: eva new <id>-v2 --from <id> restarts at draft.

Key Terms

  • successful runsent: true row whose verified field is true OR null; counts toward tested gate.
  • verified runsent: true row whose verified field is strictly true; counts toward ready gate.
  • promotion gates — the meta.promotion mapping lets a prompt override min_uses and require_eval.

Q&A

Q: How many successful sent runs are needed for draft → tested by default? A: Three. min_uses defaults to 3 for the tested target (bin/eva:848).

Q: What additional artefact is required for tested → ready? A: An eval.yml whose most recent eva eval recorded all_passed: true in .eval.jsonl (bin/eva:854-863). Bypassable with meta.promotion.require_eval: false.

Q: How does —force differ from a clean promotion? A: Same status bump and updated stamp, but unmet gates are written to stderr in yellow rather than blocking the bump (bin/eva:868-872).

Examples

Block of meta.yml lowering the gate for an experimental prompt:

promotion:
min_uses: 1
require_eval: false

neighbors on the map