.usage.jsonl Append Format
eva beginner 4 min read
ELI5
Every kick run drops one line — a single JSON sticker — into the prompt’s logbook. The sticker says when, with what inputs (hashed), whether it was sent, what the verifier thought, and how big and fast the round-trip was. Stickers never get edited; they just stack.
Technical Deep Dive
Schema (bin/kick:83-98)
| Field | Type | Notes |
|---|---|---|
| ts | ISO-8601 UTC | iso_now; appended at end of run |
| case | string \| null | The --case name, or null |
| vars_hash | 12-char hex | sha256 of sorted KEY=VALUE lines; literal "none" if no vars |
| sent | bool | true for --send, false for dry-render |
| exit_code | int | claude’s exit (PIPESTATUS[1]); 0 for dry-render |
| verified | bool \| null | from verify.sh; null when no verifier |
| duration_ms | int | ms_now end − start |
| prompt_words | int | awk NF-tokenised count of rendered prompt |
| output_words | int | awk NF-count of claude stdout (0 for dry-render) |
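To make the table concrete, here is a minimal reader-side sketch in Python that parses one line and checks each field against the types listed above. It is not part of bin/kick or bin/eva; the names SCHEMA and parse_row are hypothetical, and the type mapping is an assumption drawn only from the table.

```python
import json

# Expected JSON types per field, taken from the schema table.
# type(None) covers JSON null for the nullable fields.
SCHEMA = {
    "ts": (str,),
    "case": (str, type(None)),
    "vars_hash": (str,),
    "sent": (bool,),
    "exit_code": (int,),
    "verified": (bool, type(None)),
    "duration_ms": (int,),
    "prompt_words": (int,),
    "output_words": (int,),
}


def parse_row(line: str) -> dict:
    """Parse one .usage.jsonl line and check it against SCHEMA (hypothetical helper)."""
    row = json.loads(line)
    for field, allowed in SCHEMA.items():
        if field not in row:
            raise ValueError(f"missing field: {field}")
        value = row[field]
        # bool is a subclass of int in Python, so reject it for int-only fields.
        if isinstance(value, bool) and bool not in allowed:
            raise ValueError(f"{field}: unexpected bool")
        if not isinstance(value, allowed):
            raise ValueError(f"{field}: unexpected type {type(value).__name__}")
    return row
```

A row that fails the check raises ValueError, which lets a consumer skip or report malformed lines instead of aggregating them silently.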
Class Diagram
classDiagram
    class UsageRow {
        +ts : iso8601
        +case : string?
        +vars_hash : hex12
        +sent : bool
        +exit_code : int
        +verified : bool?
        +duration_ms : int
        +prompt_words : int
        +output_words : int
    }
    class GateConsumer {
        +draft_to_tested()
        +tested_to_ready()
    }
    class PerfSummary {
        +median_duration_ms
        +median_prompt_words
        +median_output_words
    }
    GateConsumer --> UsageRow : reads sent + verified
    PerfSummary --> UsageRow : medians over sent rows
Consumers
- usage_summary (bin/eva:132-161): counts total, sent, verified_true|false|null, last ts, and the median of duration_ms / prompt_words / output_words across sent rows. Surfaced by eva show.
- cmd_log (bin/eva:260-277): tail-prints recent rows.
- cmd_promote (bin/eva:840-866): the gate counter; splits rows on sent and verified for the lifecycle thresholds in eva-003.
A dry-render (no --send) still appends a row with sent: false, so eva show and eva log reflect template iteration too — but those rows do not count toward any promotion gate.
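The aggregation described for usage_summary can be sketched in Python as follows. This is an illustration of the idea, not the bash code in bin/eva: the function name usage_summary_sketch and the output keys are hypothetical, and counting the verified states over all rows (not only sent rows) is an assumption the source does not settle.

```python
import json
from statistics import median


def usage_summary_sketch(lines):
    """Summarise .usage.jsonl lines: counts, last ts, medians over sent rows."""
    rows = [json.loads(line) for line in lines if line.strip()]
    sent_rows = [r for r in rows if r["sent"]]

    summary = {
        "total": len(rows),
        "sent": len(sent_rows),
        # Assumption: verified states are counted over every row.
        "verified_true": sum(1 for r in rows if r["verified"] is True),
        "verified_false": sum(1 for r in rows if r["verified"] is False),
        "verified_null": sum(1 for r in rows if r["verified"] is None),
        # Rows are appended in order, so the last line carries the latest ts.
        "last_ts": rows[-1]["ts"] if rows else None,
    }

    # Dry-render rows (sent: false) are excluded from the perf medians.
    for field in ("duration_ms", "prompt_words", "output_words"):
        values = [r[field] for r in sent_rows]
        summary[f"median_{field}"] = median(values) if values else None
    return summary
```

Note that statistics.median averages the two middle values when the count is even; the bash implementation may pick a different convention.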
Key Terms
- vars_hash: first 12 hex chars of sha256 over the sorted KEY=VALUE lines; gives a stable signature for “same inputs, different run”.
- PIPESTATUS[1]: bash idiom; here used to capture claude’s exit code through the tee pipe (bin/kick:196-197).
- sent row: the universe used by every aggregator that cares about real model calls.
Q&A
Q: What is vars_hash and how is it computed?
A: The first 12 hex characters of the sha256sum digest of the sorted VARS array, one KEY=VALUE entry per line (bin/kick:71-78). An empty VARS array produces the literal string "none".
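As a rough Python equivalent of that computation, under stated assumptions: the variables are passed as a dict, each sorted KEY=VALUE line is newline-terminated before hashing (as piping through sort would produce), and sorting is by code point. The bash original may differ on locale or trailing-newline details, so digests from this sketch are not guaranteed to match real rows.

```python
import hashlib


def vars_hash_sketch(variables: dict) -> str:
    """Return a 12-char hex signature of the inputs, or "none" if there are none."""
    if not variables:
        return "none"
    # Sort the KEY=VALUE lines so the hash does not depend on argument order.
    lines = sorted(f"{key}={value}" for key, value in variables.items())
    # Assumption: every line ends with "\n", matching the output of `sort`.
    payload = "".join(line + "\n" for line in lines)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Because the lines are sorted first, two runs with the same variables in a different order share a vars_hash, which is what makes it usable as a "same inputs, different run" key.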
Q: Which fields feed eva show’s perf medians?
A: duration_ms, prompt_words, output_words — medians taken across rows where sent: true (bin/eva:140-160).
Q: Why does a dry run still get a row in .usage.jsonl?
A: kick always calls append_usage at the bottom of the non-send branch (bin/kick:213) so authors can see iteration history. The row carries sent: false, exit_code: 0, verified: null, output_words: 0, which keeps it out of every gate counter.
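The append-only behaviour and the fixed values of a dry-render row can be sketched in Python like this. It is an illustration only: the real append_usage is a bash function in bin/kick, and the names iso_now_sketch, append_row and append_dry_render_row below are hypothetical.

```python
import json
from datetime import datetime, timezone


def iso_now_sketch() -> str:
    """Current time as an ISO-8601 UTC string, e.g. 2026-05-05T09:31:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def append_row(path, *, case, vars_hash, sent, exit_code, verified,
               duration_ms, prompt_words, output_words) -> dict:
    """Append exactly one JSON line to the log; existing lines are never rewritten."""
    row = {
        "ts": iso_now_sketch(),
        "case": case,
        "vars_hash": vars_hash,
        "sent": sent,
        "exit_code": exit_code,
        "verified": verified,
        "duration_ms": duration_ms,
        "prompt_words": prompt_words,
        "output_words": output_words,
    }
    # Mode "a" only ever adds to the end of the file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, separators=(",", ":")) + "\n")
    return row


def append_dry_render_row(path, *, case, vars_hash, duration_ms, prompt_words) -> dict:
    """A dry render fixes sent/exit_code/verified/output_words as described above."""
    return append_row(path, case=case, vars_hash=vars_hash, sent=False,
                      exit_code=0, verified=None, duration_ms=duration_ms,
                      prompt_words=prompt_words, output_words=0)
```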
Examples
Sample row:
{"ts":"2026-05-05T09:31:00Z","case":"happy-path","vars_hash":"a3f0…b1","sent":true,"exit_code":0,"verified":true,"duration_ms":18342,"prompt_words":612,"output_words":1483}
neighbors on the map
- Promotion Lifecycle Gates: promoting a prompt from draft to tested
- guard.sh & verify.sh Hook Contract: writing a pre-send validator that aborts bad inputs
- FNP Observability & Prometheus Metrics: monitoring FNP systems
- Run Outcome Classification: interpreting a History row's status pill