Triggering Tests for Skill Auto-Load
eva intermediate 4 min read
ELI5
A separate small judge reads the prompt’s description card — not the prompt itself — and decides whether a sample user request would trigger it. You list the requests you want it to grab (should_match) and the ones you want it to ignore (should_not_match); a YES/NO that disagrees fails the eval.
Technical Deep Dive
Block Shape (bin/eva:386-405, 734-760)
    triggering:
      judge:
        model: claude-haiku-4-5-20251001  # falls back to defaults.model
      should_match:
        - "refactor src/foo.py to remove the global state"
      should_not_match:
        - "explain what this file does"

eva doctor requires the block to be a mapping and to contain at least one of should_match / should_not_match; both lists must contain only non-empty strings.
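The doctor rules above can be sketched as a small validator. This is an illustrative reconstruction, not the code at bin/eva:386-405: the function name validate_triggering and the exact error messages are assumptions.

```python
from typing import Any

LIST_KEYS = ("should_match", "should_not_match")


def validate_triggering(block: Any) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid.

    Hypothetical helper: the name and messages are not taken from bin/eva.
    """
    if not isinstance(block, dict):
        return ["triggering must be a mapping"]

    problems = []
    # At least one of the two list keys has to be present.
    if not any(key in block for key in LIST_KEYS):
        problems.append("triggering needs should_match and/or should_not_match")

    for key in LIST_KEYS:
        if key not in block:
            continue
        entries = block[key]
        if not isinstance(entries, list):
            problems.append(f"{key} must be a list")
            continue
        # Every entry must be a string with some non-whitespace content.
        for i, entry in enumerate(entries):
            if not isinstance(entry, str) or not entry.strip():
                problems.append(f"{key}[{i}] must be a non-empty string")
    return problems
```

Returning a list of problems instead of raising on the first one lets a doctor-style command report every issue in a single pass.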
Judge Prompt Construction (bin/eva:668-693)
The judge receives a fixed wrapper plus three sections composed from meta.yml, followed by the test query:
- DESCRIPTION: meta.description (or meta.summary as fallback).
- POSITIVE TRIGGERS: bulleted meta.triggers.
- NEGATIVE TRIGGERS (do NOT use for): bulleted meta.not_for.
- USER QUERY: the test string.
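A minimal sketch of how such a prompt could be assembled. The section labels come from the list above; the WRAPPER text, the helper names, and the blank-line separators are assumptions, since the actual wrapper wording in bin/eva is not reproduced here.

```python
# Placeholder wording: the real wrapper text lives in bin/eva and is not shown here.
WRAPPER = (
    "Decide whether the skill below would auto-load for the user query. "
    "Judge from the description and triggers ONLY.\n"
    "Reply exactly: DECISION=<YES|NO> REASON=<one short sentence>"
)


def bullets(items):
    """Render a list of strings as '- item' lines."""
    return "\n".join(f"- {item}" for item in items)


def build_judge_prompt(meta: dict, query: str) -> str:
    """Compose wrapper + DESCRIPTION + trigger sections + USER QUERY (hypothetical helper)."""
    # meta.summary is only used when meta.description is missing or empty.
    description = meta.get("description") or meta.get("summary") or ""
    sections = [
        WRAPPER,
        f"DESCRIPTION:\n{description}",
        f"POSITIVE TRIGGERS:\n{bullets(meta.get('triggers') or [])}",
        f"NEGATIVE TRIGGERS (do NOT use for):\n{bullets(meta.get('not_for') or [])}",
        f"USER QUERY:\n{query}",
    ]
    return "\n\n".join(sections)
```

Note that the prompt body itself never appears in the judge input: only the description card and the query are sent.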
The judge must reply with DECISION=<YES|NO> REASON=<one short sentence>. Anything else returns (None, "judge output unparseable: …") and counts as a failure.
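The reply handling can be sketched as follows. The DECISION regex is the one cited for bin/eva:689; the function name, the boolean return value on success, the REASON extraction, and the text appended after "judge output unparseable:" are assumptions.

```python
import re


def parse_judge_reply(stdout: str):
    """Return (decision, reason); decision is None when the reply is unparseable.

    Hypothetical helper: only the DECISION regex is taken from the source.
    """
    match = re.search(r"DECISION=(YES|NO)", stdout)
    if match is None:
        # Assumption: a truncated echo of the raw output follows the prefix.
        return None, f"judge output unparseable: {stdout[:80]!r}"
    decision = match.group(1) == "YES"
    reason = re.search(r"REASON=(.*)", stdout)
    return decision, (reason.group(1).strip() if reason else "")
```

Because re.search scans the whole output, a reply with extra text before DECISION= still parses; only a reply with no DECISION token at all is treated as a failure.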
Sequence
    sequenceDiagram
        participant E as eva eval
        participant M as meta.yml
        participant J as judge model
        E->>M: read description, triggers, not_for
        loop each should_match query
            E->>J: wrapper + DESCRIPTION + TRIGGERS + USER QUERY
            J-->>E: DECISION=YES|NO REASON=…
            E->>E: PASS iff YES
        end
        loop each should_not_match query
            E->>J: same wrapper, different query
            J-->>E: DECISION=YES|NO REASON=…
            E->>E: PASS iff NO
        end
        E->>E: include in passed/total + .eval.jsonl

Skip Conditions
The triggering pass is skipped entirely when --case is set (single-case mode), and when the block is absent or both lists are empty (bin/eva:738).
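The pass/fail rule and the skip conditions can be sketched together. This is a simplified model under stated assumptions: run_triggering is a hypothetical name, judge is a stand-in callable returning True, False, or None (unparseable) for a query, and case stands in for the value of --case.

```python
def run_triggering(triggering, judge, case=None):
    """Return a list of (query, expected, passed) tuples; [] when the pass is skipped."""
    # Skipped in single-case mode and when the block is absent.
    if case is not None or not triggering:
        return []
    positives = triggering.get("should_match") or []
    negatives = triggering.get("should_not_match") or []

    results = []
    # should_match queries pass iff YES; should_not_match queries pass iff NO.
    for expected, queries in ((True, positives), (False, negatives)):
        for query in queries:
            decision = judge(query)
            # An unparseable reply (None) never equals the expectation, so it fails.
            results.append((query, expected, decision is expected))
    return results
```

With both lists empty the loops never run, so the function returns an empty result list, which matches the "both lists are empty" skip condition.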
Key Terms
- trigger judge: secondary claude invocation evaluating description-shape alone, not prompt output.
- DECISION=YES: the judge predicts the skill auto-loads for this query; required for should_match, forbidden for should_not_match.
- default judge model: falls through triggering.judge.model → defaults.model → unset.
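The model fall-through in the last term can be sketched as below. The function name is hypothetical, and returning None for "unset" is an assumption; what eva does downstream with an unset model is not stated here.

```python
def resolve_judge_model(triggering, defaults):
    """Resolve triggering.judge.model, then defaults.model, else None (unset)."""
    judge = (triggering or {}).get("judge") or {}
    return judge.get("model") or (defaults or {}).get("model")
```

For example, a triggering block with no judge key and a defaults mapping that sets model resolves to the defaults value.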
Q&A
Q: Which two list keys does the triggering block accept?
A: should_match and should_not_match; doctor requires at least one of them when the block is present (bin/eva:404-405).
Q: What format does the trigger judge return?
A: A single line DECISION=<YES|NO> REASON=<sentence> parsed by re.search(r"DECISION=(YES|NO)", stdout) (bin/eva:689).
Q: What gets sent to the judge alongside the user query?
A: The meta.description (or summary if missing), bulleted meta.triggers, bulleted meta.not_for, all wrapped in a fixed instruction telling the judge to decide based on description and triggers ONLY (bin/eva:668-685).
Examples
A negative trigger that prevents the refactor prompt from grabbing weather queries:
    triggering:
      should_not_match:
        - "what's the weather in San Francisco"

Neighbors on the map
- eva eval Cases, Assertions & Judges: adding a new case to eval.yml
- Skill Export Pipeline: exporting a ready prompt as an Anthropic skill
- NFT-Style Capability Token System: designing authorization for cross-system sprite access
- Airlock Cross-Apex JWT Handoff: debugging users who land on stratt.dev unauthenticated despite an airlock session