init

2026-04-15 06:04:42 +00:00 · 2026-01-14 00:07:28 -08:00 · 2026-01-14 00:07:28 -08:00 · aca2126c88
commit aca2126c88
6 changed files with 1233 additions and 0 deletions
--- a/commands/eval.md
+++ b/commands/eval.md
@ -0,0 +1,169 @@
+---
+description: Run eval commands - list, show, or verify evals
+argument-hint: list | show <name> | verify [name]
+allowed-tools: Read, Bash, Task
+---
+
+# /eval Command
+
+Interface for the eval system. I dispatch to the right action.
+
+## Commands
+
+### /eval list
+
+List all eval specs:
+
+```bash
+echo "Available evals:"
+echo ""
+for f in .claude/evals/*.yaml 2>/dev/null; do
+  if [ -f "$f" ]; then
+    name=$(basename "$f" .yaml)
+    desc=$(grep "^description:" "$f" | head -1 | sed 's/description: *//')
+    printf "  %-20s %s\n" "$name" "$desc"
+  fi
+done
+```
+
+If no evals exist:
+```
+No evals found in .claude/evals/
+
+Create evals by asking: "Create evals for [feature]"
+```
+
+### /eval show <name>
+
+Display an eval spec:
+
+```bash
+cat ".claude/evals/$1.yaml"
+```
+
+### /eval verify [name]
+
+Run verification. This spawns the `eval-verifier` subagent.
+
+**With name specified** (`/eval verify auth`):
+
+Delegate to eval-verifier agent:
+```
+Run the eval-verifier agent to verify .claude/evals/auth.yaml
+
+The agent should:
+1. Read the eval spec
+2. Run all checks in the verify list
+3. Collect evidence for agent checks
+4. Generate tests where generate_test: true
+5. Report results with evidence
+```
+
+**Without name** (`/eval verify`):
+
+Run all evals:
+```
+Run the eval-verifier agent to verify all evals in .claude/evals/
+
+For each .yaml file:
+1. Read the eval spec
+2. Run all checks
+3. Collect evidence
+4. Generate tests
+5. Report results
+
+Summarize overall results at the end.
+```
+
+### /eval evidence <name>
+
+Show collected evidence for an eval:
+
+```bash
+echo "Evidence for: $1"
+echo ""
+if [ -f ".claude/evals/.evidence/$1/evidence.json" ]; then
+  cat ".claude/evals/.evidence/$1/evidence.json"
+else
+  echo "No evidence collected yet. Run: /eval verify $1"
+fi
+```
+
+### /eval tests
+
+List generated tests:
+
+```bash
+echo "Generated tests:"
+echo ""
+if [ -d "tests/generated" ]; then
+  ls -la tests/generated/
+else
+  echo "No tests generated yet."
+fi
+```
+
+### /eval clean
+
+Clean evidence and generated tests:
+
+```bash
+rm -rf .claude/evals/.evidence/
+rm -rf tests/generated/
+echo "Cleaned evidence and generated tests."
+```
+
+## Workflow
+
+```
+1. Create eval spec
+   > Create evals for user authentication
+
+2. List evals
+   > /eval list
+   
+3. Show specific eval  
+   > /eval show auth
+
+4. Run verification
+   > /eval verify auth
+   
+5. Check evidence
+   > /eval evidence auth
+
+6. Run generated tests
+   > pytest tests/generated/
+```
+
+## Output Examples
+
+### /eval list
+
+```
+Available evals:
+
+  auth                 Email/password authentication with UI and API
+  todo-api             REST API for todo management
+  checkout             E-commerce checkout flow
+```
+
+### /eval verify auth
+
+```
+🔍 Eval: auth
+═══════════════════════════════════════
+
+Deterministic Checks:
+  ✅ command: npm test -- --grep 'auth' (exit 0)
+  ✅ file-contains: bcrypt in password.ts
+
+Agent Checks:
+  ✅ api-login: JWT returned correctly
+     📄 Test: tests/generated/test_auth_api_login.py
+  ✅ ui-login: Dashboard redirect works
+     📸 Evidence: 2 screenshots
+     📄 Test: tests/generated/test_auth_ui_login.py
+
+═══════════════════════════════════════
+📊 Results: 4/4 passed
+```