mirror of https://github.com/harivansh-afk/eval-skill.git (synced 2026-04-17 13:05:07 +00:00)
| description | argument-hint | allowed-tools |
|---|---|---|
| Run eval commands - list, show, or verify evals | list \| show <name> \| verify [name] | Read, Bash, Task |
# /eval Command
Interface for the eval system. I dispatch to the right action.
## Commands

### /eval list
List all eval specs:

```bash
echo "Available evals:"
echo ""
for f in .claude/evals/*.yaml; do
  if [ -f "$f" ]; then
    name=$(basename "$f" .yaml)
    desc=$(grep "^description:" "$f" | head -1 | sed 's/description: *//')
    printf "  %-20s %s\n" "$name" "$desc"
  fi
done
```

(Note: a `2>/dev/null` redirection on a `for` loop's glob is a syntax error in bash; the `[ -f "$f" ]` guard already covers the no-match case.)
If no evals exist:

```
No evals found in .claude/evals/

Create evals by asking: "Create evals for [feature]"
```
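The `description:` field grepped above, together with the `verify` list and `generate_test` flag referenced under `/eval verify`, suggests a spec shape roughly like this. This is a hypothetical sketch: every field name other than `description` is an assumption inferred from this document, not a confirmed schema.

```yaml
# Hypothetical eval spec -- field names beyond `description` are assumed.
description: Email/password authentication with UI and API
verify:
  - type: command
    run: npm test -- --grep 'auth'
  - type: file-contains
    file: password.ts
    pattern: bcrypt
  - type: agent
    name: api-login
    generate_test: true
```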
### /eval show <name>

Display an eval spec:

```bash
cat ".claude/evals/$1.yaml"
```
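A slightly more defensive variant of `show` (same path layout, `$1` as the eval name) could fall back to a hint when the spec does not exist. This is a sketch, not part of the original command:

```shell
# Guarded version of `show`: print the spec if it exists, else point at /eval list.
spec=".claude/evals/$1.yaml"
if [ -f "$spec" ]; then
  cat "$spec"
else
  echo "No eval named '$1' in .claude/evals/. Run /eval list to see what exists."
fi
```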
### /eval verify [name]

Run verification. This spawns the eval-verifier subagent.
**With name specified** (`/eval verify auth`), delegate to the eval-verifier agent:

```
Run the eval-verifier agent to verify .claude/evals/auth.yaml
```

The agent should:

1. Read the eval spec
2. Run all checks in the verify list
3. Collect evidence for agent checks
4. Generate tests where `generate_test: true`
5. Report results with evidence
**Without name** (`/eval verify`), run all evals:

```
Run the eval-verifier agent to verify all evals in .claude/evals/
```

For each .yaml file:

1. Read the eval spec
2. Run all checks
3. Collect evidence
4. Generate tests
5. Report results

Summarize overall results at the end.
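The enumeration step the agent performs over `.claude/evals/` can be sketched in shell. The checks themselves are run by the eval-verifier agent, so the loop body below only tallies the specs it would visit:

```shell
# Sketch of the verify-all enumeration; the actual checks are agent-driven.
total=0
for spec in .claude/evals/*.yaml; do
  [ -f "$spec" ] || continue   # skip the literal glob pattern when no specs exist
  name=$(basename "$spec" .yaml)
  echo "Verifying $name ..."
  total=$((total + 1))
  # (the eval-verifier agent runs the checks for "$spec" here)
done
echo "Visited $total spec(s)."
```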
### /eval evidence <name>

Show collected evidence for an eval:

```bash
echo "Evidence for: $1"
echo ""
if [ -f ".claude/evals/.evidence/$1/evidence.json" ]; then
  cat ".claude/evals/.evidence/$1/evidence.json"
else
  echo "No evidence collected yet. Run: /eval verify $1"
fi
```
### /eval tests

List generated tests:

```bash
echo "Generated tests:"
echo ""
if [ -d "tests/generated" ]; then
  ls -la tests/generated/
else
  echo "No tests generated yet."
fi
```
### /eval clean

Clean evidence and generated tests:

```bash
rm -rf .claude/evals/.evidence/
rm -rf tests/generated/
echo "Cleaned evidence and generated tests."
```
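A more cautious variant (paths taken from this document) reports each directory before removing it, which makes accidental deletions easier to spot:

```shell
# Report-then-remove version of `clean`; same target paths as above.
for dir in .claude/evals/.evidence tests/generated; do
  if [ -d "$dir" ]; then
    echo "Removing $dir"
    rm -rf "$dir"
  fi
done
echo "Cleaned evidence and generated tests."
```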
## Workflow

1. Create eval spec: `> Create evals for user authentication`
2. List evals: `/eval list`
3. Show specific eval: `/eval show auth`
4. Run verification: `/eval verify auth`
5. Check evidence: `/eval evidence auth`
6. Run generated tests: `pytest tests/generated/`
## Output Examples

`/eval list`:

```
Available evals:

  auth                 Email/password authentication with UI and API
  todo-api             REST API for todo management
  checkout             E-commerce checkout flow
```

`/eval verify auth`:

```
🔍 Eval: auth
═══════════════════════════════════════

Deterministic Checks:
  ✅ command: npm test -- --grep 'auth' (exit 0)
  ✅ file-contains: bcrypt in password.ts

Agent Checks:
  ✅ api-login: JWT returned correctly
     📄 Test: tests/generated/test_auth_api_login.py
  ✅ ui-login: Dashboard redirect works
     📸 Evidence: 2 screenshots
     📄 Test: tests/generated/test_auth_ui_login.py

═══════════════════════════════════════
📊 Results: 4/4 passed
```