--- name: eval description: Generate evaluation specs with building and verification criteria. Use when setting up features, defining acceptance criteria, or before implementing anything significant. Triggers on "create evals", "set up verification", "define acceptance criteria", or "build [feature]". allowed-tools: Read, Grep, Glob, Write, Edit --- # Eval Skill Generate specs that define **what to build** and **how to verify it**. ## Output I create `.claude/evals/.yaml` with two sections: 1. **building_spec** — What the builder agent implements 2. **verification_spec** — What the verifier agent checks ## Format ```yaml name: feature-name description: One-line summary building_spec: description: What to build requirements: - Requirement 1 - Requirement 2 constraints: - Constraint 1 files: - suggested/file/paths.ts test_output: framework: pytest | vitest | jest path: tests/generated/ verification_spec: # Deterministic checks - type: command run: "npm test" expect: exit_code 0 # Agent checks - type: agent name: check-name prompt: | What to verify evidence: - screenshot: name - url: contains "pattern" generate_test: true ``` ## Workflow ### User Request ``` Create evals for user authentication ``` ### My Questions Before generating, I ask: 1. What auth method? (email/password, OAuth, magic link?) 2. UI, API, or both? 3. Specific security requirements? ### My Output `.claude/evals/auth.yaml`: ```yaml name: auth description: Email/password authentication with UI and API building_spec: description: | User authentication system with email/password. Secure password storage, JWT tokens, login/signup flows. requirements: - Password hashing with bcrypt (cost factor 12+) - JWT tokens with 24h expiry - POST /api/auth/login endpoint - POST /api/auth/signup endpoint - Login page at /login - Signup page at /signup - Protected route middleware constraints: - No plaintext passwords anywhere - Tokens must be httpOnly cookies or secure headers files: - src/auth/password.ts - src/auth/jwt.ts - src/auth/middleware.ts - src/routes/auth.ts - src/pages/login.tsx - src/pages/signup.tsx test_output: framework: pytest path: tests/generated/ verification_spec: # --- Deterministic --- - type: command run: "npm test -- --grep auth" expect: exit_code 0 - type: file-contains path: src/auth/password.ts pattern: "bcrypt" - type: file-not-contains path: src/ pattern: "password.*=.*plaintext" # --- Agent: API --- - type: agent name: api-login prompt: | Test login API: 1. POST /api/auth/signup with new user 2. Verify 201 response 3. POST /api/auth/login with same creds 4. Verify 200 with JWT token 5. POST /api/auth/login with wrong password 6. Verify 401 with helpful message evidence: - response: status 201 - response: status 200 - response: has "token" - response: status 401 generate_test: true # --- Agent: UI --- - type: agent name: ui-login prompt: | Test login UI: 1. Go to /login 2. Verify form has email + password fields 3. Submit with valid credentials 4. Verify redirect to /dashboard 5. Verify welcome message visible evidence: - screenshot: login-page - screenshot: after-login - url: contains "/dashboard" - element: '[data-testid="welcome"]' generate_test: true # --- Agent: Security --- - type: agent name: password-security prompt: | Verify password security: 1. Read src/auth/password.ts 2. Confirm bcrypt with cost >= 12 3. Confirm no password logging 4. Check signup doesn't echo password evidence: - text: "bcrypt" - text: "cost" or "rounds" generate_test: false # Code review, not repeatable test ``` ## Check Types ### Deterministic ```yaml - type: command run: "shell command" expect: exit_code 0 - type: command run: "curl localhost:3000/health" expect: contains: '"ok"' - type: file-exists path: src/file.ts - type: file-contains path: src/file.ts pattern: "regex pattern" - type: file-not-contains path: src/file.ts pattern: "bad pattern" ``` ### Agent ```yaml - type: agent name: descriptive-name # Used for evidence/test naming prompt: | Step-by-step verification evidence: - screenshot: step-name - url: contains "pattern" - element: "css-selector" - text: "expected text" - response: status 200 - response: has "field" generate_test: true | false ``` ## Best Practices ### Building Spec - **Be specific** — "bcrypt with cost 12" not "secure passwords" - **List files** — helps builder know where to put code - **State constraints** — what NOT to do matters ### Verification Spec - **Deterministic first** — fast, reliable checks - **Agent for semantics** — UI flows, code quality, error messages - **Evidence always** — no claim without proof - **generate_test for repeatables** — UI flows yes, code review no ### Naming - `name: feature-name` — lowercase, hyphens - `name: api-login` — for agent checks, descriptive ## What Happens Next After I create the spec: ``` /eval build auth ``` 1. Builder agent reads `building_spec`, implements 2. Verifier agent reads `verification_spec`, checks 3. If fail → builder gets feedback → fixes → verifier re-checks 4. Loop until pass 5. Agent checks become tests in `tests/generated/`