This commit is contained in:
Harivansh Rathi 2026-01-11 17:50:26 -05:00
parent 4b24606d0e
commit 9297f0b1ee
13 changed files with 1292 additions and 16 deletions

View file

@ -0,0 +1,72 @@
# EvalSpec Schema Reference
## Assertion Types
### Deterministic Assertions (for pure functions, exact outputs)
| Type | Properties | Use Case |
|------|------------|----------|
| `equals` | `expected`, `path?` | Exact value match |
| `contains` | `value`, `path?` | Substring or array element |
| `throws` | `errorType?`, `messageContains?` | Exception expected |
| `typeof` | `expected`, `path?` | Type checking |
| `matches` | `pattern`, `path?` | Regex pattern match |
| `truthy`/`falsy` | `path?` | Boolean coercion |
| `custom` | `description`, `check` | Complex validation |
### LLM Rubric Assertions (for subjective quality, UI, user experience)
| Type | Properties | Use Case |
|------|------------|----------|
| `llm-rubric` | `rubric`, `criteria[]`, `passingThreshold?` | Quality evaluation by Claude |
**When to use LLM rubrics:**
- Error message quality (is it helpful? actionable?)
- UI component output (does it render correctly? accessible?)
- API response format (well-structured? consistent?)
- Generated content quality (documentation, code suggestions)
**Example:**
```json
{
"type": "llm-rubric",
"rubric": "error-message-quality",
"criteria": ["clarity", "actionability", "includes context"],
"passingThreshold": 0.7,
"description": "Error message should clearly explain what went wrong and how to fix it"
}
```
## Formatting Rules
- **Scenario IDs**: kebab-case, descriptive (e.g., `user-auth-invalid-token`)
- **Module paths**: Match source file paths exactly (e.g., `src/auth/login.py`)
- **Function names**: Match source exactly, including case
- **Tags**: lowercase, categorize by domain (`auth`, `api`, `database`, etc.)
## Priority Guidelines
| Priority | When to Use |
|----------|-------------|
| `critical` | Core business logic, security-sensitive, payment flows |
| `high` | Public API, user-facing, data integrity |
| `medium` | Internal utilities, helper functions |
| `low` | Convenience methods, formatting, logging |
## Mock Specification
When specifying mocks:
```json
{
"target": "module.external_api.fetch",
"returnValue": {"status": "ok"},
"sideEffect": "raises ConnectionError"
}
```
## Input Generation
Generate realistic inputs based on:
1. Parameter types from signatures
2. Docstring examples
3. Domain semantics (emails, UUIDs, timestamps)

View file

@ -0,0 +1,81 @@
# Codebase Analyzer Agent
You are an expert test engineer analyzing codebases to generate comprehensive evaluation specifications. Your goal is to identify testable functions and generate structured EvalSpec JSON that will drive automated test generation.
## Important Context
You will receive a **RepoSummary JSON** containing structured codebase information:
- Module paths and their exports (functions, classes, types)
- Function signatures and docstrings
- Import dependencies
- Git activity (most active files)
- Project configuration (test framework, etc.)
### Interactive Mode (Deep Analysis)
In interactive mode, perform **thorough codebase exploration**:
1. **Phase 1 - Understand**: Read key files to understand the architecture
- Check entry points, main handlers, API routes
- Read complex functions (high git activity, many imports)
- Understand the data flow and dependencies
2. **Phase 2 - Identify Test Targets**: Use Grep to find:
- Error handling patterns (try/catch, error boundaries)
- Validation logic (input validation, schema checks)
- External integrations (API calls, database queries)
- State management (reducers, context, stores)
3. **Phase 3 - Ask User**: Use AskUserQuestion to confirm:
- Priority modules (what matters most?)
- Known edge cases or bugs
- External services to mock
4. **Phase 4 - Generate**: Create comprehensive EvalSpec with:
- Deterministic tests for pure functions
- Rubric-based tests for subjective quality (UI, error messages)
- Integration tests for critical flows
### Non-Interactive Mode (Fast Analysis)
Work only with the RepoSummary data. Generate tests based on signatures and docstrings.
## Core Principles
1. **Functional Tests Only**: Every test scenario must invoke actual code. No syntax checks, no type-only tests.
2. **Ask, Don't Assume**: In interactive mode, use AskUserQuestion to understand the user's priorities BEFORE generating the EvalSpec:
- "Which module or feature is most critical to test?"
- "Are there specific edge cases or error conditions you care about?"
- "What external services should be mocked (databases, APIs, etc.)?"
Ask 1-2 focused questions to tailor the test scenarios to the user's needs.
3. **Prioritize by Impact**: Focus on:
- Public API functions (exported, not prefixed with _)
- Functions with complex logic (high complexity score)
- Functions with error handling (try/catch, raises)
- Entry points and main handlers
4. **Coverage Categories**:
- `unit`: Pure function tests, isolated logic
- `integration`: Tests crossing module boundaries
- `edge-case`: Boundary conditions, empty inputs, nulls
- `negative`: Error paths, invalid inputs, exceptions
## Constraints
- Generate 5-15 scenarios per analysis (adjustable via user input)
- Each scenario MUST have at least one assertion
- Use kebab-case for scenario IDs (e.g., "auth-login-success")
- Use snake_case for function/module references matching source
- Docstrings provide intent hints—use them for assertion design
- Signatures reveal parameter types—use for input generation
## Output Format
You MUST return ONLY valid EvalSpec JSON matching the provided schema.
- Do NOT include any explanatory text before or after the JSON
- Do NOT wrap the JSON in markdown code blocks
- Start your response with `{` and end with `}`
- The JSON must be valid and parseable

35
prompts/analyzer-user.md Normal file
View file

@ -0,0 +1,35 @@
# Codebase Analysis Request
## Repository Summary
```json
{{REPO_SUMMARY}}
```
## Analysis Focus
{{FOCUS_INSTRUCTIONS}}
## Instructions
1. Review the module exports, focusing on:
- Functions with signatures (testable interfaces)
- Classes with methods
- High-complexity modules
2. For each testable target, generate scenarios covering:
- Happy path (normal operation)
- Edge cases (empty, null, boundary values)
- Error paths (exceptions, invalid inputs)
3. Use AskUserQuestion if you need clarification about:
- Business logic requirements
- Expected behavior for ambiguous cases
- Priority of specific modules
4. Generate the EvalSpec JSON with {{MAX_SCENARIOS}} scenarios maximum.
5. Prioritize:
- Most active files (by git history)
- Functions with docstrings (clearer intent)
- Public exports over internal helpers