mirror of https://github.com/harivansh-afk/evaluclaude-harness.git, synced 2026-04-17 15:04:53 +00:00

Commit 9297f0b1ee (parent 4b24606d0e): analyze

13 changed files with 1292 additions and 16 deletions

prompts/analyzer-developer.md (normal file, 72 lines added)
# EvalSpec Schema Reference

## Assertion Types

### Deterministic Assertions (for pure functions, exact outputs)
| Type | Properties | Use Case |
|------|------------|----------|
| `equals` | `expected`, `path?` | Exact value match |
| `contains` | `value`, `path?` | Substring or array element |
| `throws` | `errorType?`, `messageContains?` | Exception expected |
| `typeof` | `expected`, `path?` | Type checking |
| `matches` | `pattern`, `path?` | Regex pattern match |
| `truthy`/`falsy` | `path?` | Boolean coercion |
| `custom` | `description`, `check` | Complex validation |
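The deterministic types above map naturally onto small predicate checks. The following is a minimal sketch in Python, not the harness's actual evaluator. It assumes that `path` is a dotted key path (for example `user.roles.0`) and that `typeof` compares against Python type names; neither convention is stated in this reference. `resolve_path` and `check_assertion` are hypothetical helper names. `throws` and `custom` are left out because they need access to the call itself rather than its result.

```python
import re
from typing import Any, Optional


def resolve_path(value: Any, path: Optional[str]) -> Any:
    """Follow a dotted path into nested dicts and lists (assumed path syntax)."""
    if not path:
        return value
    for key in path.split("."):
        # List segments are treated as integer indexes, everything else as dict keys.
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


def check_assertion(assertion: dict, actual: Any) -> bool:
    """Evaluate one deterministic assertion against a function's result."""
    kind = assertion["type"]
    target = resolve_path(actual, assertion.get("path"))
    if kind == "equals":
        return target == assertion["expected"]
    if kind == "contains":
        # Works for both substrings and list elements.
        return assertion["value"] in target
    if kind == "typeof":
        # Assumption: `expected` holds a Python type name such as "str".
        return type(target).__name__ == assertion["expected"]
    if kind == "matches":
        return re.search(assertion["pattern"], str(target)) is not None
    if kind == "truthy":
        return bool(target)
    if kind == "falsy":
        return not target
    raise ValueError(f"unsupported assertion type: {kind}")
```

For example, `check_assertion({"type": "equals", "expected": 3, "path": "a.b"}, {"a": {"b": 3}})` resolves the path first and then compares the nested value.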
### LLM Rubric Assertions (for subjective quality, UI, user experience)

| Type | Properties | Use Case |
|------|------------|----------|
| `llm-rubric` | `rubric`, `criteria[]`, `passingThreshold?` | Quality evaluation by Claude |

**When to use LLM rubrics:**
- Error message quality (is it helpful? actionable?)
- UI component output (does it render correctly? accessible?)
- API response format (well-structured? consistent?)
- Generated content quality (documentation, code suggestions)

**Example:**
```json
{
  "type": "llm-rubric",
  "rubric": "error-message-quality",
  "criteria": ["clarity", "actionability", "includes context"],
  "passingThreshold": 0.7,
  "description": "Error message should clearly explain what went wrong and how to fix it"
}
```
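How `passingThreshold` is applied is not spelled out here. One plausible reading, sketched below, is that a grader returns a score between 0 and 1 for each criterion and the assertion passes when the average reaches the threshold. The averaging rule, the fallback threshold of 0.7, and the name `rubric_passes` are all assumptions made for illustration; the grader call itself is out of scope.

```python
from statistics import mean


def rubric_passes(assertion: dict, scores: dict, default_threshold: float = 0.7) -> bool:
    """Decide pass/fail from per-criterion scores in the range 0..1.

    `scores` is assumed to come from a separate grading step; the fallback
    threshold is a hypothetical default, not one defined by the schema.
    """
    missing = [c for c in assertion["criteria"] if c not in scores]
    if missing:
        raise ValueError(f"no score for criteria: {missing}")
    overall = mean(scores[c] for c in assertion["criteria"])
    return overall >= assertion.get("passingThreshold", default_threshold)


assertion = {
    "type": "llm-rubric",
    "rubric": "error-message-quality",
    "criteria": ["clarity", "actionability", "includes context"],
    "passingThreshold": 0.7,
}
scores = {"clarity": 0.9, "actionability": 0.8, "includes context": 0.5}
passed = rubric_passes(assertion, scores)
```

A stricter harness might instead require every criterion to clear the threshold individually; the schema as documented does not say which rule applies.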
## Formatting Rules

- **Scenario IDs**: kebab-case, descriptive (e.g., `user-auth-invalid-token`)
- **Module paths**: Match source file paths exactly (e.g., `src/auth/login.py`)
- **Function names**: Match source exactly, including case
- **Tags**: lowercase, categorize by domain (`auth`, `api`, `database`, etc.)
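The ID and tag rules above are mechanical enough to lint. The sketch below assumes a scenario is a dict with `id` and `tags` keys; those field names and the helper `formatting_problems` are hypothetical. Checking module paths and function names would require access to the source tree, so they are not covered.

```python
import re

# kebab-case: lowercase alphanumeric words joined by single hyphens
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def formatting_problems(scenario: dict) -> list:
    """Return human-readable problems; an empty list means the scenario is clean."""
    problems = []
    scenario_id = scenario.get("id", "")
    if not KEBAB_CASE.fullmatch(scenario_id):
        problems.append(f"scenario id {scenario_id!r} is not kebab-case")
    for tag in scenario.get("tags", []):
        if tag != tag.lower():
            problems.append(f"tag {tag!r} is not lowercase")
    return problems
```

Running it on `{"id": "user-auth-invalid-token", "tags": ["auth", "api"]}` yields an empty list, while an ID such as `UserAuth` is reported.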
## Priority Guidelines

| Priority | When to Use |
|----------|-------------|
| `critical` | Core business logic, security-sensitive, payment flows |
| `high` | Public API, user-facing, data integrity |
| `medium` | Internal utilities, helper functions |
| `low` | Convenience methods, formatting, logging |
## Mock Specification

When specifying mocks:
```json
{
  "target": "module.external_api.fetch",
  "returnValue": {"status": "ok"},
  "sideEffect": "raises ConnectionError"
}
```
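One way a Python test runner could consume such a spec is through the standard library's `unittest.mock`. The sketch below is an illustration under assumptions: it treats `sideEffect` as a string of the form `raises <BuiltinExceptionName>`, which matches the example but is not defined by this reference, and `build_mock` is a hypothetical helper. Note that in `unittest.mock` a raising side effect takes precedence over the return value, so a spec that sets both, as above, would always raise under this interpretation.

```python
import builtins
import os
from unittest.mock import Mock, patch


def build_mock(spec: dict) -> Mock:
    """Translate a mock spec into a unittest.mock.Mock (assumed spec semantics)."""
    mock = Mock(return_value=spec.get("returnValue"))
    side_effect = spec.get("sideEffect", "")
    if side_effect.startswith("raises "):
        exc_name = side_effect[len("raises "):].strip()
        # Only built-in exception names are resolved in this sketch.
        exc_type = getattr(builtins, exc_name, None)
        if not (isinstance(exc_type, type) and issubclass(exc_type, BaseException)):
            raise ValueError(f"unknown exception type: {exc_name}")
        mock.side_effect = exc_type
    return mock


# Usage: patch a real, importable target so the example runs as written.
spec = {"target": "os.getcwd", "returnValue": "/tmp/project"}
with patch(spec["target"], new=build_mock(spec)):
    patched_cwd = os.getcwd()
```

The `target` string is passed straight to `patch`, so it must be an importable dotted path at test time.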
## Input Generation

Generate realistic inputs based on:
1. Parameter types from signatures
2. Docstring examples
3. Domain semantics (emails, UUIDs, timestamps)
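Sources 1 and 3 in the list above can be combined with the standard `inspect` module. The sketch below prefers a domain-specific sample when the parameter name hints at one and otherwise falls back to the annotated type. The name hints, the sample values, and the helper `sample_inputs` are all illustrative assumptions; docstring examples (source 2) are not handled here.

```python
import inspect
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict

# Hypothetical name hints mapped to realistic sample generators.
SEMANTIC_SAMPLES = {
    "email": lambda: "alice@example.com",
    "uuid": lambda: str(uuid.uuid4()),
    "_id": lambda: str(uuid.uuid4()),
    "timestamp": lambda: datetime.now(timezone.utc).isoformat(),
}

# Fallback samples keyed by annotated type.
TYPE_SAMPLES = {int: 42, float: 3.14, str: "example", bool: True}


def sample_inputs(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build one plausible keyword-argument set for `func`."""
    inputs = {}
    for name, param in inspect.signature(func).parameters.items():
        semantic = next(
            (make for hint, make in SEMANTIC_SAMPLES.items() if hint in name.lower()),
            None,
        )
        if semantic is not None:
            inputs[name] = semantic()          # domain semantics win
        elif param.annotation in TYPE_SAMPLES:
            inputs[name] = TYPE_SAMPLES[param.annotation]  # then the type hint
        elif param.default is not inspect.Parameter.empty:
            inputs[name] = param.default       # then the declared default
        else:
            inputs[name] = None                # nothing to go on
    return inputs


def create_user(user_email: str, age: int, user_id: str, nickname="anon"):
    """Toy function used only to demonstrate the generator."""
    return {"email": user_email, "age": age, "id": user_id, "nickname": nickname}


generated = sample_inputs(create_user)
```

Matching on parameter names is a heuristic and will misfire on names that merely contain a hint, so a real generator would likely need an override mechanism.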