Mirror of https://github.com/harivansh-afk/evaluclaude-harness.git (synced 2026-04-15 13:03:44 +00:00)
evaluclaude
Zero-to-evals in one command. Claude analyzes your codebase and generates functional tests.
flowchart LR
subgraph Input
A[Your Codebase]
end
subgraph Pipeline
B[Introspect]
C[Analyze]
D[Render]
E[Run]
end
subgraph Output
F[Test Results]
G[Traces]
end
A --> B
B -->|RepoSummary| C
C -->|EvalSpec| D
D -->|Test Files| E
E --> F
E --> G
B -.-|tree-sitter| B
C -.-|Claude| C
D -.-|pytest/vitest| D
A CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Tree-sitter parses your code structure, Claude generates test specs, and deterministic renderers create the actual tests.
Quick Start
npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key
# Run the full pipeline
evaluclaude pipeline .
# Or step by step
evaluclaude intro . # Parse codebase
evaluclaude analyze . -o spec.json -i # Generate spec (interactive)
evaluclaude render spec.json # Create test files
evaluclaude run # Execute tests
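The step-by-step commands above can also be driven from a script, for example in CI. The sketch below is an assumption-laden illustration, not part of the tool: `preflight` and `run_steps` are hypothetical helper names, and it assumes each `evaluclaude` command exits with a non-zero status on failure. The argument lists are copied from the Quick Start.

```python
import os
import shutil
import subprocess

# The four step-by-step invocations from the Quick Start, as argv lists.
STEPS = [
    ["evaluclaude", "intro", "."],
    ["evaluclaude", "analyze", ".", "-o", "spec.json", "-i"],
    ["evaluclaude", "render", "spec.json"],
    ["evaluclaude", "run"],
]


def preflight(env=None, which=shutil.which):
    """Return a list of problems that would stop the pipeline before it starts."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if which("evaluclaude") is None:
        problems.append("evaluclaude is not on PATH")
    return problems


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for argv in steps:
        runner(argv, check=True)
```

A caller would typically check that `preflight()` returns an empty list and then call `run_steps()`. Note that the `-i` flag makes the analyze step interactive, so an unattended script would likely want to drop it.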
Commands
| Command | Description |
|---|---|
| `pipeline [path]` | Run full pipeline: intro -> analyze -> render -> run |
| `intro [path]` | Introspect codebase with tree-sitter |
| `analyze [path]` | Generate EvalSpec with Claude |
| `render <spec>` | Render EvalSpec to test files |
| `run [test-dir]` | Execute tests and collect results |
| `grade <input>` | Grade output using LLM rubric |
| `rubrics` | List available rubrics |
| `calibrate` | Calibrate rubric against examples |
| `view [trace-id]` | View trace details |
| `traces` | List all traces |
| `ui` | Launch Promptfoo dashboard |
| `eval` | Run Promptfoo evaluations |
Supported Languages
| Language | Parser | Test Framework |
|---|---|---|
| Python | tree-sitter-python | pytest |
| TypeScript | tree-sitter-typescript | vitest, jest |
| JavaScript | tree-sitter-typescript | vitest, jest |
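As a quick way to read the table, the sketch below encodes its rows as a lookup. The parser and framework names come from the table; the file-extension mapping and the `lookup` helper are assumptions added for illustration, since this README does not say how the tool detects a file's language.

```python
from pathlib import Path

# Rows from the Supported Languages table: language -> (parser, test frameworks).
SUPPORT = {
    "Python": ("tree-sitter-python", ["pytest"]),
    "TypeScript": ("tree-sitter-typescript", ["vitest", "jest"]),
    "JavaScript": ("tree-sitter-typescript", ["vitest", "jest"]),
}

# Assumed extension mapping; the tool's real detection logic is not documented here.
EXTENSIONS = {".py": "Python", ".ts": "TypeScript", ".js": "JavaScript"}


def lookup(path):
    """Return (language, parser, frameworks) for a file, or None if unsupported."""
    language = EXTENSIONS.get(Path(path).suffix.lower())
    if language is None:
        return None
    parser, frameworks = SUPPORT[language]
    return language, parser, frameworks
```

For instance, `lookup("src/app.ts")` resolves to the TypeScript row, while a path such as `main.go` returns `None` because Go is not listed.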
Output Structure
.evaluclaude/
spec.json # Generated EvalSpec
traces/ # Execution traces
results/ # Test results
promptfooconfig.yaml # Promptfoo config
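After a run, a small check like the following can confirm which outputs were produced. It is a sketch under one explicit assumption: that all four entries listed above live directly inside `.evaluclaude/`. The `missing_outputs` helper is hypothetical and not part of the tool.

```python
from pathlib import Path

# Entries from the Output Structure listing, assumed to sit inside .evaluclaude/.
EXPECTED = ["spec.json", "traces", "results", "promptfooconfig.yaml"]


def missing_outputs(project_root="."):
    """Return the expected entries that do not exist under <project_root>/.evaluclaude."""
    out_dir = Path(project_root) / ".evaluclaude"
    return [name for name in EXPECTED if not (out_dir / name).exists()]
```

An empty list from `missing_outputs(".")` would suggest every stage wrote its output; a non-empty list points to the stage that did not run or failed.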
How This Was Built
This project was built in a few hours using Amp Code. You can explore the development threads:
- Initial setup and CLI structure
- Introspection and tree-sitter integration
- EvalSpec analysis with Claude
- Test rendering and framework support
- Grading and rubrics system
- Promptfoo integration
- UI polish and final touches
Development
npm run build # Build
npm run dev # Dev mode
npm test # Run tests
npm run typecheck # Type check
License
MIT