2026-01-11 20:56:41 -05:00


evaluclaude

Zero-to-evals in one command. Claude analyzes your codebase and generates functional tests.


A CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Tree-sitter parses your code's structure, Claude turns that structure into a test spec (an EvalSpec), and deterministic renderers convert the spec into the actual test files.

Quick Start

npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key

# Run the full pipeline
evaluclaude pipeline .

# Or step by step
evaluclaude intro .                    # Parse codebase
evaluclaude analyze . -o spec.json -i  # Generate spec (interactive)
evaluclaude render spec.json           # Create test files
evaluclaude run                        # Execute tests

How It Works

  INTROSPECT          ANALYZE            RENDER             RUN
  Parse code    ->    Generate     ->    Create test   ->   Execute
  with tree-sitter    EvalSpec           files (pytest,     & trace
                      with Claude        vitest, jest)
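The key split in this pipeline is that Claude only produces the EvalSpec; turning the spec into test files is plain, deterministic code. The sketch below illustrates that idea for vitest. It assumes a hypothetical EvalSpec shape in which each case calls one exported function and compares the result with toEqual. The field names (name, module, fn, args, expected) and the renderVitest function are illustrative only, not evaluclaude's actual schema or API.

```typescript
// Hypothetical EvalSpec shape for illustration only; evaluclaude's real
// spec.json schema may differ.
interface EvalCase {
  name: string;      // test title shown by the runner
  module: string;    // import path of the code under test
  fn: string;        // exported function to call
  args: unknown[];   // JSON-serializable arguments passed to fn
  expected: unknown; // JSON-serializable value the call should return
}

interface EvalSpec {
  cases: EvalCase[];
}

// Plain code-unit comparison so ordering does not depend on the host locale.
const byString = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Deterministic renderer: the same spec always yields the same vitest file.
// No model is called here; Claude's output is confined to the spec itself.
function renderVitest(spec: EvalSpec): string {
  // Sort cases so output does not depend on the order Claude emitted them.
  const cases = [...spec.cases].sort((a, b) => byString(a.name, b.name));

  // Group imported function names by module.
  const imports = new Map<string, Set<string>>();
  for (const c of cases) {
    const fns = imports.get(c.module) ?? new Set<string>();
    fns.add(c.fn);
    imports.set(c.module, fns);
  }

  const lines: string[] = ['import { describe, it, expect } from "vitest";'];
  for (const mod of [...imports.keys()].sort(byString)) {
    const names = [...imports.get(mod)!].sort(byString).join(", ");
    lines.push(`import { ${names} } from ${JSON.stringify(mod)};`);
  }

  lines.push("", 'describe("evaluclaude", () => {');
  for (const c of cases) {
    const args = c.args.map((a) => JSON.stringify(a)).join(", ");
    lines.push(`  it(${JSON.stringify(c.name)}, () => {`);
    lines.push(`    expect(${c.fn}(${args})).toEqual(${JSON.stringify(c.expected)});`);
    lines.push("  });");
  }
  lines.push("});", "");
  return lines.join("\n");
}

// Example: one case calling a hypothetical add() exported from ./src/math.
const example: EvalSpec = {
  cases: [{ name: "adds two numbers", module: "./src/math", fn: "add", args: [2, 3], expected: 5 }],
};
console.log(renderVitest(example));
```

Running it prints a vitest file with a single test asserting that add(2, 3) equals 5. Sorting cases and import names keeps the output stable across runs, so regenerated test files diff cleanly.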

Commands

Command            Description
pipeline [path]    Run full pipeline: intro -> analyze -> render -> run
intro [path]       Introspect codebase with tree-sitter
analyze [path]     Generate EvalSpec with Claude
render <spec>      Render EvalSpec to test files
run [test-dir]     Execute tests and collect results
grade <input>      Grade output using LLM rubric
rubrics            List available rubrics
calibrate          Calibrate rubric against examples
view [trace-id]    View trace details
traces             List all traces
ui                 Launch Promptfoo dashboard
eval               Run Promptfoo evaluations

Supported Languages

Language      Parser                    Test framework
Python        tree-sitter-python        pytest
TypeScript    tree-sitter-typescript    vitest, jest
JavaScript    tree-sitter-typescript    vitest, jest
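One way a tool like this might route files to the right parser and test framework is a pair of lookup tables that mirror the table above. This is a minimal sketch: the extension list, the detectLanguage and frameworkFor helpers, and the rule that the first listed framework is the default are all assumptions, not evaluclaude's documented behavior.

```typescript
// Hypothetical lookup tables mirroring the Supported Languages table;
// not taken from evaluclaude's source.
type Language = "python" | "typescript" | "javascript";

interface LanguageSupport {
  parser: string;       // tree-sitter grammar package
  frameworks: string[]; // supported test frameworks; first entry is the assumed default
}

const SUPPORT: Record<Language, LanguageSupport> = {
  python: { parser: "tree-sitter-python", frameworks: ["pytest"] },
  typescript: { parser: "tree-sitter-typescript", frameworks: ["vitest", "jest"] },
  javascript: { parser: "tree-sitter-typescript", frameworks: ["vitest", "jest"] },
};

// Assumed file-extension mapping; the real detection logic may differ.
const EXTENSIONS: Record<string, Language> = {
  ".py": "python",
  ".ts": "typescript",
  ".tsx": "typescript",
  ".js": "javascript",
  ".jsx": "javascript",
  ".mjs": "javascript",
  ".cjs": "javascript",
};

function detectLanguage(filePath: string): Language | undefined {
  // Look only at the final path segment so dots in directory names are ignored.
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile like ".env"
  return EXTENSIONS[base.slice(dot).toLowerCase()];
}

// Honor a requested framework only if the language supports it.
function frameworkFor(lang: Language, preferred?: string): string {
  const { frameworks } = SUPPORT[lang];
  return preferred !== undefined && frameworks.includes(preferred) ? preferred : frameworks[0];
}
```

For example, detectLanguage("src/app.py") returns "python", and frameworkFor("python", "jest") falls back to "pytest" because jest is not listed for Python.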

Output Structure

.evaluclaude/
  spec.json              # Generated EvalSpec
  traces/                # Execution traces
  results/               # Test results
  promptfooconfig.yaml   # Promptfoo config
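Scripts that post-process a run, for example in CI, need to locate these artifacts. A minimal sketch, assuming .evaluclaude/ is created directly under the analyzed project root; the outputPaths helper and OutputPaths type are hypothetical and only restate the layout listed above.

```typescript
import * as path from "node:path";

// Hypothetical helper mirroring the Output Structure listing; evaluclaude
// may resolve these locations differently.
const OUTPUT_DIR = ".evaluclaude";

interface OutputPaths {
  spec: string;            // generated EvalSpec
  traces: string;          // execution traces
  results: string;         // test results
  promptfooConfig: string; // Promptfoo config
}

function outputPaths(projectRoot: string): OutputPaths {
  const base = path.join(projectRoot, OUTPUT_DIR);
  return {
    spec: path.join(base, "spec.json"),
    traces: path.join(base, "traces"),
    results: path.join(base, "results"),
    promptfooConfig: path.join(base, "promptfooconfig.yaml"),
  };
}

console.log(outputPaths(process.cwd()).spec);
```

Using path.join rather than string concatenation keeps the separators correct on both POSIX and Windows.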

How This Was Built

This project was built in a few hours using Amp Code. You can explore the development threads:

Development

npm run build      # Build
npm run dev        # Dev mode
npm test           # Run tests
npm run typecheck  # Type check

License

MIT