evaluclaude

Zero-to-evals in one command. Claude analyzes your codebase and generates functional tests.

flowchart LR
    subgraph Input
        A[Your Codebase]
    end
    
    subgraph Pipeline
        B[Introspect]
        C[Analyze]
        D[Render]
        E[Run]
    end
    
    subgraph Output
        F[Test Results]
        G[Traces]
    end
    
    A --> B
    B -->|RepoSummary| C
    C -->|EvalSpec| D
    D -->|Test Files| E
    E --> F
    E --> G
    
    B -.-|tree-sitter| B
    C -.-|Claude| C
    D -.-|pytest/vitest| D

evaluclaude is a CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Tree-sitter parses the code structure, Claude generates test specs from that structure, and deterministic renderers turn the specs into test files.
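The stages described above can be pictured as a chain of typed functions. The following is a minimal sketch, not the tool's actual implementation: the `RepoSummary`, `EvalSpec`, and `TestFile` shapes are hypothetical, the regex stands in for tree-sitter, and the `analyze` function stands in for the Claude call. It only illustrates why the render step can be deterministic while the analyze step is not.

```typescript
// Hypothetical shapes: the real RepoSummary and EvalSpec types are not shown in this README.
interface RepoSummary {
  functions: string[]; // function names a parser such as tree-sitter might extract
}

interface EvalSpec {
  cases: { target: string; description: string }[];
}

interface TestFile {
  path: string;
  contents: string;
}

// Stage 1 (Introspect): a naive regex stands in for tree-sitter parsing.
function introspect(source: string): RepoSummary {
  const matches: string[] = source.match(/function\s+\w+/g) || [];
  return { functions: matches.map((m) => m.replace(/^function\s+/, "")) };
}

// Stage 2 (Analyze): stands in for the Claude call; emits one placeholder case per function.
function analyze(summary: RepoSummary): EvalSpec {
  return {
    cases: summary.functions.map((name) => ({
      target: name,
      description: `${name} behaves as documented`,
    })),
  };
}

// Stage 3 (Render): pure function of the spec, so the same spec always yields the same files.
function render(spec: EvalSpec): TestFile[] {
  return spec.cases.map((c) => ({
    path: `tests/${c.target}.test.ts`,
    contents: `import { test } from "vitest";\n\ntest.todo(${JSON.stringify(c.description)});\n`,
  }));
}

const source = "export function add(a: number, b: number) { return a + b; }";
const files = render(analyze(introspect(source)));
console.log(files.map((f) => f.path)); // logs the single path tests/add.test.ts
```

Keeping the model call confined to one stage means a saved spec can be re-rendered without another API call, which is presumably why the CLI exposes `analyze` and `render` as separate commands.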

Quick Start

npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key

# Run the full pipeline
evaluclaude pipeline .

# Or step by step
evaluclaude intro .                    # Parse codebase
evaluclaude analyze . -o spec.json -i  # Generate spec (interactive)
evaluclaude render spec.json           # Create test files
evaluclaude run                        # Execute tests

Commands

Command	Description
`pipeline [path]`	Run full pipeline: intro -> analyze -> render -> run
`intro [path]`	Introspect codebase with tree-sitter
`analyze [path]`	Generate EvalSpec with Claude
`render <spec>`	Render EvalSpec to test files
`run [test-dir]`	Execute tests and collect results
`grade <input>`	Grade output using LLM rubric
`rubrics`	List available rubrics
`calibrate`	Calibrate rubric against examples
`view [trace-id]`	View trace details
`traces`	List all traces
`ui`	Launch Promptfoo dashboard
`eval`	Run Promptfoo evaluations

Supported Languages

Language	Parser	Test Framework
Python	tree-sitter-python	pytest
TypeScript	tree-sitter-typescript	vitest, jest
JavaScript	tree-sitter-typescript	vitest, jest
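
The table above can be read as a lookup from language to parser and test frameworks. Here is a small sketch of that lookup, assuming languages are detected by file extension; the extension map and the `toolchainFor` helper are hypothetical and are not part of the evaluclaude API.

```typescript
type Language = "python" | "typescript" | "javascript";

interface Toolchain {
  parser: string;
  frameworks: string[];
}

// Mirrors the Supported Languages table.
const TOOLCHAINS: Record<Language, Toolchain> = {
  python: { parser: "tree-sitter-python", frameworks: ["pytest"] },
  typescript: { parser: "tree-sitter-typescript", frameworks: ["vitest", "jest"] },
  javascript: { parser: "tree-sitter-typescript", frameworks: ["vitest", "jest"] },
};

// Assumption: detection by file extension. The tool's real detection logic is not documented here.
const EXTENSIONS: Record<string, Language> = {
  ".py": "python",
  ".ts": "typescript",
  ".js": "javascript",
};

// Hypothetical helper: returns the toolchain for a file, or undefined if the language is unsupported.
function toolchainFor(filename: string): Toolchain | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return undefined;
  const language = EXTENSIONS[filename.slice(dot).toLowerCase()];
  return language ? TOOLCHAINS[language] : undefined;
}

console.log(toolchainFor("src/app.py"));
console.log(toolchainFor("README.md"));
```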

Output Structure

.evaluclaude/
  spec.json              # Generated EvalSpec
  traces/                # Execution traces
  results/               # Test results
  promptfooconfig.yaml   # Promptfoo config

How This Was Built

This project was built in a few hours using Amp Code.

Development

npm run build      # Build
npm run dev        # Dev mode
npm test           # Run tests
npm run typecheck  # Type check

License

MIT
