# evaluclaude

Zero-to-evals in one command. Claude analyzes your codebase and generates functional tests.

```mermaid
flowchart LR
    subgraph Input
        A[Your Codebase]
    end

    subgraph Pipeline
        B[Introspect]
        C[Analyze]
        D[Render]
        E[Run]
    end

    subgraph Output
        F[Test Results]
        G[Traces]
    end

    A --> B
    B -->|RepoSummary| C
    C -->|EvalSpec| D
    D -->|Test Files| E
    E --> F
    E --> G

    B -.-|tree-sitter| B
    C -.-|Claude| C
    D -.-|pytest/vitest| D
```

A CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Tree-sitter parses your code structure, Claude generates test specs, and deterministic renderers create the actual tests.
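
The RepoSummary and EvalSpec stages above can be sketched in TypeScript. This is a minimal illustration of the data flow only — the field names and stage functions beyond `RepoSummary` and `EvalSpec` are assumptions for the example, not evaluclaude's real API:

```typescript
// Hypothetical shapes for the pipeline's intermediate artifacts.
interface RepoSummary {
  language: string;
  files: string[];
  exports: string[]; // top-level functions/classes found by tree-sitter
}

interface EvalSpec {
  framework: "pytest" | "vitest" | "jest";
  cases: { name: string; target: string }[];
}

// Stubbed stages standing in for intro -> analyze -> render.
function introspect(files: string[]): RepoSummary {
  // The real tool parses source with tree-sitter; this stub hardcodes one export.
  return { language: "typescript", files, exports: ["parseConfig"] };
}

function analyze(summary: RepoSummary): EvalSpec {
  // The real tool calls Claude here; this stub derives a trivial spec.
  return {
    framework: "vitest",
    cases: summary.exports.map((e) => ({ name: `${e} smoke test`, target: e })),
  };
}

function render(spec: EvalSpec): string[] {
  // Deterministic rendering: one test stub per case.
  return spec.cases.map((c) => `test("${c.name}", () => { /* ... */ });`);
}

const tests = render(analyze(introspect(["src/config.ts"])));
console.log(tests[0]);
```

The key design point the sketch mirrors: only the analyze stage is model-driven; introspection and rendering are deterministic, so reruns with the same spec produce the same test files.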

## Quick Start

```bash
npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key

# Run the full pipeline
evaluclaude pipeline .

# Or step by step
evaluclaude intro .                    # Parse codebase
evaluclaude analyze . -o spec.json -i  # Generate spec (interactive)
evaluclaude render spec.json           # Create test files
evaluclaude run                        # Execute tests
```

## Commands

| Command | Description |
| --- | --- |
| `pipeline [path]` | Run full pipeline: intro -> analyze -> render -> run |
| `intro [path]` | Introspect codebase with tree-sitter |
| `analyze [path]` | Generate EvalSpec with Claude |
| `render <spec>` | Render EvalSpec to test files |
| `run [test-dir]` | Execute tests and collect results |
| `grade <input>` | Grade output using LLM rubric |
| `rubrics` | List available rubrics |
| `calibrate` | Calibrate rubric against examples |
| `view [trace-id]` | View trace details |
| `traces` | List all traces |
| `ui` | Launch Promptfoo dashboard |
| `eval` | Run Promptfoo evaluations |

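Rubric-based grading like the `grade` command performs can be reduced to a weighted roll-up of per-criterion scores. The sketch below shows that idea only; the `Criterion` shape and `grade` function are illustrative assumptions, not evaluclaude's actual schema:

```typescript
// Hypothetical rubric result: each criterion carries a weight and a
// 0..1 score (as a grading model might return it).
interface Criterion {
  id: string;
  weight: number; // relative importance
  score: number;  // 0..1
}

// Weighted average of criterion scores; 0 for an empty rubric.
function grade(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((s, c) => s + c.weight, 0);
  const weighted = criteria.reduce((s, c) => s + c.weight * c.score, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const result = grade([
  { id: "correctness", weight: 3, score: 1.0 },
  { id: "style", weight: 1, score: 0.5 },
]);
console.log(result.toFixed(3)); // 0.875
```

A weighted average keeps grades comparable across rubrics of different sizes, which is what calibration against examples (the `calibrate` command) relies on.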
## Supported Languages

| Language | Parser | Test Framework |
| --- | --- | --- |
| Python | tree-sitter-python | pytest |
| TypeScript | tree-sitter-typescript | vitest, jest |
| JavaScript | tree-sitter-typescript | vitest, jest |

## Output Structure

```
.evaluclaude/
  spec.json              # Generated EvalSpec
  traces/                # Execution traces
  results/               # Test results
  promptfooconfig.yaml   # Promptfoo config
```
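
For orientation, a `spec.json` might contain something along these lines. This is a hedged sketch only — the field names are assumptions for illustration, and the actual EvalSpec schema may differ:

```json
{
  "framework": "vitest",
  "cases": [
    {
      "name": "parseConfig returns defaults for empty input",
      "target": "src/config.ts#parseConfig"
    }
  ]
}
```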

## How This Was Built

This project was built in a few hours using Amp Code. You can explore the development threads.

## Development

```bash
npm run build      # Build
npm run dev        # Dev mode
npm test           # Run tests
npm run typecheck  # Type check
```

## License

MIT