# evaluclaude

Zero-to-evals in one command. Claude analyzes your codebase and generates functional tests.

```mermaid
flowchart LR
    subgraph Input
        A[Your Codebase]
    end

    subgraph Pipeline
        B[Introspect]
        C[Analyze]
        D[Render]
        E[Run]
    end

    subgraph Output
        F[Test Results]
        G[Traces]
    end

    A --> B
    B -->|RepoSummary| C
    C -->|EvalSpec| D
    D -->|Test Files| E
    E --> F
    E --> G

    B -.-|tree-sitter| B
    C -.-|Claude| C
    D -.-|pytest/vitest| D
```

A CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Tree-sitter parses your code structure, Claude generates test specs, and deterministic renderers create the actual tests.
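
The RepoSummary and EvalSpec stages above can be sketched in TypeScript. This is a minimal illustration of the data flow only — the field names and stage functions beyond `RepoSummary` and `EvalSpec` are assumptions for the example, not evaluclaude's real API:

```typescript
// Hypothetical shapes for the pipeline's intermediate artifacts.
interface RepoSummary {
  language: string;
  files: string[];
  exports: string[]; // top-level functions/classes found by tree-sitter
}

interface EvalSpec {
  framework: "pytest" | "vitest" | "jest";
  cases: { name: string; target: string }[];
}

// Stubbed stages standing in for intro -> analyze -> render.
function introspect(files: string[]): RepoSummary {
  // The real tool parses source with tree-sitter; this stub hardcodes one export.
  return { language: "typescript", files, exports: ["parseConfig"] };
}

function analyze(summary: RepoSummary): EvalSpec {
  // The real tool calls Claude here; this stub derives a trivial spec.
  return {
    framework: "vitest",
    cases: summary.exports.map((e) => ({ name: `${e} smoke test`, target: e })),
  };
}

function render(spec: EvalSpec): string[] {
  // Deterministic rendering: one test stub per case.
  return spec.cases.map((c) => `test("${c.name}", () => { /* ... */ });`);
}

const tests = render(analyze(introspect(["src/config.ts"])));
console.log(tests[0]);
```

The key design point the sketch mirrors: only the analyze stage is model-driven; introspection and rendering are deterministic, so reruns with the same spec produce the same test files.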

## Quick Start

```bash
npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key

# Run the full pipeline
evaluclaude pipeline .

# Or step by step
evaluclaude intro .                    # Parse codebase
evaluclaude analyze . -o spec.json -i  # Generate spec (interactive)
evaluclaude render spec.json           # Create test files
evaluclaude run                        # Execute tests
```

## Commands

| Command | Description |
| --- | --- |
| `pipeline [path]` | Run full pipeline: intro -> analyze -> render -> run |
| `intro [path]` | Introspect codebase with tree-sitter |
| `analyze [path]` | Generate EvalSpec with Claude |
| `render <spec>` | Render EvalSpec to test files |
| `run [test-dir]` | Execute tests and collect results |
| `grade <input>` | Grade output using LLM rubric |
| `rubrics` | List available rubrics |
| `calibrate` | Calibrate rubric against examples |
| `view [trace-id]` | View trace details |
| `traces` | List all traces |
| `ui` | Launch Promptfoo dashboard |
| `eval` | Run Promptfoo evaluations |

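Rubric-based grading like the `grade` command performs can be reduced to a weighted roll-up of per-criterion scores. The sketch below shows that idea only; the `Criterion` shape and `grade` function are illustrative assumptions, not evaluclaude's actual schema:

```typescript
// Hypothetical rubric result: each criterion carries a weight and a
// 0..1 score (as a grading model might return it).
interface Criterion {
  id: string;
  weight: number; // relative importance
  score: number;  // 0..1
}

// Weighted average of criterion scores; 0 for an empty rubric.
function grade(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((s, c) => s + c.weight, 0);
  const weighted = criteria.reduce((s, c) => s + c.weight * c.score, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const result = grade([
  { id: "correctness", weight: 3, score: 1.0 },
  { id: "style", weight: 1, score: 0.5 },
]);
console.log(result.toFixed(3)); // 0.875
```

A weighted average keeps grades comparable across rubrics of different sizes, which is what calibration against examples (the `calibrate` command) relies on.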
## Supported Languages

| Language | Parser | Test Framework |
| --- | --- | --- |
| Python | tree-sitter-python | pytest |
| TypeScript | tree-sitter-typescript | vitest, jest |
| JavaScript | tree-sitter-typescript | vitest, jest |

## Output Structure

```
.evaluclaude/
  spec.json              # Generated EvalSpec
  traces/                # Execution traces
  results/               # Test results
  promptfooconfig.yaml   # Promptfoo config
```
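
For orientation, a `spec.json` might contain something along these lines. This is a hedged sketch only — the field names are assumptions for illustration, and the actual EvalSpec schema may differ:

```json
{
  "framework": "vitest",
  "cases": [
    {
      "name": "parseConfig returns defaults for empty input",
      "target": "src/config.ts#parseConfig"
    }
  ]
}
```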

## How This Was Built

This project was built in a few hours using Amp Code. You can explore the development threads.

## Development

```bash
npm run build      # Build
npm run dev        # Dev mode
npm test           # Run tests
npm run typecheck  # Type check
```

## License

MIT