![Node](https://img.shields.io/badge/node-%3E%3D18.0.0-green)
![License](https://img.shields.io/badge/license-MIT-brightgreen)
## What is this?
**evaluclaude** is a CLI tool that uses Claude to understand your codebase and generate real, runnable functional tests. Unlike traditional test generators that produce boilerplate, evaluclaude:
- **Parses your code** with tree-sitter (no LLM tokens wasted on structure)
- **Asks smart questions** to understand your testing priorities
- **Generates specs, not code** — deterministic renderers create the actual tests
- **Full observability** — every run produces a trace you can inspect
## Quick Start
```bash
# Install
npm install -g evaluclaude-harness
export ANTHROPIC_API_KEY=your-key
# Run the full pipeline
evaluclaude pipeline .
# Or step by step
evaluclaude intro . # Introspect codebase
evaluclaude analyze . -o spec.json -i # Generate spec (interactive)
evaluclaude render spec.json # Create test files
evaluclaude run # Execute tests
```
## How It Works
```
┌─────────────────────────────────────────────────────────┐
│ evaluclaude pipeline │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. INTROSPECT Parse code with tree-sitter │
│ 📂 → 📋 Extract functions, classes │
│ │
│ 2. ANALYZE Claude generates EvalSpec │
│ 📋 → 🧠 Asks clarifying questions │
│ │
│ 3. RENDER Deterministic code generation │
│ 🧠 → 📄 pytest / vitest / jest │
│ │
│ 4. RUN Execute in sandbox │
│ 📄 → 🧪 Collect results + traces │
│ │
└─────────────────────────────────────────────────────────┘
```
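The handoff between ANALYZE and RENDER is the EvalSpec itself: structured JSON from Claude that the renderers consume. Its real schema lives in the generated `spec.json`; the hypothetical shape below (field names are illustrative assumptions, not the actual schema) shows the kind of structure involved:

```typescript
// Hypothetical EvalSpec shape -- illustrative only. The actual schema is
// whatever `evaluclaude analyze` writes to spec.json.
interface Scenario {
  id: string;        // stable identifier for the test case
  module: string;    // where the function under test lives, e.g. "./src/mathUtils"
  fn: string;        // exported function to invoke
  args: unknown[];   // arguments for the call
  expected: unknown; // expected return value
}

interface EvalSpec {
  language: "python" | "typescript" | "javascript";
  framework: "pytest" | "vitest" | "jest";
  scenarios: Scenario[];
}
```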
## Commands
### Core Pipeline
| Command | Description |
|---------|-------------|
| `pipeline [path]` | Run the full pipeline: introspect → analyze → render → run |
| `intro [path]` | Introspect codebase with tree-sitter |
| `analyze [path]` | Generate EvalSpec with Claude |
| `render <spec>` | Render EvalSpec to test files |
| `run [test-dir]` | Execute tests and collect results |
### Grading & Rubrics
| Command | Description |
|---------|-------------|
| `grade <input>` | Grade output using LLM rubric |
| `rubrics` | List available rubrics |
| `calibrate` | Calibrate rubric against examples |
### Observability
| Command | Description |
|---------|-------------|
| `view [trace-id]` | View trace details |
| `traces` | List all traces |
| `ui` | Launch Promptfoo dashboard |
| `eval` | Run Promptfoo evaluations |
## Examples
### Analyze a Python project interactively
```bash
evaluclaude analyze ./my-python-project -i -o spec.json
```
Claude will ask questions like:
- "I see 3 database models. Which is the core domain object?"
- "Found 47 utility functions. Want me to prioritize the most-used ones?"
### Focus on specific modules
```bash
evaluclaude pipeline . --focus auth,payments --max-scenarios 20
```
### View test results in browser
```bash
evaluclaude run --export-promptfoo
evaluclaude ui
```
### Skip steps in the pipeline
```bash
# Use existing spec, just run tests
evaluclaude pipeline . --skip-analyze --skip-render
# Generate tests without running
evaluclaude pipeline . --skip-run
```
## Configuration
### Environment Variables
| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Your Anthropic API key |
### Output Structure
```
.evaluclaude/
├── spec.json # Generated EvalSpec
├── traces/ # Execution traces
│ └── trace-xxx.json
├── results/ # Test results
│ └── run-xxx.json
└── promptfooconfig.yaml # Promptfoo config (with --promptfoo)
```
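Traces and results are plain JSON, so they are easy to post-process outside the CLI. A minimal Node sketch that loads the newest trace (the trace's internal schema isn't documented here, so this only prints its top-level keys):

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Find the most recently written trace file.
const dir = ".evaluclaude/traces";
const newest = readdirSync(dir)
  .filter((f) => f.endsWith(".json"))
  .map((f) => join(dir, f))
  .sort((a, b) => statSync(b).mtimeMs - statSync(a).mtimeMs)[0];

if (newest) {
  const trace = JSON.parse(readFileSync(newest, "utf8"));
  console.log(newest, Object.keys(trace)); // inspect before relying on fields
}
```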
## Rubrics
Create custom grading rubrics in YAML:
```yaml
# rubrics/my-rubric.yaml
name: my-rubric
description: Custom quality checks
passingThreshold: 0.7
criteria:
- name: correctness
description: Code produces correct results
weight: 0.5
- name: clarity
description: Code is clear and readable
weight: 0.3
- name: efficiency
description: Code is reasonably efficient
weight: 0.2
```
Use it:
```bash
evaluclaude grade output.txt -r my-rubric
```
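The grader presumably combines per-criterion scores using the YAML weights; the exact aggregation isn't documented here, but a weighted average compared against `passingThreshold` is the natural reading. A sketch under that assumption:

```typescript
// Assumed roll-up: weighted average of per-criterion scores in [0, 1],
// passing when the total meets the rubric's passingThreshold. This is an
// illustration, not evaluclaude's actual grading logic.
interface Criterion {
  name: string;
  weight: number; // weights are expected to sum to 1.0
}

function grade(
  criteria: Criterion[],
  scores: Record<string, number>,
  passingThreshold: number,
): { score: number; passed: boolean } {
  const score = criteria.reduce(
    (sum, c) => sum + c.weight * (scores[c.name] ?? 0),
    0,
  );
  return { score, passed: score >= passingThreshold };
}

// With the rubric above: 0.5*0.9 + 0.3*0.8 + 0.2*0.5 = 0.79 >= 0.7 -> pass.
console.log(grade(
  [
    { name: "correctness", weight: 0.5 },
    { name: "clarity", weight: 0.3 },
    { name: "efficiency", weight: 0.2 },
  ],
  { correctness: 0.9, clarity: 0.8, efficiency: 0.5 },
  0.7,
));
```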
## Architecture
evaluclaude follows key principles:
1. **Tree-sitter for introspection** — Never send raw code to Claude for structure extraction
2. **Claude generates specs, not code** — EvalSpec JSON is LLM output; test code is deterministic (see the renderer sketch below)
3. **Functional tests only** — Every test must invoke actual code, no syntax checks
4. **Full observability** — Every eval run produces an inspectable trace
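To make principle 2 concrete, here is a toy renderer, assuming the hypothetical scenario shape sketched under "How It Works" (the real renderers are more elaborate, but the point stands: the spec is LLM output, and the test code is a plain template with no model in the loop):

```typescript
// Toy vitest renderer -- illustrative only. Same spec in, same test out:
// rendering is deterministic, so only the EvalSpec comes from Claude.
interface Scenario {
  id: string;
  module: string;    // e.g. "./src/mathUtils"
  fn: string;        // exported function to invoke
  args: unknown[];
  expected: unknown;
}

function renderVitest(s: Scenario): string {
  const args = s.args.map((a) => JSON.stringify(a)).join(", ");
  return [
    `import { expect, it } from "vitest";`,
    `import { ${s.fn} } from "${s.module}";`,
    ``,
    `it(${JSON.stringify(s.id)}, () => {`,
    `  expect(${s.fn}(${args})).toEqual(${JSON.stringify(s.expected)});`,
    `});`,
  ].join("\n");
}
```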
## Supported Languages
| Language | Parser | Test Framework |
|----------|--------|----------------|
| Python | tree-sitter-python | pytest |
| TypeScript | tree-sitter-typescript | vitest, jest |
| JavaScript | tree-sitter-typescript | vitest, jest |
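Introspection builds on the tree-sitter parsers above. As a rough sketch of the approach (not evaluclaude's actual code), extracting Python function names with the standard node-tree-sitter API runs entirely locally, with no LLM tokens spent on structure:

```typescript
import Parser from "tree-sitter";
import Python from "tree-sitter-python";

const parser = new Parser();
parser.setLanguage(Python);

const source = "def add(a, b):\n    return a + b\n";
const tree = parser.parse(source);

// function_definition nodes expose a "name" field in the Python grammar.
for (const fn of tree.rootNode.descendantsOfType("function_definition")) {
  console.log(fn.childForFieldName("name")?.text); // -> "add"
}
```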
## How This Was Built
This project was built in a few hours using [Amp Code](https://ampcode.com). You can explore the development threads:
- [Initial setup and CLI structure](https://ampcode.com/threads/T-019bae58-69c2-74d0-a975-4be84c7d98dc)
- [Introspection and tree-sitter integration](https://ampcode.com/threads/T-019bafc5-0d57-702a-a407-44b9b884b9d0)
- [EvalSpec analysis with Claude](https://ampcode.com/threads/T-019baf57-7bc1-7368-9e3d-d649da47b68b)
- [Test rendering and framework support](https://ampcode.com/threads/T-019baeef-6079-70d6-9c4a-3cfd439190f1)
- [Grading and rubrics system](https://ampcode.com/threads/T-019baf12-e566-733d-b086-d099880c77c1)
- [Promptfoo integration](https://ampcode.com/threads/T-019baf43-abcf-7715-8a65-0f5ac5df87ce)
- [UI polish and final touches](https://ampcode.com/threads/T-019baf63-8c9e-7018-b8bc-538c5a3cada7)
## Development
```bash
npm run build # Build
npm run dev # Dev mode
npm test # Run tests
npm run typecheck # Type check
```
## License