co-mono/packages/agent/README.md
Mario Zechner f82e82da93 docs: Improve reasoning support table clarity
- Remove redundant 'Reasoning Tokens' column (all models count them)
- Group by provider for better readability
- Clarify model limitations vs API limitations
- Simplify check marks to focus on thinking content availability
2025-08-10 02:13:13 +02:00

432 lines
No EOL
14 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# pi-agent
A general-purpose agent with tool calling and session persistence, modeled after Claude Code but extremely hackable and minimal. It comes with a built-in TUI (also modeled after Claude Code) for interactive use.
Everything is designed to be easy:
- Writing custom UIs on top of it (via JSON mode in any language or the TypeScript API)
- Using it for inference steps in deterministic programs (via JSON mode in any language or the TypeScript API)
- Providing your own system prompts and tools
- Working with various LLM providers or self-hosted LLMs
## Installation
```bash
npm install -g @mariozechner/pi-agent
```
This installs the `pi-agent` command globally.
## Quick Start
By default, pi-agent uses OpenAI's API with model `gpt-5-mini` and authenticates using the `OPENAI_API_KEY` environment variable. Any OpenAI-compatible endpoint works, including Ollama, vLLM, OpenRouter, Groq, Anthropic, etc.
```bash
# Single message
pi-agent "What is 2+2?"
# Multiple messages processed sequentially
pi-agent "What is 2+2?" "What about 3+3?"
# Interactive chat mode (no messages = interactive)
pi-agent
# Continue most recently modified session in current directory
pi-agent --continue "Follow up question"
# GPT-OSS via Groq
pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY --model openai/gpt-oss-120b
# GLM 4.5 via OpenRouter
pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY --model z-ai/glm-4.5
# Claude via Anthropic's OpenAI compatibility layer. See: https://docs.anthropic.com/en/api/openai-sdk
pi-agent --base-url https://api.anthropic.com/v1 --api-key $ANTHROPIC_API_KEY --model claude-opus-4-1-20250805
# Gemini via Google AI
pi-agent --base-url https://generativelanguage.googleapis.com/v1beta/openai/ --api-key $GEMINI_API_KEY --model gemini-2.5-flash
```
## Usage Modes
### Single-Shot Mode
Process one or more messages and exit:
```bash
pi-agent "First question" "Second question"
```
### Interactive Mode
Start an interactive chat session:
```bash
pi-agent
```
- Type messages and press Enter to send
- Type `exit` or `quit` to end session
- Press Escape to interrupt while processing
- Press CTRL+C to clear the text editor
- Press CTRL+C twice quickly to exit
### JSON Mode
JSON mode enables programmatic integration by outputting events as JSONL (JSON Lines).
**Single-shot mode:** Outputs a stream of JSON events for each message, then exits.
```bash
pi-agent --json "What is 2+2?" "And the meaning of life?"
# Outputs: {"type":"session_start","sessionId":"bb6f0acb-80cf-4729-9593-bcf804431a53","model":"gpt-5-mini","api":"completions","baseURL":"https://api.openai.com/v1","systemPrompt":"You are a helpful assistant."} {"type":"user_message","text":"What is 2+2?"} {"type":"assistant_start"} {"type":"token_usage","inputTokens":314,"outputTokens":16,"totalTokens":330,"cacheReadTokens":0,"cacheWriteTokens":0} {"type":"assistant_message","text":"2 + 2 = 4"} {"type":"user_message","text":"And the meaning of life?"} {"type":"assistant_start"} {"type":"token_usage","inputTokens":337,"outputTokens":331,"totalTokens":668,"cacheReadTokens":0,"cacheWriteTokens":0} {"type":"assistant_message","text":"Short answer (pop-culture): 42.\n\nMore useful answers:\n- Philosophical...
```
**Interactive mode:** Accepts JSON commands via stdin and outputs JSON events to stdout.
```bash
# Start interactive JSON mode
pi-agent --json
# Now send commands via stdin
# Pipe one or more initial messages in
(echo '{"type": "message", "content": "What is 2+2?"}'; cat) | pi-agent --json
# Outputs: {"type":"session_start","sessionId":"bb64cfbe-dd52-4662-bd4a-0d921c332fd1","model":"gpt-5-mini","api":"completions","baseURL":"https://api.openai.com/v1","systemPrompt":"You are a helpful assistant."} {"type":"user_message","text":"What is 2+2?"} {"type":"assistant_start"} {"type":"token_usage","inputTokens":314,"outputTokens":16,"totalTokens":330,"cacheReadTokens":0,"cacheWriteTokens":0} {"type":"assistant_message","text":"2 + 2 = 4"}
```
Commands you can send via stdin in interactive JSON mode:
```json
{"type": "message", "content": "Your message here"} // Send a message to the agent
{"type": "interrupt"} // Interrupt current processing
```
## Configuration
### Command Line Options
```
--base-url <url> API base URL (default: https://api.openai.com/v1)
--api-key <key> API key (or set OPENAI_API_KEY env var)
--model <model> Model name (default: gpt-4o-mini)
--api <type> API type: "completions" or "responses" (default: completions)
--system-prompt <text> System prompt (default: "You are a helpful assistant.")
--continue Continue previous session
--json JSON mode
--help, -h Show help message
```
## Session Persistence
Sessions are automatically saved to `~/.pi/sessions/` and include:
- Complete conversation history
- Tool call results
- Token usage statistics
Use `--continue` to resume the last session:
```bash
pi-agent "Start a story about a robot"
# ... later ...
pi-agent --continue "Continue the story"
```
## Tools
The agent includes built-in tools for file system operations:
- **read_file** - Read file contents
- **list_directory** - List directory contents
- **bash** - Execute shell commands
- **glob** - Find files by pattern
- **ripgrep** - Search file contents
These tools are automatically available when using the agent through the `pi` command for code navigation tasks.
## JSON Mode Events
When using `--json`, the agent outputs these event types:
- `session_start` - New session started with metadata
- `user_message` - User input
- `assistant_start` - Assistant begins responding
- `assistant_message` - Assistant's response
- `reasoning` - Reasoning/thinking (for models that support it)
- `tool_call` - Tool being called
- `tool_result` - Result from tool
- `token_usage` - Token usage statistics (includes `reasoningTokens` for models with reasoning)
- `error` - Error occurred
- `interrupted` - Processing was interrupted
The complete TypeScript type definition for `AgentEvent` can be found in [`src/agent.ts`](src/agent.ts#L6).
## Build an Interactive UI with JSON Mode
Build custom UIs in any language by spawning pi-agent in JSON mode and communicating via stdin/stdout.
```javascript
import { spawn } from 'child_process';
import { createInterface } from 'readline';
// Start the agent in JSON mode
const agent = spawn('pi-agent', ['--json']);
// Create readline interface for parsing JSONL output from agent
const agentOutput = createInterface({input: agent.stdout, crlfDelay: Infinity});
// Create readline interface for user input
const userInput = createInterface({input: process.stdin, output: process.stdout});
// State tracking
let isProcessing = false, lastUsage, isExiting = false;
// Handle each line of JSON output from agent
agentOutput.on('line', (line) => {
try {
const event = JSON.parse(line);
// Handle all event types
switch (event.type) {
case 'session_start':
console.log(`Session started (${event.model}, ${event.api}, ${event.baseURL})`);
console.log('Press CTRL + C to exit');
promptUser();
break;
case 'user_message':
// Already shown in prompt, skip
break;
case 'assistant_start':
isProcessing = true;
console.log('\n[assistant]');
break;
case 'thinking':
console.log(`[thinking]\n${event.text}\n`);
break;
case 'tool_call':
console.log(`[tool] ${event.name}(${event.args.substring(0, 50)})\n`);
break;
case 'tool_result':
const lines = event.result.split('\n');
const truncated = lines.length - 5 > 0 ? `\n. ... (${lines.length - 5} more lines truncated)` : '';
console.log(`[tool result]\n${lines.slice(0, 5).join('\n')}${truncated}\n`);
break;
case 'assistant_message':
console.log(event.text.trim());
isProcessing = false;
promptUser();
break;
case 'token_usage':
lastUsage = event;
break;
case 'error':
console.error('\n❌ Error:', event.message);
isProcessing = false;
promptUser();
break;
case 'interrupted':
console.log('\n⚠ Interrupted by user');
isProcessing = false;
promptUser();
break;
}
} catch (e) {
console.error('Failed to parse JSON:', line, e);
}
});
// Send a message to the agent
function sendMessage(content) {
agent.stdin.write(`${JSON.stringify({type: 'message', content: content})}\n`);
}
// Send interrupt signal
function interrupt() {
agent.stdin.write(`${JSON.stringify({type: 'interrupt'})}\n`);
}
// Prompt for user input
function promptUser() {
if (isExiting) return;
if (lastUsage) {
console.log(`\nin: ${lastUsage.inputTokens}, out: ${lastUsage.outputTokens}, cache read: ${lastUsage.cacheReadTokens}, cache write: ${lastUsage.cacheWriteTokens}`);
}
userInput.question('\n[user]\n> ', (answer) => {
answer = answer.trim();
if (answer) {
sendMessage(answer);
} else {
promptUser();
}
});
}
// Handle Ctrl+C
process.on('SIGINT', () => {
if (isProcessing) {
interrupt();
} else {
agent.kill();
process.exit(0);
}
});
// Handle agent exit
agent.on('close', (code) => {
isExiting = true;
userInput.close();
console.log(`\nAgent exited with code ${code}`);
process.exit(code);
});
// Handle errors
agent.on('error', (err) => {
console.error('Failed to start agent:', err);
process.exit(1);
});
// Start the conversation
console.log('Pi Agent Interactive Chat');
```
### Usage Examples
```bash
# OpenAI o1/o3 - see thinking content with Responses API
pi-agent --api responses --model o1-mini "Explain quantum computing"
# Groq gpt-oss - reasoning with Chat Completions
pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY \
--model openai/gpt-oss-120b "Complex math problem"
# Gemini 2.5 - thinking content automatically configured
pi-agent --base-url https://generativelanguage.googleapis.com/v1beta/openai/ \
--api-key $GEMINI_API_KEY --model gemini-2.5-flash "Think step by step"
# OpenRouter - supports various reasoning models
pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY \
--model "qwen/qwen3-235b-a22b-thinking-2507" "Complex reasoning task"
```
### JSON Mode Events
When reasoning is active, you'll see:
- `reasoning` events with thinking text (when available)
- `token_usage` events include `reasoningTokens` field
- Console/TUI show reasoning tokens with ⚡ symbol
## Reasoning
Pi-agent supports reasoning/thinking tokens for models that provide this capability:
### Supported Providers
| Provider | Model | API | Thinking Content | Notes |
|----------|-------|-----|------------------|-------|
| **OpenAI** | o1, o3 | Responses | ✅ Full | Thinking events + token counts |
| | o1, o3 | Chat Completions | ❌ | Token counts only |
| | gpt-5 | Both APIs | ❌ | Model limitation (empty summaries) |
| **Groq** | gpt-oss | Chat Completions | ✅ Full | Via `reasoning_format: "parsed"` |
| | gpt-oss | Responses | ❌ | API doesn't support reasoning.summary |
| **Gemini** | 2.5 models | Chat Completions | ✅ Full | Auto-configured via extra_body |
| **Anthropic** | Claude | OpenAI Compat | ❌ | Use native API for thinking |
| **OpenRouter** | Various | Both APIs | Varies | Depends on underlying model |
### Technical Details
The agent automatically:
- Detects provider from base URL
- Tests model reasoning support on first use (cached)
- Adjusts request parameters per provider:
- OpenAI: `reasoning_effort` (minimal/low)
- Groq: `reasoning_format: "parsed"`
- Gemini: `extra_body.google.thinking_config`
- OpenRouter: `reasoning` object with `effort` field
- Parses provider-specific response formats:
- Gemini: Extracts from `<thought>` tags
- Groq: Uses `message.reasoning` field
- OpenRouter: Uses `message.reasoning` field
- OpenAI: Uses standard `reasoning` events
## Architecture
The agent is built with:
- **agent.ts** - Core Agent class and API functions
- **cli.ts** - CLI entry point, argument parsing, and JSON mode handler
- **args.ts** - Custom typed argument parser
- **session-manager.ts** - Session persistence
- **tools/** - Tool implementations
- **renderers/** - Output formatters (console, TUI, JSON)
## Development
### Running from Source
```bash
# Run directly with npx tsx - no build needed
npx tsx src/cli.ts "What is 2+2?"
# Interactive TUI mode
npx tsx src/cli.ts
# JSON mode for programmatic use
echo '{"type":"message","content":"list files"}' | npx tsx src/cli.ts --json
```
### Testing
The agent supports three testing modes:
#### 1. Test UI/Renderers (non-interactive mode)
```bash
# Test console renderer output and metrics
npx tsx src/cli.ts "list files in /tmp" 2>&1 | tail -5
# Verify: ↑609 ↓610 ⚒ 1 (tokens and tool count)
# Test TUI renderer (with stdin)
echo "list files" | npx tsx src/cli.ts 2>&1 | grep "⚒"
```
#### 2. Test Model Behavior (JSON mode)
```bash
# Extract metrics for model comparison
echo '{"type":"message","content":"write fibonacci in Python"}' | \
npx tsx src/cli.ts --json --model gpt-4o-mini 2>&1 | \
jq -s '[.[] | select(.type=="token_usage")] | last'
# Compare models: tokens used, tool calls made, quality
for model in "gpt-4o-mini" "gpt-4o"; do
echo "Testing $model:"
echo '{"type":"message","content":"fix syntax errors in: prnt(hello)"}' | \
npx tsx src/cli.ts --json --model $model 2>&1 | \
jq -r 'select(.type=="token_usage" or .type=="tool_call" or .type=="assistant_message")'
done
```
#### 3. LLM-as-Judge Testing
```bash
# Capture output from different models and evaluate with another LLM
TASK="write a Python function to check if a number is prime"
# Get response from model A
RESPONSE_A=$(echo "{\"type\":\"message\",\"content\":\"$TASK\"}" | \
npx tsx src/cli.ts --json --model gpt-4o-mini 2>&1 | \
jq -r '.[] | select(.type=="assistant_message") | .text')
# Judge the response
echo "{\"type\":\"message\",\"content\":\"Rate this code (1-10): $RESPONSE_A\"}" | \
npx tsx src/cli.ts --json --model gpt-4o 2>&1 | \
jq -r '.[] | select(.type=="assistant_message") | .text'
```
## Use as a Library
```typescript
import { Agent, ConsoleRenderer } from '@mariozechner/pi-agent';
const agent = new Agent({
apiKey: process.env.OPENAI_API_KEY,
baseURL: 'https://api.openai.com/v1',
model: 'gpt-5-mini',
api: 'completions',
systemPrompt: 'You are a helpful assistant.'
}, new ConsoleRenderer());
await agent.ask('What is 2+2?');
```