From 508d1bb2d65a71bad1dc270a247f86339e51a557 Mon Sep 17 00:00:00 2001 From: Mario Zechner Date: Fri, 14 Nov 2025 21:32:16 +0100 Subject: [PATCH] Clean-up --- CLAUDE.md => AGENTS.md | 5 +- docs/agent.md | 1341 ---------------------------------------- 2 files changed, 3 insertions(+), 1343 deletions(-) rename CLAUDE.md => AGENTS.md (80%) delete mode 100644 docs/agent.md diff --git a/CLAUDE.md b/AGENTS.md similarity index 80% rename from CLAUDE.md rename to AGENTS.md index 000a2aaa..97c013d7 100644 --- a/CLAUDE.md +++ b/AGENTS.md @@ -3,10 +3,11 @@ - packages/ai/README.md - packages/tui/README.md - packages/agent/README.md + - packages/coding-agent.md - packages/pods/README.md - packages/web-ui/README.md -- We must NEVER have type any anywhere, unless absolutely, positively necessary. +- We must NEVER have type `any` anywhere, unless absolutely, positively necessary. - If you are working with an external API, check node_modules for the type definitions as needed instead of assuming things. - Always run `npm run check` in the project's root directory after making code changes. - You must NEVER run `npm run dev` yourself. Doing is means you failed the user hard. -- \ No newline at end of file +- Do NOT commit unless asked to by the user \ No newline at end of file diff --git a/docs/agent.md b/docs/agent.md deleted file mode 100644 index d8b63b01..00000000 --- a/docs/agent.md +++ /dev/null @@ -1,1341 +0,0 @@ -# Agent Architecture - -## Executive Summary - -This document proposes extracting the agent infrastructure from `@mariozechner/pi-web-ui` into two new packages: - -1. **`@mariozechner/agent`** - General-purpose agent package with transport abstraction, state management, and attachment support -2. **`@mariozechner/coding-agent`** - Specialized coding agent built on the general agent, with file manipulation tools and session management - -The new architecture will provide: -- **General agent core** with transport abstraction (ProviderTransport, AppTransport) -- **Reactive state management** with subscribe/emit pattern -- **Attachment support** (type definitions only - processing stays in consumers) -- **Message transformation** pipeline for filtering and adapting messages -- **Message queueing** for out-of-band message injection -- **Full abort support** throughout the execution pipeline -- **Event-driven API** for flexible UI integration -- **Clean separation** between agent logic and presentation layer -- **Coding-specific tools** (read, bash, edit, write) in a specialized package -- **Session management** for conversation persistence and resume capability - -## Current Architecture Analysis - -### Package Overview - -``` -pi-mono/ -├── packages/ai/ # Core AI streaming (GOOD - keep as-is) -├── packages/web-ui/ # Web UI with embedded agent (EXTRACT core agent logic) -├── packages/agent/ # OLD - needs to be replaced -├── packages/tui/ # Terminal UI lib (GOOD - low-level primitives) -├── packages/proxy/ # CORS proxy (unrelated) -└── packages/pods/ # GPU deployment tool (unrelated) -``` - -### packages/ai - Core Streaming Library - -**Status:** ✅ Solid foundation, keep as-is - -**Architecture:** -```typescript -agentLoop( - prompt: UserMessage, - context: AgentContext, - config: AgentLoopConfig, - signal?: AbortSignal -): EventStream -``` - -**Key Features:** -- Event-driven streaming (turn_start, message_*, tool_execution_*, turn_end, agent_end) -- Tool execution with validation -- Signal-based cancellation -- Message queue for injecting out-of-band messages -- Preprocessor support for message transformation - -**Events:** -```typescript -type AgentEvent = - | { type: "agent_start" } - | { type: "turn_start" } - | { type: "message_start"; message: Message } - | { type: "message_update"; assistantMessageEvent: AssistantMessageEvent; message: AssistantMessage } - | { type: "message_end"; message: Message } - | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: any } - | { type: "tool_execution_end"; toolCallId: string; toolName: string; result: AgentToolResult | string; isError: boolean } - | { type: "turn_end"; message: AssistantMessage; toolResults: ToolResultMessage[] } - | { type: "agent_end"; messages: Message[] } -``` - -**Tool Interface:** -```typescript -interface AgentTool extends Tool { - label: string; // Human-readable name for UI - execute: ( - toolCallId: string, - params: Static, - signal?: AbortSignal - ) => Promise>; -} - -interface AgentToolResult { - output: string; // Text sent to LLM - details: T; // Structured data for UI rendering -} -``` - -### packages/web-ui/src/agent - Web Agent - -**Status:** ✅ KEEP AS-IS for now, will be replaced later after new packages are proven - -**Architecture:** -```typescript -class Agent { - constructor(opts: { - initialState?: Partial; - debugListener?: (entry: DebugLogEntry) => void; - transport: AgentTransport; - messageTransformer?: (messages: AppMessage[]) => Message[]; - }) - - async prompt(input: string, attachments?: Attachment[]): Promise - abort(): void - subscribe(fn: (e: AgentEvent) => void): () => void - setSystemPrompt(v: string): void - setModel(m: Model): void - setThinkingLevel(l: ThinkingLevel): void - setTools(t: AgentTool[]): void - replaceMessages(ms: AppMessage[]): void - appendMessage(m: AppMessage): void - async queueMessage(m: AppMessage): Promise - clearMessages(): void -} -``` - -**Key Features (will be basis for new `@mariozechner/agent` package):** -- ✅ **Transport abstraction** (ProviderTransport for direct API, AppTransport for server-side proxy) -- ✅ **Attachment type definition** (id, type, fileName, mimeType, size, content, extractedText, preview) -- ✅ **Message transformation** pipeline (app messages → LLM messages, with filtering) -- ✅ **Reactive state** (subscribe/emit pattern for UI updates) -- ✅ **Message queueing** for injecting messages out-of-band during agent loop -- ✅ **Abort support** (AbortController per prompt) -- ✅ **State management** (systemPrompt, model, thinkingLevel, tools, messages, isStreaming, etc.) - -**Strategy:** -1. Use this implementation as the **reference design** for `@mariozechner/agent` -2. Create new `@mariozechner/agent` package by copying/adapting this code -3. Keep web-ui using its own embedded agent until new package is proven stable -4. Eventually migrate web-ui to use `@mariozechner/agent` (Phase 2 of migration) -5. Document processing (PDF/DOCX/PPTX/Excel) stays in web-ui permanently - -### packages/agent - OLD Implementation - -**Status:** ⚠️ REMOVE COMPLETELY - -**Why it should be removed:** -1. **Tightly coupled to OpenAI SDK** - Not provider-agnostic, hardcoded to OpenAI's API -2. **Outdated architecture** - Superseded by web-ui's better agent design -3. **Mixed concerns** - Agent logic + tool implementations + rendering all in one package -4. **Limited scope** - Cannot be reused across different UI implementations - -**What to salvage before removal:** -1. **SessionManager** - Port to `@mariozechner/coding-agent` (JSONL-based session persistence) -2. **Tool implementations** - Adapt read, bash, edit, write tools for coding-agent -3. **Renderer abstractions** - Port TuiRenderer/ConsoleRenderer/JsonRenderer concepts to coding-agent-tui - -**Action:** Delete this package entirely after extracting useful components to the new packages. - -## Proposed Architecture - -### Package Structure - -``` -pi-mono/ -├── packages/ai/ # [unchanged] Core streaming library -│ -├── packages/agent/ # [NEW] General-purpose agent -│ ├── src/ -│ │ ├── agent.ts # Main Agent class -│ │ ├── types.ts # AgentState, AgentEvent, Attachment, etc. -│ │ ├── transports/ -│ │ │ ├── types.ts # AgentTransport interface -│ │ │ ├── ProviderTransport.ts # Direct API calls -│ │ │ ├── AppTransport.ts # Server-side proxy -│ │ │ ├── proxy-types.ts # Proxy event types -│ │ │ └── index.ts # Transport exports -│ │ └── index.ts # Public API -│ └── package.json -│ -├── packages/coding-agent/ # [NEW] Coding-specific agent + CLI -│ ├── src/ -│ │ ├── coding-agent.ts # CodingAgent wrapper (uses @mariozechner/agent) -│ │ ├── session-manager.ts # Session persistence (JSONL) -│ │ ├── tools/ -│ │ │ ├── read-tool.ts # Read files (with pagination) -│ │ │ ├── bash-tool.ts # Shell execution -│ │ │ ├── edit-tool.ts # File editing (old_string → new_string) -│ │ │ ├── write-tool.ts # File creation/replacement -│ │ │ └── index.ts # Tool exports -│ │ ├── cli/ -│ │ │ ├── index.ts # CLI entry point -│ │ │ ├── renderers/ -│ │ │ │ ├── tui-renderer.ts # Rich terminal UI -│ │ │ │ ├── console-renderer.ts # Simple console output -│ │ │ │ └── json-renderer.ts # JSONL output for piping -│ │ │ └── main.ts # CLI app logic -│ │ ├── types.ts # Public types -│ │ └── index.ts # Public API (agent + tools) -│ └── package.json # Exports both library + CLI binary -│ -├── packages/web-ui/ # [updated] Uses @mariozechner/agent -│ ├── src/ -│ │ ├── utils/ -│ │ │ └── attachment-utils.ts # Document processing (keep here) -│ │ └── ... # Other web UI code -│ └── package.json # Now depends on @mariozechner/agent -│ -└── packages/tui/ # [unchanged] Low-level terminal primitives -``` - -### Dependency Graph - -``` - ┌─────────────────────┐ - │ @mariozechner/ │ - │ pi-ai │ ← Core streaming, tool interface - └──────────┬──────────┘ - │ depends on - ↓ - ┌─────────────────────┐ - │ @mariozechner/ │ - │ agent │ ← General agent (transports, state, attachments) - └──────────┬──────────┘ - │ depends on - ↓ - ┌───────────────┴───────────────┐ - ↓ ↓ - ┌─────────────────────┐ ┌─────────────────────┐ - │ @mariozechner/ │ │ @mariozechner/ │ - │ coding-agent │ │ pi-web-ui │ - │ (lib + CLI + tools) │ │ (+ doc processing) │ - └─────────────────────┘ └─────────────────────┘ -``` - -## Package: @mariozechner/agent - -### Core Types - -```typescript -export interface Attachment { - id: string; - type: "image" | "document"; - fileName: string; - mimeType: string; - size: number; - content: string; // base64 encoded (without data URL prefix) - extractedText?: string; // For documents - preview?: string; // base64 image preview -} - -export type ThinkingLevel = "off" | "minimal" | "low" | "medium" | "high"; - -// AppMessage abstraction - extends base LLM messages with app-specific features -export type UserMessageWithAttachments = UserMessage & { attachments?: Attachment[] }; - -// Extensible interface for custom app messages (via declaration merging) -// Apps can add their own message types: -// declare module "@mariozechner/agent" { -// interface CustomMessages { -// artifact: ArtifactMessage; -// notification: NotificationMessage; -// } -// } -export interface CustomMessages { - // Empty by default - apps extend via declaration merging -} - -// AppMessage: Union of LLM messages + attachments + custom messages -export type AppMessage = - | AssistantMessage - | UserMessageWithAttachments - | ToolResultMessage - | CustomMessages[keyof CustomMessages]; - -export interface AgentState { - systemPrompt: string; - model: Model; - thinkingLevel: ThinkingLevel; - tools: AgentTool[]; - messages: AppMessage[]; // Can include attachments + custom message types - isStreaming: boolean; - streamMessage: Message | null; - pendingToolCalls: Set; - error?: string; -} - -export type AgentEvent = - | { type: "state-update"; state: AgentState } - | { type: "started" } - | { type: "completed" }; -``` - -### AppMessage Abstraction - -The `AppMessage` type is a key abstraction that extends base LLM messages with app-specific features while maintaining type safety and extensibility. - -**Key Benefits:** -1. **Extends base messages** - Adds `attachments` field to `UserMessage` for file uploads -2. **Type-safe extensibility** - Apps can add custom message types via declaration merging -3. **Backward compatible** - Works seamlessly with base LLM messages from `@mariozechner/pi-ai` -4. **Message transformation** - Filters app-specific fields before sending to LLM - -**Usage Example (Web UI):** -```typescript -import type { AppMessage } from "@mariozechner/agent"; - -// Extend with custom message type for artifacts -declare module "@mariozechner/agent" { - interface CustomMessages { - artifact: ArtifactMessage; - } -} - -interface ArtifactMessage { - role: "artifact"; - action: "create" | "update" | "delete"; - filename: string; - content?: string; - title?: string; - timestamp: string; -} - -// Now AppMessage includes: AssistantMessage | UserMessageWithAttachments | ToolResultMessage | ArtifactMessage -const messages: AppMessage[] = [ - { role: "user", content: "Hello", attachments: [attachment] }, - { role: "assistant", content: [{ type: "text", text: "Hi!" }], /* ... */ }, - { role: "artifact", action: "create", filename: "test.ts", content: "...", timestamp: "..." } -]; -``` - -**Usage Example (Coding Agent):** -```typescript -import type { AppMessage } from "@mariozechner/agent"; - -// Coding agent can extend with session metadata -declare module "@mariozechner/agent" { - interface CustomMessages { - session_metadata: SessionMetadataMessage; - } -} - -interface SessionMetadataMessage { - role: "session_metadata"; - sessionId: string; - timestamp: string; - workingDirectory: string; -} -``` - -**Message Transformation:** - -The `messageTransformer` function converts app messages to LLM-compatible messages, including handling attachments: - -```typescript -function defaultMessageTransformer(messages: AppMessage[]): Message[] { - return messages - .filter((m) => { - // Only keep standard LLM message roles - return m.role === "user" || m.role === "assistant" || m.role === "toolResult"; - }) - .map((m) => { - if (m.role === "user") { - const { attachments, ...baseMessage } = m as any; - - // If no attachments, return as-is - if (!attachments || attachments.length === 0) { - return baseMessage as Message; - } - - // Convert attachments to content blocks - const content = Array.isArray(baseMessage.content) - ? [...baseMessage.content] - : [{ type: "text", text: baseMessage.content }]; - - for (const attachment of attachments) { - // Add image blocks for image attachments - if (attachment.type === "image") { - content.push({ - type: "image", - data: attachment.content, - mimeType: attachment.mimeType - }); - } - // Add text blocks for documents with extracted text - else if (attachment.type === "document" && attachment.extractedText) { - content.push({ - type: "text", - text: attachment.extractedText - }); - } - } - - return { ...baseMessage, content } as Message; - } - return m as Message; - }); -} -``` - -This ensures that: -- Custom message types (like `artifact`, `session_metadata`) are filtered out -- Image attachments are converted to `ImageContent` blocks -- Document attachments with extracted text are converted to `TextContent` blocks -- The `attachments` field itself is stripped (replaced by proper content blocks) -- LLM receives only standard `Message` types from `@mariozechner/pi-ai` - -### Agent Class - -```typescript -export interface AgentOptions { - initialState?: Partial; - transport: AgentTransport; - // Transform app messages to LLM-compatible messages before sending - messageTransformer?: (messages: AppMessage[]) => Message[] | Promise; -} - -export class Agent { - constructor(opts: AgentOptions); - - get state(): AgentState; - subscribe(fn: (e: AgentEvent) => void): () => void; - - // State mutators - setSystemPrompt(v: string): void; - setModel(m: Model): void; - setThinkingLevel(l: ThinkingLevel): void; - setTools(t: AgentTool[]): void; - replaceMessages(ms: AppMessage[]): void; - appendMessage(m: AppMessage): void; - async queueMessage(m: AppMessage): Promise; - clearMessages(): void; - - // Main prompt method - async prompt(input: string, attachments?: Attachment[]): Promise; - - // Abort current operation - abort(): void; -} -``` - -**Key Features:** -1. **Reactive state** - Subscribe to state updates for UI binding -2. **Transport abstraction** - Pluggable backends (direct API, proxy server, etc.) -3. **Message transformation** - Convert app-specific messages to LLM format -4. **Message queueing** - Inject messages during agent loop (for tool results, errors) -5. **Attachment support** - Type-safe attachment handling (processing is external) -6. **Abort support** - Cancel in-progress operations - -### Transport Interface - -```typescript -export interface AgentRunConfig { - systemPrompt: string; - tools: AgentTool[]; - model: Model; - reasoning?: "low" | "medium" | "high"; - getQueuedMessages?: () => Promise[]>; -} - -export interface AgentTransport { - run( - messages: Message[], - userMessage: Message, - config: AgentRunConfig, - signal?: AbortSignal, - ): AsyncIterable; -} -``` - -### ProviderTransport - -```typescript -export class ProviderTransport implements AgentTransport { - async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) { - // Calls LLM providers directly using agentLoop from @mariozechner/pi-ai - // Optionally routes through CORS proxy if configured - } -} -``` - -### AppTransport - -```typescript -export class AppTransport implements AgentTransport { - constructor(proxyUrl: string); - - async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) { - // Routes requests through app server with user authentication - // Server manages API keys and usage tracking - } -} -``` - -## Package: @mariozechner/coding-agent - -### CodingAgent Class - -```typescript -export interface CodingAgentOptions { - systemPrompt: string; - model: Model; - reasoning?: "low" | "medium" | "high"; - apiKey: string; - workingDirectory?: string; - sessionManager?: SessionManager; -} - -export class CodingAgent { - constructor(options: CodingAgentOptions); - - // Access underlying agent - get agent(): Agent; - - // State accessors - get state(): AgentState; - subscribe(fn: (e: AgentEvent) => void): () => void; - - // Send a message to the agent - async prompt(message: string, attachments?: Attachment[]): Promise; - - // Abort current operation - abort(): void; - - // Message management for session restoration - replaceMessages(messages: AppMessage[]): void; - getMessages(): AppMessage[]; -} -``` - -**Key design decisions:** -1. **Wraps @mariozechner/agent** - Builds on the general agent package -2. **Pre-configured tools** - Includes read, bash, edit, write tools -3. **Session management** - Optional JSONL-based session persistence -4. **Working directory context** - All file operations relative to this directory -5. **Simple API** - Hides transport complexity, uses ProviderTransport by default - -### Usage Example (TUI) - -```typescript -import { CodingAgent } from "@mariozechner/coding-agent"; -import { SessionManager } from "@mariozechner/coding-agent"; -import { getModel } from "@mariozechner/pi-ai"; - -const session = new SessionManager({ continue: true }); -const agent = new CodingAgent({ - systemPrompt: "You are a coding assistant...", - model: getModel("openai", "gpt-4"), - apiKey: process.env.OPENAI_API_KEY!, - workingDirectory: process.cwd(), - sessionManager: session, -}); - -// Restore previous session -if (session.hasData()) { - agent.replaceMessages(session.getMessages()); -} - -// Subscribe to state changes -agent.subscribe((event) => { - if (event.type === "state-update") { - renderer.render(event.state); - } else if (event.type === "completed") { - session.save(agent.getMessages()); - } -}); - -// Send prompt -await agent.prompt("Fix the bug in server.ts"); -``` - -### Usage Example (Web UI) - -```typescript -import { Agent, ProviderTransport, Attachment } from "@mariozechner/agent"; -import { getModel } from "@mariozechner/pi-ai"; -import { loadAttachment } from "./utils/attachment-utils"; // Web UI keeps this - -const agent = new Agent({ - transport: new ProviderTransport(), - initialState: { - systemPrompt: "You are a helpful assistant...", - model: getModel("google", "gemini-2.5-flash"), - thinkingLevel: "low", - tools: [], - }, -}); - -// Subscribe to state changes for UI updates -agent.subscribe((event) => { - if (event.type === "state-update") { - updateUI(event.state); - } -}); - -// Handle file upload and send prompt -const file = await fileInput.files[0]; -const attachment = await loadAttachment(file); // Processes PDF/DOCX/etc -await agent.prompt("Analyze this document", [attachment]); -``` - -### Session Manager - -```typescript -export interface SessionManagerOptions { - continue?: boolean; // Resume most recent session - directory?: string; // Custom session directory -} - -export interface SessionMetadata { - id: string; - timestamp: string; - cwd: string; - config: CodingAgentConfig; -} - -export interface SessionData { - metadata: SessionMetadata; - messages: AppMessage[]; // Conversation history -} - -export class SessionManager { - constructor(options?: SessionManagerOptions); - - // Start a new session (writes metadata) - startSession(config: CodingAgentConfig): void; - - // Append a message to the session (appends to JSONL) - appendMessage(message: AppMessage): void; - - // Check if session has existing data - hasData(): boolean; - - // Get full session data - getData(): SessionData | null; - - // Get just the messages for agent restoration - getMessages(): AppMessage[]; - - // Get session file path - getFilePath(): string; - - // Get session ID - getId(): string; -} -``` - -**Session Storage Format (JSONL):** -```jsonl -{"type":"metadata","id":"uuid","timestamp":"2025-10-12T10:00:00Z","cwd":"/path","config":{...}} -{"type":"message","message":{"role":"user","content":"Fix the bug in server.ts"}} -{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"I'll help..."}],...}} -{"type":"message","message":{"role":"toolResult","toolCallId":"call_123","output":"..."}} -{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"Fixed!"}],...}} -``` - -**How it works:** -- First line is session metadata (id, timestamp, working directory, config) -- Each subsequent line is an `AppMessage` from `agent.state.messages` -- Messages are appended as they're added to the agent state (append-only) -- On session restore, read all message lines to reconstruct conversation history - -**Session File Naming:** -``` -~/.pi/sessions/--path-to-project--/ - 2025-10-12T10-00-00-000Z_uuid.jsonl - 2025-10-12T11-30-00-000Z_uuid.jsonl -``` - -### Tool: BashTool - -```typescript -export interface BashToolDetails { - command: string; - exitCode: number; - duration: number; // milliseconds -} - -export const bashToolSchema = Type.Object({ - command: Type.String({ description: "Shell command to execute" }), -}); - -export class BashTool implements AgentTool { - name = "bash"; - label = "Execute Shell Command"; - description = "Execute a bash command in the working directory"; - parameters = bashToolSchema; - - constructor(private workingDirectory: string); - - async execute( - toolCallId: string, - params: { command: string }, - signal?: AbortSignal - ): Promise> { - // Spawn child process with signal support - // Capture stdout/stderr - // Handle 1MB output limit with truncation - // Return structured result - } -} -``` - -**Key Features:** -- Abort support via signal → child process kill -- 1MB output limit (prevents memory exhaustion) -- Exit code tracking -- Working directory context - -**Output Format:** -```typescript -{ - output: "stdout:\n\nstderr:\n\nexit code: 0", - details: { - command: "npm test", - exitCode: 0, - duration: 1234 - } -} -``` - -### Tool: ReadTool - -```typescript -export interface ReadToolDetails { - filePath: string; - totalLines: number; - linesRead: number; - offset: number; - truncated: boolean; -} - -export const readToolSchema = Type.Object({ - file_path: Type.String({ description: "Path to file to read (relative or absolute)" }), - offset: Type.Optional(Type.Number({ - description: "Line number to start reading from (1-indexed). Omit to read from beginning.", - minimum: 1 - })), - limit: Type.Optional(Type.Number({ - description: "Maximum number of lines to read. Omit to read entire file (max 5000 lines).", - minimum: 1, - maximum: 5000 - })), -}); - -export class ReadTool implements AgentTool { - name = "read"; - label = "Read File"; - description = "Read file contents. For files >5000 lines, use offset and limit to read in chunks."; - parameters = readToolSchema; - - constructor(private workingDirectory: string); - - async execute( - toolCallId: string, - params: { file_path: string; offset?: number; limit?: number }, - signal?: AbortSignal - ): Promise> { - // Resolve file path (relative to workingDirectory) - // Count total lines in file - // If no offset/limit: read up to 5000 lines, warn if truncated - // If offset/limit: read specified range - // Format with line numbers (using cat -n style) - // Return content + metadata - } -} -``` - -**Key Features:** -- **Full file read**: Up to 5000 lines (warns LLM if truncated) -- **Ranged read**: Specify offset + limit for large files -- **Line numbers**: Output formatted like `cat -n` (1-indexed) -- **Abort support**: Can cancel during large file reads -- **Metadata**: Total line count, lines read, truncation status - -**Output Format (full file):** -```typescript -{ - output: ` 1 import { foo } from './foo'; - 2 import { bar } from './bar'; - 3 - 4 export function main() { - 5 console.log('hello'); - 6 }`, - details: { - filePath: "src/main.ts", - totalLines: 6, - linesRead: 6, - offset: 0, - truncated: false - } -} -``` - -**Output Format (large file, truncated):** -```typescript -{ - output: `WARNING: File has 10000 lines, showing first 5000. Use offset and limit parameters to read more. - - 1 import { foo } from './foo'; - 2 import { bar } from './bar'; - ... - 5000 const x = 42;`, - details: { - filePath: "src/large.ts", - totalLines: 10000, - linesRead: 5000, - offset: 0, - truncated: true - } -} -``` - -**Output Format (ranged read):** -```typescript -{ - output: ` 1000 function middleware() { - 1001 return (req, res, next) => { - 1002 console.log('middleware'); - 1003 next(); - 1004 }; - 1005 }`, - details: { - filePath: "src/server.ts", - totalLines: 10000, - linesRead: 6, - offset: 1000, - truncated: false - } -} -``` - -**Error Cases:** -- File not found → error -- Offset > total lines → error -- Binary file detected → error (suggest using bash tool) - -**Usage Examples in System Prompt:** -``` -To read a large file: -1. read(file_path="src/large.ts") // Gets first 5000 lines + total count -2. If truncated, read remaining chunks: - read(file_path="src/large.ts", offset=5001, limit=5000) - read(file_path="src/large.ts", offset=10001, limit=5000) -``` - -### Tool: EditTool - -```typescript -export interface EditToolDetails { - filePath: string; - oldString: string; - newString: string; - matchCount: number; - linesChanged: number; -} - -export const editToolSchema = Type.Object({ - file_path: Type.String({ description: "Path to file to edit (relative or absolute)" }), - old_string: Type.String({ description: "Exact string to find and replace" }), - new_string: Type.String({ description: "String to replace with" }), -}); - -export class EditTool implements AgentTool { - name = "edit"; - label = "Edit File"; - description = "Find and replace exact string in a file"; - parameters = editToolSchema; - - constructor(private workingDirectory: string); - - async execute( - toolCallId: string, - params: { file_path: string; old_string: string; new_string: string }, - signal?: AbortSignal - ): Promise> { - // Resolve file path (relative to workingDirectory) - // Read file contents - // Find old_string (must be exact match) - // Replace with new_string - // Write file back - // Return stats - } -} -``` - -**Key Features:** -- Exact string matching (no regex) -- Safe atomic writes (write temp → rename) -- Abort support (cancel before write) -- Match validation (error if old_string not found) -- Line-based change tracking - -**Output Format:** -```typescript -{ - output: "Replaced 1 occurrence in src/server.ts (3 lines changed)", - details: { - filePath: "src/server.ts", - oldString: "const port = 3000;", - newString: "const port = process.env.PORT || 3000;", - matchCount: 1, - linesChanged: 3 - } -} -``` - -**Error Cases:** -- File not found → error -- old_string not found → error -- Multiple matches for old_string → error (ambiguous) -- File changed during operation → error (race condition) - -### Tool: WriteTool - -```typescript -export interface WriteToolDetails { - filePath: string; - size: number; - isNew: boolean; -} - -export const writeToolSchema = Type.Object({ - file_path: Type.String({ description: "Path to file to create/overwrite" }), - content: Type.String({ description: "Full file contents to write" }), -}); - -export class WriteTool implements AgentTool { - name = "write"; - label = "Write File"; - description = "Create a new file or completely replace existing file contents"; - parameters = writeToolSchema; - - constructor(private workingDirectory: string); - - async execute( - toolCallId: string, - params: { file_path: string; content: string }, - signal?: AbortSignal - ): Promise> { - // Resolve file path - // Check if file exists (track isNew) - // Create parent directories if needed - // Write content atomically - // Return stats - } -} -``` - -**Key Features:** -- Creates parent directories automatically -- Safe atomic writes -- Abort support -- No size limits (trust LLM context limits) - -**Output Format:** -```typescript -{ - output: "Created new file src/utils/helper.ts (142 bytes)", - details: { - filePath: "src/utils/helper.ts", - size: 142, - isNew: true - } -} -``` - -## CLI Interface (included in @mariozechner/coding-agent) - -The coding-agent package includes both a library and a CLI interface in one package. - -### CLI Usage - -```bash -# Interactive mode (default) -coding-agent - -# Continue previous session -coding-agent --continue - -# Single-shot mode -coding-agent "Fix the TypeScript errors" - -# Multiple prompts -coding-agent "Add validation" "Write tests" - -# Custom model -coding-agent --model openai/gpt-4 --api-key $KEY - -# JSON output (for piping) -coding-agent --json < prompts.jsonl > results.jsonl -``` - -### CLI Arguments - -```typescript -{ - "base-url": string; // API endpoint - "api-key": string; // API key (or env var) - "model": string; // Model identifier - "system-prompt": string; // System prompt - "continue": boolean; // Resume session - "json": boolean; // JSONL I/O mode - "help": boolean; // Show help -} -``` - -### Renderers - -**TuiRenderer** - Rich terminal UI -- Real-time streaming output -- Syntax highlighting for code -- Tool execution indicators -- Progress spinners -- Token usage stats -- Keyboard shortcuts (Ctrl+C to abort) - -**ConsoleRenderer** - Simple console output -- Plain text output -- No ANSI codes -- Good for logging/CI - -**JsonRenderer** - JSONL output -- One JSON object per line -- Each line is a complete event -- For piping/processing - -### JSON Mode Example - -Input (stdin): -```jsonl -{"type":"message","content":"List all TypeScript files"} -{"type":"interrupt"} -{"type":"message","content":"Count the files"} -``` - -Output (stdout): -```jsonl -{"type":"turn_start","timestamp":"..."} -{"type":"message_start","message":{...}} -{"type":"tool_execution_start","toolCallId":"...","toolName":"bash","args":"{...}"} -{"type":"tool_execution_end","toolCallId":"...","result":"..."} -{"type":"message_end","message":{...}} -{"type":"turn_end"} -{"type":"interrupted"} -{"type":"message_start","message":{...}} -... -``` - -## Integration Patterns - -### VS Code Extension - -```typescript -import { CodingAgent, SessionManager } from "@mariozechner/coding-agent"; -import * as vscode from "vscode"; - -class CodingAgentProvider { - private agent: CodingAgent; - private outputChannel: vscode.OutputChannel; - - constructor() { - const workspaceRoot = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath || process.cwd(); - const session = new SessionManager({ - directory: path.join(workspaceRoot, ".vscode", "agent-sessions") - }); - - this.agent = new CodingAgent({ - config: { - systemPrompt: "You are a coding assistant...", - model: getModel("openai", "gpt-4"), - apiKey: vscode.workspace.getConfiguration("codingAgent").get("apiKey")!, - }, - sessionManager: session, - workingDirectory: workspaceRoot, - }); - - this.outputChannel = vscode.window.createOutputChannel("Coding Agent"); - } - - async executePrompt(prompt: string) { - const cancellation = new vscode.CancellationTokenSource(); - - // Convert VS Code cancellation to AbortSignal - const controller = new AbortController(); - cancellation.token.onCancellationRequested(() => controller.abort()); - - for await (const event of this.agent.prompt(prompt, controller.signal)) { - switch (event.type) { - case "message_update": - this.outputChannel.appendLine(event.message.content[0].text); - break; - case "tool_execution_start": - vscode.window.showInformationMessage(`Running: ${event.toolName}`); - break; - case "tool_execution_end": - if (event.isError) { - vscode.window.showErrorMessage(`Tool failed: ${event.result}`); - } - break; - } - } - } -} -``` - -### Headless Server/API - -```typescript -import { CodingAgent } from "@mariozechner/coding-agent"; -import express from "express"; - -const app = express(); - -app.post("/api/prompt", async (req, res) => { - const { prompt, sessionId } = req.body; - - const agent = new CodingAgent({ - config: { - systemPrompt: "...", - model: getModel("openai", "gpt-4"), - apiKey: process.env.OPENAI_API_KEY!, - }, - workingDirectory: `/tmp/workspaces/${sessionId}`, - }); - - // Stream SSE - res.setHeader("Content-Type", "text/event-stream"); - - const controller = new AbortController(); - req.on("close", () => controller.abort()); - - for await (const event of agent.prompt(prompt, controller.signal)) { - res.write(`data: ${JSON.stringify(event)}\n\n`); - } - - res.end(); -}); - -app.listen(3000); -``` - -## Migration Plan - -### Phase 1: Create General Agent Package -1. Create `packages/agent/` structure -2. **COPY** Agent class from web-ui/src/agent/agent.ts (don't extract yet) -3. Copy types (AgentState, AgentEvent, Attachment, DebugLogEntry, ThinkingLevel) -4. Copy transports (types.ts, ProviderTransport.ts, AppTransport.ts, proxy-types.ts) -5. Adapt code to work as standalone package -6. Write unit tests for Agent class -7. Write tests for both transports -8. Publish `@mariozechner/agent@0.1.0` -9. **Keep web-ui unchanged** - it continues using its embedded agent - -### Phase 2: Create Coding Agent Package (with CLI) -1. Create `packages/coding-agent/` structure -2. Port SessionManager from old agent package -3. Implement ReadTool, BashTool, EditTool, WriteTool -4. Implement CodingAgent class (wraps @mariozechner/agent) -5. Implement CLI in `src/cli/` directory: - - CLI entry point (index.ts) - - TuiRenderer, ConsoleRenderer, JsonRenderer - - Argument parsing - - Interactive and single-shot modes -6. Write tests for tools and agent -7. Write integration tests for CLI -8. Publish `@mariozechner/coding-agent@0.1.0` (includes library + CLI binary) - -### Phase 3: Prove Out New Packages -1. Use coding-agent (library + CLI) extensively -2. Fix bugs and iterate on API design -3. Gather feedback from real usage -4. Ensure stability and performance - -### Phase 4: Migrate Web UI (OPTIONAL, later) -1. Once new `@mariozechner/agent` is proven stable -2. Update web-ui package.json to depend on `@mariozechner/agent` -3. Remove src/agent/agent.ts, src/agent/types.ts, src/agent/transports/ -4. Keep src/utils/attachment-utils.ts (document processing) -5. Update imports to use `@mariozechner/agent` -6. Test that web UI still works correctly -7. Verify document attachments (PDF, DOCX, etc.) still work - -### Phase 5: Cleanup -1. Deprecate/remove old `packages/agent/` package -2. Update all documentation -3. Create migration guide -4. Add examples for all use cases - -### Phase 6: Future Enhancements -1. Build VS Code extension using `@mariozechner/coding-agent` -2. Add more tools (grep, find, glob, etc.) as optional plugins -3. Plugin system for custom tools -4. Parallel tool execution -5. Streaming tool output for long-running commands - -## Open Questions & Decisions - -### 1. Should EditTool support multiple replacements? - -**Option A:** Error on multiple matches (current proposal) -- Forces explicit, unambiguous edits -- LLM must be precise with context -- Safer (no accidental mass replacements) - -**Option B:** Replace all matches -- More convenient for bulk changes -- Risk of unintended replacements -- Need `replace_all: boolean` flag - -**Decision:** Start with Option A, add replace_all flag if needed. - -### 2. ReadTool line limit and pagination strategy? - -**Decision:** 5000 line default limit with offset/limit pagination - -**Rationale:** -- **5000 lines** balances context vs token usage (typical file fits in one read) -- **Line-based pagination** is intuitive for LLM (matches how humans think about code) -- **cat -n format** with line numbers helps LLM reference specific lines in edits -- **Automatic truncation warning** teaches LLM to paginate when needed - -**Alternative considered:** Byte-based limits (rejected - harder for LLM to reason about) - -**System prompt guidance:** -``` -When reading large files: -1. First read without offset/limit to get total line count -2. If truncated, calculate chunks: ceil(totalLines / 5000) -3. Read each chunk with appropriate offset -``` - -### 3. Should ReadTool handle binary files? - -**Decision:** Error on binary files with helpful message - -**Error message:** -``` -Error: Cannot read binary file 'dist/app.js'. Use bash tool if you need to inspect: bash(command="file dist/app.js") or bash(command="xxd dist/app.js | head") -``` - -**Rationale:** -- Binary files are rarely useful to LLM -- Clear error message teaches LLM to use appropriate tools -- Prevents token waste on unreadable content - -**Binary detection:** Check for null bytes in first 8KB (same strategy as `git diff`) - -### 4. Should EditTool support regex? - -**Current proposal:** No regex, exact string match only - -**Pros of exact match:** -- Simple implementation -- No regex escaping issues -- Clear error messages -- Safer (no accidental broad matches) - -**Cons:** -- Less powerful -- Multiple edits needed for patterns - -**Decision:** Exact match only. LLM can use bash/sed for complex patterns. - -### 5. Working directory enforcement? - -**Question:** Should tools be sandboxed to workingDirectory? - -**Option A:** Enforce sandbox (only access files under workingDirectory) -- Safer -- Prevents accidental system file edits -- Clear boundaries - -**Option B:** Allow any path -- More flexible -- LLM can edit config files, etc. -- User's responsibility to review - -**Decision:** Start with Option B (no sandbox). Add `--sandbox` flag later if needed. - -### 6. Tool output size limits? - -**Current proposal:** -- ReadTool: 5000 line limit per read (paginate for more) -- BashTool: 1MB truncation -- EditTool: No limit (reasonable file sizes expected) -- WriteTool: No limit (LLM context limited) - -**Alternative:** Enforce global 1MB limit on all tool outputs - -**Decision:** Per-tool limits. ReadTool and BashTool need it most. - -### 7. How to handle long-running bash commands? - -**Question:** Should BashTool stream output or wait for completion? - -**Option A:** Wait for completion (current proposal) -- Simpler implementation -- Full output available for LLM -- Blocks until done - -**Option B:** Stream output -- Better UX (show progress) -- More complex (need to handle partial output) -- LLM sees final output only - -**Decision:** Wait for completion initially. Add streaming later if needed. - -### 8. Package naming alternatives? - -**Current proposal:** -- `@mariozechner/coding-agent` (core) -- `@mariozechner/coding-agent-tui` (TUI) - -**Alternatives:** -- `@mariozechner/file-agent` / `@mariozechner/file-agent-tui` -- `@mariozechner/dev-agent` / `@mariozechner/dev-agent-tui` -- `@mariozechner/pi-code` / `@mariozechner/pi-code-tui` - -**Decision:** `coding-agent` is clear and specific to the use case. - -## Summary - -This architecture provides: - -### General Agent Package (`@mariozechner/agent`) -✅ **Transport abstraction** - Pluggable backends (ProviderTransport, AppTransport) -✅ **Reactive state** - Subscribe/emit pattern for UI binding -✅ **Message transformation** - Flexible pipeline for message filtering/adaptation -✅ **Message queueing** - Out-of-band message injection during agent loop -✅ **Attachment support** - Type-safe attachment handling (processing is external) -✅ **Abort support** - First-class cancellation with AbortController -✅ **Provider agnostic** - Works with any LLM provider via @mariozechner/pi-ai -✅ **Type-safe** - Full TypeScript with proper types - -### Coding Agent Package (`@mariozechner/coding-agent`) -✅ **Builds on general agent** - Leverages transport abstraction and state management -✅ **Session persistence** - JSONL-based session storage and resume -✅ **Focused tools** - read, bash, edit, write (4 tools, no more) -✅ **Smart pagination** - 5000-line chunks with offset/limit for ReadTool -✅ **Working directory context** - All tools operate relative to project root -✅ **Simple API** - Hides complexity, easy to use -✅ **Testable** - Pure functions, mockable dependencies - -### Key Architectural Insights -1. **Extract, don't rewrite** - The web-ui agent is well-designed; extract it into a general package -2. **Separation of concerns** - Document processing (PDF/DOCX/etc.) stays in web-ui, only type definitions move to general agent -3. **Layered architecture** - pi-ai → agent → coding-agent → coding-agent-tui -4. **Reusable across UIs** - Web UI and coding agent both use the same general agent package -5. **Pluggable transports** - Easy to add new backends (local API, proxy server, etc.) -6. **Attachment flexibility** - Type is defined centrally, processing is done by consumers