# Coding Agent Architecture

## Executive Summary

This document proposes extracting the agent infrastructure from `@mariozechner/pi-web-ui` and `@mariozechner/pi-agent` into a new headless coding agent package that can be reused across multiple UI implementations (TUI, VS Code extension, web interface).

The new architecture will provide:

- **Headless agent core** with file manipulation tools (read, bash, edit, write)
- **Session management** for conversation persistence and resume capability
- **Full abort support** throughout the execution pipeline
- **Event-driven API** for flexible UI integration
- **Clean separation** between agent logic and presentation layer

## Current Architecture Analysis

### Package Overview

```
pi-mono/
├── packages/ai/       # Core AI streaming (GOOD - keep as-is)
├── packages/web-ui/   # Web UI with agent (GOOD - keep separate)
├── packages/agent/    # OLD - needs to be replaced
├── packages/tui/      # Terminal UI lib (GOOD - low-level primitives)
├── packages/proxy/    # CORS proxy (unrelated)
└── packages/pods/     # GPU deployment tool (unrelated)
```

### packages/ai - Core Streaming Library

**Status:** ✅ Solid foundation, keep as-is

**Architecture:**

```typescript
agentLoop(
  prompt: UserMessage,
  context: AgentContext,
  config: AgentLoopConfig,
  signal?: AbortSignal
): EventStream
```

**Key Features:**

- Event-driven streaming (turn_start, message_*, tool_execution_*, turn_end, agent_end)
- Tool execution with validation
- Signal-based cancellation
- Message queue for injecting out-of-band messages
- Preprocessor support for message transformation

**Events:**

```typescript
type AgentEvent =
  | { type: "agent_start" }
  | { type: "turn_start" }
  | { type: "message_start"; message: Message }
  | { type: "message_update"; assistantMessageEvent: AssistantMessageEvent; message: AssistantMessage }
  | { type: "message_end"; message: Message }
  | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: any }
  | { type: "tool_execution_end"; toolCallId: string; toolName: string; result: AgentToolResult | string; isError: boolean }
  | { type: "turn_end"; message: AssistantMessage; toolResults: ToolResultMessage[] }
  | { type: "agent_end"; messages: Message[] }
```

**Tool Interface:**

```typescript
interface AgentTool<TParams extends TSchema = TSchema, TDetails = unknown> extends Tool {
  label: string; // Human-readable name for UI
  execute: (
    toolCallId: string,
    params: Static<TParams>,
    signal?: AbortSignal
  ) => Promise<AgentToolResult<TDetails>>;
}

interface AgentToolResult<T = unknown> {
  output: string; // Text sent to LLM
  details: T; // Structured data for UI rendering
}
```

### packages/web-ui/agent - Web Agent

**Status:** ✅ Good for web use cases, keep separate

**Architecture:**

```typescript
class Agent {
  constructor(opts: {
    initialState?: Partial<AgentState>;
    debugListener?: (entry: DebugLogEntry) => void;
    transport: AgentTransport;
    messageTransformer?: (messages: AppMessage[]) => Message[];
  })

  async prompt(input: string, attachments?: Attachment[]): Promise<void>
  abort(): void
  subscribe(fn: (e: AgentEvent) => void): () => void
}
```

**Key Features:**

- **Transport abstraction** (ProviderTransport for direct API, AppTransport for server-side)
- **Attachment handling** (images, documents with text extraction)
- **Message transformation** (app messages → LLM messages)
- **Reactive state** (subscribe pattern for UI updates)
- **Message queue** for injecting tool results/errors asynchronously

**Why it differs from the coding agent:**

- Browser-specific concerns (CORS, attachments)
- Transport layer for flexible API routing
- Tied to web UI state management
- Supports rich media attachments

### packages/agent - OLD Implementation

**Status:** ⚠️ MUST BE REPLACED

**Architecture:**

```typescript
class Agent {
  constructor(
    config: AgentConfig,
    renderer?: AgentEventReceiver,
    sessionManager?: SessionManager
  )

  async ask(userMessage: string): Promise<void>
  interrupt(): void
  setEvents(events: AgentEvent[]): void
}
```

**Problems:**

1. **Tightly coupled to the OpenAI SDK** (not provider-agnostic)
2. **Hardcoded tools** (read, list, bash, glob, rg)
3. **Mixed concerns** (agent logic + tool implementations in the same package)
4. **No separation** between core loop and UI rendering
5. **Two API paths** (completions vs responses) with branching logic

**Good parts to preserve:**

1. **SessionManager** - JSONL-based session persistence
2. **Event receiver pattern** - Clean UI integration
3. **Abort support** - Proper signal handling
4. **Renderer abstraction** (ConsoleRenderer, TuiRenderer, JsonRenderer)

**Tools implemented:**

- `read`: Read file contents (1MB limit with truncation)
- `list`: List directory contents
- `bash`: Execute shell commands with abort support
- `glob`: Find files matching a glob pattern
- `rg`: Run ripgrep searches

## Proposed Architecture

### Package Structure

```
pi-mono/
├── packages/ai/                    # [unchanged] Core streaming
├── packages/coding-agent/          # [NEW] Headless coding agent
│   ├── src/
│   │   ├── agent.ts                # Main agent class
│   │   ├── session-manager.ts      # Session persistence
│   │   ├── tools/
│   │   │   ├── read-tool.ts        # Read files (with pagination)
│   │   │   ├── bash-tool.ts        # Shell execution
│   │   │   ├── edit-tool.ts        # File editing (old_string → new_string)
│   │   │   ├── write-tool.ts       # File creation/replacement
│   │   │   └── index.ts            # Tool exports
│   │   └── types.ts                # Public types
│   └── package.json
│
├── packages/coding-agent-tui/      # [NEW] Terminal interface
│   ├── src/
│   │   ├── cli.ts                  # CLI entry point
│   │   ├── renderers/
│   │   │   ├── tui-renderer.ts     # Rich terminal UI
│   │   │   ├── console-renderer.ts # Simple console output
│   │   │   └── json-renderer.ts    # JSONL output for piping
│   │   └── main.ts                 # App logic
│   └── package.json
│
├── packages/web-ui/                # [unchanged] Web UI keeps its own agent
└── packages/tui/                   # [unchanged] Low-level terminal primitives
```

### Dependency Graph

```
┌─────────────────────┐
│ @mariozechner/      │
│ pi-ai               │ ← Core streaming, tool interface
└──────────┬──────────┘
           │ depends on
           ↓
┌─────────────────────┐
│ @mariozechner/      │
│ coding-agent        │ ← Headless agent + file tools
└──────────┬──────────┘
           │ depends on
           ↓
┌──────────┬──────────┐
↓          ↓          ↓
┌────────┐ ┌───────┐ ┌────────┐
│  TUI   │ │ VSCode│ │ Web UI │
│ Client │ │  Ext  │ │ (own)  │
└────────┘ └───────┘ └────────┘
```

## Package: @mariozechner/coding-agent

### Core Agent Class

```typescript
export interface CodingAgentConfig {
  systemPrompt: string;
  model: Model;
  reasoning?: "low" | "medium" | "high";
  apiKey: string;
}

export interface CodingAgentOptions {
  config: CodingAgentConfig;
  sessionManager?: SessionManager;
  workingDirectory?: string;
}

export class CodingAgent {
  constructor(options: CodingAgentOptions);

  // Send a message to the agent
  prompt(message: string, signal?: AbortSignal): AsyncIterable<AgentEvent>;

  // Restore from session events (for --continue mode)
  setMessages(messages: Message[]): void;

  // Get current message history
  getMessages(): Message[];
}
```

**Key design decisions:**

1. **AsyncIterable instead of callbacks** - More flexible for consumers
2. **Signal per prompt** - Each `prompt()` call accepts its own AbortSignal
3. **No internal state management** - Consumers handle UI state
4. **Simple message management** - Get/set for session restoration

### Usage Example (TUI)

```typescript
import { CodingAgent, SessionManager } from "@mariozechner/coding-agent";

const session = new SessionManager({ continue: true });

const agent = new CodingAgent({
  config: {
    systemPrompt: "You are a coding assistant...",
    model: getModel("openai", "gpt-4"),
    apiKey: process.env.OPENAI_API_KEY!,
  },
  sessionManager: session,
  workingDirectory: process.cwd(),
});

// Restore previous session
if (session.hasData()) {
  agent.setMessages(session.getMessages());
}

// Send prompt with abort support
const controller = new AbortController();
for await (const event of agent.prompt("Fix the bug in server.ts", controller.signal)) {
  switch (event.type) {
    case "message_update":
      renderer.updateAssistant(event.message);
      break;
    case "tool_execution_start":
      renderer.showTool(event.toolName, event.args);
      break;
    case "tool_execution_end":
      renderer.showToolResult(event.toolName, event.result);
      break;
  }
}
```

### Session Manager

```typescript
export interface SessionManagerOptions {
  continue?: boolean;  // Resume most recent session
  directory?: string;  // Custom session directory
}

export interface SessionMetadata {
  id: string;
  timestamp: string;
  cwd: string;
  config: CodingAgentConfig;
}

export interface SessionData {
  metadata: SessionMetadata;
  messages: Message[];     // Conversation history
  totalUsage: TokenUsage;  // Aggregated token usage
}

export class SessionManager {
  constructor(options?: SessionManagerOptions);

  // Start a new session (writes metadata)
  startSession(config: CodingAgentConfig): void;

  // Log an event (appends to JSONL)
  appendEvent(event: AgentEvent): void;

  // Check if session has existing data
  hasData(): boolean;

  // Get full session data
  getData(): SessionData | null;

  // Get just the messages for agent restoration
  getMessages(): Message[];

  // Get session file path
  getFilePath(): string;

  // Get session ID
  getId(): string;
}
```

**Session Storage Format (JSONL):**

```jsonl
{"type":"session","id":"uuid","timestamp":"2025-10-12T10:00:00Z","cwd":"/path","config":{...}}
{"type":"event","timestamp":"2025-10-12T10:00:01Z","event":{"type":"turn_start"}}
{"type":"event","timestamp":"2025-10-12T10:00:02Z","event":{"type":"message_start",...}}
{"type":"event","timestamp":"2025-10-12T10:00:03Z","event":{"type":"message_end",...}}
```

**Session File Naming:**

```
~/.pi/sessions/--path-to-project--/
  2025-10-12T10-00-00-000Z_uuid.jsonl
  2025-10-12T11-30-00-000Z_uuid.jsonl
```

### Tool: BashTool

```typescript
export interface BashToolDetails {
  command: string;
  exitCode: number;
  duration: number; // milliseconds
}

export const bashToolSchema = Type.Object({
  command: Type.String({ description: "Shell command to execute" }),
});

export class BashTool implements AgentTool {
  name = "bash";
  label = "Execute Shell Command";
  description = "Execute a bash command in the working directory";
  parameters = bashToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { command: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<BashToolDetails>> {
    // Spawn child process with signal support
    // Capture stdout/stderr
    // Handle 1MB output limit with truncation
    // Return structured result
  }
}
```

**Key Features:**

- Abort support via signal → child process kill
- 1MB output limit (prevents memory exhaustion)
- Exit code tracking
- Working directory context

**Output Format:**

```typescript
{
  output: "stdout:\n\nstderr:\n\nexit code: 0",
  details: {
    command: "npm test",
    exitCode: 0,
    duration: 1234
  }
}
```

### Tool: ReadTool

```typescript
export interface ReadToolDetails {
  filePath: string;
  totalLines: number;
  linesRead: number;
  offset: number;
  truncated: boolean;
}

export const readToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to read (relative or absolute)" }),
  offset: Type.Optional(Type.Number({
    description: "Line number to start reading from (1-indexed). Omit to read from beginning.",
    minimum: 1
  })),
  limit: Type.Optional(Type.Number({
    description: "Maximum number of lines to read. Omit to read entire file (max 5000 lines).",
    minimum: 1,
    maximum: 5000
  })),
});

export class ReadTool implements AgentTool {
  name = "read";
  label = "Read File";
  description = "Read file contents. For files >5000 lines, use offset and limit to read in chunks.";
  parameters = readToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; offset?: number; limit?: number },
    signal?: AbortSignal
  ): Promise<AgentToolResult<ReadToolDetails>> {
    // Resolve file path (relative to workingDirectory)
    // Count total lines in file
    // If no offset/limit: read up to 5000 lines, warn if truncated
    // If offset/limit: read specified range
    // Format with line numbers (cat -n style)
    // Return content + metadata
  }
}
```

**Key Features:**

- **Full file read**: Up to 5000 lines (warns LLM if truncated)
- **Ranged read**: Specify offset + limit for large files
- **Line numbers**: Output formatted like `cat -n` (1-indexed)
- **Abort support**: Can cancel during large file reads
- **Metadata**: Total line count, lines read, truncation status

**Output Format (full file):**

```typescript
{
  output: `     1  import { foo } from './foo';
     2  import { bar } from './bar';
     3
     4  export function main() {
     5    console.log('hello');
     6  }`,
  details: {
    filePath: "src/main.ts",
    totalLines: 6,
    linesRead: 6,
    offset: 0,
    truncated: false
  }
}
```

**Output Format (large file, truncated):**

```typescript
{
  output: `WARNING: File has 10000 lines, showing first 5000. Use offset and limit parameters to read more.
     1  import { foo } from './foo';
     2  import { bar } from './bar';
...
  5000  const x = 42;`,
  details: {
    filePath: "src/large.ts",
    totalLines: 10000,
    linesRead: 5000,
    offset: 0,
    truncated: true
  }
}
```

**Output Format (ranged read):**

```typescript
{
  output: `  1000  function middleware() {
  1001    return (req, res, next) => {
  1002      console.log('middleware');
  1003      next();
  1004    };
  1005  }`,
  details: {
    filePath: "src/server.ts",
    totalLines: 10000,
    linesRead: 6,
    offset: 1000,
    truncated: false
  }
}
```

**Error Cases:**

- File not found → error
- Offset > total lines → error
- Binary file detected → error (suggest using bash tool)

**Usage Examples in System Prompt:**

```
To read a large file:
1. read(file_path="src/large.ts")   // Gets first 5000 lines + total count
2. If truncated, read remaining chunks:
   read(file_path="src/large.ts", offset=5001, limit=5000)
   read(file_path="src/large.ts", offset=10001, limit=5000)
```

### Tool: EditTool

```typescript
export interface EditToolDetails {
  filePath: string;
  oldString: string;
  newString: string;
  matchCount: number;
  linesChanged: number;
}

export const editToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to edit (relative or absolute)" }),
  old_string: Type.String({ description: "Exact string to find and replace" }),
  new_string: Type.String({ description: "String to replace with" }),
});

export class EditTool implements AgentTool {
  name = "edit";
  label = "Edit File";
  description = "Find and replace an exact string in a file";
  parameters = editToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; old_string: string; new_string: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<EditToolDetails>> {
    // Resolve file path (relative to workingDirectory)
    // Read file contents
    // Find old_string (must be an exact match)
    // Replace with new_string
    // Write file back
    // Return stats
  }
}
```

**Key Features:**

- Exact string matching (no regex)
- Safe atomic writes (write temp → rename)
- Abort support (cancel before write)
- Match validation (error if old_string not found)
- Line-based change tracking

**Output Format:**

```typescript
{
  output: "Replaced 1 occurrence in src/server.ts (3 lines changed)",
  details: {
    filePath: "src/server.ts",
    oldString: "const port = 3000;",
    newString: "const port = process.env.PORT || 3000;",
    matchCount: 1,
    linesChanged: 3
  }
}
```

**Error Cases:**

- File not found → error
- old_string not found → error
- Multiple matches for old_string → error (ambiguous)
- File changed during operation → error (race condition)

### Tool: WriteTool

```typescript
export interface WriteToolDetails {
  filePath: string;
  size: number;
  isNew: boolean;
}

export const writeToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to create/overwrite" }),
  content: Type.String({ description: "Full file contents to write" }),
});

export class WriteTool implements AgentTool {
  name = "write";
  label = "Write File";
  description = "Create a new file or completely replace existing file contents";
  parameters = writeToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; content: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<WriteToolDetails>> {
    // Resolve file path
    // Check if file exists (track isNew)
    // Create parent directories if needed
    // Write content atomically
    // Return stats
  }
}
```

**Key Features:**

- Creates parent directories automatically
- Safe atomic writes
- Abort support
- No size limits (trust LLM context limits)

**Output Format:**

```typescript
{
  output: "Created new file src/utils/helper.ts (142 bytes)",
  details: {
    filePath: "src/utils/helper.ts",
    size: 142,
    isNew: true
  }
}
```

## Package: @mariozechner/coding-agent-tui

### CLI Interface

```bash
# Interactive mode (default)
coding-agent

# Continue previous session
coding-agent --continue

# Single-shot mode
coding-agent "Fix the TypeScript errors"

# Multiple prompts
coding-agent "Add validation" "Write tests"

# Custom model
coding-agent --model openai/gpt-4 --api-key $KEY

# JSON output (for piping)
coding-agent --json < prompts.jsonl > results.jsonl
```

### Arguments

```typescript
{
  "base-url": string;      // API endpoint
  "api-key": string;       // API key (or env var)
  "model": string;         // Model identifier
  "system-prompt": string; // System prompt
  "continue": boolean;     // Resume session
  "json": boolean;         // JSONL I/O mode
  "help": boolean;         // Show help
}
```

### Renderers

**TuiRenderer** - Rich terminal UI

- Real-time streaming output
- Syntax highlighting for code
- Tool execution indicators
- Progress spinners
- Token usage stats
- Keyboard shortcuts (Ctrl+C to abort)

**ConsoleRenderer** - Simple console output

- Plain text output
- No ANSI codes
- Good for logging/CI

**JsonRenderer** - JSONL output

- One JSON object per line
- Each line is a complete event
- For piping/processing

### JSON Mode Example

Input (stdin):

```jsonl
{"type":"message","content":"List all TypeScript files"}
{"type":"interrupt"}
{"type":"message","content":"Count the files"}
```

Output (stdout):

```jsonl
{"type":"turn_start","timestamp":"..."}
{"type":"message_start","message":{...}}
{"type":"tool_execution_start","toolCallId":"...","toolName":"bash","args":"{...}"}
{"type":"tool_execution_end","toolCallId":"...","result":"..."}
{"type":"message_end","message":{...}}
{"type":"turn_end"}
{"type":"interrupted"}
{"type":"message_start","message":{...}}
...
```

## Integration Patterns

### VS Code Extension

```typescript
import { CodingAgent, SessionManager } from "@mariozechner/coding-agent";
import * as path from "node:path";
import * as vscode from "vscode";

class CodingAgentProvider {
  private agent: CodingAgent;
  private outputChannel: vscode.OutputChannel;

  constructor() {
    const workspaceRoot = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath || process.cwd();
    const session = new SessionManager({
      directory: path.join(workspaceRoot, ".vscode", "agent-sessions")
    });

    this.agent = new CodingAgent({
      config: {
        systemPrompt: "You are a coding assistant...",
        model: getModel("openai", "gpt-4"),
        apiKey: vscode.workspace.getConfiguration("codingAgent").get("apiKey")!,
      },
      sessionManager: session,
      workingDirectory: workspaceRoot,
    });

    this.outputChannel = vscode.window.createOutputChannel("Coding Agent");
  }

  async executePrompt(prompt: string) {
    const cancellation = new vscode.CancellationTokenSource();

    // Convert VS Code cancellation to AbortSignal
    const controller = new AbortController();
    cancellation.token.onCancellationRequested(() => controller.abort());

    for await (const event of this.agent.prompt(prompt, controller.signal)) {
      switch (event.type) {
        case "message_update":
          this.outputChannel.appendLine(event.message.content[0].text);
          break;
        case "tool_execution_start":
          vscode.window.showInformationMessage(`Running: ${event.toolName}`);
          break;
        case "tool_execution_end":
          if (event.isError) {
            vscode.window.showErrorMessage(`Tool failed: ${event.result}`);
          }
          break;
      }
    }
  }
}
```

### Headless Server/API

```typescript
import { CodingAgent } from "@mariozechner/coding-agent";
import express from "express";

const app = express();
app.use(express.json()); // needed so req.body is parsed

app.post("/api/prompt", async (req, res) => {
  const { prompt, sessionId } = req.body;

  const agent = new CodingAgent({
    config: {
      systemPrompt: "...",
      model: getModel("openai", "gpt-4"),
      apiKey: process.env.OPENAI_API_KEY!,
    },
    workingDirectory: `/tmp/workspaces/${sessionId}`,
  });

  // Stream SSE
  res.setHeader("Content-Type", "text/event-stream");

  const controller = new AbortController();
  req.on("close", () => controller.abort());

  for await (const event of agent.prompt(prompt, controller.signal)) {
    res.write(`data: ${JSON.stringify(event)}\n\n`);
  }

  res.end();
});

app.listen(3000);
```

## Migration Plan

### Phase 1: Extract Core Package

1. Create `packages/coding-agent/` structure
2. Port SessionManager from the old agent package
3. Implement ReadTool, BashTool, EditTool, WriteTool
4. Implement the CodingAgent class using pi-ai's agentLoop
5. Write tests for each tool
6. Write integration tests

### Phase 2: Build TUI

1. Create `packages/coding-agent-tui/`
2. Port TuiRenderer from the old agent package
3. Port ConsoleRenderer and JsonRenderer
4. Implement CLI argument parsing
5. Implement interactive and single-shot modes
6. Test session resume functionality

### Phase 3: Update Dependencies

1. Update web-ui if needed (should be unaffected)
2. Deprecate the old agent package
3. Update documentation
4. Update examples

### Phase 4: Future Enhancements

1. Build the VS Code extension
2. Add more tools (grep, find, etc.) as optional extras
3. Plugin system for custom tools
4. Parallel tool execution

## Open Questions & Decisions

### 1. Should EditTool support multiple replacements?

**Option A:** Error on multiple matches (current proposal)

- Forces explicit, unambiguous edits
- LLM must be precise with context
- Safer (no accidental mass replacements)

**Option B:** Replace all matches

- More convenient for bulk changes
- Risk of unintended replacements
- Needs a `replace_all: boolean` flag

**Decision:** Start with Option A; add a `replace_all` flag if needed.

### 2. ReadTool line limit and pagination strategy?

**Decision:** 5000-line default limit with offset/limit pagination

**Rationale:**

- **5000 lines** balances context vs token usage (a typical file fits in one read)
- **Line-based pagination** is intuitive for the LLM (matches how humans think about code)
- **cat -n format** with line numbers helps the LLM reference specific lines in edits
- **Automatic truncation warning** teaches the LLM to paginate when needed

**Alternative considered:** Byte-based limits (rejected - harder for the LLM to reason about)

**System prompt guidance:**

```
When reading large files:
1. First read without offset/limit to get the total line count
2. If truncated, calculate chunks: ceil(totalLines / 5000)
3. Read each chunk with the appropriate offset
```

### 3. Should ReadTool handle binary files?

**Decision:** Error on binary files with a helpful message

**Error message:**

```
Error: Cannot read binary file 'dist/app.js'.
Use bash tool if you need to inspect:
  bash(command="file dist/app.js")
  or bash(command="xxd dist/app.js | head")
```

**Rationale:**

- Binary files are rarely useful to the LLM
- A clear error message teaches the LLM to use appropriate tools
- Prevents token waste on unreadable content

**Binary detection:** Check for null bytes in the first 8KB (the same strategy as `git diff`)

### 4. Should EditTool support regex?

**Current proposal:** No regex, exact string match only

**Pros of exact match:**

- Simple implementation
- No regex escaping issues
- Clear error messages
- Safer (no accidental broad matches)

**Cons:**

- Less powerful
- Multiple edits needed for patterns

**Decision:** Exact match only. The LLM can use bash/sed for complex patterns.

### 5. Working directory enforcement?

**Question:** Should tools be sandboxed to workingDirectory?

**Option A:** Enforce a sandbox (only access files under workingDirectory)

- Safer
- Prevents accidental system file edits
- Clear boundaries

**Option B:** Allow any path

- More flexible
- LLM can edit config files, etc.
- User's responsibility to review

**Decision:** Start with Option B (no sandbox). Add a `--sandbox` flag later if needed.

### 6. Tool output size limits?

**Current proposal:**

- ReadTool: 5000-line limit per read (paginate for more)
- BashTool: 1MB truncation
- EditTool: No limit (reasonable file sizes expected)
- WriteTool: No limit (LLM context limited)

**Alternative:** Enforce a global 1MB limit on all tool outputs

**Decision:** Per-tool limits. ReadTool and BashTool need them most.

### 7. How to handle long-running bash commands?

**Question:** Should BashTool stream output or wait for completion?

**Option A:** Wait for completion (current proposal)

- Simpler implementation
- Full output available for the LLM
- Blocks until done

**Option B:** Stream output

- Better UX (shows progress)
- More complex (needs to handle partial output)
- LLM sees the final output only

**Decision:** Wait for completion initially. Add streaming later if needed.

### 8. Package naming alternatives?

**Current proposal:**

- `@mariozechner/coding-agent` (core)
- `@mariozechner/coding-agent-tui` (TUI)

**Alternatives:**

- `@mariozechner/file-agent` / `@mariozechner/file-agent-tui`
- `@mariozechner/dev-agent` / `@mariozechner/dev-agent-tui`
- `@mariozechner/pi-code` / `@mariozechner/pi-code-tui`

**Decision:** `coding-agent` is clear and specific to the use case.
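The pagination arithmetic settled in question 2 can be made concrete with a small helper. This is a sketch only: `planReadChunks` is a hypothetical name, and the 5000-line constant mirrors the ReadTool limit proposed above.

```typescript
const MAX_LINES_PER_READ = 5000; // ReadTool's per-read limit from question 2

// Compute the 1-indexed offset/limit pairs needed to read a file of
// `totalLines` lines in full, chunk by chunk (ceil(totalLines / 5000) reads).
export function planReadChunks(totalLines: number): { offset: number; limit: number }[] {
  const chunks: { offset: number; limit: number }[] = [];
  for (let offset = 1; offset <= totalLines; offset += MAX_LINES_PER_READ) {
    chunks.push({
      offset,
      limit: Math.min(MAX_LINES_PER_READ, totalLines - offset + 1),
    });
  }
  return chunks;
}
```

For a 10000-line file this yields reads at offsets 1 and 5001, matching the offsets shown in the ReadTool system-prompt example.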
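The binary-detection heuristic from question 3 (null bytes in the first 8KB, as `git diff` does) is small enough to sketch directly; `looksBinary` is a hypothetical helper name, not part of the proposed API.

```typescript
// Return true if the first 8KB of a file's bytes contain a null byte —
// the heuristic chosen in question 3 for rejecting binary files in ReadTool.
export function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, 8 * 1024);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

ReadTool would run this check after loading the file and emit the "Cannot read binary file" error shown above instead of sending unreadable bytes to the LLM.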
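Likewise, EditTool's match validation (questions 1 and 4: exact string, error on zero or multiple matches) reduces to a pure function over file contents. A sketch under those decisions, with hypothetical names `countOccurrences` and `applyEdit`:

```typescript
// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let idx = haystack.indexOf(needle);
  while (idx !== -1) {
    count++;
    idx = haystack.indexOf(needle, idx + needle.length);
  }
  return count;
}

// Apply an exact-string edit, enforcing the "exactly one match" rule:
// throw instead of guessing when old_string is missing or ambiguous.
export function applyEdit(content: string, oldString: string, newString: string): string {
  const matches = countOccurrences(content, oldString);
  if (matches === 0) throw new Error("old_string not found");
  if (matches > 1) throw new Error(`old_string is ambiguous (${matches} matches)`);
  // split/join avoids String.replace's special handling of "$" in the replacement
  return content.split(oldString).join(newString);
}
```

The real tool would wrap this with path resolution and the atomic temp-file write described in the EditTool section.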
## Summary

This architecture provides:

- ✅ **Headless core** - Clean separation between agent logic and UI
- ✅ **Reusable** - The same agent for TUI, VS Code, web, and APIs
- ✅ **Composable** - Builds on pi-ai primitives
- ✅ **Abortable** - First-class cancellation support
- ✅ **Session persistence** - Resume conversations seamlessly
- ✅ **Focused tools** - read, bash, edit, write (4 tools, no more)
- ✅ **Smart pagination** - 5000-line chunks with offset/limit
- ✅ **Type-safe** - Full TypeScript with schema validation
- ✅ **Testable** - Pure functions, mockable dependencies

The key insight is to **keep web-ui's agent separate** (it has different concerns) while creating a **new, focused coding agent** for file manipulation workflows that can be shared across non-web interfaces.