Agent package + coding agent WIP, refactored web-ui prompts

This commit is contained in:
Mario Zechner 2025-10-17 11:47:01 +02:00
parent 4e7a340460
commit ffc9be8867
58 changed files with 5138 additions and 2206 deletions

View file

@ -1,15 +1,23 @@
# Coding Agent Architecture
# Agent Architecture
## Executive Summary
This document proposes extracting the agent infrastructure from `@mariozechner/pi-web-ui` and `@mariozechner/pi-agent` into a new headless coding agent package that can be reused across multiple UI implementations (TUI, VS Code extension, web interface).
This document proposes extracting the agent infrastructure from `@mariozechner/pi-web-ui` into two new packages:
1. **`@mariozechner/agent`** - General-purpose agent package with transport abstraction, state management, and attachment support
2. **`@mariozechner/coding-agent`** - Specialized coding agent built on the general agent, with file manipulation tools and session management
The new architecture will provide:
- **Headless agent core** with file manipulation tools (read, bash, edit, write)
- **Session management** for conversation persistence and resume capability
- **General agent core** with transport abstraction (ProviderTransport, AppTransport)
- **Reactive state management** with subscribe/emit pattern
- **Attachment support** (type definitions only - processing stays in consumers)
- **Message transformation** pipeline for filtering and adapting messages
- **Message queueing** for out-of-band message injection
- **Full abort support** throughout the execution pipeline
- **Event-driven API** for flexible UI integration
- **Clean separation** between agent logic and presentation layer
- **Coding-specific tools** (read, bash, edit, write) in a specialized package
- **Session management** for conversation persistence and resume capability
## Current Architecture Analysis
@ -18,7 +26,7 @@ The new architecture will provide:
```
pi-mono/
├── packages/ai/ # Core AI streaming (GOOD - keep as-is)
├── packages/web-ui/ # Web UI with agent (GOOD - keep separate)
├── packages/web-ui/ # Web UI with embedded agent (EXTRACT core agent logic)
├── packages/agent/ # OLD - needs to be replaced
├── packages/tui/ # Terminal UI lib (GOOD - low-level primitives)
├── packages/proxy/ # CORS proxy (unrelated)
@ -77,9 +85,9 @@ interface AgentToolResult<T> {
}
```
### packages/web-ui/agent - Web Agent
### packages/web-ui/src/agent - Web Agent
**Status:** ✅ Good for web use cases, keep separate
**Status:** ✅ KEEP AS-IS for now, will be replaced later after new packages are proven
**Architecture:**
```typescript
@ -94,60 +102,49 @@ class Agent {
async prompt(input: string, attachments?: Attachment[]): Promise<void>
abort(): void
subscribe(fn: (e: AgentEvent) => void): () => void
setSystemPrompt(v: string): void
setModel(m: Model<any>): void
setThinkingLevel(l: ThinkingLevel): void
setTools(t: AgentTool<any>[]): void
replaceMessages(ms: AppMessage[]): void
appendMessage(m: AppMessage): void
async queueMessage(m: AppMessage): Promise<void>
clearMessages(): void
}
```
**Key Features:**
- **Transport abstraction** (ProviderTransport for direct API, AppTransport for server-side)
- **Attachment handling** (images, documents with text extraction)
- **Message transformation** (app messages → LLM messages)
- **Reactive state** (subscribe pattern for UI updates)
- **Message queue** for injecting tool results/errors asynchronously
**Key Features (will be basis for new `@mariozechner/agent` package):**
- ✅ **Transport abstraction** (ProviderTransport for direct API, AppTransport for server-side proxy)
- ✅ **Attachment type definition** (id, type, fileName, mimeType, size, content, extractedText, preview)
- ✅ **Message transformation** pipeline (app messages → LLM messages, with filtering)
- ✅ **Reactive state** (subscribe/emit pattern for UI updates)
- ✅ **Message queueing** for injecting messages out-of-band during agent loop
- ✅ **Abort support** (AbortController per prompt)
- ✅ **State management** (systemPrompt, model, thinkingLevel, tools, messages, isStreaming, etc.)
**Why it's different from coding agent:**
- Browser-specific concerns (CORS, attachments)
- Transport layer for flexible API routing
- Tied to web UI state management
- Supports rich media attachments
**Strategy:**
1. Use this implementation as the **reference design** for `@mariozechner/agent`
2. Create new `@mariozechner/agent` package by copying/adapting this code
3. Keep web-ui using its own embedded agent until new package is proven stable
4. Eventually migrate web-ui to use `@mariozechner/agent` (Phase 2 of migration)
5. Document processing (PDF/DOCX/PPTX/Excel) stays in web-ui permanently
### packages/agent - OLD Implementation
**Status:** ⚠️ MUST BE REPLACED
**Status:** ⚠️ REMOVE COMPLETELY
**Architecture:**
```typescript
class Agent {
constructor(
config: AgentConfig,
renderer?: AgentEventReceiver,
sessionManager?: SessionManager
)
**Why it should be removed:**
1. **Tightly coupled to OpenAI SDK** - Not provider-agnostic, hardcoded to OpenAI's API
2. **Outdated architecture** - Superseded by web-ui's better agent design
3. **Mixed concerns** - Agent logic + tool implementations + rendering all in one package
4. **Limited scope** - Cannot be reused across different UI implementations
async ask(userMessage: string): Promise<void>
interrupt(): void
setEvents(events: AgentEvent[]): void
}
```
**What to salvage before removal:**
1. **SessionManager** - Port to `@mariozechner/coding-agent` (JSONL-based session persistence)
2. **Tool implementations** - Adapt read, bash, edit, write tools for coding-agent
3. **Renderer abstractions** - Port TuiRenderer/ConsoleRenderer/JsonRenderer concepts to coding-agent-tui
**Problems:**
1. **Tightly coupled to OpenAI SDK** (not provider-agnostic)
2. **Hardcoded tools** (read, list, bash, glob, rg)
3. **Mixed concerns** (agent logic + tool implementations in same package)
4. **No separation** between core loop and UI rendering
5. **Two API paths** (completions vs responses) with branching logic
**Good parts to preserve:**
1. **SessionManager** - JSONL-based session persistence
2. **Event receiver pattern** - Clean UI integration
3. **Abort support** - Proper signal handling
4. **Renderer abstraction** (ConsoleRenderer, TuiRenderer, JsonRenderer)
**Tools implemented:**
- `read`: Read file contents (1MB limit with truncation)
- `list`: List directory contents
- `bash`: Execute shell command with abort support
- `glob`: Find files matching glob pattern
- `rg`: Run ripgrep search
**Action:** Delete this package entirely after extracting useful components to the new packages.
## Proposed Architecture
@ -155,132 +152,440 @@ class Agent {
```
pi-mono/
├── packages/ai/ # [unchanged] Core streaming
├── packages/coding-agent/ # [NEW] Headless coding agent
├── packages/ai/ # [unchanged] Core streaming library
├── packages/agent/ # [NEW] General-purpose agent
│ ├── src/
│ │ ├── agent.ts # Main agent class
│ │ ├── session-manager.ts # Session persistence
│ │ ├── agent.ts # Main Agent class
│ │ ├── types.ts # AgentState, AgentEvent, Attachment, etc.
│ │ ├── transports/
│ │ │ ├── types.ts # AgentTransport interface
│ │ │ ├── ProviderTransport.ts # Direct API calls
│ │ │ ├── AppTransport.ts # Server-side proxy
│ │ │ ├── proxy-types.ts # Proxy event types
│ │ │ └── index.ts # Transport exports
│ │ └── index.ts # Public API
│ └── package.json
├── packages/coding-agent/ # [NEW] Coding-specific agent + CLI
│ ├── src/
│ │ ├── coding-agent.ts # CodingAgent wrapper (uses @mariozechner/agent)
│ │ ├── session-manager.ts # Session persistence (JSONL)
│ │ ├── tools/
│ │ │ ├── read-tool.ts # Read files (with pagination)
│ │ │ ├── bash-tool.ts # Shell execution
│ │ │ ├── edit-tool.ts # File editing (old_string → new_string)
│ │ │ ├── write-tool.ts # File creation/replacement
│ │ │ └── index.ts # Tool exports
│ │ └── types.ts # Public types
│ └── package.json
│ │ ├── cli/
│ │ │ ├── index.ts # CLI entry point
│ │ │ ├── renderers/
│ │ │ │ ├── tui-renderer.ts # Rich terminal UI
│ │ │ │ ├── console-renderer.ts # Simple console output
│ │ │ │ └── json-renderer.ts # JSONL output for piping
│ │ │ └── main.ts # CLI app logic
│ │ ├── types.ts # Public types
│ │ └── index.ts # Public API (agent + tools)
│ └── package.json # Exports both library + CLI binary
├── packages/coding-agent-tui/ # [NEW] Terminal interface
├── packages/web-ui/ # [updated] Uses @mariozechner/agent
│ ├── src/
│ │ ├── cli.ts # CLI entry point
│ │ ├── renderers/
│ │ │ ├── tui-renderer.ts # Rich terminal UI
│ │ │ ├── console-renderer.ts # Simple console output
│ │ │ └── json-renderer.ts # JSONL output for piping
│ │ └── main.ts # App logic
│ └── package.json
│ │ ├── utils/
│ │ │ └── attachment-utils.ts # Document processing (keep here)
│ │ └── ... # Other web UI code
│ └── package.json # Now depends on @mariozechner/agent
├── packages/web-ui/ # [unchanged] Web UI keeps its own agent
└── packages/tui/ # [unchanged] Low-level terminal primitives
```
### Dependency Graph
```
┌─────────────────────┐
@mariozechner/
│ pi-ai │ ← Core streaming, tool interface
└──────────┬──────────┘
│ depends on
┌─────────────────────┐
@mariozechner/
│ coding-agent │ ← Headless agent + file tools
└──────────┬──────────┘
│ depends on
┌──────────┬──────────┐
↓ ↓ ↓
┌────────┐ ┌───────┐ ┌────────┐
│ TUI │ │ VSCode│ │ Web UI │
│ Client │ │ Ext │ │ (own) │
└────────┘ └───────┘ └────────┘
┌─────────────────────┐
@mariozechner/
│ pi-ai │ ← Core streaming, tool interface
└──────────┬──────────┘
│ depends on
┌─────────────────────┐
@mariozechner/
│ agent │ ← General agent (transports, state, attachments)
└──────────┬──────────┘
│ depends on
┌───────────────┴───────────────┐
↓ ↓
┌─────────────────────┐ ┌─────────────────────┐
@mariozechner/ │ │ @mariozechner/
│ coding-agent │ │ pi-web-ui │
│ (lib + CLI + tools) │ │ (+ doc processing) │
└─────────────────────┘ └─────────────────────┘
```
## Package: @mariozechner/agent
### Core Types
```typescript
export interface Attachment {
id: string;
type: "image" | "document";
fileName: string;
mimeType: string;
size: number;
content: string; // base64 encoded (without data URL prefix)
extractedText?: string; // For documents
preview?: string; // base64 image preview
}
export type ThinkingLevel = "off" | "minimal" | "low" | "medium" | "high";
// AppMessage abstraction - extends base LLM messages with app-specific features
export type UserMessageWithAttachments = UserMessage & { attachments?: Attachment[] };
// Extensible interface for custom app messages (via declaration merging)
// Apps can add their own message types:
// declare module "@mariozechner/agent" {
// interface CustomMessages {
// artifact: ArtifactMessage;
// notification: NotificationMessage;
// }
// }
export interface CustomMessages {
// Empty by default - apps extend via declaration merging
}
// AppMessage: Union of LLM messages + attachments + custom messages
export type AppMessage =
| AssistantMessage
| UserMessageWithAttachments
| ToolResultMessage
| CustomMessages[keyof CustomMessages];
export interface AgentState {
systemPrompt: string;
model: Model<any>;
thinkingLevel: ThinkingLevel;
tools: AgentTool<any>[];
messages: AppMessage[]; // Can include attachments + custom message types
isStreaming: boolean;
streamMessage: Message | null;
pendingToolCalls: Set<string>;
error?: string;
}
export type AgentEvent =
| { type: "state-update"; state: AgentState }
| { type: "started" }
| { type: "completed" };
```
### AppMessage Abstraction
The `AppMessage` type is a key abstraction that extends base LLM messages with app-specific features while maintaining type safety and extensibility.
**Key Benefits:**
1. **Extends base messages** - Adds `attachments` field to `UserMessage` for file uploads
2. **Type-safe extensibility** - Apps can add custom message types via declaration merging
3. **Backward compatible** - Works seamlessly with base LLM messages from `@mariozechner/pi-ai`
4. **Message transformation** - Filters app-specific fields before sending to LLM
**Usage Example (Web UI):**
```typescript
import type { AppMessage } from "@mariozechner/agent";
// Extend with custom message type for artifacts
declare module "@mariozechner/agent" {
interface CustomMessages {
artifact: ArtifactMessage;
}
}
interface ArtifactMessage {
role: "artifact";
action: "create" | "update" | "delete";
filename: string;
content?: string;
title?: string;
timestamp: string;
}
// Now AppMessage includes: AssistantMessage | UserMessageWithAttachments | ToolResultMessage | ArtifactMessage
const messages: AppMessage[] = [
{ role: "user", content: "Hello", attachments: [attachment] },
{ role: "assistant", content: [{ type: "text", text: "Hi!" }], /* ... */ },
{ role: "artifact", action: "create", filename: "test.ts", content: "...", timestamp: "..." }
];
```
**Usage Example (Coding Agent):**
```typescript
import type { AppMessage } from "@mariozechner/agent";
// Coding agent can extend with session metadata
declare module "@mariozechner/agent" {
interface CustomMessages {
session_metadata: SessionMetadataMessage;
}
}
interface SessionMetadataMessage {
role: "session_metadata";
sessionId: string;
timestamp: string;
workingDirectory: string;
}
```
**Message Transformation:**
The `messageTransformer` function converts app messages to LLM-compatible messages, including handling attachments:
```typescript
function defaultMessageTransformer(messages: AppMessage[]): Message[] {
return messages
.filter((m) => {
// Only keep standard LLM message roles
return m.role === "user" || m.role === "assistant" || m.role === "toolResult";
})
.map((m) => {
if (m.role === "user") {
const { attachments, ...baseMessage } = m as any;
// If no attachments, return as-is
if (!attachments || attachments.length === 0) {
return baseMessage as Message;
}
// Convert attachments to content blocks
const content = Array.isArray(baseMessage.content)
? [...baseMessage.content]
: [{ type: "text", text: baseMessage.content }];
for (const attachment of attachments) {
// Add image blocks for image attachments
if (attachment.type === "image") {
content.push({
type: "image",
data: attachment.content,
mimeType: attachment.mimeType
});
}
// Add text blocks for documents with extracted text
else if (attachment.type === "document" && attachment.extractedText) {
content.push({
type: "text",
text: attachment.extractedText
});
}
}
return { ...baseMessage, content } as Message;
}
return m as Message;
});
}
```
This ensures that:
- Custom message types (like `artifact`, `session_metadata`) are filtered out
- Image attachments are converted to `ImageContent` blocks
- Document attachments with extracted text are converted to `TextContent` blocks
- The `attachments` field itself is stripped (replaced by proper content blocks)
- LLM receives only standard `Message` types from `@mariozechner/pi-ai`
### Agent Class
```typescript
export interface AgentOptions {
initialState?: Partial<AgentState>;
transport: AgentTransport;
// Transform app messages to LLM-compatible messages before sending
messageTransformer?: (messages: AppMessage[]) => Message[] | Promise<Message[]>;
}
export class Agent {
constructor(opts: AgentOptions);
get state(): AgentState;
subscribe(fn: (e: AgentEvent) => void): () => void;
// State mutators
setSystemPrompt(v: string): void;
setModel(m: Model<any>): void;
setThinkingLevel(l: ThinkingLevel): void;
setTools(t: AgentTool<any>[]): void;
replaceMessages(ms: AppMessage[]): void;
appendMessage(m: AppMessage): void;
async queueMessage(m: AppMessage): Promise<void>;
clearMessages(): void;
// Main prompt method
async prompt(input: string, attachments?: Attachment[]): Promise<void>;
// Abort current operation
abort(): void;
}
```
**Key Features:**
1. **Reactive state** - Subscribe to state updates for UI binding
2. **Transport abstraction** - Pluggable backends (direct API, proxy server, etc.)
3. **Message transformation** - Convert app-specific messages to LLM format
4. **Message queueing** - Inject messages during agent loop (for tool results, errors)
5. **Attachment support** - Type-safe attachment handling (processing is external)
6. **Abort support** - Cancel in-progress operations
### Transport Interface
```typescript
export interface AgentRunConfig {
systemPrompt: string;
tools: AgentTool<any>[];
model: Model<any>;
reasoning?: "low" | "medium" | "high";
getQueuedMessages?: <T>() => Promise<QueuedMessage<T>[]>;
}
export interface AgentTransport {
run(
messages: Message[],
userMessage: Message,
config: AgentRunConfig,
signal?: AbortSignal,
): AsyncIterable<AgentEvent>;
}
```
### ProviderTransport
```typescript
export class ProviderTransport implements AgentTransport {
async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) {
// Calls LLM providers directly using agentLoop from @mariozechner/pi-ai
// Optionally routes through CORS proxy if configured
}
}
```
### AppTransport
```typescript
export class AppTransport implements AgentTransport {
constructor(proxyUrl: string);
async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) {
// Routes requests through app server with user authentication
// Server manages API keys and usage tracking
}
}
```
## Package: @mariozechner/coding-agent
### Core Agent Class
### CodingAgent Class
```typescript
export interface CodingAgentConfig {
export interface CodingAgentOptions {
systemPrompt: string;
model: Model<any>;
reasoning?: "low" | "medium" | "high";
apiKey: string;
}
export interface CodingAgentOptions {
config: CodingAgentConfig;
sessionManager?: SessionManager;
workingDirectory?: string;
sessionManager?: SessionManager;
}
export class CodingAgent {
constructor(options: CodingAgentOptions);
// Access underlying agent
get agent(): Agent;
// State accessors
get state(): AgentState;
subscribe(fn: (e: AgentEvent) => void): () => void;
// Send a message to the agent
async prompt(message: string, signal?: AbortSignal): AsyncIterable<AgentEvent>;
async prompt(message: string, attachments?: Attachment[]): Promise<void>;
// Restore from session events (for --continue mode)
setMessages(messages: Message[]): void;
// Abort current operation
abort(): void;
// Get current message history
getMessages(): Message[];
// Message management for session restoration
replaceMessages(messages: AppMessage[]): void;
getMessages(): AppMessage[];
}
```
**Key design decisions:**
1. **AsyncIterable instead of callbacks** - More flexible for consumers
2. **Signal per prompt** - Each prompt() call accepts its own AbortSignal
3. **No internal state management** - Consumers handle UI state
4. **Simple message management** - Get/set for session restoration
1. **Wraps @mariozechner/agent** - Builds on the general agent package
2. **Pre-configured tools** - Includes read, bash, edit, write tools
3. **Session management** - Optional JSONL-based session persistence
4. **Working directory context** - All file operations relative to this directory
5. **Simple API** - Hides transport complexity, uses ProviderTransport by default
### Usage Example (TUI)
```typescript
import { CodingAgent } from "@mariozechner/coding-agent";
import { SessionManager } from "@mariozechner/coding-agent";
import { getModel } from "@mariozechner/pi-ai";
const session = new SessionManager({ continue: true });
const agent = new CodingAgent({
config: {
systemPrompt: "You are a coding assistant...",
model: getModel("openai", "gpt-4"),
apiKey: process.env.OPENAI_API_KEY!,
},
sessionManager: session,
systemPrompt: "You are a coding assistant...",
model: getModel("openai", "gpt-4"),
apiKey: process.env.OPENAI_API_KEY!,
workingDirectory: process.cwd(),
sessionManager: session,
});
// Restore previous session
if (session.hasData()) {
agent.setMessages(session.getMessages());
agent.replaceMessages(session.getMessages());
}
// Send prompt with abort support
const controller = new AbortController();
for await (const event of agent.prompt("Fix the bug in server.ts", controller.signal)) {
switch (event.type) {
case "message_update":
renderer.updateAssistant(event.message);
break;
case "tool_execution_start":
renderer.showTool(event.toolName, event.args);
break;
case "tool_execution_end":
renderer.showToolResult(event.toolName, event.result);
break;
// Subscribe to state changes
agent.subscribe((event) => {
if (event.type === "state-update") {
renderer.render(event.state);
} else if (event.type === "completed") {
session.save(agent.getMessages());
}
}
});
// Send prompt
await agent.prompt("Fix the bug in server.ts");
```
### Usage Example (Web UI)
```typescript
import { Agent, ProviderTransport, Attachment } from "@mariozechner/agent";
import { getModel } from "@mariozechner/pi-ai";
import { loadAttachment } from "./utils/attachment-utils"; // Web UI keeps this
const agent = new Agent({
transport: new ProviderTransport(),
initialState: {
systemPrompt: "You are a helpful assistant...",
model: getModel("google", "gemini-2.5-flash"),
thinkingLevel: "low",
tools: [],
},
});
// Subscribe to state changes for UI updates
agent.subscribe((event) => {
if (event.type === "state-update") {
updateUI(event.state);
}
});
// Handle file upload and send prompt
const file = await fileInput.files[0];
const attachment = await loadAttachment(file); // Processes PDF/DOCX/etc
await agent.prompt("Analyze this document", [attachment]);
```
### Session Manager
@ -300,8 +605,7 @@ export interface SessionMetadata {
export interface SessionData {
metadata: SessionMetadata;
messages: Message[]; // Conversation history
totalUsage: TokenUsage; // Aggregated token usage
messages: AppMessage[]; // Conversation history
}
export class SessionManager {
@ -310,8 +614,8 @@ export class SessionManager {
// Start a new session (writes metadata)
startSession(config: CodingAgentConfig): void;
// Log an event (appends to JSONL)
appendEvent(event: AgentEvent): void;
// Append a message to the session (appends to JSONL)
appendMessage(message: AppMessage): void;
// Check if session has existing data
hasData(): boolean;
@ -320,7 +624,7 @@ export class SessionManager {
getData(): SessionData | null;
// Get just the messages for agent restoration
getMessages(): Message[];
getMessages(): AppMessage[];
// Get session file path
getFilePath(): string;
@ -332,12 +636,19 @@ export class SessionManager {
**Session Storage Format (JSONL):**
```jsonl
{"type":"session","id":"uuid","timestamp":"2025-10-12T10:00:00Z","cwd":"/path","config":{...}}
{"type":"event","timestamp":"2025-10-12T10:00:01Z","event":{"type":"turn_start"}}
{"type":"event","timestamp":"2025-10-12T10:00:02Z","event":{"type":"message_start",...}}
{"type":"event","timestamp":"2025-10-12T10:00:03Z","event":{"type":"message_end",...}}
{"type":"metadata","id":"uuid","timestamp":"2025-10-12T10:00:00Z","cwd":"/path","config":{...}}
{"type":"message","message":{"role":"user","content":"Fix the bug in server.ts"}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"I'll help..."}],...}}
{"type":"message","message":{"role":"toolResult","toolCallId":"call_123","output":"..."}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"Fixed!"}],...}}
```
**How it works:**
- First line is session metadata (id, timestamp, working directory, config)
- Each subsequent line is an `AppMessage` from `agent.state.messages`
- Messages are appended as they're added to the agent state (append-only)
- On session restore, read all message lines to reconstruct conversation history
**Session File Naming:**
```
~/.pi/sessions/--path-to-project--/
@ -643,9 +954,11 @@ export class WriteTool implements AgentTool<typeof writeToolSchema, WriteToolDet
}
```
## Package: @mariozechner/coding-agent-tui
## CLI Interface (included in @mariozechner/coding-agent)
### CLI Interface
The coding-agent package includes both a library and a CLI interface in one package.
### CLI Usage
```bash
# Interactive mode (default)
@ -667,7 +980,7 @@ coding-agent --model openai/gpt-4 --api-key $KEY
coding-agent --json < prompts.jsonl > results.jsonl
```
### Arguments
### CLI Arguments
```typescript
{
@ -818,33 +1131,58 @@ app.listen(3000);
## Migration Plan
### Phase 1: Extract Core Package
### Phase 1: Create General Agent Package
1. Create `packages/agent/` structure
2. **COPY** Agent class from web-ui/src/agent/agent.ts (don't extract yet)
3. Copy types (AgentState, AgentEvent, Attachment, DebugLogEntry, ThinkingLevel)
4. Copy transports (types.ts, ProviderTransport.ts, AppTransport.ts, proxy-types.ts)
5. Adapt code to work as standalone package
6. Write unit tests for Agent class
7. Write tests for both transports
8. Publish `@mariozechner/agent@0.1.0`
9. **Keep web-ui unchanged** - it continues using its embedded agent
### Phase 2: Create Coding Agent Package (with CLI)
1. Create `packages/coding-agent/` structure
2. Port SessionManager from old agent package
3. Implement BashTool, EditTool, WriteTool
4. Implement CodingAgent class using pi-ai/agentLoop
5. Write tests for each tool
6. Write integration tests
3. Implement ReadTool, BashTool, EditTool, WriteTool
4. Implement CodingAgent class (wraps @mariozechner/agent)
5. Implement CLI in `src/cli/` directory:
- CLI entry point (index.ts)
- TuiRenderer, ConsoleRenderer, JsonRenderer
- Argument parsing
- Interactive and single-shot modes
6. Write tests for tools and agent
7. Write integration tests for CLI
8. Publish `@mariozechner/coding-agent@0.1.0` (includes library + CLI binary)
### Phase 2: Build TUI
1. Create `packages/coding-agent-tui/`
2. Port TuiRenderer from old agent package
3. Port ConsoleRenderer, JsonRenderer
4. Implement CLI argument parsing
5. Implement interactive and single-shot modes
6. Test session resume functionality
### Phase 3: Prove Out New Packages
1. Use coding-agent (library + CLI) extensively
2. Fix bugs and iterate on API design
3. Gather feedback from real usage
4. Ensure stability and performance
### Phase 3: Update Dependencies
1. Update web-ui if needed (should be unaffected)
2. Deprecate old agent package
3. Update documentation
4. Update examples
### Phase 4: Migrate Web UI (OPTIONAL, later)
1. Once new `@mariozechner/agent` is proven stable
2. Update web-ui package.json to depend on `@mariozechner/agent`
3. Remove src/agent/agent.ts, src/agent/types.ts, src/agent/transports/
4. Keep src/utils/attachment-utils.ts (document processing)
5. Update imports to use `@mariozechner/agent`
6. Test that web UI still works correctly
7. Verify document attachments (PDF, DOCX, etc.) still work
### Phase 4: Future Enhancements
1. Build VS Code extension
2. Add more tools (grep, find, etc.) as optional
### Phase 5: Cleanup
1. Deprecate/remove old `packages/agent/` package
2. Update all documentation
3. Create migration guide
4. Add examples for all use cases
### Phase 6: Future Enhancements
1. Build VS Code extension using `@mariozechner/coding-agent`
2. Add more tools (grep, find, glob, etc.) as optional plugins
3. Plugin system for custom tools
4. Parallel tool execution
5. Streaming tool output for long-running commands
## Open Questions & Decisions
@ -975,14 +1313,29 @@ Error: Cannot read binary file 'dist/app.js'. Use bash tool if you need to inspe
This architecture provides:
**Headless core** - Clean separation between agent logic and UI
**Reusable** - Same agent for TUI, VS Code, web, APIs
**Composable** - Build on pi-ai primitives
**Abortable** - First-class cancellation support
**Session persistence** - Resume conversations seamlessly
### General Agent Package (`@mariozechner/agent`)
**Transport abstraction** - Pluggable backends (ProviderTransport, AppTransport)
**Reactive state** - Subscribe/emit pattern for UI binding
**Message transformation** - Flexible pipeline for message filtering/adaptation
**Message queueing** - Out-of-band message injection during agent loop
**Attachment support** - Type-safe attachment handling (processing is external)
**Abort support** - First-class cancellation with AbortController
**Provider agnostic** - Works with any LLM provider via @mariozechner/pi-ai
**Type-safe** - Full TypeScript with proper types
### Coding Agent Package (`@mariozechner/coding-agent`)
**Builds on general agent** - Leverages transport abstraction and state management
**Session persistence** - JSONL-based session storage and resume
**Focused tools** - read, bash, edit, write (4 tools, no more)
**Smart pagination** - 5000-line chunks with offset/limit
**Type-safe** - Full TypeScript with schema validation
**Smart pagination** - 5000-line chunks with offset/limit for ReadTool
**Working directory context** - All tools operate relative to project root
**Simple API** - Hides complexity, easy to use
**Testable** - Pure functions, mockable dependencies
The key insight is to **keep web-ui's agent separate** (it has different concerns) while creating a **new focused coding agent** for file manipulation workflows that can be shared across non-web interfaces.
### Key Architectural Insights
1. **Extract, don't rewrite** - The web-ui agent is well-designed; extract it into a general package
2. **Separation of concerns** - Document processing (PDF/DOCX/etc.) stays in web-ui, only type definitions move to general agent
3. **Layered architecture** - pi-ai → agent → coding-agent → coding-agent-tui
4. **Reusable across UIs** - Web UI and coding agent both use the same general agent package
5. **Pluggable transports** - Easy to add new backends (local API, proxy server, etc.)
6. **Attachment flexibility** - Type is defined centrally, processing is done by consumers