getcompanion-ai/co-mono

Fork 0

mirror of https://github.com/getcompanion-ai/co-mono.git synced 2026-04-15 13:03:42 +00:00

Mario Zechner ffc9be8867 Agent package + coding agent WIP, refactored web-ui prompts

2025-10-17 11:47:01 +02:00

42 KiB

Raw Blame History

Agent Architecture

Executive Summary

This document proposes extracting the agent infrastructure from @mariozechner/pi-web-ui into two new packages:

@mariozechner/agent - General-purpose agent package with transport abstraction, state management, and attachment support
@mariozechner/coding-agent - Specialized coding agent built on the general agent, with file manipulation tools and session management

The new architecture will provide:

General agent core with transport abstraction (ProviderTransport, AppTransport)
Reactive state management with subscribe/emit pattern
Attachment support (type definitions only - processing stays in consumers)
Message transformation pipeline for filtering and adapting messages
Message queueing for out-of-band message injection
Full abort support throughout the execution pipeline
Event-driven API for flexible UI integration
Clean separation between agent logic and presentation layer
Coding-specific tools (read, bash, edit, write) in a specialized package
Session management for conversation persistence and resume capability

Current Architecture Analysis

Package Overview

pi-mono/
├── packages/ai/              # Core AI streaming (GOOD - keep as-is)
├── packages/web-ui/          # Web UI with embedded agent (EXTRACT core agent logic)
├── packages/agent/           # OLD - needs to be replaced
├── packages/tui/             # Terminal UI lib (GOOD - low-level primitives)
├── packages/proxy/           # CORS proxy (unrelated)
└── packages/pods/            # GPU deployment tool (unrelated)

packages/ai - Core Streaming Library

Status: ✅ Solid foundation, keep as-is

Architecture:

agentLoop(
  prompt: UserMessage,
  context: AgentContext,
  config: AgentLoopConfig,
  signal?: AbortSignal
): EventStream<AgentEvent>

Key Features:

Event-driven streaming (turn_start, message_, tool_execution_, turn_end, agent_end)
Tool execution with validation
Signal-based cancellation
Message queue for injecting out-of-band messages
Preprocessor support for message transformation

Events:

type AgentEvent =
  | { type: "agent_start" }
  | { type: "turn_start" }
  | { type: "message_start"; message: Message }
  | { type: "message_update"; assistantMessageEvent: AssistantMessageEvent; message: AssistantMessage }
  | { type: "message_end"; message: Message }
  | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: any }
  | { type: "tool_execution_end"; toolCallId: string; toolName: string; result: AgentToolResult<any> | string; isError: boolean }
  | { type: "turn_end"; message: AssistantMessage; toolResults: ToolResultMessage[] }
  | { type: "agent_end"; messages: Message[] }

Tool Interface:

interface AgentTool<TParameters extends TSchema = TSchema, TDetails = any> extends Tool<TParameters> {
  label: string;  // Human-readable name for UI
  execute: (
    toolCallId: string,
    params: Static<TParameters>,
    signal?: AbortSignal
  ) => Promise<AgentToolResult<TDetails>>;
}

interface AgentToolResult<T> {
  output: string;   // Text sent to LLM
  details: T;       // Structured data for UI rendering
}

packages/web-ui/src/agent - Web Agent

Status: ✅ KEEP AS-IS for now, will be replaced later after new packages are proven

Architecture:

class Agent {
  constructor(opts: {
    initialState?: Partial<AgentState>;
    debugListener?: (entry: DebugLogEntry) => void;
    transport: AgentTransport;
    messageTransformer?: (messages: AppMessage[]) => Message[];
  })

  async prompt(input: string, attachments?: Attachment[]): Promise<void>
  abort(): void
  subscribe(fn: (e: AgentEvent) => void): () => void
  setSystemPrompt(v: string): void
  setModel(m: Model<any>): void
  setThinkingLevel(l: ThinkingLevel): void
  setTools(t: AgentTool<any>[]): void
  replaceMessages(ms: AppMessage[]): void
  appendMessage(m: AppMessage): void
  async queueMessage(m: AppMessage): Promise<void>
  clearMessages(): void
}

Key Features (will be basis for new @mariozechner/agent package):

✅ Transport abstraction (ProviderTransport for direct API, AppTransport for server-side proxy)
✅ Attachment type definition (id, type, fileName, mimeType, size, content, extractedText, preview)
✅ Message transformation pipeline (app messages → LLM messages, with filtering)
✅ Reactive state (subscribe/emit pattern for UI updates)
✅ Message queueing for injecting messages out-of-band during agent loop
✅ Abort support (AbortController per prompt)
✅ State management (systemPrompt, model, thinkingLevel, tools, messages, isStreaming, etc.)

Strategy:

Use this implementation as the reference design for @mariozechner/agent
Create new @mariozechner/agent package by copying/adapting this code
Keep web-ui using its own embedded agent until new package is proven stable
Eventually migrate web-ui to use @mariozechner/agent (Phase 2 of migration)
Document processing (PDF/DOCX/PPTX/Excel) stays in web-ui permanently

packages/agent - OLD Implementation

Status: ⚠️ REMOVE COMPLETELY

Why it should be removed:

Tightly coupled to OpenAI SDK - Not provider-agnostic, hardcoded to OpenAI's API
Outdated architecture - Superseded by web-ui's better agent design
Mixed concerns - Agent logic + tool implementations + rendering all in one package
Limited scope - Cannot be reused across different UI implementations

What to salvage before removal:

SessionManager - Port to @mariozechner/coding-agent (JSONL-based session persistence)
Tool implementations - Adapt read, bash, edit, write tools for coding-agent
Renderer abstractions - Port TuiRenderer/ConsoleRenderer/JsonRenderer concepts to coding-agent-tui

Action: Delete this package entirely after extracting useful components to the new packages.

Proposed Architecture

Package Structure

pi-mono/
├── packages/ai/                          # [unchanged] Core streaming library
│
├── packages/agent/                       # [NEW] General-purpose agent
│   ├── src/
│   │   ├── agent.ts                      # Main Agent class
│   │   ├── types.ts                      # AgentState, AgentEvent, Attachment, etc.
│   │   ├── transports/
│   │   │   ├── types.ts                  # AgentTransport interface
│   │   │   ├── ProviderTransport.ts      # Direct API calls
│   │   │   ├── AppTransport.ts           # Server-side proxy
│   │   │   ├── proxy-types.ts            # Proxy event types
│   │   │   └── index.ts                  # Transport exports
│   │   └── index.ts                      # Public API
│   └── package.json
│
├── packages/coding-agent/                # [NEW] Coding-specific agent + CLI
│   ├── src/
│   │   ├── coding-agent.ts               # CodingAgent wrapper (uses @mariozechner/agent)
│   │   ├── session-manager.ts            # Session persistence (JSONL)
│   │   ├── tools/
│   │   │   ├── read-tool.ts              # Read files (with pagination)
│   │   │   ├── bash-tool.ts              # Shell execution
│   │   │   ├── edit-tool.ts              # File editing (old_string → new_string)
│   │   │   ├── write-tool.ts             # File creation/replacement
│   │   │   └── index.ts                  # Tool exports
│   │   ├── cli/
│   │   │   ├── index.ts                  # CLI entry point
│   │   │   ├── renderers/
│   │   │   │   ├── tui-renderer.ts       # Rich terminal UI
│   │   │   │   ├── console-renderer.ts   # Simple console output
│   │   │   │   └── json-renderer.ts      # JSONL output for piping
│   │   │   └── main.ts                   # CLI app logic
│   │   ├── types.ts                      # Public types
│   │   └── index.ts                      # Public API (agent + tools)
│   └── package.json                      # Exports both library + CLI binary
│
├── packages/web-ui/                      # [updated] Uses @mariozechner/agent
│   ├── src/
│   │   ├── utils/
│   │   │   └── attachment-utils.ts       # Document processing (keep here)
│   │   └── ...                           # Other web UI code
│   └── package.json                      # Now depends on @mariozechner/agent
│
└── packages/tui/                         # [unchanged] Low-level terminal primitives

Dependency Graph

                    ┌─────────────────────┐
                    │   @mariozechner/    │
                    │      pi-ai          │  ← Core streaming, tool interface
                    └──────────┬──────────┘
                               │ depends on
                               ↓
                    ┌─────────────────────┐
                    │  @mariozechner/     │
                    │      agent          │  ← General agent (transports, state, attachments)
                    └──────────┬──────────┘
                               │ depends on
                               ↓
               ┌───────────────┴───────────────┐
               ↓                               ↓
    ┌─────────────────────┐         ┌─────────────────────┐
    │  @mariozechner/     │         │  @mariozechner/     │
    │   coding-agent      │         │     pi-web-ui       │
    │ (lib + CLI + tools) │         │ (+ doc processing)  │
    └─────────────────────┘         └─────────────────────┘

Package: @mariozechner/agent

Core Types

export interface Attachment {
  id: string;
  type: "image" | "document";
  fileName: string;
  mimeType: string;
  size: number;
  content: string;        // base64 encoded (without data URL prefix)
  extractedText?: string; // For documents
  preview?: string;       // base64 image preview
}

export type ThinkingLevel = "off" | "minimal" | "low" | "medium" | "high";

// AppMessage abstraction - extends base LLM messages with app-specific features
export type UserMessageWithAttachments = UserMessage & { attachments?: Attachment[] };

// Extensible interface for custom app messages (via declaration merging)
// Apps can add their own message types:
// declare module "@mariozechner/agent" {
//   interface CustomMessages {
//     artifact: ArtifactMessage;
//     notification: NotificationMessage;
//   }
// }
export interface CustomMessages {
  // Empty by default - apps extend via declaration merging
}

// AppMessage: Union of LLM messages + attachments + custom messages
export type AppMessage =
  | AssistantMessage
  | UserMessageWithAttachments
  | ToolResultMessage
  | CustomMessages[keyof CustomMessages];

export interface AgentState {
  systemPrompt: string;
  model: Model<any>;
  thinkingLevel: ThinkingLevel;
  tools: AgentTool<any>[];
  messages: AppMessage[];      // Can include attachments + custom message types
  isStreaming: boolean;
  streamMessage: Message | null;
  pendingToolCalls: Set<string>;
  error?: string;
}

export type AgentEvent =
  | { type: "state-update"; state: AgentState }
  | { type: "started" }
  | { type: "completed" };

AppMessage Abstraction

The AppMessage type is a key abstraction that extends base LLM messages with app-specific features while maintaining type safety and extensibility.

Key Benefits:

Extends base messages - Adds attachments field to UserMessage for file uploads
Type-safe extensibility - Apps can add custom message types via declaration merging
Backward compatible - Works seamlessly with base LLM messages from @mariozechner/pi-ai
Message transformation - Filters app-specific fields before sending to LLM

Usage Example (Web UI):

import type { AppMessage } from "@mariozechner/agent";

// Extend with custom message type for artifacts
declare module "@mariozechner/agent" {
  interface CustomMessages {
    artifact: ArtifactMessage;
  }
}

interface ArtifactMessage {
  role: "artifact";
  action: "create" | "update" | "delete";
  filename: string;
  content?: string;
  title?: string;
  timestamp: string;
}

// Now AppMessage includes: AssistantMessage | UserMessageWithAttachments | ToolResultMessage | ArtifactMessage
const messages: AppMessage[] = [
  { role: "user", content: "Hello", attachments: [attachment] },
  { role: "assistant", content: [{ type: "text", text: "Hi!" }], /* ... */ },
  { role: "artifact", action: "create", filename: "test.ts", content: "...", timestamp: "..." }
];

Usage Example (Coding Agent):

import type { AppMessage } from "@mariozechner/agent";

// Coding agent can extend with session metadata
declare module "@mariozechner/agent" {
  interface CustomMessages {
    session_metadata: SessionMetadataMessage;
  }
}

interface SessionMetadataMessage {
  role: "session_metadata";
  sessionId: string;
  timestamp: string;
  workingDirectory: string;
}

Message Transformation:

The messageTransformer function converts app messages to LLM-compatible messages, including handling attachments:

function defaultMessageTransformer(messages: AppMessage[]): Message[] {
  return messages
    .filter((m) => {
      // Only keep standard LLM message roles
      return m.role === "user" || m.role === "assistant" || m.role === "toolResult";
    })
    .map((m) => {
      if (m.role === "user") {
        const { attachments, ...baseMessage } = m as any;

        // If no attachments, return as-is
        if (!attachments || attachments.length === 0) {
          return baseMessage as Message;
        }

        // Convert attachments to content blocks
        const content = Array.isArray(baseMessage.content)
          ? [...baseMessage.content]
          : [{ type: "text", text: baseMessage.content }];

        for (const attachment of attachments) {
          // Add image blocks for image attachments
          if (attachment.type === "image") {
            content.push({
              type: "image",
              data: attachment.content,
              mimeType: attachment.mimeType
            });
          }
          // Add text blocks for documents with extracted text
          else if (attachment.type === "document" && attachment.extractedText) {
            content.push({
              type: "text",
              text: attachment.extractedText
            });
          }
        }

        return { ...baseMessage, content } as Message;
      }
      return m as Message;
    });
}

This ensures that:

Custom message types (like artifact, session_metadata) are filtered out
Image attachments are converted to ImageContent blocks
Document attachments with extracted text are converted to TextContent blocks
The attachments field itself is stripped (replaced by proper content blocks)
LLM receives only standard Message types from @mariozechner/pi-ai

Agent Class

export interface AgentOptions {
  initialState?: Partial<AgentState>;
  transport: AgentTransport;
  // Transform app messages to LLM-compatible messages before sending
  messageTransformer?: (messages: AppMessage[]) => Message[] | Promise<Message[]>;
}

export class Agent {
  constructor(opts: AgentOptions);

  get state(): AgentState;
  subscribe(fn: (e: AgentEvent) => void): () => void;

  // State mutators
  setSystemPrompt(v: string): void;
  setModel(m: Model<any>): void;
  setThinkingLevel(l: ThinkingLevel): void;
  setTools(t: AgentTool<any>[]): void;
  replaceMessages(ms: AppMessage[]): void;
  appendMessage(m: AppMessage): void;
  async queueMessage(m: AppMessage): Promise<void>;
  clearMessages(): void;

  // Main prompt method
  async prompt(input: string, attachments?: Attachment[]): Promise<void>;

  // Abort current operation
  abort(): void;
}

Key Features:

Reactive state - Subscribe to state updates for UI binding
Transport abstraction - Pluggable backends (direct API, proxy server, etc.)
Message transformation - Convert app-specific messages to LLM format
Message queueing - Inject messages during agent loop (for tool results, errors)
Attachment support - Type-safe attachment handling (processing is external)
Abort support - Cancel in-progress operations

Transport Interface

export interface AgentRunConfig {
  systemPrompt: string;
  tools: AgentTool<any>[];
  model: Model<any>;
  reasoning?: "low" | "medium" | "high";
  getQueuedMessages?: <T>() => Promise<QueuedMessage<T>[]>;
}

export interface AgentTransport {
  run(
    messages: Message[],
    userMessage: Message,
    config: AgentRunConfig,
    signal?: AbortSignal,
  ): AsyncIterable<AgentEvent>;
}

ProviderTransport

export class ProviderTransport implements AgentTransport {
  async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) {
    // Calls LLM providers directly using agentLoop from @mariozechner/pi-ai
    // Optionally routes through CORS proxy if configured
  }
}

AppTransport

export class AppTransport implements AgentTransport {
  constructor(proxyUrl: string);

  async *run(messages: Message[], userMessage: Message, cfg: AgentRunConfig, signal?: AbortSignal) {
    // Routes requests through app server with user authentication
    // Server manages API keys and usage tracking
  }
}

Package: @mariozechner/coding-agent

CodingAgent Class

export interface CodingAgentOptions {
  systemPrompt: string;
  model: Model<any>;
  reasoning?: "low" | "medium" | "high";
  apiKey: string;
  workingDirectory?: string;
  sessionManager?: SessionManager;
}

export class CodingAgent {
  constructor(options: CodingAgentOptions);

  // Access underlying agent
  get agent(): Agent;

  // State accessors
  get state(): AgentState;
  subscribe(fn: (e: AgentEvent) => void): () => void;

  // Send a message to the agent
  async prompt(message: string, attachments?: Attachment[]): Promise<void>;

  // Abort current operation
  abort(): void;

  // Message management for session restoration
  replaceMessages(messages: AppMessage[]): void;
  getMessages(): AppMessage[];
}

Key design decisions:

Wraps @mariozechner/agent - Builds on the general agent package
Pre-configured tools - Includes read, bash, edit, write tools
Session management - Optional JSONL-based session persistence
Working directory context - All file operations relative to this directory
Simple API - Hides transport complexity, uses ProviderTransport by default

Usage Example (TUI)

import { CodingAgent } from "@mariozechner/coding-agent";
import { SessionManager } from "@mariozechner/coding-agent";
import { getModel } from "@mariozechner/pi-ai";

const session = new SessionManager({ continue: true });
const agent = new CodingAgent({
  systemPrompt: "You are a coding assistant...",
  model: getModel("openai", "gpt-4"),
  apiKey: process.env.OPENAI_API_KEY!,
  workingDirectory: process.cwd(),
  sessionManager: session,
});

// Restore previous session
if (session.hasData()) {
  agent.replaceMessages(session.getMessages());
}

// Subscribe to state changes
agent.subscribe((event) => {
  if (event.type === "state-update") {
    renderer.render(event.state);
  } else if (event.type === "completed") {
    session.save(agent.getMessages());
  }
});

// Send prompt
await agent.prompt("Fix the bug in server.ts");

Usage Example (Web UI)

import { Agent, ProviderTransport, Attachment } from "@mariozechner/agent";
import { getModel } from "@mariozechner/pi-ai";
import { loadAttachment } from "./utils/attachment-utils"; // Web UI keeps this

const agent = new Agent({
  transport: new ProviderTransport(),
  initialState: {
    systemPrompt: "You are a helpful assistant...",
    model: getModel("google", "gemini-2.5-flash"),
    thinkingLevel: "low",
    tools: [],
  },
});

// Subscribe to state changes for UI updates
agent.subscribe((event) => {
  if (event.type === "state-update") {
    updateUI(event.state);
  }
});

// Handle file upload and send prompt
const file = await fileInput.files[0];
const attachment = await loadAttachment(file); // Processes PDF/DOCX/etc
await agent.prompt("Analyze this document", [attachment]);

Session Manager

export interface SessionManagerOptions {
  continue?: boolean;           // Resume most recent session
  directory?: string;            // Custom session directory
}

export interface SessionMetadata {
  id: string;
  timestamp: string;
  cwd: string;
  config: CodingAgentConfig;
}

export interface SessionData {
  metadata: SessionMetadata;
  messages: AppMessage[];       // Conversation history
}

export class SessionManager {
  constructor(options?: SessionManagerOptions);

  // Start a new session (writes metadata)
  startSession(config: CodingAgentConfig): void;

  // Append a message to the session (appends to JSONL)
  appendMessage(message: AppMessage): void;

  // Check if session has existing data
  hasData(): boolean;

  // Get full session data
  getData(): SessionData | null;

  // Get just the messages for agent restoration
  getMessages(): AppMessage[];

  // Get session file path
  getFilePath(): string;

  // Get session ID
  getId(): string;
}

Session Storage Format (JSONL):

{"type":"metadata","id":"uuid","timestamp":"2025-10-12T10:00:00Z","cwd":"/path","config":{...}}
{"type":"message","message":{"role":"user","content":"Fix the bug in server.ts"}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"I'll help..."}],...}}
{"type":"message","message":{"role":"toolResult","toolCallId":"call_123","output":"..."}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"Fixed!"}],...}}

How it works:

First line is session metadata (id, timestamp, working directory, config)
Each subsequent line is an AppMessage from agent.state.messages
Messages are appended as they're added to the agent state (append-only)
On session restore, read all message lines to reconstruct conversation history

Session File Naming:

~/.pi/sessions/--path-to-project--/
  2025-10-12T10-00-00-000Z_uuid.jsonl
  2025-10-12T11-30-00-000Z_uuid.jsonl

Tool: BashTool

export interface BashToolDetails {
  command: string;
  exitCode: number;
  duration: number;  // milliseconds
}

export const bashToolSchema = Type.Object({
  command: Type.String({ description: "Shell command to execute" }),
});

export class BashTool implements AgentTool<typeof bashToolSchema, BashToolDetails> {
  name = "bash";
  label = "Execute Shell Command";
  description = "Execute a bash command in the working directory";
  parameters = bashToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { command: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<BashToolDetails>> {
    // Spawn child process with signal support
    // Capture stdout/stderr
    // Handle 1MB output limit with truncation
    // Return structured result
  }
}

Key Features:

Abort support via signal → child process kill
1MB output limit (prevents memory exhaustion)
Exit code tracking
Working directory context

Output Format:

{
  output: "stdout:\n<content>\nstderr:\n<content>\nexit code: 0",
  details: {
    command: "npm test",
    exitCode: 0,
    duration: 1234
  }
}

Tool: ReadTool

export interface ReadToolDetails {
  filePath: string;
  totalLines: number;
  linesRead: number;
  offset: number;
  truncated: boolean;
}

export const readToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to read (relative or absolute)" }),
  offset: Type.Optional(Type.Number({
    description: "Line number to start reading from (1-indexed). Omit to read from beginning.",
    minimum: 1
  })),
  limit: Type.Optional(Type.Number({
    description: "Maximum number of lines to read. Omit to read entire file (max 5000 lines).",
    minimum: 1,
    maximum: 5000
  })),
});

export class ReadTool implements AgentTool<typeof readToolSchema, ReadToolDetails> {
  name = "read";
  label = "Read File";
  description = "Read file contents. For files >5000 lines, use offset and limit to read in chunks.";
  parameters = readToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; offset?: number; limit?: number },
    signal?: AbortSignal
  ): Promise<AgentToolResult<ReadToolDetails>> {
    // Resolve file path (relative to workingDirectory)
    // Count total lines in file
    // If no offset/limit: read up to 5000 lines, warn if truncated
    // If offset/limit: read specified range
    // Format with line numbers (using cat -n style)
    // Return content + metadata
  }
}

Key Features:

Full file read: Up to 5000 lines (warns LLM if truncated)
Ranged read: Specify offset + limit for large files
Line numbers: Output formatted like cat -n (1-indexed)
Abort support: Can cancel during large file reads
Metadata: Total line count, lines read, truncation status

Output Format (full file):

{
  output: `     1  import { foo } from './foo';
     2  import { bar } from './bar';
     3
     4  export function main() {
     5    console.log('hello');
     6  }`,
  details: {
    filePath: "src/main.ts",
    totalLines: 6,
    linesRead: 6,
    offset: 0,
    truncated: false
  }
}

Output Format (large file, truncated):

{
  output: `WARNING: File has 10000 lines, showing first 5000. Use offset and limit parameters to read more.

     1  import { foo } from './foo';
     2  import { bar } from './bar';
     ...
  5000  const x = 42;`,
  details: {
    filePath: "src/large.ts",
    totalLines: 10000,
    linesRead: 5000,
    offset: 0,
    truncated: true
  }
}

Output Format (ranged read):

{
  output: `  1000  function middleware() {
  1001    return (req, res, next) => {
  1002      console.log('middleware');
  1003      next();
  1004    };
  1005  }`,
  details: {
    filePath: "src/server.ts",
    totalLines: 10000,
    linesRead: 6,
    offset: 1000,
    truncated: false
  }
}

Error Cases:

File not found → error
Offset > total lines → error
Binary file detected → error (suggest using bash tool)

Usage Examples in System Prompt:

To read a large file:
1. read(file_path="src/large.ts") // Gets first 5000 lines + total count
2. If truncated, read remaining chunks:
   read(file_path="src/large.ts", offset=5001, limit=5000)
   read(file_path="src/large.ts", offset=10001, limit=5000)

Tool: EditTool

export interface EditToolDetails {
  filePath: string;
  oldString: string;
  newString: string;
  matchCount: number;
  linesChanged: number;
}

export const editToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to edit (relative or absolute)" }),
  old_string: Type.String({ description: "Exact string to find and replace" }),
  new_string: Type.String({ description: "String to replace with" }),
});

export class EditTool implements AgentTool<typeof editToolSchema, EditToolDetails> {
  name = "edit";
  label = "Edit File";
  description = "Find and replace exact string in a file";
  parameters = editToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; old_string: string; new_string: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<EditToolDetails>> {
    // Resolve file path (relative to workingDirectory)
    // Read file contents
    // Find old_string (must be exact match)
    // Replace with new_string
    // Write file back
    // Return stats
  }
}

Key Features:

Exact string matching (no regex)
Safe atomic writes (write temp → rename)
Abort support (cancel before write)
Match validation (error if old_string not found)
Line-based change tracking

Output Format:

{
  output: "Replaced 1 occurrence in src/server.ts (3 lines changed)",
  details: {
    filePath: "src/server.ts",
    oldString: "const port = 3000;",
    newString: "const port = process.env.PORT || 3000;",
    matchCount: 1,
    linesChanged: 3
  }
}

Error Cases:

File not found → error
old_string not found → error
Multiple matches for old_string → error (ambiguous)
File changed during operation → error (race condition)

Tool: WriteTool

export interface WriteToolDetails {
  filePath: string;
  size: number;
  isNew: boolean;
}

export const writeToolSchema = Type.Object({
  file_path: Type.String({ description: "Path to file to create/overwrite" }),
  content: Type.String({ description: "Full file contents to write" }),
});

export class WriteTool implements AgentTool<typeof writeToolSchema, WriteToolDetails> {
  name = "write";
  label = "Write File";
  description = "Create a new file or completely replace existing file contents";
  parameters = writeToolSchema;

  constructor(private workingDirectory: string);

  async execute(
    toolCallId: string,
    params: { file_path: string; content: string },
    signal?: AbortSignal
  ): Promise<AgentToolResult<WriteToolDetails>> {
    // Resolve file path
    // Check if file exists (track isNew)
    // Create parent directories if needed
    // Write content atomically
    // Return stats
  }
}

Key Features:

Creates parent directories automatically
Safe atomic writes
Abort support
No size limits (trust LLM context limits)

Output Format:

{
  output: "Created new file src/utils/helper.ts (142 bytes)",
  details: {
    filePath: "src/utils/helper.ts",
    size: 142,
    isNew: true
  }
}

CLI Interface (included in @mariozechner/coding-agent)

The coding-agent package includes both a library and a CLI interface in one package.

CLI Usage

# Interactive mode (default)
coding-agent

# Continue previous session
coding-agent --continue

# Single-shot mode
coding-agent "Fix the TypeScript errors"

# Multiple prompts
coding-agent "Add validation" "Write tests"

# Custom model
coding-agent --model openai/gpt-4 --api-key $KEY

# JSON output (for piping)
coding-agent --json < prompts.jsonl > results.jsonl

CLI Arguments

{
  "base-url": string;        // API endpoint
  "api-key": string;         // API key (or env var)
  "model": string;           // Model identifier
  "system-prompt": string;   // System prompt
  "continue": boolean;       // Resume session
  "json": boolean;           // JSONL I/O mode
  "help": boolean;           // Show help
}

Renderers

TuiRenderer - Rich terminal UI

Real-time streaming output
Syntax highlighting for code
Tool execution indicators
Progress spinners
Token usage stats
Keyboard shortcuts (Ctrl+C to abort)

ConsoleRenderer - Simple console output

Plain text output
No ANSI codes
Good for logging/CI

JsonRenderer - JSONL output

One JSON object per line
Each line is a complete event
For piping/processing

JSON Mode Example

Input (stdin):

{"type":"message","content":"List all TypeScript files"}
{"type":"interrupt"}
{"type":"message","content":"Count the files"}

Output (stdout):

{"type":"turn_start","timestamp":"..."}
{"type":"message_start","message":{...}}
{"type":"tool_execution_start","toolCallId":"...","toolName":"bash","args":"{...}"}
{"type":"tool_execution_end","toolCallId":"...","result":"..."}
{"type":"message_end","message":{...}}
{"type":"turn_end"}
{"type":"interrupted"}
{"type":"message_start","message":{...}}
...

Integration Patterns

VS Code Extension

import { CodingAgent, SessionManager } from "@mariozechner/coding-agent";
import * as vscode from "vscode";

class CodingAgentProvider {
  private agent: CodingAgent;
  private outputChannel: vscode.OutputChannel;

  constructor() {
    const workspaceRoot = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath || process.cwd();
    const session = new SessionManager({
      directory: path.join(workspaceRoot, ".vscode", "agent-sessions")
    });

    this.agent = new CodingAgent({
      config: {
        systemPrompt: "You are a coding assistant...",
        model: getModel("openai", "gpt-4"),
        apiKey: vscode.workspace.getConfiguration("codingAgent").get("apiKey")!,
      },
      sessionManager: session,
      workingDirectory: workspaceRoot,
    });

    this.outputChannel = vscode.window.createOutputChannel("Coding Agent");
  }

  async executePrompt(prompt: string) {
    const cancellation = new vscode.CancellationTokenSource();

    // Convert VS Code cancellation to AbortSignal
    const controller = new AbortController();
    cancellation.token.onCancellationRequested(() => controller.abort());

    for await (const event of this.agent.prompt(prompt, controller.signal)) {
      switch (event.type) {
        case "message_update":
          this.outputChannel.appendLine(event.message.content[0].text);
          break;
        case "tool_execution_start":
          vscode.window.showInformationMessage(`Running: ${event.toolName}`);
          break;
        case "tool_execution_end":
          if (event.isError) {
            vscode.window.showErrorMessage(`Tool failed: ${event.result}`);
          }
          break;
      }
    }
  }
}

Headless Server/API

import { CodingAgent } from "@mariozechner/coding-agent";
import express from "express";

const app = express();

app.post("/api/prompt", async (req, res) => {
  const { prompt, sessionId } = req.body;

  const agent = new CodingAgent({
    config: {
      systemPrompt: "...",
      model: getModel("openai", "gpt-4"),
      apiKey: process.env.OPENAI_API_KEY!,
    },
    workingDirectory: `/tmp/workspaces/${sessionId}`,
  });

  // Stream SSE
  res.setHeader("Content-Type", "text/event-stream");

  const controller = new AbortController();
  req.on("close", () => controller.abort());

  for await (const event of agent.prompt(prompt, controller.signal)) {
    res.write(`data: ${JSON.stringify(event)}\n\n`);
  }

  res.end();
});

app.listen(3000);

Migration Plan

Phase 1: Create General Agent Package

Create packages/agent/ structure
COPY Agent class from web-ui/src/agent/agent.ts (don't extract yet)
Copy types (AgentState, AgentEvent, Attachment, DebugLogEntry, ThinkingLevel)
Copy transports (types.ts, ProviderTransport.ts, AppTransport.ts, proxy-types.ts)
Adapt code to work as standalone package
Write unit tests for Agent class
Write tests for both transports
Publish @mariozechner/agent@0.1.0
Keep web-ui unchanged - it continues using its embedded agent

Phase 2: Create Coding Agent Package (with CLI)

Create packages/coding-agent/ structure
Port SessionManager from old agent package
Implement ReadTool, BashTool, EditTool, WriteTool
Implement CodingAgent class (wraps @mariozechner/agent)
Implement CLI in src/cli/ directory:
- CLI entry point (index.ts)
- TuiRenderer, ConsoleRenderer, JsonRenderer
- Argument parsing
- Interactive and single-shot modes
Write tests for tools and agent
Write integration tests for CLI
Publish @mariozechner/coding-agent@0.1.0 (includes library + CLI binary)

Phase 3: Prove Out New Packages

Use coding-agent (library + CLI) extensively
Fix bugs and iterate on API design
Gather feedback from real usage
Ensure stability and performance

Phase 4: Migrate Web UI (OPTIONAL, later)

Once new @mariozechner/agent is proven stable
Update web-ui package.json to depend on @mariozechner/agent
Remove src/agent/agent.ts, src/agent/types.ts, src/agent/transports/
Keep src/utils/attachment-utils.ts (document processing)
Update imports to use @mariozechner/agent
Test that web UI still works correctly
Verify document attachments (PDF, DOCX, etc.) still work

Phase 5: Cleanup

Deprecate/remove old packages/agent/ package
Update all documentation
Create migration guide
Add examples for all use cases

Phase 6: Future Enhancements

Build VS Code extension using @mariozechner/coding-agent
Add more tools (grep, find, glob, etc.) as optional plugins
Plugin system for custom tools
Parallel tool execution
Streaming tool output for long-running commands

Open Questions & Decisions

1. Should EditTool support multiple replacements?

Option A: Error on multiple matches (current proposal)

Forces explicit, unambiguous edits
LLM must be precise with context
Safer (no accidental mass replacements)

Option B: Replace all matches

More convenient for bulk changes
Risk of unintended replacements
Need replace_all: boolean flag

Decision: Start with Option A, add replace_all flag if needed.

2. ReadTool line limit and pagination strategy?

Decision: 5000 line default limit with offset/limit pagination

Rationale:

5000 lines balances context vs token usage (typical file fits in one read)
Line-based pagination is intuitive for LLM (matches how humans think about code)
cat -n format with line numbers helps LLM reference specific lines in edits
Automatic truncation warning teaches LLM to paginate when needed

Alternative considered: Byte-based limits (rejected - harder for LLM to reason about)

System prompt guidance:

When reading large files:
1. First read without offset/limit to get total line count
2. If truncated, calculate chunks: ceil(totalLines / 5000)
3. Read each chunk with appropriate offset

3. Should ReadTool handle binary files?

Decision: Error on binary files with helpful message

Error message:

Error: Cannot read binary file 'dist/app.js'. Use bash tool if you need to inspect: bash(command="file dist/app.js") or bash(command="xxd dist/app.js | head")

Rationale:

Binary files are rarely useful to LLM
Clear error message teaches LLM to use appropriate tools
Prevents token waste on unreadable content

Binary detection: Check for null bytes in first 8KB (same strategy as git diff)

4. Should EditTool support regex?

Current proposal: No regex, exact string match only

Pros of exact match:

Simple implementation
No regex escaping issues
Clear error messages
Safer (no accidental broad matches)

Cons:

Less powerful
Multiple edits needed for patterns

Decision: Exact match only. LLM can use bash/sed for complex patterns.

5. Working directory enforcement?

Question: Should tools be sandboxed to workingDirectory?

Option A: Enforce sandbox (only access files under workingDirectory)

Safer
Prevents accidental system file edits
Clear boundaries

Option B: Allow any path

More flexible
LLM can edit config files, etc.
User's responsibility to review

Decision: Start with Option B (no sandbox). Add --sandbox flag later if needed.

6. Tool output size limits?

Current proposal:

ReadTool: 5000 line limit per read (paginate for more)
BashTool: 1MB truncation
EditTool: No limit (reasonable file sizes expected)
WriteTool: No limit (LLM context limited)

Alternative: Enforce global 1MB limit on all tool outputs

Decision: Per-tool limits. ReadTool and BashTool need it most.

7. How to handle long-running bash commands?

Question: Should BashTool stream output or wait for completion?

Option A: Wait for completion (current proposal)

Simpler implementation
Full output available for LLM
Blocks until done

Option B: Stream output

Better UX (show progress)
More complex (need to handle partial output)
LLM sees final output only

Decision: Wait for completion initially. Add streaming later if needed.

8. Package naming alternatives?

Current proposal:

@mariozechner/coding-agent (core)
@mariozechner/coding-agent-tui (TUI)

Alternatives:

@mariozechner/file-agent / @mariozechner/file-agent-tui
@mariozechner/dev-agent / @mariozechner/dev-agent-tui
@mariozechner/pi-code / @mariozechner/pi-code-tui

Decision: coding-agent is clear and specific to the use case.

Summary

This architecture provides:

General Agent Package (`@mariozechner/agent`)

✅ Transport abstraction - Pluggable backends (ProviderTransport, AppTransport) ✅ Reactive state - Subscribe/emit pattern for UI binding ✅ Message transformation - Flexible pipeline for message filtering/adaptation ✅ Message queueing - Out-of-band message injection during agent loop ✅ Attachment support - Type-safe attachment handling (processing is external) ✅ Abort support - First-class cancellation with AbortController ✅ Provider agnostic - Works with any LLM provider via @mariozechner/pi-ai ✅ Type-safe - Full TypeScript with proper types

Coding Agent Package (`@mariozechner/coding-agent`)

✅ Builds on general agent - Leverages transport abstraction and state management ✅ Session persistence - JSONL-based session storage and resume ✅ Focused tools - read, bash, edit, write (4 tools, no more) ✅ Smart pagination - 5000-line chunks with offset/limit for ReadTool ✅ Working directory context - All tools operate relative to project root ✅ Simple API - Hides complexity, easy to use ✅ Testable - Pure functions, mockable dependencies

Key Architectural Insights

Extract, don't rewrite - The web-ui agent is well-designed; extract it into a general package
Separation of concerns - Document processing (PDF/DOCX/etc.) stays in web-ui, only type definitions move to general agent
Layered architecture - pi-ai → agent → coding-agent → coding-agent-tui
Reusable across UIs - Web UI and coding agent both use the same general agent package
Pluggable transports - Easy to add new backends (local API, proxy server, etc.)
Attachment flexibility - Type is defined centrally, processing is done by consumers

42 KiB Raw Blame History

Agent Architecture

Executive Summary

Current Architecture Analysis

Package Overview

packages/ai - Core Streaming Library

packages/web-ui/src/agent - Web Agent

packages/agent - OLD Implementation

Proposed Architecture

Package Structure

Dependency Graph

Package: @mariozechner/agent

Core Types

AppMessage Abstraction

Agent Class

Transport Interface

ProviderTransport

AppTransport

Package: @mariozechner/coding-agent

CodingAgent Class

Usage Example (TUI)

Usage Example (Web UI)

Session Manager

Tool: BashTool

Tool: ReadTool

Tool: EditTool

Tool: WriteTool

CLI Interface (included in @mariozechner/coding-agent)

CLI Usage

CLI Arguments

Renderers

JSON Mode Example

Integration Patterns

VS Code Extension

Headless Server/API

Migration Plan

Phase 1: Create General Agent Package

Phase 2: Create Coding Agent Package (with CLI)

Phase 3: Prove Out New Packages

Phase 4: Migrate Web UI (OPTIONAL, later)

Phase 5: Cleanup

Phase 6: Future Enhancements

Open Questions & Decisions

1. Should EditTool support multiple replacements?

2. ReadTool line limit and pagination strategy?

3. Should ReadTool handle binary files?

4. Should EditTool support regex?

5. Working directory enforcement?

6. Tool output size limits?

7. How to handle long-running bash commands?

8. Package naming alternatives?

Summary

General Agent Package (@mariozechner/agent)

Coding Agent Package (@mariozechner/coding-agent)

Key Architectural Insights

42 KiB

Raw Blame History

General Agent Package (`@mariozechner/agent`)

Coding Agent Package (`@mariozechner/coding-agent`)