mirror of
https://github.com/getcompanion-ai/co-mono.git
synced 2026-04-15 05:02:07 +00:00
Initial monorepo setup with npm workspaces and dual TypeScript configuration
- Set up npm workspaces for three packages: pi-tui, pi-agent, and pi (pods)
- Implemented dual TypeScript configuration:
  - Root tsconfig.json with path mappings for development and type checking
  - Package-specific tsconfig.build.json for clean production builds
- Configured lockstep versioning with sync script for inter-package dependencies
- Added comprehensive documentation for development and publishing workflows
- All packages at version 0.5.0 ready for npm publishing
commit a74c5da112
63 changed files with 14558 additions and 0 deletions
24  .gitignore  vendored  Normal file

@@ -0,0 +1,24 @@
node_modules/
dist/
*.log
.DS_Store
*.tsbuildinfo
packages/*/node_modules/
packages/*/dist/

# Environment
.env

# Editor files
.vscode/
.idea/
*.swp
*.swo
*~

# Package specific
.npm/
coverage/
.nyc_output/
.pi_config/
tui-debug.log
2  .npmrc  Normal file

@@ -0,0 +1,2 @@
hoist=true
shamefully-hoist=true
21  LICENSE  Normal file

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Mario Zechner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
87  PUBLISHING.md  Normal file

@@ -0,0 +1,87 @@
# Publishing Guide

## Publishing Workflow

### 1. Pre-publish Checks

```bash
# Clean everything and rebuild from scratch
npm run clean
npm run build

# Run all checks
npm run check

# Test that the packages work correctly
(cd packages/agent && npx tsx src/cli.ts --help)
(cd packages/pods && npx tsx src/cli.ts --help)
```

### 2. Version Bump

All packages use lockstep versioning (the same version number):

```bash
# Patch version bump (0.5.0 -> 0.5.1)
npm run version:patch

# Minor version bump (0.5.0 -> 0.6.0)
npm run version:minor

# Major version bump (0.5.0 -> 1.0.0)
npm run version:major
```

This automatically:
- Updates all package versions
- Syncs inter-package dependencies
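The sync script itself (`scripts/sync-versions.js`) is not included in this diff; a minimal sketch of what the sync step might look like, with the function and demo data purely hypothetical, is:

```javascript
// Hypothetical sketch of the version-sync step (the real scripts/sync-versions.js
// is not shown in this commit). After `npm version … -ws` bumps every workspace
// package, rewrite the ranges that packages use for *each other* to match.

function syncDeps(pkg, internalVersions) {
  for (const field of ["dependencies", "devDependencies"]) {
    const deps = pkg[field];
    if (!deps) continue;
    for (const name of Object.keys(deps)) {
      // Only touch dependencies that are themselves workspace packages.
      if (name in internalVersions) deps[name] = `^${internalVersions[name]}`;
    }
  }
  return pkg;
}

// The real script would read packages/*/package.json with fs and write the
// results back; here the idea is demonstrated on in-memory objects.
const agentPkg = {
  name: "@mariozechner/pi-agent",
  version: "0.5.1",
  dependencies: { "@mariozechner/pi-tui": "^0.5.0", chalk: "^5.5.0" },
};
const versions = { "@mariozechner/pi-tui": "0.5.1", "@mariozechner/pi-agent": "0.5.1" };
console.log(syncDeps(agentPkg, versions).dependencies); // internal dep now ^0.5.1, chalk untouched
```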
### 3. Commit & Tag

```bash
# Commit the version bump
git add -A
git commit -m "Release v0.5.1"

# Tag the release
git tag -a v0.5.1 -m "Release v0.5.1"

# Push to GitHub
git push origin main --tags
```

### 4. Publish to npm

```bash
# Dry run first (see what would be published)
npm run publish:dry

# If everything looks good, publish for real
npm run publish:all
```

This will:
1. Clean all dist folders
2. Build all packages in dependency order
3. Run all checks
4. Publish all packages to npm with public access

### 5. Verify Publication

```bash
# Check the npm registry
npm view @mariozechner/pi-tui
npm view @mariozechner/pi-agent
npm view @mariozechner/pi

# Test installation
npx @mariozechner/pi --help
npx @mariozechner/pi-agent --help
```

## Notes

- All packages are published with the `--access public` flag
- The `prepublishOnly` script in each package ensures clean builds
- Dependencies between packages use `^` version ranges for flexibility
- The monorepo itself (`pi-monorepo`) is private and not published
97  README.md  Normal file

@@ -0,0 +1,97 @@
# Pi Monorepo

A collection of tools for managing LLM deployments and building AI agents.

## Packages

- **[@mariozechner/pi-tui](packages/tui)** - Terminal UI library with differential rendering
- **[@mariozechner/pi-agent](packages/agent)** - General-purpose agent with tool calling and session persistence
- **[@mariozechner/pi](packages/pods)** - CLI for managing vLLM deployments on GPU pods

## Development

This is a monorepo that uses npm workspaces for package management and a dual TypeScript configuration for development and building.

### Setup

```bash
# Install all dependencies
npm install

# Build all packages (required for production use)
npm run build

# Or run directly with tsx during development (no build needed)
(cd packages/pods && npx tsx src/cli.ts)
(cd packages/agent && npx tsx src/cli.ts)
```

### Common Commands

```bash
# Clean all build artifacts and tsconfig.tsbuildinfo files
npm run clean

# Build all packages in dependency order
npm run build

# Run biome checks and TypeScript type checking (no build required)
npm run check

# Run tests (if present)
npm run test
```

### Package Dependencies

The packages have the following dependency structure:

`pi-tui` -> `pi-agent` -> `pi`

### TypeScript Configuration

The monorepo uses a dual TypeScript configuration approach:
- **Root `tsconfig.json`**: Contains path mappings for all packages; used for type checking and development with `tsx`
- **Package `tsconfig.build.json`**: Clean build configuration with `rootDir` and `outDir`; used for production builds

This setup allows:
- Type checking without building (`npm run check` works immediately)
- Running source files directly with `tsx` during development
- Clean, organized build outputs for publishing
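The two configurations described above might look roughly like this (a sketch only; the actual compiler options are not shown in this diff, and the `paths` entries are assumed):

```jsonc
// tsconfig.json (root, for type checking and tsx)
{
  "compilerOptions": {
    "strict": true,
    "noEmit": true,
    "paths": {
      "@mariozechner/pi-tui": ["packages/tui/src/index.ts"],
      "@mariozechner/pi-agent": ["packages/agent/src/index.ts"]
    }
  }
}
```

```jsonc
// packages/agent/tsconfig.build.json (per package, for clean production builds)
{
  "compilerOptions": {
    "declaration": true,
    "rootDir": "src",
    "outDir": "dist"
  },
  "include": ["src"]
}
```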
### Versioning

All packages use **lockstep versioning** - they share the same version number:

```bash
# Bump patch version (0.5.0 -> 0.5.1)
npm run version:patch

# Bump minor version (0.5.0 -> 0.6.0)
npm run version:minor

# Bump major version (0.5.0 -> 1.0.0)
npm run version:major
```

These commands automatically:
1. Update all package versions
2. Sync inter-package dependency versions
3. Update package-lock.json

### Publishing

See [PUBLISHING.md](PUBLISHING.md) for the complete publishing workflow.

Quick version:

```bash
# Dry run to see what would be published
npm run publish:dry

# Publish all packages to npm
npm run publish:all
```

## License

MIT
33  biome.json  Normal file

@@ -0,0 +1,33 @@
{
  "$schema": "https://biomejs.dev/schemas/2.1.4/schema.json",
  "linter": {
    "enabled": true,
    "rules": {
      "recommended": true,
      "style": {
        "noNonNullAssertion": "off",
        "useConst": "error",
        "useNodejsImportProtocol": "off",
        "useTemplate": "off"
      },
      "correctness": {
        "noUnusedVariables": "off"
      },
      "suspicious": {
        "noExplicitAny": "off",
        "noControlCharactersInRegex": "off",
        "noEmptyInterface": "off"
      }
    }
  },
  "formatter": {
    "enabled": true,
    "formatWithErrors": false,
    "indentStyle": "tab",
    "indentWidth": 3,
    "lineWidth": 120
  },
  "files": {
    "includes": ["packages/*/src/**/*", "*.json", "*.md"]
  }
}
0  models.js  Normal file
1286  package-lock.json  generated  Normal file
File diff suppressed because it is too large
31  package.json  Normal file

@@ -0,0 +1,31 @@
{
  "name": "pi-monorepo",
  "private": true,
  "type": "module",
  "workspaces": [
    "packages/*"
  ],
  "scripts": {
    "clean": "npm run clean --workspaces",
    "build": "npm run build -w @mariozechner/pi-tui && npm run build -w @mariozechner/pi-agent && npm run build -w @mariozechner/pi",
    "check": "biome check --write . && npm run check --workspaces && tsc --noEmit",
    "test": "npm run test --workspaces --if-present",
    "version:patch": "npm version patch -ws --no-git-tag-version && node scripts/sync-versions.js",
    "version:minor": "npm version minor -ws --no-git-tag-version && node scripts/sync-versions.js",
    "version:major": "npm version major -ws --no-git-tag-version && node scripts/sync-versions.js",
    "version:set": "npm version -ws",
    "version:sync": "node scripts/sync-versions.js",
    "prepublish:all": "npm run clean && npm run build && npm run check",
    "publish:all": "npm run prepublish:all && npm publish -ws --access public",
    "publish:dry": "npm run prepublish:all && npm publish -ws --access public --dry-run"
  },
  "devDependencies": {
    "@biomejs/biome": "^2.1.4",
    "@types/node": "^22.10.5",
    "tsx": "^4.20.3",
    "typescript": "^5.9.2"
  },
  "engines": {
    "node": ">=20.0.0"
  }
}
324  packages/agent/README.md  Normal file

@@ -0,0 +1,324 @@
# pi-agent

A general-purpose agent with tool calling and session persistence, modeled after Claude Code but extremely hackable and minimal. It comes with a built-in TUI (also modeled after Claude Code) for interactive use.

Everything is designed to be easy:
- Writing custom UIs on top of it (via JSON mode in any language, or the TypeScript API)
- Using it for inference steps in deterministic programs (via JSON mode in any language, or the TypeScript API)
- Providing your own system prompts and tools
- Working with various LLM providers or self-hosted LLMs

## Installation

```bash
npm install -g @mariozechner/pi-agent
```

This installs the `pi-agent` command globally.

## Quick Start

By default, pi-agent uses OpenAI's API with the model `gpt-5-mini` and authenticates using the `OPENAI_API_KEY` environment variable. Any OpenAI-compatible endpoint works, including Ollama, vLLM, OpenRouter, Groq, Anthropic, etc.

```bash
# Single message
pi-agent "What is 2+2?"

# Multiple messages processed sequentially
pi-agent "What is 2+2?" "What about 3+3?"

# Interactive chat mode (no messages = interactive)
pi-agent

# Continue the most recently modified session in the current directory
pi-agent --continue "Follow-up question"

# GPT-OSS via Groq
pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY --model openai/gpt-oss-120b

# GLM 4.5 via OpenRouter
pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY --model z-ai/glm-4.5

# Claude via Anthropic (no prompt caching support - see https://docs.anthropic.com/en/api/openai-sdk)
pi-agent --base-url https://api.anthropic.com/v1 --api-key $ANTHROPIC_API_KEY --model claude-opus-4-1-20250805
```
## Usage Modes

### Single-Shot Mode
Process one or more messages and exit:
```bash
pi-agent "First question" "Second question"
```

### Interactive Mode
Start an interactive chat session:
```bash
pi-agent
```
- Type messages and press Enter to send
- Type `exit` or `quit` to end the session
- Press Escape to interrupt while processing
- Press CTRL+C to clear the text editor
- Press CTRL+C twice quickly to exit

### JSON Mode
JSON mode enables programmatic integration by outputting events as JSONL (JSON Lines).

**Single-shot mode:** Outputs a stream of JSON events for each message, then exits.
```bash
pi-agent --json "What is 2+2?" "And the meaning of life?"
# Outputs:
# {"type":"session_start","sessionId":"bb6f0acb-80cf-4729-9593-bcf804431a53","model":"gpt-5-mini","api":"completions","baseURL":"https://api.openai.com/v1","systemPrompt":"You are a helpful assistant."}
# {"type":"user_message","text":"What is 2+2?"}
# {"type":"assistant_start"}
# {"type":"token_usage","inputTokens":314,"outputTokens":16,"totalTokens":330,"cacheReadTokens":0,"cacheWriteTokens":0}
# {"type":"assistant_message","text":"2 + 2 = 4"}
# {"type":"user_message","text":"And the meaning of life?"}
# {"type":"assistant_start"}
# {"type":"token_usage","inputTokens":337,"outputTokens":331,"totalTokens":668,"cacheReadTokens":0,"cacheWriteTokens":0}
# {"type":"assistant_message","text":"Short answer (pop-culture): 42.\n\nMore useful answers:\n- Philosophical...
```

**Interactive mode:** Accepts JSON commands via stdin and outputs JSON events to stdout.
```bash
# Start interactive JSON mode
pi-agent --json
# Now send commands via stdin

# Pipe one or more initial messages in
(echo '{"type": "message", "content": "What is 2+2?"}'; cat) | pi-agent --json
# Outputs:
# {"type":"session_start","sessionId":"bb64cfbe-dd52-4662-bd4a-0d921c332fd1","model":"gpt-5-mini","api":"completions","baseURL":"https://api.openai.com/v1","systemPrompt":"You are a helpful assistant."}
# {"type":"user_message","text":"What is 2+2?"}
# {"type":"assistant_start"}
# {"type":"token_usage","inputTokens":314,"outputTokens":16,"totalTokens":330,"cacheReadTokens":0,"cacheWriteTokens":0}
# {"type":"assistant_message","text":"2 + 2 = 4"}
```

Commands you can send via stdin in interactive JSON mode:
```json
{"type": "message", "content": "Your message here"}  // Send a message to the agent
{"type": "interrupt"}                                // Interrupt current processing
```
## Configuration

### Command Line Options
```
--base-url <url>         API base URL (default: https://api.openai.com/v1)
--api-key <key>          API key (or set OPENAI_API_KEY env var)
--model <model>          Model name (default: gpt-5-mini)
--api <type>             API type: "completions" or "responses" (default: completions)
--system-prompt <text>   System prompt (default: "You are a helpful assistant.")
--continue               Continue previous session
--json                   JSON mode
--help, -h               Show help message
```

### Environment Variables
- `OPENAI_API_KEY` - OpenAI API key (used if `--api-key` is not provided)

## Session Persistence

Sessions are automatically saved to `~/.pi/sessions/` and include:
- Complete conversation history
- Tool call results
- Token usage statistics

Use `--continue` to resume the last session:
```bash
pi-agent "Start a story about a robot"
# ... later ...
pi-agent --continue "Continue the story"
```

## Tools

The agent includes built-in tools for file system operations:
- **read_file** - Read file contents
- **list_directory** - List directory contents
- **bash** - Execute shell commands
- **glob** - Find files by pattern
- **ripgrep** - Search file contents

These tools are automatically available when using the agent through the `pi` command for code navigation tasks.

## JSON Mode Events

When using `--json`, the agent outputs these event types:
- `session_start` - New session started, with metadata
- `user_message` - User input
- `assistant_start` - Assistant begins responding
- `assistant_message` - Assistant's response
- `thinking` - Reasoning/thinking (for models that support it)
- `tool_call` - Tool being called
- `tool_result` - Result from a tool
- `token_usage` - Token usage statistics
- `error` - An error occurred
- `interrupted` - Processing was interrupted

The complete TypeScript type definition for `AgentEvent` can be found in [`src/agent.ts`](src/agent.ts#L6).
## Build an Interactive UI with JSON Mode
Build custom UIs in any language by spawning pi-agent in JSON mode and communicating via stdin/stdout.

```javascript
import { spawn } from 'child_process';
import { createInterface } from 'readline';

// Start the agent in JSON mode
const agent = spawn('pi-agent', ['--json']);

// Create readline interface for parsing JSONL output from the agent
const agentOutput = createInterface({ input: agent.stdout, crlfDelay: Infinity });

// Create readline interface for user input
const userInput = createInterface({ input: process.stdin, output: process.stdout });

// State tracking
let isProcessing = false, lastUsage, isExiting = false;

// Handle each line of JSON output from the agent
agentOutput.on('line', (line) => {
  try {
    const event = JSON.parse(line);

    // Handle all event types
    switch (event.type) {
      case 'session_start':
        console.log(`Session started (${event.model}, ${event.api}, ${event.baseURL})`);
        console.log('Press CTRL + C to exit');
        promptUser();
        break;

      case 'user_message':
        // Already shown in prompt, skip
        break;

      case 'assistant_start':
        isProcessing = true;
        console.log('\n[assistant]');
        break;

      case 'thinking':
        console.log(`[thinking]\n${event.text}\n`);
        break;

      case 'tool_call':
        console.log(`[tool] ${event.name}(${event.args.substring(0, 50)})\n`);
        break;

      case 'tool_result': {
        const lines = event.result.split('\n');
        const truncated = lines.length - 5 > 0 ? `\n. ... (${lines.length - 5} more lines truncated)` : '';
        console.log(`[tool result]\n${lines.slice(0, 5).join('\n')}${truncated}\n`);
        break;
      }

      case 'assistant_message':
        console.log(event.text.trim());
        isProcessing = false;
        promptUser();
        break;

      case 'token_usage':
        lastUsage = event;
        break;

      case 'error':
        console.error('\n❌ Error:', event.message);
        isProcessing = false;
        promptUser();
        break;

      case 'interrupted':
        console.log('\n⚠️ Interrupted by user');
        isProcessing = false;
        promptUser();
        break;
    }
  } catch (e) {
    console.error('Failed to parse JSON:', line, e);
  }
});

// Send a message to the agent
function sendMessage(content) {
  agent.stdin.write(`${JSON.stringify({ type: 'message', content: content })}\n`);
}

// Send interrupt signal
function interrupt() {
  agent.stdin.write(`${JSON.stringify({ type: 'interrupt' })}\n`);
}

// Prompt for user input
function promptUser() {
  if (isExiting) return;

  if (lastUsage) {
    console.log(`\nin: ${lastUsage.inputTokens}, out: ${lastUsage.outputTokens}, cache read: ${lastUsage.cacheReadTokens}, cache write: ${lastUsage.cacheWriteTokens}`);
  }

  userInput.question('\n[user]\n> ', (answer) => {
    answer = answer.trim();
    if (answer) {
      sendMessage(answer);
    } else {
      promptUser();
    }
  });
}

// Handle Ctrl+C
process.on('SIGINT', () => {
  if (isProcessing) {
    interrupt();
  } else {
    agent.kill();
    process.exit(0);
  }
});

// Handle agent exit
agent.on('close', (code) => {
  isExiting = true;
  userInput.close();
  console.log(`\nAgent exited with code ${code}`);
  process.exit(code);
});

// Handle errors
agent.on('error', (err) => {
  console.error('Failed to start agent:', err);
  process.exit(1);
});

// Start the conversation
console.log('Pi Agent Interactive Chat');
```
## Architecture

The agent is built with:
- **agent.ts** - Core Agent class and API functions
- **cli.ts** - CLI entry point, argument parsing, and JSON mode handler
- **args.ts** - Custom typed argument parser
- **session-manager.ts** - Session persistence
- **tools/** - Tool implementations
- **renderers/** - Output formatters (console, TUI, JSON)

## Development

```bash
# Run from source
npx tsx src/cli.ts "Hello"

# Build
npm run build

# Run the built version
./dist/cli.js "Hello"
```

## Use as a Library

```typescript
import { Agent, ConsoleRenderer } from '@mariozechner/pi-agent';

const agent = new Agent({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.openai.com/v1',
  model: 'gpt-5-mini',
  api: 'completions',
  systemPrompt: 'You are a helpful assistant.'
}, new ConsoleRenderer());

await agent.ask('What is 2+2?');
```
1343  packages/agent/package-lock.json  generated  Normal file
File diff suppressed because it is too large
47  packages/agent/package.json  Normal file

@@ -0,0 +1,47 @@
{
  "name": "@mariozechner/pi-agent",
  "version": "0.5.0",
  "description": "General-purpose agent with tool calling and session persistence",
  "type": "module",
  "bin": {
    "pi-agent": "./dist/cli.js"
  },
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "files": [
    "dist"
  ],
  "scripts": {
    "clean": "rm -rf dist tsconfig.tsbuildinfo",
    "build": "tsc -p tsconfig.build.json && chmod +x dist/cli.js",
    "check": "biome check --write .",
    "prepublishOnly": "npm run clean && npm run build"
  },
  "dependencies": {
    "@mariozechner/pi-tui": "^0.5.0",
    "@types/glob": "^8.1.0",
    "chalk": "^5.5.0",
    "glob": "^11.0.3",
    "openai": "^5.12.2"
  },
  "devDependencies": {},
  "keywords": [
    "agent",
    "ai",
    "llm",
    "openai",
    "claude",
    "cli",
    "tui"
  ],
  "author": "Mario Zechner",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/badlogic/pi-mono.git",
    "directory": "packages/agent"
  },
  "engines": {
    "node": ">=20.0.0"
  }
}
484
packages/agent/src/agent.ts
Normal file
484
packages/agent/src/agent.ts
Normal file
|
|
@ -0,0 +1,484 @@
|
||||||
|
import OpenAI from "openai";
|
||||||
|
import type { ResponseFunctionToolCallOutputItem } from "openai/resources/responses/responses.mjs";
|
||||||
|
import type { SessionManager } from "./session-manager.js";
|
||||||
|
import { executeTool, toolsForChat, toolsForResponses } from "./tools/tools.js";
|
||||||
|
|
||||||
|
export type AgentEvent =
|
||||||
|
| { type: "session_start"; sessionId: string; model: string; api: string; baseURL: string; systemPrompt: string }
|
||||||
|
| { type: "assistant_start" }
|
||||||
|
| { type: "thinking"; text: string }
|
||||||
|
| { type: "tool_call"; toolCallId: string; name: string; args: string }
|
||||||
|
| { type: "tool_result"; toolCallId: string; result: string; isError: boolean }
|
||||||
|
| { type: "assistant_message"; text: string }
|
||||||
|
| { type: "error"; message: string }
|
||||||
|
| { type: "user_message"; text: string }
|
||||||
|
| { type: "interrupted" }
|
||||||
|
| {
|
||||||
|
type: "token_usage";
|
||||||
|
inputTokens: number;
|
||||||
|
outputTokens: number;
|
||||||
|
totalTokens: number;
|
||||||
|
cacheReadTokens: number;
|
||||||
|
cacheWriteTokens: number;
|
||||||
|
};
|
||||||
|
|
||||||
|
export interface AgentEventReceiver {
|
||||||
|
on(event: AgentEvent): Promise<void>;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface AgentConfig {
|
||||||
|
apiKey: string;
|
||||||
|
baseURL: string;
|
||||||
|
model: string;
|
||||||
|
api: "completions" | "responses";
|
||||||
|
systemPrompt: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ToolCall {
|
||||||
|
name: string;
|
||||||
|
arguments: string;
|
||||||
|
id: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function callModelResponsesApi(
|
||||||
|
client: OpenAI,
|
||||||
|
model: string,
|
||||||
|
messages: any[],
|
||||||
|
signal?: AbortSignal,
|
||||||
|
eventReceiver?: AgentEventReceiver,
|
||||||
|
): Promise<void> {
|
||||||
|
await eventReceiver?.on({ type: "assistant_start" });
|
||||||
|
|
||||||
|
let conversationDone = false;
|
||||||
|
|
||||||
|
while (!conversationDone) {
|
||||||
|
// Check if we've been interrupted
|
||||||
|
if (signal?.aborted) {
|
||||||
|
await eventReceiver?.on({ type: "interrupted" });
|
||||||
|
throw new Error("Interrupted");
|
||||||
|
}
|
||||||
|
|
||||||
|
const response = await client.responses.create(
|
||||||
|
{
|
||||||
|
model,
|
||||||
|
input: messages,
|
||||||
|
tools: toolsForResponses as any,
|
||||||
|
tool_choice: "auto",
|
||||||
|
parallel_tool_calls: true,
|
||||||
|
reasoning: {
|
||||||
|
effort: "medium", // Use auto reasoning effort
|
||||||
|
summary: "auto",
|
||||||
|
},
|
||||||
|
max_output_tokens: 2000, // TODO make configurable
|
||||||
|
},
|
||||||
|
{ signal },
|
||||||
|
);

        // Report token usage if available (responses API format)
        if (response.usage) {
            const usage = response.usage;
            await eventReceiver?.on({
                type: "token_usage",
                inputTokens: usage.input_tokens || 0,
                outputTokens: usage.output_tokens || 0,
                totalTokens: usage.total_tokens || 0,
                cacheReadTokens: usage.input_tokens_details?.cached_tokens || 0,
                cacheWriteTokens: 0, // Not available in API
            });
        }

        const output = response.output;
        if (!output) break;

        for (const item of output) {
            // gpt-oss vLLM quirk: need to remove type from "message" events
            if (item.type === "message") {
                const { type, ...message } = item;
                messages.push(message);
            } else {
                messages.push(item);
            }

            switch (item.type) {
                case "reasoning": {
                    for (const content of item.content || []) {
                        if (content.type === "reasoning_text") {
                            await eventReceiver?.on({ type: "thinking", text: content.text });
                        }
                    }
                    break;
                }

                case "message": {
                    for (const content of item.content || []) {
                        if (content.type === "output_text") {
                            await eventReceiver?.on({ type: "assistant_message", text: content.text });
                        } else if (content.type === "refusal") {
                            await eventReceiver?.on({ type: "error", message: `Refusal: ${content.refusal}` });
                        }
                        conversationDone = true;
                    }
                    break;
                }

                case "function_call": {
                    if (signal?.aborted) {
                        await eventReceiver?.on({ type: "interrupted" });
                        throw new Error("Interrupted");
                    }

                    try {
                        await eventReceiver?.on({
                            type: "tool_call",
                            toolCallId: item.call_id || "",
                            name: item.name,
                            args: item.arguments,
                        });
                        const result = await executeTool(item.name, item.arguments, signal);
                        await eventReceiver?.on({
                            type: "tool_result",
                            toolCallId: item.call_id || "",
                            result,
                            isError: false,
                        });

                        // Add tool result to messages
                        const toolResultMsg = {
                            type: "function_call_output",
                            call_id: item.call_id,
                            output: result,
                        } as ResponseFunctionToolCallOutputItem;
                        messages.push(toolResultMsg);
                    } catch (e: any) {
                        await eventReceiver?.on({
                            type: "tool_result",
                            toolCallId: item.call_id || "",
                            result: e.message,
                            isError: true,
                        });
                        const errorMsg = {
                            type: "function_call_output",
                            call_id: item.call_id,
                            output: e.message,
                            isError: true,
                        };
                        messages.push(errorMsg);
                    }
                    break;
                }

                default: {
                    await eventReceiver?.on({ type: "error", message: `Unknown output type in LLM response: ${item.type}` });
                    break;
                }
            }
        }
    }
}

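The gpt-oss/vLLM quirk handled above is small but easy to get backwards: before a `"message"` output item is echoed back into the request history, its `type` field must be dropped, while every other item kind is pushed through untouched. A standalone sketch of just that transformation (the function name is illustrative, not part of the package):

```typescript
// Drop the "type" field from "message" output items before they are
// re-sent as conversation history; leave all other items unchanged.
function stripTypeFromMessageItem(item: Record<string, any>): Record<string, any> {
    if (item.type !== "message") return item;
    const { type, ...rest } = item;
    return rest;
}
```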
export async function callModelChatCompletionsApi(
    client: OpenAI,
    model: string,
    messages: any[],
    signal?: AbortSignal,
    eventReceiver?: AgentEventReceiver,
): Promise<void> {
    await eventReceiver?.on({ type: "assistant_start" });

    let assistantResponded = false;

    while (!assistantResponded) {
        if (signal?.aborted) {
            await eventReceiver?.on({ type: "interrupted" });
            throw new Error("Interrupted");
        }

        const response = await client.chat.completions.create(
            {
                model,
                messages,
                tools: toolsForChat,
                tool_choice: "auto",
                max_completion_tokens: 2000, // TODO make configurable
            },
            { signal },
        );

        const message = response.choices[0].message;

        // Report token usage if available
        if (response.usage) {
            const usage = response.usage;
            await eventReceiver?.on({
                type: "token_usage",
                inputTokens: usage.prompt_tokens || 0,
                outputTokens: usage.completion_tokens || 0,
                totalTokens: usage.total_tokens || 0,
                cacheReadTokens: usage.prompt_tokens_details?.cached_tokens || 0,
                cacheWriteTokens: 0, // Not available in API
            });
        }

        if (message.tool_calls && message.tool_calls.length > 0) {
            // Add assistant message with tool calls to history
            const assistantMsg: any = {
                role: "assistant",
                content: message.content || null,
                tool_calls: message.tool_calls,
            };
            messages.push(assistantMsg);

            // Display and execute each tool call
            for (const toolCall of message.tool_calls) {
                // Check if interrupted before executing tool
                if (signal?.aborted) {
                    await eventReceiver?.on({ type: "interrupted" });
                    throw new Error("Interrupted");
                }

                try {
                    const funcName = toolCall.type === "function" ? toolCall.function.name : toolCall.custom.name;
                    const funcArgs = toolCall.type === "function" ? toolCall.function.arguments : toolCall.custom.input;

                    await eventReceiver?.on({ type: "tool_call", toolCallId: toolCall.id, name: funcName, args: funcArgs });
                    const result = await executeTool(funcName, funcArgs, signal);
                    await eventReceiver?.on({ type: "tool_result", toolCallId: toolCall.id, result, isError: false });

                    // Add tool result to messages
                    const toolMsg = {
                        role: "tool",
                        tool_call_id: toolCall.id,
                        content: result,
                    };
                    messages.push(toolMsg);
                } catch (e: any) {
                    await eventReceiver?.on({ type: "tool_result", toolCallId: toolCall.id, result: e.message, isError: true });
                    const errorMsg = {
                        role: "tool",
                        tool_call_id: toolCall.id,
                        content: e.message,
                    };
                    messages.push(errorMsg);
                }
            }
        } else if (message.content) {
            // Final assistant response
            await eventReceiver?.on({ type: "assistant_message", text: message.content });
            const finalMsg = { role: "assistant", content: message.content };
            messages.push(finalMsg);
            assistantResponded = true;
        }
    }
}

export class Agent {
    private client: OpenAI;
    public readonly config: AgentConfig;
    private messages: any[] = [];
    private renderer?: AgentEventReceiver;
    private sessionManager?: SessionManager;
    private comboReceiver: AgentEventReceiver;
    private abortController: AbortController | null = null;

    constructor(config: AgentConfig, renderer?: AgentEventReceiver, sessionManager?: SessionManager) {
        this.config = config;
        this.client = new OpenAI({
            apiKey: config.apiKey,
            baseURL: config.baseURL,
        });

        // Use provided renderer or default to console
        this.renderer = renderer;
        this.sessionManager = sessionManager;

        this.comboReceiver = {
            on: async (event: AgentEvent): Promise<void> => {
                await this.renderer?.on(event);
                await this.sessionManager?.on(event);
            },
        };

        // Initialize with system prompt if provided
        if (config.systemPrompt) {
            this.messages.push({ role: "system", content: config.systemPrompt });
        }

        // Start session logging if we have a session manager
        if (sessionManager) {
            sessionManager.startSession(this.config);

            // Emit session_start event
            this.comboReceiver.on({
                type: "session_start",
                sessionId: sessionManager.getSessionId(),
                model: config.model,
                api: config.api,
                baseURL: config.baseURL,
                systemPrompt: config.systemPrompt,
            });
        }
    }

    async ask(userMessage: string): Promise<void> {
        // Render user message through the event system
        await this.comboReceiver.on({ type: "user_message", text: userMessage });

        // Add user message
        const userMsg = { role: "user", content: userMessage };
        this.messages.push(userMsg);

        // Create a new AbortController for this chat session
        this.abortController = new AbortController();

        try {
            if (this.config.api === "responses") {
                await callModelResponsesApi(
                    this.client,
                    this.config.model,
                    this.messages,
                    this.abortController.signal,
                    this.comboReceiver,
                );
            } else {
                await callModelChatCompletionsApi(
                    this.client,
                    this.config.model,
                    this.messages,
                    this.abortController.signal,
                    this.comboReceiver,
                );
            }
        } catch (e: any) {
            // Check if this was an interruption
            if (e.message === "Interrupted" || this.abortController.signal.aborted) {
                return;
            }
            throw e;
        } finally {
            this.abortController = null;
        }
    }

    interrupt(): void {
        this.abortController?.abort();
    }

    setEvents(events: AgentEvent[]): void {
        // Reconstruct messages from events based on API type
        this.messages = [];

        if (this.config.api === "responses") {
            // Responses API format
            if (this.config.systemPrompt) {
                this.messages.push({
                    type: "system",
                    content: [{ type: "system_text", text: this.config.systemPrompt }],
                });
            }

            for (const event of events) {
                switch (event.type) {
                    case "user_message":
                        this.messages.push({
                            type: "user",
                            content: [{ type: "input_text", text: event.text }],
                        });
                        break;

                    case "thinking":
                        // Add reasoning message
                        this.messages.push({
                            type: "reasoning",
                            content: [{ type: "reasoning_text", text: event.text }],
                        });
                        break;

                    case "tool_call":
                        // Add function call
                        this.messages.push({
                            type: "function_call",
                            id: event.toolCallId,
                            name: event.name,
                            arguments: event.args,
                        });
                        break;

                    case "tool_result":
                        // Add function result
                        this.messages.push({
                            type: "function_call_output",
                            call_id: event.toolCallId,
                            output: event.result,
                        });
                        break;

                    case "assistant_message":
                        // Add final message
                        this.messages.push({
                            type: "message",
                            content: [{ type: "output_text", text: event.text }],
                        });
                        break;
                }
            }
        } else {
            // Chat Completions API format
            if (this.config.systemPrompt) {
                this.messages.push({ role: "system", content: this.config.systemPrompt });
            }

            // Track tool calls in progress
            let pendingToolCalls: any[] = [];

            for (const event of events) {
                switch (event.type) {
                    case "user_message":
                        this.messages.push({ role: "user", content: event.text });
                        break;

                    case "assistant_start":
                        // Reset pending tool calls for new assistant response
                        pendingToolCalls = [];
                        break;

                    case "tool_call":
                        // Accumulate tool calls
                        pendingToolCalls.push({
                            id: event.toolCallId,
                            type: "function",
                            function: {
                                name: event.name,
                                arguments: event.args,
                            },
                        });
                        break;

                    case "tool_result":
                        // When we see the first tool result, add the assistant message with all tool calls
                        if (pendingToolCalls.length > 0) {
                            this.messages.push({
                                role: "assistant",
                                content: null,
                                tool_calls: pendingToolCalls,
                            });
                            pendingToolCalls = [];
                        }
                        // Add the tool result
                        this.messages.push({
                            role: "tool",
                            tool_call_id: event.toolCallId,
                            content: event.result,
                        });
                        break;

                    case "assistant_message":
                        // Final assistant response (no tool calls)
                        this.messages.push({ role: "assistant", content: event.text });
                        break;

                    // Skip other event types (thinking, error, interrupted, token_usage)
                }
            }
        }
    }
}

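The chat-completions branch of `setEvents` above buffers `tool_call` events and flushes them as a single assistant message when the first `tool_result` arrives. A minimal standalone sketch of that replay logic (types reduced to what the example needs; this mirrors the code above but is not the package's exported API):

```typescript
// Replay a session event log into Chat Completions message history.
// tool_call events are buffered; the first tool_result flushes them as
// one assistant message carrying the accumulated tool_calls array.
type ReplayEvent =
    | { type: "user_message"; text: string }
    | { type: "assistant_start" }
    | { type: "tool_call"; toolCallId: string; name: string; args: string }
    | { type: "tool_result"; toolCallId: string; result: string }
    | { type: "assistant_message"; text: string };

function replay(events: ReplayEvent[]): any[] {
    const messages: any[] = [];
    let pending: any[] = [];
    for (const e of events) {
        switch (e.type) {
            case "user_message":
                messages.push({ role: "user", content: e.text });
                break;
            case "assistant_start":
                pending = []; // new assistant turn: reset the buffer
                break;
            case "tool_call":
                pending.push({ id: e.toolCallId, type: "function", function: { name: e.name, arguments: e.args } });
                break;
            case "tool_result":
                if (pending.length > 0) {
                    messages.push({ role: "assistant", content: null, tool_calls: pending });
                    pending = [];
                }
                messages.push({ role: "tool", tool_call_id: e.toolCallId, content: e.result });
                break;
            case "assistant_message":
                messages.push({ role: "assistant", content: e.text });
                break;
        }
    }
    return messages;
}
```

The flush-on-first-result step matters: the Chat Completions API rejects a `tool` message whose `tool_call_id` has no preceding assistant message declaring that call.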
204  packages/agent/src/args.ts  Normal file
@@ -0,0 +1,204 @@
import { homedir } from "os";
import { resolve } from "path";

export type Choice<T = string> = {
    value: T;
    description?: string;
};

export type ArgDef = {
    type: "flag" | "boolean" | "int" | "float" | "string" | "file";
    alias?: string;
    default?: any;
    description?: string;
    choices?: Choice[] | string[]; // Can be simple strings or objects with descriptions
    showDefault?: boolean | string; // false to hide, true to show value, string to show custom text
};

export type ArgDefs = Record<string, ArgDef>;

export type ParsedArgs<T extends ArgDefs> = {
    [K in keyof T]: T[K]["type"] extends "flag"
        ? boolean
        : T[K]["type"] extends "boolean"
          ? boolean
          : T[K]["type"] extends "int"
            ? number
            : T[K]["type"] extends "float"
              ? number
              : T[K]["type"] extends "string"
                ? string
                : T[K]["type"] extends "file"
                  ? string
                  : never;
} & {
    _: string[]; // Positional arguments
};

export function parseArgs<T extends ArgDefs>(defs: T, args: string[]): ParsedArgs<T> {
    const result: any = { _: [] };
    const aliasMap: Record<string, string> = {};

    // Build alias map and set defaults
    for (const [key, def] of Object.entries(defs)) {
        if (def.alias) {
            aliasMap[def.alias] = key;
        }
        if (def.default !== undefined) {
            result[key] = def.default;
        } else if (def.type === "flag" || def.type === "boolean") {
            result[key] = false;
        }
    }

    // Parse arguments
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];

        // Check if it's a flag
        if (arg.startsWith("--")) {
            const flagName = arg.slice(2);
            const key = aliasMap[flagName] || flagName;
            const def = defs[key];

            if (!def) {
                // Unknown flag, add to positional args
                result._.push(arg);
                continue;
            }

            if (def.type === "flag") {
                // Simple on/off flag
                result[key] = true;
            } else if (i + 1 < args.length) {
                // Flag with value
                const value = args[++i];

                let parsedValue: any;

                switch (def.type) {
                    case "boolean":
                        parsedValue = value === "true" || value === "1" || value === "yes";
                        break;
                    case "int":
                        parsedValue = parseInt(value, 10);
                        if (Number.isNaN(parsedValue)) {
                            throw new Error(`Invalid integer value for --${key}: ${value}`);
                        }
                        break;
                    case "float":
                        parsedValue = parseFloat(value);
                        if (Number.isNaN(parsedValue)) {
                            throw new Error(`Invalid float value for --${key}: ${value}`);
                        }
                        break;
                    case "string":
                        parsedValue = value;
                        break;
                    case "file": {
                        // Resolve ~ to home directory and make absolute
                        let path = value;
                        if (path.startsWith("~")) {
                            path = path.replace("~", homedir());
                        }
                        parsedValue = resolve(path);
                        break;
                    }
                }

                // Validate against choices if specified
                if (def.choices) {
                    const validValues = def.choices.map((c) => (typeof c === "string" ? c : c.value));
                    if (!validValues.includes(parsedValue)) {
                        throw new Error(
                            `Invalid value for --${key}: "${parsedValue}". Valid choices: ${validValues.join(", ")}`,
                        );
                    }
                }

                result[key] = parsedValue;
            } else {
                throw new Error(`Flag --${key} requires a value`);
            }
        } else if (arg.startsWith("-") && arg.length === 2) {
            // Short flag like -h
            const flagChar = arg[1];
            const key = aliasMap[flagChar] || flagChar;
            const def = defs[key];

            if (!def) {
                result._.push(arg);
                continue;
            }

            if (def.type === "flag") {
                result[key] = true;
            } else {
                throw new Error(`Short flag -${flagChar} cannot have a value`);
            }
        } else {
            // Positional argument
            result._.push(arg);
        }
    }

    return result as ParsedArgs<T>;
}

export function printHelp<T extends ArgDefs>(defs: T, usage: string): void {
    console.log(usage);
    console.log("\nOptions:");

    for (const [key, def] of Object.entries(defs)) {
        let line = ` --${key}`;
        if (def.alias) {
            line += `, -${def.alias}`;
        }

        if (def.type !== "flag") {
            if (def.choices) {
                // Show choices instead of type
                const simpleChoices = def.choices.filter((c) => typeof c === "string");
                if (simpleChoices.length === def.choices.length) {
                    // All choices are simple strings
                    line += ` <${simpleChoices.join("|")}>`;
                } else {
                    // Has descriptions, just show the type
                    const typeStr = def.type === "file" ? "path" : def.type;
                    line += ` <${typeStr}>`;
                }
            } else {
                const typeStr = def.type === "file" ? "path" : def.type;
                line += ` <${typeStr}>`;
            }
        }

        if (def.description) {
            // Pad to align descriptions
            line = line.padEnd(30) + def.description;
        }

        if (def.default !== undefined && def.type !== "flag" && def.showDefault !== false) {
            if (typeof def.showDefault === "string") {
                line += ` (default: ${def.showDefault})`;
            } else {
                line += ` (default: ${def.default})`;
            }
        }

        console.log(line);

        // Print choices with descriptions if available
        if (def.choices) {
            const hasDescriptions = def.choices.some((c) => typeof c === "object" && c.description);
            if (hasDescriptions) {
                for (const choice of def.choices) {
                    if (typeof choice === "object") {
                        const choiceLine = ` ${choice.value}`.padEnd(30) + (choice.description || "");
                        console.log(choiceLine);
                    }
                }
            }
        }
    }
}
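The `"file"` branch above expands a leading `~` to the user's home directory before absolutizing the path, since the shell only expands `~` when it is unquoted. Extracted as a standalone sketch (the helper name is illustrative, not part of args.ts):

```typescript
import { homedir } from "os";
import { resolve } from "path";

// Mirrors the "file" case in parseArgs: expand a leading "~" to the
// home directory, then resolve to an absolute path.
function resolveFileArg(value: string): string {
    const path = value.startsWith("~") ? value.replace("~", homedir()) : value;
    return resolve(path);
}
```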
294  packages/agent/src/cli.ts  Normal file
@@ -0,0 +1,294 @@
#!/usr/bin/env node
import chalk from "chalk";
import { createInterface } from "readline";
import type { AgentConfig } from "./agent.js";
import { Agent } from "./agent.js";
import { parseArgs, printHelp as printHelpArgs } from "./args.js";
import { ConsoleRenderer } from "./renderers/console-renderer.js";
import { JsonRenderer } from "./renderers/json-renderer.js";
import { TuiRenderer } from "./renderers/tui-renderer.js";
import { SessionManager } from "./session-manager.js";

// Define argument structure
const argDefs = {
    "base-url": {
        type: "string" as const,
        default: "https://api.openai.com/v1",
        description: "API base URL",
    },
    "api-key": {
        type: "string" as const,
        default: process.env.OPENAI_API_KEY || "",
        description: "API key",
        showDefault: "$OPENAI_API_KEY",
    },
    model: {
        type: "string" as const,
        default: "gpt-5-mini",
        description: "Model name",
    },
    api: {
        type: "string" as const,
        default: "completions",
        description: "API type",
        choices: [
            { value: "completions", description: "OpenAI Chat Completions API (most models)" },
            { value: "responses", description: "OpenAI Responses API (GPT-OSS models)" },
        ],
    },
    "system-prompt": {
        type: "string" as const,
        default: "You are a helpful assistant.",
        description: "System prompt",
    },
    continue: {
        type: "flag" as const,
        alias: "c",
        description: "Continue previous session",
    },
    json: {
        type: "flag" as const,
        description: "Output as JSONL",
    },
    help: {
        type: "flag" as const,
        alias: "h",
        description: "Show this help message",
    },
};

interface JsonCommand {
    type: "message" | "interrupt";
    content?: string;
}

function printHelp(): void {
    const usage = `Usage: pi-agent [options] [messages...]

Examples:
  # Single message (default OpenAI, GPT-5 Mini, OPENAI_API_KEY env var)
  pi-agent "What is 2+2?"

  # Multiple messages processed sequentially
  pi-agent "What is 2+2?" "What about 3+3?"

  # Interactive chat mode (no messages = interactive)
  pi-agent

  # Continue most recently modified session in current directory
  pi-agent --continue "Follow up question"

  # GPT-OSS via Groq
  pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY --model openai/gpt-oss-120b

  # GLM 4.5 via OpenRouter
  pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY --model z-ai/glm-4.5

  # Claude via Anthropic (no prompt caching support - see https://docs.anthropic.com/en/api/openai-sdk)
  pi-agent --base-url https://api.anthropic.com/v1 --api-key $ANTHROPIC_API_KEY --model claude-opus-4-1-20250805`;
    printHelpArgs(argDefs, usage);
}

async function runJsonInteractiveMode(config: AgentConfig, sessionManager: SessionManager): Promise<void> {
    const rl = createInterface({
        input: process.stdin,
        output: process.stdout,
        terminal: false, // Don't interpret control characters
    });

    const renderer = new JsonRenderer();
    const agent = new Agent(config, renderer, sessionManager);
    let isProcessing = false;
    let pendingMessage: string | null = null;

    const processMessage = async (content: string): Promise<void> => {
        isProcessing = true;

        try {
            await agent.ask(content);
        } catch (e: any) {
            await renderer.on({ type: "error", message: e.message });
        } finally {
            isProcessing = false;

            // Process any pending message
            if (pendingMessage) {
                const msg = pendingMessage;
                pendingMessage = null;
                await processMessage(msg);
            }
        }
    };

    // Listen for lines from stdin
    rl.on("line", (line) => {
        try {
            const command = JSON.parse(line) as JsonCommand;

            switch (command.type) {
                case "interrupt":
                    agent.interrupt();
                    isProcessing = false;
                    break;

                case "message":
                    if (!command.content) {
                        renderer.on({ type: "error", message: "Message content is required" });
                        return;
                    }

                    if (isProcessing) {
                        // Queue the message for when the agent is done
                        pendingMessage = command.content;
                    } else {
                        processMessage(command.content);
                    }
                    break;

                default:
                    renderer.on({ type: "error", message: `Unknown command type: ${(command as any).type}` });
            }
        } catch (e) {
            renderer.on({ type: "error", message: `Invalid JSON: ${e}` });
        }
    });

    // Wait for stdin to close
    await new Promise<void>((resolve) => {
        rl.on("close", () => {
            resolve();
        });
    });
}

async function runTuiInteractiveMode(agentConfig: AgentConfig, sessionManager: SessionManager): Promise<void> {
    const sessionData = sessionManager.getSessionData();
    if (sessionData) {
        console.log(chalk.dim(`Resuming session with ${sessionData.events.length} events`));
    }
    const renderer = new TuiRenderer();

    // Initialize TUI BEFORE creating the agent to prevent double init
    await renderer.init();

    const agent = new Agent(agentConfig, renderer, sessionManager);
    renderer.setInterruptCallback(() => {
        agent.interrupt();
    });

    if (sessionData) {
        agent.setEvents(sessionData.events.map((e) => e.event));
        for (const sessionEvent of sessionData.events) {
            const event = sessionEvent.event;
            if (event.type === "assistant_start") {
                renderer.renderAssistantLabel();
            } else {
                await renderer.on(event);
            }
        }
    }

    while (true) {
        const userInput = await renderer.getUserInput();
        try {
            await agent.ask(userInput);
        } catch (e: any) {
            await renderer.on({ type: "error", message: e.message });
        }
    }
}

async function runSingleShotMode(
    agentConfig: AgentConfig,
    sessionManager: SessionManager,
    messages: string[],
    jsonOutput: boolean,
): Promise<void> {
    const sessionData = sessionManager.getSessionData();
    const renderer = jsonOutput ? new JsonRenderer() : new ConsoleRenderer();
    const agent = new Agent(agentConfig, renderer, sessionManager);
    if (sessionData) {
        if (!jsonOutput) {
            console.log(chalk.dim(`Resuming session with ${sessionData.events.length} events`));
        }
        agent.setEvents(sessionData.events.map((e) => e.event));
    }

    for (const msg of messages) {
        try {
            await agent.ask(msg);
        } catch (e: any) {
            await renderer.on({ type: "error", message: e.message });
        }
    }
}

// Main function to use Agent as standalone CLI
export async function main(args: string[]): Promise<void> {
    // Parse arguments
    const parsed = parseArgs(argDefs, args);

    // Show help if requested
    if (parsed.help) {
        printHelp();
        return;
    }

    // Extract configuration from parsed args
    const baseURL = parsed["base-url"];
    const apiKey = parsed["api-key"];
    const model = parsed.model;
    const continueSession = parsed.continue;
    const api = parsed.api as "completions" | "responses";
    const systemPrompt = parsed["system-prompt"];
    const jsonOutput = parsed.json;
    const messages = parsed._; // Positional arguments

    if (!apiKey) {
        throw new Error("API key required (use --api-key or set OPENAI_API_KEY)");
    }

    // Determine mode: interactive if no messages provided
    const isInteractive = messages.length === 0;

    // Create session manager
    const sessionManager = new SessionManager(continueSession);

    // Create or restore agent
    let agentConfig: AgentConfig = {
        apiKey,
        baseURL,
        model,
        api,
        systemPrompt,
    };

    if (continueSession) {
        const sessionData = sessionManager.getSessionData();
        if (sessionData) {
            agentConfig = {
                ...sessionData.config,
                apiKey, // Allow overriding API key
            };
        }
    }

    // Run in appropriate mode
    if (isInteractive) {
        if (jsonOutput) {
            await runJsonInteractiveMode(agentConfig, sessionManager);
        } else {
            await runTuiInteractiveMode(agentConfig, sessionManager);
        }
    } else {
        await runSingleShotMode(agentConfig, sessionManager, messages, jsonOutput);
    }
}

// Run as CLI if invoked directly
if (import.meta.url === `file://${process.argv[1]}`) {
    main(process.argv.slice(2)).catch((err) => {
        console.error(err);
        process.exit(1);
    });
}
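The `--json` interactive mode consumes one JSON command per stdin line, in the `JsonCommand` shape defined above, and reports the three failure modes the line handler distinguishes. A self-contained sketch of that per-line validation (error strings mirror the handler; the function itself is illustrative, not the CLI):

```typescript
type JsonCommand = { type: "message" | "interrupt"; content?: string };

// Parse one stdin line into a command, or return an error string matching
// the cases the --json line handler reports.
function parseCommandLine(line: string): JsonCommand | string {
    try {
        const command = JSON.parse(line) as JsonCommand;
        switch (command.type) {
            case "interrupt":
                return command;
            case "message":
                return command.content ? command : "Message content is required";
            default:
                return `Unknown command type: ${(command as any).type}`;
        }
    } catch (e) {
        return `Invalid JSON: ${e}`;
    }
}
```

A driver would then, for example, send `{"type":"message","content":"What is 2+2?"}` followed by `{"type":"interrupt"}` to cancel a long-running turn.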
15  packages/agent/src/index.ts  Normal file
@@ -0,0 +1,15 @@
// Main exports for pi-agent package

export type { AgentConfig, AgentEvent, AgentEventReceiver } from "./agent.js";
export { Agent } from "./agent.js";
export type { ArgDef, ArgDefs, ParsedArgs } from "./args.js";
// CLI utilities
export { parseArgs, printHelp } from "./args.js";
// CLI main function
export { main } from "./cli.js";
// Renderers
export { ConsoleRenderer } from "./renderers/console-renderer.js";
export { JsonRenderer } from "./renderers/json-renderer.js";
export { TuiRenderer } from "./renderers/tui-renderer.js";
export type { SessionData, SessionEvent, SessionHeader } from "./session-manager.js";
export { SessionManager } from "./session-manager.js";
130  packages/agent/src/renderers/console-renderer.ts  Normal file
@@ -0,0 +1,130 @@
import chalk from "chalk";
import type { AgentEvent, AgentEventReceiver } from "../agent.js";

export class ConsoleRenderer implements AgentEventReceiver {
    private frames = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
    private currentFrame = 0;
    private animationInterval: NodeJS.Timeout | null = null;
    private isAnimating = false;
    private animationLine = "";
    private isTTY = process.stdout.isTTY;

    private startAnimation(text: string = "Thinking"): void {
        if (this.isAnimating || !this.isTTY) return;
        this.isAnimating = true;
        this.currentFrame = 0;

        // Write initial frame
        this.animationLine = `${chalk.cyan(this.frames[this.currentFrame])} ${chalk.dim(text)}`;
        process.stdout.write(this.animationLine);

        this.animationInterval = setInterval(() => {
            // Clear current line
            process.stdout.write(`\r${" ".repeat(this.animationLine.length)}\r`);

            // Update frame
            this.currentFrame = (this.currentFrame + 1) % this.frames.length;
            this.animationLine = `${chalk.cyan(this.frames[this.currentFrame])} ${chalk.dim(text)}`;
            process.stdout.write(this.animationLine);
        }, 80);
    }

    private stopAnimation(): void {
        if (!this.isAnimating) return;

        if (this.animationInterval) {
            clearInterval(this.animationInterval);
            this.animationInterval = null;
        }

        // Clear the animation line
        process.stdout.write(`\r${" ".repeat(this.animationLine.length)}\r`);
        this.isAnimating = false;
        this.animationLine = "";
    }

    async on(event: AgentEvent): Promise<void> {
        // Stop animation for any new event except token_usage
        if (event.type !== "token_usage" && this.isAnimating) {
            this.stopAnimation();
        }

        switch (event.type) {
            case "session_start":
                console.log(
                    chalk.blue(
                        `[Session started] ID: ${event.sessionId}, Model: ${event.model}, API: ${event.api}, Base URL: ${event.baseURL}`,
                    ),
                );
                console.log(chalk.dim(`System Prompt: ${event.systemPrompt}\n`));
                break;

            case "assistant_start":
                console.log(chalk.hex("#FFA500")("[assistant]"));
                this.startAnimation();
                break;

            case "thinking":
                this.stopAnimation();
                console.log(chalk.dim("[thinking]"));
                console.log(chalk.dim(event.text));
                console.log();
                // Resume animation after showing thinking
                this.startAnimation("Processing");
                break;

            case "tool_call":
                this.stopAnimation();
                console.log(chalk.yellow(`[tool] ${event.name}(${event.args})`));
                // Resume animation while tool executes
                this.startAnimation(`Running ${event.name}`);
                break;

            case "tool_result": {
                this.stopAnimation();
                const lines = event.result.split("\n");
|
||||||
|
const maxLines = 10;
|
||||||
|
const truncated = lines.length > maxLines;
|
||||||
|
const toShow = truncated ? lines.slice(0, maxLines) : lines;
|
||||||
|
|
||||||
|
const text = toShow.join("\n");
|
||||||
|
console.log(event.isError ? chalk.red(text) : chalk.gray(text));
|
||||||
|
|
||||||
|
if (truncated) {
|
||||||
|
console.log(chalk.dim(`... (${lines.length - maxLines} more lines)`));
|
||||||
|
}
|
||||||
|
console.log();
|
||||||
|
// Resume animation after tool result
|
||||||
|
this.startAnimation("Thinking");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
case "assistant_message":
|
||||||
|
this.stopAnimation();
|
||||||
|
console.log(event.text);
|
||||||
|
console.log();
|
||||||
|
break;
|
||||||
|
|
||||||
|
case "error":
|
||||||
|
this.stopAnimation();
|
||||||
|
console.error(chalk.red(`[error] ${event.message}\n`));
|
||||||
|
break;
|
||||||
|
|
||||||
|
case "user_message":
|
||||||
|
console.log(chalk.green("[user]"));
|
||||||
|
console.log(event.text);
|
||||||
|
console.log();
|
||||||
|
break;
|
||||||
|
|
||||||
|
case "interrupted":
|
||||||
|
this.stopAnimation();
|
||||||
|
console.log(chalk.red("[Interrupted by user]\n"));
|
||||||
|
break;
|
||||||
|
|
||||||
|
case "token_usage":
|
||||||
|
// Token usage is not displayed in console mode
|
||||||
|
// Don't stop animation for this event
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
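The console renderer's spinner redraws in place by carriage-returning to the start of the line, overwriting it with spaces, and writing the next frame. A minimal standalone sketch of that technique (the helper names here are illustrative, not part of the package):

```typescript
// Sketch of ConsoleRenderer's in-place spinner redraw: advance the frame
// index modulo the frame count, and clear the previous line with
// "\r" + spaces + "\r" before writing the next one.
const frames = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];

function nextFrame(current: number): number {
  // Wraps back to 0 after the last frame
  return (current + 1) % frames.length;
}

function clearSequence(line: string): string {
  // Carriage return, overwrite with spaces, carriage return again
  return `\r${" ".repeat(line.length)}\r`;
}

let frame = 9;
frame = nextFrame(frame); // wraps: 9 -> 0
console.log(frame);
console.log(clearSequence("ab").length); // "\r" + 2 spaces + "\r" = 4 chars
```

The modulo keeps the index in range no matter how long the interval runs, and the clear sequence avoids a full-screen redraw for a one-line status.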
7 packages/agent/src/renderers/json-renderer.ts Normal file
@@ -0,0 +1,7 @@
import type { AgentEvent, AgentEventReceiver } from "../agent.js";

export class JsonRenderer implements AgentEventReceiver {
	async on(event: AgentEvent): Promise<void> {
		console.log(JSON.stringify(event));
	}
}
353 packages/agent/src/renderers/tui-renderer.ts Normal file
@@ -0,0 +1,353 @@
import {
	CombinedAutocompleteProvider,
	Container,
	MarkdownComponent,
	TextComponent,
	TextEditor,
	TUI,
	WhitespaceComponent,
} from "@mariozechner/pi-tui";
import chalk from "chalk";
import type { AgentEvent, AgentEventReceiver } from "../agent.js";

class LoadingAnimation extends TextComponent {
	private frames = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
	private currentFrame = 0;
	private intervalId: NodeJS.Timeout | null = null;
	private ui: TUI | null = null;

	constructor(ui: TUI) {
		super("", { bottom: 1 });
		this.ui = ui;
		this.start();
	}

	start() {
		this.updateDisplay();
		this.intervalId = setInterval(() => {
			this.currentFrame = (this.currentFrame + 1) % this.frames.length;
			this.updateDisplay();
		}, 80);
	}

	stop() {
		if (this.intervalId) {
			clearInterval(this.intervalId);
			this.intervalId = null;
		}
	}

	private updateDisplay() {
		const frame = this.frames[this.currentFrame];
		this.setText(`${chalk.cyan(frame)} ${chalk.dim("Thinking...")}`);
		if (this.ui) {
			this.ui.requestRender();
		}
	}
}

export class TuiRenderer implements AgentEventReceiver {
	private ui: TUI;
	private chatContainer: Container;
	private statusContainer: Container;
	private editor: TextEditor;
	private tokenContainer: Container;
	private isInitialized = false;
	private onInputCallback?: (text: string) => void;
	private currentLoadingAnimation: LoadingAnimation | null = null;
	private onInterruptCallback?: () => void;
	private lastSigintTime = 0;
	private lastInputTokens = 0;
	private lastOutputTokens = 0;
	private lastCacheReadTokens = 0;
	private lastCacheWriteTokens = 0;
	private tokenStatusComponent: TextComponent | null = null;

	constructor() {
		this.ui = new TUI();
		this.chatContainer = new Container();
		this.statusContainer = new Container();
		this.editor = new TextEditor();
		this.tokenContainer = new Container();

		// Setup autocomplete for file paths and slash commands
		const autocompleteProvider = new CombinedAutocompleteProvider(
			[],
			process.cwd(), // Base directory for file path completion
		);
		this.editor.setAutocompleteProvider(autocompleteProvider);
	}

	async init(): Promise<void> {
		if (this.isInitialized) return;

		// Add header with instructions
		const header = new TextComponent(
			chalk.gray(chalk.blueBright(">>> pi interactive chat <<<")) +
				"\n" +
				chalk.dim("Press Escape to interrupt while processing") +
				"\n" +
				chalk.dim("Press CTRL+C to clear the text editor") +
				"\n" +
				chalk.dim("Press CTRL+C twice quickly to exit"),
			{ bottom: 1 },
		);

		// Setup UI layout
		this.ui.addChild(header);
		this.ui.addChild(this.chatContainer);
		this.ui.addChild(this.statusContainer);
		this.ui.addChild(new WhitespaceComponent(1));
		this.ui.addChild(this.editor);
		this.ui.addChild(this.tokenContainer);
		this.ui.setFocus(this.editor);

		// Set up global key handler for Escape and Ctrl+C
		this.ui.onGlobalKeyPress = (data: string): boolean => {
			// Intercept Escape key when processing
			if (data === "\x1b" && this.currentLoadingAnimation) {
				// Call interrupt callback if set
				if (this.onInterruptCallback) {
					this.onInterruptCallback();
				}

				// Stop the loading animation immediately
				if (this.currentLoadingAnimation) {
					this.currentLoadingAnimation.stop();
					this.statusContainer.clear();
					this.currentLoadingAnimation = null;
				}

				// Don't show message here - the interrupted event will handle it

				// Re-enable editor submission
				this.editor.disableSubmit = false;

				this.ui.requestRender();

				// Don't forward to editor
				return false;
			}

			// Handle Ctrl+C (raw mode sends \x03)
			if (data === "\x03") {
				const now = Date.now();
				const timeSinceLastCtrlC = now - this.lastSigintTime;

				if (timeSinceLastCtrlC < 500) {
					// Second Ctrl+C within 500ms - exit
					this.stop();
					process.exit(0);
				} else {
					// First Ctrl+C - clear the editor
					this.clearEditor();
					this.lastSigintTime = now;
				}

				// Don't forward to editor
				return false;
			}

			// Forward all other keys
			return true;
		};

		// Handle editor submission
		this.editor.onSubmit = (text: string) => {
			text = text.trim();
			if (!text) return;

			if (this.onInputCallback) {
				this.onInputCallback(text);
			}
		};

		// Start the UI
		await this.ui.start();
		this.isInitialized = true;
	}

	async on(event: AgentEvent): Promise<void> {
		// Ensure UI is initialized
		if (!this.isInitialized) {
			await this.init();
		}

		switch (event.type) {
			case "assistant_start":
				this.chatContainer.addChild(new TextComponent(chalk.hex("#FFA500")("[assistant]")));
				// Disable editor submission while processing
				this.editor.disableSubmit = true;
				// Start loading animation in the status container
				this.statusContainer.clear();
				this.currentLoadingAnimation = new LoadingAnimation(this.ui);
				this.statusContainer.addChild(this.currentLoadingAnimation);
				break;

			case "thinking": {
				// Show thinking in dim text
				const thinkingContainer = new Container();
				thinkingContainer.addChild(new TextComponent(chalk.dim("[thinking]")));

				// Split thinking text into lines for better display
				const thinkingLines = event.text.split("\n");
				for (const line of thinkingLines) {
					thinkingContainer.addChild(new TextComponent(chalk.dim(line)));
				}
				thinkingContainer.addChild(new WhitespaceComponent(1));
				this.chatContainer.addChild(thinkingContainer);
				break;
			}

			case "tool_call":
				this.chatContainer.addChild(new TextComponent(chalk.yellow(`[tool] ${event.name}(${event.args})`)));
				break;

			case "tool_result": {
				// Show tool result with truncation
				const lines = event.result.split("\n");
				const maxLines = 10;
				const truncated = lines.length > maxLines;
				const toShow = truncated ? lines.slice(0, maxLines) : lines;

				const resultContainer = new Container();
				for (const line of toShow) {
					resultContainer.addChild(new TextComponent(event.isError ? chalk.red(line) : chalk.gray(line)));
				}

				if (truncated) {
					resultContainer.addChild(new TextComponent(chalk.dim(`... (${lines.length - maxLines} more lines)`)));
				}
				resultContainer.addChild(new WhitespaceComponent(1));
				this.chatContainer.addChild(resultContainer);
				break;
			}

			case "assistant_message":
				// Stop loading animation when assistant responds
				if (this.currentLoadingAnimation) {
					this.currentLoadingAnimation.stop();
					this.currentLoadingAnimation = null;
					this.statusContainer.clear();
				}
				// Re-enable editor submission
				this.editor.disableSubmit = false;
				// Use MarkdownComponent for rich formatting
				this.chatContainer.addChild(new MarkdownComponent(event.text));
				this.chatContainer.addChild(new WhitespaceComponent(1));
				break;

			case "error":
				// Stop loading animation on error
				if (this.currentLoadingAnimation) {
					this.currentLoadingAnimation.stop();
					this.currentLoadingAnimation = null;
					this.statusContainer.clear();
				}
				// Re-enable editor submission
				this.editor.disableSubmit = false;
				this.chatContainer.addChild(new TextComponent(chalk.red(`[error] ${event.message}`), { bottom: 1 }));
				break;

			case "user_message":
				// Render user message
				this.chatContainer.addChild(new TextComponent(chalk.green("[user]")));
				this.chatContainer.addChild(new TextComponent(event.text, { bottom: 1 }));
				break;

			case "token_usage":
				// Store the latest token counts (not cumulative since prompt includes full context)
				this.lastInputTokens = event.inputTokens;
				this.lastOutputTokens = event.outputTokens;
				this.lastCacheReadTokens = event.cacheReadTokens;
				this.lastCacheWriteTokens = event.cacheWriteTokens;
				this.updateTokenDisplay();
				break;

			case "interrupted":
				// Stop the loading animation
				if (this.currentLoadingAnimation) {
					this.currentLoadingAnimation.stop();
					this.currentLoadingAnimation = null;
					this.statusContainer.clear();
				}
				// Show interrupted message
				this.chatContainer.addChild(new TextComponent(chalk.red("[Interrupted by user]"), { bottom: 1 }));
				// Re-enable editor submission
				this.editor.disableSubmit = false;
				break;
		}

		this.ui.requestRender();
	}

	private updateTokenDisplay(): void {
		// Clear and update token display
		this.tokenContainer.clear();

		// Build token display text
		let tokenText = chalk.dim(`↑${this.lastInputTokens.toLocaleString()} ↓${this.lastOutputTokens.toLocaleString()}`);

		// Add cache info if available
		if (this.lastCacheReadTokens > 0 || this.lastCacheWriteTokens > 0) {
			const cacheText: string[] = [];
			if (this.lastCacheReadTokens > 0) {
				cacheText.push(`⟲${this.lastCacheReadTokens.toLocaleString()}`);
			}
			if (this.lastCacheWriteTokens > 0) {
				cacheText.push(`⟳${this.lastCacheWriteTokens.toLocaleString()}`);
			}
			tokenText += chalk.dim(` (${cacheText.join(" ")})`);
		}

		this.tokenStatusComponent = new TextComponent(tokenText);
		this.tokenContainer.addChild(this.tokenStatusComponent);
	}

	async getUserInput(): Promise<string> {
		return new Promise((resolve) => {
			this.onInputCallback = (text: string) => {
				this.onInputCallback = undefined; // Clear callback
				resolve(text);
			};
		});
	}

	setInterruptCallback(callback: () => void): void {
		this.onInterruptCallback = callback;
	}

	clearEditor(): void {
		this.editor.setText("");

		// Show hint in status container
		this.statusContainer.clear();
		const hint = new TextComponent(chalk.dim("Press Ctrl+C again to exit"));
		this.statusContainer.addChild(hint);
		this.ui.requestRender();

		// Clear the hint after 500ms
		setTimeout(() => {
			this.statusContainer.clear();
			this.ui.requestRender();
		}, 500);
	}

	renderAssistantLabel(): void {
		// Just render the assistant label without starting animations
		// Used for restored session history
		this.chatContainer.addChild(new TextComponent(chalk.hex("#FFA500")("[assistant]")));
		this.ui.requestRender();
	}

	stop(): void {
		if (this.currentLoadingAnimation) {
			this.currentLoadingAnimation.stop();
			this.currentLoadingAnimation = null;
		}
		if (this.isInitialized) {
			this.ui.stop();
			this.isInitialized = false;
		}
	}
}
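The TUI's global key handler distinguishes a single Ctrl+C (clear the editor) from a double press (exit) purely by how far apart the two timestamps are. A self-contained sketch of that decision, with illustrative names:

```typescript
// Sketch of TuiRenderer's double-Ctrl+C detection: a second press within
// 500 ms of the previous one means "exit"; otherwise the editor is
// cleared and the press time is remembered for the next comparison.
const DOUBLE_PRESS_WINDOW_MS = 500;

function classifyCtrlC(now: number, lastPress: number): "exit" | "clear" {
  return now - lastPress < DOUBLE_PRESS_WINDOW_MS ? "exit" : "clear";
}

console.log(classifyCtrlC(10400, 10100)); // 300 ms apart -> exit
console.log(classifyCtrlC(10400, 9000)); // 1400 ms apart -> clear
```

Tracking only a single timestamp is enough because the "clear" branch resets it, so three slow presses never accidentally count as a double press.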
176 packages/agent/src/session-manager.ts Normal file
@@ -0,0 +1,176 @@
import { randomBytes } from "crypto";
import { appendFileSync, existsSync, mkdirSync, readdirSync, readFileSync, statSync } from "fs";
import { homedir } from "os";
import { join, resolve } from "path";
import type { AgentConfig, AgentEvent, AgentEventReceiver } from "./agent.js";

// Simple UUID v4 generator
function uuidv4(): string {
	const bytes = randomBytes(16);
	bytes[6] = (bytes[6] & 0x0f) | 0x40; // Version 4
	bytes[8] = (bytes[8] & 0x3f) | 0x80; // Variant 10
	const hex = bytes.toString("hex");
	return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20, 32)}`;
}

export interface SessionHeader {
	type: "session";
	id: string;
	timestamp: string;
	cwd: string;
	config: AgentConfig;
}

export interface SessionEvent {
	type: "event";
	timestamp: string;
	event: AgentEvent;
}

export interface SessionData {
	config: AgentConfig;
	events: SessionEvent[];
	totalUsage: Extract<AgentEvent, { type: "token_usage" }>;
}

export class SessionManager implements AgentEventReceiver {
	private sessionId!: string;
	private sessionFile!: string;
	private sessionDir: string;

	constructor(continueSession: boolean = false) {
		this.sessionDir = this.getSessionDirectory();

		if (continueSession) {
			const mostRecent = this.findMostRecentlyModifiedSession();
			if (mostRecent) {
				this.sessionFile = mostRecent;
				// Load session ID from file
				this.loadSessionId();
			} else {
				// No existing session, create new
				this.initNewSession();
			}
		} else {
			this.initNewSession();
		}
	}

	private getSessionDirectory(): string {
		const cwd = process.cwd();
		const safePath = "--" + cwd.replace(/^\//, "").replace(/\//g, "-") + "--";

		const piConfigDir = resolve(process.env.PI_CONFIG_DIR || join(homedir(), ".pi"));
		const sessionDir = join(piConfigDir, "sessions", safePath);
		if (!existsSync(sessionDir)) {
			mkdirSync(sessionDir, { recursive: true });
		}
		return sessionDir;
	}

	private initNewSession(): void {
		this.sessionId = uuidv4();
		const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
		this.sessionFile = join(this.sessionDir, `${timestamp}_${this.sessionId}.jsonl`);
	}

	private findMostRecentlyModifiedSession(): string | null {
		try {
			const files = readdirSync(this.sessionDir)
				.filter((f) => f.endsWith(".jsonl"))
				.map((f) => ({
					name: f,
					path: join(this.sessionDir, f),
					mtime: statSync(join(this.sessionDir, f)).mtime,
				}))
				.sort((a, b) => b.mtime.getTime() - a.mtime.getTime());

			return files[0]?.path || null;
		} catch {
			return null;
		}
	}

	private loadSessionId(): void {
		if (!existsSync(this.sessionFile)) return;

		const lines = readFileSync(this.sessionFile, "utf8").trim().split("\n");
		for (const line of lines) {
			try {
				const entry = JSON.parse(line);
				if (entry.type === "session") {
					this.sessionId = entry.id;
					return;
				}
			} catch {
				// Skip malformed lines
			}
		}
		// If no session entry found, create new ID
		this.sessionId = uuidv4();
	}

	startSession(config: AgentConfig): void {
		const entry: SessionHeader = {
			type: "session",
			id: this.sessionId,
			timestamp: new Date().toISOString(),
			cwd: process.cwd(),
			config,
		};
		appendFileSync(this.sessionFile, JSON.stringify(entry) + "\n");
	}

	async on(event: AgentEvent): Promise<void> {
		const entry: SessionEvent = {
			type: "event",
			timestamp: new Date().toISOString(),
			event: event,
		};
		appendFileSync(this.sessionFile, JSON.stringify(entry) + "\n");
	}

	getSessionData(): SessionData | null {
		if (!existsSync(this.sessionFile)) return null;

		let config: AgentConfig | null = null;
		const events: SessionEvent[] = [];
		let totalUsage: Extract<AgentEvent, { type: "token_usage" }> = {
			type: "token_usage",
			inputTokens: 0,
			outputTokens: 0,
			totalTokens: 0,
			cacheReadTokens: 0,
			cacheWriteTokens: 0,
		};

		const lines = readFileSync(this.sessionFile, "utf8").trim().split("\n");
		for (const line of lines) {
			try {
				const entry = JSON.parse(line);
				if (entry.type === "session") {
					config = entry.config;
					this.sessionId = entry.id;
				} else if (entry.type === "event") {
					const eventEntry: SessionEvent = entry as SessionEvent;
					events.push(eventEntry);
					if (eventEntry.event.type === "token_usage") {
						totalUsage = entry.event as Extract<AgentEvent, { type: "token_usage" }>;
					}
				}
			} catch {
				// Skip malformed lines
			}
		}

		return config ? { config, events, totalUsage } : null;
	}

	getSessionId(): string {
		return this.sessionId;
	}

	getSessionFile(): string {
		return this.sessionFile;
	}
}
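SessionManager stores each project's sessions under a directory name derived by flattening the working directory into a single path segment. A sketch of that naming rule, mirroring the `safePath` construction in `getSessionDirectory()` (assumes a POSIX-style cwd):

```typescript
// Mirrors getSessionDirectory()'s safePath: strip the leading "/",
// replace the remaining "/" separators with "-", and wrap in "--"
// so the original path is still recognizable in the directory listing.
function toSafeSessionDirName(cwd: string): string {
  return "--" + cwd.replace(/^\//, "").replace(/\//g, "-") + "--";
}

console.log(toSafeSessionDirName("/home/user/project"));
// --home-user-project--
```

Inside that directory, each session is an append-only `.jsonl` file: one `SessionHeader` line followed by one `SessionEvent` line per agent event, which is why `getSessionData()` can rebuild a session by parsing line by line and skipping anything malformed.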
264 packages/agent/src/tools/tools.ts Normal file
@@ -0,0 +1,264 @@
import { spawn } from "node:child_process";
import { closeSync, existsSync, openSync, readdirSync, readFileSync, readSync, statSync } from "node:fs";
import { resolve } from "node:path";
import { glob } from "glob";
import type { ChatCompletionTool } from "openai/resources";

// For GPT-OSS models via responses API
export const toolsForResponses = [
	{
		type: "function" as const,
		name: "read",
		description: "Read contents of a file",
		parameters: {
			type: "object",
			properties: {
				path: {
					type: "string",
					description: "Path to the file to read",
				},
			},
			required: ["path"],
		},
	},
	{
		type: "function" as const,
		name: "list",
		description: "List contents of a directory",
		parameters: {
			type: "object",
			properties: {
				path: {
					type: "string",
					description: "Path to the directory (default: current directory)",
				},
			},
		},
	},
	{
		type: "function" as const,
		name: "bash",
		description: "Execute a command in Bash",
		parameters: {
			type: "object",
			properties: {
				command: {
					type: "string",
					description: "Command to execute",
				},
			},
			required: ["command"],
		},
	},
	{
		type: "function" as const,
		name: "glob",
		description: "Find files matching a glob pattern",
		parameters: {
			type: "object",
			properties: {
				pattern: {
					type: "string",
					description: "Glob pattern to match files (e.g., '**/*.ts', 'src/**/*.json')",
				},
				path: {
					type: "string",
					description: "Directory to search in (default: current directory)",
				},
			},
			required: ["pattern"],
		},
	},
	{
		type: "function" as const,
		name: "rg",
		description: "Search using ripgrep.",
		parameters: {
			type: "object",
			properties: {
				args: {
					type: "string",
					description:
						'Arguments to pass directly to ripgrep. Examples: "-l prompt" or "-i TODO" or "--type ts className" or "functionName src/". Never add quotes around the search pattern.',
				},
			},
			required: ["args"],
		},
	},
];

// For standard chat API (OpenAI format)
export const toolsForChat: ChatCompletionTool[] = toolsForResponses.map((tool) => ({
	type: "function" as const,
	function: {
		name: tool.name,
		description: tool.description,
		parameters: tool.parameters,
	},
}));

// Helper to execute commands with abort support
async function execWithAbort(command: string, signal?: AbortSignal): Promise<string> {
	return new Promise((resolve, reject) => {
		const child = spawn(command, {
			shell: true,
			signal,
		});

		let stdout = "";
		let stderr = "";
		const MAX_OUTPUT_SIZE = 1024 * 1024; // 1MB limit
		let outputTruncated = false;

		child.stdout?.on("data", (data) => {
			const chunk = data.toString();
			if (stdout.length + chunk.length > MAX_OUTPUT_SIZE) {
				if (!outputTruncated) {
					stdout += "\n... [Output truncated - exceeded 1MB limit] ...";
					outputTruncated = true;
				}
			} else {
				stdout += chunk;
			}
		});

		child.stderr?.on("data", (data) => {
			const chunk = data.toString();
			if (stderr.length + chunk.length > MAX_OUTPUT_SIZE) {
				if (!outputTruncated) {
					stderr += "\n... [Output truncated - exceeded 1MB limit] ...";
					outputTruncated = true;
				}
			} else {
				stderr += chunk;
			}
		});

		child.on("error", (error) => {
			reject(error);
		});

		child.on("close", (code) => {
			if (signal?.aborted) {
				reject(new Error("Interrupted"));
			} else if (code !== 0 && code !== null) {
				// For some commands like ripgrep, exit code 1 is normal (no matches)
				if (code === 1 && command.includes("rg")) {
					resolve(""); // No matches for ripgrep
				} else if (stderr && !stdout) {
					reject(new Error(stderr));
				} else {
					resolve(stdout || "");
				}
			} else {
				resolve(stdout || stderr || "");
			}
		});

		// Kill the process if the signal is aborted
		if (signal) {
			signal.addEventListener(
				"abort",
				() => {
					child.kill("SIGTERM");
				},
				{ once: true },
			);
		}
	});
}

export async function executeTool(name: string, args: string, signal?: AbortSignal): Promise<string> {
	const parsed = JSON.parse(args);

	switch (name) {
		case "read": {
			const path = parsed.path;
			if (!path) return "Error: path parameter is required";
			const file = resolve(path);
			if (!existsSync(file)) return `File not found: ${file}`;

			// Check file size before reading
			const stats = statSync(file);
			const MAX_FILE_SIZE = 1024 * 1024; // 1MB limit
			if (stats.size > MAX_FILE_SIZE) {
				// Read only the first 1MB
				const fd = openSync(file, "r");
				const buffer = Buffer.alloc(MAX_FILE_SIZE);
				readSync(fd, buffer, 0, MAX_FILE_SIZE, 0);
				closeSync(fd);
				return buffer.toString("utf8") + "\n\n... [File truncated - exceeded 1MB limit] ...";
			}

			const data = readFileSync(file, "utf8");
			return data;
		}

		case "list": {
			const path = parsed.path || ".";
			const dir = resolve(path);
			if (!existsSync(dir)) return `Directory not found: ${dir}`;
			const entries = readdirSync(dir, { withFileTypes: true });
			return entries.map((entry) => (entry.isDirectory() ? entry.name + "/" : entry.name)).join("\n");
		}

		case "bash": {
			const command = parsed.command;
			if (!command) return "Error: command parameter is required";
			try {
				const output = await execWithAbort(command, signal);
				return output || "Command executed successfully";
			} catch (e: any) {
				if (e.message === "Interrupted") {
					throw e; // Re-throw interruption
				}
				throw new Error(`Command failed: ${e.message}`);
			}
		}

		case "glob": {
			const pattern = parsed.pattern;
			if (!pattern) return "Error: pattern parameter is required";
			const searchPath = parsed.path || process.cwd();

			try {
				const matches = await glob(pattern, {
					cwd: searchPath,
					dot: true,
					nodir: false,
					mark: true, // Add / to directories
				});

				if (matches.length === 0) {
					return "No files found matching the pattern";
				}

				// Sort alphabetically for stable, deterministic output
				return matches.sort().join("\n");
			} catch (e: any) {
				return `Glob error: ${e.message}`;
			}
		}

		case "rg": {
			const args = parsed.args;
			if (!args) return "Error: args parameter is required";

			// Force ripgrep to never read from stdin by redirecting stdin from /dev/null
			const cmd = `rg ${args} < /dev/null`;

			try {
				const output = await execWithAbort(cmd, signal);
				return output.trim() || "No matches found";
			} catch (e: any) {
				if (e.message === "Interrupted") {
					throw e; // Re-throw interruption
				}
				return `ripgrep error: ${e.message}`;
			}
		}

		default:
			return `Unknown tool: ${name}`;
	}
}
|
||||||
9
packages/agent/tsconfig.build.json
Normal file
{
	"extends": "../../tsconfig.base.json",
	"compilerOptions": {
		"outDir": "./dist",
		"rootDir": "./src"
	},
	"include": ["src/**/*"],
	"exclude": ["node_modules", "dist"]
}
511
packages/pods/README.md
Normal file
# pi

Deploy and manage LLMs on GPU pods with automatic vLLM configuration for agentic workloads.

## Installation

```bash
npm install -g @mariozechner/pi
```

## What is pi?

`pi` simplifies running large language models on remote GPU pods. It automatically:
- Sets up vLLM on fresh Ubuntu pods
- Configures tool calling for agentic models (Qwen, GPT-OSS, GLM, etc.)
- Manages multiple models on the same pod with "smart" GPU allocation
- Provides OpenAI-compatible API endpoints for each model
- Includes an interactive agent with file system tools for testing

## Quick Start

```bash
# Set required environment variables
export HF_TOKEN=your_huggingface_token  # Get from https://huggingface.co/settings/tokens
export PI_API_KEY=your_api_key          # Any string you want for API authentication

# Setup a DataCrunch pod with NFS storage (models path auto-extracted)
pi pods setup dc1 "ssh root@1.2.3.4" \
  --mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/your-pseudo /mnt/hf-models"

# Start a model (automatic configuration for known models)
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen

# Send a single message to the model
pi agent qwen "What is the Fibonacci sequence?"

# Interactive chat mode with file system tools
pi agent qwen -i

# Use with any OpenAI-compatible client
export OPENAI_BASE_URL='http://1.2.3.4:8001/v1'
export OPENAI_API_KEY=$PI_API_KEY
```

## Prerequisites

- Node.js 18+
- HuggingFace token (for model downloads)
- GPU pod with:
  - Ubuntu 22.04 or 24.04
  - SSH root access
  - NVIDIA drivers installed
  - Persistent storage for models

## Supported Providers

### Primary Support

**DataCrunch** - Best for shared model storage
- NFS volumes shareable across multiple pods in the same region
- Models download once, use everywhere
- Ideal for teams or multiple experiments

**RunPod** - Good persistent storage
- Network volumes persist independently
- Cannot be shared between running pods simultaneously
- Good for single-pod workflows

### Also Works With
- Vast.ai (volumes locked to a specific machine)
- Prime Intellect (no persistent storage)
- AWS EC2 (with EFS setup)
- Any Ubuntu machine with NVIDIA GPUs, CUDA driver, and SSH

## Commands

### Pod Management

```bash
pi pods setup <name> "<ssh>" [options]  # Setup new pod
  --mount "<mount_command>"             # Run mount command during setup
  --models-path <path>                  # Override extracted path (optional)
  --vllm release|nightly|gpt-oss        # vLLM version (default: release)

pi pods                       # List all configured pods
pi pods active <name>         # Switch active pod
pi pods remove <name>         # Remove pod from local config
pi shell [<name>]             # SSH into pod
pi ssh [<name>] "<command>"   # Run command on pod
```

**Note**: When using `--mount`, the models path is automatically extracted from the mount command's target directory. You only need `--models-path` if not using `--mount`, or to override the extracted path.

#### vLLM Version Options

- `release` (default): Stable vLLM release, recommended for most users
- `nightly`: Latest vLLM features, needed for the newest models like GLM-4.5
- `gpt-oss`: Special build for OpenAI's GPT-OSS models only

### Model Management

```bash
pi start <model> --name <name> [options]  # Start a model
  --memory <percent>   # GPU memory: 30%, 50%, 90% (default: 90%)
  --context <size>     # Context window: 4k, 8k, 16k, 32k, 64k, 128k
  --gpus <count>       # Number of GPUs to use (predefined models only)
  --pod <name>         # Target specific pod (overrides active)
  --vllm <args...>     # Pass custom args directly to vLLM

pi stop [<name>]   # Stop model (or all if no name given)
pi list            # List running models with status
pi logs <name>     # Stream model logs (tail -f)
```

### Agent & Chat Interface

```bash
pi agent <name> "<message>"        # Single message to model
pi agent <name> "<msg1>" "<msg2>"  # Multiple messages in sequence
pi agent <name> -i                 # Interactive chat mode
pi agent <name> -i -c              # Continue previous session

# Standalone OpenAI-compatible agent (works with any API)
pi-agent --base-url http://localhost:8000/v1 --model llama-3.1 "Hello"
pi-agent --api-key sk-... "What is 2+2?"  # Uses OpenAI by default
pi-agent --json "What is 2+2?"            # Output event stream as JSONL
pi-agent -i                               # Interactive mode
```

The agent includes tools for file operations (read, list, bash, glob, rg) to test agentic capabilities, particularly useful for code navigation and analysis tasks.

## Predefined Model Configurations

`pi` includes predefined configurations for popular agentic models, so you do not have to specify `--vllm` arguments manually. `pi` will also check whether the selected model can actually run on your pod, given its GPU count and available VRAM. Run `pi start` without additional arguments to see a list of predefined models that can run on the active pod.

### Qwen Models
```bash
# Qwen2.5-Coder-32B - Excellent coding model, fits on a single H100/H200
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen

# Qwen3-Coder-30B - Advanced reasoning with tool use
pi start Qwen/Qwen3-Coder-30B-A3B-Instruct --name qwen3

# Qwen3-Coder-480B - State-of-the-art on 8xH200 (data-parallel mode)
pi start Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --name qwen-480b
```

### GPT-OSS Models
```bash
# Requires special vLLM build during setup
pi pods setup gpt-pod "ssh root@1.2.3.4" --models-path /workspace --vllm gpt-oss

# GPT-OSS-20B - Fits on 16GB+ VRAM
pi start openai/gpt-oss-20b --name gpt20

# GPT-OSS-120B - Needs 60GB+ VRAM
pi start openai/gpt-oss-120b --name gpt120
```

### GLM Models
```bash
# GLM-4.5 - Requires 8-16 GPUs, includes thinking mode
pi start zai-org/GLM-4.5 --name glm

# GLM-4.5-Air - Smaller version, 1-2 GPUs
pi start zai-org/GLM-4.5-Air --name glm-air
```

### Custom Models with --vllm

For models not in the predefined list, use `--vllm` to pass arguments directly to vLLM:

```bash
# DeepSeek with custom settings
pi start deepseek-ai/DeepSeek-V3 --name deepseek --vllm \
  --tensor-parallel-size 4 --trust-remote-code

# Mistral with pipeline parallelism
pi start mistralai/Mixtral-8x22B-Instruct-v0.1 --name mixtral --vllm \
  --tensor-parallel-size 8 --pipeline-parallel-size 2

# Any model with a specific tool parser
pi start some/model --name mymodel --vllm \
  --tool-call-parser hermes --enable-auto-tool-choice
```

## DataCrunch Setup

DataCrunch offers the best experience, with shared NFS storage across pods:

### 1. Create Shared Filesystem (SFS)
- Go to DataCrunch dashboard → Storage → Create SFS
- Choose size and datacenter
- Note the mount command (e.g., `sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/hf-models-fin02-8ac1bab7 /mnt/hf-models-fin02`)

### 2. Create GPU Instance
- Create the instance in the same datacenter as the SFS
- Share the SFS with the instance
- Get the SSH command from the dashboard

### 3. Setup with pi
```bash
# Get mount command from DataCrunch dashboard
pi pods setup dc1 "ssh root@instance.datacrunch.io" \
  --mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/your-pseudo /mnt/hf-models"

# Models automatically stored in /mnt/hf-models (extracted from mount command)
```

### 4. Benefits
- Models persist across instance restarts
- Share models between multiple instances in the same datacenter
- Download once, use everywhere
- Pay only for storage, not compute time, during downloads

## RunPod Setup

RunPod offers good persistent storage with network volumes:

### 1. Create Network Volume (optional)
- Go to RunPod dashboard → Storage → Create Network Volume
- Choose size and region

### 2. Create GPU Pod
- Select "Network Volume" during pod creation (if using one)
- Attach your volume to `/runpod-volume`
- Get the SSH command from the pod details

### 3. Setup with pi
```bash
# With network volume
pi pods setup runpod "ssh root@pod.runpod.io" --models-path /runpod-volume

# Or use workspace (persists with pod but not shareable)
pi pods setup runpod "ssh root@pod.runpod.io" --models-path /workspace
```

## Multi-GPU Support

### Automatic GPU Assignment
When running multiple models, pi automatically assigns them to different GPUs:
```bash
pi start model1 --name m1  # Auto-assigns to GPU 0
pi start model2 --name m2  # Auto-assigns to GPU 1
pi start model3 --name m3  # Auto-assigns to GPU 2
```

### Specify GPU Count for Predefined Models
For predefined models with multiple configurations, use `--gpus` to control GPU usage:
```bash
# Run Qwen on 1 GPU instead of all available
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen --gpus 1

# Run GLM-4.5 on 8 GPUs (if it has an 8-GPU config)
pi start zai-org/GLM-4.5 --name glm --gpus 8
```

If the model doesn't have a configuration for the requested GPU count, you'll see the available options.

### Tensor Parallelism for Large Models
For models that don't fit on a single GPU:
```bash
# Shard across 4 GPUs with tensor parallelism
pi start meta-llama/Llama-3.1-70B-Instruct --name llama70b --vllm \
  --tensor-parallel-size 4

# Data parallelism with expert parallelism across 8 GPUs
pi start Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --name qwen480 --vllm \
  --data-parallel-size 8 --enable-expert-parallel
```

## API Integration

All models expose OpenAI-compatible endpoints:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://your-pod-ip:8001/v1",
    api_key="your-pi-api-key"
)

# Chat completion with tool calling
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "execute_code",
            "description": "Execute Python code",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"}
                },
                "required": ["code"]
            }
        }
    }],
    tool_choice="auto"
)
```

## Standalone Agent CLI

`pi` includes a standalone OpenAI-compatible agent that can work with any API:

```bash
# Install globally to get the pi-agent command
npm install -g @mariozechner/pi

# Use with OpenAI
pi-agent --api-key sk-... "What is machine learning?"

# Use with local vLLM
pi-agent --base-url http://localhost:8000/v1 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --api-key dummy \
  "Explain quantum computing"

# Interactive mode
pi-agent -i

# Continue previous session
pi-agent --continue "Follow up question"

# Custom system prompt
pi-agent --system-prompt "You are a Python expert" "Write a web scraper"

# Use responses API (for GPT-OSS models)
pi-agent --api responses --model openai/gpt-oss-20b "Hello"
```

The agent supports:
- Session persistence across conversations
- Interactive TUI mode with syntax highlighting
- File system tools (read, list, bash, glob, rg) for code navigation
- Both Chat Completions and Responses API formats
- Custom system prompts

## Tool Calling Support

`pi` automatically configures appropriate tool calling parsers for known models:

- **Qwen models**: `hermes` parser (Qwen3-Coder uses `qwen3_coder`)
- **GLM models**: `glm4_moe` parser with reasoning support
- **GPT-OSS models**: Uses the `/v1/responses` endpoint, as tool calling (function calling in OpenAI parlance) is currently a [WIP with the `v1/chat/completions` endpoint](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#tool-use).
- **Custom models**: Specify with `--vllm --tool-call-parser <parser> --enable-auto-tool-choice`

To disable tool calling:
```bash
pi start model --name mymodel --vllm --disable-tool-call-parser
```

## Memory and Context Management

### GPU Memory Allocation
Controls how much GPU memory vLLM pre-allocates:
- `--memory 30%`: High concurrency, limited context
- `--memory 50%`: Balanced (default)
- `--memory 90%`: Maximum context, low concurrency

### Context Window
Sets the maximum input + output tokens:
- `--context 4k`: 4,096 tokens total
- `--context 32k`: 32,768 tokens total
- `--context 128k`: 131,072 tokens total

Example for a coding workload:
```bash
# Large context for code analysis, moderate concurrency
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name coder \
  --context 64k --memory 70%
```

**Note**: When using `--vllm`, the `--memory`, `--context`, and `--gpus` parameters are ignored. You'll see a warning if you try to use them together.

## Session Persistence

The interactive agent mode (`-i`) saves sessions for each project directory:

```bash
# Start new session
pi agent qwen -i

# Continue previous session (maintains chat history)
pi agent qwen -i -c
```

Sessions are stored in `~/.pi/sessions/`, organized by project path, and include:
- Complete conversation history
- Tool call results
- Token usage statistics

## Architecture & Event System

The agent uses a unified event-based architecture where all interactions flow through `AgentEvent` types. This enables:
- Consistent UI rendering across console and TUI modes
- Session recording and replay
- Clean separation between API calls and UI updates
- JSON output mode for programmatic integration

Events are automatically converted to the appropriate API format (Chat Completions or Responses) based on the model type.
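As an illustrative sketch (not pi's actual implementation), converting events into Chat Completions-style messages might look like this, using the event shapes from the JSON output examples in this README:

```python
# Illustrative only: event shapes follow the JSONL examples in this
# README; the real converter also handles tool calls and streaming.
def events_to_chat_messages(events):
    messages = []
    for event in events:
        if event["type"] == "user_message":
            messages.append({"role": "user", "content": event["text"]})
        elif event["type"] == "assistant_message":
            messages.append({"role": "assistant", "content": event["text"]})
        # Other event types (assistant_start, token_usage, ...) carry
        # no message content and are skipped here.
    return messages

msgs = events_to_chat_messages([
    {"type": "user_message", "text": "What is 2+2?"},
    {"type": "assistant_start"},
    {"type": "assistant_message", "text": "2 + 2 = 4"},
])
print(len(msgs))  # 2
```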

### JSON Output Mode

Use the `--json` flag to output the event stream as JSONL (JSON Lines) for programmatic consumption:
```bash
pi-agent --api-key sk-... --json "What is 2+2?"
```

Each line is a complete JSON object representing an event:
```jsonl
{"type":"user_message","text":"What is 2+2?"}
{"type":"assistant_start"}
{"type":"assistant_message","text":"2 + 2 = 4"}
{"type":"token_usage","inputTokens":10,"outputTokens":5,"totalTokens":15,"cacheReadTokens":0,"cacheWriteTokens":0}
```
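A downstream consumer can parse each line independently. A minimal sketch using the sample events above (in practice you would read lines from the `pi-agent --json` process's stdout):

```python
import json

# Sample event stream as shown above; in practice, read line by line
# from the pi-agent process's stdout.
stream = """{"type":"user_message","text":"What is 2+2?"}
{"type":"assistant_start"}
{"type":"assistant_message","text":"2 + 2 = 4"}
{"type":"token_usage","inputTokens":10,"outputTokens":5,"totalTokens":15,"cacheReadTokens":0,"cacheWriteTokens":0}"""

answer, total_tokens = None, 0
for line in stream.splitlines():
    event = json.loads(line)
    if event["type"] == "assistant_message":
        answer = event["text"]
    elif event["type"] == "token_usage":
        total_tokens += event["totalTokens"]

print(answer)        # 2 + 2 = 4
print(total_tokens)  # 15
```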

## Troubleshooting

### OOM (Out of Memory) Errors
- Reduce the `--memory` percentage
- Use a smaller model or a quantized version (FP8)
- Reduce the `--context` size

### Model Won't Start
```bash
# Check GPU usage
pi ssh "nvidia-smi"

# Check if port is in use
pi list

# Force stop all models
pi stop
```

### Tool Calling Issues
- Not all models support tool calling reliably
- Try a different parser: `--vllm --tool-call-parser mistral`
- Or disable it: `--vllm --disable-tool-call-parser`

### Access Denied for Models
Some models (Llama, Mistral) require HuggingFace access approval. Visit the model page and click "Request access".

### vLLM Build Issues
If setup with `--vllm nightly` fails, try:
- Using `--vllm release` for the stable version
- Checking CUDA compatibility with `pi ssh "nvidia-smi"`

### Agent Not Finding Messages
If the agent shows its configuration instead of your message, quote messages that contain special characters:
```bash
# Good
pi agent qwen "What is this file about?"

# Bad (shell might interpret special chars)
pi agent qwen What is this file about?
```

## Advanced Usage

### Working with Multiple Pods
```bash
# Override active pod for any command
pi start model --name test --pod dev-pod
pi list --pod prod-pod
pi stop test --pod dev-pod
```

### Custom vLLM Arguments
```bash
# Pass any vLLM argument after --vllm
pi start model --name custom --vllm \
  --quantization awq \
  --enable-prefix-caching \
  --max-num-seqs 256 \
  --gpu-memory-utilization 0.95
```

### Monitoring
```bash
# Watch GPU utilization
pi ssh "watch -n 1 nvidia-smi"

# Check model downloads
pi ssh "du -sh ~/.cache/huggingface/hub/*"

# View all logs
pi ssh "ls -la ~/.vllm_logs/"

# Check agent session history
ls -la ~/.pi/sessions/
```

## Environment Variables

- `HF_TOKEN` - HuggingFace token for model downloads
- `PI_API_KEY` - API key for vLLM endpoints
- `PI_CONFIG_DIR` - Config directory (default: `~/.pi`)
- `OPENAI_API_KEY` - Used by `pi-agent` when no `--api-key` is provided
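For instance, a script wrapping `pi` might resolve the config directory the same way (a sketch; only the `~/.pi` default is documented above):

```python
import os

def resolve_config_dir(env=os.environ):
    # PI_CONFIG_DIR overrides the default of ~/.pi (see the list above);
    # expanduser resolves the "~" prefix to the user's home directory.
    return env.get("PI_CONFIG_DIR") or os.path.expanduser("~/.pi")

print(resolve_config_dir({"PI_CONFIG_DIR": "/tmp/pi-test"}))  # /tmp/pi-test
```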

## License

MIT
189
packages/pods/docs/gml-4.5.md
Normal file
|
||||||
|
# GLM-4.5
|
||||||
|
|
||||||
|
[中文阅读](./README_zh.md)
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
<img src=resources/logo.svg width="15%"/>
|
||||||
|
</div>
|
||||||
|
<p align="center">
|
||||||
|
👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> or <a href="https://discord.gg/QR7SARHRxK" target="_blank">Discord</a> community.
|
||||||
|
<br>
|
||||||
|
📖 Check out the GLM-4.5 <a href="https://z.ai/blog/glm-4.5" target="_blank">technical blog</a>.
|
||||||
|
<br>
|
||||||
|
📍 Use GLM-4.5 API services on <a href="https://docs.z.ai/guides/llm/glm-4.5">Z.ai API Platform (Global)</a> or <br> <a href="https://docs.bigmodel.cn/cn/guide/models/text/glm-4.5">Zhipu AI Open Platform (Mainland China)</a>.
|
||||||
|
<br>
|
||||||
|
👉 One click to <a href="https://chat.z.ai">GLM-4.5</a>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Model Introduction
|
||||||
|
|
||||||
|
The **GLM-4.5** series models are foundation models designed for intelligent agents. GLM-4.5 has **355** billion total
|
||||||
|
parameters with **32** billion active parameters, while GLM-4.5-Air adopts a more compact design with **106** billion
|
||||||
|
total parameters and **12** billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent
|
||||||
|
capabilities to meet the complex demands of intelligent agent applications.
|
||||||
|
|
||||||
|
Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and
|
||||||
|
tool usage, and non-thinking mode for immediate responses.
|
||||||
|
|
||||||
|
We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both
|
||||||
|
GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for
|
||||||
|
secondary development.
|
||||||
|
|
||||||
|
As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional
|
||||||
|
performance with a score of **63.2**, in the **3rd** place among all the proprietary and open-source models. Notably,
|
||||||
|
GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
For more eval results, show cases, and technical details, please visit
|
||||||
|
our [technical blog](https://z.ai/blog/glm-4.5). The technical report will be released soon.
|
||||||
|
|
||||||
|
The model code, tool parser and reasoning parser can be found in the implementation
|
||||||
|
of [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py)
|
||||||
|
and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).
|
||||||
|
|
||||||
|
## Model Downloads
|
||||||
|
|
||||||
|
You can directly experience the model on [Hugging Face](https://huggingface.co/spaces/zai-org/GLM-4.5-Space)
|
||||||
|
or [ModelScope](https://modelscope.cn/studios/ZhipuAI/GLM-4.5-Demo) or download the model by following the links below.
|
||||||
|
|
||||||
|
| Model | Download Links | Model Size | Precision |
|
||||||
|
|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|------------|-----------|
|
||||||
|
| GLM-4.5 | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5) | 355B-A32B | BF16 |
|
||||||
|
| GLM-4.5-Air | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5-Air)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air) | 106B-A12B | BF16 |
|
||||||
|
| GLM-4.5-FP8 | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5-FP8)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5-FP8) | 355B-A32B | FP8 |
|
||||||
|
| GLM-4.5-Air-FP8 | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5-Air-FP8)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air-FP8) | 106B-A12B | FP8 |
|
||||||
|
| GLM-4.5-Base | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5-Base)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5-Base) | 355B-A32B | BF16 |
|
||||||
|
| GLM-4.5-Air-Base | [🤗 Hugging Face](https://huggingface.co/zai-org/GLM-4.5-Air-Base)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air-Base) | 106B-A12B | BF16 |
|
||||||
|
|
||||||
|
## System Requirements
|
||||||
|
|
||||||
|
### Inference
|
||||||
|
|
||||||
|
We provide minimum and recommended configurations for "full-featured" model inference. The data in the table below is
|
||||||
|
based on the following conditions:
|
||||||
|
|
||||||
|
1. All models use MTP layers and specify
|
||||||
|
`--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4` to ensure competitive
|
||||||
|
inference speed.
|
||||||
|
2. The `cpu-offload` parameter is not used.
|
||||||
|
3. Inference batch size does not exceed `8`.
|
||||||
|
4. All are executed on devices that natively support FP8 inference, ensuring both weights and cache are in FP8 format.
|
||||||
|
5. Server memory must exceed `1T` to ensure normal model loading and operation.
|
||||||
|
|
||||||
|
The models can run under the configurations in the table below:
|
||||||
|
|
||||||
|
| Model | Precision | GPU Type and Count | Test Framework |
|
||||||
|
|-------------|-----------|----------------------|----------------|
|
||||||
|
| GLM-4.5 | BF16 | H100 x 16 / H200 x 8 | sglang |
|
||||||
|
| GLM-4.5 | FP8 | H100 x 8 / H200 x 4 | sglang |
|
||||||
|
| GLM-4.5-Air | BF16 | H100 x 4 / H200 x 2 | sglang |
|
||||||
|
| GLM-4.5-Air | FP8 | H100 x 2 / H200 x 1 | sglang |
|
||||||
|
|
||||||
|
Under the configurations in the table below, the models can utilize their full 128K context length:
|
||||||
|
|
||||||
|
| Model | Precision | GPU Type and Count | Test Framework |
|
||||||
|
|-------------|-----------|-----------------------|----------------|
|
||||||
|
| GLM-4.5 | BF16 | H100 x 32 / H200 x 16 | sglang |
|
||||||
|
| GLM-4.5 | FP8 | H100 x 16 / H200 x 8 | sglang |
|
||||||
|
| GLM-4.5-Air | BF16 | H100 x 8 / H200 x 4 | sglang |
|
||||||
|
| GLM-4.5-Air | FP8 | H100 x 4 / H200 x 2 | sglang |
|
||||||
|
|
||||||
|
### Fine-tuning
|
||||||
|
|
||||||
|
The code can run under the configurations in the table below
|
||||||
|
using [Llama Factory](https://github.com/hiyouga/LLaMA-Factory):
|
||||||
|
|
||||||
|
| Model | GPU Type and Count | Strategy | Batch Size (per GPU) |
|
||||||
|
|-------------|--------------------|----------|----------------------|
|
||||||
|
| GLM-4.5 | H100 x 16 | Lora | 1 |
|
||||||
|
| GLM-4.5-Air | H100 x 4 | Lora | 1 |
|
||||||
|
|
||||||
|
The code can run under the configurations in the table below using [Swift](https://github.com/modelscope/ms-swift):
|
||||||
|
|
||||||
|
| Model | GPU Type and Count | Strategy | Batch Size (per GPU) |
|
||||||
|
|-------------|--------------------|----------|----------------------|
|
||||||
|
| GLM-4.5 | H20 (96GiB) x 16 | Lora | 1 |
|
||||||
|
| GLM-4.5-Air | H20 (96GiB) x 4 | Lora | 1 |
|
||||||
|
| GLM-4.5 | H20 (96GiB) x 128 | SFT | 1 |
|
||||||
|
| GLM-4.5-Air | H20 (96GiB) x 32 | SFT | 1 |
|
||||||
|
| GLM-4.5 | H20 (96GiB) x 128 | RL | 1 |
|
||||||
|
| GLM-4.5-Air | H20 (96GiB) x 32 | RL | 1 |
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
Please install the required packages according to `requirements.txt`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### transformers
|
||||||
|
|
||||||
|
Please refer to the `trans_infer_cli.py` code in the `inference` folder.

### vLLM

+ Both BF16 and FP8 can be started with the following code:

```shell
vllm serve zai-org/GLM-4.5-Air \
    --tensor-parallel-size 8 \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --served-model-name glm-4.5-air
```

If you're using 8x H100 GPUs and encounter insufficient memory when running the GLM-4.5 model, add `--cpu-offload-gb 16` (only applicable to vLLM).

If you encounter `flashinfer` issues, set `VLLM_ATTENTION_BACKEND=XFORMERS` as a temporary workaround. You can also specify `TORCH_CUDA_ARCH_LIST='9.0+PTX'` to use `flashinfer` (different GPUs require different `TORCH_CUDA_ARCH_LIST` values; please check accordingly).

### SGLang

+ BF16

```shell
python3 -m sglang.launch_server \
    --model-path zai-org/GLM-4.5-Air \
    --tp-size 8 \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.7 \
    --served-model-name glm-4.5-air \
    --host 0.0.0.0 \
    --port 8000
```

+ FP8

```shell
python3 -m sglang.launch_server \
    --model-path zai-org/GLM-4.5-Air-FP8 \
    --tp-size 4 \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.7 \
    --disable-shared-experts-fusion \
    --served-model-name glm-4.5-air-fp8 \
    --host 0.0.0.0 \
    --port 8000
```

### Request Parameter Instructions

+ When using `vLLM` and `SGLang`, thinking mode is enabled by default when sending requests. To disable thinking, add the `extra_body={"chat_template_kwargs": {"enable_thinking": False}}` parameter.
+ Both support tool calling. Please use the OpenAI-style tool description format for calls.
+ For specific code, please refer to `api_request.py` in the `inference` folder.
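The `extra_body` parameter mentioned above can be passed straight through the OpenAI SDK. A minimal sketch, assuming a server started as shown and reachable at `http://localhost:8000/v1` (the helper name is ours, not part of the repo):

```python
def build_glm_request(messages, enable_thinking=False, model="glm-4.5-air"):
    """Build kwargs for client.chat.completions.create() against a GLM-4.5 server.

    Thinking mode is on by default server-side; we pass the documented
    chat_template_kwargs switch through extra_body to turn it off.
    """
    return {
        "model": model,
        "messages": messages,
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }

# Usage with the OpenAI SDK (endpoint and API key are placeholders):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   resp = client.chat.completions.create(
#       **build_glm_request([{"role": "user", "content": "Hello"}]))
```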
233 packages/pods/docs/gpt-oss.md Normal file
@ -0,0 +1,233 @@
## `gpt-oss` vLLM Usage Guide

`gpt-oss-20b` and `gpt-oss-120b` are powerful reasoning models open-sourced by OpenAI.
In vLLM, you can run them on NVIDIA H100, H200, and B200, as well as AMD MI300x, MI325x, MI355x, and Radeon AI PRO R9700.
We are actively working on ensuring these models work on Ampere, Ada Lovelace, and RTX 5090.
Specifically, vLLM optimizes for the `gpt-oss` family of models with:

* **Flexible parallelism options**: the model can be sharded across 2, 4, or 8 GPUs, scaling throughput.
* **High-performance attention and MoE kernels**: the attention kernel is specifically optimized for the attention-sinks mechanism and sliding-window shapes.
* **Asynchronous scheduling**: maximizing utilization and throughput by overlapping CPU operations with GPU operations.

This is a living document and we welcome contributions, corrections, and new recipes!

## Quickstart

### Installation

We highly recommend using a new virtual environment, as the first iteration of the release requires cutting-edge kernels from various dependencies that might not work with other models. In particular, we will be installing: a prerelease version of vLLM, PyTorch nightly, Triton nightly, FlashInfer prerelease, Hugging Face prerelease, Harmony, and the gpt-oss library tools.

```
uv venv
source .venv/bin/activate

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match
```

We also provide a docker container with all the dependencies built in:

```
docker run --gpus all \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:gptoss \
    --model openai/gpt-oss-20b
```
### H100 & H200

You can serve the model with its default parameters:

* `--async-scheduling` can be enabled for higher performance. Currently it is not compatible with structured output.
* We recommend TP=2 for H100 and H200 as the best performance tradeoff point.

```
# openai/gpt-oss-20b should run on a single GPU
vllm serve openai/gpt-oss-20b --async-scheduling

# gpt-oss-120b will fit in a single H100/H200, but scaling it to higher TP sizes can help with throughput
vllm serve openai/gpt-oss-120b --async-scheduling
vllm serve openai/gpt-oss-120b --tensor-parallel-size 2 --async-scheduling
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
```
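Once one of the serve commands above is launched, readiness can be detected by polling the server's `/health` endpoint, which vLLM's OpenAI-compatible server exposes. A minimal sketch, with the poll function injectable so the logic is testable (the helper name and defaults are ours):

```python
import time
import urllib.error
import urllib.request


def wait_for_vllm(url="http://localhost:8000/health", timeout_s=600,
                  poll=lambda u: urllib.request.urlopen(u).status):
    """Poll vLLM's /health endpoint until it returns 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if poll(url) == 200:
                return True
        except (urllib.error.URLError, ConnectionError):
            pass  # server not accepting connections yet
        time.sleep(2)
    return False
```

Call `wait_for_vllm()` with the defaults after starting the server; it returns `False` on timeout instead of raising.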
### B200

NVIDIA Blackwell requires installation of the FlashInfer library and several environment variables to enable the necessary kernels. We recommend TP=1 as a starting point for a performant option. We are actively working on the performance of vLLM on Blackwell.

```
# All 3 of these are required
export VLLM_USE_TRTLLM_ATTENTION=1
export VLLM_USE_TRTLLM_DECODE_ATTENTION=1
export VLLM_USE_TRTLLM_CONTEXT_ATTENTION=1

# Pick only one out of the two.
# mxfp8 activation for MoE: faster, but higher risk for accuracy.
export VLLM_USE_FLASHINFER_MXFP4_MOE=1
# bf16 activation for MoE: matches reference precision.
export VLLM_USE_FLASHINFER_MXFP4_BF16_MOE=1

# openai/gpt-oss-20b
vllm serve openai/gpt-oss-20b --async-scheduling

# gpt-oss-120b
vllm serve openai/gpt-oss-120b --async-scheduling
vllm serve openai/gpt-oss-120b --tensor-parallel-size 2 --async-scheduling
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
```
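Because the two MXFP4 MoE flags above are mutually exclusive, a launch script can guard against setting both (or neither) before calling `vllm serve`. A small sketch under that assumption, using the flag names exactly as exported above (the helper itself is ours):

```python
def check_blackwell_env(env):
    """Validate the Blackwell recipe: all three TRTLLM attention flags set,
    and exactly one of the two MXFP4 MoE activation flags chosen."""
    required = [
        "VLLM_USE_TRTLLM_ATTENTION",
        "VLLM_USE_TRTLLM_DECODE_ATTENTION",
        "VLLM_USE_TRTLLM_CONTEXT_ATTENTION",
    ]
    missing = [k for k in required if env.get(k) != "1"]
    moe_flags = ["VLLM_USE_FLASHINFER_MXFP4_MOE",
                 "VLLM_USE_FLASHINFER_MXFP4_BF16_MOE"]
    chosen = [k for k in moe_flags if env.get(k) == "1"]
    if missing:
        return f"missing required flags: {', '.join(missing)}"
    if len(chosen) != 1:
        return f"exactly one MoE flag must be set, found {len(chosen)}"
    return "ok"
```

Run it against `os.environ` in your launch script and abort unless it returns `"ok"`.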
### AMD

ROCm supports the OpenAI gpt-oss-120b and gpt-oss-20b models on these 3 different GPUs on day one, along with the pre-built docker containers:

* gfx950: MI350x series, `rocm/vllm-dev:open-mi355-08052025`
* gfx942: MI300x/MI325 series, `rocm/vllm-dev:open-mi300-08052025`
* gfx1201: Radeon AI PRO R9700, `rocm/vllm-dev:open-r9700-08052025`

To run the container:

```
alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G -v /data:/data -v $HOME:/myhome -w /myhome'

drun rocm/vllm-dev:open-mi300-08052025
```

For MI300x and R9700:

```
export VLLM_ROCM_USE_AITER=1
export VLLM_USE_AITER_UNIFIED_ATTENTION=1
export VLLM_ROCM_USE_AITER_MHA=0

vllm serve openai/gpt-oss-120b --compilation-config '{"full_cuda_graph": true}'
```

For MI355x:

```
# MoE preshuffle, fusion and Triton GEMM flags
export VLLM_USE_AITER_TRITON_FUSED_SPLIT_QKV_ROPE=1
export VLLM_USE_AITER_TRITON_FUSED_ADD_RMSNORM_PAD=1
export VLLM_USE_AITER_TRITON_GEMM=1
export VLLM_ROCM_USE_AITER=1
export VLLM_USE_AITER_UNIFIED_ATTENTION=1
export VLLM_ROCM_USE_AITER_MHA=0
export TRITON_HIP_PRESHUFFLE_SCALES=1

vllm serve openai/gpt-oss-120b --compilation-config '{"compile_sizes": [1, 2, 4, 8, 16, 24, 32, 64, 128, 256, 4096, 8192], "full_cuda_graph": true}' --block-size 64
```
## Usage

Once `vllm serve` is running and `INFO: Application startup complete` has been displayed, you can send requests via HTTP or the OpenAI SDK to the following endpoints:

* The `/v1/responses` endpoint can perform tool use (browsing, python, MCP) in between chain-of-thought and deliver a final response. This endpoint leverages the `openai-harmony` library for input rendering and output parsing. Stateful operation and the full streaming API are work in progress. The Responses API is recommended by OpenAI as the way to interact with this model.
* The `/v1/chat/completions` endpoint offers a familiar interface to this model. No tool will be invoked, but reasoning and final text output are returned structurally. Function calling is work in progress. You can also set the parameter `include_reasoning: false` in the request to exclude the CoT from the output.
* The `/v1/completions` endpoint is a simple input/output interface without any sort of template rendering.

All endpoints accept `stream: true` to enable incremental token streaming. Please note that vLLM currently does not cover the full scope of the Responses API; for more detail, please see the Known Limitations section below.
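The `/v1/chat/completions` options above can be assembled into a request body like this; a sketch assuming a server at the default port (the helper is our own, and `include_reasoning` is the per-request flag described above):

```python
import json


def chat_request(prompt, model="openai/gpt-oss-20b", include_reasoning=True, stream=False):
    """Assemble a /v1/chat/completions request body for a gpt-oss server.

    include_reasoning=False asks the server to drop the CoT from the output;
    stream=True switches the response to incremental token streaming.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    if not include_reasoning:
        body["include_reasoning"] = False
    return json.dumps(body)

# POST the result to http://localhost:8000/v1/chat/completions with
# Content-Type: application/json (e.g. via curl or the OpenAI SDK).
```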
### Tool Use

One premier feature of gpt-oss is the ability to call tools directly, so-called "built-in tools". In vLLM, we offer several options:

* By default, we integrate with the reference library's browser (with `ExaBackend`) and demo Python interpreter via a docker container. In order to use the search backend, you need to get access to [exa.ai](http://exa.ai) and set `EXA_API_KEY=` as an environment variable. For Python, either have docker available, or set `PYTHON_EXECUTION_BACKEND=UV` to dangerously allow model-generated code snippets to be executed on the same machine.

```
uv pip install gpt-oss

vllm serve ... --tool-server demo
```

* Please note that the default options are simply for demo purposes. For production usage, vLLM itself can act as an MCP client to multiple services.
  Here is an [example tool server](https://github.com/openai/gpt-oss/tree/main/gpt-oss-mcp-server) that vLLM can work with; it wraps the demo tools:

```
mcp run -t sse browser_server.py:mcp
mcp run -t sse python_server.py:mcp

vllm serve ... --tool-server ip-1:port-1,ip-2:port-2
```

The URLs are expected to be MCP SSE servers that implement `instructions` in their server info and provide well-documented tools. The tools will be injected into the system prompt for the model to enable them.
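The `--tool-server ip-1:port-1,ip-2:port-2` value above is a comma-separated list of host:port pairs (or the literal `demo`). A sketch of parsing such a value, for scripts that wrap the serve command; the helper is ours, not a vLLM API:

```python
def parse_tool_servers(value):
    """Split a --tool-server value into (host, port) pairs.

    The special value "demo" selects the built-in demo tools instead.
    """
    if value == "demo":
        return "demo"
    servers = []
    for entry in value.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers
```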
## Accuracy Evaluation Panels

OpenAI recommends using the gpt-oss reference library to perform evaluation. For example:

```
python -m gpt_oss.evals --model 120b-low --eval gpqa --n-threads 128
python -m gpt_oss.evals --model 120b --eval gpqa --n-threads 128
python -m gpt_oss.evals --model 120b-high --eval gpqa --n-threads 128
```

To eval on AIME2025, change `gpqa` to `aime25`.
With vLLM deployed:

```
# Example deployment on 8xH100
vllm serve openai/gpt-oss-120b \
    --tensor_parallel_size 8 \
    --max-model-len 131072 \
    --max-num-batched-tokens 10240 \
    --max-num-seqs 128 \
    --gpu-memory-utilization 0.85 \
    --no-enable-prefix-caching
```

Here are the scores we were able to reproduce without tool use, and we encourage you to try reproducing them as well!
We've observed that the numbers may vary slightly across runs, so feel free to run the evaluation multiple times to get a sense of the variance.
For a quick correctness check, we recommend starting with the low reasoning effort setting (120b-low), which should complete within minutes.

Model: 120B

| Reasoning Effort | GPQA | AIME25 |
| :---- | :---- | :---- |
| Low | 65.3 | 51.2 |
| Mid | 72.4 | 79.6 |
| High | 79.4 | 93.0 |

Model: 20B

| Reasoning Effort | GPQA | AIME25 |
| :---- | :---- | :---- |
| Low | 56.8 | 38.8 |
| Mid | 67.5 | 75.0 |
| High | 70.9 | 85.8 |
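Since scores vary slightly across runs, repeated runs can be summarized with a mean and sample standard deviation; a small sketch (the score list in the test is illustrative, not measured data):

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Mean and sample standard deviation over repeated eval runs."""
    return {"mean": round(mean(scores), 2),
            "stdev": round(stdev(scores), 2) if len(scores) > 1 else 0.0}
```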
## Known Limitations

* On H100 with tensor parallel size 1, the default GPU memory utilization and batched-token settings will cause CUDA out-of-memory errors. When running TP=1, please increase the GPU memory utilization or lower the batched-token limit:

```
vllm serve openai/gpt-oss-120b --gpu-memory-utilization 0.95 --max-num-batched-tokens 1024
```

* When running TP=2 on H100, set your GPU memory utilization below 0.95, as higher values will also cause OOM.
* The Responses API has several limitations at the current moment; we strongly welcome contribution and maintenance of this service in vLLM:
  * Usage accounting is currently broken and only returns zeros.
  * Annotations (citing URLs from search results) are not supported.
  * Truncation by `max_tokens` might not preserve partial chunks.
  * Streaming is fairly barebones at the moment; for example:
    * Item ids and indexing need more work.
    * Tool invocations and outputs are not properly streamed, but rather batched.
    * Proper error handling is missing.

## Troubleshooting

- Attention sink dtype error on Blackwell:

```
ERROR 08-05 07:31:10 [multiproc_executor.py:559] assert sinks.dtype == torch.float32, "Sinks must be of type float32"
(VllmWorker TP0 pid=174579) ERROR 08-05 07:31:10 [multiproc_executor.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=174579) ERROR 08-05 07:31:10 [multiproc_executor.py:559] AssertionError: Sinks must be of type float32
```

**Solution: Please refer to the B200 section to check whether the related environment variables are set.**

- Triton issue related to `tl.language` not being defined:

**Solution: Make sure there is no other Triton installed in your environment (pytorch-triton, etc.).**
183 packages/pods/docs/implementation-plan.md Normal file
@ -0,0 +1,183 @@
# Implementation Plan

## Core Principles
- TypeScript throughout
- Clean, minimal code
- Self-contained modules
- Direct SSH execution (no remote manager)
- All state in local JSON

## Package 1: Pod Setup Script Generation
Generate and execute pod_setup.sh via SSH

- [ ] `src/setup/generate-setup-script.ts` - Generate bash script as string
  - [ ] Detect CUDA driver version
  - [ ] Determine CUDA toolkit version needed
  - [ ] Generate uv/Python install commands
  - [ ] Generate venv creation commands
  - [ ] Generate pip install commands (torch, vLLM, etc.)
  - [ ] Handle model-specific vLLM versions (e.g., gpt-oss needs 0.10.1+gptoss)
  - [ ] Generate mount commands if --mount provided
  - [ ] Generate env var setup (HF_TOKEN, PI_API_KEY)

- [ ] `src/setup/detect-hardware.ts` - Run nvidia-smi and parse GPU info
  - [ ] Execute nvidia-smi via SSH
  - [ ] Parse GPU count, names, memory
  - [ ] Return structured GPU info

- [ ] `src/setup/execute-setup.ts` - Main setup orchestrator
  - [ ] Generate setup script
  - [ ] Copy and execute via SSH
  - [ ] Stream output to console
  - [ ] Handle Ctrl+C properly
  - [ ] Save GPU info to local config

## Package 2: Config Management
Local JSON state management

- [ ] `src/config/types.ts` - TypeScript interfaces
  - [ ] Pod interface (ssh, gpus, models, mount)
  - [ ] Model interface (model, port, gpu, pid)
  - [ ] GPU interface (id, name, memory)

- [ ] `src/config/store.ts` - Read/write ~/.pi/pods.json
  - [ ] Load config (handle missing file)
  - [ ] Save config (atomic write)
  - [ ] Get active pod
  - [ ] Add/remove pods
  - [ ] Update model state

## Package 3: SSH Executor
Clean SSH command execution

- [ ] `src/ssh/executor.ts` - SSH command wrapper
  - [ ] Execute command with streaming output
  - [ ] Execute command with captured output
  - [ ] Handle SSH errors gracefully
  - [ ] Support Ctrl+C propagation
  - [ ] Support background processes (nohup)

## Package 4: Pod Commands
Pod management CLI commands

- [ ] `src/commands/pods-setup.ts` - pi pods setup
  - [ ] Parse args (name, ssh, mount)
  - [ ] Check env vars (HF_TOKEN, PI_API_KEY)
  - [ ] Call setup executor
  - [ ] Save pod to config

- [ ] `src/commands/pods-list.ts` - pi pods
  - [ ] Load config
  - [ ] Display all pods with active marker

- [ ] `src/commands/pods-active.ts` - pi pods active
  - [ ] Switch active pod
  - [ ] Update config

- [ ] `src/commands/pods-remove.ts` - pi pods remove
  - [ ] Remove from config (not remote)

## Package 5: Model Management
Model lifecycle management

- [ ] `src/models/model-config.ts` - Known model configurations
  - [ ] Load models.md data structure
  - [ ] Match hardware to vLLM args
  - [ ] Get model-specific env vars

- [ ] `src/models/download.ts` - Model download via HF
  - [ ] Check if model cached
  - [ ] Run huggingface-cli download
  - [ ] Stream progress to console
  - [ ] Handle Ctrl+C

- [ ] `src/models/vllm-builder.ts` - Build vLLM command
  - [ ] Get base command for model
  - [ ] Add hardware-specific args
  - [ ] Add user --vllm args
  - [ ] Add port and API key

## Package 6: Model Commands
Model management CLI commands

- [ ] `src/commands/start.ts` - pi start
  - [ ] Parse model and args
  - [ ] Find next available port
  - [ ] Select GPU (round-robin)
  - [ ] Download if needed
  - [ ] Build and execute vLLM command
  - [ ] Wait for health check
  - [ ] Update config on success

- [ ] `src/commands/stop.ts` - pi stop
  - [ ] Find model in config
  - [ ] Kill process via PID
  - [ ] Clean up config

- [ ] `src/commands/list.ts` - pi list
  - [ ] Show models from config
  - [ ] Optionally verify PIDs

- [ ] `src/commands/logs.ts` - pi logs
  - [ ] Tail log file via SSH
  - [ ] Handle Ctrl+C (stop tailing only)

## Package 7: Model Testing
Quick model testing with tools

- [ ] `src/prompt/tools.ts` - Tool definitions
  - [ ] Define ls, read, glob, rg tools
  - [ ] Format for OpenAI API

- [ ] `src/prompt/client.ts` - OpenAI client wrapper
  - [ ] Create client for model endpoint
  - [ ] Handle streaming responses
  - [ ] Display thinking, tools, content

- [ ] `src/commands/prompt.ts` - pi prompt
  - [ ] Get model endpoint from config
  - [ ] Augment prompt with CWD info
  - [ ] Send request with tools
  - [ ] Display formatted response

## Package 8: CLI Entry Point
Main CLI with commander.js

- [ ] `src/cli.ts` - Main entry point
  - [ ] Setup commander program
  - [ ] Register all commands
  - [ ] Handle global options (--pod override)
  - [ ] Error handling

- [ ] `src/index.ts` - Package exports

## Testing Strategy
- [ ] Test pod_setup.sh generation locally
- [ ] Test on local machine with GPU
- [ ] Test SSH executor with mock commands
- [ ] Test config management with temp files
- [ ] Integration test on real pod

## Dependencies
```json
{
  "dependencies": {
    "commander": "^12.0.0",
    "@commander-js/extra-typings": "^12.0.0",
    "openai": "^4.0.0",
    "chalk": "^5.0.0",
    "ora": "^8.0.0"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
    "typescript": "^5.0.0",
    "tsx": "^4.0.0"
  }
}
```

## Build & Distribution
- [ ] TypeScript config for Node.js target
- [ ] Build to dist/
- [ ] npm package with bin entry
- [ ] npx support
197 packages/pods/docs/kimi-k2.md Normal file
@ -0,0 +1,197 @@
# Kimi-K2 Deployment Guide

> [!Note]
> This guide only provides some examples of deployment commands for Kimi-K2, which may not be the optimal configuration. Since inference engines are still being updated frequently, please continue to follow the guidance from their homepages if you want to achieve better inference performance.

## vLLM Deployment

vLLM version v0.10.0rc1 or later is required.

The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platforms is a cluster of 16 GPUs with either Tensor Parallelism (TP) or "data parallelism + expert parallelism" (DP+EP).
Running parameters for this environment are provided below. You may scale up to more nodes and increase expert parallelism to enlarge the inference batch size and overall throughput.

### Tensor Parallelism

When the parallelism degree is ≤ 16, you can run inference with pure Tensor Parallelism. A sample launch command is:

``` bash
# start ray on node 0 and node 1

# node 0:
vllm serve $MODEL_PATH \
    --port 8000 \
    --served-model-name kimi-k2 \
    --trust-remote-code \
    --tensor-parallel-size 16 \
    --enable-auto-tool-choice \
    --tool-call-parser kimi_k2
```

**Key parameter notes:**
- `--tensor-parallel-size 16`: If using more than 16 GPUs, combine with pipeline parallelism.
- `--enable-auto-tool-choice`: Required when enabling tool usage.
- `--tool-call-parser kimi_k2`: Required when enabling tool usage.
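With the two tool flags above, the server accepts OpenAI-style tool descriptions in requests. A minimal sketch of such a description (the `get_weather` tool is purely illustrative, not part of the model or the repo):

```python
def make_weather_tool():
    """An OpenAI-style tool description, as accepted by a kimi-k2 endpoint
    served with --enable-auto-tool-choice --tool-call-parser kimi_k2."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }

# Pass it via the OpenAI SDK:
#   client.chat.completions.create(model="kimi-k2", messages=...,
#                                  tools=[make_weather_tool()])
```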
### Data Parallelism + Expert Parallelism

You can install libraries like DeepEP and DeepGEMM as needed. Then run the command (example on H200):

``` bash
# node 0
vllm serve $MODEL_PATH --port 8000 --served-model-name kimi-k2 --trust-remote-code --data-parallel-size 16 --data-parallel-size-local 8 --data-parallel-address $MASTER_IP --data-parallel-rpc-port $PORT --enable-expert-parallel --max-num-batched-tokens 8192 --max-num-seqs 256 --gpu-memory-utilization 0.85 --enable-auto-tool-choice --tool-call-parser kimi_k2

# node 1
vllm serve $MODEL_PATH --headless --data-parallel-start-rank 8 --port 8000 --served-model-name kimi-k2 --trust-remote-code --data-parallel-size 16 --data-parallel-size-local 8 --data-parallel-address $MASTER_IP --data-parallel-rpc-port $PORT --enable-expert-parallel --max-num-batched-tokens 8192 --max-num-seqs 256 --gpu-memory-utilization 0.85 --enable-auto-tool-choice --tool-call-parser kimi_k2
```
## SGLang Deployment

Similarly, we can use TP or DP+EP in SGLang for deployment; here are the examples.

### Tensor Parallelism

Here is a simple example of running TP16 across two H200 nodes:

``` bash
# Node 0
python -m sglang.launch_server --model-path $MODEL_PATH --tp 16 --dist-init-addr $MASTER_IP:50000 --nnodes 2 --node-rank 0 --trust-remote-code --tool-call-parser kimi_k2

# Node 1
python -m sglang.launch_server --model-path $MODEL_PATH --tp 16 --dist-init-addr $MASTER_IP:50000 --nnodes 2 --node-rank 1 --trust-remote-code --tool-call-parser kimi_k2
```

**Key parameter notes:**
- `--tool-call-parser kimi_k2`: Required when enabling tool usage.

### Data Parallelism + Expert Parallelism

Here is an example of large-scale Prefill-Decode disaggregation (4P12D on H200) with DP+EP in SGLang:

``` bash
# for prefill nodes
MC_TE_METRIC=true SGLANG_DISAGGREGATION_HEARTBEAT_INTERVAL=10000000 SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=100000 SGLANG_DISAGGREGATION_WAITING_TIMEOUT=100000 PYTHONUNBUFFERED=1 \
python -m sglang.launch_server --model-path $MODEL_PATH \
    --trust-remote-code --disaggregation-mode prefill --dist-init-addr $PREFILL_NODE0:5757 --tp-size 32 --dp-size 32 --enable-dp-attention --host $LOCAL_IP --decode-log-interval 1 --disable-radix-cache --enable-deepep-moe --moe-dense-tp-size 1 --enable-dp-lm-head --disable-shared-experts-fusion --watchdog-timeout 1000000 --enable-two-batch-overlap --disaggregation-ib-device $IB_DEVICE --chunked-prefill-size 131072 --mem-fraction-static 0.85 --deepep-mode normal --ep-dispatch-algorithm dynamic --eplb-algorithm deepseek --max-running-requests 1024 --nnodes 4 --node-rank $RANK --tool-call-parser kimi_k2

# for decode nodes
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=480 MC_TE_METRIC=true SGLANG_DISAGGREGATION_HEARTBEAT_INTERVAL=10000000 SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=100000 SGLANG_DISAGGREGATION_WAITING_TIMEOUT=100000 PYTHONUNBUFFERED=1 \
python -m sglang.launch_server --model-path $MODEL_PATH --trust-remote-code --disaggregation-mode decode --dist-init-addr $DECODE_NODE0:5757 --tp-size 96 --dp-size 96 --enable-dp-attention --host $LOCAL_IP --decode-log-interval 1 --context-length 2176 --disable-radix-cache --enable-deepep-moe --moe-dense-tp-size 1 --enable-dp-lm-head --disable-shared-experts-fusion --watchdog-timeout 1000000 --enable-two-batch-overlap --disaggregation-ib-device $IB_DEVICE --deepep-mode low_latency --mem-fraction-static 0.8 --cuda-graph-bs 480 --max-running-requests 46080 --ep-num-redundant-experts 96 --nnodes 12 --node-rank $RANK --tool-call-parser kimi_k2

# load balancer (pdlb)
PYTHONUNBUFFERED=1 python -m sglang.srt.disaggregation.launch_lb --prefill http://${PREFILL_NODE0}:30000 --decode http://${DECODE_NODE0}:30000
```
## KTransformers Deployment

Please copy all configuration files (i.e., everything except the .safetensors files) into the GGUF checkpoint folder at /path/to/K2. Then run:

``` bash
python ktransformers/server/main.py --model_path /path/to/K2 --gguf_path /path/to/K2 --cache_lens 30000
```

To enable AMX optimization, run:

``` bash
python ktransformers/server/main.py --model_path /path/to/K2 --gguf_path /path/to/K2 --cache_lens 30000 --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
```
## TensorRT-LLM Deployment

### Prerequisites
Please refer to [this guide](https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html) to build TensorRT-LLM v1.0.0-rc2 from source and start a TRT-LLM docker container.

Install blobfile:
```bash
pip install blobfile
```

### Multi-node Serving
TensorRT-LLM supports multi-node inference. You can use mpirun to launch Kimi-K2 as a multi-node job. We will use two nodes for this example.

#### mpirun
mpirun requires each node to have passwordless ssh access to the other node. We need to set up the environment inside the docker containers. Run the container with the host network and mount the current directory as well as the model directory into the container.

```bash
# use host network
IMAGE=<YOUR_IMAGE>
NAME=test_2node_docker
# host1
docker run -it --name ${NAME}_host1 --ipc=host --gpus=all --network host --privileged --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}:/workspace -v <YOUR_MODEL_DIR>:/models/DeepSeek-V3 -w /workspace ${IMAGE}
# host2
docker run -it --name ${NAME}_host2 --ipc=host --gpus=all --network host --privileged --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}:/workspace -v <YOUR_MODEL_DIR>:/models/DeepSeek-V3 -w /workspace ${IMAGE}
```

Set up ssh inside the containers:

```bash
apt-get update && apt-get install -y openssh-server

# modify /etc/ssh/sshd_config
PermitRootLogin yes
PubkeyAuthentication yes
# modify /etc/ssh/sshd_config, changing the default port 22 to another unused port
Port 2233
```

Generate an ssh key on host1 and copy it to host2, and vice versa.

```bash
# on host1
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
ssh-copy-id -i ~/.ssh/id_ed25519.pub root@<HOST2>
# on host2
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
ssh-copy-id -i ~/.ssh/id_ed25519.pub root@<HOST1>

# restart ssh service on host1 and host2
service ssh restart # or
/etc/init.d/ssh restart # or
systemctl restart ssh
```

Generate additional config for trtllm-serve:
```bash
cat >/path/to/TensorRT-LLM/extra-llm-api-config.yml <<EOF
cuda_graph_config:
  padding_enabled: true
  batch_sizes:
    - 1
    - 2
    - 4
    - 8
    - 16
    - 32
    - 64
    - 128
print_iter_log: true
enable_attention_dp: true
EOF
```

After these preparations, you can run trtllm-serve on the two nodes using mpirun:

```bash
mpirun -np 16 \
    -H <HOST1>:8,<HOST2>:8 \
    -mca plm_rsh_args "-p 2233" \
    --allow-run-as-root \
    trtllm-llmapi-launch trtllm-serve serve \
    --backend pytorch \
    --tp_size 16 \
    --ep_size 8 \
    --kv_cache_free_gpu_memory_fraction 0.95 \
    --trust_remote_code \
    --max_batch_size 128 \
    --max_num_tokens 4096 \
    --extra_llm_api_options /path/to/TensorRT-LLM/extra-llm-api-config.yml \
    --port 8000 \
    <YOUR_MODEL_DIR>
```

## Others

Kimi-K2 reuses the `DeepSeekV3CausalLM` architecture and converts its weights into the proper shape to save redevelopment effort. To let inference engines distinguish it from DeepSeek-V3 and apply the best optimizations, we set `"model_type": "kimi_k2"` in `config.json`.

If you are using a framework that is not on the recommended list, you can still run the model by manually changing `model_type` to "deepseek_v3" in `config.json` as a temporary workaround. You may need to manually parse tool calls in case no tool call parser is available in your framework.
116 packages/pods/docs/models.md Normal file
@ -0,0 +1,116 @@
### Qwen-Coder

- [ ] Qwen2.5-Coder-32B-Instruct
  - HF: Qwen/Qwen2.5-Coder-32B-Instruct
  - Hardware:
    - 1x H100/H200
      - --tool-call-parser hermes --enable-auto-tool-choice
    - 2x H100/H200
      - --tensor-parallel-size 2 --tool-call-parser hermes --enable-auto-tool-choice
  - Notes: Good balance of size and performance. Single GPU capable.

- [ ] Qwen3-Coder-480B-A35B-Instruct (BF16)
  - HF: Qwen/Qwen3-Coder-480B-A35B-Instruct
  - Hardware:
    - 8x H200/H20
      - --tensor-parallel-size 8 --max-model-len 32000 --enable-auto-tool-choice --tool-call-parser qwen3_coder
  - Notes: Cannot serve full 262K context on a single node. Reduce max-model-len or increase gpu-memory-utilization.

- [ ] Qwen3-Coder-480B-A35B-Instruct-FP8
  - HF: Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
  - Hardware:
    - 8x H200/H20
      - --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
  - Env: VLLM_USE_DEEP_GEMM=1
  - Notes: Use data-parallel mode (not tensor-parallel) to avoid weight quantization errors. DeepGEMM recommended.

- [ ] Qwen3-Coder-30B-A3B-Instruct (BF16)
  - HF: Qwen/Qwen3-Coder-30B-A3B-Instruct
  - Hardware:
    - 1x H100/H200
      - --enable-auto-tool-choice --tool-call-parser qwen3_coder
      - Notes: Fits comfortably on a single GPU. ~60GB model weights.
    - 2x H100/H200
      - --tensor-parallel-size 2 --enable-auto-tool-choice --tool-call-parser qwen3_coder
      - Notes: For higher throughput/longer context.

- [ ] Qwen3-Coder-30B-A3B-Instruct-FP8
  - HF: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
  - Hardware:
    - 1x H100/H200
      - --enable-auto-tool-choice --tool-call-parser qwen3_coder
  - Env: VLLM_USE_DEEP_GEMM=1
  - Notes: FP8 quantized, ~30GB model weights. Excellent for single-GPU deployment.
|
||||||
|
|
||||||
|
### GPT-OSS
|
||||||
|
- Notes: Requires vLLM 0.10.1+gptoss. Built-in tools via /v1/responses endpoint (browsing, Python). Function calling not yet supported. --async-scheduling recommended for higher perf (not compatible with structured output).
|
||||||
|
- [ ] GPT-OSS-20B
|
||||||
|
- HF: openai/gpt-oss-20b
|
||||||
|
- Hardware:
|
||||||
|
- 1x H100/H200
|
||||||
|
- --async-scheduling
|
||||||
|
- 1x B200
|
||||||
|
- --async-scheduling
|
||||||
|
- Env: VLLM_USE_TRTLLM_ATTENTION=1 VLLM_USE_TRTLLM_DECODE_ATTENTION=1 VLLM_USE_TRTLLM_CONTEXT_ATTENTION=1 VLLM_USE_FLASHINFER_MXFP4_MOE=1
|
||||||
|
- [ ] GPT-OSS-120B
|
||||||
|
- HF: openai/gpt-oss-120b
|
||||||
|
- Hardware:
|
||||||
|
- 1x H100/H200
|
||||||
|
- --async-scheduling
|
||||||
|
- Notes: Needs --gpu-memory-utilization 0.95 --max-num-batched-tokens 1024 to avoid OOM
|
||||||
|
- 2x H100/H200
|
||||||
|
- --tensor-parallel-size 2 --async-scheduling
|
||||||
|
- Notes: Set --gpu-memory-utilization <0.95 to avoid OOM
|
||||||
|
- 4x H100/H200
|
||||||
|
- --tensor-parallel-size 4 --async-scheduling
|
||||||
|
- 8x H100/H200
|
||||||
|
- --tensor-parallel-size 8 --async-scheduling --max-model-len 131072 --max-num-batched-tokens 10240 --max-num-seqs 128 --gpu-memory-utilization 0.85 --no-enable-prefix-caching
|
||||||
|
- 1x B200
|
||||||
|
- --async-scheduling
|
||||||
|
- Env: VLLM_USE_TRTLLM_ATTENTION=1 VLLM_USE_TRTLLM_DECODE_ATTENTION=1 VLLM_USE_TRTLLM_CONTEXT_ATTENTION=1 VLLM_USE_FLASHINFER_MXFP4_MOE=1
|
||||||
|
- 2x B200
|
||||||
|
- --tensor-parallel-size 2 --async-scheduling
|
||||||
|
- Env: VLLM_USE_TRTLLM_ATTENTION=1 VLLM_USE_TRTLLM_DECODE_ATTENTION=1 VLLM_USE_TRTLLM_CONTEXT_ATTENTION=1 VLLM_USE_FLASHINFER_MXFP4_MOE=1
|
||||||
|
|
||||||
|
### GLM-4.5
|
||||||
|
- Notes: Listed configs support reduced context. For full 128K context, double the GPU count. Models default to thinking mode (disable with API param).
|
||||||
|
- [ ] GLM-4.5 (BF16)
|
||||||
|
- HF: zai-org/GLM-4.5
|
||||||
|
- Hardware:
|
||||||
|
- 16x H100
|
||||||
|
- --tensor-parallel-size 16 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- 8x H200
|
||||||
|
- --tensor-parallel-size 8 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- Notes: On 8x H100, may need --cpu-offload-gb 16 to avoid OOM. For full 128K: needs 32x H100 or 16x H200.
|
||||||
|
- [ ] GLM-4.5-FP8
|
||||||
|
- HF: zai-org/GLM-4.5-FP8
|
||||||
|
- Hardware:
|
||||||
|
- 8x H100
|
||||||
|
- --tensor-parallel-size 8 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- 4x H200
|
||||||
|
- --tensor-parallel-size 4 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- Notes: For full 128K context: needs 16x H100 or 8x H200.
|
||||||
|
- [ ] GLM-4.5-Air (BF16)
|
||||||
|
- HF: zai-org/GLM-4.5-Air
|
||||||
|
- Hardware:
|
||||||
|
- 4x H100
|
||||||
|
- --tensor-parallel-size 4 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- 2x H200
|
||||||
|
- --tensor-parallel-size 2 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- Notes: For full 128K context: needs 8x H100 or 4x H200.
|
||||||
|
- [ ] GLM-4.5-Air-FP8
|
||||||
|
- HF: zai-org/GLM-4.5-Air-FP8
|
||||||
|
- Hardware:
|
||||||
|
- 2x H100
|
||||||
|
- --tensor-parallel-size 2 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- 1x H200
|
||||||
|
- --tensor-parallel-size 1 --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice
|
||||||
|
- Notes: For full 128K context: needs 4x H100 or 2x H200.
|
||||||
|
|
||||||
|
### Kimi
|
||||||
|
- Notes: Requires vLLM v0.10.0rc1+. Minimum 16 GPUs for FP8 with 128k context. Reuses DeepSeekV3 architecture with model_type="kimi_k2".
|
||||||
|
- [ ] Kimi-K2-Instruct
|
||||||
|
- HF: moonshotai/Kimi-K2-Instruct
|
||||||
|
- Hardware:
|
||||||
|
- 16x H200/H20
|
||||||
|
- --tensor-parallel-size 16 --trust-remote-code --enable-auto-tool-choice --tool-call-parser kimi_k2
|
||||||
|
- Notes: Pure TP mode. For >16 GPUs, combine with pipeline-parallelism.
|
||||||
|
- 16x H200/H20 (DP+EP mode)
|
||||||
|
- --data-parallel-size 16 --data-parallel-size-local 8 --enable-expert-parallel --max-num-batched-tokens 8192 --max-num-seqs 256 --gpu-memory-utilization 0.85 --trust-remote-code --enable-auto-tool-choice --tool-call-parser kimi_k2
|
||||||
|
- Notes: Data parallel + expert parallel mode for higher throughput. Requires multi-node setup with proper networking.
|
||||||
|
|
||||||

packages/pods/docs/plan.md (new file, 166 lines)
@@ -0,0 +1,166 @@

## Pi

Pi automates vLLM deployment on GPU pods from DataCrunch, Vast.ai, Prime Intellect, RunPod (or any Ubuntu machine with NVIDIA GPUs). It manages multiple concurrent model deployments via separate vLLM instances, each accessible through the OpenAI API protocol with API key authentication.

Pods are treated as ephemeral - spin up when needed, tear down when done. To avoid re-downloading models (30+ minutes for 100GB+ models), pi uses persistent network volumes for model storage that can be shared across pods on the same provider. This minimizes both cost (only pay for active compute) and setup time (models are already cached).

## Usage

### Pods
```bash
pi pods setup dc1 "ssh root@1.2.3.4" --mount "mount -t nfs..."  # Setup pod (requires HF_TOKEN, PI_API_KEY env vars)
pi pods                                                         # List all pods (* = active)
pi pods active dc2                                              # Switch active pod
pi pods remove dc1                                              # Remove pod
```

### Models
```bash
pi start Qwen/Qwen2.5-72B-Instruct --name qwen72b  # Known model - pi handles vLLM args
pi start some/unknown-model --name mymodel --vllm --tensor-parallel-size 4 --max-model-len 32768  # Custom vLLM args
pi list          # List running models with ports
pi stop qwen72b  # Stop model
pi logs qwen72b  # View model logs
```

For known models, pi automatically configures appropriate vLLM arguments from model documentation based on the pod's hardware. For unknown models or custom configurations, pass vLLM args after `--vllm`.
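
A sketch of what that lookup amounts to, with two entries abbreviated from models.md (the function name is hypothetical; the real CLI also factors in the detected GPU count and memory):

```bash
# Hypothetical lookup table mapping known model IDs to vLLM args
vllm_args_for() {
    case "$1" in
        Qwen/Qwen3-Coder-30B-A3B-Instruct)
            echo "--enable-auto-tool-choice --tool-call-parser qwen3_coder" ;;
        openai/gpt-oss-120b)
            echo "--async-scheduling" ;;
        *)
            return 1 ;;  # unknown model: caller must pass args after --vllm
    esac
}

vllm_args_for "openai/gpt-oss-120b"   # → --async-scheduling
```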

## Pod management

Pi manages GPU pods from various providers (DataCrunch, Vast.ai, Prime Intellect, RunPod) as ephemeral compute resources. Users manually create pods via provider dashboards, then register them with pi for automated setup and management.

Key capabilities:
- **Pod setup**: Transform bare Ubuntu/Debian machines into vLLM-ready environments in ~2 minutes
- **Model caching**: Optional persistent storage shared by pods to avoid re-downloading 100GB+ models
- **Multi-pod management**: Register multiple pods, switch between them, maintain different environments

### Pod setup

When a user creates a fresh pod on a provider, they register it with pi using the SSH command from the provider:

```bash
pi pods setup dc1 "ssh root@1.2.3.4" --mount "mount -t nfs..."
```

This copies and executes `pod_setup.sh`, which:
1. Detects GPUs via `nvidia-smi` and stores count/memory in local config
2. Installs a CUDA toolkit matching the driver version
3. Creates a Python environment
   - Installs uv and Python 3.12
   - Creates a venv at ~/venv with PyTorch (--torch-backend=auto)
   - Installs vLLM (model-specific versions when needed)
   - Installs FlashInfer (builds from source if required)
   - Installs huggingface-hub (for model downloads)
   - Installs hf-transfer (for accelerated downloads)
4. Mounts persistent storage if provided
   - Symlinks it to ~/.cache/huggingface for model caching
5. Configures environment variables persistently

Required environment variables:
- `HF_TOKEN`: HuggingFace token for model downloads
- `PI_API_KEY`: API key for securing vLLM endpoints
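
A POSIX-style sketch of the pre-flight check this implies (function name and messages are illustrative, not the CLI's actual output):

```bash
# Fail fast when the required environment variables are missing
check_required_env() {
    [ -n "${HF_TOKEN:-}" ]   || { echo "missing: HF_TOKEN" >&2; return 1; }
    [ -n "${PI_API_KEY:-}" ] || { echo "missing: PI_API_KEY" >&2; return 1; }
    echo "env ok"
}

HF_TOKEN=hf_example PI_API_KEY=secret check_required_env   # → env ok
```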

### Model caching

Models can be 100GB+ and take 30+ minutes to download. The `--mount` flag enables persistent model caching:

- **DataCrunch**: NFS shared filesystems, mountable across multiple running pods in the same region
- **RunPod**: Network volumes persist independently but cannot be shared between running pods
- **Vast.ai**: Volumes are locked to a specific machine - no sharing
- **Prime Intellect**: No persistent storage documented

Without `--mount`, models download to pod-local storage and are lost on termination.
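
The caching setup reduces to a symlink from the HuggingFace cache into the mounted volume. A self-contained sketch, using temporary directories as stand-ins for the network volume and the pod's home:

```bash
# Stand-ins for the mounted network volume and the pod's $HOME
volume="$(mktemp -d)"
pod_home="$(mktemp -d)"

# What pod_setup.sh effectively does: point ~/.cache/huggingface at the volume
mkdir -p "$volume/huggingface" "$pod_home/.cache"
ln -sfn "$volume/huggingface" "$pod_home/.cache/huggingface"

# Anything written to the cache now lands on the persistent volume
touch "$pod_home/.cache/huggingface/model.bin"
ls "$volume/huggingface"   # → model.bin
```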

### Multi-pod management

Users can register multiple pods and switch between them:

```bash
pi pods             # List all pods (* = active)
pi pods active dc2  # Switch active pod
pi pods remove dc1  # Remove pod from local config (does not destroy the pod remotely)
```

All model commands (`pi start`, `pi stop`, etc.) target the active pod, unless `--pod <podname>` is given, which overrides the active pod for that command.

## Model deployment

Pi uses direct SSH commands to manage vLLM instances on pods. No remote manager component is needed - everything is controlled from the local pi CLI.

### Architecture
The pi CLI maintains all state locally in `~/.pi/pods.json`:
```json
{
  "pods": {
    "dc1": {
      "ssh": "ssh root@1.2.3.4",
      "gpus": [
        {"id": 0, "name": "H100", "memory": "80GB"},
        {"id": 1, "name": "H100", "memory": "80GB"}
      ],
      "models": {
        "qwen": {
          "model": "Qwen/Qwen2.5-72B",
          "port": 8001,
          "gpu": "0",
          "pid": 12345
        }
      }
    }
  },
  "active": "dc1"
}
```

The location of the pi config dir can also be specified via the `PI_CONFIG_DIR` env var, e.g. for testing.

Pods are assumed to be fully managed by pi - no other processes compete for ports or GPUs.
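
For illustration, resolving the active pod's SSH command from a state file shaped like the example above (written here to a local `pods.json` stand-in rather than `~/.pi/pods.json`):

```bash
# Build a minimal stand-in state file
cat > pods.json <<'EOF'
{"pods": {"dc1": {"ssh": "ssh root@1.2.3.4"}}, "active": "dc1"}
EOF

# Look up the active pod's SSH command (python3 used for robust JSON parsing)
python3 -c '
import json
cfg = json.load(open("pods.json"))
print(cfg["pods"][cfg["active"]]["ssh"])
'   # → ssh root@1.2.3.4
```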

### Starting models
When the user runs `pi start Qwen/Qwen2.5-72B --name qwen`:
1. CLI determines the next available port (starting from 8001)
2. Selects a GPU (round-robin based on stored GPU info)
3. Downloads the model if not cached:
   - Sets `HF_HUB_ENABLE_HF_TRANSFER=1` for fast downloads
   - Runs via SSH with output piped to the local terminal
   - Ctrl+C cancels the download and returns control
4. Builds the vLLM command with appropriate args and PI_API_KEY
5. Executes via SSH: `ssh pod "nohup vllm serve ... > ~/.vllm_logs/qwen.log 2>&1 & echo $!"`
6. Waits for vLLM to be ready (checks the health endpoint)
7. On success: stores port, GPU, and PID in local state
8. On failure: shows the exact error from the vLLM logs and does not save to config
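
Step 1 can be sketched as a small helper that walks up from 8001 past the ports already recorded in local state (function and argument shape are illustrative):

```bash
# Pick the lowest port >= 8001 not in the space-separated list of used ports
next_free_port() {
    used="$1"
    port=8001
    while echo "$used" | grep -qw "$port"; do
        port=$((port + 1))
    done
    echo "$port"
}

next_free_port "8001 8002"   # → 8003
```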

### Managing models
- **List**: Show models from local state, optionally verifying that PIDs are still running
- **Stop**: SSH to kill the process by PID
- **Logs**: SSH to `tail -f` the log files (Ctrl+C stops tailing, doesn't kill vLLM)

### Error handling
- **SSH failures**: Prompt the user to check the connection or remove the pod from config
- **Stale state**: Commands that fail with "process not found" auto-clean local state
- **Setup failures**: Ctrl+C during setup kills the remote script and exits cleanly

### Testing models
The `pi prompt` command provides a quick way to test deployed models:
```bash
pi prompt qwen "What is 2+2?"                 # Simple prompt
pi prompt qwen "Read file.txt and summarize"  # Uses built-in tools
```

Built-in tools for agentic testing:
- `ls(path, ignore?)`: List files and directories at path, with optional ignore patterns
- `read(file_path, offset?, limit?)`: Read file contents with optional line offset/limit
- `glob(pattern, path?)`: Find files matching a glob pattern (e.g., "**/*.py", "src/**/*.ts")
- `rg(args)`: Run ripgrep with any arguments (e.g., "pattern -t py -C 3", "TODO --type-not test")

The provided prompt is augmented with info on the current local working directory. File tools expect absolute paths.

This allows testing basic agent capabilities without external tool configuration.

`prompt` is implemented using the latest OpenAI SDK for Node.js. It outputs thinking content, tool calls and results, and normal assistant messages.

## Models
We want to support these models specifically, with alternative models marked as "possibly works". This list will be updated with new models regularly. A checked box means "supported".

See [models.md](./models.md) for the models we want to support out of the box (hardware requirements, vLLM args, and notes) with a simple `pi start <model-name> --name <local-name>`.

packages/pods/docs/qwen3-coder.md (new file, 132 lines)
@@ -0,0 +1,132 @@

# Qwen3-Coder Usage Guide

[Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder) is an advanced large language model created by the Qwen team at Alibaba Cloud. vLLM already supports Qwen3-Coder, and `tool-call` functionality is available in vLLM v0.10.0 and higher. You can install vLLM with `tool-call` support as follows:

## Installing vLLM

```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```

## Launching Qwen3-Coder with vLLM

### Serving on 8x H200 (or H20) GPUs (141GB × 8)

**BF16 Model**

```bash
vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct \
  --tensor-parallel-size 8 \
  --max-model-len 32000 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

**FP8 Model**

```bash
VLLM_USE_DEEP_GEMM=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  --max-model-len 131072 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

## Performance Metrics

### Evaluation
We launched `Qwen3-Coder-480B-A35B-Instruct-FP8` using vLLM and evaluated its performance using [EvalPlus](https://github.com/evalplus/evalplus). The results are shown below:

| Dataset | Test Type | Pass@1 Score |
|-----------|-----------|--------------|
| HumanEval | Base tests | 0.939 |
| HumanEval+ | Base + extra tests | 0.902 |
| MBPP | Base tests | 0.918 |
| MBPP+ | Base + extra tests | 0.794 |

### Benchmarking
We used the following script to benchmark `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8`:

```bash
vllm bench serve \
  --backend vllm \
  --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  --endpoint /v1/completions \
  --dataset-name random \
  --random-input 2048 \
  --random-output 1024 \
  --max-concurrency 10 \
  --num-prompt 100
```

If successful, you will see output like the following:

```shell
============ Serving Benchmark Result ============
Successful requests:                     100
Benchmark duration (s):                  776.49
Total input tokens:                      204169
Total generated tokens:                  102400
Request throughput (req/s):              0.13
Output token throughput (tok/s):         131.88
Total Token throughput (tok/s):          394.81
---------------Time to First Token----------------
Mean TTFT (ms):                          7639.31
Median TTFT (ms):                        6935.71
P99 TTFT (ms):                           13766.68
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          68.43
Median TPOT (ms):                        67.23
P99 TPOT (ms):                           72.14
---------------Inter-token Latency----------------
Mean ITL (ms):                           68.43
Median ITL (ms):                         66.34
P99 ITL (ms):                            69.38
==================================================
```
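
The throughput figures are internally consistent: output throughput is generated tokens over wall time, and total throughput adds the input tokens. A quick cross-check with `awk`:

```bash
# 102400 generated tokens / 776.49 s ≈ 131.88 tok/s (matches the report)
awk 'BEGIN { printf "%.2f\n", 102400 / 776.49 }'

# (204169 + 102400) tokens / 776.49 s ≈ 394.81 tok/s total (also matches)
awk 'BEGIN { printf "%.2f\n", (204169 + 102400) / 776.49 }'
```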

## Usage Tips

### BF16 Models
- **Context Length Limitation**: A single H20 node cannot serve the original context length (262144). You can reduce `max-model-len` or increase `gpu-memory-utilization` to work within memory constraints.

### FP8 Models
- **Context Length Limitation**: A single H20 node cannot serve the original context length (262144). You can reduce `max-model-len` or increase `gpu-memory-utilization` to work within memory constraints.
- **DeepGEMM Usage**: To use [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM), set `VLLM_USE_DEEP_GEMM=1`. Follow the [setup instructions](https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/deepgemm/README.md#setup) to install it.
- **Tensor Parallelism Issue**: When using `tensor-parallel-size 8`, the failure below is expected. Switch to data-parallel mode using `--data-parallel-size`.
- **Additional Resources**: Refer to the [Data Parallel Deployment documentation](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment.html) for more parallelism groups.

```shell
ERROR [multiproc_executor.py:511] File "/vllm/vllm/model_executor/models/qwen3_moe.py", line 336, in <lambda>
ERROR [multiproc_executor.py:511]   lambda prefix: Qwen3MoeDecoderLayer(config=config,
ERROR [multiproc_executor.py:511]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR [multiproc_executor.py:511] File "/vllm/vllm/model_executor/models/qwen3_moe.py", line 278, in __init__
ERROR [multiproc_executor.py:511]   self.mlp = Qwen3MoeSparseMoeBlock(config=config,
ERROR [multiproc_executor.py:511]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR [multiproc_executor.py:511] File "/vllm/vllm/model_executor/models/qwen3_moe.py", line 113, in __init__
ERROR [multiproc_executor.py:511]   self.experts = FusedMoE(num_experts=config.num_experts,
ERROR [multiproc_executor.py:511]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR [multiproc_executor.py:511] File "/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 773, in __init__
ERROR [multiproc_executor.py:511]   self.quant_method.create_weights(layer=self, **moe_quant_params)
ERROR [multiproc_executor.py:511] File "/vllm/vllm/model_executor/layers/quantization/fp8.py", line 573, in create_weights
ERROR [multiproc_executor.py:511]   raise ValueError(
ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's weight = 320 is not divisible by weight quantization block_n = 128.
```

### Tool Calling
- **Enable Tool Calls**: Add `--tool-call-parser qwen3_coder` to enable tool-call parsing. For details, see [tool_calling](https://docs.vllm.ai/en/latest/features/tool_calling.html).

## Roadmap

- [x] Add benchmark results

## Additional Resources

- [EvalPlus](https://github.com/evalplus/evalplus)
- [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder)
- [vLLM Documentation](https://docs.vllm.ai/)

packages/pods/package-lock.json (generated, new file, 1447 lines; diff suppressed because it is too large)

packages/pods/package.json (new file, 40 lines)
@@ -0,0 +1,40 @@

{
  "name": "@mariozechner/pi",
  "version": "0.5.0",
  "description": "CLI tool for managing vLLM deployments on GPU pods",
  "type": "module",
  "bin": {
    "pi": "./dist/cli.js"
  },
  "scripts": {
    "clean": "rm -rf dist tsconfig.tsbuildinfo",
    "build": "tsc -p tsconfig.build.json && chmod +x dist/cli.js && cp src/models.json dist/",
    "check": "biome check --write .",
    "prepublishOnly": "npm run clean && npm run build"
  },
  "files": [
    "dist"
  ],
  "keywords": [
    "llm",
    "vllm",
    "gpu",
    "ai",
    "cli"
  ],
  "author": "Mario Zechner",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/badlogic/pi-mono.git",
    "directory": "packages/pods"
  },
  "engines": {
    "node": ">=20.0.0"
  },
  "dependencies": {
    "@mariozechner/pi-agent": "^0.5.0",
    "chalk": "^5.5.0"
  },
  "devDependencies": {}
}

packages/pods/scripts/model_run.sh (new file, 83 lines)
@@ -0,0 +1,83 @@

#!/usr/bin/env bash
# Model runner script - runs sequentially, killed by pi stop
set -euo pipefail

# These values are replaced before upload by the pi CLI
MODEL_ID="{{MODEL_ID}}"
NAME="{{NAME}}"
PORT="{{PORT}}"
VLLM_ARGS="{{VLLM_ARGS}}"

# Trap to ensure cleanup on exit and kill any child processes
cleanup() {
    local exit_code=$?
    echo "Model runner exiting with code $exit_code"
    # Kill any child processes
    pkill -P $$ 2>/dev/null || true
    exit $exit_code
}
trap cleanup EXIT TERM INT

# Force colored output even when not a TTY
export FORCE_COLOR=1
export PYTHONUNBUFFERED=1
export TERM=xterm-256color
export RICH_FORCE_TERMINAL=1
export CLICOLOR_FORCE=1

# Source virtual environment
source /root/venv/bin/activate

echo "========================================="
echo "Model Run: $NAME"
echo "Model ID: $MODEL_ID"
echo "Port: $PORT"
if [ -n "$VLLM_ARGS" ]; then
    echo "vLLM Args: $VLLM_ARGS"
fi
echo "========================================="
echo ""

# Download model (with color progress bars). The failure branch must be part
# of the command itself: under `set -e` a separate `$?` check is never reached.
echo "Downloading model (will skip if cached)..."
if ! HF_HUB_ENABLE_HF_TRANSFER=1 hf download "$MODEL_ID"; then
    echo "❌ ERROR: Failed to download model" >&2
    exit 1
fi

echo ""
echo "✅ Model download complete"
echo ""

# Build vLLM command
VLLM_CMD="vllm serve '$MODEL_ID' --port $PORT --api-key '$PI_API_KEY'"
if [ -n "$VLLM_ARGS" ]; then
    VLLM_CMD="$VLLM_CMD $VLLM_ARGS"
fi

echo "Starting vLLM server..."
echo "Command: $VLLM_CMD"
echo "========================================="
echo ""

# Run vLLM in background so we can monitor it
echo "Starting vLLM process..."
bash -c "$VLLM_CMD" &
VLLM_PID=$!

# Monitor the vLLM process. Capture the exit code via `||` so that a
# non-zero `wait` does not abort the script under `set -e`.
echo "Monitoring vLLM process (PID: $VLLM_PID)..."
VLLM_EXIT_CODE=0
wait $VLLM_PID || VLLM_EXIT_CODE=$?

if [ $VLLM_EXIT_CODE -ne 0 ]; then
    echo "❌ ERROR: vLLM exited with code $VLLM_EXIT_CODE" >&2
    # Make sure to exit the script command too
    kill -TERM $$ 2>/dev/null || true
    exit $VLLM_EXIT_CODE
fi

echo "✅ vLLM exited normally"
exit 0

packages/pods/scripts/pod_setup.sh (new executable file, 334 lines)
@@ -0,0 +1,334 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# GPU pod bootstrap for vLLM deployment
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Parse arguments passed from pi CLI
|
||||||
|
MOUNT_COMMAND=""
|
||||||
|
MODELS_PATH=""
|
||||||
|
HF_TOKEN=""
|
||||||
|
PI_API_KEY=""
|
||||||
|
VLLM_VERSION="release" # Default to release
|
||||||
|
|
||||||
|
while [[ $# -gt 0 ]]; do
|
||||||
|
case $1 in
|
||||||
|
--mount)
|
||||||
|
MOUNT_COMMAND="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--models-path)
|
||||||
|
MODELS_PATH="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--hf-token)
|
||||||
|
HF_TOKEN="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--vllm-api-key)
|
||||||
|
PI_API_KEY="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--vllm)
|
||||||
|
VLLM_VERSION="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "ERROR: Unknown option: $1" >&2
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
|
||||||
|
# Validate required parameters
|
||||||
|
if [ -z "$HF_TOKEN" ]; then
|
||||||
|
echo "ERROR: HF_TOKEN is required" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "$PI_API_KEY" ]; then
|
||||||
|
echo "ERROR: PI_API_KEY is required" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "$MODELS_PATH" ]; then
|
||||||
|
echo "ERROR: MODELS_PATH is required" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "=== Starting pod setup ==="
|
||||||
|
|
||||||
|
# Install system dependencies
|
||||||
|
apt update -y
|
||||||
|
apt install -y python3-pip python3-venv git build-essential cmake ninja-build curl wget lsb-release htop pkg-config
|
||||||
|
|
||||||
|
# --- Install matching CUDA toolkit -------------------------------------------
|
||||||
|
echo "Checking CUDA driver version..."
|
||||||
|
DRIVER_CUDA_VERSION=$(nvidia-smi | grep "CUDA Version" | awk '{print $9}')
|
||||||
|
echo "Driver supports CUDA: $DRIVER_CUDA_VERSION"
|
||||||
|
|
||||||
|
# Check if nvcc exists and its version
|
||||||
|
if command -v nvcc &> /dev/null; then
|
||||||
|
NVCC_VERSION=$(nvcc --version | grep "release" | awk '{print $6}' | cut -d, -f1)
|
||||||
|
echo "Current nvcc version: $NVCC_VERSION"
|
||||||
|
else
|
||||||
|
NVCC_VERSION="none"
|
||||||
|
echo "nvcc not found"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Install CUDA toolkit matching driver version if needed
|
||||||
|
if [[ "$NVCC_VERSION" != "$DRIVER_CUDA_VERSION" ]]; then
|
||||||
|
echo "Installing CUDA Toolkit $DRIVER_CUDA_VERSION to match driver..."
|
||||||
|
|
||||||
|
# Detect Ubuntu version
|
||||||
|
UBUNTU_VERSION=$(lsb_release -rs)
|
||||||
|
UBUNTU_CODENAME=$(lsb_release -cs)
|
||||||
|
|
||||||
|
    echo "Detected Ubuntu $UBUNTU_VERSION ($UBUNTU_CODENAME)"

    # Map Ubuntu version to NVIDIA repo path
    if [[ "$UBUNTU_VERSION" == "24.04" ]]; then
        REPO_PATH="ubuntu2404"
    elif [[ "$UBUNTU_VERSION" == "22.04" ]]; then
        REPO_PATH="ubuntu2204"
    elif [[ "$UBUNTU_VERSION" == "20.04" ]]; then
        REPO_PATH="ubuntu2004"
    else
        echo "Warning: Unsupported Ubuntu version $UBUNTU_VERSION, trying ubuntu2204"
        REPO_PATH="ubuntu2204"
    fi

    # Add NVIDIA package repositories
    wget https://developer.download.nvidia.com/compute/cuda/repos/${REPO_PATH}/x86_64/cuda-keyring_1.1-1_all.deb
    dpkg -i cuda-keyring_1.1-1_all.deb
    rm cuda-keyring_1.1-1_all.deb
    apt-get update

    # Install specific CUDA toolkit version
    # Convert version format (12.9 -> 12-9)
    CUDA_VERSION_APT=$(echo $DRIVER_CUDA_VERSION | sed 's/\./-/')
    echo "Installing cuda-toolkit-${CUDA_VERSION_APT}..."
    apt-get install -y cuda-toolkit-${CUDA_VERSION_APT}

    # Add CUDA to PATH
    export PATH=/usr/local/cuda-${DRIVER_CUDA_VERSION}/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-${DRIVER_CUDA_VERSION}/lib64:${LD_LIBRARY_PATH:-}

    # Verify installation
    nvcc --version
else
    echo "CUDA toolkit $NVCC_VERSION matches driver version"
    export PATH=/usr/local/cuda-${DRIVER_CUDA_VERSION}/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-${DRIVER_CUDA_VERSION}/lib64:${LD_LIBRARY_PATH:-}
fi

# --- Install uv (fast Python package manager) --------------------------------
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# --- Install Python 3.12 if not available ------------------------------------
if ! command -v python3.12 &> /dev/null; then
    echo "Python 3.12 not found. Installing via uv..."
    uv python install 3.12
fi

# --- Clean up existing environments and caches -------------------------------
echo "Cleaning up existing environments and caches..."

# Remove existing venv for a clean installation
VENV="$HOME/venv"
if [ -d "$VENV" ]; then
    echo "Removing existing virtual environment..."
    rm -rf "$VENV"
fi

# Remove uv cache to ensure fresh installs
if [ -d "$HOME/.cache/uv" ]; then
    echo "Clearing uv cache..."
    rm -rf "$HOME/.cache/uv"
fi

# Remove vLLM cache to avoid conflicts
if [ -d "$HOME/.cache/vllm" ]; then
    echo "Clearing vLLM cache..."
    rm -rf "$HOME/.cache/vllm"
fi

# --- Create and activate venv ------------------------------------------------
echo "Creating fresh virtual environment..."
uv venv --python 3.12 --seed "$VENV"
source "$VENV/bin/activate"

# --- Install PyTorch and vLLM ------------------------------------------------
echo "Installing vLLM and dependencies (version: $VLLM_VERSION)..."
case "$VLLM_VERSION" in
    release)
        echo "Installing vLLM release with PyTorch..."
        # Install vLLM with automatic PyTorch backend selection
        # vLLM will automatically install the correct PyTorch version
        # (version spec quoted so the shell does not treat '>' as a redirection)
        uv pip install "vllm>=0.10.0" --torch-backend=auto || {
            echo "ERROR: Failed to install vLLM"
            exit 1
        }
        ;;
    nightly)
        echo "Installing vLLM nightly with PyTorch..."
        echo "This will install the latest nightly build of vLLM..."

        # Install vLLM nightly with PyTorch
        uv pip install -U vllm \
            --torch-backend=auto \
            --extra-index-url https://wheels.vllm.ai/nightly || {
            echo "ERROR: Failed to install vLLM nightly"
            exit 1
        }

        echo "vLLM nightly successfully installed!"
        ;;
    gpt-oss)
        echo "Installing GPT-OSS special build with PyTorch nightly..."
        echo "WARNING: This build is ONLY for GPT-OSS models!"
        echo "Installing PyTorch nightly and cutting-edge dependencies..."

        # Convert CUDA version format for PyTorch (12.4 -> cu124)
        PYTORCH_CUDA="cu$(echo $DRIVER_CUDA_VERSION | sed 's/\.//')"
        echo "Using PyTorch nightly with ${PYTORCH_CUDA} (driver supports ${DRIVER_CUDA_VERSION})"

        # The GPT-OSS build will pull PyTorch nightly and other dependencies
        # via the extra index URLs. We don't pre-install torch here to avoid conflicts.
        uv pip install --pre vllm==0.10.1+gptoss \
            --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
            --extra-index-url https://download.pytorch.org/whl/nightly/${PYTORCH_CUDA} \
            --index-strategy unsafe-best-match || {
            echo "ERROR: Failed to install GPT-OSS vLLM build"
            echo "This automatically installs PyTorch nightly with ${PYTORCH_CUDA}, Triton nightly, and other dependencies"
            exit 1
        }

        # Install gpt-oss library for tool support
        uv pip install gpt-oss || {
            echo "WARNING: Failed to install gpt-oss library (needed for tool use)"
        }
        ;;
    *)
        echo "ERROR: Unknown vLLM version: $VLLM_VERSION"
        exit 1
        ;;
esac

# --- Install additional packages ---------------------------------------------
echo "Installing additional packages..."
uv pip install huggingface-hub psutil tensorrt hf_transfer

# --- FlashInfer installation (optional, improves performance) ----------------
echo "Attempting FlashInfer installation (optional)..."
if uv pip install flashinfer-python; then
    echo "FlashInfer installed successfully"
else
    echo "FlashInfer not available, using Flash Attention instead"
fi

# --- Mount storage if provided -----------------------------------------------
if [ -n "$MOUNT_COMMAND" ]; then
    echo "Setting up mount..."

    # Create mount point directory if it doesn't exist
    mkdir -p "$MODELS_PATH"

    # Execute the mount command
    eval "$MOUNT_COMMAND" || {
        echo "WARNING: Mount command failed, continuing without mount"
    }

    # Verify mount succeeded (optional, may not always be a mount point)
    if mountpoint -q "$MODELS_PATH" 2>/dev/null; then
        echo "Storage successfully mounted at $MODELS_PATH"
    else
        echo "Note: $MODELS_PATH is not a mount point (might be local storage)"
    fi
fi

# --- Model storage setup ------------------------------------------------------
echo ""
echo "=== Setting up model storage ==="
echo "Storage path: $MODELS_PATH"

# Check if the path exists and is writable
if [ ! -d "$MODELS_PATH" ]; then
    echo "Creating model storage directory: $MODELS_PATH"
    mkdir -p "$MODELS_PATH"
fi

if [ ! -w "$MODELS_PATH" ]; then
    echo "ERROR: Model storage path is not writable: $MODELS_PATH"
    echo "Please check permissions"
    exit 1
fi

# Create the huggingface cache directory structure in the models path
mkdir -p "${MODELS_PATH}/huggingface/hub"

# Remove any existing cache directory or symlink
if [ -e ~/.cache/huggingface ] || [ -L ~/.cache/huggingface ]; then
    echo "Removing existing ~/.cache/huggingface..."
    rm -rf ~/.cache/huggingface 2>/dev/null || true
fi

# Create parent directory if needed
mkdir -p ~/.cache

# Create symlink from ~/.cache/huggingface to the models path
ln -s "${MODELS_PATH}/huggingface" ~/.cache/huggingface
echo "Created symlink: ~/.cache/huggingface -> ${MODELS_PATH}/huggingface"

# Verify the symlink works
if [ -d ~/.cache/huggingface/hub ]; then
    echo "✓ Model storage configured successfully"

    # Check available space
    AVAILABLE_SPACE=$(df -h "$MODELS_PATH" | awk 'NR==2 {print $4}')
    echo "Available space: $AVAILABLE_SPACE"
else
    echo "ERROR: Could not verify model storage setup"
    echo "The symlink was created but the target directory is not accessible"
    exit 1
fi

# --- Configure environment ----------------------------------------------------
mkdir -p ~/.config/vllm
touch ~/.config/vllm/do_not_track

# Write environment to .bashrc for persistence
cat >> ~/.bashrc << EOF

# Pi vLLM environment
[ -d "\$HOME/venv" ] && source "\$HOME/venv/bin/activate"
export PATH="/usr/local/cuda-${DRIVER_CUDA_VERSION}/bin:\$HOME/.local/bin:\$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-${DRIVER_CUDA_VERSION}/lib64:\${LD_LIBRARY_PATH:-}"
export HF_TOKEN="${HF_TOKEN}"
export PI_API_KEY="${PI_API_KEY}"
export HUGGING_FACE_HUB_TOKEN="${HF_TOKEN}"
export HF_HUB_ENABLE_HF_TRANSFER=1
export VLLM_NO_USAGE_STATS=1
export VLLM_DO_NOT_TRACK=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
EOF

# Create log directory for vLLM
mkdir -p ~/.vllm_logs

# --- Output GPU info for pi CLI to parse -------------------------------------
echo ""
echo "===GPU_INFO_START==="
nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader | while IFS=, read -r id name memory; do
    # Trim whitespace
    id=$(echo "$id" | xargs)
    name=$(echo "$name" | xargs)
    memory=$(echo "$memory" | xargs)
    echo "{\"id\": $id, \"name\": \"$name\", \"memory\": \"$memory\"}"
done
echo "===GPU_INFO_END==="

echo ""
echo "=== Setup complete ==="
echo "Pod is ready for vLLM deployments"
echo "Models will be cached at: $MODELS_PATH"
362
packages/pods/src/cli.ts
Normal file

@@ -0,0 +1,362 @@
#!/usr/bin/env node
import chalk from "chalk";
import { spawn } from "child_process";
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { listModels, startModel, stopModel, viewLogs } from "./commands/models.js";
import { listPods, removePodCommand, setupPod, switchActivePod } from "./commands/pods.js";
import { promptModel } from "./commands/prompt.js";
import { getActivePod, loadConfig } from "./config.js";
import { sshExecStream } from "./ssh.js";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

const packageJson = JSON.parse(readFileSync(join(__dirname, "../package.json"), "utf-8"));

function printHelp() {
    console.log(`pi v${packageJson.version} - Manage vLLM deployments on GPU pods

Pod Management:
  pi pods setup <name> "<ssh>" --mount "<mount>"   Setup pod with mount command
    Options:
      --vllm release    Install latest vLLM release >=0.10.0 (default)
      --vllm nightly    Install vLLM nightly build (latest features)
      --vllm gpt-oss    Install vLLM 0.10.1+gptoss with PyTorch nightly (GPT-OSS only)
  pi pods                                          List all pods (* = active)
  pi pods active <name>                            Switch active pod
  pi pods remove <name>                            Remove pod from local config
  pi shell [<name>]                                Open shell on pod (active or specified)
  pi ssh [<name>] "<command>"                      Run SSH command on pod

Model Management:
  pi start <model> --name <name> [options]         Start a model
    --memory <percent>   GPU memory allocation (30%, 50%, 90%)
    --context <size>     Context window (4k, 8k, 16k, 32k, 64k, 128k)
    --gpus <count>       Number of GPUs to use (predefined models only)
    --vllm <args...>     Pass remaining args to vLLM (ignores other options)
  pi stop [<name>]                                 Stop model (or all if no name)
  pi list                                          List running models
  pi logs <name>                                   Stream model logs
  pi agent <name> ["<message>"...] [options]       Chat with model using agent & tools
  pi agent <name> [options]                        Interactive chat mode
    --continue, -c       Continue previous session
    --json               Output as JSONL
    (All pi-agent options are supported)

All model commands support --pod <name> to override the active pod.

Environment:
  HF_TOKEN        HuggingFace token for model downloads
  PI_API_KEY      API key for vLLM endpoints
  PI_CONFIG_DIR   Config directory (default: ~/.pi)`);
}

// Parse command line arguments
const args = process.argv.slice(2);

if (args.length === 0 || args[0] === "--help" || args[0] === "-h") {
    printHelp();
    process.exit(0);
}

if (args[0] === "--version" || args[0] === "-v") {
    console.log(packageJson.version);
    process.exit(0);
}

const command = args[0];
const subcommand = args[1];

// Main command handler
try {
    // Handle "pi pods" commands
    if (command === "pods") {
        if (!subcommand) {
            // pi pods - list all pods
            listPods();
        } else if (subcommand === "setup") {
            // pi pods setup <name> "<ssh>" [--mount "<mount>"] [--models-path <path>] [--vllm release|nightly|gpt-oss]
            const name = args[2];
            const sshCmd = args[3];

            if (!name || !sshCmd) {
                console.error(
                    'Usage: pi pods setup <name> "<ssh>" [--mount "<mount>"] [--models-path <path>] [--vllm release|nightly|gpt-oss]',
                );
                process.exit(1);
            }

            // Parse options
            const options: { mount?: string; modelsPath?: string; vllm?: "release" | "nightly" | "gpt-oss" } = {};
            for (let i = 4; i < args.length; i++) {
                if (args[i] === "--mount" && i + 1 < args.length) {
                    options.mount = args[i + 1];
                    i++;
                } else if (args[i] === "--models-path" && i + 1 < args.length) {
                    options.modelsPath = args[i + 1];
                    i++;
                } else if (args[i] === "--vllm" && i + 1 < args.length) {
                    const vllmType = args[i + 1];
                    if (vllmType === "release" || vllmType === "nightly" || vllmType === "gpt-oss") {
                        options.vllm = vllmType;
                    } else {
                        console.error(chalk.red(`Invalid vLLM type: ${vllmType}`));
                        console.error("Valid options: release, nightly, gpt-oss");
                        process.exit(1);
                    }
                    i++;
                }
            }

            // If --mount provided but no --models-path, try to extract path from mount command
            if (options.mount && !options.modelsPath) {
                // Extract last part of mount command as models path
                const parts = options.mount.trim().split(" ");
                const lastPart = parts[parts.length - 1];
                if (lastPart?.startsWith("/")) {
                    options.modelsPath = lastPart;
                }
            }

            await setupPod(name, sshCmd, options);
        } else if (subcommand === "active") {
            // pi pods active <name>
            const name = args[2];
            if (!name) {
                console.error("Usage: pi pods active <name>");
                process.exit(1);
            }
            switchActivePod(name);
        } else if (subcommand === "remove") {
            // pi pods remove <name>
            const name = args[2];
            if (!name) {
                console.error("Usage: pi pods remove <name>");
                process.exit(1);
            }
            removePodCommand(name);
        } else {
            console.error(`Unknown pods subcommand: ${subcommand}`);
            process.exit(1);
        }
    } else {
        // Parse --pod override for model commands
        let podOverride: string | undefined;
        const podIndex = args.indexOf("--pod");
        if (podIndex !== -1 && podIndex + 1 < args.length) {
            podOverride = args[podIndex + 1];
            // Remove --pod and its value from args
            args.splice(podIndex, 2);
        }

        // Handle SSH/shell commands and model commands
        switch (command) {
            case "shell": {
                // pi shell [<name>] - open interactive shell
                const podName = args[1];
                let podInfo: { name: string; pod: import("./types.js").Pod } | null = null;

                if (podName) {
                    const config = loadConfig();
                    const pod = config.pods[podName];
                    if (pod) {
                        podInfo = { name: podName, pod };
                    }
                } else {
                    podInfo = getActivePod();
                }

                if (!podInfo) {
                    if (podName) {
                        console.error(chalk.red(`Pod '${podName}' not found`));
                    } else {
                        console.error(chalk.red("No active pod. Use 'pi pods active <name>' to set one."));
                    }
                    process.exit(1);
                }

                console.log(chalk.green(`Connecting to pod '${podInfo.name}'...`));

                // Execute SSH in interactive mode
                const sshArgs = podInfo.pod.ssh.split(" ").slice(1); // Remove 'ssh' from command
                const sshProcess = spawn("ssh", sshArgs, {
                    stdio: "inherit",
                    env: process.env,
                });

                sshProcess.on("exit", (code) => {
                    process.exit(code || 0);
                });
                break;
            }
            case "ssh": {
                // pi ssh [<name>] "<command>" - run command via SSH
                let podName: string | undefined;
                let sshCommand: string;

                if (args.length === 2) {
                    // pi ssh "<command>" - use active pod
                    sshCommand = args[1];
                } else if (args.length === 3) {
                    // pi ssh <name> "<command>"
                    podName = args[1];
                    sshCommand = args[2];
                } else {
                    console.error('Usage: pi ssh [<name>] "<command>"');
                    process.exit(1);
                }

                let podInfo: { name: string; pod: import("./types.js").Pod } | null = null;

                if (podName) {
                    const config = loadConfig();
                    const pod = config.pods[podName];
                    if (pod) {
                        podInfo = { name: podName, pod };
                    }
                } else {
                    podInfo = getActivePod();
                }

                if (!podInfo) {
                    if (podName) {
                        console.error(chalk.red(`Pod '${podName}' not found`));
                    } else {
                        console.error(chalk.red("No active pod. Use 'pi pods active <name>' to set one."));
                    }
                    process.exit(1);
                }

                console.log(chalk.gray(`Running on pod '${podInfo.name}': ${sshCommand}`));

                // Execute command and stream output
                const exitCode = await sshExecStream(podInfo.pod.ssh, sshCommand);
                process.exit(exitCode);
                break;
            }
            case "start": {
                // pi start <model> --name <name> [options]
                const modelId = args[1];
                if (!modelId) {
                    // Show available models
                    const { showKnownModels } = await import("./commands/models.js");
                    await showKnownModels();
                    process.exit(0);
                }

                // Parse options
                let name: string | undefined;
                let memory: string | undefined;
                let context: string | undefined;
                let gpus: number | undefined;
                const vllmArgs: string[] = [];
                let inVllmArgs = false;

                for (let i = 2; i < args.length; i++) {
                    if (inVllmArgs) {
                        vllmArgs.push(args[i]);
                    } else if (args[i] === "--name" && i + 1 < args.length) {
                        name = args[i + 1];
                        i++;
                    } else if (args[i] === "--memory" && i + 1 < args.length) {
                        memory = args[i + 1];
                        i++;
                    } else if (args[i] === "--context" && i + 1 < args.length) {
                        context = args[i + 1];
                        i++;
                    } else if (args[i] === "--gpus" && i + 1 < args.length) {
                        gpus = parseInt(args[i + 1]);
                        if (Number.isNaN(gpus) || gpus < 1) {
                            console.error(chalk.red("--gpus must be a positive number"));
                            process.exit(1);
                        }
                        i++;
                    } else if (args[i] === "--vllm") {
                        inVllmArgs = true;
                    }
                }

                if (!name) {
                    console.error("--name is required");
                    process.exit(1);
                }

                // Warn if --vllm is used with other parameters
                if (vllmArgs.length > 0 && (memory || context || gpus)) {
                    console.log(
                        chalk.yellow("⚠ Warning: --memory, --context, and --gpus are ignored when --vllm is specified"),
                    );
                    console.log(chalk.yellow("  Using only custom vLLM arguments"));
                    console.log("");
                }

                await startModel(modelId, name, {
                    pod: podOverride,
                    memory,
                    context,
                    gpus,
                    vllmArgs: vllmArgs.length > 0 ? vllmArgs : undefined,
                });
                break;
            }
            case "stop": {
                // pi stop [name] - stop specific model or all models
                const name = args[1];
                if (!name) {
                    // Stop all models on the active pod
                    const { stopAllModels } = await import("./commands/models.js");
                    await stopAllModels({ pod: podOverride });
                } else {
                    await stopModel(name, { pod: podOverride });
                }
                break;
            }
            case "list":
                // pi list
                await listModels({ pod: podOverride });
                break;
            case "logs": {
                // pi logs <name>
                const name = args[1];
                if (!name) {
                    console.error("Usage: pi logs <name>");
                    process.exit(1);
                }
                await viewLogs(name, { pod: podOverride });
                break;
            }
            case "agent": {
                // pi agent <name> [messages...] [options]
                const name = args[1];
                if (!name) {
                    console.error("Usage: pi agent <name> [messages...] [options]");
                    process.exit(1);
                }

                const apiKey = process.env.PI_API_KEY;

                // Pass all args after the model name
                const agentArgs = args.slice(2);

                // If no messages provided, it's interactive mode
                await promptModel(name, agentArgs, {
                    pod: podOverride,
                    apiKey,
                }).catch(() => {
                    // Error already handled in promptModel, just exit cleanly
                    process.exit(0);
                });
                break;
            }
            default:
                console.error(`Unknown command: ${command}`);
                printHelp();
                process.exit(1);
        }
    }
} catch (error) {
    console.error("Error:", error);
    process.exit(1);
}
703
packages/pods/src/commands/models.ts
Normal file

@@ -0,0 +1,703 @@
|
import chalk from "chalk";
|
||||||
|
import { spawn } from "child_process";
|
||||||
|
import { readFileSync } from "fs";
|
||||||
|
import { dirname, join } from "path";
|
||||||
|
import { fileURLToPath } from "url";
|
||||||
|
import { getActivePod, loadConfig, saveConfig } from "../config.js";
|
||||||
|
import { getModelConfig, getModelName, isKnownModel } from "../model-configs.js";
|
||||||
|
import { sshExec } from "../ssh.js";
|
||||||
|
import type { Pod } from "../types.js";
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get the pod to use (active or override)
|
||||||
|
*/
|
||||||
|
const getPod = (podOverride?: string): { name: string; pod: Pod } => {
|
||||||
|
if (podOverride) {
|
||||||
|
const config = loadConfig();
|
||||||
|
const pod = config.pods[podOverride];
|
||||||
|
if (!pod) {
|
||||||
|
console.error(chalk.red(`Pod '${podOverride}' not found`));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
return { name: podOverride, pod };
|
||||||
|
}
|
||||||
|
|
||||||
|
const active = getActivePod();
|
||||||
|
if (!active) {
|
||||||
|
console.error(chalk.red("No active pod. Use 'pi pods active <name>' to set one."));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
return active;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Find next available port starting from 8001
|
||||||
|
*/
|
||||||
|
const getNextPort = (pod: Pod): number => {
|
||||||
|
const usedPorts = Object.values(pod.models).map((m) => m.port);
|
||||||
|
let port = 8001;
|
||||||
|
while (usedPorts.includes(port)) {
|
||||||
|
port++;
|
||||||
|
}
|
||||||
|
return port;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Select GPUs for model deployment (round-robin)
|
||||||
|
*/
|
||||||
|
const selectGPUs = (pod: Pod, count: number = 1): number[] => {
|
||||||
|
if (count === pod.gpus.length) {
|
||||||
|
// Use all GPUs
|
||||||
|
return pod.gpus.map((g) => g.id);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Count GPU usage across all models
|
||||||
|
const gpuUsage = new Map<number, number>();
|
||||||
|
for (const gpu of pod.gpus) {
|
||||||
|
gpuUsage.set(gpu.id, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const model of Object.values(pod.models)) {
|
||||||
|
for (const gpuId of model.gpu) {
|
||||||
|
gpuUsage.set(gpuId, (gpuUsage.get(gpuId) || 0) + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sort GPUs by usage (least used first)
|
||||||
|
const sortedGPUs = Array.from(gpuUsage.entries())
|
||||||
|
.sort((a, b) => a[1] - b[1])
|
||||||
|
.map((entry) => entry[0]);
|
||||||
|
|
||||||
|
// Return the least used GPUs
|
||||||
|
return sortedGPUs.slice(0, count);
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Start a model
|
||||||
|
*/
|
||||||
|
export const startModel = async (
|
||||||
|
modelId: string,
|
||||||
|
name: string,
|
||||||
|
options: {
|
||||||
|
pod?: string;
|
||||||
|
vllmArgs?: string[];
|
||||||
|
memory?: string;
|
||||||
|
context?: string;
|
||||||
|
gpus?: number;
|
||||||
|
},
|
||||||
|
) => {
|
||||||
|
const { name: podName, pod } = getPod(options.pod);
|
||||||
|
|
||||||
|
// Validation
|
||||||
|
if (!pod.modelsPath) {
|
||||||
|
console.error(chalk.red("Pod does not have a models path configured"));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
if (pod.models[name]) {
|
||||||
|
console.error(chalk.red(`Model '${name}' already exists on pod '${podName}'`));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
const port = getNextPort(pod);
|
||||||
|
|
||||||
|
// Determine GPU allocation and vLLM args
|
||||||
|
let gpus: number[] = [];
|
||||||
|
let vllmArgs: string[] = [];
|
||||||
|
let modelConfig = null;
|
||||||
|
|
||||||
|
if (options.vllmArgs?.length) {
|
||||||
|
// Custom args override everything
|
||||||
|
vllmArgs = options.vllmArgs;
|
||||||
|
console.log(chalk.gray("Using custom vLLM args, GPU allocation managed by vLLM"));
|
||||||
|
} else if (isKnownModel(modelId)) {
|
||||||
|
// Handle --gpus parameter for known models
|
||||||
|
if (options.gpus) {
|
||||||
|
// Validate GPU count
|
||||||
|
if (options.gpus > pod.gpus.length) {
|
||||||
|
console.error(chalk.red(`Error: Requested ${options.gpus} GPUs but pod only has ${pod.gpus.length}`));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try to find config for requested GPU count
|
||||||
|
modelConfig = getModelConfig(modelId, pod.gpus, options.gpus);
|
||||||
|
if (modelConfig) {
|
||||||
|
gpus = selectGPUs(pod, options.gpus);
|
||||||
|
vllmArgs = [...(modelConfig.args || [])];
|
||||||
|
} else {
|
||||||
|
console.error(
|
||||||
|
chalk.red(`Model '${getModelName(modelId)}' does not have a configuration for ${options.gpus} GPU(s)`),
|
||||||
|
);
|
||||||
|
console.error(chalk.yellow("Available configurations:"));
|
||||||
|
|
||||||
|
// Show available configurations
|
||||||
|
for (let gpuCount = 1; gpuCount <= pod.gpus.length; gpuCount++) {
|
||||||
|
const config = getModelConfig(modelId, pod.gpus, gpuCount);
|
||||||
|
if (config) {
|
||||||
|
console.error(chalk.gray(` - ${gpuCount} GPU(s)`));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Find best config for this hardware (original behavior)
|
||||||
|
for (let gpuCount = pod.gpus.length; gpuCount >= 1; gpuCount--) {
|
||||||
|
modelConfig = getModelConfig(modelId, pod.gpus, gpuCount);
|
||||||
|
if (modelConfig) {
|
||||||
|
gpus = selectGPUs(pod, gpuCount);
|
||||||
|
vllmArgs = [...(modelConfig.args || [])];
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!modelConfig) {
|
||||||
|
console.error(chalk.red(`Model '${getModelName(modelId)}' not compatible with this pod's GPUs`));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Unknown model
|
||||||
|
if (options.gpus) {
|
||||||
|
console.error(chalk.red("Error: --gpus can only be used with predefined models"));
|
||||||
|
console.error(chalk.yellow("For custom models, use --vllm with tensor-parallel-size or similar arguments"));
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
// Single GPU default
|
||||||
|
gpus = selectGPUs(pod, 1);
|
||||||
|
console.log(chalk.gray("Unknown model, defaulting to single GPU"));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Apply memory/context overrides
|
||||||
|
if (!options.vllmArgs?.length) {
|
||||||
|
if (options.memory) {
|
||||||
|
const fraction = parseFloat(options.memory.replace("%", "")) / 100;
|
||||||
|
vllmArgs = vllmArgs.filter((arg) => !arg.includes("gpu-memory-utilization"));
|
||||||
|
vllmArgs.push("--gpu-memory-utilization", String(fraction));
|
||||||
|
}
|
||||||
|
if (options.context) {
|
||||||
|
const contextSizes: Record<string, number> = {
|
||||||
|
"4k": 4096,
|
||||||
|
"8k": 8192,
|
||||||
|
"16k": 16384,
|
||||||
|
"32k": 32768,
|
||||||
|
"64k": 65536,
|
||||||
|
"128k": 131072,
|
||||||
|
};
|
||||||
|
const maxTokens = contextSizes[options.context.toLowerCase()] || parseInt(options.context);
|
||||||
|
vllmArgs = vllmArgs.filter((arg) => !arg.includes("max-model-len"));
|
||||||
|
vllmArgs.push("--max-model-len", String(maxTokens));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Show what we're doing
|
||||||
|
	console.log(chalk.green(`Starting model '${name}' on pod '${podName}'...`));
	console.log(`Model: ${modelId}`);
	console.log(`Port: ${port}`);
	console.log(`GPU(s): ${gpus.length ? gpus.join(", ") : "Managed by vLLM"}`);
	if (modelConfig?.notes) console.log(chalk.yellow(`Note: ${modelConfig.notes}`));
	console.log("");

	// Read and customize the model_run.sh script with our values
	const scriptPath = join(dirname(fileURLToPath(import.meta.url)), "../../scripts/model_run.sh");
	let scriptContent = readFileSync(scriptPath, "utf-8");

	// Replace placeholders - no escaping needed, a heredoc quoted with 'EOF' is literal
	scriptContent = scriptContent
		.replace("{{MODEL_ID}}", modelId)
		.replace("{{NAME}}", name)
		.replace("{{PORT}}", String(port))
		.replace("{{VLLM_ARGS}}", vllmArgs.join(" "));

	// Upload the customized script
	await sshExec(
		pod.ssh,
		`cat > /tmp/model_run_${name}.sh << 'EOF'
${scriptContent}
EOF
chmod +x /tmp/model_run_${name}.sh`,
	);

	// Prepare environment
	const env = [
		`HF_TOKEN='${process.env.HF_TOKEN}'`,
		`PI_API_KEY='${process.env.PI_API_KEY}'`,
		`HF_HUB_ENABLE_HF_TRANSFER=1`,
		`VLLM_NO_USAGE_STATS=1`,
		`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`,
		`FORCE_COLOR=1`,
		`TERM=xterm-256color`,
		...(gpus.length === 1 ? [`CUDA_VISIBLE_DEVICES=${gpus[0]}`] : []),
		...Object.entries(modelConfig?.env || {}).map(([k, v]) => `${k}='${v}'`),
	]
		.map((e) => `export ${e}`)
		.join("\n");

	// Start the model runner under `script` for a pseudo-TTY, which preserves colors
	// and writes a log file. setsid creates a new session so the runner survives
	// SSH disconnection.
	const startCmd = `
${env}
mkdir -p ~/.vllm_logs
# Create a wrapper that monitors the script command
cat > /tmp/model_wrapper_${name}.sh << 'WRAPPER'
#!/bin/bash
script -q -f -c "/tmp/model_run_${name}.sh" ~/.vllm_logs/${name}.log
exit_code=$?
echo "Script exited with code $exit_code" >> ~/.vllm_logs/${name}.log
exit $exit_code
WRAPPER
chmod +x /tmp/model_wrapper_${name}.sh
setsid /tmp/model_wrapper_${name}.sh </dev/null >/dev/null 2>&1 &
echo $!
exit 0
`;

	const pidResult = await sshExec(pod.ssh, startCmd);
	const pid = parseInt(pidResult.stdout.trim());
	if (!pid) {
		console.error(chalk.red("Failed to start model runner"));
		process.exit(1);
	}

	// Save to config
	const config = loadConfig();
	config.pods[podName].models[name] = { model: modelId, port, gpu: gpus, pid };
	saveConfig(config);

	console.log(`Model runner started with PID: ${pid}`);
	console.log("Streaming logs... (waiting for startup)\n");

	// Small delay to ensure the log file exists before tailing
	await new Promise((resolve) => setTimeout(resolve, 500));

	// Stream logs with color support, watching for startup completion
	const sshParts = pod.ssh.split(" ");
	const sshCommand = sshParts[0]; // "ssh"
	const sshArgs = sshParts.slice(1); // e.g. ["root@86.38.238.55"]
	const host = sshArgs.find((p) => p.includes("@"))?.split("@")[1] || "localhost";
	const tailCmd = `tail -f ~/.vllm_logs/${name}.log`;

	// Build the full args array for spawn
	const fullArgs = [...sshArgs, tailCmd];

	const logProcess = spawn(sshCommand, fullArgs, {
		stdio: ["inherit", "pipe", "pipe"], // capture stdout and stderr
		env: { ...process.env, FORCE_COLOR: "1" },
	});

	let interrupted = false;
	let startupComplete = false;

	// Handle Ctrl+C
	const sigintHandler = () => {
		interrupted = true;
		logProcess.kill();
	};
	process.on("SIGINT", sigintHandler);

	// Process log output line by line
	const processOutput = (data: Buffer) => {
		const lines = data.toString().split("\n");
		for (const line of lines) {
			if (line) {
				console.log(line); // Echo the line to the console

				// Check for the startup-complete message
				if (line.includes("Application startup complete")) {
					startupComplete = true;
					logProcess.kill(); // Stop tailing logs
				}
			}
		}
	};

	logProcess.stdout?.on("data", processOutput);
	logProcess.stderr?.on("data", processOutput);

	await new Promise<void>((resolve) => logProcess.on("exit", () => resolve()));
	process.removeListener("SIGINT", sigintHandler);

	if (startupComplete) {
		// Model started successfully - output connection details
		console.log("\n" + chalk.green("✓ Model started successfully!"));
		console.log("\n" + chalk.bold("Connection Details:"));
		console.log(chalk.cyan("─".repeat(50)));
		console.log(chalk.white("Base URL: ") + chalk.yellow(`http://${host}:${port}/v1`));
		console.log(chalk.white("Model: ") + chalk.yellow(modelId));
		console.log(chalk.white("API Key: ") + chalk.yellow(process.env.PI_API_KEY || "(not set)"));
		console.log(chalk.cyan("─".repeat(50)));

		console.log("\n" + chalk.bold("Export for shell:"));
		console.log(chalk.gray(`export OPENAI_BASE_URL="http://${host}:${port}/v1"`));
		console.log(chalk.gray(`export OPENAI_API_KEY="${process.env.PI_API_KEY || "your-api-key"}"`));
		console.log(chalk.gray(`export OPENAI_MODEL="${modelId}"`));

		console.log("\n" + chalk.bold("Example usage:"));
		console.log(
			chalk.gray(`
# Python
from openai import OpenAI
client = OpenAI()  # Uses env vars
response = client.chat.completions.create(
    model="${modelId}",
    messages=[{"role": "user", "content": "Hello!"}]
)

# CLI
curl $OPENAI_BASE_URL/chat/completions \\
  -H "Authorization: Bearer $OPENAI_API_KEY" \\
  -H "Content-Type: application/json" \\
  -d '{"model":"${modelId}","messages":[{"role":"user","content":"Hi"}]}'`),
		);
		console.log("");
		console.log(chalk.cyan(`Chat with model: pi agent ${name} "Your message"`));
		console.log(chalk.cyan(`Interactive mode: pi agent ${name} -i`));
		console.log(chalk.cyan(`Monitor logs: pi logs ${name}`));
		console.log(chalk.cyan(`Stop model: pi stop ${name}`));
	} else if (interrupted) {
		console.log(chalk.yellow("\n\nStopped monitoring. Model deployment continues in the background."));
		console.log(chalk.cyan(`Chat with model: pi agent ${name} "Your message"`));
		console.log(chalk.cyan(`Check status: pi logs ${name}`));
		console.log(chalk.cyan(`Stop model: pi stop ${name}`));
	} else {
		console.log(chalk.yellow("\n\nLog stream ended. Model may still be running."));
		console.log(chalk.cyan(`Chat with model: pi agent ${name} "Your message"`));
		console.log(chalk.cyan(`Check status: pi logs ${name}`));
		console.log(chalk.cyan(`Stop model: pi stop ${name}`));
	}
};

/**
 * Stop a model
 */
export const stopModel = async (name: string, options: { pod?: string }) => {
	const { name: podName, pod } = getPod(options.pod);

	const model = pod.models[name];
	if (!model) {
		console.error(chalk.red(`Model '${name}' not found on pod '${podName}'`));
		process.exit(1);
	}

	console.log(chalk.yellow(`Stopping model '${name}' on pod '${podName}'...`));

	// Use pkill to kill the script process and all of its children
	const killCmd = `
pkill -TERM -P ${model.pid} 2>/dev/null || true
kill ${model.pid} 2>/dev/null || true
`;
	await sshExec(pod.ssh, killCmd);

	// Remove from config
	const config = loadConfig();
	delete config.pods[podName].models[name];
	saveConfig(config);

	console.log(chalk.green(`✓ Model '${name}' stopped`));
};

/**
 * Stop all models on a pod
 */
export const stopAllModels = async (options: { pod?: string }) => {
	const { name: podName, pod } = getPod(options.pod);

	const modelNames = Object.keys(pod.models);
	if (modelNames.length === 0) {
		console.log(`No models running on pod '${podName}'`);
		return;
	}

	console.log(chalk.yellow(`Stopping ${modelNames.length} model(s) on pod '${podName}'...`));

	// Kill all script processes and their children
	const pids = Object.values(pod.models).map((m) => m.pid);
	const killCmd = `
for PID in ${pids.join(" ")}; do
	pkill -TERM -P $PID 2>/dev/null || true
	kill $PID 2>/dev/null || true
done
`;
	await sshExec(pod.ssh, killCmd);

	// Clear all models from config
	const config = loadConfig();
	config.pods[podName].models = {};
	saveConfig(config);

	console.log(chalk.green(`✓ Stopped all models: ${modelNames.join(", ")}`));
};

/**
 * List all models
 */
export const listModels = async (options: { pod?: string }) => {
	const { name: podName, pod } = getPod(options.pod);

	const modelNames = Object.keys(pod.models);
	if (modelNames.length === 0) {
		console.log(`No models running on pod '${podName}'`);
		return;
	}

	// Get the pod's SSH host for URL display
	const sshParts = pod.ssh.split(" ");
	const host = sshParts.find((p) => p.includes("@"))?.split("@")[1] || "unknown";

	console.log(`Models on pod '${chalk.bold(podName)}':`);
	for (const name of modelNames) {
		const model = pod.models[name];
		const gpuStr =
			model.gpu.length > 1
				? `GPUs ${model.gpu.join(",")}`
				: model.gpu.length === 1
					? `GPU ${model.gpu[0]}`
					: "GPU unknown";
		console.log(`  ${chalk.green(name)} - Port ${model.port} - ${gpuStr} - PID ${model.pid}`);
		console.log(`    Model: ${chalk.gray(model.model)}`);
		console.log(`    URL: ${chalk.cyan(`http://${host}:${model.port}/v1`)}`);
	}

	// Verify that the processes are still running
	console.log("");
	console.log("Verifying processes...");
	let anyDead = false;
	for (const name of modelNames) {
		const model = pod.models[name];
		// Check both the wrapper process and whether vLLM is responding
		const checkCmd = `
# Check if the wrapper process exists
if ps -p ${model.pid} > /dev/null 2>&1; then
	# Process exists, now check if vLLM is responding
	if curl -s -f http://localhost:${model.port}/health > /dev/null 2>&1; then
		echo "running"
	else
		# Check if it's still starting up
		if tail -n 20 ~/.vllm_logs/${name}.log 2>/dev/null | grep -q "ERROR\\|Failed\\|Cuda error\\|died"; then
			echo "crashed"
		else
			echo "starting"
		fi
	fi
else
	echo "dead"
fi
`;
		const result = await sshExec(pod.ssh, checkCmd);
		const status = result.stdout.trim();
		if (status === "dead") {
			console.log(chalk.red(`  ${name}: Process ${model.pid} is not running`));
			anyDead = true;
		} else if (status === "crashed") {
			console.log(chalk.red(`  ${name}: vLLM crashed (check logs with 'pi logs ${name}')`));
			anyDead = true;
		} else if (status === "starting") {
			console.log(chalk.yellow(`  ${name}: Still starting up...`));
		}
	}

	if (anyDead) {
		console.log("");
		console.log(chalk.yellow("Some models are not running. Clean up with:"));
		console.log(chalk.cyan("  pi stop <name>"));
	} else {
		console.log(chalk.green("✓ All processes verified"));
	}
};

/**
 * View model logs
 */
export const viewLogs = async (name: string, options: { pod?: string }) => {
	const { name: podName, pod } = getPod(options.pod);

	const model = pod.models[name];
	if (!model) {
		console.error(chalk.red(`Model '${name}' not found on pod '${podName}'`));
		process.exit(1);
	}

	console.log(chalk.green(`Streaming logs for '${name}' on pod '${podName}'...`));
	console.log(chalk.gray("Press Ctrl+C to stop"));
	console.log("");

	// Stream logs with color preservation
	const sshParts = pod.ssh.split(" ");
	const sshCommand = sshParts[0]; // "ssh"
	const sshArgs = sshParts.slice(1); // ["root@86.38.238.55"]
	const tailCmd = `tail -f ~/.vllm_logs/${name}.log`;

	const logProcess = spawn(sshCommand, [...sshArgs, tailCmd], {
		stdio: "inherit",
		env: {
			...process.env,
			FORCE_COLOR: "1",
		},
	});

	// Wait for the process to exit
	await new Promise<void>((resolve) => {
		logProcess.on("exit", () => resolve());
	});
};

/**
 * Show known models and their hardware requirements
 */
export const showKnownModels = async () => {
	const modelsJson = await import("../models.json", { assert: { type: "json" } });
	const models = modelsJson.default.models;

	// Get active pod info if available
	const activePod = getActivePod();
	let podGpuCount = 0;
	let podGpuType = "";

	if (activePod) {
		podGpuCount = activePod.pod.gpus.length;
		// Extract GPU type from name (e.g., "NVIDIA H200" -> "H200")
		podGpuType = activePod.pod.gpus[0]?.name?.replace("NVIDIA", "")?.trim()?.split(" ")[0] || "";

		console.log(chalk.bold(`Known Models for ${activePod.name} (${podGpuCount}x ${podGpuType || "GPU"}):\n`));
	} else {
		console.log(chalk.bold("Known Models:\n"));
		console.log(chalk.yellow("No active pod. Use 'pi pods active <name>' to filter compatible models.\n"));
	}

	console.log("Usage: pi start <model> --name <name> [options]\n");

	// Group models by compatibility and family
	const compatible: Record<string, Array<{ id: string; name: string; config: string; notes?: string }>> = {};
	const incompatible: Record<string, Array<{ id: string; name: string; minGpu: string; notes?: string }>> = {};

	for (const [modelId, info] of Object.entries(models)) {
		const modelInfo = info as any;
		const family = modelInfo.name.split("-")[0] || "Other";

		let isCompatible = false;
		let compatibleConfig = "";
		let minGpu = "Unknown";
		let minNotes: string | undefined;

		if (modelInfo.configs && modelInfo.configs.length > 0) {
			// Sort configs by GPU count to find the minimum
			const sortedConfigs = [...modelInfo.configs].sort((a: any, b: any) => (a.gpuCount || 1) - (b.gpuCount || 1));

			// Find minimum requirements
			const minConfig = sortedConfigs[0];
			const minGpuCount = minConfig.gpuCount || 1;
			const gpuTypes = minConfig.gpuTypes?.join("/") || "H100/H200";

			minGpu = minGpuCount === 1 ? `1x ${gpuTypes}` : `${minGpuCount}x ${gpuTypes}`;
			minNotes = minConfig.notes || modelInfo.notes;

			// Check compatibility with the active pod
			if (activePod && podGpuCount > 0) {
				// Find the best matching config for this pod
				for (const config of sortedConfigs) {
					const configGpuCount = config.gpuCount || 1;
					const configGpuTypes = config.gpuTypes || [];

					// Check if we have enough GPUs
					if (configGpuCount <= podGpuCount) {
						// Check if the GPU type matches (if specified)
						if (
							configGpuTypes.length === 0 ||
							configGpuTypes.some((type: string) => podGpuType.includes(type) || type.includes(podGpuType))
						) {
							isCompatible = true;
							compatibleConfig = configGpuCount === 1 ? `1x ${podGpuType}` : `${configGpuCount}x ${podGpuType}`;
							minNotes = config.notes || modelInfo.notes;
							break;
						}
					}
				}
			}
		}

		const modelEntry = {
			id: modelId,
			name: modelInfo.name,
			notes: minNotes,
		};

		if (activePod && isCompatible) {
			if (!compatible[family]) {
				compatible[family] = [];
			}
			compatible[family].push({ ...modelEntry, config: compatibleConfig });
		} else {
			if (!incompatible[family]) {
				incompatible[family] = [];
			}
			incompatible[family].push({ ...modelEntry, minGpu });
		}
	}

	// Display compatible models first
	if (activePod && Object.keys(compatible).length > 0) {
		console.log(chalk.green.bold("✓ Compatible Models:\n"));

		const sortedFamilies = Object.keys(compatible).sort();
		for (const family of sortedFamilies) {
			console.log(chalk.cyan(`${family} Models:`));

			const modelList = compatible[family].sort((a, b) => a.name.localeCompare(b.name));

			for (const model of modelList) {
				console.log(`  ${chalk.green(model.id)}`);
				console.log(`    Name: ${model.name}`);
				console.log(`    Config: ${model.config}`);
				if (model.notes) {
					console.log(chalk.gray(`    Note: ${model.notes}`));
				}
				console.log("");
			}
		}
	}

	// Display incompatible models
	if (Object.keys(incompatible).length > 0) {
		if (activePod && Object.keys(compatible).length > 0) {
			console.log(chalk.red.bold("✗ Incompatible Models (need more/different GPUs):\n"));
		}

		const sortedFamilies = Object.keys(incompatible).sort();
		for (const family of sortedFamilies) {
			if (!activePod) {
				console.log(chalk.cyan(`${family} Models:`));
			} else {
				console.log(chalk.gray(`${family} Models:`));
			}

			const modelList = incompatible[family].sort((a, b) => a.name.localeCompare(b.name));

			for (const model of modelList) {
				const color = activePod ? chalk.gray : chalk.green;
				console.log(`  ${color(model.id)}`);
				console.log(chalk.gray(`    Name: ${model.name}`));
				console.log(chalk.gray(`    Min Hardware: ${model.minGpu}`));
				// Notes are omitted for incompatible models when filtering by an active pod
				if (model.notes && !activePod) {
					console.log(chalk.gray(`    Note: ${model.notes}`));
				}
				console.log("");
			}
		}
	}

	console.log(chalk.gray("\nUnknown models default to single-GPU deployment."));
	console.log(chalk.gray("Use --vllm to pass custom arguments to vLLM."));
};

205 packages/pods/src/commands/pods.ts Normal file
@@ -0,0 +1,205 @@
import chalk from "chalk";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { addPod, loadConfig, removePod, setActivePod } from "../config.js";
import { scpFile, sshExec, sshExecStream } from "../ssh.js";
import type { GPU, Pod } from "../types.js";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

/**
 * List all pods
 */
export const listPods = () => {
	const config = loadConfig();
	const podNames = Object.keys(config.pods);

	if (podNames.length === 0) {
		console.log("No pods configured. Use 'pi pods setup' to add a pod.");
		return;
	}

	console.log("Configured pods:");
	for (const name of podNames) {
		const pod = config.pods[name];
		const isActive = config.active === name;
		const marker = isActive ? chalk.green("*") : " ";
		const gpuCount = pod.gpus?.length || 0;
		const gpuInfo = gpuCount > 0 ? `${gpuCount}x ${pod.gpus[0].name}` : "no GPUs detected";
		const vllmInfo = pod.vllmVersion ? ` (vLLM: ${pod.vllmVersion})` : "";
		console.log(`${marker} ${chalk.bold(name)} - ${gpuInfo}${vllmInfo} - ${pod.ssh}`);
		if (pod.modelsPath) {
			console.log(`    Models: ${pod.modelsPath}`);
		}
		if (pod.vllmVersion === "gpt-oss") {
			console.log(chalk.yellow(`    ⚠️  GPT-OSS build - only for GPT-OSS models`));
		}
	}
};

/**
 * Set up a new pod
 */
export const setupPod = async (
	name: string,
	sshCmd: string,
	options: { mount?: string; modelsPath?: string; vllm?: "release" | "nightly" | "gpt-oss" },
) => {
	// Validate environment variables
	const hfToken = process.env.HF_TOKEN;
	const vllmApiKey = process.env.PI_API_KEY;

	if (!hfToken) {
		console.error(chalk.red("ERROR: HF_TOKEN environment variable is required"));
		console.error("Get a token from: https://huggingface.co/settings/tokens");
		console.error("Then run: export HF_TOKEN=your_token_here");
		process.exit(1);
	}

	if (!vllmApiKey) {
		console.error(chalk.red("ERROR: PI_API_KEY environment variable is required"));
		console.error("Set an API key: export PI_API_KEY=your_api_key_here");
		process.exit(1);
	}

	// Determine the models path
	let modelsPath = options.modelsPath;
	if (!modelsPath && options.mount) {
		// Extract the path from the mount command if not explicitly provided,
		// e.g. "mount -t nfs ... /mnt/sfs" -> "/mnt/sfs"
		const parts = options.mount.split(" ");
		modelsPath = parts[parts.length - 1];
	}

	if (!modelsPath) {
		console.error(chalk.red("ERROR: --models-path is required (or must be extractable from --mount)"));
		process.exit(1);
	}

	console.log(chalk.green(`Setting up pod '${name}'...`));
	console.log(`SSH: ${sshCmd}`);
	console.log(`Models path: ${modelsPath}`);
	console.log(
		`vLLM version: ${options.vllm || "release"} ${options.vllm === "gpt-oss" ? chalk.yellow("(GPT-OSS special build)") : ""}`,
	);
	if (options.mount) {
		console.log(`Mount command: ${options.mount}`);
	}
	console.log("");

	// Test the SSH connection
	console.log("Testing SSH connection...");
	const testResult = await sshExec(sshCmd, "echo 'SSH OK'");
	if (testResult.exitCode !== 0) {
		console.error(chalk.red("Failed to connect via SSH"));
		console.error(testResult.stderr);
		process.exit(1);
	}
	console.log(chalk.green("✓ SSH connection successful"));

	// Copy the setup script
	console.log("Copying setup script...");
	const scriptPath = join(__dirname, "../../scripts/pod_setup.sh");
	const success = await scpFile(sshCmd, scriptPath, "/tmp/pod_setup.sh");
	if (!success) {
		console.error(chalk.red("Failed to copy setup script"));
		process.exit(1);
	}
	console.log(chalk.green("✓ Setup script copied"));

	// Build the setup command
	let setupCmd = `bash /tmp/pod_setup.sh --models-path '${modelsPath}' --hf-token '${hfToken}' --vllm-api-key '${vllmApiKey}'`;
	if (options.mount) {
		setupCmd += ` --mount '${options.mount}'`;
	}
	// Add the vLLM version flag
	const vllmVersion = options.vllm || "release";
	setupCmd += ` --vllm '${vllmVersion}'`;

	// Run the setup script
	console.log("");
	console.log(chalk.yellow("Running setup (this will take 2-5 minutes)..."));
	console.log("");

	// Use forceTTY to preserve colors from apt, pip, etc.
	const exitCode = await sshExecStream(sshCmd, setupCmd, { forceTTY: true });
	if (exitCode !== 0) {
		console.error(chalk.red("\nSetup failed. Check the output above for errors."));
		process.exit(1);
	}

	// Detect the GPU configuration via nvidia-smi
	console.log("");
	console.log("Detecting GPU configuration...");
	const gpuResult = await sshExec(sshCmd, "nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader");

	const gpus: GPU[] = [];
	if (gpuResult.exitCode === 0 && gpuResult.stdout) {
		const lines = gpuResult.stdout.trim().split("\n");
		for (const line of lines) {
			const [id, gpuName, memory] = line.split(",").map((s) => s.trim());
			if (id !== undefined) {
				gpus.push({
					id: parseInt(id),
					name: gpuName || "Unknown",
					memory: memory || "Unknown",
				});
			}
		}
	}

	console.log(chalk.green(`✓ Detected ${gpus.length} GPU(s)`));
	for (const gpu of gpus) {
		console.log(`  GPU ${gpu.id}: ${gpu.name} (${gpu.memory})`);
	}

	// Save the pod configuration
	const pod: Pod = {
		ssh: sshCmd,
		gpus,
		models: {},
		modelsPath,
		vllmVersion: options.vllm || "release",
	};

	addPod(name, pod);
	console.log("");
	console.log(chalk.green(`✓ Pod '${name}' setup complete and set as active pod`));
	console.log("");
	console.log("You can now deploy models with:");
	console.log(chalk.cyan(`  pi start <model> --name <name>`));
};

/**
 * Switch the active pod
 */
export const switchActivePod = (name: string) => {
	const config = loadConfig();
	if (!config.pods[name]) {
		console.error(chalk.red(`Pod '${name}' not found`));
		console.log("\nAvailable pods:");
		for (const podName of Object.keys(config.pods)) {
			console.log(`  ${podName}`);
		}
		process.exit(1);
	}

	setActivePod(name);
	console.log(chalk.green(`✓ Switched active pod to '${name}'`));
};

/**
 * Remove a pod from the config
 */
export const removePodCommand = (name: string) => {
	const config = loadConfig();
	if (!config.pods[name]) {
		console.error(chalk.red(`Pod '${name}' not found`));
		process.exit(1);
	}

	removePod(name);
	console.log(chalk.green(`✓ Removed pod '${name}' from configuration`));
	console.log(chalk.yellow("Note: This only removes the local configuration. The remote pod is not affected."));
};
85 packages/pods/src/commands/prompt.ts Normal file
@@ -0,0 +1,85 @@
import { main as agentMain } from "@mariozechner/pi-agent";
import chalk from "chalk";
import { getActivePod, loadConfig } from "../config.js";

// ────────────────────────────────────────────────────────────────────────────────
// Types
// ────────────────────────────────────────────────────────────────────────────────

interface PromptOptions {
	pod?: string;
	apiKey?: string;
}

// ────────────────────────────────────────────────────────────────────────────────
// Main prompt function
// ────────────────────────────────────────────────────────────────────────────────

export async function promptModel(modelName: string, userArgs: string[], opts: PromptOptions = {}) {
	// Get the pod and model configuration
	const activePod = opts.pod ? { name: opts.pod, pod: loadConfig().pods[opts.pod] } : getActivePod();

	// Guard against both "no active pod" and an explicit --pod that doesn't exist
	if (!activePod || !activePod.pod) {
		console.error(chalk.red("No active pod. Use 'pi pods active <name>' to set one."));
		process.exit(1);
	}

	const { name: podName, pod } = activePod;
	const modelConfig = pod.models[modelName];

	if (!modelConfig) {
		console.error(chalk.red(`Model '${modelName}' not found on pod '${podName}'`));
		process.exit(1);
	}

	// Extract the host from the SSH string
	const host =
		pod.ssh
			.split(" ")
			.find((p) => p.includes("@"))
			?.split("@")[1] ?? "localhost";

	// Build the system prompt for code navigation
	const systemPrompt = `You help the user understand and navigate the codebase in the current working directory.

You can read files, list directories, and execute shell commands via the respective tools.

Do not output file contents you read via the read_file tool directly, unless asked to.

Do not output markdown tables as part of your responses.

Keep your responses concise and relevant to the user's request.

File paths you output must include line numbers where possible, e.g. "src/index.ts:10-20" for lines 10 to 20 in src/index.ts.

Current working directory: ${process.cwd()}`;

	// Build arguments for the agent's main function
	const args: string[] = [];

	// Add the base configuration that we control
	args.push(
		"--base-url",
		`http://${host}:${modelConfig.port}/v1`,
		"--model",
		modelConfig.model,
		"--api-key",
		opts.apiKey || process.env.PI_API_KEY || "dummy",
		"--api",
		modelConfig.model.toLowerCase().includes("gpt-oss") ? "responses" : "completions",
		"--system-prompt",
		systemPrompt,
	);

	// Pass through all user-provided arguments
	// (messages, --continue, --json, etc.)
	args.push(...userArgs);

	// Call the agent's main function directly
	try {
		await agentMain(args);
	} catch (err: any) {
		console.error(chalk.red(`Agent error: ${err.message}`));
		process.exit(1);
	}
}
80 packages/pods/src/config.ts Normal file
@@ -0,0 +1,80 @@
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
import { homedir } from "os";
import { join } from "path";
import type { Config, Pod } from "./types.js";

// Get the config directory from the environment or fall back to the default
const getConfigDir = (): string => {
	const configDir = process.env.PI_CONFIG_DIR || join(homedir(), ".pi");
	if (!existsSync(configDir)) {
		mkdirSync(configDir, { recursive: true });
	}
	return configDir;
};

const getConfigPath = (): string => {
	return join(getConfigDir(), "pods.json");
};

export const loadConfig = (): Config => {
	const configPath = getConfigPath();
	if (!existsSync(configPath)) {
		// Return an empty config if the file doesn't exist
		return { pods: {} };
	}
	try {
		const data = readFileSync(configPath, "utf-8");
		return JSON.parse(data);
	} catch (e) {
		console.error(`Error reading config: ${e}`);
		return { pods: {} };
	}
};

export const saveConfig = (config: Config): void => {
	const configPath = getConfigPath();
	try {
		writeFileSync(configPath, JSON.stringify(config, null, 2));
	} catch (e) {
		console.error(`Error saving config: ${e}`);
		process.exit(1);
	}
};

export const getActivePod = (): { name: string; pod: Pod } | null => {
	const config = loadConfig();
	if (!config.active || !config.pods[config.active]) {
		return null;
	}
	return { name: config.active, pod: config.pods[config.active] };
};

export const addPod = (name: string, pod: Pod): void => {
	const config = loadConfig();
	config.pods[name] = pod;
	// If there is no active pod yet, make this one active
	if (!config.active) {
		config.active = name;
	}
	saveConfig(config);
};

export const removePod = (name: string): void => {
	const config = loadConfig();
	delete config.pods[name];
	// If this was the active pod, clear the active marker
	if (config.active === name) {
		config.active = undefined;
	}
	saveConfig(config);
|
||||||
|
};
|
||||||
|
|
||||||
|
export const setActivePod = (name: string): void => {
|
||||||
|
const config = loadConfig();
|
||||||
|
if (!config.pods[name]) {
|
||||||
|
console.error(`Pod '${name}' not found`);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
config.active = name;
|
||||||
|
saveConfig(config);
|
||||||
|
};
|
||||||
2 packages/pods/src/index.ts Normal file
@@ -0,0 +1,2 @@
// Main library exports
export * from "./types.js";
111 packages/pods/src/model-configs.ts Normal file
@@ -0,0 +1,111 @@
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import type { GPU } from "./types.js";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

interface ModelConfig {
	gpuCount: number;
	gpuTypes?: string[];
	args: string[];
	env?: Record<string, string>;
	notes?: string;
}

interface ModelInfo {
	name: string;
	configs: ModelConfig[];
	notes?: string;
}

interface ModelsData {
	models: Record<string, ModelInfo>;
}

// Load models configuration - resolve relative to this file
const modelsJsonPath = join(__dirname, "models.json");
const modelsData: ModelsData = JSON.parse(readFileSync(modelsJsonPath, "utf-8"));

/**
 * Get the best configuration for a model based on available GPUs
 */
export const getModelConfig = (
	modelId: string,
	gpus: GPU[],
	requestedGpuCount: number,
): { args: string[]; env?: Record<string, string>; notes?: string } | null => {
	const modelInfo = modelsData.models[modelId];
	if (!modelInfo) {
		// Unknown model, no default config
		return null;
	}

	// Extract GPU type from the first GPU name (e.g., "NVIDIA H200" -> "H200")
	const gpuType = gpus[0]?.name?.replace("NVIDIA", "")?.trim()?.split(" ")[0] || "";

	// Find best matching config
	let bestConfig: ModelConfig | null = null;

	for (const config of modelInfo.configs) {
		// Check GPU count
		if (config.gpuCount !== requestedGpuCount) {
			continue;
		}

		// Check GPU type if specified
		if (config.gpuTypes && config.gpuTypes.length > 0) {
			const typeMatches = config.gpuTypes.some((type) => gpuType.includes(type) || type.includes(gpuType));
			if (!typeMatches) {
				continue;
			}
		}

		// This config matches
		bestConfig = config;
		break;
	}

	// If no exact match, try to find a config with just the right GPU count
	if (!bestConfig) {
		for (const config of modelInfo.configs) {
			if (config.gpuCount === requestedGpuCount) {
				bestConfig = config;
				break;
			}
		}
	}

	if (!bestConfig) {
		// No suitable config found
		return null;
	}

	return {
		args: [...bestConfig.args],
		env: bestConfig.env ? { ...bestConfig.env } : undefined,
		notes: bestConfig.notes || modelInfo.notes,
	};
};

/**
 * Check if a model is known
 */
export const isKnownModel = (modelId: string): boolean => {
	return modelId in modelsData.models;
};

/**
 * Get all known models
 */
export const getKnownModels = (): string[] => {
	return Object.keys(modelsData.models);
};

/**
 * Get model display name
 */
export const getModelName = (modelId: string): string => {
	return modelsData.models[modelId]?.name || modelId;
};
305 packages/pods/src/models.json Normal file
@@ -0,0 +1,305 @@
{
	"models": {
		"Qwen/Qwen2.5-Coder-32B-Instruct": {
			"name": "Qwen2.5-Coder-32B",
			"configs": [
				{
					"gpuCount": 1,
					"gpuTypes": ["H100", "H200"],
					"args": ["--tool-call-parser", "hermes", "--enable-auto-tool-choice"]
				},
				{
					"gpuCount": 2,
					"gpuTypes": ["H100", "H200"],
					"args": ["--tensor-parallel-size", "2", "--tool-call-parser", "hermes", "--enable-auto-tool-choice"]
				}
			]
		},
		"Qwen/Qwen3-Coder-30B-A3B-Instruct": {
			"name": "Qwen3-Coder-30B",
			"configs": [
				{
					"gpuCount": 1,
					"gpuTypes": ["H100", "H200"],
					"args": ["--enable-auto-tool-choice", "--tool-call-parser", "qwen3_coder"],
					"notes": "Fits comfortably on single GPU. ~60GB model weight."
				},
				{
					"gpuCount": 2,
					"gpuTypes": ["H100", "H200"],
					"args": [
						"--tensor-parallel-size",
						"2",
						"--enable-auto-tool-choice",
						"--tool-call-parser",
						"qwen3_coder"
					],
					"notes": "For higher throughput/longer context."
				}
			]
		},
		"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8": {
			"name": "Qwen3-Coder-30B-FP8",
			"configs": [
				{
					"gpuCount": 1,
					"gpuTypes": ["H100", "H200"],
					"args": ["--enable-auto-tool-choice", "--tool-call-parser", "qwen3_coder"],
					"env": {
						"VLLM_USE_DEEP_GEMM": "1"
					},
					"notes": "FP8 quantized, ~30GB model weight. Excellent for single GPU deployment."
				}
			]
		},
		"Qwen/Qwen3-Coder-480B-A35B-Instruct": {
			"name": "Qwen3-Coder-480B",
			"configs": [
				{
					"gpuCount": 8,
					"gpuTypes": ["H200", "H20"],
					"args": [
						"--tensor-parallel-size",
						"8",
						"--max-model-len",
						"32000",
						"--enable-auto-tool-choice",
						"--tool-call-parser",
						"qwen3_coder"
					],
					"notes": "Cannot serve full 262K context on single node. Reduce max-model-len or increase gpu-memory-utilization."
				}
			]
		},
		"Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8": {
			"name": "Qwen3-Coder-480B-FP8",
			"configs": [
				{
					"gpuCount": 8,
					"gpuTypes": ["H200", "H20"],
					"args": [
						"--max-model-len",
						"131072",
						"--enable-expert-parallel",
						"--data-parallel-size",
						"8",
						"--enable-auto-tool-choice",
						"--tool-call-parser",
						"qwen3_coder"
					],
					"env": {
						"VLLM_USE_DEEP_GEMM": "1"
					},
					"notes": "Use data-parallel mode (not tensor-parallel) to avoid weight quantization errors."
				}
			]
		},
		"openai/gpt-oss-20b": {
			"name": "GPT-OSS-20B",
			"configs": [
				{
					"gpuCount": 1,
					"gpuTypes": ["H100", "H200"],
					"args": ["--async-scheduling"]
				},
				{
					"gpuCount": 1,
					"gpuTypes": ["B200"],
					"args": ["--async-scheduling"],
					"env": {
						"VLLM_USE_TRTLLM_ATTENTION": "1",
						"VLLM_USE_TRTLLM_DECODE_ATTENTION": "1",
						"VLLM_USE_TRTLLM_CONTEXT_ATTENTION": "1",
						"VLLM_USE_FLASHINFER_MXFP4_MOE": "1"
					}
				}
			],
			"notes": "Requires vLLM 0.10.1+gptoss. Tools/function calls only via /v1/responses endpoint."
		},
		"openai/gpt-oss-120b": {
			"name": "GPT-OSS-120B",
			"configs": [
				{
					"gpuCount": 1,
					"gpuTypes": ["H100", "H200"],
					"args": ["--async-scheduling", "--gpu-memory-utilization", "0.95", "--max-num-batched-tokens", "1024"],
					"notes": "Single GPU deployment. Requires vLLM 0.10.1+gptoss. Tools/function calls only via /v1/responses endpoint."
				},
				{
					"gpuCount": 2,
					"gpuTypes": ["H100", "H200"],
					"args": ["--tensor-parallel-size", "2", "--async-scheduling", "--gpu-memory-utilization", "0.94"],
					"notes": "Recommended for H100/H200. Requires vLLM 0.10.1+gptoss. Tools/function calls only via /v1/responses endpoint."
				},
				{
					"gpuCount": 4,
					"gpuTypes": ["H100", "H200"],
					"args": ["--tensor-parallel-size", "4", "--async-scheduling"],
					"notes": "Higher throughput. Requires vLLM 0.10.1+gptoss. Tools/function calls only via /v1/responses endpoint."
				},
				{
					"gpuCount": 8,
					"gpuTypes": ["H100", "H200"],
					"args": ["--tensor-parallel-size", "8", "--async-scheduling"],
					"notes": "Maximum throughput for evaluation workloads. Requires vLLM 0.10.1+gptoss. Tools/function calls only via /v1/responses endpoint."
				}
			]
		},
		"zai-org/GLM-4.5": {
			"name": "GLM-4.5",
			"configs": [
				{
					"gpuCount": 16,
					"gpuTypes": ["H100"],
					"args": [
						"--tensor-parallel-size",
						"16",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice"
					]
				},
				{
					"gpuCount": 8,
					"gpuTypes": ["H200"],
					"args": [
						"--tensor-parallel-size",
						"8",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice"
					]
				}
			],
			"notes": "Models default to thinking mode. For full 128K context, double the GPU count."
		},
		"zai-org/GLM-4.5-FP8": {
			"name": "GLM-4.5-FP8",
			"configs": [
				{
					"gpuCount": 8,
					"gpuTypes": ["H100"],
					"args": [
						"--tensor-parallel-size",
						"8",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice"
					]
				},
				{
					"gpuCount": 4,
					"gpuTypes": ["H200"],
					"args": [
						"--tensor-parallel-size",
						"4",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice"
					]
				}
			]
		},
		"zai-org/GLM-4.5-Air-FP8": {
			"name": "GLM-4.5-Air-FP8",
			"configs": [
				{
					"gpuCount": 2,
					"gpuTypes": ["H100"],
					"args": [
						"--tensor-parallel-size",
						"2",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice",
						"--quantization",
						"fp8"
					],
					"env": {
						"VLLM_ATTENTION_BACKEND": "XFORMERS"
					},
					"notes": "FP8 model requires vLLM with proper FP8 support or MTP module"
				},
				{
					"gpuCount": 1,
					"gpuTypes": ["H200"],
					"args": [
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice",
						"--quantization",
						"fp8"
					],
					"env": {
						"VLLM_ATTENTION_BACKEND": "XFORMERS"
					},
					"notes": "FP8 model requires vLLM with proper FP8 support or MTP module"
				}
			]
		},
		"zai-org/GLM-4.5-Air": {
			"name": "GLM-4.5-Air",
			"configs": [
				{
					"gpuCount": 2,
					"gpuTypes": ["H100", "H200"],
					"args": [
						"--tensor-parallel-size",
						"2",
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice"
					],
					"notes": "Non-quantized BF16 version, more compatible"
				},
				{
					"gpuCount": 1,
					"gpuTypes": ["H200"],
					"args": [
						"--tool-call-parser",
						"glm4_moe",
						"--reasoning-parser",
						"glm4_moe",
						"--enable-auto-tool-choice",
						"--gpu-memory-utilization",
						"0.95"
					],
					"notes": "Single H200 can fit the BF16 model with high memory utilization"
				}
			]
		},
		"moonshotai/Kimi-K2-Instruct": {
			"name": "Kimi-K2",
			"configs": [
				{
					"gpuCount": 16,
					"gpuTypes": ["H200", "H20"],
					"args": [
						"--tensor-parallel-size",
						"16",
						"--trust-remote-code",
						"--enable-auto-tool-choice",
						"--tool-call-parser",
						"kimi_k2"
					],
					"notes": "Pure TP mode. For >16 GPUs, combine with pipeline-parallelism."
				}
			],
			"notes": "Requires vLLM v0.10.0rc1+. Minimum 16 GPUs for FP8 with 128k context."
		}
	}
}
151 packages/pods/src/ssh.ts Normal file
@@ -0,0 +1,151 @@
import { type SpawnOptions, spawn } from "child_process";

export interface SSHResult {
	stdout: string;
	stderr: string;
	exitCode: number;
}

/**
 * Execute an SSH command and return the result
 */
export const sshExec = async (
	sshCmd: string,
	command: string,
	options?: { keepAlive?: boolean },
): Promise<SSHResult> => {
	return new Promise((resolve) => {
		// Parse SSH command (e.g., "ssh root@1.2.3.4" or "ssh -p 22 root@1.2.3.4")
		const sshParts = sshCmd.split(" ").filter((p) => p);
		const sshBinary = sshParts[0];
		let sshArgs = [...sshParts.slice(1)];

		// Add SSH keepalive options for long-running commands
		if (options?.keepAlive) {
			// ServerAliveInterval=30 sends a keepalive every 30 seconds
			// ServerAliveCountMax=120 allows up to 120 failures (60 minutes total)
			sshArgs = ["-o", "ServerAliveInterval=30", "-o", "ServerAliveCountMax=120", ...sshArgs];
		}

		sshArgs.push(command);

		const proc = spawn(sshBinary, sshArgs, {
			stdio: ["ignore", "pipe", "pipe"],
		});

		let stdout = "";
		let stderr = "";

		proc.stdout.on("data", (data) => {
			stdout += data.toString();
		});

		proc.stderr.on("data", (data) => {
			stderr += data.toString();
		});

		proc.on("close", (code) => {
			resolve({
				stdout,
				stderr,
				exitCode: code || 0,
			});
		});

		proc.on("error", (err) => {
			resolve({
				stdout,
				stderr: err.message,
				exitCode: 1,
			});
		});
	});
};

/**
 * Execute an SSH command with streaming output to the console
 */
export const sshExecStream = async (
	sshCmd: string,
	command: string,
	options?: { silent?: boolean; forceTTY?: boolean; keepAlive?: boolean },
): Promise<number> => {
	return new Promise((resolve) => {
		const sshParts = sshCmd.split(" ").filter((p) => p);
		const sshBinary = sshParts[0];

		// Build SSH args
		let sshArgs = [...sshParts.slice(1)];

		// Add -t flag if requested and not already present
		if (options?.forceTTY && !sshParts.includes("-t")) {
			sshArgs = ["-t", ...sshArgs];
		}

		// Add SSH keepalive options for long-running commands
		if (options?.keepAlive) {
			// ServerAliveInterval=30 sends a keepalive every 30 seconds
			// ServerAliveCountMax=120 allows up to 120 failures (60 minutes total)
			sshArgs = ["-o", "ServerAliveInterval=30", "-o", "ServerAliveCountMax=120", ...sshArgs];
		}

		sshArgs.push(command);

		const spawnOptions: SpawnOptions = options?.silent
			? { stdio: ["ignore", "ignore", "ignore"] }
			: { stdio: "inherit" };

		const proc = spawn(sshBinary, sshArgs, spawnOptions);

		proc.on("close", (code) => {
			resolve(code || 0);
		});

		proc.on("error", () => {
			resolve(1);
		});
	});
};

/**
 * Copy a file to a remote host via SCP
 */
export const scpFile = async (sshCmd: string, localPath: string, remotePath: string): Promise<boolean> => {
	// Extract host from the SSH command
	const sshParts = sshCmd.split(" ").filter((p) => p);
	let host = "";
	let port = "22";
	let i = 1; // Skip 'ssh'

	while (i < sshParts.length) {
		if (sshParts[i] === "-p" && i + 1 < sshParts.length) {
			port = sshParts[i + 1];
			i += 2;
		} else if (!sshParts[i].startsWith("-")) {
			host = sshParts[i];
			break;
		} else {
			i++;
		}
	}

	if (!host) {
		console.error("Could not parse host from SSH command");
		return false;
	}

	// Build SCP command
	const scpArgs = ["-P", port, localPath, `${host}:${remotePath}`];

	return new Promise((resolve) => {
		const proc = spawn("scp", scpArgs, { stdio: "inherit" });

		proc.on("close", (code) => {
			resolve(code === 0);
		});

		proc.on("error", () => {
			resolve(false);
		});
	});
};
27 packages/pods/src/types.ts Normal file
@@ -0,0 +1,27 @@
// Core type definitions for pi

export interface GPU {
	id: number;
	name: string;
	memory: string;
}

export interface Model {
	model: string;
	port: number;
	gpu: number[]; // Array of GPU IDs for multi-GPU deployment
	pid: number;
}

export interface Pod {
	ssh: string;
	gpus: GPU[];
	models: Record<string, Model>;
	modelsPath?: string;
	vllmVersion?: "release" | "nightly" | "gpt-oss"; // Track which vLLM version is installed
}

export interface Config {
	pods: Record<string, Pod>;
	active?: string;
}
9 packages/pods/tsconfig.build.json Normal file
@@ -0,0 +1,9 @@
{
	"extends": "../../tsconfig.base.json",
	"compilerOptions": {
		"outDir": "./dist",
		"rootDir": "./src"
	},
	"include": ["src/**/*", "src/**/*.json"],
	"exclude": ["node_modules", "dist"]
}
655 packages/tui/README.md Normal file
@@ -0,0 +1,655 @@
# @mariozechner/pi-tui

Terminal UI framework with differential rendering for building interactive CLI applications.

## Features

- **Differential Rendering**: Only re-renders content that has changed for optimal performance
- **Interactive Components**: Text editor, autocomplete, selection lists, and markdown rendering
- **Composable Architecture**: Container-based component system with proper lifecycle management
- **Text Editor Autocomplete System**: File completion and slash commands with provider interface

## Quick Start

```typescript
import { TUI, Container, TextComponent, TextEditor } from "@mariozechner/pi-tui";

// Create TUI manager
const ui = new TUI();

// Create components
const header = new TextComponent("🚀 My TUI App");
const chatContainer = new Container();
const editor = new TextEditor();

// Add components to UI
ui.addChild(header);
ui.addChild(chatContainer);
ui.addChild(editor);

// Set focus to the editor
ui.setFocus(editor);

// Handle editor submissions
editor.onSubmit = (text: string) => {
	if (text.trim()) {
		const message = new TextComponent(`💬 ${text}`);
		chatContainer.addChild(message);
		ui.requestRender();
	}
};

// Start the UI
ui.start();
```

## Core Components

### TUI

Main TUI manager that handles rendering, input, and component coordination.

**Methods:**

- `addChild(component)` - Add a component to the TUI
- `removeChild(component)` - Remove a component from the TUI
- `setFocus(component)` - Set which component receives keyboard input
- `start()` - Start the TUI (enables raw mode)
- `stop()` - Stop the TUI (disables raw mode)
- `requestRender()` - Request a re-render on next tick
- `configureLogging(config)` - Configure debug logging
- `cleanupSentinels()` - Remove placeholder components after removal operations
- `findComponent(component)` - Check if a component exists in the hierarchy (private)
- `findInContainer(container, component)` - Search for a component in a container (private)

### Container

Component that manages child components with differential rendering.

**Constructor:**

```typescript
new Container(parentTui?: TUI | undefined)
```

**Methods:**

- `addChild(component)` - Add a child component
- `removeChild(component)` - Remove a child component
- `getChild(index)` - Get a specific child component
- `getChildCount()` - Get the number of child components
- `clear()` - Remove all child components
- `setParentTui(tui)` - Set the parent TUI reference
- `cleanupSentinels()` - Clean up removed component placeholders
- `render(width)` - Render all child components (returns ContainerRenderResult)

### TextEditor

Interactive multiline text editor with cursor support and comprehensive keyboard shortcuts.

**Constructor:**

```typescript
new TextEditor(config?: TextEditorConfig)
```

**Configuration:**

```typescript
interface TextEditorConfig {
	// Configuration options for text editor
}

editor.configure(config: Partial<TextEditorConfig>)
```

**Properties:**

- `onSubmit?: (text: string) => void` - Callback when user presses Enter
- `onChange?: (text: string) => void` - Callback when text content changes

**Methods:**

- `getText()` - Get current text content
- `setText(text)` - Set text content and move cursor to end
- `setAutocompleteProvider(provider)` - Set autocomplete provider for Tab completion
- `render(width)` - Render the editor with current state
- `handleInput(data)` - Process keyboard input

**Keyboard Shortcuts:**

**Navigation:**

- `Arrow Keys` - Move cursor
- `Home` / `Ctrl+A` - Move to start of line
- `End` / `Ctrl+E` - Move to end of line

**Editing:**

- `Backspace` - Delete character before cursor
- `Delete` / `Fn+Backspace` - Delete character at cursor
- `Ctrl+K` - Delete current line
- `Enter` - Submit text (calls onSubmit)
- `Shift+Enter` / `Option+Enter` - Add new line
- `Tab` - Trigger autocomplete

**Autocomplete (when active):**

- `Tab` - Apply selected completion
- `Arrow Up/Down` - Navigate suggestions
- `Escape` - Cancel autocomplete
- `Enter` - Cancel autocomplete and submit

**Paste Detection:**

- Automatically handles multi-line paste
- Converts tabs to 4 spaces
- Filters non-printable characters

### TextComponent

Simple text component with automatic text wrapping and differential rendering.

**Constructor:**

```typescript
new TextComponent(text: string, padding?: Padding)

interface Padding {
	top?: number;
	bottom?: number;
	left?: number;
	right?: number;
}
```

**Methods:**

- `setText(text)` - Update the text content
- `getText()` - Get current text content
- `render(width)` - Render with word wrapping

**Features:**

- Automatic text wrapping to fit terminal width
- Configurable padding on all sides
- Preserves line breaks in source text
- Uses differential rendering to avoid unnecessary updates
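Conceptually, `render(width)` wraps words greedily to the given width while preserving explicit line breaks. The following is a minimal sketch of that behavior for illustration only; the `wrap` helper is hypothetical and not part of the pi-tui API:

```typescript
// Illustrative greedy word-wrap, similar in spirit to what TextComponent
// does when rendering. Not the library's actual implementation.
function wrap(text: string, width: number): string[] {
	const lines: string[] = [];
	for (const source of text.split("\n")) { // explicit line breaks are preserved
		let current = "";
		for (const word of source.split(" ")) {
			if (current.length === 0) {
				current = word;
			} else if (current.length + 1 + word.length <= width) {
				current += " " + word;
			} else {
				lines.push(current);
				current = word;
			}
		}
		lines.push(current);
	}
	return lines;
}

// wrap("hello wonderful world", 10) → ["hello", "wonderful", "world"]
```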
### MarkdownComponent

Renders markdown content with syntax highlighting and proper formatting.

**Constructor:**

```typescript
new MarkdownComponent(text?: string)
```

**Methods:**

- `setText(text)` - Update markdown content
- `render(width)` - Render parsed markdown

**Features:**

- **Headings**: Styled with colors and formatting
- **Code blocks**: Syntax highlighting with gray background
- **Lists**: Bullet points (•) and numbered lists
- **Emphasis**: **Bold** and _italic_ text
- **Links**: Underlined with URL display
- **Blockquotes**: Styled with left border
- **Inline code**: Highlighted with background
- **Horizontal rules**: Terminal-width separator lines
- Differential rendering for performance
### SelectList

Interactive selection component for choosing from options.

**Constructor:**

```typescript
new SelectList(items: SelectItem[], maxVisible?: number)

interface SelectItem {
	value: string;
	label: string;
	description?: string;
}
```

**Properties:**

- `onSelect?: (item: SelectItem) => void` - Called when an item is selected
- `onCancel?: () => void` - Called when selection is cancelled

**Methods:**

- `setFilter(filter)` - Filter items by value
- `getSelectedItem()` - Get currently selected item
- `handleInput(keyData)` - Handle keyboard navigation
- `render(width)` - Render the selection list

**Features:**

- Keyboard navigation (arrow keys, Enter)
- Search/filter functionality
- Scrolling for long lists
- Custom option rendering with descriptions
- Visual selection indicator (→)
- Scroll position indicator
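To make the `SelectItem` shape and filtering concrete, here is a self-contained sketch. The interface is the one quoted above; the `filterItems` helper is hypothetical and only approximates what `setFilter` (which "filters items by value") might leave visible:

```typescript
interface SelectItem {
	value: string;
	label: string;
	description?: string;
}

// Hypothetical illustration of value-based filtering, not the component's code.
function filterItems(items: SelectItem[], filter: string): SelectItem[] {
	return items.filter((item) => item.value.includes(filter));
}

const items: SelectItem[] = [
	{ value: "model", label: "Model", description: "Pick a model" },
	{ value: "quit", label: "Quit" },
];

// filterItems(items, "mod") keeps only the "model" entry.
```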
### Autocomplete System

Comprehensive autocomplete system supporting slash commands and file paths.

#### AutocompleteProvider Interface

```typescript
interface AutocompleteProvider {
  getSuggestions(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
  ): {
    items: AutocompleteItem[];
    prefix: string;
  } | null;

  applyCompletion(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
    item: AutocompleteItem,
    prefix: string,
  ): {
    lines: string[];
    cursorLine: number;
    cursorCol: number;
  };
}

interface AutocompleteItem {
  value: string;
  label: string;
  description?: string;
}
```
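A custom provider only has to implement those two methods. The following is a minimal sketch, assuming word-at-cursor completion against a fixed keyword list; `KeywordProvider` is a hypothetical name and not part of the library, and the interfaces are copied from the declaration above so the example is self-contained.

```typescript
interface AutocompleteItem {
  value: string;
  label: string;
  description?: string;
}

// Hypothetical provider: completes the partial word before the cursor
// from a static keyword list.
class KeywordProvider {
  constructor(private keywords: string[]) {}

  getSuggestions(lines: string[], cursorLine: number, cursorCol: number) {
    const before = (lines[cursorLine] ?? "").slice(0, cursorCol);
    // The prefix is the partial word immediately before the cursor.
    const prefix = before.split(/\s+/).pop() ?? "";
    if (!prefix) return null;
    const items = this.keywords
      .filter((k) => k.startsWith(prefix))
      .map((k) => ({ value: k, label: k }));
    return items.length > 0 ? { items, prefix } : null;
  }

  applyCompletion(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
    item: AutocompleteItem,
    prefix: string,
  ) {
    const line = lines[cursorLine] ?? "";
    // Replace the matched prefix with the selected value.
    const newLine =
      line.slice(0, cursorCol - prefix.length) + item.value + line.slice(cursorCol);
    const newLines = [...lines];
    newLines[cursorLine] = newLine;
    return {
      lines: newLines,
      cursorLine,
      cursorCol: cursorCol - prefix.length + item.value.length,
    };
  }
}
```

The contract to note: `getSuggestions` returns the `prefix` it matched against, and the editor hands that same prefix back to `applyCompletion`, so the provider knows exactly how many characters to replace.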

#### CombinedAutocompleteProvider

Built-in provider supporting slash commands and file completion.

**Constructor:**

```typescript
new CombinedAutocompleteProvider(
  commands: (SlashCommand | AutocompleteItem)[] = [],
  basePath: string = process.cwd()
)

interface SlashCommand {
  name: string;
  description?: string;
  getArgumentCompletions?(argumentPrefix: string): AutocompleteItem[] | null;
}
```

**Features:**

**Slash Commands:**

- Type `/` to trigger command completion
- Auto-completion for command names
- Argument completion for commands that support it
- Space after command name for argument input

**File Completion:**

- `Tab` key triggers file completion
- `@` prefix for file attachments
- Home directory expansion (`~/`)
- Relative and absolute path support
- Directory-first sorting
- Filters to attachable files for `@` prefix

**Path Patterns:**

- `./` and `../` - Relative paths
- `~/` - Home directory
- `@path` - File attachment syntax
- Tab completion from any context

**Methods:**

- `getSuggestions()` - Get completions for current context
- `getForceFileSuggestions()` - Force file completion (Tab key)
- `shouldTriggerFileCompletion()` - Check if file completion should trigger
- `applyCompletion()` - Apply selected completion

## Differential Rendering

The core concept: components return `{lines: string[], changed: boolean, keepLines?: number}`:

- `lines`: All lines the component should display
- `changed`: Whether the component has changed since the last render
- `keepLines`: (Containers only) How many lines from the beginning are unchanged

**How it works:**

1. The TUI calculates the total number of unchanged lines from the top (`keepLines`)
2. Moves the cursor up by `(totalLines - keepLines)` positions
3. Clears from the cursor position down with `\x1b[0J`
4. Prints only the changing lines: `result.lines.slice(keepLines)`

This approach minimizes screen updates and provides smooth performance even with large amounts of text.

**Important:** Don't add extra cursor positioning after printing - it interferes with terminal scrolling and causes rendering artifacts.
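The steps above reduce to a pure frame diff: find the unchanged prefix, repaint only the rest. A minimal sketch, assuming whole-frame comparison (the real TUI tracks `changed`/`keepLines` per component rather than diffing line arrays):

```typescript
// Compare the previous frame to the next one; keep the unchanged
// prefix and return only the lines that must be printed again.
function diffFrames(prev: string[], next: string[]): {
  keepLines: number; // unchanged lines at the top
  repaint: string[]; // lines to print after clearing
} {
  let keep = 0;
  while (keep < prev.length && keep < next.length && prev[keep] === next[keep]) {
    keep++;
  }
  return { keepLines: keep, repaint: next.slice(keep) };
}

// A renderer would then move the cursor up by (prev.length - keepLines),
// clear downward with "\x1b[0J", and print only `repaint`.
```

When only the last line of a large frame changes, `repaint` contains a single line, which is why scrollback-heavy output stays cheap.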

## Advanced Examples

### Chat Application with Autocomplete

```typescript
import { TUI, Container, TextEditor, MarkdownComponent, CombinedAutocompleteProvider } from "@mariozechner/pi-tui";

const ui = new TUI();
const chatHistory = new Container();
const editor = new TextEditor();

// Set up autocomplete with slash commands
const autocompleteProvider = new CombinedAutocompleteProvider([
  { name: "clear", description: "Clear chat history" },
  { name: "help", description: "Show help information" },
  {
    name: "attach",
    description: "Attach a file",
    getArgumentCompletions: (prefix) => {
      // Return file suggestions for the attach command
      return null; // Use default file completion
    },
  },
]);

editor.setAutocompleteProvider(autocompleteProvider);

editor.onSubmit = (text) => {
  // Handle slash commands
  if (text.startsWith("/")) {
    const [command, ...args] = text.slice(1).split(" ");
    if (command === "clear") {
      chatHistory.clear();
      return;
    }
    if (command === "help") {
      const help = new MarkdownComponent(`
## Available Commands
- \`/clear\` - Clear chat history
- \`/help\` - Show this help
- \`/attach <file>\` - Attach a file
`);
      chatHistory.addChild(help);
      ui.requestRender();
      return;
    }
  }

  // Regular message
  const message = new MarkdownComponent(`**You:** ${text}`);
  chatHistory.addChild(message);

  // Add AI response (simulated)
  setTimeout(() => {
    const response = new MarkdownComponent(`**AI:** Response to "${text}"`);
    chatHistory.addChild(response);
    ui.requestRender();
  }, 1000);
};

ui.addChild(chatHistory);
ui.addChild(editor);
ui.setFocus(editor);
ui.start();
```

### File Browser

```typescript
import { TUI, SelectList } from "@mariozechner/pi-tui";
import { readdirSync, statSync } from "fs";
import { join } from "path";

const ui = new TUI();
let currentPath = process.cwd();

function createFileList(path: string) {
  const entries = readdirSync(path).map((entry) => {
    const fullPath = join(path, entry);
    const isDir = statSync(fullPath).isDirectory();
    return {
      value: entry,
      label: entry,
      description: isDir ? "directory" : "file",
    };
  });

  // Add parent directory option
  if (path !== "/") {
    entries.unshift({
      value: "..",
      label: "..",
      description: "parent directory",
    });
  }

  return entries;
}

function showDirectory(path: string) {
  ui.clear();

  const entries = createFileList(path);
  const fileList = new SelectList(entries, 10);

  fileList.onSelect = (item) => {
    if (item.value === "..") {
      currentPath = join(currentPath, "..");
      showDirectory(currentPath);
    } else if (item.description === "directory") {
      currentPath = join(currentPath, item.value);
      showDirectory(currentPath);
    } else {
      console.log(`Selected file: ${join(currentPath, item.value)}`);
      ui.stop();
    }
  };

  ui.addChild(fileList);
  ui.setFocus(fileList);
}

showDirectory(currentPath);
ui.start();
```

### Multi-Component Layout

```typescript
import { TUI, Container, TextComponent, TextEditor, MarkdownComponent } from "@mariozechner/pi-tui";

const ui = new TUI();

// Create layout containers
const header = new TextComponent("📝 Advanced TUI Demo", { bottom: 1 });
const mainContent = new Container();
const sidebar = new Container();
const footer = new TextComponent("Press Ctrl+C to exit", { top: 1 });

// Sidebar content
sidebar.addChild(new TextComponent("📁 Files:", { bottom: 1 }));
sidebar.addChild(new TextComponent("- config.json"));
sidebar.addChild(new TextComponent("- README.md"));
sidebar.addChild(new TextComponent("- package.json"));

// Main content area
const chatArea = new Container();
const inputArea = new TextEditor();

// Add welcome message
chatArea.addChild(
  new MarkdownComponent(`
# Welcome to the TUI Demo

This demonstrates multiple components working together:

- **Header**: Static title with padding
- **Sidebar**: File list (simulated)
- **Chat Area**: Scrollable message history
- **Input**: Interactive text editor
- **Footer**: Status information

Try typing a message and pressing Enter!
`),
);

inputArea.onSubmit = (text) => {
  if (text.trim()) {
    const message = new MarkdownComponent(`
**${new Date().toLocaleTimeString()}:** ${text}
`);
    chatArea.addChild(message);
    ui.requestRender();
  }
};

// Build layout
mainContent.addChild(chatArea);
mainContent.addChild(inputArea);

ui.addChild(header);
ui.addChild(mainContent);
ui.addChild(footer);
ui.setFocus(inputArea);

// Configure debug logging
ui.configureLogging({
  enabled: true,
  level: "info",
  logFile: "tui-debug.log",
});

ui.start();
```

## Interfaces and Types

### Core Types

```typescript
interface ComponentRenderResult {
  lines: string[];
  changed: boolean;
}

interface ContainerRenderResult extends ComponentRenderResult {
  keepLines: number;
}

interface Component {
  render(width: number): ComponentRenderResult;
  handleInput?(keyData: string): void;
}

interface Padding {
  top?: number;
  bottom?: number;
  left?: number;
  right?: number;
}
```
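Any object with a matching `render` method satisfies the `Component` shape. A minimal sketch of a custom component, with the interface copied locally so it runs without the library (`CounterComponent` is a hypothetical example, not a pi-tui class):

```typescript
interface ComponentRenderResult {
  lines: string[];
  changed: boolean;
}

// Hypothetical component: reports `changed` only when its state moved
// since the last render, so the TUI can skip repainting it.
class CounterComponent {
  private count = 0;
  private dirty = true;

  increment(): void {
    this.count++;
    this.dirty = true;
  }

  render(width: number): ComponentRenderResult {
    const changed = this.dirty;
    this.dirty = false; // consumed by this render
    return { lines: [`Count: ${this.count}`.slice(0, width)], changed };
  }
}
```

Clearing the dirty flag inside `render` is the key detail: a second render without state changes returns `changed: false`, which is what lets the differential renderer keep those lines on screen untouched.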

### Autocomplete Types

```typescript
interface AutocompleteItem {
  value: string;
  label: string;
  description?: string;
}

interface SlashCommand {
  name: string;
  description?: string;
  getArgumentCompletions?(argumentPrefix: string): AutocompleteItem[] | null;
}

interface AutocompleteProvider {
  getSuggestions(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
  ): {
    items: AutocompleteItem[];
    prefix: string;
  } | null;

  applyCompletion(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
    item: AutocompleteItem,
    prefix: string,
  ): {
    lines: string[];
    cursorLine: number;
    cursorCol: number;
  };
}
```

### Selection Types

```typescript
interface SelectItem {
  value: string;
  label: string;
  description?: string;
}
```

## Development

```bash
# Install dependencies (from monorepo root)
npm install

# Build the package
npm run build

# Run type checking
npm run check
```

**Testing:**
Create a test file and run it with tsx:

```bash
# From packages/tui directory
npx tsx test/demo.ts
```

Special input keywords for simulation: "TAB", "ENTER", "SPACE", "ESC"

**Debugging:**
Enable logging to see detailed component behavior:

```typescript
ui.configureLogging({
  enabled: true,
  level: "debug", // "error" | "warn" | "info" | "debug"
  logFile: "tui-debug.log",
});
```

Check the log file to debug rendering issues, input handling, and component lifecycle.
289 packages/tui/package-lock.json generated Normal file

@@ -0,0 +1,289 @@
{
  "name": "@mariozechner/tui",
  "version": "0.5.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "@mariozechner/tui",
      "version": "0.5.0",
      "license": "MIT",
      "dependencies": {
        "@types/mime-types": "^2.1.4",
        "chalk": "^5.4.1",
        "marked": "^15.0.12",
        "mime-types": "^3.0.1"
      },
      "devDependencies": {
        "@biomejs/biome": "^2.1.3",
        "@types/node": "^20.19.9",
        "husky": "^9.1.7",
        "typescript": "^5.0.0"
      },
      "engines": {
        "node": ">=18.0.0"
      }
    },
    "node_modules/@biomejs/biome": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/biome/-/biome-2.1.3.tgz",
      "integrity": "sha512-KE/tegvJIxTkl7gJbGWSgun7G6X/n2M6C35COT6ctYrAy7SiPyNvi6JtoQERVK/VRbttZfgGq96j2bFmhmnH4w==",
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "bin": {
        "biome": "bin/biome"
      },
      "engines": {
        "node": ">=14.21.3"
      },
      "funding": {
        "type": "opencollective",
        "url": "https://opencollective.com/biome"
      },
      "optionalDependencies": {
        "@biomejs/cli-darwin-arm64": "2.1.3",
        "@biomejs/cli-darwin-x64": "2.1.3",
        "@biomejs/cli-linux-arm64": "2.1.3",
        "@biomejs/cli-linux-arm64-musl": "2.1.3",
        "@biomejs/cli-linux-x64": "2.1.3",
        "@biomejs/cli-linux-x64-musl": "2.1.3",
        "@biomejs/cli-win32-arm64": "2.1.3",
        "@biomejs/cli-win32-x64": "2.1.3"
      }
    },
    "node_modules/@biomejs/cli-darwin-arm64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-darwin-arm64/-/cli-darwin-arm64-2.1.3.tgz",
      "integrity": "sha512-LFLkSWRoSGS1wVUD/BE6Nlt2dSn0ulH3XImzg2O/36BoToJHKXjSxzPEMAqT9QvwVtk7/9AQhZpTneERU9qaXA==",
      "cpu": [
        "arm64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "darwin"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-darwin-x64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-darwin-x64/-/cli-darwin-x64-2.1.3.tgz",
      "integrity": "sha512-Q/4OTw8P9No9QeowyxswcWdm0n2MsdCwWcc5NcKQQvzwPjwuPdf8dpPPf4r+x0RWKBtl1FLiAUtJvBlri6DnYw==",
      "cpu": [
        "x64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "darwin"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-linux-arm64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-linux-arm64/-/cli-linux-arm64-2.1.3.tgz",
      "integrity": "sha512-2hS6LgylRqMFmAZCOFwYrf77QMdUwJp49oe8PX/O8+P2yKZMSpyQTf3Eo5ewnsMFUEmYbPOskafdV1ds1MZMJA==",
      "cpu": [
        "arm64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "linux"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-linux-arm64-musl": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-linux-arm64-musl/-/cli-linux-arm64-musl-2.1.3.tgz",
      "integrity": "sha512-KXouFSBnoxAWZYDQrnNRzZBbt5s9UJkIm40hdvSL9mBxSSoxRFQJbtg1hP3aa8A2SnXyQHxQfpiVeJlczZt76w==",
      "cpu": [
        "arm64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "linux"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-linux-x64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-linux-x64/-/cli-linux-x64-2.1.3.tgz",
      "integrity": "sha512-NxlSCBhLvQtWGagEztfAZ4WcE1AkMTntZV65ZvR+J9jp06+EtOYEBPQndA70ZGhHbEDG57bR6uNvqkd1WrEYVA==",
      "cpu": [
        "x64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "linux"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-linux-x64-musl": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-linux-x64-musl/-/cli-linux-x64-musl-2.1.3.tgz",
      "integrity": "sha512-KaLAxnROouzIWtl6a0Y88r/4hW5oDUJTIqQorOTVQITaKQsKjZX4XCUmHIhdEk8zMnaiLZzRTAwk1yIAl+mIew==",
      "cpu": [
        "x64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "linux"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-win32-arm64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-win32-arm64/-/cli-win32-arm64-2.1.3.tgz",
      "integrity": "sha512-V9CUZCtWH4u0YwyCYbQ3W5F4ZGPWp2C2TYcsiWFNNyRfmOW1j/TY/jAurl33SaRjgZPO5UUhGyr9m6BN9t84NQ==",
      "cpu": [
        "arm64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "win32"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@biomejs/cli-win32-x64": {
      "version": "2.1.3",
      "resolved": "https://registry.npmjs.org/@biomejs/cli-win32-x64/-/cli-win32-x64-2.1.3.tgz",
      "integrity": "sha512-dxy599q6lgp8ANPpR8sDMscwdp9oOumEsVXuVCVT9N2vAho8uYXlCz53JhxX6LtJOXaE73qzgkGQ7QqvFlMC0g==",
      "cpu": [
        "x64"
      ],
      "dev": true,
      "license": "MIT OR Apache-2.0",
      "optional": true,
      "os": [
        "win32"
      ],
      "engines": {
        "node": ">=14.21.3"
      }
    },
    "node_modules/@types/mime-types": {
      "version": "2.1.4",
      "resolved": "https://registry.npmjs.org/@types/mime-types/-/mime-types-2.1.4.tgz",
      "integrity": "sha512-lfU4b34HOri+kAY5UheuFMWPDOI+OPceBSHZKp69gEyTL/mmJ4cnU6Y/rlme3UL3GyOn6Y42hyIEw0/q8sWx5w==",
      "license": "MIT"
    },
    "node_modules/@types/node": {
      "version": "20.19.9",
      "resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.9.tgz",
      "integrity": "sha512-cuVNgarYWZqxRJDQHEB58GEONhOK79QVR/qYx4S7kcUObQvUwvFnYxJuuHUKm2aieN9X3yZB4LZsuYNU1Qphsw==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "undici-types": "~6.21.0"
      }
    },
    "node_modules/chalk": {
      "version": "5.4.1",
      "resolved": "https://registry.npmjs.org/chalk/-/chalk-5.4.1.tgz",
      "integrity": "sha512-zgVZuo2WcZgfUEmsn6eO3kINexW8RAE4maiQ8QNs8CtpPCSyMiYsULR3HQYkm3w8FIA3SberyMJMSldGsW+U3w==",
      "license": "MIT",
      "engines": {
        "node": "^12.17.0 || ^14.13 || >=16.0.0"
      },
      "funding": {
        "url": "https://github.com/chalk/chalk?sponsor=1"
      }
    },
    "node_modules/husky": {
      "version": "9.1.7",
      "resolved": "https://registry.npmjs.org/husky/-/husky-9.1.7.tgz",
      "integrity": "sha512-5gs5ytaNjBrh5Ow3zrvdUUY+0VxIuWVL4i9irt6friV+BqdCfmV11CQTWMiBYWHbXhco+J1kHfTOUkePhCDvMA==",
      "dev": true,
      "license": "MIT",
      "bin": {
        "husky": "bin.js"
      },
      "engines": {
        "node": ">=18"
      },
      "funding": {
        "url": "https://github.com/sponsors/typicode"
      }
    },
    "node_modules/marked": {
      "version": "15.0.12",
      "resolved": "https://registry.npmjs.org/marked/-/marked-15.0.12.tgz",
      "integrity": "sha512-8dD6FusOQSrpv9Z1rdNMdlSgQOIP880DHqnohobOmYLElGEqAL/JvxvuxZO16r4HtjTlfPRDC1hbvxC9dPN2nA==",
      "license": "MIT",
      "bin": {
        "marked": "bin/marked.js"
      },
      "engines": {
        "node": ">= 18"
      }
    },
    "node_modules/mime-db": {
      "version": "1.54.0",
      "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz",
      "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==",
      "license": "MIT",
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/mime-types": {
      "version": "3.0.1",
      "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.1.tgz",
      "integrity": "sha512-xRc4oEhT6eaBpU1XF7AjpOFD+xQmXNB5OVKwp4tqCuBpHLS/ZbBDrc07mYTDqVMg6PfxUjjNp85O6Cd2Z/5HWA==",
      "license": "MIT",
      "dependencies": {
        "mime-db": "^1.54.0"
      },
      "engines": {
        "node": ">= 0.6"
      }
    },
    "node_modules/typescript": {
      "version": "5.9.2",
      "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.2.tgz",
      "integrity": "sha512-CWBzXQrc/qOkhidw1OzBTQuYRbfyxDXJMVJ1XNwUHGROVmuaeiEm3OslpZ1RV96d7SKKjZKrSJu3+t/xlw3R9A==",
      "dev": true,
      "license": "Apache-2.0",
      "bin": {
        "tsc": "bin/tsc",
        "tsserver": "bin/tsserver"
      },
      "engines": {
        "node": ">=14.17"
      }
    },
    "node_modules/undici-types": {
      "version": "6.21.0",
      "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
      "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
      "dev": true,
      "license": "MIT"
    }
  }
}
44 packages/tui/package.json Normal file

@@ -0,0 +1,44 @@
{
  "name": "@mariozechner/pi-tui",
  "version": "0.5.0",
  "description": "Terminal User Interface library with differential rendering for efficient text-based applications",
  "type": "module",
  "main": "dist/index.js",
  "scripts": {
    "clean": "rm -rf dist tsconfig.tsbuildinfo",
    "build": "tsc -p tsconfig.build.json",
    "check": "biome check --write .",
    "prepublishOnly": "npm run clean && npm run build"
  },
  "files": [
    "dist/**/*",
    "README.md"
  ],
  "keywords": [
    "tui",
    "terminal",
    "ui",
    "text-editor",
    "differential-rendering",
    "typescript",
    "cli"
  ],
  "author": "Mario Zechner",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/badlogic/pi-mono.git",
    "directory": "packages/tui"
  },
  "engines": {
    "node": ">=20.0.0"
  },
  "devDependencies": {},
  "types": "./dist/index.d.ts",
  "dependencies": {
    "@types/mime-types": "^2.1.4",
    "chalk": "^5.5.0",
    "marked": "^15.0.12",
    "mime-types": "^3.0.1"
  }
}
549 packages/tui/src/autocomplete.ts Normal file

@@ -0,0 +1,549 @@
import { readdirSync, statSync } from "fs";
import mimeTypes from "mime-types";
import { homedir } from "os";
import { basename, dirname, extname, join } from "path";
import { logger } from "./logger.js";

function isAttachableFile(filePath: string): boolean {
  const mimeType = mimeTypes.lookup(filePath);

  // Check file extension for common text files that might be misidentified
  const textExtensions = [
    ".txt",
    ".md",
    ".markdown",
    ".js",
    ".ts",
    ".tsx",
    ".jsx",
    ".py",
    ".java",
    ".c",
    ".cpp",
    ".h",
    ".hpp",
    ".cs",
    ".php",
    ".rb",
    ".go",
    ".rs",
    ".swift",
    ".kt",
    ".scala",
    ".sh",
    ".bash",
    ".zsh",
    ".fish",
    ".html",
    ".htm",
    ".css",
    ".scss",
    ".sass",
    ".less",
    ".xml",
    ".json",
    ".yaml",
    ".yml",
    ".toml",
    ".ini",
    ".cfg",
    ".conf",
    ".log",
    ".sql",
    ".r",
    ".R",
    ".m",
    ".pl",
    ".lua",
    ".vim",
    ".dockerfile",
    ".makefile",
    ".cmake",
    ".gradle",
    ".maven",
    ".properties",
    ".env",
  ];

  const ext = extname(filePath).toLowerCase();
  if (textExtensions.includes(ext)) return true;

  if (!mimeType) return false;

  if (mimeType.startsWith("image/")) return true;
  if (mimeType.startsWith("text/")) return true;

  // Special cases for common text files that might not be detected as text/
  const commonTextTypes = [
    "application/json",
    "application/javascript",
    "application/typescript",
    "application/xml",
    "application/yaml",
    "application/x-yaml",
  ];

  return commonTextTypes.includes(mimeType);
}

export interface AutocompleteItem {
  value: string;
  label: string;
  description?: string;
}

export interface SlashCommand {
  name: string;
  description?: string;
  // Function to get argument completions for this command
  // Returns null if no argument completion is available
  getArgumentCompletions?(argumentPrefix: string): AutocompleteItem[] | null;
}

export interface AutocompleteProvider {
  // Get autocomplete suggestions for current text/cursor position
  // Returns null if no suggestions available
  getSuggestions(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
  ): {
    items: AutocompleteItem[];
    prefix: string; // What we're matching against (e.g., "/" or "src/")
  } | null;

  // Apply the selected item
  // Returns the new text and cursor position
  applyCompletion(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
    item: AutocompleteItem,
    prefix: string,
  ): {
    lines: string[];
    cursorLine: number;
    cursorCol: number;
  };
}

// Combined provider that handles both slash commands and file paths
export class CombinedAutocompleteProvider implements AutocompleteProvider {
  private commands: (SlashCommand | AutocompleteItem)[];
  private basePath: string;

  constructor(commands: (SlashCommand | AutocompleteItem)[] = [], basePath: string = process.cwd()) {
    this.commands = commands;
    this.basePath = basePath;
  }

  getSuggestions(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
  ): { items: AutocompleteItem[]; prefix: string } | null {
    logger.debug("CombinedAutocompleteProvider", "getSuggestions called", {
      lines,
      cursorLine,
      cursorCol,
    });

    const currentLine = lines[cursorLine] || "";
    const textBeforeCursor = currentLine.slice(0, cursorCol);

    // Check for slash commands
    if (textBeforeCursor.startsWith("/")) {
      const spaceIndex = textBeforeCursor.indexOf(" ");

      if (spaceIndex === -1) {
        // No space yet - complete command names
        const prefix = textBeforeCursor.slice(1); // Remove the "/"
        const filtered = this.commands
          .filter((cmd) => {
            const name = "name" in cmd ? cmd.name : cmd.value; // Check if SlashCommand or AutocompleteItem
            return name?.toLowerCase().startsWith(prefix.toLowerCase());
          })
          .map((cmd) => ({
            value: "name" in cmd ? cmd.name : cmd.value,
            label: "name" in cmd ? cmd.name : cmd.label,
            ...(cmd.description && { description: cmd.description }),
          }));

        if (filtered.length === 0) return null;

        return {
          items: filtered,
          prefix: textBeforeCursor,
        };
      } else {
        // Space found - complete command arguments
        const commandName = textBeforeCursor.slice(1, spaceIndex); // Command without "/"
        const argumentText = textBeforeCursor.slice(spaceIndex + 1); // Text after space

        const command = this.commands.find((cmd) => {
          const name = "name" in cmd ? cmd.name : cmd.value;
          return name === commandName;
        });
        if (!command || !("getArgumentCompletions" in command) || !command.getArgumentCompletions) {
          return null; // No argument completion for this command
        }

        const argumentSuggestions = command.getArgumentCompletions(argumentText);
        if (!argumentSuggestions || argumentSuggestions.length === 0) {
          return null;
        }

        return {
          items: argumentSuggestions,
          prefix: argumentText,
        };
      }
    }

    // Check for file paths - triggered by Tab or if we detect a path pattern
    const pathMatch = this.extractPathPrefix(textBeforeCursor, false);
    logger.debug("CombinedAutocompleteProvider", "Path match check", {
      textBeforeCursor,
      pathMatch,
    });

    if (pathMatch !== null) {
      const suggestions = this.getFileSuggestions(pathMatch);
      if (suggestions.length === 0) return null;

      return {
        items: suggestions,
        prefix: pathMatch,
      };
    }

    return null;
  }

  applyCompletion(
    lines: string[],
    cursorLine: number,
    cursorCol: number,
    item: AutocompleteItem,
    prefix: string,
  ): { lines: string[]; cursorLine: number; cursorCol: number } {
    const currentLine = lines[cursorLine] || "";
    const beforePrefix = currentLine.slice(0, cursorCol - prefix.length);
    const afterCursor = currentLine.slice(cursorCol);

    // Check if we're completing a slash command (prefix starts with "/")
    if (prefix.startsWith("/")) {
      // This is a command name completion
      const newLine = beforePrefix + "/" + item.value + " " + afterCursor;
      const newLines = [...lines];
      newLines[cursorLine] = newLine;

      return {
        lines: newLines,
        cursorLine,
        cursorCol: beforePrefix.length + item.value.length + 2, // +2 for "/" and space
      };
    }

    // Check if we're completing a file attachment (prefix starts with "@")
    if (prefix.startsWith("@")) {
      // This is a file attachment completion
      const newLine = beforePrefix + item.value + " " + afterCursor;
      const newLines = [...lines];
      newLines[cursorLine] = newLine;

      return {
        lines: newLines,
        cursorLine,
        cursorCol: beforePrefix.length + item.value.length + 1, // +1 for space
      };
    }

    // Check if we're in a slash command context (beforePrefix contains "/command ")
|
||||||
|
const textBeforeCursor = currentLine.slice(0, cursorCol);
|
||||||
|
if (textBeforeCursor.includes("/") && textBeforeCursor.includes(" ")) {
|
||||||
|
// This is likely a command argument completion
|
||||||
|
const newLine = beforePrefix + item.value + afterCursor;
|
||||||
|
const newLines = [...lines];
|
||||||
|
newLines[cursorLine] = newLine;
|
||||||
|
|
||||||
|
return {
|
||||||
|
lines: newLines,
|
||||||
|
cursorLine,
|
||||||
|
cursorCol: beforePrefix.length + item.value.length,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// For file paths, complete the path
|
||||||
|
const newLine = beforePrefix + item.value + afterCursor;
|
||||||
|
const newLines = [...lines];
|
||||||
|
newLines[cursorLine] = newLine;
|
||||||
|
|
||||||
|
return {
|
||||||
|
lines: newLines,
|
||||||
|
cursorLine,
|
||||||
|
cursorCol: beforePrefix.length + item.value.length,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// Extract a path-like prefix from the text before cursor
|
||||||
|
private extractPathPrefix(text: string, forceExtract: boolean = false): string | null {
|
||||||
|
// Check for @ file attachment syntax first
|
||||||
|
const atMatch = text.match(/@([^\s]*)$/);
|
||||||
|
if (atMatch) {
|
||||||
|
return atMatch[0]; // Return the full @path pattern
|
||||||
|
}
|
||||||
|
|
||||||
|
// Match paths - including those ending with /, ~/, or any word at end for forced extraction
|
||||||
|
// This regex captures:
|
||||||
|
// - Paths starting from beginning of line or after space/quote/equals
|
||||||
|
// - Optional ./ or ../ or ~/ prefix (including the trailing slash for ~/)
|
||||||
|
// - The path itself (can include / in the middle)
|
||||||
|
// - For forced extraction, capture any word at the end
|
||||||
|
const matches = text.match(/(?:^|[\s"'=])((?:~\/|\.{0,2}\/?)?(?:[^\s"'=]*\/?)*[^\s"'=]*)$/);
|
||||||
|
if (!matches) {
|
||||||
|
// If forced extraction and no matches, return empty string to trigger from current dir
|
||||||
|
return forceExtract ? "" : null;
|
||||||
|
}
|
||||||
|
|
||||||
|
const pathPrefix = matches[1] || "";
|
||||||
|
|
||||||
|
// For forced extraction (Tab key), always return something
|
||||||
|
if (forceExtract) {
|
||||||
|
return pathPrefix;
|
||||||
|
}
|
||||||
|
|
||||||
|
// For natural triggers, return if it looks like a path, ends with /, starts with ~/, .
|
||||||
|
// Only return empty string if the text looks like it's starting a path context
|
||||||
|
if (pathPrefix.includes("/") || pathPrefix.startsWith(".") || pathPrefix.startsWith("~/")) {
|
||||||
|
return pathPrefix;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Return empty string only if we're at the beginning of the line or after a space
|
||||||
|
// (not after quotes or other delimiters that don't suggest file paths)
|
||||||
|
if (pathPrefix === "" && (text === "" || text.endsWith(" "))) {
|
||||||
|
return pathPrefix;
|
||||||
|
}
|
||||||
|
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Expand home directory (~/) to actual home path
|
||||||
|
private expandHomePath(path: string): string {
|
||||||
|
if (path.startsWith("~/")) {
|
||||||
|
const expandedPath = join(homedir(), path.slice(2));
|
||||||
|
// Preserve trailing slash if original path had one
|
||||||
|
return path.endsWith("/") && !expandedPath.endsWith("/") ? expandedPath + "/" : expandedPath;
|
||||||
|
} else if (path === "~") {
|
||||||
|
return homedir();
|
||||||
|
}
|
||||||
|
return path;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get file/directory suggestions for a given path prefix
|
||||||
|
private getFileSuggestions(prefix: string): AutocompleteItem[] {
|
||||||
|
logger.debug("CombinedAutocompleteProvider", "getFileSuggestions called", {
|
||||||
|
prefix,
|
||||||
|
basePath: this.basePath,
|
||||||
|
});
|
||||||
|
|
||||||
|
try {
|
||||||
|
let searchDir: string;
|
||||||
|
let searchPrefix: string;
|
||||||
|
let expandedPrefix = prefix;
|
||||||
|
let isAtPrefix = false;
|
||||||
|
|
||||||
|
// Handle @ file attachment prefix
|
||||||
|
if (prefix.startsWith("@")) {
|
||||||
|
isAtPrefix = true;
|
||||||
|
expandedPrefix = prefix.slice(1); // Remove the @
|
||||||
|
}
|
||||||
|
|
||||||
|
// Handle home directory expansion
|
||||||
|
if (expandedPrefix.startsWith("~")) {
|
||||||
|
expandedPrefix = this.expandHomePath(expandedPrefix);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
expandedPrefix === "" ||
|
||||||
|
expandedPrefix === "./" ||
|
||||||
|
expandedPrefix === "../" ||
|
||||||
|
expandedPrefix === "~" ||
|
||||||
|
expandedPrefix === "~/" ||
|
||||||
|
prefix === "@"
|
||||||
|
) {
|
||||||
|
// Complete from specified position
|
||||||
|
if (prefix.startsWith("~")) {
|
||||||
|
searchDir = expandedPrefix;
|
||||||
|
} else {
|
||||||
|
searchDir = join(this.basePath, expandedPrefix);
|
||||||
|
}
|
||||||
|
searchPrefix = "";
|
||||||
|
} else if (expandedPrefix.endsWith("/")) {
|
||||||
|
// If prefix ends with /, show contents of that directory
|
||||||
|
if (prefix.startsWith("~") || (isAtPrefix && expandedPrefix.startsWith("/"))) {
|
||||||
|
searchDir = expandedPrefix;
|
||||||
|
} else {
|
||||||
|
searchDir = join(this.basePath, expandedPrefix);
|
||||||
|
}
|
||||||
|
searchPrefix = "";
|
||||||
|
} else {
|
||||||
|
// Split into directory and file prefix
|
||||||
|
const dir = dirname(expandedPrefix);
|
||||||
|
const file = basename(expandedPrefix);
|
||||||
|
if (prefix.startsWith("~") || (isAtPrefix && expandedPrefix.startsWith("/"))) {
|
||||||
|
searchDir = dir;
|
||||||
|
} else {
|
||||||
|
searchDir = join(this.basePath, dir);
|
||||||
|
}
|
||||||
|
searchPrefix = file;
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.debug("CombinedAutocompleteProvider", "Searching directory", {
|
||||||
|
searchDir,
|
||||||
|
searchPrefix,
|
||||||
|
});
|
||||||
|
|
||||||
|
const entries = readdirSync(searchDir);
|
||||||
|
const suggestions: AutocompleteItem[] = [];
|
||||||
|
|
||||||
|
for (const entry of entries) {
|
||||||
|
if (!entry.toLowerCase().startsWith(searchPrefix.toLowerCase())) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
const fullPath = join(searchDir, entry);
|
||||||
|
const isDirectory = statSync(fullPath).isDirectory();
|
||||||
|
|
||||||
|
// For @ prefix, filter to only show directories and attachable files
|
||||||
|
if (isAtPrefix && !isDirectory && !isAttachableFile(fullPath)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
let relativePath: string;
|
||||||
|
|
||||||
|
// Handle @ prefix path construction
|
||||||
|
if (isAtPrefix) {
|
||||||
|
const pathWithoutAt = expandedPrefix;
|
||||||
|
if (pathWithoutAt.endsWith("/")) {
|
||||||
|
relativePath = "@" + pathWithoutAt + entry;
|
||||||
|
} else if (pathWithoutAt.includes("/")) {
|
||||||
|
if (pathWithoutAt.startsWith("~/")) {
|
||||||
|
const homeRelativeDir = pathWithoutAt.slice(2); // Remove ~/
|
||||||
|
const dir = dirname(homeRelativeDir);
|
||||||
|
relativePath = "@~/" + (dir === "." ? entry : join(dir, entry));
|
||||||
|
} else {
|
||||||
|
relativePath = "@" + join(dirname(pathWithoutAt), entry);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
if (pathWithoutAt.startsWith("~")) {
|
||||||
|
relativePath = "@~/" + entry;
|
||||||
|
} else {
|
||||||
|
relativePath = "@" + entry;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else if (prefix.endsWith("/")) {
|
||||||
|
// If prefix ends with /, append entry to the prefix
|
||||||
|
relativePath = prefix + entry;
|
||||||
|
} else if (prefix.includes("/")) {
|
||||||
|
// Preserve ~/ format for home directory paths
|
||||||
|
if (prefix.startsWith("~/")) {
|
||||||
|
const homeRelativeDir = prefix.slice(2); // Remove ~/
|
||||||
|
const dir = dirname(homeRelativeDir);
|
||||||
|
relativePath = "~/" + (dir === "." ? entry : join(dir, entry));
|
||||||
|
} else {
|
||||||
|
relativePath = join(dirname(prefix), entry);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// For standalone entries, preserve ~/ if original prefix was ~/
|
||||||
|
if (prefix.startsWith("~")) {
|
||||||
|
relativePath = "~/" + entry;
|
||||||
|
} else {
|
||||||
|
relativePath = entry;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
suggestions.push({
|
||||||
|
value: isDirectory ? relativePath + "/" : relativePath,
|
||||||
|
label: entry,
|
||||||
|
description: isDirectory ? "directory" : "file",
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sort directories first, then alphabetically
|
||||||
|
suggestions.sort((a, b) => {
|
||||||
|
const aIsDir = a.description === "directory";
|
||||||
|
const bIsDir = b.description === "directory";
|
||||||
|
if (aIsDir && !bIsDir) return -1;
|
||||||
|
if (!aIsDir && bIsDir) return 1;
|
||||||
|
return a.label.localeCompare(b.label);
|
||||||
|
});
|
||||||
|
|
||||||
|
logger.debug("CombinedAutocompleteProvider", "Returning suggestions", {
|
||||||
|
count: suggestions.length,
|
||||||
|
firstFew: suggestions.slice(0, 3).map((s) => s.label),
|
||||||
|
});
|
||||||
|
|
||||||
|
return suggestions.slice(0, 10); // Limit to 10 suggestions
|
||||||
|
} catch (e) {
|
||||||
|
// Directory doesn't exist or not accessible
|
||||||
|
logger.error("CombinedAutocompleteProvider", "Error reading directory", {
|
||||||
|
error: e instanceof Error ? e.message : String(e),
|
||||||
|
});
|
||||||
|
return [];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Force file completion (called on Tab key) - always returns suggestions
|
||||||
|
getForceFileSuggestions(
|
||||||
|
lines: string[],
|
||||||
|
cursorLine: number,
|
||||||
|
cursorCol: number,
|
||||||
|
): { items: AutocompleteItem[]; prefix: string } | null {
|
||||||
|
logger.debug("CombinedAutocompleteProvider", "getForceFileSuggestions called", {
|
||||||
|
lines,
|
||||||
|
cursorLine,
|
||||||
|
cursorCol,
|
||||||
|
});
|
||||||
|
|
||||||
|
const currentLine = lines[cursorLine] || "";
|
||||||
|
const textBeforeCursor = currentLine.slice(0, cursorCol);
|
||||||
|
|
||||||
|
// Don't trigger if we're in a slash command
|
||||||
|
if (textBeforeCursor.startsWith("/") && !textBeforeCursor.includes(" ")) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Force extract path prefix - this will always return something
|
||||||
|
const pathMatch = this.extractPathPrefix(textBeforeCursor, true);
|
||||||
|
logger.debug("CombinedAutocompleteProvider", "Forced path match", {
|
||||||
|
textBeforeCursor,
|
||||||
|
pathMatch,
|
||||||
|
});
|
||||||
|
|
||||||
|
if (pathMatch !== null) {
|
||||||
|
const suggestions = this.getFileSuggestions(pathMatch);
|
||||||
|
if (suggestions.length === 0) return null;
|
||||||
|
|
||||||
|
return {
|
||||||
|
items: suggestions,
|
||||||
|
prefix: pathMatch,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if we should trigger file completion (called on Tab key)
|
||||||
|
shouldTriggerFileCompletion(lines: string[], cursorLine: number, cursorCol: number): boolean {
|
||||||
|
const currentLine = lines[cursorLine] || "";
|
||||||
|
const textBeforeCursor = currentLine.slice(0, cursorCol);
|
||||||
|
|
||||||
|
// Don't trigger if we're in a slash command
|
||||||
|
if (textBeforeCursor.startsWith("/") && !textBeforeCursor.includes(" ")) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
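The path-matching regex is the subtle part of `extractPathPrefix` above. A minimal standalone sketch of the same pattern, assuming a natural (non-Tab) trigger; the `matchPathPrefix` helper is invented here for illustration, and the real method additionally handles `@` attachments, `~` expansion, and the `forceExtract` fallback:

```typescript
// Hypothetical standalone version of the natural-trigger path match.
// The capture group grabs the token at the end of the line, starting after
// the line start or a space/quote/equals delimiter.
const PATH_RE = /(?:^|[\s"'=])((?:~\/|\.{0,2}\/?)?(?:[^\s"'=]*\/?)*[^\s"'=]*)$/;

function matchPathPrefix(text: string): string | null {
	const m = text.match(PATH_RE);
	const p = m?.[1] ?? "";
	// Only treat the token as a path if it looks like one.
	return p.includes("/") || p.startsWith(".") ? p : null;
}

console.log(matchPathPrefix("open src/tui")); // "src/tui"
console.log(matchPathPrefix("see ./README")); // "./README"
console.log(matchPathPrefix("plain word"));   // null
```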
29
packages/tui/src/index.ts
Normal file

@@ -0,0 +1,29 @@
// Core TUI interfaces and classes

// Autocomplete support
export {
	type AutocompleteItem,
	type AutocompleteProvider,
	CombinedAutocompleteProvider,
	type SlashCommand,
} from "./autocomplete.js";
// Logger for debugging
export { type LoggerConfig, logger } from "./logger.js";
// Markdown component
export { MarkdownComponent } from "./markdown-component.js";
// Select list component
export { type SelectItem, SelectList } from "./select-list.js";
// Text component
export { TextComponent } from "./text-component.js";
// Text editor component
export { TextEditor, type TextEditorConfig } from "./text-editor.js";
export {
	type Component,
	type ComponentRenderResult,
	Container,
	type ContainerRenderResult,
	type Padding,
	TUI,
} from "./tui.js";
// Whitespace component
export { WhitespaceComponent } from "./whitespace-component.js";
95
packages/tui/src/logger.ts
Normal file

@@ -0,0 +1,95 @@
import { appendFileSync, writeFileSync } from "fs";
import { join } from "path";

export interface LoggerConfig {
	enabled: boolean;
	logFile: string;
	logLevel: "debug" | "info" | "warn" | "error";
}

class Logger {
	private config: LoggerConfig = {
		enabled: false,
		logFile: join(process.cwd(), "tui-debug.log"),
		logLevel: "debug",
	};

	configure(config: Partial<LoggerConfig>): void {
		this.config = { ...this.config, ...config };

		if (this.config.enabled) {
			// Clear log file on startup
			try {
				writeFileSync(this.config.logFile, `=== TUI Debug Log Started ${new Date().toISOString()} ===\n`);
			} catch (error) {
				// Silently fail if we can't write to log file
			}
		}
	}

	private shouldLog(level: string): boolean {
		if (!this.config.enabled) return false;

		const levels = ["debug", "info", "warn", "error"];
		const currentLevel = levels.indexOf(this.config.logLevel);
		const messageLevel = levels.indexOf(level);

		return messageLevel >= currentLevel;
	}

	private log(level: string, component: string, message: string, data?: any): void {
		if (!this.shouldLog(level)) return;

		try {
			const timestamp = new Date().toISOString();
			const dataStr = data ? ` | Data: ${JSON.stringify(data)}` : "";
			const logLine = `[${timestamp}] ${level.toUpperCase()} [${component}] ${message}${dataStr}\n`;

			appendFileSync(this.config.logFile, logLine);
		} catch (error) {
			// Silently fail if we can't write to log file
		}
	}

	debug(component: string, message: string, data?: any): void {
		this.log("debug", component, message, data);
	}

	info(component: string, message: string, data?: any): void {
		this.log("info", component, message, data);
	}

	warn(component: string, message: string, data?: any): void {
		this.log("warn", component, message, data);
	}

	error(component: string, message: string, data?: any): void {
		this.log("error", component, message, data);
	}

	// Specific TUI logging methods
	keyInput(component: string, keyData: string): void {
		this.debug(component, "Key input received", {
			keyData,
			charCodes: Array.from(keyData).map((c) => c.charCodeAt(0)),
		});
	}

	render(component: string, renderResult: any): void {
		this.debug(component, "Render result", renderResult);
	}

	focus(component: string, focused: boolean): void {
		this.info(component, `Focus ${focused ? "gained" : "lost"}`);
	}

	componentLifecycle(component: string, action: string, details?: any): void {
		this.info(component, `Component ${action}`, details);
	}

	stateChange(component: string, property: string, oldValue: any, newValue: any): void {
		this.debug(component, `State change: ${property}`, { oldValue, newValue });
	}
}

export const logger = new Logger();
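The level filter in `shouldLog` works by position in the `levels` array: a message passes if its level sits at or above the configured threshold. A standalone sketch of just that comparison (the free function is invented here; the real check also requires `enabled` to be true):

```typescript
// Mirrors Logger.shouldLog's ordering: later entries in the array are
// more severe, and a message is logged when its index is >= the threshold's.
const levels = ["debug", "info", "warn", "error"];

function shouldLog(configured: string, message: string): boolean {
	return levels.indexOf(message) >= levels.indexOf(configured);
}

console.log(shouldLog("warn", "error")); // true  - error outranks warn
console.log(shouldLog("warn", "info"));  // false - info is below warn
```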
260
packages/tui/src/markdown-component.ts
Normal file

@@ -0,0 +1,260 @@
import chalk from "chalk";
import { marked, type Token } from "marked";
import type { Component, ComponentRenderResult } from "./tui.js";

export class MarkdownComponent implements Component {
	private text: string;
	private lines: string[] = [];
	private previousLines: string[] = [];

	constructor(text: string = "") {
		this.text = text;
	}

	setText(text: string): void {
		this.text = text;
	}

	render(width: number): ComponentRenderResult {
		// Parse markdown to HTML-like tokens
		const tokens = marked.lexer(this.text);

		// Convert tokens to styled terminal output
		const renderedLines: string[] = [];

		for (let i = 0; i < tokens.length; i++) {
			const token = tokens[i];
			const nextToken = tokens[i + 1];
			const tokenLines = this.renderToken(token, width, nextToken?.type);
			renderedLines.push(...tokenLines);
		}

		// Wrap lines to fit width
		const wrappedLines: string[] = [];
		for (const line of renderedLines) {
			wrappedLines.push(...this.wrapLine(line, width));
		}

		this.previousLines = this.lines;
		this.lines = wrappedLines;

		// Determine if content changed
		const changed =
			this.lines.length !== this.previousLines.length ||
			this.lines.some((line, i) => line !== this.previousLines[i]);

		return {
			lines: this.lines,
			changed,
		};
	}

	private renderToken(token: Token, width: number, nextTokenType?: string): string[] {
		const lines: string[] = [];

		switch (token.type) {
			case "heading": {
				const headingLevel = token.depth;
				const headingPrefix = "#".repeat(headingLevel) + " ";
				const headingText = this.renderInlineTokens(token.tokens || []);
				if (headingLevel === 1) {
					lines.push(chalk.bold.underline.yellow(headingText));
				} else if (headingLevel === 2) {
					lines.push(chalk.bold.yellow(headingText));
				} else {
					lines.push(chalk.bold(headingPrefix + headingText));
				}
				lines.push(""); // Add spacing after headings
				break;
			}

			case "paragraph": {
				const paragraphText = this.renderInlineTokens(token.tokens || []);
				lines.push(paragraphText);
				// Don't add spacing if the next token is a space or list
				if (nextTokenType && nextTokenType !== "list" && nextTokenType !== "space") {
					lines.push("");
				}
				break;
			}

			case "code": {
				lines.push(chalk.gray("```" + (token.lang || "")));
				// Split code by newlines and style each line
				const codeLines = token.text.split("\n");
				for (const codeLine of codeLines) {
					lines.push(chalk.dim("  ") + chalk.green(codeLine));
				}
				lines.push(chalk.gray("```"));
				lines.push(""); // Add spacing after code blocks
				break;
			}

			case "list":
				for (let i = 0; i < token.items.length; i++) {
					const item = token.items[i];
					const bullet = token.ordered ? `${i + 1}. ` : "- ";
					const itemText = this.renderInlineTokens(item.tokens || []);

					// Check if the item text contains multiple lines (embedded content)
					const itemLines = itemText.split("\n").filter((line) => line.trim());
					if (itemLines.length > 1) {
						// First line is the list item
						lines.push(chalk.cyan(bullet) + itemLines[0]);
						// Rest are treated as separate content
						for (let j = 1; j < itemLines.length; j++) {
							lines.push(""); // Add spacing
							lines.push(itemLines[j]);
						}
					} else {
						lines.push(chalk.cyan(bullet) + itemText);
					}
				}
				// Don't add spacing after lists if a space token follows
				// (the space token will handle it)
				break;

			case "blockquote": {
				const quoteText = this.renderInlineTokens(token.tokens || []);
				const quoteLines = quoteText.split("\n");
				for (const quoteLine of quoteLines) {
					lines.push(chalk.gray("│ ") + chalk.italic(quoteLine));
				}
				lines.push(""); // Add spacing after blockquotes
				break;
			}

			case "hr":
				lines.push(chalk.gray("─".repeat(Math.min(width, 80))));
				lines.push(""); // Add spacing after horizontal rules
				break;

			case "html":
				// Skip HTML for terminal output
				break;

			case "space":
				// Space tokens represent blank lines in markdown
				lines.push("");
				break;

			default:
				// Handle any other token types as plain text
				if ("text" in token && typeof token.text === "string") {
					lines.push(token.text);
				}
		}

		return lines;
	}

	private renderInlineTokens(tokens: Token[]): string {
		let result = "";

		for (const token of tokens) {
			switch (token.type) {
				case "text":
					// Text tokens in list items can have nested tokens for inline formatting
					if (token.tokens && token.tokens.length > 0) {
						result += this.renderInlineTokens(token.tokens);
					} else {
						result += token.text;
					}
					break;

				case "strong":
					result += chalk.bold(this.renderInlineTokens(token.tokens || []));
					break;

				case "em":
					result += chalk.italic(this.renderInlineTokens(token.tokens || []));
					break;

				case "codespan":
					result += chalk.gray("`") + chalk.cyan(token.text) + chalk.gray("`");
					break;

				case "link": {
					const linkText = this.renderInlineTokens(token.tokens || []);
					result += chalk.underline.blue(linkText) + chalk.gray(` (${token.href})`);
					break;
				}

				case "br":
					result += "\n";
					break;

				case "del":
					result += chalk.strikethrough(this.renderInlineTokens(token.tokens || []));
					break;

				default:
					// Handle any other inline token types as plain text
					if ("text" in token && typeof token.text === "string") {
						result += token.text;
					}
			}
		}

		return result;
	}

	private wrapLine(line: string, width: number): string[] {
		// Handle ANSI escape codes properly when wrapping
		const wrapped: string[] = [];

		// Handle undefined or null lines
		if (!line) {
			return [""];
		}

		// If line fits within width, return as-is
		const visibleLength = this.getVisibleLength(line);
		if (visibleLength <= width) {
			return [line];
		}

		// Need to wrap - this is complex with ANSI codes
		// For now, use a simple approach that may break styling at wrap points
		let currentLine = "";
		let currentLength = 0;
		let i = 0;

		while (i < line.length) {
			if (line[i] === "\x1b" && line[i + 1] === "[") {
				// ANSI escape sequence - include it without counting length
				let j = i + 2;
				while (j < line.length && line[j] && !/[mGKHJ]/.test(line[j]!)) {
					j++;
				}
				if (j < line.length) {
					currentLine += line.substring(i, j + 1);
					i = j + 1;
				} else {
					break;
				}
			} else {
				// Regular character
				if (currentLength >= width) {
					wrapped.push(currentLine);
					currentLine = "";
					currentLength = 0;
				}
				currentLine += line[i];
				currentLength++;
				i++;
			}
		}

		if (currentLine) {
			wrapped.push(currentLine);
		}

		return wrapped.length > 0 ? wrapped : [""];
	}

	private getVisibleLength(str: string): number {
		// Remove ANSI escape codes and count visible characters
		return (str || "").replace(/\x1b\[[0-9;]*m/g, "").length;
	}
}
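The wrapping logic above hinges on measuring a line without its ANSI styling, as `getVisibleLength` does. A self-contained sketch of the same regex-strip approach (the free function is hypothetical; the method lives on `MarkdownComponent`):

```typescript
// Same idea as MarkdownComponent.getVisibleLength: strip SGR escape
// sequences (ESC [ ... m) before counting characters, so styled text
// wraps at its visible width rather than its raw string length.
function visibleLength(str: string): number {
	return str.replace(/\x1b\[[0-9;]*m/g, "").length;
}

const styled = "\x1b[1mbold\x1b[0m"; // "bold" wrapped in bold on/off codes
console.log(styled.length);         // 12 - raw length counts escape bytes
console.log(visibleLength(styled)); // 4  - only "bold" is visible
```

Note this only strips `m`-terminated (color/style) sequences; cursor-movement escapes would need a broader pattern.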
154
packages/tui/src/select-list.ts
Normal file

@@ -0,0 +1,154 @@
import chalk from "chalk";
import type { Component, ComponentRenderResult } from "./tui.js";

export interface SelectItem {
	value: string;
	label: string;
	description?: string;
}

export class SelectList implements Component {
	private items: SelectItem[] = [];
	private filteredItems: SelectItem[] = [];
	private selectedIndex: number = 0;
	private filter: string = "";
	private maxVisible: number = 5;

	public onSelect?: (item: SelectItem) => void;
	public onCancel?: () => void;

	constructor(items: SelectItem[], maxVisible: number = 5) {
		this.items = items;
		this.filteredItems = items;
		this.maxVisible = maxVisible;
	}

	setFilter(filter: string): void {
		this.filter = filter;
		this.filteredItems = this.items.filter((item) => item.value.toLowerCase().startsWith(filter.toLowerCase()));
		// Reset selection when filter changes
		this.selectedIndex = 0;
	}

	render(width: number): ComponentRenderResult {
		const lines: string[] = [];

		// If no items match filter, show message
		if (this.filteredItems.length === 0) {
			lines.push(chalk.gray("  No matching commands"));
			return { lines, changed: true };
		}

		// Calculate visible range with scrolling
		const startIndex = Math.max(
			0,
			Math.min(this.selectedIndex - Math.floor(this.maxVisible / 2), this.filteredItems.length - this.maxVisible),
		);
		const endIndex = Math.min(startIndex + this.maxVisible, this.filteredItems.length);

		// Render visible items
		for (let i = startIndex; i < endIndex; i++) {
			const item = this.filteredItems[i];
			if (!item) continue;

			const isSelected = i === this.selectedIndex;

			let line = "";
			if (isSelected) {
				// Use arrow indicator for selection
				const prefix = chalk.blue("→ ");
				const displayValue = item.label || item.value;

				if (item.description && width > 40) {
					// Calculate how much space we have for value + description
					const maxValueLength = Math.min(displayValue.length, 30);
					const truncatedValue = displayValue.substring(0, maxValueLength);
					const spacing = " ".repeat(Math.max(1, 32 - truncatedValue.length));

					// Calculate remaining space for description
					const descriptionStart = prefix.length + truncatedValue.length + spacing.length - 2; // -2 for arrow color codes
					const remainingWidth = width - descriptionStart - 2; // -2 for safety

					if (remainingWidth > 10) {
						const truncatedDesc = item.description.substring(0, remainingWidth);
						line = prefix + chalk.blue(truncatedValue) + chalk.gray(spacing + truncatedDesc);
					} else {
						// Not enough space for description
						const maxWidth = width - 4; // 2 for arrow + space, 2 for safety
						line = prefix + chalk.blue(displayValue.substring(0, maxWidth));
					}
				} else {
					// No description or not enough width
					const maxWidth = width - 4; // 2 for arrow + space, 2 for safety
					line = prefix + chalk.blue(displayValue.substring(0, maxWidth));
				}
			} else {
				const displayValue = item.label || item.value;
				const prefix = "  ";

				if (item.description && width > 40) {
					// Calculate how much space we have for value + description
					const maxValueLength = Math.min(displayValue.length, 30);
					const truncatedValue = displayValue.substring(0, maxValueLength);
					const spacing = " ".repeat(Math.max(1, 32 - truncatedValue.length));

					// Calculate remaining space for description
					const descriptionStart = prefix.length + truncatedValue.length + spacing.length;
					const remainingWidth = width - descriptionStart - 2; // -2 for safety

					if (remainingWidth > 10) {
						const truncatedDesc = item.description.substring(0, remainingWidth);
						line = prefix + truncatedValue + chalk.gray(spacing + truncatedDesc);
					} else {
						// Not enough space for description
						const maxWidth = width - prefix.length - 2;
						line = prefix + displayValue.substring(0, maxWidth);
					}
				} else {
					// No description or not enough width
					const maxWidth = width - prefix.length - 2;
					line = prefix + displayValue.substring(0, maxWidth);
				}
			}

			lines.push(line);
		}

		// Add scroll indicators if needed
		if (startIndex > 0 || endIndex < this.filteredItems.length) {
			const scrollInfo = chalk.gray(` (${this.selectedIndex + 1}/${this.filteredItems.length})`);
			lines.push(scrollInfo);
		}

		return { lines, changed: true };
	}

	handleInput(keyData: string): void {
		// Up arrow
		if (keyData === "\x1b[A") {
			this.selectedIndex = Math.max(0, this.selectedIndex - 1);
		}
		// Down arrow
		else if (keyData === "\x1b[B") {
			this.selectedIndex = Math.min(this.filteredItems.length - 1, this.selectedIndex + 1);
		}
		// Enter
		else if (keyData === "\r") {
			const selectedItem = this.filteredItems[this.selectedIndex];
			if (selectedItem && this.onSelect) {
				this.onSelect(selectedItem);
			}
		}
		// Escape
		else if (keyData === "\x1b") {
			if (this.onCancel) {
				this.onCancel();
			}
		}
	}

	getSelectedItem(): SelectItem | null {
		const item = this.filteredItems[this.selectedIndex];
		return item || null;
	}
}
||||||
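The two branches above share the same column math: the value is capped at 30 characters, padded out to column 32, and the description fills whatever width remains, with a 2-character safety margin. A hypothetical standalone sketch of that layout (`layoutRow` is not part of the package; the two-space prefix is an assumption matching the unselected branch, and colors are omitted):

```typescript
// Hypothetical distillation of SelectList's row layout (not part of the package).
function layoutRow(value: string, description: string, width: number): string {
  const prefix = "  "; // assumption: two-space prefix, as in the unselected branch
  const truncatedValue = value.substring(0, Math.min(value.length, 30));
  const spacing = " ".repeat(Math.max(1, 32 - truncatedValue.length));
  const remainingWidth = width - (prefix.length + truncatedValue.length + spacing.length) - 2;
  if (remainingWidth > 10) {
    // Value column plus as much of the description as fits.
    return prefix + truncatedValue + spacing + description.substring(0, remainingWidth);
  }
  // Not enough room: drop the description and hard-truncate the value.
  return prefix + value.substring(0, width - prefix.length - 2);
}
```

At a typical 80-column width the description column starts around column 34, which is why the `width > 40` guard is needed before attempting it at all.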
104
packages/tui/src/text-component.ts
Normal file
@@ -0,0 +1,104 @@
import type { Component, ComponentRenderResult, Padding } from "./tui.js";

export class TextComponent implements Component {
  private text: string;
  private lastRenderedLines: string[] = [];
  private padding: Required<Padding>;

  constructor(text: string, padding?: Padding) {
    this.text = text;
    this.padding = {
      top: padding?.top ?? 0,
      bottom: padding?.bottom ?? 0,
      left: padding?.left ?? 0,
      right: padding?.right ?? 0,
    };
  }

  render(width: number): ComponentRenderResult {
    // Calculate available width after horizontal padding
    const availableWidth = Math.max(1, width - this.padding.left - this.padding.right);
    const leftPadding = " ".repeat(this.padding.left);

    // First split by newlines to preserve line breaks
    const textLines = this.text.split("\n");
    const lines: string[] = [];

    // Add top padding
    for (let i = 0; i < this.padding.top; i++) {
      lines.push("");
    }

    // Process each line for word wrapping
    for (const textLine of textLines) {
      if (textLine.length === 0) {
        // Preserve empty lines with padding
        lines.push(leftPadding);
      } else {
        // Word wrapping with ANSI-aware length calculation
        const words = textLine.split(" ");
        let currentLine = "";
        let currentVisibleLength = 0;

        for (const word of words) {
          const wordVisibleLength = this.getVisibleLength(word);
          const spaceLength = currentLine ? 1 : 0;

          if (currentVisibleLength + spaceLength + wordVisibleLength <= availableWidth) {
            currentLine += (currentLine ? " " : "") + word;
            currentVisibleLength += spaceLength + wordVisibleLength;
          } else {
            if (currentLine) {
              lines.push(leftPadding + currentLine);
            }
            currentLine = word;
            currentVisibleLength = wordVisibleLength;
          }
        }

        if (currentLine) {
          lines.push(leftPadding + currentLine);
        }
      }
    }

    // Add bottom padding
    for (let i = 0; i < this.padding.bottom; i++) {
      lines.push("");
    }

    const newLines = lines.length > 0 ? lines : [""];

    // Check if content changed
    const changed = !this.arraysEqual(newLines, this.lastRenderedLines);

    // Always cache the current rendered lines
    this.lastRenderedLines = [...newLines];

    return {
      lines: newLines,
      changed,
    };
  }

  setText(text: string): void {
    this.text = text;
  }

  getText(): string {
    return this.text;
  }

  private arraysEqual(a: string[], b: string[]): boolean {
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) return false;
    }
    return true;
  }

  private getVisibleLength(str: string): number {
    // Remove ANSI escape codes and count visible characters
    return (str || "").replace(/\x1b\[[0-9;]*m/g, "").length;
  }
}
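The wrap loop in `TextComponent.render` measures words by visible width, so chalk's SGR escape sequences (e.g. `\x1b[31m`) contribute zero columns. A minimal standalone sketch of the same greedy word wrap (`wrapLine` and `visibleLength` are illustrative names, not exports of the package):

```typescript
// Illustrative sketch of TextComponent's ANSI-aware greedy word wrap.
const visibleLength = (s: string): number => s.replace(/\x1b\[[0-9;]*m/g, "").length;

function wrapLine(textLine: string, availableWidth: number): string[] {
  const lines: string[] = [];
  let current = "";
  let currentLen = 0;
  for (const word of textLine.split(" ")) {
    const wordLen = visibleLength(word);
    const space = current ? 1 : 0; // a joining space only when the line is non-empty
    if (currentLen + space + wordLen <= availableWidth) {
      current += (current ? " " : "") + word;
      currentLen += space + wordLen;
    } else {
      if (current) lines.push(current);
      current = word; // word starts the next line (even if longer than the width)
      currentLen = wordLen;
    }
  }
  if (current) lines.push(current);
  return lines;
}
```

Note the design trade-off: a single word longer than `availableWidth` is emitted unbroken, so overlong tokens (URLs, hashes) can still overflow the box.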
802
packages/tui/src/text-editor.ts
Normal file
@@ -0,0 +1,802 @@
import chalk from "chalk";
import type { AutocompleteProvider, CombinedAutocompleteProvider } from "./autocomplete.js";
import { logger } from "./logger.js";
import { SelectList } from "./select-list.js";
import type { Component, ComponentRenderResult } from "./tui.js";

interface EditorState {
  lines: string[];
  cursorLine: number;
  cursorCol: number;
}

interface LayoutLine {
  text: string;
  hasCursor: boolean;
  cursorPos?: number;
}

export interface TextEditorConfig {
  // Configuration options for text editor (none currently)
}

export class TextEditor implements Component {
  private state: EditorState = {
    lines: [""],
    cursorLine: 0,
    cursorCol: 0,
  };

  private config: TextEditorConfig = {};

  // Autocomplete support
  private autocompleteProvider?: AutocompleteProvider;
  private autocompleteList?: SelectList;
  private isAutocompleting: boolean = false;
  private autocompletePrefix: string = "";

  public onSubmit?: (text: string) => void;
  public onChange?: (text: string) => void;
  public disableSubmit: boolean = false;

  constructor(config?: TextEditorConfig) {
    if (config) {
      this.config = { ...this.config, ...config };
    }
    logger.componentLifecycle("TextEditor", "created", { config: this.config });
  }

  configure(config: Partial<TextEditorConfig>): void {
    this.config = { ...this.config, ...config };
    logger.info("TextEditor", "Configuration updated", { config: this.config });
  }

  setAutocompleteProvider(provider: AutocompleteProvider): void {
    this.autocompleteProvider = provider;
  }

  render(width: number): ComponentRenderResult {
    // Box drawing characters
    const topLeft = chalk.gray("╭");
    const topRight = chalk.gray("╮");
    const bottomLeft = chalk.gray("╰");
    const bottomRight = chalk.gray("╯");
    const horizontal = chalk.gray("─");
    const vertical = chalk.gray("│");

    // Calculate box width (leave some margin)
    const boxWidth = width - 1;
    const contentWidth = boxWidth - 4; // Account for "│ " and " │"

    // Layout the text
    const layoutLines = this.layoutText(contentWidth);

    const result: string[] = [];

    // Render top border
    result.push(topLeft + horizontal.repeat(boxWidth - 2) + topRight);

    // Render each layout line
    for (const layoutLine of layoutLines) {
      let displayText = layoutLine.text;
      let visibleLength = layoutLine.text.length;

      // Add cursor if this line has it
      if (layoutLine.hasCursor && layoutLine.cursorPos !== undefined) {
        const before = displayText.slice(0, layoutLine.cursorPos);
        const after = displayText.slice(layoutLine.cursorPos);

        if (after.length > 0) {
          // Cursor is on a character - replace it with highlighted version
          const cursor = `\x1b[7m${after[0]}\x1b[0m`;
          const restAfter = after.slice(1);
          displayText = before + cursor + restAfter;
          // visibleLength stays the same - we're replacing, not adding
        } else {
          // Cursor is at the end - add highlighted space
          const cursor = "\x1b[7m \x1b[0m";
          displayText = before + cursor;
          // visibleLength increases by 1 - we're adding a space
          visibleLength = layoutLine.text.length + 1;
        }
      }

      // Calculate padding based on actual visible length
      const padding = " ".repeat(Math.max(0, contentWidth - visibleLength));

      // Render the line
      result.push(`${vertical} ${displayText}${padding} ${vertical}`);
    }

    // Render bottom border
    result.push(bottomLeft + horizontal.repeat(boxWidth - 2) + bottomRight);

    // Add autocomplete list if active
    if (this.isAutocompleting && this.autocompleteList) {
      const autocompleteResult = this.autocompleteList.render(width);
      result.push(...autocompleteResult.lines);
    }

    // For interactive components like text editors, always assume changed
    // This ensures cursor position updates are always reflected
    return {
      lines: result,
      changed: true,
    };
  }

  handleInput(data: string): void {
    logger.keyInput("TextEditor", data);
    logger.debug("TextEditor", "Current state before input", {
      lines: this.state.lines,
      cursorLine: this.state.cursorLine,
      cursorCol: this.state.cursorCol,
    });

    // Handle special key combinations first

    // Ctrl+C - Exit (let parent handle this)
    if (data.charCodeAt(0) === 3) {
      logger.debug("TextEditor", "Ctrl+C received, returning to parent");
      return;
    }

    // Handle paste - detect when we get a lot of text at once
    const isPaste = data.length > 10 || (data.length > 2 && data.includes("\n"));
    logger.debug("TextEditor", "Paste detection", {
      dataLength: data.length,
      includesNewline: data.includes("\n"),
      includesTabs: data.includes("\t"),
      tabCount: (data.match(/\t/g) || []).length,
      isPaste,
      data: JSON.stringify(data),
      charCodes: Array.from(data).map((c) => c.charCodeAt(0)),
    });

    if (isPaste) {
      logger.info("TextEditor", "Handling as paste");
      this.handlePaste(data);
      return;
    }

    // Handle autocomplete special keys first (but don't block other input)
    if (this.isAutocompleting && this.autocompleteList) {
      logger.debug("TextEditor", "Autocomplete active, handling input", {
        data,
        charCode: data.charCodeAt(0),
        isEscape: data === "\x1b",
        isArrowOrEnter: data === "\x1b[A" || data === "\x1b[B" || data === "\r",
      });

      // Escape - cancel autocomplete
      if (data === "\x1b") {
        this.cancelAutocomplete();
        return;
      }
      // Let the autocomplete list handle navigation and selection
      else if (data === "\x1b[A" || data === "\x1b[B" || data === "\r" || data === "\t") {
        // Only pass arrow keys to the list, not Enter/Tab (we handle those directly)
        if (data === "\x1b[A" || data === "\x1b[B") {
          this.autocompleteList.handleInput(data);
        }

        // If Tab was pressed, apply the selection
        if (data === "\t") {
          const selected = this.autocompleteList.getSelectedItem();
          if (selected && this.autocompleteProvider) {
            const result = this.autocompleteProvider.applyCompletion(
              this.state.lines,
              this.state.cursorLine,
              this.state.cursorCol,
              selected,
              this.autocompletePrefix,
            );

            this.state.lines = result.lines;
            this.state.cursorLine = result.cursorLine;
            this.state.cursorCol = result.cursorCol;

            this.cancelAutocomplete();

            if (this.onChange) {
              this.onChange(this.getText());
            }
          }
          return;
        }
        // If Enter was pressed, cancel autocomplete and let it fall through to submission
        else if (data === "\r") {
          this.cancelAutocomplete();
          // Don't return here - let Enter fall through to normal submission handling
        } else {
          // For other keys, handle normally within autocomplete
          return;
        }
      }
      // For other keys (like regular typing), DON'T return here
      // Let them fall through to normal character handling
      logger.debug("TextEditor", "Autocomplete active but falling through to normal handling");
    }

    // Tab key - context-aware completion (but not when already autocompleting)
    if (data === "\t" && !this.isAutocompleting) {
      logger.debug("TextEditor", "Tab key pressed, determining context", {
        isAutocompleting: this.isAutocompleting,
        hasProvider: !!this.autocompleteProvider,
      });
      this.handleTabCompletion();
      return;
    }

    // Continue with rest of input handling
    // Ctrl+K - Delete current line
    if (data.charCodeAt(0) === 11) {
      this.deleteCurrentLine();
    }
    // Ctrl+A - Move to start of line
    else if (data.charCodeAt(0) === 1) {
      this.moveToLineStart();
    }
    // Ctrl+E - Move to end of line
    else if (data.charCodeAt(0) === 5) {
      this.moveToLineEnd();
    }
    // New line shortcuts (but not plain LF/CR which should be submit)
    else if (
      (data.charCodeAt(0) === 10 && data.length > 1) || // Ctrl+Enter with modifiers
      data === "\x1b\r" || // Option+Enter in some terminals
      data === "\x1b[13;2~" || // Shift+Enter in some terminals
      (data.length > 1 && data.includes("\x1b") && data.includes("\r")) ||
      (data === "\n" && data.length === 1) || // Shift+Enter from iTerm2 mapping
      data === "\\\r" // Shift+Enter in VS Code terminal
    ) {
      // Modifier + Enter = new line
      this.addNewLine();
    }
    // Plain Enter (char code 13 for CR) - only CR submits, LF adds new line
    else if (data.charCodeAt(0) === 13 && data.length === 1) {
      // If submit is disabled, do nothing
      if (this.disableSubmit) {
        return;
      }

      // Plain Enter = submit
      const result = this.state.lines.join("\n").trim();
      logger.info("TextEditor", "Submit triggered", {
        result,
        rawResult: JSON.stringify(this.state.lines.join("\n")),
        lines: this.state.lines,
        resultLines: result.split("\n"),
      });

      // Reset editor
      this.state = {
        lines: [""],
        cursorLine: 0,
        cursorCol: 0,
      };

      // Notify that editor is now empty
      if (this.onChange) {
        this.onChange("");
      }

      if (this.onSubmit) {
        logger.info("TextEditor", "Calling onSubmit callback", { result });
        this.onSubmit(result);
      } else {
        logger.warn("TextEditor", "No onSubmit callback set");
      }
    }
    // Backspace
    else if (data.charCodeAt(0) === 127 || data.charCodeAt(0) === 8) {
      this.handleBackspace();
    }
    // Line navigation shortcuts (Home/End keys)
    else if (data === "\x1b[H" || data === "\x1b[1~" || data === "\x1b[7~") {
      // Home key
      this.moveToLineStart();
    } else if (data === "\x1b[F" || data === "\x1b[4~" || data === "\x1b[8~") {
      // End key
      this.moveToLineEnd();
    }
    // Forward delete (Fn+Backspace or Delete key)
    else if (data === "\x1b[3~") {
      // Delete key
      this.handleForwardDelete();
    }
    // Arrow keys
    else if (data === "\x1b[A") {
      // Up
      this.moveCursor(-1, 0);
    } else if (data === "\x1b[B") {
      // Down
      this.moveCursor(1, 0);
    } else if (data === "\x1b[C") {
      // Right
      this.moveCursor(0, 1);
    } else if (data === "\x1b[D") {
      // Left
      this.moveCursor(0, -1);
    }
    // Regular characters (printable ASCII)
    else if (data.charCodeAt(0) >= 32 && data.charCodeAt(0) <= 126) {
      logger.debug("TextEditor", "Inserting character", { char: data, charCode: data.charCodeAt(0) });
      this.insertCharacter(data);
    } else {
      logger.warn("TextEditor", "Unhandled input", {
        data,
        charCodes: Array.from(data).map((c) => c.charCodeAt(0)),
      });
    }
  }

  private layoutText(contentWidth: number): LayoutLine[] {
    const layoutLines: LayoutLine[] = [];

    if (this.state.lines.length === 0 || (this.state.lines.length === 1 && this.state.lines[0] === "")) {
      // Empty editor
      layoutLines.push({
        text: "> ",
        hasCursor: true,
        cursorPos: 2,
      });
      return layoutLines;
    }

    // Process each logical line
    for (let i = 0; i < this.state.lines.length; i++) {
      const line = this.state.lines[i] || "";
      const isCurrentLine = i === this.state.cursorLine;
      const prefix = i === 0 ? "> " : "  ";
      const prefixedLine = prefix + line;
      const maxLineLength = contentWidth;

      if (prefixedLine.length <= maxLineLength) {
        // Line fits in one layout line
        if (isCurrentLine) {
          layoutLines.push({
            text: prefixedLine,
            hasCursor: true,
            cursorPos: prefix.length + this.state.cursorCol,
          });
        } else {
          layoutLines.push({
            text: prefixedLine,
            hasCursor: false,
          });
        }
      } else {
        // Line needs wrapping
        const chunks = [];
        for (let pos = 0; pos < prefixedLine.length; pos += maxLineLength) {
          chunks.push(prefixedLine.slice(pos, pos + maxLineLength));
        }

        for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
          const chunk = chunks[chunkIndex];
          if (!chunk) continue;

          const chunkStart = chunkIndex * maxLineLength;
          const chunkEnd = chunkStart + chunk.length;
          const cursorPos = prefix.length + this.state.cursorCol;
          const hasCursorInChunk = isCurrentLine && cursorPos >= chunkStart && cursorPos < chunkEnd;

          if (hasCursorInChunk) {
            layoutLines.push({
              text: chunk,
              hasCursor: true,
              cursorPos: cursorPos - chunkStart,
            });
          } else {
            layoutLines.push({
              text: chunk,
              hasCursor: false,
            });
          }
        }
      }
    }

    return layoutLines;
  }

  getText(): string {
    return this.state.lines.join("\n");
  }

  setText(text: string): void {
    // Split text into lines, handling different line endings
    const lines = text.replace(/\r\n/g, "\n").replace(/\r/g, "\n").split("\n");

    // Ensure at least one empty line
    this.state.lines = lines.length === 0 ? [""] : lines;

    // Reset cursor to end of text
    this.state.cursorLine = this.state.lines.length - 1;
    this.state.cursorCol = this.state.lines[this.state.cursorLine]?.length || 0;

    // Notify of change
    if (this.onChange) {
      this.onChange(this.getText());
    }
  }

  // All the editor methods from before...
  private insertCharacter(char: string): void {
    const line = this.state.lines[this.state.cursorLine] || "";

    const before = line.slice(0, this.state.cursorCol);
    const after = line.slice(this.state.cursorCol);

    this.state.lines[this.state.cursorLine] = before + char + after;
    this.state.cursorCol += char.length; // Fix: increment by the length of the inserted string

    if (this.onChange) {
      this.onChange(this.getText());
    }

    // Check if we should trigger or update autocomplete
    if (!this.isAutocompleting) {
      // Auto-trigger for "/" at the start of a line (slash commands)
      if (char === "/" && this.isAtStartOfMessage()) {
        this.tryTriggerAutocomplete();
      }
      // Also auto-trigger when typing letters in a slash command context
      else if (/[a-zA-Z0-9]/.test(char)) {
        const currentLine = this.state.lines[this.state.cursorLine] || "";
        const textBeforeCursor = currentLine.slice(0, this.state.cursorCol);
        // Check if we're in a slash command with a space (i.e., typing arguments)
        if (textBeforeCursor.startsWith("/") && textBeforeCursor.includes(" ")) {
          this.tryTriggerAutocomplete();
        }
      }
    } else {
      this.updateAutocomplete();
    }
  }

  private handlePaste(pastedText: string): void {
    logger.debug("TextEditor", "Processing paste", {
      pastedText: JSON.stringify(pastedText),
      hasTab: pastedText.includes("\t"),
      tabCount: (pastedText.match(/\t/g) || []).length,
    });

    // Clean the pasted text
    const cleanText = pastedText.replace(/\r\n/g, "\n").replace(/\r/g, "\n");

    // Convert tabs to spaces (4 spaces per tab)
    const tabExpandedText = cleanText.replace(/\t/g, "    ");

    // Filter out non-printable characters except newlines
    const filteredText = tabExpandedText
      .split("")
      .filter((char) => char === "\n" || (char >= " " && char <= "~"))
      .join("");

    // Split into lines
    const pastedLines = filteredText.split("\n");

    if (pastedLines.length === 1) {
      // Single line - just insert each character
      const text = pastedLines[0] || "";
      for (const char of text) {
        this.insertCharacter(char);
      }

      return;
    }

    // Multi-line paste - be very careful with array manipulation
    const currentLine = this.state.lines[this.state.cursorLine] || "";
    const beforeCursor = currentLine.slice(0, this.state.cursorCol);
    const afterCursor = currentLine.slice(this.state.cursorCol);

    // Build the new lines array step by step
    const newLines: string[] = [];

    // Add all lines before current line
    for (let i = 0; i < this.state.cursorLine; i++) {
      newLines.push(this.state.lines[i] || "");
    }

    // Add the first pasted line merged with before cursor text
    newLines.push(beforeCursor + (pastedLines[0] || ""));

    // Add all middle pasted lines
    for (let i = 1; i < pastedLines.length - 1; i++) {
      newLines.push(pastedLines[i] || "");
    }

    // Add the last pasted line with after cursor text
    newLines.push((pastedLines[pastedLines.length - 1] || "") + afterCursor);

    // Add all lines after current line
    for (let i = this.state.cursorLine + 1; i < this.state.lines.length; i++) {
      newLines.push(this.state.lines[i] || "");
    }

    // Replace the entire lines array
    this.state.lines = newLines;

    // Update cursor position to end of pasted content
    this.state.cursorLine += pastedLines.length - 1;
    this.state.cursorCol = (pastedLines[pastedLines.length - 1] || "").length;

    // Notify of change
    if (this.onChange) {
      this.onChange(this.getText());
    }
  }

  private addNewLine(): void {
    const currentLine = this.state.lines[this.state.cursorLine] || "";

    const before = currentLine.slice(0, this.state.cursorCol);
    const after = currentLine.slice(this.state.cursorCol);

    // Split current line
    this.state.lines[this.state.cursorLine] = before;
    this.state.lines.splice(this.state.cursorLine + 1, 0, after);

    // Move cursor to start of new line
    this.state.cursorLine++;
    this.state.cursorCol = 0;

    if (this.onChange) {
      this.onChange(this.getText());
    }
  }

  private handleBackspace(): void {
    if (this.state.cursorCol > 0) {
      // Delete character in current line
      const line = this.state.lines[this.state.cursorLine] || "";

      const before = line.slice(0, this.state.cursorCol - 1);
      const after = line.slice(this.state.cursorCol);

      this.state.lines[this.state.cursorLine] = before + after;
      this.state.cursorCol--;
    } else if (this.state.cursorLine > 0) {
      // Merge with previous line
      const currentLine = this.state.lines[this.state.cursorLine] || "";
      const previousLine = this.state.lines[this.state.cursorLine - 1] || "";

      this.state.lines[this.state.cursorLine - 1] = previousLine + currentLine;
      this.state.lines.splice(this.state.cursorLine, 1);

      this.state.cursorLine--;
      this.state.cursorCol = previousLine.length;
    }

    if (this.onChange) {
      this.onChange(this.getText());
    }

    // Update autocomplete after backspace
    if (this.isAutocompleting) {
      this.updateAutocomplete();
    }
  }

  private moveToLineStart(): void {
    this.state.cursorCol = 0;
  }

  private moveToLineEnd(): void {
    const currentLine = this.state.lines[this.state.cursorLine] || "";
    this.state.cursorCol = currentLine.length;
  }

  private handleForwardDelete(): void {
    const currentLine = this.state.lines[this.state.cursorLine] || "";

    if (this.state.cursorCol < currentLine.length) {
      // Delete character at cursor position (forward delete)
      const before = currentLine.slice(0, this.state.cursorCol);
      const after = currentLine.slice(this.state.cursorCol + 1);
      this.state.lines[this.state.cursorLine] = before + after;
    } else if (this.state.cursorLine < this.state.lines.length - 1) {
      // At end of line - merge with next line
      const nextLine = this.state.lines[this.state.cursorLine + 1] || "";
      this.state.lines[this.state.cursorLine] = currentLine + nextLine;
      this.state.lines.splice(this.state.cursorLine + 1, 1);
    }

    if (this.onChange) {
      this.onChange(this.getText());
    }
  }

  private deleteCurrentLine(): void {
    if (this.state.lines.length === 1) {
      // Only one line - just clear it
      this.state.lines[0] = "";
      this.state.cursorCol = 0;
    } else {
      // Multiple lines - remove current line
      this.state.lines.splice(this.state.cursorLine, 1);

      // Adjust cursor position
      if (this.state.cursorLine >= this.state.lines.length) {
        // Was on last line, move to new last line
        this.state.cursorLine = this.state.lines.length - 1;
      }

      // Clamp cursor column to new line length
      const newLine = this.state.lines[this.state.cursorLine] || "";
      this.state.cursorCol = Math.min(this.state.cursorCol, newLine.length);
    }

    if (this.onChange) {
      this.onChange(this.getText());
    }
  }

  private moveCursor(deltaLine: number, deltaCol: number): void {
    if (deltaLine !== 0) {
      const newLine = this.state.cursorLine + deltaLine;
      if (newLine >= 0 && newLine < this.state.lines.length) {
        this.state.cursorLine = newLine;
        // Clamp cursor column to new line length
        const line = this.state.lines[this.state.cursorLine] || "";
        this.state.cursorCol = Math.min(this.state.cursorCol, line.length);
      }
    }

    if (deltaCol !== 0) {
      // Move column
      const newCol = this.state.cursorCol + deltaCol;
      const currentLine = this.state.lines[this.state.cursorLine] || "";
      const maxCol = currentLine.length;
      this.state.cursorCol = Math.max(0, Math.min(maxCol, newCol));
    }
  }

  // Helper method to check if cursor is at start of message (for slash command detection)
  private isAtStartOfMessage(): boolean {
    const currentLine = this.state.lines[this.state.cursorLine] || "";
    const beforeCursor = currentLine.slice(0, this.state.cursorCol);

    // At start if line is empty, only contains whitespace, or is just "/"
    return beforeCursor.trim() === "" || beforeCursor.trim() === "/";
  }

  // Autocomplete methods
  private tryTriggerAutocomplete(explicitTab: boolean = false): void {
    logger.debug("TextEditor", "tryTriggerAutocomplete called", {
      explicitTab,
      hasProvider: !!this.autocompleteProvider,
    });

    if (!this.autocompleteProvider) return;

    // Check if we should trigger file completion on Tab
    if (explicitTab) {
      const provider = this.autocompleteProvider as CombinedAutocompleteProvider;
      const shouldTrigger =
        !provider.shouldTriggerFileCompletion ||
        provider.shouldTriggerFileCompletion(this.state.lines, this.state.cursorLine, this.state.cursorCol);

      logger.debug("TextEditor", "Tab file completion check", {
        hasShouldTriggerMethod: !!provider.shouldTriggerFileCompletion,
        shouldTrigger,
        lines: this.state.lines,
        cursorLine: this.state.cursorLine,
        cursorCol: this.state.cursorCol,
      });

      if (!shouldTrigger) {
        return;
      }
    }

    const suggestions = this.autocompleteProvider.getSuggestions(
      this.state.lines,
      this.state.cursorLine,
      this.state.cursorCol,
    );

    logger.debug("TextEditor", "Autocomplete suggestions", {
      hasSuggestions: !!suggestions,
      itemCount: suggestions?.items.length || 0,
      prefix: suggestions?.prefix,
    });

    if (suggestions && suggestions.items.length > 0) {
      this.autocompletePrefix = suggestions.prefix;
      this.autocompleteList = new SelectList(suggestions.items, 5);
      this.isAutocompleting = true;
    } else {
      this.cancelAutocomplete();
    }
  }
|
|
||||||
|
private handleTabCompletion(): void {
|
||||||
|
if (!this.autocompleteProvider) return;
|
||||||
|
|
||||||
|
const currentLine = this.state.lines[this.state.cursorLine] || "";
|
||||||
|
const beforeCursor = currentLine.slice(0, this.state.cursorCol);
|
||||||
|
|
||||||
|
// Check if we're in a slash command context
|
||||||
|
if (beforeCursor.trimStart().startsWith("/")) {
|
||||||
|
logger.debug("TextEditor", "Tab in slash command context", { beforeCursor });
|
||||||
|
this.handleSlashCommandCompletion();
|
||||||
|
} else {
|
||||||
|
logger.debug("TextEditor", "Tab in file completion context", { beforeCursor });
|
||||||
|
this.forceFileAutocomplete();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private handleSlashCommandCompletion(): void {
|
||||||
|
// For now, fall back to regular autocomplete (slash commands)
|
||||||
|
// This can be extended later to handle command-specific argument completion
|
||||||
|
logger.debug("TextEditor", "Handling slash command completion");
|
||||||
|
this.tryTriggerAutocomplete(true);
|
||||||
|
}
|
||||||
|
|
||||||
|
private forceFileAutocomplete(): void {
|
||||||
|
logger.debug("TextEditor", "forceFileAutocomplete called", {
|
||||||
|
hasProvider: !!this.autocompleteProvider,
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!this.autocompleteProvider) return;
|
||||||
|
|
||||||
|
// Check if provider has the force method
|
||||||
|
const provider = this.autocompleteProvider as any;
|
||||||
|
if (!provider.getForceFileSuggestions) {
|
||||||
|
logger.debug("TextEditor", "Provider doesn't support forced file completion, falling back to regular");
|
||||||
|
this.tryTriggerAutocomplete(true);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const suggestions = provider.getForceFileSuggestions(
|
||||||
|
this.state.lines,
|
||||||
|
this.state.cursorLine,
|
||||||
|
this.state.cursorCol,
|
||||||
|
);
|
||||||
|
|
||||||
|
logger.debug("TextEditor", "Forced file autocomplete suggestions", {
|
||||||
|
hasSuggestions: !!suggestions,
|
||||||
|
itemCount: suggestions?.items.length || 0,
|
||||||
|
prefix: suggestions?.prefix,
|
||||||
|
});
|
||||||
|
|
||||||
|
if (suggestions && suggestions.items.length > 0) {
|
||||||
|
this.autocompletePrefix = suggestions.prefix;
|
||||||
|
this.autocompleteList = new SelectList(suggestions.items, 5);
|
||||||
|
this.isAutocompleting = true;
|
||||||
|
} else {
|
||||||
|
this.cancelAutocomplete();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private cancelAutocomplete(): void {
|
||||||
|
this.isAutocompleting = false;
|
||||||
|
this.autocompleteList = undefined as any;
|
||||||
|
this.autocompletePrefix = "";
|
||||||
|
}
|
||||||
|
|
||||||
|
private updateAutocomplete(): void {
|
||||||
|
if (!this.isAutocompleting || !this.autocompleteProvider) return;
|
||||||
|
|
||||||
|
const suggestions = this.autocompleteProvider.getSuggestions(
|
||||||
|
this.state.lines,
|
||||||
|
this.state.cursorLine,
|
||||||
|
this.state.cursorCol,
|
||||||
|
);
|
||||||
|
|
||||||
|
if (suggestions && suggestions.items.length > 0) {
|
||||||
|
this.autocompletePrefix = suggestions.prefix;
|
||||||
|
if (this.autocompleteList) {
|
||||||
|
// Update the existing list with new items
|
||||||
|
this.autocompleteList = new SelectList(suggestions.items, 5);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// No more matches, cancel autocomplete
|
||||||
|
this.cancelAutocomplete();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
packages/tui/src/tui.ts (Normal file, 473 lines)
@@ -0,0 +1,473 @@
import { writeSync } from "fs";
import process from "process";
import { logger } from "./logger.js";

export interface Padding {
	top?: number;
	bottom?: number;
	left?: number;
	right?: number;
}

export interface ComponentRenderResult {
	lines: string[];
	changed: boolean;
}

export interface ContainerRenderResult extends ComponentRenderResult {
	keepLines: number;
}

export interface Component {
	render(width: number): ComponentRenderResult;
	handleInput?(keyData: string): void;
}

// Sentinel component used to mark removed components - triggers cascade rendering
class SentinelComponent implements Component {
	render(): ComponentRenderResult {
		return {
			lines: [],
			changed: true, // Always trigger cascade
		};
	}
}

// Base Container class that manages child components
export class Container {
	protected children: Element[] = [];
	protected lines: string[] = [];
	protected parentTui: TUI | undefined; // Reference to parent TUI for triggering re-renders

	constructor(parentTui?: TUI | undefined) {
		this.parentTui = parentTui;
	}

	setParentTui(tui: TUI | undefined): void {
		this.parentTui = tui;
	}

	addChild(component: Element): void {
		this.children.push(component);

		// Set parent TUI reference for nested containers
		if (component instanceof Container && this.parentTui) {
			component.setParentTui(this.parentTui);
		}

		if (this.parentTui) {
			this.parentTui.requestRender();
		}
	}

	removeChild(component: Element): void {
		const index = this.children.indexOf(component);
		if (index >= 0) {
			// Replace with sentinel instead of splicing to maintain array structure
			this.children[index] = new SentinelComponent();

			// Clear parent TUI reference for nested containers
			if (component instanceof Container) {
				component.setParentTui(undefined);
			}

			// Use normal render - sentinel will trigger cascade naturally
			if (this.parentTui) {
				this.parentTui.requestRender();
			}
		} else {
			for (const child of this.children) {
				if (child instanceof Container) {
					child.removeChild(component);
				}
			}
		}
	}

	removeChildAt(index: number): void {
		if (index >= 0 && index < this.children.length) {
			const component = this.children[index];

			// Replace with sentinel instead of splicing to maintain array structure
			this.children[index] = new SentinelComponent();

			// Clear parent TUI reference for nested containers
			if (component instanceof Container) {
				component.setParentTui(undefined);
			}

			// Use normal render - sentinel will trigger cascade naturally
			if (this.parentTui) {
				this.parentTui.requestRender();
			}
		}
	}

	render(width: number): ContainerRenderResult {
		let keepLines = 0;
		let changed = false;
		const newLines: string[] = [];

		for (let i = 0; i < this.children.length; i++) {
			const child = this.children[i];
			if (!child) continue;

			if (child instanceof Container) {
				const result = child.render(width);
				newLines.push(...result.lines);
				if (!changed && !result.changed) {
					keepLines += result.lines.length;
				} else {
					if (!changed) {
						// First change - use the child's keepLines
						changed = true;
						keepLines += result.keepLines;
					}
					// After first change, don't add any more keepLines
				}
			} else {
				const result = child.render(width);
				newLines.push(...result.lines);
				if (!changed && !result.changed) {
					keepLines += result.lines.length;
				} else {
					if (!changed) {
						// First change for a non-container component
						changed = true;
					}
					// After first change, don't add any more keepLines
				}
			}
		}

		this.lines = newLines;
		return {
			lines: this.lines,
			changed,
			keepLines,
		};
	}

	// Get child at index (for external manipulation)
	// Note: This may return a SentinelComponent if a child was removed but not yet cleaned up
	getChild(index: number): Element | undefined {
		return this.children[index];
	}

	// Get number of children
	// Note: This count includes sentinel components until they are cleaned up after the next render pass
	getChildCount(): number {
		return this.children.length;
	}

	// Clear all children from the container
	clear(): void {
		// Clear parent TUI references for nested containers
		for (const child of this.children) {
			if (child instanceof Container) {
				child.setParentTui(undefined);
			}
		}

		// Clear the children array
		this.children = [];

		// Request render if we have a parent TUI
		if (this.parentTui) {
			this.parentTui.requestRender();
		}
	}

	// Clean up sentinel components
	cleanupSentinels(): void {
		const originalCount = this.children.length;
		const validChildren: Element[] = [];
		let sentinelCount = 0;

		for (const child of this.children) {
			if (child && !(child instanceof SentinelComponent)) {
				validChildren.push(child);

				// Recursively clean up nested containers
				if (child instanceof Container) {
					child.cleanupSentinels();
				}
			} else if (child instanceof SentinelComponent) {
				sentinelCount++;
			}
		}

		this.children = validChildren;

		if (sentinelCount > 0) {
			logger.debug("Container", "Cleaned up sentinels", {
				originalCount,
				newCount: this.children.length,
				sentinelsRemoved: sentinelCount,
			});
		}
	}
}

type Element = Component | Container;

export class TUI extends Container {
	private focusedComponent: Component | null = null;
	private needsRender: boolean = false;
	private wasRaw: boolean = false;
	private totalLines: number = 0;
	private isFirstRender: boolean = true;
	private isStarted: boolean = false;
	public onGlobalKeyPress?: (data: string) => boolean;

	constructor() {
		super(); // No parent TUI for root
		this.handleResize = this.handleResize.bind(this);
		this.handleKeypress = this.handleKeypress.bind(this);
		logger.componentLifecycle("TUI", "created");
	}

	configureLogging(config: Parameters<typeof logger.configure>[0]): void {
		logger.configure(config);
		logger.info("TUI", "Logging configured", config);
	}

	override addChild(component: Element): void {
		// Set parent TUI reference for containers
		if (component instanceof Container) {
			component.setParentTui(this);
		}
		super.addChild(component);

		// Only auto-render if TUI has been started
		if (this.isStarted) {
			this.requestRender();
		}
	}

	override removeChild(component: Element): void {
		super.removeChild(component);
		this.requestRender();
	}

	setFocus(component: Component): void {
		// Check if component exists anywhere in the hierarchy
		if (this.findComponent(component)) {
			this.focusedComponent = component;
		}
	}

	private findComponent(component: Component): boolean {
		// Check direct children
		if (this.children.includes(component)) {
			return true;
		}

		// Recursively search in containers
		for (const comp of this.children) {
			if (comp instanceof Container) {
				if (this.findInContainer(comp, component)) {
					return true;
				}
			}
		}

		return false;
	}

	private findInContainer(container: Container, component: Component): boolean {
		const childCount = container.getChildCount();

		// Check direct children
		for (let i = 0; i < childCount; i++) {
			const child = container.getChild(i);
			if (child === component) {
				return true;
			}
		}

		// Recursively search in nested containers
		for (let i = 0; i < childCount; i++) {
			const child = container.getChild(i);
			if (child instanceof Container) {
				if (this.findInContainer(child, component)) {
					return true;
				}
			}
		}

		return false;
	}

	requestRender(): void {
		if (!this.isStarted) return;
		this.needsRender = true;
		// Batch renders on next tick
		process.nextTick(() => {
			if (this.needsRender) {
				this.renderToScreen();
				this.needsRender = false;
			}
		});
	}

	start(): void {
		// Set started flag
		this.isStarted = true;

		// Hide the terminal cursor
		process.stdout.write("\x1b[?25l");

		// Set up raw mode for key capture
		try {
			this.wasRaw = process.stdin.isRaw || false;
			if (process.stdin.setRawMode) {
				process.stdin.setRawMode(true);
			}
			process.stdin.setEncoding("utf8");
			process.stdin.resume();

			// Listen for events
			process.stdout.on("resize", this.handleResize);
			process.stdin.on("data", this.handleKeypress);
		} catch (error) {
			console.error("Error setting up raw mode:", error);
		}

		// Initial render
		this.renderToScreen();
	}

	stop(): void {
		// Show the terminal cursor again
		process.stdout.write("\x1b[?25h");

		process.stdin.removeListener("data", this.handleKeypress);
		process.stdout.removeListener("resize", this.handleResize);
		if (process.stdin.setRawMode) {
			process.stdin.setRawMode(this.wasRaw);
		}
	}

	private renderToScreen(resize: boolean = false): void {
		const termWidth = process.stdout.columns || 80;

		logger.debug("TUI", "Starting render cycle", {
			termWidth,
			componentCount: this.children.length,
			isFirstRender: this.isFirstRender,
		});

		const result = this.render(termWidth);

		if (resize) {
			this.totalLines = result.lines.length;
			result.keepLines = 0;
			this.isFirstRender = true;
		}

		logger.debug("TUI", "Render result", {
			totalLines: result.lines.length,
			keepLines: result.keepLines,
			changed: result.changed,
			previousTotalLines: this.totalLines,
		});

		if (!result.changed) {
			// Nothing changed - skip render
			return;
		}

		// Handle cursor positioning
		if (this.isFirstRender) {
			// First render: just append to current terminal position
			this.isFirstRender = false;
			// Output all lines normally on first render
			for (const line of result.lines) {
				console.log(line);
			}
		} else {
			// Move cursor up to start of changing content and clear down
			const linesToMoveUp = this.totalLines - result.keepLines;
			let output = "";

			logger.debug("TUI", "Cursor movement", {
				linesToMoveUp,
				totalLines: this.totalLines,
				keepLines: result.keepLines,
				changingLineCount: result.lines.length - result.keepLines,
			});

			if (linesToMoveUp > 0) {
				output += `\x1b[${linesToMoveUp}A\x1b[0J`;
			}

			// Build the output string for all changing lines
			const changingLines = result.lines.slice(result.keepLines);

			logger.debug("TUI", "Output details", {
				linesToMoveUp,
				changingLinesCount: changingLines.length,
				keepLines: result.keepLines,
				totalLines: result.lines.length,
				previousTotalLines: this.totalLines,
			});
			for (const line of changingLines) {
				output += `${line}\n`;
			}

			// Write everything at once - use synchronous write to prevent race conditions
			writeSync(process.stdout.fd, output);
		}

		this.totalLines = result.lines.length;

		// Clean up sentinels after rendering
		this.cleanupSentinels();
	}

	private handleResize(): void {
		// Clear screen, hide cursor, and reset color
		process.stdout.write("\u001Bc\x1b[?25l\u001B[3J");

		// Terminal size changed - force re-render all
		this.renderToScreen(true);
	}

	private handleKeypress(data: string): void {
		logger.keyInput("TUI", data);

		// Don't handle Ctrl+C here - let the global key handler deal with it
		// if (data.charCodeAt(0) === 3) {
		//     logger.info("TUI", "Ctrl+C received");
		//     return; // Don't process this key further
		// }

		// Call global key handler if set
		if (this.onGlobalKeyPress) {
			const shouldForward = this.onGlobalKeyPress(data);
			if (!shouldForward) {
				// Global handler consumed the key, don't forward to focused component
				this.requestRender();
				return;
			}
		}

		// Send input to focused component
		if (this.focusedComponent?.handleInput) {
			logger.debug("TUI", "Forwarding input to focused component", {
				componentType: this.focusedComponent.constructor.name,
			});
			this.focusedComponent.handleInput(data);
			// Trigger re-render after input
			this.requestRender();
		} else {
			logger.warn("TUI", "No focused component to handle input", {
				focusedComponent: this.focusedComponent?.constructor.name || "none",
				hasHandleInput: this.focusedComponent?.handleInput ? "yes" : "no",
			});
		}
	}
}
packages/tui/src/whitespace-component.ts (Normal file, 24 lines)
@@ -0,0 +1,24 @@
import type { Component, ComponentRenderResult } from "./tui.js";

/**
 * A simple component that renders blank lines for spacing
 */
export class WhitespaceComponent implements Component {
	private lines: string[] = [];
	private lineCount: number;
	private firstRender: boolean = true;

	constructor(lineCount: number = 1) {
		this.lineCount = Math.max(0, lineCount); // Ensure non-negative
		this.lines = new Array(this.lineCount).fill("");
	}

	render(_width: number): ComponentRenderResult {
		const result = {
			lines: this.lines,
			changed: this.firstRender,
		};
		this.firstRender = false;
		return result;
	}
}
packages/tui/test/demo.ts (Normal file, 98 lines)
@@ -0,0 +1,98 @@
#!/usr/bin/env node

import {
	CombinedAutocompleteProvider,
	Container,
	MarkdownComponent,
	TextComponent,
	TextEditor,
	TUI,
} from "../src/index.js";

// Create TUI manager
const ui = new TUI();
ui.configureLogging({
	enabled: true,
	logLevel: "debug",
	logFile: "tui-debug.log",
});

// Create a chat container that will hold messages
const chatContainer = new Container();
const editor = new TextEditor();

// Set up autocomplete with slash commands
const autocompleteProvider = new CombinedAutocompleteProvider(
	[
		{ name: "clear", description: "Clear chat history" },
		{ name: "clear-last", description: "Clear last message" },
		{ name: "exit", description: "Exit the application" },
	],
	process.cwd(),
);
editor.setAutocompleteProvider(autocompleteProvider);

// Add components to UI
ui.addChild(new TextComponent("Differential Rendering TUI"));
ui.addChild(chatContainer);
ui.addChild(editor);

// Set focus to the editor (index 2)
ui.setFocus(editor);

// Test with a multiline text sample
const testText = `Root level:
- CLAUDE.md
- README.md
- biome.json
- package.json
- package-lock.json
- tsconfig.json
- tui-debug.log

Directories:
- \`data/\` (JSON test files)
- \`dist/\`
- \`docs/\` (markdown documentation)
- \`node_modules/\`
- \`src/\` (TypeScript source files)`;

// Pre-fill the editor with the test text
editor.setText(testText);

// Handle editor submissions
editor.onSubmit = (text: string) => {
	text = text.trim();

	if (text === "/clear") {
		chatContainer.clear();
		ui.requestRender();
		return;
	}

	if (text === "/clear-last") {
		const count = chatContainer.getChildCount();
		if (count > 0) {
			chatContainer.removeChildAt(count - 1);
			ui.requestRender();
		}
		return;
	}

	if (text === "/exit") {
		ui.stop();
		return;
	}

	if (text) {
		// Create new message component and add to chat container
		const message = new MarkdownComponent(text);
		chatContainer.addChild(message);

		// Manually trigger re-render
		ui.requestRender();
	}
};

// Start the UI
ui.start();
packages/tui/tsconfig.build.json (Normal file, 9 lines)
@@ -0,0 +1,9 @@
{
	"extends": "../../tsconfig.base.json",
	"compilerOptions": {
		"outDir": "./dist",
		"rootDir": "./src"
	},
	"include": ["src/**/*"],
	"exclude": ["node_modules", "dist"]
}
scripts/sync-versions.js (Normal file, 40 lines)
@@ -0,0 +1,40 @@
#!/usr/bin/env node

/**
 * Syncs inter-package dependency versions in the monorepo.
 * Updates the @mariozechner/pi-tui and @mariozechner/pi-agent versions
 * in dependent packages to match their current versions.
 */

import { readFileSync, writeFileSync } from 'fs';
import { join } from 'path';

const packagesDir = join(process.cwd(), 'packages');

// Read current versions
const tui = JSON.parse(readFileSync(join(packagesDir, 'tui/package.json'), 'utf8'));
const agent = JSON.parse(readFileSync(join(packagesDir, 'agent/package.json'), 'utf8'));
const pods = JSON.parse(readFileSync(join(packagesDir, 'pods/package.json'), 'utf8'));

console.log('Current versions:');
console.log(`  @mariozechner/pi-tui: ${tui.version}`);
console.log(`  @mariozechner/pi-agent: ${agent.version}`);
console.log(`  @mariozechner/pi: ${pods.version}`);

// Update agent's dependency on tui
if (agent.dependencies['@mariozechner/pi-tui']) {
	const oldVersion = agent.dependencies['@mariozechner/pi-tui'];
	agent.dependencies['@mariozechner/pi-tui'] = `^${tui.version}`;
	writeFileSync(join(packagesDir, 'agent/package.json'), JSON.stringify(agent, null, '\t') + '\n');
	console.log(`\nUpdated agent's dependency on pi-tui: ${oldVersion} → ^${tui.version}`);
}

// Update pods' dependency on agent
if (pods.dependencies['@mariozechner/pi-agent']) {
	const oldVersion = pods.dependencies['@mariozechner/pi-agent'];
	pods.dependencies['@mariozechner/pi-agent'] = `^${agent.version}`;
	writeFileSync(join(packagesDir, 'pods/package.json'), JSON.stringify(pods, null, '\t') + '\n');
	console.log(`Updated pods' dependency on pi-agent: ${oldVersion} → ^${agent.version}`);
}

console.log('\n✅ Version sync complete!');
todos/todos.md (Normal file, 3 lines)
@@ -0,0 +1,3 @@
- pods: if a pod is down and I run `pi list`, process verification still reports "All processes verified". That can't be true, since we can no longer SSH into the pod to check.
- agent: start a new agent session. When I press Ctrl+C, "Press Ctrl+C again to exit" appears above the text editor, followed by an empty line. After about 1 second, the empty line disappears. We should either never show the empty line or always show it. Maybe the Ctrl+C hint should be displayed below the text editor instead.
- tui: in `npx tsx test/demo.ts`, neither `/exit` nor pressing Ctrl+C exits the demo.
tsconfig.base.json (Normal file, 17 lines)
@@ -0,0 +1,17 @@
{
	"compilerOptions": {
		"target": "esnext",
		"module": "esnext",
		"lib": ["ES2022"],
		"strict": true,
		"esModuleInterop": true,
		"skipLibCheck": true,
		"forceConsistentCasingInFileNames": true,
		"declaration": true,
		"declarationMap": true,
		"sourceMap": true,
		"moduleResolution": "bundler",
		"resolveJsonModule": true,
		"types": ["node"]
	}
}
tsconfig.json (Normal file, 11 lines)
@@ -0,0 +1,11 @@
{
	"extends": "./tsconfig.base.json",
	"compilerOptions": {
		"paths": {
			"@mariozechner/pi-tui": ["./packages/tui/src/index.ts"],
			"@mariozechner/pi-agent": ["./packages/agent/src/index.ts"],
			"@mariozechner/pi": ["./packages/pods/src/index.ts"]
		}
	},
	"include": ["packages/*/src/**/*"]
}