sandbox-agent/research/agents/opencode.md


# OpenCode Research
Research notes on OpenCode's configuration, credential discovery, and runtime behavior, based on the agent-jj implementation.
## Overview
- **Provider**: Multi-provider (OpenAI, Anthropic, others)
- **Execution Method**: Embedded server via SDK, or CLI binary
- **Session Persistence**: Session ID (string)
- **SDK**: `@opencode-ai/sdk` (server + client)
- **Binary Location**: `~/.opencode/bin/opencode`
- **Written in**: Go (with Bubble Tea TUI)
## CLI Usage (Alternative to SDK)
OpenCode can be used as a standalone binary instead of embedding the SDK:
### Interactive TUI Mode
```bash
opencode # Start TUI in current directory
opencode /path/to/project # Start in specific directory
opencode -c # Continue last session
opencode -s SESSION_ID # Continue specific session
```
### Non-Interactive Mode (`opencode run`)
```bash
opencode run "your prompt here"
opencode run --format json "prompt" # Raw JSON events output
opencode run -m anthropic/claude-sonnet-4-20250514 "prompt"
opencode run --agent plan "analyze this code"
opencode run -c "follow up question" # Continue last session
opencode run -s SESSION_ID "prompt" # Continue specific session
opencode run -f file1.ts -f file2.ts "review these files"
```
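The `--format json` mode makes `opencode run` scriptable from another process. A minimal TypeScript sketch, assuming the CLI emits one JSON event per line on stdout (the event shape and the line-per-event framing are assumptions, not documented guarantees):

```typescript
import { spawn } from "node:child_process";

// Parse a single line of `opencode run --format json` output.
// Returns null for blank or non-JSON lines.
function parseEventLine(line: string): Record<string, unknown> | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("{")) return null;
  try {
    return JSON.parse(trimmed);
  } catch {
    return null;
  }
}

// Spawn the CLI and stream parsed events to a callback.
function runPrompt(prompt: string, onEvent: (e: Record<string, unknown>) => void): void {
  const child = spawn("opencode", ["run", "--format", "json", prompt]);
  let buffer = "";
  child.stdout.on("data", (chunk: Buffer) => {
    buffer += chunk.toString("utf8");
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const event = parseEventLine(line);
      if (event) onEvent(event);
    }
  });
}
```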
### Custom Args (CLI Flags)
#### Core Flags
| Flag | Type | Description |
|------|------|-------------|
| `-m, --model PROVIDER/MODEL` | string | Model in format `provider/model` (e.g., `anthropic/claude-sonnet-4-20250514`) |
| `--agent AGENT` | string | Agent to use (`build`, `plan`, or custom agent ID) |
| `--format FORMAT` | enum | `default` (formatted) or `json` (raw JSON events) |
| `--variant VARIANT` | string | Reasoning effort level (e.g., `high`, `max`, `minimal`) |
#### Session Flags
| Flag | Type | Description |
|------|------|-------------|
| `-c, --continue` | bool | Continue the last session |
| `-s, --session ID` | string | Continue a specific session by ID |
| `--title TEXT` | string | Title for the session (uses truncated prompt if omitted) |
| `--share` | bool | Share the session publicly |
#### Input/Output Flags
| Flag | Type | Description |
|------|------|-------------|
| `-f, --file FILE` | path[] | Attach file(s) to message (repeatable) |
| `--attach URL` | string | Attach to a running OpenCode server (e.g., `http://localhost:4096`) |
| `--port PORT` | int | Port for the local server (random if not specified) |
#### Debugging Flags
| Flag | Type | Values | Description |
|------|------|--------|-------------|
| `--log-level LEVEL` | enum | `DEBUG`, `INFO`, `WARN`, `ERROR` | Log verbosity level |
| `--print-logs` | bool | - | Print logs to stderr |
### Headless Server Mode
```bash
opencode serve # Start headless server
opencode serve --port 4096 # Specific port
opencode attach http://localhost:4096 # Attach to running server
```
### Other Commands
```bash
opencode models # List available models
opencode models anthropic # List models for provider
opencode auth # Manage credentials
opencode session # Manage sessions
opencode export SESSION_ID # Export session as JSON
opencode stats # Token usage statistics
```
Sources: [OpenCode GitHub](https://github.com/opencode-ai/opencode), [OpenCode Docs](https://opencode.ai/docs/cli/)
## Architecture
OpenCode runs as an embedded HTTP server per workspace/change:
```
┌─────────────────────┐
│  agent-jj backend   │
│                     │
│  ┌───────────────┐  │
│  │   OpenCode    │  │
│  │    Server     │◄─┼── HTTP API
│  │ (per change)  │  │
│  └───────────────┘  │
└─────────────────────┘
```
- One server per `changeId` (workspace+repo+change combination)
- Multiple sessions can share a server
- Server runs on dynamic port (4200-4300 range)
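The one-server-per-`changeId` policy is essentially per-key memoization of a server handle. A sketch of that pattern (the `Map`-based registry is illustrative; in agent-jj the `start` factory would wrap `createOpencodeServer`):

```typescript
// Generic per-key memoization: at most one resource (here: one OpenCode
// server) is ever started per key, and concurrent callers share the same
// in-flight promise.
function perKey<T>(start: (key: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let entry = cache.get(key);
    if (!entry) {
      entry = start(key);
      cache.set(key, entry);
    }
    return entry;
  };
}

// Hypothetical usage: key by the workspace+repo+change combination.
// const getServer = perKey((changeId) =>
//   createOpencodeServer({ hostname: "127.0.0.1", port: 0 }));
```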
## Credential Discovery
### Priority Order
1. Environment variables: `ANTHROPIC_API_KEY`, `CLAUDE_API_KEY`
2. Environment variables: `OPENAI_API_KEY`, `CODEX_API_KEY`
3. Claude Code config files
4. Codex config files
5. OpenCode config files
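A sketch of the discovery walk for Anthropic credentials, following the priority order above (the env-var names and auth-file path are from these notes; the helper itself is illustrative and only covers steps 1 and 5, not the Claude Code / Codex config files):

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import path from "node:path";

// Return the first Anthropic credential found, or undefined.
function discoverAnthropicKey(): string | undefined {
  // Step 1: environment variables, in priority order.
  for (const name of ["ANTHROPIC_API_KEY", "CLAUDE_API_KEY"]) {
    const value = process.env[name];
    if (value) return value;
  }
  // Step 5: OpenCode's own auth file.
  const authPath = path.join(homedir(), ".local/share/opencode/auth.json");
  if (existsSync(authPath)) {
    const auth = JSON.parse(readFileSync(authPath, "utf8"));
    if (auth.anthropic?.type === "api") return auth.anthropic.key;
  }
  return undefined;
}
```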
### Config File Location
| Path | Description |
|------|-------------|
| `~/.local/share/opencode/auth.json` | Primary auth config |
### Auth File Structure
```json
{
  "anthropic": {
    "type": "api",
    "key": "sk-ant-..."
  },
  "openai": {
    "type": "api",
    "key": "sk-..."
  },
  "custom-provider": {
    "type": "oauth",
    "access": "token...",
    "refresh": "refresh-token...",
    "expires": 1704067200000
  }
}
```
### Provider Config Types
```typescript
interface OpenCodeProviderConfig {
  type: "api" | "oauth";
  key?: string;      // For API type
  access?: string;   // For OAuth type
  refresh?: string;  // For OAuth type
  expires?: number;  // Unix timestamp (ms)
}
```
OAuth tokens are validated for expiry before use.
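The expiry check reduces to comparing `expires` (a millisecond Unix timestamp, per the auth file above) against the current time. A sketch; the 60-second safety margin is an assumption, not agent-jj's actual value:

```typescript
// OAuth credential shape, restated from the provider config above.
interface OAuthCredential {
  type: "oauth";
  access: string;
  refresh: string;
  expires: number; // Unix timestamp (ms)
}

// Usable only if it will still be valid a margin into the future,
// so an in-flight request does not race the expiry.
function isOAuthValid(cred: OAuthCredential, now = Date.now(), marginMs = 60_000): boolean {
  return cred.expires - marginMs > now;
}
```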
## Server Management
### Starting a Server
```typescript
import { createOpencodeServer } from "@opencode-ai/sdk/server";
import { createOpencodeClient } from "@opencode-ai/sdk";

const server = await createOpencodeServer({
  hostname: "127.0.0.1",
  port: 4200,
  config: { logLevel: "DEBUG" },
});

const client = createOpencodeClient({
  baseUrl: server.url, // e.g. http://127.0.0.1:4200
});
```
### Server Configuration
```typescript
// From config.json
{
  "opencode": {
    "host": "127.0.0.1",           // Bind address
    "advertisedHost": "127.0.0.1"  // External address (for tunnels)
  }
}
```
### Port Selection
Uses `get-port` package to find available port in range 4200-4300.
## Client API
### Session Management
```typescript
// Create session
const response = await client.session.create({});
const sessionId = response.data.id;

// Get session info
const session = await client.session.get({ path: { id: sessionId } });

// Get session messages
const messages = await client.session.messages({ path: { id: sessionId } });

// Get session todos
const todos = await client.session.todo({ path: { id: sessionId } });
```
### Sending Prompts
#### Synchronous
```typescript
const response = await client.session.prompt({
path: { id: sessionId },
body: {
model: { providerID: "openai", modelID: "gpt-4o" },
agent: "build",
parts: [{ type: "text", text: "prompt text" }]
}
});
```
#### Asynchronous (Streaming)
```typescript
// Start prompt asynchronously
await client.session.promptAsync({
  path: { id: sessionId },
  body: {
    model: { providerID: "openai", modelID: "gpt-4o" },
    agent: "build",
    parts: [{ type: "text", text: "prompt text" }],
  },
});

// Subscribe to events
const eventStream = await client.event.subscribe({});
for await (const event of eventStream.stream) {
  // Process events
}
```
## Event Types
| Event Type | Description |
|------------|-------------|
| `message.part.updated` | Message part streamed/updated |
| `session.status` | Session status changed |
| `session.idle` | Session finished processing |
| `session.error` | Session error occurred |
| `question.asked` | AI asking user question |
| `permission.asked` | AI requesting permission |
### Event Structure
```typescript
interface SDKEvent {
  type: string;
  properties: {
    part?: SDKPart & { sessionID?: string };
    delta?: string; // Text delta for streaming
    status?: { type?: string };
    sessionID?: string;
    error?: { data?: { message?: string } };
    id?: string;
    questions?: QuestionInfo[];
    permission?: string;
    patterns?: string[];
    metadata?: Record<string, unknown>;
    always?: string[];
    tool?: { messageID?: string; callID?: string };
  };
}
```
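The events in the table above are typically fanned out with a switch on `type`. A minimal dispatcher sketch (event names and property paths from these notes; the handler callbacks are illustrative, and `question.asked` / `permission.asked` would go to the V2 client):

```typescript
// Narrowed event shape for this sketch; the full interface is above.
type SDKEvent = { type: string; properties: Record<string, any> };

interface EventSinks {
  onDelta: (text: string) => void;  // streamed text
  onIdle: () => void;               // session finished
  onError: (message: string) => void;
}

function handleEvent(event: SDKEvent, out: EventSinks): void {
  switch (event.type) {
    case "message.part.updated":
      if (typeof event.properties.delta === "string") out.onDelta(event.properties.delta);
      break;
    case "session.idle":
      out.onIdle();
      break;
    case "session.error":
      out.onError(event.properties.error?.data?.message ?? "unknown error");
      break;
    // question.asked / permission.asked: reply through the V2 client
  }
}
```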
## Message Parts
OpenCode has rich message part types:
| Type | Description |
|------|-------------|
| `text` | Plain text content |
| `reasoning` | Model reasoning (chain-of-thought) |
| `tool` | Tool invocation with status |
| `file` | File reference |
| `step-start` | Step boundary start |
| `step-finish` | Step boundary end with reason |
| `subtask` | Delegated subtask |
### Part Structure
```typescript
interface MessagePart {
  type:
    | "text"
    | "reasoning"
    | "tool"
    | "file"
    | "step-start"
    | "step-finish"
    | "subtask"
    | "other";
  id: string;
  content: string;

  // Tool-specific
  toolName?: string;
  toolStatus?: "pending" | "running" | "completed" | "error";
  toolInput?: Record<string, unknown>;
  toolOutput?: string;

  // File-specific
  filename?: string;
  mimeType?: string;

  // Step-specific
  stepReason?: string;

  // Subtask-specific
  subtaskAgent?: string;
  subtaskDescription?: string;
}
```
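Displaying a message usually means filtering and concatenating its parts. A sketch (the part shape is narrowed from the interface above; which types count as "displayable" is a rendering choice, not OpenCode's rule):

```typescript
// Minimal part shape for this sketch.
type Part = { type: string; content: string };

// Concatenate the user-visible text of a message, skipping tool calls,
// step boundaries, and other structural parts.
function messageText(parts: Part[]): string {
  return parts
    .filter((p) => p.type === "text" || p.type === "reasoning")
    .map((p) => p.content)
    .join("");
}
```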
## Questions and Permissions
### Question Request
```typescript
interface QuestionRequest {
  id: string;
  sessionID: string;
  questions: Array<{
    header?: string;
    question: string;
    options: Array<{ label: string; description?: string }>;
    multiSelect?: boolean;
  }>;
  tool?: { messageID: string; callID: string };
}
```
### Responding to Questions
```typescript
// V2 client for question/permission APIs
const clientV2 = createOpencodeClientV2({
  baseUrl: `http://127.0.0.1:${port}`, // same port as the primary client
});

// Reply with answers
await clientV2.question.reply({
  requestID: requestId,
  answers: [["selected option"]], // Array of selected labels per question
});

// Reject question
await clientV2.question.reject({ requestID: requestId });
```
### Permission Request
```typescript
interface PermissionRequest {
  id: string;
  sessionID: string;
  permission: string;                 // Permission type (e.g., "file:write")
  patterns: string[];                 // Affected paths/patterns
  metadata: Record<string, unknown>;
  always: string[];                   // Options for "always allow"
  tool?: { messageID: string; callID: string };
}
```
### Responding to Permissions
```typescript
await clientV2.permission.reply({
  requestID: requestId,
  reply: "once", // or "always" / "reject"
});
```
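A backend that answers permission requests automatically needs a policy over the request fields. A sketch of a simple allow-list policy (request shape narrowed from the interface above; the policy itself is illustrative, not agent-jj's):

```typescript
// Minimal request shape for this sketch.
type PermissionRequest = { permission: string; patterns: string[] };

// Grant a one-time approval for permission types on the allow-list,
// reject everything else.
function decideReply(req: PermissionRequest, allowed: string[]): "once" | "reject" {
  return allowed.includes(req.permission) ? "once" : "reject";
}
```

A stricter policy could also inspect `patterns` (e.g., reject writes outside the workspace) before approving.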
## Provider/Model Discovery
```typescript
// Get available providers and models
const providerResponse = await client.provider.list({});
const agentResponse = await client.app.agents({});

interface ProviderInfo {
  id: string;
  name: string;
  models: Array<{
    id: string;
    name: string;
    reasoning: boolean;
    toolCall: boolean;
  }>;
}

interface AgentInfo {
  id: string;
  name: string;
  primary: boolean; // "build" and "plan" are primary
}
```
### Internal Agents (Hidden from UI)
- `compaction`
- `title`
- `summary`
## Token Usage
```typescript
interface TokenUsage {
  input: number;
  output: number;
  reasoning?: number;
  cache?: {
    read: number;
    write: number;
  };
}
```
Available in message `info` field for assistant messages.
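Summing usage across the optional fields is a common follow-up. A sketch (the interface is restated from above so the snippet stands alone; counting cache reads and writes separately from the total is a choice here, since billing for cached tokens varies by provider):

```typescript
// Restated from the notes above.
interface TokenUsage {
  input: number;
  output: number;
  reasoning?: number;
  cache?: { read: number; write: number };
}

// Total non-cache tokens for one message.
function totalTokens(u: TokenUsage): number {
  return u.input + u.output + (u.reasoning ?? 0);
}
```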
## Agent Modes vs Permission Modes
OpenCode properly separates these concepts:
### Agent Modes
Agents are first-class concepts with their own system prompts and behavior:
| Agent ID | Description |
|----------|-------------|
| `build` | Default execution agent |
| `plan` | Planning/analysis agent |
| Custom | User-defined agents in config |
```typescript
// Sending a prompt with a specific agent
await client.session.promptAsync({
  path: { id: sessionId },
  body: {
    agent: "plan", // or "build", or a custom agent ID
    parts: [{ type: "text", text: "..." }],
  },
});
```
### Listing Available Agents
```typescript
const agents = await client.app.agents({});
// Returns: [{ id: "build", name: "Build", primary: true }, ...]
```
### Permission Modes
Permissions are configured via rulesets on the session, separate from agent selection:
```typescript
interface PermissionRuleset {
  // Tool-specific permission rules
}
```
### Human-in-the-Loop
OpenCode has full interactive HITL via SSE events:
| Event | Endpoint |
|-------|----------|
| `question.asked` | `POST /question/{id}/reply` |
| `permission.asked` | `POST /permission/{id}/reply` |
See `research/human-in-the-loop.md` for full API details.
## Defaults
```typescript
const DEFAULT_OPENCODE_MODEL_ID = "gpt-4o";
const DEFAULT_OPENCODE_PROVIDER_ID = "openai";
```
## Concurrency Control
Server startup uses a lock to prevent race conditions:
```typescript
let startLock: Promise<void> = Promise.resolve();

async function withStartLock<T>(fn: () => Promise<T>): Promise<T> {
  const prior = startLock;
  let release!: () => void;
  startLock = new Promise<void>((resolve) => { release = resolve; });
  await prior;
  try {
    return await fn();
  } finally {
    release();
  }
}
```
## Working Directory
Server must be started in the correct working directory:
```typescript
async function withWorkingDir<T>(workingDir: string, fn: () => Promise<T>): Promise<T> {
  const previous = process.cwd();
  process.chdir(workingDir);
  try {
    return await fn();
  } finally {
    process.chdir(previous);
  }
}
```
## Polling Fallback
A polling mechanism checks session status every 2 seconds in case SSE events don't arrive:
```typescript
const pollInterval = setInterval(async () => {
  const session = await client.session.get({ path: { id: sessionId } });
  if (session.data?.status?.type === "idle") {
    clearInterval(pollInterval);
    abortController.abort(); // stop the SSE event loop
  }
}, 2000);
```
## Notes
- OpenCode is the most feature-rich runtime (streaming, questions, permissions)
- Server persists for the lifetime of a change (workspace+repo+change)
- Parts are streamed incrementally with delta updates
- V2 client is needed for question/permission APIs
- Working directory affects credential discovery and file operations