# OpenCode Research

Research notes on OpenCode's configuration, credential discovery, and runtime behavior based on the agent-jj implementation.
## Overview

- Provider: Multi-provider (OpenAI, Anthropic, others)
- Execution Method: Embedded server via SDK, or CLI binary
- Session Persistence: Session ID (string)
- SDK: `@opencode-ai/sdk` (server + client)
- Binary Location: `~/.opencode/bin/opencode`
- Written in: Go (with Bubble Tea TUI)
## CLI Usage (Alternative to SDK)

OpenCode can be used as a standalone binary instead of embedding the SDK:

### Interactive TUI Mode

```sh
opencode                   # Start TUI in current directory
opencode /path/to/project  # Start in specific directory
opencode -c                # Continue last session
opencode -s SESSION_ID     # Continue specific session
```

### Non-Interactive Mode (`opencode run`)

```sh
opencode run "your prompt here"
opencode run --format json "prompt"                          # Raw JSON events output
opencode run -m anthropic/claude-sonnet-4-20250514 "prompt"
opencode run --agent plan "analyze this code"
opencode run -c "follow up question"                         # Continue last session
opencode run -s SESSION_ID "prompt"                          # Continue specific session
opencode run -f file1.ts -f file2.ts "review these files"
```
### Key CLI Flags

| Flag | Description |
|---|---|
| `--format json` | Output raw JSON events (for parsing) |
| `-m, --model PROVIDER/MODEL` | Model in format `provider/model` |
| `--agent AGENT` | Agent to use (`build`, `plan`) |
| `-c, --continue` | Continue last session |
| `-s, --session ID` | Continue specific session |
| `-f, --file FILE` | Attach file(s) to message |
| `--attach URL` | Attach to running server |
| `--port PORT` | Local server port |
| `--variant VARIANT` | Reasoning effort (e.g., `high`, `max`) |
### Headless Server Mode

```sh
opencode serve                         # Start headless server
opencode serve --port 4096             # Specific port
opencode attach http://localhost:4096  # Attach to running server
```
### Other Commands

```sh
opencode models             # List available models
opencode models anthropic   # List models for provider
opencode auth               # Manage credentials
opencode session            # Manage sessions
opencode export SESSION_ID  # Export session as JSON
opencode stats              # Token usage statistics
```
Sources: OpenCode GitHub, OpenCode Docs
## Architecture

OpenCode runs as an embedded HTTP server per workspace/change:

```
┌─────────────────────┐
│  agent-jj backend   │
│                     │
│  ┌───────────────┐  │
│  │   OpenCode    │  │
│  │    Server     │◄─┼── HTTP API
│  │  (per change) │  │
│  └───────────────┘  │
└─────────────────────┘
```

- One server per `changeId` (workspace+repo+change combination)
- Multiple sessions can share a server
- Server runs on a dynamic port (4200-4300 range)
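The one-server-per-`changeId` rule can be sketched as a registry keyed by the workspace+repo+change combination. Everything here (`ServerHandle`, `getOrStartServer`, the key format) is a hypothetical illustration, not agent-jj's actual code:

```typescript
// Hypothetical sketch: reuse one server per changeId, dedupe concurrent starts.
interface ServerHandle {
  port: number;
}

const servers = new Map<string, Promise<ServerHandle>>();

function changeKey(workspace: string, repo: string, change: string): string {
  return `${workspace}:${repo}:${change}`;
}

async function getOrStartServer(
  workspace: string,
  repo: string,
  change: string,
  startServer: () => Promise<ServerHandle>,
): Promise<ServerHandle> {
  const key = changeKey(workspace, repo, change);
  let handle = servers.get(key);
  if (!handle) {
    handle = startServer(); // first caller starts; later callers reuse
    servers.set(key, handle);
  }
  return handle;
}
```

Storing the promise (not the resolved handle) means a second caller arriving mid-startup reuses the in-flight start instead of launching a duplicate server.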
## Credential Discovery

### Priority Order

1. Environment variables: `ANTHROPIC_API_KEY`, `CLAUDE_API_KEY`
2. Environment variables: `OPENAI_API_KEY`, `CODEX_API_KEY`
3. Claude Code config files
4. Codex config files
5. OpenCode config files
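The priority order above can be expressed as a simple fall-through; `resolveAnthropicKey` is a hypothetical helper for illustration, not agent-jj's actual implementation:

```typescript
// Illustrative sketch of credential priority: env vars win over config files.
function resolveAnthropicKey(
  env: Record<string, string | undefined>,
  configFileKey?: string, // key found in Claude Code / OpenCode config files
): string | undefined {
  return env.ANTHROPIC_API_KEY ?? env.CLAUDE_API_KEY ?? configFileKey;
}
```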
### Config File Location

| Path | Description |
|---|---|
| `~/.local/share/opencode/auth.json` | Primary auth config |
Auth File Structure
{
"anthropic": {
"type": "api",
"key": "sk-ant-..."
},
"openai": {
"type": "api",
"key": "sk-..."
},
"custom-provider": {
"type": "oauth",
"access": "token...",
"refresh": "refresh-token...",
"expires": 1704067200000
}
}
### Provider Config Types

```typescript
interface OpenCodeProviderConfig {
  type: "api" | "oauth";
  key?: string;      // For API type
  access?: string;   // For OAuth type
  refresh?: string;  // For OAuth type
  expires?: number;  // Unix timestamp (ms)
}
```
OAuth tokens are validated for expiry before use.
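That expiry check can be sketched as follows; `isOAuthValid` is a hypothetical name, with `expires` a Unix timestamp in milliseconds per the interface above:

```typescript
// Sketch of OAuth expiry validation before a token is used.
function isOAuthValid(
  config: { type: "api" | "oauth"; expires?: number },
  now: number = Date.now(),
): boolean {
  if (config.type !== "oauth") return true; // API keys have no expiry
  return config.expires !== undefined && config.expires > now;
}
```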
## Server Management

### Starting a Server

```typescript
import { createOpencodeServer } from "@opencode-ai/sdk/server";
import { createOpencodeClient } from "@opencode-ai/sdk";

const port = 4200;

const server = await createOpencodeServer({
  hostname: "127.0.0.1",
  port,
  config: { logLevel: "DEBUG" }
});

const client = createOpencodeClient({
  baseUrl: `http://127.0.0.1:${port}`
});
```
Server Configuration
// From config.json
{
"opencode": {
"host": "127.0.0.1", // Bind address
"advertisedHost": "127.0.0.1" // External address (for tunnels)
}
}
### Port Selection

Uses the `get-port` package to find an available port in the range 4200-4300.
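The scan that `get-port` performs can be approximated with Node's standard library alone; this is an illustrative sketch, not how agent-jj or `get-port` actually implement it:

```typescript
import net from "node:net";

// Try to bind each candidate port; the first successful bind wins.
function isPortFree(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = net.createServer();
    probe.once("error", () => resolve(false)); // e.g. EADDRINUSE
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}

async function findOpencodePort(start = 4200, end = 4300): Promise<number> {
  for (let port = start; port <= end; port++) {
    if (await isPortFree(port)) return port;
  }
  throw new Error(`no free port in ${start}-${end}`);
}
```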
## Client API

### Session Management

```typescript
// Create session
const response = await client.session.create({});
const sessionId = response.data.id;

// Get session info
const session = await client.session.get({ path: { id: sessionId } });

// Get session messages
const messages = await client.session.messages({ path: { id: sessionId } });

// Get session todos
const todos = await client.session.todo({ path: { id: sessionId } });
```
### Sending Prompts

#### Synchronous

```typescript
const response = await client.session.prompt({
  path: { id: sessionId },
  body: {
    model: { providerID: "openai", modelID: "gpt-4o" },
    agent: "build",
    parts: [{ type: "text", text: "prompt text" }]
  }
});
```
#### Asynchronous (Streaming)

```typescript
// Start prompt asynchronously
await client.session.promptAsync({
  path: { id: sessionId },
  body: {
    model: { providerID: "openai", modelID: "gpt-4o" },
    agent: "build",
    parts: [{ type: "text", text: "prompt text" }]
  }
});

// Subscribe to events
const eventStream = await client.event.subscribe({});
for await (const event of eventStream.stream) {
  // Process events
}
```
### Event Types

| Event Type | Description |
|---|---|
| `message.part.updated` | Message part streamed/updated |
| `session.status` | Session status changed |
| `session.idle` | Session finished processing |
| `session.error` | Session error occurred |
| `question.asked` | AI asking user question |
| `permission.asked` | AI requesting permission |
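A client consuming the SSE stream typically routes on these event types; `classifyEvent` and the `RuntimeState` names are a hypothetical dispatch sketch, not part of the SDK:

```typescript
// Hypothetical routing over the documented SSE event types.
type RuntimeState = "streaming" | "status" | "idle" | "error" | "awaiting-user" | "other";

function classifyEvent(type: string): RuntimeState {
  switch (type) {
    case "message.part.updated":
      return "streaming";      // append delta to the in-progress message
    case "session.status":
      return "status";         // update displayed session status
    case "session.idle":
      return "idle";           // turn finished; stop waiting
    case "session.error":
      return "error";          // surface the error to the user
    case "question.asked":
    case "permission.asked":
      return "awaiting-user";  // block until the user replies (HITL)
    default:
      return "other";
  }
}
```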
### Event Structure

```typescript
interface SDKEvent {
  type: string;
  properties: {
    part?: SDKPart & { sessionID?: string };
    delta?: string;  // Text delta for streaming
    status?: { type?: string };
    sessionID?: string;
    error?: { data?: { message?: string } };
    id?: string;
    questions?: QuestionInfo[];
    permission?: string;
    patterns?: string[];
    metadata?: Record<string, unknown>;
    always?: string[];
    tool?: { messageID?: string; callID?: string };
  };
}
```
## Message Parts

OpenCode has rich message part types:

| Type | Description |
|---|---|
| `text` | Plain text content |
| `reasoning` | Model reasoning (chain-of-thought) |
| `tool` | Tool invocation with status |
| `file` | File reference |
| `step-start` | Step boundary start |
| `step-finish` | Step boundary end with reason |
| `subtask` | Delegated subtask |
Part Structure
interface MessagePart {
type: "text" | "reasoning" | "tool" | "file" | "step-start" | "step-finish" | "subtask" | "other";
id: string;
content: string;
// Tool-specific
toolName?: string;
toolStatus?: "pending" | "running" | "completed" | "error";
toolInput?: Record<string, unknown>;
toolOutput?: string;
// File-specific
filename?: string;
mimeType?: string;
// Step-specific
stepReason?: string;
// Subtask-specific
subtaskAgent?: string;
subtaskDescription?: string;
}
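A frontend has to render each part type differently; `renderPart` is a hypothetical one-line renderer over the shape above, not agent-jj's actual display code:

```typescript
// Illustrative per-type rendering of a message part to a display string.
function renderPart(part: {
  type: string;
  content: string;
  toolName?: string;
  toolStatus?: string;
}): string {
  switch (part.type) {
    case "text":
      return part.content;
    case "reasoning":
      return `[thinking] ${part.content}`;
    case "tool":
      return `[tool ${part.toolName}: ${part.toolStatus}]`;
    default:
      return `[${part.type}]`; // file, step-start, step-finish, subtask, other
  }
}
```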
## Questions and Permissions

### Question Request

```typescript
interface QuestionRequest {
  id: string;
  sessionID: string;
  questions: Array<{
    header?: string;
    question: string;
    options: Array<{ label: string; description?: string }>;
    multiSelect?: boolean;
  }>;
  tool?: { messageID: string; callID: string };
}
```
Responding to Questions
// V1 client for question/permission APIs
const clientV1 = createOpencodeClientV1({
baseUrl: `http://127.0.0.1:${port}`
});
// Reply with answers
await clientV1.question.reply({
requestID: requestId,
answers: [["selected option"]] // Array of selected labels per question
});
// Reject question
await clientV1.question.reject({ requestID: requestId });
Permission Request
interface PermissionRequest {
id: string;
sessionID: string;
permission: string; // Permission type (e.g., "file:write")
patterns: string[]; // Affected paths/patterns
metadata: Record<string, unknown>;
always: string[]; // Options for "always allow"
tool?: { messageID: string; callID: string };
}
### Responding to Permissions

```typescript
await clientV1.permission.reply({
  requestID: requestId,
  reply: "once"  // or "always" | "reject"
});
```
Provider/Model Discovery
// Get available providers and models
const providerResponse = await client.provider.list({});
const agentResponse = await client.app.agents({});
interface ProviderInfo {
id: string;
name: string;
models: Array<{
id: string;
name: string;
reasoning: boolean;
toolCall: boolean;
}>;
}
interface AgentInfo {
id: string;
name: string;
primary: boolean; // "build" and "plan" are primary
}
### Internal Agents (Hidden from UI)

`compaction`, `title`, `summary`
## Token Usage

```typescript
interface TokenUsage {
  input: number;
  output: number;
  reasoning?: number;
  cache?: {
    read: number;
    write: number;
  };
}
```

Available in the message `info` field for assistant messages.
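Aggregating a turn's usage from this shape is straightforward; `totalTokens` is a hypothetical helper, and counting cache reads/writes alongside regular tokens is an assumption (providers bill them at different rates):

```typescript
// Illustrative total over the TokenUsage shape, treating all fields as tokens.
function totalTokens(u: {
  input: number;
  output: number;
  reasoning?: number;
  cache?: { read: number; write: number };
}): number {
  return (
    u.input +
    u.output +
    (u.reasoning ?? 0) +
    (u.cache?.read ?? 0) +
    (u.cache?.write ?? 0)
  );
}
```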
## Agent Modes vs Permission Modes

OpenCode properly separates these concepts:

### Agent Modes

Agents are first-class concepts with their own system prompts and behavior:

| Agent ID | Description |
|---|---|
| `build` | Default execution agent |
| `plan` | Planning/analysis agent |
| Custom | User-defined agents in config |
```typescript
// Sending a prompt with a specific agent
await client.session.promptAsync({
  path: { id: sessionId },
  body: {
    agent: "plan",  // or "build", or a custom agent ID
    parts: [{ type: "text", text: "..." }]
  }
});
```
### Listing Available Agents

```typescript
const agents = await client.app.agents({});
// Returns: [{ id: "build", name: "Build", primary: true }, ...]
```
### Permission Modes

Permissions are configured via rulesets on the session, separate from agent selection:

```typescript
interface PermissionRuleset {
  // Tool-specific permission rules
}
```
## Human-in-the-Loop

OpenCode has full interactive HITL via SSE events:

| Event | Endpoint |
|---|---|
| `question.asked` | `POST /question/{id}/reply` |
| `permission.asked` | `POST /permission/{id}/reply` |

See research/human-in-the-loop.md for full API details.
## Defaults

```typescript
const DEFAULT_OPENCODE_MODEL_ID = "gpt-4o";
const DEFAULT_OPENCODE_PROVIDER_ID = "openai";
```
## Concurrency Control

Server startup uses a promise-chain lock to prevent race conditions:

```typescript
let startLock: Promise<void> = Promise.resolve();

async function withStartLock<T>(fn: () => Promise<T>): Promise<T> {
  const prior = startLock;
  let release!: () => void;  // definitely assigned: the Promise executor runs synchronously
  startLock = new Promise((resolve) => { release = resolve; });
  await prior;
  try {
    return await fn();
  } finally {
    release();
  }
}
```
## Working Directory

The server must be started in the correct working directory:

```typescript
async function withWorkingDir<T>(workingDir: string, fn: () => Promise<T>): Promise<T> {
  const previous = process.cwd();
  process.chdir(workingDir);
  try {
    return await fn();
  } finally {
    process.chdir(previous);
  }
}
```
## Polling Fallback

A polling mechanism checks session status every 2 seconds in case SSE events don't arrive:

```typescript
const pollInterval = setInterval(async () => {
  const session = await client.session.get({ path: { id: sessionId } });
  if (session.data?.status?.type === "idle") {
    abortController.abort();
  }
}, 2000);
```
## Model Discovery

OpenCode has the richest model discovery support, with both a CLI and an HTTP API.

### CLI Commands

```sh
opencode models             # List all available models
opencode models <provider>  # List models for a specific provider
```
### HTTP Endpoint

```
GET /provider
```
### Response Schema

```json
{
  "all": [
    {
      "id": "anthropic",
      "name": "Anthropic",
      "api": "string",
      "env": ["ANTHROPIC_API_KEY"],
      "npm": "string",
      "models": {
        "model-key": {
          "id": "string",
          "name": "string",
          "family": "string",
          "release_date": "string",
          "attachment": true,
          "reasoning": false,
          "tool_call": true,
          "cost": {
            "input": 0.003,
            "output": 0.015,
            "cache_read": 0.0003,
            "cache_write": 0.00375
          },
          "limit": {
            "context": 200000,
            "input": 200000,
            "output": 8192
          },
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          },
          "experimental": false,
          "status": "beta"
        }
      }
    }
  ],
  "default": {
    "anthropic": "claude-sonnet-4-20250514"
  },
  "connected": ["anthropic"]
}
```
### SDK Usage

```typescript
const client = createOpencodeClient();
const response = await client.provider.list();
```

### How to Replicate

When an OpenCode server is running, call `GET /provider` on its HTTP port. It returns full model metadata including capabilities, costs, context limits, and modalities.
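A consumer usually flattens that response into selectable `provider/model` IDs for connected providers; `connectedModelIds` and the trimmed `ProviderListResponse` type are hypothetical, keeping only the schema fields used here:

```typescript
// Trimmed-down view of the GET /provider response (only the fields used below).
interface ProviderListResponse {
  all: Array<{ id: string; models: Record<string, { id: string }> }>;
  connected: string[];
}

// Produce "provider/model" IDs for providers with credentials configured.
function connectedModelIds(res: ProviderListResponse): string[] {
  return res.all
    .filter((p) => res.connected.includes(p.id))
    .flatMap((p) => Object.values(p.models).map((m) => `${p.id}/${m.id}`));
}
```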
## Command Execution & Process Management

### Agent Tool Execution

The agent executes commands via internal tools (not exposed in the HTTP API). The agent's tool calls are synchronous within its turn. Tool parts have states: `pending`, `running`, `completed`, `error`.

### PTY System (`/pty/*`) - User-Facing Terminals

Separate from the agent's command execution. PTYs are server-scoped interactive terminals for the user:

- `POST /pty` - Create PTY (command, args, cwd, title, env)
- `GET /pty` - List all PTYs
- `GET /pty/{ptyID}` - Get PTY info
- `PUT /pty/{ptyID}` - Update PTY (title, resize via `size: {rows, cols}`)
- `DELETE /pty/{ptyID}` - Kill and remove PTY
- `GET /pty/{ptyID}/connect` - WebSocket for bidirectional I/O

PTY events (globally broadcast via SSE): `pty.created`, `pty.updated`, `pty.exited`, `pty.deleted`.

The agent does NOT use the PTY system. PTYs are for the user's interactive terminal panel, independent of any AI session.
### Session Commands (`/session/{id}/command`, `/session/{id}/shell`) - Context Injection

External clients can inject command results into an AI session's conversation context:

- `POST /session/{sessionID}/command` - Executes a command and records the result as an `AssistantMessage` in the session. Required fields: `command`, `arguments`. The output becomes part of the AI's context for subsequent turns.
- `POST /session/{sessionID}/shell` - Similar, but wraps in `sh -c`. Required fields: `command`, `agent`.
- `GET /command` - Lists available command definitions (metadata, not execution).

Session commands emit `command.executed` events with sessionID + messageID.

Key distinction: These endpoints execute commands directly (not via the AI), then inject the output into the session as if the AI produced it. The AI doesn't actively run the command - it just finds the output in its conversation history on the next turn.
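Calling the context-injection endpoint described above can be sketched as a plain HTTP request. The field names (`command`, `arguments`) follow the description; the helper name and request shape are illustrative assumptions:

```typescript
// Hypothetical request builder for POST /session/{sessionID}/command.
function buildSessionCommandRequest(
  baseUrl: string,
  sessionID: string,
  command: string,
  args: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/session/${encodeURIComponent(sessionID)}/command`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ command, arguments: args }),
    },
  };
}
```

The result could then be sent with `fetch(req.url, req.init)`; the command's output appears in the session as an `AssistantMessage` on the next turn.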
### Three Separate Execution Mechanisms

| Mechanism | Who uses it | Scoped to | AI sees output? |
|---|---|---|---|
| Agent tools (internal) | AI agent | Session turn | Yes (immediate) |
| PTY (`/pty/*`) | User/frontend | Server (global) | No |
| Session commands (`/session/{id}/*`) | Frontend/SDK client | Session | Yes (next turn) |

The agent has no tool to interact with PTYs and cannot access the session command endpoints. When the agent needs to run a background process, it uses its internal bash-equivalent tool with shell backgrounding (`&`).
### Comparison

| Capability | Supported? | Notes |
|---|---|---|
| Agent runs commands | Yes (internal tools) | Synchronous, blocks agent turn |
| User runs commands → agent sees output | Yes (`/session/{id}/command`) | HTTP API, first-class |
| External API for command injection | Yes | Session-scoped endpoints |
| Command source tracking | Implicit | Endpoint implies source (no enum) |
| Background process management | No | Shell `&` only, for agent |
| PTY / interactive terminal | Yes (`/pty/*`) | Server-scoped, WebSocket I/O |
## Notes

- OpenCode is the most feature-rich runtime (streaming, questions, permissions)
- Server persists for the lifetime of a change (workspace+repo+change)
- Parts are streamed incrementally with delta updates
- The V1 client is needed for question/permission APIs
- Working directory affects credential discovery and file operations