harivansh-afk/sandbox-agent

mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 17:01:02 +00:00

financialvice a5a6492165

fix: support Claude OAuth token for model listing (#109)

2026-02-06 20:17:01 -08:00

Architecture

How the daemon, schemas, and agents fit together.

Sandbox Agent SDK is built around a single daemon that runs inside the sandbox and exposes a universal HTTP API. Clients use the API (or the TypeScript SDK / CLI) to create sessions, send messages, and stream events.

Components

Daemon: Rust HTTP server that manages agent processes and streaming.
Universal schema: Shared input/output types for messages and events.
SDKs & CLI: Convenience wrappers around the HTTP API.

Agent Schema Pipeline

The schema pipeline extracts type definitions from AI coding agents and converts them to a universal format.

Schema Extraction

TypeScript extractors in resources/agent-schemas/src/ pull schemas from each agent:

Agent	Source	Extractor
Claude	`claude --output-format json --json-schema`	`claude.ts`
Codex	`codex app-server generate-json-schema`	`codex.ts`
OpenCode	GitHub OpenAPI spec	`opencode.ts`
Amp	Scrapes ampcode.com docs	`amp.ts`

All extractors include fallback schemas for when CLIs or URLs are unavailable.

Output: JSON schemas written to resources/agent-schemas/artifacts/json-schema/

Rust Type Generation

The server/packages/extracted-agent-schemas/ package generates Rust types at build time:

build.rs reads JSON schemas and uses the typify crate to generate Rust structs
Generated code is written to $OUT_DIR/{agent}.rs
Types are exposed via include!() macros in src/lib.rs

resources/agent-schemas/artifacts/json-schema/*.json
        ↓ (build.rs + typify)
$OUT_DIR/{claude,codex,opencode,amp}.rs
        ↓ (include!)
extracted_agent_schemas::{claude,codex,opencode,amp}::*

Universal Schema

The server/packages/universal-agent-schema/ package defines agent-agnostic types:

Core types (src/lib.rs):

UniversalEvent - Wrapper with id, timestamp, session_id, agent, data
UniversalEventData - Enum: Message, Started, Error, QuestionAsked, PermissionAsked, Unknown
UniversalMessage - Parsed (role, parts, metadata) or Unparsed (raw JSON)
UniversalMessagePart - Text, ToolCall, ToolResult, FunctionCall, FunctionResult, File, Image, Error, Unknown

Converters (src/agents/{claude,codex,opencode,amp}.rs):

Each agent has a converter module that transforms native events to universal format
Conversions are best-effort; data that cannot be parsed is preserved in the Unparsed or Unknown variants rather than dropped
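
The best-effort conversion described above can be sketched in TypeScript. This is an illustration only: the real converters are Rust modules, and the field shapes here (`kind`, `raw`, a native payload with `role` and `content`) are assumptions, not the actual schema.

```typescript
// Hypothetical TypeScript mirror of two universal types; shapes are assumptions.
type UniversalMessagePart =
  | { type: "text"; text: string }
  | { type: "unknown"; raw: unknown };

type UniversalMessage =
  | { kind: "parsed"; role: string; parts: UniversalMessagePart[] }
  | { kind: "unparsed"; raw: unknown };

// Best-effort conversion: anything that does not match the expected shape
// is preserved verbatim instead of being dropped.
function toUniversalMessage(native: unknown): UniversalMessage {
  if (typeof native !== "object" || native === null) {
    return { kind: "unparsed", raw: native };
  }
  const { role, content } = native as { role?: unknown; content?: unknown };
  if (typeof role !== "string" || !Array.isArray(content)) {
    return { kind: "unparsed", raw: native };
  }
  const parts = content.map((item: unknown): UniversalMessagePart => {
    if (typeof item === "object" && item !== null) {
      const { type, text } = item as { type?: unknown; text?: unknown };
      if (type === "text" && typeof text === "string") {
        return { type: "text", text };
      }
    }
    // Unrecognized part: keep the raw value in an Unknown-style variant.
    return { type: "unknown", raw: item };
  });
  return { kind: "parsed", role, parts };
}
```

A message with one recognized and one unrecognized part still converts to a parsed message; only the unrecognized part falls back to the unknown variant.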

Session Management

Sessions track agent conversations with in-memory state.

Session Model

Session ID: Client-provided primary session identifier.
Agent session ID: Underlying ID from the agent (thread/session). This is surfaced in events but is not the primary key.

Storage

Sessions are stored in an in-memory HashMap<String, SessionState> inside SessionManager:

struct SessionManager {
    sessions: Mutex<HashMap<String, SessionState>>,
    // ...
}

There is no disk persistence. Sessions are ephemeral and lost on server restart.

SessionState

Each session tracks:

Field	Purpose
`session_id`	Client-provided identifier
`agent`	Agent type (Claude, Codex, OpenCode, Amp)
`agent_mode`	Operating mode (build, plan, custom)
`permission_mode`	Permission handling (default, plan, bypass)
`model`	Optional model override
`events: Vec<UniversalEvent>`	Full event history
`pending_questions`	Question IDs awaiting reply
`pending_permissions`	Permission IDs awaiting reply
`broadcaster`	Tokio broadcast channel for SSE streaming
`ended`	Whether agent process has terminated

Lifecycle

POST /v1/sessions/{sessionId}     Create session, auto-install agent
        ↓
POST /v1/sessions/{id}/messages   Spawn agent subprocess, stream output
POST /v1/sessions/{id}/messages/stream   Post and stream a single turn
        ↓
GET /v1/sessions/{id}/events      Poll for new events (offset-based)
GET /v1/sessions/{id}/events/sse  Subscribe to SSE stream
        ↓
POST .../questions/{id}/reply     Answer agent question
POST .../permissions/{id}/reply   Grant/deny permission request
        ↓
(agent process terminates)        Session marked as ended
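
As a rough illustration of the lifecycle above, the fully specified routes can be assembled with a small helper. The helper name `sessionEndpoints` is hypothetical, and the `offset` and `limit` query parameter names are assumptions; the document only says that events are addressed by offset and limit. Request bodies are omitted because they are not described here.

```typescript
interface Endpoint {
  method: "GET" | "POST";
  url: string;
}

// Hypothetical helper: builds the session routes listed in the lifecycle.
function sessionEndpoints(baseUrl: string, sessionId: string) {
  const root = `${baseUrl.replace(/\/+$/, "")}/v1/sessions/${encodeURIComponent(sessionId)}`;
  const endpoint = (method: Endpoint["method"], path: string): Endpoint => ({
    method,
    url: root + path,
  });
  return {
    create: endpoint("POST", ""),
    sendMessage: endpoint("POST", "/messages"),
    sendMessageStream: endpoint("POST", "/messages/stream"),
    // Query parameter names are assumptions.
    pollEvents: (offset: number, limit: number): Endpoint =>
      endpoint("GET", `/events?offset=${offset}&limit=${limit}`),
    streamEvents: (offset: number): Endpoint =>
      endpoint("GET", `/events/sse?offset=${offset}`),
  };
}
```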

Event Streaming

Events are stored in memory per session and assigned a monotonically increasing id.
/events returns a slice of events by offset/limit.
/events/sse streams new events using the same offset semantics.

When a message is sent:

send_message() spawns the agent CLI as a subprocess
consume_spawn() reads stdout/stderr line by line
Each JSON line is parsed and converted via parse_agent_line()
Events are recorded via record_event() which:
- Assigns incrementing event ID
- Appends to events vector
- Broadcasts to SSE subscribers
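
The record-and-broadcast behaviour of record_event() can be modelled with a minimal in-memory log. This is a TypeScript sketch of the idea, not the daemon's Rust code: the class name `EventLog` is hypothetical, IDs starting at 1 is an assumption, and `offset` is treated as the number of events already seen.

```typescript
interface StoredEvent<T> {
  id: number;
  data: T;
}
type Listener<T> = (event: StoredEvent<T>) => void;

class EventLog<T> {
  private events: StoredEvent<T>[] = [];
  private listeners: Listener<T>[] = [];

  // Assign the next ID, append to history, then broadcast to subscribers.
  record(data: T): StoredEvent<T> {
    const event: StoredEvent<T> = { id: this.events.length + 1, data };
    this.events.push(event);
    for (const listener of this.listeners) listener(event);
    return event;
  }

  // Polling path: a slice of history selected by offset and limit.
  slice(offset: number, limit: number): StoredEvent<T>[] {
    return this.events.slice(offset, offset + limit);
  }

  // Streaming path: replay history from the offset, then deliver live events.
  subscribe(offset: number, listener: Listener<T>): () => void {
    for (const event of this.events.slice(offset)) listener(event);
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
}
```

Because both paths read from the same vector, a client that polls up to offset N and then subscribes at offset N sees every event exactly once.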

Agent Execution

Each agent has a different execution model and communication pattern. There are two main architectural patterns:

Architecture Patterns

Subprocess Model (Claude, Amp):

New process spawned per message/turn
Process terminates after turn completes
Multi-turn via CLI resume flags (--resume, --continue)
Simple, but pays process-spawn overhead on every turn

Client/Server Model (OpenCode, Codex):

Single long-running server process
Multiple sessions/threads multiplexed via RPC
Multi-turn via server-side thread persistence
More efficient for repeated interactions

Overview

Agent	Architecture	Binary Source	Multi-Turn Method
Claude Code	Subprocess (per-turn)	GCS (Anthropic)	`--resume` flag
Codex	Shared Server (JSON-RPC)	GitHub releases	Thread persistence
OpenCode	HTTP Server (SSE)	GitHub releases	Server-side sessions
Amp	Subprocess (per-turn)	GCS (Amp)	`--continue` flag

Claude Code

Spawned as a subprocess with JSONL streaming:

claude --print --output-format stream-json --verbose \
  [--model MODEL] [--resume SESSION_ID] \
  [--permission-mode plan | --dangerously-skip-permissions] \
  PROMPT

Streams JSON events to stdout, one per line
Supports session resumption via --resume
Permission modes: --permission-mode plan for approval workflow, --dangerously-skip-permissions for bypass
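
The command line above can be assembled from session settings roughly as follows. This is a sketch, not the daemon's implementation: the function and type names are hypothetical, and mapping the `default` permission mode to "no extra flag" is an assumption.

```typescript
type PermissionMode = "default" | "plan" | "bypass";

interface ClaudeTurn {
  prompt: string;
  model?: string;
  resumeSessionId?: string;
  permissionMode?: PermissionMode;
}

// Build the argument list for one `claude` turn using the flags shown above.
function buildClaudeArgs(turn: ClaudeTurn): string[] {
  const args = ["--print", "--output-format", "stream-json", "--verbose"];
  if (turn.model) args.push("--model", turn.model);
  if (turn.resumeSessionId) args.push("--resume", turn.resumeSessionId);
  if (turn.permissionMode === "plan") {
    args.push("--permission-mode", "plan");
  } else if (turn.permissionMode === "bypass") {
    args.push("--dangerously-skip-permissions");
  }
  // The prompt is always the final positional argument.
  args.push(turn.prompt);
  return args;
}
```

Returning an argument array rather than a single string avoids shell quoting issues when the prompt contains spaces or quotes.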

Codex

Uses a shared app-server process that handles multiple sessions via JSON-RPC over stdio:

codex app-server

Daemon flow:

First Codex session triggers codex app-server spawn
Performs initialize / initialized handshake
Each session creation sends thread/start → receives thread_id
Messages sent via turn/start with thread_id
Notifications routed back to session by thread_id

Key characteristics:

Single process handles all Codex sessions
JSON-RPC over stdio (JSONL format)
Thread IDs map to daemon session IDs
Approval requests arrive as server-to-client JSON-RPC requests
Process lifetime matches daemon lifetime (not per-turn)
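
Routing notifications from one shared process back to individual sessions could look like the sketch below. Only the method names and the thread_id routing key come from the description above; the class name, the request envelope fields, and the notification payload shape are assumptions.

```typescript
interface RoutedNotification {
  sessionId: string;
  method: string;
}

// Hypothetical router for a shared app-server process.
class CodexThreadRouter {
  private sessionByThread: Record<string, string> = {};
  private nextRequestId = 1;

  // Serialize one request as a single JSONL line for the server's stdin.
  request(method: string, params: Record<string, unknown>): string {
    return JSON.stringify({ id: this.nextRequestId++, method, params }) + "\n";
  }

  // Remember which daemon session owns a thread once thread/start replies.
  bind(threadId: string, sessionId: string): void {
    this.sessionByThread[threadId] = sessionId;
  }

  // Route one stdout line to its session; undefined if it cannot be routed.
  route(line: string): RoutedNotification | undefined {
    let message: { method?: unknown; params?: { thread_id?: unknown } };
    try {
      message = JSON.parse(line);
    } catch {
      return undefined;
    }
    const threadId = message?.params?.thread_id;
    if (typeof threadId !== "string" || typeof message.method !== "string") {
      return undefined;
    }
    if (!Object.prototype.hasOwnProperty.call(this.sessionByThread, threadId)) {
      return undefined;
    }
    return { sessionId: this.sessionByThread[threadId], method: message.method };
  }
}
```

Keeping the thread-to-session map in one place is what lets a single process serve every Codex session without per-turn spawns.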

OpenCode

Unlike the other agents, OpenCode runs as a persistent HTTP server rather than a per-message subprocess:

opencode serve --port {4200-4300}

Then communicates via HTTP endpoints:

Endpoint	Purpose
`POST /session`	Create new session
`POST /session/{id}/prompt`	Send message
`GET /event/subscribe`	SSE event stream
`POST /question/reply`	Answer HITL question
`POST /permission/reply`	Grant/deny permission

The server is started once and reused across sessions. Events are received via Server-Sent Events (SSE) subscription.
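
To make the SSE subscription concrete, here is a minimal parser for the standard `text/event-stream` framing. It is a sketch that handles a complete text buffer and only the `event:` and `data:` fields; it does not reflect how the daemon consumes OpenCode's stream, and the function name is hypothetical.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parse a complete SSE text buffer into events (blank line = event boundary).
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when no `event:` field is set
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.indexOf("event:") === 0) {
        event = line.slice(6).trim();
      } else if (line.indexOf("data:") === 0) {
        // A single leading space after the colon is not part of the value.
        data.push(line.slice(5).replace(/^ /, ""));
      }
      // Lines starting with ":" are comments (keep-alives) and are ignored.
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

A streaming consumer would additionally need to buffer partial chunks until a blank line arrives.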

Amp

Spawned as a subprocess with dynamic flag detection:

amp [--execute|--print] [--output-format stream-json] \
  [--model MODEL] [--continue SESSION_ID] \
  [--dangerously-skip-permissions] PROMPT

Dynamic flag detection: Probes --help output to determine which flags the installed version supports
Fallback strategy: If execution fails, retries with progressively simpler flag combinations
Streams JSON events to stdout
Supports session continuation via --continue
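
The flag detection and fallback strategy can be illustrated as below. This is a sketch under assumptions: the function names are hypothetical, detection is a plain substring check on the help text, and the specific order in which flags are dropped during fallback is a guess rather than the daemon's actual sequence.

```typescript
// Pick the flags that the installed version advertises in its --help output.
function detectAmpFlags(helpText: string): string[] {
  const supports = (flag: string): boolean => helpText.indexOf(flag) !== -1;
  const flags: string[] = [];
  if (supports("--execute")) flags.push("--execute");
  else if (supports("--print")) flags.push("--print");
  if (supports("--output-format")) flags.push("--output-format", "stream-json");
  return flags;
}

// Progressively simpler flag sets to try if an invocation fails.
function fallbackCandidates(flags: string[]): string[][] {
  const withoutFormat = flags.filter(
    (flag, i) => flag !== "--output-format" && flags[i - 1] !== "--output-format",
  );
  const candidates: string[][] = [];
  for (const candidate of [flags, withoutFormat, []]) {
    const isDuplicate = candidates.some((c) => c.join(" ") === candidate.join(" "));
    if (!isDuplicate) candidates.push(candidate);
  }
  return candidates;
}
```

Each candidate would be tried in order, with the prompt appended, until one invocation succeeds.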

Communication Patterns

Per-turn subprocess agents (Claude, Amp):

Agent CLI spawned with appropriate flags
Stdout/stderr read line-by-line
Each line parsed as JSON
Events converted via parse_agent_line() → agent-specific converter
Universal events recorded and broadcast to SSE subscribers
Process terminated on turn completion
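
Reading stdout line by line means handling chunks that end mid-line. A minimal sketch of that buffering, with a hypothetical `JsonLineBuffer` class and an assumed `{ unparsed }` wrapper for lines that are not valid JSON:

```typescript
class JsonLineBuffer {
  private pending = "";

  // Feed one stdout chunk; returns the values of every line completed so far.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or ""); keep it for next time.
    this.pending = lines.pop() ?? "";
    const values: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        values.push(JSON.parse(trimmed));
      } catch {
        // Preserve non-JSON output instead of dropping it.
        values.push({ unparsed: trimmed });
      }
    }
    return values;
  }
}
```

Each returned value would then be handed to the agent-specific converter.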

Shared stdio server agent (Codex):

Single codex app-server process started on first session
initialize/initialized handshake performed once
New sessions send thread/start, receive thread_id
Messages sent via turn/start with thread_id
Notifications read from stdout, routed by thread_id
Process persists across sessions and turns

HTTP server agent (OpenCode):

Server started on available port (if not running)
Session created via HTTP POST
Prompts sent via HTTP POST
Events received via SSE subscription
HITL responses forwarded via HTTP POST

Credential Handling

All agents receive credentials (API keys or OAuth tokens) via environment variables:

Agent	Environment Variables
Claude	`ANTHROPIC_API_KEY`, `CLAUDE_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `ANTHROPIC_AUTH_TOKEN`
Codex	`OPENAI_API_KEY`, `CODEX_API_KEY`
OpenCode	`OPENAI_API_KEY`
Amp	`ANTHROPIC_API_KEY`
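
One way to use the table above is to select only the variables relevant to an agent before spawning it. The sketch below assumes that filtering behaviour; whether the daemon forwards only these variables or the whole environment is not stated here, and the names `CREDENTIAL_VARS` and `credentialEnv` are hypothetical.

```typescript
type Agent = "claude" | "codex" | "opencode" | "amp";

// Variable names taken from the table above.
const CREDENTIAL_VARS: Record<Agent, string[]> = {
  claude: ["ANTHROPIC_API_KEY", "CLAUDE_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_AUTH_TOKEN"],
  codex: ["OPENAI_API_KEY", "CODEX_API_KEY"],
  opencode: ["OPENAI_API_KEY"],
  amp: ["ANTHROPIC_API_KEY"],
};

// Return only the credential variables that are set and non-empty.
function credentialEnv(
  agent: Agent,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const forwarded: Record<string, string> = {};
  for (const name of CREDENTIAL_VARS[agent]) {
    const value = env[name];
    if (value) forwarded[name] = value;
  }
  return forwarded;
}
```

In Node.js this would typically be called with `process.env` as the second argument.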

Human-in-the-Loop

Questions and permission prompts are normalized into the universal schema:

Question events surface as questionAsked with selectable options.
Permission events surface as permissionAsked with reply: once | always | reject.
Claude plan approval is normalized into a question event (approve/reject).

SDK Modes

The TypeScript SDK supports two connection modes.

Embedded Mode

Defined in sdks/typescript/src/spawn.ts:

Binary resolution: Checks SANDBOX_AGENT_BIN env, then platform-specific npm package, then PATH
Port selection: Uses provided port or finds a free one via net.createServer()
Token generation: Uses provided token or generates random 24-byte hex string
Spawn: Launches sandbox-agent server --host <host> --port <port> --token <token>
Health wait: Polls GET /v1/health until server is ready (up to 15s timeout)
Cleanup: On dispose, sends SIGTERM then SIGKILL if needed; also registers process exit handlers

const handle = await spawnSandboxAgent({ log: "inherit" });
// handle.baseUrl = "http://127.0.0.1:<port>"
// handle.token = "<generated>"
// handle.dispose() to cleanup
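
Two of the steps above, binary resolution and token generation, can be sketched with Node.js built-ins. This is an illustration rather than the contents of spawn.ts: the `Lookup` type and both function names are hypothetical, and each lookup stands in for one source (env variable, npm package, PATH).

```typescript
import { randomBytes } from "node:crypto";

// Each lookup returns a binary path, or undefined if that source has none.
type Lookup = () => string | undefined;

// Try each source in priority order and return the first hit.
function resolveBinary(lookups: Lookup[]): string {
  for (const lookup of lookups) {
    const path = lookup();
    if (path) return path;
  }
  throw new Error("sandbox-agent binary not found");
}

// 24 random bytes encoded as hex yields a 48-character token.
function generateToken(): string {
  return randomBytes(24).toString("hex");
}
```

Passing the lookups as functions keeps the priority order explicit and avoids probing later sources when an earlier one succeeds.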

Server Mode

Defined in sdks/typescript/src/client.ts:

Direct HTTP client to a remote sandbox-agent server
Uses provided baseUrl and optional token
No subprocess management

const client = await SandboxAgent.connect({
  baseUrl: "http://remote-server:8080",
  token: "secret",
});

Auto-Detection

SandboxAgent provides two factory methods:

// Connect to existing server
const remoteClient = await SandboxAgent.connect({
  baseUrl: "http://remote:8080",
});

// Start embedded subprocess
const embeddedClient = await SandboxAgent.start();

// Start embedded subprocess with options
const customClient = await SandboxAgent.start({
  spawn: { port: 9000 },
});

The spawn option can be:

true / false - Enable/disable embedded mode
SandboxAgentSpawnOptions - Fine-grained control over host, port, token, binary path, timeout, logging

Authentication

The daemon uses a global token configured at startup. All HTTP and CLI operations reuse the same token and are validated against the Authorization header (Bearer or Token).
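
A check that accepts both header schemes might look like the following sketch. It is not the daemon's Rust implementation: the function name is hypothetical, and treating the scheme name as case-insensitive is an assumption.

```typescript
// Accept "Authorization: Bearer <token>" or "Authorization: Token <token>".
function isAuthorized(header: string | undefined, expectedToken: string): boolean {
  if (!header) return false;
  const match = /^(Bearer|Token)\s+(\S+)$/i.exec(header.trim());
  return match !== null && match[2] === expectedToken;
}
```

A production check would normally compare tokens in constant time to avoid leaking information through timing.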

Key Files

Component	Path
Agent spawn/install	`server/packages/agent-management/src/agents.rs`
Session routing	`server/packages/sandbox-agent/src/router.rs`
Event converters	`server/packages/universal-agent-schema/src/agents/*.rs`
Schema extractors	`resources/agent-schemas/src/*.ts`
TypeScript SDK	`sdks/typescript/src/`

Agent Compatibility

Supported agents, install methods, and streaming formats.

Compatibility Matrix

Agent	Provider	Binary	Install method	Session ID	Streaming format
Claude Code	Anthropic	`claude`	curl raw binary from GCS	`session_id`	JSONL via stdout
Codex	OpenAI	`codex`	curl tarball from GitHub releases	`thread_id`	JSON-RPC over stdio
OpenCode	Multi-provider	`opencode`	curl tarball from GitHub releases	`session_id`	SSE or JSONL
Amp	Sourcegraph	`amp`	curl raw binary from GCS	`session_id`	JSONL via stdout
Mock	Built-in	—	bundled	`mock-*`	daemon-generated

Agent Modes

OpenCode: discovered via the server API.
Claude Code / Codex / Amp: hardcoded modes (typically build, plan, or custom).

Capability Notes

Questions / permissions: OpenCode natively supports these workflows. Claude plan approval is normalized into a question event (tests do not currently exercise Claude question/permission flows).
Streaming: all agents stream events. OpenCode uses SSE, Codex uses JSON-RPC over stdio, and the others use JSONL. Codex output is currently normalized to thread/turn start events plus completed user/assistant items; deltas and tool/reasoning items are not emitted yet.
User messages: Claude CLI output does not include explicit user-message events in our snapshots, so only assistant messages are surfaced for Claude today.
Files and images: normalized via UniversalMessagePart with File and Image parts.
