mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-15 13:03:46 +00:00
350 lines
12 KiB
Text
350 lines
12 KiB
Text
---
|
|
title: "Architecture"
|
|
description: "How the daemon, schemas, and agents fit together."
|
|
---
|
|
|
|
Sandbox Agent SDK is built around a single daemon that runs inside the sandbox and exposes a universal HTTP API. Clients use the API (or the TypeScript SDK / CLI) to create sessions, send messages, and stream events.
|
|
|
|
## Components
|
|
|
|
- **Daemon**: Rust HTTP server that manages agent processes and streaming.
|
|
- **Universal schema**: Shared input/output types for messages and events.
|
|
- **SDKs & CLI**: Convenience wrappers around the HTTP API.
|
|
|
|
## Agent Schema Pipeline
|
|
|
|
The schema pipeline extracts type definitions from AI coding agents and converts them to a universal format.
|
|
|
|
### Schema Extraction
|
|
|
|
TypeScript extractors in `resources/agent-schemas/src/` pull schemas from each agent:
|
|
|
|
| Agent | Source | Extractor |
|
|
|-------|--------|-----------|
|
|
| Claude | `claude --output-format json --json-schema` | `claude.ts` |
|
|
| Codex | `codex app-server generate-json-schema` | `codex.ts` |
|
|
| OpenCode | GitHub OpenAPI spec | `opencode.ts` |
|
|
| Amp | Scrapes ampcode.com docs | `amp.ts` |
|
|
|
|
All extractors include fallback schemas for when CLIs or URLs are unavailable.
|
|
|
|
**Output:** JSON schemas written to `resources/agent-schemas/artifacts/json-schema/`
|
|
|
|
### Rust Type Generation
|
|
|
|
The `server/packages/extracted-agent-schemas/` package generates Rust types at build time:
|
|
|
|
- `build.rs` reads JSON schemas and uses the `typify` crate to generate Rust structs
|
|
- Generated code is written to `$OUT_DIR/{agent}.rs`
|
|
- Types are exposed via `include!()` macros in `src/lib.rs`
|
|
|
|
```
|
|
resources/agent-schemas/artifacts/json-schema/*.json
|
|
↓ (build.rs + typify)
|
|
$OUT_DIR/{claude,codex,opencode,amp}.rs
|
|
↓ (include!)
|
|
extracted_agent_schemas::{claude,codex,opencode,amp}::*
|
|
```
|
|
|
|
### Universal Schema
|
|
|
|
The `server/packages/universal-agent-schema/` package defines agent-agnostic types:
|
|
|
|
**Core types** (`src/lib.rs`):
|
|
- `UniversalEvent` - Wrapper with id, timestamp, session_id, agent, data
|
|
- `UniversalEventData` - Enum: Message, Started, Error, QuestionAsked, PermissionAsked, Unknown
|
|
- `UniversalMessage` - Parsed (role, parts, metadata) or Unparsed (raw JSON)
|
|
- `UniversalMessagePart` - Text, ToolCall, ToolResult, FunctionCall, FunctionResult, File, Image, Error, Unknown
|
|
|
|
**Converters** (`src/agents/{claude,codex,opencode,amp}.rs`):
|
|
- Each agent has a converter module that transforms native events to universal format
|
|
- Conversions are best-effort; unparseable data preserved in `Unparsed` or `Unknown` variants
|
|
|
|
## Session Management
|
|
|
|
Sessions track agent conversations with in-memory state.
|
|
|
|
### Session Model
|
|
|
|
- **Session ID**: Client-provided primary session identifier.
|
|
- **Agent session ID**: Underlying ID from the agent (thread/session). This is surfaced in events but is not the primary key.
|
|
|
|
### Storage
|
|
|
|
Sessions are stored in an in-memory `HashMap<String, SessionState>` inside `SessionManager`:
|
|
|
|
```rust
|
|
struct SessionManager {
|
|
sessions: Mutex<HashMap<String, SessionState>>,
|
|
// ...
|
|
}
|
|
```
|
|
|
|
There is no disk persistence. Sessions are ephemeral and lost on server restart.
|
|
|
|
### SessionState
|
|
|
|
Each session tracks:
|
|
|
|
| Field | Purpose |
|
|
|-------|---------|
|
|
| `session_id` | Client-provided identifier |
|
|
| `agent` | Agent type (Claude, Codex, OpenCode, Amp) |
|
|
| `agent_mode` | Operating mode (build, plan, custom) |
|
|
| `permission_mode` | Permission handling (default, plan, bypass) |
|
|
| `model` | Optional model override |
|
|
| `events: Vec<UniversalEvent>` | Full event history |
|
|
| `pending_questions` | Question IDs awaiting reply |
|
|
| `pending_permissions` | Permission IDs awaiting reply |
|
|
| `broadcaster` | Tokio broadcast channel for SSE streaming |
|
|
| `ended` | Whether agent process has terminated |
|
|
|
|
### Lifecycle
|
|
|
|
```
|
|
POST /v1/sessions/{sessionId} Create session, auto-install agent
|
|
↓
|
|
POST /v1/sessions/{id}/messages Spawn agent subprocess, stream output
|
|
POST /v1/sessions/{id}/messages/stream Post and stream a single turn
|
|
↓
|
|
GET /v1/sessions/{id}/events Poll for new events (offset-based)
|
|
GET /v1/sessions/{id}/events/sse Subscribe to SSE stream
|
|
↓
|
|
POST .../questions/{id}/reply Answer agent question
|
|
POST .../permissions/{id}/reply Grant/deny permission request
|
|
↓
|
|
(agent process terminates) Session marked as ended
|
|
```
|
|
|
|
### Event Streaming
|
|
|
|
- Events are stored in memory per session and assigned a monotonically increasing `id`.
|
|
- `/events` returns a slice of events by offset/limit.
|
|
- `/events/sse` streams new events from the same offset semantics.
|
|
|
|
When a message is sent:
|
|
|
|
1. `send_message()` spawns the agent CLI as a subprocess
|
|
2. `consume_spawn()` reads stdout/stderr line by line
|
|
3. Each JSON line is parsed and converted via `parse_agent_line()`
|
|
4. Events are recorded via `record_event()` which:
|
|
- Assigns incrementing event ID
|
|
- Appends to `events` vector
|
|
- Broadcasts to SSE subscribers
|
|
|
|
## Agent Execution
|
|
|
|
Each agent has a different execution model and communication pattern. There are two main architectural patterns:
|
|
|
|
### Architecture Patterns
|
|
|
|
**Subprocess Model (Claude, Amp):**
|
|
- New process spawned per message/turn
|
|
- Process terminates after turn completes
|
|
- Multi-turn via CLI resume flags (`--resume`, `--continue`)
|
|
- Simple but has process spawn overhead
|
|
|
|
**Client/Server Model (OpenCode, Codex):**
|
|
- Single long-running server process
|
|
- Multiple sessions/threads multiplexed via RPC
|
|
- Multi-turn via server-side thread persistence
|
|
- More efficient for repeated interactions
|
|
|
|
### Overview
|
|
|
|
| Agent | Architecture | Binary Source | Multi-Turn Method |
|
|
|-------|--------------|---------------|-------------------|
|
|
| Claude Code | Subprocess (per-turn) | GCS (Anthropic) | `--resume` flag |
|
|
| Codex | **Shared Server (JSON-RPC)** | GitHub releases | **Thread persistence** |
|
|
| OpenCode | HTTP Server (SSE) | GitHub releases | Server-side sessions |
|
|
| Amp | Subprocess (per-turn) | GCS (Amp) | `--continue` flag |
|
|
|
|
### Claude Code
|
|
|
|
Spawned as a subprocess with JSONL streaming:
|
|
|
|
```bash
|
|
claude --print --output-format stream-json --verbose \
|
|
[--model MODEL] [--resume SESSION_ID] \
|
|
[--permission-mode plan | --dangerously-skip-permissions] \
|
|
PROMPT
|
|
```
|
|
|
|
- Streams JSON events to stdout, one per line
|
|
- Supports session resumption via `--resume`
|
|
- Permission modes: `--permission-mode plan` for approval workflow, `--dangerously-skip-permissions` for bypass
|
|
|
|
### Codex
|
|
|
|
Uses a **shared app-server process** that handles multiple sessions via JSON-RPC over stdio:
|
|
|
|
```bash
|
|
codex app-server
|
|
```
|
|
|
|
**Daemon flow:**
|
|
1. First Codex session triggers `codex app-server` spawn
|
|
2. Performs `initialize` / `initialized` handshake
|
|
3. Each session creation sends `thread/start` → receives `thread_id`
|
|
4. Messages sent via `turn/start` with `thread_id`
|
|
5. Notifications routed back to session by `thread_id`
|
|
|
|
**Key characteristics:**
|
|
- Single process handles all Codex sessions
|
|
- JSON-RPC over stdio (JSONL format)
|
|
- Thread IDs map to daemon session IDs
|
|
- Approval requests arrive as server-to-client JSON-RPC requests
|
|
- Process lifetime matches daemon lifetime (not per-turn)
|
|
|
|
### OpenCode
|
|
|
|
Unique architecture - runs as a **persistent HTTP server** rather than per-message subprocess:
|
|
|
|
```bash
|
|
opencode serve --port {4200-4300}
|
|
```
|
|
|
|
Then communicates via HTTP endpoints:
|
|
|
|
| Endpoint | Purpose |
|
|
|----------|---------|
|
|
| `POST /session` | Create new session |
|
|
| `POST /session/{id}/prompt` | Send message |
|
|
| `GET /event/subscribe` | SSE event stream |
|
|
| `POST /question/reply` | Answer HITL question |
|
|
| `POST /permission/reply` | Grant/deny permission |
|
|
|
|
The server is started once and reused across sessions. Events are received via Server-Sent Events (SSE) subscription.
|
|
|
|
### Amp
|
|
|
|
Spawned as a subprocess with dynamic flag detection:
|
|
|
|
```bash
|
|
amp [--execute|--print] [--output-format stream-json] \
|
|
[--model MODEL] [--continue SESSION_ID] \
|
|
[--dangerously-skip-permissions] PROMPT
|
|
```
|
|
|
|
- **Dynamic flag detection**: Probes `--help` output to determine which flags the installed version supports
|
|
- **Fallback strategy**: If execution fails, retries with progressively simpler flag combinations
|
|
- Streams JSON events to stdout
|
|
- Supports session continuation via `--continue`
|
|
|
|
### Communication Patterns
|
|
|
|
**Per-turn subprocess agents (Claude, Amp):**
|
|
1. Agent CLI spawned with appropriate flags
|
|
2. Stdout/stderr read line-by-line
|
|
3. Each line parsed as JSON
|
|
4. Events converted via `parse_agent_line()` → agent-specific converter
|
|
5. Universal events recorded and broadcast to SSE subscribers
|
|
6. Process terminated on turn completion
|
|
|
|
**Shared stdio server agent (Codex):**
|
|
1. Single `codex app-server` process started on first session
|
|
2. `initialize`/`initialized` handshake performed once
|
|
3. New sessions send `thread/start`, receive `thread_id`
|
|
4. Messages sent via `turn/start` with `thread_id`
|
|
5. Notifications read from stdout, routed by `thread_id`
|
|
6. Process persists across sessions and turns
|
|
|
|
**HTTP server agent (OpenCode):**
|
|
1. Server started on available port (if not running)
|
|
2. Session created via HTTP POST
|
|
3. Prompts sent via HTTP POST
|
|
4. Events received via SSE subscription
|
|
5. HITL responses forwarded via HTTP POST
|
|
|
|
### Credential Handling
|
|
|
|
All agents receive API keys via environment variables:
|
|
|
|
| Agent | Environment Variables |
|
|
|-------|----------------------|
|
|
| Claude | `ANTHROPIC_API_KEY`, `CLAUDE_API_KEY` |
|
|
| Codex | `OPENAI_API_KEY`, `CODEX_API_KEY` |
|
|
| OpenCode | `OPENAI_API_KEY` |
|
|
| Amp | `ANTHROPIC_API_KEY` |
|
|
|
|
## Human-in-the-Loop
|
|
|
|
Questions and permission prompts are normalized into the universal schema:
|
|
|
|
- Question events surface as `questionAsked` with selectable options.
|
|
- Permission events surface as `permissionAsked` with `reply: once | always | reject`.
|
|
- Claude plan approval is normalized into a question event (approve/reject).
|
|
|
|
## SDK Modes
|
|
|
|
The TypeScript SDK supports two connection modes.
|
|
|
|
### Embedded Mode
|
|
|
|
Defined in `sdks/typescript/src/spawn.ts`:
|
|
|
|
1. **Binary resolution**: Checks `SANDBOX_AGENT_BIN` env, then platform-specific npm package, then `PATH`
|
|
2. **Port selection**: Uses provided port or finds a free one via `net.createServer()`
|
|
3. **Token generation**: Uses provided token or generates random 24-byte hex string
|
|
4. **Spawn**: Launches `sandbox-agent server --host <host> --port <port> --token <token>`
|
|
5. **Health wait**: Polls `GET /v1/health` until server is ready (up to 15s timeout)
|
|
6. **Cleanup**: On dispose, sends SIGTERM then SIGKILL if needed; also registers process exit handlers
|
|
|
|
```typescript
|
|
const handle = await spawnSandboxAgent({ log: "inherit" });
|
|
// handle.baseUrl = "http://127.0.0.1:<port>"
|
|
// handle.token = "<generated>"
|
|
// handle.dispose() to cleanup
|
|
```
|
|
|
|
### Server Mode
|
|
|
|
Defined in `sdks/typescript/src/client.ts`:
|
|
|
|
- Direct HTTP client to a remote `sandbox-agent` server
|
|
- Uses provided `baseUrl` and optional `token`
|
|
- No subprocess management
|
|
|
|
```typescript
|
|
const client = await SandboxAgent.connect({
|
|
baseUrl: "http://remote-server:8080",
|
|
token: "secret",
|
|
});
|
|
```
|
|
|
|
### Auto-Detection
|
|
|
|
`SandboxAgent` provides two factory methods:
|
|
|
|
```typescript
|
|
// Connect to existing server
|
|
const client = await SandboxAgent.connect({
|
|
baseUrl: "http://remote:8080",
|
|
});
|
|
|
|
// Start embedded subprocess
|
|
const client = await SandboxAgent.start();
|
|
|
|
// With options
|
|
const client = await SandboxAgent.start({
|
|
spawn: { port: 9000 },
|
|
});
|
|
```
|
|
|
|
The `spawn` option can be:
|
|
- `true` / `false` - Enable/disable embedded mode
|
|
- `SandboxAgentSpawnOptions` - Fine-grained control over host, port, token, binary path, timeout, logging
|
|
|
|
## Authentication
|
|
|
|
The daemon uses a **global token** configured at startup. All HTTP and CLI operations reuse the same token and are validated against the `Authorization` header (`Bearer` or `Token`).
|
|
|
|
## Key Files
|
|
|
|
| Component | Path |
|
|
|-----------|------|
|
|
| Agent spawn/install | `server/packages/agent-management/src/agents.rs` |
|
|
| Session routing | `server/packages/sandbox-agent/src/router.rs` |
|
|
| Event converters | `server/packages/universal-agent-schema/src/agents/*.rs` |
|
|
| Schema extractors | `resources/agent-schemas/src/*.ts` |
|
|
| TypeScript SDK | `sdks/typescript/src/` |
|