sandbox-agent/spec.md at 66922c0ac02f03197f825e8861fca7cd3eed4dc0

harivansh-afk/sandbox-agent

mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 06:04:43 +00:00

Nathan Flurry 66922c0ac0 feat: add HITL endpoints to HTTP API spec

2026-01-24 22:48:58 -08:00

15 KiB

i need to build a library that is a universal api to work with agents

glossary

agent = claude code, codex, and opencode -> the actual binary/sdk that runs the coding agent
agent mode = what the agent does, for example build/plan agent mode
model = claude, codex, gemini, etc -> the model that's used in the agent
variant = variant of the model if one exists, eg low, mid, high, xhigh for codex

concepts

universal api types

we need to define a universal base type for input & output from agents that is a common denominator for all agent schemas

this also needs to support questions (ie human in the loop)

working with the agents

these agents all have different ways of working with them.

claude code uses headless mode
codex uses a typescript sdk
opencode uses a server

component: daemon

this is what runs inside the sandbox to manage everything

this is a rust component that exposes an http server

router

use axum for routing, utoipa for the openapi spec, and schemars for generating json schemas. see how this is done in:

~/rivet
- engine/packages/config-schema-gen/build.rs
- ~/rivet/engine/packages/api-public/src/router.rs (but use thiserror instead of anyhow)

we need a standard thiserror error type for error responses. return errors as RFC 7807 Problem Details
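as a rough illustration of the response shape, here is a minimal typescript sketch of an RFC 7807 body as a client would see it. the member names (`type`, `title`, `status`, `detail`, `instance`) come from the RFC, which also defines the `application/problem+json` media type. the `urn:sandbox-daemon:error:*` type strings, the status codes, the `sessionNotFound` kind, and the `toProblem` helper are assumptions made up for this example, not part of the spec.

```typescript
// RFC 7807 "Problem Details" members; every member is optional per the RFC.
interface ProblemDetails {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
}

// Hypothetical error kinds, loosely following the AgentError variants in this spec.
type ErrorKind = "tokenError" | "installFailed" | "sessionNotFound";

// Assumed mapping from error kind to HTTP status and human-readable title.
const PROBLEMS: Record<ErrorKind, { status: number; title: string }> = {
  tokenError: { status: 401, title: "Invalid or missing token" },
  installFailed: { status: 500, title: "Agent install failed" },
  sessionNotFound: { status: 404, title: "Session not found" },
};

// Build the JSON body for one error occurrence.
function toProblem(kind: ErrorKind, detail: string, instance?: string): ProblemDetails {
  const { status, title } = PROBLEMS[kind];
  return { type: `urn:sandbox-daemon:error:${kind}`, title, status, detail, instance };
}
```

in the rust daemon the same mapping would presumably live on the thiserror enum, with one status and title per variant.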

cli

it's run with a token like this using clap:

sandbox-daemon --token xxxx --host xxxx --port xxxx

(you can specify --no-token too)

also expose a CLI command for every http endpoint we have (specify this in claude.md to keep this up to date) so we can do:

sandbox-daemon sessions get-messages --endpoint xxxx --token xxxx

http api

POST /agents/{}/install (this will install the agent) {}

POST /sessions/{} (will install agent if not already installed)

{ agent: "claude"|"codex"|"opencode", model?: string, variant?: string, token?: string, validateToken?: boolean, dangerouslySkipPermissions?: boolean, agentVersion?: string } < { healthy: boolean, error?: AgentError }

POST /sessions/{}/messages { message: string }

GET /sessions/{}/events?offset=x&limit=x < { events: UniversalEvent[], hasMore: boolean }

GET /sessions/{}/events/sse?offset=x

same as above but using sse

POST /sessions/{}/questions/{questionId}/reply { answers: string[][] } // Array per question of selected option labels

POST /sessions/{}/questions/{questionId}/reject {}

POST /sessions/{}/permissions/{permissionId}/reply { reply: "once" | "always" | "reject" }
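to make the human-in-the-loop endpoints concrete, here is a small typescript sketch that builds the three requests above as plain data. the paths and body shapes are taken from this list; the `HttpRequest` type and the helper names are hypothetical, and URL-encoding the ids is an assumption about how they should be escaped.

```typescript
type PermissionReply = "once" | "always" | "reject";

// Plain description of a request; a real client would hand this to its HTTP layer.
interface HttpRequest {
  method: "POST";
  path: string;
  body: unknown;
}

// Ids are URL-encoded so they are safe to use as path segments.
const seg = (s: string): string => encodeURIComponent(s);

// One inner array per question, holding the labels of the selected options.
function replyToQuestion(sessionId: string, questionId: string, answers: string[][]): HttpRequest {
  return {
    method: "POST",
    path: `/sessions/${seg(sessionId)}/questions/${seg(questionId)}/reply`,
    body: { answers },
  };
}

function rejectQuestion(sessionId: string, questionId: string): HttpRequest {
  return {
    method: "POST",
    path: `/sessions/${seg(sessionId)}/questions/${seg(questionId)}/reject`,
    body: {},
  };
}

function replyToPermission(sessionId: string, permissionId: string, reply: PermissionReply): HttpRequest {
  return {
    method: "POST",
    path: `/sessions/${seg(sessionId)}/permissions/${seg(permissionId)}/reply`,
    body: { reply },
  };
}
```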

types:

type UniversalEvent =
  | { message: UniversalMessage }
  | { started: Started }
  | { error: CrashInfo }
  | { questionAsked: QuestionRequest }
  | { permissionAsked: PermissionRequest };

// See research/human-in-the-loop.md for QuestionRequest/PermissionRequest details

type AgentError = { tokenError: ... } | { processExited: ... } | { installFailed: ... } | etc
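the `offset`/`limit` parameters and the `hasMore` flag on `GET /sessions/{}/events` could behave as in the sketch below. it assumes each session keeps an append-only event log and that `offset` is a zero-based index into it; the spec does not pin this down, and `pageEvents` and `EventPage` are hypothetical names.

```typescript
interface EventPage<T> {
  events: T[];
  hasMore: boolean;
}

// Return up to `limit` events starting at `offset`, plus whether more remain after the page.
function pageEvents<T>(log: readonly T[], offset: number, limit: number): EventPage<T> {
  const start = Math.max(0, offset);
  const end = start + Math.max(0, limit);
  return { events: log.slice(start, end), hasMore: end < log.length };
}
```

a polling client would keep requesting with `offset` advanced by the number of events it received until `hasMore` is false. the sse variant could reuse the same offset to replay missed events before streaming live ones.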

schema converters

we need to have a 2 way conversion for both:

universal agent input message <-> agent input message
universal agent event <-> agent event

for messages, we need a special universal message type for input that failed to parse, carrying the raw json that we attempted to parse
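a minimal sketch of that fallback, assuming each agent message arrives as one line of json. the `UniversalMessageSketch` shape and its `parsed`/`unparsed` tags are placeholders for illustration, since the real universal message type is not defined in this spec yet.

```typescript
// Placeholder shape: either a parsed object or the raw text that could not be parsed.
type UniversalMessageSketch =
  | { parsed: Record<string, unknown> }
  | { unparsed: { raw: string; error: string } };

// Never throws: anything that is not a JSON object is kept verbatim in `unparsed`.
function toUniversalMessage(line: string): UniversalMessageSketch {
  try {
    const value: unknown = JSON.parse(line);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return { parsed: value as Record<string, unknown> };
    }
    return { unparsed: { raw: line, error: "expected a JSON object" } };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { unparsed: { raw: line, error } };
  }
}
```

keeping the raw text means a consumer can still log or display the line, and a converter bug does not silently drop agent output.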

managing agents

Note: We do NOT use JS SDKs for agent communication. All agents are spawned as subprocesses or accessed via a shared server. This keeps the daemon language-agnostic (Rust) and avoids Node.js dependencies.

agent comparison

| Agent | Provider | Binary | Install Method | Session ID | Streaming Format |
| --- | --- | --- | --- | --- | --- |
| Claude Code | Anthropic | `claude` | curl raw binary from GCS | `session_id` (string) | JSONL via stdout |
| Codex | OpenAI | `codex` | curl tarball from GitHub releases | `thread_id` (string) | JSONL via stdout |
| OpenCode | Multi-provider | `opencode` | curl tarball from GitHub releases | `session_id` (string) | SSE or JSONL |
| Amp | Sourcegraph | `amp` | curl raw binary from GCS | `session_id` (string) | JSONL via stdout |

spawning approaches

There are two ways to spawn agents:

1. subprocess per session

Each session spawns a dedicated agent subprocess that lives for the duration of the session.

How it works:

On session create, spawn the agent binary with appropriate flags
Communicate via stdin/stdout using JSONL
Process terminates when session ends or times out

Agents that support this:

Claude Code: claude --print --output-format stream-json --verbose --dangerously-skip-permissions [--resume SESSION_ID] "PROMPT"
Codex: codex exec --json --dangerously-bypass-approvals-and-sandbox "PROMPT" or codex exec resume --last
Amp: amp --print --output-format stream-json --dangerously-skip-permissions "PROMPT"
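The command lines above can be assembled as argument vectors rather than shell strings, which sidesteps quoting problems when the prompt contains spaces or quotes. The sketch below is illustrative only: the flags are copied from the list above, while `buildArgv` and `SubprocessAgent` are hypothetical names, and resumption is only modeled for Claude Code.

```typescript
type SubprocessAgent = "claude" | "codex" | "amp";

// Build the argv for a single prompt. The first element is the binary name.
function buildArgv(agent: SubprocessAgent, prompt: string, resumeSessionId?: string): string[] {
  switch (agent) {
    case "claude": {
      const argv = [
        "claude", "--print", "--output-format", "stream-json",
        "--verbose", "--dangerously-skip-permissions",
      ];
      // Optional resume of an earlier session, as in `[--resume SESSION_ID]`.
      if (resumeSessionId !== undefined) argv.push("--resume", resumeSessionId);
      argv.push(prompt);
      return argv;
    }
    case "codex":
      // `codex exec resume --last` is listed above but not modeled in this sketch.
      return ["codex", "exec", "--json", "--dangerously-bypass-approvals-and-sandbox", prompt];
    case "amp":
      return ["amp", "--print", "--output-format", "stream-json", "--dangerously-skip-permissions", prompt];
  }
}
```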

Pros:

Simple implementation
Process isolation per session
No shared state to manage

Cons:

Higher latency (process startup per message)
More resource usage (one process per active session)
No connection reuse

2. shared server (preferred for OpenCode)

A single long-running server handles multiple sessions. The daemon connects to this server via HTTP/SSE.

How it works:

On daemon startup (or first session for an agent), start the server if not running
Server listens on a port (e.g., 4200-4300 range for OpenCode)
Sessions are created/managed via HTTP API
Events streamed via SSE
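One simple way to choose the server port is to scan the range for the first port the daemon has not handed out yet. This is only a sketch: `pickPort` is a hypothetical helper, the default bounds reuse the 4200-4300 example above, and it consults the daemon's own bookkeeping rather than probing the OS, so a real implementation would still need to handle a bind failure.

```typescript
// Return the first port in [min, max] that is not already tracked as in use.
function pickPort(inUse: ReadonlySet<number>, min = 4200, max = 4300): number | undefined {
  for (let port = min; port <= max; port++) {
    if (!inUse.has(port)) return port;
  }
  // Range exhausted (or empty): the caller decides whether to fail or wait.
  return undefined;
}
```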

Agents that support this:

OpenCode: opencode serve --port PORT starts the server, then use HTTP API:
- POST /session - create session
- POST /session/{id}/prompt - send message
- GET /event/subscribe - SSE event stream
- Supports questions/permissions via /question/reply, /permission/reply

Pros:

Lower latency (no process startup per message)
Shared resources across sessions
Better for high-throughput scenarios
Native support for SSE streaming

Cons:

More complex lifecycle management
Need to handle server crashes/restarts
Shared state between sessions

which approach to use

| Agent | Recommended Approach | Reason |
| --- | --- | --- |
| Claude Code | Subprocess per session | No server mode available |
| Codex | Subprocess per session | No server mode available |
| OpenCode | Shared server | Native server support, lower latency |
| Amp | Subprocess per session | No server mode available |

installation

Before spawning, agents must be installed. We curl raw binaries directly - no npm, brew, install scripts, or other package managers.

Claude Code

# Get latest version
VERSION=$(curl -s https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest)

# Linux x64
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-x64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude

# Linux x64 (musl)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-x64-musl/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude

# Linux ARM64
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-arm64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude

# macOS ARM64 (Apple Silicon)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/darwin-arm64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude

# macOS x64 (Intel)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/darwin-x64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude

Codex

# Linux x64 (musl for max compatibility)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xz
mv codex-x86_64-unknown-linux-musl /usr/local/bin/codex

# Linux ARM64
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-aarch64-unknown-linux-musl.tar.gz | tar -xz
mv codex-aarch64-unknown-linux-musl /usr/local/bin/codex

# macOS ARM64 (Apple Silicon)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-aarch64-apple-darwin.tar.gz | tar -xz
mv codex-aarch64-apple-darwin /usr/local/bin/codex

# macOS x64 (Intel)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-x86_64-apple-darwin.tar.gz | tar -xz
mv codex-x86_64-apple-darwin /usr/local/bin/codex

OpenCode

# Linux x64
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-x64.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode

# Linux x64 (musl)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-x64-musl.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode

# Linux ARM64
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-arm64.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode

# macOS ARM64 (Apple Silicon)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-darwin-arm64.zip -o opencode.zip && unzip -o opencode.zip && rm opencode.zip
mv opencode /usr/local/bin/opencode

# macOS x64 (Intel)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-darwin-x64.zip -o opencode.zip && unzip -o opencode.zip && rm opencode.zip
mv opencode /usr/local/bin/opencode

Amp

# Get latest version
VERSION=$(curl -s https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt)

# Linux x64
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-linux-x64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp

# Linux ARM64
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-linux-arm64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp

# macOS ARM64 (Apple Silicon)
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-darwin-arm64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp

# macOS x64 (Intel)
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-darwin-x64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp

binary URL summary

| Agent | Version URL | Binary URL Pattern |
| --- | --- | --- |
| Claude Code | `https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest` | `.../{version}/{platform}/claude` |
| Codex | `https://api.github.com/repos/openai/codex/releases/latest` | `https://github.com/openai/codex/releases/latest/download/codex-{target}.tar.gz` |
| OpenCode | `https://api.github.com/repos/anomalyco/opencode/releases/latest` | `https://github.com/anomalyco/opencode/releases/latest/download/opencode-{platform}.tar.gz` (`.zip` on macOS) |
| Amp | `https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt` | `.../{version}/amp-{platform}` |

platform mappings

| Platform | Claude Code | Codex | OpenCode | Amp |
| --- | --- | --- | --- | --- |
| Linux x64 | `linux-x64` | `x86_64-unknown-linux-musl` | `linux-x64` | `linux-x64` |
| Linux x64 musl | `linux-x64-musl` | `x86_64-unknown-linux-musl` | `linux-x64-musl` | N/A |
| Linux ARM64 | `linux-arm64` | `aarch64-unknown-linux-musl` | `linux-arm64` | `linux-arm64` |
| macOS ARM64 | `darwin-arm64` | `aarch64-apple-darwin` | `darwin-arm64` | `darwin-arm64` |
| macOS x64 | `darwin-x64` | `x86_64-apple-darwin` | `darwin-x64` | `darwin-x64` |
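The URL patterns and platform mappings can be folded into one lookup. The following is a sketch under stated assumptions: `binaryUrl`, `Agent`, and `Platform` are hypothetical names, the hosts and file names are copied from the install commands in this section, and pinned GitHub downloads are assumed to use the usual `releases/download/{tag}/{file}` form.

```typescript
type Agent = "claude" | "codex" | "opencode" | "amp";
type Platform = "linux-x64" | "linux-x64-musl" | "linux-arm64" | "darwin-arm64" | "darwin-x64";

const CLAUDE_BASE =
  "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases";
const AMP_BASE = "https://storage.googleapis.com/amp-public-assets-prod-0/cli";

// Codex uses Rust target triples; both Linux x64 rows map to the musl build.
const CODEX_TARGET: Record<Platform, string> = {
  "linux-x64": "x86_64-unknown-linux-musl",
  "linux-x64-musl": "x86_64-unknown-linux-musl",
  "linux-arm64": "aarch64-unknown-linux-musl",
  "darwin-arm64": "aarch64-apple-darwin",
  "darwin-x64": "x86_64-apple-darwin",
};

// `version` is the value read from the version URL (Claude Code, Amp)
// or a release tag (Codex, OpenCode). Omit it for the latest GitHub release.
function binaryUrl(agent: Agent, platform: Platform, version?: string): string {
  const release = version === undefined ? "latest/download" : `download/${version}`;
  switch (agent) {
    case "claude":
      if (version === undefined) throw new Error("claude requires a resolved version");
      return `${CLAUDE_BASE}/${version}/${platform}/claude`;
    case "amp":
      if (version === undefined) throw new Error("amp requires a resolved version");
      if (platform === "linux-x64-musl") throw new Error("no musl build is listed for amp");
      return `${AMP_BASE}/${version}/amp-${platform}`;
    case "codex":
      return `https://github.com/openai/codex/releases/${release}/codex-${CODEX_TARGET[platform]}.tar.gz`;
    case "opencode": {
      // macOS builds ship as .zip, Linux builds as .tar.gz.
      const ext = platform.indexOf("darwin") === 0 ? "zip" : "tar.gz";
      return `https://github.com/anomalyco/opencode/releases/${release}/opencode-${platform}.${ext}`;
    }
  }
}
```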

versioning

| Agent | Get Latest Version | Specific Version |
| --- | --- | --- |
| Claude Code | `curl -s https://storage.googleapis.com/claude-code-dist-.../latest` | Replace `${VERSION}` in URL |
| Codex | `curl -s https://api.github.com/repos/openai/codex/releases/latest \| jq -r .tag_name` | Replace `latest/download` with `download/{tag}` |
| OpenCode | `curl -s https://api.github.com/repos/anomalyco/opencode/releases/latest \| jq -r .tag_name` | Replace `latest/download` with `download/{tag}` |
| Amp | `curl -s https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt` | Replace `${VERSION}` in URL |

communication

Subprocess mode (Claude Code, Codex, Amp):

Spawn process with appropriate flags
Close stdin immediately after sending prompt (for single-turn) or keep open (for multi-turn)
Read JSONL events from stdout line-by-line
Parse each line as JSON and convert to UniversalEvent
Capture session/thread ID from events for resumption
Handle process exit/timeout
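Reading JSONL from a pipe needs a little buffering, because a stdout chunk can end in the middle of a line. The sketch below shows one way to do that and to pick up the resumption id. `JsonlBuffer` and `resumeId` are hypothetical names, and it assumes the id appears as a top-level `session_id` or `thread_id` field on some event, which would need checking against each agent's real output.

```typescript
// Accumulates stdout chunks and hands back only complete lines.
class JsonlBuffer {
  private pending = "";

  // Feed one chunk; returns the non-empty lines that this chunk completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.pending = parts.pop() ?? "";
    return parts.map((line) => line.trim()).filter((line) => line.length > 0);
  }

  // Call on process exit to collect a final line that had no trailing newline.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Claude Code and Amp use `session_id`; Codex uses `thread_id` (per the comparison table).
function resumeId(event: Record<string, unknown>): string | undefined {
  const id = event["session_id"] ?? event["thread_id"];
  return typeof id === "string" ? id : undefined;
}
```

Each returned line would then go through the JSON parse and conversion to `UniversalEvent`.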

Server mode (OpenCode):

Ensure server is running (opencode serve --port PORT)
Create session via POST /session
Send prompts via POST /session/{id}/prompt (async version for streaming)
Subscribe to events via GET /event/subscribe (SSE)
Handle questions/permissions via dedicated endpoints
Session persists across multiple prompts
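On the SSE side, the daemon has to split the stream into events before converting them. Below is a minimal sketch of the standard SSE framing (fields separated by newlines, events separated by a blank line, lines starting with `:` are comments). It parses an already-received block of text; `parseSse` and `SseEvent` are hypothetical names, and a real client would feed it from a streaming buffer and handle reconnects.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no `event:` field is sent
  data: string;  // multiple `data:` lines are joined with "\n"
  id?: string;
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Skip empty lines and comment/keep-alive lines.
      if (line === "" || line.charAt(0) === ":") continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      // A single leading space after the colon is part of the framing, not the value.
      if (value.charAt(0) === " ") value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    // Blocks without any data field are not dispatched as events.
    if (data.length > 0) events.push({ event, data: data.join("\n"), id });
  }
  return events;
}
```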

credential passing

| Agent | Env Var | Config File |
| --- | --- | --- |
| Claude Code | `ANTHROPIC_API_KEY` | `~/.claude.json`, `~/.claude/.credentials.json` |
| Codex | `OPENAI_API_KEY` or `CODEX_API_KEY` | `~/.codex/auth.json` |
| OpenCode | `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` | `~/.local/share/opencode/auth.json` |
| Amp | `ANTHROPIC_API_KEY` | Uses Claude Code credentials |

When spawning subprocesses, pass the API key via environment variable. For OpenCode server mode, the server reads credentials from its config on startup.
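For the subprocess agents, passing the key amounts to adding one variable to the child's environment. The sketch below uses the variable names from the table above (taking `OPENAI_API_KEY` for Codex, where two are listed); `childEnv` and `KEY_VAR` are hypothetical names, and OpenCode is left out because its server reads credentials from its own config.

```typescript
type SubprocessAgent = "claude" | "codex" | "amp";

// Env var that carries the API key for each subprocess agent, per the table above.
const KEY_VAR: Record<SubprocessAgent, string> = {
  claude: "ANTHROPIC_API_KEY",
  codex: "OPENAI_API_KEY",
  amp: "ANTHROPIC_API_KEY",
};

// Returns a new environment for the child; the base environment is not mutated,
// so a key never leaks into the daemon's own environment or into other sessions.
function childEnv(
  base: Record<string, string>,
  agent: SubprocessAgent,
  apiKey: string,
): Record<string, string> {
  return { ...base, [KEY_VAR[agent]]: apiKey };
}
```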

testing

TODO

component: sdks

we need to auto-generate types from our json schema for these languages

typescript sdk
- also need to support standard schema
- can run in inline mode that doesn't require this
python sdk

spec todo

generate common denominator with conversion functions
what else do we need, like todo, etc?
how can we dump the spec for all of the agents somehow
generate an example ui for this
architecture document
how should we handle the tokens for auth?

future problems to visit

api features
- list agent modes available
- list models available
- handle planning mode
api key gateway
configuring mcp/skills/etc
process management inside container
otel
better authentication systems
s3-based file system
ai sdk compatibility for their ecosystem (useChat, etc)
resumable messages
todo lists
all other features
misc
- bootstrap tool that extracts tokens from the current system
management ui
skill
pre-package these as bun binaries instead of npm installations
build & release pipeline with musl
agent feature matrix for api features

future work

provide a pty to access the agent data
other agent features like file system

misc

comparison to agentapi:

this library does not use the pty, since we need to get more information from the agent