i need to build a library that is a universal api to work with agents
glossary
- agent = claude code, codex, and opencode -> the actual binary/sdk that runs the coding agent
- agent mode = what the agent does, for example build/plan agent mode
- model = claude, codex, gemini, etc -> the model that's used in the agent
- variant = variant on the model if exists, eg low, mid, high, xhigh for codex
concepts
universal api types
we need to define a universal base type for input & output from agents that is a common denominator for all agent schemas
this also needs to support questions (ie human in the loop)
working with the agents
these agents all have different ways of working with them.
- claude code uses headless mode
- codex uses a typescript sdk
- opencode uses a server
component: daemon
this is what runs inside the sandbox to manage everything
this is a rust component that exposes an http server
router
use axum for routing, utoipa for the openapi spec, and schemars for generating json schemas. see how this is done in:
- ~/rivet
- engine/packages/config-schema-gen/build.rs
- ~/rivet/engine/packages/api-public/src/router.rs (but use thiserror instead of anyhow)
we need a standard thiserror for error responses. return errors as RFC 7807 Problem Details
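a minimal sketch of what an RFC 7807 body could look like, written in typescript for brevity (the daemon itself would define the equivalent in rust). the five members (type, title, status, detail, instance) come from the RFC, and the response should be served as application/problem+json. the error kinds, the `urn:` type URIs, and the status codes below are placeholders, not part of this spec.

```typescript
// RFC 7807 members. Only `type`, `title`, and `status` are always set in this sketch.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
}

// Hypothetical error kinds, used only for illustration.
type ErrorKind = "invalid_token" | "session_not_found" | "install_failed";

// Assumed mapping from error kind to HTTP status code.
const STATUS_BY_KIND: Record<ErrorKind, number> = {
  invalid_token: 401,
  session_not_found: 404,
  install_failed: 500,
};

function toProblem(kind: ErrorKind, detail: string, instance?: string): ProblemDetails {
  return {
    type: `urn:sandbox-daemon:error:${kind}`, // placeholder URI scheme
    title: kind.replace(/_/g, " "),
    status: STATUS_BY_KIND[kind],
    detail,
    // Omit `instance` entirely when the caller has no request path to report.
    ...(instance !== undefined ? { instance } : {}),
  };
}
```

keeping the status code inside the body as well as on the http response lets clients that only see the json still branch on it.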
cli
it's run with a token like this using clap:
sandbox-daemon --token xxxx --host xxxx --port xxxx
(you can specify --no-token too)
also expose a CLI endpoint for every http endpoint we have (specify this in claude.md to keep this up to date) so we can do:
sandbox-daemon sessions get-messages --endpoint xxxx --token xxxx
http api
POST /agents/{}/install (this will install the agent) {}
POST /sessions/{} (will install agent if not already installed)
{ agent:"claude"|"codex"|"opencode", model?:string, variant?:string, token?: string, validateToken?: boolean, dangerouslySkipPermissions?: boolean, agentVersion?: string } < { healthy: boolean, error?: AgentError }
POST /sessions/{}/messages { message: string }
GET /sessions/{}/events?offset=x&limit=x < { events: UniversalEvent[], hasMore: boolean }
GET /sessions/{}/events/sse?offset=x
- same as above but using sse
POST /sessions/{}/questions/{questionId}/reply { answers: string[][] } // Array per question of selected option labels
POST /sessions/{}/questions/{questionId}/reject {}
POST /sessions/{}/permissions/{permissionId}/reply { reply: "once" | "always" | "reject" }
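a sketch of how the offset/limit parameters on GET /sessions/{}/events could behave, assuming offset is a zero-based index into an append-only per-session event log (the spec does not pin this down). `pageEvents` and `nextOffset` are hypothetical helper names.

```typescript
interface EventPage<T> {
  events: T[];
  hasMore: boolean;
}

// Returns the slice [offset, offset + limit) of the log, clamped to its bounds.
function pageEvents<T>(log: readonly T[], offset: number, limit: number): EventPage<T> {
  const start = Math.min(Math.max(0, Math.floor(offset)), log.length);
  const end = Math.min(start + Math.max(0, Math.floor(limit)), log.length);
  return { events: log.slice(start, end), hasMore: end < log.length };
}

// A polling client advances by the number of events it actually received.
function nextOffset<T>(offset: number, page: EventPage<T>): number {
  return offset + page.events.length;
}
```

under the same assumption, the sse variant would replay everything from offset and then keep the connection open for new events.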
types:
type UniversalEvent = | { message: UniversalMessage } | { started: Started } | { error: CrashInfo } | { questionAsked: QuestionRequest } | { permissionAsked: PermissionRequest };
// See research/human-in-the-loop.md for QuestionRequest/PermissionRequest details
type AgentError = { tokenError: ... } | { processExited: ... } | { installFailed: ... } | etc
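a sketch of how a client might branch on the UniversalEvent union above. the payload types (UniversalMessage, Started, and so on) are not defined in this spec, so they are left opaque here. `eventKind` and `needsHumanInput` are hypothetical helpers.

```typescript
// Payload shapes are placeholders; only the variant keys come from the spec.
type Opaque = Record<string, unknown>;

type UniversalEvent =
  | { message: Opaque }
  | { started: Opaque }
  | { error: Opaque }
  | { questionAsked: Opaque }
  | { permissionAsked: Opaque };

type EventKind = "message" | "started" | "error" | "questionAsked" | "permissionAsked";

// Each variant carries exactly one key, so an `in` check identifies it.
function eventKind(ev: UniversalEvent): EventKind {
  if ("message" in ev) return "message";
  if ("started" in ev) return "started";
  if ("error" in ev) return "error";
  if ("questionAsked" in ev) return "questionAsked";
  return "permissionAsked";
}

// True for the two variants that block until a human replies.
function needsHumanInput(ev: UniversalEvent): boolean {
  const kind = eventKind(ev);
  return kind === "questionAsked" || kind === "permissionAsked";
}
```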
schema converters
we need to have a 2 way conversion for both:
- universal agent input message <-> agent input message
- universal agent event <-> agent event
for messages, we need to have a special universal message type for failed to parse with the raw json that we attempted to parse
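a sketch of the failed-to-parse fallback, assuming each agent message arrives as one json object per line. the variant names `parsed` / `unparsed` and the function `parseAgentLine` are made up for illustration. the point is that the raw text is never dropped.

```typescript
type ParsedMessage =
  | { parsed: Record<string, unknown> }
  | { unparsed: { raw: string; error: string } };

function parseAgentLine(line: string): ParsedMessage {
  try {
    const value: unknown = JSON.parse(line);
    // Only plain JSON objects count as agent messages in this sketch.
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return { parsed: value as Record<string, unknown> };
    }
    return { unparsed: { raw: line, error: "expected a JSON object" } };
  } catch (err) {
    // Keep the original text so the caller can log or surface it.
    const reason = err instanceof Error ? err.message : String(err);
    return { unparsed: { raw: line, error: reason } };
  }
}
```

a real converter would add a second step that maps the parsed object into the agent-specific schema and falls back to the same unparsed variant when that mapping fails.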
managing agents
Note: We do NOT use JS SDKs for agent communication. All agents are spawned as subprocesses or accessed via a shared server. This keeps the daemon language-agnostic (Rust) and avoids Node.js dependencies.
agent comparison
| Agent | Provider | Binary | Install Method | Session ID | Streaming Format |
|---|---|---|---|---|---|
| Claude Code | Anthropic | `claude` | curl raw binary from GCS | `session_id` (string) | JSONL via stdout |
| Codex | OpenAI | `codex` | curl tarball from GitHub releases | `thread_id` (string) | JSONL via stdout |
| OpenCode | Multi-provider | `opencode` | curl tarball from GitHub releases | `session_id` (string) | SSE or JSONL |
| Amp | Sourcegraph | `amp` | curl raw binary from GCS | `session_id` (string) | JSONL via stdout |
spawning approaches
There are two ways to spawn agents:
1. subprocess per session
Each session spawns a dedicated agent subprocess that lives for the duration of the session.
How it works:
- On session create, spawn the agent binary with appropriate flags
- Communicate via stdin/stdout using JSONL
- Process terminates when session ends or times out
Agents that support this:
- Claude Code: `claude --print --output-format stream-json --verbose --dangerously-skip-permissions [--resume SESSION_ID] "PROMPT"`
- Codex: `codex exec --json --dangerously-bypass-approvals-and-sandbox "PROMPT"` or `codex exec resume --last`
- Amp: `amp --print --output-format stream-json --dangerously-skip-permissions "PROMPT"`
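As a sketch, the command lines above can be assembled as argument vectors rather than shell strings, which avoids quoting problems with arbitrary prompts. The flags are the ones listed above; the `PromptOptions` shape and the idea of gating the permission-bypass flag on a boolean are assumptions for illustration.

```typescript
interface PromptOptions {
  prompt: string;
  skipPermissions: boolean;
  resumeSessionId?: string; // only used by the Claude Code builder in this sketch
}

function claudeArgs(opts: PromptOptions): string[] {
  const args = ["--print", "--output-format", "stream-json", "--verbose"];
  if (opts.skipPermissions) args.push("--dangerously-skip-permissions");
  if (opts.resumeSessionId !== undefined) args.push("--resume", opts.resumeSessionId);
  args.push(opts.prompt);
  return args;
}

function codexArgs(opts: PromptOptions): string[] {
  const args = ["exec", "--json"];
  if (opts.skipPermissions) args.push("--dangerously-bypass-approvals-and-sandbox");
  args.push(opts.prompt);
  return args;
}

function ampArgs(opts: PromptOptions): string[] {
  const args = ["--print", "--output-format", "stream-json"];
  if (opts.skipPermissions) args.push("--dangerously-skip-permissions");
  args.push(opts.prompt);
  return args;
}
```

Each vector is passed to the process spawner together with the binary name, so the prompt is always a single argument regardless of its content.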
Pros:
- Simple implementation
- Process isolation per session
- No shared state to manage
Cons:
- Higher latency (process startup per message)
- More resource usage (one process per active session)
- No connection reuse
2. shared server (preferred for OpenCode)
A single long-running server handles multiple sessions. The daemon connects to this server via HTTP/SSE.
How it works:
- On daemon startup (or first session for an agent), start the server if not running
- Server listens on a port (e.g., 4200-4300 range for OpenCode)
- Sessions are created/managed via HTTP API
- Events streamed via SSE
Agents that support this:
- OpenCode: `opencode serve --port PORT` starts the server, then use the HTTP API:
  - `POST /session` - create session
  - `POST /session/{id}/prompt` - send message
  - `GET /event/subscribe` - SSE event stream
  - Supports questions/permissions via `/question/reply`, `/permission/reply`
Pros:
- Lower latency (no process startup per message)
- Shared resources across sessions
- Better for high-throughput scenarios
- Native support for SSE streaming
Cons:
- More complex lifecycle management
- Need to handle server crashes/restarts
- Shared state between sessions
which approach to use
| Agent | Recommended Approach | Reason |
|---|---|---|
| Claude Code | Subprocess per session | No server mode available |
| Codex | Subprocess per session | No server mode available |
| OpenCode | Shared server | Native server support, lower latency |
| Amp | Subprocess per session | No server mode available |
installation
Before spawning, agents must be installed. We curl raw binaries directly - no npm, brew, install scripts, or other package managers.
Claude Code
# Get latest version
VERSION=$(curl -s https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest)
# Linux x64
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-x64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude
# Linux x64 (musl)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-x64-musl/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude
# Linux ARM64
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/linux-arm64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude
# macOS ARM64 (Apple Silicon)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/darwin-arm64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude
# macOS x64 (Intel)
curl -fsSL "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${VERSION}/darwin-x64/claude" -o /usr/local/bin/claude && chmod +x /usr/local/bin/claude
Codex
# Linux x64 (musl for max compatibility)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xz
mv codex-x86_64-unknown-linux-musl /usr/local/bin/codex
# Linux ARM64
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-aarch64-unknown-linux-musl.tar.gz | tar -xz
mv codex-aarch64-unknown-linux-musl /usr/local/bin/codex
# macOS ARM64 (Apple Silicon)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-aarch64-apple-darwin.tar.gz | tar -xz
mv codex-aarch64-apple-darwin /usr/local/bin/codex
# macOS x64 (Intel)
curl -fsSL https://github.com/openai/codex/releases/latest/download/codex-x86_64-apple-darwin.tar.gz | tar -xz
mv codex-x86_64-apple-darwin /usr/local/bin/codex
OpenCode
# Linux x64
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-x64.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode
# Linux x64 (musl)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-x64-musl.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode
# Linux ARM64
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-linux-arm64.tar.gz | tar -xz
mv opencode /usr/local/bin/opencode
# macOS ARM64 (Apple Silicon)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-darwin-arm64.zip -o opencode.zip && unzip -o opencode.zip && rm opencode.zip
mv opencode /usr/local/bin/opencode
# macOS x64 (Intel)
curl -fsSL https://github.com/anomalyco/opencode/releases/latest/download/opencode-darwin-x64.zip -o opencode.zip && unzip -o opencode.zip && rm opencode.zip
mv opencode /usr/local/bin/opencode
Amp
# Get latest version
VERSION=$(curl -s https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt)
# Linux x64
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-linux-x64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp
# Linux ARM64
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-linux-arm64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp
# macOS ARM64 (Apple Silicon)
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-darwin-arm64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp
# macOS x64 (Intel)
curl -fsSL "https://storage.googleapis.com/amp-public-assets-prod-0/cli/${VERSION}/amp-darwin-x64" -o /usr/local/bin/amp && chmod +x /usr/local/bin/amp
binary URL summary
| Agent | Version URL | Binary URL Pattern |
|---|---|---|
| Claude Code | https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest | `.../{version}/{platform}/claude` |
| Codex | https://api.github.com/repos/openai/codex/releases/latest | `https://github.com/openai/codex/releases/latest/download/codex-{target}.tar.gz` |
| OpenCode | https://api.github.com/repos/anomalyco/opencode/releases/latest | `https://github.com/anomalyco/opencode/releases/latest/download/opencode-{platform}.tar.gz` (`.zip` on macOS) |
| Amp | https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt | `.../{version}/amp-{platform}` |
platform mappings
| Platform | Claude Code | Codex | OpenCode | Amp |
|---|---|---|---|---|
| Linux x64 | `linux-x64` | `x86_64-unknown-linux-musl` | `linux-x64` | `linux-x64` |
| Linux x64 musl | `linux-x64-musl` | `x86_64-unknown-linux-musl` | `linux-x64-musl` | N/A |
| Linux ARM64 | `linux-arm64` | `aarch64-unknown-linux-musl` | `linux-arm64` | `linux-arm64` |
| macOS ARM64 | `darwin-arm64` | `aarch64-apple-darwin` | `darwin-arm64` | `darwin-arm64` |
| macOS x64 | `darwin-x64` | `x86_64-apple-darwin` | `darwin-x64` | `darwin-x64` |
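A sketch of how the URL patterns and platform mappings could be combined into download URLs. The hosts, path layouts, and target names are copied from the install commands in this document; the function names and the `Platform` type are hypothetical, and the version strings passed in are whatever the version URL returns.

```typescript
type Platform = "linux-x64" | "linux-x64-musl" | "linux-arm64" | "darwin-arm64" | "darwin-x64";

const CLAUDE_BASE =
  "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases";
const AMP_BASE = "https://storage.googleapis.com/amp-public-assets-prod-0/cli";

// Codex uses Rust target triples; both Linux x64 rows map to the musl build.
const CODEX_TARGET: Record<Platform, string> = {
  "linux-x64": "x86_64-unknown-linux-musl",
  "linux-x64-musl": "x86_64-unknown-linux-musl",
  "linux-arm64": "aarch64-unknown-linux-musl",
  "darwin-arm64": "aarch64-apple-darwin",
  "darwin-x64": "x86_64-apple-darwin",
};

function claudeUrl(version: string, platform: Platform): string {
  return `${CLAUDE_BASE}/${version}/${platform}/claude`;
}

function codexUrl(platform: Platform): string {
  return `https://github.com/openai/codex/releases/latest/download/codex-${CODEX_TARGET[platform]}.tar.gz`;
}

function opencodeUrl(platform: Platform): string {
  // The install commands above use .zip for macOS and .tar.gz for Linux.
  const ext = platform.startsWith("darwin") ? "zip" : "tar.gz";
  return `https://github.com/anomalyco/opencode/releases/latest/download/opencode-${platform}.${ext}`;
}

function ampUrl(version: string, platform: Platform): string | null {
  // No musl build is listed for Amp.
  if (platform === "linux-x64-musl") return null;
  return `${AMP_BASE}/${version}/amp-${platform}`;
}
```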
versioning
| Agent | Get Latest Version | Specific Version |
|---|---|---|
| Claude Code | `curl -s https://storage.googleapis.com/claude-code-dist-.../latest` | Replace `${VERSION}` in URL |
| Codex | `curl -s https://api.github.com/repos/openai/codex/releases/latest \| jq -r .tag_name` | Replace `latest/download` with `download/{tag}` |
| OpenCode | `curl -s https://api.github.com/repos/anomalyco/opencode/releases/latest \| jq -r .tag_name` | Replace `latest/download` with `download/{tag}` |
| Amp | `curl -s https://storage.googleapis.com/amp-public-assets-prod-0/cli/cli-version.txt` | Replace `${VERSION}` in URL |
communication
Subprocess mode (Claude Code, Codex, Amp):
- Spawn process with appropriate flags
- Close stdin immediately after sending prompt (for single-turn) or keep open (for multi-turn)
- Read JSONL events from stdout line-by-line
- Parse each line as JSON and convert to `UniversalEvent`
- Capture session/thread ID from events for resumption
- Handle process exit/timeout
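A sketch of the line-by-line reading step. Stdout arrives in arbitrary chunks, so a partial trailing line has to be held back until its newline shows up. `JsonlBuffer` and `findSessionId` are hypothetical names, and looking for a top-level `session_id` or `thread_id` string is an assumption based on the comparison table, not a confirmed event schema.

```typescript
class JsonlBuffer {
  private pending = "";

  // Feed one stdout chunk; returns the complete, non-empty lines seen so far.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is either "" or an incomplete line; keep it for later.
    this.pending = parts.pop() ?? "";
    return parts.map((line) => line.trim()).filter((line) => line.length > 0);
  }

  // Call on process exit to recover a final line that had no trailing newline.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Assumed: the ID appears as a top-level string field on some event.
function findSessionId(event: Record<string, unknown>): string | undefined {
  for (const key of ["session_id", "thread_id"]) {
    const value = event[key];
    if (typeof value === "string") return value;
  }
  return undefined;
}
```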
Server mode (OpenCode):
- Ensure server is running (`opencode serve --port PORT`)
- Create session via `POST /session`
- Send prompts via `POST /session/{id}/prompt` (async version for streaming)
- Subscribe to events via `GET /event/subscribe` (SSE)
- Handle questions/permissions via dedicated endpoints
- Session persists across multiple prompts
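A sketch of the event-subscription step at the wire level. It follows the standard SSE framing (frames separated by a blank line, payload on `data:` lines) and leaves each payload as a string, because the shape of OpenCode's events is not described here. `parseSseData` is a hypothetical helper and ignores the `event:`, `id:`, and `retry:` fields.

```typescript
interface SseParseResult {
  frames: string[]; // one entry per complete frame, multi-line data joined with "\n"
  rest: string; // trailing partial frame to prepend to the next chunk
}

function parseSseData(text: string): SseParseResult {
  const blocks = text.replace(/\r\n/g, "\n").split("\n\n");
  // Anything after the last blank line is not a complete frame yet.
  const rest = blocks.pop() ?? "";
  const frames: string[] = [];
  for (const block of blocks) {
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (!line.startsWith("data:")) continue; // skips comments and other fields
      const value = line.slice("data:".length);
      // A single leading space after the colon is not part of the payload.
      data.push(value.startsWith(" ") ? value.slice(1) : value);
    }
    if (data.length > 0) frames.push(data.join("\n"));
  }
  return { frames, rest };
}
```

Each frame would then go through the same parse-and-convert path as a JSONL line in subprocess mode.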
credential passing
| Agent | Env Var | Config File |
|---|---|---|
| Claude Code | `ANTHROPIC_API_KEY` | `~/.claude.json`, `~/.claude/.credentials.json` |
| Codex | `OPENAI_API_KEY` or `CODEX_API_KEY` | `~/.codex/auth.json` |
| OpenCode | `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` | `~/.local/share/opencode/auth.json` |
| Amp | `ANTHROPIC_API_KEY` | Uses Claude Code credentials |
When spawning subprocesses, pass the API key via environment variable. For OpenCode server mode, the server reads credentials from its config on startup.
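A sketch of turning the session token into the environment entry for the spawned process. The variable names come from the table above; `credentialEnv`, the `AgentId` type, and the rule of defaulting to the first listed variable are assumptions for illustration.

```typescript
type AgentId = "claude" | "codex" | "opencode" | "amp";

// Environment variables each agent accepts, in the order listed in the table.
const CREDENTIAL_VARS: Record<AgentId, readonly string[]> = {
  claude: ["ANTHROPIC_API_KEY"],
  codex: ["OPENAI_API_KEY", "CODEX_API_KEY"],
  opencode: ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"],
  amp: ["ANTHROPIC_API_KEY"],
};

// Returns the single env entry to merge into the child's environment.
// `varName` lets the caller pick a non-default variable (e.g. per provider).
function credentialEnv(agent: AgentId, token: string, varName?: string): Record<string, string> {
  const allowed = CREDENTIAL_VARS[agent];
  const chosen = varName ?? allowed[0];
  if (chosen === undefined || allowed.indexOf(chosen) === -1) {
    throw new Error(`unsupported credential variable for ${agent}`);
  }
  return { [chosen]: token };
}
```

Returning only the added entry keeps the token out of the daemon's own environment; the caller merges it into the child environment at spawn time.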
testing
TODO
component: sdks
we need to auto-generate types from our json schema for these languages
- typescript sdk
- also need to support standard schema
- can run in inline mode that doesn't require this
- python sdk
spec todo
- generate common denominator with conversion functions
- what else do we need, like todo, etc?
- how can we dump the spec for all of the agents somehow
- generate an example ui for this
- architecture document
- how should we handle the tokens for auth?
future problems to visit
- api features
- list agent modes available
- list models available
- handle planning mode
- api key gateway
- configuring mcp/skills/etc
- process management inside container
- otel
- better authentication systems
- s3-based file system
- ai sdk compatibility for their ecosystem (useChat, etc)
- resumable messages
- todo lists
- all other features
- misc
- bootstrap tool that extracts tokens from the current system
- management ui
- skill
- pre-package these as bun binaries instead of npm installations
- build & release pipeline with musl
- agent feature matrix for api features
future work
- provide a pty to access the agent data
- other agent features like file system
misc
comparison to agentapi:
- it does not use the pty since we need to get more information from the agent