sandbox-agent/todo.md
2026-01-26 21:50:37 -08:00

TODO (from spec.md)

Universal API + Types

  • Define universal base types for agent input/output (common denominator across schemas)
  • Add universal question + permission types (HITL) and ensure they are supported end-to-end
  • Define UniversalEvent + UniversalEventData union and AgentError shape
  • Define a universal message type for "failed to parse" with raw JSON payload
  • Implement 2-way converters:
    • Universal input message <-> agent-specific input
    • Universal event <-> agent-specific event
  • Normalize Claude system/init events into universal started events
  • Support Codex CLI type-based event format in universal converter
  • Enforce agentMode vs permissionMode semantics + defaults at the API boundary
  • Ensure session id vs agentSessionId semantics are respected and surfaced consistently
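
As a rough illustration of the UniversalEvent + UniversalEventData items above, the union could be sketched as a Rust enum. Every field and variant name here is an assumption made for illustration (the spec is the authority), and the serde/schemars derives the daemon would need are left out so the sketch only depends on std.

```rust
/// Error shape surfaced to API clients. Field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentError {
    pub kind: String, // e.g. "agent_process_exited"
    pub message: String,
}

/// Payload union: one variant per universal event kind (hypothetical set).
#[derive(Debug, Clone, PartialEq)]
pub enum UniversalEventData {
    Started { agent_session_id: Option<String> },
    Message { text: String },
    Question { question_id: String, prompt: String },
    Permission { permission_id: String, action: String },
    Error(AgentError),
    /// Agent output that could not be converted; the raw JSON is kept verbatim.
    Unparsed { raw_json: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct UniversalEvent {
    pub id: u64,
    pub session_id: String,
    pub agent_session_id: Option<String>,
    pub data: UniversalEventData,
}

/// Wrap a payload the converter could not understand instead of dropping it.
pub fn unparsed_event(id: u64, session_id: &str, raw: &str) -> UniversalEvent {
    UniversalEvent {
        id,
        session_id: session_id.to_string(),
        agent_session_id: None,
        data: UniversalEventData::Unparsed { raw_json: raw.to_string() },
    }
}

/// True when the event is waiting on a human reply (question or permission).
pub fn needs_human(event: &UniversalEvent) -> bool {
    matches!(
        event.data,
        UniversalEventData::Question { .. } | UniversalEventData::Permission { .. }
    )
}
```

Keeping the "failed to parse" case as an ordinary variant means clients still see unknown agent output in order, alongside everything else in the event stream.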

Daemon (Rust HTTP server)

  • Build axum router + utoipa + schemars integration
  • Implement RFC 7807 Problem Details error responses backed by a thiserror enum
  • Implement canonical error type values + required error variants from spec
  • Implement offset semantics for events (exclusive last-seen id, default offset 0)
  • Implement SSE endpoint for events with same semantics as JSON endpoint
  • Replace in-memory session store with sandbox session manager (questions/permissions routing, long-lived processes)
  • Remove legacy token header support
  • Embed inspector frontend and serve it at /ui
  • Log inspector URL when starting the HTTP server
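
The offset item above (exclusive last-seen id, default offset 0) can be sketched as a small filter. This is a minimal sketch, assuming event ids start at 1 and increase monotonically within a session; `StoredEvent` and `events_after` are hypothetical names, not the daemon's actual types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct StoredEvent {
    pub id: u64,
    pub payload: String,
}

/// Return events whose id is strictly greater than `offset`.
/// `offset` is the last id the client has already seen, so it is exclusive;
/// a missing offset defaults to 0, which returns the stream from the start.
pub fn events_after(
    events: &[StoredEvent],
    offset: Option<u64>,
    limit: Option<usize>,
) -> Vec<StoredEvent> {
    let last_seen = offset.unwrap_or(0);
    events
        .iter()
        .filter(|e| e.id > last_seen)
        .take(limit.unwrap_or(usize::MAX))
        .cloned()
        .collect()
}
```

A client that passes the id of the last event it received as the next offset never sees that event twice, and the SSE endpoint could reuse the same function to replay the backlog before streaming live events.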

CLI

  • Implement clap CLI flags: --token, --no-token, --host, --port, CORS flags
  • Implement a CLI endpoint for every HTTP endpoint
  • Update CLAUDE.md to keep CLI endpoints in sync with HTTP API changes
  • Prefix CLI API requests with /v1
  • Add CLI credentials extractor subcommand
  • Move daemon startup to server subcommand
  • Add sandbox-daemon CLI alias

HTTP API Endpoints

  • POST /agents/{}/install with reinstall handling
  • GET /agents/{}/modes (mode discovery or hardcoded)
  • GET /agents (installed/version/path; version checked at request time)
  • POST /sessions/{} (create session, install if needed, return health + agentSessionId)
  • POST /sessions/{}/messages (send prompt)
  • GET /sessions/{}/events (pagination with offset/limit)
  • GET /sessions/{}/events/sse (streaming)
  • POST /sessions/{}/questions/{questionId}/reply
  • POST /sessions/{}/questions/{questionId}/reject
  • POST /sessions/{}/permissions/{permissionId}/reply
  • Prefix all HTTP API endpoints with /v1
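
To illustrate the /v1 prefix item, a client-side path builder might look like the sketch below. The helper names are hypothetical, passing offset/limit as query parameters is an assumption, and ids are assumed to be URL-safe (no percent-encoding is done).

```rust
/// Version prefix shared by every endpoint.
pub const API_PREFIX: &str = "/v1";

/// Path for paginated events: /v1/sessions/<id>/events?offset=..&limit=..
pub fn events_path(session_id: &str, offset: u64, limit: Option<u32>) -> String {
    let mut path = format!(
        "{}/sessions/{}/events?offset={}",
        API_PREFIX, session_id, offset
    );
    if let Some(limit) = limit {
        path.push_str(&format!("&limit={}", limit));
    }
    path
}

/// Path for answering a pending question.
pub fn question_reply_path(session_id: &str, question_id: &str) -> String {
    format!(
        "{}/sessions/{}/questions/{}/reply",
        API_PREFIX, session_id, question_id
    )
}
```

Building every path through one constant keeps the CLI and SDK in step with the server if the prefix ever changes.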

Agent Management

  • Implement install/version/spawn basics for Claude/Codex/OpenCode/Amp
  • Implement agent install URL patterns + platform mappings for supported OS/arch
  • Parse JSONL output for subprocess agents and extract session/result metadata
  • Migrate Codex subprocess to App Server JSON-RPC protocol
  • Map permissionMode to agent CLI flags (Claude/Codex/Amp)
  • Implement session resume flags for Claude/OpenCode/Amp (Codex unsupported)
  • Replace sandbox-agent core agent modules with new agent-management crate (delete originals)
  • Stabilize agent-management crate API and fix build issues (sandbox-agent currently wired to WIP crate)
  • Implement OpenCode shared server lifecycle (opencode serve, health, restart)
  • Implement OpenCode HTTP session APIs + SSE event stream integration
  • Implement JSONL parsing for subprocess agents and map to UniversalEvent
  • Capture agent session id from events and expose as agentSessionId
  • Handle agent process exit and map to agent_process_exited error
  • Implement agentMode discovery rules (OpenCode API, hardcoded others)
  • Enforce permissionMode behavior (default/plan/bypass) for subprocesses
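
For the JSONL parsing items above, one piece that is easy to get wrong is line framing: subprocess stdout arrives in arbitrary chunks, so a JSON object can be split across reads. The sketch below covers only that framing step, using std alone; `JsonlBuffer` is a hypothetical name, and decoding each line into a UniversalEvent (for example with a JSON library) is left out.

```rust
/// Accumulates stdout chunks and yields only complete, non-empty lines.
#[derive(Default)]
pub struct JsonlBuffer {
    pending: String,
}

impl JsonlBuffer {
    /// Append a chunk and return every line that is now complete.
    /// A trailing partial line stays buffered until its newline arrives.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.pending.push_str(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.find('\n') {
            // Remove the line, including its newline, from the buffer.
            let line: String = self.pending.drain(..=pos).collect();
            let trimmed = line.trim();
            if !trimmed.is_empty() {
                lines.push(trimmed.to_string());
            }
        }
        lines
    }

    /// Call when the process exits: returns any unterminated final line.
    pub fn finish(self) -> Option<String> {
        let rest = self.pending.trim();
        if rest.is_empty() {
            None
        } else {
            Some(rest.to_string())
        }
    }
}
```

Lines that later fail JSON decoding would be natural candidates for the "failed to parse" universal message, with the raw text preserved.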

Credentials

  • Implement credential extraction module (Claude/Codex/OpenCode)
  • Add Amp credential extraction (config-based)
  • Move credential extraction into agent-credentials crate
  • Pass extracted credentials into subprocess env vars per agent
  • Ensure OpenCode server reads credentials from config on startup

Testing

  • Build a universal agent test suite that exercises all features (messages, questions, permissions, etc.) using HTTP API
  • Run the full suite against every agent (Claude/Codex/OpenCode/Amp) without mocks
  • Add real install/version/spawn tests for Claude/Codex/OpenCode (Amp conditional)
  • Expand agent lifecycle tests (reinstall, session id extraction, resume, plan mode)
  • Add OpenCode server-mode tests (session create, prompt, SSE)
  • Add tests for question/permission flows using deterministic prompts
  • Add HTTP/SSE snapshot tests for real agents (env-configured)
  • Add snapshot coverage for auth, CORS, and concurrent sessions
  • Add inspector UI route test

Frontend (frontend/packages/inspector)

  • Build Vite + React app with connect screen (endpoint + optional token)
  • Add instructions to run sandbox-agent (including CORS)
  • Implement full agent UI covering all features
  • Add HTTP request log with copyable curl command
  • Add Content-Type header to CORS callout command
  • Default inspector endpoint to current origin and auto-connect via health check

TypeScript SDK

  • Generate OpenAPI from utoipa and run openapi-typescript
  • Implement a thin fetch-based client wrapper
  • Update CLAUDE.md to require SDK + CLI updates when API changes
  • Prefix SDK requests with /v1

Examples + Tests

  • Add examples for Docker, E2B, Daytona, Vercel Sandboxes, Cloudflare Sandboxes
  • Add Vitest unit test for each example (Cloudflare requires special setup)

Documentation

  • Write README covering architecture, agent compatibility, and deployment guide
  • Add universal API feature checklist (questions, approve plan, etc.)
  • Document CLI, HTTP API, frontend app, and TypeScript SDK usage
  • Use collapsible sections for endpoints and SDK methods

  • Implement release pipeline
  • Implement E2B example
  • Implement TypeScript "start locally" by pulling from server using version
  • Move agent schema sources to src/agents