mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-15 08:03:46 +00:00
6.3 KiB
TODO (from spec.md)
Universal API + Types
- Define universal base types for agent input/output (common denominator across schemas)
- Add universal question + permission types (HITL) and ensure they are supported end-to-end
- Define `UniversalEvent` + `UniversalEventData` union and `AgentError` shape
- Define a universal message type for "failed to parse" with raw JSON payload
- Implement 2-way converters:
  - Universal input message <-> agent-specific input
  - Universal event <-> agent-specific event
- Normalize Claude system/init events into universal started events
- Support Codex CLI type-based event format in universal converter
- Enforce agentMode vs permissionMode semantics + defaults at the API boundary
- Ensure session id vs agentSessionId semantics are respected and surfaced consistently
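To make the event-shape items above more concrete, here is a minimal sketch of a universal event union with a "failed to parse" fallback and a Claude init normalization. Every field name (`kind`, `raw`, `reason`, `session_id`) and the helper `toUniversalEvent` are assumptions for illustration; the actual definitions live in spec.md and may differ.

```typescript
// Hypothetical shapes for illustration only; field names are assumptions, not the spec.
interface AgentError {
  type: string;
  message: string;
}

type UniversalEventData =
  | { kind: "started"; agentSessionId: string | null }
  | { kind: "message"; text: string }
  | { kind: "error"; error: AgentError }
  | { kind: "unparsed"; raw: string; reason: string };

interface UniversalEvent {
  id: number;
  sessionId: string;
  data: UniversalEventData;
}

// Convert one raw agent output line into a universal event without throwing on bad input.
function toUniversalEvent(id: number, sessionId: string, rawLine: string): UniversalEvent {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawLine);
  } catch (err) {
    // Keep the raw payload so nothing is lost when parsing fails.
    return { id, sessionId, data: { kind: "unparsed", raw: rawLine, reason: String(err) } };
  }
  if (typeof parsed === "object" && parsed !== null) {
    const obj = parsed as Record<string, unknown>;
    // Assumed Claude-style init event: { "type": "system", "subtype": "init", "session_id": "..." }
    if (obj["type"] === "system" && obj["subtype"] === "init") {
      const sid = typeof obj["session_id"] === "string" ? obj["session_id"] : null;
      return { id, sessionId, data: { kind: "started", agentSessionId: sid } };
    }
    if (typeof obj["text"] === "string") {
      return { id, sessionId, data: { kind: "message", text: obj["text"] } };
    }
  }
  return { id, sessionId, data: { kind: "unparsed", raw: rawLine, reason: "unrecognized shape" } };
}
```

Using a discriminated union lets consumers switch on one field, and the `unparsed` variant keeps unknown agent output visible instead of dropping it.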
Daemon (Rust HTTP server)
- Build axum router + utoipa + schemars integration
- Implement RFC 7807 Problem Details error responses backed by a `thiserror` enum
- Implement canonical error `type` values + required error variants from spec
- Implement offset semantics for events (exclusive last-seen id, default offset 0)
- Implement SSE endpoint for events with same semantics as JSON endpoint
- Replace in-memory session store with sandbox session manager (questions/permissions routing, long-lived processes)
- Remove legacy token header support
- Embed inspector frontend and serve it at `/ui`
- Log inspector URL when starting the HTTP server
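The offset item above (exclusive last-seen id, default offset 0) can be sketched as a small filter. This assumes event ids are positive and increase monotonically; the `StoredEvent` type, the `eventsAfter` name, and the default limit of 100 are hypothetical.

```typescript
// Hypothetical event record; ids are assumed to start at 1 and only grow.
interface StoredEvent {
  id: number;
  payload: string;
}

// Return up to `limit` events whose id is strictly greater than `offset`.
// With the default offset of 0, the caller receives events from the beginning.
function eventsAfter(events: StoredEvent[], offset = 0, limit = 100): StoredEvent[] {
  return events.filter((e) => e.id > offset).slice(0, limit);
}
```

A poller would pass the id of the last event it received as the next offset, so an event is never delivered twice. Under the same assumption, an SSE endpoint could apply the identical filter before it starts streaming.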
CLI
- Implement clap CLI flags: `--token`, `--no-token`, `--host`, `--port`, CORS flags
- Implement a CLI endpoint for every HTTP endpoint
- Update `CLAUDE.md` to keep CLI endpoints in sync with HTTP API changes
- Prefix CLI API requests with `/v1`
- Add CLI credentials extractor subcommand
- Move daemon startup to `server` subcommand
- Add `sandbox-daemon` CLI alias
HTTP API Endpoints
- POST `/agents/{}/install` with `reinstall` handling
- GET `/agents/{}/modes` (mode discovery or hardcoded)
- GET `/agents` (installed/version/path; version checked at request time)
- POST `/sessions/{}` (create session, install if needed, return health + agentSessionId)
- POST `/sessions/{}/messages` (send prompt)
- GET `/sessions/{}/events` (pagination with offset/limit)
- GET `/sessions/{}/events/sse` (streaming)
- POST `/sessions/{}/questions/{questionId}/reply`
- POST `/sessions/{}/questions/{questionId}/reject`
- POST `/sessions/{}/permissions/{permissionId}/reply`
- Prefix all HTTP API endpoints with `/v1`
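The endpoint list above can be captured as a small table of path builders so that the `/v1` prefix lives in one place. This is only a sketch: the parameter names for the unnamed `{}` placeholders (`agent`, `sessionId`), the `routes` object itself, and passing `offset`/`limit` as query parameters are assumptions.

```typescript
// Hypothetical path builders mirroring the endpoint list; names are illustrative.
const API_PREFIX = "/v1";
const seg = encodeURIComponent; // escape each path segment

const routes = {
  installAgent: (agent: string) => `${API_PREFIX}/agents/${seg(agent)}/install`,
  agentModes: (agent: string) => `${API_PREFIX}/agents/${seg(agent)}/modes`,
  agents: () => `${API_PREFIX}/agents`,
  session: (sessionId: string) => `${API_PREFIX}/sessions/${seg(sessionId)}`,
  messages: (sessionId: string) => `${API_PREFIX}/sessions/${seg(sessionId)}/messages`,
  // Assumed: offset and limit travel as query parameters.
  events: (sessionId: string, offset = 0, limit?: number) =>
    `${API_PREFIX}/sessions/${seg(sessionId)}/events?offset=${offset}` +
    (limit === undefined ? "" : `&limit=${limit}`),
  eventsSse: (sessionId: string) => `${API_PREFIX}/sessions/${seg(sessionId)}/events/sse`,
  questionReply: (sessionId: string, questionId: string) =>
    `${API_PREFIX}/sessions/${seg(sessionId)}/questions/${seg(questionId)}/reply`,
  questionReject: (sessionId: string, questionId: string) =>
    `${API_PREFIX}/sessions/${seg(sessionId)}/questions/${seg(questionId)}/reject`,
  permissionReply: (sessionId: string, permissionId: string) =>
    `${API_PREFIX}/sessions/${seg(sessionId)}/permissions/${seg(permissionId)}/reply`,
};
```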
Agent Management
- Implement install/version/spawn basics for Claude/Codex/OpenCode/Amp
- Implement agent install URL patterns + platform mappings for supported OS/arch
- Parse JSONL output for subprocess agents and extract session/result metadata
- Migrate Codex subprocess to App Server JSON-RPC protocol
- Map permissionMode to agent CLI flags (Claude/Codex/Amp)
- Implement session resume flags for Claude/OpenCode/Amp (Codex unsupported)
- Replace sandbox-agent core agent modules with new agent-management crate (delete originals)
- Stabilize agent-management crate API and fix build issues (sandbox-agent currently wired to WIP crate)
- Implement OpenCode shared server lifecycle (`opencode serve`, health, restart)
- Implement OpenCode HTTP session APIs + SSE event stream integration
- Implement JSONL parsing for subprocess agents and map to `UniversalEvent`
- Capture agent session id from events and expose as `agentSessionId`
- Handle agent process exit and map to `agent_process_exited` error
- Implement agentMode discovery rules (OpenCode API, hardcoded others)
- Enforce permissionMode behavior (default/plan/bypass) for subprocesses
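The JSONL parsing and session id capture items above could look roughly like the following. It is a sketch under assumptions: subprocess stdout arrives in arbitrary chunks, each record ends with a newline, and the agent reports its session id in a top-level `session_id` field. `JsonlBuffer` and `findAgentSessionId` are hypothetical names.

```typescript
// Accumulates stdout chunks and returns the complete JSONL records seen so far.
class JsonlBuffer {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or ""), so hold it for the next chunk.
    this.pending = lines.pop() ?? "";
    const records: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        records.push(JSON.parse(trimmed));
      } catch {
        // Keep unparseable lines so they can be surfaced rather than dropped.
        records.push({ unparsed: trimmed });
      }
    }
    return records;
  }
}

// Assumed field name: returns the agent's own session id if the record carries one.
function findAgentSessionId(record: unknown): string | null {
  if (typeof record !== "object" || record === null) return null;
  const value = (record as Record<string, unknown>)["session_id"];
  return typeof value === "string" ? value : null;
}
```

Buffering the trailing partial line matters because a pipe read can end in the middle of a record.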
Credentials
- Implement credential extraction module (Claude/Codex/OpenCode)
- Add Amp credential extraction (config-based)
- Move credential extraction into `agent-credentials` crate
- Pass extracted credentials into subprocess env vars per agent
- Ensure OpenCode server reads credentials from config on startup
Testing
- Build a universal agent test suite that exercises all features (messages, questions, permissions, etc.) using HTTP API
- Run the full suite against every agent (Claude/Codex/OpenCode/Amp) without mocks
- Add real install/version/spawn tests for Claude/Codex/OpenCode (Amp conditional)
- Expand agent lifecycle tests (reinstall, session id extraction, resume, plan mode)
- Add OpenCode server-mode tests (session create, prompt, SSE)
- Add tests for question/permission flows using deterministic prompts
- Add HTTP/SSE snapshot tests for real agents (env-configured)
- Add snapshot coverage for auth, CORS, and concurrent sessions
- Add inspector UI route test
Frontend (frontend/packages/inspector)
- Build Vite + React app with connect screen (endpoint + optional token)
- Add instructions to run sandbox-agent (including CORS)
- Implement full agent UI covering all features
- Add HTTP request log with copyable curl command
- Add Content-Type header to CORS callout command
- Default inspector endpoint to current origin and auto-connect via health check
- Update inspector to universal schema events (items, deltas, approvals, errors)
TypeScript SDK
- Generate OpenAPI from utoipa and run `openapi-typescript`
- Implement a thin fetch-based client wrapper
- Update `CLAUDE.md` to require SDK + CLI updates when API changes
- Prefix SDK requests with `/v1`
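A thin fetch-based wrapper of the kind listed above might be shaped like this. It is a sketch, not the SDK: the `SandboxClient` class, the `FetchLike` type, and sending the token as an `Authorization: Bearer` header are assumptions. The fetch function is injected so the client can be exercised without a network, and error bodies are read as RFC 7807 Problem Details (`type`, `title`).

```typescript
// Minimal structural type for the fetch function the client depends on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class SandboxClient {
  private baseUrl: string;
  private fetchImpl: FetchLike;
  private token: string | null;

  constructor(baseUrl: string, fetchImpl: FetchLike, token: string | null = null) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
    this.token = token;
  }

  // Send one JSON request; every path is prefixed with /v1.
  async request(method: string, path: string, body?: unknown): Promise<unknown> {
    const headers: Record<string, string> = { "Content-Type": "application/json" };
    // Assumed auth scheme: bearer token, sent only when one is configured.
    if (this.token !== null) headers["Authorization"] = `Bearer ${this.token}`;
    const init: { method: string; headers: Record<string, string>; body?: string } = { method, headers };
    if (body !== undefined) init.body = JSON.stringify(body);

    const res = await this.fetchImpl(`${this.baseUrl}/v1${path}`, init);
    const payload = await res.json();
    if (!res.ok) {
      // Problem Details: fall back to the RFC's default type when none is given.
      const problem = payload as { type?: string; title?: string };
      throw new Error(`${res.status} ${problem.type ?? "about:blank"}: ${problem.title ?? "request failed"}`);
    }
    return payload;
  }
}
```

Typed methods generated from the OpenAPI output could then be layered on top of `request`, which keeps the prefix and error handling in one place.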
Examples + Tests
- Add examples for Docker, E2B, Daytona, Vercel Sandboxes, Cloudflare Sandboxes
- Add Vitest unit test for each example (Cloudflare requires special setup)
Documentation
- Write README covering architecture, agent compatibility, and deployment guide
- Add universal API feature checklist (questions, approve plan, etc.)
- Document CLI, HTTP API, frontend app, and TypeScript SDK usage
- Use collapsible sections for endpoints and SDK methods
- Integrate OpenAPI spec with Mintlify (docs/openapi.json + validation)
- Implement release pipeline
- Implement E2B example
- Implement TypeScript "start locally" by pulling from server using version
- Move agent schema sources to src/agents
- Add Vercel AI SDK UIMessage schema extractor