mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-17 04:02:25 +00:00
acp spec (#155)
parent
70287ec471
commit
e72eb9f611
264 changed files with 18559 additions and 51021 deletions
142
server/CLAUDE.md
@ -1,122 +1,40 @@
|
|||
# Server
|
||||
# Server Instructions
|
||||
|
||||
See [ARCHITECTURE.md](./ARCHITECTURE.md) for detailed architecture documentation covering the daemon, agent schema pipeline, session management, agent execution patterns, and SDK modes.
|
||||
## ACP v2 Architecture
|
||||
|
||||
## Skill Source Installation
|
||||
- Public API routes are defined in `server/packages/sandbox-agent/src/router.rs`.
|
||||
- ACP runtime/process bridge is in `server/packages/sandbox-agent/src/acp_runtime.rs`.
|
||||
- `/v2` is the only active API surface for sessions/prompts (`/v2/rpc`).
|
||||
- Keep binary filesystem transfer endpoints as dedicated HTTP APIs:
|
||||
- `GET /v2/fs/file`
|
||||
- `PUT /v2/fs/file`
|
||||
- `POST /v2/fs/upload-batch`
|
||||
- Rationale: these endpoints keep filesystem behavior host-owned and consistent across agents, and they carry large binary transfers that ACP JSON-RPC cannot stream efficiently.
|
||||
- Maintain ACP variants in parallel only when they share the same underlying filesystem implementation; SDK defaults should still prefer HTTP for large/binary transfers.
|
||||
- `/v1/*` must remain hard-removed (`410`) and `/opencode/*` stays disabled (`503`) until Phase 7.
|
||||
- Agent install logic (native + ACP agent process + lazy install) is handled by `server/packages/agent-management/`.
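The status rules for the retired surfaces can be illustrated with a small sketch. It is not the code in `router.rs`; the function name `disabled_surface_status` and the prefix-matching helper are hypothetical and exist only to show the intended mapping (`/v1/*` to `410`, `/opencode/*` to `503`, everything else routed normally).

```rust
/// Returns the fixed HTTP status for a disabled API surface, or `None`
/// when the path should be routed normally (for example `/v2/rpc`).
/// Hypothetical helper for illustration; not the project's router code.
fn disabled_surface_status(path: &str) -> Option<u16> {
    // True when `path` is exactly `prefix` or lies beneath it, so that
    // `/v1/sessions` matches `/v1` but `/v10/x` does not.
    fn under(path: &str, prefix: &str) -> bool {
        path == prefix
            || path
                .strip_prefix(prefix)
                .map_or(false, |rest| rest.starts_with('/'))
    }

    if under(path, "/v1") {
        Some(410) // Gone: `/v1/*` is hard-removed.
    } else if under(path, "/opencode") {
        Some(503) // Service Unavailable: disabled until Phase 7.
    } else {
        None
    }
}
```

Matching on a path segment boundary rather than a raw string prefix keeps an unrelated route such as `/v10/...` from being swallowed by the `/v1` rule.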
|
||||
|
||||
Skills are installed via `skills.sources` in the session create request. The [vercel-labs/skills](https://github.com/vercel-labs/skills) repo (`~/misc/skills`) provides reference for skill installation patterns and source parsing logic. The server handles fetching GitHub repos (via zip download) and git repos (via clone) to `~/.sandbox-agent/skills-cache/`, discovering `SKILL.md` files, and symlinking into agent skill roots.
|
||||
## API Contract Rules
|
||||
|
||||
# Server Testing
|
||||
- Every `#[utoipa::path(...)]` handler needs a summary line + description lines in its doc comment.
|
||||
- Every `responses(...)` entry must include `description`.
|
||||
- Regenerate `docs/openapi.json` after endpoint contract changes.
|
||||
- Keep CLI and HTTP endpoint behavior aligned (`docs/cli.mdx`).
|
||||
|
||||
## Test placement
|
||||
## Tests
|
||||
|
||||
Place all new tests under `server/packages/**/tests/` (or a package-specific `tests/` folder). Avoid inline tests inside source files unless there is no viable alternative.
|
||||
Primary v2 integration coverage:
|
||||
- `server/packages/sandbox-agent/tests/v2_api.rs`
|
||||
- `server/packages/sandbox-agent/tests/v2_agent_process_matrix.rs`
|
||||
|
||||
## Test locations (overview)
|
||||
|
||||
- Sandbox-agent integration tests live under `server/packages/sandbox-agent/tests/`:
|
||||
- Agent flow coverage in `agent-flows/`
|
||||
- Agent management coverage in `agent-management/`
|
||||
- Shared server manager coverage in `server-manager/`
|
||||
- HTTP endpoint snapshots in `http/` (snapshots in `http/snapshots/`)
|
||||
- Session feature coverage snapshots in `sessions/` (one file per feature, e.g. `session_lifecycle.rs`, `permissions.rs`, `questions.rs`, `reasoning.rs`, `status.rs`; snapshots in `sessions/snapshots/`)
|
||||
- UI coverage in `ui/`
|
||||
- Shared helpers in `common/`
|
||||
- Extracted agent schema roundtrip tests live under `server/packages/extracted-agent-schemas/tests/`
|
||||
|
||||
## Snapshot tests
|
||||
|
||||
HTTP endpoint snapshot entrypoint:
|
||||
- `server/packages/sandbox-agent/tests/http_endpoints.rs`
|
||||
|
||||
Session snapshot entrypoint:
|
||||
- `server/packages/sandbox-agent/tests/sessions.rs`
|
||||
|
||||
Snapshots are written to:
|
||||
- `server/packages/sandbox-agent/tests/http/snapshots/` (HTTP endpoint snapshots)
|
||||
- `server/packages/sandbox-agent/tests/sessions/snapshots/` (session/feature coverage snapshots)
|
||||
|
||||
## Agent selection
|
||||
|
||||
`SANDBOX_TEST_AGENTS` controls which agents run. It accepts a comma-separated list or `all`.
|
||||
If it is **not set**, tests will auto-detect installed agents by checking:
|
||||
- binaries on `PATH`, and
|
||||
- the default install dir (`$XDG_DATA_HOME/sandbox-agent/bin` or `./.sandbox-agent/bin`)
|
||||
|
||||
If no agents are found, tests fail with a clear error.
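The selection rules above could be sketched roughly as follows. This is an assumption-laden illustration, not the test harness itself: `select_agents` is a hypothetical name, the `known` and `detected` lists stand in for whatever the harness actually enumerates, and trimming whitespace around names is a guess about how the list is parsed.

```rust
/// Resolves which agents a test run should cover.
/// - `env_value`: the value of `SANDBOX_TEST_AGENTS`, if set.
/// - `known`: every agent the harness knows about (used for `all`).
/// - `detected`: agents found on `PATH` or in the default install dir.
/// Hypothetical sketch; names and error text are assumptions.
fn select_agents(
    env_value: Option<&str>,
    known: &[&str],
    detected: &[&str],
) -> Result<Vec<String>, String> {
    let selected: Vec<String> = match env_value.map(str::trim) {
        // `all` expands to every known agent.
        Some("all") => known.iter().map(|a| a.to_string()).collect(),
        // Otherwise treat the value as a comma-separated list.
        Some(list) => list
            .split(',')
            .map(str::trim)
            .filter(|name| !name.is_empty())
            .map(str::to_string)
            .collect(),
        // Unset: fall back to auto-detected agents.
        None => detected.iter().map(|a| a.to_string()).collect(),
    };

    if selected.is_empty() {
        return Err("no agents found: set SANDBOX_TEST_AGENTS or install an agent".to_string());
    }
    Ok(selected)
}
```

In this sketch an explicitly empty list is also treated as an error; the text above only specifies the failure for the auto-detect case.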
|
||||
|
||||
## Credential handling
|
||||
|
||||
Credentials are pulled from the host by default via `extract_all_credentials`:
|
||||
- environment variables (e.g. `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
|
||||
- local CLI configs (Claude/Codex/Amp/OpenCode)
|
||||
|
||||
You can override host credentials for tests with:
|
||||
- `SANDBOX_TEST_ANTHROPIC_API_KEY`
|
||||
- `SANDBOX_TEST_OPENAI_API_KEY`
|
||||
|
||||
If `SANDBOX_TEST_AGENTS` includes an agent that requires a provider credential and it is missing,
|
||||
tests fail before starting.
|
||||
|
||||
## Credential health checks
|
||||
|
||||
Before running agent tests, credentials are validated with minimal API calls:
|
||||
- Anthropic: `GET https://api.anthropic.com/v1/models`
|
||||
- `x-api-key` for API keys
|
||||
- `Authorization: Bearer` for OAuth tokens
|
||||
- `anthropic-version: 2023-06-01`
|
||||
- OpenAI: `GET https://api.openai.com/v1/models` with `Authorization: Bearer`
|
||||
|
||||
401/403 yields a hard failure (`invalid credentials`). Other non-2xx responses or network
|
||||
errors fail with a health-check error.
|
||||
|
||||
Health checks run in a blocking thread to avoid Tokio runtime drop errors inside async tests.
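As a rough sketch of the rules in this section, the snippet below classifies a health-check response, picks the Anthropic auth header, and runs a check on a separate thread. All names (`HealthCheck`, `classify`, `anthropic_headers`, `run_on_thread`) are hypothetical, no HTTP request is made, and the use of a plain `std::thread` is an assumption about what "blocking thread" means here.

```rust
use std::thread;

/// Outcome of a credential health check (hypothetical type).
#[derive(Debug, PartialEq)]
enum HealthCheck {
    Healthy,
    InvalidCredentials,
    Failed(String),
}

/// Classifies a response. `status` is `None` when the request never
/// produced an HTTP response (a network error).
fn classify(status: Option<u16>) -> HealthCheck {
    match status {
        Some(401) | Some(403) => HealthCheck::InvalidCredentials,
        Some(code) if (200..300).contains(&code) => HealthCheck::Healthy,
        Some(code) => HealthCheck::Failed(format!("unexpected status {code}")),
        None => HealthCheck::Failed("network error".to_string()),
    }
}

/// Headers for `GET https://api.anthropic.com/v1/models`:
/// `x-api-key` for API keys, `Authorization: Bearer` for OAuth tokens.
fn anthropic_headers(credential: &str, is_oauth: bool) -> Vec<(String, String)> {
    let auth = if is_oauth {
        ("Authorization".to_string(), format!("Bearer {credential}"))
    } else {
        ("x-api-key".to_string(), credential.to_string())
    };
    vec![auth, ("anthropic-version".to_string(), "2023-06-01".to_string())]
}

/// Runs a blocking check on its own OS thread and waits for the result,
/// so no runtime is created or dropped inside an async test body.
fn run_on_thread<T: Send + 'static>(check: impl FnOnce() -> T + Send + 'static) -> T {
    thread::spawn(check)
        .join()
        .expect("health check thread panicked")
}
```

Separating classification from the request itself keeps the 401/403 rule testable without network access.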
|
||||
|
||||
## Snapshot stability
|
||||
|
||||
To keep snapshots deterministic:
|
||||
- Use the mock agent as the **master** event sequence; all other agents must match its behavior 1:1.
|
||||
- Snapshots should compare a **canonical event skeleton** (event order matters) with strict ordering across:
|
||||
- `item.started` → `item.delta` → `item.completed`
|
||||
- presence/absence of `session.ended`
|
||||
- permission/question request and resolution flows
|
||||
- Scrub non-deterministic fields from snapshots:
|
||||
- IDs, timestamps, native IDs
|
||||
- text content, tool inputs/outputs, provider-specific metadata
|
||||
- `source` and `synthetic` flags (these are implementation details)
|
||||
- Scrub `reasoning` and `status` content from session-baseline snapshots to keep the core event skeleton consistent across agents; validate those content types separately in their feature-coverage-specific tests.
|
||||
- The sandbox-agent is responsible for emitting **synthetic events** so that real agents match the mock sequence exactly.
|
||||
- Event streams are truncated after the first assistant or error event.
|
||||
- Permission flow snapshots are truncated after the permission request (or first assistant) event.
|
||||
- Unknown events are preserved as `kind: unknown` (raw payload in universal schema).
|
||||
- Prefer snapshot-based event skeleton assertions over manual event-order assertions in tests.
|
||||
- **Never update snapshots based on any agent that is not the mock agent.** The mock agent is the source of truth for snapshots; other agents must be compared against the mock snapshots without regenerating them.
|
||||
- Agent-specific endpoints keep per-agent snapshots; any session-related snapshots must use the mock baseline as the single source of truth.
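The scrubbing and truncation rules above can be sketched as a small normalization pass. The `Event` struct here is a simplified stand-in, not the universal schema, and `scrub` and `skeleton` are hypothetical names. Because the text does not define exactly which event counts as the first assistant or error event, the truncation point is left to a caller-supplied predicate.

```rust
/// Simplified event for illustration; the real schema has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    kind: String,
    id: String,
    timestamp_ms: u64,
    text: Option<String>,
}

/// Replaces non-deterministic fields with fixed placeholders while
/// keeping the event kind and whether text was present.
fn scrub(event: &Event) -> Event {
    Event {
        kind: event.kind.clone(),
        id: "<id>".to_string(),
        timestamp_ms: 0,
        text: event.text.as_ref().map(|_| "<text>".to_string()),
    }
}

/// Builds the ordered skeleton, stopping after (and including) the first
/// event for which `is_terminal` returns true.
fn skeleton(events: &[Event], is_terminal: impl Fn(&Event) -> bool) -> Vec<Event> {
    let mut out = Vec::new();
    for event in events {
        out.push(scrub(event));
        if is_terminal(event) {
            break;
        }
    }
    out
}
```

Two runs that differ only in IDs, timestamps, or text content then produce identical skeletons, so a stored mock snapshot can be compared against any agent's stream with a plain equality check.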
|
||||
|
||||
## Typical commands
|
||||
|
||||
Run only Claude session snapshots:
```
SANDBOX_TEST_AGENTS=claude cargo test -p sandbox-agent --test sessions
```

Run:
```bash
cargo test -p sandbox-agent --test v2_api
cargo test -p sandbox-agent --test v2_agent_process_matrix
```
|
||||
|
||||
Run all detected session snapshots:
|
||||
```
cargo test -p sandbox-agent --test sessions
```
|
||||
## Migration Docs Sync
|
||||
|
||||
Run HTTP endpoint snapshots:
|
||||
```
cargo test -p sandbox-agent --test http_endpoints
```
|
||||
|
||||
## Universal Schema
|
||||
|
||||
When modifying agent conversion code in `server/packages/universal-agent-schema/src/agents/` or adding/changing properties on the universal schema, update the feature matrix in `README.md` to reflect which agents support which features.
|
||||
|
||||
## Feature coverage sync
|
||||
|
||||
When updating agent feature coverage (flags or values), keep them in sync across:
|
||||
- `README.md` (feature matrix / documented support)
|
||||
- server Rust implementation (`AgentCapabilities` + `agent_capabilities_for`)
|
||||
- frontend feature coverage views/badges (Inspector UI)
|
||||
- Keep `research/acp/spec.md` as the source spec.
|
||||
- Update `research/acp/todo.md` when scope/status changes.
|
||||
- Log blockers/decisions in `research/acp/friction.md`.
|
||||
|
|
|
|||