- Add .github/actions/docker-setup composite action (from rivet) - Add docker/runtime/Dockerfile for Docker image builds - Update release.yaml to match rivet patterns: - Use corepack enable instead of pnpm/action-setup - Add reuse_engine_version input - Add Docker job with Depot runners - Use --no-frozen-lockfile for pnpm install - Add id-token permission for setup job
4.8 KiB
Server Testing
Test placement
Place all new tests under server/packages/**/tests/ (or a package-specific tests/ folder). Avoid inline tests inside source files unless there is no viable alternative.
Test locations (overview)
- Sandbox-agent integration tests live under
server/packages/sandbox-agent/tests/:- Agent flow coverage in
agent-flows/ - Agent management coverage in
agent-management/ - Shared server manager coverage in
server-manager/ - HTTP endpoint snapshots in
http/(snapshots inhttp/snapshots/) - Session capability snapshots in
sessions/(one file per capability, e.g.session_lifecycle.rs,permissions.rs,questions.rs,reasoning.rs,status.rs; snapshots insessions/snapshots/) - UI coverage in
ui/ - Shared helpers in
common/
- Agent flow coverage in
- Extracted agent schema roundtrip tests live under
server/packages/extracted-agent-schemas/tests/
Snapshot tests
HTTP endpoint snapshot entrypoint:
server/packages/sandbox-agent/tests/http_endpoints.rs
Session snapshot entrypoint:
server/packages/sandbox-agent/tests/sessions.rs
Snapshots are written to:
server/packages/sandbox-agent/tests/http/snapshots/(HTTP endpoint snapshots)server/packages/sandbox-agent/tests/sessions/snapshots/(session/capability snapshots)
Agent selection
SANDBOX_TEST_AGENTS controls which agents run. It accepts a comma-separated list or all.
If it is not set, tests will auto-detect installed agents by checking:
- binaries on
PATH, and - the default install dir (
$XDG_DATA_HOME/sandbox-agent/binor./.sandbox-agent/bin)
If no agents are found, tests fail with a clear error.
Credential handling
Credentials are pulled from the host by default via extract_all_credentials:
- environment variables (e.g.
ANTHROPIC_API_KEY,OPENAI_API_KEY) - local CLI configs (Claude/Codex/Amp/OpenCode)
You can override host credentials for tests with:
SANDBOX_TEST_ANTHROPIC_API_KEYSANDBOX_TEST_OPENAI_API_KEY
If SANDBOX_TEST_AGENTS includes an agent that requires a provider credential and it is missing,
tests fail before starting.
Credential health checks
Before running agent tests, credentials are validated with minimal API calls:
- Anthropic:
GET https://api.anthropic.com/v1/modelsx-api-keyfor API keysAuthorization: Bearerfor OAuth tokensanthropic-version: 2023-06-01
- OpenAI:
GET https://api.openai.com/v1/modelswithAuthorization: Bearer
401/403 yields a hard failure (invalid credentials). Other non-2xx responses or network
errors fail with a health-check error.
Health checks run in a blocking thread to avoid Tokio runtime drop errors inside async tests.
Snapshot stability
To keep snapshots deterministic:
- Use the mock agent as the master event sequence; all other agents must match its behavior 1:1.
- Snapshots should compare a canonical event skeleton (event order matters) with strict ordering across:
item.started→item.delta→item.completed- presence/absence of
session.ended - permission/question request and resolution flows
- Scrub non-deterministic fields from snapshots:
- IDs, timestamps, native IDs
- text content, tool inputs/outputs, provider-specific metadata
sourceandsyntheticflags (these are implementation details)
- Scrub
reasoningandstatuscontent from session-baseline snapshots to keep the core event skeleton consistent across agents; validate those content types separately in their capability-specific tests. - The sandbox-agent is responsible for emitting synthetic events so that real agents match the mock sequence exactly.
- Event streams are truncated after the first assistant or error event.
- Permission flow snapshots are truncated after the permission request (or first assistant) event.
- Unknown events are preserved as
kind: unknown(raw payload in universal schema). - Prefer snapshot-based event skeleton assertions over manual event-order assertions in tests.
- Never update snapshots based on any agent that is not the mock agent. The mock agent is the source of truth for snapshots; other agents must be compared against the mock snapshots without regenerating them.
- Agent-specific endpoints keep per-agent snapshots; any session-related snapshots must use the mock baseline as the single source of truth.
Typical commands
Run only Claude session snapshots:
SANDBOX_TEST_AGENTS=claude cargo test -p sandbox-agent --test sessions
Run all detected session snapshots:
cargo test -p sandbox-agent --test sessions
Run HTTP endpoint snapshots:
cargo test -p sandbox-agent --test http_endpoints
Universal Schema
When modifying agent conversion code in server/packages/universal-agent-schema/src/agents/ or adding/changing properties on the universal schema, update the feature matrix in README.md to reflect which agents support which features.