Merge pull request #264 from rivet-dev/desktop-computer-use-neko

feat: desktop computer-use APIs with neko streaming
Nathan Flurry, 2026-03-17 02:36:50 -07:00, committed by GitHub
commit 3b8c74589d
60 changed files with 17904 additions and 1581 deletions


@ -1,202 +0,0 @@
# Proposal: Revert Actions-Only Pattern Back to Queues/Workflows
## Background
We converted all actors from queue/workflow-based communication to direct actions as a workaround for a RivetKit bug where `c.queue.iter()` deadlocked for actors created from another actor's context. That bug has since been fixed in RivetKit. We want to revert to queues/workflows because they provide better observability (workflow history in the inspector), replay/recovery semantics, and are the idiomatic RivetKit pattern.
## Reference branches
- **`main`** at commit `32f3c6c3` — the original queue/workflow code BEFORE the actions refactor
- **`queues-to-actions`** — the actions refactor code with bug fixes (E2B, lazy tasks, etc.)
- **`task-owner-git-auth`** at commit `3684e2e5` — the CURRENT branch with all work including task owner system, lazy tasks, and actions refactor
Use `main` as the reference for the queue/workflow communication patterns. Use `task-owner-git-auth` (current HEAD) as the authoritative source for ALL features and bug fixes that MUST be preserved — it has everything from `queues-to-actions` plus the task owner system.
## What to KEEP (do NOT revert these)
These are bug fixes and improvements made during the actions refactor that are independent of the communication pattern:
### 1. Lazy task actor creation
- Virtual task entries in org's `taskIndex` + `taskSummaries` tables (no actor fan-out during PR sync)
- `refreshTaskSummaryForBranchMutation` writes directly to org tables instead of spawning task actors
- Task actors self-initialize in `getCurrentRecord()` from `getTaskIndexEntry` when lazily created
- `getTaskIndexEntry` action on org actor
- See CLAUDE.md "Lazy Task Actor Creation" section
### 2. `resolveTaskRepoId` replacing `requireRepoExists`
- `requireRepoExists` was removed — it did a cross-actor call from org to github-data that was fragile
- Replaced with `resolveTaskRepoId` which reads from the org's local `taskIndex` table
- `getTask` action resolves `repoId` from task index when not provided (sandbox actor only has taskId)
### 3. `getOrganizationContext` overrides threaded through sync phases
- `fullSyncBranchBatch`, `fullSyncMembers`, `fullSyncPullRequestBatch` now pass `connectedAccount`, `installationStatus`, `installationId` overrides from `FullSyncConfig`
- Without this, phases 2-4 fail with "Organization not initialized" when the org profile doesn't exist yet (webhook-triggered sync before user sign-in)
### 4. E2B sandbox fixes
- `timeoutMs: 60 * 60 * 1000` in E2B create options (TEMPORARY until rivetkit autoPause lands)
- Sandbox repo path uses `/home/user/repo` for E2B compatibility
- `listProcesses` error handling for expired E2B sandboxes
### 5. Frontend fixes
- React `useEffect` dependency stability in `mock-layout.tsx` and `organization-dashboard.tsx` (prevents infinite re-render loops)
- Terminal pane ref handling
### 6. Process crash protection
- `process.on("uncaughtException")` and `process.on("unhandledRejection")` handlers in `foundry/packages/backend/src/index.ts`
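A minimal sketch of such handlers (the target parameter and log shape are illustrative, not the actual code in `index.ts`):

```typescript
import { EventEmitter } from "node:events";

// Sketch of process-level crash protection. Accepting an emitter target is
// an illustration convenience; the real code attaches to `process` directly.
function installCrashProtection(
  target: EventEmitter = process,
  log: (msg: string) => void = console.error,
) {
  target.on("uncaughtException", (err: unknown) => {
    // Surface the error and keep the host process alive.
    log(`uncaughtException: ${err instanceof Error ? err.stack ?? err.message : String(err)}`);
  });
  target.on("unhandledRejection", (reason: unknown) => {
    log(`unhandledRejection: ${String(reason)}`);
  });
}
```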
### 7. CLAUDE.md updates
- All new sections: lazy task creation rules, no-silent-catch policy, React hook dependency safety, dev workflow instructions, debugging section
### 8. `requireWorkspaceTask` uses `getOrCreate`
- User-initiated actions (createSession, sendMessage, etc.) use `getOrCreate` to lazily materialize virtual tasks
- The `getOrCreate` call passes `{ organizationId, repoId, taskId }` as `createWithInput`
### 9. `getTask` uses `getOrCreate` with `resolveTaskRepoId`
- When `repoId` is not provided (sandbox actor), resolves from task index
- Uses `getOrCreate` since the task may be virtual
### 10. Audit log deleted workflow file
- `foundry/packages/backend/src/actors/audit-log/workflow.ts` was deleted
- The audit-log actor was simplified to a single `append` action
- Keep this simplification — audit-log doesn't need a workflow
### 11. Task owner (primary user) system
- New `task_owner` single-row table in task actor DB schema (`foundry/packages/backend/src/actors/task/db/schema.ts`) — stores `primaryUserId`, `primaryGithubLogin`, `primaryGithubEmail`, `primaryGithubAvatarUrl`
- New migration in `foundry/packages/backend/src/actors/task/db/migrations.ts` creating the `task_owner` table
- `primaryUserLogin` and `primaryUserAvatarUrl` columns added to org's `taskSummaries` table (`foundry/packages/backend/src/actors/organization/db/schema.ts`) + corresponding migration
- `readTaskOwner()`, `upsertTaskOwner()` helpers in `workspace.ts`
- `maybeSwapTaskOwner()` — called from `sendWorkspaceMessage()`, checks if a different user is sending and swaps owner + injects git credentials into sandbox
- `changeTaskOwnerManually()` — called from the new `changeOwner` action on the task actor, updates owner without injecting credentials (credentials injected on next message from that user)
- `injectGitCredentials()` — pushes `git config user.name/email` + credential store file into the sandbox via `runProcess`
- `resolveGithubIdentity()` — resolves user's GitHub login/email/avatar/accessToken from their auth session
- `buildTaskSummary()` now includes `primaryUserLogin` and `primaryUserAvatarUrl` in the summary pushed to org coordinator
- New `changeOwner` action on task actor in `workflow/index.ts`
- New `changeWorkspaceTaskOwner` action on org actor in `actions/tasks.ts`
- New `TaskWorkspaceChangeOwnerInput` type in shared types (`foundry/packages/shared/src/workspace.ts`)
- `TaskSummary` type extended with `primaryUserLogin` and `primaryUserAvatarUrl`
### 12. Task owner UI
- New "Overview" tab in right sidebar (`foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx`) — shows current owner with avatar, click to open dropdown of org members to change owner
- `onChangeOwner` and `members` props added to `RightSidebar` component
- Primary user login shown in green in left sidebar task items (`foundry/packages/frontend/src/components/mock-layout/sidebar.tsx`)
- `changeWorkspaceTaskOwner` method added to backend client and workspace client interfaces
### 13. Client changes for task owner
- `changeWorkspaceTaskOwner()` added to `backend-client.ts` and all workspace client implementations (mock, remote)
- Mock workspace client implements the owner change
- Subscription manager test updated for new task summary shape
## What to REVERT (communication pattern only)
For each actor, revert from direct action calls back to queue sends with `expectQueueResponse` / fire-and-forget patterns. The reference for the queue patterns is `main` at `32f3c6c3`.
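For orientation, the two call shapes differ roughly as follows. The handle interface and queue-name helper here are stand-ins, not the actual RivetKit client API:

```typescript
// Illustrative contrast only; the handle shape is a stand-in.
interface ActorHandle {
  initialize(input: unknown): Promise<void>;             // direct action call
  send(queueName: string, body: unknown): Promise<void>; // fire-and-forget queue send
}

const taskWorkflowQueueName = (cmd: string) => `queue:${cmd}`; // stand-in helper

// Actions pattern (current branch, to be reverted):
async function initializeViaAction(task: ActorHandle, input: unknown) {
  await task.initialize(input);
}

// Queue pattern (as on `main`): the workflow's command loop picks the
// message up and dispatches it to the matching mutation handler.
async function initializeViaQueue(task: ActorHandle, input: unknown) {
  await task.send(taskWorkflowQueueName("task.command.initialize"), input);
}
```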
### 1. Organization actor (`foundry/packages/backend/src/actors/organization/`)
**`index.ts`:**
- Revert from actions-only to `run: workflow(runOrganizationWorkflow)`
- Keep the actions that are pure reads (getAppSnapshot, getOrganizationSummarySnapshot, etc.)
- Mutations should go through the workflow queue command loop
**`workflow.ts`:**
- Restore `runOrganizationWorkflow` with the `ctx.loop("organization-command-loop", ...)` that dispatches queue names to mutation handlers
- Restore `ORGANIZATION_QUEUE_NAMES` and `COMMAND_HANDLERS`
- Restore `organizationWorkflowQueueName()` helper
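A hedged sketch of the command-loop shape this restores. `WorkflowCtx`, `nextQueueMessage`, and the handler map below are simplified stand-ins for the RivetKit workflow API:

```typescript
type CommandHandler = (body: unknown) => Promise<void> | void;

// Example handler: records applied updates (stand-in for the real mutations).
const applied: unknown[] = [];
const COMMAND_HANDLERS: Record<string, CommandHandler> = {
  "organization.command.applyTaskSummaryUpdate": (body) => { applied.push(body); },
};

interface QueueMessage { name: string; body: unknown }
interface WorkflowCtx {
  loop(name: string, fn: () => Promise<"continue" | "break">): Promise<void>;
  nextQueueMessage(): Promise<QueueMessage | undefined>; // stand-in
}

async function runOrganizationWorkflow(ctx: WorkflowCtx) {
  await ctx.loop("organization-command-loop", async () => {
    const msg = await ctx.nextQueueMessage();
    if (!msg) return "break";
    const handler = COMMAND_HANDLERS[msg.name];
    if (!handler) throw new Error(`Unknown organization command: ${msg.name}`);
    await handler(msg.body);
    return "continue";
  });
}
```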
**`app-shell.ts`:**
- Revert direct action calls back to queue sends: `sendOrganizationCommand(org, "organization.command.X", body)` pattern
- Revert `githubData.syncRepos(...)` → `githubData.send(githubDataWorkflowQueueName("syncRepos"), ...)`
- But KEEP the `getOrganizationContext` override threading fix
**`actions/tasks.ts`:**
- Keep `resolveTaskRepoId` (replacing `requireRepoExists`)
- Keep `requireWorkspaceTask` using `getOrCreate`
- Keep `getTask` using `getOrCreate` with `resolveTaskRepoId`
- Keep `getTaskIndexEntry`
- Keep `changeWorkspaceTaskOwner` (new action — delegates to task actor's `changeOwner`)
- Revert task actor calls from direct actions to queue sends where applicable
**`actions/task-mutations.ts`:**
- Keep lazy task creation (virtual entries in org tables)
- Revert `taskHandle.initialize(...)` → `taskHandle.send(taskWorkflowQueueName("task.command.initialize"), ...)`
- Revert `task.pullRequestSync(...)` → `task.send(taskWorkflowQueueName("task.command.pullRequestSync"), ...)`
- Revert `auditLog.append(...)` → `auditLog.send("auditLog.command.append", ...)`
**`actions/organization.ts`:**
- Revert direct calls to org workflow back to queue sends
**`actions/github.ts`:**
- Revert direct calls back to queue sends
### 2. Task actor (`foundry/packages/backend/src/actors/task/`)
**`index.ts`:**
- Revert from actions-only to `run: workflow(runTaskWorkflow)` (or plain `run` with queue iteration)
- Keep read actions: `get`, `getTaskSummary`, `getTaskDetail`, `getSessionDetail`
**`workflow/index.ts`:**
- Restore `taskCommandActions` as queue handlers in the workflow command loop
- Restore `TASK_QUEUE_NAMES` and dispatch map
- Add `changeOwner` to the queue dispatch map (new command, not in `main` — add as `task.command.changeOwner`)
**`workspace.ts`:**
- Revert sandbox/org action calls back to queue sends where they were queue-based before
- Keep ALL task owner code: `readTaskOwner`, `upsertTaskOwner`, `maybeSwapTaskOwner`, `changeTaskOwnerManually`, `injectGitCredentials`, `resolveGithubIdentity`
- Keep the `authSessionId` param added to `ensureSandboxRepo`
- Keep the `maybeSwapTaskOwner` call in `sendWorkspaceMessage`
- Keep `primaryUserLogin`/`primaryUserAvatarUrl` in `buildTaskSummary`
### 3. User actor (`foundry/packages/backend/src/actors/user/`)
**`index.ts`:**
- Revert from actions-only to `run: workflow(runUserWorkflow)` (or plain run with queue iteration)
**`workflow.ts`:**
- Restore queue command loop dispatching to mutation functions
### 4. GitHub-data actor (`foundry/packages/backend/src/actors/github-data/`)
**`index.ts`:**
- Revert from actions-only to having a run handler with queue iteration
- Keep the `getOrganizationContext` override threading fix
- Keep the `actionTimeout: 10 * 60_000` for long sync operations
### 5. Audit-log actor
- Keep as actions-only (simplified). No need to revert — it's simpler with just `append`.
### 6. Callers
**`foundry/packages/backend/src/services/better-auth.ts`:**
- Revert direct user actor action calls back to queue sends
**`foundry/packages/backend/src/actors/sandbox/index.ts`:**
- Revert `organization.getTask(...)` → queue send if it was queue-based before
- Keep the E2B timeout fix and listProcesses error handling
## Step-by-step procedure
1. Create a new branch from `task-owner-git-auth` (current HEAD)
2. For each actor, open a 3-way comparison: `main` (original queues), `queues-to-actions` (current), and your working copy
3. Restore queue/workflow run handlers and command loops from `main`
4. Restore queue name helpers and constants from `main`
5. Restore caller sites to use queue sends from `main`
6. Carefully preserve all items in the "KEEP" list above
7. Test: `cd foundry && docker compose -f compose.dev.yaml up -d`, sign in, verify GitHub sync completes, verify tasks show in sidebar, verify session creation works
8. Nuke RivetKit data between test runs: `docker volume rm foundry_foundry_rivetkit_storage`
## Verification checklist
- [ ] GitHub sync completes (160 repos for rivet-dev)
- [ ] Tasks show in sidebar (from PR sync, lazy/virtual entries)
- [ ] No task actors spawned during sync (check RivetKit inspector — should see 0 task actors until user clicks one)
- [ ] Clicking a task materializes the actor (lazy creation via getOrCreate)
- [ ] Session creation works on sandbox-agent-testing repo
- [ ] E2B sandbox provisions and connects
- [ ] Agent responds to messages
- [ ] No 500 errors in backend logs (except expected E2B sandbox expiry)
- [ ] Workflow history visible in RivetKit inspector for org, task, user actors
- [ ] CLAUDE.md constraints still documented and respected
- [ ] Task owner shows in right sidebar "Overview" tab
- [ ] Owner dropdown shows org members and allows switching
- [ ] Sending a message as a different user swaps the owner
- [ ] Primary user login shown in green on sidebar task items
- [ ] Git credentials injected into sandbox on owner swap (check `/home/user/.git-token` exists)


@ -1,94 +0,0 @@
# Proposal: RivetKit Sandbox Actor Resilience
## Context
The rivetkit sandbox actor (`src/sandbox/actor.ts`) does not handle the case where the underlying cloud sandbox (e.g. E2B VM) is destroyed while the actor is still alive. This causes cascading 500 errors when the actor tries to call the dead sandbox. Additionally, a UNIQUE constraint bug in event persistence crashes the host process.
The sandbox-agent repo (which defines the E2B provider) will be updated separately to use `autoPause` and expose `pause()`/typed errors. This proposal covers the rivetkit-side changes needed to handle those signals.
## Changes
### 1. Fix `persistObservedEnvelope` UNIQUE constraint crash
**File:** `insertEvent` in the sandbox actor's SQLite persistence layer
The `sandbox_agent_events` table has a UNIQUE constraint on `(session_id, event_index)`. When the same event is observed twice (reconnection, replay, duplicate WebSocket delivery), the insert throws and crashes the host process as an unhandled rejection.
**Fix:** Change the INSERT to `INSERT OR IGNORE` / `ON CONFLICT DO NOTHING`. Duplicate events are expected and harmless — they should be silently deduplicated at the persistence layer.
### 2. Handle destroyed sandbox in `ensureAgent()`
**File:** `src/sandbox/actor.ts` — `ensureAgent()` function
When the provider's `start()` is called with an existing `sandboxId` and the sandbox no longer exists, the provider throws a typed `SandboxDestroyedError` (defined in the sandbox-agent provider contract).
`ensureAgent()` should catch this error and check the `onSandboxExpired` config option:
```typescript
// New config option on sandboxActor()
onSandboxExpired?: "destroy" | "recreate"; // default: "destroy"
```
**`"destroy"` (default):**
- Set `state.sandboxDestroyed = true`
- Emit `sandboxExpired` event to all connected clients
- All subsequent action calls (runProcess, createSession, etc.) return a clear error: "Sandbox has expired. Create a new task to continue."
- The sandbox actor stays alive (preserves session history, audit log) but rejects new work
**`"recreate"`:**
- Call provider `create()` to provision a fresh sandbox
- Store new `sandboxId` in state
- Emit `sandboxRecreated` event to connected clients with a notice that sessions are lost (new VM, no prior state)
- Resume normal operation with the new sandbox
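The branch logic can be sketched as follows; `SandboxDestroyedError`, the provider shape, and the event names mirror this proposal rather than any existing API:

```typescript
// Stand-ins named after the proposal, not the real sandbox-agent contract.
class SandboxDestroyedError extends Error {}

interface Provider {
  start(opts: { sandboxId: string }): Promise<void>;
  create(): Promise<{ sandboxId: string }>;
}

interface SandboxState {
  sandboxId: string;
  sandboxDestroyed: boolean;
}

async function ensureAgent(
  provider: Provider,
  state: SandboxState,
  onSandboxExpired: "destroy" | "recreate" = "destroy",
  broadcast: (event: string, body: unknown) => void = () => {},
): Promise<void> {
  try {
    await provider.start({ sandboxId: state.sandboxId });
  } catch (err) {
    if (!(err instanceof SandboxDestroyedError)) throw err;
    if (onSandboxExpired === "destroy") {
      // Stay alive for history/audit reads, but reject new work.
      state.sandboxDestroyed = true;
      broadcast("sandboxExpired", {});
      return;
    }
    // "recreate": fresh VM, prior sessions are lost.
    const fresh = await provider.create();
    state.sandboxId = fresh.sandboxId;
    broadcast("sandboxRecreated", { sessionsLost: true });
  }
}
```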
### 3. Expose `pause` action
**File:** `src/sandbox/actor.ts` — actions
Add a `pause` action that delegates to the provider's `pause()` method. This is user-initiated only (e.g. user clicks "Pause sandbox" in UI to save credits). The sandbox actor should never auto-pause.
```typescript
async pause(c) {
  await c.provider.pause();
  c.state.sandboxPaused = true;
  c.broadcast("sandboxPaused", {});
}
```
### 4. Expose `resume` action
**File:** `src/sandbox/actor.ts` — actions
Add a `resume` action for explicit recovery. Calls `provider.start({ sandboxId: state.sandboxId })` which auto-resumes if paused.
```typescript
async resume(c) {
  await ensureAgent(c); // handles reconnect internally
  c.state.sandboxPaused = false;
  c.broadcast("sandboxResumed", {});
}
```
### 5. Keep-alive while sessions are active
**File:** `src/sandbox/actor.ts`
While the sandbox actor has connected WebSocket clients, periodically extend the underlying sandbox TTL to prevent it from being garbage collected mid-session.
- On first client connect: start a keep-alive interval (e.g. every 2 minutes)
- Each tick: call `provider.extendTimeout(extensionMs)` (the provider maps this to `sandbox.setTimeout()` for E2B)
- On last client disconnect: clear the interval, let the sandbox idle toward its natural timeout
This prevents the common case where a user is actively working but the sandbox expires because the E2B default timeout (5 min) is too short. The `timeoutMs` in create options is the initial TTL; keep-alive extends it dynamically.
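The lifecycle above can be sketched as a small helper, assuming the `extendTimeout` provider method from the dependency list; the connect/disconnect hooks are stand-ins:

```typescript
// Provider method from the dependency list; everything else is a sketch.
interface KeepAliveProvider {
  extendTimeout(ms: number): void | Promise<void>;
}

function createKeepAlive(
  provider: KeepAliveProvider,
  tickMs = 2 * 60_000,       // extend every ~2 minutes
  extensionMs = 10 * 60_000, // each tick pushes the TTL out this far
) {
  let clients = 0;
  let timer: ReturnType<typeof setInterval> | undefined;
  return {
    onConnect() {
      // First client connect: start extending the sandbox TTL periodically.
      if (++clients === 1) {
        timer = setInterval(() => void provider.extendTimeout(extensionMs), tickMs);
      }
    },
    onDisconnect() {
      // Last client disconnect: stop; sandbox idles toward its natural timeout.
      if (--clients === 0 && timer !== undefined) {
        clearInterval(timer);
        timer = undefined;
      }
    },
  };
}
```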
## Key invariant
**Never silently fail.** Every destroyed/expired/error state must be surfaced to connected clients via events. The actor must always tell the UI what happened so the user can act on it. See CLAUDE.md "never silently catch errors" rule.
## Dependencies
These changes depend on the sandbox-agent provider contract exposing:
- `pause()` method
- `extendTimeout(ms)` method
- Typed `SandboxDestroyedError` thrown from `start()` when sandbox is gone
- `start()` auto-resuming paused sandboxes via `Sandbox.connect(sandboxId)`
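Expressed as a hypothetical TypeScript contract (names follow this proposal; the real sandbox-agent interface may differ):

```typescript
// Hypothetical provider contract assembled from the dependency list above.
class SandboxDestroyedError extends Error {
  constructor(sandboxId: string) {
    super(`Sandbox ${sandboxId} no longer exists`);
    this.name = "SandboxDestroyedError";
  }
}

interface SandboxProvider {
  /** Start or reconnect; auto-resumes a paused sandbox via Sandbox.connect(sandboxId).
   *  Throws SandboxDestroyedError when the sandbox is gone. */
  start(opts: { sandboxId?: string }): Promise<{ sandboxId: string }>;
  /** User-initiated pause (never called automatically by the actor). */
  pause(): Promise<void>;
  /** Extend the sandbox TTL; the E2B provider maps this to sandbox.setTimeout(). */
  extendTimeout(ms: number): Promise<void>;
}
```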


@ -1,200 +0,0 @@
# Proposal: Task Primary Owner & Git Authentication
## Problem
Sandbox git operations (commit, push, PR creation) require authentication.
Currently, the sandbox has no user-scoped credentials. The E2B sandbox
clones repos using the GitHub App installation token, but push operations
need user-scoped auth so commits are attributed correctly and branch
protection rules are enforced.
## Design
### Concept: Primary User per Task
Each task has a **primary user** (the "owner"). This is the last user who
sent a message on the task. Their GitHub OAuth credentials are injected
into the sandbox for git operations. When the owner changes, the sandbox
git config and credentials swap to the new user.
### Data Model
**Task actor DB** -- new `task_owner` single-row table:
- `primaryUserId` (text) -- better-auth user ID
- `primaryGithubLogin` (text) -- GitHub username (for `git config user.name`)
- `primaryGithubEmail` (text) -- GitHub email (for `git config user.email`)
- `primaryGithubAvatarUrl` (text) -- avatar for UI display
- `updatedAt` (integer)
**Org coordinator** -- add to `taskSummaries` table:
- `primaryUserLogin` (text, nullable)
- `primaryUserAvatarUrl` (text, nullable)
### Owner Swap Flow
Triggered when `sendWorkspaceMessage` is called with a different user than
the current primary:
1. `sendWorkspaceMessage(authSessionId, ...)` resolves user from auth session
2. Look up user's GitHub identity from auth account table (`providerId = "github"`)
3. Compare `primaryUserId` with current owner. If different:
a. Update `task_owner` row in task actor DB
b. Get user's OAuth `accessToken` from auth account
c. Push into sandbox via `runProcess`:
- `git config user.name "{login}"`
- `git config user.email "{email}"`
- Write token to `/home/user/.git-token` (or equivalent)
d. Push updated task summary to org coordinator (includes `primaryUserLogin`)
e. Broadcast `taskUpdated` to connected clients
4. If same user, no-op (token is still valid)
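The flow above can be sketched as a function over injected dependencies; every helper name is a stand-in taken from this proposal, not an existing implementation:

```typescript
interface GithubIdentity {
  userId: string;
  login: string;
  email: string;
  avatarUrl: string;
  accessToken: string;
}

interface SwapDeps {
  upsertTaskOwner(identity: GithubIdentity): Promise<void>;      // step 3a
  injectGitCredentials(identity: GithubIdentity): Promise<void>; // steps 3b+3c
  pushTaskSummary(): Promise<void>;                              // step 3d
  broadcast(event: string): void;                                // step 3e
}

async function maybeSwapTaskOwner(
  currentOwnerId: string | undefined,
  identity: GithubIdentity,
  deps: SwapDeps,
): Promise<boolean> {
  // Step 4: same user sending again is a no-op; the token is still valid.
  if (currentOwnerId === identity.userId) return false;
  await deps.upsertTaskOwner(identity);
  await deps.injectGitCredentials(identity);
  await deps.pushTaskSummary();
  deps.broadcast("taskUpdated");
  return true;
}
```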
### Token Injection
The user's GitHub OAuth token (stored in better-auth account table) has
`repo` scope (verified -- see `better-auth.ts` line 480: `scope: ["read:org", "repo"]`).
This is a standard **OAuth App** flow (not GitHub App OAuth). OAuth App
tokens do not expire unless explicitly revoked. No refresh logic is needed.
**Injection method:**
On first sandbox repo setup (`ensureSandboxRepo`), configure:
```bash
# Write the credential store file. Format: https://{login}:{token}@github.com
echo "https://{login}:{token}@github.com" > /home/user/.git-token
chmod 600 /home/user/.git-token
# Configure git to read credentials from it
git config --global credential.helper 'store --file=/home/user/.git-token'
```
On owner swap, overwrite `/home/user/.git-token` with new user's credentials.
**Important: git should never prompt for credentials.** The credential
store file ensures all git operations are auto-authenticated. No
`GIT_ASKPASS` prompts, no interactive auth.
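A hedged sketch of the injection helper, composing the shell commands above over the `runProcess` primitive; `runProcess` and the file path come from this document, while the exact command composition is an illustration:

```typescript
// Stand-in for the sandbox's runProcess action.
type RunProcess = (command: string) => Promise<void>;

async function injectGitCredentials(
  run: RunProcess,
  login: string,
  email: string,
  token: string,
): Promise<void> {
  const file = "/home/user/.git-token";
  // Credential store line format: https://{login}:{token}@github.com
  await run(`printf '%s\\n' 'https://${login}:${token}@github.com' > ${file}`);
  await run(`chmod 600 ${file}`);
  // Attribute commits to the current owner.
  await run(`git config --global user.name '${login}'`);
  await run(`git config --global user.email '${email}'`);
  // Point git's store helper at the file so no operation ever prompts.
  await run(`git config --global credential.helper 'store --file=${file}'`);
}
```

On owner swap, calling this again simply overwrites the file and git config with the new user's values.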
**Race condition (expected behavior):** If User A sends a message and the
agent starts a long git operation, then User B sends a message and triggers
an owner swap, the in-flight git process still has User A's credentials
(already read from the credential store). The next git operation uses
User B's credentials. This is expected behavior -- document in comments.
### Token Validity
OAuth App tokens (our flow) do not expire. They persist until the user
revokes them or the OAuth App is deauthorized. No periodic refresh needed.
If a token becomes invalid (user revokes), git operations will fail with
a 401. The error surfaces through the standard `ensureSandboxRepo` /
`runProcess` error path and is displayed in the UI.
### User Removal
When a user is removed from the organization:
1. Org actor queries active tasks with that user as primary owner
2. For each, clear the `task_owner` row
3. Task actor clears the sandbox git credentials (overwrite credential file)
4. Push updated task summaries to org coordinator
5. Subsequent git operations fail with "No active owner -- assign an owner to enable git operations"
### UI Changes
**Right sidebar -- new "Overview" tab:**
- Add as a new tab alongside "Changes" and "All Files"
- Shows current primary user: avatar, name, login
- Click on the user -> dropdown of all workspace users (from org member list)
- Select a user -> triggers explicit owner swap (same flow as message-triggered)
- Also shows task metadata: branch, repo, created date
**Left sidebar -- task items:**
- Show primary user's GitHub login in green text next to task name
- Only shown when there is an active owner
**Task detail header:**
- Show small avatar of primary user next to task title
### Org Coordinator
`commandApplyTaskSummaryUpdate` already receives the full task summary
from the task actor. Add `primaryUserLogin` and `primaryUserAvatarUrl`
to the summary payload. The org writes it to `taskSummaries`. The sidebar
reads it from the org snapshot.
### Sandbox Architecture Note
Structurally, the system supports multiple sandboxes per task, but in
practice there is exactly one active sandbox per task. Design the owner
injection assuming one sandbox. The token is injected into the active
sandbox only. If multi-sandbox support is needed in the future, extend
the injection to target specific sandbox IDs.
## Security Considerations
### OAuth Token Scope
The user's GitHub OAuth token has `repo` scope, which grants **full control
of all private repositories** the user has access to. When injected into
the sandbox:
- The agent can read/write ANY repo the user has access to, not just the
task's target repo
- The token persists in the sandbox filesystem until overwritten
- Any process running in the sandbox can read the credential file
**Mitigations:**
- Credential file has `chmod 600` (owner-read-only)
- Sandbox is isolated per-task (E2B VM boundary)
- Token is overwritten on owner swap (old user's token removed)
- Token is cleared on user removal from org
- Sandbox has a finite lifetime (E2B timeout + autoPause)
**Accepted risk:** This is the standard trade-off for OAuth-based git
integrations (same as GitHub Codespaces, Gitpod, etc.). The user consents
to `repo` scope at sign-in time. Document this in user-facing terms in
the product's security/privacy page.
### Future: Fine-grained tokens
GitHub supports fine-grained personal access tokens scoped to specific
repos. A future improvement could mint per-repo tokens instead of using
the user's full OAuth token. This requires the user to create and manage
fine-grained tokens, which adds friction. Evaluate based on user feedback.
## Implementation Order
1. Add `task_owner` table to task actor schema + migration
2. Add `primaryUserLogin` / `primaryUserAvatarUrl` to `taskSummaries` schema + migration
3. Implement owner swap in `sendWorkspaceMessage` flow
4. Implement credential injection in `ensureSandboxRepo`
5. Implement credential swap via `runProcess` on owner change
6. Implement user removal cleanup in org actor
7. Add "Overview" tab to right sidebar
8. Add owner display to left sidebar task items
9. Add owner picker dropdown in Overview tab
10. Update org coordinator to propagate owner in task summaries
## Files to Modify
### Backend
- `foundry/packages/backend/src/actors/task/db/schema.ts` -- add `task_owner` table
- `foundry/packages/backend/src/actors/task/db/migrations.ts` -- add migration
- `foundry/packages/backend/src/actors/organization/db/schema.ts` -- add owner columns to `taskSummaries`
- `foundry/packages/backend/src/actors/organization/db/migrations.ts` -- add migration
- `foundry/packages/backend/src/actors/task/workspace.ts` -- owner swap logic in `sendWorkspaceMessage`, credential injection in `ensureSandboxRepo`
- `foundry/packages/backend/src/actors/task/workflow/index.ts` -- wire owner swap action
- `foundry/packages/backend/src/actors/organization/actions/task-mutations.ts` -- propagate owner in summaries
- `foundry/packages/backend/src/actors/organization/actions/tasks.ts` -- `sendWorkspaceMessage` owner check
- `foundry/packages/backend/src/services/better-auth.ts` -- expose `getAccessTokenForSession` for owner lookup
### Shared
- `foundry/packages/shared/src/types.ts` -- add `primaryUserLogin` to `TaskSummary`
### Frontend
- `foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx` -- add Overview tab
- `foundry/packages/frontend/src/components/organization-dashboard.tsx` -- show owner in sidebar task items
- `foundry/packages/frontend/src/components/mock-layout.tsx` -- wire Overview tab state

.gitignore vendored

@ -59,3 +59,4 @@ sdks/cli/platforms/*/bin/
# Foundry desktop app build artifacts
foundry/packages/desktop/frontend-dist/
foundry/packages/desktop/src-tauri/sidecars/
.context/


@ -22,19 +22,6 @@
- `server/packages/sandbox-agent/src/cli.rs`
- Keep docs aligned to implemented endpoints/commands only (for example ACP under `/v1/acp`, not legacy `/v1/sessions` APIs).
## E2E Agent Testing
- When asked to test agents e2e and you do not have the API tokens/credentials required, always stop and ask the user where to find the tokens before proceeding.
## ACP Adapter Audit
- `scripts/audit-acp-deps/adapters.json` is the single source of truth for ACP adapter npm packages, pinned versions, and the `@agentclientprotocol/sdk` pin.
- The Rust fallback install path in `server/packages/agent-management/src/agents.rs` reads adapter entries from `adapters.json` at compile time via `include_str!`.
- Run `cd scripts/audit-acp-deps && npx tsx audit.ts` to compare our pinned versions against the ACP registry and npm latest.
- When bumping an adapter version, update `adapters.json` only — the Rust code picks it up automatically.
- When adding a new agent, add an entry to `adapters.json` (the `_` fallback arm in `install_agent_process_fallback` handles it).
- When updating the `@agentclientprotocol/sdk` pin, update both `adapters.json` (sdkDeps) and `sdks/acp-http-client/package.json`.
## Change Tracking
- If the user asks to "push" changes, treat that as permission to commit and push all current workspace changes, not a hand-picked subset, unless the user explicitly scopes the push.
@ -43,41 +30,22 @@
- Regenerate `docs/openapi.json` when HTTP contracts change.
- Keep `docs/inspector.mdx` and `docs/sdks/typescript.mdx` aligned with implementation.
- Append blockers/decisions to `research/acp/friction.md` during ACP work.
- Each agent has its own doc page at `docs/agents/<name>.mdx` listing models, modes, and thought levels. Update the relevant page when changing `fallback_config_options`. To regenerate capability data, run `cd scripts/agent-configs && npx tsx dump.ts`. Source data: `scripts/agent-configs/resources/*.json` and hardcoded entries in `server/packages/sandbox-agent/src/router/support.rs` (`fallback_config_options`).
- `docs/agent-capabilities.mdx` lists models/modes/thought levels per agent. Update it when adding a new agent or changing `fallback_config_options`. If its "Last updated" date is >2 weeks old, re-run `cd scripts/agent-configs && npx tsx dump.ts` and update the doc to match. Source data: `scripts/agent-configs/resources/*.json` and hardcoded entries in `server/packages/sandbox-agent/src/router/support.rs` (`fallback_config_options`).
- Some agent models are gated by subscription (e.g. Claude `opus`). The live report only shows models available to the current credentials. The static doc and JSON resource files should list all known models regardless of subscription tier.
## Adding Providers
When adding a new sandbox provider, update all of the following:
- `sdks/typescript/src/providers/<name>.ts` — provider implementation
- `sdks/typescript/package.json` — add `./<name>` export, peerDependencies, peerDependenciesMeta, devDependencies
- `sdks/typescript/tsup.config.ts` — add entry point and external
- `sdks/typescript/tests/providers.test.ts` — add test entry
- `examples/<name>/` — create example with `src/index.ts` and `tests/<name>.test.ts`
- `docs/deploy/<name>.mdx` — create deploy guide
- `docs/docs.json` — add to Deploy pages navigation
- `docs/quickstart.mdx` — add tab in "Start the sandbox" step, add credentials entry in "Passing LLM credentials" accordion
## Docker Test Image
- Docker-backed Rust and TypeScript tests build `docker/test-agent/Dockerfile` directly in-process and cache the image tag only in memory (`OnceLock` in Rust, module-level variable in TypeScript).
- Do not add cross-process image-build scripts unless there is a concrete need for them.
## Adding Agents
When adding a new agent, update all of the following:
- `docs/agents/<name>.mdx` — create agent page with usage snippet and capabilities table
- `docs/docs.json` — add to the Agents group under Agent
- `docs/quickstart.mdx` — add tab in the "Create a session and send a prompt" CodeGroup
## Persist Packages (Deprecated)
- The `@sandbox-agent/persist-*` npm packages (`persist-sqlite`, `persist-postgres`, `persist-indexeddb`, `persist-rivet`) are deprecated stubs. They still publish to npm but throw a deprecation error at import time.
- Driver implementations now live inline in examples and consuming packages:
  - SQLite: `examples/persist-sqlite/src/persist.ts`
  - Postgres: `examples/persist-postgres/src/persist.ts`
  - IndexedDB: `frontend/packages/inspector/src/persist-indexeddb.ts`
  - Rivet: inlined in `docs/multiplayer.mdx`
  - In-memory: built into the main `sandbox-agent` SDK (`InMemorySessionPersistDriver`)
- Docs (`docs/session-persistence.mdx`) link to the example implementations on GitHub instead of referencing the packages.
- Do not re-add `@sandbox-agent/persist-*` as dependencies anywhere. New persist drivers should be copied into the consuming project directly.
## Common Software Sync
- These three files must stay in sync:
  - `docs/common-software.mdx` (user-facing documentation)
  - `docker/test-common-software/Dockerfile` (packages installed in the test image)
  - `server/packages/sandbox-agent/tests/common_software.rs` (test assertions)
- When adding or removing software from `docs/common-software.mdx`, also add/remove the corresponding `apt-get install` line in the Dockerfile and add/remove the test in `common_software.rs`.
- Run `cargo test -p sandbox-agent --test common_software` to verify.
## Install Version References
@ -93,27 +61,20 @@ When adding a new agent, update all of the following:
- `docs/sdk-overview.mdx`
- `docs/react-components.mdx`
- `docs/session-persistence.mdx`
- `docs/architecture.mdx`
- `docs/deploy/local.mdx`
- `docs/deploy/cloudflare.mdx`
- `docs/deploy/vercel.mdx`
- `docs/deploy/daytona.mdx`
- `docs/deploy/e2b.mdx`
- `docs/deploy/docker.mdx`
- `docs/deploy/boxlite.mdx`
- `docs/deploy/modal.mdx`
- `docs/deploy/computesdk.mdx`
- `frontend/packages/website/src/components/GetStarted.tsx`
- `.claude/commands/post-release-testing.md`
- `examples/cloudflare/Dockerfile`
- `examples/boxlite/Dockerfile`
- `examples/boxlite-python/Dockerfile`
- `examples/daytona/src/index.ts`
- `examples/shared/src/docker.ts`
- `examples/docker/src/index.ts`
- `examples/e2b/src/index.ts`
- `examples/vercel/src/index.ts`
- `sdks/typescript/src/providers/shared.ts`
- `scripts/release/main.ts`
- `scripts/release/promote-artifacts.ts`
- `scripts/release/sdk.ts`


@ -0,0 +1,7 @@
FROM node:22-bookworm-slim
RUN npm install -g pnpm@10.28.2
WORKDIR /app
CMD ["bash", "-lc", "pnpm install --filter @sandbox-agent/inspector... && cd frontend/packages/inspector && exec pnpm vite --host 0.0.0.0 --port 5173"]


@ -149,7 +149,8 @@ FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y \
ca-certificates \
curl \
git && \
git \
ffmpeg && \
rm -rf /var/lib/apt/lists/*
# Copy the binary from builder


@ -0,0 +1,61 @@
FROM rust:1.88.0-bookworm AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY server/ ./server/
COPY gigacode/ ./gigacode/
COPY resources/agent-schemas/artifacts/ ./resources/agent-schemas/artifacts/
COPY scripts/agent-configs/ ./scripts/agent-configs/
COPY scripts/audit-acp-deps/ ./scripts/audit-acp-deps/
ENV SANDBOX_AGENT_SKIP_INSPECTOR=1
RUN --mount=type=cache,target=/usr/local/cargo/registry \
--mount=type=cache,target=/usr/local/cargo/git \
--mount=type=cache,target=/build/target \
cargo build -p sandbox-agent --release && \
cp target/release/sandbox-agent /sandbox-agent
# Extract neko binary from the official image for WebRTC desktop streaming.
# Using neko v3 base image from GHCR which provides multi-arch support (amd64, arm64).
# Pinned by digest to prevent breaking changes from upstream.
# Reference client: https://github.com/demodesk/neko-client/blob/37f93eae6bd55b333c94bd009d7f2b079075a026/src/component/internal/webrtc.ts
FROM ghcr.io/m1k1o/neko/base@sha256:0c384afa56268aaa2d5570211d284763d0840dcdd1a7d9a24be3081d94d3dfce AS neko-base
FROM node:22-bookworm-slim
RUN apt-get update -qq && \
apt-get install -y -qq --no-install-recommends \
ca-certificates \
bash \
libstdc++6 \
xvfb \
openbox \
xdotool \
imagemagick \
ffmpeg \
gstreamer1.0-tools \
gstreamer1.0-plugins-base \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-ugly \
gstreamer1.0-nice \
gstreamer1.0-x \
gstreamer1.0-pulseaudio \
libxcvt0 \
x11-xserver-utils \
dbus-x11 \
xauth \
fonts-dejavu-core \
xterm \
> /dev/null 2>&1 && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder /sandbox-agent /usr/local/bin/sandbox-agent
COPY --from=neko-base /usr/bin/neko /usr/local/bin/neko
EXPOSE 3000
# Expose UDP port range for WebRTC media transport
EXPOSE 59050-59070/udp
ENTRYPOINT ["/usr/local/bin/sandbox-agent"]
CMD ["server", "--host", "0.0.0.0", "--port", "3000", "--no-token"]


@ -0,0 +1,37 @@
# Extends the base test-agent image with common software pre-installed.
# Used by the common_software integration test to verify that all documented
# software in docs/common-software.mdx works correctly inside the sandbox.
#
# KEEP IN SYNC with docs/common-software.mdx
ARG BASE_IMAGE=sandbox-agent-test:dev
FROM ${BASE_IMAGE}
USER root
RUN apt-get update -qq && \
apt-get install -y -qq --no-install-recommends \
# Browsers
chromium \
firefox-esr \
# Languages
python3 python3-pip python3-venv \
default-jdk \
ruby-full \
# Databases
sqlite3 \
redis-server \
# Build tools
build-essential cmake pkg-config \
# CLI tools
git jq tmux \
# Media and graphics
imagemagick \
poppler-utils \
# Desktop apps
gimp \
> /dev/null 2>&1 && \
rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/local/bin/sandbox-agent"]
CMD ["server", "--host", "0.0.0.0", "--port", "3000", "--no-token"]


@ -37,6 +37,36 @@ Notes:
- Set `SANDBOX_AGENT_LOG_STDOUT=1` to force stdout/stderr logging.
- Use `SANDBOX_AGENT_LOG_DIR` to override log directory.
## install
Install first-party runtime dependencies.
### install desktop
Install the Linux desktop runtime packages required by `/v1/desktop/*`.
```bash
sandbox-agent install desktop [OPTIONS]
```
| Option | Description |
|--------|-------------|
| `--yes` | Skip the confirmation prompt |
| `--print-only` | Print the package-manager command without executing it |
| `--package-manager <apt\|dnf\|apk>` | Override package-manager detection |
| `--no-fonts` | Skip the default DejaVu font package |
```bash
sandbox-agent install desktop --yes
sandbox-agent install desktop --print-only
```
Notes:
- Supported on Linux only.
- The command detects `apt`, `dnf`, or `apk`.
- If the agent is not running as root, the command requires `sudo`.
## install-agent
Install or reinstall a single agent, or every supported agent with `--all`.

docs/common-software.mdx Normal file

@ -0,0 +1,560 @@
---
title: "Common Software"
description: "Install browsers, languages, databases, and other tools inside the sandbox."
sidebarTitle: "Common Software"
icon: "box-open"
---
The sandbox runs a Debian/Ubuntu base image. You can install software with `apt-get` via the [Process API](/processes) or by customizing your Docker image. This page covers commonly needed packages and how to install them.
## Browsers
### Chromium
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "chromium", "chromium-sandbox"],
});
// Launch headless
await sdk.runProcess({
command: "chromium",
args: ["--headless", "--no-sandbox", "--disable-gpu", "https://example.com"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","chromium","chromium-sandbox"]}'
```
</CodeGroup>
<Note>
Use `--no-sandbox` when running Chromium inside a container. The container itself provides isolation.
</Note>
### Firefox
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "firefox-esr"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","firefox-esr"]}'
```
</CodeGroup>
### Playwright browsers
Playwright bundles its own browser binaries. Install the Playwright CLI and let it download browsers for you.
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "npx",
args: ["playwright", "install", "--with-deps", "chromium"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"npx","args":["playwright","install","--with-deps","chromium"]}'
```
</CodeGroup>
---
## Languages and runtimes
### Node.js
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "nodejs", "npm"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","nodejs","npm"]}'
```
</CodeGroup>
For a specific version, use [nvm](https://github.com/nvm-sh/nvm):
```ts TypeScript
await sdk.runProcess({
command: "bash",
args: ["-c", "curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash && . ~/.nvm/nvm.sh && nvm install 22"],
});
```
### Python
Python 3 is typically pre-installed. To add pip and common packages:
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "python3", "python3-pip", "python3-venv"],
});
await sdk.runProcess({
command: "pip3",
args: ["install", "numpy", "pandas", "matplotlib"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","python3","python3-pip","python3-venv"]}'
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"pip3","args":["install","numpy","pandas","matplotlib"]}'
```
</CodeGroup>
### Go
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "bash",
args: ["-c", "curl -fsSL https://go.dev/dl/go1.23.6.linux-amd64.tar.gz | tar -C /usr/local -xz"],
});
// Add to PATH for subsequent commands
await sdk.runProcess({
command: "bash",
args: ["-c", "export PATH=$PATH:/usr/local/go/bin && go version"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"bash","args":["-c","curl -fsSL https://go.dev/dl/go1.23.6.linux-amd64.tar.gz | tar -C /usr/local -xz"]}'
```
</CodeGroup>
### Rust
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "bash",
args: ["-c", "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"bash","args":["-c","curl --proto =https --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"]}'
```
</CodeGroup>
### Java (OpenJDK)
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "default-jdk"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","default-jdk"]}'
```
</CodeGroup>
### Ruby
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "ruby-full"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","ruby-full"]}'
```
</CodeGroup>
---
## Databases
### PostgreSQL
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "postgresql", "postgresql-client"],
});
// Start the service
const proc = await sdk.createProcess({
command: "bash",
args: ["-c", "su - postgres -c 'pg_ctlcluster 15 main start'"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","postgresql","postgresql-client"]}'
```
</CodeGroup>
### SQLite
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "sqlite3"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","sqlite3"]}'
```
</CodeGroup>
### Redis
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "redis-server"],
});
const proc = await sdk.createProcess({
command: "redis-server",
args: ["--daemonize", "no"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","redis-server"]}'
curl -X POST "http://127.0.0.1:2468/v1/processes" \
-H "Content-Type: application/json" \
-d '{"command":"redis-server","args":["--daemonize","no"]}'
```
</CodeGroup>
### MySQL / MariaDB
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "mariadb-server", "mariadb-client"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","mariadb-server","mariadb-client"]}'
```
</CodeGroup>
---
## Build tools
### Essential build toolchain
Most compiled software needs the standard build toolchain:
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "build-essential", "cmake", "pkg-config"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","build-essential","cmake","pkg-config"]}'
```
</CodeGroup>
This installs `gcc`, `g++`, `make`, `cmake`, and related tools.
---
## Desktop applications
These require the [Computer Use](/computer-use) desktop to be started first.
### LibreOffice
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "libreoffice"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","libreoffice"]}'
```
</CodeGroup>
### GIMP
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "gimp"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","gimp"]}'
```
</CodeGroup>
### VLC
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "vlc"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","vlc"]}'
```
</CodeGroup>
### VS Code (code-server)
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "bash",
args: ["-c", "curl -fsSL https://code-server.dev/install.sh | sh"],
});
const proc = await sdk.createProcess({
command: "code-server",
args: ["--bind-addr", "0.0.0.0:8080", "--auth", "none"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"bash","args":["-c","curl -fsSL https://code-server.dev/install.sh | sh"]}'
curl -X POST "http://127.0.0.1:2468/v1/processes" \
-H "Content-Type: application/json" \
-d '{"command":"code-server","args":["--bind-addr","0.0.0.0:8080","--auth","none"]}'
```
</CodeGroup>
---
## CLI tools
### Git
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "git"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","git"]}'
```
</CodeGroup>
### Docker
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "bash",
args: ["-c", "curl -fsSL https://get.docker.com | sh"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"bash","args":["-c","curl -fsSL https://get.docker.com | sh"]}'
```
</CodeGroup>
### jq
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "jq"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","jq"]}'
```
</CodeGroup>
### tmux
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "tmux"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","tmux"]}'
```
</CodeGroup>
---
## Media and graphics
### FFmpeg
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "ffmpeg"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","ffmpeg"]}'
```
</CodeGroup>
### ImageMagick
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "imagemagick"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","imagemagick"]}'
```
</CodeGroup>
### Poppler (PDF utilities)
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "poppler-utils"],
});
// Convert PDF to images
await sdk.runProcess({
command: "pdftoppm",
args: ["-png", "document.pdf", "output"],
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","poppler-utils"]}'
```
</CodeGroup>
---
## Pre-installing in a Docker image
For production use, install software in your Dockerfile instead of at runtime. This avoids repeated downloads and makes startup faster.
```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
chromium \
firefox-esr \
nodejs npm \
python3 python3-pip \
git curl wget \
build-essential \
sqlite3 \
ffmpeg \
imagemagick \
jq \
&& rm -rf /var/lib/apt/lists/*
RUN pip3 install numpy pandas matplotlib
```
See [Docker deployment](/deploy/docker) for how to use custom images with Sandbox Agent.

docs/computer-use.mdx Normal file

@ -0,0 +1,859 @@
---
title: "Computer Use"
description: "Control a virtual desktop inside the sandbox with mouse, keyboard, screenshots, recordings, and live streaming."
sidebarTitle: "Computer Use"
icon: "desktop"
---
Sandbox Agent provides a managed virtual desktop (Xvfb + openbox) that you can control programmatically. This is useful for browser automation, GUI testing, and AI computer-use workflows.
## Start and stop
<CodeGroup>
```ts TypeScript
import { SandboxAgent } from "sandbox-agent";
const sdk = await SandboxAgent.connect({
baseUrl: "http://127.0.0.1:2468",
});
const status = await sdk.startDesktop({
width: 1920,
height: 1080,
dpi: 96,
});
console.log(status.state); // "active"
console.log(status.display); // ":99"
// When done
await sdk.stopDesktop();
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/start" \
-H "Content-Type: application/json" \
-d '{"width":1920,"height":1080,"dpi":96}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/stop"
```
</CodeGroup>
All fields in the start request are optional. Defaults are 1440x900 at 96 DPI.
### Start request options
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `width` | number | 1440 | Desktop width in pixels |
| `height` | number | 900 | Desktop height in pixels |
| `dpi` | number | 96 | Display DPI |
| `displayNum` | number | 99 | Starting X display number. The runtime probes from this number upward to find an available display. |
| `stateDir` | string | (auto) | Desktop state directory for home, logs, recordings |
| `streamVideoCodec` | string | `"vp8"` | WebRTC video codec (`vp8`, `vp9`, `h264`) |
| `streamAudioCodec` | string | `"opus"` | WebRTC audio codec (`opus`, `g722`) |
| `streamFrameRate` | number | 30 | Streaming frame rate (1-60) |
| `webrtcPortRange` | string | `"59050-59070"` | UDP port range for WebRTC media |
| `recordingFps` | number | 30 | Default recording FPS when not specified in `startDesktopRecording` (1-60) |
The streaming and recording options configure defaults for the desktop session. They take effect when streaming or recording is started later.
<CodeGroup>
```ts TypeScript
const status = await sdk.startDesktop({
width: 1920,
height: 1080,
streamVideoCodec: "h264",
streamFrameRate: 60,
webrtcPortRange: "59100-59120",
recordingFps: 15,
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/start" \
-H "Content-Type: application/json" \
-d '{
"width": 1920,
"height": 1080,
"streamVideoCodec": "h264",
"streamFrameRate": 60,
"webrtcPortRange": "59100-59120",
"recordingFps": 15
}'
```
</CodeGroup>
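The `webrtcPortRange` string takes the form `"start-end"`. If you validate it client-side before calling `startDesktop` (for example, to keep it in sync with your firewall rules), a parser might look like this. This is a hypothetical helper, not part of the SDK:

```ts
// Sketch: parse and validate a "start-end" UDP port range string,
// like the webrtcPortRange option. Hypothetical helper, not in the SDK.
function parsePortRange(range: string): { start: number; end: number } {
  const m = /^(\d+)-(\d+)$/.exec(range);
  if (!m) throw new Error(`invalid port range: ${range}`);
  const start = Number(m[1]);
  const end = Number(m[2]);
  if (start > end || start < 1 || end > 65535) {
    throw new Error(`invalid port range: ${range}`);
  }
  return { start, end };
}
```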
## Status
<CodeGroup>
```ts TypeScript
const status = await sdk.getDesktopStatus();
console.log(status.state); // "inactive" | "active" | "failed" | ...
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/status"
```
</CodeGroup>
## Screenshots
Capture the full desktop or a specific region. Optionally include the cursor position.
<CodeGroup>
```ts TypeScript
// Full screenshot (PNG by default)
const png = await sdk.takeDesktopScreenshot();
// JPEG at 70% quality, half scale
const jpeg = await sdk.takeDesktopScreenshot({
format: "jpeg",
quality: 70,
scale: 0.5,
});
// Include cursor overlay
const withCursor = await sdk.takeDesktopScreenshot({
showCursor: true,
});
// Region screenshot
const region = await sdk.takeDesktopRegionScreenshot({
x: 100,
y: 100,
width: 400,
height: 300,
});
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/screenshot" --output screenshot.png
curl "http://127.0.0.1:2468/v1/desktop/screenshot?format=jpeg&quality=70&scale=0.5" \
--output screenshot.jpg
# Include cursor overlay
curl "http://127.0.0.1:2468/v1/desktop/screenshot?show_cursor=true" \
--output with_cursor.png
curl "http://127.0.0.1:2468/v1/desktop/screenshot/region?x=100&y=100&width=400&height=300" \
--output region.png
```
</CodeGroup>
### Screenshot options
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | `"png"` | Output format: `png`, `jpeg`, or `webp` |
| `quality` | number | 85 | Compression quality (1-100, JPEG/WebP only) |
| `scale` | number | 1.0 | Scale factor (0.1-1.0) |
| `showCursor` | boolean | `false` | Composite a crosshair at the cursor position |
When `showCursor` is enabled, the cursor position is captured at the moment of the screenshot and a red crosshair is drawn at that location. This is useful for AI agents that need to see where the cursor is in the screenshot.
## Mouse
<CodeGroup>
```ts TypeScript
// Get current position
const pos = await sdk.getDesktopMousePosition();
console.log(pos.x, pos.y);
// Move
await sdk.moveDesktopMouse({ x: 500, y: 300 });
// Click (left by default)
await sdk.clickDesktop({ x: 500, y: 300 });
// Right click
await sdk.clickDesktop({ x: 500, y: 300, button: "right" });
// Double click
await sdk.clickDesktop({ x: 500, y: 300, clickCount: 2 });
// Drag
await sdk.dragDesktopMouse({
startX: 100, startY: 100,
endX: 400, endY: 400,
});
// Scroll
await sdk.scrollDesktop({ x: 500, y: 300, deltaY: -3 });
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/mouse/position"
curl -X POST "http://127.0.0.1:2468/v1/desktop/mouse/click" \
-H "Content-Type: application/json" \
-d '{"x":500,"y":300}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/mouse/drag" \
-H "Content-Type: application/json" \
-d '{"startX":100,"startY":100,"endX":400,"endY":400}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/mouse/scroll" \
-H "Content-Type: application/json" \
-d '{"x":500,"y":300,"deltaY":-3}'
```
</CodeGroup>
## Keyboard
<CodeGroup>
```ts TypeScript
// Type text
await sdk.typeDesktopText({ text: "Hello, world!" });
// Press a key with modifiers
await sdk.pressDesktopKey({
key: "c",
modifiers: { ctrl: true },
});
// Low-level key down/up
await sdk.keyDownDesktop({ key: "Shift_L" });
await sdk.keyUpDesktop({ key: "Shift_L" });
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/keyboard/type" \
-H "Content-Type: application/json" \
-d '{"text":"Hello, world!"}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/keyboard/press" \
-H "Content-Type: application/json" \
-d '{"key":"c","modifiers":{"ctrl":true}}'
```
</CodeGroup>
## Clipboard
Read and write the X11 clipboard programmatically.
<CodeGroup>
```ts TypeScript
// Read clipboard
const clipboard = await sdk.getDesktopClipboard();
console.log(clipboard.text);
// Read primary selection (mouse-selected text)
const primary = await sdk.getDesktopClipboard({ selection: "primary" });
// Write to clipboard
await sdk.setDesktopClipboard({ text: "Pasted via API" });
// Write to both clipboard and primary selection
await sdk.setDesktopClipboard({
text: "Synced text",
selection: "both",
});
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/clipboard"
curl "http://127.0.0.1:2468/v1/desktop/clipboard?selection=primary"
curl -X POST "http://127.0.0.1:2468/v1/desktop/clipboard" \
-H "Content-Type: application/json" \
-d '{"text":"Pasted via API"}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/clipboard" \
-H "Content-Type: application/json" \
-d '{"text":"Synced text","selection":"both"}'
```
</CodeGroup>
The `selection` parameter controls which X11 selection to read or write:
| Value | Description |
|-------|-------------|
| `clipboard` (default) | The standard clipboard (Ctrl+C / Ctrl+V) |
| `primary` | The primary selection (text selected with the mouse) |
| `both` | Write to both clipboard and primary selection (write only) |
## Display and windows
<CodeGroup>
```ts TypeScript
const display = await sdk.getDesktopDisplayInfo();
console.log(display.resolution); // { width: 1920, height: 1080, dpi: 96 }
const { windows } = await sdk.listDesktopWindows();
for (const win of windows) {
console.log(win.title, win.x, win.y, win.width, win.height);
}
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/display/info"
curl "http://127.0.0.1:2468/v1/desktop/windows"
```
</CodeGroup>
The windows endpoint filters out noise automatically: window manager internals (Openbox), windows with empty titles, and tiny helper windows (under 120x80) are excluded. The currently active/focused window is always included regardless of filters.
### Focused window
Get the currently focused window without listing all windows.
<CodeGroup>
```ts TypeScript
const focused = await sdk.getDesktopFocusedWindow();
console.log(focused.title, focused.id);
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/windows/focused"
```
</CodeGroup>
Returns 404 if no window currently has focus.
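Since "no focused window" is a normal state rather than a failure, you may want to normalize the 404 into `null`. A minimal sketch, assuming the error thrown by your client carries a numeric `status` property:

```ts
// Sketch: treat a 404 from the focused-window lookup as "no focus"
// instead of an error. Assumes thrown errors expose a `status` field.
async function focusedWindowOrNull<T>(
  fetchFocused: () => Promise<T>,
): Promise<T | null> {
  try {
    return await fetchFocused();
  } catch (err) {
    if ((err as { status?: number }).status === 404) return null;
    throw err;
  }
}
```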
### Window management
Focus, move, and resize windows by their X11 window ID.
<CodeGroup>
```ts TypeScript
const { windows } = await sdk.listDesktopWindows();
const win = windows[0];
// Bring window to foreground
await sdk.focusDesktopWindow(win.id);
// Move window
await sdk.moveDesktopWindow(win.id, { x: 100, y: 50 });
// Resize window
await sdk.resizeDesktopWindow(win.id, { width: 1280, height: 720 });
```
```bash cURL
# Focus a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/focus"
# Move a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/move" \
-H "Content-Type: application/json" \
-d '{"x":100,"y":50}'
# Resize a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/resize" \
-H "Content-Type: application/json" \
-d '{"width":1280,"height":720}'
```
</CodeGroup>
All three endpoints return the updated window info so you can verify the operation took effect. The window manager may adjust the requested position or size.
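Because the window manager can adjust the requested geometry, it helps to compare the response against what you asked for. A small sketch of that check (the field names mirror the request shapes above; the helper itself is hypothetical):

```ts
// Sketch: check whether the window info returned by move/resize
// matches the requested geometry. The WM may have adjusted it.
interface Geometry {
  x?: number;
  y?: number;
  width?: number;
  height?: number;
}

function geometryMatches(requested: Geometry, actual: Geometry): boolean {
  return (Object.keys(requested) as (keyof Geometry)[]).every(
    (key) => requested[key] === actual[key],
  );
}
```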
## App launching
Launch applications or open files/URLs on the desktop without needing to shell out.
<CodeGroup>
```ts TypeScript
// Launch an app by name
const result = await sdk.launchDesktopApp({
app: "firefox",
args: ["--private"],
});
console.log(result.processId); // "proc_7"
// Launch and wait for the window to appear
const withWindow = await sdk.launchDesktopApp({
app: "xterm",
wait: true,
});
console.log(withWindow.windowId); // "12345" or null if timed out
// Open a URL with the default handler
const opened = await sdk.openDesktopTarget({
target: "https://example.com",
});
console.log(opened.processId);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/launch" \
-H "Content-Type: application/json" \
-d '{"app":"firefox","args":["--private"]}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/launch" \
-H "Content-Type: application/json" \
-d '{"app":"xterm","wait":true}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/open" \
-H "Content-Type: application/json" \
-d '{"target":"https://example.com"}'
```
</CodeGroup>
The returned `processId` can be used with the [Process API](/processes) to read logs (`GET /v1/processes/{id}/logs`) or stop the application (`POST /v1/processes/{id}/stop`).
When `wait` is `true`, the API polls for up to 5 seconds for a window to appear. If the window appears, its ID is returned in `windowId`. If it times out, `windowId` is `null` but the process is still running.
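The poll-until-deadline pattern described above generalizes to your own client code, for example when waiting for an app's window after launching without `wait`. A sketch of the idea (a hypothetical helper, not part of the SDK):

```ts
// Sketch: poll an async check until it yields a value or the deadline
// passes, mirroring the ~5 s window poll described above.
async function pollFor<T>(
  check: () => Promise<T | null>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await check();
    if (result !== null) return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // timed out; the underlying process may still be running
}
```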
<Tip>
**Launch/Open vs the Process API:** Both `launch` and `open` are convenience wrappers around the [Process API](/processes). They create managed processes (with `owner: "desktop"`) that you can inspect, log, and stop through the same Process endpoints. The difference is that `launch` validates the binary exists in PATH first and can optionally wait for a window to appear, while `open` delegates to the system default handler (`xdg-open`). Use the Process API directly when you need full control over command, environment, working directory, or restart policies.
</Tip>
## Recording
Record the desktop to MP4.
<CodeGroup>
```ts TypeScript
const recording = await sdk.startDesktopRecording({ fps: 30 });
console.log(recording.id);
// ... do things ...
const stopped = await sdk.stopDesktopRecording();
// List all recordings
const { recordings } = await sdk.listDesktopRecordings();
// Download
const mp4 = await sdk.downloadDesktopRecording(recording.id);
// Clean up
await sdk.deleteDesktopRecording(recording.id);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/recording/start" \
-H "Content-Type: application/json" \
-d '{"fps":30}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/recording/stop"
curl "http://127.0.0.1:2468/v1/desktop/recordings"
curl "http://127.0.0.1:2468/v1/desktop/recordings/rec_1/download" --output recording.mp4
curl -X DELETE "http://127.0.0.1:2468/v1/desktop/recordings/rec_1"
```
</CodeGroup>
## Desktop processes
The desktop runtime manages several background processes (Xvfb, openbox, neko, ffmpeg). These are all registered with the general [Process API](/processes) under the `desktop` owner, so you can inspect logs, check status, and troubleshoot using the same tools you use for any other managed process.
<CodeGroup>
```ts TypeScript
// List all processes, including desktop-owned ones
const { processes } = await sdk.listProcesses();
const desktopProcs = processes.filter((p) => p.owner === "desktop");
for (const p of desktopProcs) {
console.log(p.id, p.command, p.status);
}
// Read logs from a specific desktop process
const logs = await sdk.getProcessLogs(desktopProcs[0].id, { tail: 50 });
for (const entry of logs.entries) {
console.log(entry.stream, atob(entry.data));
}
```
```bash cURL
# List all processes (desktop processes have owner: "desktop")
curl "http://127.0.0.1:2468/v1/processes"
# Get logs from a specific desktop process
curl "http://127.0.0.1:2468/v1/processes/proc_1/logs?tail=50"
```
</CodeGroup>
The desktop status endpoint also includes a summary of running processes:
<CodeGroup>
```ts TypeScript
const status = await sdk.getDesktopStatus();
for (const proc of status.processes) {
console.log(proc.name, proc.pid, proc.running);
}
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/status"
# Response includes: processes: [{ name: "Xvfb", pid: 123, running: true }, ...]
```
</CodeGroup>
| Process | Role | Restart policy |
|---------|------|---------------|
| Xvfb | Virtual X11 framebuffer | Auto-restart while desktop is active |
| openbox | Window manager | Auto-restart while desktop is active |
| neko | WebRTC streaming server (started by `startDesktopStream`) | No auto-restart |
| ffmpeg | Screen recorder (started by `startDesktopRecording`) | No auto-restart |
## Live streaming
Start a WebRTC stream for real-time desktop viewing in a browser.
<CodeGroup>
```ts TypeScript
await sdk.startDesktopStream();
// Check stream status
const status = await sdk.getDesktopStreamStatus();
console.log(status.active); // true
console.log(status.processId); // "proc_5"
// Connect via the React DesktopViewer component or
// use the WebSocket signaling endpoint directly
// at ws://127.0.0.1:2468/v1/desktop/stream/signaling
await sdk.stopDesktopStream();
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/start"
# Check stream status
curl "http://127.0.0.1:2468/v1/desktop/stream/status"
# Connect to ws://127.0.0.1:2468/v1/desktop/stream/signaling for WebRTC signaling
curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/stop"
```
</CodeGroup>
For a drop-in React component, see [React Components](/react-components).
## API reference
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/desktop/start` | Start the desktop runtime |
| `POST` | `/v1/desktop/stop` | Stop the desktop runtime |
| `GET` | `/v1/desktop/status` | Get desktop runtime status |
| `GET` | `/v1/desktop/screenshot` | Capture full desktop screenshot |
| `GET` | `/v1/desktop/screenshot/region` | Capture a region screenshot |
| `GET` | `/v1/desktop/mouse/position` | Get current mouse position |
| `POST` | `/v1/desktop/mouse/move` | Move the mouse |
| `POST` | `/v1/desktop/mouse/click` | Click the mouse |
| `POST` | `/v1/desktop/mouse/down` | Press mouse button down |
| `POST` | `/v1/desktop/mouse/up` | Release mouse button |
| `POST` | `/v1/desktop/mouse/drag` | Drag from one point to another |
| `POST` | `/v1/desktop/mouse/scroll` | Scroll at a position |
| `POST` | `/v1/desktop/keyboard/type` | Type text |
| `POST` | `/v1/desktop/keyboard/press` | Press a key with optional modifiers |
| `POST` | `/v1/desktop/keyboard/down` | Press a key down (hold) |
| `POST` | `/v1/desktop/keyboard/up` | Release a key |
| `GET` | `/v1/desktop/display/info` | Get display info |
| `GET` | `/v1/desktop/windows` | List visible windows |
| `GET` | `/v1/desktop/windows/focused` | Get focused window info |
| `POST` | `/v1/desktop/windows/{id}/focus` | Focus a window |
| `POST` | `/v1/desktop/windows/{id}/move` | Move a window |
| `POST` | `/v1/desktop/windows/{id}/resize` | Resize a window |
| `GET` | `/v1/desktop/clipboard` | Read clipboard contents |
| `POST` | `/v1/desktop/clipboard` | Write to clipboard |
| `POST` | `/v1/desktop/launch` | Launch an application |
| `POST` | `/v1/desktop/open` | Open a file or URL |
| `POST` | `/v1/desktop/recording/start` | Start recording |
| `POST` | `/v1/desktop/recording/stop` | Stop recording |
| `GET` | `/v1/desktop/recordings` | List recordings |
| `GET` | `/v1/desktop/recordings/{id}` | Get recording metadata |
| `GET` | `/v1/desktop/recordings/{id}/download` | Download recording |
| `DELETE` | `/v1/desktop/recordings/{id}` | Delete recording |
| `POST` | `/v1/desktop/stream/start` | Start WebRTC streaming |
| `POST` | `/v1/desktop/stream/stop` | Stop WebRTC streaming |
| `GET` | `/v1/desktop/stream/status` | Get stream status |
| `GET` | `/v1/desktop/stream/signaling` | WebSocket for WebRTC signaling |
### TypeScript SDK methods
| Method | Returns | Description |
|--------|---------|-------------|
| `startDesktop(request?)` | `DesktopStatusResponse` | Start the desktop |
| `stopDesktop()` | `DesktopStatusResponse` | Stop the desktop |
| `getDesktopStatus()` | `DesktopStatusResponse` | Get desktop status |
| `takeDesktopScreenshot(query?)` | `Uint8Array` | Capture screenshot |
| `takeDesktopRegionScreenshot(query)` | `Uint8Array` | Capture region screenshot |
| `getDesktopMousePosition()` | `DesktopMousePositionResponse` | Get mouse position |
| `moveDesktopMouse(request)` | `DesktopMousePositionResponse` | Move mouse |
| `clickDesktop(request)` | `DesktopMousePositionResponse` | Click mouse |
| `mouseDownDesktop(request)` | `DesktopMousePositionResponse` | Mouse button down |
| `mouseUpDesktop(request)` | `DesktopMousePositionResponse` | Mouse button up |
| `dragDesktopMouse(request)` | `DesktopMousePositionResponse` | Drag mouse |
| `scrollDesktop(request)` | `DesktopMousePositionResponse` | Scroll |
| `typeDesktopText(request)` | `DesktopActionResponse` | Type text |
| `pressDesktopKey(request)` | `DesktopActionResponse` | Press key |
| `keyDownDesktop(request)` | `DesktopActionResponse` | Key down |
| `keyUpDesktop(request)` | `DesktopActionResponse` | Key up |
| `getDesktopDisplayInfo()` | `DesktopDisplayInfoResponse` | Get display info |
| `listDesktopWindows()` | `DesktopWindowListResponse` | List windows |
| `getDesktopFocusedWindow()` | `DesktopWindowInfo` | Get focused window |
| `focusDesktopWindow(id)` | `DesktopWindowInfo` | Focus a window |
| `moveDesktopWindow(id, request)` | `DesktopWindowInfo` | Move a window |
| `resizeDesktopWindow(id, request)` | `DesktopWindowInfo` | Resize a window |
| `getDesktopClipboard(query?)` | `DesktopClipboardResponse` | Read clipboard |
| `setDesktopClipboard(request)` | `DesktopActionResponse` | Write clipboard |
| `launchDesktopApp(request)` | `DesktopLaunchResponse` | Launch an app |
| `openDesktopTarget(request)` | `DesktopOpenResponse` | Open file/URL |
| `startDesktopRecording(request?)` | `DesktopRecordingInfo` | Start recording |
| `stopDesktopRecording()` | `DesktopRecordingInfo` | Stop recording |
| `listDesktopRecordings()` | `DesktopRecordingListResponse` | List recordings |
| `getDesktopRecording(id)` | `DesktopRecordingInfo` | Get recording |
| `downloadDesktopRecording(id)` | `Uint8Array` | Download recording |
| `deleteDesktopRecording(id)` | `void` | Delete recording |
| `startDesktopStream()` | `DesktopStreamStatusResponse` | Start streaming |
| `stopDesktopStream()` | `DesktopStreamStatusResponse` | Stop streaming |
| `getDesktopStreamStatus()` | `DesktopStreamStatusResponse` | Stream status |
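Combining these methods, a click-then-type interaction can be sketched as follows. This is an illustrative helper, not part of the SDK; `DesktopInput` narrows the SDK surface to the three methods used:

```typescript
// Minimal surface of the SDK methods used below (see the table above).
interface DesktopInput {
  moveDesktopMouse(req: { x: number; y: number }): Promise<unknown>;
  clickDesktop(req: { x: number; y: number; button: string; clickCount: number }): Promise<unknown>;
  typeDesktopText(req: { text: string; delayMs?: number }): Promise<unknown>;
}

// Click a point, then type into whatever control gained focus.
async function clickAndType(sdk: DesktopInput, x: number, y: number, text: string): Promise<void> {
  await sdk.moveDesktopMouse({ x, y });
  await sdk.clickDesktop({ x, y, button: "left", clickCount: 1 });
  await sdk.typeDesktopText({ text, delayMs: 10 });
}
```

Pass a connected SDK instance as the first argument; the sequencing (move, click, type) mirrors how a human would interact with a form field.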
## Customizing the desktop environment
The desktop runs inside the sandbox filesystem, so you can customize it using the [File System](/file-system) API before or after starting the desktop. The desktop HOME directory is located at `~/.local/state/sandbox-agent/desktop/home` (or `$XDG_STATE_HOME/sandbox-agent/desktop/home` if `XDG_STATE_HOME` is set).
All configuration files below are written to paths relative to this HOME directory.
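A small helper keeps those paths consistent across the examples below (a sketch; `desktopHomePath` is illustrative and assumes the default `~/.local/state` prefix unless an `XDG_STATE_HOME` value is supplied):

```typescript
// Build a path inside the desktop HOME directory documented above.
function desktopHomePath(relative: string, xdgStateHome?: string): string {
  const stateHome = xdgStateHome ?? "~/.local/state";
  // Strip any leading slashes so the result stays inside the desktop HOME.
  return `${stateHome}/sandbox-agent/desktop/home/${relative.replace(/^\/+/, "")}`;
}
```

For example, `desktopHomePath(".config/openbox/rc.xml")` yields the full path used in the openbox example below.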
### Window manager (openbox)
The desktop uses [openbox](http://openbox.org/) as its window manager. You can customize its behavior, theme, and keyboard shortcuts by writing an `rc.xml` config file.
<CodeGroup>
```ts TypeScript
const openboxConfig = `<?xml version="1.0" encoding="UTF-8"?>
<openbox_config xmlns="http://openbox.org/3.4/rc">
<theme>
<name>Clearlooks</name>
<titleLayout>NLIMC</titleLayout>
<font place="ActiveWindow"><name>DejaVu Sans</name><size>10</size></font>
</theme>
<desktops><number>1</number></desktops>
<keyboard>
<keybind key="A-F4"><action name="Close"/></keybind>
<keybind key="A-Tab"><action name="NextWindow"/></keybind>
</keyboard>
</openbox_config>`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/rc.xml" },
openboxConfig,
);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/fs/mkdir?path=~/.local/state/sandbox-agent/desktop/home/.config/openbox"
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.config/openbox/rc.xml" \
-H "Content-Type: application/octet-stream" \
--data-binary @rc.xml
```
</CodeGroup>
### Autostart programs
Openbox runs the script at `~/.config/openbox/autostart` on startup. Use this to launch applications, set the background, or configure the environment.
<CodeGroup>
```ts TypeScript
const autostart = `#!/bin/sh
# Set a solid background color
xsetroot -solid "#1e1e2e" &
# Launch a terminal
xterm -geometry 120x40+50+50 &
# Launch a browser
firefox --no-remote &
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
autostart,
);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/fs/mkdir?path=~/.local/state/sandbox-agent/desktop/home/.config/openbox"
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" \
-H "Content-Type: application/octet-stream" \
--data-binary @autostart.sh
```
</CodeGroup>
<Note>
The autostart script runs when openbox starts, which happens during `startDesktop()`. Write the autostart file before calling `startDesktop()` for it to take effect.
</Note>
### Background
There is no wallpaper set by default (the background is the X root window default). You can set it using `xsetroot` in the autostart script (as shown above), or use `feh` if you need an image:
<CodeGroup>
```ts TypeScript
// Upload a wallpaper image
import fs from "node:fs";
const wallpaper = await fs.promises.readFile("./wallpaper.png");
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/wallpaper.png" },
wallpaper,
);
// Set the autostart to apply it
const autostart = `#!/bin/sh
feh --bg-fill ~/wallpaper.png &
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
autostart,
);
```
```bash cURL
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/wallpaper.png" \
-H "Content-Type: application/octet-stream" \
--data-binary @wallpaper.png
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" \
-H "Content-Type: application/octet-stream" \
--data-binary @autostart.sh
```
</CodeGroup>
<Note>
`feh` is not installed by default. Install it via the [Process API](/processes) before starting the desktop: `await sdk.runProcess({ command: "apt-get", args: ["install", "-y", "feh"] })`.
</Note>
### Fonts
Only `fonts-dejavu-core` is installed by default. To add more fonts, install them with your system package manager or copy font files into the sandbox:
<CodeGroup>
```ts TypeScript
// Install a font package
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "fonts-noto", "fonts-liberation"],
});
// Or copy a custom font file
import fs from "node:fs";
const font = await fs.promises.readFile("./CustomFont.ttf");
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts/CustomFont.ttf" },
font,
);
// Rebuild the font cache
await sdk.runProcess({ command: "fc-cache", args: ["-fv"] });
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","fonts-noto","fonts-liberation"]}'
curl -X POST "http://127.0.0.1:2468/v1/fs/mkdir?path=~/.local/state/sandbox-agent/desktop/home/.local/share/fonts"
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.local/share/fonts/CustomFont.ttf" \
-H "Content-Type: application/octet-stream" \
--data-binary @CustomFont.ttf
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"fc-cache","args":["-fv"]}'
```
</CodeGroup>
### Cursor theme
<CodeGroup>
```ts TypeScript
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "dmz-cursor-theme"],
});
const xresources = `Xcursor.theme: DMZ-White\nXcursor.size: 24\n`;
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.Xresources" },
xresources,
);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/processes/run" \
-H "Content-Type: application/json" \
-d '{"command":"apt-get","args":["install","-y","dmz-cursor-theme"]}'
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.Xresources" \
-H "Content-Type: application/octet-stream" \
  --data-binary $'Xcursor.theme: DMZ-White\nXcursor.size: 24\n'
```
</CodeGroup>
<Note>
Run `xrdb -merge ~/.Xresources` (via the autostart or process API) after writing the file for changes to take effect.
</Note>
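Via the Process API, that merge step can be sketched as follows (an illustrative helper over `runProcess` as shown in the font example above; note that `~` is not shell-expanded when passed in `args`, so an absolute path may be needed depending on how the process is spawned):

```typescript
// Re-merge X resources so cursor settings apply without restarting the desktop.
async function applyXresources(
  sdk: { runProcess(req: { command: string; args: string[] }): Promise<unknown> },
): Promise<void> {
  await sdk.runProcess({
    command: "xrdb",
    // Substitute the absolute desktop HOME path if the server does not expand "~".
    args: ["-merge", "~/.local/state/sandbox-agent/desktop/home/.Xresources"],
  });
}
```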
### Shell and terminal
No terminal emulator or shell is launched by default. Add one to the openbox autostart:
```sh
# In ~/.config/openbox/autostart
xterm -geometry 120x40+50+50 &
```
To use a different shell, set the `SHELL` environment variable in your Dockerfile or install your preferred shell and configure the terminal to use it.
### GTK theme
Applications using GTK will pick up settings from `~/.config/gtk-3.0/settings.ini`:
<CodeGroup>
```ts TypeScript
const gtkSettings = `[Settings]
gtk-theme-name=Adwaita
gtk-icon-theme-name=Adwaita
gtk-font-name=DejaVu Sans 10
gtk-cursor-theme-name=DMZ-White
gtk-cursor-theme-size=24
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0/settings.ini" },
gtkSettings,
);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/fs/mkdir?path=~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0"
curl -X PUT "http://127.0.0.1:2468/v1/fs/file?path=~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0/settings.ini" \
-H "Content-Type: application/octet-stream" \
--data-binary @settings.ini
```
</CodeGroup>
### Summary of configuration paths
All paths are relative to the desktop HOME directory (`~/.local/state/sandbox-agent/desktop/home`).
| What | Path | Notes |
|------|------|-------|
| Openbox config | `.config/openbox/rc.xml` | Window manager theme, keybindings, behavior |
| Autostart | `.config/openbox/autostart` | Shell script run on desktop start |
| Custom fonts | `.local/share/fonts/` | TTF/OTF files, run `fc-cache -fv` after |
| Cursor theme | `.Xresources` | Requires `xrdb -merge` to apply |
| GTK 3 settings | `.config/gtk-3.0/settings.ini` | Theme, icons, fonts for GTK apps |
| Wallpaper | Any path, referenced from autostart | Requires `feh` or similar tool |
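Putting the pieces together, a pre-start configuration pass can be sketched as below. This is an illustrative helper over the documented `mkdirFs`, `writeFsFile`, and `startDesktop` methods; the ordering matters because openbox reads these files on launch:

```typescript
interface DesktopSetupSdk {
  mkdirFs(req: { path: string }): Promise<unknown>;
  writeFsFile(req: { path: string }, body: string): Promise<unknown>;
  startDesktop(req?: { width?: number; height?: number }): Promise<unknown>;
}

const HOME = "~/.local/state/sandbox-agent/desktop/home";

// Write all config files first, then start the desktop so openbox
// picks them up on launch (see the Note under "Autostart programs").
async function configureAndStart(
  sdk: DesktopSetupSdk,
  rcXml: string,
  autostart: string,
): Promise<void> {
  await sdk.mkdirFs({ path: `${HOME}/.config/openbox` });
  await sdk.writeFsFile({ path: `${HOME}/.config/openbox/rc.xml` }, rcXml);
  await sdk.writeFsFile({ path: `${HOME}/.config/openbox/autostart` }, autostart);
  await sdk.startDesktop();
}
```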

Run the published full image with all supported agents pre-installed:
```bash
docker run --rm -p 3000:3000 \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
-e OPENAI_API_KEY="$OPENAI_API_KEY" \
rivetdev/sandbox-agent:0.4.1-rc.1-full \
server --no-token --host 0.0.0.0 --port 3000
```
The `0.4.1-rc.1-full` tag pins the exact version. The moving `full` tag is also published for contributors who want the latest full image.
## TypeScript with the Docker provider
```bash
npm install sandbox-agent@0.3.x dockerode get-port
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { docker } from "sandbox-agent/docker";

const sdk = await SandboxAgent.start({
  sandbox: docker({
    env: [
      `ANTHROPIC_API_KEY=${process.env.ANTHROPIC_API_KEY}`,
      `OPENAI_API_KEY=${process.env.OPENAI_API_KEY}`,
    ].filter(Boolean),
  }),
});

try {
  const session = await sdk.createSession({ agent: "codex" });
  await session.prompt([{ type: "text", text: "Summarize this repository." }]);
} finally {
  await sdk.destroySandbox();
}
```
The `docker` provider uses the `rivetdev/sandbox-agent:0.4.1-rc.1-full` image by default. Override with `image`:
```typescript
docker({ image: "my-custom-image:latest" })
```
If you also want the desktop API inside the container, install desktop dependencies before starting the server:
```bash
docker run --rm -p 3000:3000 \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  node:22-bookworm-slim sh -c "\
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y curl ca-certificates bash libstdc++6 && \
    rm -rf /var/lib/apt/lists/* && \
    curl -fsSL https://releases.rivet.dev/sandbox-agent/0.3.x/install.sh | sh && \
    sandbox-agent install desktop --yes && \
    sandbox-agent server --no-token --host 0.0.0.0 --port 3000"
```
In a Dockerfile:
```dockerfile
RUN sandbox-agent install desktop --yes
```
## TypeScript with dockerode
```typescript
import Docker from "dockerode";
import { SandboxAgent } from "sandbox-agent";

const docker = new Docker();
const PORT = 3000;
const container = await docker.createContainer({
  Image: "rivetdev/sandbox-agent:0.3.1-full",
  Cmd: ["server", "--no-token", "--host", "0.0.0.0", "--port", `${PORT}`],
  Env: [
    `ANTHROPIC_API_KEY=${process.env.ANTHROPIC_API_KEY}`,
    `OPENAI_API_KEY=${process.env.OPENAI_API_KEY}`,
    `CODEX_API_KEY=${process.env.CODEX_API_KEY}`,
  ].filter(Boolean),
  ExposedPorts: { [`${PORT}/tcp`]: {} },
  HostConfig: {
    AutoRemove: true,
    PortBindings: { [`${PORT}/tcp`]: [{ HostPort: `${PORT}` }] },
  },
});
await container.start();

const baseUrl = `http://127.0.0.1:${PORT}`;
const sdk = await SandboxAgent.connect({ baseUrl });
const session = await sdk.createSession({ agent: "codex" });
await session.prompt([{ type: "text", text: "Summarize this repository." }]);
```
## Building a custom image with everything preinstalled

},
{
"group": "System",
"pages": ["file-system", "processes"]
"pages": ["file-system", "processes", "computer-use", "common-software"]
},
{
"group": "Orchestration",

- Prompt testing
- Request/response debugging
- Interactive permission prompts (approve, always-allow, or reject tool-use requests)
- Desktop panel for status, remediation, start/stop, and screenshot refresh
- Process management (create, stop, kill, delete, view logs)
- Interactive PTY terminal for tty processes
- One-shot command execution
The Inspector includes an embedded Ghostty-based terminal for interactive tty
processes. The UI uses the SDK's high-level `connectProcessTerminal(...)`
wrapper via the shared `@sandbox-agent/react` `ProcessTerminal` component.
## Desktop panel
The `Desktop` panel shows the current desktop runtime state, missing dependencies,
the suggested install command, last error details, process/log paths, and the
latest captured screenshot.
Use it to:
- Check whether desktop dependencies are installed
- Start or stop the managed desktop runtime
- Refresh desktop status
- Capture a fresh screenshot on demand

---
title: "Quickstart"
description: "Get a coding agent running in a sandbox in under a minute."
description: "Start the server and send your first message."
icon: "rocket"
---
<Steps>
<Step title="Install">
<Step title="Install skill (optional)">
<Tabs>
<Tab title="npm">
<Tab title="npx">
```bash
npx skills add rivet-dev/skills -s sandbox-agent
```
</Tab>
<Tab title="bunx">
```bash
bunx skills add rivet-dev/skills -s sandbox-agent
```
</Tab>
</Tabs>
</Step>
<Step title="Set environment variables">
Each coding agent requires API keys to connect to their respective LLM providers.
<Tabs>
<Tab title="Local shell">
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```
</Tab>
<Tab title="E2B">
```typescript
import { Sandbox } from "@e2b/code-interpreter";
const envs: Record<string, string> = {};
if (process.env.ANTHROPIC_API_KEY) envs.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
if (process.env.OPENAI_API_KEY) envs.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const sandbox = await Sandbox.create({ envs });
```
</Tab>
<Tab title="Daytona">
```typescript
import { Daytona } from "@daytonaio/sdk";
const envVars: Record<string, string> = {};
if (process.env.ANTHROPIC_API_KEY) envVars.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
if (process.env.OPENAI_API_KEY) envVars.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const daytona = new Daytona();
const sandbox = await daytona.create({
snapshot: "sandbox-agent-ready",
envVars,
});
```
</Tab>
<Tab title="Docker">
```bash
docker run -p 2468:2468 \
-e ANTHROPIC_API_KEY="sk-ant-..." \
-e OPENAI_API_KEY="sk-..." \
rivetdev/sandbox-agent:0.3.1-full \
server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
</Tabs>
<AccordionGroup>
<Accordion title="Extracting API keys from current machine">
Use `sandbox-agent credentials extract-env --export` to extract your existing API keys (Anthropic, OpenAI, etc.) from local Claude Code or Codex config files.
</Accordion>
<Accordion title="Testing without API keys">
Use the `mock` agent for SDK and integration testing without provider credentials.
</Accordion>
<Accordion title="Multi-tenant and per-user billing">
For per-tenant token tracking, budget enforcement, or usage-based billing, see [LLM Credentials](/llm-credentials) for gateway options like OpenRouter, LiteLLM, and Portkey.
</Accordion>
</AccordionGroup>
</Step>
<Step title="Run the server">
<Tabs>
<Tab title="curl">
Install and run the binary directly.
```bash
curl -fsSL https://releases.rivet.dev/sandbox-agent/0.3.x/install.sh | sh
sandbox-agent server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="npx">
Run without installing globally.
```bash
npx @sandbox-agent/cli@0.3.x server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="bunx">
Run without installing globally.
```bash
bunx @sandbox-agent/cli@0.3.x server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="npm i -g">
Install globally, then run.
```bash
npm install -g @sandbox-agent/cli@0.3.x
sandbox-agent server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="bun add -g">
Install globally, then run.
```bash
bun add -g @sandbox-agent/cli@0.3.x
# Allow Bun to run postinstall scripts for native binaries (required for SandboxAgent.start()).
bun pm -g trust @sandbox-agent/cli-linux-x64 @sandbox-agent/cli-linux-arm64 @sandbox-agent/cli-darwin-arm64 @sandbox-agent/cli-darwin-x64 @sandbox-agent/cli-win32-x64
sandbox-agent server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="Node.js (local)">
For local development, use `SandboxAgent.start()` to spawn and manage the server as a subprocess.
```bash
npm install sandbox-agent@0.3.x
```
```typescript
import { SandboxAgent } from "sandbox-agent";
const sdk = await SandboxAgent.start();
```
</Tab>
<Tab title="bun">
<Tab title="Bun (local)">
For local development, use `SandboxAgent.start()` to spawn and manage the server as a subprocess.
```bash
bun add sandbox-agent@0.3.x
# Allow Bun to run postinstall scripts for native binaries (required for SandboxAgent.start()).
bun pm trust @sandbox-agent/cli-linux-x64 @sandbox-agent/cli-linux-arm64 @sandbox-agent/cli-darwin-arm64 @sandbox-agent/cli-darwin-x64 @sandbox-agent/cli-win32-x64
```
</Tab>
</Tabs>
</Step>
<Step title="Start the sandbox">
`SandboxAgent.start()` provisions a sandbox, starts a lightweight [Sandbox Agent server](/architecture) inside it, and connects your SDK client.
<Tabs>
<Tab title="Local">
```bash
npm install sandbox-agent@0.3.x
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { local } from "sandbox-agent/local";
// Runs on your machine. Inherits process.env automatically.
const client = await SandboxAgent.start({
sandbox: local(),
});
```
See [Local deploy guide](/deploy/local)
</Tab>
<Tab title="E2B">
<Tab title="Build from source">
If you're running from source instead of the installed CLI.
```bash
npm install sandbox-agent@0.3.x @e2b/code-interpreter
cargo run -p sandbox-agent -- server --no-token --host 0.0.0.0 --port 2468
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { e2b } from "sandbox-agent/e2b";
// Provisions a cloud sandbox on E2B, installs the server, and connects.
const client = await SandboxAgent.start({
sandbox: e2b(),
});
```
See [E2B deploy guide](/deploy/e2b)
</Tab>
<Tab title="Daytona">
```bash
npm install sandbox-agent@0.3.x @daytonaio/sdk
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { daytona } from "sandbox-agent/daytona";
// Provisions a Daytona workspace with the server pre-installed.
const client = await SandboxAgent.start({
sandbox: daytona(),
});
```
See [Daytona deploy guide](/deploy/daytona)
</Tab>
<Tab title="Vercel">
```bash
npm install sandbox-agent@0.3.x @vercel/sandbox
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { vercel } from "sandbox-agent/vercel";
// Provisions a Vercel sandbox with the server installed on boot.
const client = await SandboxAgent.start({
sandbox: vercel(),
});
```
See [Vercel deploy guide](/deploy/vercel)
</Tab>
<Tab title="Modal">
```bash
npm install sandbox-agent@0.3.x modal
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { modal } from "sandbox-agent/modal";
// Builds a container image with agents pre-installed (cached after first run),
// starts a Modal sandbox from that image, and connects.
const client = await SandboxAgent.start({
sandbox: modal(),
});
```
See [Modal deploy guide](/deploy/modal)
</Tab>
<Tab title="Cloudflare">
```bash
npm install sandbox-agent@0.3.x @cloudflare/sandbox
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { cloudflare } from "sandbox-agent/cloudflare";
import { SandboxClient } from "@cloudflare/sandbox";
// Uses the Cloudflare Sandbox SDK to provision and connect.
// The Cloudflare SDK handles server lifecycle internally.
const cfSandboxClient = new SandboxClient();
const client = await SandboxAgent.start({
sandbox: cloudflare({ sdk: cfSandboxClient }),
});
```
See [Cloudflare deploy guide](/deploy/cloudflare)
</Tab>
<Tab title="Docker">
```bash
npm install sandbox-agent@0.3.x dockerode get-port
```
```typescript
import { SandboxAgent } from "sandbox-agent";
import { docker } from "sandbox-agent/docker";
// Runs a Docker container locally. Good for testing.
const client = await SandboxAgent.start({
sandbox: docker(),
});
```
See [Docker deploy guide](/deploy/docker)
</Tab>
</Tabs>
<div style={{ height: "1rem" }} />
**More info:**
Binding to `0.0.0.0` allows the server to accept connections from any network interface, which is required when running inside a sandbox where clients connect remotely.
<AccordionGroup>
<Accordion title="Passing LLM credentials">
Agents need API keys for their LLM provider. Each provider passes credentials differently:
<Accordion title="Configuring token">
Tokens are usually not required. Most sandbox providers (E2B, Daytona, etc.) already secure networking at the infrastructure layer.
```typescript
// Local — inherits process.env automatically
If you expose the server publicly, use `--token "$SANDBOX_TOKEN"` to require authentication:
// E2B
e2b({ create: { envs: { ANTHROPIC_API_KEY: "..." } } })
// Daytona
daytona({ create: { envVars: { ANTHROPIC_API_KEY: "..." } } })
// Vercel
vercel({ create: { env: { ANTHROPIC_API_KEY: "..." } } })
// Modal
modal({ create: { secrets: { ANTHROPIC_API_KEY: "..." } } })
// Docker
docker({ env: ["ANTHROPIC_API_KEY=..."] })
```bash
sandbox-agent server --token "$SANDBOX_TOKEN" --host 0.0.0.0 --port 2468
```
For multi-tenant billing, per-user keys, and gateway options, see [LLM Credentials](/llm-credentials).
</Accordion>
Then pass the token when connecting:
<Accordion title="Implementing a custom provider">
Implement the `SandboxProvider` interface to use any sandbox platform:
```typescript
import { SandboxAgent, type SandboxProvider } from "sandbox-agent";
const myProvider: SandboxProvider = {
name: "my-provider",
async create() {
// Provision a sandbox, install & start the server, return an ID
return "sandbox-123";
},
async destroy(sandboxId) {
// Tear down the sandbox
},
async getUrl(sandboxId) {
// Return the Sandbox Agent server URL
return `https://${sandboxId}.my-platform.dev:3000`;
},
};
const client = await SandboxAgent.start({
sandbox: myProvider,
});
```
</Accordion>
<Accordion title="Connecting to an existing server">
If you already have a Sandbox Agent server running, connect directly:
```typescript
const client = await SandboxAgent.connect({
baseUrl: "http://127.0.0.1:2468",
});
```
</Accordion>
<Accordion title="Starting the server manually">
<Tabs>
<Tab title="TypeScript">
```typescript
import { SandboxAgent } from "sandbox-agent";
const sdk = await SandboxAgent.connect({
baseUrl: "http://your-server:2468",
token: process.env.SANDBOX_TOKEN,
});
```
</Tab>
<Tab title="curl">
```bash
curl -fsSL https://releases.rivet.dev/sandbox-agent/0.3.x/install.sh | sh
sandbox-agent server --no-token --host 0.0.0.0 --port 2468
curl "http://your-server:2468/v1/health" \
-H "Authorization: Bearer $SANDBOX_TOKEN"
```
</Tab>
<Tab title="npx">
<Tab title="CLI">
```bash
npx @sandbox-agent/cli@0.3.x server --no-token --host 0.0.0.0 --port 2468
```
</Tab>
<Tab title="Docker">
```bash
docker run -p 2468:2468 \
-e ANTHROPIC_API_KEY="sk-ant-..." \
-e OPENAI_API_KEY="sk-..." \
rivetdev/sandbox-agent:0.4.1-rc.1-full \
server --no-token --host 0.0.0.0 --port 2468
sandbox-agent --token "$SANDBOX_TOKEN" api agents list \
--endpoint http://your-server:2468
```
</Tab>
</Tabs>
</Accordion>
<Accordion title="CORS">
If you're calling the server from a browser, see the [CORS configuration guide](/cors).
</Accordion>
</AccordionGroup>
</Step>
<Step title="Create a session and send a prompt">
<CodeGroup>
<Step title="Install agents (optional)">
To preinstall agents:
```typescript Claude
const session = await client.createSession({
agent: "claude",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
```typescript Codex
const session = await client.createSession({
agent: "codex",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
```typescript OpenCode
const session = await client.createSession({
agent: "opencode",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
```typescript Cursor
const session = await client.createSession({
agent: "cursor",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
```typescript Amp
const session = await client.createSession({
agent: "amp",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
```typescript Pi
const session = await client.createSession({
agent: "pi",
});
session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
</CodeGroup>
See [Agent Sessions](/agent-sessions) for the full sessions API.
</Step>
<Step title="Clean up">
```typescript
await client.destroySandbox(); // provider-defined cleanup and disconnect
```bash
sandbox-agent install-agent --all
```
Use `client.dispose()` instead to disconnect without changing sandbox state. On E2B, `client.pauseSandbox()` pauses the sandbox and `client.killSandbox()` deletes it permanently.
If agents are not installed up front, they are lazily installed when creating a session.
</Step>
<Step title="Inspect with the UI">
Open the Inspector at `/ui/` on your server (e.g. `http://localhost:2468/ui/`) to view sessions and events in a GUI.
<Step title="Install desktop dependencies (optional, Linux only)">
If you want to use `/v1/desktop/*`, install the desktop runtime packages first:
```bash
sandbox-agent install desktop --yes
```
Then use `GET /v1/desktop/status` or `sdk.getDesktopStatus()` to verify the runtime is ready before calling desktop screenshot or input APIs.
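A readiness wait can be sketched as follows (an illustrative helper; the `install_required` state comes from the SDK docs, but the exact name of the ready state is an assumption — adapt the predicate to the states your status response reports):

```typescript
interface DesktopStatusLike {
  state: string;
  installCommand?: string;
}

// Poll desktop status until it leaves "install_required",
// or give up after `attempts` polls.
async function waitForDesktopReady(
  getStatus: () => Promise<DesktopStatusLike>,
  attempts = 10,
  delayMs = 500,
): Promise<DesktopStatusLike> {
  let status = await getStatus();
  for (let i = 0; i < attempts && status.state === "install_required"; i++) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    status = await getStatus();
  }
  return status;
}
```

Call it as `await waitForDesktopReady(() => sdk.getDesktopStatus())` before taking screenshots or sending input.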
</Step>
<Step title="Create a session">
```typescript
import { SandboxAgent } from "sandbox-agent";
const sdk = await SandboxAgent.connect({
baseUrl: "http://127.0.0.1:2468",
});
const session = await sdk.createSession({
agent: "claude",
sessionInit: {
cwd: "/",
mcpServers: [],
},
});
console.log(session.id);
```
</Step>
<Step title="Send a message">
```typescript
const result = await session.prompt([
{ type: "text", text: "Summarize the repository and suggest next steps." },
]);
console.log(result.stopReason);
```
</Step>
<Step title="Read events">
```typescript
const off = session.onEvent((event) => {
console.log(event.sender, event.payload);
});
const page = await sdk.getEvents({
sessionId: session.id,
limit: 50,
});
console.log(page.items.length);
off();
```
</Step>
<Step title="Test with Inspector">
Open the Inspector UI at `/ui/` on your server (for example, `http://localhost:2468/ui/`) to inspect sessions and events in a GUI.
<Frame>
<img src="/images/inspector.png" alt="Sandbox Agent Inspector" />
</Frame>
</Step>
</Steps>
## Full example
```typescript
import { SandboxAgent } from "sandbox-agent";
import { e2b } from "sandbox-agent/e2b";
const client = await SandboxAgent.start({
sandbox: e2b({
create: {
envs: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
},
}),
});
try {
const session = await client.createSession({ agent: "claude" });
session.onEvent((event) => {
console.log(`[${event.sender}]`, JSON.stringify(event.payload));
});
const result = await session.prompt([
{ type: "text", text: "Write a function that checks if a number is prime." },
]);
console.log("Done:", result.stopReason);
} finally {
await client.destroySandbox();
}
```
## Next steps
<CardGroup cols={3}>
<Card title="Session Persistence" icon="database" href="/session-persistence">
Configure in-memory, Rivet Actor state, IndexedDB, SQLite, and Postgres persistence.
</Card>
<Card title="Deploy to a Sandbox" icon="box" href="/deploy/local">
Deploy your agent to E2B, Daytona, Docker, Vercel, or Cloudflare.
</Card>
<Card title="SDK Overview" icon="compass" href="/sdk-overview">
Full TypeScript SDK API surface.
</Card>
</CardGroup>

View file

@ -196,6 +196,44 @@ const writeResult = await sdk.writeFsFile({ path: "./hello.txt" }, "hello");
console.log(health.status, agents.agents.length, entries.length, writeResult.path);
```
## Desktop API
The SDK also wraps the desktop host/runtime HTTP API.
Install desktop dependencies first on Linux hosts:
```bash
sandbox-agent install desktop --yes
```
Then query status, surface remediation if needed, and start the runtime:
```ts
const status = await sdk.getDesktopStatus();
if (status.state === "install_required") {
console.log(status.installCommand);
}
const started = await sdk.startDesktop({
width: 1440,
height: 900,
dpi: 96,
});
const screenshot = await sdk.takeDesktopScreenshot();
const displayInfo = await sdk.getDesktopDisplayInfo();
await sdk.moveDesktopMouse({ x: 400, y: 300 });
await sdk.clickDesktop({ x: 400, y: 300, button: "left", clickCount: 1 });
await sdk.typeDesktopText({ text: "hello world", delayMs: 10 });
await sdk.pressDesktopKey({ key: "ctrl+l" });
await sdk.stopDesktop();
```
Screenshot helpers return `Uint8Array` PNG bytes. The SDK does not attempt to install OS packages remotely; callers should surface `missingDependencies` and `installCommand` from `getDesktopStatus()`.
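Since the bytes come back raw, a quick signature check before writing them to disk can catch transport or decoding errors early. A minimal sketch (the `isPng` helper is illustrative, not part of the SDK):

```typescript
// Illustrative helper, not part of the SDK: check that screenshot bytes carry
// the fixed 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) before saving.
function isPng(bytes: Uint8Array): boolean {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  return bytes.length >= 8 && signature.every((byte, i) => bytes[i] === byte);
}
```

For example, `isPng(await sdk.takeDesktopScreenshot())` can gate a write to disk or an upload.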
## Error handling
```ts

View file

@ -2889,6 +2889,181 @@
gap: 20px;
}
.desktop-panel {
display: flex;
flex-direction: column;
gap: 16px;
}
.desktop-state-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 12px;
margin-bottom: 12px;
}
.desktop-start-controls {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 10px;
}
.desktop-screenshot-controls {
display: flex;
align-items: flex-end;
gap: 10px;
flex-wrap: wrap;
margin-bottom: 12px;
}
.desktop-checkbox-label {
display: flex;
align-items: center;
gap: 6px;
font-size: 12px;
cursor: pointer;
white-space: nowrap;
padding-bottom: 4px;
}
.desktop-advanced-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 10px;
margin-top: 8px;
}
.desktop-input-group {
display: flex;
flex-direction: column;
gap: 4px;
}
.desktop-chip-list {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.desktop-command {
margin-top: 6px;
padding: 8px 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
overflow-x: auto;
}
.desktop-diagnostic-block + .desktop-diagnostic-block {
margin-top: 14px;
}
.desktop-process-list {
display: flex;
flex-direction: column;
gap: 10px;
margin-top: 8px;
}
.desktop-process-item {
padding: 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
display: flex;
flex-direction: column;
gap: 4px;
}
.desktop-clipboard-text {
margin: 4px 0 0;
padding: 8px 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
font-size: 12px;
white-space: pre-wrap;
word-break: break-all;
max-height: 120px;
overflow-y: auto;
}
.desktop-window-item {
padding: 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
display: flex;
flex-direction: column;
gap: 6px;
}
.desktop-window-focused {
border-color: var(--success);
box-shadow: inset 0 0 0 1px var(--success);
}
.desktop-window-editor {
display: flex;
align-items: center;
gap: 6px;
margin-top: 6px;
padding-top: 6px;
border-top: 1px solid var(--border);
}
.desktop-launch-row {
display: flex;
align-items: center;
gap: 8px;
margin-top: 6px;
flex-wrap: wrap;
}
.desktop-mouse-pos {
display: flex;
align-items: center;
gap: 8px;
margin-top: 8px;
}
.desktop-stream-hint {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
margin-bottom: 8px;
font-size: 11px;
color: var(--muted);
}
.desktop-screenshot-empty {
padding: 18px;
border: 1px dashed var(--border);
border-radius: var(--radius);
color: var(--muted);
background: var(--surface);
text-align: center;
}
.desktop-screenshot-frame {
border-radius: calc(var(--radius) + 2px);
overflow: hidden;
border: 1px solid var(--border);
background:
linear-gradient(135deg, rgba(15, 23, 42, 0.9), rgba(30, 41, 59, 0.92)),
radial-gradient(circle at top right, rgba(56, 189, 248, 0.12), transparent 40%);
padding: 10px;
}
.desktop-screenshot-image {
display: block;
width: 100%;
height: auto;
border-radius: var(--radius);
background: rgba(0, 0, 0, 0.24);
}
.processes-section {
display: flex;
flex-direction: column;
@ -3551,6 +3726,12 @@
grid-template-columns: 1fr;
}
.desktop-state-grid,
.desktop-start-controls,
.desktop-advanced-grid {
grid-template-columns: 1fr;
}
.session-sidebar {
display: none;
}

View file

@ -18,6 +18,7 @@
"@types/react-dom": "^19.1.6",
"@vitejs/plugin-react": "^4.3.1",
"fake-indexeddb": "^6.2.4",
"jsdom": "^26.1.0",
"typescript": "^5.7.3",
"vite": "^5.4.7",
"vitest": "^3.0.0"

View file

@ -1,4 +1,4 @@
import { ChevronLeft, ChevronRight, Cloud, Play, PlayCircle, Server, Terminal, Wrench } from "lucide-react";
import { ChevronLeft, ChevronRight, Cloud, Monitor, Play, PlayCircle, Server, Terminal, Wrench } from "lucide-react";
import type { AgentInfo, SandboxAgent, SessionEvent } from "sandbox-agent";
type AgentModeInfo = { id: string; name: string; description: string };
@ -9,9 +9,10 @@ import ProcessesTab from "./ProcessesTab";
import ProcessRunTab from "./ProcessRunTab";
import SkillsTab from "./SkillsTab";
import RequestLogTab from "./RequestLogTab";
import DesktopTab from "./DesktopTab";
import type { RequestLog } from "../../types/requestLog";
export type DebugTab = "log" | "events" | "agents" | "mcp" | "skills" | "processes" | "run-process";
export type DebugTab = "log" | "events" | "agents" | "desktop" | "mcp" | "skills" | "processes" | "run-process";
const DebugPanel = ({
debugTab,
@ -75,6 +76,10 @@ const DebugPanel = ({
<Cloud className="button-icon" style={{ marginRight: 4, width: 12, height: 12 }} />
Agents
</button>
<button className={`debug-tab ${debugTab === "desktop" ? "active" : ""}`} onClick={() => onDebugTabChange("desktop")}>
<Monitor className="button-icon" style={{ marginRight: 4, width: 12, height: 12 }} />
Desktop
</button>
<button className={`debug-tab ${debugTab === "mcp" ? "active" : ""}`} onClick={() => onDebugTabChange("mcp")}>
<Server className="button-icon" style={{ marginRight: 4, width: 12, height: 12 }} />
MCP
@ -112,6 +117,8 @@ const DebugPanel = ({
/>
)}
{debugTab === "desktop" && <DesktopTab getClient={getClient} />}
{debugTab === "mcp" && <McpTab getClient={getClient} />}
{debugTab === "processes" && <ProcessesTab getClient={getClient} />}

View file

@ -0,0 +1,142 @@
// @vitest-environment jsdom
import { act } from "react";
import { createRoot, type Root } from "react-dom/client";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { SandboxAgent } from "sandbox-agent";
import {
createDockerTestLayout,
disposeDockerTestLayout,
startDockerSandboxAgent,
type DockerSandboxAgentHandle,
} from "../../../../../../sdks/typescript/tests/helpers/docker.ts";
import DesktopTab from "./DesktopTab";
type DockerTestLayout = ReturnType<typeof createDockerTestLayout>;
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function waitFor<T>(fn: () => T | undefined | null, timeoutMs = 20_000, stepMs = 50): Promise<T> {
const started = Date.now();
while (Date.now() - started < timeoutMs) {
const value = fn();
if (value !== undefined && value !== null) {
return value;
}
await sleep(stepMs);
}
throw new Error("timed out waiting for condition");
}
function findButton(container: HTMLElement, label: string): HTMLButtonElement | undefined {
return Array.from(container.querySelectorAll("button")).find((button) => button.textContent?.includes(label)) as HTMLButtonElement | undefined;
}
describe.sequential("DesktopTab", () => {
let container: HTMLDivElement;
let root: Root;
let layout: DockerTestLayout | undefined;
let handle: DockerSandboxAgentHandle | undefined;
let client: SandboxAgent | undefined;
beforeEach(() => {
(globalThis as { IS_REACT_ACT_ENVIRONMENT?: boolean }).IS_REACT_ACT_ENVIRONMENT = true;
container = document.createElement("div");
document.body.appendChild(container);
root = createRoot(container);
});
afterEach(async () => {
await act(async () => {
root.unmount();
});
if (client) {
await client.stopDesktop().catch(() => {});
await client.dispose().catch(() => {});
}
if (handle) {
await handle.dispose();
}
if (layout) {
disposeDockerTestLayout(layout);
}
container.remove();
delete (globalThis as { IS_REACT_ACT_ENVIRONMENT?: boolean }).IS_REACT_ACT_ENVIRONMENT;
client = undefined;
handle = undefined;
layout = undefined;
});
async function connectDesktopClient(options?: { pathMode?: "merge" | "replace" }): Promise<SandboxAgent> {
layout = createDockerTestLayout();
handle = await startDockerSandboxAgent(layout, {
timeoutMs: 30_000,
pathMode: options?.pathMode,
env: options?.pathMode === "replace" ? { PATH: layout.rootDir } : undefined,
});
client = await SandboxAgent.connect({
baseUrl: handle.baseUrl,
token: handle.token,
});
return client;
}
it("renders install remediation when desktop deps are missing", async () => {
const connectedClient = await connectDesktopClient({ pathMode: "replace" });
await act(async () => {
root.render(<DesktopTab getClient={() => connectedClient} />);
});
await waitFor(() => {
const text = container.textContent ?? "";
return text.includes("install_required") ? text : undefined;
});
expect(container.textContent).toContain("install_required");
expect(container.textContent).toContain("sandbox-agent install desktop --yes");
expect(container.textContent).toContain("Xvfb");
});
it("starts desktop, refreshes screenshot, and stops desktop", async () => {
const connectedClient = await connectDesktopClient();
await act(async () => {
root.render(<DesktopTab getClient={() => connectedClient} />);
});
await waitFor(() => {
const text = container.textContent ?? "";
return text.includes("inactive") ? true : undefined;
});
const startButton = await waitFor(() => findButton(container, "Start Desktop"));
await act(async () => {
startButton.dispatchEvent(new MouseEvent("click", { bubbles: true }));
});
await waitFor(() => {
const screenshot = container.querySelector("img[alt='Desktop screenshot']") as HTMLImageElement | null;
return screenshot?.src ? screenshot : undefined;
});
const screenshot = container.querySelector("img[alt='Desktop screenshot']") as HTMLImageElement | null;
expect(screenshot).toBeTruthy();
expect(screenshot?.src.startsWith("blob:") || screenshot?.src.startsWith("data:image/png")).toBe(true);
expect(container.textContent).toContain("active");
const stopButton = await waitFor(() => findButton(container, "Stop Desktop"));
await act(async () => {
stopButton.dispatchEvent(new MouseEvent("click", { bubbles: true }));
});
await waitFor(() => {
const text = container.textContent ?? "";
return text.includes("inactive") ? true : undefined;
});
expect(container.textContent).toContain("inactive");
});
});

File diff suppressed because it is too large

View file

@ -8,7 +8,7 @@ export default defineConfig(({ command }) => ({
port: 5173,
proxy: {
"/v1": {
target: "http://localhost:2468",
target: process.env.SANDBOX_AGENT_URL || "http://localhost:2468",
changeOrigin: true,
ws: true,
},

pnpm-lock.yaml (generated, 527 changed lines)

File diff suppressed because it is too large

View file

@ -277,3 +277,13 @@ Update this file continuously during the migration.
- Owner: Unassigned.
- Status: resolved
- Links: `sdks/acp-http-client/src/index.ts`, `sdks/acp-http-client/tests/smoke.test.ts`, `sdks/typescript/tests/integration.test.ts`
- Date: 2026-03-07
- Area: Desktop host/runtime API boundary
- Issue: Desktop automation needed host capabilities (screenshot, input injection, and similar file-transfer-style operations), but routing them through ACP would have mixed agent protocol semantics with host-owned runtime control and binary payloads.
- Impact: A desktop feature built as ACP methods would blur the division between agent/session behavior and Sandbox Agent host/runtime APIs, and would complicate binary screenshot transport.
- Proposed direction: Ship desktop as first-party HTTP endpoints under `/v1/desktop/*`, keep health/install/remediation in the server runtime, and expose the feature through the SDK and inspector without ACP extension methods.
- Decision: Accepted and implemented for phase one.
- Owner: Unassigned.
- Status: resolved
- Links: `server/packages/sandbox-agent/src/router.rs`, `server/packages/sandbox-agent/src/desktop_runtime.rs`, `sdks/typescript/src/client.ts`, `frontend/packages/inspector/src/components/debug/DesktopTab.tsx`

View file

@ -0,0 +1,103 @@
# Desktop Streaming Architecture
## Decision: neko over GStreamer (direct) and VNC
We evaluated three approaches for streaming the virtual desktop to browser clients:
1. **VNC (noVNC/websockify)** - traditional remote desktop
2. **GStreamer WebRTC (direct)** - custom GStreamer pipeline in the sandbox agent process
3. **neko** - standalone WebRTC streaming server with its own GStreamer pipeline
We chose **neko**.
## Approach comparison
### VNC (noVNC)
- Uses RFB protocol, not WebRTC. Relies on pixel-diff framebuffer updates over WebSocket.
- Higher latency than WebRTC (no hardware-accelerated codec, no adaptive bitrate).
- Requires a VNC server (x11vnc or similar) plus websockify for browser access.
- Input handling is mature but tied to the RFB protocol.
- No audio support without additional plumbing.
**Rejected because:** Latency is noticeably worse than WebRTC-based approaches. The pixel-diff approach doesn't scale well at higher resolutions or frame rates. No native audio path.
### GStreamer WebRTC (direct)
- Custom pipeline: `ximagesrc -> videoconvert -> vp8enc -> rtpvp8pay -> webrtcbin`.
- Runs inside the sandbox agent Rust process using `gstreamer-rs` bindings.
- Requires feature-gating (`desktop-gstreamer` Cargo feature) and linking GStreamer at compile time.
- ICE candidate handling is complex: Docker-internal IPs (172.17.x.x) must be rewritten to 127.0.0.1 for host browser connectivity.
- UDP port range must be constrained via libnice NiceAgent properties to stay within Docker-forwarded ports.
- Input must be implemented separately (xdotool or custom X11 input injection).
- No built-in session management, authentication, or multi-client support.
**Rejected because:** Too much complexity for the sandbox agent to own directly. ICE/NAT traversal bugs are hard to debug. The GStreamer Rust bindings add significant compile-time dependencies. Input handling requires a separate implementation. We built and tested this approach (branch `desktop-computer-use`, PR #226) and found:
- Black screen issues due to GStreamer pipeline negotiation failures
- ICE candidate rewriting fragility across Docker networking modes
- libnice port range configuration requires accessing internal NiceAgent properties that vary across GStreamer versions
- No data channel for low-latency input (had to fall back to WebSocket-based input which adds a round trip)
### neko (chosen)
- Standalone Go binary extracted from `ghcr.io/m1k1o/neko/base`.
- Has its own GStreamer pipeline internally (same `ximagesrc -> vp8enc -> webrtcbin` approach, but battle-tested).
- Provides WebSocket signaling, WebRTC media, and a binary data channel for input, all out of the box.
- Input via data channel is low-latency (sub-frame, no HTTP round trip). Uses X11 XTEST extension.
- Multi-session support with `noauth` provider (each browser tab gets its own session).
- ICE-lite mode with `--webrtc.nat1to1 127.0.0.1` eliminates NAT traversal issues for Docker-to-host.
- EPR (ephemeral port range) flag constrains UDP ports cleanly.
- Sandbox agent acts as a thin WebSocket proxy: browser WS connects to sandbox agent, which creates a per-connection neko login session and relays signaling messages bidirectionally.
- Audio codec support (opus) included for free.
**Chosen because:** Neko encapsulates all the hard WebRTC/GStreamer/input complexity into a single binary. The sandbox agent only needs to:
1. Manage the neko process lifecycle (start/stop via the process runtime)
2. Proxy WebSocket signaling (bidirectional relay, ~60 lines of code)
3. Handle neko session creation (HTTP login to get a session cookie)
This keeps the sandbox agent's desktop streaming code simple (~300 lines for the manager, ~120 lines for the WS proxy) while delivering production-quality WebRTC streaming with data channel input.
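The relay itself reduces to piping frames both ways and tying the two connection lifetimes together. A sketch of that shape in TypeScript (the real proxy is Rust in the sandbox agent; `SocketLike` and `relay` are illustrative names over anything WebSocket-shaped):

```typescript
// Illustrative sketch of the bidirectional signaling relay. Both ends are
// anything WebSocket-shaped: an object exposing send/close plus message and
// close hooks. Names here are assumptions, not the actual Rust implementation.
interface SocketLike {
  send(data: string): void;
  close(): void;
  onMessage(handler: (data: string) => void): void;
  onClose(handler: () => void): void;
}

// Forward every frame from each side to the other; when either side closes,
// tear down its peer so the browser and neko sessions end together.
function relay(browser: SocketLike, neko: SocketLike): void {
  browser.onMessage((data) => neko.send(data));
  neko.onMessage((data) => browser.send(data));
  browser.onClose(() => neko.close());
  neko.onClose(() => browser.close());
}
```

The per-connection neko login session would be created before calling something like `relay`, so each browser tab gets its own signaling pipe.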
## Architecture
```
Browser                          Sandbox Agent                          neko (internal)
   |                                   |                                       |
   |--- WS /stream/signaling --------->|--- WS ws://127.0.0.1:18100/api/ws --->|
   |                                   |       (bidirectional relay)           |
   |<-- neko signaling ----------------|<-- neko signaling --------------------|
   |                                   |                                       |
   |<================== WebRTC (UDP 59000-59100) =========================>    |
   |         VP8 video, Opus audio, binary data channel                        |
   |                                                                           |
   |--- data channel input (mouse/keyboard) --------------------------------->|
   |         (binary protocol: opcode + payload, big-endian)                   |
```
Key points:
- neko listens on internal port 18100 (not exposed externally).
- UDP ports 59000-59100 are forwarded through Docker for WebRTC media.
- `--webrtc.icelite` + `--webrtc.nat1to1 127.0.0.1` means neko advertises 127.0.0.1 as its ICE candidate, so the browser connects to localhost UDP ports directly.
- `--desktop.input.enabled=false` disables neko's custom xf86-input driver (not available outside neko's official Docker images). Input falls back to XTEST.
- Each WebSocket proxy connection creates a fresh neko login session with a unique username to avoid session conflicts when multiple clients connect.
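For illustration, a mouse-move frame under that opcode-plus-payload scheme could be encoded like this (the `0x01` opcode and the two-uint16 layout are assumptions for the sketch; neko's payload definitions are authoritative):

```typescript
// Illustrative encoder for a data-channel input frame: a 1-byte opcode
// followed by a big-endian payload. The opcode value and field layout are
// assumptions; see neko's payload/receive.go for the real protocol.
function encodeMouseMove(x: number, y: number): Uint8Array {
  const buffer = new ArrayBuffer(5); // opcode + uint16 x + uint16 y
  const view = new DataView(buffer);
  view.setUint8(0, 0x01); // assumed mouse-move opcode
  view.setUint16(1, x, false); // big-endian x
  view.setUint16(3, y, false); // big-endian y
  return new Uint8Array(buffer);
}
```

The resulting bytes would be sent over the WebRTC data channel directly, which is what keeps input latency below a frame.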
## Trade-offs
| Concern | neko | GStreamer direct |
|---------|------|-----------------|
| Binary size | ~30MB additional binary | ~0 (uses system GStreamer libs) |
| Compile-time deps | None (external binary) | gstreamer-rs crate + GStreamer dev libs |
| Input latency | Sub-frame (data channel) | WebSocket round trip |
| ICE/NAT complexity | Handled by neko flags | Must implement in Rust |
| Multi-client | Built-in session management | Must implement |
| Maintenance | Upstream neko updates | Own all the code |
| Audio | Built-in (opus) | Must add audio pipeline |
The main trade-off is the additional ~30MB binary size from neko. This is acceptable for the Docker-based deployment model where image size is less critical than reliability and development velocity.
## References
- neko v3: https://github.com/m1k1o/neko
- neko client reference: https://github.com/demodesk/neko-client
- neko data channel protocol: https://github.com/m1k1o/neko/blob/master/server/internal/webrtc/payload/receive.go
- GStreamer branch (closed): PR #226, branch `desktop-computer-use`
- Image digest: `ghcr.io/m1k1o/neko/base@sha256:0c384afa56268aaa2d5570211d284763d0840dcdd1a7d9a24be3081d94d3dfce`

View file

@ -0,0 +1,272 @@
"use client";
import type { CSSProperties, MouseEvent, WheelEvent } from "react";
import { useEffect, useRef, useState } from "react";
import type { DesktopMouseButton, DesktopStreamErrorStatus, DesktopStreamReadyStatus, SandboxAgent } from "sandbox-agent";
type ConnectionState = "connecting" | "ready" | "closed" | "error";
export type DesktopViewerClient = Pick<SandboxAgent, "connectDesktopStream">;
export interface DesktopViewerProps {
client: DesktopViewerClient;
className?: string;
style?: CSSProperties;
imageStyle?: CSSProperties;
height?: number | string;
showStatusBar?: boolean;
onConnect?: (status: DesktopStreamReadyStatus) => void;
onDisconnect?: () => void;
onError?: (error: DesktopStreamErrorStatus | Error) => void;
}
const shellStyle: CSSProperties = {
display: "flex",
flexDirection: "column",
overflow: "hidden",
border: "1px solid rgba(15, 23, 42, 0.14)",
borderRadius: 14,
background: "linear-gradient(180deg, rgba(248, 250, 252, 0.96) 0%, rgba(226, 232, 240, 0.92) 100%)",
boxShadow: "0 20px 40px rgba(15, 23, 42, 0.08)",
};
const statusBarStyle: CSSProperties = {
display: "flex",
alignItems: "center",
justifyContent: "space-between",
gap: 12,
padding: "10px 14px",
borderBottom: "1px solid rgba(15, 23, 42, 0.08)",
background: "rgba(255, 255, 255, 0.78)",
color: "#0f172a",
fontSize: 12,
lineHeight: 1.4,
};
const viewportStyle: CSSProperties = {
position: "relative",
display: "flex",
alignItems: "center",
justifyContent: "center",
overflow: "hidden",
background: "radial-gradient(circle at top, rgba(14, 165, 233, 0.18), transparent 45%), linear-gradient(180deg, #0f172a 0%, #111827 100%)",
};
const videoBaseStyle: CSSProperties = {
display: "block",
width: "100%",
height: "100%",
objectFit: "contain",
userSelect: "none",
};
const hintStyle: CSSProperties = {
opacity: 0.66,
};
const getStatusColor = (state: ConnectionState): string => {
switch (state) {
case "ready":
return "#15803d";
case "error":
return "#b91c1c";
case "closed":
return "#b45309";
default:
return "#475569";
}
};
export const DesktopViewer = ({
client,
className,
style,
imageStyle,
height = 480,
showStatusBar = true,
onConnect,
onDisconnect,
onError,
}: DesktopViewerProps) => {
const wrapperRef = useRef<HTMLDivElement | null>(null);
const videoRef = useRef<HTMLVideoElement | null>(null);
const sessionRef = useRef<ReturnType<DesktopViewerClient["connectDesktopStream"]> | null>(null);
const [connectionState, setConnectionState] = useState<ConnectionState>("connecting");
const [statusMessage, setStatusMessage] = useState("Starting desktop stream...");
const [hasVideo, setHasVideo] = useState(false);
const [resolution, setResolution] = useState<{ width: number; height: number } | null>(null);
useEffect(() => {
let cancelled = false;
setConnectionState("connecting");
setStatusMessage("Connecting to desktop stream...");
setResolution(null);
setHasVideo(false);
const session = client.connectDesktopStream();
sessionRef.current = session;
session.onReady((status) => {
if (cancelled) return;
setConnectionState("ready");
setStatusMessage("Desktop stream connected.");
setResolution({ width: status.width, height: status.height });
onConnect?.(status);
});
session.onTrack((stream) => {
if (cancelled) return;
const video = videoRef.current;
if (video) {
video.srcObject = stream;
void video.play().catch(() => undefined);
setHasVideo(true);
}
});
session.onError((error) => {
if (cancelled) return;
setConnectionState("error");
setStatusMessage(error.message);
onError?.(error);
});
session.onDisconnect(() => {
if (cancelled) return;
setConnectionState((current) => (current === "error" ? current : "closed"));
setStatusMessage((current) => (current === "Desktop stream connected." ? "Desktop stream disconnected." : current));
onDisconnect?.();
});
return () => {
cancelled = true;
session.close();
sessionRef.current = null;
const video = videoRef.current;
if (video) {
video.srcObject = null;
}
setHasVideo(false);
};
}, [client, onConnect, onDisconnect, onError]);
const scalePoint = (clientX: number, clientY: number) => {
const video = videoRef.current;
if (!video || !resolution) {
return null;
}
const rect = video.getBoundingClientRect();
if (rect.width === 0 || rect.height === 0) {
return null;
}
// The video uses objectFit: "contain", so we need to compute the actual
// rendered content area within the <video> element to map coordinates
// accurately (ignoring letterbox bars).
const videoAspect = resolution.width / resolution.height;
const elemAspect = rect.width / rect.height;
let renderW: number;
let renderH: number;
if (elemAspect > videoAspect) {
// Pillarboxed (black bars on left/right)
renderH = rect.height;
renderW = rect.height * videoAspect;
} else {
// Letterboxed (black bars on top/bottom)
renderW = rect.width;
renderH = rect.width / videoAspect;
}
const offsetX = (rect.width - renderW) / 2;
const offsetY = (rect.height - renderH) / 2;
const relX = clientX - rect.left - offsetX;
const relY = clientY - rect.top - offsetY;
const x = Math.max(0, Math.min(resolution.width, (relX / renderW) * resolution.width));
const y = Math.max(0, Math.min(resolution.height, (relY / renderH) * resolution.height));
return {
x: Math.round(x),
y: Math.round(y),
};
};
const buttonFromMouseEvent = (event: MouseEvent<HTMLDivElement>): DesktopMouseButton => {
switch (event.button) {
case 1:
return "middle";
case 2:
return "right";
default:
return "left";
}
};
const withSession = (callback: (session: NonNullable<ReturnType<DesktopViewerClient["connectDesktopStream"]>>) => void) => {
const session = sessionRef.current;
if (session) {
callback(session);
}
};
return (
<div className={className} style={{ ...shellStyle, ...style }}>
{showStatusBar ? (
<div style={statusBarStyle}>
<span style={{ color: getStatusColor(connectionState) }}>{statusMessage}</span>
<span style={hintStyle}>{resolution ? `${resolution.width}×${resolution.height}` : "Awaiting stream"}</span>
</div>
) : null}
<div
ref={wrapperRef}
role="button"
tabIndex={0}
style={{ ...viewportStyle, height }}
onMouseMove={(event) => {
const point = scalePoint(event.clientX, event.clientY);
if (!point) {
return;
}
withSession((session) => session.moveMouse(point.x, point.y));
}}
onContextMenu={(event) => {
event.preventDefault();
}}
onMouseDown={(event) => {
event.preventDefault();
// preventDefault on mousedown suppresses the default focus behavior,
// so we must explicitly focus the wrapper to receive keyboard events.
wrapperRef.current?.focus();
const point = scalePoint(event.clientX, event.clientY);
withSession((session) => session.mouseDown(buttonFromMouseEvent(event), point?.x, point?.y));
}}
onMouseUp={(event) => {
const point = scalePoint(event.clientX, event.clientY);
withSession((session) => session.mouseUp(buttonFromMouseEvent(event), point?.x, point?.y));
}}
onWheel={(event: WheelEvent<HTMLDivElement>) => {
event.preventDefault();
const point = scalePoint(event.clientX, event.clientY);
if (!point) {
return;
}
withSession((session) => session.scroll(point.x, point.y, Math.round(event.deltaX), Math.round(event.deltaY)));
}}
onKeyDown={(event) => {
event.preventDefault();
event.stopPropagation();
withSession((session) => session.keyDown(event.key));
}}
onKeyUp={(event) => {
event.preventDefault();
event.stopPropagation();
withSession((session) => session.keyUp(event.key));
}}
>
<video
ref={videoRef}
autoPlay
playsInline
muted
tabIndex={-1}
draggable={false}
style={{ ...videoBaseStyle, ...imageStyle, display: hasVideo ? "block" : "none", pointerEvents: "none" }}
/>
</div>
</div>
);
};

View file

@ -1,6 +1,7 @@
export { AgentConversation } from "./AgentConversation.tsx";
export { AgentTranscript } from "./AgentTranscript.tsx";
export { ChatComposer } from "./ChatComposer.tsx";
export { DesktopViewer } from "./DesktopViewer.tsx";
export { ProcessTerminal } from "./ProcessTerminal.tsx";
export { useTranscriptVirtualizer } from "./useTranscriptVirtualizer.ts";
@ -23,6 +24,11 @@ export type {
ChatComposerProps,
} from "./ChatComposer.tsx";
export type {
DesktopViewerClient,
DesktopViewerProps,
} from "./DesktopViewer.tsx";
export type {
ProcessTerminalClient,
ProcessTerminalProps,

View file

@ -23,12 +23,45 @@ import {
type SetSessionModeRequest,
} from "acp-http-client";
import type { SandboxProvider } from "./providers/types.ts";
import { DesktopStreamSession, type DesktopStreamConnectOptions } from "./desktop-stream.ts";
import {
type AcpServerListResponse,
type AgentInfo,
type AgentInstallRequest,
type AgentInstallResponse,
type AgentListResponse,
type DesktopActionResponse,
type DesktopClipboardQuery,
type DesktopClipboardResponse,
type DesktopClipboardWriteRequest,
type DesktopDisplayInfoResponse,
type DesktopKeyboardDownRequest,
type DesktopKeyboardPressRequest,
type DesktopKeyboardTypeRequest,
type DesktopLaunchRequest,
type DesktopLaunchResponse,
type DesktopMouseClickRequest,
type DesktopMouseDownRequest,
type DesktopMouseDragRequest,
type DesktopMouseMoveRequest,
type DesktopMousePositionResponse,
type DesktopMouseScrollRequest,
type DesktopMouseUpRequest,
type DesktopKeyboardUpRequest,
type DesktopOpenRequest,
type DesktopOpenResponse,
type DesktopRecordingInfo,
type DesktopRecordingListResponse,
type DesktopRecordingStartRequest,
type DesktopRegionScreenshotQuery,
type DesktopScreenshotQuery,
type DesktopStartRequest,
type DesktopStatusResponse,
type DesktopStreamStatusResponse,
type DesktopWindowInfo,
type DesktopWindowListResponse,
type DesktopWindowMoveRequest,
type DesktopWindowResizeRequest,
type FsActionResponse,
type FsDeleteQuery,
type FsEntriesQuery,
@ -53,7 +86,9 @@ import {
type ProcessInfo,
type ProcessInputRequest,
type ProcessInputResponse,
type ProcessListQuery,
type ProcessListResponse,
type ProcessOwner,
type ProcessLogEntry,
type ProcessLogsQuery,
type ProcessLogsResponse,
@ -201,6 +236,7 @@ export interface ProcessTerminalConnectOptions extends ProcessTerminalWebSocketU
}
export type ProcessTerminalSessionOptions = ProcessTerminalConnectOptions;
export type DesktopStreamSessionOptions = DesktopStreamConnectOptions;
export class SandboxAgentError extends Error {
readonly status: number;
@ -1533,6 +1569,196 @@ export class SandboxAgent {
return this.requestHealth();
}
async startDesktop(request: DesktopStartRequest = {}): Promise<DesktopStatusResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/start`, {
body: request,
});
}
async stopDesktop(): Promise<DesktopStatusResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/stop`);
}
async getDesktopStatus(): Promise<DesktopStatusResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/status`);
}
async getDesktopDisplayInfo(): Promise<DesktopDisplayInfoResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/display/info`);
}
async takeDesktopScreenshot(query: DesktopScreenshotQuery = {}): Promise<Uint8Array> {
const response = await this.requestRaw("GET", `${API_PREFIX}/desktop/screenshot`, {
query,
accept: "image/*",
});
const buffer = await response.arrayBuffer();
return new Uint8Array(buffer);
}
async takeDesktopRegionScreenshot(query: DesktopRegionScreenshotQuery): Promise<Uint8Array> {
const response = await this.requestRaw("GET", `${API_PREFIX}/desktop/screenshot/region`, {
query,
accept: "image/*",
});
const buffer = await response.arrayBuffer();
return new Uint8Array(buffer);
}
async getDesktopMousePosition(): Promise<DesktopMousePositionResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/mouse/position`);
}
async moveDesktopMouse(request: DesktopMouseMoveRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/move`, {
body: request,
});
}
async clickDesktop(request: DesktopMouseClickRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/click`, {
body: request,
});
}
async mouseDownDesktop(request: DesktopMouseDownRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/down`, {
body: request,
});
}
async mouseUpDesktop(request: DesktopMouseUpRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/up`, {
body: request,
});
}
async dragDesktopMouse(request: DesktopMouseDragRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/drag`, {
body: request,
});
}
async scrollDesktop(request: DesktopMouseScrollRequest): Promise<DesktopMousePositionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/scroll`, {
body: request,
});
}
async typeDesktopText(request: DesktopKeyboardTypeRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/type`, {
body: request,
});
}
async pressDesktopKey(request: DesktopKeyboardPressRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/press`, {
body: request,
});
}
async keyDownDesktop(request: DesktopKeyboardDownRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/down`, {
body: request,
});
}
async keyUpDesktop(request: DesktopKeyboardUpRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/up`, {
body: request,
});
}
async listDesktopWindows(): Promise<DesktopWindowListResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/windows`);
}
async getDesktopFocusedWindow(): Promise<DesktopWindowInfo> {
return this.requestJson("GET", `${API_PREFIX}/desktop/windows/focused`);
}
async focusDesktopWindow(windowId: string): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/focus`);
}
async moveDesktopWindow(windowId: string, request: DesktopWindowMoveRequest): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/move`, {
body: request,
});
}
async resizeDesktopWindow(windowId: string, request: DesktopWindowResizeRequest): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/resize`, {
body: request,
});
}
async getDesktopClipboard(query: DesktopClipboardQuery = {}): Promise<DesktopClipboardResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/clipboard`, {
query,
});
}
async setDesktopClipboard(request: DesktopClipboardWriteRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/clipboard`, {
body: request,
});
}
async launchDesktopApp(request: DesktopLaunchRequest): Promise<DesktopLaunchResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/launch`, {
body: request,
});
}
async openDesktopTarget(request: DesktopOpenRequest): Promise<DesktopOpenResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/open`, {
body: request,
});
}
async getDesktopStreamStatus(): Promise<DesktopStreamStatusResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/stream/status`);
}
async startDesktopRecording(request: DesktopRecordingStartRequest = {}): Promise<DesktopRecordingInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/recording/start`, {
body: request,
});
}
async stopDesktopRecording(): Promise<DesktopRecordingInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/recording/stop`);
}
async listDesktopRecordings(): Promise<DesktopRecordingListResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/recordings`);
}
async getDesktopRecording(id: string): Promise<DesktopRecordingInfo> {
return this.requestJson("GET", `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}`);
}
async downloadDesktopRecording(id: string): Promise<Uint8Array> {
const response = await this.requestRaw("GET", `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}/download`, {
accept: "video/mp4",
});
const buffer = await response.arrayBuffer();
return new Uint8Array(buffer);
}
async deleteDesktopRecording(id: string): Promise<void> {
await this.requestRaw("DELETE", `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}`);
}
async startDesktopStream(): Promise<DesktopStreamStatusResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/stream/start`);
}
async stopDesktopStream(): Promise<DesktopStreamStatusResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/stream/stop`);
}
async listAgents(options?: AgentQueryOptions): Promise<AgentListResponse> {
return this.requestJson("GET", `${API_PREFIX}/agents`, {
query: toAgentQuery(options),
@@ -1665,8 +1891,10 @@ export class SandboxAgent {
});
}
async listProcesses(query?: ProcessListQuery): Promise<ProcessListResponse> {
return this.requestJson("GET", `${API_PREFIX}/processes`, {
query,
});
}
async getProcess(id: string): Promise<ProcessInfo> {
@@ -1754,6 +1982,32 @@ export class SandboxAgent {
return new ProcessTerminalSession(this.connectProcessTerminalWebSocket(id, options));
}
buildDesktopStreamWebSocketUrl(options: ProcessTerminalWebSocketUrlOptions = {}): string {
return toWebSocketUrl(
this.buildUrl(`${API_PREFIX}/desktop/stream/signaling`, {
access_token: options.accessToken ?? this.token,
}),
);
}
connectDesktopStreamWebSocket(options: DesktopStreamConnectOptions = {}): WebSocket {
const WebSocketCtor = options.WebSocket ?? globalThis.WebSocket;
if (!WebSocketCtor) {
throw new Error("WebSocket API is not available; provide a WebSocket implementation.");
}
return new WebSocketCtor(
this.buildDesktopStreamWebSocketUrl({
accessToken: options.accessToken,
}),
options.protocols,
);
}
connectDesktopStream(options: DesktopStreamSessionOptions = {}): DesktopStreamSession {
// Pass options through so RTCPeerConnection / rtcConfig overrides reach the session.
return new DesktopStreamSession(this.connectDesktopStreamWebSocket(options), options);
}
private async getLiveConnection(agent: string): Promise<LiveAcpConnection> {
await this.awaitHealthy();

@@ -0,0 +1,541 @@
import type { DesktopMouseButton } from "./types.ts";
const WS_READY_STATE_OPEN = 1;
const WS_READY_STATE_CLOSED = 3;
export interface DesktopStreamReadyStatus {
type: "ready";
width: number;
height: number;
}
export interface DesktopStreamErrorStatus {
type: "error";
message: string;
}
export type DesktopStreamStatusMessage = DesktopStreamReadyStatus | DesktopStreamErrorStatus;
export interface DesktopStreamConnectOptions {
accessToken?: string;
WebSocket?: typeof WebSocket;
protocols?: string | string[];
RTCPeerConnection?: typeof RTCPeerConnection;
rtcConfig?: RTCConfiguration;
}
/**
* Neko data channel binary input protocol (Big Endian, v3).
*
* Reference implementation:
* https://github.com/demodesk/neko-client/blob/37f93eae6bd55b333c94bd009d7f2b079075a026/src/component/internal/webrtc.ts
*
* Server-side protocol:
* https://github.com/m1k1o/neko/blob/master/server/internal/webrtc/payload/receive.go
*
* Pinned to neko server image: m1k1o/neko:base@sha256:14e4012bc361025f71205ffc2a9342a628f39168c0a1d855db033fb18590fcae
*/
const NEKO_OP_MOVE = 0x01;
const NEKO_OP_SCROLL = 0x02;
const NEKO_OP_KEY_DOWN = 0x03;
const NEKO_OP_KEY_UP = 0x04;
const NEKO_OP_BTN_DOWN = 0x05;
const NEKO_OP_BTN_UP = 0x06;
function mouseButtonToX11(button?: DesktopMouseButton): number {
switch (button) {
case "middle":
return 2;
case "right":
return 3;
default:
return 1;
}
}
function keyToX11Keysym(key: string): number {
if (key.length === 1) {
const cp = key.charCodeAt(0);
if (cp >= 0x20 && cp <= 0x7e) return cp;
return 0x01000000 + cp;
}
const map: Record<string, number> = {
Backspace: 0xff08,
Tab: 0xff09,
Return: 0xff0d,
Enter: 0xff0d,
Escape: 0xff1b,
Delete: 0xffff,
Home: 0xff50,
Left: 0xff51,
ArrowLeft: 0xff51,
Up: 0xff52,
ArrowUp: 0xff52,
Right: 0xff53,
ArrowRight: 0xff53,
Down: 0xff54,
ArrowDown: 0xff54,
PageUp: 0xff55,
PageDown: 0xff56,
End: 0xff57,
Insert: 0xff63,
F1: 0xffbe,
F2: 0xffbf,
F3: 0xffc0,
F4: 0xffc1,
F5: 0xffc2,
F6: 0xffc3,
F7: 0xffc4,
F8: 0xffc5,
F9: 0xffc6,
F10: 0xffc7,
F11: 0xffc8,
F12: 0xffc9,
Shift: 0xffe1,
ShiftLeft: 0xffe1,
ShiftRight: 0xffe2,
Control: 0xffe3,
ControlLeft: 0xffe3,
ControlRight: 0xffe4,
Alt: 0xffe9,
AltLeft: 0xffe9,
AltRight: 0xffea,
Meta: 0xffeb,
MetaLeft: 0xffeb,
MetaRight: 0xffec,
CapsLock: 0xffe5,
NumLock: 0xff7f,
ScrollLock: 0xff14,
" ": 0x0020,
Space: 0x0020,
};
return map[key] ?? 0;
}
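The wire layout documented in the protocol comment above can be exercised with a small standalone encoder. `buildMoveMessage` is a hypothetical helper (not part of this SDK) mirroring the 3-byte header (event byte + big-endian uint16 payload length) followed by the MOVE payload:

```typescript
// Hypothetical standalone encoder for the neko v3 binary layout:
// [event: uint8][payloadLength: uint16 BE][payload...], all big-endian.
function buildMoveMessage(x: number, y: number): Uint8Array {
  const buf = new ArrayBuffer(7); // 3-byte header + X(uint16) + Y(uint16)
  const view = new DataView(buf);
  view.setUint8(0, 0x01); // NEKO_OP_MOVE
  view.setUint16(1, 4, false); // payload length, big-endian
  view.setUint16(3, x, false);
  view.setUint16(5, y, false);
  return new Uint8Array(buf);
}
```

A MOVE to (100, 200) encodes as `01 00 04 00 64 00 c8`, matching what `buildNekoMsg` + `moveMouse` below produce.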
export class DesktopStreamSession {
readonly socket: WebSocket;
readonly closed: Promise<void>;
private pc: RTCPeerConnection | null = null;
private dataChannel: RTCDataChannel | null = null;
private mediaStream: MediaStream | null = null;
private connected = false;
private pendingCandidates: Record<string, unknown>[] = [];
private cachedReadyStatus: DesktopStreamReadyStatus | null = null;
private readonly readyListeners = new Set<(status: DesktopStreamReadyStatus) => void>();
private readonly trackListeners = new Set<(stream: MediaStream) => void>();
private readonly connectListeners = new Set<() => void>();
private readonly disconnectListeners = new Set<() => void>();
private readonly errorListeners = new Set<(error: DesktopStreamErrorStatus | Error) => void>();
private closedResolve!: () => void;
private readonly PeerConnection: typeof RTCPeerConnection;
private readonly rtcConfig: RTCConfiguration;
constructor(socket: WebSocket, options: DesktopStreamConnectOptions = {}) {
this.socket = socket;
this.PeerConnection = options.RTCPeerConnection ?? globalThis.RTCPeerConnection;
this.rtcConfig = options.rtcConfig ?? {};
this.closed = new Promise<void>((resolve) => {
this.closedResolve = resolve;
});
this.socket.addEventListener("message", (event) => {
this.handleMessage(event.data as string);
});
this.socket.addEventListener("error", () => {
this.emitError(new Error("Desktop stream signaling connection failed."));
});
this.socket.addEventListener("close", () => {
this.teardownPeerConnection();
this.closedResolve();
for (const listener of this.disconnectListeners) {
listener();
}
});
}
onReady(listener: (status: DesktopStreamReadyStatus) => void): () => void {
this.readyListeners.add(listener);
// Deliver cached status to late listeners (handles race where system/init
// arrives before onReady is called).
if (this.cachedReadyStatus) {
listener(this.cachedReadyStatus);
}
return () => {
this.readyListeners.delete(listener);
};
}
onTrack(listener: (stream: MediaStream) => void): () => void {
this.trackListeners.add(listener);
if (this.mediaStream) {
listener(this.mediaStream);
}
return () => {
this.trackListeners.delete(listener);
};
}
onConnect(listener: () => void): () => void {
this.connectListeners.add(listener);
return () => {
this.connectListeners.delete(listener);
};
}
onDisconnect(listener: () => void): () => void {
this.disconnectListeners.add(listener);
return () => {
this.disconnectListeners.delete(listener);
};
}
onError(listener: (error: DesktopStreamErrorStatus | Error) => void): () => void {
this.errorListeners.add(listener);
return () => {
this.errorListeners.delete(listener);
};
}
/** @deprecated Use onDisconnect instead. */
onClose(listener: () => void): () => void {
return this.onDisconnect(listener);
}
/** @deprecated No longer emits JPEG frames. Use onTrack for WebRTC media. */
onFrame(_listener: (frame: Uint8Array) => void): () => void {
return () => {};
}
getMediaStream(): MediaStream | null {
return this.mediaStream;
}
/** Build a neko data channel message with the 3-byte header (event + length). */
private buildNekoMsg(event: number, payloadSize: number): { buf: ArrayBuffer; view: DataView } {
const totalLen = 3 + payloadSize; // 1 byte event + 2 bytes length + payload
const buf = new ArrayBuffer(totalLen);
const view = new DataView(buf);
view.setUint8(0, event);
view.setUint16(1, payloadSize, false);
return { buf, view };
}
moveMouse(x: number, y: number): void {
// Move payload: X(uint16) + Y(uint16) = 4 bytes
const { buf, view } = this.buildNekoMsg(NEKO_OP_MOVE, 4);
view.setUint16(3, x, false);
view.setUint16(5, y, false);
this.sendDataChannel(buf);
}
mouseDown(button?: DesktopMouseButton, x?: number, y?: number): void {
if (x != null && y != null) {
this.moveMouse(x, y);
}
// Button payload: Key(uint32) = 4 bytes
const { buf, view } = this.buildNekoMsg(NEKO_OP_BTN_DOWN, 4);
view.setUint32(3, mouseButtonToX11(button), false);
this.sendDataChannel(buf);
}
mouseUp(button?: DesktopMouseButton, x?: number, y?: number): void {
if (x != null && y != null) {
this.moveMouse(x, y);
}
const { buf, view } = this.buildNekoMsg(NEKO_OP_BTN_UP, 4);
view.setUint32(3, mouseButtonToX11(button), false);
this.sendDataChannel(buf);
}
scroll(x: number, y: number, deltaX?: number, deltaY?: number): void {
this.moveMouse(x, y);
// Scroll payload: DeltaX(int16) + DeltaY(int16) + ControlKey(uint8) = 5 bytes
const { buf, view } = this.buildNekoMsg(NEKO_OP_SCROLL, 5);
view.setInt16(3, deltaX ?? 0, false);
view.setInt16(5, deltaY ?? 0, false);
view.setUint8(7, 0); // controlKey = false
this.sendDataChannel(buf);
}
keyDown(key: string): void {
const keysym = keyToX11Keysym(key);
if (keysym === 0) return;
// Key payload: Key(uint32) = 4 bytes
const { buf, view } = this.buildNekoMsg(NEKO_OP_KEY_DOWN, 4);
view.setUint32(3, keysym, false);
this.sendDataChannel(buf);
}
keyUp(key: string): void {
const keysym = keyToX11Keysym(key);
if (keysym === 0) return;
const { buf, view } = this.buildNekoMsg(NEKO_OP_KEY_UP, 4);
view.setUint32(3, keysym, false);
this.sendDataChannel(buf);
}
close(): void {
this.teardownPeerConnection();
if (this.socket.readyState !== WS_READY_STATE_CLOSED) {
this.socket.close();
}
}
private handleMessage(data: string): void {
let msg: Record<string, unknown>;
try {
msg = JSON.parse(data) as Record<string, unknown>;
} catch {
return;
}
const event = (msg.event as string) ?? "";
// Neko uses "payload" for message data, not "data".
const payload = (msg.payload ?? msg.data) as Record<string, unknown> | undefined;
switch (event) {
case "system/init": {
const screenData = payload?.screen_size as Record<string, unknown> | undefined;
if (screenData) {
const status: DesktopStreamReadyStatus = {
type: "ready",
width: Number(screenData.width) || 0,
height: Number(screenData.height) || 0,
};
this.cachedReadyStatus = status;
for (const listener of this.readyListeners) {
listener(status);
}
}
// Request control so this session can send input.
this.sendSignaling("control/request", {});
// Request WebRTC stream from neko. The server will respond with
// signal/provide containing the SDP offer.
this.sendSignaling("signal/request", { video: {}, audio: {} });
break;
}
case "signal/provide":
case "signal/offer": {
if (payload?.sdp) {
void this.handleNekoOffer(payload);
}
break;
}
case "signal/restart": {
// Server-initiated renegotiation (treated as a new offer).
// Ref: https://github.com/demodesk/neko-client/blob/37f93ea/src/component/internal/messages.ts#L190-L192
if (payload?.sdp) {
void this.handleNekoOffer(payload);
}
break;
}
case "signal/candidate": {
if (payload) {
void this.handleNekoCandidate(payload);
}
break;
}
case "signal/close": {
// Server is closing the WebRTC connection.
this.teardownPeerConnection();
break;
}
case "system/disconnect": {
const message = (payload as Record<string, unknown>)?.message as string | undefined;
this.emitError(new Error(message ?? "Server disconnected."));
this.close();
break;
}
default:
break;
}
}
private async handleNekoOffer(data: Record<string, unknown>): Promise<void> {
try {
const iceServers: RTCIceServer[] = [];
const nekoIce = (data.iceservers ?? data.ice) as Array<Record<string, unknown>> | undefined;
if (nekoIce) {
for (const server of nekoIce) {
if (server.urls) {
iceServers.push(server as unknown as RTCIceServer);
}
}
}
if (iceServers.length === 0) {
iceServers.push({ urls: "stun:stun.l.google.com:19302" });
}
const config: RTCConfiguration = { ...this.rtcConfig, iceServers };
const pc = new this.PeerConnection(config);
this.pc = pc;
pc.ontrack = (event) => {
const stream = event.streams[0] ?? new MediaStream([event.track]);
this.mediaStream = stream;
for (const listener of this.trackListeners) {
listener(stream);
}
};
pc.onicecandidate = (event) => {
if (event.candidate) {
this.sendSignaling("signal/candidate", event.candidate.toJSON());
}
};
// Ref: https://github.com/demodesk/neko-client/blob/37f93ea/src/component/internal/webrtc.ts#L123-L173
pc.onconnectionstatechange = () => {
switch (pc.connectionState) {
case "connected":
if (!this.connected) {
this.connected = true;
for (const listener of this.connectListeners) {
listener();
}
}
break;
case "closed":
case "failed":
this.emitError(new Error(`WebRTC connection ${pc.connectionState}.`));
break;
}
};
pc.oniceconnectionstatechange = () => {
switch (pc.iceConnectionState) {
case "connected":
if (!this.connected) {
this.connected = true;
for (const listener of this.connectListeners) {
listener();
}
}
break;
case "closed":
case "failed":
this.emitError(new Error(`WebRTC ICE ${pc.iceConnectionState}.`));
break;
}
};
// Neko v3 creates data channels on the server side.
// Ref: https://github.com/demodesk/neko-client/blob/37f93ea/src/component/internal/webrtc.ts#L477-L486
pc.ondatachannel = (event) => {
this.dataChannel = event.channel;
this.dataChannel.binaryType = "arraybuffer";
this.dataChannel.onerror = () => {
this.emitError(new Error("WebRTC data channel error."));
};
this.dataChannel.onclose = () => {
this.dataChannel = null;
};
};
const sdp = data.sdp as string;
await pc.setRemoteDescription({ type: "offer", sdp });
// Flush any ICE candidates that arrived before the PC was ready.
for (const pending of this.pendingCandidates) {
try {
await pc.addIceCandidate(pending as unknown as RTCIceCandidateInit);
} catch {
// ignore stale candidates
}
}
this.pendingCandidates = [];
const answer = await pc.createAnswer();
// Enable stereo audio for Chromium.
// Ref: https://github.com/demodesk/neko-client/blob/37f93ea/src/component/internal/webrtc.ts#L262
if (answer.sdp) {
answer.sdp = answer.sdp.replace(/(stereo=1;)?useinbandfec=1/, "useinbandfec=1;stereo=1");
}
await pc.setLocalDescription(answer);
this.sendSignaling("signal/answer", { sdp: answer.sdp });
} catch (error) {
this.emitError(error instanceof Error ? error : new Error(String(error)));
}
}
private async handleNekoCandidate(data: Record<string, unknown>): Promise<void> {
// Buffer candidates that arrive before the peer connection is created.
if (!this.pc) {
this.pendingCandidates.push(data);
return;
}
try {
const candidate = data as unknown as RTCIceCandidateInit;
await this.pc.addIceCandidate(candidate);
} catch (error) {
this.emitError(error instanceof Error ? error : new Error(String(error)));
}
}
private sendSignaling(event: string, payload: unknown): void {
if (this.socket.readyState !== WS_READY_STATE_OPEN) return;
this.socket.send(JSON.stringify({ event, payload }));
}
private sendDataChannel(buf: ArrayBuffer): void {
if (this.dataChannel && this.dataChannel.readyState === "open") {
this.dataChannel.send(buf);
}
}
/** Tear down the peer connection, nullifying handlers first to prevent stale
* callbacks. Matches the reference disconnect() pattern.
* Ref: https://github.com/demodesk/neko-client/blob/37f93ea/src/component/internal/webrtc.ts#L321-L363 */
private teardownPeerConnection(): void {
if (this.dataChannel) {
this.dataChannel.onerror = null;
this.dataChannel.onmessage = null;
this.dataChannel.onopen = null;
this.dataChannel.onclose = null;
try {
this.dataChannel.close();
} catch {
/* ignore */
}
this.dataChannel = null;
}
if (this.pc) {
this.pc.onicecandidate = null;
this.pc.onicecandidateerror = null;
this.pc.onconnectionstatechange = null;
this.pc.oniceconnectionstatechange = null;
this.pc.onsignalingstatechange = null;
this.pc.onnegotiationneeded = null;
this.pc.ontrack = null;
this.pc.ondatachannel = null;
try {
this.pc.close();
} catch {
/* ignore */
}
this.pc = null;
}
this.mediaStream = null;
this.connected = false;
}
private emitError(error: DesktopStreamErrorStatus | Error): void {
for (const listener of this.errorListeners) {
listener(error);
}
}
}
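The stereo-audio SDP rewrite performed in `handleNekoOffer` before `setLocalDescription` is a pure string transform, so it can be sketched in isolation. The fmtp lines below are illustrative samples, not captured from a real negotiation:

```typescript
// Enable stereo Opus in a local SDP answer. If the answer already carries
// "stereo=1;" immediately before "useinbandfec=1", the optional capture group
// consumes it so the pair is rewritten to the normalized order.
function enableStereo(sdp: string): string {
  return sdp.replace(/(stereo=1;)?useinbandfec=1/, "useinbandfec=1;stereo=1");
}
```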

File diff suppressed because it is too large
@@ -14,10 +14,18 @@ export {
export { AcpRpcError } from "acp-http-client";
export { buildInspectorUrl } from "./inspector.ts";
export { DesktopStreamSession } from "./desktop-stream.ts";
export type {
DesktopStreamConnectOptions,
DesktopStreamErrorStatus,
DesktopStreamReadyStatus,
DesktopStreamStatusMessage,
} from "./desktop-stream.ts";
export type {
SandboxAgentHealthWaitOptions,
AgentQueryOptions,
DesktopStreamSessionOptions,
ProcessLogFollowQuery,
ProcessLogListener,
ProcessLogSubscription,
@@ -50,6 +58,37 @@ export type {
AgentInstallRequest,
AgentInstallResponse,
AgentListResponse,
DesktopActionResponse,
DesktopDisplayInfoResponse,
DesktopErrorInfo,
DesktopKeyboardDownRequest,
DesktopKeyboardUpRequest,
DesktopKeyModifiers,
DesktopKeyboardPressRequest,
DesktopKeyboardTypeRequest,
DesktopMouseButton,
DesktopMouseClickRequest,
DesktopMouseDownRequest,
DesktopMouseDragRequest,
DesktopMouseMoveRequest,
DesktopMousePositionResponse,
DesktopMouseScrollRequest,
DesktopMouseUpRequest,
DesktopProcessInfo,
DesktopRecordingInfo,
DesktopRecordingListResponse,
DesktopRecordingStartRequest,
DesktopRecordingStatus,
DesktopRegionScreenshotQuery,
DesktopResolution,
DesktopScreenshotFormat,
DesktopScreenshotQuery,
DesktopStartRequest,
DesktopState,
DesktopStatusResponse,
DesktopStreamStatusResponse,
DesktopWindowInfo,
DesktopWindowListResponse,
FsActionResponse,
FsDeleteQuery,
FsEntriesQuery,
@@ -74,10 +113,12 @@ export type {
ProcessInfo,
ProcessInputRequest,
ProcessInputResponse,
ProcessListQuery,
ProcessListResponse,
ProcessLogEntry,
ProcessLogsQuery,
ProcessLogsResponse,
ProcessLogsStream,
ProcessOwner,
ProcessRunRequest,
ProcessRunResponse,

@@ -4,6 +4,48 @@ import type { components, operations } from "./generated/openapi.ts";
export type ProblemDetails = components["schemas"]["ProblemDetails"];
export type HealthResponse = JsonResponse<operations["get_v1_health"], 200>;
export type DesktopState = components["schemas"]["DesktopState"];
export type DesktopResolution = components["schemas"]["DesktopResolution"];
export type DesktopErrorInfo = components["schemas"]["DesktopErrorInfo"];
export type DesktopProcessInfo = components["schemas"]["DesktopProcessInfo"];
export type DesktopStatusResponse = JsonResponse<operations["get_v1_desktop_status"], 200>;
export type DesktopStartRequest = JsonRequestBody<operations["post_v1_desktop_start"]>;
export type DesktopScreenshotFormat = components["schemas"]["DesktopScreenshotFormat"];
export type DesktopScreenshotQuery =
QueryParams<operations["get_v1_desktop_screenshot"]> extends never ? Record<string, never> : QueryParams<operations["get_v1_desktop_screenshot"]>;
export type DesktopRegionScreenshotQuery = QueryParams<operations["get_v1_desktop_screenshot_region"]>;
export type DesktopMousePositionResponse = JsonResponse<operations["get_v1_desktop_mouse_position"], 200>;
export type DesktopMouseButton = components["schemas"]["DesktopMouseButton"];
export type DesktopMouseMoveRequest = JsonRequestBody<operations["post_v1_desktop_mouse_move"]>;
export type DesktopMouseClickRequest = JsonRequestBody<operations["post_v1_desktop_mouse_click"]>;
export type DesktopMouseDownRequest = JsonRequestBody<operations["post_v1_desktop_mouse_down"]>;
export type DesktopMouseUpRequest = JsonRequestBody<operations["post_v1_desktop_mouse_up"]>;
export type DesktopMouseDragRequest = JsonRequestBody<operations["post_v1_desktop_mouse_drag"]>;
export type DesktopMouseScrollRequest = JsonRequestBody<operations["post_v1_desktop_mouse_scroll"]>;
export type DesktopKeyboardTypeRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_type"]>;
export type DesktopKeyModifiers = components["schemas"]["DesktopKeyModifiers"];
export type DesktopKeyboardPressRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_press"]>;
export type DesktopKeyboardDownRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_down"]>;
export type DesktopKeyboardUpRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_up"]>;
export type DesktopActionResponse = JsonResponse<operations["post_v1_desktop_keyboard_type"], 200>;
export type DesktopDisplayInfoResponse = JsonResponse<operations["get_v1_desktop_display_info"], 200>;
export type DesktopWindowInfo = components["schemas"]["DesktopWindowInfo"];
export type DesktopWindowListResponse = JsonResponse<operations["get_v1_desktop_windows"], 200>;
export type DesktopRecordingStartRequest = JsonRequestBody<operations["post_v1_desktop_recording_start"]>;
export type DesktopRecordingStatus = components["schemas"]["DesktopRecordingStatus"];
export type DesktopRecordingInfo = JsonResponse<operations["post_v1_desktop_recording_start"], 200>;
export type DesktopRecordingListResponse = JsonResponse<operations["get_v1_desktop_recordings"], 200>;
export type DesktopStreamStatusResponse = JsonResponse<operations["post_v1_desktop_stream_start"], 200>;
export type DesktopClipboardResponse = JsonResponse<operations["get_v1_desktop_clipboard"], 200>;
export type DesktopClipboardQuery =
QueryParams<operations["get_v1_desktop_clipboard"]> extends never ? Record<string, never> : QueryParams<operations["get_v1_desktop_clipboard"]>;
export type DesktopClipboardWriteRequest = JsonRequestBody<operations["post_v1_desktop_clipboard"]>;
export type DesktopLaunchRequest = JsonRequestBody<operations["post_v1_desktop_launch"]>;
export type DesktopLaunchResponse = JsonResponse<operations["post_v1_desktop_launch"], 200>;
export type DesktopOpenRequest = JsonRequestBody<operations["post_v1_desktop_open"]>;
export type DesktopOpenResponse = JsonResponse<operations["post_v1_desktop_open"], 200>;
export type DesktopWindowMoveRequest = JsonRequestBody<operations["post_v1_desktop_window_move"]>;
export type DesktopWindowResizeRequest = JsonRequestBody<operations["post_v1_desktop_window_resize"]>;
export type AgentListResponse = JsonResponse<operations["get_v1_agents"], 200>;
export type AgentInfo = components["schemas"]["AgentInfo"];
export type AgentQuery = QueryParams<operations["get_v1_agents"]>;
@@ -37,11 +79,13 @@ export type ProcessCreateRequest = JsonRequestBody<operations["post_v1_processes
export type ProcessInfo = components["schemas"]["ProcessInfo"];
export type ProcessInputRequest = JsonRequestBody<operations["post_v1_process_input"]>;
export type ProcessInputResponse = JsonResponse<operations["post_v1_process_input"], 200>;
export type ProcessListQuery = QueryParams<operations["get_v1_processes"]>;
export type ProcessListResponse = JsonResponse<operations["get_v1_processes"], 200>;
export type ProcessLogEntry = components["schemas"]["ProcessLogEntry"];
export type ProcessLogsQuery = QueryParams<operations["get_v1_process_logs"]>;
export type ProcessLogsResponse = JsonResponse<operations["get_v1_process_logs"], 200>;
export type ProcessLogsStream = components["schemas"]["ProcessLogsStream"];
export type ProcessOwner = components["schemas"]["ProcessOwner"];
export type ProcessRunRequest = JsonRequestBody<operations["post_v1_processes_run"]>;
export type ProcessRunResponse = JsonResponse<operations["post_v1_processes_run"], 200>;
export type ProcessSignalQuery = QueryParams<operations["post_v1_process_stop"]>;

@@ -0,0 +1,244 @@
import { execFileSync } from "node:child_process";
import { mkdtempSync, mkdirSync, rmSync } from "node:fs";
import { dirname, join, resolve } from "node:path";
import { fileURLToPath } from "node:url";
const __dirname = dirname(fileURLToPath(import.meta.url));
const REPO_ROOT = resolve(__dirname, "../../../..");
const CONTAINER_PORT = 3000;
const DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";
const DEFAULT_IMAGE_TAG = "sandbox-agent-test:dev";
const STANDARD_PATHS = new Set(["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]);
let cachedImage: string | undefined;
let containerCounter = 0;
export type DockerSandboxAgentHandle = {
baseUrl: string;
token: string;
dispose: () => Promise<void>;
};
export type DockerSandboxAgentOptions = {
env?: Record<string, string>;
pathMode?: "merge" | "replace";
timeoutMs?: number;
};
type TestLayout = {
rootDir: string;
homeDir: string;
xdgDataHome: string;
xdgStateHome: string;
appDataDir: string;
localAppDataDir: string;
installDir: string;
};
export function createDockerTestLayout(): TestLayout {
const tempRoot = join(REPO_ROOT, ".context", "docker-test-");
mkdirSync(resolve(REPO_ROOT, ".context"), { recursive: true });
const rootDir = mkdtempSync(tempRoot);
const homeDir = join(rootDir, "home");
const xdgDataHome = join(rootDir, "xdg-data");
const xdgStateHome = join(rootDir, "xdg-state");
const appDataDir = join(rootDir, "appdata", "Roaming");
const localAppDataDir = join(rootDir, "appdata", "Local");
const installDir = join(xdgDataHome, "sandbox-agent", "bin");
for (const dir of [homeDir, xdgDataHome, xdgStateHome, appDataDir, localAppDataDir, installDir]) {
mkdirSync(dir, { recursive: true });
}
return {
rootDir,
homeDir,
xdgDataHome,
xdgStateHome,
appDataDir,
localAppDataDir,
installDir,
};
}
export function disposeDockerTestLayout(layout: TestLayout): void {
try {
rmSync(layout.rootDir, { recursive: true, force: true });
} catch (error) {
if (typeof process.getuid === "function" && typeof process.getgid === "function") {
try {
execFileSync(
"docker",
[
"run",
"--rm",
"--user",
"0:0",
"--entrypoint",
"sh",
"-v",
`${layout.rootDir}:${layout.rootDir}`,
ensureImage(),
"-c",
`chown -R ${process.getuid()}:${process.getgid()} '${layout.rootDir}'`,
],
{ stdio: "pipe" },
);
rmSync(layout.rootDir, { recursive: true, force: true });
return;
} catch {}
}
throw error;
}
}
export async function startDockerSandboxAgent(layout: TestLayout, options: DockerSandboxAgentOptions = {}): Promise<DockerSandboxAgentHandle> {
const image = ensureImage();
const containerId = uniqueContainerId();
const env = buildEnv(layout, options.env ?? {}, options.pathMode ?? "merge");
const mounts = buildMounts(layout.rootDir, env);
const args = ["run", "-d", "--rm", "--name", containerId, "-p", `127.0.0.1::${CONTAINER_PORT}`];
if (typeof process.getuid === "function" && typeof process.getgid === "function") {
args.push("--user", `${process.getuid()}:${process.getgid()}`);
}
if (process.platform === "linux") {
args.push("--add-host", "host.docker.internal:host-gateway");
}
for (const mount of mounts) {
args.push("-v", `${mount}:${mount}`);
}
for (const [key, value] of Object.entries(env)) {
args.push("-e", `${key}=${value}`);
}
args.push(image, "server", "--host", "0.0.0.0", "--port", String(CONTAINER_PORT), "--no-token");
execFileSync("docker", args, { stdio: "pipe" });
try {
const mapping = execFileSync("docker", ["port", containerId, `${CONTAINER_PORT}/tcp`], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
}).trim();
const mappingParts = mapping.split(":");
const hostPort = mappingParts[mappingParts.length - 1]?.trim();
if (!hostPort) {
throw new Error(`missing mapped host port in ${mapping}`);
}
const baseUrl = `http://127.0.0.1:${hostPort}`;
await waitForHealth(baseUrl, options.timeoutMs ?? 30_000);
return {
baseUrl,
token: "",
dispose: async () => {
try {
execFileSync("docker", ["rm", "-f", containerId], { stdio: "pipe" });
} catch {}
},
};
} catch (error) {
try {
execFileSync("docker", ["rm", "-f", containerId], { stdio: "pipe" });
} catch {}
throw error;
}
}
function ensureImage(): string {
if (cachedImage) {
return cachedImage;
}
cachedImage = process.env.SANDBOX_AGENT_TEST_IMAGE ?? DEFAULT_IMAGE_TAG;
execFileSync("docker", ["build", "--tag", cachedImage, "--file", resolve(REPO_ROOT, "docker/test-agent/Dockerfile"), REPO_ROOT], {
cwd: REPO_ROOT,
stdio: ["ignore", "ignore", "pipe"],
});
return cachedImage;
}
function buildEnv(layout: TestLayout, extraEnv: Record<string, string>, pathMode: "merge" | "replace"): Record<string, string> {
const env: Record<string, string> = {
HOME: layout.homeDir,
USERPROFILE: layout.homeDir,
XDG_DATA_HOME: layout.xdgDataHome,
XDG_STATE_HOME: layout.xdgStateHome,
APPDATA: layout.appDataDir,
LOCALAPPDATA: layout.localAppDataDir,
PATH: DEFAULT_PATH,
};
const customPathEntries = new Set<string>();
for (const entry of (extraEnv.PATH ?? "").split(":")) {
if (!entry || entry === DEFAULT_PATH || !entry.startsWith("/")) continue;
if (entry.startsWith(layout.rootDir)) {
customPathEntries.add(entry);
}
}
if (pathMode === "replace") {
env.PATH = extraEnv.PATH ?? "";
} else if (customPathEntries.size > 0) {
env.PATH = `${Array.from(customPathEntries).join(":")}:${DEFAULT_PATH}`;
}
for (const [key, value] of Object.entries(extraEnv)) {
if (key === "PATH") {
continue;
}
env[key] = rewriteLocalhostUrl(key, value);
}
return env;
}
function buildMounts(rootDir: string, env: Record<string, string>): string[] {
const mounts = new Set<string>([rootDir]);
for (const key of ["HOME", "USERPROFILE", "XDG_DATA_HOME", "XDG_STATE_HOME", "APPDATA", "LOCALAPPDATA", "SANDBOX_AGENT_DESKTOP_FAKE_STATE_DIR"]) {
const value = env[key];
if (value?.startsWith("/")) {
mounts.add(value);
}
}
for (const entry of (env.PATH ?? "").split(":")) {
if (entry.startsWith("/") && !STANDARD_PATHS.has(entry)) {
mounts.add(entry);
}
}
return Array.from(mounts);
}
async function waitForHealth(baseUrl: string, timeoutMs: number): Promise<void> {
const started = Date.now();
while (Date.now() - started < timeoutMs) {
try {
const response = await fetch(`${baseUrl}/v1/health`);
if (response.ok) {
return;
}
} catch {}
await new Promise((resolve) => setTimeout(resolve, 200));
}
throw new Error(`timed out waiting for sandbox-agent health at ${baseUrl}`);
}
function uniqueContainerId(): string {
containerCounter += 1;
return `sandbox-agent-ts-${process.pid}-${Date.now().toString(36)}-${containerCounter.toString(36)}`;
}
function rewriteLocalhostUrl(key: string, value: string): string {
if (key.endsWith("_URL") || key.endsWith("_URI")) {
return value.replace("http://127.0.0.1", "http://host.docker.internal").replace("http://localhost", "http://host.docker.internal");
}
return value;
}
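The localhost rewrite above only touches env vars whose key ends in `_URL` or `_URI`; everything else passes through untouched. A runnable restatement of the helper, so the rule can be checked in isolation:

```typescript
// Restatement of the rewriteLocalhostUrl helper above: rewrite loopback
// URLs so a container can reach host services via host.docker.internal.
// Only env keys ending in _URL or _URI are rewritten.
function rewriteLocalhostUrl(key: string, value: string): string {
  if (key.endsWith("_URL") || key.endsWith("_URI")) {
    return value
      .replace("http://127.0.0.1", "http://host.docker.internal")
      .replace("http://localhost", "http://host.docker.internal");
  }
  return value;
}

console.log(rewriteLocalhostUrl("BACKEND_URL", "http://127.0.0.1:3000/v1"));
// → http://host.docker.internal:3000/v1
```

Note the key-suffix guard: a token or path variable that happens to contain a loopback URL is deliberately left alone.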


@ -1,9 +1,6 @@
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { existsSync } from "node:fs";
import { mkdtempSync, rmSync } from "node:fs";
import { dirname, resolve } from "node:path";
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { mkdirSync, mkdtempSync, rmSync } from "node:fs";
import { join } from "node:path";
import { fileURLToPath } from "node:url";
import { tmpdir } from "node:os";
import {
InMemorySessionPersistDriver,
@ -14,36 +11,11 @@ import {
type SessionPersistDriver,
type SessionRecord,
} from "../src/index.ts";
import { spawnSandboxAgent, isNodeRuntime, type SandboxAgentSpawnHandle } from "../src/spawn.ts";
import { isNodeRuntime } from "../src/spawn.ts";
import { createDockerTestLayout, disposeDockerTestLayout, startDockerSandboxAgent, type DockerSandboxAgentHandle } from "./helpers/docker.ts";
import { prepareMockAgentDataHome } from "./helpers/mock-agent.ts";
import WebSocket from "ws";
const __dirname = dirname(fileURLToPath(import.meta.url));
function findBinary(): string | null {
if (process.env.SANDBOX_AGENT_BIN) {
return process.env.SANDBOX_AGENT_BIN;
}
const cargoPaths = [resolve(__dirname, "../../../target/debug/sandbox-agent"), resolve(__dirname, "../../../target/release/sandbox-agent")];
for (const p of cargoPaths) {
if (existsSync(p)) {
return p;
}
}
return null;
}
const BINARY_PATH = findBinary();
if (!BINARY_PATH) {
throw new Error("sandbox-agent binary not found. Build it (cargo build -p sandbox-agent) or set SANDBOX_AGENT_BIN.");
}
if (!process.env.SANDBOX_AGENT_BIN) {
process.env.SANDBOX_AGENT_BIN = BINARY_PATH;
}
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
@ -110,6 +82,15 @@ async function waitForAsync<T>(fn: () => Promise<T | undefined | null>, timeoutM
throw new Error("timed out waiting for condition");
}
async function withTimeout<T>(promise: Promise<T>, label: string, timeoutMs = 15_000): Promise<T> {
return await Promise.race([
promise,
sleep(timeoutMs).then(() => {
throw new Error(`${label} timed out after ${timeoutMs}ms`);
}),
]);
}
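The `withTimeout` helper above is a plain `Promise.race`: whichever branch settles first wins, and the timer branch settles by throwing. Restated so the pattern runs on its own:

```typescript
// Sketch of the withTimeout pattern used by the tests above: race the real
// promise against a timer that rejects with a labeled error.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withTimeout<T>(promise: Promise<T>, label: string, timeoutMs = 15_000): Promise<T> {
  return await Promise.race([
    promise,
    sleep(timeoutMs).then(() => {
      // Reached only if the timer wins the race.
      throw new Error(`${label} timed out after ${timeoutMs}ms`);
    }),
  ]);
}
```

One caveat of this shape: the losing timer is never cancelled, so a pending `setTimeout` keeps the process alive until it fires unless the runtime exits first.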
function buildTarArchive(entries: Array<{ name: string; content: string }>): Uint8Array {
const blocks: Buffer[] = [];
@ -174,34 +155,77 @@ function decodeProcessLogData(data: string, encoding: string): string {
function nodeCommand(source: string): { command: string; args: string[] } {
return {
command: process.execPath,
command: "node",
args: ["-e", source],
};
}
function forwardRequest(defaultFetch: typeof fetch, baseUrl: string, outgoing: Request, parsed: URL): Promise<Response> {
const forwardedInit: RequestInit & { duplex?: "half" } = {
method: outgoing.method,
headers: new Headers(outgoing.headers),
signal: outgoing.signal,
};
if (outgoing.method !== "GET" && outgoing.method !== "HEAD") {
forwardedInit.body = outgoing.body;
forwardedInit.duplex = "half";
}
const forwardedUrl = new URL(`${parsed.pathname}${parsed.search}`, baseUrl);
return defaultFetch(forwardedUrl, forwardedInit);
}
async function launchDesktopFocusWindow(sdk: SandboxAgent, display: string): Promise<string> {
const windowProcess = await sdk.createProcess({
command: "xterm",
args: ["-geometry", "80x24+40+40", "-title", "Sandbox Desktop Test", "-e", "sh", "-lc", "sleep 60"],
env: { DISPLAY: display },
});
await waitForAsync(
async () => {
const result = await sdk.runProcess({
command: "sh",
args: [
"-lc",
'wid="$(xdotool search --onlyvisible --name \'Sandbox Desktop Test\' 2>/dev/null | head -n 1 || true)"; if [ -z "$wid" ]; then exit 3; fi; xdotool windowactivate "$wid"',
],
env: { DISPLAY: display },
timeoutMs: 5_000,
});
return result.exitCode === 0 ? true : undefined;
},
10_000,
200,
);
return windowProcess.id;
}
describe("Integration: TypeScript SDK flat session API", () => {
let handle: SandboxAgentSpawnHandle;
let handle: DockerSandboxAgentHandle;
let baseUrl: string;
let token: string;
let dataHome: string;
let layout: ReturnType<typeof createDockerTestLayout>;
beforeAll(async () => {
dataHome = mkdtempSync(join(tmpdir(), "sdk-integration-"));
const agentEnv = prepareMockAgentDataHome(dataHome);
beforeEach(async () => {
layout = createDockerTestLayout();
prepareMockAgentDataHome(layout.xdgDataHome);
handle = await spawnSandboxAgent({
enabled: true,
log: "silent",
handle = await startDockerSandboxAgent(layout, {
timeoutMs: 30000,
env: agentEnv,
});
baseUrl = handle.baseUrl;
token = handle.token;
});
afterAll(async () => {
await handle.dispose();
rmSync(dataHome, { recursive: true, force: true });
afterEach(async () => {
await handle?.dispose?.();
if (layout) {
disposeDockerTestLayout(layout);
}
});
it("detects Node.js runtime", () => {
@ -280,11 +304,12 @@ describe("Integration: TypeScript SDK flat session API", () => {
token,
});
const directory = mkdtempSync(join(tmpdir(), "sdk-fs-"));
const directory = join(layout.rootDir, "fs-test");
const nestedDir = join(directory, "nested");
const filePath = join(directory, "notes.txt");
const movedPath = join(directory, "notes-moved.txt");
const uploadDir = join(directory, "uploaded");
mkdirSync(directory, { recursive: true });
try {
const listedAgents = await sdk.listAgents({ config: true, noCache: true });
@ -341,25 +366,30 @@ describe("Integration: TypeScript SDK flat session API", () => {
const parsed = new URL(outgoing.url);
seenPaths.push(parsed.pathname);
const forwardedUrl = new URL(`${parsed.pathname}${parsed.search}`, baseUrl);
const forwarded = new Request(forwardedUrl.toString(), outgoing);
return defaultFetch(forwarded);
return forwardRequest(defaultFetch, baseUrl, outgoing, parsed);
};
const sdk = await SandboxAgent.connect({
token,
fetch: customFetch,
});
let sessionId: string | undefined;
await sdk.getHealth();
const session = await sdk.createSession({ agent: "mock" });
const prompt = await session.prompt([{ type: "text", text: "custom fetch integration test" }]);
expect(prompt.stopReason).toBe("end_turn");
try {
await withTimeout(sdk.getHealth(), "custom fetch getHealth");
const session = await withTimeout(sdk.createSession({ agent: "mock" }), "custom fetch createSession");
sessionId = session.id;
expect(session.agent).toBe("mock");
await withTimeout(sdk.destroySession(session.id), "custom fetch destroySession");
expect(seenPaths).toContain("/v1/health");
expect(seenPaths.some((path) => path.startsWith("/v1/acp/"))).toBe(true);
await sdk.dispose();
expect(seenPaths).toContain("/v1/health");
expect(seenPaths.some((path) => path.startsWith("/v1/acp/"))).toBe(true);
} finally {
if (sessionId) {
await sdk.destroySession(sessionId).catch(() => {});
}
await withTimeout(sdk.dispose(), "custom fetch dispose");
}
}, 60_000);
it("requires baseUrl when fetch is not provided", async () => {
@ -386,9 +416,7 @@ describe("Integration: TypeScript SDK flat session API", () => {
}
}
const forwardedUrl = new URL(`${parsed.pathname}${parsed.search}`, baseUrl);
const forwarded = new Request(forwardedUrl.toString(), outgoing);
return defaultFetch(forwarded);
return forwardRequest(defaultFetch, baseUrl, outgoing, parsed);
};
const sdk = await SandboxAgent.connect({
@ -710,7 +738,9 @@ describe("Integration: TypeScript SDK flat session API", () => {
token,
});
const directory = mkdtempSync(join(tmpdir(), "sdk-config-"));
const directory = join(layout.rootDir, "config-test");
mkdirSync(directory, { recursive: true });
const mcpConfig = {
type: "local" as const,
@ -957,4 +987,98 @@ describe("Integration: TypeScript SDK flat session API", () => {
await sdk.dispose();
}
});
it("covers desktop status, screenshot, display, mouse, and keyboard helpers", async () => {
const sdk = await SandboxAgent.connect({
baseUrl,
token,
});
let focusWindowProcessId: string | undefined;
try {
const initialStatus = await sdk.getDesktopStatus();
expect(initialStatus.state).toBe("inactive");
const started = await sdk.startDesktop({
width: 1440,
height: 900,
dpi: 96,
});
expect(started.state).toBe("active");
expect(started.display?.startsWith(":")).toBe(true);
expect(started.missingDependencies).toEqual([]);
const displayInfo = await sdk.getDesktopDisplayInfo();
expect(displayInfo.display).toBe(started.display);
expect(displayInfo.resolution.width).toBe(1440);
expect(displayInfo.resolution.height).toBe(900);
const screenshot = await sdk.takeDesktopScreenshot();
expect(Buffer.from(screenshot.subarray(0, 8)).equals(Buffer.from("\x89PNG\r\n\x1a\n", "binary"))).toBe(true);
const region = await sdk.takeDesktopRegionScreenshot({
x: 10,
y: 20,
width: 40,
height: 50,
});
expect(Buffer.from(region.subarray(0, 8)).equals(Buffer.from("\x89PNG\r\n\x1a\n", "binary"))).toBe(true);
const moved = await sdk.moveDesktopMouse({ x: 40, y: 50 });
expect(moved.x).toBe(40);
expect(moved.y).toBe(50);
const dragged = await sdk.dragDesktopMouse({
startX: 40,
startY: 50,
endX: 80,
endY: 90,
button: "left",
});
expect(dragged.x).toBe(80);
expect(dragged.y).toBe(90);
const clicked = await sdk.clickDesktop({
x: 80,
y: 90,
button: "left",
clickCount: 1,
});
expect(clicked.x).toBe(80);
expect(clicked.y).toBe(90);
const scrolled = await sdk.scrollDesktop({
x: 80,
y: 90,
deltaY: -2,
});
expect(scrolled.x).toBe(80);
expect(scrolled.y).toBe(90);
const position = await sdk.getDesktopMousePosition();
expect(position.x).toBe(80);
expect(position.y).toBe(90);
focusWindowProcessId = await launchDesktopFocusWindow(sdk, started.display!);
const typed = await sdk.typeDesktopText({
text: "hello desktop",
delayMs: 5,
});
expect(typed.ok).toBe(true);
const pressed = await sdk.pressDesktopKey({ key: "ctrl+l" });
expect(pressed.ok).toBe(true);
const stopped = await sdk.stopDesktop();
expect(stopped.state).toBe("inactive");
} finally {
if (focusWindowProcessId) {
await sdk.killProcess(focusWindowProcessId, { waitMs: 5_000 }).catch(() => {});
await sdk.deleteProcess(focusWindowProcessId).catch(() => {});
}
await sdk.stopDesktop().catch(() => {});
await sdk.dispose();
}
});
});
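The screenshot assertions above compare the first eight bytes of the response against the PNG signature. A minimal standalone version of that check:

```typescript
// The fixed 8-byte PNG signature (\x89 P N G \r \n \x1a \n) that every
// valid PNG file starts with, as asserted in the desktop screenshot test.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function isPng(data: Uint8Array): boolean {
  return data.length >= 8 && Buffer.from(data.subarray(0, 8)).equals(PNG_SIGNATURE);
}
```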


@ -4,7 +4,6 @@ export default defineConfig({
test: {
include: ["tests/**/*.test.ts"],
testTimeout: 30000,
teardownTimeout: 10000,
pool: "forks",
hookTimeout: 120000,
},
});

server/compose.dev.yaml Normal file

@ -0,0 +1,42 @@
name: sandbox-agent-dev
services:
backend:
build:
context: ..
dockerfile: docker/test-agent/Dockerfile
image: sandbox-agent-dev
command: ["server", "--host", "0.0.0.0", "--port", "3000", "--no-token"]
environment:
RUST_LOG: "${RUST_LOG:-info}"
ports:
- "2468:3000"
# UDP port range for WebRTC media transport (neko)
- "59050-59070:59050-59070/udp"
frontend:
build:
context: ..
dockerfile: docker/inspector-dev/Dockerfile
working_dir: /app
depends_on:
- backend
environment:
SANDBOX_AGENT_URL: "http://backend:3000"
ports:
- "5173:5173"
volumes:
- "..:/app"
# Keep Linux-native node_modules inside the container.
- "sa_root_node_modules:/app/node_modules"
- "sa_inspector_node_modules:/app/frontend/packages/inspector/node_modules"
- "sa_react_node_modules:/app/sdks/react/node_modules"
- "sa_typescript_node_modules:/app/sdks/typescript/node_modules"
- "sa_pnpm_store:/root/.local/share/pnpm/store"
volumes:
sa_root_node_modules: {}
sa_inspector_node_modules: {}
sa_react_node_modules: {}
sa_typescript_node_modules: {}
sa_pnpm_store: {}


@ -41,6 +41,7 @@ base64.workspace = true
toml_edit.workspace = true
tar.workspace = true
zip.workspace = true
tokio-tungstenite = "0.24"
tempfile = { workspace = true, optional = true }
[target.'cfg(unix)'.dependencies]


@ -11,6 +11,7 @@ mod build_version {
include!(concat!(env!("OUT_DIR"), "/version.rs"));
}
use crate::desktop_install::{install_desktop, DesktopInstallRequest, DesktopPackageManager};
use crate::router::{
build_router_with_state, shutdown_servers, AppState, AuthConfig, BrandingMode,
};
@ -75,6 +76,8 @@ pub enum Command {
Server(ServerArgs),
/// Call the HTTP API without writing client code.
Api(ApiArgs),
/// Install first-party runtime dependencies.
Install(InstallArgs),
/// EXPERIMENTAL: OpenCode compatibility layer (disabled until ACP Phase 7).
Opencode(OpencodeArgs),
/// Manage the sandbox-agent background daemon.
@ -118,6 +121,12 @@ pub struct ApiArgs {
command: ApiCommand,
}
#[derive(Args, Debug)]
pub struct InstallArgs {
#[command(subcommand)]
command: InstallCommand,
}
#[derive(Args, Debug)]
pub struct OpencodeArgs {
#[arg(long, short = 'H', default_value = DEFAULT_HOST)]
@ -156,6 +165,12 @@ pub struct DaemonArgs {
command: DaemonCommand,
}
#[derive(Subcommand, Debug)]
pub enum InstallCommand {
/// Install desktop runtime dependencies.
Desktop(InstallDesktopArgs),
}
#[derive(Subcommand, Debug)]
pub enum DaemonCommand {
/// Start the daemon in the background.
@ -310,6 +325,18 @@ pub struct InstallAgentArgs {
agent_process_version: Option<String>,
}
#[derive(Args, Debug)]
pub struct InstallDesktopArgs {
#[arg(long, default_value_t = false)]
yes: bool,
#[arg(long, default_value_t = false)]
print_only: bool,
#[arg(long, value_enum)]
package_manager: Option<DesktopPackageManager>,
#[arg(long, default_value_t = false)]
no_fonts: bool,
}
#[derive(Args, Debug)]
pub struct CredentialsExtractArgs {
#[arg(long, short = 'a', value_enum)]
@ -405,6 +432,7 @@ pub fn run_command(command: &Command, cli: &CliConfig) -> Result<(), CliError> {
match command {
Command::Server(args) => run_server(cli, args),
Command::Api(subcommand) => run_api(&subcommand.command, cli),
Command::Install(subcommand) => run_install(&subcommand.command),
Command::Opencode(args) => run_opencode(cli, args),
Command::Daemon(subcommand) => run_daemon(&subcommand.command, cli),
Command::InstallAgent(args) => install_agent_local(args),
@ -413,6 +441,12 @@ pub fn run_command(command: &Command, cli: &CliConfig) -> Result<(), CliError> {
}
}
fn run_install(command: &InstallCommand) -> Result<(), CliError> {
match command {
InstallCommand::Desktop(args) => install_desktop_local(args),
}
}
fn run_server(cli: &CliConfig, server: &ServerArgs) -> Result<(), CliError> {
let auth = if let Some(token) = cli.token.clone() {
AuthConfig::with_token(token)
@ -477,6 +511,17 @@ fn run_api(command: &ApiCommand, cli: &CliConfig) -> Result<(), CliError> {
}
}
fn install_desktop_local(args: &InstallDesktopArgs) -> Result<(), CliError> {
install_desktop(DesktopInstallRequest {
yes: args.yes,
print_only: args.print_only,
package_manager: args.package_manager,
no_fonts: args.no_fonts,
})
.map(|_| ())
.map_err(CliError::Server)
}
fn run_agents(command: &AgentsCommand, cli: &CliConfig) -> Result<(), CliError> {
match command {
AgentsCommand::List(args) => {


@ -0,0 +1,251 @@
use sandbox_agent_error::ProblemDetails;
use serde_json::{json, Map, Value};
use crate::desktop_types::{DesktopErrorInfo, DesktopProcessInfo};
#[derive(Debug, Clone)]
pub struct DesktopProblem {
status: u16,
title: &'static str,
code: &'static str,
message: String,
missing_dependencies: Vec<String>,
install_command: Option<String>,
processes: Vec<DesktopProcessInfo>,
}
impl DesktopProblem {
pub fn unsupported_platform(message: impl Into<String>) -> Self {
Self::new(
501,
"Desktop Unsupported",
"desktop_unsupported_platform",
message,
)
}
pub fn dependencies_missing(
missing_dependencies: Vec<String>,
install_command: Option<String>,
processes: Vec<DesktopProcessInfo>,
) -> Self {
let mut message = if missing_dependencies.is_empty() {
"Desktop dependencies are not installed".to_string()
} else {
format!(
"Desktop dependencies are not installed: {}",
missing_dependencies.join(", ")
)
};
if let Some(command) = install_command.as_ref() {
message.push_str(&format!(
". Run `{command}` to install them, or install the required tools manually."
));
}
Self::new(
503,
"Desktop Dependencies Missing",
"desktop_dependencies_missing",
message,
)
.with_missing_dependencies(missing_dependencies)
.with_install_command(install_command)
.with_processes(processes)
}
pub fn runtime_inactive(message: impl Into<String>) -> Self {
Self::new(
409,
"Desktop Runtime Inactive",
"desktop_runtime_inactive",
message,
)
}
pub fn runtime_starting(message: impl Into<String>) -> Self {
Self::new(
409,
"Desktop Runtime Starting",
"desktop_runtime_starting",
message,
)
}
pub fn runtime_failed(
message: impl Into<String>,
install_command: Option<String>,
processes: Vec<DesktopProcessInfo>,
) -> Self {
Self::new(
503,
"Desktop Runtime Failed",
"desktop_runtime_failed",
message,
)
.with_install_command(install_command)
.with_processes(processes)
}
pub fn invalid_action(message: impl Into<String>) -> Self {
Self::new(
400,
"Desktop Invalid Action",
"desktop_invalid_action",
message,
)
}
pub fn screenshot_failed(
message: impl Into<String>,
processes: Vec<DesktopProcessInfo>,
) -> Self {
Self::new(
502,
"Desktop Screenshot Failed",
"desktop_screenshot_failed",
message,
)
.with_processes(processes)
}
pub fn input_failed(message: impl Into<String>, processes: Vec<DesktopProcessInfo>) -> Self {
Self::new(502, "Desktop Input Failed", "desktop_input_failed", message)
.with_processes(processes)
}
pub fn window_not_found(message: impl Into<String>) -> Self {
Self::new(404, "Window Not Found", "window_not_found", message)
}
pub fn no_focused_window() -> Self {
Self::new(
404,
"No Focused Window",
"no_focused_window",
"No window currently has focus",
)
}
pub fn stream_already_active(message: impl Into<String>) -> Self {
Self::new(
409,
"Stream Already Active",
"stream_already_active",
message,
)
}
pub fn stream_not_active(message: impl Into<String>) -> Self {
Self::new(409, "Stream Not Active", "stream_not_active", message)
}
pub fn clipboard_failed(message: impl Into<String>) -> Self {
Self::new(500, "Clipboard Failed", "clipboard_failed", message)
}
pub fn app_not_found(message: impl Into<String>) -> Self {
Self::new(404, "App Not Found", "app_not_found", message)
}
pub fn to_problem_details(&self) -> ProblemDetails {
let mut extensions = Map::new();
extensions.insert("code".to_string(), Value::String(self.code.to_string()));
if !self.missing_dependencies.is_empty() {
extensions.insert(
"missingDependencies".to_string(),
Value::Array(
self.missing_dependencies
.iter()
.cloned()
.map(Value::String)
.collect(),
),
);
}
if let Some(install_command) = self.install_command.as_ref() {
extensions.insert(
"installCommand".to_string(),
Value::String(install_command.clone()),
);
}
if !self.processes.is_empty() {
extensions.insert("processes".to_string(), json!(self.processes));
}
ProblemDetails {
type_: format!("urn:sandbox-agent:error:{}", self.code),
title: self.title.to_string(),
status: self.status,
detail: Some(self.message.clone()),
instance: None,
extensions,
}
}
pub fn to_error_info(&self) -> DesktopErrorInfo {
DesktopErrorInfo {
code: self.code.to_string(),
message: self.message.clone(),
}
}
pub fn code(&self) -> &'static str {
self.code
}
fn new(
status: u16,
title: &'static str,
code: &'static str,
message: impl Into<String>,
) -> Self {
Self {
status,
title,
code,
message: message.into(),
missing_dependencies: Vec::new(),
install_command: None,
processes: Vec::new(),
}
}
fn with_missing_dependencies(mut self, missing_dependencies: Vec<String>) -> Self {
self.missing_dependencies = missing_dependencies;
self
}
fn with_install_command(mut self, install_command: Option<String>) -> Self {
self.install_command = install_command;
self
}
fn with_processes(mut self, processes: Vec<DesktopProcessInfo>) -> Self {
self.processes = processes;
self
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn dependencies_missing_detail_includes_install_command() {
let problem = DesktopProblem::dependencies_missing(
vec!["Xvfb".to_string(), "openbox".to_string()],
Some("sandbox-agent install desktop --yes".to_string()),
Vec::new(),
);
let details = problem.to_problem_details();
let detail = details.detail.expect("detail");
assert!(detail.contains("Desktop dependencies are not installed: Xvfb, openbox"));
assert!(detail.contains("sandbox-agent install desktop --yes"));
assert_eq!(
details.extensions.get("installCommand"),
Some(&Value::String(
"sandbox-agent install desktop --yes".to_string()
))
);
}
}
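`DesktopProblem::to_problem_details` above serializes into an RFC 7807 problem+json body with a `code` extension plus optional `missingDependencies` and `installCommand`. A hypothetical client-side consumer might branch on those fields like this (field names mirror that serialization; the handler itself is illustrative, not part of this PR):

```typescript
// Hypothetical consumer of the problem+json shape emitted by
// DesktopProblem::to_problem_details. The interface mirrors the Rust
// serialization; describeDesktopProblem is an illustrative sketch.
interface DesktopProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  code?: string;
  missingDependencies?: string[];
  installCommand?: string;
}

function describeDesktopProblem(problem: DesktopProblemDetails): string {
  if (problem.code === "desktop_dependencies_missing" && problem.installCommand) {
    // Surface the actionable install command to the user.
    return `${problem.title}: run \`${problem.installCommand}\``;
  }
  return problem.detail ?? problem.title;
}
```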


@ -0,0 +1,324 @@
use std::fmt;
use std::io::{self, Write};
use std::path::PathBuf;
use std::process::Command as ProcessCommand;
use clap::ValueEnum;
const AUTOMATIC_INSTALL_SUPPORTED_DISTROS: &str =
"Automatic desktop dependency installation is supported on Debian/Ubuntu (apt), Fedora/RHEL (dnf), and Alpine (apk).";
const AUTOMATIC_INSTALL_UNSUPPORTED_ENVS: &str =
"Automatic installation is not supported on macOS, Windows, or Linux distributions without apt, dnf, or apk.";
#[derive(Debug, Clone, Copy, PartialEq, Eq, ValueEnum)]
pub enum DesktopPackageManager {
Apt,
Dnf,
Apk,
}
#[derive(Debug, Clone)]
pub struct DesktopInstallRequest {
pub yes: bool,
pub print_only: bool,
pub package_manager: Option<DesktopPackageManager>,
pub no_fonts: bool,
}
pub(crate) fn desktop_platform_support_message() -> String {
format!("Desktop APIs are only supported on Linux. {AUTOMATIC_INSTALL_SUPPORTED_DISTROS}")
}
fn linux_install_support_message() -> String {
format!("{AUTOMATIC_INSTALL_SUPPORTED_DISTROS} {AUTOMATIC_INSTALL_UNSUPPORTED_ENVS}")
}
pub fn install_desktop(request: DesktopInstallRequest) -> Result<(), String> {
if std::env::consts::OS != "linux" {
return Err(format!(
"desktop installation is only supported on Linux. {}",
linux_install_support_message()
));
}
let package_manager = match request.package_manager {
Some(value) => value,
None => detect_package_manager().ok_or_else(|| {
format!(
"could not detect a supported package manager. {} Install the desktop dependencies manually on this distribution.",
linux_install_support_message()
)
})?,
};
let packages = desktop_packages(package_manager, request.no_fonts);
let used_sudo = !running_as_root() && find_binary("sudo").is_some();
if !running_as_root() && !used_sudo {
return Err(
"desktop installation requires root or sudo access; rerun as root or install dependencies manually"
.to_string(),
);
}
println!("Desktop package manager: {}", package_manager);
println!("Desktop packages:");
for package in &packages {
println!(" - {package}");
}
println!("Install command:");
println!(
" {}",
render_install_command(package_manager, used_sudo, &packages)
);
if request.print_only {
return Ok(());
}
if !request.yes && !prompt_yes_no("Proceed with desktop dependency installation? [y/N] ")? {
return Err("installation cancelled".to_string());
}
run_install_commands(package_manager, used_sudo, &packages)?;
println!("Desktop dependencies installed.");
Ok(())
}
fn detect_package_manager() -> Option<DesktopPackageManager> {
if find_binary("apt-get").is_some() {
return Some(DesktopPackageManager::Apt);
}
if find_binary("dnf").is_some() {
return Some(DesktopPackageManager::Dnf);
}
if find_binary("apk").is_some() {
return Some(DesktopPackageManager::Apk);
}
None
}
fn desktop_packages(package_manager: DesktopPackageManager, no_fonts: bool) -> Vec<String> {
let mut packages = match package_manager {
DesktopPackageManager::Apt => vec![
"xvfb",
"openbox",
"xdotool",
"imagemagick",
"ffmpeg",
"x11-xserver-utils",
"dbus-x11",
"xauth",
"fonts-dejavu-core",
],
DesktopPackageManager::Dnf => vec![
"xorg-x11-server-Xvfb",
"openbox",
"xdotool",
"ImageMagick",
"ffmpeg",
"xrandr",
"dbus-x11",
"xauth",
"dejavu-sans-fonts",
],
DesktopPackageManager::Apk => vec![
"xvfb",
"openbox",
"xdotool",
"imagemagick",
"ffmpeg",
"xrandr",
"dbus",
"xauth",
"ttf-dejavu",
],
}
.into_iter()
.map(str::to_string)
.collect::<Vec<_>>();
if no_fonts {
packages.retain(|package| {
package != "fonts-dejavu-core"
&& package != "dejavu-sans-fonts"
&& package != "ttf-dejavu"
});
}
packages
}
fn render_install_command(
package_manager: DesktopPackageManager,
used_sudo: bool,
packages: &[String],
) -> String {
let sudo = if used_sudo { "sudo " } else { "" };
match package_manager {
DesktopPackageManager::Apt => format!(
"{sudo}apt-get update && {sudo}env DEBIAN_FRONTEND=noninteractive apt-get install -y {}",
packages.join(" ")
),
DesktopPackageManager::Dnf => {
format!("{sudo}dnf install -y {}", packages.join(" "))
}
DesktopPackageManager::Apk => {
format!("{sudo}apk add --no-cache {}", packages.join(" "))
}
}
}
fn run_install_commands(
package_manager: DesktopPackageManager,
used_sudo: bool,
packages: &[String],
) -> Result<(), String> {
match package_manager {
DesktopPackageManager::Apt => {
run_command(command_with_privilege(
used_sudo,
"apt-get",
vec!["update".to_string()],
))?;
let mut args = vec![
"DEBIAN_FRONTEND=noninteractive".to_string(),
"apt-get".to_string(),
"install".to_string(),
"-y".to_string(),
];
args.extend(packages.iter().cloned());
run_command(command_with_privilege(used_sudo, "env", args))?;
}
DesktopPackageManager::Dnf => {
let mut args = vec!["install".to_string(), "-y".to_string()];
args.extend(packages.iter().cloned());
run_command(command_with_privilege(used_sudo, "dnf", args))?;
}
DesktopPackageManager::Apk => {
let mut args = vec!["add".to_string(), "--no-cache".to_string()];
args.extend(packages.iter().cloned());
run_command(command_with_privilege(used_sudo, "apk", args))?;
}
}
Ok(())
}
fn command_with_privilege(
used_sudo: bool,
program: &str,
args: Vec<String>,
) -> (String, Vec<String>) {
if used_sudo {
let mut sudo_args = vec![program.to_string()];
sudo_args.extend(args);
("sudo".to_string(), sudo_args)
} else {
(program.to_string(), args)
}
}
fn run_command((program, args): (String, Vec<String>)) -> Result<(), String> {
let status = ProcessCommand::new(&program)
.args(&args)
.status()
.map_err(|err| format!("failed to run `{program}`: {err}"))?;
if !status.success() {
return Err(format!(
"command `{}` exited with status {}",
format_command(&program, &args),
status
));
}
Ok(())
}
fn prompt_yes_no(prompt: &str) -> Result<bool, String> {
print!("{prompt}");
io::stdout()
.flush()
.map_err(|err| format!("failed to flush prompt: {err}"))?;
let mut input = String::new();
io::stdin()
.read_line(&mut input)
.map_err(|err| format!("failed to read confirmation: {err}"))?;
let normalized = input.trim().to_ascii_lowercase();
Ok(matches!(normalized.as_str(), "y" | "yes"))
}
fn running_as_root() -> bool {
#[cfg(unix)]
unsafe {
return libc::geteuid() == 0;
}
#[cfg(not(unix))]
{
false
}
}
fn find_binary(name: &str) -> Option<PathBuf> {
let path_env = std::env::var_os("PATH")?;
for path in std::env::split_paths(&path_env) {
let candidate = path.join(name);
if candidate.is_file() {
return Some(candidate);
}
}
None
}
fn format_command(program: &str, args: &[String]) -> String {
let mut parts = vec![program.to_string()];
parts.extend(args.iter().cloned());
parts.join(" ")
}
impl fmt::Display for DesktopPackageManager {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
DesktopPackageManager::Apt => write!(f, "apt"),
DesktopPackageManager::Dnf => write!(f, "dnf"),
DesktopPackageManager::Apk => write!(f, "apk"),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn desktop_platform_support_message_mentions_linux_and_supported_distros() {
let message = desktop_platform_support_message();
assert!(message.contains("only supported on Linux"));
assert!(message.contains("Debian/Ubuntu (apt)"));
assert!(message.contains("Fedora/RHEL (dnf)"));
assert!(message.contains("Alpine (apk)"));
}
#[test]
fn linux_install_support_message_mentions_unsupported_environments() {
let message = linux_install_support_message();
assert!(message.contains("Debian/Ubuntu (apt)"));
assert!(message.contains("Fedora/RHEL (dnf)"));
assert!(message.contains("Alpine (apk)"));
assert!(message.contains("macOS"));
assert!(message.contains("Windows"));
assert!(message.contains("without apt, dnf, or apk"));
}
#[test]
fn desktop_packages_support_no_fonts() {
let packages = desktop_packages(DesktopPackageManager::Apt, true);
assert!(!packages.iter().any(|value| value == "fonts-dejavu-core"));
assert!(packages.iter().any(|value| value == "xvfb"));
}
#[test]
fn render_install_command_matches_package_manager() {
let packages = vec!["xvfb".to_string(), "openbox".to_string()];
let command = render_install_command(DesktopPackageManager::Apk, false, &packages);
assert_eq!(command, "apk add --no-cache xvfb openbox");
}
}


@ -0,0 +1,329 @@
use std::collections::BTreeMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use tokio::sync::Mutex;
use sandbox_agent_error::SandboxError;
use crate::desktop_types::{
DesktopRecordingInfo, DesktopRecordingListResponse, DesktopRecordingStartRequest,
DesktopRecordingStatus, DesktopResolution,
};
use crate::process_runtime::{
ProcessOwner, ProcessRuntime, ProcessStartSpec, ProcessStatus, RestartPolicy,
};
#[derive(Debug, Clone)]
pub struct DesktopRecordingContext {
pub display: String,
pub environment: std::collections::HashMap<String, String>,
pub resolution: DesktopResolution,
}
#[derive(Debug, Clone)]
pub struct DesktopRecordingManager {
process_runtime: Arc<ProcessRuntime>,
recordings_dir: PathBuf,
inner: Arc<Mutex<DesktopRecordingState>>,
}
#[derive(Debug, Default)]
struct DesktopRecordingState {
next_id: u64,
current_id: Option<String>,
recordings: BTreeMap<String, RecordingEntry>,
}
#[derive(Debug, Clone)]
struct RecordingEntry {
info: DesktopRecordingInfo,
path: PathBuf,
}
impl DesktopRecordingManager {
pub fn new(process_runtime: Arc<ProcessRuntime>, state_dir: PathBuf) -> Self {
Self {
process_runtime,
recordings_dir: state_dir.join("recordings"),
inner: Arc::new(Mutex::new(DesktopRecordingState::default())),
}
}
pub async fn start(
&self,
context: DesktopRecordingContext,
request: DesktopRecordingStartRequest,
) -> Result<DesktopRecordingInfo, SandboxError> {
if find_binary("ffmpeg").is_none() {
return Err(SandboxError::Conflict {
message: "ffmpeg is required for desktop recording".to_string(),
});
}
self.ensure_recordings_dir()?;
{
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
if state.current_id.is_some() {
return Err(SandboxError::Conflict {
message: "a desktop recording is already active".to_string(),
});
}
}
let mut state = self.inner.lock().await;
let id_num = state.next_id + 1;
state.next_id = id_num;
let id = format!("rec_{id_num}");
let file_name = format!("{id}.mp4");
let path = self.recordings_dir.join(&file_name);
let fps = request.fps.unwrap_or(30).clamp(1, 60);
let args = vec![
"-y".to_string(),
"-video_size".to_string(),
format!("{}x{}", context.resolution.width, context.resolution.height),
"-framerate".to_string(),
fps.to_string(),
"-f".to_string(),
"x11grab".to_string(),
"-i".to_string(),
context.display,
"-c:v".to_string(),
"libx264".to_string(),
"-preset".to_string(),
"ultrafast".to_string(),
"-pix_fmt".to_string(),
"yuv420p".to_string(),
path.to_string_lossy().to_string(),
];
let snapshot = self
.process_runtime
.start_process(ProcessStartSpec {
command: "ffmpeg".to_string(),
args,
cwd: None,
env: context.environment,
tty: false,
interactive: false,
owner: ProcessOwner::Desktop,
restart_policy: Some(RestartPolicy::Never),
})
.await?;
let info = DesktopRecordingInfo {
id: id.clone(),
status: DesktopRecordingStatus::Recording,
process_id: Some(snapshot.id),
file_name,
bytes: 0,
started_at: chrono::Utc::now().to_rfc3339(),
ended_at: None,
};
state.current_id = Some(id.clone());
state.recordings.insert(
id,
RecordingEntry {
info: info.clone(),
path,
},
);
Ok(info)
}
pub async fn stop(&self) -> Result<DesktopRecordingInfo, SandboxError> {
let (recording_id, process_id) = {
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
let recording_id = state
.current_id
.clone()
.ok_or_else(|| SandboxError::Conflict {
message: "no desktop recording is active".to_string(),
})?;
let process_id = state
.recordings
.get(&recording_id)
.and_then(|entry| entry.info.process_id.clone());
(recording_id, process_id)
};
if let Some(process_id) = process_id {
let snapshot = self
.process_runtime
.stop_process(&process_id, Some(5_000))
.await?;
if snapshot.status == ProcessStatus::Running {
let _ = self
.process_runtime
.kill_process(&process_id, Some(1_000))
.await;
}
}
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
let entry = state
.recordings
.get(&recording_id)
.ok_or_else(|| SandboxError::NotFound {
resource: "desktop_recording".to_string(),
id: recording_id.clone(),
})?;
Ok(entry.info.clone())
}
pub async fn list(&self) -> Result<DesktopRecordingListResponse, SandboxError> {
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
Ok(DesktopRecordingListResponse {
recordings: state
.recordings
.values()
.map(|entry| entry.info.clone())
.collect(),
})
}
pub async fn get(&self, id: &str) -> Result<DesktopRecordingInfo, SandboxError> {
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
state
.recordings
.get(id)
.map(|entry| entry.info.clone())
.ok_or_else(|| SandboxError::NotFound {
resource: "desktop_recording".to_string(),
id: id.to_string(),
})
}
pub async fn download_path(&self, id: &str) -> Result<PathBuf, SandboxError> {
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
let entry = state
.recordings
.get(id)
.ok_or_else(|| SandboxError::NotFound {
resource: "desktop_recording".to_string(),
id: id.to_string(),
})?;
if !entry.path.is_file() {
return Err(SandboxError::NotFound {
resource: "desktop_recording_file".to_string(),
id: id.to_string(),
});
}
Ok(entry.path.clone())
}
pub async fn delete(&self, id: &str) -> Result<(), SandboxError> {
let mut state = self.inner.lock().await;
self.refresh_locked(&mut state).await?;
if state.current_id.as_deref() == Some(id) {
return Err(SandboxError::Conflict {
message: "stop the active desktop recording before deleting it".to_string(),
});
}
let entry = state
.recordings
.remove(id)
.ok_or_else(|| SandboxError::NotFound {
resource: "desktop_recording".to_string(),
id: id.to_string(),
})?;
if entry.path.exists() {
fs::remove_file(&entry.path).map_err(|err| SandboxError::StreamError {
message: format!(
"failed to delete desktop recording {}: {err}",
entry.path.display()
),
})?;
}
Ok(())
}
fn ensure_recordings_dir(&self) -> Result<(), SandboxError> {
fs::create_dir_all(&self.recordings_dir).map_err(|err| SandboxError::StreamError {
message: format!(
"failed to create desktop recordings dir {}: {err}",
self.recordings_dir.display()
),
})
}
async fn refresh_locked(&self, state: &mut DesktopRecordingState) -> Result<(), SandboxError> {
let ids: Vec<String> = state.recordings.keys().cloned().collect();
for id in ids {
let should_clear_current = {
let Some(entry) = state.recordings.get_mut(&id) else {
continue;
};
let Some(process_id) = entry.info.process_id.clone() else {
Self::refresh_bytes(entry);
continue;
};
let snapshot = match self.process_runtime.snapshot(&process_id).await {
Ok(snapshot) => snapshot,
Err(SandboxError::NotFound { .. }) => {
Self::finalize_entry(entry, false);
continue;
}
Err(err) => return Err(err),
};
if snapshot.status == ProcessStatus::Running {
Self::refresh_bytes(entry);
false
} else {
Self::finalize_entry(entry, snapshot.exit_code == Some(0));
true
}
};
if should_clear_current && state.current_id.as_deref() == Some(id.as_str()) {
state.current_id = None;
}
}
Ok(())
}
fn refresh_bytes(entry: &mut RecordingEntry) {
entry.info.bytes = file_size(&entry.path);
}
fn finalize_entry(entry: &mut RecordingEntry, success: bool) {
let bytes = file_size(&entry.path);
entry.info.status = if success || (entry.path.is_file() && bytes > 0) {
DesktopRecordingStatus::Completed
} else {
DesktopRecordingStatus::Failed
};
entry
.info
.ended_at
.get_or_insert_with(|| chrono::Utc::now().to_rfc3339());
entry.info.bytes = bytes;
}
}
fn find_binary(name: &str) -> Option<PathBuf> {
let path_env = std::env::var_os("PATH")?;
for path in std::env::split_paths(&path_env) {
let candidate = path.join(name);
if candidate.is_file() {
return Some(candidate);
}
}
None
}
fn file_size(path: &Path) -> u64 {
fs::metadata(path)
.map(|metadata| metadata.len())
.unwrap_or(0)
}

File diff suppressed because it is too large


@@ -0,0 +1,391 @@
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;
use sandbox_agent_error::SandboxError;
use crate::desktop_types::{DesktopProcessInfo, DesktopResolution, DesktopStreamStatusResponse};
use crate::process_runtime::{ProcessOwner, ProcessRuntime, ProcessStartSpec};
/// Internal port where neko listens for HTTP/WS traffic.
const NEKO_INTERNAL_PORT: u16 = 18100;
/// UDP ephemeral port range for WebRTC media.
const NEKO_EPR: &str = "59050-59070";
/// How long to wait for neko to become ready.
const NEKO_READY_TIMEOUT: Duration = Duration::from_secs(15);
/// How long between readiness polls.
const NEKO_READY_POLL: Duration = Duration::from_millis(300);
#[derive(Debug, Clone)]
pub struct StreamingConfig {
pub video_codec: String,
pub audio_codec: String,
pub frame_rate: u32,
pub webrtc_port_range: String,
}
impl Default for StreamingConfig {
fn default() -> Self {
Self {
video_codec: "vp8".to_string(),
audio_codec: "opus".to_string(),
frame_rate: 30,
webrtc_port_range: NEKO_EPR.to_string(),
}
}
}
#[derive(Debug, Clone)]
pub struct DesktopStreamingManager {
inner: Arc<Mutex<DesktopStreamingState>>,
process_runtime: Arc<ProcessRuntime>,
}
#[derive(Debug)]
struct DesktopStreamingState {
active: bool,
process_id: Option<String>,
/// Base URL for neko's internal HTTP server (e.g. "http://127.0.0.1:18100").
neko_base_url: Option<String>,
/// Session cookie obtained from neko login, used for WS auth.
neko_session_cookie: Option<String>,
display: Option<String>,
resolution: Option<DesktopResolution>,
streaming_config: StreamingConfig,
window_id: Option<String>,
}
impl Default for DesktopStreamingState {
fn default() -> Self {
Self {
active: false,
process_id: None,
neko_base_url: None,
neko_session_cookie: None,
display: None,
resolution: None,
streaming_config: StreamingConfig::default(),
window_id: None,
}
}
}
impl DesktopStreamingManager {
pub fn new(process_runtime: Arc<ProcessRuntime>) -> Self {
Self {
inner: Arc::new(Mutex::new(DesktopStreamingState::default())),
process_runtime,
}
}
/// Start the neko streaming subprocess targeting the given display.
pub async fn start(
&self,
display: &str,
resolution: DesktopResolution,
environment: &HashMap<String, String>,
config: Option<StreamingConfig>,
window_id: Option<String>,
) -> Result<DesktopStreamStatusResponse, SandboxError> {
let config = config.unwrap_or_default();
let mut state = self.inner.lock().await;
if state.active {
return Ok(DesktopStreamStatusResponse {
active: true,
window_id: state.window_id.clone(),
process_id: state.process_id.clone(),
});
}
// Stop any stale process.
if let Some(old_id) = state.process_id.take() {
let _ = self.process_runtime.stop_process(&old_id, Some(2000)).await;
state.neko_base_url = None;
state.neko_session_cookie = None;
}
let mut env = environment.clone();
env.insert("DISPLAY".to_string(), display.to_string());
let bind_addr = format!("0.0.0.0:{}", NEKO_INTERNAL_PORT);
let screen = format!(
"{}x{}@{}",
resolution.width, resolution.height, config.frame_rate
);
let snapshot = self
.process_runtime
.start_process(ProcessStartSpec {
command: "neko".to_string(),
args: vec![
"serve".to_string(),
"--server.bind".to_string(),
bind_addr,
"--desktop.screen".to_string(),
screen,
"--desktop.display".to_string(),
display.to_string(),
"--capture.video.display".to_string(),
display.to_string(),
"--capture.video.codec".to_string(),
config.video_codec.clone(),
"--capture.audio.codec".to_string(),
config.audio_codec.clone(),
"--webrtc.epr".to_string(),
config.webrtc_port_range.clone(),
"--webrtc.icelite".to_string(),
"--webrtc.nat1to1".to_string(),
"127.0.0.1".to_string(),
"--member.provider".to_string(),
"noauth".to_string(),
// Disable the custom xf86-input-neko driver (defaults to true
// in neko v3). The driver socket is not available outside
// neko's official Docker images; XTEST is used instead.
"--desktop.input.enabled=false".to_string(),
],
cwd: None,
env,
tty: false,
interactive: false,
owner: ProcessOwner::Desktop,
restart_policy: None,
})
.await
.map_err(|e| SandboxError::Conflict {
message: format!("failed to start neko streaming process: {e}"),
})?;
let neko_base = format!("http://127.0.0.1:{}", NEKO_INTERNAL_PORT);
let process_id_clone = snapshot.id.clone();
state.process_id = Some(snapshot.id.clone());
state.neko_base_url = Some(neko_base.clone());
state.display = Some(display.to_string());
state.resolution = Some(resolution);
state.streaming_config = config;
state.window_id = window_id;
state.active = true;
// Drop the lock before waiting for readiness.
drop(state);
// Wait for neko to be ready by polling its login endpoint.
let deadline = tokio::time::Instant::now() + NEKO_READY_TIMEOUT;
let login_url = format!("{}/api/login", neko_base);
let client = reqwest::Client::builder()
.redirect(reqwest::redirect::Policy::none())
.build()
.unwrap_or_else(|_| reqwest::Client::new());
let mut session_cookie = None;
loop {
match client
.post(&login_url)
.json(&serde_json::json!({"username": "admin", "password": "admin"}))
.send()
.await
{
Ok(resp) if resp.status().is_success() => {
// Extract NEKO_SESSION cookie from Set-Cookie header.
if let Some(set_cookie) = resp.headers().get("set-cookie") {
if let Ok(cookie_str) = set_cookie.to_str() {
// Extract just the cookie value (before the first ';').
if let Some(cookie_part) = cookie_str.split(';').next() {
session_cookie = Some(cookie_part.to_string());
}
}
}
tracing::info!("neko streaming process ready, session obtained");
// Take control so the connected client can send input.
let control_url = format!("{}/api/room/control/take", neko_base);
if let Some(ref cookie) = session_cookie {
let _ = client
.post(&control_url)
.header("Cookie", cookie.as_str())
.send()
.await;
tracing::info!("neko control taken");
}
break;
}
_ => {}
}
if tokio::time::Instant::now() >= deadline {
tracing::warn!("neko did not become ready within timeout, proceeding anyway");
break;
}
tokio::time::sleep(NEKO_READY_POLL).await;
}
// Store the session cookie.
if let Some(ref cookie) = session_cookie {
let mut state = self.inner.lock().await;
state.neko_session_cookie = Some(cookie.clone());
}
let state = self.inner.lock().await;
let state_window_id = state.window_id.clone();
drop(state);
Ok(DesktopStreamStatusResponse {
active: true,
window_id: state_window_id,
process_id: Some(process_id_clone),
})
}
/// Stop streaming and tear down neko subprocess.
pub async fn stop(&self) -> DesktopStreamStatusResponse {
let mut state = self.inner.lock().await;
if let Some(process_id) = state.process_id.take() {
let _ = self
.process_runtime
.stop_process(&process_id, Some(3000))
.await;
}
state.active = false;
state.neko_base_url = None;
state.neko_session_cookie = None;
state.display = None;
state.resolution = None;
state.window_id = None;
DesktopStreamStatusResponse {
active: false,
window_id: None,
process_id: None,
}
}
pub async fn status(&self) -> DesktopStreamStatusResponse {
let state = self.inner.lock().await;
DesktopStreamStatusResponse {
active: state.active,
window_id: state.window_id.clone(),
process_id: state.process_id.clone(),
}
}
pub async fn ensure_active(&self) -> Result<(), SandboxError> {
if self.inner.lock().await.active {
Ok(())
} else {
Err(SandboxError::Conflict {
message: "desktop streaming is not active".to_string(),
})
}
}
/// Get the neko WebSocket URL for signaling proxy, including session cookie.
pub async fn neko_ws_url(&self) -> Option<String> {
self.inner
.lock()
.await
.neko_base_url
.as_ref()
.map(|base| base.replace("http://", "ws://") + "/api/ws")
}
/// Get the neko base HTTP URL (e.g. `http://127.0.0.1:18100`).
pub async fn neko_base_url(&self) -> Option<String> {
self.inner.lock().await.neko_base_url.clone()
}
/// Create a fresh neko login session and return the session cookie.
/// Each WebSocket proxy connection should call this to get its own
/// session, avoiding conflicts when multiple clients connect.
/// Uses a unique username per connection so neko treats them as
/// separate members (noauth provider allows any credentials).
pub async fn create_neko_session(&self) -> Option<String> {
let base_url = self.neko_base_url().await?;
let client = reqwest::Client::new();
let login_url = format!("{}/api/login", base_url);
let username = format!(
"user-{}",
std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap_or_default()
.as_nanos()
);
tracing::debug!(
"creating neko session: username={}, url={}",
username,
login_url
);
let resp = match client
.post(&login_url)
.json(&serde_json::json!({"username": username, "password": "admin"}))
.send()
.await
{
Ok(r) => r,
Err(e) => {
tracing::warn!("neko login request failed: {e}");
return None;
}
};
if !resp.status().is_success() {
tracing::warn!("neko login returned status {}", resp.status());
return None;
}
let cookie = resp
.headers()
.get("set-cookie")
.and_then(|v| v.to_str().ok())
.map(|v| v.split(';').next().unwrap_or(v).to_string());
let cookie = match cookie {
Some(c) => c,
None => {
tracing::warn!("neko login response missing set-cookie header");
return None;
}
};
tracing::debug!("neko session created: {}", username);
// Take control for this session.
let control_url = format!("{}/api/room/control/take", base_url);
let _ = client
.post(&control_url)
.header("Cookie", &cookie)
.send()
.await;
Some(cookie)
}
/// Get the shared neko session cookie (used during startup).
pub async fn neko_session_cookie(&self) -> Option<String> {
self.inner.lock().await.neko_session_cookie.clone()
}
pub async fn resolution(&self) -> Option<DesktopResolution> {
self.inner.lock().await.resolution.clone()
}
pub async fn is_active(&self) -> bool {
self.inner.lock().await.active
}
/// Return process diagnostics for the neko streaming subprocess, if one
/// has been started. The returned info mirrors the shape used by
/// `DesktopRuntime::processes_locked` for xvfb/openbox/dbus.
pub async fn process_info(&self) -> Option<DesktopProcessInfo> {
let state = self.inner.lock().await;
let process_id = state.process_id.as_ref()?;
let snapshot = self.process_runtime.snapshot(process_id).await.ok()?;
Some(DesktopProcessInfo {
name: "neko".to_string(),
pid: snapshot.pid,
running: snapshot.status == crate::process_runtime::ProcessStatus::Running,
log_path: None,
})
}
}
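The Set-Cookie handling in `create_neko_session` boils down to keeping the `name=value` pair before the first `;` (attributes like `Path` or `HttpOnly` are not echoed back to the server). A minimal standalone sketch, with an illustrative helper name:

```rust
// Sketch of the cookie extraction used when logging in to neko above:
// keep only the `name=value` pair before the first ';'.
fn session_cookie_from_header(set_cookie: &str) -> Option<String> {
    let pair = set_cookie.split(';').next()?.trim();
    if pair.contains('=') {
        Some(pair.to_string())
    } else {
        None // not a well-formed cookie pair
    }
}

fn main() {
    let header = "NEKO_SESSION=abc123; Path=/; HttpOnly";
    assert_eq!(
        session_cookie_from_header(header).as_deref(),
        Some("NEKO_SESSION=abc123")
    );
    assert_eq!(session_cookie_from_header("garbage"), None);
}
```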


@@ -0,0 +1,397 @@
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use utoipa::{IntoParams, ToSchema};
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum DesktopState {
Inactive,
InstallRequired,
Starting,
Active,
Stopping,
Failed,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopResolution {
pub width: u32,
pub height: u32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub dpi: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopErrorInfo {
pub code: String,
pub message: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopProcessInfo {
pub name: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
pub running: bool,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub log_path: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopStatusResponse {
pub state: DesktopState,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub display: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub resolution: Option<DesktopResolution>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub started_at: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub last_error: Option<DesktopErrorInfo>,
#[serde(default)]
pub missing_dependencies: Vec<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub install_command: Option<String>,
#[serde(default)]
pub processes: Vec<DesktopProcessInfo>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub runtime_log_path: Option<String>,
/// Current visible windows (included when the desktop is active).
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub windows: Vec<DesktopWindowInfo>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopStartRequest {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub width: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub height: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub dpi: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub display_num: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub state_dir: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_video_codec: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_audio_codec: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_frame_rate: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub webrtc_port_range: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub recording_fps: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopScreenshotQuery {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub format: Option<DesktopScreenshotFormat>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub quality: Option<u8>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scale: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub show_cursor: Option<bool>,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum DesktopScreenshotFormat {
Png,
Jpeg,
Webp,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams)]
#[serde(rename_all = "camelCase")]
pub struct DesktopRegionScreenshotQuery {
pub x: i32,
pub y: i32,
pub width: u32,
pub height: u32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub format: Option<DesktopScreenshotFormat>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub quality: Option<u8>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scale: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub show_cursor: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMousePositionResponse {
pub x: i32,
pub y: i32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub screen: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub window: Option<String>,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum DesktopMouseButton {
Left,
Middle,
Right,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseMoveRequest {
pub x: i32,
pub y: i32,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseClickRequest {
pub x: i32,
pub y: i32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub button: Option<DesktopMouseButton>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub click_count: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseDownRequest {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub x: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub y: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub button: Option<DesktopMouseButton>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseUpRequest {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub x: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub y: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub button: Option<DesktopMouseButton>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseDragRequest {
pub start_x: i32,
pub start_y: i32,
pub end_x: i32,
pub end_y: i32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub button: Option<DesktopMouseButton>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopMouseScrollRequest {
pub x: i32,
pub y: i32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub delta_x: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub delta_y: Option<i32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopKeyboardTypeRequest {
pub text: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub delay_ms: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopKeyboardPressRequest {
pub key: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub modifiers: Option<DesktopKeyModifiers>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopKeyModifiers {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub ctrl: Option<bool>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub shift: Option<bool>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub alt: Option<bool>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub cmd: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopKeyboardDownRequest {
pub key: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopKeyboardUpRequest {
pub key: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopActionResponse {
pub ok: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopDisplayInfoResponse {
pub display: String,
pub resolution: DesktopResolution,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowInfo {
pub id: String,
pub title: String,
pub x: i32,
pub y: i32,
pub width: u32,
pub height: u32,
pub is_active: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowListResponse {
pub windows: Vec<DesktopWindowInfo>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopRecordingStartRequest {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub fps: Option<u32>,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum DesktopRecordingStatus {
Recording,
Completed,
Failed,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopRecordingInfo {
pub id: String,
pub status: DesktopRecordingStatus,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub process_id: Option<String>,
pub file_name: String,
pub bytes: u64,
pub started_at: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub ended_at: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopRecordingListResponse {
pub recordings: Vec<DesktopRecordingInfo>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
pub struct DesktopStreamStatusResponse {
pub active: bool,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub window_id: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub process_id: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardResponse {
pub text: String,
pub selection: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardQuery {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub selection: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardWriteRequest {
pub text: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub selection: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopLaunchRequest {
pub app: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub args: Option<Vec<String>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub wait: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopLaunchResponse {
pub process_id: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub window_id: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopOpenRequest {
pub target: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopOpenResponse {
pub process_id: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowMoveRequest {
pub x: i32,
pub y: i32,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowResizeRequest {
pub width: u32,
pub height: u32,
}
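All of these types rely on `#[serde(rename_all = "camelCase")]`, which maps Rust's snake_case field names onto camelCase JSON keys on the wire. Serde performs this mapping internally; the standalone helper below is only an illustrative sketch of the rule:

```rust
// Sketch of the snake_case -> camelCase mapping that
// #[serde(rename_all = "camelCase")] applies to the fields above:
// drop each '_' and uppercase the character that follows it.
fn to_camel_case(field: &str) -> String {
    let mut out = String::new();
    let mut upper_next = false;
    for ch in field.chars() {
        if ch == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    assert_eq!(to_camel_case("started_at"), "startedAt");
    assert_eq!(to_camel_case("window_id"), "windowId");
    assert_eq!(to_camel_case("missing_dependencies"), "missingDependencies");
}
```

So a `DesktopRecordingInfo` serializes with keys like `processId`, `fileName`, and `startedAt`, which is the shape HTTP clients see.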


@@ -3,6 +3,12 @@
mod acp_proxy_runtime;
pub mod cli;
pub mod daemon;
mod desktop_errors;
mod desktop_install;
mod desktop_recording;
mod desktop_runtime;
mod desktop_streaming;
pub mod desktop_types;
mod process_runtime;
pub mod router;
pub mod server_logs;


@@ -1,5 +1,5 @@
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;
@@ -27,6 +27,22 @@ pub enum ProcessStream {
Pty,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum ProcessOwner {
User,
Desktop,
System,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum RestartPolicy {
Never,
Always,
OnFailure,
}
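The `restart_policy` field is marked `#[allow(dead_code)]` further down in this diff, so no supervision logic ships here yet. Under that caveat, a hypothetical decision helper for these variants might look like the following (the enum is re-declared locally so the sketch is self-contained):

```rust
// Hypothetical sketch only: the diff marks restart_policy as dead_code,
// so this is what a future supervisor *could* do with these variants.
#[derive(Clone, Copy, PartialEq)]
enum RestartPolicy {
    Never,
    Always,
    OnFailure,
}

fn should_restart(policy: RestartPolicy, exit_code: Option<i32>) -> bool {
    match policy {
        RestartPolicy::Never => false,
        RestartPolicy::Always => true,
        // Restart on any non-zero exit, or when no exit code is known
        // (e.g. the process was killed by a signal).
        RestartPolicy::OnFailure => exit_code != Some(0),
    }
}

fn main() {
    assert!(!should_restart(RestartPolicy::Never, Some(1)));
    assert!(should_restart(RestartPolicy::Always, Some(0)));
    assert!(should_restart(RestartPolicy::OnFailure, Some(1)));
    assert!(!should_restart(RestartPolicy::OnFailure, Some(0)));
}
```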
#[derive(Debug, Clone)]
pub struct ProcessStartSpec {
pub command: String,
@@ -35,6 +51,8 @@ pub struct ProcessStartSpec {
pub env: HashMap<String, String>,
pub tty: bool,
pub interactive: bool,
pub owner: ProcessOwner,
pub restart_policy: Option<RestartPolicy>,
}
#[derive(Debug, Clone)]
@@ -78,6 +96,7 @@ pub struct ProcessSnapshot {
pub cwd: Option<String>,
pub tty: bool,
pub interactive: bool,
pub owner: ProcessOwner,
pub status: ProcessStatus,
pub pid: Option<u32>,
pub exit_code: Option<i32>,
@@ -129,17 +148,27 @@ struct ManagedProcess {
cwd: Option<String>,
tty: bool,
interactive: bool,
owner: ProcessOwner,
#[allow(dead_code)]
restart_policy: RestartPolicy,
spec: ProcessStartSpec,
created_at_ms: i64,
pid: Option<u32>,
max_log_bytes: usize,
stdin: Mutex<Option<ProcessStdin>>,
#[cfg(unix)]
pty_resize_fd: Mutex<Option<std::fs::File>>,
runtime: Mutex<ManagedRuntime>,
status: RwLock<ManagedStatus>,
sequence: AtomicU64,
logs: Mutex<VecDeque<StoredLog>>,
total_log_bytes: Mutex<usize>,
log_tx: broadcast::Sender<ProcessLogLine>,
stop_requested: AtomicBool,
}
#[derive(Debug)]
struct ManagedRuntime {
pid: Option<u32>,
stdin: Option<ProcessStdin>,
#[cfg(unix)]
pty_resize_fd: Option<std::fs::File>,
}
#[derive(Debug)]
@@ -162,17 +191,17 @@ struct ManagedStatus {
}
struct SpawnedPipeProcess {
process: Arc<ManagedProcess>,
child: Child,
stdout: tokio::process::ChildStdout,
stderr: tokio::process::ChildStderr,
runtime: ManagedRuntime,
}
#[cfg(unix)]
struct SpawnedTtyProcess {
process: Arc<ManagedProcess>,
child: Child,
reader: tokio::fs::File,
runtime: ManagedRuntime,
}
impl ProcessRuntime {
@@ -224,21 +253,14 @@ impl ProcessRuntime {
&self,
spec: ProcessStartSpec,
) -> Result<ProcessSnapshot, SandboxError> {
let config = self.get_config().await;
let process_refs = {
let processes = self.inner.processes.read().await;
processes.values().cloned().collect::<Vec<_>>()
};
let mut running_count = 0usize;
for process in process_refs {
if process.status.read().await.status == ProcessStatus::Running {
running_count += 1;
}
if spec.command.trim().is_empty() {
return Err(SandboxError::InvalidRequest {
message: "command must not be empty".to_string(),
});
}
if running_count >= config.max_concurrent_processes {
let config = self.get_config().await;
if self.running_process_count().await >= config.max_concurrent_processes {
return Err(SandboxError::Conflict {
message: format!(
"max concurrent process limit reached ({})",
@ -247,73 +269,44 @@ impl ProcessRuntime {
});
}
if spec.command.trim().is_empty() {
return Err(SandboxError::InvalidRequest {
message: "command must not be empty".to_string(),
});
}
let id_num = self.inner.next_id.fetch_add(1, Ordering::Relaxed);
let id = format!("proc_{id_num}");
let process = Arc::new(ManagedProcess {
id: id.clone(),
command: spec.command.clone(),
args: spec.args.clone(),
cwd: spec.cwd.clone(),
tty: spec.tty,
interactive: spec.interactive,
owner: spec.owner,
restart_policy: spec.restart_policy.unwrap_or(RestartPolicy::Never),
spec,
created_at_ms: now_ms(),
max_log_bytes: config.max_log_bytes_per_process,
runtime: Mutex::new(ManagedRuntime {
pid: None,
stdin: None,
#[cfg(unix)]
pty_resize_fd: None,
}),
status: RwLock::new(ManagedStatus {
status: ProcessStatus::Running,
exit_code: None,
exited_at_ms: None,
}),
sequence: AtomicU64::new(1),
logs: Mutex::new(VecDeque::new()),
total_log_bytes: Mutex::new(0),
log_tx: broadcast::channel(512).0,
stop_requested: AtomicBool::new(false),
});
if spec.tty {
#[cfg(unix)]
{
let spawned = self
.spawn_tty_process(id.clone(), spec, config.max_log_bytes_per_process)
.await?;
let process = spawned.process.clone();
self.inner
.processes
.write()
.await
.insert(id, process.clone());
let p = process.clone();
tokio::spawn(async move {
pump_output(p, spawned.reader, ProcessStream::Pty).await;
});
let p = process.clone();
tokio::spawn(async move {
watch_exit(p, spawned.child).await;
});
return Ok(process.snapshot().await);
}
#[cfg(not(unix))]
{
return Err(SandboxError::StreamError {
message: "tty process mode is not supported on this platform".to_string(),
});
}
}
let spawned = self
.spawn_pipe_process(id.clone(), spec, config.max_log_bytes_per_process)
.await?;
let process = spawned.process.clone();
self.spawn_existing_process(process.clone()).await?;
self.inner
.processes
.write()
.await
.insert(id, process.clone());
let p = process.clone();
tokio::spawn(async move {
pump_output(p, spawned.stdout, ProcessStream::Stdout).await;
});
let p = process.clone();
tokio::spawn(async move {
pump_output(p, spawned.stderr, ProcessStream::Stderr).await;
});
let p = process.clone();
tokio::spawn(async move {
watch_exit(p, spawned.child).await;
});
Ok(process.snapshot().await)
}
@ -412,11 +405,13 @@ impl ProcessRuntime {
})
}
pub async fn list_processes(&self) -> Vec<ProcessSnapshot> {
pub async fn list_processes(&self, owner: Option<ProcessOwner>) -> Vec<ProcessSnapshot> {
let processes = self.inner.processes.read().await;
let mut items = Vec::with_capacity(processes.len());
for process in processes.values() {
items.push(process.snapshot().await);
if owner.is_none_or(|expected| process.owner == expected) {
items.push(process.snapshot().await);
}
}
items.sort_by(|a, b| a.id.cmp(&b.id));
items
@ -453,6 +448,7 @@ impl ProcessRuntime {
wait_ms: Option<u64>,
) -> Result<ProcessSnapshot, SandboxError> {
let process = self.lookup_process(id).await?;
process.stop_requested.store(true, Ordering::SeqCst);
process.send_signal(SIGTERM).await?;
maybe_wait_for_exit(process.clone(), wait_ms.unwrap_or(2_000)).await;
Ok(process.snapshot().await)
@ -464,6 +460,7 @@ impl ProcessRuntime {
wait_ms: Option<u64>,
) -> Result<ProcessSnapshot, SandboxError> {
let process = self.lookup_process(id).await?;
process.stop_requested.store(true, Ordering::SeqCst);
process.send_signal(SIGKILL).await?;
maybe_wait_for_exit(process.clone(), wait_ms.unwrap_or(1_000)).await;
Ok(process.snapshot().await)
@ -506,6 +503,17 @@ impl ProcessRuntime {
Ok(process.log_tx.subscribe())
}
async fn running_process_count(&self) -> usize {
let processes = self.inner.processes.read().await;
let mut running = 0usize;
for process in processes.values() {
if process.status.read().await.status == ProcessStatus::Running {
running += 1;
}
}
running
}
async fn lookup_process(&self, id: &str) -> Result<Arc<ManagedProcess>, SandboxError> {
let process = self.inner.processes.read().await.get(id).cloned();
process.ok_or_else(|| SandboxError::NotFound {
@ -514,11 +522,83 @@ impl ProcessRuntime {
})
}
async fn spawn_pipe_process(
async fn spawn_existing_process(
&self,
id: String,
spec: ProcessStartSpec,
max_log_bytes: usize,
process: Arc<ManagedProcess>,
) -> Result<(), SandboxError> {
process.stop_requested.store(false, Ordering::SeqCst);
let mut runtime_guard = process.runtime.lock().await;
let mut status_guard = process.status.write().await;
if process.tty {
#[cfg(unix)]
{
let SpawnedTtyProcess {
child,
reader,
runtime,
} = self.spawn_tty_process(&process.spec)?;
*runtime_guard = runtime;
status_guard.status = ProcessStatus::Running;
status_guard.exit_code = None;
status_guard.exited_at_ms = None;
drop(status_guard);
drop(runtime_guard);
let process_for_output = process.clone();
tokio::spawn(async move {
pump_output(process_for_output, reader, ProcessStream::Pty).await;
});
let runtime = self.clone();
tokio::spawn(async move {
watch_exit(runtime, process, child).await;
});
return Ok(());
}
#[cfg(not(unix))]
{
return Err(SandboxError::StreamError {
message: "tty process mode is not supported on this platform".to_string(),
});
}
}
let SpawnedPipeProcess {
child,
stdout,
stderr,
runtime,
} = self.spawn_pipe_process(&process.spec)?;
*runtime_guard = runtime;
status_guard.status = ProcessStatus::Running;
status_guard.exit_code = None;
status_guard.exited_at_ms = None;
drop(status_guard);
drop(runtime_guard);
let process_for_stdout = process.clone();
tokio::spawn(async move {
pump_output(process_for_stdout, stdout, ProcessStream::Stdout).await;
});
let process_for_stderr = process.clone();
tokio::spawn(async move {
pump_output(process_for_stderr, stderr, ProcessStream::Stderr).await;
});
let runtime = self.clone();
tokio::spawn(async move {
watch_exit(runtime, process, child).await;
});
Ok(())
}
fn spawn_pipe_process(
&self,
spec: &ProcessStartSpec,
) -> Result<SpawnedPipeProcess, SandboxError> {
let mut cmd = Command::new(&spec.command);
cmd.args(&spec.args)
@ -551,35 +631,14 @@ impl ProcessRuntime {
.ok_or_else(|| SandboxError::StreamError {
message: "failed to capture stderr".to_string(),
})?;
let pid = child.id();
let (tx, _rx) = broadcast::channel(512);
let process = Arc::new(ManagedProcess {
id,
command: spec.command,
args: spec.args,
cwd: spec.cwd,
tty: false,
interactive: spec.interactive,
created_at_ms: now_ms(),
pid,
max_log_bytes,
stdin: Mutex::new(stdin.map(ProcessStdin::Pipe)),
#[cfg(unix)]
pty_resize_fd: Mutex::new(None),
status: RwLock::new(ManagedStatus {
status: ProcessStatus::Running,
exit_code: None,
exited_at_ms: None,
}),
sequence: AtomicU64::new(1),
logs: Mutex::new(VecDeque::new()),
total_log_bytes: Mutex::new(0),
log_tx: tx,
});
Ok(SpawnedPipeProcess {
process,
runtime: ManagedRuntime {
pid: child.id(),
stdin: stdin.map(ProcessStdin::Pipe),
#[cfg(unix)]
pty_resize_fd: None,
},
child,
stdout,
stderr,
@ -587,11 +646,9 @@ impl ProcessRuntime {
}
#[cfg(unix)]
async fn spawn_tty_process(
fn spawn_tty_process(
&self,
id: String,
spec: ProcessStartSpec,
max_log_bytes: usize,
spec: &ProcessStartSpec,
) -> Result<SpawnedTtyProcess, SandboxError> {
use std::os::fd::AsRawFd;
use std::process::Stdio;
@ -632,8 +689,8 @@ impl ProcessRuntime {
let child = cmd.spawn().map_err(|err| SandboxError::StreamError {
message: format!("failed to spawn tty process: {err}"),
})?;
let pid = child.id();
drop(slave_fd);
let master_raw = master_fd.as_raw_fd();
@ -644,32 +701,12 @@ impl ProcessRuntime {
let writer_file = tokio::fs::File::from_std(std::fs::File::from(writer_fd));
let resize_file = std::fs::File::from(resize_fd);
let (tx, _rx) = broadcast::channel(512);
let process = Arc::new(ManagedProcess {
id,
command: spec.command,
args: spec.args,
cwd: spec.cwd,
tty: true,
interactive: spec.interactive,
created_at_ms: now_ms(),
pid,
max_log_bytes,
stdin: Mutex::new(Some(ProcessStdin::Pty(writer_file))),
pty_resize_fd: Mutex::new(Some(resize_file)),
status: RwLock::new(ManagedStatus {
status: ProcessStatus::Running,
exit_code: None,
exited_at_ms: None,
}),
sequence: AtomicU64::new(1),
logs: Mutex::new(VecDeque::new()),
total_log_bytes: Mutex::new(0),
log_tx: tx,
});
Ok(SpawnedTtyProcess {
process,
runtime: ManagedRuntime {
pid,
stdin: Some(ProcessStdin::Pty(writer_file)),
pty_resize_fd: Some(resize_file),
},
child,
reader: reader_file,
})
@ -694,6 +731,7 @@ pub struct ProcessLogFilter {
impl ManagedProcess {
async fn snapshot(&self) -> ProcessSnapshot {
let status = self.status.read().await.clone();
let pid = self.runtime.lock().await.pid;
ProcessSnapshot {
id: self.id.clone(),
command: self.command.clone(),
@ -701,8 +739,9 @@ impl ManagedProcess {
cwd: self.cwd.clone(),
tty: self.tty,
interactive: self.interactive,
owner: self.owner,
status: status.status,
pid: self.pid,
pid,
exit_code: status.exit_code,
created_at_ms: self.created_at_ms,
exited_at_ms: status.exited_at_ms,
@ -752,10 +791,13 @@ impl ManagedProcess {
});
}
let mut guard = self.stdin.lock().await;
let stdin = guard.as_mut().ok_or_else(|| SandboxError::Conflict {
message: "process does not accept stdin".to_string(),
})?;
let mut runtime = self.runtime.lock().await;
let stdin = runtime
.stdin
.as_mut()
.ok_or_else(|| SandboxError::Conflict {
message: "process does not accept stdin".to_string(),
})?;
match stdin {
ProcessStdin::Pipe(pipe) => {
@ -825,7 +867,7 @@ impl ManagedProcess {
if self.status.read().await.status != ProcessStatus::Running {
return Ok(());
}
let Some(pid) = self.pid else {
let Some(pid) = self.runtime.lock().await.pid else {
return Ok(());
};
@ -840,8 +882,9 @@ impl ManagedProcess {
#[cfg(unix)]
{
use std::os::fd::AsRawFd;
let guard = self.pty_resize_fd.lock().await;
let Some(fd) = guard.as_ref() else {
let runtime = self.runtime.lock().await;
let Some(fd) = runtime.pty_resize_fd.as_ref() else {
return Err(SandboxError::Conflict {
message: "PTY resize handle unavailable".to_string(),
});
@ -857,6 +900,32 @@ impl ManagedProcess {
Ok(())
}
#[allow(dead_code)]
fn should_restart(&self, exit_code: Option<i32>) -> bool {
match self.restart_policy {
RestartPolicy::Never => false,
RestartPolicy::Always => true,
RestartPolicy::OnFailure => exit_code.unwrap_or(1) != 0,
}
}
async fn mark_exited(&self, exit_code: Option<i32>, exited_at_ms: Option<i64>) {
{
let mut status = self.status.write().await;
status.status = ProcessStatus::Exited;
status.exit_code = exit_code;
status.exited_at_ms = exited_at_ms;
}
let mut runtime = self.runtime.lock().await;
runtime.pid = None;
let _ = runtime.stdin.take();
#[cfg(unix)]
{
let _ = runtime.pty_resize_fd.take();
}
}
}
fn stream_matches(stream: ProcessStream, filter: ProcessLogFilterStream) -> bool {
@ -909,21 +978,16 @@ where
}
}
async fn watch_exit(process: Arc<ManagedProcess>, mut child: Child) {
async fn watch_exit(runtime: ProcessRuntime, process: Arc<ManagedProcess>, mut child: Child) {
let _ = runtime;
let wait = child.wait().await;
let (exit_code, exited_at_ms) = match wait {
Ok(status) => (status.code(), Some(now_ms())),
Err(_) => (None, Some(now_ms())),
};
{
let mut state = process.status.write().await;
state.status = ProcessStatus::Exited;
state.exit_code = exit_code;
state.exited_at_ms = exited_at_ms;
}
let _ = process.stdin.lock().await.take();
let _ = process.stop_requested.swap(false, Ordering::SeqCst);
process.mark_exited(exit_code, exited_at_ms).await;
}
async fn capture_output<R>(mut reader: R, max_bytes: usize) -> std::io::Result<(Vec<u8>, bool)>

File diff suppressed because it is too large

@ -33,7 +33,8 @@ pub(super) async fn require_token(
.and_then(|value| value.to_str().ok())
.and_then(|value| value.strip_prefix("Bearer "));
let allow_query_token = request.uri().path().ends_with("/terminal/ws");
let allow_query_token = request.uri().path().ends_with("/terminal/ws")
|| request.uri().path().ends_with("/stream/ws");
let query_token = if allow_query_token {
request
.uri()


@ -425,6 +425,14 @@ pub enum ProcessState {
Exited,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum ProcessOwner {
User,
Desktop,
System,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct ProcessInfo {
@ -435,6 +443,7 @@ pub struct ProcessInfo {
pub cwd: Option<String>,
pub tty: bool,
pub interactive: bool,
pub owner: ProcessOwner,
pub status: ProcessState,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
@ -451,6 +460,13 @@ pub struct ProcessListResponse {
pub processes: Vec<ProcessInfo>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams)]
#[serde(rename_all = "camelCase")]
pub struct ProcessListQuery {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub owner: Option<ProcessOwner>,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum ProcessLogsStream {


@ -0,0 +1,497 @@
/// Integration tests that verify all software documented in docs/common-software.mdx
/// is installed and working inside the sandbox.
///
/// These tests use `docker/test-common-software/Dockerfile` which extends the base
/// test-agent image with all documented software pre-installed.
///
/// KEEP IN SYNC with docs/common-software.mdx and docker/test-common-software/Dockerfile.
///
/// Run with:
/// cargo test -p sandbox-agent --test common_software
use reqwest::header::HeaderMap;
use reqwest::{Method, StatusCode};
use serde_json::{json, Value};
use serial_test::serial;
#[path = "support/docker_common_software.rs"]
mod docker_support;
use docker_support::TestApp;
async fn send_request(
app: &docker_support::DockerApp,
method: Method,
uri: &str,
body: Option<Value>,
) -> (StatusCode, HeaderMap, Vec<u8>) {
let client = reqwest::Client::new();
let mut builder = client.request(method, app.http_url(uri));
let response = if let Some(body) = body {
builder = builder.header("content-type", "application/json");
builder
.body(body.to_string())
.send()
.await
.expect("request")
} else {
builder.send().await.expect("request")
};
let status = response.status();
let headers = response.headers().clone();
let bytes = response.bytes().await.expect("body");
(status, headers, bytes.to_vec())
}
fn parse_json(bytes: &[u8]) -> Value {
if bytes.is_empty() {
Value::Null
} else {
serde_json::from_slice(bytes).expect("valid json")
}
}
/// Run a command inside the sandbox and assert it exits with code 0.
/// Returns the parsed JSON response.
async fn run_ok(app: &docker_support::DockerApp, command: &str, args: &[&str]) -> Value {
run_ok_with_timeout(app, command, args, 30_000).await
}
async fn run_ok_with_timeout(
app: &docker_support::DockerApp,
command: &str,
args: &[&str],
timeout_ms: u64,
) -> Value {
let (status, _, body) = send_request(
app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": command,
"args": args,
"timeoutMs": timeout_ms
})),
)
.await;
assert_eq!(
status,
StatusCode::OK,
"run {command} failed: {}",
String::from_utf8_lossy(&body)
);
let parsed = parse_json(&body);
assert_eq!(
parsed["exitCode"], 0,
"{command} exited with non-zero code.\nstdout: {}\nstderr: {}",
parsed["stdout"], parsed["stderr"]
);
parsed
}
// ---------------------------------------------------------------------------
// Browsers
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn chromium_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "chromium", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Chromium"),
"expected Chromium version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn firefox_esr_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "firefox-esr", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Mozilla Firefox"),
"expected Firefox version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Languages and runtimes
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn nodejs_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "node", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.starts_with('v'),
"expected node version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn npm_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "npm", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn python3_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "python3", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Python 3"),
"expected Python version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn pip3_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "pip3", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn java_is_installed_and_runs() {
let test_app = TestApp::new();
// java --version prints to stdout on modern JDKs
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "java",
"args": ["--version"],
"timeoutMs": 30000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
let combined = format!(
"{}{}",
parsed["stdout"].as_str().unwrap_or(""),
parsed["stderr"].as_str().unwrap_or("")
);
assert!(
combined.contains("openjdk") || combined.contains("OpenJDK") || combined.contains("java"),
"expected Java version string, got: {combined}"
);
}
#[tokio::test]
#[serial]
async fn ruby_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "ruby", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("ruby"),
"expected Ruby version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Databases
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn sqlite3_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "sqlite3", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(!stdout.is_empty(), "expected sqlite3 version output");
}
#[tokio::test]
#[serial]
async fn redis_server_is_installed() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "redis-server", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Redis") || stdout.contains("redis"),
"expected Redis version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Build tools
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn gcc_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "gcc", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn make_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "make", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn cmake_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "cmake", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn pkg_config_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "pkg-config", &["--version"]).await;
}
// ---------------------------------------------------------------------------
// CLI tools
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn git_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "git", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("git version"),
"expected git version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn jq_is_installed_and_runs() {
let test_app = TestApp::new();
// Pipe a simple JSON through jq
let result = run_ok(&test_app.app, "sh", &["-c", "echo '{\"a\":1}' | jq '.a'"]).await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "1", "jq did not parse JSON correctly: {stdout}");
}
#[tokio::test]
#[serial]
async fn tmux_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "tmux", &["-V"]).await;
}
// ---------------------------------------------------------------------------
// Media and graphics
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn ffmpeg_is_installed_and_runs() {
let test_app = TestApp::new();
// ffmpeg prints version to stderr, so just check exit code via -version
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "ffmpeg",
"args": ["-version"],
"timeoutMs": 10000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
let combined = format!(
"{}{}",
parsed["stdout"].as_str().unwrap_or(""),
parsed["stderr"].as_str().unwrap_or("")
);
assert!(
combined.contains("ffmpeg version"),
"expected ffmpeg version string, got: {combined}"
);
}
#[tokio::test]
#[serial]
async fn imagemagick_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "convert", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn poppler_pdftoppm_is_installed() {
let test_app = TestApp::new();
// pdftoppm -v prints to stderr and exits 0
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "pdftoppm",
"args": ["-v"],
"timeoutMs": 10000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
}
// ---------------------------------------------------------------------------
// Desktop applications (verify binary exists, don't launch GUI)
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn gimp_is_installed() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "gimp", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("GIMP") || stdout.contains("gimp") || stdout.contains("Image Manipulation"),
"expected GIMP version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Functional tests: verify tools actually work, not just that they're present
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn python3_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"python3",
&["-c", "import json; print(json.dumps({'ok': True}))"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("python json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn node_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"node",
&["-e", "console.log(JSON.stringify({ok: true}))"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("node json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn ruby_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"ruby",
&["-e", "require 'json'; puts JSON.generate({ok: true})"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("ruby json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn gcc_can_compile_and_run_hello_world() {
let test_app = TestApp::new();
// Write a C file
run_ok(
&test_app.app,
"sh",
&["-c", r#"printf '#include <stdio.h>\nint main(){printf("hello\\n");return 0;}\n' > /tmp/hello.c"#],
)
.await;
// Compile it
run_ok(&test_app.app, "gcc", &["-o", "/tmp/hello", "/tmp/hello.c"]).await;
// Run it
let result = run_ok(&test_app.app, "/tmp/hello", &[]).await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "hello");
}
#[tokio::test]
#[serial]
async fn sqlite3_can_create_and_query() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"sh",
&[
"-c",
"sqlite3 /tmp/test.db 'CREATE TABLE t(v TEXT); INSERT INTO t VALUES(\"ok\"); SELECT v FROM t;'",
],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "ok");
}
#[tokio::test]
#[serial]
async fn git_can_init_and_commit() {
let test_app = TestApp::new();
run_ok(
&test_app.app,
"sh",
&[
"-c",
"cd /tmp && mkdir -p testrepo && cd testrepo && git init && git config user.email 'test@test.com' && git config user.name 'Test' && touch file && git add file && git commit -m 'init'",
],
)
.await;
}
#[tokio::test]
#[serial]
async fn chromium_headless_can_dump_dom() {
let test_app = TestApp::new();
// Use headless mode to dump the DOM of a blank page
let result = run_ok_with_timeout(
&test_app.app,
"chromium",
&[
"--headless",
"--no-sandbox",
"--disable-gpu",
"--dump-dom",
"data:text/html,<h1>hello</h1>",
],
30_000,
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("hello"),
"expected hello in DOM dump, got: {stdout}"
);
}


@ -0,0 +1,593 @@
use std::collections::{BTreeMap, BTreeSet};
use std::fs;
use std::io::{Read, Write};
use std::net::TcpStream;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
use sandbox_agent::router::AuthConfig;
use serial_test::serial;
use tempfile::TempDir;
const CONTAINER_PORT: u16 = 3000;
const DEFAULT_PATH: &str = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";
const DEFAULT_IMAGE_TAG: &str = "sandbox-agent-test:dev";
const STANDARD_PATHS: &[&str] = &[
"/usr/local/sbin",
"/usr/local/bin",
"/usr/sbin",
"/usr/bin",
"/sbin",
"/bin",
];
static IMAGE_TAG: OnceLock<String> = OnceLock::new();
static DOCKER_BIN: OnceLock<PathBuf> = OnceLock::new();
static CONTAINER_COUNTER: AtomicU64 = AtomicU64::new(0);
#[derive(Clone)]
pub struct DockerApp {
base_url: String,
}
impl DockerApp {
pub fn http_url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
pub fn ws_url(&self, path: &str) -> String {
let suffix = self
.base_url
.strip_prefix("http://")
.unwrap_or(&self.base_url);
format!("ws://{suffix}{path}")
}
}
pub struct TestApp {
pub app: DockerApp,
install_dir: PathBuf,
_root: TempDir,
container_id: String,
}
#[derive(Default)]
pub struct TestAppOptions {
pub env: BTreeMap<String, String>,
pub extra_paths: Vec<PathBuf>,
pub replace_path: bool,
}
impl TestApp {
pub fn new(auth: AuthConfig) -> Self {
Self::with_setup(auth, |_| {})
}
pub fn with_setup<F>(auth: AuthConfig, setup: F) -> Self
where
F: FnOnce(&Path),
{
Self::with_options(auth, TestAppOptions::default(), setup)
}
pub fn with_options<F>(auth: AuthConfig, options: TestAppOptions, setup: F) -> Self
where
F: FnOnce(&Path),
{
let root = tempfile::tempdir().expect("create docker test root");
let layout = TestLayout::new(root.path());
layout.create();
setup(&layout.install_dir);
let container_id = unique_container_id();
let image = ensure_test_image();
let env = build_env(&layout, &auth, &options);
let mounts = build_mounts(root.path(), &env);
let base_url = run_container(&container_id, &image, &mounts, &env, &auth);
Self {
app: DockerApp { base_url },
install_dir: layout.install_dir,
_root: root,
container_id,
}
}
pub fn install_path(&self) -> &Path {
&self.install_dir
}
pub fn root_path(&self) -> &Path {
self._root.path()
}
}
impl Drop for TestApp {
fn drop(&mut self) {
let _ = Command::new(docker_bin())
.args(["rm", "-f", &self.container_id])
.output();
}
}
pub struct LiveServer {
base_url: String,
}
impl LiveServer {
pub async fn spawn(app: DockerApp) -> Self {
Self {
base_url: app.base_url,
}
}
pub fn http_url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
pub fn ws_url(&self, path: &str) -> String {
let suffix = self
.base_url
.strip_prefix("http://")
.unwrap_or(&self.base_url);
format!("ws://{suffix}{path}")
}
pub async fn shutdown(self) {}
}
struct TestLayout {
home: PathBuf,
xdg_data_home: PathBuf,
xdg_state_home: PathBuf,
appdata: PathBuf,
local_appdata: PathBuf,
install_dir: PathBuf,
}
impl TestLayout {
fn new(root: &Path) -> Self {
let home = root.join("home");
let xdg_data_home = root.join("xdg-data");
let xdg_state_home = root.join("xdg-state");
let appdata = root.join("appdata").join("Roaming");
let local_appdata = root.join("appdata").join("Local");
let install_dir = xdg_data_home.join("sandbox-agent").join("bin");
Self {
home,
xdg_data_home,
xdg_state_home,
appdata,
local_appdata,
install_dir,
}
}
fn create(&self) {
for dir in [
&self.home,
&self.xdg_data_home,
&self.xdg_state_home,
&self.appdata,
&self.local_appdata,
&self.install_dir,
] {
fs::create_dir_all(dir).expect("create docker test dir");
}
}
}
fn ensure_test_image() -> String {
IMAGE_TAG
.get_or_init(|| {
let repo_root = repo_root();
let image_tag = std::env::var("SANDBOX_AGENT_TEST_IMAGE")
.unwrap_or_else(|_| DEFAULT_IMAGE_TAG.to_string());
let output = Command::new(docker_bin())
.args(["build", "--tag", &image_tag, "--file"])
.arg(
repo_root
.join("docker")
.join("test-agent")
.join("Dockerfile"),
)
.arg(&repo_root)
.output()
.expect("build sandbox-agent test image");
if !output.status.success() {
panic!(
"failed to build sandbox-agent test image: {}",
String::from_utf8_lossy(&output.stderr)
);
}
image_tag
})
.clone()
}
fn build_env(
layout: &TestLayout,
auth: &AuthConfig,
options: &TestAppOptions,
) -> BTreeMap<String, String> {
let mut env = BTreeMap::new();
env.insert(
"HOME".to_string(),
layout.home.to_string_lossy().to_string(),
);
env.insert(
"USERPROFILE".to_string(),
layout.home.to_string_lossy().to_string(),
);
env.insert(
"XDG_DATA_HOME".to_string(),
layout.xdg_data_home.to_string_lossy().to_string(),
);
env.insert(
"XDG_STATE_HOME".to_string(),
layout.xdg_state_home.to_string_lossy().to_string(),
);
env.insert(
"APPDATA".to_string(),
layout.appdata.to_string_lossy().to_string(),
);
env.insert(
"LOCALAPPDATA".to_string(),
layout.local_appdata.to_string_lossy().to_string(),
);
for (key, value) in std::env::vars() {
if key == "PATH" {
continue;
}
if key == "XDG_STATE_HOME" || key == "HOME" || key == "USERPROFILE" {
continue;
}
if key.starts_with("SANDBOX_AGENT_") || key.starts_with("OPENCODE_COMPAT_") {
env.insert(key.clone(), rewrite_localhost_url(&key, &value));
}
}
if let Some(token) = auth.token.as_ref() {
env.insert("SANDBOX_AGENT_TEST_AUTH_TOKEN".to_string(), token.clone());
}
if options.replace_path {
env.insert(
"PATH".to_string(),
options.env.get("PATH").cloned().unwrap_or_default(),
);
} else {
let mut custom_path_entries =
custom_path_entries(layout.install_dir.parent().expect("install base"));
custom_path_entries.extend(explicit_path_entries());
custom_path_entries.extend(
options
.extra_paths
.iter()
.filter(|path| path.is_absolute() && path.exists())
.cloned(),
);
custom_path_entries.sort();
custom_path_entries.dedup();
if custom_path_entries.is_empty() {
env.insert("PATH".to_string(), DEFAULT_PATH.to_string());
} else {
let joined = custom_path_entries
.iter()
.map(|path| path.to_string_lossy().to_string())
.collect::<Vec<_>>()
.join(":");
env.insert("PATH".to_string(), format!("{joined}:{DEFAULT_PATH}"));
}
}
for (key, value) in &options.env {
if key == "PATH" {
continue;
}
env.insert(key.clone(), rewrite_localhost_url(key, value));
}
env
}
fn build_mounts(root: &Path, env: &BTreeMap<String, String>) -> Vec<PathBuf> {
let mut mounts = BTreeSet::new();
mounts.insert(root.to_path_buf());
for key in [
"HOME",
"USERPROFILE",
"XDG_DATA_HOME",
"XDG_STATE_HOME",
"APPDATA",
"LOCALAPPDATA",
"SANDBOX_AGENT_DESKTOP_FAKE_STATE_DIR",
] {
if let Some(value) = env.get(key) {
let path = PathBuf::from(value);
if path.is_absolute() {
mounts.insert(path);
}
}
}
if let Some(path_value) = env.get("PATH") {
for entry in path_value.split(':') {
if entry.is_empty() || STANDARD_PATHS.contains(&entry) {
continue;
}
let path = PathBuf::from(entry);
if path.is_absolute() && path.exists() {
mounts.insert(path);
}
}
}
mounts.into_iter().collect()
}
fn run_container(
container_id: &str,
image: &str,
mounts: &[PathBuf],
env: &BTreeMap<String, String>,
auth: &AuthConfig,
) -> String {
let mut args = vec![
"run".to_string(),
"-d".to_string(),
"--rm".to_string(),
"--name".to_string(),
container_id.to_string(),
"-p".to_string(),
format!("127.0.0.1::{CONTAINER_PORT}"),
];
#[cfg(unix)]
{
args.push("--user".to_string());
args.push(format!("{}:{}", unsafe { libc::geteuid() }, unsafe {
libc::getegid()
}));
}
if cfg!(target_os = "linux") {
args.push("--add-host".to_string());
args.push("host.docker.internal:host-gateway".to_string());
}
for mount in mounts {
args.push("-v".to_string());
args.push(format!("{}:{}", mount.display(), mount.display()));
}
for (key, value) in env {
args.push("-e".to_string());
args.push(format!("{key}={value}"));
}
args.push(image.to_string());
args.push("server".to_string());
args.push("--host".to_string());
args.push("0.0.0.0".to_string());
args.push("--port".to_string());
args.push(CONTAINER_PORT.to_string());
match auth.token.as_ref() {
Some(token) => {
args.push("--token".to_string());
args.push(token.clone());
}
None => args.push("--no-token".to_string()),
}
let output = Command::new(docker_bin())
.args(&args)
.output()
.expect("start docker test container");
if !output.status.success() {
panic!(
"failed to start docker test container: {}",
String::from_utf8_lossy(&output.stderr)
);
}
let port_output = Command::new(docker_bin())
.args(["port", container_id, &format!("{CONTAINER_PORT}/tcp")])
.output()
.expect("resolve mapped docker port");
if !port_output.status.success() {
panic!(
"failed to resolve docker test port: {}",
String::from_utf8_lossy(&port_output.stderr)
);
}
let mapping = String::from_utf8(port_output.stdout)
.expect("docker port utf8")
.trim()
.to_string();
let host_port = mapping.rsplit(':').next().expect("mapped host port").trim();
let base_url = format!("http://127.0.0.1:{host_port}");
wait_for_health(&base_url, auth.token.as_deref());
base_url
}
fn wait_for_health(base_url: &str, token: Option<&str>) {
let started = SystemTime::now();
loop {
if probe_health(base_url, token) {
return;
}
if started
.elapsed()
.unwrap_or_else(|_| Duration::from_secs(0))
.gt(&Duration::from_secs(30))
{
panic!("timed out waiting for sandbox-agent docker test server");
}
thread::sleep(Duration::from_millis(200));
}
}
fn probe_health(base_url: &str, token: Option<&str>) -> bool {
let address = base_url.strip_prefix("http://").unwrap_or(base_url);
let mut stream = match TcpStream::connect(address) {
Ok(stream) => stream,
Err(_) => return false,
};
let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
let _ = stream.set_write_timeout(Some(Duration::from_secs(2)));
let mut request =
format!("GET /v1/health HTTP/1.1\r\nHost: {address}\r\nConnection: close\r\n");
if let Some(token) = token {
request.push_str(&format!("Authorization: Bearer {token}\r\n"));
}
request.push_str("\r\n");
if stream.write_all(request.as_bytes()).is_err() {
return false;
}
let mut response = String::new();
if stream.read_to_string(&mut response).is_err() {
return false;
}
response.starts_with("HTTP/1.1 200") || response.starts_with("HTTP/1.0 200")
}
fn custom_path_entries(root: &Path) -> Vec<PathBuf> {
let mut entries = Vec::new();
if let Some(value) = std::env::var_os("PATH") {
for entry in std::env::split_paths(&value) {
if !entry.exists() {
continue;
}
if entry.starts_with(root) || entry.starts_with(std::env::temp_dir()) {
entries.push(entry);
}
}
}
entries.sort();
entries.dedup();
entries
}
fn explicit_path_entries() -> Vec<PathBuf> {
let mut entries = Vec::new();
if let Some(value) = std::env::var_os("SANDBOX_AGENT_TEST_EXTRA_PATHS") {
for entry in std::env::split_paths(&value) {
if entry.is_absolute() && entry.exists() {
entries.push(entry);
}
}
}
entries
}
fn rewrite_localhost_url(key: &str, value: &str) -> String {
if key.ends_with("_URL") || key.ends_with("_URI") {
return value
.replace("http://127.0.0.1", "http://host.docker.internal")
.replace("http://localhost", "http://host.docker.internal");
}
value.to_string()
}
fn unique_container_id() -> String {
let millis = SystemTime::now()
.duration_since(UNIX_EPOCH)
.map(|value| value.as_millis())
.unwrap_or(0);
let counter = CONTAINER_COUNTER.fetch_add(1, Ordering::Relaxed);
format!(
"sandbox-agent-test-{}-{millis}-{counter}",
std::process::id()
)
}
fn repo_root() -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("../../..")
.canonicalize()
.expect("repo root")
}
fn docker_bin() -> &'static Path {
DOCKER_BIN
.get_or_init(|| {
if let Some(value) = std::env::var_os("SANDBOX_AGENT_TEST_DOCKER_BIN") {
let path = PathBuf::from(value);
if path.exists() {
return path;
}
}
for candidate in [
"/usr/local/bin/docker",
"/opt/homebrew/bin/docker",
"/usr/bin/docker",
] {
let path = PathBuf::from(candidate);
if path.exists() {
return path;
}
}
PathBuf::from("docker")
})
.as_path()
}
#[cfg(test)]
mod tests {
use super::*;
struct EnvVarGuard {
key: &'static str,
old: Option<std::ffi::OsString>,
}
impl EnvVarGuard {
fn set(key: &'static str, value: &Path) -> Self {
let old = std::env::var_os(key);
std::env::set_var(key, value);
Self { key, old }
}
}
impl Drop for EnvVarGuard {
fn drop(&mut self) {
match self.old.as_ref() {
Some(value) => std::env::set_var(self.key, value),
None => std::env::remove_var(self.key),
}
}
}
#[test]
#[serial]
fn build_env_keeps_test_local_xdg_state_home() {
let root = tempfile::tempdir().expect("create docker support tempdir");
let host_state = tempfile::tempdir().expect("create host xdg state tempdir");
let _guard = EnvVarGuard::set("XDG_STATE_HOME", host_state.path());
let layout = TestLayout::new(root.path());
layout.create();
let env = build_env(&layout, &AuthConfig::disabled(), &TestAppOptions::default());
assert_eq!(
env.get("XDG_STATE_HOME"),
Some(&layout.xdg_state_home.to_string_lossy().to_string())
);
}
}


@@ -0,0 +1,332 @@
/// Docker support for common-software integration tests.
///
/// Builds the `docker/test-common-software/Dockerfile` image (which extends the
/// base test-agent image with pre-installed common software) and provides a
/// `TestApp` that runs a container from it.
///
/// KEEP IN SYNC with docs/common-software.mdx and docker/test-common-software/Dockerfile.
use std::collections::BTreeMap;
use std::io::{Read, Write};
use std::net::TcpStream;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
use tempfile::TempDir;
const CONTAINER_PORT: u16 = 3000;
const DEFAULT_PATH: &str = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";
const BASE_IMAGE_TAG: &str = "sandbox-agent-test:dev";
const COMMON_SOFTWARE_IMAGE_TAG: &str = "sandbox-agent-test-common-software:dev";
static IMAGE_TAG: OnceLock<String> = OnceLock::new();
static DOCKER_BIN: OnceLock<PathBuf> = OnceLock::new();
static CONTAINER_COUNTER: AtomicU64 = AtomicU64::new(0);
#[derive(Clone)]
pub struct DockerApp {
base_url: String,
}
impl DockerApp {
pub fn http_url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
}
pub struct TestApp {
pub app: DockerApp,
_root: TempDir,
container_id: String,
}
impl TestApp {
pub fn new() -> Self {
let root = tempfile::tempdir().expect("create docker test root");
let layout = TestLayout::new(root.path());
layout.create();
let container_id = unique_container_id();
let image = ensure_common_software_image();
let env = build_env(&layout);
let mounts = build_mounts(root.path());
let base_url = run_container(&container_id, &image, &mounts, &env);
Self {
app: DockerApp { base_url },
_root: root,
container_id,
}
}
}
impl Drop for TestApp {
fn drop(&mut self) {
let _ = Command::new(docker_bin())
.args(["rm", "-f", &self.container_id])
.output();
}
}
struct TestLayout {
home: PathBuf,
xdg_data_home: PathBuf,
xdg_state_home: PathBuf,
}
impl TestLayout {
fn new(root: &Path) -> Self {
Self {
home: root.join("home"),
xdg_data_home: root.join("xdg-data"),
xdg_state_home: root.join("xdg-state"),
}
}
fn create(&self) {
for dir in [&self.home, &self.xdg_data_home, &self.xdg_state_home] {
std::fs::create_dir_all(dir).expect("create docker test dir");
}
}
}
fn ensure_base_image() -> String {
let repo_root = repo_root();
let image_tag =
std::env::var("SANDBOX_AGENT_TEST_IMAGE").unwrap_or_else(|_| BASE_IMAGE_TAG.to_string());
let output = Command::new(docker_bin())
.args(["build", "--tag", &image_tag, "--file"])
.arg(
repo_root
.join("docker")
.join("test-agent")
.join("Dockerfile"),
)
.arg(&repo_root)
.output()
.expect("build base test image");
if !output.status.success() {
panic!(
"failed to build base test image: {}",
String::from_utf8_lossy(&output.stderr)
);
}
image_tag
}
fn ensure_common_software_image() -> String {
IMAGE_TAG
.get_or_init(|| {
let base_image = ensure_base_image();
let repo_root = repo_root();
let image_tag = std::env::var("SANDBOX_AGENT_TEST_COMMON_SOFTWARE_IMAGE")
.unwrap_or_else(|_| COMMON_SOFTWARE_IMAGE_TAG.to_string());
let output = Command::new(docker_bin())
.args([
"build",
"--tag",
&image_tag,
"--build-arg",
&format!("BASE_IMAGE={base_image}"),
"--file",
])
.arg(
repo_root
.join("docker")
.join("test-common-software")
.join("Dockerfile"),
)
.arg(&repo_root)
.output()
.expect("build common-software test image");
if !output.status.success() {
panic!(
"failed to build common-software test image: {}",
String::from_utf8_lossy(&output.stderr)
);
}
image_tag
})
.clone()
}
fn build_env(layout: &TestLayout) -> BTreeMap<String, String> {
let mut env = BTreeMap::new();
env.insert(
"HOME".to_string(),
layout.home.to_string_lossy().to_string(),
);
env.insert(
"XDG_DATA_HOME".to_string(),
layout.xdg_data_home.to_string_lossy().to_string(),
);
env.insert(
"XDG_STATE_HOME".to_string(),
layout.xdg_state_home.to_string_lossy().to_string(),
);
env.insert("PATH".to_string(), DEFAULT_PATH.to_string());
env
}
fn build_mounts(root: &Path) -> Vec<PathBuf> {
vec![root.to_path_buf()]
}
fn run_container(
container_id: &str,
image: &str,
mounts: &[PathBuf],
env: &BTreeMap<String, String>,
) -> String {
let mut args = vec![
"run".to_string(),
"-d".to_string(),
"--rm".to_string(),
"--name".to_string(),
container_id.to_string(),
"-p".to_string(),
format!("127.0.0.1::{CONTAINER_PORT}"),
];
if cfg!(target_os = "linux") {
args.push("--add-host".to_string());
args.push("host.docker.internal:host-gateway".to_string());
}
for mount in mounts {
args.push("-v".to_string());
args.push(format!("{}:{}", mount.display(), mount.display()));
}
for (key, value) in env {
args.push("-e".to_string());
args.push(format!("{key}={value}"));
}
args.push(image.to_string());
args.push("server".to_string());
args.push("--host".to_string());
args.push("0.0.0.0".to_string());
args.push("--port".to_string());
args.push(CONTAINER_PORT.to_string());
args.push("--no-token".to_string());
let output = Command::new(docker_bin())
.args(&args)
.output()
.expect("start docker test container");
if !output.status.success() {
panic!(
"failed to start docker test container: {}",
String::from_utf8_lossy(&output.stderr)
);
}
let port_output = Command::new(docker_bin())
.args(["port", container_id, &format!("{CONTAINER_PORT}/tcp")])
.output()
.expect("resolve mapped docker port");
if !port_output.status.success() {
panic!(
"failed to resolve docker test port: {}",
String::from_utf8_lossy(&port_output.stderr)
);
}
let mapping = String::from_utf8(port_output.stdout)
.expect("docker port utf8")
.trim()
.to_string();
let host_port = mapping.rsplit(':').next().expect("mapped host port").trim();
let base_url = format!("http://127.0.0.1:{host_port}");
wait_for_health(&base_url);
base_url
}
fn wait_for_health(base_url: &str) {
let started = SystemTime::now();
loop {
if probe_health(base_url) {
return;
}
if started
.elapsed()
.unwrap_or_else(|_| Duration::from_secs(0))
.gt(&Duration::from_secs(60))
{
panic!("timed out waiting for common-software docker test server");
}
thread::sleep(Duration::from_millis(200));
}
}
fn probe_health(base_url: &str) -> bool {
let address = base_url.strip_prefix("http://").unwrap_or(base_url);
let mut stream = match TcpStream::connect(address) {
Ok(stream) => stream,
Err(_) => return false,
};
let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
let _ = stream.set_write_timeout(Some(Duration::from_secs(2)));
let request =
format!("GET /v1/health HTTP/1.1\r\nHost: {address}\r\nConnection: close\r\n\r\n");
if stream.write_all(request.as_bytes()).is_err() {
return false;
}
let mut response = String::new();
if stream.read_to_string(&mut response).is_err() {
return false;
}
response.starts_with("HTTP/1.1 200") || response.starts_with("HTTP/1.0 200")
}
fn unique_container_id() -> String {
let millis = SystemTime::now()
.duration_since(UNIX_EPOCH)
.map(|value| value.as_millis())
.unwrap_or(0);
let counter = CONTAINER_COUNTER.fetch_add(1, Ordering::Relaxed);
format!(
"sandbox-agent-common-sw-{}-{millis}-{counter}",
std::process::id()
)
}
fn repo_root() -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("../../..")
.canonicalize()
.expect("repo root")
}
fn docker_bin() -> &'static Path {
DOCKER_BIN
.get_or_init(|| {
if let Some(value) = std::env::var_os("SANDBOX_AGENT_TEST_DOCKER_BIN") {
let path = PathBuf::from(value);
if path.exists() {
return path;
}
}
for candidate in [
"/usr/local/bin/docker",
"/opt/homebrew/bin/docker",
"/usr/bin/docker",
] {
let path = PathBuf::from(candidate);
if path.exists() {
return path;
}
}
PathBuf::from("docker")
})
.as_path()
}


@@ -1,37 +1,14 @@
use std::fs;
use std::path::Path;
use axum::body::Body;
use axum::http::{Method, Request, StatusCode};
use futures::StreamExt;
use http_body_util::BodyExt;
use sandbox_agent::router::{build_router, AppState, AuthConfig};
use sandbox_agent_agent_management::agents::AgentManager;
use reqwest::{Method, StatusCode};
use sandbox_agent::router::AuthConfig;
use serde_json::{json, Value};
use tempfile::TempDir;
use tower::util::ServiceExt;
struct TestApp {
app: axum::Router,
_install_dir: TempDir,
}
impl TestApp {
fn with_setup<F>(setup: F) -> Self
where
F: FnOnce(&Path),
{
let install_dir = tempfile::tempdir().expect("create temp install dir");
setup(install_dir.path());
let manager = AgentManager::new(install_dir.path()).expect("create agent manager");
let state = AppState::new(AuthConfig::disabled(), manager);
let app = build_router(state);
Self {
app,
_install_dir: install_dir,
}
}
}
#[path = "support/docker.rs"]
mod docker_support;
use docker_support::TestApp;
fn write_executable(path: &Path, script: &str) {
fs::write(path, script).expect("write executable");
@@ -101,28 +78,29 @@ fn setup_stub_agent_process_only(install_dir: &Path, agent: &str) {
}
async fn send_request(
app: &axum::Router,
app: &docker_support::DockerApp,
method: Method,
uri: &str,
body: Option<Value>,
) -> (StatusCode, Vec<u8>) {
let mut builder = Request::builder().method(method).uri(uri);
let request_body = if let Some(body) = body {
builder = builder.header("content-type", "application/json");
Body::from(body.to_string())
let client = reqwest::Client::new();
let response = if let Some(body) = body {
client
.request(method, app.http_url(uri))
.header("content-type", "application/json")
.body(body.to_string())
.send()
.await
.expect("request handled")
} else {
Body::empty()
client
.request(method, app.http_url(uri))
.send()
.await
.expect("request handled")
};
let request = builder.body(request_body).expect("build request");
let response = app.clone().oneshot(request).await.expect("request handled");
let status = response.status();
let bytes = response
.into_body()
.collect()
.await
.expect("collect body")
.to_bytes();
let bytes = response.bytes().await.expect("collect body");
(status, bytes.to_vec())
}
@@ -145,7 +123,7 @@ async fn agent_process_matrix_smoke_and_jsonrpc_conformance() {
.chain(agent_process_only_agents.iter())
.copied()
.collect();
let test_app = TestApp::with_setup(|install_dir| {
let test_app = TestApp::with_setup(AuthConfig::disabled(), |install_dir| {
for agent in native_agents {
setup_stub_artifacts(install_dir, agent);
}
@@ -201,21 +179,15 @@ async fn agent_process_matrix_smoke_and_jsonrpc_conformance() {
assert_eq!(new_json["id"], 2, "{agent}: session/new id");
assert_eq!(new_json["result"]["echoedMethod"], "session/new");
let request = Request::builder()
.method(Method::GET)
.uri(format!("/v1/acp/{agent}-server"))
.body(Body::empty())
.expect("build sse request");
let response = test_app
.app
.clone()
.oneshot(request)
let response = reqwest::Client::new()
.get(test_app.app.http_url(&format!("/v1/acp/{agent}-server")))
.header("accept", "text/event-stream")
.send()
.await
.expect("sse response");
assert_eq!(response.status(), StatusCode::OK);
let mut stream = response.into_body().into_data_stream();
let mut stream = response.bytes_stream();
let chunk = tokio::time::timeout(std::time::Duration::from_secs(5), async move {
while let Some(item) = stream.next().await {
let bytes = item.expect("sse chunk");


@@ -1,128 +1,19 @@
use std::fs;
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::net::{TcpListener, TcpStream};
use std::path::Path;
use std::time::Duration;
use axum::body::Body;
use axum::http::{header, HeaderMap, Method, Request, StatusCode};
use axum::Router;
use futures::StreamExt;
use http_body_util::BodyExt;
use sandbox_agent::router::{build_router, AppState, AuthConfig};
use sandbox_agent_agent_management::agents::AgentManager;
use reqwest::header::{self, HeaderMap, HeaderName, HeaderValue};
use reqwest::{Method, StatusCode};
use sandbox_agent::router::AuthConfig;
use serde_json::{json, Value};
use serial_test::serial;
use tempfile::TempDir;
use tokio::sync::oneshot;
use tokio::task::JoinHandle;
use tower::util::ServiceExt;
struct TestApp {
app: Router,
install_dir: TempDir,
}
impl TestApp {
fn new(auth: AuthConfig) -> Self {
Self::with_setup(auth, |_| {})
}
fn with_setup<F>(auth: AuthConfig, setup: F) -> Self
where
F: FnOnce(&Path),
{
let install_dir = tempfile::tempdir().expect("create temp install dir");
setup(install_dir.path());
let manager = AgentManager::new(install_dir.path()).expect("create agent manager");
let state = AppState::new(auth, manager);
let app = build_router(state);
Self { app, install_dir }
}
fn install_path(&self) -> &Path {
self.install_dir.path()
}
}
struct EnvVarGuard {
key: &'static str,
previous: Option<std::ffi::OsString>,
}
struct LiveServer {
address: SocketAddr,
shutdown_tx: Option<oneshot::Sender<()>>,
task: JoinHandle<()>,
}
impl LiveServer {
async fn spawn(app: Router) -> Self {
let listener = tokio::net::TcpListener::bind("127.0.0.1:0")
.await
.expect("bind live server");
let address = listener.local_addr().expect("live server address");
let (shutdown_tx, shutdown_rx) = oneshot::channel::<()>();
let task = tokio::spawn(async move {
let server =
axum::serve(listener, app.into_make_service()).with_graceful_shutdown(async {
let _ = shutdown_rx.await;
});
let _ = server.await;
});
Self {
address,
shutdown_tx: Some(shutdown_tx),
task,
}
}
fn http_url(&self, path: &str) -> String {
format!("http://{}{}", self.address, path)
}
fn ws_url(&self, path: &str) -> String {
format!("ws://{}{}", self.address, path)
}
async fn shutdown(mut self) {
if let Some(shutdown_tx) = self.shutdown_tx.take() {
let _ = shutdown_tx.send(());
}
let _ = tokio::time::timeout(Duration::from_secs(3), async {
let _ = self.task.await;
})
.await;
}
}
impl EnvVarGuard {
fn set(key: &'static str, value: &str) -> Self {
let previous = std::env::var_os(key);
std::env::set_var(key, value);
Self { key, previous }
}
fn set_os(key: &'static str, value: &std::ffi::OsStr) -> Self {
let previous = std::env::var_os(key);
std::env::set_var(key, value);
Self { key, previous }
}
}
impl Drop for EnvVarGuard {
fn drop(&mut self) {
if let Some(previous) = self.previous.as_ref() {
std::env::set_var(self.key, previous);
} else {
std::env::remove_var(self.key);
}
}
}
#[path = "support/docker.rs"]
mod docker_support;
use docker_support::{LiveServer, TestApp};
fn write_executable(path: &Path, script: &str) {
fs::write(path, script).expect("write executable");
@@ -168,17 +59,18 @@ exit 0
}
fn serve_registry_once(document: Value) -> String {
let listener = TcpListener::bind("127.0.0.1:0").expect("bind registry server");
let address = listener.local_addr().expect("registry address");
let listener = TcpListener::bind("0.0.0.0:0").expect("bind registry server");
let port = listener.local_addr().expect("registry address").port();
let body = document.to_string();
std::thread::spawn(move || {
if let Ok((mut stream, _)) = listener.accept() {
respond_json(&mut stream, &body);
std::thread::spawn(move || loop {
match listener.accept() {
Ok((mut stream, _)) => respond_json(&mut stream, &body),
Err(_) => break,
}
});
format!("http://{address}/registry.json")
format!("http://127.0.0.1:{port}/registry.json")
}
fn respond_json(stream: &mut TcpStream, body: &str) {
@@ -196,74 +88,96 @@ fn respond_json(stream: &mut TcpStream, body: &str) {
}
async fn send_request(
app: &Router,
app: &docker_support::DockerApp,
method: Method,
uri: &str,
body: Option<Value>,
headers: &[(&str, &str)],
) -> (StatusCode, HeaderMap, Vec<u8>) {
let mut builder = Request::builder().method(method).uri(uri);
let client = reqwest::Client::new();
let mut builder = client.request(method, app.http_url(uri));
for (name, value) in headers {
builder = builder.header(*name, *value);
let header_name = HeaderName::from_bytes(name.as_bytes()).expect("header name");
let header_value = HeaderValue::from_str(value).expect("header value");
builder = builder.header(header_name, header_value);
}
let request_body = if let Some(body) = body {
builder = builder.header(header::CONTENT_TYPE, "application/json");
Body::from(body.to_string())
let response = if let Some(body) = body {
builder
.header(header::CONTENT_TYPE, "application/json")
.body(body.to_string())
.send()
.await
.expect("request handled")
} else {
Body::empty()
builder.send().await.expect("request handled")
};
let request = builder.body(request_body).expect("build request");
let response = app.clone().oneshot(request).await.expect("request handled");
let status = response.status();
let headers = response.headers().clone();
let bytes = response
.into_body()
.collect()
.await
.expect("collect body")
.to_bytes();
let bytes = response.bytes().await.expect("collect body");
(status, headers, bytes.to_vec())
}
async fn send_request_raw(
app: &Router,
app: &docker_support::DockerApp,
method: Method,
uri: &str,
body: Option<Vec<u8>>,
headers: &[(&str, &str)],
content_type: Option<&str>,
) -> (StatusCode, HeaderMap, Vec<u8>) {
let mut builder = Request::builder().method(method).uri(uri);
let client = reqwest::Client::new();
let mut builder = client.request(method, app.http_url(uri));
for (name, value) in headers {
builder = builder.header(*name, *value);
let header_name = HeaderName::from_bytes(name.as_bytes()).expect("header name");
let header_value = HeaderValue::from_str(value).expect("header value");
builder = builder.header(header_name, header_value);
}
let request_body = if let Some(body) = body {
let response = if let Some(body) = body {
if let Some(content_type) = content_type {
builder = builder.header(header::CONTENT_TYPE, content_type);
}
Body::from(body)
builder.body(body).send().await.expect("request handled")
} else {
Body::empty()
builder.send().await.expect("request handled")
};
let request = builder.body(request_body).expect("build request");
let response = app.clone().oneshot(request).await.expect("request handled");
let status = response.status();
let headers = response.headers().clone();
let bytes = response
.into_body()
.collect()
.await
.expect("collect body")
.to_bytes();
let bytes = response.bytes().await.expect("collect body");
(status, headers, bytes.to_vec())
}
async fn launch_desktop_focus_window(app: &docker_support::DockerApp, display: &str) {
let command = r#"nohup xterm -geometry 80x24+40+40 -title 'Sandbox Desktop Test' -e sh -lc 'sleep 60' >/tmp/sandbox-agent-xterm.log 2>&1 < /dev/null & for _ in $(seq 1 50); do wid="$(xdotool search --onlyvisible --name 'Sandbox Desktop Test' 2>/dev/null | head -n 1 || true)"; if [ -n "$wid" ]; then xdotool windowactivate "$wid"; exit 0; fi; sleep 0.1; done; exit 1"#;
let (status, _, body) = send_request(
app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "sh",
"args": ["-lc", command],
"env": {
"DISPLAY": display,
},
"timeoutMs": 10_000
})),
&[],
)
.await;
assert_eq!(
status,
StatusCode::OK,
"unexpected desktop focus window launch response: {}",
String::from_utf8_lossy(&body)
);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
}
fn parse_json(bytes: &[u8]) -> Value {
if bytes.is_empty() {
Value::Null
@@ -284,7 +198,7 @@ fn initialize_payload() -> Value {
})
}
async fn bootstrap_server(app: &Router, server_id: &str, agent: &str) {
async fn bootstrap_server(app: &docker_support::DockerApp, server_id: &str, agent: &str) {
let initialize = initialize_payload();
let (status, _, _body) = send_request(
app,
@@ -297,17 +211,17 @@ async fn bootstrap_server(app: &Router, server_id: &str, agent: &str) {
assert_eq!(status, StatusCode::OK);
}
async fn read_first_sse_data(app: &Router, server_id: &str) -> String {
let request = Request::builder()
.method(Method::GET)
.uri(format!("/v1/acp/{server_id}"))
.body(Body::empty())
.expect("build request");
let response = app.clone().oneshot(request).await.expect("sse response");
async fn read_first_sse_data(app: &docker_support::DockerApp, server_id: &str) -> String {
let client = reqwest::Client::new();
let response = client
.get(app.http_url(&format!("/v1/acp/{server_id}")))
.header("accept", "text/event-stream")
.send()
.await
.expect("sse response");
assert_eq!(response.status(), StatusCode::OK);
let mut stream = response.into_body().into_data_stream();
let mut stream = response.bytes_stream();
tokio::time::timeout(Duration::from_secs(5), async move {
while let Some(chunk) = stream.next().await {
let bytes = chunk.expect("stream chunk");
@@ -323,21 +237,21 @@ async fn read_first_sse_data(app: &Router, server_id: &str) -> String {
}
async fn read_first_sse_data_with_last_id(
app: &Router,
app: &docker_support::DockerApp,
server_id: &str,
last_event_id: u64,
) -> String {
let request = Request::builder()
.method(Method::GET)
.uri(format!("/v1/acp/{server_id}"))
let client = reqwest::Client::new();
let response = client
.get(app.http_url(&format!("/v1/acp/{server_id}")))
.header("accept", "text/event-stream")
.header("last-event-id", last_event_id.to_string())
.body(Body::empty())
.expect("build request");
let response = app.clone().oneshot(request).await.expect("sse response");
.send()
.await
.expect("sse response");
assert_eq!(response.status(), StatusCode::OK);
let mut stream = response.into_body().into_data_stream();
let mut stream = response.bytes_stream();
tokio::time::timeout(Duration::from_secs(5), async move {
while let Some(chunk) = stream.next().await {
let bytes = chunk.expect("stream chunk");
@@ -375,5 +289,7 @@ mod acp_transport;
mod config_endpoints;
#[path = "v1_api/control_plane.rs"]
mod control_plane;
#[path = "v1_api/desktop.rs"]
mod desktop;
#[path = "v1_api/processes.rs"]
mod processes;


@@ -22,8 +22,9 @@ async fn mcp_config_requires_directory_and_name() {
#[tokio::test]
async fn mcp_config_crud_round_trip() {
let test_app = TestApp::new(AuthConfig::disabled());
let project = tempfile::tempdir().expect("tempdir");
let directory = project.path().to_string_lossy().to_string();
let project = test_app.root_path().join("mcp-config-project");
fs::create_dir_all(&project).expect("create project dir");
let directory = project.to_string_lossy().to_string();
let entry = json!({
"type": "local",
@@ -99,8 +100,9 @@ async fn skills_config_requires_directory_and_name() {
#[tokio::test]
async fn skills_config_crud_round_trip() {
let test_app = TestApp::new(AuthConfig::disabled());
let project = tempfile::tempdir().expect("tempdir");
let directory = project.path().to_string_lossy().to_string();
let project = test_app.root_path().join("skills-config-project");
fs::create_dir_all(&project).expect("create project dir");
let directory = project.to_string_lossy().to_string();
let entry = json!({
"sources": [


@@ -1,4 +1,5 @@
use super::*;
use std::collections::BTreeMap;
#[tokio::test]
async fn v1_health_removed_legacy_and_opencode_unmounted() {
@@ -137,10 +138,19 @@ async fn v1_filesystem_endpoints_round_trip() {
#[tokio::test]
#[serial]
async fn require_preinstall_blocks_missing_agent() {
let test_app = {
let _preinstall = EnvVarGuard::set("SANDBOX_AGENT_REQUIRE_PREINSTALL", "true");
TestApp::new(AuthConfig::disabled())
};
let mut env = BTreeMap::new();
env.insert(
"SANDBOX_AGENT_REQUIRE_PREINSTALL".to_string(),
"true".to_string(),
);
let test_app = TestApp::with_options(
AuthConfig::disabled(),
docker_support::TestAppOptions {
env,
..Default::default()
},
|_| {},
);
let (status, _, body) = send_request(
&test_app.app,
@@ -176,20 +186,26 @@ async fn lazy_install_runs_on_first_bootstrap() {
]
}));
let _registry = EnvVarGuard::set("SANDBOX_AGENT_ACP_REGISTRY_URL", &registry_url);
let test_app = TestApp::with_setup(AuthConfig::disabled(), |install_path| {
fs::create_dir_all(install_path.join("agent_processes"))
.expect("create agent processes dir");
write_executable(&install_path.join("codex"), "#!/usr/bin/env sh\nexit 0\n");
fs::create_dir_all(install_path.join("bin")).expect("create bin dir");
write_fake_npm(&install_path.join("bin").join("npm"));
});
let helper_bin_root = tempfile::tempdir().expect("helper bin tempdir");
let helper_bin = helper_bin_root.path().join("bin");
fs::create_dir_all(&helper_bin).expect("create helper bin dir");
write_fake_npm(&helper_bin.join("npm"));
let original_path = std::env::var_os("PATH").unwrap_or_default();
let mut paths = vec![test_app.install_path().join("bin")];
paths.extend(std::env::split_paths(&original_path));
let merged_path = std::env::join_paths(paths).expect("join PATH");
let _path_guard = EnvVarGuard::set_os("PATH", merged_path.as_os_str());
let mut env = BTreeMap::new();
env.insert("SANDBOX_AGENT_ACP_REGISTRY_URL".to_string(), registry_url);
let test_app = TestApp::with_options(
AuthConfig::disabled(),
docker_support::TestAppOptions {
env,
extra_paths: vec![helper_bin.clone()],
..Default::default()
},
|install_path| {
fs::create_dir_all(install_path.join("agent_processes"))
.expect("create agent processes dir");
write_executable(&install_path.join("codex"), "#!/usr/bin/env sh\nexit 0\n");
},
);
let (status, _, _) = send_request(
&test_app.app,


@@ -0,0 +1,494 @@
use super::*;
use futures::{SinkExt, StreamExt};
use serial_test::serial;
use std::collections::BTreeMap;
use tokio_tungstenite::connect_async;
use tokio_tungstenite::tungstenite::Message;
fn png_dimensions(bytes: &[u8]) -> (u32, u32) {
assert!(bytes.starts_with(b"\x89PNG\r\n\x1a\n"));
let width = u32::from_be_bytes(bytes[16..20].try_into().expect("png width bytes"));
let height = u32::from_be_bytes(bytes[20..24].try_into().expect("png height bytes"));
(width, height)
}
async fn recv_ws_message(
ws: &mut tokio_tungstenite::WebSocketStream<
tokio_tungstenite::MaybeTlsStream<tokio::net::TcpStream>,
>,
) -> Message {
tokio::time::timeout(Duration::from_secs(5), ws.next())
.await
.expect("timed out waiting for websocket frame")
.expect("websocket stream ended")
.expect("websocket frame")
}
#[tokio::test]
#[serial]
async fn v1_desktop_status_reports_install_required_when_dependencies_are_missing() {
let temp = tempfile::tempdir().expect("create empty path tempdir");
let mut env = BTreeMap::new();
env.insert(
"PATH".to_string(),
temp.path().to_string_lossy().to_string(),
);
let test_app = TestApp::with_options(
AuthConfig::disabled(),
docker_support::TestAppOptions {
env,
replace_path: true,
..Default::default()
},
|_| {},
);
let (status, _, body) =
send_request(&test_app.app, Method::GET, "/v1/desktop/status", None, &[]).await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["state"], "install_required");
assert!(parsed["missingDependencies"]
.as_array()
.expect("missingDependencies array")
.iter()
.any(|value| value == "Xvfb"));
assert_eq!(
parsed["installCommand"],
"sandbox-agent install desktop --yes"
);
}
#[tokio::test]
#[serial]
async fn v1_desktop_lifecycle_and_actions_work_with_real_runtime() {
let test_app = TestApp::new(AuthConfig::disabled());
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/start",
Some(json!({
"width": 1440,
"height": 900,
"dpi": 96
})),
&[],
)
.await;
assert_eq!(
status,
StatusCode::OK,
"unexpected start response: {}",
String::from_utf8_lossy(&body)
);
let parsed = parse_json(&body);
assert_eq!(parsed["state"], "active");
let display = parsed["display"]
.as_str()
.expect("desktop display")
.to_string();
assert!(display.starts_with(':'));
assert_eq!(parsed["resolution"]["width"], 1440);
assert_eq!(parsed["resolution"]["height"], 900);
let (status, headers, body) = send_request_raw(
&test_app.app,
Method::GET,
"/v1/desktop/screenshot",
None,
&[],
None,
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(
headers
.get(header::CONTENT_TYPE)
.and_then(|value| value.to_str().ok()),
Some("image/png")
);
assert!(body.starts_with(b"\x89PNG\r\n\x1a\n"));
assert_eq!(png_dimensions(&body), (1440, 900));
let (status, headers, body) = send_request_raw(
&test_app.app,
Method::GET,
"/v1/desktop/screenshot?format=jpeg&quality=50",
None,
&[],
None,
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(
headers
.get(header::CONTENT_TYPE)
.and_then(|value| value.to_str().ok()),
Some("image/jpeg")
);
assert!(body.starts_with(&[0xff, 0xd8, 0xff]));
let (status, headers, body) = send_request_raw(
&test_app.app,
Method::GET,
"/v1/desktop/screenshot?scale=0.5",
None,
&[],
None,
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(
headers
.get(header::CONTENT_TYPE)
.and_then(|value| value.to_str().ok()),
Some("image/png")
);
assert_eq!(png_dimensions(&body), (720, 450));
let (status, _, body) = send_request_raw(
&test_app.app,
Method::GET,
"/v1/desktop/screenshot/region?x=10&y=20&width=30&height=40",
None,
&[],
None,
)
.await;
assert_eq!(status, StatusCode::OK);
assert!(body.starts_with(b"\x89PNG\r\n\x1a\n"));
let (status, _, body) = send_request(
&test_app.app,
Method::GET,
"/v1/desktop/display/info",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let display_info = parse_json(&body);
assert_eq!(display_info["display"], display);
assert_eq!(display_info["resolution"]["width"], 1440);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/move",
Some(json!({ "x": 400, "y": 300 })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let mouse = parse_json(&body);
assert_eq!(mouse["x"], 400);
assert_eq!(mouse["y"], 300);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/drag",
Some(json!({
"startX": 100,
"startY": 110,
"endX": 220,
"endY": 230,
"button": "left"
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let dragged = parse_json(&body);
assert_eq!(dragged["x"], 220);
assert_eq!(dragged["y"], 230);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/click",
Some(json!({
"x": 220,
"y": 230,
"button": "left",
"clickCount": 1
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let clicked = parse_json(&body);
assert_eq!(clicked["x"], 220);
assert_eq!(clicked["y"], 230);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/down",
Some(json!({
"x": 220,
"y": 230,
"button": "left"
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let mouse_down = parse_json(&body);
assert_eq!(mouse_down["x"], 220);
assert_eq!(mouse_down["y"], 230);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/move",
Some(json!({ "x": 260, "y": 280 })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let moved_while_down = parse_json(&body);
assert_eq!(moved_while_down["x"], 260);
assert_eq!(moved_while_down["y"], 280);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/up",
Some(json!({ "button": "left" })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let mouse_up = parse_json(&body);
assert_eq!(mouse_up["x"], 260);
assert_eq!(mouse_up["y"], 280);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/mouse/scroll",
Some(json!({
"x": 220,
"y": 230,
"deltaY": -3
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let scrolled = parse_json(&body);
assert_eq!(scrolled["x"], 220);
assert_eq!(scrolled["y"], 230);
let (status, _, body) =
send_request(&test_app.app, Method::GET, "/v1/desktop/windows", None, &[]).await;
assert_eq!(status, StatusCode::OK);
assert!(parse_json(&body)["windows"].is_array());
let (status, _, body) = send_request(
&test_app.app,
Method::GET,
"/v1/desktop/mouse/position",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let position = parse_json(&body);
assert_eq!(position["x"], 220);
assert_eq!(position["y"], 230);
launch_desktop_focus_window(&test_app.app, &display).await;
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/keyboard/type",
Some(json!({ "text": "hello world", "delayMs": 5 })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["ok"], true);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/keyboard/press",
Some(json!({ "key": "ctrl+l" })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["ok"], true);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/keyboard/press",
Some(json!({
"key": "l",
"modifiers": {
"ctrl": true
}
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["ok"], true);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/keyboard/down",
Some(json!({ "key": "shift" })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["ok"], true);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/keyboard/up",
Some(json!({ "key": "shift" })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["ok"], true);
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/recording/start",
Some(json!({ "fps": 8 })),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let recording = parse_json(&body);
let recording_id = recording["id"].as_str().expect("recording id").to_string();
assert_eq!(recording["status"], "recording");
tokio::time::sleep(Duration::from_secs(2)).await;
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/recording/stop",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let stopped_recording = parse_json(&body);
assert_eq!(stopped_recording["id"], recording_id);
assert_eq!(stopped_recording["status"], "completed");
let (status, _, body) = send_request(
&test_app.app,
Method::GET,
"/v1/desktop/recordings",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert!(parse_json(&body)["recordings"].is_array());
let (status, headers, body) = send_request_raw(
&test_app.app,
Method::GET,
&format!("/v1/desktop/recordings/{recording_id}/download"),
None,
&[],
None,
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(
headers
.get(header::CONTENT_TYPE)
.and_then(|value| value.to_str().ok()),
Some("video/mp4")
);
assert!(body.windows(4).any(|window| window == b"ftyp"));
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/stream/start",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["active"], true);
let (mut ws, _) = connect_async(test_app.app.ws_url("/v1/desktop/stream/ws"))
.await
.expect("connect desktop stream websocket");
let ready = recv_ws_message(&mut ws).await;
match ready {
Message::Text(text) => {
let value: Value = serde_json::from_str(&text).expect("desktop stream ready frame");
assert_eq!(value["type"], "ready");
assert_eq!(value["width"], 1440);
assert_eq!(value["height"], 900);
}
other => panic!("expected text ready frame, got {other:?}"),
}
let frame = recv_ws_message(&mut ws).await;
match frame {
Message::Binary(bytes) => assert!(bytes.starts_with(&[0xff, 0xd8, 0xff])),
other => panic!("expected binary jpeg frame, got {other:?}"),
}
ws.send(Message::Text(
json!({
"type": "moveMouse",
"x": 320,
"y": 330
})
.to_string()
.into(),
))
.await
.expect("send desktop stream mouse move");
let _ = ws.close(None).await;
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/stream/stop",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["active"], false);
let (status, _, _) = send_request(
&test_app.app,
Method::DELETE,
&format!("/v1/desktop/recordings/{recording_id}"),
None,
&[],
)
.await;
assert_eq!(status, StatusCode::NO_CONTENT);
let (status, _, body) =
send_request(&test_app.app, Method::POST, "/v1/desktop/stop", None, &[]).await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["state"], "inactive");
}


@@ -2,6 +2,7 @@ use super::*;
use base64::engine::general_purpose::STANDARD as BASE64;
use base64::Engine;
use futures::{SinkExt, StreamExt};
use serial_test::serial;
use tokio_tungstenite::connect_async;
use tokio_tungstenite::tungstenite::Message;
@@ -277,6 +278,98 @@ async fn v1_process_tty_input_and_logs() {
assert_eq!(status, StatusCode::NO_CONTENT);
}
#[tokio::test]
#[serial]
async fn v1_processes_owner_filter_separates_user_and_desktop_processes() {
let test_app = TestApp::new(AuthConfig::disabled());
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes",
Some(json!({
"command": "sh",
"args": ["-lc", "sleep 30"],
"tty": false,
"interactive": false
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let user_process_id = parse_json(&body)["id"]
.as_str()
.expect("process id")
.to_string();
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/desktop/start",
Some(json!({
"width": 1024,
"height": 768
})),
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["state"], "active");
let (status, _, body) = send_request(
&test_app.app,
Method::GET,
"/v1/processes?owner=user",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let user_processes = parse_json(&body)["processes"]
.as_array()
.cloned()
.unwrap_or_default();
assert!(user_processes
.iter()
.any(|process| process["id"] == user_process_id));
assert!(user_processes
.iter()
.all(|process| process["owner"] == "user"));
let (status, _, body) = send_request(
&test_app.app,
Method::GET,
"/v1/processes?owner=desktop",
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let desktop_processes = parse_json(&body)["processes"]
.as_array()
.cloned()
.unwrap_or_default();
assert!(desktop_processes.len() >= 2);
assert!(desktop_processes
.iter()
.all(|process| process["owner"] == "desktop"));
let (status, _, _) = send_request(
&test_app.app,
Method::POST,
&format!("/v1/processes/{user_process_id}/kill"),
None,
&[],
)
.await;
assert_eq!(status, StatusCode::OK);
let (status, _, body) =
send_request(&test_app.app, Method::POST, "/v1/desktop/stop", None, &[]).await;
assert_eq!(status, StatusCode::OK);
assert_eq!(parse_json(&body)["state"], "inactive");
}
#[tokio::test]
async fn v1_process_not_found_returns_404() {
let test_app = TestApp::new(AuthConfig::disabled());
@@ -413,22 +506,17 @@ async fn v1_process_logs_follow_sse_streams_entries() {
.expect("process id")
.to_string();
-    let request = Request::builder()
-        .method(Method::GET)
-        .uri(format!(
+    let response = reqwest::Client::new()
+        .get(test_app.app.http_url(&format!(
             "/v1/processes/{process_id}/logs?stream=stdout&follow=true"
-        ))
-        .body(Body::empty())
-        .expect("build request");
-    let response = test_app
-        .app
-        .clone()
-        .oneshot(request)
+        )))
+        .header("accept", "text/event-stream")
+        .send()
         .await
         .expect("sse response");
     assert_eq!(response.status(), StatusCode::OK);
-    let mut stream = response.into_body().into_data_stream();
+    let mut stream = response.bytes_stream();
let chunk = tokio::time::timeout(Duration::from_secs(5), async move {
while let Some(chunk) = stream.next().await {
let bytes = chunk.expect("stream chunk");