feat: desktop computer-use APIs with windows, launch/open, and neko streaming

Adds desktop computer-use endpoints (windows, screenshots, mouse/keyboard,
launch/open), enhances neko-based streaming integration, updates inspector
UI with desktop debug tab, and adds common software test infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Nathan Flurry 2026-03-17 02:35:52 -07:00
parent 2d8508d6e2
commit dff7614b11
17 changed files with 4045 additions and 136 deletions

View file

@ -38,6 +38,15 @@
- Docker-backed Rust and TypeScript tests build `docker/test-agent/Dockerfile` directly in-process and cache the image tag only in memory (`OnceLock` in Rust, module-level variable in TypeScript). - Docker-backed Rust and TypeScript tests build `docker/test-agent/Dockerfile` directly in-process and cache the image tag only in memory (`OnceLock` in Rust, module-level variable in TypeScript).
- Do not add cross-process image-build scripts unless there is a concrete need for them. - Do not add cross-process image-build scripts unless there is a concrete need for them.
## Common Software Sync
- These three files must stay in sync:
- `docs/common-software.mdx` (user-facing documentation)
- `docker/test-common-software/Dockerfile` (packages installed in the test image)
- `server/packages/sandbox-agent/tests/common_software.rs` (test assertions)
- When adding or removing software from `docs/common-software.mdx`, also add/remove the corresponding `apt-get install` line in the Dockerfile and add/remove the test in `common_software.rs`.
- Run `cargo test -p sandbox-agent --test common_software` to verify.
## Install Version References ## Install Version References
- Channel policy: - Channel policy:

View file

@ -0,0 +1,37 @@
# Extends the base test-agent image with common software pre-installed.
# Used by the common_software integration test to verify that all documented
# software in docs/common-software.mdx works correctly inside the sandbox.
#
# KEEP IN SYNC with docs/common-software.mdx
ARG BASE_IMAGE=sandbox-agent-test:dev
FROM ${BASE_IMAGE}
USER root
RUN apt-get update -qq && \
apt-get install -y -qq --no-install-recommends \
# Browsers
chromium \
firefox-esr \
# Languages
python3 python3-pip python3-venv \
default-jdk \
ruby-full \
# Databases
sqlite3 \
redis-server \
# Build tools
build-essential cmake pkg-config \
# CLI tools
git jq tmux \
# Media and graphics
imagemagick \
poppler-utils \
# Desktop apps
gimp \
> /dev/null 2>&1 && \
rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/local/bin/sandbox-agent"]
CMD ["server", "--host", "0.0.0.0", "--port", "3000", "--no-token"]

View file

@ -41,6 +41,49 @@ curl -X POST "http://127.0.0.1:2468/v1/desktop/stop"
All fields in the start request are optional. Defaults are 1440x900 at 96 DPI. All fields in the start request are optional. Defaults are 1440x900 at 96 DPI.
### Start request options
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `width` | number | 1440 | Desktop width in pixels |
| `height` | number | 900 | Desktop height in pixels |
| `dpi` | number | 96 | Display DPI |
| `displayNum` | number | 99 | Starting X display number. The runtime probes from this number upward to find an available display. |
| `stateDir` | string | (auto) | Desktop state directory for home, logs, recordings |
| `streamVideoCodec` | string | `"vp8"` | WebRTC video codec (`vp8`, `vp9`, `h264`) |
| `streamAudioCodec` | string | `"opus"` | WebRTC audio codec (`opus`, `g722`) |
| `streamFrameRate` | number | 30 | Streaming frame rate (1-60) |
| `webrtcPortRange` | string | `"59050-59070"` | UDP port range for WebRTC media |
| `recordingFps` | number | 30 | Default recording FPS when not specified in `startDesktopRecording` (1-60) |
The streaming and recording options configure defaults for the desktop session. They take effect when streaming or recording is started later.
<CodeGroup>
```ts TypeScript
const status = await sdk.startDesktop({
width: 1920,
height: 1080,
streamVideoCodec: "h264",
streamFrameRate: 60,
webrtcPortRange: "59100-59120",
recordingFps: 15,
});
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/start" \
-H "Content-Type: application/json" \
-d '{
"width": 1920,
"height": 1080,
"streamVideoCodec": "h264",
"streamFrameRate": 60,
"webrtcPortRange": "59100-59120",
"recordingFps": 15
}'
```
</CodeGroup>
## Status ## Status
<CodeGroup> <CodeGroup>
@ -56,7 +99,7 @@ curl "http://127.0.0.1:2468/v1/desktop/status"
## Screenshots ## Screenshots
Capture the full desktop or a specific region. Capture the full desktop or a specific region. Optionally include the cursor position.
<CodeGroup> <CodeGroup>
```ts TypeScript ```ts TypeScript
@ -70,6 +113,11 @@ const jpeg = await sdk.takeDesktopScreenshot({
scale: 0.5, scale: 0.5,
}); });
// Include cursor overlay
const withCursor = await sdk.takeDesktopScreenshot({
showCursor: true,
});
// Region screenshot // Region screenshot
const region = await sdk.takeDesktopRegionScreenshot({ const region = await sdk.takeDesktopRegionScreenshot({
x: 100, x: 100,
@ -85,11 +133,26 @@ curl "http://127.0.0.1:2468/v1/desktop/screenshot" --output screenshot.png
curl "http://127.0.0.1:2468/v1/desktop/screenshot?format=jpeg&quality=70&scale=0.5" \ curl "http://127.0.0.1:2468/v1/desktop/screenshot?format=jpeg&quality=70&scale=0.5" \
--output screenshot.jpg --output screenshot.jpg
# Include cursor overlay
curl "http://127.0.0.1:2468/v1/desktop/screenshot?show_cursor=true" \
--output with_cursor.png
curl "http://127.0.0.1:2468/v1/desktop/screenshot/region?x=100&y=100&width=400&height=300" \ curl "http://127.0.0.1:2468/v1/desktop/screenshot/region?x=100&y=100&width=400&height=300" \
--output region.png --output region.png
``` ```
</CodeGroup> </CodeGroup>
### Screenshot options
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | `"png"` | Output format: `png`, `jpeg`, or `webp` |
| `quality` | number | 85 | Compression quality (1-100, JPEG/WebP only) |
| `scale` | number | 1.0 | Scale factor (0.1-1.0) |
| `showCursor` | boolean | `false` | Composite a crosshair at the cursor position |
When `showCursor` is enabled, the cursor position is captured at the moment of the screenshot and a red crosshair is drawn at that location. This is useful for AI agents that need to see where the cursor is in the screenshot.
## Mouse ## Mouse
<CodeGroup> <CodeGroup>
@ -166,6 +229,52 @@ curl -X POST "http://127.0.0.1:2468/v1/desktop/keyboard/press" \
``` ```
</CodeGroup> </CodeGroup>
## Clipboard
Read and write the X11 clipboard programmatically.
<CodeGroup>
```ts TypeScript
// Read clipboard
const clipboard = await sdk.getDesktopClipboard();
console.log(clipboard.text);
// Read primary selection (mouse-selected text)
const primary = await sdk.getDesktopClipboard({ selection: "primary" });
// Write to clipboard
await sdk.setDesktopClipboard({ text: "Pasted via API" });
// Write to both clipboard and primary selection
await sdk.setDesktopClipboard({
text: "Synced text",
selection: "both",
});
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/clipboard"
curl "http://127.0.0.1:2468/v1/desktop/clipboard?selection=primary"
curl -X POST "http://127.0.0.1:2468/v1/desktop/clipboard" \
-H "Content-Type: application/json" \
-d '{"text":"Pasted via API"}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/clipboard" \
-H "Content-Type: application/json" \
-d '{"text":"Synced text","selection":"both"}'
```
</CodeGroup>
The `selection` parameter controls which X11 selection to read or write:
| Value | Description |
|-------|-------------|
| `clipboard` (default) | The standard clipboard (Ctrl+C / Ctrl+V) |
| `primary` | The primary selection (text selected with the mouse) |
| `both` | Write to both clipboard and primary selection (write only) |
## Display and windows ## Display and windows
<CodeGroup> <CodeGroup>
@ -186,6 +295,112 @@ curl "http://127.0.0.1:2468/v1/desktop/windows"
``` ```
</CodeGroup> </CodeGroup>
The windows endpoint filters out noise automatically: window manager internals (Openbox), windows with empty titles, and tiny helper windows (under 120x80) are excluded. The currently active/focused window is always included regardless of filters.
### Focused window
Get the currently focused window without listing all windows.
<CodeGroup>
```ts TypeScript
const focused = await sdk.getDesktopFocusedWindow();
console.log(focused.title, focused.id);
```
```bash cURL
curl "http://127.0.0.1:2468/v1/desktop/windows/focused"
```
</CodeGroup>
Returns 404 if no window currently has focus.
### Window management
Focus, move, and resize windows by their X11 window ID.
<CodeGroup>
```ts TypeScript
const { windows } = await sdk.listDesktopWindows();
const win = windows[0];
// Bring window to foreground
await sdk.focusDesktopWindow(win.id);
// Move window
await sdk.moveDesktopWindow(win.id, { x: 100, y: 50 });
// Resize window
await sdk.resizeDesktopWindow(win.id, { width: 1280, height: 720 });
```
```bash cURL
# Focus a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/focus"
# Move a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/move" \
-H "Content-Type: application/json" \
-d '{"x":100,"y":50}'
# Resize a window
curl -X POST "http://127.0.0.1:2468/v1/desktop/windows/12345/resize" \
-H "Content-Type: application/json" \
-d '{"width":1280,"height":720}'
```
</CodeGroup>
All three endpoints return the updated window info so you can verify the operation took effect. The window manager may adjust the requested position or size.
## App launching
Launch applications or open files/URLs on the desktop without needing to shell out.
<CodeGroup>
```ts TypeScript
// Launch an app by name
const result = await sdk.launchDesktopApp({
app: "firefox",
args: ["--private"],
});
console.log(result.processId); // "proc_7"
// Launch and wait for the window to appear
const withWindow = await sdk.launchDesktopApp({
app: "xterm",
wait: true,
});
console.log(withWindow.windowId); // "12345" or null if timed out
// Open a URL with the default handler
const opened = await sdk.openDesktopTarget({
target: "https://example.com",
});
console.log(opened.processId);
```
```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/launch" \
-H "Content-Type: application/json" \
-d '{"app":"firefox","args":["--private"]}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/launch" \
-H "Content-Type: application/json" \
-d '{"app":"xterm","wait":true}'
curl -X POST "http://127.0.0.1:2468/v1/desktop/open" \
-H "Content-Type: application/json" \
-d '{"target":"https://example.com"}'
```
</CodeGroup>
The returned `processId` can be used with the [Process API](/processes) to read logs (`GET /v1/processes/{id}/logs`) or stop the application (`POST /v1/processes/{id}/stop`).
When `wait` is `true`, the API polls for up to 5 seconds for a window to appear. If the window appears, its ID is returned in `windowId`. If it times out, `windowId` is `null` but the process is still running.
<Tip>
**Launch/Open vs the Process API:** Both `launch` and `open` are convenience wrappers around the [Process API](/processes). They create managed processes (with `owner: "desktop"`) that you can inspect, log, and stop through the same Process endpoints. The difference is that `launch` validates the binary exists in PATH first and can optionally wait for a window to appear, while `open` delegates to the system default handler (`xdg-open`). Use the Process API directly when you need full control over command, environment, working directory, or restart policies.
</Tip>
## Recording ## Recording
Record the desktop to MP4. Record the desktop to MP4.
@ -285,6 +500,11 @@ Start a WebRTC stream for real-time desktop viewing in a browser.
```ts TypeScript ```ts TypeScript
await sdk.startDesktopStream(); await sdk.startDesktopStream();
// Check stream status
const status = await sdk.getDesktopStreamStatus();
console.log(status.active); // true
console.log(status.processId); // "proc_5"
// Connect via the React DesktopViewer component or // Connect via the React DesktopViewer component or
// use the WebSocket signaling endpoint directly // use the WebSocket signaling endpoint directly
// at ws://127.0.0.1:2468/v1/desktop/stream/signaling // at ws://127.0.0.1:2468/v1/desktop/stream/signaling
@ -295,6 +515,9 @@ await sdk.stopDesktopStream();
```bash cURL ```bash cURL
curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/start" curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/start"
# Check stream status
curl "http://127.0.0.1:2468/v1/desktop/stream/status"
# Connect to ws://127.0.0.1:2468/v1/desktop/stream/signaling for WebRTC signaling # Connect to ws://127.0.0.1:2468/v1/desktop/stream/signaling for WebRTC signaling
curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/stop" curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/stop"
@ -303,6 +526,89 @@ curl -X POST "http://127.0.0.1:2468/v1/desktop/stream/stop"
For a drop-in React component, see [React Components](/react-components). For a drop-in React component, see [React Components](/react-components).
## API reference
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/desktop/start` | Start the desktop runtime |
| `POST` | `/v1/desktop/stop` | Stop the desktop runtime |
| `GET` | `/v1/desktop/status` | Get desktop runtime status |
| `GET` | `/v1/desktop/screenshot` | Capture full desktop screenshot |
| `GET` | `/v1/desktop/screenshot/region` | Capture a region screenshot |
| `GET` | `/v1/desktop/mouse/position` | Get current mouse position |
| `POST` | `/v1/desktop/mouse/move` | Move the mouse |
| `POST` | `/v1/desktop/mouse/click` | Click the mouse |
| `POST` | `/v1/desktop/mouse/down` | Press mouse button down |
| `POST` | `/v1/desktop/mouse/up` | Release mouse button |
| `POST` | `/v1/desktop/mouse/drag` | Drag from one point to another |
| `POST` | `/v1/desktop/mouse/scroll` | Scroll at a position |
| `POST` | `/v1/desktop/keyboard/type` | Type text |
| `POST` | `/v1/desktop/keyboard/press` | Press a key with optional modifiers |
| `POST` | `/v1/desktop/keyboard/down` | Press a key down (hold) |
| `POST` | `/v1/desktop/keyboard/up` | Release a key |
| `GET` | `/v1/desktop/display/info` | Get display info |
| `GET` | `/v1/desktop/windows` | List visible windows |
| `GET` | `/v1/desktop/windows/focused` | Get focused window info |
| `POST` | `/v1/desktop/windows/{id}/focus` | Focus a window |
| `POST` | `/v1/desktop/windows/{id}/move` | Move a window |
| `POST` | `/v1/desktop/windows/{id}/resize` | Resize a window |
| `GET` | `/v1/desktop/clipboard` | Read clipboard contents |
| `POST` | `/v1/desktop/clipboard` | Write to clipboard |
| `POST` | `/v1/desktop/launch` | Launch an application |
| `POST` | `/v1/desktop/open` | Open a file or URL |
| `POST` | `/v1/desktop/recording/start` | Start recording |
| `POST` | `/v1/desktop/recording/stop` | Stop recording |
| `GET` | `/v1/desktop/recordings` | List recordings |
| `GET` | `/v1/desktop/recordings/{id}` | Get recording metadata |
| `GET` | `/v1/desktop/recordings/{id}/download` | Download recording |
| `DELETE` | `/v1/desktop/recordings/{id}` | Delete recording |
| `POST` | `/v1/desktop/stream/start` | Start WebRTC streaming |
| `POST` | `/v1/desktop/stream/stop` | Stop WebRTC streaming |
| `GET` | `/v1/desktop/stream/status` | Get stream status |
| `GET` | `/v1/desktop/stream/signaling` | WebSocket for WebRTC signaling |
### TypeScript SDK methods
| Method | Returns | Description |
|--------|---------|-------------|
| `startDesktop(request?)` | `DesktopStatusResponse` | Start the desktop |
| `stopDesktop()` | `DesktopStatusResponse` | Stop the desktop |
| `getDesktopStatus()` | `DesktopStatusResponse` | Get desktop status |
| `takeDesktopScreenshot(query?)` | `Uint8Array` | Capture screenshot |
| `takeDesktopRegionScreenshot(query)` | `Uint8Array` | Capture region screenshot |
| `getDesktopMousePosition()` | `DesktopMousePositionResponse` | Get mouse position |
| `moveDesktopMouse(request)` | `DesktopMousePositionResponse` | Move mouse |
| `clickDesktop(request)` | `DesktopMousePositionResponse` | Click mouse |
| `mouseDownDesktop(request)` | `DesktopMousePositionResponse` | Mouse button down |
| `mouseUpDesktop(request)` | `DesktopMousePositionResponse` | Mouse button up |
| `dragDesktopMouse(request)` | `DesktopMousePositionResponse` | Drag mouse |
| `scrollDesktop(request)` | `DesktopMousePositionResponse` | Scroll |
| `typeDesktopText(request)` | `DesktopActionResponse` | Type text |
| `pressDesktopKey(request)` | `DesktopActionResponse` | Press key |
| `keyDownDesktop(request)` | `DesktopActionResponse` | Key down |
| `keyUpDesktop(request)` | `DesktopActionResponse` | Key up |
| `getDesktopDisplayInfo()` | `DesktopDisplayInfoResponse` | Get display info |
| `listDesktopWindows()` | `DesktopWindowListResponse` | List windows |
| `getDesktopFocusedWindow()` | `DesktopWindowInfo` | Get focused window |
| `focusDesktopWindow(id)` | `DesktopWindowInfo` | Focus a window |
| `moveDesktopWindow(id, request)` | `DesktopWindowInfo` | Move a window |
| `resizeDesktopWindow(id, request)` | `DesktopWindowInfo` | Resize a window |
| `getDesktopClipboard(query?)` | `DesktopClipboardResponse` | Read clipboard |
| `setDesktopClipboard(request)` | `DesktopActionResponse` | Write clipboard |
| `launchDesktopApp(request)` | `DesktopLaunchResponse` | Launch an app |
| `openDesktopTarget(request)` | `DesktopOpenResponse` | Open file/URL |
| `startDesktopRecording(request?)` | `DesktopRecordingInfo` | Start recording |
| `stopDesktopRecording()` | `DesktopRecordingInfo` | Stop recording |
| `listDesktopRecordings()` | `DesktopRecordingListResponse` | List recordings |
| `getDesktopRecording(id)` | `DesktopRecordingInfo` | Get recording |
| `downloadDesktopRecording(id)` | `Uint8Array` | Download recording |
| `deleteDesktopRecording(id)` | `void` | Delete recording |
| `startDesktopStream()` | `DesktopStreamStatusResponse` | Start streaming |
| `stopDesktopStream()` | `DesktopStreamStatusResponse` | Stop streaming |
| `getDesktopStreamStatus()` | `DesktopStreamStatusResponse` | Stream status |
## Customizing the desktop environment ## Customizing the desktop environment
The desktop runs inside the sandbox filesystem, so you can customize it using the [File System](/file-system) API before or after starting the desktop. The desktop HOME directory is located at `~/.local/state/sandbox-agent/desktop/home` (or `$XDG_STATE_HOME/sandbox-agent/desktop/home` if `XDG_STATE_HOME` is set). The desktop runs inside the sandbox filesystem, so you can customize it using the [File System](/file-system) API before or after starting the desktop. The desktop HOME directory is located at `~/.local/state/sandbox-agent/desktop/home` (or `$XDG_STATE_HOME/sandbox-agent/desktop/home` if `XDG_STATE_HOME` is set).

View file

@ -628,6 +628,105 @@
} }
} }
}, },
"/v1/desktop/clipboard": {
"get": {
"tags": ["v1"],
"summary": "Read the desktop clipboard.",
"description": "Returns the current text content of the X11 clipboard.",
"operationId": "get_v1_desktop_clipboard",
"parameters": [
{
"name": "selection",
"in": "query",
"required": false,
"schema": {
"type": "string",
"nullable": true
}
}
],
"responses": {
"200": {
"description": "Clipboard contents",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopClipboardResponse"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"500": {
"description": "Clipboard read failed",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
},
"post": {
"tags": ["v1"],
"summary": "Write to the desktop clipboard.",
"description": "Sets the text content of the X11 clipboard.",
"operationId": "post_v1_desktop_clipboard",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopClipboardWriteRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Clipboard updated",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopActionResponse"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"500": {
"description": "Clipboard write failed",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/display/info": { "/v1/desktop/display/info": {
"get": { "get": {
"tags": ["v1"], "tags": ["v1"],
@ -908,6 +1007,56 @@
} }
} }
}, },
"/v1/desktop/launch": {
"post": {
"tags": ["v1"],
"summary": "Launch a desktop application.",
"description": "Launches an application by name on the managed desktop, optionally waiting\nfor its window to appear.",
"operationId": "post_v1_desktop_launch",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopLaunchRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Application launched",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopLaunchResponse"
}
}
}
},
"404": {
"description": "Application not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/mouse/click": { "/v1/desktop/mouse/click": {
"post": { "post": {
"tags": ["v1"], "tags": ["v1"],
@ -1308,6 +1457,46 @@
} }
} }
}, },
"/v1/desktop/open": {
"post": {
"tags": ["v1"],
"summary": "Open a file or URL with the default handler.",
"description": "Opens a file path or URL using xdg-open on the managed desktop.",
"operationId": "post_v1_desktop_open",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopOpenRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Target opened",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopOpenResponse"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/recording/start": { "/v1/desktop/recording/start": {
"post": { "post": {
"tags": ["v1"], "tags": ["v1"],
@ -1585,6 +1774,15 @@
"format": "float", "format": "float",
"nullable": true "nullable": true
} }
},
{
"name": "showCursor",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"nullable": true
}
} }
], ],
"responses": { "responses": {
@ -1702,6 +1900,15 @@
"format": "float", "format": "float",
"nullable": true "nullable": true
} }
},
{
"name": "showCursor",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"nullable": true
}
} }
], ],
"responses": { "responses": {
@ -1871,51 +2078,11 @@
} }
} }
}, },
"/v1/desktop/stream/start": { "/v1/desktop/stream/signaling": {
"post": {
"tags": ["v1"],
"summary": "Start desktop streaming.",
"description": "Enables desktop websocket streaming for the managed desktop.",
"operationId": "post_v1_desktop_stream_start",
"responses": {
"200": {
"description": "Desktop streaming started",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopStreamStatusResponse"
}
}
}
}
}
}
},
"/v1/desktop/stream/stop": {
"post": {
"tags": ["v1"],
"summary": "Stop desktop streaming.",
"description": "Disables desktop websocket streaming for the managed desktop.",
"operationId": "post_v1_desktop_stream_stop",
"responses": {
"200": {
"description": "Desktop streaming stopped",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopStreamStatusResponse"
}
}
}
}
}
}
},
"/v1/desktop/stream/ws": {
"get": { "get": {
"tags": ["v1"], "tags": ["v1"],
"summary": "Open a desktop websocket streaming session.", "summary": "Open a desktop WebRTC signaling session.",
"description": "Upgrades the connection to a websocket that streams JPEG desktop frames and\naccepts mouse and keyboard control frames.", "description": "Upgrades the connection to a WebSocket used for WebRTC signaling between\nthe browser client and the desktop streaming process. Also accepts mouse\nand keyboard input frames as a fallback transport.",
"operationId": "get_v1_desktop_stream_ws", "operationId": "get_v1_desktop_stream_ws",
"parameters": [ "parameters": [
{ {
@ -1956,6 +2123,66 @@
} }
} }
}, },
"/v1/desktop/stream/start": {
"post": {
"tags": ["v1"],
"summary": "Start desktop streaming.",
"description": "Enables desktop websocket streaming for the managed desktop.",
"operationId": "post_v1_desktop_stream_start",
"responses": {
"200": {
"description": "Desktop streaming started",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopStreamStatusResponse"
}
}
}
}
}
}
},
"/v1/desktop/stream/status": {
"get": {
"tags": ["v1"],
"summary": "Get desktop stream status.",
"description": "Returns the current state of the desktop WebRTC streaming session.",
"operationId": "get_v1_desktop_stream_status",
"responses": {
"200": {
"description": "Desktop stream status",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopStreamStatusResponse"
}
}
}
}
}
}
},
"/v1/desktop/stream/stop": {
"post": {
"tags": ["v1"],
"summary": "Stop desktop streaming.",
"description": "Disables desktop websocket streaming for the managed desktop.",
"operationId": "post_v1_desktop_stream_stop",
"responses": {
"200": {
"description": "Desktop streaming stopped",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopStreamStatusResponse"
}
}
}
}
}
}
},
"/v1/desktop/windows": { "/v1/desktop/windows": {
"get": { "get": {
"tags": ["v1"], "tags": ["v1"],
@ -1996,6 +2223,219 @@
} }
} }
}, },
"/v1/desktop/windows/focused": {
"get": {
"tags": ["v1"],
"summary": "Get the currently focused desktop window.",
"description": "Returns information about the window that currently has input focus.",
"operationId": "get_v1_desktop_windows_focused",
"responses": {
"200": {
"description": "Focused window info",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowInfo"
}
}
}
},
"404": {
"description": "No window is focused",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/windows/{id}/focus": {
"post": {
"tags": ["v1"],
"summary": "Focus a desktop window.",
"description": "Brings the specified window to the foreground and gives it input focus.",
"operationId": "post_v1_desktop_window_focus",
"parameters": [
{
"name": "id",
"in": "path",
"description": "X11 window ID",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Window info after focus",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowInfo"
}
}
}
},
"404": {
"description": "Window not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/windows/{id}/move": {
"post": {
"tags": ["v1"],
"summary": "Move a desktop window.",
"description": "Moves the specified window to the given position.",
"operationId": "post_v1_desktop_window_move",
"parameters": [
{
"name": "id",
"in": "path",
"description": "X11 window ID",
"required": true,
"schema": {
"type": "string"
}
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowMoveRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Window info after move",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowInfo"
}
}
}
},
"404": {
"description": "Window not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/desktop/windows/{id}/resize": {
"post": {
"tags": ["v1"],
"summary": "Resize a desktop window.",
"description": "Resizes the specified window to the given dimensions.",
"operationId": "post_v1_desktop_window_resize",
"parameters": [
{
"name": "id",
"in": "path",
"description": "X11 window ID",
"required": true,
"schema": {
"type": "string"
}
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowResizeRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Window info after resize",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DesktopWindowInfo"
}
}
}
},
"404": {
"description": "Window not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
},
"409": {
"description": "Desktop runtime is not ready",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ProblemDetails"
}
}
}
}
}
}
},
"/v1/fs/entries": { "/v1/fs/entries": {
"get": { "get": {
"tags": ["v1"], "tags": ["v1"],
@ -3326,6 +3766,40 @@
} }
} }
}, },
"DesktopClipboardQuery": {
"type": "object",
"properties": {
"selection": {
"type": "string",
"nullable": true
}
}
},
"DesktopClipboardResponse": {
"type": "object",
"required": ["text", "selection"],
"properties": {
"selection": {
"type": "string"
},
"text": {
"type": "string"
}
}
},
"DesktopClipboardWriteRequest": {
"type": "object",
"required": ["text"],
"properties": {
"selection": {
"type": "string",
"nullable": true
},
"text": {
"type": "string"
}
}
},
"DesktopDisplayInfoResponse": { "DesktopDisplayInfoResponse": {
"type": "object", "type": "object",
"required": ["display", "resolution"], "required": ["display", "resolution"],
@ -3421,6 +3895,45 @@
} }
} }
}, },
"DesktopLaunchRequest": {
"type": "object",
"required": ["app"],
"properties": {
"app": {
"type": "string"
},
"args": {
"type": "array",
"items": {
"type": "string"
},
"nullable": true
},
"wait": {
"type": "boolean",
"nullable": true
}
}
},
"DesktopLaunchResponse": {
"type": "object",
"required": ["processId"],
"properties": {
"pid": {
"type": "integer",
"format": "int32",
"nullable": true,
"minimum": 0
},
"processId": {
"type": "string"
},
"windowId": {
"type": "string",
"nullable": true
}
}
},
"DesktopMouseButton": { "DesktopMouseButton": {
"type": "string", "type": "string",
"enum": ["left", "middle", "right"] "enum": ["left", "middle", "right"]
@ -3590,6 +4103,30 @@
} }
} }
}, },
"DesktopOpenRequest": {
"type": "object",
"required": ["target"],
"properties": {
"target": {
"type": "string"
}
}
},
"DesktopOpenResponse": {
"type": "object",
"required": ["processId"],
"properties": {
"pid": {
"type": "integer",
"format": "int32",
"nullable": true,
"minimum": 0
},
"processId": {
"type": "string"
}
}
},
"DesktopProcessInfo": { "DesktopProcessInfo": {
"type": "object", "type": "object",
"required": ["name", "running"], "required": ["name", "running"],
@ -3698,6 +4235,10 @@
"format": "float", "format": "float",
"nullable": true "nullable": true
}, },
"showCursor": {
"type": "boolean",
"nullable": true
},
"width": { "width": {
"type": "integer", "type": "integer",
"format": "int32", "format": "int32",
@ -3760,12 +4301,21 @@
"type": "number", "type": "number",
"format": "float", "format": "float",
"nullable": true "nullable": true
},
"showCursor": {
"type": "boolean",
"nullable": true
} }
} }
}, },
"DesktopStartRequest": { "DesktopStartRequest": {
"type": "object", "type": "object",
"properties": { "properties": {
"displayNum": {
"type": "integer",
"format": "int32",
"nullable": true
},
"dpi": { "dpi": {
"type": "integer", "type": "integer",
"format": "int32", "format": "int32",
@ -3778,6 +4328,34 @@
"nullable": true, "nullable": true,
"minimum": 0 "minimum": 0
}, },
"recordingFps": {
"type": "integer",
"format": "int32",
"nullable": true,
"minimum": 0
},
"stateDir": {
"type": "string",
"nullable": true
},
"streamAudioCodec": {
"type": "string",
"nullable": true
},
"streamFrameRate": {
"type": "integer",
"format": "int32",
"nullable": true,
"minimum": 0
},
"streamVideoCodec": {
"type": "string",
"nullable": true
},
"webrtcPortRange": {
"type": "string",
"nullable": true
},
"width": { "width": {
"type": "integer", "type": "integer",
"format": "int32", "format": "int32",
@ -3840,6 +4418,13 @@
}, },
"state": { "state": {
"$ref": "#/components/schemas/DesktopState" "$ref": "#/components/schemas/DesktopState"
},
"windows": {
"type": "array",
"items": {
"$ref": "#/components/schemas/DesktopWindowInfo"
},
"description": "Current visible windows (included when the desktop is active)."
} }
} }
}, },
@ -3849,6 +4434,14 @@
"properties": { "properties": {
"active": { "active": {
"type": "boolean" "type": "boolean"
},
"processId": {
"type": "string",
"nullable": true
},
"windowId": {
"type": "string",
"nullable": true
} }
} }
}, },
@ -3897,6 +4490,36 @@
} }
} }
}, },
"DesktopWindowMoveRequest": {
"type": "object",
"required": ["x", "y"],
"properties": {
"x": {
"type": "integer",
"format": "int32"
},
"y": {
"type": "integer",
"format": "int32"
}
}
},
"DesktopWindowResizeRequest": {
"type": "object",
"required": ["width", "height"],
"properties": {
"height": {
"type": "integer",
"format": "int32",
"minimum": 0
},
"width": {
"type": "integer",
"format": "int32",
"minimum": 0
}
}
},
"ErrorType": { "ErrorType": {
"type": "string", "type": "string",
"enum": [ "enum": [

View file

@ -2908,6 +2908,31 @@
gap: 10px; gap: 10px;
} }
.desktop-screenshot-controls {
display: flex;
align-items: flex-end;
gap: 10px;
flex-wrap: wrap;
margin-bottom: 12px;
}
.desktop-checkbox-label {
display: flex;
align-items: center;
gap: 6px;
font-size: 12px;
cursor: pointer;
white-space: nowrap;
padding-bottom: 4px;
}
.desktop-advanced-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 10px;
margin-top: 8px;
}
.desktop-input-group { .desktop-input-group {
display: flex; display: flex;
flex-direction: column; flex-direction: column;
@ -2950,6 +2975,68 @@
gap: 4px; gap: 4px;
} }
.desktop-clipboard-text {
margin: 4px 0 0;
padding: 8px 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
font-size: 12px;
white-space: pre-wrap;
word-break: break-all;
max-height: 120px;
overflow-y: auto;
}
.desktop-window-item {
padding: 10px;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--surface);
display: flex;
flex-direction: column;
gap: 6px;
}
.desktop-window-focused {
border-color: var(--success);
box-shadow: inset 0 0 0 1px var(--success);
}
.desktop-window-editor {
display: flex;
align-items: center;
gap: 6px;
margin-top: 6px;
padding-top: 6px;
border-top: 1px solid var(--border);
}
.desktop-launch-row {
display: flex;
align-items: center;
gap: 8px;
margin-top: 6px;
flex-wrap: wrap;
}
.desktop-mouse-pos {
display: flex;
align-items: center;
gap: 8px;
margin-top: 8px;
}
.desktop-stream-hint {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
margin-bottom: 8px;
font-size: 11px;
color: var(--muted);
}
.desktop-screenshot-empty { .desktop-screenshot-empty {
padding: 18px; padding: 18px;
border: 1px dashed var(--border); border: 1px dashed var(--border);
@ -3640,7 +3727,8 @@
} }
.desktop-state-grid, .desktop-state-grid,
.desktop-start-controls { .desktop-start-controls,
.desktop-advanced-grid {
grid-template-columns: 1fr; grid-template-columns: 1fr;
} }

View file

@ -1,10 +1,35 @@
import { Camera, Circle, Download, Loader2, Monitor, Play, RefreshCw, Square, Trash2, Video } from "lucide-react"; import {
AppWindow,
Camera,
Circle,
Clipboard,
Download,
ExternalLink,
Loader2,
Monitor,
MousePointer,
Play,
RefreshCw,
Square,
Trash2,
Video,
} from "lucide-react";
import { useCallback, useEffect, useMemo, useState } from "react"; import { useCallback, useEffect, useMemo, useState } from "react";
import { SandboxAgentError } from "sandbox-agent"; import { SandboxAgentError } from "sandbox-agent";
import type { DesktopRecordingInfo, DesktopStatusResponse, SandboxAgent } from "sandbox-agent"; import type { DesktopRecordingInfo, DesktopStatusResponse, DesktopWindowInfo, SandboxAgent } from "sandbox-agent";
import { DesktopViewer } from "@sandbox-agent/react"; import { DesktopViewer } from "@sandbox-agent/react";
import type { DesktopViewerClient } from "@sandbox-agent/react"; import type { DesktopViewerClient } from "@sandbox-agent/react";
const MIN_SPIN_MS = 350; const MIN_SPIN_MS = 350;
type DesktopScreenshotRequest = Parameters<SandboxAgent["takeDesktopScreenshot"]>[0] & {
showCursor?: boolean;
};
type DesktopStartRequestWithAdvanced = Parameters<SandboxAgent["startDesktop"]>[0] & {
streamVideoCodec?: string;
streamAudioCodec?: string;
streamFrameRate?: number;
webrtcPortRange?: string;
recordingFps?: number;
};
const extractErrorMessage = (error: unknown, fallback: string): string => { const extractErrorMessage = (error: unknown, fallback: string): string => {
if (error instanceof SandboxAgentError && error.problem?.detail) return error.problem.detail; if (error instanceof SandboxAgentError && error.problem?.detail) return error.problem.detail;
if (error instanceof Error) return error.message; if (error instanceof Error) return error.message;
@ -31,10 +56,10 @@ const formatDuration = (start: string, end?: string | null): string => {
const secs = seconds % 60; const secs = seconds % 60;
return `${mins}m ${secs}s`; return `${mins}m ${secs}s`;
}; };
const createScreenshotUrl = async (bytes: Uint8Array): Promise<string> => { const createScreenshotUrl = async (bytes: Uint8Array, mimeType = "image/png"): Promise<string> => {
const payload = new Uint8Array(bytes.byteLength); const payload = new Uint8Array(bytes.byteLength);
payload.set(bytes); payload.set(bytes);
const blob = new Blob([payload.buffer], { type: "image/png" }); const blob = new Blob([payload.buffer], { type: mimeType });
if (typeof URL.createObjectURL === "function") { if (typeof URL.createObjectURL === "function") {
return URL.createObjectURL(blob); return URL.createObjectURL(blob);
} }
@ -64,9 +89,15 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
const [screenshotUrl, setScreenshotUrl] = useState<string | null>(null); const [screenshotUrl, setScreenshotUrl] = useState<string | null>(null);
const [screenshotLoading, setScreenshotLoading] = useState(false); const [screenshotLoading, setScreenshotLoading] = useState(false);
const [screenshotError, setScreenshotError] = useState<string | null>(null); const [screenshotError, setScreenshotError] = useState<string | null>(null);
const [screenshotFormat, setScreenshotFormat] = useState<"png" | "jpeg" | "webp">("png");
const [screenshotQuality, setScreenshotQuality] = useState("85");
const [screenshotScale, setScreenshotScale] = useState("1.0");
const [showCursor, setShowCursor] = useState(false);
// Live view // Live view
const [liveViewActive, setLiveViewActive] = useState(false); const [liveViewActive, setLiveViewActive] = useState(false);
const [liveViewError, setLiveViewError] = useState<string | null>(null); const [liveViewError, setLiveViewError] = useState<string | null>(null);
const [mousePos, setMousePos] = useState<{ x: number; y: number } | null>(null);
const [mousePosLoading, setMousePosLoading] = useState(false);
// Memoize the client as a DesktopViewerClient so the reference is stable // Memoize the client as a DesktopViewerClient so the reference is stable
// across renders and doesn't cause the DesktopViewer effect to re-fire. // across renders and doesn't cause the DesktopViewer effect to re-fire.
const viewerClient = useMemo<DesktopViewerClient>(() => { const viewerClient = useMemo<DesktopViewerClient>(() => {
@ -85,8 +116,47 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
const [recordingFps, setRecordingFps] = useState("30"); const [recordingFps, setRecordingFps] = useState("30");
const [deletingRecordingId, setDeletingRecordingId] = useState<string | null>(null); const [deletingRecordingId, setDeletingRecordingId] = useState<string | null>(null);
const [downloadingRecordingId, setDownloadingRecordingId] = useState<string | null>(null); const [downloadingRecordingId, setDownloadingRecordingId] = useState<string | null>(null);
const [showAdvancedStart, setShowAdvancedStart] = useState(false);
const [streamVideoCodec, setStreamVideoCodec] = useState("vp8");
const [streamAudioCodec, setStreamAudioCodec] = useState("opus");
const [streamFrameRate, setStreamFrameRate] = useState("30");
const [webrtcPortRange, setWebrtcPortRange] = useState("59050-59070");
const [defaultRecordingFps, setDefaultRecordingFps] = useState("30");
const [clipboardText, setClipboardText] = useState("");
const [clipboardSelection, setClipboardSelection] = useState<"clipboard" | "primary">("clipboard");
const [clipboardLoading, setClipboardLoading] = useState(false);
const [clipboardError, setClipboardError] = useState<string | null>(null);
const [clipboardWriteText, setClipboardWriteText] = useState("");
const [clipboardWriting, setClipboardWriting] = useState(false);
const [windows, setWindows] = useState<DesktopWindowInfo[]>([]);
const [windowsLoading, setWindowsLoading] = useState(false);
const [windowsError, setWindowsError] = useState<string | null>(null);
const [windowActing, setWindowActing] = useState<string | null>(null);
const [editingWindow, setEditingWindow] = useState<{ id: string; action: "move" | "resize" } | null>(null);
const [editX, setEditX] = useState("");
const [editY, setEditY] = useState("");
const [editW, setEditW] = useState("");
const [editH, setEditH] = useState("");
const [launchApp, setLaunchApp] = useState("firefox");
const [launchArgs, setLaunchArgs] = useState("");
const [launchWait, setLaunchWait] = useState(true);
const [launching, setLaunching] = useState(false);
const [launchResult, setLaunchResult] = useState<string | null>(null);
const [launchError, setLaunchError] = useState<string | null>(null);
const [openTarget, setOpenTarget] = useState("");
const [opening, setOpening] = useState(false);
const [openResult, setOpenResult] = useState<string | null>(null);
const [openError, setOpenError] = useState<string | null>(null);
// Active recording tracking // Active recording tracking
const activeRecording = useMemo(() => recordings.find((r) => r.status === "recording"), [recordings]); const activeRecording = useMemo(() => recordings.find((r) => r.status === "recording"), [recordings]);
const visibleWindows = useMemo(() => {
return windows.filter((win) => {
const title = win.title.trim();
if (win.isActive) return true;
if (!title || title === "Openbox") return false;
return win.width >= 120 && win.height >= 80;
});
}, [windows]);
const revokeScreenshotUrl = useCallback(() => { const revokeScreenshotUrl = useCallback(() => {
setScreenshotUrl((current) => { setScreenshotUrl((current) => {
if (current?.startsWith("blob:") && typeof URL.revokeObjectURL === "function") { if (current?.startsWith("blob:") && typeof URL.revokeObjectURL === "function") {
@ -103,6 +173,11 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
try { try {
const next = await getClient().getDesktopStatus(); const next = await getClient().getDesktopStatus();
setStatus(next); setStatus(next);
// Status response now includes windows; sync them so we get window
// updates for free every time status is polled.
if (next.state === "active" && next.windows?.length) {
setWindows(next.windows);
}
return next; return next;
} catch (loadError) { } catch (loadError) {
setError(extractErrorMessage(loadError, "Unable to load desktop status.")); setError(extractErrorMessage(loadError, "Unable to load desktop status."));
@ -118,16 +193,36 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
setScreenshotLoading(true); setScreenshotLoading(true);
setScreenshotError(null); setScreenshotError(null);
try { try {
const bytes = await getClient().takeDesktopScreenshot(); const quality = Number.parseInt(screenshotQuality, 10);
const scale = Number.parseFloat(screenshotScale);
const request: DesktopScreenshotRequest = {
format: screenshotFormat !== "png" ? screenshotFormat : undefined,
quality: screenshotFormat !== "png" && Number.isFinite(quality) ? quality : undefined,
scale: Number.isFinite(scale) && scale !== 1.0 ? scale : undefined,
showCursor: showCursor || undefined,
};
const bytes = await getClient().takeDesktopScreenshot(request);
revokeScreenshotUrl(); revokeScreenshotUrl();
setScreenshotUrl(await createScreenshotUrl(bytes)); const mimeType = screenshotFormat === "jpeg" ? "image/jpeg" : screenshotFormat === "webp" ? "image/webp" : "image/png";
setScreenshotUrl(await createScreenshotUrl(bytes, mimeType));
} catch (captureError) { } catch (captureError) {
revokeScreenshotUrl(); revokeScreenshotUrl();
setScreenshotError(extractErrorMessage(captureError, "Unable to capture desktop screenshot.")); setScreenshotError(extractErrorMessage(captureError, "Unable to capture desktop screenshot."));
} finally { } finally {
setScreenshotLoading(false); setScreenshotLoading(false);
} }
}, [getClient, revokeScreenshotUrl]); }, [getClient, revokeScreenshotUrl, screenshotFormat, screenshotQuality, screenshotScale, showCursor]);
const loadMousePosition = useCallback(async () => {
setMousePosLoading(true);
try {
const pos = await getClient().getDesktopMousePosition();
setMousePos({ x: pos.x, y: pos.y });
} catch {
setMousePos(null);
} finally {
setMousePosLoading(false);
}
}, [getClient]);
const loadRecordings = useCallback(async () => { const loadRecordings = useCallback(async () => {
setRecordingLoading(true); setRecordingLoading(true);
setRecordingError(null); setRecordingError(null);
@ -140,15 +235,88 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
setRecordingLoading(false); setRecordingLoading(false);
} }
}, [getClient]); }, [getClient]);
const loadClipboard = useCallback(async () => {
setClipboardLoading(true);
setClipboardError(null);
try {
const result = await getClient().getDesktopClipboard({ selection: clipboardSelection });
setClipboardText(result.text);
} catch (err) {
setClipboardError(extractErrorMessage(err, "Unable to read clipboard."));
} finally {
setClipboardLoading(false);
}
}, [clipboardSelection, getClient]);
const loadWindows = useCallback(async () => {
setWindowsLoading(true);
setWindowsError(null);
try {
const result = await getClient().listDesktopWindows();
setWindows(result.windows);
} catch (err) {
setWindowsError(extractErrorMessage(err, "Unable to list windows."));
} finally {
setWindowsLoading(false);
}
}, [getClient]);
const handleFocusWindow = async (windowId: string) => {
setWindowActing(windowId);
try {
await getClient().focusDesktopWindow(windowId);
await loadWindows();
} catch (err) {
setWindowsError(extractErrorMessage(err, "Unable to focus window."));
} finally {
setWindowActing(null);
}
};
const handleMoveWindow = async (windowId: string) => {
const x = Number.parseInt(editX, 10);
const y = Number.parseInt(editY, 10);
if (!Number.isFinite(x) || !Number.isFinite(y)) return;
setWindowActing(windowId);
try {
await getClient().moveDesktopWindow(windowId, { x, y });
setEditingWindow(null);
await loadWindows();
} catch (err) {
setWindowsError(extractErrorMessage(err, "Unable to move window."));
} finally {
setWindowActing(null);
}
};
const handleResizeWindow = async (windowId: string) => {
const nextWidth = Number.parseInt(editW, 10);
const nextHeight = Number.parseInt(editH, 10);
if (!Number.isFinite(nextWidth) || !Number.isFinite(nextHeight) || nextWidth <= 0 || nextHeight <= 0) return;
setWindowActing(windowId);
try {
await getClient().resizeDesktopWindow(windowId, { width: nextWidth, height: nextHeight });
setEditingWindow(null);
await loadWindows();
} catch (err) {
setWindowsError(extractErrorMessage(err, "Unable to resize window."));
} finally {
setWindowActing(null);
}
};
useEffect(() => { useEffect(() => {
void loadStatus(); void loadStatus();
}, [loadStatus]); }, [loadStatus]);
// Auto-refresh status (and windows via status) every 5 seconds when active
useEffect(() => {
if (status?.state !== "active") return;
const interval = setInterval(() => void loadStatus("refresh"), 5000);
return () => clearInterval(interval);
}, [status?.state, loadStatus]);
useEffect(() => { useEffect(() => {
if (status?.state === "active") { if (status?.state === "active") {
void loadRecordings(); void loadRecordings();
} else { } else {
revokeScreenshotUrl(); revokeScreenshotUrl();
setLiveViewActive(false); setLiveViewActive(false);
setMousePos(null);
setEditingWindow(null);
} }
}, [status?.state, loadRecordings, revokeScreenshotUrl]); }, [status?.state, loadRecordings, revokeScreenshotUrl]);
useEffect(() => { useEffect(() => {
@ -160,19 +328,35 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
const interval = setInterval(() => void loadRecordings(), 3000); const interval = setInterval(() => void loadRecordings(), 3000);
return () => clearInterval(interval); return () => clearInterval(interval);
}, [activeRecording, loadRecordings]); }, [activeRecording, loadRecordings]);
useEffect(() => {
if (status?.state !== "active") {
setWindows([]);
return;
}
// Initial load; subsequent updates come from the status auto-refresh.
void loadWindows();
}, [status?.state, loadWindows]);
const handleStart = async () => { const handleStart = async () => {
const parsedWidth = Number.parseInt(width, 10); const parsedWidth = Number.parseInt(width, 10);
const parsedHeight = Number.parseInt(height, 10); const parsedHeight = Number.parseInt(height, 10);
const parsedDpi = Number.parseInt(dpi, 10); const parsedDpi = Number.parseInt(dpi, 10);
const parsedFrameRate = Number.parseInt(streamFrameRate, 10);
const parsedRecordingFps = Number.parseInt(defaultRecordingFps, 10);
setActing("start"); setActing("start");
setError(null); setError(null);
const startedAt = Date.now(); const startedAt = Date.now();
try { try {
const next = await getClient().startDesktop({ const request: DesktopStartRequestWithAdvanced = {
width: Number.isFinite(parsedWidth) ? parsedWidth : undefined, width: Number.isFinite(parsedWidth) ? parsedWidth : undefined,
height: Number.isFinite(parsedHeight) ? parsedHeight : undefined, height: Number.isFinite(parsedHeight) ? parsedHeight : undefined,
dpi: Number.isFinite(parsedDpi) ? parsedDpi : undefined, dpi: Number.isFinite(parsedDpi) ? parsedDpi : undefined,
}); streamVideoCodec: streamVideoCodec !== "vp8" ? streamVideoCodec : undefined,
streamAudioCodec: streamAudioCodec !== "opus" ? streamAudioCodec : undefined,
streamFrameRate: Number.isFinite(parsedFrameRate) && parsedFrameRate !== 30 ? parsedFrameRate : undefined,
webrtcPortRange: webrtcPortRange !== "59050-59070" ? webrtcPortRange : undefined,
recordingFps: Number.isFinite(parsedRecordingFps) && parsedRecordingFps !== 30 ? parsedRecordingFps : undefined,
};
const next = await getClient().startDesktop(request);
setStatus(next); setStatus(next);
} catch (startError) { } catch (startError) {
setError(extractErrorMessage(startError, "Unable to start desktop runtime.")); setError(extractErrorMessage(startError, "Unable to start desktop runtime."));
@ -205,6 +389,18 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
setActing(null); setActing(null);
} }
}; };
const handleWriteClipboard = async () => {
setClipboardWriting(true);
setClipboardError(null);
try {
await getClient().setDesktopClipboard({ text: clipboardWriteText, selection: clipboardSelection });
setClipboardText(clipboardWriteText);
} catch (err) {
setClipboardError(extractErrorMessage(err, "Unable to write clipboard."));
} finally {
setClipboardWriting(false);
}
};
const handleStartRecording = async () => { const handleStartRecording = async () => {
const fps = Number.parseInt(recordingFps, 10); const fps = Number.parseInt(recordingFps, 10);
setRecordingActing("start"); setRecordingActing("start");
@ -262,6 +458,41 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
setDownloadingRecordingId(null); setDownloadingRecordingId(null);
} }
}; };
const handleLaunchApp = async () => {
if (!launchApp.trim()) return;
setLaunching(true);
setLaunchError(null);
setLaunchResult(null);
try {
const args = launchArgs.trim() ? launchArgs.trim().split(/\s+/) : undefined;
const result = await getClient().launchDesktopApp({
app: launchApp.trim(),
args,
wait: launchWait || undefined,
});
setLaunchResult(`Started ${result.processId}${result.windowId ? ` (window: ${result.windowId})` : ""}`);
await loadWindows();
} catch (err) {
setLaunchError(extractErrorMessage(err, "Unable to launch app."));
} finally {
setLaunching(false);
}
};
const handleOpenTarget = async () => {
if (!openTarget.trim()) return;
setOpening(true);
setOpenError(null);
setOpenResult(null);
try {
const result = await getClient().openDesktopTarget({ target: openTarget.trim() });
setOpenResult(`Opened via ${result.processId}`);
await loadWindows();
} catch (err) {
setOpenError(extractErrorMessage(err, "Unable to open target."));
} finally {
setOpening(false);
}
};
const canRefreshScreenshot = status?.state === "active"; const canRefreshScreenshot = status?.state === "active";
const isActive = status?.state === "active"; const isActive = status?.state === "active";
const resolutionLabel = useMemo(() => { const resolutionLabel = useMemo(() => {
@ -284,6 +515,48 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
</button> </button>
)} )}
</div> </div>
{isActive && !liveViewActive && (
<div className="desktop-screenshot-controls">
<div className="desktop-input-group">
<label className="label">Format</label>
<select
className="setup-input mono"
value={screenshotFormat}
onChange={(event) => setScreenshotFormat(event.target.value as "png" | "jpeg" | "webp")}
>
<option value="png">PNG</option>
<option value="jpeg">JPEG</option>
<option value="webp">WebP</option>
</select>
</div>
{screenshotFormat !== "png" && (
<div className="desktop-input-group">
<label className="label">Quality</label>
<input
className="setup-input mono"
value={screenshotQuality}
onChange={(event) => setScreenshotQuality(event.target.value)}
inputMode="numeric"
style={{ maxWidth: 60 }}
/>
</div>
)}
<div className="desktop-input-group">
<label className="label">Scale</label>
<input
className="setup-input mono"
value={screenshotScale}
onChange={(event) => setScreenshotScale(event.target.value)}
inputMode="decimal"
style={{ maxWidth: 60 }}
/>
</div>
<label className="desktop-checkbox-label">
<input type="checkbox" checked={showCursor} onChange={(event) => setShowCursor(event.target.checked)} />
Show cursor
</label>
</div>
)}
{error && <div className="banner error">{error}</div>} {error && <div className="banner error">{error}</div>}
{screenshotError && <div className="banner error">{screenshotError}</div>} {screenshotError && <div className="banner error">{screenshotError}</div>}
{/* ========== Runtime Section ========== */} {/* ========== Runtime Section ========== */}
@ -329,6 +602,56 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
<input className="setup-input mono" value={dpi} onChange={(event) => setDpi(event.target.value)} inputMode="numeric" /> <input className="setup-input mono" value={dpi} onChange={(event) => setDpi(event.target.value)} inputMode="numeric" />
</div> </div>
</div> </div>
<button
className="button ghost small"
onClick={() => setShowAdvancedStart((value) => !value)}
style={{ marginTop: 8, fontSize: 11, padding: "4px 8px" }}
>
{showAdvancedStart ? "v Advanced" : "> Advanced"}
</button>
{showAdvancedStart && (
<div className="desktop-advanced-grid">
<div className="desktop-input-group">
<label className="label">Video Codec</label>
<select className="setup-input mono" value={streamVideoCodec} onChange={(event) => setStreamVideoCodec(event.target.value)} disabled={isActive}>
<option value="vp8">vp8</option>
<option value="vp9">vp9</option>
<option value="h264">h264</option>
</select>
</div>
<div className="desktop-input-group">
<label className="label">Audio Codec</label>
<select className="setup-input mono" value={streamAudioCodec} onChange={(event) => setStreamAudioCodec(event.target.value)} disabled={isActive}>
<option value="opus">opus</option>
<option value="g722">g722</option>
</select>
</div>
<div className="desktop-input-group">
<label className="label">Frame Rate</label>
<input
className="setup-input mono"
value={streamFrameRate}
onChange={(event) => setStreamFrameRate(event.target.value)}
inputMode="numeric"
disabled={isActive}
/>
</div>
<div className="desktop-input-group">
<label className="label">WebRTC Ports</label>
<input className="setup-input mono" value={webrtcPortRange} onChange={(event) => setWebrtcPortRange(event.target.value)} disabled={isActive} />
</div>
<div className="desktop-input-group">
<label className="label">Recording FPS</label>
<input
className="setup-input mono"
value={defaultRecordingFps}
onChange={(event) => setDefaultRecordingFps(event.target.value)}
inputMode="numeric"
disabled={isActive}
/>
</div>
</div>
)}
<div className="card-actions"> <div className="card-actions">
{isActive ? ( {isActive ? (
<button className="button danger small" onClick={() => void handleStop()} disabled={acting === "stop"}> <button className="button danger small" onClick={() => void handleStop()} disabled={acting === "stop"}>
@ -416,7 +739,29 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
</div> </div>
)} )}
{!isActive && <div className="desktop-screenshot-empty">Start the desktop runtime to enable live view.</div>} {!isActive && <div className="desktop-screenshot-empty">Start the desktop runtime to enable live view.</div>}
{isActive && liveViewActive && <DesktopViewer client={viewerClient} autoStart={true} showStatusBar={true} />} {isActive && liveViewActive && (
<>
<div className="desktop-stream-hint">
<span>Right click to open window</span>
{status?.resolution && (
<span className="mono" style={{ color: "var(--muted)" }}>
{status.resolution.width}x{status.resolution.height}
</span>
)}
</div>
<DesktopViewer
client={viewerClient}
height={360}
showStatusBar={false}
style={{
border: "1px solid var(--border)",
borderRadius: 10,
background: "linear-gradient(180deg, rgba(15, 23, 42, 0.92) 0%, rgba(17, 24, 39, 0.98) 100%)",
boxShadow: "none",
}}
/>
</>
)}
{isActive && !liveViewActive && ( {isActive && !liveViewActive && (
<> <>
{screenshotUrl ? ( {screenshotUrl ? (
@ -428,7 +773,313 @@ const DesktopTab = ({ getClient }: { getClient: () => SandboxAgent }) => {
)} )}
</> </>
)} )}
{isActive && (
<div className="desktop-mouse-pos">
<button
className="button ghost small"
onClick={() => void loadMousePosition()}
disabled={mousePosLoading}
style={{ padding: "4px 8px", fontSize: 11 }}
>
{mousePosLoading ? <Loader2 size={12} className="spinner-icon" /> : <MousePointer size={12} style={{ marginRight: 4 }} />}
Position
</button>
{mousePos && (
<span className="mono" style={{ fontSize: 11, color: "var(--muted)" }}>
({mousePos.x}, {mousePos.y})
</span>
)}
</div>
)}
</div> </div>
{isActive && (
<div className="card">
<div className="card-header">
<span className="card-title">
<Clipboard size={14} style={{ marginRight: 6 }} />
Clipboard
</span>
<div style={{ display: "flex", gap: 6, alignItems: "center" }}>
<select
className="setup-input mono"
value={clipboardSelection}
onChange={(event) => setClipboardSelection(event.target.value as "clipboard" | "primary")}
style={{ fontSize: 11, padding: "2px 6px", width: "auto" }}
>
<option value="clipboard">clipboard</option>
<option value="primary">primary</option>
</select>
<button
className="button secondary small"
onClick={() => void loadClipboard()}
disabled={clipboardLoading}
style={{ padding: "4px 8px", fontSize: 11 }}
>
{clipboardLoading ? <Loader2 size={12} className="spinner-icon" /> : <RefreshCw size={12} />}
</button>
</div>
</div>
{clipboardError && (
<div className="banner error" style={{ marginBottom: 8 }}>
{clipboardError}
</div>
)}
<div className="desktop-clipboard-content">
<div className="card-meta">Current contents</div>
<pre className="desktop-clipboard-text">
{clipboardText ? clipboardText : <span style={{ color: "var(--muted)", fontStyle: "italic" }}>(empty)</span>}
</pre>
</div>
<div style={{ marginTop: 10 }}>
<div className="card-meta">Write to clipboard</div>
<textarea
className="setup-input mono"
value={clipboardWriteText}
onChange={(event) => setClipboardWriteText(event.target.value)}
rows={2}
style={{ width: "100%", resize: "vertical", marginTop: 4 }}
placeholder="Text to copy..."
/>
<button className="button secondary small" onClick={() => void handleWriteClipboard()} disabled={clipboardWriting} style={{ marginTop: 6 }}>
{clipboardWriting ? <Loader2 className="button-icon spinner-icon" /> : null}
Update Clipboard
</button>
</div>
</div>
)}
{isActive && (
<div className="card">
<div className="card-header">
<span className="card-title">
<AppWindow size={14} style={{ marginRight: 6 }} />
Windows
</span>
<button
className="button secondary small"
onClick={() => void loadWindows()}
disabled={windowsLoading}
style={{ padding: "4px 8px", fontSize: 11 }}
>
{windowsLoading ? <Loader2 size={12} className="spinner-icon" /> : <RefreshCw size={12} />}
</button>
</div>
{windowsError && (
<div className="banner error" style={{ marginBottom: 8 }}>
{windowsError}
</div>
)}
{visibleWindows.length > 0 ? (
<div className="desktop-process-list">
{windows.length !== visibleWindows.length && (
<div className="card-meta">
Showing {visibleWindows.length} top-level windows ({windows.length - visibleWindows.length} helper entries hidden)
</div>
)}
{visibleWindows.map((win) => (
<div key={win.id} className={`desktop-window-item ${win.isActive ? "desktop-window-focused" : ""}`}>
<div style={{ display: "flex", alignItems: "center", justifyContent: "space-between" }}>
<div style={{ minWidth: 0 }}>
<strong style={{ fontSize: 12 }}>{win.title || "(untitled)"}</strong>
{win.isActive && (
<span className="pill success" style={{ marginLeft: 8 }}>
focused
</span>
)}
<div className="mono" style={{ fontSize: 11, color: "var(--muted)", marginTop: 2 }}>
id: {win.id}
{" \u00b7 "}
{win.x},{win.y}
{" \u00b7 "}
{win.width}x{win.height}
</div>
</div>
<div style={{ display: "flex", gap: 4, flexShrink: 0 }}>
<button
className="button ghost small"
title="Focus"
onClick={() => void handleFocusWindow(win.id)}
disabled={windowActing === win.id}
style={{ padding: "4px 6px", fontSize: 10 }}
>
Focus
</button>
<button
className="button ghost small"
title="Move"
onClick={() => {
setEditingWindow({ id: win.id, action: "move" });
setEditX(String(win.x));
setEditY(String(win.y));
}}
style={{ padding: "4px 6px", fontSize: 10 }}
>
Move
</button>
<button
className="button ghost small"
title="Resize"
onClick={() => {
setEditingWindow({ id: win.id, action: "resize" });
setEditW(String(win.width));
setEditH(String(win.height));
}}
style={{ padding: "4px 6px", fontSize: 10 }}
>
Resize
</button>
</div>
</div>
{editingWindow?.id === win.id && editingWindow.action === "move" && (
<div className="desktop-window-editor">
<input
className="setup-input mono"
placeholder="x"
value={editX}
onChange={(event) => setEditX(event.target.value)}
style={{ width: 60 }}
inputMode="numeric"
/>
<input
className="setup-input mono"
placeholder="y"
value={editY}
onChange={(event) => setEditY(event.target.value)}
style={{ width: 60 }}
inputMode="numeric"
/>
<button
className="button success small"
onClick={() => void handleMoveWindow(win.id)}
disabled={windowActing === win.id}
style={{ padding: "4px 8px", fontSize: 10 }}
>
Apply
</button>
<button className="button ghost small" onClick={() => setEditingWindow(null)} style={{ padding: "4px 8px", fontSize: 10 }}>
Cancel
</button>
</div>
)}
{editingWindow?.id === win.id && editingWindow.action === "resize" && (
<div className="desktop-window-editor">
<input
className="setup-input mono"
placeholder="width"
value={editW}
onChange={(event) => setEditW(event.target.value)}
style={{ width: 60 }}
inputMode="numeric"
/>
<input
className="setup-input mono"
placeholder="height"
value={editH}
onChange={(event) => setEditH(event.target.value)}
style={{ width: 60 }}
inputMode="numeric"
/>
<button
className="button success small"
onClick={() => void handleResizeWindow(win.id)}
disabled={windowActing === win.id}
style={{ padding: "4px 8px", fontSize: 10 }}
>
Apply
</button>
<button className="button ghost small" onClick={() => setEditingWindow(null)} style={{ padding: "4px 8px", fontSize: 10 }}>
Cancel
</button>
</div>
)}
</div>
))}
</div>
) : (
<div className="desktop-screenshot-empty">{windowsLoading ? "Loading..." : "No windows detected. Click refresh to update."}</div>
)}
</div>
)}
{isActive && (
<div className="card">
<div className="card-header">
<span className="card-title">
<ExternalLink size={14} style={{ marginRight: 6 }} />
Launch / Open
</span>
</div>
<div className="card-meta">Launch application</div>
<div className="desktop-launch-row">
<input
className="setup-input mono"
placeholder="binary name (e.g. firefox)"
value={launchApp}
onChange={(event) => setLaunchApp(event.target.value)}
style={{ flex: 1 }}
/>
<input
className="setup-input mono"
placeholder="args (optional)"
value={launchArgs}
onChange={(event) => setLaunchArgs(event.target.value)}
style={{ flex: 1 }}
/>
<label className="desktop-checkbox-label">
<input type="checkbox" checked={launchWait} onChange={(event) => setLaunchWait(event.target.checked)} />
Wait
</label>
<button
className="button success small"
onClick={() => void handleLaunchApp()}
disabled={launching || !launchApp.trim()}
style={{ padding: "4px 10px", fontSize: 11 }}
>
{launching ? <Loader2 size={12} className="spinner-icon" /> : <Play size={12} style={{ marginRight: 4 }} />}
Launch
</button>
</div>
{launchError && (
<div className="banner error" style={{ marginTop: 6 }}>
{launchError}
</div>
)}
{launchResult && (
<div className="mono" style={{ fontSize: 11, color: "var(--success)", marginTop: 4 }}>
{launchResult}
</div>
)}
<div className="card-meta" style={{ marginTop: 14 }}>
Open file or URL
</div>
<div className="desktop-launch-row">
<input
className="setup-input mono"
placeholder="https://example.com or /path/to/file"
value={openTarget}
onChange={(event) => setOpenTarget(event.target.value)}
style={{ flex: 1 }}
/>
<button
className="button success small"
onClick={() => void handleOpenTarget()}
disabled={opening || !openTarget.trim()}
style={{ padding: "4px 10px", fontSize: 11 }}
>
{opening ? <Loader2 size={12} className="spinner-icon" /> : <ExternalLink size={12} style={{ marginRight: 4 }} />}
Open
</button>
</div>
{openError && (
<div className="banner error" style={{ marginTop: 6 }}>
{openError}
</div>
)}
{openResult && (
<div className="mono" style={{ fontSize: 11, color: "var(--success)", marginTop: 4 }}>
{openResult}
</div>
)}
</div>
)}
{/* ========== Recording Section ========== */} {/* ========== Recording Section ========== */}
<div className="card"> <div className="card">
<div className="card-header"> <div className="card-header">

View file

@ -14,6 +14,7 @@ export interface DesktopViewerProps {
style?: CSSProperties; style?: CSSProperties;
imageStyle?: CSSProperties; imageStyle?: CSSProperties;
height?: number | string; height?: number | string;
showStatusBar?: boolean;
onConnect?: (status: DesktopStreamReadyStatus) => void; onConnect?: (status: DesktopStreamReadyStatus) => void;
onDisconnect?: () => void; onDisconnect?: () => void;
onError?: (error: DesktopStreamErrorStatus | Error) => void; onError?: (error: DesktopStreamErrorStatus | Error) => void;
@ -76,7 +77,17 @@ const getStatusColor = (state: ConnectionState): string => {
} }
}; };
export const DesktopViewer = ({ client, className, style, imageStyle, height = 480, onConnect, onDisconnect, onError }: DesktopViewerProps) => { export const DesktopViewer = ({
client,
className,
style,
imageStyle,
height = 480,
showStatusBar = true,
onConnect,
onDisconnect,
onError,
}: DesktopViewerProps) => {
const wrapperRef = useRef<HTMLDivElement | null>(null); const wrapperRef = useRef<HTMLDivElement | null>(null);
const videoRef = useRef<HTMLVideoElement | null>(null); const videoRef = useRef<HTMLVideoElement | null>(null);
const sessionRef = useRef<ReturnType<DesktopViewerClient["connectDesktopStream"]> | null>(null); const sessionRef = useRef<ReturnType<DesktopViewerClient["connectDesktopStream"]> | null>(null);
@ -194,10 +205,12 @@ export const DesktopViewer = ({ client, className, style, imageStyle, height = 4
return ( return (
<div className={className} style={{ ...shellStyle, ...style }}> <div className={className} style={{ ...shellStyle, ...style }}>
<div style={statusBarStyle}> {showStatusBar ? (
<span style={{ color: getStatusColor(connectionState) }}>{statusMessage}</span> <div style={statusBarStyle}>
<span style={hintStyle}>{resolution ? `${resolution.width}×${resolution.height}` : "Awaiting stream"}</span> <span style={{ color: getStatusColor(connectionState) }}>{statusMessage}</span>
</div> <span style={hintStyle}>{resolution ? `${resolution.width}×${resolution.height}` : "Awaiting stream"}</span>
</div>
) : null}
<div <div
ref={wrapperRef} ref={wrapperRef}
role="button" role="button"

View file

@ -31,10 +31,15 @@ import {
type AgentInstallResponse, type AgentInstallResponse,
type AgentListResponse, type AgentListResponse,
type DesktopActionResponse, type DesktopActionResponse,
type DesktopClipboardQuery,
type DesktopClipboardResponse,
type DesktopClipboardWriteRequest,
type DesktopDisplayInfoResponse, type DesktopDisplayInfoResponse,
type DesktopKeyboardDownRequest, type DesktopKeyboardDownRequest,
type DesktopKeyboardPressRequest, type DesktopKeyboardPressRequest,
type DesktopKeyboardTypeRequest, type DesktopKeyboardTypeRequest,
type DesktopLaunchRequest,
type DesktopLaunchResponse,
type DesktopMouseClickRequest, type DesktopMouseClickRequest,
type DesktopMouseDownRequest, type DesktopMouseDownRequest,
type DesktopMouseDragRequest, type DesktopMouseDragRequest,
@ -43,6 +48,8 @@ import {
type DesktopMouseScrollRequest, type DesktopMouseScrollRequest,
type DesktopMouseUpRequest, type DesktopMouseUpRequest,
type DesktopKeyboardUpRequest, type DesktopKeyboardUpRequest,
type DesktopOpenRequest,
type DesktopOpenResponse,
type DesktopRecordingInfo, type DesktopRecordingInfo,
type DesktopRecordingListResponse, type DesktopRecordingListResponse,
type DesktopRecordingStartRequest, type DesktopRecordingStartRequest,
@ -51,7 +58,10 @@ import {
type DesktopStartRequest, type DesktopStartRequest,
type DesktopStatusResponse, type DesktopStatusResponse,
type DesktopStreamStatusResponse, type DesktopStreamStatusResponse,
type DesktopWindowInfo,
type DesktopWindowListResponse, type DesktopWindowListResponse,
type DesktopWindowMoveRequest,
type DesktopWindowResizeRequest,
type FsActionResponse, type FsActionResponse,
type FsDeleteQuery, type FsDeleteQuery,
type FsEntriesQuery, type FsEntriesQuery,
@ -1663,6 +1673,54 @@ export class SandboxAgent {
return this.requestJson("GET", `${API_PREFIX}/desktop/windows`); return this.requestJson("GET", `${API_PREFIX}/desktop/windows`);
} }
async getDesktopFocusedWindow(): Promise<DesktopWindowInfo> {
return this.requestJson("GET", `${API_PREFIX}/desktop/windows/focused`);
}
async focusDesktopWindow(windowId: string): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/focus`);
}
async moveDesktopWindow(windowId: string, request: DesktopWindowMoveRequest): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/move`, {
body: request,
});
}
async resizeDesktopWindow(windowId: string, request: DesktopWindowResizeRequest): Promise<DesktopWindowInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/windows/${encodeURIComponent(windowId)}/resize`, {
body: request,
});
}
async getDesktopClipboard(query: DesktopClipboardQuery = {}): Promise<DesktopClipboardResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/clipboard`, {
query,
});
}
async setDesktopClipboard(request: DesktopClipboardWriteRequest): Promise<DesktopActionResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/clipboard`, {
body: request,
});
}
async launchDesktopApp(request: DesktopLaunchRequest): Promise<DesktopLaunchResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/launch`, {
body: request,
});
}
async openDesktopTarget(request: DesktopOpenRequest): Promise<DesktopOpenResponse> {
return this.requestJson("POST", `${API_PREFIX}/desktop/open`, {
body: request,
});
}
async getDesktopStreamStatus(): Promise<DesktopStreamStatusResponse> {
return this.requestJson("GET", `${API_PREFIX}/desktop/stream/status`);
}
async startDesktopRecording(request: DesktopRecordingStartRequest = {}): Promise<DesktopRecordingInfo> { async startDesktopRecording(request: DesktopRecordingStartRequest = {}): Promise<DesktopRecordingInfo> {
return this.requestJson("POST", `${API_PREFIX}/desktop/recording/start`, { return this.requestJson("POST", `${API_PREFIX}/desktop/recording/start`, {
body: request, body: request,

View file

@ -31,6 +31,18 @@ export interface paths {
put: operations["put_v1_config_skills"]; put: operations["put_v1_config_skills"];
delete: operations["delete_v1_config_skills"]; delete: operations["delete_v1_config_skills"];
}; };
"/v1/desktop/clipboard": {
/**
* Read the desktop clipboard.
* @description Returns the current text content of the X11 clipboard.
*/
get: operations["get_v1_desktop_clipboard"];
/**
* Write to the desktop clipboard.
* @description Sets the text content of the X11 clipboard.
*/
post: operations["post_v1_desktop_clipboard"];
};
"/v1/desktop/display/info": { "/v1/desktop/display/info": {
/** /**
* Get desktop display information. * Get desktop display information.
@ -71,6 +83,14 @@ export interface paths {
*/ */
post: operations["post_v1_desktop_keyboard_up"]; post: operations["post_v1_desktop_keyboard_up"];
}; };
"/v1/desktop/launch": {
/**
* Launch a desktop application.
* @description Launches an application by name on the managed desktop, optionally waiting
* for its window to appear.
*/
post: operations["post_v1_desktop_launch"];
};
"/v1/desktop/mouse/click": { "/v1/desktop/mouse/click": {
/** /**
* Click on the desktop. * Click on the desktop.
@ -126,6 +146,13 @@ export interface paths {
*/ */
post: operations["post_v1_desktop_mouse_up"]; post: operations["post_v1_desktop_mouse_up"];
}; };
"/v1/desktop/open": {
/**
* Open a file or URL with the default handler.
* @description Opens a file path or URL using xdg-open on the managed desktop.
*/
post: operations["post_v1_desktop_open"];
};
"/v1/desktop/recording/start": { "/v1/desktop/recording/start": {
/** /**
* Start desktop recording. * Start desktop recording.
@ -208,6 +235,15 @@ export interface paths {
*/ */
post: operations["post_v1_desktop_stop"]; post: operations["post_v1_desktop_stop"];
}; };
"/v1/desktop/stream/signaling": {
/**
* Open a desktop WebRTC signaling session.
* @description Upgrades the connection to a WebSocket used for WebRTC signaling between
* the browser client and the desktop streaming process. Also accepts mouse
* and keyboard input frames as a fallback transport.
*/
get: operations["get_v1_desktop_stream_ws"];
};
"/v1/desktop/stream/start": { "/v1/desktop/stream/start": {
/** /**
* Start desktop streaming. * Start desktop streaming.
@ -215,6 +251,13 @@ export interface paths {
*/ */
post: operations["post_v1_desktop_stream_start"]; post: operations["post_v1_desktop_stream_start"];
}; };
"/v1/desktop/stream/status": {
/**
* Get desktop stream status.
* @description Returns the current state of the desktop WebRTC streaming session.
*/
get: operations["get_v1_desktop_stream_status"];
};
"/v1/desktop/stream/stop": { "/v1/desktop/stream/stop": {
/** /**
* Stop desktop streaming. * Stop desktop streaming.
@ -222,14 +265,6 @@ export interface paths {
*/ */
post: operations["post_v1_desktop_stream_stop"]; post: operations["post_v1_desktop_stream_stop"];
}; };
"/v1/desktop/stream/ws": {
/**
* Open a desktop websocket streaming session.
* @description Upgrades the connection to a websocket that streams JPEG desktop frames and
* accepts mouse and keyboard control frames.
*/
get: operations["get_v1_desktop_stream_ws"];
};
"/v1/desktop/windows": { "/v1/desktop/windows": {
/** /**
* List visible desktop windows. * List visible desktop windows.
@ -238,6 +273,34 @@ export interface paths {
*/ */
get: operations["get_v1_desktop_windows"]; get: operations["get_v1_desktop_windows"];
}; };
"/v1/desktop/windows/focused": {
/**
* Get the currently focused desktop window.
* @description Returns information about the window that currently has input focus.
*/
get: operations["get_v1_desktop_windows_focused"];
};
"/v1/desktop/windows/{id}/focus": {
/**
* Focus a desktop window.
* @description Brings the specified window to the foreground and gives it input focus.
*/
post: operations["post_v1_desktop_window_focus"];
};
"/v1/desktop/windows/{id}/move": {
/**
* Move a desktop window.
* @description Moves the specified window to the given position.
*/
post: operations["post_v1_desktop_window_move"];
};
"/v1/desktop/windows/{id}/resize": {
/**
* Resize a desktop window.
* @description Resizes the specified window to the given dimensions.
*/
post: operations["post_v1_desktop_window_resize"];
};
"/v1/fs/entries": { "/v1/fs/entries": {
get: operations["get_v1_fs_entries"]; get: operations["get_v1_fs_entries"];
}; };
@ -443,6 +506,17 @@ export interface components {
DesktopActionResponse: { DesktopActionResponse: {
ok: boolean; ok: boolean;
}; };
DesktopClipboardQuery: {
selection?: string | null;
};
DesktopClipboardResponse: {
selection: string;
text: string;
};
DesktopClipboardWriteRequest: {
selection?: string | null;
text: string;
};
DesktopDisplayInfoResponse: { DesktopDisplayInfoResponse: {
display: string; display: string;
resolution: components["schemas"]["DesktopResolution"]; resolution: components["schemas"]["DesktopResolution"];
@ -472,6 +546,17 @@ export interface components {
DesktopKeyboardUpRequest: { DesktopKeyboardUpRequest: {
key: string; key: string;
}; };
DesktopLaunchRequest: {
app: string;
args?: string[] | null;
wait?: boolean | null;
};
DesktopLaunchResponse: {
/** Format: int32 */
pid?: number | null;
processId: string;
windowId?: string | null;
};
/** @enum {string} */ /** @enum {string} */
DesktopMouseButton: "left" | "middle" | "right"; DesktopMouseButton: "left" | "middle" | "right";
DesktopMouseClickRequest: { DesktopMouseClickRequest: {
@ -533,6 +618,14 @@ export interface components {
/** Format: int32 */ /** Format: int32 */
y?: number | null; y?: number | null;
}; };
DesktopOpenRequest: {
target: string;
};
DesktopOpenResponse: {
/** Format: int32 */
pid?: number | null;
processId: string;
};
DesktopProcessInfo: { DesktopProcessInfo: {
logPath?: string | null; logPath?: string | null;
name: string; name: string;
@ -567,6 +660,7 @@ export interface components {
quality?: number | null; quality?: number | null;
/** Format: float */ /** Format: float */
scale?: number | null; scale?: number | null;
showCursor?: boolean | null;
/** Format: int32 */ /** Format: int32 */
width: number; width: number;
/** Format: int32 */ /** Format: int32 */
@ -590,13 +684,24 @@ export interface components {
quality?: number | null; quality?: number | null;
/** Format: float */ /** Format: float */
scale?: number | null; scale?: number | null;
showCursor?: boolean | null;
}; };
DesktopStartRequest: { DesktopStartRequest: {
/** Format: int32 */
displayNum?: number | null;
/** Format: int32 */ /** Format: int32 */
dpi?: number | null; dpi?: number | null;
/** Format: int32 */ /** Format: int32 */
height?: number | null; height?: number | null;
/** Format: int32 */ /** Format: int32 */
recordingFps?: number | null;
stateDir?: string | null;
streamAudioCodec?: string | null;
/** Format: int32 */
streamFrameRate?: number | null;
streamVideoCodec?: string | null;
webrtcPortRange?: string | null;
/** Format: int32 */
width?: number | null; width?: number | null;
}; };
/** @enum {string} */ /** @enum {string} */
@ -611,9 +716,13 @@ export interface components {
runtimeLogPath?: string | null; runtimeLogPath?: string | null;
startedAt?: string | null; startedAt?: string | null;
state: components["schemas"]["DesktopState"]; state: components["schemas"]["DesktopState"];
/** @description Current visible windows (included when the desktop is active). */
windows?: components["schemas"]["DesktopWindowInfo"][];
}; };
DesktopStreamStatusResponse: { DesktopStreamStatusResponse: {
active: boolean; active: boolean;
processId?: string | null;
windowId?: string | null;
}; };
DesktopWindowInfo: { DesktopWindowInfo: {
/** Format: int32 */ /** Format: int32 */
@ -631,6 +740,18 @@ export interface components {
DesktopWindowListResponse: { DesktopWindowListResponse: {
windows: components["schemas"]["DesktopWindowInfo"][]; windows: components["schemas"]["DesktopWindowInfo"][];
}; };
DesktopWindowMoveRequest: {
/** Format: int32 */
x: number;
/** Format: int32 */
y: number;
};
DesktopWindowResizeRequest: {
/** Format: int32 */
height: number;
/** Format: int32 */
width: number;
};
/** @enum {string} */ /** @enum {string} */
ErrorType: ErrorType:
| "invalid_request" | "invalid_request"
@ -1231,6 +1352,68 @@ export interface operations {
}; };
}; };
}; };
/**
* Read the desktop clipboard.
* @description Returns the current text content of the X11 clipboard.
*/
get_v1_desktop_clipboard: {
parameters: {
query?: {
selection?: string | null;
};
};
responses: {
/** @description Clipboard contents */
200: {
content: {
"application/json": components["schemas"]["DesktopClipboardResponse"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Clipboard read failed */
500: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/**
* Write to the desktop clipboard.
* @description Sets the text content of the X11 clipboard.
*/
post_v1_desktop_clipboard: {
requestBody: {
content: {
"application/json": components["schemas"]["DesktopClipboardWriteRequest"];
};
};
responses: {
/** @description Clipboard updated */
200: {
content: {
"application/json": components["schemas"]["DesktopActionResponse"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Clipboard write failed */
500: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/** /**
* Get desktop display information. * Get desktop display information.
* @description Performs a health-gated display query against the managed desktop and * @description Performs a health-gated display query against the managed desktop and
@ -1410,6 +1593,38 @@ export interface operations {
}; };
}; };
}; };
/**
* Launch a desktop application.
* @description Launches an application by name on the managed desktop, optionally waiting
* for its window to appear.
*/
post_v1_desktop_launch: {
requestBody: {
content: {
"application/json": components["schemas"]["DesktopLaunchRequest"];
};
};
responses: {
/** @description Application launched */
200: {
content: {
"application/json": components["schemas"]["DesktopLaunchResponse"];
};
};
/** @description Application not found */
404: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/** /**
* Click on the desktop. * Click on the desktop.
* @description Performs a health-gated pointer move and click against the managed desktop * @description Performs a health-gated pointer move and click against the managed desktop
@ -1664,6 +1879,31 @@ export interface operations {
}; };
}; };
}; };
/**
* Open a file or URL with the default handler.
* @description Opens a file path or URL using xdg-open on the managed desktop.
*/
post_v1_desktop_open: {
requestBody: {
content: {
"application/json": components["schemas"]["DesktopOpenRequest"];
};
};
responses: {
/** @description Target opened */
200: {
content: {
"application/json": components["schemas"]["DesktopOpenResponse"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/** /**
* Start desktop recording. * Start desktop recording.
* @description Starts an ffmpeg x11grab recording against the managed desktop and returns * @description Starts an ffmpeg x11grab recording against the managed desktop and returns
@ -1834,6 +2074,7 @@ export interface operations {
format?: components["schemas"]["DesktopScreenshotFormat"] | null; format?: components["schemas"]["DesktopScreenshotFormat"] | null;
quality?: number | null; quality?: number | null;
scale?: number | null; scale?: number | null;
showCursor?: boolean | null;
}; };
}; };
responses: { responses: {
@ -1876,6 +2117,7 @@ export interface operations {
format?: components["schemas"]["DesktopScreenshotFormat"] | null; format?: components["schemas"]["DesktopScreenshotFormat"] | null;
quality?: number | null; quality?: number | null;
scale?: number | null; scale?: number | null;
showCursor?: boolean | null;
}; };
}; };
responses: { responses: {
@ -1990,37 +2232,10 @@ export interface operations {
}; };
}; };
/** /**
* Start desktop streaming. * Open a desktop WebRTC signaling session.
* @description Enables desktop websocket streaming for the managed desktop. * @description Upgrades the connection to a WebSocket used for WebRTC signaling between
*/ * the browser client and the desktop streaming process. Also accepts mouse
post_v1_desktop_stream_start: { * and keyboard input frames as a fallback transport.
responses: {
/** @description Desktop streaming started */
200: {
content: {
"application/json": components["schemas"]["DesktopStreamStatusResponse"];
};
};
};
};
/**
* Stop desktop streaming.
* @description Disables desktop websocket streaming for the managed desktop.
*/
post_v1_desktop_stream_stop: {
responses: {
/** @description Desktop streaming stopped */
200: {
content: {
"application/json": components["schemas"]["DesktopStreamStatusResponse"];
};
};
};
};
/**
* Open a desktop websocket streaming session.
* @description Upgrades the connection to a websocket that streams JPEG desktop frames and
* accepts mouse and keyboard control frames.
*/ */
get_v1_desktop_stream_ws: { get_v1_desktop_stream_ws: {
parameters: { parameters: {
@ -2048,6 +2263,48 @@ export interface operations {
}; };
}; };
}; };
/**
* Start desktop streaming.
* @description Enables desktop websocket streaming for the managed desktop.
*/
post_v1_desktop_stream_start: {
responses: {
/** @description Desktop streaming started */
200: {
content: {
"application/json": components["schemas"]["DesktopStreamStatusResponse"];
};
};
};
};
/**
* Get desktop stream status.
* @description Returns the current state of the desktop WebRTC streaming session.
*/
get_v1_desktop_stream_status: {
responses: {
/** @description Desktop stream status */
200: {
content: {
"application/json": components["schemas"]["DesktopStreamStatusResponse"];
};
};
};
};
/**
* Stop desktop streaming.
* @description Disables desktop websocket streaming for the managed desktop.
*/
post_v1_desktop_stream_stop: {
responses: {
/** @description Desktop streaming stopped */
200: {
content: {
"application/json": components["schemas"]["DesktopStreamStatusResponse"];
};
};
};
};
/** /**
* List visible desktop windows. * List visible desktop windows.
* @description Performs a health-gated visible-window enumeration against the managed * @description Performs a health-gated visible-window enumeration against the managed
@ -2075,6 +2332,138 @@ export interface operations {
}; };
}; };
}; };
/**
* Get the currently focused desktop window.
* @description Returns information about the window that currently has input focus.
*/
get_v1_desktop_windows_focused: {
responses: {
/** @description Focused window info */
200: {
content: {
"application/json": components["schemas"]["DesktopWindowInfo"];
};
};
/** @description No window is focused */
404: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/**
* Focus a desktop window.
* @description Brings the specified window to the foreground and gives it input focus.
*/
post_v1_desktop_window_focus: {
parameters: {
path: {
/** @description X11 window ID */
id: string;
};
};
responses: {
/** @description Window info after focus */
200: {
content: {
"application/json": components["schemas"]["DesktopWindowInfo"];
};
};
/** @description Window not found */
404: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/**
* Move a desktop window.
* @description Moves the specified window to the given position.
*/
post_v1_desktop_window_move: {
parameters: {
path: {
/** @description X11 window ID */
id: string;
};
};
requestBody: {
content: {
"application/json": components["schemas"]["DesktopWindowMoveRequest"];
};
};
responses: {
/** @description Window info after move */
200: {
content: {
"application/json": components["schemas"]["DesktopWindowInfo"];
};
};
/** @description Window not found */
404: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
/**
* Resize a desktop window.
* @description Resizes the specified window to the given dimensions.
*/
post_v1_desktop_window_resize: {
parameters: {
path: {
/** @description X11 window ID */
id: string;
};
};
requestBody: {
content: {
"application/json": components["schemas"]["DesktopWindowResizeRequest"];
};
};
responses: {
/** @description Window info after resize */
200: {
content: {
"application/json": components["schemas"]["DesktopWindowInfo"];
};
};
/** @description Window not found */
404: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
/** @description Desktop runtime is not ready */
409: {
content: {
"application/json": components["schemas"]["ProblemDetails"];
};
};
};
};
get_v1_fs_entries: { get_v1_fs_entries: {
parameters: { parameters: {
query?: { query?: {

View file

@ -36,6 +36,16 @@ export type DesktopRecordingStatus = components["schemas"]["DesktopRecordingStat
export type DesktopRecordingInfo = JsonResponse<operations["post_v1_desktop_recording_start"], 200>; export type DesktopRecordingInfo = JsonResponse<operations["post_v1_desktop_recording_start"], 200>;
export type DesktopRecordingListResponse = JsonResponse<operations["get_v1_desktop_recordings"], 200>; export type DesktopRecordingListResponse = JsonResponse<operations["get_v1_desktop_recordings"], 200>;
export type DesktopStreamStatusResponse = JsonResponse<operations["post_v1_desktop_stream_start"], 200>; export type DesktopStreamStatusResponse = JsonResponse<operations["post_v1_desktop_stream_start"], 200>;
export type DesktopClipboardResponse = JsonResponse<operations["get_v1_desktop_clipboard"], 200>;
export type DesktopClipboardQuery =
QueryParams<operations["get_v1_desktop_clipboard"]> extends never ? Record<string, never> : QueryParams<operations["get_v1_desktop_clipboard"]>;
export type DesktopClipboardWriteRequest = JsonRequestBody<operations["post_v1_desktop_clipboard"]>;
export type DesktopLaunchRequest = JsonRequestBody<operations["post_v1_desktop_launch"]>;
export type DesktopLaunchResponse = JsonResponse<operations["post_v1_desktop_launch"], 200>;
export type DesktopOpenRequest = JsonRequestBody<operations["post_v1_desktop_open"]>;
export type DesktopOpenResponse = JsonResponse<operations["post_v1_desktop_open"], 200>;
export type DesktopWindowMoveRequest = JsonRequestBody<operations["post_v1_desktop_window_move"]>;
export type DesktopWindowResizeRequest = JsonRequestBody<operations["post_v1_desktop_window_resize"]>;
export type AgentListResponse = JsonResponse<operations["get_v1_agents"], 200>; export type AgentListResponse = JsonResponse<operations["get_v1_agents"], 200>;
export type AgentInfo = components["schemas"]["AgentInfo"]; export type AgentInfo = components["schemas"]["AgentInfo"];
export type AgentQuery = QueryParams<operations["get_v1_agents"]>; export type AgentQuery = QueryParams<operations["get_v1_agents"]>;

View file

@ -113,6 +113,40 @@ impl DesktopProblem {
.with_processes(processes) .with_processes(processes)
} }
pub fn window_not_found(message: impl Into<String>) -> Self {
Self::new(404, "Window Not Found", "window_not_found", message)
}
pub fn no_focused_window() -> Self {
Self::new(
404,
"No Focused Window",
"no_focused_window",
"No window currently has focus",
)
}
pub fn stream_already_active(message: impl Into<String>) -> Self {
Self::new(
409,
"Stream Already Active",
"stream_already_active",
message,
)
}
pub fn stream_not_active(message: impl Into<String>) -> Self {
Self::new(409, "Stream Not Active", "stream_not_active", message)
}
pub fn clipboard_failed(message: impl Into<String>) -> Self {
Self::new(500, "Clipboard Failed", "clipboard_failed", message)
}
pub fn app_not_found(message: impl Into<String>) -> Self {
Self::new(404, "App Not Found", "app_not_found", message)
}
pub fn to_problem_details(&self) -> ProblemDetails { pub fn to_problem_details(&self) -> ProblemDetails {
let mut extensions = Map::new(); let mut extensions = Map::new();
extensions.insert("code".to_string(), Value::String(self.code.to_string())); extensions.insert("code".to_string(), Value::String(self.code.to_string()));

View file

@ -74,6 +74,8 @@ struct DesktopRuntimeStateData {
xvfb: Option<ManagedDesktopProcess>, xvfb: Option<ManagedDesktopProcess>,
openbox: Option<ManagedDesktopProcess>, openbox: Option<ManagedDesktopProcess>,
dbus_pid: Option<u32>, dbus_pid: Option<u32>,
streaming_config: Option<crate::desktop_streaming::StreamingConfig>,
recording_fps: Option<u32>,
} }
#[derive(Debug)] #[derive(Debug)]
@ -138,26 +140,10 @@ impl DesktopScreenshotOptions {
impl Default for DesktopRuntimeConfig { impl Default for DesktopRuntimeConfig {
fn default() -> Self { fn default() -> Self {
let display_num = std::env::var("SANDBOX_AGENT_DESKTOP_DISPLAY_NUM")
.ok()
.and_then(|value| value.parse::<i32>().ok())
.filter(|value| *value > 0)
.unwrap_or(DEFAULT_DISPLAY_NUM);
let state_dir = std::env::var("SANDBOX_AGENT_DESKTOP_STATE_DIR")
.ok()
.map(PathBuf::from)
.unwrap_or_else(default_state_dir);
let assume_linux_for_tests = std::env::var("SANDBOX_AGENT_DESKTOP_TEST_ASSUME_LINUX")
.ok()
.map(|value| value == "1" || value.eq_ignore_ascii_case("true"))
.unwrap_or(false);
Self { Self {
state_dir, state_dir: default_state_dir(),
display_num, display_num: DEFAULT_DISPLAY_NUM,
assume_linux_for_tests, assume_linux_for_tests: false,
} }
} }
} }
@ -189,6 +175,8 @@ impl DesktopRuntime {
xvfb: None, xvfb: None,
openbox: None, openbox: None,
dbus_pid: None, dbus_pid: None,
streaming_config: None,
recording_fps: None,
})), })),
config, config,
} }
@ -200,6 +188,15 @@ impl DesktopRuntime {
let mut response = self.snapshot_locked(&state); let mut response = self.snapshot_locked(&state);
drop(state); drop(state);
self.append_neko_process(&mut response).await; self.append_neko_process(&mut response).await;
// Include the current window list when the desktop is active so callers
// get windows for free when polling status (avoids a separate request).
if response.state == DesktopState::Active {
if let Ok(window_list) = self.list_windows().await {
response.windows = window_list.windows;
}
}
response response
} }
@ -248,7 +245,9 @@ impl DesktopRuntime {
let dpi = request.dpi.unwrap_or(DEFAULT_DPI); let dpi = request.dpi.unwrap_or(DEFAULT_DPI);
validate_start_request(width, height, dpi)?; validate_start_request(width, height, dpi)?;
let display_num = self.choose_display_num()?; // Override display_num if provided in request
let display_num =
self.choose_display_num_from(request.display_num.unwrap_or(self.config.display_num))?;
let display = format!(":{display_num}"); let display = format!(":{display_num}");
let resolution = DesktopResolution { let resolution = DesktopResolution {
width, width,
@ -257,6 +256,29 @@ impl DesktopRuntime {
}; };
let environment = self.base_environment(&display)?; let environment = self.base_environment(&display)?;
// Store streaming and recording config for later use
state.streaming_config = if request.stream_video_codec.is_some()
|| request.stream_audio_codec.is_some()
|| request.stream_frame_rate.is_some()
|| request.webrtc_port_range.is_some()
{
Some(crate::desktop_streaming::StreamingConfig {
video_codec: request
.stream_video_codec
.unwrap_or_else(|| "vp8".to_string()),
audio_codec: request
.stream_audio_codec
.unwrap_or_else(|| "opus".to_string()),
frame_rate: request.stream_frame_rate.unwrap_or(30).clamp(1, 60),
webrtc_port_range: request
.webrtc_port_range
.unwrap_or_else(|| "59050-59070".to_string()),
})
} else {
None
};
state.recording_fps = request.recording_fps.map(|fps| fps.clamp(1, 60));
state.state = DesktopState::Starting; state.state = DesktopState::Starting;
state.display_num = display_num; state.display_num = display_num;
state.display = Some(display.clone()); state.display = Some(display.clone());
@ -344,6 +366,8 @@ impl DesktopRuntime {
state.missing_dependencies = self.detect_missing_dependencies(); state.missing_dependencies = self.detect_missing_dependencies();
state.install_command = self.install_command_for(&state.missing_dependencies); state.install_command = self.install_command_for(&state.missing_dependencies);
state.environment.clear(); state.environment.clear();
state.streaming_config = None;
state.recording_fps = None;
let mut response = self.snapshot_locked(&state); let mut response = self.snapshot_locked(&state);
drop(state); drop(state);
@ -360,11 +384,17 @@ impl DesktopRuntime {
query: DesktopScreenshotQuery, query: DesktopScreenshotQuery,
) -> Result<DesktopScreenshotData, DesktopProblem> { ) -> Result<DesktopScreenshotData, DesktopProblem> {
let options = screenshot_options(query.format, query.quality, query.scale)?; let options = screenshot_options(query.format, query.quality, query.scale)?;
let show_cursor = query.show_cursor.unwrap_or(false);
let mut state = self.inner.lock().await; let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?; let ready = self.ensure_ready_locked(&mut state).await?;
let bytes = self let mut bytes = self
.capture_screenshot_locked(&state, Some(&ready), &options) .capture_screenshot_locked(&state, Some(&ready), &options)
.await?; .await?;
if show_cursor {
bytes = self
.composite_cursor(&state, &ready, bytes, &options)
.await?;
}
Ok(DesktopScreenshotData { Ok(DesktopScreenshotData {
bytes, bytes,
content_type: options.content_type(), content_type: options.content_type(),
@ -377,12 +407,27 @@ impl DesktopRuntime {
) -> Result<DesktopScreenshotData, DesktopProblem> { ) -> Result<DesktopScreenshotData, DesktopProblem> {
validate_region(&query)?; validate_region(&query)?;
let options = screenshot_options(query.format, query.quality, query.scale)?; let options = screenshot_options(query.format, query.quality, query.scale)?;
let show_cursor = query.show_cursor.unwrap_or(false);
let mut state = self.inner.lock().await; let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?; let ready = self.ensure_ready_locked(&mut state).await?;
let crop = format!("{}x{}+{}+{}", query.width, query.height, query.x, query.y); let crop = format!("{}x{}+{}+{}", query.width, query.height, query.x, query.y);
let bytes = self let mut bytes = self
.capture_screenshot_with_crop_locked(&state, &ready, &crop, &options) .capture_screenshot_with_crop_locked(&state, &ready, &crop, &options)
.await?; .await?;
if show_cursor {
bytes = self
.composite_cursor_region(
&state,
&ready,
bytes,
&options,
query.x,
query.y,
query.width,
query.height,
)
.await?;
}
Ok(DesktopScreenshotData { Ok(DesktopScreenshotData {
bytes, bytes,
content_type: options.content_type(), content_type: options.content_type(),
@ -598,6 +643,21 @@ impl DesktopRuntime {
let (x, y, width, height) = self let (x, y, width, height) = self
.window_geometry_locked(&state, &ready, &window_id) .window_geometry_locked(&state, &ready, &window_id)
.await?; .await?;
let is_active = active_window_id
.as_deref()
.map(|active| active == window_id)
.unwrap_or(false);
// Filter out noise: window-manager chrome, toolkit internals, and
// invisible helper windows. Always keep the active window so the
// caller can track focus even when the WM itself is focused.
if !is_active {
let trimmed = title.trim();
if trimmed.is_empty() || trimmed == "Openbox" || (width < 120 && height < 80) {
continue;
}
}
windows.push(DesktopWindowInfo { windows.push(DesktopWindowInfo {
id: window_id.clone(), id: window_id.clone(),
title, title,
@ -605,10 +665,7 @@ impl DesktopRuntime {
y, y,
width, width,
height, height,
is_active: active_window_id is_active,
.as_deref()
.map(|active| active == window_id)
.unwrap_or(false),
}); });
} }
Ok(DesktopWindowListResponse { windows }) Ok(DesktopWindowListResponse { windows })
@ -658,9 +715,10 @@ impl DesktopRuntime {
})?; })?;
let environment = state.environment.clone(); let environment = state.environment.clone();
let display = display.to_string(); let display = display.to_string();
let streaming_config = state.streaming_config.clone();
drop(state); drop(state);
self.streaming_manager self.streaming_manager
.start(&display, resolution, &environment) .start(&display, resolution, &environment, streaming_config, None)
.await .await
} }
@ -1503,14 +1561,21 @@ impl DesktopRuntime {
} }
fn choose_display_num(&self) -> Result<i32, DesktopProblem> { fn choose_display_num(&self) -> Result<i32, DesktopProblem> {
self.choose_display_num_from(self.config.display_num)
}
fn choose_display_num_from(&self, start: i32) -> Result<i32, DesktopProblem> {
if start <= 0 {
return Err(DesktopProblem::invalid_action("displayNum must be > 0"));
}
for offset in 0..MAX_DISPLAY_PROBE { for offset in 0..MAX_DISPLAY_PROBE {
let candidate = self.config.display_num + offset; let candidate = start + offset;
if !socket_path(candidate).exists() { if !socket_path(candidate).exists() {
return Ok(candidate); return Ok(candidate);
} }
} }
Err(DesktopProblem::runtime_failed( Err(DesktopProblem::runtime_failed(
"unable to find an available X display starting at :99", format!("unable to find an available X display starting at :{start}"),
None, None,
Vec::new(), Vec::new(),
)) ))
@ -1579,6 +1644,7 @@ impl DesktopRuntime {
install_command: state.install_command.clone(), install_command: state.install_command.clone(),
processes: self.processes_locked(state), processes: self.processes_locked(state),
runtime_log_path: Some(state.runtime_log_path.to_string_lossy().to_string()), runtime_log_path: Some(state.runtime_log_path.to_string_lossy().to_string()),
windows: Vec::new(),
} }
} }
@ -1656,6 +1722,391 @@ impl DesktopRuntime {
.open(&state.runtime_log_path) .open(&state.runtime_log_path)
.and_then(|mut file| std::io::Write::write_all(&mut file, line.as_bytes())); .and_then(|mut file| std::io::Write::write_all(&mut file, line.as_bytes()));
} }
pub async fn get_clipboard(
&self,
selection: Option<String>,
) -> Result<crate::desktop_types::DesktopClipboardResponse, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let sel = selection.unwrap_or_else(|| "clipboard".to_string());
let args = vec!["-selection".to_string(), sel.clone(), "-o".to_string()];
let output = run_command_output("xclip", &args, &ready.environment, INPUT_TIMEOUT)
.await
.map_err(|err| {
DesktopProblem::clipboard_failed(format!("failed to read clipboard: {err}"))
})?;
if !output.status.success() {
// Empty clipboard is not an error
return Ok(crate::desktop_types::DesktopClipboardResponse {
text: String::new(),
selection: sel,
});
}
Ok(crate::desktop_types::DesktopClipboardResponse {
text: String::from_utf8_lossy(&output.stdout).to_string(),
selection: sel,
})
}
pub async fn set_clipboard(
&self,
request: crate::desktop_types::DesktopClipboardWriteRequest,
) -> Result<DesktopActionResponse, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let sel = request.selection.unwrap_or_else(|| "clipboard".to_string());
let selections: Vec<String> = if sel == "both" {
vec!["clipboard".to_string(), "primary".to_string()]
} else {
vec![sel]
};
for selection in &selections {
let args = vec![
"-selection".to_string(),
selection.clone(),
"-i".to_string(),
];
let output = run_command_output_with_stdin(
"xclip",
&args,
&ready.environment,
INPUT_TIMEOUT,
request.text.as_bytes().to_vec(),
)
.await
.map_err(|err| {
DesktopProblem::clipboard_failed(format!("failed to write clipboard: {err}"))
})?;
if !output.status.success() {
return Err(DesktopProblem::clipboard_failed(format!(
"clipboard write failed: {}",
String::from_utf8_lossy(&output.stderr).trim()
)));
}
}
Ok(DesktopActionResponse { ok: true })
}
pub async fn focused_window(&self) -> Result<DesktopWindowInfo, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let window_id = self
.active_window_id_locked(&state, &ready)
.await?
.ok_or_else(DesktopProblem::no_focused_window)?;
let title = self.window_title_locked(&state, &ready, &window_id).await?;
let (x, y, width, height) = self
.window_geometry_locked(&state, &ready, &window_id)
.await?;
Ok(DesktopWindowInfo {
id: window_id,
title,
x,
y,
width,
height,
is_active: true,
})
}
pub async fn focus_window(&self, window_id: &str) -> Result<DesktopWindowInfo, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let args = vec![
"windowactivate".to_string(),
"--sync".to_string(),
window_id.to_string(),
"windowfocus".to_string(),
"--sync".to_string(),
window_id.to_string(),
];
self.run_input_command_locked(&state, &ready, args)
.await
.map_err(|_| {
DesktopProblem::window_not_found(format!("Window {window_id} not found"))
})?;
self.window_info_locked(&state, &ready, window_id).await
}
pub async fn move_window(
&self,
window_id: &str,
request: crate::desktop_types::DesktopWindowMoveRequest,
) -> Result<DesktopWindowInfo, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let args = vec![
"windowmove".to_string(),
window_id.to_string(),
request.x.to_string(),
request.y.to_string(),
];
self.run_input_command_locked(&state, &ready, args)
.await
.map_err(|_| {
DesktopProblem::window_not_found(format!("Window {window_id} not found"))
})?;
self.window_info_locked(&state, &ready, window_id).await
}
pub async fn resize_window(
&self,
window_id: &str,
request: crate::desktop_types::DesktopWindowResizeRequest,
) -> Result<DesktopWindowInfo, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let args = vec![
"windowsize".to_string(),
window_id.to_string(),
request.width.to_string(),
request.height.to_string(),
];
self.run_input_command_locked(&state, &ready, args)
.await
.map_err(|_| {
DesktopProblem::window_not_found(format!("Window {window_id} not found"))
})?;
self.window_info_locked(&state, &ready, window_id).await
}
async fn window_info_locked(
&self,
state: &DesktopRuntimeStateData,
ready: &DesktopReadyContext,
window_id: &str,
) -> Result<DesktopWindowInfo, DesktopProblem> {
let active_id = self.active_window_id_locked(state, ready).await?;
let title = self.window_title_locked(state, ready, window_id).await?;
let (x, y, width, height) = self.window_geometry_locked(state, ready, window_id).await?;
Ok(DesktopWindowInfo {
id: window_id.to_string(),
title,
x,
y,
width,
height,
is_active: active_id
.as_deref()
.map(|a| a == window_id)
.unwrap_or(false),
})
}
pub async fn launch_app(
&self,
request: crate::desktop_types::DesktopLaunchRequest,
) -> Result<crate::desktop_types::DesktopLaunchResponse, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
// Verify the app exists
if find_binary(&request.app).is_none() {
// Also try which via the desktop environment
let check = run_command_output(
"which",
&[request.app.clone()],
&ready.environment,
INPUT_TIMEOUT,
)
.await;
if check.is_err() || !check.as_ref().unwrap().status.success() {
return Err(DesktopProblem::app_not_found(format!(
"Application '{}' not found in PATH",
request.app
)));
}
}
let args = request.args.unwrap_or_default();
let snapshot = self
.process_runtime
.start_process(ProcessStartSpec {
command: request.app.clone(),
args,
cwd: None,
env: ready.environment.clone(),
tty: false,
interactive: false,
owner: ProcessOwner::Desktop,
restart_policy: None,
})
.await
.map_err(|err| {
DesktopProblem::runtime_failed(
format!("failed to launch {}: {err}", request.app),
None,
self.processes_locked(&state),
)
})?;
let mut window_id = None;
if request.wait.unwrap_or(false) {
if let Some(pid) = snapshot.pid {
// Poll for window to appear
let deadline = tokio::time::Instant::now() + Duration::from_secs(5);
let search_args = vec!["search".to_string(), "--pid".to_string(), pid.to_string()];
loop {
let output = run_command_output(
"xdotool",
&search_args,
&ready.environment,
INPUT_TIMEOUT,
)
.await;
if let Ok(ref out) = output {
if out.status.success() {
let id = String::from_utf8_lossy(&out.stdout)
.lines()
.next()
.map(|s| s.trim().to_string());
if id.as_ref().is_some_and(|s| !s.is_empty()) {
window_id = id;
break;
}
}
}
if tokio::time::Instant::now() >= deadline {
break;
}
tokio::time::sleep(Duration::from_millis(200)).await;
}
}
}
Ok(crate::desktop_types::DesktopLaunchResponse {
process_id: snapshot.id,
pid: snapshot.pid,
window_id,
})
}
pub async fn open_target(
&self,
request: crate::desktop_types::DesktopOpenRequest,
) -> Result<crate::desktop_types::DesktopOpenResponse, DesktopProblem> {
let mut state = self.inner.lock().await;
let ready = self.ensure_ready_locked(&mut state).await?;
let snapshot = self
.process_runtime
.start_process(ProcessStartSpec {
command: "xdg-open".to_string(),
args: vec![request.target],
cwd: None,
env: ready.environment.clone(),
tty: false,
interactive: false,
owner: ProcessOwner::Desktop,
restart_policy: None,
})
.await
.map_err(|err| {
DesktopProblem::runtime_failed(
format!("failed to open target: {err}"),
None,
self.processes_locked(&state),
)
})?;
Ok(crate::desktop_types::DesktopOpenResponse {
process_id: snapshot.id,
pid: snapshot.pid,
})
}
async fn composite_cursor(
&self,
state: &DesktopRuntimeStateData,
ready: &DesktopReadyContext,
screenshot_bytes: Vec<u8>,
options: &DesktopScreenshotOptions,
) -> Result<Vec<u8>, DesktopProblem> {
let pos = self.mouse_position_locked(state, ready).await?;
self.draw_cursor_on_image(screenshot_bytes, pos.x, pos.y, options, &ready.environment)
.await
}
async fn composite_cursor_region(
&self,
state: &DesktopRuntimeStateData,
ready: &DesktopReadyContext,
screenshot_bytes: Vec<u8>,
options: &DesktopScreenshotOptions,
region_x: i32,
region_y: i32,
_region_width: u32,
_region_height: u32,
) -> Result<Vec<u8>, DesktopProblem> {
let pos = self.mouse_position_locked(state, ready).await?;
// Adjust cursor position relative to the region
let cursor_x = pos.x - region_x;
let cursor_y = pos.y - region_y;
if cursor_x < 0 || cursor_y < 0 {
// Cursor is outside the region, return screenshot as-is
return Ok(screenshot_bytes);
}
self.draw_cursor_on_image(
screenshot_bytes,
cursor_x,
cursor_y,
options,
&ready.environment,
)
.await
}
async fn draw_cursor_on_image(
&self,
image_bytes: Vec<u8>,
x: i32,
y: i32,
options: &DesktopScreenshotOptions,
environment: &HashMap<String, String>,
) -> Result<Vec<u8>, DesktopProblem> {
// Draw a crosshair cursor using ImageMagick convert
let draw_cmd = format!(
"stroke red stroke-width 2 line {},{},{},{} line {},{},{},{}",
x - 10,
y,
x + 10,
y,
x,
y - 10,
x,
y + 10
);
let args = vec![
"-".to_string(), // read from stdin
"-draw".to_string(),
draw_cmd,
options.output_arg().to_string(),
];
let output = run_command_output_with_stdin(
"convert",
&args,
environment,
SCREENSHOT_TIMEOUT,
image_bytes.clone(),
)
.await
.map_err(|err| {
DesktopProblem::screenshot_failed(
format!("failed to composite cursor: {err}"),
Vec::new(),
)
})?;
if !output.status.success() {
// Fall back to returning the original screenshot without cursor
return Ok(image_bytes);
}
Ok(output.stdout)
}
pub async fn stream_status(&self) -> DesktopStreamStatusResponse {
self.streaming_manager.status().await
}
} }
fn desktop_problem_to_sandbox_error(problem: DesktopProblem) -> SandboxError { fn desktop_problem_to_sandbox_error(problem: DesktopProblem) -> SandboxError {

View file

@ -21,13 +21,32 @@ const NEKO_READY_TIMEOUT: Duration = Duration::from_secs(15);
/// How long between readiness polls. /// How long between readiness polls.
const NEKO_READY_POLL: Duration = Duration::from_millis(300); const NEKO_READY_POLL: Duration = Duration::from_millis(300);
#[derive(Debug, Clone)]
pub struct StreamingConfig {
pub video_codec: String,
pub audio_codec: String,
pub frame_rate: u32,
pub webrtc_port_range: String,
}
impl Default for StreamingConfig {
fn default() -> Self {
Self {
video_codec: "vp8".to_string(),
audio_codec: "opus".to_string(),
frame_rate: 30,
webrtc_port_range: NEKO_EPR.to_string(),
}
}
}
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct DesktopStreamingManager { pub struct DesktopStreamingManager {
inner: Arc<Mutex<DesktopStreamingState>>, inner: Arc<Mutex<DesktopStreamingState>>,
process_runtime: Arc<ProcessRuntime>, process_runtime: Arc<ProcessRuntime>,
} }
#[derive(Debug, Default)] #[derive(Debug)]
struct DesktopStreamingState { struct DesktopStreamingState {
active: bool, active: bool,
process_id: Option<String>, process_id: Option<String>,
@ -37,6 +56,23 @@ struct DesktopStreamingState {
neko_session_cookie: Option<String>, neko_session_cookie: Option<String>,
display: Option<String>, display: Option<String>,
resolution: Option<DesktopResolution>, resolution: Option<DesktopResolution>,
streaming_config: StreamingConfig,
window_id: Option<String>,
}
impl Default for DesktopStreamingState {
fn default() -> Self {
Self {
active: false,
process_id: None,
neko_base_url: None,
neko_session_cookie: None,
display: None,
resolution: None,
streaming_config: StreamingConfig::default(),
window_id: None,
}
}
} }
impl DesktopStreamingManager { impl DesktopStreamingManager {
@ -53,11 +89,18 @@ impl DesktopStreamingManager {
display: &str, display: &str,
resolution: DesktopResolution, resolution: DesktopResolution,
environment: &HashMap<String, String>, environment: &HashMap<String, String>,
config: Option<StreamingConfig>,
window_id: Option<String>,
) -> Result<DesktopStreamStatusResponse, SandboxError> { ) -> Result<DesktopStreamStatusResponse, SandboxError> {
let config = config.unwrap_or_default();
let mut state = self.inner.lock().await; let mut state = self.inner.lock().await;
if state.active { if state.active {
return Ok(DesktopStreamStatusResponse { active: true }); return Ok(DesktopStreamStatusResponse {
active: true,
window_id: state.window_id.clone(),
process_id: state.process_id.clone(),
});
} }
// Stop any stale process. // Stop any stale process.
@ -72,7 +115,10 @@ impl DesktopStreamingManager {
env.insert("DISPLAY".to_string(), display.to_string()); env.insert("DISPLAY".to_string(), display.to_string());
let bind_addr = format!("0.0.0.0:{}", NEKO_INTERNAL_PORT); let bind_addr = format!("0.0.0.0:{}", NEKO_INTERNAL_PORT);
let screen = format!("{}x{}@30", resolution.width, resolution.height); let screen = format!(
"{}x{}@{}",
resolution.width, resolution.height, config.frame_rate
);
let snapshot = self let snapshot = self
.process_runtime .process_runtime
@ -89,11 +135,11 @@ impl DesktopStreamingManager {
"--capture.video.display".to_string(), "--capture.video.display".to_string(),
display.to_string(), display.to_string(),
"--capture.video.codec".to_string(), "--capture.video.codec".to_string(),
"vp8".to_string(), config.video_codec.clone(),
"--capture.audio.codec".to_string(), "--capture.audio.codec".to_string(),
"opus".to_string(), config.audio_codec.clone(),
"--webrtc.epr".to_string(), "--webrtc.epr".to_string(),
NEKO_EPR.to_string(), config.webrtc_port_range.clone(),
"--webrtc.icelite".to_string(), "--webrtc.icelite".to_string(),
"--webrtc.nat1to1".to_string(), "--webrtc.nat1to1".to_string(),
"127.0.0.1".to_string(), "127.0.0.1".to_string(),
@ -117,10 +163,13 @@ impl DesktopStreamingManager {
})?; })?;
let neko_base = format!("http://127.0.0.1:{}", NEKO_INTERNAL_PORT); let neko_base = format!("http://127.0.0.1:{}", NEKO_INTERNAL_PORT);
let process_id_clone = snapshot.id.clone();
state.process_id = Some(snapshot.id.clone()); state.process_id = Some(snapshot.id.clone());
state.neko_base_url = Some(neko_base.clone()); state.neko_base_url = Some(neko_base.clone());
state.display = Some(display.to_string()); state.display = Some(display.to_string());
state.resolution = Some(resolution); state.resolution = Some(resolution);
state.streaming_config = config;
state.window_id = window_id;
state.active = true; state.active = true;
// Drop the lock before waiting for readiness. // Drop the lock before waiting for readiness.
@ -183,7 +232,15 @@ impl DesktopStreamingManager {
state.neko_session_cookie = Some(cookie.clone()); state.neko_session_cookie = Some(cookie.clone());
} }
Ok(DesktopStreamStatusResponse { active: true }) let state = self.inner.lock().await;
let state_window_id = state.window_id.clone();
drop(state);
Ok(DesktopStreamStatusResponse {
active: true,
window_id: state_window_id,
process_id: Some(process_id_clone),
})
} }
/// Stop streaming and tear down neko subprocess. /// Stop streaming and tear down neko subprocess.
@ -200,7 +257,21 @@ impl DesktopStreamingManager {
state.neko_session_cookie = None; state.neko_session_cookie = None;
state.display = None; state.display = None;
state.resolution = None; state.resolution = None;
DesktopStreamStatusResponse { active: false } state.window_id = None;
DesktopStreamStatusResponse {
active: false,
window_id: None,
process_id: None,
}
}
pub async fn status(&self) -> DesktopStreamStatusResponse {
let state = self.inner.lock().await;
DesktopStreamStatusResponse {
active: state.active,
window_id: state.window_id.clone(),
process_id: state.process_id.clone(),
}
} }
pub async fn ensure_active(&self) -> Result<(), SandboxError> { pub async fn ensure_active(&self) -> Result<(), SandboxError> {

View file

@ -60,6 +60,9 @@ pub struct DesktopStatusResponse {
pub processes: Vec<DesktopProcessInfo>, pub processes: Vec<DesktopProcessInfo>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub runtime_log_path: Option<String>, pub runtime_log_path: Option<String>,
/// Current visible windows (included when the desktop is active).
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub windows: Vec<DesktopWindowInfo>,
} }
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)] #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
@ -71,6 +74,20 @@ pub struct DesktopStartRequest {
pub height: Option<u32>, pub height: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub dpi: Option<u32>, pub dpi: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub display_num: Option<i32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub state_dir: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_video_codec: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_audio_codec: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub stream_frame_rate: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub webrtc_port_range: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub recording_fps: Option<u32>,
} }
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)] #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
@ -82,6 +99,8 @@ pub struct DesktopScreenshotQuery {
pub quality: Option<u8>, pub quality: Option<u8>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub scale: Option<f32>, pub scale: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub show_cursor: Option<bool>,
} }
#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)] #[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
@ -105,6 +124,8 @@ pub struct DesktopRegionScreenshotQuery {
pub quality: Option<u8>, pub quality: Option<u8>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub scale: Option<f32>, pub scale: Option<f32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub show_cursor: Option<bool>,
} }
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)] #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
@ -299,4 +320,78 @@ pub struct DesktopRecordingListResponse {
#[serde(rename_all = "camelCase")] #[serde(rename_all = "camelCase")]
pub struct DesktopStreamStatusResponse { pub struct DesktopStreamStatusResponse {
pub active: bool, pub active: bool,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub window_id: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub process_id: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardResponse {
pub text: String,
pub selection: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardQuery {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub selection: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopClipboardWriteRequest {
pub text: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub selection: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopLaunchRequest {
pub app: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub args: Option<Vec<String>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub wait: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopLaunchResponse {
pub process_id: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub window_id: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopOpenRequest {
pub target: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopOpenResponse {
pub process_id: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pid: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowMoveRequest {
pub x: i32,
pub y: i32,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
#[serde(rename_all = "camelCase")]
pub struct DesktopWindowResizeRequest {
pub width: u32,
pub height: u32,
} }

View file

@ -216,6 +216,28 @@ pub fn build_router_with_state(shared: Arc<AppState>) -> (Router, Arc<AppState>)
.route("/desktop/keyboard/up", post(post_v1_desktop_keyboard_up)) .route("/desktop/keyboard/up", post(post_v1_desktop_keyboard_up))
.route("/desktop/display/info", get(get_v1_desktop_display_info)) .route("/desktop/display/info", get(get_v1_desktop_display_info))
.route("/desktop/windows", get(get_v1_desktop_windows)) .route("/desktop/windows", get(get_v1_desktop_windows))
.route(
"/desktop/windows/focused",
get(get_v1_desktop_windows_focused),
)
.route(
"/desktop/windows/:id/focus",
post(post_v1_desktop_window_focus),
)
.route(
"/desktop/windows/:id/move",
post(post_v1_desktop_window_move),
)
.route(
"/desktop/windows/:id/resize",
post(post_v1_desktop_window_resize),
)
.route(
"/desktop/clipboard",
get(get_v1_desktop_clipboard).post(post_v1_desktop_clipboard),
)
.route("/desktop/launch", post(post_v1_desktop_launch))
.route("/desktop/open", post(post_v1_desktop_open))
.route( .route(
"/desktop/recording/start", "/desktop/recording/start",
post(post_v1_desktop_recording_start), post(post_v1_desktop_recording_start),
@ -235,6 +257,7 @@ pub fn build_router_with_state(shared: Arc<AppState>) -> (Router, Arc<AppState>)
) )
.route("/desktop/stream/start", post(post_v1_desktop_stream_start)) .route("/desktop/stream/start", post(post_v1_desktop_stream_start))
.route("/desktop/stream/stop", post(post_v1_desktop_stream_stop)) .route("/desktop/stream/stop", post(post_v1_desktop_stream_stop))
.route("/desktop/stream/status", get(get_v1_desktop_stream_status))
.route("/desktop/stream/signaling", get(get_v1_desktop_stream_ws)) .route("/desktop/stream/signaling", get(get_v1_desktop_stream_ws))
.route("/agents", get(get_v1_agents)) .route("/agents", get(get_v1_agents))
.route("/agents/:agent", get(get_v1_agent)) .route("/agents/:agent", get(get_v1_agent))
@ -405,6 +428,15 @@ pub async fn shutdown_servers(state: &Arc<AppState>) {
post_v1_desktop_keyboard_up, post_v1_desktop_keyboard_up,
get_v1_desktop_display_info, get_v1_desktop_display_info,
get_v1_desktop_windows, get_v1_desktop_windows,
get_v1_desktop_windows_focused,
post_v1_desktop_window_focus,
post_v1_desktop_window_move,
post_v1_desktop_window_resize,
get_v1_desktop_clipboard,
post_v1_desktop_clipboard,
post_v1_desktop_launch,
post_v1_desktop_open,
get_v1_desktop_stream_status,
post_v1_desktop_recording_start, post_v1_desktop_recording_start,
post_v1_desktop_recording_stop, post_v1_desktop_recording_stop,
get_v1_desktop_recordings, get_v1_desktop_recordings,
@ -483,6 +515,15 @@ pub async fn shutdown_servers(state: &Arc<AppState>) {
DesktopRecordingInfo, DesktopRecordingInfo,
DesktopRecordingListResponse, DesktopRecordingListResponse,
DesktopStreamStatusResponse, DesktopStreamStatusResponse,
DesktopClipboardResponse,
DesktopClipboardQuery,
DesktopClipboardWriteRequest,
DesktopLaunchRequest,
DesktopLaunchResponse,
DesktopOpenRequest,
DesktopOpenResponse,
DesktopWindowMoveRequest,
DesktopWindowResizeRequest,
ServerStatus, ServerStatus,
ServerStatusInfo, ServerStatusInfo,
AgentCapabilities, AgentCapabilities,
@ -1029,6 +1070,193 @@ async fn get_v1_desktop_windows(
Ok(Json(windows)) Ok(Json(windows))
} }
/// Get the currently focused desktop window.
///
/// Returns information about the window that currently has input focus.
#[utoipa::path(
get,
path = "/v1/desktop/windows/focused",
tag = "v1",
responses(
(status = 200, description = "Focused window info", body = DesktopWindowInfo),
(status = 404, description = "No window is focused", body = ProblemDetails),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn get_v1_desktop_windows_focused(
State(state): State<Arc<AppState>>,
) -> Result<Json<DesktopWindowInfo>, ApiError> {
let window = state.desktop_runtime().focused_window().await?;
Ok(Json(window))
}
/// Focus a desktop window.
///
/// Brings the specified window to the foreground and gives it input focus.
#[utoipa::path(
post,
path = "/v1/desktop/windows/{id}/focus",
tag = "v1",
params(
("id" = String, Path, description = "X11 window ID")
),
responses(
(status = 200, description = "Window info after focus", body = DesktopWindowInfo),
(status = 404, description = "Window not found", body = ProblemDetails),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn post_v1_desktop_window_focus(
State(state): State<Arc<AppState>>,
Path(id): Path<String>,
) -> Result<Json<DesktopWindowInfo>, ApiError> {
let window = state.desktop_runtime().focus_window(&id).await?;
Ok(Json(window))
}
/// Move a desktop window.
///
/// Moves the specified window to the given position.
#[utoipa::path(
post,
path = "/v1/desktop/windows/{id}/move",
tag = "v1",
params(
("id" = String, Path, description = "X11 window ID")
),
request_body = DesktopWindowMoveRequest,
responses(
(status = 200, description = "Window info after move", body = DesktopWindowInfo),
(status = 404, description = "Window not found", body = ProblemDetails),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn post_v1_desktop_window_move(
State(state): State<Arc<AppState>>,
Path(id): Path<String>,
Json(body): Json<DesktopWindowMoveRequest>,
) -> Result<Json<DesktopWindowInfo>, ApiError> {
let window = state.desktop_runtime().move_window(&id, body).await?;
Ok(Json(window))
}
/// Resize a desktop window.
///
/// Resizes the specified window to the given dimensions.
#[utoipa::path(
post,
path = "/v1/desktop/windows/{id}/resize",
tag = "v1",
params(
("id" = String, Path, description = "X11 window ID")
),
request_body = DesktopWindowResizeRequest,
responses(
(status = 200, description = "Window info after resize", body = DesktopWindowInfo),
(status = 404, description = "Window not found", body = ProblemDetails),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn post_v1_desktop_window_resize(
State(state): State<Arc<AppState>>,
Path(id): Path<String>,
Json(body): Json<DesktopWindowResizeRequest>,
) -> Result<Json<DesktopWindowInfo>, ApiError> {
let window = state.desktop_runtime().resize_window(&id, body).await?;
Ok(Json(window))
}
/// Read the desktop clipboard.
///
/// Returns the current text content of the X11 clipboard.
#[utoipa::path(
get,
path = "/v1/desktop/clipboard",
tag = "v1",
params(DesktopClipboardQuery),
responses(
(status = 200, description = "Clipboard contents", body = DesktopClipboardResponse),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
(status = 500, description = "Clipboard read failed", body = ProblemDetails)
)
)]
async fn get_v1_desktop_clipboard(
State(state): State<Arc<AppState>>,
Query(query): Query<DesktopClipboardQuery>,
) -> Result<Json<DesktopClipboardResponse>, ApiError> {
let clipboard = state
.desktop_runtime()
.get_clipboard(query.selection)
.await?;
Ok(Json(clipboard))
}
/// Write to the desktop clipboard.
///
/// Sets the text content of the X11 clipboard.
#[utoipa::path(
post,
path = "/v1/desktop/clipboard",
tag = "v1",
request_body = DesktopClipboardWriteRequest,
responses(
(status = 200, description = "Clipboard updated", body = DesktopActionResponse),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
(status = 500, description = "Clipboard write failed", body = ProblemDetails)
)
)]
async fn post_v1_desktop_clipboard(
State(state): State<Arc<AppState>>,
Json(body): Json<DesktopClipboardWriteRequest>,
) -> Result<Json<DesktopActionResponse>, ApiError> {
let result = state.desktop_runtime().set_clipboard(body).await?;
Ok(Json(result))
}
/// Launch a desktop application.
///
/// Launches an application by name on the managed desktop, optionally waiting
/// for its window to appear.
#[utoipa::path(
post,
path = "/v1/desktop/launch",
tag = "v1",
request_body = DesktopLaunchRequest,
responses(
(status = 200, description = "Application launched", body = DesktopLaunchResponse),
(status = 404, description = "Application not found", body = ProblemDetails),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn post_v1_desktop_launch(
State(state): State<Arc<AppState>>,
Json(body): Json<DesktopLaunchRequest>,
) -> Result<Json<DesktopLaunchResponse>, ApiError> {
let result = state.desktop_runtime().launch_app(body).await?;
Ok(Json(result))
}
/// Open a file or URL with the default handler.
///
/// Opens a file path or URL using xdg-open on the managed desktop.
#[utoipa::path(
post,
path = "/v1/desktop/open",
tag = "v1",
request_body = DesktopOpenRequest,
responses(
(status = 200, description = "Target opened", body = DesktopOpenResponse),
(status = 409, description = "Desktop runtime is not ready", body = ProblemDetails)
)
)]
async fn post_v1_desktop_open(
State(state): State<Arc<AppState>>,
Json(body): Json<DesktopOpenRequest>,
) -> Result<Json<DesktopOpenResponse>, ApiError> {
let result = state.desktop_runtime().open_target(body).await?;
Ok(Json(result))
}
/// Start desktop recording. /// Start desktop recording.
/// ///
/// Starts an ffmpeg x11grab recording against the managed desktop and returns /// Starts an ffmpeg x11grab recording against the managed desktop and returns
@ -1201,6 +1429,23 @@ async fn post_v1_desktop_stream_stop(
Ok(Json(state.desktop_runtime().stop_streaming().await)) Ok(Json(state.desktop_runtime().stop_streaming().await))
} }
/// Get desktop stream status.
///
/// Returns the current state of the desktop WebRTC streaming session.
#[utoipa::path(
get,
path = "/v1/desktop/stream/status",
tag = "v1",
responses(
(status = 200, description = "Desktop stream status", body = DesktopStreamStatusResponse)
)
)]
async fn get_v1_desktop_stream_status(
State(state): State<Arc<AppState>>,
) -> Result<Json<DesktopStreamStatusResponse>, ApiError> {
Ok(Json(state.desktop_runtime().stream_status().await))
}
/// Open a desktop WebRTC signaling session. /// Open a desktop WebRTC signaling session.
/// ///
/// Upgrades the connection to a WebSocket used for WebRTC signaling between /// Upgrades the connection to a WebSocket used for WebRTC signaling between

View file

@ -0,0 +1,497 @@
/// Integration tests that verify all software documented in docs/common-software.mdx
/// is installed and working inside the sandbox.
///
/// These tests use `docker/test-common-software/Dockerfile` which extends the base
/// test-agent image with all documented software pre-installed.
///
/// KEEP IN SYNC with docs/common-software.mdx and docker/test-common-software/Dockerfile.
///
/// Run with:
/// cargo test -p sandbox-agent --test common_software
use reqwest::header::HeaderMap;
use reqwest::{Method, StatusCode};
use serde_json::{json, Value};
use serial_test::serial;
#[path = "support/docker_common_software.rs"]
mod docker_support;
use docker_support::TestApp;
async fn send_request(
app: &docker_support::DockerApp,
method: Method,
uri: &str,
body: Option<Value>,
) -> (StatusCode, HeaderMap, Vec<u8>) {
let client = reqwest::Client::new();
let mut builder = client.request(method, app.http_url(uri));
let response = if let Some(body) = body {
builder = builder.header("content-type", "application/json");
builder
.body(body.to_string())
.send()
.await
.expect("request")
} else {
builder.send().await.expect("request")
};
let status = response.status();
let headers = response.headers().clone();
let bytes = response.bytes().await.expect("body");
(status, headers, bytes.to_vec())
}
fn parse_json(bytes: &[u8]) -> Value {
if bytes.is_empty() {
Value::Null
} else {
serde_json::from_slice(bytes).expect("valid json")
}
}
/// Run a command inside the sandbox and assert it exits with code 0.
/// Returns the parsed JSON response.
async fn run_ok(app: &docker_support::DockerApp, command: &str, args: &[&str]) -> Value {
run_ok_with_timeout(app, command, args, 30_000).await
}
async fn run_ok_with_timeout(
app: &docker_support::DockerApp,
command: &str,
args: &[&str],
timeout_ms: u64,
) -> Value {
let (status, _, body) = send_request(
app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": command,
"args": args,
"timeoutMs": timeout_ms
})),
)
.await;
assert_eq!(
status,
StatusCode::OK,
"run {command} failed: {}",
String::from_utf8_lossy(&body)
);
let parsed = parse_json(&body);
assert_eq!(
parsed["exitCode"], 0,
"{command} exited with non-zero code.\nstdout: {}\nstderr: {}",
parsed["stdout"], parsed["stderr"]
);
parsed
}
// ---------------------------------------------------------------------------
// Browsers
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn chromium_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "chromium", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Chromium"),
"expected Chromium version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn firefox_esr_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "firefox-esr", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Mozilla Firefox"),
"expected Firefox version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Languages and runtimes
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn nodejs_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "node", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.starts_with('v'),
"expected node version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn npm_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "npm", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn python3_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "python3", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Python 3"),
"expected Python version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn pip3_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "pip3", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn java_is_installed_and_runs() {
let test_app = TestApp::new();
// java --version prints to stdout on modern JDKs
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "java",
"args": ["--version"],
"timeoutMs": 30000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
let combined = format!(
"{}{}",
parsed["stdout"].as_str().unwrap_or(""),
parsed["stderr"].as_str().unwrap_or("")
);
assert!(
combined.contains("openjdk") || combined.contains("OpenJDK") || combined.contains("java"),
"expected Java version string, got: {combined}"
);
}
#[tokio::test]
#[serial]
async fn ruby_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "ruby", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("ruby"),
"expected Ruby version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Databases
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn sqlite3_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "sqlite3", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(!stdout.is_empty(), "expected sqlite3 version output");
}
#[tokio::test]
#[serial]
async fn redis_server_is_installed() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "redis-server", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("Redis") || stdout.contains("redis"),
"expected Redis version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Build tools
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn gcc_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "gcc", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn make_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "make", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn cmake_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "cmake", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn pkg_config_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "pkg-config", &["--version"]).await;
}
// ---------------------------------------------------------------------------
// CLI tools
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn git_is_installed_and_runs() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "git", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("git version"),
"expected git version string, got: {stdout}"
);
}
#[tokio::test]
#[serial]
async fn jq_is_installed_and_runs() {
let test_app = TestApp::new();
// Pipe a simple JSON through jq
let result = run_ok(&test_app.app, "sh", &["-c", "echo '{\"a\":1}' | jq '.a'"]).await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "1", "jq did not parse JSON correctly: {stdout}");
}
#[tokio::test]
#[serial]
async fn tmux_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "tmux", &["-V"]).await;
}
// ---------------------------------------------------------------------------
// Media and graphics
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn ffmpeg_is_installed_and_runs() {
let test_app = TestApp::new();
// ffmpeg prints version to stderr, so just check exit code via -version
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "ffmpeg",
"args": ["-version"],
"timeoutMs": 10000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
let combined = format!(
"{}{}",
parsed["stdout"].as_str().unwrap_or(""),
parsed["stderr"].as_str().unwrap_or("")
);
assert!(
combined.contains("ffmpeg version"),
"expected ffmpeg version string, got: {combined}"
);
}
#[tokio::test]
#[serial]
async fn imagemagick_is_installed() {
let test_app = TestApp::new();
run_ok(&test_app.app, "convert", &["--version"]).await;
}
#[tokio::test]
#[serial]
async fn poppler_pdftoppm_is_installed() {
let test_app = TestApp::new();
// pdftoppm -v prints to stderr and exits 0
let (status, _, body) = send_request(
&test_app.app,
Method::POST,
"/v1/processes/run",
Some(json!({
"command": "pdftoppm",
"args": ["-v"],
"timeoutMs": 10000
})),
)
.await;
assert_eq!(status, StatusCode::OK);
let parsed = parse_json(&body);
assert_eq!(parsed["exitCode"], 0);
}
// ---------------------------------------------------------------------------
// Desktop applications (verify binary exists, don't launch GUI)
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn gimp_is_installed() {
let test_app = TestApp::new();
let result = run_ok(&test_app.app, "gimp", &["--version"]).await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("GIMP") || stdout.contains("gimp") || stdout.contains("Image Manipulation"),
"expected GIMP version string, got: {stdout}"
);
}
// ---------------------------------------------------------------------------
// Functional tests: verify tools actually work, not just that they're present
// ---------------------------------------------------------------------------
#[tokio::test]
#[serial]
async fn python3_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"python3",
&["-c", "import json; print(json.dumps({'ok': True}))"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("python json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn node_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"node",
&["-e", "console.log(JSON.stringify({ok: true}))"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("node json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn ruby_can_run_script() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"ruby",
&["-e", "require 'json'; puts JSON.generate({ok: true})"],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
let parsed: Value = serde_json::from_str(stdout).expect("ruby json output");
assert_eq!(parsed["ok"], true);
}
#[tokio::test]
#[serial]
async fn gcc_can_compile_and_run_hello_world() {
let test_app = TestApp::new();
// Write a C file
run_ok(
&test_app.app,
"sh",
&["-c", r#"printf '#include <stdio.h>\nint main(){printf("hello\\n");return 0;}\n' > /tmp/hello.c"#],
)
.await;
// Compile it
run_ok(&test_app.app, "gcc", &["-o", "/tmp/hello", "/tmp/hello.c"]).await;
// Run it
let result = run_ok(&test_app.app, "/tmp/hello", &[]).await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "hello");
}
#[tokio::test]
#[serial]
async fn sqlite3_can_create_and_query() {
let test_app = TestApp::new();
let result = run_ok(
&test_app.app,
"sh",
&[
"-c",
"sqlite3 /tmp/test.db 'CREATE TABLE t(v TEXT); INSERT INTO t VALUES(\"ok\"); SELECT v FROM t;'",
],
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("").trim();
assert_eq!(stdout, "ok");
}
#[tokio::test]
#[serial]
async fn git_can_init_and_commit() {
let test_app = TestApp::new();
run_ok(
&test_app.app,
"sh",
&[
"-c",
"cd /tmp && mkdir -p testrepo && cd testrepo && git init && git config user.email 'test@test.com' && git config user.name 'Test' && touch file && git add file && git commit -m 'init'",
],
)
.await;
}
#[tokio::test]
#[serial]
async fn chromium_headless_can_dump_dom() {
let test_app = TestApp::new();
// Use headless mode to dump the DOM of a blank page
let result = run_ok_with_timeout(
&test_app.app,
"chromium",
&[
"--headless",
"--no-sandbox",
"--disable-gpu",
"--dump-dom",
"data:text/html,<h1>hello</h1>",
],
30_000,
)
.await;
let stdout = result["stdout"].as_str().unwrap_or("");
assert!(
stdout.contains("hello"),
"expected hello in DOM dump, got: {stdout}"
);
}

View file

@ -0,0 +1,332 @@
/// Docker support for common-software integration tests.
///
/// Builds the `docker/test-common-software/Dockerfile` image (which extends the
/// base test-agent image with pre-installed common software) and provides a
/// `TestApp` that runs a container from it.
///
/// KEEP IN SYNC with docs/common-software.mdx and docker/test-common-software/Dockerfile.
use std::collections::BTreeMap;
use std::io::{Read, Write};
use std::net::TcpStream;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
use tempfile::TempDir;
const CONTAINER_PORT: u16 = 3000;
const DEFAULT_PATH: &str = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";
const BASE_IMAGE_TAG: &str = "sandbox-agent-test:dev";
const COMMON_SOFTWARE_IMAGE_TAG: &str = "sandbox-agent-test-common-software:dev";
static IMAGE_TAG: OnceLock<String> = OnceLock::new();
static DOCKER_BIN: OnceLock<PathBuf> = OnceLock::new();
static CONTAINER_COUNTER: AtomicU64 = AtomicU64::new(0);
#[derive(Clone)]
pub struct DockerApp {
base_url: String,
}
impl DockerApp {
pub fn http_url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
}
pub struct TestApp {
pub app: DockerApp,
_root: TempDir,
container_id: String,
}
impl TestApp {
pub fn new() -> Self {
let root = tempfile::tempdir().expect("create docker test root");
let layout = TestLayout::new(root.path());
layout.create();
let container_id = unique_container_id();
let image = ensure_common_software_image();
let env = build_env(&layout);
let mounts = build_mounts(root.path());
let base_url = run_container(&container_id, &image, &mounts, &env);
Self {
app: DockerApp { base_url },
_root: root,
container_id,
}
}
}
impl Drop for TestApp {
fn drop(&mut self) {
let _ = Command::new(docker_bin())
.args(["rm", "-f", &self.container_id])
.output();
}
}
struct TestLayout {
home: PathBuf,
xdg_data_home: PathBuf,
xdg_state_home: PathBuf,
}
impl TestLayout {
fn new(root: &Path) -> Self {
Self {
home: root.join("home"),
xdg_data_home: root.join("xdg-data"),
xdg_state_home: root.join("xdg-state"),
}
}
fn create(&self) {
for dir in [&self.home, &self.xdg_data_home, &self.xdg_state_home] {
std::fs::create_dir_all(dir).expect("create docker test dir");
}
}
}
fn ensure_base_image() -> String {
let repo_root = repo_root();
let image_tag =
std::env::var("SANDBOX_AGENT_TEST_IMAGE").unwrap_or_else(|_| BASE_IMAGE_TAG.to_string());
let output = Command::new(docker_bin())
.args(["build", "--tag", &image_tag, "--file"])
.arg(
repo_root
.join("docker")
.join("test-agent")
.join("Dockerfile"),
)
.arg(&repo_root)
.output()
.expect("build base test image");
if !output.status.success() {
panic!(
"failed to build base test image: {}",
String::from_utf8_lossy(&output.stderr)
);
}
image_tag
}
fn ensure_common_software_image() -> String {
IMAGE_TAG
.get_or_init(|| {
let base_image = ensure_base_image();
let repo_root = repo_root();
let image_tag = std::env::var("SANDBOX_AGENT_TEST_COMMON_SOFTWARE_IMAGE")
.unwrap_or_else(|_| COMMON_SOFTWARE_IMAGE_TAG.to_string());
let output = Command::new(docker_bin())
.args([
"build",
"--tag",
&image_tag,
"--build-arg",
&format!("BASE_IMAGE={base_image}"),
"--file",
])
.arg(
repo_root
.join("docker")
.join("test-common-software")
.join("Dockerfile"),
)
.arg(&repo_root)
.output()
.expect("build common-software test image");
if !output.status.success() {
panic!(
"failed to build common-software test image: {}",
String::from_utf8_lossy(&output.stderr)
);
}
image_tag
})
.clone()
}
fn build_env(layout: &TestLayout) -> BTreeMap<String, String> {
let mut env = BTreeMap::new();
env.insert(
"HOME".to_string(),
layout.home.to_string_lossy().to_string(),
);
env.insert(
"XDG_DATA_HOME".to_string(),
layout.xdg_data_home.to_string_lossy().to_string(),
);
env.insert(
"XDG_STATE_HOME".to_string(),
layout.xdg_state_home.to_string_lossy().to_string(),
);
env.insert("PATH".to_string(), DEFAULT_PATH.to_string());
env
}
fn build_mounts(root: &Path) -> Vec<PathBuf> {
vec![root.to_path_buf()]
}
fn run_container(
container_id: &str,
image: &str,
mounts: &[PathBuf],
env: &BTreeMap<String, String>,
) -> String {
let mut args = vec![
"run".to_string(),
"-d".to_string(),
"--rm".to_string(),
"--name".to_string(),
container_id.to_string(),
"-p".to_string(),
format!("127.0.0.1::{CONTAINER_PORT}"),
];
if cfg!(target_os = "linux") {
args.push("--add-host".to_string());
args.push("host.docker.internal:host-gateway".to_string());
}
for mount in mounts {
args.push("-v".to_string());
args.push(format!("{}:{}", mount.display(), mount.display()));
}
for (key, value) in env {
args.push("-e".to_string());
args.push(format!("{key}={value}"));
}
args.push(image.to_string());
args.push("server".to_string());
args.push("--host".to_string());
args.push("0.0.0.0".to_string());
args.push("--port".to_string());
args.push(CONTAINER_PORT.to_string());
args.push("--no-token".to_string());
let output = Command::new(docker_bin())
.args(&args)
.output()
.expect("start docker test container");
if !output.status.success() {
panic!(
"failed to start docker test container: {}",
String::from_utf8_lossy(&output.stderr)
);
}
let port_output = Command::new(docker_bin())
.args(["port", container_id, &format!("{CONTAINER_PORT}/tcp")])
.output()
.expect("resolve mapped docker port");
if !port_output.status.success() {
panic!(
"failed to resolve docker test port: {}",
String::from_utf8_lossy(&port_output.stderr)
);
}
let mapping = String::from_utf8(port_output.stdout)
.expect("docker port utf8")
.trim()
.to_string();
let host_port = mapping.rsplit(':').next().expect("mapped host port").trim();
let base_url = format!("http://127.0.0.1:{host_port}");
wait_for_health(&base_url);
base_url
}
fn wait_for_health(base_url: &str) {
let started = SystemTime::now();
loop {
if probe_health(base_url) {
return;
}
if started
.elapsed()
.unwrap_or_else(|_| Duration::from_secs(0))
.gt(&Duration::from_secs(60))
{
panic!("timed out waiting for common-software docker test server");
}
thread::sleep(Duration::from_millis(200));
}
}
fn probe_health(base_url: &str) -> bool {
let address = base_url.strip_prefix("http://").unwrap_or(base_url);
let mut stream = match TcpStream::connect(address) {
Ok(stream) => stream,
Err(_) => return false,
};
let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
let _ = stream.set_write_timeout(Some(Duration::from_secs(2)));
let request =
format!("GET /v1/health HTTP/1.1\r\nHost: {address}\r\nConnection: close\r\n\r\n");
if stream.write_all(request.as_bytes()).is_err() {
return false;
}
let mut response = String::new();
if stream.read_to_string(&mut response).is_err() {
return false;
}
response.starts_with("HTTP/1.1 200") || response.starts_with("HTTP/1.0 200")
}
fn unique_container_id() -> String {
let millis = SystemTime::now()
.duration_since(UNIX_EPOCH)
.map(|value| value.as_millis())
.unwrap_or(0);
let counter = CONTAINER_COUNTER.fetch_add(1, Ordering::Relaxed);
format!(
"sandbox-agent-common-sw-{}-{millis}-{counter}",
std::process::id()
)
}
fn repo_root() -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("../../..")
.canonicalize()
.expect("repo root")
}
fn docker_bin() -> &'static Path {
DOCKER_BIN
.get_or_init(|| {
if let Some(value) = std::env::var_os("SANDBOX_AGENT_TEST_DOCKER_BIN") {
let path = PathBuf::from(value);
if path.exists() {
return path;
}
}
for candidate in [
"/usr/local/bin/docker",
"/opt/homebrew/bin/docker",
"/usr/bin/docker",
] {
let path = PathBuf::from(candidate);
if path.exists() {
return path;
}
}
PathBuf::from("docker")
})
.as_path()
}