diff --git a/.context/attachments/CleanShot 2026-03-08 at 18.53.28@2x.png b/.context/attachments/CleanShot 2026-03-08 at 18.53.28@2x.png new file mode 100644 index 0000000..955a813 Binary files /dev/null and b/.context/attachments/CleanShot 2026-03-08 at 18.53.28@2x.png differ diff --git a/.context/attachments/PR instructions.md b/.context/attachments/PR instructions.md new file mode 100644 index 0000000..e3d2c36 --- /dev/null +++ b/.context/attachments/PR instructions.md @@ -0,0 +1,19 @@ +The user likes the current state of the code. + +There are 27 uncommitted changes. +The current branch is desktop-use. +The target branch is origin/main. + +There is no upstream branch yet. +The user requested a PR. + +Follow these steps to create a PR: + +- If you have any skills related to creating PRs, invoke them now. Instructions there should take precedence over these instructions. +- Run `git diff` to review uncommitted changes +- Commit them. Follow any instructions the user gave you about writing commit messages. +- Push to origin. +- Use `git diff origin/main...` to review the PR diff +- Use `gh pr create --base main` to create a PR onto the target branch. Keep the title under 80 characters. Keep the description under five sentences, unless the user instructed you otherwise. Describe not just changes made in this session but ALL changes in the workspace diff. + +If any of these steps fail, ask the user for help. diff --git a/.context/attachments/Review request-v1.md b/.context/attachments/Review request-v1.md new file mode 100644 index 0000000..0a800c7 --- /dev/null +++ b/.context/attachments/Review request-v1.md @@ -0,0 +1,101 @@ +## Code Review Instructions + +1. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including: + + - The root CLAUDE.md file, if it exists + - Any CLAUDE.md files in directories containing files modified by the workspace diff (use mcp__conductor__GetWorkspaceDiff with stat option) + +2. 
If this workspace has an associated PR, read the title and description (but not the changes). This will be helpful context. + +3. In parallel with step 2, launch a sonnet agent to view the changes, using mcp__conductor__GetWorkspaceDiff, and return a summary of the changes + +4. Launch 4 agents in parallel to independently review the changes using mcp__conductor__GetWorkspaceDiff. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following: + + Agents 1 + 2: CLAUDE.md or AGENTS.md compliance sonnet agents + Audit changes for CLAUDE.md or AGENTS.md compliance in parallel. Note: When evaluating CLAUDE.md or AGENTS.md compliance for a file, you should only consider CLAUDE.md or AGENTS.md files that share a file path with the file or parents. + + Agent 3: Opus bug agent + Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff. + + Agent 4: Opus bug agent + Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code. + + **CRITICAL: We only want HIGH SIGNAL issues.** This means: + + - Objective bugs that will cause incorrect behavior at runtime + - Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken + + We do NOT want: + + - Subjective concerns or "suggestions" + - Style preferences not explicitly required by CLAUDE.md + - Potential issues that "might" be problems + - Anything requiring interpretation or judgment calls + + If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time. + + In addition to the above, each subagent should be told the PR title and description. 
This will help provide context regarding the author's intent. + +5. For each issue found in the previous step, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations. + +6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review. + +7. Post inline comments for each issue using mcp__conductor__DiffComment: + + **IMPORTANT: Only post ONE comment per unique issue.** + +8. Write out a list of issues found, along with the location of the comment. For example: + + + ### **#1 Empty input causes crash** + + If the input field is empty when page loads, the app will crash. + + File: src/ui/Input.tsx + + ### **#2 Dead code** + + The getUserData function is now unused. It should be deleted. 
+ + File: src/core/UserData.ts + + +Use this list when evaluating issues in Steps 5 and 6 (these are false positives, do NOT flag): + +- Pre-existing issues +- Something that appears to be a bug but is actually correct +- Pedantic nitpicks that a senior engineer would not flag +- Issues that a linter will catch (do not run the linter to verify) +- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md or AGENTS.md +- Issues mentioned in CLAUDE.md or AGENTS.md but explicitly silenced in the code (e.g., via a lint ignore comment) + +Notes: + +- All subagents should be explicitly instructed not to post comments themselves. Only you, the main agent, should post comments. +- Do not use the AskUserQuestion tool. Your goal should be to complete the entire review without user intervention. +- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch. +- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md or AGENTS.md rule, include a link to it). + +## Fallback: if you don't have access to subagents + +If you don't have subagents, perform all the steps above yourself sequentially instead of launching agents. Do each review axis (CLAUDE.md compliance, bug scan, introduced problems) yourself, and validate each issue yourself. 
+ +## Fallback: if you don't have access to the workspace diff tool + +If you don't have access to the mcp__conductor__GetWorkspaceDiff tool, use the following git commands to get the diff: + +```bash +# Get the merge base between this branch and the target +MERGE_BASE=$(git merge-base origin/main HEAD) + +# Get the committed diff against the merge base +git diff $MERGE_BASE HEAD + +# Get any uncommitted changes (staged and unstaged) +git diff HEAD +``` + +Review the combination of both outputs: the first shows all committed changes on this branch relative to the target, and the second shows any uncommitted work in progress. + +No need to mention in your report whether or not you used one of the fallback strategies; it's usually irrelevant. + diff --git a/.context/attachments/Review request-v2.md b/.context/attachments/Review request-v2.md new file mode 100644 index 0000000..0a800c7 --- /dev/null +++ b/.context/attachments/Review request-v2.md @@ -0,0 +1,101 @@ +## Code Review Instructions + +1. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including: + + - The root CLAUDE.md file, if it exists + - Any CLAUDE.md files in directories containing files modified by the workspace diff (use mcp__conductor__GetWorkspaceDiff with stat option) + +2. If this workspace has an associated PR, read the title and description (but not the changes). This will be helpful context. + +3. In parallel with step 2, launch a sonnet agent to view the changes, using mcp__conductor__GetWorkspaceDiff, and return a summary of the changes + +4. Launch 4 agents in parallel to independently review the changes using mcp__conductor__GetWorkspaceDiff. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). 
The agents should do the following: + + Agents 1 + 2: CLAUDE.md or AGENTS.md compliance sonnet agents + Audit changes for CLAUDE.md or AGENTS.md compliance in parallel. Note: When evaluating CLAUDE.md or AGENTS.md compliance for a file, you should only consider CLAUDE.md or AGENTS.md files that share a file path with the file or parents. + + Agent 3: Opus bug agent + Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff. + + Agent 4: Opus bug agent + Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code. + + **CRITICAL: We only want HIGH SIGNAL issues.** This means: + + - Objective bugs that will cause incorrect behavior at runtime + - Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken + + We do NOT want: + + - Subjective concerns or "suggestions" + - Style preferences not explicitly required by CLAUDE.md + - Potential issues that "might" be problems + - Anything requiring interpretation or judgment calls + + If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time. + + In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent. + +5. For each issue found in the previous step, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. 
Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations. + +6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review. + +7. Post inline comments for each issue using mcp__conductor__DiffComment: + + **IMPORTANT: Only post ONE comment per unique issue.** + +8. Write out a list of issues found, along with the location of the comment. For example: + + + ### **#1 Empty input causes crash** + + If the input field is empty when page loads, the app will crash. + + File: src/ui/Input.tsx + + ### **#2 Dead code** + + The getUserData function is now unused. It should be deleted. + + File: src/core/UserData.ts + + +Use this list when evaluating issues in Steps 5 and 6 (these are false positives, do NOT flag): + +- Pre-existing issues +- Something that appears to be a bug but is actually correct +- Pedantic nitpicks that a senior engineer would not flag +- Issues that a linter will catch (do not run the linter to verify) +- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md or AGENTS.md +- Issues mentioned in CLAUDE.md or AGENTS.md but explicitly silenced in the code (e.g., via a lint ignore comment) + +Notes: + +- All subagents should be explicitly instructed not to post comments themselves. Only you, the main agent, should post comments. +- Do not use the AskUserQuestion tool. Your goal should be to complete the entire review without user intervention. +- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch. +- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md or AGENTS.md rule, include a link to it). 
+ +## Fallback: if you don't have access to subagents + +If you don't have subagents, perform all the steps above yourself sequentially instead of launching agents. Do each review axis (CLAUDE.md compliance, bug scan, introduced problems) yourself, and validate each issue yourself. + +## Fallback: if you don't have access to the workspace diff tool + +If you don't have access to the mcp__conductor__GetWorkspaceDiff tool, use the following git commands to get the diff: + +```bash +# Get the merge base between this branch and the target +MERGE_BASE=$(git merge-base origin/main HEAD) + +# Get the committed diff against the merge base +git diff $MERGE_BASE HEAD + +# Get any uncommitted changes (staged and unstaged) +git diff HEAD +``` + +Review the combination of both outputs: the first shows all committed changes on this branch relative to the target, and the second shows any uncommitted work in progress. + +No need to mention in your report whether or not you used one of the fallback strategies; it's usually irrelevant. + diff --git a/.context/attachments/Review request-v3.md b/.context/attachments/Review request-v3.md new file mode 100644 index 0000000..0a800c7 --- /dev/null +++ b/.context/attachments/Review request-v3.md @@ -0,0 +1,101 @@ +## Code Review Instructions + +1. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including: + + - The root CLAUDE.md file, if it exists + - Any CLAUDE.md files in directories containing files modified by the workspace diff (use mcp__conductor__GetWorkspaceDiff with stat option) + +2. If this workspace has an associated PR, read the title and description (but not the changes). This will be helpful context. + +3. In parallel with step 2, launch a sonnet agent to view the changes, using mcp__conductor__GetWorkspaceDiff, and return a summary of the changes + +4. Launch 4 agents in parallel to independently review the changes using mcp__conductor__GetWorkspaceDiff. 
Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following: + + Agents 1 + 2: CLAUDE.md or AGENTS.md compliance sonnet agents + Audit changes for CLAUDE.md or AGENTS.md compliance in parallel. Note: When evaluating CLAUDE.md or AGENTS.md compliance for a file, you should only consider CLAUDE.md or AGENTS.md files that share a file path with the file or parents. + + Agent 3: Opus bug agent + Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff. + + Agent 4: Opus bug agent + Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code. + + **CRITICAL: We only want HIGH SIGNAL issues.** This means: + + - Objective bugs that will cause incorrect behavior at runtime + - Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken + + We do NOT want: + + - Subjective concerns or "suggestions" + - Style preferences not explicitly required by CLAUDE.md + - Potential issues that "might" be problems + - Anything requiring interpretation or judgment calls + + If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time. + + In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent. + +5. For each issue found in the previous step, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. 
For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations. + +6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review. + +7. Post inline comments for each issue using mcp__conductor__DiffComment: + + **IMPORTANT: Only post ONE comment per unique issue.** + +8. Write out a list of issues found, along with the location of the comment. For example: + + + ### **#1 Empty input causes crash** + + If the input field is empty when page loads, the app will crash. + + File: src/ui/Input.tsx + + ### **#2 Dead code** + + The getUserData function is now unused. It should be deleted. + + File: src/core/UserData.ts + + +Use this list when evaluating issues in Steps 5 and 6 (these are false positives, do NOT flag): + +- Pre-existing issues +- Something that appears to be a bug but is actually correct +- Pedantic nitpicks that a senior engineer would not flag +- Issues that a linter will catch (do not run the linter to verify) +- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md or AGENTS.md +- Issues mentioned in CLAUDE.md or AGENTS.md but explicitly silenced in the code (e.g., via a lint ignore comment) + +Notes: + +- All subagents should be explicitly instructed not to post comments themselves. Only you, the main agent, should post comments. +- Do not use the AskUserQuestion tool. Your goal should be to complete the entire review without user intervention. +- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch. 
+- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md or AGENTS.md rule, include a link to it). + +## Fallback: if you don't have access to subagents + +If you don't have subagents, perform all the steps above yourself sequentially instead of launching agents. Do each review axis (CLAUDE.md compliance, bug scan, introduced problems) yourself, and validate each issue yourself. + +## Fallback: if you don't have access to the workspace diff tool + +If you don't have access to the mcp__conductor__GetWorkspaceDiff tool, use the following git commands to get the diff: + +```bash +# Get the merge base between this branch and the target +MERGE_BASE=$(git merge-base origin/main HEAD) + +# Get the committed diff against the merge base +git diff $MERGE_BASE HEAD + +# Get any uncommitted changes (staged and unstaged) +git diff HEAD +``` + +Review the combination of both outputs: the first shows all committed changes on this branch relative to the target, and the second shows any uncommitted work in progress. + +No need to mention in your report whether or not you used one of the fallback strategies; it's usually irrelevant. + diff --git a/.context/attachments/Review request.md b/.context/attachments/Review request.md new file mode 100644 index 0000000..0a800c7 --- /dev/null +++ b/.context/attachments/Review request.md @@ -0,0 +1,101 @@ +## Code Review Instructions + +1. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including: + + - The root CLAUDE.md file, if it exists + - Any CLAUDE.md files in directories containing files modified by the workspace diff (use mcp__conductor__GetWorkspaceDiff with stat option) + +2. If this workspace has an associated PR, read the title and description (but not the changes). This will be helpful context. + +3. 
In parallel with step 2, launch a sonnet agent to view the changes, using mcp__conductor__GetWorkspaceDiff, and return a summary of the changes + +4. Launch 4 agents in parallel to independently review the changes using mcp__conductor__GetWorkspaceDiff. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following: + + Agents 1 + 2: CLAUDE.md or AGENTS.md compliance sonnet agents + Audit changes for CLAUDE.md or AGENTS.md compliance in parallel. Note: When evaluating CLAUDE.md or AGENTS.md compliance for a file, you should only consider CLAUDE.md or AGENTS.md files that share a file path with the file or parents. + + Agent 3: Opus bug agent + Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff. + + Agent 4: Opus bug agent + Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code. + + **CRITICAL: We only want HIGH SIGNAL issues.** This means: + + - Objective bugs that will cause incorrect behavior at runtime + - Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken + + We do NOT want: + + - Subjective concerns or "suggestions" + - Style preferences not explicitly required by CLAUDE.md + - Potential issues that "might" be problems + - Anything requiring interpretation or judgment calls + + If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time. + + In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent. + +5. 
For each issue found in the previous step, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations. + +6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review. + +7. Post inline comments for each issue using mcp__conductor__DiffComment: + + **IMPORTANT: Only post ONE comment per unique issue.** + +8. Write out a list of issues found, along with the location of the comment. For example: + + + ### **#1 Empty input causes crash** + + If the input field is empty when page loads, the app will crash. + + File: src/ui/Input.tsx + + ### **#2 Dead code** + + The getUserData function is now unused. It should be deleted. + + File: src/core/UserData.ts + + +Use this list when evaluating issues in Steps 5 and 6 (these are false positives, do NOT flag): + +- Pre-existing issues +- Something that appears to be a bug but is actually correct +- Pedantic nitpicks that a senior engineer would not flag +- Issues that a linter will catch (do not run the linter to verify) +- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md or AGENTS.md +- Issues mentioned in CLAUDE.md or AGENTS.md but explicitly silenced in the code (e.g., via a lint ignore comment) + +Notes: + +- All subagents should be explicitly instructed not to post comments themselves. 
Only you, the main agent, should post comments. +- Do not use the AskUserQuestion tool. Your goal should be to complete the entire review without user intervention. +- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch. +- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md or AGENTS.md rule, include a link to it). + +## Fallback: if you don't have access to subagents + +If you don't have subagents, perform all the steps above yourself sequentially instead of launching agents. Do each review axis (CLAUDE.md compliance, bug scan, introduced problems) yourself, and validate each issue yourself. + +## Fallback: if you don't have access to the workspace diff tool + +If you don't have access to the mcp__conductor__GetWorkspaceDiff tool, use the following git commands to get the diff: + +```bash +# Get the merge base between this branch and the target +MERGE_BASE=$(git merge-base origin/main HEAD) + +# Get the committed diff against the merge base +git diff $MERGE_BASE HEAD + +# Get any uncommitted changes (staged and unstaged) +git diff HEAD +``` + +Review the combination of both outputs: the first shows all committed changes on this branch relative to the target, and the second shows any uncommitted work in progress. + +No need to mention in your report whether or not you used one of the fallback strategies; it's usually irrelevant. + diff --git a/.context/attachments/plan.md b/.context/attachments/plan.md new file mode 100644 index 0000000..2749e27 --- /dev/null +++ b/.context/attachments/plan.md @@ -0,0 +1,215 @@ +# Desktop Computer Use API Enhancements + +## Context + +Competitive analysis of Daytona, Cloudflare Sandbox SDK, and CUA revealed significant gaps in our desktop computer use API. Both Daytona and Cloudflare have or are building screenshot compression, hotkey combos, mouseDown/mouseUp, keyDown/keyUp, per-component process health, and live desktop streaming. 
CUA additionally has window management and accessibility trees. We have none of these. This plan closes the most impactful gaps across 7 tasks. + +## Execution Order + +``` +Sprint 1 (parallel, no dependencies): Tasks 1, 2, 3, 4 +Sprint 2 (foundational refactor): Task 5 +Sprint 3 (parallel, depend on #5): Tasks 6, 7 +``` + +--- + +## Task 1: Unify keyboard press with object modifiers + +**What**: Change `DesktopKeyboardPressRequest` to accept a `modifiers` object instead of requiring DSL strings like `"ctrl+c"`. + +**Files**: +- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopKeyModifiers { ctrl, shift, alt, cmd }` struct (all `Option`). Add `modifiers: Option` to `DesktopKeyboardPressRequest`. +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Modify `press_key_args()` (~line 1349) to build xdotool key string from modifiers object. If modifiers present, construct `"ctrl+shift+a"` style string. `cmd` maps to `super`. +- `server/packages/sandbox-agent/src/router.rs` — Add `DesktopKeyModifiers` to OpenAPI schemas list. +- `docs/openapi.json` — Regenerate. + +**Backward compatible**: Old `{"key": "ctrl+a"}` still works. New form: `{"key": "a", "modifiers": {"ctrl": true}}`. + +**Test**: Unit test that `press_key_args("a", Some({ctrl: true, shift: true}))` produces `["key", "--", "ctrl+shift+a"]`. Integration test with both old and new request shapes. + +--- + +## Task 2: Add mouseDown/mouseUp and keyDown/keyUp endpoints + +**What**: 4 new endpoints for low-level press/release control. 
+ +**Endpoints**: +- `POST /v1/desktop/mouse/down` — `xdotool mousedown BUTTON` (optional x,y moves first) +- `POST /v1/desktop/mouse/up` — `xdotool mouseup BUTTON` +- `POST /v1/desktop/keyboard/down` — `xdotool keydown KEY` +- `POST /v1/desktop/keyboard/up` — `xdotool keyup KEY` + +**Files**: +- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopMouseDownRequest`, `DesktopMouseUpRequest` (x/y optional, button optional), `DesktopKeyboardDownRequest`, `DesktopKeyboardUpRequest` (key: String). +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add 4 public methods following existing `click_mouse()` / `press_key()` patterns. +- `server/packages/sandbox-agent/src/router.rs` — Add 4 routes, 4 handlers with utoipa annotations. +- `sdks/typescript/src/client.ts` — Add `mouseDownDesktop()`, `mouseUpDesktop()`, `keyDownDesktop()`, `keyUpDesktop()`. +- `docs/openapi.json` — Regenerate. + +**Test**: Integration test: mouseDown → mousemove → mouseUp sequence. keyDown → keyUp sequence. + +--- + +## Task 3: Screenshot compression + +**What**: Add format, quality, and scale query params to screenshot endpoints. + +**Params**: `format` (png|jpeg|webp, default png), `quality` (1-100, default 85), `scale` (0.1-1.0, default 1.0). + +**Files**: +- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopScreenshotFormat` enum. Add `format`, `quality`, `scale` fields to `DesktopScreenshotQuery` and `DesktopRegionScreenshotQuery`. +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — After capturing PNG via `import`, pipe through ImageMagick `convert` if format != png or scale != 1.0: `convert png:- -resize {scale*100}% -quality {quality} {format}:-`. Add a `run_command_with_stdin()` helper (or modify existing `run_command_output`) to pipe bytes into a command's stdin. +- `server/packages/sandbox-agent/src/router.rs` — Modify screenshot handlers to pass format/quality/scale, return dynamic `Content-Type` header. 
+- `sdks/typescript/src/client.ts` — Update `takeDesktopScreenshot()` to accept format/quality/scale. +- `docs/openapi.json` — Regenerate. + +**Dependencies**: ImageMagick `convert` already installed in Docker. Verify WebP delegate availability. + +**Test**: Integration tests: request `?format=jpeg&quality=50`, verify `Content-Type: image/jpeg` and JPEG magic bytes. Verify default still returns PNG. Verify `?scale=0.5` returns a smaller image. + +--- + +## Task 4: Window listing API + +**What**: New endpoint to list open windows. + +**Endpoint**: `GET /v1/desktop/windows` + +**Files**: +- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopWindowInfo { id, title, x, y, width, height, is_active }` and `DesktopWindowListResponse`. +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add `list_windows()` method using xdotool (already installed): + 1. `xdotool search --onlyvisible --name ""` → window IDs + 2. `xdotool getwindowname {id}` + `xdotool getwindowgeometry {id}` per window + 3. `xdotool getactivewindow` → is_active flag + 4. Add `parse_window_geometry()` helper. +- `server/packages/sandbox-agent/src/router.rs` — Add route, handler, OpenAPI annotations. +- `sdks/typescript/src/client.ts` — Add `listDesktopWindows()`. +- `docs/openapi.json` — Regenerate. + +**No new Docker dependencies** — xdotool already installed. + +**Test**: Integration test: start desktop, verify `GET /v1/desktop/windows` returns 200 with a list (may be empty if no GUI apps open, which is fine). + +--- + +## Task 5: Unify desktop processes into process runtime with owner flag + +**What**: Desktop processes (Xvfb, openbox, dbus) get registered in the general process runtime with an `owner` field, gaining log streaming, SSE, and unified lifecycle for free. + +**Files**: + +- `server/packages/sandbox-agent/src/process_runtime.rs`: + - Add `ProcessOwner` enum: `User`, `Desktop`, `System`. + - Add `RestartPolicy` enum: `Never`, `Always`, `OnFailure`. 
+ - Add `owner: ProcessOwner` and `restart_policy: Option` to `ProcessStartSpec`, `ManagedProcess`, and `ProcessSnapshot`. + - Modify `list_processes()` to accept optional owner filter. + - Add auto-restart logic in `watch_exit()`: if restart_policy is Always (or OnFailure and exit code != 0), re-spawn the process using stored spec. Need to store the original `ProcessStartSpec` on `ManagedProcess`. + +- `server/packages/sandbox-agent/src/router/types.rs`: + - Add `owner` to `ProcessInfo` response. + - Add `ProcessListQuery { owner: Option }`. + +- `server/packages/sandbox-agent/src/router.rs`: + - Modify `get_v1_processes` to accept `Query` and filter. + - Pass `ProcessRuntime` into `DesktopRuntime::new()`. + - Add `ProcessOwner`, `RestartPolicy` to OpenAPI schemas. + +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — **Major refactor**: + - Remove `ManagedDesktopChild` struct. + - `DesktopRuntime` takes `ProcessRuntime` as constructor param. + - `start_xvfb_locked()` and `start_openbox_locked()` call `process_runtime.start_process(ProcessStartSpec { owner: Desktop, restart_policy: Some(Always), ... })` instead of spawning directly. + - Store returned process IDs in state instead of `Child` handles. + - `stop` calls `process_runtime.stop_process()` / `kill_process()`. + - `processes_locked()` queries process runtime for desktop-owned processes. + - dbus-launch remains a direct one-shot spawn (it's not a long-running process, just produces env vars). + +- `sdks/typescript/src/client.ts` — Add `owner` filter option to `listProcesses()`. +- `docs/openapi.json` — Regenerate. + +**Risks**: +- Lock ordering: desktop runtime holds Mutex, process runtime uses RwLock. Release desktop Mutex before calling process runtime, or restructure. +- `log_path` field in `DesktopProcessInfo` no longer applies (logs are in-memory now). Remove or deprecate. + +**Test**: Integration: start desktop, `GET /v1/processes?owner=desktop` returns Xvfb+openbox. 
`GET /v1/processes?owner=user` excludes them. Desktop process logs are streamable via `GET /v1/processes/{id}/logs?follow=true`. Existing desktop lifecycle tests still pass. + +--- + +## Task 6: Screen recording API (ffmpeg x11grab) + +**What**: 6 endpoints for recording the desktop to MP4. + +**Endpoints**: +- `POST /v1/desktop/recording/start` — Start ffmpeg recording +- `POST /v1/desktop/recording/stop` — Stop recording (SIGTERM → wait → SIGKILL) +- `GET /v1/desktop/recordings` — List recordings +- `GET /v1/desktop/recordings/{id}` — Get recording metadata +- `GET /v1/desktop/recordings/{id}/download` — Serve MP4 file +- `DELETE /v1/desktop/recordings/{id}` — Delete recording + +**Files**: +- **New**: `server/packages/sandbox-agent/src/desktop_recording.rs` — Recording state, ffmpeg process management. `start_recording()` spawns ffmpeg via process runtime (owner=Desktop): `ffmpeg -f x11grab -video_size WxH -i :99 -c:v libx264 -preset ultrafast -r 30 {path}`. Recordings stored in `{state_dir}/recordings/`. +- `server/packages/sandbox-agent/src/desktop_types.rs` — Add recording request/response types. +- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Wire recording manager, expose through desktop runtime. +- `server/packages/sandbox-agent/src/router.rs` — Add 6 routes + handlers. +- `server/packages/sandbox-agent/src/desktop_install.rs` — Add `ffmpeg` to dependency detection (soft: only error when recording is requested). +- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add `ffmpeg` to apt-get. +- `sdks/typescript/src/client.ts` — Add 6 recording methods. +- `docs/openapi.json` — Regenerate. + +**Depends on**: Task 5 (ffmpeg runs as desktop-owned process). + +**Test**: Integration: start desktop → start recording → wait 2s → stop → list → download (verify MP4 magic bytes) → delete. 
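The Task 6 recording command and the "verify MP4 magic bytes" check can be sketched in Rust as below. This is a minimal sketch, not the planned implementation: the helper names `ffmpeg_x11grab_args` and `looks_like_mp4` are hypothetical, the argument list only mirrors the ffmpeg command quoted above, and the check assumes the file starts with an `ftyp` box (bytes 4..8), which is typical for MP4 output but worth confirming against real recordings. In the plan, the arguments would be handed to the process runtime rather than spawned directly.

```rust
/// Hypothetical helper: builds the argument list for the x11grab recording
/// command quoted in the plan. `display` is the X display (e.g. ":99").
fn ffmpeg_x11grab_args(width: u32, height: u32, display: &str, output_path: &str) -> Vec<String> {
    vec![
        "-f".to_string(),
        "x11grab".to_string(),
        "-video_size".to_string(),
        format!("{width}x{height}"),
        "-i".to_string(),
        display.to_string(),
        "-c:v".to_string(),
        "libx264".to_string(),
        "-preset".to_string(),
        "ultrafast".to_string(),
        "-r".to_string(),
        "30".to_string(),
        output_path.to_string(),
    ]
}

/// Hypothetical test helper: returns true when the buffer looks like an MP4,
/// i.e. bytes 4..8 spell "ftyp". Assumes the `ftyp` box comes first.
fn looks_like_mp4(bytes: &[u8]) -> bool {
    bytes.len() >= 8 && &bytes[4..8] == b"ftyp"
}
```

An integration test could call `looks_like_mp4` on the first bytes of the download response body; a PNG or an empty body fails the check.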
+ +--- + +## Task 7: Neko WebRTC desktop streaming + React component + +**What**: Integrate neko for WebRTC desktop streaming, mirroring the ProcessTerminal + Ghostty pattern. + +### Server side + +- **New**: `server/packages/sandbox-agent/src/desktop_streaming.rs` — Manages neko process via process runtime (owner=Desktop). Neko connects to existing Xvfb display, runs GStreamer pipeline for H.264 encoding. +- `server/packages/sandbox-agent/src/router.rs`: + - `GET /v1/desktop/stream/ws` — WebSocket proxy to neko's internal WebSocket. Upgrade request, bridge bidirectionally. + - `POST /v1/desktop/stream/start` / `POST /v1/desktop/stream/stop` — Lifecycle control. +- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add neko binary + GStreamer packages (`gstreamer1.0-plugins-base`, `gstreamer1.0-plugins-good`, `gstreamer1.0-x`, `libgstreamer1.0-0`). Consider making this an optional Docker stage to avoid bloating the base image. + +### TypeScript SDK + +- **New**: `sdks/typescript/src/desktop-stream.ts` — `DesktopStreamSession` class ported from neko's `base.ts` (~500 lines): + - WebSocket for signaling (SDP offer/answer, ICE candidates) + - `RTCPeerConnection` for video stream + - `RTCDataChannel` for binary input (mouse: 7 bytes, keyboard: 11 bytes) + - Events: `onTrack(stream)`, `onConnect()`, `onDisconnect()`, `onError()` +- `sdks/typescript/src/client.ts` — Add `connectDesktopStream()` returning `DesktopStreamSession`, `buildDesktopStreamWebSocketUrl()`, `startDesktopStream()`, `stopDesktopStream()`. +- `sdks/typescript/src/index.ts` — Export `DesktopStreamSession`. + +### React SDK + +- **New**: `sdks/react/src/DesktopViewer.tsx` — Following `ProcessTerminal.tsx` pattern: + ``` + Props: client (Pick), height, className, style, onConnect, onDisconnect, onError + ``` + - `useEffect` → `client.connectDesktopStream()` → wire `onTrack` to `