feat: expand desktop computer-use APIs

2026-04-15 08:03:46 +00:00 · 2026-03-15 17:51:58 -07:00 · 2026-03-15 17:51:58 -07:00 · e638148345
commit e638148345
parent 96dcc3d5f9
43 changed files with 6359 additions and 493 deletions
--- a/.context/attachments/CleanShot
+++ b/.context/attachments/CleanShot
--- a/.context/attachments/PR
+++ b/.context/attachments/PR
@ -0,0 +1,19 @@
+The user likes the current state of the code.
+
+There are 27 uncommitted changes.
+The current branch is desktop-use.
+The target branch is origin/main.
+
+There is no upstream branch yet.
+The user requested a PR.
+
+Follow these steps to create a PR:
+
+- If you have any skills related to creating PRs, invoke them now. Instructions there should take precedence over these instructions.
+- Run `git diff` to review uncommitted changes
+- Commit them. Follow any instructions the user gave you about writing commit messages.
+- Push to origin.
+- Use `git diff origin/main...` to review the PR diff
+- Use `gh pr create --base main` to create a PR onto the target branch. Keep the title under 80 characters. Keep the description under five sentences, unless the user instructed you otherwise. Describe not just changes made in this session but ALL changes in the workspace diff.
+
+If any of these steps fail, ask the user for help.
--- a/.context/attachments/Review
+++ b/.context/attachments/Review
@ -0,0 +1,101 @@
+## Code Review Instructions
+
+1. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including:
+
+    - The root CLAUDE.md file, if it exists
+    - Any CLAUDE.md files in directories containing files modified by the workspace diff (use mcp__conductor__GetWorkspaceDiff with stat option)
+
+2. If this workspace has an associated PR, read the title and description (but not the changes). This will be helpful context.
+
+3. In parallel with step 2, launch a sonnet agent to view the changes, using mcp__conductor__GetWorkspaceDiff, and return a summary of the changes.
+
+4. Launch 4 agents in parallel to independently review the changes using mcp__conductor__GetWorkspaceDiff. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following:
+
+    Agents 1 + 2: CLAUDE.md or AGENTS.md compliance sonnet agents
+    Audit changes for CLAUDE.md or AGENTS.md compliance in parallel. Note: When evaluating CLAUDE.md or AGENTS.md compliance for a file, you should only consider CLAUDE.md or AGENTS.md files that share a file path with the file or parents.
+
+    Agent 3: Opus bug agent
+    Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff.
+
+    Agent 4: Opus bug agent
+    Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code.
+
+    **CRITICAL: We only want HIGH SIGNAL issues.** This means:
+
+    - Objective bugs that will cause incorrect behavior at runtime
+    - Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken
+
+    We do NOT want:
+
+    - Subjective concerns or "suggestions"
+    - Style preferences not explicitly required by CLAUDE.md
+    - Potential issues that "might" be problems
+    - Anything requiring interpretation or judgment calls
+
+    If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time.
+
+    In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent.
+
+5. For each issue found in the previous step, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. Each subagent's job is to confirm, with high confidence, that the stated issue is real. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that this is actually true in the code. Another example would be CLAUDE.md issues: the agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations.
+
+6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review.
+
+7. Post inline comments for each issue using mcp__conductor__DiffComment:
+
+    **IMPORTANT: Only post ONE comment per unique issue.**
+
+8. Write out a list of issues found, along with the location of the comment. For example:
+
+    <example>
+    ### **#1 Empty input causes crash**
+
+    If the input field is empty when the page loads, the app will crash.
+
+    File: src/ui/Input.tsx
+
+    ### **#2 Dead code**
+
+    The getUserData function is now unused. It should be deleted.
+
+    File: src/core/UserData.ts
+    </example>
+
+Use this list when evaluating issues in Steps 5 and 6 (these are false positives, do NOT flag):
+
+-   Pre-existing issues
+-   Something that appears to be a bug but is actually correct
+-   Pedantic nitpicks that a senior engineer would not flag
+-   Issues that a linter will catch (do not run the linter to verify)
+-   General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md or AGENTS.md
+-   Issues mentioned in CLAUDE.md or AGENTS.md but explicitly silenced in the code (e.g., via a lint ignore comment)
+
+Notes:
+
+-   All subagents should be explicitly instructed not to post comments themselves. Only you, the main agent, should post comments.
+-   Do not use the AskUserQuestion tool. Your goal should be to complete the entire review without user intervention.
+-   Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch.
+-   You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md or AGENTS.md rule, include a link to it).
+
+## Fallback: if you don't have access to subagents
+
+If you don't have subagents, perform all the steps above yourself sequentially instead of launching agents. Do each review axis (CLAUDE.md compliance, bug scan, introduced problems) yourself, and validate each issue yourself.
+
+## Fallback: if you don't have access to the workspace diff tool
+
+If you don't have access to the mcp__conductor__GetWorkspaceDiff tool, use the following git commands to get the diff:
+
+```bash
+# Get the merge base between this branch and the target
+MERGE_BASE=$(git merge-base origin/main HEAD)
+
+# Get the committed diff against the merge base
+git diff $MERGE_BASE HEAD
+
+# Get any uncommitted changes (staged and unstaged)
+git diff HEAD
+```
+
+Review the combination of both outputs: the first shows all committed changes on this branch relative to the target, and the second shows any uncommitted work in progress.
+
+No need to mention in your report whether or not you used one of the fallback strategies; it's usually irrelevant.
+
--- a/.context/attachments/plan.md
+++ b/.context/attachments/plan.md
@ -0,0 +1,215 @@
+# Desktop Computer Use API Enhancements
+
+## Context
+
+Competitive analysis of Daytona, Cloudflare Sandbox SDK, and CUA revealed significant gaps in our desktop computer use API. Both Daytona and Cloudflare have or are building screenshot compression, hotkey combos, mouseDown/mouseUp, keyDown/keyUp, per-component process health, and live desktop streaming. CUA additionally has window management and accessibility trees. We have none of these. This plan closes the most impactful gaps across 7 tasks.
+
+## Execution Order
+
+```
+Sprint 1 (parallel, no dependencies):  Tasks 1, 2, 3, 4
+Sprint 2 (foundational refactor):      Task 5
+Sprint 3 (parallel, depend on #5):     Tasks 6, 7
+```
+
+---
+
+## Task 1: Unify keyboard press with object modifiers
+
+**What**: Change `DesktopKeyboardPressRequest` to accept a `modifiers` object instead of requiring DSL strings like `"ctrl+c"`.
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopKeyModifiers { ctrl, shift, alt, cmd }` struct (all `Option<bool>`). Add `modifiers: Option<DesktopKeyModifiers>` to `DesktopKeyboardPressRequest`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Modify `press_key_args()` (~line 1349) to build xdotool key string from modifiers object. If modifiers present, construct `"ctrl+shift+a"` style string. `cmd` maps to `super`.
+- `server/packages/sandbox-agent/src/router.rs` — Add `DesktopKeyModifiers` to OpenAPI schemas list.
+- `docs/openapi.json` — Regenerate.
+
+**Backward compatible**: Old `{"key": "ctrl+a"}` still works. New form: `{"key": "a", "modifiers": {"ctrl": true}}`.
+
+**Test**: Unit test that `press_key_args("a", Some({ctrl: true, shift: true}))` produces `["key", "--", "ctrl+shift+a"]`. Integration test with both old and new request shapes.
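
The modifier-to-key-string mapping described in Task 1 could look roughly like the sketch below. It is only an illustration under assumptions: the struct mirrors the planned `DesktopKeyModifiers`, the function signature is a guess at what `press_key_args()` might take after the change, and the fixed `ctrl`, `shift`, `alt`, `super` ordering is chosen here for determinism rather than taken from the plan.

```rust
/// Hypothetical mirror of the planned `DesktopKeyModifiers` struct.
#[derive(Default)]
pub struct DesktopKeyModifiers {
    pub ctrl: Option<bool>,
    pub shift: Option<bool>,
    pub alt: Option<bool>,
    pub cmd: Option<bool>,
}

/// Builds the argument list for `xdotool key`.
/// Assumed signature; the real `press_key_args()` may differ.
pub fn press_key_args(key: &str, modifiers: Option<&DesktopKeyModifiers>) -> Vec<String> {
    let mut parts: Vec<&str> = Vec::new();
    if let Some(m) = modifiers {
        // Only flags explicitly set to `true` contribute a prefix.
        if m.ctrl == Some(true) {
            parts.push("ctrl");
        }
        if m.shift == Some(true) {
            parts.push("shift");
        }
        if m.alt == Some(true) {
            parts.push("alt");
        }
        if m.cmd == Some(true) {
            // The plan maps `cmd` to xdotool's `super`.
            parts.push("super");
        }
    }
    // The key itself always comes last; an old-style DSL string such as
    // "ctrl+a" passes through unchanged when no modifiers are given.
    parts.push(key);
    vec!["key".to_string(), "--".to_string(), parts.join("+")]
}
```

With `ctrl` and `shift` set and key `"a"`, this yields `["key", "--", "ctrl+shift+a"]`, matching the unit test described above; passing `None` leaves a legacy `"ctrl+a"` string untouched.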
+
+---
+
+## Task 2: Add mouseDown/mouseUp and keyDown/keyUp endpoints
+
+**What**: 4 new endpoints for low-level press/release control.
+
+**Endpoints**:
+- `POST /v1/desktop/mouse/down` — `xdotool mousedown BUTTON` (optional x,y moves first)
+- `POST /v1/desktop/mouse/up` — `xdotool mouseup BUTTON`
+- `POST /v1/desktop/keyboard/down` — `xdotool keydown KEY`
+- `POST /v1/desktop/keyboard/up` — `xdotool keyup KEY`
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopMouseDownRequest`, `DesktopMouseUpRequest` (x/y optional, button optional), `DesktopKeyboardDownRequest`, `DesktopKeyboardUpRequest` (key: String).
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add 4 public methods following existing `click_mouse()` / `press_key()` patterns.
+- `server/packages/sandbox-agent/src/router.rs` — Add 4 routes, 4 handlers with utoipa annotations.
+- `sdks/typescript/src/client.ts` — Add `mouseDownDesktop()`, `mouseUpDesktop()`, `keyDownDesktop()`, `keyUpDesktop()`.
+- `docs/openapi.json` — Regenerate.
+
+**Test**: Integration test: mouseDown → mousemove → mouseUp sequence. keyDown → keyUp sequence.
+
+---
+
+## Task 3: Screenshot compression
+
+**What**: Add format, quality, and scale query params to screenshot endpoints.
+
+**Params**: `format` (png|jpeg|webp, default png), `quality` (1-100, default 85), `scale` (0.1-1.0, default 1.0).
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopScreenshotFormat` enum. Add `format`, `quality`, `scale` fields to `DesktopScreenshotQuery` and `DesktopRegionScreenshotQuery`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — After capturing PNG via `import`, pipe through ImageMagick `convert` if format != png or scale != 1.0: `convert png:- -resize {scale*100}% -quality {quality} {format}:-`. Add a `run_command_with_stdin()` helper (or modify existing `run_command_output`) to pipe bytes into a command's stdin.
+- `server/packages/sandbox-agent/src/router.rs` — Modify screenshot handlers to pass format/quality/scale, return dynamic `Content-Type` header.
+- `sdks/typescript/src/client.ts` — Update `takeDesktopScreenshot()` to accept format/quality/scale.
+- `docs/openapi.json` — Regenerate.
+
+**Dependencies**: ImageMagick `convert` already installed in Docker. Verify WebP delegate availability.
+
+**Test**: Integration tests: request `?format=jpeg&quality=50`, verify `Content-Type: image/jpeg` and JPEG magic bytes. Verify default still returns PNG. Verify `?scale=0.5` returns a smaller image.
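
As a rough sketch of the conversion step in Task 3, the helper below builds the ImageMagick `convert` arguments from the three query parameters and skips conversion entirely for an unscaled PNG. The names `convert_args`, `magick_prefix`, and `content_type` are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption, not something the plan specifies.

```rust
/// Hypothetical mirror of the planned `DesktopScreenshotFormat` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DesktopScreenshotFormat {
    Png,
    Jpeg,
    Webp,
}

impl DesktopScreenshotFormat {
    /// Output coder prefix used in `{format}:-`.
    pub fn magick_prefix(self) -> &'static str {
        match self {
            Self::Png => "png",
            Self::Jpeg => "jpeg",
            Self::Webp => "webp",
        }
    }

    /// Value for the dynamic `Content-Type` response header.
    pub fn content_type(self) -> &'static str {
        match self {
            Self::Png => "image/png",
            Self::Jpeg => "image/jpeg",
            Self::Webp => "image/webp",
        }
    }
}

/// Returns `None` when the captured PNG can be served as-is, otherwise the
/// arguments for `convert png:- -resize N% -quality Q {format}:-`.
pub fn convert_args(
    format: DesktopScreenshotFormat,
    quality: u8,
    scale: f32,
) -> Option<Vec<String>> {
    // Assumption: out-of-range inputs are clamped to the documented ranges.
    let quality = quality.clamp(1, 100);
    let scale = scale.clamp(0.1, 1.0);
    // Work in whole percent to avoid float formatting noise like "10.000001%".
    let percent = (scale * 100.0).round() as u32;

    if format == DesktopScreenshotFormat::Png && percent == 100 {
        return None;
    }
    Some(vec![
        "png:-".to_string(),
        "-resize".to_string(),
        format!("{percent}%"),
        "-quality".to_string(),
        quality.to_string(),
        format!("{}:-", format.magick_prefix()),
    ])
}
```

A caller would pipe the captured PNG bytes into `convert` with these arguments and read the encoded image from stdout, which is where the planned `run_command_with_stdin()` helper would come in.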
+
+---
+
+## Task 4: Window listing API
+
+**What**: New endpoint to list open windows.
+
+**Endpoint**: `GET /v1/desktop/windows`
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopWindowInfo { id, title, x, y, width, height, is_active }` and `DesktopWindowListResponse`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add `list_windows()` method using xdotool (already installed):
+  1. `xdotool search --onlyvisible --name ""` → window IDs
+  2. `xdotool getwindowname {id}` + `xdotool getwindowgeometry {id}` per window
+  3. `xdotool getactivewindow` → is_active flag
+  4. Add `parse_window_geometry()` helper.
+- `server/packages/sandbox-agent/src/router.rs` — Add route, handler, OpenAPI annotations.
+- `sdks/typescript/src/client.ts` — Add `listDesktopWindows()`.
+- `docs/openapi.json` — Regenerate.
+
+**No new Docker dependencies** — xdotool already installed.
+
+**Test**: Integration test: start desktop, verify `GET /v1/desktop/windows` returns 200 with a list (may be empty if no GUI apps open, which is fine).
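
The `parse_window_geometry()` helper from Task 4 might be sketched as follows. This assumes the default (non `--shell`) output of `xdotool getwindowgeometry`, which prints a `Position: X,Y (screen: N)` line and a `Geometry: WxH` line; the `WindowGeometry` struct and the `Option` return type are illustrative choices, not part of the plan.

```rust
/// Hypothetical result type; the plan folds these fields into `DesktopWindowInfo`.
#[derive(Debug, PartialEq)]
pub struct WindowGeometry {
    pub x: i32,
    pub y: i32,
    pub width: u32,
    pub height: u32,
}

/// Parses output shaped like:
///
///   Window 4194307
///     Position: 10,20 (screen: 0)
///     Geometry: 800x600
///
/// Returns `None` if either line is missing or malformed.
pub fn parse_window_geometry(output: &str) -> Option<WindowGeometry> {
    let mut position: Option<(i32, i32)> = None;
    let mut size: Option<(u32, u32)> = None;

    for line in output.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("Position:") {
            // Keep only "X,Y" and drop the trailing "(screen: N)" part.
            let coords = rest.split_whitespace().next()?;
            let (x, y) = coords.split_once(',')?;
            position = Some((x.parse().ok()?, y.parse().ok()?));
        } else if let Some(rest) = line.strip_prefix("Geometry:") {
            let (w, h) = rest.trim().split_once('x')?;
            size = Some((w.parse().ok()?, h.parse().ok()?));
        }
    }

    let (x, y) = position?;
    let (width, height) = size?;
    Some(WindowGeometry { x, y, width, height })
}
```

Positions are parsed as signed integers because a window can sit partly off-screen, while width and height are unsigned.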
+
+---
+
+## Task 5: Unify desktop processes into process runtime with owner flag
+
+**What**: Desktop processes (Xvfb, openbox, dbus) get registered in the general process runtime with an `owner` field, gaining log streaming, SSE, and unified lifecycle for free.
+
+**Files**:
+
+- `server/packages/sandbox-agent/src/process_runtime.rs`:
+  - Add `ProcessOwner` enum: `User`, `Desktop`, `System`.
+  - Add `RestartPolicy` enum: `Never`, `Always`, `OnFailure`.
+  - Add `owner: ProcessOwner` and `restart_policy: Option<RestartPolicy>` to `ProcessStartSpec`, `ManagedProcess`, and `ProcessSnapshot`.
+  - Modify `list_processes()` to accept optional owner filter.
+  - Add auto-restart logic in `watch_exit()`: if restart_policy is Always (or OnFailure and exit code != 0), re-spawn the process using stored spec. Need to store the original `ProcessStartSpec` on `ManagedProcess`.
+
+- `server/packages/sandbox-agent/src/router/types.rs`:
+  - Add `owner` to `ProcessInfo` response.
+  - Add `ProcessListQuery { owner: Option<ProcessOwner> }`.
+
+- `server/packages/sandbox-agent/src/router.rs`:
+  - Modify `get_v1_processes` to accept `Query<ProcessListQuery>` and filter.
+  - Pass `ProcessRuntime` into `DesktopRuntime::new()`.
+  - Add `ProcessOwner`, `RestartPolicy` to OpenAPI schemas.
+
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — **Major refactor**:
+  - Remove `ManagedDesktopChild` struct.
+  - `DesktopRuntime` takes `ProcessRuntime` as constructor param.
+  - `start_xvfb_locked()` and `start_openbox_locked()` call `process_runtime.start_process(ProcessStartSpec { owner: Desktop, restart_policy: Some(Always), ... })` instead of spawning directly.
+  - Store returned process IDs in state instead of `Child` handles.
+  - `stop` calls `process_runtime.stop_process()` / `kill_process()`.
+  - `processes_locked()` queries process runtime for desktop-owned processes.
+  - dbus-launch remains a direct one-shot spawn (it's not a long-running process, just produces env vars).
+
+- `sdks/typescript/src/client.ts` — Add `owner` filter option to `listProcesses()`.
+- `docs/openapi.json` — Regenerate.
+
+**Risks**:
+- Lock ordering: desktop runtime holds Mutex, process runtime uses RwLock. Release desktop Mutex before calling process runtime, or restructure.
+- `log_path` field in `DesktopProcessInfo` no longer applies (logs are in-memory now). Remove or deprecate.
+
+**Test**: Integration: start desktop, `GET /v1/processes?owner=desktop` returns Xvfb+openbox. `GET /v1/processes?owner=user` excludes them. Desktop process logs are streamable via `GET /v1/processes/{id}/logs?follow=true`. Existing desktop lifecycle tests still pass.
+
+---
+
+## Task 6: Screen recording API (ffmpeg x11grab)
+
+**What**: 6 endpoints for recording the desktop to MP4.
+
+**Endpoints**:
+- `POST /v1/desktop/recording/start` — Start ffmpeg recording
+- `POST /v1/desktop/recording/stop` — Stop recording (SIGTERM → wait → SIGKILL)
+- `GET /v1/desktop/recordings` — List recordings
+- `GET /v1/desktop/recordings/{id}` — Get recording metadata
+- `GET /v1/desktop/recordings/{id}/download` — Serve MP4 file
+- `DELETE /v1/desktop/recordings/{id}` — Delete recording
+
+**Files**:
+- **New**: `server/packages/sandbox-agent/src/desktop_recording.rs` — Recording state, ffmpeg process management. `start_recording()` spawns ffmpeg via process runtime (owner=Desktop): `ffmpeg -f x11grab -video_size WxH -i :99 -c:v libx264 -preset ultrafast -r 30 {path}`. Recordings stored in `{state_dir}/recordings/`.
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add recording request/response types.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Wire recording manager, expose through desktop runtime.
+- `server/packages/sandbox-agent/src/router.rs` — Add 6 routes + handlers.
+- `server/packages/sandbox-agent/src/desktop_install.rs` — Add `ffmpeg` to dependency detection (soft: only error when recording is requested).
+- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add `ffmpeg` to apt-get.
+- `sdks/typescript/src/client.ts` — Add 6 recording methods.
+- `docs/openapi.json` — Regenerate.
+
+**Depends on**: Task 5 (ffmpeg runs as desktop-owned process).
+
+**Test**: Integration: start desktop → start recording → wait 2s → stop → list → download (verify MP4 magic bytes) → delete.
+
+---
+
+## Task 7: Neko WebRTC desktop streaming + React component
+
+**What**: Integrate neko for WebRTC desktop streaming, mirroring the ProcessTerminal + Ghostty pattern.
+
+### Server side
+
+- **New**: `server/packages/sandbox-agent/src/desktop_streaming.rs` — Manages neko process via process runtime (owner=Desktop). Neko connects to existing Xvfb display, runs GStreamer pipeline for H.264 encoding.
+- `server/packages/sandbox-agent/src/router.rs`:
+  - `GET /v1/desktop/stream/ws` — WebSocket proxy to neko's internal WebSocket. Upgrade request, bridge bidirectionally.
+  - `POST /v1/desktop/stream/start` / `POST /v1/desktop/stream/stop` — Lifecycle control.
+- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add neko binary + GStreamer packages (`gstreamer1.0-plugins-base`, `gstreamer1.0-plugins-good`, `gstreamer1.0-x`, `libgstreamer1.0-0`). Consider making this an optional Docker stage to avoid bloating the base image.
+
+### TypeScript SDK
+
+- **New**: `sdks/typescript/src/desktop-stream.ts` — `DesktopStreamSession` class ported from neko's `base.ts` (~500 lines):
+  - WebSocket for signaling (SDP offer/answer, ICE candidates)
+  - `RTCPeerConnection` for video stream
+  - `RTCDataChannel` for binary input (mouse: 7 bytes, keyboard: 11 bytes)
+  - Events: `onTrack(stream)`, `onConnect()`, `onDisconnect()`, `onError()`
+- `sdks/typescript/src/client.ts` — Add `connectDesktopStream()` returning `DesktopStreamSession`, `buildDesktopStreamWebSocketUrl()`, `startDesktopStream()`, `stopDesktopStream()`.
+- `sdks/typescript/src/index.ts` — Export `DesktopStreamSession`.
+
+### React SDK
+
+- **New**: `sdks/react/src/DesktopViewer.tsx` — Following `ProcessTerminal.tsx` pattern:
+  ```
+  Props: client (Pick<SandboxAgent, 'connectDesktopStream'>), height, className, style, onConnect, onDisconnect, onError
+  ```
+  - `useEffect` → `client.connectDesktopStream()` → wire `onTrack` to `<video>.srcObject`
+  - Capture mouse events on video element → scale coordinates to desktop resolution → send via DataChannel
+  - Capture keyboard events → send via DataChannel
+  - Connection state indicator
+  - Cleanup: close RTCPeerConnection, close WebSocket
+- `sdks/react/src/index.ts` — Export `DesktopViewer`.
+
+**Depends on**: Task 5 (neko runs as desktop-owned process).
+
+**Test**: Server integration: start stream, connect WebSocket, verify signaling messages flow. React: component mounts/unmounts without errors. Full E2E requires browser (manual initially).
+
+---
+
+## Verification
+
+After all tasks:
+1. `cargo test` — All Rust unit tests pass
+2. `cargo test --test v1_api` — All integration tests pass (requires Docker)
+3. Regenerate `docs/openapi.json` and verify it reflects all new endpoints
+4. Build TypeScript SDK: `cd sdks/typescript && pnpm build`
+5. Build React SDK: `cd sdks/react && pnpm build`
+6. Manual: start desktop, take JPEG screenshot, list windows, record 5s video, stream desktop via DesktopViewer component
--- a/.context/docker-test-image.stamp
+++ b/.context/docker-test-image.stamp
--- a/.context/docker-test-zgvGyf/bin/Xvfb
+++ b/.context/docker-test-zgvGyf/bin/Xvfb
@ -0,0 +1,15 @@
+#!/usr/bin/env sh
+set -eu
+display="${1:-:191}"
+number="${display#:}"
+socket="/tmp/.X11-unix/X${number}"
+mkdir -p /tmp/.X11-unix
+touch "$socket"
+cleanup() {
+  rm -f "$socket"
+  exit 0
+}
+trap cleanup INT TERM EXIT
+while :; do
+  sleep 1
+done
--- a/.context/docker-test-zgvGyf/bin/dbus-launch
+++ b/.context/docker-test-zgvGyf/bin/dbus-launch
@ -0,0 +1,4 @@
+#!/usr/bin/env sh
+set -eu
+echo "DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/sandbox-agent-test-bus"
+echo "DBUS_SESSION_BUS_PID=$$"
--- a/.context/docker-test-zgvGyf/bin/import
+++ b/.context/docker-test-zgvGyf/bin/import
@ -0,0 +1,3 @@
+#!/usr/bin/env sh
+set -eu
+printf '\211PNG\r\n\032\n\000\000\000\rIHDR\000\000\000\001\000\000\000\001\010\006\000\000\000\037\025\304\211\000\000\000\013IDATx\234c\000\001\000\000\005\000\001\r\n-\264\000\000\000\000IEND\256B`\202'
--- a/.context/docker-test-zgvGyf/bin/openbox
+++ b/.context/docker-test-zgvGyf/bin/openbox
@ -0,0 +1,6 @@
+#!/usr/bin/env sh
+set -eu
+trap 'exit 0' INT TERM
+while :; do
+  sleep 1
+done
--- a/.context/docker-test-zgvGyf/bin/xdotool
+++ b/.context/docker-test-zgvGyf/bin/xdotool
@ -0,0 +1,57 @@
+#!/usr/bin/env sh
+set -eu
+state_dir="${SANDBOX_AGENT_DESKTOP_FAKE_STATE_DIR:?missing fake state dir}"
+state_file="${state_dir}/mouse"
+mkdir -p "$state_dir"
+if [ ! -f "$state_file" ]; then
+  printf '0 0\n' > "$state_file"
+fi
+
+read_state() {
+  read -r x y < "$state_file"
+}
+
+write_state() {
+  printf '%s %s\n' "$1" "$2" > "$state_file"
+}
+
+command="${1:-}"
+case "$command" in
+  getmouselocation)
+    read_state
+    printf 'X=%s\nY=%s\nSCREEN=0\nWINDOW=0\n' "$x" "$y"
+    ;;
+  mousemove)
+    shift
+    x="${1:-0}"
+    y="${2:-0}"
+    shift 2 || true
+    while [ "$#" -gt 0 ]; do
+      token="$1"
+      shift
+      case "$token" in
+        mousemove)
+          x="${1:-0}"
+          y="${2:-0}"
+          shift 2 || true
+          ;;
+        mousedown|mouseup)
+          shift 1 || true
+          ;;
+        click)
+          if [ "${1:-}" = "--repeat" ]; then
+            shift 2 || true
+          fi
+          shift 1 || true
+          ;;
+      esac
+    done
+    write_state "$x" "$y"
+    ;;
+  type|key)
+    exit 0
+    ;;
+  *)
+    exit 0
+    ;;
+esac
--- a/.context/docker-test-zgvGyf/bin/xrandr
+++ b/.context/docker-test-zgvGyf/bin/xrandr
@ -0,0 +1,5 @@
+#!/usr/bin/env sh
+set -eu
+cat <<'EOF'
+Screen 0: minimum 1 x 1, current 1440 x 900, maximum 32767 x 32767
+EOF
--- a/.context/docker-test-zgvGyf/xdg-data/Library/Application
+++ b/.context/docker-test-zgvGyf/xdg-data/Library/Application
@ -0,0 +1,111 @@
+#!/usr/bin/env node
+const { createInterface } = require("node:readline");
+
+let nextSession = 0;
+
+function emit(value) {
+  process.stdout.write(JSON.stringify(value) + "\n");
+}
+
+function firstText(prompt) {
+  if (!Array.isArray(prompt)) {
+    return "";
+  }
+
+  for (const block of prompt) {
+    if (block && block.type === "text" && typeof block.text === "string") {
+      return block.text;
+    }
+  }
+
+  return "";
+}
+
+const rl = createInterface({
+  input: process.stdin,
+  crlfDelay: Infinity,
+});
+
+rl.on("line", (line) => {
+  let msg;
+  try {
+    msg = JSON.parse(line);
+  } catch {
+    return;
+  }
+
+  const hasMethod = typeof msg?.method === "string";
+  const hasId = Object.prototype.hasOwnProperty.call(msg, "id");
+  const method = hasMethod ? msg.method : undefined;
+
+  if (method === "session/prompt") {
+    const sessionId = typeof msg?.params?.sessionId === "string" ? msg.params.sessionId : "";
+    const text = firstText(msg?.params?.prompt);
+    emit({
+      jsonrpc: "2.0",
+      method: "session/update",
+      params: {
+        sessionId,
+        update: {
+          sessionUpdate: "agent_message_chunk",
+          content: {
+            type: "text",
+            text: "mock: " + text,
+          },
+        },
+      },
+    });
+  }
+
+  if (!hasMethod || !hasId) {
+    return;
+  }
+
+  if (method === "initialize") {
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        protocolVersion: 1,
+        capabilities: {},
+        serverInfo: {
+          name: "mock-acp-agent",
+          version: "0.0.1",
+        },
+      },
+    });
+    return;
+  }
+
+  if (method === "session/new") {
+    nextSession += 1;
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        sessionId: "mock-session-" + nextSession,
+      },
+    });
+    return;
+  }
+
+  if (method === "session/prompt") {
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        stopReason: "end_turn",
+      },
+    });
+    return;
+  }
+
+  emit({
+    jsonrpc: "2.0",
+    id: msg.id,
+    result: {
+      ok: true,
+      echoedMethod: method,
+    },
+  });
+});
--- a/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/bin/agent_processes/mock-acp
+++ b/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/bin/agent_processes/mock-acp
@ -0,0 +1,111 @@
+#!/usr/bin/env node
+const { createInterface } = require("node:readline");
+
+let nextSession = 0;
+
+function emit(value) {
+  process.stdout.write(JSON.stringify(value) + "\n");
+}
+
+function firstText(prompt) {
+  if (!Array.isArray(prompt)) {
+    return "";
+  }
+
+  for (const block of prompt) {
+    if (block && block.type === "text" && typeof block.text === "string") {
+      return block.text;
+    }
+  }
+
+  return "";
+}
+
+const rl = createInterface({
+  input: process.stdin,
+  crlfDelay: Infinity,
+});
+
+rl.on("line", (line) => {
+  let msg;
+  try {
+    msg = JSON.parse(line);
+  } catch {
+    return;
+  }
+
+  const hasMethod = typeof msg?.method === "string";
+  const hasId = Object.prototype.hasOwnProperty.call(msg, "id");
+  const method = hasMethod ? msg.method : undefined;
+
+  if (method === "session/prompt") {
+    const sessionId = typeof msg?.params?.sessionId === "string" ? msg.params.sessionId : "";
+    const text = firstText(msg?.params?.prompt);
+    emit({
+      jsonrpc: "2.0",
+      method: "session/update",
+      params: {
+        sessionId,
+        update: {
+          sessionUpdate: "agent_message_chunk",
+          content: {
+            type: "text",
+            text: "mock: " + text,
+          },
+        },
+      },
+    });
+  }
+
+  if (!hasMethod || !hasId) {
+    return;
+  }
+
+  if (method === "initialize") {
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        protocolVersion: 1,
+        capabilities: {},
+        serverInfo: {
+          name: "mock-acp-agent",
+          version: "0.0.1",
+        },
+      },
+    });
+    return;
+  }
+
+  if (method === "session/new") {
+    nextSession += 1;
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        sessionId: "mock-session-" + nextSession,
+      },
+    });
+    return;
+  }
+
+  if (method === "session/prompt") {
+    emit({
+      jsonrpc: "2.0",
+      id: msg.id,
+      result: {
+        stopReason: "end_turn",
+      },
+    });
+    return;
+  }
+
+  emit({
+    jsonrpc: "2.0",
+    id: msg.id,
+    result: {
+      ok: true,
+      echoedMethod: method,
+    },
+  });
+});
--- a/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/logs/log-03-08-26
+++ b/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/logs/log-03-08-26
@ -0,0 +1,4 @@
+ts=2026-03-08T07:57:29.140584296Z level=info target=sandbox_agent::telemetry message="anonymous telemetry is enabled, disable with --no-telemetry"
+ts=2026-03-08T07:57:29.141203296Z level=info target=sandbox_agent::cli message="server listening" addr=0.0.0.0:3000
+ts=2026-03-08T07:57:29.298687421Z level=info target=sandbox_agent::router span=http.request span_path=http.request message=request method=GET uri=/v1/health
+ts=2026-03-08T07:57:29.302092338Z level=info target=sandbox_agent::router span=http.request span_path=http.request status="200 OK" latency_ms=3 method=GET uri=/v1/health
--- a/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/telemetry_id
+++ b/.context/docker-test-zgvGyf/xdg-data/sandbox-agent/telemetry_id
@ -0,0 +1 @@
+5a1927c6af3d83586f34112f58e0c8d6
--- a/.context/notes.md
+++ b/.context/notes.md
--- a/.context/plans/desktop-computer-use-api-enhancements.md
+++ b/.context/plans/desktop-computer-use-api-enhancements.md
@ -0,0 +1,215 @@
+# Desktop Computer Use API Enhancements
+
+## Context
+
+Competitive analysis of Daytona, Cloudflare Sandbox SDK, and CUA revealed significant gaps in our desktop computer use API. Both Daytona and Cloudflare have or are building screenshot compression, hotkey combos, mouseDown/mouseUp, keyDown/keyUp, per-component process health, and live desktop streaming. CUA additionally has window management and accessibility trees. We have none of these. This plan closes the most impactful gaps across 7 tasks.
+
+## Execution Order
+
+```
+Sprint 1 (parallel, no dependencies):  Tasks 1, 2, 3, 4
+Sprint 2 (foundational refactor):      Task 5
+Sprint 3 (parallel, depend on #5):     Tasks 6, 7
+```
+
+---
+
+## Task 1: Unify keyboard press with object modifiers
+
+**What**: Change `DesktopKeyboardPressRequest` to accept a `modifiers` object instead of requiring DSL strings like `"ctrl+c"`.
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopKeyModifiers { ctrl, shift, alt, cmd }` struct (all `Option<bool>`). Add `modifiers: Option<DesktopKeyModifiers>` to `DesktopKeyboardPressRequest`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Modify `press_key_args()` (~line 1349) to build xdotool key string from modifiers object. If modifiers present, construct `"ctrl+shift+a"` style string. `cmd` maps to `super`.
+- `server/packages/sandbox-agent/src/router.rs` — Add `DesktopKeyModifiers` to OpenAPI schemas list.
+- `docs/openapi.json` — Regenerate.
+
+**Backward compatible**: Old `{"key": "ctrl+a"}` still works. New form: `{"key": "a", "modifiers": {"ctrl": true}}`.
+
+**Test**: Unit test that `press_key_args("a", Some({ctrl: true, shift: true}))` produces `["key", "--", "ctrl+shift+a"]`. Integration test with both old and new request shapes.
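The modifier-to-key-string mapping in Task 1 can be sketched as follows. This is a minimal sketch, not the actual `press_key_args()` in `desktop_runtime.rs`: the struct layout follows the plan, while the fixed ctrl, shift, alt, super ordering and the `Vec<String>` return type are assumptions made for illustration.

```rust
/// Hypothetical mirror of the plan's `DesktopKeyModifiers` (all fields optional).
#[derive(Default, Clone, Copy)]
struct DesktopKeyModifiers {
    ctrl: Option<bool>,
    shift: Option<bool>,
    alt: Option<bool>,
    cmd: Option<bool>,
}

/// Build the argument list passed to xdotool for a key press.
/// A key that already contains a DSL string such as "ctrl+a" passes through unchanged.
fn press_key_args(key: &str, modifiers: Option<DesktopKeyModifiers>) -> Vec<String> {
    let mut parts: Vec<&str> = Vec::new();
    if let Some(m) = modifiers {
        // Assumed fixed order so the output is deterministic and easy to test.
        if m.ctrl == Some(true) {
            parts.push("ctrl");
        }
        if m.shift == Some(true) {
            parts.push("shift");
        }
        if m.alt == Some(true) {
            parts.push("alt");
        }
        // Per the plan, `cmd` maps to xdotool's `super`.
        if m.cmd == Some(true) {
            parts.push("super");
        }
    }
    parts.push(key);
    vec!["key".to_string(), "--".to_string(), parts.join("+")]
}
```

Because the legacy form carries its modifiers inside `key`, calling the function with `modifiers = None` keeps old requests working without a separate code path.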
+
+---
+
+## Task 2: Add mouseDown/mouseUp and keyDown/keyUp endpoints
+
+**What**: 4 new endpoints for low-level press/release control.
+
+**Endpoints**:
+- `POST /v1/desktop/mouse/down` — `xdotool mousedown BUTTON` (optional x,y moves first)
+- `POST /v1/desktop/mouse/up` — `xdotool mouseup BUTTON`
+- `POST /v1/desktop/keyboard/down` — `xdotool keydown KEY`
+- `POST /v1/desktop/keyboard/up` — `xdotool keyup KEY`
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopMouseDownRequest`, `DesktopMouseUpRequest` (x/y optional, button optional), `DesktopKeyboardDownRequest`, `DesktopKeyboardUpRequest` (key: String).
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add 4 public methods following existing `click_mouse()` / `press_key()` patterns.
+- `server/packages/sandbox-agent/src/router.rs` — Add 4 routes, 4 handlers with utoipa annotations.
+- `sdks/typescript/src/client.ts` — Add `mouseDownDesktop()`, `mouseUpDesktop()`, `keyDownDesktop()`, `keyUpDesktop()`.
+- `docs/openapi.json` — Regenerate.
+
+**Test**: Integration test: mouseDown → mousemove → mouseUp sequence. keyDown → keyUp sequence.
+
+---
+
+## Task 3: Screenshot compression
+
+**What**: Add format, quality, and scale query params to screenshot endpoints.
+
+**Params**: `format` (png|jpeg|webp, default png), `quality` (1-100, default 85), `scale` (0.1-1.0, default 1.0).
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopScreenshotFormat` enum. Add `format`, `quality`, `scale` fields to `DesktopScreenshotQuery` and `DesktopRegionScreenshotQuery`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — After capturing PNG via `import`, pipe through ImageMagick `convert` if format != png or scale != 1.0: `convert png:- -resize {scale*100}% -quality {quality} {format}:-`. Add a `run_command_with_stdin()` helper (or modify existing `run_command_output`) to pipe bytes into a command's stdin.
+- `server/packages/sandbox-agent/src/router.rs` — Modify screenshot handlers to pass format/quality/scale, return dynamic `Content-Type` header.
+- `sdks/typescript/src/client.ts` — Update `takeDesktopScreenshot()` to accept format/quality/scale.
+- `docs/openapi.json` — Regenerate.
+
+**Dependencies**: ImageMagick `convert` already installed in Docker. Verify WebP delegate availability.
+
+**Test**: Integration tests: request `?format=jpeg&quality=50`, verify `Content-Type: image/jpeg` and JPEG magic bytes. Verify default still returns PNG. Verify `?scale=0.5` returns a smaller image.
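One way to validate the Task 3 parameters and derive the `convert` arguments is sketched below. The function name `convert_args`, the `String` error type, and rounding the scale to a whole percent are assumptions for illustration; only the parameter ranges and the `convert png:- ... {format}:-` shape come from the plan.

```rust
/// Hypothetical version of the plan's `DesktopScreenshotFormat` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DesktopScreenshotFormat {
    Png,
    Jpeg,
    Webp,
}

impl DesktopScreenshotFormat {
    /// Format prefix used in ImageMagick's `{format}:-` output spec.
    fn magick_name(self) -> &'static str {
        match self {
            Self::Png => "png",
            Self::Jpeg => "jpeg",
            Self::Webp => "webp",
        }
    }

    /// Value for the dynamic `Content-Type` response header.
    fn content_type(self) -> &'static str {
        match self {
            Self::Png => "image/png",
            Self::Jpeg => "image/jpeg",
            Self::Webp => "image/webp",
        }
    }
}

/// Returns `Ok(None)` when the captured PNG can be returned as-is,
/// otherwise the arguments to pass to `convert` (program name excluded).
fn convert_args(
    format: DesktopScreenshotFormat,
    quality: u8,
    scale: f32,
) -> Result<Option<Vec<String>>, String> {
    if !(1..=100).contains(&quality) {
        return Err(format!("quality must be 1-100, got {quality}"));
    }
    // NaN fails this check too, since it is not inside any range.
    if !(0.1..=1.0).contains(&scale) {
        return Err(format!("scale must be 0.1-1.0, got {scale}"));
    }
    if format == DesktopScreenshotFormat::Png && scale == 1.0 {
        return Ok(None);
    }
    // Assumption: round to a whole percent to keep the argument tidy.
    let percent = (scale * 100.0).round() as u32;
    Ok(Some(vec![
        "png:-".to_string(),
        "-resize".to_string(),
        format!("{percent}%"),
        "-quality".to_string(),
        quality.to_string(),
        format!("{}:-", format.magick_name()),
    ]))
}
```

Returning `None` for the default case skips the extra process spawn entirely, which keeps the existing PNG path as cheap as it is today.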
+
+---
+
+## Task 4: Window listing API
+
+**What**: New endpoint to list open windows.
+
+**Endpoint**: `GET /v1/desktop/windows`
+
+**Files**:
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add `DesktopWindowInfo { id, title, x, y, width, height, is_active }` and `DesktopWindowListResponse`.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Add `list_windows()` method using xdotool (already installed):
+  1. `xdotool search --onlyvisible --name ""` → window IDs
+  2. `xdotool getwindowname {id}` + `xdotool getwindowgeometry {id}` per window
+  3. `xdotool getactivewindow` → is_active flag
+  4. Add `parse_window_geometry()` helper.
+- `server/packages/sandbox-agent/src/router.rs` — Add route, handler, OpenAPI annotations.
+- `sdks/typescript/src/client.ts` — Add `listDesktopWindows()`.
+- `docs/openapi.json` — Regenerate.
+
+**No new Docker dependencies** — xdotool already installed.
+
+**Test**: Integration test: start desktop, verify `GET /v1/desktop/windows` returns 200 with a list (may be empty if no GUI apps open, which is fine).
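A possible shape for the `parse_window_geometry()` helper is sketched below. It assumes the default (non `--shell`) output of `xdotool getwindowgeometry`, which prints a `Position: X,Y (screen: N)` line and a `Geometry: WxH` line; the `WindowGeometry` struct is a hypothetical intermediate type, not something named in the plan.

```rust
/// Hypothetical intermediate result; the plan's `DesktopWindowInfo` would absorb these fields.
#[derive(Debug, PartialEq)]
struct WindowGeometry {
    x: i32,
    y: i32,
    width: u32,
    height: u32,
}

/// Parse output such as:
///   Window 41943046
///     Position: 10,20 (screen: 0)
///     Geometry: 800x600
/// Returns `None` if either line is missing or malformed.
fn parse_window_geometry(output: &str) -> Option<WindowGeometry> {
    let mut position = None;
    let mut size = None;
    for line in output.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("Position:") {
            // Take "10,20" and ignore the trailing "(screen: 0)".
            let coords = rest.split_whitespace().next()?;
            let (x, y) = coords.split_once(',')?;
            position = Some((x.parse::<i32>().ok()?, y.parse::<i32>().ok()?));
        } else if let Some(rest) = line.strip_prefix("Geometry:") {
            let (w, h) = rest.trim().split_once('x')?;
            size = Some((w.parse::<u32>().ok()?, h.parse::<u32>().ok()?));
        }
    }
    let (x, y) = position?;
    let (width, height) = size?;
    Some(WindowGeometry { x, y, width, height })
}
```

Positions are parsed as signed integers because a window can sit partly off-screen; whether a failed parse should drop the window from the list or fail the whole request is left open here.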
+
+---
+
+## Task 5: Unify desktop processes into process runtime with owner flag
+
+**What**: Long-running desktop processes (Xvfb, openbox) get registered in the general process runtime with an `owner` field, gaining log streaming, SSE, and unified lifecycle for free. `dbus-launch` stays a direct one-shot spawn, as noted under `desktop_runtime.rs` below.
+
+**Files**:
+
+- `server/packages/sandbox-agent/src/process_runtime.rs`:
+  - Add `ProcessOwner` enum: `User`, `Desktop`, `System`.
+  - Add `RestartPolicy` enum: `Never`, `Always`, `OnFailure`.
+  - Add `owner: ProcessOwner` and `restart_policy: Option<RestartPolicy>` to `ProcessStartSpec`, `ManagedProcess`, and `ProcessSnapshot`.
+  - Modify `list_processes()` to accept optional owner filter.
+  - Add auto-restart logic in `watch_exit()`: if restart_policy is Always (or OnFailure and exit code != 0), re-spawn the process using stored spec. Need to store the original `ProcessStartSpec` on `ManagedProcess`.
+
+- `server/packages/sandbox-agent/src/router/types.rs`:
+  - Add `owner` to `ProcessInfo` response.
+  - Add `ProcessListQuery { owner: Option<ProcessOwner> }`.
+
+- `server/packages/sandbox-agent/src/router.rs`:
+  - Modify `get_v1_processes` to accept `Query<ProcessListQuery>` and filter.
+  - Pass `ProcessRuntime` into `DesktopRuntime::new()`.
+  - Add `ProcessOwner`, `RestartPolicy` to OpenAPI schemas.
+
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — **Major refactor**:
+  - Remove `ManagedDesktopChild` struct.
+  - `DesktopRuntime` takes `ProcessRuntime` as constructor param.
+  - `start_xvfb_locked()` and `start_openbox_locked()` call `process_runtime.start_process(ProcessStartSpec { owner: Desktop, restart_policy: Some(Always), ... })` instead of spawning directly.
+  - Store returned process IDs in state instead of `Child` handles.
+  - `stop` calls `process_runtime.stop_process()` / `kill_process()`.
+  - `processes_locked()` queries process runtime for desktop-owned processes.
+  - dbus-launch remains a direct one-shot spawn (it's not a long-running process, just produces env vars).
+
+- `sdks/typescript/src/client.ts` — Add `owner` filter option to `listProcesses()`.
+- `docs/openapi.json` — Regenerate.
+
+**Risks**:
+- Lock ordering: desktop runtime holds Mutex, process runtime uses RwLock. Release desktop Mutex before calling process runtime, or restructure.
+- `log_path` field in `DesktopProcessInfo` no longer applies (logs are in-memory now). Remove or deprecate.
+
+**Test**: Integration: start desktop, `GET /v1/processes?owner=desktop` returns Xvfb+openbox. `GET /v1/processes?owner=user` excludes them. Desktop process logs are streamable via `GET /v1/processes/{id}/logs?follow=true`. Existing desktop lifecycle tests still pass.
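The restart decision that Task 5 adds to `watch_exit()` could be isolated into a pure function along these lines. This is a sketch under assumptions: `should_restart` and the `stop_requested` flag are hypothetical names, and treating a missing exit code (process killed by a signal) as a failure is a choice the plan does not spell out.

```rust
/// Mirrors the plan's `RestartPolicy` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RestartPolicy {
    Never,
    Always,
    OnFailure,
}

/// Decide whether an exited process should be re-spawned from its stored spec.
///
/// * `policy` is `None` when the start spec did not set one.
/// * `exit_code` is `None` when the process was terminated by a signal.
/// * `stop_requested` is true when the exit was caused by an explicit stop or kill
///   (assumption: an intentional stop must never trigger a restart).
fn should_restart(
    policy: Option<RestartPolicy>,
    exit_code: Option<i32>,
    stop_requested: bool,
) -> bool {
    if stop_requested {
        return false;
    }
    match policy {
        None | Some(RestartPolicy::Never) => false,
        Some(RestartPolicy::Always) => true,
        // Assumption: death by signal counts as a failure.
        Some(RestartPolicy::OnFailure) => exit_code != Some(0),
    }
}
```

Keeping the decision free of I/O makes it unit-testable without spawning processes; the `stop_requested` guard matters because desktop shutdown would otherwise race against `Always` and bring Xvfb straight back.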
+
+---
+
+## Task 6: Screen recording API (ffmpeg x11grab)
+
+**What**: 6 endpoints for recording the desktop to MP4.
+
+**Endpoints**:
+- `POST /v1/desktop/recording/start` — Start ffmpeg recording
+- `POST /v1/desktop/recording/stop` — Stop recording (SIGTERM → wait → SIGKILL)
+- `GET /v1/desktop/recordings` — List recordings
+- `GET /v1/desktop/recordings/{id}` — Get recording metadata
+- `GET /v1/desktop/recordings/{id}/download` — Serve MP4 file
+- `DELETE /v1/desktop/recordings/{id}` — Delete recording
+
+**Files**:
+- **New**: `server/packages/sandbox-agent/src/desktop_recording.rs` — Recording state, ffmpeg process management. `start_recording()` spawns ffmpeg via process runtime (owner=Desktop): `ffmpeg -f x11grab -video_size WxH -i :99 -c:v libx264 -preset ultrafast -r 30 {path}`. Recordings stored in `{state_dir}/recordings/`.
+- `server/packages/sandbox-agent/src/desktop_types.rs` — Add recording request/response types.
+- `server/packages/sandbox-agent/src/desktop_runtime.rs` — Wire recording manager, expose through desktop runtime.
+- `server/packages/sandbox-agent/src/router.rs` — Add 6 routes + handlers.
+- `server/packages/sandbox-agent/src/desktop_install.rs` — Add `ffmpeg` to dependency detection (soft: only error when recording is requested).
+- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add `ffmpeg` to apt-get.
+- `sdks/typescript/src/client.ts` — Add 6 recording methods.
+- `docs/openapi.json` — Regenerate.
+
+**Depends on**: Task 5 (ffmpeg runs as desktop-owned process).
+
+**Test**: Integration: start desktop → start recording → wait 2s → stop → list → download (verify MP4 magic bytes) → delete.
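The "magic bytes" checks mentioned in the Task 3 and Task 6 tests could share one small helper. The sketch below is an assumption about how the integration tests might do it; `sniff_media_type` is a hypothetical name. It relies on well-known signatures: the 8-byte PNG header, the `FF D8 FF` JPEG prefix, and the `ftyp` box type at byte offset 4 of an MP4 file.

```rust
/// Identify a payload by its leading bytes; returns `None` for anything unrecognized.
fn sniff_media_type(bytes: &[u8]) -> Option<&'static str> {
    const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG_PREFIX: [u8; 3] = [0xFF, 0xD8, 0xFF];

    if bytes.starts_with(&PNG_SIGNATURE) {
        Some("image/png")
    } else if bytes.starts_with(&JPEG_PREFIX) {
        Some("image/jpeg")
    } else if bytes.len() >= 8 && &bytes[4..8] == b"ftyp" {
        // MP4 starts with a box whose 4-byte size precedes the "ftyp" type tag.
        Some("video/mp4")
    } else {
        None
    }
}
```

A test can then compare the sniffed type against the response's `Content-Type` header instead of hard-coding byte offsets at each call site.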
+
+---
+
+## Task 7: Neko WebRTC desktop streaming + React component
+
+**What**: Integrate neko for WebRTC desktop streaming, mirroring the ProcessTerminal + Ghostty pattern.
+
+### Server side
+
+- **New**: `server/packages/sandbox-agent/src/desktop_streaming.rs` — Manages neko process via process runtime (owner=Desktop). Neko connects to existing Xvfb display, runs GStreamer pipeline for H.264 encoding.
+- `server/packages/sandbox-agent/src/router.rs`:
+  - `GET /v1/desktop/stream/ws` — WebSocket proxy to neko's internal WebSocket. Upgrade request, bridge bidirectionally.
+  - `POST /v1/desktop/stream/start` / `POST /v1/desktop/stream/stop` — Lifecycle control.
+- `docker/runtime/Dockerfile` and `docker/test-agent/Dockerfile` — Add neko binary + GStreamer packages (`gstreamer1.0-plugins-base`, `gstreamer1.0-plugins-good`, `gstreamer1.0-x`, `libgstreamer1.0-0`). Consider making this an optional Docker stage to avoid bloating the base image.
+
+### TypeScript SDK
+
+- **New**: `sdks/typescript/src/desktop-stream.ts` — `DesktopStreamSession` class ported from neko's `base.ts` (~500 lines):
+  - WebSocket for signaling (SDP offer/answer, ICE candidates)
+  - `RTCPeerConnection` for video stream
+  - `RTCDataChannel` for binary input (mouse: 7 bytes, keyboard: 11 bytes)
+  - Events: `onTrack(stream)`, `onConnect()`, `onDisconnect()`, `onError()`
+- `sdks/typescript/src/client.ts` — Add `connectDesktopStream()` returning `DesktopStreamSession`, `buildDesktopStreamWebSocketUrl()`, `startDesktopStream()`, `stopDesktopStream()`.
+- `sdks/typescript/src/index.ts` — Export `DesktopStreamSession`.
+
+### React SDK
+
+- **New**: `sdks/react/src/DesktopViewer.tsx` — Following `ProcessTerminal.tsx` pattern:
+  ```
+  Props: client (Pick<SandboxAgent, 'connectDesktopStream'>), height, className, style, onConnect, onDisconnect, onError
+  ```
+  - `useEffect` → `client.connectDesktopStream()` → wire `onTrack` to `<video>.srcObject`
+  - Capture mouse events on video element → scale coordinates to desktop resolution → send via DataChannel
+  - Capture keyboard events → send via DataChannel
+  - Connection state indicator
+  - Cleanup: close RTCPeerConnection, close WebSocket
+- `sdks/react/src/index.ts` — Export `DesktopViewer`.
+
+**Depends on**: Task 5 (neko runs as desktop-owned process).
+
+**Test**: Server integration: start stream, connect WebSocket, verify signaling messages flow. React: component mounts/unmounts without errors. Full E2E requires browser (manual initially).
+
+---
+
+## Verification
+
+After all tasks:
+1. `cargo test` — All Rust unit tests pass
+2. `cargo test --test v1_api` — All integration tests pass (requires Docker)
+3. Regenerate `docs/openapi.json` and verify it reflects all new endpoints
+4. Build TypeScript SDK: `cd sdks/typescript && pnpm build`
+5. Build React SDK: `cd sdks/react && pnpm build`
+6. Manual: start desktop, take JPEG screenshot, list windows, record 5s video, stream desktop via DesktopViewer component
--- a/.context/todos.md
+++ b/.context/todos.md
--- a/docker/runtime/Dockerfile
+++ b/docker/runtime/Dockerfile
@ -152,7 +152,8 @@ FROM debian:bookworm-slim
 RUN apt-get update && apt-get install -y \
    ca-certificates \
    curl \
-    git && \
+    git \
+    ffmpeg && \
    rm -rf /var/lib/apt/lists/*

 # Copy the binary from builder
--- a/docker/test-agent/Dockerfile
+++ b/docker/test-agent/Dockerfile
@ -25,6 +25,7 @@ RUN apt-get update -qq && \
      openbox \
      xdotool \
      imagemagick \
+      ffmpeg \
      x11-xserver-utils \
      dbus-x11 \
      xauth \
--- a/docs/openapi.json
+++ b/docs/openapi.json
--- a/sdks/react/src/DesktopViewer.tsx
+++ b/sdks/react/src/DesktopViewer.tsx
@ -0,0 +1,288 @@
+"use client";
+
+import type { CSSProperties, MouseEvent, WheelEvent } from "react";
+import { useEffect, useRef, useState } from "react";
+import type {
+  DesktopMouseButton,
+  DesktopStreamErrorStatus,
+  DesktopStreamReadyStatus,
+  SandboxAgent,
+} from "sandbox-agent";
+
+type ConnectionState = "connecting" | "ready" | "closed" | "error";
+
+export type DesktopViewerClient = Pick<
+  SandboxAgent,
+  "startDesktopStream" | "stopDesktopStream" | "connectDesktopStream"
+>;
+
+export interface DesktopViewerProps {
+  client: DesktopViewerClient;
+  className?: string;
+  style?: CSSProperties;
+  imageStyle?: CSSProperties;
+  height?: number | string;
+  onConnect?: (status: DesktopStreamReadyStatus) => void;
+  onDisconnect?: () => void;
+  onError?: (error: DesktopStreamErrorStatus | Error) => void;
+}
+
+const shellStyle: CSSProperties = {
+  display: "flex",
+  flexDirection: "column",
+  overflow: "hidden",
+  border: "1px solid rgba(15, 23, 42, 0.14)",
+  borderRadius: 14,
+  background:
+    "linear-gradient(180deg, rgba(248, 250, 252, 0.96) 0%, rgba(226, 232, 240, 0.92) 100%)",
+  boxShadow: "0 20px 40px rgba(15, 23, 42, 0.08)",
+};
+
+const statusBarStyle: CSSProperties = {
+  display: "flex",
+  alignItems: "center",
+  justifyContent: "space-between",
+  gap: 12,
+  padding: "10px 14px",
+  borderBottom: "1px solid rgba(15, 23, 42, 0.08)",
+  background: "rgba(255, 255, 255, 0.78)",
+  color: "#0f172a",
+  fontSize: 12,
+  lineHeight: 1.4,
+};
+
+const viewportStyle: CSSProperties = {
+  position: "relative",
+  display: "flex",
+  alignItems: "center",
+  justifyContent: "center",
+  overflow: "hidden",
+  background:
+    "radial-gradient(circle at top, rgba(14, 165, 233, 0.18), transparent 45%), linear-gradient(180deg, #0f172a 0%, #111827 100%)",
+};
+
+const imageBaseStyle: CSSProperties = {
+  display: "block",
+  width: "100%",
+  height: "100%",
+  objectFit: "contain",
+  userSelect: "none",
+};
+
+const hintStyle: CSSProperties = {
+  opacity: 0.66,
+};
+
+const getStatusColor = (state: ConnectionState): string => {
+  switch (state) {
+    case "ready":
+      return "#15803d";
+    case "error":
+      return "#b91c1c";
+    case "closed":
+      return "#b45309";
+    default:
+      return "#475569";
+  }
+};
+
+export const DesktopViewer = ({
+  client,
+  className,
+  style,
+  imageStyle,
+  height = 480,
+  onConnect,
+  onDisconnect,
+  onError,
+}: DesktopViewerProps) => {
+  const wrapperRef = useRef<HTMLDivElement | null>(null);
+  const sessionRef = useRef<ReturnType<DesktopViewerClient["connectDesktopStream"]> | null>(null);
+  const [connectionState, setConnectionState] = useState<ConnectionState>("connecting");
+  const [statusMessage, setStatusMessage] = useState("Starting desktop stream...");
+  const [frameUrl, setFrameUrl] = useState<string | null>(null);
+  const [resolution, setResolution] = useState<{ width: number; height: number } | null>(null);
+
+  useEffect(() => {
+    let cancelled = false;
+    let lastObjectUrl: string | null = null;
+    let session: ReturnType<DesktopViewerClient["connectDesktopStream"]> | null = null;
+
+    setConnectionState("connecting");
+    setStatusMessage("Starting desktop stream...");
+    setResolution(null);
+
+    const connect = async () => {
+      try {
+        await client.startDesktopStream();
+        if (cancelled) {
+          return;
+        }
+
+        session = client.connectDesktopStream();
+        sessionRef.current = session;
+        session.onReady((status) => {
+          if (cancelled) {
+            return;
+          }
+          setConnectionState("ready");
+          setStatusMessage("Desktop stream connected.");
+          setResolution({ width: status.width, height: status.height });
+          onConnect?.(status);
+        });
+        session.onFrame((frame) => {
+          if (cancelled) {
+            return;
+          }
+          const nextUrl = URL.createObjectURL(
+            new Blob([frame.slice().buffer], { type: "image/jpeg" }),
+          );
+          // Revoke the previous frame's URL exactly once, outside the state updater,
+          // so the updater stays free of side effects.
+          if (lastObjectUrl) {
+            URL.revokeObjectURL(lastObjectUrl);
+          }
+          lastObjectUrl = nextUrl;
+          setFrameUrl(nextUrl);
+        });
+        session.onError((error) => {
+          if (cancelled) {
+            return;
+          }
+          setConnectionState("error");
+          setStatusMessage(error.message);
+          onError?.(error);
+        });
+        session.onClose(() => {
+          if (cancelled) {
+            return;
+          }
+          setConnectionState((current) => (current === "error" ? current : "closed"));
+          setStatusMessage((current) =>
+            current === "Desktop stream connected." ? "Desktop stream disconnected." : current,
+          );
+          onDisconnect?.();
+        });
+      } catch (error) {
+        if (cancelled) {
+          return;
+        }
+        const nextError = error instanceof Error ? error : new Error("Failed to initialize desktop stream.");
+        setConnectionState("error");
+        setStatusMessage(nextError.message);
+        onError?.(nextError);
+      }
+    };
+
+    void connect();
+
+    return () => {
+      cancelled = true;
+      session?.close();
+      sessionRef.current = null;
+      void client.stopDesktopStream().catch(() => undefined);
+      // Revoke the last frame URL once and clear the state without side effects in an updater.
+      if (lastObjectUrl) {
+        URL.revokeObjectURL(lastObjectUrl);
+        lastObjectUrl = null;
+      }
+      setFrameUrl(null);
+    };
+  }, [client, onConnect, onDisconnect, onError]);
+
+  const scalePoint = (clientX: number, clientY: number) => {
+    const wrapper = wrapperRef.current;
+    if (!wrapper || !resolution) {
+      return null;
+    }
+    const rect = wrapper.getBoundingClientRect();
+    if (rect.width === 0 || rect.height === 0) {
+      return null;
+    }
+    const x = Math.max(0, Math.min(resolution.width, ((clientX - rect.left) / rect.width) * resolution.width));
+    const y = Math.max(0, Math.min(resolution.height, ((clientY - rect.top) / rect.height) * resolution.height));
+    return {
+      x: Math.round(x),
+      y: Math.round(y),
+    };
+  };
+
+  const buttonFromMouseEvent = (event: MouseEvent<HTMLDivElement>): DesktopMouseButton => {
+    switch (event.button) {
+      case 1:
+        return "middle";
+      case 2:
+        return "right";
+      default:
+        return "left";
+    }
+  };
+
+  const withSession = (
+    callback: (session: NonNullable<ReturnType<DesktopViewerClient["connectDesktopStream"]>>) => void,
+  ) => {
+    const session = sessionRef.current;
+    if (session) {
+      callback(session);
+    }
+  };
+
+  return (
+    <div className={className} style={{ ...shellStyle, ...style }}>
+      <div style={statusBarStyle}>
+        <span style={{ color: getStatusColor(connectionState) }}>{statusMessage}</span>
+        <span style={hintStyle}>
+          {resolution ? `${resolution.width}×${resolution.height}` : "Awaiting frames"}
+        </span>
+      </div>
+      <div
+        ref={wrapperRef}
+        role="button"
+        tabIndex={0}
+        style={{ ...viewportStyle, height }}
+        onMouseMove={(event) => {
+          const point = scalePoint(event.clientX, event.clientY);
+          if (!point) {
+            return;
+          }
+          withSession((session) => session.moveMouse(point.x, point.y));
+        }}
+        onMouseDown={(event) => {
+          event.preventDefault();
+          const point = scalePoint(event.clientX, event.clientY);
+          withSession((session) =>
+            session.mouseDown(buttonFromMouseEvent(event), point?.x, point?.y),
+          );
+        }}
+        onMouseUp={(event) => {
+          const point = scalePoint(event.clientX, event.clientY);
+          withSession((session) => session.mouseUp(buttonFromMouseEvent(event), point?.x, point?.y));
+        }}
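+        onWheel={(event: WheelEvent<HTMLDivElement>) => {
+          // React registers wheel listeners as passive, so this preventDefault() may be ignored
+          // (with a console warning) and the page can still scroll.
+          event.preventDefault();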
+          const point = scalePoint(event.clientX, event.clientY);
+          if (!point) {
+            return;
+          }
+          withSession((session) => session.scroll(point.x, point.y, Math.round(event.deltaX), Math.round(event.deltaY)));
+        }}
+        onKeyDown={(event) => {
+          withSession((session) => session.keyDown(event.key));
+        }}
+        onKeyUp={(event) => {
+          withSession((session) => session.keyUp(event.key));
+        }}
+      >
+        {frameUrl ? (
+          <img alt="Desktop stream" draggable={false} src={frameUrl} style={{ ...imageBaseStyle, ...imageStyle }} />
+        ) : null}
+      </div>
+    </div>
+  );
+};
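
Note on `scalePoint`: it maps pointer positions against the wrapper's bounding rect, while `imageBaseStyle` renders the frame with `objectFit: "contain"`. When the wrapper and desktop aspect ratios differ, the image is letterboxed and a wrapper-relative mapping drifts away from the visible pixels. The following is a minimal sketch of a letterbox-aware mapping, assuming the image fills the wrapper and keeps the default centered `object-position`. `mapToDesktop`, `ViewRect`, and `DesktopSize` are hypothetical names and not part of this commit.

```typescript
// Hypothetical helper types; only the fields needed for the mapping.
interface ViewRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface DesktopSize {
  width: number;
  height: number;
}

// Maps a pointer position (client coordinates) to desktop pixels, assuming the
// frame is drawn with object-fit: contain and centered inside `rect`.
function mapToDesktop(
  rect: ViewRect,
  desktop: DesktopSize,
  clientX: number,
  clientY: number,
): { x: number; y: number } | null {
  if (rect.width <= 0 || rect.height <= 0 || desktop.width <= 0 || desktop.height <= 0) {
    return null;
  }
  // object-fit: contain uses one uniform scale, the smaller of the two axis ratios.
  const scale = Math.min(rect.width / desktop.width, rect.height / desktop.height);
  const drawnWidth = desktop.width * scale;
  const drawnHeight = desktop.height * scale;
  // The drawn image is centered, so any leftover space is split evenly on both sides.
  const originX = rect.left + (rect.width - drawnWidth) / 2;
  const originY = rect.top + (rect.height - drawnHeight) / 2;
  const clamp = (value: number, max: number) => Math.max(0, Math.min(max, value));
  return {
    x: Math.round(clamp((clientX - originX) / scale, desktop.width)),
    y: Math.round(clamp((clientY - originY) / scale, desktop.height)),
  };
}
```

Points that fall in the letterbox bars are clamped to the nearest desktop edge here; a caller could instead ignore them if that suits the viewer better.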
--- a/sdks/react/src/index.ts
+++ b/sdks/react/src/index.ts
@ -1,6 +1,7 @@
 export { AgentConversation } from "./AgentConversation.tsx";
 export { AgentTranscript } from "./AgentTranscript.tsx";
 export { ChatComposer } from "./ChatComposer.tsx";
+export { DesktopViewer } from "./DesktopViewer.tsx";
 export { ProcessTerminal } from "./ProcessTerminal.tsx";
 export { useTranscriptVirtualizer } from "./useTranscriptVirtualizer.ts";

@ -23,6 +24,11 @@ export type {
  ChatComposerProps,
 } from "./ChatComposer.tsx";

+export type {
+  DesktopViewerClient,
+  DesktopViewerProps,
+} from "./DesktopViewer.tsx";
+
 export type {
  ProcessTerminalClient,
  ProcessTerminalProps,
--- a/sdks/typescript/src/client.ts
+++ b/sdks/typescript/src/client.ts
@ -23,6 +23,10 @@ import {
  type SetSessionModeRequest,
 } from "acp-http-client";
 import type { SandboxAgentSpawnHandle, SandboxAgentSpawnOptions } from "./spawn.ts";
+import {
+  DesktopStreamSession,
+  type DesktopStreamConnectOptions,
+} from "./desktop-stream.ts";
 import {
  type AcpServerListResponse,
  type AgentInfo,
@ -31,17 +35,26 @@ import {
  type AgentListResponse,
  type DesktopActionResponse,
  type DesktopDisplayInfoResponse,
+  type DesktopKeyboardDownRequest,
  type DesktopKeyboardPressRequest,
  type DesktopKeyboardTypeRequest,
+  type DesktopKeyboardUpRequest,
  type DesktopMouseClickRequest,
+  type DesktopMouseDownRequest,
  type DesktopMouseDragRequest,
  type DesktopMouseMoveRequest,
  type DesktopMousePositionResponse,
  type DesktopMouseScrollRequest,
+  type DesktopMouseUpRequest,
+  type DesktopRecordingInfo,
+  type DesktopRecordingListResponse,
+  type DesktopRecordingStartRequest,
  type DesktopRegionScreenshotQuery,
  type DesktopScreenshotQuery,
  type DesktopStartRequest,
  type DesktopStatusResponse,
+  type DesktopStreamStatusResponse,
+  type DesktopWindowListResponse,
  type FsActionResponse,
  type FsDeleteQuery,
  type FsEntriesQuery,
@ -66,7 +79,9 @@ import {
  type ProcessInfo,
  type ProcessInputRequest,
  type ProcessInputResponse,
+  type ProcessListQuery,
  type ProcessListResponse,
+  type ProcessOwner,
  type ProcessLogEntry,
  type ProcessLogsQuery,
  type ProcessLogsResponse,
@ -201,6 +216,7 @@ export interface ProcessTerminalConnectOptions extends ProcessTerminalWebSocketU
 }

 export type ProcessTerminalSessionOptions = ProcessTerminalConnectOptions;
+export type DesktopStreamSessionOptions = DesktopStreamConnectOptions;

 export class SandboxAgentError extends Error {
  readonly status: number;
@ -1431,7 +1447,7 @@ export class SandboxAgent {
  async takeDesktopScreenshot(query: DesktopScreenshotQuery = {}): Promise<Uint8Array> {
    const response = await this.requestRaw("GET", `${API_PREFIX}/desktop/screenshot`, {
      query,
-      accept: "image/png",
+      accept: "image/*",
    });
    const buffer = await response.arrayBuffer();
    return new Uint8Array(buffer);
@ -1440,7 +1456,7 @@ export class SandboxAgent {
  async takeDesktopRegionScreenshot(query: DesktopRegionScreenshotQuery): Promise<Uint8Array> {
    const response = await this.requestRaw("GET", `${API_PREFIX}/desktop/screenshot/region`, {
      query,
-      accept: "image/png",
+      accept: "image/*",
    });
    const buffer = await response.arrayBuffer();
    return new Uint8Array(buffer);
@ -1462,6 +1478,18 @@ export class SandboxAgent {
    });
  }

+  async mouseDownDesktop(request: DesktopMouseDownRequest): Promise<DesktopMousePositionResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/down`, {
+      body: request,
+    });
+  }
+
+  async mouseUpDesktop(request: DesktopMouseUpRequest): Promise<DesktopMousePositionResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/up`, {
+      body: request,
+    });
+  }
+
  async dragDesktopMouse(request: DesktopMouseDragRequest): Promise<DesktopMousePositionResponse> {
    return this.requestJson("POST", `${API_PREFIX}/desktop/mouse/drag`, {
      body: request,
@ -1486,6 +1514,66 @@ export class SandboxAgent {
    });
  }

+  async keyDownDesktop(request: DesktopKeyboardDownRequest): Promise<DesktopActionResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/down`, {
+      body: request,
+    });
+  }
+
+  async keyUpDesktop(request: DesktopKeyboardUpRequest): Promise<DesktopActionResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/keyboard/up`, {
+      body: request,
+    });
+  }
+
+  async listDesktopWindows(): Promise<DesktopWindowListResponse> {
+    return this.requestJson("GET", `${API_PREFIX}/desktop/windows`);
+  }
+
+  async startDesktopRecording(
+    request: DesktopRecordingStartRequest = {},
+  ): Promise<DesktopRecordingInfo> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/recording/start`, {
+      body: request,
+    });
+  }
+
+  async stopDesktopRecording(): Promise<DesktopRecordingInfo> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/recording/stop`);
+  }
+
+  async listDesktopRecordings(): Promise<DesktopRecordingListResponse> {
+    return this.requestJson("GET", `${API_PREFIX}/desktop/recordings`);
+  }
+
+  async getDesktopRecording(id: string): Promise<DesktopRecordingInfo> {
+    return this.requestJson("GET", `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}`);
+  }
+
+  async downloadDesktopRecording(id: string): Promise<Uint8Array> {
+    const response = await this.requestRaw(
+      "GET",
+      `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}/download`,
+      {
+        accept: "video/mp4",
+      },
+    );
+    const buffer = await response.arrayBuffer();
+    return new Uint8Array(buffer);
+  }
+
+  async deleteDesktopRecording(id: string): Promise<void> {
+    await this.requestRaw("DELETE", `${API_PREFIX}/desktop/recordings/${encodeURIComponent(id)}`);
+  }
+
+  async startDesktopStream(): Promise<DesktopStreamStatusResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/stream/start`);
+  }
+
+  async stopDesktopStream(): Promise<DesktopStreamStatusResponse> {
+    return this.requestJson("POST", `${API_PREFIX}/desktop/stream/stop`);
+  }
+
  async listAgents(options?: AgentQueryOptions): Promise<AgentListResponse> {
    return this.requestJson("GET", `${API_PREFIX}/agents`, {
      query: toAgentQuery(options),
@ -1618,8 +1706,10 @@ export class SandboxAgent {
    });
  }

-  async listProcesses(): Promise<ProcessListResponse> {
-    return this.requestJson("GET", `${API_PREFIX}/processes`);
+  async listProcesses(query?: ProcessListQuery): Promise<ProcessListResponse> {
+    return this.requestJson("GET", `${API_PREFIX}/processes`, {
+      query,
+    });
  }

  async getProcess(id: string): Promise<ProcessInfo> {
@ -1707,6 +1797,32 @@ export class SandboxAgent {
    return new ProcessTerminalSession(this.connectProcessTerminalWebSocket(id, options));
  }

+  buildDesktopStreamWebSocketUrl(options: Pick<DesktopStreamConnectOptions, "accessToken"> = {}): string {
+    return toWebSocketUrl(
+      this.buildUrl(`${API_PREFIX}/desktop/stream/ws`, {
+        access_token: options.accessToken ?? this.token,
+      }),
+    );
+  }
+
+  connectDesktopStreamWebSocket(options: DesktopStreamConnectOptions = {}): WebSocket {
+    const WebSocketCtor = options.WebSocket ?? globalThis.WebSocket;
+    if (!WebSocketCtor) {
+      throw new Error("WebSocket API is not available; provide a WebSocket implementation.");
+    }
+
+    return new WebSocketCtor(
+      this.buildDesktopStreamWebSocketUrl({
+        accessToken: options.accessToken,
+      }),
+      options.protocols,
+    );
+  }
+
+  connectDesktopStream(options: DesktopStreamSessionOptions = {}): DesktopStreamSession {
+    return new DesktopStreamSession(this.connectDesktopStreamWebSocket(options));
+  }
+
  private async getLiveConnection(agent: string): Promise<LiveAcpConnection> {
    await this.awaitHealthy();
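
Note on `buildDesktopStreamWebSocketUrl`: the token travels as an `access_token` query parameter, presumably because the browser `WebSocket` constructor cannot set request headers. The internals of `buildUrl` and `toWebSocketUrl` are not shown in this diff, so the following is only a sketch of what such a URL builder might do, assuming the `/v1` prefix seen in the OpenAPI paths. `toDesktopStreamUrl` is a hypothetical name.

```typescript
// Hypothetical stand-in for buildUrl + toWebSocketUrl; not the SDK's implementation.
function toDesktopStreamUrl(baseUrl: string, accessToken?: string): string {
  // An absolute path replaces any path already present on baseUrl.
  const url = new URL("/v1/desktop/stream/ws", baseUrl);
  if (accessToken) {
    // searchParams handles percent-encoding of the token.
    url.searchParams.set("access_token", accessToken);
  }
  // Map http -> ws and https -> wss so the scheme matches the transport security.
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";
  return url.toString();
}
```

Because the token ends up in the URL, it can appear in proxy or server access logs, which is worth keeping in mind when choosing token lifetimes.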

--- a/sdks/typescript/src/desktop-stream.ts
+++ b/sdks/typescript/src/desktop-stream.ts
@ -0,0 +1,236 @@
+import type { DesktopMouseButton } from "./types.ts";
+
+const WS_READY_STATE_CONNECTING = 0;
+const WS_READY_STATE_OPEN = 1;
+const WS_READY_STATE_CLOSED = 3;
+
+export interface DesktopStreamReadyStatus {
+  type: "ready";
+  width: number;
+  height: number;
+}
+
+export interface DesktopStreamErrorStatus {
+  type: "error";
+  message: string;
+}
+
+export type DesktopStreamStatusMessage = DesktopStreamReadyStatus | DesktopStreamErrorStatus;
+
+export interface DesktopStreamConnectOptions {
+  accessToken?: string;
+  WebSocket?: typeof WebSocket;
+  protocols?: string | string[];
+}
+
+type DesktopStreamClientFrame =
+  | {
+      type: "moveMouse";
+      x: number;
+      y: number;
+    }
+  | {
+      type: "mouseDown" | "mouseUp";
+      x?: number;
+      y?: number;
+      button?: DesktopMouseButton;
+    }
+  | {
+      type: "scroll";
+      x: number;
+      y: number;
+      deltaX?: number;
+      deltaY?: number;
+    }
+  | {
+      type: "keyDown" | "keyUp";
+      key: string;
+    }
+  | {
+      type: "close";
+    };
+
+export class DesktopStreamSession {
+  readonly socket: WebSocket;
+  readonly closed: Promise<void>;
+
+  private readonly readyListeners = new Set<(status: DesktopStreamReadyStatus) => void>();
+  private readonly frameListeners = new Set<(frame: Uint8Array) => void>();
+  private readonly errorListeners = new Set<(error: DesktopStreamErrorStatus | Error) => void>();
+  private readonly closeListeners = new Set<() => void>();
+
+  private closeSignalSent = false;
+  private closedResolve!: () => void;
+
+  constructor(socket: WebSocket) {
+    this.socket = socket;
+    this.socket.binaryType = "arraybuffer";
+    this.closed = new Promise<void>((resolve) => {
+      this.closedResolve = resolve;
+    });
+
+    this.socket.addEventListener("message", (event) => {
+      void this.handleMessage(event.data);
+    });
+    this.socket.addEventListener("error", () => {
+      this.emitError(new Error("Desktop stream websocket connection failed."));
+    });
+    this.socket.addEventListener("close", () => {
+      this.closedResolve();
+      for (const listener of this.closeListeners) {
+        listener();
+      }
+    });
+  }
+
+  onReady(listener: (status: DesktopStreamReadyStatus) => void): () => void {
+    this.readyListeners.add(listener);
+    return () => {
+      this.readyListeners.delete(listener);
+    };
+  }
+
+  onFrame(listener: (frame: Uint8Array) => void): () => void {
+    this.frameListeners.add(listener);
+    return () => {
+      this.frameListeners.delete(listener);
+    };
+  }
+
+  onError(listener: (error: DesktopStreamErrorStatus | Error) => void): () => void {
+    this.errorListeners.add(listener);
+    return () => {
+      this.errorListeners.delete(listener);
+    };
+  }
+
+  onClose(listener: () => void): () => void {
+    this.closeListeners.add(listener);
+    return () => {
+      this.closeListeners.delete(listener);
+    };
+  }
+
+  moveMouse(x: number, y: number): void {
+    this.sendFrame({ type: "moveMouse", x, y });
+  }
+
+  mouseDown(button?: DesktopMouseButton, x?: number, y?: number): void {
+    this.sendFrame({ type: "mouseDown", button, x, y });
+  }
+
+  mouseUp(button?: DesktopMouseButton, x?: number, y?: number): void {
+    this.sendFrame({ type: "mouseUp", button, x, y });
+  }
+
+  scroll(x: number, y: number, deltaX?: number, deltaY?: number): void {
+    this.sendFrame({ type: "scroll", x, y, deltaX, deltaY });
+  }
+
+  keyDown(key: string): void {
+    this.sendFrame({ type: "keyDown", key });
+  }
+
+  keyUp(key: string): void {
+    this.sendFrame({ type: "keyUp", key });
+  }
+
+  close(): void {
+    if (this.socket.readyState === WS_READY_STATE_CONNECTING) {
+      this.socket.addEventListener(
+        "open",
+        () => {
+          this.close();
+        },
+        { once: true },
+      );
+      return;
+    }
+
+    if (this.socket.readyState === WS_READY_STATE_OPEN) {
+      if (!this.closeSignalSent) {
+        this.closeSignalSent = true;
+        this.sendFrame({ type: "close" });
+      }
+      this.socket.close();
+      return;
+    }
+
+    if (this.socket.readyState !== WS_READY_STATE_CLOSED) {
+      this.socket.close();
+    }
+  }
+
+  private async handleMessage(data: unknown): Promise<void> {
+    try {
+      if (typeof data === "string") {
+        const frame = parseStatusFrame(data);
+        if (!frame) {
+          this.emitError(new Error("Received invalid desktop stream control frame."));
+          return;
+        }
+
+        if (frame.type === "ready") {
+          for (const listener of this.readyListeners) {
+            listener(frame);
+          }
+          return;
+        }
+
+        this.emitError(frame);
+        return;
+      }
+
+      const bytes = await decodeBinaryFrame(data);
+      for (const listener of this.frameListeners) {
+        listener(bytes);
+      }
+    } catch (error) {
+      this.emitError(error instanceof Error ? error : new Error(String(error)));
+    }
+  }
+
+  private sendFrame(frame: DesktopStreamClientFrame): void {
+    if (this.socket.readyState !== WS_READY_STATE_OPEN) {
+      return;
+    }
+    this.socket.send(JSON.stringify(frame));
+  }
+
+  private emitError(error: DesktopStreamErrorStatus | Error): void {
+    for (const listener of this.errorListeners) {
+      listener(error);
+    }
+  }
+}
+
+function parseStatusFrame(payload: string): DesktopStreamStatusMessage | null {
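+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(payload);
+  } catch {
+    // Malformed JSON is reported by the caller as an invalid control frame.
+    return null;
+  }
+  if (typeof parsed !== "object" || parsed === null) {
+    // Valid JSON that is not an object (e.g. `null` or a number) is also invalid.
+    return null;
+  }
+  const value = parsed as Record<string, unknown>;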
+  if (value.type === "ready" && typeof value.width === "number" && typeof value.height === "number") {
+    return {
+      type: "ready",
+      width: value.width,
+      height: value.height,
+    };
+  }
+  if (value.type === "error" && typeof value.message === "string") {
+    return {
+      type: "error",
+      message: value.message,
+    };
+  }
+  return null;
+}
+
+async function decodeBinaryFrame(data: unknown): Promise<Uint8Array> {
+  if (data instanceof ArrayBuffer) {
+    return new Uint8Array(data);
+  }
+  if (ArrayBuffer.isView(data)) {
+    return new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
+  }
+  if (typeof Blob !== "undefined" && data instanceof Blob) {
+    return new Uint8Array(await data.arrayBuffer());
+  }
+  throw new Error("Unsupported desktop stream binary frame type.");
+}
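
Note on the control protocol: `sendFrame` serializes every control message with `JSON.stringify`, so `undefined` fields such as omitted coordinates are dropped from the wire format. For a mock server in tests, a decoder mirroring `DesktopStreamClientFrame` might look like the sketch below. `parseClientFrame` and its helpers are hypothetical; how the real server validates frames is not shown in this diff.

```typescript
type MouseButton = "left" | "middle" | "right";

// Mirrors the DesktopStreamClientFrame union declared in desktop-stream.ts.
type ClientFrame =
  | { type: "moveMouse"; x: number; y: number }
  | { type: "mouseDown" | "mouseUp"; x?: number; y?: number; button?: MouseButton }
  | { type: "scroll"; x: number; y: number; deltaX?: number; deltaY?: number }
  | { type: "keyDown" | "keyUp"; key: string }
  | { type: "close" };

const isNum = (v: unknown): v is number => typeof v === "number" && Number.isFinite(v);
const isOptNum = (v: unknown): v is number | undefined => v === undefined || isNum(v);
const isOptButton = (v: unknown): v is MouseButton | undefined =>
  v === undefined || v === "left" || v === "middle" || v === "right";

// Returns a validated frame, or null for malformed JSON or unknown shapes.
function parseClientFrame(payload: string): ClientFrame | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const { type, x, y, deltaX, deltaY, button, key } = parsed as Record<string, unknown>;
  switch (type) {
    case "moveMouse":
      return isNum(x) && isNum(y) ? { type: "moveMouse", x, y } : null;
    case "mouseDown":
    case "mouseUp":
      // Coordinates and button are optional for press/release frames.
      return isOptNum(x) && isOptNum(y) && isOptButton(button)
        ? { type: type === "mouseDown" ? "mouseDown" : "mouseUp", x, y, button }
        : null;
    case "scroll":
      return isNum(x) && isNum(y) && isOptNum(deltaX) && isOptNum(deltaY)
        ? { type: "scroll", x, y, deltaX, deltaY }
        : null;
    case "keyDown":
    case "keyUp":
      return typeof key === "string"
        ? { type: type === "keyDown" ? "keyDown" : "keyUp", key }
        : null;
    case "close":
      return { type: "close" };
    default:
      return null;
  }
}
```

A mock socket could call `parseClientFrame` on each text message and record the decoded frames, which makes it straightforward to assert on what the viewer sent.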
--- a/sdks/typescript/src/generated/openapi.ts
+++ b/sdks/typescript/src/generated/openapi.ts
@ -3,6 +3,7 @@
 * Do not make direct changes to the file.
 */

+
 export interface paths {
  "/v1/acp": {
    get: operations["get_v1_acp_servers"];
@ -39,6 +40,14 @@ export interface paths {
     */
    get: operations["get_v1_desktop_display_info"];
  };
+  "/v1/desktop/keyboard/down": {
+    /**
+     * Press and hold a desktop keyboard key.
+     * @description Performs a health-gated `xdotool keydown` operation against the managed
+     * desktop.
+     */
+    post: operations["post_v1_desktop_keyboard_down"];
+  };
  "/v1/desktop/keyboard/press": {
    /**
     * Press a desktop keyboard shortcut.
@ -55,6 +64,14 @@ export interface paths {
     */
    post: operations["post_v1_desktop_keyboard_type"];
  };
+  "/v1/desktop/keyboard/up": {
+    /**
+     * Release a desktop keyboard key.
+     * @description Performs a health-gated `xdotool keyup` operation against the managed
+     * desktop.
+     */
+    post: operations["post_v1_desktop_keyboard_up"];
+  };
  "/v1/desktop/mouse/click": {
    /**
     * Click on the desktop.
@ -63,6 +80,14 @@ export interface paths {
     */
    post: operations["post_v1_desktop_mouse_click"];
  };
+  "/v1/desktop/mouse/down": {
+    /**
+     * Press and hold a desktop mouse button.
+     * @description Performs a health-gated optional pointer move followed by `xdotool mousedown`
+     * and returns the resulting mouse position.
+     */
+    post: operations["post_v1_desktop_mouse_down"];
+  };
  "/v1/desktop/mouse/drag": {
    /**
     * Drag the desktop mouse.
@ -94,11 +119,61 @@ export interface paths {
     */
    post: operations["post_v1_desktop_mouse_scroll"];
  };
+  "/v1/desktop/mouse/up": {
+    /**
+     * Release a desktop mouse button.
+     * @description Performs a health-gated optional pointer move followed by `xdotool mouseup`
+     * and returns the resulting mouse position.
+     */
+    post: operations["post_v1_desktop_mouse_up"];
+  };
+  "/v1/desktop/recording/start": {
+    /**
+     * Start desktop recording.
+     * @description Starts an ffmpeg x11grab recording against the managed desktop and returns
+     * the created recording metadata.
+     */
+    post: operations["post_v1_desktop_recording_start"];
+  };
+  "/v1/desktop/recording/stop": {
+    /**
+     * Stop desktop recording.
+     * @description Stops the active desktop recording and returns the finalized recording
+     * metadata.
+     */
+    post: operations["post_v1_desktop_recording_stop"];
+  };
+  "/v1/desktop/recordings": {
+    /**
+     * List desktop recordings.
+     * @description Returns the current desktop recording catalog.
+     */
+    get: operations["get_v1_desktop_recordings"];
+  };
+  "/v1/desktop/recordings/{id}": {
+    /**
+     * Get desktop recording metadata.
+     * @description Returns metadata for a single desktop recording.
+     */
+    get: operations["get_v1_desktop_recording"];
+    /**
+     * Delete a desktop recording.
+     * @description Removes a completed desktop recording and its file from disk.
+     */
+    delete: operations["delete_v1_desktop_recording"];
+  };
+  "/v1/desktop/recordings/{id}/download": {
+    /**
+     * Download a desktop recording.
+     * @description Serves the recorded MP4 bytes for a completed desktop recording.
+     */
+    get: operations["get_v1_desktop_recording_download"];
+  };
  "/v1/desktop/screenshot": {
    /**
     * Capture a full desktop screenshot.
     * @description Performs a health-gated full-frame screenshot of the managed desktop and
-     * returns PNG bytes.
+     * returns the requested image bytes.
     */
    get: operations["get_v1_desktop_screenshot"];
  };
@ -106,7 +181,7 @@ export interface paths {
    /**
     * Capture a desktop screenshot region.
     * @description Performs a health-gated screenshot crop against the managed desktop and
-     * returns the requested PNG region bytes.
+     * returns the requested region image bytes.
     */
    get: operations["get_v1_desktop_screenshot_region"];
  };
@ -134,6 +209,36 @@ export interface paths {
     */
    post: operations["post_v1_desktop_stop"];
  };
+  "/v1/desktop/stream/start": {
+    /**
+     * Start desktop streaming.
+     * @description Enables desktop websocket streaming for the managed desktop.
+     */
+    post: operations["post_v1_desktop_stream_start"];
+  };
+  "/v1/desktop/stream/stop": {
+    /**
+     * Stop desktop streaming.
+     * @description Disables desktop websocket streaming for the managed desktop.
+     */
+    post: operations["post_v1_desktop_stream_stop"];
+  };
+  "/v1/desktop/stream/ws": {
+    /**
+     * Open a desktop websocket streaming session.
+     * @description Upgrades the connection to a websocket that streams JPEG desktop frames and
+     * accepts mouse and keyboard control frames.
+     */
+    get: operations["get_v1_desktop_stream_ws"];
+  };
+  "/v1/desktop/windows": {
+    /**
+     * List visible desktop windows.
+     * @description Performs a health-gated visible-window enumeration against the managed
+     * desktop and returns the current window metadata.
+     */
+    get: operations["get_v1_desktop_windows"];
+  };
  "/v1/fs/entries": {
    get: operations["get_v1_fs_entries"];
  };
@ -347,14 +452,27 @@ export interface components {
      code: string;
      message: string;
    };
+    DesktopKeyModifiers: {
+      alt?: boolean | null;
+      cmd?: boolean | null;
+      ctrl?: boolean | null;
+      shift?: boolean | null;
+    };
+    DesktopKeyboardDownRequest: {
+      key: string;
+    };
    DesktopKeyboardPressRequest: {
      key: string;
+      modifiers?: components["schemas"]["DesktopKeyModifiers"] | null;
    };
    DesktopKeyboardTypeRequest: {
      /** Format: int32 */
      delayMs?: number | null;
      text: string;
    };
+    DesktopKeyboardUpRequest: {
+      key: string;
+    };
    /** @enum {string} */
    DesktopMouseButton: "left" | "middle" | "right";
    DesktopMouseClickRequest: {
@ -366,6 +484,13 @@ export interface components {
      /** Format: int32 */
      y: number;
    };
+    DesktopMouseDownRequest: {
+      button?: components["schemas"]["DesktopMouseButton"] | null;
+      /** Format: int32 */
+      x?: number | null;
+      /** Format: int32 */
+      y?: number | null;
+    };
    DesktopMouseDragRequest: {
      button?: components["schemas"]["DesktopMouseButton"] | null;
      /** Format: int32 */
@ -402,6 +527,13 @@ export interface components {
      /** Format: int32 */
      y: number;
    };
+    DesktopMouseUpRequest: {
+      button?: components["schemas"]["DesktopMouseButton"] | null;
+      /** Format: int32 */
+      x?: number | null;
+      /** Format: int32 */
+      y?: number | null;
+    };
    DesktopProcessInfo: {
      logPath?: string | null;
      name: string;
@ -409,10 +541,34 @@ export interface components {
      pid?: number | null;
      running: boolean;
    };
+    DesktopRecordingInfo: {
+      /** Format: int64 */
+      bytes: number;
+      endedAt?: string | null;
+      fileName: string;
+      id: string;
+      processId?: string | null;
+      startedAt: string;
+      status: components["schemas"]["DesktopRecordingStatus"];
+    };
+    DesktopRecordingListResponse: {
+      recordings: components["schemas"]["DesktopRecordingInfo"][];
+    };
+    DesktopRecordingStartRequest: {
+      /** Format: int32 */
+      fps?: number | null;
+    };
+    /** @enum {string} */
+    DesktopRecordingStatus: "recording" | "completed" | "failed";
    DesktopRegionScreenshotQuery: {
+      format?: components["schemas"]["DesktopScreenshotFormat"] | null;
      /** Format: int32 */
      height: number;
      /** Format: int32 */
+      quality?: number | null;
+      /** Format: float */
+      scale?: number | null;
+      /** Format: int32 */
      width: number;
      /** Format: int32 */
      x: number;
@ -427,7 +583,15 @@ export interface components {
      /** Format: int32 */
      width: number;
    };
-    DesktopScreenshotQuery: Record<string, never>;
+    /** @enum {string} */
+    DesktopScreenshotFormat: "png" | "jpeg" | "webp";
+    DesktopScreenshotQuery: {
+      format?: components["schemas"]["DesktopScreenshotFormat"] | null;
+      /** Format: int32 */
+      quality?: number | null;
+      /** Format: float */
+      scale?: number | null;
+    };
    DesktopStartRequest: {
      /** Format: int32 */
      dpi?: number | null;
@ -449,24 +613,27 @@ export interface components {
      startedAt?: string | null;
      state: components["schemas"]["DesktopState"];
    };
+    DesktopStreamStatusResponse: {
+      active: boolean;
+    };
+    DesktopWindowInfo: {
+      /** Format: int32 */
+      height: number;
+      id: string;
+      isActive: boolean;
+      title: string;
+      /** Format: int32 */
+      width: number;
+      /** Format: int32 */
+      x: number;
+      /** Format: int32 */
+      y: number;
+    };
+    DesktopWindowListResponse: {
+      windows: components["schemas"]["DesktopWindowInfo"][];
+    };
    /** @enum {string} */
-    ErrorType:
-      | "invalid_request"
-      | "conflict"
-      | "unsupported_agent"
-      | "agent_not_installed"
-      | "install_failed"
-      | "agent_process_exited"
-      | "token_invalid"
-      | "permission_denied"
-      | "not_acceptable"
-      | "unsupported_media_type"
-      | "not_found"
-      | "session_not_found"
-      | "session_already_exists"
-      | "mode_not_supported"
-      | "stream_error"
-      | "timeout";
+    ErrorType: "invalid_request" | "conflict" | "unsupported_agent" | "agent_not_installed" | "install_failed" | "agent_process_exited" | "token_invalid" | "permission_denied" | "not_acceptable" | "unsupported_media_type" | "not_found" | "session_not_found" | "session_already_exists" | "mode_not_supported" | "stream_error" | "timeout";
    FsActionResponse: {
      path: string;
    };
@ -525,37 +692,35 @@ export interface components {
      directory: string;
      mcpName: string;
    };
-    McpServerConfig:
-      | {
-          args?: string[];
-          command: string;
-          cwd?: string | null;
-          enabled?: boolean | null;
-          env?: {
-            [key: string]: string;
-          } | null;
-          /** Format: int64 */
-          timeoutMs?: number | null;
-          /** @enum {string} */
-          type: "local";
-        }
-      | {
-          bearerTokenEnvVar?: string | null;
-          enabled?: boolean | null;
-          envHeaders?: {
-            [key: string]: string;
-          } | null;
-          headers?: {
-            [key: string]: string;
-          } | null;
-          oauth?: Record<string, unknown> | null | null;
-          /** Format: int64 */
-          timeoutMs?: number | null;
-          transport?: string | null;
-          /** @enum {string} */
-          type: "remote";
-          url: string;
-        };
+    McpServerConfig: ({
+      args?: string[];
+      command: string;
+      cwd?: string | null;
+      enabled?: boolean | null;
+      env?: {
+        [key: string]: string;
+      } | null;
+      /** Format: int64 */
+      timeoutMs?: number | null;
+      /** @enum {string} */
+      type: "local";
+    }) | ({
+      bearerTokenEnvVar?: string | null;
+      enabled?: boolean | null;
+      envHeaders?: {
+        [key: string]: string;
+      } | null;
+      headers?: {
+        [key: string]: string;
+      } | null;
+      oauth?: Record<string, unknown> | null | null;
+      /** Format: int64 */
+      timeoutMs?: number | null;
+      transport?: string | null;
+      /** @enum {string} */
+      type: "remote";
+      url: string;
+    });
    ProblemDetails: {
      detail?: string | null;
      instance?: string | null;
@ -597,6 +762,7 @@ export interface components {
      exitedAtMs?: number | null;
      id: string;
      interactive: boolean;
+      owner: components["schemas"]["ProcessOwner"];
      /** Format: int32 */
      pid?: number | null;
      status: components["schemas"]["ProcessState"];
@ -609,6 +775,9 @@ export interface components {
    ProcessInputResponse: {
      bytesWritten: number;
    };
+    ProcessListQuery: {
+      owner?: components["schemas"]["ProcessOwner"] | null;
+    };
    ProcessListResponse: {
      processes: components["schemas"]["ProcessInfo"][];
    };
@ -635,6 +804,8 @@ export interface components {
    };
    /** @enum {string} */
    ProcessLogsStream: "stdout" | "stderr" | "combined" | "pty";
+    /** @enum {string} */
+    ProcessOwner: "user" | "desktop" | "system";
    ProcessRunRequest: {
      args?: string[];
      command: string;
@ -709,6 +880,7 @@ export type $defs = Record<string, never>;
 export type external = Record<string, never>;

 export interface operations {
+
  get_v1_acp_servers: {
    responses: {
      /** @description Active ACP server instances */
@ -1070,6 +1242,44 @@ export interface operations {
      };
    };
  };
+  /**
+   * Press and hold a desktop keyboard key.
+   * @description Performs a health-gated `xdotool keydown` operation against the managed
+   * desktop.
+   */
+  post_v1_desktop_keyboard_down: {
+    requestBody: {
+      content: {
+        "application/json": components["schemas"]["DesktopKeyboardDownRequest"];
+      };
+    };
+    responses: {
+      /** @description Desktop keyboard action result */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopActionResponse"];
+        };
+      };
+      /** @description Invalid keyboard down request */
+      400: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime health or input failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
  /**
   * Press a desktop keyboard shortcut.
   * @description Performs a health-gated `xdotool key` operation against the managed
@ -1101,7 +1311,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1139,7 +1349,45 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Release a desktop keyboard key.
+   * @description Performs a health-gated `xdotool keyup` operation against the managed
+   * desktop.
+   */
+  post_v1_desktop_keyboard_up: {
+    requestBody: {
+      content: {
+        "application/json": components["schemas"]["DesktopKeyboardUpRequest"];
+      };
+    };
+    responses: {
+      /** @description Desktop keyboard action result */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopActionResponse"];
+        };
+      };
+      /** @description Invalid keyboard up request */
+      400: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime health or input failed */
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1177,7 +1425,45 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Press and hold a desktop mouse button.
+   * @description Performs a health-gated optional pointer move followed by `xdotool mousedown`
+   * and returns the resulting mouse position.
+   */
+  post_v1_desktop_mouse_down: {
+    requestBody: {
+      content: {
+        "application/json": components["schemas"]["DesktopMouseDownRequest"];
+      };
+    };
+    responses: {
+      /** @description Desktop mouse position after button press */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopMousePositionResponse"];
+        };
+      };
+      /** @description Invalid mouse down request */
+      400: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime health or input failed */
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1215,7 +1501,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1253,7 +1539,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1279,7 +1565,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input check failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1317,7 +1603,204 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or input failed */
-      503: {
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Release a desktop mouse button.
+   * @description Performs a health-gated optional pointer move followed by `xdotool mouseup`
+   * and returns the resulting mouse position.
+   */
+  post_v1_desktop_mouse_up: {
+    requestBody: {
+      content: {
+        "application/json": components["schemas"]["DesktopMouseUpRequest"];
+      };
+    };
+    responses: {
+      /** @description Desktop mouse position after button release */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopMousePositionResponse"];
+        };
+      };
+      /** @description Invalid mouse up request */
+      400: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime health or input failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Start desktop recording.
+   * @description Starts an ffmpeg x11grab recording against the managed desktop and returns
+   * the created recording metadata.
+   */
+  post_v1_desktop_recording_start: {
+    requestBody: {
+      content: {
+        "application/json": components["schemas"]["DesktopRecordingStartRequest"];
+      };
+    };
+    responses: {
+      /** @description Desktop recording started */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopRecordingInfo"];
+        };
+      };
+      /** @description Desktop runtime is not ready or a recording is already active */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop recording failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Stop desktop recording.
+   * @description Stops the active desktop recording and returns the finalized recording
+   * metadata.
+   */
+  post_v1_desktop_recording_stop: {
+    responses: {
+      /** @description Desktop recording stopped */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopRecordingInfo"];
+        };
+      };
+      /** @description No active desktop recording */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop recording stop failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * List desktop recordings.
+   * @description Returns the current desktop recording catalog.
+   */
+  get_v1_desktop_recordings: {
+    responses: {
+      /** @description Desktop recordings */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopRecordingListResponse"];
+        };
+      };
+      /** @description Desktop recordings query failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Get desktop recording metadata.
+   * @description Returns metadata for a single desktop recording.
+   */
+  get_v1_desktop_recording: {
+    parameters: {
+      path: {
+        /** @description Desktop recording ID */
+        id: string;
+      };
+    };
+    responses: {
+      /** @description Desktop recording metadata */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopRecordingInfo"];
+        };
+      };
+      /** @description Unknown desktop recording */
+      404: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Delete a desktop recording.
+   * @description Removes a completed desktop recording and its file from disk.
+   */
+  delete_v1_desktop_recording: {
+    parameters: {
+      path: {
+        /** @description Desktop recording ID */
+        id: string;
+      };
+    };
+    responses: {
+      /** @description Desktop recording deleted */
+      204: {
+        content: never;
+      };
+      /** @description Unknown desktop recording */
+      404: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop recording is still active */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * Download a desktop recording.
+   * @description Serves the recorded MP4 bytes for a completed desktop recording.
+   */
+  get_v1_desktop_recording_download: {
+    parameters: {
+      path: {
+        /** @description Desktop recording ID */
+        id: string;
+      };
+    };
+    responses: {
+      /** @description Desktop recording as MP4 bytes */
+      200: {
+        content: never;
+      };
+      /** @description Unknown desktop recording */
+      404: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1327,14 +1810,27 @@ export interface operations {
  /**
   * Capture a full desktop screenshot.
   * @description Performs a health-gated full-frame screenshot of the managed desktop and
-   * returns PNG bytes.
+   * returns the requested image bytes.
   */
  get_v1_desktop_screenshot: {
+    parameters: {
+      query?: {
+        format?: components["schemas"]["DesktopScreenshotFormat"] | null;
+        quality?: number | null;
+        scale?: number | null;
+      };
+    };
    responses: {
-      /** @description Desktop screenshot as PNG bytes */
+      /** @description Desktop screenshot as image bytes */
      200: {
        content: never;
      };
+      /** @description Invalid screenshot query */
+      400: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
      /** @description Desktop runtime is not ready */
      409: {
        content: {
@ -1342,7 +1838,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or screenshot capture failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1352,23 +1848,22 @@ export interface operations {
  /**
   * Capture a desktop screenshot region.
   * @description Performs a health-gated screenshot crop against the managed desktop and
-   * returns the requested PNG region bytes.
+   * returns the requested region image bytes.
   */
  get_v1_desktop_screenshot_region: {
    parameters: {
      query: {
-        /** @description Region x coordinate */
        x: number;
-        /** @description Region y coordinate */
        y: number;
-        /** @description Region width */
        width: number;
-        /** @description Region height */
        height: number;
+        format?: components["schemas"]["DesktopScreenshotFormat"] | null;
+        quality?: number | null;
+        scale?: number | null;
      };
    };
    responses: {
-      /** @description Desktop screenshot region as PNG bytes */
+      /** @description Desktop screenshot region as image bytes */
      200: {
        content: never;
      };
@ -1385,7 +1880,7 @@ export interface operations {
        };
      };
      /** @description Desktop runtime health or screenshot capture failed */
-      503: {
+      502: {
        content: {
          "application/json": components["schemas"]["ProblemDetails"];
        };
@ -1478,6 +1973,92 @@ export interface operations {
      };
    };
  };
+  /**
+   * Start desktop streaming.
+   * @description Enables desktop websocket streaming for the managed desktop.
+   */
+  post_v1_desktop_stream_start: {
+    responses: {
+      /** @description Desktop streaming started */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopStreamStatusResponse"];
+        };
+      };
+    };
+  };
+  /**
+   * Stop desktop streaming.
+   * @description Disables desktop websocket streaming for the managed desktop.
+   */
+  post_v1_desktop_stream_stop: {
+    responses: {
+      /** @description Desktop streaming stopped */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopStreamStatusResponse"];
+        };
+      };
+    };
+  };
+  /**
+   * Open a desktop websocket streaming session.
+   * @description Upgrades the connection to a websocket that streams JPEG desktop frames and
+   * accepts mouse and keyboard control frames.
+   */
+  get_v1_desktop_stream_ws: {
+    parameters: {
+      query?: {
+        /** @description Bearer token alternative for WS auth */
+        access_token?: string | null;
+      };
+    };
+    responses: {
+      /** @description WebSocket upgraded */
+      101: {
+        content: never;
+      };
+      /** @description Desktop runtime or streaming session is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop stream failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
+  /**
+   * List visible desktop windows.
+   * @description Performs a health-gated visible-window enumeration against the managed
+   * desktop and returns the current window metadata.
+   */
+  get_v1_desktop_windows: {
+    responses: {
+      /** @description Visible desktop windows */
+      200: {
+        content: {
+          "application/json": components["schemas"]["DesktopWindowListResponse"];
+        };
+      };
+      /** @description Desktop runtime is not ready */
+      409: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+      /** @description Desktop runtime health or window query failed */
+      502: {
+        content: {
+          "application/json": components["schemas"]["ProblemDetails"];
+        };
+      };
+    };
+  };
  get_v1_fs_entries: {
    parameters: {
      query?: {
@ -1633,6 +2214,11 @@ export interface operations {
   * by the runtime, sorted by process ID.
   */
  get_v1_processes: {
+    parameters: {
+      query?: {
+        owner?: components["schemas"]["ProcessOwner"] | null;
+      };
+    };
    responses: {
      /** @description List processes */
      200: {
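The desktop operations above share a small set of failure statuses: 400 for an invalid request, 404 for an unknown recording, 409 when the runtime is not ready (or a recording conflict exists), and 502 for runtime or input failures (503 before this change). A minimal sketch of how a caller might branch on them follows. `classifyDesktopStatus`, `DesktopFailureKind`, and the `retryable` hint are hypothetical names introduced for illustration and are not part of the SDK in this diff.

```typescript
// Hypothetical helper, not part of the SDK shown in this diff.
type DesktopFailureKind =
  | "invalid_request"
  | "not_found"
  | "conflict_or_not_ready"
  | "runtime_failure"
  | "unknown";

interface DesktopFailure {
  kind: DesktopFailureKind;
  // Assumption: a 409 may clear up once the desktop is started or the
  // active recording is stopped, so the caller may want to try again.
  retryable: boolean;
}

// Returns null for 2xx responses, otherwise a coarse failure category.
function classifyDesktopStatus(status: number): DesktopFailure | null {
  if (status >= 200 && status < 300) {
    return null;
  }
  switch (status) {
    case 400:
      return { kind: "invalid_request", retryable: false };
    case 404:
      return { kind: "not_found", retryable: false };
    case 409:
      return { kind: "conflict_or_not_ready", retryable: true };
    case 502:
    case 503: // older servers reported runtime failures as 503
      return { kind: "runtime_failure", retryable: false };
    default:
      return { kind: "unknown", retryable: false };
  }
}
```

Accepting both 502 and 503 keeps a client working against servers from before and after the status change.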
--- a/sdks/typescript/src/index.ts
+++ b/sdks/typescript/src/index.ts
@ -13,10 +13,18 @@ export {
 export { AcpRpcError } from "acp-http-client";

 export { buildInspectorUrl } from "./inspector.ts";
+export { DesktopStreamSession } from "./desktop-stream.ts";
+export type {
+  DesktopStreamConnectOptions,
+  DesktopStreamErrorStatus,
+  DesktopStreamReadyStatus,
+  DesktopStreamStatusMessage,
+} from "./desktop-stream.ts";

 export type {
  SandboxAgentHealthWaitOptions,
  AgentQueryOptions,
+  DesktopStreamSessionOptions,
  ProcessLogFollowQuery,
  ProcessLogListener,
  ProcessLogSubscription,
@ -51,21 +59,34 @@ export type {
  DesktopActionResponse,
  DesktopDisplayInfoResponse,
  DesktopErrorInfo,
+  DesktopKeyboardDownRequest,
+  DesktopKeyboardUpRequest,
+  DesktopKeyModifiers,
  DesktopKeyboardPressRequest,
  DesktopKeyboardTypeRequest,
  DesktopMouseButton,
  DesktopMouseClickRequest,
+  DesktopMouseDownRequest,
  DesktopMouseDragRequest,
  DesktopMouseMoveRequest,
  DesktopMousePositionResponse,
  DesktopMouseScrollRequest,
+  DesktopMouseUpRequest,
  DesktopProcessInfo,
+  DesktopRecordingInfo,
+  DesktopRecordingListResponse,
+  DesktopRecordingStartRequest,
+  DesktopRecordingStatus,
  DesktopRegionScreenshotQuery,
  DesktopResolution,
+  DesktopScreenshotFormat,
  DesktopScreenshotQuery,
  DesktopStartRequest,
  DesktopState,
  DesktopStatusResponse,
+  DesktopStreamStatusResponse,
+  DesktopWindowInfo,
+  DesktopWindowListResponse,
  FsActionResponse,
  FsDeleteQuery,
  FsEntriesQuery,
@ -90,10 +111,12 @@ export type {
  ProcessInfo,
  ProcessInputRequest,
  ProcessInputResponse,
+  ProcessListQuery,
  ProcessListResponse,
  ProcessLogEntry,
  ProcessLogsQuery,
  ProcessLogsResponse,
+  ProcessOwner,
  ProcessLogsStream,
  ProcessRunRequest,
  ProcessRunResponse,
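The new `DesktopKeyboardDownRequest` and `DesktopKeyboardUpRequest` exports make it possible to hold keys across other actions. One way to plan a chord is sketched below: press the modifiers in order, press the key, then release everything in reverse. `KeyStep`, `chordSteps`, and the key names are assumptions for illustration; the request body fields are not shown in this diff.

```typescript
// Hypothetical planning helper; it only computes an ordered list of steps
// and does not call the API.
type KeyStep = { op: "down" | "up"; key: string };

function chordSteps(modifiers: readonly string[], key: string): KeyStep[] {
  const held = [...modifiers, key];
  // Press in the given order...
  const downs = held.map((k): KeyStep => ({ op: "down", key: k }));
  // ...and release in reverse, so modifiers are let go last.
  const ups = [...held].reverse().map((k): KeyStep => ({ op: "up", key: k }));
  return [...downs, ...ups];
}

// Example plan for a ctrl+shift+t chord (key names are assumptions).
const reopenTab = chordSteps(["ctrl", "shift"], "t");
```

Each `down` step would map to a keyboard-down request and each `up` step to a keyboard-up request. A caller would likely want to run the `up` steps even if an intermediate action fails, so that no key stays held on the managed desktop.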
--- a/sdks/typescript/src/types.ts
+++ b/sdks/typescript/src/types.ts
@ -10,6 +10,7 @@ export type DesktopErrorInfo = components["schemas"]["DesktopErrorInfo"];
 export type DesktopProcessInfo = components["schemas"]["DesktopProcessInfo"];
 export type DesktopStatusResponse = JsonResponse<operations["get_v1_desktop_status"], 200>;
 export type DesktopStartRequest = JsonRequestBody<operations["post_v1_desktop_start"]>;
+export type DesktopScreenshotFormat = components["schemas"]["DesktopScreenshotFormat"];
 export type DesktopScreenshotQuery =
  QueryParams<operations["get_v1_desktop_screenshot"]> extends never
    ? Record<string, never>
@ -19,12 +20,24 @@ export type DesktopMousePositionResponse = JsonResponse<operations["get_v1_deskt
 export type DesktopMouseButton = components["schemas"]["DesktopMouseButton"];
 export type DesktopMouseMoveRequest = JsonRequestBody<operations["post_v1_desktop_mouse_move"]>;
 export type DesktopMouseClickRequest = JsonRequestBody<operations["post_v1_desktop_mouse_click"]>;
+export type DesktopMouseDownRequest = JsonRequestBody<operations["post_v1_desktop_mouse_down"]>;
+export type DesktopMouseUpRequest = JsonRequestBody<operations["post_v1_desktop_mouse_up"]>;
 export type DesktopMouseDragRequest = JsonRequestBody<operations["post_v1_desktop_mouse_drag"]>;
 export type DesktopMouseScrollRequest = JsonRequestBody<operations["post_v1_desktop_mouse_scroll"]>;
 export type DesktopKeyboardTypeRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_type"]>;
+export type DesktopKeyModifiers = components["schemas"]["DesktopKeyModifiers"];
 export type DesktopKeyboardPressRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_press"]>;
+export type DesktopKeyboardDownRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_down"]>;
+export type DesktopKeyboardUpRequest = JsonRequestBody<operations["post_v1_desktop_keyboard_up"]>;
 export type DesktopActionResponse = JsonResponse<operations["post_v1_desktop_keyboard_type"], 200>;
 export type DesktopDisplayInfoResponse = JsonResponse<operations["get_v1_desktop_display_info"], 200>;
+export type DesktopWindowInfo = components["schemas"]["DesktopWindowInfo"];
+export type DesktopWindowListResponse = JsonResponse<operations["get_v1_desktop_windows"], 200>;
+export type DesktopRecordingStartRequest = JsonRequestBody<operations["post_v1_desktop_recording_start"]>;
+export type DesktopRecordingStatus = components["schemas"]["DesktopRecordingStatus"];
+export type DesktopRecordingInfo = JsonResponse<operations["post_v1_desktop_recording_start"], 200>;
+export type DesktopRecordingListResponse = JsonResponse<operations["get_v1_desktop_recordings"], 200>;
+export type DesktopStreamStatusResponse = JsonResponse<operations["post_v1_desktop_stream_start"], 200>;
 export type AgentListResponse = JsonResponse<operations["get_v1_agents"], 200>;
 export type AgentInfo = components["schemas"]["AgentInfo"];
 export type AgentQuery = QueryParams<operations["get_v1_agents"]>;
@ -58,11 +71,13 @@ export type ProcessCreateRequest = JsonRequestBody<operations["post_v1_processes
 export type ProcessInfo = components["schemas"]["ProcessInfo"];
 export type ProcessInputRequest = JsonRequestBody<operations["post_v1_process_input"]>;
 export type ProcessInputResponse = JsonResponse<operations["post_v1_process_input"], 200>;
+export type ProcessListQuery = QueryParams<operations["get_v1_processes"]>;
 export type ProcessListResponse = JsonResponse<operations["get_v1_processes"], 200>;
 export type ProcessLogEntry = components["schemas"]["ProcessLogEntry"];
 export type ProcessLogsQuery = QueryParams<operations["get_v1_process_logs"]>;
 export type ProcessLogsResponse = JsonResponse<operations["get_v1_process_logs"], 200>;
 export type ProcessLogsStream = components["schemas"]["ProcessLogsStream"];
+export type ProcessOwner = components["schemas"]["ProcessOwner"];
 export type ProcessRunRequest = JsonRequestBody<operations["post_v1_processes_run"]>;
 export type ProcessRunResponse = JsonResponse<operations["post_v1_processes_run"], 200>;
 export type ProcessSignalQuery = QueryParams<operations["post_v1_process_stop"]>;
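The aliases above derive request and response types from the generated `operations` interface instead of restating them. The definitions of `JsonRequestBody` and `JsonResponse` are not part of this diff, so the following is only a sketch of how such helpers could be written with conditional types. `DemoOperations`, `DemoJsonRequestBody`, and `DemoJsonResponse` are hypothetical stand-ins.

```typescript
// Trimmed-down stand-in for the generated `operations` interface.
interface DemoOperations {
  post_demo_recording_start: {
    requestBody: {
      content: { "application/json": { fps?: number | null } };
    };
    responses: {
      200: { content: { "application/json": { id: string; bytes: number } } };
      409: { content: { "application/json": { detail?: string | null } } };
    };
  };
}

// Hypothetical helpers; the SDK's real implementations may differ.
type DemoJsonRequestBody<Op> = Op extends {
  requestBody: { content: { "application/json": infer Body } };
}
  ? Body
  : never;

type DemoJsonResponse<Op, Status extends number> = Op extends {
  responses: Record<Status, { content: { "application/json": infer Body } }>;
}
  ? Body
  : never;

type StartRequest = DemoJsonRequestBody<DemoOperations["post_demo_recording_start"]>;
type StartResponse = DemoJsonResponse<DemoOperations["post_demo_recording_start"], 200>;

// Both values are checked against the shapes declared in DemoOperations.
const startRequest: StartRequest = { fps: 30 };
const startResponse: StartResponse = { id: "rec_1", bytes: 0 };
```

Deriving the aliases this way means a regenerated schema changes the public types without hand edits, which is presumably why `DesktopRecordingInfo` is taken from the 200 response of the start operation.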
--- a/server/packages/sandbox-agent/src/desktop_install.rs
+++ b/server/packages/sandbox-agent/src/desktop_install.rs
@ -105,6 +105,7 @@ fn desktop_packages(package_manager: DesktopPackageManager, no_fonts: bool) -> V
            "openbox",
            "xdotool",
            "imagemagick",
+            "ffmpeg",
            "x11-xserver-utils",
            "dbus-x11",
            "xauth",
@ -115,6 +116,7 @@ fn desktop_packages(package_manager: DesktopPackageManager, no_fonts: bool) -> V
            "openbox",
            "xdotool",
            "ImageMagick",
+            "ffmpeg",
            "xrandr",
            "dbus-x11",
            "xauth",
@ -125,6 +127,7 @@ fn desktop_packages(package_manager: DesktopPackageManager, no_fonts: bool) -> V
            "openbox",
            "xdotool",
            "imagemagick",
+            "ffmpeg",
            "xrandr",
            "dbus",
            "xauth",
--- a/server/packages/sandbox-agent/src/desktop_recording.rs
+++ b/server/packages/sandbox-agent/src/desktop_recording.rs
@ -0,0 +1,309 @@
+use std::collections::BTreeMap;
+use std::fs;
+use std::path::{Path, PathBuf};
+use std::sync::Arc;
+
+use tokio::sync::Mutex;
+
+use sandbox_agent_error::SandboxError;
+
+use crate::desktop_types::{
+    DesktopRecordingInfo, DesktopRecordingListResponse, DesktopRecordingStartRequest,
+    DesktopRecordingStatus, DesktopResolution,
+};
+use crate::process_runtime::{ProcessOwner, ProcessRuntime, ProcessStartSpec, ProcessStatus, RestartPolicy};
+
+#[derive(Debug, Clone)]
+pub struct DesktopRecordingContext {
+    pub display: String,
+    pub environment: std::collections::HashMap<String, String>,
+    pub resolution: DesktopResolution,
+}
+
+#[derive(Debug, Clone)]
+pub struct DesktopRecordingManager {
+    process_runtime: Arc<ProcessRuntime>,
+    recordings_dir: PathBuf,
+    inner: Arc<Mutex<DesktopRecordingState>>,
+}
+
+#[derive(Debug, Default)]
+struct DesktopRecordingState {
+    next_id: u64,
+    current_id: Option<String>,
+    recordings: BTreeMap<String, RecordingEntry>,
+}
+
+#[derive(Debug, Clone)]
+struct RecordingEntry {
+    info: DesktopRecordingInfo,
+    path: PathBuf,
+}
+
+impl DesktopRecordingManager {
+    pub fn new(process_runtime: Arc<ProcessRuntime>, state_dir: PathBuf) -> Self {
+        Self {
+            process_runtime,
+            recordings_dir: state_dir.join("recordings"),
+            inner: Arc::new(Mutex::new(DesktopRecordingState::default())),
+        }
+    }
+
+    pub async fn start(
+        &self,
+        context: DesktopRecordingContext,
+        request: DesktopRecordingStartRequest,
+    ) -> Result<DesktopRecordingInfo, SandboxError> {
+        if find_binary("ffmpeg").is_none() {
+            return Err(SandboxError::Conflict {
+                message: "ffmpeg is required for desktop recording".to_string(),
+            });
+        }
+
+        self.ensure_recordings_dir()?;
+
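+        // Hold one lock across the active-recording check and the insert below so
+        // that two concurrent start calls cannot both pass the check.
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        if state.current_id.is_some() {
+            return Err(SandboxError::Conflict {
+                message: "a desktop recording is already active".to_string(),
+            });
+        }
+
+        // Allocate the next recording ID while still holding the lock.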
+        let id_num = state.next_id + 1;
+        state.next_id = id_num;
+        let id = format!("rec_{id_num}");
+        let file_name = format!("{id}.mp4");
+        let path = self.recordings_dir.join(&file_name);
+        let fps = request.fps.unwrap_or(30).clamp(1, 60);
+        let args = vec![
+            "-y".to_string(),
+            "-video_size".to_string(),
+            format!("{}x{}", context.resolution.width, context.resolution.height),
+            "-framerate".to_string(),
+            fps.to_string(),
+            "-f".to_string(),
+            "x11grab".to_string(),
+            "-i".to_string(),
+            context.display,
+            "-c:v".to_string(),
+            "libx264".to_string(),
+            "-preset".to_string(),
+            "ultrafast".to_string(),
+            "-pix_fmt".to_string(),
+            "yuv420p".to_string(),
+            path.to_string_lossy().to_string(),
+        ];
+        let snapshot = self
+            .process_runtime
+            .start_process(ProcessStartSpec {
+                command: "ffmpeg".to_string(),
+                args,
+                cwd: None,
+                env: context.environment,
+                tty: false,
+                interactive: false,
+                owner: ProcessOwner::Desktop,
+                restart_policy: Some(RestartPolicy::Never),
+            })
+            .await?;
+
+        let info = DesktopRecordingInfo {
+            id: id.clone(),
+            status: DesktopRecordingStatus::Recording,
+            process_id: Some(snapshot.id),
+            file_name,
+            bytes: 0,
+            started_at: chrono::Utc::now().to_rfc3339(),
+            ended_at: None,
+        };
+        state.current_id = Some(id.clone());
+        state.recordings.insert(
+            id,
+            RecordingEntry {
+                info: info.clone(),
+                path,
+            },
+        );
+        Ok(info)
+    }
+
+    pub async fn stop(&self) -> Result<DesktopRecordingInfo, SandboxError> {
+        let (recording_id, process_id) = {
+            let mut state = self.inner.lock().await;
+            self.refresh_locked(&mut state).await?;
+            let recording_id = state.current_id.clone().ok_or_else(|| SandboxError::Conflict {
+                message: "no desktop recording is active".to_string(),
+            })?;
+            let process_id = state
+                .recordings
+                .get(&recording_id)
+                .and_then(|entry| entry.info.process_id.clone());
+            (recording_id, process_id)
+        };
+
+        if let Some(process_id) = process_id {
+            let snapshot = self.process_runtime.stop_process(&process_id, Some(5_000)).await?;
+            if snapshot.status == ProcessStatus::Running {
+                let _ = self.process_runtime.kill_process(&process_id, Some(1_000)).await;
+            }
+        }
+
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        let entry = state
+            .recordings
+            .get(&recording_id)
+            .ok_or_else(|| SandboxError::NotFound {
+                resource: "desktop_recording".to_string(),
+                id: recording_id.clone(),
+            })?;
+        Ok(entry.info.clone())
+    }
+
+    pub async fn list(&self) -> Result<DesktopRecordingListResponse, SandboxError> {
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        Ok(DesktopRecordingListResponse {
+            recordings: state.recordings.values().map(|entry| entry.info.clone()).collect(),
+        })
+    }
+
+    pub async fn get(&self, id: &str) -> Result<DesktopRecordingInfo, SandboxError> {
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        state
+            .recordings
+            .get(id)
+            .map(|entry| entry.info.clone())
+            .ok_or_else(|| SandboxError::NotFound {
+                resource: "desktop_recording".to_string(),
+                id: id.to_string(),
+            })
+    }
+
+    pub async fn download_path(&self, id: &str) -> Result<PathBuf, SandboxError> {
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        let entry = state
+            .recordings
+            .get(id)
+            .ok_or_else(|| SandboxError::NotFound {
+                resource: "desktop_recording".to_string(),
+                id: id.to_string(),
+            })?;
+        if !entry.path.is_file() {
+            return Err(SandboxError::NotFound {
+                resource: "desktop_recording_file".to_string(),
+                id: id.to_string(),
+            });
+        }
+        Ok(entry.path.clone())
+    }
+
+    pub async fn delete(&self, id: &str) -> Result<(), SandboxError> {
+        let mut state = self.inner.lock().await;
+        self.refresh_locked(&mut state).await?;
+        if state.current_id.as_deref() == Some(id) {
+            return Err(SandboxError::Conflict {
+                message: "stop the active desktop recording before deleting it".to_string(),
+            });
+        }
+        let entry = state
+            .recordings
+            .remove(id)
+            .ok_or_else(|| SandboxError::NotFound {
+                resource: "desktop_recording".to_string(),
+                id: id.to_string(),
+            })?;
+        if entry.path.exists() {
+            fs::remove_file(&entry.path).map_err(|err| SandboxError::StreamError {
+                message: format!(
+                    "failed to delete desktop recording {}: {err}",
+                    entry.path.display()
+                ),
+            })?;
+        }
+        Ok(())
+    }
+
+    fn ensure_recordings_dir(&self) -> Result<(), SandboxError> {
+        fs::create_dir_all(&self.recordings_dir).map_err(|err| SandboxError::StreamError {
+            message: format!(
+                "failed to create desktop recordings dir {}: {err}",
+                self.recordings_dir.display()
+            ),
+        })
+    }
+
+    async fn refresh_locked(&self, state: &mut DesktopRecordingState) -> Result<(), SandboxError> {
+        let ids: Vec<String> = state.recordings.keys().cloned().collect();
+        for id in ids {
+            let should_clear_current = {
+                let Some(entry) = state.recordings.get_mut(&id) else {
+                    continue;
+                };
+                let Some(process_id) = entry.info.process_id.clone() else {
+                    Self::refresh_bytes(entry);
+                    continue;
+                };
+
+                let snapshot = match self.process_runtime.snapshot(&process_id).await {
+                    Ok(snapshot) => snapshot,
+                    Err(SandboxError::NotFound { .. }) => {
+                        Self::finalize_entry(entry, false);
+                        continue;
+                    }
+                    Err(err) => return Err(err),
+                };
+
+                if snapshot.status == ProcessStatus::Running {
+                    Self::refresh_bytes(entry);
+                    false
+                } else {
+                    Self::finalize_entry(entry, snapshot.exit_code == Some(0));
+                    true
+                }
+            };
+
+            if should_clear_current && state.current_id.as_deref() == Some(id.as_str()) {
+                state.current_id = None;
+            }
+        }
+
+        Ok(())
+    }
+
+    fn refresh_bytes(entry: &mut RecordingEntry) {
+        entry.info.bytes = file_size(&entry.path);
+    }
+
+    fn finalize_entry(entry: &mut RecordingEntry, success: bool) {
+        let bytes = file_size(&entry.path);
+        entry.info.status = if success || (entry.path.is_file() && bytes > 0) {
+            DesktopRecordingStatus::Completed
+        } else {
+            DesktopRecordingStatus::Failed
+        };
+        entry.info.ended_at.get_or_insert_with(|| chrono::Utc::now().to_rfc3339());
+        entry.info.bytes = bytes;
+    }
+}
+
+fn find_binary(name: &str) -> Option<PathBuf> {
+    let path_env = std::env::var_os("PATH")?;
+    for path in std::env::split_paths(&path_env) {
+        let candidate = path.join(name);
+        if candidate.is_file() {
+            return Some(candidate);
+        }
+    }
+    None
+}
+
+fn file_size(path: &Path) -> u64 {
+    fs::metadata(path).map(|metadata| metadata.len()).unwrap_or(0)
+}
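Review note: the status rule in `finalize_entry` above can be read in isolation. What follows is a minimal sketch, not code from this commit. `RecordingStatus` and `final_status` are hypothetical names, and the file checks are passed in as plain values instead of being read from disk.

```rust
/// Hypothetical stand-in for the terminal states of `DesktopRecordingStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordingStatus {
    Completed,
    Failed,
}

/// Mirrors the decision in `finalize_entry`: a recording counts as completed
/// when the recorder exited cleanly, or when it left a non-empty file behind.
pub fn final_status(success: bool, is_file: bool, bytes: u64) -> RecordingStatus {
    if success || (is_file && bytes > 0) {
        RecordingStatus::Completed
    } else {
        RecordingStatus::Failed
    }
}
```

Under this rule a recorder that exits with a non-zero or missing exit code is still reported as `Completed` as long as it wrote some bytes. Only an unclean exit combined with a missing or empty file ends up as `Failed`.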
--- a/server/packages/sandbox-agent/src/desktop_runtime.rs
+++ b/server/packages/sandbox-agent/src/desktop_runtime.rs
--- a/server/packages/sandbox-agent/src/desktop_streaming.rs
+++ b/server/packages/sandbox-agent/src/desktop_streaming.rs
@ -0,0 +1,47 @@
+use std::sync::Arc;
+
+use tokio::sync::Mutex;
+
+use sandbox_agent_error::SandboxError;
+
+use crate::desktop_types::DesktopStreamStatusResponse;
+
+#[derive(Debug, Clone)]
+pub struct DesktopStreamingManager {
+    inner: Arc<Mutex<DesktopStreamingState>>,
+}
+
+#[derive(Debug, Default)]
+struct DesktopStreamingState {
+    active: bool,
+}
+
+impl DesktopStreamingManager {
+    pub fn new() -> Self {
+        Self {
+            inner: Arc::new(Mutex::new(DesktopStreamingState::default())),
+        }
+    }
+
+    pub async fn start(&self) -> DesktopStreamStatusResponse {
+        let mut state = self.inner.lock().await;
+        state.active = true;
+        DesktopStreamStatusResponse { active: true }
+    }
+
+    pub async fn stop(&self) -> DesktopStreamStatusResponse {
+        let mut state = self.inner.lock().await;
+        state.active = false;
+        DesktopStreamStatusResponse { active: false }
+    }
+
+    pub async fn ensure_active(&self) -> Result<(), SandboxError> {
+        if self.inner.lock().await.active {
+            Ok(())
+        } else {
+            Err(SandboxError::Conflict {
+                message: "desktop streaming is not active".to_string(),
+            })
+        }
+    }
+}
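Review note: `DesktopStreamingManager` above is effectively a shared boolean gate. A minimal synchronous sketch of the same behavior is shown here, assuming `std::sync::Mutex` in place of `tokio::sync::Mutex` and a `String` error in place of `SandboxError::Conflict`. `StreamGate` is a hypothetical name used only for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical synchronous analogue of `DesktopStreamingManager`.
/// Clones share the same flag because the state lives behind an `Arc`.
#[derive(Debug, Clone, Default)]
pub struct StreamGate {
    active: Arc<Mutex<bool>>,
}

impl StreamGate {
    /// Marks streaming as active and returns the new state.
    pub fn start(&self) -> bool {
        let mut active = self.active.lock().unwrap();
        *active = true;
        *active
    }

    /// Marks streaming as inactive and returns the new state.
    pub fn stop(&self) -> bool {
        let mut active = self.active.lock().unwrap();
        *active = false;
        *active
    }

    /// Fails unless `start` was called more recently than `stop`.
    pub fn ensure_active(&self) -> Result<(), String> {
        if *self.active.lock().unwrap() {
            Ok(())
        } else {
            Err("desktop streaming is not active".to_string())
        }
    }
}
```

Both `start` and `stop` are idempotent in this sketch, which matches the manager above: calling either one twice simply returns the same status again.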
--- a/server/packages/sandbox-agent/src/desktop_types.rs
+++ b/server/packages/sandbox-agent/src/desktop_types.rs
@ -1,6 +1,6 @@
 use schemars::JsonSchema;
 use serde::{Deserialize, Serialize};
-use utoipa::ToSchema;
+use utoipa::{IntoParams, ToSchema};

 #[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
 #[serde(rename_all = "snake_case")]
@ -62,7 +62,7 @@ pub struct DesktopStatusResponse {
    pub runtime_log_path: Option<String>,
 }

-#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, Default)]
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
 #[serde(rename_all = "camelCase")]
 pub struct DesktopStartRequest {
    #[serde(default, skip_serializing_if = "Option::is_none")]
@ -73,17 +73,38 @@ pub struct DesktopStartRequest {
    pub dpi: Option<u32>,
 }

-#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, Default)]
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams, Default)]
 #[serde(rename_all = "camelCase")]
-pub struct DesktopScreenshotQuery {}
+pub struct DesktopScreenshotQuery {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub format: Option<DesktopScreenshotFormat>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub quality: Option<u8>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scale: Option<f32>,
+}

-#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum DesktopScreenshotFormat {
+    Png,
+    Jpeg,
+    Webp,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams)]
 #[serde(rename_all = "camelCase")]
 pub struct DesktopRegionScreenshotQuery {
    pub x: i32,
    pub y: i32,
    pub width: u32,
    pub height: u32,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub format: Option<DesktopScreenshotFormat>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub quality: Option<u8>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scale: Option<f32>,
 }

 #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
@ -123,6 +144,28 @@ pub struct DesktopMouseClickRequest {
    pub click_count: Option<u32>,
 }

+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopMouseDownRequest {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub x: Option<i32>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub y: Option<i32>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub button: Option<DesktopMouseButton>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopMouseUpRequest {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub x: Option<i32>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub y: Option<i32>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub button: Option<DesktopMouseButton>,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
 #[serde(rename_all = "camelCase")]
 pub struct DesktopMouseDragRequest {
@ -157,6 +200,33 @@ pub struct DesktopKeyboardTypeRequest {
 #[serde(rename_all = "camelCase")]
 pub struct DesktopKeyboardPressRequest {
    pub key: String,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub modifiers: Option<DesktopKeyModifiers>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopKeyModifiers {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub ctrl: Option<bool>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub shift: Option<bool>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub alt: Option<bool>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub cmd: Option<bool>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopKeyboardDownRequest {
+    pub key: String,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopKeyboardUpRequest {
+    pub key: String,
 }

 #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
@ -171,3 +241,62 @@ pub struct DesktopDisplayInfoResponse {
    pub display: String,
    pub resolution: DesktopResolution,
 }
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopWindowInfo {
+    pub id: String,
+    pub title: String,
+    pub x: i32,
+    pub y: i32,
+    pub width: u32,
+    pub height: u32,
+    pub is_active: bool,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopWindowListResponse {
+    pub windows: Vec<DesktopWindowInfo>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopRecordingStartRequest {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub fps: Option<u32>,
+}
+
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum DesktopRecordingStatus {
+    Recording,
+    Completed,
+    Failed,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopRecordingInfo {
+    pub id: String,
+    pub status: DesktopRecordingStatus,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub process_id: Option<String>,
+    pub file_name: String,
+    pub bytes: u64,
+    pub started_at: String,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub ended_at: Option<String>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopRecordingListResponse {
+    pub recordings: Vec<DesktopRecordingInfo>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "camelCase")]
+pub struct DesktopStreamStatusResponse {
+    pub active: bool,
+}
--- a/server/packages/sandbox-agent/src/lib.rs
+++ b/server/packages/sandbox-agent/src/lib.rs
@ -5,7 +5,9 @@ pub mod cli;
 pub mod daemon;
 mod desktop_errors;
 mod desktop_install;
+mod desktop_recording;
 mod desktop_runtime;
+mod desktop_streaming;
 pub mod desktop_types;
 mod process_runtime;
 pub mod router;
--- a/server/packages/sandbox-agent/src/process_runtime.rs
+++ b/server/packages/sandbox-agent/src/process_runtime.rs
@ -1,5 +1,5 @@
 use std::collections::{HashMap, VecDeque};
-use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
 use std::sync::Arc;
 use std::time::Instant;

@ -27,6 +27,22 @@ pub enum ProcessStream {
    Pty,
 }

+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum ProcessOwner {
+    User,
+    Desktop,
+    System,
+}
+
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum RestartPolicy {
+    Never,
+    Always,
+    OnFailure,
+}
+
 #[derive(Debug, Clone)]
 pub struct ProcessStartSpec {
    pub command: String,
@ -35,6 +51,8 @@ pub struct ProcessStartSpec {
    pub env: HashMap<String, String>,
    pub tty: bool,
    pub interactive: bool,
+    pub owner: ProcessOwner,
+    pub restart_policy: Option<RestartPolicy>,
 }

 #[derive(Debug, Clone)]
@ -78,6 +96,7 @@ pub struct ProcessSnapshot {
    pub cwd: Option<String>,
    pub tty: bool,
    pub interactive: bool,
+    pub owner: ProcessOwner,
    pub status: ProcessStatus,
    pub pid: Option<u32>,
    pub exit_code: Option<i32>,
@ -129,17 +148,27 @@ struct ManagedProcess {
    cwd: Option<String>,
    tty: bool,
    interactive: bool,
+    owner: ProcessOwner,
+    #[allow(dead_code)]
+    restart_policy: RestartPolicy,
+    spec: ProcessStartSpec,
    created_at_ms: i64,
-    pid: Option<u32>,
    max_log_bytes: usize,
-    stdin: Mutex<Option<ProcessStdin>>,
-    #[cfg(unix)]
-    pty_resize_fd: Mutex<Option<std::fs::File>>,
+    runtime: Mutex<ManagedRuntime>,
    status: RwLock<ManagedStatus>,
    sequence: AtomicU64,
    logs: Mutex<VecDeque<StoredLog>>,
    total_log_bytes: Mutex<usize>,
    log_tx: broadcast::Sender<ProcessLogLine>,
+    stop_requested: AtomicBool,
+}
+
+#[derive(Debug)]
+struct ManagedRuntime {
+    pid: Option<u32>,
+    stdin: Option<ProcessStdin>,
+    #[cfg(unix)]
+    pty_resize_fd: Option<std::fs::File>,
 }

 #[derive(Debug)]
@ -162,17 +191,17 @@ struct ManagedStatus {
 }

 struct SpawnedPipeProcess {
-    process: Arc<ManagedProcess>,
    child: Child,
    stdout: tokio::process::ChildStdout,
    stderr: tokio::process::ChildStderr,
+    runtime: ManagedRuntime,
 }

 #[cfg(unix)]
 struct SpawnedTtyProcess {
-    process: Arc<ManagedProcess>,
    child: Child,
    reader: tokio::fs::File,
+    runtime: ManagedRuntime,
 }

 impl ProcessRuntime {
@ -224,21 +253,14 @@ impl ProcessRuntime {
        &self,
        spec: ProcessStartSpec,
    ) -> Result<ProcessSnapshot, SandboxError> {
-        let config = self.get_config().await;
-
-        let process_refs = {
-            let processes = self.inner.processes.read().await;
-            processes.values().cloned().collect::<Vec<_>>()
-        };
-
-        let mut running_count = 0usize;
-        for process in process_refs {
-            if process.status.read().await.status == ProcessStatus::Running {
-                running_count += 1;
-            }
+        if spec.command.trim().is_empty() {
+            return Err(SandboxError::InvalidRequest {
+                message: "command must not be empty".to_string(),
+            });
        }

-        if running_count >= config.max_concurrent_processes {
+        let config = self.get_config().await;
+        if self.running_process_count().await >= config.max_concurrent_processes {
            return Err(SandboxError::Conflict {
                message: format!(
                    "max concurrent process limit reached ({})",
@ -247,73 +269,40 @@ impl ProcessRuntime {
            });
        }

-        if spec.command.trim().is_empty() {
-            return Err(SandboxError::InvalidRequest {
-                message: "command must not be empty".to_string(),
-            });
-        }
-
        let id_num = self.inner.next_id.fetch_add(1, Ordering::Relaxed);
        let id = format!("proc_{id_num}");
-
-        if spec.tty {
-            #[cfg(unix)]
-            {
-                let spawned = self
-                    .spawn_tty_process(id.clone(), spec, config.max_log_bytes_per_process)
-                    .await?;
-                let process = spawned.process.clone();
-                self.inner
-                    .processes
-                    .write()
-                    .await
-                    .insert(id, process.clone());
-
-                let p = process.clone();
-                tokio::spawn(async move {
-                    pump_output(p, spawned.reader, ProcessStream::Pty).await;
-                });
-
-                let p = process.clone();
-                tokio::spawn(async move {
-                    watch_exit(p, spawned.child).await;
-                });
-
-                return Ok(process.snapshot().await);
-            }
-            #[cfg(not(unix))]
-            {
-                return Err(SandboxError::StreamError {
-                    message: "tty process mode is not supported on this platform".to_string(),
-                });
-            }
-        }
-
-        let spawned = self
-            .spawn_pipe_process(id.clone(), spec, config.max_log_bytes_per_process)
-            .await?;
-        let process = spawned.process.clone();
-        self.inner
-            .processes
-            .write()
-            .await
-            .insert(id, process.clone());
-
-        let p = process.clone();
-        tokio::spawn(async move {
-            pump_output(p, spawned.stdout, ProcessStream::Stdout).await;
-        });
-
-        let p = process.clone();
-        tokio::spawn(async move {
-            pump_output(p, spawned.stderr, ProcessStream::Stderr).await;
-        });
-
-        let p = process.clone();
-        tokio::spawn(async move {
-            watch_exit(p, spawned.child).await;
+        let process = Arc::new(ManagedProcess {
+            id: id.clone(),
+            command: spec.command.clone(),
+            args: spec.args.clone(),
+            cwd: spec.cwd.clone(),
+            tty: spec.tty,
+            interactive: spec.interactive,
+            owner: spec.owner,
+            restart_policy: spec.restart_policy.unwrap_or(RestartPolicy::Never),
+            spec,
+            created_at_ms: now_ms(),
+            max_log_bytes: config.max_log_bytes_per_process,
+            runtime: Mutex::new(ManagedRuntime {
+                pid: None,
+                stdin: None,
+                #[cfg(unix)]
+                pty_resize_fd: None,
+            }),
+            status: RwLock::new(ManagedStatus {
+                status: ProcessStatus::Running,
+                exit_code: None,
+                exited_at_ms: None,
+            }),
+            sequence: AtomicU64::new(1),
+            logs: Mutex::new(VecDeque::new()),
+            total_log_bytes: Mutex::new(0),
+            log_tx: broadcast::channel(512).0,
+            stop_requested: AtomicBool::new(false),
        });

+        self.spawn_existing_process(process.clone()).await?;
+        self.inner.processes.write().await.insert(id, process.clone());
        Ok(process.snapshot().await)
    }

@ -412,11 +401,13 @@ impl ProcessRuntime {
        })
    }

-    pub async fn list_processes(&self) -> Vec<ProcessSnapshot> {
+    pub async fn list_processes(&self, owner: Option<ProcessOwner>) -> Vec<ProcessSnapshot> {
        let processes = self.inner.processes.read().await;
        let mut items = Vec::with_capacity(processes.len());
        for process in processes.values() {
-            items.push(process.snapshot().await);
+            if owner.is_none_or(|expected| process.owner == expected) {
+                items.push(process.snapshot().await);
+            }
        }
        items.sort_by(|a, b| a.id.cmp(&b.id));
        items
@ -453,6 +444,7 @@ impl ProcessRuntime {
        wait_ms: Option<u64>,
    ) -> Result<ProcessSnapshot, SandboxError> {
        let process = self.lookup_process(id).await?;
+        process.stop_requested.store(true, Ordering::SeqCst);
        process.send_signal(SIGTERM).await?;
        maybe_wait_for_exit(process.clone(), wait_ms.unwrap_or(2_000)).await;
        Ok(process.snapshot().await)
@ -464,6 +456,7 @@ impl ProcessRuntime {
        wait_ms: Option<u64>,
    ) -> Result<ProcessSnapshot, SandboxError> {
        let process = self.lookup_process(id).await?;
+        process.stop_requested.store(true, Ordering::SeqCst);
        process.send_signal(SIGKILL).await?;
        maybe_wait_for_exit(process.clone(), wait_ms.unwrap_or(1_000)).await;
        Ok(process.snapshot().await)
@ -506,6 +499,17 @@ impl ProcessRuntime {
        Ok(process.log_tx.subscribe())
    }

+    async fn running_process_count(&self) -> usize {
+        let processes = self.inner.processes.read().await;
+        let mut running = 0usize;
+        for process in processes.values() {
+            if process.status.read().await.status == ProcessStatus::Running {
+                running += 1;
+            }
+        }
+        running
+    }
+
    async fn lookup_process(&self, id: &str) -> Result<Arc<ManagedProcess>, SandboxError> {
        let process = self.inner.processes.read().await.get(id).cloned();
        process.ok_or_else(|| SandboxError::NotFound {
@ -514,12 +518,81 @@ impl ProcessRuntime {
        })
    }

-    async fn spawn_pipe_process(
+    async fn spawn_existing_process(
        &self,
-        id: String,
-        spec: ProcessStartSpec,
-        max_log_bytes: usize,
-    ) -> Result<SpawnedPipeProcess, SandboxError> {
+        process: Arc<ManagedProcess>,
+    ) -> Result<(), SandboxError> {
+        process.stop_requested.store(false, Ordering::SeqCst);
+        let mut runtime_guard = process.runtime.lock().await;
+        let mut status_guard = process.status.write().await;
+
+        if process.tty {
+            #[cfg(unix)]
+            {
+                let SpawnedTtyProcess {
+                    child,
+                    reader,
+                    runtime,
+                } = self.spawn_tty_process(&process.spec)?;
+                *runtime_guard = runtime;
+                status_guard.status = ProcessStatus::Running;
+                status_guard.exit_code = None;
+                status_guard.exited_at_ms = None;
+                drop(status_guard);
+                drop(runtime_guard);
+
+                let process_for_output = process.clone();
+                tokio::spawn(async move {
+                    pump_output(process_for_output, reader, ProcessStream::Pty).await;
+                });
+
+                let runtime = self.clone();
+                tokio::spawn(async move {
+                    watch_exit(runtime, process, child).await;
+                });
+
+                return Ok(());
+            }
+            #[cfg(not(unix))]
+            {
+                return Err(SandboxError::StreamError {
+                    message: "tty process mode is not supported on this platform".to_string(),
+                });
+            }
+        }
+
+        let SpawnedPipeProcess {
+            child,
+            stdout,
+            stderr,
+            runtime,
+        } = self.spawn_pipe_process(&process.spec)?;
+        *runtime_guard = runtime;
+        status_guard.status = ProcessStatus::Running;
+        status_guard.exit_code = None;
+        status_guard.exited_at_ms = None;
+        drop(status_guard);
+        drop(runtime_guard);
+
+        let process_for_stdout = process.clone();
+        tokio::spawn(async move {
+            pump_output(process_for_stdout, stdout, ProcessStream::Stdout).await;
+        });
+
+        let process_for_stderr = process.clone();
+        tokio::spawn(async move {
+            pump_output(process_for_stderr, stderr, ProcessStream::Stderr).await;
+        });
+
+        let runtime = self.clone();
+        tokio::spawn(async move {
+            watch_exit(runtime, process, child).await;
+        });
+
+        Ok(())
+    }
+
+    fn spawn_pipe_process(&self, spec: &ProcessStartSpec) -> Result<SpawnedPipeProcess, SandboxError> {
        let mut cmd = Command::new(&spec.command);
        cmd.args(&spec.args)
            .stdin(std::process::Stdio::piped())
@ -551,35 +624,14 @@ impl ProcessRuntime {
            .ok_or_else(|| SandboxError::StreamError {
                message: "failed to capture stderr".to_string(),
            })?;
-        let pid = child.id();
-
-        let (tx, _rx) = broadcast::channel(512);
-        let process = Arc::new(ManagedProcess {
-            id,
-            command: spec.command,
-            args: spec.args,
-            cwd: spec.cwd,
-            tty: false,
-            interactive: spec.interactive,
-            created_at_ms: now_ms(),
-            pid,
-            max_log_bytes,
-            stdin: Mutex::new(stdin.map(ProcessStdin::Pipe)),
-            #[cfg(unix)]
-            pty_resize_fd: Mutex::new(None),
-            status: RwLock::new(ManagedStatus {
-                status: ProcessStatus::Running,
-                exit_code: None,
-                exited_at_ms: None,
-            }),
-            sequence: AtomicU64::new(1),
-            logs: Mutex::new(VecDeque::new()),
-            total_log_bytes: Mutex::new(0),
-            log_tx: tx,
-        });

        Ok(SpawnedPipeProcess {
-            process,
+            runtime: ManagedRuntime {
+                pid: child.id(),
+                stdin: stdin.map(ProcessStdin::Pipe),
+                #[cfg(unix)]
+                pty_resize_fd: None,
+            },
            child,
            stdout,
            stderr,
@ -587,12 +639,7 @@ impl ProcessRuntime {
    }

    #[cfg(unix)]
-    async fn spawn_tty_process(
-        &self,
-        id: String,
-        spec: ProcessStartSpec,
-        max_log_bytes: usize,
-    ) -> Result<SpawnedTtyProcess, SandboxError> {
+    fn spawn_tty_process(&self, spec: &ProcessStartSpec) -> Result<SpawnedTtyProcess, SandboxError> {
        use std::os::fd::AsRawFd;
        use std::process::Stdio;

@ -632,8 +679,8 @@ impl ProcessRuntime {
        let child = cmd.spawn().map_err(|err| SandboxError::StreamError {
            message: format!("failed to spawn tty process: {err}"),
        })?;
-
        let pid = child.id();
+
        drop(slave_fd);

        let master_raw = master_fd.as_raw_fd();
@ -644,32 +691,12 @@ impl ProcessRuntime {
        let writer_file = tokio::fs::File::from_std(std::fs::File::from(writer_fd));
        let resize_file = std::fs::File::from(resize_fd);

-        let (tx, _rx) = broadcast::channel(512);
-        let process = Arc::new(ManagedProcess {
-            id,
-            command: spec.command,
-            args: spec.args,
-            cwd: spec.cwd,
-            tty: true,
-            interactive: spec.interactive,
-            created_at_ms: now_ms(),
-            pid,
-            max_log_bytes,
-            stdin: Mutex::new(Some(ProcessStdin::Pty(writer_file))),
-            pty_resize_fd: Mutex::new(Some(resize_file)),
-            status: RwLock::new(ManagedStatus {
-                status: ProcessStatus::Running,
-                exit_code: None,
-                exited_at_ms: None,
-            }),
-            sequence: AtomicU64::new(1),
-            logs: Mutex::new(VecDeque::new()),
-            total_log_bytes: Mutex::new(0),
-            log_tx: tx,
-        });
-
        Ok(SpawnedTtyProcess {
-            process,
+            runtime: ManagedRuntime {
+                pid,
+                stdin: Some(ProcessStdin::Pty(writer_file)),
+                pty_resize_fd: Some(resize_file),
+            },
            child,
            reader: reader_file,
        })
@ -694,6 +721,7 @@ pub struct ProcessLogFilter {
 impl ManagedProcess {
    async fn snapshot(&self) -> ProcessSnapshot {
        let status = self.status.read().await.clone();
+        let pid = self.runtime.lock().await.pid;
        ProcessSnapshot {
            id: self.id.clone(),
            command: self.command.clone(),
@ -701,8 +729,9 @@ impl ManagedProcess {
            cwd: self.cwd.clone(),
            tty: self.tty,
            interactive: self.interactive,
+            owner: self.owner,
            status: status.status,
-            pid: self.pid,
+            pid,
            exit_code: status.exit_code,
            created_at_ms: self.created_at_ms,
            exited_at_ms: status.exited_at_ms,
@ -752,8 +781,8 @@ impl ManagedProcess {
            });
        }

-        let mut guard = self.stdin.lock().await;
-        let stdin = guard.as_mut().ok_or_else(|| SandboxError::Conflict {
+        let mut runtime = self.runtime.lock().await;
+        let stdin = runtime.stdin.as_mut().ok_or_else(|| SandboxError::Conflict {
            message: "process does not accept stdin".to_string(),
        })?;

@ -825,7 +854,7 @@ impl ManagedProcess {
        if self.status.read().await.status != ProcessStatus::Running {
            return Ok(());
        }
-        let Some(pid) = self.pid else {
+        let Some(pid) = self.runtime.lock().await.pid else {
            return Ok(());
        };

@ -840,8 +869,9 @@ impl ManagedProcess {
        #[cfg(unix)]
        {
            use std::os::fd::AsRawFd;
-            let guard = self.pty_resize_fd.lock().await;
-            let Some(fd) = guard.as_ref() else {
+
+            let runtime = self.runtime.lock().await;
+            let Some(fd) = runtime.pty_resize_fd.as_ref() else {
                return Err(SandboxError::Conflict {
                    message: "PTY resize handle unavailable".to_string(),
                });
@ -857,6 +887,32 @@ impl ManagedProcess {

        Ok(())
    }
+
+    #[allow(dead_code)]
+    fn should_restart(&self, exit_code: Option<i32>) -> bool {
+        match self.restart_policy {
+            RestartPolicy::Never => false,
+            RestartPolicy::Always => true,
+            RestartPolicy::OnFailure => exit_code.unwrap_or(1) != 0,
+        }
+    }
+
+    async fn mark_exited(&self, exit_code: Option<i32>, exited_at_ms: Option<i64>) {
+        {
+            let mut status = self.status.write().await;
+            status.status = ProcessStatus::Exited;
+            status.exit_code = exit_code;
+            status.exited_at_ms = exited_at_ms;
+        }
+
+        let mut runtime = self.runtime.lock().await;
+        runtime.pid = None;
+        let _ = runtime.stdin.take();
+        #[cfg(unix)]
+        {
+            let _ = runtime.pty_resize_fd.take();
+        }
+    }
 }

 fn stream_matches(stream: ProcessStream, filter: ProcessLogFilterStream) -> bool {
@ -909,21 +965,16 @@ where
    }
 }

-async fn watch_exit(process: Arc<ManagedProcess>, mut child: Child) {
+async fn watch_exit(runtime: ProcessRuntime, process: Arc<ManagedProcess>, mut child: Child) {
+    let _ = runtime;
    let wait = child.wait().await;
    let (exit_code, exited_at_ms) = match wait {
        Ok(status) => (status.code(), Some(now_ms())),
        Err(_) => (None, Some(now_ms())),
    };

-    {
-        let mut state = process.status.write().await;
-        state.status = ProcessStatus::Exited;
-        state.exit_code = exit_code;
-        state.exited_at_ms = exited_at_ms;
-    }
-
-    let _ = process.stdin.lock().await.take();
+    let _ = process.stop_requested.swap(false, Ordering::SeqCst);
+    process.mark_exited(exit_code, exited_at_ms).await;
 }

 async fn capture_output<R>(mut reader: R, max_bytes: usize) -> std::io::Result<(Vec<u8>, bool)>
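Review note: `should_restart` is still marked `#[allow(dead_code)]` in this change, so the restart decision is not exercised yet. The sketch below restates the rule as a free function so it can be checked on its own. The enum is redeclared locally and the free-function form is an assumption made for illustration, not the method from this commit.

```rust
/// Local copy of the policy enum, redeclared so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    Never,
    Always,
    OnFailure,
}

/// Same rule as `ManagedProcess::should_restart`: a missing exit code
/// is treated as a failure by substituting 1 before comparing with 0.
pub fn should_restart(policy: RestartPolicy, exit_code: Option<i32>) -> bool {
    match policy {
        RestartPolicy::Never => false,
        RestartPolicy::Always => true,
        RestartPolicy::OnFailure => exit_code.unwrap_or(1) != 0,
    }
}
```

On Unix, `ExitStatus::code()` returns `None` when a process is terminated by a signal, so under `OnFailure` a process that is stopped with SIGTERM or SIGKILL would count as failed unless the caller also checks a flag such as `stop_requested`.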
--- a/server/packages/sandbox-agent/src/router.rs
+++ b/server/packages/sandbox-agent/src/router.rs
@ -34,15 +34,16 @@ use tar::Archive;
 use tokio_stream::wrappers::BroadcastStream;
 use tower_http::trace::TraceLayer;
 use tracing::Span;
-use utoipa::{Modify, OpenApi, ToSchema};
+use utoipa::{IntoParams, Modify, OpenApi, ToSchema};

 use crate::acp_proxy_runtime::{AcpProxyRuntime, ProxyPostOutcome};
 use crate::desktop_errors::DesktopProblem;
 use crate::desktop_runtime::DesktopRuntime;
 use crate::desktop_types::*;
 use crate::process_runtime::{
-    decode_input_bytes, ProcessLogFilter, ProcessLogFilterStream, ProcessRuntime,
-    ProcessRuntimeConfig, ProcessSnapshot, ProcessStartSpec, ProcessStatus, ProcessStream, RunSpec,
+    decode_input_bytes, ProcessLogFilter, ProcessLogFilterStream, ProcessOwner as RuntimeProcessOwner,
+    ProcessRuntime, ProcessRuntimeConfig, ProcessSnapshot, ProcessStartSpec, ProcessStatus,
+    ProcessStream, RunSpec,
 };
 use crate::ui;

@ -115,7 +116,7 @@ impl AppState {
            },
        ));
        let process_runtime = Arc::new(ProcessRuntime::new());
-        let desktop_runtime = Arc::new(DesktopRuntime::new());
+        let desktop_runtime = Arc::new(DesktopRuntime::new(process_runtime.clone()));
        Self {
            auth,
            agent_manager,
@ -196,6 +197,8 @@ pub fn build_router_with_state(shared: Arc<AppState>) -> (Router, Arc<AppState>)
        )
        .route("/desktop/mouse/move", post(post_v1_desktop_mouse_move))
        .route("/desktop/mouse/click", post(post_v1_desktop_mouse_click))
+        .route("/desktop/mouse/down", post(post_v1_desktop_mouse_down))
+        .route("/desktop/mouse/up", post(post_v1_desktop_mouse_up))
        .route("/desktop/mouse/drag", post(post_v1_desktop_mouse_drag))
        .route("/desktop/mouse/scroll", post(post_v1_desktop_mouse_scroll))
        .route(
@ -206,7 +209,33 @@ pub fn build_router_with_state(shared: Arc<AppState>) -> (Router, Arc<AppState>)
            "/desktop/keyboard/press",
            post(post_v1_desktop_keyboard_press),
        )
+        .route(
+            "/desktop/keyboard/down",
+            post(post_v1_desktop_keyboard_down),
+        )
+        .route("/desktop/keyboard/up", post(post_v1_desktop_keyboard_up))
        .route("/desktop/display/info", get(get_v1_desktop_display_info))
+        .route("/desktop/windows", get(get_v1_desktop_windows))
+        .route(
+            "/desktop/recording/start",
+            post(post_v1_desktop_recording_start),
+        )
+        .route(
+            "/desktop/recording/stop",
+            post(post_v1_desktop_recording_stop),
+        )
+        .route("/desktop/recordings", get(get_v1_desktop_recordings))
+        .route(
+            "/desktop/recordings/:id",
+            get(get_v1_desktop_recording).delete(delete_v1_desktop_recording),
+        )
+        .route(
+            "/desktop/recordings/:id/download",
+            get(get_v1_desktop_recording_download),
+        )
+        .route("/desktop/stream/start", post(post_v1_desktop_stream_start))
+        .route("/desktop/stream/stop", post(post_v1_desktop_stream_stop))
+        .route("/desktop/stream/ws", get(get_v1_desktop_stream_ws))
        .route("/agents", get(get_v1_agents))
        .route("/agents/:agent", get(get_v1_agent))
        .route("/agents/:agent/install", post(post_v1_agent_install))
@ -366,11 +395,25 @@ pub async fn shutdown_servers(state: &Arc<AppState>) {
        get_v1_desktop_mouse_position,
        post_v1_desktop_mouse_move,
        post_v1_desktop_mouse_click,
+        post_v1_desktop_mouse_down,
+        post_v1_desktop_mouse_up,
        post_v1_desktop_mouse_drag,
        post_v1_desktop_mouse_scroll,
        post_v1_desktop_keyboard_type,
        post_v1_desktop_keyboard_press,
+        post_v1_desktop_keyboard_down,
+        post_v1_desktop_keyboard_up,
        get_v1_desktop_display_info,
+        get_v1_desktop_windows,
+        post_v1_desktop_recording_start,
+        post_v1_desktop_recording_stop,
+        get_v1_desktop_recordings,
+        get_v1_desktop_recording,
+        get_v1_desktop_recording_download,
+        delete_v1_desktop_recording,
+        post_v1_desktop_stream_start,
+        post_v1_desktop_stream_stop,
+        get_v1_desktop_stream_ws,
        get_v1_agents,
        get_v1_agent,
        post_v1_agent_install,
@ -416,17 +459,30 @@ pub async fn shutdown_servers(state: &Arc<AppState>) {
            DesktopStatusResponse,
            DesktopStartRequest,
            DesktopScreenshotQuery,
+            DesktopScreenshotFormat,
            DesktopRegionScreenshotQuery,
            DesktopMousePositionResponse,
            DesktopMouseButton,
            DesktopMouseMoveRequest,
            DesktopMouseClickRequest,
+            DesktopMouseDownRequest,
+            DesktopMouseUpRequest,
            DesktopMouseDragRequest,
            DesktopMouseScrollRequest,
            DesktopKeyboardTypeRequest,
            DesktopKeyboardPressRequest,
+            DesktopKeyModifiers,
+            DesktopKeyboardDownRequest,
+            DesktopKeyboardUpRequest,
            DesktopActionResponse,
            DesktopDisplayInfoResponse,
+            DesktopWindowInfo,
+            DesktopWindowListResponse,
+            DesktopRecordingStartRequest,
+            DesktopRecordingStatus,
+            DesktopRecordingInfo,
+            DesktopRecordingListResponse,
+            DesktopStreamStatusResponse,
            ServerStatus,
            ServerStatusInfo,
            AgentCapabilities,
@ -448,12 +504,14 @@ pub async fn shutdown_servers(state: &Arc<AppState>) {
            FsActionResponse,
            FsUploadBatchResponse,
            ProcessConfig,
+            ProcessOwner,
            ProcessCreateRequest,
            ProcessRunRequest,
            ProcessRunResponse,
            ProcessState,
            ProcessInfo,
            ProcessListResponse,
+            ProcessListQuery,
            ProcessLogsStream,
            ProcessLogsQuery,
            ProcessLogEntry,
@ -616,40 +674,42 @@ async fn post_v1_desktop_stop(
 /// Capture a full desktop screenshot.
 ///
 /// Performs a health-gated full-frame screenshot of the managed desktop and
-/// returns PNG bytes.
+/// returns the requested image bytes.
 #[utoipa::path(
    get,
    path = "/v1/desktop/screenshot",
    tag = "v1",
+    params(DesktopScreenshotQuery),
    responses(
-        (status = 200, description = "Desktop screenshot as PNG bytes"),
+        (status = 200, description = "Desktop screenshot as image bytes"),
+        (status = 400, description = "Invalid screenshot query", body = ProblemDetails),
        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
        (status = 502, description = "Desktop runtime health or screenshot capture failed", body = ProblemDetails)
    )
 )]
 async fn get_v1_desktop_screenshot(
    State(state): State<Arc<AppState>>,
+    Query(query): Query<DesktopScreenshotQuery>,
 ) -> Result<Response, ApiError> {
-    let bytes = state.desktop_runtime().screenshot().await?;
-    Ok(([(header::CONTENT_TYPE, "image/png")], Bytes::from(bytes)).into_response())
+    let screenshot = state.desktop_runtime().screenshot(query).await?;
+    Ok((
+        [(header::CONTENT_TYPE, screenshot.content_type)],
+        Bytes::from(screenshot.bytes),
+    )
+        .into_response())
 }

 /// Capture a desktop screenshot region.
 ///
 /// Performs a health-gated screenshot crop against the managed desktop and
-/// returns the requested PNG region bytes.
+/// returns the requested region image bytes.
 #[utoipa::path(
    get,
    path = "/v1/desktop/screenshot/region",
    tag = "v1",
-    params(
-        ("x" = i32, Query, description = "Region x coordinate"),
-        ("y" = i32, Query, description = "Region y coordinate"),
-        ("width" = u32, Query, description = "Region width"),
-        ("height" = u32, Query, description = "Region height")
-    ),
+    params(DesktopRegionScreenshotQuery),
    responses(
-        (status = 200, description = "Desktop screenshot region as PNG bytes"),
+        (status = 200, description = "Desktop screenshot region as image bytes"),
        (status = 400, description = "Invalid screenshot region", body = ProblemDetails),
        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
        (status = 502, description = "Desktop runtime health or screenshot capture failed", body = ProblemDetails)
@ -659,8 +719,12 @@ async fn get_v1_desktop_screenshot_region(
    State(state): State<Arc<AppState>>,
    Query(query): Query<DesktopRegionScreenshotQuery>,
 ) -> Result<Response, ApiError> {
-    let bytes = state.desktop_runtime().screenshot_region(query).await?;
-    Ok(([(header::CONTENT_TYPE, "image/png")], Bytes::from(bytes)).into_response())
+    let screenshot = state.desktop_runtime().screenshot_region(query).await?;
+    Ok((
+        [(header::CONTENT_TYPE, screenshot.content_type)],
+        Bytes::from(screenshot.bytes),
+    )
+        .into_response())
 }

 /// Get the current desktop mouse position.
@ -731,6 +795,54 @@ async fn post_v1_desktop_mouse_click(
    Ok(Json(position))
 }

+/// Press and hold a desktop mouse button.
+///
+/// Performs a health-gated optional pointer move followed by `xdotool mousedown`
+/// and returns the resulting mouse position.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/mouse/down",
+    tag = "v1",
+    request_body = DesktopMouseDownRequest,
+    responses(
+        (status = 200, description = "Desktop mouse position after button press", body = DesktopMousePositionResponse),
+        (status = 400, description = "Invalid mouse down request", body = ProblemDetails),
+        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop runtime health or input failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_mouse_down(
+    State(state): State<Arc<AppState>>,
+    Json(body): Json<DesktopMouseDownRequest>,
+) -> Result<Json<DesktopMousePositionResponse>, ApiError> {
+    let position = state.desktop_runtime().mouse_down(body).await?;
+    Ok(Json(position))
+}
+
+/// Release a desktop mouse button.
+///
+/// Performs a health-gated optional pointer move followed by `xdotool mouseup`
+/// and returns the resulting mouse position.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/mouse/up",
+    tag = "v1",
+    request_body = DesktopMouseUpRequest,
+    responses(
+        (status = 200, description = "Desktop mouse position after button release", body = DesktopMousePositionResponse),
+        (status = 400, description = "Invalid mouse up request", body = ProblemDetails),
+        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop runtime health or input failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_mouse_up(
+    State(state): State<Arc<AppState>>,
+    Json(body): Json<DesktopMouseUpRequest>,
+) -> Result<Json<DesktopMousePositionResponse>, ApiError> {
+    let position = state.desktop_runtime().mouse_up(body).await?;
+    Ok(Json(position))
+}
+
 /// Drag the desktop mouse.
 ///
 /// Performs a health-gated drag gesture against the managed desktop and
@ -827,6 +939,54 @@ async fn post_v1_desktop_keyboard_press(
    Ok(Json(response))
 }

+/// Press and hold a desktop keyboard key.
+///
+/// Performs a health-gated `xdotool keydown` operation against the managed
+/// desktop.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/keyboard/down",
+    tag = "v1",
+    request_body = DesktopKeyboardDownRequest,
+    responses(
+        (status = 200, description = "Desktop keyboard action result", body = DesktopActionResponse),
+        (status = 400, description = "Invalid keyboard down request", body = ProblemDetails),
+        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop runtime health or input failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_keyboard_down(
+    State(state): State<Arc<AppState>>,
+    Json(body): Json<DesktopKeyboardDownRequest>,
+) -> Result<Json<DesktopActionResponse>, ApiError> {
+    let response = state.desktop_runtime().key_down(body).await?;
+    Ok(Json(response))
+}
+
+/// Release a desktop keyboard key.
+///
+/// Performs a health-gated `xdotool keyup` operation against the managed
+/// desktop.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/keyboard/up",
+    tag = "v1",
+    request_body = DesktopKeyboardUpRequest,
+    responses(
+        (status = 200, description = "Desktop keyboard action result", body = DesktopActionResponse),
+        (status = 400, description = "Invalid keyboard up request", body = ProblemDetails),
+        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop runtime health or input failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_keyboard_up(
+    State(state): State<Arc<AppState>>,
+    Json(body): Json<DesktopKeyboardUpRequest>,
+) -> Result<Json<DesktopActionResponse>, ApiError> {
+    let response = state.desktop_runtime().key_up(body).await?;
+    Ok(Json(response))
+}
+
 /// Get desktop display information.
 ///
 /// Performs a health-gated display query against the managed desktop and
@ -848,6 +1008,225 @@ async fn get_v1_desktop_display_info(
    Ok(Json(info))
 }

+/// List visible desktop windows.
+///
+/// Performs a health-gated visible-window enumeration against the managed
+/// desktop and returns the current window metadata.
+#[utoipa::path(
+    get,
+    path = "/v1/desktop/windows",
+    tag = "v1",
+    responses(
+        (status = 200, description = "Visible desktop windows", body = DesktopWindowListResponse),
+        (status = 409, description = "Desktop runtime is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop runtime health or window query failed", body = ProblemDetails)
+    )
+)]
+async fn get_v1_desktop_windows(
+    State(state): State<Arc<AppState>>,
+) -> Result<Json<DesktopWindowListResponse>, ApiError> {
+    let windows = state.desktop_runtime().list_windows().await?;
+    Ok(Json(windows))
+}
+
+/// Start desktop recording.
+///
+/// Starts an ffmpeg x11grab recording against the managed desktop and returns
+/// the created recording metadata.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/recording/start",
+    tag = "v1",
+    request_body = DesktopRecordingStartRequest,
+    responses(
+        (status = 200, description = "Desktop recording started", body = DesktopRecordingInfo),
+        (status = 409, description = "Desktop runtime is not ready or a recording is already active", body = ProblemDetails),
+        (status = 502, description = "Desktop recording failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_recording_start(
+    State(state): State<Arc<AppState>>,
+    Json(body): Json<DesktopRecordingStartRequest>,
+) -> Result<Json<DesktopRecordingInfo>, ApiError> {
+    let recording = state.desktop_runtime().start_recording(body).await?;
+    Ok(Json(recording))
+}
+
+/// Stop desktop recording.
+///
+/// Stops the active desktop recording and returns the finalized recording
+/// metadata.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/recording/stop",
+    tag = "v1",
+    responses(
+        (status = 200, description = "Desktop recording stopped", body = DesktopRecordingInfo),
+        (status = 409, description = "No active desktop recording", body = ProblemDetails),
+        (status = 502, description = "Desktop recording stop failed", body = ProblemDetails)
+    )
+)]
+async fn post_v1_desktop_recording_stop(
+    State(state): State<Arc<AppState>>,
+) -> Result<Json<DesktopRecordingInfo>, ApiError> {
+    let recording = state.desktop_runtime().stop_recording().await?;
+    Ok(Json(recording))
+}
+
+/// List desktop recordings.
+///
+/// Returns the current desktop recording catalog.
+#[utoipa::path(
+    get,
+    path = "/v1/desktop/recordings",
+    tag = "v1",
+    responses(
+        (status = 200, description = "Desktop recordings", body = DesktopRecordingListResponse),
+        (status = 502, description = "Desktop recordings query failed", body = ProblemDetails)
+    )
+)]
+async fn get_v1_desktop_recordings(
+    State(state): State<Arc<AppState>>,
+) -> Result<Json<DesktopRecordingListResponse>, ApiError> {
+    let recordings = state.desktop_runtime().list_recordings().await?;
+    Ok(Json(recordings))
+}
+
+/// Get desktop recording metadata.
+///
+/// Returns metadata for a single desktop recording.
+#[utoipa::path(
+    get,
+    path = "/v1/desktop/recordings/{id}",
+    tag = "v1",
+    params(
+        ("id" = String, Path, description = "Desktop recording ID")
+    ),
+    responses(
+        (status = 200, description = "Desktop recording metadata", body = DesktopRecordingInfo),
+        (status = 404, description = "Unknown desktop recording", body = ProblemDetails)
+    )
+)]
+async fn get_v1_desktop_recording(
+    State(state): State<Arc<AppState>>,
+    Path(id): Path<String>,
+) -> Result<Json<DesktopRecordingInfo>, ApiError> {
+    let recording = state.desktop_runtime().get_recording(&id).await?;
+    Ok(Json(recording))
+}
+
+/// Download a desktop recording.
+///
+/// Serves the recorded MP4 bytes for a completed desktop recording.
+#[utoipa::path(
+    get,
+    path = "/v1/desktop/recordings/{id}/download",
+    tag = "v1",
+    params(
+        ("id" = String, Path, description = "Desktop recording ID")
+    ),
+    responses(
+        (status = 200, description = "Desktop recording as MP4 bytes"),
+        (status = 404, description = "Unknown desktop recording", body = ProblemDetails)
+    )
+)]
+async fn get_v1_desktop_recording_download(
+    State(state): State<Arc<AppState>>,
+    Path(id): Path<String>,
+) -> Result<Response, ApiError> {
+    let path = state.desktop_runtime().recording_download_path(&id).await?;
+    let bytes = tokio::fs::read(&path).await.map_err(|err| SandboxError::StreamError {
+        message: format!("failed to read desktop recording {}: {err}", path.display()),
+    })?;
+    Ok(([(header::CONTENT_TYPE, "video/mp4")], Bytes::from(bytes)).into_response())
+}
+
+/// Delete a desktop recording.
+///
+/// Removes a completed desktop recording and its file from disk.
+#[utoipa::path(
+    delete,
+    path = "/v1/desktop/recordings/{id}",
+    tag = "v1",
+    params(
+        ("id" = String, Path, description = "Desktop recording ID")
+    ),
+    responses(
+        (status = 204, description = "Desktop recording deleted"),
+        (status = 404, description = "Unknown desktop recording", body = ProblemDetails),
+        (status = 409, description = "Desktop recording is still active", body = ProblemDetails)
+    )
+)]
+async fn delete_v1_desktop_recording(
+    State(state): State<Arc<AppState>>,
+    Path(id): Path<String>,
+) -> Result<StatusCode, ApiError> {
+    state.desktop_runtime().delete_recording(&id).await?;
+    Ok(StatusCode::NO_CONTENT)
+}
+
+/// Start desktop streaming.
+///
+/// Enables desktop websocket streaming for the managed desktop.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/stream/start",
+    tag = "v1",
+    responses(
+        (status = 200, description = "Desktop streaming started", body = DesktopStreamStatusResponse)
+    )
+)]
+async fn post_v1_desktop_stream_start(
+    State(state): State<Arc<AppState>>,
+) -> Result<Json<DesktopStreamStatusResponse>, ApiError> {
+    Ok(Json(state.desktop_runtime().start_streaming().await))
+}
+
+/// Stop desktop streaming.
+///
+/// Disables desktop websocket streaming for the managed desktop.
+#[utoipa::path(
+    post,
+    path = "/v1/desktop/stream/stop",
+    tag = "v1",
+    responses(
+        (status = 200, description = "Desktop streaming stopped", body = DesktopStreamStatusResponse)
+    )
+)]
+async fn post_v1_desktop_stream_stop(
+    State(state): State<Arc<AppState>>,
+) -> Result<Json<DesktopStreamStatusResponse>, ApiError> {
+    Ok(Json(state.desktop_runtime().stop_streaming().await))
+}
+
+/// Open a desktop websocket streaming session.
+///
+/// Upgrades the connection to a websocket that streams JPEG desktop frames and
+/// accepts mouse and keyboard control frames.
+#[utoipa::path(
+    get,
+    path = "/v1/desktop/stream/ws",
+    tag = "v1",
+    params(
+        ("access_token" = Option<String>, Query, description = "Bearer token alternative for WS auth")
+    ),
+    responses(
+        (status = 101, description = "WebSocket upgraded"),
+        (status = 409, description = "Desktop runtime or streaming session is not ready", body = ProblemDetails),
+        (status = 502, description = "Desktop stream failed", body = ProblemDetails)
+    )
+)]
+async fn get_v1_desktop_stream_ws(
+    State(state): State<Arc<AppState>>,
+    Query(_query): Query<ProcessWsQuery>,
+    ws: WebSocketUpgrade,
+) -> Result<Response, ApiError> {
+    state.desktop_runtime().ensure_streaming_active().await?;
+    Ok(ws
+        .on_upgrade(move |socket| desktop_stream_ws_session(socket, state.desktop_runtime()))
+        .into_response())
+}
+
 #[utoipa::path(
    get,
    path = "/v1/agents",
@ -1610,6 +1989,8 @@ async fn post_v1_processes(
            env: body.env.into_iter().collect(),
            tty: body.tty,
            interactive: body.interactive,
+            owner: RuntimeProcessOwner::User,
+            restart_policy: None,
        })
        .await?;

@ -1670,6 +2051,7 @@ async fn post_v1_processes_run(
    get,
    path = "/v1/processes",
    tag = "v1",
+    params(ProcessListQuery),
    responses(
        (status = 200, description = "List processes", body = ProcessListResponse),
        (status = 501, description = "Process API unsupported on this platform", body = ProblemDetails)
@ -1677,12 +2059,16 @@ async fn post_v1_processes_run(
 )]
 async fn get_v1_processes(
    State(state): State<Arc<AppState>>,
+    Query(query): Query<ProcessListQuery>,
 ) -> Result<Json<ProcessListResponse>, ApiError> {
    if !process_api_supported() {
        return Err(process_api_not_supported().into());
    }

-    let snapshots = state.process_runtime().list_processes().await;
+    let snapshots = state
+        .process_runtime()
+        .list_processes(query.owner.map(into_runtime_process_owner))
+        .await;
    Ok(Json(ProcessListResponse {
        processes: snapshots.into_iter().map(map_process_snapshot).collect(),
    }))
@ -2063,6 +2449,46 @@ enum TerminalClientFrame {
    Close,
 }

+#[derive(Debug, Deserialize)]
+#[serde(tag = "type", rename_all = "camelCase")]
+enum DesktopStreamClientFrame {
+    MoveMouse {
+        x: i32,
+        y: i32,
+    },
+    MouseDown {
+        #[serde(default)]
+        x: Option<i32>,
+        #[serde(default)]
+        y: Option<i32>,
+        #[serde(default)]
+        button: Option<DesktopMouseButton>,
+    },
+    MouseUp {
+        #[serde(default)]
+        x: Option<i32>,
+        #[serde(default)]
+        y: Option<i32>,
+        #[serde(default)]
+        button: Option<DesktopMouseButton>,
+    },
+    Scroll {
+        x: i32,
+        y: i32,
+        #[serde(default)]
+        delta_x: Option<i32>,
+        #[serde(default)]
+        delta_y: Option<i32>,
+    },
+    KeyDown {
+        key: String,
+    },
+    KeyUp {
+        key: String,
+    },
+    Close,
+}
+
 async fn process_terminal_ws_session(
    mut socket: WebSocket,
    runtime: Arc<ProcessRuntime>,
@ -2175,6 +2601,133 @@ async fn process_terminal_ws_session(
    }
 }

+async fn desktop_stream_ws_session(mut socket: WebSocket, desktop_runtime: Arc<DesktopRuntime>) {
+    let display_info = match desktop_runtime.display_info().await {
+        Ok(info) => info,
+        Err(err) => {
+            let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+            let _ = socket.close().await;
+            return;
+        }
+    };
+
+    if send_ws_json(
+        &mut socket,
+        json!({
+            "type": "ready",
+            "width": display_info.resolution.width,
+            "height": display_info.resolution.height,
+        }),
+    )
+    .await
+    .is_err()
+    {
+        return;
+    }
+
+    let mut frame_tick = tokio::time::interval(Duration::from_millis(100));
+
+    loop {
+        tokio::select! {
+            ws_in = socket.recv() => {
+                match ws_in {
+                    Some(Ok(Message::Text(text))) => {
+                        match serde_json::from_str::<DesktopStreamClientFrame>(&text) {
+                            Ok(DesktopStreamClientFrame::MoveMouse { x, y }) => {
+                                if let Err(err) = desktop_runtime
+                                    .move_mouse(DesktopMouseMoveRequest { x, y })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::MouseDown { x, y, button }) => {
+                                if let Err(err) = desktop_runtime
+                                    .mouse_down(DesktopMouseDownRequest { x, y, button })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::MouseUp { x, y, button }) => {
+                                if let Err(err) = desktop_runtime
+                                    .mouse_up(DesktopMouseUpRequest { x, y, button })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::Scroll { x, y, delta_x, delta_y }) => {
+                                if let Err(err) = desktop_runtime
+                                    .scroll_mouse(DesktopMouseScrollRequest {
+                                        x,
+                                        y,
+                                        delta_x,
+                                        delta_y,
+                                    })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::KeyDown { key }) => {
+                                if let Err(err) = desktop_runtime
+                                    .key_down(DesktopKeyboardDownRequest { key })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::KeyUp { key }) => {
+                                if let Err(err) = desktop_runtime
+                                    .key_up(DesktopKeyboardUpRequest { key })
+                                    .await
+                                {
+                                    let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                                }
+                            }
+                            Ok(DesktopStreamClientFrame::Close) => {
+                                let _ = socket.close().await;
+                                break;
+                            }
+                            Err(err) => {
+                                let _ = send_ws_error(&mut socket, &format!("invalid desktop stream frame: {err}")).await;
+                            }
+                        }
+                    }
+                    Some(Ok(Message::Ping(payload))) => {
+                        let _ = socket.send(Message::Pong(payload)).await;
+                    }
+                    Some(Ok(Message::Close(_))) | None => break,
+                    Some(Ok(Message::Binary(_))) | Some(Ok(Message::Pong(_))) => {}
+                    Some(Err(_)) => break,
+                }
+            }
+            _ = frame_tick.tick() => {
+                let frame = desktop_runtime
+                    .screenshot(DesktopScreenshotQuery {
+                        format: Some(DesktopScreenshotFormat::Jpeg),
+                        quality: Some(60),
+                        scale: Some(1.0),
+                    })
+                    .await;
+                match frame {
+                    Ok(frame) => {
+                        if socket.send(Message::Binary(frame.bytes.into())).await.is_err() {
+                            break;
+                        }
+                    }
+                    Err(err) => {
+                        let _ = send_ws_error(&mut socket, &err.to_error_info().message).await;
+                        let _ = socket.close().await;
+                        break;
+                    }
+                }
+            }
+        }
+    }
+}
+
 async fn send_ws_json(socket: &mut WebSocket, payload: Value) -> Result<(), ()> {
    socket
        .send(Message::Text(
@ -2543,6 +3096,14 @@ fn into_runtime_process_config(config: ProcessConfig) -> ProcessRuntimeConfig {
    }
 }

+fn into_runtime_process_owner(owner: ProcessOwner) -> RuntimeProcessOwner {
+    match owner {
+        ProcessOwner::User => RuntimeProcessOwner::User,
+        ProcessOwner::Desktop => RuntimeProcessOwner::Desktop,
+        ProcessOwner::System => RuntimeProcessOwner::System,
+    }
+}
+
 fn map_process_snapshot(snapshot: ProcessSnapshot) -> ProcessInfo {
    ProcessInfo {
        id: snapshot.id,
@ -2551,6 +3112,11 @@ fn map_process_snapshot(snapshot: ProcessSnapshot) -> ProcessInfo {
        cwd: snapshot.cwd,
        tty: snapshot.tty,
        interactive: snapshot.interactive,
+        owner: match snapshot.owner {
+            RuntimeProcessOwner::User => ProcessOwner::User,
+            RuntimeProcessOwner::Desktop => ProcessOwner::Desktop,
+            RuntimeProcessOwner::System => ProcessOwner::System,
+        },
        status: match snapshot.status {
            ProcessStatus::Running => ProcessState::Running,
            ProcessStatus::Exited => ProcessState::Exited,
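
For readers of the `DesktopStreamClientFrame` enum above, the following is a minimal sketch of the JSON text frames a client might send over `/v1/desktop/stream/ws`. The builder functions are hypothetical and use only the standard library. The wire names are an assumption derived from the serde attributes: with `tag = "type"` and a container-level `rename_all = "camelCase"`, serde renames the variants (`MoveMouse` becomes `moveMouse`), but as far as I know it leaves the fields of struct variants untouched unless `rename_all_fields` is also set, so `delta_x` and `delta_y` would stay snake_case. That is worth confirming against a real server before relying on it.

```rust
/// Hypothetical client-side builder for a `MoveMouse` frame.
fn move_mouse_frame(x: i32, y: i32) -> String {
    format!(r#"{{"type":"moveMouse","x":{x},"y":{y}}}"#)
}

/// Hypothetical builder for a `Scroll` frame. The optional deltas are
/// omitted when `None`, which `#[serde(default)]` on the server accepts.
/// Field names are assumed to stay snake_case (see the note above).
fn scroll_frame(x: i32, y: i32, delta_x: Option<i32>, delta_y: Option<i32>) -> String {
    let mut out = format!(r#"{{"type":"scroll","x":{x},"y":{y}"#);
    if let Some(dx) = delta_x {
        out.push_str(&format!(r#","delta_x":{dx}"#));
    }
    if let Some(dy) = delta_y {
        out.push_str(&format!(r#","delta_y":{dy}"#));
    }
    out.push('}');
    out
}

/// Hypothetical builder for the unit-like `Close` frame.
fn close_frame() -> String {
    r#"{"type":"close"}"#.to_string()
}
```

In the session loop above, a frame that fails to deserialize produces an error message on the socket rather than closing it, so a client that guesses a field name wrong should see an "invalid desktop stream frame" error instead of a dropped connection.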
--- a/server/packages/sandbox-agent/src/router/support.rs
+++ b/server/packages/sandbox-agent/src/router/support.rs
@ -33,7 +33,8 @@ pub(super) async fn require_token(
        .and_then(|value| value.to_str().ok())
        .and_then(|value| value.strip_prefix("Bearer "));

-    let allow_query_token = request.uri().path().ends_with("/terminal/ws");
+    let allow_query_token = request.uri().path().ends_with("/terminal/ws")
+        || request.uri().path().ends_with("/stream/ws");
    let query_token = if allow_query_token {
        request
            .uri()
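
The `require_token` change above widens the query-token allowance by path suffix. The sketch below restates that check as standalone, standard-library-only functions so its behavior is easy to reason about. The helper names are hypothetical, the `access_token` parameter name is taken from the `/v1/desktop/stream/ws` OpenAPI annotation, and the "header first, query second" precedence is an assumption, since the hunk does not show how the two tokens are combined.

```rust
/// Hypothetical helper mirroring the suffix check in `require_token`.
fn allows_query_token(path: &str) -> bool {
    path.ends_with("/terminal/ws") || path.ends_with("/stream/ws")
}

/// Hypothetical helper: find `access_token` in a raw query string.
/// No percent-decoding is performed; real code would need it.
fn query_access_token(query: &str) -> Option<&str> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "access_token")
        .map(|(_, value)| value)
}

/// Assumed precedence: use the bearer header when present, and fall back
/// to the query token only on paths where it is allowed.
fn resolve_token<'a>(
    path: &str,
    bearer: Option<&'a str>,
    query: Option<&'a str>,
) -> Option<&'a str> {
    let from_query = if allows_query_token(path) {
        query.and_then(query_access_token)
    } else {
        None
    };
    bearer.or(from_query)
}
```

Because the check is a suffix match, any future route ending in `/stream/ws` would also accept query tokens. Matching on the full route may be preferable if that is not intended.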
--- a/server/packages/sandbox-agent/src/router/types.rs
+++ b/server/packages/sandbox-agent/src/router/types.rs
@ -425,6 +425,14 @@ pub enum ProcessState {
    Exited,
 }

+#[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum ProcessOwner {
+    User,
+    Desktop,
+    System,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema)]
 #[serde(rename_all = "camelCase")]
 pub struct ProcessInfo {
@ -435,6 +443,7 @@ pub struct ProcessInfo {
    pub cwd: Option<String>,
    pub tty: bool,
    pub interactive: bool,
+    pub owner: ProcessOwner,
    pub status: ProcessState,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub pid: Option<u32>,
@ -451,6 +460,13 @@ pub struct ProcessListResponse {
    pub processes: Vec<ProcessInfo>,
 }

+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, ToSchema, IntoParams)]
+#[serde(rename_all = "camelCase")]
+pub struct ProcessListQuery {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub owner: Option<ProcessOwner>,
+}
+
 #[derive(Debug, Clone, Copy, Serialize, Deserialize, JsonSchema, ToSchema, PartialEq, Eq)]
 #[serde(rename_all = "lowercase")]
 pub enum ProcessLogsStream {
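
The new `ProcessListQuery` lets `GET /v1/processes` filter by owner, with values serialized in lowercase (`user`, `desktop`, `system`). A minimal sketch of that filtering follows, using only the standard library. `Owner`, `Proc`, `parse_owner`, and this `list_processes` are hypothetical stand-ins, not the types in `process_runtime`, and the "no owner means no filtering" semantics is an assumption based on the parameter being `Option<ProcessOwner>`.

```rust
/// Hypothetical stand-in for `ProcessOwner`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Owner {
    User,
    Desktop,
    System,
}

/// Parse the lowercase wire value, as `rename_all = "lowercase"` implies.
fn parse_owner(raw: &str) -> Option<Owner> {
    match raw {
        "user" => Some(Owner::User),
        "desktop" => Some(Owner::Desktop),
        "system" => Some(Owner::System),
        _ => None,
    }
}

/// Hypothetical stand-in for a process snapshot.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Proc {
    id: String,
    owner: Owner,
}

/// Assumed semantics: `None` returns every process, `Some(owner)` keeps
/// only the processes with that owner.
fn list_processes(all: &[Proc], owner: Option<Owner>) -> Vec<Proc> {
    all.iter()
        .filter(|p| owner.map_or(true, |wanted| p.owner == wanted))
        .cloned()
        .collect()
}
```

Under these assumptions, a request such as `/v1/processes?owner=desktop` would return only the processes started by the desktop runtime, while processes created through `POST /v1/processes` are tagged `User` in the handler above.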
--- a/server/packages/sandbox-agent/tests/v1_api/desktop.rs
+++ b/server/packages/sandbox-agent/tests/v1_api/desktop.rs
@ -1,6 +1,28 @@
 use super::*;
+use futures::{SinkExt, StreamExt};
 use serial_test::serial;
 use std::collections::BTreeMap;
+use tokio_tungstenite::connect_async;
+use tokio_tungstenite::tungstenite::Message;
+
+fn png_dimensions(bytes: &[u8]) -> (u32, u32) {
+    assert!(bytes.starts_with(b"\x89PNG\r\n\x1a\n"));
+    let width = u32::from_be_bytes(bytes[16..20].try_into().expect("png width bytes"));
+    let height = u32::from_be_bytes(bytes[20..24].try_into().expect("png height bytes"));
+    (width, height)
+}
+
+async fn recv_ws_message(
+    ws: &mut tokio_tungstenite::WebSocketStream<
+        tokio_tungstenite::MaybeTlsStream<tokio::net::TcpStream>,
+    >,
+) -> Message {
+    tokio::time::timeout(Duration::from_secs(5), ws.next())
+        .await
+        .expect("timed out waiting for websocket frame")
+        .expect("websocket stream ended")
+        .expect("websocket frame")
+}

 #[tokio::test]
 #[serial]
@ -89,6 +111,43 @@ async fn v1_desktop_lifecycle_and_actions_work_with_real_runtime() {
        Some("image/png")
    );
    assert!(body.starts_with(b"\x89PNG\r\n\x1a\n"));
+    assert_eq!(png_dimensions(&body), (1440, 900));
+
+    let (status, headers, body) = send_request_raw(
+        &test_app.app,
+        Method::GET,
+        "/v1/desktop/screenshot?format=jpeg&quality=50",
+        None,
+        &[],
+        None,
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(
+        headers
+            .get(header::CONTENT_TYPE)
+            .and_then(|value| value.to_str().ok()),
+        Some("image/jpeg")
+    );
+    assert!(body.starts_with(&[0xff, 0xd8, 0xff]));
+
+    let (status, headers, body) = send_request_raw(
+        &test_app.app,
+        Method::GET,
+        "/v1/desktop/screenshot?scale=0.5",
+        None,
+        &[],
+        None,
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(
+        headers
+            .get(header::CONTENT_TYPE)
+            .and_then(|value| value.to_str().ok()),
+        Some("image/png")
+    );
+    assert_eq!(png_dimensions(&body), (720, 450));

    let (status, _, body) = send_request_raw(
        &test_app.app,
@ -165,6 +224,49 @@ async fn v1_desktop_lifecycle_and_actions_work_with_real_runtime() {
    assert_eq!(clicked["x"], 220);
    assert_eq!(clicked["y"], 230);

+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/mouse/down",
+        Some(json!({
+            "x": 220,
+            "y": 230,
+            "button": "left"
+        })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let mouse_down = parse_json(&body);
+    assert_eq!(mouse_down["x"], 220);
+    assert_eq!(mouse_down["y"], 230);
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/mouse/move",
+        Some(json!({ "x": 260, "y": 280 })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let moved_while_down = parse_json(&body);
+    assert_eq!(moved_while_down["x"], 260);
+    assert_eq!(moved_while_down["y"], 280);
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/mouse/up",
+        Some(json!({ "button": "left" })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let mouse_up = parse_json(&body);
+    assert_eq!(mouse_up["x"], 260);
+    assert_eq!(mouse_up["y"], 280);
+
    let (status, _, body) = send_request(
        &test_app.app,
        Method::POST,
@ -182,6 +284,11 @@ async fn v1_desktop_lifecycle_and_actions_work_with_real_runtime() {
    assert_eq!(scrolled["x"], 220);
    assert_eq!(scrolled["y"], 230);

+    let (status, _, body) =
+        send_request(&test_app.app, Method::GET, "/v1/desktop/windows", None, &[]).await;
+    assert_eq!(status, StatusCode::OK);
+    assert!(parse_json(&body)["windows"].is_array());
+
    let (status, _, body) = send_request(
        &test_app.app,
        Method::GET,
@ -219,6 +326,167 @@ async fn v1_desktop_lifecycle_and_actions_work_with_real_runtime() {
    assert_eq!(status, StatusCode::OK);
    assert_eq!(parse_json(&body)["ok"], true);

+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/keyboard/press",
+        Some(json!({
+            "key": "l",
+            "modifiers": {
+                "ctrl": true
+            }
+        })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["ok"], true);
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/keyboard/down",
+        Some(json!({ "key": "shift" })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["ok"], true);
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/keyboard/up",
+        Some(json!({ "key": "shift" })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["ok"], true);
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/recording/start",
+        Some(json!({ "fps": 8 })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let recording = parse_json(&body);
+    let recording_id = recording["id"].as_str().expect("recording id").to_string();
+    assert_eq!(recording["status"], "recording");
+
+    tokio::time::sleep(Duration::from_secs(2)).await;
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/recording/stop",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let stopped_recording = parse_json(&body);
+    assert_eq!(stopped_recording["id"], recording_id);
+    assert_eq!(stopped_recording["status"], "completed");
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::GET,
+        "/v1/desktop/recordings",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert!(parse_json(&body)["recordings"].is_array());
+
+    let (status, headers, body) = send_request_raw(
+        &test_app.app,
+        Method::GET,
+        &format!("/v1/desktop/recordings/{recording_id}/download"),
+        None,
+        &[],
+        None,
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(
+        headers
+            .get(header::CONTENT_TYPE)
+            .and_then(|value| value.to_str().ok()),
+        Some("video/mp4")
+    );
+    assert!(body.windows(4).any(|window| window == b"ftyp"));
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/stream/start",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["active"], true);
+
+    let (mut ws, _) = connect_async(test_app.app.ws_url("/v1/desktop/stream/ws"))
+        .await
+        .expect("connect desktop stream websocket");
+
+    let ready = recv_ws_message(&mut ws).await;
+    match ready {
+        Message::Text(text) => {
+            let value: Value = serde_json::from_str(&text).expect("desktop stream ready frame");
+            assert_eq!(value["type"], "ready");
+            assert_eq!(value["width"], 1440);
+            assert_eq!(value["height"], 900);
+        }
+        other => panic!("expected text ready frame, got {other:?}"),
+    }
+
+    let frame = recv_ws_message(&mut ws).await;
+    match frame {
+        Message::Binary(bytes) => assert!(bytes.starts_with(&[0xff, 0xd8, 0xff])),
+        other => panic!("expected binary jpeg frame, got {other:?}"),
+    }
+
+    ws.send(Message::Text(
+        json!({
+            "type": "moveMouse",
+            "x": 320,
+            "y": 330
+        })
+        .to_string()
+        .into(),
+    ))
+    .await
+    .expect("send desktop stream mouse move");
+    let _ = ws.close(None).await;
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/stream/stop",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["active"], false);
+
+    let (status, _, _) = send_request(
+        &test_app.app,
+        Method::DELETE,
+        &format!("/v1/desktop/recordings/{recording_id}"),
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::NO_CONTENT);
+
    let (status, _, body) =
        send_request(&test_app.app, Method::POST, "/v1/desktop/stop", None, &[]).await;
    assert_eq!(status, StatusCode::OK);
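
Note on the `desktop.rs` test helper above: `png_dimensions` reads the width and height from byte offsets 16..24. In a PNG file the 8-byte signature is followed by the IHDR chunk, whose 4-byte length and 4-byte type occupy offsets 8..16 and whose data begins with big-endian width and height. The helper in the diff panics on short input, which is acceptable in a test. A non-panicking variant could look like the sketch below; the name `png_dimensions_checked` is hypothetical and not part of this commit:

```rust
/// Hypothetical non-panicking variant of the test helper in the diff.
/// Returns None when the input is too short, lacks the PNG signature,
/// or does not start with an IHDR chunk.
fn png_dimensions_checked(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: &[u8; 8] = b"\x89PNG\r\n\x1a\n";
    // 8 (signature) + 4 (chunk length) + 4 (chunk type) + 4 (width) + 4 (height)
    if bytes.len() < 24 {
        return None;
    }
    if !bytes.starts_with(SIGNATURE) || &bytes[12..16] != b"IHDR" {
        return None;
    }
    // Width and height are stored as big-endian u32 values.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

The assertions in the test rely on the same layout: a full-size screenshot reports `(1440, 900)` and the `scale=0.5` request reports `(720, 450)`.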
--- a/server/packages/sandbox-agent/tests/v1_api/processes.rs
+++ b/server/packages/sandbox-agent/tests/v1_api/processes.rs
@ -2,6 +2,7 @@ use super::*;
 use base64::engine::general_purpose::STANDARD as BASE64;
 use base64::Engine;
 use futures::{SinkExt, StreamExt};
+use serial_test::serial;
 use tokio_tungstenite::connect_async;
 use tokio_tungstenite::tungstenite::Message;

@ -277,6 +278,92 @@ async fn v1_process_tty_input_and_logs() {
    assert_eq!(status, StatusCode::NO_CONTENT);
 }

+#[tokio::test]
+#[serial]
+async fn v1_processes_owner_filter_separates_user_and_desktop_processes() {
+    let test_app = TestApp::new(AuthConfig::disabled());
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/processes",
+        Some(json!({
+            "command": "sh",
+            "args": ["-lc", "sleep 30"],
+            "tty": false,
+            "interactive": false
+        })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let user_process_id = parse_json(&body)["id"]
+        .as_str()
+        .expect("process id")
+        .to_string();
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::POST,
+        "/v1/desktop/start",
+        Some(json!({
+            "width": 1024,
+            "height": 768
+        })),
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["state"], "active");
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::GET,
+        "/v1/processes?owner=user",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let user_processes = parse_json(&body)["processes"]
+        .as_array()
+        .cloned()
+        .unwrap_or_default();
+    assert!(user_processes.iter().any(|process| process["id"] == user_process_id));
+    assert!(user_processes.iter().all(|process| process["owner"] == "user"));
+
+    let (status, _, body) = send_request(
+        &test_app.app,
+        Method::GET,
+        "/v1/processes?owner=desktop",
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+    let desktop_processes = parse_json(&body)["processes"]
+        .as_array()
+        .cloned()
+        .unwrap_or_default();
+    assert!(desktop_processes.len() >= 2);
+    assert!(desktop_processes.iter().all(|process| process["owner"] == "desktop"));
+
+    let (status, _, _) = send_request(
+        &test_app.app,
+        Method::POST,
+        &format!("/v1/processes/{user_process_id}/kill"),
+        None,
+        &[],
+    )
+    .await;
+    assert_eq!(status, StatusCode::OK);
+
+    let (status, _, body) =
+        send_request(&test_app.app, Method::POST, "/v1/desktop/stop", None, &[]).await;
+    assert_eq!(status, StatusCode::OK);
+    assert_eq!(parse_json(&body)["state"], "inactive");
+}
+
 #[tokio::test]
 async fn v1_process_not_found_returns_404() {
    let test_app = TestApp::new(AuthConfig::disabled());