From 3dbd9ce52d09759b0ffa96fd60061fab5535cf89 Mon Sep 17 00:00:00 2001
From: Harivansh Rathi
Date: Thu, 26 Mar 2026 00:07:03 -0400
Subject: [PATCH 01/35] init with runtime contract
---
CONTRIBUTING.md | 2 +-
README.md | 14 +-
...{runtime-output.md => runtime-contract.md} | 0
skills/SKILL.md | 149 ------------------
skills/deskctl/SKILL.md | 132 ++++++++++++++++
skills/deskctl/references/commands.md | 75 +++++++++
skills/deskctl/references/install.md | 75 +++++++++
skills/deskctl/references/sandbox-agent.md | 61 +++++++
.../deskctl/templates/install-deskctl-npm.sh | 27 ++++
.../templates/sandbox-agent-desktop-loop.sh | 7 +
10 files changed, 390 insertions(+), 152 deletions(-)
rename docs/{runtime-output.md => runtime-contract.md} (100%)
delete mode 100644 skills/SKILL.md
create mode 100644 skills/deskctl/SKILL.md
create mode 100644 skills/deskctl/references/commands.md
create mode 100644 skills/deskctl/references/install.md
create mode 100644 skills/deskctl/references/sandbox-agent.md
create mode 100644 skills/deskctl/templates/install-deskctl-npm.sh
create mode 100644 skills/deskctl/templates/sandbox-agent-desktop-loop.sh
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index bdbce4e..926c58a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -21,7 +21,7 @@ pnpm --dir site install
- `src/` holds production code and unit tests
- `tests/` holds integration tests
- `tests/support/` holds shared X11 and daemon helpers for integration coverage
-- `docs/runtime-output.md` is the stable-vs-best-effort runtime output contract for agent-facing CLI work
+- `docs/runtime-contract.md` is the stable-vs-best-effort runtime output contract for agent-facing CLI work
Keep integration-only helpers out of `src/`.
diff --git a/README.md b/README.md
index 036396a..db7d92f 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,16 @@ npx deskctl-cli --help
`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset.
+### Installable skill
+
+For `skills.sh` / agent skill ecosystems:
+
+```bash
+npx skills add harivansh-afk/deskctl -s deskctl
+```
+
+The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo.
+
### Nix
```bash
@@ -133,7 +143,7 @@ deskctl doctor
- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows`
- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session
- `list-windows` is a cheap read-only operation and does not capture or write a screenshot
-- the stable runtime JSON/error contract is documented in [docs/runtime-output.md](docs/runtime-output.md)
+- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md)
## Read and Wait Surface
@@ -189,7 +199,7 @@ Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation
- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
-See [docs/runtime-output.md](docs/runtime-output.md) for the exact stable-vs-best-effort breakdown.
+See [docs/runtime-contract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown.
## Distribution
diff --git a/docs/runtime-output.md b/docs/runtime-contract.md
similarity index 100%
rename from docs/runtime-output.md
rename to docs/runtime-contract.md
diff --git a/skills/SKILL.md b/skills/SKILL.md
deleted file mode 100644
index efbd188..0000000
--- a/skills/SKILL.md
+++ /dev/null
@@ -1,149 +0,0 @@
----
-name: deskctl
-description: Desktop control CLI for AI agents
-allowed-tools: Bash(deskctl:*)
----
-
-# deskctl
-
-Desktop control CLI for AI agents on Linux X11. Provides a unified interface for screenshots, mouse/keyboard input, and window management with compact `@wN` window references.
-
-## Core Workflow
-
-1. **Snapshot** to see the desktop and get window refs
-2. **Query / wait** using grouped `get` and `wait` commands
-3. **Act** using refs, explicit selectors, or coordinates
-4. **Repeat** as needed
-
-## Quick Reference
-
-### See the Desktop
-
-```bash
-deskctl snapshot # Screenshot + window tree with @wN refs
-deskctl snapshot --annotate # Screenshot with bounding boxes and labels
-deskctl snapshot --json # Structured JSON output
-deskctl list-windows # Window tree without screenshot
-deskctl screenshot /tmp/s.png # Screenshot only (no window tree)
-deskctl get active-window # Currently focused window
-deskctl get monitors # Monitor geometry
-deskctl get version # deskctl version + backend
-deskctl get systeminfo # Runtime-scoped diagnostics
-deskctl wait window --selector 'title=Firefox' --timeout 10
-deskctl wait focus --selector 'class=firefox' --timeout 5
-```
-
-### Click and Type
-
-```bash
-deskctl click @w1 # Click center of window @w1
-deskctl click 500,300 # Click absolute coordinates
-deskctl dblclick @w2 # Double-click window @w2
-deskctl type "hello world" # Type text into focused window
-deskctl press enter # Press a key
-deskctl hotkey ctrl c # Send Ctrl+C
-deskctl hotkey ctrl shift t # Send Ctrl+Shift+T
-```
-
-### Mouse Control
-
-```bash
-deskctl mouse move 500 300 # Move cursor to coordinates
-deskctl mouse scroll 3 # Scroll down 3 units
-deskctl mouse scroll -3 # Scroll up 3 units
-deskctl mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
-```
-
-### Window Management
-
-```bash
-deskctl focus @w2 # Focus window by ref
-deskctl focus 'title=Firefox' # Focus by explicit title selector
-deskctl focus 'class=firefox' # Focus by explicit class selector
-deskctl focus "firefox" # Fuzzy substring match (fails on ambiguity)
-deskctl close @w3 # Close window gracefully
-deskctl move-window @w1 100 200 # Move window to position
-deskctl resize-window @w1 800 600 # Resize window
-```
-
-### Utilities
-
-```bash
-deskctl doctor # Diagnose X11, screenshot, and daemon health
-deskctl get-screen-size # Screen resolution
-deskctl get-mouse-position # Current cursor position
-deskctl launch firefox # Launch an application
-deskctl launch code -- --new-window # Launch with arguments
-```
-
-### Daemon
-
-```bash
-deskctl daemon start # Start daemon manually
-deskctl daemon stop # Stop daemon
-deskctl daemon status # Check daemon status
-```
-
-## Global Options
-
-- `--json` : Output as structured JSON (all commands)
-- `--session NAME` : Session name for multiple daemon instances (default: "default")
-- `--socket PATH` : Custom Unix socket path
-
-## Output Contract
-
-- Prefer `--json` when an agent needs strict parsing.
-- Use `window_id` for stable targeting inside a live daemon session.
-- Use `ref_id` / `@wN` for quick short-lived follow-up actions after `snapshot` or `list-windows`.
-- Structured JSON failures expose machine-usable `kind` values for selector and wait failures.
-- The exact text formatting is intentionally compact but not the parsing contract. See `docs/runtime-output.md` for the stable field policy.
-
-## Window Refs
-
-After `snapshot` or `list-windows`, windows are assigned short refs:
-- `@w1` is the topmost (usually focused) window
-- `@w2`, `@w3`, etc. follow z-order (front to back)
-- Refs reset on each `snapshot` call
-- Use `--json` to see stable `window_id` values for programmatic tracking within the current daemon session
-
-## Selector Contract
-
-Prefer explicit selectors when an agent needs deterministic targeting:
-
-```bash
-ref=w1
-id=win1
-title=Firefox
-class=firefox
-focused
-```
-
-Bare selectors such as `firefox` still work as fuzzy substring matches, but they now fail with candidate windows if multiple matches exist.
-
-## Example Agent Workflow
-
-```bash
-# 1. See what's on screen
-deskctl snapshot --annotate
-
-# 2. Wait for the browser and focus it deterministically
-deskctl wait window --selector 'class=firefox' --timeout 10
-deskctl focus 'class=firefox'
-
-# 3. Navigate to a URL
-deskctl hotkey ctrl l
-deskctl type "https://example.com"
-deskctl press enter
-
-# 4. Take a new snapshot to see the result
-deskctl snapshot
-```
-
-## Key Names for press/hotkey
-
-Modifiers: `ctrl`, `alt`, `shift`, `super`
-Navigation: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`
-Arrows: `up`, `down`, `left`, `right`
-Page: `home`, `end`, `pageup`, `pagedown`
-Function: `f1` through `f12`
-Characters: any single character (e.g. `a`, `1`, `/`)
diff --git a/skills/deskctl/SKILL.md b/skills/deskctl/SKILL.md
new file mode 100644
index 0000000..1522703
--- /dev/null
+++ b/skills/deskctl/SKILL.md
@@ -0,0 +1,132 @@
+---
+name: deskctl
+description: Desktop control CLI for AI agents on Linux X11. Use when operating an X11 desktop in a sandbox, VM, or sandbox-agent session via screenshots, grouped get/wait commands, selectors, and mouse or keyboard input. Prefer this skill when the task is "control the desktop", "inspect windows", "wait for a window", "click/type in the sandbox desktop", or "use deskctl inside sandbox-agent".
+allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*), Bash(sandbox-agent:*)
+---
+
+# deskctl
+
+`deskctl` is a non-interactive desktop control CLI for Linux X11 agents. It works well inside sandbox-agent desktop environments because it gives agents a tight `observe -> wait -> act -> verify` loop.
+
+## Install skill (optional)
+
+### npx
+
+```bash
+npx skills add harivansh-afk/deskctl -s deskctl
+```
+
+### bunx
+
+```bash
+bunx skills add harivansh-afk/deskctl -s deskctl
+```
+
+## Install the CLI
+
+Preferred install path:
+
+```bash
+npm install -g deskctl-cli
+deskctl --help
+```
+
+If global npm installs are not writable, use a user prefix:
+
+```bash
+mkdir -p "$HOME/.local/bin"
+npm install -g --prefix "$HOME/.local" deskctl-cli
+export PATH="$HOME/.local/bin:$PATH"
+deskctl --help
+```
+
+One-shot usage also works:
+
+```bash
+npx deskctl-cli --help
+```
+
+For install details and fallback paths, see [references/install.md](references/install.md).
+
+## Sandbox-Agent Notes
+
+Before using `deskctl` inside sandbox-agent:
+
+1. Make sure the sandbox has desktop runtime packages installed.
+2. Make sure the session is actually running X11.
+3. Run `deskctl doctor` before trying to click or type.
+
+Typical sandbox-agent prep:
+
+```bash
+sandbox-agent install desktop --yes
+deskctl doctor
+```
+
+If `doctor` fails, inspect `DISPLAY`, `XDG_SESSION_TYPE`, and whether the sandbox actually has a desktop session. See [references/sandbox-agent.md](references/sandbox-agent.md).
+
+## Core Workflow
+
+Every desktop task should follow this loop:
+
+1. **Observe**
+2. **Target**
+3. **Wait**
+4. **Act**
+5. **Verify**
+
+```bash
+deskctl doctor
+deskctl snapshot --annotate
+deskctl get active-window
+deskctl wait window --selector 'class=firefox' --timeout 10
+deskctl focus 'class=firefox'
+deskctl hotkey ctrl l
+deskctl type "https://example.com"
+deskctl press enter
+deskctl snapshot
+```
+
+## What To Reach For First
+
+- `deskctl doctor`
+- `deskctl snapshot --annotate`
+- `deskctl list-windows`
+- `deskctl get active-window`
+- `deskctl wait window --selector ...`
+- `deskctl wait focus --selector ...`
+
+Use `--json` when you need strict parsing. Use explicit selectors when you need deterministic targeting.
+
+## Selector Rules
+
+Prefer explicit selectors:
+
+```bash
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
+```
+
+Legacy refs still work:
+
+```bash
+@w1
+w1
+win1
+```
+
+Bare strings such as `firefox` are fuzzy substring selectors. They fail on ambiguity instead of silently picking the wrong window.
+
+## References
+
+- [references/install.md](references/install.md) - install paths, npm-first bootstrap, runtime prerequisites
+- [references/commands.md](references/commands.md) - grouped reads, waits, selectors, and core action commands
+- [references/sandbox-agent.md](references/sandbox-agent.md) - using `deskctl` inside sandbox-agent desktop sessions
+
+## Templates
+
+- [templates/install-deskctl-npm.sh](templates/install-deskctl-npm.sh) - install `deskctl-cli` into a user prefix
+- [templates/sandbox-agent-desktop-loop.sh](templates/sandbox-agent-desktop-loop.sh) - minimal observe/wait/act loop for desktop tasks
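In the Core Workflow loop, a failed `wait` should stop the run before `type` sends keystrokes to whatever window happens to be focused. Below is a minimal bash sketch of that gating. `run_step` is a hypothetical helper, not a `deskctl` command; it only labels each step and passes its exit status through.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deskctl): run one loop step,
# log it to stderr, and return the step's own exit status.
run_step() {
  local label="$1"
  shift
  echo "[$label] $*" >&2
  local status=0
  "$@" || status=$?   # capture the status without tripping `set -e`
  if (( status != 0 )); then
    echo "[$label] failed with exit $status" >&2
  fi
  return "$status"
}

# Usage: chain steps with && so the first failure ends the loop.
# run_step observe deskctl snapshot --annotate &&
#   run_step wait deskctl wait window --selector 'class=firefox' --timeout 10 &&
#   run_step act deskctl focus 'class=firefox' &&
#   run_step verify deskctl snapshot
```

Chaining with `&&` keeps each step's exit code visible to the caller.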
diff --git a/skills/deskctl/references/commands.md b/skills/deskctl/references/commands.md
new file mode 100644
index 0000000..2d2dc1f
--- /dev/null
+++ b/skills/deskctl/references/commands.md
@@ -0,0 +1,75 @@
+# deskctl command guide
+
+## Observe
+
+```bash
+deskctl doctor
+deskctl snapshot
+deskctl snapshot --annotate
+deskctl list-windows
+deskctl screenshot /tmp/current.png
+deskctl get active-window
+deskctl get monitors
+deskctl get version
+deskctl get systeminfo
+```
+
+Use `snapshot --annotate` when you need both the screenshot artifact and the short `@wN` labels. Use `list-windows` when you only need the window tree and do not want screenshot side effects.
+
+## Wait
+
+```bash
+deskctl wait window --selector 'title=Firefox' --timeout 10
+deskctl wait focus --selector 'class=firefox' --timeout 5
+```
+
+Wait commands return the matched window payload on success. In `--json` mode, failures include structured `kind` values so the caller can recover without string parsing.
+
+## Selectors
+
+Prefer explicit selectors:
+
+```bash
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
+```
+
+Legacy refs still work:
+
+```bash
+@w1
+w1
+win1
+```
+
+Bare fuzzy selectors such as `firefox` are supported, but they fail on ambiguity.
+
+## Act
+
+```bash
+deskctl focus 'class=firefox'
+deskctl click @w1
+deskctl dblclick @w2
+deskctl type "hello world"
+deskctl press enter
+deskctl hotkey ctrl shift t
+deskctl mouse move 500 300
+deskctl mouse scroll 3
+deskctl mouse drag 100 100 500 500
+deskctl move-window @w1 100 120
+deskctl resize-window @w1 1280 720
+deskctl close @w3
+deskctl launch firefox
+```
+
+## Agent loop
+
+The safe pattern is:
+
+1. Observe with `snapshot`, `list-windows`, or `get ...`
+2. Wait for the target window if needed
+3. Act using explicit selectors or refs
+4. Snapshot again to verify the result
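The selector forms listed in commands.md fall into three groups: explicit (`key=value` or `focused`), legacy refs (`@w1`, `w1`, `win1`), and bare fuzzy strings. An agent wrapper could warn before it sends a fuzzy string. Below is a sketch of such a classifier. `classify_selector` is hypothetical and mirrors only the forms documented here. It is not `deskctl`'s own parser.

```bash
#!/usr/bin/env bash
# Hypothetical classifier mirroring the selector forms listed in commands.md.
# It does not reproduce deskctl's internal parsing.
classify_selector() {
  local sel="$1"
  local explicit_re='^(ref|id|title|class)='
  local legacy_re='^(@w|w|win)[0-9]+$'
  if [[ "$sel" == focused || "$sel" =~ $explicit_re ]]; then
    echo explicit
  elif [[ "$sel" =~ $legacy_re ]]; then
    echo legacy
  else
    echo fuzzy   # bare substring match; deskctl fails if it is ambiguous
  fi
}
```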
diff --git a/skills/deskctl/references/install.md b/skills/deskctl/references/install.md
new file mode 100644
index 0000000..cb97a5c
--- /dev/null
+++ b/skills/deskctl/references/install.md
@@ -0,0 +1,75 @@
+# Install `deskctl`
+
+`deskctl` is designed to be used non-interactively by agents. The easiest install path is the npm package because it installs the `deskctl` command directly from GitHub Release assets without needing Cargo on the target machine.
+
+## Preferred: npm global install
+
+```bash
+npm install -g deskctl-cli
+deskctl --help
+```
+
+This is the preferred path for sandboxes, VMs, and sandbox-agent sessions where Node/npm already exists.
+
+## User-prefix npm install
+
+If global npm installs are not writable:
+
+```bash
+mkdir -p "$HOME/.local/bin"
+npm install -g --prefix "$HOME/.local" deskctl-cli
+export PATH="$HOME/.local/bin:$PATH"
+deskctl --help
+```
+
+This avoids `sudo` and keeps the install inside the user home directory.
+
+## One-shot npm execution
+
+```bash
+npx deskctl-cli --help
+```
+
+Use this for quick testing. For repeated desktop control, install the command once so the runtime is predictable.
+
+## Fallback: Cargo
+
+```bash
+cargo install deskctl
+```
+
+Use this only when the machine already has a Rust toolchain or when you explicitly want a source build.
+
+## Fallback: local Docker build
+
+If you need a Linux binary from macOS or another non-Linux host:
+
+```bash
+docker compose -f docker/docker-compose.yml run --rm build
+```
+
+Then copy `dist/deskctl-linux-x86_64` into the target machine.
+
+## Runtime prerequisites
+
+`deskctl` needs:
+
+- Linux
+- X11
+- a valid `DISPLAY`
+- a working desktop/window-manager session
+
+Quick verification:
+
+```bash
+printenv DISPLAY
+printenv XDG_SESSION_TYPE
+deskctl doctor
+```
+
+Inside sandbox-agent, you may need to install desktop dependencies first:
+
+```bash
+sandbox-agent install desktop --yes
+deskctl doctor
+```
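install.md orders the install paths as npm first and Cargo as a fallback, with a Docker cross-build when the host is not Linux. The on-target part of that decision can be sketched as below. `choose_install_method` is a hypothetical helper that only reports which path applies and installs nothing. The Docker route is left out because it runs on a different machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which install path from install.md applies
# on this machine. It only inspects PATH; it installs nothing.
choose_install_method() {
  if command -v deskctl >/dev/null 2>&1; then
    echo installed          # nothing to do
  elif command -v npm >/dev/null 2>&1; then
    echo npm                # preferred: npm install -g deskctl-cli
  elif command -v cargo >/dev/null 2>&1; then
    echo cargo              # fallback: cargo install deskctl
  else
    echo none               # build elsewhere and copy the binary in
    return 1
  fi
}
```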
diff --git a/skills/deskctl/references/sandbox-agent.md b/skills/deskctl/references/sandbox-agent.md
new file mode 100644
index 0000000..d994062
--- /dev/null
+++ b/skills/deskctl/references/sandbox-agent.md
@@ -0,0 +1,61 @@
+# deskctl inside sandbox-agent
+
+Use `deskctl` when the sandbox-agent session includes a Linux desktop and you want a tight local desktop-control loop from the shell.
+
+## When it fits
+
+`deskctl` is a good fit when:
+
+- the sandbox already has an X11 desktop session
+- you want fast local desktop control from inside the sandbox
+- you want short-lived refs like `@w1` and grouped `get` or `wait` primitives
+
+It is not a replacement for sandbox-agent session orchestration itself. Use sandbox-agent to provision the sandbox and desktop runtime, then use `deskctl` inside that environment to control the GUI.
+
+## Minimal bootstrap
+
+```bash
+sandbox-agent install desktop --yes
+npm install -g deskctl-cli
+deskctl doctor
+deskctl snapshot --annotate
+```
+
+If npm global installs are not writable:
+
+```bash
+mkdir -p "$HOME/.local/bin"
+npm install -g --prefix "$HOME/.local" deskctl-cli
+export PATH="$HOME/.local/bin:$PATH"
+deskctl doctor
+```
+
+## Expected environment
+
+Check:
+
+```bash
+printenv DISPLAY
+printenv XDG_SESSION_TYPE
+deskctl --json get systeminfo
+```
+
+Healthy `deskctl` usage usually means:
+
+- `DISPLAY` is set
+- `XDG_SESSION_TYPE=x11`
+- `deskctl doctor` succeeds
+
+## Recommended workflow
+
+```bash
+deskctl snapshot --annotate
+deskctl wait window --selector 'class=firefox' --timeout 10
+deskctl focus 'class=firefox'
+deskctl hotkey ctrl l
+deskctl type "https://example.com"
+deskctl press enter
+deskctl snapshot
+```
+
+Prefer `--json` for strict machine parsing and explicit selectors for deterministic targeting.
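The "Healthy `deskctl` usage" checklist in sandbox-agent.md can be turned into a preflight check. Running it before `deskctl doctor` gives an agent a specific message about the environment first. Below is a minimal sketch. `x11_preflight` is hypothetical. Treating an unset `XDG_SESSION_TYPE` as a warning rather than a failure is an assumption, because bare Xvfb setups may leave it unset.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for the sandbox-agent checklist; run before `deskctl doctor`.
x11_preflight() {
  if [[ -z "${DISPLAY:-}" ]]; then
    echo "DISPLAY is not set" >&2
    return 1
  fi
  case "${XDG_SESSION_TYPE:-}" in
    x11) ;;
    "")
      # Assumption: minimal Xvfb sessions may not set this, so only warn.
      echo "warning: XDG_SESSION_TYPE is unset; continuing with DISPLAY=$DISPLAY" >&2
      ;;
    *)
      echo "XDG_SESSION_TYPE=$XDG_SESSION_TYPE, expected x11" >&2
      return 1
      ;;
  esac
}

# Usage: x11_preflight && deskctl doctor
```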
diff --git a/skills/deskctl/templates/install-deskctl-npm.sh b/skills/deskctl/templates/install-deskctl-npm.sh
new file mode 100644
index 0000000..a0ab596
--- /dev/null
+++ b/skills/deskctl/templates/install-deskctl-npm.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if command -v deskctl >/dev/null 2>&1; then
+ echo "deskctl already installed: $(command -v deskctl)"
+ exit 0
+fi
+
+if ! command -v npm >/dev/null 2>&1; then
+ echo "npm is required for the preferred deskctl install path"
+ exit 1
+fi
+
+prefix="${DESKCTL_NPM_PREFIX:-$HOME/.local}"
+bin_dir="$prefix/bin"
+
+mkdir -p "$bin_dir"
+npm install -g --prefix "$prefix" deskctl-cli
+
+if ! command -v deskctl >/dev/null 2>&1; then
+ echo "deskctl installed to $bin_dir"
+ echo "add this to PATH if needed:"
+ echo "export PATH=\"$bin_dir:\$PATH\""
+fi
+
+"$bin_dir/deskctl" --help >/dev/null 2>&1 || true
+echo "deskctl bootstrap complete"
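The template infers whether `$bin_dir` is on `PATH` indirectly, by re-running `command -v deskctl` after the install. Checking the directory itself states that intent more directly. Below is a sketch of such a check. `path_has_dir` is a hypothetical helper, not something the template currently defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR is an exact entry in $PATH
# (a trailing slash on either side is ignored).
path_has_dir() {
  local dir="${1%/}"
  local -a entries
  local entry
  IFS=':' read -r -a entries <<< "$PATH"
  for entry in "${entries[@]}"; do
    if [[ "${entry%/}" == "$dir" ]]; then
      return 0
    fi
  done
  return 1
}

# Possible use in the template, in place of the post-install `command -v` check:
# path_has_dir "$bin_dir" || echo "export PATH=\"$bin_dir:\$PATH\""
```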
diff --git a/skills/deskctl/templates/sandbox-agent-desktop-loop.sh b/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
new file mode 100644
index 0000000..f47dbb8
--- /dev/null
+++ b/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+deskctl doctor
+deskctl snapshot --annotate
+deskctl get active-window
+deskctl wait window --selector "${1:-focused}" --timeout "${2:-5}"
From c37589ccf403106ebba3414ceeb9263c19c96e4f Mon Sep 17 00:00:00 2001
From: Harivansh Rathi
Date: Thu, 26 Mar 2026 00:30:05 -0400
Subject: [PATCH 02/35] skill validated with workflows
---
skills/deskctl/SKILL.md | 128 ++++--------------
skills/deskctl/references/commands.md | 64 ++++-----
skills/deskctl/references/install.md | 75 ----------
skills/deskctl/references/runtime-contract.md | 1 +
skills/deskctl/references/sandbox-agent.md | 61 ---------
.../deskctl/templates/install-deskctl-npm.sh | 27 ----
.../templates/sandbox-agent-desktop-loop.sh | 7 -
skills/deskctl/workflows/observe-act.sh | 37 +++++
skills/deskctl/workflows/poll-condition.sh | 42 ++++++
9 files changed, 134 insertions(+), 308 deletions(-)
delete mode 100644 skills/deskctl/references/install.md
create mode 120000 skills/deskctl/references/runtime-contract.md
delete mode 100644 skills/deskctl/references/sandbox-agent.md
delete mode 100644 skills/deskctl/templates/install-deskctl-npm.sh
delete mode 100644 skills/deskctl/templates/sandbox-agent-desktop-loop.sh
create mode 100755 skills/deskctl/workflows/observe-act.sh
create mode 100755 skills/deskctl/workflows/poll-condition.sh
diff --git a/skills/deskctl/SKILL.md b/skills/deskctl/SKILL.md
index 1522703..81dea19 100644
--- a/skills/deskctl/SKILL.md
+++ b/skills/deskctl/SKILL.md
@@ -1,132 +1,54 @@
---
name: deskctl
-description: Desktop control CLI for AI agents on Linux X11. Use when operating an X11 desktop in a sandbox, VM, or sandbox-agent session via screenshots, grouped get/wait commands, selectors, and mouse or keyboard input. Prefer this skill when the task is "control the desktop", "inspect windows", "wait for a window", "click/type in the sandbox desktop", or "use deskctl inside sandbox-agent".
-allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*), Bash(sandbox-agent:*)
+description: Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
+allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*)
---
# deskctl
-`deskctl` is a non-interactive desktop control CLI for Linux X11 agents. It works well inside sandbox-agent desktop environments because it gives agents a tight `observe -> wait -> act -> verify` loop.
+Non-interactive desktop control CLI for Linux X11 agents.
-## Install skill (optional)
+All output follows the runtime contract defined in [references/runtime-contract.md](references/runtime-contract.md). Every command returns a stable JSON envelope when called with `--json`. Use `--json` whenever you need to parse output programmatically.
-### npx
-
-```bash
-npx skills add harivansh-afk/deskctl -s deskctl
-```
-
-### bunx
-
-```bash
-bunx skills add harivansh-afk/deskctl -s deskctl
-```
-
-## Install the CLI
-
-Preferred install path:
+## Quick start
```bash
npm install -g deskctl-cli
-deskctl --help
-```
-
-If global npm installs are not writable, use a user prefix:
-
-```bash
-mkdir -p "$HOME/.local/bin"
-npm install -g --prefix "$HOME/.local" deskctl-cli
-export PATH="$HOME/.local/bin:$PATH"
-deskctl --help
-```
-
-One-shot usage also works:
-
-```bash
-npx deskctl-cli --help
-```
-
-For install details and fallback paths, see [references/install.md](references/install.md).
-
-## Sandbox-Agent Notes
-
-Before using `deskctl` inside sandbox-agent:
-
-1. Make sure the sandbox has desktop runtime packages installed.
-2. Make sure the session is actually running X11.
-3. Run `deskctl doctor` before trying to click or type.
-
-Typical sandbox-agent prep:
-
-```bash
-sandbox-agent install desktop --yes
-deskctl doctor
-```
-
-If `doctor` fails, inspect `DISPLAY`, `XDG_SESSION_TYPE`, and whether the sandbox actually has a desktop session. See [references/sandbox-agent.md](references/sandbox-agent.md).
-
-## Core Workflow
-
-Every desktop task should follow this loop:
-
-1. **Observe**
-2. **Target**
-3. **Wait**
-4. **Act**
-5. **Verify**
-
-```bash
deskctl doctor
deskctl snapshot --annotate
-deskctl get active-window
-deskctl wait window --selector 'class=firefox' --timeout 10
-deskctl focus 'class=firefox'
-deskctl hotkey ctrl l
-deskctl type "https://example.com"
-deskctl press enter
-deskctl snapshot
```
-## What To Reach For First
+## Agent loop
-- `deskctl doctor`
-- `deskctl snapshot --annotate`
-- `deskctl list-windows`
-- `deskctl get active-window`
-- `deskctl wait window --selector ...`
-- `deskctl wait focus --selector ...`
-
-Use `--json` when you need strict parsing. Use explicit selectors when you need deterministic targeting.
-
-## Selector Rules
-
-Prefer explicit selectors:
+Every desktop interaction follows: **observe -> wait -> act -> verify**.
```bash
-ref=w1
-id=win1
-title=Firefox
-class=firefox
-focused
+deskctl snapshot --annotate # observe
+deskctl wait window --selector 'title=Firefox' --timeout 10 # wait
+deskctl click 'title=Firefox' # act
+deskctl snapshot # verify
```
-Legacy refs still work:
+See [workflows/observe-act.sh](workflows/observe-act.sh) for a reusable script. See [workflows/poll-condition.sh](workflows/poll-condition.sh) for polling loops.
+
+## Selectors
```bash
-@w1
-w1
-win1
+ref=w1 # snapshot ref (short-lived)
+id=win1 # stable window ID (session-scoped)
+title=Firefox # match by title
+class=firefox # match by WM class
+focused # currently focused window
```
-Bare strings such as `firefox` are fuzzy substring selectors. They fail on ambiguity instead of silently picking the wrong window.
+Bare strings like `firefox` do fuzzy matching but fail on ambiguity. Prefer explicit selectors.
## References
-- [references/install.md](references/install.md) - install paths, npm-first bootstrap, runtime prerequisites
-- [references/commands.md](references/commands.md) - grouped reads, waits, selectors, and core action commands
-- [references/sandbox-agent.md](references/sandbox-agent.md) - using `deskctl` inside sandbox-agent desktop sessions
+- [references/runtime-contract.md](references/runtime-contract.md) - output contract, stable fields, error kinds
+- [references/commands.md](references/commands.md) - all available commands
-## Templates
+## Workflows
-- [templates/install-deskctl-npm.sh](templates/install-deskctl-npm.sh) - install `deskctl-cli` into a user prefix
-- [templates/sandbox-agent-desktop-loop.sh](templates/sandbox-agent-desktop-loop.sh) - minimal observe/wait/act loop for desktop tasks
+- [workflows/observe-act.sh](workflows/observe-act.sh) - main observe-act loop
+- [workflows/poll-condition.sh](workflows/poll-condition.sh) - poll for a condition on screen
diff --git a/skills/deskctl/references/commands.md b/skills/deskctl/references/commands.md
index 2d2dc1f..d0e7c9f 100644
--- a/skills/deskctl/references/commands.md
+++ b/skills/deskctl/references/commands.md
@@ -1,21 +1,23 @@
-# deskctl command guide
+# deskctl commands
+
+All commands support `--json` for machine-parseable output following the runtime contract.
## Observe
```bash
-deskctl doctor
-deskctl snapshot
-deskctl snapshot --annotate
-deskctl list-windows
-deskctl screenshot /tmp/current.png
-deskctl get active-window
-deskctl get monitors
-deskctl get version
-deskctl get systeminfo
+deskctl doctor # check X11 runtime and daemon health
+deskctl snapshot # screenshot + window list
+deskctl snapshot --annotate # screenshot with @wN labels overlaid
+deskctl list-windows # window list only (no screenshot)
+deskctl screenshot /tmp/screen.png # screenshot to explicit path
+deskctl get active-window # focused window info
+deskctl get monitors # monitor geometry
+deskctl get version # version and backend
+deskctl get systeminfo # full runtime diagnostics
+deskctl get-screen-size # screen resolution
+deskctl get-mouse-position # cursor coordinates
```
-Use `snapshot --annotate` when you need both the screenshot artifact and the short `@wN` labels. Use `list-windows` when you only need the window tree and do not want screenshot side effects.
-
## Wait
```bash
@@ -23,29 +25,19 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5
```
-Wait commands return the matched window payload on success. In `--json` mode, failures include structured `kind` values so the caller can recover without string parsing.
+Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
## Selectors
-Prefer explicit selectors:
-
```bash
-ref=w1
-id=win1
-title=Firefox
-class=firefox
-focused
+ref=w1 # snapshot ref (short-lived, from last snapshot)
+id=win1 # stable window ID (session-scoped)
+title=Firefox # match by window title
+class=firefox # match by WM class
+focused # currently focused window
```
-Legacy refs still work:
-
-```bash
-@w1
-w1
-win1
-```
-
-Bare fuzzy selectors such as `firefox` are supported, but they fail on ambiguity.
+Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
## Act
@@ -58,6 +50,7 @@ deskctl press enter
deskctl hotkey ctrl shift t
deskctl mouse move 500 300
deskctl mouse scroll 3
+deskctl mouse scroll 3 --axis horizontal
deskctl mouse drag 100 100 500 500
deskctl move-window @w1 100 120
deskctl resize-window @w1 1280 720
@@ -65,11 +58,12 @@ deskctl close @w3
deskctl launch firefox
```
-## Agent loop
+## Daemon
-The safe pattern is:
+```bash
+deskctl daemon start
+deskctl daemon stop
+deskctl daemon status
+```
-1. Observe with `snapshot`, `list-windows`, or `get ...`
-2. Wait for the target window if needed
-3. Act using explicit selectors or refs
-4. Snapshot again to verify the result
+The daemon starts automatically on first command. Manual control is rarely needed.
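`wait` covers window appearance and focus. SKILL.md points to `workflows/poll-condition.sh` for other polling loops, but that script's contents are not part of this excerpt. A generic deadline-based loop looks roughly like the sketch below. `poll_until` and its argument order are hypothetical, and the condition can be any command that exits 0 once it is satisfied.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rerun a condition command until it exits 0
# or TIMEOUT seconds pass. Usage: poll_until TIMEOUT INTERVAL cmd [args...]
poll_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Example (text output is best-effort per the runtime contract; parse --json for real checks):
# poll_until 20 1 sh -c 'deskctl list-windows | grep -q "Save As"'
```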
diff --git a/skills/deskctl/references/install.md b/skills/deskctl/references/install.md
deleted file mode 100644
index cb97a5c..0000000
--- a/skills/deskctl/references/install.md
+++ /dev/null
@@ -1,75 +0,0 @@
-# Install `deskctl`
-
-`deskctl` is designed to be used non-interactively by agents. The easiest install path is the npm package because it installs the `deskctl` command directly from GitHub Release assets without needing Cargo on the target machine.
-
-## Preferred: npm global install
-
-```bash
-npm install -g deskctl-cli
-deskctl --help
-```
-
-This is the preferred path for sandboxes, VMs, and sandbox-agent sessions where Node/npm already exists.
-
-## User-prefix npm install
-
-If global npm installs are not writable:
-
-```bash
-mkdir -p "$HOME/.local/bin"
-npm install -g --prefix "$HOME/.local" deskctl-cli
-export PATH="$HOME/.local/bin:$PATH"
-deskctl --help
-```
-
-This avoids `sudo` and keeps the install inside the user home directory.
-
-## One-shot npm execution
-
-```bash
-npx deskctl-cli --help
-```
-
-Use this for quick testing. For repeated desktop control, install the command once so the runtime is predictable.
-
-## Fallback: Cargo
-
-```bash
-cargo install deskctl
-```
-
-Use this only when the machine already has a Rust toolchain or when you explicitly want a source build.
-
-## Fallback: local Docker build
-
-If you need a Linux binary from macOS or another non-Linux host:
-
-```bash
-docker compose -f docker/docker-compose.yml run --rm build
-```
-
-Then copy `dist/deskctl-linux-x86_64` into the target machine.
-
-## Runtime prerequisites
-
-`deskctl` needs:
-
-- Linux
-- X11
-- a valid `DISPLAY`
-- a working desktop/window-manager session
-
-Quick verification:
-
-```bash
-printenv DISPLAY
-printenv XDG_SESSION_TYPE
-deskctl doctor
-```
-
-Inside sandbox-agent, you may need to install desktop dependencies first:
-
-```bash
-sandbox-agent install desktop --yes
-deskctl doctor
-```
diff --git a/skills/deskctl/references/runtime-contract.md b/skills/deskctl/references/runtime-contract.md
new file mode 120000
index 0000000..8de0781
--- /dev/null
+++ b/skills/deskctl/references/runtime-contract.md
@@ -0,0 +1 @@
+../../../docs/runtime-contract.md
\ No newline at end of file
diff --git a/skills/deskctl/references/sandbox-agent.md b/skills/deskctl/references/sandbox-agent.md
deleted file mode 100644
index d994062..0000000
--- a/skills/deskctl/references/sandbox-agent.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# deskctl inside sandbox-agent
-
-Use `deskctl` when the sandbox-agent session includes a Linux desktop and you want a tight local desktop-control loop from the shell.
-
-## When it fits
-
-`deskctl` is a good fit when:
-
-- the sandbox already has an X11 desktop session
-- you want fast local desktop control from inside the sandbox
-- you want short-lived refs like `@w1` and grouped `get` or `wait` primitives
-
-It is not a replacement for sandbox-agent session orchestration itself. Use sandbox-agent to provision the sandbox and desktop runtime, then use `deskctl` inside that environment to control the GUI.
-
-## Minimal bootstrap
-
-```bash
-sandbox-agent install desktop --yes
-npm install -g deskctl-cli
-deskctl doctor
-deskctl snapshot --annotate
-```
-
-If npm global installs are not writable:
-
-```bash
-mkdir -p "$HOME/.local/bin"
-npm install -g --prefix "$HOME/.local" deskctl-cli
-export PATH="$HOME/.local/bin:$PATH"
-deskctl doctor
-```
-
-## Expected environment
-
-Check:
-
-```bash
-printenv DISPLAY
-printenv XDG_SESSION_TYPE
-deskctl --json get systeminfo
-```
-
-Healthy `deskctl` usage usually means:
-
-- `DISPLAY` is set
-- `XDG_SESSION_TYPE=x11`
-- `deskctl doctor` succeeds
-
-## Recommended workflow
-
-```bash
-deskctl snapshot --annotate
-deskctl wait window --selector 'class=firefox' --timeout 10
-deskctl focus 'class=firefox'
-deskctl hotkey ctrl l
-deskctl type "https://example.com"
-deskctl press enter
-deskctl snapshot
-```
-
-Prefer `--json` for strict machine parsing and explicit selectors for deterministic targeting.
diff --git a/skills/deskctl/templates/install-deskctl-npm.sh b/skills/deskctl/templates/install-deskctl-npm.sh
deleted file mode 100644
index a0ab596..0000000
--- a/skills/deskctl/templates/install-deskctl-npm.sh
+++ /dev/null
@@ -1,27 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-if command -v deskctl >/dev/null 2>&1; then
- echo "deskctl already installed: $(command -v deskctl)"
- exit 0
-fi
-
-if ! command -v npm >/dev/null 2>&1; then
- echo "npm is required for the preferred deskctl install path"
- exit 1
-fi
-
-prefix="${DESKCTL_NPM_PREFIX:-$HOME/.local}"
-bin_dir="$prefix/bin"
-
-mkdir -p "$bin_dir"
-npm install -g --prefix "$prefix" deskctl-cli
-
-if ! command -v deskctl >/dev/null 2>&1; then
- echo "deskctl installed to $bin_dir"
- echo "add this to PATH if needed:"
- echo "export PATH=\"$bin_dir:\$PATH\""
-fi
-
-"$bin_dir/deskctl" --help >/dev/null 2>&1 || true
-echo "deskctl bootstrap complete"
diff --git a/skills/deskctl/templates/sandbox-agent-desktop-loop.sh b/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
deleted file mode 100644
index f47dbb8..0000000
--- a/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-deskctl doctor
-deskctl snapshot --annotate
-deskctl get active-window
-deskctl wait window --selector "${1:-focused}" --timeout "${2:-5}"
diff --git a/skills/deskctl/workflows/observe-act.sh b/skills/deskctl/workflows/observe-act.sh
new file mode 100755
index 0000000..0e336ae
--- /dev/null
+++ b/skills/deskctl/workflows/observe-act.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# observe-act.sh - main desktop interaction loop
+# usage: ./observe-act.sh <selector> [action] [action-args...]
+# example: ./observe-act.sh 'title=Firefox' click
+# example: ./observe-act.sh 'class=terminal' type "ls -la"
+set -euo pipefail
+
+SELECTOR="${1:?usage: observe-act.sh <selector> [action] [action-args...]}"
+ACTION="${2:-click}"
+shift 2 2>/dev/null || true
+
+# 1. observe - snapshot the desktop, get current state
+echo "--- observe ---"
+deskctl snapshot --annotate --json | head -1
+deskctl get active-window
+
+# 2. wait - ensure target exists
+echo "--- wait ---"
+deskctl wait window --selector "$SELECTOR" --timeout 10
+
+# 3. act - perform the action on the target
+echo "--- act ---"
+case "$ACTION" in
+ click) deskctl click "$SELECTOR" ;;
+ dblclick) deskctl dblclick "$SELECTOR" ;;
+ focus) deskctl focus "$SELECTOR" ;;
+ type) deskctl focus "$SELECTOR" && deskctl type "$@" ;;
+ press) deskctl focus "$SELECTOR" && deskctl press "$@" ;;
+ hotkey) deskctl focus "$SELECTOR" && deskctl hotkey "$@" ;;
+ close) deskctl close "$SELECTOR" ;;
+ *) echo "unknown action: $ACTION"; exit 1 ;;
+esac
+
+# 4. verify - snapshot again to confirm result
+echo "--- verify ---"
+sleep 0.5
+deskctl snapshot --json | head -1
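When observe-act.sh is called with only a selector, `shift 2` fails and leaves the positional parameters unchanged. The `|| true` hides that failure, so `"$@"` still holds the selector. This is harmless for the default `click` action, which ignores `"$@"`. A helper that shifts only the arguments that were actually supplied keeps the action arguments clean. The following is a minimal sketch. `parse_observe_act_args` and `ACTION_ARGS` are hypothetical names and are not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split observe-act.sh style arguments into
# SELECTOR, ACTION, and ACTION_ARGS without relying on `shift 2 || true`.
parse_observe_act_args() {
  if [ "$#" -lt 1 ]; then
    echo "usage: observe-act.sh <selector> [action] [action-args...]" >&2
    return 2
  fi
  SELECTOR="$1"
  ACTION="${2:-click}"
  # Shift exactly the arguments that were supplied (selector, plus action if given).
  if [ "$#" -ge 2 ]; then
    shift 2
  else
    shift 1
  fi
  # Whatever remains belongs to the action (e.g. the text for `type`).
  ACTION_ARGS=("$@")
}
```

A caller would run `parse_observe_act_args "$@"` and then pass `"${ACTION_ARGS[@]}"` to `deskctl type`, `press`, or `hotkey`.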
diff --git a/skills/deskctl/workflows/poll-condition.sh b/skills/deskctl/workflows/poll-condition.sh
new file mode 100755
index 0000000..e173bf5
--- /dev/null
+++ b/skills/deskctl/workflows/poll-condition.sh
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+# poll-condition.sh - poll the desktop until a condition is met
+# usage: ./poll-condition.sh <match-string> [interval-seconds] [max-attempts]
+# example: ./poll-condition.sh "Tickets Available" 5 60
+# example: ./poll-condition.sh "Order Confirmed" 3 20
+# example: ./poll-condition.sh "Download Complete" 10 30
+#
+# searches window JSON (list-windows, active window) for the match every N seconds.
+# exits 0 when found, exits 1 after max attempts.
+set -euo pipefail
+
+MATCH="${1:?usage: poll-condition.sh <match-string> [interval] [max-attempts]}"
+INTERVAL="${2:-5}"
+MAX="${3:-60}"
+
+attempt=0
+while [ "$attempt" -lt "$MAX" ]; do
+ attempt=$((attempt + 1))
+
+ # list windows and search their JSON (titles included) for the match
+ windows=$(deskctl list-windows --json 2>/dev/null || echo '{"success":false}')
+ if echo "$windows" | grep -qi "$MATCH"; then
+ echo "FOUND: '$MATCH' detected on attempt $attempt"
+ deskctl snapshot --annotate
+ exit 0
+ fi
+
+ # also search the active window's JSON (its title included)
+ active=$(deskctl get active-window --json 2>/dev/null || echo '{}')
+ if echo "$active" | grep -qi "$MATCH"; then
+ echo "FOUND: '$MATCH' in active window on attempt $attempt"
+ deskctl snapshot --annotate
+ exit 0
+ fi
+
+ echo "attempt $attempt/$MAX - '$MATCH' not found, waiting ${INTERVAL}s..."
+ sleep "$INTERVAL"
+done
+
+echo "NOT FOUND: '$MATCH' after $MAX attempts"
+deskctl snapshot --annotate
+exit 1
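poll-condition.sh inlines its check: a case-insensitive grep over `list-windows --json` and `get active-window --json` output. The runtime docs recommend the grouped `wait` commands for window appearance and focus. A shell polling loop is mainly useful for conditions that `wait` cannot express. For those cases, the attempt/interval loop can be factored into a function that takes any check command. This is a minimal sketch, and `poll_until` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a check command every INTERVAL seconds until it
# succeeds or MAX attempts have been made.
# usage: poll_until <interval-seconds> <max-attempts> <command> [args...]
poll_until() {
  local interval="$1" max="$2"
  shift 2
  local attempt=0
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    if "$@"; then
      echo "condition met on attempt $attempt/$max" >&2
      return 0
    fi
    # No point sleeping after the final attempt.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$interval"
    fi
  done
  echo "condition not met after $max attempts" >&2
  return 1
}
```

To reproduce the script's behavior, wrap its check in a function. For example, `title_visible() { deskctl list-windows --json | grep -qi "$1"; }` can then be passed as `poll_until 5 60 title_visible "Tickets Available"`.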
From 14c89563211a8fec4b916bc4686ee1b4b86070d4 Mon Sep 17 00:00:00 2001
From: Harivansh Rathi
Date: Thu, 26 Mar 2026 08:17:07 -0400
Subject: [PATCH 03/35] align docs and contract
---
README.md | 268 ++++----------------------
docs/runtime-contract.md | 168 +++-------------
site/src/pages/architecture.mdx | 104 ++++++----
site/src/pages/commands.mdx | 219 ++++++++-------------
site/src/pages/index.astro | 57 +++++-
site/src/pages/installation.mdx | 75 ++++---
site/src/pages/quick-start.mdx | 106 +++++-----
site/src/pages/runtime-contract.mdx | 177 +++++++++++++++++
site/src/styles/base.css | 21 ++
skills/deskctl/references/commands.md | 52 +++--
10 files changed, 590 insertions(+), 657 deletions(-)
create mode 100644 site/src/pages/runtime-contract.mdx
diff --git a/README.md b/README.md
index db7d92f..32144f0 100644
--- a/README.md
+++ b/README.md
@@ -1,266 +1,68 @@
# deskctl
-Desktop control CLI for AI agents on Linux X11.
+[](https://www.npmjs.com/package/deskctl-cli)
+[](https://github.com/harivansh-afk/deskctl/releases)
+[](#support-boundary)
+[](skills/deskctl)
+
+Non-interactive desktop control for AI agents on Linux X11.
## Install
-### Cargo
-
-```bash
-cargo install deskctl
-```
-
-Source builds on Linux require:
-
-- Rust 1.75+
-- `pkg-config`
-- X11 development libraries for input and windowing, typically `libx11-dev` and `libxtst-dev` on Debian/Ubuntu
-
-### npm
-
```bash
npm install -g deskctl-cli
-deskctl --help
+deskctl doctor
+deskctl snapshot --annotate
```
-One-shot execution is also supported:
+One-shot execution also works:
```bash
npx deskctl-cli --help
```
-`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset.
+`deskctl-cli` installs the `deskctl` command by downloading the matching GitHub Release asset for the supported runtime target.
-### Installable skill
-
-For `skills.sh` / agent skill ecosystems:
+## Installable skill
```bash
npx skills add harivansh-afk/deskctl -s deskctl
```
-The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo.
+The installable skill lives in [`skills/deskctl`](skills/deskctl) and is built around the same observe -> wait -> act -> verify loop as the CLI.
-### Nix
+## Quick example
+
+```bash
+deskctl doctor
+deskctl snapshot --annotate
+deskctl wait window --selector 'title=Firefox' --timeout 10
+deskctl focus 'title=Firefox'
+deskctl type "hello world"
+```
+
+## Docs
+
+- runtime contract: [docs/runtime-contract.md](docs/runtime-contract.md)
+- release flow: [docs/releasing.md](docs/releasing.md)
+- installable skill: [skills/deskctl](skills/deskctl)
+- contributor workflow: [CONTRIBUTING.md](CONTRIBUTING.md)
+
+## Other install paths
+
+Nix:
```bash
nix run github:harivansh-afk/deskctl -- --help
nix profile install github:harivansh-afk/deskctl
```
-The repo flake is the supported Nix install surface in this phase.
-
-### Docker Convenience
-
-Build a Linux binary locally with Docker:
-
-```bash
-docker compose -f docker/docker-compose.yml run --rm build
-```
-
-This writes `dist/deskctl-linux-x86_64`.
-
-Copy it to an SSH machine where `scp` is unavailable:
-
-```bash
-ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
-```
-
-Run it on an X11 session:
-
-```bash
-DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate
-```
-
-### Local Source Build
+Source build:
```bash
cargo build
```
-## Quick Start
+## Support boundary
-```bash
-# Diagnose the environment first
-deskctl doctor
-
-# See the desktop
-deskctl snapshot
-
-# Query focused runtime state
-deskctl get active-window
-deskctl get monitors
-
-# Click a window
-deskctl click @w1
-
-# Type text
-deskctl type "hello world"
-
-# Wait for a window or focus transition
-deskctl wait window --selector 'title=Firefox' --timeout 10
-deskctl wait focus --selector 'class=firefox' --timeout 5
-
-# Focus by explicit selector
-deskctl focus 'title=Firefox'
-```
-
-## Architecture
-
-Client-daemon architecture over Unix sockets (NDJSON wire protocol).
-The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
-
-Source layout:
-
-- `src/lib.rs` exposes the shared library target
-- `src/main.rs` is the thin CLI wrapper
-- `src/` contains production code and unit tests
-- `tests/` contains Linux/X11 integration tests
-- `tests/support/` contains shared integration helpers
-
-## Runtime Requirements
-
-- Linux with X11 session
-- Rust 1.75+ plus the source-build dependencies above when building from source
-
-The binary itself only links the standard glibc runtime on Linux (`libc`, `libm`, `libgcc_s`).
-
-For deskctl to be fully functional on a fresh VM you still need:
-
-- an X11 server and an active `DISPLAY`
-- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
-- a window manager or desktop environment that exposes standard EWMH properties such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
-- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups
-
-If setup fails, run:
-
-```bash
-deskctl doctor
-```
-
-## Contract Notes
-
-- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows`
-- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session
-- `list-windows` is a cheap read-only operation and does not capture or write a screenshot
-- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md)
-
-## Read and Wait Surface
-
-The grouped runtime reads are:
-
-```bash
-deskctl get active-window
-deskctl get monitors
-deskctl get version
-deskctl get systeminfo
-```
-
-The grouped runtime waits are:
-
-```bash
-deskctl wait window --selector 'title=Firefox' --timeout 10
-deskctl wait focus --selector 'id=win3' --timeout 5
-```
-
-Successful `get active-window`, `wait window`, and `wait focus` responses return a `window` payload with:
-- `ref_id`
-- `window_id`
-- `title`
-- `app_name`
-- geometry (`x`, `y`, `width`, `height`)
-- state flags (`focused`, `minimized`)
-
-`get monitors` returns:
-- `count`
-- `monitors[]` with geometry and primary/automatic flags
-
-`get version` returns:
-- `version`
-- `backend`
-
-`get systeminfo` stays runtime-scoped and returns:
-- `backend`
-- `display`
-- `session_type`
-- `session`
-- `socket_path`
-- `screen`
-- `monitor_count`
-- `monitors`
-
-Wait timeout and selector failures are structured in `--json` mode so agents can recover without string parsing.
-
-## Output Policy
-
-Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
-
-- use `--json` when an agent needs strict parsing
-- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation
-- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
-
-See [docs/runtime-conract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown.
-
-## Distribution
-
-- GitHub Releases are the canonical binary source
-- crates.io package: `deskctl`
-- npm package: `deskctl-cli`
-- installed command on every channel: `deskctl`
-- repo-owned Nix install path: `flake.nix`
-
-For maintainer publishing and release steps, see [docs/releasing.md](docs/releasing.md).
-
-## Selector Contract
-
-Explicit selector modes:
-
-```bash
-ref=w1
-id=win1
-title=Firefox
-class=firefox
-focused
-```
-
-Legacy refs remain supported:
-
-```bash
-@w1
-w1
-win1
-```
-
-Bare selectors such as `firefox` are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.
-
-## Support Boundary
-
-`deskctl` supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.
-
-## Workflow
-
-Local validation uses the root `Makefile`:
-
-```bash
-make fmt-check
-make lint
-make test-unit
-make test-integration
-make site-format-check
-make validate
-```
-
-`make validate` is the full repo-quality check and requires Linux with `xvfb-run` plus `pnpm --dir site install`.
-
-The repository standardizes on `pre-commit` for fast commit-time checks:
-
-```bash
-pre-commit install
-pre-commit run --all-files
-```
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contributor guide.
-
-## Acknowledgements
-
-- [@barrettruth](github.com/barrettruth) - i stole the website from [vimdoc](https://github.com/barrettruth/vimdoc-language-server)
+`deskctl` currently supports Linux X11. Use `--json` for stable machine parsing, use `window_id` for programmatic targeting inside a live session, and use `deskctl doctor` first when the runtime looks broken.
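The support boundary above says to run `deskctl doctor` first when the runtime looks broken. The install and sandbox-agent references removed earlier in this series listed two signs of a healthy session: a set `DISPLAY` and `XDG_SESSION_TYPE=x11`. A cheap shell preflight can check both before calling `doctor`. This is a sketch only. `x11_preflight` is a hypothetical helper, and `deskctl doctor` remains the authoritative check. Treating an unset `XDG_SESSION_TYPE` as a warning is an assumption, because headless X servers often leave it empty.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: cheap environment checks to run before `deskctl doctor`.
# Returns 0 when the environment looks like an X11 session, 1 otherwise.
x11_preflight() {
  if [ -z "${DISPLAY:-}" ]; then
    echo "preflight: DISPLAY is not set" >&2
    return 1
  fi
  case "${XDG_SESSION_TYPE:-}" in
    x11) ;;
    "")
      # Assumption: headless servers (e.g. Xvfb) may leave this unset; warn only.
      echo "preflight: XDG_SESSION_TYPE is unset; assuming X11" >&2
      ;;
    *)
      echo "preflight: unsupported session type: $XDG_SESSION_TYPE" >&2
      return 1
      ;;
  esac
  return 0
}
```

Typical use: `x11_preflight && deskctl doctor`.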
diff --git a/docs/runtime-contract.md b/docs/runtime-contract.md
index 7312357..0316c06 100644
--- a/docs/runtime-contract.md
+++ b/docs/runtime-contract.md
@@ -1,19 +1,6 @@
-# Runtime Output Contract
+# deskctl runtime contract
-This document defines the current output contract for `deskctl`.
-
-It is intentionally scoped to the current Linux X11 runtime surface.
-It does not promise stability for future Wayland or window-manager-specific features.
-
-## Goals
-
-- Keep `deskctl` fully non-interactive
-- Make text output actionable for quick terminal and agent loops
-- Make `--json` safe for agent consumption without depending on incidental formatting
-
-## JSON Envelope
-
-Every runtime command uses the same top-level JSON envelope:
+All commands support `--json` and use the same top-level envelope:
```json
{
@@ -23,22 +10,11 @@ Every runtime command uses the same top-level JSON envelope:
}
```
-Stable top-level fields:
+Use `--json` whenever you need to parse output programmatically.
-- `success`
-- `data`
-- `error`
+## Stable window fields
-`success` is always the authoritative success/failure bit.
-When `success` is `false`, the CLI exits non-zero in both text mode and `--json` mode.
-
-## Stable Fields
-
-These fields are stable for agent consumption in the current Phase 1 runtime contract.
-
-### Window Identity
-
-Whenever a runtime response includes a window payload, these fields are stable:
+Whenever a response includes a window payload, these fields are stable:
- `ref_id`
- `window_id`
@@ -51,128 +27,46 @@ Whenever a runtime response includes a window payload, these fields are stable:
- `focused`
- `minimized`
-`window_id` is the stable public identifier for a live daemon session.
-`ref_id` is a short-lived convenience handle for the current window snapshot/ref map.
+Use `window_id` for stable targeting inside a live daemon session. Use
+`ref_id` or `@wN` for short-lived follow-up actions after `snapshot` or
+`list-windows`.
-### Grouped Reads
+## Stable grouped reads
-`deskctl get active-window`
+- `deskctl get active-window` -> `data.window`
+- `deskctl get monitors` -> `data.count`, `data.monitors`
+- `deskctl get version` -> `data.version`, `data.backend`
+- `deskctl get systeminfo` -> runtime-scoped diagnostic fields such as
+ `backend`, `display`, `session_type`, `session`, `socket_path`, `screen`,
+ `monitor_count`, and `monitors`
-- stable: `data.window`
+## Stable waits
-`deskctl get monitors`
+- `deskctl wait window` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
+ `data.window`
+- `deskctl wait focus` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
+ `data.window`
-- stable: `data.count`
-- stable: `data.monitors`
-- stable per monitor:
- - `name`
- - `x`
- - `y`
- - `width`
- - `height`
- - `width_mm`
- - `height_mm`
- - `primary`
- - `automatic`
+## Stable structured error kinds
-`deskctl get version`
-
-- stable: `data.version`
-- stable: `data.backend`
-
-`deskctl get systeminfo`
-
-- stable: `data.backend`
-- stable: `data.display`
-- stable: `data.session_type`
-- stable: `data.session`
-- stable: `data.socket_path`
-- stable: `data.screen`
-- stable: `data.monitor_count`
-- stable: `data.monitors`
-
-### Waits
-
-`deskctl wait window`
-`deskctl wait focus`
-
-- stable: `data.wait`
-- stable: `data.selector`
-- stable: `data.elapsed_ms`
-- stable: `data.window`
-
-### Selector-Driven Action Success
-
-For selector-driven action commands that resolve a window target, these identifiers are stable when present:
-
-- `data.ref_id`
-- `data.window_id`
-- `data.title`
-- `data.selector`
-
-This applies to:
-
-- `click`
-- `dblclick`
-- `focus`
-- `close`
-- `move-window`
-- `resize-window`
-
-The exact human-readable text rendering of those commands is not part of the JSON contract.
-
-### Artifact-Producing Commands
-
-`snapshot`
-`screenshot`
-
-- stable: `data.screenshot`
-
-When the command also returns windows, `data.windows` uses the stable window payload documented above.
-
-## Stable Structured Error Kinds
-
-When a runtime command returns structured JSON failure data, these error kinds are stable:
+When a command fails with structured JSON data, these `kind` values are stable:
- `selector_not_found`
- `selector_ambiguous`
- `selector_invalid`
- `timeout`
- `not_found`
-- `window_not_focused` as `data.last_observation.kind` or equivalent observation payload
-Stable structured failure fields include:
+Wait failures may also include `window_not_focused` in the last observation
+payload.
-- `data.kind`
-- `data.selector` when selector-related
-- `data.mode` when selector-related
-- `data.candidates` for ambiguous selector failures
-- `data.message` for invalid selector failures
-- `data.wait`
-- `data.timeout_ms`
-- `data.poll_ms`
-- `data.last_observation`
+## Best-effort fields
-## Best-Effort Fields
+Treat these as useful but non-contractual:
-These values are useful but environment-dependent and should be treated as best-effort:
+- exact monitor names
+- incidental text formatting in non-JSON mode
+- default screenshot file names when no explicit path was provided
+- environment-dependent ordering details from the window manager
-- exact monitor naming conventions
-- EWMH/window-manager-dependent window ordering details
-- cosmetic text formatting in non-JSON mode
-- screenshot file names when the caller did not provide an explicit path
-- command stderr wording outside the structured `kind` classifications above
-
-## Text Mode Expectations
-
-Text mode is intended to stay compact and follow-up-useful.
-
-The exact whitespace/alignment of text output is not stable.
-The following expectations are stable at the behavioral level:
-
-- important runtime reads print actionable identifiers or geometry
-- selector failures print enough detail to recover without `--json`
-- artifact-producing commands print the artifact path
-- window listings print both `@wN` refs and `window_id` values
-
-If an agent needs strict parsing, it should use `--json`.
+For the full repo copy, see `docs/runtime-contract.md`.
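The stable `kind` values above let an agent branch on failures without scraping text. The following is a minimal sketch of that branching. It assumes `jq` is installed and that the kind is exposed as `data.kind`, which is the location the previous revision of this document listed. The function names are hypothetical, and the recovery suggestions are illustrative, not deskctl behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a structured deskctl error kind to a next step.
# The suggestions are illustrative and not part of deskctl.
next_step_for_kind() {
  case "$1" in
    selector_not_found) echo "re-observe: run snapshot or list-windows, then retry" ;;
    selector_ambiguous) echo "narrow: pick one of data.candidates and use an explicit selector" ;;
    selector_invalid)   echo "fix: correct the selector syntax" ;;
    timeout)            echo "retry: re-observe, then wait again with a longer --timeout" ;;
    not_found)          echo "re-observe: the target no longer exists" ;;
    *)                  echo "unknown: inspect the full JSON response" ;;
  esac
}

# Extract data.kind from a --json response on stdin (requires jq).
# Prints nothing when the field is absent.
error_kind() {
  jq -r '.data.kind // empty'
}
```

For example, suppose a `deskctl --json wait window --selector 'class=firefox' --timeout 5` call exits non-zero. Its captured output can be piped through `error_kind`, and the result can be passed to `next_step_for_kind`. This assumes the failure envelope is printed on stdout.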
diff --git a/site/src/pages/architecture.mdx b/site/src/pages/architecture.mdx
index 87b2b4e..9478246 100644
--- a/site/src/pages/architecture.mdx
+++ b/site/src/pages/architecture.mdx
@@ -6,73 +6,93 @@ toc: true
# Architecture
-## Client-daemon model
+## Public model
-deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
+`deskctl` is a thin, non-interactive X11 control primitive for agent loops.
+The public flow is:
-Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
+- diagnose with `deskctl doctor`
+- observe with `snapshot`, `list-windows`, and grouped `get` commands
+- wait with grouped `wait` commands instead of shell `sleep`
+- act with explicit selectors or coordinates
+- verify with another read or snapshot
-## Wire protocol
+The tool stays intentionally narrow. It does not try to be a full desktop shell
+or a speculative Wayland abstraction.
+
+## Client-daemon architecture
+
+The CLI talks to an auto-managed daemon over a Unix socket. The daemon keeps
+the X11 connection alive so repeated commands stay fast and share the same
+session-scoped window identity map.
+
+Each CLI invocation sends one request, reads one response, and exits.
+
+## Runtime contract
Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
-**Request:**
+All commands share the same JSON envelope:
```json
-{ "id": "r123456", "action": "snapshot", "annotate": true }
+{
+ "success": true,
+ "data": {},
+ "error": null
+}
```
-**Response:**
+For window payloads, the public identity is `window_id`, not an X11 handle.
+That keeps the contract backend-neutral even though the current support
+boundary is X11-only.
-```json
-{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
-```
+The complete stable-vs-best-effort policy lives on the
+[runtime contract](/runtime-contract) page.
-Error responses include an `error` field:
+## Sessions and sockets
-```json
-{ "success": false, "error": "window not found: @w99" }
-```
+Each session gets its own socket path, PID file, and live window mapping.
-## Socket location
+Public socket resolution order:
-The daemon socket is resolved in this order:
-
-1. `--socket` flag (highest priority)
-2. `$DESKCTL_SOCKET_DIR/{session}.sock`
-3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
+1. `--socket`
+2. `$DESKCTL_SOCKET_DIR/{session}.sock`
+3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
4. `~/.deskctl/{session}.sock`
-PID files are stored alongside the socket.
+Most users should let `deskctl` manage this automatically. `--session` is the
+main public knob when you need isolated daemon instances.
-## Sessions
+## Diagnostics and failure handling
-Multiple isolated daemon instances can run simultaneously using the `--session` flag:
+`deskctl doctor` runs before daemon startup and checks:
-```sh
-deskctl --session workspace1 snapshot
-deskctl --session workspace2 snapshot
-```
+- display/session setup
+- X11 connectivity
+- basic window enumeration
+- screenshot viability
+- socket directory and stale-socket health
-Each session has its own socket, PID file, and window ref map.
+Selector and wait failures are structured in `--json` mode so clients can
+recover without scraping text.
-## Backend design
+## Backend notes
-The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
+The backend is built around a `DesktopBackend` trait and currently ships with
+an X11 implementation backed by `x11rb`.
-The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
+The public guarantee is not "portable desktop automation." It is "a correct
+and unsurprising Linux X11 runtime contract."
-## X11 integration
+## X11 support boundary
-Window detection uses EWMH properties:
+This phase supports Linux X11 only.
-| Property | Purpose |
-| --------------------------- | ------------------------ |
-| `_NET_CLIENT_LIST_STACKING` | Window stacking order |
-| `_NET_ACTIVE_WINDOW` | Currently focused window |
-| `_NET_WM_NAME` | Window title (UTF-8) |
-| `_NET_WM_STATE_HIDDEN` | Minimized state |
-| `_NET_CLOSE_WINDOW` | Graceful close |
-| `WM_CLASS` | Application class/name |
+That means:
-Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.
+- EWMH/window-manager properties matter
+- monitor naming and some ordering details are best-effort
+- Wayland and Hyprland are out of scope for the current contract
+
+The runtime documents those boundaries explicitly instead of pretending the
+surface is broader than it is.
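The socket resolution order on the architecture page can be sketched as a shell function. This sketch mirrors only the four documented steps. It takes the `--socket` value as an optional second argument and the session name as the first, defaulting to `default` as the global options table states. `resolve_socket_path` is a hypothetical helper, not deskctl's implementation. It does not check whether any directory exists.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented socket resolution order.
# usage: resolve_socket_path [session] [explicit-socket]
resolve_socket_path() {
  local session="${1:-default}" explicit="${2:-}"
  if [ -n "$explicit" ]; then
    # 1. An explicit --socket path wins outright.
    printf '%s\n' "$explicit"
  elif [ -n "${DESKCTL_SOCKET_DIR:-}" ]; then
    # 2. Socket directory override from the environment.
    printf '%s\n' "$DESKCTL_SOCKET_DIR/$session.sock"
  elif [ -n "${XDG_RUNTIME_DIR:-}" ]; then
    # 3. Per-user runtime directory.
    printf '%s\n' "$XDG_RUNTIME_DIR/deskctl/$session.sock"
  else
    # 4. Fallback under the home directory.
    printf '%s\n' "$HOME/.deskctl/$session.sock"
  fi
}
```

Under these rules, `resolve_socket_path work` prints where a `--session work` daemon socket would be expected.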
diff --git a/site/src/pages/commands.mdx b/site/src/pages/commands.mdx
index e1fc509..8a5132b 100644
--- a/site/src/pages/commands.mdx
+++ b/site/src/pages/commands.mdx
@@ -6,167 +6,101 @@ toc: true
# Commands
-## Snapshot
-
-Capture a screenshot and get the window tree:
+## Observe
```sh
+deskctl doctor
deskctl snapshot
deskctl snapshot --annotate
-```
-
-With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
-
-The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
-
-## Click
-
-Click the center of a window by ref, or click exact coordinates:
-
-```sh
-deskctl click @w1
-deskctl click 960,540
-```
-
-## Double click
-
-```sh
-deskctl dblclick @w1
-deskctl dblclick 500,300
-```
-
-## Type
-
-Type a string into the focused window:
-
-```sh
-deskctl type "hello world"
-```
-
-## Press
-
-Press a single key:
-
-```sh
-deskctl press enter
-deskctl press tab
-deskctl press escape
-```
-
-Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
-
-## Hotkey
-
-Send a key combination. List modifier keys first, then the target key:
-
-```sh
-deskctl hotkey ctrl c
-deskctl hotkey ctrl shift t
-deskctl hotkey alt f4
-```
-
-Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
-
-## Mouse move
-
-Move the cursor to absolute coordinates:
-
-```sh
-deskctl mouse move 100 200
-```
-
-## Mouse scroll
-
-Scroll the mouse wheel. Positive values scroll down, negative scroll up:
-
-```sh
-deskctl mouse scroll 3
-deskctl mouse scroll -5
-deskctl mouse scroll 3 --axis horizontal
-```
-
-## Mouse drag
-
-Drag from one position to another:
-
-```sh
-deskctl mouse drag 100 200 500 600
-```
-
-## Focus
-
-Focus a window by ref or by name (case-insensitive substring match):
-
-```sh
-deskctl focus @w1
-deskctl focus "firefox"
-```
-
-## Close
-
-Close a window gracefully:
-
-```sh
-deskctl close @w2
-deskctl close "terminal"
-```
-
-## Move window
-
-Move a window to an absolute position:
-
-```sh
-deskctl move-window @w1 0 0
-deskctl move-window "firefox" 100 100
-```
-
-## Resize window
-
-Resize a window:
-
-```sh
-deskctl resize-window @w1 1280 720
-```
-
-## List windows
-
-List all windows without taking a screenshot:
-
-```sh
deskctl list-windows
-```
-
-## Get screen size
-
-```sh
+deskctl screenshot
+deskctl screenshot /tmp/screen.png
+deskctl get active-window
+deskctl get monitors
+deskctl get version
+deskctl get systeminfo
deskctl get-screen-size
-```
-
-## Get mouse position
-
-```sh
deskctl get-mouse-position
```
-## Screenshot
+`doctor` checks the runtime before daemon startup. `snapshot` produces a
+screenshot plus window refs. `list-windows` is the same window tree without the
+side effect of writing a screenshot.
-Take a screenshot without the window tree. Optionally specify a save path:
+## Wait
```sh
-deskctl screenshot
-deskctl screenshot /tmp/my-screenshot.png
-deskctl screenshot --annotate
+deskctl wait window --selector 'title=Firefox' --timeout 10
+deskctl wait focus --selector 'id=win3' --timeout 5
+deskctl --json wait window --selector 'class=firefox' --poll-ms 100
```
-## Launch
+Wait commands return the matched window payload on success. In `--json` mode,
+timeouts and selector failures expose structured `kind` values.
-Launch an application:
+## Act on a window
```sh
deskctl launch firefox
-deskctl launch code --args /path/to/project
+deskctl focus @w1
+deskctl focus 'title=Firefox'
+deskctl click @w1
+deskctl click 960,540
+deskctl dblclick @w2
+deskctl close @w3
+deskctl move-window @w1 100 120
+deskctl resize-window @w1 1280 720
```
+Selector-driven actions accept refs, explicit selector modes, or absolute
+coordinates where appropriate.
+
+## Input and mouse
+
+```sh
+deskctl type "hello world"
+deskctl press enter
+deskctl hotkey ctrl shift t
+deskctl mouse move 100 200
+deskctl mouse scroll 3
+deskctl mouse scroll 3 --axis horizontal
+deskctl mouse drag 100 200 500 600
+```
+
+Supported key names include `enter`, `tab`, `escape`, `backspace`, `delete`,
+`space`, arrow keys, paging keys, `f1` through `f12`, and any single
+character.
+
+## Launch
+
+```sh
+deskctl launch firefox
+deskctl launch code -- --new-window
+```
+
+## Selectors
+
+Prefer explicit selectors when the target matters:
+
+```sh
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
+```
+
+Legacy shorthand is still supported:
+
+```sh
+@w1
+w1
+win1
+```
+
+Bare strings like `firefox` are fuzzy matches. They resolve when there is one
+match and fail with candidate windows when there are multiple matches.
+
## Global options
| Flag | Env | Description |
@@ -174,3 +108,6 @@ deskctl launch code --args /path/to/project
| `--json` | | Output as JSON |
| `--socket ` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
| `--session ` | | Session name for multiple daemons (default: `default`) |
+
+`deskctl` manages the daemon automatically. Most users never need to think
+about it beyond `--session` and `--socket`.
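The selector section lists three forms: explicit modes, legacy shorthand, and a fuzzy fallback. The following sketch classifies a selector string into those forms so a script can tell which one it is about to pass. `selector_mode` is a hypothetical helper. Treating `w1` and `@w1` as refs and `win1` as an id is an assumption drawn from the `ref=w1` and `id=win1` examples. It is not a statement about deskctl's own parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which documented selector form a string uses.
selector_mode() {
  local sel="$1"
  case "$sel" in
    ref=*|id=*|title=*|class=*)
      # Explicit mode: the text before the first '=' names the mode.
      printf '%s\n' "${sel%%=*}"
      return
      ;;
    focused)
      printf 'focused\n'
      return
      ;;
  esac
  if [[ "$sel" =~ ^@?w[0-9]+$ ]]; then
    printf 'ref (legacy)\n'   # @w1 or w1
  elif [[ "$sel" =~ ^win[0-9]+$ ]]; then
    printf 'id (legacy)\n'    # win1
  else
    printf 'fuzzy\n'          # substring match; fails on ambiguity
  fi
}
```

A script could check `selector_mode "$SELECTOR"` for `fuzzy` before acting and switch to an explicit mode when the target matters.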
diff --git a/site/src/pages/index.astro b/site/src/pages/index.astro
index 9327dc5..4263549 100644
--- a/site/src/pages/index.astro
+++ b/site/src/pages/index.astro
@@ -8,17 +8,49 @@ import DocLayout from "../layouts/DocLayout.astro";
-