align docs and contract

2026-04-15 05:02:08 +00:00 · 2026-03-26 08:17:07 -04:00 · 2026-03-26 08:17:07 -04:00 · 14c8956321
commit 14c8956321
parent c37589ccf4
10 changed files with 590 additions and 657 deletions
--- a/README.md
+++ b/README.md
@@ -1,266 +1,68 @@
 # deskctl

-Desktop control CLI for AI agents on Linux X11. 
+[![npm](https://img.shields.io/npm/v/deskctl-cli?label=npm)](https://www.npmjs.com/package/deskctl-cli)
+[![release](https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release)](https://github.com/harivansh-afk/deskctl/releases)
+[![runtime](https://img.shields.io/badge/runtime-linux--x11-111827)](#support-boundary)
+[![skill](https://img.shields.io/badge/skills.sh-deskctl-111827)](skills/deskctl)
+
+Non-interactive desktop control for AI agents on Linux X11.

 ## Install

-### Cargo
-
-```bash
-cargo install deskctl
-```
-
-Source builds on Linux require:
-
-- Rust 1.75+
-- `pkg-config`
-- X11 development libraries for input and windowing, typically `libx11-dev` and `libxtst-dev` on Debian/Ubuntu
-
-### npm
-
 ```bash
 npm install -g deskctl-cli
-deskctl --help
+deskctl doctor
+deskctl snapshot --annotate
 ```

-One-shot execution is also supported:
+One-shot execution also works:

 ```bash
 npx deskctl-cli --help
 ```

-`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset.
+`deskctl-cli` installs the `deskctl` command by downloading the matching GitHub Release asset for the supported runtime target.

-### Installable skill
-
-For `skills.sh` / agent skill ecosystems:
+## Installable skill

 ```bash
 npx skills add harivansh-afk/deskctl -s deskctl
 ```

-The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo.
+The installable skill lives in [`skills/deskctl`](skills/deskctl) and is built around the same observe -> wait -> act -> verify loop as the CLI.

-### Nix
+## Quick example
+
+```bash
+deskctl doctor
+deskctl snapshot --annotate
+deskctl wait window --selector 'title=Firefox' --timeout 10
+deskctl focus 'title=Firefox'
+deskctl type "hello world"
+```
+
+## Docs
+
+- runtime contract: [docs/runtime-contract.md](docs/runtime-contract.md)
+- release flow: [docs/releasing.md](docs/releasing.md)
+- installable skill: [skills/deskctl](skills/deskctl)
+- contributor workflow: [CONTRIBUTING.md](CONTRIBUTING.md)
+
+## Other install paths
+
+Nix:

 ```bash
 nix run github:harivansh-afk/deskctl -- --help
 nix profile install github:harivansh-afk/deskctl
 ```

-The repo flake is the supported Nix install surface in this phase.
-
-### Docker Convenience
-
-Build a Linux binary locally with Docker:
-
-```bash
-docker compose -f docker/docker-compose.yml run --rm build
-```
-
-This writes `dist/deskctl-linux-x86_64`.
-
-Copy it to an SSH machine where `scp` is unavailable:
-
-```bash
-ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
-```
-
-Run it on an X11 session:
-
-```bash
-DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate
-```
-
-### Local Source Build
+Source build:

 ```bash
 cargo build
 ```

-## Quick Start
+## Support boundary

-```bash
-# Diagnose the environment first
-deskctl doctor
-
-# See the desktop
-deskctl snapshot
-
-# Query focused runtime state
-deskctl get active-window
-deskctl get monitors
-
-# Click a window
-deskctl click @w1
-
-# Type text
-deskctl type "hello world"
-
-# Wait for a window or focus transition
-deskctl wait window --selector 'title=Firefox' --timeout 10
-deskctl wait focus --selector 'class=firefox' --timeout 5
-
-# Focus by explicit selector
-deskctl focus 'title=Firefox'
-```
-
-## Architecture
-
-Client-daemon architecture over Unix sockets (NDJSON wire protocol). 
-The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
-
-Source layout:
-
-- `src/lib.rs` exposes the shared library target
-- `src/main.rs` is the thin CLI wrapper
-- `src/` contains production code and unit tests
-- `tests/` contains Linux/X11 integration tests
-- `tests/support/` contains shared integration helpers
-
-## Runtime Requirements
-
-- Linux with X11 session
-- Rust 1.75+ plus the source-build dependencies above when building from source
-
-The binary itself only links the standard glibc runtime on Linux (`libc`, `libm`, `libgcc_s`).
-
-For deskctl to be fully functional on a fresh VM you still need:
-
-- an X11 server and an active `DISPLAY`
-- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
-- a window manager or desktop environment that exposes standard EWMH properties such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
-- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups
-
-If setup fails, run:
-
-```bash
-deskctl doctor
-```
-
-## Contract Notes
-
-- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows`
-- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session
-- `list-windows` is a cheap read-only operation and does not capture or write a screenshot
-- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md)
-
-## Read and Wait Surface
-
-The grouped runtime reads are:
-
-```bash
-deskctl get active-window
-deskctl get monitors
-deskctl get version
-deskctl get systeminfo
-```
-
-The grouped runtime waits are:
-
-```bash
-deskctl wait window --selector 'title=Firefox' --timeout 10
-deskctl wait focus --selector 'id=win3' --timeout 5
-```
-
-Successful `get active-window`, `wait window`, and `wait focus` responses return a `window` payload with:
-- `ref_id`
-- `window_id`
-- `title`
-- `app_name`
-- geometry (`x`, `y`, `width`, `height`)
-- state flags (`focused`, `minimized`)
-
-`get monitors` returns:
-- `count`
-- `monitors[]` with geometry and primary/automatic flags
-
-`get version` returns:
-- `version`
-- `backend`
-
-`get systeminfo` stays runtime-scoped and returns:
-- `backend`
-- `display`
-- `session_type`
-- `session`
-- `socket_path`
-- `screen`
-- `monitor_count`
-- `monitors`
-
-Wait timeout and selector failures are structured in `--json` mode so agents can recover without string parsing.
-
-## Output Policy
-
-Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
-
-- use `--json` when an agent needs strict parsing
-- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation
-- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
-
-See [docs/runtime-contract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown.
-
-## Distribution
-
-- GitHub Releases are the canonical binary source
-- crates.io package: `deskctl`
-- npm package: `deskctl-cli`
-- installed command on every channel: `deskctl`
-- repo-owned Nix install path: `flake.nix`
-
-For maintainer publishing and release steps, see [docs/releasing.md](docs/releasing.md).
-
-## Selector Contract
-
-Explicit selector modes:
-
-```bash
-ref=w1
-id=win1
-title=Firefox
-class=firefox
-focused
-```
-
-Legacy refs remain supported:
-
-```bash
-@w1
-w1
-win1
-```
-
-Bare selectors such as `firefox` are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.
-
-## Support Boundary
-
-`deskctl` supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.
-
-## Workflow
-
-Local validation uses the root `Makefile`:
-
-```bash
-make fmt-check
-make lint
-make test-unit
-make test-integration
-make site-format-check
-make validate
-```
-
-`make validate` is the full repo-quality check and requires Linux with `xvfb-run` plus `pnpm --dir site install`.
-
-The repository standardizes on `pre-commit` for fast commit-time checks:
-
-```bash
-pre-commit install
-pre-commit run --all-files
-```
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contributor guide.
-
-## Acknowledgements
-
-- [@barrettruth](github.com/barrettruth) - i stole the website from [vimdoc](https://github.com/barrettruth/vimdoc-language-server)
+`deskctl` currently supports Linux X11. Use `--json` for stable machine parsing, use `window_id` for programmatic targeting inside a live session, and use `deskctl doctor` first when the runtime looks broken.
--- a/docs/runtime-contract.md
+++ b/docs/runtime-contract.md
@@ -1,19 +1,6 @@
-# Runtime Output Contract
+# deskctl runtime contract

-This document defines the current output contract for `deskctl`.
-
-It is intentionally scoped to the current Linux X11 runtime surface.
-It does not promise stability for future Wayland or window-manager-specific features.
-
-## Goals
-
-- Keep `deskctl` fully non-interactive
-- Make text output actionable for quick terminal and agent loops
-- Make `--json` safe for agent consumption without depending on incidental formatting
-
-## JSON Envelope
-
-Every runtime command uses the same top-level JSON envelope:
+All commands support `--json` and use the same top-level envelope:

 ```json
 {
@@ -23,22 +10,11 @@ Every runtime command uses the same top-level JSON envelope:
 }
 ```

-Stable top-level fields:
+Use `--json` whenever you need to parse output programmatically.

-- `success`
-- `data`
-- `error`
+## Stable window fields

-`success` is always the authoritative success/failure bit.
-When `success` is `false`, the CLI exits non-zero in both text mode and `--json` mode.
-
-## Stable Fields
-
-These fields are stable for agent consumption in the current Phase 1 runtime contract.
-
-### Window Identity
-
-Whenever a runtime response includes a window payload, these fields are stable:
+Whenever a response includes a window payload, these fields are stable:

 - `ref_id`
 - `window_id`
@@ -51,128 +27,46 @@ Whenever a runtime response includes a window payload, these fields are stable:
 - `focused`
 - `minimized`

-`window_id` is the stable public identifier for a live daemon session.
-`ref_id` is a short-lived convenience handle for the current window snapshot/ref map.
+Use `window_id` for stable targeting inside a live daemon session. Use
+`ref_id` or `@wN` for short-lived follow-up actions after `snapshot` or
+`list-windows`.

-### Grouped Reads
+## Stable grouped reads

-`deskctl get active-window`
+- `deskctl get active-window` -> `data.window`
+- `deskctl get monitors` -> `data.count`, `data.monitors`
+- `deskctl get version` -> `data.version`, `data.backend`
+- `deskctl get systeminfo` -> runtime-scoped diagnostic fields such as
+  `backend`, `display`, `session_type`, `session`, `socket_path`, `screen`,
+  `monitor_count`, and `monitors`

-- stable: `data.window`
+## Stable waits

-`deskctl get monitors`
+- `deskctl wait window` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
+  `data.window`
+- `deskctl wait focus` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
+  `data.window`

-- stable: `data.count`
-- stable: `data.monitors`
-- stable per monitor:
-  - `name`
-  - `x`
-  - `y`
-  - `width`
-  - `height`
-  - `width_mm`
-  - `height_mm`
-  - `primary`
-  - `automatic`
+## Stable structured error kinds

-`deskctl get version`
-
-- stable: `data.version`
-- stable: `data.backend`
-
-`deskctl get systeminfo`
-
-- stable: `data.backend`
-- stable: `data.display`
-- stable: `data.session_type`
-- stable: `data.session`
-- stable: `data.socket_path`
-- stable: `data.screen`
-- stable: `data.monitor_count`
-- stable: `data.monitors`
-
-### Waits
-
-`deskctl wait window`
-`deskctl wait focus`
-
-- stable: `data.wait`
-- stable: `data.selector`
-- stable: `data.elapsed_ms`
-- stable: `data.window`
-
-### Selector-Driven Action Success
-
-For selector-driven action commands that resolve a window target, these identifiers are stable when present:
-
-- `data.ref_id`
-- `data.window_id`
-- `data.title`
-- `data.selector`
-
-This applies to:
-
-- `click`
-- `dblclick`
-- `focus`
-- `close`
-- `move-window`
-- `resize-window`
-
-The exact human-readable text rendering of those commands is not part of the JSON contract.
-
-### Artifact-Producing Commands
-
-`snapshot`
-`screenshot`
-
-- stable: `data.screenshot`
-
-When the command also returns windows, `data.windows` uses the stable window payload documented above.
-
-## Stable Structured Error Kinds
-
-When a runtime command returns structured JSON failure data, these error kinds are stable:
+When a command fails with structured JSON data, these `kind` values are stable:

 - `selector_not_found`
 - `selector_ambiguous`
 - `selector_invalid`
 - `timeout`
 - `not_found`
-- `window_not_focused` as `data.last_observation.kind` or equivalent observation payload

-Stable structured failure fields include:
+Wait failures may also include `window_not_focused` in the last observation
+payload.

-- `data.kind`
-- `data.selector` when selector-related
-- `data.mode` when selector-related
-- `data.candidates` for ambiguous selector failures
-- `data.message` for invalid selector failures
-- `data.wait`
-- `data.timeout_ms`
-- `data.poll_ms`
-- `data.last_observation`
+## Best-effort fields

-## Best-Effort Fields
+Treat these as useful but non-contractual:

-These values are useful but environment-dependent and should be treated as best-effort:
+- exact monitor names
+- incidental text formatting in non-JSON mode
+- default screenshot file names when no explicit path was provided
+- environment-dependent ordering details from the window manager

-- exact monitor naming conventions
-- EWMH/window-manager-dependent window ordering details
-- cosmetic text formatting in non-JSON mode
-- screenshot file names when the caller did not provide an explicit path
-- command stderr wording outside the structured `kind` classifications above
-
-## Text Mode Expectations
-
-Text mode is intended to stay compact and follow-up-useful.
-
-The exact whitespace/alignment of text output is not stable.
-The following expectations are stable at the behavioral level:
-
-- important runtime reads print actionable identifiers or geometry
-- selector failures print enough detail to recover without `--json`
-- artifact-producing commands print the artifact path
-- window listings print both `@wN` refs and `window_id` values
-
-If an agent needs strict parsing, it should use `--json`.
+For the full repo copy, see `docs/runtime-contract.md`.
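The envelope and error-kind contract in this file can be consumed roughly as follows. This is a minimal Python sketch, not part of the commit: `parse_envelope` and `stable_failure_kind` are hypothetical helper names, and reading the kind from `data.kind` is an assumption taken from the removed "Stable structured failure fields" list, since the rewritten text no longer states where `kind` lives.

```python
import json
from typing import Optional

# Stable `kind` values listed in the rewritten contract.
STABLE_ERROR_KINDS = frozenset({
    "selector_not_found",
    "selector_ambiguous",
    "selector_invalid",
    "timeout",
    "not_found",
})


def parse_envelope(raw: str) -> dict:
    """Parse one `--json` response and check the authoritative `success` bit."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict) or not isinstance(envelope.get("success"), bool):
        raise ValueError("not a deskctl JSON envelope")
    return envelope


def stable_failure_kind(envelope: dict) -> Optional[str]:
    """Return a stable error kind, or None on success or for an unlisted kind.

    Assumption: the kind is exposed as `data.kind`, as the removed text described.
    """
    if envelope["success"]:
        return None
    data = envelope.get("data")
    kind = data.get("kind") if isinstance(data, dict) else None
    if isinstance(kind, str) and kind in STABLE_ERROR_KINDS:
        return kind
    return None
```

A caller would typically branch on the returned kind (for example, retry on `timeout`) and treat `None` on a failed envelope as an unclassified error whose wording is best-effort.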
--- a/site/src/pages/architecture.mdx
+++ b/site/src/pages/architecture.mdx
@@ -6,73 +6,93 @@ toc: true

 # Architecture

-## Client-daemon model
+## Public model

-deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
+`deskctl` is a thin, non-interactive X11 control primitive for agent loops.
+The public flow is:

-Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
+- diagnose with `deskctl doctor`
+- observe with `snapshot`, `list-windows`, and grouped `get` commands
+- wait with grouped `wait` commands instead of shell `sleep`
+- act with explicit selectors or coordinates
+- verify with another read or snapshot

-## Wire protocol
+The tool stays intentionally narrow. It does not try to be a full desktop shell
+or a speculative Wayland abstraction.
+
+## Client-daemon architecture
+
+The CLI talks to an auto-managed daemon over a Unix socket. The daemon keeps
+the X11 connection alive so repeated commands stay fast and share the same
+session-scoped window identity map.
+
+Each CLI invocation sends one request, reads one response, and exits.
+
+## Runtime contract

 Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.

-**Request:**
+All commands share the same JSON envelope:

 ```json
-{ "id": "r123456", "action": "snapshot", "annotate": true }
+{
+  "success": true,
+  "data": {},
+  "error": null
+}
 ```

-**Response:**
+For window payloads, the public identity is `window_id`, not an X11 handle.
+That keeps the contract backend-neutral even though the current support
+boundary is X11-only.

-```json
-{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
-```
+The complete stable-vs-best-effort policy lives on the
+[runtime contract](/runtime-contract) page.

-Error responses include an `error` field:
+## Sessions and sockets

-```json
-{ "success": false, "error": "window not found: @w99" }
-```
+Each session gets its own socket path, PID file, and live window mapping.

-## Socket location
+Public socket resolution order:

-The daemon socket is resolved in this order:
-
-1. `--socket` flag (highest priority)
-2. `$DESKCTL_SOCKET_DIR/{session}.sock`
-3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
+1. `--socket`
+2. `DESKCTL_SOCKET_DIR/{session}.sock`
+3. `XDG_RUNTIME_DIR/deskctl/{session}.sock`
 4. `~/.deskctl/{session}.sock`

-PID files are stored alongside the socket.
+Most users should let `deskctl` manage this automatically. `--session` is the
+main public knob when you need isolated daemon instances.

-## Sessions
+## Diagnostics and failure handling

-Multiple isolated daemon instances can run simultaneously using the `--session` flag:
+`deskctl doctor` runs before daemon startup and checks:

-```sh
-deskctl --session workspace1 snapshot
-deskctl --session workspace2 snapshot
-```
+- display/session setup
+- X11 connectivity
+- basic window enumeration
+- screenshot viability
+- socket directory and stale-socket health

-Each session has its own socket, PID file, and window ref map.
+Selector and wait failures are structured in `--json` mode so clients can
+recover without scraping text.

-## Backend design
+## Backend notes

-The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
+The backend is built around a `DesktopBackend` trait and currently ships with
+an X11 implementation backed by `x11rb`.

-The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
+The important public guarantee is not "portable desktop automation." The
+important guarantee is "a correct and unsurprising Linux X11 runtime contract."

-## X11 integration
+## X11 support boundary

-Window detection uses EWMH properties:
+This phase supports Linux X11 only.

-| Property                    | Purpose                  |
-| --------------------------- | ------------------------ |
-| `_NET_CLIENT_LIST_STACKING` | Window stacking order    |
-| `_NET_ACTIVE_WINDOW`        | Currently focused window |
-| `_NET_WM_NAME`              | Window title (UTF-8)     |
-| `_NET_WM_STATE_HIDDEN`      | Minimized state          |
-| `_NET_CLOSE_WINDOW`         | Graceful close           |
-| `WM_CLASS`                  | Application class/name   |
+That means:

-Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.
+- EWMH/window-manager properties matter
+- monitor naming and some ordering details are best-effort
+- Wayland and Hyprland are out of scope for the current contract
+
+The runtime documents those boundaries explicitly instead of pretending the
+surface is broader than it is.
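The four-step socket resolution order documented in this file can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the daemon's actual code: `resolve_socket_path` is a hypothetical name, the `default` session name is taken from the global options table in `commands.mdx`, and falling back to `HOME` for `~` is an assumption.

```python
import os
import posixpath
from typing import Mapping, Optional


def resolve_socket_path(
    session: str = "default",
    socket_flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Mirror the documented resolution order for the daemon socket."""
    env = os.environ if env is None else env

    # 1. An explicit `--socket` value wins.
    if socket_flag:
        return socket_flag

    # 2. DESKCTL_SOCKET_DIR/{session}.sock
    socket_dir = env.get("DESKCTL_SOCKET_DIR")
    if socket_dir:
        return posixpath.join(socket_dir, f"{session}.sock")

    # 3. XDG_RUNTIME_DIR/deskctl/{session}.sock
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return posixpath.join(runtime_dir, "deskctl", f"{session}.sock")

    # 4. ~/.deskctl/{session}.sock (HOME lookup is an assumption of this sketch)
    home = env.get("HOME") or os.path.expanduser("~")
    return posixpath.join(home, ".deskctl", f"{session}.sock")
```

Passing `env` explicitly keeps the function easy to test and makes the precedence between the two directory variables visible at the call site.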
--- a/site/src/pages/commands.mdx
+++ b/site/src/pages/commands.mdx
@@ -6,167 +6,101 @@ toc: true

 # Commands

-## Snapshot
-
-Capture a screenshot and get the window tree:
+## Observe

 ```sh
+deskctl doctor
 deskctl snapshot
 deskctl snapshot --annotate
-```
-
-With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
-
-The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
-
-## Click
-
-Click the center of a window by ref, or click exact coordinates:
-
-```sh
-deskctl click @w1
-deskctl click 960,540
-```
-
-## Double click
-
-```sh
-deskctl dblclick @w1
-deskctl dblclick 500,300
-```
-
-## Type
-
-Type a string into the focused window:
-
-```sh
-deskctl type "hello world"
-```
-
-## Press
-
-Press a single key:
-
-```sh
-deskctl press enter
-deskctl press tab
-deskctl press escape
-```
-
-Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
-
-## Hotkey
-
-Send a key combination. List modifier keys first, then the target key:
-
-```sh
-deskctl hotkey ctrl c
-deskctl hotkey ctrl shift t
-deskctl hotkey alt f4
-```
-
-Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
-
-## Mouse move
-
-Move the cursor to absolute coordinates:
-
-```sh
-deskctl mouse move 100 200
-```
-
-## Mouse scroll
-
-Scroll the mouse wheel. Positive values scroll down, negative scroll up:
-
-```sh
-deskctl mouse scroll 3
-deskctl mouse scroll -5
-deskctl mouse scroll 3 --axis horizontal
-```
-
-## Mouse drag
-
-Drag from one position to another:
-
-```sh
-deskctl mouse drag 100 200 500 600
-```
-
-## Focus
-
-Focus a window by ref or by name (case-insensitive substring match):
-
-```sh
-deskctl focus @w1
-deskctl focus "firefox"
-```
-
-## Close
-
-Close a window gracefully:
-
-```sh
-deskctl close @w2
-deskctl close "terminal"
-```
-
-## Move window
-
-Move a window to an absolute position:
-
-```sh
-deskctl move-window @w1 0 0
-deskctl move-window "firefox" 100 100
-```
-
-## Resize window
-
-Resize a window:
-
-```sh
-deskctl resize-window @w1 1280 720
-```
-
-## List windows
-
-List all windows without taking a screenshot:
-
-```sh
 deskctl list-windows
-```
-
-## Get screen size
-
-```sh
+deskctl screenshot
+deskctl screenshot /tmp/screen.png
+deskctl get active-window
+deskctl get monitors
+deskctl get version
+deskctl get systeminfo
 deskctl get-screen-size
-```
-
-## Get mouse position
-
-```sh
 deskctl get-mouse-position
 ```

-## Screenshot
+`doctor` checks the runtime before daemon startup. `snapshot` produces a
+screenshot plus window refs. `list-windows` is the same window tree without the
+side effect of writing a screenshot.

-Take a screenshot without the window tree. Optionally specify a save path:
+## Wait

 ```sh
-deskctl screenshot
-deskctl screenshot /tmp/my-screenshot.png
-deskctl screenshot --annotate
+deskctl wait window --selector 'title=Firefox' --timeout 10
+deskctl wait focus --selector 'id=win3' --timeout 5
+deskctl --json wait window --selector 'class=firefox' --poll-ms 100
 ```

-## Launch
+Wait commands return the matched window payload on success. In `--json` mode,
+timeouts and selector failures expose structured `kind` values.

-Launch an application:
+## Act on a window

 ```sh
 deskctl launch firefox
-deskctl launch code --args /path/to/project
+deskctl focus @w1
+deskctl focus 'title=Firefox'
+deskctl click @w1
+deskctl click 960,540
+deskctl dblclick @w2
+deskctl close @w3
+deskctl move-window @w1 100 120
+deskctl resize-window @w1 1280 720
 ```

+Selector-driven actions accept refs, explicit selector modes, or absolute
+coordinates where appropriate.
+
+## Input and mouse
+
+```sh
+deskctl type "hello world"
+deskctl press enter
+deskctl hotkey ctrl shift t
+deskctl mouse move 100 200
+deskctl mouse scroll 3
+deskctl mouse scroll 3 --axis horizontal
+deskctl mouse drag 100 200 500 600
+```
+
+Supported key names include `enter`, `tab`, `escape`, `backspace`, `delete`,
+`space`, arrow keys, paging keys, `f1` through `f12`, and any single
+character.
+
+## Launch
+
+```sh
+deskctl launch firefox
+deskctl launch code -- --new-window
+```
+
+## Selectors
+
+Prefer explicit selectors when the target matters:
+
+```sh
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
+```
+
+Legacy shorthand is still supported:
+
+```sh
+@w1
+w1
+win1
+```
+
+Bare strings like `firefox` are fuzzy matches. They resolve when there is one
+match and fail with candidate windows when there are multiple matches.
+
 ## Global options

 | Flag               | Env              | Description                                            |
@@ -174,3 +108,6 @@ deskctl launch code --args /path/to/project
 | `--json`           |                  | Output as JSON                                         |
 | `--socket <path>`  | `DESKCTL_SOCKET` | Path to daemon Unix socket                             |
 | `--session <name>` |                  | Session name for multiple daemons (default: `default`) |
+
+`deskctl` manages the daemon automatically. Most users never need to think
+about it beyond `--session` and `--socket`.
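The selector rules in the Selectors section above (explicit modes, legacy shorthand, and bare fuzzy strings that fail on ambiguity) can be modelled with a short Python sketch. Everything here is hypothetical illustration rather than the CLI's parser: the regular expressions for `w<digits>` and `win<digits>` are inferred from the examples, and matching fuzzy strings against `title` and `app_name` is an assumption.

```python
import re
from typing import Dict, List, Tuple

EXPLICIT_MODES = ("ref", "id", "title", "class")


class AmbiguousSelector(Exception):
    """Raised when a bare string matches more than one window."""

    def __init__(self, candidates: List[Dict[str, str]]):
        super().__init__(f"{len(candidates)} windows match")
        self.candidates = candidates


def classify_selector(selector: str) -> Tuple[str, str]:
    """Return (mode, value) for one selector string."""
    if selector == "focused":
        return ("focused", "")
    mode, sep, value = selector.partition("=")
    if sep and mode in EXPLICIT_MODES:
        return (mode, value)
    # Legacy shorthand: `@w1` and `w1` are refs, `win1` is a window id.
    if re.fullmatch(r"@?w\d+", selector):
        return ("ref", selector.lstrip("@"))
    if re.fullmatch(r"win\d+", selector):
        return ("id", selector)
    return ("fuzzy", selector)


def resolve_fuzzy(needle: str, windows: List[Dict[str, str]]) -> Dict[str, str]:
    """Resolve a bare string: one match succeeds, several matches fail."""
    lowered = needle.lower()
    matches = [
        w for w in windows
        if lowered in w["title"].lower() or lowered in w["app_name"].lower()
    ]
    if not matches:
        raise LookupError(f"no window matches {needle!r}")
    if len(matches) > 1:
        raise AmbiguousSelector(matches)
    return matches[0]
```

Returning the candidate windows on ambiguity, rather than the first match, is what lets a caller retry with an explicit `id=` or `title=` selector.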
--- a/site/src/pages/index.astro
+++ b/site/src/pages/index.astro
@@ -8,17 +8,49 @@ import DocLayout from "../layouts/DocLayout.astro";
    <img src="/favicon.svg" alt="" width="40" height="40" />
  </header>

-  <p>
-    Desktop control CLI for AI agents on Linux X11. Compact JSON output for
-    agent loops. Screenshot, click, type, scroll, drag, and manage windows
-    through a fast client-daemon architecture. 100% native Rust.
+  <p class="tagline">non-interactive desktop control for AI agents</p>
+
+  <div class="badges" aria-label="package and runtime badges">
+    <a href="https://www.npmjs.com/package/deskctl-cli">
+      <img
+        src="https://img.shields.io/npm/v/deskctl-cli?label=npm"
+        alt="npm version badge"
+      />
+    </a>
+    <a href="https://github.com/harivansh-afk/deskctl/releases">
+      <img
+        src="https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release"
+        alt="github release badge"
+      />
+    </a>
+    <img
+      src="https://img.shields.io/badge/runtime-linux--x11-111827"
+      alt="linux x11 runtime badge"
+    />
+    <a href="https://www.npmjs.com/package/deskctl-cli">
+      <img
+        src="https://img.shields.io/badge/install-npm%20i%20-g%20deskctl--cli-111827"
+        alt="npm install command badge"
+      />
+    </a>
+  </div>
+
+  <p class="lede">
+    <code>deskctl</code> is a thin X11 control primitive for agent loops: diagnose
+    the runtime, observe the desktop, wait for state transitions, act deterministically,
+    then verify.
  </p>

-  <h2>Getting started</h2>
+  <pre><code>npm install -g deskctl-cli
+deskctl doctor
+deskctl snapshot --annotate</code></pre>
+
+  <h2>Start here</h2>

  <ul>
    <li><a href="/installation">Installation</a></li>
    <li><a href="/quick-start">Quick start</a></li>
+    <li><a href="/runtime-contract">Runtime contract</a></li>
  </ul>

  <h2>Reference</h2>
@@ -28,14 +60,27 @@ import DocLayout from "../layouts/DocLayout.astro";
    <li><a href="/architecture">Architecture</a></li>
  </ul>

+  <h2>Agent skill</h2>
+
+  <p>
+    There is also an installable skill for <code>skills.sh</code>-style agent runtimes:
+  </p>
+
+  <pre><code>npx skills add harivansh-afk/deskctl -s deskctl</code></pre>
+
  <h2>Links</h2>

  <ul>
+    <li>
+      <a href="https://www.npmjs.com/package/deskctl-cli">npm package</a>
+    </li>
    <li>
      <a href="https://github.com/harivansh-afk/deskctl">GitHub</a>
    </li>
    <li>
-      <a href="https://crates.io/crates/deskctl">crates.io</a>
+      <a href="https://github.com/harivansh-afk/deskctl/releases">
+        GitHub releases
+      </a>
    </li>
  </ul>
 </DocLayout>
--- a/site/src/pages/installation.mdx
+++ b/site/src/pages/installation.mdx
@@ -6,43 +6,68 @@ toc: true

 # Installation

-## Cargo
+## Default install

 ```sh
-cargo install deskctl
+npm install -g deskctl-cli
+deskctl --help
 ```

-## From source
+`deskctl-cli` is the default install path. It installs the `deskctl` command by
+downloading the matching GitHub Release asset for the supported runtime target.
+
+## One-shot usage
+
+```sh
+npx deskctl-cli --help
+```
+
+## Agent skill
+
+For `skills.sh`-style runtimes:
+
+```sh
+npx skills add harivansh-afk/deskctl -s deskctl
+```
+
+The repo skill lives under `skills/deskctl` and is designed around the same
+observe -> wait -> act -> verify loop as the CLI.
+
+## Other install paths
+
+### Nix
+
+```sh
+nix run github:harivansh-afk/deskctl -- --help
+nix profile install github:harivansh-afk/deskctl
+```
+
+### Build from source

 ```sh
 git clone https://github.com/harivansh-afk/deskctl
 cd deskctl
-cargo build --release
+cargo build
 ```

-## Docker (cross-compile for Linux)
+Source builds on Linux require:

-Build a static Linux binary from any platform:
+- Rust 1.75+
+- `pkg-config`
+- X11 development libraries such as `libx11-dev` and `libxtst-dev`

-```sh
-docker compose -f docker/docker-compose.yml run --rm build
-```
-
-This writes `dist/deskctl-linux-x86_64`.
-
-## Deploy to a remote machine
-
-Copy the binary over SSH when `scp` is not available:
-
-```sh
-ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
-```
-
-## Requirements
+## Runtime requirements

 - Linux with an active X11 session
-- `DISPLAY` environment variable set (e.g. `DISPLAY=:1`)
-- `XDG_SESSION_TYPE=x11`
-- A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`)
+- `DISPLAY` set to a usable X11 display, such as `DISPLAY=:1`
+- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
+- a window manager or desktop environment that exposes standard EWMH properties
+  such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`

-No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`).
+The binary itself only depends on the standard Linux glibc runtime.
+
+If setup fails, run:
+
+```sh
+deskctl doctor
+```
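The environment side of the runtime requirements listed in this file can be checked cheaply before invoking the CLI. The Python sketch below is a hypothetical pre-check, not a substitute for `deskctl doctor`: `x11_env_problems` is an invented name, and because the text allows "an equivalent X11 session environment", a non-`x11` session type is reported only as a hint.

```python
from typing import List, Mapping


def x11_env_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable findings for the documented environment variables."""
    problems: List[str] = []

    # DISPLAY must point at a usable X11 display, such as ":1".
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")

    # XDG_SESSION_TYPE=x11 is the documented value; other setups may still work.
    session_type = env.get("XDG_SESSION_TYPE", "")
    if session_type != "x11":
        problems.append(
            f"XDG_SESSION_TYPE is {session_type!r}; expected 'x11' or an equivalent X11 session"
        )
    return problems
```

An empty list only means the variables look plausible. X11 connectivity, window enumeration, and screenshot viability are still the job of `deskctl doctor`.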
--- a/site/src/pages/quick-start.mdx
+++ b/site/src/pages/quick-start.mdx
@@ -6,50 +6,72 @@ toc: true

 # Quick start

-## Core workflow
-
-The typical agent loop is: snapshot the desktop, interpret the result, act on it.
+## Install and diagnose

 ```sh
-# 1. see the desktop
-deskctl --json snapshot --annotate
+npm install -g deskctl-cli
+deskctl doctor
+```

-# 2. click a window by its ref
-deskctl click @w1
+Use `deskctl doctor` first. It checks X11 connectivity, basic enumeration,
+screenshot viability, and socket health before you start driving the desktop.

-# 3. type into the focused window
-deskctl type "hello world"
+## Observe

-# 4. press a key
+```sh
+deskctl snapshot --annotate
+deskctl list-windows
+deskctl get active-window
+deskctl get monitors
+```
+
+Use `snapshot` when you want a screenshot artifact plus window refs. Use
+`list-windows` when you only need the current window tree without writing a
+screenshot.
+
+## Target windows cleanly
+
+Prefer explicit selectors when you need deterministic targeting:
+
+```sh
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
+```
+
+Legacy refs such as `@w1` still work after `snapshot` or `list-windows`. Bare
+strings like `firefox` are fuzzy matches and now fail on ambiguity.
+
+## Wait, act, verify
+
+The core loop is:
+
+```sh
+# observe
+deskctl snapshot --annotate
+
+# wait
+deskctl wait window --selector 'title=Firefox' --timeout 10
+
+# act
+deskctl focus 'title=Firefox'
+deskctl hotkey ctrl l
+deskctl type "https://example.com"
 deskctl press enter
+
+# verify
+deskctl wait focus --selector 'title=Firefox' --timeout 5
+deskctl snapshot
 ```

-The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows.
+The wait commands return the matched window payload on success, so they compose
+cleanly into the next action.

-## Window refs
+## Use `--json` when parsing matters

-Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
-
-```sh
-deskctl click @w1
-deskctl focus @w3
-deskctl close @w2
-```
-
-You can also select windows by name (case-insensitive substring match):
-
-```sh
-deskctl focus "firefox"
-deskctl close "terminal"
-```
-
-## JSON output
-
-Pass `--json` for machine-readable output. This is the primary mode for agent integrations:
-
-```sh
-deskctl --json snapshot
-```
+Every command supports `--json` and uses the same top-level envelope:

 ```json
 {
@@ -59,7 +81,7 @@ deskctl --json snapshot
    "windows": [
      {
        "ref_id": "w1",
-        "xcb_id": 12345678,
+        "window_id": "win1",
        "title": "Firefox",
        "app_name": "firefox",
        "x": 0,
@@ -74,14 +96,8 @@ deskctl --json snapshot
 }
 ```

-## Daemon lifecycle
+Use `window_id` for stable targeting inside a live daemon session. Text output
+is compact and its exact formatting can change; JSON is the parsing contract.

-The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
-
-```sh
-# check if the daemon is running
-deskctl daemon status
-
-# stop it explicitly
-deskctl daemon stop
-```
+The full stable-vs-best-effort contract lives on the
+[runtime contract](/runtime-contract) page.
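
Note on the quick-start changes above: the new text recommends `window_id` for targeting and explicit `id=` selectors. A minimal Python sketch of that step follows, assuming the snapshot JSON has already been captured as text (for example from a `--json` snapshot call). The sample payload, its values, and the helper name `selector_for_title` are hypothetical; only the field names and the `id=` selector form come from the docs in this diff.

```python
import json

# Hypothetical sample shaped like the documented snapshot envelope;
# the values are illustrative, not captured from a real session.
SAMPLE = """
{
  "success": true,
  "data": {
    "windows": [
      {"ref_id": "w1", "window_id": "win1", "title": "Firefox", "app_name": "firefox"},
      {"ref_id": "w2", "window_id": "win2", "title": "Terminal", "app_name": "xterm"}
    ]
  },
  "error": null
}
"""


def selector_for_title(envelope_text, title):
    """Return an explicit `id=<window_id>` selector for the window with this exact title."""
    envelope = json.loads(envelope_text)
    if not envelope.get("success"):
        raise RuntimeError(f"command failed: {envelope.get('error')}")
    windows = (envelope.get("data") or {}).get("windows", [])
    matches = [w for w in windows if w.get("title") == title]
    if len(matches) != 1:
        # Refuse zero or multiple matches instead of guessing a target.
        raise LookupError(f"expected one window titled {title!r}, found {len(matches)}")
    return f"id={matches[0]['window_id']}"


print(selector_for_title(SAMPLE, "Firefox"))  # prints id=win1
```

The returned string could then be passed wherever the docs accept a selector, such as `deskctl focus`. Matching on the exact title is a choice made for this sketch, not a statement about how the CLI resolves `title=` selectors.
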
--- a/site/src/pages/runtime-contract.mdx
+++ b/site/src/pages/runtime-contract.mdx
@@ -0,0 +1,177 @@
+---
+layout: ../layouts/DocLayout.astro
+title: Runtime contract
+toc: true
+---
+
+# Runtime contract
+
+This page defines the current public output contract for `deskctl`.
+
+It is intentionally scoped to the current Linux X11 runtime surface. It does
+not promise stability for future Wayland or window-manager-specific features.
+
+## JSON envelope
+
+Every command supports `--json` and uses the same top-level envelope:
+
+```json
+{
+  "success": true,
+  "data": {},
+  "error": null
+}
+```
+
+Stable top-level fields:
+
+- `success`
+- `data`
+- `error`
+
+If `success` is `false`, the command exits non-zero in both text mode and JSON
+mode.
+
+## Stable window fields
+
+Whenever a response includes a window payload, these fields are stable:
+
+- `ref_id`
+- `window_id`
+- `title`
+- `app_name`
+- `x`
+- `y`
+- `width`
+- `height`
+- `focused`
+- `minimized`
+
+`window_id` is the public session-scoped identifier for programmatic targeting.
+`ref_id` is a short-lived convenience handle from the current ref map.
+
+## Stable grouped reads
+
+`deskctl get active-window`
+
+- stable: `data.window`
+
+`deskctl get monitors`
+
+- stable: `data.count`
+- stable: `data.monitors`
+
+Stable per-monitor fields:
+
+- `name`
+- `x`
+- `y`
+- `width`
+- `height`
+- `width_mm`
+- `height_mm`
+- `primary`
+- `automatic`
+
+`deskctl get version`
+
+- stable: `data.version`
+- stable: `data.backend`
+
+`deskctl get systeminfo`
+
+- stable: `data.backend`
+- stable: `data.display`
+- stable: `data.session_type`
+- stable: `data.session`
+- stable: `data.socket_path`
+- stable: `data.screen`
+- stable: `data.monitor_count`
+- stable: `data.monitors`
+
+## Stable waits
+
+`deskctl wait window`
+`deskctl wait focus`
+
+- stable: `data.wait`
+- stable: `data.selector`
+- stable: `data.elapsed_ms`
+- stable: `data.window`
+
+## Stable selector-driven action fields
+
+When selector-driven actions return resolved window data, these fields are
+stable when present:
+
+- `data.ref_id`
+- `data.window_id`
+- `data.title`
+- `data.selector`
+
+This applies to:
+
+- `click`
+- `dblclick`
+- `focus`
+- `close`
+- `move-window`
+- `resize-window`
+
+## Stable artifact fields
+
+For `snapshot` and `screenshot`:
+
+- stable: `data.screenshot`
+
+When a command also returns windows, `data.windows` uses the stable window
+payload documented above.
+
+## Stable structured error kinds
+
+When a command fails with structured JSON data, these error kinds are stable:
+
+- `selector_not_found`
+- `selector_ambiguous`
+- `selector_invalid`
+- `timeout`
+- `not_found`
+- `window_not_focused` in `data.last_observation.kind` or an equivalent wait
+  observation payload
+
+Stable structured failure fields include:
+
+- `data.kind`
+- `data.selector`
+- `data.mode`
+- `data.candidates`
+- `data.message`
+- `data.wait`
+- `data.timeout_ms`
+- `data.poll_ms`
+- `data.last_observation`
+
+## Best-effort fields
+
+These values are useful but environment-dependent and should not be treated as
+strict parsing guarantees:
+
+- exact monitor naming conventions
+- EWMH/window-manager-dependent ordering details
+- cosmetic text formatting in non-JSON mode
+- default screenshot file names when no explicit path was provided
+- stderr wording outside the structured `kind` classifications above
+
+## Text mode expectations
+
+Text mode is intended to stay compact and useful for follow-up commands.
+
+The exact whitespace and alignment are not stable. The stable behavioral
+expectations are:
+
+- important reads print actionable identifiers or geometry
+- selector failures print enough detail to recover without `--json`
+- artifact-producing commands print the artifact path
+- window listings print both `@wN` refs and `window_id` values
+
+If you need strict parsing, use `--json`.
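
Note on the contract page above: a caller would typically branch on `success` first and then on `data.kind`. The Python sketch below shows one way to do that under stated assumptions. The function name `next_step`, the returned action labels, the shape of each entry in `candidates`, and the type of `error` are all hypothetical; the contract only names the fields and the stable kinds.

```python
import json

# Error kinds the contract lists as stable for structured failures.
STABLE_KINDS = {
    "selector_not_found",
    "selector_ambiguous",
    "selector_invalid",
    "timeout",
    "not_found",
}


def next_step(envelope_text):
    """Map a JSON envelope to a (hypothetical) follow-up action for an agent loop."""
    envelope = json.loads(envelope_text)
    if envelope.get("success"):
        return ("ok", envelope.get("data"))
    data = envelope.get("data") or {}
    kind = data.get("kind")
    if kind == "selector_ambiguous":
        # Hand back the candidates so the caller can pick an explicit selector.
        return ("disambiguate", data.get("candidates", []))
    if kind == "timeout":
        # The last observation explains what the wait saw before giving up.
        return ("retry", data.get("last_observation"))
    if kind in STABLE_KINDS:
        return ("fail", kind)
    # Anything else is best-effort territory; do not parse stderr wording.
    return ("fail", "unknown")


# Hypothetical failure payload; candidate entries are assumed to carry window_id.
AMBIGUOUS = json.dumps({
    "success": False,
    "data": {
        "kind": "selector_ambiguous",
        "selector": "firefox",
        "candidates": [{"window_id": "win1"}, {"window_id": "win2"}],
    },
    "error": "selector is ambiguous",
})

print(next_step(AMBIGUOUS)[0])  # prints disambiguate
```

Keeping the unknown case separate from the stable kinds matches the page's split between stable and best-effort output: unrecognised kinds are reported, not interpreted.
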
--- a/site/src/styles/base.css
+++ b/site/src/styles/base.css
@@ -65,6 +65,23 @@ main {
  font-style: italic;
 }

+.lede {
+  font-size: 1.05rem;
+  max-width: 42rem;
+}
+
+.badges {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.6rem;
+  margin-bottom: 1.25rem;
+}
+
+.badges a,
+.badges img {
+  display: block;
+}
+
 header {
  display: flex;
  align-items: center;
@@ -117,6 +134,10 @@ a:hover {
  text-decoration-thickness: 2px;
 }

+img {
+  max-width: 100%;
+}
+
 ul,
 ol {
  padding-left: 1.25em;
--- a/skills/deskctl/references/commands.md
+++ b/skills/deskctl/references/commands.md
@@ -1,21 +1,22 @@
 # deskctl commands

-All commands support `--json` for machine-parseable output following the runtime contract.
+All commands support `--json` for machine-parseable output following the
+runtime contract.

 ## Observe

 ```bash
-deskctl doctor                          # check X11 runtime and daemon health
-deskctl snapshot                        # screenshot + window list
-deskctl snapshot --annotate             # screenshot with @wN labels overlaid
-deskctl list-windows                    # window list only (no screenshot)
-deskctl screenshot /tmp/screen.png      # screenshot to explicit path
-deskctl get active-window               # focused window info
-deskctl get monitors                    # monitor geometry
-deskctl get version                     # version and backend
-deskctl get systeminfo                  # full runtime diagnostics
-deskctl get-screen-size                 # screen resolution
-deskctl get-mouse-position              # cursor coordinates
+deskctl doctor
+deskctl snapshot
+deskctl snapshot --annotate
+deskctl list-windows
+deskctl screenshot /tmp/screen.png
+deskctl get active-window
+deskctl get monitors
+deskctl get version
+deskctl get systeminfo
+deskctl get-screen-size
+deskctl get-mouse-position
 ```

 ## Wait
@@ -25,19 +26,21 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
 deskctl wait focus --selector 'class=firefox' --timeout 5
 ```

-Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
+Returns the matched window payload on success. Failures include structured
+`kind` values in `--json` mode.

 ## Selectors

 ```bash
-ref=w1          # snapshot ref (short-lived, from last snapshot)
-id=win1         # stable window ID (session-scoped)
-title=Firefox   # match by window title
-class=firefox   # match by WM class
-focused         # currently focused window
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
 ```

-Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
+Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail
+on ambiguity.

 ## Act

@@ -58,12 +61,5 @@ deskctl close @w3
 deskctl launch firefox
 ```

-## Daemon
-
-```bash
-deskctl daemon start
-deskctl daemon stop
-deskctl daemon status
-```
-
-The daemon starts automatically on first command. Manual control is rarely needed.
+The daemon starts automatically on first command. In normal usage you should
+not need to manage it directly.