From cdab5e55508fca975a659d8022f8b82dc59b8210 Mon Sep 17 00:00:00 2001 From: Harivansh Rathi Date: Thu, 26 Mar 2026 08:17:07 -0400 Subject: [PATCH] align docs and contract --- README.md | 268 ++++---------------------- docs/runtime-contract.md | 168 +++------------- site/src/pages/architecture.mdx | 104 ++++++---- site/src/pages/commands.mdx | 219 ++++++++------------- site/src/pages/index.astro | 57 +++++- site/src/pages/installation.mdx | 75 ++++--- site/src/pages/quick-start.mdx | 106 +++++----- site/src/pages/runtime-contract.mdx | 177 +++++++++++++++++ site/src/styles/base.css | 21 ++ skills/deskctl/references/commands.md | 52 +++-- 10 files changed, 590 insertions(+), 657 deletions(-) create mode 100644 site/src/pages/runtime-contract.mdx diff --git a/README.md b/README.md index db7d92f..32144f0 100644 --- a/README.md +++ b/README.md @@ -1,266 +1,68 @@ # deskctl -Desktop control CLI for AI agents on Linux X11. +[![npm](https://img.shields.io/npm/v/deskctl-cli?label=npm)](https://www.npmjs.com/package/deskctl-cli) +[![release](https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release)](https://github.com/harivansh-afk/deskctl/releases) +[![runtime](https://img.shields.io/badge/runtime-linux--x11-111827)](#support-boundary) +[![skill](https://img.shields.io/badge/skills.sh-deskctl-111827)](skills/deskctl) + +Non-interactive desktop control for AI agents on Linux X11. 
## Install -### Cargo - -```bash -cargo install deskctl -``` - -Source builds on Linux require: - -- Rust 1.75+ -- `pkg-config` -- X11 development libraries for input and windowing, typically `libx11-dev` and `libxtst-dev` on Debian/Ubuntu - -### npm - ```bash npm install -g deskctl-cli -deskctl --help +deskctl doctor +deskctl snapshot --annotate ``` -One-shot execution is also supported: +One-shot execution also works: ```bash npx deskctl-cli --help ``` -`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset. +`deskctl-cli` installs the `deskctl` command by downloading the matching GitHub Release asset for the supported runtime target. -### Installable skill - -For `skills.sh` / agent skill ecosystems: +## Installable skill ```bash npx skills add harivansh-afk/deskctl -s deskctl ``` -The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo. +The installable skill lives in [`skills/deskctl`](skills/deskctl) and is built around the same observe -> wait -> act -> verify loop as the CLI. -### Nix +## Quick example + +```bash +deskctl doctor +deskctl snapshot --annotate +deskctl wait window --selector 'title=Firefox' --timeout 10 +deskctl focus 'title=Firefox' +deskctl type "hello world" +``` + +## Docs + +- runtime contract: [docs/runtime-contract.md](docs/runtime-contract.md) +- release flow: [docs/releasing.md](docs/releasing.md) +- installable skill: [skills/deskctl](skills/deskctl) +- contributor workflow: [CONTRIBUTING.md](CONTRIBUTING.md) + +## Other install paths + +Nix: ```bash nix run github:harivansh-afk/deskctl -- --help nix profile install github:harivansh-afk/deskctl ``` -The repo flake is the supported Nix install surface in this phase. 
- -### Docker Convenience - -Build a Linux binary locally with Docker: - -```bash -docker compose -f docker/docker-compose.yml run --rm build -``` - -This writes `dist/deskctl-linux-x86_64`. - -Copy it to an SSH machine where `scp` is unavailable: - -```bash -ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64 -``` - -Run it on an X11 session: - -```bash -DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate -``` - -### Local Source Build +Source build: ```bash cargo build ``` -## Quick Start +## Support boundary -```bash -# Diagnose the environment first -deskctl doctor - -# See the desktop -deskctl snapshot - -# Query focused runtime state -deskctl get active-window -deskctl get monitors - -# Click a window -deskctl click @w1 - -# Type text -deskctl type "hello world" - -# Wait for a window or focus transition -deskctl wait window --selector 'title=Firefox' --timeout 10 -deskctl wait focus --selector 'class=firefox' --timeout 5 - -# Focus by explicit selector -deskctl focus 'title=Firefox' -``` - -## Architecture - -Client-daemon architecture over Unix sockets (NDJSON wire protocol). -The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls. - -Source layout: - -- `src/lib.rs` exposes the shared library target -- `src/main.rs` is the thin CLI wrapper -- `src/` contains production code and unit tests -- `tests/` contains Linux/X11 integration tests -- `tests/support/` contains shared integration helpers - -## Runtime Requirements - -- Linux with X11 session -- Rust 1.75+ plus the source-build dependencies above when building from source - -The binary itself only links the standard glibc runtime on Linux (`libc`, `libm`, `libgcc_s`). 
- -For deskctl to be fully functional on a fresh VM you still need: - -- an X11 server and an active `DISPLAY` -- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment -- a window manager or desktop environment that exposes standard EWMH properties such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW` -- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups - -If setup fails, run: - -```bash -deskctl doctor -``` - -## Contract Notes - -- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows` -- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session -- `list-windows` is a cheap read-only operation and does not capture or write a screenshot -- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md) - -## Read and Wait Surface - -The grouped runtime reads are: - -```bash -deskctl get active-window -deskctl get monitors -deskctl get version -deskctl get systeminfo -``` - -The grouped runtime waits are: - -```bash -deskctl wait window --selector 'title=Firefox' --timeout 10 -deskctl wait focus --selector 'id=win3' --timeout 5 -``` - -Successful `get active-window`, `wait window`, and `wait focus` responses return a `window` payload with: -- `ref_id` -- `window_id` -- `title` -- `app_name` -- geometry (`x`, `y`, `width`, `height`) -- state flags (`focused`, `minimized`) - -`get monitors` returns: -- `count` -- `monitors[]` with geometry and primary/automatic flags - -`get version` returns: -- `version` -- `backend` - -`get systeminfo` stays runtime-scoped and returns: -- `backend` -- `display` -- `session_type` -- `session` -- `socket_path` -- `screen` -- `monitor_count` -- `monitors` - -Wait timeout and selector failures are structured in `--json` mode so agents can recover without string parsing. 
- -## Output Policy - -Text mode is compact and follow-up-oriented, but JSON is the parsing contract. - -- use `--json` when an agent needs strict parsing -- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation -- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort - -See [docs/runtime-conract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown. - -## Distribution - -- GitHub Releases are the canonical binary source -- crates.io package: `deskctl` -- npm package: `deskctl-cli` -- installed command on every channel: `deskctl` -- repo-owned Nix install path: `flake.nix` - -For maintainer publishing and release steps, see [docs/releasing.md](docs/releasing.md). - -## Selector Contract - -Explicit selector modes: - -```bash -ref=w1 -id=win1 -title=Firefox -class=firefox -focused -``` - -Legacy refs remain supported: - -```bash -@w1 -w1 -win1 -``` - -Bare selectors such as `firefox` are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match. - -## Support Boundary - -`deskctl` supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract. - -## Workflow - -Local validation uses the root `Makefile`: - -```bash -make fmt-check -make lint -make test-unit -make test-integration -make site-format-check -make validate -``` - -`make validate` is the full repo-quality check and requires Linux with `xvfb-run` plus `pnpm --dir site install`. - -The repository standardizes on `pre-commit` for fast commit-time checks: - -```bash -pre-commit install -pre-commit run --all-files -``` - -See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contributor guide. 
- -## Acknowledgements - -- [@barrettruth](github.com/barrettruth) - i stole the website from [vimdoc](https://github.com/barrettruth/vimdoc-language-server) +`deskctl` currently supports Linux X11. Use `--json` for stable machine parsing, use `window_id` for programmatic targeting inside a live session, and use `deskctl doctor` first when the runtime looks broken. diff --git a/docs/runtime-contract.md b/docs/runtime-contract.md index 7312357..0316c06 100644 --- a/docs/runtime-contract.md +++ b/docs/runtime-contract.md @@ -1,19 +1,6 @@ -# Runtime Output Contract +# deskctl runtime contract -This document defines the current output contract for `deskctl`. - -It is intentionally scoped to the current Linux X11 runtime surface. -It does not promise stability for future Wayland or window-manager-specific features. - -## Goals - -- Keep `deskctl` fully non-interactive -- Make text output actionable for quick terminal and agent loops -- Make `--json` safe for agent consumption without depending on incidental formatting - -## JSON Envelope - -Every runtime command uses the same top-level JSON envelope: +All commands support `--json` and use the same top-level envelope: ```json { @@ -23,22 +10,11 @@ Every runtime command uses the same top-level JSON envelope: } ``` -Stable top-level fields: +Use `--json` whenever you need to parse output programmatically. -- `success` -- `data` -- `error` +## Stable window fields -`success` is always the authoritative success/failure bit. -When `success` is `false`, the CLI exits non-zero in both text mode and `--json` mode. - -## Stable Fields - -These fields are stable for agent consumption in the current Phase 1 runtime contract. 
- -### Window Identity - -Whenever a runtime response includes a window payload, these fields are stable: +Whenever a response includes a window payload, these fields are stable: - `ref_id` - `window_id` @@ -51,128 +27,46 @@ Whenever a runtime response includes a window payload, these fields are stable: - `focused` - `minimized` -`window_id` is the stable public identifier for a live daemon session. -`ref_id` is a short-lived convenience handle for the current window snapshot/ref map. +Use `window_id` for stable targeting inside a live daemon session. Use +`ref_id` or `@wN` for short-lived follow-up actions after `snapshot` or +`list-windows`. -### Grouped Reads +## Stable grouped reads -`deskctl get active-window` +- `deskctl get active-window` -> `data.window` +- `deskctl get monitors` -> `data.count`, `data.monitors` +- `deskctl get version` -> `data.version`, `data.backend` +- `deskctl get systeminfo` -> runtime-scoped diagnostic fields such as + `backend`, `display`, `session_type`, `session`, `socket_path`, `screen`, + `monitor_count`, and `monitors` -- stable: `data.window` +## Stable waits -`deskctl get monitors` +- `deskctl wait window` -> `data.wait`, `data.selector`, `data.elapsed_ms`, + `data.window` +- `deskctl wait focus` -> `data.wait`, `data.selector`, `data.elapsed_ms`, + `data.window` -- stable: `data.count` -- stable: `data.monitors` -- stable per monitor: - - `name` - - `x` - - `y` - - `width` - - `height` - - `width_mm` - - `height_mm` - - `primary` - - `automatic` +## Stable structured error kinds -`deskctl get version` - -- stable: `data.version` -- stable: `data.backend` - -`deskctl get systeminfo` - -- stable: `data.backend` -- stable: `data.display` -- stable: `data.session_type` -- stable: `data.session` -- stable: `data.socket_path` -- stable: `data.screen` -- stable: `data.monitor_count` -- stable: `data.monitors` - -### Waits - -`deskctl wait window` -`deskctl wait focus` - -- stable: `data.wait` -- stable: `data.selector` -- stable: 
`data.elapsed_ms` -- stable: `data.window` - -### Selector-Driven Action Success - -For selector-driven action commands that resolve a window target, these identifiers are stable when present: - -- `data.ref_id` -- `data.window_id` -- `data.title` -- `data.selector` - -This applies to: - -- `click` -- `dblclick` -- `focus` -- `close` -- `move-window` -- `resize-window` - -The exact human-readable text rendering of those commands is not part of the JSON contract. - -### Artifact-Producing Commands - -`snapshot` -`screenshot` - -- stable: `data.screenshot` - -When the command also returns windows, `data.windows` uses the stable window payload documented above. - -## Stable Structured Error Kinds - -When a runtime command returns structured JSON failure data, these error kinds are stable: +When a command fails with structured JSON data, these `kind` values are stable: - `selector_not_found` - `selector_ambiguous` - `selector_invalid` - `timeout` - `not_found` -- `window_not_focused` as `data.last_observation.kind` or equivalent observation payload -Stable structured failure fields include: +Wait failures may also include `window_not_focused` in the last observation +payload. 
-- `data.kind` -- `data.selector` when selector-related -- `data.mode` when selector-related -- `data.candidates` for ambiguous selector failures -- `data.message` for invalid selector failures -- `data.wait` -- `data.timeout_ms` -- `data.poll_ms` -- `data.last_observation` +## Best-effort fields -## Best-Effort Fields +Treat these as useful but non-contractual: -These values are useful but environment-dependent and should be treated as best-effort: +- exact monitor names +- incidental text formatting in non-JSON mode +- default screenshot file names when no explicit path was provided +- environment-dependent ordering details from the window manager -- exact monitor naming conventions -- EWMH/window-manager-dependent window ordering details -- cosmetic text formatting in non-JSON mode -- screenshot file names when the caller did not provide an explicit path -- command stderr wording outside the structured `kind` classifications above - -## Text Mode Expectations - -Text mode is intended to stay compact and follow-up-useful. - -The exact whitespace/alignment of text output is not stable. -The following expectations are stable at the behavioral level: - -- important runtime reads print actionable identifiers or geometry -- selector failures print enough detail to recover without `--json` -- artifact-producing commands print the artifact path -- window listings print both `@wN` refs and `window_id` values - -If an agent needs strict parsing, it should use `--json`. +For the full repo copy, see `docs/runtime-contract.md`. diff --git a/site/src/pages/architecture.mdx b/site/src/pages/architecture.mdx index 87b2b4e..9478246 100644 --- a/site/src/pages/architecture.mdx +++ b/site/src/pages/architecture.mdx @@ -6,73 +6,93 @@ toc: true # Architecture -## Client-daemon model +## Public model -deskctl uses a client-daemon architecture over Unix sockets. 
The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead. +`deskctl` is a thin, non-interactive X11 control primitive for agent loops. +The public flow is: -Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits. +- diagnose with `deskctl doctor` +- observe with `snapshot`, `list-windows`, and grouped `get` commands +- wait with grouped `wait` commands instead of shell `sleep` +- act with explicit selectors or coordinates +- verify with another read or snapshot -## Wire protocol +The tool stays intentionally narrow. It does not try to be a full desktop shell +or a speculative Wayland abstraction. + +## Client-daemon architecture + +The CLI talks to an auto-managed daemon over a Unix socket. The daemon keeps +the X11 connection alive so repeated commands stay fast and share the same +session-scoped window identity map. + +Each CLI invocation sends one request, reads one response, and exits. + +## Runtime contract Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket. -**Request:** +All commands share the same JSON envelope: ```json -{ "id": "r123456", "action": "snapshot", "annotate": true } +{ + "success": true, + "data": {}, + "error": null +} ``` -**Response:** +For window payloads, the public identity is `window_id`, not an X11 handle. +That keeps the contract backend-neutral even though the current support +boundary is X11-only. -```json -{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}} -``` +The complete stable-vs-best-effort policy lives on the +[runtime contract](/runtime-contract) page. -Error responses include an `error` field: +## Sessions and sockets -```json -{ "success": false, "error": "window not found: @w99" } -``` +Each session gets its own socket path, PID file, and live window mapping. 
-## Socket location +Public socket resolution order: -The daemon socket is resolved in this order: - -1. `--socket` flag (highest priority) -2. `$DESKCTL_SOCKET_DIR/{session}.sock` -3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock` +1. `--socket` +2. `DESKCTL_SOCKET_DIR/{session}.sock` +3. `XDG_RUNTIME_DIR/deskctl/{session}.sock` 4. `~/.deskctl/{session}.sock` -PID files are stored alongside the socket. +Most users should let `deskctl` manage this automatically. `--session` is the +main public knob when you need isolated daemon instances. -## Sessions +## Diagnostics and failure handling -Multiple isolated daemon instances can run simultaneously using the `--session` flag: +`deskctl doctor` runs before daemon startup and checks: -```sh -deskctl --session workspace1 snapshot -deskctl --session workspace2 snapshot -``` +- display/session setup +- X11 connectivity +- basic window enumeration +- screenshot viability +- socket directory and stale-socket health -Each session has its own socket, PID file, and window ref map. +Selector and wait failures are structured in `--json` mode so clients can +recover without scraping text. -## Backend design +## Backend notes -The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation. +The backend is built around a `DesktopBackend` trait and currently ships with +an X11 implementation backed by `x11rb`. -The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code. +The important public guarantee is not "portable desktop automation." The +important guarantee is "a correct and unsurprising Linux X11 runtime contract." -## X11 integration +## X11 support boundary -Window detection uses EWMH properties: +This phase supports Linux X11 only. 
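The public socket resolution order above is first-match-wins, which a small Python function can illustrate. `resolve_socket_path` is a hypothetical helper, not part of deskctl. The comments mark two further assumptions. First, an empty environment variable is treated as unset. Second, `DESKCTL_SOCKET`, which the global options table lists as the environment equivalent of `--socket`, is folded into step 1.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_socket_path(
    session: str = "default",
    socket_flag: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical first-match-wins resolver mirroring the documented order."""
    # 1. --socket wins. Assumption: DESKCTL_SOCKET is its env equivalent.
    explicit = socket_flag or env.get("DESKCTL_SOCKET")
    if explicit:
        return Path(explicit)
    # 2. $DESKCTL_SOCKET_DIR/{session}.sock (assumption: empty means unset)
    if env.get("DESKCTL_SOCKET_DIR"):
        return Path(env["DESKCTL_SOCKET_DIR"]) / f"{session}.sock"
    # 3. $XDG_RUNTIME_DIR/deskctl/{session}.sock
    if env.get("XDG_RUNTIME_DIR"):
        return Path(env["XDG_RUNTIME_DIR"]) / "deskctl" / f"{session}.sock"
    # 4. ~/.deskctl/{session}.sock
    base = home if home is not None else Path.home()
    return base / ".deskctl" / f"{session}.sock"
```

Each session name maps to its own socket file in whichever directory wins. That is what lets `--session` run isolated daemons side by side.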
-| Property | Purpose | -| --------------------------- | ------------------------ | -| `_NET_CLIENT_LIST_STACKING` | Window stacking order | -| `_NET_ACTIVE_WINDOW` | Currently focused window | -| `_NET_WM_NAME` | Window title (UTF-8) | -| `_NET_WM_STATE_HIDDEN` | Minimized state | -| `_NET_CLOSE_WINDOW` | Graceful close | -| `WM_CLASS` | Application class/name | +That means: -Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable. +- EWMH/window-manager properties matter +- monitor naming and some ordering details are best-effort +- Wayland and Hyprland are out of scope for the current contract + +The runtime documents those boundaries explicitly instead of pretending the +surface is broader than it is. diff --git a/site/src/pages/commands.mdx b/site/src/pages/commands.mdx index e1fc509..8a5132b 100644 --- a/site/src/pages/commands.mdx +++ b/site/src/pages/commands.mdx @@ -6,167 +6,101 @@ toc: true # Commands -## Snapshot - -Capture a screenshot and get the window tree: +## Observe ```sh +deskctl doctor deskctl snapshot deskctl snapshot --annotate -``` - -With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped. - -The screenshot is saved to `/tmp/deskctl-{timestamp}.png`. - -## Click - -Click the center of a window by ref, or click exact coordinates: - -```sh -deskctl click @w1 -deskctl click 960,540 -``` - -## Double click - -```sh -deskctl dblclick @w1 -deskctl dblclick 500,300 -``` - -## Type - -Type a string into the focused window: - -```sh -deskctl type "hello world" -``` - -## Press - -Press a single key: - -```sh -deskctl press enter -deskctl press tab -deskctl press escape -``` - -Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character. - -## Hotkey - -Send a key combination. 
List modifier keys first, then the target key: - -```sh -deskctl hotkey ctrl c -deskctl hotkey ctrl shift t -deskctl hotkey alt f4 -``` - -Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`). - -## Mouse move - -Move the cursor to absolute coordinates: - -```sh -deskctl mouse move 100 200 -``` - -## Mouse scroll - -Scroll the mouse wheel. Positive values scroll down, negative scroll up: - -```sh -deskctl mouse scroll 3 -deskctl mouse scroll -5 -deskctl mouse scroll 3 --axis horizontal -``` - -## Mouse drag - -Drag from one position to another: - -```sh -deskctl mouse drag 100 200 500 600 -``` - -## Focus - -Focus a window by ref or by name (case-insensitive substring match): - -```sh -deskctl focus @w1 -deskctl focus "firefox" -``` - -## Close - -Close a window gracefully: - -```sh -deskctl close @w2 -deskctl close "terminal" -``` - -## Move window - -Move a window to an absolute position: - -```sh -deskctl move-window @w1 0 0 -deskctl move-window "firefox" 100 100 -``` - -## Resize window - -Resize a window: - -```sh -deskctl resize-window @w1 1280 720 -``` - -## List windows - -List all windows without taking a screenshot: - -```sh deskctl list-windows -``` - -## Get screen size - -```sh +deskctl screenshot +deskctl screenshot /tmp/screen.png +deskctl get active-window +deskctl get monitors +deskctl get version +deskctl get systeminfo deskctl get-screen-size -``` - -## Get mouse position - -```sh deskctl get-mouse-position ``` -## Screenshot +`doctor` checks the runtime before daemon startup. `snapshot` produces a +screenshot plus window refs. `list-windows` is the same window tree without the +side effect of writing a screenshot. -Take a screenshot without the window tree. 
Optionally specify a save path: +## Wait ```sh -deskctl screenshot -deskctl screenshot /tmp/my-screenshot.png -deskctl screenshot --annotate +deskctl wait window --selector 'title=Firefox' --timeout 10 +deskctl wait focus --selector 'id=win3' --timeout 5 +deskctl --json wait window --selector 'class=firefox' --poll-ms 100 ``` -## Launch +Wait commands return the matched window payload on success. In `--json` mode, +timeouts and selector failures expose structured `kind` values. -Launch an application: +## Act on a window ```sh deskctl launch firefox -deskctl launch code --args /path/to/project +deskctl focus @w1 +deskctl focus 'title=Firefox' +deskctl click @w1 +deskctl click 960,540 +deskctl dblclick @w2 +deskctl close @w3 +deskctl move-window @w1 100 120 +deskctl resize-window @w1 1280 720 ``` +Selector-driven actions accept refs, explicit selector modes, or absolute +coordinates where appropriate. + +## Input and mouse + +```sh +deskctl type "hello world" +deskctl press enter +deskctl hotkey ctrl shift t +deskctl mouse move 100 200 +deskctl mouse scroll 3 +deskctl mouse scroll 3 --axis horizontal +deskctl mouse drag 100 200 500 600 +``` + +Supported key names include `enter`, `tab`, `escape`, `backspace`, `delete`, +`space`, arrow keys, paging keys, `f1` through `f12`, and any single +character. + +## Launch + +```sh +deskctl launch firefox +deskctl launch code -- --new-window +``` + +## Selectors + +Prefer explicit selectors when the target matters: + +```sh +ref=w1 +id=win1 +title=Firefox +class=firefox +focused +``` + +Legacy shorthand is still supported: + +```sh +@w1 +w1 +win1 +``` + +Bare strings like `firefox` are fuzzy matches. They resolve when there is one +match and fail with candidate windows when there are multiple matches. 
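The selector grammar above has four forms: explicit `mode=value` selectors, the bare `focused` keyword, legacy refs, and a fuzzy fallback. The Python sketch below models that classification and the fail-on-ambiguity rule against a list of window payloads. `parse_selector` and `resolve_selector` are hypothetical names. The docs do not pin down the matching rules, so these are assumptions:

- `title=` is a case-insensitive substring match.
- `class=` is a case-insensitive exact match against `app_name`.
- Unknown modes yield `selector_invalid`.
- Ambiguity is checked for every mode.

Candidates are reduced to `window_id` values for brevity.

```python
import re
from typing import Optional

EXPLICIT_MODES = ("ref", "id", "title", "class")


def parse_selector(raw: str) -> tuple[str, Optional[str]]:
    """Classify a selector string into (mode, value). Hypothetical sketch."""
    if raw == "focused":
        return ("focused", None)
    mode, sep, value = raw.partition("=")
    if sep:
        # Explicit form: only the documented modes, with a non-empty value.
        if mode not in EXPLICIT_MODES or not value:
            raise ValueError(f"invalid selector: {raw!r}")
        return (mode, value)
    # Legacy shorthand: @w1 and w1 are refs, win1 is a window id.
    if re.fullmatch(r"@?w\d+", raw):
        return ("ref", raw.lstrip("@"))
    if re.fullmatch(r"win\d+", raw):
        return ("id", raw)
    # Anything else is a bare fuzzy match.
    return ("fuzzy", raw)


def resolve_selector(raw: str, windows: list[dict]) -> dict:
    """Resolve a selector to exactly one window, or fail with a stable kind."""
    try:
        mode, value = parse_selector(raw)
    except ValueError as exc:
        return {"success": False, "error": str(exc),
                "data": {"kind": "selector_invalid", "selector": raw, "message": str(exc)}}

    def matches(w: dict) -> bool:
        if mode == "focused":
            return bool(w.get("focused"))
        if mode == "ref":
            return w["ref_id"] == value
        if mode == "id":
            return w["window_id"] == value
        needle = value.lower()
        if mode == "title":  # assumed: case-insensitive substring
            return needle in w["title"].lower()
        if mode == "class":  # assumed: case-insensitive exact match on app_name
            return needle == w["app_name"].lower()
        # fuzzy: case-insensitive substring of title or app name
        return needle in w["title"].lower() or needle in w["app_name"].lower()

    found = [w for w in windows if matches(w)]
    if not found:
        return {"success": False, "error": f"no window matches {raw!r}",
                "data": {"kind": "selector_not_found", "selector": raw, "mode": mode}}
    if len(found) > 1:
        # Never silently pick the first match; report the candidates instead.
        return {"success": False, "error": f"{raw!r} matches {len(found)} windows",
                "data": {"kind": "selector_ambiguous", "selector": raw, "mode": mode,
                         "candidates": [w["window_id"] for w in found]}}
    return {"success": True, "error": None, "data": {"selector": raw, "window": found[0]}}
```

The error `kind` values (`selector_invalid`, `selector_not_found`, `selector_ambiguous`) are the stable ones from the runtime contract. Everything else about the returned dicts is illustrative.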
+ ## Global options | Flag | Env | Description | @@ -174,3 +108,6 @@ deskctl launch code --args /path/to/project | `--json` | | Output as JSON | | `--socket ` | `DESKCTL_SOCKET` | Path to daemon Unix socket | | `--session ` | | Session name for multiple daemons (default: `default`) | + +`deskctl` manages the daemon automatically. Most users never need to think +about it beyond `--session` and `--socket`. diff --git a/site/src/pages/index.astro b/site/src/pages/index.astro index 9327dc5..4263549 100644 --- a/site/src/pages/index.astro +++ b/site/src/pages/index.astro @@ -8,17 +8,49 @@ import DocLayout from "../layouts/DocLayout.astro"; -

- Desktop control CLI for AI agents on Linux X11. Compact JSON output for - agent loops. Screenshot, click, type, scroll, drag, and manage windows - through a fast client-daemon architecture. 100% native Rust. +

non-interactive desktop control for AI agents

+ + + +

+ deskctl is a thin X11 control primitive for agent loops: diagnose + the runtime, observe the desktop, wait for state transitions, act deterministically, + then verify.

-

Getting started

+
npm install -g deskctl-cli
+deskctl doctor
+deskctl snapshot --annotate
+ +

Start here

Reference

@@ -28,14 +60,27 @@ import DocLayout from "../layouts/DocLayout.astro";
  • Architecture
  • +

    Agent skill

    + +

+ There is also an installable skill for <code>skills.sh</code>-style agent runtimes:

    + +
    npx skills add harivansh-afk/deskctl -s deskctl
    +

    Links

    diff --git a/site/src/pages/installation.mdx b/site/src/pages/installation.mdx index e05772d..985cf99 100644 --- a/site/src/pages/installation.mdx +++ b/site/src/pages/installation.mdx @@ -6,43 +6,68 @@ toc: true # Installation -## Cargo +## Default install ```sh -cargo install deskctl +npm install -g deskctl-cli +deskctl --help ``` -## From source +`deskctl-cli` is the default install path. It installs the `deskctl` command by +downloading the matching GitHub Release asset for the supported runtime target. + +## One-shot usage + +```sh +npx deskctl-cli --help +``` + +## Agent skill + +For `skills.sh`-style runtimes: + +```sh +npx skills add harivansh-afk/deskctl -s deskctl +``` + +The repo skill lives under `skills/deskctl` and is designed around the same +observe -> wait -> act -> verify loop as the CLI. + +## Other install paths + +### Nix + +```sh +nix run github:harivansh-afk/deskctl -- --help +nix profile install github:harivansh-afk/deskctl +``` + +### Build from source ```sh git clone https://github.com/harivansh-afk/deskctl cd deskctl -cargo build --release +cargo build ``` -## Docker (cross-compile for Linux) +Source builds on Linux require: -Build a static Linux binary from any platform: +- Rust 1.75+ +- `pkg-config` +- X11 development libraries such as `libx11-dev` and `libxtst-dev` -```sh -docker compose -f docker/docker-compose.yml run --rm build -``` - -This writes `dist/deskctl-linux-x86_64`. - -## Deploy to a remote machine - -Copy the binary over SSH when `scp` is not available: - -```sh -ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64 -``` - -## Requirements +## Runtime requirements - Linux with an active X11 session -- `DISPLAY` environment variable set (e.g. 
`DISPLAY=:1`) -- `XDG_SESSION_TYPE=x11` -- A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`) +- `DISPLAY` set to a usable X11 display, such as `DISPLAY=:1` +- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment +- a window manager or desktop environment that exposes standard EWMH properties + such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW` -No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`). +The binary itself only depends on the standard Linux glibc runtime. + +If setup fails, run: + +```sh +deskctl doctor +``` diff --git a/site/src/pages/quick-start.mdx b/site/src/pages/quick-start.mdx index 7f3bc07..c783b9e 100644 --- a/site/src/pages/quick-start.mdx +++ b/site/src/pages/quick-start.mdx @@ -6,50 +6,72 @@ toc: true # Quick start -## Core workflow - -The typical agent loop is: snapshot the desktop, interpret the result, act on it. +## Install and diagnose ```sh -# 1. see the desktop -deskctl --json snapshot --annotate +npm install -g deskctl-cli +deskctl doctor +``` -# 2. click a window by its ref -deskctl click @w1 +Use `deskctl doctor` first. It checks X11 connectivity, basic enumeration, +screenshot viability, and socket health before you start driving the desktop. -# 3. type into the focused window -deskctl type "hello world" +## Observe -# 4. press a key +```sh +deskctl snapshot --annotate +deskctl list-windows +deskctl get active-window +deskctl get monitors +``` + +Use `snapshot` when you want a screenshot artifact plus window refs. Use +`list-windows` when you only need the current window tree without writing a +screenshot. + +## Target windows cleanly + +Prefer explicit selectors when you need deterministic targeting: + +```sh +ref=w1 +id=win1 +title=Firefox +class=firefox +focused +``` + +Legacy refs such as `@w1` still work after `snapshot` or `list-windows`. Bare +strings like `firefox` are fuzzy matches and now fail on ambiguity. 
+ +## Wait, act, verify + +The core loop is: + +```sh +# observe +deskctl snapshot --annotate + +# wait +deskctl wait window --selector 'title=Firefox' --timeout 10 + +# act +deskctl focus 'title=Firefox' +deskctl hotkey ctrl l +deskctl type "https://example.com" deskctl press enter + +# verify +deskctl wait focus --selector 'title=Firefox' --timeout 5 +deskctl snapshot ``` -The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows. +The wait commands return the matched window payload on success, so they compose +cleanly into the next action. -## Window refs +## Use `--json` when parsing matters -Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected: - -```sh -deskctl click @w1 -deskctl focus @w3 -deskctl close @w2 -``` - -You can also select windows by name (case-insensitive substring match): - -```sh -deskctl focus "firefox" -deskctl close "terminal" -``` - -## JSON output - -Pass `--json` for machine-readable output. This is the primary mode for agent integrations: - -```sh -deskctl --json snapshot -``` +Every command supports `--json` and uses the same top-level envelope: ```json { @@ -59,7 +81,7 @@ deskctl --json snapshot "windows": [ { "ref_id": "w1", - "xcb_id": 12345678, + "window_id": "win1", "title": "Firefox", "app_name": "firefox", "x": 0, @@ -74,14 +96,8 @@ deskctl --json snapshot } ``` -## Daemon lifecycle +Use `window_id` for stable targeting inside a live daemon session. The exact +text formatting is intentionally compact, but JSON is the parsing contract. -The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually. 
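The observe -> wait -> act -> verify loop from the quick start translates directly into a small Python driver around `deskctl --json`. This is a sketch under several assumptions:

- `run_json` and `focus_and_type` are hypothetical helpers.
- The `runner` parameter exists only so the loop can be exercised without a live X11 session.
- `list-windows` stands in for the observe step because it does not write a screenshot.
- The JSON envelope is printed on stdout even when a command fails.

```python
import json
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Real invocation; requires `deskctl` on PATH and a live X11 session.
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def run_json(args: Sequence[str], runner: Runner = default_runner) -> dict:
    """Run `deskctl --json <args>` and return `data`, raising on failure."""
    proc = runner(["deskctl", "--json", *args])
    # Assumption: the JSON envelope is printed on stdout even on failure.
    envelope = json.loads(proc.stdout)
    if not envelope["success"]:
        raise RuntimeError(f"deskctl {' '.join(args)} failed: {envelope.get('error')}")
    return envelope["data"]


def focus_and_type(title: str, text: str, runner: Runner = default_runner) -> str:
    """Observe, wait, act, verify; returns the target's window_id."""
    selector = f"title={title}"
    run_json(["list-windows"], runner)  # observe without writing a screenshot
    waited = run_json(["wait", "window", "--selector", selector, "--timeout", "10"], runner)
    run_json(["focus", selector], runner)  # act
    run_json(["type", text], runner)
    run_json(["wait", "focus", "--selector", selector, "--timeout", "5"], runner)  # verify
    return waited["window"]["window_id"]
```

With the default runner the helper shells out to the real binary. Any `success: false` envelope surfaces as an exception, which matches the contract's rule that a failed command also exits non-zero.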
-
-```sh
-# check if the daemon is running
-deskctl daemon status
-
-# stop it explicitly
-deskctl daemon stop
-```
+The full stable-vs-best-effort contract lives on the
+[runtime contract](/runtime-contract) page.
diff --git a/site/src/pages/runtime-contract.mdx b/site/src/pages/runtime-contract.mdx
new file mode 100644
index 0000000..4fca14c
--- /dev/null
+++ b/site/src/pages/runtime-contract.mdx
@@ -0,0 +1,177 @@
+---
+layout: ../layouts/DocLayout.astro
+title: Runtime contract
+toc: true
+---
+
+# Runtime contract
+
+This page defines the current public output contract for `deskctl`.
+
+It is intentionally scoped to the current Linux X11 runtime surface. It does
+not promise stability for future Wayland or window-manager-specific features.
+
+## JSON envelope
+
+Every command supports `--json` and uses the same top-level envelope:
+
+```json
+{
+  "success": true,
+  "data": {},
+  "error": null
+}
+```
+
+Stable top-level fields:
+
+- `success`
+- `data`
+- `error`
+
+If `success` is `false`, the command exits non-zero in both text mode and JSON
+mode.
+
+## Stable window fields
+
+Whenever a response includes a window payload, these fields are stable:
+
+- `ref_id`
+- `window_id`
+- `title`
+- `app_name`
+- `x`
+- `y`
+- `width`
+- `height`
+- `focused`
+- `minimized`
+
+`window_id` is the public session-scoped identifier for programmatic targeting.
+`ref_id` is a short-lived convenience handle from the current ref map.
+
+## Stable grouped reads
+
+`deskctl get active-window`
+
+- stable: `data.window`
+
+`deskctl get monitors`
+
+- stable: `data.count`
+- stable: `data.monitors`
+
+Stable per-monitor fields:
+
+- `name`
+- `x`
+- `y`
+- `width`
+- `height`
+- `width_mm`
+- `height_mm`
+- `primary`
+- `automatic`
+
+`deskctl get version`
+
+- stable: `data.version`
+- stable: `data.backend`
+
+`deskctl get systeminfo`
+
+- stable: `data.backend`
+- stable: `data.display`
+- stable: `data.session_type`
+- stable: `data.session`
+- stable: `data.socket_path`
+- stable: `data.screen`
+- stable: `data.monitor_count`
+- stable: `data.monitors`
+
+## Stable waits
+
+`deskctl wait window`
+`deskctl wait focus`
+
+- stable: `data.wait`
+- stable: `data.selector`
+- stable: `data.elapsed_ms`
+- stable: `data.window`
+
+## Stable selector-driven action fields
+
+When selector-driven actions return resolved window data, these fields are
+stable when present:
+
+- `data.ref_id`
+- `data.window_id`
+- `data.title`
+- `data.selector`
+
+This applies to:
+
+- `click`
+- `dblclick`
+- `focus`
+- `close`
+- `move-window`
+- `resize-window`
+
+## Stable artifact fields
+
+For `snapshot` and `screenshot`:
+
+- stable: `data.screenshot`
+
+When a command also returns windows, `data.windows` uses the stable window
+payload documented above.
+
+## Stable structured error kinds
+
+When a command fails with structured JSON data, these error kinds are stable:
+
+- `selector_not_found`
+- `selector_ambiguous`
+- `selector_invalid`
+- `timeout`
+- `not_found`
+- `window_not_focused` in `data.last_observation.kind` or an equivalent wait
+  observation payload
+
+Stable structured failure fields include:
+
+- `data.kind`
+- `data.selector`
+- `data.mode`
+- `data.candidates`
+- `data.message`
+- `data.wait`
+- `data.timeout_ms`
+- `data.poll_ms`
+- `data.last_observation`
+
+## Best-effort fields
+
+These values are useful but environment-dependent and should not be treated as
+strict parsing guarantees:
+
+- exact monitor naming conventions
+- EWMH/window-manager-dependent ordering details
+- cosmetic text formatting in non-JSON mode
+- default screenshot file names when no explicit path was provided
+- stderr wording outside the structured `kind` classifications above
+
+## Text mode expectations
+
+Text mode is intended to stay compact and follow-up-useful.
+
+The exact whitespace and alignment are not stable. The stable behavioral
+expectations are:
+
+- important reads print actionable identifiers or geometry
+- selector failures print enough detail to recover without `--json`
+- artifact-producing commands print the artifact path
+- window listings print both `@wN` refs and `window_id` values
+
+If you need strict parsing, use `--json`.
diff --git a/site/src/styles/base.css b/site/src/styles/base.css
index 86fd6a8..f60c0e6 100644
--- a/site/src/styles/base.css
+++ b/site/src/styles/base.css
@@ -65,6 +65,23 @@ main {
   font-style: italic;
 }
 
+.lede {
+  font-size: 1.05rem;
+  max-width: 42rem;
+}
+
+.badges {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.6rem;
+  margin-bottom: 1.25rem;
+}
+
+.badges a,
+.badges img {
+  display: block;
+}
+
 header {
   display: flex;
   align-items: center;
@@ -117,6 +134,10 @@ a:hover {
   text-decoration-thickness: 2px;
 }
 
+img {
+  max-width: 100%;
+}
+
 ul,
 ol {
   padding-left: 1.25em;
diff --git a/skills/deskctl/references/commands.md b/skills/deskctl/references/commands.md
index d0e7c9f..77b9513 100644
--- a/skills/deskctl/references/commands.md
+++ b/skills/deskctl/references/commands.md
@@ -1,21 +1,22 @@
 # deskctl commands
 
-All commands support `--json` for machine-parseable output following the runtime contract.
+All commands support `--json` for machine-parseable output following the
+runtime contract.
 
 ## Observe
 
 ```bash
-deskctl doctor # check X11 runtime and daemon health
-deskctl snapshot # screenshot + window list
-deskctl snapshot --annotate # screenshot with @wN labels overlaid
-deskctl list-windows # window list only (no screenshot)
-deskctl screenshot /tmp/screen.png # screenshot to explicit path
-deskctl get active-window # focused window info
-deskctl get monitors # monitor geometry
-deskctl get version # version and backend
-deskctl get systeminfo # full runtime diagnostics
-deskctl get-screen-size # screen resolution
-deskctl get-mouse-position # cursor coordinates
+deskctl doctor
+deskctl snapshot
+deskctl snapshot --annotate
+deskctl list-windows
+deskctl screenshot /tmp/screen.png
+deskctl get active-window
+deskctl get monitors
+deskctl get version
+deskctl get systeminfo
+deskctl get-screen-size
+deskctl get-mouse-position
 ```
 
 ## Wait
@@ -25,19 +26,21 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
 deskctl wait focus --selector 'class=firefox' --timeout 5
 ```
 
-Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
+Returns the matched window payload on success. Failures include structured
+`kind` values in `--json` mode.
 
 ## Selectors
 
 ```bash
-ref=w1 # snapshot ref (short-lived, from last snapshot)
-id=win1 # stable window ID (session-scoped)
-title=Firefox # match by window title
-class=firefox # match by WM class
-focused # currently focused window
+ref=w1
+id=win1
+title=Firefox
+class=firefox
+focused
 ```
 
-Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
+Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail
+on ambiguity.
 
 ## Act
 
@@ -58,12 +61,5 @@ deskctl close @w3
 deskctl launch firefox
 ```
 
-## Daemon
-
-```bash
-deskctl daemon start
-deskctl daemon stop
-deskctl daemon status
-```
-
-The daemon starts automatically on first command. Manual control is rarely needed.
+The daemon starts automatically on first command. In normal usage you should
+not need to manage it directly.