align docs and contract

Harivansh Rathi 2026-03-26 08:17:07 -04:00
parent c37589ccf4
commit cdab5e5550
10 changed files with 590 additions and 657 deletions

268
README.md

@@ -1,266 +1,68 @@
# deskctl
Desktop control CLI for AI agents on Linux X11.
[![npm](https://img.shields.io/npm/v/deskctl-cli?label=npm)](https://www.npmjs.com/package/deskctl-cli)
[![release](https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release)](https://github.com/harivansh-afk/deskctl/releases)
[![runtime](https://img.shields.io/badge/runtime-linux--x11-111827)](#support-boundary)
[![skill](https://img.shields.io/badge/skills.sh-deskctl-111827)](skills/deskctl)
Non-interactive desktop control for AI agents on Linux X11.
## Install
### Cargo
```bash
cargo install deskctl
```
Source builds on Linux require:
- Rust 1.75+
- `pkg-config`
- X11 development libraries for input and windowing, typically `libx11-dev` and `libxtst-dev` on Debian/Ubuntu
### npm
```bash
npm install -g deskctl-cli
deskctl --help
deskctl doctor
deskctl snapshot --annotate
```
One-shot execution is also supported:
One-shot execution also works:
```bash
npx deskctl-cli --help
```
`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset.
`deskctl-cli` installs the `deskctl` command by downloading the matching GitHub Release asset for the supported runtime target.
### Installable skill
For `skills.sh` / agent skill ecosystems:
## Installable skill
```bash
npx skills add harivansh-afk/deskctl -s deskctl
```
The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo.
The installable skill lives in [`skills/deskctl`](skills/deskctl) and is built around the same observe -> wait -> act -> verify loop as the CLI.
### Nix
## Quick example
```bash
deskctl doctor
deskctl snapshot --annotate
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl focus 'title=Firefox'
deskctl type "hello world"
```
## Docs
- runtime contract: [docs/runtime-contract.md](docs/runtime-contract.md)
- release flow: [docs/releasing.md](docs/releasing.md)
- installable skill: [skills/deskctl](skills/deskctl)
- contributor workflow: [CONTRIBUTING.md](CONTRIBUTING.md)
## Other install paths
Nix:
```bash
nix run github:harivansh-afk/deskctl -- --help
nix profile install github:harivansh-afk/deskctl
```
The repo flake is the supported Nix install surface in this phase.
### Docker Convenience
Build a Linux binary locally with Docker:
```bash
docker compose -f docker/docker-compose.yml run --rm build
```
This writes `dist/deskctl-linux-x86_64`.
Copy it to an SSH machine where `scp` is unavailable:
```bash
ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
```
Run it on an X11 session:
```bash
DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate
```
### Local Source Build
Source build:
```bash
cargo build
```
## Quick Start
## Support boundary
```bash
# Diagnose the environment first
deskctl doctor
# See the desktop
deskctl snapshot
# Query focused runtime state
deskctl get active-window
deskctl get monitors
# Click a window
deskctl click @w1
# Type text
deskctl type "hello world"
# Wait for a window or focus transition
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5
# Focus by explicit selector
deskctl focus 'title=Firefox'
```
## Architecture
Client-daemon architecture over Unix sockets (NDJSON wire protocol).
The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
Source layout:
- `src/lib.rs` exposes the shared library target
- `src/main.rs` is the thin CLI wrapper
- `src/` contains production code and unit tests
- `tests/` contains Linux/X11 integration tests
- `tests/support/` contains shared integration helpers
## Runtime Requirements
- Linux with X11 session
- Rust 1.75+ plus the source-build dependencies above when building from source
The binary itself only links the standard glibc runtime on Linux (`libc`, `libm`, `libgcc_s`).
For deskctl to be fully functional on a fresh VM you still need:
- an X11 server and an active `DISPLAY`
- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
- a window manager or desktop environment that exposes standard EWMH properties such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups
If setup fails, run:
```bash
deskctl doctor
```
## Contract Notes
- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows`
- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session
- `list-windows` is a cheap read-only operation and does not capture or write a screenshot
- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md)
## Read and Wait Surface
The grouped runtime reads are:
```bash
deskctl get active-window
deskctl get monitors
deskctl get version
deskctl get systeminfo
```
The grouped runtime waits are:
```bash
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'id=win3' --timeout 5
```
Successful `get active-window`, `wait window`, and `wait focus` responses return a `window` payload with:
- `ref_id`
- `window_id`
- `title`
- `app_name`
- geometry (`x`, `y`, `width`, `height`)
- state flags (`focused`, `minimized`)
`get monitors` returns:
- `count`
- `monitors[]` with geometry and primary/automatic flags
`get version` returns:
- `version`
- `backend`
`get systeminfo` stays runtime-scoped and returns:
- `backend`
- `display`
- `session_type`
- `session`
- `socket_path`
- `screen`
- `monitor_count`
- `monitors`
Wait timeout and selector failures are structured in `--json` mode so agents can recover without string parsing.
## Output Policy
Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
- use `--json` when an agent needs strict parsing
- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation
- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
See [docs/runtime-contract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown.
## Distribution
- GitHub Releases are the canonical binary source
- crates.io package: `deskctl`
- npm package: `deskctl-cli`
- installed command on every channel: `deskctl`
- repo-owned Nix install path: `flake.nix`
For maintainer publishing and release steps, see [docs/releasing.md](docs/releasing.md).
## Selector Contract
Explicit selector modes:
```bash
ref=w1
id=win1
title=Firefox
class=firefox
focused
```
Legacy refs remain supported:
```bash
@w1
w1
win1
```
Bare selectors such as `firefox` are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.
## Support Boundary
`deskctl` supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.
## Workflow
Local validation uses the root `Makefile`:
```bash
make fmt-check
make lint
make test-unit
make test-integration
make site-format-check
make validate
```
`make validate` is the full repo-quality check and requires Linux with `xvfb-run` plus `pnpm --dir site install`.
The repository standardizes on `pre-commit` for fast commit-time checks:
```bash
pre-commit install
pre-commit run --all-files
```
See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contributor guide.
## Acknowledgements
- [@barrettruth](https://github.com/barrettruth) - i stole the website from [vimdoc](https://github.com/barrettruth/vimdoc-language-server)
`deskctl` currently supports Linux X11. Use `--json` for stable machine parsing, use `window_id` for programmatic targeting inside a live session, and use `deskctl doctor` first when the runtime looks broken.


@@ -1,19 +1,6 @@
# Runtime Output Contract
# deskctl runtime contract
This document defines the current output contract for `deskctl`.
It is intentionally scoped to the current Linux X11 runtime surface.
It does not promise stability for future Wayland or window-manager-specific features.
## Goals
- Keep `deskctl` fully non-interactive
- Make text output actionable for quick terminal and agent loops
- Make `--json` safe for agent consumption without depending on incidental formatting
## JSON Envelope
Every runtime command uses the same top-level JSON envelope:
All commands support `--json` and use the same top-level envelope:
```json
{
@@ -23,22 +10,11 @@ Every runtime command uses the same top-level JSON envelope:
}
```
Stable top-level fields:
Use `--json` whenever you need to parse output programmatically.
- `success`
- `data`
- `error`
## Stable window fields
`success` is always the authoritative success/failure bit.
When `success` is `false`, the CLI exits non-zero in both text mode and `--json` mode.
## Stable Fields
These fields are stable for agent consumption in the current Phase 1 runtime contract.
### Window Identity
Whenever a runtime response includes a window payload, these fields are stable:
Whenever a response includes a window payload, these fields are stable:
- `ref_id`
- `window_id`
@@ -51,128 +27,46 @@ Whenever a runtime response includes a window payload, these fields are stable:
- `focused`
- `minimized`
`window_id` is the stable public identifier for a live daemon session.
`ref_id` is a short-lived convenience handle for the current window snapshot/ref map.
Use `window_id` for stable targeting inside a live daemon session. Use
`ref_id` or `@wN` for short-lived follow-up actions after `snapshot` or
`list-windows`.
### Grouped Reads
## Stable grouped reads
`deskctl get active-window`
- `deskctl get active-window` -> `data.window`
- `deskctl get monitors` -> `data.count`, `data.monitors`
- `deskctl get version` -> `data.version`, `data.backend`
- `deskctl get systeminfo` -> runtime-scoped diagnostic fields such as
`backend`, `display`, `session_type`, `session`, `socket_path`, `screen`,
`monitor_count`, and `monitors`
- stable: `data.window`
## Stable waits
`deskctl get monitors`
- `deskctl wait window` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
`data.window`
- `deskctl wait focus` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
`data.window`
- stable: `data.count`
- stable: `data.monitors`
- stable per monitor:
- `name`
- `x`
- `y`
- `width`
- `height`
- `width_mm`
- `height_mm`
- `primary`
- `automatic`
## Stable structured error kinds
`deskctl get version`
- stable: `data.version`
- stable: `data.backend`
`deskctl get systeminfo`
- stable: `data.backend`
- stable: `data.display`
- stable: `data.session_type`
- stable: `data.session`
- stable: `data.socket_path`
- stable: `data.screen`
- stable: `data.monitor_count`
- stable: `data.monitors`
### Waits
`deskctl wait window`
`deskctl wait focus`
- stable: `data.wait`
- stable: `data.selector`
- stable: `data.elapsed_ms`
- stable: `data.window`
### Selector-Driven Action Success
For selector-driven action commands that resolve a window target, these identifiers are stable when present:
- `data.ref_id`
- `data.window_id`
- `data.title`
- `data.selector`
This applies to:
- `click`
- `dblclick`
- `focus`
- `close`
- `move-window`
- `resize-window`
The exact human-readable text rendering of those commands is not part of the JSON contract.
### Artifact-Producing Commands
`snapshot`
`screenshot`
- stable: `data.screenshot`
When the command also returns windows, `data.windows` uses the stable window payload documented above.
## Stable Structured Error Kinds
When a runtime command returns structured JSON failure data, these error kinds are stable:
When a command fails with structured JSON data, these `kind` values are stable:
- `selector_not_found`
- `selector_ambiguous`
- `selector_invalid`
- `timeout`
- `not_found`
- `window_not_focused` as `data.last_observation.kind` or equivalent observation payload
Stable structured failure fields include:
Wait failures may also include `window_not_focused` in the last observation
payload.
- `data.kind`
- `data.selector` when selector-related
- `data.mode` when selector-related
- `data.candidates` for ambiguous selector failures
- `data.message` for invalid selector failures
- `data.wait`
- `data.timeout_ms`
- `data.poll_ms`
- `data.last_observation`
## Best-effort fields
## Best-Effort Fields
Treat these as useful but non-contractual:
These values are useful but environment-dependent and should be treated as best-effort:
- exact monitor names
- incidental text formatting in non-JSON mode
- default screenshot file names when no explicit path was provided
- environment-dependent ordering details from the window manager
- exact monitor naming conventions
- EWMH/window-manager-dependent window ordering details
- cosmetic text formatting in non-JSON mode
- screenshot file names when the caller did not provide an explicit path
- command stderr wording outside the structured `kind` classifications above
## Text Mode Expectations
Text mode is intended to stay compact and follow-up-useful.
The exact whitespace/alignment of text output is not stable.
The following expectations are stable at the behavioral level:
- important runtime reads print actionable identifiers or geometry
- selector failures print enough detail to recover without `--json`
- artifact-producing commands print the artifact path
- window listings print both `@wN` refs and `window_id` values
If an agent needs strict parsing, it should use `--json`.
For the full repo copy, see `docs/runtime-contract.md`.


@@ -6,73 +6,93 @@ toc: true
# Architecture
## Client-daemon model
## Public model
deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
`deskctl` is a thin, non-interactive X11 control primitive for agent loops.
The public flow is:
Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
- diagnose with `deskctl doctor`
- observe with `snapshot`, `list-windows`, and grouped `get` commands
- wait with grouped `wait` commands instead of shell `sleep`
- act with explicit selectors or coordinates
- verify with another read or snapshot
## Wire protocol
The tool stays intentionally narrow. It does not try to be a full desktop shell
or a speculative Wayland abstraction.
## Client-daemon architecture
The CLI talks to an auto-managed daemon over a Unix socket. The daemon keeps
the X11 connection alive so repeated commands stay fast and share the same
session-scoped window identity map.
Each CLI invocation sends one request, reads one response, and exits.
## Runtime contract
Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
**Request:**
All commands share the same JSON envelope:
```json
{ "id": "r123456", "action": "snapshot", "annotate": true }
{
"success": true,
"data": {},
"error": null
}
```
**Response:**
For window payloads, the public identity is `window_id`, not an X11 handle.
That keeps the contract backend-neutral even though the current support
boundary is X11-only.
```json
{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
```
The complete stable-vs-best-effort policy lives on the
[runtime contract](/runtime-contract) page.
Error responses include an `error` field:
## Sessions and sockets
```json
{ "success": false, "error": "window not found: @w99" }
```
Each session gets its own socket path, PID file, and live window mapping.
## Socket location
Public socket resolution order:
The daemon socket is resolved in this order:
1. `--socket` flag (highest priority)
2. `$DESKCTL_SOCKET_DIR/{session}.sock`
3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
1. `--socket`
2. `DESKCTL_SOCKET_DIR/{session}.sock`
3. `XDG_RUNTIME_DIR/deskctl/{session}.sock`
4. `~/.deskctl/{session}.sock`
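
The resolution order above can be sketched as a small POSIX shell function. This only illustrates the documented precedence and is not `deskctl`'s actual implementation. The function name `resolve_deskctl_socket` and its argument order are assumptions made for this example.

```sh
# Hypothetical helper: mirrors the documented socket resolution order.
# $1 = session name (defaults to "default")
# $2 = optional explicit socket path, standing in for the --socket flag
resolve_deskctl_socket() {
  session="${1:-default}"
  explicit="${2:-}"
  if [ -n "$explicit" ]; then
    # 1. an explicit socket path always wins
    printf '%s\n' "$explicit"
  elif [ -n "${DESKCTL_SOCKET_DIR:-}" ]; then
    # 2. DESKCTL_SOCKET_DIR/{session}.sock
    printf '%s/%s.sock\n' "$DESKCTL_SOCKET_DIR" "$session"
  elif [ -n "${XDG_RUNTIME_DIR:-}" ]; then
    # 3. XDG_RUNTIME_DIR/deskctl/{session}.sock
    printf '%s/deskctl/%s.sock\n' "$XDG_RUNTIME_DIR" "$session"
  else
    # 4. fall back to ~/.deskctl/{session}.sock
    printf '%s/.deskctl/%s.sock\n' "$HOME" "$session"
  fi
}
```

Calling `resolve_deskctl_socket workspace1` with none of the overrides set prints the home-directory fallback for that session.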
PID files are stored alongside the socket.
Most users should let `deskctl` manage this automatically. `--session` is the
main public knob when you need isolated daemon instances.
## Sessions
## Diagnostics and failure handling
Multiple isolated daemon instances can run simultaneously using the `--session` flag:
`deskctl doctor` runs before daemon startup and checks:
```sh
deskctl --session workspace1 snapshot
deskctl --session workspace2 snapshot
```
- display/session setup
- X11 connectivity
- basic window enumeration
- screenshot viability
- socket directory and stale-socket health
Each session has its own socket, PID file, and window ref map.
Selector and wait failures are structured in `--json` mode so clients can
recover without scraping text.
## Backend design
## Backend notes
The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
The backend is built around a `DesktopBackend` trait and currently ships with
an X11 implementation backed by `x11rb`.
The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
The important public guarantee is not "portable desktop automation." The
important guarantee is "a correct and unsurprising Linux X11 runtime contract."
## X11 integration
## X11 support boundary
Window detection uses EWMH properties:
This phase supports Linux X11 only.
| Property | Purpose |
| --------------------------- | ------------------------ |
| `_NET_CLIENT_LIST_STACKING` | Window stacking order |
| `_NET_ACTIVE_WINDOW` | Currently focused window |
| `_NET_WM_NAME` | Window title (UTF-8) |
| `_NET_WM_STATE_HIDDEN` | Minimized state |
| `_NET_CLOSE_WINDOW` | Graceful close |
| `WM_CLASS` | Application class/name |
That means:
Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.
- EWMH/window-manager properties matter
- monitor naming and some ordering details are best-effort
- Wayland and Hyprland are out of scope for the current contract
The runtime documents those boundaries explicitly instead of pretending the
surface is broader than it is.


@@ -6,167 +6,101 @@ toc: true
# Commands
## Snapshot
Capture a screenshot and get the window tree:
## Observe
```sh
deskctl doctor
deskctl snapshot
deskctl snapshot --annotate
```
With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
## Click
Click the center of a window by ref, or click exact coordinates:
```sh
deskctl click @w1
deskctl click 960,540
```
## Double click
```sh
deskctl dblclick @w1
deskctl dblclick 500,300
```
## Type
Type a string into the focused window:
```sh
deskctl type "hello world"
```
## Press
Press a single key:
```sh
deskctl press enter
deskctl press tab
deskctl press escape
```
Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
## Hotkey
Send a key combination. List modifier keys first, then the target key:
```sh
deskctl hotkey ctrl c
deskctl hotkey ctrl shift t
deskctl hotkey alt f4
```
Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
## Mouse move
Move the cursor to absolute coordinates:
```sh
deskctl mouse move 100 200
```
## Mouse scroll
Scroll the mouse wheel. Positive values scroll down, negative scroll up:
```sh
deskctl mouse scroll 3
deskctl mouse scroll -5
deskctl mouse scroll 3 --axis horizontal
```
## Mouse drag
Drag from one position to another:
```sh
deskctl mouse drag 100 200 500 600
```
## Focus
Focus a window by ref or by name (case-insensitive substring match):
```sh
deskctl focus @w1
deskctl focus "firefox"
```
## Close
Close a window gracefully:
```sh
deskctl close @w2
deskctl close "terminal"
```
## Move window
Move a window to an absolute position:
```sh
deskctl move-window @w1 0 0
deskctl move-window "firefox" 100 100
```
## Resize window
Resize a window:
```sh
deskctl resize-window @w1 1280 720
```
## List windows
List all windows without taking a screenshot:
```sh
deskctl list-windows
```
## Get screen size
```sh
deskctl screenshot
deskctl screenshot /tmp/screen.png
deskctl get active-window
deskctl get monitors
deskctl get version
deskctl get systeminfo
deskctl get-screen-size
```
## Get mouse position
```sh
deskctl get-mouse-position
```
## Screenshot
`doctor` checks the runtime before daemon startup. `snapshot` produces a
screenshot plus window refs. `list-windows` is the same window tree without the
side effect of writing a screenshot.
Take a screenshot without the window tree. Optionally specify a save path:
## Wait
```sh
deskctl screenshot
deskctl screenshot /tmp/my-screenshot.png
deskctl screenshot --annotate
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'id=win3' --timeout 5
deskctl --json wait window --selector 'class=firefox' --poll-ms 100
```
## Launch
Wait commands return the matched window payload on success. In `--json` mode,
timeouts and selector failures expose structured `kind` values.
Launch an application:
## Act on a window
```sh
deskctl launch firefox
deskctl launch code --args /path/to/project
deskctl focus @w1
deskctl focus 'title=Firefox'
deskctl click @w1
deskctl click 960,540
deskctl dblclick @w2
deskctl close @w3
deskctl move-window @w1 100 120
deskctl resize-window @w1 1280 720
```
Selector-driven actions accept refs, explicit selector modes, or absolute
coordinates where appropriate.
## Input and mouse
```sh
deskctl type "hello world"
deskctl press enter
deskctl hotkey ctrl shift t
deskctl mouse move 100 200
deskctl mouse scroll 3
deskctl mouse scroll 3 --axis horizontal
deskctl mouse drag 100 200 500 600
```
Supported key names include `enter`, `tab`, `escape`, `backspace`, `delete`,
`space`, arrow keys, paging keys, `f1` through `f12`, and any single
character.
## Launch
```sh
deskctl launch firefox
deskctl launch code -- --new-window
```
## Selectors
Prefer explicit selectors when the target matters:
```sh
ref=w1
id=win1
title=Firefox
class=firefox
focused
```
Legacy shorthand is still supported:
```sh
@w1
w1
win1
```
Bare strings like `firefox` are fuzzy matches. They resolve when there is one
match and fail with candidate windows when there are multiple matches.
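
As a rough illustration of the selector forms listed above, the following shell function sorts a selector string into one of those categories. It is a sketch for client-side scripts, not `deskctl`'s own parser. The function name `selector_mode` and the labels `legacy-ref`, `legacy-id`, and `fuzzy` are assumptions introduced here.

```sh
# Hypothetical classifier for the documented selector forms.
# Prints one label per input; the label names are illustrative only.
selector_mode() {
  case "$1" in
    ref=*)            echo ref ;;         # explicit ref, e.g. ref=w1
    id=*)             echo id ;;          # explicit window id, e.g. id=win1
    title=*)          echo title ;;       # explicit title match
    class=*)          echo class ;;       # explicit class match
    focused)          echo focused ;;     # the currently focused window
    @w[0-9]*|w[0-9]*) echo legacy-ref ;;  # legacy shorthand: @w1, w1
    win[0-9]*)        echo legacy-id ;;   # legacy shorthand: win1
    *)                echo fuzzy ;;       # bare string, may be ambiguous
  esac
}
```

A script could use this to reject `fuzzy` selectors up front when it needs deterministic targeting.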
## Global options
| Flag | Env | Description |
@@ -174,3 +108,6 @@ deskctl launch code --args /path/to/project
| `--json` | | Output as JSON |
| `--socket <path>` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
| `--session <name>` | | Session name for multiple daemons (default: `default`) |
`deskctl` manages the daemon automatically. Most users never need to think
about it beyond `--session` and `--socket`.


@@ -8,17 +8,49 @@ import DocLayout from "../layouts/DocLayout.astro";
<img src="/favicon.svg" alt="" width="40" height="40" />
</header>
<p>
Desktop control CLI for AI agents on Linux X11. Compact JSON output for
agent loops. Screenshot, click, type, scroll, drag, and manage windows
through a fast client-daemon architecture. 100% native Rust.
<p class="tagline">non-interactive desktop control for AI agents</p>
<div class="badges" aria-label="package and runtime badges">
<a href="https://www.npmjs.com/package/deskctl-cli">
<img
src="https://img.shields.io/npm/v/deskctl-cli?label=npm"
alt="npm version badge"
/>
</a>
<a href="https://github.com/harivansh-afk/deskctl/releases">
<img
src="https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release"
alt="github release badge"
/>
</a>
<img
src="https://img.shields.io/badge/runtime-linux--x11-111827"
alt="linux x11 runtime badge"
/>
<a href="https://www.npmjs.com/package/deskctl-cli">
<img
src="https://img.shields.io/badge/install-npm%20i%20--g%20deskctl--cli-111827"
alt="npm install command badge"
/>
</a>
</div>
<p class="lede">
<code>deskctl</code> is a thin X11 control primitive for agent loops: diagnose
the runtime, observe the desktop, wait for state transitions, act deterministically,
then verify.
</p>
<h2>Getting started</h2>
<pre><code>npm install -g deskctl-cli
deskctl doctor
deskctl snapshot --annotate</code></pre>
<h2>Start here</h2>
<ul>
<li><a href="/installation">Installation</a></li>
<li><a href="/quick-start">Quick start</a></li>
<li><a href="/runtime-contract">Runtime contract</a></li>
</ul>
<h2>Reference</h2>
@@ -28,14 +60,27 @@ import DocLayout from "../layouts/DocLayout.astro";
<li><a href="/architecture">Architecture</a></li>
</ul>
<h2>Agent skill</h2>
<p>
There is also an installable skill for <code>skills.sh</code>-style agent runtimes:
</p>
<pre><code>npx skills add harivansh-afk/deskctl -s deskctl</code></pre>
<h2>Links</h2>
<ul>
<li>
<a href="https://www.npmjs.com/package/deskctl-cli">npm package</a>
</li>
<li>
<a href="https://github.com/harivansh-afk/deskctl">GitHub</a>
</li>
<li>
<a href="https://crates.io/crates/deskctl">crates.io</a>
<a href="https://github.com/harivansh-afk/deskctl/releases">
GitHub releases
</a>
</li>
</ul>
</DocLayout>


@@ -6,43 +6,68 @@ toc: true
# Installation
## Cargo
## Default install
```sh
cargo install deskctl
npm install -g deskctl-cli
deskctl --help
```
## From source
`deskctl-cli` is the default install path. It installs the `deskctl` command by
downloading the matching GitHub Release asset for the supported runtime target.
## One-shot usage
```sh
npx deskctl-cli --help
```
## Agent skill
For `skills.sh`-style runtimes:
```sh
npx skills add harivansh-afk/deskctl -s deskctl
```
The repo skill lives under `skills/deskctl` and is designed around the same
observe -> wait -> act -> verify loop as the CLI.
## Other install paths
### Nix
```sh
nix run github:harivansh-afk/deskctl -- --help
nix profile install github:harivansh-afk/deskctl
```
### Build from source
```sh
git clone https://github.com/harivansh-afk/deskctl
cd deskctl
cargo build --release
cargo build
```
## Docker (cross-compile for Linux)
Source builds on Linux require:
Build a static Linux binary from any platform:
- Rust 1.75+
- `pkg-config`
- X11 development libraries such as `libx11-dev` and `libxtst-dev`
```sh
docker compose -f docker/docker-compose.yml run --rm build
```
This writes `dist/deskctl-linux-x86_64`.
## Deploy to a remote machine
Copy the binary over SSH when `scp` is not available:
```sh
ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
```
## Requirements
## Runtime requirements
- Linux with an active X11 session
- `DISPLAY` environment variable set (e.g. `DISPLAY=:1`)
- `XDG_SESSION_TYPE=x11`
- A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`)
- `DISPLAY` set to a usable X11 display, such as `DISPLAY=:1`
- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
- a window manager or desktop environment that exposes standard EWMH properties
such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`).
The binary itself only depends on the standard Linux glibc runtime.
If setup fails, run:
```sh
deskctl doctor
```


@@ -6,50 +6,72 @@ toc: true
# Quick start
## Core workflow
The typical agent loop is: snapshot the desktop, interpret the result, act on it.
## Install and diagnose
```sh
# 1. see the desktop
deskctl --json snapshot --annotate
npm install -g deskctl-cli
deskctl doctor
```
# 2. click a window by its ref
deskctl click @w1
Use `deskctl doctor` first. It checks X11 connectivity, basic enumeration,
screenshot viability, and socket health before you start driving the desktop.
# 3. type into the focused window
deskctl type "hello world"
## Observe
# 4. press a key
```sh
deskctl snapshot --annotate
deskctl list-windows
deskctl get active-window
deskctl get monitors
```
Use `snapshot` when you want a screenshot artifact plus window refs. Use
`list-windows` when you only need the current window tree without writing a
screenshot.
## Target windows cleanly
Prefer explicit selectors when you need deterministic targeting:
```sh
ref=w1
id=win1
title=Firefox
class=firefox
focused
```
Legacy refs such as `@w1` still work after `snapshot` or `list-windows`. Bare
strings like `firefox` are fuzzy matches and now fail on ambiguity.
## Wait, act, verify
The core loop is:
```sh
# observe
deskctl snapshot --annotate
# wait
deskctl wait window --selector 'title=Firefox' --timeout 10
# act
deskctl focus 'title=Firefox'
deskctl hotkey ctrl l
deskctl type "https://example.com"
deskctl press enter
# verify
deskctl wait focus --selector 'title=Firefox' --timeout 5
deskctl snapshot
```
The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows.
The wait commands return the matched window payload on success, so they compose
cleanly into the next action.
## Window refs
## Use `--json` when parsing matters
Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
```sh
deskctl click @w1
deskctl focus @w3
deskctl close @w2
```
You can also select windows by name (case-insensitive substring match):
```sh
deskctl focus "firefox"
deskctl close "terminal"
```
## JSON output
Pass `--json` for machine-readable output. This is the primary mode for agent integrations:
```sh
deskctl --json snapshot
```
Every command supports `--json` and uses the same top-level envelope:
```json
{
@@ -59,7 +81,7 @@ deskctl --json snapshot
"windows": [
{
"ref_id": "w1",
"xcb_id": 12345678,
"window_id": "win1",
"title": "Firefox",
"app_name": "firefox",
"x": 0,
@@ -74,14 +96,8 @@ deskctl --json snapshot
}
```
## Daemon lifecycle
Use `window_id` for stable targeting inside a live daemon session. The exact
text formatting is intentionally compact, but JSON is the parsing contract.
The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
```sh
# check if the daemon is running
deskctl daemon status
# stop it explicitly
deskctl daemon stop
```
The full stable-vs-best-effort contract lives on the
[runtime contract](/runtime-contract) page.


@@ -0,0 +1,177 @@
---
layout: ../layouts/DocLayout.astro
title: Runtime contract
toc: true
---
# Runtime contract
This page defines the current public output contract for `deskctl`.
It is intentionally scoped to the current Linux X11 runtime surface. It does
not promise stability for future Wayland or window-manager-specific features.
## JSON envelope
Every command supports `--json` and uses the same top-level envelope:
```json
{
"success": true,
"data": {},
"error": null
}
```
Stable top-level fields:
- `success`
- `data`
- `error`
If `success` is `false`, the command exits non-zero in both text mode and JSON
mode.
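
Because a failed command exits non-zero, a shell caller can branch on the exit status before it parses anything. The wrapper below is a minimal sketch of that pattern. The name `run_enveloped` is hypothetical, and it assumes the envelope is written to stdout even on failure, which this page does not state.

```sh
# Hypothetical wrapper: run a command that follows the envelope contract.
# On success, print the captured envelope on stdout.
# On failure, forward the captured output to stderr and return the
# command's own non-zero exit status.
run_enveloped() {
  if envelope="$("$@")"; then
    printf '%s\n' "$envelope"
  else
    status=$?
    printf '%s\n' "$envelope" >&2
    return "$status"
  fi
}
```

For example, `run_enveloped deskctl --json get active-window || echo "read failed"` reacts to the failure bit without inspecting the JSON.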
## Stable window fields
Whenever a response includes a window payload, these fields are stable:
- `ref_id`
- `window_id`
- `title`
- `app_name`
- `x`
- `y`
- `width`
- `height`
- `focused`
- `minimized`
`window_id` is the public session-scoped identifier for programmatic targeting.
`ref_id` is a short-lived convenience handle from the current ref map.
## Stable grouped reads
`deskctl get active-window`
- stable: `data.window`
`deskctl get monitors`
- stable: `data.count`
- stable: `data.monitors`
Stable per-monitor fields:
- `name`
- `x`
- `y`
- `width`
- `height`
- `width_mm`
- `height_mm`
- `primary`
- `automatic`
`deskctl get version`
- stable: `data.version`
- stable: `data.backend`
`deskctl get systeminfo`
- stable: `data.backend`
- stable: `data.display`
- stable: `data.session_type`
- stable: `data.session`
- stable: `data.socket_path`
- stable: `data.screen`
- stable: `data.monitor_count`
- stable: `data.monitors`
## Stable waits
`deskctl wait window`
`deskctl wait focus`
- stable: `data.wait`
- stable: `data.selector`
- stable: `data.elapsed_ms`
- stable: `data.window`
## Stable selector-driven action fields
When selector-driven actions return resolved window data, these fields are
stable when present:
- `data.ref_id`
- `data.window_id`
- `data.title`
- `data.selector`
This applies to:
- `click`
- `dblclick`
- `focus`
- `close`
- `move-window`
- `resize-window`
## Stable artifact fields
For `snapshot` and `screenshot`:
- stable: `data.screenshot`
When a command also returns windows, `data.windows` uses the stable window
payload documented above.
## Stable structured error kinds
When a command fails with structured JSON data, these error kinds are stable:
- `selector_not_found`
- `selector_ambiguous`
- `selector_invalid`
- `timeout`
- `not_found`
- `window_not_focused` in `data.last_observation.kind` or an equivalent wait
observation payload
Stable structured failure fields include:
- `data.kind`
- `data.selector`
- `data.mode`
- `data.candidates`
- `data.message`
- `data.wait`
- `data.timeout_ms`
- `data.poll_ms`
- `data.last_observation`
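A client could branch on these fields roughly as in the sketch below. It assumes `data.candidates` is a list and `data.last_observation` is an object carrying a `kind` key; neither shape is spelled out on this page beyond the field names. `describe_failure` is a hypothetical helper and the sample envelope values are invented for illustration.

```python
# Error kinds listed as stable for the top-level `data.kind` field.
STABLE_KINDS = {
    "selector_not_found",
    "selector_ambiguous",
    "selector_invalid",
    "timeout",
    "not_found",
}


def describe_failure(envelope: dict) -> dict:
    """Summarise a failed envelope using only the stable failure fields."""
    data = envelope.get("data") or {}
    kind = data.get("kind")
    last = data.get("last_observation") or {}
    return {
        # Anything outside the stable set is reported as "unknown".
        "kind": kind if kind in STABLE_KINDS else "unknown",
        "selector": data.get("selector"),
        "candidates": data.get("candidates") or [],
        "last_observation_kind": last.get("kind"),
    }


# Illustrative failure envelope; all values are made up.
sample_failure = {
    "success": False,
    "data": {
        "kind": "timeout",
        "selector": "title=Firefox",
        "wait": "focus",
        "timeout_ms": 5000,
        "poll_ms": 100,
        "last_observation": {"kind": "window_not_focused"},
    },
    "error": "wait timed out",
}
print(describe_failure(sample_failure))
```

Mapping unrecognised kinds to `"unknown"` keeps the caller from depending on wording that falls outside the stable list.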
## Best-effort fields
These values are useful but environment-dependent and should not be treated as
strict parsing guarantees:
- exact monitor naming conventions
- EWMH/window-manager-dependent ordering details
- cosmetic text formatting in non-JSON mode
- default screenshot file names when no explicit path was provided
- stderr wording outside the structured `kind` classifications above
## Text mode expectations
Text mode is intended to stay compact while remaining useful for follow-up commands.
The exact whitespace and alignment are not stable. The stable behavioral
expectations are:
- important reads print actionable identifiers or geometry
- selector failures print enough detail to recover without `--json`
- artifact-producing commands print the artifact path
- window listings print both `@wN` refs and `window_id` values
If you need strict parsing, use `--json`.

@ -65,6 +65,23 @@ main {
font-style: italic;
}
.lede {
font-size: 1.05rem;
max-width: 42rem;
}
.badges {
display: flex;
flex-wrap: wrap;
gap: 0.6rem;
margin-bottom: 1.25rem;
}
.badges a,
.badges img {
display: block;
}
header {
display: flex;
align-items: center;
@ -117,6 +134,10 @@ a:hover {
text-decoration-thickness: 2px;
}
img {
max-width: 100%;
}
ul,
ol {
padding-left: 1.25em;

@ -1,21 +1,22 @@
# deskctl commands
All commands support `--json` for machine-parseable output following the runtime contract.
All commands support `--json` for machine-parseable output following the
runtime contract.
## Observe
```bash
deskctl doctor # check X11 runtime and daemon health
deskctl snapshot # screenshot + window list
deskctl snapshot --annotate # screenshot with @wN labels overlaid
deskctl list-windows # window list only (no screenshot)
deskctl screenshot /tmp/screen.png # screenshot to explicit path
deskctl get active-window # focused window info
deskctl get monitors # monitor geometry
deskctl get version # version and backend
deskctl get systeminfo # full runtime diagnostics
deskctl get-screen-size # screen resolution
deskctl get-mouse-position # cursor coordinates
deskctl doctor
deskctl snapshot
deskctl snapshot --annotate
deskctl list-windows
deskctl screenshot /tmp/screen.png
deskctl get active-window
deskctl get monitors
deskctl get version
deskctl get systeminfo
deskctl get-screen-size
deskctl get-mouse-position
```
## Wait
@ -25,19 +26,21 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5
```
Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
Returns the matched window payload on success. Failures include structured
`kind` values in `--json` mode.
## Selectors
```bash
ref=w1 # snapshot ref (short-lived, from last snapshot)
id=win1 # stable window ID (session-scoped)
title=Firefox # match by window title
class=firefox # match by WM class
focused # currently focused window
ref=w1
id=win1
title=Firefox
class=firefox
focused
```
Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail
on ambiguity.
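For scripts that assemble commands programmatically, the explicit selector forms above can be built with a small helper. This is a sketch only: `selector` and `wait_window_argv` are hypothetical names, and the validation rules are an assumption about what a careful caller might check, not a description of how `deskctl` parses selectors.

```python
import shlex

# Explicit selector prefixes from the list above; `focused` takes no value.
KEYED_SELECTORS = {"ref", "id", "title", "class"}


def selector(kind, value=None):
    """Return an explicit selector string such as `title=Firefox` or `focused`."""
    if kind == "focused":
        if value is not None:
            raise ValueError("`focused` does not take a value")
        return "focused"
    if kind not in KEYED_SELECTORS:
        raise ValueError(f"unknown selector kind: {kind}")
    if not value:
        raise ValueError(f"selector `{kind}` needs a value")
    return f"{kind}={value}"


def wait_window_argv(sel, timeout):
    """Build the argv for `deskctl wait window --selector ... --timeout ...`."""
    return ["deskctl", "wait", "window", "--selector", sel, "--timeout", str(timeout)]


argv = wait_window_argv(selector("title", "Firefox"), 10)
print(shlex.join(argv))
```

Passing the argv list straight to a process launcher avoids shell quoting problems when a window title contains spaces.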
## Act
@ -58,12 +61,5 @@ deskctl close @w3
deskctl launch firefox
```
## Daemon
```bash
deskctl daemon start
deskctl daemon stop
deskctl daemon status
```
The daemon starts automatically on first command. Manual control is rarely needed.
The daemon starts automatically on the first command. In normal usage you
should not need to manage it directly.