mirror of
https://github.com/harivansh-afk/deskctl.git
synced 2026-04-17 14:01:22 +00:00
align docs and contract
parent
c37589ccf4
commit
cdab5e5550
10 changed files with 590 additions and 657 deletions
268
README.md
|
|
@@ -1,266 +1,68 @@
|
||||||
# deskctl
|
# deskctl
|
||||||
|
|
||||||
Desktop control CLI for AI agents on Linux X11.
|
[](https://www.npmjs.com/package/deskctl-cli)
|
||||||
|
[](https://github.com/harivansh-afk/deskctl/releases)
|
||||||
|
[](#support-boundary)
|
||||||
|
[](skills/deskctl)
|
||||||
|
|
||||||
|
Non-interactive desktop control for AI agents on Linux X11.
|
||||||
|
|
||||||
## Install
|
## Install
|
||||||
|
|
||||||
### Cargo
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cargo install deskctl
|
|
||||||
```
|
|
||||||
|
|
||||||
Source builds on Linux require:
|
|
||||||
|
|
||||||
- Rust 1.75+
|
|
||||||
- `pkg-config`
|
|
||||||
- X11 development libraries for input and windowing, typically `libx11-dev` and `libxtst-dev` on Debian/Ubuntu
|
|
||||||
|
|
||||||
### npm
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
npm install -g deskctl-cli
|
npm install -g deskctl-cli
|
||||||
deskctl --help
|
deskctl doctor
|
||||||
|
deskctl snapshot --annotate
|
||||||
```
|
```
|
||||||
|
|
||||||
One-shot execution is also supported:
|
One-shot execution also works:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
npx deskctl-cli --help
|
npx deskctl-cli --help
|
||||||
```
|
```
|
||||||
|
|
||||||
`deskctl-cli` currently supports `linux-x64` and installs the `deskctl` command by downloading the matching GitHub Release asset.
|
`deskctl-cli` installs the `deskctl` command by downloading the matching GitHub Release asset for the supported runtime target.
|
||||||
|
|
||||||
### Installable skill
|
## Installable skill
|
||||||
|
|
||||||
For `skills.sh` / agent skill ecosystems:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
npx skills add harivansh-afk/deskctl -s deskctl
|
npx skills add harivansh-afk/deskctl -s deskctl
|
||||||
```
|
```
|
||||||
|
|
||||||
The installable skill lives under [`skills/deskctl`](skills/deskctl) and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get `deskctl` without Cargo.
|
The installable skill lives in [`skills/deskctl`](skills/deskctl) and is built around the same observe -> wait -> act -> verify loop as the CLI.
|
||||||
|
|
||||||
### Nix
|
## Quick example
|
||||||
|
|
||||||
|
```bash
|
||||||
|
deskctl doctor
|
||||||
|
deskctl snapshot --annotate
|
||||||
|
deskctl wait window --selector 'title=Firefox' --timeout 10
|
||||||
|
deskctl focus 'title=Firefox'
|
||||||
|
deskctl type "hello world"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Docs
|
||||||
|
|
||||||
|
- runtime contract: [docs/runtime-contract.md](docs/runtime-contract.md)
|
||||||
|
- release flow: [docs/releasing.md](docs/releasing.md)
|
||||||
|
- installable skill: [skills/deskctl](skills/deskctl)
|
||||||
|
- contributor workflow: [CONTRIBUTING.md](CONTRIBUTING.md)
|
||||||
|
|
||||||
|
## Other install paths
|
||||||
|
|
||||||
|
Nix:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
nix run github:harivansh-afk/deskctl -- --help
|
nix run github:harivansh-afk/deskctl -- --help
|
||||||
nix profile install github:harivansh-afk/deskctl
|
nix profile install github:harivansh-afk/deskctl
|
||||||
```
|
```
|
||||||
|
|
||||||
The repo flake is the supported Nix install surface in this phase.
|
Source build:
|
||||||
|
|
||||||
### Docker Convenience
|
|
||||||
|
|
||||||
Build a Linux binary locally with Docker:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker compose -f docker/docker-compose.yml run --rm build
|
|
||||||
```
|
|
||||||
|
|
||||||
This writes `dist/deskctl-linux-x86_64`.
|
|
||||||
|
|
||||||
Copy it to an SSH machine where `scp` is unavailable:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
|
|
||||||
```
|
|
||||||
|
|
||||||
Run it on an X11 session:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate
|
|
||||||
```
|
|
||||||
|
|
||||||
### Local Source Build
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cargo build
|
cargo build
|
||||||
```
|
```
|
||||||
|
|
||||||
## Quick Start
|
## Support boundary
|
||||||
|
|
||||||
```bash
|
`deskctl` currently supports Linux X11. Use `--json` for stable machine parsing, use `window_id` for programmatic targeting inside a live session, and use `deskctl doctor` first when the runtime looks broken.
|
||||||
# Diagnose the environment first
|
|
||||||
deskctl doctor
|
|
||||||
|
|
||||||
# See the desktop
|
|
||||||
deskctl snapshot
|
|
||||||
|
|
||||||
# Query focused runtime state
|
|
||||||
deskctl get active-window
|
|
||||||
deskctl get monitors
|
|
||||||
|
|
||||||
# Click a window
|
|
||||||
deskctl click @w1
|
|
||||||
|
|
||||||
# Type text
|
|
||||||
deskctl type "hello world"
|
|
||||||
|
|
||||||
# Wait for a window or focus transition
|
|
||||||
deskctl wait window --selector 'title=Firefox' --timeout 10
|
|
||||||
deskctl wait focus --selector 'class=firefox' --timeout 5
|
|
||||||
|
|
||||||
# Focus by explicit selector
|
|
||||||
deskctl focus 'title=Firefox'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
Client-daemon architecture over Unix sockets (NDJSON wire protocol).
|
|
||||||
The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
|
|
||||||
|
|
||||||
Source layout:
|
|
||||||
|
|
||||||
- `src/lib.rs` exposes the shared library target
|
|
||||||
- `src/main.rs` is the thin CLI wrapper
|
|
||||||
- `src/` contains production code and unit tests
|
|
||||||
- `tests/` contains Linux/X11 integration tests
|
|
||||||
- `tests/support/` contains shared integration helpers
|
|
||||||
|
|
||||||
## Runtime Requirements
|
|
||||||
|
|
||||||
- Linux with X11 session
|
|
||||||
- Rust 1.75+ plus the source-build dependencies above when building from source
|
|
||||||
|
|
||||||
The binary itself only links the standard glibc runtime on Linux (`libc`, `libm`, `libgcc_s`).
|
|
||||||
|
|
||||||
For deskctl to be fully functional on a fresh VM you still need:
|
|
||||||
|
|
||||||
- an X11 server and an active `DISPLAY`
|
|
||||||
- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
|
|
||||||
- a window manager or desktop environment that exposes standard EWMH properties such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
|
|
||||||
- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups
|
|
||||||
|
|
||||||
If setup fails, run:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
deskctl doctor
|
|
||||||
```
|
|
||||||
|
|
||||||
## Contract Notes
|
|
||||||
|
|
||||||
- `@wN` refs are short-lived handles assigned by `snapshot` and `list-windows`
|
|
||||||
- `--json` output includes a stable `window_id` for programmatic targeting within the current daemon session
|
|
||||||
- `list-windows` is a cheap read-only operation and does not capture or write a screenshot
|
|
||||||
- the stable runtime JSON/error contract is documented in [docs/runtime-contract.md](docs/runtime-contract.md)
|
|
||||||
|
|
||||||
## Read and Wait Surface
|
|
||||||
|
|
||||||
The grouped runtime reads are:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
deskctl get active-window
|
|
||||||
deskctl get monitors
|
|
||||||
deskctl get version
|
|
||||||
deskctl get systeminfo
|
|
||||||
```
|
|
||||||
|
|
||||||
The grouped runtime waits are:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
deskctl wait window --selector 'title=Firefox' --timeout 10
|
|
||||||
deskctl wait focus --selector 'id=win3' --timeout 5
|
|
||||||
```
|
|
||||||
|
|
||||||
Successful `get active-window`, `wait window`, and `wait focus` responses return a `window` payload with:
|
|
||||||
- `ref_id`
|
|
||||||
- `window_id`
|
|
||||||
- `title`
|
|
||||||
- `app_name`
|
|
||||||
- geometry (`x`, `y`, `width`, `height`)
|
|
||||||
- state flags (`focused`, `minimized`)
|
|
||||||
|
|
||||||
`get monitors` returns:
|
|
||||||
- `count`
|
|
||||||
- `monitors[]` with geometry and primary/automatic flags
|
|
||||||
|
|
||||||
`get version` returns:
|
|
||||||
- `version`
|
|
||||||
- `backend`
|
|
||||||
|
|
||||||
`get systeminfo` stays runtime-scoped and returns:
|
|
||||||
- `backend`
|
|
||||||
- `display`
|
|
||||||
- `session_type`
|
|
||||||
- `session`
|
|
||||||
- `socket_path`
|
|
||||||
- `screen`
|
|
||||||
- `monitor_count`
|
|
||||||
- `monitors`
|
|
||||||
|
|
||||||
Wait timeout and selector failures are structured in `--json` mode so agents can recover without string parsing.
|
|
||||||
|
|
||||||
## Output Policy
|
|
||||||
|
|
||||||
Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
|
|
||||||
|
|
||||||
- use `--json` when an agent needs strict parsing
|
|
||||||
- rely on `window_id`, selector-related fields, grouped read payloads, and structured error `kind` values for stable automation
|
|
||||||
- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
|
|
||||||
|
|
||||||
See [docs/runtime-contract.md](docs/runtime-contract.md) for the exact stable-vs-best-effort breakdown.
|
|
||||||
|
|
||||||
## Distribution
|
|
||||||
|
|
||||||
- GitHub Releases are the canonical binary source
|
|
||||||
- crates.io package: `deskctl`
|
|
||||||
- npm package: `deskctl-cli`
|
|
||||||
- installed command on every channel: `deskctl`
|
|
||||||
- repo-owned Nix install path: `flake.nix`
|
|
||||||
|
|
||||||
For maintainer publishing and release steps, see [docs/releasing.md](docs/releasing.md).
|
|
||||||
|
|
||||||
## Selector Contract
|
|
||||||
|
|
||||||
Explicit selector modes:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ref=w1
|
|
||||||
id=win1
|
|
||||||
title=Firefox
|
|
||||||
class=firefox
|
|
||||||
focused
|
|
||||||
```
|
|
||||||
|
|
||||||
Legacy refs remain supported:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
@w1
|
|
||||||
w1
|
|
||||||
win1
|
|
||||||
```
|
|
||||||
|
|
||||||
Bare selectors such as `firefox` are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.
|
|
||||||
|
|
||||||
## Support Boundary
|
|
||||||
|
|
||||||
`deskctl` supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.
|
|
||||||
|
|
||||||
## Workflow
|
|
||||||
|
|
||||||
Local validation uses the root `Makefile`:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
make fmt-check
|
|
||||||
make lint
|
|
||||||
make test-unit
|
|
||||||
make test-integration
|
|
||||||
make site-format-check
|
|
||||||
make validate
|
|
||||||
```
|
|
||||||
|
|
||||||
`make validate` is the full repo-quality check and requires Linux with `xvfb-run` plus `pnpm --dir site install`.
|
|
||||||
|
|
||||||
The repository standardizes on `pre-commit` for fast commit-time checks:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pre-commit install
|
|
||||||
pre-commit run --all-files
|
|
||||||
```
|
|
||||||
|
|
||||||
See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contributor guide.
|
|
||||||
|
|
||||||
## Acknowledgements
|
|
||||||
|
|
||||||
- [@barrettruth](https://github.com/barrettruth) - i stole the website from [vimdoc](https://github.com/barrettruth/vimdoc-language-server)
|
|
||||||
|
|
|
||||||
|
|
@@ -1,19 +1,6 @@
|
||||||
# Runtime Output Contract
|
# deskctl runtime contract
|
||||||
|
|
||||||
This document defines the current output contract for `deskctl`.
|
All commands support `--json` and use the same top-level envelope:
|
||||||
|
|
||||||
It is intentionally scoped to the current Linux X11 runtime surface.
|
|
||||||
It does not promise stability for future Wayland or window-manager-specific features.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- Keep `deskctl` fully non-interactive
|
|
||||||
- Make text output actionable for quick terminal and agent loops
|
|
||||||
- Make `--json` safe for agent consumption without depending on incidental formatting
|
|
||||||
|
|
||||||
## JSON Envelope
|
|
||||||
|
|
||||||
Every runtime command uses the same top-level JSON envelope:
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
|
@@ -23,22 +10,11 @@ Every runtime command uses the same top-level JSON envelope:
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
Stable top-level fields:
|
Use `--json` whenever you need to parse output programmatically.
|
||||||
|
|
||||||
- `success`
|
## Stable window fields
|
||||||
- `data`
|
|
||||||
- `error`
|
|
||||||
|
|
||||||
`success` is always the authoritative success/failure bit.
|
Whenever a response includes a window payload, these fields are stable:
|
||||||
When `success` is `false`, the CLI exits non-zero in both text mode and `--json` mode.
|
|
||||||
|
|
||||||
## Stable Fields
|
|
||||||
|
|
||||||
These fields are stable for agent consumption in the current Phase 1 runtime contract.
|
|
||||||
|
|
||||||
### Window Identity
|
|
||||||
|
|
||||||
Whenever a runtime response includes a window payload, these fields are stable:
|
|
||||||
|
|
||||||
- `ref_id`
|
- `ref_id`
|
||||||
- `window_id`
|
- `window_id`
|
||||||
|
|
@@ -51,128 +27,46 @@ Whenever a runtime response includes a window payload, these fields are stable:
|
||||||
- `focused`
|
- `focused`
|
||||||
- `minimized`
|
- `minimized`
|
||||||
|
|
||||||
`window_id` is the stable public identifier for a live daemon session.
|
Use `window_id` for stable targeting inside a live daemon session. Use
|
||||||
`ref_id` is a short-lived convenience handle for the current window snapshot/ref map.
|
`ref_id` or `@wN` for short-lived follow-up actions after `snapshot` or
|
||||||
|
`list-windows`.
|
||||||
|
|
||||||
### Grouped Reads
|
## Stable grouped reads
|
||||||
|
|
||||||
`deskctl get active-window`
|
- `deskctl get active-window` -> `data.window`
|
||||||
|
- `deskctl get monitors` -> `data.count`, `data.monitors`
|
||||||
|
- `deskctl get version` -> `data.version`, `data.backend`
|
||||||
|
- `deskctl get systeminfo` -> runtime-scoped diagnostic fields such as
|
||||||
|
`backend`, `display`, `session_type`, `session`, `socket_path`, `screen`,
|
||||||
|
`monitor_count`, and `monitors`
|
||||||
|
|
||||||
- stable: `data.window`
|
## Stable waits
|
||||||
|
|
||||||
`deskctl get monitors`
|
- `deskctl wait window` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
|
||||||
|
`data.window`
|
||||||
|
- `deskctl wait focus` -> `data.wait`, `data.selector`, `data.elapsed_ms`,
|
||||||
|
`data.window`
|
||||||
|
|
||||||
- stable: `data.count`
|
## Stable structured error kinds
|
||||||
- stable: `data.monitors`
|
|
||||||
- stable per monitor:
|
|
||||||
- `name`
|
|
||||||
- `x`
|
|
||||||
- `y`
|
|
||||||
- `width`
|
|
||||||
- `height`
|
|
||||||
- `width_mm`
|
|
||||||
- `height_mm`
|
|
||||||
- `primary`
|
|
||||||
- `automatic`
|
|
||||||
|
|
||||||
`deskctl get version`
|
When a command fails with structured JSON data, these `kind` values are stable:
|
||||||
|
|
||||||
- stable: `data.version`
|
|
||||||
- stable: `data.backend`
|
|
||||||
|
|
||||||
`deskctl get systeminfo`
|
|
||||||
|
|
||||||
- stable: `data.backend`
|
|
||||||
- stable: `data.display`
|
|
||||||
- stable: `data.session_type`
|
|
||||||
- stable: `data.session`
|
|
||||||
- stable: `data.socket_path`
|
|
||||||
- stable: `data.screen`
|
|
||||||
- stable: `data.monitor_count`
|
|
||||||
- stable: `data.monitors`
|
|
||||||
|
|
||||||
### Waits
|
|
||||||
|
|
||||||
`deskctl wait window`
|
|
||||||
`deskctl wait focus`
|
|
||||||
|
|
||||||
- stable: `data.wait`
|
|
||||||
- stable: `data.selector`
|
|
||||||
- stable: `data.elapsed_ms`
|
|
||||||
- stable: `data.window`
|
|
||||||
|
|
||||||
### Selector-Driven Action Success
|
|
||||||
|
|
||||||
For selector-driven action commands that resolve a window target, these identifiers are stable when present:
|
|
||||||
|
|
||||||
- `data.ref_id`
|
|
||||||
- `data.window_id`
|
|
||||||
- `data.title`
|
|
||||||
- `data.selector`
|
|
||||||
|
|
||||||
This applies to:
|
|
||||||
|
|
||||||
- `click`
|
|
||||||
- `dblclick`
|
|
||||||
- `focus`
|
|
||||||
- `close`
|
|
||||||
- `move-window`
|
|
||||||
- `resize-window`
|
|
||||||
|
|
||||||
The exact human-readable text rendering of those commands is not part of the JSON contract.
|
|
||||||
|
|
||||||
### Artifact-Producing Commands
|
|
||||||
|
|
||||||
`snapshot`
|
|
||||||
`screenshot`
|
|
||||||
|
|
||||||
- stable: `data.screenshot`
|
|
||||||
|
|
||||||
When the command also returns windows, `data.windows` uses the stable window payload documented above.
|
|
||||||
|
|
||||||
## Stable Structured Error Kinds
|
|
||||||
|
|
||||||
When a runtime command returns structured JSON failure data, these error kinds are stable:
|
|
||||||
|
|
||||||
- `selector_not_found`
|
- `selector_not_found`
|
||||||
- `selector_ambiguous`
|
- `selector_ambiguous`
|
||||||
- `selector_invalid`
|
- `selector_invalid`
|
||||||
- `timeout`
|
- `timeout`
|
||||||
- `not_found`
|
- `not_found`
|
||||||
- `window_not_focused` as `data.last_observation.kind` or equivalent observation payload
|
|
||||||
|
|
||||||
Stable structured failure fields include:
|
Wait failures may also include `window_not_focused` in the last observation
|
||||||
|
payload.
|
||||||
|
|
||||||
- `data.kind`
|
## Best-effort fields
|
||||||
- `data.selector` when selector-related
|
|
||||||
- `data.mode` when selector-related
|
|
||||||
- `data.candidates` for ambiguous selector failures
|
|
||||||
- `data.message` for invalid selector failures
|
|
||||||
- `data.wait`
|
|
||||||
- `data.timeout_ms`
|
|
||||||
- `data.poll_ms`
|
|
||||||
- `data.last_observation`
|
|
||||||
|
|
||||||
## Best-Effort Fields
|
Treat these as useful but non-contractual:
|
||||||
|
|
||||||
These values are useful but environment-dependent and should be treated as best-effort:
|
- exact monitor names
|
||||||
|
- incidental text formatting in non-JSON mode
|
||||||
|
- default screenshot file names when no explicit path was provided
|
||||||
|
- environment-dependent ordering details from the window manager
|
||||||
|
|
||||||
- exact monitor naming conventions
|
For the full repo copy, see `docs/runtime-contract.md`.
|
||||||
- EWMH/window-manager-dependent window ordering details
|
|
||||||
- cosmetic text formatting in non-JSON mode
|
|
||||||
- screenshot file names when the caller did not provide an explicit path
|
|
||||||
- command stderr wording outside the structured `kind` classifications above
|
|
||||||
|
|
||||||
## Text Mode Expectations
|
|
||||||
|
|
||||||
Text mode is intended to stay compact and follow-up-useful.
|
|
||||||
|
|
||||||
The exact whitespace/alignment of text output is not stable.
|
|
||||||
The following expectations are stable at the behavioral level:
|
|
||||||
|
|
||||||
- important runtime reads print actionable identifiers or geometry
|
|
||||||
- selector failures print enough detail to recover without `--json`
|
|
||||||
- artifact-producing commands print the artifact path
|
|
||||||
- window listings print both `@wN` refs and `window_id` values
|
|
||||||
|
|
||||||
If an agent needs strict parsing, it should use `--json`.
|
|
||||||
|
|
|
||||||
|
|
@@ -6,73 +6,93 @@ toc: true
|
||||||
|
|
||||||
# Architecture
|
# Architecture
|
||||||
|
|
||||||
## Client-daemon model
|
## Public model
|
||||||
|
|
||||||
deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
|
`deskctl` is a thin, non-interactive X11 control primitive for agent loops.
|
||||||
|
The public flow is:
|
||||||
|
|
||||||
Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
|
- diagnose with `deskctl doctor`
|
||||||
|
- observe with `snapshot`, `list-windows`, and grouped `get` commands
|
||||||
|
- wait with grouped `wait` commands instead of shell `sleep`
|
||||||
|
- act with explicit selectors or coordinates
|
||||||
|
- verify with another read or snapshot
|
||||||
|
|
||||||
## Wire protocol
|
The tool stays intentionally narrow. It does not try to be a full desktop shell
|
||||||
|
or a speculative Wayland abstraction.
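
The public flow above (wait, act, verify) can be sketched as a small shell function. This is only an illustration, assuming `deskctl` is on `PATH`; the function name `focus_and_type` is hypothetical, while the individual commands and flags are the ones shown elsewhere in these docs.

```sh
# Hypothetical helper: wait for a window, focus it, type text, then verify.
# Only commands that appear in these docs are used; the function itself is illustrative.
focus_and_type() {
  selector="$1"
  text="$2"

  # wait: block until the window exists instead of using shell `sleep`
  deskctl wait window --selector "$selector" --timeout 10 || return 1
  # act: focus with the same explicit selector, then send the text
  deskctl focus "$selector" || return 1
  deskctl type "$text" || return 1
  # verify: read back the active window so the caller can confirm the result
  deskctl get active-window
}
```

A call such as `focus_and_type 'title=Firefox' 'hello world'` stops at the first failing step, so a wait timeout never leads to typing into the wrong window.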
|
||||||
|
|
||||||
|
## Client-daemon architecture
|
||||||
|
|
||||||
|
The CLI talks to an auto-managed daemon over a Unix socket. The daemon keeps
|
||||||
|
the X11 connection alive so repeated commands stay fast and share the same
|
||||||
|
session-scoped window identity map.
|
||||||
|
|
||||||
|
Each CLI invocation sends one request, reads one response, and exits.
|
||||||
|
|
||||||
|
## Runtime contract
|
||||||
|
|
||||||
Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
|
Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
|
||||||
|
|
||||||
**Request:**
|
All commands share the same JSON envelope:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{ "id": "r123456", "action": "snapshot", "annotate": true }
|
{
|
||||||
|
"success": true,
|
||||||
|
"data": {},
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
**Response:**
|
For window payloads, the public identity is `window_id`, not an X11 handle.
|
||||||
|
That keeps the contract backend-neutral even though the current support
|
||||||
|
boundary is X11-only.
|
||||||
|
|
||||||
```json
|
The complete stable-vs-best-effort policy lives on the
|
||||||
{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
|
[runtime contract](/runtime-contract) page.
|
||||||
```
|
|
||||||
|
|
||||||
Error responses include an `error` field:
|
## Sessions and sockets
|
||||||
|
|
||||||
```json
|
Each session gets its own socket path, PID file, and live window mapping.
|
||||||
{ "success": false, "error": "window not found: @w99" }
|
|
||||||
```
|
|
||||||
|
|
||||||
## Socket location
|
Public socket resolution order:
|
||||||
|
|
||||||
The daemon socket is resolved in this order:
|
1. `--socket`
|
||||||
|
2. `DESKCTL_SOCKET_DIR/{session}.sock`
|
||||||
1. `--socket` flag (highest priority)
|
3. `XDG_RUNTIME_DIR/deskctl/{session}.sock`
|
||||||
2. `$DESKCTL_SOCKET_DIR/{session}.sock`
|
|
||||||
3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
|
|
||||||
4. `~/.deskctl/{session}.sock`
|
4. `~/.deskctl/{session}.sock`
|
||||||
|
|
||||||
PID files are stored alongside the socket.
|
Most users should let `deskctl` manage this automatically. `--session` is the
|
||||||
|
main public knob when you need isolated daemon instances.
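
The resolution order listed above can be mirrored in a few lines of shell, which may help when debugging which socket a session ends up using. This is a sketch under assumptions, not the actual implementation (which lives in the Rust binary): the function name `resolve_socket` is hypothetical, empty variables are treated as unset, and the `DESKCTL_SOCKET` environment variable from the global options table is not modeled.

```sh
# Illustrative only: follows the documented order, not the real resolver.
# $1: explicit --socket value (may be empty)
# $2: session name (falls back to "default")
resolve_socket() {
  explicit="$1"
  session="${2:-default}"

  if [ -n "$explicit" ]; then
    # 1. --socket wins over everything else
    printf '%s\n' "$explicit"
  elif [ -n "${DESKCTL_SOCKET_DIR:-}" ]; then
    # 2. DESKCTL_SOCKET_DIR/{session}.sock
    printf '%s/%s.sock\n' "$DESKCTL_SOCKET_DIR" "$session"
  elif [ -n "${XDG_RUNTIME_DIR:-}" ]; then
    # 3. XDG_RUNTIME_DIR/deskctl/{session}.sock
    printf '%s/deskctl/%s.sock\n' "$XDG_RUNTIME_DIR" "$session"
  else
    # 4. ~/.deskctl/{session}.sock
    printf '%s/.deskctl/%s.sock\n' "$HOME" "$session"
  fi
}
```

For example, `resolve_socket "" workspace1` prints the path that a `--session workspace1` invocation would be expected to use under the current environment.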
|
||||||
|
|
||||||
## Sessions
|
## Diagnostics and failure handling
|
||||||
|
|
||||||
Multiple isolated daemon instances can run simultaneously using the `--session` flag:
|
`deskctl doctor` runs before daemon startup and checks:
|
||||||
|
|
||||||
```sh
|
- display/session setup
|
||||||
deskctl --session workspace1 snapshot
|
- X11 connectivity
|
||||||
deskctl --session workspace2 snapshot
|
- basic window enumeration
|
||||||
```
|
- screenshot viability
|
||||||
|
- socket directory and stale-socket health
|
||||||
|
|
||||||
Each session has its own socket, PID file, and window ref map.
|
Selector and wait failures are structured in `--json` mode so clients can
|
||||||
|
recover without scraping text.
|
||||||
|
|
||||||
## Backend design
|
## Backend notes
|
||||||
|
|
||||||
The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
|
The backend is built around a `DesktopBackend` trait and currently ships with
|
||||||
|
an X11 implementation backed by `x11rb`.
|
||||||
|
|
||||||
The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
|
The important public guarantee is not "portable desktop automation." The
|
||||||
|
important guarantee is "a correct and unsurprising Linux X11 runtime contract."
|
||||||
|
|
||||||
## X11 integration
|
## X11 support boundary
|
||||||
|
|
||||||
Window detection uses EWMH properties:
|
This phase supports Linux X11 only.
|
||||||
|
|
||||||
| Property | Purpose |
|
That means:
|
||||||
| --------------------------- | ------------------------ |
|
|
||||||
| `_NET_CLIENT_LIST_STACKING` | Window stacking order |
|
|
||||||
| `_NET_ACTIVE_WINDOW` | Currently focused window |
|
|
||||||
| `_NET_WM_NAME` | Window title (UTF-8) |
|
|
||||||
| `_NET_WM_STATE_HIDDEN` | Minimized state |
|
|
||||||
| `_NET_CLOSE_WINDOW` | Graceful close |
|
|
||||||
| `WM_CLASS` | Application class/name |
|
|
||||||
|
|
||||||
Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.
|
- EWMH/window-manager properties matter
|
||||||
|
- monitor naming and some ordering details are best-effort
|
||||||
|
- Wayland and Hyprland are out of scope for the current contract
|
||||||
|
|
||||||
|
The runtime documents those boundaries explicitly instead of pretending the
|
||||||
|
surface is broader than it is.
|
||||||
|
|
|
||||||
|
|
@@ -6,167 +6,101 @@ toc: true
|
||||||
|
|
||||||
# Commands
|
# Commands
|
||||||
|
|
||||||
## Snapshot
|
## Observe
|
||||||
|
|
||||||
Capture a screenshot and get the window tree:
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
|
deskctl doctor
|
||||||
deskctl snapshot
|
deskctl snapshot
|
||||||
deskctl snapshot --annotate
|
deskctl snapshot --annotate
|
||||||
```
|
|
||||||
|
|
||||||
With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
|
|
||||||
|
|
||||||
The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
|
|
||||||
|
|
||||||
## Click
|
|
||||||
|
|
||||||
Click the center of a window by ref, or click exact coordinates:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl click @w1
|
|
||||||
deskctl click 960,540
|
|
||||||
```
|
|
||||||
|
|
||||||
## Double click
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl dblclick @w1
|
|
||||||
deskctl dblclick 500,300
|
|
||||||
```
|
|
||||||
|
|
||||||
## Type
|
|
||||||
|
|
||||||
Type a string into the focused window:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl type "hello world"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Press
|
|
||||||
|
|
||||||
Press a single key:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl press enter
|
|
||||||
deskctl press tab
|
|
||||||
deskctl press escape
|
|
||||||
```
|
|
||||||
|
|
||||||
Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
|
|
||||||
|
|
||||||
## Hotkey
|
|
||||||
|
|
||||||
Send a key combination. List modifier keys first, then the target key:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl hotkey ctrl c
|
|
||||||
deskctl hotkey ctrl shift t
|
|
||||||
deskctl hotkey alt f4
|
|
||||||
```
|
|
||||||
|
|
||||||
Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
|
|
||||||
|
|
||||||
## Mouse move
|
|
||||||
|
|
||||||
Move the cursor to absolute coordinates:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl mouse move 100 200
|
|
||||||
```
|
|
||||||
|
|
||||||
## Mouse scroll
|
|
||||||
|
|
||||||
Scroll the mouse wheel. Positive values scroll down, negative scroll up:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl mouse scroll 3
|
|
||||||
deskctl mouse scroll -5
|
|
||||||
deskctl mouse scroll 3 --axis horizontal
|
|
||||||
```
|
|
||||||
|
|
||||||
## Mouse drag
|
|
||||||
|
|
||||||
Drag from one position to another:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl mouse drag 100 200 500 600
|
|
||||||
```
|
|
||||||
|
|
||||||
## Focus
|
|
||||||
|
|
||||||
Focus a window by ref or by name (case-insensitive substring match):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl focus @w1
|
|
||||||
deskctl focus "firefox"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Close
|
|
||||||
|
|
||||||
Close a window gracefully:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl close @w2
|
|
||||||
deskctl close "terminal"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Move window
|
|
||||||
|
|
||||||
Move a window to an absolute position:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl move-window @w1 0 0
|
|
||||||
deskctl move-window "firefox" 100 100
|
|
||||||
```
|
|
||||||
|
|
||||||
## Resize window
|
|
||||||
|
|
||||||
Resize a window:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl resize-window @w1 1280 720
|
|
||||||
```
|
|
||||||
|
|
||||||
## List windows
|
|
||||||
|
|
||||||
List all windows without taking a screenshot:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl list-windows
|
deskctl list-windows
|
||||||
```
|
deskctl screenshot
|
||||||
|
deskctl screenshot /tmp/screen.png
|
||||||
## Get screen size
|
deskctl get active-window
|
||||||
|
deskctl get monitors
|
||||||
```sh
|
deskctl get version
|
||||||
|
deskctl get systeminfo
|
||||||
deskctl get-screen-size
|
deskctl get-screen-size
|
||||||
```
|
|
||||||
|
|
||||||
## Get mouse position
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl get-mouse-position
|
deskctl get-mouse-position
|
||||||
```
|
```
|
||||||
|
|
||||||
## Screenshot
|
`doctor` checks the runtime before daemon startup. `snapshot` produces a
|
||||||
|
screenshot plus window refs. `list-windows` is the same window tree without the
|
||||||
|
side effect of writing a screenshot.
|
||||||
|
|
||||||
Take a screenshot without the window tree. Optionally specify a save path:
|
## Wait
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
deskctl screenshot
|
deskctl wait window --selector 'title=Firefox' --timeout 10
|
||||||
deskctl screenshot /tmp/my-screenshot.png
|
deskctl wait focus --selector 'id=win3' --timeout 5
|
||||||
deskctl screenshot --annotate
|
deskctl --json wait window --selector 'class=firefox' --poll-ms 100
|
||||||
```
|
```
|
||||||
|
|
||||||
## Launch
|
Wait commands return the matched window payload on success. In `--json` mode,
|
||||||
|
timeouts and selector failures expose structured `kind` values.
|
||||||
|
|
||||||
Launch an application:
|
## Act on a window
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
deskctl launch firefox
|
deskctl launch firefox
|
||||||
deskctl launch code --args /path/to/project
|
deskctl focus @w1
|
||||||
|
deskctl focus 'title=Firefox'
|
||||||
|
deskctl click @w1
|
||||||
|
deskctl click 960,540
|
||||||
|
deskctl dblclick @w2
|
||||||
|
deskctl close @w3
|
||||||
|
deskctl move-window @w1 100 120
|
||||||
|
deskctl resize-window @w1 1280 720
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Selector-driven actions accept refs, explicit selector modes, or absolute
|
||||||
|
coordinates where appropriate.
|
||||||
|
|
||||||
|
## Input and mouse
|
||||||
|
|
||||||
|
```sh
|
||||||
|
deskctl type "hello world"
|
||||||
|
deskctl press enter
|
||||||
|
deskctl hotkey ctrl shift t
|
||||||
|
deskctl mouse move 100 200
|
||||||
|
deskctl mouse scroll 3
|
||||||
|
deskctl mouse scroll 3 --axis horizontal
|
||||||
|
deskctl mouse drag 100 200 500 600
|
||||||
|
```
|
||||||
|
|
||||||
|
Supported key names include `enter`, `tab`, `escape`, `backspace`, `delete`,
|
||||||
|
`space`, arrow keys, paging keys, `f1` through `f12`, and any single
|
||||||
|
character.
|
||||||
|
|
||||||
|
## Launch
|
||||||
|
|
||||||
|
```sh
|
||||||
|
deskctl launch firefox
|
||||||
|
deskctl launch code -- --new-window
|
||||||
|
```
|
||||||
|
|
||||||
|
## Selectors
|
||||||
|
|
||||||
|
Prefer explicit selectors when the target matters:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
ref=w1
|
||||||
|
id=win1
|
||||||
|
title=Firefox
|
||||||
|
class=firefox
|
||||||
|
focused
|
||||||
|
```
|
||||||
|
|
||||||
|
Legacy shorthand is still supported:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
@w1
|
||||||
|
w1
|
||||||
|
win1
|
||||||
|
```
|
||||||
|
|
||||||
|
Bare strings like `firefox` are fuzzy matches. They resolve when there is one
|
||||||
|
match and fail with candidate windows when there are multiple matches.
|
||||||
|
|
||||||
## Global options
|
## Global options
|
||||||
|
|
||||||
| Flag | Env | Description |
|
| Flag | Env | Description |
|
||||||
|
|
@ -174,3 +108,6 @@ deskctl launch code --args /path/to/project
|
||||||
| `--json` | | Output as JSON |
|
| `--json` | | Output as JSON |
|
||||||
| `--socket <path>` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
|
| `--socket <path>` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
|
||||||
| `--session <name>` | | Session name for multiple daemons (default: `default`) |
|
| `--session <name>` | | Session name for multiple daemons (default: `default`) |
|
||||||
|
|
||||||
|
`deskctl` manages the daemon automatically. Most users never need to think
|
||||||
|
about it beyond `--session` and `--socket`.
|
||||||
|
|
|
||||||
|
|
@ -8,17 +8,49 @@ import DocLayout from "../layouts/DocLayout.astro";
|
||||||
<img src="/favicon.svg" alt="" width="40" height="40" />
|
<img src="/favicon.svg" alt="" width="40" height="40" />
|
||||||
</header>
|
</header>
|
||||||
|
|
||||||
<p>
|
<p class="tagline">non-interactive desktop control for AI agents</p>
|
||||||
Desktop control CLI for AI agents on Linux X11. Compact JSON output for
|
|
||||||
agent loops. Screenshot, click, type, scroll, drag, and manage windows
|
<div class="badges" aria-label="package and runtime badges">
|
||||||
through a fast client-daemon architecture. 100% native Rust.
|
<a href="https://www.npmjs.com/package/deskctl-cli">
|
||||||
|
<img
|
||||||
|
src="https://img.shields.io/npm/v/deskctl-cli?label=npm"
|
||||||
|
alt="npm version badge"
|
||||||
|
/>
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/harivansh-afk/deskctl/releases">
|
||||||
|
<img
|
||||||
|
src="https://img.shields.io/github/v/release/harivansh-afk/deskctl?label=release"
|
||||||
|
alt="github release badge"
|
||||||
|
/>
|
||||||
|
</a>
|
||||||
|
<img
|
||||||
|
src="https://img.shields.io/badge/runtime-linux--x11-111827"
|
||||||
|
alt="linux x11 runtime badge"
|
||||||
|
/>
|
||||||
|
<a href="https://www.npmjs.com/package/deskctl-cli">
|
||||||
|
<img
|
||||||
|
src="https://img.shields.io/badge/install-npm%20i%20-g%20deskctl--cli-111827"
|
||||||
|
alt="npm install command badge"
|
||||||
|
/>
|
||||||
|
</a>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p class="lede">
|
||||||
|
<code>deskctl</code> is a thin X11 control primitive for agent loops: diagnose
|
||||||
|
the runtime, observe the desktop, wait for state transitions, act deterministically,
|
||||||
|
then verify.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<h2>Getting started</h2>
|
<pre><code>npm install -g deskctl-cli
|
||||||
|
deskctl doctor
|
||||||
|
deskctl snapshot --annotate</code></pre>
|
||||||
|
|
||||||
|
<h2>Start here</h2>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li><a href="/installation">Installation</a></li>
|
<li><a href="/installation">Installation</a></li>
|
||||||
<li><a href="/quick-start">Quick start</a></li>
|
<li><a href="/quick-start">Quick start</a></li>
|
||||||
|
<li><a href="/runtime-contract">Runtime contract</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<h2>Reference</h2>
|
<h2>Reference</h2>
|
||||||
|
|
@ -28,14 +60,27 @@ import DocLayout from "../layouts/DocLayout.astro";
|
||||||
<li><a href="/architecture">Architecture</a></li>
|
<li><a href="/architecture">Architecture</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<h2>Agent skill</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There is also an installable skill for <code>skills.sh</code>-style agent runtimes:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre><code>npx skills add harivansh-afk/deskctl -s deskctl</code></pre>
|
||||||
|
|
||||||
<h2>Links</h2>
|
<h2>Links</h2>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
|
<li>
|
||||||
|
<a href="https://www.npmjs.com/package/deskctl-cli">npm package</a>
|
||||||
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<a href="https://github.com/harivansh-afk/deskctl">GitHub</a>
|
<a href="https://github.com/harivansh-afk/deskctl">GitHub</a>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<a href="https://crates.io/crates/deskctl">crates.io</a>
|
<a href="https://github.com/harivansh-afk/deskctl/releases">
|
||||||
|
GitHub releases
|
||||||
|
</a>
|
||||||
</li>
|
</li>
|
||||||
</ul>
|
</ul>
|
||||||
</DocLayout>
|
</DocLayout>
|
||||||
|
|
|
||||||
|
|
@ -6,43 +6,68 @@ toc: true
|
||||||
|
|
||||||
# Installation
|
# Installation
|
||||||
|
|
||||||
## Cargo
|
## Default install
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
cargo install deskctl
|
npm install -g deskctl-cli
|
||||||
|
deskctl --help
|
||||||
```
|
```
|
||||||
|
|
||||||
## From source
|
`deskctl-cli` is the default install path. It installs the `deskctl` command by
|
||||||
|
downloading the matching GitHub Release asset for the supported runtime target.
|
||||||
|
|
||||||
|
## One-shot usage
|
||||||
|
|
||||||
|
```sh
|
||||||
|
npx deskctl-cli --help
|
||||||
|
```
|
||||||
|
|
||||||
|
## Agent skill
|
||||||
|
|
||||||
|
For `skills.sh`-style runtimes:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
npx skills add harivansh-afk/deskctl -s deskctl
|
||||||
|
```
|
||||||
|
|
||||||
|
The repo skill lives under `skills/deskctl` and is designed around the same
|
||||||
|
observe -> wait -> act -> verify loop as the CLI.
|
||||||
|
|
||||||
|
## Other install paths
|
||||||
|
|
||||||
|
### Nix
|
||||||
|
|
||||||
|
```sh
|
||||||
|
nix run github:harivansh-afk/deskctl -- --help
|
||||||
|
nix profile install github:harivansh-afk/deskctl
|
||||||
|
```
|
||||||
|
|
||||||
|
### Build from source
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
git clone https://github.com/harivansh-afk/deskctl
|
git clone https://github.com/harivansh-afk/deskctl
|
||||||
cd deskctl
|
cd deskctl
|
||||||
cargo build --release
|
cargo build
|
||||||
```
|
```
|
||||||
|
|
||||||
## Docker (cross-compile for Linux)
|
Source builds on Linux require:
|
||||||
|
|
||||||
Build a static Linux binary from any platform:
|
- Rust 1.75+
|
||||||
|
- `pkg-config`
|
||||||
|
- X11 development libraries such as `libx11-dev` and `libxtst-dev`
|
||||||
|
|
||||||
```sh
|
## Runtime requirements
|
||||||
docker compose -f docker/docker-compose.yml run --rm build
|
|
||||||
```
|
|
||||||
|
|
||||||
This writes `dist/deskctl-linux-x86_64`.
|
|
||||||
|
|
||||||
## Deploy to a remote machine
|
|
||||||
|
|
||||||
Copy the binary over SSH when `scp` is not available:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
|
|
||||||
```
|
|
||||||
|
|
||||||
## Requirements
|
|
||||||
|
|
||||||
- Linux with an active X11 session
|
- Linux with an active X11 session
|
||||||
- `DISPLAY` environment variable set (e.g. `DISPLAY=:1`)
|
- `DISPLAY` set to a usable X11 display, such as `DISPLAY=:1`
|
||||||
- `XDG_SESSION_TYPE=x11`
|
- `XDG_SESSION_TYPE=x11` or an equivalent X11 session environment
|
||||||
- A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`)
|
- a window manager or desktop environment that exposes standard EWMH properties
|
||||||
|
such as `_NET_CLIENT_LIST_STACKING` and `_NET_ACTIVE_WINDOW`
|
||||||
|
|
||||||
No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`).
|
The binary itself only depends on the standard Linux glibc runtime.
|
||||||
|
|
||||||
|
If setup fails, run:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
deskctl doctor
|
||||||
|
```
|
||||||
|
|
|
||||||
|
|
@ -6,50 +6,72 @@ toc: true
|
||||||
|
|
||||||
# Quick start
|
# Quick start
|
||||||
|
|
||||||
## Core workflow
|
## Install and diagnose
|
||||||
|
|
||||||
The typical agent loop is: snapshot the desktop, interpret the result, act on it.
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
# 1. see the desktop
|
npm install -g deskctl-cli
|
||||||
deskctl --json snapshot --annotate
|
deskctl doctor
|
||||||
|
```
|
||||||
|
|
||||||
# 2. click a window by its ref
|
Use `deskctl doctor` first. It checks X11 connectivity, basic enumeration,
|
||||||
deskctl click @w1
|
screenshot viability, and socket health before you start driving the desktop.
|
||||||
|
|
||||||
# 3. type into the focused window
|
## Observe
|
||||||
deskctl type "hello world"
|
|
||||||
|
|
||||||
# 4. press a key
|
```sh
|
||||||
|
deskctl snapshot --annotate
|
||||||
|
deskctl list-windows
|
||||||
|
deskctl get active-window
|
||||||
|
deskctl get monitors
|
||||||
|
```
|
||||||
|
|
||||||
|
Use `snapshot` when you want a screenshot artifact plus window refs. Use
|
||||||
|
`list-windows` when you only need the current window tree without writing a
|
||||||
|
screenshot.
|
||||||
|
|
||||||
|
## Target windows cleanly
|
||||||
|
|
||||||
|
Prefer explicit selectors when you need deterministic targeting:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
ref=w1
|
||||||
|
id=win1
|
||||||
|
title=Firefox
|
||||||
|
class=firefox
|
||||||
|
focused
|
||||||
|
```
|
||||||
|
|
||||||
|
Legacy refs such as `@w1` still work after `snapshot` or `list-windows`. Bare
|
||||||
|
strings like `firefox` are fuzzy matches and now fail on ambiguity.
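
As a rough illustration of the explicit selector forms listed above, the following Python sketch builds selector strings and rejects anything that is not one of the documented modes. The helper name `explicit_selector` is hypothetical and is not part of `deskctl`; it only formats strings that are then passed to the CLI.

```python
# Hypothetical helper: formats the explicit selector modes documented above.
EXPLICIT_MODES = ("ref", "id", "title", "class")


def explicit_selector(mode: str, value: str = "") -> str:
    """Return an explicit selector string such as 'title=Firefox' or 'focused'."""
    if mode == "focused":
        # `focused` is the only documented selector that takes no value.
        return "focused"
    if mode not in EXPLICIT_MODES:
        raise ValueError(f"unknown selector mode: {mode!r}")
    if not value:
        raise ValueError(f"selector mode {mode!r} needs a value")
    return f"{mode}={value}"
```

Building selectors this way keeps bare fuzzy strings out of automated callers, so ambiguity failures are less likely to surprise an agent loop.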
|
||||||
|
|
||||||
|
## Wait, act, verify
|
||||||
|
|
||||||
|
The core loop is:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# observe
|
||||||
|
deskctl snapshot --annotate
|
||||||
|
|
||||||
|
# wait
|
||||||
|
deskctl wait window --selector 'title=Firefox' --timeout 10
|
||||||
|
|
||||||
|
# act
|
||||||
|
deskctl focus 'title=Firefox'
|
||||||
|
deskctl hotkey ctrl l
|
||||||
|
deskctl type "https://example.com"
|
||||||
deskctl press enter
|
deskctl press enter
|
||||||
|
|
||||||
|
# verify
|
||||||
|
deskctl wait focus --selector 'title=Firefox' --timeout 5
|
||||||
|
deskctl snapshot
|
||||||
```
|
```
|
||||||
|
|
||||||
The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows.
|
The wait commands return the matched window payload on success, so they compose
|
||||||
|
cleanly into the next action.
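
The loop above could be driven from a script roughly as follows. This is a minimal Python sketch, not an official client. It assumes that `deskctl --json` prints the JSON envelope on stdout even when a command fails, and the names `run_deskctl`, `step`, and `open_url` are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_deskctl(args: Sequence[str]) -> str:
    """Run `deskctl --json <args>` and return stdout (assumed to hold the envelope)."""
    proc = subprocess.run(
        ["deskctl", "--json", *args], capture_output=True, text=True
    )
    return proc.stdout


def step(args: Sequence[str], runner: Runner = run_deskctl) -> Any:
    """Run one command and return its `data`, raising if `success` is false."""
    envelope = json.loads(runner(args))
    if not envelope["success"]:
        raise RuntimeError(f"deskctl {' '.join(args)} failed: {envelope['error']}")
    return envelope["data"]


def open_url(url: str, selector: str = "title=Firefox",
             runner: Runner = run_deskctl) -> Any:
    """Observe, wait, act, then verify, mirroring the shell loop above."""
    step(["snapshot", "--annotate"], runner)                                      # observe
    step(["wait", "window", "--selector", selector, "--timeout", "10"], runner)  # wait
    step(["focus", selector], runner)                                             # act
    step(["hotkey", "ctrl", "l"], runner)
    step(["type", url], runner)
    step(["press", "enter"], runner)
    # verify: the final wait returns the matched window payload on success
    return step(["wait", "focus", "--selector", selector, "--timeout", "5"], runner)
```

Passing the runner as a parameter keeps the sequence testable without a live X11 session; in real use the default `run_deskctl` shells out to the installed binary.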
|
||||||
|
|
||||||
## Window refs
|
## Use `--json` when parsing matters
|
||||||
|
|
||||||
Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
|
Every command supports `--json` and uses the same top-level envelope:
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl click @w1
|
|
||||||
deskctl focus @w3
|
|
||||||
deskctl close @w2
|
|
||||||
```
|
|
||||||
|
|
||||||
You can also select windows by name (case-insensitive substring match):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl focus "firefox"
|
|
||||||
deskctl close "terminal"
|
|
||||||
```
|
|
||||||
|
|
||||||
## JSON output
|
|
||||||
|
|
||||||
Pass `--json` for machine-readable output. This is the primary mode for agent integrations:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deskctl --json snapshot
|
|
||||||
```
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
|
@ -59,7 +81,7 @@ deskctl --json snapshot
|
||||||
"windows": [
|
"windows": [
|
||||||
{
|
{
|
||||||
"ref_id": "w1",
|
"ref_id": "w1",
|
||||||
"xcb_id": 12345678,
|
"window_id": "win1",
|
||||||
"title": "Firefox",
|
"title": "Firefox",
|
||||||
"app_name": "firefox",
|
"app_name": "firefox",
|
||||||
"x": 0,
|
"x": 0,
|
||||||
|
|
@ -74,14 +96,8 @@ deskctl --json snapshot
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## Daemon lifecycle
|
Use `window_id` for stable targeting inside a live daemon session. The exact
|
||||||
|
text formatting is intentionally compact, but JSON is the parsing contract.
|
||||||
|
|
||||||
The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
|
The full stable-vs-best-effort contract lives on the
|
||||||
|
[runtime contract](/runtime-contract) page.
|
||||||
```sh
|
|
||||||
# check if the daemon is running
|
|
||||||
deskctl daemon status
|
|
||||||
|
|
||||||
# stop it explicitly
|
|
||||||
deskctl daemon stop
|
|
||||||
```
|
|
||||||
|
|
|
||||||
177
site/src/pages/runtime-contract.mdx
Normal file
|
|
@ -0,0 +1,177 @@
|
||||||
|
---
|
||||||
|
layout: ../layouts/DocLayout.astro
|
||||||
|
title: Runtime contract
|
||||||
|
toc: true
|
||||||
|
---
|
||||||
|
|
||||||
|
# Runtime contract
|
||||||
|
|
||||||
|
This page defines the current public output contract for `deskctl`.
|
||||||
|
|
||||||
|
It is intentionally scoped to the current Linux X11 runtime surface. It does
|
||||||
|
not promise stability for future Wayland or window-manager-specific features.
|
||||||
|
|
||||||
|
## JSON envelope
|
||||||
|
|
||||||
|
Every command supports `--json` and uses the same top-level envelope:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"success": true,
|
||||||
|
"data": {},
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Stable top-level fields:
|
||||||
|
|
||||||
|
- `success`
|
||||||
|
- `data`
|
||||||
|
- `error`
|
||||||
|
|
||||||
|
If `success` is `false`, the command exits non-zero in both text mode and JSON
|
||||||
|
mode.
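
A consumer might check the envelope before touching `data`. The Python sketch below is one way to do that, under the assumption that the three stable fields are always present. The shape of `error` on failure is not specified on this page, so the sketch only embeds it in a message; `parse_envelope` and `unwrap` are hypothetical names.

```python
import json
from typing import Any


def parse_envelope(raw: str) -> dict[str, Any]:
    """Parse one `--json` response and check the three stable top-level fields."""
    envelope = json.loads(raw)
    missing = {"success", "data", "error"} - envelope.keys()
    if missing:
        raise ValueError(f"envelope is missing fields: {sorted(missing)}")
    return envelope


def unwrap(raw: str) -> Any:
    """Return `data` on success; raise with the reported error otherwise."""
    envelope = parse_envelope(raw)
    if not envelope["success"]:
        raise RuntimeError(f"deskctl reported failure: {envelope['error']}")
    return envelope["data"]
```

Checking `success` in the payload as well as the process exit code lets the same code handle captured output, such as logs, where the exit status is no longer available.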
|
||||||
|
|
||||||
|
## Stable window fields
|
||||||
|
|
||||||
|
Whenever a response includes a window payload, these fields are stable:
|
||||||
|
|
||||||
|
- `ref_id`
|
||||||
|
- `window_id`
|
||||||
|
- `title`
|
||||||
|
- `app_name`
|
||||||
|
- `x`
|
||||||
|
- `y`
|
||||||
|
- `width`
|
||||||
|
- `height`
|
||||||
|
- `focused`
|
||||||
|
- `minimized`
|
||||||
|
|
||||||
|
`window_id` is the public session-scoped identifier for programmatic targeting.
|
||||||
|
`ref_id` is a short-lived convenience handle from the current ref map.
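
To make the stable window payload concrete, here is a small Python sketch that models it as a dataclass. The field names come from the list above. The Python types are assumptions based on the sample payload (`"ref_id": "w1"`, `"window_id": "win1"`, `"x": 0`), and the `Window` class itself is hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class Window:
    # Field names follow the stable list above; the types are assumed.
    ref_id: str
    window_id: str
    title: str
    app_name: str
    x: int
    y: int
    width: int
    height: int
    focused: bool
    minimized: bool

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> "Window":
        """Keep only the stable fields and ignore any extra keys."""
        names = [f.name for f in fields(cls)]
        return cls(**{name: payload[name] for name in names})

    def selector(self) -> str:
        """Explicit selector built from the session-scoped `window_id`."""
        return f"id={self.window_id}"
```

Targeting by `id=` instead of `ref=` follows the guidance above: `window_id` is meant for programmatic use within a session, while `ref_id` is only valid for the current ref map.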
|
||||||
|
|
||||||
|
## Stable grouped reads
|
||||||
|
|
||||||
|
`deskctl get active-window`
|
||||||
|
|
||||||
|
- stable: `data.window`
|
||||||
|
|
||||||
|
`deskctl get monitors`
|
||||||
|
|
||||||
|
- stable: `data.count`
|
||||||
|
- stable: `data.monitors`
|
||||||
|
|
||||||
|
Stable per-monitor fields:
|
||||||
|
|
||||||
|
- `name`
|
||||||
|
- `x`
|
||||||
|
- `y`
|
||||||
|
- `width`
|
||||||
|
- `height`
|
||||||
|
- `width_mm`
|
||||||
|
- `height_mm`
|
||||||
|
- `primary`
|
||||||
|
- `automatic`
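
As an example of consuming the per-monitor fields, the Python sketch below picks the primary monitor from a `get monitors` payload and returns the pixel at its center, which could then be passed to a coordinate action such as `deskctl click X,Y`. It assumes integer geometry and a boolean `primary` flag; `primary_center` is a hypothetical helper.

```python
from typing import Any, Mapping


def primary_center(data: Mapping[str, Any]) -> tuple[int, int]:
    """Return the center of the primary monitor, or of the first one if none is flagged."""
    monitors = data["monitors"]
    if not monitors:
        raise ValueError("no monitors reported")
    # Fall back to the first monitor when no entry has `primary` set.
    monitor = next((m for m in monitors if m["primary"]), monitors[0])
    return (
        monitor["x"] + monitor["width"] // 2,
        monitor["y"] + monitor["height"] // 2,
    )
```

The sketch deliberately ignores `name`, since exact monitor naming is environment-dependent.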
|
||||||
|
|
||||||
|
`deskctl get version`
|
||||||
|
|
||||||
|
- stable: `data.version`
|
||||||
|
- stable: `data.backend`
|
||||||
|
|
||||||
|
`deskctl get systeminfo`
|
||||||
|
|
||||||
|
- stable: `data.backend`
|
||||||
|
- stable: `data.display`
|
||||||
|
- stable: `data.session_type`
|
||||||
|
- stable: `data.session`
|
||||||
|
- stable: `data.socket_path`
|
||||||
|
- stable: `data.screen`
|
||||||
|
- stable: `data.monitor_count`
|
||||||
|
- stable: `data.monitors`
|
||||||
|
|
||||||
|
## Stable waits
|
||||||
|
|
||||||
|
`deskctl wait window`
|
||||||
|
`deskctl wait focus`
|
||||||
|
|
||||||
|
- stable: `data.wait`
|
||||||
|
- stable: `data.selector`
|
||||||
|
- stable: `data.elapsed_ms`
|
||||||
|
- stable: `data.window`
|
||||||
|
|
||||||
|
## Stable selector-driven action fields
|
||||||
|
|
||||||
|
When selector-driven actions return resolved window data, these fields are
|
||||||
|
stable when present:
|
||||||
|
|
||||||
|
- `data.ref_id`
|
||||||
|
- `data.window_id`
|
||||||
|
- `data.title`
|
||||||
|
- `data.selector`
|
||||||
|
|
||||||
|
This applies to:
|
||||||
|
|
||||||
|
- `click`
|
||||||
|
- `dblclick`
|
||||||
|
- `focus`
|
||||||
|
- `close`
|
||||||
|
- `move-window`
|
||||||
|
- `resize-window`
|
||||||
|
|
||||||
|
## Stable artifact fields
|
||||||
|
|
||||||
|
For `snapshot` and `screenshot`:
|
||||||
|
|
||||||
|
- stable: `data.screenshot`
|
||||||
|
|
||||||
|
When a command also returns windows, `data.windows` uses the stable window
|
||||||
|
payload documented above.
|
||||||
|
|
||||||
|
## Stable structured error kinds
|
||||||
|
|
||||||
|
When a command fails with structured JSON data, these error kinds are stable:
|
||||||
|
|
||||||
|
- `selector_not_found`
|
||||||
|
- `selector_ambiguous`
|
||||||
|
- `selector_invalid`
|
||||||
|
- `timeout`
|
||||||
|
- `not_found`
|
||||||
|
- `window_not_focused` in `data.last_observation.kind` or an equivalent wait
|
||||||
|
observation payload
|
||||||
|
|
||||||
|
Stable structured failure fields include:
|
||||||
|
|
||||||
|
- `data.kind`
|
||||||
|
- `data.selector`
|
||||||
|
- `data.mode`
|
||||||
|
- `data.candidates`
|
||||||
|
- `data.message`
|
||||||
|
- `data.wait`
|
||||||
|
- `data.timeout_ms`
|
||||||
|
- `data.poll_ms`
|
||||||
|
- `data.last_observation`
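
One possible way to branch on the stable `kind` values is sketched below in Python. The recovery hints are illustrative only and are not behavior defined by `deskctl`. The sketch also assumes `data.candidates` is a list; `describe_failure` is a hypothetical name.

```python
from typing import Any, Mapping


def describe_failure(data: Mapping[str, Any]) -> str:
    """Turn a structured failure payload into a short recovery hint."""
    kind = data.get("kind")
    selector = data.get("selector")
    if kind == "selector_ambiguous":
        # Assumed: `candidates` is a list of matching windows.
        candidates = data.get("candidates") or []
        return (f"selector {selector!r} matched {len(candidates)} windows; "
                "retry with an explicit id= selector")
    if kind == "selector_not_found":
        return f"selector {selector!r} matched nothing; take a fresh snapshot"
    if kind == "selector_invalid":
        return f"selector {selector!r} is malformed"
    if kind == "timeout":
        return f"wait {data.get('wait')!r} timed out after {data.get('timeout_ms')} ms"
    # Anything else falls back to the reported message, if there is one.
    return data.get("message") or f"unhandled failure kind: {kind!r}"
```

Matching on `kind` instead of on stderr text keeps the caller within the stable part of the contract.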
|
||||||
|
|
||||||
|
## Best-effort fields
|
||||||
|
|
||||||
|
These values are useful but environment-dependent and should not be treated as
|
||||||
|
strict parsing guarantees:
|
||||||
|
|
||||||
|
- exact monitor naming conventions
|
||||||
|
- EWMH/window-manager-dependent ordering details
|
||||||
|
- cosmetic text formatting in non-JSON mode
|
||||||
|
- default screenshot file names when no explicit path was provided
|
||||||
|
- stderr wording outside the structured `kind` classifications above
|
||||||
|
|
||||||
|
## Text mode expectations
|
||||||
|
|
||||||
|
Text mode is intended to stay compact and useful for follow-up commands.
|
||||||
|
|
||||||
|
The exact whitespace and alignment are not stable. The stable behavioral
|
||||||
|
expectations are:
|
||||||
|
|
||||||
|
- important reads print actionable identifiers or geometry
|
||||||
|
- selector failures print enough detail to recover without `--json`
|
||||||
|
- artifact-producing commands print the artifact path
|
||||||
|
- window listings print both `@wN` refs and `window_id` values
|
||||||
|
|
||||||
|
If you need strict parsing, use `--json`.
|
||||||
|
|
@ -65,6 +65,23 @@ main {
|
||||||
font-style: italic;
|
font-style: italic;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
.lede {
|
||||||
|
font-size: 1.05rem;
|
||||||
|
max-width: 42rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.badges {
|
||||||
|
display: flex;
|
||||||
|
flex-wrap: wrap;
|
||||||
|
gap: 0.6rem;
|
||||||
|
margin-bottom: 1.25rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.badges a,
|
||||||
|
.badges img {
|
||||||
|
display: block;
|
||||||
|
}
|
||||||
|
|
||||||
header {
|
header {
|
||||||
display: flex;
|
display: flex;
|
||||||
align-items: center;
|
align-items: center;
|
||||||
|
|
@ -117,6 +134,10 @@ a:hover {
|
||||||
text-decoration-thickness: 2px;
|
text-decoration-thickness: 2px;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
img {
|
||||||
|
max-width: 100%;
|
||||||
|
}
|
||||||
|
|
||||||
ul,
|
ul,
|
||||||
ol {
|
ol {
|
||||||
padding-left: 1.25em;
|
padding-left: 1.25em;
|
||||||
|
|
|
||||||
|
|
@ -1,21 +1,22 @@
|
||||||
# deskctl commands
|
# deskctl commands
|
||||||
|
|
||||||
All commands support `--json` for machine-parseable output following the runtime contract.
|
All commands support `--json` for machine-parseable output following the
|
||||||
|
runtime contract.
|
||||||
|
|
||||||
## Observe
|
## Observe
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
deskctl doctor # check X11 runtime and daemon health
|
deskctl doctor
|
||||||
deskctl snapshot # screenshot + window list
|
deskctl snapshot
|
||||||
deskctl snapshot --annotate # screenshot with @wN labels overlaid
|
deskctl snapshot --annotate
|
||||||
deskctl list-windows # window list only (no screenshot)
|
deskctl list-windows
|
||||||
deskctl screenshot /tmp/screen.png # screenshot to explicit path
|
deskctl screenshot /tmp/screen.png
|
||||||
deskctl get active-window # focused window info
|
deskctl get active-window
|
||||||
deskctl get monitors # monitor geometry
|
deskctl get monitors
|
||||||
deskctl get version # version and backend
|
deskctl get version
|
||||||
deskctl get systeminfo # full runtime diagnostics
|
deskctl get systeminfo
|
||||||
deskctl get-screen-size # screen resolution
|
deskctl get-screen-size
|
||||||
deskctl get-mouse-position # cursor coordinates
|
deskctl get-mouse-position
|
||||||
```
|
```
|
||||||
|
|
||||||
## Wait
|
## Wait
|
||||||
|
|
@ -25,19 +26,21 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
|
||||||
deskctl wait focus --selector 'class=firefox' --timeout 5
|
deskctl wait focus --selector 'class=firefox' --timeout 5
|
||||||
```
|
```
|
||||||
|
|
||||||
Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
|
Returns the matched window payload on success. Failures include structured
|
||||||
|
`kind` values in `--json` mode.
|
||||||
|
|
||||||
## Selectors
|
## Selectors
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ref=w1 # snapshot ref (short-lived, from last snapshot)
|
ref=w1
|
||||||
id=win1 # stable window ID (session-scoped)
|
id=win1
|
||||||
title=Firefox # match by window title
|
title=Firefox
|
||||||
class=firefox # match by WM class
|
class=firefox
|
||||||
focused # currently focused window
|
focused
|
||||||
```
|
```
|
||||||
|
|
||||||
Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
|
Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail
|
||||||
|
on ambiguity.
|
||||||
|
|
||||||
## Act
|
## Act
|
||||||
|
|
||||||
|
|
@ -58,12 +61,5 @@ deskctl close @w3
|
||||||
deskctl launch firefox
|
deskctl launch firefox
|
||||||
```
|
```
|
||||||
|
|
||||||
## Daemon
|
The daemon starts automatically on first command. In normal usage you should
|
||||||
|
not need to manage it directly.
|
||||||
```bash
|
|
||||||
deskctl daemon start
|
|
||||||
deskctl daemon stop
|
|
||||||
deskctl daemon status
|
|
||||||
```
|
|
||||||
|
|
||||||
The daemon starts automatically on first command. Manual control is rarely needed.
|
|
||||||
|
|
|
||||||