From e3c96878b19d27b1aefeb1e4d6f4f954318b370c Mon Sep 17 00:00:00 2001
From: Harivansh Rathi
Date: Wed, 25 Mar 2026 16:01:43 -0400
Subject: [PATCH] scaffold docs

---
 site/src/pages/architecture.mdx |  67 +++++++++++-
 site/src/pages/commands.mdx     | 176 ++++++++++++++++++++++++++++++++
 site/src/pages/index.astro      |  16 ++-
 site/src/pages/installation.mdx |  35 ++++---
 site/src/pages/quick-start.mdx  |  87 ++++++++++++++++
 site/src/pages/usage.mdx        |  53 ----------
 6 files changed, 356 insertions(+), 78 deletions(-)
 create mode 100644 site/src/pages/commands.mdx
 create mode 100644 site/src/pages/quick-start.mdx
 delete mode 100644 site/src/pages/usage.mdx

diff --git a/site/src/pages/architecture.mdx b/site/src/pages/architecture.mdx
index 064f995..ec874e2 100644
--- a/site/src/pages/architecture.mdx
+++ b/site/src/pages/architecture.mdx
@@ -8,12 +8,71 @@ toc: true
 
 ## Client-daemon model
 
-deskctl uses a client-daemon architecture over Unix sockets with an NDJSON wire protocol. The daemon starts automatically on the first command and keeps the X11 connection alive for fast repeated calls.
+deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
+
+Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
+
+## Wire protocol
+
+Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
+
+**Request:**
+
+```json
+{"id": "r123456", "action": "snapshot", "annotate": true}
+```
+
+**Response:**
+
+```json
+{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
+```
+
+Error responses include an `error` field:
+
+```json
+{"success": false, "error": "window not found: @w99"}
+```
+
+## Socket location
+
+The daemon socket is resolved in this order:
+
+1. `--socket` flag (highest priority)
+2. `$DESKCTL_SOCKET_DIR/{session}.sock`
+3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
+4. `~/.deskctl/{session}.sock`
+
+PID files are stored alongside the socket.
+
+## Sessions
+
+Multiple isolated daemon instances can run simultaneously using the `--session` flag:
+
+```sh
+deskctl --session workspace1 snapshot
+deskctl --session workspace2 snapshot
+```
+
+Each session has its own socket, PID file, and window ref map.
 
 ## Backend design
 
-The backend is trait-based, making it straightforward to add support for different display servers. The current implementation targets X11 via `x11rb`.
+The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
 
-## Wayland support
+The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
 
-Coming soon. The trait-based backend design means adding Hyprland/Wayland support is a single trait implementation with zero refactoring of the core.
+## X11 integration
+
+Window detection uses EWMH properties:
+
+| Property | Purpose |
+|----------|---------|
+| `_NET_CLIENT_LIST_STACKING` | Window stacking order |
+| `_NET_ACTIVE_WINDOW` | Currently focused window |
+| `_NET_WM_NAME` | Window title (UTF-8) |
+| `_NET_WM_STATE_HIDDEN` | Minimized state |
+| `_NET_CLOSE_WINDOW` | Graceful close |
+| `WM_CLASS` | Application class/name |
+
+Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.
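Note on the architecture.mdx hunk: the socket lookup order and the one-request-per-connection NDJSON exchange it documents can be sketched in Python as below. This is an illustration based only on the text in the hunk, not the project's implementation. The function names, the caller-supplied `request_id`, and the flat layout of extra parameters (copied from the single `snapshot` example) are assumptions, and the `DESKCTL_SOCKET` variable listed in the global options table is not handled here.

```python
import json
import os
import socket
from pathlib import Path


def resolve_socket_path(session="default", socket_flag=None, env=None, home=None):
    """Apply the documented lookup order and return the socket path as a string."""
    env = os.environ if env is None else env
    if socket_flag:                       # 1. --socket flag wins
        return str(socket_flag)
    name = f"{session}.sock"
    if env.get("DESKCTL_SOCKET_DIR"):     # 2. explicit socket directory
        return str(Path(env["DESKCTL_SOCKET_DIR"]) / name)
    if env.get("XDG_RUNTIME_DIR"):        # 3. per-user runtime directory
        return str(Path(env["XDG_RUNTIME_DIR"]) / "deskctl" / name)
    base = Path(home) if home else Path.home()
    return str(base / ".deskctl" / name)  # 4. fallback under the home directory


def encode_request(request_id, action, **params):
    """Serialize one request as a single NDJSON line (JSON plus a trailing newline)."""
    # Assumption: extra parameters sit next to "id" and "action", as in the example.
    payload = {"id": request_id, "action": action, **params}
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")


def decode_response(line):
    """Parse one NDJSON response line; raise if the daemon reported a failure."""
    reply = json.loads(line)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error", "unknown error"))
    return reply.get("data")


def send_request(socket_path, request_id, action, **params):
    """One connection per command: connect, send one line, read one line, close."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(socket_path)
        conn.sendall(encode_request(request_id, action, **params))
        with conn.makefile("rb") as stream:
            return decode_response(stream.readline())
```

With a daemon running, `send_request(resolve_socket_path("workspace1"), "r1", "snapshot", annotate=True)` would be the rough equivalent of `deskctl --session workspace1 snapshot --annotate`, assuming the daemon accepts the request shape shown in the hunk.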
diff --git a/site/src/pages/commands.mdx b/site/src/pages/commands.mdx
new file mode 100644
index 0000000..bd639c7
--- /dev/null
+++ b/site/src/pages/commands.mdx
@@ -0,0 +1,176 @@
+---
+layout: ../layouts/DocLayout.astro
+title: Commands
+toc: true
+---
+
+# Commands
+
+## Snapshot
+
+Capture a screenshot and get the window tree:
+
+```sh
+deskctl snapshot
+deskctl snapshot --annotate
+```
+
+With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
+
+The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
+
+## Click
+
+Click the center of a window by ref, or click exact coordinates:
+
+```sh
+deskctl click @w1
+deskctl click 960,540
+```
+
+## Double click
+
+```sh
+deskctl dblclick @w1
+deskctl dblclick 500,300
+```
+
+## Type
+
+Type a string into the focused window:
+
+```sh
+deskctl type "hello world"
+```
+
+## Press
+
+Press a single key:
+
+```sh
+deskctl press enter
+deskctl press tab
+deskctl press escape
+```
+
+Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
+
+## Hotkey
+
+Send a key combination. List modifier keys first, then the target key:
+
+```sh
+deskctl hotkey ctrl c
+deskctl hotkey ctrl shift t
+deskctl hotkey alt f4
+```
+
+Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
+
+## Mouse move
+
+Move the cursor to absolute coordinates:
+
+```sh
+deskctl mouse move 100 200
+```
+
+## Mouse scroll
+
+Scroll the mouse wheel. Positive values scroll down, negative scroll up:
+
+```sh
+deskctl mouse scroll 3
+deskctl mouse scroll -5
+deskctl mouse scroll 3 --axis horizontal
+```
+
+## Mouse drag
+
+Drag from one position to another:
+
+```sh
+deskctl mouse drag 100 200 500 600
+```
+
+## Focus
+
+Focus a window by ref or by name (case-insensitive substring match):
+
+```sh
+deskctl focus @w1
+deskctl focus "firefox"
+```
+
+## Close
+
+Close a window gracefully:
+
+```sh
+deskctl close @w2
+deskctl close "terminal"
+```
+
+## Move window
+
+Move a window to an absolute position:
+
+```sh
+deskctl move-window @w1 0 0
+deskctl move-window "firefox" 100 100
+```
+
+## Resize window
+
+Resize a window:
+
+```sh
+deskctl resize-window @w1 1280 720
+```
+
+## List windows
+
+List all windows without taking a screenshot:
+
+```sh
+deskctl list-windows
+```
+
+## Get screen size
+
+```sh
+deskctl get-screen-size
+```
+
+## Get mouse position
+
+```sh
+deskctl get-mouse-position
+```
+
+## Screenshot
+
+Take a screenshot without the window tree. Optionally specify a save path:
+
+```sh
+deskctl screenshot
+deskctl screenshot /tmp/my-screenshot.png
+deskctl screenshot --annotate
+```
+
+## Launch
+
+Launch an application:
+
+```sh
+deskctl launch firefox
+deskctl launch code --args /path/to/project
+```
+
+## Global options
+
+| Flag | Env | Description |
+|------|-----|-------------|
+| `--json` | | Output as JSON |
+| `--socket ` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
+| `--session ` | | Session name for multiple daemons (default: `default`) |
diff --git a/site/src/pages/index.astro b/site/src/pages/index.astro
index 607835e..8fcd07c 100644
--- a/site/src/pages/index.astro
+++ b/site/src/pages/index.astro
@@ -9,16 +9,22 @@ import DocLayout from "../layouts/DocLayout.astro";

- X11 desktop control CLI for AI agents on Linux. Snapshot, click, type, and
- focus windows through a simple command-line interface with a client-daemon
- architecture over Unix sockets.
+ Desktop control CLI for AI agents on Linux X11. Compact JSON output
+ for agent loops. Screenshot, click, type, scroll, drag, and manage
+ windows through a fast client-daemon architecture. 100% native Rust.

-

Documentation

+

Getting started

+ +

Reference

+
+
diff --git a/site/src/pages/installation.mdx b/site/src/pages/installation.mdx
index faeca27..e05772d 100644
--- a/site/src/pages/installation.mdx
+++ b/site/src/pages/installation.mdx
@@ -1,6 +1,7 @@
 ---
 layout: ../layouts/DocLayout.astro
 title: Installation
+toc: true
 ---
 
 # Installation
@@ -11,9 +12,17 @@ title: Installation
 cargo install deskctl
 ```
 
-## Docker build
+## From source
 
-Build a Linux binary with Docker:
+```sh
+git clone https://github.com/harivansh-afk/deskctl
+cd deskctl
+cargo build --release
+```
+
+## Docker (cross-compile for Linux)
+
+Build a static Linux binary from any platform:
 
 ```sh
 docker compose -f docker/docker-compose.yml run --rm build
@@ -21,25 +30,19 @@ docker compose -f docker/docker-compose.yml run --rm build
 
 This writes `dist/deskctl-linux-x86_64`.
 
-## From source
-
-```sh
-git clone https://github.com/harivansh-afk/deskctl
-cd deskctl
-cargo build
-```
-
 ## Deploy to a remote machine
 
-Copy the binary to an SSH machine:
+Copy the binary over SSH when `scp` is not available:
 
 ```sh
-ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
+ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
 ```
 
-## Runtime requirements
+## Requirements
 
-- Linux with X11 session
-- `DISPLAY` environment variable set
+- Linux with an active X11 session
+- `DISPLAY` environment variable set (e.g. `DISPLAY=:1`)
 - `XDG_SESSION_TYPE=x11`
-- A window manager that exposes EWMH properties
+- A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`)
+
+No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`).
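Note on the installation.mdx hunk: the two environment requirements it lists (`DISPLAY` set, `XDG_SESSION_TYPE=x11`) are easy to check before running any command. The sketch below is one possible pre-flight check, not something deskctl ships. The function name `check_x11_environment` is hypothetical, and the EWMH requirement is not covered because verifying it needs a live X11 connection.

```python
import os


def check_x11_environment(env=None):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    # Requirement from the docs: DISPLAY must be set (for example DISPLAY=:1).
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set (expected something like :1)")
    # Requirement from the docs: the session type must be x11.
    session_type = env.get("XDG_SESSION_TYPE")
    if session_type != "x11":
        problems.append(f"XDG_SESSION_TYPE is {session_type!r}, expected 'x11'")
    return problems
```

An agent harness could call `check_x11_environment()` once at startup and print the returned messages instead of letting the first `deskctl` call fail.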
diff --git a/site/src/pages/quick-start.mdx b/site/src/pages/quick-start.mdx
new file mode 100644
index 0000000..7f3bc07
--- /dev/null
+++ b/site/src/pages/quick-start.mdx
@@ -0,0 +1,87 @@
+---
+layout: ../layouts/DocLayout.astro
+title: Quick start
+toc: true
+---
+
+# Quick start
+
+## Core workflow
+
+The typical agent loop is: snapshot the desktop, interpret the result, act on it.
+
+```sh
+# 1. see the desktop
+deskctl --json snapshot --annotate
+
+# 2. click a window by its ref
+deskctl click @w1
+
+# 3. type into the focused window
+deskctl type "hello world"
+
+# 4. press a key
+deskctl press enter
+```
+
+The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows.
+
+## Window refs
+
+Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
+
+```sh
+deskctl click @w1
+deskctl focus @w3
+deskctl close @w2
+```
+
+You can also select windows by name (case-insensitive substring match):
+
+```sh
+deskctl focus "firefox"
+deskctl close "terminal"
+```
+
+## JSON output
+
+Pass `--json` for machine-readable output. This is the primary mode for agent integrations:
+
+```sh
+deskctl --json snapshot
+```
+
+```json
+{
+  "success": true,
+  "data": {
+    "screenshot": "/tmp/deskctl-1234567890.png",
+    "windows": [
+      {
+        "ref_id": "w1",
+        "xcb_id": 12345678,
+        "title": "Firefox",
+        "app_name": "firefox",
+        "x": 0,
+        "y": 0,
+        "width": 1920,
+        "height": 1080,
+        "focused": true,
+        "minimized": false
+      }
+    ]
+  }
+}
+```
+
+## Daemon lifecycle
+
+The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
+
+```sh
+# check if the daemon is running
+deskctl daemon status
+
+# stop it explicitly
+deskctl daemon stop
+```
diff --git a/site/src/pages/usage.mdx b/site/src/pages/usage.mdx
deleted file mode 100644
index 43118f6..0000000
--- a/site/src/pages/usage.mdx
+++ /dev/null
@@ -1,53 +0,0 @@
----
-layout: ../layouts/DocLayout.astro
-title: Usage
-toc: true
----
-
-# Usage
-
-## Snapshot
-
-Capture the current desktop state:
-
-```sh
-deskctl snapshot
-```
-
-With annotations overlaid on windows:
-
-```sh
-deskctl --json snapshot --annotate
-```
-
-## Click
-
-Click a window by its annotation handle:
-
-```sh
-deskctl click @w1
-```
-
-## Type
-
-Type text into the focused window:
-
-```sh
-deskctl type "hello world"
-```
-
-## Focus
-
-Focus a window by name:
-
-```sh
-deskctl focus "firefox"
-```
-
-## JSON output
-
-Pass `--json` for machine-readable output, useful for agent integrations:
-
-```sh
-deskctl --json snapshot
-```
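Note on the `--json` output: an agent consuming the snapshot reply shown in quick-start.mdx might parse it roughly as below. This is a sketch built from that one sample reply. The helper names are hypothetical, and mapping `ref_id` (`"w1"`) to the CLI selector `@w1` by prefixing `@` is an assumption inferred from the examples, not a documented rule.

```python
import json


def visible_windows(snapshot_json):
    """Parse a `deskctl --json snapshot` reply and return the non-minimized windows."""
    reply = json.loads(snapshot_json)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error", "unknown error"))
    return [w for w in reply["data"]["windows"] if not w.get("minimized")]


def selector_for(window):
    """Build the @wN selector for a window entry (assumed to be '@' + ref_id)."""
    return "@" + window["ref_id"]


def focused_selector(snapshot_json):
    """Return the selector of the focused window, or None if nothing is focused."""
    for window in visible_windows(snapshot_json):
        if window.get("focused"):
            return selector_for(window)
    return None
```

The JSON text would typically come from running `deskctl --json snapshot` as a subprocess and reading its standard output; the returned selector can then be passed to commands such as `deskctl click`.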