scaffold docs

Harivansh Rathi 2026-03-25 16:01:43 -04:00
parent c69d0fa569
commit bc43b5878b
6 changed files with 356 additions and 78 deletions


@@ -8,12 +8,71 @@ toc: true
## Client-daemon model
deskctl uses a client-daemon architecture over Unix sockets with an NDJSON wire protocol. The daemon starts automatically on the first command and keeps the X11 connection alive for fast repeated calls. deskctl uses a client-daemon architecture over Unix sockets. The daemon starts automatically on the first command and keeps the X11 connection alive so repeated calls skip the connection setup overhead.
Each command opens a new connection to the daemon, sends a single NDJSON request, reads one NDJSON response, and exits.
## Wire protocol
Requests and responses are newline-delimited JSON (NDJSON) over a Unix socket.
**Request:**
```json
{"id": "r123456", "action": "snapshot", "annotate": true}
```
**Response:**
```json
{"success": true, "data": {"screenshot": "/tmp/deskctl-1234567890.png", "windows": [...]}}
```
Error responses include an `error` field:
```json
{"success": false, "error": "window not found: @w99"}
```
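The exchange above can be sketched from the client side. This is a minimal illustration in Python, not deskctl's own client code: the function names `encode_request`, `read_response`, and `send_command` are hypothetical, and the only protocol details assumed are the ones shown above (one JSON object per line, a `success` flag, and `data` or `error`).

```python
import json
import socket


def encode_request(request_id: str, action: str, **params) -> bytes:
    """Serialize one request as a single NDJSON line (JSON object + newline)."""
    payload = {"id": request_id, "action": action, **params}
    return json.dumps(payload).encode("utf-8") + b"\n"


def read_response(sock: socket.socket) -> dict:
    """Read one newline-terminated line from the socket and parse it as JSON."""
    with sock.makefile("rb") as stream:
        line = stream.readline()
    if not line:
        raise ConnectionError("connection closed before a response arrived")
    return json.loads(line)


def send_command(socket_path: str, request_id: str, action: str, **params) -> dict:
    """Open a connection, send one request, read one response, then close."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(encode_request(request_id, action, **params))
        response = read_response(sock)
    # Assumption: failures carry success=false plus an "error" string.
    if not response.get("success"):
        raise RuntimeError(response.get("error", "unknown error"))
    return response.get("data", {})
```

A call such as `send_command(path, "r123456", "snapshot", annotate=True)` would produce the request line shown earlier, assuming `path` points at a running daemon's socket.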
## Socket location
The daemon socket is resolved in this order:
1. `--socket` flag (highest priority)
2. `$DESKCTL_SOCKET_DIR/{session}.sock`
3. `$XDG_RUNTIME_DIR/deskctl/{session}.sock`
4. `~/.deskctl/{session}.sock`
PID files are stored alongside the socket.
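The lookup order can be expressed as a small function. The sketch below is illustrative only: `resolve_socket_path` is a hypothetical name, and treating an empty environment variable as unset is an assumption rather than documented behavior.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional


def resolve_socket_path(
    session: str = "default",
    socket_flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[str] = None,
) -> PurePosixPath:
    """Return the socket path using the precedence listed above."""
    env = os.environ if env is None else env

    # 1. An explicit --socket value wins over everything else.
    if socket_flag:
        return PurePosixPath(socket_flag)

    name = f"{session}.sock"

    # 2. $DESKCTL_SOCKET_DIR/{session}.sock
    socket_dir = env.get("DESKCTL_SOCKET_DIR")
    if socket_dir:
        return PurePosixPath(socket_dir) / name

    # 3. $XDG_RUNTIME_DIR/deskctl/{session}.sock
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return PurePosixPath(runtime_dir) / "deskctl" / name

    # 4. ~/.deskctl/{session}.sock
    home_dir = home if home is not None else os.path.expanduser("~")
    return PurePosixPath(home_dir) / ".deskctl" / name
```

Passing `env` and `home` explicitly keeps the function easy to test without touching the real environment.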
## Sessions
Multiple isolated daemon instances can run simultaneously using the `--session` flag:
```sh
deskctl --session workspace1 snapshot
deskctl --session workspace2 snapshot
```
Each session has its own socket, PID file, and window ref map.
## Backend design
The backend is trait-based, making it straightforward to add support for different display servers. The current implementation targets X11 via `x11rb`. The core is built around a `DesktopBackend` trait. The current implementation uses `x11rb` for X11 protocol operations and `enigo` for input simulation.
## Wayland support The trait-based design means adding Wayland support is a single trait implementation with no changes to the core, CLI, or daemon code.
Coming soon. The trait-based backend design means adding Hyprland/Wayland support is a single trait implementation with zero refactoring of the core. ## X11 integration
Window detection uses EWMH properties:
| Property | Purpose |
|----------|---------|
| `_NET_CLIENT_LIST_STACKING` | Window stacking order |
| `_NET_ACTIVE_WINDOW` | Currently focused window |
| `_NET_WM_NAME` | Window title (UTF-8) |
| `_NET_WM_STATE_HIDDEN` | Minimized state |
| `_NET_CLOSE_WINDOW` | Graceful close |
| `WM_CLASS` | Application class/name |
Falls back to `XQueryTree` if `_NET_CLIENT_LIST_STACKING` is unavailable.

site/src/pages/commands.mdx Normal file

@@ -0,0 +1,176 @@
---
layout: ../layouts/DocLayout.astro
title: Commands
toc: true
---
# Commands
## Snapshot
Capture a screenshot and get the window tree:
```sh
deskctl snapshot
deskctl snapshot --annotate
```
With `--annotate`, colored bounding boxes and `@wN` labels are drawn on the screenshot. Each window gets a unique color from an 8-color palette. Minimized windows are skipped.
The screenshot is saved to `/tmp/deskctl-{timestamp}.png`.
## Click
Click the center of a window by ref, or click exact coordinates:
```sh
deskctl click @w1
deskctl click 960,540
```
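To make the two target forms concrete, here is a rough sketch of how a click point could be derived. It assumes the window geometry fields `x`, `y`, `width`, and `height` reported in the JSON output, and it assumes the center is computed with integer division; the helper names are hypothetical and the daemon's actual rounding may differ.

```python
from typing import Mapping, Tuple


def window_center(window: Mapping[str, int]) -> Tuple[int, int]:
    """Center of a window rectangle, rounded down (rounding is an assumption)."""
    return (
        window["x"] + window["width"] // 2,
        window["y"] + window["height"] // 2,
    )


def parse_point(text: str) -> Tuple[int, int]:
    """Parse an "X,Y" coordinate pair such as "960,540"."""
    x_text, y_text = text.split(",")
    return int(x_text), int(y_text)
```

For a full-screen 1920x1080 window at the origin, both forms resolve to the same point, (960, 540).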
## Double click
```sh
deskctl dblclick @w1
deskctl dblclick 500,300
```
## Type
Type a string into the focused window:
```sh
deskctl type "hello world"
```
## Press
Press a single key:
```sh
deskctl press enter
deskctl press tab
deskctl press escape
```
Supported key names: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, or any single character.
## Hotkey
Send a key combination. List modifier keys first, then the target key:
```sh
deskctl hotkey ctrl c
deskctl hotkey ctrl shift t
deskctl hotkey alt f4
```
Modifier names: `ctrl`, `alt`, `shift`, `super` (also `meta` or `win`).
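The "modifiers first, target key last" convention can be illustrated with a short parser. This is a sketch under stated assumptions, not deskctl's parser: `split_hotkey` and `MODIFIER_ALIASES` are hypothetical names, and lowercasing the modifier names before lookup is an assumption.

```python
from typing import List, Sequence, Tuple

# `meta` and `win` are treated as aliases of `super`, as listed above.
MODIFIER_ALIASES = {
    "ctrl": "ctrl",
    "alt": "alt",
    "shift": "shift",
    "super": "super",
    "meta": "super",
    "win": "super",
}


def split_hotkey(keys: Sequence[str]) -> Tuple[List[str], str]:
    """Split hotkey arguments into canonical modifiers and the target key."""
    if not keys:
        raise ValueError("hotkey needs at least one key")
    # Everything before the last argument is a modifier.
    *modifier_names, target = keys
    modifiers = []
    for name in modifier_names:
        canonical = MODIFIER_ALIASES.get(name.lower())
        if canonical is None:
            raise ValueError(f"unknown modifier: {name}")
        modifiers.append(canonical)
    return modifiers, target
```

With this shape, `deskctl hotkey ctrl shift t` corresponds to the modifiers `ctrl` and `shift` held while `t` is pressed.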
## Mouse move
Move the cursor to absolute coordinates:
```sh
deskctl mouse move 100 200
```
## Mouse scroll
Scroll the mouse wheel. Positive values scroll down, negative scroll up:
```sh
deskctl mouse scroll 3
deskctl mouse scroll -5
deskctl mouse scroll 3 --axis horizontal
```
## Mouse drag
Drag from one position to another:
```sh
deskctl mouse drag 100 200 500 600
```
## Focus
Focus a window by ref or by name (case-insensitive substring match):
```sh
deskctl focus @w1
deskctl focus "firefox"
```
## Close
Close a window gracefully:
```sh
deskctl close @w2
deskctl close "terminal"
```
## Move window
Move a window to an absolute position:
```sh
deskctl move-window @w1 0 0
deskctl move-window "firefox" 100 100
```
## Resize window
Resize a window:
```sh
deskctl resize-window @w1 1280 720
```
## List windows
List all windows without taking a screenshot:
```sh
deskctl list-windows
```
## Get screen size
```sh
deskctl get-screen-size
```
## Get mouse position
```sh
deskctl get-mouse-position
```
## Screenshot
Take a screenshot without the window tree. Optionally specify a save path:
```sh
deskctl screenshot
deskctl screenshot /tmp/my-screenshot.png
deskctl screenshot --annotate
```
## Launch
Launch an application:
```sh
deskctl launch firefox
deskctl launch code --args /path/to/project
```
## Global options
| Flag | Env | Description |
|------|-----|-------------|
| `--json` | | Output as JSON |
| `--socket <path>` | `DESKCTL_SOCKET` | Path to daemon Unix socket |
| `--session <name>` | | Session name for multiple daemons (default: `default`) |


@@ -9,16 +9,22 @@ import DocLayout from "../layouts/DocLayout.astro";
</header>
<p>
X11 desktop control CLI for AI agents on Linux. Snapshot, click, type, and Desktop control CLI for AI agents on Linux X11. Compact JSON output
focus windows through a simple command-line interface with a client-daemon for agent loops. Screenshot, click, type, scroll, drag, and manage
architecture over Unix sockets. windows through a fast client-daemon architecture. 100% native Rust.
</p>
<h2>Documentation</h2> <h2>Getting started</h2>
<ul>
<li><a href="/installation">Installation</a></li>
<li><a href="/usage">Usage</a></li> <li><a href="/quick-start">Quick start</a></li>
</ul>
<h2>Reference</h2>
<ul>
<li><a href="/commands">Commands</a></li>
<li><a href="/architecture">Architecture</a></li>
</ul>


@@ -1,6 +1,7 @@
---
layout: ../layouts/DocLayout.astro
title: Installation
toc: true
---
# Installation
@@ -11,9 +12,17 @@ title: Installation
cargo install deskctl
```
## Docker build ## From source
Build a Linux binary with Docker: ```sh
git clone https://github.com/harivansh-afk/deskctl
cd deskctl
cargo build --release
```
## Docker (cross-compile for Linux)
Build a static Linux binary from any platform:
```sh
docker compose -f docker/docker-compose.yml run --rm build
@@ -21,25 +30,19 @@ docker compose -f docker/docker-compose.yml run --rm build
This writes `dist/deskctl-linux-x86_64`.
## From source
```sh
git clone https://github.com/harivansh-afk/deskctl
cd deskctl
cargo build
```
## Deploy to a remote machine
Copy the binary to an SSH machine: Copy the binary over SSH when `scp` is not available:
```sh
ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64 ssh -p 443 user@host 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
```
## Runtime requirements ## Requirements
- Linux with X11 session - Linux with an active X11 session
- `DISPLAY` environment variable set - `DISPLAY` environment variable set (e.g. `DISPLAY=:1`)
- `XDG_SESSION_TYPE=x11`
- A window manager exposing standard EWMH properties - A window manager that exposes EWMH properties (`_NET_CLIENT_LIST_STACKING`, `_NET_ACTIVE_WINDOW`)
No extra native libraries are needed beyond the standard glibc runtime (`libc`, `libm`, `libgcc_s`).


@@ -0,0 +1,87 @@
---
layout: ../layouts/DocLayout.astro
title: Quick start
toc: true
---
# Quick start
## Core workflow
The typical agent loop is: snapshot the desktop, interpret the result, act on it.
```sh
# 1. see the desktop
deskctl --json snapshot --annotate
# 2. click a window by its ref
deskctl click @w1
# 3. type into the focused window
deskctl type "hello world"
# 4. press a key
deskctl press enter
```
The `--annotate` flag draws colored bounding boxes and `@wN` labels on the screenshot so agents can visually identify windows.
## Window refs
Every `snapshot` assigns refs like `@w1`, `@w2`, etc. to each visible window, ordered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
```sh
deskctl click @w1
deskctl focus @w3
deskctl close @w2
```
You can also select windows by name (case-insensitive substring match):
```sh
deskctl focus "firefox"
deskctl close "terminal"
```
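The two selector forms can be pictured as a lookup over the window list from the last snapshot. The sketch below is one possible reading, not the daemon's implementation: `resolve_selector` is a hypothetical name, matching a name against both `title` and `app_name` is an assumption, and so is returning the first match in stacking order.

```python
from typing import Dict, List


def resolve_selector(selector: str, windows: List[Dict]) -> Dict:
    """Find a window by "@wN" ref or by case-insensitive name substring."""
    if selector.startswith("@"):
        # Refs are written as "@w1" on the CLI; the "@" is stripped for lookup.
        ref = selector[1:]
        matches = [w for w in windows if w["ref_id"] == ref]
    else:
        needle = selector.casefold()
        matches = [
            w
            for w in windows
            if needle in w["title"].casefold()
            or needle in w["app_name"].casefold()
        ]
    if not matches:
        raise LookupError(f"window not found: {selector}")
    # Assumption: when several windows match, the topmost one is used.
    return matches[0]
```

Because refs are assigned per snapshot, a ref taken from an old snapshot may point at a different window after the stacking order changes; a name selector avoids that at the cost of possible ambiguity.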
## JSON output
Pass `--json` for machine-readable output. This is the primary mode for agent integrations:
```sh
deskctl --json snapshot
```
```json
{
"success": true,
"data": {
"screenshot": "/tmp/deskctl-1234567890.png",
"windows": [
{
"ref_id": "w1",
"xcb_id": 12345678,
"title": "Firefox",
"app_name": "firefox",
"x": 0,
"y": 0,
"width": 1920,
"height": 1080,
"focused": true,
"minimized": false
}
]
}
}
```
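An agent written in Python might consume this output roughly as follows. The helper names are hypothetical, and the sketch assumes that `deskctl --json` prints a single JSON document on stdout and exits with status zero on success, which is not stated above.

```python
import json
import subprocess
from typing import Dict, Optional


def parse_snapshot(output: str) -> Dict:
    """Parse the JSON reply and return its "data" object, or raise on failure."""
    reply = json.loads(output)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error", "unknown error"))
    return reply["data"]


def focused_window(data: Dict) -> Optional[Dict]:
    """Return the window marked as focused, or None if there is none."""
    return next((w for w in data["windows"] if w["focused"]), None)


def take_snapshot(annotate: bool = False) -> Dict:
    """Run `deskctl --json snapshot` and return the parsed data."""
    command = ["deskctl", "--json", "snapshot"]
    if annotate:
        command.append("--annotate")
    # Assumption: a non-zero exit status signals a failed command.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_snapshot(completed.stdout)
```

Keeping `parse_snapshot` separate from the subprocess call lets the parsing logic be exercised against recorded output without a running X11 session.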
## Daemon lifecycle
The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
```sh
# check if the daemon is running
deskctl daemon status
# stop it explicitly
deskctl daemon stop
```


@@ -1,53 +0,0 @@
---
layout: ../layouts/DocLayout.astro
title: Usage
toc: true
---
# Usage
## Snapshot
Capture the current desktop state:
```sh
deskctl snapshot
```
With annotations overlaid on windows:
```sh
deskctl --json snapshot --annotate
```
## Click
Click a window by its annotation handle:
```sh
deskctl click @w1
```
## Type
Type text into the focused window:
```sh
deskctl type "hello world"
```
## Focus
Focus a window by name:
```sh
deskctl focus "firefox"
```
## JSON output
Pass `--json` for machine-readable output, useful for agent integrations:
```sh
deskctl --json snapshot
```