deskctl
Desktop control CLI for AI agents on Linux X11.
Install
Cargo
cargo install deskctl
Source builds on Linux require:
- Rust 1.75+
- pkg-config
- X11 development libraries for input and windowing, typically libx11-dev and libxtst-dev on Debian/Ubuntu
npm
npm install -g deskctl-cli
deskctl --help
One-shot execution is also supported:
npx deskctl-cli --help
deskctl-cli currently supports linux-x64 and installs the deskctl command by downloading the matching GitHub Release asset.
Installable skill
For skills.sh / agent skill ecosystems:
npx skills add harivansh-afk/deskctl -s deskctl
The installable skill lives under skills/deskctl and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get deskctl without Cargo.
Nix
nix run github:harivansh-afk/deskctl -- --help
nix profile install github:harivansh-afk/deskctl
The repo flake is the supported Nix install surface in this phase.
Docker Convenience
Build a Linux binary locally with Docker:
docker compose -f docker/docker-compose.yml run --rm build
This writes dist/deskctl-linux-x86_64.
Copy it to an SSH machine where scp is unavailable:
ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64
Run it on an X11 session:
DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate
Local Source Build
cargo build
Quick Start
# Diagnose the environment first
deskctl doctor
# See the desktop
deskctl snapshot
# Query focused runtime state
deskctl get active-window
deskctl get monitors
# Click a window
deskctl click @w1
# Type text
deskctl type "hello world"
# Wait for a window or focus transition
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5
# Focus by explicit selector
deskctl focus 'title=Firefox'
Architecture
Client-daemon architecture over Unix sockets (NDJSON wire protocol). The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
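To make the wire format concrete, here is a minimal sketch of NDJSON framing over a Unix socket: each message is one JSON object on its own newline-terminated line. It is not deskctl's actual protocol. The message fields (`action`, `target`, `ok`, `echo`) are hypothetical, and `socket.socketpair()` stands in for the real client and daemon endpoints.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Write one message as a single newline-terminated JSON line."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def read_message(reader) -> dict:
    """Read exactly one NDJSON message (one line) from a binary reader."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the socket before replying")
    return json.loads(line)


def demo_round_trip() -> dict:
    # socketpair() stands in for the client and daemon ends of the socket.
    client, daemon = socket.socketpair()
    try:
        with daemon.makefile("rb") as daemon_in, client.makefile("rb") as client_in:
            # Hypothetical request shape, for illustration only.
            send_message(client, {"action": "get", "target": "monitors"})
            request = read_message(daemon_in)
            # A pretend daemon answers with a hypothetical reply shape.
            send_message(daemon, {"ok": True, "echo": request["action"]})
            return read_message(client_in)
    finally:
        client.close()
        daemon.close()


if __name__ == "__main__":
    print(demo_round_trip())
```

Because every message ends at a newline, either side can read with a plain `readline()` and never needs a length prefix.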
Source layout:
- src/lib.rs exposes the shared library target
- src/main.rs is the thin CLI wrapper
- src/ contains production code and unit tests
- tests/ contains Linux/X11 integration tests
- tests/support/ contains shared integration helpers
Runtime Requirements
- Linux with an X11 session
- Rust 1.75+ plus the source-build dependencies above when building from source
The binary itself only links the standard glibc runtime on Linux (libc, libm, libgcc_s).
For deskctl to be fully functional on a fresh VM you still need:
- an X11 server and an active DISPLAY
- XDG_SESSION_TYPE=x11 or an equivalent X11 session environment
- a window manager or desktop environment that exposes standard EWMH properties such as _NET_CLIENT_LIST_STACKING and _NET_ACTIVE_WINDOW
- an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups
If setup fails, run:
deskctl doctor
Contract Notes
- @wN refs are short-lived handles assigned by snapshot and list-windows
- --json output includes a stable window_id for programmatic targeting within the current daemon session
- list-windows is a cheap read-only operation and does not capture or write a screenshot
- the stable runtime JSON/error contract is documented in docs/runtime-contract.md
Read and Wait Surface
The grouped runtime reads are:
deskctl get active-window
deskctl get monitors
deskctl get version
deskctl get systeminfo
The grouped runtime waits are:
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'id=win3' --timeout 5
Successful get active-window, wait window, and wait focus responses return a window payload with:
- ref_id
- window_id
- title
- app_name
- geometry (x, y, width, height)
- state flags (focused, minimized)
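An agent that consumes this payload may want to validate it before acting on it. The sketch below is one way to do that in Python, under assumptions: the field names come from the list above, but the flat layout (geometry and state flags as top-level keys) and the field types are guesses, and `WindowPayload` is a hypothetical helper. docs/runtime-contract.md remains the authority on the real shape.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WindowPayload:
    # Types are assumptions; only the field names come from the README.
    ref_id: str
    window_id: str
    title: str
    app_name: str
    x: int
    y: int
    width: int
    height: int
    focused: bool
    minimized: bool

    @classmethod
    def from_dict(cls, data: dict) -> "WindowPayload":
        """Build a payload from a decoded JSON object, rejecting missing keys."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in data]
        if missing:
            raise KeyError(f"window payload is missing fields: {missing}")
        # Extra keys are ignored so newer fields do not break older agents.
        return cls(**{name: data[name] for name in names})
```

Failing loudly on a missing field keeps an agent from clicking at a default coordinate when the payload is incomplete.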
get monitors returns:
- count
- monitors[] with geometry and primary/automatic flags
get version returns:
- version
- backend
get systeminfo stays runtime-scoped and returns:
- backend
- display
- session_type
- session
- socket_path
- screen
- monitor_count
- monitors
Wait timeout and selector failures are structured in --json mode so agents can recover without string parsing.
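As an illustration of recovering without string parsing, an agent could branch on the error kind in the decoded JSON. This is a sketch under loud assumptions: the `error` key and the kind names `timeout` and `selector_ambiguous` are hypothetical placeholders, not taken from deskctl. The actual keys and kind values are whatever docs/runtime-contract.md specifies.

```python
import json
from typing import Optional


def error_kind(raw_output: str) -> Optional[str]:
    """Return the error kind from a JSON response, or None if there is no error."""
    response = json.loads(raw_output)
    error = response.get("error")  # hypothetical key name
    if error is None:
        return None
    return error.get("kind", "unknown")


def next_step(raw_output: str) -> str:
    """Map a response to a recovery action using only structured fields."""
    kind = error_kind(raw_output)
    if kind is None:
        return "continue"
    # Hypothetical kind names, used only to show the branching pattern.
    if kind == "timeout":
        return "retry"
    if kind == "selector_ambiguous":
        return "refine-selector"
    return "abort"
```

Here `raw_output` would be the stdout of a `deskctl --json ...` invocation. Unknown kinds fall through to `abort`, so a new error kind never triggers a blind retry.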
Output Policy
Text mode is compact and follow-up-oriented, but JSON is the parsing contract.
- use --json when an agent needs strict parsing
- rely on window_id, selector-related fields, grouped read payloads, and structured error kind values for stable automation
- treat monitor naming, incidental whitespace, and default screenshot file names as best-effort
See docs/runtime-contract.md for the exact stable-vs-best-effort breakdown.
Distribution
- GitHub Releases are the canonical binary source
- crates.io package: deskctl
- npm package: deskctl-cli
- installed command on every channel: deskctl
- repo-owned Nix install path: flake.nix
For maintainer publishing and release steps, see docs/releasing.md.
Selector Contract
Explicit selector modes:
ref=w1
id=win1
title=Firefox
class=firefox
focused
Legacy refs remain supported:
@w1
w1
win1
Bare selectors such as firefox are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.
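The selector forms above can be sketched as a small classifier plus a fuzzy resolver that refuses to guess. This is an illustration, not deskctl's implementation. It assumes that legacy `@w1` and `w1` map to the `ref` mode and `win1` to the `id` mode, and that fuzzy matching is a case-insensitive substring test against window titles. The names `classify_selector` and `resolve_fuzzy` are hypothetical.

```python
import re

EXPLICIT_MODES = ("ref", "id", "title", "class")


def classify_selector(selector: str) -> tuple[str, str]:
    """Return (mode, value) for a selector string."""
    if selector == "focused":
        return ("focused", "")
    key, sep, value = selector.partition("=")
    if sep and key in EXPLICIT_MODES:
        return (key, value)
    # Legacy refs: "@w1" and "w1" (assumed equivalent to ref=w1).
    if re.fullmatch(r"@?w\d+", selector):
        return ("ref", selector.lstrip("@"))
    # Legacy ids: "win1" (assumed equivalent to id=win1).
    if re.fullmatch(r"win\d+", selector):
        return ("id", selector)
    return ("fuzzy", selector)


def resolve_fuzzy(query: str, titles: list[str]) -> str:
    """Return the single title containing query, or fail with the candidates."""
    matches = [title for title in titles if query.lower() in title.lower()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no window matches {query!r}")
    # Ambiguity is an error that reports the candidates, never a silent pick.
    raise LookupError(f"ambiguous selector {query!r}; candidates: {matches}")
```

Returning the candidate list in the failure lets a caller retry with an explicit selector such as `title=` or `id=`.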
Support Boundary
deskctl supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.
Workflow
Local validation uses the root Makefile:
make fmt-check
make lint
make test-unit
make test-integration
make site-format-check
make validate
make validate is the full repo-quality check. It requires Linux with xvfb-run, and the site dependencies must be installed first with pnpm --dir site install.
The repository standardizes on pre-commit for fast commit-time checks:
pre-commit install
pre-commit run --all-files
See CONTRIBUTING.md for the full contributor guide.
Acknowledgements
- @barrettruth - i stole the website from vimdoc