Desktop control CLI for AI agents https://deskctl.dev
Find a file
2026-03-26 00:33:27 -04:00
.github/workflows nix (#7) 2026-03-25 23:18:28 -04:00
assets Phase 3: screenshot annotation with bounding boxes and @wN labels 2026-03-24 21:28:10 -04:00
docker docker-compose build 2026-03-25 12:40:14 -04:00
docs init with runtime contract 2026-03-26 00:33:27 -04:00
npm/deskctl-cli release: v0.1.6 [skip ci] 2026-03-26 03:25:14 +00:00
site tests and tooling (#4) 2026-03-25 19:29:59 -04:00
skills/deskctl skill validated with workflows 2026-03-26 00:33:27 -04:00
src runtime contract enforcement (#6) 2026-03-25 22:00:16 -04:00
tests grouped runtime reads and waits selector modes (#5) 2026-03-25 21:11:30 -04:00
.dockerignore docker-compose build 2026-03-25 12:40:14 -04:00
.gitignore nix (#7) 2026-03-25 23:18:28 -04:00
.pre-commit-config.yaml tests and tooling (#4) 2026-03-25 19:29:59 -04:00
AGENTS.md Phase 6: utility commands, SKILL.md, AGENTS.md, README.md 2026-03-24 21:40:29 -04:00
Cargo.lock release: v0.1.6 [skip ci] 2026-03-26 03:25:14 +00:00
Cargo.toml release: v0.1.6 [skip ci] 2026-03-26 03:25:14 +00:00
CONTRIBUTING.md init with runtime contract 2026-03-26 00:33:27 -04:00
flake.lock nix (#7) 2026-03-25 23:18:28 -04:00
flake.nix nix (#7) 2026-03-25 23:18:28 -04:00
LICENCE licence 2026-03-25 19:34:36 -04:00
Makefile nix (#7) 2026-03-25 23:18:28 -04:00
README.md init with runtime contract 2026-03-26 00:33:27 -04:00

deskctl

Desktop control CLI for AI agents on Linux X11.

Install

Cargo

cargo install deskctl

Source builds on Linux require:

  • Rust 1.75+
  • pkg-config
  • X11 development libraries for input and windowing, typically libx11-dev and libxtst-dev on Debian/Ubuntu

npm

npm install -g deskctl-cli
deskctl --help

One-shot execution is also supported:

npx deskctl-cli --help

deskctl-cli currently supports linux-x64 and installs the deskctl command by downloading the matching GitHub Release asset.

Installable skill

For skills.sh / agent skill ecosystems:

npx skills add harivansh-afk/deskctl -s deskctl

The installable skill lives under skills/deskctl and is designed for X11 sandboxes, VMs, and sandbox-agent desktop sessions. It points agents to the npm install path first so they can get deskctl without Cargo.

Nix

nix run github:harivansh-afk/deskctl -- --help
nix profile install github:harivansh-afk/deskctl

The repo flake is the supported Nix install surface in this phase.

Docker Convenience

Build a Linux binary locally with Docker:

docker compose -f docker/docker-compose.yml run --rm build

This writes dist/deskctl-linux-x86_64.

Copy it to an SSH machine where scp is unavailable:

ssh -p 443 deskctl@ssh.agentcomputer.ai 'cat > ~/deskctl && chmod +x ~/deskctl' < dist/deskctl-linux-x86_64

Run it on an X11 session:

DISPLAY=:1 XDG_SESSION_TYPE=x11 ~/deskctl --json snapshot --annotate

Local Source Build

cargo build

Quick Start

# Diagnose the environment first
deskctl doctor

# See the desktop
deskctl snapshot

# Query focused runtime state
deskctl get active-window
deskctl get monitors

# Click a window
deskctl click @w1

# Type text
deskctl type "hello world"

# Wait for a window or focus transition
deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5

# Focus by explicit selector
deskctl focus 'title=Firefox'

Architecture

Client-daemon architecture over Unix sockets (NDJSON wire protocol). The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.

Source layout:

  • src/lib.rs exposes the shared library target
  • src/main.rs is the thin CLI wrapper
  • src/ contains production code and unit tests
  • tests/ contains Linux/X11 integration tests
  • tests/support/ contains shared integration helpers

Runtime Requirements

  • Linux with X11 session
  • Rust 1.75+ plus the source-build dependencies above when building from source

The binary itself only links the standard glibc runtime on Linux (libc, libm, libgcc_s).

For deskctl to be fully functional on a fresh VM you still need:

  • an X11 server and an active DISPLAY
  • XDG_SESSION_TYPE=x11 or an equivalent X11 session environment
  • a window manager or desktop environment that exposes standard EWMH properties such as _NET_CLIENT_LIST_STACKING and _NET_ACTIVE_WINDOW
  • an X server with the extensions needed for input simulation and screen metadata, which is standard on normal desktop X11 setups

If setup fails, run:

deskctl doctor

Contract Notes

  • @wN refs are short-lived handles assigned by snapshot and list-windows
  • --json output includes a stable window_id for programmatic targeting within the current daemon session
  • list-windows is a cheap read-only operation and does not capture or write a screenshot
  • the stable runtime JSON/error contract is documented in docs/runtime-contract.md

Read and Wait Surface

The grouped runtime reads are:

deskctl get active-window
deskctl get monitors
deskctl get version
deskctl get systeminfo

The grouped runtime waits are:

deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'id=win3' --timeout 5

Successful get active-window, wait window, and wait focus responses return a window payload with:

  • ref_id
  • window_id
  • title
  • app_name
  • geometry (x, y, width, height)
  • state flags (focused, minimized)

get monitors returns:

  • count
  • monitors[] with geometry and primary/automatic flags

get version returns:

  • version
  • backend

get systeminfo stays runtime-scoped and returns:

  • backend
  • display
  • session_type
  • session
  • socket_path
  • screen
  • monitor_count
  • monitors

Wait timeout and selector failures are structured in --json mode so agents can recover without string parsing.

Output Policy

Text mode is compact and follow-up-oriented, but JSON is the parsing contract.

  • use --json when an agent needs strict parsing
  • rely on window_id, selector-related fields, grouped read payloads, and structured error kind values for stable automation
  • treat monitor naming, incidental whitespace, and default screenshot file names as best-effort

See docs/runtime-conract.md for the exact stable-vs-best-effort breakdown.

Distribution

  • GitHub Releases are the canonical binary source
  • crates.io package: deskctl
  • npm package: deskctl-cli
  • installed command on every channel: deskctl
  • repo-owned Nix install path: flake.nix

For maintainer publishing and release steps, see docs/releasing.md.

Selector Contract

Explicit selector modes:

ref=w1
id=win1
title=Firefox
class=firefox
focused

Legacy refs remain supported:

@w1
w1
win1

Bare selectors such as firefox are still supported as fuzzy substring matches, but they now fail on ambiguity and return candidate windows instead of silently picking the first match.

Support Boundary

deskctl supports Linux X11 in this phase. Wayland and Hyprland are explicitly out of scope for the current runtime contract.

Workflow

Local validation uses the root Makefile:

make fmt-check
make lint
make test-unit
make test-integration
make site-format-check
make validate

make validate is the full repo-quality check and requires Linux with xvfb-run plus pnpm --dir site install.

The repository standardizes on pre-commit for fast commit-time checks:

pre-commit install
pre-commit run --all-files

See CONTRIBUTING.md for the full contributor guide.

Acknowledgements