name: deskctl
description: Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*)

deskctl

Non-interactive desktop control CLI for Linux X11 agents.

All output follows the runtime contract defined in references/runtime-contract.md. Every command returns a stable JSON envelope when called with --json. Use --json whenever you need to parse output programmatically.

Quick start

npm install -g deskctl-cli
deskctl doctor
deskctl snapshot --annotate

Agent loop

Every desktop interaction follows: observe -> wait -> act -> verify.

deskctl snapshot --annotate        # observe
deskctl wait window --selector 'title=Firefox' --timeout 10  # wait
deskctl click 'title=Firefox'      # act
deskctl snapshot                   # verify

See workflows/observe-act.sh for a reusable script. See workflows/poll-condition.sh for polling loops.
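
The four steps above can be wrapped in one function so that a failed step stops the loop early. This is a minimal sketch, not the contents of workflows/observe-act.sh: the helper name act_on_window and its return codes are hypothetical, and it assumes each deskctl command exits non-zero on failure, which this document does not state.

```shell
#!/bin/sh
# Sketch of one observe -> wait -> act -> verify pass.
# Assumption: every deskctl command exits non-zero when it fails.
# act_on_window is a hypothetical helper name, not part of deskctl.
act_on_window() {
    selector=$1
    timeout=${2:-10}   # default mirrors the --timeout 10 used above

    # observe: take an annotated snapshot of the desktop
    deskctl snapshot --annotate || return 1

    # wait: block until the target window is present
    deskctl wait window --selector "$selector" --timeout "$timeout" || return 2

    # act: click the target window
    deskctl click "$selector" || return 3

    # verify: snapshot again so the caller can inspect the result
    deskctl snapshot || return 4
}
```

Calling act_on_window 'title=Firefox' runs the same four commands shown above. The distinct return codes (1 to 4) tell the caller which step failed.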

Selectors

ref=w1          # snapshot ref (short-lived)
id=win1         # stable window ID (session-scoped)
title=Firefox   # match by title
class=firefox   # match by WM class
focused         # currently focused window

Bare strings such as firefox are fuzzy-matched, and the command fails if the match is ambiguous. Prefer explicit selectors.
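
One way to enforce explicit selectors in a script is to check the selector's form before passing it to deskctl. This is a small sketch: is_explicit_selector is a hypothetical helper, not a deskctl command, and it only recognizes the five selector forms listed above.

```shell
#!/bin/sh
# Hypothetical guard: succeed only for the explicit selector forms
# listed in this document (ref=, id=, title=, class=, focused).
# It checks the selector's shape only and never calls deskctl.
is_explicit_selector() {
    case $1 in
        ref=?*|id=?*|title=?*|class=?*|focused)
            return 0 ;;   # explicit form with a non-empty value
        *)
            return 1 ;;   # bare string, empty value, or unknown key
    esac
}
```

A script might then run is_explicit_selector "$sel" && deskctl click "$sel", so a bare string such as firefox is rejected before it reaches fuzzy matching.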

References

Workflows