skill validated with workflows

Harivansh Rathi 2026-03-26 00:30:05 -04:00 committed by Hari
parent 3dbd9ce52d
commit c37589ccf4
9 changed files with 134 additions and 308 deletions


@@ -1,132 +1,54 @@
--- ---
name: deskctl name: deskctl
description: Desktop control CLI for AI agents on Linux X11. Use when operating an X11 desktop in a sandbox, VM, or sandbox-agent session via screenshots, grouped get/wait commands, selectors, and mouse or keyboard input. Prefer this skill when the task is "control the desktop", "inspect windows", "wait for a window", "click/type in the sandbox desktop", or "use deskctl inside sandbox-agent". description: Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*), Bash(sandbox-agent:*) allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*)
--- ---
# deskctl # deskctl
`deskctl` is a non-interactive desktop control CLI for Linux X11 agents. It works well inside sandbox-agent desktop environments because it gives agents a tight `observe -> wait -> act -> verify` loop. Non-interactive desktop control CLI for Linux X11 agents.
## Install skill (optional) All output follows the runtime contract defined in [references/runtime-contract.md](references/runtime-contract.md). Every command returns a stable JSON envelope when called with `--json`. Use `--json` whenever you need to parse output programmatically.
### npx ## Quick start
```bash
npx skills add harivansh-afk/deskctl -s deskctl
```
### bunx
```bash
bunx skills add harivansh-afk/deskctl -s deskctl
```
## Install the CLI
Preferred install path:
```bash ```bash
npm install -g deskctl-cli npm install -g deskctl-cli
deskctl --help
```
If global npm installs are not writable, use a user prefix:
```bash
mkdir -p "$HOME/.local/bin"
npm install -g --prefix "$HOME/.local" deskctl-cli
export PATH="$HOME/.local/bin:$PATH"
deskctl --help
```
One-shot usage also works:
```bash
npx deskctl-cli --help
```
For install details and fallback paths, see [references/install.md](references/install.md).
## Sandbox-Agent Notes
Before using `deskctl` inside sandbox-agent:
1. Make sure the sandbox has desktop runtime packages installed.
2. Make sure the session is actually running X11.
3. Run `deskctl doctor` before trying to click or type.
Typical sandbox-agent prep:
```bash
sandbox-agent install desktop --yes
deskctl doctor
```
If `doctor` fails, inspect `DISPLAY`, `XDG_SESSION_TYPE`, and whether the sandbox actually has a desktop session. See [references/sandbox-agent.md](references/sandbox-agent.md).
## Core Workflow
Every desktop task should follow this loop:
1. **Observe**
2. **Target**
3. **Wait**
4. **Act**
5. **Verify**
```bash
deskctl doctor deskctl doctor
deskctl snapshot --annotate deskctl snapshot --annotate
deskctl get active-window
deskctl wait window --selector 'class=firefox' --timeout 10
deskctl focus 'class=firefox'
deskctl hotkey ctrl l
deskctl type "https://example.com"
deskctl press enter
deskctl snapshot
``` ```
## What To Reach For First ## Agent loop
- `deskctl doctor` Every desktop interaction follows: **observe -> wait -> act -> verify**.
- `deskctl snapshot --annotate`
- `deskctl list-windows`
- `deskctl get active-window`
- `deskctl wait window --selector ...`
- `deskctl wait focus --selector ...`
Use `--json` when you need strict parsing. Use explicit selectors when you need deterministic targeting.
## Selector Rules
Prefer explicit selectors:
```bash ```bash
ref=w1 deskctl snapshot --annotate # observe
id=win1 deskctl wait window --selector 'title=Firefox' --timeout 10 # wait
title=Firefox deskctl click 'title=Firefox' # act
class=firefox deskctl snapshot # verify
focused
``` ```
Legacy refs still work: See [workflows/observe-act.sh](workflows/observe-act.sh) for a reusable script. See [workflows/poll-condition.sh](workflows/poll-condition.sh) for polling loops.
## Selectors
```bash ```bash
@w1 ref=w1 # snapshot ref (short-lived)
w1 id=win1 # stable window ID (session-scoped)
win1 title=Firefox # match by title
class=firefox # match by WM class
focused # currently focused window
``` ```
Bare strings such as `firefox` are fuzzy substring selectors. They fail on ambiguity instead of silently picking the wrong window. Bare strings like `firefox` do fuzzy matching but fail on ambiguity. Prefer explicit selectors.
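Because bare strings fall back to fuzzy matching, a calling script may want to reject them before they reach `deskctl`. The sketch below is one way to do that, assuming the explicit forms are exactly the five listed above; `is_explicit_selector` is a hypothetical helper and not part of the CLI.

```bash
#!/usr/bin/env bash
# Sketch only: is_explicit_selector is a hypothetical helper, not a deskctl command.
# It accepts the explicit selector forms listed above and rejects bare
# fuzzy strings and legacy shorthand such as @w1.
is_explicit_selector() {
  case "$1" in
    ref=?*|id=?*|title=?*|class=?*|focused) return 0 ;;
    *) return 1 ;;
  esac
}

for sel in 'title=Firefox' 'class=firefox' 'focused' 'firefox' '@w1'; do
  if is_explicit_selector "$sel"; then
    echo "explicit: $sel"
  else
    echo "fuzzy or legacy: $sel"
  fi
done
```

A guard like this keeps ambiguity failures out of the act step, at the cost of refusing legacy refs that `deskctl` itself still accepts.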
## References ## References
- [references/install.md](references/install.md) - install paths, npm-first bootstrap, runtime prerequisites - [references/runtime-contract.md](references/runtime-contract.md) - output contract, stable fields, error kinds
- [references/commands.md](references/commands.md) - grouped reads, waits, selectors, and core action commands - [references/commands.md](references/commands.md) - all available commands
- [references/sandbox-agent.md](references/sandbox-agent.md) - using `deskctl` inside sandbox-agent desktop sessions
## Templates ## Workflows
- [templates/install-deskctl-npm.sh](templates/install-deskctl-npm.sh) - install `deskctl-cli` into a user prefix - [workflows/observe-act.sh](workflows/observe-act.sh) - main observe-act loop
- [templates/sandbox-agent-desktop-loop.sh](templates/sandbox-agent-desktop-loop.sh) - minimal observe/wait/act loop for desktop tasks - [workflows/poll-condition.sh](workflows/poll-condition.sh) - poll for a condition on screen


@@ -1,21 +1,23 @@
# deskctl command guide # deskctl commands
All commands support `--json` for machine-parseable output following the runtime contract.
## Observe ## Observe
```bash ```bash
deskctl doctor deskctl doctor # check X11 runtime and daemon health
deskctl snapshot deskctl snapshot # screenshot + window list
deskctl snapshot --annotate deskctl snapshot --annotate # screenshot with @wN labels overlaid
deskctl list-windows deskctl list-windows # window list only (no screenshot)
deskctl screenshot /tmp/current.png deskctl screenshot /tmp/screen.png # screenshot to explicit path
deskctl get active-window deskctl get active-window # focused window info
deskctl get monitors deskctl get monitors # monitor geometry
deskctl get version deskctl get version # version and backend
deskctl get systeminfo deskctl get systeminfo # full runtime diagnostics
deskctl get-screen-size # screen resolution
deskctl get-mouse-position # cursor coordinates
``` ```
Use `snapshot --annotate` when you need both the screenshot artifact and the short `@wN` labels. Use `list-windows` when you only need the window tree and do not want screenshot side effects.
## Wait ## Wait
```bash ```bash
@@ -23,29 +25,19 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
deskctl wait focus --selector 'class=firefox' --timeout 5 deskctl wait focus --selector 'class=firefox' --timeout 5
``` ```
Wait commands return the matched window payload on success. In `--json` mode, failures include structured `kind` values so the caller can recover without string parsing. Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
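A caller can branch on the `kind` value instead of parsing error text. The following is a rough sketch under stated assumptions: the envelope layout and the `timeout` kind shown here are hypothetical placeholders (the real fields are defined in the runtime contract), and `json_kind` is an illustrative helper that uses a bash regex rather than a full JSON parser.

```bash
#!/usr/bin/env bash
# Sketch only: json_kind is a hypothetical helper, not part of deskctl.
# It prints the value of the first "kind" string field found in its argument,
# or nothing when no such field exists.
json_kind() {
  local re='"kind"[[:space:]]*:[[:space:]]*"([^"]*)"'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Hypothetical failure envelope; the field layout and the "timeout" value
# are assumptions made for illustration.
response='{"success":false,"error":{"kind":"timeout","message":"window not found"}}'

kind=$(json_kind "$response")
case "$kind" in
  timeout) echo "retry: target did not appear in time" ;;
  "")      echo "no kind field present" ;;
  *)       echo "unhandled kind: $kind" ;;
esac
```

In real use, `response` would hold the captured output of a wait command run with `--json`. Where `jq` is available it is a sturdier choice than a regex for reading the envelope.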
## Selectors ## Selectors
Prefer explicit selectors:
```bash ```bash
ref=w1 ref=w1 # snapshot ref (short-lived, from last snapshot)
id=win1 id=win1 # stable window ID (session-scoped)
title=Firefox title=Firefox # match by window title
class=firefox class=firefox # match by WM class
focused focused # currently focused window
``` ```
Legacy refs still work: Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
```bash
@w1
w1
win1
```
Bare fuzzy selectors such as `firefox` are supported, but they fail on ambiguity.
## Act ## Act
@@ -58,6 +50,7 @@ deskctl press enter
deskctl hotkey ctrl shift t deskctl hotkey ctrl shift t
deskctl mouse move 500 300 deskctl mouse move 500 300
deskctl mouse scroll 3 deskctl mouse scroll 3
deskctl mouse scroll 3 --axis horizontal
deskctl mouse drag 100 100 500 500 deskctl mouse drag 100 100 500 500
deskctl move-window @w1 100 120 deskctl move-window @w1 100 120
deskctl resize-window @w1 1280 720 deskctl resize-window @w1 1280 720
@@ -65,11 +58,12 @@ deskctl close @w3
deskctl launch firefox deskctl launch firefox
``` ```
## Agent loop ## Daemon
The safe pattern is: ```bash
deskctl daemon start
deskctl daemon stop
deskctl daemon status
```
1. Observe with `snapshot`, `list-windows`, or `get ...` The daemon starts automatically on first command. Manual control is rarely needed.
2. Wait for the target window if needed
3. Act using explicit selectors or refs
4. Snapshot again to verify the result


@@ -1,75 +0,0 @@
# Install `deskctl`
`deskctl` is designed to be used non-interactively by agents. The easiest install path is the npm package because it installs the `deskctl` command directly from GitHub Release assets without needing Cargo on the target machine.
## Preferred: npm global install
```bash
npm install -g deskctl-cli
deskctl --help
```
This is the preferred path for sandboxes, VMs, and sandbox-agent sessions where Node/npm already exists.
## User-prefix npm install
If global npm installs are not writable:
```bash
mkdir -p "$HOME/.local/bin"
npm install -g --prefix "$HOME/.local" deskctl-cli
export PATH="$HOME/.local/bin:$PATH"
deskctl --help
```
This avoids `sudo` and keeps the install inside the user home directory.
## One-shot npm execution
```bash
npx deskctl-cli --help
```
Use this for quick testing. For repeated desktop control, install the command once so the runtime is predictable.
## Fallback: Cargo
```bash
cargo install deskctl
```
Use this only when the machine already has a Rust toolchain or when you explicitly want a source build.
## Fallback: local Docker build
If you need a Linux binary from macOS or another non-Linux host:
```bash
docker compose -f docker/docker-compose.yml run --rm build
```
Then copy `dist/deskctl-linux-x86_64` into the target machine.
## Runtime prerequisites
`deskctl` needs:
- Linux
- X11
- a valid `DISPLAY`
- a working desktop/window-manager session
Quick verification:
```bash
printenv DISPLAY
printenv XDG_SESSION_TYPE
deskctl doctor
```
Inside sandbox-agent, you may need to install desktop dependencies first:
```bash
sandbox-agent install desktop --yes
deskctl doctor
```


@@ -0,0 +1 @@
../../../docs/runtime-contract.md


@@ -1,61 +0,0 @@
# deskctl inside sandbox-agent
Use `deskctl` when the sandbox-agent session includes a Linux desktop and you want a tight local desktop-control loop from the shell.
## When it fits
`deskctl` is a good fit when:
- the sandbox already has an X11 desktop session
- you want fast local desktop control from inside the sandbox
- you want short-lived refs like `@w1` and grouped `get` or `wait` primitives
It is not a replacement for sandbox-agent session orchestration itself. Use sandbox-agent to provision the sandbox and desktop runtime, then use `deskctl` inside that environment to control the GUI.
## Minimal bootstrap
```bash
sandbox-agent install desktop --yes
npm install -g deskctl-cli
deskctl doctor
deskctl snapshot --annotate
```
If npm global installs are not writable:
```bash
mkdir -p "$HOME/.local/bin"
npm install -g --prefix "$HOME/.local" deskctl-cli
export PATH="$HOME/.local/bin:$PATH"
deskctl doctor
```
## Expected environment
Check:
```bash
printenv DISPLAY
printenv XDG_SESSION_TYPE
deskctl --json get systeminfo
```
Healthy `deskctl` usage usually means:
- `DISPLAY` is set
- `XDG_SESSION_TYPE=x11`
- `deskctl doctor` succeeds
## Recommended workflow
```bash
deskctl snapshot --annotate
deskctl wait window --selector 'class=firefox' --timeout 10
deskctl focus 'class=firefox'
deskctl hotkey ctrl l
deskctl type "https://example.com"
deskctl press enter
deskctl snapshot
```
Prefer `--json` for strict machine parsing and explicit selectors for deterministic targeting.


@@ -1,27 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
if command -v deskctl >/dev/null 2>&1; then
echo "deskctl already installed: $(command -v deskctl)"
exit 0
fi
if ! command -v npm >/dev/null 2>&1; then
echo "npm is required for the preferred deskctl install path"
exit 1
fi
prefix="${DESKCTL_NPM_PREFIX:-$HOME/.local}"
bin_dir="$prefix/bin"
mkdir -p "$bin_dir"
npm install -g --prefix "$prefix" deskctl-cli
if ! command -v deskctl >/dev/null 2>&1; then
echo "deskctl installed to $bin_dir"
echo "add this to PATH if needed:"
echo "export PATH=\"$bin_dir:\$PATH\""
fi
"$bin_dir/deskctl" --help >/dev/null 2>&1 || true
echo "deskctl bootstrap complete"


@@ -1,7 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
deskctl doctor
deskctl snapshot --annotate
deskctl get active-window
deskctl wait window --selector "${1:-focused}" --timeout "${2:-5}"


@@ -0,0 +1,37 @@
#!/usr/bin/env bash
# observe-act.sh - main desktop interaction loop
# usage: ./observe-act.sh <selector> [action] [action-args...]
# example: ./observe-act.sh 'title=Firefox' click
# example: ./observe-act.sh 'class=terminal' type "ls -la"
set -euo pipefail
SELECTOR="${1:?usage: observe-act.sh <selector> [action] [action-args...]}"
ACTION="${2:-click}"
shift "$(( $# < 2 ? $# : 2 ))"  # drop selector and action, leaving only action args
# 1. observe - snapshot the desktop, get current state
echo "--- observe ---"
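# capture first so head closing the pipe early cannot trip pipefail
snapshot_json=$(deskctl snapshot --annotate --json)
head -n 1 <<<"$snapshot_json"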
deskctl get active-window
# 2. wait - ensure target exists
echo "--- wait ---"
deskctl wait window --selector "$SELECTOR" --timeout 10
# 3. act - perform the action on the target
echo "--- act ---"
case "$ACTION" in
click) deskctl click "$SELECTOR" ;;
dblclick) deskctl dblclick "$SELECTOR" ;;
focus) deskctl focus "$SELECTOR" ;;
type) deskctl focus "$SELECTOR" && deskctl type "$@" ;;
press) deskctl focus "$SELECTOR" && deskctl press "$@" ;;
hotkey) deskctl focus "$SELECTOR" && deskctl hotkey "$@" ;;
close) deskctl close "$SELECTOR" ;;
*) echo "unknown action: $ACTION" >&2; exit 1 ;;
esac
# 4. verify - snapshot again to confirm result
echo "--- verify ---"
sleep 0.5
snapshot_json=$(deskctl snapshot --json)
head -n 1 <<<"$snapshot_json"


@@ -0,0 +1,42 @@
#!/usr/bin/env bash
# poll-condition.sh - poll the desktop until a condition is met
# usage: ./poll-condition.sh <match-string> [interval-seconds] [max-attempts]
# example: ./poll-condition.sh "Tickets Available" 5 60
# example: ./poll-condition.sh "Order Confirmed" 3 20
# example: ./poll-condition.sh "Download Complete" 10 30
#
# searches the list-windows and active-window JSON output (titles included)
# for the match string every N seconds.
# exits 0 when found, exits 1 after max attempts.
set -euo pipefail
MATCH="${1:?usage: poll-condition.sh <match-string> [interval] [max-attempts]}"
INTERVAL="${2:-5}"
MAX="${3:-60}"
attempt=0
while [ "$attempt" -lt "$MAX" ]; do
attempt=$((attempt + 1))
# list all windows and search the JSON for the match string
windows=$(deskctl list-windows --json 2>/dev/null || echo '{"success":false}')
# here-string instead of a pipe: under pipefail, grep -q exiting early can
# SIGPIPE the writer and turn a real match into a failure; -F matches literally
if grep -qiF -- "$MATCH" <<<"$windows"; then
echo "FOUND: '$MATCH' detected on attempt $attempt"
deskctl snapshot --annotate || true  # a failed snapshot must not mask the match
exit 0
fi
# also check the active window's JSON (no screenshot text is read)
active=$(deskctl get active-window --json 2>/dev/null || echo '{}')
if grep -qiF -- "$MATCH" <<<"$active"; then
echo "FOUND: '$MATCH' in active window on attempt $attempt"
deskctl snapshot --annotate || true
exit 0
fi
echo "attempt $attempt/$MAX - '$MATCH' not found, waiting ${INTERVAL}s..."
sleep "$INTERVAL"
done
echo "NOT FOUND: '$MATCH' after $MAX attempts"
deskctl snapshot --annotate
exit 1
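
The polling pattern in this script can be factored into a small reusable function that retries any predicate command. This is a minimal sketch, not part of the commit: `poll_until` and the `third_time` demo predicate are hypothetical names, and a real predicate would typically wrap a `deskctl ... --json` call and search its output.

```bash
#!/usr/bin/env bash
# Sketch only: poll_until is a hypothetical helper.
# usage: poll_until <interval-seconds> <max-attempts> <command> [args...]
# Runs the command until it succeeds or max-attempts is reached.
# Returns 0 on the first success, 1 if every attempt fails.
poll_until() {
  local interval="$1" max="$2" attempt=0
  shift 2
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    if "$@"; then
      return 0
    fi
    # no need to sleep after the final failed attempt
    if [ "$attempt" -lt "$max" ]; then
      sleep "$interval"
    fi
  done
  return 1
}

# Demo predicate that succeeds on its third call, standing in for a
# real desktop check.
calls=0
third_time() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if poll_until 0 5 third_time; then
  echo "condition met after $calls checks"
else
  echo "condition not met"
fi
```

Keeping the predicate separate from the retry loop lets the same function cover window-title checks, focus checks, or any other condition that can be expressed as an exit status.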