skill validated with workflows

2026-04-15 04:03:28 +00:00 · 2026-03-26 00:30:05 -04:00 · 2026-03-26 00:30:05 -04:00 · c37589ccf4
commit c37589ccf4
parent 3dbd9ce52d
9 changed files with 134 additions and 308 deletions
--- a/skills/deskctl/SKILL.md
+++ b/skills/deskctl/SKILL.md
@ -1,132 +1,54 @@
 ---
 name: deskctl
-description: Desktop control CLI for AI agents on Linux X11. Use when operating an X11 desktop in a sandbox, VM, or sandbox-agent session via screenshots, grouped get/wait commands, selectors, and mouse or keyboard input. Prefer this skill when the task is "control the desktop", "inspect windows", "wait for a window", "click/type in the sandbox desktop", or "use deskctl inside sandbox-agent".
+description: Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
-allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*), Bash(sandbox-agent:*)
+allowed-tools: Bash(deskctl:*), Bash(npx deskctl-cli:*), Bash(npm:*), Bash(which:*), Bash(printenv:*), Bash(echo:*)
 ---
 # deskctl
-`deskctl` is a non-interactive desktop control CLI for Linux X11 agents. It works well inside sandbox-agent desktop environments because it gives agents a tight `observe -> wait -> act -> verify` loop.
+Non-interactive desktop control CLI for Linux X11 agents.
-## Install skill (optional)
+All output follows the runtime contract defined in [references/runtime-contract.md](references/runtime-contract.md). Every command returns a stable JSON envelope when called with `--json`. Use `--json` whenever you need to parse output programmatically.
-### npx
+## Quick start
 ```bash
 npx skills add harivansh-afk/deskctl -s deskctl
 ```
 ### bunx
 ```bash
 bunx skills add harivansh-afk/deskctl -s deskctl
 ```
 ## Install the CLI
 Preferred install path:
 ```bash
 npm install -g deskctl-cli
 deskctl --help
 ```
 If global npm installs are not writable, use a user prefix:
 ```bash
 mkdir -p "$HOME/.local/bin"
 npm install -g --prefix "$HOME/.local" deskctl-cli
 export PATH="$HOME/.local/bin:$PATH"
 deskctl --help
 ```
 One-shot usage also works:
 ```bash
 npx deskctl-cli --help
 ```
 For install details and fallback paths, see [references/install.md](references/install.md).
 ## Sandbox-Agent Notes
 Before using `deskctl` inside sandbox-agent:
 1. Make sure the sandbox has desktop runtime packages installed.
 2. Make sure the session is actually running X11.
 3. Run `deskctl doctor` before trying to click or type.
 Typical sandbox-agent prep:
 ```bash
 sandbox-agent install desktop --yes
 deskctl doctor
 ```
 If `doctor` fails, inspect `DISPLAY`, `XDG_SESSION_TYPE`, and whether the sandbox actually has a desktop session. See [references/sandbox-agent.md](references/sandbox-agent.md).
 ## Core Workflow
 Every desktop task should follow this loop:
 1. **Observe**
 2. **Target**
 3. **Wait**
 4. **Act**
 5. **Verify**
 ```bash
 deskctl doctor
 deskctl snapshot --annotate
 deskctl get active-window
 deskctl wait window --selector 'class=firefox' --timeout 10
 deskctl focus 'class=firefox'
 deskctl hotkey ctrl l
 deskctl type "https://example.com"
 deskctl press enter
 deskctl snapshot
 ```
-## What To Reach For First
+## Agent loop
- `deskctl doctor`
+Every desktop interaction follows: **observe -> wait -> act -> verify**.
 - `deskctl snapshot --annotate`
 - `deskctl list-windows`
 - `deskctl get active-window`
 - `deskctl wait window --selector ...`
 - `deskctl wait focus --selector ...`
 Use `--json` when you need strict parsing. Use explicit selectors when you need deterministic targeting.
 ## Selector Rules
 Prefer explicit selectors:
 ```bash
-ref=w1
+deskctl snapshot --annotate        # observe
-id=win1
+deskctl wait window --selector 'title=Firefox' --timeout 10  # wait
-title=Firefox
+deskctl click 'title=Firefox'      # act
-class=firefox
+deskctl snapshot                   # verify
 focused
 ```
-Legacy refs still work:
+See [workflows/observe-act.sh](workflows/observe-act.sh) for a reusable script. See [workflows/poll-condition.sh](workflows/poll-condition.sh) for polling loops.
 ## Selectors
 ```bash
-@w1
+ref=w1          # snapshot ref (short-lived)
-w1
+id=win1         # stable window ID (session-scoped)
-win1
+title=Firefox   # match by title
 class=firefox   # match by WM class
 focused         # currently focused window
 ```
-Bare strings such as `firefox` are fuzzy substring selectors. They fail on ambiguity instead of silently picking the wrong window.
+Bare strings like `firefox` do fuzzy matching but fail on ambiguity. Prefer explicit selectors.
 ## References
- [references/install.md](references/install.md) - install paths, npm-first bootstrap, runtime prerequisites
+- [references/runtime-contract.md](references/runtime-contract.md) - output contract, stable fields, error kinds
- [references/commands.md](references/commands.md) - grouped reads, waits, selectors, and core action commands
+- [references/commands.md](references/commands.md) - all available commands
 - [references/sandbox-agent.md](references/sandbox-agent.md) - using `deskctl` inside sandbox-agent desktop sessions
-## Templates
+## Workflows
- [templates/install-deskctl-npm.sh](templates/install-deskctl-npm.sh) - install `deskctl-cli` into a user prefix
+- [workflows/observe-act.sh](workflows/observe-act.sh) - main observe-act loop
- [templates/sandbox-agent-desktop-loop.sh](templates/sandbox-agent-desktop-loop.sh) - minimal observe/wait/act loop for desktop tasks
+- [workflows/poll-condition.sh](workflows/poll-condition.sh) - poll for a condition on screen
--- a/skills/deskctl/references/commands.md
+++ b/skills/deskctl/references/commands.md
@ -1,21 +1,23 @@
-# deskctl command guide
+# deskctl commands
 All commands support `--json` for machine-parseable output following the runtime contract.
 ## Observe
 ```bash
-deskctl doctor
+deskctl doctor                          # check X11 runtime and daemon health
-deskctl snapshot
+deskctl snapshot                        # screenshot + window list
-deskctl snapshot --annotate
+deskctl snapshot --annotate             # screenshot with @wN labels overlaid
-deskctl list-windows
+deskctl list-windows                    # window list only (no screenshot)
-deskctl screenshot /tmp/current.png
+deskctl screenshot /tmp/screen.png      # screenshot to explicit path
-deskctl get active-window
+deskctl get active-window               # focused window info
-deskctl get monitors
+deskctl get monitors                    # monitor geometry
-deskctl get version
+deskctl get version                     # version and backend
-deskctl get systeminfo
+deskctl get systeminfo                  # full runtime diagnostics
 deskctl get-screen-size                 # screen resolution
 deskctl get-mouse-position              # cursor coordinates
 ```
 Use `snapshot --annotate` when you need both the screenshot artifact and the short `@wN` labels. Use `list-windows` when you only need the window tree and do not want screenshot side effects.
 ## Wait
 ```bash
@ -23,29 +25,19 @@ deskctl wait window --selector 'title=Firefox' --timeout 10
 deskctl wait focus --selector 'class=firefox' --timeout 5
 ```
-Wait commands return the matched window payload on success. In `--json` mode, failures include structured `kind` values so the caller can recover without string parsing.
+Returns the matched window payload on success. Failures include structured `kind` values in `--json` mode.
 ## Selectors
 Prefer explicit selectors:
 ```bash
-ref=w1
+ref=w1          # snapshot ref (short-lived, from last snapshot)
-id=win1
+id=win1         # stable window ID (session-scoped)
-title=Firefox
+title=Firefox   # match by window title
-class=firefox
+class=firefox   # match by WM class
-focused
+focused         # currently focused window
 ```
-Legacy refs still work:
+Legacy shorthand: `@w1`, `w1`, `win1`. Bare strings do fuzzy matching but fail on ambiguity.
 ```bash
@w1
 w1
 win1
 ```
 Bare fuzzy selectors such as `firefox` are supported, but they fail on ambiguity.
 ## Act
@ -58,6 +50,7 @@ deskctl press enter
 deskctl hotkey ctrl shift t
 deskctl mouse move 500 300
 deskctl mouse scroll 3
 deskctl mouse scroll 3 --axis horizontal
 deskctl mouse drag 100 100 500 500
 deskctl move-window @w1 100 120
 deskctl resize-window @w1 1280 720
@ -65,11 +58,12 @@ deskctl close @w3
 deskctl launch firefox
 ```
-## Agent loop
+## Daemon
-The safe pattern is:
+```bash
 deskctl daemon start
 deskctl daemon stop
 deskctl daemon status
 ```
-1. Observe with `snapshot`, `list-windows`, or `get ...`
+The daemon starts automatically on first command. Manual control is rarely needed.
 2. Wait for the target window if needed
 3. Act using explicit selectors or refs
 4. Snapshot again to verify the result
--- a/skills/deskctl/references/install.md
+++ b/skills/deskctl/references/install.md
@ -1,75 +0,0 @@
 # Install `deskctl`
 `deskctl` is designed to be used non-interactively by agents. The easiest install path is the npm package because it installs the `deskctl` command directly from GitHub Release assets without needing Cargo on the target machine.
 ## Preferred: npm global install
 ```bash
 npm install -g deskctl-cli
 deskctl --help
 ```
 This is the preferred path for sandboxes, VMs, and sandbox-agent sessions where Node/npm already exists.
 ## User-prefix npm install
 If global npm installs are not writable:
 ```bash
 mkdir -p "$HOME/.local/bin"
 npm install -g --prefix "$HOME/.local" deskctl-cli
 export PATH="$HOME/.local/bin:$PATH"
 deskctl --help
 ```
 This avoids `sudo` and keeps the install inside the user home directory.
 ## One-shot npm execution
 ```bash
 npx deskctl-cli --help
 ```
 Use this for quick testing. For repeated desktop control, install the command once so the runtime is predictable.
 ## Fallback: Cargo
 ```bash
 cargo install deskctl
 ```
 Use this only when the machine already has a Rust toolchain or when you explicitly want a source build.
 ## Fallback: local Docker build
 If you need a Linux binary from macOS or another non-Linux host:
 ```bash
 docker compose -f docker/docker-compose.yml run --rm build
 ```
 Then copy `dist/deskctl-linux-x86_64` into the target machine.
 ## Runtime prerequisites
 `deskctl` needs:
 - Linux
 - X11
 - a valid `DISPLAY`
 - a working desktop/window-manager session
 Quick verification:
 ```bash
 printenv DISPLAY
 printenv XDG_SESSION_TYPE
 deskctl doctor
 ```
 Inside sandbox-agent, you may need to install desktop dependencies first:
 ```bash
 sandbox-agent install desktop --yes
 deskctl doctor
 ```
--- a/skills/deskctl/references/runtime-contract.md
+++ b/skills/deskctl/references/runtime-contract.md
@ -0,0 +1 @@
 ../../../docs/runtime-contract.md
--- a/skills/deskctl/references/sandbox-agent.md
+++ b/skills/deskctl/references/sandbox-agent.md
@ -1,61 +0,0 @@
 # deskctl inside sandbox-agent
 Use `deskctl` when the sandbox-agent session includes a Linux desktop and you want a tight local desktop-control loop from the shell.
 ## When it fits
 `deskctl` is a good fit when:
 - the sandbox already has an X11 desktop session
 - you want fast local desktop control from inside the sandbox
 - you want short-lived refs like `@w1` and grouped `get` or `wait` primitives
 It is not a replacement for sandbox-agent session orchestration itself. Use sandbox-agent to provision the sandbox and desktop runtime, then use `deskctl` inside that environment to control the GUI.
 ## Minimal bootstrap
 ```bash
 sandbox-agent install desktop --yes
 npm install -g deskctl-cli
 deskctl doctor
 deskctl snapshot --annotate
 ```
 If npm global installs are not writable:
 ```bash
 mkdir -p "$HOME/.local/bin"
 npm install -g --prefix "$HOME/.local" deskctl-cli
 export PATH="$HOME/.local/bin:$PATH"
 deskctl doctor
 ```
 ## Expected environment
 Check:
 ```bash
 printenv DISPLAY
 printenv XDG_SESSION_TYPE
 deskctl --json get systeminfo
 ```
 Healthy `deskctl` usage usually means:
 - `DISPLAY` is set
 - `XDG_SESSION_TYPE=x11`
 - `deskctl doctor` succeeds
 ## Recommended workflow
 ```bash
 deskctl snapshot --annotate
 deskctl wait window --selector 'class=firefox' --timeout 10
 deskctl focus 'class=firefox'
 deskctl hotkey ctrl l
 deskctl type "https://example.com"
 deskctl press enter
 deskctl snapshot
 ```
 Prefer `--json` for strict machine parsing and explicit selectors for deterministic targeting.
--- a/skills/deskctl/templates/install-deskctl-npm.sh
+++ b/skills/deskctl/templates/install-deskctl-npm.sh
@ -1,27 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 if command -v deskctl >/dev/null 2>&1; then
  echo "deskctl already installed: $(command -v deskctl)"
  exit 0
 fi
 if ! command -v npm >/dev/null 2>&1; then
  echo "npm is required for the preferred deskctl install path"
  exit 1
 fi
 prefix="${DESKCTL_NPM_PREFIX:-$HOME/.local}"
 bin_dir="$prefix/bin"
 mkdir -p "$bin_dir"
 npm install -g --prefix "$prefix" deskctl-cli
 if ! command -v deskctl >/dev/null 2>&1; then
  echo "deskctl installed to $bin_dir"
  echo "add this to PATH if needed:"
  echo "export PATH=\"$bin_dir:\$PATH\""
 fi
 "$bin_dir/deskctl" --help >/dev/null 2>&1 || true
 echo "deskctl bootstrap complete"
--- a/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
+++ b/skills/deskctl/templates/sandbox-agent-desktop-loop.sh
@ -1,7 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 deskctl doctor
 deskctl snapshot --annotate
 deskctl get active-window
 deskctl wait window --selector "${1:-focused}" --timeout "${2:-5}"
--- a/skills/deskctl/workflows/observe-act.sh
+++ b/skills/deskctl/workflows/observe-act.sh
@ -0,0 +1,37 @@
 #!/usr/bin/env bash
 # observe-act.sh - main desktop interaction loop
 # usage: ./observe-act.sh <selector> [action] [action-args...]
 # example: ./observe-act.sh 'title=Firefox' click
 # example: ./observe-act.sh 'class=terminal' type "ls -la"
 set -euo pipefail
 SELECTOR="${1:?usage: observe-act.sh <selector> [action] [action-args...]}"
 ACTION="${2:-click}"
 shift 2 2>/dev/null || true
 # 1. observe - snapshot the desktop, get current state
 echo "--- observe ---"
 deskctl snapshot --annotate --json | head -1
 deskctl get active-window
 # 2. wait - ensure target exists
 echo "--- wait ---"
 deskctl wait window --selector "$SELECTOR" --timeout 10
 # 3. act - perform the action on the target
 echo "--- act ---"
 case "$ACTION" in
  click)    deskctl click "$SELECTOR" ;;
  dblclick) deskctl dblclick "$SELECTOR" ;;
  focus)    deskctl focus "$SELECTOR" ;;
  type)     deskctl focus "$SELECTOR" && deskctl type "$@" ;;
  press)    deskctl focus "$SELECTOR" && deskctl press "$@" ;;
  hotkey)   deskctl focus "$SELECTOR" && deskctl hotkey "$@" ;;
  close)    deskctl close "$SELECTOR" ;;
  *)        echo "unknown action: $ACTION"; exit 1 ;;
 esac
 # 4. verify - snapshot again to confirm result
 echo "--- verify ---"
 sleep 0.5
 deskctl snapshot --json | head -1
--- a/skills/deskctl/workflows/poll-condition.sh
+++ b/skills/deskctl/workflows/poll-condition.sh
@ -0,0 +1,42 @@
 #!/usr/bin/env bash
 # poll-condition.sh - poll the desktop until a condition is met
 # usage: ./poll-condition.sh <match-string> [interval-seconds] [max-attempts]
 # example: ./poll-condition.sh "Tickets Available" 5 60
 # example: ./poll-condition.sh "Order Confirmed" 3 20
 # example: ./poll-condition.sh "Download Complete" 10 30
 #
 # checks window titles for the match string every N seconds.
 # exits 0 when found, exits 1 after max attempts.
 set -euo pipefail
 MATCH="${1:?usage: poll-condition.sh <match-string> [interval] [max-attempts]}"
 INTERVAL="${2:-5}"
 MAX="${3:-60}"
 attempt=0
 while [ "$attempt" -lt "$MAX" ]; do
  attempt=$((attempt + 1))
  # snapshot and check window titles
  windows=$(deskctl list-windows --json 2>/dev/null || echo '{"success":false}')
  if echo "$windows" | grep -qi "$MATCH"; then
    echo "FOUND: '$MATCH' detected on attempt $attempt"
    deskctl snapshot --annotate
    exit 0
  fi
  # also check screenshot text via active window title
  active=$(deskctl get active-window --json 2>/dev/null || echo '{}')
  if echo "$active" | grep -qi "$MATCH"; then
    echo "FOUND: '$MATCH' in active window on attempt $attempt"
    deskctl snapshot --annotate
    exit 0
  fi
  echo "attempt $attempt/$MAX - '$MATCH' not found, waiting ${INTERVAL}s..."
  sleep "$INTERVAL"
 done
 echo "NOT FOUND: '$MATCH' after $MAX attempts"
 deskctl snapshot --annotate
 exit 1