deskctl/SKILL.md
2026-03-24 22:59:21 -04:00

3.3 KiB

name description allowed-tools
deskctl Desktop control CLI for AI agents Bash(deskctl:*)

deskctl

Desktop control CLI for AI agents on Linux X11. Provides a unified interface for screenshots, mouse/keyboard input, and window management with compact @wN window references.

Core Workflow

  1. Snapshot to see the desktop and get window refs
  2. Act using refs or coordinates (click, type, focus)
  3. Repeat as needed

Quick Reference

See the Desktop

deskctl snapshot              # Screenshot + window tree with @wN refs
deskctl snapshot --annotate   # Screenshot with bounding boxes and labels
deskctl snapshot --json       # Structured JSON output
deskctl list-windows          # Window tree without screenshot
deskctl screenshot /tmp/s.png # Screenshot only (no window tree)

Click and Type

deskctl click @w1             # Click center of window @w1
deskctl click 500,300         # Click absolute coordinates
deskctl dblclick @w2          # Double-click window @w2
deskctl type "hello world"    # Type text into focused window
deskctl press enter           # Press a key
deskctl hotkey ctrl c         # Send Ctrl+C
deskctl hotkey ctrl shift t   # Send Ctrl+Shift+T

Mouse Control

deskctl mouse move 500 300    # Move cursor to coordinates
deskctl mouse scroll 3        # Scroll down 3 units
deskctl mouse scroll -3       # Scroll up 3 units
deskctl mouse drag 100 100 500 500  # Drag from (100,100) to (500,500)

Window Management

deskctl focus @w2             # Focus window by ref
deskctl focus "firefox"       # Focus window by name (substring match)
deskctl close @w3             # Close window gracefully
deskctl move-window @w1 100 200     # Move window to position
deskctl resize-window @w1 800 600   # Resize window

Utilities

deskctl get-screen-size       # Screen resolution
deskctl get-mouse-position    # Current cursor position
deskctl launch firefox        # Launch an application
deskctl launch code -- --new-window  # Launch with arguments

Daemon

deskctl daemon start          # Start daemon manually
deskctl daemon stop           # Stop daemon
deskctl daemon status         # Check daemon status

Global Options

  • --json : Output as structured JSON (all commands)
  • --session NAME : Session name for multiple daemon instances (default: "default")
  • --socket PATH : Custom Unix socket path

Window Refs

After snapshot or list-windows, windows are assigned short refs:

  • @w1 is the topmost (usually focused) window
  • @w2, @w3, etc. follow z-order (front to back)
  • Refs reset on each snapshot call
  • Use --json to see stable xcb_id for programmatic tracking

Example Agent Workflow

# 1. See what's on screen
deskctl snapshot --annotate

# 2. Focus the browser
deskctl focus "firefox"

# 3. Navigate to a URL
deskctl hotkey ctrl l
deskctl type "https://example.com"
deskctl press enter

# 4. Take a new snapshot to see the result
deskctl snapshot

Key Names for press/hotkey

Modifiers: ctrl, alt, shift, super Navigation: enter, tab, escape, backspace, delete, space Arrows: up, down, left, right Page: home, end, pageup, pagedown Function: f1 through f12 Characters: any single character (e.g. a, 1, /)