mirror of
https://github.com/harivansh-afk/deskctl.git
synced 2026-04-15 03:00:45 +00:00
- Implement screen_size via xcap Monitor, mouse_position via x11rb query_pointer, standalone screenshot with optional annotation, launch for spawning detached processes
- Handler dispatchers for get-screen-size, get-mouse-position, screenshot, launch
- SKILL.md agent discovery file with allowed-tools frontmatter
- AGENTS.md contributor guidelines for AI agents
- README.md with installation, quick start, architecture overview
| name | description | allowed-tools |
|---|---|---|
| desktop-ctl | Desktop control CLI for AI agents - screenshot, click, type, window management on Linux X11 | Bash(desktop-ctl:*) |
desktop-ctl
Desktop control CLI for AI agents on Linux X11. Provides a unified interface for screenshots, mouse/keyboard input, and window management with compact @wN window references.
Core Workflow
- Snapshot to see the desktop and get window refs
- Act using refs or coordinates (click, type, focus)
- Repeat as needed
Quick Reference
See the Desktop
desktop-ctl snapshot # Screenshot + window tree with @wN refs
desktop-ctl snapshot --annotate # Screenshot with bounding boxes and labels
desktop-ctl snapshot --json # Structured JSON output
desktop-ctl list-windows # Window tree without screenshot
desktop-ctl screenshot /tmp/s.png # Screenshot only (no window tree)
Click and Type
desktop-ctl click @w1 # Click center of window @w1
desktop-ctl click 500,300 # Click absolute coordinates
desktop-ctl dblclick @w2 # Double-click window @w2
desktop-ctl type "hello world" # Type text into focused window
desktop-ctl press enter # Press a key
desktop-ctl hotkey ctrl c # Send Ctrl+C
desktop-ctl hotkey ctrl shift t # Send Ctrl+Shift+T
Mouse Control
desktop-ctl mouse move 500 300 # Move cursor to coordinates
desktop-ctl mouse scroll 3 # Scroll down 3 units
desktop-ctl mouse scroll -3 # Scroll up 3 units
desktop-ctl mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
Window Management
desktop-ctl focus @w2 # Focus window by ref
desktop-ctl focus "firefox" # Focus window by name (substring match)
desktop-ctl close @w3 # Close window gracefully
desktop-ctl move-window @w1 100 200 # Move window to position
desktop-ctl resize-window @w1 800 600 # Resize window
Utilities
desktop-ctl get-screen-size # Screen resolution
desktop-ctl get-mouse-position # Current cursor position
desktop-ctl launch firefox # Launch an application
desktop-ctl launch code -- --new-window # Launch with arguments
Daemon
desktop-ctl daemon start # Start daemon manually
desktop-ctl daemon stop # Stop daemon
desktop-ctl daemon status # Check daemon status
Global Options
- --json: Output as structured JSON (all commands)
- --session NAME: Session name for multiple daemon instances (default: "default")
- --socket PATH: Custom Unix socket path
Window Refs
After snapshot or list-windows, windows are assigned short refs:
- @w1 is the topmost (usually focused) window
- @w2, @w3, etc. follow z-order (front to back)
- Refs reset on each snapshot call
- Use --json to see stable xcb_id for programmatic tracking
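Commands such as click and focus accept different target forms (a @wN ref, X,Y coordinates, or a window name), so a wrapper script may want to tell them apart before calling desktop-ctl. The following is a minimal sketch of that check. The helper names is_uint and classify_target are hypothetical and not part of desktop-ctl, and the CLI's own parsing rules may differ from what is assumed here.

```shell
# Hypothetical helpers, not part of desktop-ctl.

# is_uint STRING: succeed only if STRING is a non-empty run of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# classify_target TARGET: print ref, coords, name, or invalid.
# Assumes refs look like @w<digits> and coordinates like X,Y.
classify_target() {
    case $1 in
        @w*)
            # Anything after "@w" must be digits only.
            if is_uint "${1#@w}"; then echo ref; else echo invalid; fi ;;
        *,*)
            # Both sides of the first comma must be digits to count as coordinates.
            if is_uint "${1%%,*}" && is_uint "${1#*,}"; then
                echo coords
            else
                echo name
            fi ;;
        *)
            echo name ;;
    esac
}
```

Because refs reset on each snapshot call, a ref that passes this check is only meaningful relative to the most recent snapshot or list-windows output.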
Example Agent Workflow
# 1. See what's on screen
desktop-ctl snapshot --annotate
# 2. Focus the browser
desktop-ctl focus "firefox"
# 3. Navigate to a URL
desktop-ctl hotkey ctrl l
desktop-ctl type "https://example.com"
desktop-ctl press enter
# 4. Take a new snapshot to see the result
desktop-ctl snapshot
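The workflow above can be wrapped in a small shell function so that a failed step stops the sequence. This is a sketch only: open_url and the DESKTOP_CTL variable are hypothetical names, and it assumes desktop-ctl exits with a non-zero status on failure, which the source does not confirm.

```shell
# Hypothetical wrapper; DESKTOP_CTL lets the binary be overridden (e.g. for dry runs).
DESKTOP_CTL="${DESKTOP_CTL:-desktop-ctl}"

# open_url WINDOW_NAME URL
# Focus the named window, enter the URL in the address bar, then snapshot.
open_url() {
    window_name=$1
    url=$2
    "$DESKTOP_CTL" focus "$window_name" || return 1   # assumed to fail if no window matches
    "$DESKTOP_CTL" hotkey ctrl l || return 1          # Ctrl+L, as in step 3 above
    "$DESKTOP_CTL" type "$url" || return 1
    "$DESKTOP_CTL" press enter || return 1
    "$DESKTOP_CTL" snapshot                           # observe the result
}
```

Setting DESKTOP_CTL=echo prints the commands instead of running them, which is a cheap way to check the sequence before pointing it at a real desktop.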
Key Names for press/hotkey
Modifiers: ctrl, alt, shift, super
Navigation: enter, tab, escape, backspace, delete, space
Arrows: up, down, left, right
Page: home, end, pageup, pagedown
Function: f1 through f12
Characters: any single character (e.g. a, 1, /)
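A script that builds press or hotkey calls can check key names against the list above before sending them. The sketch below assumes the names are matched exactly as written (lowercase); is_key_name is a hypothetical helper, and desktop-ctl itself may accept more names than are listed here.

```shell
# Hypothetical helper, not part of desktop-ctl.
# is_key_name NAME: succeed if NAME appears in the documented key list.
is_key_name() {
    case $1 in
        ctrl|alt|shift|super) return 0 ;;                      # modifiers
        enter|tab|escape|backspace|delete|space) return 0 ;;   # navigation
        up|down|left|right) return 0 ;;                        # arrows
        home|end|pageup|pagedown) return 0 ;;                  # page keys
        f[1-9]|f1[0-2]) return 0 ;;                            # f1 through f12
        ?) return 0 ;;                                         # any single character
        *) return 1 ;;
    esac
}
```

For example, `is_key_name pageup && desktop-ctl press pageup` sends the key only when the name is one of the documented ones.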