mirror of
https://github.com/harivansh-afk/deskctl.git
synced 2026-04-16 12:03:24 +00:00
Phase 6: utility commands, SKILL.md, AGENTS.md, README.md
- Implement screen_size via xcap Monitor, mouse_position via x11rb query_pointer, standalone screenshot with optional annotation, launch for spawning detached processes - Handler dispatchers for get-screen-size, get-mouse-position, screenshot, launch - SKILL.md agent discovery file with allowed-tools frontmatter - AGENTS.md contributor guidelines for AI agents - README.md with installation, quick start, architecture overview
This commit is contained in:
parent
567115a6c2
commit
03dfd6b6ea
5 changed files with 339 additions and 8 deletions
116
SKILL.md
Normal file
116
SKILL.md
Normal file
|
|
@ -0,0 +1,116 @@
|
|||
---
|
||||
name: desktop-ctl
|
||||
description: Desktop control CLI for AI agents - screenshot, click, type, window management on Linux X11
|
||||
allowed-tools: Bash(desktop-ctl:*)
|
||||
---
|
||||
|
||||
# desktop-ctl
|
||||
|
||||
Desktop control CLI for AI agents on Linux X11. Provides a unified interface for screenshots, mouse/keyboard input, and window management with compact `@wN` window references.
|
||||
|
||||
## Core Workflow
|
||||
|
||||
1. **Snapshot** to see the desktop and get window refs
|
||||
2. **Act** using refs or coordinates (click, type, focus)
|
||||
3. **Repeat** as needed
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### See the Desktop
|
||||
|
||||
```bash
|
||||
desktop-ctl snapshot # Screenshot + window tree with @wN refs
|
||||
desktop-ctl snapshot --annotate # Screenshot with bounding boxes and labels
|
||||
desktop-ctl snapshot --json # Structured JSON output
|
||||
desktop-ctl list-windows # Window tree without screenshot
|
||||
desktop-ctl screenshot /tmp/s.png # Screenshot only (no window tree)
|
||||
```
|
||||
|
||||
### Click and Type
|
||||
|
||||
```bash
|
||||
desktop-ctl click @w1 # Click center of window @w1
|
||||
desktop-ctl click 500,300 # Click absolute coordinates
|
||||
desktop-ctl dblclick @w2 # Double-click window @w2
|
||||
desktop-ctl type "hello world" # Type text into focused window
|
||||
desktop-ctl press enter # Press a key
|
||||
desktop-ctl hotkey ctrl c # Send Ctrl+C
|
||||
desktop-ctl hotkey ctrl shift t # Send Ctrl+Shift+T
|
||||
```
|
||||
|
||||
### Mouse Control
|
||||
|
||||
```bash
|
||||
desktop-ctl mouse move 500 300 # Move cursor to coordinates
|
||||
desktop-ctl mouse scroll 3 # Scroll down 3 units
|
||||
desktop-ctl mouse scroll -3 # Scroll up 3 units
|
||||
desktop-ctl mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
|
||||
```
|
||||
|
||||
### Window Management
|
||||
|
||||
```bash
|
||||
desktop-ctl focus @w2 # Focus window by ref
|
||||
desktop-ctl focus "firefox" # Focus window by name (substring match)
|
||||
desktop-ctl close @w3 # Close window gracefully
|
||||
desktop-ctl move-window @w1 100 200 # Move window to position
|
||||
desktop-ctl resize-window @w1 800 600 # Resize window
|
||||
```
|
||||
|
||||
### Utilities
|
||||
|
||||
```bash
|
||||
desktop-ctl get-screen-size # Screen resolution
|
||||
desktop-ctl get-mouse-position # Current cursor position
|
||||
desktop-ctl launch firefox # Launch an application
|
||||
desktop-ctl launch code -- --new-window # Launch with arguments
|
||||
```
|
||||
|
||||
### Daemon
|
||||
|
||||
```bash
|
||||
desktop-ctl daemon start # Start daemon manually
|
||||
desktop-ctl daemon stop # Stop daemon
|
||||
desktop-ctl daemon status # Check daemon status
|
||||
```
|
||||
|
||||
## Global Options
|
||||
|
||||
- `--json` : Output as structured JSON (all commands)
|
||||
- `--session NAME` : Session name for multiple daemon instances (default: "default")
|
||||
- `--socket PATH` : Custom Unix socket path
|
||||
|
||||
## Window Refs
|
||||
|
||||
After `snapshot` or `list-windows`, windows are assigned short refs:
|
||||
- `@w1` is the topmost (usually focused) window
|
||||
- `@w2`, `@w3`, etc. follow z-order (front to back)
|
||||
- Refs reset on each `snapshot` call
|
||||
- Use `--json` to see stable `xcb_id` for programmatic tracking
|
||||
|
||||
## Example Agent Workflow
|
||||
|
||||
```bash
|
||||
# 1. See what's on screen
|
||||
desktop-ctl snapshot --annotate
|
||||
|
||||
# 2. Focus the browser
|
||||
desktop-ctl focus "firefox"
|
||||
|
||||
# 3. Navigate to a URL
|
||||
desktop-ctl hotkey ctrl l
|
||||
desktop-ctl type "https://example.com"
|
||||
desktop-ctl press enter
|
||||
|
||||
# 4. Take a new snapshot to see the result
|
||||
desktop-ctl snapshot
|
||||
```
|
||||
|
||||
## Key Names for press/hotkey
|
||||
|
||||
Modifiers: `ctrl`, `alt`, `shift`, `super`
|
||||
Navigation: `enter`, `tab`, `escape`, `backspace`, `delete`, `space`
|
||||
Arrows: `up`, `down`, `left`, `right`
|
||||
Page: `home`, `end`, `pageup`, `pagedown`
|
||||
Function: `f1` through `f12`
|
||||
Characters: any single character (e.g. `a`, `1`, `/`)
|
||||
Loading…
Add table
Add a link
Reference in a new issue