Harivansh Rathi 2026-03-24 22:59:21 -04:00
parent 9adc74f6b7
commit 62a1aab859
4 changed files with 55 additions and 64 deletions

Cargo.lock generated

@@ -676,7 +676,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5"
 [[package]]
-name = "desktop-ctl"
+name = "deskctl"
 version = "0.1.0"
 dependencies = [
 "ab_glyph",


@@ -1,9 +1,9 @@
 [package]
-name = "desktop-ctl"
+name = "deskctl"
 version = "0.1.0"
-edition = "2021"
-description = "Desktop control CLI for AI agents - screenshot, click, type, window management on Linux X11"
-license = "MIT OR Apache-2.0"
+edition = "2026"
+description = "X11 desktop control CLI for agents"
+license = "MIT"
 repository = "https://github.com/user/agent-computer"
 [dependencies]


@@ -1,16 +1,14 @@
-# desktop-ctl
+# deskctl
-Desktop control CLI for AI agents on Linux X11. A single installable binary that gives agents full desktop access: screenshots with window refs, mouse/keyboard input, and window management.
+Desktop control CLI for AI agents on Linux X11.
-Inspired by [agent-browser](https://github.com/vercel-labs/agent-browser) - but for the full desktop.
 ## Install
 ```bash
-cargo install desktop-ctl
+cargo install deskctl
 ```
-System dependencies (Debian/Ubuntu):
+System deps (Debian/Ubuntu):
 ```bash
 sudo apt install libxcb-dev libxrandr-dev libclang-dev
 ```
@@ -19,35 +17,28 @@ sudo apt install libxcb-dev libxrandr-dev libclang-dev
 ```bash
 # See the desktop
-desktop-ctl snapshot
+deskctl snapshot
 # Click a window
-desktop-ctl click @w1
+deskctl click @w1
 # Type text
-desktop-ctl type "hello world"
+deskctl type "hello world"
 # Focus by name
-desktop-ctl focus "firefox"
+deskctl focus "firefox"
 ```
 ## Architecture
-Client-daemon architecture over Unix sockets (NDJSON wire protocol). The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
+Client-daemon architecture over Unix sockets (NDJSON wire protocol).
+The daemon starts automatically on first command and keeps the X11 connection alive for fast repeated calls.
-```
-Agent -> desktop-ctl CLI (thin client) -> Unix socket -> desktop-ctl daemon -> X11
-```
 ## Requirements
 - Linux with X11 session
-- Rust 1.75+ (for building)
+- Rust 1.75+ (for build)
 ## Wayland Support
-Coming in v0.2. The trait-based backend design means adding Hyprland/Wayland support is a single trait implementation with zero refactoring of the core.
+Coming soon hopefully. The trait-based backend design means adding Hyprland/Wayland support is a single trait implementation with zero refactoring of the core which is good.
-## License
-MIT OR Apache-2.0
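The README describes the client and daemon talking NDJSON over a Unix socket: each message is one JSON document terminated by a newline. The std-only Rust sketch below shows that framing, with `UnixStream::pair()` standing in for the daemon's socket file. `serve`, `request`, and the `cmd`/`ok`/`echo` fields are hypothetical and are not deskctl's actual message schema; a real daemon would parse each line with a JSON library instead of echoing it.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical daemon loop: one NDJSON request per line in,
/// one NDJSON response per line out, until the client closes its write half.
fn serve(stream: UnixStream) -> std::io::Result<()> {
    let mut writer = stream.try_clone()?;
    let reader = BufReader::new(stream);
    for line in reader.lines() {
        let line = line?;
        // Sketch only: wrap the raw request in a response object instead of parsing it.
        let response = format!("{{\"ok\":true,\"echo\":{}}}\n", line);
        writer.write_all(response.as_bytes())?;
    }
    Ok(())
}

/// Hypothetical client call: write one request line, read one response line.
fn request(
    client: &mut UnixStream,
    reader: &mut BufReader<UnixStream>,
    json: &str,
) -> std::io::Result<String> {
    // NDJSON framing: a message must not contain a raw newline.
    assert!(!json.contains('\n'), "NDJSON messages must be single-line");
    client.write_all(json.as_bytes())?;
    client.write_all(b"\n")?;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    // A connected socket pair stands in for connecting to the daemon's socket path.
    let (mut client, daemon_side) = UnixStream::pair()?;
    let daemon = thread::spawn(move || serve(daemon_side));
    let mut reader = BufReader::new(client.try_clone()?);

    let reply = request(&mut client, &mut reader, r#"{"cmd":"snapshot"}"#)?;
    println!("{reply}");

    // Shutting down the write half gives the daemon EOF, so its loop ends.
    client.shutdown(std::net::Shutdown::Write)?;
    daemon.join().unwrap()?;
    Ok(())
}
```

Line-delimited framing keeps the client thin: it needs only a buffered reader and `read_line`, with no length prefixes to manage.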

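The Wayland note says a new display backend is a single trait implementation. The sketch below illustrates that shape under stated assumptions: `DesktopBackend`, `Window`, `click_window_center`, and `MockBackend` are hypothetical names, not deskctl's API. Core logic is written once against the trait, and an X11 or Hyprland backend would each be one more `impl`.

```rust
/// A window as the core sees it, independent of X11 or Wayland.
#[derive(Debug, Clone, PartialEq)]
struct Window {
    title: String,
    x: i32,
    y: i32,
    width: u32,
    height: u32,
}

/// Hypothetical backend trait: everything display-server specific lives behind it.
trait DesktopBackend {
    fn list_windows(&self) -> Vec<Window>;
    fn click(&mut self, x: i32, y: i32);
}

/// Core logic written once against the trait:
/// click the center of the first window whose title contains `title`.
fn click_window_center<B: DesktopBackend>(backend: &mut B, title: &str) -> bool {
    let target = backend
        .list_windows()
        .into_iter()
        .find(|w| w.title.contains(title));
    match target {
        Some(w) => {
            let cx = w.x + (w.width / 2) as i32;
            let cy = w.y + (w.height / 2) as i32;
            backend.click(cx, cy);
            true
        }
        None => false,
    }
}

/// Stand-in backend that records clicks; a real X11 backend would be another impl.
struct MockBackend {
    windows: Vec<Window>,
    clicks: Vec<(i32, i32)>,
}

impl DesktopBackend for MockBackend {
    fn list_windows(&self) -> Vec<Window> {
        self.windows.clone()
    }
    fn click(&mut self, x: i32, y: i32) {
        self.clicks.push((x, y));
    }
}

fn main() {
    let mut backend = MockBackend {
        windows: vec![Window {
            title: "Mozilla Firefox".into(),
            x: 100,
            y: 50,
            width: 800,
            height: 600,
        }],
        clicks: Vec::new(),
    };
    assert!(click_window_center(&mut backend, "Firefox"));
    println!("{:?}", backend.clicks); // [(500, 350)]
}
```

The trait has no generic methods, so it is also object-safe and could be held as `Box<dyn DesktopBackend>` if the backend were chosen at runtime.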

@@ -1,10 +1,10 @@
 ---
-name: desktop-ctl
-description: Desktop control CLI for AI agents - screenshot, click, type, window management on Linux X11
-allowed-tools: Bash(desktop-ctl:*)
+name: deskctl
+description: Desktop control CLI for AI agents
+allowed-tools: Bash(deskctl:*)
 ---
-# desktop-ctl
+# deskctl
 Desktop control CLI for AI agents on Linux X11. Provides a unified interface for screenshots, mouse/keyboard input, and window management with compact `@wN` window references.
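The `@wN` window references above are described only from the caller's side. One plausible implementation, assumed here rather than taken from deskctl, is a table rebuilt on each `snapshot` or `list-windows` that numbers windows in listing order and maps each ref back to a backend window id. `RefTable`, `assign`, and `resolve` are hypothetical names.

```rust
/// Hypothetical ref table: ids[n - 1] is the window behind "@wN".
#[derive(Default)]
struct RefTable {
    ids: Vec<u32>,
}

impl RefTable {
    /// Rebuild from a fresh window listing; refs follow listing order starting at @w1.
    fn assign(&mut self, window_ids: &[u32]) -> Vec<String> {
        self.ids = window_ids.to_vec();
        (1..=self.ids.len()).map(|n| format!("@w{n}")).collect()
    }

    /// Map "@wN" back to a window id; None for malformed or out-of-range refs.
    fn resolve(&self, r: &str) -> Option<u32> {
        let n: usize = r.strip_prefix("@w")?.parse().ok()?;
        self.ids.get(n.checked_sub(1)?).copied()
    }
}

fn main() {
    let mut refs = RefTable::default();
    // Window ids as an X11 backend might report them (made-up values).
    let assigned = refs.assign(&[0x0340_0003, 0x0220_000a]);
    println!("{:?}", assigned); // ["@w1", "@w2"]
    println!("{:?}", refs.resolve("@w2")); // Some(35651594)
    println!("{:?}", refs.resolve("@w3")); // None
}
```

Because the table is replaced on every listing, a ref is only meaningful relative to the most recent `snapshot` or `list-windows`.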
@@ -19,59 +19,59 @@ Desktop control CLI for AI agents on Linux X11. Provides a unified interface for
 ### See the Desktop
 ```bash
-desktop-ctl snapshot # Screenshot + window tree with @wN refs
-desktop-ctl snapshot --annotate # Screenshot with bounding boxes and labels
-desktop-ctl snapshot --json # Structured JSON output
-desktop-ctl list-windows # Window tree without screenshot
-desktop-ctl screenshot /tmp/s.png # Screenshot only (no window tree)
+deskctl snapshot # Screenshot + window tree with @wN refs
+deskctl snapshot --annotate # Screenshot with bounding boxes and labels
+deskctl snapshot --json # Structured JSON output
+deskctl list-windows # Window tree without screenshot
+deskctl screenshot /tmp/s.png # Screenshot only (no window tree)
 ```
 ### Click and Type
 ```bash
-desktop-ctl click @w1 # Click center of window @w1
-desktop-ctl click 500,300 # Click absolute coordinates
-desktop-ctl dblclick @w2 # Double-click window @w2
-desktop-ctl type "hello world" # Type text into focused window
-desktop-ctl press enter # Press a key
-desktop-ctl hotkey ctrl c # Send Ctrl+C
-desktop-ctl hotkey ctrl shift t # Send Ctrl+Shift+T
+deskctl click @w1 # Click center of window @w1
+deskctl click 500,300 # Click absolute coordinates
+deskctl dblclick @w2 # Double-click window @w2
+deskctl type "hello world" # Type text into focused window
+deskctl press enter # Press a key
+deskctl hotkey ctrl c # Send Ctrl+C
+deskctl hotkey ctrl shift t # Send Ctrl+Shift+T
 ```
 ### Mouse Control
 ```bash
-desktop-ctl mouse move 500 300 # Move cursor to coordinates
-desktop-ctl mouse scroll 3 # Scroll down 3 units
-desktop-ctl mouse scroll -3 # Scroll up 3 units
-desktop-ctl mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
+deskctl mouse move 500 300 # Move cursor to coordinates
+deskctl mouse scroll 3 # Scroll down 3 units
+deskctl mouse scroll -3 # Scroll up 3 units
+deskctl mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
 ```
 ### Window Management
 ```bash
-desktop-ctl focus @w2 # Focus window by ref
-desktop-ctl focus "firefox" # Focus window by name (substring match)
-desktop-ctl close @w3 # Close window gracefully
-desktop-ctl move-window @w1 100 200 # Move window to position
-desktop-ctl resize-window @w1 800 600 # Resize window
+deskctl focus @w2 # Focus window by ref
+deskctl focus "firefox" # Focus window by name (substring match)
+deskctl close @w3 # Close window gracefully
+deskctl move-window @w1 100 200 # Move window to position
+deskctl resize-window @w1 800 600 # Resize window
 ```
 ### Utilities
 ```bash
-desktop-ctl get-screen-size # Screen resolution
-desktop-ctl get-mouse-position # Current cursor position
-desktop-ctl launch firefox # Launch an application
-desktop-ctl launch code -- --new-window # Launch with arguments
+deskctl get-screen-size # Screen resolution
+deskctl get-mouse-position # Current cursor position
+deskctl launch firefox # Launch an application
+deskctl launch code -- --new-window # Launch with arguments
 ```
 ### Daemon
 ```bash
-desktop-ctl daemon start # Start daemon manually
-desktop-ctl daemon stop # Stop daemon
-desktop-ctl daemon status # Check daemon status
+deskctl daemon start # Start daemon manually
+deskctl daemon stop # Stop daemon
+deskctl daemon status # Check daemon status
 ```
 ## Global Options
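`click` accepts either a window ref (`@w1`) or absolute coordinates (`500,300`). A hypothetical parser for those two forms is sketched below; `Target` and `parse_target` are illustrative names, and the real CLI's parsing rules are not part of this diff.

```rust
/// Hypothetical click target: a window ref or an absolute screen point.
#[derive(Debug, PartialEq)]
enum Target {
    Ref(usize),
    Point(i32, i32),
}

/// Parse "@wN" (N >= 1) as a window ref and "X,Y" as absolute coordinates.
fn parse_target(s: &str) -> Result<Target, String> {
    if let Some(n) = s.strip_prefix("@w") {
        return n
            .parse::<usize>()
            .ok()
            .filter(|&n| n >= 1)
            .map(Target::Ref)
            .ok_or_else(|| format!("invalid window ref: {s}"));
    }
    let (x, y) = s
        .split_once(',')
        .ok_or_else(|| format!("expected @wN or X,Y, got: {s}"))?;
    let x = x.trim().parse::<i32>().map_err(|e| format!("bad x in {s}: {e}"))?;
    let y = y.trim().parse::<i32>().map_err(|e| format!("bad y in {s}: {e}"))?;
    Ok(Target::Point(x, y))
}

fn main() {
    for arg in ["@w1", "500,300", "@w0", "firefox"] {
        println!("{arg:?} -> {:?}", parse_target(arg));
    }
}
```

Checking the `@w` prefix first keeps the two forms unambiguous, since a ref never contains a comma.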
@@ -92,18 +92,18 @@ After `snapshot` or `list-windows`, windows are assigned short refs:
 ```bash
 # 1. See what's on screen
-desktop-ctl snapshot --annotate
+deskctl snapshot --annotate
 # 2. Focus the browser
-desktop-ctl focus "firefox"
+deskctl focus "firefox"
 # 3. Navigate to a URL
-desktop-ctl hotkey ctrl l
-desktop-ctl type "https://example.com"
-desktop-ctl press enter
+deskctl hotkey ctrl l
+deskctl type "https://example.com"
+deskctl press enter
 # 4. Take a new snapshot to see the result
-desktop-ctl snapshot
+deskctl snapshot
 ```
 ## Key Names for press/hotkey
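`hotkey` takes a chord as separate words (`ctrl c`, `ctrl shift t`). A minimal sketch of splitting such a chord follows, assuming every word but the last is a modifier. The `Modifier` set (`ctrl`, `shift`, `alt`, `super`) and `parse_hotkey` are hypothetical, since the key-name table under the heading above is not included in this diff.

```rust
/// Hypothetical modifier set; deskctl's actual key-name table may differ.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Modifier {
    Ctrl,
    Shift,
    Alt,
    Super,
}

fn parse_modifier(name: &str) -> Option<Modifier> {
    match name.to_ascii_lowercase().as_str() {
        "ctrl" => Some(Modifier::Ctrl),
        "shift" => Some(Modifier::Shift),
        "alt" => Some(Modifier::Alt),
        "super" => Some(Modifier::Super),
        _ => None,
    }
}

/// Split hotkey arguments into held modifiers and the final key:
/// every argument except the last must name a modifier.
fn parse_hotkey(args: &[&str]) -> Result<(Vec<Modifier>, String), String> {
    let (key, mods) = args.split_last().ok_or("hotkey needs at least one key")?;
    let mods = mods
        .iter()
        .map(|m| parse_modifier(m).ok_or_else(|| format!("not a modifier: {m}")))
        .collect::<Result<Vec<_>, _>>()?;
    Ok((mods, key.to_string()))
}

fn main() {
    println!("{:?}", parse_hotkey(&["ctrl", "shift", "t"])); // Ok(([Ctrl, Shift], "t"))
    println!("{:?}", parse_hotkey(&["ctrl", "c"])); // Ok(([Ctrl], "c"))
    println!("{:?}", parse_hotkey(&["t", "ctrl"])); // Err("not a modifier: t")
}
```

A backend would then press the modifiers in order, tap the key, and release the modifiers in reverse order.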