From f74f4dc2b0049d7bb5d766d9f50762493c24ba73 Mon Sep 17 00:00:00 2001
From: Harivansh Rathi
Date: Wed, 25 Mar 2026 17:52:18 -0400
Subject: [PATCH] specs

---
 .../stabilize-v0-2-foundation/design.md       | 116 ++++++++++++++++++
 .../stabilize-v0-2-foundation/proposal.md     |  28 +++++
 .../specs/desktop-runtime/spec.md             |  54 ++++++++
 .../stabilize-v0-2-foundation/tasks.md        |  17 +++
 4 files changed, 215 insertions(+)
 create mode 100644 openspec/changes/stabilize-v0-2-foundation/design.md
 create mode 100644 openspec/changes/stabilize-v0-2-foundation/proposal.md
 create mode 100644 openspec/changes/stabilize-v0-2-foundation/specs/desktop-runtime/spec.md
 create mode 100644 openspec/changes/stabilize-v0-2-foundation/tasks.md

diff --git a/openspec/changes/stabilize-v0-2-foundation/design.md b/openspec/changes/stabilize-v0-2-foundation/design.md
new file mode 100644
index 0000000..c320144
--- /dev/null
+++ b/openspec/changes/stabilize-v0-2-foundation/design.md
@@ -0,0 +1,116 @@
+## Context
+
+`deskctl` already exposes a useful X11 runtime for screenshots, input, and window management, but the current implementation mixes together concerns that need to be separated before the repo can become a stable primitive. Public window data still exposes `xcb_id`, `list-windows` is routed through screenshot-producing code, setup failures are opaque, and daemon startup behavior is not yet explicit enough for reliable reuse by higher-level tooling.
+
+Phase 1 is the foundation tranche. It should make the runtime contract clean and cheap to consume without expanding into packaging, skills, or non-X11 backends.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Define a backend-neutral public window identity and selector contract.
+- Make read-only window enumeration cheap and side-effect free.
+- Add a first-run diagnostics command that explains broken environments precisely.
+- Harden daemon startup and health behavior enough for predictable CLI use.
+- Keep the Phase 1 scope implementable in one focused change.
+
+**Non-Goals:**
+- Wayland support or any additional backend implementation.
+- npm distribution, crates.io publishing, or release automation changes.
+- Broad new read surface such as monitors, workspaces, clipboard, or batching.
+- Agent skills, config files, or policy/confirmation features.
+
+## Decisions
+
+### 1. Public window identity becomes `window_id`, not `xcb_id`
+
+The stable contract will expose an opaque `window_id` for programmatic targeting. Backend-specific handles such as X11 window IDs stay internal to daemon/backend state.
+
+Rationale:
+- This removes X11 leakage from the public interface.
+- It keeps the future backend boundary open without promising a Wayland implementation now.
+- It makes selector behavior explicit: users and agents target `@wN`, window names, or `window_id`, not backend handles.
+
+Alternatives considered:
+- Keep exposing `xcb_id` and add `window_id` alongside it. Rejected because it cements the wrong contract and encourages downstream dependence on X11 internals.
+- Hide programmatic identity entirely and rely only on refs. Rejected because refs are intentionally ephemeral and not sufficient for durable automation.
+
+### 2. Window enumeration and screenshot capture become separate backend operations
+
+The backend API will separate:
+- window listing / state collection
+- screenshot capture
+- composed snapshot behavior
+
+`list-windows` will call the cheap enumeration path directly. `snapshot` remains the convenience command that combines enumeration plus screenshot generation.
+
+Rationale:
+- Read-only commands must not capture screenshots or write `/tmp` files.
+- This reduces latency and unintended side effects.
+- It clarifies which operations are safe to call frequently in agent loops.
+
+Alternatives considered:
+- Keep the current `snapshot(false)` path and optimize it internally. Rejected because the coupling itself is the product problem; the API shape needs to reflect the intended behavior.
+
+### 3. `deskctl doctor` runs without requiring a healthy daemon
+
+`doctor` will be implemented as a CLI command that can run before daemon startup. It will probe environment prerequisites directly and optionally inspect daemon/socket state as one of the checks.
+
+Expected checks:
+- `DISPLAY` present
+- X11 session expectation (`XDG_SESSION_TYPE=x11` or explicit note if missing)
+- X server connectivity
+- required extensions or equivalent runtime capabilities used by the backend
+- socket directory existence and permissions
+- basic window enumeration
+- screenshot capture viability
+
+Rationale:
+- Diagnostics must work when daemon startup is the thing that is broken.
+- The command should return actionable failure messages, not “failed to connect to daemon.”
+
+Alternatives considered:
+- Implement `doctor` as a normal daemon request. Rejected because it hides the startup path and cannot diagnose stale socket or spawn failures well.
+
+### 4. Daemon hardening stays minimal and focused in this phase
+
+Phase 1 daemon work will cover:
+- stale socket detection and cleanup on startup/connect
+- clearer startup/connect failure messages
+- a lightweight health-check request or equivalent status probe used by the client path
+
+Rationale:
+- This is enough to make CLI behavior predictable without turning the spec into a full daemon observability project.
+
+Alternatives considered:
+- Include idle timeout, structured logging, and full lifecycle policy now. Rejected because those are useful but not necessary to stabilize the basic runtime contract.
+
+### 5. X11 remains the explicit support boundary for this change
+
+The spec and implementation will define expected behavior for X11 environments only. Unsupported-session diagnostics are in scope; a second backend is not.
+
+Rationale:
+- The repo needs a stable foundation more than a nominally portable abstraction.
+- Clear support boundaries are better than implying near-term Wayland support.
+
+## Risks / Trade-offs
+
+- [Breaking JSON contract for existing users] → Treat this as a deliberate pre-1.0 stabilization change, update docs/examples, and keep the new shape simple.
+- [More internal mapping state between public IDs and backend handles] → Keep one canonical mapping in daemon state and use it for selector resolution.
+- [Separate read and screenshot paths could drift] → Share the same window collection logic underneath both operations.
+- [`doctor` checks may be environment-specific] → Keep the checks narrow, tied to actual backend requirements, and report concrete pass/fail reasons.
+
+## Migration Plan
+
+1. Introduce the new public `window_id` contract and update selector resolution to accept it.
+2. Refactor backend and daemon code so `list-windows` uses a pure read path.
+3. Add `deskctl doctor` and wire it into the CLI as a daemon-independent command.
+4. Add unit and integration coverage for the new contract and cheap read behavior.
+5. Update user-facing docs and examples after implementation so the new output shape is canonical.
+
+Rollback strategy:
+- This is a pre-1.0 contract cleanup, so rollback is a normal code revert rather than a compatibility shim plan.
+
+## Open Questions
+
+- Whether the health check should be an explicit public `daemon ping` action or an internal client/daemon probe.
+- Whether `doctor` should expose machine-readable JSON on day one or land text output first and add JSON immediately afterward in the same tranche if implementation cost is low.
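
Review note on design.md, Decision 1: the "one canonical mapping in daemon state" between public IDs and backend handles could look roughly like the sketch below. This is not code from the patch. `WindowId`, `WindowRegistry`, the `win-N` ID format, and the use of `u32` for the X11 handle are all assumptions made for illustration; the design only requires that `window_id` be opaque and that backend handles stay internal.

```rust
use std::collections::HashMap;

/// Hypothetical opaque public identifier. The "win-N" text format is an
/// assumption; the design only says `window_id` is opaque.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct WindowId(String);

impl WindowId {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical daemon-side registry holding the single canonical mapping
/// between public ids and backend handles (a `u32` X11 window id here).
#[derive(Default)]
pub struct WindowRegistry {
    next: u64,
    by_public: HashMap<WindowId, u32>,
    by_handle: HashMap<u32, WindowId>,
}

impl WindowRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the public id already assigned to `handle`, or mint a new one.
    pub fn public_id_for(&mut self, handle: u32) -> WindowId {
        if let Some(id) = self.by_handle.get(&handle) {
            return id.clone();
        }
        self.next += 1;
        let id = WindowId(format!("win-{}", self.next));
        self.by_public.insert(id.clone(), handle);
        self.by_handle.insert(handle, id.clone());
        id
    }

    /// Resolve a public id from a selector back to the internal handle.
    pub fn resolve(&self, id: &str) -> Option<u32> {
        self.by_public.get(&WindowId(id.to_string())).copied()
    }

    /// Drop the mapping once the backend reports the window as gone.
    pub fn forget(&mut self, handle: u32) {
        if let Some(id) = self.by_handle.remove(&handle) {
            self.by_public.remove(&id);
        }
    }
}
```

Keeping both directions in one structure means enumeration and selector resolution share the same state, which is the mitigation the Risks section names for mapping drift.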
diff --git a/openspec/changes/stabilize-v0-2-foundation/proposal.md b/openspec/changes/stabilize-v0-2-foundation/proposal.md
new file mode 100644
index 0000000..5472981
--- /dev/null
+++ b/openspec/changes/stabilize-v0-2-foundation/proposal.md
@@ -0,0 +1,28 @@
+## Why
+
+`deskctl` already works as a useful X11 desktop control CLI, but the current contract is not stable enough to build packaging, skills, or broader agent workflows on top of it yet. Public output still leaks X11-specific identifiers, some read-only commands still perform screenshot capture and write files, setup failures are not self-diagnosing, and daemon lifecycle behavior needs to be more predictable before the repo is treated as a reliable primitive.
+
+## What Changes
+
+- Stabilize the public desktop runtime contract around backend-neutral window identity and explicit selector semantics.
+- Separate cheap read paths from screenshot-producing paths so read-only commands do not capture or write screenshots unless explicitly requested.
+- Add a first-run `deskctl doctor` command that verifies X11 runtime prerequisites and reports exact remediation steps.
+- Harden daemon startup and health behavior enough for reliable reuse from CLI commands and future higher-level tooling.
+- Document the Phase 1 support boundary: X11 is the supported runtime today; Wayland is out of scope for this change.
+
+## Capabilities
+
+### New Capabilities
+- `desktop-runtime`: Stable Phase 1 desktop runtime behavior covering public window identity, cheap read commands, runtime diagnostics, and foundational daemon health behavior.
+
+### Modified Capabilities
+- None.
+
+## Impact
+
+- Affected CLI surface in `src/cli/`
+- Affected daemon request handling and state in `src/daemon/`
+- Affected backend contract and X11 implementation in `src/backend/`
+- Affected shared protocol and types in `src/core/`
+- New tests for unit behavior and X11 integration coverage
+- Follow-on docs updates for usage and troubleshooting once implementation lands
diff --git a/openspec/changes/stabilize-v0-2-foundation/specs/desktop-runtime/spec.md b/openspec/changes/stabilize-v0-2-foundation/specs/desktop-runtime/spec.md
new file mode 100644
index 0000000..191180f
--- /dev/null
+++ b/openspec/changes/stabilize-v0-2-foundation/specs/desktop-runtime/spec.md
@@ -0,0 +1,54 @@
+## ADDED Requirements
+
+### Requirement: Backend-neutral window identity
+The desktop runtime SHALL expose a backend-neutral `window_id` for each enumerated window. Public runtime responses MUST NOT require backend-specific handles such as X11 window IDs for targeting or automation.
+
+#### Scenario: Listing windows returns stable public identity
+- **WHEN** a client lists windows through the runtime
+- **THEN** each window result includes a `window_id` that the client can use for later targeting
+- **AND** the result does not require the client to know the underlying X11 handle format
+
+#### Scenario: Selector resolution accepts public identity
+- **WHEN** a client targets a window by `window_id`
+- **THEN** the runtime resolves the request to the correct live window
+- **AND** performs the requested action without exposing backend handles in the public contract
+
+### Requirement: Read-only window enumeration is side-effect free
+The desktop runtime SHALL provide a read-only window enumeration path that does not capture screenshots and does not write screenshot artifacts unless a screenshot-producing command was explicitly requested.
+
+#### Scenario: Listing windows does not write a screenshot
+- **WHEN** a client runs a read-only window listing command
+- **THEN** the runtime returns current window data
+- **AND** does not capture a screenshot
+- **AND** does not create a screenshot file as a side effect
+
+#### Scenario: Snapshot remains an explicit screenshot-producing command
+- **WHEN** a client runs a snapshot command
+- **THEN** the runtime returns window data together with screenshot output
+- **AND** any screenshot artifact is created only because the snapshot command explicitly requested it
+
+### Requirement: Runtime diagnostics are first-class
+The CLI SHALL provide a `doctor` command that checks runtime prerequisites for the supported X11 environment and reports actionable remediation guidance for each failed check.
+
+#### Scenario: Doctor reports missing display configuration
+- **WHEN** `deskctl doctor` runs without a usable `DISPLAY`
+- **THEN** it reports that the X11 display is unavailable
+- **AND** includes a concrete remediation message describing what environment setup is required
+
+#### Scenario: Doctor verifies basic runtime operations
+- **WHEN** `deskctl doctor` runs in a healthy supported environment
+- **THEN** it verifies X11 connectivity, basic window enumeration, screenshot viability, and socket path health
+- **AND** reports a successful diagnostic result for each check
+
+### Requirement: Daemon startup failures are recoverable and diagnosable
+The runtime SHALL detect stale daemon socket state and surface actionable startup or connection errors instead of failing with ambiguous transport errors.
+
+#### Scenario: Client encounters a stale socket
+- **WHEN** the client finds a socket path whose daemon is no longer serving requests
+- **THEN** the runtime removes or replaces the stale socket state safely
+- **AND** proceeds with a healthy daemon startup or reports a specific failure if recovery does not succeed
+
+#### Scenario: Health probing distinguishes startup failure from runtime failure
+- **WHEN** a client attempts to use the runtime and the daemon cannot become healthy
+- **THEN** the returned error explains whether the failure occurred during spawn, health probing, or request handling
+- **AND** does not report the problem as a generic connection failure alone
diff --git a/openspec/changes/stabilize-v0-2-foundation/tasks.md b/openspec/changes/stabilize-v0-2-foundation/tasks.md
new file mode 100644
index 0000000..cf2918b
--- /dev/null
+++ b/openspec/changes/stabilize-v0-2-foundation/tasks.md
@@ -0,0 +1,17 @@
+## 1. Contract and protocol stabilization
+
+- [ ] 1.1 Define the public `window_id` contract in shared types/protocol code and remove backend-handle assumptions from public runtime responses
+- [ ] 1.2 Update daemon state and selector resolution to map `window_id` and refs to internal backend handles without exposing X11-specific IDs publicly
+- [ ] 1.3 Update CLI text and JSON response handling to use the new public identity consistently
+
+## 2. Cheap reads and diagnostics
+
+- [ ] 2.1 Split backend window enumeration from screenshot capture and route `list-windows` through a read-only path with no screenshot side effects
+- [ ] 2.2 Add a daemon-independent `deskctl doctor` command that probes X11 environment setup, socket health, window enumeration, and screenshot viability
+- [ ] 2.3 Harden daemon startup and reconnect behavior with stale socket cleanup, health probing, and clearer failure messages
+
+## 3. Validation and follow-through
+
+- [ ] 3.1 Add unit tests for selector parsing, public ID resolution, and read-only behavior
+- [ ] 3.2 Add X11 integration coverage for `doctor`, `list-windows`, and daemon recovery behavior
+- [ ] 3.3 Update user-facing docs and examples to reflect the new contract, `doctor`, and the explicit X11 support boundary
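
Review note on task 2.3 and the stale-socket scenario in spec.md: one possible way to tell a stale socket from a live one, using only the Rust standard library, is sketched below. `SocketState`, `classify_socket`, and `cleanup_if_stale` are hypothetical names, not part of the patch. The sketch assumes the daemon listens on a Unix domain socket, and it treats a refused connection as "stale". That is a heuristic, and the health probe the spec calls for would still have to run on top of it.

```rust
use std::io::{self, ErrorKind};
use std::os::unix::fs::FileTypeExt;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical classification of the daemon socket path.
#[derive(Debug, PartialEq, Eq)]
pub enum SocketState {
    /// Nothing exists at the path; the daemon was never started or exited cleanly.
    Missing,
    /// The path exists but nothing is listening behind it.
    Stale,
    /// Something accepted the connection. This does not prove the daemon is healthy.
    Live,
}

/// Probe the socket by attempting a connection and mapping the error kind.
pub fn classify_socket(path: &Path) -> io::Result<SocketState> {
    match UnixStream::connect(path) {
        Ok(_) => Ok(SocketState::Live),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(SocketState::Missing),
        Err(e) if e.kind() == ErrorKind::ConnectionRefused => Ok(SocketState::Stale),
        // Permission errors and the like are surfaced so the caller can report them.
        Err(e) => Err(e),
    }
}

/// Remove the path only when it is stale and really is a socket file.
pub fn cleanup_if_stale(path: &Path) -> io::Result<SocketState> {
    let state = classify_socket(path)?;
    if state == SocketState::Stale
        && std::fs::symlink_metadata(path)?.file_type().is_socket()
    {
        std::fs::remove_file(path)?;
    }
    Ok(state)
}
```

Returning the classification instead of a bare success flag lets the client phrase its error by stage (spawn, health probing, or request handling), which is what the health-probing scenario asks for.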