deskctl/openspec/changes/archive/2026-03-25-stabilize-v0-2-foundation/design.md
2026-03-25 18:22:49 -04:00

116 lines
6.3 KiB
Markdown

## Context
`deskctl` already exposes a useful X11 runtime for screenshots, input, and window management, but the current implementation mixes together concerns that need to be separated before the repo can become a stable primitive. Public window data still exposes `xcb_id`, `list-windows` is routed through screenshot-producing code, setup failures are opaque, and daemon startup behavior is not yet explicit enough for reliable reuse by higher-level tooling.
Phase 1 is the foundation tranche. It should make the runtime contract clean and cheap to consume without expanding into packaging, skills, or non-X11 backends.
## Goals / Non-Goals
**Goals:**
- Define a backend-neutral public window identity and selector contract.
- Make read-only window enumeration cheap and side-effect free.
- Add a first-run diagnostics command that explains broken environments precisely.
- Harden daemon startup and health behavior enough for predictable CLI use.
- Keep the Phase 1 scope implementable in one focused change.
**Non-Goals:**
- Wayland support or any additional backend implementation.
- npm distribution, crates.io publishing, or release automation changes.
- Broad new read surface such as monitors, workspaces, clipboard, or batching.
- Agent skills, config files, or policy/confirmation features.
## Decisions
### 1. Public window identity becomes `window_id`, not `xcb_id`
The stable contract will expose an opaque `window_id` for programmatic targeting. Backend-specific handles such as X11 window IDs stay internal to daemon/backend state.
Rationale:
- This removes X11 leakage from the public interface.
- It keeps the future backend boundary open without promising a Wayland implementation now.
- It makes selector behavior explicit: users and agents target `@wN`, window names, or `window_id`, not backend handles.
Alternatives considered:
- Keep exposing `xcb_id` and add `window_id` alongside it. Rejected because it cements the wrong contract and encourages downstream dependence on X11 internals.
- Hide programmatic identity entirely and rely only on refs. Rejected because refs are intentionally ephemeral and not sufficient for durable automation.
### 2. Window enumeration and screenshot capture become separate backend operations
The backend API will separate:
- window listing / state collection
- screenshot capture
- composed snapshot behavior
`list-windows` will call the cheap enumeration path directly. `snapshot` remains the convenience command that combines enumeration plus screenshot generation.
Rationale:
- Read-only commands must not capture screenshots or write `/tmp` files.
- This reduces latency and unintended side effects.
- It clarifies which operations are safe to call frequently in agent loops.
Alternatives considered:
- Keep the current `snapshot(false)` path and optimize it internally. Rejected because the coupling itself is the product problem; the API shape needs to reflect the intended behavior.
### 3. `deskctl doctor` runs without requiring a healthy daemon
`doctor` will be implemented as a CLI command that can run before daemon startup. It will probe environment prerequisites directly and optionally inspect daemon/socket state as one of the checks.
Expected checks:
- `DISPLAY` present
- X11 session expectation (`XDG_SESSION_TYPE=x11` or explicit note if missing)
- X server connectivity
- required extensions or equivalent runtime capabilities used by the backend
- socket directory existence and permissions
- basic window enumeration
- screenshot capture viability
Rationale:
- Diagnostics must work when daemon startup is the thing that is broken.
- The command should return actionable failure messages, not “failed to connect to daemon.”
Alternatives considered:
- Implement `doctor` as a normal daemon request. Rejected because it hides the startup path and cannot diagnose stale socket or spawn failures well.
### 4. Daemon hardening stays minimal and focused in this phase
Phase 1 daemon work will cover:
- stale socket detection and cleanup on startup/connect
- clearer startup/connect failure messages
- a lightweight health-check request or equivalent status probe used by the client path
Rationale:
- This is enough to make CLI behavior predictable without turning the spec into a full daemon observability project.
Alternatives considered:
- Include idle timeout, structured logging, and full lifecycle policy now. Rejected because those are useful but not necessary to stabilize the basic runtime contract.
### 5. X11 remains the explicit support boundary for this change
The spec and implementation will define expected behavior for X11 environments only. Unsupported-session diagnostics are in scope; a second backend is not.
Rationale:
- The repo needs a stable foundation more than a nominally portable abstraction.
- Clear support boundaries are better than implying near-term Wayland support.
## Risks / Trade-offs
- [Breaking JSON contract for existing users] → Treat this as a deliberate pre-1.0 stabilization change, update docs/examples, and keep the new shape simple.
- [More internal mapping state between public IDs and backend handles] → Keep one canonical mapping in daemon state and use it for selector resolution.
- [Separate read and screenshot paths could drift] → Share the same window collection logic underneath both operations.
- [`doctor` checks may be environment-specific] → Keep the checks narrow, tied to actual backend requirements, and report concrete pass/fail reasons.
## Migration Plan
1. Introduce the new public `window_id` contract and update selector resolution to accept it.
2. Refactor backend and daemon code so `list-windows` uses a pure read path.
3. Add `deskctl doctor` and wire it into the CLI as a daemon-independent command.
4. Add unit and integration coverage for the new contract and cheap read behavior.
5. Update user-facing docs and examples after implementation so the new output shape is canonical.
Rollback strategy:
- This is a pre-1.0 contract cleanup, so rollback is a normal code revert rather than a compatibility shim plan.
## Open Questions
- Whether the health check should be an explicit public `daemon ping` action or an internal client/daemon probe.
- Whether `doctor` should expose machine-readable JSON on day one or land text output first and add JSON immediately afterward in the same tranche if implementation cost is low.