feat: enhance desktop computer-use streaming with neko integration

Improve desktop streaming architecture, add inspector dev tooling,
React DesktopViewer updates, and computer-use documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nathan Flurry 2026-03-16 23:59:43 -07:00
parent 4252c705df
commit 2d8508d6e2
17 changed files with 2712 additions and 688 deletions


@ -0,0 +1,103 @@
# Desktop Streaming Architecture
## Decision: neko over GStreamer (direct) and VNC
We evaluated three approaches for streaming the virtual desktop to browser clients:
1. **VNC (noVNC/websockify)** - traditional remote desktop
2. **GStreamer WebRTC (direct)** - custom GStreamer pipeline in the sandbox agent process
3. **neko** - standalone WebRTC streaming server with its own GStreamer pipeline

We chose **neko**.
## Approach comparison
### VNC (noVNC)
- Uses RFB protocol, not WebRTC. Relies on pixel-diff framebuffer updates over WebSocket.
- Higher latency than WebRTC (no hardware-accelerated codec, no adaptive bitrate).
- Requires a VNC server (x11vnc or similar) plus websockify for browser access.
- Input handling is mature but tied to the RFB protocol.
- No audio support without additional plumbing.

**Rejected because:** Latency is noticeably worse than WebRTC-based approaches. The pixel-diff approach doesn't scale well at higher resolutions or frame rates. No native audio path.
### GStreamer WebRTC (direct)
- Custom pipeline: `ximagesrc -> videoconvert -> vp8enc -> rtpvp8pay -> webrtcbin`.
- Runs inside the sandbox agent Rust process using `gstreamer-rs` bindings.
- Requires feature-gating (`desktop-gstreamer` Cargo feature) and linking GStreamer at compile time.
- ICE candidate handling is complex: Docker-internal IPs (172.17.x.x) must be rewritten to 127.0.0.1 for host browser connectivity.
- UDP port range must be constrained via libnice NiceAgent properties to stay within Docker-forwarded ports.
- Input must be implemented separately (xdotool or custom X11 input injection).
- No built-in session management, authentication, or multi-client support.

**Rejected because:** Too much complexity for the sandbox agent to own directly. ICE/NAT traversal bugs are hard to debug. The GStreamer Rust bindings add significant compile-time dependencies. Input handling requires a separate implementation.

We built and tested this approach (branch `desktop-computer-use`, PR #226) and found:
- Black screen issues due to GStreamer pipeline negotiation failures
- ICE candidate rewriting fragility across Docker networking modes
- libnice port range configuration requires accessing internal NiceAgent properties that vary across GStreamer versions
- No data channel for low-latency input (had to fall back to WebSocket-based input which adds a round trip)
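
The ICE candidate rewriting listed above can be sketched roughly as follows. This is an illustration only, not the code from the `desktop-computer-use` branch: the function names are hypothetical, and it assumes candidates arrive as standard `candidate:` attribute lines and that only Docker's default bridge subnet (172.17.0.0/16) needs rewriting.

```rust
use std::net::Ipv4Addr;

/// Returns true for addresses in Docker's default bridge subnet (172.17.0.0/16).
/// ASSUMPTION: other Docker networks (user-defined bridges, host mode) are not handled.
fn is_docker_bridge_ip(ip: Ipv4Addr) -> bool {
    let octets = ip.octets();
    octets[0] == 172 && octets[1] == 17
}

/// Rewrites the connection address of one ICE candidate line.
///
/// Candidate layout (space separated):
/// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
/// The address is therefore the fifth field (index 4).
fn rewrite_candidate(candidate: &str, replacement: Ipv4Addr) -> String {
    let mut fields: Vec<String> = candidate.split(' ').map(str::to_string).collect();
    if fields.len() > 4 {
        // Non-IPv4 addresses (IPv6, mDNS hostnames) fail to parse and are left untouched.
        if let Ok(ip) = fields[4].parse::<Ipv4Addr>() {
            if is_docker_bridge_ip(ip) {
                fields[4] = replacement.to_string();
            }
        }
    }
    fields.join(" ")
}
```

The hard-coded subnet check hints at why this was fragile: other Docker networking modes hand out addresses from different ranges, so each mode needs its own rule.
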
### neko (chosen)
- Standalone Go binary extracted from `ghcr.io/m1k1o/neko/base`.
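- Has its own GStreamer pipeline internally for capture and encoding (the same `ximagesrc -> vp8enc` stages, but battle-tested). WebRTC transport is handled by neko's Go code (the pion WebRTC library) rather than by `webrtcbin`.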
- Provides WebSocket signaling, WebRTC media, and a binary data channel for input, all out of the box.
- Input via data channel is low-latency (sub-frame, no HTTP round trip). Uses X11 XTEST extension.
- Multi-session support with `noauth` provider (each browser tab gets its own session).
- ICE-lite mode with `--webrtc.nat1to1 127.0.0.1` eliminates NAT traversal issues for Docker-to-host.
- EPR (ephemeral port range) flag constrains UDP ports cleanly.
- Sandbox agent acts as a thin WebSocket proxy: browser WS connects to sandbox agent, which creates a per-connection neko login session and relays signaling messages bidirectionally.
- Audio codec support (opus) included for free.

**Chosen because:** Neko encapsulates all the hard WebRTC/GStreamer/input complexity into a single binary. The sandbox agent only needs to:
1. Manage the neko process lifecycle (start/stop via the process runtime)
2. Proxy WebSocket signaling (bidirectional relay, ~60 lines of code)
3. Handle neko session creation (HTTP login to get a session cookie)

This keeps the sandbox agent's desktop streaming code simple (~300 lines for the manager, ~120 lines for the WS proxy) while delivering production-quality WebRTC streaming with data channel input.
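
As a rough sketch of the process-lifecycle side, the neko command line could be assembled along these lines. The struct and function names are hypothetical. `--webrtc.icelite`, `--webrtc.nat1to1`, and `--desktop.input.enabled=false` are the flags this document names; the spelling `--webrtc.epr` for the ephemeral port range flag is an assumption, and other options (such as the internal listen port) are omitted.

```rust
/// Hypothetical launch settings for the neko child process.
struct NekoLaunch {
    /// Address neko should advertise in its ICE candidates.
    nat1to1: String,
    /// Inclusive UDP port range forwarded through Docker.
    udp_ports: (u16, u16),
}

/// Builds the argument list handed to the process runtime.
/// Returns an error when the port range is inverted.
fn neko_args(cfg: &NekoLaunch) -> Result<Vec<String>, String> {
    let (low, high) = cfg.udp_ports;
    if low > high {
        return Err(format!("invalid UDP port range {}-{}", low, high));
    }
    Ok(vec![
        // ICE-lite plus a fixed advertised address avoids NAT traversal
        // for the Docker-to-host case.
        "--webrtc.icelite".to_string(),
        "--webrtc.nat1to1".to_string(),
        cfg.nat1to1.clone(),
        // ASSUMPTION: flag name for the ephemeral port range (EPR).
        "--webrtc.epr".to_string(),
        format!("{}-{}", low, high),
        // neko's custom input driver is unavailable outside its official
        // images, so input falls back to XTEST.
        "--desktop.input.enabled=false".to_string(),
    ])
}
```

Keeping the flags in one pure function makes the launch configuration easy to unit-test without starting neko.
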
## Architecture
```
Browser                 Sandbox Agent                            neko (internal)
|                             |                                         |
|-- WS /stream/signaling ---->|-- WS ws://127.0.0.1:18100/api/ws ------>|
|                             |          (bidirectional relay)          |
|<-- neko signaling ----------|<-- neko signaling ----------------------|
|                             |                                         |
|<===================== WebRTC (UDP 59000-59100) ======================>|
|              VP8 video, Opus audio, binary data channel               |
|                                                                       |
|-- data channel input (mouse/keyboard) ------------------------------->|
|   (binary protocol: opcode + payload, big-endian)                     |
```
Key points:
- neko listens on internal port 18100 (not exposed externally).
- UDP ports 59000-59100 are forwarded through Docker for WebRTC media.
- `--webrtc.icelite` + `--webrtc.nat1to1 127.0.0.1` means neko advertises 127.0.0.1 as its ICE candidate, so the browser connects to localhost UDP ports directly.
- `--desktop.input.enabled=false` disables neko's custom xf86-input driver (not available outside neko's official Docker images). Input falls back to XTEST.
- Each WebSocket proxy connection creates a fresh neko login session with a unique username to avoid session conflicts when multiple clients connect.
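
To make the "opcode + payload, big-endian" framing from the diagram concrete, here is a minimal sketch of how one input message might be encoded. The exact layout is an assumption (a 1-byte opcode, a 2-byte big-endian payload length, then the payload), as are the `OP_MOVE` value and the function names; the authoritative definitions live in neko's `receive.go`, linked under References.

```rust
/// ASSUMPTION: illustrative opcode value for a mouse-move event.
const OP_MOVE: u8 = 0x01;

/// Frames a message as: opcode (1 byte), payload length (u16, big-endian), payload.
/// Returns None when the payload does not fit in a u16 length field.
fn encode_message(opcode: u8, payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u16::MAX as usize {
        return None;
    }
    let len = payload.len() as u16;
    let mut buf = Vec::with_capacity(3 + payload.len());
    buf.push(opcode);
    buf.extend_from_slice(&len.to_be_bytes());
    buf.extend_from_slice(payload);
    Some(buf)
}

/// Encodes a mouse move with x and y as big-endian u16 values.
fn encode_mouse_move(x: u16, y: u16) -> Vec<u8> {
    let mut payload = [0u8; 4];
    payload[..2].copy_from_slice(&x.to_be_bytes());
    payload[2..].copy_from_slice(&y.to_be_bytes());
    // A 4-byte payload always fits in the u16 length field.
    encode_message(OP_MOVE, &payload).expect("4-byte payload fits")
}
```

In the real system this encoding happens in the browser client, which writes the bytes straight to the WebRTC data channel; the Rust version here only illustrates the byte layout.
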
## Trade-offs
| Concern | neko | GStreamer direct |
|---------|------|-----------------|
| Binary size | ~30MB additional binary | ~0 (uses system GStreamer libs) |
| Compile-time deps | None (external binary) | gstreamer-rs crate + GStreamer dev libs |
| Input latency | Sub-frame (data channel) | WebSocket round trip |
| ICE/NAT complexity | Handled by neko flags | Must implement in Rust |
| Multi-client | Built-in session management | Must implement |
| Maintenance | Upstream neko updates | Own all the code |
| Audio | Built-in (opus) | Must add audio pipeline |

The main trade-off is the additional ~30MB binary size from neko. This is acceptable for the Docker-based deployment model where image size is less critical than reliability and development velocity.
## References
- neko v3: https://github.com/m1k1o/neko
- neko client reference: https://github.com/demodesk/neko-client
- neko data channel protocol: https://github.com/m1k1o/neko/blob/master/server/internal/webrtc/payload/receive.go
- GStreamer branch (closed): PR #226, branch `desktop-computer-use`
- Image digest: `ghcr.io/m1k1o/neko/base@sha256:0c384afa56268aaa2d5570211d284763d0840dcdd1a7d9a24be3081d94d3dfce`