mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-15 07:04:48 +00:00
Improve desktop streaming architecture, add inspector dev tooling, React DesktopViewer updates, and computer-use documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
103 lines
6.2 KiB
Markdown
103 lines
6.2 KiB
Markdown
# Desktop Streaming Architecture
|
|
|
|
## Decision: neko over GStreamer (direct) and VNC
|
|
|
|
We evaluated three approaches for streaming the virtual desktop to browser clients:
|
|
|
|
1. **VNC (noVNC/websockify)** - traditional remote desktop
|
|
2. **GStreamer WebRTC (direct)** - custom GStreamer pipeline in the sandbox agent process
|
|
3. **neko** - standalone WebRTC streaming server with its own GStreamer pipeline
|
|
|
|
We chose **neko**.
|
|
|
|
## Approach comparison
|
|
|
|
### VNC (noVNC)
|
|
|
|
- Uses RFB protocol, not WebRTC. Relies on pixel-diff framebuffer updates over WebSocket.
|
|
- Higher latency than WebRTC (no hardware-accelerated codec, no adaptive bitrate).
|
|
- Requires a VNC server (x11vnc or similar) plus websockify for browser access.
|
|
- Input handling is mature but tied to the RFB protocol.
|
|
- No audio support without additional plumbing.
|
|
|
|
**Rejected because:** Latency is noticeably worse than WebRTC-based approaches. The pixel-diff approach doesn't scale well at higher resolutions or frame rates. No native audio path.
|
|
|
|
### GStreamer WebRTC (direct)
|
|
|
|
- Custom pipeline: `ximagesrc -> videoconvert -> vp8enc -> rtpvp8pay -> webrtcbin`.
|
|
- Runs inside the sandbox agent Rust process using `gstreamer-rs` bindings.
|
|
- Requires feature-gating (`desktop-gstreamer` Cargo feature) and linking GStreamer at compile time.
|
|
- ICE candidate handling is complex: Docker-internal IPs (172.17.x.x) must be rewritten to 127.0.0.1 for host browser connectivity.
|
|
- UDP port range must be constrained via libnice NiceAgent properties to stay within Docker-forwarded ports.
|
|
- Input must be implemented separately (xdotool or custom X11 input injection).
|
|
- No built-in session management, authentication, or multi-client support.
|
|
|
|
**Rejected because:** Too much complexity for the sandbox agent to own directly. ICE/NAT traversal bugs are hard to debug. The GStreamer Rust bindings add significant compile-time dependencies. Input handling requires a separate implementation. We built and tested this approach (branch `desktop-computer-use`, PR #226) and found:
|
|
- Black screen issues due to GStreamer pipeline negotiation failures
|
|
- ICE candidate rewriting fragility across Docker networking modes
|
|
- libnice port range configuration requires accessing internal NiceAgent properties that vary across GStreamer versions
|
|
- No data channel for low-latency input (had to fall back to WebSocket-based input which adds a round trip)
|
|
|
|
### neko (chosen)
|
|
|
|
- Standalone Go binary extracted from `ghcr.io/m1k1o/neko/base`.
|
|
- Has its own GStreamer pipeline internally (same `ximagesrc -> vp8enc -> webrtcbin` approach, but battle-tested).
|
|
- Provides WebSocket signaling, WebRTC media, and a binary data channel for input, all out of the box.
|
|
- Input via data channel is low-latency (sub-frame, no HTTP round trip). Uses X11 XTEST extension.
|
|
- Multi-session support with `noauth` provider (each browser tab gets its own session).
|
|
- ICE-lite mode with `--webrtc.nat1to1 127.0.0.1` eliminates NAT traversal issues for Docker-to-host.
|
|
- EPR (ephemeral port range) flag constrains UDP ports cleanly.
|
|
- Sandbox agent acts as a thin WebSocket proxy: browser WS connects to sandbox agent, which creates a per-connection neko login session and relays signaling messages bidirectionally.
|
|
- Audio codec support (opus) included for free.
|
|
|
|
**Chosen because:** Neko encapsulates all the hard WebRTC/GStreamer/input complexity into a single binary. The sandbox agent only needs to:
|
|
1. Manage the neko process lifecycle (start/stop via the process runtime)
|
|
2. Proxy WebSocket signaling (bidirectional relay, ~60 lines of code)
|
|
3. Handle neko session creation (HTTP login to get a session cookie)
|
|
|
|
This keeps the sandbox agent's desktop streaming code simple (~300 lines for the manager, ~120 lines for the WS proxy) while delivering production-quality WebRTC streaming with data channel input.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Browser Sandbox Agent neko (internal)
|
|
| | |
|
|
|-- WS /stream/signaling --> |-- WS ws://127.0.0.1:18100/api/ws -->|
|
|
| | (bidirectional relay) |
|
|
|<-- neko signaling ---------|<-- neko signaling -------|
|
|
| | |
|
|
|<========= WebRTC (UDP 59000-59100) ==================>|
|
|
| VP8 video, Opus audio, binary data channel |
|
|
| |
|
|
|-- data channel input (mouse/keyboard) --------------->|
|
|
| (binary protocol: opcode + payload, big-endian) |
|
|
```
|
|
|
|
Key points:
|
|
- neko listens on internal port 18100 (not exposed externally).
|
|
- UDP ports 59000-59100 are forwarded through Docker for WebRTC media.
|
|
- `--webrtc.icelite` + `--webrtc.nat1to1 127.0.0.1` means neko advertises 127.0.0.1 as its ICE candidate, so the browser connects to localhost UDP ports directly.
|
|
- `--desktop.input.enabled=false` disables neko's custom xf86-input driver (not available outside neko's official Docker images). Input falls back to XTEST.
|
|
- Each WebSocket proxy connection creates a fresh neko login session with a unique username to avoid session conflicts when multiple clients connect.
|
|
|
|
## Trade-offs
|
|
|
|
| Concern | neko | GStreamer direct |
|
|
|---------|------|-----------------|
|
|
| Binary size | ~30MB additional binary | ~0 (uses system GStreamer libs) |
|
|
| Compile-time deps | None (external binary) | gstreamer-rs crate + GStreamer dev libs |
|
|
| Input latency | Sub-frame (data channel) | WebSocket round trip |
|
|
| ICE/NAT complexity | Handled by neko flags | Must implement in Rust |
|
|
| Multi-client | Built-in session management | Must implement |
|
|
| Maintenance | Upstream neko updates | Own all the code |
|
|
| Audio | Built-in (opus) | Must add audio pipeline |
|
|
|
|
The main trade-off is the additional ~30MB binary size from neko. This is acceptable for the Docker-based deployment model where image size is less critical than reliability and development velocity.
|
|
|
|
## References
|
|
|
|
- neko v3: https://github.com/m1k1o/neko
|
|
- neko client reference: https://github.com/demodesk/neko-client
|
|
- neko data channel protocol: https://github.com/m1k1o/neko/blob/master/server/internal/webrtc/payload/receive.go
|
|
- GStreamer branch (closed): PR #226, branch `desktop-computer-use`
|
|
- Image digest: `ghcr.io/m1k1o/neko/base@sha256:0c384afa56268aaa2d5570211d284763d0840dcdd1a7d9a24be3081d94d3dfce`
|