sandbox-agent/research/process-terminal-design.md
NathanFlurry c54f83e1a6
2026-02-07 07:56:06 +00:00


Research: Process & Terminal System Design

Research on PTY/terminal and process management APIs across sandbox platforms, with design recommendations for sandbox-agent.

Competitive Landscape

Transport Comparison

| Platform | PTY Transport | Command Transport | Unified? |
|---|---|---|---|
| OpenCode | WebSocket (/pty/{id}/connect) | REST (session-scoped, AI-mediated) | No |
| E2B | gRPC server-stream (output) + unary RPC (input) | Same gRPC service | Yes |
| Daytona | WebSocket | REST | No |
| Kubernetes | WebSocket (channel byte mux) | Same WebSocket | Yes |
| Docker | HTTP connection hijack | Same connection | Yes |
| Fly.io | SSH over WireGuard | REST (sync, 60s max) | No |
| Vercel Sandboxes | No PTY API | REST SDK (async generator for logs) | N/A |
| Gitpod | gRPC (Listen=output, Write=input) | Same gRPC service | Yes |

Resize Mechanism

| Platform | How | Notes |
|---|---|---|
| OpenCode | PUT /pty/{id} with size: {rows, cols} | Separate REST call |
| E2B | Separate Update RPC | Separate gRPC call |
| Daytona | Separate HTTP POST | Sends SIGWINCH |
| Kubernetes | In-band WebSocket message (channel byte 4) | {"Width": N, "Height": N} |
| Docker | POST /exec/{id}/resize?h=N&w=N | Separate REST call |
| Gitpod | Separate SetSize RPC | Separate gRPC call |

Consensus: Five of the six platforms above use a separate call for resize; only Kubernetes does it in-band. Since resize is a control signal (not data), a separate mechanism is cleaner.

I/O Multiplexing

I/O multiplexing is how platforms distinguish between stdout, stderr, and PTY data on a shared connection.

| Platform | Method | Detail |
|---|---|---|
| Docker | 8-byte binary header per frame | Byte 0 = stream type (0=stdin, 1=stdout, 2=stderr). When TTY=true, no mux (raw stream). |
| Kubernetes | 1-byte channel prefix per WebSocket message | 0=stdin, 1=stdout, 2=stderr, 3=error, 4=resize, 255=close |
| E2B | gRPC oneof in protobuf | DataEvent.output is oneof { bytes stdout, bytes stderr, bytes pty } |
| OpenCode | None | PTY is a unified stream. Commands capture stdout/stderr separately in response. |
| Daytona | None | PTY is unified. Commands return structured {stdout, stderr}. |

Key insight: When a process runs with a PTY allocated, its stdout and stderr both point at the same PTY slave device, so the reader on the master side sees a single interleaved stream. Multiplexing only matters for non-PTY command execution. OpenCode and Daytona handle this by keeping PTY (unified stream) and commands (structured response) as separate APIs.
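For the non-PTY case, the two binary framings in the table above can be demultiplexed roughly as follows. This is a minimal sketch, not either project's client code. The stream and channel numbers come from the table. The length field in bytes 4-7 (big-endian uint32) follows Docker's documented attach stream format and is not stated in the table. The function names are hypothetical.

```python
import struct

# Stream/channel numbers as listed in the multiplexing table.
DOCKER_STREAMS = {0: "stdin", 1: "stdout", 2: "stderr"}
K8S_CHANNELS = {0: "stdin", 1: "stdout", 2: "stderr", 3: "error", 4: "resize", 255: "close"}


def demux_docker(buf: bytes):
    """Split a buffer of Docker-style frames into (stream_name, payload) pairs."""
    frames = []
    offset = 0
    while offset + 8 <= len(buf):
        # Byte 0: stream type. Bytes 1-3: padding.
        # Bytes 4-7: payload length as a big-endian uint32.
        stream_type, length = struct.unpack_from(">BxxxI", buf, offset)
        start = offset + 8
        end = start + length
        if end > len(buf):
            break  # incomplete frame; a real reader would wait for more bytes
        frames.append((DOCKER_STREAMS.get(stream_type, "unknown"), buf[start:end]))
        offset = end
    return frames


def demux_k8s(message: bytes):
    """Split one Kubernetes-style WebSocket message into (channel_name, payload)."""
    if not message:
        raise ValueError("empty message has no channel byte")
    return K8S_CHANNELS.get(message[0], "unknown"), message[1:]
```

Neither function is needed for a PTY stream, which is the point of the key insight above: with a PTY there is only one output stream to read.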

Reconnection

| Platform | Method | Replays missed output? |
|---|---|---|
| E2B | Connect RPC by PID or tag | No - only new events from reconnect point |
| Daytona | New WebSocket to same PTY session | No |
| Kubernetes | Not supported (connection = session) | N/A |
| Docker | Not supported (connection = session) | N/A |
| OpenCode | GET /pty/{id}/connect (WebSocket) | Unknown (not documented) |

Process Identification

| Platform | ID Type | Notes |
|---|---|---|
| OpenCode | String (pty_N) | Pattern ^pty.* |
| E2B | PID (uint32) or tag (string) | Dual selector |
| Daytona | Session ID / PID | |
| Docker | Exec ID (string, server-generated) | |
| Kubernetes | Connection-scoped | No ID - the WebSocket IS the process |
| Gitpod | Alias (string) | Human-readable |

Scoping

| Platform | PTY Scope | Command Scope |
|---|---|---|
| OpenCode | Server-wide (global) | Session-specific (AI-mediated) |
| E2B | Sandbox-wide | Sandbox-wide |
| Daytona | Sandbox-wide | Sandbox-wide |
| Docker | Container-scoped | Container-scoped |
| Kubernetes | Pod-scoped | Pod-scoped |

Key Questions & Analysis

Q: Should PTY transport be WebSocket?

Yes. WebSocket is the right choice for PTY I/O:

  • Bidirectional: client sends keystrokes, server sends terminal output
  • Low latency: no HTTP request overhead per keystroke
  • Persistent connection: terminal sessions are long-lived
  • Precedent: OpenCode, Daytona, and Kubernetes all use WebSocket for PTY (E2B and Gitpod use gRPC streams, Docker a hijacked HTTP connection)

Q: Should command transport be WebSocket or REST?

REST is sufficient for commands. WebSocket is not needed.

The distinction comes down to the nature of each operation:

  • PTY: Long-lived, bidirectional, interactive. User types, terminal responds. Needs WebSocket.
  • Commands: Request-response. Client says "run ls -la", server runs it, returns stdout/stderr/exit_code. This is a natural REST operation.

The "full duplex" question: commands don't need full duplex because:

  1. Input is sent once at invocation (the command string)
  2. Output is collected and returned when the process exits
  3. There's no ongoing interactive input during execution

For streaming output of long-running commands (e.g., npm install), there are two clean options:

  1. SSE: Server-Sent Events for output streaming (output-only, which is all you need)
  2. PTY: If the user needs to interact with the process (send ctrl+c, provide stdin), they should use a PTY instead

This matches how OpenCode separates the two: commands are REST, PTYs are WebSocket.

Recommendation: Keep commands as REST. If a command needs streaming output or interactive input, the user should create a PTY instead. This avoids building a second WebSocket protocol for a use case that PTYs already cover.
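The request-response shape described above can be sketched as a single function that runs a command to completion and returns a structured result. This is a minimal sketch under assumptions: the function name `run_command` and the `timeout` parameter are hypothetical, and the `stdout`/`stderr`/`exit_code` field names mirror the wording above rather than a settled schema.

```python
import subprocess


def run_command(command, args=(), cwd=None, timeout=60.0):
    """Run a command without a PTY and return stdout, stderr and exit code separately."""
    completed = subprocess.run(
        [command, *args],
        cwd=cwd,
        capture_output=True,  # separate pipes, so stdout and stderr stay distinct
        timeout=timeout,      # assumed limit; raises subprocess.TimeoutExpired if exceeded
    )
    return {
        "stdout": completed.stdout.decode("utf-8", errors="replace"),
        "stderr": completed.stderr.decode("utf-8", errors="replace"),
        "exit_code": completed.returncode,
    }
```

Because the input is sent once and the output is returned once, a REST handler can serialize this dictionary directly as its JSON response.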

Q: Should resize be WebSocket in-band or separate POST?

Separate endpoint (PUT or POST).

Reasons:

  • Resize is a control signal, not data. Mixing it into the data stream requires a framing protocol to distinguish resize messages from terminal input.
  • OpenCode already defines PUT /pty/{id} with size: {rows, cols} - this is the existing spec.
  • E2B, Daytona, Docker, and Gitpod all use separate calls.
  • Only Kubernetes does in-band (because their channel-byte protocol already has a mux layer).
  • A separate endpoint is simpler to implement, test, and debug.

Recommendation: Use PUT /pty/{id} with size field (matching OpenCode spec). Alternatively, a dedicated POST /pty/{id}/resize if we want to keep update and resize semantically separate.
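Whichever endpoint receives the resize request, the server-side effect on a Unix-like host is an ioctl on the PTY file descriptor. The sketch below assumes a POSIX system and uses hypothetical helper names. The kernel delivers SIGWINCH to the PTY's foreground process group when the size changes.

```python
import fcntl
import struct
import termios


def set_pty_size(fd: int, rows: int, cols: int) -> None:
    """Apply a new window size to the PTY referred to by fd."""
    # struct winsize is four unsigned shorts: rows, cols, xpixel, ypixel.
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_pty_size(fd: int):
    """Read the current window size back as a (rows, cols) tuple."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", packed)
    return rows, cols
```

A resize handler would look up the master fd for the given PTY id and call `set_pty_size` with the rows and cols from the request body.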

Q: What is I/O multiplexing?

I/O multiplexing is the mechanism for distinguishing between different data streams (stdout, stderr, stdin, control signals) on a single connection.

When it matters: Non-PTY command execution where stdout and stderr need to be kept separate.

When it doesn't matter: PTY sessions. When a PTY is allocated, the process's stdout and stderr are both attached to the PTY slave, so everything it writes arrives on the PTY master fd as a single stream. There is only one output stream. This is why terminals show stdout and stderr interleaved - the PTY doesn't distinguish them.

For sandbox-agent: Since PTYs are unified streams and commands use REST (separate stdout/stderr in the JSON response), we don't need a multiplexing protocol. The API design naturally separates the two cases.
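The single-stream behaviour of a PTY can be observed directly. The sketch below assumes a Unix-like host; `run_in_pty` is a hypothetical helper. It attaches a child's stdin, stdout, and stderr to one PTY slave and reads everything back from the master.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run argv with stdin/stdout/stderr on one PTY slave; return all bytes read from the master."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child holds its own copies of the slave
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break  # Linux reports EIO once every slave fd has been closed
        if not data:
            break  # other platforms signal the same condition with EOF
        chunks.append(data)
    os.close(master_fd)
    proc.wait()
    return b"".join(chunks)
```

Output written to stdout and output written to stderr both appear in the returned bytes, with nothing to tell them apart. The terminal line discipline may also rewrite line endings, so the bytes are not guaranteed to match what the child wrote exactly.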

Q: How should reconnect work?

Reconnect is an application-level concept, not just HTTP/WebSocket reconnection.

The distinction:

  • HTTP/WebSocket reconnect: The transport-level connection drops and is re-established. This is handled by the client library automatically (retry logic, exponential backoff). The server doesn't need to know.
  • Process reconnect: The client disconnects from a running process but the process keeps running. Later, the client (or a different client) connects to the same process and starts receiving output again.

E2B's model: Disconnecting a stream (via AbortController) leaves the process running. Connect RPC by PID or tag re-establishes the output stream. Missed output during disconnection is lost. This works because:

  1. Processes are long-lived (servers, shells)
  2. For terminals, the screen state can be recovered by the shell/application redrawing
  3. For commands, if you care about all output, don't disconnect

Recommendation for sandbox-agent: Reconnect should be supported at the application level:

  1. GET /pty/{id}/connect (WebSocket) can be called multiple times for the same PTY
  2. If the WebSocket drops, the PTY process keeps running
  3. Client reconnects by opening a new WebSocket to the same endpoint
  4. No output replay (too complex, rarely needed - full-screen terminal apps redraw when they receive SIGWINCH, which a client can trigger by sending a resize after reconnecting)
  5. This is essentially what OpenCode's /pty/{id}/connect endpoint already implies
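The recommendation above amounts to decoupling the process's output from whoever happens to be listening. A minimal in-memory sketch of that idea follows, with a hypothetical `ProcessOutputHub` class standing in for whatever sits between the PTY reader and the WebSocket handlers. Output published while nobody is attached is dropped, matching the no-replay choice.

```python
from typing import Callable, Dict


class ProcessOutputHub:
    """Fans process output out to currently attached clients; keeps no history."""

    def __init__(self) -> None:
        self._subscribers: Dict[int, Callable[[bytes], None]] = {}
        self._next_id = 0

    def attach(self, on_output: Callable[[bytes], None]) -> int:
        """Register a client callback (e.g. a WebSocket send) and return its handle."""
        subscriber_id = self._next_id
        self._next_id += 1
        self._subscribers[subscriber_id] = on_output
        return subscriber_id

    def detach(self, subscriber_id: int) -> None:
        """Remove a client; the process itself is unaffected."""
        self._subscribers.pop(subscriber_id, None)

    def publish(self, chunk: bytes) -> None:
        """Deliver a chunk to every attached client; dropped if there are none."""
        for callback in list(self._subscribers.values()):
            callback(chunk)
```

A dropped WebSocket would call `detach`, and a reconnecting client would call `attach` again and receive only output produced from that point on.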

This naturally leads to the persistent process system concept (see below).

Q: How are PTY events different from PTY transport?

Two completely separate channels serving different purposes:

PTY Events (via SSE on /event or /sessions/{id}/events/sse):

  • Lifecycle notifications: pty.created, pty.updated, pty.exited, pty.deleted
  • Lightweight JSON metadata (PTY id, status, exit code)
  • Broadcast to all subscribers
  • Used by UIs to update PTY lists, show status indicators, handle cleanup

PTY Transport (via WebSocket on /pty/{id}/connect):

  • Raw terminal I/O: binary input/output bytes
  • High-frequency, high-bandwidth
  • Point-to-point (one client connected to one PTY)
  • Used by terminal emulators (xterm.js) to render the terminal

Analogy: Events are like email notifications ("a new terminal was opened"). Transport is like the phone call (the actual terminal session).

Q: How are PTY and commands different in OpenCode?

They serve fundamentally different purposes:

PTY (/pty/*) - Direct execution environment:

  • Server-scoped (not tied to any AI session)
  • Creates a real terminal process
  • User interacts directly via WebSocket
  • Not part of the AI conversation
  • Think: "the terminal panel in VS Code"

Commands (/session/{sessionID}/command, /session/{sessionID}/shell) - AI-mediated execution:

  • Session-scoped (tied to an AI session)
  • The command is sent to the AI assistant for execution
  • Creates an AssistantMessage in the session's conversation history
  • Output becomes part of the AI's context
  • Think: "asking Claude to run a command as a tool call"

Why commands are session-specific: Because they're AI operations, not direct execution. When you call POST /session/{id}/command, the server:

  1. Creates an assistant message in the session
  2. Runs the command
  3. Captures output as message parts
  4. Emits message.part.updated events
  5. The AI can see this output in subsequent turns

This is how the AI "uses terminal tools" - the command infrastructure provides the bridge between the AI session and system execution.

Q: Should scoping be system-wide?

Yes, for both PTY and commands.

Current OpenCode behavior:

  • PTYs: Already server-wide (global)
  • Commands: Session-scoped (for AI context injection)

For sandbox-agent, since we're the orchestration layer (not the AI):

  • PTYs: System-wide. Any client should be able to list, connect to, or manage any PTY.
  • Commands/processes: System-wide. Process execution is a system primitive, not an AI primitive. If a caller wants to associate a process with a session, they can do so at their layer.

The session-scoping of commands in OpenCode is an OpenCode-specific concern (AI context injection). Sandbox-agent should provide the lower-level primitive (system-wide process execution) and let the OpenCode compat layer handle the session association.

Persistent Process System

The Concept

A persistent process system means:

  1. Spawn a process (PTY or command) via API
  2. Process runs independently of any client connection
  3. Connect/disconnect to the process I/O at will
  4. Process continues running through disconnections
  5. Query process status, list running processes
  6. Kill/signal processes explicitly

This is distinct from the typical "connection = process lifetime" model (Kubernetes, Docker exec), where the connection is the only handle on the process: once it closes, the client cannot reattach to the process's I/O.

How E2B Does It

E2B's Process service is the best reference implementation:

Start(cmd, pty?) → stream of events (output)
Connect(pid/tag) → stream of events (reconnect)
SendInput(pid, data) → ok
Update(pid, size) → ok (resize)
SendSignal(pid, signal) → ok
List() → running processes

Key design choices:

  • Unified service: PTY and command are the same service, differentiated by the pty field in StartRequest
  • Process outlives connection: Disconnecting the output stream (aborting the Start/Connect RPC) does NOT kill the process
  • Explicit termination: Must call SendSignal(SIGKILL) to stop a process
  • Tag-based selection: Processes can be tagged at creation for later lookup without knowing the PID

Recommendation for Sandbox-Agent

Sandbox-agent should implement a persistent process manager that:

  1. Is system-wide (not session-scoped)
  2. Supports both PTY and non-PTY modes
  3. Decouples process lifetime from connection lifetime
  4. Exposes both a REST API (lifecycle) and a WebSocket endpoint (I/O)

Proposed API Surface

Process Lifecycle (REST):

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/processes | Create/spawn a process (PTY or command) |
| GET | /v1/processes | List all processes |
| GET | /v1/processes/{id} | Get process info (status, pid, exit code) |
| DELETE | /v1/processes/{id} | Kill process (SIGTERM, then SIGKILL) |
| POST | /v1/processes/{id}/signal | Send signal (SIGTERM, SIGKILL, SIGINT, etc.) |
| POST | /v1/processes/{id}/resize | Resize PTY (rows, cols) |
| POST | /v1/processes/{id}/input | Send stdin/pty input (REST fallback) |
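The "SIGTERM, then SIGKILL" behaviour listed for DELETE could look roughly like this on a POSIX host. The function name and the grace period are assumptions. The table does not say how long to wait between the two signals.

```python
import subprocess


def terminate_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, escalate if it does not, and return its exit status."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
            proc.wait()
    return proc.returncode
```

On POSIX, a negative return value is the number of the signal that ended the process, which a handler could surface in the process.exited event.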

Process I/O (WebSocket):

| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/processes/{id}/connect | WebSocket for bidirectional I/O |

Process Events (SSE):

| Event | Description |
|---|---|
| process.created | Process spawned |
| process.updated | Process metadata changed |
| process.exited | Process terminated (includes exit code) |
| process.deleted | Process record removed |
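On the wire, each of these events would be one Server-Sent Events frame. The sketch below shows the standard SSE framing with an `event:` line, a `data:` line, and a blank line as terminator. The payload fields are an assumption borrowed from the process object, not a defined schema.

```python
import json


def encode_sse(event: str, payload: dict) -> str:
    """Encode one lifecycle event as a Server-Sent Events frame."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"
```

For example, `encode_sse("process.exited", {"id": "proc_abc123", "exit_code": 0})` yields a frame that a browser `EventSource` listener registered for `process.exited` would receive.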

Create Request

{
  "command": "bash",
  "args": ["-i", "-l"],
  "cwd": "/workspace",
  "env": {"TERM": "xterm-256color"},
  "pty": {                         // Optional - if present, allocate PTY
    "rows": 24,
    "cols": 80
  },
  "tag": "main-terminal",          // Optional - for lookup by name
  "label": "Terminal 1"            // Optional - display name
}
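A server would need to turn that body into a typed value before spawning anything. The sketch below is one possible shape, with hypothetical class and function names. The `//` annotations in the example above are explanatory only and would not appear in a real JSON body. Treating an absent `pty` field as "no PTY" follows the comment in the example. The default size for an empty `pty` object is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PtySize:
    rows: int = 24  # assumed default when "pty" is present but empty
    cols: int = 80


@dataclass
class CreateProcessRequest:
    command: str
    args: List[str] = field(default_factory=list)
    cwd: Optional[str] = None
    env: Dict[str, str] = field(default_factory=dict)
    pty: Optional[PtySize] = None  # None means run without a PTY
    tag: Optional[str] = None
    label: Optional[str] = None


def parse_create_request(body: dict) -> CreateProcessRequest:
    """Validate a decoded JSON body and convert it to a CreateProcessRequest."""
    command = body.get("command")
    if not isinstance(command, str) or not command:
        raise ValueError("'command' must be a non-empty string")
    pty = body.get("pty")
    return CreateProcessRequest(
        command=command,
        args=list(body.get("args", [])),
        cwd=body.get("cwd"),
        env=dict(body.get("env", {})),
        pty=PtySize(**pty) if pty is not None else None,
        tag=body.get("tag"),
        label=body.get("label"),
    )
```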

Process Object

{
  "id": "proc_abc123",
  "tag": "main-terminal",
  "label": "Terminal 1",
  "command": "bash",
  "args": ["-i", "-l"],
  "cwd": "/workspace",
  "pid": 12345,
  "pty": true,
  "status": "running",             // "running" | "exited"
  "exit_code": null,               // Set when exited
  "created_at": "2025-01-15T...",
  "exited_at": null
}

OpenCode Compatibility Layer

The OpenCode compat layer maps to this system:

| OpenCode Endpoint | Maps To |
|---|---|
| POST /pty | POST /v1/processes (with pty field) |
| GET /pty | GET /v1/processes?pty=true |
| GET /pty/{id} | GET /v1/processes/{id} |
| PUT /pty/{id} | POST /v1/processes/{id}/resize + metadata update |
| DELETE /pty/{id} | DELETE /v1/processes/{id} |
| GET /pty/{id}/connect | GET /v1/processes/{id}/connect |
| POST /session/{id}/command | Create process + capture output into session |
| POST /session/{id}/shell | Create process (shell mode) + capture output into session |
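The PTY rows of the mapping are mechanical enough to express as a lookup. The sketch below is illustrative only: the function name is hypothetical, it passes the id through unchanged (a real layer might need to translate between ID formats), it ignores the metadata-update half of PUT /pty/{id}, and it leaves out the two session endpoints because they involve more than a path rewrite.

```python
import re

# (OpenCode method, OpenCode path pattern, target method, target path template)
_PTY_ROUTES = [
    ("POST", r"^/pty$", "POST", "/v1/processes"),
    ("GET", r"^/pty$", "GET", "/v1/processes?pty=true"),
    ("GET", r"^/pty/(?P<id>[^/]+)$", "GET", "/v1/processes/{id}"),
    ("PUT", r"^/pty/(?P<id>[^/]+)$", "POST", "/v1/processes/{id}/resize"),
    ("DELETE", r"^/pty/(?P<id>[^/]+)$", "DELETE", "/v1/processes/{id}"),
    ("GET", r"^/pty/(?P<id>[^/]+)/connect$", "GET", "/v1/processes/{id}/connect"),
]


def map_opencode_route(method: str, path: str):
    """Return the (method, path) on the process API for an OpenCode PTY route, or None."""
    for src_method, pattern, dst_method, template in _PTY_ROUTES:
        if method.upper() != src_method:
            continue
        match = re.match(pattern, path)
        if match:
            return dst_method, template.format(**match.groupdict())
    return None
```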

Open Questions

  1. Output buffering for reconnect: Should we buffer recent output (e.g., last 64KB) so reconnecting clients get some history? E2B doesn't do this, but it would improve UX for flaky connections.

  2. Process limits: Should there be a max number of concurrent processes? E2B doesn't expose one, but sandbox environments have limited resources.

  3. Auto-cleanup: Should processes be auto-cleaned after exiting? Options:

    • Keep forever until explicitly deleted
    • Auto-delete after N seconds/minutes
    • Keep metadata but release resources

  4. Input via REST vs WebSocket-only: The REST POST /v1/processes/{id}/input endpoint is useful for one-shot input (e.g., "send ctrl+c") without establishing a WebSocket. E2B has both SendInput (unary) and StreamInput (streaming) for this reason.

  5. Multiple WebSocket connections to same process: Should we allow multiple clients to connect to the same process simultaneously? (Pair programming, monitoring). E2B supports this via multiple Connect calls.
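If the answer to question 1 turns out to be yes, the buffer itself is small. The sketch below is a bounded byte buffer that keeps only the most recent output. The class name is hypothetical and the 64 KiB default simply mirrors the figure floated in the question.

```python
class OutputRingBuffer:
    """Keeps the most recent `capacity` bytes of process output."""

    def __init__(self, capacity: int = 64 * 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._buf = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append a chunk, discarding the oldest bytes beyond capacity."""
        self._buf += chunk
        overflow = len(self._buf) - self._capacity
        if overflow > 0:
            del self._buf[:overflow]

    def snapshot(self) -> bytes:
        """Return the buffered bytes, oldest first, for replay to a new client."""
        return bytes(self._buf)
```

One caveat for terminals: trimming at an arbitrary byte offset can cut through an escape sequence or a multi-byte character, so replayed history may start with a few garbled bytes.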

User-Initiated Command Injection ("Run command, give AI context")

A common pattern across agents: the user (or frontend) runs a command and the output is injected into the AI's conversation context. This is distinct from the agent running a command via its own tools.

| Agent | Feature | Mechanism | Protocol-level? |
|---|---|---|---|
| Claude Code | !command prefix in TUI | CLI runs command locally, injects output as user message | No - client-side hack, not in API schema |
| Codex | user_shell source | ExecCommandSource enum distinguishes agent vs user_shell vs unified_exec_* | Yes - first-class protocol event |
| OpenCode | /session/{id}/command | HTTP endpoint runs command, records result as AssistantMessage | Yes - HTTP API |
| Amp | N/A | Not supported | N/A |

Design implication for sandbox-agent: The process system should support an optional session_id field when creating a process. If provided, the process output is associated with that session so the agent can see it. If not provided, the process runs independently (like a PTY). This unifies:

  • User interactive terminals (no session association)
  • User-initiated commands for AI context (session association)
  • Agent-initiated background processes (session association)
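The optional association could be as simple as a router that records output only when a session id was supplied. This is a sketch of the idea under assumptions, not a proposed implementation: the class, its method, and the stored line format are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class SessionOutputRouter:
    """Records process output against a session only when one was given at creation."""

    def __init__(self) -> None:
        self.session_output: DefaultDict[str, List[str]] = defaultdict(list)

    def record(self, session_id: Optional[str], process_id: str, output: str) -> bool:
        """Return True if the output was attached to a session, False if it ran standalone."""
        if session_id is None:
            return False  # interactive terminal: nothing is added to any session
        self.session_output[session_id].append(f"[{process_id}] {output}")
        return True
```

Interactive terminals would pass `None`, while user-initiated and agent-initiated commands would pass the id of the session whose context should see the output.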

Sources