## Summary

Fix credential detection bugs and add credential availability status to the API. Consolidate Claude fallback models and add `sonnet` alias.

Builds on #109 (OAuth token support).

Related issues:
- Fixes #117 (Claude, Codex not showing up in gigacode)
- Related to #113 (Default agent should be Claude Code)

## Changes

### Credential detection fixes
- **`agent-credentials/src/lib.rs`**: Fix `?` operator bug in `extract_claude_credentials` - now continues to next config path if one is missing instead of returning early

### API credential status
- **`sandbox-agent/src/router.rs`**: Add `credentialsAvailable` field to `AgentInfo` struct
- **`/v1/agents`** endpoint now reports whether each agent has valid credentials

### OpenCode provider improvements
- **`sandbox-agent/src/opencode_compat.rs`**: Build `connected` array based on actual credential availability, not just model presence
- Check provider-specific credentials for OpenCode groups (e.g., `opencode:anthropic` only connected if Anthropic creds available)
- Add logging when credential extraction fails in model cache building

### Fallback model consolidation
- Renamed `claude_oauth_fallback_models()` → `claude_fallback_models()` (used for all fallback cases, not just OAuth)
- Added `sonnet` to fallback models (confirmed working via headless CLI test)
- Added `codex_fallback_models()` for Codex when credentials missing
- Added comment explaining aliases work for both API and OAuth users

### Documentation
- **`docs/credentials.mdx`**: New reference doc covering credential sources, extraction behavior, and error handling
- Documents that extraction failures are silent (not errors)
- Documents that agents spawn without credential pre-validation

### Inspector UI
- **`AgentsTab.tsx`**: Added credential status pill showing "Authenticated" or "No Credentials"

## Error Handling Philosophy

- **Extraction failures are silent**: Missing/malformed config files don't error, just continue to next source
- **Agents spawn without credential validation**: No pre-flight auth check; agent's native error surfaces if credentials are missing
- **Fallback models for UI**: When credentials missing, show alias-based models so users can still configure sessions

## Validation

- Tested Claude Code model aliases via headless CLI:
  - `claude --model default --print "say hi"` ✓
  - `claude --model sonnet --print "say hi"` ✓
  - `claude --model haiku --print "say hi"` ✓
- Build passes
- TypeScript types regenerated with `credentialsAvailable` field
Research: Process & Terminal System Design
Research on PTY/terminal and process management APIs across sandbox platforms, with design recommendations for sandbox-agent.
Competitive Landscape
Transport Comparison
| Platform | PTY Transport | Command Transport | Unified? |
|---|---|---|---|
| OpenCode | WebSocket (`/pty/{id}/connect`) | REST (session-scoped, AI-mediated) | No |
| E2B | gRPC server-stream (output) + unary RPC (input) | Same gRPC service | Yes |
| Daytona | WebSocket | REST | No |
| Kubernetes | WebSocket (channel byte mux) | Same WebSocket | Yes |
| Docker | HTTP connection hijack | Same connection | Yes |
| Fly.io | SSH over WireGuard | REST (sync, 60s max) | No |
| Vercel Sandboxes | No PTY API | REST SDK (async generator for logs) | N/A |
| Gitpod | gRPC (Listen=output, Write=input) | Same gRPC service | Yes |
Resize Mechanism
| Platform | How | Notes |
|---|---|---|
| OpenCode | `PUT /pty/{id}` with `size: {rows, cols}` | Separate REST call |
| E2B | Separate `Update` RPC | Separate gRPC call |
| Daytona | Separate HTTP POST | Sends SIGWINCH |
| Kubernetes | In-band WebSocket message (channel byte 4) | {"Width": N, "Height": N} |
| Docker | `POST /exec/{id}/resize?h=N&w=N` | Separate REST call |
| Gitpod | Separate `SetSize` RPC | Separate gRPC call |
Consensus: Almost all platforms use a separate call for resize. Only Kubernetes does it in-band. Since resize is a control signal (not data), a separate mechanism is cleaner.
I/O Multiplexing
I/O multiplexing is how platforms distinguish between stdout, stderr, and PTY data on a shared connection.
| Platform | Method | Detail |
|---|---|---|
| Docker | 8-byte binary header per frame | Byte 0 = stream type (0=stdin, 1=stdout, 2=stderr). When TTY=true, no mux (raw stream). |
| Kubernetes | 1-byte channel prefix per WebSocket message | 0=stdin, 1=stdout, 2=stderr, 3=error, 4=resize, 255=close |
| E2B | gRPC `oneof` in protobuf | `DataEvent.output` is `oneof { bytes stdout, bytes stderr, bytes pty }` |
| OpenCode | None | PTY is a unified stream. Commands capture stdout/stderr separately in response. |
| Daytona | None | PTY is unified. Commands return structured {stdout, stderr}. |
Key insight: When a process runs with a PTY allocated, its stdout and stderr both point at the same PTY device, so they arrive as a single merged stream. Multiplexing only matters for non-PTY command execution. OpenCode and Daytona handle this by keeping PTY (unified stream) and commands (structured response) as separate APIs.
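To make the non-PTY case concrete, here is a minimal sketch of demultiplexing Docker-style frames. The table above only states that byte 0 of the 8-byte header is the stream type; the rest of the layout used here (three padding bytes, then a 4-byte big-endian payload length) follows the Docker Engine API attach documentation. The function names `demux_docker_stream` and `frame` are illustrative, not part of any SDK.

```python
import struct
from typing import Iterator, Tuple

STREAM_NAMES = {0: "stdin", 1: "stdout", 2: "stderr"}


def demux_docker_stream(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (stream_name, payload) for each complete frame in buf."""
    offset = 0
    while offset + 8 <= len(buf):
        # Header: 1 byte stream type, 3 padding bytes, 4-byte big-endian length.
        stream_type, size = struct.unpack_from(">BxxxI", buf, offset)
        start = offset + 8
        end = start + size
        if end > len(buf):
            break  # incomplete frame; a real reader would wait for more bytes
        name = STREAM_NAMES.get(stream_type, f"unknown({stream_type})")
        yield name, buf[start:end]
        offset = end


def frame(stream_type: int, payload: bytes) -> bytes:
    """Build one frame (hypothetical helper, used only to create sample data)."""
    return struct.pack(">BxxxI", stream_type, len(payload)) + payload


sample = frame(1, b"hello\n") + frame(2, b"warning\n")
frames = list(demux_docker_stream(sample))
```

With a TTY attached none of this applies: the stream is raw bytes with no headers, which is why the PTY path needs no demultiplexer.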
Reconnection
| Platform | Method | Replays missed output? |
|---|---|---|
| E2B | `Connect` RPC by PID or tag | No - only new events from reconnect point |
| Daytona | New WebSocket to same PTY session | No |
| Kubernetes | Not supported (connection = session) | N/A |
| Docker | Not supported (connection = session) | N/A |
| OpenCode | `GET /pty/{id}/connect` (WebSocket) | Unknown (not documented) |
Process Identification
| Platform | ID Type | Notes |
|---|---|---|
| OpenCode | String (`pty_N`) | Pattern `^pty.*` |
| E2B | PID (uint32) or tag (string) | Dual selector |
| Daytona | Session ID / PID | |
| Docker | Exec ID (string, server-generated) | |
| Kubernetes | Connection-scoped | No ID - the WebSocket IS the process |
| Gitpod | Alias (string) | Human-readable |
Scoping
| Platform | PTY Scope | Command Scope |
|---|---|---|
| OpenCode | Server-wide (global) | Session-specific (AI-mediated) |
| E2B | Sandbox-wide | Sandbox-wide |
| Daytona | Sandbox-wide | Sandbox-wide |
| Docker | Container-scoped | Container-scoped |
| Kubernetes | Pod-scoped | Pod-scoped |
Key Questions & Analysis
Q: Should PTY transport be WebSocket?
Yes. WebSocket is the right choice for PTY I/O:
- Bidirectional: client sends keystrokes, server sends terminal output
- Low latency: no HTTP request overhead per keystroke
- Persistent connection: terminal sessions are long-lived
- Common precedent: OpenCode, Daytona, and Kubernetes all use WebSocket for PTY (E2B and Gitpod use gRPC streams instead)
Q: Should command transport be WebSocket or REST?
REST is sufficient for commands. WebSocket is not needed.
The distinction comes down to the nature of each operation:
- PTY: Long-lived, bidirectional, interactive. User types, terminal responds. Needs WebSocket.
- Commands: Request-response. Client says "run `ls -la`", server runs it, returns stdout/stderr/exit_code. This is a natural REST operation.
The "full duplex" question: commands don't need full duplex because:
- Input is sent once at invocation (the command string)
- Output is collected and returned when the process exits
- There's no ongoing interactive input during execution
For streaming output of long-running commands (e.g., npm install), there are two clean options:
- SSE: Server-Sent Events for output streaming (output-only, which is all you need)
- PTY: If the user needs to interact with the process (send ctrl+c, provide stdin), they should use a PTY instead
This matches how OpenCode separates the two: commands are REST, PTYs are WebSocket.
Recommendation: Keep commands as REST. If a command needs streaming output or interactive input, the user should create a PTY instead. This avoids building a second WebSocket protocol for a use case that PTYs already cover.
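If the SSE option for output-only streaming were ever adopted, the wire format is simple enough to sketch. The snippet below formats command output as Server-Sent Events messages. The event names `output` and `exit` and the payload fields are assumptions made for illustration; nothing in the existing specs defines them.

```python
import json
from typing import Iterable, Iterator, Tuple


def sse_event(event: str, data: dict) -> str:
    # One SSE message: an "event:" line, a "data:" line, then a blank line.
    # json.dumps escapes newlines, so the payload stays on a single data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_command_output(
    chunks: Iterable[Tuple[str, str]], exit_code: int
) -> Iterator[str]:
    """Turn (stream, text) chunks into SSE messages, ending with an exit event."""
    for stream, text in chunks:
        yield sse_event("output", {"stream": stream, "data": text})
    yield sse_event("exit", {"exit_code": exit_code})


events = list(
    stream_command_output(
        [("stdout", "added 120 packages\n"), ("stderr", "npm WARN deprecated\n")],
        exit_code=0,
    )
)
```

Because SSE is one-directional, this covers only output; anything that needs stdin or ctrl+c still belongs on a PTY.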
Q: Should resize be WebSocket in-band or separate POST?
Separate endpoint (PUT or POST).
Reasons:
- Resize is a control signal, not data. Mixing it into the data stream requires a framing protocol to distinguish resize messages from terminal input.
- OpenCode already defines `PUT /pty/{id}` with `size: {rows, cols}` - this is the existing spec.
- E2B, Daytona, Docker, and Gitpod all use separate calls.
- Only Kubernetes does in-band (because their channel-byte protocol already has a mux layer).
- A separate endpoint is simpler to implement, test, and debug.
Recommendation: Use PUT /pty/{id} with size field (matching OpenCode spec). Alternatively, a dedicated POST /pty/{id}/resize if we want to keep update and resize semantically separate.
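Whichever endpoint shape is chosen, the server-side work for a resize is small. The sketch below shows one way a handler could apply the new size on a POSIX host, using the standard `TIOCSWINSZ` ioctl on the PTY master. The helper names are hypothetical, and it assumes the handler already holds the master file descriptor.

```python
import fcntl
import os
import pty
import struct
import termios
from typing import Tuple


def set_pty_size(fd: int, rows: int, cols: int) -> None:
    # struct winsize: ws_row, ws_col, ws_xpixel, ws_ypixel (all unsigned short).
    winsize = struct.pack("HHHH", rows, cols, 0, 0)
    fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)


def get_pty_size(fd: int) -> Tuple[int, int]:
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


# Demonstration on a throwaway PTY pair (POSIX only).
master_fd, slave_fd = pty.openpty()
try:
    set_pty_size(master_fd, 40, 120)  # what a resize handler would do
    size = get_pty_size(slave_fd)     # what the program in the terminal sees
finally:
    os.close(master_fd)
    os.close(slave_fd)
```

The kernel notifies the foreground process group with SIGWINCH when the size changes, so the handler does not need to signal the process itself.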
Q: What is I/O multiplexing?
I/O multiplexing is the mechanism for distinguishing between different data streams (stdout, stderr, stdin, control signals) on a single connection.
When it matters: Non-PTY command execution where stdout and stderr need to be kept separate.
When it doesn't matter: PTY sessions. When a PTY is allocated, the process's stdout and stderr both refer to the PTY slave, so their output arrives as a single stream on the PTY master fd. There is only one output stream. This is why terminals show stdout and stderr interleaved - the PTY doesn't distinguish them.
For sandbox-agent: Since PTYs are unified streams and commands use REST (separate stdout/stderr in the JSON response), we don't need a multiplexing protocol. The API design naturally separates the two cases.
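The claim that a PTY yields a single output stream can be checked with a short experiment. This is a POSIX-only sketch using the standard library; `run_in_pty` is a hypothetical helper, not part of sandbox-agent.

```python
import os
import pty
import subprocess
import sys
from typing import List


def run_in_pty(argv: List[str]) -> bytes:
    """Run argv with stdin/stdout/stderr on one PTY and return the master output."""
    master_fd, slave_fd = pty.openpty()
    # The child's stdout and stderr both point at the same PTY slave device.
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the parent keeps only the master side
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break  # Linux reports EIO once the slave side is fully closed
        if not data:
            break  # other platforms may signal EOF with an empty read
        chunks.append(data)
    os.close(master_fd)
    proc.wait()
    return b"".join(chunks)


child = (
    "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
    "sys.stderr.write('err\\n')"
)
output = run_in_pty([sys.executable, "-c", child])
```

Both strings come back on the one master descriptor, with nothing to tell which came from stderr.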
Q: How should reconnect work?
Reconnect is an application-level concept, not just HTTP/WebSocket reconnection.
The distinction:
- HTTP/WebSocket reconnect: The transport-level connection drops and is re-established. This is handled by the client library automatically (retry logic, exponential backoff). The server doesn't need to know.
- Process reconnect: The client disconnects from a running process but the process keeps running. Later, the client (or a different client) connects to the same process and starts receiving output again.
E2B's model: Disconnecting a stream (via AbortController) leaves the process running. Connect RPC by PID or tag re-establishes the output stream. Missed output during disconnection is lost. This works because:
- Processes are long-lived (servers, shells)
- For terminals, the screen state can be recovered by the shell/application redrawing
- For commands, if you care about all output, don't disconnect
Recommendation for sandbox-agent: Reconnect should be supported at the application level:
- `GET /pty/{id}/connect` (WebSocket) can be called multiple times for the same PTY
- If the WebSocket drops, the PTY process keeps running
- Client reconnects by opening a new WebSocket to the same endpoint
- No output replay (too complex, rarely needed - terminal apps redraw on reconnect via SIGWINCH)
- This is essentially what OpenCode's `/pty/{id}/connect` endpoint already implies
This naturally leads to the persistent process system concept (see below).
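The reconnect semantics described above (the process outlives the connection, and output is not replayed) can be modeled in a few lines. This is an in-memory sketch with hypothetical names; a real implementation would read from the PTY master and write to WebSocket connections instead of calling plain callbacks.

```python
from typing import Callable, Dict, List


class ProcessHandle:
    """Stand-in for a PTY process whose lifetime is independent of its clients."""

    def __init__(self, proc_id: str) -> None:
        self.id = proc_id
        self.running = True
        self._subscribers: Dict[int, Callable[[bytes], None]] = {}
        self._next_conn = 0

    def connect(self, on_output: Callable[[bytes], None]) -> int:
        conn_id = self._next_conn
        self._next_conn += 1
        self._subscribers[conn_id] = on_output
        return conn_id

    def disconnect(self, conn_id: int) -> None:
        # Dropping a connection never touches the process itself.
        self._subscribers.pop(conn_id, None)

    def emit(self, data: bytes) -> None:
        # Called when the process produces output. With no subscribers
        # attached the bytes are simply dropped (no replay).
        for callback in list(self._subscribers.values()):
            callback(data)


proc = ProcessHandle("pty_1")
first: List[bytes] = []
second: List[bytes] = []

c1 = proc.connect(first.append)
proc.emit(b"$ make\n")
proc.disconnect(c1)            # the WebSocket drops
proc.emit(b"compiling...\n")   # nobody attached: this output is lost
c2 = proc.connect(second.append)
proc.emit(b"done\n")
```

After the reconnect, `second` contains only what was emitted from that point on, and `proc.running` is still true throughout.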
Q: How are PTY events different from PTY transport?
Two completely separate channels serving different purposes:
PTY Events (via SSE on /event or /sessions/{id}/events/sse):
- Lifecycle notifications: `pty.created`, `pty.updated`, `pty.exited`, `pty.deleted`
- Lightweight JSON metadata (PTY id, status, exit code)
- Broadcast to all subscribers
- Used by UIs to update PTY lists, show status indicators, handle cleanup
PTY Transport (via WebSocket on /pty/{id}/connect):
- Raw terminal I/O: binary input/output bytes
- High-frequency, high-bandwidth
- Point-to-point (one client connected to one PTY)
- Used by terminal emulators (xterm.js) to render the terminal
Analogy: Events are like email notifications ("a new terminal was opened"). Transport is like the phone call (the actual terminal session).
Q: How are PTY and commands different in OpenCode?
They serve fundamentally different purposes:
PTY (/pty/*) - Direct execution environment:
- Server-scoped (not tied to any AI session)
- Creates a real terminal process
- User interacts directly via WebSocket
- Not part of the AI conversation
- Think: "the terminal panel in VS Code"
Commands (/session/{sessionID}/command, /session/{sessionID}/shell) - AI-mediated execution:
- Session-scoped (tied to an AI session)
- The command is sent to the AI assistant for execution
- Creates an `AssistantMessage` in the session's conversation history
- Output becomes part of the AI's context
- Think: "asking Claude to run a command as a tool call"
Why commands are session-specific: Because they're AI operations, not direct execution. When you call POST /session/{id}/command, the server:
- Creates an assistant message in the session
- Runs the command
- Captures output as message parts
- Emits `message.part.updated` events
- The AI can see this output in subsequent turns
This is how the AI "uses terminal tools" - the command infrastructure provides the bridge between the AI session and system execution.
Q: Should scoping be system-wide?
Yes, for both PTY and commands.
Current OpenCode behavior:
- PTYs: Already server-wide (global)
- Commands: Session-scoped (for AI context injection)
For sandbox-agent, since we're the orchestration layer (not the AI):
- PTYs: System-wide. Any client should be able to list, connect to, or manage any PTY.
- Commands/processes: System-wide. Process execution is a system primitive, not an AI primitive. If a caller wants to associate a process with a session, they can do so at their layer.
The session-scoping of commands in OpenCode is an OpenCode-specific concern (AI context injection). Sandbox-agent should provide the lower-level primitive (system-wide process execution) and let the OpenCode compat layer handle the session association.
Persistent Process System
The Concept
A persistent process system means:
- Spawn a process (PTY or command) via API
- Process runs independently of any client connection
- Connect/disconnect to the process I/O at will
- Process continues running through disconnections
- Query process status, list running processes
- Kill/signal processes explicitly
This is distinct from the typical "connection = process lifetime" model (Kubernetes, Docker exec) where closing the connection kills the process.
How E2B Does It
E2B's Process service is the best reference implementation:
Start(cmd, pty?) → stream of events (output)
Connect(pid/tag) → stream of events (reconnect)
SendInput(pid, data) → ok
Update(pid, size) → ok (resize)
SendSignal(pid, signal) → ok
List() → running processes
Key design choices:
- Unified service: PTY and command are the same service, differentiated by the `pty` field in `StartRequest`
- Process outlives connection: Disconnecting the output stream (aborting the `Start`/`Connect` RPC) does NOT kill the process
- Explicit termination: Must call `SendSignal(SIGKILL)` to stop a process
- Tag-based selection: Processes can be tagged at creation for later lookup without knowing the PID
Recommendation for Sandbox-Agent
Sandbox-agent should implement a persistent process manager that:
- Is system-wide (not session-scoped)
- Supports both PTY and non-PTY modes
- Decouples process lifetime from connection lifetime
- Exposes via both REST (lifecycle) and WebSocket (I/O)
Proposed API Surface
Process Lifecycle (REST):
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/processes` | Create/spawn a process (PTY or command) |
| `GET` | `/v1/processes` | List all processes |
| `GET` | `/v1/processes/{id}` | Get process info (status, pid, exit code) |
| `DELETE` | `/v1/processes/{id}` | Kill process (SIGTERM, then SIGKILL) |
| `POST` | `/v1/processes/{id}/signal` | Send signal (SIGTERM, SIGKILL, SIGINT, etc.) |
| `POST` | `/v1/processes/{id}/resize` | Resize PTY (rows, cols) |
| `POST` | `/v1/processes/{id}/input` | Send stdin/pty input (REST fallback) |
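The "SIGTERM, then SIGKILL" behavior listed for `DELETE` could look roughly like the sketch below. The grace period of 5 seconds and the function name `stop_process` are assumptions; the table does not specify a timeout.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


# A child that would otherwise run for a minute.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_status = stop_process(sleeper, grace_seconds=5.0)
```

On POSIX, `subprocess` reports death by signal as a negative return code, so a process that honored SIGTERM yields `-signal.SIGTERM`. That value is a natural candidate for the `exit_code` carried by a `process.exited` event, though how to encode signals there is still a design decision.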
Process I/O (WebSocket):
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v1/processes/{id}/connect` | WebSocket for bidirectional I/O |
Process Events (SSE):
| Event | Description |
|---|---|
| `process.created` | Process spawned |
| `process.updated` | Process metadata changed |
| `process.exited` | Process terminated (includes exit code) |
| `process.deleted` | Process record removed |
Create Request
{
"command": "bash",
"args": ["-i", "-l"],
"cwd": "/workspace",
"env": {"TERM": "xterm-256color"},
"pty": { // Optional - if present, allocate PTY
"rows": 24,
"cols": 80
},
"tag": "main-terminal", // Optional - for lookup by name
"label": "Terminal 1" // Optional - display name
}
Process Object
{
"id": "proc_abc123",
"tag": "main-terminal",
"label": "Terminal 1",
"command": "bash",
"args": ["-i", "-l"],
"cwd": "/workspace",
"pid": 12345,
"pty": true,
"status": "running", // "running" | "exited"
"exit_code": null, // Set when exited
"created_at": "2025-01-15T...",
"exited_at": null
}
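As a rough illustration of how the create request and the process object relate, the sketch below parses a request body and derives the initial process record. The class and function names are hypothetical, the validation is deliberately minimal, and the timestamp fields are omitted.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PtySize:
    rows: int = 24
    cols: int = 80


@dataclass
class CreateProcessRequest:
    command: str
    args: List[str] = field(default_factory=list)
    cwd: Optional[str] = None
    env: Dict[str, str] = field(default_factory=dict)
    pty: Optional[PtySize] = None  # presence of this field means "allocate a PTY"
    tag: Optional[str] = None
    label: Optional[str] = None


def parse_create_request(body: str) -> CreateProcessRequest:
    raw = json.loads(body)
    if not isinstance(raw.get("command"), str) or not raw["command"]:
        raise ValueError("'command' is required")
    pty = PtySize(**raw["pty"]) if raw.get("pty") is not None else None
    return CreateProcessRequest(
        command=raw["command"],
        args=list(raw.get("args", [])),
        cwd=raw.get("cwd"),
        env=dict(raw.get("env", {})),
        pty=pty,
        tag=raw.get("tag"),
        label=raw.get("label"),
    )


def to_process_object(req: CreateProcessRequest, proc_id: str, pid: int) -> Dict[str, Any]:
    # created_at / exited_at are left out to keep the sketch short.
    return {
        "id": proc_id, "tag": req.tag, "label": req.label,
        "command": req.command, "args": req.args, "cwd": req.cwd,
        "pid": pid, "pty": req.pty is not None,
        "status": "running", "exit_code": None,
    }


body = '{"command": "bash", "args": ["-i", "-l"], "pty": {"rows": 24, "cols": 80}, "tag": "main-terminal"}'
request = parse_create_request(body)
process = to_process_object(request, proc_id="proc_abc123", pid=12345)
```

Note that the request carries `pty` as an object with a size while the process object reports it as a boolean, matching the two JSON shapes above.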
OpenCode Compatibility Layer
The OpenCode compat layer maps to this system:
| OpenCode Endpoint | Maps To |
|---|---|
| `POST /pty` | `POST /v1/processes` (with `pty` field) |
| `GET /pty` | `GET /v1/processes?pty=true` |
| `GET /pty/{id}` | `GET /v1/processes/{id}` |
| `PUT /pty/{id}` | `POST /v1/processes/{id}/resize` + metadata update |
| `DELETE /pty/{id}` | `DELETE /v1/processes/{id}` |
| `GET /pty/{id}/connect` | `GET /v1/processes/{id}/connect` |
| `POST /session/{id}/command` | Create process + capture output into session |
| `POST /session/{id}/shell` | Create process (shell mode) + capture output into session |
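The PTY rows of this mapping are mechanical enough to express as a lookup table. The sketch below is one hypothetical way to do it; it covers only the path translation, leaving out the metadata update that accompanies `PUT /pty/{id}` and the session-capturing command endpoints.

```python
import re
from typing import Optional, Tuple

# (OpenCode method, OpenCode path pattern, target method, target path template)
ROUTES = [
    ("POST", r"^/pty$", "POST", "/v1/processes"),
    ("GET", r"^/pty$", "GET", "/v1/processes?pty=true"),
    ("GET", r"^/pty/(?P<id>[^/]+)$", "GET", "/v1/processes/{id}"),
    ("PUT", r"^/pty/(?P<id>[^/]+)$", "POST", "/v1/processes/{id}/resize"),
    ("DELETE", r"^/pty/(?P<id>[^/]+)$", "DELETE", "/v1/processes/{id}"),
    ("GET", r"^/pty/(?P<id>[^/]+)/connect$", "GET", "/v1/processes/{id}/connect"),
]


def translate(method: str, path: str) -> Optional[Tuple[str, str]]:
    """Map an OpenCode PTY request to the equivalent /v1/processes request."""
    for src_method, pattern, dst_method, template in ROUTES:
        match = re.match(pattern, path)
        if src_method == method and match:
            return dst_method, template.format(**match.groupdict())
    return None  # not a PTY route; handled elsewhere


resize_target = translate("PUT", "/pty/pty_1")
list_target = translate("GET", "/pty")
```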
Open Questions
- Output buffering for reconnect: Should we buffer recent output (e.g., last 64KB) so reconnecting clients get some history? E2B doesn't do this, but it would improve UX for flaky connections.
- Process limits: Should there be a max number of concurrent processes? E2B doesn't expose one, but sandbox environments have limited resources.
- Auto-cleanup: Should processes be auto-cleaned after exiting? Options:
  - Keep forever until explicitly deleted
  - Auto-delete after N seconds/minutes
  - Keep metadata but release resources
- Input via REST vs WebSocket-only: The REST `POST /v1/processes/{id}/input` endpoint is useful for one-shot input (e.g., "send ctrl+c") without establishing a WebSocket. E2B has both `SendInput` (unary) and `StreamInput` (streaming) for this reason.
- Multiple WebSocket connections to same process: Should we allow multiple clients to connect to the same process simultaneously (pair programming, monitoring)? E2B supports this via multiple `Connect` calls.
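On the output-buffering question, the cost of a bounded history is low. A minimal sketch of a byte-capped replay buffer follows, assuming the 64KB figure floated above as the default; the class name is hypothetical.

```python
class ReplayBuffer:
    """Keeps only the most recent max_bytes of process output."""

    def __init__(self, max_bytes: int = 64 * 1024) -> None:
        self.max_bytes = max_bytes
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        self._data += chunk
        overflow = len(self._data) - self.max_bytes
        if overflow > 0:
            del self._data[:overflow]  # discard the oldest bytes

    def snapshot(self) -> bytes:
        """Bytes to send to a client that has just (re)connected."""
        return bytes(self._data)


buf = ReplayBuffer(max_bytes=8)
buf.append(b"hello ")
buf.append(b"world")
history = buf.snapshot()
```

Truncating at an arbitrary byte offset can cut a UTF-8 character or an escape sequence in half, so a production version would likely need to trim at a safe boundary.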
User-Initiated Command Injection ("Run command, give AI context")
A common pattern across agents: the user (or frontend) runs a command and the output is injected into the AI's conversation context. This is distinct from the agent running a command via its own tools.
| Agent | Feature | Mechanism | Protocol-level? |
|---|---|---|---|
| Claude Code | `!command` prefix in TUI | CLI runs command locally, injects output as user message | No - client-side hack, not in API schema |
| Codex | `user_shell` source | `ExecCommandSource` enum distinguishes `agent` vs `user_shell` vs `unified_exec_*` | Yes - first-class protocol event |
| OpenCode | `/session/{id}/command` | HTTP endpoint runs command, records result as `AssistantMessage` | Yes - HTTP API |
| Amp | N/A | Not supported | N/A |
Design implication for sandbox-agent: The process system should support an optional session_id field when creating a process. If provided, the process output is associated with that session so the agent can see it. If not provided, the process runs independently (like a PTY). This unifies:
- User interactive terminals (no session association)
- User-initiated commands for AI context (session association)
- Agent-initiated background processes (session association)
Sources
- E2B Process Proto - `process.proto` gRPC service definition
- E2B JS SDK - `commands/pty.ts`, `commands/index.ts`
- Daytona SDK - REST + WebSocket PTY API
- Kubernetes RemoteCommand - WebSocket subprotocol
- Docker Engine API - Exec API with stream multiplexing
- Fly.io Machines API - REST exec with 60s limit
- Gitpod terminal.proto - gRPC terminal service
- OpenCode OpenAPI Spec - PTY and session command endpoints