mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 09:01:17 +00:00

fix: credential detection and provider auth status (#120)

## Summary

Fix credential detection bugs and add credential availability status to the API. Consolidate Claude fallback models and add `sonnet` alias.

Builds on #109 (OAuth token support).

Related issues:
- Fixes #117 (Claude, Codex not showing up in gigacode)
- Related to #113 (Default agent should be Claude Code)

## Changes

### Credential detection fixes
- **`agent-credentials/src/lib.rs`**: Fix `?` operator bug in `extract_claude_credentials` - now continues to next config path if one is missing instead of returning early

### API credential status
- **`sandbox-agent/src/router.rs`**: Add `credentialsAvailable` field to `AgentInfo` struct
- **`/v1/agents`** endpoint now reports whether each agent has valid credentials

### OpenCode provider improvements
- **`sandbox-agent/src/opencode_compat.rs`**: Build `connected` array based on actual credential availability, not just model presence
- Check provider-specific credentials for OpenCode groups (e.g., `opencode:anthropic` only connected if Anthropic creds available)
- Add logging when credential extraction fails in model cache building

### Fallback model consolidation
- Renamed `claude_oauth_fallback_models()` → `claude_fallback_models()` (used for all fallback cases, not just OAuth)
- Added `sonnet` to fallback models (confirmed working via headless CLI test)
- Added `codex_fallback_models()` for Codex when credentials missing
- Added comment explaining aliases work for both API and OAuth users

### Documentation
- **`docs/credentials.mdx`**: New reference doc covering credential sources, extraction behavior, and error handling
- Documents that extraction failures are silent (not errors)
- Documents that agents spawn without credential pre-validation

### Inspector UI
- **`AgentsTab.tsx`**: Added credential status pill showing "Authenticated" or "No Credentials"

## Error Handling Philosophy

- **Extraction failures are silent**: Missing/malformed config files don't error, just continue to next source
- **Agents spawn without credential validation**: No pre-flight auth check; agent's native error surfaces if credentials are missing
- **Fallback models for UI**: When credentials missing, show alias-based models so users can still configure sessions

## Validation

- Tested Claude Code model aliases via headless CLI:
  - `claude --model default --print "say hi"` ✓
  - `claude --model sonnet --print "say hi"` ✓
  - `claude --model haiku --print "say hi"` ✓
- Build passes
- TypeScript types regenerated with `credentialsAvailable` field

2026-02-07 07:56:06 +00:00


Research: Process & Terminal System Design

Research on PTY/terminal and process management APIs across sandbox platforms, with design recommendations for sandbox-agent.

Competitive Landscape

Transport Comparison

| Platform | PTY Transport | Command Transport | Unified? |
| --- | --- | --- | --- |
| OpenCode | WebSocket (`/pty/{id}/connect`) | REST (session-scoped, AI-mediated) | No |
| E2B | gRPC server-stream (output) + unary RPC (input) | Same gRPC service | Yes |
| Daytona | WebSocket | REST | No |
| Kubernetes | WebSocket (channel byte mux) | Same WebSocket | Yes |
| Docker | HTTP connection hijack | Same connection | Yes |
| Fly.io | SSH over WireGuard | REST (sync, 60s max) | No |
| Vercel Sandboxes | No PTY API | REST SDK (async generator for logs) | N/A |
| Gitpod | gRPC (Listen=output, Write=input) | Same gRPC service | Yes |

Resize Mechanism

| Platform | How | Notes |
| --- | --- | --- |
| OpenCode | `PUT /pty/{id}` with `size: {rows, cols}` | Separate REST call |
| E2B | Separate `Update` RPC | Separate gRPC call |
| Daytona | Separate HTTP POST | Sends SIGWINCH |
| Kubernetes | In-band WebSocket message (channel byte 4) | `{"Width": N, "Height": N}` |
| Docker | `POST /exec/{id}/resize?h=N&w=N` | Separate REST call |
| Gitpod | Separate `SetSize` RPC | Separate gRPC call |

Consensus: Almost all platforms use a separate call for resize. Only Kubernetes does it in-band. Since resize is a control signal (not data), a separate mechanism is cleaner.
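To make the in-band case concrete, here is a minimal sketch of how a client might build the Kubernetes-style resize message from the table: one channel byte (4) followed by the JSON payload. The function names are hypothetical and exist only for illustration; this is not code from any of the platforms above.

```python
import json

RESIZE_CHANNEL = 4  # channel byte used for resize, per the table above


def encode_resize_frame(cols: int, rows: int) -> bytes:
    """Build one binary WebSocket message: channel byte + JSON payload."""
    payload = json.dumps({"Width": cols, "Height": rows}).encode("utf-8")
    return bytes([RESIZE_CHANNEL]) + payload


def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Split a received message into (channel, payload)."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0], frame[1:]
```

Every message on such a connection has to carry the channel byte, including plain terminal input. That framing cost is what a separate resize endpoint avoids.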

I/O Multiplexing

I/O multiplexing is how platforms distinguish between stdout, stderr, and PTY data on a shared connection.

| Platform | Method | Detail |
| --- | --- | --- |
| Docker | 8-byte binary header per frame | Byte 0 = stream type (0=stdin, 1=stdout, 2=stderr). When TTY=true, no mux (raw stream). |
| Kubernetes | 1-byte channel prefix per WebSocket message | 0=stdin, 1=stdout, 2=stderr, 3=error, 4=resize, 255=close |
| E2B | gRPC `oneof` in protobuf | `DataEvent.output` is `oneof { bytes stdout, bytes stderr, bytes pty }` |
| OpenCode | None | PTY is a unified stream. Commands capture stdout/stderr separately in response. |
| Daytona | None | PTY is unified. Commands return structured `{stdout, stderr}`. |

Key insight: When a process runs with a PTY allocated, its stdout and stderr normally both point at the PTY slave device, so a reader on the master side sees one merged stream. Multiplexing therefore only matters for non-PTY command execution. OpenCode and Daytona handle this by keeping PTY (unified stream) and commands (structured response) as separate APIs.
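As an illustration of what a multiplexing layer involves for non-PTY output, the sketch below splits a Docker-style stream into frames. It relies on the header layout described in the Docker Engine API documentation (byte 0 is the stream type, bytes 4-7 are the payload length as a big-endian unsigned 32-bit integer). The function name is hypothetical.

```python
import struct

STREAM_NAMES = {0: "stdin", 1: "stdout", 2: "stderr"}


def demux_docker_stream(buf: bytes) -> list[tuple[str, bytes]]:
    """Split a non-TTY Docker attach/exec byte stream into (stream, payload) frames."""
    frames = []
    offset = 0
    while offset + 8 <= len(buf):
        # Header: 1 byte stream type, 3 padding bytes, 4-byte big-endian length.
        stream_type, length = struct.unpack_from(">BxxxI", buf, offset)
        start = offset + 8
        end = start + length
        if end > len(buf):
            break  # incomplete frame; a real reader would wait for more bytes
        frames.append((STREAM_NAMES.get(stream_type, "unknown"), buf[start:end]))
        offset = end
    return frames
```

None of this is needed when the PTY and command APIs are kept separate, which is the design the rest of this document argues for.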

Reconnection

| Platform | Method | Replays missed output? |
| --- | --- | --- |
| E2B | `Connect` RPC by PID or tag | No - only new events from reconnect point |
| Daytona | New WebSocket to same PTY session | No |
| Kubernetes | Not supported (connection = session) | N/A |
| Docker | Not supported (connection = session) | N/A |
| OpenCode | `GET /pty/{id}/connect` (WebSocket) | Unknown (not documented) |

Process Identification

| Platform | ID Type | Notes |
| --- | --- | --- |
| OpenCode | String (`pty_N`) | Pattern `^pty.*` |
| E2B | PID (uint32) or tag (string) | Dual selector |
| Daytona | Session ID / PID | |
| Docker | Exec ID (string, server-generated) | |
| Kubernetes | Connection-scoped | No ID - the WebSocket IS the process |
| Gitpod | Alias (string) | Human-readable |

Scoping

| Platform | PTY Scope | Command Scope |
| --- | --- | --- |
| OpenCode | Server-wide (global) | Session-specific (AI-mediated) |
| E2B | Sandbox-wide | Sandbox-wide |
| Daytona | Sandbox-wide | Sandbox-wide |
| Docker | Container-scoped | Container-scoped |
| Kubernetes | Pod-scoped | Pod-scoped |

Key Questions & Analysis

Q: Should PTY transport be WebSocket?

Yes. WebSocket is the right choice for PTY I/O:

- Bidirectional: client sends keystrokes, server sends terminal output
- Low latency: no HTTP request overhead per keystroke
- Persistent connection: terminal sessions are long-lived
- Common practice: OpenCode, Daytona, and Kubernetes all use WebSocket for PTY (E2B and Gitpod use gRPC streams instead)

Q: Should command transport be WebSocket or REST?

REST is sufficient for commands. WebSocket is not needed.

The distinction comes down to the nature of each operation:

- PTY: Long-lived, bidirectional, interactive. User types, terminal responds. Needs WebSocket.
- Commands: Request-response. Client says "run `ls -la`", server runs it, returns stdout/stderr/exit_code. This is a natural REST operation.

The "full duplex" question: commands don't need full duplex because:

- Input is sent once at invocation (the command string)
- Output is collected and returned when the process exits
- There's no ongoing interactive input during execution

For streaming output of long-running commands (e.g., npm install), there are two clean options:

- SSE: Server-Sent Events for output streaming (output-only, which is all you need)
- PTY: If the user needs to interact with the process (send ctrl+c, provide stdin), they should use a PTY instead

This matches how OpenCode separates the two: commands are REST, PTYs are WebSocket.

Recommendation: Keep commands as REST. If a command needs streaming output or interactive input, the user should create a PTY instead. This avoids building a second WebSocket protocol for a use case that PTYs already cover.
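The request-response shape described above can be sketched in a few lines. This is an illustrative handler body only, assuming the response fields `stdout`, `stderr`, and `exit_code` mentioned earlier; the function name and the `timeout` default are hypothetical and not part of any existing sandbox-agent API.

```python
import subprocess
import sys
from typing import Optional


def run_command(argv: list[str], cwd: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Run a command to completion and return a JSON-serializable result."""
    completed = subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }


if __name__ == "__main__":
    # Input is sent once, output comes back once: no duplex channel required.
    print(run_command([sys.executable, "-c", "print('hi')"]))
```

Because stdout and stderr are captured through separate pipes, the response keeps them apart without any framing protocol.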

Q: Should resize be WebSocket in-band or separate POST?

Separate endpoint (PUT or POST).

Reasons:

- Resize is a control signal, not data. Mixing it into the data stream requires a framing protocol to distinguish resize messages from terminal input.
- OpenCode already defines `PUT /pty/{id}` with `size: {rows, cols}` - this is the existing spec.
- E2B, Daytona, Docker, and Gitpod all use separate calls.
- Only Kubernetes does in-band (because their channel-byte protocol already has a mux layer).
- A separate endpoint is simpler to implement, test, and debug.

Recommendation: Use `PUT /pty/{id}` with a `size` field (matching the OpenCode spec). Alternatively, use a dedicated `POST /pty/{id}/resize` if we want to keep update and resize semantically separate.
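Whichever endpoint shape is chosen, on a POSIX host the handler usually ends in a window-size ioctl on the PTY file descriptor. The following is a minimal sketch assuming a Linux or other POSIX system; the helper names are hypothetical.

```python
import fcntl
import os
import pty
import struct
import termios


def set_pty_size(fd: int, rows: int, cols: int) -> None:
    """Set the PTY window size; a size change makes the kernel signal SIGWINCH."""
    # struct winsize: ws_row, ws_col, ws_xpixel, ws_ypixel (unsigned shorts)
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_pty_size(fd: int) -> tuple[int, int]:
    """Read back (rows, cols) for the PTY."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


if __name__ == "__main__":
    master_fd, slave_fd = pty.openpty()
    set_pty_size(master_fd, 40, 120)
    print(get_pty_size(master_fd))
    os.close(master_fd)
    os.close(slave_fd)
```

The resize handler needs nothing from the I/O WebSocket, which is part of why a separate call is simple to implement and test.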

Q: What is I/O multiplexing?

I/O multiplexing is the mechanism for distinguishing between different data streams (stdout, stderr, stdin, control signals) on a single connection.

When it matters: Non-PTY command execution where stdout and stderr need to be kept separate.

When it doesn't matter: PTY sessions. When a PTY is allocated, the child's stdout and stderr both refer to the PTY slave device, so everything it writes arrives interleaved on a single stream (the PTY master fd). There is only one output stream. This is why terminals show stdout and stderr interleaved - the PTY doesn't distinguish them.

For sandbox-agent: Since PTYs are unified streams and commands use REST (separate stdout/stderr in the JSON response), we don't need a multiplexing protocol. The API design naturally separates the two cases.
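The unified-stream behavior can be observed directly. The sketch below, assuming a POSIX host, starts a child whose stdin, stdout, and stderr are all the PTY slave and reads everything back from the master. The function name is hypothetical.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(argv: list[str]) -> bytes:
    """Run argv attached to a fresh PTY and return everything read from the master."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the parent keeps only the master side
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break  # Linux reports EIO once the slave side is fully closed
        if not data:
            break  # other platforms report EOF as an empty read
        chunks.append(data)
    os.close(master_fd)
    proc.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    child = "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); sys.stderr.write('err\\n')"
    print(run_in_pty([sys.executable, "-c", child]))
```

Both writes come back on the same descriptor with no marker saying which stream produced them, so there is nothing left to demultiplex.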

Q: How should reconnect work?

Reconnect is an application-level concept, not just HTTP/WebSocket reconnection.

The distinction:

- HTTP/WebSocket reconnect: The transport-level connection drops and is re-established. This is handled by the client library automatically (retry logic, exponential backoff). The server doesn't need to know.
- Process reconnect: The client disconnects from a running process but the process keeps running. Later, the client (or a different client) connects to the same process and starts receiving output again.

E2B's model: Disconnecting a stream (via `AbortController`) leaves the process running. The `Connect` RPC by PID or tag re-establishes the output stream. Missed output during disconnection is lost. This works because:

- Processes are long-lived (servers, shells)
- For terminals, the screen state can be recovered by the shell/application redrawing
- For commands, if you care about all output, don't disconnect

Recommendation for sandbox-agent: Reconnect should be supported at the application level:

- `GET /pty/{id}/connect` (WebSocket) can be called multiple times for the same PTY
- If the WebSocket drops, the PTY process keeps running
- The client reconnects by opening a new WebSocket to the same endpoint
- No output replay (too complex, rarely needed - full-screen terminal apps redraw on SIGWINCH, which is delivered when a resize sent after reconnecting changes the window size)
- This is essentially what OpenCode's `/pty/{id}/connect` endpoint already implies
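The attach/detach behavior recommended above can be sketched as a small fan-out object that sits between the PTY reader loop and whatever WebSocket connections are currently open. The class and method names are hypothetical. The sketch deliberately drops output while no client is attached, matching the "no replay" choice.

```python
from typing import Callable

Subscriber = Callable[[bytes], None]


class PtyOutputHub:
    """Fans PTY output out to whichever clients are attached right now (no replay)."""

    def __init__(self) -> None:
        self._subscribers: dict[int, Subscriber] = {}
        self._next_id = 0

    def attach(self, on_output: Subscriber) -> int:
        """Register a client callback; returns an id used to detach later."""
        self._next_id += 1
        self._subscribers[self._next_id] = on_output
        return self._next_id

    def detach(self, subscriber_id: int) -> None:
        """Remove a client; the process itself is unaffected."""
        self._subscribers.pop(subscriber_id, None)

    def publish(self, data: bytes) -> None:
        """Called by the PTY reader loop; output with no attached client is dropped."""
        for on_output in list(self._subscribers.values()):
            on_output(data)
```

The hub never touches the process, so a dropped WebSocket only removes a subscriber. The same structure would also allow several clients to watch one PTY at once.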

This naturally leads to the persistent process system concept (see below).

Q: How are PTY events different from PTY transport?

Two completely separate channels serving different purposes:

PTY Events (via SSE on `/event` or `/sessions/{id}/events/sse`):

- Lifecycle notifications: `pty.created`, `pty.updated`, `pty.exited`, `pty.deleted`
- Lightweight JSON metadata (PTY id, status, exit code)
- Broadcast to all subscribers
- Used by UIs to update PTY lists, show status indicators, handle cleanup

PTY Transport (via WebSocket on `/pty/{id}/connect`):

- Raw terminal I/O: binary input/output bytes
- High-frequency, high-bandwidth
- Point-to-point (one client connected to one PTY)
- Used by terminal emulators (xterm.js) to render the terminal

Analogy: Events are like email notifications ("a new terminal was opened"). Transport is like the phone call (the actual terminal session).
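To show how lightweight the event channel is compared with the byte stream, here is a sketch that serializes one lifecycle notification in the standard Server-Sent Events wire format. The payload fields (`id`, `exit_code`) and the function name are assumptions for illustration; the actual OpenCode event envelope may differ.

```python
import json


def format_sse_event(event_type: str, payload: dict) -> str:
    """Serialize one lifecycle event in Server-Sent Events wire format."""
    data = json.dumps(payload, separators=(",", ":"))
    # An SSE message is a block of "field: value" lines ended by a blank line.
    return f"event: {event_type}\ndata: {data}\n\n"


if __name__ == "__main__":
    print(format_sse_event("pty.exited", {"id": "pty_1", "exit_code": 0}), end="")
```

Each event is a small JSON document sent to every subscriber, whereas the transport channel carries raw bytes to a single connected terminal.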

Q: How are PTY and commands different in OpenCode?

They serve fundamentally different purposes:

PTY (`/pty/*`) - Direct execution environment:

- Server-scoped (not tied to any AI session)
- Creates a real terminal process
- User interacts directly via WebSocket
- Not part of the AI conversation
- Think: "the terminal panel in VS Code"

Commands (`/session/{sessionID}/command`, `/session/{sessionID}/shell`) - AI-mediated execution:

- Session-scoped (tied to an AI session)
- The command is sent to the AI assistant for execution
- Creates an `AssistantMessage` in the session's conversation history
- Output becomes part of the AI's context
- Think: "asking Claude to run a command as a tool call"

Why commands are session-specific: Because they're AI operations, not direct execution. When you call `POST /session/{id}/command`, the server:

1. Creates an assistant message in the session
2. Runs the command
3. Captures output as message parts
4. Emits `message.part.updated` events
5. Makes the output visible to the AI in subsequent turns

This is how the AI "uses terminal tools" - the command infrastructure provides the bridge between the AI session and system execution.

Q: Should scoping be system-wide?

Yes, for both PTY and commands.

Current OpenCode behavior:

- PTYs: Already server-wide (global)
- Commands: Session-scoped (for AI context injection)

For sandbox-agent, since we're the orchestration layer (not the AI):

- PTYs: System-wide. Any client should be able to list, connect to, or manage any PTY.
- Commands/processes: System-wide. Process execution is a system primitive, not an AI primitive. If a caller wants to associate a process with a session, they can do so at their layer.

The session-scoping of commands in OpenCode is an OpenCode-specific concern (AI context injection). Sandbox-agent should provide the lower-level primitive (system-wide process execution) and let the OpenCode compat layer handle the session association.

Persistent Process System

The Concept

A persistent process system means:

- Spawn a process (PTY or command) via API
- Process runs independently of any client connection
- Connect/disconnect to the process I/O at will
- Process continues running through disconnections
- Query process status, list running processes
- Kill/signal processes explicitly

This is distinct from the typical "connection = process lifetime" model (Kubernetes, Docker exec) where closing the connection kills the process.

How E2B Does It

E2B's Process service is the best reference implementation:

```
Start(cmd, pty?) → stream of events (output)
Connect(pid/tag) → stream of events (reconnect)
SendInput(pid, data) → ok
Update(pid, size) → ok (resize)
SendSignal(pid, signal) → ok
List() → running processes
```

Key design choices:

- Unified service: PTY and command are the same service, differentiated by the `pty` field in `StartRequest`
- Process outlives connection: Disconnecting the output stream (aborting the `Start`/`Connect` RPC) does NOT kill the process
- Explicit termination: Must call `SendSignal(SIGKILL)` to stop a process
- Tag-based selection: Processes can be tagged at creation for later lookup without knowing the PID

Recommendation for Sandbox-Agent

Sandbox-agent should implement a persistent process manager that:

- Is system-wide (not session-scoped)
- Supports both PTY and non-PTY modes
- Decouples process lifetime from connection lifetime
- Exposes via both REST (lifecycle) and WebSocket (I/O)

Proposed API Surface

Process Lifecycle (REST):

| Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/v1/processes` | Create/spawn a process (PTY or command) |
| `GET` | `/v1/processes` | List all processes |
| `GET` | `/v1/processes/{id}` | Get process info (status, pid, exit code) |
| `DELETE` | `/v1/processes/{id}` | Kill process (SIGTERM, then SIGKILL) |
| `POST` | `/v1/processes/{id}/signal` | Send signal (SIGTERM, SIGKILL, SIGINT, etc.) |
| `POST` | `/v1/processes/{id}/resize` | Resize PTY (rows, cols) |
| `POST` | `/v1/processes/{id}/input` | Send stdin/pty input (REST fallback) |
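The "SIGTERM, then SIGKILL" behavior listed for `DELETE /v1/processes/{id}` could look roughly like the sketch below. The function name and the `grace_seconds` parameter are hypothetical; the proposal above does not specify a grace period.

```python
import subprocess
import sys


def terminate_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait up to grace_seconds, then escalate to SIGKILL."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(terminate_process(sleeper, grace_seconds=2.0))
```

On POSIX, a negative return code indicates the signal that ended the process, which a `process.exited` event could report alongside the exit code.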

Process I/O (WebSocket):

| Method | Endpoint | Description |
| --- | --- | --- |
| `GET` | `/v1/processes/{id}/connect` | WebSocket for bidirectional I/O |

Process Events (SSE):

| Event | Description |
| --- | --- |
| `process.created` | Process spawned |
| `process.updated` | Process metadata changed |
| `process.exited` | Process terminated (includes exit code) |
| `process.deleted` | Process record removed |

Create Request

```json
{
  "command": "bash",
  "args": ["-i", "-l"],
  "cwd": "/workspace",
  "env": {"TERM": "xterm-256color"},
  "pty": {                         // Optional - if present, allocate PTY
    "rows": 24,
    "cols": 80
  },
  "tag": "main-terminal",          // Optional - for lookup by name
  "label": "Terminal 1"            // Optional - display name
}
```

Process Object

```json
{
  "id": "proc_abc123",
  "tag": "main-terminal",
  "label": "Terminal 1",
  "command": "bash",
  "args": ["-i", "-l"],
  "cwd": "/workspace",
  "pid": 12345,
  "pty": true,
  "status": "running",             // "running" | "exited"
  "exit_code": null,               // Set when exited
  "created_at": "2025-01-15T...",
  "exited_at": null
}
```
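As a rough sketch of how the two shapes above relate, the code below parses a create request and derives the initial Process object from it. The field names follow the JSON above. The class names, the validation rule, and the way the `proc_` id is generated are assumptions made for illustration, not a settled design.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PtySize:
    rows: int = 24
    cols: int = 80


@dataclass
class CreateProcessRequest:
    command: str
    args: list[str] = field(default_factory=list)
    cwd: Optional[str] = None
    env: dict[str, str] = field(default_factory=dict)
    pty: Optional[PtySize] = None  # present means "allocate a PTY"
    tag: Optional[str] = None
    label: Optional[str] = None


def parse_create_request(body: dict) -> CreateProcessRequest:
    """Validate a decoded JSON body and convert it to a typed request."""
    command = body.get("command")
    if not isinstance(command, str) or not command:
        raise ValueError("'command' is required")
    pty = body.get("pty")
    return CreateProcessRequest(
        command=command,
        args=list(body.get("args", [])),
        cwd=body.get("cwd"),
        env=dict(body.get("env", {})),
        pty=PtySize(**pty) if pty is not None else None,
        tag=body.get("tag"),
        label=body.get("label"),
    )


def new_process_record(req: CreateProcessRequest, pid: int) -> dict:
    """Build the initial Process object for a freshly spawned process."""
    return {
        "id": f"proc_{uuid.uuid4().hex[:6]}",  # hypothetical id scheme
        "tag": req.tag,
        "label": req.label,
        "command": req.command,
        "args": req.args,
        "cwd": req.cwd,
        "pid": pid,
        "pty": req.pty is not None,
        "status": "running",
        "exit_code": None,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "exited_at": None,
    }
```

Note that the request carries the PTY size while the Process object only records whether a PTY was allocated, mirroring the two JSON examples.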

OpenCode Compatibility Layer

The OpenCode compat layer maps to this system:

| OpenCode Endpoint | Maps To |
| --- | --- |
| `POST /pty` | `POST /v1/processes` (with `pty` field) |
| `GET /pty` | `GET /v1/processes?pty=true` |
| `GET /pty/{id}` | `GET /v1/processes/{id}` |
| `PUT /pty/{id}` | `POST /v1/processes/{id}/resize` + metadata update |
| `DELETE /pty/{id}` | `DELETE /v1/processes/{id}` |
| `GET /pty/{id}/connect` | `GET /v1/processes/{id}/connect` |
| `POST /session/{id}/command` | Create process + capture output into session |
| `POST /session/{id}/shell` | Create process (shell mode) + capture output into session |
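The PTY rows of this mapping are mechanical enough to express as a lookup. The sketch below covers only those rows; the session command endpoints need extra logic and are left out. The function and table names are hypothetical.

```python
import re
from typing import Optional

# (source method, source path pattern, target method, target path template)
_PTY_ROUTES = [
    ("POST", r"^/pty$", "POST", "/v1/processes"),
    ("GET", r"^/pty$", "GET", "/v1/processes?pty=true"),
    ("GET", r"^/pty/(?P<id>[^/]+)$", "GET", "/v1/processes/{id}"),
    ("PUT", r"^/pty/(?P<id>[^/]+)$", "POST", "/v1/processes/{id}/resize"),
    ("DELETE", r"^/pty/(?P<id>[^/]+)$", "DELETE", "/v1/processes/{id}"),
    ("GET", r"^/pty/(?P<id>[^/]+)/connect$", "GET", "/v1/processes/{id}/connect"),
]


def map_opencode_route(method: str, path: str) -> Optional[tuple[str, str]]:
    """Translate an OpenCode PTY request line to its /v1/processes equivalent."""
    for src_method, pattern, dst_method, template in _PTY_ROUTES:
        match = re.match(pattern, path)
        if match and method.upper() == src_method:
            return dst_method, template.format(**match.groupdict())
    return None  # not a PTY route; handled elsewhere
```

`PUT /pty/{id}` maps to the resize endpoint here. A fuller compat layer would also apply the metadata part of the update, as the table notes.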

Open Questions

- Output buffering for reconnect: Should we buffer recent output (e.g., last 64KB) so reconnecting clients get some history? E2B doesn't do this, but it would improve UX for flaky connections.
- Process limits: Should there be a max number of concurrent processes? E2B doesn't expose one, but sandbox environments have limited resources.
- Auto-cleanup: Should processes be auto-cleaned after exiting? Options:
  - Keep forever until explicitly deleted
  - Auto-delete after N seconds/minutes
  - Keep metadata but release resources
- Input via REST vs WebSocket-only: The REST `POST /processes/{id}/input` endpoint is useful for one-shot input (e.g., "send ctrl+c") without establishing a WebSocket. E2B has both `SendInput` (unary) and `StreamInput` (streaming) for this reason.
- Multiple WebSocket connections to same process: Should we allow multiple clients to connect to the same process simultaneously? (Pair programming, monitoring). E2B supports this via multiple `Connect` calls.
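If the output-buffering question is answered with "yes", one simple option is a bounded byte buffer that keeps only the newest output. The sketch below assumes the 64KB figure mentioned above as its default; the class name is hypothetical and this is one possible approach, not a decided design.

```python
class RecentOutputBuffer:
    """Keeps only the most recent `capacity` bytes of process output."""

    def __init__(self, capacity: int = 64 * 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add new output, discarding the oldest bytes past capacity."""
        self._data += chunk
        overflow = len(self._data) - self._capacity
        if overflow > 0:
            del self._data[:overflow]

    def snapshot(self) -> bytes:
        """Return the buffered tail, e.g. to send to a reconnecting client."""
        return bytes(self._data)
```

A byte-level cut can land in the middle of a UTF-8 character or an escape sequence, so a replayed tail may start with a few garbled cells. That is part of the complexity the reconnect section cites when recommending no replay.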

User-Initiated Command Injection ("Run command, give AI context")

A common pattern across agents: the user (or frontend) runs a command and the output is injected into the AI's conversation context. This is distinct from the agent running a command via its own tools.

| Agent | Feature | Mechanism | Protocol-level? |
| --- | --- | --- | --- |
| Claude Code | `!command` prefix in TUI | CLI runs command locally, injects output as user message | No - client-side hack, not in API schema |
| Codex | `user_shell` source | `ExecCommandSource` enum distinguishes `agent` vs `user_shell` vs `unified_exec_*` | Yes - first-class protocol event |
| OpenCode | `/session/{id}/command` | HTTP endpoint runs command, records result as `AssistantMessage` | Yes - HTTP API |
| Amp | N/A | Not supported | N/A |

Design implication for sandbox-agent: The process system should support an optional `session_id` field when creating a process. If provided, the process output is associated with that session so the agent can see it. If not provided, the process runs independently (like a PTY). This unifies:

- User interactive terminals (no session association)
- User-initiated commands for AI context (session association)
- Agent-initiated background processes (session association)

Sources

- E2B Process Proto - `process.proto` gRPC service definition
- E2B JS SDK - `commands/pty.ts`, `commands/index.ts`
- Daytona SDK - REST + WebSocket PTY API
- Kubernetes RemoteCommand - WebSocket subprotocol
- Docker Engine API - Exec API with stream multiplexing
- Fly.io Machines API - REST exec with 60s limit
- Gitpod `terminal.proto` - gRPC terminal service
- OpenCode OpenAPI Spec - PTY and session command endpoints
