mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-15 05:02:11 +00:00
fix: credential detection and provider auth status (#120)
## Summary

Fix credential detection bugs and add credential availability status to the API. Consolidate Claude fallback models and add `sonnet` alias.

Builds on #109 (OAuth token support).

Related issues:
- Fixes #117 (Claude, Codex not showing up in gigacode)
- Related to #113 (Default agent should be Claude Code)

## Changes

### Credential detection fixes
- **`agent-credentials/src/lib.rs`**: Fix `?` operator bug in `extract_claude_credentials` - now continues to next config path if one is missing instead of returning early

### API credential status
- **`sandbox-agent/src/router.rs`**: Add `credentialsAvailable` field to `AgentInfo` struct
- **`/v1/agents`** endpoint now reports whether each agent has valid credentials

### OpenCode provider improvements
- **`sandbox-agent/src/opencode_compat.rs`**: Build `connected` array based on actual credential availability, not just model presence
- Check provider-specific credentials for OpenCode groups (e.g., `opencode:anthropic` only connected if Anthropic creds available)
- Add logging when credential extraction fails in model cache building

### Fallback model consolidation
- Renamed `claude_oauth_fallback_models()` → `claude_fallback_models()` (used for all fallback cases, not just OAuth)
- Added `sonnet` to fallback models (confirmed working via headless CLI test)
- Added `codex_fallback_models()` for Codex when credentials missing
- Added comment explaining aliases work for both API and OAuth users

### Documentation
- **`docs/credentials.mdx`**: New reference doc covering credential sources, extraction behavior, and error handling
- Documents that extraction failures are silent (not errors)
- Documents that agents spawn without credential pre-validation

### Inspector UI
- **`AgentsTab.tsx`**: Added credential status pill showing "Authenticated" or "No Credentials"

## Error Handling Philosophy
- **Extraction failures are silent**: Missing/malformed config files don't error, just continue to next source
- **Agents spawn without credential validation**: No pre-flight auth check; agent's native error surfaces if credentials are missing
- **Fallback models for UI**: When credentials missing, show alias-based models so users can still configure sessions

## Validation
- Tested Claude Code model aliases via headless CLI:
  - `claude --model default --print "say hi"` ✓
  - `claude --model sonnet --print "say hi"` ✓
  - `claude --model haiku --print "say hi"` ✓
- Build passes
- TypeScript types regenerated with `credentialsAvailable` field
parent 915d484845
commit c54f83e1a6
13 changed files with 807 additions and 9 deletions
|
|
@ -415,6 +415,31 @@ if let Some(model) = options.model.as_deref() {
|
|||
3. **Wait for Amp API** — Amp may add model/mode discovery in a future release
|
||||
4. **Scrape ampcode.com** — Check if the web UI exposes available modes/models
|
||||
|
||||
## Command Execution & Process Management
|
||||
|
||||
### Agent Tool Execution
|
||||
|
||||
Amp executes commands via the `Bash` tool, similar to Claude Code. Execution is synchronous and blocks the agent turn. Permission rules can pre-authorize specific commands:
|
||||
|
||||
```typescript
|
||||
{ tool: "Bash", matches: { command: "git *" }, action: "allow" }
|
||||
```
|
||||
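A minimal sketch of how a rule like the one above might be evaluated. It assumes `*` in `matches.command` is a wildcard and that the first matching rule wins; neither is confirmed here. `PermissionRule`, `globToRegExp`, and `matchRule` are hypothetical names for illustration, not part of Amp.

```typescript
// Hypothetical rule shape, modeled on the snippet above.
interface PermissionRule {
  tool: string;
  matches: { command: string };
  action: string;
}

// Assumption: "*" matches any run of characters; everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Returns the action of the first matching rule, or null when no rule applies.
function matchRule(rules: PermissionRule[], tool: string, command: string): string | null {
  for (const rule of rules) {
    if (rule.tool === tool && globToRegExp(rule.matches.command).test(command)) {
      return rule.action;
    }
  }
  return null;
}

const demoRules: PermissionRule[] = [
  { tool: "Bash", matches: { command: "git *" }, action: "allow" },
];
const gitDecision = matchRule(demoRules, "Bash", "git status");
const rmDecision = matchRule(demoRules, "Bash", "rm -rf build");
```

Returning `null` for unmatched commands leaves the fallback policy (prompt, deny, and so on) to the caller.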
|
||||
### No User-Initiated Command Injection
|
||||
|
||||
Amp does not expose any mechanism for external clients to inject command results into the agent's context. No `!` prefix equivalent, no command injection API.
|
||||
|
||||
### Comparison
|
||||
|
||||
| Capability | Supported? | Notes |
|
||||
|-----------|-----------|-------|
|
||||
| Agent runs commands | Yes (`Bash` tool) | Synchronous, blocks agent turn |
|
||||
| User runs commands → agent sees output | No | |
|
||||
| External API for command injection | No | |
|
||||
| Command source tracking | No | |
|
||||
| Background process management | No | Shell `&` only |
|
||||
| PTY / interactive terminal | No | |
|
||||
|
||||
## Notes
|
||||
|
||||
- Amp is similar to Claude Code (same streaming format)
|
||||
|
|
|
|||
|
|
@ -279,6 +279,44 @@ x-api-key: <ANTHROPIC_API_KEY>
|
|||
anthropic-version: 2023-06-01
|
||||
```
|
||||
|
||||
## Command Execution & Process Management
|
||||
|
||||
### Agent Tool Execution
|
||||
|
||||
The agent executes commands via the `Bash` tool. This is synchronous - the agent blocks until the command exits. Tool schema:
|
||||
|
||||
```json
|
||||
{
|
||||
"command": "string",
|
||||
"timeout": "number",
|
||||
"workingDirectory": "string"
|
||||
}
|
||||
```
|
||||
|
||||
There is no background process support. If the agent needs a long-running process (e.g., dev server), it uses shell backgrounding (`&`) within a single `Bash` tool call.
|
||||
|
||||
### User-Initiated Command Execution (`!` prefix)
|
||||
|
||||
Claude Code's TUI supports `!command` syntax where the user types `!npm test` to run a command directly. The output is injected into the conversation as a user message so the agent can see it on the next turn.
|
||||
|
||||
**This is a client-side TUI feature only.** It is not exposed in the API schema or streaming protocol. The CLI runs the command locally and stuffs the output into the next user message. There is no protocol-level concept of "user ran a command" vs "agent ran a command."
|
||||
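Since there is no protocol-level support, an external client that wants the same effect has to do what the TUI does: run the command itself and put the output into prompt text. A minimal sketch of the formatting step, assuming the command has already been executed locally. The `LocalCommandResult` shape and the message layout are illustrative assumptions, not Claude Code's actual format.

```typescript
// Hypothetical result of a command the client ran on its own.
interface LocalCommandResult {
  command: string;
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Strip trailing whitespace without relying on newer string helpers.
function stripTrailingWhitespace(text: string): string {
  return text.replace(/\s+$/, "");
}

// Renders the result as plain text suitable for the next user message.
function formatAsUserMessage(result: LocalCommandResult): string {
  const sections: string[] = ["$ " + result.command];
  if (result.stdout.length > 0) {
    sections.push(stripTrailingWhitespace(result.stdout));
  }
  if (result.stderr.length > 0) {
    sections.push("[stderr]\n" + stripTrailingWhitespace(result.stderr));
  }
  sections.push("[exit code: " + result.exitCode + "]");
  return sections.join("\n");
}

const bangMessage = formatAsUserMessage({
  command: "npm test",
  stdout: "2 passing\n",
  stderr: "",
  exitCode: 0,
});
```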
|
||||
### No External Command Injection API
|
||||
|
||||
External clients (SDKs, frontends) cannot programmatically inject command results into Claude's conversation context. The only way to provide command output to the agent is:
|
||||
- Include it in the user prompt text
|
||||
- Use the `!` prefix in the interactive TUI
|
||||
|
||||
### Comparison
|
||||
|
||||
| Capability | Supported? | Notes |
|
||||
|-----------|-----------|-------|
|
||||
| Agent runs commands | Yes (`Bash` tool) | Synchronous, blocks agent turn |
|
||||
| User runs commands → agent sees output | Yes (`!cmd` in TUI) | Client-side only, not in protocol |
|
||||
| External API for command injection | No | |
|
||||
| Background process management | No | Shell `&` only |
|
||||
| PTY / interactive terminal | No | |
|
||||
|
||||
## Notes
|
||||
|
||||
- Claude CLI manages its own OAuth refresh internally
|
||||
|
|
|
|||
|
|
@ -347,6 +347,68 @@ Requires a running Codex app-server process. Send the JSON-RPC request to the ap
|
|||
- Requires an active app-server process (cannot query models without starting one)
|
||||
- No standalone CLI command like `codex models`
|
||||
|
||||
## Command Execution & Process Management
|
||||
|
||||
### Agent Tool Execution
|
||||
|
||||
Codex executes commands via `LocalShellAction`. The agent proposes a command, and external clients approve/deny via JSON-RPC (`item/commandExecution/requestApproval`).
|
||||
|
||||
### Command Source Tracking (`ExecCommandSource`)
|
||||
|
||||
Codex is the only agent that explicitly tracks **who initiated a command** at the protocol level:
|
||||
|
||||
```json
|
||||
{
|
||||
"ExecCommandSource": {
|
||||
"enum": ["agent", "user_shell", "unified_exec_startup", "unified_exec_interaction"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Source | Meaning |
|
||||
|--------|---------|
|
||||
| `agent` | Agent decided to run this command via tool call |
|
||||
| `user_shell` | User ran a command in a shell (equivalent to Claude Code's `!` prefix) |
|
||||
| `unified_exec_startup` | Startup script ran this command |
|
||||
| `unified_exec_interaction` | Interactive execution |
|
||||
|
||||
This means user-initiated shell commands are **first-class protocol events** in Codex, not a client-side hack like Claude Code's `!` prefix.
|
||||
|
||||
### Command Execution Events
|
||||
|
||||
Codex emits structured events for command execution:
|
||||
|
||||
- `exec_command_begin` - Command started (includes `source`, `command`, `cwd`, `turn_id`)
|
||||
- `exec_command_output_delta` - Streaming output chunk (includes `stream: stdout|stderr`)
|
||||
- `exec_command_end` - Command completed (includes `exit_code`, `source`)
|
||||
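A client consuming these events could fold them into one record per command. The sketch below uses the field names listed above (`source`, `command`, `cwd`, `turn_id`, `stream`, `exit_code`). The `call_id` correlation key, the `chunk` field, and the plain-string `command` are assumptions made for illustration and may not match the real payloads.

```typescript
type ExecCommandSource =
  | "agent"
  | "user_shell"
  | "unified_exec_startup"
  | "unified_exec_interaction";

// Assumed event shapes; call_id and chunk are hypothetical fields.
type ExecEvent =
  | { type: "exec_command_begin"; call_id: string; source: ExecCommandSource; command: string; cwd: string; turn_id: string }
  | { type: "exec_command_output_delta"; call_id: string; stream: "stdout" | "stderr"; chunk: string }
  | { type: "exec_command_end"; call_id: string; exit_code: number; source: ExecCommandSource };

interface CommandRecord {
  source: ExecCommandSource;
  command: string;
  stdout: string;
  stderr: string;
  exitCode: number | null; // null while the command is still running
}

// Folds an ordered event list into one record per call_id.
function foldExecEvents(events: ExecEvent[]): Record<string, CommandRecord> {
  const records: Record<string, CommandRecord> = {};
  for (const ev of events) {
    switch (ev.type) {
      case "exec_command_begin":
        records[ev.call_id] = { source: ev.source, command: ev.command, stdout: "", stderr: "", exitCode: null };
        break;
      case "exec_command_output_delta": {
        const record = records[ev.call_id];
        if (record) record[ev.stream] += ev.chunk;
        break;
      }
      case "exec_command_end": {
        const record = records[ev.call_id];
        if (record) record.exitCode = ev.exit_code;
        break;
      }
    }
  }
  return records;
}

const foldedCommands = foldExecEvents([
  { type: "exec_command_begin", call_id: "c1", source: "user_shell", command: "ls", cwd: "/workspace", turn_id: "t1" },
  { type: "exec_command_output_delta", call_id: "c1", stream: "stdout", chunk: "README.md\n" },
  { type: "exec_command_end", call_id: "c1", exit_code: 0, source: "user_shell" },
]);
```

Because `source` travels with the events, a UI can label `user_shell` commands differently from `agent` ones without guessing.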
|
||||
### Parsed Command Analysis (`CommandAction`)
|
||||
|
||||
Codex provides semantic analysis of what a command does:
|
||||
|
||||
```json
|
||||
{
|
||||
"commandActions": [
|
||||
{ "type": "read", "path": "/src/main.ts" },
|
||||
{ "type": "write", "path": "/src/utils.ts" },
|
||||
{ "type": "install", "package": "lodash" }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Action types: `read`, `write`, `listFiles`, `search`, `install`, `remove`, `other`.
|
||||
|
||||
### Comparison
|
||||
|
||||
| Capability | Supported? | Notes |
|
||||
|-----------|-----------|-------|
|
||||
| Agent runs commands | Yes (`LocalShellAction`) | With approval workflow |
|
||||
| User runs commands → agent sees output | Yes (`user_shell` source) | First-class protocol event |
|
||||
| External API for command injection | Yes (JSON-RPC approval) | Can approve/deny before execution |
|
||||
| Command source tracking | Yes (`ExecCommandSource` enum) | Distinguishes agent vs user vs startup |
|
||||
| Background process management | No | |
|
||||
| PTY / interactive terminal | No | |
|
||||
|
||||
## Notes
|
||||
|
||||
- SDK is dynamically imported to reduce bundle size
|
||||
|
|
|
|||
|
|
@ -585,6 +585,60 @@ const response = await client.provider.list();
|
|||
|
||||
When an OpenCode server is running, call `GET /provider` on its HTTP port. Returns full model metadata including capabilities, costs, context limits, and modalities.
|
||||
|
||||
## Command Execution & Process Management
|
||||
|
||||
### Agent Tool Execution
|
||||
|
||||
The agent executes commands via internal tools (not exposed in the HTTP API). The agent's tool calls are synchronous within its turn. Tool parts have states: `pending`, `running`, `completed`, `error`.
|
||||
|
||||
### PTY System (`/pty/*`) - User-Facing Terminals
|
||||
|
||||
Separate from the agent's command execution. PTYs are server-scoped interactive terminals for the user:
|
||||
|
||||
- `POST /pty` - Create PTY (command, args, cwd, title, env)
|
||||
- `GET /pty` - List all PTYs
|
||||
- `GET /pty/{ptyID}` - Get PTY info
|
||||
- `PUT /pty/{ptyID}` - Update PTY (title, resize via `size: {rows, cols}`)
|
||||
- `DELETE /pty/{ptyID}` - Kill and remove PTY
|
||||
- `GET /pty/{ptyID}/connect` - WebSocket for bidirectional I/O
|
||||
|
||||
PTY events (globally broadcast via SSE): `pty.created`, `pty.updated`, `pty.exited`, `pty.deleted`.
|
||||
|
||||
The agent does NOT use the PTY system. PTYs are for the user's interactive terminal panel, independent of any AI session.
|
||||
|
||||
### Session Commands (`/session/{id}/command`, `/session/{id}/shell`) - Context Injection
|
||||
|
||||
External clients can inject command results into an AI session's conversation context:
|
||||
|
||||
- `POST /session/{sessionID}/command` - Executes a command and records the result as an `AssistantMessage` in the session. Required fields: `command`, `arguments`. The output becomes part of the AI's context for subsequent turns.
|
||||
- `POST /session/{sessionID}/shell` - Similar but wraps in `sh -c`. Required fields: `command`, `agent`.
|
||||
- `GET /command` - Lists available command definitions (metadata, not execution).
|
||||
|
||||
Session commands emit `command.executed` events with `sessionID` + `messageID`.
|
||||
|
||||
**Key distinction**: These endpoints execute commands directly (not via the AI), then inject the output into the session as if the AI produced it. The AI doesn't actively run the command - it just finds the output in its conversation history on the next turn.
|
||||
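A rough sketch of the two request shapes described above, built as plain descriptors rather than sent over the network. The required field names come from the list above. Treating every value as a string, and the helper names themselves, are assumptions.

```typescript
// Hypothetical descriptor; a real client would hand this to its HTTP layer.
interface SessionRequest {
  method: "POST";
  path: string;
  body: Record<string, string>;
}

// POST /session/{sessionID}/command requires `command` and `arguments`.
function buildCommandRequest(sessionId: string, command: string, args: string): SessionRequest {
  return {
    method: "POST",
    path: "/session/" + encodeURIComponent(sessionId) + "/command",
    body: { command: command, arguments: args },
  };
}

// POST /session/{sessionID}/shell requires `command` and `agent`.
function buildShellRequest(sessionId: string, command: string, agent: string): SessionRequest {
  return {
    method: "POST",
    path: "/session/" + encodeURIComponent(sessionId) + "/shell",
    body: { command: command, agent: agent },
  };
}

const commandRequest = buildCommandRequest("ses_1", "test", "--watch");
const shellRequest = buildShellRequest("ses_1", "npm test", "example-agent");
```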
|
||||
### Three Separate Execution Mechanisms
|
||||
|
||||
| Mechanism | Who uses it | Scoped to | AI sees output? |
|
||||
|-----------|-------------|-----------|----------------|
|
||||
| Agent tools (internal) | AI agent | Session turn | Yes (immediate) |
|
||||
| PTY (`/pty/*`) | User/frontend | Server (global) | No |
|
||||
| Session commands (`/session/{id}/*`) | Frontend/SDK client | Session | Yes (next turn) |
|
||||
|
||||
The agent has no tool to interact with PTYs and cannot access the session command endpoints. When the agent needs to run a background process, it uses its internal bash-equivalent tool with shell backgrounding (`&`).
|
||||
|
||||
### Comparison
|
||||
|
||||
| Capability | Supported? | Notes |
|
||||
|-----------|-----------|-------|
|
||||
| Agent runs commands | Yes (internal tools) | Synchronous, blocks agent turn |
|
||||
| User runs commands → agent sees output | Yes (`/session/{id}/command`) | HTTP API, first-class |
|
||||
| External API for command injection | Yes | Session-scoped endpoints |
|
||||
| Command source tracking | Implicit | Endpoint implies source (no enum) |
|
||||
| Background process management | No | Shell `&` only for agent |
|
||||
| PTY / interactive terminal | Yes (`/pty/*`) | Server-scoped, WebSocket I/O |
|
||||
|
||||
## Notes
|
||||
|
||||
- OpenCode is the most feature-rich runtime (streaming, questions, permissions)
|
||||
|
|
|
|||
374
research/process-terminal-design.md
Normal file
|
|
@ -0,0 +1,374 @@
|
|||
# Research: Process & Terminal System Design
|
||||
|
||||
Research on PTY/terminal and process management APIs across sandbox platforms, with design recommendations for sandbox-agent.
|
||||
|
||||
## Competitive Landscape
|
||||
|
||||
### Transport Comparison
|
||||
|
||||
| Platform | PTY Transport | Command Transport | Unified? |
|
||||
|----------|--------------|-------------------|----------|
|
||||
| **OpenCode** | WebSocket (`/pty/{id}/connect`) | REST (session-scoped, AI-mediated) | No |
|
||||
| **E2B** | gRPC server-stream (output) + unary RPC (input) | Same gRPC service | Yes |
|
||||
| **Daytona** | WebSocket | REST | No |
|
||||
| **Kubernetes** | WebSocket (channel byte mux) | Same WebSocket | Yes |
|
||||
| **Docker** | HTTP connection hijack | Same connection | Yes |
|
||||
| **Fly.io** | SSH over WireGuard | REST (sync, 60s max) | No |
|
||||
| **Vercel Sandboxes** | No PTY API | REST SDK (async generator for logs) | N/A |
|
||||
| **Gitpod** | gRPC (Listen=output, Write=input) | Same gRPC service | Yes |
|
||||
|
||||
### Resize Mechanism
|
||||
|
||||
| Platform | How | Notes |
|
||||
|----------|-----|-------|
|
||||
| **OpenCode** | `PUT /pty/{id}` with `size: {rows, cols}` | Separate REST call |
|
||||
| **E2B** | Separate `Update` RPC | Separate gRPC call |
|
||||
| **Daytona** | Separate HTTP POST | Sends SIGWINCH |
|
||||
| **Kubernetes** | In-band WebSocket message (channel byte 4) | `{"Width": N, "Height": N}` |
|
||||
| **Docker** | `POST /exec/{id}/resize?h=N&w=N` | Separate REST call |
|
||||
| **Gitpod** | Separate `SetSize` RPC | Separate gRPC call |
|
||||
|
||||
**Consensus**: Almost all platforms use a separate call for resize. Only Kubernetes does it in-band. Since resize is a control signal (not data), a separate mechanism is cleaner.
|
||||
|
||||
### I/O Multiplexing
|
||||
|
||||
I/O multiplexing is how platforms distinguish between stdout, stderr, and PTY data on a shared connection.
|
||||
|
||||
| Platform | Method | Detail |
|
||||
|----------|--------|--------|
|
||||
| **Docker** | 8-byte binary header per frame | Byte 0 = stream type (0=stdin, 1=stdout, 2=stderr). When TTY=true, no mux (raw stream). |
|
||||
| **Kubernetes** | 1-byte channel prefix per WebSocket message | 0=stdin, 1=stdout, 2=stderr, 3=error, 4=resize, 255=close |
|
||||
| **E2B** | gRPC `oneof` in protobuf | `DataEvent.output` is `oneof { bytes stdout, bytes stderr, bytes pty }` |
|
||||
| **OpenCode** | None | PTY is a unified stream. Commands capture stdout/stderr separately in response. |
|
||||
| **Daytona** | None | PTY is unified. Commands return structured `{stdout, stderr}`. |
|
||||
|
||||
**Key insight**: When a process runs with a PTY allocated, its stdout and stderr are both attached to the PTY slave, so whatever reads the PTY master sees a single interleaved stream. Multiplexing only matters for non-PTY command execution. OpenCode and Daytona handle this by keeping PTY (unified stream) and commands (structured response) as separate APIs.
|
||||
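To make the Docker row concrete, here is a minimal demultiplexer for the non-TTY case. Byte 0 as the stream type comes from the table above. The rest of the layout (bytes 1-3 unused, bytes 4-7 holding the payload length as a big-endian uint32) follows the Docker Engine API attach documentation as I understand it, so treat it as an assumption to verify. `parseDockerFrames` is a hypothetical helper name.

```typescript
type StreamName = "stdin" | "stdout" | "stderr";

interface DockerFrame {
  stream: StreamName;
  payload: Uint8Array;
}

const STREAM_NAMES: StreamName[] = ["stdin", "stdout", "stderr"];

// Parses as many complete frames as the buffer holds and returns the
// unconsumed tail so the caller can prepend it to the next network chunk.
function parseDockerFrames(buffer: Uint8Array): { frames: DockerFrame[]; rest: Uint8Array } {
  const frames: DockerFrame[] = [];
  let offset = 0;
  while (buffer.length - offset >= 8) {
    const streamType = buffer[offset];
    if (streamType > 2) throw new Error("unknown stream type " + streamType);
    // Bytes 4..7: payload length, big-endian.
    const length =
      buffer[offset + 4] * 0x1000000 +
      ((buffer[offset + 5] << 16) | (buffer[offset + 6] << 8) | buffer[offset + 7]);
    if (buffer.length - offset - 8 < length) break; // frame not complete yet
    frames.push({
      stream: STREAM_NAMES[streamType],
      payload: buffer.subarray(offset + 8, offset + 8 + length),
    });
    offset += 8 + length;
  }
  return { frames: frames, rest: buffer.subarray(offset) };
}

// Two complete frames ("hi" on stdout, "!" on stderr) plus a truncated header.
const demuxResult = parseDockerFrames(
  new Uint8Array([1, 0, 0, 0, 0, 0, 0, 2, 104, 105, 2, 0, 0, 0, 0, 0, 0, 1, 33, 1, 0, 0, 0])
);
```

None of this is needed when the exec is created with a TTY, because the stream is then raw.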
|
||||
### Reconnection
|
||||
|
||||
| Platform | Method | Replays missed output? |
|
||||
|----------|--------|----------------------|
|
||||
| **E2B** | `Connect` RPC by PID or tag | No - only new events from reconnect point |
|
||||
| **Daytona** | New WebSocket to same PTY session | No |
|
||||
| **Kubernetes** | Not supported (connection = session) | N/A |
|
||||
| **Docker** | Not supported (connection = session) | N/A |
|
||||
| **OpenCode** | `GET /pty/{id}/connect` (WebSocket) | Unknown (not documented) |
|
||||
|
||||
### Process Identification
|
||||
|
||||
| Platform | ID Type | Notes |
|
||||
|----------|---------|-------|
|
||||
| **OpenCode** | String (`pty_N`) | Pattern `^pty.*` |
|
||||
| **E2B** | PID (uint32) or tag (string) | Dual selector |
|
||||
| **Daytona** | Session ID / PID | |
|
||||
| **Docker** | Exec ID (string, server-generated) | |
|
||||
| **Kubernetes** | Connection-scoped | No ID - the WebSocket IS the process |
|
||||
| **Gitpod** | Alias (string) | Human-readable |
|
||||
|
||||
### Scoping
|
||||
|
||||
| Platform | PTY Scope | Command Scope |
|
||||
|----------|-----------|---------------|
|
||||
| **OpenCode** | Server-wide (global) | Session-specific (AI-mediated) |
|
||||
| **E2B** | Sandbox-wide | Sandbox-wide |
|
||||
| **Daytona** | Sandbox-wide | Sandbox-wide |
|
||||
| **Docker** | Container-scoped | Container-scoped |
|
||||
| **Kubernetes** | Pod-scoped | Pod-scoped |
|
||||
|
||||
## Key Questions & Analysis
|
||||
|
||||
### Q: Should PTY transport be WebSocket?
|
||||
|
||||
**Yes.** WebSocket is the right choice for PTY I/O:
|
||||
- Bidirectional: client sends keystrokes, server sends terminal output
|
||||
- Low latency: no HTTP request overhead per keystroke
|
||||
- Persistent connection: terminal sessions are long-lived
|
||||
- Precedent: OpenCode, Daytona, and Kubernetes all use WebSocket for PTY (E2B and Gitpod use gRPC streams, Docker a hijacked HTTP connection)
|
||||
|
||||
### Q: Should command transport be WebSocket or REST?
|
||||
|
||||
**REST is sufficient for commands. WebSocket is not needed.**
|
||||
|
||||
The distinction comes down to the nature of each operation:
|
||||
|
||||
- **PTY**: Long-lived, bidirectional, interactive. User types, terminal responds. Needs WebSocket.
|
||||
- **Commands**: Request-response. Client says "run `ls -la`", server runs it, returns stdout/stderr/exit_code. This is a natural REST operation.
|
||||
|
||||
The "full duplex" question: commands don't need full duplex because:
|
||||
1. Input is sent once at invocation (the command string)
|
||||
2. Output is collected and returned when the process exits
|
||||
3. There's no ongoing interactive input during execution
|
||||
|
||||
For **streaming output** of long-running commands (e.g., `npm install`), there are two clean options:
|
||||
1. **SSE**: Server-Sent Events for output streaming (output-only, which is all you need)
|
||||
2. **PTY**: If the user needs to interact with the process (send ctrl+c, provide stdin), they should use a PTY instead
|
||||
|
||||
This matches how OpenCode separates the two: commands are REST, PTYs are WebSocket.
|
||||
|
||||
**Recommendation**: Keep commands as REST. If a command needs streaming output or interactive input, the user should create a PTY instead. This avoids building a second WebSocket protocol for a use case that PTYs already cover.
|
||||
|
||||
### Q: Should resize be WebSocket in-band or separate POST?
|
||||
|
||||
**Separate endpoint (PUT or POST).**
|
||||
|
||||
Reasons:
|
||||
- Resize is a control signal, not data. Mixing it into the data stream requires a framing protocol to distinguish resize messages from terminal input.
|
||||
- OpenCode already defines `PUT /pty/{id}` with `size: {rows, cols}` - this is the existing spec.
|
||||
- E2B, Daytona, Docker, and Gitpod all use separate calls.
|
||||
- Only Kubernetes does in-band (because their channel-byte protocol already has a mux layer).
|
||||
- A separate endpoint is simpler to implement, test, and debug.
|
||||
|
||||
**Recommendation**: Use `PUT /pty/{id}` with `size` field (matching OpenCode spec). Alternatively, a dedicated `POST /pty/{id}/resize` if we want to keep update and resize semantically separate.
|
||||
|
||||
### Q: What is I/O multiplexing?
|
||||
|
||||
I/O multiplexing is the mechanism for distinguishing between different data streams (stdout, stderr, stdin, control signals) on a single connection.
|
||||
|
||||
**When it matters**: Non-PTY command execution where stdout and stderr need to be kept separate.
|
||||
|
||||
**When it doesn't matter**: PTY sessions. When a PTY is allocated, the process's stdout and stderr both point at the PTY slave device, so the PTY master fd carries one combined output stream. This is why terminals show stdout and stderr interleaved - the PTY doesn't distinguish them.
|
||||
|
||||
**For sandbox-agent**: Since PTYs are unified streams and commands use REST (separate stdout/stderr in the JSON response), we don't need a multiplexing protocol. The API design naturally separates the two cases.
|
||||
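For contrast, this is roughly what a multiplexing layer looks like when one is needed. The sketch follows the Kubernetes channel-byte scheme from the table earlier (0=stdin, 1=stdout, 2=stderr, 3=error, 4=resize, 255=close). The helper names are hypothetical, and the resize payload is assumed to be ASCII JSON.

```typescript
const CHANNELS: Record<number, string> = {
  0: "stdin",
  1: "stdout",
  2: "stderr",
  3: "error",
  4: "resize",
  255: "close",
};

// Splits one WebSocket message into its channel name and payload bytes.
function decodeChannelMessage(message: Uint8Array): { channel: string; payload: Uint8Array } {
  if (message.length === 0) throw new Error("empty message");
  const channel = CHANNELS[message[0]];
  if (channel === undefined) throw new Error("unknown channel " + message[0]);
  return { channel: channel, payload: message.subarray(1) };
}

// Builds an in-band resize message: channel byte 4 followed by JSON.
function encodeResize(width: number, height: number): Uint8Array {
  const json = JSON.stringify({ Width: width, Height: height });
  const out = new Uint8Array(1 + json.length);
  out[0] = 4;
  for (let i = 0; i < json.length; i++) {
    out[i + 1] = json.charCodeAt(i); // safe because the JSON is ASCII-only
  }
  return out;
}

const decodedResize = decodeChannelMessage(encodeResize(80, 24));
const decodedStdout = decodeChannelMessage(new Uint8Array([1, 111, 107]));
```

Every message pays one byte of framing, and every client needs this decoder, which is the cost the PTY/REST split avoids.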
|
||||
### Q: How should reconnect work?
|
||||
|
||||
**Reconnect is an application-level concept, not just HTTP/WebSocket reconnection.**
|
||||
|
||||
The distinction:
|
||||
|
||||
- **HTTP/WebSocket reconnect**: The transport-level connection drops and is re-established. This is handled by the client library automatically (retry logic, exponential backoff). The server doesn't need to know.
|
||||
- **Process reconnect**: The client disconnects from a running process but the process keeps running. Later, the client (or a different client) connects to the same process and starts receiving output again.
|
||||
|
||||
**E2B's model**: Disconnecting a stream (via AbortController) leaves the process running. `Connect` RPC by PID or tag re-establishes the output stream. Missed output during disconnection is lost. This works because:
|
||||
1. Processes are long-lived (servers, shells)
|
||||
2. For terminals, the screen state can be recovered by the shell/application redrawing
|
||||
3. For commands, if you care about all output, don't disconnect
|
||||
|
||||
**Recommendation for sandbox-agent**: Reconnect should be supported at the application level:
|
||||
1. `GET /pty/{id}/connect` (WebSocket) can be called multiple times for the same PTY
|
||||
2. If the WebSocket drops, the PTY process keeps running
|
||||
3. Client reconnects by opening a new WebSocket to the same endpoint
|
||||
4. No output replay (too complex, rarely needed - terminal apps redraw when they receive SIGWINCH, which a reconnecting client can trigger by sending a resize)
|
||||
5. This is essentially what OpenCode's `/pty/{id}/connect` endpoint already implies
|
||||
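The recommendation above can be sketched as an in-memory session whose lifetime is independent of its listeners. This is a toy model under stated assumptions: `PtySession` and its methods are hypothetical names, output is modeled as strings, and a real implementation would sit on top of an actual PTY and WebSocket connections.

```typescript
type OutputListener = (chunk: string) => void;

class PtySession {
  private listeners: OutputListener[] = [];
  private running = true;

  // Called when a WebSocket connects; returns a detach function for close.
  attach(listener: OutputListener): () => void {
    this.listeners.push(listener);
    return () => {
      const index = this.listeners.indexOf(listener);
      if (index !== -1) this.listeners.splice(index, 1);
    };
  }

  // Called when the PTY produces output. Nothing is buffered, so output
  // emitted while no client is attached is lost (no replay).
  emitOutput(chunk: string): void {
    for (const listener of this.listeners.slice()) listener(chunk);
  }

  // Only an explicit exit ends the session; detaching never does.
  exit(): void {
    this.running = false;
    this.listeners = [];
  }

  isRunning(): boolean {
    return this.running;
  }
}

const ptySession = new PtySession();
const firstClient: string[] = [];
const secondClient: string[] = [];

const detachFirst = ptySession.attach((chunk) => { firstClient.push(chunk); });
ptySession.emitOutput("a");
detachFirst();               // WebSocket drops
ptySession.emitOutput("b");  // nobody attached, output is lost
ptySession.attach((chunk) => { secondClient.push(chunk); }); // reconnect
ptySession.emitOutput("c");
```

After this sequence the first client has seen only `a`, the second only `c`, and the session is still running.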
|
||||
This naturally leads to the **persistent process system** concept (see below).
|
||||
|
||||
### Q: How are PTY events different from PTY transport?
|
||||
|
||||
Two completely separate channels serving different purposes:
|
||||
|
||||
**PTY Events** (via SSE on `/event` or `/sessions/{id}/events/sse`):
|
||||
- Lifecycle notifications: `pty.created`, `pty.updated`, `pty.exited`, `pty.deleted`
|
||||
- Lightweight JSON metadata (PTY id, status, exit code)
|
||||
- Broadcast to all subscribers
|
||||
- Used by UIs to update PTY lists, show status indicators, handle cleanup
|
||||
|
||||
**PTY Transport** (via WebSocket on `/pty/{id}/connect`):
|
||||
- Raw terminal I/O: binary input/output bytes
|
||||
- High-frequency, high-bandwidth
|
||||
- Point-to-point (one client connected to one PTY)
|
||||
- Used by terminal emulators (xterm.js) to render the terminal
|
||||
|
||||
**Analogy**: Events are like email notifications ("a new terminal was opened"). Transport is like the phone call (the actual terminal session).
|
||||
|
||||
### Q: How are PTY and commands different in OpenCode?
|
||||
|
||||
They serve fundamentally different purposes:
|
||||
|
||||
**PTY (`/pty/*`)** - Direct execution environment:
|
||||
- Server-scoped (not tied to any AI session)
|
||||
- Creates a real terminal process
|
||||
- User interacts directly via WebSocket
|
||||
- Not part of the AI conversation
|
||||
- Think: "the terminal panel in VS Code"
|
||||
|
||||
**Commands (`/session/{sessionID}/command`, `/session/{sessionID}/shell`)** - AI-mediated execution:
|
||||
- Session-scoped (tied to an AI session)
|
||||
- The command is sent **to the AI assistant** for execution
|
||||
- Creates an `AssistantMessage` in the session's conversation history
|
||||
- Output becomes part of the AI's context
|
||||
- Think: "asking Claude to run a command as a tool call"
|
||||
|
||||
**Why commands are session-specific**: Because they're AI operations, not direct execution. When you call `POST /session/{id}/command`, the server:
|
||||
1. Creates an assistant message in the session
|
||||
2. Runs the command
|
||||
3. Captures output as message parts
|
||||
4. Emits `message.part.updated` events
|
||||
5. The AI can see this output in subsequent turns
|
||||
|
||||
This is how the AI "uses terminal tools" - the command infrastructure provides the bridge between the AI session and system execution.
|
||||
|
||||
### Q: Should scoping be system-wide?
|
||||
|
||||
**Yes, for both PTY and commands.**
|
||||
|
||||
Current OpenCode behavior:
|
||||
- PTYs: Already server-wide (global)
|
||||
- Commands: Session-scoped (for AI context injection)
|
||||
|
||||
**For sandbox-agent**, since we're the orchestration layer (not the AI):
|
||||
- **PTYs**: System-wide. Any client should be able to list, connect to, or manage any PTY.
|
||||
- **Commands/processes**: System-wide. Process execution is a system primitive, not an AI primitive. If a caller wants to associate a process with a session, they can do so at their layer.
|
||||
|
||||
The session-scoping of commands in OpenCode is an OpenCode-specific concern (AI context injection). Sandbox-agent should provide the lower-level primitive (system-wide process execution) and let the OpenCode compat layer handle the session association.
|
||||
|
||||
## Persistent Process System
|
||||
|
||||
### The Concept
|
||||
|
||||
A persistent process system means:
|
||||
1. **Spawn** a process (PTY or command) via API
|
||||
2. Process runs independently of any client connection
|
||||
3. **Connect/disconnect** to the process I/O at will
|
||||
4. Process continues running through disconnections
|
||||
5. **Query** process status, list running processes
|
||||
6. **Kill/signal** processes explicitly
|
||||
|
||||
This is distinct from the typical "connection = process lifetime" model (Kubernetes, Docker exec) where closing the connection kills the process.
|
||||
|
||||
### How E2B Does It
|
||||
|
||||
E2B's `Process` service is the best reference implementation:
|
||||
|
||||
```
|
||||
Start(cmd, pty?) → stream of events (output)
|
||||
Connect(pid/tag) → stream of events (reconnect)
|
||||
SendInput(pid, data) → ok
|
||||
Update(pid, size) → ok (resize)
|
||||
SendSignal(pid, signal) → ok
|
||||
List() → running processes
|
||||
```
|
||||
|
||||
Key design choices:
|
||||
- **Unified service**: PTY and command are the same service, differentiated by the `pty` field in `StartRequest`
|
||||
- **Process outlives connection**: Disconnecting the output stream (aborting the `Start`/`Connect` RPC) does NOT kill the process
|
||||
- **Explicit termination**: Must call `SendSignal(SIGKILL)` to stop a process
|
||||
- **Tag-based selection**: Processes can be tagged at creation for later lookup without knowing the PID
|
||||
|
||||
### Recommendation for Sandbox-Agent
|
||||
|
||||
Sandbox-agent should implement a **persistent process manager** that:
|
||||
|
||||
1. **Is system-wide** (not session-scoped)
|
||||
2. **Supports both PTY and non-PTY modes**
|
||||
3. **Decouples process lifetime from connection lifetime**
|
||||
4. **Exposes via both REST (lifecycle) and WebSocket (I/O)**
|
||||
|
||||
#### Proposed API Surface
|
||||
|
||||
**Process Lifecycle (REST)**:
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| `POST` | `/v1/processes` | Create/spawn a process (PTY or command) |
|
||||
| `GET` | `/v1/processes` | List all processes |
|
||||
| `GET` | `/v1/processes/{id}` | Get process info (status, pid, exit code) |
|
||||
| `DELETE` | `/v1/processes/{id}` | Kill process (SIGTERM, then SIGKILL) |
|
||||
| `POST` | `/v1/processes/{id}/signal` | Send signal (SIGTERM, SIGKILL, SIGINT, etc.) |
|
||||
| `POST` | `/v1/processes/{id}/resize` | Resize PTY (rows, cols) |
|
||||
| `POST` | `/v1/processes/{id}/input` | Send stdin/pty input (REST fallback) |
|
||||
|
||||
**Process I/O (WebSocket)**:
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| `GET` | `/v1/processes/{id}/connect` | WebSocket for bidirectional I/O |
|
||||
|
||||
**Process Events (SSE)**:
|
||||
| Event | Description |
|
||||
|-------|-------------|
|
||||
| `process.created` | Process spawned |
|
||||
| `process.updated` | Process metadata changed |
|
||||
| `process.exited` | Process terminated (includes exit code) |
|
||||
| `process.deleted` | Process record removed |
|
||||
|
||||
#### Create Request
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"command": "bash",
|
||||
"args": ["-i", "-l"],
|
||||
"cwd": "/workspace",
|
||||
"env": {"TERM": "xterm-256color"},
|
||||
"pty": { // Optional - if present, allocate PTY
|
||||
"rows": 24,
|
||||
"cols": 80
|
||||
},
|
||||
"tag": "main-terminal", // Optional - for lookup by name
|
||||
"label": "Terminal 1" // Optional - display name
|
||||
}
|
||||
```
|
||||
|
||||
#### Process Object
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"id": "proc_abc123",
|
||||
"tag": "main-terminal",
|
||||
"label": "Terminal 1",
|
||||
"command": "bash",
|
||||
"args": ["-i", "-l"],
|
||||
"cwd": "/workspace",
|
||||
"pid": 12345,
|
||||
"pty": true,
|
||||
"status": "running", // "running" | "exited"
|
||||
"exit_code": null, // Set when exited
|
||||
"created_at": "2025-01-15T...",
|
||||
"exited_at": null
|
||||
}
|
||||
```
|
||||
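The two payloads above translate into types along these lines. This is a sketch, not a committed schema: the field names mirror the JSON examples, while the optionality of each field and the helper functions `describeProcess` and `markExited` are assumptions.

```typescript
interface PtyOptions {
  rows: number;
  cols: number;
}

interface CreateProcessRequest {
  command: string;
  args?: string[];
  cwd?: string;
  env?: Record<string, string>;
  pty?: PtyOptions; // presence means "allocate a PTY"
  tag?: string;
  label?: string;
}

interface ProcessInfo {
  id: string;
  tag: string | null;
  label: string | null;
  command: string;
  args: string[];
  cwd: string | null;
  pid: number;
  pty: boolean;
  status: "running" | "exited";
  exit_code: number | null;
  created_at: string;
  exited_at: string | null;
}

// Builds the process object for a freshly spawned process.
function describeProcess(request: CreateProcessRequest, id: string, pid: number, now: Date): ProcessInfo {
  return {
    id: id,
    tag: request.tag !== undefined ? request.tag : null,
    label: request.label !== undefined ? request.label : null,
    command: request.command,
    args: request.args !== undefined ? request.args : [],
    cwd: request.cwd !== undefined ? request.cwd : null,
    pid: pid,
    pty: request.pty !== undefined,
    status: "running",
    exit_code: null,
    created_at: now.toISOString(),
    exited_at: null,
  };
}

// Returns a copy marked as exited; the original record is left untouched.
function markExited(info: ProcessInfo, exitCode: number, now: Date): ProcessInfo {
  return { ...info, status: "exited", exit_code: exitCode, exited_at: now.toISOString() };
}

const createdProcess = describeProcess(
  { command: "bash", args: ["-i", "-l"], cwd: "/workspace", pty: { rows: 24, cols: 80 }, tag: "main-terminal" },
  "proc_abc123",
  12345,
  new Date(0)
);
const exitedProcess = markExited(createdProcess, 0, new Date(1000));
```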
|
||||
#### OpenCode Compatibility Layer
|
||||
|
||||
The OpenCode compat layer maps to this system:
|
||||
|
||||
| OpenCode Endpoint | Maps To |
|
||||
|-------------------|---------|
|
||||
| `POST /pty` | `POST /v1/processes` (with `pty` field) |
|
||||
| `GET /pty` | `GET /v1/processes?pty=true` |
|
||||
| `GET /pty/{id}` | `GET /v1/processes/{id}` |
|
||||
| `PUT /pty/{id}` | `POST /v1/processes/{id}/resize` + metadata update |
|
||||
| `DELETE /pty/{id}` | `DELETE /v1/processes/{id}` |
|
||||
| `GET /pty/{id}/connect` | `GET /v1/processes/{id}/connect` |
|
||||
| `POST /session/{id}/command` | Create process + capture output into session |
|
||||
| `POST /session/{id}/shell` | Create process (shell mode) + capture output into session |
|
||||
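The PTY rows of the mapping table can be expressed as a small routing function. This is only a sketch: `mapPtyRoute` is a hypothetical helper, it covers the `/pty` rows and not the session command rows, and it ignores the extra metadata update that `PUT /pty/{id}` implies.

```typescript
interface MappedRoute {
  method: string;
  path: string;
}

// Translates an OpenCode PTY route into the proposed /v1/processes route.
// Returns null for anything the table does not cover.
function mapPtyRoute(method: string, path: string): MappedRoute | null {
  const match = /^\/pty(?:\/([^\/]+))?(\/connect)?$/.exec(path);
  if (!match) return null;
  const id = match[1];
  const isConnect = match[2] !== undefined;

  if (id === undefined) {
    if (method === "POST") return { method: "POST", path: "/v1/processes" };
    if (method === "GET") return { method: "GET", path: "/v1/processes?pty=true" };
    return null;
  }

  const base = "/v1/processes/" + id;
  if (isConnect) {
    return method === "GET" ? { method: "GET", path: base + "/connect" } : null;
  }
  if (method === "GET") return { method: "GET", path: base };
  if (method === "PUT") return { method: "POST", path: base + "/resize" };
  if (method === "DELETE") return { method: "DELETE", path: base };
  return null;
}

const mappedList = mapPtyRoute("GET", "/pty");
const mappedResize = mapPtyRoute("PUT", "/pty/pty_1");
const mappedConnect = mapPtyRoute("GET", "/pty/pty_1/connect");
```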
|
||||
### Open Questions
|
||||
|
||||
1. **Output buffering for reconnect**: Should we buffer recent output (e.g., last 64KB) so reconnecting clients get some history? E2B doesn't do this, but it would improve UX for flaky connections.
|
||||
|
||||
2. **Process limits**: Should there be a max number of concurrent processes? E2B doesn't expose one, but sandbox environments have limited resources.
|
||||
|
||||
3. **Auto-cleanup**: Should processes be auto-cleaned after exiting? Options:
|
||||
- Keep forever until explicitly deleted
|
||||
- Auto-delete after N seconds/minutes
|
||||
- Keep metadata but release resources
|
||||
|
||||
4. **Input via REST vs WebSocket-only**: The REST `POST /v1/processes/{id}/input` endpoint is useful for one-shot input (e.g., "send ctrl+c") without establishing a WebSocket. E2B has both `SendInput` (unary) and `StreamInput` (streaming) for this reason.
|
||||
|
||||
5. **Multiple WebSocket connections to same process**: Should we allow multiple clients to connect to the same process simultaneously? (Pair programming, monitoring). E2B supports this via multiple `Connect` calls.
|
||||
|
||||
## User-Initiated Command Injection ("Run command, give AI context")
|
||||
|
||||
A common pattern across agents: the user (or frontend) runs a command and the output is injected into the AI's conversation context. This is distinct from the agent running a command via its own tools.
|
||||
|
||||
| Agent | Feature | Mechanism | Protocol-level? |
|
||||
|-------|---------|-----------|----------------|
|
||||
| **Claude Code** | `!command` prefix in TUI | CLI runs command locally, injects output as user message | No - client-side hack, not in API schema |
|
||||
| **Codex** | `user_shell` source | `ExecCommandSource` enum distinguishes `agent` vs `user_shell` vs `unified_exec_*` | Yes - first-class protocol event |
|
||||
| **OpenCode** | `/session/{id}/command` | HTTP endpoint runs command, records result as `AssistantMessage` | Yes - HTTP API |
|
||||
| **Amp** | N/A | Not supported | N/A |
|
||||
|
||||
**Design implication for sandbox-agent**: The process system should support an optional `session_id` field when creating a process. If provided, the process output is associated with that session so the agent can see it. If not provided, the process runs independently (like a PTY). This unifies:
|
||||
- User interactive terminals (no session association)
|
||||
- User-initiated commands for AI context (session association)
|
||||
- Agent-initiated background processes (session association)
|
||||
|
||||
## Sources
|
||||
|
||||
- [E2B Process Proto](https://github.com/e2b-dev/E2B) - `process.proto` gRPC service definition
|
||||
- [E2B JS SDK](https://github.com/e2b-dev/E2B/tree/main/packages/js-sdk) - `commands/pty.ts`, `commands/index.ts`
|
||||
- [Daytona SDK](https://www.daytona.io/docs/en/typescript-sdk/process/) - REST + WebSocket PTY API
|
||||
- [Kubernetes RemoteCommand](https://github.com/kubernetes/apimachinery/blob/master/pkg/util/remotecommand/constants.go) - WebSocket subprotocol
|
||||
- [Docker Engine API](https://docker-docs.uclv.cu/engine/api/v1.21/) - Exec API with stream multiplexing
|
||||
- [Fly.io Machines API](https://fly.io/docs/machines/api/) - REST exec with 60s limit
|
||||
- [Gitpod terminal.proto](https://codeberg.org/kanishka-reading-list/gitpod/src/branch/main/components/supervisor-api/terminal.proto) - gRPC terminal service
|
||||
- [OpenCode OpenAPI Spec](https://github.com/opencode-ai/opencode) - PTY and session command endpoints
|
||||