feat: add turn streaming and inspector updates

2026-04-15 06:04:43 +00:00 · 2026-01-27 06:18:43 -08:00 · 2026-01-27 06:18:43 -08:00 · 34d4f3693e
commit 34d4f3693e
parent bf58891edf
49 changed files with 4629 additions and 1146 deletions
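
Note on consuming the new `POST /v1/sessions/{session_id}/messages/stream` endpoint: the response is documented below as an SSE event stream that closes when the turn completes. The sketch below shows one way a client could split that stream into events without the SDK. It assumes standard `text/event-stream` framing (frames separated by a blank line) and that each `data:` field carries one JSON-encoded event. `parseSseEvents` and `SseParseResult` are hypothetical names for illustration, not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): split a text buffer into
// complete SSE frames and JSON-decode each frame's data field.
interface SseParseResult {
  events: unknown[]; // decoded payloads of every complete frame
  rest: string;      // trailing partial frame to prepend to the next chunk
}

function parseSseEvents(buffer: string): SseParseResult {
  // Normalize CRLF so a frame boundary is always a blank line ("\n\n").
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last piece is an incomplete frame (or empty); hand it back to the caller.
  const rest = frames.pop() || "";
  const events: unknown[] = [];
  for (const frame of frames) {
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      // Only data fields are collected; comment lines and other fields are skipped.
      if (line.indexOf("data:") === 0) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      // Assumption: the data field holds one JSON document per event.
      events.push(JSON.parse(dataLines.join("\n")));
    }
  }
  return { events, rest };
}
```

A caller would append each decoded chunk of the response body to a buffer, call `parseSseEvents(buffer)`, handle the returned `events`, and keep `rest` as the new buffer until the server closes the stream.
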
--- a/docs/architecture.mdx
+++ b/docs/architecture.mdx
@@ -105,6 +105,7 @@ Each session tracks:
 POST /v1/sessions/{sessionId}     Create session, auto-install agent
        ↓
 POST /v1/sessions/{id}/messages   Spawn agent subprocess, stream output
+POST /v1/sessions/{id}/messages/stream   Post and stream a single turn
        ↓
 GET /v1/sessions/{id}/events      Poll for new events (offset-based)
 GET /v1/sessions/{id}/events/sse  Subscribe to SSE stream
@@ -133,16 +134,30 @@ When a message is sent:

 ## Agent Execution

-Each agent has a different execution model and communication pattern.
+Each agent has a different execution model and communication pattern. There are two main architectural patterns:
+
+### Architecture Patterns
+
+**Subprocess Model (Claude, Amp):**
+- New process spawned per message/turn
+- Process terminates after turn completes
+- Multi-turn via CLI resume flags (`--resume`, `--continue`)
+- Simple but has process spawn overhead
+
+**Client/Server Model (OpenCode, Codex):**
+- Single long-running server process
+- Multiple sessions/threads multiplexed via RPC
+- Multi-turn via server-side thread persistence
+- More efficient for repeated interactions

 ### Overview

-| Agent | Execution Model | Binary Source | Session Resume |
-|-------|-----------------|---------------|----------------|
-| Claude Code | CLI subprocess | GCS (Anthropic) | Yes (`--resume`) |
-| Codex | App Server subprocess (JSON-RPC) | GitHub releases | No |
-| OpenCode | HTTP server + SSE | GitHub releases | Yes (server-side) |
-| Amp | CLI subprocess | GCS (Amp) | Yes (`--continue`) |
+| Agent | Architecture | Binary Source | Multi-Turn Method |
+|-------|--------------|---------------|-------------------|
+| Claude Code | Subprocess (per-turn) | GCS (Anthropic) | `--resume` flag |
+| Codex | **Shared Server (JSON-RPC)** | GitHub releases | **Thread persistence** |
+| OpenCode | HTTP Server (SSE) | GitHub releases | Server-side sessions |
+| Amp | Subprocess (per-turn) | GCS (Amp) | `--continue` flag |

 ### Claude Code

@@ -161,15 +176,25 @@ claude --print --output-format stream-json --verbose \

 ### Codex

-Spawned as a subprocess using the App Server JSON-RPC protocol:
+Uses a **shared app-server process** that handles multiple sessions via JSON-RPC over stdio:

 ```bash
 codex app-server
 ```

-- JSON-RPC over stdio (JSONL)
-- Uses `initialize`, `thread/start`, and `turn/start` requests
-- Approval requests arrive as server JSON-RPC requests
+**Daemon flow:**
+1. First Codex session triggers `codex app-server` spawn
+2. Performs `initialize` / `initialized` handshake
+3. Each session creation sends `thread/start` → receives `thread_id`
+4. Messages sent via `turn/start` with `thread_id`
+5. Notifications routed back to session by `thread_id`
+
+**Key characteristics:**
+- Single process handles all Codex sessions
+- JSON-RPC over stdio (JSONL format)
+- Thread IDs map to daemon session IDs
+- Approval requests arrive as server-to-client JSON-RPC requests
+- Process lifetime matches daemon lifetime (not per-turn)

 ### OpenCode

@@ -208,12 +233,21 @@ amp [--execute|--print] [--output-format stream-json] \

 ### Communication Patterns

-**Subprocess agents (Claude, Codex, Amp):**
+**Per-turn subprocess agents (Claude, Amp):**
 1. Agent CLI spawned with appropriate flags
 2. Stdout/stderr read line-by-line
 3. Each line parsed as JSON
 4. Events converted via `parse_agent_line()` → agent-specific converter
 5. Universal events recorded and broadcast to SSE subscribers
+6. Process terminated on turn completion
+
+**Shared stdio server agent (Codex):**
+1. Single `codex app-server` process started on first session
+2. `initialize`/`initialized` handshake performed once
+3. New sessions send `thread/start`, receive `thread_id`
+4. Messages sent via `turn/start` with `thread_id`
+5. Notifications read from stdout, routed by `thread_id`
+6. Process persists across sessions and turns

 **HTTP server agent (OpenCode):**
 1. Server started on available port (if not running)
--- a/docs/building-chat-ui.mdx
+++ b/docs/building-chat-ui.mdx
@@ -131,13 +131,15 @@ timestamps, not ordering.

 ## Optional raw payloads

-If you need provider-level debugging, pass `include_raw=true` when streaming or polling events to
-receive the `raw` payload for each event.
+If you need provider-level debugging, pass `include_raw=true` when streaming or polling events
+(including one-turn streams) to receive the `raw` payload for each event.

-## SSE vs polling
+## SSE vs polling vs turn streaming

 - SSE gives low-latency updates and simplifies streaming UIs.
 - Polling is simpler to debug and works in any environment.
+- Turn streaming (`POST /v1/sessions/{session_id}/messages/stream`) is a one-shot stream tied to a
+  single prompt. The stream closes automatically once the turn completes.

 Both yield the same event payloads.

--- a/docs/cli.mdx
+++ b/docs/cli.mdx
@@ -67,6 +67,16 @@ sandbox-agent sessions send-message my-session \
 ```
 </details>

+<details>
+<summary><strong>sessions send-message-stream</strong></summary>
+
+```bash
+sandbox-agent sessions send-message-stream my-session \
+  --message "Summarize the repository" \
+  --endpoint http://127.0.0.1:2468
+```
+</details>
+
 <details>
 <summary><strong>sessions events</strong></summary>

--- a/docs/openapi.json
+++ b/docs/openapi.json
@@ -408,6 +408,60 @@
        }
      }
    },
+    "/v1/sessions/{session_id}/messages/stream": {
+      "post": {
+        "tags": [
+          "sessions"
+        ],
+        "operationId": "post_message_stream",
+        "parameters": [
+          {
+            "name": "session_id",
+            "in": "path",
+            "description": "Session id",
+            "required": true,
+            "schema": {
+              "type": "string"
+            }
+          },
+          {
+            "name": "include_raw",
+            "in": "query",
+            "description": "Include raw provider payloads",
+            "required": false,
+            "schema": {
+              "type": "boolean",
+              "nullable": true
+            }
+          }
+        ],
+        "requestBody": {
+          "content": {
+            "application/json": {
+              "schema": {
+                "$ref": "#/components/schemas/MessageRequest"
+              }
+            }
+          },
+          "required": true
+        },
+        "responses": {
+          "200": {
+            "description": "SSE event stream"
+          },
+          "404": {
+            "description": "",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ProblemDetails"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
    "/v1/sessions/{session_id}/permissions/{permission_id}/reply": {
      "post": {
        "tags": [
@@ -1431,6 +1485,15 @@
          "daemon"
        ]
      },
+      "TurnStreamQuery": {
+        "type": "object",
+        "properties": {
+          "includeRaw": {
+            "type": "boolean",
+            "nullable": true
+          }
+        }
+      },
      "UniversalEvent": {
        "type": "object",
        "required": [
--- a/docs/quickstart.mdx
+++ b/docs/quickstart.mdx
@@ -70,6 +70,15 @@ curl "http://127.0.0.1:2468/v1/sessions/my-session/events/sse?offset=0" \
  -H "Authorization: Bearer $SANDBOX_TOKEN"
 ```

+For a single-turn stream (post a message and get one streamed response):
+
+```bash
+curl -N -X POST "http://127.0.0.1:2468/v1/sessions/my-session/messages/stream" \
+  -H "Authorization: Bearer $SANDBOX_TOKEN" \
+  -H "content-type: application/json" \
+  -d '{"message":"Hello"}'
+```
+
 ## 5. CLI shortcuts

 The CLI mirrors the HTTP API:
@@ -78,4 +87,6 @@ The CLI mirrors the HTTP API:
 sandbox-agent sessions create my-session --agent claude --endpoint http://127.0.0.1:2468 --token "$SANDBOX_TOKEN"

 sandbox-agent sessions send-message my-session --message "Hello" --endpoint http://127.0.0.1:2468 --token "$SANDBOX_TOKEN"
+
+sandbox-agent sessions send-message-stream my-session --message "Hello" --endpoint http://127.0.0.1:2468 --token "$SANDBOX_TOKEN"
 ```
--- a/docs/sdks/typescript.mdx
+++ b/docs/sdks/typescript.mdx
@@ -86,10 +86,21 @@ for await (const event of client.streamEvents("demo-session", {
 The SDK parses `text/event-stream` into `UniversalEvent` objects. If you want full control, use
 `getEventsSse()` and parse the stream yourself.

+## Stream a single turn
+
+```ts
+for await (const event of client.streamTurn("demo-session", { message: "Hello" })) {
+  console.log(event.type, event.data);
+}
+```
+
+This method posts the message and streams only the next turn. For manual control, call
+`postMessageStream()` and parse the SSE response yourself.
+
 ## Optional raw payloads

-Set `includeRaw: true` on `getEvents` or `streamEvents` to include the raw provider payload in
-`event.raw`. This is useful for debugging and conversion analysis.
+Set `includeRaw: true` on `getEvents`, `streamEvents`, or `streamTurn` to include the raw provider
+payload in `event.raw`. This is useful for debugging and conversion analysis.

 ## Error handling

--- a/docs/telemetry.mdx
+++ b/docs/telemetry.mdx
@@ -3,7 +3,7 @@ title: "Telemetry"
 description: "Anonymous telemetry collected by sandbox-agent."
 ---

-sandbox-agent sends a small, anonymous telemetry payload on startup to help us understand usage and improve reliability.
+sandbox-agent sends a small, anonymous telemetry payload on startup and then every 5 minutes to help us understand usage and improve reliability.

 ## What gets sent

@@ -12,6 +12,7 @@ sandbox-agent sends a small, anonymous telemetry payload on startup to help us u
 - Detected sandbox provider (for example: Docker, E2B, Vercel Sandboxes).

 Each sandbox gets a random anonymous ID stored on disk so usage can be counted without identifying users.
+The last successful send time is also stored on disk, and heartbeats are rate-limited to at most one every 5 minutes.

 ## Opting out