SDK: Add ensureServer() for automatic server recovery (#260)

* SDK sandbox provisioning: built-in providers, docs restructure, and quickstart overhaul

- Add built-in sandbox providers (local, docker, e2b, daytona, vercel, cloudflare) to the TypeScript SDK so users import directly instead of passing client instances
- Restructure docs: rename architecture to orchestration-architecture, add new architecture page for server overview, improve getting started flow
- Rewrite quickstart to be TypeScript-first with provider CodeGroup and custom provider accordion
- Update all examples to use new provider APIs
- Update persist drivers and foundry for new SDK surface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix SDK typecheck errors and update persist drivers for insertEvent signature

- Fix insertEvent call in client.ts to pass sessionId as first argument
- Update Daytona provider create options to use Partial type (image has default)
- Update StrictUniqueSessionPersistDriver in tests to match new insertEvent signature
- Sync persist packages, openapi spec, and docs with upstream changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Modal and ComputeSDK built-in providers, update examples and docs

- Add `sandbox-agent/modal` provider using Modal SDK with node:22-slim image
- Add `sandbox-agent/computesdk` provider using ComputeSDK's unified sandbox API
- Update Modal and ComputeSDK examples to use new SDK providers
- Update Modal and ComputeSDK deploy docs with provider-based examples
- Add Modal to quickstart CodeGroup and docs.json navigation
- Add provider test entries for Modal and ComputeSDK
- Remove old standalone example files (modal.ts, computesdk.ts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Modal provider: pre-install agents in image, fire-and-forget exec for server

- Pre-install agents in Dockerfile commands so they are cached across creates
- Use fire-and-forget exec (no wait) to keep server alive in Modal sandbox
- Add memoryMiB option (default 2GB) to avoid OOM during agent install

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Sync upstream changes: multiplayer docs, logos, openapi spec, foundry config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* SDK: Add ensureServer() for automatic server recovery

Add ensureServer() to SandboxProvider interface to handle cases where the
sandbox-agent server stops or goes to sleep. The SDK now calls this method
after 3 consecutive health-check failures, allowing providers to restart the
server if needed. Most built-in providers (E2B, Daytona, Vercel, Modal,
ComputeSDK) implement this. Docker and Cloudflare manage server lifecycle
differently, and Local uses managed child processes.

Also update docs for quickstart, architecture, multiplayer, and session
persistence; mark persist-* packages as deprecated; and add ensureServer
implementations to all applicable providers.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* wip

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Nathan Flurry 2026-03-15 20:29:28 -07:00 committed by GitHub
parent 3426cbc6ec
commit cf7e2a92c6
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
112 changed files with 3739 additions and 3537 deletions

View file

@ -1,64 +1,63 @@
---
title: "Architecture"
description: "How the client, sandbox, server, and agent fit together."
icon: "microchip"
description: "How the Sandbox Agent server, SDK, and agent processes fit together."
---
Sandbox Agent runs as an HTTP server inside your sandbox. Your app talks to it remotely.
Sandbox Agent is a lightweight HTTP server that runs **inside** a sandbox. It:
- **Agent management**: Installs, spawns, and stops coding agent processes
- **Sessions**: Routes prompts to agents and streams events back in real time
- **Sandbox APIs**: Filesystem, process, and terminal access for the sandbox environment
## Components
- `Your client`: your app code using the `sandbox-agent` SDK.
- `Sandbox`: isolated runtime (E2B, Daytona, Docker, etc.).
- `Sandbox Agent server`: process inside the sandbox exposing HTTP transport.
- `Agent`: Claude/Codex/OpenCode/Amp process managed by Sandbox Agent.
```mermaid placement="top-right"
flowchart LR
CLIENT["Sandbox Agent SDK"]
SERVER["Sandbox Agent server"]
AGENT["Agent process"]
```mermaid
flowchart LR
CLIENT["Your App"]
subgraph SANDBOX["Sandbox"]
direction TB
SERVER --> AGENT
direction TB
SERVER["Sandbox Agent Server"]
AGENT["Agent Process<br/>(Claude, Codex, etc.)"]
SERVER --> AGENT
end
CLIENT -->|HTTP| SERVER
CLIENT -->|"SDK (HTTP)"| SERVER
```
## Suggested Topology
- **Your app**: Uses the `sandbox-agent` TypeScript SDK to talk to the server over HTTP.
- **Sandbox**: An isolated runtime (local process, Docker, E2B, Daytona, Vercel, Cloudflare).
- **Sandbox Agent server**: A single binary inside the sandbox that manages agent lifecycles, routes prompts, streams events, and exposes filesystem/process/terminal APIs.
- **Agent process**: A coding agent (Claude Code, Codex, etc.) spawned by the server. Each session maps to one agent process.
Run the SDK on your backend, then call it from your frontend.
## What `SandboxAgent.start()` does
This extra hop is recommended because it keeps auth/token logic on the backend and makes persistence simpler.
1. **Provision**: The provider creates a sandbox (starts a container, creates a VM, etc.)
2. **Install**: The Sandbox Agent binary is installed inside the sandbox
3. **Boot**: The server starts listening on an HTTP port
4. **Health check**: The SDK waits for `/v1/health` to respond
5. **Ready**: The SDK returns a connected client
```mermaid placement="top-right"
flowchart LR
BROWSER["Browser"]
subgraph BACKEND["Your backend"]
direction TB
SDK["Sandbox Agent SDK"]
end
subgraph SANDBOX_SIMPLE["Sandbox"]
SERVER_SIMPLE["Sandbox Agent server"]
end
For the `local` provider, provisioning is a no-op and the server runs as a local subprocess.
BROWSER --> BACKEND
BACKEND --> SDK --> SERVER_SIMPLE
### Server recovery
If the server process stops, the SDK automatically calls the provider's `ensureServer()` after 3 consecutive health-check failures. Most built-in providers implement this. Custom providers can add `ensureServer(sandboxId)` to their `SandboxProvider` object.
## Server HTTP API
See the [HTTP API reference](/api-reference) for the full list of server endpoints.
## Agent installation
Agents are installed lazily on first use. To avoid the cold-start delay, pre-install them:
```bash
sandbox-agent install-agent --all
```
### Backend requirements
The `rivetdev/sandbox-agent:0.3.2-full` Docker image ships with all agents pre-installed.
Your backend layer needs to handle:
## Production-ready agent orchestration
- **Long-running connections**: prompts can take minutes.
- **Session affinity**: follow-up messages must reach the same session.
- **State between requests**: session metadata and event history must persist across requests.
- **Graceful recovery**: sessions should resume after backend restarts.
We recommend [Rivet](https://rivet.dev) over serverless because actors natively support the long-lived connections, session routing, and state persistence that agent workloads require.
## Session persistence
For storage driver options and replay behavior, see [Persisting Sessions](/session-persistence).
For production deployments, see [Orchestration Architecture](/orchestration-architecture) for recommended topology, backend requirements, and session persistence patterns.