wip examples and content

2026-04-15 10:05:18 +00:00 · 2026-01-28 02:56:22 -08:00 · 2026-01-28 02:56:22 -08:00 · 0bbe92b344
commit 0bbe92b344
parent fa89872d3b
11 changed files with 724 additions and 151 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -45,6 +45,7 @@ Universal schema guidance:
 - Use `docs/glossary.md` as the source of truth for universal schema terminology and keep it updated alongside schema changes.
 - On parse failures, emit an `agent.unparsed` event (source=daemon, synthetic=true) and treat it as a test failure. Preserve raw payloads when `include_raw=true`.
 - Track subagent support in `docs/conversion.md`. For now, normalize subagent activity into normal message/tool flow, but revisit explicit subagent modeling later.
+- Keep the FAQ in `README.md` and `frontend/packages/website/src/components/FAQ.tsx` in sync. When adding or modifying FAQ entries, update both files.

 ### CLI ⇄ HTTP endpoint map (keep in sync)

--- a/README.md
+++ b/README.md
@ -6,7 +6,6 @@
  Universal API for automatic coding agents in sandboxes. Supports Claude Code, Codex, OpenCode, and Amp.
 </p>

-Docs: https://rivet.dev/docs/

 - **Any coding agent**: Universal API to interact with all agents with full feature coverage
 - **Server or SDK mode**: Run as an HTTP server or with the TypeScript SDK
@ -14,13 +13,9 @@ Docs: https://rivet.dev/docs/
 - **Supports your sandbox provider**: Daytona, E2B, Vercel Sandboxes, and more
 - **Lightweight, portable Rust binary**: Install anywhere with 1 curl command
 - **Automatic agent installation**: Agents are installed on-demand when first used
-- **OpenAPI spec**: https://rivet.dev/docs/api
+- **OpenAPI spec**: https://sandboxagent.dev/docs/api

-Roadmap:
-
-- [ ] Python SDK
-- [ ] Automatic MCP & skill & hook configuration
-- [ ] Todo lists
+[Documentation](https://sandboxagent.dev/docs) — [Discord](https://rivet.dev/discord)

 ## Agent Compatibility

@ -55,7 +50,7 @@ The Sandbox Agent acts as a universal adapter between your client application an
 - **Embedded Mode**: Runs agents locally as subprocesses
 - **Server Mode**: Runs as HTTP server from any sandbox provider

-[Documentation](https://rivet.dev/docs/architecture)
+[Documentation](https://sandboxagent.dev/docs/architecture)

 ## Components

@ -121,7 +116,7 @@ for await (const event of client.streamEvents("demo", { offset: 0 })) {
 }
 ```

-Full guide: https://rivet.dev/docs/sdks/typescript
+[Documentation](https://sandboxagent.dev/docs/sdks/typescript)

 ### Server

@ -129,7 +124,7 @@ Install the binary (fastest installation, no Node.js required):

 ```bash
 # Install it
-curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh
+curl -fsSL https://releases.sandboxagent.dev/sandbox-agent/latest/install.sh | sh
 # Run it
 sandbox-agent server --token "$SANDBOX_TOKEN" --host 127.0.0.1 --port 2468
 ```
@ -149,8 +144,8 @@ To disable auth locally:
 sandbox-agent server --no-token --host 127.0.0.1 --port 2468
 ```

-Docs: https://rivet.dev/docs/quickstart
-Integration guides: https://rivet.dev/docs/deployments
+[Documentation](https://sandboxagent.dev/docs/quickstart)
+[Integration guides](https://sandboxagent.dev/docs/deployments)

 ### CLI

@ -168,16 +163,53 @@ sandbox-agent api sessions send-message my-session --message "Hello" --endpoint
 sandbox-agent api sessions send-message-stream my-session --message "Hello" --endpoint http://127.0.0.1:2468 --token "$SANDBOX_TOKEN"
 ```

-Docs: https://rivet.dev/docs/cli
+You can also run the CLI through npx:
+
+```bash
+npx sandbox-agent --help
+```
+
+[Documentation](https://sandboxagent.dev/docs/cli)

 ### Tip: Extract credentials

+Often you need to use your personal API tokens to test agents on sandboxes:
+
 ```bash
 sandbox-agent credentials extract-env --export
 ```

-This prints environment variables for your locally installed agents.
-Docs: https://rivet.dev/docs/quickstart
+This prints your OpenAI, Anthropic, and other agent API keys as environment variables, so you can test with the Sandbox Agent SDK.
+
+## FAQ
+
+**Does this replace the Vercel AI SDK?**
+
+No, they're complementary. AI SDK is for building chat interfaces and calling LLMs. This SDK is for controlling autonomous coding agents that write code and run commands. Use AI SDK for your UI, use this when you need an agent to actually code.
+
+**Which coding agents are supported?**
+
+Claude Code, Codex, OpenCode, and Amp. The SDK normalizes their APIs so you can swap between them without changing your code.
+
+**How is session data persisted?**
+
+This SDK does not handle persisting session data. Events stream in a universal JSON schema that you can persist anywhere. Consider using Postgres or [Rivet Actors](https://rivet.gg) for data persistence.
+
+**Can I run this locally or does it require a sandbox provider?**
+
+Both. Run locally for development, deploy to E2B, Daytona, or Vercel Sandboxes for production.
+
+**Does it support [platform]?**
+
+The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, and Vercel Sandboxes.
+
+**Can I use this with my personal API keys?**
+
+Yes. Use `sandbox-agent credentials extract-env` to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.
+
+**Why Rust and not [language]?**
+
+Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it easy to run inside sandboxes or in CI without shipping a large runtime, such as Node.js.

 ## Project Goals

@ -194,43 +226,10 @@ Features out of scope:
 - **Git Repo Management**: Just use git commands or the features provided by your sandbox provider of choice.
 - **Sandbox Provider API**: Sandbox providers have many nuanced differences in their API, it does not make sense for us to try to provide a custom layer. Instead, we opt to provide guides that let you integrate this project with sandbox providers.

-## FAQ
+## Roadmap

-**Why not use PTY?**
+- [ ] Python SDK
+- [ ] Automatic MCP & skill & hook configuration
+- [ ] Todo lists

-PTY-based approaches require parsing terminal escape sequences and dealing with interactive prompts.

-The agents we support all have machine-readable output modes (JSONL, HTTP APIs) that provide structured events, making integration more reliable.
-
-**Why not use features that already exist on sandbox provider APIs?**
-
-Sandbox providers focus on infrastructure (containers, VMs, networking).
-
-This project focuses specifically on coding agent orchestration: session management, HITL (human-in-the-loop) flows, and universal event schemas. These concerns are complementary.
-
-**Does it support [platform]?**
-The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, Vercel Sandboxes, and Docker.
-
-**Can I use this with my personal API keys?**
-Yes. Use `sandbox-agent credentials extract-env` to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.
-
-**Why Rust?**
-Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it
-easy to run inside sandboxes or in CI without shipping a large runtime.
-
-**Why not use stdio/JSON-RPC?**
-
-- has benefit of not having to listen on a port
-- more difficult to interact with, harder to analyze, doesn't support inspector for debugging
-- may add at some point
-- Codex does this and Claude has a JSON stream, but HTTP/SSE gives us a consistent API surface and inspector UI.
-
-**Why not AI SDK?**
-
- AI SDK does not provide harness for bieng a fully fledged coding agent
- Fronteir coding agent harnesses have a lot of work put in to complex things like swarms, compaction, etc
-
-**Why not OpenCode server?**
-
-- The harnesses do a lot of heavy lifting, but different agents have very different APIs and behavior.
-- A universal API lets you swap agents without rewriting your orchestration code.
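
The new FAQ entry on persistence says that events stream in a universal JSON schema and that storing them is left to the caller. As a rough illustration of what that might involve, here is a minimal sketch that uses an in-memory store as a stand-in for Postgres. The `StoredEvent` shape, the `EventStore` class, and the idea of keying on `(sessionId, offset)` are assumptions made for this example, not the project's schema or API.

```typescript
// Hypothetical event shape for illustration only; not the project's schema.
interface StoredEvent {
  sessionId: string;
  offset: number;
  type: string;
  data: unknown;
}

// In-memory stand-in for a table keyed by (sessionId, offset).
class EventStore {
  private readonly bySession = new Map<string, Map<number, StoredEvent>>();

  // Idempotent append: replaying an offset that is already stored is a no-op.
  append(event: StoredEvent): boolean {
    let events = this.bySession.get(event.sessionId);
    if (!events) {
      events = new Map<number, StoredEvent>();
      this.bySession.set(event.sessionId, events);
    }
    if (events.has(event.offset)) return false;
    events.set(event.offset, event);
    return true;
  }

  // Highest offset stored for a session, or undefined if nothing is stored.
  highestOffset(sessionId: string): number | undefined {
    const events = this.bySession.get(sessionId);
    if (!events || events.size === 0) return undefined;
    return Math.max(...Array.from(events.keys()));
  }

  // Events for a session in ascending offset order.
  list(sessionId: string): StoredEvent[] {
    const events = this.bySession.get(sessionId);
    if (!events) return [];
    return Array.from(events.values()).sort((a, b) => a.offset - b.offset);
  }
}
```

Making `append` idempotent means a consumer that reconnects and re-reads part of the stream does not store duplicates. How the stored offset maps onto the `offset` option of `streamEvents` is not covered by this diff, so treat that part as something to confirm against the docs.
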
--- a/examples/daytona/package.json
+++ b/examples/daytona/package.json
@ -3,7 +3,7 @@
  "private": true,
  "type": "module",
  "scripts": {
-    "start": "tsx src/daytona-fallback.ts",
+    "start": "tsx src/daytona.ts",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
--- a/examples/daytona/src/daytona-fallback.ts
+++ b/examples/daytona/src/daytona-fallback.ts
@ -1,76 +0,0 @@
-import { Daytona, Image } from "@daytonaio/sdk";
-import { logInspectorUrl, runPrompt } from "@sandbox-agent/example-shared";
-
-if (
-	!process.env.DAYTONA_API_KEY ||
-	(!process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY)
-) {
-	throw new Error(
-		"DAYTONA_API_KEY and (OPENAI_API_KEY or ANTHROPIC_API_KEY) required",
-	);
-}
-
-const SNAPSHOT = "sandbox-agent-ready";
-const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";
-
-const daytona = new Daytona();
-
-const hasSnapshot = await daytona.snapshot.get(SNAPSHOT).then(
-	() => true,
-	() => false,
-);
-if (!hasSnapshot) {
-	console.log(`Creating snapshot '${SNAPSHOT}' (one-time setup, ~2-3min)...`);
-	await daytona.snapshot.create(
-		{
-			name: SNAPSHOT,
-			image: Image.base("ubuntu:22.04").runCommands(
-				// Install dependencies
-				"apt-get update && apt-get install -y curl ca-certificates",
-				// Install sandbox-agent
-				"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
-				// Create agent bin directory
-				`mkdir -p ${AGENT_BIN_DIR}`,
-				// Install Claude: get latest version, download binary
-				`CLAUDE_VERSION=$(curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest) && ` +
-					`curl -fsSL -o ${AGENT_BIN_DIR}/claude "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/$CLAUDE_VERSION/linux-x64/claude" && ` +
-					`chmod +x ${AGENT_BIN_DIR}/claude`,
-				// Install Codex: download tarball, extract binary
-				`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
-					`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
-					`chmod +x ${AGENT_BIN_DIR}/codex`,
-			),
-		},
-		{ onLogs: (log) => console.log(`  ${log}`) },
-	);
-	console.log("Snapshot created. Future runs will be instant.");
-}
-
-console.log("Creating sandbox...");
-const envVars: Record<string, string> = {};
-if (process.env.ANTHROPIC_API_KEY) envVars.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
-if (process.env.OPENAI_API_KEY) envVars.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
-
-const sandbox = await daytona.create({
-	snapshot: SNAPSHOT,
-	envVars,
-});
-
-console.log("Starting server...");
-await sandbox.process.executeCommand(
-	"nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &",
-);
-
-const baseUrl = (await sandbox.getSignedPreviewUrl(3000, 4 * 60 * 60)).url;
-logInspectorUrl({ baseUrl });
-
-const cleanup = async () => {
-	console.log("Cleaning up...");
-	await sandbox.delete(60);
-	process.exit(0);
-};
-process.once("SIGINT", cleanup);
-process.once("SIGTERM", cleanup);
-
-await runPrompt({ baseUrl });
-await cleanup();
--- a/examples/daytona/src/daytona.ts
+++ b/examples/daytona/src/daytona.ts
@ -1,17 +1,42 @@
 import { Daytona, Image } from "@daytonaio/sdk";
 import { logInspectorUrl, runPrompt } from "@sandbox-agent/example-shared";
+import { readFileSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";

-if (
-	!process.env.DAYTONA_API_KEY ||
-	(!process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY)
-) {
+// Extract API key from Claude's config files
+function getAnthropicApiKey(): string | undefined {
+	if (process.env.ANTHROPIC_API_KEY) return process.env.ANTHROPIC_API_KEY;
+
+	const home = homedir();
+	const configPaths = [
+		join(home, ".claude.json"),
+		join(home, ".claude.json.api"),
+	];
+
+	for (const path of configPaths) {
+		try {
+			const data = JSON.parse(readFileSync(path, "utf-8"));
+			const key = data.primaryApiKey || data.apiKey || data.anthropicApiKey;
+			if (key?.startsWith("sk-ant-")) return key;
+		} catch {
+			// Ignore errors
+		}
+	}
+	return undefined;
+}
+
+const anthropicKey = getAnthropicApiKey();
+const openaiKey = process.env.OPENAI_API_KEY;
+
+if (!process.env.DAYTONA_API_KEY || (!anthropicKey && !openaiKey)) {
 	throw new Error(
-		"DAYTONA_API_KEY and (OPENAI_API_KEY or ANTHROPIC_API_KEY) required",
+		"DAYTONA_API_KEY and (ANTHROPIC_API_KEY or OPENAI_API_KEY) required",
 	);
 }

 const SNAPSHOT = "sandbox-agent-ready";
-const BINARY = "/usr/local/bin/sandbox-agent";
+const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";

 const daytona = new Daytona();

@ -27,11 +52,18 @@ if (!hasSnapshot) {
 			image: Image.base("ubuntu:22.04").runCommands(
 				// Install dependencies
 				"apt-get update && apt-get install -y curl ca-certificates",
-				// Install sandbox-agent via install script
+				// Install sandbox-agent
 				"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
-				// Pre-install agents using sandbox-agent CLI
-				"sandbox-agent install-agent claude",
-				"sandbox-agent install-agent codex",
+				// Create agent bin directory
+				`mkdir -p ${AGENT_BIN_DIR}`,
+				// Install Claude: get latest version, download binary
+				`CLAUDE_VERSION=$(curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest) && ` +
+					`curl -fsSL -o ${AGENT_BIN_DIR}/claude "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/$CLAUDE_VERSION/linux-x64/claude" && ` +
+					`chmod +x ${AGENT_BIN_DIR}/claude`,
+				// Install Codex: download tarball, extract binary
+				`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
+					`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
+					`chmod +x ${AGENT_BIN_DIR}/codex`,
 			),
 		},
 		{ onLogs: (log) => console.log(`  ${log}`) },
@ -41,8 +73,8 @@ if (!hasSnapshot) {

 console.log("Creating sandbox...");
 const envVars: Record<string, string> = {};
-if (process.env.ANTHROPIC_API_KEY) envVars.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
-if (process.env.OPENAI_API_KEY) envVars.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
+if (anthropicKey) envVars.ANTHROPIC_API_KEY = anthropicKey;
+if (openaiKey) envVars.OPENAI_API_KEY = openaiKey;

 const sandbox = await daytona.create({
 	snapshot: SNAPSHOT,
@ -51,13 +83,35 @@ const sandbox = await daytona.create({

 console.log("Starting server...");
 await sandbox.process.executeCommand(
-	`nohup ${BINARY} server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &`,
+	"nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &",
 );

+// Give the server a moment to start. This fixed delay does not confirm readiness.
+await new Promise((r) => setTimeout(r, 2000));
+
+// Debug: check environment and agent binaries
+const envCheck = await sandbox.process.executeCommand(
+	"env | grep -E 'ANTHROPIC|OPENAI' | sed 's/=.*/=<set>/'",
+);
+console.log("Sandbox env:", envCheck.result.output || "(none)");
+
+const binCheck = await sandbox.process.executeCommand(
+	`ls -la ${AGENT_BIN_DIR}/`,
+);
+console.log("Agent binaries:", binCheck.result.output);
+
 const baseUrl = (await sandbox.getSignedPreviewUrl(3000, 4 * 60 * 60)).url;
 logInspectorUrl({ baseUrl });

 const cleanup = async () => {
+	// Show server logs before cleanup
+	const logs = await sandbox.process.executeCommand(
+		"cat /tmp/sandbox-agent.log 2>/dev/null | tail -50",
+	);
+	if (logs.result.output) {
+		console.log("\n--- Server logs ---");
+		console.log(logs.result.output);
+	}
 	console.log("Cleaning up...");
 	await sandbox.delete(60);
 	process.exit(0);
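
Review note: the example above waits a fixed two seconds after launching the server. The testing script added in this same commit polls `/v1/health` until it returns `status: "ok"`. A minimal sketch of that polling approach as a reusable helper follows. The name `waitForHealth`, its options, and the injectable `fetchFn` parameter are hypothetical and were added here so the helper can be exercised without a running server.

```typescript
// Minimal response shape so a test double can stand in for the global fetch.
type HealthFetch = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Poll GET {baseUrl}/v1/health until it reports status "ok" or attempts run out.
async function waitForHealth(
  baseUrl: string,
  opts: { attempts?: number; delayMs?: number; fetchFn?: HealthFetch } = {},
): Promise<boolean> {
  const attempts = opts.attempts ?? 30;
  const delayMs = opts.delayMs ?? 1000;
  const fetchFn: HealthFetch = opts.fetchFn ?? ((url: string) => fetch(url));

  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/v1/health`);
      if (res.ok) {
        const body = (await res.json()) as { status?: string };
        if (body.status === "ok") return true;
      }
    } catch {
      // Server is not accepting connections yet; retry after the delay.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

In the Daytona example this would be called with the signed preview URL once it is available, for instance `await waitForHealth(baseUrl)`, and the caller can print the server log if it returns `false`.
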
--- a/examples/shared/src/sandbox-agent-client.ts
+++ b/examples/shared/src/sandbox-agent-client.ts
@ -154,7 +154,7 @@ export async function createSession({
  const normalized = normalizeBaseUrl(baseUrl);
  const sessionId = randomUUID();
  const body: Record<string, string> = {
-    agent: agentId || process.env.SANDBOX_AGENT || "claude",
+    agent: agentId || detectAgent(),
  };
  const envAgentMode = agentMode || process.env.SANDBOX_AGENT_MODE;
  const envPermissionMode = permissionMode || process.env.SANDBOX_PERMISSION_MODE;
@ -269,6 +269,15 @@ export async function sendMessageStream({
  return fullText;
 }

+function detectAgent(): string {
+  // Prefer explicit setting
+  if (process.env.SANDBOX_AGENT) return process.env.SANDBOX_AGENT;
+  // Select based on available API key
+  if (process.env.ANTHROPIC_API_KEY) return "claude";
+  if (process.env.OPENAI_API_KEY) return "codex";
+  return "claude";
+}
+
 export async function runPrompt({
  baseUrl,
  token,
@ -286,11 +295,10 @@ export async function runPrompt({
    headers: extraHeaders,
  });

+  const agent = agentId || detectAgent();
  const sessionId = randomUUID();
-  await client.createSession(sessionId, {
-    agent: agentId || process.env.SANDBOX_AGENT || "claude",
-  });
-  console.log(`Session ${sessionId} ready. Press Ctrl+C to quit.`);
+  await client.createSession(sessionId, { agent });
+  console.log(`Session ${sessionId} using ${agent}. Press Ctrl+C to quit.`);

  let isThinking = false;
  let hasStartedOutput = false;
@ -334,6 +342,12 @@ export async function runPrompt({
        }
      }

+      // Handle errors
+      if (event.type === "error") {
+        const data = event.data as any;
+        console.error(`\nError: ${data?.message || JSON.stringify(data)}`);
+      }
+
      // Handle session ended
      if (event.type === "session.ended") {
        const data = event.data as any;
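
Review note: `detectAgent()` reads `process.env` directly, which makes its precedence rules awkward to test. A minimal sketch of the same precedence as a pure function follows. The name `detectAgentFrom` and the `AgentEnv` type are hypothetical, and the logic is copied from the hunk above rather than from any other part of the codebase.

```typescript
// Hypothetical env-like input so the precedence can be tested without process.env.
type AgentEnv = Record<string, string | undefined>;

// Same precedence as detectAgent() in the diff:
// explicit SANDBOX_AGENT, then Anthropic key, then OpenAI key, then "claude".
function detectAgentFrom(env: AgentEnv): string {
  if (env.SANDBOX_AGENT) return env.SANDBOX_AGENT;
  if (env.ANTHROPIC_API_KEY) return "claude";
  if (env.OPENAI_API_KEY) return "codex";
  return "claude";
}
```

One consequence worth checking: when the Anthropic key comes from a config file rather than the environment (as the updated Daytona example allows) and `OPENAI_API_KEY` is set locally, this logic selects `codex`.
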
--- a/frontend/packages/website/src/components/FAQ.tsx
+++ b/frontend/packages/website/src/components/FAQ.tsx
@ -18,17 +18,27 @@ const faqs = [
  {
    question: 'How is session data persisted?',
    answer:
-      "Events stream in a universal JSON schema. Persist them anywhere. We have adapters for Postgres and ClickHouse, or use <a href='https://rivet.gg' target='_blank' rel='noopener noreferrer' class='text-orange-400 hover:underline'>Rivet Actors</a> for managed stateful storage.",
+      "This SDK does not handle persisting session data. Events stream in a universal JSON schema that you can persist anywhere. Consider using Postgres or <a href='https://rivet.gg' target='_blank' rel='noopener noreferrer' class='text-orange-400 hover:underline'>Rivet Actors</a> for data persistence.",
  },
  {
    question: 'Can I run this locally or does it require a sandbox provider?',
    answer:
-      "Both. Run locally for development, deploy to E2B, Daytona, Vercel, or Docker for production.",
+      "Both. Run locally for development, deploy to E2B, Daytona, or Vercel Sandboxes for production.",
  },
  {
-    question: 'Is this open source?',
+    question: 'Does it support [platform]?',
    answer:
-      "Yes, MIT licensed. Code is on GitHub.",
+      "The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, and Vercel Sandboxes.",
+  },
+  {
+    question: 'Can I use this with my personal API keys?',
+    answer:
+      "Yes. Use <code>sandbox-agent credentials extract-env</code> to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.",
+  },
+  {
+    question: 'Why Rust and not [language]?',
+    answer:
+      "Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it easy to run inside sandboxes or in CI without shipping a large runtime, such as Node.js.",
  },
 ];
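
Review note: `CLAUDE.md` now asks that the README FAQ and `FAQ.tsx` stay in sync. A minimal sketch of a check that compares the question lists in the two files follows. The function names are hypothetical, and the parsing assumes README questions are bold lines ending in `?` and that `FAQ.tsx` questions are single-quoted strings without embedded single quotes, as in the hunks of this commit.

```typescript
// Collect bold, question-mark-terminated lines such as "**Why Rust and not [language]?**".
function readmeQuestions(markdown: string): string[] {
  const questions: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^\*\*(.+\?)\*\*$/.exec(line.trim());
    if (match) questions.push(match[1]);
  }
  return questions;
}

// Collect `question: '...'` entries from the FAQ.tsx source text.
function tsxQuestions(source: string): string[] {
  const questions: string[] = [];
  const pattern = /question:\s*'([^']*)'/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    questions.push(match[1]);
  }
  return questions;
}

// Report questions that appear in one file but not the other.
function faqDrift(markdown: string, source: string): { missingFromTsx: string[]; missingFromReadme: string[] } {
  const readme = readmeQuestions(markdown);
  const tsx = tsxQuestions(source);
  return {
    missingFromTsx: readme.filter((q) => !tsx.includes(q)),
    missingFromReadme: tsx.filter((q) => !readme.includes(q)),
  };
}
```

This compares question text only. Answers differ in markup between the two files (Markdown links versus HTML anchors), so comparing them would need a normalization step that is not attempted here.
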

--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@ -253,6 +253,22 @@ importers:
        specifier: ^7.5.8
        version: 7.7.1

+  scripts/sandbox-testing:
+    dependencies:
+      '@daytonaio/sdk':
+        specifier: latest
+        version: 0.135.0(ws@8.19.0)
+    devDependencies:
+      '@types/node':
+        specifier: latest
+        version: 25.0.10
+      tsx:
+        specifier: latest
+        version: 4.21.0
+      typescript:
+        specifier: latest
+        version: 5.9.3
+
  sdks/cli:
    devDependencies:
      vitest:
--- a/pnpm-workspace.yaml
+++ b/pnpm-workspace.yaml
@ -6,4 +6,5 @@ packages:
  - "resources/agent-schemas"
  - "resources/vercel-ai-sdk-schemas"
  - "scripts/release"
+  - "scripts/sandbox-testing"
  - "examples/*"
--- a/scripts/sandbox-testing/package.json
+++ b/scripts/sandbox-testing/package.json
@ -0,0 +1,20 @@
+{
+	"name": "@sandbox-agent/testing",
+	"private": true,
+	"type": "module",
+	"scripts": {
+		"test": "tsx test-sandbox.ts",
+		"test:docker": "tsx test-sandbox.ts docker",
+		"test:daytona": "tsx test-sandbox.ts daytona",
+		"test:mock": "tsx test-sandbox.ts docker --agent=mock",
+		"test:verbose": "tsx test-sandbox.ts docker --verbose"
+	},
+	"dependencies": {
+		"@daytonaio/sdk": "latest"
+	},
+	"devDependencies": {
+		"@types/node": "latest",
+		"tsx": "latest",
+		"typescript": "latest"
+	}
+}
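
Review note: the `test:mock` script above passes the agent as `--agent=mock`, and `test-sandbox.ts` only recognizes that `=` form. If the space-separated form (`--agent mock`) should also work, the argument parsing could look something like the sketch below. `parseCliArgs` and `CliOptions` are hypothetical names, and the default provider of `docker` mirrors the script in this commit.

```typescript
interface CliOptions {
  provider: string;
  agent?: string;
  flags: Set<string>;
}

// Accepts both "--agent=name" and "--agent name"; the first bare word is the provider.
function parseCliArgs(argv: string[]): CliOptions {
  const flags = new Set<string>();
  let provider: string | undefined;
  let agent: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith("--agent=")) {
      agent = arg.slice("--agent=".length);
    } else if (arg === "--agent") {
      // Consume the next token as the value so it is not mistaken for the provider.
      i += 1;
      agent = argv[i];
    } else if (arg.startsWith("--")) {
      flags.add(arg);
    } else if (provider === undefined) {
      provider = arg;
    }
  }

  return { provider: provider ?? "docker", agent, flags };
}
```

Consuming the value token explicitly matters because the script picks the provider as the first argument that does not start with `--`. Without that step, `--agent codex daytona` would select `codex` as the provider.
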
--- a/scripts/sandbox-testing/test-sandbox.ts
+++ b/scripts/sandbox-testing/test-sandbox.ts
@ -0,0 +1,534 @@
+#!/usr/bin/env npx tsx
+/**
+ * Sandbox Testing Script
+ *
+ * Tests sandbox-agent on various cloud sandbox providers.
+ * Usage: npx tsx test-sandbox.ts [provider] [options]
+ *
+ * Providers: docker (default), daytona
+ *
+ * Options:
+ *   --skip-build     Skip cargo build step
+ *   --use-release    Use pre-built release binary from releases.rivet.dev
+ *   --agent=<name>  Test a specific agent (claude, codex, mock)
+ *   --keep-alive     Don't cleanup sandbox after test
+ *   --verbose        Show all logs
+ */
+
+import { execSync, spawn } from "node:child_process";
+import { existsSync, readFileSync, mkdtempSync, writeFileSync, rmSync } from "node:fs";
+import { homedir, tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const ROOT_DIR = join(__dirname, "../..");
+const SERVER_DIR = join(ROOT_DIR, "server");
+
+// Parse args
+const args = process.argv.slice(2);
+const provider = args.find((a) => !a.startsWith("--")) || "docker";
+const skipBuild = args.includes("--skip-build");
+const useRelease = args.includes("--use-release");
+const keepAlive = args.includes("--keep-alive");
+const verbose = args.includes("--verbose");
+const agentArg = args.find((a) => a.startsWith("--agent="))?.split("=")[1];
+
+// Colors
+const log = {
+	info: (msg: string) => console.log(`\x1b[34m[INFO]\x1b[0m ${msg}`),
+	success: (msg: string) => console.log(`\x1b[32m[OK]\x1b[0m ${msg}`),
+	error: (msg: string) => console.log(`\x1b[31m[ERROR]\x1b[0m ${msg}`),
+	warn: (msg: string) => console.log(`\x1b[33m[WARN]\x1b[0m ${msg}`),
+	debug: (msg: string) => verbose && console.log(`\x1b[90m[DEBUG]\x1b[0m ${msg}`),
+	section: (msg: string) => console.log(`\n\x1b[1m=== ${msg} ===\x1b[0m`),
+};
+
+// Credentials extraction (mirrors agent-credentials logic)
+function getAnthropicApiKey(): string | undefined {
+	if (process.env.ANTHROPIC_API_KEY) return process.env.ANTHROPIC_API_KEY;
+	const home = homedir();
+	for (const path of [join(home, ".claude.json"), join(home, ".claude.json.api")]) {
+		try {
+			const data = JSON.parse(readFileSync(path, "utf-8"));
+			const key = data.primaryApiKey || data.apiKey || data.anthropicApiKey;
+			if (key?.startsWith("sk-ant-")) return key;
+		} catch {}
+	}
+	return undefined;
+}
+
+function getOpenAiApiKey(): string | undefined {
+	if (process.env.OPENAI_API_KEY) return process.env.OPENAI_API_KEY;
+	const home = homedir();
+	try {
+		const data = JSON.parse(readFileSync(join(home, ".codex", "codex.json"), "utf-8"));
+		if (data.apiKey) return data.apiKey;
+	} catch {}
+	return undefined;
+}
+
+// Build sandbox-agent
+async function buildSandboxAgent(): Promise<string> {
+	log.section("Building sandbox-agent");
+
+	if (useRelease) {
+		log.info("Using pre-built release from releases.rivet.dev");
+		return "RELEASE";
+	}
+
+	if (skipBuild) {
+		const binaryPath = join(SERVER_DIR, "target/release/sandbox-agent");
+		if (!existsSync(binaryPath)) {
+			throw new Error(`Binary not found at ${binaryPath}. Run without --skip-build.`);
+		}
+		log.info(`Using existing binary: ${binaryPath}`);
+		return binaryPath;
+	}
+
+	log.info("Running cargo build --release...");
+	try {
+		execSync("cargo build --release -p sandbox-agent", {
+			cwd: SERVER_DIR,
+			stdio: verbose ? "inherit" : "pipe",
+		});
+		const binaryPath = join(SERVER_DIR, "target/release/sandbox-agent");
+		log.success(`Built: ${binaryPath}`);
+		return binaryPath;
+	} catch (err) {
+		throw new Error(`Build failed: ${err}`);
+	}
+}
+
+// Provider interface
+interface SandboxProvider {
+	name: string;
+	requiredEnv: string[];
+	create(opts: { envVars: Record<string, string> }): Promise<Sandbox>;
+}
+
+interface Sandbox {
+	id: string;
+	exec(cmd: string): Promise<{ stdout: string; stderr: string; exitCode: number }>;
+	upload(localPath: string, remotePath: string): Promise<void>;
+	getBaseUrl(port: number): Promise<string>;
+	cleanup(): Promise<void>;
+}
+
+// Docker provider
+const dockerProvider: SandboxProvider = {
+	name: "docker",
+	requiredEnv: [],
+	async create({ envVars }) {
+		const id = `sandbox-test-${Date.now()}`;
+		const envArgs = Object.entries(envVars)
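+			// Single-quote each value so keys containing shell metacharacters survive the command line
+			.map(([k, v]) => `-e ${k}='${v.replace(/'/g, "'\\''")}'`)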
+			.join(" ");
+
+		log.info(`Creating Docker container: ${id}`);
+		execSync(
+			`docker run -d --name ${id} ${envArgs} -p 0:3000 ubuntu:22.04 tail -f /dev/null`,
+			{ stdio: verbose ? "inherit" : "pipe" },
+		);
+
+		// Install curl
+		execSync(`docker exec ${id} bash -c "apt-get update && apt-get install -y curl ca-certificates"`, {
+			stdio: verbose ? "inherit" : "pipe",
+		});
+
+		return {
+			id,
+			async exec(cmd) {
+				try {
+					const stdout = execSync(`docker exec ${id} bash -c "${cmd.replace(/"/g, '\\"')}"`, {
+						encoding: "utf-8",
+						stdio: ["pipe", "pipe", "pipe"],
+					});
+					return { stdout, stderr: "", exitCode: 0 };
+				} catch (err: any) {
+					return { stdout: err.stdout || "", stderr: err.stderr || "", exitCode: err.status || 1 };
+				}
+			},
+			async upload(localPath, remotePath) {
+				execSync(`docker cp "${localPath}" ${id}:${remotePath}`, { stdio: verbose ? "inherit" : "pipe" });
+			},
+			async getBaseUrl(port) {
+				const portMapping = execSync(`docker port ${id} ${port}`, { encoding: "utf-8" }).trim();
+				const hostPort = portMapping.split(":").pop();
+				return `http://localhost:${hostPort}`;
+			},
+			async cleanup() {
+				log.info(`Cleaning up container: ${id}`);
+				execSync(`docker rm -f ${id}`, { stdio: "pipe" });
+			},
+		};
+	},
+};
+
+// Daytona provider
+const daytonaProvider: SandboxProvider = {
+	name: "daytona",
+	requiredEnv: ["DAYTONA_API_KEY"],
+	async create({ envVars }) {
+		const { Daytona } = await import("@daytonaio/sdk");
+		const daytona = new Daytona();
+
+		log.info("Creating Daytona sandbox...");
+		const sandbox = await daytona.create({
+			image: "ubuntu:22.04",
+			envVars,
+		});
+		const id = sandbox.id;
+
+		// Install curl
+		await sandbox.process.executeCommand("apt-get update && apt-get install -y curl ca-certificates");
+
+		return {
+			id,
+			async exec(cmd) {
+				const result = await sandbox.process.executeCommand(cmd);
+				return {
+					stdout: result.result.output || "",
+					stderr: result.result.error || "",
+					exitCode: result.result.exitCode,
+				};
+			},
+			async upload(localPath, remotePath) {
+				const content = readFileSync(localPath);
+				await sandbox.fs.uploadFile(remotePath, content);
+				await sandbox.process.executeCommand(`chmod +x ${remotePath}`);
+			},
+			async getBaseUrl(port) {
+				const preview = await sandbox.getSignedPreviewUrl(port, 4 * 60 * 60);
+				return preview.url;
+			},
+			async cleanup() {
+				log.info(`Cleaning up Daytona sandbox: ${id}`);
+				await sandbox.delete(60);
+			},
+		};
+	},
+};
+
+// Get provider
+function getProvider(name: string): SandboxProvider {
+	switch (name) {
+		case "docker":
+			return dockerProvider;
+		case "daytona":
+			return daytonaProvider;
+		default:
+			throw new Error(`Unknown provider: ${name}. Available: docker, daytona`);
+	}
+}
+
+// Install sandbox-agent in sandbox
+async function installSandboxAgent(sandbox: Sandbox, binaryPath: string): Promise<void> {
+	log.section("Installing sandbox-agent");
+
+	if (binaryPath === "RELEASE") {
+		log.info("Installing from releases.rivet.dev...");
+		const result = await sandbox.exec(
+			"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
+		);
+		log.debug(`Install output: ${result.stdout}`);
+		if (result.exitCode !== 0) {
+			throw new Error(`Install failed: ${result.stderr}`);
+		}
+	} else {
+		log.info(`Uploading local binary: ${binaryPath}`);
+		await sandbox.upload(binaryPath, "/usr/local/bin/sandbox-agent");
+	}
+
+	// Verify installation
+	const version = await sandbox.exec("sandbox-agent --version");
+	log.success(`Installed: ${version.stdout.trim()}`);
+}
+
+// Install agents
+async function installAgents(sandbox: Sandbox, agents: string[]): Promise<void> {
+	log.section("Installing agents");
+
+	const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";
+	await sandbox.exec(`mkdir -p ${AGENT_BIN_DIR}`);
+
+	for (const agent of agents) {
+		log.info(`Installing ${agent}...`);
+
+		if (agent === "claude") {
+			// First get the version
+			const versionResult = await sandbox.exec(
+				"curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest",
+			);
+			if (versionResult.exitCode !== 0) throw new Error(`Failed to get Claude version: ${versionResult.stderr}`);
+			const claudeVersion = versionResult.stdout.trim();
+			log.debug(`Claude version: ${claudeVersion}`);
+
+			// Then download the binary
+			const downloadUrl = `https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${claudeVersion}/linux-x64/claude`;
+			log.debug(`Download URL: ${downloadUrl}`);
+			const result = await sandbox.exec(
+				`curl -fsSL -o ${AGENT_BIN_DIR}/claude "${downloadUrl}" && chmod +x ${AGENT_BIN_DIR}/claude`,
+			);
+			if (result.exitCode !== 0) throw new Error(`Failed to install claude: ${result.stderr}`);
+		} else if (agent === "codex") {
+			const result = await sandbox.exec(
+				`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
+					`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
+					`chmod +x ${AGENT_BIN_DIR}/codex`,
+			);
+			if (result.exitCode !== 0) throw new Error(`Failed to install codex: ${result.stderr}`);
+		} else if (agent === "mock") {
+			// Mock agent is built into sandbox-agent, no install needed
+			log.info("Mock agent is built-in, skipping install");
+			continue;
+		}
+
+		log.success(`Installed ${agent}`);
+	}
+
+	// List installed agents
+	const ls = await sandbox.exec(`ls -la ${AGENT_BIN_DIR}/`);
+	log.debug(`Agent binaries:\n${ls.stdout}`);
+}
+
+// Start server and check health
+async function startServerAndCheckHealth(sandbox: Sandbox): Promise<string> {
+	log.section("Starting server");
+
+	// Start server in background
+	await sandbox.exec("nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &");
+	log.info("Server started in background");
+
+	// Get base URL
+	const baseUrl = await sandbox.getBaseUrl(3000);
+	log.info(`Base URL: ${baseUrl}`);
+
+	// Wait for health
+	log.info("Waiting for health check...");
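+	for (let i = 0; i < 30; i++) {
+		try {
+			// Bound each probe so a hung connection cannot stall the retry loop
+			const response = await fetch(`${baseUrl}/v1/health`, { signal: AbortSignal.timeout(5000) });
+			if (response.ok) {
+				const data = await response.json();
+				if (data.status === "ok") {
+					log.success("Health check passed!");
+					return baseUrl;
+				}
+			}
+		} catch {
+			// Server not reachable yet (or the probe timed out); retry after the delay
+		}
+		await new Promise((r) => setTimeout(r, 1000));
+	}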
+
+	// Show logs on failure
+	const logs = await sandbox.exec("cat /tmp/sandbox-agent.log");
+	log.error("Server logs:\n" + logs.stdout);
+	throw new Error("Health check failed after 30 attempts");
+}
+
+// Test agent interaction
+async function testAgent(baseUrl: string, agent: string, message: string): Promise<void> {
+	log.section(`Testing ${agent} agent`);
+
+	const sessionId = crypto.randomUUID();
+
+	// Create session
+	log.info(`Creating session ${sessionId}...`);
+	const createRes = await fetch(`${baseUrl}/v1/sessions/${sessionId}`, {
+		method: "POST",
+		headers: { "Content-Type": "application/json" },
+		body: JSON.stringify({ agent }),
+	});
+	if (!createRes.ok) {
+		throw new Error(`Failed to create session: ${await createRes.text()}`);
+	}
+	log.success("Session created");
+
+	// Send message with streaming
+	log.info(`Sending message: "${message}"`);
+	const msgRes = await fetch(`${baseUrl}/v1/sessions/${sessionId}/messages/stream`, {
+		method: "POST",
+		headers: { "Content-Type": "application/json" },
+		body: JSON.stringify({ message }),
+	});
+	if (!msgRes.ok || !msgRes.body) {
+		throw new Error(`Failed to send message: ${await msgRes.text()}`);
+	}
+
+	// Process SSE stream
+	const reader = msgRes.body.getReader();
+	const decoder = new TextDecoder();
+	let buffer = "";
+	let receivedText = false;
+	let hasError = false;
+	let errorMessage = "";
+
+	while (true) {
+		const { done, value } = await reader.read();
+		if (done) break;
+
+		buffer += decoder.decode(value, { stream: true });
+		const lines = buffer.split("\n");
+		buffer = lines.pop() || "";
+
+		for (const line of lines) {
+			if (!line.startsWith("data: ")) continue;
+			const data = line.slice(6);
+			if (data === "[DONE]") continue;
+
+			try {
+				const event = JSON.parse(data);
+				log.debug(`Event: ${event.type}`);
+
+				if (event.type === "item.delta") {
+					const delta = event.data?.delta;
+					const text = typeof delta === "string" ? delta : delta?.text || "";
+					if (text) {
+						if (!receivedText) {
+							log.info("Receiving response...");
+							receivedText = true;
+						}
+						process.stdout.write(text);
+					}
+				}
+
+				if (event.type === "error") {
+					hasError = true;
+					errorMessage = event.data?.message || JSON.stringify(event.data);
+					log.error(`Error event: ${errorMessage}`);
+				}
+
+				if (event.type === "session.ended") {
+					const reason = event.data?.reason;
+					log.info(`Session ended: ${reason || "unknown reason"}`);
+				}
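+			} catch {
+				// Payload was not valid JSON; skip it but leave a trace for debugging
+				log.debug(`Skipping unparseable SSE payload: ${data}`);
+			}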
+		}
+	}
+
+	if (receivedText) console.log(); // newline after streamed response
+
+	// An error event fails the test even if some text was streamed before it
+	if (hasError) {
+		throw new Error(`Agent returned error: ${errorMessage}`);
+	}
+	if (!receivedText) {
+		throw new Error("No response received from agent");
+	}
+	log.success("Received response from agent");
+}
+
+// Check environment diagnostics
+async function checkEnvironment(sandbox: Sandbox): Promise<void> {
+	log.section("Environment diagnostics");
+
+	const checks = [
+		{ name: "Environment variables", cmd: "env | grep -E 'ANTHROPIC|OPENAI|CLAUDE|CODEX' | sed 's/=.*/=<set>/'" },
+		{ name: "Agent binaries", cmd: "ls -la /root/.local/share/sandbox-agent/bin/ 2>/dev/null || echo 'No agents installed'" },
+		{ name: "sandbox-agent version", cmd: "sandbox-agent --version 2>/dev/null || echo 'Not installed'" },
+		{ name: "Server process", cmd: "pgrep -a sandbox-agent || echo 'Not running'" },
+		{ name: "Server logs (last 20 lines)", cmd: "tail -20 /tmp/sandbox-agent.log 2>/dev/null || echo 'No logs'" },
+	];
+
+	for (const { name, cmd } of checks) {
+		const result = await sandbox.exec(cmd);
+		console.log(`\n\x1b[1m${name}:\x1b[0m`);
+		console.log(result.stdout || "(empty)");
+		if (result.stderr) console.log(`stderr: ${result.stderr}`);
+	}
+}
+
+// Main
+async function main() {
+	log.section(`Sandbox Testing (provider: ${provider})`);
+
+	// Check credentials
+	const anthropicKey = getAnthropicApiKey();
+	const openaiKey = getOpenAiApiKey();
+
+	log.info(`Anthropic API key: ${anthropicKey ? "found" : "not found"}`);
+	log.info(`OpenAI API key: ${openaiKey ? "found" : "not found"}`);
+
+	// Determine which agents to test
+	let agents: string[];
+	if (agentArg) {
+		agents = [agentArg];
+	} else if (anthropicKey) {
+		agents = ["claude"];
+	} else if (openaiKey) {
+		agents = ["codex"];
+	} else {
+		agents = ["mock"];
+		log.warn("No API keys found, using mock agent only");
+	}
+	log.info(`Agents to test: ${agents.join(", ")}`);
+
+	// Get provider
+	const prov = getProvider(provider);
+
+	// Check required env vars
+	for (const envVar of prov.requiredEnv) {
+		if (!process.env[envVar]) {
+			throw new Error(`Missing required environment variable: ${envVar}`);
+		}
+	}
+
+	// Build
+	const binaryPath = await buildSandboxAgent();
+
+	// Create sandbox
+	log.section(`Creating ${prov.name} sandbox`);
+	const envVars: Record<string, string> = {};
+	if (anthropicKey) envVars.ANTHROPIC_API_KEY = anthropicKey;
+	if (openaiKey) envVars.OPENAI_API_KEY = openaiKey;
+
+	const sandbox = await prov.create({ envVars });
+	log.success(`Created sandbox: ${sandbox.id}`);
+
+	try {
+		// Install sandbox-agent
+		await installSandboxAgent(sandbox, binaryPath);
+
+		// Install agents
+		await installAgents(sandbox, agents);
+
+		// Check environment
+		await checkEnvironment(sandbox);
+
+		// Start server and check health
+		const baseUrl = await startServerAndCheckHealth(sandbox);
+
+		// Test each agent
+		for (const agent of agents) {
+			const message = agent === "mock" ? "hello" : "Say hello in 10 words or less";
+			await testAgent(baseUrl, agent, message);
+		}
+
+		log.section("All tests passed!");
+
+		if (keepAlive) {
+			log.info(`Sandbox ${sandbox.id} is still running. Press Ctrl+C to cleanup.`);
+			log.info(`Base URL: ${await sandbox.getBaseUrl(3000)}`);
+			await new Promise(() => {}); // Wait forever
+		}
+	} catch (err) {
+		log.error(`Test failed: ${err}`);
+
+		// Show diagnostics on failure
+		try {
+			await checkEnvironment(sandbox);
+		} catch {}
+
+		if (!keepAlive) {
+			await sandbox.cleanup();
+		}
+		process.exit(1);
+	}
+
+	if (!keepAlive) {
+		await sandbox.cleanup();
+	}
+}
+
+main().catch((err) => {
+	log.error(err.message || err);
+	process.exit(1);
+});