wip examples and content

Nathan Flurry 2026-01-28 02:56:22 -08:00
parent fa89872d3b
commit 0bbe92b344
11 changed files with 724 additions and 151 deletions

View file

@ -45,6 +45,7 @@ Universal schema guidance:
- Use `docs/glossary.md` as the source of truth for universal schema terminology and keep it updated alongside schema changes.
- On parse failures, emit an `agent.unparsed` event (source=daemon, synthetic=true) and treat it as a test failure. Preserve raw payloads when `include_raw=true`.
- Track subagent support in `docs/conversion.md`. For now, normalize subagent activity into normal message/tool flow, but revisit explicit subagent modeling later.
- Keep the FAQ in `README.md` and `frontend/packages/website/src/components/FAQ.tsx` in sync. When adding or modifying FAQ entries, update both files.
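A minimal sketch of the parse-failure rule above, assuming agent output arrives as one JSON document per line. The `agent.unparsed` type, `source`, `synthetic`, and `include_raw` come from the guidance; the `UnparsedEvent` shape, the `data.error`/`data.raw` field names, and `parseAgentLine` are hypothetical and not taken from the real schema.

```typescript
// Hypothetical event shape: only the type name, `source`, and `synthetic`
// are taken from the guidance; the `data` fields are assumptions.
interface UnparsedEvent {
  type: "agent.unparsed";
  source: "daemon";
  synthetic: true;
  data: { error: string; raw?: string };
}

type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; event: UnparsedEvent };

// Parse one line of agent output. On failure, return a synthetic
// `agent.unparsed` event instead of throwing, and keep the raw payload
// only when the caller asked for it (mirrors `include_raw=true`).
function parseAgentLine(line: string, includeRaw: boolean): ParseResult {
  try {
    return { ok: true, value: JSON.parse(line) };
  } catch (err) {
    const data: UnparsedEvent["data"] = {
      error: err instanceof Error ? err.message : String(err),
    };
    if (includeRaw) data.raw = line;
    return {
      ok: false,
      event: { type: "agent.unparsed", source: "daemon", synthetic: true, data },
    };
  }
}
```

A test harness could then fail as soon as any result has `ok === false`, which matches the "treat it as a test failure" part of the rule.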
### CLI ⇄ HTTP endpoint map (keep in sync)

103
README.md
View file

@ -6,7 +6,6 @@
Universal API for automatic coding agents in sandboxes. Supports Claude Code, Codex, OpenCode, and Amp.
</p>
Docs: https://rivet.dev/docs/
- **Any coding agent**: Universal API to interact with all agents with full feature coverage
- **Server or SDK mode**: Run as an HTTP server or with the TypeScript SDK
@ -14,13 +13,9 @@ Docs: https://rivet.dev/docs/
- **Supports your sandbox provider**: Daytona, E2B, Vercel Sandboxes, and more
- **Lightweight, portable Rust binary**: Install anywhere with 1 curl command
- **Automatic agent installation**: Agents are installed on-demand when first used
- **OpenAPI spec**: https://rivet.dev/docs/api
- **OpenAPI spec**: https://sandboxagent.dev/docs/api
Roadmap:
- [ ] Python SDK
- [ ] Automatic MCP & skill & hook configuration
- [ ] Todo lists
[Documentation](https://sandboxagent.dev/docs) — [Discord](https://rivet.dev/discord)
## Agent Compatibility
@ -55,7 +50,7 @@ The Sandbox Agent acts as a universal adapter between your client application an
- **Embedded Mode**: Runs agents locally as subprocesses
- **Server Mode**: Runs as HTTP server from any sandbox provider
[Documentation](https://rivet.dev/docs/architecture)
[Documentation](https://sandboxagent.dev/docs/architecture)
## Components
@ -121,7 +116,7 @@ for await (const event of client.streamEvents("demo", { offset: 0 })) {
}
```
Full guide: https://rivet.dev/docs/sdks/typescript
[Documentation](https://sandboxagent.dev/docs/sdks/typescript)
### Server
@ -129,7 +124,7 @@ Install the binary (fastest installation, no Node.js required):
```bash
# Install it
curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh
curl -fsSL https://releases.sandboxagent.dev/sandbox-agent/latest/install.sh | sh
# Run it
sandbox-agent server --token "$SANDBOX_TOKEN" --host 127.0.0.1 --port 2468
```
@ -149,8 +144,8 @@ To disable auth locally:
sandbox-agent server --no-token --host 127.0.0.1 --port 2468
```
Docs: https://rivet.dev/docs/quickstart
Integration guides: https://rivet.dev/docs/deployments
[Documentation](https://sandboxagent.dev/docs/quickstart)
[Integration guides](https://sandboxagent.dev/docs/deployments)
### CLI
@ -168,16 +163,53 @@ sandbox-agent api sessions send-message my-session --message "Hello" --endpoint
sandbox-agent api sessions send-message-stream my-session --message "Hello" --endpoint http://127.0.0.1:2468 --token "$SANDBOX_TOKEN"
```
Docs: https://rivet.dev/docs/cli
You can also run the CLI with npx:
```bash
npx sandbox-agent --help
```
[Documentation](https://rivet.dev/docs/cli)
### Tip: Extract credentials
Often you need to use your personal API tokens to test agents on sandboxes:
```bash
sandbox-agent credentials extract-env --export
```
This prints environment variables for your locally installed agents.
Docs: https://rivet.dev/docs/quickstart
This prints environment variables for your OpenAI, Anthropic, and other API keys so you can test with the Sandbox Agent SDK.
## FAQ
**Does this replace the Vercel AI SDK?**
No, they're complementary. AI SDK is for building chat interfaces and calling LLMs. This SDK is for controlling autonomous coding agents that write code and run commands. Use AI SDK for your UI, use this when you need an agent to actually code.
**Which coding agents are supported?**
Claude Code, Codex, OpenCode, and Amp. The SDK normalizes their APIs so you can swap between them without changing your code.
**How is session data persisted?**
This SDK does not handle persisting session data. Events stream in a universal JSON schema that you can persist anywhere. Consider using Postgres or [Rivet Actors](https://rivet.gg) for data persistence.
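As a rough illustration of persisting the event stream yourself, the sketch below drains any async iterable of events into a storage-agnostic sink. `StoredEvent`, `EventSink`, `MemorySink`, and `persistEvents` are hypothetical names introduced for this example, and the event shape is simplified, not the real universal schema.

```typescript
// Hypothetical minimal event shape; the real universal schema has more fields.
interface StoredEvent {
  type: string;
  data?: unknown;
}

// Storage-agnostic interface. A Postgres-backed sink would implement the
// same method with an INSERT; this sketch only ships an in-memory version.
interface EventSink {
  append(sessionId: string, event: StoredEvent): Promise<void>;
}

class MemorySink implements EventSink {
  readonly rows: { sessionId: string; seq: number; json: string }[] = [];

  async append(sessionId: string, event: StoredEvent): Promise<void> {
    // Per-session sequence number, so events can be replayed in order.
    const seq = this.rows.filter((r) => r.sessionId === sessionId).length;
    this.rows.push({ sessionId, seq, json: JSON.stringify(event) });
  }
}

// Drain an event stream into the sink and return how many events were stored.
async function persistEvents(
  sessionId: string,
  events: AsyncIterable<StoredEvent>,
  sink: EventSink,
): Promise<number> {
  let count = 0;
  for await (const event of events) {
    await sink.append(sessionId, event);
    count++;
  }
  return count;
}
```

Assuming `client.streamEvents("demo", { offset: 0 })` from the SDK example yields compatible objects, it could be passed directly as the `events` argument.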
**Can I run this locally or does it require a sandbox provider?**
Both. Run locally for development, deploy to E2B, Daytona, or Vercel Sandboxes for production.
**Does it support [platform]?**
The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, and Vercel Sandboxes.
**Can I use this with my personal API keys?**
Yes. Use `sandbox-agent credentials extract-env` to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.
**Why Rust and not [language]?**
Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it easy to run inside sandboxes or in CI without shipping a large runtime, such as Node.js.
## Project Goals
@ -194,43 +226,10 @@ Features out of scope:
- **Git Repo Management**: Just use git commands or the features provided by your sandbox provider of choice.
- **Sandbox Provider API**: Sandbox providers have many nuanced differences in their APIs, so it does not make sense for us to provide a custom layer. Instead, we provide guides that show how to integrate this project with each sandbox provider.
## FAQ
## Roadmap
**Why not use PTY?**
- [ ] Python SDK
- [ ] Automatic MCP & skill & hook configuration
- [ ] Todo lists
PTY-based approaches require parsing terminal escape sequences and dealing with interactive prompts.
The agents we support all have machine-readable output modes (JSONL, HTTP APIs) that provide structured events, making integration more reliable.
**Why not use features that already exist on sandbox provider APIs?**
Sandbox providers focus on infrastructure (containers, VMs, networking).
This project focuses specifically on coding agent orchestration: session management, HITL (human-in-the-loop) flows, and universal event schemas. These concerns are complementary.
**Does it support [platform]?**
The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, Vercel Sandboxes, and Docker.
**Can I use this with my personal API keys?**
Yes. Use `sandbox-agent credentials extract-env` to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.
**Why Rust?**
Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it
easy to run inside sandboxes or in CI without shipping a large runtime.
**Why not use stdio/JSON-RPC?**
- has benefit of not having to listen on a port
- more difficult to interact with, harder to analyze, doesn't support inspector for debugging
- may add at some point
- Codex does this and Claude has a JSON stream, but HTTP/SSE gives us a consistent API surface and inspector UI.
**Why not AI SDK?**
- AI SDK does not provide a harness for being a fully fledged coding agent
- Frontier coding agent harnesses have a lot of work put into complex things like swarms, compaction, etc.
**Why not OpenCode server?**
- The harnesses do a lot of heavy lifting, but different agents have very different APIs and behavior.
- A universal API lets you swap agents without rewriting your orchestration code.

View file

@ -3,7 +3,7 @@
"private": true,
"type": "module",
"scripts": {
"start": "tsx src/daytona-fallback.ts",
"start": "tsx src/daytona.ts",
"typecheck": "tsc --noEmit"
},
"dependencies": {

View file

@ -1,76 +0,0 @@
import { Daytona, Image } from "@daytonaio/sdk";
import { logInspectorUrl, runPrompt } from "@sandbox-agent/example-shared";
if (
!process.env.DAYTONA_API_KEY ||
(!process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY)
) {
throw new Error(
"DAYTONA_API_KEY and (OPENAI_API_KEY or ANTHROPIC_API_KEY) required",
);
}
const SNAPSHOT = "sandbox-agent-ready";
const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";
const daytona = new Daytona();
const hasSnapshot = await daytona.snapshot.get(SNAPSHOT).then(
() => true,
() => false,
);
if (!hasSnapshot) {
console.log(`Creating snapshot '${SNAPSHOT}' (one-time setup, ~2-3min)...`);
await daytona.snapshot.create(
{
name: SNAPSHOT,
image: Image.base("ubuntu:22.04").runCommands(
// Install dependencies
"apt-get update && apt-get install -y curl ca-certificates",
// Install sandbox-agent
"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
// Create agent bin directory
`mkdir -p ${AGENT_BIN_DIR}`,
// Install Claude: get latest version, download binary
`CLAUDE_VERSION=$(curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest) && ` +
`curl -fsSL -o ${AGENT_BIN_DIR}/claude "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/$CLAUDE_VERSION/linux-x64/claude" && ` +
`chmod +x ${AGENT_BIN_DIR}/claude`,
// Install Codex: download tarball, extract binary
`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
`chmod +x ${AGENT_BIN_DIR}/codex`,
),
},
{ onLogs: (log) => console.log(` ${log}`) },
);
console.log("Snapshot created. Future runs will be instant.");
}
console.log("Creating sandbox...");
const envVars: Record<string, string> = {};
if (process.env.ANTHROPIC_API_KEY) envVars.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
if (process.env.OPENAI_API_KEY) envVars.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const sandbox = await daytona.create({
snapshot: SNAPSHOT,
envVars,
});
console.log("Starting server...");
await sandbox.process.executeCommand(
"nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &",
);
const baseUrl = (await sandbox.getSignedPreviewUrl(3000, 4 * 60 * 60)).url;
logInspectorUrl({ baseUrl });
const cleanup = async () => {
console.log("Cleaning up...");
await sandbox.delete(60);
process.exit(0);
};
process.once("SIGINT", cleanup);
process.once("SIGTERM", cleanup);
await runPrompt({ baseUrl });
await cleanup();

View file

@ -1,17 +1,42 @@
import { Daytona, Image } from "@daytonaio/sdk";
import { logInspectorUrl, runPrompt } from "@sandbox-agent/example-shared";
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
if (
!process.env.DAYTONA_API_KEY ||
(!process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY)
) {
// Extract API key from Claude's config files
function getAnthropicApiKey(): string | undefined {
if (process.env.ANTHROPIC_API_KEY) return process.env.ANTHROPIC_API_KEY;
const home = homedir();
const configPaths = [
join(home, ".claude.json"),
join(home, ".claude.json.api"),
];
for (const path of configPaths) {
try {
const data = JSON.parse(readFileSync(path, "utf-8"));
const key = data.primaryApiKey || data.apiKey || data.anthropicApiKey;
if (key?.startsWith("sk-ant-")) return key;
} catch {
// Ignore errors
}
}
return undefined;
}
const anthropicKey = getAnthropicApiKey();
const openaiKey = process.env.OPENAI_API_KEY;
if (!process.env.DAYTONA_API_KEY || (!anthropicKey && !openaiKey)) {
throw new Error(
"DAYTONA_API_KEY and (OPENAI_API_KEY or ANTHROPIC_API_KEY) required",
"DAYTONA_API_KEY and (ANTHROPIC_API_KEY or OPENAI_API_KEY) required",
);
}
const SNAPSHOT = "sandbox-agent-ready";
const BINARY = "/usr/local/bin/sandbox-agent";
const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";
const daytona = new Daytona();
@ -27,11 +52,18 @@ if (!hasSnapshot) {
image: Image.base("ubuntu:22.04").runCommands(
// Install dependencies
"apt-get update && apt-get install -y curl ca-certificates",
// Install sandbox-agent via install script
// Install sandbox-agent
"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
// Pre-install agents using sandbox-agent CLI
"sandbox-agent install-agent claude",
"sandbox-agent install-agent codex",
// Create agent bin directory
`mkdir -p ${AGENT_BIN_DIR}`,
// Install Claude: get latest version, download binary
`CLAUDE_VERSION=$(curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest) && ` +
`curl -fsSL -o ${AGENT_BIN_DIR}/claude "https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/$CLAUDE_VERSION/linux-x64/claude" && ` +
`chmod +x ${AGENT_BIN_DIR}/claude`,
// Install Codex: download tarball, extract binary
`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
`chmod +x ${AGENT_BIN_DIR}/codex`,
),
},
{ onLogs: (log) => console.log(` ${log}`) },
@ -41,8 +73,8 @@ if (!hasSnapshot) {
console.log("Creating sandbox...");
const envVars: Record<string, string> = {};
if (process.env.ANTHROPIC_API_KEY) envVars.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
if (process.env.OPENAI_API_KEY) envVars.OPENAI_API_KEY = process.env.OPENAI_API_KEY;
if (anthropicKey) envVars.ANTHROPIC_API_KEY = anthropicKey;
if (openaiKey) envVars.OPENAI_API_KEY = openaiKey;
const sandbox = await daytona.create({
snapshot: SNAPSHOT,
@ -51,13 +83,35 @@ const sandbox = await daytona.create({
console.log("Starting server...");
await sandbox.process.executeCommand(
`nohup ${BINARY} server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &`,
"nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &",
);
// Wait for server to be ready
await new Promise((r) => setTimeout(r, 2000));
// Debug: check environment and agent binaries
const envCheck = await sandbox.process.executeCommand(
"env | grep -E 'ANTHROPIC|OPENAI' | sed 's/=.*/=<set>/'",
);
console.log("Sandbox env:", envCheck.result.output || "(none)");
const binCheck = await sandbox.process.executeCommand(
`ls -la ${AGENT_BIN_DIR}/`,
);
console.log("Agent binaries:", binCheck.result.output);
const baseUrl = (await sandbox.getSignedPreviewUrl(3000, 4 * 60 * 60)).url;
logInspectorUrl({ baseUrl });
const cleanup = async () => {
// Show server logs before cleanup
const logs = await sandbox.process.executeCommand(
"cat /tmp/sandbox-agent.log 2>/dev/null | tail -50",
);
if (logs.result.output) {
console.log("\n--- Server logs ---");
console.log(logs.result.output);
}
console.log("Cleaning up...");
await sandbox.delete(60);
process.exit(0);

View file

@ -154,7 +154,7 @@ export async function createSession({
const normalized = normalizeBaseUrl(baseUrl);
const sessionId = randomUUID();
const body: Record<string, string> = {
agent: agentId || process.env.SANDBOX_AGENT || "claude",
agent: agentId || detectAgent(),
};
const envAgentMode = agentMode || process.env.SANDBOX_AGENT_MODE;
const envPermissionMode = permissionMode || process.env.SANDBOX_PERMISSION_MODE;
@ -269,6 +269,15 @@ export async function sendMessageStream({
return fullText;
}
function detectAgent(): string {
// Prefer explicit setting
if (process.env.SANDBOX_AGENT) return process.env.SANDBOX_AGENT;
// Select based on available API key
if (process.env.ANTHROPIC_API_KEY) return "claude";
if (process.env.OPENAI_API_KEY) return "codex";
return "claude";
}
export async function runPrompt({
baseUrl,
token,
@ -286,11 +295,10 @@ export async function runPrompt({
headers: extraHeaders,
});
const agent = agentId || detectAgent();
const sessionId = randomUUID();
await client.createSession(sessionId, {
agent: agentId || process.env.SANDBOX_AGENT || "claude",
});
console.log(`Session ${sessionId} ready. Press Ctrl+C to quit.`);
await client.createSession(sessionId, { agent });
console.log(`Session ${sessionId} using ${agent}. Press Ctrl+C to quit.`);
let isThinking = false;
let hasStartedOutput = false;
@ -334,6 +342,12 @@ export async function runPrompt({
}
}
// Handle errors
if (event.type === "error") {
const data = event.data as any;
console.error(`\nError: ${data?.message || JSON.stringify(data)}`);
}
// Handle session ended
if (event.type === "session.ended") {
const data = event.data as any;

View file

@ -18,17 +18,27 @@ const faqs = [
{
question: 'How is session data persisted?',
answer:
"Events stream in a universal JSON schema. Persist them anywhere. We have adapters for Postgres and ClickHouse, or use <a href='https://rivet.gg' target='_blank' rel='noopener noreferrer' class='text-orange-400 hover:underline'>Rivet Actors</a> for managed stateful storage.",
"This SDK does not handle persisting session data. Events stream in a universal JSON schema that you can persist anywhere. Consider using Postgres or <a href='https://rivet.gg' target='_blank' rel='noopener noreferrer' class='text-orange-400 hover:underline'>Rivet Actors</a> for data persistence.",
},
{
question: 'Can I run this locally or does it require a sandbox provider?',
answer:
"Both. Run locally for development, deploy to E2B, Daytona, Vercel, or Docker for production.",
"Both. Run locally for development, deploy to E2B, Daytona, or Vercel Sandboxes for production.",
},
{
question: 'Is this open source?',
question: 'Does it support [platform]?',
answer:
"Yes, MIT licensed. Code is on GitHub.",
"The server is a single Rust binary that runs anywhere with a curl install. If your platform can run Linux binaries (Docker, VMs, etc.), it works. See the deployment guides for E2B, Daytona, and Vercel Sandboxes.",
},
{
question: 'Can I use this with my personal API keys?',
answer:
"Yes. Use <code>sandbox-agent credentials extract-env</code> to extract API keys from your local agent configs (Claude Code, Codex, OpenCode, Amp) and pass them to the sandbox environment.",
},
{
question: 'Why Rust and not [language]?',
answer:
"Rust gives us a single static binary, fast startup, and predictable memory usage. That makes it easy to run inside sandboxes or in CI without shipping a large runtime, such as Node.js.",
},
];

16
pnpm-lock.yaml generated
View file

@ -253,6 +253,22 @@ importers:
specifier: ^7.5.8
version: 7.7.1
scripts/sandbox-testing:
dependencies:
'@daytonaio/sdk':
specifier: latest
version: 0.135.0(ws@8.19.0)
devDependencies:
'@types/node':
specifier: latest
version: 25.0.10
tsx:
specifier: latest
version: 4.21.0
typescript:
specifier: latest
version: 5.9.3
sdks/cli:
devDependencies:
vitest:

View file

@ -6,4 +6,5 @@ packages:
- "resources/agent-schemas"
- "resources/vercel-ai-sdk-schemas"
- "scripts/release"
- "scripts/sandbox-testing"
- "examples/*"

View file

@ -0,0 +1,20 @@
{
"name": "@sandbox-agent/testing",
"private": true,
"type": "module",
"scripts": {
"test": "tsx test-sandbox.ts",
"test:docker": "tsx test-sandbox.ts docker",
"test:daytona": "tsx test-sandbox.ts daytona",
"test:mock": "tsx test-sandbox.ts docker --agent=mock",
"test:verbose": "tsx test-sandbox.ts docker --verbose"
},
"dependencies": {
"@daytonaio/sdk": "latest"
},
"devDependencies": {
"@types/node": "latest",
"tsx": "latest",
"typescript": "latest"
}
}

View file

@ -0,0 +1,534 @@
#!/usr/bin/env npx tsx
/**
* Sandbox Testing Script
*
* Tests sandbox-agent on various cloud sandbox providers.
* Usage: npx tsx test-sandbox.ts [provider] [options]
*
* Providers: docker (default), daytona
*
* Options:
* --skip-build Skip cargo build step
* --use-release Use pre-built release binary from releases.rivet.dev
* --agent=<name> Test specific agent (claude, codex, mock)
* --keep-alive Don't cleanup sandbox after test
* --verbose Show all logs
*/
import { execSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join, dirname } from "node:path";
import { fileURLToPath } from "node:url";
const __dirname = dirname(fileURLToPath(import.meta.url));
const ROOT_DIR = join(__dirname, "../..");
const SERVER_DIR = join(ROOT_DIR, "server");
// Parse args
const args = process.argv.slice(2);
const provider = args.find((a) => !a.startsWith("--")) || "docker";
const skipBuild = args.includes("--skip-build");
const useRelease = args.includes("--use-release");
const keepAlive = args.includes("--keep-alive");
const verbose = args.includes("--verbose");
const agentArg = args.find((a) => a.startsWith("--agent="))?.split("=")[1];
// Colors
const log = {
info: (msg: string) => console.log(`\x1b[34m[INFO]\x1b[0m ${msg}`),
success: (msg: string) => console.log(`\x1b[32m[OK]\x1b[0m ${msg}`),
error: (msg: string) => console.log(`\x1b[31m[ERROR]\x1b[0m ${msg}`),
warn: (msg: string) => console.log(`\x1b[33m[WARN]\x1b[0m ${msg}`),
debug: (msg: string) => verbose && console.log(`\x1b[90m[DEBUG]\x1b[0m ${msg}`),
section: (msg: string) => console.log(`\n\x1b[1m=== ${msg} ===\x1b[0m`),
};
// Credentials extraction (mirrors agent-credentials logic)
function getAnthropicApiKey(): string | undefined {
if (process.env.ANTHROPIC_API_KEY) return process.env.ANTHROPIC_API_KEY;
const home = homedir();
for (const path of [join(home, ".claude.json"), join(home, ".claude.json.api")]) {
try {
const data = JSON.parse(readFileSync(path, "utf-8"));
const key = data.primaryApiKey || data.apiKey || data.anthropicApiKey;
if (key?.startsWith("sk-ant-")) return key;
} catch {}
}
return undefined;
}
function getOpenAiApiKey(): string | undefined {
if (process.env.OPENAI_API_KEY) return process.env.OPENAI_API_KEY;
const home = homedir();
try {
const data = JSON.parse(readFileSync(join(home, ".codex", "codex.json"), "utf-8"));
if (data.apiKey) return data.apiKey;
} catch {}
return undefined;
}
// Build sandbox-agent
async function buildSandboxAgent(): Promise<string> {
log.section("Building sandbox-agent");
if (useRelease) {
log.info("Using pre-built release from releases.rivet.dev");
return "RELEASE";
}
if (skipBuild) {
const binaryPath = join(SERVER_DIR, "target/release/sandbox-agent");
if (!existsSync(binaryPath)) {
throw new Error(`Binary not found at ${binaryPath}. Run without --skip-build.`);
}
log.info(`Using existing binary: ${binaryPath}`);
return binaryPath;
}
log.info("Running cargo build --release...");
try {
execSync("cargo build --release -p sandbox-agent", {
cwd: SERVER_DIR,
stdio: verbose ? "inherit" : "pipe",
});
const binaryPath = join(SERVER_DIR, "target/release/sandbox-agent");
log.success(`Built: ${binaryPath}`);
return binaryPath;
} catch (err) {
throw new Error(`Build failed: ${err}`);
}
}
// Provider interface
interface SandboxProvider {
name: string;
requiredEnv: string[];
create(opts: { envVars: Record<string, string> }): Promise<Sandbox>;
}
interface Sandbox {
id: string;
exec(cmd: string): Promise<{ stdout: string; stderr: string; exitCode: number }>;
upload(localPath: string, remotePath: string): Promise<void>;
getBaseUrl(port: number): Promise<string>;
cleanup(): Promise<void>;
}
// Docker provider
const dockerProvider: SandboxProvider = {
name: "docker",
requiredEnv: [],
async create({ envVars }) {
const id = `sandbox-test-${Date.now()}`;
const envArgs = Object.entries(envVars)
.map(([k, v]) => `-e ${k}=${v}`)
.join(" ");
log.info(`Creating Docker container: ${id}`);
execSync(
`docker run -d --name ${id} ${envArgs} -p 0:3000 ubuntu:22.04 tail -f /dev/null`,
{ stdio: verbose ? "inherit" : "pipe" },
);
// Install curl
execSync(`docker exec ${id} bash -c "apt-get update && apt-get install -y curl ca-certificates"`, {
stdio: verbose ? "inherit" : "pipe",
});
return {
id,
async exec(cmd) {
try {
const stdout = execSync(`docker exec ${id} bash -c "${cmd.replace(/"/g, '\\"')}"`, {
encoding: "utf-8",
stdio: ["pipe", "pipe", "pipe"],
});
return { stdout, stderr: "", exitCode: 0 };
} catch (err: any) {
return { stdout: err.stdout || "", stderr: err.stderr || "", exitCode: err.status || 1 };
}
},
async upload(localPath, remotePath) {
execSync(`docker cp "${localPath}" ${id}:${remotePath}`, { stdio: verbose ? "inherit" : "pipe" });
},
async getBaseUrl(port) {
const portMapping = execSync(`docker port ${id} ${port}`, { encoding: "utf-8" }).trim();
const hostPort = portMapping.split(":").pop();
return `http://localhost:${hostPort}`;
},
async cleanup() {
log.info(`Cleaning up container: ${id}`);
execSync(`docker rm -f ${id}`, { stdio: "pipe" });
},
};
},
};
// Daytona provider
const daytonaProvider: SandboxProvider = {
name: "daytona",
requiredEnv: ["DAYTONA_API_KEY"],
async create({ envVars }) {
const { Daytona } = await import("@daytonaio/sdk");
const daytona = new Daytona();
log.info("Creating Daytona sandbox...");
const sandbox = await daytona.create({
image: "ubuntu:22.04",
envVars,
});
const id = sandbox.id;
// Install curl
await sandbox.process.executeCommand("apt-get update && apt-get install -y curl ca-certificates");
return {
id,
async exec(cmd) {
const result = await sandbox.process.executeCommand(cmd);
return {
stdout: result.result.output || "",
stderr: result.result.error || "",
exitCode: result.result.exitCode,
};
},
async upload(localPath, remotePath) {
const content = readFileSync(localPath);
await sandbox.fs.uploadFile(remotePath, content);
await sandbox.process.executeCommand(`chmod +x ${remotePath}`);
},
async getBaseUrl(port) {
const preview = await sandbox.getSignedPreviewUrl(port, 4 * 60 * 60);
return preview.url;
},
async cleanup() {
log.info(`Cleaning up Daytona sandbox: ${id}`);
await sandbox.delete(60);
},
};
},
};
// Get provider
function getProvider(name: string): SandboxProvider {
switch (name) {
case "docker":
return dockerProvider;
case "daytona":
return daytonaProvider;
default:
throw new Error(`Unknown provider: ${name}. Available: docker, daytona`);
}
}
// Install sandbox-agent in sandbox
async function installSandboxAgent(sandbox: Sandbox, binaryPath: string): Promise<void> {
log.section("Installing sandbox-agent");
if (binaryPath === "RELEASE") {
log.info("Installing from releases.rivet.dev...");
const result = await sandbox.exec(
"curl -fsSL https://releases.rivet.dev/sandbox-agent/latest/install.sh | sh",
);
log.debug(`Install output: ${result.stdout}`);
if (result.exitCode !== 0) {
throw new Error(`Install failed: ${result.stderr}`);
}
} else {
log.info(`Uploading local binary: ${binaryPath}`);
await sandbox.upload(binaryPath, "/usr/local/bin/sandbox-agent");
}
// Verify installation
const version = await sandbox.exec("sandbox-agent --version");
log.success(`Installed: ${version.stdout.trim()}`);
}
// Install agents
async function installAgents(sandbox: Sandbox, agents: string[]): Promise<void> {
log.section("Installing agents");
const AGENT_BIN_DIR = "/root/.local/share/sandbox-agent/bin";
await sandbox.exec(`mkdir -p ${AGENT_BIN_DIR}`);
for (const agent of agents) {
log.info(`Installing ${agent}...`);
if (agent === "claude") {
// First get the version
const versionResult = await sandbox.exec(
"curl -fsSL https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/latest",
);
if (versionResult.exitCode !== 0) throw new Error(`Failed to get Claude version: ${versionResult.stderr}`);
const claudeVersion = versionResult.stdout.trim();
log.debug(`Claude version: ${claudeVersion}`);
// Then download the binary
const downloadUrl = `https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/claude-code-releases/${claudeVersion}/linux-x64/claude`;
log.debug(`Download URL: ${downloadUrl}`);
const result = await sandbox.exec(
`curl -fsSL -o ${AGENT_BIN_DIR}/claude "${downloadUrl}" && chmod +x ${AGENT_BIN_DIR}/claude`,
);
if (result.exitCode !== 0) throw new Error(`Failed to install claude: ${result.stderr}`);
} else if (agent === "codex") {
const result = await sandbox.exec(
`curl -fsSL -L https://github.com/openai/codex/releases/latest/download/codex-x86_64-unknown-linux-musl.tar.gz | tar -xzf - -C /tmp && ` +
`find /tmp -name 'codex-x86_64-unknown-linux-musl' -exec mv {} ${AGENT_BIN_DIR}/codex \\; && ` +
`chmod +x ${AGENT_BIN_DIR}/codex`,
);
if (result.exitCode !== 0) throw new Error(`Failed to install codex: ${result.stderr}`);
} else if (agent === "mock") {
// Mock agent is built into sandbox-agent, no install needed
log.info("Mock agent is built-in, skipping install");
continue;
}
log.success(`Installed ${agent}`);
}
// List installed agents
const ls = await sandbox.exec(`ls -la ${AGENT_BIN_DIR}/`);
log.debug(`Agent binaries:\n${ls.stdout}`);
}
// Start server and check health
async function startServerAndCheckHealth(sandbox: Sandbox): Promise<string> {
log.section("Starting server");
// Start server in background
await sandbox.exec("nohup sandbox-agent server --no-token --host 0.0.0.0 --port 3000 >/tmp/sandbox-agent.log 2>&1 &");
log.info("Server started in background");
// Get base URL
const baseUrl = await sandbox.getBaseUrl(3000);
log.info(`Base URL: ${baseUrl}`);
// Wait for health
log.info("Waiting for health check...");
for (let i = 0; i < 30; i++) {
try {
const response = await fetch(`${baseUrl}/v1/health`);
if (response.ok) {
const data = await response.json();
if (data.status === "ok") {
log.success("Health check passed!");
return baseUrl;
}
}
} catch {}
await new Promise((r) => setTimeout(r, 1000));
}
// Show logs on failure
const logs = await sandbox.exec("cat /tmp/sandbox-agent.log");
log.error("Server logs:\n" + logs.stdout);
throw new Error("Health check failed after 30 seconds");
}
// Test agent interaction
async function testAgent(baseUrl: string, agent: string, message: string): Promise<void> {
log.section(`Testing ${agent} agent`);
const sessionId = crypto.randomUUID();
// Create session
log.info(`Creating session ${sessionId}...`);
const createRes = await fetch(`${baseUrl}/v1/sessions/${sessionId}`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ agent }),
});
if (!createRes.ok) {
throw new Error(`Failed to create session: ${await createRes.text()}`);
}
log.success("Session created");
// Send message with streaming
log.info(`Sending message: "${message}"`);
const msgRes = await fetch(`${baseUrl}/v1/sessions/${sessionId}/messages/stream`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ message }),
});
if (!msgRes.ok || !msgRes.body) {
throw new Error(`Failed to send message: ${await msgRes.text()}`);
}
// Process SSE stream
const reader = msgRes.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let receivedText = false;
let hasError = false;
let errorMessage = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
for (const line of lines) {
if (!line.startsWith("data: ")) continue;
const data = line.slice(6);
if (data === "[DONE]") continue;
try {
const event = JSON.parse(data);
log.debug(`Event: ${event.type}`);
if (event.type === "item.delta") {
const delta = event.data?.delta;
const text = typeof delta === "string" ? delta : delta?.text || "";
if (text) {
if (!receivedText) {
log.info("Receiving response...");
receivedText = true;
}
process.stdout.write(text);
}
}
if (event.type === "error") {
hasError = true;
errorMessage = event.data?.message || JSON.stringify(event.data);
log.error(`Error event: ${errorMessage}`);
}
if (event.type === "session.ended") {
const reason = event.data?.reason;
log.info(`Session ended: ${reason || "unknown reason"}`);
}
} catch {}
}
}
if (receivedText) {
console.log(); // newline after response
log.success("Received response from agent");
} else if (hasError) {
throw new Error(`Agent returned error: ${errorMessage}`);
} else {
throw new Error("No response received from agent");
}
}
// Check environment diagnostics
async function checkEnvironment(sandbox: Sandbox): Promise<void> {
log.section("Environment diagnostics");
const checks = [
{ name: "Environment variables", cmd: "env | grep -E 'ANTHROPIC|OPENAI|CLAUDE|CODEX' | sed 's/=.*/=<set>/'" },
{ name: "Agent binaries", cmd: "ls -la /root/.local/share/sandbox-agent/bin/ 2>/dev/null || echo 'No agents installed'" },
{ name: "sandbox-agent version", cmd: "sandbox-agent --version 2>/dev/null || echo 'Not installed'" },
{ name: "Server process", cmd: "pgrep -a sandbox-agent || echo 'Not running'" },
{ name: "Server logs (last 20 lines)", cmd: "tail -20 /tmp/sandbox-agent.log 2>/dev/null || echo 'No logs'" },
];
for (const { name, cmd } of checks) {
const result = await sandbox.exec(cmd);
console.log(`\n\x1b[1m${name}:\x1b[0m`);
console.log(result.stdout || "(empty)");
if (result.stderr) console.log(`stderr: ${result.stderr}`);
}
}
// Main
async function main() {
log.section(`Sandbox Testing (provider: ${provider})`);
// Check credentials
const anthropicKey = getAnthropicApiKey();
const openaiKey = getOpenAiApiKey();
log.info(`Anthropic API key: ${anthropicKey ? "found" : "not found"}`);
log.info(`OpenAI API key: ${openaiKey ? "found" : "not found"}`);
// Determine which agents to test
let agents: string[];
if (agentArg) {
agents = [agentArg];
} else if (anthropicKey) {
agents = ["claude"];
} else if (openaiKey) {
agents = ["codex"];
} else {
agents = ["mock"];
log.warn("No API keys found, using mock agent only");
}
log.info(`Agents to test: ${agents.join(", ")}`);
// Get provider
const prov = getProvider(provider);
// Check required env vars
for (const envVar of prov.requiredEnv) {
if (!process.env[envVar]) {
throw new Error(`Missing required environment variable: ${envVar}`);
}
}
// Build
const binaryPath = await buildSandboxAgent();
// Create sandbox
log.section(`Creating ${prov.name} sandbox`);
const envVars: Record<string, string> = {};
if (anthropicKey) envVars.ANTHROPIC_API_KEY = anthropicKey;
if (openaiKey) envVars.OPENAI_API_KEY = openaiKey;
const sandbox = await prov.create({ envVars });
log.success(`Created sandbox: ${sandbox.id}`);
try {
// Install sandbox-agent
await installSandboxAgent(sandbox, binaryPath);
// Install agents
await installAgents(sandbox, agents);
// Check environment
await checkEnvironment(sandbox);
// Start server and check health
const baseUrl = await startServerAndCheckHealth(sandbox);
// Test each agent
for (const agent of agents) {
const message = agent === "mock" ? "hello" : "Say hello in 10 words or less";
await testAgent(baseUrl, agent, message);
}
log.section("All tests passed!");
if (keepAlive) {
log.info(`Sandbox ${sandbox.id} is still running. Press Ctrl+C to cleanup.`);
log.info(`Base URL: ${await sandbox.getBaseUrl(3000)}`);
await new Promise(() => {}); // Wait forever
}
} catch (err) {
log.error(`Test failed: ${err}`);
// Show diagnostics on failure
try {
await checkEnvironment(sandbox);
} catch {}
if (!keepAlive) {
await sandbox.cleanup();
}
process.exit(1);
}
if (!keepAlive) {
await sandbox.cleanup();
}
}
main().catch((err) => {
log.error(err.message || err);
process.exit(1);
});
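The SSE handling inside `testAgent` above could be pulled out into a pure function, which makes the buffering logic easier to check in isolation. This is a sketch, not part of the commit: `parseSseLines` and `SseParseResult` are hypothetical names, and the CRLF handling and the tolerance for `data:` without a following space are additions beyond what the script does.

```typescript
// Hypothetical helper: turns buffered SSE text into parsed `data:` payloads.
interface SseParseResult {
  events: unknown[];
  // Incomplete trailing line; prepend it to the next chunk before parsing.
  rest: string;
}

function parseSseLines(buffer: string): SseParseResult {
  const lines = buffer.split("\n");
  // The last element is either "" (buffer ended on a newline) or a partial line.
  const rest = lines.pop() ?? "";
  const events: unknown[] = [];

  for (const raw of lines) {
    // Tolerate CRLF line endings.
    const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
    if (!line.startsWith("data:")) continue;

    // Drop a single optional space after the colon.
    let payload = line.slice(5);
    if (payload.startsWith(" ")) payload = payload.slice(1);
    if (payload === "" || payload === "[DONE]") continue;

    try {
      events.push(JSON.parse(payload));
    } catch {
      // Skip malformed payloads, matching the script's lenient behavior.
    }
  }
  return { events, rest };
}
```

In the read loop, each decoded chunk would be appended to the previous `rest`, passed through `parseSseLines`, and the returned `rest` carried over to the next iteration.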