harivansh-afk/sandbox-agent

mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 14:03:52 +00:00

Nathan Flurry ef3e811c94

feat: add opencode compatibility layer (#68 )

2026-02-04 13:43:05 -08:00

OpenClaw (formerly Clawdbot) Research

Research notes on OpenClaw's architecture, API, and automation patterns for integration with sandbox-agent.

Overview

Provider: Multi-provider (Anthropic, OpenAI, etc. via Pi agent)
Execution Method: WebSocket Gateway + HTTP APIs
Session Persistence: Session Key (string) + Session ID (UUID)
SDK: No official SDK - uses WebSocket/HTTP protocol directly
Binary: clawdbot (npm global install or local)
Default Port: 18789 (WebSocket + HTTP multiplex)

Architecture

OpenClaw is architected differently from other coding agents (Claude Code, Codex, OpenCode, Amp):

┌─────────────────────────────────────┐
│           Gateway Service           │  ws://127.0.0.1:18789
│      (long-running daemon)          │  http://127.0.0.1:18789
│                                     │
│  ┌─────────────────────────────┐    │
│  │ Pi Agent (embedded RPC)     │    │
│  │ - Tool execution            │    │
│  │ - Model routing             │    │
│  │ - Session management        │    │
│  └─────────────────────────────┘    │
└─────────────────────────────────────┘
         │
         ├── WebSocket (full control plane)
         ├── HTTP /v1/chat/completions (OpenAI-compatible)
         ├── HTTP /v1/responses (OpenResponses-compatible)
         ├── HTTP /tools/invoke (single tool invocation)
         └── HTTP /hooks/agent (webhook triggers)

Key Difference: OpenClaw runs as a daemon that owns the agent runtime, whereas Claude Code and Amp spawn a subprocess per turn. Its closest analogue is the server model used by OpenCode (and by Codex, per the comparison table later in these notes), but with a persistent gateway.

Automation Methods (Priority Order)

1. WebSocket Gateway Protocol (Recommended)

Full-featured bidirectional control with streaming events.

Connection Handshake

// Connect to Gateway
const ws = new WebSocket("ws://127.0.0.1:18789");

// First frame MUST be connect request
ws.send(JSON.stringify({
  type: "req",
  id: "1",
  method: "connect",
  params: {
    minProtocol: 3,
    maxProtocol: 3,
    client: {
      id: "gateway-client",  // or custom client id
      version: "1.0.0",
      platform: "linux",
      mode: "backend"
    },
    role: "operator",
    scopes: ["operator.admin"],
    caps: [],
    auth: { token: "YOUR_GATEWAY_TOKEN" }
  }
}));

// Expect hello-ok response
// { type: "res", id: "1", ok: true, payload: { type: "hello-ok", ... } }
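
The handshake above can be wrapped in two small helpers so that the connect frame is built in one place and the response is checked explicitly. This is a minimal sketch: `buildConnectRequest` and `isHelloOk` are hypothetical names, and the field values are simply copied from the example above rather than taken from a protocol specification.

```typescript
// Shape of the connect request shown above (values copied from this document).
interface ConnectRequest {
  type: "req";
  id: string;
  method: "connect";
  params: {
    minProtocol: number;
    maxProtocol: number;
    client: { id: string; version: string; platform: string; mode: string };
    role: string;
    scopes: string[];
    caps: string[];
    auth: { token: string };
  };
}

// Hypothetical helper: builds the first frame a client must send.
function buildConnectRequest(token: string, requestId = "1"): ConnectRequest {
  return {
    type: "req",
    id: requestId,
    method: "connect",
    params: {
      minProtocol: 3,
      maxProtocol: 3,
      client: { id: "gateway-client", version: "1.0.0", platform: "linux", mode: "backend" },
      role: "operator",
      scopes: ["operator.admin"],
      caps: [],
      auth: { token },
    },
  };
}

// Hypothetical helper: true when a parsed frame is the hello-ok response to `requestId`.
function isHelloOk(frame: unknown, requestId: string): boolean {
  if (typeof frame !== "object" || frame === null) return false;
  const f = frame as { type?: unknown; id?: unknown; ok?: unknown; payload?: { type?: unknown } | null };
  return f.type === "res" && f.id === requestId && f.ok === true && f.payload?.type === "hello-ok";
}
```

A client would send `JSON.stringify(buildConnectRequest(token))` as its first frame and issue no other requests until `isHelloOk` returns true for a received frame.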

Agent Request

// Send agent turn request
const runId = crypto.randomUUID();
ws.send(JSON.stringify({
  type: "req",
  id: runId,
  method: "agent",
  params: {
    message: "Your prompt here",
    idempotencyKey: runId,
    sessionKey: "agent:main:main",  // or custom session key
    thinking: "low",  // optional: low|medium|high
    deliver: false,   // don't send to messaging channel
    timeout: 300000   // 5 minute timeout
  }
}));

Response Flow (Two-Stage)

// Stage 1: Immediate ack
// { type: "res", id: "...", ok: true, payload: { runId, status: "accepted", acceptedAt: 1234567890 } }

// Between the two responses: streaming events
// { type: "event", event: "agent", payload: { runId, seq: 1, stream: "output", data: {...} } }
// { type: "event", event: "agent", payload: { runId, seq: 2, stream: "tool", data: {...} } }
// ...

// Stage 2: Final response (same id as request)
// { type: "res", id: "...", ok: true, payload: { runId, status: "ok", summary: "completed", result: {...} } }
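
Because the ack and the final result share the request id, a client has to tell them apart by `status`. The sketch below classifies an incoming frame for one run. `classifyRunFrame` is a hypothetical helper, and it assumes the `runId` carried by events equals the request id (as in the example above, where the same UUID is used for both); if the gateway assigns its own `runId`, read it from the ack instead.

```typescript
type RunFrameKind = "ack" | "event" | "final" | "unrelated";

// Loose frame shape covering only the fields this sketch inspects.
interface GatewayFrameLike {
  type?: string;
  id?: string;
  ok?: boolean;
  event?: string;
  payload?: { runId?: string; status?: string; [key: string]: unknown };
}

// Hypothetical helper: decides what a frame means for the run started by `requestId`.
function classifyRunFrame(frame: GatewayFrameLike, requestId: string): RunFrameKind {
  if (frame.type === "event" && frame.event === "agent") {
    // Assumption: event runId matches the request id.
    return frame.payload?.runId === requestId ? "event" : "unrelated";
  }
  if (frame.type === "res" && frame.id === requestId) {
    // The ack carries status "accepted"; any other response for this id ends the run.
    return frame.payload?.status === "accepted" ? "ack" : "final";
  }
  return "unrelated";
}
```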

2. OpenAI-Compatible HTTP API

For simple integration with tools expecting OpenAI Chat Completions.

Enable in config:

{
  gateway: {
    http: {
      endpoints: {
        chatCompletions: { enabled: true }
      }
    }
  }
}

Request:

curl -X POST http://127.0.0.1:18789/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "clawdbot:main",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Model Format:

model: "clawdbot:<agentId>" (e.g., "clawdbot:main")
model: "agent:<agentId>" (alias)
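
To make the two accepted prefixes concrete, here is a small sketch that extracts the agent id from a model string. `parseAgentModel` is a hypothetical helper, not part of OpenClaw, and it assumes nothing about the model format beyond the two forms listed above.

```typescript
// Hypothetical helper: returns the agent id for "clawdbot:<agentId>" or
// "agent:<agentId>", and null for anything else.
function parseAgentModel(model: string): string | null {
  const match = /^(?:clawdbot|agent):(.+)$/.exec(model);
  return match?.[1] ?? null;
}
```

With this sketch, `parseAgentModel("clawdbot:main")` and `parseAgentModel("agent:main")` both return `"main"`.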

3. OpenResponses HTTP API

For clients that speak OpenResponses (item-based input, function tools).

Enable in config:

{
  gateway: {
    http: {
      endpoints: {
        responses: { enabled: true }
      }
    }
  }
}

Request:

curl -X POST http://127.0.0.1:18789/v1/responses \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "clawdbot:main",
    "input": "Hello",
    "stream": true
  }'

4. Webhooks (Fire-and-Forget)

For event-driven automation without maintaining a connection.

Enable in config:

{
  hooks: {
    enabled: true,
    token: "webhook-secret",
    path: "/hooks"
  }
}

Request:

curl -X POST http://127.0.0.1:18789/hooks/agent \
  -H "Authorization: Bearer webhook-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Run this task",
    "name": "Automation",
    "sessionKey": "hook:automation:task-123",
    "deliver": false,
    "timeoutSeconds": 120
  }'

Response: 202 Accepted (async run started)
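
For callers that are not shelling out to curl, the same request can be assembled in code. The sketch below only builds the request description; `buildHookRequest`, `HookAgentPayload`, and `HookRequest` are hypothetical names, and the payload fields are limited to those shown in the curl example above.

```typescript
// Fields taken from the curl example above; all but `message` are treated as optional here.
interface HookAgentPayload {
  message: string;
  name?: string;
  sessionKey?: string;
  deliver?: boolean;
  timeoutSeconds?: number;
}

interface HookRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles the POST to <hooksPath>/agent.
function buildHookRequest(
  baseUrl: string,
  token: string,
  payload: HookAgentPayload,
  hooksPath = "/hooks", // matches hooks.path in the config above
): HookRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes before joining
  return {
    url: `${base}${hooksPath}/agent`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to `fetch(req.url, req)`; since the run is asynchronous, the caller would check for status 202 rather than wait for agent output.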

5. CLI Subprocess

For simple one-off automation (similar to Claude Code pattern).

clawdbot agent --message "Your prompt" --session-key "automation:task"

Session Management

Session Key Format

agent:<agentId>:<sessionType>
agent:main:main          # Main agent, main session
agent:main:subagent:abc  # Subagent session
agent:beta:main          # Beta agent, main session
hook:email:msg-123       # Webhook-spawned session
global                   # Legacy global session
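
The key shapes listed above can be told apart with a small parser. This is a sketch inferred only from those examples: `parseSessionKey` is a hypothetical helper, and keys that match none of the listed shapes (such as `automation:task`) are returned as `other` rather than rejected.

```typescript
type ParsedSessionKey =
  | { kind: "agent"; agentId: string; session: string }
  | { kind: "hook"; rest: string }
  | { kind: "global" }
  | { kind: "other"; raw: string };

// Hypothetical helper: splits a session key into the parts shown in the examples.
function parseSessionKey(key: string): ParsedSessionKey {
  if (key === "global") return { kind: "global" };
  const [prefix, second, ...rest] = key.split(":");
  if (prefix === "agent" && second !== undefined && rest.length > 0) {
    // Everything after the agent id is the session part, e.g. "subagent:abc".
    return { kind: "agent", agentId: second, session: rest.join(":") };
  }
  if (prefix === "hook" && second !== undefined) {
    return { kind: "hook", rest: [second, ...rest].join(":") };
  }
  return { kind: "other", raw: key };
}
```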

Session Operations (WebSocket)

// List sessions
{ type: "req", id: "...", method: "sessions.list", params: { limit: 50, activeMinutes: 120 } }

// Resolve session info
{ type: "req", id: "...", method: "sessions.resolve", params: { key: "agent:main:main" } }

// Patch session settings
{ type: "req", id: "...", method: "sessions.patch", params: {
  key: "agent:main:main",
  thinkingLevel: "medium",
  model: "anthropic/claude-sonnet-4-20250514"
}}

// Reset session (clear history)
{ type: "req", id: "...", method: "sessions.reset", params: { key: "agent:main:main" } }

// Delete session
{ type: "req", id: "...", method: "sessions.delete", params: { key: "agent:main:main" } }

// Compact session (summarize history)
{ type: "req", id: "...", method: "sessions.compact", params: { key: "agent:main:main" } }

Session CLI

clawdbot sessions                    # List sessions
clawdbot sessions --active 120       # Active in last 2 hours
clawdbot sessions --json             # JSON output

Streaming Events

Event Format

interface AgentEvent {
  runId: string;       // Correlates to request
  seq: number;         // Monotonically increasing per run
  stream: string;      // Event category
  ts: number;          // Unix timestamp (ms)
  data: Record<string, unknown>;  // Event-specific payload
}

Stream Types

| Stream | Description |
| --- | --- |
| `output` | Text output chunks |
| `tool` | Tool invocation/result |
| `thinking` | Extended thinking content |
| `status` | Run status changes |
| `error` | Error information |

Event Categories

| Event Type | Payload |
| --- | --- |
| `assistant.delta` | `{ text: "..." }` |
| `tool.start` | `{ name: "Read", input: {...} }` |
| `tool.result` | `{ name: "Read", result: "..." }` |
| `thinking.delta` | `{ text: "..." }` |
| `run.complete` | `{ summary: "..." }` |
| `run.error` | `{ error: "..." }` |
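
Since `seq` increases monotonically per run, a consumer can rebuild the assistant text by ordering events before concatenating them. The sketch below does that for one run. `collectOutputText` is a hypothetical helper, and it assumes that events on the `output` stream carry a `text` field in `data`, mirroring the `assistant.delta` payload above; these notes do not confirm how streams and event types map to each other.

```typescript
interface AgentEvent {
  runId: string;
  seq: number;
  stream: string;
  ts: number;
  data: Record<string, unknown>;
}

// Hypothetical helper: joins the text chunks of one run in seq order.
function collectOutputText(events: AgentEvent[], runId: string): string {
  return events
    .filter((e) => e.runId === runId && e.stream === "output") // filter() copies, so the input is not reordered
    .sort((a, b) => a.seq - b.seq)
    .map((e) => {
      const text = e.data["text"]; // assumption: output chunks expose data.text
      return typeof text === "string" ? text : "";
    })
    .join("");
}
```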

Token Usage / Cost Tracking

OpenClaw tracks tokens per response and supports cost estimation.

In-Chat Commands

/status              # Session model, context usage, last response tokens, estimated cost
/usage off|tokens|full  # Toggle per-response usage footer
/usage cost          # Local cost summary from session logs

Configuration

Token costs are configured per model:

{
  models: {
    providers: {
      anthropic: {
        models: [{
          id: "claude-sonnet-4-20250514",
          cost: {
            input: 3.00,       // USD per 1M tokens
            output: 15.00,
            cacheRead: 0.30,
            cacheWrite: 3.75
          }
        }]
      }
    }
  }
}

Programmatic Access

Token usage is included in agent response payloads:

// In final response or streaming events
{
  usage: {
    inputTokens: 1234,
    outputTokens: 567,
    cacheReadTokens: 890,
    cacheWriteTokens: 123
  }
}
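
Combining this usage payload with the per-model cost table from the Configuration section gives a cost estimate for a single response. The sketch below is one way to do the arithmetic. `estimateCostUsd` is a hypothetical helper, and it assumes the four counters are disjoint (cached tokens are not also counted in `inputTokens`), which these notes do not confirm.

```typescript
// USD per 1M tokens, as in the model cost configuration.
interface ModelCost {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Hypothetical helper: multiplies each counter by its per-million rate.
function estimateCostUsd(usage: TokenUsage, cost: ModelCost): number {
  const total =
    usage.inputTokens * cost.input +
    usage.outputTokens * cost.output +
    (usage.cacheReadTokens ?? 0) * cost.cacheRead +
    (usage.cacheWriteTokens ?? 0) * cost.cacheWrite;
  return total / 1_000_000;
}
```

For the example values in this section (1234 input, 567 output, 890 cache-read, and 123 cache-write tokens at 3.00 / 15.00 / 0.30 / 3.75 USD per 1M), the terms are 3702 + 8505 + 267 + 461.25 = 12935.25, or roughly 0.0129 USD.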

Authentication

Gateway Token

# Environment variable
CLAWDBOT_GATEWAY_TOKEN=your-secret-token

# Or config file
{
  gateway: {
    auth: {
      mode: "token",
      token: "your-secret-token"
    }
  }
}

HTTP Requests

Authorization: Bearer YOUR_TOKEN

WebSocket Connect

{
  params: {
    auth: { token: "YOUR_TOKEN" }
  }
}

Status Sync

Health Check

// WebSocket
{ type: "req", id: "...", method: "health", params: {} }

# HTTP
curl http://127.0.0.1:18789/health  # Basic health

# CLI
clawdbot health --json              # Detailed health

Status Response

{
  ok: boolean;
  linkedChannel?: string;
  models?: { available: string[] };
  agents?: { configured: string[] };
  presence?: PresenceEntry[];
  uptimeMs?: number;
}

Presence Events

The gateway pushes presence updates to all connected clients:

// Event
{ type: "event", event: "presence", payload: { entries: [...], stateVersion: {...} } }

Comparison with Other Agents

| Aspect | OpenClaw | Claude Code | Codex | OpenCode | Amp |
| --- | --- | --- | --- | --- | --- |
| Process Model | Daemon | Subprocess | Server | Server | Subprocess |
| Protocol | WebSocket + HTTP | CLI JSONL | JSON-RPC stdio | HTTP + SSE | CLI JSONL |
| Session Resume | Session Key | `--resume` | Thread ID | Session ID | `--continue` |
| Multi-Turn | Same session key | Same session ID | Same thread | Same session | Same session |
| Streaming | WS events + SSE | JSONL | Notifications | SSE | JSONL |
| HITL | No | No (headless) | No (SDK) | Yes (SSE) | No |
| SDK | None (protocol) | None (CLI) | Yes | Yes | Closed |

Key Differences

Daemon Model: OpenClaw runs as a persistent gateway service, not a per-turn subprocess
Multi-Protocol: Supports WebSocket, OpenAI-compatible HTTP, OpenResponses, and webhooks
Channel Integration: Built-in WhatsApp/Telegram/Discord/iMessage support
Node System: Mobile/desktop nodes can connect for camera, canvas, location, etc.
No HITL: Like Claude/Codex/Amp, permissions are configured upfront, not interactive

Integration Patterns for sandbox-agent

Recommended: Persistent WebSocket Connection

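import { randomUUID } from "node:crypto";
import WebSocket from "ws"; // Node "ws" package, implied by the .on("message") API

// Minimal shapes inferred from the frames documented in these notes
type AgentEvent = { runId: string; seq: number; stream: string; ts: number; data: Record<string, unknown> };
type AgentResult = { result: unknown; events: AgentEvent[] };
type GatewayFrame = { type: string; id?: string; ok?: boolean; event?: string; payload?: any };
type Pending = { resolve: (result: AgentResult) => void; reject: (error: Error) => void; events: AgentEvent[] };

const CONNECT_ID = "connect"; // request id reserved for the handshake

class OpenClawDriver {
  private ws!: WebSocket;
  private pending = new Map<string, Pending>();
  private handshakeDone?: { resolve: () => void; reject: (error: Error) => void };

  async connect(url: string, token: string) {
    this.ws = new WebSocket(url);
    // Attach the handler before the handshake so no frame is dropped
    this.ws.on("message", (data) => this.handleMessage(JSON.parse(data.toString())));
    await this.handshake(token);
  }

  // Sends the connect frame from "Connection Handshake" and waits for its response
  private handshake(token: string): Promise<void> {
    return new Promise((resolve, reject) => {
      this.handshakeDone = { resolve, reject };
      this.ws.once("error", reject);
      this.ws.once("open", () => {
        this.ws.send(JSON.stringify({
          type: "req",
          id: CONNECT_ID,
          method: "connect",
          params: {
            minProtocol: 3,
            maxProtocol: 3,
            client: { id: "gateway-client", version: "1.0.0", platform: "linux", mode: "backend" },
            role: "operator",
            scopes: ["operator.admin"],
            caps: [],
            auth: { token }
          }
        }));
      });
    });
  }

  async runAgent(params: {
    message: string;
    sessionKey?: string;
    thinking?: string;
  }): Promise<AgentResult> {
    const runId = randomUUID();
    const events: AgentEvent[] = [];

    return new Promise((resolve, reject) => {
      this.pending.set(runId, { resolve, reject, events });
      this.ws.send(JSON.stringify({
        type: "req",
        id: runId,
        method: "agent",
        params: {
          message: params.message,
          sessionKey: params.sessionKey ?? "agent:main:main",
          thinking: params.thinking,
          deliver: false,
          idempotencyKey: runId
        }
      }));
    });
  }

  private handleMessage(frame: GatewayFrame) {
    if (frame.type === "res" && frame.id === CONNECT_ID) {
      if (frame.ok) this.handshakeDone?.resolve();
      else this.handshakeDone?.reject(new Error("Gateway handshake failed"));
    } else if (frame.type === "event" && frame.event === "agent") {
      // Assumes the event runId matches the request id used as the pending key
      const pending = this.pending.get(frame.payload.runId);
      if (pending) pending.events.push(frame.payload);
    } else if (frame.type === "res" && frame.id !== undefined) {
      const pending = this.pending.get(frame.id);
      if (pending && frame.payload?.status === "ok") {
        pending.resolve({ result: frame.payload, events: pending.events });
        this.pending.delete(frame.id);
      } else if (pending && (frame.ok === false || frame.payload?.status === "error")) {
        // Assumption: a failed request may arrive as ok: false without a status field
        pending.reject(new Error(frame.payload?.summary ?? "Agent run failed"));
        this.pending.delete(frame.id);
      }
      // Ignore "accepted" acks
    }
  }
}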

Alternative: HTTP API (Simpler)

async function runOpenClawPrompt(prompt: string, sessionKey?: string) {
  const response = await fetch("http://127.0.0.1:18789/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CLAWDBOT_GATEWAY_TOKEN}`,
      "Content-Type": "application/json",
      "x-clawdbot-session-key": sessionKey ?? "automation:sandbox"
    },
    body: JSON.stringify({
      model: "clawdbot:main",
      messages: [{ role: "user", content: prompt }],
      stream: false
    })
  });
  // Surface HTTP-level failures instead of returning an error body as a result
  if (!response.ok) {
    throw new Error(`OpenClaw request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

Configuration for sandbox-agent Integration

Recommended config for automated use:

{
  gateway: {
    port: 18789,
    auth: {
      mode: "token",
      token: "${CLAWDBOT_GATEWAY_TOKEN}"
    },
    http: {
      endpoints: {
        chatCompletions: { enabled: true },
        responses: { enabled: true }
      }
    }
  },
  agents: {
    defaults: {
      model: {
        primary: "anthropic/claude-sonnet-4-20250514"
      },
      thinking: { level: "low" },
      workspace: "${HOME}/workspace"
    }
  }
}

Notes

OpenClaw is significantly more complex than other agents due to its gateway architecture
The multi-protocol support (WS, OpenAI, OpenResponses, webhooks) provides flexibility
Session management is richer (labels, spawn tracking, model/thinking overrides)
No SDK means direct protocol implementation is required
The daemon model means connection lifecycle management is important (reconnects, etc.)
Agent responses are two-stage: immediate ack + final result (handle both)
Tool policy filtering is configurable per agent/session/group
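
On the connection-lifecycle note above: a driver that holds a long-lived WebSocket usually needs some retry policy for when the gateway restarts. The sketch below shows one common choice, capped exponential backoff. `reconnectDelayMs` and its default values are illustrative assumptions, not something OpenClaw prescribes.

```typescript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
// Doubles from baseMs on each attempt and is capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const exponent = Math.min(Math.max(attempt, 0), 30); // clamp to keep 2 ** exponent bounded
  return Math.min(maxMs, baseMs * 2 ** exponent);
}
```

A driver would typically reset the attempt counter after a successful handshake and add random jitter to the delay when many clients share one gateway.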
