feat: add mock server mode for UI testing

Nathan Flurry 2026-01-27 03:42:41 -08:00
parent f5d1a6383d
commit d24f983e2c
21 changed files with 1108 additions and 848 deletions

@ -34,10 +34,12 @@ Universal schema guidance:
- Do not make breaking changes to API endpoints.
- When changing API routes, ensure the HTTP/SSE test suite has full coverage of every route.
- When agent schema changes, ensure API tests cover the new schema and event shapes end-to-end.
- When the universal schema changes, update mock-mode events to cover the new fields or event types.
- Update `docs/conversion.md` whenever agent-native schema terms, synthetic events, identifier mappings, or conversion logic change.
- Never use synthetic data or mocked responses in tests.
- Never manually write agent types; always use generated types in `resources/agent-schemas/`. If types are broken, fix the generated types.
- The universal schema must provide consistent behavior across providers; avoid requiring frontend/client logic to special-case agents.
- The UI must reflect every field in AgentCapabilities; keep it in sync with the README feature matrix and `agent_capabilities_for`.
- When parsing agent data, if something is unexpected or does not match the schema, bail out and surface the error rather than trying to continue with partial parsing.
- When defining the universal schema, choose the option most compatible with native agent APIs, and add synthetics to fill gaps for other agents.
- Use `docs/glossary.md` as the source of truth for universal schema terminology and keep it updated alongside schema changes.

@ -62,6 +62,10 @@ Docs
### Server
- Install server
- curl (fastest & does not require npm)
- npm i -g (slower)
- npx (for quick runs)
- Run server
- Auth
@ -71,6 +75,10 @@ Docs
Docs
### Tip: Extracting API Keys
TODO: npx command to get API keys
## Project Goals
This project aims to solve 3 problems with agents:

@ -1,5 +1,6 @@
## launch
- examples for daytona, e2b
- provide mock data for validating your rendering
- provides history with all items, then iterates through all items on a stream
- this is a special type of serve function

@ -1,2 +0,0 @@
- openai extracted credentials do not work

167
docs/building-chat-ui.mdx Normal file
@ -0,0 +1,167 @@
---
title: "Building a Chat UI"
description: "Design a client that renders universal session events consistently across providers."
---
This guide explains how to build a chat UI that works across all agents using the universal event
stream.
## High-level flow
1. List agents and read their capabilities.
2. Create a session for the selected agent.
3. Send user messages.
4. Subscribe to events (polling or SSE).
5. Render items and deltas into a stable message timeline.
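The five steps above can be sketched as one function. This is a minimal sketch, not the SDK itself: `FlowClient` is a hypothetical interface whose method names mirror the TypeScript SDK docs (`listAgents`, `createSession`, `postMessage`, `streamEvents`), the signatures are simplified assumptions, and picking the first agent stands in for a real selection UI.

```ts
// Minimal event shape for this sketch; real events carry more fields.
type FlowEvent = { event_type: string; sequence: number };

// Hypothetical client shape. Method names follow the SDK docs; signatures are assumed.
interface FlowClient {
  listAgents(): Promise<{ agents: Array<{ id: string }> }>;
  createSession(sessionId: string, request: { agent: string }): Promise<unknown>;
  postMessage(sessionId: string, request: { message: string }): Promise<unknown>;
  streamEvents(sessionId: string, query: { offset: number }): AsyncIterable<FlowEvent>;
}

async function runChat(
  client: FlowClient,
  sessionId: string,
  text: string,
  onEvent: (event: FlowEvent) => void,
): Promise<void> {
  // 1. List agents (this sketch simply takes the first one).
  const { agents } = await client.listAgents();
  const agent = agents[0];
  if (!agent) throw new Error("no agents available");

  // 2. Create a session, then 3. send the user message.
  await client.createSession(sessionId, { agent: agent.id });
  await client.postMessage(sessionId, { message: text });

  // 4. Subscribe to events and 5. hand each one to the renderer.
  for await (const event of client.streamEvents(sessionId, { offset: 0 })) {
    onEvent(event);
    if (event.event_type === "session.ended") break;
  }
}
```

Passing the renderer in as `onEvent` keeps the transport (SSE or polling) separate from the timeline logic described in the rest of this guide.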
## Use agent capabilities
Capabilities tell you which features are supported for the selected agent:
- `toolCalls` and `toolResults` indicate tool execution events.
- `questions` and `permissions` indicate human-in-the-loop (HITL) flows.
- `planMode` indicates that the agent supports plan-only execution.
Use these to enable or disable UI affordances (tool panels, approval buttons, etc.).
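As a rough illustration, the mapping from capability flags to UI switches can be a single pure function. The flag names below use the camelCase form that the Inspector UI reads (`planMode`, `toolCalls`); the `Capabilities` subset and the `UiAffordances` names are assumptions made for this sketch.

```ts
// Assumed subset of the capability flags; the real type has more fields.
type Capabilities = {
  planMode: boolean;
  permissions: boolean;
  questions: boolean;
  toolCalls: boolean;
  toolResults: boolean;
};

// Hypothetical UI switches; rename these to match your components.
type UiAffordances = {
  showToolPanel: boolean;
  showApprovalButtons: boolean;
  showQuestionPrompts: boolean;
  showPlanToggle: boolean;
};

function affordancesFor(caps: Capabilities): UiAffordances {
  return {
    // Show the tool panel if the agent can emit either side of a tool exchange.
    showToolPanel: caps.toolCalls || caps.toolResults,
    showApprovalButtons: caps.permissions,
    showQuestionPrompts: caps.questions,
    showPlanToggle: caps.planMode,
  };
}
```

Deriving the switches in one place means a new capability flag only needs one new line here rather than checks scattered across components.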
## Event model
Every event includes:
- `event_id`, `sequence`, and `time` for ordering.
- `session_id` for the universal session.
- `native_session_id` for provider-specific debugging.
- `event_type` with one of:
  - `session.started`, `session.ended`
  - `item.started`, `item.delta`, `item.completed`
  - `permission.requested`, `permission.resolved`
  - `question.requested`, `question.resolved`
  - `error`, `agent.unparsed`
- `data` which holds the payload for the event type.
- `synthetic` and `source` to identify events generated by the daemon rather than the agent.
- `raw` (optional) when `include_raw=true`.
## Rendering items
Items are emitted in three phases:
- `item.started`: first snapshot of a message or tool item.
- `item.delta`: incremental updates (token streaming or synthetic deltas).
- `item.completed`: final snapshot.
Recommended render flow:
```ts
type ItemState = {
  item: UniversalItem;
  deltas: string[];
};

const items = new Map<string, ItemState>();
const order: string[] = [];

function applyEvent(event: UniversalEvent) {
  if (event.event_type === "item.started") {
    const item = event.data.item;
    items.set(item.item_id, { item, deltas: [] });
    order.push(item.item_id);
  }

  if (event.event_type === "item.delta") {
    const { item_id, delta } = event.data;
    const state = items.get(item_id);
    if (state) {
      state.deltas.push(delta);
    }
  }

  if (event.event_type === "item.completed") {
    const item = event.data.item;
    const state = items.get(item.item_id);
    if (state) {
      state.item = item;
    }
  }
}
```
When rendering, combine the item content with accumulated deltas. If you receive a delta before a
started event (should not happen), treat it as an error.
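One way to do the combining step, as a sketch under stated assumptions: while an item is in progress, show its snapshot text followed by the accumulated deltas; once it completes, trust the final snapshot alone. The `Sketch*` types, the `completed` flag, and both helper names are hypothetical and are not part of the universal schema.

```ts
// Simplified stand-ins for UniversalItem and its content parts.
type SketchPart = { type: string; text?: string };
type SketchItem = { item_id: string; content: SketchPart[] };
type SketchState = { item: SketchItem; deltas: string[]; completed: boolean };

// Concatenate the text parts of an item snapshot.
function snapshotText(item: SketchItem): string {
  return item.content
    .filter((part) => part.type === "text")
    .map((part) => part.text ?? "")
    .join("");
}

// Text to display for an item at its current point in the lifecycle.
function displayText(state: SketchState): string {
  if (state.completed) {
    // Assumption: the final snapshot already contains everything the deltas streamed.
    return snapshotText(state.item);
  }
  return snapshotText(state.item) + state.deltas.join("");
}

// Look up the state for a delta, failing loudly if no item.started was seen first.
function requireState(items: Map<string, SketchState>, itemId: string): SketchState {
  const state = items.get(itemId);
  if (!state) {
    throw new Error(`item.delta received before item.started for ${itemId}`);
  }
  return state;
}
```

Throwing in `requireState` matches the advice to treat a delta without a preceding started event as an error instead of silently dropping it.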
## Content parts
Each `UniversalItem` has `content` parts. Your UI can branch on `part.type`:
- `text` for normal chat text.
- `tool_call` and `tool_result` for tool execution.
- `file_ref` for file read/write/patch previews.
- `reasoning` if you display public reasoning text.
- `status` for progress updates.
- `image` for image outputs.
Treat `item.kind` as the primary layout decision (message vs tool call vs system), and use content
parts for the detailed rendering.
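Branching on `part.type` maps naturally onto a discriminated union. The sketch below renders each part to a plain string for brevity; the field names on each variant (`call_id`, `output`, `visibility`, and so on) and the `"public"` visibility value are assumptions about the JSON shape and should be checked against the generated types.

```ts
// Assumed part shapes for illustration; prefer the generated SDK types in real code.
type Part =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; arguments: string; call_id: string }
  | { type: "tool_result"; call_id: string; output: string }
  | { type: "file_ref"; path: string; action: string; diff?: string | null }
  | { type: "reasoning"; text: string; visibility: string }
  | { type: "status"; label: string; detail?: string | null }
  | { type: "image"; path: string; mime?: string | null };

// Returns a display string, or null when the part should not be shown.
function describePart(part: Part): string | null {
  switch (part.type) {
    case "text":
      return part.text;
    case "tool_call":
      return `call ${part.name}(${part.arguments})`;
    case "tool_result":
      return `result: ${part.output}`;
    case "file_ref":
      return `${part.action} ${part.path}`;
    case "reasoning":
      // Only public reasoning is displayed.
      return part.visibility === "public" ? `(reasoning) ${part.text}` : null;
    case "status":
      return part.detail ? `${part.label}: ${part.detail}` : part.label;
    case "image":
      return `[image ${part.path}]`;
    default:
      // Part types this UI does not know about yet.
      return "[unsupported part]";
  }
}
```

The `default` branch keeps the UI from crashing when the schema gains a part type that the client has not been updated for.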
## Questions and permissions
Question and permission events are out-of-band from item flow. Render them as modal or inline UI
blocks that must be resolved via:
- `POST /v1/sessions/{session_id}/questions/{question_id}/reply`
- `POST /v1/sessions/{session_id}/questions/{question_id}/reject`
- `POST /v1/sessions/{session_id}/permissions/{permission_id}/reply`
If an agent does not advertise these capabilities, keep those UI controls hidden.
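A small helper can keep the three endpoint paths in one place. This sketch only builds the path; `HitlTarget` and `hitlPath` are hypothetical names, and the request body for each endpoint is deliberately left out because this guide does not specify it.

```ts
// Which pending request the user is resolving, and how.
type HitlTarget =
  | { kind: "question"; questionId: string; action: "reply" | "reject" }
  | { kind: "permission"; permissionId: string; action: "reply" };

// Build the endpoint path for resolving a question or permission request.
function hitlPath(sessionId: string, target: HitlTarget): string {
  const session = encodeURIComponent(sessionId);
  if (target.kind === "question") {
    const question = encodeURIComponent(target.questionId);
    return `/v1/sessions/${session}/questions/${question}/${target.action}`;
  }
  const permission = encodeURIComponent(target.permissionId);
  return `/v1/sessions/${session}/permissions/${permission}/reply`;
}
```

Encoding the identifiers guards against IDs that contain characters such as `/`, which would otherwise change the route.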
## Error and unparsed events
- `error` events are structured failures from the daemon or agent.
- `agent.unparsed` indicates the provider emitted something the converter could not parse.
Treat `agent.unparsed` as a hard failure in development so you can fix converters quickly.
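A possible way to wire this up, assuming the UI has some notion of a development flag: throw on `agent.unparsed` in development and degrade to logging elsewhere. The `FailureEvent` shape, the return values, and the `isDevelopment` parameter are all illustrative assumptions.

```ts
// Minimal event shape for this sketch.
type FailureEvent = { event_type: string; sequence: number; data: unknown };

// Decide how the UI should react to an event that may signal a failure.
function checkFailure(
  event: FailureEvent,
  isDevelopment: boolean,
): "ok" | "show_error" | "log_unparsed" {
  if (event.event_type === "agent.unparsed") {
    if (isDevelopment) {
      // Hard failure: make the converter gap impossible to miss.
      throw new Error(
        `agent.unparsed at sequence ${event.sequence}: ${JSON.stringify(event.data)}`,
      );
    }
    return "log_unparsed";
  }
  return event.event_type === "error" ? "show_error" : "ok";
}
```

Including the sequence number in the message makes it easier to find the matching raw payload when `include_raw=true` is set.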
## Event ordering
Prefer `sequence` for ordering. It is monotonic for a given session. The `time` field is for
timestamps, not ordering.
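If a client mixes polling and SSE, or reconnects mid-stream, it may see the same event twice or out of arrival order. A minimal sketch of merging by `sequence`, assuming each sequence number identifies exactly one event within a session (`mergeBySequence` is a hypothetical helper name):

```ts
type Sequenced = { sequence: number };

// Merge two batches of events, dropping duplicates and sorting by sequence.
function mergeBySequence<T extends Sequenced>(known: T[], incoming: T[]): T[] {
  const bySequence = new Map<number, T>();
  for (const event of known) bySequence.set(event.sequence, event);
  // A later copy of the same sequence replaces the earlier one.
  for (const event of incoming) bySequence.set(event.sequence, event);
  return Array.from(bySequence.values()).sort((a, b) => a.sequence - b.sequence);
}
```

Sorting the whole list on every merge is fine for chat-sized histories; a long-running session might prefer inserting in place.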
## Handling session end
`session.ended` includes the reason and who terminated it. Disable input after a terminal event.
## Optional raw payloads
If you need provider-level debugging, pass `include_raw=true` when streaming or polling events to
receive the `raw` payload for each event.
## SSE vs polling
- SSE gives low-latency updates and simplifies streaming UIs.
- Polling is simpler to debug and works in any environment.
Both yield the same event payloads.
## Mock mode for UI testing
Run the server with `--mock` to emit a looping, feature-complete event history for UI development:
```bash
sandbox-agent server --mock --no-token
```
Behavior in mock mode:
- Sessions emit a fixed history that covers every event type and content part.
- The history repeats in a loop, with ~200ms between events and a ~2s pause between loops.
- `session.started` and `session.ended` are included in every loop so UIs can exercise lifecycle handling.
- `send-message` is accepted but does not change the mock stream.
If your UI stops rendering after `session.ended`, disable that behavior while testing mock mode so the
loop remains visible.
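One way to make that switchable is to route lifecycle events through a small reducer that takes a mock-mode flag. This is a sketch with hypothetical names (`LifecycleState`, `applyLifecycle`, `mockMode`); how the flag is set (query parameter, env var, dev toggle) is up to the UI.

```ts
type LifecycleEvent = { event_type: string };

// stopped: the UI should stop rendering and disable input.
// loopsSeen: how many session.started events have arrived.
type LifecycleState = { stopped: boolean; loopsSeen: number };

function applyLifecycle(
  state: LifecycleState,
  event: LifecycleEvent,
  mockMode: boolean,
): LifecycleState {
  if (event.event_type === "session.started") {
    return { ...state, loopsSeen: state.loopsSeen + 1 };
  }
  if (event.event_type === "session.ended" && !mockMode) {
    // Real sessions are terminal; mock sessions keep looping.
    return { ...state, stopped: true };
  }
  return state;
}
```

Counting loops also gives the UI a cheap way to label or separate repeated mock histories in the timeline.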
## Reference implementation
The [Inspector chat UI](https://github.com/rivet-dev/sandbox-agent/blob/main/frontend/packages/inspector/src/App.tsx)
is a complete reference implementation showing how to build a chat interface using the universal event
stream. It demonstrates session management, event rendering, item lifecycle handling, and HITL approval
flows.

@ -30,7 +30,14 @@
{
"group": "Operations",
"pages": [
"frontend"
"frontend",
"building-chat-ui"
]
},
{
"group": "SDKs",
"pages": [
"sdks/typescript"
]
}
]

130
docs/sdks/typescript.mdx Normal file
@ -0,0 +1,130 @@
---
title: "TypeScript SDK"
description: "Use the generated client to manage sessions and stream events."
---
The TypeScript SDK is generated from the OpenAPI spec that ships with the daemon. It provides a typed
client for sessions, events, and agent operations.
## Install
```bash
npm install sandbox-agent
```
## Create a client
```ts
import { SandboxDaemonClient } from "sandbox-agent";
const client = new SandboxDaemonClient({
baseUrl: "http://127.0.0.1:2468",
token: process.env.SANDBOX_TOKEN,
});
```
Or with the factory helper:
```ts
import { createSandboxDaemonClient } from "sandbox-agent";
const client = createSandboxDaemonClient({
baseUrl: "http://127.0.0.1:2468",
});
```
## Autospawn (Node only)
If you run locally, the SDK can launch the daemon for you.
```ts
import { connectSandboxDaemonClient } from "sandbox-agent";
const client = await connectSandboxDaemonClient({
spawn: { enabled: true },
});
await client.dispose();
```
Autospawn uses the local `sandbox-agent` binary. Install `@sandbox-agent/cli` (recommended) or set
`SANDBOX_AGENT_BIN` to a custom path.
## Sessions and messages
```ts
await client.createSession("demo-session", {
agent: "codex",
agent_mode: "default",
permission_mode: "plan",
});
await client.postMessage("demo-session", { message: "Hello" });
```
List agents and pick a compatible one:
```ts
const agents = await client.listAgents();
const codex = agents.agents.find((agent) => agent.id === "codex");
console.log(codex?.capabilities);
```
## Poll events
```ts
const events = await client.getEvents("demo-session", {
offset: 0,
limit: 200,
include_raw: false,
});
for (const event of events.events) {
console.log(event.event_type, event.data);
}
```
## Stream events (SSE)
```ts
for await (const event of client.streamEvents("demo-session", {
offset: 0,
include_raw: false,
})) {
console.log(event.event_type, event.data);
}
```
The SDK parses `text/event-stream` into `UniversalEvent` objects. If you want full control, use
`getEventsSse()` and parse the stream yourself.
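If you do parse the stream yourself, the core of it is splitting on blank lines and collecting `data:` fields. The sketch below assumes each event's data is a JSON document and that the input contains only complete events; a real reader would need to buffer partial chunks. `parseSseChunk` is a hypothetical helper, not an SDK export.

```ts
// Parse a block of text/event-stream text into JSON payloads.
// Assumes `chunk` ends on an event boundary (no partial events).
function parseSseChunk(chunk: string): unknown[] {
  const events: unknown[] = [];
  // Events are separated by a blank line.
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    const dataLines = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      // Drop the field name and at most one leading space.
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) {
      // Multiple data lines in one event are joined with newlines.
      events.push(JSON.parse(dataLines.join("\n")));
    }
  }
  return events;
}
```

Lines that are not `data:` fields, such as `event:` names or `:` comments used as keep-alives, are ignored by this sketch.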
## Optional raw payloads
Set `include_raw: true` on `getEvents` or `streamEvents` to include the raw provider payload in
`event.raw`. This is useful for debugging and conversion analysis.
## Error handling
All HTTP errors throw `SandboxDaemonError`:
```ts
import { SandboxDaemonError } from "sandbox-agent";
try {
await client.postMessage("missing-session", { message: "Hi" });
} catch (error) {
if (error instanceof SandboxDaemonError) {
console.error(error.status, error.problem);
}
}
```
## Types
The SDK exports OpenAPI-derived types for events, items, and capabilities:
```ts
import type { UniversalEvent, UniversalItem, AgentCapabilities } from "sandbox-agent";
```
See `docs/universal-api.mdx` for the universal schema fields and semantics.

@ -1314,6 +1314,37 @@
white-space: nowrap;
}
/* Capability Badges */
.capability-badges {
display: flex;
flex-wrap: wrap;
gap: 6px;
}
.capability-badge {
display: inline-flex;
align-items: center;
gap: 4px;
padding: 3px 8px;
border-radius: 4px;
font-size: 10px;
font-weight: 500;
}
.capability-badge.enabled {
background: rgba(48, 209, 88, 0.12);
color: var(--success);
}
.capability-badge.disabled {
background: rgba(255, 255, 255, 0.04);
color: var(--muted-2);
}
.capability-badge svg {
flex-shrink: 0;
}
/* Scrollbar */
.messages-container::-webkit-scrollbar,
.debug-content::-webkit-scrollbar {

@ -2,6 +2,7 @@ import {
Clipboard,
Cloud,
Download,
GitBranch,
HelpCircle,
MessageSquare,
PauseCircle,
@ -11,6 +12,7 @@ import {
Send,
Shield,
Terminal,
Wrench,
Zap
} from "lucide-react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
@ -85,14 +87,24 @@ const formatJson = (value: unknown) => {
const escapeSingleQuotes = (value: string) => value.replace(/'/g, `'\\''`);
const formatCapabilities = (capabilities: AgentCapabilities) => {
const parts = [
`planMode ${capabilities.planMode ? "✓" : "—"}`,
`permissions ${capabilities.permissions ? "✓" : "—"}`,
`questions ${capabilities.questions ? "✓" : "—"}`,
`toolCalls ${capabilities.toolCalls ? "✓" : "—"}`
const CapabilityBadges = ({ capabilities }: { capabilities: AgentCapabilities }) => {
const items = [
{ key: "planMode", label: "Plan", icon: GitBranch, enabled: capabilities.planMode },
{ key: "permissions", label: "Perms", icon: Shield, enabled: capabilities.permissions },
{ key: "questions", label: "Q&A", icon: HelpCircle, enabled: capabilities.questions },
{ key: "toolCalls", label: "Tools", icon: Wrench, enabled: capabilities.toolCalls }
];
return parts.join(" · ");
return (
<div className="capability-badges">
{items.map(({ key, label, icon: Icon, enabled }) => (
<span key={key} className={`capability-badge ${enabled ? "enabled" : "disabled"}`}>
<Icon size={12} />
<span>{label}</span>
</span>
))}
</div>
);
};
const buildCurl = (method: string, url: string, body?: string, token?: string) => {
@ -1459,8 +1471,8 @@ export default function App() {
{agent.version ? `v${agent.version}` : "Version unknown"}
{agent.path && <span className="mono muted" style={{ marginLeft: 8 }}>{agent.path}</span>}
</div>
<div className="card-meta" style={{ marginTop: 8 }}>
Capabilities: {formatCapabilities(agent.capabilities ?? emptyCapabilities)}
<div style={{ marginTop: 8 }}>
<CapabilityBadges capabilities={agent.capabilities ?? emptyCapabilities} />
</div>
{modesByAgent[agent.id] && modesByAgent[agent.id].length > 0 && (
<div className="card-meta" style={{ marginTop: 8 }}>

@ -5,6 +5,12 @@ export default defineConfig(({ command }) => ({
base: command === "build" ? "/ui/" : "/",
plugins: [react()],
server: {
port: 5173
}
port: 5173,
proxy: {
"/v1": {
target: "http://localhost:2468",
changeOrigin: true,
},
},
},
}));

@ -7,7 +7,8 @@
"build": "turbo run build",
"dev": "turbo run dev --parallel",
"generate": "turbo run generate",
"typecheck": "turbo run typecheck"
"typecheck": "turbo run typecheck",
"docs": "pnpm dlx mintlify dev docs"
},
"devDependencies": {
"turbo": "^2.4.0"

@ -11,7 +11,7 @@ use sandbox_agent_agent_management::credentials::{
ProviderCredentials,
};
use sandbox_agent::router::{
AgentInstallRequest, AppState, AuthConfig, CreateSessionRequest, MessageRequest,
AgentInstallRequest, AppState, AuthConfig, CreateSessionRequest, MessageRequest, MockConfig,
PermissionReply, PermissionReplyRequest, QuestionReplyRequest,
};
use sandbox_agent::router::{AgentListResponse, AgentModesResponse, CreateSessionResponse, EventsResponse};
@ -72,6 +72,9 @@ struct ServerArgs {
#[arg(long = "cors-allow-credentials", short = 'C')]
cors_allow_credentials: bool,
#[arg(long)]
mock: bool,
}
#[derive(Args, Debug)]
@ -334,7 +337,12 @@ fn run_server(cli: &Cli, server: &ServerArgs) -> Result<(), CliError> {
let agent_manager =
AgentManager::new(default_install_dir()).map_err(|err| CliError::Server(err.to_string()))?;
let state = AppState::new(auth, agent_manager);
let mock = if server.mock {
MockConfig::enabled()
} else {
MockConfig::disabled()
};
let state = AppState::new(auth, agent_manager, mock);
let mut router = build_router(state);
if let Some(cors) = build_cors_layer(server)? {

@ -65,6 +65,34 @@ use sandbox_agent_agent_management::credentials::{
};
use crate::ui;
const MOCK_EVENT_DELAY_MS: u64 = 200;
const MOCK_LOOP_DELAY_MS: u64 = 2000;
#[derive(Debug, Clone)]
pub struct MockConfig {
enabled: bool,
event_delay: Duration,
loop_delay: Duration,
}
impl MockConfig {
pub fn disabled() -> Self {
Self {
enabled: false,
event_delay: Duration::ZERO,
loop_delay: Duration::ZERO,
}
}
pub fn enabled() -> Self {
Self {
enabled: true,
event_delay: Duration::from_millis(MOCK_EVENT_DELAY_MS),
loop_delay: Duration::from_millis(MOCK_LOOP_DELAY_MS),
}
}
}
#[derive(Debug)]
pub struct AppState {
auth: AuthConfig,
@ -73,9 +101,9 @@ pub struct AppState {
}
impl AppState {
pub fn new(auth: AuthConfig, agent_manager: AgentManager) -> Self {
pub fn new(auth: AuthConfig, agent_manager: AgentManager, mock: MockConfig) -> Self {
let agent_manager = Arc::new(agent_manager);
let session_manager = Arc::new(SessionManager::new(agent_manager.clone()));
let session_manager = Arc::new(SessionManager::new(agent_manager.clone(), mock));
Self {
auth,
agent_manager,
@ -565,6 +593,7 @@ struct SessionManager {
sessions: Mutex<HashMap<String, SessionState>>,
opencode_server: Mutex<Option<OpencodeServer>>,
http_client: Client,
mock: MockConfig,
}
#[derive(Debug)]
@ -580,12 +609,13 @@ struct SessionSubscription {
}
impl SessionManager {
fn new(agent_manager: Arc<AgentManager>) -> Self {
fn new(agent_manager: Arc<AgentManager>, mock: MockConfig) -> Self {
Self {
agent_manager,
sessions: Mutex::new(HashMap::new()),
opencode_server: Mutex::new(None),
http_client: Client::new(),
mock,
}
}
@ -602,6 +632,10 @@ impl SessionManager {
}
}
if self.mock.enabled {
return self.create_mock_session(session_id, agent_id, request).await;
}
let manager = self.agent_manager.clone();
let agent_version = request.agent_version.clone();
let agent_name = request.agent.clone();
@ -660,6 +694,32 @@ impl SessionManager {
})
}
async fn create_mock_session(
self: &Arc<Self>,
session_id: String,
agent_id: AgentId,
request: CreateSessionRequest,
) -> Result<CreateSessionResponse, SandboxError> {
let mut session = SessionState::new(session_id.clone(), agent_id, &request)?;
session.native_session_id = Some(format!("mock-{session_id}"));
let native_session_id = session.native_session_id.clone();
let mut sessions = self.sessions.lock().await;
sessions.insert(session_id.clone(), session);
drop(sessions);
let manager = Arc::clone(self);
tokio::spawn(async move {
manager.run_mock_loop(session_id).await;
});
Ok(CreateSessionResponse {
healthy: true,
error: None,
native_session_id,
})
}
async fn agent_modes(&self, agent: AgentId) -> Result<Vec<AgentModeInfo>, SandboxError> {
if agent != AgentId::Opencode {
return Ok(agent_modes_for(agent));
@ -683,6 +743,11 @@ impl SessionManager {
session_id: String,
message: String,
) -> Result<(), SandboxError> {
if self.mock.enabled {
self.session_snapshot(&session_id, false).await?;
return Ok(());
}
let session_snapshot = self.session_snapshot(&session_id, false).await?;
if session_snapshot.agent == AgentId::Opencode {
self.ensure_opencode_stream(session_id.clone()).await?;
@ -833,6 +898,43 @@ impl SessionManager {
question_id: &str,
answers: Vec<Vec<String>>,
) -> Result<(), SandboxError> {
if self.mock.enabled {
let pending = {
let mut sessions = self.sessions.lock().await;
let session = sessions.get_mut(session_id).ok_or_else(|| SandboxError::SessionNotFound {
session_id: session_id.to_string(),
})?;
if let Some(err) = session.ended_error() {
return Err(err);
}
session.take_question(question_id)
};
let (prompt, options) = match pending {
Some(pending) => (pending.prompt, pending.options),
None => (
"Mock question prompt".to_string(),
vec!["Option A".to_string(), "Option B".to_string()],
),
};
let response = answers
.first()
.and_then(|inner| inner.first())
.cloned();
let resolved = EventConversion::new(
UniversalEventType::QuestionResolved,
UniversalEventData::Question(QuestionEventData {
question_id: question_id.to_string(),
prompt,
options,
response,
status: QuestionStatus::Answered,
}),
)
.synthetic();
let _ = self.record_conversions(session_id, vec![resolved]).await;
return Ok(());
}
let (agent, native_session_id, pending_question) = {
let mut sessions = self.sessions.lock().await;
let session = sessions.get_mut(session_id).ok_or_else(|| SandboxError::SessionNotFound {
@ -891,6 +993,39 @@ impl SessionManager {
session_id: &str,
question_id: &str,
) -> Result<(), SandboxError> {
if self.mock.enabled {
let pending = {
let mut sessions = self.sessions.lock().await;
let session = sessions.get_mut(session_id).ok_or_else(|| SandboxError::SessionNotFound {
session_id: session_id.to_string(),
})?;
if let Some(err) = session.ended_error() {
return Err(err);
}
session.take_question(question_id)
};
let (prompt, options) = match pending {
Some(pending) => (pending.prompt, pending.options),
None => (
"Mock question prompt".to_string(),
vec!["Option A".to_string(), "Option B".to_string()],
),
};
let resolved = EventConversion::new(
UniversalEventType::QuestionResolved,
UniversalEventData::Question(QuestionEventData {
question_id: question_id.to_string(),
prompt,
options,
response: None,
status: QuestionStatus::Rejected,
}),
)
.synthetic();
let _ = self.record_conversions(session_id, vec![resolved]).await;
return Ok(());
}
let (agent, native_session_id, pending_question) = {
let mut sessions = self.sessions.lock().await;
let session = sessions.get_mut(session_id).ok_or_else(|| SandboxError::SessionNotFound {
@ -945,6 +1080,40 @@ impl SessionManager {
permission_id: &str,
reply: PermissionReply,
) -> Result<(), SandboxError> {
if self.mock.enabled {
let pending = {
let mut sessions = self.sessions.lock().await;
let session = sessions.get_mut(session_id).ok_or_else(|| SandboxError::SessionNotFound {
session_id: session_id.to_string(),
})?;
if let Some(err) = session.ended_error() {
return Err(err);
}
session.take_permission(permission_id)
};
let (action, metadata) = match pending {
Some(pending) => (pending.action, pending.metadata),
None => ("mock.permission".to_string(), None),
};
let status = match reply {
PermissionReply::Reject => PermissionStatus::Denied,
PermissionReply::Once | PermissionReply::Always => PermissionStatus::Approved,
};
let resolved = EventConversion::new(
UniversalEventType::PermissionResolved,
UniversalEventData::Permission(PermissionEventData {
permission_id: permission_id.to_string(),
action,
status,
metadata,
}),
)
.synthetic();
let _ = self.record_conversions(session_id, vec![resolved]).await;
return Ok(());
}
let reply_for_status = reply.clone();
let (agent, native_session_id, codex_sender, pending_permission) = {
let mut sessions = self.sessions.lock().await;
@ -1055,6 +1224,50 @@ impl SessionManager {
Ok(())
}
async fn run_mock_loop(self: Arc<Self>, session_id: String) {
let mut cycle = 0_u64;
let event_delay = self.mock.event_delay;
let loop_delay = self.mock.loop_delay;
loop {
if self.is_session_ended(&session_id).await {
return;
}
let snapshot = match self.session_snapshot(&session_id, true).await {
Ok(snapshot) => snapshot,
Err(_) => return,
};
cycle = cycle.saturating_add(1);
let conversions = mock_event_conversions(cycle, &snapshot);
for conversion in conversions {
if self
.record_conversions(&session_id, vec![conversion])
.await
.is_err()
{
return;
}
if event_delay != Duration::ZERO {
sleep(event_delay).await;
}
if self.is_session_ended(&session_id).await {
return;
}
}
if loop_delay != Duration::ZERO {
sleep(loop_delay).await;
}
}
}
async fn is_session_ended(&self, session_id: &str) -> bool {
let sessions = self.sessions.lock().await;
match sessions.get(session_id) {
Some(session) => session.ended,
None => true,
}
}
async fn session_snapshot(
&self,
session_id: &str,
@ -1756,10 +1969,22 @@ pub struct AgentModesResponse {
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct AgentCapabilities {
// TODO: add agent-agnostic tests that cover every capability flag here.
pub plan_mode: bool,
pub permissions: bool,
pub questions: bool,
pub tool_calls: bool,
pub tool_results: bool,
pub text_messages: bool,
pub images: bool,
pub file_attachments: bool,
pub session_lifecycle: bool,
pub error_events: bool,
pub reasoning: bool,
pub command_execution: bool,
pub file_changes: bool,
pub mcp_tools: bool,
pub streaming_deltas: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema, JsonSchema)]
@ -2246,24 +2471,68 @@ fn agent_capabilities_for(agent: AgentId) -> AgentCapabilities {
permissions: false,
questions: false,
tool_calls: false,
tool_results: false,
text_messages: true,
images: false,
file_attachments: false,
session_lifecycle: false,
error_events: false,
reasoning: false,
command_execution: false,
file_changes: false,
mcp_tools: false,
streaming_deltas: false,
},
AgentId::Codex => AgentCapabilities {
plan_mode: true,
permissions: true,
questions: false,
tool_calls: true,
tool_results: true,
text_messages: true,
images: true,
file_attachments: true,
session_lifecycle: true,
error_events: true,
reasoning: true,
command_execution: true,
file_changes: true,
mcp_tools: true,
streaming_deltas: true,
},
AgentId::Opencode => AgentCapabilities {
plan_mode: false,
permissions: false,
questions: false,
tool_calls: true,
tool_results: true,
text_messages: true,
images: true,
file_attachments: true,
session_lifecycle: true,
error_events: true,
reasoning: false,
command_execution: false,
file_changes: false,
mcp_tools: false,
streaming_deltas: true,
},
AgentId::Amp => AgentCapabilities {
plan_mode: false,
permissions: false,
questions: false,
tool_calls: true,
tool_results: true,
text_messages: true,
images: false,
file_attachments: false,
session_lifecycle: false,
error_events: true,
reasoning: false,
command_execution: false,
file_changes: false,
mcp_tools: false,
streaming_deltas: false,
},
}
}
@ -3172,6 +3441,441 @@ fn text_delta_from_parts(parts: &[ContentPart]) -> Option<String> {
}
}
fn mock_event_conversions(loop_index: u64, session: &SessionSnapshot) -> Vec<EventConversion> {
let prefix = format!("mock_{loop_index}");
let system_native = format!("{prefix}_system");
let user_native = format!("{prefix}_user");
let assistant_native = format!("{prefix}_assistant");
let status_native = format!("{prefix}_status");
let tool_call_native = format!("{prefix}_tool_call");
let tool_result_native = format!("{prefix}_tool_result");
let image_native = format!("{prefix}_image");
let unknown_native = format!("{prefix}_unknown");
let permission_id = format!("{prefix}_permission");
let permission_deny_id = format!("{prefix}_permission_denied");
let question_id = format!("{prefix}_question");
let question_reject_id = format!("{prefix}_question_reject");
let call_id = format!("{prefix}_call");
let metadata = json!({
"agent": session.agent.as_str(),
"agentMode": session.agent_mode.clone(),
"permissionMode": session.permission_mode.clone(),
"model": session.model.clone(),
"variant": session.variant.clone(),
"mockCycle": loop_index,
});
let mut events = Vec::new();
events.push(
EventConversion::new(
UniversalEventType::SessionStarted,
UniversalEventData::SessionStarted(SessionStartedData {
metadata: Some(metadata),
}),
)
.synthetic(),
);
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
system_native.clone(),
ItemKind::System,
ItemRole::System,
ItemStatus::InProgress,
vec![ContentPart::Text {
text: "System ready for mock events.".to_string(),
}],
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
system_native,
ItemKind::System,
ItemRole::System,
ItemStatus::Completed,
vec![ContentPart::Text {
text: "System ready for mock events.".to_string(),
}],
),
));
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
user_native.clone(),
ItemKind::Message,
ItemRole::User,
ItemStatus::InProgress,
vec![ContentPart::Text {
text: "User: run the mock pipeline.".to_string(),
}],
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
user_native,
ItemKind::Message,
ItemRole::User,
ItemStatus::Completed,
vec![ContentPart::Text {
text: "User: run the mock pipeline.".to_string(),
}],
),
));
let assistant_parts = vec![
ContentPart::Text {
text: "Mock assistant response with rich content.".to_string(),
},
ContentPart::Reasoning {
text: "Public reasoning for display.".to_string(),
visibility: ReasoningVisibility::Public,
},
ContentPart::Reasoning {
text: "Private reasoning hidden by default.".to_string(),
visibility: ReasoningVisibility::Private,
},
ContentPart::Json {
json: json!({
"stage": "analysis",
"ok": true,
"cycle": loop_index
}),
},
];
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
assistant_native.clone(),
ItemKind::Message,
ItemRole::Assistant,
ItemStatus::InProgress,
assistant_parts.clone(),
),
));
events.push(mock_delta(assistant_native.clone(), "Mock assistant "));
events.push(mock_delta(assistant_native.clone(), "streaming delta."));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
assistant_native,
ItemKind::Message,
ItemRole::Assistant,
ItemStatus::Completed,
assistant_parts,
),
));
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
status_native.clone(),
ItemKind::Status,
ItemRole::Assistant,
ItemStatus::InProgress,
vec![ContentPart::Status {
label: "Indexing".to_string(),
detail: Some("2 files".to_string()),
}],
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
status_native,
ItemKind::Status,
ItemRole::Assistant,
ItemStatus::Completed,
vec![ContentPart::Status {
label: "Indexing".to_string(),
detail: Some("Done".to_string()),
}],
),
));
let tool_call_part = ContentPart::ToolCall {
name: "mock.search".to_string(),
arguments: "{\"query\":\"example\"}".to_string(),
call_id: call_id.clone(),
};
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
tool_call_native.clone(),
ItemKind::ToolCall,
ItemRole::Assistant,
ItemStatus::InProgress,
vec![tool_call_part.clone()],
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
tool_call_native,
ItemKind::ToolCall,
ItemRole::Assistant,
ItemStatus::Completed,
vec![tool_call_part],
),
));
let tool_result_parts = vec![
ContentPart::ToolResult {
call_id: call_id.clone(),
output: "mock search results".to_string(),
},
ContentPart::FileRef {
path: format!("{prefix}/readme.md"),
action: FileAction::Read,
diff: None,
},
ContentPart::FileRef {
path: format!("{prefix}/output.txt"),
action: FileAction::Write,
diff: Some("+mock output\n".to_string()),
},
ContentPart::FileRef {
path: format!("{prefix}/patch.txt"),
action: FileAction::Patch,
diff: Some("@@ -1,1 +1,1 @@\n-old\n+new\n".to_string()),
},
];
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
tool_result_native.clone(),
ItemKind::ToolResult,
ItemRole::Tool,
ItemStatus::InProgress,
tool_result_parts.clone(),
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
tool_result_native,
ItemKind::ToolResult,
ItemRole::Tool,
ItemStatus::Failed,
tool_result_parts,
),
));
let image_parts = vec![
ContentPart::Text {
text: "Here is a mock image output.".to_string(),
},
ContentPart::Image {
path: format!("{prefix}/image.png"),
mime: Some("image/png".to_string()),
},
];
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
image_native.clone(),
ItemKind::Message,
ItemRole::Assistant,
ItemStatus::InProgress,
image_parts.clone(),
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
image_native,
ItemKind::Message,
ItemRole::Assistant,
ItemStatus::Completed,
image_parts,
),
));
events.push(mock_item_event(
UniversalEventType::ItemStarted,
mock_item(
unknown_native.clone(),
ItemKind::Unknown,
ItemRole::Assistant,
ItemStatus::InProgress,
vec![ContentPart::Text {
text: "Unknown item kind example.".to_string(),
}],
),
));
events.push(mock_item_event(
UniversalEventType::ItemCompleted,
mock_item(
unknown_native,
ItemKind::Unknown,
ItemRole::Assistant,
ItemStatus::Completed,
vec![ContentPart::Text {
text: "Unknown item kind example.".to_string(),
}],
),
));
let permission_metadata = json!({
"codexRequestKind": "commandExecution",
"command": "echo mock"
});
events.push(EventConversion::new(
UniversalEventType::PermissionRequested,
UniversalEventData::Permission(PermissionEventData {
permission_id: permission_id.clone(),
action: "command_execution".to_string(),
status: PermissionStatus::Requested,
metadata: Some(permission_metadata),
}),
));
events.push(EventConversion::new(
UniversalEventType::PermissionResolved,
UniversalEventData::Permission(PermissionEventData {
permission_id: permission_id,
action: "command_execution".to_string(),
status: PermissionStatus::Approved,
metadata: None,
}),
));
let permission_metadata_deny = json!({
"codexRequestKind": "fileChange",
"path": format!("{prefix}/deny.txt")
});
events.push(EventConversion::new(
UniversalEventType::PermissionRequested,
UniversalEventData::Permission(PermissionEventData {
permission_id: permission_deny_id.clone(),
action: "file_change".to_string(),
status: PermissionStatus::Requested,
metadata: Some(permission_metadata_deny),
}),
));
events.push(EventConversion::new(
UniversalEventType::PermissionResolved,
UniversalEventData::Permission(PermissionEventData {
permission_id: permission_deny_id,
action: "file_change".to_string(),
status: PermissionStatus::Denied,
metadata: None,
}),
));
events.push(EventConversion::new(
UniversalEventType::QuestionRequested,
UniversalEventData::Question(QuestionEventData {
question_id: question_id.clone(),
prompt: "Choose a color".to_string(),
options: vec!["Red".to_string(), "Blue".to_string()],
response: None,
status: QuestionStatus::Requested,
}),
));
events.push(EventConversion::new(
UniversalEventType::QuestionResolved,
UniversalEventData::Question(QuestionEventData {
question_id,
prompt: "Choose a color".to_string(),
options: vec!["Red".to_string(), "Blue".to_string()],
response: Some("Blue".to_string()),
status: QuestionStatus::Answered,
}),
));
events.push(EventConversion::new(
UniversalEventType::QuestionRequested,
UniversalEventData::Question(QuestionEventData {
question_id: question_reject_id.clone(),
prompt: "Allow mock experiment?".to_string(),
options: vec!["Yes".to_string(), "No".to_string()],
response: None,
status: QuestionStatus::Requested,
}),
));
events.push(EventConversion::new(
UniversalEventType::QuestionResolved,
UniversalEventData::Question(QuestionEventData {
question_id: question_reject_id,
prompt: "Allow mock experiment?".to_string(),
options: vec!["Yes".to_string(), "No".to_string()],
response: None,
status: QuestionStatus::Rejected,
}),
));
events.push(
EventConversion::new(
UniversalEventType::Error,
UniversalEventData::Error(ErrorData {
message: "Mock error event.".to_string(),
code: Some("mock_error".to_string()),
details: Some(json!({ "cycle": loop_index })),
}),
)
.synthetic(),
);
events.push(agent_unparsed(
"mock.stream",
"unsupported payload",
json!({ "raw": "mock" }),
));
events.push(
EventConversion::new(
UniversalEventType::SessionEnded,
UniversalEventData::SessionEnded(SessionEndedData {
reason: SessionEndReason::Completed,
terminated_by: TerminatedBy::Agent,
}),
)
.synthetic(),
);
events
}
fn mock_item(
native_item_id: String,
kind: ItemKind,
role: ItemRole,
status: ItemStatus,
content: Vec<ContentPart>,
) -> UniversalItem {
UniversalItem {
item_id: String::new(),
native_item_id: Some(native_item_id),
parent_id: None,
kind,
role: Some(role),
content,
status,
}
}
fn mock_item_event(event_type: UniversalEventType, item: UniversalItem) -> EventConversion {
EventConversion::new(
event_type,
UniversalEventData::Item(ItemEventData { item }),
)
}
fn mock_delta(native_item_id: String, delta: &str) -> EventConversion {
EventConversion::new(
UniversalEventType::ItemDelta,
UniversalEventData::ItemDelta(ItemDeltaData {
item_id: String::new(),
native_item_id: Some(native_item_id),
delta: delta.to_string(),
}),
)
}
fn agent_unparsed(location: &str, error: &str, raw: Value) -> EventConversion {
EventConversion::new(
UniversalEventType::AgentUnparsed,

View file

@ -17,6 +17,7 @@ use sandbox_agent::router::{
AgentCapabilities,
AgentListResponse,
AuthConfig,
MockConfig,
};
const PROMPT: &str = "Reply with exactly the single word OK.";
@ -41,7 +42,11 @@ impl TestApp {
let install_dir = tempfile::tempdir().expect("create temp install dir");
let manager = AgentManager::new(install_dir.path())
.expect("create agent manager");
let state = sandbox_agent::router::AppState::new(AuthConfig::disabled(), manager);
let state = sandbox_agent::router::AppState::new(
AuthConfig::disabled(),
manager,
MockConfig::disabled(),
);
let app = build_router(state);
Self {
app,

View file

@ -12,7 +12,7 @@ use tempfile::TempDir;
use sandbox_agent_agent_management::agents::{AgentId, AgentManager};
use sandbox_agent_agent_management::testing::{test_agents_from_env, TestAgentConfig};
use sandbox_agent_agent_credentials::ExtractedCredentials;
use sandbox_agent::router::{build_router, AppState, AuthConfig};
use sandbox_agent::router::{build_router, AppState, AuthConfig, MockConfig};
use tower::util::ServiceExt;
use tower_http::cors::CorsLayer;
@ -39,7 +39,7 @@ impl TestApp {
let install_dir = tempfile::tempdir().expect("create temp install dir");
let manager = AgentManager::new(install_dir.path())
.expect("create agent manager");
let state = AppState::new(auth, manager);
let state = AppState::new(auth, manager, MockConfig::disabled());
let mut app = build_router(state);
if let Some(cors) = cors {
app = app.layer(cors);

View file

@ -2,7 +2,7 @@ use axum::body::Body;
use axum::http::{Request, StatusCode};
use http_body_util::BodyExt;
use sandbox_agent_agent_management::agents::AgentManager;
use sandbox_agent::router::{build_router, AppState, AuthConfig};
use sandbox_agent::router::{build_router, AppState, AuthConfig, MockConfig};
use sandbox_agent::ui;
use tempfile::TempDir;
use tower::util::ServiceExt;
@ -15,7 +15,7 @@ async fn serves_inspector_ui() {
let install_dir = TempDir::new().expect("create temp install dir");
let manager = AgentManager::new(install_dir.path()).expect("create agent manager");
let state = AppState::new(AuthConfig::disabled(), manager);
let state = AppState::new(AuthConfig::disabled(), manager, MockConfig::disabled());
let app = build_router(state);
let request = Request::builder()

View file

@ -1,5 +0,0 @@
# Open Questions / Ambiguities
- OpenCode server HTTP paths and payloads may differ; current implementation assumes `POST /session`, `POST /session/{id}/prompt`, and `GET /event/subscribe` with JSON `data:` SSE frames.
- OpenCode question/permission reply endpoints are assumed as `POST /question/reply`, `/question/reject`, `/permission/reply` with `requestID` fields; confirm actual API shape.
- SSE events may not always include `sessionID`/`sessionId` fields; confirm if filtering should use a different field.

View file

@ -1,7 +0,0 @@
# Required Tests
- Session manager streams JSONL line-by-line for Claude/Codex/Amp and yields incremental events.
- `/sessions/{id}/messages` returns immediately while background ingestion populates `/events` and `/events/sse`.
- SSE subscription delivers live events after the initial offset batch.
- OpenCode server mode: create session, send prompt, and receive SSE events filtered to the session.
- OpenCode question/permission reply endpoints forward to server APIs.

View file

@ -1,553 +0,0 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "UniversalEvent",
"type": "object",
"required": [
"data",
"event_id",
"sequence",
"session_id",
"source",
"synthetic",
"time",
"type"
],
"properties": {
"data": {
"$ref": "#/definitions/UniversalEventData"
},
"event_id": {
"type": "string"
},
"native_session_id": {
"type": [
"string",
"null"
]
},
"raw": true,
"sequence": {
"type": "integer",
"format": "uint64",
"minimum": 0.0
},
"session_id": {
"type": "string"
},
"source": {
"$ref": "#/definitions/EventSource"
},
"synthetic": {
"type": "boolean"
},
"time": {
"type": "string"
},
"type": {
"$ref": "#/definitions/UniversalEventType"
}
},
"definitions": {
"AgentUnparsedData": {
"type": "object",
"required": [
"error",
"location"
],
"properties": {
"error": {
"type": "string"
},
"location": {
"type": "string"
},
"raw_hash": {
"type": [
"string",
"null"
]
}
}
},
"ContentPart": {
"oneOf": [
{
"type": "object",
"required": [
"text",
"type"
],
"properties": {
"text": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"text"
]
}
}
},
{
"type": "object",
"required": [
"json",
"type"
],
"properties": {
"json": true,
"type": {
"type": "string",
"enum": [
"json"
]
}
}
},
{
"type": "object",
"required": [
"arguments",
"call_id",
"name",
"type"
],
"properties": {
"arguments": {
"type": "string"
},
"call_id": {
"type": "string"
},
"name": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"tool_call"
]
}
}
},
{
"type": "object",
"required": [
"call_id",
"output",
"type"
],
"properties": {
"call_id": {
"type": "string"
},
"output": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"tool_result"
]
}
}
},
{
"type": "object",
"required": [
"action",
"path",
"type"
],
"properties": {
"action": {
"$ref": "#/definitions/FileAction"
},
"diff": {
"type": [
"string",
"null"
]
},
"path": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"file_ref"
]
}
}
},
{
"type": "object",
"required": [
"text",
"type",
"visibility"
],
"properties": {
"text": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"reasoning"
]
},
"visibility": {
"$ref": "#/definitions/ReasoningVisibility"
}
}
},
{
"type": "object",
"required": [
"path",
"type"
],
"properties": {
"mime": {
"type": [
"string",
"null"
]
},
"path": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"image"
]
}
}
},
{
"type": "object",
"required": [
"label",
"type"
],
"properties": {
"detail": {
"type": [
"string",
"null"
]
},
"label": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"status"
]
}
}
}
]
},
"ErrorData": {
"type": "object",
"required": [
"message"
],
"properties": {
"code": {
"type": [
"string",
"null"
]
},
"details": true,
"message": {
"type": "string"
}
}
},
"EventSource": {
"type": "string",
"enum": [
"agent",
"daemon"
]
},
"FileAction": {
"type": "string",
"enum": [
"read",
"write",
"patch"
]
},
"ItemDeltaData": {
"type": "object",
"required": [
"delta",
"item_id"
],
"properties": {
"delta": {
"type": "string"
},
"item_id": {
"type": "string"
},
"native_item_id": {
"type": [
"string",
"null"
]
}
}
},
"ItemEventData": {
"type": "object",
"required": [
"item"
],
"properties": {
"item": {
"$ref": "#/definitions/UniversalItem"
}
}
},
"ItemKind": {
"type": "string",
"enum": [
"message",
"tool_call",
"tool_result",
"system",
"status",
"unknown"
]
},
"ItemRole": {
"type": "string",
"enum": [
"user",
"assistant",
"system",
"tool"
]
},
"ItemStatus": {
"type": "string",
"enum": [
"in_progress",
"completed",
"failed"
]
},
"PermissionEventData": {
"type": "object",
"required": [
"action",
"permission_id",
"status"
],
"properties": {
"action": {
"type": "string"
},
"metadata": true,
"permission_id": {
"type": "string"
},
"status": {
"$ref": "#/definitions/PermissionStatus"
}
}
},
"PermissionStatus": {
"type": "string",
"enum": [
"requested",
"approved",
"denied"
]
},
"QuestionEventData": {
"type": "object",
"required": [
"options",
"prompt",
"question_id",
"status"
],
"properties": {
"options": {
"type": "array",
"items": {
"type": "string"
}
},
"prompt": {
"type": "string"
},
"question_id": {
"type": "string"
},
"response": {
"type": [
"string",
"null"
]
},
"status": {
"$ref": "#/definitions/QuestionStatus"
}
}
},
"QuestionStatus": {
"type": "string",
"enum": [
"requested",
"answered",
"rejected"
]
},
"ReasoningVisibility": {
"type": "string",
"enum": [
"public",
"private"
]
},
"SessionEndReason": {
"type": "string",
"enum": [
"completed",
"error",
"terminated"
]
},
"SessionEndedData": {
"type": "object",
"required": [
"reason",
"terminated_by"
],
"properties": {
"reason": {
"$ref": "#/definitions/SessionEndReason"
},
"terminated_by": {
"$ref": "#/definitions/TerminatedBy"
}
}
},
"SessionStartedData": {
"type": "object",
"properties": {
"metadata": true
}
},
"TerminatedBy": {
"type": "string",
"enum": [
"agent",
"daemon"
]
},
"UniversalEventData": {
"anyOf": [
{
"$ref": "#/definitions/SessionStartedData"
},
{
"$ref": "#/definitions/SessionEndedData"
},
{
"$ref": "#/definitions/ItemEventData"
},
{
"$ref": "#/definitions/ItemDeltaData"
},
{
"$ref": "#/definitions/ErrorData"
},
{
"$ref": "#/definitions/PermissionEventData"
},
{
"$ref": "#/definitions/QuestionEventData"
},
{
"$ref": "#/definitions/AgentUnparsedData"
}
]
},
"UniversalEventType": {
"type": "string",
"enum": [
"session.started",
"session.ended",
"item.started",
"item.delta",
"item.completed",
"error",
"permission.requested",
"permission.resolved",
"question.requested",
"question.resolved",
"agent.unparsed"
]
},
"UniversalItem": {
"type": "object",
"required": [
"content",
"item_id",
"kind",
"status"
],
"properties": {
"content": {
"type": "array",
"items": {
"$ref": "#/definitions/ContentPart"
}
},
"item_id": {
"type": "string"
},
"kind": {
"$ref": "#/definitions/ItemKind"
},
"native_item_id": {
"type": [
"string",
"null"
]
},
"parent_id": {
"type": [
"string",
"null"
]
},
"role": {
"anyOf": [
{
"$ref": "#/definitions/ItemRole"
},
{
"type": "null"
}
]
},
"status": {
"$ref": "#/definitions/ItemStatus"
}
}
}
}
}

View file

@ -1,143 +0,0 @@
# Universal Schema (Single Version, Breaking)
This document defines the canonical universal session + event model. It replaces prior versions; there is no v2. The design prioritizes compatibility with native agent APIs and fills gaps with explicit synthetics.
Principles
- Most-compatible-first: choose semantics that map cleanly to native APIs (Codex/OpenCode/Amp/Claude).
- Uniform behavior: clients should not special-case agents; the daemon normalizes differences.
- Synthetics fill gaps: when a provider lacks a feature (session start/end, deltas, user messages), we synthesize events with `source=daemon`.
- Raw preservation: always keep native payloads in `raw` for agent-sourced events.
- UI coverage: update the inspector/UI to the new schema and ensure UI tests cover all session features (messages, deltas, tools, permissions, questions, errors, termination).
Identifiers
- session_id: daemon-generated session identifier.
- native_session_id: provider thread/session/run identifier (thread_id is merged here).
- item_id: daemon-generated identifier for any universal item.
- native_item_id: provider-native item/message identifier if available; otherwise null.
Event envelope
```json
{
"event_id": "evt_...",
"sequence": 42,
"time": "2026-01-27T19:10:11Z",
"session_id": "sess_...",
"native_session_id": "provider_...",
"synthetic": false,
"source": "agent|daemon",
"type": "session.started|session.ended|item.started|item.delta|item.completed|error|permission.requested|permission.resolved|question.requested|question.resolved|agent.unparsed",
"data": { "..." : "..." },
"raw": { "..." : "..." }
}
```
Notes:
- `source=agent` for native events; `source=daemon` for synthetics.
- `synthetic` is always present and mirrors whether the event is daemon-produced.
- `raw` is always present. It is null unless the client opts in to raw payloads; when opt-in is enabled, raw is populated for all events.
- For synthetic events derived from native payloads, include the underlying payload in `raw` when possible.
- Parsing failures emit agent.unparsed (source=daemon, synthetic=true) and should be treated as test failures.
Raw payload opt-in
- Events endpoints accept `include_raw=true` to populate the `raw` field.
- When `include_raw` is not set or false, `raw` is still present but null.
- Applies to both HTTP and SSE event streams.
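The opt-in rule above can be sketched as a filter applied when events are read back. This is a minimal illustration under stated assumptions, not the daemon's implementation: `Envelope` and `apply_include_raw` are hypothetical names, the envelope is reduced to two fields, and the raw payload is modeled as an optional string rather than a JSON value.

```rust
/// Hypothetical, simplified event envelope; the real envelope has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub sequence: u64,
    /// `None` stands in for JSON `null`. The field is always present.
    pub raw: Option<String>,
}

/// Returns the events as a client would see them for a given `include_raw` value.
pub fn apply_include_raw(events: &[Envelope], include_raw: bool) -> Vec<Envelope> {
    events
        .iter()
        .cloned()
        .map(|mut event| {
            if !include_raw {
                // Keep the field, drop the payload.
                event.raw = None;
            }
            event
        })
        .collect()
}
```

Because the field is kept and only its value is cleared, clients can rely on one envelope shape whether or not they opt in.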
Item model
```json
{
"item_id": "itm_...",
"native_item_id": "provider_item_...",
"parent_id": "itm_parent_or_null",
"kind": "message|tool_call|tool_result|system|status|unknown",
"role": "user|assistant|system|tool|null",
"content": [ { "type": "...", "...": "..." } ],
"status": "in_progress|completed|failed"
}
```
Content parts (non-exhaustive; extend as needed)
- text: `{ "type": "text", "text": "..." }`
- json: `{ "type": "json", "json": { ... } }`
- tool_call: `{ "type": "tool_call", "name": "...", "arguments": "...", "call_id": "..." }`
- tool_result: `{ "type": "tool_result", "call_id": "...", "output": "..." }`
- file_ref: `{ "type": "file_ref", "path": "...", "action": "read|write|patch", "diff": "..." }`
- reasoning: `{ "type": "reasoning", "text": "...", "visibility": "public|private" }`
- image: `{ "type": "image", "path": "...", "mime": "..." }`
- status: `{ "type": "status", "label": "...", "detail": "..." }`
Event types
session.started
```json
{ "metadata": { "...": "..." } }
```
session.ended
```json
{ "reason": "completed|error|terminated", "terminated_by": "agent|daemon" }
```
item.started
```json
{ "item": { ...Item } }
```
item.delta
```json
{ "item_id": "itm_...", "native_item_id": "provider_item_or_null", "delta": "text fragment" }
```
item.completed
```json
{ "item": { ...Item } }
```
error
```json
{ "message": "...", "code": "optional", "details": { "...": "..." } }
```
agent.unparsed
```json
{ "error": "parse failure message", "location": "agent parser name", "raw_hash": "optional" }
```
permission.requested / permission.resolved
```json
{ "permission_id": "...", "action": "...", "status": "requested|approved|denied", "metadata": { "...": "..." } }
```
question.requested / question.resolved
```json
{ "question_id": "...", "prompt": "...", "options": ["..."], "response": "...", "status": "requested|answered|rejected" }
```
Delta policy (uniform across agents)
- Always emit item.delta for messages.
- For agents without native deltas (Claude/Amp), emit a single synthetic delta containing the full final content immediately before item.completed.
- For Codex/OpenCode, forward native deltas as-is and still emit item.completed with the final content.
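As a rough sketch of the delta policy above: agents that stream deltas have them forwarded unchanged, and agents that do not get one synthetic delta carrying the full final text just before `item.completed`. The `Event` enum and `message_events` function are illustrative assumptions, not types from the codebase, and treating an empty delta list as "no native deltas" is a simplification.

```rust
/// Hypothetical, reduced event type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ItemStarted { item_id: String },
    ItemDelta { item_id: String, delta: String, synthetic: bool },
    ItemCompleted { item_id: String, text: String },
}

/// Builds the event sequence for one finished message.
/// `native_deltas` is empty for agents that do not stream deltas.
pub fn message_events(item_id: &str, native_deltas: &[&str], final_text: &str) -> Vec<Event> {
    let mut events = vec![Event::ItemStarted { item_id: item_id.to_string() }];
    if native_deltas.is_empty() {
        // No native deltas: emit one synthetic delta with the full final content.
        events.push(Event::ItemDelta {
            item_id: item_id.to_string(),
            delta: final_text.to_string(),
            synthetic: true,
        });
    } else {
        // Native deltas are forwarded as-is.
        for delta in native_deltas {
            events.push(Event::ItemDelta {
                item_id: item_id.to_string(),
                delta: (*delta).to_string(),
                synthetic: false,
            });
        }
    }
    // item.completed always carries the final content, in both cases.
    events.push(Event::ItemCompleted {
        item_id: item_id.to_string(),
        text: final_text.to_string(),
    });
    events
}
```

Either way a client sees at least one `item.delta` before `item.completed`, so it does not need to branch on the agent.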
User messages
- If the provider emits user messages (Codex/OpenCode/Amp), map directly to message items with role=user.
- If the provider does not emit user messages (Claude), synthesize user message items from the input we send; mark source=daemon and set native_item_id=null.
Tool normalization
- Tool calls/results are always emitted as their own items (kind=tool_call/tool_result) with parent_id pointing to the originating message item.
- Codex: mcp tool call progress and tool items map directly.
- OpenCode: tool parts in message.part.updated are mapped into tool items with lifecycle states.
- Amp: tool_call/tool_result map directly.
- Claude: synthesize tool items from CLI tool usage where possible; if insufficient, omit tool items and preserve raw payloads.
OpenCode ordering rule
- OpenCode may emit message.part.updated before message.updated.
- When a part delta arrives first, create a stub item.started (source=daemon) for the parent message item, then emit item.delta.
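One way to picture the ordering rule is a small tracker that remembers which message items have already been started. The sketch below is hypothetical: `Orderer`, `Out`, and both method names are invented for illustration, and suppressing a second `item.started` when `message.updated` arrives after the stub is an assumption rather than something the rule states.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced output events for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Out {
    ItemStarted { native_item_id: String, source_daemon: bool },
    ItemDelta { native_item_id: String, delta: String },
}

/// Tracks which message items already have an item.started event.
#[derive(Default)]
pub struct Orderer {
    started: HashSet<String>,
}

impl Orderer {
    /// Handles a message-level update for `native_item_id`.
    pub fn on_message_updated(&mut self, native_item_id: &str) -> Vec<Out> {
        // `insert` returns true only the first time the id is seen.
        if self.started.insert(native_item_id.to_string()) {
            vec![Out::ItemStarted {
                native_item_id: native_item_id.to_string(),
                source_daemon: false,
            }]
        } else {
            // Assumption: a stub was already emitted, so nothing more is sent here.
            Vec::new()
        }
    }

    /// Handles a part delta, emitting a daemon-sourced stub start if needed.
    pub fn on_part_delta(&mut self, native_item_id: &str, delta: &str) -> Vec<Out> {
        let mut out = Vec::new();
        if self.started.insert(native_item_id.to_string()) {
            out.push(Out::ItemStarted {
                native_item_id: native_item_id.to_string(),
                source_daemon: true,
            });
        }
        out.push(Out::ItemDelta {
            native_item_id: native_item_id.to_string(),
            delta: delta.to_string(),
        });
        out
    }
}
```

With this shape, a delta that arrives before its parent message yields a stub start followed by the delta, and later deltas for the same item yield only the delta.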
Session lifecycle
- If an agent does not emit a session start/end, emit session.started/session.ended synthetically (source=daemon).
- session.ended uses terminated_by=daemon when our termination API is used; terminated_by=agent when the provider ends the session.
Native ID mapping
- native_session_id is the only provider session identifier.
- native_item_id preserves the provider item/message id when available; otherwise null.
- item_id is always daemon-generated.

118
todo.md
View file

@ -1,116 +1,4 @@
# TODO (from spec.md)
# Todo
## Universal API + Types
- [x] Define universal base types for agent input/output (common denominator across schemas)
- [x] Add universal question + permission types (HITL) and ensure they are supported end-to-end
- [x] Define `UniversalEvent` + `UniversalEventData` union and `AgentError` shape
- [x] Define a universal message type for "failed to parse" with raw JSON payload
- [x] Implement 2-way converters:
- [x] Universal input message <-> agent-specific input
- [x] Universal event <-> agent-specific event
- [x] Normalize Claude system/init events into universal started events
- [x] Support Codex CLI type-based event format in universal converter
- [x] Enforce agentMode vs permissionMode semantics + defaults at the API boundary
- [x] Ensure session id vs agentSessionId semantics are respected and surfaced consistently
## Daemon (Rust HTTP server)
- [x] Build axum router + utoipa + schemars integration
- [x] Implement RFC 7807 Problem Details error responses backed by a `thiserror` enum
- [x] Implement canonical error `type` values + required error variants from spec
- [x] Implement offset semantics for events (exclusive last-seen id, default offset 0)
- [x] Implement SSE endpoint for events with same semantics as JSON endpoint
- [x] Replace in-memory session store with sandbox session manager (questions/permissions routing, long-lived processes)
- [x] Remove legacy token header support
- [x] Embed inspector frontend and serve it at `/ui`
- [x] Log inspector URL when starting the HTTP server
## CLI
- [x] Implement clap CLI flags: `--token`, `--no-token`, `--host`, `--port`, CORS flags
- [x] Implement a CLI endpoint for every HTTP endpoint
- [x] Update `CLAUDE.md` to keep CLI endpoints in sync with HTTP API changes
- [x] Prefix CLI API requests with `/v1`
- [x] Add CLI credentials extractor subcommand
- [x] Move daemon startup to `server` subcommand
- [x] Add `sandbox-daemon` CLI alias
## HTTP API Endpoints
- [x] POST `/agents/{}/install` with `reinstall` handling
- [x] GET `/agents/{}/modes` (mode discovery or hardcoded)
- [x] GET `/agents` (installed/version/path; version checked at request time)
- [x] POST `/sessions/{}` (create session, install if needed, return health + agentSessionId)
- [x] POST `/sessions/{}/messages` (send prompt)
- [x] GET `/sessions/{}/events` (pagination with offset/limit)
- [x] GET `/sessions/{}/events/sse` (streaming)
- [x] POST `/sessions/{}/questions/{questionId}/reply`
- [x] POST `/sessions/{}/questions/{questionId}/reject`
- [x] POST `/sessions/{}/permissions/{permissionId}/reply`
- [x] Prefix all HTTP API endpoints with `/v1`
## Agent Management
- [x] Implement install/version/spawn basics for Claude/Codex/OpenCode/Amp
- [x] Implement agent install URL patterns + platform mappings for supported OS/arch
- [x] Parse JSONL output for subprocess agents and extract session/result metadata
- [x] Migrate Codex subprocess to App Server JSON-RPC protocol
- [x] Map permissionMode to agent CLI flags (Claude/Codex/Amp)
- [x] Implement session resume flags for Claude/OpenCode/Amp (Codex unsupported)
- [x] Replace sandbox-agent core agent modules with new agent-management crate (delete originals)
- [x] Stabilize agent-management crate API and fix build issues (sandbox-agent currently wired to WIP crate)
- [x] Implement OpenCode shared server lifecycle (`opencode serve`, health, restart)
- [x] Implement OpenCode HTTP session APIs + SSE event stream integration
- [x] Implement JSONL parsing for subprocess agents and map to `UniversalEvent`
- [x] Capture agent session id from events and expose as `agentSessionId`
- [x] Handle agent process exit and map to `agent_process_exited` error
- [x] Implement agentMode discovery rules (OpenCode API, hardcoded others)
- [x] Enforce permissionMode behavior (default/plan/bypass) for subprocesses
## Credentials
- [x] Implement credential extraction module (Claude/Codex/OpenCode)
- [x] Add Amp credential extraction (config-based)
- [x] Move credential extraction into `agent-credentials` crate
- [ ] Pass extracted credentials into subprocess env vars per agent
- [ ] Ensure OpenCode server reads credentials from config on startup
## Testing
- [ ] Build a universal agent test suite that exercises all features (messages, questions, permissions, etc.) using HTTP API
- [ ] Run the full suite against every agent (Claude/Codex/OpenCode/Amp) without mocks
- [x] Add real install/version/spawn tests for Claude/Codex/OpenCode (Amp conditional)
- [x] Expand agent lifecycle tests (reinstall, session id extraction, resume, plan mode)
- [x] Add OpenCode server-mode tests (session create, prompt, SSE)
- [ ] Add tests for question/permission flows using deterministic prompts
- [x] Add HTTP/SSE snapshot tests for real agents (env-configured)
- [x] Add snapshot coverage for auth, CORS, and concurrent sessions
- [x] Add inspector UI route test
## Frontend (frontend/packages/inspector)
- [x] Build Vite + React app with connect screen (endpoint + optional token)
- [x] Add instructions to run sandbox-agent (including CORS)
- [x] Implement full agent UI covering all features
- [x] Add HTTP request log with copyable curl command
- [x] Add Content-Type header to CORS callout command
- [x] Default inspector endpoint to current origin and auto-connect via health check
- [x] Update inspector to universal schema events (items, deltas, approvals, errors)
## TypeScript SDK
- [x] Generate OpenAPI from utoipa and run `openapi-typescript`
- [x] Implement a thin fetch-based client wrapper
- [x] Update `CLAUDE.md` to require SDK + CLI updates when API changes
- [x] Prefix SDK requests with `/v1`
## Examples + Tests
- [ ] Add examples for Docker, E2B, Daytona, Vercel Sandboxes, Cloudflare Sandboxes
- [ ] Add Vitest unit test for each example (Cloudflare requires special setup)
## Documentation
- [ ] Write README covering architecture, agent compatibility, and deployment guide
- [ ] Add universal API feature checklist (questions, approve plan, etc.)
- [ ] Document CLI, HTTP API, frontend app, and TypeScript SDK usage
- [ ] Use collapsible sections for endpoints and SDK methods
- [x] Integrate OpenAPI spec with Mintlify (docs/openapi.json + validation)
---
- [x] implement release pipeline
- implement e2b example
- implement typescript "start locally" by pulling from server using version
- [x] Move agent schema sources to src/agents
- [x] Add Vercel AI SDK UIMessage schema extractor
- [x] Add server --mock mode with looping mock session events.
- [x] Document mock mode in building chat UI docs and update CLAUDE.md guidance.