mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 10:05:18 +00:00

Nathan Flurry 2f26f76d9b feat: add raw session args/opts for agent passthrough

2026-02-05 15:20:50 -08:00

13 KiB


Codex Research

Research notes on OpenAI Codex's configuration, credential discovery, and runtime behavior, based on the agent-jj implementation.

Overview

Provider: OpenAI
Execution Method (this repo): Codex App Server (JSON-RPC over stdio)
Execution Method (alternatives): SDK (@openai/codex-sdk) or CLI binary
Session Persistence: Thread ID (string)
Import: Dynamic import to avoid bundling issues
Binary Location: ~/.nvm/versions/node/v24.3.0/bin/codex (npm global install)

SDK Architecture

The SDK wraps a bundled binary - it does NOT make direct API calls.

The TypeScript SDK includes a pre-compiled Codex binary
When you use the SDK, it spawns this binary as a child process
Communication happens via stdin/stdout using JSONL (JSON Lines) format
The binary itself handles the actual communication with OpenAI's backend services

Sources: Codex SDK docs, GitHub
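Because the binary talks JSONL over stdout, a consumer has to reassemble lines from arbitrary chunk boundaries before parsing. The following is a minimal sketch of that step, not the SDK's actual implementation; `JsonlDecoder` and the event `type` values in the usage note are hypothetical names used only for illustration.

```typescript
// Incremental JSONL decoder: feed it raw stdout chunks, get back parsed objects.
// Chunk boundaries are arbitrary, so a partial trailing line is buffered.
class JsonlDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" (chunk ended on a newline) or an incomplete line.
    this.buffer = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue; // skip blank lines
      messages.push(JSON.parse(trimmed));
    }
    return messages;
  }
}
```

Calling `push('{"type":"a"}\n{"ty')` yields one parsed object and keeps `{"ty` buffered until the rest of the line arrives.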

CLI Usage (Alternative to App Server / SDK)

You can use the codex binary directly instead of the SDK:

Interactive Mode

codex "your prompt here"
codex --model o3 "your prompt"

Non-Interactive Mode (`codex exec`)

codex exec "your prompt here"
codex exec --json "your prompt"  # JSONL output
codex exec -m o3 "your prompt"
codex exec --dangerously-bypass-approvals-and-sandbox "prompt"
codex exec resume --last  # Resume previous session

Custom Args (CLI Flags)

Core Flags

Flag	Type	Description
`-m, --model MODEL`	string	Model to use (e.g., `o3`, `gpt-4o`)
`--json`	bool	Print events to stdout as JSONL
`-C, --cd DIR`	path	Working directory for the agent
`-o, --output-last-message FILE`	path	Write final response to file

Permission & Sandbox Flags

Flag	Type	Values	Description
`-s, --sandbox MODE`	enum	`read-only`, `workspace-write`, `danger-full-access`	Sandbox policy for shell commands
`-a, --ask-for-approval POLICY`	enum	`untrusted`, `on-failure`, `on-request`, `never`	When to require human approval
`--full-auto`	bool	-	Convenience alias: `-a on-request --sandbox workspace-write`
`--dangerously-bypass-approvals-and-sandbox`	bool	-	Skip all prompts and sandboxing (DANGEROUS)

Configuration Overrides

Flag	Type	Description
`-c, --config key=value`	string	Override config values (parsed as TOML)
`-p, --profile NAME`	string	Use a configuration profile from config.toml
`--enable FEATURE`	string	Enable a feature flag (repeatable)
`--disable FEATURE`	string	Disable a feature flag (repeatable)

Config override examples:

codex -c model="o3"
codex -c 'sandbox_permissions=["disk-full-read-access"]'
codex -c shell_environment_policy.inherit=all
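When overrides are generated programmatically, the values need to be rendered as TOML before being passed to `-c`. A minimal sketch, assuming only strings, finite numbers, booleans, and string arrays are needed; `toConfigArgs` and `toTomlLiteral` are hypothetical helpers, and the JSON encoding is used because it is also valid TOML for these simple types.

```typescript
type ConfigValue = string | number | boolean | string[];

// Render a value as a TOML literal. For these simple types the JSON form
// (quoted strings, bracketed arrays, bare numbers/booleans) is valid TOML.
function toTomlLiteral(value: ConfigValue): string {
  return JSON.stringify(value);
}

// Turn { key: value } pairs into repeated `-c key=value` arguments.
function toConfigArgs(overrides: Record<string, ConfigValue>): string[] {
  const args: string[] = [];
  for (const [key, value] of Object.entries(overrides)) {
    args.push("-c", `${key}=${toTomlLiteral(value)}`);
  }
  return args;
}
```

Passing the array form to a process spawner avoids the shell quoting needed in the hand-written examples above.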

Additional Capabilities

Flag	Type	Description
`-i, --image FILE`	path[]	Attach image(s) to the initial prompt (repeatable)
`--add-dir DIR`	path[]	Additional directories that should be writable (repeatable)
`--search`	bool	Enable live web search via `web_search` tool
`--output-schema FILE`	path	JSON Schema file for structured output
`--skip-git-repo-check`	bool	Allow running outside a Git repository
`--oss`	bool	Use local open source model provider (LM Studio/Ollama)
`--local-provider PROVIDER`	enum	Local provider to use: `lmstudio`, `ollama`, `ollama-chat`
`--color COLOR`	enum	Color output: `always`, `never`, `auto`
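The flags in the tables above can be assembled into an argument list for `codex exec`. This is a sketch under assumptions: `ExecOptions` and `buildExecArgs` are hypothetical names, and only a handful of the documented flags are covered.

```typescript
type SandboxMode = "read-only" | "workspace-write" | "danger-full-access";

interface ExecOptions {
  prompt: string;
  model?: string;
  json?: boolean;
  cwd?: string;
  sandbox?: SandboxMode;
  skipGitRepoCheck?: boolean;
}

// Build the argv for a non-interactive `codex exec` run.
function buildExecArgs(opts: ExecOptions): string[] {
  const args: string[] = ["exec"];
  if (opts.json) args.push("--json");
  if (opts.model) args.push("--model", opts.model);
  if (opts.cwd) args.push("--cd", opts.cwd);
  if (opts.sandbox) args.push("--sandbox", opts.sandbox);
  if (opts.skipGitRepoCheck) args.push("--skip-git-repo-check");
  args.push(opts.prompt); // prompt goes last, as in the examples above
  return args;
}
```

The result is meant to be passed as the argument array to a process spawner with `codex` as the command, so the prompt never goes through shell quoting.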

Session Management

codex resume          # Pick from previous sessions
codex resume --last   # Resume most recent
codex fork --last     # Fork most recent session

Credential Discovery

Priority Order

1. User-configured credentials (from credentials array)
2. Environment variable: CODEX_API_KEY
3. Environment variable: OPENAI_API_KEY
4. Bootstrap extraction from config files

Config File Location

Path	Description
`~/.codex/auth.json`	Primary auth config

Auth File Structure

// API Key authentication
{
  "OPENAI_API_KEY": "sk-..."
}

// OAuth authentication
{
  "tokens": {
    "access_token": "..."
  }
}
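The priority order and auth file structure described above could be combined roughly as follows. This is a sketch, not the repo's implementation: `resolveCredential` and the `Credential` shape are hypothetical, user-configured credentials are assumed to be API keys, and preferring the API key over OAuth tokens inside `auth.json` is also an assumption.

```typescript
interface AuthFile {
  OPENAI_API_KEY?: string;
  tokens?: { access_token?: string };
}

interface Credential {
  kind: "api_key" | "oauth";
  value: string;
  source: string;
}

// Walk the sources in priority order and return the first credential found.
function resolveCredential(
  configured: string | undefined,
  env: Record<string, string | undefined>,
  authFile: AuthFile | undefined,
): Credential | null {
  if (configured) return { kind: "api_key", value: configured, source: "configured" };
  if (env.CODEX_API_KEY) return { kind: "api_key", value: env.CODEX_API_KEY, source: "CODEX_API_KEY" };
  if (env.OPENAI_API_KEY) return { kind: "api_key", value: env.OPENAI_API_KEY, source: "OPENAI_API_KEY" };
  // Bootstrap extraction from the parsed contents of ~/.codex/auth.json.
  if (authFile?.OPENAI_API_KEY) return { kind: "api_key", value: authFile.OPENAI_API_KEY, source: "auth.json" };
  const token = authFile?.tokens?.access_token;
  if (token) return { kind: "oauth", value: token, source: "auth.json" };
  return null;
}
```

Taking the environment and parsed file as parameters keeps the function pure, so the ordering can be tested without touching the real home directory.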

SDK Usage

Client Initialization

import { Codex } from "@openai/codex-sdk";

// With API key
const codex = new Codex({ apiKey: "sk-..." });

// Without API key (uses default auth); named differently so both forms can coexist
const codexDefaultAuth = new Codex();

Dynamic import is used to avoid bundling the SDK:

const { Codex } = await import("@openai/codex-sdk");

Thread Management

// Start new thread
const thread = codex.startThread();

// Resume existing thread (separate name so it does not redeclare `thread`)
const resumedThread = codex.resumeThread(threadId);

Running Prompts

const { events } = await thread.runStreamed(prompt);

for await (const event of events) {
  // Process events
}

App Server Protocol (JSON-RPC)

Codex App Server speaks JSON-RPC 2.0 over stdin/stdout, one JSON message per line (JSONL), so no network port is required.

Key Requests

initialize → returns server info
thread/start → starts a new thread
turn/start → sends user input for a thread

Custom Args (JSON-RPC Parameters)

`thread/start` Parameters

Field	Type	Description
`approval_policy`	enum	`Never`, `Untrusted` - when to ask for approval
`sandbox`	enum	`ReadOnly`, `DangerFullAccess` - sandbox mode
`model`	string	Model to use for this thread
`cwd`	string	Working directory

`turn/start` Parameters

Field	Type	Description
`thread_id`	string	Thread ID from `thread/start` response
`input`	array	User input (e.g., `[{ "type": "text", "text": "..." }]`)
`approval_policy`	enum	Override approval policy for this turn
`sandbox_policy`	enum	Override sandbox policy for this turn
`model`	string	Override model for this turn
`cwd`	string	Override working directory
`effort`	string	Reasoning effort level
`output_schema`	object	JSON Schema for structured output
`summary`	string	Summary context for the turn
`collaboration_mode`	string	Collaboration mode (if supported)

App Server CLI Flags

Flag	Description
`-c, --config key=value`	Override config (same as interactive mode)
`--enable FEATURE`	Enable feature flag
`--disable FEATURE`	Disable feature flag
`--analytics-default-enabled`	Enable analytics by default (for first-party use)

Event Notifications (examples)

{ "method": "thread/started", "params": { "thread": { "id": "thread_abc123" } } }
{ "method": "item/completed", "params": { "item": { "type": "agentMessage", "text": "..." } } }
{ "method": "turn/completed", "params": { "threadId": "thread_abc123", "turn": { "items": [] } } }

Approval Requests (server → client)

The server can send JSON-RPC requests (with id) for approvals:

item/commandExecution/requestApproval
item/fileChange/requestApproval

These require JSON-RPC responses with a decision payload.
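Since requests, notifications, and responses share one stream, the client has to tell them apart before it can answer an approval. A minimal sketch follows; the presence of `id` and `method` is the standard JSON-RPC distinction, but the `decision` payload shape and the values `"accept"` / `"decline"` are assumptions, since these notes do not specify them.

```typescript
interface RpcMessage {
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: unknown;
}

type MessageKind = "request" | "notification" | "response";

// method + id => request; method only => notification; otherwise a response.
function classify(msg: RpcMessage): MessageKind {
  if (msg.method !== undefined) {
    return msg.id !== undefined ? "request" : "notification";
  }
  return "response";
}

const APPROVAL_METHODS = new Set<string>([
  "item/commandExecution/requestApproval",
  "item/fileChange/requestApproval",
]);

type Decision = "accept" | "decline"; // assumed values, not confirmed by these notes

// Build the response for an approval request, or null for any other message.
function answerApproval(
  msg: RpcMessage,
  decide: (method: string, params: unknown) => Decision,
): RpcMessage | null {
  const method = msg.method;
  if (method === undefined || classify(msg) !== "request") return null;
  if (!APPROVAL_METHODS.has(method)) return null;
  return { id: msg.id, result: { decision: decide(method, msg.params) } };
}
```

The response echoes the request's `id`, which is what lets the server match the decision to the pending approval.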

Response Schema

// CodexRunResultSchema
type CodexRunResult = string | {
  result?: string;
  output?: string;
  message?: string;
  // ...additional fields via passthrough
};

Content is extracted in priority order: result > output > message

Thread ID Retrieval

Thread ID can be obtained from multiple sources:

thread.started event's thread_id property
Thread object's id getter (after first turn)
Thread object's threadId or _id properties (fallbacks)

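function getThreadId(thread: unknown): string | null {
  // Guard against null/non-object input before probing the candidate properties.
  if (typeof thread !== "object" || thread === null) return null;
  const value = thread as { id?: string; threadId?: string; _id?: string };
  return value.id ?? value.threadId ?? value._id ?? null;
}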

Agent Modes vs Permission Modes

Codex separates sandbox levels (permissions) from behavioral modes (prompt prefixes).

Permission Modes (Sandbox Levels)

Mode	CLI Flag	Behavior
`read-only`	`-s read-only`	No file modifications
`workspace-write`	`-s workspace-write`	Can modify workspace files
`danger-full-access`	`-s danger-full-access`	Full system access
`bypass`	`--dangerously-bypass-approvals-and-sandbox`	Skip all checks

Agent Modes (Prompt Prefixes)

Codex doesn't have true agent modes - behavior is controlled via prompt prefixing:

Mode	Prompt Prefix
`build`	No prefix (default)
`plan`	`"Make a plan before acting.\n\n"`
`chat`	`"Answer conversationally.\n\n"`

function withModePrefix(prompt: string, mode: AgentMode): string {
  if (mode === "plan") {
    return `Make a plan before acting.\n\n${prompt}`;
  }
  if (mode === "chat") {
    return `Answer conversationally.\n\n${prompt}`;
  }
  return prompt;
}

Human-in-the-Loop

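Codex has no interactive human-in-the-loop (HITL) flow in SDK mode: all permissions must be configured up front via the sandbox level. The App Server protocol differs here, since the server can send approval requests (see Approval Requests above).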

Error Handling

turn.failed events are captured but don't throw
Thread ID is still returned on error for potential resumption
Events iterator may throw after errors - caught and logged

interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;  // Set if turn failed
}
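The three behaviors listed above can be sketched as a single event loop. The event shapes here (`thread_id` on `thread.started`, `error.message` on `turn.failed`) are assumptions for illustration, `collectTurn` is a hypothetical name, and storing the raw events in `result` is a placeholder for whatever the real implementation extracts.

```typescript
interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;
}

// Assumed minimal event shape; real SDK events carry more fields.
interface CodexEvent {
  type: string;
  thread_id?: string;
  error?: { message?: string };
}

async function collectTurn(events: AsyncIterable<CodexEvent>): Promise<CodexPromptResult> {
  const seen: CodexEvent[] = [];
  const out: CodexPromptResult = { result: seen, threadId: null };
  try {
    for await (const event of events) {
      seen.push(event);
      if (event.type === "thread.started" && event.thread_id) {
        out.threadId = event.thread_id; // kept even if the turn later fails
      } else if (event.type === "turn.failed") {
        out.error = event.error?.message ?? "turn failed"; // recorded, not thrown
      }
    }
  } catch (err) {
    // The iterator may throw after a failure; keep the first error we recorded.
    if (out.error === undefined) {
      out.error = err instanceof Error ? err.message : String(err);
    }
  }
  return out;
}
```

Because the function never rejects, callers can always read `threadId` and decide whether to resume the thread after a failed turn.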

Conversion to Universal Format

Codex output is converted via convertCodexOutput():

1. Parse with CodexRunResultSchema
2. If result is string, use directly
3. Otherwise extract from result, output, or message fields
4. Wrap as assistant message entry
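The steps above can be sketched as follows. This is not the actual convertCodexOutput(): schema validation is omitted, the function names are hypothetical, and the `AssistantEntry` shape is an assumed stand-in for the universal format's message entry.

```typescript
type CodexRunResult =
  | string
  | { result?: string; output?: string; message?: string; [key: string]: unknown };

// Assumed shape of an assistant message entry in the universal format.
interface AssistantEntry {
  role: "assistant";
  content: string;
}

// Priority order: result > output > message.
function extractContent(raw: CodexRunResult): string | null {
  if (typeof raw === "string") return raw;
  return raw.result ?? raw.output ?? raw.message ?? null;
}

function convertCodexOutputSketch(raw: CodexRunResult): AssistantEntry[] {
  const content = extractContent(raw);
  return content === null ? [] : [{ role: "assistant", content }];
}
```

Returning an empty array when no field is present is a design choice of this sketch; the real converter may handle that case differently.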

Session Continuity

Thread ID persists across prompts
Use resumeThread(threadId) to continue conversation
Thread ID is captured from thread.started event or thread object
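A small store keyed by session can decide between starting and resuming a thread. The sketch below assumes only the `startThread()` / `resumeThread(threadId)` calls shown earlier; `CodexLike`, `ThreadLike`, and `ThreadStore` are hypothetical names, and the store is in-memory only.

```typescript
interface ThreadLike {
  id: string | null;
}

// The subset of the SDK client this sketch relies on.
interface CodexLike {
  startThread(): ThreadLike;
  resumeThread(threadId: string): ThreadLike;
}

class ThreadStore {
  private threadIds = new Map<string, string>();

  // Resume the saved thread for this session if one exists, else start fresh.
  open(codex: CodexLike, sessionId: string): ThreadLike {
    const saved = this.threadIds.get(sessionId);
    return saved !== undefined ? codex.resumeThread(saved) : codex.startThread();
  }

  // Call once the thread ID is known (thread.started event or thread object).
  remember(sessionId: string, threadId: string): void {
    this.threadIds.set(sessionId, threadId);
  }
}
```

`remember` is separate from `open` because the thread ID may not be available until the first event arrives.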

Shared App-Server Architecture (Daemon Implementation)

The sandbox daemon uses a single shared Codex app-server process to handle multiple sessions, similar to OpenCode's server model. This differs from Claude and Amp, which spawn a new process per turn.

Architecture Comparison

Agent	Model	Process Lifetime	Session ID
Claude	Subprocess	Per-turn (killed on TurnCompleted)	`--resume` flag
Amp	Subprocess	Per-turn	`--continue` flag
OpenCode	HTTP Server	Daemon lifetime	Session ID via API
Codex	Stdio Server	Daemon lifetime	Thread ID via JSON-RPC

Daemon Flow

1. First Codex session created: Spawns codex app-server process, performs initialize/initialized handshake
2. Session creation: Sends thread/start request, captures thread_id as native_session_id
3. Message sent: Sends turn/start request with thread_id, streams notifications back to session
4. Multi-turn: Reuses same thread_id, process stays alive, no respawn needed
5. Daemon shutdown: Process terminated with daemon

Why This Approach?

Performance: No process spawn overhead per message
Multi-turn support: Thread persists in server memory, no resume needed
Consistent with OpenCode: Similar server-based pattern reduces code complexity
API alignment: Matches Codex's intended app-server usage pattern

Protocol Details

The shared server uses JSON-RPC 2.0 for request/response correlation:

Daemon                           Codex App-Server
   |                                   |
   |-- initialize {id: 1} ------------>|
   |<-- response {id: 1} --------------|
   |-- initialized (notification) ---->|
   |                                   |
   |-- thread/start {id: 2} ---------->|
   |<-- response {id: 2, thread.id} ---|
   |<-- thread/started (notification) -|
   |                                   |
   |-- turn/start {id: 3, threadId} -->|
   |<-- turn/started (notification) ---|
   |<-- item/* (notifications) --------|
   |<-- turn/completed (notification) -|
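The id-based correlation in the diagram can be sketched with a map of pending requests. This is an illustration rather than the daemon's code (which is Rust): `RpcCorrelator` is a hypothetical name, the transport is injected as a `send` callback, and messages omit a `jsonrpc` field to match the event examples in these notes.

```typescript
interface Pending {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

interface IncomingMessage {
  id?: number;
  method?: string;
  result?: unknown;
  error?: { message?: string };
}

class RpcCorrelator {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (line: string) => void;

  constructor(send: (line: string) => void) {
    this.send = send;
  }

  // Send a request and return a promise settled by the matching response.
  request(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.send(JSON.stringify({ id, method, params }) + "\n");
    return promise;
  }

  // Notifications carry no id and expect no response (e.g. `initialized`).
  notify(method: string, params: Record<string, unknown> = {}): void {
    this.send(JSON.stringify({ method, params }) + "\n");
  }

  // Returns true if the message was a response to one of our requests.
  handle(message: IncomingMessage): boolean {
    if (message.method !== undefined || message.id === undefined) return false;
    const entry = this.pending.get(message.id);
    if (entry === undefined) return false;
    this.pending.delete(message.id);
    if (message.error) entry.reject(new Error(message.error.message ?? "rpc error"));
    else entry.resolve(message.result);
    return true;
  }
}
```

Messages for which `handle` returns false are notifications or server requests, and would be passed on to thread-based routing.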

Thread-to-Session Routing

Notifications are routed to the correct session by extracting threadId from each notification:

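fn codex_thread_id_from_server_notification(notification: &ServerNotification) -> Option<String> {
    // All thread-scoped notifications include a threadId field.
    // `ServerNotification` is an assumed name for the notification enum.
    match notification {
        ServerNotification::TurnStarted(params) => Some(params.thread_id.clone()),
        ServerNotification::ItemCompleted(params) => Some(params.thread_id.clone()),
        // ... etc
        _ => None, // notifications that are not thread-scoped
    }
}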
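For comparison with the SDK-side TypeScript in these notes, the same routing idea can be sketched as a map from thread ID to session handler. `SessionRouter` and `threadIdOf` are hypothetical names; the two lookup paths (`params.threadId` and `params.thread.id`) follow the notification examples shown earlier.

```typescript
interface ServerNotification {
  method: string;
  params?: { threadId?: string; thread?: { id?: string } };
}

type Handler = (notification: ServerNotification) => void;

// thread/started nests the ID under params.thread.id; others use params.threadId.
function threadIdOf(notification: ServerNotification): string | null {
  return notification.params?.threadId ?? notification.params?.thread?.id ?? null;
}

class SessionRouter {
  private handlers = new Map<string, Handler>();

  register(threadId: string, handler: Handler): void {
    this.handlers.set(threadId, handler);
  }

  // Deliver the notification to its session; false means no session owns it.
  dispatch(notification: ServerNotification): boolean {
    const threadId = threadIdOf(notification);
    const handler = threadId !== null ? this.handlers.get(threadId) : undefined;
    if (handler === undefined) return false;
    handler(notification);
    return true;
  }
}
```

Returning false for unrouted notifications lets the caller log or drop them instead of silently losing them.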

Notes

SDK is dynamically imported to reduce bundle size
No explicit timeout (relies on SDK defaults)
Thread ID may not be available until first event
Error messages are preserved for debugging
Working directory is not explicitly set (SDK handles internally)
