mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 13:03:46 +00:00

2026-02-11 14:47:41 +00:00

Codex Research

Research notes on OpenAI Codex's configuration, credential discovery, and runtime behavior, based on the agent-jj implementation.

Overview

Provider: OpenAI
Execution Method (this repo): Codex App Server (JSON-RPC over stdio)
Execution Method (alternatives): SDK (@openai/codex-sdk) or CLI binary
Session Persistence: Thread ID (string)
Import: Dynamic import to avoid bundling issues
Binary Location (example, npm global install under nvm): ~/.nvm/versions/node/v24.3.0/bin/codex; the exact path depends on the local Node version and version manager

SDK Architecture

The SDK wraps a bundled binary - it does NOT make direct API calls.

The TypeScript SDK includes a pre-compiled Codex binary
When you use the SDK, it spawns this binary as a child process
Communication happens via stdin/stdout using JSONL (JSON Lines) format
The binary itself handles the actual communication with OpenAI's backend services

Sources: Codex SDK docs, GitHub
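
Because the SDK and the binary exchange JSONL over stdin/stdout, a client has to reassemble complete lines from arbitrary stream chunks before parsing them. A minimal sketch of that framing step follows; `JsonlDecoder` is a hypothetical helper name and is not part of the SDK.

```typescript
// Hypothetical helper: reassembles JSONL messages from arbitrary stdout chunks.
class JsonlDecoder {
  private buffer = "";

  // Feed one chunk; returns every complete JSON value that chunk finished.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// A message split across two chunks is only emitted once it is complete.
const decoder = new JsonlDecoder();
const first = decoder.push('{"type":"thread.started"}\n{"type":"turn.');
const second = decoder.push('completed"}\n');
console.log(first.length, second.length);
```

Buffering the trailing partial line is the important part: stdout chunk boundaries do not line up with message boundaries.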

CLI Usage (Alternative to App Server / SDK)

You can use the codex binary directly instead of the SDK:

Interactive Mode

codex "your prompt here"
codex --model o3 "your prompt"

Non-Interactive Mode (`codex exec`)

codex exec "your prompt here"
codex exec --json "your prompt"  # JSONL output
codex exec -m o3 "your prompt"
codex exec --dangerously-bypass-approvals-and-sandbox "prompt"
codex exec resume --last  # Resume previous session

Key CLI Flags

Flag	Description
`--json`	Print events to stdout as JSONL
`-m, --model MODEL`	Model to use
`-s, --sandbox MODE`	`read-only`, `workspace-write`, `danger-full-access`
`--full-auto`	Auto-approve with workspace-write sandbox
`--dangerously-bypass-approvals-and-sandbox`	Skip all prompts (dangerous)
`-C, --cd DIR`	Working directory
`-o, --output-last-message FILE`	Write final response to file
`--output-schema FILE`	JSON Schema for structured output
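
When driving the CLI from code, the flags above can be assembled into an argument array rather than a shell string, which avoids quoting problems with prompts. A small sketch, assuming a hypothetical `ExecOptions` bag; it only emits flags that appear in the table.

```typescript
type SandboxMode = "read-only" | "workspace-write" | "danger-full-access";

// Hypothetical option bag; the field names are illustrative only.
interface ExecOptions {
  prompt: string;
  json?: boolean;
  model?: string;
  sandbox?: SandboxMode;
  cwd?: string;
  outputLastMessage?: string;
}

// Builds the argv for `codex exec` (without the leading "codex").
function buildExecArgs(opts: ExecOptions): string[] {
  const args: string[] = ["exec"];
  if (opts.json) args.push("--json");
  if (opts.model) args.push("--model", opts.model);
  if (opts.sandbox) args.push("--sandbox", opts.sandbox);
  if (opts.cwd) args.push("--cd", opts.cwd);
  if (opts.outputLastMessage) args.push("--output-last-message", opts.outputLastMessage);
  args.push(opts.prompt); // the prompt goes last, as a single argument
  return args;
}

const execArgs = buildExecArgs({ prompt: "list TODOs", json: true, sandbox: "read-only" });
console.log(execArgs.join(" "));
```

The resulting array can be handed to a process-spawning API as is.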

Session Management

codex resume          # Pick from previous sessions
codex resume --last   # Resume most recent
codex fork --last     # Fork most recent session

Credential Discovery

Priority Order

User-configured credentials (from credentials array)
Environment variable: CODEX_API_KEY
Environment variable: OPENAI_API_KEY
Bootstrap extraction from config files
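
The priority order above can be sketched as a single resolver function. The types and the `resolveCredential` name are hypothetical; only the ordering comes from these notes.

```typescript
// Hypothetical shapes: the notes give the order, not these types.
interface CredentialSources {
  configured: string[]; // user-configured credentials array
  env: Record<string, string | undefined>;
  bootstrap: () => string | null; // extraction from config files
}

interface ResolvedCredential {
  value: string;
  source: "configured" | "CODEX_API_KEY" | "OPENAI_API_KEY" | "bootstrap";
}

function resolveCredential(src: CredentialSources): ResolvedCredential | null {
  const configured = src.configured.find((c) => c.length > 0);
  if (configured) return { value: configured, source: "configured" };

  // CODEX_API_KEY is checked before OPENAI_API_KEY.
  for (const name of ["CODEX_API_KEY", "OPENAI_API_KEY"] as const) {
    const value = src.env[name];
    if (value) return { value, source: name };
  }

  // Config files are only consulted when nothing else matched.
  const extracted = src.bootstrap();
  return extracted ? { value: extracted, source: "bootstrap" } : null;
}

const resolved = resolveCredential({
  configured: [],
  env: { OPENAI_API_KEY: "sk-openai", CODEX_API_KEY: "sk-codex" },
  bootstrap: () => "sk-from-file",
});
console.log(resolved?.source);
```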

Config File Location

Path	Description
`~/.codex/auth.json`	Primary auth config

Auth File Structure

// API Key authentication
{
  "OPENAI_API_KEY": "sk-..."
}

// OAuth authentication
{
  "tokens": {
    "access_token": "..."
  }
}
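
Bootstrap extraction from this file might look like the sketch below. It takes the raw file contents so it stays free of filesystem calls. The `kind` tags and the choice to check the API key before the OAuth token are assumptions, not confirmed behavior.

```typescript
// Result of reading ~/.codex/auth.json; the `kind` tags are hypothetical.
type CodexAuth =
  | { kind: "api_key"; value: string }
  | { kind: "oauth"; value: string };

function parseAuthFile(raw: string): CodexAuth | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // unreadable file: treat as "no credentials"
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as {
    OPENAI_API_KEY?: unknown;
    tokens?: { access_token?: unknown } | null;
  };

  // Assumption: an API key takes precedence when both forms are present.
  if (typeof obj.OPENAI_API_KEY === "string" && obj.OPENAI_API_KEY.length > 0) {
    return { kind: "api_key", value: obj.OPENAI_API_KEY };
  }
  const token = obj.tokens?.access_token;
  if (typeof token === "string" && token.length > 0) {
    return { kind: "oauth", value: token };
  }
  return null;
}

console.log(parseAuthFile('{"tokens":{"access_token":"abc"}}')?.kind);
```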

SDK Usage

Client Initialization

import { Codex } from "@openai/codex-sdk";

// With an explicit API key
const codex = new Codex({ apiKey: "sk-..." });

// Without an API key (falls back to default auth)
const codexWithDefaultAuth = new Codex();

Dynamic import is used to avoid bundling the SDK:

const { Codex } = await import("@openai/codex-sdk");

Thread Management

// Start a new thread
const thread = codex.startThread();

// Resume an existing thread by its ID
const resumedThread = codex.resumeThread(threadId);

Running Prompts

const { events } = await thread.runStreamed(prompt);

for await (const event of events) {
  // Process events
}

App Server Protocol (JSON-RPC)

Codex App Server speaks JSON-RPC 2.0, exchanged as JSONL over stdin/stdout, so no network port is required.

Key Requests

initialize → returns server info
thread/start → starts a new thread
turn/start → sends user input for a thread

Event Notifications (examples)

{ "method": "thread/started", "params": { "thread": { "id": "thread_abc123" } } }
{ "method": "item/completed", "params": { "item": { "type": "agentMessage", "text": "..." } } }
{ "method": "turn/completed", "params": { "threadId": "thread_abc123", "turn": { "items": [] } } }

Approval Requests (server → client)

The server can send JSON-RPC requests (with id) for approvals:

item/commandExecution/requestApproval
item/fileChange/requestApproval

These require JSON-RPC responses with a decision payload.
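
A client therefore has to tell three kinds of incoming messages apart: responses to its own requests, notifications, and server-initiated requests that need an answer. The sketch below shows one way to do that. The accepted decision values are not listed in these notes, so `"accept"` and the sample `params` are placeholders only.

```typescript
interface RpcMessage {
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: unknown;
}

type MessageKind = "response" | "server_request" | "notification";

// A method plus an id is a server request; a method alone is a notification;
// anything without a method is a response to one of our own requests.
function classify(msg: RpcMessage): MessageKind {
  if (msg.method === undefined) return "response";
  return msg.id === undefined ? "notification" : "server_request";
}

// The decision value is passed through untouched (placeholder, see above).
function approvalResponse(request: RpcMessage, decision: string): RpcMessage {
  return { id: request.id, result: { decision } };
}

const incoming: RpcMessage = {
  id: 7,
  method: "item/commandExecution/requestApproval",
  params: { command: "ls" }, // placeholder params, not the real schema
};
if (classify(incoming) === "server_request") {
  console.log(JSON.stringify(approvalResponse(incoming, "accept")));
}
```

The response must echo the request's `id`, otherwise the server cannot match the decision to the pending approval.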

App Server WebSocket Transport (Experimental)

Codex app-server also supports an experimental WebSocket transport:

codex app-server --listen ws://127.0.0.1:4500

Transport constraints

Listen URL must be ws://IP:PORT (not localhost, not http://...)
One JSON-RPC message per WebSocket text frame
Incoming: text frame JSON is parsed as a JSON-RPC message
Outgoing: JSON-RPC messages are serialized and sent as text frames
Ping/Pong is handled; binary frames are ignored
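
The listen URL constraint is easy to trip over, so it can help to validate it before spawning the server. A minimal sketch that only recognizes IPv4 literals; whether upstream accepts other address forms is not covered by these notes.

```typescript
// Checks the ws://IP:PORT shape (IPv4 only in this sketch).
function isValidListenUrl(url: string): boolean {
  const match = /^ws:\/\/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})$/.exec(url);
  if (!match) return false;
  const octets = match.slice(1, 5).map(Number);
  const port = Number(match[5]);
  return octets.every((o) => o <= 255) && port <= 65535;
}

console.log(isValidListenUrl("ws://127.0.0.1:4500"));
console.log(isValidListenUrl("ws://localhost:4500"));
```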

Connection lifecycle

Each accepted socket becomes a distinct connection with its own session state
Every connection must send initialize first
Sending non-initialize requests before init returns "Not initialized"
Sending initialize twice on the same connection returns "Already initialized"
Broadcast notifications are only sent to initialized connections
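
These lifecycle rules amount to a small per-connection state machine. The sketch below models them from the client's point of view; it is not the upstream implementation, and `ConnectionState` is a hypothetical name.

```typescript
// Per-connection gate mirroring the rules above. Returns an error string
// for rejected requests, or null when the request may proceed.
class ConnectionState {
  private initialized = false;

  check(method: string): string | null {
    if (method === "initialize") {
      if (this.initialized) return "Already initialized";
      this.initialized = true;
      return null;
    }
    return this.initialized ? null : "Not initialized";
  }

  // Broadcast notifications skip connections that have not initialized.
  receivesBroadcasts(): boolean {
    return this.initialized;
  }
}

const conn = new ConnectionState();
const results = ["thread/start", "initialize", "initialize", "thread/start"].map((m) =>
  conn.check(m),
);
console.log(results);
```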

Operational notes

WebSocket mode is currently marked experimental/unsupported upstream
It is a raw WS server (no built-in TLS/auth); keep it on loopback or place it behind your own secure proxy/tunnel

Upstream implementation references (openai/codex `main`, commit `03adb5db`)

codex-rs/app-server/src/transport.rs
codex-rs/app-server/src/message_processor.rs
codex-rs/app-server/README.md

Response Schema

// CodexRunResultSchema
type CodexRunResult = string | {
  result?: string;
  output?: string;
  message?: string;
  // ...additional fields via passthrough
};

Content is extracted in priority order: result, then output, then message.

Thread ID Retrieval

Thread ID can be obtained from multiple sources:

thread.started event's thread_id property
Thread object's id getter (after first turn)
Thread object's threadId or _id properties (fallbacks)

function getThreadId(thread: unknown): string | null {
  const value = thread as { id?: string; threadId?: string; _id?: string };
  return value.id ?? value.threadId ?? value._id ?? null;
}

Agent Modes vs Permission Modes

Codex separates sandbox levels (permissions) from behavioral modes (prompt prefixes).

Permission Modes (Sandbox Levels)

Mode	CLI Flag	Behavior
`read-only`	`-s read-only`	No file modifications
`workspace-write`	`-s workspace-write`	Can modify workspace files
`danger-full-access`	`-s danger-full-access`	Full system access
`bypass`	`--dangerously-bypass-approvals-and-sandbox`	Skip all checks

Agent Modes (Prompt Prefixes)

Codex doesn't have true agent modes - behavior is controlled via prompt prefixing:

Mode	Prompt Prefix
`build`	No prefix (default)
`plan`	`"Make a plan before acting.\n\n"`
`chat`	`"Answer conversationally.\n\n"`

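type AgentMode = "build" | "plan" | "chat";

// Prepends the mode-specific instruction; "build" leaves the prompt untouched.
function withModePrefix(prompt: string, mode: AgentMode): string {
  if (mode === "plan") {
    return `Make a plan before acting.\n\n${prompt}`;
  }
  if (mode === "chat") {
    return `Answer conversationally.\n\n${prompt}`;
  }
  return prompt;
}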

Human-in-the-Loop

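Codex has no interactive human-in-the-loop (HITL) flow in SDK mode, so all permissions must be configured up front via the sandbox level. The app server is different: it sends approval requests (item/commandExecution/requestApproval, item/fileChange/requestApproval) that the client must answer.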

Error Handling

turn.failed events are captured but do not throw
The thread ID is still returned on error so the session can be resumed
The events iterator may throw after an error; the exception is caught and logged

interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;  // Set if turn failed
}
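
The error-handling rules above could be implemented roughly as follows. This is a sketch over a generic async iterable, not the SDK's own types: the `error.message` field on `turn.failed` is an assumption, and `collect` is a hypothetical helper.

```typescript
// Assumed event shape; only `thread.started`, `thread_id` and `turn.failed`
// are named in these notes.
interface CodexEvent {
  type: string;
  thread_id?: string;
  error?: { message?: string };
}

interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;
}

async function collect(events: AsyncIterable<CodexEvent>): Promise<CodexPromptResult> {
  const out: CodexPromptResult = { result: null, threadId: null };
  try {
    for await (const event of events) {
      if (event.type === "thread.started" && event.thread_id) out.threadId = event.thread_id;
      // A failed turn is recorded, not thrown.
      if (event.type === "turn.failed") out.error = event.error?.message ?? "turn failed";
      out.result = event; // stand-in: keep the most recent event as the result
    }
  } catch (err) {
    // The iterator may throw after a failure: log it and keep what we have.
    console.error("event stream aborted:", err);
    out.error ??= err instanceof Error ? err.message : String(err);
  }
  return out;
}

async function* demoEvents(): AsyncGenerator<CodexEvent> {
  yield { type: "thread.started", thread_id: "thread_abc123" };
  yield { type: "turn.failed", error: { message: "rate limited" } };
  throw new Error("stream closed");
}

const collected = collect(demoEvents());
collected.then((r) => console.log(r.threadId, r.error));
```

Because the thread ID is captured as soon as it appears, it survives both the failed turn and the thrown iterator.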

Conversion to Universal Format

Codex output is converted via convertCodexOutput():

Parse with CodexRunResultSchema
If result is string, use directly
Otherwise extract from result, output, or message fields
Wrap as assistant message entry
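
The four steps above can be sketched as below. This is not the repository's `convertCodexOutput()`; the `AssistantEntry` shape is hypothetical because these notes do not spell out the universal format.

```typescript
// Hypothetical entry shape for the universal format.
interface AssistantEntry {
  role: "assistant";
  content: string;
}

function convertCodexOutputSketch(raw: unknown): AssistantEntry | null {
  let text: string | undefined;
  if (typeof raw === "string") {
    text = raw; // string results are used directly
  } else if (typeof raw === "object" && raw !== null) {
    const obj = raw as { result?: unknown; output?: unknown; message?: unknown };
    // Priority order: result, then output, then message.
    text = [obj.result, obj.output, obj.message].find(
      (v): v is string => typeof v === "string",
    );
  }
  return text === undefined ? null : { role: "assistant", content: text };
}

console.log(convertCodexOutputSketch({ output: "from output", message: "from message" }));
```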

Session Continuity

Thread ID persists across prompts
Use resumeThread(threadId) to continue conversation
Thread ID is captured from thread.started event or thread object

Shared App-Server Architecture (Daemon Implementation)

The sandbox daemon uses a single shared Codex app-server process to handle multiple sessions, similar to OpenCode's server model. This differs from Claude and Amp, which spawn a new process per turn.

Architecture Comparison

Agent	Model	Process Lifetime	Session ID
Claude	Subprocess	Per-turn (killed on TurnCompleted)	`--resume` flag
Amp	Subprocess	Per-turn	`--continue` flag
OpenCode	HTTP Server	Daemon lifetime	Session ID via API
Codex	Stdio Server	Daemon lifetime	Thread ID via JSON-RPC

Daemon Flow

First Codex session created: Spawns codex app-server process, performs initialize/initialized handshake
Session creation: Sends thread/start request, captures thread_id as native_session_id
Message sent: Sends turn/start request with thread_id, streams notifications back to session
Multi-turn: Reuses same thread_id, process stays alive, no respawn needed
Daemon shutdown: Process terminated with daemon
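
The flow above boils down to "spawn lazily, handshake once, reuse for every session". A TypeScript sketch of that pattern follows. The daemon itself is not written this way: `AppServer` is a hypothetical stand-in for the spawned process, and the empty `initialize` params are placeholders.

```typescript
// Hypothetical transport standing in for the spawned `codex app-server`.
interface AppServer {
  request(method: string, params?: unknown): Promise<unknown>;
  notify(method: string): void;
  kill(): void;
}

class SharedCodexServer {
  private server: Promise<AppServer> | null = null;
  private spawn: () => AppServer;

  constructor(spawn: () => AppServer) {
    this.spawn = spawn;
  }

  // Spawns and handshakes on first use; later calls reuse the same process.
  private ensureStarted(): Promise<AppServer> {
    this.server ??= (async () => {
      const s = this.spawn();
      await s.request("initialize", {}); // placeholder params
      s.notify("initialized");
      return s;
    })();
    return this.server;
  }

  // Returns the thread ID to store as the session's native_session_id.
  async createSession(): Promise<string> {
    const s = await this.ensureStarted();
    const res = (await s.request("thread/start", {})) as { thread: { id: string } };
    return res.thread.id;
  }

  // Terminates the process together with the daemon.
  async shutdown(): Promise<void> {
    if (this.server) (await this.server).kill();
    this.server = null;
  }
}
```

Caching the startup promise rather than the process itself means two sessions created concurrently still share one spawn and one handshake.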

Why This Approach?

Performance: No process spawn overhead per message
Multi-turn support: Thread persists in server memory, no resume needed
Consistent with OpenCode: Similar server-based pattern reduces code complexity
API alignment: Matches Codex's intended app-server usage pattern

Protocol Details

The shared server uses JSON-RPC 2.0 for request/response correlation:

Daemon                           Codex App-Server
   |                                   |
   |-- initialize {id: 1} ------------>|
   |<-- response {id: 1} --------------|
   |-- initialized (notification) ---->|
   |                                   |
   |-- thread/start {id: 2} ---------->|
   |<-- response {id: 2, thread.id} ---|
   |<-- thread/started (notification) -|
   |                                   |
   |-- turn/start {id: 3, threadId} -->|
   |<-- turn/started (notification) ---|
   |<-- item/* (notifications) --------|
   |<-- turn/completed (notification) -|
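
Request/response correlation in the sequence above needs little more than an incrementing ID and a map of pending promises. A minimal sketch, assuming a hypothetical `send` callback that writes one JSONL line to the server's stdin:

```typescript
interface RpcResponse {
  id: number;
  result?: unknown;
  error?: { message: string };
}

interface Pending {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
}

class RpcClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (line: string) => void;

  constructor(send: (line: string) => void) {
    this.send = send;
  }

  // Sends a request and resolves once the response with the same id arrives.
  request(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n");
    });
  }

  // Call with each parsed message that carries an id but no method.
  handleResponse(msg: RpcResponse): void {
    const entry = this.pending.get(msg.id);
    if (!entry) return; // unknown id: ignore
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }
}
```

Notifications such as turn/started carry no id and never touch the pending map; they are handled separately.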

Thread-to-Session Routing

Notifications are routed to the correct session by extracting threadId from each notification:

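// `ServerNotification` is an assumed name for the daemon's notification enum.
fn codex_thread_id_from_server_notification(notification: &ServerNotification) -> Option<String> {
    // All thread-scoped notifications include a threadId field
    match notification {
        ServerNotification::TurnStarted(params) => Some(params.thread_id.clone()),
        ServerNotification::ItemCompleted(params) => Some(params.thread_id.clone()),
        // ... etc
        _ => None, // not thread-scoped
    }
}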
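
For readers following along in TypeScript, the same routing idea can be sketched as a map from thread ID to session handler. `ThreadRouter` and the notification shape are hypothetical; the fallback to `params.thread.id` follows the thread/started example earlier in these notes.

```typescript
interface ThreadNotification {
  method: string;
  params: { threadId?: string; thread?: { id?: string } };
}

type SessionHandler = (n: ThreadNotification) => void;

class ThreadRouter {
  private sessions = new Map<string, SessionHandler>();

  register(threadId: string, handler: SessionHandler): void {
    this.sessions.set(threadId, handler);
  }

  // Returns false when the notification has no thread ID or no session owns it.
  route(n: ThreadNotification): boolean {
    const threadId = n.params.threadId ?? n.params.thread?.id;
    const handler = threadId ? this.sessions.get(threadId) : undefined;
    if (!handler) return false;
    handler(n);
    return true;
  }
}

const demoRouter = new ThreadRouter();
demoRouter.register("thread_abc123", (n) => console.log("session got", n.method));
demoRouter.route({ method: "turn/completed", params: { threadId: "thread_abc123" } });
```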

Model Discovery

Codex exposes a model/list JSON-RPC method through its app-server process.

JSON-RPC Method

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "model/list",
  "params": {
    "cursor": null,
    "limit": null
  }
}

Supports pagination via cursor and limit parameters. Defined in resources/agent-schemas/artifacts/json-schema/codex.json.
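
A pagination loop over model/list might look like the sketch below. The response field names (`data`, `nextCursor`) are assumptions and should be checked against codex.json; `request` is a hypothetical function that performs one JSON-RPC call.

```typescript
// Assumed response shape; verify the field names against codex.json.
interface ModelPage {
  data: { id: string }[];
  nextCursor: string | null;
}

type Requester = (
  method: string,
  params: { cursor: string | null; limit: number | null },
) => Promise<ModelPage>;

// Follows cursors until the server reports no further page.
async function listAllModels(request: Requester, limit: number | null = null): Promise<string[]> {
  const ids: string[] = [];
  let cursor: string | null = null;
  do {
    const page: ModelPage = await request("model/list", { cursor, limit });
    ids.push(...page.data.map((m) => m.id));
    cursor = page.nextCursor;
  } while (cursor !== null);
  return ids;
}
```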

How to Replicate

Requires a running Codex app-server process. Send the JSON-RPC request to the app-server over stdio. The response contains the list of models available to the Codex instance (depends on configured API keys / providers).

Limitations

Requires an active app-server process (cannot query models without starting one)
No standalone CLI command like codex models

Command Execution & Process Management

Agent Tool Execution

Codex executes commands via LocalShellAction. The agent proposes a command, and external clients approve/deny via JSON-RPC (item/commandExecution/requestApproval).

Command Source Tracking (`ExecCommandSource`)

Of the agents covered in these notes, Codex is the only one that explicitly tracks who initiated a command at the protocol level:

{
  "ExecCommandSource": {
    "enum": ["agent", "user_shell", "unified_exec_startup", "unified_exec_interaction"]
  }
}

Source	Meaning
`agent`	Agent decided to run this command via tool call
`user_shell`	User ran a command in a shell (equivalent to Claude Code's `!` prefix)
`unified_exec_startup`	Startup script ran this command
`unified_exec_interaction`	Interactive execution

This means user-initiated shell commands are first-class protocol events in Codex, not a client-side hack like Claude Code's ! prefix.

Command Execution Events

Codex emits structured events for command execution:

exec_command_begin - Command started (includes source, command, cwd, turn_id)
exec_command_output_delta - Streaming output chunk (includes stream: stdout|stderr)
exec_command_end - Command completed (includes exit_code, source)
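
A client can fold these three events into one record per command. In the sketch below, `call_id` (used to correlate events), `chunk` (the output payload), and `type` as the discriminator are assumptions; the other field names come from the list above.

```typescript
type ExecEvent =
  | { type: "exec_command_begin"; call_id: string; source: string; command: string; cwd: string; turn_id: string }
  | { type: "exec_command_output_delta"; call_id: string; stream: "stdout" | "stderr"; chunk: string }
  | { type: "exec_command_end"; call_id: string; exit_code: number };

interface CommandRecord {
  source: string;
  command: string;
  stdout: string;
  stderr: string;
  exitCode: number | null; // null while the command is still running
}

function applyExecEvent(records: Map<string, CommandRecord>, ev: ExecEvent): void {
  if (ev.type === "exec_command_begin") {
    records.set(ev.call_id, {
      source: ev.source,
      command: ev.command,
      stdout: "",
      stderr: "",
      exitCode: null,
    });
    return;
  }
  const rec = records.get(ev.call_id);
  if (!rec) return; // delta or end without a matching begin
  if (ev.type === "exec_command_output_delta") rec[ev.stream] += ev.chunk;
  else rec.exitCode = ev.exit_code;
}
```

Keeping stdout and stderr separate preserves the `stream` distinction the protocol provides.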

Parsed Command Analysis (`CommandAction`)

Codex provides semantic analysis of what a command does:

{
  "commandActions": [
    { "type": "read", "path": "/src/main.ts" },
    { "type": "write", "path": "/src/utils.ts" },
    { "type": "install", "package": "lodash" }
  ]
}

Action types: read, write, listFiles, search, install, remove, other.
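
One possible use of these parsed actions is to decide whether a command can skip manual review. The policy below is a hypothetical example, not Codex behavior: which action types count as read-only is a local choice.

```typescript
type CommandActionType =
  | "read" | "write" | "listFiles" | "search" | "install" | "remove" | "other";

interface CommandAction {
  type: CommandActionType;
  path?: string;
  package?: string;
}

// Hypothetical policy: only these action types are treated as side-effect free.
const READ_ONLY_ACTIONS: ReadonlySet<CommandActionType> = new Set<CommandActionType>([
  "read",
  "listFiles",
  "search",
]);

// True only when every parsed action is read-only; an empty list is treated
// as unknown and therefore not read-only.
function isReadOnly(actions: CommandAction[]): boolean {
  return actions.length > 0 && actions.every((a) => READ_ONLY_ACTIONS.has(a.type));
}

console.log(isReadOnly([{ type: "read", path: "/src/main.ts" }]));
console.log(isReadOnly([{ type: "read", path: "/src/main.ts" }, { type: "install", package: "lodash" }]));
```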

Comparison

Capability	Supported?	Notes
Agent runs commands	Yes (`LocalShellAction`)	With approval workflow
User runs commands → agent sees output	Yes (`user_shell` source)	First-class protocol event
External API for command injection	Yes (JSON-RPC approval)	Can approve/deny before execution
Command source tracking	Yes (`ExecCommandSource` enum)	Distinguishes agent vs user vs startup
Background process management	No
PTY / interactive terminal	No

Notes

SDK is dynamically imported to reduce bundle size
No explicit timeout (relies on SDK defaults)
Thread ID may not be available until first event
Error messages are preserved for debugging
Working directory is not explicitly set (SDK handles internally)
