sandbox-agent/research/agents/codex.md
2026-02-11 14:47:41 +00:00

14 KiB

Codex Research

Research notes on OpenAI Codex's configuration, credential discovery, and runtime behavior based on agent-jj implementation.

Overview

  • Provider: OpenAI
  • Execution Method (this repo): Codex App Server (JSON-RPC over stdio)
  • Execution Method (alternatives): SDK (@openai/codex-sdk) or CLI binary
  • Session Persistence: Thread ID (string)
  • Import: Dynamic import to avoid bundling issues
  • Binary Location: ~/.nvm/versions/node/v24.3.0/bin/codex (npm global install)

SDK Architecture

The SDK wraps a bundled binary - it does NOT make direct API calls.

  • The TypeScript SDK includes a pre-compiled Codex binary
  • When you use the SDK, it spawns this binary as a child process
  • Communication happens via stdin/stdout using JSONL (JSON Lines) format
  • The binary itself handles the actual communication with OpenAI's backend services

Sources: Codex SDK docs, GitHub

CLI Usage (Alternative to App Server / SDK)

You can use the codex binary directly instead of the SDK:

Interactive Mode

codex "your prompt here"
codex --model o3 "your prompt"

Non-Interactive Mode (codex exec)

codex exec "your prompt here"
codex exec --json "your prompt"  # JSONL output
codex exec -m o3 "your prompt"
codex exec --dangerously-bypass-approvals-and-sandbox "prompt"
codex exec resume --last  # Resume previous session

Key CLI Flags

Flag Description
--json Print events to stdout as JSONL
-m, --model MODEL Model to use
-s, --sandbox MODE read-only, workspace-write, danger-full-access
--full-auto Auto-approve with workspace-write sandbox
--dangerously-bypass-approvals-and-sandbox Skip all prompts (dangerous)
-C, --cd DIR Working directory
-o, --output-last-message FILE Write final response to file
--output-schema FILE JSON Schema for structured output

Session Management

codex resume          # Pick from previous sessions
codex resume --last   # Resume most recent
codex fork --last     # Fork most recent session

Credential Discovery

Priority Order

  1. User-configured credentials (from credentials array)
  2. Environment variable: CODEX_API_KEY
  3. Environment variable: OPENAI_API_KEY
  4. Bootstrap extraction from config files

Config File Location

Path Description
~/.codex/auth.json Primary auth config

Auth File Structure

// API Key authentication
{
  "OPENAI_API_KEY": "sk-..."
}

// OAuth authentication
{
  "tokens": {
    "access_token": "..."
  }
}

SDK Usage

Client Initialization

import { Codex } from "@openai/codex-sdk";

// With API key
const codex = new Codex({ apiKey: "sk-..." });

// Without API key (uses default auth)
const codex = new Codex();

Dynamic import is used to avoid bundling the SDK:

const { Codex } = await import("@openai/codex-sdk");

Thread Management

// Start new thread
const thread = codex.startThread();

// Resume existing thread
const thread = codex.resumeThread(threadId);

Running Prompts

const { events } = await thread.runStreamed(prompt);

for await (const event of events) {
  // Process events
}

App Server Protocol (JSON-RPC)

Codex App Server uses JSON-RPC 2.0 over JSONL/stdin/stdout (no port required).

Key Requests

  • initialize → returns server info
  • thread/start → starts a new thread
  • turn/start → sends user input for a thread

Event Notifications (examples)

{ "method": "thread/started", "params": { "thread": { "id": "thread_abc123" } } }
{ "method": "item/completed", "params": { "item": { "type": "agentMessage", "text": "..." } } }
{ "method": "turn/completed", "params": { "threadId": "thread_abc123", "turn": { "items": [] } } }

Approval Requests (server → client)

The server can send JSON-RPC requests (with id) for approvals:

  • item/commandExecution/requestApproval
  • item/fileChange/requestApproval

These require JSON-RPC responses with a decision payload.

App Server WebSocket Transport (Experimental)

Codex app-server also supports an experimental WebSocket transport:

codex app-server --listen ws://127.0.0.1:4500

Transport constraints

  • Listen URL must be ws://IP:PORT (not localhost, not http://...)
  • One JSON-RPC message per WebSocket text frame
  • Incoming: text frame JSON is parsed as a JSON-RPC message
  • Outgoing: JSON-RPC messages are serialized and sent as text frames
  • Ping/Pong is handled; binary frames are ignored

Connection lifecycle

  • Each accepted socket becomes a distinct connection with its own session state
  • Every connection must send initialize first
  • Sending non-initialize requests before init returns "Not initialized"
  • Sending initialize twice on the same connection returns "Already initialized"
  • Broadcast notifications are only sent to initialized connections

Operational notes

  • WebSocket mode is currently marked experimental/unsupported upstream
  • It is a raw WS server (no built-in TLS/auth); keep it on loopback or place it behind your own secure proxy/tunnel

Upstream implementation references (openai/codex main, commit 03adb5db)

  • codex-rs/app-server/src/transport.rs
  • codex-rs/app-server/src/message_processor.rs
  • codex-rs/app-server/README.md

Response Schema

// CodexRunResultSchema
type CodexRunResult = string | {
  result?: string;
  output?: string;
  message?: string;
  // ...additional fields via passthrough
};

Content is extracted in priority order: result > output > message

Thread ID Retrieval

Thread ID can be obtained from multiple sources:

  1. thread.started event's thread_id property
  2. Thread object's id getter (after first turn)
  3. Thread object's threadId or _id properties (fallbacks)
function getThreadId(thread: unknown): string | null {
  const value = thread as { id?: string; threadId?: string; _id?: string };
  return value.id ?? value.threadId ?? value._id ?? null;
}

Agent Modes vs Permission Modes

Codex separates sandbox levels (permissions) from behavioral modes (prompt prefixes).

Permission Modes (Sandbox Levels)

Mode CLI Flag Behavior
read-only -s read-only No file modifications
workspace-write -s workspace-write Can modify workspace files
danger-full-access -s danger-full-access Full system access
bypass --dangerously-bypass-approvals-and-sandbox Skip all checks

Agent Modes (Prompt Prefixes)

Codex doesn't have true agent modes - behavior is controlled via prompt prefixing:

Mode Prompt Prefix
build No prefix (default)
plan "Make a plan before acting.\n\n"
chat "Answer conversationally.\n\n"
function withModePrefix(prompt: string, mode: AgentMode): string {
  if (mode === "plan") {
    return `Make a plan before acting.\n\n${prompt}`;
  }
  if (mode === "chat") {
    return `Answer conversationally.\n\n${prompt}`;
  }
  return prompt;
}

Human-in-the-Loop

Codex has no interactive HITL in SDK mode. All permissions must be configured upfront via sandbox level.

Error Handling

  • turn.failed events are captured but don't throw
  • Thread ID is still returned on error for potential resumption
  • Events iterator may throw after errors - caught and logged
interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;  // Set if turn failed
}

Conversion to Universal Format

Codex output is converted via convertCodexOutput():

  1. Parse with CodexRunResultSchema
  2. If result is string, use directly
  3. Otherwise extract from result, output, or message fields
  4. Wrap as assistant message entry

Session Continuity

  • Thread ID persists across prompts
  • Use resumeThread(threadId) to continue conversation
  • Thread ID is captured from thread.started event or thread object

Shared App-Server Architecture (Daemon Implementation)

The sandbox daemon uses a single shared Codex app-server process to handle multiple sessions, similar to OpenCode's server model. This differs from Claude/Amp which spawn a new process per turn.

Architecture Comparison

Agent Model Process Lifetime Session ID
Claude Subprocess Per-turn (killed on TurnCompleted) --resume flag
Amp Subprocess Per-turn --continue flag
OpenCode HTTP Server Daemon lifetime Session ID via API
Codex Stdio Server Daemon lifetime Thread ID via JSON-RPC

Daemon Flow

  1. First Codex session created: Spawns codex app-server process, performs initialize/initialized handshake
  2. Session creation: Sends thread/start request, captures thread_id as native_session_id
  3. Message sent: Sends turn/start request with thread_id, streams notifications back to session
  4. Multi-turn: Reuses same thread_id, process stays alive, no respawn needed
  5. Daemon shutdown: Process terminated with daemon

Why This Approach?

  1. Performance: No process spawn overhead per message
  2. Multi-turn support: Thread persists in server memory, no resume needed
  3. Consistent with OpenCode: Similar server-based pattern reduces code complexity
  4. API alignment: Matches Codex's intended app-server usage pattern

Protocol Details

The shared server uses JSON-RPC 2.0 for request/response correlation:

Daemon                           Codex App-Server
   |                                   |
   |-- initialize {id: 1} ------------>|
   |<-- response {id: 1} --------------|
   |-- initialized (notification) ---->|
   |                                   |
   |-- thread/start {id: 2} ---------->|
   |<-- response {id: 2, thread.id} ---|
   |<-- thread/started (notification) -|
   |                                   |
   |-- turn/start {id: 3, threadId} -->|
   |<-- turn/started (notification) ---|
   |<-- item/* (notifications) --------|
   |<-- turn/completed (notification) -|

Thread-to-Session Routing

Notifications are routed to the correct session by extracting threadId from each notification:

fn codex_thread_id_from_server_notification(notification) -> Option<String> {
    // All thread-scoped notifications include threadId field
    match notification {
        TurnStarted(params) => Some(params.thread_id),
        ItemCompleted(params) => Some(params.thread_id),
        // ... etc
    }
}

Model Discovery

Codex exposes a model/list JSON-RPC method through its app-server process.

JSON-RPC Method

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "model/list",
  "params": {
    "cursor": null,
    "limit": null
  }
}

Supports pagination via cursor and limit parameters. Defined in resources/agent-schemas/artifacts/json-schema/codex.json.

How to Replicate

Requires a running Codex app-server process. Send the JSON-RPC request to the app-server over stdio. The response contains the list of models available to the Codex instance (depends on configured API keys / providers).

Limitations

  • Requires an active app-server process (cannot query models without starting one)
  • No standalone CLI command like codex models

Command Execution & Process Management

Agent Tool Execution

Codex executes commands via LocalShellAction. The agent proposes a command, and external clients approve/deny via JSON-RPC (item/commandExecution/requestApproval).

Command Source Tracking (ExecCommandSource)

Codex is the only agent that explicitly tracks who initiated a command at the protocol level:

{
  "ExecCommandSource": {
    "enum": ["agent", "user_shell", "unified_exec_startup", "unified_exec_interaction"]
  }
}
Source Meaning
agent Agent decided to run this command via tool call
user_shell User ran a command in a shell (equivalent to Claude Code's ! prefix)
unified_exec_startup Startup script ran this command
unified_exec_interaction Interactive execution

This means user-initiated shell commands are first-class protocol events in Codex, not a client-side hack like Claude Code's ! prefix.

Command Execution Events

Codex emits structured events for command execution:

  • exec_command_begin - Command started (includes source, command, cwd, turn_id)
  • exec_command_output_delta - Streaming output chunk (includes stream: stdout|stderr)
  • exec_command_end - Command completed (includes exit_code, source)

Parsed Command Analysis (CommandAction)

Codex provides semantic analysis of what a command does:

{
  "commandActions": [
    { "type": "read", "path": "/src/main.ts" },
    { "type": "write", "path": "/src/utils.ts" },
    { "type": "install", "package": "lodash" }
  ]
}

Action types: read, write, listFiles, search, install, remove, other.

Comparison

Capability Supported? Notes
Agent runs commands Yes (LocalShellAction) With approval workflow
User runs commands → agent sees output Yes (user_shell source) First-class protocol event
External API for command injection Yes (JSON-RPC approval) Can approve/deny before execution
Command source tracking Yes (ExecCommandSource enum) Distinguishes agent vs user vs startup
Background process management No
PTY / interactive terminal No

Notes

  • SDK is dynamically imported to reduce bundle size
  • No explicit timeout (relies on SDK defaults)
  • Thread ID may not be available until first event
  • Error messages are preserved for debugging
  • Working directory is not explicitly set (SDK handles internally)