Codex Research
Research notes on OpenAI Codex's configuration, credential discovery, and runtime behavior based on agent-jj implementation.
Overview
- Provider: OpenAI
- Execution Method (this repo): Codex App Server (JSON-RPC over stdio)
- Execution Method (alternatives): SDK (`@openai/codex-sdk`) or CLI binary
- Session Persistence: Thread ID (string)
- Import: Dynamic import to avoid bundling issues
- Binary Location: `~/.nvm/versions/node/current/bin/codex` (npm global install)
SDK Architecture
The SDK wraps a bundled binary - it does NOT make direct API calls.
- The TypeScript SDK includes a pre-compiled Codex binary
- When you use the SDK, it spawns this binary as a child process
- Communication happens via stdin/stdout using JSONL (JSON Lines) format
- The binary itself handles the actual communication with OpenAI's backend services
Sources: Codex SDK docs, GitHub
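The JSONL framing described above can be sketched as a small decoder that buffers partial chunks from the child process's stdout and emits one parsed message per complete line. This is an illustrative sketch, not the SDK's actual internals:

```typescript
// Illustrative JSONL decoder: buffers partial chunks, emits one parsed
// message per complete line. Not the SDK's real implementation.
class JsonlDecoder {
  private partial = "";

  push(chunk: string): unknown[] {
    this.partial += chunk;
    const lines = this.partial.split("\n");
    // The last element is either "" or an incomplete line; keep it buffered.
    this.partial = lines.pop() ?? "";
    return lines.filter((l) => l.trim().length > 0).map((l) => JSON.parse(l));
  }
}
```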
CLI Usage (Alternative to App Server / SDK)
You can use the codex binary directly instead of the SDK:
Interactive Mode
```shell
codex "your prompt here"
codex --model o3 "your prompt"
```
Non-Interactive Mode (codex exec)
```shell
codex exec "your prompt here"
codex exec --json "your prompt"    # JSONL output
codex exec -m o3 "your prompt"
codex exec --dangerously-bypass-approvals-and-sandbox "prompt"
codex exec resume --last           # Resume previous session
```
Key CLI Flags
| Flag | Description |
|---|---|
| `--json` | Print events to stdout as JSONL |
| `-m, --model MODEL` | Model to use |
| `-s, --sandbox MODE` | `read-only`, `workspace-write`, `danger-full-access` |
| `--full-auto` | Auto-approve with workspace-write sandbox |
| `--dangerously-bypass-approvals-and-sandbox` | Skip all prompts (dangerous) |
| `-C, --cd DIR` | Working directory |
| `-o, --output-last-message FILE` | Write final response to file |
| `--output-schema FILE` | JSON Schema for structured output |
Session Management
```shell
codex resume          # Pick from previous sessions
codex resume --last   # Resume most recent
codex fork --last     # Fork most recent session
```
Credential Discovery
Priority Order
- User-configured credentials (from `credentials` array)
- Environment variable: `CODEX_API_KEY`
- Environment variable: `OPENAI_API_KEY`
- Bootstrap extraction from config files
Config File Location
| Path | Description |
|---|---|
| `~/.codex/auth.json` | Primary auth config |
Auth File Structure
```jsonc
// API Key authentication
{
  "OPENAI_API_KEY": "sk-..."
}

// OAuth authentication
{
  "tokens": {
    "access_token": "..."
  }
}
```
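The discovery order above can be sketched as a resolver. The function name and the `configured` parameter are illustrative, not from the codebase:

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

// Illustrative resolver following the priority order above; names are assumptions.
function resolveCodexApiKey(configured?: string): string | null {
  if (configured) return configured;                                 // 1. user-configured
  if (process.env.CODEX_API_KEY) return process.env.CODEX_API_KEY;   // 2. CODEX_API_KEY
  if (process.env.OPENAI_API_KEY) return process.env.OPENAI_API_KEY; // 3. OPENAI_API_KEY
  try {
    // 4. bootstrap from ~/.codex/auth.json
    const auth = JSON.parse(
      readFileSync(join(homedir(), ".codex", "auth.json"), "utf8"),
    ) as { OPENAI_API_KEY?: string };
    return auth.OPENAI_API_KEY ?? null;
  } catch {
    return null; // file missing or unreadable
  }
}
```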
SDK Usage
Client Initialization
```typescript
import { Codex } from "@openai/codex-sdk";

// With API key
const codex = new Codex({ apiKey: "sk-..." });

// Without API key (uses default auth)
const codex = new Codex();
```
Dynamic import is used to avoid bundling the SDK:
```typescript
const { Codex } = await import("@openai/codex-sdk");
```
Thread Management
```typescript
// Start new thread
const thread = codex.startThread();

// Resume existing thread
const thread = codex.resumeThread(threadId);
```
Running Prompts
```typescript
const { events } = await thread.runStreamed(prompt);
for await (const event of events) {
  // Process events
}
```
App Server Protocol (JSON-RPC)
Codex App Server uses JSON-RPC 2.0 over JSONL/stdin/stdout (no port required).
Key Requests
- `initialize` → returns server info
- `thread/start` → starts a new thread
- `turn/start` → sends user input for a thread
Event Notifications (examples)
```jsonl
{ "method": "thread/started", "params": { "thread": { "id": "thread_abc123" } } }
{ "method": "item/completed", "params": { "item": { "type": "agentMessage", "text": "..." } } }
{ "method": "turn/completed", "params": { "threadId": "thread_abc123", "turn": { "items": [] } } }
```
Approval Requests (server → client)
The server can send JSON-RPC requests (with id) for approvals:
- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`
These require JSON-RPC responses with a decision payload.
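A response might look like the following. The exact payload shape and the accepted decision values are assumptions here; check the upstream app-server schema before relying on them:

```json
{ "jsonrpc": "2.0", "id": 7, "result": { "decision": "approved" } }
```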
App Server WebSocket Transport (Experimental)
Codex app-server also supports an experimental WebSocket transport:
```shell
codex app-server --listen ws://127.0.0.1:4500
```
Transport constraints
- Listen URL must be `ws://IP:PORT` (not `localhost`, not `http://...`)
- One JSON-RPC message per WebSocket text frame
- Incoming: text frame JSON is parsed as a JSON-RPC message
- Outgoing: JSON-RPC messages are serialized and sent as text frames
- Ping/Pong is handled; binary frames are ignored
Connection lifecycle
- Each accepted socket becomes a distinct connection with its own session state
- Every connection must send `initialize` first
- Sending non-`initialize` requests before init returns "Not initialized"
- Sending `initialize` twice on the same connection returns "Already initialized"
- Broadcast notifications are only sent to initialized connections
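The lifecycle rules above amount to a tiny per-connection state machine. A minimal sketch (not the real server code, which is Rust):

```typescript
// Minimal per-connection init gate mirroring the rules above (illustrative).
class ConnectionState {
  private initialized = false;

  // Returns an error string, or null if the request may proceed.
  gate(method: string): string | null {
    if (method === "initialize") {
      if (this.initialized) return "Already initialized";
      this.initialized = true;
      return null;
    }
    return this.initialized ? null : "Not initialized";
  }
}
```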
Operational notes
- WebSocket mode is currently marked experimental/unsupported upstream
- It is a raw WS server (no built-in TLS/auth); keep it on loopback or place it behind your own secure proxy/tunnel
Upstream implementation references (openai/codex main, commit 03adb5db)
- `codex-rs/app-server/src/transport.rs`
- `codex-rs/app-server/src/message_processor.rs`
- `codex-rs/app-server/README.md`
Response Schema
```typescript
// CodexRunResultSchema
type CodexRunResult = string | {
  result?: string;
  output?: string;
  message?: string;
  // ...additional fields via passthrough
};
```
Content is extracted in priority order: result > output > message
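A minimal sketch of that extraction order, using the field names from `CodexRunResult` (the helper name is illustrative):

```typescript
type CodexRunResult =
  | string
  | { result?: string; output?: string; message?: string };

// Extract content in priority order: result > output > message.
function extractContent(value: CodexRunResult): string {
  if (typeof value === "string") return value;
  return value.result ?? value.output ?? value.message ?? "";
}
```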
Thread ID Retrieval
Thread ID can be obtained from multiple sources:
- `thread.started` event's `thread_id` property
- Thread object's `id` getter (after first turn)
- Thread object's `threadId` or `_id` properties (fallbacks)
```typescript
function getThreadId(thread: unknown): string | null {
  const value = thread as { id?: string; threadId?: string; _id?: string };
  return value.id ?? value.threadId ?? value._id ?? null;
}
```
Agent Modes vs Permission Modes
Codex separates sandbox levels (permissions) from behavioral modes (prompt prefixes).
Permission Modes (Sandbox Levels)
| Mode | CLI Flag | Behavior |
|---|---|---|
| `read-only` | `-s read-only` | No file modifications |
| `workspace-write` | `-s workspace-write` | Can modify workspace files |
| `danger-full-access` | `-s danger-full-access` | Full system access |
| `bypass` | `--dangerously-bypass-approvals-and-sandbox` | Skip all checks |
Agent Modes (Prompt Prefixes)
Codex doesn't have true agent modes - behavior is controlled via prompt prefixing:
| Mode | Prompt Prefix |
|---|---|
| `build` | No prefix (default) |
| `plan` | `"Make a plan before acting.\n\n"` |
| `chat` | `"Answer conversationally.\n\n"` |
```typescript
function withModePrefix(prompt: string, mode: AgentMode): string {
  if (mode === "plan") {
    return `Make a plan before acting.\n\n${prompt}`;
  }
  if (mode === "chat") {
    return `Answer conversationally.\n\n${prompt}`;
  }
  return prompt;
}
```
Human-in-the-Loop
Codex has no interactive HITL in SDK mode. All permissions must be configured upfront via sandbox level.
Error Handling
- `turn.failed` events are captured but don't throw
- Thread ID is still returned on error for potential resumption
- Events iterator may throw after errors - caught and logged
```typescript
interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string; // Set if turn failed
}
```
Conversion to Universal Format
Codex output is converted via `convertCodexOutput()`:
- Parse with `CodexRunResultSchema`
- If the result is a string, use it directly
- Otherwise extract from the `result`, `output`, or `message` fields
- Wrap as an assistant message entry
Session Continuity
- Thread ID persists across prompts
- Use `resumeThread(threadId)` to continue the conversation
- Thread ID is captured from the `thread.started` event or the thread object
Shared App-Server Architecture (Daemon Implementation)
The sandbox daemon uses a single shared Codex app-server process to handle multiple sessions, similar to OpenCode's server model. This differs from Claude/Amp which spawn a new process per turn.
Architecture Comparison
| Agent | Model | Process Lifetime | Session ID |
|---|---|---|---|
| Claude | Subprocess | Per-turn (killed on TurnCompleted) | --resume flag |
| Amp | Subprocess | Per-turn | --continue flag |
| OpenCode | HTTP Server | Daemon lifetime | Session ID via API |
| Codex | Stdio Server | Daemon lifetime | Thread ID via JSON-RPC |
Daemon Flow
- First Codex session created: Spawns a `codex app-server` process, performs the `initialize`/`initialized` handshake
- Session creation: Sends a `thread/start` request, captures `thread_id` as `native_session_id`
- Message sent: Sends a `turn/start` request with `thread_id`, streams notifications back to the session
- Multi-turn: Reuses the same `thread_id`; the process stays alive, no respawn needed
- Daemon shutdown: Process terminated with the daemon
Why This Approach?
- Performance: No process spawn overhead per message
- Multi-turn support: Thread persists in server memory, no resume needed
- Consistent with OpenCode: Similar server-based pattern reduces code complexity
- API alignment: Matches Codex's intended app-server usage pattern
Protocol Details
The shared server uses JSON-RPC 2.0 for request/response correlation:
```
Daemon                                  Codex App-Server
  |                                        |
  |-- initialize {id: 1} ----------------->|
  |<-- response {id: 1} -------------------|
  |-- initialized (notification) --------->|
  |                                        |
  |-- thread/start {id: 2} --------------->|
  |<-- response {id: 2, thread.id} --------|
  |<-- thread/started (notification) ------|
  |                                        |
  |-- turn/start {id: 3, threadId} ------->|
  |<-- turn/started (notification) --------|
  |<-- item/* (notifications) -------------|
  |<-- turn/completed (notification) ------|
```
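The id-based correlation in the sequence above can be sketched as a pending-request map. The daemon itself is implemented in Rust; this is an illustrative TypeScript sketch of the same idea:

```typescript
// Illustrative JSON-RPC client core: correlates responses to requests by id.
class RpcClient {
  private nextId = 1;
  private pending = new Map<number, (result: unknown) => void>();

  constructor(private send: (line: string) => void) {}

  request(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++;
    this.send(JSON.stringify({ jsonrpc: "2.0", id, method, params }));
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  // Feed each JSONL line read from the server's stdout here.
  onLine(line: string): void {
    const msg = JSON.parse(line) as { id?: number; method?: string; result?: unknown };
    if (msg.id !== undefined && msg.method === undefined) {
      // A response: resolve the matching pending request.
      this.pending.get(msg.id)?.(msg.result);
      this.pending.delete(msg.id);
    }
    // Messages carrying a method are notifications (or server->client requests).
  }
}
```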
Thread-to-Session Routing
Notifications are routed to the correct session by extracting threadId from each notification:
```rust
fn codex_thread_id_from_server_notification(notification) -> Option<String> {
    // All thread-scoped notifications include threadId field
    match notification {
        TurnStarted(params) => Some(params.thread_id),
        ItemCompleted(params) => Some(params.thread_id),
        // ... etc
    }
}
```
Model Discovery
Codex exposes a model/list JSON-RPC method through its app-server process.
JSON-RPC Method
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "model/list",
  "params": {
    "cursor": null,
    "limit": null
  }
}
```
Supports pagination via cursor and limit parameters. Defined in resources/agent-schemas/artifacts/json-schema/codex.json.
How to Replicate
Requires a running Codex app-server process. Send the JSON-RPC request to the app-server over stdio. The response contains the list of models available to the Codex instance (depends on configured API keys / providers).
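Pagination can be sketched as a cursor-following loop. The transport is injected, and the response field names used here (`models`, `nextCursor`) are assumptions, not confirmed schema:

```typescript
// Illustrative cursor-following loop over model/list pages.
type ModelPage = { models: string[]; nextCursor?: string | null };
type RpcCall = (method: string, params: unknown) => Promise<ModelPage>;

async function listAllModels(call: RpcCall): Promise<string[]> {
  const models: string[] = [];
  let cursor: string | null = null;
  do {
    const page: ModelPage = await call("model/list", { cursor, limit: null });
    models.push(...page.models);
    cursor = page.nextCursor ?? null; // follow the cursor until exhausted
  } while (cursor !== null);
  return models;
}
```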
Limitations
- Requires an active app-server process (cannot query models without starting one)
- No standalone CLI command like `codex models`
Command Execution & Process Management
Agent Tool Execution
Codex executes commands via LocalShellAction. The agent proposes a command, and external clients approve/deny via JSON-RPC (item/commandExecution/requestApproval).
Command Source Tracking (ExecCommandSource)
Codex is the only agent that explicitly tracks who initiated a command at the protocol level:
```json
{
  "ExecCommandSource": {
    "enum": ["agent", "user_shell", "unified_exec_startup", "unified_exec_interaction"]
  }
}
```
| Source | Meaning |
|---|---|
| `agent` | Agent decided to run this command via tool call |
| `user_shell` | User ran a command in a shell (equivalent to Claude Code's `!` prefix) |
| `unified_exec_startup` | Startup script ran this command |
| `unified_exec_interaction` | Interactive execution |
This means user-initiated shell commands are first-class protocol events in Codex, not a client-side hack like Claude Code's ! prefix.
Command Execution Events
Codex emits structured events for command execution:
- `exec_command_begin` - Command started (includes `source`, `command`, `cwd`, `turn_id`)
- `exec_command_output_delta` - Streaming output chunk (includes `stream: stdout|stderr`)
- `exec_command_end` - Command completed (includes `exit_code`, `source`)
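Folding those events into a per-command record might look like this. The event and field names come from the notes above, but the delta's payload field (`chunk` here) is an assumption:

```typescript
// Illustrative fold over exec_command_* events; `chunk` is an assumed field name.
interface ExecEvent {
  type: "exec_command_begin" | "exec_command_output_delta" | "exec_command_end";
  command?: string;
  source?: string;
  stream?: "stdout" | "stderr";
  chunk?: string;     // assumed name for the streaming output payload
  exit_code?: number;
}

function foldExec(events: ExecEvent[]) {
  const rec = { command: "", source: "", stdout: "", stderr: "", exitCode: null as number | null };
  for (const e of events) {
    if (e.type === "exec_command_begin") {
      rec.command = e.command ?? "";
      rec.source = e.source ?? "";
    } else if (e.type === "exec_command_output_delta") {
      if (e.stream === "stderr") rec.stderr += e.chunk ?? "";
      else rec.stdout += e.chunk ?? "";
    } else if (e.type === "exec_command_end") {
      rec.exitCode = e.exit_code ?? null;
    }
  }
  return rec;
}
```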
Parsed Command Analysis (CommandAction)
Codex provides semantic analysis of what a command does:
```json
{
  "commandActions": [
    { "type": "read", "path": "/src/main.ts" },
    { "type": "write", "path": "/src/utils.ts" },
    { "type": "install", "package": "lodash" }
  ]
}
```
Action types: read, write, listFiles, search, install, remove, other.
Comparison
| Capability | Supported? | Notes |
|---|---|---|
| Agent runs commands | Yes (`LocalShellAction`) | With approval workflow |
| User runs commands → agent sees output | Yes (`user_shell` source) | First-class protocol event |
| External API for command injection | Yes (JSON-RPC approval) | Can approve/deny before execution |
| Command source tracking | Yes (`ExecCommandSource` enum) | Distinguishes agent vs user vs startup |
| Background process management | No | |
| PTY / interactive terminal | No | |
Notes
- SDK is dynamically imported to reduce bundle size
- No explicit timeout (relies on SDK defaults)
- Thread ID may not be available until first event
- Error messages are preserved for debugging
- Working directory is not explicitly set (SDK handles internally)