mirror of
https://github.com/harivansh-afk/sandbox-agent.git
synced 2026-04-15 10:05:18 +00:00
453 lines
14 KiB
Markdown
# Codex Research

Research notes on OpenAI Codex's configuration, credential discovery, and runtime behavior, based on the agent-jj implementation.

## Overview

- **Provider**: OpenAI
- **Execution Method (this repo)**: Codex App Server (JSON-RPC over stdio)
- **Execution Method (alternatives)**: SDK (`@openai/codex-sdk`) or CLI binary
- **Session Persistence**: Thread ID (string)
- **Import**: Dynamic import to avoid bundling issues
- **Binary Location**: `~/.nvm/versions/node/current/bin/codex` (npm global install)

## SDK Architecture

**The SDK wraps a bundled binary** - it does NOT make direct API calls.

- The TypeScript SDK includes a pre-compiled Codex binary
- When you use the SDK, it spawns this binary as a child process
- Communication happens via stdin/stdout using JSONL (JSON Lines) format
- The binary itself handles the actual communication with OpenAI's backend services

Sources: [Codex SDK docs](https://developers.openai.com/codex/sdk/), [GitHub](https://github.com/openai/codex)

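
Because the binary communicates in JSONL over stdin/stdout, any wrapper has to split the child's output into lines before parsing it. The sketch below shows one way to do that; `JsonlDecoder` and `encodeJsonl` are hypothetical names for illustration, not part of the SDK.

```typescript
// Hypothetical helper: incrementally splits stdout text into JSONL records.
class JsonlDecoder {
  private buffer = "";

  // Feed a chunk of text; returns every complete JSON value seen so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    const values: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue; // tolerate blank lines
      values.push(JSON.parse(trimmed));
    }
    return values;
  }
}

// Encode one outgoing message as a single line for the child's stdin.
function encodeJsonl(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

Buffering the trailing partial line matters because a pipe can deliver a chunk that ends in the middle of a JSON object.
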
## CLI Usage (Alternative to App Server / SDK)

You can use the `codex` binary directly instead of the SDK:

### Interactive Mode

```bash
codex "your prompt here"
codex --model o3 "your prompt"
```

### Non-Interactive Mode (`codex exec`)

```bash
codex exec "your prompt here"
codex exec --json "your prompt"  # JSONL output
codex exec -m o3 "your prompt"
codex exec --dangerously-bypass-approvals-and-sandbox "prompt"
codex exec resume --last  # Resume previous session
```

### Key CLI Flags

| Flag | Description |
|------|-------------|
| `--json` | Print events to stdout as JSONL |
| `-m, --model MODEL` | Model to use |
| `-s, --sandbox MODE` | `read-only`, `workspace-write`, `danger-full-access` |
| `--full-auto` | Auto-approve with workspace-write sandbox |
| `--dangerously-bypass-approvals-and-sandbox` | Skip all prompts (dangerous) |
| `-C, --cd DIR` | Working directory |
| `-o, --output-last-message FILE` | Write final response to file |
| `--output-schema FILE` | JSON Schema for structured output |

### Session Management

```bash
codex resume           # Pick from previous sessions
codex resume --last    # Resume most recent
codex fork --last      # Fork most recent session
```

## Credential Discovery

### Priority Order

1. User-configured credentials (from `credentials` array)
2. Environment variable: `CODEX_API_KEY`
3. Environment variable: `OPENAI_API_KEY`
4. Bootstrap extraction from config files

### Config File Location

| Path | Description |
|------|-------------|
| `~/.codex/auth.json` | Primary auth config |

### Auth File Structure

```jsonc
// API Key authentication
{
  "OPENAI_API_KEY": "sk-..."
}

// OAuth authentication
{
  "tokens": {
    "access_token": "..."
  }
}
```

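
The priority order above can be sketched as a pure function. This is an illustration, not the repo's implementation: `resolveCodexCredential`, the `Credential` type, and the single `configured` string standing in for the `credentials` array are all hypothetical, and checking the API key before the OAuth token inside `auth.json` is an assumption.

```typescript
type EnvMap = Record<string, string | undefined>;

// Shape of ~/.codex/auth.json as documented above; both fields are optional.
interface CodexAuthFile {
  OPENAI_API_KEY?: string;
  tokens?: { access_token?: string };
}

interface Credential {
  kind: "api_key" | "oauth";
  value: string;
  source: string; // where the credential was found
}

function resolveCodexCredential(
  configured: string | undefined, // hypothetical: key taken from the `credentials` array
  env: EnvMap,
  authFile: CodexAuthFile | null, // parsed auth.json, or null if the file is missing
): Credential | null {
  // 1. User-configured credentials win.
  if (configured) return { kind: "api_key", value: configured, source: "configured" };
  // 2. and 3. Environment variables, CODEX_API_KEY first.
  if (env.CODEX_API_KEY) {
    return { kind: "api_key", value: env.CODEX_API_KEY, source: "CODEX_API_KEY" };
  }
  if (env.OPENAI_API_KEY) {
    return { kind: "api_key", value: env.OPENAI_API_KEY, source: "OPENAI_API_KEY" };
  }
  // 4. Bootstrap extraction from the auth file (ordering here is an assumption).
  if (authFile?.OPENAI_API_KEY) {
    return { kind: "api_key", value: authFile.OPENAI_API_KEY, source: "auth.json" };
  }
  const token = authFile?.tokens?.access_token;
  if (token) return { kind: "oauth", value: token, source: "auth.json" };
  return null;
}
```

Passing the environment and the parsed file in as arguments keeps the function free of I/O, which makes the ordering easy to test.
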
## SDK Usage

### Client Initialization

```typescript
import { Codex } from "@openai/codex-sdk";

// With an explicit API key
const codex = new Codex({ apiKey: "sk-..." });

// Without an API key (uses default auth)
const codexDefault = new Codex();
```

Dynamic import is used to avoid bundling the SDK:

```typescript
const { Codex } = await import("@openai/codex-sdk");
```

### Thread Management

```typescript
// Start a new thread
const thread = codex.startThread();

// Resume an existing thread by its ID
const resumedThread = codex.resumeThread(threadId);
```

### Running Prompts

```typescript
const { events } = await thread.runStreamed(prompt);

for await (const event of events) {
  // Process events
}
```

## App Server Protocol (JSON-RPC)

Codex App Server speaks JSON-RPC 2.0 over stdin/stdout, one JSON message per line (JSONL). No port is required.

### Key Requests

- `initialize` → returns server info
- `thread/start` → starts a new thread
- `turn/start` → sends user input for a thread

### Event Notifications (examples)

```json
{ "method": "thread/started", "params": { "thread": { "id": "thread_abc123" } } }
{ "method": "item/completed", "params": { "item": { "type": "agentMessage", "text": "..." } } }
{ "method": "turn/completed", "params": { "threadId": "thread_abc123", "turn": { "items": [] } } }
```

### Approval Requests (server → client)

The server can send JSON-RPC requests (with `id`) for approvals:

- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`

These require JSON-RPC responses with a decision payload.

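
Since the server sends responses, notifications, and its own requests over the same stream, a client has to tell them apart before acting. A minimal sketch, assuming the usual JSON-RPC rule that a message with `method` and `id` is a request, `method` alone is a notification, and `id` alone is a response. `classify` and `buildResponse` are hypothetical helper names, and the decision payload is left generic because its exact shape is not documented here.

```typescript
interface JsonRpcMessage {
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

type Classified =
  | { kind: "response"; id: number | string; message: JsonRpcMessage }
  | { kind: "serverRequest"; id: number | string; method: string; params: unknown }
  | { kind: "notification"; method: string; params: unknown }
  | { kind: "invalid" };

function classify(message: JsonRpcMessage): Classified {
  const { id, method } = message;
  if (method !== undefined && id !== undefined) {
    // e.g. item/commandExecution/requestApproval: the client must answer this id.
    return { kind: "serverRequest", id, method, params: message.params };
  }
  if (method !== undefined) {
    return { kind: "notification", method, params: message.params };
  }
  if (id !== undefined) {
    return { kind: "response", id, message };
  }
  return { kind: "invalid" };
}

// Build the reply to a server request; `result` carries the decision payload.
function buildResponse(id: number | string, result: unknown) {
  return { jsonrpc: "2.0", id, result };
}
```

An approval handler would call `classify` on each parsed line and, for a `serverRequest`, write `buildResponse(id, decision)` back to stdin.
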
## App Server WebSocket Transport (Experimental)

Codex app-server also supports an experimental WebSocket transport:

```bash
codex app-server --listen ws://127.0.0.1:4500
```

### Transport constraints

- Listen URL must be `ws://IP:PORT` (not `localhost`, not `http://...`)
- One JSON-RPC message per WebSocket text frame
- Incoming: text frame JSON is parsed as a JSON-RPC message
- Outgoing: JSON-RPC messages are serialized and sent as text frames
- Ping/Pong is handled; binary frames are ignored

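
The listen URL constraint can be checked on the client side before spawning the server. The sketch below is a hypothetical pre-check (`isValidListenUrl` is not an upstream function); it only handles IPv4 literals and treats port 0 as invalid, both of which are assumptions rather than documented upstream behavior.

```typescript
// Returns true for ws://<IPv4>:<port>; rejects hostnames and other schemes.
function isValidListenUrl(url: string): boolean {
  const match = /^ws:\/\/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})$/.exec(url);
  if (match === null) return false;
  const octets = match.slice(1, 5).map(Number);
  const port = Number(match[5]);
  return octets.every((octet) => octet <= 255) && port >= 1 && port <= 65535;
}
```
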
### Connection lifecycle

- Each accepted socket becomes a distinct connection with its own session state
- Every connection must send `initialize` first
- Sending non-`initialize` requests before init returns `"Not initialized"`
- Sending `initialize` twice on the same connection returns `"Already initialized"`
- Broadcast notifications are only sent to initialized connections

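
The lifecycle rules amount to a small per-connection state machine. The following is a sketch of that gate as a client or test double might model it; `ConnectionState` and `admit` are hypothetical names, and only the two error strings come from the notes above.

```typescript
type GateResult = { ok: true } | { ok: false; error: string };

// One instance per accepted socket; each connection tracks its own state.
class ConnectionState {
  private initialized = false;

  // Decide whether a request with the given method may proceed.
  admit(method: string): GateResult {
    if (method === "initialize") {
      if (this.initialized) return { ok: false, error: "Already initialized" };
      this.initialized = true;
      return { ok: true };
    }
    if (!this.initialized) return { ok: false, error: "Not initialized" };
    return { ok: true };
  }

  // Broadcast notifications are only delivered to initialized connections.
  receivesBroadcasts(): boolean {
    return this.initialized;
  }
}
```
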
### Operational notes

- WebSocket mode is currently marked experimental/unsupported upstream
- It is a raw WS server (no built-in TLS/auth); keep it on loopback or place it behind your own secure proxy/tunnel

### Upstream implementation references (openai/codex `main`, commit `03adb5db`)

- `codex-rs/app-server/src/transport.rs`
- `codex-rs/app-server/src/message_processor.rs`
- `codex-rs/app-server/README.md`

## Response Schema

```typescript
// CodexRunResultSchema
type CodexRunResult = string | {
  result?: string;
  output?: string;
  message?: string;
  // ...additional fields via passthrough
};
```

Content is extracted in priority order: `result` > `output` > `message`.

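
The extraction order can be written out as a small function. This is a sketch under assumptions: `extractContent` is a hypothetical name, and using `??` (so an empty-string `result` still wins over `output`) is a choice made here, not something the notes confirm.

```typescript
type CodexRunResult =
  | string
  | { result?: string; output?: string; message?: string; [key: string]: unknown };

// Returns the text content, or null if none of the known fields is present.
function extractContent(value: CodexRunResult): string | null {
  if (typeof value === "string") return value; // plain string results are used directly
  return value.result ?? value.output ?? value.message ?? null;
}
```
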
## Thread ID Retrieval

Thread ID can be obtained from multiple sources:

1. `thread.started` event's `thread_id` property
2. Thread object's `id` getter (after first turn)
3. Thread object's `threadId` or `_id` properties (fallbacks)

```typescript
function getThreadId(thread: unknown): string | null {
  // Guard against null, undefined, and primitives before reading properties
  if (typeof thread !== "object" || thread === null) return null;
  const value = thread as { id?: string; threadId?: string; _id?: string };
  return value.id ?? value.threadId ?? value._id ?? null;
}
```

## Agent Modes vs Permission Modes

Codex separates sandbox levels (permissions) from behavioral modes (prompt prefixes).

### Permission Modes (Sandbox Levels)

| Mode | CLI Flag | Behavior |
|------|----------|----------|
| `read-only` | `-s read-only` | No file modifications |
| `workspace-write` | `-s workspace-write` | Can modify workspace files |
| `danger-full-access` | `-s danger-full-access` | Full system access |
| `bypass` | `--dangerously-bypass-approvals-and-sandbox` | Skip all checks |

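
The table maps directly onto CLI arguments. A minimal sketch of that mapping, where `PermissionMode` and `sandboxArgs` are hypothetical names and the flags are the ones listed in the table:

```typescript
type PermissionMode = "read-only" | "workspace-write" | "danger-full-access" | "bypass";

// Translate a permission mode into the CLI arguments from the table above.
function sandboxArgs(mode: PermissionMode): string[] {
  if (mode === "bypass") return ["--dangerously-bypass-approvals-and-sandbox"];
  return ["-s", mode];
}
```

A caller could then assemble a non-interactive invocation as `["exec", "--json", ...sandboxArgs("workspace-write"), prompt]`.
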
### Agent Modes (Prompt Prefixes)

Codex doesn't have true agent modes - behavior is controlled via prompt prefixing:

| Mode | Prompt Prefix |
|------|---------------|
| `build` | No prefix (default) |
| `plan` | `"Make a plan before acting.\n\n"` |
| `chat` | `"Answer conversationally.\n\n"` |

```typescript
// The three modes from the table above
type AgentMode = "build" | "plan" | "chat";

function withModePrefix(prompt: string, mode: AgentMode): string {
  if (mode === "plan") {
    return `Make a plan before acting.\n\n${prompt}`;
  }
  if (mode === "chat") {
    return `Answer conversationally.\n\n${prompt}`;
  }
  // "build" is the default and adds no prefix
  return prompt;
}
```

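### Human-in-the-Loop

In SDK mode, Codex has no interactive human-in-the-loop (HITL) step. All permissions must be configured up front via the sandbox level. The app-server protocol differs: it sends the approval requests listed under Approval Requests.
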
## Error Handling

- `turn.failed` events are captured but don't throw
- Thread ID is still returned on error for potential resumption
- Events iterator may throw after errors - caught and logged

```typescript
interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;  // Set if turn failed
}
```

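
The three rules above can be combined into one event-draining loop. This is a sketch, not the repo's code: `collectTurn` is a hypothetical name, and the event field names (`thread_id`, `error.message`, and an `agent_message` item type) are assumptions about the SDK's event shapes.

```typescript
interface CodexPromptResult {
  result: unknown;
  threadId?: string | null;
  error?: string;
}

// Assumed event shape; only the fields read below are modeled.
interface CodexEvent {
  type: string;
  thread_id?: string;
  error?: { message?: string };
  item?: { type?: string; text?: string };
}

async function collectTurn(events: AsyncIterable<CodexEvent>): Promise<CodexPromptResult> {
  const out: CodexPromptResult = { result: null, threadId: null };
  try {
    for await (const event of events) {
      if (event.type === "thread.started" && event.thread_id) {
        out.threadId = event.thread_id; // kept even if the turn later fails
      } else if (event.type === "turn.failed") {
        out.error = event.error?.message ?? "turn failed"; // record, do not throw
      } else if (event.type === "item.completed" && event.item?.type === "agent_message") {
        out.result = event.item.text ?? null;
      }
    }
  } catch (err) {
    // The iterator itself may throw after a failure: log it and keep what we have.
    console.error("codex event stream error:", err);
    out.error = out.error ?? String(err);
  }
  return out;
}
```

Returning the partial result instead of rethrowing is what lets the caller resume the thread later using the captured `threadId`.
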
## Conversion to Universal Format

Codex output is converted via `convertCodexOutput()`:

1. Parse with `CodexRunResultSchema`
2. If result is string, use directly
3. Otherwise extract from `result`, `output`, or `message` fields
4. Wrap as assistant message entry

## Session Continuity

- Thread ID persists across prompts
- Use `resumeThread(threadId)` to continue conversation
- Thread ID is captured from `thread.started` event or thread object

## Shared App-Server Architecture (Daemon Implementation)

The sandbox daemon uses a **single shared Codex app-server process** to handle multiple sessions, similar to OpenCode's server model. This differs from Claude and Amp, which spawn a new process per turn.

### Architecture Comparison

| Agent | Model | Process Lifetime | Session ID |
|-------|-------|------------------|------------|
| Claude | Subprocess | Per-turn (killed on TurnCompleted) | `--resume` flag |
| Amp | Subprocess | Per-turn | `--continue` flag |
| OpenCode | HTTP Server | Daemon lifetime | Session ID via API |
| **Codex** | **Stdio Server** | **Daemon lifetime** | **Thread ID via JSON-RPC** |

### Daemon Flow

1. **First Codex session created**: Spawns `codex app-server` process, performs `initialize`/`initialized` handshake
2. **Session creation**: Sends `thread/start` request, captures `thread_id` as `native_session_id`
3. **Message sent**: Sends `turn/start` request with `thread_id`, streams notifications back to session
4. **Multi-turn**: Reuses same `thread_id`, process stays alive, no respawn needed
5. **Daemon shutdown**: Process terminated with daemon

### Why This Approach?

1. **Performance**: No process spawn overhead per message
2. **Multi-turn support**: Thread persists in server memory, no resume needed
3. **Consistent with OpenCode**: Similar server-based pattern reduces code complexity
4. **API alignment**: Matches Codex's intended app-server usage pattern

### Protocol Details

The shared server uses JSON-RPC 2.0 for request/response correlation:

```
Daemon                      Codex App-Server
|                                   |
|-- initialize {id: 1} ------------>|
|<-- response {id: 1} --------------|
|-- initialized (notification) ---->|
|                                   |
|-- thread/start {id: 2} ---------->|
|<-- response {id: 2, thread.id} ---|
|<-- thread/started (notification) -|
|                                   |
|-- turn/start {id: 3, threadId} -->|
|<-- turn/started (notification) ---|
|<-- item/* (notifications) --------|
|<-- turn/completed (notification) -|
```

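
Request/response correlation in the exchange above comes down to remembering which `id` each caller is waiting on. A minimal sketch, assuming numeric ids allocated by the daemon; `RequestTracker` is a hypothetical name and the actual daemon is not written in TypeScript.

```typescript
interface Waiter {
  resolve: (result: unknown) => void;
  reject: (error: Error) => void;
}

class RequestTracker {
  private nextId = 1;
  private pending = new Map<number, Waiter>();

  // Allocate an id, remember the waiter, and return the message to write to stdin.
  request(method: string, params: unknown) {
    const id = this.nextId++;
    const response = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { message: { jsonrpc: "2.0", id, method, params }, response };
  }

  // Feed a parsed response read from stdout; returns false if the id is unknown.
  settle(msg: { id: number; result?: unknown; error?: { message: string } }): boolean {
    const waiter = this.pending.get(msg.id);
    if (waiter === undefined) return false;
    this.pending.delete(msg.id);
    if (msg.error) waiter.reject(new Error(msg.error.message));
    else waiter.resolve(msg.result);
    return true;
  }
}
```

Notifications such as `turn/started` carry no `id`, so they never pass through `settle` and are routed separately.
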
### Thread-to-Session Routing

Notifications are routed to the correct session by extracting `threadId` from each notification:

```rust
// `ServerNotification` is assumed to be the generated app-server notification enum
fn codex_thread_id_from_server_notification(
    notification: &ServerNotification,
) -> Option<String> {
    // All thread-scoped notifications include a threadId field
    match notification {
        ServerNotification::TurnStarted(params) => Some(params.thread_id.clone()),
        ServerNotification::ItemCompleted(params) => Some(params.thread_id.clone()),
        // ... etc
        _ => None, // notifications that are not scoped to a thread
    }
}
```

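
Once the thread ID is extracted, routing is a map lookup from thread to session. The sketch below illustrates the idea in TypeScript; `ThreadRouter` is a hypothetical name, and the two lookup paths (`params.threadId` and `params.thread.id`) follow the notification examples shown earlier in these notes.

```typescript
interface ThreadNotification {
  method: string;
  params: { threadId?: string; thread?: { id?: string } };
}

class ThreadRouter<Session> {
  private sessions = new Map<string, Session>();

  // Called after thread/start returns the new thread's ID.
  register(threadId: string, session: Session): void {
    this.sessions.set(threadId, session);
  }

  // Find the session that should receive this notification, if any.
  route(notification: ThreadNotification): Session | undefined {
    const { params } = notification;
    const threadId = params.threadId ?? params.thread?.id;
    return threadId === undefined ? undefined : this.sessions.get(threadId);
  }
}
```
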
## Model Discovery

Codex exposes a `model/list` JSON-RPC method through its app-server process.

### JSON-RPC Method

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "model/list",
  "params": {
    "cursor": null,
    "limit": null
  }
}
```

Supports pagination via `cursor` and `limit` parameters. Defined in `resources/agent-schemas/artifacts/json-schema/codex.json`.

### How to Replicate

Requires a running Codex app-server process. Send the JSON-RPC request to the app-server over stdio. The response contains the list of models available to the Codex instance (depends on configured API keys / providers).

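
Cursor-based pagination over `model/list` can be sketched as a loop that keeps requesting pages until no cursor is returned. The response shape used here (`data` plus `nextCursor`) is an assumption and should be checked against the schema file; `listAllModels` and the `Send` callback are hypothetical names standing in for whatever transport the caller has.

```typescript
// Assumed response shape; verify field names against the JSON schema.
interface ModelListPage {
  data: unknown[];
  nextCursor?: string | null;
}

// Hypothetical transport: sends one JSON-RPC request and resolves with its result.
type Send = (method: string, params: unknown) => Promise<unknown>;

async function listAllModels(send: Send, limit: number | null = null): Promise<unknown[]> {
  const models: unknown[] = [];
  let cursor: string | null = null;
  do {
    const page: ModelListPage = (await send("model/list", { cursor, limit })) as ModelListPage;
    models.push(...page.data);
    cursor = page.nextCursor ?? null; // a missing or null cursor ends the loop
  } while (cursor !== null);
  return models;
}
```
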
### Limitations

- Requires an active app-server process (cannot query models without starting one)
- No standalone CLI command like `codex models`

## Command Execution & Process Management

### Agent Tool Execution

Codex executes commands via `LocalShellAction`. The agent proposes a command, and external clients approve/deny via JSON-RPC (`item/commandExecution/requestApproval`).

### Command Source Tracking (`ExecCommandSource`)

Codex is the only agent that explicitly tracks **who initiated a command** at the protocol level:

```json
{
  "ExecCommandSource": {
    "enum": ["agent", "user_shell", "unified_exec_startup", "unified_exec_interaction"]
  }
}
```

| Source | Meaning |
|--------|---------|
| `agent` | Agent decided to run this command via tool call |
| `user_shell` | User ran a command in a shell (equivalent to Claude Code's `!` prefix) |
| `unified_exec_startup` | Startup script ran this command |
| `unified_exec_interaction` | Interactive execution |

This means user-initiated shell commands are **first-class protocol events** in Codex, not a client-side hack like Claude Code's `!` prefix.

### Command Execution Events

Codex emits structured events for command execution:

- `exec_command_begin` - Command started (includes `source`, `command`, `cwd`, `turn_id`)
- `exec_command_output_delta` - Streaming output chunk (includes `stream: stdout|stderr`)
- `exec_command_end` - Command completed (includes `exit_code`, `source`)

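
A client can fold these three events into one record per command. The sketch below is illustrative only: the `call_id` correlation key and the plain-text `chunk` field are assumptions not stated in these notes, and `applyExecEvent` is a hypothetical helper.

```typescript
type ExecCommandSource =
  | "agent" | "user_shell" | "unified_exec_startup" | "unified_exec_interaction";

// `call_id` and `chunk` are assumed field names; the rest come from the list above.
type ExecEvent =
  | { type: "exec_command_begin"; call_id: string; source: ExecCommandSource;
      command: string[]; cwd: string; turn_id: string }
  | { type: "exec_command_output_delta"; call_id: string;
      stream: "stdout" | "stderr"; chunk: string }
  | { type: "exec_command_end"; call_id: string; exit_code: number;
      source: ExecCommandSource };

interface CommandRecord {
  source: ExecCommandSource;
  command: string[];
  cwd: string;
  stdout: string;
  stderr: string;
  exitCode: number | null; // null while the command is still running
}

function applyExecEvent(records: Map<string, CommandRecord>, event: ExecEvent): void {
  switch (event.type) {
    case "exec_command_begin":
      records.set(event.call_id, {
        source: event.source, command: event.command, cwd: event.cwd,
        stdout: "", stderr: "", exitCode: null,
      });
      break;
    case "exec_command_output_delta": {
      const record = records.get(event.call_id);
      if (record) record[event.stream] += event.chunk; // append to the matching stream
      break;
    }
    case "exec_command_end": {
      const record = records.get(event.call_id);
      if (record) record.exitCode = event.exit_code;
      break;
    }
  }
}
```

Keeping `source` on the record lets a UI distinguish agent-initiated commands from `user_shell` ones when rendering output.
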
### Parsed Command Analysis (`CommandAction`)

Codex provides semantic analysis of what a command does:

```json
{
  "commandActions": [
    { "type": "read", "path": "/src/main.ts" },
    { "type": "write", "path": "/src/utils.ts" },
    { "type": "install", "package": "lodash" }
  ]
}
```

Action types: `read`, `write`, `listFiles`, `search`, `install`, `remove`, `other`.

### Comparison

| Capability | Supported? | Notes |
|-----------|-----------|-------|
| Agent runs commands | Yes (`LocalShellAction`) | With approval workflow |
| User runs commands → agent sees output | Yes (`user_shell` source) | First-class protocol event |
| External API for command injection | Yes (JSON-RPC approval) | Can approve/deny before execution |
| Command source tracking | Yes (`ExecCommandSource` enum) | Distinguishes agent vs user vs startup |
| Background process management | No | |
| PTY / interactive terminal | No | |

## Notes

- SDK is dynamically imported to reduce bundle size
- No explicit timeout (relies on SDK defaults)
- Thread ID may not be available until first event
- Error messages are preserved for debugging
- Working directory is not explicitly set (SDK handles internally)