agent: Add reasoning token support for OpenAI reasoning models

- Extract and display reasoning tokens from both Chat Completions and Responses APIs
- Add smart preflight detection to check reasoning support per model/API (cached per agent)
- Support both reasoning_text (o1/o3) and summary_text (gpt-5) formats
- Display reasoning tokens with ⚡ symbol in console and TUI renderers
- Only send reasoning parameters to models that support them
- Fix event type from "thinking" to "reasoning" for consistency

Note: Chat Completions API only returns reasoning token counts, not content (by design).
Only Responses API exposes actual thinking/reasoning events.
Mario Zechner 2025-08-10 00:32:30 +02:00
parent 9157411034
commit 62d9eefc2a
8 changed files with 284 additions and 15 deletions


@ -2,4 +2,6 @@
- README.md
- packages/tui/README.md
- packages/agent/README.md
- packages/pods/README.md
- packages/pods/README.md
- We must NEVER have type any anywhere, unless absolutely, positively necessary.
- If you are working with an external API, check node_modules for the type definitions as needed instead of assuming things.


@ -6,7 +6,7 @@ import { executeTool, toolsForChat, toolsForResponses } from "./tools/tools.js";
export type AgentEvent =
| { type: "session_start"; sessionId: string; model: string; api: string; baseURL: string; systemPrompt: string }
| { type: "assistant_start" }
| { type: "thinking"; text: string }
| { type: "reasoning"; text: string }
| { type: "tool_call"; toolCallId: string; name: string; args: string }
| { type: "tool_result"; toolCallId: string; result: string; isError: boolean }
| { type: "assistant_message"; text: string }
@ -20,6 +20,7 @@ export type AgentEvent =
totalTokens: number;
cacheReadTokens: number;
cacheWriteTokens: number;
reasoningTokens: number;
};
export interface AgentEventReceiver {
@ -40,15 +41,76 @@ export interface ToolCall {
id: string;
}
// Cache for model reasoning support detection per API type
const modelReasoningSupport = new Map<string, { completions?: boolean; responses?: boolean }>();
async function checkReasoningSupport(
client: OpenAI,
model: string,
api: "completions" | "responses",
): Promise<boolean> {
// Check cache first
const cacheKey = model;
const cached = modelReasoningSupport.get(cacheKey);
if (cached && cached[api] !== undefined) {
return cached[api]!;
}
let supportsReasoning = false;
if (api === "responses") {
// Try a minimal request with reasoning parameter for Responses API
try {
await client.responses.create({
model,
input: "test",
max_output_tokens: 1024,
reasoning: {
effort: "low", // Use low instead of minimal to ensure we get summaries
},
});
supportsReasoning = true;
} catch (error) {
supportsReasoning = false;
}
} else {
// For Chat Completions API, try with reasoning_effort parameter
try {
await client.chat.completions.create({
model,
messages: [{ role: "user", content: "test" }],
max_completion_tokens: 1,
reasoning_effort: "minimal",
});
supportsReasoning = true;
} catch (error) {
supportsReasoning = false;
}
}
// Update cache
const existing = modelReasoningSupport.get(cacheKey) || {};
existing[api] = supportsReasoning;
modelReasoningSupport.set(cacheKey, existing);
return supportsReasoning;
}
export async function callModelResponsesApi(
client: OpenAI,
model: string,
messages: any[],
signal?: AbortSignal,
eventReceiver?: AgentEventReceiver,
supportsReasoning?: boolean,
): Promise<void> {
await eventReceiver?.on({ type: "assistant_start" });
// Use provided reasoning support or detect it
if (supportsReasoning === undefined) {
supportsReasoning = await checkReasoningSupport(client, model, "responses");
}
let conversationDone = false;
while (!conversationDone) {
@ -65,11 +127,13 @@ export async function callModelResponsesApi(
tools: toolsForResponses as any,
tool_choice: "auto",
parallel_tool_calls: true,
reasoning: {
effort: "medium", // Use auto reasoning effort
summary: "auto",
},
max_output_tokens: 2000, // TODO make configurable
...(supportsReasoning && {
reasoning: {
effort: "medium", // Medium reasoning effort
summary: "auto", // Request reasoning summaries
},
}),
},
{ signal },
);
@ -82,8 +146,9 @@ export async function callModelResponsesApi(
inputTokens: usage.input_tokens || 0,
outputTokens: usage.output_tokens || 0,
totalTokens: usage.total_tokens || 0,
cacheReadTokens: usage.input_tokens_details.cached_tokens || 0,
cacheReadTokens: usage.input_tokens_details?.cached_tokens || 0,
cacheWriteTokens: 0, // Not available in API
reasoningTokens: usage.output_tokens_details?.reasoning_tokens || 0,
});
}
@ -101,9 +166,11 @@ export async function callModelResponsesApi(
switch (item.type) {
case "reasoning": {
for (const content of item.content || []) {
if (content.type === "reasoning_text") {
await eventReceiver?.on({ type: "thinking", text: content.text });
// Handle both content (o1/o3) and summary (gpt-5) formats
const reasoningItems = item.content || item.summary || [];
for (const content of reasoningItems) {
if (content.type === "reasoning_text" || content.type === "summary_text") {
await eventReceiver?.on({ type: "reasoning", text: content.text });
}
}
break;
@ -182,9 +249,15 @@ export async function callModelChatCompletionsApi(
messages: any[],
signal?: AbortSignal,
eventReceiver?: AgentEventReceiver,
supportsReasoning?: boolean,
): Promise<void> {
await eventReceiver?.on({ type: "assistant_start" });
// Use provided reasoning support or detect it
if (supportsReasoning === undefined) {
supportsReasoning = await checkReasoningSupport(client, model, "completions");
}
let assistantResponded = false;
while (!assistantResponded) {
@ -200,6 +273,9 @@ export async function callModelChatCompletionsApi(
tools: toolsForChat,
tool_choice: "auto",
max_completion_tokens: 2000, // TODO make configurable
...(supportsReasoning && {
reasoning_effort: "medium",
}),
},
{ signal },
);
@ -216,6 +292,7 @@ export async function callModelChatCompletionsApi(
totalTokens: usage.total_tokens || 0,
cacheReadTokens: usage.prompt_tokens_details?.cached_tokens || 0,
cacheWriteTokens: 0, // Not available in API
reasoningTokens: usage.completion_tokens_details?.reasoning_tokens || 0,
});
}
@ -279,6 +356,8 @@ export class Agent {
private sessionManager?: SessionManager;
private comboReceiver: AgentEventReceiver;
private abortController: AbortController | null = null;
private supportsReasoningResponses: boolean | null = null; // Cache reasoning support for responses API
private supportsReasoningCompletions: boolean | null = null; // Cache reasoning support for completions API
constructor(config: AgentConfig, renderer?: AgentEventReceiver, sessionManager?: SessionManager) {
this.config = config;
@ -332,25 +411,46 @@ export class Agent {
try {
if (this.config.api === "responses") {
// Check reasoning support only once per agent instance
if (this.supportsReasoningResponses === null) {
this.supportsReasoningResponses = await checkReasoningSupport(
this.client,
this.config.model,
"responses",
);
}
await callModelResponsesApi(
this.client,
this.config.model,
this.messages,
this.abortController.signal,
this.comboReceiver,
this.supportsReasoningResponses,
);
} else {
// Check reasoning support for completions API
if (this.supportsReasoningCompletions === null) {
this.supportsReasoningCompletions = await checkReasoningSupport(
this.client,
this.config.model,
"completions",
);
}
await callModelChatCompletionsApi(
this.client,
this.config.model,
this.messages,
this.abortController.signal,
this.comboReceiver,
this.supportsReasoningCompletions,
);
}
} catch (e: any) {
} catch (e) {
// Check if this was an interruption
if (e.message === "Interrupted" || this.abortController.signal.aborted) {
const errorMessage = e instanceof Error ? e.message : String(e);
if (errorMessage === "Interrupted" || this.abortController.signal.aborted) {
return;
}
throw e;
@ -385,7 +485,7 @@ export class Agent {
});
break;
case "thinking":
case "reasoning":
// Add reasoning message
this.messages.push({
type: "reasoning",


@ -13,6 +13,7 @@ export class ConsoleRenderer implements AgentEventReceiver {
private lastOutputTokens = 0;
private lastCacheReadTokens = 0;
private lastCacheWriteTokens = 0;
private lastReasoningTokens = 0;
private startAnimation(text: string = "Thinking"): void {
if (this.isAnimating || !this.isTTY) return;
@ -54,6 +55,11 @@ export class ConsoleRenderer implements AgentEventReceiver {
`${this.lastInputTokens.toLocaleString()}${this.lastOutputTokens.toLocaleString()}`,
);
// Add reasoning tokens if present
if (this.lastReasoningTokens > 0) {
metricsText += chalk.dim(` ⚡${this.lastReasoningTokens.toLocaleString()}`);
}
// Add cache info if available
if (this.lastCacheReadTokens > 0 || this.lastCacheWriteTokens > 0) {
const cacheText: string[] = [];
@ -96,7 +102,7 @@ export class ConsoleRenderer implements AgentEventReceiver {
this.startAnimation();
break;
case "thinking":
case "reasoning":
this.stopAnimation();
console.log(chalk.dim("[thinking]"));
console.log(chalk.dim(event.text));
@ -162,6 +168,7 @@ export class ConsoleRenderer implements AgentEventReceiver {
this.lastOutputTokens = event.outputTokens;
this.lastCacheReadTokens = event.cacheReadTokens;
this.lastCacheWriteTokens = event.cacheWriteTokens;
this.lastReasoningTokens = event.reasoningTokens;
// Don't stop animation for this event
break;
}


@ -61,6 +61,7 @@ export class TuiRenderer implements AgentEventReceiver {
private lastOutputTokens = 0;
private lastCacheReadTokens = 0;
private lastCacheWriteTokens = 0;
private lastReasoningTokens = 0;
private toolCallCount = 0;
private tokenStatusComponent: TextComponent | null = null;
@ -185,7 +186,7 @@ export class TuiRenderer implements AgentEventReceiver {
this.statusContainer.addChild(this.currentLoadingAnimation);
break;
case "thinking": {
case "reasoning": {
// Show thinking in dim text
const thinkingContainer = new Container();
thinkingContainer.addChild(new TextComponent(chalk.dim("[thinking]")));
@ -264,6 +265,7 @@ export class TuiRenderer implements AgentEventReceiver {
this.lastOutputTokens = event.outputTokens;
this.lastCacheReadTokens = event.cacheReadTokens;
this.lastCacheWriteTokens = event.cacheWriteTokens;
this.lastReasoningTokens = event.reasoningTokens;
this.updateTokenDisplay();
break;
@ -291,6 +293,11 @@ export class TuiRenderer implements AgentEventReceiver {
// Build token display text
let tokenText = chalk.dim(`${this.lastInputTokens.toLocaleString()}${this.lastOutputTokens.toLocaleString()}`);
// Add reasoning tokens if present
if (this.lastReasoningTokens > 0) {
tokenText += chalk.dim(` ⚡${this.lastReasoningTokens.toLocaleString()}`);
}
// Add cache info if available
if (this.lastCacheReadTokens > 0 || this.lastCacheWriteTokens > 0) {
const cacheText: string[] = [];


@ -142,6 +142,7 @@ export class SessionManager implements AgentEventReceiver {
totalTokens: 0,
cacheReadTokens: 0,
cacheWriteTokens: 0,
reasoningTokens: 0,
};
const lines = readFileSync(this.sessionFile, "utf8").trim().split("\n");


@ -1,3 +1,4 @@
- agent: token usage gets overwritten with each message that has usage data. however, if the latest data doesn't have a specific usage field, we record undefined i think? also, {"type":"token_usage","inputTokens":240,"outputTokens":35,"totalTokens":275,"cacheReadTokens":0,"cacheWriteTokens":0} doesn't contain reasoningTokens? do we lack initialization?
- pods: if a pod is down and i run `pi list`, verifying processes says All processes verified. But that can't be true, as we can no longer SSH into the pod to check.
- agent: start a new agent session. when i press CTRL+C, "Press Ctrl+C again to exit" appears above the text editor followed by an empty line. After about 1 second, the empty line disappears. We should either not show the empty line, or always show the empty line. Maybe Ctrl+C info should be displayed below the text editor.
- tui: npx tsx test/demo.ts, using /exit or pressing CTRL+C does not work to exit the demo.


@ -0,0 +1,111 @@
# Analysis: Thinking Tokens Handling in Pi-Agent
A search of the codebase found extensive thinking token handling in the pi-agent package. Here's the detailed analysis:
## Current Implementation Overview
The pi-agent codebase already has **comprehensive thinking token support** implemented in `/Users/badlogic/workspaces/pi-mono/packages/agent/src/agent.ts`. The implementation covers both OpenAI's Responses API and Chat Completions API.
## Key Findings
### 1. **Thinking Token Event Type Defined**
The `AgentEvent` type includes a dedicated `thinking` event:
```typescript
export type AgentEvent =
// ... other event types
| { type: "thinking"; text: string }
// ... other event types
```
### 2. **Responses API Implementation (Lines 103-110)**
For the Responses API (used by GPT-OSS and potentially GPT-5 models), thinking tokens are already parsed:
```typescript
case "reasoning": {
for (const content of item.content || []) {
if (content.type === "reasoning_text") {
await eventReceiver?.on({ type: "thinking", text: content.text });
}
}
break;
}
```
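The loop above reads only `item.content` and only `reasoning_text` entries. A minimal sketch of a widened extraction that also accepts `summary_text` entries from `item.summary` follows. The `ReasoningPart` and `ReasoningItem` interfaces and the `extractReasoningTexts` helper are hypothetical, trimmed stand-ins, not the SDK's own types.

```typescript
// Hypothetical, trimmed stand-ins for a Responses API reasoning output item.
interface ReasoningPart {
	type: string;
	text?: string;
}

interface ReasoningItem {
	type: "reasoning";
	content?: ReasoningPart[];
	summary?: ReasoningPart[];
}

// Collects reasoning text from whichever array actually carries entries.
function extractReasoningTexts(item: ReasoningItem): string[] {
	// An empty array is truthy in JavaScript, so test the length explicitly;
	// otherwise an empty `content` would hide a populated `summary`.
	const parts = item.content && item.content.length > 0 ? item.content : (item.summary ?? []);
	const texts: string[] = [];
	for (const part of parts) {
		const isReasoning = part.type === "reasoning_text" || part.type === "summary_text";
		if (isReasoning && typeof part.text === "string") {
			texts.push(part.text);
		}
	}
	return texts;
}
```

Each returned string could then be forwarded as one event. Whether the API ever sends an empty `content` array alongside a summary is not confirmed here; the length check is a defensive assumption.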
### 3. **Token Usage Tracking**
Both API implementations properly track token usage with support for:
- Input tokens (`inputTokens`)
- Output tokens (`outputTokens`)
- Cache read tokens (`cacheReadTokens`)
- Cache write tokens (`cacheWriteTokens`)
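The list above has no entry for reasoning tokens. A minimal sketch of normalizing both APIs' usage payloads into one shape that always carries a numeric `reasoningTokens` follows, assuming the usage field names that appear in the diff (`completion_tokens_details.reasoning_tokens` and `output_tokens_details.reasoning_tokens`). `TokenUsage`, `ChatUsage`, `ResponsesUsage`, `fromChatUsage` and `fromResponsesUsage` are hypothetical names for illustration.

```typescript
// Hypothetical unified shape, mirroring the fields of the token_usage event.
interface TokenUsage {
	inputTokens: number;
	outputTokens: number;
	totalTokens: number;
	cacheReadTokens: number;
	cacheWriteTokens: number;
	reasoningTokens: number;
}

// Trimmed stand-ins for the SDK usage payloads; only the fields read below.
interface ChatUsage {
	prompt_tokens?: number;
	completion_tokens?: number;
	total_tokens?: number;
	prompt_tokens_details?: { cached_tokens?: number };
	completion_tokens_details?: { reasoning_tokens?: number };
}

interface ResponsesUsage {
	input_tokens?: number;
	output_tokens?: number;
	total_tokens?: number;
	input_tokens_details?: { cached_tokens?: number };
	output_tokens_details?: { reasoning_tokens?: number };
}

// Chat Completions: the reasoning count sits under completion_tokens_details.
function fromChatUsage(u: ChatUsage): TokenUsage {
	return {
		inputTokens: u.prompt_tokens ?? 0,
		outputTokens: u.completion_tokens ?? 0,
		totalTokens: u.total_tokens ?? 0,
		cacheReadTokens: u.prompt_tokens_details?.cached_tokens ?? 0,
		cacheWriteTokens: 0, // treated as unavailable, as in the diff
		reasoningTokens: u.completion_tokens_details?.reasoning_tokens ?? 0,
	};
}

// Responses API: the reasoning count sits under output_tokens_details.
function fromResponsesUsage(u: ResponsesUsage): TokenUsage {
	return {
		inputTokens: u.input_tokens ?? 0,
		outputTokens: u.output_tokens ?? 0,
		totalTokens: u.total_tokens ?? 0,
		cacheReadTokens: u.input_tokens_details?.cached_tokens ?? 0,
		cacheWriteTokens: 0, // treated as unavailable, as in the diff
		reasoningTokens: u.output_tokens_details?.reasoning_tokens ?? 0,
	};
}
```

Defaulting every field to `0` means a payload without a details object still produces a complete record, so no field is ever serialized as `undefined`.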
### 4. **UI Rendering Support**
Both console and TUI renderers have explicit support for thinking events:
**Console Renderer** (`console-renderer.ts:99-106`):
```typescript
case "thinking":
this.stopAnimation();
console.log(chalk.dim("[thinking]"));
console.log(chalk.dim(event.text));
console.log();
// Resume animation after showing thinking
this.startAnimation("Processing");
break;
```
**TUI Renderer** (`tui-renderer.ts:188-201`):
```typescript
case "thinking": {
// Show thinking in dim text
const thinkingContainer = new Container();
thinkingContainer.addChild(new TextComponent(chalk.dim("[thinking]")));
// Split thinking text into lines for better display
const thinkingLines = event.text.split("\n");
for (const line of thinkingLines) {
thinkingContainer.addChild(new TextComponent(chalk.dim(line)));
}
thinkingContainer.addChild(new WhitespaceComponent(1));
this.chatContainer.addChild(thinkingContainer);
break;
}
```
## Potential Issues Identified
### 1. **GPT-5 API Compatibility**
The current implementation assumes GPT-5 models work with the Chat Completions API (`callModelChatCompletionsApi`), but GPT-5 models might need the Responses API (`callModelResponsesApi`) to access thinking tokens. The agent defaults to `"completions"` API type.
### 2. **Missing Thinking Token Usage in Chat Completions API**
The Chat Completions API implementation doesn't parse or handle thinking/reasoning content - it only handles regular message content and tool calls. However, based on the web search results, GPT-5 models support reasoning tokens even in Chat Completions API.
### 3. **Model-Specific API Detection**
There's no automatic detection of which API to use based on the model name. The default model is `"gpt-5-mini"` but uses `api: "completions"`.
## Anthropic Models Support
For Anthropic models accessed via the OpenAI SDK compatibility layer, the current Chat Completions API implementation should work, but there might be missing thinking token extraction if Anthropic returns reasoning content in a different format than standard OpenAI models.
## Recommendations
### 1. **Add Model-Based API Detection**
Implement automatic API selection based on model names:
```typescript
function getApiTypeForModel(model: string): "completions" | "responses" {
if (model.includes("gpt-5") || model.includes("o1") || model.includes("o3")) {
return "responses";
}
return "completions";
}
```
### 2. **Enhanced Chat Completions API Support**
If GPT-5 models can return thinking tokens via Chat Completions API, the implementation needs to be enhanced to parse reasoning content from the response.
### 3. **Anthropic-Specific Handling**
Add specific logic for Anthropic models to extract thinking content if they provide it in a non-standard format.
## Files to Examine/Modify
1. **`/Users/badlogic/workspaces/pi-mono/packages/agent/src/agent.ts`** - Core API handling
2. **`/Users/badlogic/workspaces/pi-mono/packages/agent/src/main.ts`** - Default configuration and model setup
The codebase already has a solid foundation for thinking token support, but may need model-specific API routing and enhanced parsing logic to fully support GPT-5 and Anthropic thinking tokens.


@ -0,0 +1,40 @@
# Fix Missing Thinking Tokens for GPT-5 and Anthropic Models
**Status:** AwaitingCommit
**Agent PID:** 27674
## Original Todo
agent: we do not get thinking tokens for gpt-5. possibly also not for anthropic models?
## Description
The agent doesn't extract or report reasoning/thinking tokens from OpenAI's reasoning models (gpt-5, o1, o3) when using the Chat Completions API. While the codebase has full thinking token support for the Responses API, the Chat Completions API implementation is missing the extraction of `reasoning_tokens` from the `usage.completion_tokens_details` object. This means users don't see how many tokens were used for reasoning, which can be significant (thousands of tokens) for these models.
*Read [analysis.md](./analysis.md) in full for detailed codebase research and context*
## Implementation Plan
- [x] Extend AgentEvent token_usage type to include reasoningTokens field (packages/agent/src/agent.ts:16-23)
- [x] Update Chat Completions API token extraction to include reasoning tokens from usage.completion_tokens_details (packages/agent/src/agent.ts:210-220)
- [x] Update console renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/console-renderer.ts:117-121)
- [x] Update TUI renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/tui-renderer.ts:219-227)
- [x] Update JSON renderer to include reasoning tokens in output (packages/agent/src/renderers/json-renderer.ts:20)
- [x] User test: Run agent with a reasoning model (e.g. o3 or gpt-5) and verify reasoning token count appears in metrics display
- [x] Debug: Fix missing reasoningTokens field in JSON output even when value is 0
- [x] Debug: Investigate why o3 model doesn't report reasoning tokens in responses API
- [x] Fix: Parse reasoning summaries from gpt-5 models (summary_text vs reasoning_text)
- [x] Fix: Only send reasoning parameter for models that support it (o3, gpt-5, etc)
- [x] Fix: Better detection of reasoning support - preflight test instead of hardcoded model names
- [x] Fix: Add reasoning support detection for Chat Completions API
- [x] Fix: Add correct summary parameter value and increase max_output_tokens for preflight check
- [x] Investigate: Chat Completions API has reasoning tokens but no thinking events
## Notes
User reported that o3 model with responses API doesn't show reasoning tokens or thinking events.
Fixed by:
1. Adding reasoningTokens field to AgentEvent type
2. Extracting reasoning tokens from both Chat Completions and Responses APIs
3. Smart preflight detection of reasoning support for both APIs (cached per agent instance)
4. Only sending reasoning parameter for supported models
5. Parsing both reasoning_text (o1/o3) and summary_text (gpt-5) formats
6. Displaying reasoning tokens in console and TUI renderers with ⚡ symbol
7. Properly handling reasoning_effort for Chat Completions API
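The cached preflight detection in item 3 can be sketched in isolation. This is a simplified illustration, not the code from the diff: the `Probe` callback is a hypothetical stand-in for the minimal API request, injected so the caching logic can run without a network client, and `ReasoningSupportCache` is an illustrative name.

```typescript
type ApiKind = "completions" | "responses";

// Hypothetical probe: resolves if a minimal request carrying a reasoning
// parameter is accepted, rejects otherwise.
type Probe = (model: string, api: ApiKind) => Promise<void>;

class ReasoningSupportCache {
	private readonly results = new Map<string, boolean>();
	private readonly probe: Probe;

	constructor(probe: Probe) {
		this.probe = probe;
	}

	async supports(model: string, api: ApiKind): Promise<boolean> {
		// Key on both API and model, since support can differ per API.
		const key = `${api}:${model}`;
		const cached = this.results.get(key);
		if (cached !== undefined) return cached;

		let ok: boolean;
		try {
			await this.probe(model, api);
			ok = true;
		} catch {
			// Any rejection counts as "unsupported" in this sketch.
			ok = false;
		}
		this.results.set(key, ok);
		return ok;
	}
}
```

One caveat of this approach: a transient network or authentication failure is indistinguishable from a rejected parameter, so a stricter variant might inspect the error before caching `false`.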
**Important finding**: By design, the Chat Completions API returns only reasoning token *counts*, not the actual thinking/reasoning content, for o1 models. This is expected behavior; only the Responses API exposes thinking events.
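Since the Chat Completions path yields only counts, the one reasoning-specific request field on that path is `reasoning_effort`, which should be present only for supporting models. A minimal sketch of conditionally adding it follows; `ChatRequestParams` and `buildChatParams` are hypothetical names, and the interface is a trimmed stand-in for the SDK's request type.

```typescript
// Hypothetical, trimmed stand-in for the Chat Completions request parameters.
interface ChatRequestParams {
	model: string;
	max_completion_tokens: number;
	reasoning_effort?: "low" | "medium" | "high";
}

function buildChatParams(model: string, supportsReasoning: boolean): ChatRequestParams {
	return {
		model,
		max_completion_tokens: 2000,
		// Spreading an empty object adds no key, so for unsupported models the
		// field is absent from the request rather than set to undefined.
		...(supportsReasoning ? { reasoning_effort: "medium" as const } : {}),
	};
}
```

The ternary is a stricter-typed equivalent of the `...(supportsReasoning && { ... })` form; both leave the key out entirely when the flag is false.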