missing-thinking-tokens: Complete task management for reasoning token support

Moved completed task documentation to the done folder after implementing reasoning token
support for OpenAI models (o1, o3, gpt-5) across all renderers and APIs.
Mario Zechner 2025-08-10 14:38:25 +02:00
parent 5d13a90077
commit 923a9e58ab
3 changed files with 19 additions and 2 deletions


@ -1,111 +0,0 @@
# Analysis: Thinking Tokens Handling in Pi-Agent
A search of the codebase found an extensive thinking token handling implementation in the pi-agent package. The detailed analysis follows.
## Current Implementation Overview
The pi-agent codebase already has **comprehensive thinking token support** implemented in `/Users/badlogic/workspaces/pi-mono/packages/agent/src/agent.ts`. The implementation covers both OpenAI's Responses API and Chat Completions API.
## Key Findings
### 1. **Thinking Token Event Type Defined**
The `AgentEvent` type includes a dedicated `thinking` event:
```typescript
export type AgentEvent =
    // ... other event types
    | { type: "thinking"; text: string }
    // ... other event types
```
### 2. **Responses API Implementation (Lines 103-110)**
For the Responses API (used by GPT-OSS and potentially GPT-5 models), thinking tokens are already parsed:
```typescript
case "reasoning": {
    for (const content of item.content || []) {
        if (content.type === "reasoning_text") {
            await eventReceiver?.on({ type: "thinking", text: content.text });
        }
    }
    break;
}
```
### 3. **Token Usage Tracking**
Both API implementations properly track token usage with support for:
- Input tokens (`inputTokens`)
- Output tokens (`outputTokens`)
- Cache read tokens (`cacheReadTokens`)
- Cache write tokens (`cacheWriteTokens`)
### 4. **UI Rendering Support**
Both console and TUI renderers have explicit support for thinking events:
**Console Renderer** (`console-renderer.ts:99-106`):
```typescript
case "thinking":
    this.stopAnimation();
    console.log(chalk.dim("[thinking]"));
    console.log(chalk.dim(event.text));
    console.log();
    // Resume animation after showing thinking
    this.startAnimation("Processing");
    break;
```
**TUI Renderer** (`tui-renderer.ts:188-201`):
```typescript
case "thinking": {
    // Show thinking in dim text
    const thinkingContainer = new Container();
    thinkingContainer.addChild(new TextComponent(chalk.dim("[thinking]")));
    // Split thinking text into lines for better display
    const thinkingLines = event.text.split("\n");
    for (const line of thinkingLines) {
        thinkingContainer.addChild(new TextComponent(chalk.dim(line)));
    }
    thinkingContainer.addChild(new WhitespaceComponent(1));
    this.chatContainer.addChild(thinkingContainer);
    break;
}
```
## Potential Issues Identified
### 1. **GPT-5 API Compatibility**
The current implementation assumes GPT-5 models work with the Chat Completions API (`callModelChatCompletionsApi`), but GPT-5 models might need the Responses API (`callModelResponsesApi`) to access thinking tokens. The agent defaults to `"completions"` API type.
### 2. **Missing Thinking Token Usage in Chat Completions API**
The Chat Completions API implementation doesn't parse or handle thinking/reasoning content; it only handles regular message content and tool calls. However, according to the web search results, GPT-5 models support reasoning tokens even in the Chat Completions API.
### 3. **Model-Specific API Detection**
There's no automatic detection of which API to use based on the model name. The default model is `"gpt-5-mini"` but uses `api: "completions"`.
## Anthropic Models Support
For Anthropic models accessed via the OpenAI SDK compatibility layer, the current Chat Completions API implementation should work, but there might be missing thinking token extraction if Anthropic returns reasoning content in a different format than standard OpenAI models.
## Recommendations
### 1. **Add Model-Based API Detection**
Implement automatic API selection based on model names:
```typescript
function getApiTypeForModel(model: string): "completions" | "responses" {
    if (model.includes("gpt-5") || model.includes("o1") || model.includes("o3")) {
        return "responses";
    }
    return "completions";
}
```
### 2. **Enhanced Chat Completions API Support**
If GPT-5 models can return thinking tokens via the Chat Completions API, the implementation needs to be enhanced to parse reasoning content from the response.
### 3. **Anthropic-Specific Handling**
Add specific logic for Anthropic models to extract thinking content if they provide it in a non-standard format.
## Files to Examine/Modify
1. **`/Users/badlogic/workspaces/pi-mono/packages/agent/src/agent.ts`** - Core API handling
2. **`/Users/badlogic/workspaces/pi-mono/packages/agent/src/main.ts`** - Default configuration and model setup
The codebase already has a solid foundation for thinking token support, but may need model-specific API routing and enhanced parsing logic to fully support GPT-5 and Anthropic thinking tokens.


@ -1,58 +0,0 @@
# Fix Missing Thinking Tokens for GPT-5 and Anthropic Models
**Status:** AwaitingCommit
**Agent PID:** 41002
## Original Todo
agent: we do not get thinking tokens for gpt-5. possibly also not for anthropic models?
## Description
The agent doesn't extract or report reasoning/thinking tokens from OpenAI's reasoning models (gpt-5, o1, o3) when using the Chat Completions API. While the codebase has full thinking token support for the Responses API, the Chat Completions API implementation is missing the extraction of `reasoning_tokens` from the `usage.completion_tokens_details` object. This means users don't see how many tokens were used for reasoning, which can be significant (thousands of tokens) for these models.
*Read [analysis.md](./analysis.md) in full for detailed codebase research and context*
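The extraction described above can be sketched as follows. This is a minimal illustration, not the code in `agent.ts`: `extractTokenUsage` and the `TokenUsage` interface are hypothetical names, and only the `usage` fields relevant here are modeled.

```typescript
// Relevant subset of the `usage` object on a Chat Completions response.
interface ChatCompletionsUsage {
    prompt_tokens: number;
    completion_tokens: number;
    completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical normalized shape; field names mirror those used in this task.
interface TokenUsage {
    inputTokens: number;
    outputTokens: number;
    reasoningTokens: number;
}

function extractTokenUsage(usage: ChatCompletionsUsage | undefined): TokenUsage {
    return {
        inputTokens: usage?.prompt_tokens ?? 0,
        outputTokens: usage?.completion_tokens ?? 0,
        // Default to 0 so the field is always present, even when the
        // provider omits completion_tokens_details.
        reasoningTokens: usage?.completion_tokens_details?.reasoning_tokens ?? 0,
    };
}
```

Defaulting to `0` instead of leaving the field undefined keeps it visible in serialized output for models that report no reasoning tokens.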
## Implementation Plan
- [x] Extend AgentEvent token_usage type to include reasoningTokens field (packages/agent/src/agent.ts:16-23)
- [x] Update Chat Completions API token extraction to include reasoning tokens from usage.completion_tokens_details (packages/agent/src/agent.ts:210-220)
- [x] Update console renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/console-renderer.ts:117-121)
- [x] Update TUI renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/tui-renderer.ts:219-227)
- [x] Update JSON renderer to include reasoning tokens in output (packages/agent/src/renderers/json-renderer.ts:20)
- [x] User test: Run agent with gpt-4o-mini model (or other reasoning model) and verify reasoning token count appears in metrics display
- [x] Debug: Fix missing reasoningTokens field in JSON output even when value is 0
- [x] Debug: Investigate why o3 model doesn't report reasoning tokens in responses API
- [x] Fix: Parse reasoning summaries from gpt-5 models (summary_text vs reasoning_text)
- [x] Fix: Only send reasoning parameter for models that support it (o3, gpt-5, etc)
- [x] Fix: Better detection of reasoning support - preflight test instead of hardcoded model names
- [x] Fix: Add reasoning support detection for Chat Completions API
- [x] Fix: Add correct summary parameter value and increase max_output_tokens for preflight check
- [x] Investigate: Chat Completions API has reasoning tokens but no thinking events
- [x] Debug: Add logging to understand gpt-5 response structure in responses API
- [x] Fix: Change reasoning summary from "auto" to "always" to ensure reasoning text is always returned
- [x] Fix: Set correct effort levels - "minimal" for responses API, "low" for completions API
- [x] Add note to README about Chat Completions API not returning thinking content
- [x] Add Gemini API example to README
- [x] Verify Gemini thinking token support and update README accordingly
- [x] Add special case for Gemini to include extra_body with thinking_config
- [x] Add special case for Groq responses API (doesn't support reasoning.summary)
- [x] Refactor: Create centralized provider-specific request adjustment function
- [x] Refactor: Extract message content parsing into parseReasoningFromMessage() function
- [x] Test: Verify Groq reasoning extraction works with refactored code
- [x] Test: Verify Gemini thinking extraction works with refactored code
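The preflight detection item in the plan above could look roughly like the sketch below. Everything here is an assumption made for illustration: `ReasoningSupportCache`, `ReasoningProbe`, and the per-model cache key are hypothetical, and the probe (a small test request that includes the reasoning parameter) is injected rather than implemented.

```typescript
// Hypothetical probe: resolves to true when a small test request that
// includes the reasoning parameter is accepted by the provider.
type ReasoningProbe = (model: string) => Promise<boolean>;

class ReasoningSupportCache {
    private readonly results = new Map<string, Promise<boolean>>();
    private readonly probe: ReasoningProbe;

    constructor(probe: ReasoningProbe) {
        this.probe = probe;
    }

    // Returns the cached result for `model`, running the probe only on first use.
    supportsReasoning(model: string): Promise<boolean> {
        let result = this.results.get(model);
        if (result === undefined) {
            // Treat a rejected probe as "not supported" instead of failing the run.
            result = this.probe(model).catch(() => false);
            this.results.set(model, result);
        }
        return result;
    }
}
```

Caching the promise rather than the resolved boolean means concurrent callers share one in-flight probe instead of each sending their own test request.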
## Notes
The user reported that the o3 model with the Responses API shows neither reasoning tokens nor thinking events.
Fixed by:
1. Adding reasoningTokens field to AgentEvent type
2. Extracting reasoning tokens from both Chat Completions and Responses APIs
3. Smart preflight detection of reasoning support for both APIs (cached per agent instance)
4. Only sending reasoning parameter for supported models
5. Parsing both reasoning_text (o1/o3) and summary_text (gpt-5) formats
6. Displaying reasoning tokens in console and TUI renderers with ⚡ symbol
7. Properly handling reasoning_effort for Chat Completions API
8. Setting correct effort levels: "minimal" for Responses API, "low" for Chat Completions API
9. Setting summary to "always" for Responses API
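Item 5 in the list above, handling both text formats, can be sketched as below. This is an illustration under assumptions, not the project's `parseReasoningFromMessage()`: `collectThinkingText` is a hypothetical helper, and the interfaces model only the fields of a Responses API reasoning item that matter here.

```typescript
// Relevant subset of a Responses API output item of type "reasoning".
interface ReasoningPart {
    type: string;
    text?: string;
}

interface ReasoningItem {
    type: "reasoning";
    content?: ReasoningPart[]; // parts of type "reasoning_text"
    summary?: ReasoningPart[]; // parts of type "summary_text"
}

// Hypothetical helper: returns every non-empty piece of thinking text in the item.
function collectThinkingText(item: ReasoningItem): string[] {
    const texts: string[] = [];
    for (const part of item.content ?? []) {
        if (part.type === "reasoning_text" && part.text) texts.push(part.text);
    }
    for (const part of item.summary ?? []) {
        if (part.type === "summary_text" && part.text) texts.push(part.text);
    }
    return texts;
}
```

Each returned string could then be emitted as a `{ type: "thinking", text }` event. An item with an empty `summary` array yields no events.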
**Important findings**:
- By design, the Chat Completions API returns only reasoning token *counts*, not the actual thinking/reasoning content, for o1 models. This is expected behavior: only the Responses API exposes thinking events.
- GPT-5 models currently return empty summary arrays even with `summary: "detailed"` - the model indicates it "can't share step-by-step reasoning". This appears to be a model limitation/behavior rather than a code issue.
- The reasoning tokens ARE being used and counted correctly when the model chooses to use them.
- With effort="minimal" and summary="detailed", gpt-5 sometimes chooses not to use reasoning at all for simple questions.
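For reference, the effort and summary settings discussed in these notes map onto the two request shapes roughly as sketched below. `reasoningParams` and its `supportsSummary` flag are hypothetical; the flag stands in for providers that reject `reasoning.summary`, and the `"detailed"` value is taken from the findings above.

```typescript
type ApiType = "completions" | "responses";

// Hypothetical helper: returns the extra request fields to merge into the call.
function reasoningParams(api: ApiType, supportsSummary = true): Record<string, unknown> {
    if (api === "completions") {
        // Chat Completions takes a flat `reasoning_effort` field.
        return { reasoning_effort: "low" };
    }
    // The Responses API nests both settings under `reasoning`.
    const reasoning: Record<string, string> = { effort: "minimal" };
    if (supportsSummary) {
        reasoning.summary = "detailed";
    }
    return { reasoning };
}
```

The result would be spread into the request body only for models that passed the reasoning support check, so non-reasoning models never receive these fields.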