co-mono/todos/done/2025-08-09-23-33-17-missing-thinking-tokens.md
Mario Zechner 923a9e58ab missing-thinking-tokens: Complete task management for reasoning token support
Moved completed task documentation to done folder after implementing reasoning token
support for OpenAI models (o1, o3, gpt-5) across all renderers and APIs
2025-08-10 14:38:25 +02:00

4.5 KiB

Fix Missing Thinking Tokens for GPT-5 and Anthropic Models

Status: Done Agent PID: 72653

Original Todo

agent: we do not get thinking tokens for gpt-5. possibly also not for anthropic models?

Description

The agent doesn't extract or report reasoning/thinking tokens from OpenAI's reasoning models (gpt-5, o1, o3) when using the Chat Completions API. While the codebase has full thinking token support for the Responses API, the Chat Completions API implementation is missing the extraction of reasoning_tokens from the usage.completion_tokens_details object. This means users don't see how many tokens were used for reasoning, which can be significant (thousands of tokens) for these models.

Read analysis.md in full for detailed codebase research and context

Implementation Plan

  • Extend AgentEvent token_usage type to include reasoningTokens field (packages/agent/src/agent.ts:16-23)
  • Update Chat Completions API token extraction to include reasoning tokens from usage.completion_tokens_details (packages/agent/src/agent.ts:210-220)
  • Update console renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/console-renderer.ts:117-121)
  • Update TUI renderer to display reasoning tokens in usage metrics (packages/agent/src/renderers/tui-renderer.ts:219-227)
  • Update JSON renderer to include reasoning tokens in output (packages/agent/src/renderers/json-renderer.ts:20)
  • User test: Run agent with gpt-4o-mini model (or other reasoning model) and verify reasoning token count appears in metrics display
  • Debug: Fix missing reasoningTokens field in JSON output even when value is 0
  • Debug: Investigate why o3 model doesn't report reasoning tokens in responses API
  • Fix: Parse reasoning summaries from gpt-5 models (summary_text vs reasoning_text)
  • Fix: Only send reasoning parameter for models that support it (o3, gpt-5, etc)
  • Fix: Better detection of reasoning support - preflight test instead of hardcoded model names
  • Fix: Add reasoning support detection for Chat Completions API
  • Fix: Add correct summary parameter value and increase max_output_tokens for preflight check
  • Investigate: Chat Completions API has reasoning tokens but no thinking events
  • Debug: Add logging to understand gpt-5 response structure in responses API
  • Fix: Change reasoning summary from "auto" to "always" to ensure reasoning text is always returned
  • Fix: Set correct effort levels - "minimal" for responses API, "low" for completions API
  • Add note to README about Chat Completions API not returning thinking content
  • Add Gemini API example to README
  • Verify Gemini thinking token support and update README accordingly
  • Add special case for Gemini to include extra_body with thinking_config
  • Add special case for Groq responses API (doesn't support reasoning.summary)
  • Refactor: Create centralized provider-specific request adjustment function
  • Refactor: Extract message content parsing into parseReasoningFromMessage() function
  • Test: Verify Groq reasoning extraction works with refactored code
  • Test: Verify Gemini thinking extraction works with refactored code

Notes

User reported that o3 model with responses API doesn't show reasoning tokens or thinking events. Fixed by:

  1. Adding reasoningTokens field to AgentEvent type
  2. Extracting reasoning tokens from both Chat Completions and Responses APIs
  3. Smart preflight detection of reasoning support for both APIs (cached per agent instance)
  4. Only sending reasoning parameter for supported models
  5. Parsing both reasoning_text (o1/o3) and summary_text (gpt-5) formats
  6. Displaying reasoning tokens in console and TUI renderers with symbol
  7. Properly handling reasoning_effort for Chat Completions API
  8. Set correct effort levels: "minimal" for Responses API, "low" for Chat Completions API
  9. Set summary to "always" for Responses API

Important findings:

  • Chat Completions API by design only returns reasoning token counts but not the actual thinking/reasoning content for o1 models. This is expected behavior - only the Responses API exposes thinking events.
  • GPT-5 models currently return empty summary arrays even with summary: "detailed" - the model indicates it "can't share step-by-step reasoning". This appears to be a model limitation/behavior rather than a code issue.
  • The reasoning tokens ARE being used and counted correctly when the model chooses to use them.
  • With effort="minimal" and summary="detailed", gpt-5 sometimes chooses not to use reasoning at all for simple questions.