feat(agent): Comprehensive reasoning token support across providers

Added provider-specific reasoning/thinking token support for:
- OpenAI (o1, o3, gpt-5): Full reasoning events via Responses API, token counts via Chat Completions
- Groq: reasoning_format: "parsed" for Chat Completions; the Responses API does not support reasoning.summary
- Gemini 2.5: extra_body.google.thinking_config with <thought> tag extraction
- OpenRouter: Unified reasoning parameter with message.reasoning field
- Anthropic: Limited support via OpenAI compatibility layer

Key improvements:
- Centralized provider detection based on baseURL
- parseReasoningFromMessage() extracts provider-specific reasoning content
- adjustRequestForProvider() handles provider-specific request modifications
- Smart reasoning support detection with caching per API type
- Comprehensive README documentation with provider support matrix
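The centralized detection and adjustment described above could look roughly like the following sketch. Only the function name adjustRequestForProvider and the per-provider fields (reasoning_format, extra_body.google.thinking_config, the unified reasoning parameter) come from this commit; the baseURL substrings, the detectProvider helper, and the request shapes are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of baseURL-based provider detection and
// per-provider request adjustment. Field names follow the commit
// message; the bodies are assumptions for illustration.
type Provider = "openai" | "groq" | "gemini" | "openrouter" | "anthropic" | "unknown";

function detectProvider(baseURL: string): Provider {
  if (baseURL.includes("groq.com")) return "groq";
  if (baseURL.includes("generativelanguage.googleapis.com")) return "gemini";
  if (baseURL.includes("openrouter.ai")) return "openrouter";
  if (baseURL.includes("anthropic.com")) return "anthropic";
  if (baseURL.includes("openai.com")) return "openai";
  return "unknown";
}

function adjustRequestForProvider(
  provider: Provider,
  request: Record<string, unknown>
): Record<string, unknown> {
  switch (provider) {
    case "groq":
      // Groq's Chat Completions API returns parsed reasoning with this flag.
      return { ...request, reasoning_format: "parsed" };
    case "gemini":
      // Gemini 2.5 takes its thinking config via extra_body.
      return {
        ...request,
        extra_body: { google: { thinking_config: { include_thoughts: true } } },
      };
    case "openrouter":
      // OpenRouter exposes a unified `reasoning` parameter.
      return { ...request, reasoning: { effort: "low" } };
    default:
      return request;
  }
}
```

Keeping both steps in one place means every code path that builds a request goes through the same switch, rather than scattering provider checks across call sites.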

Fixes reasoning tokens not appearing for GPT-5 and other reasoning models.
Commit 99ce76d66e (parent 62d9eefc2a)
Author: Mario Zechner
Date: 2025-08-10 01:46:15 +02:00
5 changed files with 345 additions and 58 deletions


@@ -1,6 +1,6 @@
# Fix Missing Thinking Tokens for GPT-5 and Anthropic Models
**Status:** AwaitingCommit
**Agent PID:** 41002
## Original Todo
agent: we do not get thinking tokens for gpt-5. possibly also not for anthropic models?
@@ -25,6 +25,18 @@ The agent doesn't extract or report reasoning/thinking tokens from OpenAI's reas
- [x] Fix: Add reasoning support detection for Chat Completions API
- [x] Fix: Add correct summary parameter value and increase max_output_tokens for preflight check
- [x] Investigate: Chat Completions API has reasoning tokens but no thinking events
- [x] Debug: Add logging to understand gpt-5 response structure in responses API
- [x] Fix: Change reasoning summary from "auto" to "always" to ensure reasoning text is always returned
- [x] Fix: Set correct effort levels - "minimal" for responses API, "low" for completions API
- [x] Add note to README about Chat Completions API not returning thinking content
- [x] Add Gemini API example to README
- [x] Verify Gemini thinking token support and update README accordingly
- [x] Add special case for Gemini to include extra_body with thinking_config
- [x] Add special case for Groq responses API (doesn't support reasoning.summary)
- [x] Refactor: Create centralized provider-specific request adjustment function
- [x] Refactor: Extract message content parsing into parseReasoningFromMessage() function
- [x] Test: Verify Groq reasoning extraction works with refactored code
- [x] Test: Verify Gemini thinking extraction works with refactored code
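The extraction refactor in the last items above can be sketched as follows. The provider behaviors (a message.reasoning field for OpenRouter and parsed Groq output, inline <thought> tags for Gemini) are taken from this commit; the function body and return shape are illustrative assumptions:

```typescript
// Assumed sketch of parseReasoningFromMessage(); only the function
// name and the provider-specific fields come from the commit.
interface ParsedReasoning {
  reasoning: string | null;
  content: string;
}

function parseReasoningFromMessage(message: {
  content?: string;
  reasoning?: string;
}): ParsedReasoning {
  // OpenRouter (and Groq with reasoning_format: "parsed") put the
  // reasoning in a dedicated field alongside the visible content.
  if (message.reasoning) {
    return { reasoning: message.reasoning, content: message.content ?? "" };
  }
  // Gemini 2.5 inlines thoughts as <thought>...</thought> tags.
  const content = message.content ?? "";
  const match = content.match(/<thought>([\s\S]*?)<\/thought>/);
  if (match) {
    return {
      reasoning: match[1].trim(),
      content: content.replace(match[0], "").trim(),
    };
  }
  return { reasoning: null, content };
}
```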
## Notes
User reported that o3 model with responses API doesn't show reasoning tokens or thinking events.
@@ -36,5 +48,11 @@ Fixed by:
5. Parsing both reasoning_text (o1/o3) and summary_text (gpt-5) formats
6. Displaying reasoning tokens in console and TUI renderers with ⚡ symbol
7. Properly handling reasoning_effort for Chat Completions API
8. Set correct effort levels: "minimal" for Responses API, "low" for Chat Completions API
9. Set summary to "always" for Responses API
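Steps 8 and 9 might translate into request payloads like these hedged sketches; the effort and summary values are the ones this commit records, but the surrounding request shapes are assumptions rather than verified SDK signatures:

```typescript
// Assumed request shapes; only the effort/summary values come from the notes.
const responsesRequest = {
  model: "gpt-5",
  input: "What is 2 + 2?",
  reasoning: { effort: "minimal", summary: "always" }, // Responses API
};

const completionsRequest = {
  model: "o3",
  messages: [{ role: "user", content: "What is 2 + 2?" }],
  reasoning_effort: "low", // Chat Completions API
};
```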
**Important findings**:
- Chat Completions API by design only returns reasoning token *counts* but not the actual thinking/reasoning content for o1 models. This is expected behavior - only the Responses API exposes thinking events.
- GPT-5 models currently return empty summary arrays even with `summary: "detailed"` - the model indicates it "can't share step-by-step reasoning". This appears to be a model limitation/behavior rather than a code issue.
- The reasoning tokens ARE being used and counted correctly when the model chooses to use them.
- With effort="minimal" and summary="detailed", gpt-5 sometimes chooses not to use reasoning at all for simple questions.