mirror of
https://github.com/getcompanion-ai/co-mono.git
synced 2026-04-15 09:01:14 +00:00
feat(agent): Comprehensive reasoning token support across providers
Added provider-specific reasoning/thinking token support for:

- OpenAI (o1, o3, gpt-5): full reasoning events via the Responses API, token counts via Chat Completions
- Groq: reasoning_format: "parsed" for Chat Completions; no summary support for Responses
- Gemini 2.5: extra_body.google.thinking_config with <thought> tag extraction
- OpenRouter: unified reasoning parameter with the message.reasoning field
- Anthropic: limited support via the OpenAI compatibility layer

Key improvements:

- Centralized provider detection based on baseURL
- parseReasoningFromMessage() extracts provider-specific reasoning content
- adjustRequestForProvider() handles provider-specific request modifications
- Smart reasoning support detection with caching per API type
- Comprehensive README documentation with provider support matrix

Fixes reasoning tokens not appearing for GPT-5 and other reasoning models.
This commit is contained in:
parent
62d9eefc2a
commit
99ce76d66e
5 changed files with 345 additions and 58 deletions
````diff
@@ -33,14 +33,20 @@ pi-agent
 # Continue most recently modified session in current directory
 pi-agent --continue "Follow up question"
 
-# GPT-OSS via Groq
+# GPT-OSS via Groq (supports reasoning with both APIs)
 pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY --model openai/gpt-oss-120b
 
 # GLM 4.5 via OpenRouter
 pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY --model z-ai/glm-4.5
 
-# Claude via Anthropic (no prompt caching support - see https://docs.anthropic.com/en/api/openai-sdk)
+# Claude via Anthropic's OpenAI compatibility layer
+# Note: No prompt caching or thinking content support. For full features, use the native Anthropic API.
+# See: https://docs.anthropic.com/en/api/openai-sdk
 pi-agent --base-url https://api.anthropic.com/v1 --api-key $ANTHROPIC_API_KEY --model claude-opus-4-1-20250805
+
+# Gemini via Google AI (set GEMINI_API_KEY environment variable)
+# Note: Gemini 2.5 models support reasoning but require extra_body for thinking content (not yet supported)
+pi-agent --base-url https://generativelanguage.googleapis.com/v1beta/openai/ --api-key $GEMINI_API_KEY --model gemini-2.5-flash
 ```
 
 ## Usage Modes
````
```diff
@@ -137,13 +143,18 @@ When using `--json`, the agent outputs these event types:
 - `user_message` - User input
 - `assistant_start` - Assistant begins responding
 - `assistant_message` - Assistant's response
-- `thinking` - Reasoning/thinking (for models that support it)
+- `thinking` - Reasoning/thinking (for models that support it, requires `--api responses`)
 - `tool_call` - Tool being called
 - `tool_result` - Result from tool
-- `token_usage` - Token usage statistics
+- `token_usage` - Token usage statistics (includes `reasoningTokens` for models with reasoning)
 - `error` - Error occurred
 - `interrupted` - Processing was interrupted
 
+**Note:**
+- OpenAI's Chat Completions API (`--api completions`, the default) only returns reasoning token *counts* but not the actual thinking content. To see thinking events, use the Responses API with `--api responses` for supported models (o1, o3, gpt-5).
+- Anthropic's OpenAI compatibility layer doesn't return thinking content. Use the native Anthropic API for full extended thinking features.
+- Gemini 2.5 models support reasoning via `reasoning_effort` but require the `extra_body` parameter with `thinking_config.include_thoughts: true` to get thinking content, which is not yet supported in pi-agent.
+
 The complete TypeScript type definition for `AgentEvent` can be found in [`src/agent.ts`](src/agent.ts#L6).
 
 ## Build an Interactive UI with JSON Mode
```
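The event list above lends itself to a discriminated union on the `type` field. The sketch below is an assumption of what that union might look like — the field names beyond `type` are hypothetical, and the authoritative definition lives in `src/agent.ts`:

```typescript
// Hypothetical sketch of the JSON-mode event union; the real AgentEvent
// type is defined in src/agent.ts and may differ in field names.
type AgentEvent =
  | { type: "user_message"; text: string }
  | { type: "assistant_start" }
  | { type: "assistant_message"; text: string }
  | { type: "thinking"; text: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "tool_result"; result: unknown }
  | { type: "token_usage"; inputTokens: number; outputTokens: number; totalTokens: number; reasoningTokens?: number }
  | { type: "error"; message: string }
  | { type: "interrupted" };

// Narrowing on the discriminant gives type-safe access to per-event fields.
function describe(ev: AgentEvent): string {
  switch (ev.type) {
    case "token_usage":
      return `tokens: ${ev.totalTokens} (reasoning: ${ev.reasoningTokens ?? 0})`;
    case "thinking":
      return `thinking: ${ev.text}`;
    default:
      return ev.type;
  }
}
```

The discriminated-union shape means a consumer's `switch` gets exhaustiveness checking for free when a new event type is added.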
````diff
@@ -284,6 +295,66 @@ agent.on('error', (err) => {
 console.log('Pi Agent Interactive Chat');
 ```
 
+## Reasoning
+
+Pi-agent supports reasoning/thinking tokens for models that provide this capability:
+
+### Supported Providers
+
+| Provider | API | Reasoning Tokens | Thinking Content | Notes |
+|----------|-----|------------------|------------------|-------|
+| OpenAI (o1, o3) | Responses | ✅ | ✅ | Full support via `reasoning` events |
+| OpenAI (o1, o3) | Chat Completions | ✅ | ❌ | Token counts only, no content |
+| OpenAI (gpt-5) | Responses | ✅ | ⚠️ | Model returns empty summaries |
+| OpenAI (gpt-5) | Chat Completions | ✅ | ❌ | Token counts only |
+| Groq (gpt-oss) | Responses | ✅ | ❌ | No reasoning.summary support |
+| Groq (gpt-oss) | Chat Completions | ✅ | ✅ | Via `reasoning_format: "parsed"` |
+| Gemini 2.5 | Chat Completions | ✅ | ✅ | Via `extra_body.google.thinking_config` |
+| Anthropic | OpenAI Compat | ❌ | ❌ | Not supported in compatibility layer |
+| OpenRouter | Various | ✅ | ✅ | Model-dependent, see provider docs |
+
+### Usage Examples
+
+```bash
+# OpenAI o1/o3 - see thinking content with Responses API
+pi-agent --api responses --model o1-mini "Explain quantum computing"
+
+# Groq gpt-oss - reasoning with Chat Completions
+pi-agent --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY \
+  --model openai/gpt-oss-120b "Complex math problem"
+
+# Gemini 2.5 - thinking content automatically configured
+pi-agent --base-url https://generativelanguage.googleapis.com/v1beta/openai/ \
+  --api-key $GEMINI_API_KEY --model gemini-2.5-flash "Think step by step"
+
+# OpenRouter - supports various reasoning models
+pi-agent --base-url https://openrouter.ai/api/v1 --api-key $OPENROUTER_API_KEY \
+  --model "qwen/qwen3-235b-a22b-thinking-2507" "Complex reasoning task"
+```
+
+### JSON Mode Events
+
+When reasoning is active, you'll see:
+- `reasoning` events with thinking text (when available)
+- `token_usage` events include a `reasoningTokens` field
+- Console/TUI renderers show reasoning tokens with a ⚡ symbol
+
+### Technical Details
+
+The agent automatically:
+- Detects the provider from the base URL
+- Tests model reasoning support on first use (cached)
+- Adjusts request parameters per provider:
+  - OpenAI: `reasoning_effort` (minimal/low)
+  - Groq: `reasoning_format: "parsed"`
+  - Gemini: `extra_body.google.thinking_config`
+  - OpenRouter: `reasoning` object with `effort` field
+- Parses provider-specific response formats:
+  - Gemini: extracts from `<thought>` tags
+  - Groq: uses the `message.reasoning` field
+  - OpenRouter: uses the `message.reasoning` field
+  - OpenAI: uses standard `reasoning` events
+
 ## Architecture
 
 The agent is built with:
````
```diff
@@ -44,10 +44,135 @@ export interface ToolCall {
 // Cache for model reasoning support detection per API type
 const modelReasoningSupport = new Map<string, { completions?: boolean; responses?: boolean }>();
 
+// Provider detection based on base URL
+function detectProvider(baseURL?: string): "openai" | "gemini" | "groq" | "anthropic" | "openrouter" | "other" {
+  if (!baseURL) return "openai";
+  if (baseURL.includes("api.openai.com")) return "openai";
+  if (baseURL.includes("generativelanguage.googleapis.com")) return "gemini";
+  if (baseURL.includes("api.groq.com")) return "groq";
+  if (baseURL.includes("api.anthropic.com")) return "anthropic";
+  if (baseURL.includes("openrouter.ai")) return "openrouter";
+  return "other";
+}
+
+// Parse provider-specific reasoning from message content
+function parseReasoningFromMessage(message: any, baseURL?: string): { cleanContent: string; reasoningTexts: string[] } {
+  const provider = detectProvider(baseURL);
+  const reasoningTexts: string[] = [];
+  let cleanContent = message.content || "";
+
+  switch (provider) {
+    case "gemini":
+      // Gemini returns thinking in <thought> tags
+      if (cleanContent.includes("<thought>")) {
+        const thoughtMatches = cleanContent.matchAll(/<thought>([\s\S]*?)<\/thought>/g);
+        for (const match of thoughtMatches) {
+          reasoningTexts.push(match[1].trim());
+        }
+        // Remove all thought tags from the response
+        cleanContent = cleanContent.replace(/<thought>[\s\S]*?<\/thought>/g, "").trim();
+      }
+      break;
+
+    case "groq":
+      // Groq returns reasoning in a separate field when reasoning_format is "parsed"
+      if (message.reasoning) {
+        reasoningTexts.push(message.reasoning);
+      }
+      break;
+
+    case "openrouter":
+      // OpenRouter returns reasoning in the message.reasoning field
+      if (message.reasoning) {
+        reasoningTexts.push(message.reasoning);
+      }
+      break;
+
+    default:
+      // Other providers don't embed reasoning in message content
+      break;
+  }
+
+  return { cleanContent, reasoningTexts };
+}
+
+// Adjust request options based on provider-specific requirements
+function adjustRequestForProvider(
+  requestOptions: any,
+  api: "completions" | "responses",
+  baseURL?: string,
+  supportsReasoning?: boolean,
+): any {
+  const provider = detectProvider(baseURL);
+
+  // Handle provider-specific adjustments
+  switch (provider) {
+    case "gemini":
+      if (api === "completions" && supportsReasoning && requestOptions.reasoning_effort) {
+        // Gemini needs extra_body for thinking content.
+        // Can't use both reasoning_effort and thinking_config.
+        const budget =
+          requestOptions.reasoning_effort === "low"
+            ? 1024
+            : requestOptions.reasoning_effort === "medium"
+              ? 8192
+              : 24576;
+
+        requestOptions.extra_body = {
+          google: {
+            thinking_config: {
+              thinking_budget: budget,
+              include_thoughts: true,
+            },
+          },
+        };
+        // Remove reasoning_effort when using thinking_config
+        delete requestOptions.reasoning_effort;
+      }
+      break;
+
+    case "groq":
+      if (api === "responses" && requestOptions.reasoning) {
+        // Groq's Responses API doesn't support reasoning.summary
+        delete requestOptions.reasoning.summary;
+      } else if (api === "completions" && supportsReasoning && requestOptions.reasoning_effort) {
+        // Groq Chat Completions uses reasoning_format in addition to reasoning_effort
+        requestOptions.reasoning_format = "parsed";
+        // Keep reasoning_effort for Groq
+      }
+      break;
+
+    case "anthropic":
+      // Anthropic's OpenAI compatibility has its own quirks,
+      // but thinking content isn't available via the OpenAI compat layer
+      break;
+
+    case "openrouter":
+      // OpenRouter uses a unified reasoning parameter format
+      if (api === "completions" && supportsReasoning && requestOptions.reasoning_effort) {
+        // Convert reasoning_effort to OpenRouter's reasoning format
+        requestOptions.reasoning = {
+          effort: requestOptions.reasoning_effort === "low" ? "low" :
+                  requestOptions.reasoning_effort === "minimal" ? "low" :
+                  requestOptions.reasoning_effort === "medium" ? "medium" : "high"
+        };
+        delete requestOptions.reasoning_effort;
+      }
+      break;
+
+    default:
+      // OpenAI and others use the standard format
+      break;
+  }
+
+  return requestOptions;
+}
+
 async function checkReasoningSupport(
   client: OpenAI,
   model: string,
   api: "completions" | "responses",
+  baseURL?: string,
 ): Promise<boolean> {
   // Check cache first
   const cacheKey = model;
```
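The effort-to-parameter mappings inside `adjustRequestForProvider` can be isolated as pure functions. This sketch mirrors the values in the diff (low→1024, medium→8192, otherwise 24576 for Gemini; "minimal" collapsed into "low" for OpenRouter):

```typescript
// Map a reasoning_effort string to a Gemini thinking_budget, per the
// commit's adjustRequestForProvider.
function geminiThinkingBudget(effort: string): number {
  return effort === "low" ? 1024 : effort === "medium" ? 8192 : 24576;
}

// OpenRouter's unified reasoning parameter has no "minimal" level, so it
// collapses into "low"; anything unrecognized falls through to "high".
function openRouterEffort(effort: string): "low" | "medium" | "high" {
  if (effort === "low" || effort === "minimal") return "low";
  if (effort === "medium") return "medium";
  return "high";
}
```

Keeping these as pure mappings makes the provider adjustments testable without issuing any API calls.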
```diff
@@ -57,31 +182,54 @@ async function checkReasoningSupport(
   }
 
   let supportsReasoning = false;
+  const provider = detectProvider(baseURL);
 
   if (api === "responses") {
     // Try a minimal request with reasoning parameter for Responses API
     try {
-      await client.responses.create({
+      const testRequest: any = {
         model,
         input: "test",
         max_output_tokens: 1024,
         reasoning: {
           effort: "low", // Use low instead of minimal to ensure we get summaries
         },
-      });
+      };
+      await client.responses.create(testRequest);
       supportsReasoning = true;
     } catch (error) {
       supportsReasoning = false;
     }
   } else {
-    // For Chat Completions API, try with reasoning_effort parameter
+    // For Chat Completions API, try with reasoning parameter
     try {
-      await client.chat.completions.create({
+      const testRequest: any = {
         model,
         messages: [{ role: "user", content: "test" }],
-        max_completion_tokens: 1,
-        reasoning_effort: "minimal",
-      });
+        max_completion_tokens: 1024,
+      };
+
+      // Add provider-specific reasoning parameters
+      if (provider === "gemini") {
+        // Gemini uses extra_body for thinking
+        testRequest.extra_body = {
+          google: {
+            thinking_config: {
+              thinking_budget: 100, // Minimum viable budget for test
+              include_thoughts: true,
+            },
+          },
+        };
+      } else if (provider === "groq") {
+        // Groq uses both reasoning_format and reasoning_effort
+        testRequest.reasoning_format = "parsed";
+        testRequest.reasoning_effort = "low";
+      } else {
+        // Others use reasoning_effort
+        testRequest.reasoning_effort = "minimal";
+      }
+
+      await client.chat.completions.create(testRequest);
       supportsReasoning = true;
     } catch (error) {
       supportsReasoning = false;
```
```diff
@@ -103,14 +251,10 @@ export async function callModelResponsesApi(
   signal?: AbortSignal,
   eventReceiver?: AgentEventReceiver,
   supportsReasoning?: boolean,
+  baseURL?: string,
 ): Promise<void> {
   await eventReceiver?.on({ type: "assistant_start" });
 
-  // Use provided reasoning support or detect it
-  if (supportsReasoning === undefined) {
-    supportsReasoning = await checkReasoningSupport(client, model, "responses");
-  }
-
   let conversationDone = false;
 
   while (!conversationDone) {
```
```diff
@@ -120,23 +264,26 @@ export async function callModelResponsesApi(
       throw new Error("Interrupted");
     }
 
-    const response = await client.responses.create(
-      {
-        model,
-        input: messages,
-        tools: toolsForResponses as any,
-        tool_choice: "auto",
-        parallel_tool_calls: true,
-        max_output_tokens: 2000, // TODO make configurable
-        ...(supportsReasoning && {
-          reasoning: {
-            effort: "medium", // Use auto reasoning effort
-            summary: "auto", // Request reasoning summaries
-          },
-        }),
-      },
-      { signal },
-    );
+    // Build request options
+    let requestOptions: any = {
+      model,
+      input: messages,
+      tools: toolsForResponses as any,
+      tool_choice: "auto",
+      parallel_tool_calls: true,
+      max_output_tokens: 2000, // TODO make configurable
+      ...(supportsReasoning && {
+        reasoning: {
+          effort: "minimal", // Use minimal effort for responses API
+          summary: "detailed", // Request detailed reasoning summaries
+        },
+      }),
+    };
+
+    // Apply provider-specific adjustments
+    requestOptions = adjustRequestForProvider(requestOptions, "responses", baseURL, supportsReasoning);
+
+    const response = await client.responses.create(requestOptions, { signal });
 
     // Report token usage if available (responses API format)
     if (response.usage) {
```
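The `...(supportsReasoning && { ... })` idiom used when building the request merges the reasoning block only when the flag is true; spreading the `false` that the `&&` otherwise produces contributes no keys. A minimal illustration (the option names follow the diff, but this is a standalone sketch, not the agent's code):

```typescript
// Build request options, adding the reasoning block only when supported.
// Spreading `false` (from `supportsReasoning && {...}`) is a no-op.
function buildOptions(model: string, supportsReasoning: boolean): Record<string, unknown> {
  return {
    model,
    max_output_tokens: 2000,
    ...(supportsReasoning && {
      reasoning: { effort: "minimal", summary: "detailed" },
    }),
  };
}
```

This avoids a conditional mutation after the object literal and keeps the request construction declarative.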
```diff
@@ -250,14 +397,10 @@ export async function callModelChatCompletionsApi(
   signal?: AbortSignal,
   eventReceiver?: AgentEventReceiver,
   supportsReasoning?: boolean,
+  baseURL?: string,
 ): Promise<void> {
   await eventReceiver?.on({ type: "assistant_start" });
 
-  // Use provided reasoning support or detect it
-  if (supportsReasoning === undefined) {
-    supportsReasoning = await checkReasoningSupport(client, model, "completions");
-  }
-
   let assistantResponded = false;
 
   while (!assistantResponded) {
```
```diff
@@ -266,19 +409,22 @@ export async function callModelChatCompletionsApi(
       throw new Error("Interrupted");
     }
 
-    const response = await client.chat.completions.create(
-      {
-        model,
-        messages,
-        tools: toolsForChat,
-        tool_choice: "auto",
-        max_completion_tokens: 2000, // TODO make configurable
-        ...(supportsReasoning && {
-          reasoning_effort: "medium",
-        }),
-      },
-      { signal },
-    );
+    // Build request options
+    let requestOptions: any = {
+      model,
+      messages,
+      tools: toolsForChat,
+      tool_choice: "auto",
+      max_completion_tokens: 2000, // TODO make configurable
+      ...(supportsReasoning && {
+        reasoning_effort: "low", // Use low effort for completions API
+      }),
+    };
+
+    // Apply provider-specific adjustments
+    requestOptions = adjustRequestForProvider(requestOptions, "completions", baseURL, supportsReasoning);
+
+    const response = await client.chat.completions.create(requestOptions, { signal });
 
     const message = response.choices[0].message;
 
```
```diff
@@ -339,9 +485,17 @@ export async function callModelChatCompletionsApi(
         }
       }
     } else if (message.content) {
-      // Final assistant response
-      eventReceiver?.on({ type: "assistant_message", text: message.content });
-      const finalMsg = { role: "assistant", content: message.content };
+      // Parse provider-specific reasoning from message
+      const { cleanContent, reasoningTexts } = parseReasoningFromMessage(message, baseURL);
+
+      // Emit reasoning events if any
+      for (const reasoning of reasoningTexts) {
+        await eventReceiver?.on({ type: "reasoning", text: reasoning });
+      }
+
+      // Emit the cleaned assistant message
+      await eventReceiver?.on({ type: "assistant_message", text: cleanContent });
+      const finalMsg = { role: "assistant", content: cleanContent };
       messages.push(finalMsg);
       assistantResponded = true;
     }
```
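The message-handling path above can be condensed into a pure function for clarity: split provider reasoning out of the content, surface the reasoning first, then the cleaned assistant text. A self-contained sketch combining the Gemini `<thought>`-tag and Groq/OpenRouter `message.reasoning` cases:

```typescript
type Emitted = { type: string; text: string };

// Simplified sketch of the final-message path: extract reasoning, then
// return the events in the order the agent would emit them.
function handleMessage(message: { content?: string; reasoning?: string }): Emitted[] {
  const events: Emitted[] = [];
  let clean = message.content ?? "";

  // Gemini embeds thinking in <thought> tags inside the content.
  const thoughts = [...clean.matchAll(/<thought>([\s\S]*?)<\/thought>/g)].map((m) => m[1].trim());
  clean = clean.replace(/<thought>[\s\S]*?<\/thought>/g, "").trim();

  // Groq (reasoning_format: "parsed") and OpenRouter use a separate field.
  if (message.reasoning) thoughts.push(message.reasoning);

  for (const t of thoughts) events.push({ type: "reasoning", text: t });
  events.push({ type: "assistant_message", text: clean });
  return events;
}
```

Factoring the extraction out of the event-emission loop is what makes the provider-specific parsing unit-testable without a live client.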
```diff
@@ -417,6 +571,7 @@ export class Agent {
           this.client,
           this.config.model,
           "responses",
+          this.config.baseURL,
         );
       }
 
```
```diff
@@ -427,6 +582,7 @@ export class Agent {
           this.abortController.signal,
           this.comboReceiver,
           this.supportsReasoningResponses,
+          this.config.baseURL,
         );
       } else {
         // Check reasoning support for completions API
```
```diff
@@ -435,6 +591,7 @@ export class Agent {
           this.client,
           this.config.model,
           "completions",
+          this.config.baseURL,
         );
       }
 
```
```diff
@@ -445,6 +602,7 @@ export class Agent {
           this.abortController.signal,
           this.comboReceiver,
           this.supportsReasoningCompletions,
+          this.config.baseURL,
         );
       }
     } catch (e) {
```
```diff
@@ -6,7 +6,7 @@ A comprehensive toolkit for managing Large Language Model (LLM) deployments and
 - Terminal UI framework with differential rendering and interactive components
 - AI agent framework with tool calling, session persistence, and multiple renderers
 - GPU pod management CLI for automated vLLM deployment on various providers
-- Support for OpenAI, Anthropic, Groq, OpenRouter, and compatible APIs
+- Support for OpenAI, Anthropic, Groq, OpenRouter, Gemini, and compatible APIs
 - Built-in file system tools for agentic AI capabilities
 
 ## Tech Stack
```
````diff
@@ -1,5 +1,45 @@
-- agent: token usage gets overwritten with each message that has usage data. however, if the latest data doesn't have a specific usage field, we record undefined i think? also, {"type":"token_usage","inputTokens":240,"outputTokens":35,"totalTokens":275,"cacheReadTokens":0,"cacheWriteTokens":0} doesn't contain reasoningTokens? do we lack initialization?
+- agent: test for basic functionality, including thinking, completions & responses API support for all the known providers and their endpoints.
+
+- agent: token usage gets overwritten with each message that has usage data. however, if the latest data doesn't have a specific usage field, we record undefined i think? also, {"type":"token_usage","inputTokens":240,"outputTokens":35,"totalTokens":275,"cacheReadTokens":0,"cacheWriteTokens":0} doesn't contain reasoningTokens? do we lack initialization? See case "token_usage": in renderers. probably need to check if lastXXX > current and use lastXXX.
+
+- agent: groq responses api throws on second message
+```
+➜  pi-mono git:(main) ✗ npx tsx packages/agent/src/cli.ts --base-url https://api.groq.com/openai/v1 --api-key $GROQ_API_KEY --model openai/gpt-oss-120b --api responses
+>> pi interactive chat <<<
+Press Escape to interrupt while processing
+Press CTRL+C to clear the text editor
+Press CTRL+C twice quickly to exit
+
+[user]
+think step by step: what's 2+2?
+
+[assistant]
+[thinking]
+The user asks "think step by step: what's 2+2?" They want a step-by-step reasoning. That's
+trivial: 2+2=4. Provide answer with steps.
+
+Sure! Let’s break it down:
+
+1. Identify the numbers: We have the numbers 2 and 2.
+2. Add the first number to the second:
+3. Calculate:
+
+2 + 2 = 4
+
+Answer: 2 + 2 = 4.
+
+[user]
+what was your last thinking content?
+
+[assistant]
+[error] 400 `input`: `items[3]`: `role`: assistant role cannot be used with type='message'
+(use EasyInputMessage format without type field)
+```
+
 - pods: if a pod is down and i run `pi list`, verifying processes says All processes verified. But that can't be true, as we can no longer SSH into the pod to check.
 
 - agent: start a new agent session. when i press CTRL+C, "Press Ctrl+C again to exit" appears above the text editor followed by an empty line. After about 1 second, the empty line disappears. We should either not show the empty line, or always show the empty line. Maybe the Ctrl+C info should be displayed below the text editor.
 
 - tui: npx tsx test/demo.ts, using /exit or pressing CTRL+C does not work to exit the demo.
 
 - agent: we need to make the system prompt and tools pluggable. We need to figure out the simplest way for users to define system prompts and toolkits. A toolkit could be a subset of the built-in tools, a mixture of a subset of the built-in tools plus custom self-made tools, maybe including MCP servers, and so on. We need to make this super easy: users should be able to write their tools in whatever language they fancy, which means that something like process spawning plus a stdio communication transport would probably make the most sense. But then we're basically back at MCP. And that does not support interruptibility, which we need for the agent: if the agent invokes a tool and the user presses escape in the interface, the tool invocation must be interrupted and whatever it's doing must stop, including killing all sub-processes. For stdio MCP servers this could be solved since we spawn the server process on startup (or whenever we load the tools) and reuse it for subsequent tool invocations; if the user interrupts, we could just kill that process, assuming anything it's doing and its sub-processes die along with it. So tools could all be written as MCP servers, but that's a lot of overhead. It would also be nice to provide tools as just a bash script that gets some inputs and returns some outputs. Same for Go apps or TypeScript apps invoked via npx tsx. Just make the barrier of entry for writing your own tools super low, not necessarily going full MCP — but we also need to support MCP. Whatever we arrive at, we then need to take our built-in tools and see if they can be refactored to work with the new tool mechanism.
````
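The fix the token-usage note suggests (keep the larger of last and current per field, so a later event with a missing field never clobbers known counts) could look like this hypothetical sketch — `mergeUsage` and its field set are illustrative, not the agent's actual renderer code:

```typescript
// Hypothetical merge for the token_usage overwrite bug described above:
// treat a missing field as 0 and keep the per-field maximum, so partial
// usage payloads never erase previously reported counts.
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
  reasoningTokens?: number;
}

function mergeUsage(last: Usage, current: Usage): Usage {
  const pick = (a?: number, b?: number) => Math.max(a ?? 0, b ?? 0);
  return {
    inputTokens: pick(last.inputTokens, current.inputTokens),
    outputTokens: pick(last.outputTokens, current.outputTokens),
    reasoningTokens: pick(last.reasoningTokens, current.reasoningTokens),
  };
}
```

A per-field maximum is a blunt instrument (it assumes counts are cumulative within a turn), but it matches the "check if lastXXX > current and use lastXXX" idea in the note.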
```diff
@@ -1,6 +1,6 @@
 # Fix Missing Thinking Tokens for GPT-5 and Anthropic Models
 **Status:** AwaitingCommit
-**Agent PID:** 27674
+**Agent PID:** 41002
 
 ## Original Todo
 agent: we do not get thinking tokens for gpt-5. possibly also not for anthropic models?
```
```diff
@@ -25,6 +25,18 @@ The agent doesn't extract or report reasoning/thinking tokens from OpenAI's reas
 - [x] Fix: Add reasoning support detection for Chat Completions API
 - [x] Fix: Add correct summary parameter value and increase max_output_tokens for preflight check
 - [x] Investigate: Chat Completions API has reasoning tokens but no thinking events
+- [x] Debug: Add logging to understand gpt-5 response structure in responses API
+- [x] Fix: Change reasoning summary from "auto" to "always" to ensure reasoning text is always returned
+- [x] Fix: Set correct effort levels - "minimal" for responses API, "low" for completions API
+- [x] Add note to README about Chat Completions API not returning thinking content
+- [x] Add Gemini API example to README
+- [x] Verify Gemini thinking token support and update README accordingly
+- [x] Add special case for Gemini to include extra_body with thinking_config
+- [x] Add special case for Groq responses API (doesn't support reasoning.summary)
+- [x] Refactor: Create centralized provider-specific request adjustment function
+- [x] Refactor: Extract message content parsing into parseReasoningFromMessage() function
+- [x] Test: Verify Groq reasoning extraction works with refactored code
+- [x] Test: Verify Gemini thinking extraction works with refactored code
 
 ## Notes
 User reported that o3 model with responses API doesn't show reasoning tokens or thinking events.
```
```diff
@@ -36,5 +48,11 @@ Fixed by:
 5. Parsing both reasoning_text (o1/o3) and summary_text (gpt-5) formats
 6. Displaying reasoning tokens in console and TUI renderers with ⚡ symbol
 7. Properly handling reasoning_effort for Chat Completions API
+8. Set correct effort levels: "minimal" for Responses API, "low" for Chat Completions API
+9. Set summary to "always" for Responses API
 
-**Important finding**: Chat Completions API by design only returns reasoning token *counts* but not the actual thinking/reasoning content for o1 models. This is expected behavior - only the Responses API exposes thinking events.
+**Important findings**:
+- Chat Completions API by design only returns reasoning token *counts* but not the actual thinking/reasoning content for o1 models. This is expected behavior - only the Responses API exposes thinking events.
+- GPT-5 models currently return empty summary arrays even with `summary: "detailed"` - the model indicates it "can't share step-by-step reasoning". This appears to be a model limitation/behavior rather than a code issue.
+- The reasoning tokens ARE being used and counted correctly when the model chooses to use them.
+- With effort="minimal" and summary="detailed", gpt-5 sometimes chooses not to use reasoning at all for simple questions.
```