Merge branch 'main' into feat/tui-overlay-options

2026-04-17 08:00:59 +00:00 · 2026-01-13 22:06:02 +01:00 · 7d45e434de
commit 7d45e434de
parent a4ccff382c 9994ebbedd
90 changed files with 10277 additions and 1700 deletions
--- a/packages/ai/CHANGELOG.md
+++ b/packages/ai/CHANGELOG.md
@@ -2,6 +2,36 @@

 ## [Unreleased]

+## [0.45.5] - 2026-01-13
+
+## [0.45.4] - 2026-01-13
+
+### Added
+
+- Added Vercel AI Gateway provider with model discovery and `AI_GATEWAY_API_KEY` env support ([#689](https://github.com/badlogic/pi-mono/pull/689) by [@timolins](https://github.com/timolins))
+
+### Fixed
+
+- Fixed z.ai thinking/reasoning: z.ai uses `thinking: { type: "enabled" }` instead of OpenAI's `reasoning_effort`. Added `thinkingFormat` compat flag to handle this. ([#688](https://github.com/badlogic/pi-mono/issues/688))
+
+## [0.45.3] - 2026-01-13
+
+## [0.45.2] - 2026-01-13
+
+## [0.45.1] - 2026-01-13
+
+## [0.45.0] - 2026-01-13
+
+### Added
+
+- Added MiniMax provider support with M2 and M2.1 models via Anthropic-compatible API ([#656](https://github.com/badlogic/pi-mono/pull/656) by [@dannote](https://github.com/dannote))
+- Added Amazon Bedrock provider with prompt caching for Claude models (experimental; tested with Anthropic Claude models only) ([#494](https://github.com/badlogic/pi-mono/pull/494) by [@unexge](https://github.com/unexge))
+- Added `serviceTier` option for OpenAI Responses requests ([#672](https://github.com/badlogic/pi-mono/pull/672) by [@markusylisiurunen](https://github.com/markusylisiurunen))
+- **Anthropic caching on OpenRouter**: Interactions with Anthropic models via OpenRouter now set a 5-minute cache point using Anthropic-style `cache_control` breakpoints on the last assistant or user message. ([#584](https://github.com/badlogic/pi-mono/pull/584) by [@nathyong](https://github.com/nathyong))
+- **Google Gemini CLI provider improvements**: Added Antigravity endpoint fallback (tries daily sandbox then prod when `baseUrl` is unset), header-based retry delay parsing (`Retry-After`, `x-ratelimit-reset`, `x-ratelimit-reset-after`), stable `sessionId` derivation from first user message for cache affinity, empty SSE stream retry with backoff, and `anthropic-beta` header for Claude thinking models ([#670](https://github.com/badlogic/pi-mono/pull/670) by [@kim0](https://github.com/kim0))
+
+## [0.44.0] - 2026-01-12
+
 ## [0.43.0] - 2026-01-11

 ### Fixed
--- a/packages/ai/README.md
+++ b/packages/ai/README.md
@@ -56,9 +56,12 @@ Unified LLM API with automatic model discovery, provider configuration, token an
 - **Cerebras**
 - **xAI**
 - **OpenRouter**
+- **Vercel AI Gateway**
+- **MiniMax**
 - **GitHub Copilot** (requires OAuth, see below)
 - **Google Gemini CLI** (requires OAuth, see below)
 - **Antigravity** (requires OAuth, see below)
+- **Amazon Bedrock**
 - **Any OpenAI-compatible API**: Ollama, vLLM, LM Studio, etc.

 ## Installation
@@ -708,6 +711,7 @@ interface OpenAICompat {
  supportsDeveloperRole?: boolean;   // Whether provider supports `developer` role vs `system` (default: true)
  supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
  maxTokensField?: 'max_completion_tokens' | 'max_tokens';  // Which field name to use (default: max_completion_tokens)
+  thinkingFormat?: 'openai' | 'zai'; // Format for reasoning param: 'openai' uses reasoning_effort, 'zai' uses thinking: { type: "enabled" } (default: openai)
 }
 ```

@@ -860,7 +864,9 @@ In Node.js environments, you can set environment variables to avoid passing API
 | Cerebras | `CEREBRAS_API_KEY` |
 | xAI | `XAI_API_KEY` |
 | OpenRouter | `OPENROUTER_API_KEY` |
+| Vercel AI Gateway | `AI_GATEWAY_API_KEY` |
 | zAI | `ZAI_API_KEY` |
+| MiniMax | `MINIMAX_API_KEY` |
 | GitHub Copilot | `COPILOT_GITHUB_TOKEN` or `GH_TOKEN` or `GITHUB_TOKEN` |

 When set, the library automatically uses these keys:
@@ -1026,6 +1032,90 @@ const response = await complete(model, {

 **Google Gemini CLI / Antigravity**: These use Google Cloud OAuth. The `apiKey` returned by `getOAuthApiKey()` is a JSON string containing both the token and project ID, which the library handles automatically.

+## Development
+
+### Adding a New Provider
+
+Adding a new LLM provider requires changes across multiple files. This checklist covers all necessary steps:
+
+#### 1. Core Types (`src/types.ts`)
+
+- Add the API identifier to the `Api` type union (e.g., `"bedrock-converse-stream"`)
+- Create an options interface extending `StreamOptions` (e.g., `BedrockOptions`)
+- Add the mapping to `ApiOptionsMap`
+- Add the provider name to `KnownProvider` type union (e.g., `"amazon-bedrock"`)
+
+#### 2. Provider Implementation (`src/providers/`)
+
+Create a new provider file (e.g., `amazon-bedrock.ts`) that exports:
+
+- `stream<Provider>()` function returning `AssistantMessageEventStream`
+- Provider-specific options interface
+- Message conversion functions to transform `Context` to provider format
+- Tool conversion if the provider supports tools
+- Response parsing to emit standardized events (`text`, `tool_call`, `thinking`, `usage`, `stop`)
+
+#### 3. Stream Integration (`src/stream.ts`)
+
+- Import the provider's stream function and options type
+- Add credential detection in `getEnvApiKey()` for the new provider
+- Add a case in `mapOptionsForApi()` to map `SimpleStreamOptions` to provider options
+- Add the provider's stream function to the `streamFunctions` map
+
+#### 4. Model Generation (`scripts/generate-models.ts`)
+
+- Add logic to fetch and parse models from the provider's source (e.g., models.dev API)
+- Map provider model data to the standardized `Model` interface
+- Handle provider-specific quirks (pricing format, capability flags, model ID transformations)
+
+#### 5. Tests (`test/`)
+
+Create or update test files to cover the new provider:
+
+- `stream.test.ts` - Basic streaming and tool use
+- `tokens.test.ts` - Token usage reporting
+- `abort.test.ts` - Request cancellation
+- `empty.test.ts` - Empty message handling
+- `context-overflow.test.ts` - Context limit errors
+- `image-limits.test.ts` - Image support (if applicable)
+- `unicode-surrogate.test.ts` - Unicode handling
+- `tool-call-without-result.test.ts` - Orphaned tool calls
+- `image-tool-result.test.ts` - Images in tool results
+- `total-tokens.test.ts` - Token counting accuracy
+
+For providers with non-standard auth (AWS, Google Vertex), create a utility like `bedrock-utils.ts` with credential detection helpers.
+
+#### 6. Coding Agent Integration (`../coding-agent/`)
+
+Update `src/core/model-resolver.ts`:
+
+- Add a default model ID for the provider in `DEFAULT_MODELS`
+
+Update `src/cli/args.ts`:
+
+- Add environment variable documentation in the help text
+
+Update `README.md`:
+
+- Add the provider to the providers section with setup instructions
+
+#### 7. Documentation
+
+Update `packages/ai/README.md`:
+
+- Add the provider to the Supported Providers list
+- Document any provider-specific options or authentication requirements
+- Add environment variable to the Environment Variables section
+
+#### 8. Changelog
+
+Add an entry to `packages/ai/CHANGELOG.md` under `## [Unreleased]`:
+
+```markdown
+### Added
+- Added support for [Provider Name] provider ([#PR](link) by [@author](link))
+```
+
 ## License

 MIT
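
As a reading aid for the `thinkingFormat` flag added to `OpenAICompat` in the README hunk above, here is a minimal sketch of how such a flag could select the request field. The function name `buildReasoningParams` and the `ReasoningEffort` type are hypothetical and are not taken from the library; only the two field shapes (`reasoning_effort` and `thinking: { type: "enabled" }`) come from the changelog entry.

```typescript
// Hypothetical helper for illustration; not the library's implementation.
type ThinkingFormat = "openai" | "zai";
type ReasoningEffort = "low" | "medium" | "high"; // assumed set of levels

function buildReasoningParams(
	format: ThinkingFormat | undefined,
	effort: ReasoningEffort | undefined,
): Record<string, unknown> {
	// No reasoning requested: send neither field.
	if (!effort) return {};
	// z.ai takes an on/off switch rather than an effort level.
	if ((format ?? "openai") === "zai") return { thinking: { type: "enabled" } };
	// Default ("openai"): pass the effort level through.
	return { reasoning_effort: effort };
}

console.log(buildReasoningParams("zai", "high"));
console.log(buildReasoningParams(undefined, "low"));
```

In this sketch the effort level is dropped for the `zai` format because the documented shape has no field for it.
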
--- a/packages/ai/package.json
+++ b/packages/ai/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@mariozechner/pi-ai",
-	"version": "0.43.0",
+	"version": "0.45.5",
 	"description": "Unified LLM API with automatic model discovery and provider configuration",
 	"type": "module",
 	"main": "./dist/index.js",
@@ -23,6 +23,7 @@
 	},
 	"dependencies": {
 		"@anthropic-ai/sdk": "0.71.2",
+		"@aws-sdk/client-bedrock-runtime": "^3.966.0",
 		"@google/genai": "1.34.0",
 		"@mistralai/mistralai": "1.10.0",
 		"@sinclair/typebox": "^0.34.41",
@@ -39,6 +40,7 @@
 		"openai",
 		"anthropic",
 		"gemini",
+		"bedrock",
 		"unified",
 		"api"
 	],
--- a/packages/ai/scripts/generate-models.ts
+++ b/packages/ai/scripts/generate-models.ts
@@ -32,6 +32,20 @@ interface ModelsDevModel {
 	};
 }

+interface AiGatewayModel {
+	id: string;
+	name?: string;
+	context_window?: number;
+	max_tokens?: number;
+	tags?: string[];
+	pricing?: {
+		input?: string | number;
+		output?: string | number;
+		input_cache_read?: string | number;
+		input_cache_write?: string | number;
+	};
+}
+
 const COPILOT_STATIC_HEADERS = {
 	"User-Agent": "GitHubCopilotChat/0.35.0",
 	"Editor-Version": "vscode/1.107.0",
@@ -39,6 +53,9 @@ const COPILOT_STATIC_HEADERS = {
 	"Copilot-Integration-Id": "vscode-chat",
 } as const;

+const AI_GATEWAY_MODELS_URL = "https://ai-gateway.vercel.sh/v1";
+const AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh";
+
 async function fetchOpenRouterModels(): Promise<Model<any>[]> {
 	try {
 		console.log("Fetching models from OpenRouter API...");
@@ -97,6 +114,64 @@ async function fetchOpenRouterModels(): Promise<Model<any>[]> {
 	}
 }

+async function fetchAiGatewayModels(): Promise<Model<any>[]> {
+	try {
+		console.log("Fetching models from Vercel AI Gateway API...");
+		const response = await fetch(`${AI_GATEWAY_MODELS_URL}/models`);
+		const data = await response.json();
+		const models: Model<any>[] = [];
+
+		const toNumber = (value: string | number | undefined): number => {
+			if (typeof value === "number") {
+				return Number.isFinite(value) ? value : 0;
+			}
+			const parsed = parseFloat(value ?? "0");
+			return Number.isFinite(parsed) ? parsed : 0;
+		};
+
+		const items = Array.isArray(data.data) ? (data.data as AiGatewayModel[]) : [];
+		for (const model of items) {
+			const tags = Array.isArray(model.tags) ? model.tags : [];
+			// Only include models that support tools
+			if (!tags.includes("tool-use")) continue;
+
+			const input: ("text" | "image")[] = ["text"];
+			if (tags.includes("vision")) {
+				input.push("image");
+			}
+
+			const inputCost = toNumber(model.pricing?.input) * 1_000_000;
+			const outputCost = toNumber(model.pricing?.output) * 1_000_000;
+			const cacheReadCost = toNumber(model.pricing?.input_cache_read) * 1_000_000;
+			const cacheWriteCost = toNumber(model.pricing?.input_cache_write) * 1_000_000;
+
+			models.push({
+				id: model.id,
+				name: model.name || model.id,
+				api: "anthropic-messages",
+				baseUrl: AI_GATEWAY_BASE_URL,
+				provider: "vercel-ai-gateway",
+				reasoning: tags.includes("reasoning"),
+				input,
+				cost: {
+					input: inputCost,
+					output: outputCost,
+					cacheRead: cacheReadCost,
+					cacheWrite: cacheWriteCost,
+				},
+				contextWindow: model.context_window || 4096,
+				maxTokens: model.max_tokens || 4096,
+			});
+		}
+
+		console.log(`Fetched ${models.length} tool-capable models from Vercel AI Gateway`);
+		return models;
+	} catch (error) {
+		console.error("Failed to fetch Vercel AI Gateway models:", error);
+		return [];
+	}
+}
+
 async function loadModelsDevData(): Promise<Model<any>[]> {
 	try {
 		console.log("Fetching models from models.dev API...");
@@ -105,6 +180,87 @@ async function loadModelsDevData(): Promise<Model<any>[]> {

 		const models: Model<any>[] = [];

+		// Process Amazon Bedrock models
+		if (data["amazon-bedrock"]?.models) {
+			for (const [modelId, model] of Object.entries(data["amazon-bedrock"].models)) {
+				const m = model as ModelsDevModel;
+				if (m.tool_call !== true) continue;
+
+				let id = modelId;
+
+				if (id.startsWith("ai21.jamba")) {
+					// These models don't support tool use in streaming mode
+					continue;
+				}
+
+				if (id.startsWith("amazon.titan-text-express") ||
+				    id.startsWith("mistral.mistral-7b-instruct-v0")) {
+					// These models don't support system messages
+					continue;
+				}
+
+				// Some Amazon Bedrock models require cross-region inference profiles to work.
+				// To use cross-region inference, we need to add a region prefix to the models.
+				// See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system
+				// TODO: Remove Claude models once https://github.com/anomalyco/models.dev/pull/607 is merged, and follow-up with other models.
+
+				// Models with global cross-region inference profiles
+				if (id.startsWith("anthropic.claude-haiku-4-5") ||
+						id.startsWith("anthropic.claude-sonnet-4") ||
+						id.startsWith("anthropic.claude-opus-4-5") ||
+						id.startsWith("amazon.nova-2-lite") ||
+						id.startsWith("cohere.embed-v4") ||
+						id.startsWith("twelvelabs.pegasus-1-2")) {
+						id = "global." + id;
+				}
+
+				// Models with US cross-region inference profiles
+				if (id.startsWith("amazon.nova-lite") ||
+						id.startsWith("amazon.nova-micro") ||
+						id.startsWith("amazon.nova-premier") ||
+						id.startsWith("amazon.nova-pro") ||
+						id.startsWith("anthropic.claude-3-7-sonnet") ||
+						id.startsWith("anthropic.claude-opus-4-1") ||
+						id.startsWith("anthropic.claude-opus-4-20250514") ||
+						id.startsWith("deepseek.r1") ||
+						id.startsWith("meta.llama3-2") ||
+						id.startsWith("meta.llama3-3") ||
+						id.startsWith("meta.llama4")) {
+						id = "us." + id;
+				}
+
+				const bedrockModel = {
+					id,
+					name: m.name || id,
+					api: "bedrock-converse-stream" as const,
+					provider: "amazon-bedrock" as const,
+					baseUrl: "https://bedrock-runtime.us-east-1.amazonaws.com",
+					reasoning: m.reasoning === true,
+					input: (m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"]) as ("text" | "image")[],
+					cost: {
+						input: m.cost?.input || 0,
+						output: m.cost?.output || 0,
+						cacheRead: m.cost?.cache_read || 0,
+						cacheWrite: m.cost?.cache_write || 0,
+					},
+					contextWindow: m.limit?.context || 4096,
+					maxTokens: m.limit?.output || 4096,
+				};
+				models.push(bedrockModel);
+
+				// Add EU cross-region inference variants for Claude models
+				if (modelId.startsWith("anthropic.claude-haiku-4-5") ||
+						modelId.startsWith("anthropic.claude-sonnet-4-5") ||
+						modelId.startsWith("anthropic.claude-opus-4-5")) {
+					models.push({
+						...bedrockModel,
+						id: "eu." + modelId,
+						name: (m.name || modelId) + " (EU)",
+					});
+				}
+			}
+		}
+
 		// Process Anthropic models
 		if (data.anthropic?.models) {
 			for (const [modelId, model] of Object.entries(data.anthropic.models)) {
@@ -284,6 +440,7 @@ async function loadModelsDevData(): Promise<Model<any>[]> {
 				},
 				compat: {
 					supportsDeveloperRole: false,
+					thinkingFormat: "zai",
 				},
 				contextWindow: m.limit?.context || 4096,
 				maxTokens: m.limit?.output || 4096,
@@ -409,6 +566,33 @@ async function loadModelsDevData(): Promise<Model<any>[]> {
 			}
 		}

+		// Process MiniMax models
+		if (data.minimax?.models) {
+			for (const [modelId, model] of Object.entries(data.minimax.models)) {
+				const m = model as ModelsDevModel;
+				if (m.tool_call !== true) continue;
+
+				models.push({
+					id: modelId,
+					name: m.name || modelId,
+					api: "anthropic-messages",
+					provider: "minimax",
+					// MiniMax's Anthropic-compatible API - SDK appends /v1/messages
+					baseUrl: "https://api.minimax.io/anthropic",
+					reasoning: m.reasoning === true,
+					input: m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"],
+					cost: {
+						input: m.cost?.input || 0,
+						output: m.cost?.output || 0,
+						cacheRead: m.cost?.cache_read || 0,
+						cacheWrite: m.cost?.cache_write || 0,
+					},
+					contextWindow: m.limit?.context || 4096,
+					maxTokens: m.limit?.output || 4096,
+				});
+			}
+		}
+
 		console.log(`Loaded ${models.length} tool-capable models from models.dev`);
 		return models;
 	} catch (error) {
@@ -421,11 +605,13 @@ async function generateModels() {
 	// Fetch models from both sources
 	// models.dev: Anthropic, Google, OpenAI, Groq, Cerebras
 	// OpenRouter: xAI and other providers (excluding Anthropic, Google, OpenAI)
+	// AI Gateway: OpenAI-compatible catalog with tool-capable models
 	const modelsDevModels = await loadModelsDevData();
 	const openRouterModels = await fetchOpenRouterModels();
+	const aiGatewayModels = await fetchAiGatewayModels();

 	// Combine models (models.dev has priority)
-	const allModels = [...modelsDevModels, ...openRouterModels];
+	const allModels = [...modelsDevModels, ...openRouterModels, ...aiGatewayModels];

 	// Fix incorrect cache pricing for Claude Opus 4.5 from models.dev
 	// models.dev has 3x the correct pricing (1.5/18.75 instead of 0.5/6.25)
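
The Bedrock hunk above prefixes model IDs with `global.` or `us.` through two long `startsWith` chains. The same rule could be expressed as a table-driven helper, sketched below. The name `withInferenceProfile` is hypothetical and the prefix lists are abbreviated for illustration; the authoritative lists are the ones in the diff.

```typescript
// Hypothetical table-driven variant; prefix lists are abbreviated, see the diff for the full set.
const GLOBAL_PROFILE_PREFIXES = [
	"anthropic.claude-haiku-4-5",
	"anthropic.claude-sonnet-4",
	"anthropic.claude-opus-4-5",
	"amazon.nova-2-lite",
];

const US_PROFILE_PREFIXES = [
	"amazon.nova-lite",
	"anthropic.claude-3-7-sonnet",
	"anthropic.claude-opus-4-1",
	"deepseek.r1",
	"meta.llama4",
];

function withInferenceProfile(modelId: string): string {
	// Global profiles are checked first, matching the order in the diff.
	if (GLOBAL_PROFILE_PREFIXES.some((p) => modelId.startsWith(p))) return `global.${modelId}`;
	if (US_PROFILE_PREFIXES.some((p) => modelId.startsWith(p))) return `us.${modelId}`;
	// Models without a listed profile keep their plain ID.
	return modelId;
}

console.log(withInferenceProfile("anthropic.claude-sonnet-4-20250514-v1:0"));
console.log(withInferenceProfile("meta.llama4-scout"));
```

Returning early after the global match mirrors the original behavior, where an ID that already starts with `global.` can no longer match any US prefix.
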
--- a/packages/ai/src/models.generated.ts
+++ b/packages/ai/src/models.generated.ts
--- a/packages/ai/src/providers/amazon-bedrock.ts
+++ b/packages/ai/src/providers/amazon-bedrock.ts
@@ -0,0 +1,548 @@
+import {
+	BedrockRuntimeClient,
+	StopReason as BedrockStopReason,
+	type Tool as BedrockTool,
+	CachePointType,
+	type ContentBlock,
+	type ContentBlockDeltaEvent,
+	type ContentBlockStartEvent,
+	type ContentBlockStopEvent,
+	ConversationRole,
+	ConverseStreamCommand,
+	type ConverseStreamMetadataEvent,
+	ImageFormat,
+	type Message,
+	type SystemContentBlock,
+	type ToolChoice,
+	type ToolConfiguration,
+	ToolResultStatus,
+} from "@aws-sdk/client-bedrock-runtime";
+
+import { calculateCost } from "../models.js";
+import type {
+	Api,
+	AssistantMessage,
+	Context,
+	Model,
+	StopReason,
+	StreamFunction,
+	StreamOptions,
+	TextContent,
+	ThinkingBudgets,
+	ThinkingContent,
+	ThinkingLevel,
+	Tool,
+	ToolCall,
+	ToolResultMessage,
+} from "../types.js";
+import { AssistantMessageEventStream } from "../utils/event-stream.js";
+import { parseStreamingJson } from "../utils/json-parse.js";
+import { sanitizeSurrogates } from "../utils/sanitize-unicode.js";
+
+export interface BedrockOptions extends StreamOptions {
+	region?: string;
+	profile?: string;
+	toolChoice?: "auto" | "any" | "none" | { type: "tool"; name: string };
+	/* See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-reasoning.html for supported models. */
+	reasoning?: ThinkingLevel;
+	/* Custom token budgets per thinking level. Overrides default budgets. */
+	thinkingBudgets?: ThinkingBudgets;
+	/* Only supported by Claude 4.x models, see https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved */
+	interleavedThinking?: boolean;
+}
+
+type Block = (TextContent | ThinkingContent | ToolCall) & { index?: number; partialJson?: string };
+
+export const streamBedrock: StreamFunction<"bedrock-converse-stream"> = (
+	model: Model<"bedrock-converse-stream">,
+	context: Context,
+	options: BedrockOptions,
+): AssistantMessageEventStream => {
+	const stream = new AssistantMessageEventStream();
+
+	(async () => {
+		const output: AssistantMessage = {
+			role: "assistant",
+			content: [],
+			api: "bedrock-converse-stream" as Api,
+			provider: model.provider,
+			model: model.id,
+			usage: {
+				input: 0,
+				output: 0,
+				cacheRead: 0,
+				cacheWrite: 0,
+				totalTokens: 0,
+				cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
+			},
+			stopReason: "stop",
+			timestamp: Date.now(),
+		};
+
+		const blocks = output.content as Block[];
+
+		try {
+			const client = new BedrockRuntimeClient({
+				region: options.region || process.env.AWS_REGION || process.env.AWS_DEFAULT_REGION || "us-east-1",
+				profile: options.profile,
+			});
+
+			const command = new ConverseStreamCommand({
+				modelId: model.id,
+				messages: convertMessages(context, model),
+				system: buildSystemPrompt(context.systemPrompt, model),
+				inferenceConfig: { maxTokens: options.maxTokens, temperature: options.temperature },
+				toolConfig: convertToolConfig(context.tools, options.toolChoice),
+				additionalModelRequestFields: buildAdditionalModelRequestFields(model, options),
+			});
+
+			const response = await client.send(command, { abortSignal: options.signal });
+
+			for await (const item of response.stream!) {
+				if (item.messageStart) {
+					if (item.messageStart.role !== ConversationRole.ASSISTANT) {
+						throw new Error("Expected assistant message start but got user message start instead");
+					}
+					stream.push({ type: "start", partial: output });
+				} else if (item.contentBlockStart) {
+					handleContentBlockStart(item.contentBlockStart, blocks, output, stream);
+				} else if (item.contentBlockDelta) {
+					handleContentBlockDelta(item.contentBlockDelta, blocks, output, stream);
+				} else if (item.contentBlockStop) {
+					handleContentBlockStop(item.contentBlockStop, blocks, output, stream);
+				} else if (item.messageStop) {
+					output.stopReason = mapStopReason(item.messageStop.stopReason);
+				} else if (item.metadata) {
+					handleMetadata(item.metadata, model, output);
+				} else if (item.internalServerException) {
+					throw new Error(`Internal server error: ${item.internalServerException.message}`);
+				} else if (item.modelStreamErrorException) {
+					throw new Error(`Model stream error: ${item.modelStreamErrorException.message}`);
+				} else if (item.validationException) {
+					throw new Error(`Validation error: ${item.validationException.message}`);
+				} else if (item.throttlingException) {
+					throw new Error(`Throttling error: ${item.throttlingException.message}`);
+				} else if (item.serviceUnavailableException) {
+					throw new Error(`Service unavailable: ${item.serviceUnavailableException.message}`);
+				}
+			}
+
+			if (options.signal?.aborted) {
+				throw new Error("Request was aborted");
+			}
+
+			if (output.stopReason === "error" || output.stopReason === "aborted") {
+				throw new Error("An unknown error occurred");
+			}
+
+			stream.push({ type: "done", reason: output.stopReason, message: output });
+			stream.end();
+		} catch (error) {
+			for (const block of output.content) {
+				delete (block as Block).index;
+				delete (block as Block).partialJson;
+			}
+			output.stopReason = options.signal?.aborted ? "aborted" : "error";
+			output.errorMessage = error instanceof Error ? error.message : JSON.stringify(error);
+			stream.push({ type: "error", reason: output.stopReason, error: output });
+			stream.end();
+		}
+	})();
+
+	return stream;
+};
+
+function handleContentBlockStart(
+	event: ContentBlockStartEvent,
+	blocks: Block[],
+	output: AssistantMessage,
+	stream: AssistantMessageEventStream,
+): void {
+	const index = event.contentBlockIndex!;
+	const start = event.start;
+
+	if (start?.toolUse) {
+		const block: Block = {
+			type: "toolCall",
+			id: start.toolUse.toolUseId || "",
+			name: start.toolUse.name || "",
+			arguments: {},
+			partialJson: "",
+			index,
+		};
+		output.content.push(block);
+		stream.push({ type: "toolcall_start", contentIndex: blocks.length - 1, partial: output });
+	}
+}
+
+function handleContentBlockDelta(
+	event: ContentBlockDeltaEvent,
+	blocks: Block[],
+	output: AssistantMessage,
+	stream: AssistantMessageEventStream,
+): void {
+	const contentBlockIndex = event.contentBlockIndex!;
+	const delta = event.delta;
+	let index = blocks.findIndex((b) => b.index === contentBlockIndex);
+	let block = blocks[index];
+
+	if (delta?.text !== undefined) {
+		// If no text block exists yet, create one, as `handleContentBlockStart` is not sent for text blocks
+		if (!block) {
+			const newBlock: Block = { type: "text", text: "", index: contentBlockIndex };
+			output.content.push(newBlock);
+			index = blocks.length - 1;
+			block = blocks[index];
+			stream.push({ type: "text_start", contentIndex: index, partial: output });
+		}
+		if (block.type === "text") {
+			block.text += delta.text;
+			stream.push({ type: "text_delta", contentIndex: index, delta: delta.text, partial: output });
+		}
+	} else if (delta?.toolUse && block?.type === "toolCall") {
+		block.partialJson = (block.partialJson || "") + (delta.toolUse.input || "");
+		block.arguments = parseStreamingJson(block.partialJson);
+		stream.push({ type: "toolcall_delta", contentIndex: index, delta: delta.toolUse.input || "", partial: output });
+	} else if (delta?.reasoningContent) {
+		let thinkingBlock = block;
+		let thinkingIndex = index;
+
+		if (!thinkingBlock) {
+			const newBlock: Block = { type: "thinking", thinking: "", thinkingSignature: "", index: contentBlockIndex };
+			output.content.push(newBlock);
+			thinkingIndex = blocks.length - 1;
+			thinkingBlock = blocks[thinkingIndex];
+			stream.push({ type: "thinking_start", contentIndex: thinkingIndex, partial: output });
+		}
+
+		if (thinkingBlock?.type === "thinking") {
+			if (delta.reasoningContent.text) {
+				thinkingBlock.thinking += delta.reasoningContent.text;
+				stream.push({
+					type: "thinking_delta",
+					contentIndex: thinkingIndex,
+					delta: delta.reasoningContent.text,
+					partial: output,
+				});
+			}
+			if (delta.reasoningContent.signature) {
+				thinkingBlock.thinkingSignature =
+					(thinkingBlock.thinkingSignature || "") + delta.reasoningContent.signature;
+			}
+		}
+	}
+}
+
+function handleMetadata(
+	event: ConverseStreamMetadataEvent,
+	model: Model<"bedrock-converse-stream">,
+	output: AssistantMessage,
+): void {
+	if (event.usage) {
+		output.usage.input = event.usage.inputTokens || 0;
+		output.usage.output = event.usage.outputTokens || 0;
+		output.usage.cacheRead = event.usage.cacheReadInputTokens || 0;
+		output.usage.cacheWrite = event.usage.cacheWriteInputTokens || 0;
+		output.usage.totalTokens = event.usage.totalTokens || output.usage.input + output.usage.output;
+		calculateCost(model, output.usage);
+	}
+}
+
+function handleContentBlockStop(
+	event: ContentBlockStopEvent,
+	blocks: Block[],
+	output: AssistantMessage,
+	stream: AssistantMessageEventStream,
+): void {
+	const index = blocks.findIndex((b) => b.index === event.contentBlockIndex);
+	const block = blocks[index];
+	if (!block) return;
+	delete (block as Block).index;
+
+	switch (block.type) {
+		case "text":
+			stream.push({ type: "text_end", contentIndex: index, content: block.text, partial: output });
+			break;
+		case "thinking":
+			stream.push({ type: "thinking_end", contentIndex: index, content: block.thinking, partial: output });
+			break;
+		case "toolCall":
+			block.arguments = parseStreamingJson(block.partialJson);
+			delete (block as Block).partialJson;
+			stream.push({ type: "toolcall_end", contentIndex: index, toolCall: block, partial: output });
+			break;
+	}
+}
+
+/**
+ * Check if the model supports prompt caching.
+ * Supported: Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4.x models
+ */
+function supportsPromptCaching(model: Model<"bedrock-converse-stream">): boolean {
+	const id = model.id.toLowerCase();
+	// Claude 4.x models (opus-4, sonnet-4, haiku-4)
+	if (id.includes("claude") && (id.includes("-4-") || id.includes("-4."))) return true;
+	// Claude 3.7 Sonnet
+	if (id.includes("claude-3-7-sonnet")) return true;
+	// Claude 3.5 Haiku
+	if (id.includes("claude-3-5-haiku")) return true;
+	return false;
+}
+
+function buildSystemPrompt(
+	systemPrompt: string | undefined,
+	model: Model<"bedrock-converse-stream">,
+): SystemContentBlock[] | undefined {
+	if (!systemPrompt) return undefined;
+
+	const blocks: SystemContentBlock[] = [{ text: sanitizeSurrogates(systemPrompt) }];
+
+	// Add cache point for supported Claude models
+	if (supportsPromptCaching(model)) {
+		blocks.push({ cachePoint: { type: CachePointType.DEFAULT } });
+	}
+
+	return blocks;
+}
+
+function convertMessages(context: Context, model: Model<"bedrock-converse-stream">): Message[] {
+	const result: Message[] = [];
+	const messages = context.messages;
+
+	for (let i = 0; i < messages.length; i++) {
+		const m = messages[i];
+
+		switch (m.role) {
+			case "user":
+				result.push({
+					role: ConversationRole.USER,
+					content:
+						typeof m.content === "string"
+							? [{ text: sanitizeSurrogates(m.content) }]
+							: m.content.map((c) => {
+									switch (c.type) {
+										case "text":
+											return { text: sanitizeSurrogates(c.text) };
+										case "image":
+											return { image: createImageBlock(c.mimeType, c.data) };
+										default:
+											throw new Error("Unknown user content type");
+									}
+								}),
+				});
+				break;
+			case "assistant": {
+				// Skip assistant messages with empty content (e.g., from aborted requests)
+				// Bedrock rejects messages with empty content arrays
+				if (m.content.length === 0) {
+					continue;
+				}
+				const contentBlocks: ContentBlock[] = [];
+				for (const c of m.content) {
+					switch (c.type) {
+						case "text":
+							// Skip empty text blocks
+							if (c.text.trim().length === 0) continue;
+							contentBlocks.push({ text: sanitizeSurrogates(c.text) });
+							break;
+						case "toolCall":
+							contentBlocks.push({
+								toolUse: { toolUseId: c.id, name: c.name, input: c.arguments },
+							});
+							break;
+						case "thinking":
+							// Skip empty thinking blocks
+							if (c.thinking.trim().length === 0) continue;
+							contentBlocks.push({
+								reasoningContent: {
+									reasoningText: { text: sanitizeSurrogates(c.thinking), signature: c.thinkingSignature },
+								},
+							});
+							break;
+						default:
+							throw new Error("Unknown assistant content type");
+					}
+				}
+				// Skip if all content blocks were filtered out
+				if (contentBlocks.length === 0) {
+					continue;
+				}
+				result.push({
+					role: ConversationRole.ASSISTANT,
+					content: contentBlocks,
+				});
+				break;
+			}
+			case "toolResult": {
+				// Collect all consecutive toolResult messages into a single user message
+				// Bedrock requires all tool results to be in one message
+				const toolResults: ContentBlock.ToolResultMember[] = [];
+
+				// Add current tool result with all content blocks combined
+				toolResults.push({
+					toolResult: {
+						toolUseId: m.toolCallId,
+						content: m.content.map((c) =>
+							c.type === "image"
+								? { image: createImageBlock(c.mimeType, c.data) }
+								: { text: sanitizeSurrogates(c.text) },
+						),
+						status: m.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
+					},
+				});
+
+				// Look ahead for consecutive toolResult messages
+				let j = i + 1;
+				while (j < messages.length && messages[j].role === "toolResult") {
+					const nextMsg = messages[j] as ToolResultMessage;
+					toolResults.push({
+						toolResult: {
+							toolUseId: nextMsg.toolCallId,
+							content: nextMsg.content.map((c) =>
+								c.type === "image"
+									? { image: createImageBlock(c.mimeType, c.data) }
+									: { text: sanitizeSurrogates(c.text) },
+							),
+							status: nextMsg.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
+						},
+					});
+					j++;
+				}
+
+				// Skip the messages we've already processed
+				i = j - 1;
+
+				result.push({
+					role: ConversationRole.USER,
+					content: toolResults,
+				});
+				break;
+			}
+			default:
+				throw new Error("Unknown message role");
+		}
+	}
+
+	// Add cache point to the last user message for supported Claude models
+	if (supportsPromptCaching(model) && result.length > 0) {
+		const lastMessage = result[result.length - 1];
+		if (lastMessage.role === ConversationRole.USER && lastMessage.content) {
+			(lastMessage.content as ContentBlock[]).push({ cachePoint: { type: CachePointType.DEFAULT } });
+		}
+	}
+
+	return result;
+}
+
+function convertToolConfig(
+	tools: Tool[] | undefined,
+	toolChoice: BedrockOptions["toolChoice"],
+): ToolConfiguration | undefined {
+	if (!tools?.length || toolChoice === "none") return undefined;
+
+	const bedrockTools: BedrockTool[] = tools.map((tool) => ({
+		toolSpec: {
+			name: tool.name,
+			description: tool.description,
+			inputSchema: { json: tool.parameters },
+		},
+	}));
+
+	let bedrockToolChoice: ToolChoice | undefined;
+	switch (toolChoice) {
+		case "auto":
+			bedrockToolChoice = { auto: {} };
+			break;
+		case "any":
+			bedrockToolChoice = { any: {} };
+			break;
+		default:
+			if (toolChoice?.type === "tool") {
+				bedrockToolChoice = { tool: { name: toolChoice.name } };
+			}
+	}
+
+	return { tools: bedrockTools, toolChoice: bedrockToolChoice };
+}
+
+function mapStopReason(reason: string | undefined): StopReason {
+	switch (reason) {
+		case BedrockStopReason.END_TURN:
+		case BedrockStopReason.STOP_SEQUENCE:
+			return "stop";
+		case BedrockStopReason.MAX_TOKENS:
+		case BedrockStopReason.MODEL_CONTEXT_WINDOW_EXCEEDED:
+			return "length";
+		case BedrockStopReason.TOOL_USE:
+			return "toolUse";
+		default:
+			return "error";
+	}
+}
+
+function buildAdditionalModelRequestFields(
+	model: Model<"bedrock-converse-stream">,
+	options: BedrockOptions,
+): Record<string, any> | undefined {
+	if (!options.reasoning || !model.reasoning) {
+		return undefined;
+	}
+
+	if (model.id.includes("anthropic.claude")) {
+		const defaultBudgets: Record<ThinkingLevel, number> = {
+			minimal: 1024,
+			low: 2048,
+			medium: 8192,
+			high: 16384,
+			xhigh: 16384, // Claude doesn't support xhigh, clamp to high
+		};
+
+		// Custom budgets override defaults (xhigh not in ThinkingBudgets, use high)
+		const level = options.reasoning === "xhigh" ? "high" : options.reasoning;
+		const budget = options.thinkingBudgets?.[level] ?? defaultBudgets[options.reasoning];
+
+		const result: Record<string, any> = {
+			thinking: {
+				type: "enabled",
+				budget_tokens: budget,
+			},
+		};
+
+		if (options.interleavedThinking) {
+			result.anthropic_beta = ["interleaved-thinking-2025-05-14"];
+		}
+
+		return result;
+	}
+
+	return undefined;
+}
+
+function createImageBlock(mimeType: string, data: string) {
+	let format: ImageFormat;
+	switch (mimeType) {
+		case "image/jpeg":
+		case "image/jpg":
+			format = ImageFormat.JPEG;
+			break;
+		case "image/png":
+			format = ImageFormat.PNG;
+			break;
+		case "image/gif":
+			format = ImageFormat.GIF;
+			break;
+		case "image/webp":
+			format = ImageFormat.WEBP;
+			break;
+		default:
+			throw new Error(`Unknown image type: ${mimeType}`);
+	}
+
+	const binaryString = atob(data);
+	const bytes = new Uint8Array(binaryString.length);
+	for (let i = 0; i < binaryString.length; i++) {
+		bytes[i] = binaryString.charCodeAt(i);
+	}
+
+	return { source: { bytes }, format };
+}
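The look-ahead loop in the Bedrock message conversion above folds a run of consecutive `toolResult` messages into a single user turn. The following is a minimal sketch of that grouping step only, assuming simplified message shapes: `Message`, `ToolResultBlock`, `Converted`, and `groupToolResults` are hypothetical names for illustration and are not the package's or the AWS SDK's types.

```typescript
// Hypothetical, simplified shapes (assumption: not the real package types).
type Message =
	| { role: "user" | "assistant"; text: string }
	| { role: "toolResult"; toolCallId: string; text: string; isError: boolean };

type ToolResultBlock = { toolUseId: string; text: string; status: "error" | "success" };

type Converted =
	| { role: "user" | "assistant"; text: string }
	| { role: "user"; toolResults: ToolResultBlock[] };

function groupToolResults(messages: Message[]): Converted[] {
	const result: Converted[] = [];
	for (let i = 0; i < messages.length; i++) {
		const m = messages[i];
		if (m.role !== "toolResult") {
			result.push({ role: m.role, text: m.text });
			continue;
		}

		// Collect this tool result and every tool result that directly follows it.
		const toolResults: ToolResultBlock[] = [];
		let j = i;
		while (j < messages.length) {
			const next = messages[j];
			if (next.role !== "toolResult") break;
			toolResults.push({
				toolUseId: next.toolCallId,
				text: next.text,
				status: next.isError ? "error" : "success",
			});
			j++;
		}

		// Skip the messages already consumed; the for-loop's i++ moves to index j.
		i = j - 1;
		result.push({ role: "user", toolResults });
	}
	return result;
}
```

In this sketch, an assistant message followed by two tool results and a user message converts to three entries, with both tool results carried by the second one.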
--- a/packages/ai/src/providers/anthropic.ts
+++ b/packages/ai/src/providers/anthropic.ts
@ -287,7 +287,7 @@ export const streamAnthropic: StreamFunction<"anthropic-messages"> = (
 			}

 			if (output.stopReason === "aborted" || output.stopReason === "error") {
-				throw new Error("An unkown error ocurred");
+				throw new Error("An unknown error occurred");
 			}

 			stream.push({ type: "done", reason: output.stopReason, message: output });
--- a/packages/ai/src/providers/google-gemini-cli.ts
+++ b/packages/ai/src/providers/google-gemini-cli.ts
@ -4,6 +4,7 @@
 * Uses the Cloud Code Assist API endpoint to access Gemini and Claude models.
 */

+import { createHash } from "node:crypto";
 import type { Content, ThinkingConfig } from "@google/genai";
 import { calculateCost } from "../models.js";
 import type {
@ -54,6 +55,8 @@ export interface GoogleGeminiCliOptions extends StreamOptions {
 }

 const DEFAULT_ENDPOINT = "https://cloudcode-pa.googleapis.com";
+const ANTIGRAVITY_DAILY_ENDPOINT = "https://daily-cloudcode-pa.sandbox.googleapis.com";
+const ANTIGRAVITY_ENDPOINT_FALLBACKS = [ANTIGRAVITY_DAILY_ENDPOINT, DEFAULT_ENDPOINT] as const;
 // Headers for Gemini CLI (prod endpoint)
 const GEMINI_CLI_HEADERS = {
 	"User-Agent": "google-cloud-sdk vscode_cloudshelleditor/0.1",
@ -163,16 +166,66 @@ let toolCallCounter = 0;
 // Retry configuration
 const MAX_RETRIES = 3;
 const BASE_DELAY_MS = 1000;
+const MAX_EMPTY_STREAM_RETRIES = 2;
+const EMPTY_STREAM_BASE_DELAY_MS = 500;
+const CLAUDE_THINKING_BETA_HEADER = "interleaved-thinking-2025-05-14";

 /**
 * Extract retry delay from Gemini error response (in milliseconds).
- * Parses patterns like:
+ * Checks headers first (Retry-After, x-ratelimit-reset, x-ratelimit-reset-after),
+ * then parses body patterns like:
 * - "Your quota will reset after 39s"
 * - "Your quota will reset after 18h31m10s"
 * - "Please retry in Xs" or "Please retry in Xms"
 * - "retryDelay": "34.074824224s" (JSON field)
 */
-function extractRetryDelay(errorText: string): number | undefined {
+export function extractRetryDelay(errorText: string, response?: Response | Headers): number | undefined {
+	// Add a 1s buffer to positive delays; zero or negative delays count as "no delay found"
+	const normalizeDelay = (ms: number): number | undefined => (ms > 0 ? Math.ceil(ms + 1000) : undefined);
+
+	const headers = response instanceof Headers ? response : response?.headers;
+	if (headers) {
+		const retryAfter = headers.get("retry-after");
+		if (retryAfter) {
+			const retryAfterSeconds = Number(retryAfter);
+			if (Number.isFinite(retryAfterSeconds)) {
+				const delay = normalizeDelay(retryAfterSeconds * 1000);
+				if (delay !== undefined) {
+					return delay;
+				}
+			}
+			const retryAfterDate = new Date(retryAfter);
+			const retryAfterMs = retryAfterDate.getTime();
+			if (!Number.isNaN(retryAfterMs)) {
+				const delay = normalizeDelay(retryAfterMs - Date.now());
+				if (delay !== undefined) {
+					return delay;
+				}
+			}
+		}
+
+		const rateLimitReset = headers.get("x-ratelimit-reset");
+		if (rateLimitReset) {
+			const resetSeconds = Number.parseInt(rateLimitReset, 10);
+			if (!Number.isNaN(resetSeconds)) {
+				const delay = normalizeDelay(resetSeconds * 1000 - Date.now());
+				if (delay !== undefined) {
+					return delay;
+				}
+			}
+		}
+
+		const rateLimitResetAfter = headers.get("x-ratelimit-reset-after");
+		if (rateLimitResetAfter) {
+			const resetAfterSeconds = Number(rateLimitResetAfter);
+			if (Number.isFinite(resetAfterSeconds)) {
+				const delay = normalizeDelay(resetAfterSeconds * 1000);
+				if (delay !== undefined) {
+					return delay;
+				}
+			}
+		}
+	}
+
 	// Pattern 1: "Your quota will reset after ..." (formats: "18h31m10s", "10m15s", "6s", "39s")
 	const durationMatch = errorText.match(/reset after (?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s/i);
 	if (durationMatch) {
@ -181,8 +234,9 @@ function extractRetryDelay(errorText: string): number | undefined {
 		const seconds = parseFloat(durationMatch[3]);
 		if (!Number.isNaN(seconds)) {
 			const totalMs = ((hours * 60 + minutes) * 60 + seconds) * 1000;
-			if (totalMs > 0) {
-				return Math.ceil(totalMs + 1000); // Add 1s buffer
+			const delay = normalizeDelay(totalMs);
+			if (delay !== undefined) {
+				return delay;
 			}
 		}
 	}
@ -193,7 +247,10 @@ function extractRetryDelay(errorText: string): number | undefined {
 		const value = parseFloat(retryInMatch[1]);
 		if (!Number.isNaN(value) && value > 0) {
 			const ms = retryInMatch[2].toLowerCase() === "ms" ? value : value * 1000;
-			return Math.ceil(ms + 1000);
+			const delay = normalizeDelay(ms);
+			if (delay !== undefined) {
+				return delay;
+			}
 		}
 	}

@ -203,21 +260,45 @@ function extractRetryDelay(errorText: string): number | undefined {
 		const value = parseFloat(retryDelayMatch[1]);
 		if (!Number.isNaN(value) && value > 0) {
 			const ms = retryDelayMatch[2].toLowerCase() === "ms" ? value : value * 1000;
-			return Math.ceil(ms + 1000);
+			const delay = normalizeDelay(ms);
+			if (delay !== undefined) {
+				return delay;
+			}
 		}
 	}

 	return undefined;
 }

+function isClaudeThinkingModel(modelId: string): boolean {
+	const normalized = modelId.toLowerCase();
+	return normalized.includes("claude") && normalized.includes("thinking");
+}
+
 /**
- * Check if an error is retryable (rate limit, server error, etc.)
+ * Check if an error is retryable (rate limit, server error, network error, etc.)
 */
 function isRetryableError(status: number, errorText: string): boolean {
 	if (status === 429 || status === 500 || status === 502 || status === 503 || status === 504) {
 		return true;
 	}
-	return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable/i.test(errorText);
+	return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable|other.?side.?closed/i.test(errorText);
+}
+
+/**
+ * Extract a clean, user-friendly error message from Google API error response.
+ * Parses JSON error responses and returns just the message field.
+ */
+function extractErrorMessage(errorText: string): string {
+	try {
+		const parsed = JSON.parse(errorText) as { error?: { message?: string } };
+		if (parsed.error?.message) {
+			return parsed.error.message;
+		}
+	} catch {
+		// Not JSON, return as-is
+	}
+	return errorText;
 }

 /**
@ -242,6 +323,7 @@ interface CloudCodeAssistRequest {
 	model: string;
 	request: {
 		contents: Content[];
+		sessionId?: string;
 		systemInstruction?: { role?: string; parts: { text: string }[] };
 		generationConfig?: {
 			maxOutputTokens?: number;
@ -339,17 +421,26 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 				throw new Error("Missing token or projectId in Google Cloud credentials. Use /login to re-authenticate.");
 			}

-			const endpoint = model.baseUrl || DEFAULT_ENDPOINT;
-			const url = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
+			const isAntigravity = model.provider === "google-antigravity";
+			const baseUrl = model.baseUrl?.trim();
+			const endpoints = baseUrl ? [baseUrl] : isAntigravity ? ANTIGRAVITY_ENDPOINT_FALLBACKS : [DEFAULT_ENDPOINT];

-			// Use Antigravity headers for sandbox endpoint, otherwise Gemini CLI headers
-			const isAntigravity = endpoint.includes("sandbox.googleapis.com");
 			const requestBody = buildRequest(model, context, projectId, options, isAntigravity);
 			const headers = isAntigravity ? ANTIGRAVITY_HEADERS : GEMINI_CLI_HEADERS;

+			const requestHeaders = {
+				Authorization: `Bearer ${accessToken}`,
+				"Content-Type": "application/json",
+				Accept: "text/event-stream",
+				...headers,
+				...(isClaudeThinkingModel(model.id) ? { "anthropic-beta": CLAUDE_THINKING_BETA_HEADER } : {}),
+			};
+			const requestBodyJson = JSON.stringify(requestBody);
+
 			// Fetch with retry logic for rate limits and transient errors
 			let response: Response | undefined;
 			let lastError: Error | undefined;
+			let requestUrl: string | undefined;

 			for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
 				if (options?.signal?.aborted) {
@ -357,15 +448,12 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 				}

 				try {
-					response = await fetch(url, {
+					const endpoint = endpoints[Math.min(attempt, endpoints.length - 1)];
+					requestUrl = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
+					response = await fetch(requestUrl, {
 						method: "POST",
-						headers: {
-							Authorization: `Bearer ${accessToken}`,
-							"Content-Type": "application/json",
-							Accept: "text/event-stream",
-							...headers,
-						},
-						body: JSON.stringify(requestBody),
+						headers: requestHeaders,
+						body: requestBodyJson,
 						signal: options?.signal,
 					});

@ -378,14 +466,14 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 					// Check if retryable
 					if (attempt < MAX_RETRIES && isRetryableError(response.status, errorText)) {
 						// Use server-provided delay or exponential backoff
-						const serverDelay = extractRetryDelay(errorText);
+						const serverDelay = extractRetryDelay(errorText, response);
 						const delayMs = serverDelay ?? BASE_DELAY_MS * 2 ** attempt;
 						await sleep(delayMs, options?.signal);
 						continue;
 					}

 					// Not retryable or max retries exceeded
-					throw new Error(`Cloud Code Assist API error (${response.status}): ${errorText}`);
+					throw new Error(`Cloud Code Assist API error (${response.status}): ${extractErrorMessage(errorText)}`);
 				} catch (error) {
 					// Check for abort - fetch throws AbortError, our code throws "Request was aborted"
 					if (error instanceof Error) {
@ -393,7 +481,11 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 							throw new Error("Request was aborted");
 						}
 					}
+					// Extract detailed error message from fetch errors (Node includes cause)
 					lastError = error instanceof Error ? error : new Error(String(error));
+					if (lastError.message === "fetch failed" && lastError.cause instanceof Error) {
+						lastError = new Error(`Network error: ${lastError.cause.message}`);
+					}
 					// Network errors are retryable
 					if (attempt < MAX_RETRIES) {
 						const delayMs = BASE_DELAY_MS * 2 ** attempt;
@ -408,73 +500,160 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 				throw lastError ?? new Error("Failed to get response after retries");
 			}

-			if (!response.body) {
-				throw new Error("No response body");
-			}
-
-			stream.push({ type: "start", partial: output });
-
-			let currentBlock: TextContent | ThinkingContent | null = null;
-			const blocks = output.content;
-			const blockIndex = () => blocks.length - 1;
-
-			// Read SSE stream
-			const reader = response.body.getReader();
-			const decoder = new TextDecoder();
-			let buffer = "";
-
-			// Set up abort handler to cancel reader when signal fires
-			const abortHandler = () => {
-				void reader.cancel().catch(() => {});
+			let started = false;
+			const ensureStarted = () => {
+				if (!started) {
+					stream.push({ type: "start", partial: output });
+					started = true;
+				}
 			};
-			options?.signal?.addEventListener("abort", abortHandler);

-			try {
-				while (true) {
-					// Check abort signal before each read
-					if (options?.signal?.aborted) {
-						throw new Error("Request was aborted");
-					}
+			const resetOutput = () => {
+				output.content = [];
+				output.usage = {
+					input: 0,
+					output: 0,
+					cacheRead: 0,
+					cacheWrite: 0,
+					totalTokens: 0,
+					cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
+				};
+				output.stopReason = "stop";
+				output.errorMessage = undefined;
+				output.timestamp = Date.now();
+				started = false;
+			};

-					const { done, value } = await reader.read();
-					if (done) break;
+			const streamResponse = async (activeResponse: Response): Promise<boolean> => {
+				if (!activeResponse.body) {
+					throw new Error("No response body");
+				}

-					buffer += decoder.decode(value, { stream: true });
-					const lines = buffer.split("\n");
-					buffer = lines.pop() || "";
+				let hasContent = false;
+				let currentBlock: TextContent | ThinkingContent | null = null;
+				const blocks = output.content;
+				const blockIndex = () => blocks.length - 1;

-					for (const line of lines) {
-						if (!line.startsWith("data:")) continue;
+				// Read SSE stream
+				const reader = activeResponse.body.getReader();
+				const decoder = new TextDecoder();
+				let buffer = "";

-						const jsonStr = line.slice(5).trim();
-						if (!jsonStr) continue;
+				// Set up abort handler to cancel reader when signal fires
+				const abortHandler = () => {
+					void reader.cancel().catch(() => {});
+				};
+				options?.signal?.addEventListener("abort", abortHandler);

-						let chunk: CloudCodeAssistResponseChunk;
-						try {
-							chunk = JSON.parse(jsonStr);
-						} catch {
-							continue;
+				try {
+					while (true) {
+						// Check abort signal before each read
+						if (options?.signal?.aborted) {
+							throw new Error("Request was aborted");
 						}

-						// Unwrap the response
-						const responseData = chunk.response;
-						if (!responseData) continue;
+						const { done, value } = await reader.read();
+						if (done) break;

-						const candidate = responseData.candidates?.[0];
-						if (candidate?.content?.parts) {
-							for (const part of candidate.content.parts) {
-								if (part.text !== undefined) {
-									const isThinking = isThinkingPart(part);
-									if (
-										!currentBlock ||
-										(isThinking && currentBlock.type !== "thinking") ||
-										(!isThinking && currentBlock.type !== "text")
-									) {
+						buffer += decoder.decode(value, { stream: true });
+						const lines = buffer.split("\n");
+						buffer = lines.pop() || "";
+
+						for (const line of lines) {
+							if (!line.startsWith("data:")) continue;
+
+							const jsonStr = line.slice(5).trim();
+							if (!jsonStr) continue;
+
+							let chunk: CloudCodeAssistResponseChunk;
+							try {
+								chunk = JSON.parse(jsonStr);
+							} catch {
+								continue;
+							}
+
+							// Unwrap the response
+							const responseData = chunk.response;
+							if (!responseData) continue;
+
+							const candidate = responseData.candidates?.[0];
+							if (candidate?.content?.parts) {
+								for (const part of candidate.content.parts) {
+									if (part.text !== undefined) {
+										hasContent = true;
+										const isThinking = isThinkingPart(part);
+										if (
+											!currentBlock ||
+											(isThinking && currentBlock.type !== "thinking") ||
+											(!isThinking && currentBlock.type !== "text")
+										) {
+											if (currentBlock) {
+												if (currentBlock.type === "text") {
+													stream.push({
+														type: "text_end",
+														contentIndex: blockIndex(),
+														content: currentBlock.text,
+														partial: output,
+													});
+												} else {
+													stream.push({
+														type: "thinking_end",
+														contentIndex: blockIndex(),
+														content: currentBlock.thinking,
+														partial: output,
+													});
+												}
+											}
+											if (isThinking) {
+												currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
+												output.content.push(currentBlock);
+												ensureStarted();
+												stream.push({
+													type: "thinking_start",
+													contentIndex: blockIndex(),
+													partial: output,
+												});
+											} else {
+												currentBlock = { type: "text", text: "" };
+												output.content.push(currentBlock);
+												ensureStarted();
+												stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
+											}
+										}
+										if (currentBlock.type === "thinking") {
+											currentBlock.thinking += part.text;
+											currentBlock.thinkingSignature = retainThoughtSignature(
+												currentBlock.thinkingSignature,
+												part.thoughtSignature,
+											);
+											stream.push({
+												type: "thinking_delta",
+												contentIndex: blockIndex(),
+												delta: part.text,
+												partial: output,
+											});
+										} else {
+											currentBlock.text += part.text;
+											currentBlock.textSignature = retainThoughtSignature(
+												currentBlock.textSignature,
+												part.thoughtSignature,
+											);
+											stream.push({
+												type: "text_delta",
+												contentIndex: blockIndex(),
+												delta: part.text,
+												partial: output,
+											});
+										}
+									}
+
+									if (part.functionCall) {
+										hasContent = true;
 										if (currentBlock) {
 											if (currentBlock.type === "text") {
 												stream.push({
 													type: "text_end",
-													contentIndex: blocks.length - 1,
+													contentIndex: blockIndex(),
 													content: currentBlock.text,
 													partial: output,
 												});
@ -486,143 +665,142 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 													partial: output,
 												});
 											}
+											currentBlock = null;
 										}
-										if (isThinking) {
-											currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
-											output.content.push(currentBlock);
-											stream.push({ type: "thinking_start", contentIndex: blockIndex(), partial: output });
-										} else {
-											currentBlock = { type: "text", text: "" };
-											output.content.push(currentBlock);
-											stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
-										}
-									}
-									if (currentBlock.type === "thinking") {
-										currentBlock.thinking += part.text;
-										currentBlock.thinkingSignature = retainThoughtSignature(
-											currentBlock.thinkingSignature,
-											part.thoughtSignature,
-										);
+
+										const providedId = part.functionCall.id;
+										const needsNewId =
+											!providedId ||
+											output.content.some((b) => b.type === "toolCall" && b.id === providedId);
+										const toolCallId = needsNewId
+											? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
+											: providedId;
+
+										const toolCall: ToolCall = {
+											type: "toolCall",
+											id: toolCallId,
+											name: part.functionCall.name || "",
+											arguments: part.functionCall.args as Record<string, unknown>,
+											...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
+										};
+
+										output.content.push(toolCall);
+										ensureStarted();
+										stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
 										stream.push({
-											type: "thinking_delta",
+											type: "toolcall_delta",
 											contentIndex: blockIndex(),
-											delta: part.text,
+											delta: JSON.stringify(toolCall.arguments),
 											partial: output,
 										});
-									} else {
-										currentBlock.text += part.text;
-										currentBlock.textSignature = retainThoughtSignature(
-											currentBlock.textSignature,
-											part.thoughtSignature,
-										);
 										stream.push({
-											type: "text_delta",
+											type: "toolcall_end",
 											contentIndex: blockIndex(),
-											delta: part.text,
+											toolCall,
 											partial: output,
 										});
 									}
 								}
+							}

-								if (part.functionCall) {
-									if (currentBlock) {
-										if (currentBlock.type === "text") {
-											stream.push({
-												type: "text_end",
-												contentIndex: blockIndex(),
-												content: currentBlock.text,
-												partial: output,
-											});
-										} else {
-											stream.push({
-												type: "thinking_end",
-												contentIndex: blockIndex(),
-												content: currentBlock.thinking,
-												partial: output,
-											});
-										}
-										currentBlock = null;
-									}
-
-									const providedId = part.functionCall.id;
-									const needsNewId =
-										!providedId || output.content.some((b) => b.type === "toolCall" && b.id === providedId);
-									const toolCallId = needsNewId
-										? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
-										: providedId;
-
-									const toolCall: ToolCall = {
-										type: "toolCall",
-										id: toolCallId,
-										name: part.functionCall.name || "",
-										arguments: part.functionCall.args as Record<string, unknown>,
-										...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
-									};
-
-									output.content.push(toolCall);
-									stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
-									stream.push({
-										type: "toolcall_delta",
-										contentIndex: blockIndex(),
-										delta: JSON.stringify(toolCall.arguments),
-										partial: output,
-									});
-									stream.push({ type: "toolcall_end", contentIndex: blockIndex(), toolCall, partial: output });
+							if (candidate?.finishReason) {
+								output.stopReason = mapStopReasonString(candidate.finishReason);
+								if (output.content.some((b) => b.type === "toolCall")) {
+									output.stopReason = "toolUse";
 								}
 							}
-						}

-						if (candidate?.finishReason) {
-							output.stopReason = mapStopReasonString(candidate.finishReason);
-							if (output.content.some((b) => b.type === "toolCall")) {
-								output.stopReason = "toolUse";
-							}
-						}
-
-						if (responseData.usageMetadata) {
-							// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
-							const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
-							const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
-							output.usage = {
-								input: promptTokens - cacheReadTokens,
-								output:
-									(responseData.usageMetadata.candidatesTokenCount || 0) +
-									(responseData.usageMetadata.thoughtsTokenCount || 0),
-								cacheRead: cacheReadTokens,
-								cacheWrite: 0,
-								totalTokens: responseData.usageMetadata.totalTokenCount || 0,
-								cost: {
-									input: 0,
-									output: 0,
-									cacheRead: 0,
+							if (responseData.usageMetadata) {
+								// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
+								const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
+								const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
+								output.usage = {
+									input: promptTokens - cacheReadTokens,
+									output:
+										(responseData.usageMetadata.candidatesTokenCount || 0) +
+										(responseData.usageMetadata.thoughtsTokenCount || 0),
+									cacheRead: cacheReadTokens,
 									cacheWrite: 0,
-									total: 0,
-								},
-							};
-							calculateCost(model, output.usage);
+									totalTokens: responseData.usageMetadata.totalTokenCount || 0,
+									cost: {
+										input: 0,
+										output: 0,
+										cacheRead: 0,
+										cacheWrite: 0,
+										total: 0,
+									},
+								};
+								calculateCost(model, output.usage);
+							}
 						}
 					}
+				} finally {
+					options?.signal?.removeEventListener("abort", abortHandler);
+				}
+
+				if (currentBlock) {
+					if (currentBlock.type === "text") {
+						stream.push({
+							type: "text_end",
+							contentIndex: blockIndex(),
+							content: currentBlock.text,
+							partial: output,
+						});
+					} else {
+						stream.push({
+							type: "thinking_end",
+							contentIndex: blockIndex(),
+							content: currentBlock.thinking,
+							partial: output,
+						});
+					}
+				}
+
+				return hasContent;
+			};
+
+			let receivedContent = false;
+			let currentResponse = response;
+
+			for (let emptyAttempt = 0; emptyAttempt <= MAX_EMPTY_STREAM_RETRIES; emptyAttempt++) {
+				if (options?.signal?.aborted) {
+					throw new Error("Request was aborted");
+				}
+
+				if (emptyAttempt > 0) {
+					const backoffMs = EMPTY_STREAM_BASE_DELAY_MS * 2 ** (emptyAttempt - 1);
+					await sleep(backoffMs, options?.signal);
+
+					if (!requestUrl) {
+						throw new Error("Missing request URL");
+					}
+
+					currentResponse = await fetch(requestUrl, {
+						method: "POST",
+						headers: requestHeaders,
+						body: requestBodyJson,
+						signal: options?.signal,
+					});
+
+					if (!currentResponse.ok) {
+						const retryErrorText = await currentResponse.text();
+						throw new Error(
+							`Cloud Code Assist API error (${currentResponse.status}): ${extractErrorMessage(retryErrorText)}`,
+						);
+					}
+				}
+
+				const streamed = await streamResponse(currentResponse);
+				if (streamed) {
+					receivedContent = true;
+					break;
+				}
+
+				if (emptyAttempt < MAX_EMPTY_STREAM_RETRIES) {
+					resetOutput();
 				}
-			} finally {
-				options?.signal?.removeEventListener("abort", abortHandler);
 			}

-			if (currentBlock) {
-				if (currentBlock.type === "text") {
-					stream.push({
-						type: "text_end",
-						contentIndex: blockIndex(),
-						content: currentBlock.text,
-						partial: output,
-					});
-				} else {
-					stream.push({
-						type: "thinking_end",
-						contentIndex: blockIndex(),
-						content: currentBlock.thinking,
-						partial: output,
-					});
-				}
+			if (!receivedContent) {
+				throw new Error("Cloud Code Assist API returned an empty response");
 			}

 			if (options?.signal?.aborted) {
@ -651,7 +829,34 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
 	return stream;
 };

-function buildRequest(
+function deriveSessionId(context: Context): string | undefined {
+	for (const message of context.messages) {
+		if (message.role !== "user") {
+			continue;
+		}
+
+		let text = "";
+		if (typeof message.content === "string") {
+			text = message.content;
+		} else if (Array.isArray(message.content)) {
+			text = message.content
+				.filter((item): item is TextContent => item.type === "text")
+				.map((item) => item.text)
+				.join("\n");
+		}
+
+		if (!text || text.trim().length === 0) {
+			return undefined;
+		}
+
+		const hash = createHash("sha256").update(text).digest("hex");
+		return hash.slice(0, 32);
+	}
+
+	return undefined;
+}
+
+export function buildRequest(
 	model: Model<"google-gemini-cli">,
 	context: Context,
 	projectId: string,
@ -686,6 +891,11 @@ function buildRequest(
 		contents,
 	};

+	const sessionId = deriveSessionId(context);
+	if (sessionId) {
+		request.sessionId = sessionId;
+	}
+
 	// System instruction must be object with parts, not plain string
 	if (context.systemPrompt) {
 		request.systemInstruction = {
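The header handling added to `extractRetryDelay` in this file accepts a `Retry-After` value either as a number of seconds or as an HTTP date. The sketch below isolates that one step under simplifying assumptions: `retryAfterToMs` is a hypothetical helper, not part of the package. It takes `now` as a parameter so it can be tested, and it returns as soon as the value parses as a number instead of falling through to the date branch.

```typescript
// Hypothetical helper (assumption: illustration only, not the package's API).
// Converts a Retry-After header value to a delay in milliseconds.
function retryAfterToMs(value: string, now: number = Date.now()): number | undefined {
	// Same policy as the diff: add a 1s buffer, ignore non-positive delays.
	const withBuffer = (ms: number): number | undefined => (ms > 0 ? Math.ceil(ms + 1000) : undefined);

	// Form 1: delta-seconds, e.g. "30".
	const seconds = Number(value);
	if (Number.isFinite(seconds)) {
		return withBuffer(seconds * 1000);
	}

	// Form 2: HTTP date, e.g. "Tue, 13 Jan 2026 12:00:10 GMT".
	const dateMs = new Date(value).getTime();
	if (!Number.isNaN(dateMs)) {
		return withBuffer(dateMs - now);
	}

	return undefined;
}
```

A caller would typically fall back to exponential backoff when the helper returns `undefined`, as the retry loop in this file does with `serverDelay ?? BASE_DELAY_MS * 2 ** attempt`.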
--- a/packages/ai/src/providers/openai-completions.ts
+++ b/packages/ai/src/providers/openai-completions.ts
@ -365,6 +365,7 @@ function createClient(model: Model<"openai-completions">, context: Context, apiK
 function buildParams(model: Model<"openai-completions">, context: Context, options?: OpenAICompletionsOptions) {
 	const compat = getCompat(model);
 	const messages = convertMessages(model, context, compat);
+	maybeAddOpenRouterAnthropicCacheControl(model, messages);

 	const params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming = {
 		model: model.id,
@ -403,13 +404,51 @@ function buildParams(model: Model<"openai-completions">, context: Context, optio
 		params.tool_choice = options.toolChoice;
 	}

-	if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
+	if (compat.thinkingFormat === "zai" && model.reasoning) {
+		// Z.ai uses binary thinking: { type: "enabled" | "disabled" }
+		// Must explicitly disable since z.ai defaults to thinking enabled
+		(params as any).thinking = { type: options?.reasoningEffort ? "enabled" : "disabled" };
+	} else if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
+		// OpenAI-style reasoning_effort
 		params.reasoning_effort = options.reasoningEffort;
 	}

 	return params;
 }

+function maybeAddOpenRouterAnthropicCacheControl(
+	model: Model<"openai-completions">,
+	messages: ChatCompletionMessageParam[],
+): void {
+	if (model.provider !== "openrouter" || !model.id.startsWith("anthropic/")) return;
+
+	// Anthropic-style caching requires cache_control on a text part. Add a breakpoint
+	// on the last user/assistant message (walking backwards until we find text content).
+	for (let i = messages.length - 1; i >= 0; i--) {
+		const msg = messages[i];
+		if (msg.role !== "user" && msg.role !== "assistant") continue;
+
+		const content = msg.content;
+		if (typeof content === "string") {
+			msg.content = [
+				Object.assign({ type: "text" as const, text: content }, { cache_control: { type: "ephemeral" } }),
+			];
+			return;
+		}
+
+		if (!Array.isArray(content)) continue;
+
+		// Find last text part and add cache_control
+		for (let j = content.length - 1; j >= 0; j--) {
+			const part = content[j];
+			if (part?.type === "text") {
+				Object.assign(part, { cache_control: { type: "ephemeral" } });
+				return;
+			}
+		}
+	}
+}
+
 function convertMessages(
 	model: Model<"openai-completions">,
 	context: Context,
@ -644,11 +683,14 @@ function mapStopReason(reason: ChatCompletionChunk.Choice["finish_reason"]): Sto
 * Returns a fully resolved OpenAICompat object with all fields set.
 */
 function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
+	const isZai = baseUrl.includes("api.z.ai");
+
 	const isNonStandard =
 		baseUrl.includes("cerebras.ai") ||
 		baseUrl.includes("api.x.ai") ||
 		baseUrl.includes("mistral.ai") ||
-		baseUrl.includes("chutes.ai");
+		baseUrl.includes("chutes.ai") ||
+		isZai;

 	const useMaxTokens = baseUrl.includes("mistral.ai") || baseUrl.includes("chutes.ai");

@ -659,13 +701,14 @@ function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
 	return {
 		supportsStore: !isNonStandard,
 		supportsDeveloperRole: !isNonStandard,
-		supportsReasoningEffort: !isGrok,
+		supportsReasoningEffort: !isGrok && !isZai,
 		supportsUsageInStreaming: true,
 		maxTokensField: useMaxTokens ? "max_tokens" : "max_completion_tokens",
 		requiresToolResultName: isMistral,
 		requiresAssistantAfterToolResult: false, // Mistral no longer requires this as of Dec 2024
 		requiresThinkingAsText: isMistral,
 		requiresMistralToolIds: isMistral,
+		thinkingFormat: isZai ? "zai" : "openai",
 	};
 }

@ -688,5 +731,6 @@ function getCompat(model: Model<"openai-completions">): Required<OpenAICompat> {
 			model.compat.requiresAssistantAfterToolResult ?? detected.requiresAssistantAfterToolResult,
 		requiresThinkingAsText: model.compat.requiresThinkingAsText ?? detected.requiresThinkingAsText,
 		requiresMistralToolIds: model.compat.requiresMistralToolIds ?? detected.requiresMistralToolIds,
+		thinkingFormat: model.compat.thinkingFormat ?? detected.thinkingFormat,
 	};
 }
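
Reviewer sketch (not part of the patch): the z.ai branch added to the openai-completions provider above can be read as a small decision table. The snippet below is a minimal, self-contained approximation. `ThinkingCompat`, `detectThinkingCompat`, and `reasoningParams` are hypothetical names introduced here for illustration, and the Grok URL check is an assumption because the definition of `isGrok` lies outside the visible hunks.

```typescript
// Hypothetical, trimmed-down shape; the real OpenAICompat has more fields.
type ThinkingFormat = "openai" | "zai";

interface ThinkingCompat {
	thinkingFormat: ThinkingFormat;
	supportsReasoningEffort: boolean;
}

// Mirrors the URL sniffing shown in the diff. The Grok check is an assumption.
function detectThinkingCompat(baseUrl: string): ThinkingCompat {
	const isZai = baseUrl.includes("api.z.ai");
	const isGrok = baseUrl.includes("api.x.ai");
	return {
		thinkingFormat: isZai ? "zai" : "openai",
		supportsReasoningEffort: !isGrok && !isZai,
	};
}

// Returns only the reasoning-related request fields.
function reasoningParams(
	compat: ThinkingCompat,
	modelSupportsReasoning: boolean,
	reasoningEffort?: string,
): Record<string, unknown> {
	if (compat.thinkingFormat === "zai" && modelSupportsReasoning) {
		// Binary switch: always sent, so thinking can be turned off explicitly.
		return { thinking: { type: reasoningEffort ? "enabled" : "disabled" } };
	}
	if (reasoningEffort && modelSupportsReasoning && compat.supportsReasoningEffort) {
		// OpenAI-style graded effort.
		return { reasoning_effort: reasoningEffort };
	}
	return {};
}
```

Under these assumptions, a z.ai reasoning model always receives a `thinking` field, while other endpoints receive `reasoning_effort` only when an effort level is requested and supported.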
--- a/packages/ai/src/providers/openai-responses.ts
+++ b/packages/ai/src/providers/openai-responses.ts
@ -24,6 +24,7 @@ import type {
 	ThinkingContent,
 	Tool,
 	ToolCall,
+	Usage,
 } from "../types.js";
 import { AssistantMessageEventStream } from "../utils/event-stream.js";
 import { parseStreamingJson } from "../utils/json-parse.js";
@ -48,6 +49,7 @@ function shortHash(str: string): string {
 export interface OpenAIResponsesOptions extends StreamOptions {
 	reasoningEffort?: "minimal" | "low" | "medium" | "high" | "xhigh";
 	reasoningSummary?: "auto" | "detailed" | "concise" | null;
+	serviceTier?: ResponseCreateParamsStreaming["service_tier"];
 }

 /**
@ -85,7 +87,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
 			const apiKey = options?.apiKey || getEnvApiKey(model.provider) || "";
 			const client = createClient(model, context, apiKey);
 			const params = buildParams(model, context, options);
-			const openaiStream = await client.responses.create(params, { signal: options?.signal });
+			const openaiStream = await client.responses.create(params, { signal: options?.signal, timeout: undefined });
 			stream.push({ type: "start", partial: output });

 			let currentItem: ResponseReasoningItem | ResponseOutputMessage | ResponseFunctionToolCall | null = null;
@ -276,6 +278,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
 						};
 					}
 					calculateCost(model, output.usage);
+					applyServiceTierPricing(output.usage, response?.service_tier ?? options?.serviceTier);
 					// Map status to stop reason
 					output.stopReason = mapStopReason(response?.status);
 					if (output.content.some((b) => b.type === "toolCall") && output.stopReason === "stop") {
@ -363,6 +366,7 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
 		model: model.id,
 		input: messages,
 		stream: true,
+		prompt_cache_key: options?.sessionId,
 	};

 	if (options?.maxTokens) {
@ -373,6 +377,10 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
 		params.temperature = options?.temperature;
 	}

+	if (options?.serviceTier !== undefined) {
+		params.service_tier = options.serviceTier;
+	}
+
 	if (context.tools) {
 		params.tools = convertTools(context.tools);
 	}
@ -547,6 +555,28 @@ function convertTools(tools: Tool[]): OpenAITool[] {
 	}));
 }

+function getServiceTierCostMultiplier(serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined): number {
+	switch (serviceTier) {
+		case "flex":
+			return 0.5;
+		case "priority":
+			return 2;
+		default:
+			return 1;
+	}
+}
+
+function applyServiceTierPricing(usage: Usage, serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined) {
+	const multiplier = getServiceTierCostMultiplier(serviceTier);
+	if (multiplier === 1) return;
+
+	usage.cost.input *= multiplier;
+	usage.cost.output *= multiplier;
+	usage.cost.cacheRead *= multiplier;
+	usage.cost.cacheWrite *= multiplier;
+	usage.cost.total = usage.cost.input + usage.cost.output + usage.cost.cacheRead + usage.cost.cacheWrite;
+}
+
 function mapStopReason(status: OpenAI.Responses.ResponseStatus | undefined): StopReason {
 	if (!status) return "stop";
 	switch (status) {
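
Reviewer sketch (not part of the patch): the service-tier cost adjustment introduced in the openai-responses.ts hunks above could be expressed as a pure function. This is a minimal illustration that assumes the same multipliers as the diff (0.5 for `flex`, 2 for `priority`). `CostBreakdown`, `tierMultiplier`, and `withServiceTier` are hypothetical names, and unlike the patch this variant returns a new object rather than mutating `usage.cost`.

```typescript
// Hypothetical cost shape, modelled on the fields the diff touches.
interface CostBreakdown {
	input: number;
	output: number;
	cacheRead: number;
	cacheWrite: number;
	total: number;
}

// Multipliers copied from the diff; any other tier leaves the cost unchanged.
function tierMultiplier(serviceTier: string | null | undefined): number {
	switch (serviceTier) {
		case "flex":
			return 0.5;
		case "priority":
			return 2;
		default:
			return 1;
	}
}

// Scales every component and recomputes the total from the scaled parts.
function withServiceTier(cost: CostBreakdown, serviceTier: string | null | undefined): CostBreakdown {
	const m = tierMultiplier(serviceTier);
	if (m === 1) return { ...cost };
	const input = cost.input * m;
	const output = cost.output * m;
	const cacheRead = cost.cacheRead * m;
	const cacheWrite = cost.cacheWrite * m;
	return { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite };
}
```

Returning a copy keeps the base cost available for comparison; the patch mutates in place, which avoids an allocation and matches how `calculateCost` is used just before it.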
--- a/packages/ai/src/providers/transform-messages.ts
+++ b/packages/ai/src/providers/transform-messages.ts
@ -1,11 +1,11 @@
 import type { Api, AssistantMessage, Message, Model, ToolCall, ToolResultMessage } from "../types.js";

 /**
- * Normalize tool call ID for GitHub Copilot cross-API compatibility.
+ * Normalize tool call ID for cross-provider compatibility.
 * OpenAI Responses API generates IDs that are 450+ chars with special characters like `|`.
- * Other APIs (Claude, etc.) require max 40 chars and only alphanumeric + underscore + hyphen.
+ * Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars); IDs are truncated to 40 chars here.
 */
-function normalizeCopilotToolCallId(id: string): string {
+function normalizeToolCallId(id: string): string {
 	return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, 40);
 }

@ -38,11 +38,17 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
 				return msg;
 			}

-			// Check if we need to normalize tool call IDs (github-copilot cross-API)
-			const needsToolCallIdNormalization =
+			// Check if we need to normalize tool call IDs
+			// Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars)
+			// OpenAI Responses API generates IDs with `|` and 450+ chars
+			// GitHub Copilot routes to Anthropic for Claude models
+			const targetRequiresStrictIds = model.api === "anthropic-messages" || model.provider === "github-copilot";
+			const crossProviderSwitch = assistantMsg.provider !== model.provider;
+			const copilotCrossApiSwitch =
 				assistantMsg.provider === "github-copilot" &&
 				model.provider === "github-copilot" &&
 				assistantMsg.api !== model.api;
+			const needsToolCallIdNormalization = targetRequiresStrictIds && (crossProviderSwitch || copilotCrossApiSwitch);

 			// Transform message from different provider/model
 			const transformedContent = assistantMsg.content.flatMap((block) => {
@ -54,10 +60,10 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
 						text: block.thinking,
 					};
 				}
-				// Normalize tool call IDs for github-copilot cross-API switches
+				// Normalize tool call IDs when target API requires strict format
 				if (block.type === "toolCall" && needsToolCallIdNormalization) {
 					const toolCall = block as ToolCall;
-					const normalizedId = normalizeCopilotToolCallId(toolCall.id);
+					const normalizedId = normalizeToolCallId(toolCall.id);
 					if (normalizedId !== toolCall.id) {
 						toolCallIdMap.set(toolCall.id, normalizedId);
 						return { ...toolCall, id: normalizedId };
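
Reviewer sketch (not part of the patch): the normalization rule from the transform-messages.ts hunks above, restated as two standalone functions. This is a minimal sketch; `Origin`, `sanitizeToolCallId`, and `needsIdNormalization` are hypothetical names, and the 40-character cut-off is copied from the patch rather than from any provider documentation.

```typescript
// Hypothetical pair describing where a message came from or is going to.
interface Origin {
	provider: string;
	api: string;
}

// Strips characters outside [a-zA-Z0-9_-] and truncates, as in the patch.
function sanitizeToolCallId(id: string, maxLength = 40): string {
	return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, maxLength);
}

// True when the target needs strict IDs and the message crossed a provider
// boundary (or an API boundary within GitHub Copilot).
function needsIdNormalization(source: Origin, target: Origin): boolean {
	const targetRequiresStrictIds = target.api === "anthropic-messages" || target.provider === "github-copilot";
	const crossProviderSwitch = source.provider !== target.provider;
	const copilotCrossApiSwitch =
		source.provider === "github-copilot" && target.provider === "github-copilot" && source.api !== target.api;
	return targetRequiresStrictIds && (crossProviderSwitch || copilotCrossApiSwitch);
}
```

Note that truncation can in principle map two long IDs to the same prefix; the patch keeps a `toolCallIdMap` so that matching tool results can be rewritten consistently.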
--- a/packages/ai/src/stream.ts
+++ b/packages/ai/src/stream.ts
@ -2,6 +2,7 @@ import { existsSync } from "node:fs";
 import { homedir } from "node:os";
 import { join } from "node:path";
 import { supportsXhigh } from "./models.js";
+import { type BedrockOptions, streamBedrock } from "./providers/amazon-bedrock.js";
 import { type AnthropicOptions, streamAnthropic } from "./providers/anthropic.js";
 import { type GoogleOptions, streamGoogle } from "./providers/google.js";
 import {
@ -74,6 +75,20 @@ export function getEnvApiKey(provider: any): string | undefined {
 		}
 	}

+	if (provider === "amazon-bedrock") {
+		// Amazon Bedrock supports multiple credential sources:
+		// 1. AWS_PROFILE - named profile from ~/.aws/credentials
+		// 2. AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY - standard IAM keys
+		// 3. AWS_BEARER_TOKEN_BEDROCK - Bedrock API keys (bearer token)
+		if (
+			process.env.AWS_PROFILE ||
+			(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
+			process.env.AWS_BEARER_TOKEN_BEDROCK
+		) {
+			return "<authenticated>";
+		}
+	}
+
 	const envMap: Record<string, string> = {
 		openai: "OPENAI_API_KEY",
 		google: "GEMINI_API_KEY",
@ -81,8 +96,10 @@ export function getEnvApiKey(provider: any): string | undefined {
 		cerebras: "CEREBRAS_API_KEY",
 		xai: "XAI_API_KEY",
 		openrouter: "OPENROUTER_API_KEY",
+		"vercel-ai-gateway": "AI_GATEWAY_API_KEY",
 		zai: "ZAI_API_KEY",
 		mistral: "MISTRAL_API_KEY",
+		minimax: "MINIMAX_API_KEY",
 		opencode: "OPENCODE_API_KEY",
 	};

@ -98,6 +115,9 @@ export function stream<TApi extends Api>(
 	// Vertex AI uses Application Default Credentials, not API keys
 	if (model.api === "google-vertex") {
 		return streamGoogleVertex(model as Model<"google-vertex">, context, options as GoogleVertexOptions);
+	} else if (model.api === "bedrock-converse-stream") {
+		// Bedrock needs no API key here; instead, it sources credentials from standard AWS environment variables or from a given AWS profile.
+		return streamBedrock(model as Model<"bedrock-converse-stream">, context, (options || {}) as BedrockOptions);
 	}

 	const apiKey = options?.apiKey || getEnvApiKey(model.provider);
@ -156,6 +176,10 @@ export function streamSimple<TApi extends Api>(
 	if (model.api === "google-vertex") {
 		const providerOptions = mapOptionsForApi(model, options, undefined);
 		return stream(model, context, providerOptions);
+	} else if (model.api === "bedrock-converse-stream") {
+		// Bedrock needs no API key here; instead, it sources credentials from standard AWS environment variables or from a given AWS profile.
+		const providerOptions = mapOptionsForApi(model, options, undefined);
+		return stream(model, context, providerOptions);
 	}

 	const apiKey = options?.apiKey || getEnvApiKey(model.provider);
@ -228,6 +252,13 @@ function mapOptionsForApi<TApi extends Api>(
 			} satisfies AnthropicOptions;
 		}

+		case "bedrock-converse-stream":
+			return {
+				...base,
+				reasoning: options?.reasoning,
+				thinkingBudgets: options?.thinkingBudgets,
+			} satisfies BedrockOptions;
+
 		case "openai-completions":
 			return {
 				...base,
--- a/packages/ai/src/types.ts
+++ b/packages/ai/src/types.ts
@ -1,3 +1,4 @@
+import type { BedrockOptions } from "./providers/amazon-bedrock.js";
 import type { AnthropicOptions } from "./providers/anthropic.js";
 import type { GoogleOptions } from "./providers/google.js";
 import type { GoogleGeminiCliOptions } from "./providers/google-gemini-cli.js";
@ -14,12 +15,14 @@ export type Api =
 	| "openai-responses"
 	| "openai-codex-responses"
 	| "anthropic-messages"
+	| "bedrock-converse-stream"
 	| "google-generative-ai"
 	| "google-gemini-cli"
 	| "google-vertex";

 export interface ApiOptionsMap {
 	"anthropic-messages": AnthropicOptions;
+	"bedrock-converse-stream": BedrockOptions;
 	"openai-completions": OpenAICompletionsOptions;
 	"openai-responses": OpenAIResponsesOptions;
 	"openai-codex-responses": OpenAICodexResponsesOptions;
@ -40,6 +43,7 @@ const _exhaustive: _CheckExhaustive = true;
 export type OptionsForApi<TApi extends Api> = ApiOptionsMap[TApi];

 export type KnownProvider =
+	| "amazon-bedrock"
 	| "anthropic"
 	| "google"
 	| "google-gemini-cli"
@ -52,8 +56,10 @@ export type KnownProvider =
 	| "groq"
 	| "cerebras"
 	| "openrouter"
+	| "vercel-ai-gateway"
 	| "zai"
 	| "mistral"
+	| "minimax"
 	| "opencode";
 export type Provider = KnownProvider | string;

@ -219,6 +225,8 @@ export interface OpenAICompat {
 	requiresThinkingAsText?: boolean;
 	/** Whether tool call IDs must be normalized to Mistral format (exactly 9 alphanumeric chars). Default: auto-detected from URL. */
 	requiresMistralToolIds?: boolean;
+	/** Format for reasoning/thinking parameter. "openai" uses reasoning_effort, "zai" uses thinking: { type: "enabled" }. Default: "openai". */
+	thinkingFormat?: "openai" | "zai";
 }

 // Model interface for the unified model system
--- a/packages/ai/src/utils/overflow.ts
+++ b/packages/ai/src/utils/overflow.ts
@ -17,6 +17,7 @@ import type { AssistantMessage } from "../types.js";
 * - llama.cpp: "the request exceeds the available context size, try increasing it"
 * - LM Studio: "tokens to keep from the initial prompt is greater than the context length"
 * - GitHub Copilot: "prompt token count of X exceeds the limit of Y"
+ * - MiniMax: "invalid params, context window exceeds limit"
 * - Cerebras: Returns "400 status code (no body)" - handled separately below
 * - Mistral: Returns "400 status code (no body)" - handled separately below
 * - z.ai: Does NOT error, accepts overflow silently - handled via usage.input > contextWindow
@ -24,6 +25,7 @@ import type { AssistantMessage } from "../types.js";
 */
 const OVERFLOW_PATTERNS = [
 	/prompt is too long/i, // Anthropic
+	/input is too long for requested model/i, // Amazon Bedrock
 	/exceeds the context window/i, // OpenAI (Completions & Responses API)
 	/input token count.*exceeds the maximum/i, // Google (Gemini)
 	/maximum prompt length is \d+/i, // xAI (Grok)
@ -32,6 +34,7 @@ const OVERFLOW_PATTERNS = [
 	/exceeds the limit of \d+/i, // GitHub Copilot
 	/exceeds the available context size/i, // llama.cpp server
 	/greater than the context length/i, // LM Studio
+	/context window exceeds limit/i, // MiniMax
 	/context[_ ]length[_ ]exceeded/i, // Generic fallback
 	/too many tokens/i, // Generic fallback
 	/token limit exceeded/i, // Generic fallback
--- a/packages/ai/test/abort.test.ts
+++ b/packages/ai/test/abort.test.ts
@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete, stream } from "../src/stream.js";
 import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -66,6 +67,35 @@ async function testImmediateAbort<TApi extends Api>(llm: Model<TApi>, options: O
 	expect(response.stopReason).toBe("aborted");
 }

+async function testAbortThenNewMessage<TApi extends Api>(llm: Model<TApi>, options: OptionsForApi<TApi> = {}) {
+	// First request: abort immediately before any response content arrives
+	const controller = new AbortController();
+	controller.abort();
+
+	const context: Context = {
+		messages: [{ role: "user", content: "Hello, how are you?", timestamp: Date.now() }],
+	};
+
+	const abortedResponse = await complete(llm, context, { ...options, signal: controller.signal });
+	expect(abortedResponse.stopReason).toBe("aborted");
+	// The aborted message has empty content since we aborted before anything arrived
+	expect(abortedResponse.content.length).toBe(0);
+
+	// Add the aborted assistant message to context (this is what happens in the real coding agent)
+	context.messages.push(abortedResponse);
+
+	// Second request: send a new message - this should work even with the aborted message in context
+	context.messages.push({
+		role: "user",
+		content: "What is 2 + 2?",
+		timestamp: Date.now(),
+	});
+
+	const followUp = await complete(llm, context, options);
+	expect(followUp.stopReason).toBe("stop");
+	expect(followUp.content.length).toBeGreaterThan(0);
+}
+
 describe("AI Providers Abort Tests", () => {
 	describe.skipIf(!process.env.GEMINI_API_KEY)("Google Provider Abort", () => {
 		const llm = getModel("google", "gemini-2.5-flash");
@ -130,6 +160,30 @@ describe("AI Providers Abort Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Abort", () => {
+		const llm = getModel("minimax", "MiniMax-M2.1");
+
+		it("should abort mid-stream", { retry: 3 }, async () => {
+			await testAbortSignal(llm);
+		});
+
+		it("should handle immediate abort", { retry: 3 }, async () => {
+			await testImmediateAbort(llm);
+		});
+	});
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Abort", () => {
+		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should abort mid-stream", { retry: 3 }, async () => {
+			await testAbortSignal(llm);
+		});
+
+		it("should handle immediate abort", { retry: 3 }, async () => {
+			await testImmediateAbort(llm);
+		});
+	});
+
 	// Google Gemini CLI / Antigravity share the same provider, so one test covers both
 	describe("Google Gemini CLI Provider Abort", () => {
 		it.skipIf(!geminiCliToken)("should abort mid-stream", { retry: 3 }, async () => {
@ -154,4 +208,20 @@ describe("AI Providers Abort Tests", () => {
 			await testImmediateAbort(llm, { apiKey: openaiCodexToken });
 		});
 	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Abort", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should abort mid-stream", { retry: 3 }, async () => {
+			await testAbortSignal(llm, { reasoning: "medium" });
+		});
+
+		it("should handle immediate abort", { retry: 3 }, async () => {
+			await testImmediateAbort(llm);
+		});
+
+		it("should handle abort then new message", { retry: 3 }, async () => {
+			await testAbortThenNewMessage(llm);
+		});
+	});
 });
--- a/packages/ai/test/bedrock-models.test.ts
+++ b/packages/ai/test/bedrock-models.test.ts
@ -0,0 +1,66 @@
+/**
+ * A test suite to ensure all configured Amazon Bedrock models are usable.
+ *
+ * It verifies that the model identifiers obtained from models.dev and other sources are correct.
+ * Because Amazon Bedrock requires cross-region inference for some models,
+ * plain model identifiers are not always usable and must be adjusted to use cross-region inference.
+ * See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system for more details.
+ *
+ * This test suite is disabled unless AWS credentials and the `BEDROCK_EXTENSIVE_MODEL_TEST` environment variable are set.
+ * It takes ~2 minutes to run. Because not all models are available in all regions,
+ * the `us-west-2` region is recommended for the best coverage when running this test suite.
+ *
+ * You can run this test suite with:
+ * ```bash
+ * $ AWS_REGION=us-west-2 BEDROCK_EXTENSIVE_MODEL_TEST=1 AWS_PROFILE=... npm test -- ./test/bedrock-models.test.ts
+ * ```
+ */
+
+import { describe, expect, it } from "vitest";
+import { getModels } from "../src/models.js";
+import { complete } from "../src/stream.js";
+import type { Context } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
+
+describe("Amazon Bedrock Models", () => {
+	const models = getModels("amazon-bedrock");
+
+	it("should get all available Bedrock models", () => {
+		expect(models.length).toBeGreaterThan(0);
+		console.log(`Found ${models.length} Bedrock models`);
+	});
+
+	if (hasBedrockCredentials() && process.env.BEDROCK_EXTENSIVE_MODEL_TEST) {
+		for (const model of models) {
+			it(`should make a simple request with ${model.id}`, { timeout: 10_000 }, async () => {
+				const context: Context = {
+					systemPrompt: "You are a helpful assistant. Be extremely concise.",
+					messages: [
+						{
+							role: "user",
+							content: "Reply with exactly: 'OK'",
+							timestamp: Date.now(),
+						},
+					],
+				};
+
+				const response = await complete(model, context);
+
+				expect(response.role).toBe("assistant");
+				expect(response.content).toBeTruthy();
+				expect(response.content.length).toBeGreaterThan(0);
+				expect(response.usage.input + response.usage.cacheRead).toBeGreaterThan(0);
+				expect(response.usage.output).toBeGreaterThan(0);
+				expect(response.errorMessage).toBeFalsy();
+
+				const textContent = response.content
+					.filter((b) => b.type === "text")
+					.map((b) => (b.type === "text" ? b.text : ""))
+					.join("")
+					.trim();
+				expect(textContent).toBeTruthy();
+				console.log(`${model.id}: ${textContent.substring(0, 100)}`);
+			});
+		}
+	}
+});
--- a/packages/ai/test/bedrock-utils.ts
+++ b/packages/ai/test/bedrock-utils.ts
@ -0,0 +1,18 @@
+/**
+ * Utility functions for Amazon Bedrock tests
+ */
+
+/**
+ * Check if any valid AWS credentials are configured for Bedrock.
+ * Returns true if any of the following are set:
+ * - AWS_PROFILE (named profile from ~/.aws/credentials)
+ * - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (IAM keys)
+ * - AWS_BEARER_TOKEN_BEDROCK (Bedrock API key)
+ */
+export function hasBedrockCredentials(): boolean {
+	return !!(
+		process.env.AWS_PROFILE ||
+		(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
+		process.env.AWS_BEARER_TOKEN_BEDROCK
+	);
+}
--- a/packages/ai/test/context-overflow.test.ts
+++ b/packages/ai/test/context-overflow.test.ts
@ -18,6 +18,7 @@ import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { AssistantMessage, Context, Model, Usage } from "../src/types.js";
 import { isContextOverflow } from "../src/utils/overflow.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -284,6 +285,22 @@ describe("Context overflow error handling", () => {
 		);
 	});

+	// =============================================================================
+	// Amazon Bedrock
+	// Expected pattern: "Input is too long for requested model"
+	// =============================================================================
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
+		it("claude-sonnet-4-5 - should detect overflow via isContextOverflow", async () => {
+			const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+			const result = await testContextOverflow(model, "");
+			logResult(result);
+
+			expect(result.stopReason).toBe("error");
+			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
+		}, 120000);
+	});
+
 	// =============================================================================
 	// xAI
 	// Expected pattern: "maximum prompt length is X but the request contains Y"
@ -379,6 +396,37 @@ describe("Context overflow error handling", () => {
 		}, 120000);
 	});

+	// =============================================================================
+	// MiniMax
+	// Expected pattern: "invalid params, context window exceeds limit"
+	// =============================================================================
+
+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
+		it("MiniMax-M2.1 - should detect overflow via isContextOverflow", async () => {
+			const model = getModel("minimax", "MiniMax-M2.1");
+			const result = await testContextOverflow(model, process.env.MINIMAX_API_KEY!);
+			logResult(result);
+
+			expect(result.stopReason).toBe("error");
+			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
+		}, 120000);
+	});
+
+	// =============================================================================
+	// Vercel AI Gateway - Unified API for multiple providers
+	// =============================================================================
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
+		it("google/gemini-2.5-flash via AI Gateway - should detect overflow via isContextOverflow", async () => {
+			const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+			const result = await testContextOverflow(model, process.env.AI_GATEWAY_API_KEY!);
+			logResult(result);
+
+			expect(result.stopReason).toBe("error");
+			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
+		}, 120000);
+	});
+
 	// =============================================================================
 	// OpenRouter - Multiple backend providers
 	// Expected pattern: "maximum context length is X tokens"
--- a/packages/ai/test/empty.test.ts
+++ b/packages/ai/test/empty.test.ts
@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { Api, AssistantMessage, Context, Model, OptionsForApi, UserMessage } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -321,6 +322,66 @@ describe("AI Providers Empty Message Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Empty Messages", () => {
+		const llm = getModel("minimax", "MiniMax-M2.1");
+
+		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyMessage(llm);
+		});
+
+		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyStringMessage(llm);
+		});
+
+		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
+			await testWhitespaceOnlyMessage(llm);
+		});
+
+		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyAssistantMessage(llm);
+		});
+	});
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Empty Messages", () => {
+		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyMessage(llm);
+		});
+
+		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyStringMessage(llm);
+		});
+
+		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
+			await testWhitespaceOnlyMessage(llm);
+		});
+
+		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyAssistantMessage(llm);
+		});
+	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Empty Messages", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyMessage(llm);
+		});
+
+		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyStringMessage(llm);
+		});
+
+		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
+			await testWhitespaceOnlyMessage(llm);
+		});
+
+		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
+			await testEmptyAssistantMessage(llm);
+		});
+	});
+
 	// =========================================================================
 	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
 	// =========================================================================
--- a/packages/ai/test/google-gemini-cli-claude-thinking-header.test.ts
+++ b/packages/ai/test/google-gemini-cli-claude-thinking-header.test.ts
@ -0,0 +1,103 @@
+import { afterEach, describe, expect, it, vi } from "vitest";
+import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
+import type { Context, Model } from "../src/types.js";
+
+const originalFetch = global.fetch;
+const apiKey = JSON.stringify({ token: "token", projectId: "project" });
+
+const createSseResponse = () => {
+	const sse = `${[
+		`data: ${JSON.stringify({
+			response: {
+				candidates: [
+					{
+						content: { role: "model", parts: [{ text: "Hello" }] },
+						finishReason: "STOP",
+					},
+				],
+			},
+		})}`,
+	].join("\n\n")}\n\n`;
+
+	const encoder = new TextEncoder();
+	const stream = new ReadableStream<Uint8Array>({
+		start(controller) {
+			controller.enqueue(encoder.encode(sse));
+			controller.close();
+		},
+	});
+
+	return new Response(stream, {
+		status: 200,
+		headers: { "content-type": "text/event-stream" },
+	});
+};
+
+afterEach(() => {
+	global.fetch = originalFetch;
+	vi.restoreAllMocks();
+});
+
+describe("google-gemini-cli Claude thinking header", () => {
+	const context: Context = {
+		messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
+	};
+
+	it("adds anthropic-beta for Claude thinking models", async () => {
+		const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
+			const headers = new Headers(init?.headers);
+			expect(headers.get("anthropic-beta")).toBe("interleaved-thinking-2025-05-14");
+			return createSseResponse();
+		});
+
+		global.fetch = fetchMock as typeof fetch;
+
+		const model: Model<"google-gemini-cli"> = {
+			id: "claude-opus-4-5-thinking",
+			name: "Claude Opus 4.5 Thinking",
+			api: "google-gemini-cli",
+			provider: "google-antigravity",
+			baseUrl: "https://cloudcode-pa.googleapis.com",
+			reasoning: true,
+			input: ["text"],
+			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+			contextWindow: 128000,
+			maxTokens: 8192,
+		};
+
+		const stream = streamGoogleGeminiCli(model, context, { apiKey });
+		for await (const _event of stream) {
+			// exhaust stream
+		}
+		await stream.result();
+	});
+
+	it("does not add anthropic-beta for Gemini models", async () => {
+		const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
+			const headers = new Headers(init?.headers);
+			expect(headers.has("anthropic-beta")).toBe(false);
+			return createSseResponse();
+		});
+
+		global.fetch = fetchMock as typeof fetch;
+
+		const model: Model<"google-gemini-cli"> = {
+			id: "gemini-2.5-flash",
+			name: "Gemini 2.5 Flash",
+			api: "google-gemini-cli",
+			provider: "google-gemini-cli",
+			baseUrl: "https://cloudcode-pa.googleapis.com",
+			reasoning: false,
+			input: ["text"],
+			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+			contextWindow: 128000,
+			maxTokens: 8192,
+		};
+
+		const stream = streamGoogleGeminiCli(model, context, { apiKey });
+		for await (const _event of stream) {
+			// exhaust stream
+		}
+		await stream.result();
+	});
+});
--- a/packages/ai/test/google-gemini-cli-empty-stream.test.ts
+++ b/packages/ai/test/google-gemini-cli-empty-stream.test.ts
@ -0,0 +1,108 @@
+import { afterEach, describe, expect, it, vi } from "vitest";
+import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
+import type { Context, Model } from "../src/types.js";
+
+const originalFetch = global.fetch;
+
+afterEach(() => {
+	global.fetch = originalFetch;
+	vi.restoreAllMocks();
+});
+
+describe("google-gemini-cli empty stream retry", () => {
+	it("retries empty SSE responses without duplicate start", async () => {
+		const emptyStream = new ReadableStream<Uint8Array>({
+			start(controller) {
+				controller.close();
+			},
+		});
+
+		const sse = `${[
+			`data: ${JSON.stringify({
+				response: {
+					candidates: [
+						{
+							content: { role: "model", parts: [{ text: "Hello" }] },
+							finishReason: "STOP",
+						},
+					],
+					usageMetadata: {
+						promptTokenCount: 1,
+						candidatesTokenCount: 1,
+						totalTokenCount: 2,
+					},
+				},
+			})}`,
+		].join("\n\n")}\n\n`;
+
+		const encoder = new TextEncoder();
+		const dataStream = new ReadableStream<Uint8Array>({
+			start(controller) {
+				controller.enqueue(encoder.encode(sse));
+				controller.close();
+			},
+		});
+
+		let callCount = 0;
+		const fetchMock = vi.fn(async () => {
+			callCount += 1;
+			if (callCount === 1) {
+				return new Response(emptyStream, {
+					status: 200,
+					headers: { "content-type": "text/event-stream" },
+				});
+			}
+			return new Response(dataStream, {
+				status: 200,
+				headers: { "content-type": "text/event-stream" },
+			});
+		});
+
+		global.fetch = fetchMock as typeof fetch;
+
+		const model: Model<"google-gemini-cli"> = {
+			id: "gemini-2.5-flash",
+			name: "Gemini 2.5 Flash",
+			api: "google-gemini-cli",
+			provider: "google-gemini-cli",
+			baseUrl: "https://cloudcode-pa.googleapis.com",
+			reasoning: false,
+			input: ["text"],
+			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+			contextWindow: 128000,
+			maxTokens: 8192,
+		};
+
+		const context: Context = {
+			messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
+		};
+
+		const stream = streamGoogleGeminiCli(model, context, {
+			apiKey: JSON.stringify({ token: "token", projectId: "project" }),
+		});
+
+		let startCount = 0;
+		let doneCount = 0;
+		let text = "";
+
+		for await (const event of stream) {
+			if (event.type === "start") {
+				startCount += 1;
+			}
+			if (event.type === "done") {
+				doneCount += 1;
+			}
+			if (event.type === "text_delta") {
+				text += event.delta;
+			}
+		}
+
+		const result = await stream.result();
+
+		expect(text).toBe("Hello");
+		expect(result.stopReason).toBe("stop");
+		expect(startCount).toBe(1);
+		expect(doneCount).toBe(1);
+		expect(fetchMock).toHaveBeenCalledTimes(2);
+	});
+});
--- a/packages/ai/test/google-gemini-cli-retry-delay.test.ts
+++ b/packages/ai/test/google-gemini-cli-retry-delay.test.ts
@ -0,0 +1,53 @@
+import { afterEach, describe, expect, it, vi } from "vitest";
+import { extractRetryDelay } from "../src/providers/google-gemini-cli.js";
+
+describe("extractRetryDelay header parsing", () => {
+	afterEach(() => {
+		vi.useRealTimers();
+	});
+
+	it("prefers Retry-After seconds header", () => {
+		vi.useFakeTimers();
+		vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));
+
+		const response = new Response("", { headers: { "Retry-After": "5" } });
+		const delay = extractRetryDelay("Please retry in 1s", response);
+
+		expect(delay).toBe(6000);
+	});
+
+	it("parses Retry-After HTTP date header", () => {
+		vi.useFakeTimers();
+		const now = new Date("2025-01-01T00:00:00Z");
+		vi.setSystemTime(now);
+
+		const retryAt = new Date(now.getTime() + 12000).toUTCString();
+		const response = new Response("", { headers: { "Retry-After": retryAt } });
+		const delay = extractRetryDelay("", response);
+
+		expect(delay).toBe(13000);
+	});
+
+	it("parses x-ratelimit-reset header", () => {
+		vi.useFakeTimers();
+		const now = new Date("2025-01-01T00:00:00Z");
+		vi.setSystemTime(now);
+
+		const resetAtMs = now.getTime() + 20000;
+		const resetSeconds = Math.floor(resetAtMs / 1000).toString();
+		const response = new Response("", { headers: { "x-ratelimit-reset": resetSeconds } });
+		const delay = extractRetryDelay("", response);
+
+		expect(delay).toBe(21000);
+	});
+
+	it("parses x-ratelimit-reset-after header", () => {
+		vi.useFakeTimers();
+		vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));
+
+		const response = new Response("", { headers: { "x-ratelimit-reset-after": "30" } });
+		const delay = extractRetryDelay("", response);
+
+		expect(delay).toBe(31000);
+	});
+});
--- a/packages/ai/test/google-gemini-cli-session-id.test.ts
+++ b/packages/ai/test/google-gemini-cli-session-id.test.ts
@ -0,0 +1,50 @@
+import { createHash } from "node:crypto";
+import { describe, expect, it } from "vitest";
+import { buildRequest } from "../src/providers/google-gemini-cli.js";
+import type { Context, Model } from "../src/types.js";
+
+const model: Model<"google-gemini-cli"> = {
+	id: "gemini-2.5-flash",
+	name: "Gemini 2.5 Flash",
+	api: "google-gemini-cli",
+	provider: "google-gemini-cli",
+	baseUrl: "https://cloudcode-pa.googleapis.com",
+	reasoning: false,
+	input: ["text"],
+	cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+	contextWindow: 128000,
+	maxTokens: 8192,
+};
+
+describe("buildRequest sessionId", () => {
+	it("derives sessionId from the first user message", () => {
+		const context: Context = {
+			messages: [
+				{ role: "user", content: "First message", timestamp: Date.now() },
+				{ role: "user", content: "Second message", timestamp: Date.now() },
+			],
+		};
+
+		const result = buildRequest(model, context, "project-id");
+		const expected = createHash("sha256").update("First message").digest("hex").slice(0, 32);
+
+		expect(result.request.sessionId).toBe(expected);
+	});
+
+	it("omits sessionId when the first user message has no text", () => {
+		const context: Context = {
+			messages: [
+				{
+					role: "user",
+					content: [{ type: "image", data: "Zm9v", mimeType: "image/png" }],
+					timestamp: Date.now(),
+				},
+				{ role: "user", content: "Later text", timestamp: Date.now() },
+			],
+		};
+
+		const result = buildRequest(model, context, "project-id");
+
+		expect(result.request.sessionId).toBeUndefined();
+	});
+});
--- a/packages/ai/test/image-limits.test.ts
+++ b/packages/ai/test/image-limits.test.ts
@ -75,6 +75,7 @@ import { afterAll, beforeAll, describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { Api, Context, ImageContent, Model, OptionsForApi, UserMessage } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";

 const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);
@ -840,6 +841,122 @@ describe("Image Limits E2E Tests", () => {
 		});
 	});

+	// -------------------------------------------------------------------------
+	// Vercel AI Gateway (google/gemini-2.5-flash)
+	// -------------------------------------------------------------------------
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway (google/gemini-2.5-flash)", () => {
+		const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should accept a small number of images (5)", async () => {
+			const result = await testImageCount(model, 5, smallImage);
+			expect(result.success, result.error).toBe(true);
+		});
+
+		it("should find maximum image count limit", { timeout: 600000 }, async () => {
+			const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 10, 100, 10);
+			console.log(`\n  Vercel AI Gateway max images: ~${limit} (last error: ${lastError})`);
+			expect(limit).toBeGreaterThanOrEqual(5);
+		});
+
+		it("should find maximum image size limit", { timeout: 600000 }, async () => {
+			const MB = 1024 * 1024;
+			const sizes = [5, 10, 15, 20];
+
+			let lastSuccess = 0;
+			let lastError: string | undefined;
+
+			for (const sizeMB of sizes) {
+				console.log(`  Testing size: ${sizeMB}MB...`);
+				const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
+				const result = await testImageSize(model, imageBase64);
+				if (result.success) {
+					lastSuccess = sizeMB;
+					console.log(`    SUCCESS`);
+				} else {
+					lastError = result.error;
+					console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
+					break;
+				}
+			}
+
+			console.log(`\n  Vercel AI Gateway max image size: ~${lastSuccess}MB (last error: ${lastError})`);
+			expect(lastSuccess).toBeGreaterThanOrEqual(5);
+		});
+	});
+
+	// -------------------------------------------------------------------------
+	// Amazon Bedrock (claude-sonnet-4-5)
+	// Limits: 100 images (Anthropic), 5MB per image, 8000px max dimension
+	// -------------------------------------------------------------------------
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock (claude-sonnet-4-5)", () => {
+		const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should accept a small number of images (5)", async () => {
+			const result = await testImageCount(model, 5, smallImage);
+			expect(result.success, result.error).toBe(true);
+		});
+
+		it("should find maximum image count limit", { timeout: 600000 }, async () => {
+			// Anthropic limit: 100 images
+			const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 20, 120, 20);
+			console.log(`\n  Bedrock max images: ~${limit} (last error: ${lastError})`);
+			expect(limit).toBeGreaterThanOrEqual(80);
+			expect(limit).toBeLessThanOrEqual(100);
+		});
+
+		it("should find maximum image size limit", { timeout: 600000 }, async () => {
+			const MB = 1024 * 1024;
+			// Anthropic limit: 5MB per image
+			const sizes = [1, 2, 3, 4, 5, 6];
+
+			let lastSuccess = 0;
+			let lastError: string | undefined;
+
+			for (const sizeMB of sizes) {
+				console.log(`  Testing size: ${sizeMB}MB...`);
+				const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
+				const result = await testImageSize(model, imageBase64);
+				if (result.success) {
+					lastSuccess = sizeMB;
+					console.log(`    SUCCESS`);
+				} else {
+					lastError = result.error;
+					console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
+					break;
+				}
+			}
+
+			console.log(`\n  Bedrock max image size: ~${lastSuccess}MB (last error: ${lastError})`);
+			expect(lastSuccess).toBeGreaterThanOrEqual(1);
+		});
+
+		it("should find maximum image dimension limit", { timeout: 600000 }, async () => {
+			// Anthropic limit: 8000px
+			const dimensions = [1000, 2000, 4000, 6000, 8000, 10000];
+
+			let lastSuccess = 0;
+			let lastError: string | undefined;
+
+			for (const dim of dimensions) {
+				console.log(`  Testing dimension: ${dim}x${dim}...`);
+				const imageBase64 = generateImage(dim, dim, `dim-${dim}.png`);
+				const result = await testImageDimensions(model, imageBase64);
+				if (result.success) {
+					lastSuccess = dim;
+					console.log(`    SUCCESS`);
+				} else {
+					lastError = result.error;
+					console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
+					break;
+				}
+			}
+
+			console.log(`\n  Bedrock max dimension: ~${lastSuccess}px (last error: ${lastError})`);
+			expect(lastSuccess).toBeGreaterThanOrEqual(6000);
+			expect(lastSuccess).toBeLessThanOrEqual(8000);
+		});
+	});
+
 	// =========================================================================
 	// MAX SIZE IMAGES TEST
 	// =========================================================================
@ -898,6 +1015,38 @@ describe("Image Limits E2E Tests", () => {
 			},
 		);

+		// Amazon Bedrock (Claude) - 5MB per image limit, same as Anthropic direct
+		// Using 3MB to stay under 5MB limit
+		it.skipIf(!hasBedrockCredentials())(
+			"Bedrock: max ~3MB images before rejection",
+			{ timeout: 900000 },
+			async () => {
+				const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+				const image3mb = getImageAtSize(3);
+				// Similar to Anthropic, test progressively
+				const counts = [1, 2, 4, 6, 8, 10, 12];
+
+				let lastSuccess = 0;
+				let lastError: string | undefined;
+
+				for (const count of counts) {
+					console.log(`  Testing ${count} x ~3MB images...`);
+					const result = await testImageCount(model, count, image3mb);
+					if (result.success) {
+						lastSuccess = count;
+						console.log(`    SUCCESS`);
+					} else {
+						lastError = result.error;
+						console.log(`    FAILED: ${result.error?.substring(0, 150)}`);
+						break;
+					}
+				}
+
+				console.log(`\n  Bedrock max ~3MB images: ${lastSuccess} (last error: ${lastError})`);
+				expect(lastSuccess).toBeGreaterThanOrEqual(1);
+			},
+		);
+
 		// OpenAI - 20MB per image documented, we found ≥25MB works
 		// Test with 15MB images to stay safely under limit
 		it.skipIf(!process.env.OPENAI_API_KEY)(
--- a/packages/ai/test/image-tool-result.test.ts
+++ b/packages/ai/test/image-tool-result.test.ts
@ -5,6 +5,7 @@ import { describe, expect, it } from "vitest";
 import type { Api, Context, Model, Tool, ToolResultMessage } from "../src/index.js";
 import { complete, getModel } from "../src/index.js";
 import type { OptionsForApi } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -273,6 +274,30 @@ describe("Tool Results with Images", () => {
 		});
 	});

+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider (google/gemini-2.5-flash)", () => {
+		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
+			await handleToolWithImageResult(llm);
+		});
+
+		it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
+			await handleToolWithTextAndImageResult(llm);
+		});
+	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
+			await handleToolWithImageResult(llm);
+		});
+
+		it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
+			await handleToolWithTextAndImageResult(llm);
+		});
+	});
+
 	// =========================================================================
 	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
 	// =========================================================================
--- a/packages/ai/test/stream.test.ts
+++ b/packages/ai/test/stream.test.ts
@ -8,6 +8,7 @@ import { getModel } from "../src/models.js";
 import { complete, stream } from "../src/stream.js";
 import type { Api, Context, ImageContent, Model, OptionsForApi, Tool, ToolResultMessage } from "../src/types.js";
 import { StringEnum } from "../src/utils/typebox-helpers.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 const __filename = fileURLToPath(import.meta.url);
@ -356,7 +357,7 @@ describe("Generate E2E Tests", () => {
 			await handleStreaming(llm);
 		});

-		it("should handle ", { retry: 3 }, async () => {
+		it("should handle thinking", { retry: 3 }, async () => {
 			await handleThinking(llm, { thinking: { enabled: true, budgetTokens: 1024 } });
 		});

@ -597,6 +598,87 @@ describe("Generate E2E Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
+		"Vercel AI Gateway Provider (google/gemini-2.5-flash via Anthropic Messages)",
+		() => {
+			const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+			it("should complete basic text generation", { retry: 3 }, async () => {
+				await basicTextGeneration(llm);
+			});
+
+			it("should handle tool calling", { retry: 3 }, async () => {
+				await handleToolCall(llm);
+			});
+
+			it("should handle streaming", { retry: 3 }, async () => {
+				await handleStreaming(llm);
+			});
+
+			it("should handle image input", { retry: 3 }, async () => {
+				await handleImage(llm);
+			});
+
+			it("should handle multi-turn with tools", { retry: 3 }, async () => {
+				await multiTurn(llm);
+			});
+		},
+	);
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
+		"Vercel AI Gateway Provider (anthropic/claude-opus-4.5 via Anthropic Messages)",
+		() => {
+			const llm = getModel("vercel-ai-gateway", "anthropic/claude-opus-4.5");
+
+			it("should complete basic text generation", { retry: 3 }, async () => {
+				await basicTextGeneration(llm);
+			});
+
+			it("should handle tool calling", { retry: 3 }, async () => {
+				await handleToolCall(llm);
+			});
+
+			it("should handle streaming", { retry: 3 }, async () => {
+				await handleStreaming(llm);
+			});
+
+			it("should handle image input", { retry: 3 }, async () => {
+				await handleImage(llm);
+			});
+
+			it("should handle multi-turn with tools", { retry: 3 }, async () => {
+				await multiTurn(llm);
+			});
+		},
+	);
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
+		"Vercel AI Gateway Provider (openai/gpt-5.1-codex-max via Anthropic Messages)",
+		() => {
+			const llm = getModel("vercel-ai-gateway", "openai/gpt-5.1-codex-max");
+
+			it("should complete basic text generation", { retry: 3 }, async () => {
+				await basicTextGeneration(llm);
+			});
+
+			it("should handle tool calling", { retry: 3 }, async () => {
+				await handleToolCall(llm);
+			});
+
+			it("should handle streaming", { retry: 3 }, async () => {
+				await handleStreaming(llm);
+			});
+
+			it("should handle image input", { retry: 3 }, async () => {
+				await handleImage(llm);
+			});
+
+			it("should handle multi-turn with tools", { retry: 3 }, async () => {
+				await multiTurn(llm);
+			});
+		},
+	);
+
 	describe.skipIf(!process.env.ZAI_API_KEY)("zAI Provider (glm-4.5-air via OpenAI Completions)", () => {
 		const llm = getModel("zai", "glm-4.5-air");

@ -698,6 +780,30 @@ describe("Generate E2E Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider (MiniMax-M2.1 via Anthropic Messages)", () => {
+		const llm = getModel("minimax", "MiniMax-M2.1");
+
+		it("should complete basic text generation", { retry: 3 }, async () => {
+			await basicTextGeneration(llm);
+		});
+
+		it("should handle tool calling", { retry: 3 }, async () => {
+			await handleToolCall(llm);
+		});
+
+		it("should handle streaming", { retry: 3 }, async () => {
+			await handleStreaming(llm);
+		});
+
+		it("should handle thinking mode", { retry: 3 }, async () => {
+			await handleThinking(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
+		});
+
+		it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
+			await multiTurn(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
+		});
+	});
+
 	// =========================================================================
 	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
 	// Tokens are resolved at module level (see oauthTokens above)
@ -907,6 +1013,34 @@ describe("Generate E2E Tests", () => {
 		});
 	});

+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should complete basic text generation", { retry: 3 }, async () => {
+			await basicTextGeneration(llm);
+		});
+
+		it("should handle tool calling", { retry: 3 }, async () => {
+			await handleToolCall(llm);
+		});
+
+		it("should handle streaming", { retry: 3 }, async () => {
+			await handleStreaming(llm);
+		});
+
+		it("should handle thinking", { retry: 3 }, async () => {
+			await handleThinking(llm, { reasoning: "medium" });
+		});
+
+		it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
+			await multiTurn(llm, { reasoning: "high" });
+		});
+
+		it("should handle image input", { retry: 3 }, async () => {
+			await handleImage(llm);
+		});
+	});
+
 	// Check if ollama is installed and local LLM tests are enabled
 	let ollamaInstalled = false;
 	if (!process.env.PI_NO_LOCAL_LLM) {
--- a/packages/ai/test/tokens.test.ts
+++ b/packages/ai/test/tokens.test.ts
@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { stream } from "../src/stream.js";
 import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -44,18 +45,25 @@ async function testTokensOnAbort<TApi extends Api>(llm: Model<TApi>, options: Op

 	expect(msg.stopReason).toBe("aborted");

-	// OpenAI providers, OpenAI Codex, Gemini CLI, zai, and the GPT-OSS model on Antigravity only send usage in the final chunk,
-	// so when aborted they have no token stats Anthropic and Google send usage information early in the stream
+	// OpenAI providers, OpenAI Codex, Gemini CLI, zai, Amazon Bedrock, Vercel AI Gateway, and the GPT-OSS model on Antigravity only send usage in the final chunk,
+	// so when aborted they have no token stats. Anthropic and Google send usage information early in the stream.
+	// MiniMax reports input tokens but not output tokens when aborted.
 	if (
 		llm.api === "openai-completions" ||
 		llm.api === "openai-responses" ||
 		llm.api === "openai-codex-responses" ||
 		llm.provider === "google-gemini-cli" ||
 		llm.provider === "zai" ||
+		llm.provider === "amazon-bedrock" ||
+		llm.provider === "vercel-ai-gateway" ||
 		(llm.provider === "google-antigravity" && llm.id.includes("gpt-oss"))
 	) {
 		expect(msg.usage.input).toBe(0);
 		expect(msg.usage.output).toBe(0);
+	} else if (llm.provider === "minimax") {
+		// MiniMax reports input tokens early but output tokens only in final chunk
+		expect(msg.usage.input).toBeGreaterThan(0);
+		expect(msg.usage.output).toBe(0);
 	} else {
 		expect(msg.usage.input).toBeGreaterThan(0);
 		expect(msg.usage.output).toBeGreaterThan(0);
@ -144,6 +152,22 @@ describe("Token Statistics on Abort", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
+		const llm = getModel("minimax", "MiniMax-M2.1");
+
+		it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
+			await testTokensOnAbort(llm);
+		});
+	});
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
+		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
+			await testTokensOnAbort(llm);
+		});
+	});
+
 	// =========================================================================
 	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
 	// =========================================================================
@ -230,4 +254,12 @@ describe("Token Statistics on Abort", () => {
 			},
 		);
 	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
+			await testTokensOnAbort(llm);
+		});
+	});
 });
--- a/packages/ai/test/tool-call-without-result.test.ts
+++ b/packages/ai/test/tool-call-without-result.test.ts
@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { Api, Context, Model, OptionsForApi, Tool } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -170,6 +171,30 @@ describe("Tool Call Without Result Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
+		const model = getModel("minimax", "MiniMax-M2.1");
+
+		it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testToolCallWithoutResult(model);
+		});
+	});
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
+		const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testToolCallWithoutResult(model);
+		});
+	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
+		const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testToolCallWithoutResult(model);
+		});
+	});
+
 	// =========================================================================
 	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
 	// =========================================================================
--- a/packages/ai/test/total-tokens.test.ts
+++ b/packages/ai/test/total-tokens.test.ts
@ -16,6 +16,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { Api, Context, Model, OptionsForApi, Usage } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Resolve OAuth tokens at module level (async, runs before tests)
@ -324,6 +325,52 @@ describe("totalTokens field", () => {
 		);
 	});

+	// =========================================================================
+	// MiniMax
+	// =========================================================================
+
+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
+		it(
+			"MiniMax-M2.1 - should return totalTokens equal to sum of components",
+			{ retry: 3, timeout: 60000 },
+			async () => {
+				const llm = getModel("minimax", "MiniMax-M2.1");
+
+				console.log(`\nMiniMax / ${llm.id}:`);
+				const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.MINIMAX_API_KEY });
+
+				logUsage("First request", first);
+				logUsage("Second request", second);
+
+				assertTotalTokensEqualsComponents(first);
+				assertTotalTokensEqualsComponents(second);
+			},
+		);
+	});
+
+	// =========================================================================
+	// Vercel AI Gateway
+	// =========================================================================
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
+		it(
+			"google/gemini-2.5-flash - should return totalTokens equal to sum of components",
+			{ retry: 3, timeout: 60000 },
+			async () => {
+				const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+				console.log(`\nVercel AI Gateway / ${llm.id}:`);
+				const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.AI_GATEWAY_API_KEY });
+
+				logUsage("First request", first);
+				logUsage("Second request", second);
+
+				assertTotalTokensEqualsComponents(first);
+				assertTotalTokensEqualsComponents(second);
+			},
+		);
+	});
+
 	// =========================================================================
 	// OpenRouter - Multiple backend providers
 	// =========================================================================
@ -535,6 +582,25 @@ describe("totalTokens field", () => {
 		);
 	});

+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
+		it(
+			"claude-sonnet-4-5 - should return totalTokens equal to sum of components",
+			{ retry: 3, timeout: 60000 },
+			async () => {
+				const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+				console.log(`\nAmazon Bedrock / ${llm.id}:`);
+				const { first, second } = await testTotalTokensWithCache(llm);
+
+				logUsage("First request", first);
+				logUsage("Second request", second);
+
+				assertTotalTokensEqualsComponents(first);
+				assertTotalTokensEqualsComponents(second);
+			},
+		);
+	});
+
 	// =========================================================================
 	// OpenAI Codex (OAuth)
 	// =========================================================================
--- a/packages/ai/test/unicode-surrogate.test.ts
+++ b/packages/ai/test/unicode-surrogate.test.ts
@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
 import { getModel } from "../src/models.js";
 import { complete } from "../src/stream.js";
 import type { Api, Context, Model, OptionsForApi, ToolResultMessage } from "../src/types.js";
+import { hasBedrockCredentials } from "./bedrock-utils.js";
 import { resolveApiKey } from "./oauth.js";

 // Empty schema for test tools - must be proper OBJECT type for Cloud Code Assist
@ -617,6 +618,54 @@ describe("AI Providers Unicode Surrogate Pair Tests", () => {
 		});
 	});

+	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Unicode Handling", () => {
+		const llm = getModel("minimax", "MiniMax-M2.1");
+
+		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testEmojiInToolResults(llm);
+		});
+
+		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
+			await testRealWorldLinkedInData(llm);
+		});
+
+		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testUnpairedHighSurrogate(llm);
+		});
+	});
+
+	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Unicode Handling", () => {
+		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
+
+		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testEmojiInToolResults(llm);
+		});
+
+		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
+			await testRealWorldLinkedInData(llm);
+		});
+
+		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testUnpairedHighSurrogate(llm);
+		});
+	});
+
+	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Unicode Handling", () => {
+		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
+
+		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testEmojiInToolResults(llm);
+		});
+
+		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
+			await testRealWorldLinkedInData(llm);
+		});
+
+		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
+			await testUnpairedHighSurrogate(llm);
+		});
+	});
+
 	describe("OpenAI Codex Provider Unicode Handling", () => {
 		it.skipIf(!openaiCodexToken)(
 			"gpt-5.2-codex - should handle emoji in tool results",