Merge branch 'main' into feat/tui-overlay-options

Mario Zechner 2026-01-13 22:06:02 +01:00
commit 7d45e434de
90 changed files with 10277 additions and 1700 deletions


@ -2,6 +2,36 @@
## [Unreleased]
## [0.45.5] - 2026-01-13
## [0.45.4] - 2026-01-13
### Added
- Added Vercel AI Gateway provider with model discovery and `AI_GATEWAY_API_KEY` env support ([#689](https://github.com/badlogic/pi-mono/pull/689) by [@timolins](https://github.com/timolins))
### Fixed
- Fixed z.ai thinking/reasoning: z.ai uses `thinking: { type: "enabled" }` instead of OpenAI's `reasoning_effort`. Added `thinkingFormat` compat flag to handle this. ([#688](https://github.com/badlogic/pi-mono/issues/688))
## [0.45.3] - 2026-01-13
## [0.45.2] - 2026-01-13
## [0.45.1] - 2026-01-13
## [0.45.0] - 2026-01-13
### Added
- Added MiniMax provider support with M2 and M2.1 models via Anthropic-compatible API ([#656](https://github.com/badlogic/pi-mono/pull/656) by [@dannote](https://github.com/dannote))
- Added Amazon Bedrock provider with prompt caching for Claude models (experimental, tested with Anthropic Claude models only) ([#494](https://github.com/badlogic/pi-mono/pull/494) by [@unexge](https://github.com/unexge))
- Added `serviceTier` option for OpenAI Responses requests ([#672](https://github.com/badlogic/pi-mono/pull/672) by [@markusylisiurunen](https://github.com/markusylisiurunen))
- **Anthropic caching on OpenRouter**: Interactions with Anthropic models via OpenRouter now set a 5-minute cache point using Anthropic-style `cache_control` breakpoints on the last assistant or user message. ([#584](https://github.com/badlogic/pi-mono/pull/584) by [@nathyong](https://github.com/nathyong))
- **Google Gemini CLI provider improvements**: Added Antigravity endpoint fallback (tries daily sandbox then prod when `baseUrl` is unset), header-based retry delay parsing (`Retry-After`, `x-ratelimit-reset`, `x-ratelimit-reset-after`), stable `sessionId` derivation from first user message for cache affinity, empty SSE stream retry with backoff, and `anthropic-beta` header for Claude thinking models ([#670](https://github.com/badlogic/pi-mono/pull/670) by [@kim0](https://github.com/kim0))
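
A minimal sketch of the header-based retry delay idea, limited to the `Retry-After` header. The helper name `retryAfterMs`, the `HeaderLookup` interface, and the explicit `now` parameter are assumptions made for illustration and are not the library's API; the provider's own implementation also checks the `x-ratelimit-*` headers and adds a safety margin, both of which this sketch omits.

```typescript
// Minimal header accessor so the sketch does not depend on the fetch Headers type.
interface HeaderLookup {
	get(name: string): string | null;
}

// Hypothetical helper: returns the delay in milliseconds, or undefined when the
// header is absent or does not describe a point in the future.
function retryAfterMs(headers: HeaderLookup, now: number): number | undefined {
	const raw = headers.get("retry-after");
	if (!raw) return undefined;

	// Form 1: delta-seconds, e.g. "Retry-After: 30".
	const seconds = Number(raw);
	if (Number.isFinite(seconds)) {
		return seconds > 0 ? Math.ceil(seconds * 1000) : undefined;
	}

	// Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
	const at = new Date(raw).getTime();
	if (Number.isNaN(at)) return undefined;
	const delta = at - now;
	return delta > 0 ? delta : undefined;
}
```

Passing `now` explicitly (rather than calling `Date.now()` inside) keeps the date branch deterministic when exercised in a test.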
## [0.44.0] - 2026-01-12
## [0.43.0] - 2026-01-11
### Fixed


@ -56,9 +56,12 @@ Unified LLM API with automatic model discovery, provider configuration, token an
- **Cerebras**
- **xAI**
- **OpenRouter**
- **Vercel AI Gateway**
- **MiniMax**
- **GitHub Copilot** (requires OAuth, see below)
- **Google Gemini CLI** (requires OAuth, see below)
- **Antigravity** (requires OAuth, see below)
- **Amazon Bedrock**
- **Any OpenAI-compatible API**: Ollama, vLLM, LM Studio, etc.
## Installation
@ -708,6 +711,7 @@ interface OpenAICompat {
supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
thinkingFormat?: 'openai' | 'zai'; // Format for reasoning param: 'openai' uses reasoning_effort, 'zai' uses thinking: { type: "enabled" } (default: openai)
}
```
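
As a rough illustration of what the `thinkingFormat` flag selects between, the sketch below builds the reasoning-related request fields for each format. The helper name `buildReasoningParams`, the `ReasoningEffort` type, and the `effort` argument are hypothetical and not part of the library; only the two payload shapes come from the interface comment above.

```typescript
// Hypothetical helper: not part of pi-ai, shown only to illustrate the two formats.
type ThinkingFormat = "openai" | "zai";
type ReasoningEffort = "low" | "medium" | "high";

function buildReasoningParams(
	format: ThinkingFormat | undefined,
	effort: ReasoningEffort | undefined,
): Record<string, unknown> {
	// No reasoning requested: send no reasoning-related fields at all.
	if (!effort) return {};
	// z.ai style: a thinking object instead of an effort level.
	if (format === "zai") return { thinking: { type: "enabled" } };
	// Default ('openai'): pass the effort level through as reasoning_effort.
	return { reasoning_effort: effort };
}
```

The result would be spread into the request body, so a provider that sets `thinkingFormat: "zai"` never receives a `reasoning_effort` field.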
@ -860,7 +864,9 @@ In Node.js environments, you can set environment variables to avoid passing API
| Cerebras | `CEREBRAS_API_KEY` |
| xAI | `XAI_API_KEY` |
| OpenRouter | `OPENROUTER_API_KEY` |
| Vercel AI Gateway | `AI_GATEWAY_API_KEY` |
| zAI | `ZAI_API_KEY` |
| MiniMax | `MINIMAX_API_KEY` |
| GitHub Copilot | `COPILOT_GITHUB_TOKEN` or `GH_TOKEN` or `GITHUB_TOKEN` |
When set, the library automatically uses these keys:
@ -1026,6 +1032,90 @@ const response = await complete(model, {
**Google Gemini CLI / Antigravity**: These use Google Cloud OAuth. The `apiKey` returned by `getOAuthApiKey()` is a JSON string containing both the token and project ID, which the library handles automatically.
## Development
### Adding a New Provider
Adding a new LLM provider requires changes across multiple files. This checklist covers all necessary steps:
#### 1. Core Types (`src/types.ts`)
- Add the API identifier to the `Api` type union (e.g., `"bedrock-converse-stream"`)
- Create an options interface extending `StreamOptions` (e.g., `BedrockOptions`)
- Add the mapping to `ApiOptionsMap`
- Add the provider name to `KnownProvider` type union (e.g., `"amazon-bedrock"`)
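
The four type changes above might look roughly like the following. This is a reduced sketch, not the real `src/types.ts`: the unions are cut down to two members, `AnthropicOptions.thinkingEnabled` and the `describeOptions` helper are hypothetical, and only the names `Api`, `StreamOptions`, `ApiOptionsMap`, `KnownProvider`, and `BedrockOptions` come from the checklist.

```typescript
// Simplified stand-in for the shared options declared in src/types.ts.
interface StreamOptions {
	maxTokens?: number;
	temperature?: number;
}

// 1. Extend the Api union with the new identifier.
type Api = "anthropic-messages" | "bedrock-converse-stream";

// 2. Provider-specific options extend the shared StreamOptions.
interface AnthropicOptions extends StreamOptions {
	thinkingEnabled?: boolean; // hypothetical field, for illustration only
}
interface BedrockOptions extends StreamOptions {
	region?: string;
	profile?: string;
}

// 3. Map each Api identifier to its options type.
interface ApiOptionsMap {
	"anthropic-messages": AnthropicOptions;
	"bedrock-converse-stream": BedrockOptions;
}

// 4. Add the provider name to the KnownProvider union.
type KnownProvider = "anthropic" | "amazon-bedrock";
const newProvider: KnownProvider = "amazon-bedrock";

// Hypothetical helper: indexing ApiOptionsMap by the Api keeps call sites type-safe.
function describeOptions<T extends Api>(api: T, options: ApiOptionsMap[T]): string {
	return `${api}: ${Object.keys(options).sort().join(",")}`;
}
```

With this mapping in place, passing `region` alongside `"anthropic-messages"` would be rejected at compile time.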
#### 2. Provider Implementation (`src/providers/`)
Create a new provider file (e.g., `amazon-bedrock.ts`) that exports:
- `stream<Provider>()` function returning `AssistantMessageEventStream`
- Provider-specific options interface
- Message conversion functions to transform `Context` to provider format
- Tool conversion if the provider supports tools
- Response parsing to emit standardized events (`text`, `tool_call`, `thinking`, `usage`, `stop`)
#### 3. Stream Integration (`src/stream.ts`)
- Import the provider's stream function and options type
- Add credential detection in `getEnvApiKey()` for the new provider
- Add a case in `mapOptionsForApi()` to map `SimpleStreamOptions` to provider options
- Add the provider's stream function to the `streamFunctions` map
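
A minimal sketch of the credential-detection step, assuming a simple provider-to-variable lookup. The real `getEnvApiKey()` may be structured differently; here it takes the environment as a parameter so it can be exercised in isolation, and the provider IDs `"openrouter"` and `"github-copilot"` are assumptions (the variable names come from the Environment Variables table).

```typescript
type Env = Record<string, string | undefined>;

// Env var names per provider, in lookup order (subset of the README table).
const ENV_KEYS: Record<string, string[]> = {
	openrouter: ["OPENROUTER_API_KEY"],
	"vercel-ai-gateway": ["AI_GATEWAY_API_KEY"],
	minimax: ["MINIMAX_API_KEY"],
	"github-copilot": ["COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"],
};

// Simplified stand-in for getEnvApiKey(): the first non-empty variable wins.
function getEnvApiKey(provider: string, env: Env): string | undefined {
	for (const name of ENV_KEYS[provider] ?? []) {
		const value = env[name];
		if (value) return value;
	}
	return undefined;
}
```

Adding a provider in this shape is a one-line change to the lookup table, which keeps the precedence order between alternative variables explicit.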
#### 4. Model Generation (`scripts/generate-models.ts`)
- Add logic to fetch and parse models from the provider's source (e.g., models.dev API)
- Map provider model data to the standardized `Model` interface
- Handle provider-specific quirks (pricing format, capability flags, model ID transformations)
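
The mapping step can be sketched as below, assuming a catalog that reports per-token prices as strings or numbers. `RawModel` and `ModelSketch` are hypothetical, reduced shapes rather than the real `Model` interface; the conversion to a per-million-token cost and the 4096 fallback are illustrative choices for this sketch.

```typescript
// Hypothetical raw catalog entry: prices are per token and may be strings.
interface RawModel {
	id: string;
	name?: string;
	context_window?: number;
	pricing?: { input?: string | number; output?: string | number };
}

// Reduced version of the Model shape: costs are per million tokens.
interface ModelSketch {
	id: string;
	name: string;
	cost: { input: number; output: number };
	contextWindow: number;
}

// Parse a price that may be a string, a number, or missing; fall back to 0.
function toNumber(value: string | number | undefined): number {
	const parsed = typeof value === "number" ? value : parseFloat(value ?? "0");
	return Number.isFinite(parsed) ? parsed : 0;
}

function toModel(raw: RawModel): ModelSketch {
	return {
		id: raw.id,
		name: raw.name || raw.id,
		cost: {
			input: toNumber(raw.pricing?.input) * 1_000_000,
			output: toNumber(raw.pricing?.output) * 1_000_000,
		},
		contextWindow: raw.context_window || 4096,
	};
}
```

Normalizing every price through one helper means a malformed or missing value degrades to a cost of 0 instead of propagating `NaN` into the generated model list.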
#### 5. Tests (`test/`)
Create or update test files to cover the new provider:
- `stream.test.ts` - Basic streaming and tool use
- `tokens.test.ts` - Token usage reporting
- `abort.test.ts` - Request cancellation
- `empty.test.ts` - Empty message handling
- `context-overflow.test.ts` - Context limit errors
- `image-limits.test.ts` - Image support (if applicable)
- `unicode-surrogate.test.ts` - Unicode handling
- `tool-call-without-result.test.ts` - Orphaned tool calls
- `image-tool-result.test.ts` - Images in tool results
- `total-tokens.test.ts` - Token counting accuracy
For providers with non-standard auth (AWS, Google Vertex), create a utility like `bedrock-utils.ts` with credential detection helpers.
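
A credential detection helper of this kind could be as small as the sketch below. The function name `hasBedrockCredentials` is hypothetical and the check is deliberately narrow: it only looks at static access keys and a named profile, whereas the AWS credential chain supports further sources that this sketch ignores.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: true when the environment looks like it carries AWS credentials.
function hasBedrockCredentials(env: Env): boolean {
	// Static access keys need both halves to be usable.
	const hasStaticKeys = Boolean(env["AWS_ACCESS_KEY_ID"] && env["AWS_SECRET_ACCESS_KEY"]);
	// A named profile defers to the shared AWS config files.
	const hasProfile = Boolean(env["AWS_PROFILE"]);
	return hasStaticKeys || hasProfile;
}
```

A test file could call such a helper once and skip the provider's suite when it returns `false`, so contributors without AWS access still get a green run.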
#### 6. Coding Agent Integration (`../coding-agent/`)
Update `src/core/model-resolver.ts`:
- Add a default model ID for the provider in `DEFAULT_MODELS`
Update `src/cli/args.ts`:
- Add environment variable documentation in the help text
Update `README.md`:
- Add the provider to the providers section with setup instructions
#### 7. Documentation
Update `packages/ai/README.md`:
- Add to the Supported Providers table
- Document any provider-specific options or authentication requirements
- Add environment variable to the Environment Variables section
#### 8. Changelog
Add an entry to `packages/ai/CHANGELOG.md` under `## [Unreleased]`:
```markdown
### Added
- Added support for [Provider Name] provider ([#PR](link) by [@author](link))
```
## License
MIT


@ -1,6 +1,6 @@
{
"name": "@mariozechner/pi-ai",
"version": "0.43.0",
"version": "0.45.5",
"description": "Unified LLM API with automatic model discovery and provider configuration",
"type": "module",
"main": "./dist/index.js",
@ -23,6 +23,7 @@
},
"dependencies": {
"@anthropic-ai/sdk": "0.71.2",
"@aws-sdk/client-bedrock-runtime": "^3.966.0",
"@google/genai": "1.34.0",
"@mistralai/mistralai": "1.10.0",
"@sinclair/typebox": "^0.34.41",
@ -39,6 +40,7 @@
"openai",
"anthropic",
"gemini",
"bedrock",
"unified",
"api"
],


@ -32,6 +32,20 @@ interface ModelsDevModel {
};
}
interface AiGatewayModel {
id: string;
name?: string;
context_window?: number;
max_tokens?: number;
tags?: string[];
pricing?: {
input?: string | number;
output?: string | number;
input_cache_read?: string | number;
input_cache_write?: string | number;
};
}
const COPILOT_STATIC_HEADERS = {
"User-Agent": "GitHubCopilotChat/0.35.0",
"Editor-Version": "vscode/1.107.0",
@ -39,6 +53,9 @@ const COPILOT_STATIC_HEADERS = {
"Copilot-Integration-Id": "vscode-chat",
} as const;
const AI_GATEWAY_MODELS_URL = "https://ai-gateway.vercel.sh/v1";
const AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh";
async function fetchOpenRouterModels(): Promise<Model<any>[]> {
try {
console.log("Fetching models from OpenRouter API...");
@ -97,6 +114,64 @@ async function fetchOpenRouterModels(): Promise<Model<any>[]> {
}
}
async function fetchAiGatewayModels(): Promise<Model<any>[]> {
try {
console.log("Fetching models from Vercel AI Gateway API...");
const response = await fetch(`${AI_GATEWAY_MODELS_URL}/models`);
const data = await response.json();
const models: Model<any>[] = [];
const toNumber = (value: string | number | undefined): number => {
if (typeof value === "number") {
return Number.isFinite(value) ? value : 0;
}
const parsed = parseFloat(value ?? "0");
return Number.isFinite(parsed) ? parsed : 0;
};
const items = Array.isArray(data.data) ? (data.data as AiGatewayModel[]) : [];
for (const model of items) {
const tags = Array.isArray(model.tags) ? model.tags : [];
// Only include models that support tools
if (!tags.includes("tool-use")) continue;
const input: ("text" | "image")[] = ["text"];
if (tags.includes("vision")) {
input.push("image");
}
const inputCost = toNumber(model.pricing?.input) * 1_000_000;
const outputCost = toNumber(model.pricing?.output) * 1_000_000;
const cacheReadCost = toNumber(model.pricing?.input_cache_read) * 1_000_000;
const cacheWriteCost = toNumber(model.pricing?.input_cache_write) * 1_000_000;
models.push({
id: model.id,
name: model.name || model.id,
api: "anthropic-messages",
baseUrl: AI_GATEWAY_BASE_URL,
provider: "vercel-ai-gateway",
reasoning: tags.includes("reasoning"),
input,
cost: {
input: inputCost,
output: outputCost,
cacheRead: cacheReadCost,
cacheWrite: cacheWriteCost,
},
contextWindow: model.context_window || 4096,
maxTokens: model.max_tokens || 4096,
});
}
console.log(`Fetched ${models.length} tool-capable models from Vercel AI Gateway`);
return models;
} catch (error) {
console.error("Failed to fetch Vercel AI Gateway models:", error);
return [];
}
}
async function loadModelsDevData(): Promise<Model<any>[]> {
try {
console.log("Fetching models from models.dev API...");
@ -105,6 +180,87 @@ async function loadModelsDevData(): Promise<Model<any>[]> {
const models: Model<any>[] = [];
// Process Amazon Bedrock models
if (data["amazon-bedrock"]?.models) {
for (const [modelId, model] of Object.entries(data["amazon-bedrock"].models)) {
const m = model as ModelsDevModel;
if (m.tool_call !== true) continue;
let id = modelId;
if (id.startsWith("ai21.jamba")) {
// These models don't support tool use in streaming mode
continue;
}
if (id.startsWith("amazon.titan-text-express") ||
id.startsWith("mistral.mistral-7b-instruct-v0")) {
// These models don't support system messages
continue;
}
// Some Amazon Bedrock models require cross-region inference profiles to work.
// To use cross-region inference, we need to add a region prefix to the model IDs.
// See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system
// TODO: Remove Claude models once https://github.com/anomalyco/models.dev/pull/607 is merged, and follow-up with other models.
// Models with global cross-region inference profiles
if (id.startsWith("anthropic.claude-haiku-4-5") ||
id.startsWith("anthropic.claude-sonnet-4") ||
id.startsWith("anthropic.claude-opus-4-5") ||
id.startsWith("amazon.nova-2-lite") ||
id.startsWith("cohere.embed-v4") ||
id.startsWith("twelvelabs.pegasus-1-2")) {
id = "global." + id;
}
// Models with US cross-region inference profiles
if (id.startsWith("amazon.nova-lite") ||
id.startsWith("amazon.nova-micro") ||
id.startsWith("amazon.nova-premier") ||
id.startsWith("amazon.nova-pro") ||
id.startsWith("anthropic.claude-3-7-sonnet") ||
id.startsWith("anthropic.claude-opus-4-1") ||
id.startsWith("anthropic.claude-opus-4-20250514") ||
id.startsWith("deepseek.r1") ||
id.startsWith("meta.llama3-2") ||
id.startsWith("meta.llama3-3") ||
id.startsWith("meta.llama4")) {
id = "us." + id;
}
const bedrockModel = {
id,
name: m.name || id,
api: "bedrock-converse-stream" as const,
provider: "amazon-bedrock" as const,
baseUrl: "https://bedrock-runtime.us-east-1.amazonaws.com",
reasoning: m.reasoning === true,
input: (m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"]) as ("text" | "image")[],
cost: {
input: m.cost?.input || 0,
output: m.cost?.output || 0,
cacheRead: m.cost?.cache_read || 0,
cacheWrite: m.cost?.cache_write || 0,
},
contextWindow: m.limit?.context || 4096,
maxTokens: m.limit?.output || 4096,
};
models.push(bedrockModel);
// Add EU cross-region inference variants for Claude models
if (modelId.startsWith("anthropic.claude-haiku-4-5") ||
modelId.startsWith("anthropic.claude-sonnet-4-5") ||
modelId.startsWith("anthropic.claude-opus-4-5")) {
models.push({
...bedrockModel,
id: "eu." + modelId,
name: (m.name || modelId) + " (EU)",
});
}
}
}
// Process Anthropic models
if (data.anthropic?.models) {
for (const [modelId, model] of Object.entries(data.anthropic.models)) {
@ -284,6 +440,7 @@ async function loadModelsDevData(): Promise<Model<any>[]> {
},
compat: {
supportsDeveloperRole: false,
thinkingFormat: "zai",
},
contextWindow: m.limit?.context || 4096,
maxTokens: m.limit?.output || 4096,
@ -409,6 +566,33 @@ async function loadModelsDevData(): Promise<Model<any>[]> {
}
}
// Process MiniMax models
if (data.minimax?.models) {
for (const [modelId, model] of Object.entries(data.minimax.models)) {
const m = model as ModelsDevModel;
if (m.tool_call !== true) continue;
models.push({
id: modelId,
name: m.name || modelId,
api: "anthropic-messages",
provider: "minimax",
// MiniMax's Anthropic-compatible API - SDK appends /v1/messages
baseUrl: "https://api.minimax.io/anthropic",
reasoning: m.reasoning === true,
input: m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"],
cost: {
input: m.cost?.input || 0,
output: m.cost?.output || 0,
cacheRead: m.cost?.cache_read || 0,
cacheWrite: m.cost?.cache_write || 0,
},
contextWindow: m.limit?.context || 4096,
maxTokens: m.limit?.output || 4096,
});
}
}
console.log(`Loaded ${models.length} tool-capable models from models.dev`);
return models;
} catch (error) {
@ -421,11 +605,13 @@ async function generateModels() {
// Fetch models from all sources
// models.dev: Anthropic, Google, OpenAI, Groq, Cerebras
// OpenRouter: xAI and other providers (excluding Anthropic, Google, OpenAI)
// AI Gateway: OpenAI-compatible catalog with tool-capable models
const modelsDevModels = await loadModelsDevData();
const openRouterModels = await fetchOpenRouterModels();
const aiGatewayModels = await fetchAiGatewayModels();
// Combine models (models.dev has priority)
const allModels = [...modelsDevModels, ...openRouterModels];
const allModels = [...modelsDevModels, ...openRouterModels, ...aiGatewayModels];
// Fix incorrect cache pricing for Claude Opus 4.5 from models.dev
// models.dev has 3x the correct pricing (1.5/18.75 instead of 0.5/6.25)

File diff suppressed because it is too large


@ -0,0 +1,548 @@
import {
BedrockRuntimeClient,
StopReason as BedrockStopReason,
type Tool as BedrockTool,
CachePointType,
type ContentBlock,
type ContentBlockDeltaEvent,
type ContentBlockStartEvent,
type ContentBlockStopEvent,
ConversationRole,
ConverseStreamCommand,
type ConverseStreamMetadataEvent,
ImageFormat,
type Message,
type SystemContentBlock,
type ToolChoice,
type ToolConfiguration,
ToolResultStatus,
} from "@aws-sdk/client-bedrock-runtime";
import { calculateCost } from "../models.js";
import type {
Api,
AssistantMessage,
Context,
Model,
StopReason,
StreamFunction,
StreamOptions,
TextContent,
ThinkingBudgets,
ThinkingContent,
ThinkingLevel,
Tool,
ToolCall,
ToolResultMessage,
} from "../types.js";
import { AssistantMessageEventStream } from "../utils/event-stream.js";
import { parseStreamingJson } from "../utils/json-parse.js";
import { sanitizeSurrogates } from "../utils/sanitize-unicode.js";
export interface BedrockOptions extends StreamOptions {
region?: string;
profile?: string;
toolChoice?: "auto" | "any" | "none" | { type: "tool"; name: string };
/* See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-reasoning.html for supported models. */
reasoning?: ThinkingLevel;
/* Custom token budgets per thinking level. Overrides default budgets. */
thinkingBudgets?: ThinkingBudgets;
/* Only supported by Claude 4.x models, see https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved */
interleavedThinking?: boolean;
}
type Block = (TextContent | ThinkingContent | ToolCall) & { index?: number; partialJson?: string };
export const streamBedrock: StreamFunction<"bedrock-converse-stream"> = (
model: Model<"bedrock-converse-stream">,
context: Context,
options: BedrockOptions,
): AssistantMessageEventStream => {
const stream = new AssistantMessageEventStream();
(async () => {
const output: AssistantMessage = {
role: "assistant",
content: [],
api: "bedrock-converse-stream" as Api,
provider: model.provider,
model: model.id,
usage: {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
totalTokens: 0,
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
},
stopReason: "stop",
timestamp: Date.now(),
};
const blocks = output.content as Block[];
try {
const client = new BedrockRuntimeClient({
region: options.region || process.env.AWS_REGION || process.env.AWS_DEFAULT_REGION || "us-east-1",
profile: options.profile,
});
const command = new ConverseStreamCommand({
modelId: model.id,
messages: convertMessages(context, model),
system: buildSystemPrompt(context.systemPrompt, model),
inferenceConfig: { maxTokens: options.maxTokens, temperature: options.temperature },
toolConfig: convertToolConfig(context.tools, options.toolChoice),
additionalModelRequestFields: buildAdditionalModelRequestFields(model, options),
});
const response = await client.send(command, { abortSignal: options.signal });
for await (const item of response.stream!) {
if (item.messageStart) {
if (item.messageStart.role !== ConversationRole.ASSISTANT) {
throw new Error("Expected assistant message start but got user message start instead");
}
stream.push({ type: "start", partial: output });
} else if (item.contentBlockStart) {
handleContentBlockStart(item.contentBlockStart, blocks, output, stream);
} else if (item.contentBlockDelta) {
handleContentBlockDelta(item.contentBlockDelta, blocks, output, stream);
} else if (item.contentBlockStop) {
handleContentBlockStop(item.contentBlockStop, blocks, output, stream);
} else if (item.messageStop) {
output.stopReason = mapStopReason(item.messageStop.stopReason);
} else if (item.metadata) {
handleMetadata(item.metadata, model, output);
} else if (item.internalServerException) {
throw new Error(`Internal server error: ${item.internalServerException.message}`);
} else if (item.modelStreamErrorException) {
throw new Error(`Model stream error: ${item.modelStreamErrorException.message}`);
} else if (item.validationException) {
throw new Error(`Validation error: ${item.validationException.message}`);
} else if (item.throttlingException) {
throw new Error(`Throttling error: ${item.throttlingException.message}`);
} else if (item.serviceUnavailableException) {
throw new Error(`Service unavailable: ${item.serviceUnavailableException.message}`);
}
}
if (options.signal?.aborted) {
throw new Error("Request was aborted");
}
if (output.stopReason === "error" || output.stopReason === "aborted") {
throw new Error("An unknown error occurred");
}
stream.push({ type: "done", reason: output.stopReason, message: output });
stream.end();
} catch (error) {
for (const block of output.content) {
delete (block as Block).index;
delete (block as Block).partialJson;
}
output.stopReason = options.signal?.aborted ? "aborted" : "error";
output.errorMessage = error instanceof Error ? error.message : JSON.stringify(error);
stream.push({ type: "error", reason: output.stopReason, error: output });
stream.end();
}
})();
return stream;
};
function handleContentBlockStart(
event: ContentBlockStartEvent,
blocks: Block[],
output: AssistantMessage,
stream: AssistantMessageEventStream,
): void {
const index = event.contentBlockIndex!;
const start = event.start;
if (start?.toolUse) {
const block: Block = {
type: "toolCall",
id: start.toolUse.toolUseId || "",
name: start.toolUse.name || "",
arguments: {},
partialJson: "",
index,
};
output.content.push(block);
stream.push({ type: "toolcall_start", contentIndex: blocks.length - 1, partial: output });
}
}
function handleContentBlockDelta(
event: ContentBlockDeltaEvent,
blocks: Block[],
output: AssistantMessage,
stream: AssistantMessageEventStream,
): void {
const contentBlockIndex = event.contentBlockIndex!;
const delta = event.delta;
let index = blocks.findIndex((b) => b.index === contentBlockIndex);
let block = blocks[index];
if (delta?.text !== undefined) {
// If no text block exists yet, create one, since Bedrock sends no `contentBlockStart` event for text blocks
if (!block) {
const newBlock: Block = { type: "text", text: "", index: contentBlockIndex };
output.content.push(newBlock);
index = blocks.length - 1;
block = blocks[index];
stream.push({ type: "text_start", contentIndex: index, partial: output });
}
if (block.type === "text") {
block.text += delta.text;
stream.push({ type: "text_delta", contentIndex: index, delta: delta.text, partial: output });
}
} else if (delta?.toolUse && block?.type === "toolCall") {
block.partialJson = (block.partialJson || "") + (delta.toolUse.input || "");
block.arguments = parseStreamingJson(block.partialJson);
stream.push({ type: "toolcall_delta", contentIndex: index, delta: delta.toolUse.input || "", partial: output });
} else if (delta?.reasoningContent) {
let thinkingBlock = block;
let thinkingIndex = index;
if (!thinkingBlock) {
const newBlock: Block = { type: "thinking", thinking: "", thinkingSignature: "", index: contentBlockIndex };
output.content.push(newBlock);
thinkingIndex = blocks.length - 1;
thinkingBlock = blocks[thinkingIndex];
stream.push({ type: "thinking_start", contentIndex: thinkingIndex, partial: output });
}
if (thinkingBlock?.type === "thinking") {
if (delta.reasoningContent.text) {
thinkingBlock.thinking += delta.reasoningContent.text;
stream.push({
type: "thinking_delta",
contentIndex: thinkingIndex,
delta: delta.reasoningContent.text,
partial: output,
});
}
if (delta.reasoningContent.signature) {
thinkingBlock.thinkingSignature =
(thinkingBlock.thinkingSignature || "") + delta.reasoningContent.signature;
}
}
}
}
function handleMetadata(
event: ConverseStreamMetadataEvent,
model: Model<"bedrock-converse-stream">,
output: AssistantMessage,
): void {
if (event.usage) {
output.usage.input = event.usage.inputTokens || 0;
output.usage.output = event.usage.outputTokens || 0;
output.usage.cacheRead = event.usage.cacheReadInputTokens || 0;
output.usage.cacheWrite = event.usage.cacheWriteInputTokens || 0;
output.usage.totalTokens = event.usage.totalTokens || output.usage.input + output.usage.output;
calculateCost(model, output.usage);
}
}
function handleContentBlockStop(
event: ContentBlockStopEvent,
blocks: Block[],
output: AssistantMessage,
stream: AssistantMessageEventStream,
): void {
const index = blocks.findIndex((b) => b.index === event.contentBlockIndex);
const block = blocks[index];
if (!block) return;
delete (block as Block).index;
switch (block.type) {
case "text":
stream.push({ type: "text_end", contentIndex: index, content: block.text, partial: output });
break;
case "thinking":
stream.push({ type: "thinking_end", contentIndex: index, content: block.thinking, partial: output });
break;
case "toolCall":
block.arguments = parseStreamingJson(block.partialJson);
delete (block as Block).partialJson;
stream.push({ type: "toolcall_end", contentIndex: index, toolCall: block, partial: output });
break;
}
}
/**
* Check if the model supports prompt caching.
* Supported: Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4.x models
*/
function supportsPromptCaching(model: Model<"bedrock-converse-stream">): boolean {
const id = model.id.toLowerCase();
// Claude 4.x models (opus-4, sonnet-4, haiku-4)
if (id.includes("claude") && (id.includes("-4-") || id.includes("-4."))) return true;
// Claude 3.7 Sonnet
if (id.includes("claude-3-7-sonnet")) return true;
// Claude 3.5 Haiku
if (id.includes("claude-3-5-haiku")) return true;
return false;
}
function buildSystemPrompt(
systemPrompt: string | undefined,
model: Model<"bedrock-converse-stream">,
): SystemContentBlock[] | undefined {
if (!systemPrompt) return undefined;
const blocks: SystemContentBlock[] = [{ text: sanitizeSurrogates(systemPrompt) }];
// Add cache point for supported Claude models
if (supportsPromptCaching(model)) {
blocks.push({ cachePoint: { type: CachePointType.DEFAULT } });
}
return blocks;
}
function convertMessages(context: Context, model: Model<"bedrock-converse-stream">): Message[] {
const result: Message[] = [];
const messages = context.messages;
for (let i = 0; i < messages.length; i++) {
const m = messages[i];
switch (m.role) {
case "user":
result.push({
role: ConversationRole.USER,
content:
typeof m.content === "string"
? [{ text: sanitizeSurrogates(m.content) }]
: m.content.map((c) => {
switch (c.type) {
case "text":
return { text: sanitizeSurrogates(c.text) };
case "image":
return { image: createImageBlock(c.mimeType, c.data) };
default:
throw new Error("Unknown user content type");
}
}),
});
break;
case "assistant": {
// Skip assistant messages with empty content (e.g., from aborted requests)
// Bedrock rejects messages with empty content arrays
if (m.content.length === 0) {
continue;
}
const contentBlocks: ContentBlock[] = [];
for (const c of m.content) {
switch (c.type) {
case "text":
// Skip empty text blocks
if (c.text.trim().length === 0) continue;
contentBlocks.push({ text: sanitizeSurrogates(c.text) });
break;
case "toolCall":
contentBlocks.push({
toolUse: { toolUseId: c.id, name: c.name, input: c.arguments },
});
break;
case "thinking":
// Skip empty thinking blocks
if (c.thinking.trim().length === 0) continue;
contentBlocks.push({
reasoningContent: {
reasoningText: { text: sanitizeSurrogates(c.thinking), signature: c.thinkingSignature },
},
});
break;
default:
throw new Error("Unknown assistant content type");
}
}
// Skip if all content blocks were filtered out
if (contentBlocks.length === 0) {
continue;
}
result.push({
role: ConversationRole.ASSISTANT,
content: contentBlocks,
});
break;
}
case "toolResult": {
// Collect all consecutive toolResult messages into a single user message
// Bedrock requires all tool results to be in one message
const toolResults: ContentBlock.ToolResultMember[] = [];
// Add current tool result with all content blocks combined
toolResults.push({
toolResult: {
toolUseId: m.toolCallId,
content: m.content.map((c) =>
c.type === "image"
? { image: createImageBlock(c.mimeType, c.data) }
: { text: sanitizeSurrogates(c.text) },
),
status: m.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
},
});
// Look ahead for consecutive toolResult messages
let j = i + 1;
while (j < messages.length && messages[j].role === "toolResult") {
const nextMsg = messages[j] as ToolResultMessage;
toolResults.push({
toolResult: {
toolUseId: nextMsg.toolCallId,
content: nextMsg.content.map((c) =>
c.type === "image"
? { image: createImageBlock(c.mimeType, c.data) }
: { text: sanitizeSurrogates(c.text) },
),
status: nextMsg.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
},
});
j++;
}
// Skip the messages we've already processed
i = j - 1;
result.push({
role: ConversationRole.USER,
content: toolResults,
});
break;
}
default:
throw new Error("Unknown message role");
}
}
// Add cache point to the last user message for supported Claude models
if (supportsPromptCaching(model) && result.length > 0) {
const lastMessage = result[result.length - 1];
if (lastMessage.role === ConversationRole.USER && lastMessage.content) {
(lastMessage.content as ContentBlock[]).push({ cachePoint: { type: CachePointType.DEFAULT } });
}
}
return result;
}
function convertToolConfig(
tools: Tool[] | undefined,
toolChoice: BedrockOptions["toolChoice"],
): ToolConfiguration | undefined {
if (!tools?.length || toolChoice === "none") return undefined;
const bedrockTools: BedrockTool[] = tools.map((tool) => ({
toolSpec: {
name: tool.name,
description: tool.description,
inputSchema: { json: tool.parameters },
},
}));
let bedrockToolChoice: ToolChoice | undefined;
switch (toolChoice) {
case "auto":
bedrockToolChoice = { auto: {} };
break;
case "any":
bedrockToolChoice = { any: {} };
break;
default:
if (toolChoice?.type === "tool") {
bedrockToolChoice = { tool: { name: toolChoice.name } };
}
}
return { tools: bedrockTools, toolChoice: bedrockToolChoice };
}
function mapStopReason(reason: string | undefined): StopReason {
switch (reason) {
case BedrockStopReason.END_TURN:
case BedrockStopReason.STOP_SEQUENCE:
return "stop";
case BedrockStopReason.MAX_TOKENS:
case BedrockStopReason.MODEL_CONTEXT_WINDOW_EXCEEDED:
return "length";
case BedrockStopReason.TOOL_USE:
return "toolUse";
default:
return "error";
}
}
function buildAdditionalModelRequestFields(
model: Model<"bedrock-converse-stream">,
options: BedrockOptions,
): Record<string, any> | undefined {
if (!options.reasoning || !model.reasoning) {
return undefined;
}
if (model.id.includes("anthropic.claude")) {
const defaultBudgets: Record<ThinkingLevel, number> = {
minimal: 1024,
low: 2048,
medium: 8192,
high: 16384,
xhigh: 16384, // Claude doesn't support xhigh, clamp to high
};
// Custom budgets override defaults (xhigh not in ThinkingBudgets, use high)
const level = options.reasoning === "xhigh" ? "high" : options.reasoning;
const budget = options.thinkingBudgets?.[level] ?? defaultBudgets[options.reasoning];
const result: Record<string, any> = {
thinking: {
type: "enabled",
budget_tokens: budget,
},
};
if (options.interleavedThinking) {
result.anthropic_beta = ["interleaved-thinking-2025-05-14"];
}
return result;
}
return undefined;
}
function createImageBlock(mimeType: string, data: string) {
let format: ImageFormat;
switch (mimeType) {
case "image/jpeg":
case "image/jpg":
format = ImageFormat.JPEG;
break;
case "image/png":
format = ImageFormat.PNG;
break;
case "image/gif":
format = ImageFormat.GIF;
break;
case "image/webp":
format = ImageFormat.WEBP;
break;
default:
throw new Error(`Unknown image type: ${mimeType}`);
}
const binaryString = atob(data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
return { source: { bytes }, format };
}
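
The tool-result handling in `convertMessages` rests on one idea: consecutive `toolResult` messages are merged into a single user message. The sketch below isolates that grouping with simplified, hypothetical message shapes (`Msg`, `Grouped`, `ToolResultEntry`) and a hypothetical `groupToolResults` helper; it uses a pending buffer instead of the look-ahead index used in the provider, which is a different but equivalent way to express the same grouping.

```typescript
// Simplified message shapes; the real types live in src/types.ts.
type Msg =
	| { role: "user" | "assistant"; text: string }
	| { role: "toolResult"; toolCallId: string; text: string };

interface ToolResultEntry {
	toolUseId: string;
	text: string;
}

type Grouped =
	| { role: "user" | "assistant"; text: string }
	| { role: "user"; toolResults: ToolResultEntry[] };

function groupToolResults(messages: Msg[]): Grouped[] {
	const result: Grouped[] = [];
	let pending: ToolResultEntry[] = [];

	// Emit the buffered tool results as one user message, if there are any.
	const flush = (): void => {
		if (pending.length > 0) {
			result.push({ role: "user", toolResults: pending });
			pending = [];
		}
	};

	for (const m of messages) {
		if (m.role === "toolResult") {
			// Keep collecting while tool results follow each other directly.
			pending.push({ toolUseId: m.toolCallId, text: m.text });
		} else {
			flush();
			result.push(m);
		}
	}
	flush();
	return result;
}
```

Two tool results separated by an assistant message end up in two separate user messages, since only directly adjacent results are merged.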


@ -287,7 +287,7 @@ export const streamAnthropic: StreamFunction<"anthropic-messages"> = (
}
if (output.stopReason === "aborted" || output.stopReason === "error") {
throw new Error("An unkown error ocurred");
throw new Error("An unknown error occurred");
}
stream.push({ type: "done", reason: output.stopReason, message: output });


@ -4,6 +4,7 @@
* Uses the Cloud Code Assist API endpoint to access Gemini and Claude models.
*/
import { createHash } from "node:crypto";
import type { Content, ThinkingConfig } from "@google/genai";
import { calculateCost } from "../models.js";
import type {
@ -54,6 +55,8 @@ export interface GoogleGeminiCliOptions extends StreamOptions {
}
const DEFAULT_ENDPOINT = "https://cloudcode-pa.googleapis.com";
const ANTIGRAVITY_DAILY_ENDPOINT = "https://daily-cloudcode-pa.sandbox.googleapis.com";
const ANTIGRAVITY_ENDPOINT_FALLBACKS = [ANTIGRAVITY_DAILY_ENDPOINT, DEFAULT_ENDPOINT] as const;
// Headers for Gemini CLI (prod endpoint)
const GEMINI_CLI_HEADERS = {
"User-Agent": "google-cloud-sdk vscode_cloudshelleditor/0.1",
@ -163,16 +166,66 @@ let toolCallCounter = 0;
// Retry configuration
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000;
const MAX_EMPTY_STREAM_RETRIES = 2;
const EMPTY_STREAM_BASE_DELAY_MS = 500;
const CLAUDE_THINKING_BETA_HEADER = "interleaved-thinking-2025-05-14";
/**
* Extract retry delay from Gemini error response (in milliseconds).
* Parses patterns like:
* Checks headers first (Retry-After, x-ratelimit-reset, x-ratelimit-reset-after),
* then parses body patterns like:
* - "Your quota will reset after 39s"
* - "Your quota will reset after 18h31m10s"
* - "Please retry in Xs" or "Please retry in Xms"
* - "retryDelay": "34.074824224s" (JSON field)
*/
function extractRetryDelay(errorText: string): number | undefined {
export function extractRetryDelay(errorText: string, response?: Response | Headers): number | undefined {
const normalizeDelay = (ms: number): number | undefined => (ms > 0 ? Math.ceil(ms + 1000) : undefined);
const headers = response instanceof Headers ? response : response?.headers;
if (headers) {
const retryAfter = headers.get("retry-after");
if (retryAfter) {
const retryAfterSeconds = Number(retryAfter);
if (Number.isFinite(retryAfterSeconds)) {
const delay = normalizeDelay(retryAfterSeconds * 1000);
if (delay !== undefined) {
return delay;
}
}
const retryAfterDate = new Date(retryAfter);
const retryAfterMs = retryAfterDate.getTime();
if (!Number.isNaN(retryAfterMs)) {
const delay = normalizeDelay(retryAfterMs - Date.now());
if (delay !== undefined) {
return delay;
}
}
}
const rateLimitReset = headers.get("x-ratelimit-reset");
if (rateLimitReset) {
const resetSeconds = Number.parseInt(rateLimitReset, 10);
if (!Number.isNaN(resetSeconds)) {
const delay = normalizeDelay(resetSeconds * 1000 - Date.now());
if (delay !== undefined) {
return delay;
}
}
}
const rateLimitResetAfter = headers.get("x-ratelimit-reset-after");
if (rateLimitResetAfter) {
const resetAfterSeconds = Number(rateLimitResetAfter);
if (Number.isFinite(resetAfterSeconds)) {
const delay = normalizeDelay(resetAfterSeconds * 1000);
if (delay !== undefined) {
return delay;
}
}
}
}
// Pattern 1: "Your quota will reset after ..." (formats: "18h31m10s", "10m15s", "6s", "39s")
const durationMatch = errorText.match(/reset after (?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s/i);
if (durationMatch) {
@ -181,8 +234,9 @@ function extractRetryDelay(errorText: string): number | undefined {
const seconds = parseFloat(durationMatch[3]);
if (!Number.isNaN(seconds)) {
const totalMs = ((hours * 60 + minutes) * 60 + seconds) * 1000;
if (totalMs > 0) {
return Math.ceil(totalMs + 1000); // Add 1s buffer
const delay = normalizeDelay(totalMs);
if (delay !== undefined) {
return delay;
}
}
}
@ -193,7 +247,10 @@ function extractRetryDelay(errorText: string): number | undefined {
const value = parseFloat(retryInMatch[1]);
if (!Number.isNaN(value) && value > 0) {
const ms = retryInMatch[2].toLowerCase() === "ms" ? value : value * 1000;
return Math.ceil(ms + 1000);
const delay = normalizeDelay(ms);
if (delay !== undefined) {
return delay;
}
}
}
@ -203,21 +260,45 @@ function extractRetryDelay(errorText: string): number | undefined {
const value = parseFloat(retryDelayMatch[1]);
if (!Number.isNaN(value) && value > 0) {
const ms = retryDelayMatch[2].toLowerCase() === "ms" ? value : value * 1000;
return Math.ceil(ms + 1000);
const delay = normalizeDelay(ms);
if (delay !== undefined) {
return delay;
}
}
}
return undefined;
}
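The `Retry-After` branch of `extractRetryDelay` accepts either a number of seconds or an HTTP date. A simplified sketch of just that branch follows. `parseRetryAfter` and its `now` parameter are hypothetical names introduced for illustration, and unlike the function above this version does not fall through to the date form when the numeric value is non-positive.

```typescript
// Hypothetical, simplified version of the Retry-After handling shown above.
// Returns a delay in milliseconds including a 1s buffer, or undefined.
function parseRetryAfter(headers: Headers, now: number = Date.now()): number | undefined {
	const raw = headers.get("retry-after");
	if (!raw) return undefined;

	// Form 1: delay in seconds, e.g. "Retry-After: 5"
	const seconds = Number(raw);
	if (Number.isFinite(seconds)) {
		return seconds > 0 ? Math.ceil(seconds * 1000 + 1000) : undefined;
	}

	// Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
	const dateMs = new Date(raw).getTime();
	if (!Number.isNaN(dateMs)) {
		const ms = dateMs - now;
		return ms > 0 ? Math.ceil(ms + 1000) : undefined;
	}
	return undefined;
}

console.log(parseRetryAfter(new Headers({ "retry-after": "5" }))); // 6000
```

Passing `now` explicitly keeps the date branch deterministic in tests.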
function isClaudeThinkingModel(modelId: string): boolean {
const normalized = modelId.toLowerCase();
return normalized.includes("claude") && normalized.includes("thinking");
}
/**
* Check if an error is retryable (rate limit, server error, etc.)
* Check if an error is retryable (rate limit, server error, network error, etc.)
*/
function isRetryableError(status: number, errorText: string): boolean {
if (status === 429 || status === 500 || status === 502 || status === 503 || status === 504) {
return true;
}
return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable/i.test(errorText);
return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable|other.?side.?closed/i.test(errorText);
}
/**
* Extract a clean, user-friendly error message from Google API error response.
* Parses JSON error responses and returns just the message field.
*/
function extractErrorMessage(errorText: string): string {
try {
const parsed = JSON.parse(errorText) as { error?: { message?: string } };
if (parsed.error?.message) {
return parsed.error.message;
}
} catch {
// Not JSON, return as-is
}
return errorText;
}
/**
@ -242,6 +323,7 @@ interface CloudCodeAssistRequest {
model: string;
request: {
contents: Content[];
sessionId?: string;
systemInstruction?: { role?: string; parts: { text: string }[] };
generationConfig?: {
maxOutputTokens?: number;
@ -339,17 +421,26 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
throw new Error("Missing token or projectId in Google Cloud credentials. Use /login to re-authenticate.");
}
const endpoint = model.baseUrl || DEFAULT_ENDPOINT;
const url = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
const isAntigravity = model.provider === "google-antigravity";
const baseUrl = model.baseUrl?.trim();
const endpoints = baseUrl ? [baseUrl] : isAntigravity ? ANTIGRAVITY_ENDPOINT_FALLBACKS : [DEFAULT_ENDPOINT];
// Use Antigravity headers for sandbox endpoint, otherwise Gemini CLI headers
const isAntigravity = endpoint.includes("sandbox.googleapis.com");
const requestBody = buildRequest(model, context, projectId, options, isAntigravity);
const headers = isAntigravity ? ANTIGRAVITY_HEADERS : GEMINI_CLI_HEADERS;
const requestHeaders = {
Authorization: `Bearer ${accessToken}`,
"Content-Type": "application/json",
Accept: "text/event-stream",
...headers,
...(isClaudeThinkingModel(model.id) ? { "anthropic-beta": CLAUDE_THINKING_BETA_HEADER } : {}),
};
const requestBodyJson = JSON.stringify(requestBody);
// Fetch with retry logic for rate limits and transient errors
let response: Response | undefined;
let lastError: Error | undefined;
let requestUrl: string | undefined;
for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
if (options?.signal?.aborted) {
@ -357,15 +448,12 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
}
try {
response = await fetch(url, {
const endpoint = endpoints[Math.min(attempt, endpoints.length - 1)];
requestUrl = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
response = await fetch(requestUrl, {
method: "POST",
headers: {
Authorization: `Bearer ${accessToken}`,
"Content-Type": "application/json",
Accept: "text/event-stream",
...headers,
},
body: JSON.stringify(requestBody),
headers: requestHeaders,
body: requestBodyJson,
signal: options?.signal,
});
@ -378,14 +466,14 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
// Check if retryable
if (attempt < MAX_RETRIES && isRetryableError(response.status, errorText)) {
// Use server-provided delay or exponential backoff
const serverDelay = extractRetryDelay(errorText);
const serverDelay = extractRetryDelay(errorText, response);
const delayMs = serverDelay ?? BASE_DELAY_MS * 2 ** attempt;
await sleep(delayMs, options?.signal);
continue;
}
// Not retryable or max retries exceeded
throw new Error(`Cloud Code Assist API error (${response.status}): ${errorText}`);
throw new Error(`Cloud Code Assist API error (${response.status}): ${extractErrorMessage(errorText)}`);
} catch (error) {
// Check for abort - fetch throws AbortError, our code throws "Request was aborted"
if (error instanceof Error) {
@ -393,7 +481,11 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
throw new Error("Request was aborted");
}
}
// Extract detailed error message from fetch errors (Node includes cause)
lastError = error instanceof Error ? error : new Error(String(error));
if (lastError.message === "fetch failed" && lastError.cause instanceof Error) {
lastError = new Error(`Network error: ${lastError.cause.message}`);
}
// Network errors are retryable
if (attempt < MAX_RETRIES) {
const delayMs = BASE_DELAY_MS * 2 ** attempt;
@ -408,73 +500,160 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
throw lastError ?? new Error("Failed to get response after retries");
}
if (!response.body) {
throw new Error("No response body");
}
stream.push({ type: "start", partial: output });
let currentBlock: TextContent | ThinkingContent | null = null;
const blocks = output.content;
const blockIndex = () => blocks.length - 1;
// Read SSE stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
// Set up abort handler to cancel reader when signal fires
const abortHandler = () => {
void reader.cancel().catch(() => {});
let started = false;
const ensureStarted = () => {
if (!started) {
stream.push({ type: "start", partial: output });
started = true;
}
};
options?.signal?.addEventListener("abort", abortHandler);
try {
while (true) {
// Check abort signal before each read
if (options?.signal?.aborted) {
throw new Error("Request was aborted");
}
const resetOutput = () => {
output.content = [];
output.usage = {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
totalTokens: 0,
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
};
output.stopReason = "stop";
output.errorMessage = undefined;
output.timestamp = Date.now();
started = false;
};
const { done, value } = await reader.read();
if (done) break;
const streamResponse = async (activeResponse: Response): Promise<boolean> => {
if (!activeResponse.body) {
throw new Error("No response body");
}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
let hasContent = false;
let currentBlock: TextContent | ThinkingContent | null = null;
const blocks = output.content;
const blockIndex = () => blocks.length - 1;
for (const line of lines) {
if (!line.startsWith("data:")) continue;
// Read SSE stream
const reader = activeResponse.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
const jsonStr = line.slice(5).trim();
if (!jsonStr) continue;
// Set up abort handler to cancel reader when signal fires
const abortHandler = () => {
void reader.cancel().catch(() => {});
};
options?.signal?.addEventListener("abort", abortHandler);
let chunk: CloudCodeAssistResponseChunk;
try {
chunk = JSON.parse(jsonStr);
} catch {
continue;
try {
while (true) {
// Check abort signal before each read
if (options?.signal?.aborted) {
throw new Error("Request was aborted");
}
// Unwrap the response
const responseData = chunk.response;
if (!responseData) continue;
const { done, value } = await reader.read();
if (done) break;
const candidate = responseData.candidates?.[0];
if (candidate?.content?.parts) {
for (const part of candidate.content.parts) {
if (part.text !== undefined) {
const isThinking = isThinkingPart(part);
if (
!currentBlock ||
(isThinking && currentBlock.type !== "thinking") ||
(!isThinking && currentBlock.type !== "text")
) {
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
for (const line of lines) {
if (!line.startsWith("data:")) continue;
const jsonStr = line.slice(5).trim();
if (!jsonStr) continue;
let chunk: CloudCodeAssistResponseChunk;
try {
chunk = JSON.parse(jsonStr);
} catch {
continue;
}
// Unwrap the response
const responseData = chunk.response;
if (!responseData) continue;
const candidate = responseData.candidates?.[0];
if (candidate?.content?.parts) {
for (const part of candidate.content.parts) {
if (part.text !== undefined) {
hasContent = true;
const isThinking = isThinkingPart(part);
if (
!currentBlock ||
(isThinking && currentBlock.type !== "thinking") ||
(!isThinking && currentBlock.type !== "text")
) {
if (currentBlock) {
if (currentBlock.type === "text") {
stream.push({
type: "text_end",
contentIndex: blocks.length - 1,
content: currentBlock.text,
partial: output,
});
} else {
stream.push({
type: "thinking_end",
contentIndex: blockIndex(),
content: currentBlock.thinking,
partial: output,
});
}
}
if (isThinking) {
currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
output.content.push(currentBlock);
ensureStarted();
stream.push({
type: "thinking_start",
contentIndex: blockIndex(),
partial: output,
});
} else {
currentBlock = { type: "text", text: "" };
output.content.push(currentBlock);
ensureStarted();
stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
}
}
if (currentBlock.type === "thinking") {
currentBlock.thinking += part.text;
currentBlock.thinkingSignature = retainThoughtSignature(
currentBlock.thinkingSignature,
part.thoughtSignature,
);
stream.push({
type: "thinking_delta",
contentIndex: blockIndex(),
delta: part.text,
partial: output,
});
} else {
currentBlock.text += part.text;
currentBlock.textSignature = retainThoughtSignature(
currentBlock.textSignature,
part.thoughtSignature,
);
stream.push({
type: "text_delta",
contentIndex: blockIndex(),
delta: part.text,
partial: output,
});
}
}
if (part.functionCall) {
hasContent = true;
if (currentBlock) {
if (currentBlock.type === "text") {
stream.push({
type: "text_end",
contentIndex: blocks.length - 1,
contentIndex: blockIndex(),
content: currentBlock.text,
partial: output,
});
@ -486,143 +665,142 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
partial: output,
});
}
currentBlock = null;
}
if (isThinking) {
currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
output.content.push(currentBlock);
stream.push({ type: "thinking_start", contentIndex: blockIndex(), partial: output });
} else {
currentBlock = { type: "text", text: "" };
output.content.push(currentBlock);
stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
}
}
if (currentBlock.type === "thinking") {
currentBlock.thinking += part.text;
currentBlock.thinkingSignature = retainThoughtSignature(
currentBlock.thinkingSignature,
part.thoughtSignature,
);
const providedId = part.functionCall.id;
const needsNewId =
!providedId ||
output.content.some((b) => b.type === "toolCall" && b.id === providedId);
const toolCallId = needsNewId
? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
: providedId;
const toolCall: ToolCall = {
type: "toolCall",
id: toolCallId,
name: part.functionCall.name || "",
arguments: part.functionCall.args as Record<string, unknown>,
...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
};
output.content.push(toolCall);
ensureStarted();
stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
stream.push({
type: "thinking_delta",
type: "toolcall_delta",
contentIndex: blockIndex(),
delta: part.text,
delta: JSON.stringify(toolCall.arguments),
partial: output,
});
} else {
currentBlock.text += part.text;
currentBlock.textSignature = retainThoughtSignature(
currentBlock.textSignature,
part.thoughtSignature,
);
stream.push({
type: "text_delta",
type: "toolcall_end",
contentIndex: blockIndex(),
delta: part.text,
toolCall,
partial: output,
});
}
}
}
if (part.functionCall) {
if (currentBlock) {
if (currentBlock.type === "text") {
stream.push({
type: "text_end",
contentIndex: blockIndex(),
content: currentBlock.text,
partial: output,
});
} else {
stream.push({
type: "thinking_end",
contentIndex: blockIndex(),
content: currentBlock.thinking,
partial: output,
});
}
currentBlock = null;
}
const providedId = part.functionCall.id;
const needsNewId =
!providedId || output.content.some((b) => b.type === "toolCall" && b.id === providedId);
const toolCallId = needsNewId
? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
: providedId;
const toolCall: ToolCall = {
type: "toolCall",
id: toolCallId,
name: part.functionCall.name || "",
arguments: part.functionCall.args as Record<string, unknown>,
...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
};
output.content.push(toolCall);
stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
stream.push({
type: "toolcall_delta",
contentIndex: blockIndex(),
delta: JSON.stringify(toolCall.arguments),
partial: output,
});
stream.push({ type: "toolcall_end", contentIndex: blockIndex(), toolCall, partial: output });
if (candidate?.finishReason) {
output.stopReason = mapStopReasonString(candidate.finishReason);
if (output.content.some((b) => b.type === "toolCall")) {
output.stopReason = "toolUse";
}
}
}
if (candidate?.finishReason) {
output.stopReason = mapStopReasonString(candidate.finishReason);
if (output.content.some((b) => b.type === "toolCall")) {
output.stopReason = "toolUse";
}
}
if (responseData.usageMetadata) {
// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
output.usage = {
input: promptTokens - cacheReadTokens,
output:
(responseData.usageMetadata.candidatesTokenCount || 0) +
(responseData.usageMetadata.thoughtsTokenCount || 0),
cacheRead: cacheReadTokens,
cacheWrite: 0,
totalTokens: responseData.usageMetadata.totalTokenCount || 0,
cost: {
input: 0,
output: 0,
cacheRead: 0,
if (responseData.usageMetadata) {
// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
output.usage = {
input: promptTokens - cacheReadTokens,
output:
(responseData.usageMetadata.candidatesTokenCount || 0) +
(responseData.usageMetadata.thoughtsTokenCount || 0),
cacheRead: cacheReadTokens,
cacheWrite: 0,
total: 0,
},
};
calculateCost(model, output.usage);
totalTokens: responseData.usageMetadata.totalTokenCount || 0,
cost: {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
total: 0,
},
};
calculateCost(model, output.usage);
}
}
}
} finally {
options?.signal?.removeEventListener("abort", abortHandler);
}
if (currentBlock) {
if (currentBlock.type === "text") {
stream.push({
type: "text_end",
contentIndex: blockIndex(),
content: currentBlock.text,
partial: output,
});
} else {
stream.push({
type: "thinking_end",
contentIndex: blockIndex(),
content: currentBlock.thinking,
partial: output,
});
}
}
return hasContent;
};
let receivedContent = false;
let currentResponse = response;
for (let emptyAttempt = 0; emptyAttempt <= MAX_EMPTY_STREAM_RETRIES; emptyAttempt++) {
if (options?.signal?.aborted) {
throw new Error("Request was aborted");
}
if (emptyAttempt > 0) {
const backoffMs = EMPTY_STREAM_BASE_DELAY_MS * 2 ** (emptyAttempt - 1);
await sleep(backoffMs, options?.signal);
if (!requestUrl) {
throw new Error("Missing request URL");
}
currentResponse = await fetch(requestUrl, {
method: "POST",
headers: requestHeaders,
body: requestBodyJson,
signal: options?.signal,
});
if (!currentResponse.ok) {
const retryErrorText = await currentResponse.text();
throw new Error(`Cloud Code Assist API error (${currentResponse.status}): ${retryErrorText}`);
}
}
const streamed = await streamResponse(currentResponse);
if (streamed) {
receivedContent = true;
break;
}
if (emptyAttempt < MAX_EMPTY_STREAM_RETRIES) {
resetOutput();
}
} finally {
options?.signal?.removeEventListener("abort", abortHandler);
}
if (currentBlock) {
if (currentBlock.type === "text") {
stream.push({
type: "text_end",
contentIndex: blockIndex(),
content: currentBlock.text,
partial: output,
});
} else {
stream.push({
type: "thinking_end",
contentIndex: blockIndex(),
content: currentBlock.thinking,
partial: output,
});
}
if (!receivedContent) {
throw new Error("Cloud Code Assist API returned an empty response");
}
if (options?.signal?.aborted) {
@ -651,7 +829,34 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
return stream;
};
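The retry loop above picks an endpoint with `endpoints[Math.min(attempt, endpoints.length - 1)]` and backs off by `BASE_DELAY_MS * 2 ** attempt` when the server gives no delay. The sketch below only tabulates that schedule; `planAttempts` is a hypothetical helper, and the host names are placeholders rather than the real endpoints.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_RETRIES = 3;

interface AttemptPlan {
	endpoint: string;
	// Fallback delay before the next attempt; undefined on the final attempt.
	backoffMs: number | undefined;
}

// Hypothetical helper: lists which endpoint each attempt would hit.
function planAttempts(endpoints: readonly string[]): AttemptPlan[] {
	const plan: AttemptPlan[] = [];
	for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		plan.push({
			// Walk the fallback list once, then stay on the last entry.
			endpoint: endpoints[Math.min(attempt, endpoints.length - 1)],
			backoffMs: attempt < MAX_RETRIES ? BASE_DELAY_MS * 2 ** attempt : undefined,
		});
	}
	return plan;
}

// Placeholder hosts, assumed for illustration only.
const plan = planAttempts(["https://daily.example.invalid", "https://prod.example.invalid"]);
console.log(plan.map((p) => `${p.endpoint} (${p.backoffMs ?? "-"} ms)`));
```

With two endpoints, only the first attempt goes to the first entry; every later attempt reuses the last one.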
function buildRequest(
function deriveSessionId(context: Context): string | undefined {
for (const message of context.messages) {
if (message.role !== "user") {
continue;
}
let text = "";
if (typeof message.content === "string") {
text = message.content;
} else if (Array.isArray(message.content)) {
text = message.content
.filter((item): item is TextContent => item.type === "text")
.map((item) => item.text)
.join("\n");
}
if (!text || text.trim().length === 0) {
return undefined;
}
const hash = createHash("sha256").update(text).digest("hex");
return hash.slice(0, 32);
}
return undefined;
}
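The core of `deriveSessionId` is a truncated SHA-256 of the first user message, so the same opening message always yields the same ID. A minimal sketch of that step, assuming Node's `node:crypto` module; `sessionIdFromText` is a hypothetical name, and message extraction is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash step of deriveSessionId, without message handling.
function sessionIdFromText(text: string): string | undefined {
	// Blank input yields no session ID, as in the function above.
	if (text.trim().length === 0) return undefined;
	// 64 hex chars from SHA-256, truncated to the first 32.
	return createHash("sha256").update(text).digest("hex").slice(0, 32);
}

const first = sessionIdFromText("Refactor the retry logic");
const second = sessionIdFromText("Refactor the retry logic");
console.log(first === second); // true: identical text gives an identical ID
```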
export function buildRequest(
model: Model<"google-gemini-cli">,
context: Context,
projectId: string,
@ -686,6 +891,11 @@ function buildRequest(
contents,
};
const sessionId = deriveSessionId(context);
if (sessionId) {
request.sessionId = sessionId;
}
// System instruction must be object with parts, not plain string
if (context.systemPrompt) {
request.systemInstruction = {

View file

@ -365,6 +365,7 @@ function createClient(model: Model<"openai-completions">, context: Context, apiK
function buildParams(model: Model<"openai-completions">, context: Context, options?: OpenAICompletionsOptions) {
const compat = getCompat(model);
const messages = convertMessages(model, context, compat);
maybeAddOpenRouterAnthropicCacheControl(model, messages);
const params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming = {
model: model.id,
@ -403,13 +404,51 @@ function buildParams(model: Model<"openai-completions">, context: Context, optio
params.tool_choice = options.toolChoice;
}
if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
if (compat.thinkingFormat === "zai" && model.reasoning) {
// Z.ai uses binary thinking: { type: "enabled" | "disabled" }
// Must explicitly disable since z.ai defaults to thinking enabled
(params as any).thinking = { type: options?.reasoningEffort ? "enabled" : "disabled" };
} else if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
// OpenAI-style reasoning_effort
params.reasoning_effort = options.reasoningEffort;
}
return params;
}
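The branch added to `buildParams` means a z.ai request always carries an explicit `thinking` field, while OpenAI-style providers only get `reasoning_effort` when one is requested. A sketch of that decision in isolation follows. `thinkingParams` is a hypothetical function, and it assumes the model supports reasoning (`model.reasoning` is true).

```typescript
type ThinkingFormat = "openai" | "zai";

// Hypothetical helper isolating the thinking/reasoning branch of buildParams.
// Assumes a reasoning-capable model.
function thinkingParams(
	format: ThinkingFormat,
	reasoningEffort?: string,
	supportsReasoningEffort = true,
): Record<string, unknown> {
	if (format === "zai") {
		// Binary switch; sent even when disabled, because z.ai defaults to enabled.
		return { thinking: { type: reasoningEffort ? "enabled" : "disabled" } };
	}
	if (reasoningEffort && supportsReasoningEffort) {
		return { reasoning_effort: reasoningEffort };
	}
	return {};
}

console.log(JSON.stringify(thinkingParams("zai"))); // {"thinking":{"type":"disabled"}}
console.log(JSON.stringify(thinkingParams("openai", "high"))); // {"reasoning_effort":"high"}
```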
function maybeAddOpenRouterAnthropicCacheControl(
model: Model<"openai-completions">,
messages: ChatCompletionMessageParam[],
): void {
if (model.provider !== "openrouter" || !model.id.startsWith("anthropic/")) return;
// Anthropic-style caching requires cache_control on a text part. Add a breakpoint
// on the last user/assistant message (walking backwards until we find text content).
for (let i = messages.length - 1; i >= 0; i--) {
const msg = messages[i];
if (msg.role !== "user" && msg.role !== "assistant") continue;
const content = msg.content;
if (typeof content === "string") {
msg.content = [
Object.assign({ type: "text" as const, text: content }, { cache_control: { type: "ephemeral" } }),
];
return;
}
if (!Array.isArray(content)) continue;
// Find last text part and add cache_control
for (let j = content.length - 1; j >= 0; j--) {
const part = content[j];
if (part?.type === "text") {
Object.assign(part, { cache_control: { type: "ephemeral" } });
return;
}
}
}
}
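The effect of `maybeAddOpenRouterAnthropicCacheControl` is easiest to see on a small message list. The sketch below uses simplified local types (`Msg`, `TextPart`, `ImagePart` are assumptions standing in for the OpenAI SDK types) and omits the provider and model check.

```typescript
// Simplified stand-ins for the SDK message types; assumed for illustration.
type TextPart = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type Msg = {
	role: "system" | "user" | "assistant" | "tool";
	content: string | (TextPart | ImagePart)[] | null;
};

// Marks the last text part of the last user/assistant message as a cache breakpoint.
function addCacheBreakpoint(messages: Msg[]): void {
	for (let i = messages.length - 1; i >= 0; i--) {
		const msg = messages[i];
		if (msg.role !== "user" && msg.role !== "assistant") continue;
		if (typeof msg.content === "string") {
			// Plain strings are wrapped in a text part so cache_control has somewhere to live.
			msg.content = [{ type: "text", text: msg.content, cache_control: { type: "ephemeral" } }];
			return;
		}
		if (!Array.isArray(msg.content)) continue;
		for (let j = msg.content.length - 1; j >= 0; j--) {
			const part = msg.content[j];
			if (part.type === "text") {
				part.cache_control = { type: "ephemeral" };
				return;
			}
		}
	}
}

const messages: Msg[] = [
	{ role: "user", content: "Summarize this file." },
	{ role: "tool", content: "tool output" },
];
addCacheBreakpoint(messages);
console.log(JSON.stringify(messages[0].content));
```

The trailing tool message is skipped, so the breakpoint lands on the user message before it.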
function convertMessages(
model: Model<"openai-completions">,
context: Context,
@ -644,11 +683,14 @@ function mapStopReason(reason: ChatCompletionChunk.Choice["finish_reason"]): Sto
* Returns a fully resolved OpenAICompat object with all fields set.
*/
function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
const isZai = baseUrl.includes("api.z.ai");
const isNonStandard =
baseUrl.includes("cerebras.ai") ||
baseUrl.includes("api.x.ai") ||
baseUrl.includes("mistral.ai") ||
baseUrl.includes("chutes.ai");
baseUrl.includes("chutes.ai") ||
isZai;
const useMaxTokens = baseUrl.includes("mistral.ai") || baseUrl.includes("chutes.ai");
@ -659,13 +701,14 @@ function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
return {
supportsStore: !isNonStandard,
supportsDeveloperRole: !isNonStandard,
supportsReasoningEffort: !isGrok,
supportsReasoningEffort: !isGrok && !isZai,
supportsUsageInStreaming: true,
maxTokensField: useMaxTokens ? "max_tokens" : "max_completion_tokens",
requiresToolResultName: isMistral,
requiresAssistantAfterToolResult: false, // Mistral no longer requires this as of Dec 2024
requiresThinkingAsText: isMistral,
requiresMistralToolIds: isMistral,
thinkingFormat: isZai ? "zai" : "openai",
};
}
@ -688,5 +731,6 @@ function getCompat(model: Model<"openai-completions">): Required<OpenAICompat> {
model.compat.requiresAssistantAfterToolResult ?? detected.requiresAssistantAfterToolResult,
requiresThinkingAsText: model.compat.requiresThinkingAsText ?? detected.requiresThinkingAsText,
requiresMistralToolIds: model.compat.requiresMistralToolIds ?? detected.requiresMistralToolIds,
thinkingFormat: model.compat.thinkingFormat ?? detected.thinkingFormat,
};
}

View file

@ -24,6 +24,7 @@ import type {
ThinkingContent,
Tool,
ToolCall,
Usage,
} from "../types.js";
import { AssistantMessageEventStream } from "../utils/event-stream.js";
import { parseStreamingJson } from "../utils/json-parse.js";
@ -48,6 +49,7 @@ function shortHash(str: string): string {
export interface OpenAIResponsesOptions extends StreamOptions {
reasoningEffort?: "minimal" | "low" | "medium" | "high" | "xhigh";
reasoningSummary?: "auto" | "detailed" | "concise" | null;
serviceTier?: ResponseCreateParamsStreaming["service_tier"];
}
/**
@ -85,7 +87,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
const apiKey = options?.apiKey || getEnvApiKey(model.provider) || "";
const client = createClient(model, context, apiKey);
const params = buildParams(model, context, options);
const openaiStream = await client.responses.create(params, { signal: options?.signal });
const openaiStream = await client.responses.create(params, { signal: options?.signal, timeout: undefined });
stream.push({ type: "start", partial: output });
let currentItem: ResponseReasoningItem | ResponseOutputMessage | ResponseFunctionToolCall | null = null;
@ -276,6 +278,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
};
}
calculateCost(model, output.usage);
applyServiceTierPricing(output.usage, response?.service_tier ?? options?.serviceTier);
// Map status to stop reason
output.stopReason = mapStopReason(response?.status);
if (output.content.some((b) => b.type === "toolCall") && output.stopReason === "stop") {
@ -363,6 +366,7 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
model: model.id,
input: messages,
stream: true,
prompt_cache_key: options?.sessionId,
};
if (options?.maxTokens) {
@ -373,6 +377,10 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
params.temperature = options?.temperature;
}
if (options?.serviceTier !== undefined) {
params.service_tier = options.serviceTier;
}
if (context.tools) {
params.tools = convertTools(context.tools);
}
@ -547,6 +555,28 @@ function convertTools(tools: Tool[]): OpenAITool[] {
}));
}
function getServiceTierCostMultiplier(serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined): number {
switch (serviceTier) {
case "flex":
return 0.5;
case "priority":
return 2;
default:
return 1;
}
}
function applyServiceTierPricing(usage: Usage, serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined) {
const multiplier = getServiceTierCostMultiplier(serviceTier);
if (multiplier === 1) return;
usage.cost.input *= multiplier;
usage.cost.output *= multiplier;
usage.cost.cacheRead *= multiplier;
usage.cost.cacheWrite *= multiplier;
usage.cost.total = usage.cost.input + usage.cost.output + usage.cost.cacheRead + usage.cost.cacheWrite;
}
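A worked example of the arithmetic in `applyServiceTierPricing`, using the multipliers from the diff (0.5 for `flex`, 2 for `priority`). This sketch returns a new object instead of mutating; `Cost`, `tierMultiplier`, and `scaleCost` are hypothetical names, and the dollar amounts are made up.

```typescript
type Cost = { input: number; output: number; cacheRead: number; cacheWrite: number; total: number };

// Multipliers as they appear in the diff above.
function tierMultiplier(tier: string | null | undefined): number {
	switch (tier) {
		case "flex":
			return 0.5;
		case "priority":
			return 2;
		default:
			return 1;
	}
}

// Hypothetical non-mutating variant of applyServiceTierPricing.
function scaleCost(cost: Cost, tier: string | null | undefined): Cost {
	const m = tierMultiplier(tier);
	const scaled: Cost = {
		input: cost.input * m,
		output: cost.output * m,
		cacheRead: cost.cacheRead * m,
		cacheWrite: cost.cacheWrite * m,
		total: 0,
	};
	// Recompute the total from the scaled parts rather than scaling the old total.
	scaled.total = scaled.input + scaled.output + scaled.cacheRead + scaled.cacheWrite;
	return scaled;
}

// Made-up base costs for illustration.
const base: Cost = { input: 2, output: 6, cacheRead: 0.5, cacheWrite: 0, total: 8.5 };
console.log(scaleCost(base, "flex").total); // 4.25
console.log(scaleCost(base, "priority").total); // 17
```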
function mapStopReason(status: OpenAI.Responses.ResponseStatus | undefined): StopReason {
if (!status) return "stop";
switch (status) {

View file

@ -1,11 +1,11 @@
import type { Api, AssistantMessage, Message, Model, ToolCall, ToolResultMessage } from "../types.js";
/**
* Normalize tool call ID for GitHub Copilot cross-API compatibility.
* Normalize tool call ID for cross-provider compatibility.
* OpenAI Responses API generates IDs that are 450+ chars with special characters like `|`.
* Other APIs (Claude, etc.) require max 40 chars and only alphanumeric + underscore + hyphen.
* Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars).
*/
function normalizeCopilotToolCallId(id: string): string {
function normalizeToolCallId(id: string): string {
return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, 40);
}
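To see what the normalization does to an ID containing `|`, the function body can be run against a sample. The input IDs below are invented for illustration and are not real OpenAI Responses IDs.

```typescript
// Same body as the function in the diff: strip disallowed characters, cap at 40.
function normalizeToolCallId(id: string): string {
	return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, 40);
}

// Invented sample IDs: one short with a pipe, one overly long.
const shortId = normalizeToolCallId("fc_1|rs-2");
const longId = normalizeToolCallId(`call_abc|${"x".repeat(60)}`);

console.log(shortId); // fc_1rs-2
console.log(longId.length); // 40
```

Because truncation can make two long IDs collide, the caller's `toolCallIdMap` is what keeps tool results paired with the rewritten call IDs.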
@ -38,11 +38,17 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
return msg;
}
// Check if we need to normalize tool call IDs (github-copilot cross-API)
const needsToolCallIdNormalization =
// Check if we need to normalize tool call IDs
// Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars)
// OpenAI Responses API generates IDs with `|` and 450+ chars
// GitHub Copilot routes to Anthropic for Claude models
const targetRequiresStrictIds = model.api === "anthropic-messages" || model.provider === "github-copilot";
const crossProviderSwitch = assistantMsg.provider !== model.provider;
const copilotCrossApiSwitch =
assistantMsg.provider === "github-copilot" &&
model.provider === "github-copilot" &&
assistantMsg.api !== model.api;
const needsToolCallIdNormalization = targetRequiresStrictIds && (crossProviderSwitch || copilotCrossApiSwitch);
// Transform message from different provider/model
const transformedContent = assistantMsg.content.flatMap((block) => {
@ -54,10 +60,10 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
text: block.thinking,
};
}
// Normalize tool call IDs for github-copilot cross-API switches
// Normalize tool call IDs when target API requires strict format
if (block.type === "toolCall" && needsToolCallIdNormalization) {
const toolCall = block as ToolCall;
const normalizedId = normalizeCopilotToolCallId(toolCall.id);
const normalizedId = normalizeToolCallId(toolCall.id);
if (normalizedId !== toolCall.id) {
toolCallIdMap.set(toolCall.id, normalizedId);
return { ...toolCall, id: normalizedId };

View file

@ -2,6 +2,7 @@ import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import { supportsXhigh } from "./models.js";
import { type BedrockOptions, streamBedrock } from "./providers/amazon-bedrock.js";
import { type AnthropicOptions, streamAnthropic } from "./providers/anthropic.js";
import { type GoogleOptions, streamGoogle } from "./providers/google.js";
import {
@ -74,6 +75,20 @@ export function getEnvApiKey(provider: any): string | undefined {
}
}
if (provider === "amazon-bedrock") {
// Amazon Bedrock supports multiple credential sources:
// 1. AWS_PROFILE - named profile from ~/.aws/credentials
// 2. AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY - standard IAM keys
// 3. AWS_BEARER_TOKEN_BEDROCK - Bedrock API keys (bearer token)
if (
process.env.AWS_PROFILE ||
(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
process.env.AWS_BEARER_TOKEN_BEDROCK
) {
return "<authenticated>";
}
}
const envMap: Record<string, string> = {
openai: "OPENAI_API_KEY",
google: "GEMINI_API_KEY",
@ -81,8 +96,10 @@ export function getEnvApiKey(provider: any): string | undefined {
cerebras: "CEREBRAS_API_KEY",
xai: "XAI_API_KEY",
openrouter: "OPENROUTER_API_KEY",
"vercel-ai-gateway": "AI_GATEWAY_API_KEY",
zai: "ZAI_API_KEY",
mistral: "MISTRAL_API_KEY",
minimax: "MINIMAX_API_KEY",
opencode: "OPENCODE_API_KEY",
};
@ -98,6 +115,9 @@ export function stream<TApi extends Api>(
// Vertex AI uses Application Default Credentials, not API keys
if (model.api === "google-vertex") {
return streamGoogleVertex(model as Model<"google-vertex">, context, options as GoogleVertexOptions);
} else if (model.api === "bedrock-converse-stream") {
// Bedrock doesn't use API keys; instead, it sources credentials from standard AWS env variables or from the given AWS profile.
return streamBedrock(model as Model<"bedrock-converse-stream">, context, (options || {}) as BedrockOptions);
}
const apiKey = options?.apiKey || getEnvApiKey(model.provider);
@ -156,6 +176,10 @@ export function streamSimple<TApi extends Api>(
if (model.api === "google-vertex") {
const providerOptions = mapOptionsForApi(model, options, undefined);
return stream(model, context, providerOptions);
} else if (model.api === "bedrock-converse-stream") {
// Bedrock takes no API key here; instead, it sources credentials from standard AWS environment variables or from a given AWS profile.
const providerOptions = mapOptionsForApi(model, options, undefined);
return stream(model, context, providerOptions);
}
const apiKey = options?.apiKey || getEnvApiKey(model.provider);
@ -228,6 +252,13 @@ function mapOptionsForApi<TApi extends Api>(
} satisfies AnthropicOptions;
}
case "bedrock-converse-stream":
return {
...base,
reasoning: options?.reasoning,
thinkingBudgets: options?.thinkingBudgets,
} satisfies BedrockOptions;
case "openai-completions":
return {
...base,

View file

@ -1,3 +1,4 @@
import type { BedrockOptions } from "./providers/amazon-bedrock.js";
import type { AnthropicOptions } from "./providers/anthropic.js";
import type { GoogleOptions } from "./providers/google.js";
import type { GoogleGeminiCliOptions } from "./providers/google-gemini-cli.js";
@ -14,12 +15,14 @@ export type Api =
| "openai-responses"
| "openai-codex-responses"
| "anthropic-messages"
| "bedrock-converse-stream"
| "google-generative-ai"
| "google-gemini-cli"
| "google-vertex";
export interface ApiOptionsMap {
"anthropic-messages": AnthropicOptions;
"bedrock-converse-stream": BedrockOptions;
"openai-completions": OpenAICompletionsOptions;
"openai-responses": OpenAIResponsesOptions;
"openai-codex-responses": OpenAICodexResponsesOptions;
@ -40,6 +43,7 @@ const _exhaustive: _CheckExhaustive = true;
export type OptionsForApi<TApi extends Api> = ApiOptionsMap[TApi];
export type KnownProvider =
| "amazon-bedrock"
| "anthropic"
| "google"
| "google-gemini-cli"
@ -52,8 +56,10 @@ export type KnownProvider =
| "groq"
| "cerebras"
| "openrouter"
| "vercel-ai-gateway"
| "zai"
| "mistral"
| "minimax"
| "opencode";
export type Provider = KnownProvider | string;
@ -219,6 +225,8 @@ export interface OpenAICompat {
requiresThinkingAsText?: boolean;
/** Whether tool call IDs must be normalized to Mistral format (exactly 9 alphanumeric chars). Default: auto-detected from URL. */
requiresMistralToolIds?: boolean;
/** Format for reasoning/thinking parameter. "openai" uses reasoning_effort, "zai" uses thinking: { type: "enabled" }. Default: "openai". */
thinkingFormat?: "openai" | "zai";
}
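The `thinkingFormat` flag described above selects between two request shapes. A minimal sketch of that mapping, assuming a hypothetical helper `buildThinkingParams` and a hypothetical `Effort` type (neither appears in this diff; the provider's actual request building may differ):

```typescript
// Hypothetical illustration only: these names are not part of the diff.
type ThinkingFormat = "openai" | "zai";
type Effort = "low" | "medium" | "high";

function buildThinkingParams(format: ThinkingFormat = "openai", effort?: Effort): Record<string, unknown> {
	// No reasoning requested: add nothing to the request body.
	if (!effort) return {};
	// z.ai expects an on/off style thinking object instead of an effort level.
	if (format === "zai") return { thinking: { type: "enabled" } };
	// Default ("openai"): pass the effort level through as reasoning_effort.
	return { reasoning_effort: effort };
}
```

With `format` left undefined, the sketch falls back to `"openai"`, matching the documented default.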
// Model interface for the unified model system

View file

@ -17,6 +17,7 @@ import type { AssistantMessage } from "../types.js";
* - llama.cpp: "the request exceeds the available context size, try increasing it"
* - LM Studio: "tokens to keep from the initial prompt is greater than the context length"
* - GitHub Copilot: "prompt token count of X exceeds the limit of Y"
* - MiniMax: "invalid params, context window exceeds limit"
* - Cerebras: Returns "400 status code (no body)" - handled separately below
* - Mistral: Returns "400 status code (no body)" - handled separately below
* - z.ai: Does NOT error, accepts overflow silently - handled via usage.input > contextWindow
@ -24,6 +25,7 @@ import type { AssistantMessage } from "../types.js";
*/
const OVERFLOW_PATTERNS = [
/prompt is too long/i, // Anthropic
/input is too long for requested model/i, // Amazon Bedrock
/exceeds the context window/i, // OpenAI (Completions & Responses API)
/input token count.*exceeds the maximum/i, // Google (Gemini)
/maximum prompt length is \d+/i, // xAI (Grok)
@ -32,6 +34,7 @@ const OVERFLOW_PATTERNS = [
/exceeds the limit of \d+/i, // GitHub Copilot
/exceeds the available context size/i, // llama.cpp server
/greater than the context length/i, // LM Studio
/context window exceeds limit/i, // MiniMax
/context[_ ]length[_ ]exceeded/i, // Generic fallback
/too many tokens/i, // Generic fallback
/token limit exceeded/i, // Generic fallback
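For orientation, the two detection paths named in the comment above (error-message patterns, plus silent overflow detected through usage) could be combined roughly as follows. This is a simplified sketch, not the actual `isContextOverflow`; `looksLikeOverflow`, `SampleResponse`, and the shortened pattern list are hypothetical stand-ins.

```typescript
// Hypothetical, shortened pattern list; the real list is longer.
const SAMPLE_OVERFLOW_PATTERNS: RegExp[] = [
	/input is too long for requested model/i, // Amazon Bedrock
	/context window exceeds limit/i, // MiniMax
	/context[_ ]length[_ ]exceeded/i, // Generic fallback
];

// Hypothetical minimal response shape used only by this sketch.
interface SampleResponse {
	stopReason: string;
	errorMessage?: string;
	usage: { input: number };
}

function looksLikeOverflow(response: SampleResponse, contextWindow?: number): boolean {
	// Path 1: the provider returned an error whose message matches a known pattern.
	const message = response.errorMessage;
	if (response.stopReason === "error" && message) {
		if (SAMPLE_OVERFLOW_PATTERNS.some((pattern) => pattern.test(message))) return true;
	}
	// Path 2: the provider accepted the request silently, but reported more input tokens than the window allows.
	if (contextWindow !== undefined && response.usage.input > contextWindow) return true;
	return false;
}
```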

View file

@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete, stream } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -66,6 +67,35 @@ async function testImmediateAbort<TApi extends Api>(llm: Model<TApi>, options: O
expect(response.stopReason).toBe("aborted");
}
async function testAbortThenNewMessage<TApi extends Api>(llm: Model<TApi>, options: OptionsForApi<TApi> = {}) {
// First request: abort immediately before any response content arrives
const controller = new AbortController();
controller.abort();
const context: Context = {
messages: [{ role: "user", content: "Hello, how are you?", timestamp: Date.now() }],
};
const abortedResponse = await complete(llm, context, { ...options, signal: controller.signal });
expect(abortedResponse.stopReason).toBe("aborted");
// The aborted message has empty content since we aborted before anything arrived
expect(abortedResponse.content.length).toBe(0);
// Add the aborted assistant message to context (this is what happens in the real coding agent)
context.messages.push(abortedResponse);
// Second request: send a new message - this should work even with the aborted message in context
context.messages.push({
role: "user",
content: "What is 2 + 2?",
timestamp: Date.now(),
});
const followUp = await complete(llm, context, options);
expect(followUp.stopReason).toBe("stop");
expect(followUp.content.length).toBeGreaterThan(0);
}
describe("AI Providers Abort Tests", () => {
describe.skipIf(!process.env.GEMINI_API_KEY)("Google Provider Abort", () => {
const llm = getModel("google", "gemini-2.5-flash");
@ -130,6 +160,30 @@ describe("AI Providers Abort Tests", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Abort", () => {
const llm = getModel("minimax", "MiniMax-M2.1");
it("should abort mid-stream", { retry: 3 }, async () => {
await testAbortSignal(llm);
});
it("should handle immediate abort", { retry: 3 }, async () => {
await testImmediateAbort(llm);
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Abort", () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should abort mid-stream", { retry: 3 }, async () => {
await testAbortSignal(llm);
});
it("should handle immediate abort", { retry: 3 }, async () => {
await testImmediateAbort(llm);
});
});
// Google Gemini CLI / Antigravity share the same provider, so one test covers both
describe("Google Gemini CLI Provider Abort", () => {
it.skipIf(!geminiCliToken)("should abort mid-stream", { retry: 3 }, async () => {
@ -154,4 +208,20 @@ describe("AI Providers Abort Tests", () => {
await testImmediateAbort(llm, { apiKey: openaiCodexToken });
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Abort", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should abort mid-stream", { retry: 3 }, async () => {
await testAbortSignal(llm, { reasoning: "medium" });
});
it("should handle immediate abort", { retry: 3 }, async () => {
await testImmediateAbort(llm);
});
it("should handle abort then new message", { retry: 3 }, async () => {
await testAbortThenNewMessage(llm);
});
});
});

View file

@ -0,0 +1,66 @@
/**
* A test suite to ensure all configured Amazon Bedrock models are usable.
*
* This is here to make sure we got correct model identifiers from models.dev and other sources.
* Because some models on Amazon Bedrock require cross-region inference,
* plain model identifiers are not always usable; the identifiers must be tweaked to use cross-region inference.
* See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system for more details.
*
* This test suite is disabled unless AWS credentials and the `BEDROCK_EXTENSIVE_MODEL_TEST` environment variable are both set.
* It takes ~2 minutes to run. Because not all models are available in all regions,
* we recommend running it in the `us-west-2` region for best coverage.
*
* You can run this test suite with:
* ```bash
* $ AWS_REGION=us-west-2 BEDROCK_EXTENSIVE_MODEL_TEST=1 AWS_PROFILE=... npm test -- ./test/bedrock-models.test.ts
* ```
*/
import { describe, expect, it } from "vitest";
import { getModels } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Context } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
describe("Amazon Bedrock Models", () => {
const models = getModels("amazon-bedrock");
it("should get all available Bedrock models", () => {
expect(models.length).toBeGreaterThan(0);
console.log(`Found ${models.length} Bedrock models`);
});
if (hasBedrockCredentials() && process.env.BEDROCK_EXTENSIVE_MODEL_TEST) {
for (const model of models) {
it(`should make a simple request with ${model.id}`, { timeout: 10_000 }, async () => {
const context: Context = {
systemPrompt: "You are a helpful assistant. Be extremely concise.",
messages: [
{
role: "user",
content: "Reply with exactly: 'OK'",
timestamp: Date.now(),
},
],
};
const response = await complete(model, context);
expect(response.role).toBe("assistant");
expect(response.content).toBeTruthy();
expect(response.content.length).toBeGreaterThan(0);
expect(response.usage.input + response.usage.cacheRead).toBeGreaterThan(0);
expect(response.usage.output).toBeGreaterThan(0);
expect(response.errorMessage).toBeFalsy();
const textContent = response.content
.filter((b) => b.type === "text")
.map((b) => (b.type === "text" ? b.text : ""))
.join("")
.trim();
expect(textContent).toBeTruthy();
console.log(`${model.id}: ${textContent.substring(0, 100)}`);
});
}
}
});

View file

@ -0,0 +1,18 @@
/**
* Utility functions for Amazon Bedrock tests
*/
/**
* Check if any valid AWS credentials are configured for Bedrock.
* Returns true if any of the following are set:
* - AWS_PROFILE (named profile from ~/.aws/credentials)
* - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (IAM keys)
* - AWS_BEARER_TOKEN_BEDROCK (Bedrock API key)
*/
export function hasBedrockCredentials(): boolean {
return !!(
process.env.AWS_PROFILE ||
(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
process.env.AWS_BEARER_TOKEN_BEDROCK
);
}

View file

@ -18,6 +18,7 @@ import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { AssistantMessage, Context, Model, Usage } from "../src/types.js";
import { isContextOverflow } from "../src/utils/overflow.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -284,6 +285,22 @@ describe("Context overflow error handling", () => {
);
});
// =============================================================================
// Amazon Bedrock
// Expected pattern: "Input is too long for requested model"
// =============================================================================
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
it("claude-sonnet-4-5 - should detect overflow via isContextOverflow", async () => {
const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
const result = await testContextOverflow(model, "");
logResult(result);
expect(result.stopReason).toBe("error");
expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
}, 120000);
});
// =============================================================================
// xAI
// Expected pattern: "maximum prompt length is X but the request contains Y"
@ -379,6 +396,37 @@ describe("Context overflow error handling", () => {
}, 120000);
});
// =============================================================================
// MiniMax
// Expected pattern: "invalid params, context window exceeds limit"
// =============================================================================
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
it("MiniMax-M2.1 - should detect overflow via isContextOverflow", async () => {
const model = getModel("minimax", "MiniMax-M2.1");
const result = await testContextOverflow(model, process.env.MINIMAX_API_KEY!);
logResult(result);
expect(result.stopReason).toBe("error");
expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
}, 120000);
});
// =============================================================================
// Vercel AI Gateway - Unified API for multiple providers
// =============================================================================
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
it("google/gemini-2.5-flash via AI Gateway - should detect overflow via isContextOverflow", async () => {
const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
const result = await testContextOverflow(model, process.env.AI_GATEWAY_API_KEY!);
logResult(result);
expect(result.stopReason).toBe("error");
expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
}, 120000);
});
// =============================================================================
// OpenRouter - Multiple backend providers
// Expected pattern: "maximum context length is X tokens"

View file

@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, AssistantMessage, Context, Model, OptionsForApi, UserMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -321,6 +322,66 @@ describe("AI Providers Empty Message Tests", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Empty Messages", () => {
const llm = getModel("minimax", "MiniMax-M2.1");
it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
await testEmptyMessage(llm);
});
it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
await testEmptyStringMessage(llm);
});
it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
await testWhitespaceOnlyMessage(llm);
});
it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
await testEmptyAssistantMessage(llm);
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Empty Messages", () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
await testEmptyMessage(llm);
});
it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
await testEmptyStringMessage(llm);
});
it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
await testWhitespaceOnlyMessage(llm);
});
it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
await testEmptyAssistantMessage(llm);
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Empty Messages", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
await testEmptyMessage(llm);
});
it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
await testEmptyStringMessage(llm);
});
it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
await testWhitespaceOnlyMessage(llm);
});
it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
await testEmptyAssistantMessage(llm);
});
});
// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================

View file

@ -0,0 +1,103 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";
const originalFetch = global.fetch;
const apiKey = JSON.stringify({ token: "token", projectId: "project" });
const createSseResponse = () => {
const sse = `${[
`data: ${JSON.stringify({
response: {
candidates: [
{
content: { role: "model", parts: [{ text: "Hello" }] },
finishReason: "STOP",
},
],
},
})}`,
].join("\n\n")}\n\n`;
const encoder = new TextEncoder();
const stream = new ReadableStream<Uint8Array>({
start(controller) {
controller.enqueue(encoder.encode(sse));
controller.close();
},
});
return new Response(stream, {
status: 200,
headers: { "content-type": "text/event-stream" },
});
};
afterEach(() => {
global.fetch = originalFetch;
vi.restoreAllMocks();
});
describe("google-gemini-cli Claude thinking header", () => {
const context: Context = {
messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
};
it("adds anthropic-beta for Claude thinking models", async () => {
const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
const headers = new Headers(init?.headers);
expect(headers.get("anthropic-beta")).toBe("interleaved-thinking-2025-05-14");
return createSseResponse();
});
global.fetch = fetchMock as typeof fetch;
const model: Model<"google-gemini-cli"> = {
id: "claude-opus-4-5-thinking",
name: "Claude Opus 4.5 Thinking",
api: "google-gemini-cli",
provider: "google-antigravity",
baseUrl: "https://cloudcode-pa.googleapis.com",
reasoning: true,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192,
};
const stream = streamGoogleGeminiCli(model, context, { apiKey });
for await (const _event of stream) {
// exhaust stream
}
await stream.result();
});
it("does not add anthropic-beta for Gemini models", async () => {
const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
const headers = new Headers(init?.headers);
expect(headers.has("anthropic-beta")).toBe(false);
return createSseResponse();
});
global.fetch = fetchMock as typeof fetch;
const model: Model<"google-gemini-cli"> = {
id: "gemini-2.5-flash",
name: "Gemini 2.5 Flash",
api: "google-gemini-cli",
provider: "google-gemini-cli",
baseUrl: "https://cloudcode-pa.googleapis.com",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192,
};
const stream = streamGoogleGeminiCli(model, context, { apiKey });
for await (const _event of stream) {
// exhaust stream
}
await stream.result();
});
});

View file

@ -0,0 +1,108 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";
const originalFetch = global.fetch;
afterEach(() => {
global.fetch = originalFetch;
vi.restoreAllMocks();
});
describe("google-gemini-cli empty stream retry", () => {
it("retries empty SSE responses without duplicate start", async () => {
const emptyStream = new ReadableStream<Uint8Array>({
start(controller) {
controller.close();
},
});
const sse = `${[
`data: ${JSON.stringify({
response: {
candidates: [
{
content: { role: "model", parts: [{ text: "Hello" }] },
finishReason: "STOP",
},
],
usageMetadata: {
promptTokenCount: 1,
candidatesTokenCount: 1,
totalTokenCount: 2,
},
},
})}`,
].join("\n\n")}\n\n`;
const encoder = new TextEncoder();
const dataStream = new ReadableStream<Uint8Array>({
start(controller) {
controller.enqueue(encoder.encode(sse));
controller.close();
},
});
let callCount = 0;
const fetchMock = vi.fn(async () => {
callCount += 1;
if (callCount === 1) {
return new Response(emptyStream, {
status: 200,
headers: { "content-type": "text/event-stream" },
});
}
return new Response(dataStream, {
status: 200,
headers: { "content-type": "text/event-stream" },
});
});
global.fetch = fetchMock as typeof fetch;
const model: Model<"google-gemini-cli"> = {
id: "gemini-2.5-flash",
name: "Gemini 2.5 Flash",
api: "google-gemini-cli",
provider: "google-gemini-cli",
baseUrl: "https://cloudcode-pa.googleapis.com",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192,
};
const context: Context = {
messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
};
const stream = streamGoogleGeminiCli(model, context, {
apiKey: JSON.stringify({ token: "token", projectId: "project" }),
});
let startCount = 0;
let doneCount = 0;
let text = "";
for await (const event of stream) {
if (event.type === "start") {
startCount += 1;
}
if (event.type === "done") {
doneCount += 1;
}
if (event.type === "text_delta") {
text += event.delta;
}
}
const result = await stream.result();
expect(text).toBe("Hello");
expect(result.stopReason).toBe("stop");
expect(startCount).toBe(1);
expect(doneCount).toBe(1);
expect(fetchMock).toHaveBeenCalledTimes(2);
});
});

View file

@ -0,0 +1,53 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { extractRetryDelay } from "../src/providers/google-gemini-cli.js";
describe("extractRetryDelay header parsing", () => {
afterEach(() => {
vi.useRealTimers();
});
it("prefers Retry-After seconds header", () => {
vi.useFakeTimers();
vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));
const response = new Response("", { headers: { "Retry-After": "5" } });
const delay = extractRetryDelay("Please retry in 1s", response);
expect(delay).toBe(6000);
});
it("parses Retry-After HTTP date header", () => {
vi.useFakeTimers();
const now = new Date("2025-01-01T00:00:00Z");
vi.setSystemTime(now);
const retryAt = new Date(now.getTime() + 12000).toUTCString();
const response = new Response("", { headers: { "Retry-After": retryAt } });
const delay = extractRetryDelay("", response);
expect(delay).toBe(13000);
});
it("parses x-ratelimit-reset header", () => {
vi.useFakeTimers();
const now = new Date("2025-01-01T00:00:00Z");
vi.setSystemTime(now);
const resetAtMs = now.getTime() + 20000;
const resetSeconds = Math.floor(resetAtMs / 1000).toString();
const response = new Response("", { headers: { "x-ratelimit-reset": resetSeconds } });
const delay = extractRetryDelay("", response);
expect(delay).toBe(21000);
});
it("parses x-ratelimit-reset-after header", () => {
vi.useFakeTimers();
vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));
const response = new Response("", { headers: { "x-ratelimit-reset-after": "30" } });
const delay = extractRetryDelay("", response);
expect(delay).toBe(31000);
});
});
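The expected values in the tests above all equal the header-derived wait plus 1000 ms. A sketch of header parsing that reproduces those numbers is shown below; it assumes a 1-second safety buffer inferred from the expectations, and a hypothetical function name `retryDelayFromHeaders`. The real `extractRetryDelay` also receives the error text, and its header precedence beyond "Retry-After first" is not shown in this diff.

```typescript
// Assumption: a 1s buffer, inferred from the test expectations (5s -> 6000ms).
const BUFFER_MS = 1000;

// Structural type so the sketch works with a fetch Headers object or any lookup with get().
interface HeaderLookup {
	get(name: string): string | null;
}

function parseNumber(value: string | null): number | undefined {
	if (value === null || value.trim() === "") return undefined;
	const n = Number(value);
	return Number.isFinite(n) ? n : undefined;
}

function retryDelayFromHeaders(headers: HeaderLookup, nowMs: number = Date.now()): number | undefined {
	// Retry-After: either delta-seconds or an HTTP date.
	const retryAfter = headers.get("retry-after");
	const retryAfterSeconds = parseNumber(retryAfter);
	if (retryAfterSeconds !== undefined) return Math.max(0, retryAfterSeconds * 1000) + BUFFER_MS;
	if (retryAfter !== null) {
		const dateMs = Date.parse(retryAfter);
		if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs) + BUFFER_MS;
	}
	// x-ratelimit-reset: absolute Unix time in seconds.
	const resetAtSeconds = parseNumber(headers.get("x-ratelimit-reset"));
	if (resetAtSeconds !== undefined) return Math.max(0, resetAtSeconds * 1000 - nowMs) + BUFFER_MS;
	// x-ratelimit-reset-after: relative wait in seconds.
	const resetAfterSeconds = parseNumber(headers.get("x-ratelimit-reset-after"));
	if (resetAfterSeconds !== undefined) return Math.max(0, resetAfterSeconds * 1000) + BUFFER_MS;
	return undefined;
}
```

Passing `nowMs` explicitly keeps the sketch deterministic without fake timers; a real `Headers` instance satisfies `HeaderLookup` and handles header-name casing itself.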

View file

@ -0,0 +1,50 @@
import { createHash } from "node:crypto";
import { describe, expect, it } from "vitest";
import { buildRequest } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";
const model: Model<"google-gemini-cli"> = {
id: "gemini-2.5-flash",
name: "Gemini 2.5 Flash",
api: "google-gemini-cli",
provider: "google-gemini-cli",
baseUrl: "https://cloudcode-pa.googleapis.com",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192,
};
describe("buildRequest sessionId", () => {
it("derives sessionId from the first user message", () => {
const context: Context = {
messages: [
{ role: "user", content: "First message", timestamp: Date.now() },
{ role: "user", content: "Second message", timestamp: Date.now() },
],
};
const result = buildRequest(model, context, "project-id");
const expected = createHash("sha256").update("First message").digest("hex").slice(0, 32);
expect(result.request.sessionId).toBe(expected);
});
it("omits sessionId when the first user message has no text", () => {
const context: Context = {
messages: [
{
role: "user",
content: [{ type: "image", data: "Zm9v", mimeType: "image/png" }],
timestamp: Date.now(),
},
{ role: "user", content: "Later text", timestamp: Date.now() },
],
};
const result = buildRequest(model, context, "project-id");
expect(result.request.sessionId).toBeUndefined();
});
});
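The two tests above pin down the observable behavior: the `sessionId` is the first 32 hex characters of the SHA-256 of the first user message's text, and it is omitted when that message has no text. A minimal sketch of such a derivation follows; `deriveSessionId` and the `SketchMessage` shape are hypothetical, and joining multiple text blocks is an assumption not confirmed by this diff.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, reduced message shape for illustration.
type SketchContent = string | Array<{ type: string; text?: string }>;
interface SketchMessage {
	role: string;
	content: SketchContent;
}

function deriveSessionId(messages: SketchMessage[]): string | undefined {
	// Only the first user message is considered, even if later messages contain text.
	const firstUser = messages.find((m) => m.role === "user");
	if (!firstUser) return undefined;
	// Assumption: text blocks are concatenated; non-text blocks (e.g. images) are ignored.
	const text =
		typeof firstUser.content === "string"
			? firstUser.content
			: firstUser.content
					.filter((block) => block.type === "text" && typeof block.text === "string")
					.map((block) => block.text)
					.join("");
	if (!text) return undefined;
	// Stable across requests with the same opening message, which is what cache affinity needs.
	return createHash("sha256").update(text).digest("hex").slice(0, 32);
}
```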

View file

@ -75,6 +75,7 @@ import { afterAll, beforeAll, describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, ImageContent, Model, OptionsForApi, UserMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
@ -840,6 +841,122 @@ describe("Image Limits E2E Tests", () => {
});
});
// -------------------------------------------------------------------------
// Vercel AI Gateway (google/gemini-2.5-flash)
// -------------------------------------------------------------------------
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway (google/gemini-2.5-flash)", () => {
const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should accept a small number of images (5)", async () => {
const result = await testImageCount(model, 5, smallImage);
expect(result.success, result.error).toBe(true);
});
it("should find maximum image count limit", { timeout: 600000 }, async () => {
const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 10, 100, 10);
console.log(`\n Vercel AI Gateway max images: ~${limit} (last error: ${lastError})`);
expect(limit).toBeGreaterThanOrEqual(5);
});
it("should find maximum image size limit", { timeout: 600000 }, async () => {
const MB = 1024 * 1024;
const sizes = [5, 10, 15, 20];
let lastSuccess = 0;
let lastError: string | undefined;
for (const sizeMB of sizes) {
console.log(` Testing size: ${sizeMB}MB...`);
const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
const result = await testImageSize(model, imageBase64);
if (result.success) {
lastSuccess = sizeMB;
console.log(` SUCCESS`);
} else {
lastError = result.error;
console.log(` FAILED: ${result.error?.substring(0, 100)}`);
break;
}
}
console.log(`\n Vercel AI Gateway max image size: ~${lastSuccess}MB (last error: ${lastError})`);
expect(lastSuccess).toBeGreaterThanOrEqual(5);
});
});
// -------------------------------------------------------------------------
// Amazon Bedrock (claude-sonnet-4-5)
// Limits: 100 images (Anthropic), 5MB per image, 8000px max dimension
// -------------------------------------------------------------------------
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock (claude-sonnet-4-5)", () => {
const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should accept a small number of images (5)", async () => {
const result = await testImageCount(model, 5, smallImage);
expect(result.success, result.error).toBe(true);
});
it("should find maximum image count limit", { timeout: 600000 }, async () => {
// Anthropic limit: 100 images
const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 20, 120, 20);
console.log(`\n Bedrock max images: ~${limit} (last error: ${lastError})`);
expect(limit).toBeGreaterThanOrEqual(80);
expect(limit).toBeLessThanOrEqual(100);
});
it("should find maximum image size limit", { timeout: 600000 }, async () => {
const MB = 1024 * 1024;
// Anthropic limit: 5MB per image
const sizes = [1, 2, 3, 4, 5, 6];
let lastSuccess = 0;
let lastError: string | undefined;
for (const sizeMB of sizes) {
console.log(` Testing size: ${sizeMB}MB...`);
const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
const result = await testImageSize(model, imageBase64);
if (result.success) {
lastSuccess = sizeMB;
console.log(` SUCCESS`);
} else {
lastError = result.error;
console.log(` FAILED: ${result.error?.substring(0, 100)}`);
break;
}
}
console.log(`\n Bedrock max image size: ~${lastSuccess}MB (last error: ${lastError})`);
expect(lastSuccess).toBeGreaterThanOrEqual(1);
});
it("should find maximum image dimension limit", { timeout: 600000 }, async () => {
// Anthropic limit: 8000px
const dimensions = [1000, 2000, 4000, 6000, 8000, 10000];
let lastSuccess = 0;
let lastError: string | undefined;
for (const dim of dimensions) {
console.log(` Testing dimension: ${dim}x${dim}...`);
const imageBase64 = generateImage(dim, dim, `dim-${dim}.png`);
const result = await testImageDimensions(model, imageBase64);
if (result.success) {
lastSuccess = dim;
console.log(` SUCCESS`);
} else {
lastError = result.error;
console.log(` FAILED: ${result.error?.substring(0, 100)}`);
break;
}
}
console.log(`\n Bedrock max dimension: ~${lastSuccess}px (last error: ${lastError})`);
expect(lastSuccess).toBeGreaterThanOrEqual(6000);
expect(lastSuccess).toBeLessThanOrEqual(8000);
});
});
// =========================================================================
// MAX SIZE IMAGES TEST
// =========================================================================
@ -898,6 +1015,38 @@ describe("Image Limits E2E Tests", () => {
},
);
// Amazon Bedrock (Claude) - 5MB per image limit, same as Anthropic direct
// Using 3MB to stay under 5MB limit
it.skipIf(!hasBedrockCredentials())(
"Bedrock: max ~3MB images before rejection",
{ timeout: 900000 },
async () => {
const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
const image3mb = getImageAtSize(3);
// Similar to Anthropic, test progressively
const counts = [1, 2, 4, 6, 8, 10, 12];
let lastSuccess = 0;
let lastError: string | undefined;
for (const count of counts) {
console.log(` Testing ${count} x ~3MB images...`);
const result = await testImageCount(model, count, image3mb);
if (result.success) {
lastSuccess = count;
console.log(` SUCCESS`);
} else {
lastError = result.error;
console.log(` FAILED: ${result.error?.substring(0, 150)}`);
break;
}
}
console.log(`\n Bedrock max ~3MB images: ${lastSuccess} (last error: ${lastError})`);
expect(lastSuccess).toBeGreaterThanOrEqual(1);
},
);
// OpenAI - 20MB per image documented, we found ≥25MB works
// Test with 15MB images to stay safely under limit
it.skipIf(!process.env.OPENAI_API_KEY)(

View file

@ -5,6 +5,7 @@ import { describe, expect, it } from "vitest";
import type { Api, Context, Model, Tool, ToolResultMessage } from "../src/index.js";
import { complete, getModel } from "../src/index.js";
import type { OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -273,6 +274,30 @@ describe("Tool Results with Images", () => {
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider (google/gemini-2.5-flash)", () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
await handleToolWithImageResult(llm);
});
it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
await handleToolWithTextAndImageResult(llm);
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
await handleToolWithImageResult(llm);
});
it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
await handleToolWithTextAndImageResult(llm);
});
});
// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================

View file

@ -8,6 +8,7 @@ import { getModel } from "../src/models.js";
import { complete, stream } from "../src/stream.js";
import type { Api, Context, ImageContent, Model, OptionsForApi, Tool, ToolResultMessage } from "../src/types.js";
import { StringEnum } from "../src/utils/typebox-helpers.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
const __filename = fileURLToPath(import.meta.url);
@ -356,7 +357,7 @@ describe("Generate E2E Tests", () => {
await handleStreaming(llm);
});
it("should handle thinking", { retry: 3 }, async () => {
await handleThinking(llm, { thinking: { enabled: true, budgetTokens: 1024 } });
});
@ -597,6 +598,87 @@ describe("Generate E2E Tests", () => {
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
"Vercel AI Gateway Provider (google/gemini-2.5-flash via Anthropic Messages)",
() => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should complete basic text generation", { retry: 3 }, async () => {
await basicTextGeneration(llm);
});
it("should handle tool calling", { retry: 3 }, async () => {
await handleToolCall(llm);
});
it("should handle streaming", { retry: 3 }, async () => {
await handleStreaming(llm);
});
it("should handle image input", { retry: 3 }, async () => {
await handleImage(llm);
});
it("should handle multi-turn with tools", { retry: 3 }, async () => {
await multiTurn(llm);
});
},
);
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
"Vercel AI Gateway Provider (anthropic/claude-opus-4.5 via Anthropic Messages)",
() => {
const llm = getModel("vercel-ai-gateway", "anthropic/claude-opus-4.5");
it("should complete basic text generation", { retry: 3 }, async () => {
await basicTextGeneration(llm);
});
it("should handle tool calling", { retry: 3 }, async () => {
await handleToolCall(llm);
});
it("should handle streaming", { retry: 3 }, async () => {
await handleStreaming(llm);
});
it("should handle image input", { retry: 3 }, async () => {
await handleImage(llm);
});
it("should handle multi-turn with tools", { retry: 3 }, async () => {
await multiTurn(llm);
});
},
);
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
"Vercel AI Gateway Provider (openai/gpt-5.1-codex-max via Anthropic Messages)",
() => {
const llm = getModel("vercel-ai-gateway", "openai/gpt-5.1-codex-max");
it("should complete basic text generation", { retry: 3 }, async () => {
await basicTextGeneration(llm);
});
it("should handle tool calling", { retry: 3 }, async () => {
await handleToolCall(llm);
});
it("should handle streaming", { retry: 3 }, async () => {
await handleStreaming(llm);
});
it("should handle image input", { retry: 3 }, async () => {
await handleImage(llm);
});
it("should handle multi-turn with tools", { retry: 3 }, async () => {
await multiTurn(llm);
});
},
);
describe.skipIf(!process.env.ZAI_API_KEY)("zAI Provider (glm-4.5-air via OpenAI Completions)", () => {
const llm = getModel("zai", "glm-4.5-air");
@ -698,6 +780,30 @@ describe("Generate E2E Tests", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider (MiniMax-M2.1 via Anthropic Messages)", () => {
const llm = getModel("minimax", "MiniMax-M2.1");
it("should complete basic text generation", { retry: 3 }, async () => {
await basicTextGeneration(llm);
});
it("should handle tool calling", { retry: 3 }, async () => {
await handleToolCall(llm);
});
it("should handle streaming", { retry: 3 }, async () => {
await handleStreaming(llm);
});
it("should handle thinking mode", { retry: 3 }, async () => {
await handleThinking(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
});
it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
await multiTurn(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
});
});
// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// Tokens are resolved at module level (see oauthTokens above)
@ -907,6 +1013,34 @@ describe("Generate E2E Tests", () => {
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should complete basic text generation", { retry: 3 }, async () => {
await basicTextGeneration(llm);
});
it("should handle tool calling", { retry: 3 }, async () => {
await handleToolCall(llm);
});
it("should handle streaming", { retry: 3 }, async () => {
await handleStreaming(llm);
});
it("should handle thinking", { retry: 3 }, async () => {
await handleThinking(llm, { reasoning: "medium" });
});
it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
await multiTurn(llm, { reasoning: "high" });
});
it("should handle image input", { retry: 3 }, async () => {
await handleImage(llm);
});
});
// Check if ollama is installed and local LLM tests are enabled
let ollamaInstalled = false;
if (!process.env.PI_NO_LOCAL_LLM) {


@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { stream } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -44,18 +45,25 @@ async function testTokensOnAbort<TApi extends Api>(llm: Model<TApi>, options: Op
expect(msg.stopReason).toBe("aborted");
// OpenAI providers, OpenAI Codex, Gemini CLI, zai, Amazon Bedrock, Vercel AI Gateway, and the GPT-OSS model on Antigravity
// only send usage in the final chunk, so when aborted they have no token stats.
// Anthropic and Google send usage information early in the stream.
// MiniMax reports input tokens but not output tokens when aborted.
if (
llm.api === "openai-completions" ||
llm.api === "openai-responses" ||
llm.api === "openai-codex-responses" ||
llm.provider === "google-gemini-cli" ||
llm.provider === "zai" ||
llm.provider === "amazon-bedrock" ||
llm.provider === "vercel-ai-gateway" ||
(llm.provider === "google-antigravity" && llm.id.includes("gpt-oss"))
) {
expect(msg.usage.input).toBe(0);
expect(msg.usage.output).toBe(0);
} else if (llm.provider === "minimax") {
// MiniMax reports input tokens early but output tokens only in final chunk
expect(msg.usage.input).toBeGreaterThan(0);
expect(msg.usage.output).toBe(0);
} else {
expect(msg.usage.input).toBeGreaterThan(0);
expect(msg.usage.output).toBeGreaterThan(0);
@ -144,6 +152,22 @@ describe("Token Statistics on Abort", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
const llm = getModel("minimax", "MiniMax-M2.1");
it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
await testTokensOnAbort(llm);
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
await testTokensOnAbort(llm);
});
});
// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================
@ -230,4 +254,12 @@ describe("Token Statistics on Abort", () => {
},
);
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
await testTokensOnAbort(llm);
});
});
});


@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, Tool } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -170,6 +171,30 @@ describe("Tool Call Without Result Tests", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
const model = getModel("minimax", "MiniMax-M2.1");
it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
await testToolCallWithoutResult(model);
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
await testToolCallWithoutResult(model);
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
await testToolCallWithoutResult(model);
});
});
// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================


@ -16,6 +16,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, Usage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Resolve OAuth tokens at module level (async, runs before tests)
@ -324,6 +325,52 @@ describe("totalTokens field", () => {
);
});
// =========================================================================
// MiniMax
// =========================================================================
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
it(
"MiniMax-M2.1 - should return totalTokens equal to sum of components",
{ retry: 3, timeout: 60000 },
async () => {
const llm = getModel("minimax", "MiniMax-M2.1");
console.log(`\nMiniMax / ${llm.id}:`);
const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.MINIMAX_API_KEY });
logUsage("First request", first);
logUsage("Second request", second);
assertTotalTokensEqualsComponents(first);
assertTotalTokensEqualsComponents(second);
},
);
});
// =========================================================================
// Vercel AI Gateway
// =========================================================================
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
it(
"google/gemini-2.5-flash - should return totalTokens equal to sum of components",
{ retry: 3, timeout: 60000 },
async () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
console.log(`\nVercel AI Gateway / ${llm.id}:`);
const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.AI_GATEWAY_API_KEY });
logUsage("First request", first);
logUsage("Second request", second);
assertTotalTokensEqualsComponents(first);
assertTotalTokensEqualsComponents(second);
},
);
});
// =========================================================================
// OpenRouter - Multiple backend providers
// =========================================================================
@ -535,6 +582,25 @@ describe("totalTokens field", () => {
);
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
it(
"claude-sonnet-4-5 - should return totalTokens equal to sum of components",
{ retry: 3, timeout: 60000 },
async () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
console.log(`\nAmazon Bedrock / ${llm.id}:`);
const { first, second } = await testTotalTokensWithCache(llm);
logUsage("First request", first);
logUsage("Second request", second);
assertTotalTokensEqualsComponents(first);
assertTotalTokensEqualsComponents(second);
},
);
});
// =========================================================================
// OpenAI Codex (OAuth)
// =========================================================================


@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, ToolResultMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";
// Empty schema for test tools - must be proper OBJECT type for Cloud Code Assist
@ -617,6 +618,54 @@ describe("AI Providers Unicode Surrogate Pair Tests", () => {
});
});
describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Unicode Handling", () => {
const llm = getModel("minimax", "MiniMax-M2.1");
it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
await testEmojiInToolResults(llm);
});
it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
await testRealWorldLinkedInData(llm);
});
it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
await testUnpairedHighSurrogate(llm);
});
});
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Unicode Handling", () => {
const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
await testEmojiInToolResults(llm);
});
it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
await testRealWorldLinkedInData(llm);
});
it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
await testUnpairedHighSurrogate(llm);
});
});
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Unicode Handling", () => {
const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
await testEmojiInToolResults(llm);
});
it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
await testRealWorldLinkedInData(llm);
});
it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
await testUnpairedHighSurrogate(llm);
});
});
describe("OpenAI Codex Provider Unicode Handling", () => {
it.skipIf(!openaiCodexToken)(
"gpt-5.2-codex - should handle emoji in tool results",