mirror of
https://github.com/getcompanion-ai/co-mono.git
synced 2026-04-17 08:00:59 +00:00
Merge branch 'main' into feat/tui-overlay-options

commit 7d45e434de

90 changed files with 10277 additions and 1700 deletions
@@ -2,6 +2,36 @@

## [Unreleased]

## [0.45.5] - 2026-01-13

## [0.45.4] - 2026-01-13

### Added

- Added Vercel AI Gateway provider with model discovery and `AI_GATEWAY_API_KEY` env support ([#689](https://github.com/badlogic/pi-mono/pull/689) by [@timolins](https://github.com/timolins))

### Fixed

- Fixed z.ai thinking/reasoning: z.ai uses `thinking: { type: "enabled" }` instead of OpenAI's `reasoning_effort`. Added `thinkingFormat` compat flag to handle this. ([#688](https://github.com/badlogic/pi-mono/issues/688))

## [0.45.3] - 2026-01-13

## [0.45.2] - 2026-01-13

## [0.45.1] - 2026-01-13

## [0.45.0] - 2026-01-13

### Added

- Added MiniMax provider support with M2 and M2.1 models via Anthropic-compatible API ([#656](https://github.com/badlogic/pi-mono/pull/656) by [@dannote](https://github.com/dannote))
- Added Amazon Bedrock provider with prompt caching for Claude models (experimental, tested with Anthropic Claude models only) ([#494](https://github.com/badlogic/pi-mono/pull/494) by [@unexge](https://github.com/unexge))
- Added `serviceTier` option for OpenAI Responses requests ([#672](https://github.com/badlogic/pi-mono/pull/672) by [@markusylisiurunen](https://github.com/markusylisiurunen))
- **Anthropic caching on OpenRouter**: Interactions with Anthropic models via OpenRouter now set a 5-minute cache point using Anthropic-style `cache_control` breakpoints on the last assistant or user message. ([#584](https://github.com/badlogic/pi-mono/pull/584) by [@nathyong](https://github.com/nathyong))
- **Google Gemini CLI provider improvements**: Added Antigravity endpoint fallback (tries daily sandbox then prod when `baseUrl` is unset), header-based retry delay parsing (`Retry-After`, `x-ratelimit-reset`, `x-ratelimit-reset-after`), stable `sessionId` derivation from first user message for cache affinity, empty SSE stream retry with backoff, and `anthropic-beta` header for Claude thinking models ([#670](https://github.com/badlogic/pi-mono/pull/670) by [@kim0](https://github.com/kim0))

## [0.44.0] - 2026-01-12

## [0.43.0] - 2026-01-11

### Fixed
@@ -56,9 +56,12 @@ Unified LLM API with automatic model discovery, provider configuration, token an

- **Cerebras**
- **xAI**
- **OpenRouter**
- **Vercel AI Gateway**
- **MiniMax**
- **GitHub Copilot** (requires OAuth, see below)
- **Google Gemini CLI** (requires OAuth, see below)
- **Antigravity** (requires OAuth, see below)
- **Amazon Bedrock**
- **Any OpenAI-compatible API**: Ollama, vLLM, LM Studio, etc.

## Installation
@@ -708,6 +711,7 @@ interface OpenAICompat {

	supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
	supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
	maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
	thinkingFormat?: 'openai' | 'zai'; // Format for reasoning param: 'openai' uses reasoning_effort, 'zai' uses thinking: { type: "enabled" } (default: openai)
}
```
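The `thinkingFormat` flag exists because providers disagree on how reasoning is requested. A hedged sketch of the mapping it implies; the helper name and shape below are illustrative, not the library's internals:

```typescript
// Hypothetical helper showing what a thinkingFormat compat flag has to do:
// route one unified reasoning setting onto two different wire formats.
type ThinkingFormat = "openai" | "zai";

function mapReasoningParams(
	format: ThinkingFormat,
	effort: "low" | "medium" | "high",
): Record<string, unknown> {
	if (format === "zai") {
		// z.ai ignores reasoning_effort and expects thinking: { type: "enabled" }
		return { thinking: { type: "enabled" } };
	}
	return { reasoning_effort: effort };
}
```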
@@ -860,7 +864,9 @@ In Node.js environments, you can set environment variables to avoid passing API

| Cerebras | `CEREBRAS_API_KEY` |
| xAI | `XAI_API_KEY` |
| OpenRouter | `OPENROUTER_API_KEY` |
| Vercel AI Gateway | `AI_GATEWAY_API_KEY` |
| zAI | `ZAI_API_KEY` |
| MiniMax | `MINIMAX_API_KEY` |
| GitHub Copilot | `COPILOT_GITHUB_TOKEN` or `GH_TOKEN` or `GITHUB_TOKEN` |

When set, the library automatically uses these keys:
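The resolution order the table implies can be sketched as below. The real lookup lives in `getEnvApiKey()` in `src/stream.ts`; this standalone version, including its name, is illustrative only:

```typescript
// Illustrative env-var resolution for the providers in the table above.
// The real getEnvApiKey() in src/stream.ts may differ in shape and coverage.
const ENV_KEYS: Record<string, string[]> = {
	cerebras: ["CEREBRAS_API_KEY"],
	xai: ["XAI_API_KEY"],
	openrouter: ["OPENROUTER_API_KEY"],
	"vercel-ai-gateway": ["AI_GATEWAY_API_KEY"],
	zai: ["ZAI_API_KEY"],
	minimax: ["MINIMAX_API_KEY"],
	"github-copilot": ["COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"],
};

function lookupEnvApiKey(
	provider: string,
	env: Record<string, string | undefined>,
): string | undefined {
	// First matching variable wins, so COPILOT_GITHUB_TOKEN beats GITHUB_TOKEN
	for (const name of ENV_KEYS[provider] ?? []) {
		if (env[name]) return env[name];
	}
	return undefined;
}
```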
@@ -1026,6 +1032,90 @@ const response = await complete(model, {

**Google Gemini CLI / Antigravity**: These use Google Cloud OAuth. The `apiKey` returned by `getOAuthApiKey()` is a JSON string containing both the token and project ID, which the library handles automatically.

## Development

### Adding a New Provider

Adding a new LLM provider requires changes across multiple files. This checklist covers all necessary steps:
#### 1. Core Types (`src/types.ts`)

- Add the API identifier to the `Api` type union (e.g., `"bedrock-converse-stream"`)
- Create an options interface extending `StreamOptions` (e.g., `BedrockOptions`)
- Add the mapping to `ApiOptionsMap`
- Add the provider name to the `KnownProvider` type union (e.g., `"amazon-bedrock"`)
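The type additions above can be sketched as follows, using a hypothetical "acme" provider. The real unions in `src/types.ts` have many more members; every "acme" name here is made up:

```typescript
// Illustrative shape of the step 1 type additions for a hypothetical provider.
type Api = "openai-completions" | "anthropic-messages" | "acme-stream";
type KnownProvider = "openai" | "anthropic" | "acme";

interface StreamOptions {
	maxTokens?: number;
	temperature?: number;
}

// Options interface for the new API, extending the shared StreamOptions
interface AcmeOptions extends StreamOptions {
	region?: string;
}

// Maps each Api identifier to its options type
interface ApiOptionsMap {
	"acme-stream": AcmeOptions;
	// ...entries for the existing APIs
}
```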
#### 2. Provider Implementation (`src/providers/`)

Create a new provider file (e.g., `amazon-bedrock.ts`) that exports:

- `stream<Provider>()` function returning `AssistantMessageEventStream`
- Provider-specific options interface
- Message conversion functions to transform `Context` to provider format
- Tool conversion if the provider supports tools
- Response parsing to emit standardized events (`text`, `tool_call`, `thinking`, `usage`, `stop`)
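The core of such a module is a stream function that calls the API and re-emits provider chunks as the standardized events listed above. A minimal sketch of that event flow, with a hypothetical provider whose response is faked by a fixed chunk list, and a synchronous generator standing in for the real async stream for brevity:

```typescript
// Sketch only: real provider stream functions are async, take
// (model, context, options), and return AssistantMessageEventStream.
type StreamEvent =
	| { type: "start" }
	| { type: "text_delta"; delta: string }
	| { type: "done"; text: string };

function* streamAcmeDemo(_prompt: string): Generator<StreamEvent> {
	yield { type: "start" };
	let text = "";
	// The real implementation would parse chunks from the provider's wire format
	for (const chunk of ["Hello", ", ", "world"]) {
		text += chunk;
		yield { type: "text_delta", delta: chunk };
	}
	yield { type: "done", text };
}
```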
#### 3. Stream Integration (`src/stream.ts`)

- Import the provider's stream function and options type
- Add credential detection in `getEnvApiKey()` for the new provider
- Add a case in `mapOptionsForApi()` to map `SimpleStreamOptions` to provider options
- Add the provider's stream function to the `streamFunctions` map
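The `mapOptionsForApi()` step can be sketched like this for a hypothetical "acme" provider; all names are illustrative, and env vars are passed in explicitly so the fallback logic is visible:

```typescript
// Sketch of mapping the simple user-facing options to provider options.
interface SimpleStreamOptionsSketch {
	apiKey?: string;
	maxTokens?: number;
}

interface AcmeOptionsSketch extends SimpleStreamOptionsSketch {
	region?: string;
}

function mapOptionsForAcme(
	options: SimpleStreamOptionsSketch,
	env: Record<string, string | undefined>,
): AcmeOptionsSketch {
	return {
		// Explicit option wins, then the provider's env var
		apiKey: options.apiKey ?? env.ACME_API_KEY,
		maxTokens: options.maxTokens,
		region: env.ACME_REGION ?? "us-east-1",
	};
}
```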
#### 4. Model Generation (`scripts/generate-models.ts`)

- Add logic to fetch and parse models from the provider's source (e.g., models.dev API)
- Map provider model data to the standardized `Model` interface
- Handle provider-specific quirks (pricing format, capability flags, model ID transformations)
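A common pricing quirk: catalogs report per-token prices, sometimes as strings, while `Model.cost` is per million tokens. A defensive conversion, following the same convention the AI Gateway code in this diff uses:

```typescript
// Convert a catalog price (possibly a string, possibly missing) to cost per
// million tokens; malformed values fall back to 0.
function toCostPerMillion(value: string | number | undefined): number {
	const n = typeof value === "number" ? value : parseFloat(value ?? "0");
	return (Number.isFinite(n) ? n : 0) * 1_000_000;
}
```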
#### 5. Tests (`test/`)

Create or update test files to cover the new provider:

- `stream.test.ts` - Basic streaming and tool use
- `tokens.test.ts` - Token usage reporting
- `abort.test.ts` - Request cancellation
- `empty.test.ts` - Empty message handling
- `context-overflow.test.ts` - Context limit errors
- `image-limits.test.ts` - Image support (if applicable)
- `unicode-surrogate.test.ts` - Unicode handling
- `tool-call-without-result.test.ts` - Orphaned tool calls
- `image-tool-result.test.ts` - Images in tool results
- `total-tokens.test.ts` - Token counting accuracy

For providers with non-standard auth (AWS, Google Vertex), create a utility like `bedrock-utils.ts` with credential detection helpers.
#### 6. Coding Agent Integration (`../coding-agent/`)

Update `src/core/model-resolver.ts`:

- Add a default model ID for the provider in `DEFAULT_MODELS`

Update `src/cli/args.ts`:

- Add environment variable documentation in the help text

Update `README.md`:

- Add the provider to the providers section with setup instructions
#### 7. Documentation

Update `packages/ai/README.md`:

- Add the provider to the Supported Providers table
- Document any provider-specific options or authentication requirements
- Add the environment variable to the Environment Variables section
#### 8. Changelog

Add an entry to `packages/ai/CHANGELOG.md` under `## [Unreleased]`:

```markdown
### Added

- Added support for [Provider Name] provider ([#PR](link) by [@author](link))
```

## License

MIT
@@ -1,6 +1,6 @@

{
	"name": "@mariozechner/pi-ai",
	"version": "0.43.0",
	"version": "0.45.5",
	"description": "Unified LLM API with automatic model discovery and provider configuration",
	"type": "module",
	"main": "./dist/index.js",
@@ -23,6 +23,7 @@

	},
	"dependencies": {
		"@anthropic-ai/sdk": "0.71.2",
		"@aws-sdk/client-bedrock-runtime": "^3.966.0",
		"@google/genai": "1.34.0",
		"@mistralai/mistralai": "1.10.0",
		"@sinclair/typebox": "^0.34.41",
@@ -39,6 +40,7 @@

		"openai",
		"anthropic",
		"gemini",
		"bedrock",
		"unified",
		"api"
	],
@@ -32,6 +32,20 @@ interface ModelsDevModel {

	};
}

interface AiGatewayModel {
	id: string;
	name?: string;
	context_window?: number;
	max_tokens?: number;
	tags?: string[];
	pricing?: {
		input?: string | number;
		output?: string | number;
		input_cache_read?: string | number;
		input_cache_write?: string | number;
	};
}

const COPILOT_STATIC_HEADERS = {
	"User-Agent": "GitHubCopilotChat/0.35.0",
	"Editor-Version": "vscode/1.107.0",
@@ -39,6 +53,9 @@ const COPILOT_STATIC_HEADERS = {

	"Copilot-Integration-Id": "vscode-chat",
} as const;

const AI_GATEWAY_MODELS_URL = "https://ai-gateway.vercel.sh/v1";
const AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh";

async function fetchOpenRouterModels(): Promise<Model<any>[]> {
	try {
		console.log("Fetching models from OpenRouter API...");
@@ -97,6 +114,64 @@ async function fetchOpenRouterModels(): Promise<Model<any>[]> {

	}
}

async function fetchAiGatewayModels(): Promise<Model<any>[]> {
	try {
		console.log("Fetching models from Vercel AI Gateway API...");
		const response = await fetch(`${AI_GATEWAY_MODELS_URL}/models`);
		const data = await response.json();
		const models: Model<any>[] = [];

		const toNumber = (value: string | number | undefined): number => {
			if (typeof value === "number") {
				return Number.isFinite(value) ? value : 0;
			}
			const parsed = parseFloat(value ?? "0");
			return Number.isFinite(parsed) ? parsed : 0;
		};

		const items = Array.isArray(data.data) ? (data.data as AiGatewayModel[]) : [];
		for (const model of items) {
			const tags = Array.isArray(model.tags) ? model.tags : [];
			// Only include models that support tools
			if (!tags.includes("tool-use")) continue;

			const input: ("text" | "image")[] = ["text"];
			if (tags.includes("vision")) {
				input.push("image");
			}

			const inputCost = toNumber(model.pricing?.input) * 1_000_000;
			const outputCost = toNumber(model.pricing?.output) * 1_000_000;
			const cacheReadCost = toNumber(model.pricing?.input_cache_read) * 1_000_000;
			const cacheWriteCost = toNumber(model.pricing?.input_cache_write) * 1_000_000;

			models.push({
				id: model.id,
				name: model.name || model.id,
				api: "anthropic-messages",
				baseUrl: AI_GATEWAY_BASE_URL,
				provider: "vercel-ai-gateway",
				reasoning: tags.includes("reasoning"),
				input,
				cost: {
					input: inputCost,
					output: outputCost,
					cacheRead: cacheReadCost,
					cacheWrite: cacheWriteCost,
				},
				contextWindow: model.context_window || 4096,
				maxTokens: model.max_tokens || 4096,
			});
		}

		console.log(`Fetched ${models.length} tool-capable models from Vercel AI Gateway`);
		return models;
	} catch (error) {
		console.error("Failed to fetch Vercel AI Gateway models:", error);
		return [];
	}
}

async function loadModelsDevData(): Promise<Model<any>[]> {
	try {
		console.log("Fetching models from models.dev API...");
@@ -105,6 +180,87 @@ async function loadModelsDevData(): Promise<Model<any>[]> {

		const models: Model<any>[] = [];

		// Process Amazon Bedrock models
		if (data["amazon-bedrock"]?.models) {
			for (const [modelId, model] of Object.entries(data["amazon-bedrock"].models)) {
				const m = model as ModelsDevModel;
				if (m.tool_call !== true) continue;

				let id = modelId;

				if (id.startsWith("ai21.jamba")) {
					// These models don't support tool use in streaming mode
					continue;
				}

				if (id.startsWith("amazon.titan-text-express") ||
					id.startsWith("mistral.mistral-7b-instruct-v0")) {
					// These models don't support system messages
					continue;
				}

				// Some Amazon Bedrock models require cross-region inference profiles to work.
				// To use cross-region inference, we need to add a region prefix to the model IDs.
				// See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system
				// TODO: Remove Claude models once https://github.com/anomalyco/models.dev/pull/607 is merged, and follow up with other models.

				// Models with global cross-region inference profiles
				if (id.startsWith("anthropic.claude-haiku-4-5") ||
					id.startsWith("anthropic.claude-sonnet-4") ||
					id.startsWith("anthropic.claude-opus-4-5") ||
					id.startsWith("amazon.nova-2-lite") ||
					id.startsWith("cohere.embed-v4") ||
					id.startsWith("twelvelabs.pegasus-1-2")) {
					id = "global." + id;
				}

				// Models with US cross-region inference profiles
				if (id.startsWith("amazon.nova-lite") ||
					id.startsWith("amazon.nova-micro") ||
					id.startsWith("amazon.nova-premier") ||
					id.startsWith("amazon.nova-pro") ||
					id.startsWith("anthropic.claude-3-7-sonnet") ||
					id.startsWith("anthropic.claude-opus-4-1") ||
					id.startsWith("anthropic.claude-opus-4-20250514") ||
					id.startsWith("deepseek.r1") ||
					id.startsWith("meta.llama3-2") ||
					id.startsWith("meta.llama3-3") ||
					id.startsWith("meta.llama4")) {
					id = "us." + id;
				}

				const bedrockModel = {
					id,
					name: m.name || id,
					api: "bedrock-converse-stream" as const,
					provider: "amazon-bedrock" as const,
					baseUrl: "https://bedrock-runtime.us-east-1.amazonaws.com",
					reasoning: m.reasoning === true,
					input: (m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"]) as ("text" | "image")[],
					cost: {
						input: m.cost?.input || 0,
						output: m.cost?.output || 0,
						cacheRead: m.cost?.cache_read || 0,
						cacheWrite: m.cost?.cache_write || 0,
					},
					contextWindow: m.limit?.context || 4096,
					maxTokens: m.limit?.output || 4096,
				};
				models.push(bedrockModel);

				// Add EU cross-region inference variants for Claude models
				if (modelId.startsWith("anthropic.claude-haiku-4-5") ||
					modelId.startsWith("anthropic.claude-sonnet-4-5") ||
					modelId.startsWith("anthropic.claude-opus-4-5")) {
					models.push({
						...bedrockModel,
						id: "eu." + modelId,
						name: (m.name || modelId) + " (EU)",
					});
				}
			}
		}

		// Process Anthropic models
		if (data.anthropic?.models) {
			for (const [modelId, model] of Object.entries(data.anthropic.models)) {
@@ -284,6 +440,7 @@ async function loadModelsDevData(): Promise<Model<any>[]> {

					},
					compat: {
						supportsDeveloperRole: false,
						thinkingFormat: "zai",
					},
					contextWindow: m.limit?.context || 4096,
					maxTokens: m.limit?.output || 4096,
@@ -409,6 +566,33 @@ async function loadModelsDevData(): Promise<Model<any>[]> {

			}
		}

		// Process MiniMax models
		if (data.minimax?.models) {
			for (const [modelId, model] of Object.entries(data.minimax.models)) {
				const m = model as ModelsDevModel;
				if (m.tool_call !== true) continue;

				models.push({
					id: modelId,
					name: m.name || modelId,
					api: "anthropic-messages",
					provider: "minimax",
					// MiniMax's Anthropic-compatible API - SDK appends /v1/messages
					baseUrl: "https://api.minimax.io/anthropic",
					reasoning: m.reasoning === true,
					input: m.modalities?.input?.includes("image") ? ["text", "image"] : ["text"],
					cost: {
						input: m.cost?.input || 0,
						output: m.cost?.output || 0,
						cacheRead: m.cost?.cache_read || 0,
						cacheWrite: m.cost?.cache_write || 0,
					},
					contextWindow: m.limit?.context || 4096,
					maxTokens: m.limit?.output || 4096,
				});
			}
		}

		console.log(`Loaded ${models.length} tool-capable models from models.dev`);
		return models;
	} catch (error) {
@@ -421,11 +605,13 @@ async function generateModels() {

	// Fetch models from all sources
	// models.dev: Anthropic, Google, OpenAI, Groq, Cerebras
	// OpenRouter: xAI and other providers (excluding Anthropic, Google, OpenAI)
	// AI Gateway: OpenAI-compatible catalog with tool-capable models
	const modelsDevModels = await loadModelsDevData();
	const openRouterModels = await fetchOpenRouterModels();
	const aiGatewayModels = await fetchAiGatewayModels();

	// Combine models (models.dev has priority)
	const allModels = [...modelsDevModels, ...openRouterModels];
	const allModels = [...modelsDevModels, ...openRouterModels, ...aiGatewayModels];

	// Fix incorrect cache pricing for Claude Opus 4.5 from models.dev
	// models.dev has 3x the correct pricing (1.5/18.75 instead of 0.5/6.25)
File diff suppressed because it is too large

packages/ai/src/providers/amazon-bedrock.ts (new file, 548 lines)
@@ -0,0 +1,548 @@

import {
	BedrockRuntimeClient,
	StopReason as BedrockStopReason,
	type Tool as BedrockTool,
	CachePointType,
	type ContentBlock,
	type ContentBlockDeltaEvent,
	type ContentBlockStartEvent,
	type ContentBlockStopEvent,
	ConversationRole,
	ConverseStreamCommand,
	type ConverseStreamMetadataEvent,
	ImageFormat,
	type Message,
	type SystemContentBlock,
	type ToolChoice,
	type ToolConfiguration,
	ToolResultStatus,
} from "@aws-sdk/client-bedrock-runtime";

import { calculateCost } from "../models.js";
import type {
	Api,
	AssistantMessage,
	Context,
	Model,
	StopReason,
	StreamFunction,
	StreamOptions,
	TextContent,
	ThinkingBudgets,
	ThinkingContent,
	ThinkingLevel,
	Tool,
	ToolCall,
	ToolResultMessage,
} from "../types.js";
import { AssistantMessageEventStream } from "../utils/event-stream.js";
import { parseStreamingJson } from "../utils/json-parse.js";
import { sanitizeSurrogates } from "../utils/sanitize-unicode.js";

export interface BedrockOptions extends StreamOptions {
	region?: string;
	profile?: string;
	toolChoice?: "auto" | "any" | "none" | { type: "tool"; name: string };
	/** See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-reasoning.html for supported models. */
	reasoning?: ThinkingLevel;
	/** Custom token budgets per thinking level. Overrides default budgets. */
	thinkingBudgets?: ThinkingBudgets;
	/** Only supported by Claude 4.x models, see https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved */
	interleavedThinking?: boolean;
}

type Block = (TextContent | ThinkingContent | ToolCall) & { index?: number; partialJson?: string };
export const streamBedrock: StreamFunction<"bedrock-converse-stream"> = (
	model: Model<"bedrock-converse-stream">,
	context: Context,
	options: BedrockOptions,
): AssistantMessageEventStream => {
	const stream = new AssistantMessageEventStream();

	(async () => {
		const output: AssistantMessage = {
			role: "assistant",
			content: [],
			api: "bedrock-converse-stream" as Api,
			provider: model.provider,
			model: model.id,
			usage: {
				input: 0,
				output: 0,
				cacheRead: 0,
				cacheWrite: 0,
				totalTokens: 0,
				cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
			},
			stopReason: "stop",
			timestamp: Date.now(),
		};

		const blocks = output.content as Block[];

		try {
			const client = new BedrockRuntimeClient({
				region: options.region || process.env.AWS_REGION || process.env.AWS_DEFAULT_REGION || "us-east-1",
				profile: options.profile,
			});

			const command = new ConverseStreamCommand({
				modelId: model.id,
				messages: convertMessages(context, model),
				system: buildSystemPrompt(context.systemPrompt, model),
				inferenceConfig: { maxTokens: options.maxTokens, temperature: options.temperature },
				toolConfig: convertToolConfig(context.tools, options.toolChoice),
				additionalModelRequestFields: buildAdditionalModelRequestFields(model, options),
			});

			const response = await client.send(command, { abortSignal: options.signal });

			for await (const item of response.stream!) {
				if (item.messageStart) {
					if (item.messageStart.role !== ConversationRole.ASSISTANT) {
						throw new Error("Expected assistant message start but got user message start instead");
					}
					stream.push({ type: "start", partial: output });
				} else if (item.contentBlockStart) {
					handleContentBlockStart(item.contentBlockStart, blocks, output, stream);
				} else if (item.contentBlockDelta) {
					handleContentBlockDelta(item.contentBlockDelta, blocks, output, stream);
				} else if (item.contentBlockStop) {
					handleContentBlockStop(item.contentBlockStop, blocks, output, stream);
				} else if (item.messageStop) {
					output.stopReason = mapStopReason(item.messageStop.stopReason);
				} else if (item.metadata) {
					handleMetadata(item.metadata, model, output);
				} else if (item.internalServerException) {
					throw new Error(`Internal server error: ${item.internalServerException.message}`);
				} else if (item.modelStreamErrorException) {
					throw new Error(`Model stream error: ${item.modelStreamErrorException.message}`);
				} else if (item.validationException) {
					throw new Error(`Validation error: ${item.validationException.message}`);
				} else if (item.throttlingException) {
					throw new Error(`Throttling error: ${item.throttlingException.message}`);
				} else if (item.serviceUnavailableException) {
					throw new Error(`Service unavailable: ${item.serviceUnavailableException.message}`);
				}
			}

			if (options.signal?.aborted) {
				throw new Error("Request was aborted");
			}

			if (output.stopReason === "error" || output.stopReason === "aborted") {
				throw new Error("An unknown error occurred");
			}

			stream.push({ type: "done", reason: output.stopReason, message: output });
			stream.end();
		} catch (error) {
			for (const block of output.content) {
				delete (block as Block).index;
				delete (block as Block).partialJson;
			}
			output.stopReason = options.signal?.aborted ? "aborted" : "error";
			output.errorMessage = error instanceof Error ? error.message : JSON.stringify(error);
			stream.push({ type: "error", reason: output.stopReason, error: output });
			stream.end();
		}
	})();

	return stream;
};
function handleContentBlockStart(
	event: ContentBlockStartEvent,
	blocks: Block[],
	output: AssistantMessage,
	stream: AssistantMessageEventStream,
): void {
	const index = event.contentBlockIndex!;
	const start = event.start;

	if (start?.toolUse) {
		const block: Block = {
			type: "toolCall",
			id: start.toolUse.toolUseId || "",
			name: start.toolUse.name || "",
			arguments: {},
			partialJson: "",
			index,
		};
		output.content.push(block);
		stream.push({ type: "toolcall_start", contentIndex: blocks.length - 1, partial: output });
	}
}

function handleContentBlockDelta(
	event: ContentBlockDeltaEvent,
	blocks: Block[],
	output: AssistantMessage,
	stream: AssistantMessageEventStream,
): void {
	const contentBlockIndex = event.contentBlockIndex!;
	const delta = event.delta;
	let index = blocks.findIndex((b) => b.index === contentBlockIndex);
	let block = blocks[index];

	if (delta?.text !== undefined) {
		// If no text block exists yet, create one, as no contentBlockStart event is sent for text blocks
		if (!block) {
			const newBlock: Block = { type: "text", text: "", index: contentBlockIndex };
			output.content.push(newBlock);
			index = blocks.length - 1;
			block = blocks[index];
			stream.push({ type: "text_start", contentIndex: index, partial: output });
		}
		if (block.type === "text") {
			block.text += delta.text;
			stream.push({ type: "text_delta", contentIndex: index, delta: delta.text, partial: output });
		}
	} else if (delta?.toolUse && block?.type === "toolCall") {
		block.partialJson = (block.partialJson || "") + (delta.toolUse.input || "");
		block.arguments = parseStreamingJson(block.partialJson);
		stream.push({ type: "toolcall_delta", contentIndex: index, delta: delta.toolUse.input || "", partial: output });
	} else if (delta?.reasoningContent) {
		let thinkingBlock = block;
		let thinkingIndex = index;

		if (!thinkingBlock) {
			const newBlock: Block = { type: "thinking", thinking: "", thinkingSignature: "", index: contentBlockIndex };
			output.content.push(newBlock);
			thinkingIndex = blocks.length - 1;
			thinkingBlock = blocks[thinkingIndex];
			stream.push({ type: "thinking_start", contentIndex: thinkingIndex, partial: output });
		}

		if (thinkingBlock?.type === "thinking") {
			if (delta.reasoningContent.text) {
				thinkingBlock.thinking += delta.reasoningContent.text;
				stream.push({
					type: "thinking_delta",
					contentIndex: thinkingIndex,
					delta: delta.reasoningContent.text,
					partial: output,
				});
			}
			if (delta.reasoningContent.signature) {
				thinkingBlock.thinkingSignature =
					(thinkingBlock.thinkingSignature || "") + delta.reasoningContent.signature;
			}
		}
	}
}
function handleMetadata(
	event: ConverseStreamMetadataEvent,
	model: Model<"bedrock-converse-stream">,
	output: AssistantMessage,
): void {
	if (event.usage) {
		output.usage.input = event.usage.inputTokens || 0;
		output.usage.output = event.usage.outputTokens || 0;
		output.usage.cacheRead = event.usage.cacheReadInputTokens || 0;
		output.usage.cacheWrite = event.usage.cacheWriteInputTokens || 0;
		output.usage.totalTokens = event.usage.totalTokens || output.usage.input + output.usage.output;
		calculateCost(model, output.usage);
	}
}

function handleContentBlockStop(
	event: ContentBlockStopEvent,
	blocks: Block[],
	output: AssistantMessage,
	stream: AssistantMessageEventStream,
): void {
	const index = blocks.findIndex((b) => b.index === event.contentBlockIndex);
	const block = blocks[index];
	if (!block) return;
	delete (block as Block).index;

	switch (block.type) {
		case "text":
			stream.push({ type: "text_end", contentIndex: index, content: block.text, partial: output });
			break;
		case "thinking":
			stream.push({ type: "thinking_end", contentIndex: index, content: block.thinking, partial: output });
			break;
		case "toolCall":
			block.arguments = parseStreamingJson(block.partialJson);
			delete (block as Block).partialJson;
			stream.push({ type: "toolcall_end", contentIndex: index, toolCall: block, partial: output });
			break;
	}
}

/**
 * Check if the model supports prompt caching.
 * Supported: Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4.x models
 */
function supportsPromptCaching(model: Model<"bedrock-converse-stream">): boolean {
	const id = model.id.toLowerCase();
	// Claude 4.x models (opus-4, sonnet-4, haiku-4)
	if (id.includes("claude") && (id.includes("-4-") || id.includes("-4."))) return true;
	// Claude 3.7 Sonnet
	if (id.includes("claude-3-7-sonnet")) return true;
	// Claude 3.5 Haiku
	if (id.includes("claude-3-5-haiku")) return true;
	return false;
}
function buildSystemPrompt(
	systemPrompt: string | undefined,
	model: Model<"bedrock-converse-stream">,
): SystemContentBlock[] | undefined {
	if (!systemPrompt) return undefined;

	const blocks: SystemContentBlock[] = [{ text: sanitizeSurrogates(systemPrompt) }];

	// Add cache point for supported Claude models
	if (supportsPromptCaching(model)) {
		blocks.push({ cachePoint: { type: CachePointType.DEFAULT } });
	}

	return blocks;
}

function convertMessages(context: Context, model: Model<"bedrock-converse-stream">): Message[] {
	const result: Message[] = [];
	const messages = context.messages;

	for (let i = 0; i < messages.length; i++) {
		const m = messages[i];

		switch (m.role) {
			case "user":
				result.push({
					role: ConversationRole.USER,
					content:
						typeof m.content === "string"
							? [{ text: sanitizeSurrogates(m.content) }]
							: m.content.map((c) => {
									switch (c.type) {
										case "text":
											return { text: sanitizeSurrogates(c.text) };
										case "image":
											return { image: createImageBlock(c.mimeType, c.data) };
										default:
											throw new Error("Unknown user content type");
									}
								}),
				});
				break;
			case "assistant": {
				// Skip assistant messages with empty content (e.g., from aborted requests);
				// Bedrock rejects messages with empty content arrays
				if (m.content.length === 0) {
					continue;
				}
				const contentBlocks: ContentBlock[] = [];
				for (const c of m.content) {
					switch (c.type) {
						case "text":
							// Skip empty text blocks
							if (c.text.trim().length === 0) continue;
							contentBlocks.push({ text: sanitizeSurrogates(c.text) });
							break;
						case "toolCall":
							contentBlocks.push({
								toolUse: { toolUseId: c.id, name: c.name, input: c.arguments },
							});
							break;
						case "thinking":
							// Skip empty thinking blocks
							if (c.thinking.trim().length === 0) continue;
							contentBlocks.push({
								reasoningContent: {
									reasoningText: { text: sanitizeSurrogates(c.thinking), signature: c.thinkingSignature },
								},
							});
							break;
						default:
							throw new Error("Unknown assistant content type");
					}
				}
				// Skip if all content blocks were filtered out
				if (contentBlocks.length === 0) {
					continue;
				}
				result.push({
					role: ConversationRole.ASSISTANT,
					content: contentBlocks,
				});
				break;
			}
			case "toolResult": {
				// Collect all consecutive toolResult messages into a single user message;
				// Bedrock requires all tool results to be in one message
				const toolResults: ContentBlock.ToolResultMember[] = [];

				// Add current tool result with all content blocks combined
				toolResults.push({
					toolResult: {
						toolUseId: m.toolCallId,
						content: m.content.map((c) =>
							c.type === "image"
								? { image: createImageBlock(c.mimeType, c.data) }
								: { text: sanitizeSurrogates(c.text) },
						),
						status: m.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
					},
				});

				// Look ahead for consecutive toolResult messages
				let j = i + 1;
				while (j < messages.length && messages[j].role === "toolResult") {
					const nextMsg = messages[j] as ToolResultMessage;
					toolResults.push({
						toolResult: {
							toolUseId: nextMsg.toolCallId,
							content: nextMsg.content.map((c) =>
								c.type === "image"
									? { image: createImageBlock(c.mimeType, c.data) }
									: { text: sanitizeSurrogates(c.text) },
							),
							status: nextMsg.isError ? ToolResultStatus.ERROR : ToolResultStatus.SUCCESS,
						},
					});
					j++;
				}

				// Skip the messages we've already processed
				i = j - 1;

				result.push({
					role: ConversationRole.USER,
					content: toolResults,
				});
				break;
			}
			default:
				throw new Error("Unknown message role");
		}
	}

	// Add cache point to the last user message for supported Claude models
	if (supportsPromptCaching(model) && result.length > 0) {
|
||||
const lastMessage = result[result.length - 1];
|
||||
if (lastMessage.role === ConversationRole.USER && lastMessage.content) {
|
||||
(lastMessage.content as ContentBlock[]).push({ cachePoint: { type: CachePointType.DEFAULT } });
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
function convertToolConfig(
|
||||
tools: Tool[] | undefined,
|
||||
toolChoice: BedrockOptions["toolChoice"],
|
||||
): ToolConfiguration | undefined {
|
||||
if (!tools?.length || toolChoice === "none") return undefined;
|
||||
|
||||
const bedrockTools: BedrockTool[] = tools.map((tool) => ({
|
||||
toolSpec: {
|
||||
name: tool.name,
|
||||
description: tool.description,
|
||||
inputSchema: { json: tool.parameters },
|
||||
},
|
||||
}));
|
||||
|
||||
let bedrockToolChoice: ToolChoice | undefined;
|
||||
switch (toolChoice) {
|
||||
case "auto":
|
||||
bedrockToolChoice = { auto: {} };
|
||||
break;
|
||||
case "any":
|
||||
bedrockToolChoice = { any: {} };
|
||||
break;
|
||||
default:
|
||||
if (toolChoice?.type === "tool") {
|
||||
bedrockToolChoice = { tool: { name: toolChoice.name } };
|
||||
}
|
||||
}
|
||||
|
||||
return { tools: bedrockTools, toolChoice: bedrockToolChoice };
|
||||
}
|
||||
|
||||
function mapStopReason(reason: string | undefined): StopReason {
|
||||
switch (reason) {
|
||||
case BedrockStopReason.END_TURN:
|
||||
case BedrockStopReason.STOP_SEQUENCE:
|
||||
return "stop";
|
||||
case BedrockStopReason.MAX_TOKENS:
|
||||
case BedrockStopReason.MODEL_CONTEXT_WINDOW_EXCEEDED:
|
||||
return "length";
|
||||
case BedrockStopReason.TOOL_USE:
|
||||
return "toolUse";
|
||||
default:
|
||||
return "error";
|
||||
}
|
||||
}
|
||||
|
||||
function buildAdditionalModelRequestFields(
|
||||
model: Model<"bedrock-converse-stream">,
|
||||
options: BedrockOptions,
|
||||
): Record<string, any> | undefined {
|
||||
if (!options.reasoning || !model.reasoning) {
|
||||
return undefined;
|
||||
}
|
||||
|
||||
if (model.id.includes("anthropic.claude")) {
|
||||
const defaultBudgets: Record<ThinkingLevel, number> = {
|
||||
minimal: 1024,
|
||||
low: 2048,
|
||||
medium: 8192,
|
||||
high: 16384,
|
||||
xhigh: 16384, // Claude doesn't support xhigh, clamp to high
|
||||
};
|
||||
|
||||
// Custom budgets override defaults (xhigh not in ThinkingBudgets, use high)
|
||||
const level = options.reasoning === "xhigh" ? "high" : options.reasoning;
|
||||
const budget = options.thinkingBudgets?.[level] ?? defaultBudgets[options.reasoning];
|
||||
|
||||
const result: Record<string, any> = {
|
||||
thinking: {
|
||||
type: "enabled",
|
||||
budget_tokens: budget,
|
||||
},
|
||||
};
|
||||
|
||||
if (options.interleavedThinking) {
|
||||
result.anthropic_beta = ["interleaved-thinking-2025-05-14"];
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
return undefined;
|
||||
}
|
||||
|
||||
function createImageBlock(mimeType: string, data: string) {
|
||||
let format: ImageFormat;
|
||||
switch (mimeType) {
|
||||
case "image/jpeg":
|
||||
case "image/jpg":
|
||||
format = ImageFormat.JPEG;
|
||||
break;
|
||||
case "image/png":
|
||||
format = ImageFormat.PNG;
|
||||
break;
|
||||
case "image/gif":
|
||||
format = ImageFormat.GIF;
|
||||
break;
|
||||
case "image/webp":
|
||||
format = ImageFormat.WEBP;
|
||||
break;
|
||||
default:
|
||||
throw new Error(`Unknown image type: ${mimeType}`);
|
||||
}
|
||||
|
||||
const binaryString = atob(data);
|
||||
const bytes = new Uint8Array(binaryString.length);
|
||||
for (let i = 0; i < binaryString.length; i++) {
|
||||
bytes[i] = binaryString.charCodeAt(i);
|
||||
}
|
||||
|
||||
return { source: { bytes }, format };
|
||||
}
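The base64 decode in `createImageBlock` relies on the global `atob` (available in Node 16+ and browsers). A minimal standalone sketch of just that decode path — the helper name here is illustrative, not from this file:

```typescript
// Decode a base64 payload into raw bytes, as createImageBlock does above.
// Assumes the global atob; Buffer.from(data, "base64") would be the
// Node-only equivalent.
function decodeBase64(data: string): Uint8Array {
	const binaryString = atob(data);
	const bytes = new Uint8Array(binaryString.length);
	for (let i = 0; i < binaryString.length; i++) {
		bytes[i] = binaryString.charCodeAt(i);
	}
	return bytes;
}

const bytes = decodeBase64("aGk="); // base64 for "hi"
```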
@@ -287,7 +287,7 @@ export const streamAnthropic: StreamFunction<"anthropic-messages"> = (
	}

	if (output.stopReason === "aborted" || output.stopReason === "error") {
		throw new Error("An unkown error ocurred");
		throw new Error("An unknown error occurred");
	}

	stream.push({ type: "done", reason: output.stopReason, message: output });
@@ -4,6 +4,7 @@
 * Uses the Cloud Code Assist API endpoint to access Gemini and Claude models.
 */

import { createHash } from "node:crypto";
import type { Content, ThinkingConfig } from "@google/genai";
import { calculateCost } from "../models.js";
import type {

@@ -54,6 +55,8 @@ export interface GoogleGeminiCliOptions extends StreamOptions {
}

const DEFAULT_ENDPOINT = "https://cloudcode-pa.googleapis.com";
const ANTIGRAVITY_DAILY_ENDPOINT = "https://daily-cloudcode-pa.sandbox.googleapis.com";
const ANTIGRAVITY_ENDPOINT_FALLBACKS = [ANTIGRAVITY_DAILY_ENDPOINT, DEFAULT_ENDPOINT] as const;
// Headers for Gemini CLI (prod endpoint)
const GEMINI_CLI_HEADERS = {
	"User-Agent": "google-cloud-sdk vscode_cloudshelleditor/0.1",
@@ -163,16 +166,66 @@ let toolCallCounter = 0;
// Retry configuration
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000;
const MAX_EMPTY_STREAM_RETRIES = 2;
const EMPTY_STREAM_BASE_DELAY_MS = 500;
const CLAUDE_THINKING_BETA_HEADER = "interleaved-thinking-2025-05-14";

/**
 * Extract retry delay from Gemini error response (in milliseconds).
 * Parses patterns like:
 * Checks headers first (Retry-After, x-ratelimit-reset, x-ratelimit-reset-after),
 * then parses body patterns like:
 * - "Your quota will reset after 39s"
 * - "Your quota will reset after 18h31m10s"
 * - "Please retry in Xs" or "Please retry in Xms"
 * - "retryDelay": "34.074824224s" (JSON field)
 */
function extractRetryDelay(errorText: string): number | undefined {
export function extractRetryDelay(errorText: string, response?: Response | Headers): number | undefined {
	const normalizeDelay = (ms: number): number | undefined => (ms > 0 ? Math.ceil(ms + 1000) : undefined);

	const headers = response instanceof Headers ? response : response?.headers;
	if (headers) {
		const retryAfter = headers.get("retry-after");
		if (retryAfter) {
			const retryAfterSeconds = Number(retryAfter);
			if (Number.isFinite(retryAfterSeconds)) {
				const delay = normalizeDelay(retryAfterSeconds * 1000);
				if (delay !== undefined) {
					return delay;
				}
			}
			const retryAfterDate = new Date(retryAfter);
			const retryAfterMs = retryAfterDate.getTime();
			if (!Number.isNaN(retryAfterMs)) {
				const delay = normalizeDelay(retryAfterMs - Date.now());
				if (delay !== undefined) {
					return delay;
				}
			}
		}

		const rateLimitReset = headers.get("x-ratelimit-reset");
		if (rateLimitReset) {
			const resetSeconds = Number.parseInt(rateLimitReset, 10);
			if (!Number.isNaN(resetSeconds)) {
				const delay = normalizeDelay(resetSeconds * 1000 - Date.now());
				if (delay !== undefined) {
					return delay;
				}
			}
		}

		const rateLimitResetAfter = headers.get("x-ratelimit-reset-after");
		if (rateLimitResetAfter) {
			const resetAfterSeconds = Number(rateLimitResetAfter);
			if (Number.isFinite(resetAfterSeconds)) {
				const delay = normalizeDelay(resetAfterSeconds * 1000);
				if (delay !== undefined) {
					return delay;
				}
			}
		}
	}

	// Pattern 1: "Your quota will reset after ..." (formats: "18h31m10s", "10m15s", "6s", "39s")
	const durationMatch = errorText.match(/reset after (?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s/i);
	if (durationMatch) {

@@ -181,8 +234,9 @@ function extractRetryDelay(errorText: string): number | undefined {
		const seconds = parseFloat(durationMatch[3]);
		if (!Number.isNaN(seconds)) {
			const totalMs = ((hours * 60 + minutes) * 60 + seconds) * 1000;
			if (totalMs > 0) {
				return Math.ceil(totalMs + 1000); // Add 1s buffer
			const delay = normalizeDelay(totalMs);
			if (delay !== undefined) {
				return delay;
			}
		}
	}

@@ -193,7 +247,10 @@ function extractRetryDelay(errorText: string): number | undefined {
		const value = parseFloat(retryInMatch[1]);
		if (!Number.isNaN(value) && value > 0) {
			const ms = retryInMatch[2].toLowerCase() === "ms" ? value : value * 1000;
			return Math.ceil(ms + 1000);
			const delay = normalizeDelay(ms);
			if (delay !== undefined) {
				return delay;
			}
		}
	}

@@ -203,21 +260,45 @@ function extractRetryDelay(errorText: string): number | undefined {
		const value = parseFloat(retryDelayMatch[1]);
		if (!Number.isNaN(value) && value > 0) {
			const ms = retryDelayMatch[2].toLowerCase() === "ms" ? value : value * 1000;
			return Math.ceil(ms + 1000);
			const delay = normalizeDelay(ms);
			if (delay !== undefined) {
				return delay;
			}
		}
	}

	return undefined;
}
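The body-pattern branch above can be exercised in isolation. A standalone sketch of just the "reset after XhYmZs" parsing, including the 1-second buffer that `normalizeDelay` adds — the function name here is hypothetical, not the exported `extractRetryDelay`:

```typescript
// Parse "reset after 18h31m10s" style durations into a retry delay in ms,
// mirroring the Pattern 1 branch (plus the 1s buffer from normalizeDelay).
function parseResetAfter(errorText: string): number | undefined {
	const m = errorText.match(/reset after (?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s/i);
	if (!m) return undefined;
	const hours = m[1] ? parseInt(m[1], 10) : 0;
	const minutes = m[2] ? parseInt(m[2], 10) : 0;
	const seconds = parseFloat(m[3]);
	const totalMs = ((hours * 60 + minutes) * 60 + seconds) * 1000;
	return totalMs > 0 ? Math.ceil(totalMs + 1000) : undefined;
}

// 18h31m10s = 66,670s -> 66,670,000ms + 1s buffer
const delay = parseResetAfter("Your quota will reset after 18h31m10s");
```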

function isClaudeThinkingModel(modelId: string): boolean {
	const normalized = modelId.toLowerCase();
	return normalized.includes("claude") && normalized.includes("thinking");
}

/**
 * Check if an error is retryable (rate limit, server error, etc.)
 * Check if an error is retryable (rate limit, server error, network error, etc.)
 */
function isRetryableError(status: number, errorText: string): boolean {
	if (status === 429 || status === 500 || status === 502 || status === 503 || status === 504) {
		return true;
	}
	return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable/i.test(errorText);
	return /resource.?exhausted|rate.?limit|overloaded|service.?unavailable|other.?side.?closed/i.test(errorText);
}

/**
 * Extract a clean, user-friendly error message from Google API error response.
 * Parses JSON error responses and returns just the message field.
 */
function extractErrorMessage(errorText: string): string {
	try {
		const parsed = JSON.parse(errorText) as { error?: { message?: string } };
		if (parsed.error?.message) {
			return parsed.error.message;
		}
	} catch {
		// Not JSON, return as-is
	}
	return errorText;
}
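A quick check of how `extractErrorMessage` behaves on JSON and non-JSON bodies (standalone copy for illustration):

```typescript
// Standalone copy of extractErrorMessage: pull the human-readable message out
// of a Google API JSON error body, falling back to the raw text.
function extractErrorMessage(errorText: string): string {
	try {
		const parsed = JSON.parse(errorText) as { error?: { message?: string } };
		if (parsed.error?.message) {
			return parsed.error.message;
		}
	} catch {
		// Not JSON, return as-is
	}
	return errorText;
}

const msg = extractErrorMessage('{"error":{"message":"Quota exceeded"}}');
const raw = extractErrorMessage("plain text error");
```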

/**
@@ -242,6 +323,7 @@ interface CloudCodeAssistRequest {
	model: string;
	request: {
		contents: Content[];
		sessionId?: string;
		systemInstruction?: { role?: string; parts: { text: string }[] };
		generationConfig?: {
			maxOutputTokens?: number;

@@ -339,17 +421,26 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
		throw new Error("Missing token or projectId in Google Cloud credentials. Use /login to re-authenticate.");
	}

	const endpoint = model.baseUrl || DEFAULT_ENDPOINT;
	const url = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
	const isAntigravity = model.provider === "google-antigravity";
	const baseUrl = model.baseUrl?.trim();
	const endpoints = baseUrl ? [baseUrl] : isAntigravity ? ANTIGRAVITY_ENDPOINT_FALLBACKS : [DEFAULT_ENDPOINT];

	// Use Antigravity headers for sandbox endpoint, otherwise Gemini CLI headers
	const isAntigravity = endpoint.includes("sandbox.googleapis.com");
	const requestBody = buildRequest(model, context, projectId, options, isAntigravity);
	const headers = isAntigravity ? ANTIGRAVITY_HEADERS : GEMINI_CLI_HEADERS;

	const requestHeaders = {
		Authorization: `Bearer ${accessToken}`,
		"Content-Type": "application/json",
		Accept: "text/event-stream",
		...headers,
		...(isClaudeThinkingModel(model.id) ? { "anthropic-beta": CLAUDE_THINKING_BETA_HEADER } : {}),
	};
	const requestBodyJson = JSON.stringify(requestBody);

	// Fetch with retry logic for rate limits and transient errors
	let response: Response | undefined;
	let lastError: Error | undefined;
	let requestUrl: string | undefined;

	for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		if (options?.signal?.aborted) {

@@ -357,15 +448,12 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
		}

		try {
			response = await fetch(url, {
			const endpoint = endpoints[Math.min(attempt, endpoints.length - 1)];
			requestUrl = `${endpoint}/v1internal:streamGenerateContent?alt=sse`;
			response = await fetch(requestUrl, {
				method: "POST",
				headers: {
					Authorization: `Bearer ${accessToken}`,
					"Content-Type": "application/json",
					Accept: "text/event-stream",
					...headers,
				},
				body: JSON.stringify(requestBody),
				headers: requestHeaders,
				body: requestBodyJson,
				signal: options?.signal,
			});

@@ -378,14 +466,14 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
			// Check if retryable
			if (attempt < MAX_RETRIES && isRetryableError(response.status, errorText)) {
				// Use server-provided delay or exponential backoff
				const serverDelay = extractRetryDelay(errorText);
				const serverDelay = extractRetryDelay(errorText, response);
				const delayMs = serverDelay ?? BASE_DELAY_MS * 2 ** attempt;
				await sleep(delayMs, options?.signal);
				continue;
			}

			// Not retryable or max retries exceeded
			throw new Error(`Cloud Code Assist API error (${response.status}): ${errorText}`);
			throw new Error(`Cloud Code Assist API error (${response.status}): ${extractErrorMessage(errorText)}`);
		} catch (error) {
			// Check for abort - fetch throws AbortError, our code throws "Request was aborted"
			if (error instanceof Error) {

@@ -393,7 +481,11 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
					throw new Error("Request was aborted");
				}
			}
			// Extract detailed error message from fetch errors (Node includes cause)
			lastError = error instanceof Error ? error : new Error(String(error));
			if (lastError.message === "fetch failed" && lastError.cause instanceof Error) {
				lastError = new Error(`Network error: ${lastError.cause.message}`);
			}
			// Network errors are retryable
			if (attempt < MAX_RETRIES) {
				const delayMs = BASE_DELAY_MS * 2 ** attempt;
@@ -408,73 +500,160 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
		throw lastError ?? new Error("Failed to get response after retries");
	}

	if (!response.body) {
		throw new Error("No response body");
	}

	stream.push({ type: "start", partial: output });

	let currentBlock: TextContent | ThinkingContent | null = null;
	const blocks = output.content;
	const blockIndex = () => blocks.length - 1;

	// Read SSE stream
	const reader = response.body.getReader();
	const decoder = new TextDecoder();
	let buffer = "";

	// Set up abort handler to cancel reader when signal fires
	const abortHandler = () => {
		void reader.cancel().catch(() => {});
	let started = false;
	const ensureStarted = () => {
		if (!started) {
			stream.push({ type: "start", partial: output });
			started = true;
		}
	};
	options?.signal?.addEventListener("abort", abortHandler);

	try {
		while (true) {
			// Check abort signal before each read
			if (options?.signal?.aborted) {
				throw new Error("Request was aborted");
			}
	const resetOutput = () => {
		output.content = [];
		output.usage = {
			input: 0,
			output: 0,
			cacheRead: 0,
			cacheWrite: 0,
			totalTokens: 0,
			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
		};
		output.stopReason = "stop";
		output.errorMessage = undefined;
		output.timestamp = Date.now();
		started = false;
	};

			const { done, value } = await reader.read();
			if (done) break;
	const streamResponse = async (activeResponse: Response): Promise<boolean> => {
		if (!activeResponse.body) {
			throw new Error("No response body");
		}

			buffer += decoder.decode(value, { stream: true });
			const lines = buffer.split("\n");
			buffer = lines.pop() || "";
		let hasContent = false;
		let currentBlock: TextContent | ThinkingContent | null = null;
		const blocks = output.content;
		const blockIndex = () => blocks.length - 1;

			for (const line of lines) {
				if (!line.startsWith("data:")) continue;
		// Read SSE stream
		const reader = activeResponse.body.getReader();
		const decoder = new TextDecoder();
		let buffer = "";

				const jsonStr = line.slice(5).trim();
				if (!jsonStr) continue;
		// Set up abort handler to cancel reader when signal fires
		const abortHandler = () => {
			void reader.cancel().catch(() => {});
		};
		options?.signal?.addEventListener("abort", abortHandler);

				let chunk: CloudCodeAssistResponseChunk;
				try {
					chunk = JSON.parse(jsonStr);
				} catch {
					continue;
		try {
			while (true) {
				// Check abort signal before each read
				if (options?.signal?.aborted) {
					throw new Error("Request was aborted");
				}

				// Unwrap the response
				const responseData = chunk.response;
				if (!responseData) continue;
				const { done, value } = await reader.read();
				if (done) break;

				const candidate = responseData.candidates?.[0];
				if (candidate?.content?.parts) {
					for (const part of candidate.content.parts) {
						if (part.text !== undefined) {
							const isThinking = isThinkingPart(part);
							if (
								!currentBlock ||
								(isThinking && currentBlock.type !== "thinking") ||
								(!isThinking && currentBlock.type !== "text")
							) {
				buffer += decoder.decode(value, { stream: true });
				const lines = buffer.split("\n");
				buffer = lines.pop() || "";

				for (const line of lines) {
					if (!line.startsWith("data:")) continue;

					const jsonStr = line.slice(5).trim();
					if (!jsonStr) continue;

					let chunk: CloudCodeAssistResponseChunk;
					try {
						chunk = JSON.parse(jsonStr);
					} catch {
						continue;
					}

					// Unwrap the response
					const responseData = chunk.response;
					if (!responseData) continue;

					const candidate = responseData.candidates?.[0];
					if (candidate?.content?.parts) {
						for (const part of candidate.content.parts) {
							if (part.text !== undefined) {
								hasContent = true;
								const isThinking = isThinkingPart(part);
								if (
									!currentBlock ||
									(isThinking && currentBlock.type !== "thinking") ||
									(!isThinking && currentBlock.type !== "text")
								) {
									if (currentBlock) {
										if (currentBlock.type === "text") {
											stream.push({
												type: "text_end",
												contentIndex: blocks.length - 1,
												content: currentBlock.text,
												partial: output,
											});
										} else {
											stream.push({
												type: "thinking_end",
												contentIndex: blockIndex(),
												content: currentBlock.thinking,
												partial: output,
											});
										}
									}
									if (isThinking) {
										currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
										output.content.push(currentBlock);
										ensureStarted();
										stream.push({
											type: "thinking_start",
											contentIndex: blockIndex(),
											partial: output,
										});
									} else {
										currentBlock = { type: "text", text: "" };
										output.content.push(currentBlock);
										ensureStarted();
										stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
									}
								}
								if (currentBlock.type === "thinking") {
									currentBlock.thinking += part.text;
									currentBlock.thinkingSignature = retainThoughtSignature(
										currentBlock.thinkingSignature,
										part.thoughtSignature,
									);
									stream.push({
										type: "thinking_delta",
										contentIndex: blockIndex(),
										delta: part.text,
										partial: output,
									});
								} else {
									currentBlock.text += part.text;
									currentBlock.textSignature = retainThoughtSignature(
										currentBlock.textSignature,
										part.thoughtSignature,
									);
									stream.push({
										type: "text_delta",
										contentIndex: blockIndex(),
										delta: part.text,
										partial: output,
									});
								}
							}

							if (part.functionCall) {
								hasContent = true;
								if (currentBlock) {
									if (currentBlock.type === "text") {
										stream.push({
											type: "text_end",
											contentIndex: blocks.length - 1,
											contentIndex: blockIndex(),
											content: currentBlock.text,
											partial: output,
										});
@@ -486,143 +665,142 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
										partial: output,
									});
								}
								currentBlock = null;
							}
							if (isThinking) {
								currentBlock = { type: "thinking", thinking: "", thinkingSignature: undefined };
								output.content.push(currentBlock);
								stream.push({ type: "thinking_start", contentIndex: blockIndex(), partial: output });
							} else {
								currentBlock = { type: "text", text: "" };
								output.content.push(currentBlock);
								stream.push({ type: "text_start", contentIndex: blockIndex(), partial: output });
							}
						}
						if (currentBlock.type === "thinking") {
							currentBlock.thinking += part.text;
							currentBlock.thinkingSignature = retainThoughtSignature(
								currentBlock.thinkingSignature,
								part.thoughtSignature,
							);

								const providedId = part.functionCall.id;
								const needsNewId =
									!providedId ||
									output.content.some((b) => b.type === "toolCall" && b.id === providedId);
								const toolCallId = needsNewId
									? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
									: providedId;

								const toolCall: ToolCall = {
									type: "toolCall",
									id: toolCallId,
									name: part.functionCall.name || "",
									arguments: part.functionCall.args as Record<string, unknown>,
									...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
								};

								output.content.push(toolCall);
								ensureStarted();
								stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
								stream.push({
							type: "thinking_delta",
									type: "toolcall_delta",
									contentIndex: blockIndex(),
							delta: part.text,
									delta: JSON.stringify(toolCall.arguments),
									partial: output,
								});
						} else {
							currentBlock.text += part.text;
							currentBlock.textSignature = retainThoughtSignature(
								currentBlock.textSignature,
								part.thoughtSignature,
							);
								stream.push({
							type: "text_delta",
									type: "toolcall_end",
									contentIndex: blockIndex(),
							delta: part.text,
									toolCall,
									partial: output,
								});
							}
						}
					}

				if (part.functionCall) {
					if (currentBlock) {
						if (currentBlock.type === "text") {
							stream.push({
								type: "text_end",
								contentIndex: blockIndex(),
								content: currentBlock.text,
								partial: output,
							});
						} else {
							stream.push({
								type: "thinking_end",
								contentIndex: blockIndex(),
								content: currentBlock.thinking,
								partial: output,
							});
						}
						currentBlock = null;
					}

					const providedId = part.functionCall.id;
					const needsNewId =
						!providedId || output.content.some((b) => b.type === "toolCall" && b.id === providedId);
					const toolCallId = needsNewId
						? `${part.functionCall.name}_${Date.now()}_${++toolCallCounter}`
						: providedId;

					const toolCall: ToolCall = {
						type: "toolCall",
						id: toolCallId,
						name: part.functionCall.name || "",
						arguments: part.functionCall.args as Record<string, unknown>,
						...(part.thoughtSignature && { thoughtSignature: part.thoughtSignature }),
					};

					output.content.push(toolCall);
					stream.push({ type: "toolcall_start", contentIndex: blockIndex(), partial: output });
					stream.push({
						type: "toolcall_delta",
						contentIndex: blockIndex(),
						delta: JSON.stringify(toolCall.arguments),
						partial: output,
					});
					stream.push({ type: "toolcall_end", contentIndex: blockIndex(), toolCall, partial: output });
						if (candidate?.finishReason) {
							output.stopReason = mapStopReasonString(candidate.finishReason);
							if (output.content.some((b) => b.type === "toolCall")) {
								output.stopReason = "toolUse";
							}
						}
					}

				if (candidate?.finishReason) {
					output.stopReason = mapStopReasonString(candidate.finishReason);
					if (output.content.some((b) => b.type === "toolCall")) {
						output.stopReason = "toolUse";
					}
				}

				if (responseData.usageMetadata) {
					// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
					const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
					const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
					output.usage = {
						input: promptTokens - cacheReadTokens,
						output:
							(responseData.usageMetadata.candidatesTokenCount || 0) +
							(responseData.usageMetadata.thoughtsTokenCount || 0),
						cacheRead: cacheReadTokens,
						cacheWrite: 0,
						totalTokens: responseData.usageMetadata.totalTokenCount || 0,
						cost: {
							input: 0,
							output: 0,
							cacheRead: 0,
					if (responseData.usageMetadata) {
						// promptTokenCount includes cachedContentTokenCount, so subtract to get fresh input
						const promptTokens = responseData.usageMetadata.promptTokenCount || 0;
						const cacheReadTokens = responseData.usageMetadata.cachedContentTokenCount || 0;
						output.usage = {
							input: promptTokens - cacheReadTokens,
							output:
								(responseData.usageMetadata.candidatesTokenCount || 0) +
								(responseData.usageMetadata.thoughtsTokenCount || 0),
							cacheRead: cacheReadTokens,
							cacheWrite: 0,
							total: 0,
						},
					};
					calculateCost(model, output.usage);
							totalTokens: responseData.usageMetadata.totalTokenCount || 0,
							cost: {
								input: 0,
								output: 0,
								cacheRead: 0,
								cacheWrite: 0,
								total: 0,
							},
						};
						calculateCost(model, output.usage);
					}
				}
			}
		} finally {
			options?.signal?.removeEventListener("abort", abortHandler);
		}

		if (currentBlock) {
			if (currentBlock.type === "text") {
				stream.push({
					type: "text_end",
					contentIndex: blockIndex(),
					content: currentBlock.text,
					partial: output,
				});
			} else {
				stream.push({
					type: "thinking_end",
					contentIndex: blockIndex(),
					content: currentBlock.thinking,
					partial: output,
				});
			}
		}

		return hasContent;
	};

	let receivedContent = false;
	let currentResponse = response;

	for (let emptyAttempt = 0; emptyAttempt <= MAX_EMPTY_STREAM_RETRIES; emptyAttempt++) {
		if (options?.signal?.aborted) {
			throw new Error("Request was aborted");
		}

		if (emptyAttempt > 0) {
			const backoffMs = EMPTY_STREAM_BASE_DELAY_MS * 2 ** (emptyAttempt - 1);
			await sleep(backoffMs, options?.signal);

			if (!requestUrl) {
				throw new Error("Missing request URL");
			}

			currentResponse = await fetch(requestUrl, {
				method: "POST",
				headers: requestHeaders,
				body: requestBodyJson,
				signal: options?.signal,
			});

			if (!currentResponse.ok) {
				const retryErrorText = await currentResponse.text();
				throw new Error(`Cloud Code Assist API error (${currentResponse.status}): ${retryErrorText}`);
			}
		}

		const streamed = await streamResponse(currentResponse);
		if (streamed) {
			receivedContent = true;
			break;
		}

		if (emptyAttempt < MAX_EMPTY_STREAM_RETRIES) {
			resetOutput();
		}
	} finally {
		options?.signal?.removeEventListener("abort", abortHandler);
	}

	if (currentBlock) {
		if (currentBlock.type === "text") {
			stream.push({
				type: "text_end",
				contentIndex: blockIndex(),
				content: currentBlock.text,
				partial: output,
			});
		} else {
			stream.push({
				type: "thinking_end",
				contentIndex: blockIndex(),
				content: currentBlock.thinking,
				partial: output,
			});
		}
	if (!receivedContent) {
		throw new Error("Cloud Code Assist API returned an empty response");
	}

	if (options?.signal?.aborted) {
@@ -651,7 +829,34 @@ export const streamGoogleGeminiCli: StreamFunction<"google-gemini-cli"> = (
	return stream;
};

function buildRequest(
function deriveSessionId(context: Context): string | undefined {
	for (const message of context.messages) {
		if (message.role !== "user") {
			continue;
		}

		let text = "";
		if (typeof message.content === "string") {
			text = message.content;
		} else if (Array.isArray(message.content)) {
			text = message.content
				.filter((item): item is TextContent => item.type === "text")
				.map((item) => item.text)
				.join("\n");
		}

		if (!text || text.trim().length === 0) {
			return undefined;
		}

		const hash = createHash("sha256").update(text).digest("hex");
		return hash.slice(0, 32);
	}

	return undefined;
}
|
||||
|
||||
export function buildRequest(
|
||||
model: Model<"google-gemini-cli">,
|
||||
context: Context,
|
||||
projectId: string,
|
||||
|
|
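The session-ID derivation above hashes the first non-empty user message so that repeated requests from the same conversation map to the same ID (cache affinity). A self-contained sketch of the same hashing step, reduced to a plain string input:

```typescript
import { createHash } from "node:crypto";

// Sketch of deriveSessionId's core: SHA-256 of the first non-empty user
// message text, truncated to 32 hex characters. Whitespace-only input
// yields no session ID, matching the guard in the diff.
function sessionIdFromText(text: string): string | undefined {
	if (!text || text.trim().length === 0) return undefined;
	return createHash("sha256").update(text).digest("hex").slice(0, 32);
}
```

Because the hash is deterministic, the same first message always produces the same 32-character ID across requests.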
@@ -686,6 +891,11 @@ function buildRequest(
		contents,
	};

	const sessionId = deriveSessionId(context);
	if (sessionId) {
		request.sessionId = sessionId;
	}

	// System instruction must be object with parts, not plain string
	if (context.systemPrompt) {
		request.systemInstruction = {
@@ -365,6 +365,7 @@ function createClient(model: Model<"openai-completions">, context: Context, apiK
function buildParams(model: Model<"openai-completions">, context: Context, options?: OpenAICompletionsOptions) {
	const compat = getCompat(model);
	const messages = convertMessages(model, context, compat);
	maybeAddOpenRouterAnthropicCacheControl(model, messages);

	const params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming = {
		model: model.id,
@@ -403,13 +404,51 @@ function buildParams(model: Model<"openai-completions">, context: Context, optio
		params.tool_choice = options.toolChoice;
	}

	if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
	if (compat.thinkingFormat === "zai" && model.reasoning) {
		// Z.ai uses binary thinking: { type: "enabled" | "disabled" }
		// Must explicitly disable since z.ai defaults to thinking enabled
		(params as any).thinking = { type: options?.reasoningEffort ? "enabled" : "disabled" };
	} else if (options?.reasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
		// OpenAI-style reasoning_effort
		params.reasoning_effort = options.reasoningEffort;
	}

	return params;
}

function maybeAddOpenRouterAnthropicCacheControl(
	model: Model<"openai-completions">,
	messages: ChatCompletionMessageParam[],
): void {
	if (model.provider !== "openrouter" || !model.id.startsWith("anthropic/")) return;

	// Anthropic-style caching requires cache_control on a text part. Add a breakpoint
	// on the last user/assistant message (walking backwards until we find text content).
	for (let i = messages.length - 1; i >= 0; i--) {
		const msg = messages[i];
		if (msg.role !== "user" && msg.role !== "assistant") continue;

		const content = msg.content;
		if (typeof content === "string") {
			msg.content = [
				Object.assign({ type: "text" as const, text: content }, { cache_control: { type: "ephemeral" } }),
			];
			return;
		}

		if (!Array.isArray(content)) continue;

		// Find last text part and add cache_control
		for (let j = content.length - 1; j >= 0; j--) {
			const part = content[j];
			if (part?.type === "text") {
				Object.assign(part, { cache_control: { type: "ephemeral" } });
				return;
			}
		}
	}
}

function convertMessages(
	model: Model<"openai-completions">,
	context: Context,
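The z.ai branch above illustrates why a `thinkingFormat` compat flag is needed: the same `reasoningEffort` option has to be translated into two different request shapes. A minimal sketch of that translation, using an assumed slimmed-down params type rather than the real OpenAI SDK type:

```typescript
// Hedged sketch of the two reasoning parameter shapes handled in the diff.
// `Params` is an assumed minimal stand-in for ChatCompletionCreateParams.
type Params = {
	model: string;
	reasoning_effort?: string;
	thinking?: { type: "enabled" | "disabled" };
};

function applyThinking(params: Params, format: "openai" | "zai", effort?: string): Params {
	if (format === "zai") {
		// z.ai is binary and defaults to thinking enabled, so it must be
		// explicitly disabled when no reasoning effort was requested.
		params.thinking = { type: effort ? "enabled" : "disabled" };
	} else if (effort) {
		// OpenAI-style providers take a graded reasoning_effort string.
		params.reasoning_effort = effort;
	}
	return params;
}
```

Note that the "openai" branch simply omits the field when no effort is set, while the "zai" branch always emits `thinking`.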
@@ -644,11 +683,14 @@ function mapStopReason(reason: ChatCompletionChunk.Choice["finish_reason"]): Sto
 * Returns a fully resolved OpenAICompat object with all fields set.
 */
function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
	const isZai = baseUrl.includes("api.z.ai");

	const isNonStandard =
		baseUrl.includes("cerebras.ai") ||
		baseUrl.includes("api.x.ai") ||
		baseUrl.includes("mistral.ai") ||
		baseUrl.includes("chutes.ai");
		baseUrl.includes("chutes.ai") ||
		isZai;

	const useMaxTokens = baseUrl.includes("mistral.ai") || baseUrl.includes("chutes.ai");
@@ -659,13 +701,14 @@ function detectCompatFromUrl(baseUrl: string): Required<OpenAICompat> {
	return {
		supportsStore: !isNonStandard,
		supportsDeveloperRole: !isNonStandard,
		supportsReasoningEffort: !isGrok,
		supportsReasoningEffort: !isGrok && !isZai,
		supportsUsageInStreaming: true,
		maxTokensField: useMaxTokens ? "max_tokens" : "max_completion_tokens",
		requiresToolResultName: isMistral,
		requiresAssistantAfterToolResult: false, // Mistral no longer requires this as of Dec 2024
		requiresThinkingAsText: isMistral,
		requiresMistralToolIds: isMistral,
		thinkingFormat: isZai ? "zai" : "openai",
	};
}
@@ -688,5 +731,6 @@ function getCompat(model: Model<"openai-completions">): Required<OpenAICompat> {
			model.compat.requiresAssistantAfterToolResult ?? detected.requiresAssistantAfterToolResult,
		requiresThinkingAsText: model.compat.requiresThinkingAsText ?? detected.requiresThinkingAsText,
		requiresMistralToolIds: model.compat.requiresMistralToolIds ?? detected.requiresMistralToolIds,
		thinkingFormat: model.compat.thinkingFormat ?? detected.thinkingFormat,
	};
}
@@ -24,6 +24,7 @@ import type {
	ThinkingContent,
	Tool,
	ToolCall,
	Usage,
} from "../types.js";
import { AssistantMessageEventStream } from "../utils/event-stream.js";
import { parseStreamingJson } from "../utils/json-parse.js";

@@ -48,6 +49,7 @@ function shortHash(str: string): string {
export interface OpenAIResponsesOptions extends StreamOptions {
	reasoningEffort?: "minimal" | "low" | "medium" | "high" | "xhigh";
	reasoningSummary?: "auto" | "detailed" | "concise" | null;
	serviceTier?: ResponseCreateParamsStreaming["service_tier"];
}

/**
@@ -85,7 +87,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
		const apiKey = options?.apiKey || getEnvApiKey(model.provider) || "";
		const client = createClient(model, context, apiKey);
		const params = buildParams(model, context, options);
		const openaiStream = await client.responses.create(params, { signal: options?.signal });
		const openaiStream = await client.responses.create(params, { signal: options?.signal, timeout: undefined });
		stream.push({ type: "start", partial: output });

		let currentItem: ResponseReasoningItem | ResponseOutputMessage | ResponseFunctionToolCall | null = null;

@@ -276,6 +278,7 @@ export const streamOpenAIResponses: StreamFunction<"openai-responses"> = (
			};
		}
		calculateCost(model, output.usage);
		applyServiceTierPricing(output.usage, response?.service_tier ?? options?.serviceTier);
		// Map status to stop reason
		output.stopReason = mapStopReason(response?.status);
		if (output.content.some((b) => b.type === "toolCall") && output.stopReason === "stop") {
@@ -363,6 +366,7 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
		model: model.id,
		input: messages,
		stream: true,
		prompt_cache_key: options?.sessionId,
	};

	if (options?.maxTokens) {

@@ -373,6 +377,10 @@ function buildParams(model: Model<"openai-responses">, context: Context, options
		params.temperature = options?.temperature;
	}

	if (options?.serviceTier !== undefined) {
		params.service_tier = options.serviceTier;
	}

	if (context.tools) {
		params.tools = convertTools(context.tools);
	}
@@ -547,6 +555,28 @@ function convertTools(tools: Tool[]): OpenAITool[] {
	}));
}

function getServiceTierCostMultiplier(serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined): number {
	switch (serviceTier) {
		case "flex":
			return 0.5;
		case "priority":
			return 2;
		default:
			return 1;
	}
}

function applyServiceTierPricing(usage: Usage, serviceTier: ResponseCreateParamsStreaming["service_tier"] | undefined) {
	const multiplier = getServiceTierCostMultiplier(serviceTier);
	if (multiplier === 1) return;

	usage.cost.input *= multiplier;
	usage.cost.output *= multiplier;
	usage.cost.cacheRead *= multiplier;
	usage.cost.cacheWrite *= multiplier;
	usage.cost.total = usage.cost.input + usage.cost.output + usage.cost.cacheRead + usage.cost.cacheWrite;
}

function mapStopReason(status: OpenAI.Responses.ResponseStatus | undefined): StopReason {
	if (!status) return "stop";
	switch (status) {
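The service-tier pricing helpers above scale each cost component and then recompute the total. A self-contained sketch of the same arithmetic, with an assumed minimal cost shape in place of the library's `Usage` type:

```typescript
// Hedged sketch of applyServiceTierPricing: flex halves cost, priority
// doubles it, any other tier leaves the cost unchanged. `Cost` is an
// assumed stand-in for the usage.cost object in the diff.
type Cost = { input: number; output: number; cacheRead: number; cacheWrite: number; total: number };

function applyTier(cost: Cost, tier?: "flex" | "priority" | "default"): Cost {
	const multiplier = tier === "flex" ? 0.5 : tier === "priority" ? 2 : 1;
	if (multiplier === 1) return cost;
	cost.input *= multiplier;
	cost.output *= multiplier;
	cost.cacheRead *= multiplier;
	cost.cacheWrite *= multiplier;
	// Re-derive the total from the scaled components.
	cost.total = cost.input + cost.output + cost.cacheRead + cost.cacheWrite;
	return cost;
}
```

Recomputing the total from components (rather than multiplying it directly) keeps the invariant `total = input + output + cacheRead + cacheWrite` even if a caller passed an inconsistent total.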
@@ -1,11 +1,11 @@
import type { Api, AssistantMessage, Message, Model, ToolCall, ToolResultMessage } from "../types.js";

/**
 * Normalize tool call ID for GitHub Copilot cross-API compatibility.
 * Normalize tool call ID for cross-provider compatibility.
 * OpenAI Responses API generates IDs that are 450+ chars with special characters like `|`.
 * Other APIs (Claude, etc.) require max 40 chars and only alphanumeric + underscore + hyphen.
 * Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars).
 */
function normalizeCopilotToolCallId(id: string): string {
function normalizeToolCallId(id: string): string {
	return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, 40);
}

@@ -38,11 +38,17 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
		return msg;
	}

	// Check if we need to normalize tool call IDs (github-copilot cross-API)
	const needsToolCallIdNormalization =
	// Check if we need to normalize tool call IDs
	// Anthropic APIs require IDs matching ^[a-zA-Z0-9_-]+$ (max 64 chars)
	// OpenAI Responses API generates IDs with `|` and 450+ chars
	// GitHub Copilot routes to Anthropic for Claude models
	const targetRequiresStrictIds = model.api === "anthropic-messages" || model.provider === "github-copilot";
	const crossProviderSwitch = assistantMsg.provider !== model.provider;
	const copilotCrossApiSwitch =
		assistantMsg.provider === "github-copilot" &&
		model.provider === "github-copilot" &&
		assistantMsg.api !== model.api;
	const needsToolCallIdNormalization = targetRequiresStrictIds && (crossProviderSwitch || copilotCrossApiSwitch);

	// Transform message from different provider/model
	const transformedContent = assistantMsg.content.flatMap((block) => {

@@ -54,10 +60,10 @@ export function transformMessages<TApi extends Api>(messages: Message[], model:
				text: block.thinking,
			};
		}
		// Normalize tool call IDs for github-copilot cross-API switches
		// Normalize tool call IDs when target API requires strict format
		if (block.type === "toolCall" && needsToolCallIdNormalization) {
			const toolCall = block as ToolCall;
			const normalizedId = normalizeCopilotToolCallId(toolCall.id);
			const normalizedId = normalizeToolCallId(toolCall.id);
			if (normalizedId !== toolCall.id) {
				toolCallIdMap.set(toolCall.id, normalizedId);
				return { ...toolCall, id: normalizedId };
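The normalization above exists because OpenAI Responses tool-call IDs can be hundreds of characters long and contain `|`, while Anthropic-style APIs only accept short alphanumeric IDs. The one-liner from the diff, reproduced as a standalone function for experimentation:

```typescript
// Same body as normalizeToolCallId in the diff: strip every character
// outside [a-zA-Z0-9_-], then truncate to 40 characters so that
// Anthropic-style APIs accept the ID.
function normalizeToolCallId(id: string): string {
	return id.replace(/[^a-zA-Z0-9_-]/g, "").slice(0, 40);
}
```

Because stripping and truncating can change the ID, the surrounding code keeps a `toolCallIdMap` so matching tool-result messages can be rewritten to the same normalized ID.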
@@ -2,6 +2,7 @@ import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import { supportsXhigh } from "./models.js";
import { type BedrockOptions, streamBedrock } from "./providers/amazon-bedrock.js";
import { type AnthropicOptions, streamAnthropic } from "./providers/anthropic.js";
import { type GoogleOptions, streamGoogle } from "./providers/google.js";
import {

@@ -74,6 +75,20 @@ export function getEnvApiKey(provider: any): string | undefined {
		}
	}

	if (provider === "amazon-bedrock") {
		// Amazon Bedrock supports multiple credential sources:
		// 1. AWS_PROFILE - named profile from ~/.aws/credentials
		// 2. AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY - standard IAM keys
		// 3. AWS_BEARER_TOKEN_BEDROCK - Bedrock API keys (bearer token)
		if (
			process.env.AWS_PROFILE ||
			(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
			process.env.AWS_BEARER_TOKEN_BEDROCK
		) {
			return "<authenticated>";
		}
	}

	const envMap: Record<string, string> = {
		openai: "OPENAI_API_KEY",
		google: "GEMINI_API_KEY",

@@ -81,8 +96,10 @@ export function getEnvApiKey(provider: any): string | undefined {
		cerebras: "CEREBRAS_API_KEY",
		xai: "XAI_API_KEY",
		openrouter: "OPENROUTER_API_KEY",
		"vercel-ai-gateway": "AI_GATEWAY_API_KEY",
		zai: "ZAI_API_KEY",
		mistral: "MISTRAL_API_KEY",
		minimax: "MINIMAX_API_KEY",
		opencode: "OPENCODE_API_KEY",
	};
@@ -98,6 +115,9 @@ export function stream<TApi extends Api>(
	// Vertex AI uses Application Default Credentials, not API keys
	if (model.api === "google-vertex") {
		return streamGoogleVertex(model as Model<"google-vertex">, context, options as GoogleVertexOptions);
	} else if (model.api === "bedrock-converse-stream") {
		// Bedrock doesn't use API keys; instead it sources credentials from standard AWS env variables or a given AWS profile.
		return streamBedrock(model as Model<"bedrock-converse-stream">, context, (options || {}) as BedrockOptions);
	}

	const apiKey = options?.apiKey || getEnvApiKey(model.provider);

@@ -156,6 +176,10 @@ export function streamSimple<TApi extends Api>(
	if (model.api === "google-vertex") {
		const providerOptions = mapOptionsForApi(model, options, undefined);
		return stream(model, context, providerOptions);
	} else if (model.api === "bedrock-converse-stream") {
		// Bedrock doesn't use API keys; instead it sources credentials from standard AWS env variables or a given AWS profile.
		const providerOptions = mapOptionsForApi(model, options, undefined);
		return stream(model, context, providerOptions);
	}

	const apiKey = options?.apiKey || getEnvApiKey(model.provider);

@@ -228,6 +252,13 @@ function mapOptionsForApi<TApi extends Api>(
			} satisfies AnthropicOptions;
		}

		case "bedrock-converse-stream":
			return {
				...base,
				reasoning: options?.reasoning,
				thinkingBudgets: options?.thinkingBudgets,
			} satisfies BedrockOptions;

		case "openai-completions":
			return {
				...base,
@@ -1,3 +1,4 @@
import type { BedrockOptions } from "./providers/amazon-bedrock.js";
import type { AnthropicOptions } from "./providers/anthropic.js";
import type { GoogleOptions } from "./providers/google.js";
import type { GoogleGeminiCliOptions } from "./providers/google-gemini-cli.js";

@@ -14,12 +15,14 @@ export type Api =
	| "openai-responses"
	| "openai-codex-responses"
	| "anthropic-messages"
	| "bedrock-converse-stream"
	| "google-generative-ai"
	| "google-gemini-cli"
	| "google-vertex";

export interface ApiOptionsMap {
	"anthropic-messages": AnthropicOptions;
	"bedrock-converse-stream": BedrockOptions;
	"openai-completions": OpenAICompletionsOptions;
	"openai-responses": OpenAIResponsesOptions;
	"openai-codex-responses": OpenAICodexResponsesOptions;

@@ -40,6 +43,7 @@ const _exhaustive: _CheckExhaustive = true;
export type OptionsForApi<TApi extends Api> = ApiOptionsMap[TApi];

export type KnownProvider =
	| "amazon-bedrock"
	| "anthropic"
	| "google"
	| "google-gemini-cli"

@@ -52,8 +56,10 @@ export type KnownProvider =
	| "groq"
	| "cerebras"
	| "openrouter"
	| "vercel-ai-gateway"
	| "zai"
	| "mistral"
	| "minimax"
	| "opencode";
export type Provider = KnownProvider | string;

@@ -219,6 +225,8 @@ export interface OpenAICompat {
	requiresThinkingAsText?: boolean;
	/** Whether tool call IDs must be normalized to Mistral format (exactly 9 alphanumeric chars). Default: auto-detected from URL. */
	requiresMistralToolIds?: boolean;
	/** Format for reasoning/thinking parameter. "openai" uses reasoning_effort, "zai" uses thinking: { type: "enabled" }. Default: "openai". */
	thinkingFormat?: "openai" | "zai";
}

// Model interface for the unified model system
@@ -17,6 +17,7 @@ import type { AssistantMessage } from "../types.js";
 * - llama.cpp: "the request exceeds the available context size, try increasing it"
 * - LM Studio: "tokens to keep from the initial prompt is greater than the context length"
 * - GitHub Copilot: "prompt token count of X exceeds the limit of Y"
 * - MiniMax: "invalid params, context window exceeds limit"
 * - Cerebras: Returns "400 status code (no body)" - handled separately below
 * - Mistral: Returns "400 status code (no body)" - handled separately below
 * - z.ai: Does NOT error, accepts overflow silently - handled via usage.input > contextWindow

@@ -24,6 +25,7 @@ import type { AssistantMessage } from "../types.js";
 */
const OVERFLOW_PATTERNS = [
	/prompt is too long/i, // Anthropic
	/input is too long for requested model/i, // Amazon Bedrock
	/exceeds the context window/i, // OpenAI (Completions & Responses API)
	/input token count.*exceeds the maximum/i, // Google (Gemini)
	/maximum prompt length is \d+/i, // xAI (Grok)

@@ -32,6 +34,7 @@ const OVERFLOW_PATTERNS = [
	/exceeds the limit of \d+/i, // GitHub Copilot
	/exceeds the available context size/i, // llama.cpp server
	/greater than the context length/i, // LM Studio
	/context window exceeds limit/i, // MiniMax
	/context[_ ]length[_ ]exceeded/i, // Generic fallback
	/too many tokens/i, // Generic fallback
	/token limit exceeded/i, // Generic fallback
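The overflow patterns above are matched against provider error strings to classify a failure as context overflow. A minimal sketch of that matching step, using only a subset of the patterns listed in the diff:

```typescript
// Hedged sketch of overflow detection: test the error message against
// case-insensitive provider patterns. Only a subset of the diff's
// OVERFLOW_PATTERNS is reproduced here for illustration.
const OVERFLOW_PATTERNS = [
	/prompt is too long/i, // Anthropic
	/input is too long for requested model/i, // Amazon Bedrock
	/context window exceeds limit/i, // MiniMax
];

function looksLikeOverflow(message: string): boolean {
	return OVERFLOW_PATTERNS.some((p) => p.test(message));
}
```

Providers that return no body (Cerebras, Mistral) or that accept overflow silently (z.ai) cannot be caught this way, which is why the diff's comment notes they are handled separately.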
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete, stream } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)

@@ -66,6 +67,35 @@ async function testImmediateAbort<TApi extends Api>(llm: Model<TApi>, options: O
	expect(response.stopReason).toBe("aborted");
}

async function testAbortThenNewMessage<TApi extends Api>(llm: Model<TApi>, options: OptionsForApi<TApi> = {}) {
	// First request: abort immediately before any response content arrives
	const controller = new AbortController();
	controller.abort();

	const context: Context = {
		messages: [{ role: "user", content: "Hello, how are you?", timestamp: Date.now() }],
	};

	const abortedResponse = await complete(llm, context, { ...options, signal: controller.signal });
	expect(abortedResponse.stopReason).toBe("aborted");
	// The aborted message has empty content since we aborted before anything arrived
	expect(abortedResponse.content.length).toBe(0);

	// Add the aborted assistant message to context (this is what happens in the real coding agent)
	context.messages.push(abortedResponse);

	// Second request: send a new message - this should work even with the aborted message in context
	context.messages.push({
		role: "user",
		content: "What is 2 + 2?",
		timestamp: Date.now(),
	});

	const followUp = await complete(llm, context, options);
	expect(followUp.stopReason).toBe("stop");
	expect(followUp.content.length).toBeGreaterThan(0);
}

describe("AI Providers Abort Tests", () => {
	describe.skipIf(!process.env.GEMINI_API_KEY)("Google Provider Abort", () => {
		const llm = getModel("google", "gemini-2.5-flash");

@@ -130,6 +160,30 @@ describe("AI Providers Abort Tests", () => {
		});
	});

	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Abort", () => {
		const llm = getModel("minimax", "MiniMax-M2.1");

		it("should abort mid-stream", { retry: 3 }, async () => {
			await testAbortSignal(llm);
		});

		it("should handle immediate abort", { retry: 3 }, async () => {
			await testImmediateAbort(llm);
		});
	});

	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Abort", () => {
		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

		it("should abort mid-stream", { retry: 3 }, async () => {
			await testAbortSignal(llm);
		});

		it("should handle immediate abort", { retry: 3 }, async () => {
			await testImmediateAbort(llm);
		});
	});

	// Google Gemini CLI / Antigravity share the same provider, so one test covers both
	describe("Google Gemini CLI Provider Abort", () => {
		it.skipIf(!geminiCliToken)("should abort mid-stream", { retry: 3 }, async () => {

@@ -154,4 +208,20 @@ describe("AI Providers Abort Tests", () => {
			await testImmediateAbort(llm, { apiKey: openaiCodexToken });
		});
	});

	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Abort", () => {
		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

		it("should abort mid-stream", { retry: 3 }, async () => {
			await testAbortSignal(llm, { reasoning: "medium" });
		});

		it("should handle immediate abort", { retry: 3 }, async () => {
			await testImmediateAbort(llm);
		});

		it("should handle abort then new message", { retry: 3 }, async () => {
			await testAbortThenNewMessage(llm);
		});
	});
});
66 packages/ai/test/bedrock-models.test.ts Normal file
@@ -0,0 +1,66 @@
/**
 * A test suite to ensure all configured Amazon Bedrock models are usable.
 *
 * This is here to make sure we got correct model identifiers from models.dev and other sources.
 * Because Amazon Bedrock requires cross-region inference for some models,
 * plain model identifiers are not always usable; they must be tweaked to use cross-region inference.
 * See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html#inference-profiles-support-system for more details.
 *
 * This test suite is not enabled by default; it requires AWS credentials and the `BEDROCK_EXTENSIVE_MODEL_TEST` environment variable to be set.
 * It takes ~2 minutes to run. Because not all models are available in all regions,
 * the `us-west-2` region is recommended for the best coverage.
 *
 * You can run this test suite with:
 * ```bash
 * $ AWS_REGION=us-west-2 BEDROCK_EXTENSIVE_MODEL_TEST=1 AWS_PROFILE=... npm test -- ./test/bedrock-models.test.ts
 * ```
 */

import { describe, expect, it } from "vitest";
import { getModels } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Context } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";

describe("Amazon Bedrock Models", () => {
	const models = getModels("amazon-bedrock");

	it("should get all available Bedrock models", () => {
		expect(models.length).toBeGreaterThan(0);
		console.log(`Found ${models.length} Bedrock models`);
	});

	if (hasBedrockCredentials() && process.env.BEDROCK_EXTENSIVE_MODEL_TEST) {
		for (const model of models) {
			it(`should make a simple request with ${model.id}`, { timeout: 10_000 }, async () => {
				const context: Context = {
					systemPrompt: "You are a helpful assistant. Be extremely concise.",
					messages: [
						{
							role: "user",
							content: "Reply with exactly: 'OK'",
							timestamp: Date.now(),
						},
					],
				};

				const response = await complete(model, context);

				expect(response.role).toBe("assistant");
				expect(response.content).toBeTruthy();
				expect(response.content.length).toBeGreaterThan(0);
				expect(response.usage.input + response.usage.cacheRead).toBeGreaterThan(0);
				expect(response.usage.output).toBeGreaterThan(0);
				expect(response.errorMessage).toBeFalsy();

				const textContent = response.content
					.filter((b) => b.type === "text")
					.map((b) => (b.type === "text" ? b.text : ""))
					.join("")
					.trim();
				expect(textContent).toBeTruthy();
				console.log(`${model.id}: ${textContent.substring(0, 100)}`);
			});
		}
	}
});
18 packages/ai/test/bedrock-utils.ts Normal file
@@ -0,0 +1,18 @@
/**
 * Utility functions for Amazon Bedrock tests
 */

/**
 * Check if any valid AWS credentials are configured for Bedrock.
 * Returns true if any of the following are set:
 * - AWS_PROFILE (named profile from ~/.aws/credentials)
 * - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (IAM keys)
 * - AWS_BEARER_TOKEN_BEDROCK (Bedrock API key)
 */
export function hasBedrockCredentials(): boolean {
	return !!(
		process.env.AWS_PROFILE ||
		(process.env.AWS_ACCESS_KEY_ID && process.env.AWS_SECRET_ACCESS_KEY) ||
		process.env.AWS_BEARER_TOKEN_BEDROCK
	);
}
@@ -18,6 +18,7 @@ import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { AssistantMessage, Context, Model, Usage } from "../src/types.js";
import { isContextOverflow } from "../src/utils/overflow.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)

@@ -284,6 +285,22 @@ describe("Context overflow error handling", () => {
		);
	});

	// =============================================================================
	// Amazon Bedrock
	// Expected pattern: "Input is too long for requested model"
	// =============================================================================

	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
		it("claude-sonnet-4-5 - should detect overflow via isContextOverflow", async () => {
			const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
			const result = await testContextOverflow(model, "");
			logResult(result);

			expect(result.stopReason).toBe("error");
			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
		}, 120000);
	});

	// =============================================================================
	// xAI
	// Expected pattern: "maximum prompt length is X but the request contains Y"

@@ -379,6 +396,37 @@ describe("Context overflow error handling", () => {
		}, 120000);
	});

	// =============================================================================
	// MiniMax
	// Expected pattern: TBD - need to test actual error message
	// =============================================================================

	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
		it("MiniMax-M2.1 - should detect overflow via isContextOverflow", async () => {
			const model = getModel("minimax", "MiniMax-M2.1");
			const result = await testContextOverflow(model, process.env.MINIMAX_API_KEY!);
			logResult(result);

			expect(result.stopReason).toBe("error");
			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
		}, 120000);
	});

	// =============================================================================
	// Vercel AI Gateway - Unified API for multiple providers
	// =============================================================================

	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
		it("google/gemini-2.5-flash via AI Gateway - should detect overflow via isContextOverflow", async () => {
			const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");
			const result = await testContextOverflow(model, process.env.AI_GATEWAY_API_KEY!);
			logResult(result);

			expect(result.stopReason).toBe("error");
			expect(isContextOverflow(result.response, model.contextWindow)).toBe(true);
		}, 120000);
	});

	// =============================================================================
	// OpenRouter - Multiple backend providers
	// Expected pattern: "maximum context length is X tokens"
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, AssistantMessage, Context, Model, OptionsForApi, UserMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)

@@ -321,6 +322,66 @@ describe("AI Providers Empty Message Tests", () => {
		});
	});

	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Empty Messages", () => {
		const llm = getModel("minimax", "MiniMax-M2.1");

		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyMessage(llm);
		});

		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyStringMessage(llm);
		});

		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
			await testWhitespaceOnlyMessage(llm);
		});

		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyAssistantMessage(llm);
		});
	});

	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Empty Messages", () => {
		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyMessage(llm);
		});

		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyStringMessage(llm);
		});

		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
			await testWhitespaceOnlyMessage(llm);
		});

		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyAssistantMessage(llm);
		});
	});

	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Empty Messages", () => {
		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

		it("should handle empty content array", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyMessage(llm);
		});

		it("should handle empty string content", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyStringMessage(llm);
		});

		it("should handle whitespace-only content", { retry: 3, timeout: 30000 }, async () => {
			await testWhitespaceOnlyMessage(llm);
		});

		it("should handle empty assistant message in conversation", { retry: 3, timeout: 30000 }, async () => {
			await testEmptyAssistantMessage(llm);
		});
	});

	// =========================================================================
	// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
	// =========================================================================
@@ -0,0 +1,103 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";

const originalFetch = global.fetch;
const apiKey = JSON.stringify({ token: "token", projectId: "project" });

const createSseResponse = () => {
	const sse = `${[
		`data: ${JSON.stringify({
			response: {
				candidates: [
					{
						content: { role: "model", parts: [{ text: "Hello" }] },
						finishReason: "STOP",
					},
				],
			},
		})}`,
	].join("\n\n")}\n\n`;

	const encoder = new TextEncoder();
	const stream = new ReadableStream<Uint8Array>({
		start(controller) {
			controller.enqueue(encoder.encode(sse));
			controller.close();
		},
	});

	return new Response(stream, {
		status: 200,
		headers: { "content-type": "text/event-stream" },
	});
};

afterEach(() => {
	global.fetch = originalFetch;
	vi.restoreAllMocks();
});

describe("google-gemini-cli Claude thinking header", () => {
	const context: Context = {
		messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
	};

	it("adds anthropic-beta for Claude thinking models", async () => {
		const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
			const headers = new Headers(init?.headers);
			expect(headers.get("anthropic-beta")).toBe("interleaved-thinking-2025-05-14");
			return createSseResponse();
		});

		global.fetch = fetchMock as typeof fetch;

		const model: Model<"google-gemini-cli"> = {
			id: "claude-opus-4-5-thinking",
			name: "Claude Opus 4.5 Thinking",
			api: "google-gemini-cli",
			provider: "google-antigravity",
			baseUrl: "https://cloudcode-pa.googleapis.com",
			reasoning: true,
			input: ["text"],
			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
			contextWindow: 128000,
			maxTokens: 8192,
		};

		const stream = streamGoogleGeminiCli(model, context, { apiKey });
		for await (const _event of stream) {
			// exhaust stream
		}
		await stream.result();
	});

	it("does not add anthropic-beta for Gemini models", async () => {
		const fetchMock = vi.fn(async (_input: string | URL, init?: RequestInit) => {
			const headers = new Headers(init?.headers);
			expect(headers.has("anthropic-beta")).toBe(false);
			return createSseResponse();
		});

		global.fetch = fetchMock as typeof fetch;

		const model: Model<"google-gemini-cli"> = {
			id: "gemini-2.5-flash",
			name: "Gemini 2.5 Flash",
			api: "google-gemini-cli",
			provider: "google-gemini-cli",
			baseUrl: "https://cloudcode-pa.googleapis.com",
			reasoning: false,
			input: ["text"],
			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
			contextWindow: 128000,
			maxTokens: 8192,
		};

		const stream = streamGoogleGeminiCli(model, context, { apiKey });
		for await (const _event of stream) {
			// exhaust stream
		}
		await stream.result();
	});
});
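The header rule the two tests above pin down can be sketched as a small helper. This is a hypothetical sketch, not the provider's actual code: the real decision happens inside `streamGoogleGeminiCli`, and the id-based detection below is an assumption.

```typescript
// Hypothetical sketch: Claude "thinking" models routed through the Gemini CLI
// endpoint get the interleaved-thinking beta header; Gemini models do not.
// Detecting thinking models via the id substring is an assumption.
function claudeThinkingHeaders(modelId: string): Record<string, string> {
	if (modelId.startsWith("claude-") && modelId.includes("thinking")) {
		return { "anthropic-beta": "interleaved-thinking-2025-05-14" };
	}
	return {};
}
```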
108
packages/ai/test/google-gemini-cli-empty-stream.test.ts
Normal file
@@ -0,0 +1,108 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { streamGoogleGeminiCli } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";

const originalFetch = global.fetch;

afterEach(() => {
	global.fetch = originalFetch;
	vi.restoreAllMocks();
});

describe("google-gemini-cli empty stream retry", () => {
	it("retries empty SSE responses without duplicate start", async () => {
		const emptyStream = new ReadableStream<Uint8Array>({
			start(controller) {
				controller.close();
			},
		});

		const sse = `${[
			`data: ${JSON.stringify({
				response: {
					candidates: [
						{
							content: { role: "model", parts: [{ text: "Hello" }] },
							finishReason: "STOP",
						},
					],
					usageMetadata: {
						promptTokenCount: 1,
						candidatesTokenCount: 1,
						totalTokenCount: 2,
					},
				},
			})}`,
		].join("\n\n")}\n\n`;

		const encoder = new TextEncoder();
		const dataStream = new ReadableStream<Uint8Array>({
			start(controller) {
				controller.enqueue(encoder.encode(sse));
				controller.close();
			},
		});

		let callCount = 0;
		const fetchMock = vi.fn(async () => {
			callCount += 1;
			if (callCount === 1) {
				return new Response(emptyStream, {
					status: 200,
					headers: { "content-type": "text/event-stream" },
				});
			}
			return new Response(dataStream, {
				status: 200,
				headers: { "content-type": "text/event-stream" },
			});
		});

		global.fetch = fetchMock as typeof fetch;

		const model: Model<"google-gemini-cli"> = {
			id: "gemini-2.5-flash",
			name: "Gemini 2.5 Flash",
			api: "google-gemini-cli",
			provider: "google-gemini-cli",
			baseUrl: "https://cloudcode-pa.googleapis.com",
			reasoning: false,
			input: ["text"],
			cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
			contextWindow: 128000,
			maxTokens: 8192,
		};

		const context: Context = {
			messages: [{ role: "user", content: "Say hello", timestamp: Date.now() }],
		};

		const stream = streamGoogleGeminiCli(model, context, {
			apiKey: JSON.stringify({ token: "token", projectId: "project" }),
		});

		let startCount = 0;
		let doneCount = 0;
		let text = "";

		for await (const event of stream) {
			if (event.type === "start") {
				startCount += 1;
			}
			if (event.type === "done") {
				doneCount += 1;
			}
			if (event.type === "text_delta") {
				text += event.delta;
			}
		}

		const result = await stream.result();

		expect(text).toBe("Hello");
		expect(result.stopReason).toBe("stop");
		expect(startCount).toBe(1);
		expect(doneCount).toBe(1);
		expect(fetchMock).toHaveBeenCalledTimes(2);
	});
});
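The retry behavior exercised above can be sketched as a small wrapper: retry when a 200 SSE response yields no events, back off between attempts, and surface exactly one start/done pair to consumers regardless of attempt count. All names here (`streamWithEmptyRetry`, `fetchEvents`) are hypothetical, a sketch of the behavior rather than the provider's implementation.

```typescript
type StreamEvent = { type: "start" } | { type: "text_delta"; delta: string } | { type: "done" };

// Hypothetical sketch of the empty-SSE retry: `fetchEvents` performs one HTTP
// request and returns the parsed SSE events (possibly none). An empty result
// triggers a retry with backoff; the start/done framing is emitted exactly once.
async function streamWithEmptyRetry(
	fetchEvents: () => Promise<StreamEvent[]>,
	maxAttempts = 3,
	backoffMs = 10,
): Promise<StreamEvent[]> {
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		const events = await fetchEvents();
		if (events.length > 0) {
			return [{ type: "start" }, ...events, { type: "done" }];
		}
		if (attempt < maxAttempts) {
			await new Promise((resolve) => setTimeout(resolve, backoffMs));
		}
	}
	throw new Error("empty SSE stream after retries");
}
```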
53
packages/ai/test/google-gemini-cli-retry-delay.test.ts
Normal file
@@ -0,0 +1,53 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { extractRetryDelay } from "../src/providers/google-gemini-cli.js";

describe("extractRetryDelay header parsing", () => {
	afterEach(() => {
		vi.useRealTimers();
	});

	it("prefers Retry-After seconds header", () => {
		vi.useFakeTimers();
		vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));

		const response = new Response("", { headers: { "Retry-After": "5" } });
		const delay = extractRetryDelay("Please retry in 1s", response);

		expect(delay).toBe(6000);
	});

	it("parses Retry-After HTTP date header", () => {
		vi.useFakeTimers();
		const now = new Date("2025-01-01T00:00:00Z");
		vi.setSystemTime(now);

		const retryAt = new Date(now.getTime() + 12000).toUTCString();
		const response = new Response("", { headers: { "Retry-After": retryAt } });
		const delay = extractRetryDelay("", response);

		expect(delay).toBe(13000);
	});

	it("parses x-ratelimit-reset header", () => {
		vi.useFakeTimers();
		const now = new Date("2025-01-01T00:00:00Z");
		vi.setSystemTime(now);

		const resetAtMs = now.getTime() + 20000;
		const resetSeconds = Math.floor(resetAtMs / 1000).toString();
		const response = new Response("", { headers: { "x-ratelimit-reset": resetSeconds } });
		const delay = extractRetryDelay("", response);

		expect(delay).toBe(21000);
	});

	it("parses x-ratelimit-reset-after header", () => {
		vi.useFakeTimers();
		vi.setSystemTime(new Date("2025-01-01T00:00:00Z"));

		const response = new Response("", { headers: { "x-ratelimit-reset-after": "30" } });
		const delay = extractRetryDelay("", response);

		expect(delay).toBe(31000);
	});
});
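A parsing routine consistent with the expectations above might look like the following. This is a hedged sketch, not the exported `extractRetryDelay`: the 1 s safety buffer is inferred from the expected values (e.g. `Retry-After: 5` yields 6000 ms), and the message-text fallback pattern is an assumption.

```typescript
// Hypothetical sketch of header-based retry-delay parsing matching the test
// expectations above. Precedence: Retry-After (seconds or HTTP-date), then
// x-ratelimit-reset (epoch seconds), then x-ratelimit-reset-after (seconds),
// then a "retry in Ns" fallback parsed from the error message body.
function extractRetryDelaySketch(message: string, response?: Response): number | undefined {
	const buffer = 1000; // 1 s safety margin, inferred from the expected delays
	const retryAfter = response?.headers.get("retry-after");
	if (retryAfter) {
		const seconds = Number(retryAfter);
		if (Number.isFinite(seconds)) return seconds * 1000 + buffer;
		const dateMs = Date.parse(retryAfter); // HTTP-date form
		if (Number.isFinite(dateMs)) return dateMs - Date.now() + buffer;
	}
	const reset = response?.headers.get("x-ratelimit-reset"); // epoch seconds
	if (reset && Number.isFinite(Number(reset))) {
		return Number(reset) * 1000 - Date.now() + buffer;
	}
	const resetAfter = response?.headers.get("x-ratelimit-reset-after"); // seconds
	if (resetAfter && Number.isFinite(Number(resetAfter))) {
		return Number(resetAfter) * 1000 + buffer;
	}
	const match = message.match(/retry in (\d+(?:\.\d+)?)s/i); // body-text fallback
	return match ? Number(match[1]) * 1000 + buffer : undefined;
}
```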
50
packages/ai/test/google-gemini-cli-session-id.test.ts
Normal file
@@ -0,0 +1,50 @@
import { createHash } from "node:crypto";
import { describe, expect, it } from "vitest";
import { buildRequest } from "../src/providers/google-gemini-cli.js";
import type { Context, Model } from "../src/types.js";

const model: Model<"google-gemini-cli"> = {
	id: "gemini-2.5-flash",
	name: "Gemini 2.5 Flash",
	api: "google-gemini-cli",
	provider: "google-gemini-cli",
	baseUrl: "https://cloudcode-pa.googleapis.com",
	reasoning: false,
	input: ["text"],
	cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
	contextWindow: 128000,
	maxTokens: 8192,
};

describe("buildRequest sessionId", () => {
	it("derives sessionId from the first user message", () => {
		const context: Context = {
			messages: [
				{ role: "user", content: "First message", timestamp: Date.now() },
				{ role: "user", content: "Second message", timestamp: Date.now() },
			],
		};

		const result = buildRequest(model, context, "project-id");
		const expected = createHash("sha256").update("First message").digest("hex").slice(0, 32);

		expect(result.request.sessionId).toBe(expected);
	});

	it("omits sessionId when the first user message has no text", () => {
		const context: Context = {
			messages: [
				{
					role: "user",
					content: [{ type: "image", data: "Zm9v", mimeType: "image/png" }],
					timestamp: Date.now(),
				},
				{ role: "user", content: "Later text", timestamp: Date.now() },
			],
		};

		const result = buildRequest(model, context, "project-id");

		expect(result.request.sessionId).toBeUndefined();
	});
});
@@ -75,6 +75,7 @@ import { afterAll, beforeAll, describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, ImageContent, Model, OptionsForApi, UserMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
@@ -840,6 +841,122 @@ describe("Image Limits E2E Tests", () => {
	});
});

// -------------------------------------------------------------------------
// Vercel AI Gateway (google/gemini-2.5-flash)
// -------------------------------------------------------------------------
describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway (google/gemini-2.5-flash)", () => {
	const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

	it("should accept a small number of images (5)", async () => {
		const result = await testImageCount(model, 5, smallImage);
		expect(result.success, result.error).toBe(true);
	});

	it("should find maximum image count limit", { timeout: 600000 }, async () => {
		const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 10, 100, 10);
		console.log(`\n  Vercel AI Gateway max images: ~${limit} (last error: ${lastError})`);
		expect(limit).toBeGreaterThanOrEqual(5);
	});

	it("should find maximum image size limit", { timeout: 600000 }, async () => {
		const MB = 1024 * 1024;
		const sizes = [5, 10, 15, 20];

		let lastSuccess = 0;
		let lastError: string | undefined;

		for (const sizeMB of sizes) {
			console.log(`  Testing size: ${sizeMB}MB...`);
			const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
			const result = await testImageSize(model, imageBase64);
			if (result.success) {
				lastSuccess = sizeMB;
				console.log(`    SUCCESS`);
			} else {
				lastError = result.error;
				console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
				break;
			}
		}

		console.log(`\n  Vercel AI Gateway max image size: ~${lastSuccess}MB (last error: ${lastError})`);
		expect(lastSuccess).toBeGreaterThanOrEqual(5);
	});
});

// -------------------------------------------------------------------------
// Amazon Bedrock (claude-sonnet-4-5)
// Limits: 100 images (Anthropic), 5MB per image, 8000px max dimension
// -------------------------------------------------------------------------
describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock (claude-sonnet-4-5)", () => {
	const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

	it("should accept a small number of images (5)", async () => {
		const result = await testImageCount(model, 5, smallImage);
		expect(result.success, result.error).toBe(true);
	});

	it("should find maximum image count limit", { timeout: 600000 }, async () => {
		// Anthropic limit: 100 images
		const { limit, lastError } = await findLimit((count) => testImageCount(model, count, smallImage), 20, 120, 20);
		console.log(`\n  Bedrock max images: ~${limit} (last error: ${lastError})`);
		expect(limit).toBeGreaterThanOrEqual(80);
		expect(limit).toBeLessThanOrEqual(100);
	});

	it("should find maximum image size limit", { timeout: 600000 }, async () => {
		const MB = 1024 * 1024;
		// Anthropic limit: 5MB per image
		const sizes = [1, 2, 3, 4, 5, 6];

		let lastSuccess = 0;
		let lastError: string | undefined;

		for (const sizeMB of sizes) {
			console.log(`  Testing size: ${sizeMB}MB...`);
			const imageBase64 = generateImageWithSize(sizeMB * MB, `size-${sizeMB}mb.png`);
			const result = await testImageSize(model, imageBase64);
			if (result.success) {
				lastSuccess = sizeMB;
				console.log(`    SUCCESS`);
			} else {
				lastError = result.error;
				console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
				break;
			}
		}

		console.log(`\n  Bedrock max image size: ~${lastSuccess}MB (last error: ${lastError})`);
		expect(lastSuccess).toBeGreaterThanOrEqual(1);
	});

	it("should find maximum image dimension limit", { timeout: 600000 }, async () => {
		// Anthropic limit: 8000px
		const dimensions = [1000, 2000, 4000, 6000, 8000, 10000];

		let lastSuccess = 0;
		let lastError: string | undefined;

		for (const dim of dimensions) {
			console.log(`  Testing dimension: ${dim}x${dim}...`);
			const imageBase64 = generateImage(dim, dim, `dim-${dim}.png`);
			const result = await testImageDimensions(model, imageBase64);
			if (result.success) {
				lastSuccess = dim;
				console.log(`    SUCCESS`);
			} else {
				lastError = result.error;
				console.log(`    FAILED: ${result.error?.substring(0, 100)}`);
				break;
			}
		}

		console.log(`\n  Bedrock max dimension: ~${lastSuccess}px (last error: ${lastError})`);
		expect(lastSuccess).toBeGreaterThanOrEqual(6000);
		expect(lastSuccess).toBeLessThanOrEqual(8000);
	});
});

// =========================================================================
// MAX SIZE IMAGES TEST
// =========================================================================
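The `findLimit` helper called above is defined elsewhere in this suite; a minimal sketch of the behavior the call sites imply (probe upward from `start` toward `max` in increments of `step`, return the last passing count plus the first failure's error) might look like this. Names and signature are assumptions.

```typescript
type ProbeResult = { success: boolean; error?: string };

// Hypothetical sketch of the findLimit step-search used by the tests above:
// walk count = start, start+step, ... up to max, stopping at the first failure.
async function findLimitSketch(
	probe: (count: number) => Promise<ProbeResult>,
	start: number,
	max: number,
	step: number,
): Promise<{ limit: number; lastError?: string }> {
	let limit = 0;
	let lastError: string | undefined;
	for (let count = start; count <= max; count += step) {
		const result = await probe(count);
		if (!result.success) {
			lastError = result.error;
			break;
		}
		limit = count;
	}
	return { limit, lastError };
}
```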
@@ -898,6 +1015,38 @@ describe("Image Limits E2E Tests", () => {
		},
	);

	// Amazon Bedrock (Claude) - 5MB per image limit, same as Anthropic direct
	// Using 3MB to stay under 5MB limit
	it.skipIf(!hasBedrockCredentials())(
		"Bedrock: max ~3MB images before rejection",
		{ timeout: 900000 },
		async () => {
			const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");
			const image3mb = getImageAtSize(3);
			// Similar to Anthropic, test progressively
			const counts = [1, 2, 4, 6, 8, 10, 12];

			let lastSuccess = 0;
			let lastError: string | undefined;

			for (const count of counts) {
				console.log(`  Testing ${count} x ~3MB images...`);
				const result = await testImageCount(model, count, image3mb);
				if (result.success) {
					lastSuccess = count;
					console.log(`    SUCCESS`);
				} else {
					lastError = result.error;
					console.log(`    FAILED: ${result.error?.substring(0, 150)}`);
					break;
				}
			}

			console.log(`\n  Bedrock max ~3MB images: ${lastSuccess} (last error: ${lastError})`);
			expect(lastSuccess).toBeGreaterThanOrEqual(1);
		},
	);

	// OpenAI - 20MB per image documented, we found ≥25MB works
	// Test with 15MB images to stay safely under limit
	it.skipIf(!process.env.OPENAI_API_KEY)(
@@ -5,6 +5,7 @@ import { describe, expect, it } from "vitest";
import type { Api, Context, Model, Tool, ToolResultMessage } from "../src/index.js";
import { complete, getModel } from "../src/index.js";
import type { OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)
@@ -273,6 +274,30 @@ describe("Tool Results with Images", () => {
	});
});

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider (google/gemini-2.5-flash)", () => {
	const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

	it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
		await handleToolWithImageResult(llm);
	});

	it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
		await handleToolWithTextAndImageResult(llm);
	});
});

describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
	const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

	it("should handle tool result with only image", { retry: 3, timeout: 30000 }, async () => {
		await handleToolWithImageResult(llm);
	});

	it("should handle tool result with text and image", { retry: 3, timeout: 30000 }, async () => {
		await handleToolWithTextAndImageResult(llm);
	});
});

// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================
@@ -8,6 +8,7 @@ import { getModel } from "../src/models.js";
import { complete, stream } from "../src/stream.js";
import type { Api, Context, ImageContent, Model, OptionsForApi, Tool, ToolResultMessage } from "../src/types.js";
import { StringEnum } from "../src/utils/typebox-helpers.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

const __filename = fileURLToPath(import.meta.url);
@@ -356,7 +357,7 @@ describe("Generate E2E Tests", () => {
 		await handleStreaming(llm);
 	});

-	it("should handle ", { retry: 3 }, async () => {
+	it("should handle thinking", { retry: 3 }, async () => {
 		await handleThinking(llm, { thinking: { enabled: true, budgetTokens: 1024 } });
 	});
@@ -597,6 +598,87 @@ describe("Generate E2E Tests", () => {
	});
});

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
	"Vercel AI Gateway Provider (google/gemini-2.5-flash via Anthropic Messages)",
	() => {
		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

		it("should complete basic text generation", { retry: 3 }, async () => {
			await basicTextGeneration(llm);
		});

		it("should handle tool calling", { retry: 3 }, async () => {
			await handleToolCall(llm);
		});

		it("should handle streaming", { retry: 3 }, async () => {
			await handleStreaming(llm);
		});

		it("should handle image input", { retry: 3 }, async () => {
			await handleImage(llm);
		});

		it("should handle multi-turn with tools", { retry: 3 }, async () => {
			await multiTurn(llm);
		});
	},
);

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
	"Vercel AI Gateway Provider (anthropic/claude-opus-4.5 via Anthropic Messages)",
	() => {
		const llm = getModel("vercel-ai-gateway", "anthropic/claude-opus-4.5");

		it("should complete basic text generation", { retry: 3 }, async () => {
			await basicTextGeneration(llm);
		});

		it("should handle tool calling", { retry: 3 }, async () => {
			await handleToolCall(llm);
		});

		it("should handle streaming", { retry: 3 }, async () => {
			await handleStreaming(llm);
		});

		it("should handle image input", { retry: 3 }, async () => {
			await handleImage(llm);
		});

		it("should handle multi-turn with tools", { retry: 3 }, async () => {
			await multiTurn(llm);
		});
	},
);

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)(
	"Vercel AI Gateway Provider (openai/gpt-5.1-codex-max via Anthropic Messages)",
	() => {
		const llm = getModel("vercel-ai-gateway", "openai/gpt-5.1-codex-max");

		it("should complete basic text generation", { retry: 3 }, async () => {
			await basicTextGeneration(llm);
		});

		it("should handle tool calling", { retry: 3 }, async () => {
			await handleToolCall(llm);
		});

		it("should handle streaming", { retry: 3 }, async () => {
			await handleStreaming(llm);
		});

		it("should handle image input", { retry: 3 }, async () => {
			await handleImage(llm);
		});

		it("should handle multi-turn with tools", { retry: 3 }, async () => {
			await multiTurn(llm);
		});
	},
);

describe.skipIf(!process.env.ZAI_API_KEY)("zAI Provider (glm-4.5-air via OpenAI Completions)", () => {
	const llm = getModel("zai", "glm-4.5-air");
@@ -698,6 +780,30 @@ describe("Generate E2E Tests", () => {
	});
});

describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider (MiniMax-M2.1 via Anthropic Messages)", () => {
	const llm = getModel("minimax", "MiniMax-M2.1");

	it("should complete basic text generation", { retry: 3 }, async () => {
		await basicTextGeneration(llm);
	});

	it("should handle tool calling", { retry: 3 }, async () => {
		await handleToolCall(llm);
	});

	it("should handle streaming", { retry: 3 }, async () => {
		await handleStreaming(llm);
	});

	it("should handle thinking mode", { retry: 3 }, async () => {
		await handleThinking(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
	});

	it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
		await multiTurn(llm, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
	});
});

// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// Tokens are resolved at module level (see oauthTokens above)
@@ -907,6 +1013,34 @@ describe("Generate E2E Tests", () => {
	});
});

describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider (claude-sonnet-4-5)", () => {
	const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

	it("should complete basic text generation", { retry: 3 }, async () => {
		await basicTextGeneration(llm);
	});

	it("should handle tool calling", { retry: 3 }, async () => {
		await handleToolCall(llm);
	});

	it("should handle streaming", { retry: 3 }, async () => {
		await handleStreaming(llm);
	});

	it("should handle thinking", { retry: 3 }, async () => {
		await handleThinking(llm, { reasoning: "medium" });
	});

	it("should handle multi-turn with thinking and tools", { retry: 3 }, async () => {
		await multiTurn(llm, { reasoning: "high" });
	});

	it("should handle image input", { retry: 3 }, async () => {
		await handleImage(llm);
	});
});

// Check if ollama is installed and local LLM tests are enabled
let ollamaInstalled = false;
if (!process.env.PI_NO_LOCAL_LLM) {
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { stream } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)
@@ -44,18 +45,25 @@ async function testTokensOnAbort<TApi extends Api>(llm: Model<TApi>, options: Op
 	expect(msg.stopReason).toBe("aborted");

-	// OpenAI providers, OpenAI Codex, Gemini CLI, zai, and the GPT-OSS model on Antigravity only send usage in the final chunk,
-	// so when aborted they have no token stats Anthropic and Google send usage information early in the stream
+	// OpenAI providers, OpenAI Codex, Gemini CLI, zai, Amazon Bedrock, and the GPT-OSS model on Antigravity only send usage in the final chunk,
+	// so when aborted they have no token stats. Anthropic and Google send usage information early in the stream.
+	// MiniMax reports input tokens but not output tokens when aborted.
 	if (
 		llm.api === "openai-completions" ||
 		llm.api === "openai-responses" ||
 		llm.api === "openai-codex-responses" ||
 		llm.provider === "google-gemini-cli" ||
 		llm.provider === "zai" ||
+		llm.provider === "amazon-bedrock" ||
+		llm.provider === "vercel-ai-gateway" ||
 		(llm.provider === "google-antigravity" && llm.id.includes("gpt-oss"))
 	) {
 		expect(msg.usage.input).toBe(0);
 		expect(msg.usage.output).toBe(0);
+	} else if (llm.provider === "minimax") {
+		// MiniMax reports input tokens early but output tokens only in final chunk
+		expect(msg.usage.input).toBeGreaterThan(0);
+		expect(msg.usage.output).toBe(0);
 	} else {
 		expect(msg.usage.input).toBeGreaterThan(0);
 		expect(msg.usage.output).toBeGreaterThan(0);
@@ -144,6 +152,22 @@
	});
});

describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
	const llm = getModel("minimax", "MiniMax-M2.1");

	it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
		await testTokensOnAbort(llm);
	});
});

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
	const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

	it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
		await testTokensOnAbort(llm);
	});
});

// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================
@@ -230,4 +254,12 @@
		},
	);
});

describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
	const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

	it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
		await testTokensOnAbort(llm);
	});
});
});
@@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, Tool } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)
@@ -170,6 +171,30 @@ describe("Tool Call Without Result Tests", () => {
	});
});

describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider", () => {
	const model = getModel("minimax", "MiniMax-M2.1");

	it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
		await testToolCallWithoutResult(model);
	});
});

describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider", () => {
	const model = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

	it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
		await testToolCallWithoutResult(model);
	});
});

describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider", () => {
	const model = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

	it("should filter out tool calls without corresponding tool results", { retry: 3, timeout: 30000 }, async () => {
		await testToolCallWithoutResult(model);
	});
});

// =========================================================================
// OAuth-based providers (credentials from ~/.pi/agent/oauth.json)
// =========================================================================
@@ -16,6 +16,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, Usage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Resolve OAuth tokens at module level (async, runs before tests)
@@ -324,6 +325,52 @@ describe("totalTokens field", () => {
		);
	});

	// =========================================================================
	// MiniMax
	// =========================================================================

	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax", () => {
		it(
			"MiniMax-M2.1 - should return totalTokens equal to sum of components",
			{ retry: 3, timeout: 60000 },
			async () => {
				const llm = getModel("minimax", "MiniMax-M2.1");

				console.log(`\nMiniMax / ${llm.id}:`);
				const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.MINIMAX_API_KEY });

				logUsage("First request", first);
				logUsage("Second request", second);

				assertTotalTokensEqualsComponents(first);
				assertTotalTokensEqualsComponents(second);
			},
		);
	});

	// =========================================================================
	// Vercel AI Gateway
	// =========================================================================

	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway", () => {
		it(
			"google/gemini-2.5-flash - should return totalTokens equal to sum of components",
			{ retry: 3, timeout: 60000 },
			async () => {
				const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

				console.log(`\nVercel AI Gateway / ${llm.id}:`);
				const { first, second } = await testTotalTokensWithCache(llm, { apiKey: process.env.AI_GATEWAY_API_KEY });

				logUsage("First request", first);
				logUsage("Second request", second);

				assertTotalTokensEqualsComponents(first);
				assertTotalTokensEqualsComponents(second);
			},
		);
	});

	// =========================================================================
	// OpenRouter - Multiple backend providers
	// =========================================================================
@@ -535,6 +582,25 @@ describe("totalTokens field", () => {
		);
	});

	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock", () => {
		it(
			"claude-sonnet-4-5 - should return totalTokens equal to sum of components",
			{ retry: 3, timeout: 60000 },
			async () => {
				const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

				console.log(`\nAmazon Bedrock / ${llm.id}:`);
				const { first, second } = await testTotalTokensWithCache(llm);

				logUsage("First request", first);
				logUsage("Second request", second);

				assertTotalTokensEqualsComponents(first);
				assertTotalTokensEqualsComponents(second);
			},
		);
	});

	// =========================================================================
	// OpenAI Codex (OAuth)
	// =========================================================================

@@ -3,6 +3,7 @@ import { describe, expect, it } from "vitest";
import { getModel } from "../src/models.js";
import { complete } from "../src/stream.js";
import type { Api, Context, Model, OptionsForApi, ToolResultMessage } from "../src/types.js";
import { hasBedrockCredentials } from "./bedrock-utils.js";
import { resolveApiKey } from "./oauth.js";

// Empty schema for test tools - must be proper OBJECT type for Cloud Code Assist
@@ -617,6 +618,54 @@ describe("AI Providers Unicode Surrogate Pair Tests", () => {
		});
	});

	describe.skipIf(!process.env.MINIMAX_API_KEY)("MiniMax Provider Unicode Handling", () => {
		const llm = getModel("minimax", "MiniMax-M2.1");

		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testEmojiInToolResults(llm);
		});

		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
			await testRealWorldLinkedInData(llm);
		});

		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testUnpairedHighSurrogate(llm);
		});
	});

	describe.skipIf(!process.env.AI_GATEWAY_API_KEY)("Vercel AI Gateway Provider Unicode Handling", () => {
		const llm = getModel("vercel-ai-gateway", "google/gemini-2.5-flash");

		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testEmojiInToolResults(llm);
		});

		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
			await testRealWorldLinkedInData(llm);
		});

		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testUnpairedHighSurrogate(llm);
		});
	});

	describe.skipIf(!hasBedrockCredentials())("Amazon Bedrock Provider Unicode Handling", () => {
		const llm = getModel("amazon-bedrock", "global.anthropic.claude-sonnet-4-5-20250929-v1:0");

		it("should handle emoji in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testEmojiInToolResults(llm);
		});

		it("should handle real-world LinkedIn comment data with emoji", { retry: 3, timeout: 30000 }, async () => {
			await testRealWorldLinkedInData(llm);
		});

		it("should handle unpaired high surrogate (0xD83D) in tool results", { retry: 3, timeout: 30000 }, async () => {
			await testUnpairedHighSurrogate(llm);
		});
	});

	describe("OpenAI Codex Provider Unicode Handling", () => {
		it.skipIf(!openaiCodexToken)(
			"gpt-5.2-codex - should handle emoji in tool results",
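Every hunk above applies the same gating pattern: a provider's whole suite is skipped when its credentials are absent, so the tests degrade gracefully in environments without API keys. A minimal sketch of that check, assuming a hypothetical `hasEnvKey` helper (the real suites read `process.env` inline and use vitest's `describe.skipIf`):

```typescript
// Sketch of the env-gated suite pattern seen throughout these diffs.
// `hasEnvKey` is a hypothetical helper; the suites above pass
// `!process.env.MINIMAX_API_KEY` (etc.) directly to describe.skipIf.
export function hasEnvKey(name: string): boolean {
	const value = process.env[name];
	return typeof value === "string" && value.length > 0;
}

// In a vitest file, the whole provider block is skipped when the key is missing:
//
//   describe.skipIf(!hasEnvKey("MINIMAX_API_KEY"))("MiniMax Provider", () => {
//     const llm = getModel("minimax", "MiniMax-M2.1");
//     it("should include token stats when aborted mid-stream", { retry: 3, timeout: 30000 }, async () => {
//       await testTokensOnAbort(llm);
//     });
//   });
```

Bedrock differs only in that its credentials come from the AWS credential chain rather than a single env var, hence the `hasBedrockCredentials()` helper instead of a `process.env` check.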