- Map `redacted_thinking` to `ThinkingContent` with `redacted: true` instead of
  adding a new content type. The opaque payload goes in `thinkingSignature`, and
  the thinking text is set to "[Reasoning redacted]" so it renders naturally
  everywhere. The cross-model transform drops redacted blocks.
- Skip the `interleaved-thinking-2025-05-14` beta header for Opus 4.6 / Sonnet 4.6,
  where adaptive thinking makes it deprecated/redundant.
- Do not send `temperature` when `thinkingEnabled` is true (it is incompatible
  with both adaptive and budget-based thinking).
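A minimal sketch of the mapping in the first bullet; the type and field names follow the description above but are assumptions, not the project's actual definitions:

```typescript
// Assumed shape of the unified thinking block (field names from the description).
interface ThinkingContent {
  type: "thinking";
  thinking: string;
  thinkingSignature?: string;
  redacted?: boolean;
}

// Map an Anthropic redacted_thinking block onto the existing type
// instead of introducing a new content type.
function fromRedactedThinking(block: {
  type: "redacted_thinking";
  data: string;
}): ThinkingContent {
  return {
    type: "thinking",
    thinking: "[Reasoning redacted]", // placeholder text renders naturally in any UI
    thinkingSignature: block.data,    // opaque payload, preserved for round-tripping
    redacted: true,
  };
}

// Cross-model transform: redacted blocks are opaque to other models, so drop them.
function stripRedactedThinking(content: ThinkingContent[]): ThinkingContent[] {
  return content.filter((c) => !c.redacted);
}
```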
Based on #1665 by @tctev
Z.ai uses the same `enable_thinking: boolean` parameter as Qwen to control reasoning, not `thinking: { type: "enabled" | "disabled" }`.
The wrong parameter name means Z.ai ignores the disable request and always runs with thinking enabled, wasting tokens and adding latency.
Merge the Z.ai and Qwen branches since they use the same format.
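A sketch of the merged branch; the provider labels and helper function are illustrative, only the parameter shapes come from the description:

```typescript
type Provider = "zai" | "qwen" | "thinking-object";

// Build the reasoning-control fields for the request body.
// Z.ai and Qwen share the same boolean flag, so their branches are merged.
function thinkingParams(
  provider: Provider,
  enabled: boolean,
): Record<string, unknown> {
  switch (provider) {
    case "zai":
    case "qwen":
      return { enable_thinking: enabled };
    default:
      // Providers that use the object form instead of a boolean flag.
      return { thinking: { type: enabled ? "enabled" : "disabled" } };
  }
}
```

With the boolean flag, `enable_thinking: false` is actually honored by Z.ai, instead of being silently ignored as an unknown `thinking` object.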
PR by @okuyam2y
`hasVertexAdcCredentials()` uses dynamic imports to load `node:fs`,
`node:os`, and `node:path` to avoid breaking browser/Vite builds. These
imports are fired eagerly but resolve asynchronously. If the function is
called during gateway startup before those promises resolve, `_existsSync`,
`_homedir`, and `_join` are still null — causing the function to cache
`false` permanently and never re-evaluate.
This means users with valid `GOOGLE_APPLICATION_CREDENTIALS`,
`GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` configured are silently
treated as unauthenticated for Vertex AI. Calls fall back to the AI Studio
endpoint (generativelanguage.googleapis.com) which has much stricter rate
limits, causing unexpected 429 errors even though Vertex credentials are
correctly configured.
Fix: in Node.js/Bun environments, return false without caching when the
async modules aren't loaded yet, so the next call retries. Only cache false
permanently in browser environments where `fs` is genuinely unavailable.
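The fix can be sketched as follows; the names mirror the description, but the real code lazy-loads `node:fs` via dynamic `import()` rather than the eager import used here for a self-contained example:

```typescript
import { existsSync } from "node:fs";

// Populated asynchronously in the real code; null until the dynamic import resolves.
let _existsSync: ((p: string) => boolean) | null = null;
let cached: boolean | undefined;

// Simulates the async module load completing some time after startup.
function onFsModuleLoaded(): void {
  _existsSync = existsSync;
}

function hasVertexAdcCredentials(): boolean {
  if (cached !== undefined) return cached;
  if (_existsSync === null) {
    const isBrowser =
      typeof process === "undefined" || !process.versions?.node;
    if (isBrowser) {
      cached = false; // fs is genuinely unavailable: safe to cache permanently
      return false;
    }
    return false; // modules still loading: do NOT cache, so the next call retries
  }
  const credsPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
  cached = credsPath !== undefined && _existsSync(credsPath);
  return cached;
}
```

The key change is the early `return false` without assigning `cached` when running under Node.js/Bun, so a call racing the dynamic imports degrades to one false-negative answer instead of a permanently poisoned cache.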
Co-authored-by: Jeremiah Gaylord <jeremiahgaylord-web@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add Gemini 3.1 Pro Preview model to the Cloud Code Assist (google-gemini-cli)
provider for parity with the google and google-vertex providers that already
include this model.
Tested and confirmed working via the Cloud Code Assist API endpoint.
Added to both OpenAI API and OpenAI Codex (ChatGPT OAuth) providers.
128k context window, text-only, research preview with zero cost.
Not yet functional via pi; it may become available in the next few hours or days.
Add `metadata?: Record<string, unknown>` to `StreamOptions` so providers
can extract fields they understand. The Anthropic provider extracts `user_id`
for abuse tracking and rate limiting; other providers ignore it.
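A sketch of the generic field and one provider-side extraction; the interface is trimmed to the relevant members and the extraction helper is hypothetical:

```typescript
// Only the member added by this change is shown; the real interface has more fields.
interface StreamOptions {
  model: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical Anthropic-side helper: pull out user_id if present and
// well-typed, ignore everything else in the generic bag.
function anthropicMetadata(
  opts: StreamOptions,
): { user_id: string } | undefined {
  const userId = opts.metadata?.user_id;
  return typeof userId === "string" ? { user_id: userId } : undefined;
}
```

Keeping the base interface generic means other providers need no changes: they simply never read `metadata`, and unknown keys are dropped rather than sent upstream.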
Based on #1384 by @7Sageer, reworked to use a generic type instead of
Anthropic-specific typing on the base interface.