Z.ai uses the same `enable_thinking: boolean` parameter as Qwen to control
reasoning, not `thinking: { type: "enabled" | "disabled" }`.
Sending the wrong parameter name means Z.ai ignores the disable request and
always runs with thinking enabled, wasting tokens and adding latency.
Merge the Z.ai and Qwen branches since they use the same format.
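For illustration, a minimal sketch of the merged branch; the helper name and
the exact request-builder shape are assumptions, only the two parameter
formats come from the description above:

```ts
// Hypothetical mapping of the reasoning toggle per provider family.
type ReasoningParams =
  | { enable_thinking: boolean }                     // Z.ai and Qwen share this shape
  | { thinking: { type: "enabled" | "disabled" } };  // other providers

function buildReasoningParams(provider: string, enabled: boolean): ReasoningParams {
  switch (provider) {
    case "zai":
    case "qwen":
      // One branch for both: sending `thinking: {...}` here would be ignored
      // and the request would always run with thinking enabled.
      return { enable_thinking: enabled };
    default:
      return { thinking: { type: enabled ? "enabled" : "disabled" } };
  }
}
```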
PR by @okuyam2y
`hasVertexAdcCredentials()` uses dynamic imports to load `node:fs`,
`node:os`, and `node:path` to avoid breaking browser/Vite builds. These
imports are fired eagerly but resolve asynchronously. If the function is
called during gateway startup before those promises resolve, `_existsSync`,
`_homedir`, and `_join` are still null — causing the function to cache
`false` permanently and never re-evaluate.
This means users with valid `GOOGLE_APPLICATION_CREDENTIALS`,
`GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` configured are silently
treated as unauthenticated for Vertex AI. Calls fall back to the AI Studio
endpoint (generativelanguage.googleapis.com), which has much stricter rate
limits, so users hit unexpected 429 errors even though their Vertex
credentials are correctly configured.
Fix: in Node.js/Bun environments, return false without caching when the
async modules aren't loaded yet, so the next call retries. Only cache false
permanently in browser environments where `fs` is genuinely unavailable.
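A minimal sketch of the retry-friendly caching, assuming module-level state
like the following; the variable names, environment detection, and ADC path
check are illustrative, not the gateway's actual code:

```ts
// Illustrative module-level state (names mirror the description, not the real source).
let _existsSync: ((p: string) => boolean) | null = null;
let _homedir: (() => string) | null = null;
let _join: ((...parts: string[]) => string) | null = null;
let _cached: boolean | null = null;

const isNodeLike =
  typeof process !== "undefined" &&
  Boolean(process.versions?.node || process.versions?.bun);

if (isNodeLike) {
  // Fired eagerly but resolved asynchronously; early callers may still see nulls.
  void import("node:fs").then((m) => { _existsSync = m.existsSync; });
  void import("node:os").then((m) => { _homedir = m.homedir; });
  void import("node:path").then((m) => { _join = m.join; });
}

export function hasVertexAdcCredentials(): boolean {
  if (_cached !== null) return _cached;
  if (!_existsSync || !_homedir || !_join) {
    if (!isNodeLike) {
      // Browser: fs will never become available, so caching false is correct.
      _cached = false;
      return false;
    }
    // Node/Bun: the imports just haven't resolved yet.
    // Return false WITHOUT caching so a later call re-evaluates.
    return false;
  }
  const adcPath = _join(_homedir(), ".config", "gcloud", "application_default_credentials.json");
  _cached = Boolean(process.env.GOOGLE_APPLICATION_CREDENTIALS) || _existsSync(adcPath);
  return _cached;
}
```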
Co-authored-by: Jeremiah Gaylord <jeremiahgaylord-web@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add Gemini 3.1 Pro Preview model to the Cloud Code Assist (google-gemini-cli)
provider for parity with the google and google-vertex providers that already
include this model.
Tested and confirmed working via the Cloud Code Assist API endpoint.
The previous lazy-loading change switched to require("koffi") inside
enableWindowsVTInput(). In ESM, require is undefined, so the call
threw and VT input mode was silently not enabled on Windows.
Use createRequire(import.meta.url) at module scope and
cjsRequire("koffi") at the call site. This keeps koffi externalized
for Bun binaries while restoring Windows VT input behavior, including
multiline paste handling.
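A sketch of the pattern, assuming koffi is only needed on Windows; the body of
the helper is elided and the comment about its purpose paraphrases the
description above:

```ts
import { createRequire } from "node:module";

// createRequire works in ESM, where `require` is otherwise undefined, and keeps
// "koffi" as a plain runtime require so the bundler leaves it external.
const cjsRequire = createRequire(import.meta.url);

export function enableWindowsVTInput(): void {
  if (process.platform !== "win32") return;
  // Loaded at the call site so non-Windows platforms never touch the native addon.
  const koffi = cjsRequire("koffi");
  // ...use koffi to load kernel32 and enable ENABLE_VIRTUAL_TERMINAL_INPUT...
}
```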
The test used 'github.com/test/extension' as the git source, but
parseGitUrl() requires a 'git:' prefix for bare hostnames. Changed
to 'git:github.com/test/extension' so the source is correctly
parsed as a git type and update() actually runs.
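For illustration, a hedged sketch of the corrected fixture; parseGitUrl's
import path and return shape are assumptions, only the 'git:' prefix rule
comes from the fix:

```ts
import { parseGitUrl } from "./git-url"; // hypothetical module path

const bare = "github.com/test/extension";   // not recognized: bare hostname, no prefix
const prefixed = `git:${bare}`;             // recognized as a git source

const parsed = parseGitUrl(prefixed);
if (parsed?.type !== "git") {
  throw new Error("expected the prefixed source to parse as a git type");
}
```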
When resolving --model zai/glm-5, the resolver now correctly interprets
'zai' as the provider and 'glm-5' as the model id, rather than matching
a vercel-ai-gateway model whose id is literally 'zai/glm-5'.
If the provider/model split fails to find a match, the resolver falls back to
raw id matching so OpenRouter-style ids like 'openai/gpt-4o:extended' still work.
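A sketch of that resolution order; the registry shape and helper name are
hypothetical stand-ins for the actual resolver:

```ts
interface ModelRef { provider: string; id: string; }

function resolveModel(raw: string, registry: ModelRef[]): ModelRef | undefined {
  // 1. Prefer the provider/model split: "zai/glm-5" -> provider "zai", id "glm-5".
  const slash = raw.indexOf("/");
  if (slash > 0) {
    const provider = raw.slice(0, slash);
    const id = raw.slice(slash + 1);
    const match = registry.find((m) => m.provider === provider && m.id === id);
    if (match) return match;
  }
  // 2. Fall back to matching the raw string as a literal model id, which keeps
  //    OpenRouter-style ids like "openai/gpt-4o:extended" working.
  return registry.find((m) => m.id === raw);
}
```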
Changed koffi import from top-level to dynamic require in
enableWindowsVTInput() and added --external koffi to bun build.
This prevents embedding all 18 platform .node files (~74MB) into
every compiled binary. For Windows builds, only the win32_x64
koffi.node is shipped alongside the binary.
Binary size reduction: darwin-arm64 142MB -> 67MB, archive 43MB -> 28MB.
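For reference, a sketch of the same externalization via Bun's JS build API;
entrypoint and output paths are placeholders, and the actual release build
uses the `bun build --compile ... --external koffi` CLI described above:

```ts
// Leave koffi as a runtime require so its per-platform .node addons are not
// embedded in the bundle; only the needed one ships next to the binary.
const result = await Bun.build({
  entrypoints: ["./src/cli.ts"],
  outdir: "./dist",
  target: "bun",
  external: ["koffi"],
});

if (!result.success) {
  for (const log of result.logs) console.error(log);
  process.exit(1);
}
```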