20 KiB
Changelog
[Unreleased]
[0.40.1] - 2026-01-09
[0.40.0] - 2026-01-08
[0.39.1] - 2026-01-08
[0.39.0] - 2026-01-08
Fixed
- Fixed Gemini CLI abort handling: detect native
AbortErrorin retry catch block, cancel SSE reader when abort signal fires (#568 by @tmustier) - Fixed Antigravity provider 429 errors by aligning request payload with CLIProxyAPI v6.6.89: inject Antigravity system instruction with
role: "user", setrequestType: "agent", and useantigravityuserAgent. Added bridge prompt to override Antigravity behavior (identity, paths, web dev guidelines) with Pi defaults. (#571 by @ben-vargas) - Fixed thinking block handling for cross-model conversations: thinking blocks are now converted to plain text (no
<thinking>tags) when switching models. Previously,<thinking>tags caused models to mimic the pattern and output literal tags. Also fixed empty thinking blocks causing API errors. (#561)
[0.38.0] - 2026-01-08
Added
thinkingBudgetsoption inSimpleStreamOptionsfor customizing token budgets per thinking level on token-based providers (#529 by @melihmucuk)
Breaking Changes
- Removed OpenAI Codex model aliases (
gpt-5,gpt-5-mini,gpt-5-nano,codex-mini-latest,gpt-5-codex,gpt-5.1-codex,gpt-5.1-chat-latest). Use canonical model IDs:gpt-5.1,gpt-5.1-codex-max,gpt-5.1-codex-mini,gpt-5.2,gpt-5.2-codex. (#536 by @ghoulr)
Fixed
- Fixed OpenAI Codex context window from 400,000 to 272,000 tokens to match Codex CLI defaults and prevent 400 errors. (#536 by @ghoulr)
- Fixed Codex SSE error events to surface message, code, and status. (#551 by @tmustier)
- Fixed context overflow detection for
context_length_exceedederror codes.
[0.37.8] - 2026-01-07
[0.37.7] - 2026-01-07
[0.37.6] - 2026-01-06
Added
- Exported OpenAI Codex utilities:
CacheMetadata,getCodexInstructions,getModelFamily,ModelFamily,buildCodexPiBridge,buildCodexSystemPrompt,CodexSystemPrompt(#510 by @mitsuhiko)
[0.37.5] - 2026-01-06
[0.37.4] - 2026-01-06
[0.37.3] - 2026-01-06
Added
sessionIdoption inStreamOptionsfor providers that support session-based caching. OpenAI Codex provider uses this to setprompt_cache_keyand routing headers.
[0.37.2] - 2026-01-05
Fixed
- Codex provider now always includes
reasoning.encrypted_contenteven when customincludeoptions are passed (#484 by @kim0)
[0.37.1] - 2026-01-05
[0.37.0] - 2026-01-05
Breaking Changes
- OpenAI Codex models no longer have per-thinking-level variants (e.g.,
gpt-5.2-codex-high). Use the base model ID and set thinking level separately. The Codex provider clamps reasoning effort to what each model supports internally. (initial implementation by @ben-vargas in #472)
Added
- Headless OAuth support for all callback-server providers (Google Gemini CLI, Antigravity, OpenAI Codex): paste redirect URL when browser callback is unreachable (#428 by @ben-vargas, #468 by @crcatala)
- Cancellable GitHub Copilot device code polling via AbortSignal
Fixed
- Codex requests now omit the
reasoningfield entirely when thinking is off, letting the backend use its default instead of forcing a value. (#472)
[0.36.0] - 2026-01-05
Added
- OpenAI Codex OAuth provider with Responses API streaming support:
openai-codex-responsesstreaming provider with SSE parsing, tool-call handling, usage/cost tracking, and PKCE OAuth flow (#451 by @kim0)
Fixed
- Vertex AI dummy value for
getEnvApiKey(): Returns"<authenticated>"when Application Default Credentials are configured (~/.config/gcloud/application_default_credentials.jsonexists) and bothGOOGLE_CLOUD_PROJECT(orGCLOUD_PROJECT) andGOOGLE_CLOUD_LOCATIONare set. This allowsstreamSimple()to work with Vertex AI without explicitapiKeyoption. The ADC credentials file existence check is cached per-process to avoid repeated filesystem access.
[0.35.0] - 2026-01-05
[0.34.2] - 2026-01-04
[0.34.1] - 2026-01-04
[0.34.0] - 2026-01-04
[0.33.0] - 2026-01-04
[0.32.3] - 2026-01-03
Fixed
- Google Vertex AI models no longer appear in available models list without explicit authentication. Previously,
getEnvApiKey()returned a dummy value forgoogle-vertex, causing models to show up even when Google Cloud ADC was not configured.
[0.32.2] - 2026-01-03
[0.32.1] - 2026-01-03
[0.32.0] - 2026-01-03
Added
- Vertex AI provider with ADC (Application Default Credentials) support. Authenticate with
gcloud auth application-default login, setGOOGLE_CLOUD_PROJECTandGOOGLE_CLOUD_LOCATION, and access Gemini models via Vertex AI. (#300 by @default-anton)
Fixed
- Gemini CLI rate limit handling: Added automatic retry with server-provided delay for 429 errors. Parses delay from error messages like "Your quota will reset after 39s" and waits accordingly. Falls back to exponential backoff for other transient errors. (#370)
[0.31.1] - 2026-01-02
[0.31.0] - 2026-01-02
Breaking Changes
- Agent API moved: All agent functionality (
agentLoop,agentLoopContinue,AgentContext,AgentEvent,AgentTool,AgentToolResult, etc.) has moved to@mariozechner/pi-agent-core. Import from that package instead of@mariozechner/pi-ai.
Added
GoogleThinkingLeveltype: Exported type that mirrors Google'sThinkingLevelenum values ("THINKING_LEVEL_UNSPECIFIED" | "MINIMAL" | "LOW" | "MEDIUM" | "HIGH"). Allows configuring Gemini thinking levels without importing from@google/genai.ANTHROPIC_OAUTH_TOKENenv var: Now checked beforeANTHROPIC_API_KEYingetEnvApiKey(), allowing OAuth tokens to take precedence.event-stream.jsexport:AssistantMessageEventStreamutility now exported from package index.
Changed
- OAuth uses Web Crypto API: PKCE generation and OAuth flows now use Web Crypto API (
crypto.subtle) instead of Node.jscryptomodule. This improves browser compatibility while still working in Node.js 20+. - Deterministic model generation:
generate-models.tsnow sorts providers and models alphabetically for consistent output across runs. (#332 by @mrexodia)
Fixed
- OpenAI completions empty content blocks: Empty text or thinking blocks in assistant messages are now filtered out before sending to the OpenAI completions API, preventing validation errors. (#344 by @default-anton)
- Thinking token duplication: Fixed thinking content duplication with chutes.ai provider. The provider was returning thinking content in both
reasoning_contentandreasoningfields, causing each chunk to be processed twice. Now only the first non-empty reasoning field is used. - zAi provider API mapping: Fixed zAi models to use
openai-completionsAPI with correct base URL (https://api.z.ai/api/coding/paas/v4) instead of incorrect Anthropic API mapping. (#344, #358 by @default-anton)
[0.28.0] - 2025-12-25
Breaking Changes
- OAuth storage removed (#296): All storage functions (
loadOAuthCredentials,saveOAuthCredentials,setOAuthStorage, etc.) removed. Callers are responsible for storing credentials. - OAuth login functions:
loginAnthropic,loginGitHubCopilot,loginGeminiCli,loginAntigravitynow returnOAuthCredentialsinstead of saving to disk. - refreshOAuthToken: Now takes
(provider, credentials)and returns newOAuthCredentialsinstead of saving. - getOAuthApiKey: Now takes
(provider, credentials)and returns{ newCredentials, apiKey }or null. - OAuthCredentials type: No longer includes
type: "oauth"discriminator. Callers add discriminator when storing. - setApiKey, resolveApiKey: Removed. Callers must manage their own API key storage/resolution.
- getApiKey: Renamed to
getEnvApiKey. Only checks environment variables for known providers.
[0.27.7] - 2025-12-24
Fixed
- Thinking tag leakage: Fixed Claude mimicking literal
</thinking>tags in responses. Unsigned thinking blocks (from aborted streams) are now converted to plain text without<thinking>tags. The TUI still displays them as thinking blocks. (#302 by @nicobailon)
[0.25.1] - 2025-12-21
Added
- xhigh thinking level support: Added
supportsXhigh()function to check if a model supports xhigh reasoning level. Also clamps xhigh to high for OpenAI models that don't support it. (#236 by @theBucky)
Fixed
-
Gemini multimodal tool results: Fixed images in tool results causing flaky/broken responses with Gemini models. For Gemini 3, images are now nested inside
functionResponse.partsper the docs. For older models (which don't support multimodal function responses), images are sent in a separate user message. -
Queued message steering: When
getQueuedMessagesis provided, the agent loop now checks for queued user messages after each tool call and skips remaining tool calls in the current assistant message when a queued message arrives (emitting error tool results). -
Double API version path in Google provider URL: Fixed Gemini API calls returning 404 after baseUrl support was added. The SDK was appending its default apiVersion to baseUrl which already included the version path. (#251 by @shellfyred)
-
Anthropic SDK retries disabled: Re-enabled SDK-level retries (default 2) for transient HTTP failures. (#252)
[0.23.5] - 2025-12-19
Added
-
Gemini 3 Flash thinking support: Extended thinking level support for Gemini 3 Flash models (MINIMAL, LOW, MEDIUM, HIGH) to match Pro models' capabilities. (#212 by @markusylisiurunen)
-
GitHub Copilot thinking models: Added thinking support for additional Copilot models (o3-mini, o1-mini, o1-preview). (#234 by @aadishv)
Fixed
-
Gemini tool result format: Fixed tool result format for Gemini 3 Flash Preview which strictly requires
{ output: value }for success and{ error: value }for errors. Previous format using{ result, isError }was rejected by newer Gemini models. Also improved type safety by removingas anycasts. (#213, #220) -
Google baseUrl configuration: Google provider now respects
baseUrlconfiguration for custom endpoints or API proxies. (#216, #221 by @theBucky) -
GitHub Copilot vision requests: Added
Copilot-Vision-Requestheader when sending images to GitHub Copilot models. (#222) -
GitHub Copilot X-Initiator header: Fixed X-Initiator logic to check last message role instead of any message in history. This ensures proper billing when users send follow-up messages. (#209)
[0.22.3] - 2025-12-16
Added
-
Image limits test suite: Added comprehensive tests for provider-specific image limitations (max images, max size, max dimensions). Discovered actual limits: Anthropic (100 images, 5MB, 8000px), OpenAI (500 images, ≥25MB), Gemini (~2500 images, ≥40MB), Mistral (8 images, ~15MB), OpenRouter (~40 images context-limited, ~15MB). (#120)
-
Tool result streaming: Added
tool_execution_updateevent and optionalonUpdatecallback toAgentTool.execute()for streaming tool output during execution. Tools can now emit partial results (e.g., bash stdout) that are forwarded to subscribers. (#44) -
X-Initiator header for GitHub Copilot: Added X-Initiator header handling for GitHub Copilot provider to ensure correct call accounting (agent calls are not deducted from quota). Sets initiator based on last message role. (#200 by @kim0)
Changed
- Normalized tool_execution_end result:
tool_execution_endevent now always containsAgentToolResult(no longerAgentToolResult | string). Errors are wrapped in the standard result format.
Fixed
- Reasoning disabled by default: When
reasoningoption is not specified, thinking is now explicitly disabled for all providers. Previously, some providers like Gemini with "dynamic thinking" would use their default (thinking ON), causing unexpected token usage. This was the original intended behavior. (#180 by @markusylisiurunen)
[0.22.2] - 2025-12-15
Added
- Interleaved thinking for Anthropic: Added
interleavedThinkingoption toAnthropicOptions. When enabled, Claude 4 models can think between tool calls and reason after receiving tool results. Enabled by default (no extra token cost, just unlocks the capability). SetinterleavedThinking: falseto disable.
[0.22.1] - 2025-12-15
Dedicated to Peter's shoulder (@steipete)
Added
- Interleaved thinking for Anthropic: Enabled interleaved thinking in the Anthropic provider, allowing Claude models to output thinking blocks interspersed with text responses.
[0.22.0] - 2025-12-15
Added
- GitHub Copilot provider: Added
github-copilotas a known provider with models sourced from models.dev. Includes Claude, GPT, Gemini, Grok, and other models available through GitHub Copilot. (#191 by @cau1k)
Fixed
-
GitHub Copilot gpt-5 models: Fixed API selection for gpt-5 models to use
openai-responsesinstead ofopenai-completions(gpt-5 models are not accessible via completions endpoint) -
GitHub Copilot cross-model context handoff: Fixed context handoff failing when switching between GitHub Copilot models using different APIs (e.g., gpt-5 to claude-sonnet-4). Tool call IDs from OpenAI Responses API were incompatible with other models. (#198)
-
Gemini 3 Pro thinking levels: Thinking level configuration now works correctly for Gemini 3 Pro models. Previously all levels mapped to -1 (minimal thinking). Now LOW/MEDIUM/HIGH properly control test-time computation. (#176 by @markusylisiurunen)
[0.18.2] - 2025-12-11
Changed
- Anthropic SDK retries disabled: Set
maxRetries: 0on Anthropic client to allow application-level retry handling. The SDK's built-in retries were interfering with coding-agent's retry logic. (#157)
[0.18.1] - 2025-12-10
Added
- Mistral provider: Added support for Mistral AI models via the OpenAI-compatible API. Includes automatic handling of Mistral-specific requirements (tool call ID format). Set
MISTRAL_API_KEYenvironment variable to use.
Fixed
-
Fixed Mistral 400 errors after aborted assistant messages by skipping empty assistant messages (no content, no tool calls) (#165)
-
Removed synthetic assistant bridge message after tool results for Mistral (no longer required as of Dec 2025) (#165)
-
Fixed bug where
ANTHROPIC_API_KEYenvironment variable was deleted globally after first OAuth token usage, causing subsequent prompts to fail (#164)
[0.17.0] - 2025-12-09
Added
agentLoopContinuefunction: Continue an agent loop from existing context without adding a new user message. Validates that the last message isuserortoolResult. Useful for retry after context overflow or resuming from manually-added tool results.
Breaking Changes
- Removed provider-level tool argument validation. Validation now happens in
agentLoopviaexecuteToolCalls, allowing models to retry on validation errors. For manual tool execution, usevalidateToolCall(tools, toolCall)orvalidateToolArguments(tool, toolCall).
Added
-
Added
validateToolCall(tools, toolCall)helper that finds the tool by name and validates arguments. -
OpenAI compatibility overrides: Added
compatfield toModelforopenai-completionsAPI, allowing explicit configuration of provider quirks (supportsStore,supportsDeveloperRole,supportsReasoningEffort,maxTokensField). Falls back to URL-based detection if not set. Useful for LiteLLM, custom proxies, and other non-standard endpoints. (#133, thanks @fink-andreas for the initial idea and PR) -
xhigh reasoning level: Added
xhightoReasoningEfforttype for OpenAI codex-max models. For non-OpenAI providers (Anthropic, Google),xhighis automatically mapped tohigh. (#143)
Changed
- Updated SDK versions: OpenAI SDK 5.21.0 → 6.10.0, Anthropic SDK 0.61.0 → 0.71.2, Google GenAI SDK 1.30.0 → 1.31.0
[0.13.0] - 2025-12-06
Breaking Changes
- Added
totalTokensfield toUsagetype: All code that constructsUsageobjects must now include thetotalTokensfield. This field represents the total tokens processed by the LLM (input + output + cache). For OpenAI and Google, this uses native API values (total_tokens,totalTokenCount). For Anthropic, it's computed asinput + output + cacheRead + cacheWrite.
[0.12.10] - 2025-12-04
Added
- Added
gpt-5.1-codex-maxmodel support
Fixed
-
OpenAI Token Counting: Fixed
usage.inputto exclude cached tokens for OpenAI providers. Previously,inputincluded cached tokens, causing double-counting when calculating total context size viainput + cacheRead. Nowinputrepresents non-cached input tokens across all providers, makinginput + output + cacheRead + cacheWritethe correct formula for total context size. -
Fixed Claude Opus 4.5 cache pricing (was 3x too expensive)
- Corrected cache_read: $1.50 → $0.50 per MTok
- Corrected cache_write: $18.75 → $6.25 per MTok
- Added manual override in
scripts/generate-models.tsuntil upstream fix is merged - Submitted PR to models.dev: https://github.com/sst/models.dev/pull/439
[0.9.4] - 2025-11-26
Initial release with multi-provider LLM support.