Extracted HTTP proxy setup to shared module and imported it from both
stream.ts and oauth/index.ts. This ensures fetch() calls during OAuth
flows (token exchange, refresh, project discovery) go through the proxy.
fixes#1132
Allows users to override the Antigravity User-Agent version when Google
updates their version requirements, avoiding the need to wait for a
package release.
Fixes#1129
When a provider (e.g., Google Gemini CLI) requests a retry delay longer
than maxDelayMs (default: 60s), the request fails immediately with an
informative error instead of waiting silently for hours.
The error is then handled by agent-level auto-retry, which shows the
delay to the user and allows aborting with Escape.
- Add maxRetryDelayMs to StreamOptions (packages/ai)
- Add maxRetryDelayMs to AgentOptions (packages/agent)
- Add retry.maxDelayMs to settings (packages/coding-agent)
- Update _isRetryableError to match 'retry delay' errors
fixes#1123
- Add kimi-coding provider using Anthropic Messages API
- API endpoint: https://api.kimi.com/coding/v1
- Environment variable: KIMI_API_KEY
- Models: kimi-k2-thinking (text), k2p5 (text + image)
- Add context overflow detection pattern for Kimi errors
- Add tests for all standard test suites
- Add huggingface to KnownProvider type
- Add HF_TOKEN env var mapping
- Process huggingface models from models.dev (14 models)
- Use openai-completions API with compat settings
- Add tests for all provider test suites
- Update documentation
fixes#994
Adds support for extended cache retention via PI_CACHE_RETENTION=long:
- Anthropic: 5m -> 1h TTL
- OpenAI: in-memory -> 24h retention
Only applies to direct API calls (api.anthropic.com, api.openai.com).
Proxies and other providers are unaffected.
fixes#967
- Handle pipe-separated IDs from OpenAI Responses API in openai-completions provider
- Strip trailing underscores after truncation in openai-responses-shared (OpenAI Codex rejects them)
- Add regression tests for tool call ID normalization
fixes#1022
429 (Too Many Requests) was incorrectly classified as context overflow,
triggering compaction instead of retry with backoff. The original logic
assumed token-based rate limiting correlates with context overflow, but
these are different concepts:
- Rate limiting (429): requests/tokens per time period (throughput)
- Context overflow: single request exceeds context window (size)
Now 429 errors are handled by the existing retry logic with exponential
backoff, while 400/413 remain as potential context overflow indicators.
fixes#1038
Proxies like Portkey omit input_tokens in message_delta events (it's nullable
per the SDK). The previous code unconditionally overwrote usage fields, causing
input token counts to reset to 0.
Now only updates usage fields when they are present (not null), preserving
the correct input_tokens value captured from message_start.
Fixes#1045
Allows custom models to specify which upstream providers OpenRouter
should route requests to via the `openRouterRouting` field in model
definitions.
Supported fields:
- `only`: list of provider slugs to exclusively use
- `order`: list of provider slugs to try in order
- Add resetApiProviders() to clear and re-register built-in providers
- Add createAssistantMessageEventStream() factory for extensions
- Add streamSimple support in ProviderConfig for custom API implementations
- Call resetApiProviders() on /reload to clean up extension providers
- Add custom-provider.md documentation
- Add custom-provider.ts example with full Anthropic implementation
- Update extensions.md with streamSimple config option