mirror of
https://github.com/getcompanion-ai/co-mono.git
synced 2026-04-17 11:04:51 +00:00
Add Unicode surrogate sanitization for all providers
Fixes issue where unpaired Unicode surrogates in tool results cause JSON serialization errors in API providers, particularly Anthropic. - Add sanitizeSurrogates() utility function to remove unpaired surrogates - Apply sanitization in all provider convertMessages() functions: - User message text content (string and text blocks) - Assistant message text and thinking blocks - Tool result output - System prompts - Valid emoji (properly paired surrogates) are preserved - Add comprehensive test suite covering all 8 providers Previously only Google and Groq handled unpaired surrogates correctly. Now all providers (Anthropic, OpenAI Completions/Responses, Google, xAI, Groq, Cerebras, zAI) sanitize text before API submission.
This commit is contained in:
parent
949cd4efd8
commit
4e7a340460
6 changed files with 420 additions and 24 deletions
25
packages/ai/src/utils/sanitize-unicode.ts
Normal file
25
packages/ai/src/utils/sanitize-unicode.ts
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
/**
|
||||
* Removes unpaired Unicode surrogate characters from a string.
|
||||
*
|
||||
* Unpaired surrogates (high surrogates 0xD800-0xDBFF without matching low surrogates 0xDC00-0xDFFF,
|
||||
* or vice versa) cause JSON serialization errors in many API providers.
|
||||
*
|
||||
* Valid emoji and other characters outside the Basic Multilingual Plane use properly paired
|
||||
* surrogates and will NOT be affected by this function.
|
||||
*
|
||||
* @param text - The text to sanitize
|
||||
* @returns The sanitized text with unpaired surrogates removed
|
||||
*
|
||||
* @example
|
||||
* // Valid emoji (properly paired surrogates) are preserved
|
||||
* sanitizeSurrogates("Hello 🙈 World") // => "Hello 🙈 World"
|
||||
*
|
||||
* // Unpaired high surrogate is removed
|
||||
* const unpaired = String.fromCharCode(0xD83D); // high surrogate without low
|
||||
* sanitizeSurrogates(`Text ${unpaired} here`) // => "Text here"
|
||||
*/
|
||||
export function sanitizeSurrogates(text: string): string {
|
||||
// Replace unpaired high surrogates (0xD800-0xDBFF not followed by low surrogate)
|
||||
// Replace unpaired low surrogates (0xDC00-0xDFFF not preceded by high surrogate)
|
||||
return text.replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, "");
|
||||
}
|
||||
Loading…
Add table
Add a link
Reference in a new issue