mirror of
https://github.com/getcompanion-ai/co-mono.git
synced 2026-04-15 18:01:22 +00:00
Fixes issue where unpaired Unicode surrogates in tool results cause JSON serialization errors in API providers, particularly Anthropic.

- Add sanitizeSurrogates() utility function to remove unpaired surrogates
- Apply sanitization in all provider convertMessages() functions:
  - User message text content (string and text blocks)
  - Assistant message text and thinking blocks
  - Tool result output
  - System prompts
- Valid emoji (properly paired surrogates) are preserved
- Add comprehensive test suite covering all 8 providers

Previously only Google and Groq handled unpaired surrogates correctly. Now all providers (Anthropic, OpenAI Completions/Responses, Google, xAI, Groq, Cerebras, zAI) sanitize text before API submission.
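The convertMessages() change described above might look roughly like the following. This is only a sketch: `SketchMessage`, `TextBlock`, and `sanitizeMessage` are hypothetical names invented for illustration, and the real provider message types are not shown on this page. The regex is the one used by sanitizeSurrogates() in this file, repeated so the snippet runs on its own.

```typescript
// Same regex as sanitizeSurrogates() in this file, repeated so the sketch is self-contained.
function sanitizeSurrogates(text: string): string {
	return text.replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, "");
}

// Hypothetical, simplified message shapes (assumption: not the project's real types).
type TextBlock = { type: "text"; text: string };
type SketchMessage =
	| { role: "user"; content: string | TextBlock[] }
	| { role: "toolResult"; output: string };

// Hypothetical helper: sanitize every text field before the payload is serialized.
function sanitizeMessage(message: SketchMessage): SketchMessage {
	if (message.role === "toolResult") {
		return { ...message, output: sanitizeSurrogates(message.output) };
	}
	if (typeof message.content === "string") {
		return { ...message, content: sanitizeSurrogates(message.content) };
	}
	return {
		...message,
		content: message.content.map((block) => ({ ...block, text: sanitizeSurrogates(block.text) })),
	};
}
```

Sanitizing at the conversion boundary, rather than where tool output is produced, means every text field passes through the same filter regardless of where it came from.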
25 lines
1.1 KiB
TypeScript
/**
 * Removes unpaired Unicode surrogate characters from a string.
 *
 * Unpaired surrogates (high surrogates 0xD800-0xDBFF without matching low surrogates 0xDC00-0xDFFF,
 * or vice versa) cause JSON serialization errors in many API providers.
 *
 * Valid emoji and other characters outside the Basic Multilingual Plane use properly paired
 * surrogates and will NOT be affected by this function.
 *
 * @param text - The text to sanitize
 * @returns The sanitized text with unpaired surrogates removed
 *
 * @example
 * // Valid emoji (properly paired surrogates) are preserved
 * sanitizeSurrogates("Hello 🙈 World") // => "Hello 🙈 World"
 *
 * // Unpaired high surrogate is removed (the spaces on both sides of it remain)
 * const unpaired = String.fromCharCode(0xD83D); // high surrogate without low
 * sanitizeSurrogates(`Text ${unpaired} here`) // => "Text  here"
 */
export function sanitizeSurrogates(text: string): string {
	// Remove unpaired high surrogates (0xD800-0xDBFF not followed by a low surrogate)
	// and unpaired low surrogates (0xDC00-0xDFFF not preceded by a high surrogate)
	return text.replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, "");
}
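A few edge cases make the pairing rules in the regex concrete. The snippet below is a minimal sketch, not part of the project's test suite. It repeats the function so it runs standalone, and the `cases` and `results` names are illustrative only.

```typescript
// Copy of the function above so this snippet runs on its own.
function sanitizeSurrogates(text: string): string {
	return text.replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, "");
}

const high = String.fromCharCode(0xd83d); // lone high surrogate
const low = String.fromCharCode(0xde48); // lone low surrogate; high + low together encode U+1F648 (🙈)

// Each entry is [input, expected output].
const cases: [string, string][] = [
	["Hello 🙈 World", "Hello 🙈 World"], // a proper pair is kept
	[`Text ${high} here`, "Text  here"], // lone high removed; both spaces remain
	[`${low}start`, "start"], // lone low at the start of the string
	[`${low}${high}`, ""], // low-then-high is not a valid pair, so both are removed
	[`${high}${high}${low}`, "🙈"], // first high is unpaired; the remaining two form a pair
];

// true for every case where the sanitized output matches the expectation
const results = cases.map(([input, expected]) => sanitizeSurrogates(input) === expected);
console.log(results.every(Boolean));
```

The regex has no `u` flag, so it matches individual UTF-16 code units. That is what allows the character classes to target one half of a surrogate pair at a time.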