fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305)

* fix: handle Anthropic overloaded_error in streaming interfaces

* fix: handle Unicode surrogates in OpenAI requests

Sanitize Unicode surrogate pairs before sending requests to OpenAI API.
Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.

Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed

* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests

Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.

Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.

Also added sanitization to stream_async_responses() method which was
missing it.

Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
This commit is contained in:
Kian Jones
2026-02-05 16:00:36 -08:00
committed by Caren Thomas
parent 93249b96f5
commit 6f746c5225
12 changed files with 98 additions and 36 deletions

View File

@@ -229,7 +229,6 @@ class TestSchemaValidator:
assert status == SchemaHealth.STRICT_COMPLIANT
assert reasons == []
def test_root_level_without_required_non_strict(self):
"""Test that root-level objects without 'required' field are STRICT_COMPLIANT (validator is relaxed)."""
schema = {