letta-server/letta/llm_api
Kian Jones 630c147b13 fix: truncate oversized text in embedding requests (#9196)
fix: handle oversized text in embedding requests with recursive chunking

When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.
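
The recursive splitting described above can be sketched roughly as follows. This is a minimal illustration, not the code in `tpuf_client.py`: `ContextLengthError`, `split_text_in_half`, `embed_with_chunking`, and the `embed_fn` callable are hypothetical names, and preferring a whitespace boundary when splitting is an assumption.

```python
from typing import Callable, List, Tuple


class ContextLengthError(Exception):
    """Hypothetical stand-in for the provider's context-length error."""


def split_text_in_half(text: str) -> List[str]:
    """Split text into two non-empty parts near its midpoint."""
    mid = len(text) // 2
    # Assumption: prefer the last space at or before the midpoint
    # so that words are not cut in two.
    cut = text.rfind(" ", 0, mid + 1)
    if cut <= 0:
        cut = mid
    left, right = text[:cut].rstrip(), text[cut:].lstrip()
    if not left or not right:
        # Fall back to a hard cut at the midpoint.
        left, right = text[:mid], text[mid:]
    return [left, right]


def embed_with_chunking(
    text: str, embed_fn: Callable[[str], List[float]]
) -> List[Tuple[str, List[float]]]:
    """Embed text, halving it recursively whenever embed_fn rejects it as too long."""
    try:
        return [(text, embed_fn(text))]
    except ContextLengthError:
        if len(text) < 2:
            # A single character cannot be split further; surface the error.
            raise
        results: List[Tuple[str, List[float]]] = []
        for part in split_text_in_half(text):
            results.extend(embed_with_chunking(part, embed_fn))
        return results
```

Each returned pair is a chunk and its embedding, in original text order, so a caller could use the list position as the `chunk_index`.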

Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication
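
Because one message can now be stored as several chunk rows, a query may return the same message more than once. Deduplicating by `message_id` could look something like the sketch below. The row shape (plain dicts that are already ranked best-first) and the function name `dedupe_by_message_id` are assumptions for illustration, not the actual Turbopuffer response format.

```python
from typing import Any, Dict, List


def dedupe_by_message_id(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first (best-ranked) row per message_id, preserving order."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        message_id = row["message_id"]
        if message_id in seen:
            # A better-ranked chunk of this message was already kept.
            continue
        seen.add(message_id)
        unique.append(row)
    return unique
```

Keeping the first occurrence relies on the input already being sorted by relevance. If that does not hold, the rows would need to be sorted before deduplication.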

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00