letta-server/letta/llm_api at 68eb076135d4b7c099ef8a770dfbb0f248e2a979

Fimeg/letta-server

Kian Jones 630c147b13 fix: truncate oversized text in embedding requests (#9196)

fix: handle oversized text in embedding requests with recursive chunking

When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.
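
The recursive splitting described above can be sketched roughly as follows. This is only an illustration, not the code from `tpuf_client.py`: `ContextLengthError`, `split_in_half`, `embed_with_chunking`, and the `embed` callable are hypothetical names chosen for this example, and the real helpers may split on different boundaries or detect the provider error differently.

```python
from typing import Callable, List, Tuple


class ContextLengthError(Exception):
    """Hypothetical stand-in for a provider's context-length error."""


def split_in_half(text: str) -> Tuple[str, str]:
    """Split text near its midpoint, preferring a whitespace boundary."""
    mid = len(text) // 2
    # Look for the last space at or before the midpoint.
    cut = text.rfind(" ", 0, mid + 1)
    if cut <= 0:
        # No usable space found: fall back to a hard split at the midpoint.
        cut = mid
    return text[:cut], text[cut:]


def embed_with_chunking(
    text: str, embed: Callable[[str], List[float]]
) -> List[List[float]]:
    """Embed text, splitting it in half and retrying whenever it is too long."""
    try:
        return [embed(text)]
    except ContextLengthError:
        if len(text) < 2:
            # Cannot split any further, so surface the original error.
            raise
        left, right = split_in_half(text)
        return embed_with_chunking(left, embed) + embed_with_chunking(right, embed)
```

Each oversized input yields one embedding per chunk, in document order, which is presumably why the commit stores a `chunk_index` alongside the `message_id`.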

Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication
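
Once one message can produce several chunk rows, a query can return the same message more than once. A minimal sketch of deduplicating by `message_id`, assuming rows arrive already ordered by relevance and are plain dicts (the row shape and the function name `dedupe_by_message_id` are assumptions for illustration, not the actual `tpuf_client.py` code):

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_message_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first (best-ranked) row for each message_id, preserving order."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        message_id = row["message_id"]
        if message_id in seen:
            # A higher-ranked chunk of this message was already kept.
            continue
        seen.add(message_id)
        unique.append(row)
    return unique
```

Keeping the first occurrence means the best-matching chunk represents its message in the result list.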

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>

2026-02-24 10:52:06 -08:00

| Name | Last commit | Date |
| --- | --- | --- |
| sample_response_jsons | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| __init__.py | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| anthropic_client.py | feat: add minimax byok to ui (#9101) | 2026-01-29 12:44:04 -08:00 |
| anthropic_constants.py | feat: Add structured outputs for Anthropic (#7495) | 2026-01-12 10:57:19 -08:00 |
| azure_client.py | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| bedrock_client.py | feat: enable bedrock for anthropic models (#8847) | 2026-01-19 15:54:44 -08:00 |
| chatgpt_oauth_client.py | refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065) | 2026-01-29 12:44:04 -08:00 |
| deepseek_client.py | chore: Update deepseek client for v3.2 models (#6556) | 2025-12-15 12:02:34 -08:00 |
| error_utils.py | fix: handle new openai overflow error format (#7110) | 2025-12-17 17:31:02 -08:00 |
| google_ai_client.py | fix: add explicit timeouts to httpx clients to prevent ReadTimeout errors (#8538) | 2026-01-19 15:54:38 -08:00 |
| google_constants.py | fix: max output tokens for gemini 3 models (#7322) | 2025-12-17 17:31:03 -08:00 |
| google_vertex_client.py | refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065) | 2026-01-29 12:44:04 -08:00 |
| groq_client.py | fix: filter our reasoning for groq client [LET-7135] (#8982) | 2026-01-29 12:43:53 -08:00 |
| helpers.py | fix: remove unused sync code (#8613) | 2026-01-19 15:54:37 -08:00 |
| llm_api_tools.py | feat: openrouter byok (#9148) | 2026-01-29 12:44:04 -08:00 |
| llm_client_base.py | refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065) | 2026-01-29 12:44:04 -08:00 |
| llm_client.py | feat: openrouter byok (#9148) | 2026-01-29 12:44:04 -08:00 |
| minimax_client.py | feat: add minimax byok to ui (#9101) | 2026-01-29 12:44:04 -08:00 |
| mistral.py | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| openai_client.py | fix: truncate oversized text in embedding requests (#9196) | 2026-02-24 10:52:06 -08:00 |
| openai.py | fix: add explicit timeouts to httpx clients to prevent ReadTimeout errors (#8538) | 2026-01-19 15:54:38 -08:00 |
| together_client.py | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| xai_client.py | feat: add tool return truncation to summarization as a fallback [LET-5970] (#5859) | 2025-11-13 15:36:30 -08:00 |
| zai_client.py | feat: add zai provider support (#7626) | 2026-01-12 10:57:19 -08:00 |