This commit addresses the httpx.ReadTimeout error detected in production
by adding explicit timeout configurations to several httpx client usages:
1. MCP SSE client: Pass mcp_connect_to_server_timeout (30s) to sse_client()
2. MCP StreamableHTTP client: Pass mcp_connect_to_server_timeout (30s) to streamablehttp_client()
3. OpenAI model list API: Add 30s timeout with 10s connect timeout
4. Google AI model list/details API: Add 30s timeout with 10s connect timeout
Previously, these httpx clients were created without explicit timeouts,
which could cause ReadTimeout errors when remote servers are slow to respond.
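
A minimal sketch of the pattern (the endpoint URL is illustrative; the 30s/10s values are the ones named above):

```python
import httpx

# Explicit overall timeout of 30s with a 10s connect timeout, rather than
# relying on httpx's defaults, which surface as ReadTimeout on slow servers.
TIMEOUT = httpx.Timeout(30.0, connect=10.0)

async def list_models() -> dict:
    async with httpx.AsyncClient(timeout=TIMEOUT) as client:
        response = await client.get("https://api.openai.com/v1/models")  # illustrative endpoint
        response.raise_for_status()
        return response.json()
```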
Fixes #8073

🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
When the Anthropic SDK detects that a request may exceed 10 minutes, it
raises a ValueError requiring streaming mode. This fix catches that
specific error in request_async and automatically falls back to
streaming mode, accumulating the streamed response into the same format
as a non-streaming one.
This resolves the production error:
"ValueError: Streaming is required for operations that may take
longer than 10 minutes"
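
A hedged sketch of the fallback (the client setup and request shape are assumptions; `get_final_message()` is the SDK's accumulator for streamed responses):

```python
import anthropic

client = anthropic.AsyncAnthropic()  # assumed client setup

async def create_with_fallback(request_data: dict):
    try:
        return await client.messages.create(**request_data)
    except ValueError as e:
        # The SDK raises this ValueError when it estimates the request may
        # take longer than 10 minutes; anything else propagates unchanged.
        if "Streaming is required" not in str(e):
            raise
        # Fall back to streaming and accumulate the events into a complete
        # Message, matching the non-streaming response shape.
        async with client.messages.stream(**request_data) as stream:
            return await stream.get_final_message()
```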
Fixes #8516

🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
Critical fixes:
- llm_client_base.send_llm_request() now calls await self.request_async() instead of self.request()
- Remove unused sync get_openai_embedding() that used the sync OpenAI client
- Remove deprecated compile_in_thread_async() from Memory
These were blocking the event loop during LLM requests and embeddings.
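
An illustrative before/after of the first fix (class and method names follow the commit; the bodies are assumed):

```python
class LLMClientBase:
    async def request_async(self, request_data: dict):
        raise NotImplementedError  # provided by concrete clients

    async def send_llm_request(self, request_data: dict):
        # Before: self.request(request_data) performed blocking network I/O
        # inside a coroutine, stalling the event loop for the full round trip.
        return await self.request_async(request_data)
```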
🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
Adds explicit handling for httpx network errors (ReadError, WriteError,
ConnectError) in AnthropicClient, OpenAIClient, and GoogleVertexClient.
These errors can occur during streaming when the connection is unexpectedly
closed while reading/writing data.
Maps these errors to LLMConnectionError for consistent error handling.
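
A sketch of the mapping (the exception types and `LLMConnectionError` target come from the commit; the wrapper shape is an assumption):

```python
import httpx

class LLMConnectionError(Exception):
    """Stand-in for the library's connection error type."""

async def stream_with_error_mapping(stream):
    # Surface low-level httpx failures (connection closed mid-read/write,
    # failed connect) as one consistent error type.
    try:
        async for chunk in stream:
            yield chunk
    except (httpx.ReadError, httpx.WriteError, httpx.ConnectError) as e:
        raise LLMConnectionError(f"network error during streaming: {e}") from e
```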
Fixes#8221 (and duplicate #8156)
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
The Anthropic API returns a 413 status code with error type `request_too_large`
when the request payload exceeds the maximum allowed size. This error should
be converted to `ContextWindowExceededError` so the system can handle it
appropriately (e.g., by summarizing the conversation to reduce context size).
Changes:
- Added `request_too_large` and `request exceeds the maximum size` to the
early string-based error detection in `handle_llm_error`
- Added specific handling for HTTP 413 status code in the `APIStatusError`
handler
- Added tests to verify the new error handling behavior
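
A hedged sketch of both additions (`handle_llm_error`, the match strings, and the 413 branch are from the commit; the exact control flow is assumed):

```python
import anthropic

class ContextWindowExceededError(Exception):
    """Stand-in for the library's error type."""

def handle_llm_error(e: Exception) -> Exception:
    # Early string-based detection catches the error regardless of how the
    # SDK surfaced it; the 413 branch handles a typed APIStatusError directly.
    message = str(e).lower()
    if "request_too_large" in message or "request exceeds the maximum size" in message:
        return ContextWindowExceededError(str(e))
    if isinstance(e, anthropic.APIStatusError) and e.status_code == 413:
        return ContextWindowExceededError(str(e))
    return e
```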
Fixes #8422

🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
* feat: add zai provider support
* add zai_api_key secret to deploy-core
* add to justfile
* add testing, provider integration skill
* enable zai key
* fix zai test
* clean up skill a little
* small changes
* Revert "fix test"
This reverts commit 5126815f23cefb4edad3e3bf9e7083209dcc7bf1.
* fix server and better test
* test fix, get api key for base and byok?
* set letta default endpoint
* try to fix timeout for test
* fix for letta api key
* Delete apps/core/tests/sdk_v1/conftest.py
* Update utils.py
* clean up a few issues
* fix filtering on list_llm_models
* soft delete models with provider
* add one more test
* fix ci
* add timeout
* band aid for letta embedding provider
* info instead of error logs when creating models
* first hack with test
* remove changes integration test
* Delete apps/core/tests/sdk_v1/integration/integration_test_send_message_v2.py
* add test
* remove comment
* stage and publish api
* deprecate base level response_schema
* add param to llm_config test
---------
Co-authored-by: Ari Webb <ari@letta.com>
* feat: add support for new model
* fix: just stage-api && just publish-api (anthropic model settings changed)
* fix: make kevlar have default reasoning on
* fix: bump anthropic sdk version
* fix: patch name
* pin newer version anthropic
---------
Co-authored-by: Ari Webb <ari@letta.com>
Root cause: When splitting failed embedding batches, mid=0 for single
items created empty chunks. These empty chunks were then processed,
creating hundreds of no-op tasks that consumed memory.
Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked 531 MB → 4.9 GB
- Pod crashed
Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty
This prevents the asyncio.gather() from creating thousands of empty
coroutines that exhaust memory.
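
A minimal sketch of the guarded split (function and variable names are illustrative; the `max(1, ...)` guard, empty-chunk skip, and early break mirror the fix):

```python
import asyncio

async def embed_with_split(items: list, embed_batch):
    """Recursively bisect a failing batch without producing empty chunks."""
    if not items:
        return []  # break early: nothing to do for an empty chunk
    try:
        return await embed_batch(items)
    except Exception:
        if len(items) <= 1:
            raise  # a single item cannot be split further
        mid = max(1, len(items) // 2)  # guard: a split never yields an empty slice
        halves = [half for half in (items[:mid], items[mid:]) if half]  # skip empties
        results = await asyncio.gather(
            *(embed_with_split(half, embed_batch) for half in halves)
        )
        return [embedding for half in results for embedding in half]
```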