Commit Graph

327 Commits

Author SHA1 Message Date
Sarah Wooders
97cdfb4225 Revert "feat: add strict tool calling setting [LET-6902]" (#8720)
Revert "feat: add strict tool calling setting [LET-6902] (#8577)"

This reverts commit 697c9d0dee6af73ec4d5d98780e2ca7632a69173.
2026-01-19 15:54:39 -08:00
github-actions[bot]
5fbf8f93e2 fix: add explicit timeouts to httpx clients to prevent ReadTimeout errors (#8538)
This commit addresses the httpx.ReadTimeout error detected in production
by adding explicit timeout configurations to several httpx client usages:

1. MCP SSE client: Pass mcp_connect_to_server_timeout (30s) to sse_client()
2. MCP StreamableHTTP client: Pass mcp_connect_to_server_timeout (30s) to streamablehttp_client()
3. OpenAI model list API: Add 30s timeout with 10s connect timeout
4. Google AI model list/details API: Add 30s timeout with 10s connect timeout

Previously, these httpx clients were created without explicit timeouts,
which could cause ReadTimeout errors when remote servers are slow to respond.

Fixes #8073

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-19 15:54:38 -08:00
github-actions[bot]
85c40c8154 fix: add streaming fallback for long-running Anthropic requests (#8564)
When the Anthropic SDK detects a request may exceed 10 minutes, it
raises a ValueError requiring streaming mode. This fix catches that
specific error in request_async and automatically falls back to
streaming mode, accumulating the response into the same format as
non-streaming.

This resolves the production error:
"ValueError: Streaming is required for operations that may take
longer than 10 minutes"

Fixes #8516

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-19 15:54:38 -08:00
Sarah Wooders
bdede5f90c feat: add strict tool calling setting [LET-6902] (#8577) 2026-01-19 15:54:38 -08:00
cthomas
9b5067bed9 fix: remove unused sync code (#8613)
* chore: remove unused sync code

* chore: remove deprecated sync Google AI functions

Removes unused sync functions that used httpx.Client (blocking):
- google_ai_get_model_details()
- google_ai_get_model_context_window()
- GoogleGeminiProvider.get_model_context_window()

All code now uses async versions with httpx.AsyncClient.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:37 -08:00
cthomas
57cb2d7566 fix: async functions must call async methods (#8612)
Critical fixes:
- llm_client_base.send_llm_request() now calls await self.request_async() instead of self.request()
- Remove unused sync get_openai_embedding() that used sync OpenAI client
- Remove deprecated compile_in_thread_async() from Memory

These were blocking the event loop during LLM requests and embeddings.

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:37 -08:00
Sarah Wooders
0cbdf452fa fix: temporarily disable structured outputs for anthropic (#8491) 2026-01-12 10:57:49 -08:00
cthomas
938bb78afe fix: handle anthropic incorrect tool id bug (#8447) 2026-01-12 10:57:49 -08:00
github-actions[bot]
f2171447a8 fix: handle httpx.ReadError, WriteError, and ConnectError in LLM streaming clients (#8243)
Adds explicit handling for httpx network errors (ReadError, WriteError,
ConnectError) in AnthropicClient, OpenAIClient, and GoogleVertexClient.
These errors can occur during streaming when the connection is unexpectedly
closed while reading/writing data.

Maps these errors to LLMConnectionError for consistent error handling.

Fixes #8221 (and duplicate #8156)

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-12 10:57:49 -08:00
github-actions[bot]
05ec02e384 fix: handle Anthropic 413 request_too_large as ContextWindowExceededError (#8424)
The Anthropic API returns a 413 status code with error type `request_too_large`
when the request payload exceeds the maximum allowed size. This error should
be converted to `ContextWindowExceededError` so the system can handle it
appropriately (e.g., by summarizing the conversation to reduce context size).

Changes:
- Added `request_too_large` and `request exceeds the maximum size` to the
  early string-based error detection in `handle_llm_error`
- Added specific handling for HTTP 413 status code in the `APIStatusError`
  handler
- Added tests to verify the new error handling behavior

Fixes: #8422

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-12 10:57:48 -08:00
Ari Webb
02f3e3f3b9 fix: fix providers and models persistence (#8302) 2026-01-12 10:57:48 -08:00
Ari Webb
cc825b4f5c Revert "Revert "feat: enable provider models persistence" (#6590)" (#6595) 2026-01-12 10:57:48 -08:00
github-actions[bot]
76008c61f4 fix: handle httpx.RemoteProtocolError during LLM streaming (#8206) 2026-01-12 10:57:48 -08:00
Sarah Wooders
256dfb31cf fix: attempt to fix /context again (#7946) 2026-01-12 10:57:20 -08:00
Sarah Wooders
f512d13bc9 feat: test token counting (#7943) 2026-01-12 10:57:20 -08:00
Kevin Lin
a1dfedfb0b feat: Add structured outputs for Anthropic (#7495) 2026-01-12 10:57:19 -08:00
jnjpng
a1fb295e69 feat: enable claude pro max plan for letta code (#7663)
* base

* more

* rev
2026-01-12 10:57:19 -08:00
Ari Webb
cd45212acb feat: add zai provider support (#7626)
* feat: add zai provider support

* add zai_api_key secret to deploy-core

* add to justfile

* add testing, provider integration skill

* enable zai key

* fix zai test

* clean up skill a little

* small changes
2026-01-12 10:57:19 -08:00
Kevin Lin
33afb930fc fix: Google AI client logging as Vertex (#7337)
fix
2025-12-17 17:32:27 -08:00
Ari Webb
e47eb23522 fix: max output tokens for gemini 3 models (#7322) 2025-12-17 17:31:03 -08:00
Sarah Wooders
8729a037b9 fix: handle new openai overflow error format (#7110) 2025-12-17 17:31:02 -08:00
Charles Packer
33d39f4643 fix(core): patch usage data tracking for anthropic when context caching is on (#6997) 2025-12-15 12:03:09 -08:00
Kevin Lin
4b9485a484 feat: Add max tokens exceeded to stop reasons [LET-6480] (#6576) 2025-12-15 12:03:09 -08:00
Ari Webb
4d90f37f50 feat: add gpt-5.2 support (#6698) 2025-12-15 12:02:34 -08:00
Sarah Wooders
c8fa77a01f feat: cleanup cancellation code and add more logging (#6588) 2025-12-15 12:02:34 -08:00
Sarah Wooders
8440e319e2 Revert "feat: enable provider models persistence" (#6590)
Revert "feat: enable provider models persistence (#6193)"

This reverts commit 9682aff32640a6ee8cf71a6f18c9fa7cda25c40e.
2025-12-15 12:02:34 -08:00
Ari Webb
848a73125c feat: enable provider models persistence (#6193)
* Revert "fix test"

This reverts commit 5126815f23cefb4edad3e3bf9e7083209dcc7bf1.

* fix server and better test

* test fix, get api key for base and byok?

* set letta default endpoint

* try to fix timeout for test

* fix for letta api key

* Delete apps/core/tests/sdk_v1/conftest.py

* Update utils.py

* clean up a few issues

* fix filterning on list_llm_models

* soft delete models with provider

* add one more test

* fix ci

* add timeout

* band aid for letta embedding provider

* info instead of error logs when creating models
2025-12-15 12:02:34 -08:00
Devansh Jain
d1536df6f6 chore: Update deepseek client for v3.2 models (#6556)
* support for v3.2 models

* streaming + context window fix

* fix for no assitant text from deepseek
2025-12-15 12:02:34 -08:00
Kian Jones
d6292b6eb6 fix: bug which causes unrecoverable state if previous message was an image (#6486)
* trying tout gpt-5.1-codex

* add unit test for message content

* try to support multimodal
2025-12-15 12:02:33 -08:00
Sarah Wooders
91e3dd8b3e feat: fix new summarizer code and add more tests (#6461) 2025-12-15 12:02:19 -08:00
Charles Packer
e67c98eedb feat: add tests for prompt caching + fix anthropic prompt caching [LET-6373] (#6454)
* feat: add tests for prompt caching

* fix: add cache control breakpoints for anthropic + fix tests

* fix: silence logging

* fix: patch token counting error

* fix: same patch on non-streaming path
2025-12-15 12:02:19 -08:00
Charles Packer
88a3743cc8 fix(core): distinguish between null and 0 for prompt caching (#6451)
* fix(core): distinguish between null and 0 for prompt caching

* fix: runtime errors

* fix: just publish just sgate
2025-12-15 12:02:19 -08:00
Charles Packer
131891e05f feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449)
* feat: init refactor

* feat: add helper code

* fix: missing file + test

* fix: just state/publish api
2025-12-15 12:02:19 -08:00
Sarah Wooders
807c5c18d9 feat: add gemini token counting [LET-6371] (#6444) 2025-12-15 12:02:19 -08:00
jnjpng
c6df306ccf fix: upgrade google-genai sdk version and fix gemini 3 streaming (#6437)
* base

* base

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:18 -08:00
Ari Webb
d4e7428c98 feat: structured outputs for anthropic [LET-6232] (#6410)
feat: structured outputs for anthropic

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-26 14:39:40 -08:00
Ari Webb
89c7ab5f14 feat: structured outputs for openai [LET-6233] (#6363)
* first hack with test

* remove changes integration test

* Delete apps/core/tests/sdk_v1/integration/integration_test_send_message_v2.py

* add test

* remove comment

* stage and publish api

* deprecate base level response_schema

* add param to llm_config test

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-26 14:39:39 -08:00
Charles Packer
c4699b3d17 feat: add support for opus 4.5 (#6256)
* feat: add support for new model

* fix: just stage-api && just publish-api (anthropic model settings changed)

* fix: just stage-api && just publish-api (anthropic model settings changed)

* fix: make kevlar have default reasoning on

* fix: bump anthropic sdk version

* fix: patch name

* pin newer version anthropic

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:10:27 -08:00
jnjpng
9adeb122cd feat: set thinking level high for gemini 3 (#6330)
* base

* include thoughts

* update

* clean

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-11-24 19:10:27 -08:00
cthomas
d903056a43 fix: list indices must be int not str error [LET-6254] (#6302)
fix: list indices must be int not str error
2025-11-24 19:10:26 -08:00
cthomas
1c358c863e fix: google api sync request (#6292)
* fix: google api sync request

* remove unused function
2025-11-24 19:10:26 -08:00
cthomas
2ebf863998 fix: type error in agent step (#6287) 2025-11-24 19:10:26 -08:00
cthomas
6f810d95d8 feat: add semaphore to limit embeddings creation (#6261) 2025-11-24 19:10:11 -08:00
Ari Webb
9bff8413a0 fix: fix gemini models reasoning [LET-6228] (#6245)
* fixed locally

* fix tests

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:09:33 -08:00
Ari Webb
7380eaec13 feat: enable gpt5.1 models [LET-6178] (#6175)
* hack at gpt51

* revert package lock

* first hack

* default context window

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:09:33 -08:00
Kian Jones
ddb6f3836e Fix: prevent empty embedding batches from causing memory spikes (#6230)
Root cause: When splitting failed embedding batches, mid=0 for single
items created empty chunks. These empty chunks were then processed,
creating hundreds of no-op tasks that consumed memory.

Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked 531 MB → 4.9 GB
- Pod crashed

Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty

This prevents the asyncio.gather() from creating thousands of empty
coroutines that exhaust memory.
2025-11-24 19:09:33 -08:00
Sarah Wooders
a6b19bf3aa Revert "feat: support anthropic structured outputs [LET-6190]" (#6229)
Revert "feat: support anthropic structured outputs [LET-6190] (#6200)"

This reverts commit 46f6422e68b0d35df1690118dd971c20a4ba7a81.
2025-11-24 19:09:33 -08:00
Sarah Wooders
c18af2bc81 feat: support anthropic structured outputs [LET-6190] (#6200) 2025-11-24 19:09:33 -08:00
Ari Webb
f19a71dad1 chore: log problematic chunk (#6166)
log problematic chunk

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:09:32 -08:00
Kian Jones
aafd5696c5 fix: logging and try to handle invalid embeddings (#6145)
logging and try to handkle invalid embeddings
2025-11-13 15:36:56 -08:00