Commit Graph

395 Commits

Author SHA1 Message Date
Kevin Lin
bd5b5fa9f3 feat(gemini): add 3.1 pro preview support (#9553)
Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Ari Webb
21765d16c9 fix(core): add OpenAI 24h prompt cache retention for supported models (#9509)
* fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)

* fix(core): apply OpenAI prompt cache settings to request payloads

Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prefix prompt cache key with letta tag

Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add integration test

* skip test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>

* fix(core): only set prompt_cache_retention, drop prompt_cache_key

Two issues with the original prompt_cache_key approach:
1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars)
2. Setting an explicit key disrupted OpenAI's default prefix-hash
   routing, dropping cache hit rates from 40-45% to 10-13%

OpenAI's default routing (hash of first ~256 tokens) already provides
good cache affinity since each agent has a unique system prompt.
We only need prompt_cache_retention="24h" for extended retention.

Also fixes:
- Operator precedence bug in _supports_extended_prompt_cache_retention
- Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Kian Jones
f5c4ab50f4 chore: add ty + pre-commit hook and repeal even more ruff rules (#9504)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import

* remove all ignores, add FastAPI rules and Ruff rules

* add ty and precommit

* ruff stuff

* ty check fixes

* ty check fixes pt 2

* error on invalid
2026-02-24 10:55:11 -08:00
Devansh Jain
39ddda81cc feat: add Anthropic Sonnet 4.6 (#9408) 2026-02-24 10:55:11 -08:00
Kian Jones
25d54dd896 chore: enable F821, F401, W293 (#9503)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import
2026-02-24 10:55:08 -08:00
Ari Webb
fa70e09963 Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention" (#9502)
Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention …"

This reverts commit f5bb9c629cb7d45544e90758cdfb899bcef41912.
2026-02-24 10:52:07 -08:00
Charles Packer
619e81ed1e fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)
* fix(core): apply OpenAI prompt cache settings to request payloads

Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prefix prompt cache key with letta tag

Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add integration test

* skip test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
2026-02-24 10:52:07 -08:00
jnjpng
5b001a7749 fix: rename ChatGPT server error to ChatGPT API error (#9497)
fix: rename ChatGPT server error to ChatGPT API error in error messages
2026-02-24 10:52:07 -08:00
jnjpng
fbc0bb60d9 fix: retry ChatGPT 502 and upstream connection errors with exponential backoff (#9495)
502s and upstream connection errors (envoy proxy failures) from ChatGPT
were not being retried. This classifies them as LLMConnectionError (retryable)
in both the streaming and non-streaming paths, and adds retry handling in
the non-streaming HTTPStatusError handler so 502s get the same exponential
backoff treatment as transport-level connection drops.

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
amysguan
80a0d1a95f Add LLM client compaction errors to traces (#9474)
* add llm client errors to traces

* update response json for telemetry

* prevent silent failures and properly log errored responses in streaming path

* remove double logging

---------

Co-authored-by: Amy Guan <amy@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
80f34f134d fix(core): catch bare openai.APIError in handle_llm_error (#9468)
* fix(core): catch bare openai.APIError in handle_llm_error fallthrough

openai.APIError raised during streaming (e.g. OpenRouter credit
exhaustion) is not an APIStatusError, so it skipped the catch-all
at the end and fell through to LLMError("Unhandled"). Now bare
APIErrors that aren't context window overflows are mapped to
LLMBadRequestError.

Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion

Adds dedicated error type for insufficient credits/quota across all
providers (OpenAI, Anthropic, Google). Returns HTTP 402 with
BYOK-aware messaging instead of generic 400.

- New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode
- is_insufficient_credits_message() helper detecting credit/quota strings
- All 3 provider clients detect 402 status + credit keywords
- FastAPI handler returns 402 with "your API key" vs generic messaging
- 5 new parametrized tests covering OpenRouter, OpenAI, and negative case

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
cfd2ca3102 fix: zai clear empty messages (#9466) 2026-02-24 10:52:07 -08:00
jnjpng
778f28ccf3 fix: handle transient network errors in ChatGPT OAuth client (#9462)
- Map httpx.ReadError/WriteError/ConnectError to LLMConnectionError in
  handle_llm_error so Temporal correctly classifies them as retryable
  (previously fell through to generic non-retryable LLMError)
- Add client-level retry with exponential backoff (up to 3 attempts) on
  request_async and stream_async for transient transport errors
- Stream retry is guarded by has_yielded flag to avoid corrupting
  partial responses already consumed by the caller
2026-02-24 10:52:07 -08:00
Kian Jones
b9c4ed3b15 fix: catch contextwindowexceeded error on gemini (#9450)
* catch contextwindowexceeded error

* fix(core): detect Google token limit errors as ContextWindowExceededError

Google's error message says "input token count exceeds the maximum
number of tokens allowed" which doesn't contain the word "context",
so it was falling through to generic LLMBadRequestError instead of
ContextWindowExceededError. This means compaction won't auto-trigger.

Expands the detection to also match "token count" and "tokens allowed"
in addition to the existing "context" keyword.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): add missing message arg to LLMBadRequestError in OpenAI client

The generic 400 path in handle_llm_error was constructing
LLMBadRequestError without the required message positional arg,
causing TypeError in prod during summarization.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add adapters/ test suite to core unit test matrix

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(tests): update adapter error handling test expectations to match actual behavior

The streaming adapter's error handling double-wraps errors: the
AnthropicStreamingInterface calls handle_llm_error first, then the
adapter catches the result and calls handle_llm_error again, which
falls through to the base class LLMError. Updated test expectations
to match this behavior.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prevent double-wrapping of LLMError in stream adapter

The AnthropicStreamingInterface.process() already transforms raw
provider errors into LLMError subtypes via handle_llm_error. The
adapter was catching the result and calling handle_llm_error again,
which didn't recognize the already-transformed LLMError and wrapped
it in a generic LLMError("Unhandled LLM error"). This downgraded
specific error types (LLMConnectionError, LLMServerError, etc.)
and broke retry logic that matches on specific subtypes.

Now the adapter checks if the error is already an LLMError and
re-raises it as-is. Tests restored to original correct expectations.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
5b7dd15905 fix(core): use BYOK API keys for Google AI/Vertex LLM requests (#9439)
GoogleAIClient and GoogleVertexClient were hardcoding Letta's managed
credentials for all requests, ignoring user-provided BYOK API keys.
This meant Letta was paying Google API costs for BYOK users.

Add _get_client_async and update _get_client to check BYOK overrides
(via get_byok_overrides / get_byok_overrides_async) before falling back
to managed credentials, matching the pattern used by OpenAIClient and
AnthropicClient.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
d0e25ae471 feat: add glm 5 to core (#9436)
* feat: add glm 5 to core

* test glm 5
2026-02-24 10:52:07 -08:00
Kian Jones
7c65fd77f1 fix(core): return 400 for Google GenAI ClientError bad requests (#9357)
Google genai.errors.ClientError with code 400 was being caught and
wrapped as LLMBadRequestError but returned to clients as 502 because
no dedicated FastAPI exception handler existed for LLMBadRequestError.

- Add LLMBadRequestError exception handler in app.py returning HTTP 400
- Fix ErrorCode on Google 400 bad requests from INTERNAL_SERVER_ERROR
  to INVALID_ARGUMENT
- Route Google API errors through handle_llm_error in stream_async path

Datadog: https://us5.datadoghq.com/error-tracking/issue/4eb3ff3c-d937-11f0-8177-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
411bb63990 fix(core): improve error handling for upstream LLM provider errors (#9423)
Handle HTML error responses from ALB/load balancers in OpenAI client and
add explicit InternalServerError handling for Anthropic upstream issues.

🐛 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
382e216cbb fix(core): differentiate BYOK vs base provider in all LLM error details (#9425)
Add is_byok flag to every LLMError's details dict returned from
handle_llm_error across all providers (OpenAI, Anthropic, Google,
ChatGPT OAuth). This enables observability into whether errors
originate from Letta's production keys or user-provided BYOK keys.

The rate limit handler in app.py now returns a more helpful message
for BYOK users ("check your provider's rate limits and billing")
versus the generic message for base provider rate limits.

Datadog issues:
- https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
424a1ada64 fix: google gen ai format error fix (#9147)
* google gen ai format error fix

* fix(core): add $ref safety net, warning log, and unit tests for Google schema resolution

- Add `$ref` to unsupported_keys in `_clean_google_ai_schema_properties` so unresolvable refs (e.g. `#/properties/...` style) are stripped as a safety net instead of crashing the Google SDK
- Add warning log when `_resolve_json_schema_refs` encounters a ref it cannot resolve
- Deduplicate the `#/$defs/` and `#/definitions/` resolution branches
- Add 11 unit tests covering: single/multiple $defs, nested refs, refs in anyOf/allOf, array items, definitions key, unresolvable refs, and the full resolve+clean pipeline

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kevin Lin
23c94ec6d3 feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240)
* Add log probabilities support for RL training

This enables Letta server to request and return log probabilities from
OpenAI-compatible providers (including SGLang) for use in RL training.

Changes:
- LLMConfig: Add return_logprobs and top_logprobs fields
- OpenAIClient: Set logprobs in ChatCompletionRequest when enabled
- LettaLLMAdapter: Add logprobs field and extract from response
- LettaResponse: Add logprobs field to return log probs to client
- LettaRequest: Add return_logprobs/top_logprobs for per-request override
- LettaAgentV3: Store and pass logprobs through to response
- agents.py: Handle request-level logprobs override

Usage:
  response = client.agents.messages.create(
      agent_id=agent_id,
      messages=[...],
      return_logprobs=True,
      top_logprobs=5,
  )
  print(response.logprobs)  # Per-token log probabilities

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* Add multi-turn token tracking for RL training via SGLang native endpoint

- Add TurnTokenData schema to track token IDs and logprobs per turn
- Add return_token_ids flag to LettaRequest and LLMConfig
- Create SGLangNativeClient for /generate endpoint (returns output_ids)
- Create SGLangNativeAdapter that uses native endpoint
- Modify LettaAgentV3 to accumulate turns across LLM calls
- Include turns in LettaResponse when return_token_ids=True

* Fix: Add SGLang native adapter to step() method, not just stream()

* Fix: Handle Pydantic Message objects in SGLang native adapter

* Fix: Remove api_key reference from LLMConfig (not present)

* Fix: Add missing 'created' field to ChatCompletionResponse

* Add full tool support to SGLang native adapter

- Format tools into prompt in Qwen-style format
- Parse tool calls from <tool_call> tags in response
- Format tool results as <tool_response> in user messages
- Set finish_reason to 'tool_calls' when tools are called

* Use tokenizer.apply_chat_template for proper tool formatting

- Add tokenizer caching in SGLang native adapter
- Use apply_chat_template when tokenizer available
- Fall back to manual formatting if not
- Convert Letta messages to OpenAI format for tokenizer

* Fix: Use func_response instead of tool_return for ToolReturn content

* Fix: Get output_token_logprobs from meta_info in SGLang response

* Fix: Allow None in output_token_logprobs (SGLang format includes null)

* chore: remove unrelated files from logprobs branch

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing call_type param to adapter constructors in letta_agent_v3

The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting
adapter creation into conditional blocks. Restores it for all 3 spots (SGLang
in step, SimpleLLM in step, SGLang in stream).

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

* fix: update expected LLMConfig fields in schema test for logprobs support

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove rllm provider references

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
5fd5a6dd07 feat: add new azure api maintaining backward compat (#9387)
* feat: add new azure provider type

* fix context window
2026-02-24 10:52:07 -08:00
jnjpng
226df8baef fix: propagate context window exceeded from chatgpt oauth client (#9393)
* base

* clean up

* fixes
2026-02-24 10:52:07 -08:00
Kian Jones
4c753f3f3c fix: handle non-JSON responses from LLM provider endpoints (#9362)
When an OpenAI/Anthropic-compatible endpoint returns a non-JSON response
(e.g. HTML error page), the SDK's paginated response parser falls back
to returning a raw string. The post-parser then calls
_set_private_attributes() on that string, causing an AttributeError.

Add explicit AttributeError handling around SDK models.list() calls in
provider check_api_key/list_llm_models_async methods, and add type
guards in convert_response_to_chat_completion to reject raw strings
before Pydantic model construction.

Datadog: https://us5.datadoghq.com/error-tracking/issue/59a7a206-00b8-11f1-be73-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
825019c2ce fix(core): handle Anthropic streaming required ValueError (#9344)
* Fix Anthropic ValueError for long-running operations

Adds proper error handling for Anthropic SDK's streaming requirement.
When operations may exceed 10 minutes, the SDK raises a ValueError.

Changes:
- Catch ValueError in sync request() method
- Provide user-friendly error directing to async API
- Async version already had this fix with streaming fallback

Fixes Datadog issue 955d10b4-ed95-11f0-a5a5-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use LLMBadRequestError instead of ValueError for Anthropic streaming constraint

ValueError maps to HTTP 400 which incorrectly implies a bad client request.
LLMBadRequestError maps to HTTP 502 (Bad Gateway) which correctly signals
that the downstream provider (Anthropic) rejected the proxied request due
to its own constraints.

Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
2026-02-24 10:52:07 -08:00
Kian Jones
14ef479e70 fix(core): handle empty content in Anthropic response gracefully (#9345)
Fixes Datadog issue a47619fa-d5b8-11f0-9fd7-da7ad0900000

Handle empty content in Anthropic responses gracefully by replacing RuntimeError with LLMServerError. Now logs detailed debugging information (response ID, model, stop_reason) and returns a user-friendly error instead of crashing.

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
2c0cddf9f5 fix(core): handle Google 499 CANCELLED as client disconnect, not server error (#9363)
The google.genai.errors.ClientError with code 499 (CANCELLED) indicates the
client disconnected, not a server-side failure. Previously this fell through
to the generic ClientError handler and was classified as LLMServerError,
causing false 500s in Datadog error tracking.

- Add explicit 499 handling in handle_llm_error: log at info level, return
  LLMConnectionError instead of LLMServerError
- Catch 499 during stream iteration in stream_async and end gracefully
  instead of propagating the error

Datadog: https://us5.datadoghq.com/error-tracking/issue/c8453aaa-d559-11f0-81c6-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
f20fdc73d1 fix(core): preserve Gemini thought_signature on function calls in non-streaming path (#9351)
* fix(core): preserve Gemini thought_signature on function calls in non-streaming path

The Google Gemini API requires thought_signature to be echoed back on
function call parts in multi-turn conversations. In the non-streaming
request path, the signature was only captured for subsequent function
calls (else branch) but dropped for the first/only function call (if
branch) in convert_response_to_chat_completion. This caused 400
INVALID_ARGUMENT errors on the next turn.

Additionally, when no ReasoningContent existed to carry the signature
(e.g. Gemini 2.5 Flash with include_thoughts=False), the signature was
lost in the adapter layer. Now it falls through to TextContent.

Datadog: https://us5.datadoghq.com/error-tracking/issue/17c4b114-d596-11f0-bcd6-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): preserve Gemini thought_signature in non-temporal agent path

Carry reasoning_content_signature on TextContent in letta_agent.py
at both locations where content falls through from reasoning (same
fix already applied to the adapter and temporal activity paths).

Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
2026-02-24 10:52:07 -08:00
Kian Jones
745dd1e124 fix(core): reject empty API keys in Bearer auth headers (#9350)
Empty or None API keys resulted in "Bearer " header values which cause
httpx.LocalProtocolError. Use truthiness checks instead of `is not None`
to also reject empty strings before constructing Authorization headers.

Datadog: https://us5.datadoghq.com/error-tracking/issue/ad3c1e38-d557-11f0-a65d-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
d48932bdb6 fix(core): sanitize Unicode surrogates in all LLM client requests (#9323)
Multiple OpenAI-compatible LLM clients (Azure, Deepseek, Groq, Together, XAI, ZAI)
and Anthropic-compatible clients (Anthropic, MiniMax, Google Vertex) were overriding
request_async/stream_async without calling sanitize_unicode_surrogates, causing
UnicodeEncodeError when message content contained lone UTF-16 surrogates.

Root cause: Child classes override parent methods but omit the sanitization step that
the base OpenAIClient includes. This allows corrupted Unicode (unpaired surrogates
from malformed emoji) to reach the httpx layer, which rejects it during UTF-8 encoding.

Fix: Import and call sanitize_unicode_surrogates in all overridden request methods.
Also removed duplicate sanitize_unicode_surrogates definition from openai_client.py
that shadowed the canonical implementation in letta.helpers.json_helpers.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

Issue-ID: 10c0f2e4-f87b-11f0-b91c-da7ad0900000
2026-02-24 10:52:06 -08:00
Kian Jones
662ec082cf fix(core): handle MCP errors and API key whitespace (#9306)
* fix: strip whitespace from API keys in LLM client headers

Fixes httpx.LocalProtocolError when API keys contain leading/trailing whitespace.
Strips whitespace from API keys before using them in HTTP headers across:
- OpenAI client (openai.py)
- Mistral client (mistral.py)
- Anthropic client (anthropic_client.py)
- Anthropic schema provider (schemas/providers/anthropic.py)
- Google AI client (google_ai_client.py)
- Proxy helpers (proxy_helpers.py)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle McpError gracefully in MCP client execute_tool

Return error as failed result instead of re-raising to avoid Datadog alerts for expected user-facing errors like missing tool arguments.

* fix: strip whitespace from API keys before passing to httpx client

Fixes httpx.LocalProtocolError by stripping leading/trailing whitespace
from API keys before passing them to OpenAI/AsyncOpenAI clients. The
OpenAI client library constructs Authorization headers internally, and
invalid header values (like keys with leading spaces) cause protocol
errors.

Applied fix to:
- azure_client.py (AzureOpenAI/AsyncAzureOpenAI)
- deepseek_client.py (OpenAI/AsyncOpenAI)
- openai_client.py (OpenAI/AsyncOpenAI via kwargs)
- xai_client.py (OpenAI/AsyncOpenAI)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle JSONDecodeError in OpenAI client requests

Catches json.JSONDecodeError from OpenAI SDK when API returns invalid
JSON (typically HTML error pages from 500-series errors) and converts
to LLMServerError with helpful details.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): strip API key whitespace at schema level on write/create

Add field_validator to ProviderCreate, ProviderUpdate, and ProviderCheck
schemas to strip whitespace from api_key and access_key fields before
persistence. This ensures keys are clean at the point of entry, preventing
whitespace from being encrypted and stored in the database.

Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>

* refactor: remove api_key.strip() calls across all LLM clients

Remove redundant .strip() calls on api_key parameters since pydantic models
now handle whitespace trimming at the validation layer. This centralizes
the validation logic and follows DRY principles.

- Updated 13 files across multiple LLM client implementations
- Removed 34 occurrences of api_key.strip()
- Includes: OpenAI, Anthropic, Azure, Google AI, Groq, XAI, DeepSeek, ZAI, Together, Mistral
- Also updated proxy helpers and provider schemas

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: remove redundant ternary operators from api_key parameters

Remove `if api_key else None` ternaries since pydantic validation ensures
api_key is either a valid string or None. The ternary was defensive programming
that's now unnecessary with proper model-level validation.

- Simplified 23 occurrences across 7 files
- Cleaner, more concise client initialization code
- No behavioral change since pydantic already handles this

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
2026-02-24 10:52:06 -08:00
Kian Jones
3709be28dd fix(core): handle Google GenAI validation errors (#9307)
* fix: handle const keyword in google genai tool schemas

* fix: handle pydantic ValidationError in Google GenAI client

Fixes Datadog error tracking issue where pydantic_core.ValidationError
was raised when tool schemas contained unsupported fields (e.g., 'const',
'default', 'additionalProperties').

Changes:
- Add error handling for pydantic ValidationError in request(), request_async(), and stream_async()
- Convert validation errors to LLMBadRequestError with helpful error message
- Deep copy tool parameters before cleaning to avoid modifying shared objects
- Add imports for pydantic_core and copy module

This prevents unhandled exceptions and provides better diagnostics when
tool schemas contain fields not supported by Google AI API.
2026-02-24 10:52:06 -08:00
Kian Jones
be60697a62 fix(core): handle protocol errors and foreign key violations (#9308)
* fix(core): handle PermissionDeniedError in provider API key validation

Fixed OpenAI PermissionDeniedError being raised as unknown error when
validating provider API keys. The check_api_key methods in OpenAI-based
providers (OpenAI, OpenRouter, Azure, Together) now properly catch and
re-raise PermissionDeniedError as LLMPermissionDeniedError.

🐛 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle Unicode surrogates in OpenAI requests

Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API.
Fixes UnicodeEncodeError when message content contains unpaired surrogates
from corrupted emoji data or malformed Unicode sequences.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle MCP tool schema validation errors gracefully

Catch fastmcp.exceptions.ToolError in execute_mcp_tool endpoint and
convert to LettaInvalidArgumentError (400) instead of letting it
propagate as 500 error. This is an expected user error when tool
arguments don't match the MCP tool's schema.

Fixes Datadog issue 8f2d874a-f8e5-11f0-9b25-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle ExceptionGroup-wrapped ToolError in MCP executor

When MCP tools fail with validation errors (e.g., missing required parameters),
fastmcp raises ToolError exceptions that may be wrapped in ExceptionGroup by
Python's async TaskGroup. The exception handler now unwraps single-exception
groups before checking if the error should be handled gracefully.

Fixes Calendly API "organization parameter missing" errors being logged to
Datadog instead of returning friendly error messages to users.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle missing agent in create_conversation to prevent foreign key violation

* Update .gitignore

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
85ee7ed7b4 fix: anthropic tool sanitation (#9310) 2026-02-24 10:52:06 -08:00
Kian Jones
6f746c5225 fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305)
* fix: handle Anthropic overloaded_error in streaming interfaces

* fix: handle Unicode surrogates in OpenAI requests

Sanitize Unicode surrogate pairs before sending requests to OpenAI API.
Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.

Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed

* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests

Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.

Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.

Also added sanitization to stream_async_responses() method which was
missing it.

Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
2026-02-24 10:52:06 -08:00
jnjpng
0bdedb3c0f feat: agent generate endpoint (#9304)
* base

* update

* clean up

* update
2026-02-24 10:52:06 -08:00
Devansh Jain
644f7b9d5d chore: Add Opus 4.6 with 1M context window [OPUS-46] (#9301)
opus 4.6 1M version
2026-02-24 10:52:06 -08:00
Kevin Lin
34159ffa21 feat: add Anthropic Opus 4.6 model support (#9123) 2026-02-24 10:52:06 -08:00
cthomas
09d7940090 fix: use string tool_choice for Groq and OpenRouter (#9267)
Some providers (Groq, OpenRouter proxied providers) only support string
values for tool_choice ("none", "auto", "required"), not the object
format {"type": "function", "name": "..."}.

When force_tool_call is set, convert to "required" instead of object
format for these providers.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Kian Jones
203b6ead7c fix: remove duplicate provider trace logging and dead code (#9278)
Provider traces were being created twice per step:
1. Via `request_async_with_telemetry` / `log_provider_trace_async` in LLMClient
2. Via direct `create_provider_trace_async` calls in LettaAgent

This caused duplicate records in provider_trace_metadata (Postgres) and
llm_traces (ClickHouse) for every agent step.

Changes:
- Remove redundant direct `create_provider_trace_async` calls from letta_agent.py
- Remove no-op `stream_async_with_telemetry` method (was just a pass-through to `stream_async`)
- Update callers to use `stream_async` directly

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Sarah Wooders
eaf64fb510 fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258)
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
cthomas
00aa51927d fix: add missing call_type to more ProviderTrace callsites (#9266)
- letta_llm_request_adapter.py
- llm_api_tools.py

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
170886b8a8 fix: compaction on 413 from openai (#9263) 2026-02-24 10:52:06 -08:00
Sarah Wooders
4096b30cd7 feat: log LLM traces to clickhouse (#9111)
* feat: add non-streaming option for conversation messages

- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API schema for ConversationMessageRequest

* feat: add direct ClickHouse storage for raw LLM traces

    Adds ability to store raw LLM request/response payloads directly in ClickHouse,
    bypassing OTEL span attribute size limits. This enables debugging and analytics
    on large LLM payloads (>10MB system prompts, large tool schemas, etc.).

    New files:
    - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper
    - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget)
    - letta/services/llm_raw_trace_reader.py: Reader with query methods
    - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL
    - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL
    - apps/core/clickhouse-init.sql: Local dev initialization

    Modified:
    - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval)
    - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry
    - compose.yaml: Added ClickHouse service for local dev
    - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands

    Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false).
    Uses ZSTD(3) compression for 10-30x reduction on JSON payloads.

    🤖 Generated with [Letta Code](https://letta.com)

    Co-Authored-By: Letta <noreply@letta.com>

* fix: address code review feedback for LLM raw traces

Fixes based on code review feedback:

1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port
   inputs (was defaulting to HTTPS which breaks local dev)

2. Make raw trace writes truly fire-and-forget - use asyncio.create_task()
   instead of awaiting, so JSON serialization doesn't block request path

3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth
   under load. Drops traces with warning if queue is full.

4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop()

5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it

6. Remove unused imports - json import in reader

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing asyncio import and simplify JSON serialization

- Add missing 'import asyncio' that was causing 'name asyncio is not defined' error
- Remove unnecessary clean_double_escapes() function - the JSON is stored correctly,
  the clickhouse-client CLI was just adding extra escaping when displaying
- Update just clickhouse-trace to use Python client for correct JSON output

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: add clickhouse raw trace integration test

* test: simplify clickhouse trace assertions

* refactor: centralize usage parsing and stream error traces

Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: exercise provider usage parsing live

Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: fix usage parsing tests to pass

- Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient
- Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta)
- Add tools=[] for Gemini/Anthropic build_request_data

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: extract_usage_statistics returns LettaUsageStatistics

Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add is_byok and llm_config_json columns to ClickHouse traces

Extend llm_raw_traces table with:
- is_byok (UInt8): Track BYOK vs base provider usage for billing analytics
- llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis

This enables queries like:
- BYOK usage breakdown by provider/model
- Config parameter analysis (temperature, max_tokens, etc.)
- Debugging specific request configurations

* feat: add tests for error traces, llm_config_json, and cache tokens

- Update llm_raw_trace_reader.py to query new columns (is_byok,
  cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json)
- Add test_error_trace_stored_in_clickhouse to verify error fields
- Add test_cache_tokens_stored_for_anthropic to verify cache token storage
- Update existing tests to verify llm_config_json is stored correctly
- Make llm_config required in log_provider_trace_async()
- Simplify provider extraction to use provider_name directly

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add ClickHouse integration tests to CI pipeline

- Add use-clickhouse option to reusable-test-workflow.yml
- Add ClickHouse service container with otel database
- Add schema initialization step using clickhouse-init.sql
- Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.)
- Add separate clickhouse-integration-tests job running
  integration_test_clickhouse_llm_raw_traces.py

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify provider and org_id extraction in raw trace writer

- Use model_endpoint_type.value for provider (not provider_name)
- Simplify org_id to just self.actor.organization_id (actor is always pydantic)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLMRawTraceWriter with _enabled flag

- Check ClickHouse env vars once at init, set _enabled flag
- Early return in write_async/flush_async if not enabled
- Remove ValueError raises (never used)
- Simplify _get_client (no validation needed since already checked)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add LLMRawTraceWriter shutdown to FastAPI lifespan

Properly flush pending traces on graceful shutdown via lifespan
instead of relying only on atexit handler.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add agent_tags column to ClickHouse traces

Store agent tags as Array(String) for filtering/analytics by tag.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* cleanup

* fix(ci): fix ClickHouse schema initialization in CI

- Create database separately before loading SQL file
- Remove CREATE DATABASE from SQL file (handled in CI step)
- Add verification step to confirm table was created
- Use -sf flag for curl to fail on HTTP errors

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLM trace writer with ClickHouse async_insert

- Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop
- Sync cloud DDL schema with clickhouse-init.sql (add missing columns)
- Remove redundant llm_raw_traces_local.ddl
- Remove unused batch_size/flush_interval settings
- Update tests for simplified writer

Key changes:
- async_insert=1, wait_for_async_insert=1 for reliable server-side batching
- Simple per-trace retry with exponential backoff (max 3 retries)
- ~150 lines removed from writer

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: consolidate ClickHouse direct writes into TelemetryManager backend

- Add clickhouse_direct backend to provider_trace_backends
- Remove duplicate ClickHouse write logic from llm_client_base.py
- Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct

The clickhouse_direct backend:
- Converts ProviderTrace to LLMRawTrace
- Extracts usage stats from response JSON
- Writes via LLMRawTraceWriter with async_insert

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: address PR review comments and fix llm_config bug

Review comment fixes:
- Rename clickhouse_direct -> clickhouse_analytics (clearer purpose)
- Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml
- Delete redundant scripts/test_llm_raw_traces.py (use pytest tests)
- Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL)
- Fix socket description leak in telemetry_manager docstring
- Add cloud-only comment to clickhouse-init.sql
- Update justfile to use separate compose file

Bug fix:
- Fix llm_config not being passed to ProviderTrace in telemetry
- Now correctly populates provider, model, is_byok for all LLM calls
- Affects both request_async_with_telemetry and log_provider_trace_async

DDL optimizations:
- Add secondary indexes (bloom_filter for agent_id, model, step_id)
- Add minmax indexes for is_byok, is_error
- Change model and error_type to LowCardinality for faster GROUP BY

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: rename llm_raw_traces -> llm_traces

Address review feedback that "raw" is misleading since we denormalize fields.

Renames:
- Table: llm_raw_traces -> llm_traces
- Schema: LLMRawTrace -> LLMTrace
- Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py
- Setting: store_llm_raw_traces -> store_llm_traces

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update workflow references to llm_traces

Missed renaming table name in CI workflow files.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update clickhouse_direct -> clickhouse_analytics in docstring

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove inaccurate OTEL size limit comments

The 4MB limit is our own truncation logic, not an OTEL protocol limit.
The real benefit is denormalized columns for analytics queries.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove local ClickHouse dev setup (cloud-only feature)

- Delete clickhouse-init.sql and compose.clickhouse.yaml
- Remove local clickhouse just commands
- Update CI to use cloud DDL with MergeTree for testing

clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: restore compose.yaml to match main

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: merge clickhouse_analytics into clickhouse backend

Per review feedback - having two separate backends was confusing.

Now the clickhouse backend:
- Writes to llm_traces table (denormalized for cost analytics)
- Reads from OTEL traces table (will cut over to llm_traces later)

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: correct path to DDL file in CI workflow

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: add provider index to DDL for faster filtering

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: configure telemetry backend in clickhouse tests

Tests need to set telemetry_settings.provider_trace_backends to include
'clickhouse', otherwise traces are routed to default postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: set provider_trace_backend field, not property

provider_trace_backends is a computed property, need to set the
underlying provider_trace_backend string field instead.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: error trace test and error_type extraction

- Add TelemetryManager to error trace test so traces get written
- Fix error_type extraction to check top-level before nested error dict

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use provider_trace.id for trace correlation across backends

- Pass provider_trace.id to LLMTrace instead of auto-generating
- Log warning if ID is missing (shouldn't happen, helps debug)
- Fallback to new UUID only if not set

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: trace ID correlation and concurrency issues

- Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse
- Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe)
- Fix Anthropic prompt_tokens to include cached tokens for cost analytics
- Log warning if provider_trace.id is missing

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
2026-02-24 10:52:06 -08:00
Ari Webb
0bbb9c9bc0 feat: add reasoning zai openrouter (#9189)
* feat: add reasoning zai openrouter

* add openrouter reasoning

* stage + publish api

* openrouter reasoning always on

* revert

* fix

* remove reference

* do
2026-02-24 10:52:06 -08:00
Kian Jones
01cb00ae10 Revert "fix: truncate oversized text in embedding requests" (#9227)
Revert "fix: truncate oversized text in embedding requests (#9196)"

This reverts commit a9c342087e022519c63d62fb76b72aed8859539b.
2026-02-24 10:52:06 -08:00
Kian Jones
630c147b13 fix: truncate oversized text in embedding requests (#9196)
fix: handle oversized text in embedding requests with recursive chunking

When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.

Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
501de90d6c fix: fix base openrouter (#9171) 2026-01-29 12:44:04 -08:00
Ari Webb
bcd90859ec fix: byok uses our key, fix that (#9169) 2026-01-29 12:44:04 -08:00
Ari Webb
9ce1249738 feat: openrouter byok (#9148)
* feat: openrouter byok

* new client is unnecessary

* revert json diffs
2026-01-29 12:44:04 -08:00