letta-server

Author	SHA1	Message	Date
Ani Tunturi	08d3c26732	fix: sanitize control characters before sending to inference backends Fireworks (via Synthetic Direct) chokes on raw ASCII control chars (0x00-0x1F) in JSON payloads with "Unterminated string" errors. The existing sanitize_unicode_surrogates only handles U+D800-DFFF. Now we also strip control chars (preserving tab/newline/CR) at all 4 request paths — sync, async, and both streaming variants.	2026-03-21 20:23:56 -04:00
Ani Tunturi	9af8e94fc9	fix: use exclude_none instead of per-provider field stripping The Fireworks workaround manually popped reasoning fields, but Synthetic Direct routes through Fireworks infra and hit the same issue. exclude_none=True in model_dump is the general fix — no need to enumerate providers or fields. Removes the Fireworks special case since exclude_none covers it.	2026-03-21 12:43:32 -04:00
cthomas	416ffc7cd7	Add billing context to LLM telemetry traces (#9745 ) * feat: add billing context to LLM telemetry traces Add billing metadata (plan type, cost source, customer ID) to LLM traces in ClickHouse for cost analytics and attribution. Data Flow: - Cloud-API: Extract billing info from subscription in rate limiting, set x-billing-* headers - Core: Parse headers into BillingContext object via dependencies - Adapters: Flow billing_context through all LLM adapters (blocking & streaming) - Agent: Pass billing_context to step() and stream() methods - ClickHouse: Store in billing_plan_type, billing_cost_source, billing_customer_id columns Changes: - Add BillingContext schema to provider_trace.py - Add billing columns to llm_traces ClickHouse table DDL - Update getCustomerSubscription to fetch stripeCustomerId from organization_billing_details - Propagate billing_context through agent step flow, adapters, and streaming service - Update ProviderTrace and LLMTrace to include billing metadata - Regenerate SDK with autogen Production Deployment: Requires env vars: LETTA_PROVIDER_TRACE_BACKEND=clickhouse, LETTA_STORE_LLM_TRACES=true, CLICKHOUSE_* 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add billing_context parameter to agent step methods - Add billing_context to BaseAgent and BaseAgentV2 abstract methods - Update LettaAgent, LettaAgentV2, LettaAgentV3 step methods - Update multi-agent groups: SleeptimeMultiAgentV2, V3, V4 - Fix test_utils.py to include billing header parameters - Import BillingContext in all affected files * fix: add billing_context to stream methods - Add billing_context parameter to BaseAgentV2.stream() - Add billing_context parameter to LettaAgentV2.stream() - LettaAgentV3.stream() already has it from previous commit * fix: exclude billing headers from OpenAPI spec Mark billing headers as internal (include_in_schema=False) so they don't appear in the public API. These are internal headers between cloud-api and core, not part of the public SDK. Regenerated SDK with stage-api - removes 10,650 lines of bloat that was causing OOM during Next.js build. * refactor: return billing context from handleUnifiedRateLimiting instead of mutating req Instead of passing req into handleUnifiedRateLimiting and mutating headers inside it: - Return billing context fields (billingPlanType, billingCostSource, billingCustomerId) from handleUnifiedRateLimiting - Set headers in handleMessageRateLimiting (middleware layer) after getting the result - This fixes step-orchestrator compatibility since it doesn't have a real Express req object * chore: remove extra gencode * p --------- Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:13 -08:00
jnjpng	db9e0f42af	fix(core): prevent ModelSettings default max_output_tokens from overriding agent config (#9739 ) * fix(core): prevent ModelSettings default max_output_tokens from overriding agent config When a conversation's model_settings were saved, the Pydantic default of max_output_tokens=4096 was always persisted to the DB even when the client never specified it. On subsequent messages, this default would overwrite the agent's max_tokens (typically None) with 4096, silently capping output. Two changes: 1. Use model_dump(exclude_unset=True) when persisting model_settings to the DB so Pydantic defaults are not saved. 2. Add model_fields_set guards at all callsites that apply _to_legacy_config_params() to skip max_tokens when it was not explicitly provided by the caller. Also conditionally set max_output_tokens in the OpenAI Responses API request builder so None is not sent as null (which some models treat as a hard 4096 cap). * nit * Fix model_settings serialization to preserve provider_type discriminator Replace blanket exclude_unset=True with targeted removal of only max_output_tokens when not explicitly set. The previous approach stripped the provider_type field (a Literal with a default), which broke discriminated union deserialization when reading back from DB.	2026-03-03 18:34:02 -08:00
amysguan	612a2ae98b	Fix: Change Z.ai context window to account for max_token subtraction (#9710 ) fix zai context window (functionally [advertised context window] - [max output tokens]) and properly pass in max tokens so Z.ai doesn't default to 65k for GLM-5	2026-03-03 18:34:02 -08:00
Kevin Lin	a11ba9710c	feat(core): increase Gemini timeout to 10 minutes (#9714 )	2026-03-03 18:34:02 -08:00
Ari Webb	673c1220a1	fix: strip properties for fireworks (#9703 )	2026-03-03 18:34:02 -08:00
cthomas	3d781efd21	fix(core): raise LLMEmptyResponseError for empty Anthropic responses (#9624 ) * fix(core): raise LLMEmptyResponseError for empty Anthropic responses Fixes LET-7679: Opus 4.6 occasionally returns empty responses (no content and no tool calls), causing silent failures with stop_reason=end_turn. Changes: - Add LLMEmptyResponseError class (subclass of LLMServerError) - Raise error in anthropic_client for empty non-streaming responses - Raise error in anthropic_streaming_interface for empty streaming responses - Pass through LLMError instances in handle_llm_error to preserve specific types - Add test for empty streaming response detection This allows clients (letta-code) to catch this specific error and implement retry logic with cache-busting modifications. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): set invalid_llm_response stop reason for empty responses Catch LLMEmptyResponseError specifically and set stop_reason to invalid_llm_response instead of llm_api_error. This allows clients to distinguish empty responses from transient API errors. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:01 -08:00
Kevin Lin	895acb9f4e	feat(core): add gpt-5.3-codex model support (#9628 ) * feat(core): add gpt-5.3-codex model support Add OpenAI gpt-5.3-codex model: context window overrides, model pricing and capabilities, none-reasoning-effort support, and test config. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api --------- Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:01 -08:00
amysguan	47b0c87ebe	Add modes `self` and `self_sliding_window` for prompt caching (#9372 ) * add self compaction method with proper caching (pass in tools, don't refresh sys prompt beforehand) + sliding fallback * updated prompts for self compaction * add tests for self, self_sliding_window modes and w/o refresh messages before compaction * add cache logging to summarization * better handling to prevent agent from continuing convo on self modes * if mode changes via summarize endpoint, will use default prompt for the new mode --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:26 -08:00
Kevin Lin	bd5b5fa9f3	feat(gemini): add 3.1 pro preview support (#9553 ) Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Ari Webb	21765d16c9	fix(core): add OpenAI 24h prompt cache retention for supported models (#9509 ) * fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com> * fix(core): only set prompt_cache_retention, drop prompt_cache_key Two issues with the original prompt_cache_key approach: 1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars) 2. Setting an explicit key disrupted OpenAI's default prefix-hash routing, dropping cache hit rates from 40-45% to 10-13% OpenAI's default routing (hash of first ~256 tokens) already provides good cache affinity since each agent has a unique system prompt. We only need prompt_cache_retention="24h" for extended retention. Also fixes: - Operator precedence bug in _supports_extended_prompt_cache_retention - Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Charles Packer <packercharles@gmail.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Devansh Jain	39ddda81cc	feat: add Anthropic Sonnet 4.6 (#9408 )	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
Ari Webb	fa70e09963	Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention" (#9502 ) Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention …" This reverts commit f5bb9c629cb7d45544e90758cdfb899bcef41912.	2026-02-24 10:52:07 -08:00
Charles Packer	619e81ed1e	fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492 ) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	5b001a7749	fix: rename ChatGPT server error to ChatGPT API error (#9497 ) fix: rename ChatGPT server error to ChatGPT API error in error messages	2026-02-24 10:52:07 -08:00
jnjpng	fbc0bb60d9	fix: retry ChatGPT 502 and upstream connection errors with exponential backoff (#9495 ) 502s and upstream connection errors (envoy proxy failures) from ChatGPT were not being retried. This classifies them as LLMConnectionError (retryable) in both the streaming and non-streaming paths, and adds retry handling in the non-streaming HTTPStatusError handler so 502s get the same exponential backoff treatment as transport-level connection drops. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
amysguan	80a0d1a95f	Add LLM client compaction errors to traces (#9474 ) * add llm client errors to traces * update response json for telemetry * prevent silent failures and properly log errored responses in streaming path * remove double logging --------- Co-authored-by: Amy Guan <amy@letta.com> Co-authored-by: Kian Jones <kian@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	80f34f134d	fix(core): catch bare openai.APIError in handle_llm_error (#9468 ) * fix(core): catch bare openai.APIError in handle_llm_error fallthrough openai.APIError raised during streaming (e.g. OpenRouter credit exhaustion) is not an APIStatusError, so it skipped the catch-all at the end and fell through to LLMError("Unhandled"). Now bare APIErrors that aren't context window overflows are mapped to LLMBadRequestError. Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion Adds dedicated error type for insufficient credits/quota across all providers (OpenAI, Anthropic, Google). Returns HTTP 402 with BYOK-aware messaging instead of generic 400. - New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode - is_insufficient_credits_message() helper detecting credit/quota strings - All 3 provider clients detect 402 status + credit keywords - FastAPI handler returns 402 with "your API key" vs generic messaging - 5 new parametrized tests covering OpenRouter, OpenAI, and negative case 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	cfd2ca3102	fix: zai clear empty messages (#9466 )	2026-02-24 10:52:07 -08:00
jnjpng	778f28ccf3	fix: handle transient network errors in ChatGPT OAuth client (#9462 ) - Map httpx.ReadError/WriteError/ConnectError to LLMConnectionError in handle_llm_error so Temporal correctly classifies them as retryable (previously fell through to generic non-retryable LLMError) - Add client-level retry with exponential backoff (up to 3 attempts) on request_async and stream_async for transient transport errors - Stream retry is guarded by has_yielded flag to avoid corrupting partial responses already consumed by the caller	2026-02-24 10:52:07 -08:00
Kian Jones	b9c4ed3b15	fix: catch contextwindowexceeded error on gemini (#9450 ) * catch contextwindowexceeded error * fix(core): detect Google token limit errors as ContextWindowExceededError Google's error message says "input token count exceeds the maximum number of tokens allowed" which doesn't contain the word "context", so it was falling through to generic LLMBadRequestError instead of ContextWindowExceededError. This means compaction won't auto-trigger. Expands the detection to also match "token count" and "tokens allowed" in addition to the existing "context" keyword. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add missing message arg to LLMBadRequestError in OpenAI client The generic 400 path in handle_llm_error was constructing LLMBadRequestError without the required message positional arg, causing TypeError in prod during summarization. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add adapters/ test suite to core unit test matrix 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(tests): update adapter error handling test expectations to match actual behavior The streaming adapter's error handling double-wraps errors: the AnthropicStreamingInterface calls handle_llm_error first, then the adapter catches the result and calls handle_llm_error again, which falls through to the base class LLMError. Updated test expectations to match this behavior. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prevent double-wrapping of LLMError in stream adapter The AnthropicStreamingInterface.process() already transforms raw provider errors into LLMError subtypes via handle_llm_error. The adapter was catching the result and calling handle_llm_error again, which didn't recognize the already-transformed LLMError and wrapped it in a generic LLMError("Unhandled LLM error"). This downgraded specific error types (LLMConnectionError, LLMServerError, etc.) and broke retry logic that matches on specific subtypes. Now the adapter checks if the error is already an LLMError and re-raises it as-is. Tests restored to original correct expectations. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	5b7dd15905	fix(core): use BYOK API keys for Google AI/Vertex LLM requests (#9439 ) GoogleAIClient and GoogleVertexClient were hardcoding Letta's managed credentials for all requests, ignoring user-provided BYOK API keys. This meant Letta was paying Google API costs for BYOK users. Add _get_client_async and update _get_client to check BYOK overrides (via get_byok_overrides / get_byok_overrides_async) before falling back to managed credentials, matching the pattern used by OpenAIClient and AnthropicClient. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	d0e25ae471	feat: add glm 5 to core (#9436 ) * feat: add glm 5 to core * test glm 5	2026-02-24 10:52:07 -08:00
Kian Jones	7c65fd77f1	fix(core): return 400 for Google GenAI ClientError bad requests (#9357 ) Google genai.errors.ClientError with code 400 was being caught and wrapped as LLMBadRequestError but returned to clients as 502 because no dedicated FastAPI exception handler existed for LLMBadRequestError. - Add LLMBadRequestError exception handler in app.py returning HTTP 400 - Fix ErrorCode on Google 400 bad requests from INTERNAL_SERVER_ERROR to INVALID_ARGUMENT - Route Google API errors through handle_llm_error in stream_async path Datadog: https://us5.datadoghq.com/error-tracking/issue/4eb3ff3c-d937-11f0-8177-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	411bb63990	fix(core): improve error handling for upstream LLM provider errors (#9423 ) Handle HTML error responses from ALB/load balancers in OpenAI client and add explicit InternalServerError handling for Anthropic upstream issues. 🐛 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	382e216cbb	fix(core): differentiate BYOK vs base provider in all LLM error details (#9425 ) Add is_byok flag to every LLMError's details dict returned from handle_llm_error across all providers (OpenAI, Anthropic, Google, ChatGPT OAuth). This enables observability into whether errors originate from Letta's production keys or user-provided BYOK keys. The rate limit handler in app.py now returns a more helpful message for BYOK users ("check your provider's rate limits and billing") versus the generic message for base provider rate limits. Datadog issues: - https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	424a1ada64	fix: google gen ai format error fix (#9147 ) * google gen ai format error fix * fix(core): add $ref safety net, warning log, and unit tests for Google schema resolution - Add `$ref` to unsupported_keys in `_clean_google_ai_schema_properties` so unresolvable refs (e.g. `#/properties/...` style) are stripped as a safety net instead of crashing the Google SDK - Add warning log when `_resolve_json_schema_refs` encounters a ref it cannot resolve - Deduplicate the `#/$defs/` and `#/definitions/` resolution branches - Add 11 unit tests covering: single/multiple $defs, nested refs, refs in anyOf/allOf, array items, definitions key, unresolvable refs, and the full resolve+clean pipeline 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kevin Lin	23c94ec6d3	feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240 ) * Add log probabilities support for RL training This enables Letta server to request and return log probabilities from OpenAI-compatible providers (including SGLang) for use in RL training. Changes: - LLMConfig: Add return_logprobs and top_logprobs fields - OpenAIClient: Set logprobs in ChatCompletionRequest when enabled - LettaLLMAdapter: Add logprobs field and extract from response - LettaResponse: Add logprobs field to return log probs to client - LettaRequest: Add return_logprobs/top_logprobs for per-request override - LettaAgentV3: Store and pass logprobs through to response - agents.py: Handle request-level logprobs override Usage: response = client.agents.messages.create( agent_id=agent_id, messages=[...], return_logprobs=True, top_logprobs=5, ) print(response.logprobs) # Per-token log probabilities 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * Add multi-turn token tracking for RL training via SGLang native endpoint - Add TurnTokenData schema to track token IDs and logprobs per turn - Add return_token_ids flag to LettaRequest and LLMConfig - Create SGLangNativeClient for /generate endpoint (returns output_ids) - Create SGLangNativeAdapter that uses native endpoint - Modify LettaAgentV3 to accumulate turns across LLM calls - Include turns in LettaResponse when return_token_ids=True * Fix: Add SGLang native adapter to step() method, not just stream() * Fix: Handle Pydantic Message objects in SGLang native adapter * Fix: Remove api_key reference from LLMConfig (not present) * Fix: Add missing 'created' field to ChatCompletionResponse * Add full tool support to SGLang native adapter - Format tools into prompt in Qwen-style format - Parse tool calls from <tool_call> tags in response - Format tool results as <tool_response> in user messages - Set finish_reason to 'tool_calls' when tools are called * Use tokenizer.apply_chat_template for proper tool formatting - Add tokenizer caching in SGLang native adapter - Use apply_chat_template when tokenizer available - Fall back to manual formatting if not - Convert Letta messages to OpenAI format for tokenizer * Fix: Use func_response instead of tool_return for ToolReturn content * Fix: Get output_token_logprobs from meta_info in SGLang response * Fix: Allow None in output_token_logprobs (SGLang format includes null) * chore: remove unrelated files from logprobs branch 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing call_type param to adapter constructors in letta_agent_v3 The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting adapter creation into conditional blocks. Restores it for all 3 spots (SGLang in step, SimpleLLM in step, SGLang in stream). 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api * fix: update expected LLMConfig fields in schema test for logprobs support 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove rllm provider references 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	5fd5a6dd07	feat: add new azure api maintaining backward compat (#9387 ) * feat: add new azure provider type * fix context window	2026-02-24 10:52:07 -08:00
jnjpng	226df8baef	fix: propagate context window exceeded from chatgpt oauth client (#9393 ) * base * clean up * fixes	2026-02-24 10:52:07 -08:00
Kian Jones	4c753f3f3c	fix: handle non-JSON responses from LLM provider endpoints (#9362 ) When an OpenAI/Anthropic-compatible endpoint returns a non-JSON response (e.g. HTML error page), the SDK's paginated response parser falls back to returning a raw string. The post-parser then calls _set_private_attributes() on that string, causing an AttributeError. Add explicit AttributeError handling around SDK models.list() calls in provider check_api_key/list_llm_models_async methods, and add type guards in convert_response_to_chat_completion to reject raw strings before Pydantic model construction. Datadog: https://us5.datadoghq.com/error-tracking/issue/59a7a206-00b8-11f1-be73-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	825019c2ce	fix(core): handle Anthropic streaming required ValueError (#9344 ) * Fix Anthropic ValueError for long-running operations Adds proper error handling for Anthropic SDK's streaming requirement. When operations may exceed 10 minutes, the SDK raises a ValueError. Changes: - Catch ValueError in sync request() method - Provide user-friendly error directing to async API - Async version already had this fix with streaming fallback Fixes Datadog issue 955d10b4-ed95-11f0-a5a5-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use LLMBadRequestError instead of ValueError for Anthropic streaming constraint ValueError maps to HTTP 400 which incorrectly implies a bad client request. LLMBadRequestError maps to HTTP 502 (Bad Gateway) which correctly signals that the downstream provider (Anthropic) rejected the proxied request due to its own constraints. Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	14ef479e70	fix(core): handle empty content in Anthropic response gracefully (#9345 ) Fixes Datadog issue a47619fa-d5b8-11f0-9fd7-da7ad0900000 Handle empty content in Anthropic responses gracefully by replacing RuntimeError with LLMServerError. Now logs detailed debugging information (response ID, model, stop_reason) and returns a user-friendly error instead of crashing. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	2c0cddf9f5	fix(core): handle Google 499 CANCELLED as client disconnect, not server error (#9363 ) The google.genai.errors.ClientError with code 499 (CANCELLED) indicates the client disconnected, not a server-side failure. Previously this fell through to the generic ClientError handler and was classified as LLMServerError, causing false 500s in Datadog error tracking. - Add explicit 499 handling in handle_llm_error: log at info level, return LLMConnectionError instead of LLMServerError - Catch 499 during stream iteration in stream_async and end gracefully instead of propagating the error Datadog: https://us5.datadoghq.com/error-tracking/issue/c8453aaa-d559-11f0-81c6-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	f20fdc73d1	fix(core): preserve Gemini thought_signature on function calls in non-streaming path (#9351 ) * fix(core): preserve Gemini thought_signature on function calls in non-streaming path The Google Gemini API requires thought_signature to be echoed back on function call parts in multi-turn conversations. In the non-streaming request path, the signature was only captured for subsequent function calls (else branch) but dropped for the first/only function call (if branch) in convert_response_to_chat_completion. This caused 400 INVALID_ARGUMENT errors on the next turn. Additionally, when no ReasoningContent existed to carry the signature (e.g. Gemini 2.5 Flash with include_thoughts=False), the signature was lost in the adapter layer. Now it falls through to TextContent. Datadog: https://us5.datadoghq.com/error-tracking/issue/17c4b114-d596-11f0-bcd6-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve Gemini thought_signature in non-temporal agent path Carry reasoning_content_signature on TextContent in letta_agent.py at both locations where content falls through from reasoning (same fix already applied to the adapter and temporal activity paths). Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	745dd1e124	fix(core): reject empty API keys in Bearer auth headers (#9350 ) Empty or None API keys resulted in "Bearer " header values which cause httpx.LocalProtocolError. Use truthiness checks instead of `is not None` to also reject empty strings before constructing Authorization headers. Datadog: https://us5.datadoghq.com/error-tracking/issue/ad3c1e38-d557-11f0-a65d-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	d48932bdb6	fix(core): sanitize Unicode surrogates in all LLM client requests (#9323 ) Multiple OpenAI-compatible LLM clients (Azure, Deepseek, Groq, Together, XAI, ZAI) and Anthropic-compatible clients (Anthropic, MiniMax, Google Vertex) were overriding request_async/stream_async without calling sanitize_unicode_surrogates, causing UnicodeEncodeError when message content contained lone UTF-16 surrogates. Root cause: Child classes override parent methods but omit the sanitization step that the base OpenAIClient includes. This allows corrupted Unicode (unpaired surrogates from malformed emoji) to reach the httpx layer, which rejects it during UTF-8 encoding. Fix: Import and call sanitize_unicode_surrogates in all overridden request methods. Also removed duplicate sanitize_unicode_surrogates definition from openai_client.py that shadowed the canonical implementation in letta.helpers.json_helpers. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> Issue-ID: 10c0f2e4-f87b-11f0-b91c-da7ad0900000	2026-02-24 10:52:06 -08:00
Kian Jones	662ec082cf	fix(core): handle MCP errors and API key whitespace (#9306 ) * fix: strip whitespace from API keys in LLM client headers Fixes httpx.LocalProtocolError when API keys contain leading/trailing whitespace. Strips whitespace from API keys before using them in HTTP headers across: - OpenAI client (openai.py) - Mistral client (mistral.py) - Anthropic client (anthropic_client.py) - Anthropic schema provider (schemas/providers/anthropic.py) - Google AI client (google_ai_client.py) - Proxy helpers (proxy_helpers.py) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle McpError gracefully in MCP client execute_tool Return error as failed result instead of re-raising to avoid Datadog alerts for expected user-facing errors like missing tool arguments. * fix: strip whitespace from API keys before passing to httpx client Fixes httpx.LocalProtocolError by stripping leading/trailing whitespace from API keys before passing them to OpenAI/AsyncOpenAI clients. The OpenAI client library constructs Authorization headers internally, and invalid header values (like keys with leading spaces) cause protocol errors. Applied fix to: - azure_client.py (AzureOpenAI/AsyncAzureOpenAI) - deepseek_client.py (OpenAI/AsyncOpenAI) - openai_client.py (OpenAI/AsyncOpenAI via kwargs) - xai_client.py (OpenAI/AsyncOpenAI) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle JSONDecodeError in OpenAI client requests Catches json.JSONDecodeError from OpenAI SDK when API returns invalid JSON (typically HTML error pages from 500-series errors) and converts to LLMServerError with helpful details. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): strip API key whitespace at schema level on write/create Add field_validator to ProviderCreate, ProviderUpdate, and ProviderCheck schemas to strip whitespace from api_key and access_key fields before persistence. This ensures keys are clean at the point of entry, preventing whitespace from being encrypted and stored in the database. Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> * refactor: remove api_key.strip() calls across all LLM clients Remove redundant .strip() calls on api_key parameters since pydantic models now handle whitespace trimming at the validation layer. This centralizes the validation logic and follows DRY principles. - Updated 13 files across multiple LLM client implementations - Removed 34 occurrences of api_key.strip() - Includes: OpenAI, Anthropic, Azure, Google AI, Groq, XAI, DeepSeek, ZAI, Together, Mistral - Also updated proxy helpers and provider schemas 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: remove redundant ternary operators from api_key parameters Remove `if api_key else None` ternaries since pydantic validation ensures api_key is either a valid string or None. The ternary was defensive programming that's now unnecessary with proper model-level validation. - Simplified 23 occurrences across 7 files - Cleaner, more concise client initialization code - No behavioral change since pydantic already handles this 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>	2026-02-24 10:52:06 -08:00
Kian Jones	3709be28dd	fix(core): handle Google GenAI validation errors (#9307 ) * fix: handle const keyword in google genai tool schemas * fix: handle pydantic ValidationError in Google GenAI client Fixes Datadog error tracking issue where pydantic_core.ValidationError was raised when tool schemas contained unsupported fields (e.g., 'const', 'default', 'additionalProperties'). Changes: - Add error handling for pydantic ValidationError in request(), request_async(), and stream_async() - Convert validation errors to LLMBadRequestError with helpful error message - Deep copy tool parameters before cleaning to avoid modifying shared objects - Add imports for pydantic_core and copy module This prevents unhandled exceptions and provides better diagnostics when tool schemas contain fields not supported by Google AI API.	2026-02-24 10:52:06 -08:00
Kian Jones	be60697a62	fix(core): handle protocol errors and foreign key violations (#9308 ) * fix(core): handle PermissionDeniedError in provider API key validation Fixed OpenAI PermissionDeniedError being raised as unknown error when validating provider API keys. The check_api_key methods in OpenAI-based providers (OpenAI, OpenRouter, Azure, Together) now properly catch and re-raise PermissionDeniedError as LLMPermissionDeniedError. 🐛 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle Unicode surrogates in OpenAI requests Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API. Fixes UnicodeEncodeError when message content contains unpaired surrogates from corrupted emoji data or malformed Unicode sequences. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle MCP tool schema validation errors gracefully Catch fastmcp.exceptions.ToolError in execute_mcp_tool endpoint and convert to LettaInvalidArgumentError (400) instead of letting it propagate as 500 error. This is an expected user error when tool arguments don't match the MCP tool's schema. Fixes Datadog issue 8f2d874a-f8e5-11f0-9b25-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle ExceptionGroup-wrapped ToolError in MCP executor When MCP tools fail with validation errors (e.g., missing required parameters), fastmcp raises ToolError exceptions that may be wrapped in ExceptionGroup by Python's async TaskGroup. The exception handler now unwraps single-exception groups before checking if the error should be handled gracefully. Fixes Calendly API "organization parameter missing" errors being logged to Datadog instead of returning friendly error messages to users. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle missing agent in create_conversation to prevent foreign key violation * Update .gitignore --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Ari Webb	85ee7ed7b4	fix: anthropic tool sanitation (#9310 )	2026-02-24 10:52:06 -08:00
Kian Jones	6f746c5225	fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305 ) * fix: handle Anthropic overloaded_error in streaming interfaces * fix: handle Unicode surrogates in OpenAI requests Sanitize Unicode surrogate pairs before sending requests to OpenAI API. Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause UnicodeEncodeError when encoding to UTF-8. Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in position 326605: surrogates not allowed * fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests Improved sanitize_unicode_surrogates() to explicitly filter out lone surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8. Previous implementation used errors='ignore' which could still fail in edge cases. New approach directly checks Unicode code points and removes any surrogates before data reaches httpx encoding. Also added sanitization to stream_async_responses() method which was missing it. Fixes: 'utf-8' codec can't encode character '\ud83c' in position X: surrogates not allowed	2026-02-24 10:52:06 -08:00
jnjpng	0bdedb3c0f	feat: agent generate endpoint (#9304 ) * base * update * clean up * update	2026-02-24 10:52:06 -08:00
Devansh Jain	644f7b9d5d	chore: Add Opus 4.6 with 1M context window [OPUS-46] (#9301 ) opus 4.6 1M version	2026-02-24 10:52:06 -08:00
Kevin Lin	34159ffa21	feat: add Anthropic Opus 4.6 model support (#9123 )	2026-02-24 10:52:06 -08:00
cthomas	09d7940090	fix: use string tool_choice for Groq and OpenRouter (#9267 ) Some providers (Groq, OpenRouter proxied providers) only support string values for tool_choice ("none", "auto", "required"), not the object format {"type": "function", "name": "..."}. When force_tool_call is set, convert to "required" instead of object format for these providers. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	203b6ead7c	fix: remove duplicate provider trace logging and dead code (#9278 ) Provider traces were being created twice per step: 1. Via `request_async_with_telemetry` / `log_provider_trace_async` in LLMClient 2. Via direct `create_provider_trace_async` calls in LettaAgent This caused duplicate records in provider_trace_metadata (Postgres) and llm_traces (ClickHouse) for every agent step. Changes: - Remove redundant direct `create_provider_trace_async` calls from letta_agent.py - Remove no-op `stream_async_with_telemetry` method (was just a pass-through to `stream_async`) - Update callers to use `stream_async` directly 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00

1 2 3 4 5 ...

405 Commits