letta-server

Author	SHA1	Message	Date
Kian Jones	80f34f134d	fix(core): catch bare openai.APIError in handle_llm_error (#9468 ) * fix(core): catch bare openai.APIError in handle_llm_error fallthrough openai.APIError raised during streaming (e.g. OpenRouter credit exhaustion) is not an APIStatusError, so it skipped the catch-all at the end and fell through to LLMError("Unhandled"). Now bare APIErrors that aren't context window overflows are mapped to LLMBadRequestError. Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion Adds dedicated error type for insufficient credits/quota across all providers (OpenAI, Anthropic, Google). Returns HTTP 402 with BYOK-aware messaging instead of generic 400. - New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode - is_insufficient_credits_message() helper detecting credit/quota strings - All 3 provider clients detect 402 status + credit keywords - FastAPI handler returns 402 with "your API key" vs generic messaging - 5 new parametrized tests covering OpenRouter, OpenAI, and negative case 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	cfd2ca3102	fix: zai clear empty messages (#9466 )	2026-02-24 10:52:07 -08:00
Shubham Naik	6579f9b906	feat: add saveTemplateVersionNoProject endpoint (#9465 ) * feat: add saveTemplateVersionNoProject endpoint Added NoProject version of saveTemplateVersion endpoint: - Backend supports both URL patterns: - `/v1/templates/:project_id/:template_name` (with project in path) - `/v1/templates/:template_name` (NoProject, uses X-Project header) - Stainless surfaces only the cleaner NoProject version as `templates.save()` - NoProject route exported first for correct route matching Changes: - Added saveTemplateVersionNoProject contract and handler - Updated stainless.yml to map `save` method to NoProject endpoint - Follows same pattern as other NoProject endpoints (create, delete, rollback) 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * bump * bump --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Shubham Naik	565fd3c143	feat: add template rollback endpoint [LET-7423] (#9455 ) * feat: add template rollback endpoint [LET-7423] Adds POST /v1/templates/:template_name/rollback endpoint to restore templates to previous versions. Key features: - Rollback to any numbered version (1, 2, 3, etc.) or "latest" - Auto-saves unsaved changes before rollback to prevent data loss - Validates input (rejects "current"/"dev" as target versions) - Preserves entity IDs and relationships across rollback - Uses project context from X-Project header (no project_id in path) Implementation includes: - API contract in templatesContract.ts - Handler in templatesRouter.ts with comprehensive error handling - 9 E2E tests covering functionality and edge cases - Updated stainless.yml for SDK generation 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore test * fix: add X-Project header to rollback endpoint tests The rollback endpoint uses project context from X-Project header instead of URL path. Updated all rollback test calls to include the X-Project header with testProject value. This follows the no-project-in-path pattern for template endpoints. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: support both URL patterns for rollback endpoint Added dual URL pattern support for rollback endpoint: - `/v1/templates/:project_id/:template_name/rollback` (with project in path) - `/v1/templates/:template_name/rollback` (NoProject, uses X-Project header) Backend supports both patterns, but Stainless only exposes the cleaner NoProject version for SDKs. 
Key changes: - Fixed "rollback to latest" bug by resolving target version BEFORE auto-saving - NoProject route is exported first to ensure correct route matching order - Updated tests to use project_id in path for better compatibility - All 8 rollback tests passing 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * bump * bump * bump --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	778f28ccf3	fix: handle transient network errors in ChatGPT OAuth client (#9462 ) - Map httpx.ReadError/WriteError/ConnectError to LLMConnectionError in handle_llm_error so Temporal correctly classifies them as retryable (previously fell through to generic non-retryable LLMError) - Add client-level retry with exponential backoff (up to 3 attempts) on request_async and stream_async for transient transport errors - Stream retry is guarded by has_yielded flag to avoid corrupting partial responses already consumed by the caller	2026-02-24 10:52:07 -08:00
Kian Jones	4126fdadea	fix(core): preserve thought_signature on TextContent in Gemini streaming path (#9461 ) get_content() was only setting signature on ReasoningContent items. When Gemini returns a function call with thought_signature but no ReasoningContent (e.g. include_thoughts=False), the signature was stored on self.thinking_signature but never attached to TextContent. This caused "missing thought_signature in functionCall parts" errors when the message was echoed back to Gemini on the next turn. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	b9c4ed3b15	fix: catch contextwindowexceeded error on gemini (#9450 ) * catch contextwindowexceeded error * fix(core): detect Google token limit errors as ContextWindowExceededError Google's error message says "input token count exceeds the maximum number of tokens allowed" which doesn't contain the word "context", so it was falling through to generic LLMBadRequestError instead of ContextWindowExceededError. This means compaction won't auto-trigger. Expands the detection to also match "token count" and "tokens allowed" in addition to the existing "context" keyword. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add missing message arg to LLMBadRequestError in OpenAI client The generic 400 path in handle_llm_error was constructing LLMBadRequestError without the required message positional arg, causing TypeError in prod during summarization. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add adapters/ test suite to core unit test matrix 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(tests): update adapter error handling test expectations to match actual behavior The streaming adapter's error handling double-wraps errors: the AnthropicStreamingInterface calls handle_llm_error first, then the adapter catches the result and calls handle_llm_error again, which falls through to the base class LLMError. Updated test expectations to match this behavior. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prevent double-wrapping of LLMError in stream adapter The AnthropicStreamingInterface.process() already transforms raw provider errors into LLMError subtypes via handle_llm_error. 
The adapter was catching the result and calling handle_llm_error again, which didn't recognize the already-transformed LLMError and wrapped it in a generic LLMError("Unhandled LLM error"). This downgraded specific error types (LLMConnectionError, LLMServerError, etc.) and broke retry logic that matches on specific subtypes. Now the adapter checks if the error is already an LLMError and re-raises it as-is. Tests restored to original correct expectations. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	05073ba837	fix(core): preserve git-memory formatting and enforce lock conflicts (#9451 ) * test(core): strengthen git-memory system prompt stability integration coverage Switch git-memory HTTP integration tests to OpenAI model handles and add assertions that system prompt content remains stable after normal turns and direct block value updates until explicit recompilation or reset. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve git-memory formatting and enforce lock conflicts Preserve existing markdown frontmatter formatting on block updates while still ensuring required metadata fields exist, and make post-push git sync propagate memory-repo lock conflicts as 409 responses. Also enable slash-containing core-memory block labels in route params and add regression coverage. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): fail closed on memory repo lock contention Make memfs git commits fail closed when the per-agent Redis lock cannot be acquired, return 409 MEMORY_REPO_BUSY from the memfs files write API, and map that 409 back to core MemoryRepoBusyError so API callers receive consistent busy conflicts. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore(core): minimize git-memory fix scope to memfs lock and frontmatter paths 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: drop unrelated changes and keep memfs-focused scope Revert branch-only changes that are not required for the memfs lock contention and frontmatter-preservation fix so the PR contains only issue-relevant files. 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): lock push sync path and improve nested sync diagnostics Serialize memfs push-to-GCS sync with the same per-agent Redis lock key used by API commits, and add targeted post-push nested-block diagnostics plus a focused nested-label sync regression test for _sync_after_push. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	d7793a4474	fix(core): stabilize system prompt refresh and expand git-memory coverage (#9438 ) * fix(core): stabilize system prompt refresh and expand git-memory coverage Only rebuild system prompts on explicit refresh paths so normal turns preserve prefix-cache stability, including git/custom prompt layouts. Add integration coverage for memory filesystem tree structure and recompile/reset system-message updates via message-id retrieval. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): recompile system prompt around compaction and stabilize source tests Force system prompt refresh before/after compaction in LettaAgentV3 so repaired system+memory state is used and persisted across subsequent turns. Update source-system prompt tests to explicitly recompile before raw preview assertions instead of assuming automatic rebuild timing. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	5b7dd15905	fix(core): use BYOK API keys for Google AI/Vertex LLM requests (#9439 ) GoogleAIClient and GoogleVertexClient were hardcoding Letta's managed credentials for all requests, ignoring user-provided BYOK API keys. This meant Letta was paying Google API costs for BYOK users. Add _get_client_async and update _get_client to check BYOK overrides (via get_byok_overrides / get_byok_overrides_async) before falling back to managed credentials, matching the pattern used by OpenAIClient and AnthropicClient. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	d0e25ae471	feat: add glm 5 to core (#9436 ) * feat: add glm 5 to core * test glm 5	2026-02-24 10:52:07 -08:00
Kevin Lin	7ec2783ded	fix: increase SGLang provider default context window from 8k to 32k (#9435 )	2026-02-24 10:52:07 -08:00
Kian Jones	e3dbb44fc9	feat(telem): support reading from clickhouse traces (#9431 ) draft	2026-02-24 10:52:07 -08:00
Kian Jones	7c65fd77f1	fix(core): return 400 for Google GenAI ClientError bad requests (#9357 ) Google genai.errors.ClientError with code 400 was being caught and wrapped as LLMBadRequestError but returned to clients as 502 because no dedicated FastAPI exception handler existed for LLMBadRequestError. - Add LLMBadRequestError exception handler in app.py returning HTTP 400 - Fix ErrorCode on Google 400 bad requests from INTERNAL_SERVER_ERROR to INVALID_ARGUMENT - Route Google API errors through handle_llm_error in stream_async path Datadog: https://us5.datadoghq.com/error-tracking/issue/4eb3ff3c-d937-11f0-8177-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	02183efd5f	test: enable SQLAlchemy pooling in CI tests (#9279 ) * test: enable SQLAlchemy pooling in CI tests Changes CI test config to use LETTA_DISABLE_SQLALCHEMY_POOLING=false, enabling connection pooling to match production settings. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: remove hardcoded LETTA_DISABLE_SQLALCHEMY_POOLING fixture from conftest Remove the fixture that hardcoded the pooling setting in test code. The value should instead come from the CI workflow environment via vars.LETTA_DISABLE_SQLALCHEMY_POOLING (same source as production). 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	3634464251	fix(core): handle anyio.BrokenResourceError for client disconnects (#9358 ) Catch BrokenResourceError alongside ClosedResourceError in streaming response, logging middleware, and app exception handlers so client disconnects are logged at info level instead of surfacing as 500s. Datadog: https://us5.datadoghq.com/error-tracking/issue/4f57af0c-d558-11f0-a65d-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	0d42afa151	fix(core): catch LockNotAvailableError and return 409 instead of 500 (#9359 ) Re-apply changes on top of latest main to resolve merge conflicts. - Add DatabaseLockNotAvailableError custom exception in orm/errors.py - Catch asyncpg LockNotAvailableError and pgcode 55P03 in _handle_dbapi_error - Register FastAPI exception handler returning 409 with Retry-After header 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	411bb63990	fix(core): improve error handling for upstream LLM provider errors (#9423 ) Handle HTML error responses from ALB/load balancers in OpenAI client and add explicit InternalServerError handling for Anthropic upstream issues. 🐛 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	6d4e320cc3	chore: clean up dead code (#9427 ) remove straight-up unused code files	2026-02-24 10:52:07 -08:00
Kian Jones	2568c02b51	fix(core): strip quotes from MCP server header keys and values (#9349 ) * fix(core): strip quotes from MCP server header keys and values Users pasting JSON-formatted env vars into MCP server config end up with quoted header names like `"CONTEXT7_API_KEY":` which causes httpx.LocalProtocolError. Sanitize keys (strip surrounding quotes and trailing colons) and values (strip surrounding quotes) in resolve_custom_headers, resolve_environment_variables for HTTP configs, and stdio env dicts. Datadog: https://us5.datadoghq.com/error-tracking/issue/4a2f4af6-f2d8-11f0-930c-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: revert stdio env sanitization to pass-through The stdio path doesn't need header/env sanitization - that's only relevant for SSE/streamable HTTP servers with auth headers. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	ddcfeb26b1	fix(core): catch all MCP tool execution errors instead of re-raising (#9419 ) * fix(core): catch all MCP tool execution errors instead of re-raising MCP tools are external user-configured servers - any failure during tool execution is expected and should be returned as (error_msg, False) to the agent, not raised as an exception that hits Datadog as a 500. Previously: - base_client.py only caught McpError/ToolError, re-raised everything else - fastmcp_client.py (both SSE and StreamableHTTP) always re-raised Now all three execute_tool() methods catch all exceptions and return the error message to the agent conversation. The agent handles tool failures via the error message naturally. This silences ~15 Datadog issue types including: - fastmcp.exceptions.ToolError (validation, permissions) - mcp.shared.exceptions.McpError (connection closed, credentials) - httpx.HTTPStatusError (503 from Zapier, etc.) - httpx.ConnectError, ReadTimeout, RemoteProtocolError - requests.exceptions.ConnectionError - builtins.ConnectionError 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): log unexpected MCP errors at warning level with traceback Expected MCP errors (ToolError, McpError, httpx.*, ConnectionError, etc.) log at info level. Anything else (e.g. TypeError, AttributeError from our own code) logs at warning with exc_info=True so it still surfaces in Datadog without crashing the request. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	382e216cbb	fix(core): differentiate BYOK vs base provider in all LLM error details (#9425 ) Add is_byok flag to every LLMError's details dict returned from handle_llm_error across all providers (OpenAI, Anthropic, Google, ChatGPT OAuth). This enables observability into whether errors originate from Letta's production keys or user-provided BYOK keys. The rate limit handler in app.py now returns a more helpful message for BYOK users ("check your provider's rate limits and billing") versus the generic message for base provider rate limits. Datadog issues: - https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	424a1ada64	fix: google gen ai format error fix (#9147 ) * google gen ai format error fix * fix(core): add $ref safety net, warning log, and unit tests for Google schema resolution - Add `$ref` to unsupported_keys in `_clean_google_ai_schema_properties` so unresolvable refs (e.g. `#/properties/...` style) are stripped as a safety net instead of crashing the Google SDK - Add warning log when `_resolve_json_schema_refs` encounters a ref it cannot resolve - Deduplicate the `#/$defs/` and `#/definitions/` resolution branches - Add 11 unit tests covering: single/multiple $defs, nested refs, refs in anyOf/allOf, array items, definitions key, unresolvable refs, and the full resolve+clean pipeline 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	02f776b016	fix: handle system messages with mixed TextContent + ImageContent (#9418 ) * fix: handle system messages with mixed TextContent + ImageContent System messages injected by external tools (e.g. packify.ai MCP) can contain both TextContent and ImageContent. The assertions in to_openai_responses_dicts and to_anthropic_dict expected exactly one TextContent, causing AssertionError in production. Extract all text parts and join them, matching how to_openai_dict already handles this case. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: replace asserts with logger.warning + graceful skip Asserts are the wrong tool for production input validation — if a system message has only non-text content, we should warn and skip rather than crash the request. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	a00270d199	fix(core): handle UTF-8 surrogate characters in API responses (#9422 ) * fix(core): handle UTF-8 surrogate characters in API responses LLM responses or user input can contain surrogate characters (U+D800-U+DFFF) which are valid Python strings but illegal in UTF-8. ORJSONResponse rejects them with "str is not valid UTF-8: surrogates not allowed". Add SafeORJSONResponse that catches the TypeError and strips surrogates before retrying serialization. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: reuse sanitize_unicode_surrogates from json_helpers Replace the inline _sanitize_surrogates function with the existing sanitize_unicode_surrogates helper from letta.helpers.json_helpers, which is already used across all LLM clients. Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
jnjpng	39b25a0e3c	fix: update ContextWindowCalculator to parse new system message sections (#9398 ) * fix: update ContextWindowCalculator to parse new system message sections The context window calculator was using outdated position-based parsing that only handled 3 sections (base_instructions, memory_blocks, memory_metadata). The actual system message now includes additional sections that were not being tracked: - <memory_filesystem> (git-enabled agents) - <tool_usage_rules> (when tool rules configured) - <directories> (when sources attached) Changes: - Add _extract_tag_content() helper for proper XML tag extraction - Rewrite extract_system_components() to return a Dict with all 6 sections - Update calculate_context_window() to count tokens for new sections - Add new fields to ContextWindowOverview schema with backward-compatible defaults - Add unit tests for the extraction logic * update * generate * fix: check attached file in directories section instead of core_memory Files are rendered inside <directories> tags, not <memory_blocks>. Update validate_context_window_overview assertions accordingly. * fix: address review feedback for context window parser - Fix git-enabled agents regression: capture bare file blocks (e.g. <system/human.md>) rendered after </memory_filesystem> as core_memory via new _extract_git_core_memory() method - Make _extract_top_level_tag robust: scan all occurrences to find tag outside container, handling nested-first + top-level-later case - Document system_prompt tag inconsistency in docstring - Add TODO to base_agent.py extract_dynamic_section linking to ContextWindowCalculator to flag parallel parser tech debt - Add tests: git-enabled agent parsing, dual-occurrence tag extraction, pure text system prompt, git-enabled integration test	2026-02-24 10:52:07 -08:00
Kian Jones	7cc1cd3dc0	feat(ci): self-hosted provider test for lmstudio (#9404 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * prep for lmstudio and vllm * used lmstudio_openai client * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio * create lmstudio test * qwen3-14b on lmstudio * try with qwen3-4b * actually update the model config json to use qwen3-4b * add test_providers::test_lmstudio * bump timeout from 60 to 120 for slow lmstudio on cpu model * misc vllm changes --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	2ffef0fb31	Fix git-memory context preview parsing (#9414 ) * fix(core): handle git memory label prefix collisions in filesystem view Prevent context window preview crashes when a block label is both a leaf and a prefix (e.g. system/human and system/human/context) by rendering a node as both file and directory. Add regression test. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): parse git-backed core memory in context window preview ContextWindowCalculator.extract_system_components now detects git-backed memory rendering (<memory_filesystem> and <system/...> tags) when <memory_blocks> wrapper is absent, so core_memory is populated in the context preview. Add regression tests. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Shubham Naik	ca32311b9a	feat: allow users to specify via query to strip messages [LET-7392] (#9411 ) * feat: allow users to specify via query to strip messages * chore: regenerate API SDK and OpenAPI spec [LET-7392] 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Ari Webb <AriWebb@users.noreply.github.com> Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Ari Webb <AriWebb@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	6f51fa74be	fix: handling should continue for non system exceeded exceptions (#9406 ) * base * add log	2026-02-24 10:52:07 -08:00
Sarah Wooders	0dde155e9a	feat: Prefix cache optimization system prompt (#9381 )	2026-02-24 10:52:07 -08:00
Kian Jones	7eb85707b1	feat(tf): gpu runners and prod memory_repos (#9283 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai_compat_base_url for ollama/vllm/lmstudio providers When reconstructing LLMConfig from a model handle lookup, use the provider's openai_compat_base_url (which includes /v1) instead of raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio since OpenAI client expects /v1/chat/completions endpoint. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add memfs-py in prod bucket access * change ollama * change packer model defaults * self-hosted provider support * disable reasoner to match the number of messages in test case, enable parallel tool calls, and pass embedding configs * remove reasoning setting not supported for ollama * add qwen3 to extra assistant message case * lower temp * prep for lmstudio and vllm * used lmstudio_openai client * skip parallel tool calls on cpu ran provider lmstudio * revert downgrade since it's so slow already * add required flags for tool call parsing etc. * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * upgrade vllm to latest container * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kevin Lin	23c94ec6d3	feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240 ) * Add log probabilities support for RL training This enables Letta server to request and return log probabilities from OpenAI-compatible providers (including SGLang) for use in RL training. Changes: - LLMConfig: Add return_logprobs and top_logprobs fields - OpenAIClient: Set logprobs in ChatCompletionRequest when enabled - LettaLLMAdapter: Add logprobs field and extract from response - LettaResponse: Add logprobs field to return log probs to client - LettaRequest: Add return_logprobs/top_logprobs for per-request override - LettaAgentV3: Store and pass logprobs through to response - agents.py: Handle request-level logprobs override Usage: response = client.agents.messages.create( agent_id=agent_id, messages=[...], return_logprobs=True, top_logprobs=5, ) print(response.logprobs) # Per-token log probabilities 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * Add multi-turn token tracking for RL training via SGLang native endpoint - Add TurnTokenData schema to track token IDs and logprobs per turn - Add return_token_ids flag to LettaRequest and LLMConfig - Create SGLangNativeClient for /generate endpoint (returns output_ids) - Create SGLangNativeAdapter that uses native endpoint - Modify LettaAgentV3 to accumulate turns across LLM calls - Include turns in LettaResponse when return_token_ids=True * Fix: Add SGLang native adapter to step() method, not just stream() * Fix: Handle Pydantic Message objects in SGLang native adapter * Fix: Remove api_key reference from LLMConfig (not present) * Fix: Add missing 'created' field to ChatCompletionResponse * Add full tool support to SGLang native adapter - Format tools into prompt in Qwen-style format - Parse tool calls from <tool_call> tags in response - Format tool results as <tool_response> in user messages - Set finish_reason to 'tool_calls' when tools are called * Use tokenizer.apply_chat_template for proper tool formatting - Add tokenizer caching in SGLang native adapter - Use apply_chat_template when tokenizer available - Fall back to manual formatting if not - Convert Letta messages to OpenAI format for tokenizer * Fix: Use func_response instead of tool_return for ToolReturn content * Fix: Get output_token_logprobs from meta_info in SGLang response * Fix: Allow None in output_token_logprobs (SGLang format includes null) * chore: remove unrelated files from logprobs branch 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing call_type param to adapter constructors in letta_agent_v3 The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting adapter creation into conditional blocks. Restores it for all 3 spots (SGLang in step, SimpleLLM in step, SGLang in stream). 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api * fix: update expected LLMConfig fields in schema test for logprobs support 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove rllm provider references 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	f9f1c55c93	fix: fix context preview for git (#9403 )	2026-02-24 10:52:07 -08:00
Sarah Wooders	bbc648909b	refactor: drop memory/ prefix from git memory repo file paths and update core memory rendering [LET-7356] (#9395 )	2026-02-24 10:52:07 -08:00
Ari Webb	5fd5a6dd07	feat: add new azure api maintaining backward compat (#9387 ) * feat: add new azure provider type * fix context window	2026-02-24 10:52:07 -08:00
jnjpng	226df8baef	fix: propagate context window exceeded from chatgpt oauth client (#9393 ) * base * clean up * fixes	2026-02-24 10:52:07 -08:00
Ari Webb	c08b67a26a	feat: add ToolReturnCreate to MessageCreateParams [LET-7366] (#9385 ) * fix: add ToolReturnCreate to sdk types * ci	2026-02-24 10:52:07 -08:00
Cameron	89a7a12b38	fix(core): remove send_message tool requirement from A2A messages (#9383 ) The A2A messaging tools were instructing receiving agents to use the send_message tool to reply, but that tool is often not attached to agents anymore. This caused agents confusion when they couldn't find the required tool. For synchronous functions (send_message_to_agent_and_wait_for_reply, send_message_to_agents_matching_tags, send_message_to_all_agents_in_group), the system already captures AssistantMessage automatically, so agents just need to respond normally. For the async/fire-and-forget function (send_message_to_agent_async), updated to indicate it's a one-way notification and hint that messaging tools exist without requiring a specific one. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	6e0e1cc312	fix(core): validate run exists before creating step/step_metrics (#9382 ) Checks if the referenced run_id exists in the runs table before inserting steps and step_metrics. If the run doesn't exist (deleted or failed creation), sets run_id to None instead of hitting ForeignKeyViolationError on fk_steps_run_id. Fixes https://us5.datadoghq.com/error-tracking/issue/a1768774-d691-11f0-9330-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	4c753f3f3c	fix: handle non-JSON responses from LLM provider endpoints (#9362 ) When an OpenAI/Anthropic-compatible endpoint returns a non-JSON response (e.g. HTML error page), the SDK's paginated response parser falls back to returning a raw string. The post-parser then calls _set_private_attributes() on that string, causing an AttributeError. Add explicit AttributeError handling around SDK models.list() calls in provider check_api_key/list_llm_models_async methods, and add type guards in convert_response_to_chat_completion to reject raw strings before Pydantic model construction. Datadog: https://us5.datadoghq.com/error-tracking/issue/59a7a206-00b8-11f1-be73-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	71e0a8aab9	fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync (#9342 ) * fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync Replaces try/except around model.create_async() with pg_insert() .on_conflict_do_nothing() to prevent UniqueViolationError from being raised at the asyncpg driver level during concurrent model syncs. The previous approach caught the exception in Python but ddtrace still captured it at the driver level, causing Datadog error tracking noise. Fixes Datadog issue d8dec148-d535-11f0-95eb-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleaner impl * fix --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
Charles Packer	b0e16ae50f	fix: surface GPT-5.3 Codex for ChatGPT OAuth providers (#9379 )	2026-02-24 10:52:07 -08:00
Sarah Wooders	526da4c49b	Revert "perf: optimize prefix caching by skipping system prompt rebuild on every step" (#9380 ) Revert "perf: optimize prefix caching by skipping system prompt rebuild on ev…" This reverts commit eafa4144c2577a45b7007a177b701863b98d1dfa.	2026-02-24 10:52:07 -08:00
Sarah Wooders	9dbe28e8f1	perf: optimize prefix caching by skipping system prompt rebuild on every step (#9080 )	2026-02-24 10:52:07 -08:00
Kian Jones	825019c2ce	fix(core): handle Anthropic streaming required ValueError (#9344 ) * Fix Anthropic ValueError for long-running operations Adds proper error handling for Anthropic SDK's streaming requirement. When operations may exceed 10 minutes, the SDK raises a ValueError. Changes: - Catch ValueError in sync request() method - Provide user-friendly error directing to async API - Async version already had this fix with streaming fallback Fixes Datadog issue 955d10b4-ed95-11f0-a5a5-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use LLMBadRequestError instead of ValueError for Anthropic streaming constraint ValueError maps to HTTP 400 which incorrectly implies a bad client request. LLMBadRequestError maps to HTTP 502 (Bad Gateway) which correctly signals that the downstream provider (Anthropic) rejected the proxied request due to its own constraints. Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	14ef479e70	fix(core): handle empty content in Anthropic response gracefully (#9345 ) Fixes Datadog issue a47619fa-d5b8-11f0-9fd7-da7ad0900000 Handle empty content in Anthropic responses gracefully by replacing RuntimeError with LLMServerError. Now logs detailed debugging information (response ID, model, stop_reason) and returns a user-friendly error instead of crashing. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	4eb27e23b3	fix(core): add deadlock retry logic to ORM write operations (#9352 ) Adds automatic retry with exponential backoff for PostgreSQL deadlock errors (40P01) in all ORM write methods: create_async, update_async, batch_create_async, hard_delete_async, and bulk_hard_delete_async. For update_async, column values are snapshotted before the commit attempt so they can be restored after rollback clears them. Also adds DatabaseDeadlockError to _handle_dbapi_error as a fallback when retries are exhausted. Datadog: https://us5.datadoghq.com/error-tracking/issue/53ccdd7a-f0cc-11f0-8969-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	2c0cddf9f5	fix(core): handle Google 499 CANCELLED as client disconnect, not server error (#9363 ) The google.genai.errors.ClientError with code 499 (CANCELLED) indicates the client disconnected, not a server-side failure. Previously this fell through to the generic ClientError handler and was classified as LLMServerError, causing false 500s in Datadog error tracking. - Add explicit 499 handling in handle_llm_error: log at info level, return LLMConnectionError instead of LLMServerError - Catch 499 during stream iteration in stream_async and end gracefully instead of propagating the error Datadog: https://us5.datadoghq.com/error-tracking/issue/c8453aaa-d559-11f0-81c6-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	f20fdc73d1	fix(core): preserve Gemini thought_signature on function calls in non-streaming path (#9351 ) * fix(core): preserve Gemini thought_signature on function calls in non-streaming path The Google Gemini API requires thought_signature to be echoed back on function call parts in multi-turn conversations. In the non-streaming request path, the signature was only captured for subsequent function calls (else branch) but dropped for the first/only function call (if branch) in convert_response_to_chat_completion. This caused 400 INVALID_ARGUMENT errors on the next turn. Additionally, when no ReasoningContent existed to carry the signature (e.g. Gemini 2.5 Flash with include_thoughts=False), the signature was lost in the adapter layer. Now it falls through to TextContent. Datadog: https://us5.datadoghq.com/error-tracking/issue/17c4b114-d596-11f0-bcd6-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve Gemini thought_signature in non-temporal agent path Carry reasoning_content_signature on TextContent in letta_agent.py at both locations where content falls through from reasoning (same fix already applied to the adapter and temporal activity paths). Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
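
Commit 4eb27e23b3 above describes retrying ORM writes with exponential backoff when PostgreSQL reports a deadlock (SQLSTATE 40P01). The following is a minimal sketch of that general pattern, not the repository's implementation: `DeadlockDetected`, `retry_on_deadlock`, and the `max_retries` and `base_delay` parameters are hypothetical names chosen for illustration, and the real code presumably inspects the driver's error rather than a custom exception.

```python
import asyncio
import random


class DeadlockDetected(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40P01."""

    sqlstate = "40P01"


async def retry_on_deadlock(op, max_retries=3, base_delay=0.01):
    """Await the coroutine function `op`, retrying on deadlock.

    The delay doubles after each failed attempt, with a little random
    jitter so that competing writers are less likely to collide again.
    Once `max_retries` retries are used up, the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return await op()
        except DeadlockDetected:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller map the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```

A caller would wrap a single write, for example `await retry_on_deadlock(lambda: session_write())`, where `session_write` is whatever coroutine function performs the commit. Any state that a rollback clears has to be restored inside `op` before the next attempt, which is the concern the commit message raises for `update_async`.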

7178 Commits