letta-server

Author	SHA1	Message	Date
Ari Webb	21765d16c9	fix(core): add OpenAI 24h prompt cache retention for supported models (#9509 ) * fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com> * fix(core): only set prompt_cache_retention, drop prompt_cache_key Two issues with the original prompt_cache_key approach: 1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars) 2. Setting an explicit key disrupted OpenAI's default prefix-hash routing, dropping cache hit rates from 40-45% to 10-13% OpenAI's default routing (hash of first ~256 tokens) already provides good cache affinity since each agent has a unique system prompt. We only need prompt_cache_retention="24h" for extended retention. Also fixes: - Operator precedence bug in _supports_extended_prompt_cache_retention - Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Charles Packer <packercharles@gmail.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
Ari Webb	fa70e09963	Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention" (#9502 ) Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention …" This reverts commit f5bb9c629cb7d45544e90758cdfb899bcef41912.	2026-02-24 10:52:07 -08:00
Charles Packer	619e81ed1e	fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492 ) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	5faec5632f	fix: add m2.5 (#9480 ) * fix: add m2.5 * fix test	2026-02-24 10:52:07 -08:00
github-actions[bot]	0b08164cc2	fix: update system prompt metadata label to "System prompt last recompiled" (#9477 ) fix: update system prompt metadata label from "Memory blocks were last modified" to "System prompt last recompiled" When git-based memory is enabled, there are no memory blocks, so the label "Memory blocks were last modified" is inaccurate. Changed to "System prompt last recompiled" which accurately reflects the timestamp meaning. Fixes #9476 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	80f34f134d	fix(core): catch bare openai.APIError in handle_llm_error (#9468 ) * fix(core): catch bare openai.APIError in handle_llm_error fallthrough openai.APIError raised during streaming (e.g. OpenRouter credit exhaustion) is not an APIStatusError, so it skipped the catch-all at the end and fell through to LLMError("Unhandled"). Now bare APIErrors that aren't context window overflows are mapped to LLMBadRequestError. Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion Adds dedicated error type for insufficient credits/quota across all providers (OpenAI, Anthropic, Google). Returns HTTP 402 with BYOK-aware messaging instead of generic 400. - New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode - is_insufficient_credits_message() helper detecting credit/quota strings - All 3 provider clients detect 402 status + credit keywords - FastAPI handler returns 402 with "your API key" vs generic messaging - 5 new parametrized tests covering OpenRouter, OpenAI, and negative case 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	b9c4ed3b15	fix: catch contextwindowexceeded error on gemini (#9450 ) * catch contextwindowexceeded error * fix(core): detect Google token limit errors as ContextWindowExceededError Google's error message says "input token count exceeds the maximum number of tokens allowed" which doesn't contain the word "context", so it was falling through to generic LLMBadRequestError instead of ContextWindowExceededError. This means compaction won't auto-trigger. Expands the detection to also match "token count" and "tokens allowed" in addition to the existing "context" keyword. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add missing message arg to LLMBadRequestError in OpenAI client The generic 400 path in handle_llm_error was constructing LLMBadRequestError without the required message positional arg, causing TypeError in prod during summarization. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add adapters/ test suite to core unit test matrix 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(tests): update adapter error handling test expectations to match actual behavior The streaming adapter's error handling double-wraps errors: the AnthropicStreamingInterface calls handle_llm_error first, then the adapter catches the result and calls handle_llm_error again, which falls through to the base class LLMError. Updated test expectations to match this behavior. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prevent double-wrapping of LLMError in stream adapter The AnthropicStreamingInterface.process() already transforms raw provider errors into LLMError subtypes via handle_llm_error. 
The adapter was catching the result and calling handle_llm_error again, which didn't recognize the already-transformed LLMError and wrapped it in a generic LLMError("Unhandled LLM error"). This downgraded specific error types (LLMConnectionError, LLMServerError, etc.) and broke retry logic that matches on specific subtypes. Now the adapter checks if the error is already an LLMError and re-raises it as-is. Tests restored to original correct expectations. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	05073ba837	fix(core): preserve git-memory formatting and enforce lock conflicts (#9451 ) * test(core): strengthen git-memory system prompt stability integration coverage Switch git-memory HTTP integration tests to OpenAI model handles and add assertions that system prompt content remains stable after normal turns and direct block value updates until explicit recompilation or reset. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve git-memory formatting and enforce lock conflicts Preserve existing markdown frontmatter formatting on block updates while still ensuring required metadata fields exist, and make post-push git sync propagate memory-repo lock conflicts as 409 responses. Also enable slash-containing core-memory block labels in route params and add regression coverage. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): fail closed on memory repo lock contention Make memfs git commits fail closed when the per-agent Redis lock cannot be acquired, return 409 MEMORY_REPO_BUSY from the memfs files write API, and map that 409 back to core MemoryRepoBusyError so API callers receive consistent busy conflicts. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore(core): minimize git-memory fix scope to memfs lock and frontmatter paths 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: drop unrelated changes and keep memfs-focused scope Revert branch-only changes that are not required for the memfs lock contention and frontmatter-preservation fix so the PR contains only issue-relevant files. 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): lock push sync path and improve nested sync diagnostics Serialize memfs push-to-GCS sync with the same per-agent Redis lock key used by API commits, and add targeted post-push nested-block diagnostics plus a focused nested-label sync regression test for _sync_after_push. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	d7793a4474	fix(core): stabilize system prompt refresh and expand git-memory coverage (#9438 ) * fix(core): stabilize system prompt refresh and expand git-memory coverage Only rebuild system prompts on explicit refresh paths so normal turns preserve prefix-cache stability, including git/custom prompt layouts. Add integration coverage for memory filesystem tree structure and recompile/reset system-message updates via message-id retrieval. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): recompile system prompt around compaction and stabilize source tests Force system prompt refresh before/after compaction in LettaAgentV3 so repaired system+memory state is used and persisted across subsequent turns. Update source-system prompt tests to explicitly recompile before raw preview assertions instead of assuming automatic rebuild timing. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	d0e25ae471	feat: add glm 5 to core (#9436 ) * feat: add glm 5 to core * test glm 5	2026-02-24 10:52:07 -08:00
Kian Jones	02183efd5f	test: enable SQLAlchemy pooling in CI tests (#9279 ) * test: enable SQLAlchemy pooling in CI tests Changes CI test config to use LETTA_DISABLE_SQLALCHEMY_POOLING=false, enabling connection pooling to match production settings. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: remove hardcoded LETTA_DISABLE_SQLALCHEMY_POOLING fixture from conftest Remove the fixture that hardcoded the pooling setting in test code. The value should instead come from the CI workflow environment via vars.LETTA_DISABLE_SQLALCHEMY_POOLING (same source as production). 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	424a1ada64	fix: google gen ai format error fix (#9147 ) * google gen ai format error fix * fix(core): add $ref safety net, warning log, and unit tests for Google schema resolution - Add `$ref` to unsupported_keys in `_clean_google_ai_schema_properties` so unresolvable refs (e.g. `#/properties/...` style) are stripped as a safety net instead of crashing the Google SDK - Add warning log when `_resolve_json_schema_refs` encounters a ref it cannot resolve - Deduplicate the `#/$defs/` and `#/definitions/` resolution branches - Add 11 unit tests covering: single/multiple $defs, nested refs, refs in anyOf/allOf, array items, definitions key, unresolvable refs, and the full resolve+clean pipeline 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	39b25a0e3c	fix: update ContextWindowCalculator to parse new system message sections (#9398 ) * fix: update ContextWindowCalculator to parse new system message sections The context window calculator was using outdated position-based parsing that only handled 3 sections (base_instructions, memory_blocks, memory_metadata). The actual system message now includes additional sections that were not being tracked: - <memory_filesystem> (git-enabled agents) - <tool_usage_rules> (when tool rules configured) - <directories> (when sources attached) Changes: - Add _extract_tag_content() helper for proper XML tag extraction - Rewrite extract_system_components() to return a Dict with all 6 sections - Update calculate_context_window() to count tokens for new sections - Add new fields to ContextWindowOverview schema with backward-compatible defaults - Add unit tests for the extraction logic * update * generate * fix: check attached file in directories section instead of core_memory Files are rendered inside <directories> tags, not <memory_blocks>. Update validate_context_window_overview assertions accordingly. * fix: address review feedback for context window parser - Fix git-enabled agents regression: capture bare file blocks (e.g. <system/human.md>) rendered after </memory_filesystem> as core_memory via new _extract_git_core_memory() method - Make _extract_top_level_tag robust: scan all occurrences to find tag outside container, handling nested-first + top-level-later case - Document system_prompt tag inconsistency in docstring - Add TODO to base_agent.py extract_dynamic_section linking to ContextWindowCalculator to flag parallel parser tech debt - Add tests: git-enabled agent parsing, dual-occurrence tag extraction, pure text system prompt, git-enabled integration test	2026-02-24 10:52:07 -08:00
Kian Jones	7cc1cd3dc0	feat(ci): self-hosted provider test for lmstudio (#9404 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * prep for lmstudio and vllm * used lmstudio_openai client * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio * create lmstudio test * qwen3-14b on lmstudio * try with qwen3-4b * actually update the model config json to use qwen3-4b * add test_providers::test_lmstudio * bump timeout from 60 to 120 for slow lmstudio on cpu model * misc vllm changes --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	2ffef0fb31	Fix git-memory context preview parsing (#9414 ) * fix(core): handle git memory label prefix collisions in filesystem view Prevent context window preview crashes when a block label is both a leaf and a prefix (e.g. system/human and system/human/context) by rendering a node as both file and directory. Add regression test. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): parse git-backed core memory in context window preview ContextWindowCalculator.extract_system_components now detects git-backed memory rendering (<memory_filesystem> and <system/...> tags) when <memory_blocks> wrapper is absent, so core_memory is populated in the context preview. Add regression tests. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	0dde155e9a	feat: Prefix cache optimization system prompt (#9381 )	2026-02-24 10:52:07 -08:00
Kian Jones	7eb85707b1	feat(tf): gpu runners and prod memory_repos (#9283 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai_compat_base_url for ollama/vllm/lmstudio providers When reconstructing LLMConfig from a model handle lookup, use the provider's openai_compat_base_url (which includes /v1) instead of raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio since OpenAI client expects /v1/chat/completions endpoint. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add memfs-py in prod bucket access * change ollama * change packer model defaults * self-hosted provider support * disable reasoner to match the number of messages in test case, enable parallel tool calls, and pass embedding configs * remove reasoning setting not supported for ollama * add qwen3 to extra assistant message case * lower temp * prep for lmstudio and vllm * used lmstudio_openai client * skip parallel tool calls on cpu ran provider lmstudio * revert downgrade since it's so slow already * add required flags for tool call parsing etc. * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * upgrade vllm to latest container * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kevin Lin	23c94ec6d3	feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240 ) * Add log probabilities support for RL training This enables Letta server to request and return log probabilities from OpenAI-compatible providers (including SGLang) for use in RL training. Changes: - LLMConfig: Add return_logprobs and top_logprobs fields - OpenAIClient: Set logprobs in ChatCompletionRequest when enabled - LettaLLMAdapter: Add logprobs field and extract from response - LettaResponse: Add logprobs field to return log probs to client - LettaRequest: Add return_logprobs/top_logprobs for per-request override - LettaAgentV3: Store and pass logprobs through to response - agents.py: Handle request-level logprobs override Usage: response = client.agents.messages.create( agent_id=agent_id, messages=[...], return_logprobs=True, top_logprobs=5, ) print(response.logprobs) # Per-token log probabilities 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * Add multi-turn token tracking for RL training via SGLang native endpoint - Add TurnTokenData schema to track token IDs and logprobs per turn - Add return_token_ids flag to LettaRequest and LLMConfig - Create SGLangNativeClient for /generate endpoint (returns output_ids) - Create SGLangNativeAdapter that uses native endpoint - Modify LettaAgentV3 to accumulate turns across LLM calls - Include turns in LettaResponse when return_token_ids=True * Fix: Add SGLang native adapter to step() method, not just stream() * Fix: Handle Pydantic Message objects in SGLang native adapter * Fix: Remove api_key reference from LLMConfig (not present) * Fix: Add missing 'created' field to ChatCompletionResponse * Add full tool support to SGLang native adapter - Format tools into prompt in Qwen-style format - Parse tool calls from <tool_call> tags in response - Format tool results as <tool_response> in user messages - Set finish_reason to 'tool_calls' when tools are called * Use 
tokenizer.apply_chat_template for proper tool formatting - Add tokenizer caching in SGLang native adapter - Use apply_chat_template when tokenizer available - Fall back to manual formatting if not - Convert Letta messages to OpenAI format for tokenizer * Fix: Use func_response instead of tool_return for ToolReturn content * Fix: Get output_token_logprobs from meta_info in SGLang response * Fix: Allow None in output_token_logprobs (SGLang format includes null) * chore: remove unrelated files from logprobs branch 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing call_type param to adapter constructors in letta_agent_v3 The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting adapter creation into conditional blocks. Restores it for all 3 spots (SGLang in step, SimpleLLM in step, SGLang in stream). 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api * fix: update expected LLMConfig fields in schema test for logprobs support 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove rllm provider references 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Charles Packer	b0e16ae50f	fix: surface GPT-5.3 Codex for ChatGPT OAuth providers (#9379 )	2026-02-24 10:52:07 -08:00
Sarah Wooders	526da4c49b	Revert "perf: optimize prefix caching by skipping system prompt rebuild on every step" (#9380 ) Revert "perf: optimize prefix caching by skipping system prompt rebuild on ev…" This reverts commit eafa4144c2577a45b7007a177b701863b98d1dfa.	2026-02-24 10:52:07 -08:00
Sarah Wooders	9dbe28e8f1	perf: optimize prefix caching by skipping system prompt rebuild on every step (#9080 )	2026-02-24 10:52:07 -08:00
Kian Jones	b0c40b6b1d	fix: multi_agent flaky test (#9314 ) * fix(core): handle PermissionDeniedError in provider API key validation Fixed OpenAI PermissionDeniedError being raised as unknown error when validating provider API keys. The check_api_key methods in OpenAI-based providers (OpenAI, OpenRouter, Azure, Together) now properly catch and re-raise PermissionDeniedError as LLMPermissionDeniedError. 🐛 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle Unicode surrogates in OpenAI requests Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API. Fixes UnicodeEncodeError when message content contains unpaired surrogates from corrupted emoji data or malformed Unicode sequences. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * try to fix * revert random stuff * revert some stuff --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	21e880907f	feat(core): structure memory directory and block labels [LET-7336] (#9309 )	2026-02-24 10:52:06 -08:00
Kian Jones	6f746c5225	fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305 ) * fix: handle Anthropic overloaded_error in streaming interfaces * fix: handle Unicode surrogates in OpenAI requests Sanitize Unicode surrogate pairs before sending requests to OpenAI API. Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause UnicodeEncodeError when encoding to UTF-8. Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in position 326605: surrogates not allowed * fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests Improved sanitize_unicode_surrogates() to explicitly filter out lone surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8. Previous implementation used errors='ignore' which could still fail in edge cases. New approach directly checks Unicode code points and removes any surrogates before data reaches httpx encoding. Also added sanitization to stream_async_responses() method which was missing it. Fixes: 'utf-8' codec can't encode character '\ud83c' in position X: surrogates not allowed	2026-02-24 10:52:06 -08:00
Ari Webb	5c6ca705f1	Revert "feat: bring back use message packing for timezone [LET-6846]" (#9302 ) Revert "feat: bring back use message packing for timezone [LET-6846] (#9256)" This reverts commit c5017cccdef95b84fc585b26a0ddc5b7e44eb7c9.	2026-02-24 10:52:06 -08:00
jnjpng	ff69c6a32e	feat: add /agents/{agent_id}/generate endpoint for direct LLM requests (#9272 ) * feat: add /agents/{agent_id}/generate endpoint for direct LLM requests Add new endpoint that makes direct LLM provider requests without agent context, memory, tools, or state modification. This enables: - Quick LLM queries without agent overhead - Testing model configurations - Simple chat completions using agent's credentials - Comparing responses across different models Features: - Uses agent's LLM config by default - Supports model override with full provider config resolution - Non-streaming, stateless operation - Proper error handling and validation - Request/response schemas with Pydantic validation Implementation: - Add GenerateRequest and GenerateResponse schemas - Implement generate_completion endpoint handler - Add necessary imports (LLMError, LLMClient, HandleNotFoundError) - Include logging and comprehensive error handling * fix: improve error handling and fix Message construction - Fix critical bug: use content=[TextContent(text=...)] instead of text=... - Add explicit error handling for NoResultFound and HandleNotFoundError - Add error handling for convert_response_to_chat_completion - Add structured logging for debugging - Remove unnecessary .get() calls since Pydantic validates messages * refactor: extract generate logic to AgentCompletionService Move the generate endpoint business logic out of the endpoint handler into a dedicated AgentCompletionService class for better code organization and separation of concerns. 
Changes: - Create new AgentCompletionService in services/agent_completion_service.py - Service handles all business logic: agent validation, LLM config resolution, message conversion, LLM client creation, and request/response processing - Integrate service with SyncServer initialization - Refactor generate_completion endpoint to use the service - Endpoint now only handles HTTP concerns (auth, error mapping) Benefits: - Cleaner endpoint code (reduced from ~140 lines to ~25 lines) - Better separation of concerns (HTTP vs business logic) - Service logic can be reused or tested independently - Follows established patterns in the codebase (AgentManager, etc.) * feat: simplify generate API to accept just prompt text Simplify the client interface by accepting a simple prompt string instead of requiring clients to format messages. Changes: - Update GenerateRequest schema: - Replace 'messages' array with simple 'prompt' string - Add optional 'system_prompt' for context/instructions - Keep 'override_model' for model selection - Update AgentCompletionService to format messages automatically: - Accepts prompt and optional system_prompt - Constructs message array internally (system + user messages) - Simpler API surface for clients - Update endpoint documentation with new simplified examples - Regenerate OpenAPI spec and TypeScript SDK Benefits: - Much simpler client experience - just send text - No need to understand message formatting - Still supports system prompts for context - Cleaner API that matches common use cases Example (before): { "messages": [{"role": "user", "content": "What is 2+2?"}] } Example (after): { "prompt": "What is 2+2?" 
} * test: add comprehensive integration tests for generate endpoint Add 9 integration tests covering various scenarios: Happy path tests: - test_agent_generate_basic: Basic prompt -> response flow - test_agent_generate_with_system_prompt: System prompt + user prompt - test_agent_generate_with_model_override: Override model selection - test_agent_generate_long_prompt: Handle longer prompts - test_agent_generate_no_persistence: Verify no messages saved to agent Error handling tests: - test_agent_generate_empty_prompt_error: Empty prompt validation (422) - test_agent_generate_invalid_agent_id: Invalid agent ID (404) - test_agent_generate_invalid_model_override: Invalid model handle (404) All tests verify: - Response structure (content, model, usage) - Proper status codes for errors - Usage statistics (tokens, counts) - No side effects on agent state Tests follow existing test patterns in test_client.py and use the letta_client SDK (assuming generate_completion method is auto-generated from the OpenAPI spec). * openapi * refactor: rename AgentCompletionService to AgentGenerateCompletionManager Rename for better clarity and consistency with codebase naming conventions: - Rename file: agent_completion_service.py → agent_generate_completion_manager.py - Rename class: AgentCompletionService → AgentGenerateCompletionManager - Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager - Update docstrings: 'Service' → 'Manager' Changes: - apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class) - apps/core/letta/server/server.py (import + initialization) - apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint) No functional changes, purely a naming refactor. * fix: remove invalid Message parameters in generate manager Remove agent_id=None and user_id=None from Message construction. The Message model doesn't accept these as None values - only pass required parameters (role, content). 
Fixes validation error: 'Extra inputs are not permitted [type=extra_forbidden, input_value=None]' This aligns with other Message construction patterns in the codebase (see tools.py, memory.py examples). * feat: improve generate endpoint validation and tests - Add field validator for whitespace-only prompts - Always include system message (required by Anthropic) - Use default "You are a helpful assistant." when no system_prompt provided - Update tests to use direct HTTP calls via httpx - Fix test issues: - Use valid agent ID format (agent-{uuid}) - Use available model (openai/gpt-4o-mini) - Add whitespace validation test - All 9 integration tests passing	2026-02-24 10:52:06 -08:00
Ari Webb	426f6a8ca4	feat: bring back use message packing for timezone [LET-6846] (#9256 ) * feat: bring back use message packing for timezone * add tests	2026-02-24 10:52:06 -08:00
amysguan	16c96cc3c0	Fix sliding window cutoff logic (#9261 ) * fix sliding window cutoff calculations to use agent instead of summarizer config * allow approval messages with tool_calls as valid cutoffs, prevent approval pairs from being split * update tests with updated sliding window parameters --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	00b36bc591	fix: resolve crouton telemetry failures (#9269 ) Two issues were causing telemetry failures: 1. Startup race - memgpt-server sending telemetry before crouton created socket 2. Oversized payloads - large context windows (1M+ tokens) exceeding buffer Changes: - Increase crouton buffer to 128MB max with lazy allocation (64KB initial) - Bump crouton resources (512Mi limit, 128Mi request) - Add retry with exponential backoff in socket backend - Move crouton to initContainers with restartPolicy: Always for deterministic startup 🐙 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	eaf64fb510	fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258 ) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
jnjpng	f48b60634f	refactor: extract compact logic to shared function for temporal (#9249 ) * refactor: extract compact logic to shared function Extract the compaction logic from LettaAgentV3.compact() into a standalone compact_messages() function that can be shared between the agent and temporal workflows. Changes: - Create apps/core/letta/services/summarizer/compact.py with: - compact_messages(): Core compaction logic - build_summarizer_llm_config(): LLM config builder for summarization - CompactResult: Dataclass for compaction results - Update LettaAgentV3.compact() to use compact_messages() - Update temporal summarize_conversation_history activity to use compact_messages() instead of the old Summarizer class - Add use_summary_role parameter to SummarizeParams This ensures consistent summarization behavior across different execution paths and prevents drift as we improve the implementation. * chore: clean up verbose comments * fix: correct CompactionSettings import path * fix: correct count_tokens import from summarizer_sliding_window * fix: update test patch path for count_tokens_with_tools After extracting compact logic to compact.py, the test was patching the old location. Update the patch path to the new module location. * fix: update test to use build_summarizer_llm_config from compact.py The function was moved from LettaAgentV3._build_summarizer_llm_config to compact.py as a standalone function. * fix: add early check for system prompt size in compact_messages Check if the system prompt alone exceeds the context window before attempting summarization. The system prompt cannot be compacted, so fail fast with SystemPromptTokenExceededError. * fix: properly propagate SystemPromptTokenExceededError from compact The exception handler in _step() was not setting the correct stop_reason for SystemPromptTokenExceededError, which caused the finally block to return early and swallow the exception. 
Add special handling to set stop_reason to context_window_overflow_in_system_prompt when SystemPromptTokenExceededError is caught. * revert: remove redundant SystemPromptTokenExceededError handling The special handling in the outer exception handler is redundant because stop_reason is already set in the inner handler at line 943. The actual fix for the test was the early check in compact_messages(), not this redundant handling. * fix: correctly re-raise SystemPromptTokenExceededError The inner exception handler was using 'raise e' which re-raised the outer ContextWindowExceededError instead of the current SystemPromptTokenExceededError. Changed to 'raise' to correctly re-raise the current exception. This bug was pre-existing but masked because _check_for_system_prompt_overflow was only called as a fallback. The new early check in compact_messages() exposed it. * revert: remove early check and restore raise e to match main behavior * fix: set should_continue=False and correctly re-raise exception - Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior) - Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError Note: test_large_system_prompt_summarization still fails locally but passes on main. Need to investigate why exception isn't propagating correctly on refactored branch. * fix: add SystemPromptTokenExceededError handler for post-step compaction The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError exception handler. When compact_messages() raised this error, it would be caught by the outer exception handler which would: 1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt" 2. Not set should_continue = False 3. Get swallowed by the finally block (line 1126) which returns early This caused test_large_system_prompt_summarization to fail because the exception never propagated to the test. 
The fix adds the same exception handler pattern used in the retry compaction flow (line 941-946), ensuring proper state is set before re-raising. This issue only affected the refactored code because on main, _check_for_system_prompt_overflow() was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor, compact_messages() is a standalone function that cannot set instance state, so the caller must handle the exception and set the state.	2026-02-24 10:52:06 -08:00
Kian Jones	a206f7f345	feat: add ID format validation to agent and user schemas (#9151 ) * feat: add ID format validation to agent and user schemas Reuse existing validator types (ToolId, SourceId, BlockId, MessageId, IdentityId, UserId) from letta.validators to enforce ID format validation at the schema level. This ensures malformed IDs are rejected with a 422 validation error instead of causing 500 database errors. Changes: - CreateAgent: validate tool_ids, source_ids, folder_ids, block_ids, identity_ids - UpdateAgent: validate tool_ids, source_ids, folder_ids, block_ids, message_ids, identity_ids - UserUpdate: validate id 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API spec and SDK * fix: override ID validation in AgentSchema for agent file portability AgentSchema extends CreateAgent but needs to allow arbitrary short IDs (e.g., tool-0, block-0) for portable agent files. Override the validated ID fields to use plain List[str] instead of the validated types. Also fix test_agent.af to use proper UUID-format IDs. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API spec and SDK 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: revert test_agent.af - short IDs are valid for agent files 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix openapi schema --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	3fdf2b6c79	chore: deprecate old agent messaging (#9120 )	2026-02-24 10:52:06 -08:00
Sarah Wooders	4096b30cd7	feat: log LLM traces to clickhouse (#9111 ) * feat: add non-streaming option for conversation messages - Add ConversationMessageRequest with stream=True default (backwards compatible) - stream=true (default): SSE streaming via StreamingService - stream=false: JSON response via AgentLoop.load().step() 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API schema for ConversationMessageRequest * feat: add direct ClickHouse storage for raw LLM traces Adds ability to store raw LLM request/response payloads directly in ClickHouse, bypassing OTEL span attribute size limits. This enables debugging and analytics on large LLM payloads (>10MB system prompts, large tool schemas, etc.). New files: - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget) - letta/services/llm_raw_trace_reader.py: Reader with query methods - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL - apps/core/clickhouse-init.sql: Local dev initialization Modified: - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval) - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry - compose.yaml: Added ClickHouse service for local dev - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false). Uses ZSTD(3) compression for 10-30x reduction on JSON payloads. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: address code review feedback for LLM raw traces Fixes based on code review feedback: 1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port inputs (was defaulting to HTTPS which breaks local dev) 2. 
Make raw trace writes truly fire-and-forget - use asyncio.create_task() instead of awaiting, so JSON serialization doesn't block request path 3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth under load. Drops traces with warning if queue is full. 4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop() 5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it 6. Remove unused imports - json import in reader 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing asyncio import and simplify JSON serialization - Add missing 'import asyncio' that was causing 'name asyncio is not defined' error - Remove unnecessary clean_double_escapes() function - the JSON is stored correctly, the clickhouse-client CLI was just adding extra escaping when displaying - Update just clickhouse-trace to use Python client for correct JSON output 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add clickhouse raw trace integration test * test: simplify clickhouse trace assertions * refactor: centralize usage parsing and stream error traces Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: exercise provider usage parsing live Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present. 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: fix usage parsing tests to pass - Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient - Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta) - Add tools=[] for Gemini/Anthropic build_request_data 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: extract_usage_statistics returns LettaUsageStatistics Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add is_byok and llm_config_json columns to ClickHouse traces Extend llm_raw_traces table with: - is_byok (UInt8): Track BYOK vs base provider usage for billing analytics - llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis This enables queries like: - BYOK usage breakdown by provider/model - Config parameter analysis (temperature, max_tokens, etc.) 
- Debugging specific request configurations * feat: add tests for error traces, llm_config_json, and cache tokens - Update llm_raw_trace_reader.py to query new columns (is_byok, cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json) - Add test_error_trace_stored_in_clickhouse to verify error fields - Add test_cache_tokens_stored_for_anthropic to verify cache token storage - Update existing tests to verify llm_config_json is stored correctly - Make llm_config required in log_provider_trace_async() - Simplify provider extraction to use provider_name directly 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add ClickHouse integration tests to CI pipeline - Add use-clickhouse option to reusable-test-workflow.yml - Add ClickHouse service container with otel database - Add schema initialization step using clickhouse-init.sql - Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.) - Add separate clickhouse-integration-tests job running integration_test_clickhouse_llm_raw_traces.py 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify provider and org_id extraction in raw trace writer - Use model_endpoint_type.value for provider (not provider_name) - Simplify org_id to just self.actor.organization_id (actor is always pydantic) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLMRawTraceWriter with _enabled flag - Check ClickHouse env vars once at init, set _enabled flag - Early return in write_async/flush_async if not enabled - Remove ValueError raises (never used) - Simplify _get_client (no validation needed since already checked) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add LLMRawTraceWriter shutdown to FastAPI lifespan Properly flush pending traces on graceful shutdown via lifespan instead of relying only on atexit handler. 
🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add agent_tags column to ClickHouse traces Store agent tags as Array(String) for filtering/analytics by tag. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleanup * fix(ci): fix ClickHouse schema initialization in CI - Create database separately before loading SQL file - Remove CREATE DATABASE from SQL file (handled in CI step) - Add verification step to confirm table was created - Use -sf flag for curl to fail on HTTP errors 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLM trace writer with ClickHouse async_insert - Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop - Sync cloud DDL schema with clickhouse-init.sql (add missing columns) - Remove redundant llm_raw_traces_local.ddl - Remove unused batch_size/flush_interval settings - Update tests for simplified writer Key changes: - async_insert=1, wait_for_async_insert=1 for reliable server-side batching - Simple per-trace retry with exponential backoff (max 3 retries) - ~150 lines removed from writer 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: consolidate ClickHouse direct writes into TelemetryManager backend - Add clickhouse_direct backend to provider_trace_backends - Remove duplicate ClickHouse write logic from llm_client_base.py - Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct The clickhouse_direct backend: - Converts ProviderTrace to LLMRawTrace - Extracts usage stats from response JSON - Writes via LLMRawTraceWriter with async_insert 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: address PR review comments and fix llm_config bug Review comment fixes: - Rename clickhouse_direct -> clickhouse_analytics (clearer purpose) - 
Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml - Delete redundant scripts/test_llm_raw_traces.py (use pytest tests) - Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL) - Fix socket description leak in telemetry_manager docstring - Add cloud-only comment to clickhouse-init.sql - Update justfile to use separate compose file Bug fix: - Fix llm_config not being passed to ProviderTrace in telemetry - Now correctly populates provider, model, is_byok for all LLM calls - Affects both request_async_with_telemetry and log_provider_trace_async DDL optimizations: - Add secondary indexes (bloom_filter for agent_id, model, step_id) - Add minmax indexes for is_byok, is_error - Change model and error_type to LowCardinality for faster GROUP BY 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: rename llm_raw_traces -> llm_traces Address review feedback that "raw" is misleading since we denormalize fields. Renames: - Table: llm_raw_traces -> llm_traces - Schema: LLMRawTrace -> LLMTrace - Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py - Setting: store_llm_raw_traces -> store_llm_traces 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update workflow references to llm_traces Missed renaming table name in CI workflow files. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update clickhouse_direct -> clickhouse_analytics in docstring 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove inaccurate OTEL size limit comments The 4MB limit is our own truncation logic, not an OTEL protocol limit. The real benefit is denormalized columns for analytics queries. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove local ClickHouse dev setup (cloud-only feature) - Delete clickhouse-init.sql and compose.clickhouse.yaml - Remove local clickhouse just commands - Update CI to use cloud DDL with MergeTree for testing clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: restore compose.yaml to match main 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: merge clickhouse_analytics into clickhouse backend Per review feedback - having two separate backends was confusing. Now the clickhouse backend: - Writes to llm_traces table (denormalized for cost analytics) - Reads from OTEL traces table (will cut over to llm_traces later) Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: correct path to DDL file in CI workflow 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: add provider index to DDL for faster filtering 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: configure telemetry backend in clickhouse tests Tests need to set telemetry_settings.provider_trace_backends to include 'clickhouse', otherwise traces are routed to default postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: set provider_trace_backend field, not property provider_trace_backends is a computed property, need to set the underlying provider_trace_backend string field instead. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: error trace test and error_type extraction - Add TelemetryManager to error trace test so traces get written - Fix error_type extraction to check top-level before nested error dict 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use provider_trace.id for trace correlation across backends - Pass provider_trace.id to LLMTrace instead of auto-generating - Log warning if ID is missing (shouldn't happen, helps debug) - Fallback to new UUID only if not set 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: trace ID correlation and concurrency issues - Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse - Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe) - Fix Anthropic prompt_tokens to include cached tokens for cost analytics - Log warning if provider_trace.id is missing 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Caren Thomas <carenthomas@gmail.com>	2026-02-24 10:52:06 -08:00
jnjpng	24ea7dbaed	feat: include tools as part of token estimate in compact (#9242 ) * base * fix	2026-02-24 10:52:06 -08:00
Ari Webb	0bbb9c9bc0	feat: add reasoning zai openrouter (#9189 ) * feat: add reasoning zai openrouter * add openrouter reasoning * stage + publish api * openrouter reasoning always on * revert * fix * remove reference * do	2026-02-24 10:52:06 -08:00
Kian Jones	01cb00ae10	Revert "fix: truncate oversized text in embedding requests" (#9227 ) Revert "fix: truncate oversized text in embedding requests (#9196)" This reverts commit a9c342087e022519c63d62fb76b72aed8859539b.	2026-02-24 10:52:06 -08:00
Kian Jones	630c147b13	fix: truncate oversized text in embedding requests (#9196 ) fix: handle oversized text in embedding requests with recursive chunking When message text exceeds the embedding model's context length, recursively split it until all chunks can be embedded successfully. Changes: - `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting - `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries with splits on context length errors - `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer - `tpuf_client.py`: Deduplicate query results by `message_id` - `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError` - `tpuf_client.py`: Move LLMClient import to top of file - `openai_client.py`: Remove fixed truncation (chunking handles this now) - Add tests for `_split_text_in_half` and chunked query deduplication 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
jnjpng	3f23a23227	feat: add compaction stats (#9219 ) * base * update * last * generate * fix test	2026-02-24 10:52:06 -08:00
jnjpng	df2e666ced	test: skip deepwiki SSE MCP server test (#9218 ) The deepwiki SSE MCP server is deprecated, so skip this test. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
jnjpng	e25a0c9cdf	feat: update compact endpoint to store summary message (#9215 ) * base * add tests	2026-02-24 10:52:06 -08:00
jnjpng	d28ccc0be6	feat: add summary message and event on compaction (#9144 ) * base * update * update * revert formatting * routes * legacy * fix * review * update	2026-02-24 10:52:05 -08:00
cthomas	59ffaec8f4	fix: revert test comments (#9161 )	2026-01-29 12:44:04 -08:00
cthomas	d992aa0df4	fix: non-streaming conversation messages endpoint (#9159 ) * fix: non-streaming conversation messages endpoint Problems: 1. `AssertionError: run_id is required when enforce_run_id_set is True` - Non-streaming path didn't create a run before calling `step()` 2. `ResponseValidationError: Unable to extract tag using discriminator 'message_type'` - `response_model=LettaStreamingResponse` but non-streaming returns `LettaResponse` Fixes: 1. Add run creation before calling `step()` (mirrors agents endpoint) 2. Set run_id in Redis for cancellation support 3. Pass `run_id` to `step()` 4. Change `response_model` from `LettaStreamingResponse` to `LettaResponse` (streaming returns `StreamingResponse` which bypasses response_model validation) Test: Added `test_conversation_non_streaming_raw_http` to verify the fix. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * api sync --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Kian Jones	34eed72150	feat: add user id validation (#9128 ) * add user id validation * relax conversation id check to allow default while I'm here * fix annotation validation * -api changes	2026-01-29 12:44:04 -08:00
Kian Jones	0099a95a43	fix(sec): first pass of ensuring actor id is required everywhere (#9126 ) first pass of ensuring actor id is required	2026-01-29 12:44:04 -08:00
Sarah Wooders	b34ad43691	feat: add minimax byok to ui (#9101 ) * fix: patch minimax * feat: add frontend changes for minimax * add logo, fix backend * better check for is minimax * more references fixed for minimax * start reverting unnecessary changes * revert backend changes, just ui * fix minimax fully * fix test * add key to deploy action --------- Co-authored-by: Ari Webb <ari@letta.com> Co-authored-by: Ari Webb <arijwebb@gmail.com>	2026-01-29 12:44:04 -08:00
Sarah Wooders	fb69a96cd6	fix: patch minimax (#9099 )	2026-01-29 12:44:04 -08:00
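
Note on commit f48b60634f: its message describes changing `raise e` to a bare `raise` so that the SystemPromptTokenExceededError is propagated instead of the outer ContextWindowExceededError. The following is a minimal sketch of that Python behavior, not the Letta code itself. The two exception classes are simplified stand-ins that only borrow the names from the commit message, and `compact`, `step`, and `propagated_error` are hypothetical helpers introduced for illustration.

```python
class ContextWindowExceededError(Exception):
    """Stand-in for the outer error named in the commit message."""


class SystemPromptTokenExceededError(Exception):
    """Stand-in for the inner error named in the commit message."""


def compact() -> None:
    # Hypothetical compaction step that fails because the system prompt is too large.
    raise SystemPromptTokenExceededError("system prompt exceeds context window")


def step(use_named_raise: bool) -> None:
    try:
        raise ContextWindowExceededError("context window exceeded")
    except ContextWindowExceededError as e:
        try:
            compact()
        except SystemPromptTokenExceededError:
            if use_named_raise:
                # `e` is still bound to the OUTER exception here,
                # so this raises ContextWindowExceededError.
                raise e
            # A bare `raise` re-raises the exception currently being
            # handled, which is SystemPromptTokenExceededError.
            raise


def propagated_error(use_named_raise: bool) -> str:
    # Report the name of the exception type that escapes step().
    try:
        step(use_named_raise)
    except Exception as exc:
        return type(exc).__name__
    return "none"


print(propagated_error(True))
print(propagated_error(False))
```

With `use_named_raise=True` the caller sees the outer error type; with `use_named_raise=False` it sees the inner one, which is the behavior the commit message says the fix restores.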


2215 Commits