Commit Graph

2210 Commits

Author SHA1 Message Date
Ari Webb
5faec5632f fix: add m2.5 (#9480)
* fix: add m2.5

* fix test
2026-02-24 10:52:07 -08:00
github-actions[bot]
0b08164cc2 fix: update system prompt metadata label to "System prompt last recompiled" (#9477)
fix: update system prompt metadata label from "Memory blocks were last modified" to "System prompt last recompiled"

When git-based memory is enabled, there are no memory blocks, so the label
"Memory blocks were last modified" is inaccurate. Changed to
"System prompt last recompiled" which accurately reflects the timestamp meaning.

Fixes #9476



🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
80f34f134d fix(core): catch bare openai.APIError in handle_llm_error (#9468)
* fix(core): catch bare openai.APIError in handle_llm_error fallthrough

openai.APIError raised during streaming (e.g. OpenRouter credit
exhaustion) is not an APIStatusError, so it skipped the catch-all
at the end and fell through to LLMError("Unhandled"). Now bare
APIErrors that aren't context window overflows are mapped to
LLMBadRequestError.

Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion

Adds dedicated error type for insufficient credits/quota across all
providers (OpenAI, Anthropic, Google). Returns HTTP 402 with
BYOK-aware messaging instead of generic 400.

- New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode
- is_insufficient_credits_message() helper detecting credit/quota strings
- All 3 provider clients detect 402 status + credit keywords
- FastAPI handler returns 402 with "your API key" vs generic messaging
- 5 new parametrized tests covering OpenRouter, OpenAI, and negative case

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
b9c4ed3b15 fix: catch contextwindowexceeded error on gemini (#9450)
* catch contextwindowexceeded error

* fix(core): detect Google token limit errors as ContextWindowExceededError

Google's error message says "input token count exceeds the maximum
number of tokens allowed" which doesn't contain the word "context",
so it was falling through to generic LLMBadRequestError instead of
ContextWindowExceededError. This means compaction won't auto-trigger.

Expands the detection to also match "token count" and "tokens allowed"
in addition to the existing "context" keyword.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): add missing message arg to LLMBadRequestError in OpenAI client

The generic 400 path in handle_llm_error was constructing
LLMBadRequestError without the required message positional arg,
causing TypeError in prod during summarization.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add adapters/ test suite to core unit test matrix

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(tests): update adapter error handling test expectations to match actual behavior

The streaming adapter's error handling double-wraps errors: the
AnthropicStreamingInterface calls handle_llm_error first, then the
adapter catches the result and calls handle_llm_error again, which
falls through to the base class LLMError. Updated test expectations
to match this behavior.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prevent double-wrapping of LLMError in stream adapter

The AnthropicStreamingInterface.process() already transforms raw
provider errors into LLMError subtypes via handle_llm_error. The
adapter was catching the result and calling handle_llm_error again,
which didn't recognize the already-transformed LLMError and wrapped
it in a generic LLMError("Unhandled LLM error"). This downgraded
specific error types (LLMConnectionError, LLMServerError, etc.)
and broke retry logic that matches on specific subtypes.

Now the adapter checks if the error is already an LLMError and
re-raises it as-is. Tests restored to original correct expectations.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
05073ba837 fix(core): preserve git-memory formatting and enforce lock conflicts (#9451)
* test(core): strengthen git-memory system prompt stability integration coverage

Switch git-memory HTTP integration tests to OpenAI model handles and add assertions that system prompt content remains stable after normal turns and direct block value updates until explicit recompilation or reset.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): preserve git-memory formatting and enforce lock conflicts

Preserve existing markdown frontmatter formatting on block updates while still ensuring required metadata fields exist, and make post-push git sync propagate memory-repo lock conflicts as 409 responses. Also enable slash-containing core-memory block labels in route params and add regression coverage.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(memfs): fail closed on memory repo lock contention

Make memfs git commits fail closed when the per-agent Redis lock cannot be acquired, return 409 MEMORY_REPO_BUSY from the memfs files write API, and map that 409 back to core MemoryRepoBusyError so API callers receive consistent busy conflicts.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore(core): minimize git-memory fix scope to memfs lock and frontmatter paths

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: drop unrelated changes and keep memfs-focused scope

Revert branch-only changes that are not required for the memfs lock contention and frontmatter-preservation fix so the PR contains only issue-relevant files.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(memfs): lock push sync path and improve nested sync diagnostics

Serialize memfs push-to-GCS sync with the same per-agent Redis lock key used by API commits, and add targeted post-push nested-block diagnostics plus a focused nested-label sync regression test for _sync_after_push.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
d7793a4474 fix(core): stabilize system prompt refresh and expand git-memory coverage (#9438)
* fix(core): stabilize system prompt refresh and expand git-memory coverage

Only rebuild system prompts on explicit refresh paths so normal turns preserve prefix-cache stability, including git/custom prompt layouts. Add integration coverage for memory filesystem tree structure and recompile/reset system-message updates via message-id retrieval.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): recompile system prompt around compaction and stabilize source tests

Force system prompt refresh before/after compaction in LettaAgentV3 so repaired system+memory state is used and persisted across subsequent turns. Update source-system prompt tests to explicitly recompile before raw preview assertions instead of assuming automatic rebuild timing.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
d0e25ae471 feat: add glm 5 to core (#9436)
* feat: add glm 5 to core

* test glm 5
2026-02-24 10:52:07 -08:00
Kian Jones
02183efd5f test: enable SQLAlchemy pooling in CI tests (#9279)
* test: enable SQLAlchemy pooling in CI tests

Changes CI test config to use LETTA_DISABLE_SQLALCHEMY_POOLING=false,
enabling connection pooling to match production settings.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: remove hardcoded LETTA_DISABLE_SQLALCHEMY_POOLING fixture from conftest

Remove the fixture that hardcoded the pooling setting in test code.
The value should instead come from the CI workflow environment via
vars.LETTA_DISABLE_SQLALCHEMY_POOLING (same source as production).

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
2026-02-24 10:52:07 -08:00
Kian Jones
424a1ada64 fix: google gen ai format error fix (#9147)
* google gen ai format error fix

* fix(core): add $ref safety net, warning log, and unit tests for Google schema resolution

- Add `$ref` to unsupported_keys in `_clean_google_ai_schema_properties` so unresolvable refs (e.g. `#/properties/...` style) are stripped as a safety net instead of crashing the Google SDK
- Add warning log when `_resolve_json_schema_refs` encounters a ref it cannot resolve
- Deduplicate the `#/$defs/` and `#/definitions/` resolution branches
- Add 11 unit tests covering: single/multiple $defs, nested refs, refs in anyOf/allOf, array items, definitions key, unresolvable refs, and the full resolve+clean pipeline

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
jnjpng
39b25a0e3c fix: update ContextWindowCalculator to parse new system message sections (#9398)
* fix: update ContextWindowCalculator to parse new system message sections

The context window calculator was using outdated position-based parsing
that only handled 3 sections (base_instructions, memory_blocks, memory_metadata).
The actual system message now includes additional sections that were not
being tracked:

- <memory_filesystem> (git-enabled agents)
- <tool_usage_rules> (when tool rules configured)
- <directories> (when sources attached)

Changes:
- Add _extract_tag_content() helper for proper XML tag extraction
- Rewrite extract_system_components() to return a Dict with all 6 sections
- Update calculate_context_window() to count tokens for new sections
- Add new fields to ContextWindowOverview schema with backward-compatible defaults
- Add unit tests for the extraction logic

* update

* generate

* fix: check attached file in directories section instead of core_memory

Files are rendered inside <directories> tags, not <memory_blocks>.
Update validate_context_window_overview assertions accordingly.

* fix: address review feedback for context window parser

- Fix git-enabled agents regression: capture bare file blocks
  (e.g. <system/human.md>) rendered after </memory_filesystem> as
  core_memory via new _extract_git_core_memory() method
- Make _extract_top_level_tag robust: scan all occurrences to find
  tag outside container, handling nested-first + top-level-later case
- Document system_prompt tag inconsistency in docstring
- Add TODO to base_agent.py extract_dynamic_section linking to
  ContextWindowCalculator to flag parallel parser tech debt
- Add tests: git-enabled agent parsing, dual-occurrence tag
  extraction, pure text system prompt, git-enabled integration test
2026-02-24 10:52:07 -08:00
Kian Jones
7cc1cd3dc0 feat(ci): self-hosted provider test for lmstudio (#9404)
* add gpu runners and prod memory_repos

* add lmstudio and vllm in model_settings

* fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf

* fix: update self-hosted provider tests to use SDK 1.0 and v2 tests

- Update letta-client from ==0.1.324 to >=1.0.0
- Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use openai provider_type for self-hosted model settings

ollama/vllm/lmstudio are not valid provider_type values in the SDK
model_settings schema - they use openai-compatible APIs so provider_type
should be openai. The provider routing is determined by the handle prefix.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: enable redis for ollama/vllm/lmstudio tests

Background streaming tests require Redis. Add use-redis: true to
self-hosted provider test workflows.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* prep for lmstudio and vllm

* used lmstudio_openai client

* change tool call parser from hermes to qwen3_xml

* qwen3_xmlk -> qwen3_coder

* revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls

* install uv redis extra

* remove lmstudio

* create lmstudio test

* qwen3-14b on lmstudio

* try with qwen3-4b

* actually update the model config json to use qwen3-4b

* add test_providers::test_lmstudio

* bump timeout from 60 to 120 for slow lmstudio on cpu model

* misc vllm changes

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
2ffef0fb31 Fix git-memory context preview parsing (#9414)
* fix(core): handle git memory label prefix collisions in filesystem view

Prevent context window preview crashes when a block label is both a leaf and a prefix (e.g. system/human and system/human/context) by rendering a node as both file and directory. Add regression test.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): parse git-backed core memory in context window preview

ContextWindowCalculator.extract_system_components now detects git-backed memory rendering (<memory_filesystem> and <system/...> tags) when <memory_blocks> wrapper is absent, so core_memory is populated in the context preview. Add regression tests.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
0dde155e9a feat: Prefix cache optimization system prompt (#9381) 2026-02-24 10:52:07 -08:00
Kian Jones
7eb85707b1 feat(tf): gpu runners and prod memory_repos (#9283)
* add gpu runners and prod memory_repos

* add lmstudio and vllm in model_settings

* fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf

* fix: update self-hosted provider tests to use SDK 1.0 and v2 tests

- Update letta-client from ==0.1.324 to >=1.0.0
- Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use openai provider_type for self-hosted model settings

ollama/vllm/lmstudio are not valid provider_type values in the SDK
model_settings schema - they use openai-compatible APIs so provider_type
should be openai. The provider routing is determined by the handle prefix.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use openai_compat_base_url for ollama/vllm/lmstudio providers

When reconstructing LLMConfig from a model handle lookup, use the
provider's openai_compat_base_url (which includes /v1) instead of
raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio
since OpenAI client expects /v1/chat/completions endpoint.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: enable redis for ollama/vllm/lmstudio tests

Background streaming tests require Redis. Add use-redis: true to
self-hosted provider test workflows.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add memfs-py in prod bucket access

* change ollama

* change packer model defaults

* self-hosted provider support

* diasble reasoner to match the number of messages in test case, enable parallel tool calls, and pass embedding configs

* remove reasoning setting not supported for ollama

* add qwen3 to extra assistant message case

* lower temp

* prep for lmstudio and vllm

* used lmstudio_openai client

* skip parallel tool calls on cpu ran provider lmstudio

* revert downgrade since it's so slow already

* add reuired flags for tool call parsing etc.

* change tool call parser from hermes to qwen3_xml

* qwen3_xmlk -> qwen3_coder

* upgrade vllm to latest container

* revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls

* install uv redis extra

* remove lmstudio

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kevin Lin
23c94ec6d3 feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240)
* Add log probabilities support for RL training

This enables Letta server to request and return log probabilities from
OpenAI-compatible providers (including SGLang) for use in RL training.

Changes:
- LLMConfig: Add return_logprobs and top_logprobs fields
- OpenAIClient: Set logprobs in ChatCompletionRequest when enabled
- LettaLLMAdapter: Add logprobs field and extract from response
- LettaResponse: Add logprobs field to return log probs to client
- LettaRequest: Add return_logprobs/top_logprobs for per-request override
- LettaAgentV3: Store and pass logprobs through to response
- agents.py: Handle request-level logprobs override

Usage:
  response = client.agents.messages.create(
      agent_id=agent_id,
      messages=[...],
      return_logprobs=True,
      top_logprobs=5,
  )
  print(response.logprobs)  # Per-token log probabilities

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* Add multi-turn token tracking for RL training via SGLang native endpoint

- Add TurnTokenData schema to track token IDs and logprobs per turn
- Add return_token_ids flag to LettaRequest and LLMConfig
- Create SGLangNativeClient for /generate endpoint (returns output_ids)
- Create SGLangNativeAdapter that uses native endpoint
- Modify LettaAgentV3 to accumulate turns across LLM calls
- Include turns in LettaResponse when return_token_ids=True

* Fix: Add SGLang native adapter to step() method, not just stream()

* Fix: Handle Pydantic Message objects in SGLang native adapter

* Fix: Remove api_key reference from LLMConfig (not present)

* Fix: Add missing 'created' field to ChatCompletionResponse

* Add full tool support to SGLang native adapter

- Format tools into prompt in Qwen-style format
- Parse tool calls from <tool_call> tags in response
- Format tool results as <tool_response> in user messages
- Set finish_reason to 'tool_calls' when tools are called

* Use tokenizer.apply_chat_template for proper tool formatting

- Add tokenizer caching in SGLang native adapter
- Use apply_chat_template when tokenizer available
- Fall back to manual formatting if not
- Convert Letta messages to OpenAI format for tokenizer

* Fix: Use func_response instead of tool_return for ToolReturn content

* Fix: Get output_token_logprobs from meta_info in SGLang response

* Fix: Allow None in output_token_logprobs (SGLang format includes null)

* chore: remove unrelated files from logprobs branch

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing call_type param to adapter constructors in letta_agent_v3

The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting
adapter creation into conditional blocks. Restores it for all 3 spots (SGLang
in step, SimpleLLM in step, SGLang in stream).

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

* fix: update expected LLMConfig fields in schema test for logprobs support

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove rllm provider references

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Charles Packer
b0e16ae50f fix: surface GPT-5.3 Codex for ChatGPT OAuth providers (#9379) 2026-02-24 10:52:07 -08:00
Sarah Wooders
526da4c49b Revert "perf: optimize prefix caching by skipping system prompt rebuild on every step" (#9380)
Revert "perf: optimize prefix caching by skipping system prompt rebuild on ev…"

This reverts commit eafa4144c2577a45b7007a177b701863b98d1dfa.
2026-02-24 10:52:07 -08:00
Sarah Wooders
9dbe28e8f1 perf: optimize prefix caching by skipping system prompt rebuild on every step (#9080) 2026-02-24 10:52:07 -08:00
Kian Jones
b0c40b6b1d fix: multi_agent flaky test (#9314)
* fix(core): handle PermissionDeniedError in provider API key validation

Fixed OpenAI PermissionDeniedError being raised as unknown error when
validating provider API keys. The check_api_key methods in OpenAI-based
providers (OpenAI, OpenRouter, Azure, Together) now properly catch and
re-raise PermissionDeniedError as LLMPermissionDeniedError.

🐛 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle Unicode surrogates in OpenAI requests

Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API.
Fixes UnicodeEncodeError when message content contains unpaired surrogates
from corrupted emoji data or malformed Unicode sequences.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* try to fix

* revert random stuff

* revert some stuff

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Sarah Wooders
21e880907f feat(core): structure memory directory and block labels [LET-7336] (#9309) 2026-02-24 10:52:06 -08:00
Kian Jones
6f746c5225 fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305)
* fix: handle Anthropic overloaded_error in streaming interfaces

* fix: handle Unicode surrogates in OpenAI requests

Sanitize Unicode surrogate pairs before sending requests to OpenAI API.
Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.

Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed

* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests

Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.

Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.

Also added sanitization to stream_async_responses() method which was
missing it.

Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
2026-02-24 10:52:06 -08:00
Ari Webb
5c6ca705f1 Revert "feat: bring back use message packing for timezone [LET-6846]" (#9302)
Revert "feat: bring back use message packing for timezone [LET-6846] (#9256)"

This reverts commit c5017cccdef95b84fc585b26a0ddc5b7e44eb7c9.
2026-02-24 10:52:06 -08:00
jnjpng
ff69c6a32e feat: add /agents/{agent_id}/generate endpoint for direct LLM requests (#9272)
* feat: add /agents/{agent_id}/generate endpoint for direct LLM requests

Add new endpoint that makes direct LLM provider requests without agent
context, memory, tools, or state modification. This enables:
- Quick LLM queries without agent overhead
- Testing model configurations
- Simple chat completions using agent's credentials
- Comparing responses across different models

Features:
- Uses agent's LLM config by default
- Supports model override with full provider config resolution
- Non-streaming, stateless operation
- Proper error handling and validation
- Request/response schemas with Pydantic validation

Implementation:
- Add GenerateRequest and GenerateResponse schemas
- Implement generate_completion endpoint handler
- Add necessary imports (LLMError, LLMClient, HandleNotFoundError)
- Include logging and comprehensive error handling

* fix: improve error handling and fix Message construction

- Fix critical bug: use content=[TextContent(text=...)] instead of text=...
- Add explicit error handling for NoResultFound and HandleNotFoundError
- Add error handling for convert_response_to_chat_completion
- Add structured logging for debugging
- Remove unnecessary .get() calls since Pydantic validates messages

* refactor: extract generate logic to AgentCompletionService

Move the generate endpoint business logic out of the endpoint handler
into a dedicated AgentCompletionService class for better code organization
and separation of concerns.

Changes:
- Create new AgentCompletionService in services/agent_completion_service.py
- Service handles all business logic: agent validation, LLM config resolution,
  message conversion, LLM client creation, and request/response processing
- Integrate service with SyncServer initialization
- Refactor generate_completion endpoint to use the service
- Endpoint now only handles HTTP concerns (auth, error mapping)

Benefits:
- Cleaner endpoint code (reduced from ~140 lines to ~25 lines)
- Better separation of concerns (HTTP vs business logic)
- Service logic can be reused or tested independently
- Follows established patterns in the codebase (AgentManager, etc.)

* feat: simplify generate API to accept just prompt text

Simplify the client interface by accepting a simple prompt string instead
of requiring clients to format messages.

Changes:
- Update GenerateRequest schema:
  - Replace 'messages' array with simple 'prompt' string
  - Add optional 'system_prompt' for context/instructions
  - Keep 'override_model' for model selection
- Update AgentCompletionService to format messages automatically:
  - Accepts prompt and optional system_prompt
  - Constructs message array internally (system + user messages)
  - Simpler API surface for clients
- Update endpoint documentation with new simplified examples
- Regenerate OpenAPI spec and TypeScript SDK

Benefits:
- Much simpler client experience - just send text
- No need to understand message formatting
- Still supports system prompts for context
- Cleaner API that matches common use cases

Example (before):
{
  "messages": [{"role": "user", "content": "What is 2+2?"}]
}

Example (after):
{
  "prompt": "What is 2+2?"
}

* test: add comprehensive integration tests for generate endpoint

Add 9 integration tests covering various scenarios:

Happy path tests:
- test_agent_generate_basic: Basic prompt -> response flow
- test_agent_generate_with_system_prompt: System prompt + user prompt
- test_agent_generate_with_model_override: Override model selection
- test_agent_generate_long_prompt: Handle longer prompts
- test_agent_generate_no_persistence: Verify no messages saved to agent

Error handling tests:
- test_agent_generate_empty_prompt_error: Empty prompt validation (422)
- test_agent_generate_invalid_agent_id: Invalid agent ID (404)
- test_agent_generate_invalid_model_override: Invalid model handle (404)

All tests verify:
- Response structure (content, model, usage)
- Proper status codes for errors
- Usage statistics (tokens, counts)
- No side effects on agent state

Tests follow existing test patterns in test_client.py and use the
letta_client SDK (assuming generate_completion method is auto-generated
from the OpenAPI spec).

* openapi

* refactor: rename AgentCompletionService to AgentGenerateCompletionManager

Rename for better clarity and consistency with codebase naming conventions:
- Rename file: agent_completion_service.py → agent_generate_completion_manager.py
- Rename class: AgentCompletionService → AgentGenerateCompletionManager
- Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager
- Update docstrings: 'Service' → 'Manager'

Changes:
- apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class)
- apps/core/letta/server/server.py (import + initialization)
- apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint)

No functional changes, purely a naming refactor.

* fix: remove invalid Message parameters in generate manager

Remove agent_id=None and user_id=None from Message construction.
The Message model doesn't accept these as None values - only pass
required parameters (role, content).

Fixes validation error:
  'Extra inputs are not permitted [type=extra_forbidden, input_value=None]'

This aligns with other Message construction patterns in the codebase
(see tools.py, memory.py examples).

* feat: improve generate endpoint validation and tests

- Add field validator for whitespace-only prompts
- Always include system message (required by Anthropic)
- Use default "You are a helpful assistant." when no system_prompt provided
- Update tests to use direct HTTP calls via httpx
- Fix test issues:
  - Use valid agent ID format (agent-{uuid})
  - Use available model (openai/gpt-4o-mini)
  - Add whitespace validation test
- All 9 integration tests passing
2026-02-24 10:52:06 -08:00
Ari Webb
426f6a8ca4 feat: bring back use message packing for timezone [LET-6846] (#9256)
* feat: bring back use message packing for timezone

* add tests
2026-02-24 10:52:06 -08:00
amysguan
16c96cc3c0 Fix sliding window cutoff logic (#9261)
* fix sliding window cutoff calculations to use agent instead of summarizer config

* allow approval messages with tool_calls as valid cutoffs, prevent approval pairs from being split

* update tests with updated sliding window parameters

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:52:06 -08:00
Kian Jones
00b36bc591 fix: resolve crouton telemetry failures (#9269)
Two issues were causing telemetry failures:
1. Startup race - memgpt-server sending telemetry before crouton created socket
2. Oversized payloads - large context windows (1M+ tokens) exceeding buffer

Changes:
- Increase crouton buffer to 128MB max with lazy allocation (64KB initial)
- Bump crouton resources (512Mi limit, 128Mi request)
- Add retry with exponential backoff in socket backend
- Move crouton to initContainers with restartPolicy: Always for deterministic startup

🐙 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Sarah Wooders
eaf64fb510 fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258)
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
jnjpng
f48b60634f refactor: extract compact logic to shared function for temporal (#9249)
* refactor: extract compact logic to shared function

Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.

Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
  - compact_messages(): Core compaction logic
  - build_summarizer_llm_config(): LLM config builder for summarization
  - CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
  compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams

This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.

* chore: clean up verbose comments

* fix: correct CompactionSettings import path

* fix: correct count_tokens import from summarizer_sliding_window

* fix: update test patch path for count_tokens_with_tools

After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.

* fix: update test to use build_summarizer_llm_config from compact.py

The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.

* fix: add early check for system prompt size in compact_messages

Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.

* fix: properly propagate SystemPromptTokenExceededError from compact

The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.

Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.

* revert: remove redundant SystemPromptTokenExceededError handling

The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.

* fix: correctly re-raise SystemPromptTokenExceededError

The inner exception handler was using 'raise e' which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.

Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.

* revert: remove early check and restore raise e to match main behavior

* fix: set should_continue=False and correctly re-raise exception

- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError

Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.

* fix: add SystemPromptTokenExceededError handler for post-step compaction

The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early

This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.

The fix adds the same exception handler pattern used in the retry compaction flow
(line 941-946), ensuring proper state is set before re-raising.

This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
2026-02-24 10:52:06 -08:00
Kian Jones
a206f7f345 feat: add ID format validation to agent and user schemas (#9151)
* feat: add ID format validation to agent and user schemas

Reuse existing validator types (ToolId, SourceId, BlockId, MessageId,
IdentityId, UserId) from letta.validators to enforce ID format validation
at the schema level. This ensures malformed IDs are rejected with a 422
validation error instead of causing 500 database errors.

Changes:
- CreateAgent: validate tool_ids, source_ids, folder_ids, block_ids, identity_ids
- UpdateAgent: validate tool_ids, source_ids, folder_ids, block_ids, message_ids, identity_ids
- UserUpdate: validate id

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API spec and SDK

* fix: override ID validation in AgentSchema for agent file portability

AgentSchema extends CreateAgent but needs to allow arbitrary short IDs
(e.g., tool-0, block-0) for portable agent files. Override the validated
ID fields to use plain List[str] instead of the validated types.

Also fix test_agent.af to use proper UUID-format IDs.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API spec and SDK

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: revert test_agent.af - short IDs are valid for agent files

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix openapi schema

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Sarah Wooders
3fdf2b6c79 chore: deprecate old agent messaging (#9120) 2026-02-24 10:52:06 -08:00
Sarah Wooders
4096b30cd7 feat: log LLM traces to clickhouse (#9111)
* feat: add non-streaming option for conversation messages

- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API schema for ConversationMessageRequest

* feat: add direct ClickHouse storage for raw LLM traces

    Adds ability to store raw LLM request/response payloads directly in ClickHouse,
    bypassing OTEL span attribute size limits. This enables debugging and analytics
    on large LLM payloads (>10MB system prompts, large tool schemas, etc.).

    New files:
    - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper
    - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget)
    - letta/services/llm_raw_trace_reader.py: Reader with query methods
    - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL
    - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL
    - apps/core/clickhouse-init.sql: Local dev initialization

    Modified:
    - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval)
    - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry
    - compose.yaml: Added ClickHouse service for local dev
    - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands

    Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false).
    Uses ZSTD(3) compression for 10-30x reduction on JSON payloads.

    🤖 Generated with [Letta Code](https://letta.com)

    Co-Authored-By: Letta <noreply@letta.com>

* fix: address code review feedback for LLM raw traces

Fixes based on code review feedback:

1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port
   inputs (was defaulting to HTTPS which breaks local dev)

2. Make raw trace writes truly fire-and-forget - use asyncio.create_task()
   instead of awaiting, so JSON serialization doesn't block request path

3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth
   under load. Drops traces with warning if queue is full.

4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop()

5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it

6. Remove unused imports - json import in reader

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing asyncio import and simplify JSON serialization

- Add missing 'import asyncio' that was causing 'name asyncio is not defined' error
- Remove unnecessary clean_double_escapes() function - the JSON is stored correctly,
  the clickhouse-client CLI was just adding extra escaping when displaying
- Update just clickhouse-trace to use Python client for correct JSON output

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: add clickhouse raw trace integration test

* test: simplify clickhouse trace assertions

* refactor: centralize usage parsing and stream error traces

Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: exercise provider usage parsing live

Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: fix usage parsing tests to pass

- Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient
- Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta)
- Add tools=[] for Gemini/Anthropic build_request_data

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: extract_usage_statistics returns LettaUsageStatistics

Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add is_byok and llm_config_json columns to ClickHouse traces

Extend llm_raw_traces table with:
- is_byok (UInt8): Track BYOK vs base provider usage for billing analytics
- llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis

This enables queries like:
- BYOK usage breakdown by provider/model
- Config parameter analysis (temperature, max_tokens, etc.)
- Debugging specific request configurations

* feat: add tests for error traces, llm_config_json, and cache tokens

- Update llm_raw_trace_reader.py to query new columns (is_byok,
  cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json)
- Add test_error_trace_stored_in_clickhouse to verify error fields
- Add test_cache_tokens_stored_for_anthropic to verify cache token storage
- Update existing tests to verify llm_config_json is stored correctly
- Make llm_config required in log_provider_trace_async()
- Simplify provider extraction to use provider_name directly

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add ClickHouse integration tests to CI pipeline

- Add use-clickhouse option to reusable-test-workflow.yml
- Add ClickHouse service container with otel database
- Add schema initialization step using clickhouse-init.sql
- Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.)
- Add separate clickhouse-integration-tests job running
  integration_test_clickhouse_llm_raw_traces.py

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify provider and org_id extraction in raw trace writer

- Use model_endpoint_type.value for provider (not provider_name)
- Simplify org_id to just self.actor.organization_id (actor is always pydantic)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLMRawTraceWriter with _enabled flag

- Check ClickHouse env vars once at init, set _enabled flag
- Early return in write_async/flush_async if not enabled
- Remove ValueError raises (never used)
- Simplify _get_client (no validation needed since already checked)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add LLMRawTraceWriter shutdown to FastAPI lifespan

Properly flush pending traces on graceful shutdown via lifespan
instead of relying only on atexit handler.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add agent_tags column to ClickHouse traces

Store agent tags as Array(String) for filtering/analytics by tag.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* cleanup

* fix(ci): fix ClickHouse schema initialization in CI

- Create database separately before loading SQL file
- Remove CREATE DATABASE from SQL file (handled in CI step)
- Add verification step to confirm table was created
- Use -sf flag for curl to fail on HTTP errors

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLM trace writer with ClickHouse async_insert

- Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop
- Sync cloud DDL schema with clickhouse-init.sql (add missing columns)
- Remove redundant llm_raw_traces_local.ddl
- Remove unused batch_size/flush_interval settings
- Update tests for simplified writer

Key changes:
- async_insert=1, wait_for_async_insert=1 for reliable server-side batching
- Simple per-trace retry with exponential backoff (max 3 retries)
- ~150 lines removed from writer

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: consolidate ClickHouse direct writes into TelemetryManager backend

- Add clickhouse_direct backend to provider_trace_backends
- Remove duplicate ClickHouse write logic from llm_client_base.py
- Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct

The clickhouse_direct backend:
- Converts ProviderTrace to LLMRawTrace
- Extracts usage stats from response JSON
- Writes via LLMRawTraceWriter with async_insert

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: address PR review comments and fix llm_config bug

Review comment fixes:
- Rename clickhouse_direct -> clickhouse_analytics (clearer purpose)
- Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml
- Delete redundant scripts/test_llm_raw_traces.py (use pytest tests)
- Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL)
- Fix socket description leak in telemetry_manager docstring
- Add cloud-only comment to clickhouse-init.sql
- Update justfile to use separate compose file

Bug fix:
- Fix llm_config not being passed to ProviderTrace in telemetry
- Now correctly populates provider, model, is_byok for all LLM calls
- Affects both request_async_with_telemetry and log_provider_trace_async

DDL optimizations:
- Add secondary indexes (bloom_filter for agent_id, model, step_id)
- Add minmax indexes for is_byok, is_error
- Change model and error_type to LowCardinality for faster GROUP BY

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: rename llm_raw_traces -> llm_traces

Address review feedback that "raw" is misleading since we denormalize fields.

Renames:
- Table: llm_raw_traces -> llm_traces
- Schema: LLMRawTrace -> LLMTrace
- Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py
- Setting: store_llm_raw_traces -> store_llm_traces

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update workflow references to llm_traces

Missed renaming table name in CI workflow files.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update clickhouse_direct -> clickhouse_analytics in docstring

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove inaccurate OTEL size limit comments

The 4MB limit is our own truncation logic, not an OTEL protocol limit.
The real benefit is denormalized columns for analytics queries.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove local ClickHouse dev setup (cloud-only feature)

- Delete clickhouse-init.sql and compose.clickhouse.yaml
- Remove local clickhouse just commands
- Update CI to use cloud DDL with MergeTree for testing

clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: restore compose.yaml to match main

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: merge clickhouse_analytics into clickhouse backend

Per review feedback - having two separate backends was confusing.

Now the clickhouse backend:
- Writes to llm_traces table (denormalized for cost analytics)
- Reads from OTEL traces table (will cut over to llm_traces later)

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: correct path to DDL file in CI workflow

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: add provider index to DDL for faster filtering

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: configure telemetry backend in clickhouse tests

Tests need to set telemetry_settings.provider_trace_backends to include
'clickhouse', otherwise traces are routed to default postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: set provider_trace_backend field, not property

provider_trace_backends is a computed property, need to set the
underlying provider_trace_backend string field instead.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: error trace test and error_type extraction

- Add TelemetryManager to error trace test so traces get written
- Fix error_type extraction to check top-level before nested error dict

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use provider_trace.id for trace correlation across backends

- Pass provider_trace.id to LLMTrace instead of auto-generating
- Log warning if ID is missing (shouldn't happen, helps debug)
- Fallback to new UUID only if not set

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: trace ID correlation and concurrency issues

- Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse
- Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe)
- Fix Anthropic prompt_tokens to include cached tokens for cost analytics
- Log warning if provider_trace.id is missing

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
2026-02-24 10:52:06 -08:00
jnjpng
24ea7dbaed feat: include tools as part of token estimate in compact (#9242)
* base

* fix
2026-02-24 10:52:06 -08:00
Ari Webb
0bbb9c9bc0 feat: add reasoning zai openrouter (#9189)
* feat: add reasoning zai openrouter

* add openrouter reasoning

* stage + publish api

* openrouter reasoning always on

* revert

* fix

* remove reference

* do
2026-02-24 10:52:06 -08:00
Kian Jones
01cb00ae10 Revert "fix: truncate oversized text in embedding requests" (#9227)
Revert "fix: truncate oversized text in embedding requests (#9196)"

This reverts commit a9c342087e022519c63d62fb76b72aed8859539b.
2026-02-24 10:52:06 -08:00
Kian Jones
630c147b13 fix: truncate oversized text in embedding requests (#9196)
fix: handle oversized text in embedding requests with recursive chunking

When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.

Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
jnjpng
3f23a23227 feat: add compaction stats (#9219)
* base

* update

* last

* generate

* fix test
2026-02-24 10:52:06 -08:00
jnjpng
df2e666ced test: skip deepwiki SSE MCP server test (#9218)
The deepwiki SSE MCP server is deprecated, so skip this test.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
jnjpng
e25a0c9cdf feat: update compact endpoint to store summary message (#9215)
* base

* add tests
2026-02-24 10:52:06 -08:00
jnjpng
d28ccc0be6 feat: add summary message and event on compaction (#9144)
* base

* update

* update

* revert formatting

* routes

* legacy

* fix

* review

* update
2026-02-24 10:52:05 -08:00
cthomas
59ffaec8f4 fix: revert test comments (#9161) 2026-01-29 12:44:04 -08:00
cthomas
d992aa0df4 fix: non-streaming conversation messages endpoint (#9159)
* fix: non-streaming conversation messages endpoint

**Problems:**
1. `AssertionError: run_id is required when enforce_run_id_set is True`
   - Non-streaming path didn't create a run before calling `step()`

2. `ResponseValidationError: Unable to extract tag using discriminator 'message_type'`
   - `response_model=LettaStreamingResponse` but non-streaming returns `LettaResponse`

**Fixes:**
1. Add run creation before calling `step()` (mirrors agents endpoint)
2. Set run_id in Redis for cancellation support
3. Pass `run_id` to `step()`
4. Change `response_model` from `LettaStreamingResponse` to `LettaResponse`
   (streaming returns `StreamingResponse` which bypasses response_model validation)

**Test:**
Added `test_conversation_non_streaming_raw_http` to verify the fix.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* api sync

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Kian Jones
34eed72150 feat: add user id validation (#9128)
* add user id validation

* relax conversation id check to allow default while I'm here

* fix annotation validation

* -api changes
2026-01-29 12:44:04 -08:00
Kian Jones
0099a95a43 fix(sec): first pass of ensuring actor id is required everywhere (#9126)
first pass of ensuring actor id is required
2026-01-29 12:44:04 -08:00
Sarah Wooders
b34ad43691 feat: add minimax byok to ui (#9101)
* fix: patch minimax

* feat: add frontend changes for minimax

* add logo, fix backend

* better check for is minimax

* more references fixed for minimax

* start revering unnecessary changes

* revert backend changes, just ui

* fix minimax fully

* fix test

* add key to deploy action

---------

Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Ari Webb <arijwebb@gmail.com>
2026-01-29 12:44:04 -08:00
Sarah Wooders
fb69a96cd6 fix: patch minimax (#9099) 2026-01-29 12:44:04 -08:00
Sarah Wooders
adab8cd9b5 feat: add MiniMax provider support (#9095)
* feat: add MiniMax provider support

Add MiniMax as a new LLM provider using their Anthropic-compatible API.

Key implementation details:
- Uses standard messages API (not beta) - MiniMax supports thinking blocks natively
- Base URL: https://api.minimax.io/anthropic
- Models: MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2 (all 200K context, 128K output)
- Temperature clamped to valid range (0.0, 1.0]
- All M2.x models treated as reasoning models (support interleaved thinking)

Files added:
- letta/schemas/providers/minimax.py - MiniMax provider schema
- letta/llm_api/minimax_client.py - Client extending AnthropicClient
- tests/test_minimax_client.py - Unit tests (13 tests)
- tests/model_settings/minimax-m2.1.json - Integration test config

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API spec with MiniMax provider

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: use MiniMax-M2.1-lightning for CI tests

Switch to the faster/cheaper lightning model variant for integration tests.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: add MINIMAX_API_KEY to deploy-core command

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

* chore: regenerate web openapi spec with MiniMax provider

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🐾 Generated with [Letta Code](https://letta.com)

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
2026-01-29 12:44:04 -08:00
Sarah Wooders
221b4e6279 refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065)
👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
3e49cf5d44 fix: load default provider config when summarizer uses different prov… (#9051)
fix: load default provider config when summarizer uses different provider

**Problem:**
Summarization failed when agent used one provider (e.g., Google AI) but
summarizer config specified a different provider (e.g., Anthropic):

```python
# Agent LLM config
model_endpoint_type='google_ai', handle='gemini-something/gemini-2.5-pro',
context_window=100000

# Summarizer config
model='anthropic/claude-haiku-4-5-20251001'

# Bug: Resulting summarizer_llm_config mixed Google + Anthropic settings
model='claude-haiku-4-5-20251001', model_endpoint_type='google_ai',  #  Wrong endpoint!
context_window=100000  #  Google's context window, not Anthropic's default!
```

This sent Claude requests to Google AI endpoints with incorrect parameters.

**Root Cause:**
`_build_summarizer_llm_config()` always copied the agent's LLM config as base,
then patched model/provider fields. But this kept all provider-specific settings
(endpoint, context_window, etc.) from the wrong provider.

**Fix:**
1. Parse provider_name from summarizer handle
2. Check if it matches agent's model_endpoint_type (or provider_name for custom)
3. **If YES** → Use agent config as base, override model/handle (same provider)
4. **If NO** → Load default config via `provider_manager.get_llm_config_from_handle()` (new provider)

**Example Flow:**
```python
# Agent: google_ai/gemini-2.5-pro
# Summarizer: anthropic/claude-haiku

provider_name = "anthropic"  # Parsed from handle
provider_matches = ("anthropic" == "google_ai")  # False 

# Different provider → load default Anthropic config
base = await provider_manager.get_llm_config_from_handle(
    handle="anthropic/claude-haiku",
    actor=self.actor
)
# Returns: model_endpoint_type='anthropic', endpoint='https://api.anthropic.com', etc. 
```

**Result:**
- Summarizer with different provider gets correct default config
- No more mixing Google endpoints with Anthropic models
- Same-provider summarizers still inherit agent settings efficiently

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
github-actions[bot]
1d1bb29a43 feat: add override_model support for agent file import (#9058) 2026-01-29 12:44:04 -08:00
cthomas
57ab117437 feat: dedupe approval response retries on server (#9038) 2026-01-29 12:44:04 -08:00