letta-server

Author	SHA1	Message	Date
jnjpng	5505e9cf4b	fix(core): suppress missing-otid warning for compaction events (#9616 ) fix(core): skip missing-otid warning for compaction events	2026-02-24 10:55:26 -08:00
Ari Webb	62967bcca0	feat: parallel tool calling minimax provider [LET-7647] (#9613 ) * feat: parallel tool calling minimax provider * stage publish api	2026-02-24 10:55:26 -08:00
jnjpng	a59f24ac87	fix(core): ensure buffered Anthropic tool chunks always include otid (#9516 ) fix(core): ensure otid exists when flushing buffered anthropic tool chunks Anthropic TOOL_USE buffering can emit buffered tool_call/approval chunks on content block stop before otid is assigned in the normal inner_thoughts_complete path. Ensure flush-time chunks get a deterministic otid so streaming clients can reliably correlate deltas. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:26 -08:00
Shubham Naik	f082fd5061	feat: add order_by and order params to /v1/conversations list endpoin… (#9599 ) * feat: add order_by and order params to /v1/conversations list endpoint [LET-7628] Added sorting support to the conversations list endpoint, matching the pattern from /v1/agents. API Changes: - Added `order` query param: "asc" or "desc" (default: "desc") - Added `order_by` query param: "created_at" or "last_run_completion" (default: "created_at") Implementation: created_at ordering: - Simple ORDER BY on ConversationModel.created_at - No join required, fast query - Nulls not applicable (created_at always set) last_run_completion ordering: - LEFT JOIN with runs table using subquery - Subquery: MAX(completed_at) grouped by conversation_id - Uses OUTER JOIN so conversations with no runs are included - Nulls last ordering (conversations with no runs go to end) - Index on runs.conversation_id ensures performant join Pagination: - Cursor-based pagination with `after` parameter - Handles null values correctly for last_run_completion - For created_at: simple timestamp comparison - For last_run_completion: complex null-aware cursor logic Performance: - Existing index: `ix_runs_conversation_id` on runs table - Subquery with GROUP BY is efficient for this use case - OUTER JOIN ensures conversations without runs are included Follows agents pattern: - Same parameter names (order, order_by) - Same Literal types and defaults - Converts "asc"/"desc" to ascending boolean internally 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: order --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:26 -08:00
Sarah Wooders	afbc416972	feat(core): add model/model_settings override fields to conversation create/update (#9607 )	2026-02-24 10:55:26 -08:00
Ari Webb	a9a6a5f29d	fix: add correct logging (#9603 )	2026-02-24 10:55:26 -08:00
Kevin Lin	8fc77af685	fix(memory): standardize tool parameter names (#9552 ) fix(memory): standardize tool parameter names Use old_string/new_string across memory edit tools, docs, tests, and starter kits to avoid mismatched parameter names. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com>	2026-02-24 10:55:24 -08:00
jnjpng	9155b4fa86	fix: use canonical stop reason mapping in redis stream finalizer (#9600 ) fix: derive run status from StopReasonType mapping	2026-02-24 10:55:12 -08:00
github-actions[bot]	ba67621e1b	feat: add conversation deletion endpoint (soft delete) [LET-7286] (#9230 ) * feat: add conversation deletion endpoint (soft delete) [LET-7286] - Add DELETE /conversations/{conversation_id} endpoint - Filter soft-deleted conversations from list operations - Add check_is_deleted=True to update/delete operations Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add tests, update SDK and stainless for delete conversation - Add 5 integration tests for DELETE conversation endpoint - Run stage-api to regenerate OpenAPI spec and SDK - Add delete method to conversations in stainless.yml Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add manager-level tests for conversation soft delete [LET-7286] - test_delete_conversation_removes_from_list - test_delete_conversation_double_delete_raises - test_update_deleted_conversation_raises - test_delete_conversation_excluded_from_summary_search 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>	2026-02-24 10:55:12 -08:00
jnjpng	9c8589a687	fix: correct ChatGPT OAuth GPT-5 max output token defaults (#9592 ) fix: align ChatGPT OAuth GPT-5 max output token defaults Update ChatGPT OAuth provider defaults so GPT-5 family models report 128k max output tokens based on current OpenAI model docs, avoiding incorrect 16k values in /v1/models responses. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:12 -08:00
Shubham Naik	73c824f5d2	feat: make agent_id optional in conversations list endpoint [LET-7612] (#9585 ) * feat: make agent_id optional in conversations list endpoint [LET-7612] Allow listing all conversations without filtering by agent_id. Router changes (conversations.py): - Changed agent_id from required (`...`) to optional (`None`) - Updated description to clarify behavior - Updated docstring to reflect optional filtering Manager changes (conversation_manager.py): - Updated list_conversations signature: agent_id: str → Optional[str] - Updated docstring to clarify optional behavior - Summary search query: conditionally adds agent_id filter only if provided - Default list logic: passes agent_id (can be None) to list_async How it works: - Without agent_id: returns all conversations for the user's organization - With agent_id: returns conversations filtered by that agent - list_async handles None gracefully via kwargs pattern Use case: - Cloud UI can list all user conversations across agents - Still supports filtering by agent_id when needed 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: update logs * chore: update logs --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:12 -08:00
jnjpng	257b99923b	fix: preserve max_tokens on model_settings updates without max_output_tokens (#9591 ) When model_settings is sent without max_output_tokens (e.g. only changing reasoning_effort), the Pydantic default of 4096 was being applied via _to_legacy_config_params(), silently overwriting the agent's existing max_tokens. Use model_fields_set to detect when max_output_tokens was not explicitly provided and skip overwriting max_tokens in that case. Only applied to the update path — on create, letting the default apply is reasonable since there's no pre-existing value.	2026-02-24 10:55:12 -08:00
cthomas	857c289ed2	fix: handle compact edge case in idempotency check (#9588 )	2026-02-24 10:55:12 -08:00
cthomas	73c9b14fa9	fix: dont throw error if compaction races (#9576 )	2026-02-24 10:55:12 -08:00
jnjpng	f10440b49c	fix: update Anthropic Haiku test model after 3.5 retirement (#9569 ) * fix: migrate Anthropic Haiku test model off retired release Update Anthropic Haiku references in integration and usage parsing tests to a supported model id so test requests stop failing with 404 model not found errors. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use canonical Anthropic Haiku handle in tests Replace dated Anthropic Haiku handle references with the canonical provider handle so handle-based model resolution does not fail in batch and client tests. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:12 -08:00
amysguan	a101d5980d	Fix: load config for summarizer model from defaults instead of agent's config (#9568 ) * load default settings instead of loading from agent for summarizer config * update tests to allow use of get_llm_config_from_handle * remove nit comment --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:12 -08:00
amysguan	33969d7190	Default to lightweight compaction model instead of agent's model (#9488 ) --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:12 -08:00
jnjpng	eb4a0daabd	fix: allow explicit null for max_tokens on GPT-5 models (#9562 ) The Pydantic validator `set_model_specific_defaults` was checking `values.get("max_tokens") is None`, which matched both "field not provided" and "field explicitly set to null". This meant users could not disable the max output tokens limit for GPT-5/GPT-4.1 models - the validator would always override null with a default value during request deserialization. Changed to `"max_tokens" not in values` so that an explicit `max_tokens: null` is preserved while still applying defaults when the field is omitted entirely.	2026-02-24 10:55:12 -08:00
jnjpng	828c89c76f	fix: populate max_tokens when listing LLM models (#9559 ) list_llm_models_async was constructing LLMConfig without max_tokens, causing the GET /models/ endpoint to return null for max_tokens. Now calls typed_provider.get_default_max_output_tokens() for both base and BYOK provider paths, matching get_llm_config_from_handle.	2026-02-24 10:55:12 -08:00
Kevin Lin	bd5b5fa9f3	feat(gemini): add 3.1 pro preview support (#9553 ) Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
cthomas	e2ad8762fe	fix: gemini streaming bug (#9555 )	2026-02-24 10:55:11 -08:00
cthomas	8ffc515674	fix: flip parallel_tool_calls setting default (#9541 )	2026-02-24 10:55:11 -08:00
cthomas	3cdd64dc24	chore: update keepalive interval 50->20 (#9538 ) * chore: update keepalive interval 50->20 * update comment	2026-02-24 10:55:11 -08:00
Kian Jones	8f56527958	perf(memfs): delta upload — only push new git objects after commit (#9548 ) perf(memfs): delta upload — only push new/modified git objects after commit Instead of re-uploading the entire .git/ directory after every commit, snapshot file mtimes before the commit and only upload files that are new or changed. A typical single-block update creates ~5 new objects (blob, trees, commit, ref) vs re-uploading all ~30. Full _upload_repo retained for create_repo and other paths that need it. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Charles Packer	044241daec	fix(core): include effort in AnthropicModelSettings returned by _to_model_settings() (#9543 ) LlmConfig._to_model_settings() for Anthropic built an AnthropicModelSettings object without passing effort=self.effort, so GET /agents/{id} never returned the effort field in model_settings even when it was stored on the agent. The Letta Code CLI derives the reasoning tier displayed in the status bar from model_settings.effort (canonical source), so the footer always showed e.g. "Sonnet 4.6" instead of "Sonnet 4.6 (high)" after a model switch. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Kian Jones	e65795b5f1	fix(core): handle None message_ids in context window calculator (#9330 ) * fix(core): always create system message even with _init_with_no_messages When _init_with_no_messages=True (used by agent import flows), the agent was created with message_ids=None. If subsequent message initialization failed, this left orphaned agents that crash when context window is calculated (TypeError on message_ids[1:]). Now the system message is always generated and persisted, even when skipping the rest of the initial message sequence. This ensures every agent has at least message_ids=[system_message_id]. Fixes Datadog issue 773a24ea-eeb3-11f0-8f9f-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): clean up placeholder messages during import and add test Delete placeholder system messages after imported messages are successfully created (not before), so agents retain their safety-net system message if import fails. Also adds a test verifying that _init_with_no_messages=True still produces a valid context window. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add descriptive error for empty message_ids in get_system_message 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
jnjpng	e8d5922ff9	fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535 ) * fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming When reasoning models (gpt-5.x) exhaust their max_output_tokens budget on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent instead of ResponseCompletedEvent. This was previously unhandled, causing final_response to remain None — which meant get_content() and get_tool_call_objects() returned empty results, silently dropping the partial response. Now ResponseIncompleteEvent is handled identically to ResponseCompletedEvent (extracting partial content, usage stats, and token details), with an additional warning log indicating the incomplete reason. * fix(core): propagate finish_reason for Responses API incomplete events - Guard usage extraction against None usage payload in ResponseIncompleteEvent handler - Add _finish_reason override to LettaLLMAdapter so streaming adapters can explicitly set finish_reason without a chat_completions_response - Map incomplete_details.reason="max_output_tokens" to finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat Completions API convention - This allows the agent loop's _decide_continuation to correctly return stop_reason="max_tokens_exceeded" instead of "end_turn" when the model exhausts its output token budget on reasoning * fix(core): handle empty content parts in incomplete ResponseOutputMessage When a model hits max_output_tokens after starting a ResponseOutputMessage but before producing any content parts, the message has content=[]. This previously raised ValueError("Got 0 content parts, expected 1"). Now it logs a warning and skips the empty message, allowing reasoning-only incomplete responses to be processed cleanly. * fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens Handle content_filter and any future unknown incomplete reasons from the Responses API instead of silently leaving finish_reason as None.	2026-02-24 10:55:11 -08:00
Ari Webb	5896e5d023	fix: logging for credit verification step (#9514 )	2026-02-24 10:55:11 -08:00
cthomas	3651658ea7	fix: tool call streaming using deprecated field (#9517 )	2026-02-24 10:55:11 -08:00
Ari Webb	21765d16c9	fix(core): add OpenAI 24h prompt cache retention for supported models (#9509 ) * fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com> * fix(core): only set prompt_cache_retention, drop prompt_cache_key Two issues with the original prompt_cache_key approach: 1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars) 2. Setting an explicit key disrupted OpenAI's default prefix-hash routing, dropping cache hit rates from 40-45% to 10-13% OpenAI's default routing (hash of first ~256 tokens) already provides good cache affinity since each agent has a unique system prompt. We only need prompt_cache_retention="24h" for extended retention. Also fixes: - Operator precedence bug in _supports_extended_prompt_cache_retention - Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Charles Packer <packercharles@gmail.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
jnjpng	042c9c36af	fix(core): add warning log for streaming chunks missing id or otid (#9513 ) Adds a diagnostic log at the streaming chokepoint in LettaAgentV3.stream() to detect when any LettaMessage chunk is yielded without an id or otid field. This helps trace the root cause of client-side id/otid inconsistencies.	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Devansh Jain	39ddda81cc	feat: add Anthropic Sonnet 4.6 (#9408 )	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
Ari Webb	fa70e09963	Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention" (#9502 ) Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention …" This reverts commit f5bb9c629cb7d45544e90758cdfb899bcef41912.	2026-02-24 10:52:07 -08:00
Sarah Wooders	2bf3314cef	fix: import asyncio for parallel tool calls (#9501 )	2026-02-24 10:52:07 -08:00
Shubham Naik	20c71523f8	chore: hotwire fix for core (#9500 )	2026-02-24 10:52:07 -08:00
Shubham Naik	e66981c7e8	feat: update undertaker to use rate limiter (#9498 )	2026-02-24 10:52:07 -08:00
Charles Packer	619e81ed1e	fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492 ) * fix(core): apply OpenAI prompt cache settings to request payloads Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prefix prompt cache key with letta tag Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add integration test * skip test --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	5b001a7749	fix: rename ChatGPT server error to ChatGPT API error (#9497 ) fix: rename ChatGPT server error to ChatGPT API error in error messages	2026-02-24 10:52:07 -08:00
jnjpng	fbc0bb60d9	fix: retry ChatGPT 502 and upstream connection errors with exponential backoff (#9495 ) 502s and upstream connection errors (envoy proxy failures) from ChatGPT were not being retried. This classifies them as LLMConnectionError (retryable) in both the streaming and non-streaming paths, and adds retry handling in the non-streaming HTTPStatusError handler so 502s get the same exponential backoff treatment as transport-level connection drops. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	26cbdb7b7b	fix(core): skip malformed send_message entries in message conversion (#9494 ) Avoid failing message-list endpoints when historical send_message tool calls are missing the expected message argument by logging and skipping malformed entries during conversion. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Charles Packer	c32d53f8a3	fix(core): remove old static landing page from Docker image (#9369 ) The "Experience the new ADE" page was outdated and no longer useful. Root path now redirects to /docs (FastAPI Swagger UI) instead. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
amysguan	80a0d1a95f	Add LLM client compaction errors to traces (#9474 ) * add llm client errors to traces * update response json for telemetry * prevent silent failures and properly log errored responses in streaming path * remove double logging --------- Co-authored-by: Amy Guan <amy@letta.com> Co-authored-by: Kian Jones <kian@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	e3eafb1977	fix: re-raise LLMError before wrapping with handle_llm_error (#9482 ) LLMError exceptions are already properly formatted errors that should propagate directly. Without this check, they get unnecessarily wrapped by handle_llm_error, losing their original error information.	2026-02-24 10:52:07 -08:00
Kian Jones	2f0294165c	debug: log statement_timeout + connection pid on session checkout (#9472 ) * debug: log statement_timeout + connection pid on every session checkout Temporary instrumentation to diagnose why some PlanetScale connections have statement_timeout=5s while others have 0 (disabled). 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * debug: log statement_timeout on every checkout, not just non-zero 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: rollback implicit transaction from debug query The SELECT implicitly begins a transaction, causing "A transaction is already begun" errors for code that calls session.begin() explicitly. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	0a8a8fda54	feat: add credit verification before agent message endpoints [LET-XXXX] (#9433 ) * feat: add credit verification before agent message endpoints Add credit verification checks to message endpoints to prevent execution when organizations have insufficient credits. - Add InsufficientCreditsError exception type - Add CreditVerificationService that calls step-orchestrator API - Add credit checks to /agents/{id}/messages endpoints - Add credit checks to /conversations/{id}/messages endpoint 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * surface error in ade * do per step instead * parallel check * parallel to step * small fixes * stage publish api * fixes * revert unnecessary frontend changes * insufficient credits stop reason --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	5faec5632f	fix: add m2.5 (#9480 ) * fix: add m2.5 * fix test	2026-02-24 10:52:07 -08:00
amysguan	9bec8c64f5	New prompts/defaults for `sliding_window` and `all` compaction (#9444 ) * new prompts for sliding window and all compaction + defaults to corresponding prompt * regenerate api spec --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:52:07 -08:00
github-actions[bot]	0b08164cc2	fix: update system prompt metadata label to "System prompt last recompiled" (#9477 ) fix: update system prompt metadata label from "Memory blocks were last modified" to "System prompt last recompiled" When git-based memory is enabled, there are no memory blocks, so the label "Memory blocks were last modified" is inaccurate. Changed to "System prompt last recompiled" which accurately reflects the timestamp meaning. Fixes #9476 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
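
Commits eb4a0daabd (#9562) and 257b99923b (#9591) both hinge on telling an omitted field apart from one sent explicitly. The sketch below illustrates the idea with plain dicts rather than the Pydantic models those commits touch; the function names and the DEFAULT_MAX_TOKENS value are hypothetical placeholders, not Letta's code.

```python
from typing import Any, Dict, Optional

# Hypothetical placeholder; the commit log does not state the real per-model default.
DEFAULT_MAX_TOKENS = 16384


def apply_defaults_before(values: Dict[str, Any]) -> Dict[str, Any]:
    # Old-style check: matches both "key omitted" and "key explicitly None",
    # so an explicit null is always overwritten with the default.
    if values.get("max_tokens") is None:
        return {**values, "max_tokens": DEFAULT_MAX_TOKENS}
    return values


def apply_defaults_after(values: Dict[str, Any]) -> Dict[str, Any]:
    # New-style check: fill the default only when the key is absent,
    # so {"max_tokens": None} survives and disables the limit.
    if "max_tokens" not in values:
        return {**values, "max_tokens": DEFAULT_MAX_TOKENS}
    return values


def merge_max_tokens(existing: Optional[int], update: Dict[str, Any]) -> Optional[int]:
    # Update path: overwrite only when the caller actually sent
    # max_output_tokens (Pydantic exposes this via model_fields_set);
    # otherwise keep the agent's stored value instead of a schema default.
    if "max_output_tokens" in update:
        return update["max_output_tokens"]
    return existing


if __name__ == "__main__":
    print(apply_defaults_before({"max_tokens": None}))            # {'max_tokens': 16384}
    print(apply_defaults_after({"max_tokens": None}))             # {'max_tokens': None}
    print(apply_defaults_after({}))                               # {'max_tokens': 16384}
    print(merge_max_tokens(32000, {"reasoning_effort": "high"}))  # 32000
```

Checking key presence (or model_fields_set) rather than value truthiness is what lets null act as a meaningful value instead of a synonym for "not provided".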
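
Commit fbc0bb60d9 (#9495) classifies ChatGPT 502s and upstream connection errors as retryable and retries them with exponential backoff. A minimal sketch of that pattern follows, assuming a hypothetical RetryableConnectionError in place of the LLMConnectionError the commit names; the retry limit and delays are illustrative, and jitter is omitted so the delays stay deterministic.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableConnectionError(Exception):
    """Hypothetical stand-in for a retryable error class such as LLMConnectionError."""


def raise_for_status(status: int) -> None:
    # Per the commit, 502 is treated as retryable. Treating every other
    # 4xx/5xx as fatal is a simplifying assumption for this sketch.
    if status == 502:
        raise RetryableConnectionError(f"upstream returned {status}")
    if status >= 400:
        raise RuntimeError(f"non-retryable status {status}")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Retry only retryable errors, doubling the wait after each failure:
    # base_delay, 2 * base_delay, 4 * base_delay, ...
    attempt = 0
    while True:
        try:
            return fn()
        except RetryableConnectionError:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


if __name__ == "__main__":
    statuses = iter([502, 502, 200])
    delays: List[float] = []

    def fake_request() -> str:
        raise_for_status(next(statuses))
        return "ok"

    print(call_with_backoff(fake_request, sleep=delays.append))  # ok
    print(delays)  # [0.5, 1.0]
```

Injecting `sleep` keeps the retry loop testable without real waits; non-retryable errors such as the RuntimeError above propagate on the first attempt.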
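
Commit f082fd5061 (#9599) orders conversations by `last_run_completion` with nulls last and pages with an `after` cursor. The in-memory sketch below shows one way a null-aware keyset comparison can work; it stands in for the SQL described in the commit (a LEFT JOIN on a MAX(completed_at) subquery), and the `Conversation` type, the id tiebreaker, and passing the cursor as a row rather than an id are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Conversation:
    # Hypothetical stand-in for a conversation row joined with
    # MAX(runs.completed_at) as last_run_completion (None if it has no runs).
    id: str
    last_run_completion: Optional[datetime]


def sort_key(c: Conversation, ascending: bool) -> Tuple[bool, float, str]:
    # First element is False for rows with a value and True for nulls, so
    # nulls sort last in both directions. Negating the timestamp gives
    # descending order; the id tiebreaker keeps the ordering total.
    if c.last_run_completion is None:
        return (True, 0.0, c.id)
    ts = c.last_run_completion.timestamp()
    return (False, ts if ascending else -ts, c.id)


def list_page(
    rows: List[Conversation],
    ascending: bool = False,
    after: Optional[Conversation] = None,
    limit: int = 50,
) -> List[Conversation]:
    ordered = sorted(rows, key=lambda c: sort_key(c, ascending))
    if after is not None:
        # Keyset cursor: keep only rows strictly past the cursor's key.
        # Tuple comparison handles the non-null/null boundary automatically.
        cursor = sort_key(after, ascending)
        ordered = [c for c in ordered if sort_key(c, ascending) > cursor]
    return ordered[:limit]


if __name__ == "__main__":
    def t(day: int) -> datetime:
        return datetime(2026, 2, day, tzinfo=timezone.utc)

    a, b, c, d = (Conversation("a", t(1)), Conversation("b", t(3)),
                  Conversation("c", None), Conversation("d", t(2)))
    rows = [a, b, c, d]
    print([x.id for x in list_page(rows)])                  # ['b', 'd', 'a', 'c']
    print([x.id for x in list_page(rows, ascending=True)])  # ['a', 'd', 'b', 'c']
    print([x.id for x in list_page(rows, after=d)])         # ['a', 'c']
```

Folding the null flag into the sort key is what makes a single "greater than cursor" comparison correct across the boundary where timestamped rows give way to rows with no runs.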


4891 Commits