Commit Graph

4915 Commits

Author SHA1 Message Date
amysguan
612a2ae98b Fix: Change Z.ai context window to account for max_token subtraction (#9710)
Fix the Z.ai context window (effectively [advertised context window] - [max output tokens]) and pass max tokens through properly so Z.ai doesn't default to 65k for GLM-5.
2026-03-03 18:34:02 -08:00
Sarah Wooders
a50482e6d3 feat(core): sync skills from SKILL.md into memFS blocks (#9718) 2026-03-03 18:34:02 -08:00
Kevin Lin
a11ba9710c feat(core): increase Gemini timeout to 10 minutes (#9714) 2026-03-03 18:34:02 -08:00
cthomas
ab784e702d feat: support default convo in list and cancel endpoints (#9707)
* feat: support default convo in list and cancel endpoints

* also support in compact endpoint

* api sync
2026-03-03 18:34:02 -08:00
cthomas
39a537a9a5 feat: add default convo support to conversations endpoint (#9706)
* feat: add default convo support to conversations endpoint

* api sync
2026-03-03 18:34:02 -08:00
Ari Webb
673c1220a1 fix: strip properties for fireworks (#9703) 2026-03-03 18:34:02 -08:00
Sarah Wooders
57e7e0e52b feat(core): reserve skills in memfs sync and list top-level skill directory [LET-7710] (#9691) 2026-03-03 18:34:02 -08:00
cthomas
28a66fa9d7 chore: remove stmt timeout debug logging (#9693) 2026-03-03 18:34:02 -08:00
github-actions[bot]
f54ae7c929 feat: render description for non-system files in memory_filesystem tree (#9688)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:02 -08:00
github-actions[bot]
bf80de214d feat: change default context window from 32000 to 128000 (#9673)
* feat: change default context window from 32000 to 128000

Update DEFAULT_CONTEXT_WINDOW and global_max_context_window_limit from
32000 to 128000. Also update all .af (agent files), cypress test
fixtures, and integration tests to use the new default.

Closes #9672

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): update conversation manager tests for auto-created system message

create_conversation now auto-creates a system message at position 0
(from #9508), but the test assertions weren't updated. Adjust expected
message counts and ordering to account for the initial system message.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): fix mock Anthropic models.list() to return async iterable, not coroutine

The real Anthropic SDK's models.list() returns an AsyncPage (with __aiter__)
directly, but the mock used `async def list()` which returns a coroutine.
The code does `async for model in client.models.list()` which needs an
async iterable, not a coroutine. Fix by making list() a regular method.
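
The difference described above can be sketched in a few lines. This is a minimal illustration, not the project's test code: `FakeAsyncPage`, `BrokenMockModels`, and `FixedMockModels` are hypothetical names, and the model IDs are placeholders.

```python
import asyncio


class FakeAsyncPage:
    """Hypothetical stand-in for a paginated result that supports `async for`."""

    def __init__(self, items):
        self._items = list(items)

    def __aiter__(self):
        # Return an async generator, which provides __anext__.
        return self._iterate()

    async def _iterate(self):
        for item in self._items:
            yield item


class BrokenMockModels:
    # `async def` makes list() return a coroutine, which has no __aiter__.
    async def list(self):
        return FakeAsyncPage(["model-a", "model-b"])


class FixedMockModels:
    # A regular method returns the async iterable directly.
    def list(self):
        return FakeAsyncPage(["model-a", "model-b"])


async def collect(models):
    return [model async for model in models.list()]


async def demo():
    fixed = await collect(FixedMockModels())
    broken_call = BrokenMockModels().list()  # a coroutine object
    has_aiter = hasattr(broken_call, "__aiter__")
    broken_call.close()  # avoid a "never awaited" warning
    return fixed, has_aiter
```

Running `asyncio.run(demo())` shows that the fixed mock can be iterated with `async for`, while the coroutine returned by the broken mock lacks `__aiter__`.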

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:01 -08:00
jnjpng
46971414a4 fix: preserve agent max_tokens when caller doesn't explicitly set it (#9679)
* fix: preserve agent max_tokens when caller doesn't explicitly set it

When updating an agent with convenience fields (model, model_settings)
but without an explicit max_tokens, the server was constructing a fresh
LLMConfig via get_llm_config_from_handle_async. The Pydantic validator
on LLMConfig hardcodes max_tokens=16384 for gpt-5* models, silently
overriding the agent's existing value (e.g. 128000).

This was triggered by reasoning tab-switch in the CLI, which sends
model + model_settings (with reasoning_effort) but no max_tokens.

Now, when request.max_tokens is None we carry forward the agent's
current max_tokens instead of accepting the provider default.

* fix: use correct 128k max_output_tokens defaults for gpt-5.2/5.3

- Update OpenAI provider fallback to return 128000 for gpt-5.2*/5.3*
  models (except -chat variants which are 16k)
- Update LLMConfig Pydantic validator to match
- Update gpt-5.2 default_config factory to use 128000
- Move server-side max_tokens preservation guard into the
  model_settings branch where llm_config is already available
2026-03-03 18:34:01 -08:00
cthomas
1fb355a39a fix: override stop reason for streaming for empty response (#9663) 2026-03-03 18:34:01 -08:00
Ari Webb
dd0e513951 fix: lazy load conversations [LET-7682] (#9629)
fix: lazy load conversations
2026-03-03 18:34:01 -08:00
cthomas
9422b2d993 fix: set otid for all approval request messages (#9655) 2026-03-03 18:34:01 -08:00
cthomas
1448609ecf fix: set otid for summary message (#9654) 2026-03-03 18:34:01 -08:00
cthomas
3d781efd21 fix(core): raise LLMEmptyResponseError for empty Anthropic responses (#9624)
* fix(core): raise LLMEmptyResponseError for empty Anthropic responses

Fixes LET-7679: Opus 4.6 occasionally returns empty responses (no content
and no tool calls), causing silent failures with stop_reason=end_turn.

Changes:
- Add LLMEmptyResponseError class (subclass of LLMServerError)
- Raise error in anthropic_client for empty non-streaming responses
- Raise error in anthropic_streaming_interface for empty streaming responses
- Pass through LLMError instances in handle_llm_error to preserve specific types
- Add test for empty streaming response detection

This allows clients (letta-code) to catch this specific error and implement
retry logic with cache-busting modifications.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): set invalid_llm_response stop reason for empty responses

Catch LLMEmptyResponseError specifically and set stop_reason to
invalid_llm_response instead of llm_api_error. This allows clients
to distinguish empty responses from transient API errors.
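
A rough sketch of how a dedicated error type can map to its own stop reason. The class hierarchy follows the names in this commit message, but the bodies, the `stop_reason_for` helper, and the fallback value `"error"` are assumptions made for illustration.

```python
class LLMError(Exception):
    """Hypothetical base class for LLM failures."""


class LLMServerError(LLMError):
    """Hypothetical server-side failure."""


class LLMEmptyResponseError(LLMServerError):
    """Raised when a response has no content and no tool calls."""


def handle_llm_error(exc):
    # Pass through errors that already carry a specific type.
    if isinstance(exc, LLMError):
        return exc
    # Anything else is wrapped in a generic server error.
    return LLMServerError(str(exc))


def stop_reason_for(exc):
    # Check the most specific class first so it is not masked by its parents.
    if isinstance(exc, LLMEmptyResponseError):
        return "invalid_llm_response"
    if isinstance(exc, LLMError):
        return "llm_api_error"
    return "error"  # hypothetical fallback, not taken from the source
```

Because the specific subclass survives `handle_llm_error`, a client can tell an empty response apart from a transient API failure and retry accordingly.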

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
Kevin Lin
895acb9f4e feat(core): add gpt-5.3-codex model support (#9628)
* feat(core): add gpt-5.3-codex model support

Add OpenAI gpt-5.3-codex model: context window overrides, model pricing
and capabilities, none-reasoning-effort support, and test config.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
Kian Jones
ddfa922cde fix(core): prevent event loop saturation from ClickHouse and socket trace writes (#9617)
* fix(core): prevent event loop saturation from ClickHouse and socket trace writes

Two issues were causing the event loop watchdog to fire and liveness probes
to fail under load:

1. LLMTraceWriter held an asyncio.Lock across each ClickHouse write, and
   wait_for_async_insert=1 meant each write held that lock for ~1s. Under high
   request volume, N background tasks all queued for the lock simultaneously,
   saturating the event loop with task management overhead. Fix: switch to
   wait_for_async_insert=0 (ClickHouse async_insert handles server-side batching
   — no acknowledgment wait needed) and remove the lock (clickhouse_connect uses
   a thread-safe connection pool). The sync insert still runs in asyncio.to_thread
   so it never blocks the event loop. No traces are dropped.

2. SocketProviderTraceBackend spawned one OS thread per trace with a 60s socket
   timeout. During crouton restarts, threads accumulated blocking on sock.sendall
   for up to 3 minutes each (3 retries x 60s). Fix: reduce socket timeout from
   60s to 5s — the socket is local (Unix socket), so 5s is already generous, and
   fast failure lets retries resolve before threads pile up.
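
The first fix can be illustrated with a small sketch: blocking writes run in worker threads via `asyncio.to_thread`, and no `asyncio.Lock` serializes them. `blocking_insert` is a hypothetical stand-in for a synchronous, thread-safe client call; the 0.05s sleep is an arbitrary placeholder for network latency.

```python
import asyncio
import queue
import time


def blocking_insert(row, sink):
    """Hypothetical blocking write standing in for a synchronous client call."""
    time.sleep(0.05)  # simulate the round trip to the database
    sink.put(row)     # queue.Queue is safe to share across threads


async def write_trace(row, sink):
    # The blocking call runs in a worker thread, so the event loop stays free.
    # No asyncio.Lock is held, so writers do not queue behind one another.
    await asyncio.to_thread(blocking_insert, row, sink)


async def write_all(rows):
    sink = queue.Queue()
    await asyncio.gather(*(write_trace(row, sink) for row in rows))
    written = []
    while not sink.empty():
        written.append(sink.get_nowait())
    return sorted(written)
```

For example, `asyncio.run(write_all([3, 1, 2]))` writes all three rows concurrently and loses none of them.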

Root cause analysis: event_loop_watchdog.py was detecting saturation (lag >2s)
every ~60s on gke-letta-default-pool-c6915745-fmq6 via thread dumps. The
saturated event loop caused k8s liveness probes to time out, triggering restarts.

* chore(core): sync socket backend with main and document ClickHouse thread safety
2026-03-03 18:34:01 -08:00
github-actions[bot]
94fc05b6e5 feat: remove limit from git-base memory frontmatter and increase default to 100k (#9537)
- Remove `limit` from YAML frontmatter in `serialize_block()` and
  `merge_frontmatter_with_body()` (deprecated for git-base memory)
- Remove `limit` from `_render_memory_blocks_git()` in-context rendering
- Existing frontmatter with `limit` is automatically cleaned up on next write
- Parsing still accepts `limit` from frontmatter for backward compatibility
- Increase `CORE_MEMORY_BLOCK_CHAR_LIMIT` from 20,000 to 100,000
- Update integration tests to assert `limit` is not in frontmatter

Fixes #9536

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:01 -08:00
github-actions[bot]
0020f4b866 feat: recompile system message on new conversation creation (#9508)
* feat: recompile system message on new conversation creation

When a new conversation is created, the system prompt is now recompiled
with the latest memory block values and metadata instead of starting
with no messages. This ensures each conversation captures the current
agent state at creation time.

- Add _initialize_conversation_system_message to ConversationManager
- Compile fresh system message using PromptGenerator during conversation creation
- Add integration tests for the full workflow (modify memory → new conversation
  gets updated system message)
- Update existing test expectations for non-empty conversation messages

Fixes #9507

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

* refactor: deduplicate system message compilation into ConversationManager

Consolidate the duplicate system message compilation logic into a single
shared method `compile_and_save_system_message_for_conversation` on
ConversationManager. This method accepts optional pre-loaded agent_state
and message_manager to avoid redundant DB loads when callers already have
them.

- Renamed _initialize_conversation_system_message → compile_and_save_system_message_for_conversation (public, reusable)
- Added optional agent_state and message_manager params
- Replaced 40-line duplicate in helpers.py with a 7-line call to the shared method
- Method returns the persisted system message for caller use

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
2026-03-03 18:34:01 -08:00
Caren Thomas
ce54fb1a00 bump version 2026-02-24 10:58:16 -08:00
amysguan
47b0c87ebe Add modes self and self_sliding_window for prompt caching (#9372)
* add self compaction method with proper caching (pass in tools, don't refresh sys prompt beforehand) + sliding fallback

* updated prompts for self compaction

* add tests for self, self_sliding_window modes and w/o refresh messages before compaction

* add cache logging to summarization

* better handling to prevent agent from continuing convo on self modes

* if mode changes via summarize endpoint, will use default prompt for the new mode

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:26 -08:00
Ari Webb
47d55362a4 fix: models need to be paginated (#9621) 2026-02-24 10:55:26 -08:00
cthomas
8ab9d78a23 chore: cleanup (#9602)
* chore: cleanup

* update dependencies
2026-02-24 10:55:26 -08:00
jnjpng
5505e9cf4b fix(core): suppress missing-otid warning for compaction events (#9616)
fix(core): skip missing-otid warning for compaction events
2026-02-24 10:55:26 -08:00
Ari Webb
62967bcca0 feat: parallel tool calling minimax provider [LET-7647] (#9613)
* feat: parallel tool calling minimax provider

* stage publish api
2026-02-24 10:55:26 -08:00
jnjpng
a59f24ac87 fix(core): ensure buffered Anthropic tool chunks always include otid (#9516)
fix(core): ensure otid exists when flushing buffered anthropic tool chunks

Anthropic TOOL_USE buffering can emit buffered tool_call/approval chunks on content block stop before otid is assigned in the normal inner_thoughts_complete path. Ensure flush-time chunks get a deterministic otid so streaming clients can reliably correlate deltas.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:26 -08:00
Shubham Naik
f082fd5061 feat: add order_by and order params to /v1/conversations list endpoin… (#9599)
* feat: add order_by and order params to /v1/conversations list endpoint [LET-7628]

Added sorting support to the conversations list endpoint, matching the pattern from /v1/agents.

**API Changes:**
- Added `order` query param: "asc" or "desc" (default: "desc")
- Added `order_by` query param: "created_at" or "last_run_completion" (default: "created_at")

**Implementation:**

**created_at ordering:**
- Simple ORDER BY on ConversationModel.created_at
- No join required, fast query
- Nulls not applicable (created_at always set)

**last_run_completion ordering:**
- LEFT JOIN with runs table using subquery
- Subquery: MAX(completed_at) grouped by conversation_id
- Uses OUTER JOIN so conversations with no runs are included
- Nulls last ordering (conversations with no runs go to end)
- Index on runs.conversation_id ensures performant join

**Pagination:**
- Cursor-based pagination with `after` parameter
- Handles null values correctly for last_run_completion
- For created_at: simple timestamp comparison
- For last_run_completion: complex null-aware cursor logic
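
The null-aware cursor logic can be approximated in memory as follows. This sketch does not reproduce the SQL query; it only shows the ordering rule (nulls last, ties broken by ID) and how an `after` cursor is resolved against it. The row shape and the numeric timestamps are assumptions.

```python
def sort_key(row, ascending):
    """Order by last_run_completion with nulls last, breaking ties by id."""
    value = row["last_run_completion"]
    if value is None:
        return (1, 0, row["id"])  # nulls sort after every non-null value
    return (0, value if ascending else -value, row["id"])


def list_page(rows, limit, after=None, ascending=False):
    ordered = sorted(rows, key=lambda r: sort_key(r, ascending))
    if after is not None:
        # Look up the cursor row, then keep only rows strictly past it.
        cursor = next(r for r in rows if r["id"] == after)
        cursor_key = sort_key(cursor, ascending)
        ordered = [r for r in ordered if sort_key(r, ascending) > cursor_key]
    return ordered[:limit]


rows = [
    {"id": "c1", "last_run_completion": 30},
    {"id": "c2", "last_run_completion": None},
    {"id": "c3", "last_run_completion": 50},
    {"id": "c4", "last_run_completion": None},
    {"id": "c5", "last_run_completion": 10},
]
page_1 = [r["id"] for r in list_page(rows, limit=2)]
page_2 = [r["id"] for r in list_page(rows, limit=2, after=page_1[-1])]
page_3 = [r["id"] for r in list_page(rows, limit=2, after=page_2[-1])]
```

The second page crosses the boundary between rows with a completion time and rows without one, which is the case a plain timestamp comparison would mishandle.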

**Performance:**
- Existing index: `ix_runs_conversation_id` on runs table
- Subquery with GROUP BY is efficient for this use case
- OUTER JOIN ensures conversations without runs are included

**Follows agents pattern:**
- Same parameter names (order, order_by)
- Same Literal types and defaults
- Converts "asc"/"desc" to ascending boolean internally

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: order

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:26 -08:00
Sarah Wooders
afbc416972 feat(core): add model/model_settings override fields to conversation create/update (#9607) 2026-02-24 10:55:26 -08:00
Ari Webb
a9a6a5f29d fix: add correct logging (#9603) 2026-02-24 10:55:26 -08:00
Kevin Lin
8fc77af685 fix(memory): standardize tool parameter names (#9552)
fix(memory): standardize tool parameter names

Use old_string/new_string across memory edit tools, docs, tests, and starter kits to avoid mismatched parameter names.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>
2026-02-24 10:55:24 -08:00
jnjpng
9155b4fa86 fix: use canonical stop reason mapping in redis stream finalizer (#9600)
fix: derive run status from StopReasonType mapping
2026-02-24 10:55:12 -08:00
github-actions[bot]
ba67621e1b feat: add conversation deletion endpoint (soft delete) [LET-7286] (#9230)
* feat: add conversation deletion endpoint (soft delete) [LET-7286]

- Add DELETE /conversations/{conversation_id} endpoint
- Filter soft-deleted conversations from list operations
- Add check_is_deleted=True to update/delete operations

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add tests, update SDK and stainless for delete conversation

- Add 5 integration tests for DELETE conversation endpoint
- Run stage-api to regenerate OpenAPI spec and SDK
- Add delete method to conversations in stainless.yml

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: add manager-level tests for conversation soft delete [LET-7286]

- test_delete_conversation_removes_from_list
- test_delete_conversation_double_delete_raises
- test_update_deleted_conversation_raises
- test_delete_conversation_excluded_from_summary_search

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-02-24 10:55:12 -08:00
jnjpng
9c8589a687 fix: correct ChatGPT OAuth GPT-5 max output token defaults (#9592)
fix: align ChatGPT OAuth GPT-5 max output token defaults

Update ChatGPT OAuth provider defaults so GPT-5 family models report 128k max output tokens based on current OpenAI model docs, avoiding incorrect 16k values in /v1/models responses.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
Shubham Naik
73c824f5d2 feat: make agent_id optional in conversations list endpoint [LET-7612] (#9585)
* feat: make agent_id optional in conversations list endpoint [LET-7612]

Allow listing all conversations without filtering by agent_id.

**Router changes (conversations.py):**
- Changed agent_id from required (`...`) to optional (`None`)
- Updated description to clarify behavior
- Updated docstring to reflect optional filtering

**Manager changes (conversation_manager.py):**
- Updated list_conversations signature: agent_id: str → Optional[str]
- Updated docstring to clarify optional behavior
- Summary search query: conditionally adds agent_id filter only if provided
- Default list logic: passes agent_id (can be None) to list_async

**How it works:**
- Without agent_id: returns all conversations for the user's organization
- With agent_id: returns conversations filtered by that agent
- list_async handles None gracefully via **kwargs pattern
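
A minimal sketch of the optional filter, using plain dictionaries instead of the real manager and ORM. The function signature and the record fields are assumptions for illustration only.

```python
from typing import Optional


def list_conversations(conversations, organization_id, agent_id: Optional[str] = None):
    # The organization filter always applies.
    filters = {"organization_id": organization_id}
    # The agent filter is added only when the caller provides an agent_id.
    if agent_id is not None:
        filters["agent_id"] = agent_id
    return [
        c for c in conversations
        if all(c.get(key) == value for key, value in filters.items())
    ]


conversations = [
    {"id": "conv-1", "organization_id": "org-1", "agent_id": "agent-a"},
    {"id": "conv-2", "organization_id": "org-1", "agent_id": "agent-b"},
    {"id": "conv-3", "organization_id": "org-2", "agent_id": "agent-a"},
]
all_in_org = [c["id"] for c in list_conversations(conversations, "org-1")]
only_agent_a = [c["id"] for c in list_conversations(conversations, "org-1", "agent-a")]
```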

**Use case:**
- Cloud UI can list all user conversations across agents
- Still supports filtering by agent_id when needed

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: update logs

* chore: update logs

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
jnjpng
257b99923b fix: preserve max_tokens on model_settings updates without max_output_tokens (#9591)
When model_settings is sent without max_output_tokens (e.g. only
changing reasoning_effort), the Pydantic default of 4096 was being
applied via _to_legacy_config_params(), silently overwriting the
agent's existing max_tokens.

Use model_fields_set to detect when max_output_tokens was not
explicitly provided and skip overwriting max_tokens in that case.
Only applied to the update path — on create, letting the default
apply is reasonable since there's no pre-existing value.
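
The idea behind `model_fields_set` can be sketched without Pydantic. The class below is a hypothetical, dependency-free stand-in that records which fields the caller supplied; it is not the project's schema, and only the 4096 default is taken from the message above.

```python
class ModelSettingsSketch:
    """Hypothetical stand-in that records which fields were explicitly set."""

    DEFAULTS = {"max_output_tokens": 4096, "reasoning_effort": None}

    def __init__(self, **provided):
        unknown = set(provided) - set(self.DEFAULTS)
        if unknown:
            raise TypeError(f"unexpected fields: {sorted(unknown)}")
        self.fields_set = set(provided)          # plays the role of model_fields_set
        self.values = {**self.DEFAULTS, **provided}


def updated_max_tokens(current_max_tokens, settings):
    # Overwrite only when the caller explicitly provided max_output_tokens.
    if "max_output_tokens" in settings.fields_set:
        return settings.values["max_output_tokens"]
    # Otherwise keep the agent's existing value instead of the schema default.
    return current_max_tokens


kept = updated_max_tokens(128000, ModelSettingsSketch(reasoning_effort="high"))
changed = updated_max_tokens(128000, ModelSettingsSketch(max_output_tokens=8192))
```

Here `kept` stays at the agent's existing 128000, while `changed` takes the explicitly requested 8192.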
2026-02-24 10:55:12 -08:00
cthomas
857c289ed2 fix: handle compact edge case in idempotency check (#9588) 2026-02-24 10:55:12 -08:00
cthomas
73c9b14fa9 fix: don't throw error if compaction races (#9576) 2026-02-24 10:55:12 -08:00
jnjpng
f10440b49c fix: update Anthropic Haiku test model after 3.5 retirement (#9569)
* fix: migrate Anthropic Haiku test model off retired release

Update Anthropic Haiku references in integration and usage parsing tests to a supported model id so test requests stop failing with 404 model not found errors.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use canonical Anthropic Haiku handle in tests

Replace dated Anthropic Haiku handle references with the canonical provider handle so handle-based model resolution does not fail in batch and client tests.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
amysguan
a101d5980d Fix: load config for summarizer model from defaults instead of agent's config (#9568)
* load default settings instead of loading from agent for summarizer config

* update tests to allow use of get_llm_config_from_handle

* remove nit comment

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:12 -08:00
amysguan
33969d7190 Default to lightweight compaction model instead of agent's model (#9488)
---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:12 -08:00
jnjpng
eb4a0daabd fix: allow explicit null for max_tokens on GPT-5 models (#9562)
The Pydantic validator `set_model_specific_defaults` was checking
`values.get("max_tokens") is None`, which matched both "field not
provided" and "field explicitly set to null". This meant users could
not disable the max output tokens limit for GPT-5/GPT-4.1 models -
the validator would always override null with a default value during
request deserialization.

Changed to `"max_tokens" not in values` so that an explicit
`max_tokens: null` is preserved while still applying defaults when
the field is omitted entirely.
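
The distinction between "omitted" and "explicitly null" is easy to see on a plain dictionary. The two functions below are a simplified sketch of the before and after checks, not the actual validator; `DEFAULT_MAX_TOKENS` is an illustrative value.

```python
DEFAULT_MAX_TOKENS = 16384  # illustrative default, assumed for this sketch


def apply_defaults_old(values):
    values = dict(values)
    # .get() returns None both when the key is missing and when it is null.
    if values.get("max_tokens") is None:
        values["max_tokens"] = DEFAULT_MAX_TOKENS
    return values


def apply_defaults_new(values):
    values = dict(values)
    # A membership test is true only when the key was omitted entirely.
    if "max_tokens" not in values:
        values["max_tokens"] = DEFAULT_MAX_TOKENS
    return values


explicit_null = {"model": "example-model", "max_tokens": None}
omitted = {"model": "example-model"}
```

With the old check, `explicit_null` is overwritten with the default; with the new check, the null survives and only `omitted` receives the default.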
2026-02-24 10:55:12 -08:00
jnjpng
828c89c76f fix: populate max_tokens when listing LLM models (#9559)
list_llm_models_async was constructing LLMConfig without max_tokens,
causing the GET /models/ endpoint to return null for max_tokens.
Now calls typed_provider.get_default_max_output_tokens() for both
base and BYOK provider paths, matching get_llm_config_from_handle.
2026-02-24 10:55:12 -08:00
Kevin Lin
bd5b5fa9f3 feat(gemini): add 3.1 pro preview support (#9553)
Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
cthomas
e2ad8762fe fix: gemini streaming bug (#9555) 2026-02-24 10:55:11 -08:00
cthomas
8ffc515674 fix: flip parallel_tool_calls setting default (#9541) 2026-02-24 10:55:11 -08:00
cthomas
3cdd64dc24 chore: update keepalive interval 50->20 (#9538)
* chore: update keepalive interval 50->20

* update comment
2026-02-24 10:55:11 -08:00
Kian Jones
8f56527958 perf(memfs): delta upload — only push new git objects after commit (#9548)
perf(memfs): delta upload — only push new/modified git objects after commit

Instead of re-uploading the entire .git/ directory after every commit,
snapshot file mtimes before the commit and only upload files that are
new or changed. A typical single-block update creates ~5 new objects
(blob, trees, commit, ref) vs re-uploading all ~30.
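
A sketch of the snapshot-and-diff approach, under the assumption that modification times are enough to detect new or changed files. The helper names are hypothetical, and `demo` simulates a commit by writing one new file and bumping the mtime of another.

```python
import os
import tempfile


def snapshot_mtimes(root):
    """Map each file's path (relative to root) to its mtime in nanoseconds."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[os.path.relpath(path, root)] = os.stat(path).st_mtime_ns
    return snapshot


def changed_files(before, after):
    # A file needs uploading if it is new or its mtime differs from the snapshot.
    return sorted(path for path, mtime in after.items() if before.get(path) != mtime)


def demo():
    with tempfile.TemporaryDirectory() as root:
        for name in ("HEAD", "old-object"):
            with open(os.path.join(root, name), "w") as fh:
                fh.write("x")
        before = snapshot_mtimes(root)
        # Simulate a commit: one new object plus an updated ref.
        with open(os.path.join(root, "new-object"), "w") as fh:
            fh.write("y")
        head = os.path.join(root, "HEAD")
        bumped = os.stat(head).st_mtime_ns + 1_000_000_000
        os.utime(head, ns=(bumped, bumped))
        after = snapshot_mtimes(root)
        return changed_files(before, after)
```

Only the new and modified files are reported, so the untouched `old-object` would be skipped during upload.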

Full _upload_repo retained for create_repo and other paths that need it.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Charles Packer
044241daec fix(core): include effort in AnthropicModelSettings returned by _to_model_settings() (#9543)
LLMConfig._to_model_settings() for Anthropic built an AnthropicModelSettings
object without passing effort=self.effort, so GET /agents/{id} never returned
the effort field in model_settings even when it was stored on the agent.

The Letta Code CLI derives the reasoning tier displayed in the status bar
from model_settings.effort (canonical source), so the footer always showed
e.g. "Sonnet 4.6" instead of "Sonnet 4.6 (high)" after a model switch.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Kian Jones
e65795b5f1 fix(core): handle None message_ids in context window calculator (#9330)
* fix(core): always create system message even with _init_with_no_messages

When _init_with_no_messages=True (used by agent import flows), the agent
was created with message_ids=None. If subsequent message initialization
failed, this left orphaned agents that crash when context window is
calculated (TypeError on message_ids[1:]).

Now the system message is always generated and persisted, even when
skipping the rest of the initial message sequence. This ensures every
agent has at least message_ids=[system_message_id].
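
The failure mode above comes from slicing `None`. A defensive helper along these lines would tolerate a missing list; it is a hypothetical sketch and not the calculator's actual code.

```python
def in_context_message_ids(message_ids):
    """Return the IDs after the system message, tolerating a missing list."""
    if not message_ids:
        # Covers both None and an empty list.
        return []
    return message_ids[1:]


def slicing_none_fails():
    # Reproduces the crash: None does not support slicing.
    try:
        None[1:]
    except TypeError:
        return True
    return False
```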

Fixes Datadog issue 773a24ea-eeb3-11f0-8f9f-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): clean up placeholder messages during import and add test

Delete placeholder system messages after imported messages are
successfully created (not before), so agents retain their safety-net
system message if import fails. Also adds a test verifying that
_init_with_no_messages=True still produces a valid context window.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): add descriptive error for empty message_ids in get_system_message

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00