letta-server

Author	SHA1	Message	Date
Ani Tunturi	1d1adb261a	fix: orphaned approvals, token inflation, reasoning fields, memfs redis dep Some checks are pending Test Package Installation / test-install (3.11) (push) Waiting to run Details Test Package Installation / test-install (3.12) (push) Waiting to run Details Test Package Installation / test-install (3.13) (push) Waiting to run Details [IN TESTING — self-hosted 0.16.6, Kimi-K2.5 via Synthetic Direct] Four independent fixes that landed together on this stack: helpers.py — skip PendingApprovalError when the associated run is already cancelled or failed. Stale approvals from interrupted runs were blocking all subsequent messages on that conversation. Now checks run status before raising; falls back to raising on lookup failure (conservative). letta_agent_v3.py — use prompt_tokens not total_tokens for context window estimate. total_tokens inflated the estimate by including completion tokens, triggering premature compaction. This was causing context window resets mid- conversation and is the root of the token inflation bug (see #3242). openai_client.py (both build_request_data paths) — strip reasoning_content, reasoning_content_signature, redacted_reasoning_content, omitted_reasoning_content from message history before sending to inference backends. Fireworks and Synthetic Direct reject these fields with 422/400 errors. exclude_none handles None values but not actual text content from previous assistant turns. block_manager_git.py — skip DB write when block value is unchanged. Reduces unnecessary write amplification on every memfs sync cycle. memfs_client_base.py — remove redis_client= kwarg from GitOperations init. Dependency was removed upstream but the call site wasn't updated. Dockerfile / compose files — context window and config updates for 220k limit.	2026-03-26 23:24:32 -04:00
cthomas	416ffc7cd7	Add billing context to LLM telemetry traces (#9745 ) * feat: add billing context to LLM telemetry traces Add billing metadata (plan type, cost source, customer ID) to LLM traces in ClickHouse for cost analytics and attribution. Data Flow: - Cloud-API: Extract billing info from subscription in rate limiting, set x-billing-* headers - Core: Parse headers into BillingContext object via dependencies - Adapters: Flow billing_context through all LLM adapters (blocking & streaming) - Agent: Pass billing_context to step() and stream() methods - ClickHouse: Store in billing_plan_type, billing_cost_source, billing_customer_id columns Changes: - Add BillingContext schema to provider_trace.py - Add billing columns to llm_traces ClickHouse table DDL - Update getCustomerSubscription to fetch stripeCustomerId from organization_billing_details - Propagate billing_context through agent step flow, adapters, and streaming service - Update ProviderTrace and LLMTrace to include billing metadata - Regenerate SDK with autogen Production Deployment: Requires env vars: LETTA_PROVIDER_TRACE_BACKEND=clickhouse, LETTA_STORE_LLM_TRACES=true, CLICKHOUSE_* 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add billing_context parameter to agent step methods - Add billing_context to BaseAgent and BaseAgentV2 abstract methods - Update LettaAgent, LettaAgentV2, LettaAgentV3 step methods - Update multi-agent groups: SleeptimeMultiAgentV2, V3, V4 - Fix test_utils.py to include billing header parameters - Import BillingContext in all affected files * fix: add billing_context to stream methods - Add billing_context parameter to BaseAgentV2.stream() - Add billing_context parameter to LettaAgentV2.stream() - LettaAgentV3.stream() already has it from previous commit * fix: exclude billing headers from OpenAPI spec Mark billing headers as internal (include_in_schema=False) so they don't appear in the public API. These are internal headers between cloud-api and core, not part of the public SDK. Regenerated SDK with stage-api - removes 10,650 lines of bloat that was causing OOM during Next.js build. * refactor: return billing context from handleUnifiedRateLimiting instead of mutating req Instead of passing req into handleUnifiedRateLimiting and mutating headers inside it: - Return billing context fields (billingPlanType, billingCostSource, billingCustomerId) from handleUnifiedRateLimiting - Set headers in handleMessageRateLimiting (middleware layer) after getting the result - This fixes step-orchestrator compatibility since it doesn't have a real Express req object * chore: remove extra gencode * p --------- Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:13 -08:00
jnjpng	db9e0f42af	fix(core): prevent ModelSettings default max_output_tokens from overriding agent config (#9739 ) * fix(core): prevent ModelSettings default max_output_tokens from overriding agent config When a conversation's model_settings were saved, the Pydantic default of max_output_tokens=4096 was always persisted to the DB even when the client never specified it. On subsequent messages, this default would overwrite the agent's max_tokens (typically None) with 4096, silently capping output. Two changes: 1. Use model_dump(exclude_unset=True) when persisting model_settings to the DB so Pydantic defaults are not saved. 2. Add model_fields_set guards at all callsites that apply _to_legacy_config_params() to skip max_tokens when it was not explicitly provided by the caller. Also conditionally set max_output_tokens in the OpenAI Responses API request builder so None is not sent as null (which some models treat as a hard 4096 cap). * nit * Fix model_settings serialization to preserve provider_type discriminator Replace blanket exclude_unset=True with targeted removal of only max_output_tokens when not explicitly set. The previous approach stripped the provider_type field (a Literal with a default), which broke discriminated union deserialization when reading back from DB.	2026-03-03 18:34:02 -08:00
Ari Webb	8335aa0fa0	fix: add some more logging for interrupts (#9733 )	2026-03-03 18:34:02 -08:00
Christina Tong	c8ae02a1fb	feat(core): sort agents by updated_at [LET-7771] (#9730 ) feat(core): sort agents by last_updated_at	2026-03-03 18:34:02 -08:00
amysguan	8e60b73eee	fix: minor change in upsert logic for prompt default (#9729 ) minor compaction upsert change	2026-03-03 18:34:02 -08:00
amysguan	c28ba77354	Fix: ADE compaction button compacts current conversation (#9720 ) * ADE compaction button compacts current conversation, update conversation endpoint * update name (summerizer --> summarizer), type fixes * bug fix for conversation + self_compact_sliding_window * chore: add French translations for AgentSimulatorOptionsMenu Add missing French translations for the AgentSimulatorOptionsMenu section to match en.json changes. Co-authored-by: Christina Tong <christinatong01@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * retrigger CI * error typefix --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:02 -08:00
amysguan	7a4188dbda	Add compaction settings to ADE (#9667 ) * add compaction settings to ADE, add get default prompt for updated mode route * update patch to auto set prompt on mode change, related ade changes * reset api and update test * feat: add compaction configuration translation keys for fr and cn Add ADE/CompactionConfiguration translation keys to fr.json and cn.json to match the new keys added in en.json. Co-authored-by: Christina Tong <christinatong01@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * type/translation/etc fixes * fix typing * update model selector path w/ change from main * import mode from sdk --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:02 -08:00
Sarah Wooders	a50482e6d3	feat(core): sync skills from SKILL.md into memFS blocks (#9718 )	2026-03-03 18:34:02 -08:00
cthomas	39a537a9a5	feat: add default convo support to conversations endpoint (#9706 ) * feat: add default convo support to conversations endpoint * api sync	2026-03-03 18:34:02 -08:00
Sarah Wooders	57e7e0e52b	feat(core): reserve skills in memfs sync and list top-level skill directory [LET-7710] (#9691 )	2026-03-03 18:34:02 -08:00
cthomas	1fb355a39a	fix: override stop reason for streaming for empty response (#9663 )	2026-03-03 18:34:01 -08:00
Kian Jones	ddfa922cde	fix(core): prevent event loop saturation from ClickHouse and socket trace writes (#9617 ) * fix(core): prevent event loop saturation from ClickHouse and socket trace writes Two issues were causing the event loop watchdog to fire and liveness probes to fail under load: 1. LLMTraceWriter held an asyncio.Lock across each ClickHouse write, and wait_for_async_insert=1 meant each write held that lock for ~1s. Under high request volume, N background tasks all queued for the lock simultaneously, saturating the event loop with task management overhead. Fix: switch to wait_for_async_insert=0 (ClickHouse async_insert handles server-side batching — no acknowledgment wait needed) and remove the lock (clickhouse_connect uses a thread-safe connection pool). The sync insert still runs in asyncio.to_thread so it never blocks the event loop. No traces are dropped. 2. SocketProviderTraceBackend spawned one OS thread per trace with a 60s socket timeout. During crouton restarts, threads accumulated blocking on sock.sendall for up to 3 minutes each (3 retries x 60s). Fix: reduce socket timeout from 60s to 5s — the socket is local (Unix socket), so 5s is already generous, and fast failure lets retries resolve before threads pile up. Root cause analysis: event_loop_watchdog.py was detecting saturation (lag >2s) every ~60s on gke-letta-default-pool-c6915745-fmq6 via thread dumps. The saturated event loop caused k8s liveness probes to time out, triggering restarts. * chore(core): sync socket backend with main and document ClickHouse thread safety	2026-03-03 18:34:01 -08:00
github-actions[bot]	94fc05b6e5	feat: remove limit from git-base memory frontmatter and increase default to 100k (#9537 ) - Remove `limit` from YAML frontmatter in `serialize_block()` and `merge_frontmatter_with_body()` (deprecated for git-base memory) - Remove `limit` from `_render_memory_blocks_git()` in-context rendering - Existing frontmatter with `limit` is automatically cleaned up on next write - Parsing still accepts `limit` from frontmatter for backward compatibility - Increase `CORE_MEMORY_BLOCK_CHAR_LIMIT` from 20,000 to 100,000 - Update integration tests to assert `limit` is not in frontmatter Fixes #9536 Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>	2026-03-03 18:34:01 -08:00
github-actions[bot]	0020f4b866	feat: recompile system message on new conversation creation (#9508 ) * feat: recompile system message on new conversation creation When a new conversation is created, the system prompt is now recompiled with the latest memory block values and metadata instead of starting with no messages. This ensures each conversation captures the current agent state at creation time. - Add _initialize_conversation_system_message to ConversationManager - Compile fresh system message using PromptGenerator during conversation creation - Add integration tests for the full workflow (modify memory → new conversation gets updated system message) - Update existing test expectations for non-empty conversation messages Fixes #9507 Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> * refactor: deduplicate system message compilation into ConversationManager Consolidate the duplicate system message compilation logic into a single shared method `compile_and_save_system_message_for_conversation` on ConversationManager. This method accepts optional pre-loaded agent_state and message_manager to avoid redundant DB loads when callers already have them. - Renamed _initialize_conversation_system_message → compile_and_save_system_message_for_conversation (public, reusable) - Added optional agent_state and message_manager params - Replaced 40-line duplicate in helpers.py with a 7-line call to the shared method - Method returns the persisted system message for caller use Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>	2026-03-03 18:34:01 -08:00
amysguan	47b0c87ebe	Add modes `self` and `self_sliding_window` for prompt caching (#9372 ) * add self compaction method with proper caching (pass in tools, don't refresh sys prompt beforehand) + sliding fallback * updated prompts for self compaction * add tests for self, self_sliding_window modes and w/o refresh messages before compaction * add cache logging to summarization * better handling to prevent agent from continuing convo on self modes * if mode changes via summarize endpoint, will use default prompt for the new mode --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:26 -08:00
cthomas	8ab9d78a23	chore: cleanup (#9602 ) * chore: cleanup * update dependencies	2026-02-24 10:55:26 -08:00
Shubham Naik	f082fd5061	feat: add order_by and order params to /v1/conversations list endpoin… (#9599 ) * feat: add order_by and order params to /v1/conversations list endpoint [LET-7628] Added sorting support to the conversations list endpoint, matching the pattern from /v1/agents. API Changes: - Added `order` query param: "asc" or "desc" (default: "desc") - Added `order_by` query param: "created_at" or "last_run_completion" (default: "created_at") Implementation: created_at ordering: - Simple ORDER BY on ConversationModel.created_at - No join required, fast query - Nulls not applicable (created_at always set) last_run_completion ordering: - LEFT JOIN with runs table using subquery - Subquery: MAX(completed_at) grouped by conversation_id - Uses OUTER JOIN so conversations with no runs are included - Nulls last ordering (conversations with no runs go to end) - Index on runs.conversation_id ensures performant join Pagination: - Cursor-based pagination with `after` parameter - Handles null values correctly for last_run_completion - For created_at: simple timestamp comparison - For last_run_completion: complex null-aware cursor logic Performance: - Existing index: `ix_runs_conversation_id` on runs table - Subquery with GROUP BY is efficient for this use case - OUTER JOIN ensures conversations without runs are included Follows agents pattern: - Same parameter names (order, order_by) - Same Literal types and defaults - Converts "asc"/"desc" to ascending boolean internally 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: order --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:26 -08:00
Sarah Wooders	afbc416972	feat(core): add model/model_settings override fields to conversation create/update (#9607 )	2026-02-24 10:55:26 -08:00
Kevin Lin	8fc77af685	fix(memory): standardize tool parameter names (#9552 ) fix(memory): standardize tool parameter names Use old_string/new_string across memory edit tools, docs, tests, and starter kits to avoid mismatched parameter names. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> EOF )	2026-02-24 10:55:24 -08:00
github-actions[bot]	ba67621e1b	feat: add conversation deletion endpoint (soft delete) [LET-7286] (#9230 ) * feat: add conversation deletion endpoint (soft delete) [LET-7286] - Add DELETE /conversations/{conversation_id} endpoint - Filter soft-deleted conversations from list operations - Add check_is_deleted=True to update/delete operations Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add tests, update SDK and stainless for delete conversation - Add 5 integration tests for DELETE conversation endpoint - Run stage-api to regenerate OpenAPI spec and SDK - Add delete method to conversations in stainless.yml Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add manager-level tests for conversation soft delete [LET-7286] - test_delete_conversation_removes_from_list - test_delete_conversation_double_delete_raises - test_update_deleted_conversation_raises - test_delete_conversation_excluded_from_summary_search 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>	2026-02-24 10:55:12 -08:00
Shubham Naik	73c824f5d2	feat: make agent_id optional in conversations list endpoint [LET-7612] (#9585 ) * feat: make agent_id optional in conversations list endpoint [LET-7612] Allow listing all conversations without filtering by agent_id. Router changes (conversations.py): - Changed agent_id from required (`...`) to optional (`None`) - Updated description to clarify behavior - Updated docstring to reflect optional filtering Manager changes (conversation_manager.py): - Updated list_conversations signature: agent_id: str → Optional[str] - Updated docstring to clarify optional behavior - Summary search query: conditionally adds agent_id filter only if provided - Default list logic: passes agent_id (can be None) to list_async How it works: - Without agent_id: returns all conversations for the user's organization - With agent_id: returns conversations filtered by that agent - list_async handles None gracefully via kwargs pattern Use case:** - Cloud UI can list all user conversations across agents - Still supports filtering by agent_id when needed 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: update logs * chore: update logs --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:12 -08:00
amysguan	a101d5980d	Fix: load config for summarizer model from defaults instead of agent's config (#9568 ) * load default settings instead of loading from agent for summarizer config * update tests to allow use of get_llm_config_from_handle * remove nit comment --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:12 -08:00
amysguan	33969d7190	Default to lightweight compaction model instead of agent's model (#9488 ) --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:55:12 -08:00
Kian Jones	8f56527958	perf(memfs): delta upload — only push new git objects after commit (#9548 ) perf(memfs): delta upload — only push new/modified git objects after commit Instead of re-uploading the entire .git/ directory after every commit, snapshot file mtimes before the commit and only upload files that are new or changed. A typical single-block update creates ~5 new objects (blob, trees, commit, ref) vs re-uploading all ~30. Full _upload_repo retained for create_repo and other paths that need it. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Kian Jones	e65795b5f1	fix(core): handle None message_ids in context window calculator (#9330 ) * fix(core): always create system message even with _init_with_no_messages When _init_with_no_messages=True (used by agent import flows), the agent was created with message_ids=None. If subsequent message initialization failed, this left orphaned agents that crash when context window is calculated (TypeError on message_ids[1:]). Now the system message is always generated and persisted, even when skipping the rest of the initial message sequence. This ensures every agent has at least message_ids=[system_message_id]. Fixes Datadog issue 773a24ea-eeb3-11f0-8f9f-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): clean up placeholder messages during import and add test Delete placeholder system messages after imported messages are successfully created (not before), so agents retain their safety-net system message if import fails. Also adds a test verifying that _init_with_no_messages=True still produces a valid context window. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add descriptive error for empty message_ids in get_system_message 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:55:11 -08:00
Ari Webb	5896e5d023	fix: logging for credit verification step (#9514 )	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
Shubham Naik	20c71523f8	chore: hotwire fix for core (#9500 )	2026-02-24 10:52:07 -08:00
Shubham Naik	e66981c7e8	feat: update undertaker to use rate limiter (#9498 )	2026-02-24 10:52:07 -08:00
amysguan	80a0d1a95f	Add LLM client compaction errors to traces (#9474 ) * add llm client errors to traces * update response json for telemetry * prevent silent failures and properly log errored responses in streaming path * remove double logging --------- Co-authored-by: Amy Guan <amy@letta.com> Co-authored-by: Kian Jones <kian@letta.com>	2026-02-24 10:52:07 -08:00
Ari Webb	0a8a8fda54	feat: add credit verification before agent message endpoints [LET-XXXX] (#9433 ) * feat: add credit verification before agent message endpoints Add credit verification checks to message endpoints to prevent execution when organizations have insufficient credits. - Add InsufficientCreditsError exception type - Add CreditVerificationService that calls step-orchestrator API - Add credit checks to /agents/{id}/messages endpoints - Add credit checks to /conversations/{id}/messages endpoint 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * surface error in ade * do per step instead * parallel check * parallel to step * small fixes * stage publish api * fixes * revert unnecessary frontend changes * insufficient credits stop reason --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
amysguan	9bec8c64f5	New prompts/defaults for `sliding_window` and `all` compaction (#9444 ) * new prompts for sliding window and all compaction + defaults to corresponding prompt * regenerate api spec --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	05073ba837	fix(core): preserve git-memory formatting and enforce lock conflicts (#9451 ) * test(core): strengthen git-memory system prompt stability integration coverage Switch git-memory HTTP integration tests to OpenAI model handles and add assertions that system prompt content remains stable after normal turns and direct block value updates until explicit recompilation or reset. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve git-memory formatting and enforce lock conflicts Preserve existing markdown frontmatter formatting on block updates while still ensuring required metadata fields exist, and make post-push git sync propagate memory-repo lock conflicts as 409 responses. Also enable slash-containing core-memory block labels in route params and add regression coverage. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): fail closed on memory repo lock contention Make memfs git commits fail closed when the per-agent Redis lock cannot be acquired, return 409 MEMORY_REPO_BUSY from the memfs files write API, and map that 409 back to core MemoryRepoBusyError so API callers receive consistent busy conflicts. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore(core): minimize git-memory fix scope to memfs lock and frontmatter paths 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: drop unrelated changes and keep memfs-focused scope Revert branch-only changes that are not required for the memfs lock contention and frontmatter-preservation fix so the PR contains only issue-relevant files. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(memfs): lock push sync path and improve nested sync diagnostics Serialize memfs push-to-GCS sync with the same per-agent Redis lock key used by API commits, and add targeted post-push nested-block diagnostics plus a focused nested-label sync regression test for _sync_after_push. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	d7793a4474	fix(core): stabilize system prompt refresh and expand git-memory coverage (#9438 ) * fix(core): stabilize system prompt refresh and expand git-memory coverage Only rebuild system prompts on explicit refresh paths so normal turns preserve prefix-cache stability, including git/custom prompt layouts. Add integration coverage for memory filesystem tree structure and recompile/reset system-message updates via message-id retrieval. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): recompile system prompt around compaction and stabilize source tests Force system prompt refresh before/after compaction in LettaAgentV3 so repaired system+memory state is used and persisted across subsequent turns. Update source-system prompt tests to explicitly recompile before raw preview assertions instead of assuming automatic rebuild timing. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	e3dbb44fc9	feat(telem): support reading from clickhouse traces (#9431 ) draft	2026-02-24 10:52:07 -08:00
Kian Jones	6d4e320cc3	chore: clean up dead code (#9427 ) remove striaght up unsued code files	2026-02-24 10:52:07 -08:00
Kian Jones	ddcfeb26b1	fix(core): catch all MCP tool execution errors instead of re-raising (#9419 ) * fix(core): catch all MCP tool execution errors instead of re-raising MCP tools are external user-configured servers - any failure during tool execution is expected and should be returned as (error_msg, False) to the agent, not raised as an exception that hits Datadog as a 500. Previously: - base_client.py only caught McpError/ToolError, re-raised everything else - fastmcp_client.py (both SSE and StreamableHTTP) always re-raised Now all three execute_tool() methods catch all exceptions and return the error message to the agent conversation. The agent handles tool failures via the error message naturally. This silences ~15 Datadog issue types including: - fastmcp.exceptions.ToolError (validation, permissions) - mcp.shared.exceptions.McpError (connection closed, credentials) - httpx.HTTPStatusError (503 from Zapier, etc.) - httpx.ConnectError, ReadTimeout, RemoteProtocolError - requests.exceptions.ConnectionError - builtins.ConnectionError 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): log unexpected MCP errors at warning level with traceback Expected MCP errors (ToolError, McpError, httpx.*, ConnectionError, etc.) log at info level. Anything else (e.g. TypeError, AttributeError from our own code) logs at warning with exc_info=True so it still surfaces in Datadog without crashing the request. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	382e216cbb	fix(core): differentiate BYOK vs base provider in all LLM error details (#9425 ) Add is_byok flag to every LLMError's details dict returned from handle_llm_error across all providers (OpenAI, Anthropic, Google, ChatGPT OAuth). This enables observability into whether errors originate from Letta's production keys or user-provided BYOK keys. The rate limit handler in app.py now returns a more helpful message for BYOK users ("check your provider's rate limits and billing") versus the generic message for base provider rate limits. Datadog issues: - https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
jnjpng	39b25a0e3c	fix: update ContextWindowCalculator to parse new system message sections (#9398 ) * fix: update ContextWindowCalculator to parse new system message sections The context window calculator was using outdated position-based parsing that only handled 3 sections (base_instructions, memory_blocks, memory_metadata). The actual system message now includes additional sections that were not being tracked: - <memory_filesystem> (git-enabled agents) - <tool_usage_rules> (when tool rules configured) - <directories> (when sources attached) Changes: - Add _extract_tag_content() helper for proper XML tag extraction - Rewrite extract_system_components() to return a Dict with all 6 sections - Update calculate_context_window() to count tokens for new sections - Add new fields to ContextWindowOverview schema with backward-compatible defaults - Add unit tests for the extraction logic * update * generate * fix: check attached file in directories section instead of core_memory Files are rendered inside <directories> tags, not <memory_blocks>. Update validate_context_window_overview assertions accordingly. * fix: address review feedback for context window parser - Fix git-enabled agents regression: capture bare file blocks (e.g. <system/human.md>) rendered after </memory_filesystem> as core_memory via new _extract_git_core_memory() method - Make _extract_top_level_tag robust: scan all occurrences to find tag outside container, handling nested-first + top-level-later case - Document system_prompt tag inconsistency in docstring - Add TODO to base_agent.py extract_dynamic_section linking to ContextWindowCalculator to flag parallel parser tech debt - Add tests: git-enabled agent parsing, dual-occurrence tag extraction, pure text system prompt, git-enabled integration test	2026-02-24 10:52:07 -08:00
Sarah Wooders	2ffef0fb31	Fix git-memory context preview parsing (#9414 ) * fix(core): handle git memory label prefix collisions in filesystem view Prevent context window preview crashes when a block label is both a leaf and a prefix (e.g. system/human and system/human/context) by rendering a node as both file and directory. Add regression test. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): parse git-backed core memory in context window preview ContextWindowCalculator.extract_system_components now detects git-backed memory rendering (<memory_filesystem> and <system/...> tags) when <memory_blocks> wrapper is absent, so core_memory is populated in the context preview. Add regression tests. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Shubham Naik	ca32311b9a	feat: allow users to specify via query to stip messages [LET-7392] (#9411 ) * feat: allow users to specify via query to stip messages * chore: regenerate API SDK and OpenAPI spec [LET-7392] 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Ari Webb <AriWebb@users.noreply.github.com> Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Ari Webb <AriWebb@users.noreply.github.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	0dde155e9a	feat: Prefix cache optimization system prompt (#9381 )	2026-02-24 10:52:07 -08:00
Kian Jones	7eb85707b1	feat(tf): gpu runners and prod memory_repos (#9283 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai_compat_base_url for ollama/vllm/lmstudio providers When reconstructing LLMConfig from a model handle lookup, use the provider's openai_compat_base_url (which includes /v1) instead of raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio since OpenAI client expects /v1/chat/completions endpoint. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add memfs-py in prod bucket access * change ollama * change packer model defaults * self-hosted provider support * diasble reasoner to match the number of messages in test case, enable parallel tool calls, and pass embedding configs * remove reasoning setting not supported for ollama * add qwen3 to extra assistant message case * lower temp * prep for lmstudio and vllm * used lmstudio_openai client * skip parallel tool calls on cpu ran provider lmstudio * revert downgrade since it's so slow already * add reuired flags for tool call parsing etc. * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * upgrade vllm to latest container * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	f9f1c55c93	fix: fix context preview for git (#9403 )	2026-02-24 10:52:07 -08:00
Sarah Wooders	bbc648909b	refactor: drop memory/ prefix from git memory repo file paths and update core memory rendering [LET-7356] (#9395 )	2026-02-24 10:52:07 -08:00
Kian Jones	6e0e1cc312	fix(core): validate run exists before creating step/step_metrics (#9382 ) Checks if the referenced run_id exists in the runs table before inserting steps and step_metrics. If the run doesn't exist (deleted or failed creation), sets run_id to None instead of hitting ForeignKeyViolationError on fk_steps_run_id. Fixes https://us5.datadoghq.com/error-tracking/issue/a1768774-d691-11f0-9330-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	71e0a8aab9	fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync (#9342 ) * fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync Replaces try/except around model.create_async() with pg_insert() .on_conflict_do_nothing() to prevent UniqueViolationError from being raised at the asyncpg driver level during concurrent model syncs. The previous approach caught the exception in Python but ddtrace still captured it at the driver level, causing Datadog error tracking noise. Fixes Datadog issue d8dec148-d535-11f0-95eb-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleaner impl * fix --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	526da4c49b	Revert "perf: optimize prefix caching by skipping system prompt rebuild on every step" (#9380 ) Revert "perf: optimize prefix caching by skipping system prompt rebuild on ev…" This reverts commit eafa4144c2577a45b7007a177b701863b98d1dfa.	2026-02-24 10:52:07 -08:00

1 2 3 4 5 ...

1852 Commits