letta-server

Author	SHA1	Message	Date
Sarah Wooders	0dde155e9a	feat: Prefix cache optimization system prompt (#9381 )	2026-02-24 10:52:07 -08:00
Kian Jones	7eb85707b1	feat(tf): gpu runners and prod memory_repos (#9283 ) * add gpu runners and prod memory_repos * add lmstudio and vllm in model_settings * fix llm_configs and change variable name in reusable workflow and change perms for memory_repos to admin in tf * fix: update self-hosted provider tests to use SDK 1.0 and v2 tests - Update letta-client from ==0.1.324 to >=1.0.0 - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai provider_type for self-hosted model settings ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema - they use openai-compatible APIs so provider_type should be openai. The provider routing is determined by the handle prefix. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use openai_compat_base_url for ollama/vllm/lmstudio providers When reconstructing LLMConfig from a model handle lookup, use the provider's openai_compat_base_url (which includes /v1) instead of raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio since OpenAI client expects /v1/chat/completions endpoint. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: enable redis for ollama/vllm/lmstudio tests Background streaming tests require Redis. Add use-redis: true to self-hosted provider test workflows. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * add memfs-py in prod bucket access * change ollama * change packer model defaults * self-hosted provider support * disable reasoner to match the number of messages in test case, enable parallel tool calls, and pass embedding configs * remove reasoning setting not supported for ollama * add qwen3 to extra assistant message case * lower temp * prep for lmstudio and vllm * used lmstudio_openai client * skip parallel tool calls on cpu ran provider lmstudio * revert downgrade since it's so slow already * add required flags for tool call parsing etc. * change tool call parser from hermes to qwen3_xml * qwen3_xmlk -> qwen3_coder * upgrade vllm to latest container * revert to hermes (incompatible with parallel tool calls?) and skipping vllm tests on parallel tool calls * install uv redis extra * remove lmstudio --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	f9f1c55c93	fix: fix context preview for git (#9403 )	2026-02-24 10:52:07 -08:00
Sarah Wooders	bbc648909b	refactor: drop memory/ prefix from git memory repo file paths and update core memory rendering [LET-7356] (#9395 )	2026-02-24 10:52:07 -08:00
Kian Jones	6e0e1cc312	fix(core): validate run exists before creating step/step_metrics (#9382 ) Checks if the referenced run_id exists in the runs table before inserting steps and step_metrics. If the run doesn't exist (deleted or failed creation), sets run_id to None instead of hitting ForeignKeyViolationError on fk_steps_run_id. Fixes https://us5.datadoghq.com/error-tracking/issue/a1768774-d691-11f0-9330-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	71e0a8aab9	fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync (#9342 ) * fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync Replaces try/except around model.create_async() with pg_insert() .on_conflict_do_nothing() to prevent UniqueViolationError from being raised at the asyncpg driver level during concurrent model syncs. The previous approach caught the exception in Python but ddtrace still captured it at the driver level, causing Datadog error tracking noise. Fixes Datadog issue d8dec148-d535-11f0-95eb-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleaner impl * fix --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Ari Webb <ari@letta.com>	2026-02-24 10:52:07 -08:00
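The idea behind 71e0a8aab9 (let the database skip rows that already exist instead of catching a unique-violation in Python) can be sketched as follows. This is a minimal illustration, not the commit's code: it uses the standard-library `sqlite3` module in place of Postgres with `pg_insert().on_conflict_do_nothing()`, and the `provider_models` table and `sync_models` helper are hypothetical names.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the unique constraint is what the insert conflicts on.
    conn.execute(
        "CREATE TABLE provider_models ("
        "provider_id TEXT NOT NULL, name TEXT NOT NULL, "
        "UNIQUE (provider_id, name))"
    )


def sync_models(conn: sqlite3.Connection, provider_id: str, names: list[str]) -> None:
    # Existing (provider_id, name) pairs are skipped by the database itself,
    # so no unique-violation error is ever raised at the driver level.
    conn.executemany(
        "INSERT INTO provider_models (provider_id, name) VALUES (?, ?) "
        "ON CONFLICT (provider_id, name) DO NOTHING",
        [(provider_id, name) for name in names],
    )
    conn.commit()


def count_models(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM provider_models").fetchone()[0]
```

Running two overlapping syncs back to back leaves one row per distinct model and raises nothing, which is the property the commit relies on to keep error tracking quiet.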
Sarah Wooders	526da4c49b	Revert "perf: optimize prefix caching by skipping system prompt rebuild on every step" (#9380 ) Revert "perf: optimize prefix caching by skipping system prompt rebuild on ev…" This reverts commit eafa4144c2577a45b7007a177b701863b98d1dfa.	2026-02-24 10:52:07 -08:00
Sarah Wooders	9dbe28e8f1	perf: optimize prefix caching by skipping system prompt rebuild on every step (#9080 )	2026-02-24 10:52:07 -08:00
Kian Jones	47aedfa1a7	fix(core): convert MCP ConnectionError to LettaMCPConnectionError for proper HTTP 502 responses (#9364 ) MCP server connection failures were raising Python's builtin ConnectionError, which bypassed the LettaMCPConnectionError FastAPI exception handler and hit Datadog as unhandled 500 errors. Now all MCP client classes convert ConnectionError to LettaMCPConnectionError at the source, which the existing exception handler returns as a user-friendly 502. Datadog: https://us5.datadoghq.com/error-tracking/issue/93db4a82-fe5a-11f0-85f0-da7ad0900000 🐛 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Charles Packer	f58c4a43fa	fix(core): remove "edit again if necessary" from memory tool return message (#9367 ) Models (especially Opus) take this instruction literally and re-call the memory edit tool in a loop — one user saw 96 consecutive rethink calls. Dropping the sentence stops the feedback loop while still asking the agent to review the result. 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	369cdf72c7	feat(core): store block metadata as YAML frontmatter in .md files (#9365 ) * feat(core): store block metadata as YAML frontmatter in .md files Block .md files in git repos now embed metadata (description, limit, read_only, metadata dict) as YAML frontmatter instead of a separate metadata/blocks.json file. Only non-default values are rendered. Format: --- description: "Who I am" limit: 5000 --- Block value content here... Changes: - New block_markdown.py utility (serialize_block / parse_block_markdown) - Updated all three write/read paths: manager.py, memfs_client.py, memfs_client_base.py - block_manager_git.py now passes description/limit/read_only/metadata through to git commits - Post-push sync (git_http.py) parses frontmatter and syncs metadata fields to Postgres - Removed metadata/blocks.json reads/writes entirely - Backward compat: files without frontmatter treated as raw value - Integration test verifies frontmatter in cloned files and metadata sync via git push 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: derive frontmatter defaults from BaseBlock schema, not hardcoded dict Remove _DEFAULTS dict from block_markdown.py. The core version now imports BaseBlock and reads field defaults via model_fields. This fixes the limit default (was 5000, should be CORE_MEMORY_BLOCK_CHAR_LIMIT=20000). Also: - memfs-py copy simplified to parse-only (no serialize, no letta imports) - All hardcoded limit=5000 fallbacks replaced with CORE_MEMORY_BLOCK_CHAR_LIMIT - Test updated: blocks with all-default metadata correctly have no frontmatter; frontmatter verified after setting non-default description via API 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: always include description and limit in frontmatter description and limit are always rendered in the YAML frontmatter, even when at their default values. 
Only read_only and metadata are conditional (omitted when at defaults). 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: resolve read_only from block_update before git commit read_only was using the old Postgres value instead of the update value when committing to git. Also adds integration test coverage for read_only: true appearing in frontmatter after API PATCH, and verifying it's omitted when false (default). 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add API→git round-trip coverage for description and limit Verifies that PATCH description/limit via API is reflected in frontmatter after git pull. Combined with the existing push→API test (step 6), this gives full bidirectional coverage: - API edit description/limit → pull → frontmatter updated - Push frontmatter with description/limit → API reflects changes 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
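The frontmatter layout described in 369cdf72c7 can be sketched with the standard library alone. The function names echo the helpers the commit mentions (`serialize_block` / `parse_block_markdown`), but the signatures and bodies below are assumptions, and the hand-rolled JSON-compatible scalars stand in for whatever YAML handling the real `block_markdown.py` uses.

```python
import json

DEFAULT_LIMIT = 20000  # CORE_MEMORY_BLOCK_CHAR_LIMIT, per the commit message


def serialize_block(value, description=None, limit=DEFAULT_LIMIT,
                    read_only=False, metadata=None):
    # description and limit are always rendered; read_only and metadata
    # only appear when they differ from their defaults.
    lines = ["---", f"description: {json.dumps(description)}", f"limit: {limit}"]
    if read_only:
        lines.append("read_only: true")
    if metadata:
        lines.append(f"metadata: {json.dumps(metadata)}")
    lines.append("---")
    return "\n".join(lines) + "\n" + value


def parse_block_markdown(text):
    # Backward compatibility: a file without frontmatter is the raw value.
    if not text.startswith("---\n"):
        return {"value": text}
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return {"value": text}
    parsed = {"value": body}
    for line in header.splitlines():
        key, _, raw = line.partition(":")
        parsed[key.strip()] = json.loads(raw.strip())
    return parsed
```

Writing scalars as JSON keeps each field on one line and happens to be valid YAML flow syntax, which is why this sketch can avoid a YAML dependency; it only reads back files it wrote itself.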
cthomas	9dee331e6c	fix(core): backfill missing blocks when git repo is incomplete (#9340 ) When enable_git_memory_for_agent is called on an agent that already has the git-memory-enabled tag, it was returning early if the repo existed, even if the repo was missing blocks. Now checks if all blocks are present in the repo and backfills any missing ones. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	2cfaba3fe6	fix(core): validate file existence before creating files_agents rows (#9341 ) Prevents ForeignKeyViolationError when attaching files to agents where the file has been deleted between listing and attachment (race condition). Now validates file IDs exist in the files table before inserting, and skips any missing files with a warning log. Fixes Datadog issue a1768774-d691-11f0-9330-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	cb982cae89	fix(core): skip duplicate block attachment to prevent UniqueViolationError (#9343 ) Check if a block is already attached to an agent before appending to core_memory. Prevents asyncpg UniqueViolationError on the unique_agent_block constraint when attach_block_async is called twice with the same (agent_id, block_id) pair. Fixes Datadog issue d8dec148-d535-11f0-95eb-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	93e453ef8f	fix(core): transform nested block labels on git memory enable (#9339 ) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	98fa16899b	fix(core): handle ExceptionGroup-wrapped ToolError in MCP clients (#9329 ) Extends the ExceptionGroup unwrapping fix from mcp_tool_executor to the base MCP client implementations (AsyncBaseMCPClient, AsyncFastMCPSSEClient, AsyncFastMCPStreamableHTTPClient). When ToolError exceptions are wrapped in ExceptionGroup by Python's async TaskGroup, the exception handler now unwraps single-exception groups before checking class names. This prevents wrapped ToolError exceptions from being logged to Datadog as unexpected errors instead of being handled as expected validation failures. Related to commit 1cbf1b231 which fixed the same issue in mcp_tool_executor. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	05b77a5fed	fix(core): update memfs client to use memory/ directory (#9322 ) PR #9309 changed the block storage from blocks/ to memory/ directory. Update memfs_client.py and memfs_client_base.py to match. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	662ec082cf	fix(core): handle MCP errors and API key whitespace (#9306 ) * fix: strip whitespace from API keys in LLM client headers Fixes httpx.LocalProtocolError when API keys contain leading/trailing whitespace. Strips whitespace from API keys before using them in HTTP headers across: - OpenAI client (openai.py) - Mistral client (mistral.py) - Anthropic client (anthropic_client.py) - Anthropic schema provider (schemas/providers/anthropic.py) - Google AI client (google_ai_client.py) - Proxy helpers (proxy_helpers.py) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle McpError gracefully in MCP client execute_tool Return error as failed result instead of re-raising to avoid Datadog alerts for expected user-facing errors like missing tool arguments. * fix: strip whitespace from API keys before passing to httpx client Fixes httpx.LocalProtocolError by stripping leading/trailing whitespace from API keys before passing them to OpenAI/AsyncOpenAI clients. The OpenAI client library constructs Authorization headers internally, and invalid header values (like keys with leading spaces) cause protocol errors. Applied fix to: - azure_client.py (AzureOpenAI/AsyncAzureOpenAI) - deepseek_client.py (OpenAI/AsyncOpenAI) - openai_client.py (OpenAI/AsyncOpenAI via kwargs) - xai_client.py (OpenAI/AsyncOpenAI) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle JSONDecodeError in OpenAI client requests Catches json.JSONDecodeError from OpenAI SDK when API returns invalid JSON (typically HTML error pages from 500-series errors) and converts to LLMServerError with helpful details. 
🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): strip API key whitespace at schema level on write/create Add field_validator to ProviderCreate, ProviderUpdate, and ProviderCheck schemas to strip whitespace from api_key and access_key fields before persistence. This ensures keys are clean at the point of entry, preventing whitespace from being encrypted and stored in the database. Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> * refactor: remove api_key.strip() calls across all LLM clients Remove redundant .strip() calls on api_key parameters since pydantic models now handle whitespace trimming at the validation layer. This centralizes the validation logic and follows DRY principles. - Updated 13 files across multiple LLM client implementations - Removed 34 occurrences of api_key.strip() - Includes: OpenAI, Anthropic, Azure, Google AI, Groq, XAI, DeepSeek, ZAI, Together, Mistral - Also updated proxy helpers and provider schemas 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: remove redundant ternary operators from api_key parameters Remove `if api_key else None` ternaries since pydantic validation ensures api_key is either a valid string or None. The ternary was defensive programming that's now unnecessary with proper model-level validation. - Simplified 23 occurrences across 7 files - Cleaner, more concise client initialization code - No behavioral change since pydantic already handles this 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>	2026-02-24 10:52:06 -08:00
cthomas	0bdd555f33	feat: add memfs-py service (#9315 ) * feat: add memfs-py service * add tf for bucket access and secrets v2 access * feat(memfs): add helm charts, deploy workflow, and bug fixes - Add dev helm chart (helm/dev/memfs-py/) with CSI secrets pattern - Update prod helm chart with CSI secrets and correct service account - Add GitHub Actions deploy workflow - Change port from 8284 to 8285 to avoid conflict with core's dulwich sidecar - Fix chunked transfer encoding issue (strip HTTP_TRANSFER_ENCODING header) - Fix timestamp parsing to handle both ISO and HTTP date formats - Fix get_head_sha to raise FileNotFoundError on 404 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Kian Jones <kian@letta.com> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	21e880907f	feat(core): structure memory directory and block labels [LET-7336] (#9309 )	2026-02-24 10:52:06 -08:00
Kian Jones	be60697a62	fix(core): handle protocol errors and foreign key violations (#9308 ) * fix(core): handle PermissionDeniedError in provider API key validation Fixed OpenAI PermissionDeniedError being raised as unknown error when validating provider API keys. The check_api_key methods in OpenAI-based providers (OpenAI, OpenRouter, Azure, Together) now properly catch and re-raise PermissionDeniedError as LLMPermissionDeniedError. 🐛 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle Unicode surrogates in OpenAI requests Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API. Fixes UnicodeEncodeError when message content contains unpaired surrogates from corrupted emoji data or malformed Unicode sequences. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle MCP tool schema validation errors gracefully Catch fastmcp.exceptions.ToolError in execute_mcp_tool endpoint and convert to LettaInvalidArgumentError (400) instead of letting it propagate as 500 error. This is an expected user error when tool arguments don't match the MCP tool's schema. Fixes Datadog issue 8f2d874a-f8e5-11f0-9b25-da7ad0900000 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): handle ExceptionGroup-wrapped ToolError in MCP executor When MCP tools fail with validation errors (e.g., missing required parameters), fastmcp raises ToolError exceptions that may be wrapped in ExceptionGroup by Python's async TaskGroup. The exception handler now unwraps single-exception groups before checking if the error should be handled gracefully. Fixes Calendly API "organization parameter missing" errors being logged to Datadog instead of returning friendly error messages to users. 
🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: handle missing agent in create_conversation to prevent foreign key violation * Update .gitignore --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	6f746c5225	fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305 ) * fix: handle Anthropic overloaded_error in streaming interfaces * fix: handle Unicode surrogates in OpenAI requests Sanitize Unicode surrogate pairs before sending requests to OpenAI API. Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause UnicodeEncodeError when encoding to UTF-8. Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in position 326605: surrogates not allowed * fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests Improved sanitize_unicode_surrogates() to explicitly filter out lone surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8. Previous implementation used errors='ignore' which could still fail in edge cases. New approach directly checks Unicode code points and removes any surrogates before data reaches httpx encoding. Also added sanitization to stream_async_responses() method which was missing it. Fixes: 'utf-8' codec can't encode character '\ud83c' in position X: surrogates not allowed	2026-02-24 10:52:06 -08:00
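The code-point filtering approach from 6f746c5225 can be illustrated as below. The function shares its name with the `sanitize_unicode_surrogates()` helper mentioned in the commit, but this body is a guess at the technique, not the project's implementation; in particular, recursing through dicts and lists is an assumption about how request payloads are walked.

```python
def sanitize_unicode_surrogates(value):
    """Drop surrogate code points (U+D800 to U+DFFF) from strings in a payload."""
    if isinstance(value, str):
        # A Python str holding a surrogate cannot be encoded as UTF-8,
        # so remove those code points before the data reaches the HTTP client.
        return "".join(ch for ch in value if not 0xD800 <= ord(ch) <= 0xDFFF)
    if isinstance(value, dict):
        return {key: sanitize_unicode_surrogates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_unicode_surrogates(item) for item in value]
    return value
```

Well-formed emoji are single code points above U+FFFF in a Python `str`, so they pass through untouched; only the broken halves are removed.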
jnjpng	0bdedb3c0f	feat: agent generate endpoint (#9304 ) * base * update * clean up * update	2026-02-24 10:52:06 -08:00
Ari Webb	5c6ca705f1	Revert "feat: bring back use message packing for timezone [LET-6846]" (#9302 ) Revert "feat: bring back use message packing for timezone [LET-6846] (#9256)" This reverts commit c5017cccdef95b84fc585b26a0ddc5b7e44eb7c9.	2026-02-24 10:52:06 -08:00
jnjpng	ff69c6a32e	feat: add /agents/{agent_id}/generate endpoint for direct LLM requests (#9272 ) * feat: add /agents/{agent_id}/generate endpoint for direct LLM requests Add new endpoint that makes direct LLM provider requests without agent context, memory, tools, or state modification. This enables: - Quick LLM queries without agent overhead - Testing model configurations - Simple chat completions using agent's credentials - Comparing responses across different models Features: - Uses agent's LLM config by default - Supports model override with full provider config resolution - Non-streaming, stateless operation - Proper error handling and validation - Request/response schemas with Pydantic validation Implementation: - Add GenerateRequest and GenerateResponse schemas - Implement generate_completion endpoint handler - Add necessary imports (LLMError, LLMClient, HandleNotFoundError) - Include logging and comprehensive error handling * fix: improve error handling and fix Message construction - Fix critical bug: use content=[TextContent(text=...)] instead of text=... - Add explicit error handling for NoResultFound and HandleNotFoundError - Add error handling for convert_response_to_chat_completion - Add structured logging for debugging - Remove unnecessary .get() calls since Pydantic validates messages * refactor: extract generate logic to AgentCompletionService Move the generate endpoint business logic out of the endpoint handler into a dedicated AgentCompletionService class for better code organization and separation of concerns. 
Changes: - Create new AgentCompletionService in services/agent_completion_service.py - Service handles all business logic: agent validation, LLM config resolution, message conversion, LLM client creation, and request/response processing - Integrate service with SyncServer initialization - Refactor generate_completion endpoint to use the service - Endpoint now only handles HTTP concerns (auth, error mapping) Benefits: - Cleaner endpoint code (reduced from ~140 lines to ~25 lines) - Better separation of concerns (HTTP vs business logic) - Service logic can be reused or tested independently - Follows established patterns in the codebase (AgentManager, etc.) * feat: simplify generate API to accept just prompt text Simplify the client interface by accepting a simple prompt string instead of requiring clients to format messages. Changes: - Update GenerateRequest schema: - Replace 'messages' array with simple 'prompt' string - Add optional 'system_prompt' for context/instructions - Keep 'override_model' for model selection - Update AgentCompletionService to format messages automatically: - Accepts prompt and optional system_prompt - Constructs message array internally (system + user messages) - Simpler API surface for clients - Update endpoint documentation with new simplified examples - Regenerate OpenAPI spec and TypeScript SDK Benefits: - Much simpler client experience - just send text - No need to understand message formatting - Still supports system prompts for context - Cleaner API that matches common use cases Example (before): { "messages": [{"role": "user", "content": "What is 2+2?"}] } Example (after): { "prompt": "What is 2+2?" 
} * test: add comprehensive integration tests for generate endpoint Add 9 integration tests covering various scenarios: Happy path tests: - test_agent_generate_basic: Basic prompt -> response flow - test_agent_generate_with_system_prompt: System prompt + user prompt - test_agent_generate_with_model_override: Override model selection - test_agent_generate_long_prompt: Handle longer prompts - test_agent_generate_no_persistence: Verify no messages saved to agent Error handling tests: - test_agent_generate_empty_prompt_error: Empty prompt validation (422) - test_agent_generate_invalid_agent_id: Invalid agent ID (404) - test_agent_generate_invalid_model_override: Invalid model handle (404) All tests verify: - Response structure (content, model, usage) - Proper status codes for errors - Usage statistics (tokens, counts) - No side effects on agent state Tests follow existing test patterns in test_client.py and use the letta_client SDK (assuming generate_completion method is auto-generated from the OpenAPI spec). * openapi * refactor: rename AgentCompletionService to AgentGenerateCompletionManager Rename for better clarity and consistency with codebase naming conventions: - Rename file: agent_completion_service.py → agent_generate_completion_manager.py - Rename class: AgentCompletionService → AgentGenerateCompletionManager - Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager - Update docstrings: 'Service' → 'Manager' Changes: - apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class) - apps/core/letta/server/server.py (import + initialization) - apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint) No functional changes, purely a naming refactor. * fix: remove invalid Message parameters in generate manager Remove agent_id=None and user_id=None from Message construction. The Message model doesn't accept these as None values - only pass required parameters (role, content). 
Fixes validation error: 'Extra inputs are not permitted [type=extra_forbidden, input_value=None]' This aligns with other Message construction patterns in the codebase (see tools.py, memory.py examples). * feat: improve generate endpoint validation and tests - Add field validator for whitespace-only prompts - Always include system message (required by Anthropic) - Use default "You are a helpful assistant." when no system_prompt provided - Update tests to use direct HTTP calls via httpx - Fix test issues: - Use valid agent ID format (agent-{uuid}) - Use available model (openai/gpt-4o-mini) - Add whitespace validation test - All 9 integration tests passing	2026-02-24 10:52:06 -08:00
cthomas	530d33c254	feat: add skills support to agentfile (#9287 )	2026-02-24 10:52:06 -08:00
Sarah Wooders	e0a23f7039	feat: add usage columns to steps table (#9270 ) * feat: add usage columns to steps table Adds denormalized usage fields to the steps table for easier querying: - model_handle: The model handle (e.g., "openai/gpt-4o-mini") - cached_input_tokens: Tokens served from cache - cache_write_tokens: Tokens written to cache (Anthropic) - reasoning_tokens: Reasoning/thinking tokens These fields mirror LettaUsageStatistics and are extracted from the existing prompt_tokens_details and completion_tokens_details JSON columns. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate OpenAPI specs and SDK for usage columns 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com> Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>	2026-02-24 10:52:06 -08:00
Ari Webb	426f6a8ca4	feat: bring back use message packing for timezone [LET-6846] (#9256 ) * feat: bring back use message packing for timezone * add tests	2026-02-24 10:52:06 -08:00
Kian Jones	203b6ead7c	fix: remove duplicate provider trace logging and dead code (#9278 ) Provider traces were being created twice per step: 1. Via `request_async_with_telemetry` / `log_provider_trace_async` in LLMClient 2. Via direct `create_provider_trace_async` calls in LettaAgent This caused duplicate records in provider_trace_metadata (Postgres) and llm_traces (ClickHouse) for every agent step. Changes: - Remove redundant direct `create_provider_trace_async` calls from letta_agent.py - Remove no-op `stream_async_with_telemetry` method (was just a pass-through to `stream_async`) - Update callers to use `stream_async` directly 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	a2993da29a	chore: bump base timeout on upload from letta to crouton (#9275 ) bump base timeout since we'll fail on huge payloads at 5s	2026-02-24 10:52:06 -08:00
Sarah Wooders	50a60c1393	feat: git smart HTTP for agent memory repos (#9257 ) * feat(core): add git-backed memory repos and block manager Introduce a GCS-backed git repository per agent as the source of truth for core memory blocks. Add a GitEnabledBlockManager that writes block updates to git and syncs values back into Postgres as a cache. Default newly-created memory repos to the `main` branch. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat(core): serve memory repos over git smart HTTP Run dulwich's WSGI HTTPGitApplication on a local sidecar port and proxy /v1/git/* through FastAPI to support git clone/fetch/push directly against GCS-backed memory repos. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): create memory repos on demand and stabilize git HTTP - Ensure MemoryRepoManager creates the git repo on first write (instead of 500ing) and avoids rewriting history by only auto-creating on FileNotFoundError. - Simplify dulwich-thread async execution and auto-create empty repos on first git clone. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): make dulwich optional for CI installs Guard dulwich imports in the git smart HTTP router so the core server can boot (and CI tests can run) without installing the memory-repo extra. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): guard git HTTP WSGI init when dulwich missing Avoid instantiating dulwich's HTTPGitApplication at import time when dulwich isn't installed (common in CI installs). 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): avoid masking send_message errors in finally Initialize `result` before the agent loop so error paths (e.g. approval validation) don't raise UnboundLocalError in the run-tracking finally block. 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): stop event loop watchdog on FastAPI shutdown Ensure the EventLoopWatchdog thread is stopped during FastAPI lifespan shutdown to avoid daemon threads logging during interpreter teardown (seen in CI unit tests). 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore(core): remove send_message_to_agent from SyncServer Drop send_message_to_agent and send_group_message_to_agent from SyncServer and route internal fire-and-forget messaging through send_messages helpers instead. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): backfill git memory repo when tag added When an agent is updated to include the git-memory-enabled tag, ensure the git-backed memory repo is created and initialized from the agent's current blocks. Also support configuring the memory repo object store via LETTA_OBJECT_STORE_URI. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve block tags on git-enabled updates When updating a block for a git-memory-enabled agent, keep block tags in sync with PostgreSQL (tags are not currently stored in the git repo). 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore(core): remove git-state legacy shims - Rename optional dependency extra from memory-repo to git-state - Drop legacy object-store env aliases and unused region config - Simplify memory repo metadata to a single canonical format - Remove unused repo-cache invalidation helper 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): keep PR scope for git-backed blocks - Revert unrelated change in fire-and-forget multi-agent send helper - Route agent block updates-by-label through injected block manager only when needed 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
amysguan	16c96cc3c0	Fix sliding window cutoff logic (#9261 ) * fix sliding window cutoff calculations to use agent instead of summarizer config * allow approval messages with tool_calls as valid cutoffs, prevent approval pairs from being split * update tests with updated sliding window parameters --------- Co-authored-by: Amy Guan <amy@letta.com>	2026-02-24 10:52:06 -08:00
Kian Jones	00b36bc591	fix: resolve crouton telemetry failures (#9269 ) Two issues were causing telemetry failures: 1. Startup race - memgpt-server sending telemetry before crouton created socket 2. Oversized payloads - large context windows (1M+ tokens) exceeding buffer Changes: - Increase crouton buffer to 128MB max with lazy allocation (64KB initial) - Bump crouton resources (512Mi limit, 128Mi request) - Add retry with exponential backoff in socket backend - Move crouton to initContainers with restartPolicy: Always for deterministic startup 🐙 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	eaf64fb510	fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258 ) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	24e51f611c	fix: check for actual error content in ClickHouse traces (#9260 ) fix: check for actual error content, not just "error" key presence OpenAI Responses API returns {"error": null} on success, which incorrectly triggered is_error=True. Now we check if error_data is truthy rather than just checking key existence. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
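The distinction drawn in #9260 (key presence versus actual error content) can be sketched in a few lines of Python. This is only an illustration: the function name `is_error_response` and the example payloads are hypothetical and are not taken from the repository.

```python
from typing import Any, Mapping


def is_error_response(response: Mapping[str, Any]) -> bool:
    """Return True only when the "error" field carries actual content."""
    # Key presence alone is not enough: {"error": None} is a success payload,
    # so the value itself is tested for truthiness.
    error_data = response.get("error")
    return bool(error_data)


print(is_error_response({"error": None}))
print(is_error_response({"error": {"message": "rate limited"}}))
```

A check written as `"error" in response` would flag the first payload as an error, which is the behavior the commit message describes fixing.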
jnjpng	f48b60634f	refactor: extract compact logic to shared function for temporal (#9249 ) * refactor: extract compact logic to shared function Extract the compaction logic from LettaAgentV3.compact() into a standalone compact_messages() function that can be shared between the agent and temporal workflows. Changes: - Create apps/core/letta/services/summarizer/compact.py with: - compact_messages(): Core compaction logic - build_summarizer_llm_config(): LLM config builder for summarization - CompactResult: Dataclass for compaction results - Update LettaAgentV3.compact() to use compact_messages() - Update temporal summarize_conversation_history activity to use compact_messages() instead of the old Summarizer class - Add use_summary_role parameter to SummarizeParams This ensures consistent summarization behavior across different execution paths and prevents drift as we improve the implementation. * chore: clean up verbose comments * fix: correct CompactionSettings import path * fix: correct count_tokens import from summarizer_sliding_window * fix: update test patch path for count_tokens_with_tools After extracting compact logic to compact.py, the test was patching the old location. Update the patch path to the new module location. * fix: update test to use build_summarizer_llm_config from compact.py The function was moved from LettaAgentV3._build_summarizer_llm_config to compact.py as a standalone function. * fix: add early check for system prompt size in compact_messages Check if the system prompt alone exceeds the context window before attempting summarization. The system prompt cannot be compacted, so fail fast with SystemPromptTokenExceededError. * fix: properly propagate SystemPromptTokenExceededError from compact The exception handler in _step() was not setting the correct stop_reason for SystemPromptTokenExceededError, which caused the finally block to return early and swallow the exception. 
Add special handling to set stop_reason to context_window_overflow_in_system_prompt when SystemPromptTokenExceededError is caught. * revert: remove redundant SystemPromptTokenExceededError handling The special handling in the outer exception handler is redundant because stop_reason is already set in the inner handler at line 943. The actual fix for the test was the early check in compact_messages(), not this redundant handling. * fix: correctly re-raise SystemPromptTokenExceededError The inner exception handler was using 'raise e' which re-raised the outer ContextWindowExceededError instead of the current SystemPromptTokenExceededError. Changed to 'raise' to correctly re-raise the current exception. This bug was pre-existing but masked because _check_for_system_prompt_overflow was only called as a fallback. The new early check in compact_messages() exposed it. * revert: remove early check and restore raise e to match main behavior * fix: set should_continue=False and correctly re-raise exception - Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior) - Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError Note: test_large_system_prompt_summarization still fails locally but passes on main. Need to investigate why exception isn't propagating correctly on refactored branch. * fix: add SystemPromptTokenExceededError handler for post-step compaction The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError exception handler. When compact_messages() raised this error, it would be caught by the outer exception handler which would: 1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt" 2. Not set should_continue = False 3. Get swallowed by the finally block (line 1126) which returns early This caused test_large_system_prompt_summarization to fail because the exception never propagated to the test. 
The fix adds the same exception handler pattern used in the retry compaction flow (line 941-946), ensuring proper state is set before re-raising. This issue only affected the refactored code because on main, _check_for_system_prompt_overflow() was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor, compact_messages() is a standalone function that cannot set instance state, so the caller must handle the exception and set the state.	2026-02-24 10:52:06 -08:00
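The `raise e` versus bare `raise` issue mentioned in #9249 is a general Python behavior and can be reproduced in isolation. The sketch below assumes the inner handler did not rebind `e`; the two exception classes are local stand-ins with the same names as in the commit message, not the repository's definitions, and the helper functions are hypothetical.

```python
class ContextWindowExceededError(Exception):
    """Stand-in for the outer error (hypothetical definition)."""


class SystemPromptTokenExceededError(Exception):
    """Stand-in for the inner error (hypothetical definition)."""


def reraise_with_name() -> None:
    try:
        raise ContextWindowExceededError("context window exceeded")
    except ContextWindowExceededError as e:
        try:
            raise SystemPromptTokenExceededError("system prompt too large")
        except SystemPromptTokenExceededError:
            # `e` is still bound to the OUTER exception, so this re-raises
            # ContextWindowExceededError rather than the one just caught.
            raise e


def reraise_bare() -> None:
    try:
        raise ContextWindowExceededError("context window exceeded")
    except ContextWindowExceededError:
        try:
            raise SystemPromptTokenExceededError("system prompt too large")
        except SystemPromptTokenExceededError:
            # A bare `raise` re-raises the exception currently being handled.
            raise


def raised_type(fn) -> str:
    """Call fn and report the name of the exception type that escapes."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return "none"


print(raised_type(reraise_with_name))
print(raised_type(reraise_bare))
```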
Sarah Wooders	3fdf2b6c79	chore: deprecate old agent messaging (#9120 )	2026-02-24 10:52:06 -08:00
Sarah Wooders	4096b30cd7	feat: log LLM traces to clickhouse (#9111 ) * feat: add non-streaming option for conversation messages - Add ConversationMessageRequest with stream=True default (backwards compatible) - stream=true (default): SSE streaming via StreamingService - stream=false: JSON response via AgentLoop.load().step() 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API schema for ConversationMessageRequest * feat: add direct ClickHouse storage for raw LLM traces Adds ability to store raw LLM request/response payloads directly in ClickHouse, bypassing OTEL span attribute size limits. This enables debugging and analytics on large LLM payloads (>10MB system prompts, large tool schemas, etc.). New files: - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget) - letta/services/llm_raw_trace_reader.py: Reader with query methods - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL - apps/core/clickhouse-init.sql: Local dev initialization Modified: - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval) - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry - compose.yaml: Added ClickHouse service for local dev - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false). Uses ZSTD(3) compression for 10-30x reduction on JSON payloads. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: address code review feedback for LLM raw traces Fixes based on code review feedback: 1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port inputs (was defaulting to HTTPS which breaks local dev) 2. 
Make raw trace writes truly fire-and-forget - use asyncio.create_task() instead of awaiting, so JSON serialization doesn't block request path 3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth under load. Drops traces with warning if queue is full. 4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop() 5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it 6. Remove unused imports - json import in reader 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing asyncio import and simplify JSON serialization - Add missing 'import asyncio' that was causing 'name asyncio is not defined' error - Remove unnecessary clean_double_escapes() function - the JSON is stored correctly, the clickhouse-client CLI was just adding extra escaping when displaying - Update just clickhouse-trace to use Python client for correct JSON output 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add clickhouse raw trace integration test * test: simplify clickhouse trace assertions * refactor: centralize usage parsing and stream error traces Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: exercise provider usage parsing live Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present. 
👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: fix usage parsing tests to pass - Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient - Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta) - Add tools=[] for Gemini/Anthropic build_request_data 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: extract_usage_statistics returns LettaUsageStatistics Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add is_byok and llm_config_json columns to ClickHouse traces Extend llm_raw_traces table with: - is_byok (UInt8): Track BYOK vs base provider usage for billing analytics - llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis This enables queries like: - BYOK usage breakdown by provider/model - Config parameter analysis (temperature, max_tokens, etc.) 
- Debugging specific request configurations * feat: add tests for error traces, llm_config_json, and cache tokens - Update llm_raw_trace_reader.py to query new columns (is_byok, cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json) - Add test_error_trace_stored_in_clickhouse to verify error fields - Add test_cache_tokens_stored_for_anthropic to verify cache token storage - Update existing tests to verify llm_config_json is stored correctly - Make llm_config required in log_provider_trace_async() - Simplify provider extraction to use provider_name directly 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add ClickHouse integration tests to CI pipeline - Add use-clickhouse option to reusable-test-workflow.yml - Add ClickHouse service container with otel database - Add schema initialization step using clickhouse-init.sql - Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.) - Add separate clickhouse-integration-tests job running integration_test_clickhouse_llm_raw_traces.py 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify provider and org_id extraction in raw trace writer - Use model_endpoint_type.value for provider (not provider_name) - Simplify org_id to just self.actor.organization_id (actor is always pydantic) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLMRawTraceWriter with _enabled flag - Check ClickHouse env vars once at init, set _enabled flag - Early return in write_async/flush_async if not enabled - Remove ValueError raises (never used) - Simplify _get_client (no validation needed since already checked) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add LLMRawTraceWriter shutdown to FastAPI lifespan Properly flush pending traces on graceful shutdown via lifespan instead of relying only on atexit handler. 
🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add agent_tags column to ClickHouse traces Store agent tags as Array(String) for filtering/analytics by tag. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleanup * fix(ci): fix ClickHouse schema initialization in CI - Create database separately before loading SQL file - Remove CREATE DATABASE from SQL file (handled in CI step) - Add verification step to confirm table was created - Use -sf flag for curl to fail on HTTP errors 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLM trace writer with ClickHouse async_insert - Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop - Sync cloud DDL schema with clickhouse-init.sql (add missing columns) - Remove redundant llm_raw_traces_local.ddl - Remove unused batch_size/flush_interval settings - Update tests for simplified writer Key changes: - async_insert=1, wait_for_async_insert=1 for reliable server-side batching - Simple per-trace retry with exponential backoff (max 3 retries) - ~150 lines removed from writer 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: consolidate ClickHouse direct writes into TelemetryManager backend - Add clickhouse_direct backend to provider_trace_backends - Remove duplicate ClickHouse write logic from llm_client_base.py - Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct The clickhouse_direct backend: - Converts ProviderTrace to LLMRawTrace - Extracts usage stats from response JSON - Writes via LLMRawTraceWriter with async_insert 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: address PR review comments and fix llm_config bug Review comment fixes: - Rename clickhouse_direct -> clickhouse_analytics (clearer purpose) - 
Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml - Delete redundant scripts/test_llm_raw_traces.py (use pytest tests) - Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL) - Fix socket description leak in telemetry_manager docstring - Add cloud-only comment to clickhouse-init.sql - Update justfile to use separate compose file Bug fix: - Fix llm_config not being passed to ProviderTrace in telemetry - Now correctly populates provider, model, is_byok for all LLM calls - Affects both request_async_with_telemetry and log_provider_trace_async DDL optimizations: - Add secondary indexes (bloom_filter for agent_id, model, step_id) - Add minmax indexes for is_byok, is_error - Change model and error_type to LowCardinality for faster GROUP BY 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: rename llm_raw_traces -> llm_traces Address review feedback that "raw" is misleading since we denormalize fields. Renames: - Table: llm_raw_traces -> llm_traces - Schema: LLMRawTrace -> LLMTrace - Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py - Setting: store_llm_raw_traces -> store_llm_traces 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update workflow references to llm_traces Missed renaming table name in CI workflow files. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update clickhouse_direct -> clickhouse_analytics in docstring 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove inaccurate OTEL size limit comments The 4MB limit is our own truncation logic, not an OTEL protocol limit. The real benefit is denormalized columns for analytics queries. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove local ClickHouse dev setup (cloud-only feature) - Delete clickhouse-init.sql and compose.clickhouse.yaml - Remove local clickhouse just commands - Update CI to use cloud DDL with MergeTree for testing clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: restore compose.yaml to match main 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: merge clickhouse_analytics into clickhouse backend Per review feedback - having two separate backends was confusing. Now the clickhouse backend: - Writes to llm_traces table (denormalized for cost analytics) - Reads from OTEL traces table (will cut over to llm_traces later) Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: correct path to DDL file in CI workflow 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: add provider index to DDL for faster filtering 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: configure telemetry backend in clickhouse tests Tests need to set telemetry_settings.provider_trace_backends to include 'clickhouse', otherwise traces are routed to default postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: set provider_trace_backend field, not property provider_trace_backends is a computed property, need to set the underlying provider_trace_backend string field instead. 
🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: error trace test and error_type extraction - Add TelemetryManager to error trace test so traces get written - Fix error_type extraction to check top-level before nested error dict 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use provider_trace.id for trace correlation across backends - Pass provider_trace.id to LLMTrace instead of auto-generating - Log warning if ID is missing (shouldn't happen, helps debug) - Fallback to new UUID only if not set 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: trace ID correlation and concurrency issues - Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse - Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe) - Fix Anthropic prompt_tokens to include cached tokens for cost analytics - Log warning if provider_trace.id is missing 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Caren Thomas <carenthomas@gmail.com>	2026-02-24 10:52:06 -08:00
jnjpng	24ea7dbaed	feat: include tools as part of token estimate in compact (#9242 ) * base * fix	2026-02-24 10:52:06 -08:00
Ari Webb	0bbb9c9bc0	feat: add reasoning zai openrouter (#9189 ) * feat: add reasoning zai openrouter * add openrouter reasoning * stage + publish api * openrouter reasoning always on * revert * fix * remove reference * do	2026-02-24 10:52:06 -08:00
jnjpng	d28ccc0be6	feat: add summary message and event on compaction (#9144 ) * base * update * update * revert formatting * routes * legacy * fix * review * update	2026-02-24 10:52:05 -08:00
Kian Jones	b0f8e16ac0	fix: check both unique constraints in provider model sync (#9193 ) The sync_provider_models_async function was only checking for existing models by (handle, organization_id, model_type) before creating, but the database has a second unique constraint on (name, provider_id, model_type). This caused UniqueConstraintViolationError when a model with the same name/provider already existed under a different handle. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:05 -08:00
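The lookup described in #9193 can be sketched as a pre-insert check against both unique keys. This is a minimal in-memory illustration under stated assumptions: `ModelRow` and `find_existing` are hypothetical names, and the real `sync_provider_models_async` queries the database rather than a Python list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelRow:
    """Hypothetical row carrying the columns named in the two constraints."""
    handle: str
    organization_id: str
    name: str
    provider_id: str
    model_type: str


def find_existing(rows: Iterable[ModelRow], candidate: ModelRow) -> Optional[ModelRow]:
    """Return a row that collides with candidate on either unique key."""
    for row in rows:
        # Constraint 1: (handle, organization_id, model_type)
        same_handle = (row.handle, row.organization_id, row.model_type) == (
            candidate.handle, candidate.organization_id, candidate.model_type)
        # Constraint 2: (name, provider_id, model_type)
        same_name = (row.name, row.provider_id, row.model_type) == (
            candidate.name, candidate.provider_id, candidate.model_type)
        if same_handle or same_name:
            return row
    return None


existing = [ModelRow("prov-a/model-x", "org-1", "model-x", "provider-1", "llm")]
# Same name and provider under a different handle: only the second key collides.
renamed = ModelRow("prov-b/model-x", "org-1", "model-x", "provider-1", "llm")
print(find_existing(existing, renamed) is not None)
```

Checking only the first key would miss the `renamed` candidate and let the insert fail on the second constraint, which matches the failure mode the commit message reports.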
Kevin Lin	3a13c63f60	fix: preserve slashes in memory block paths (#9172 ) Remove .replace("/", "_") from memory block label processing to allow hierarchical memory organization with slashes. 🐾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Kian Jones	c1a02fa180	feat: add metadata-only provider trace storage option (#9155 ) * feat: add metadata-only provider trace storage option Add support for writing provider traces to a lightweight metadata-only table (~1.5GB) instead of the full table (~725GB) since request/response JSON is now stored in GCS. - Add `LETTA_TELEMETRY_PROVIDER_TRACE_PG_METADATA_ONLY` setting - Create `provider_trace_metadata` table via alembic migration - Conditionally write to new table when flag is enabled - Include backfill script for migrating existing data 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API spec and SDK * fix: use composite PK (created_at, id) for provider_trace_metadata Aligns with GCS partitioning structure (raw/date=YYYY-MM-DD/{id}.json.gz) and enables efficient date-range queries via the B-tree index. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ammendments * fix: add bulk data copy to migration Copy existing provider_traces metadata in-migration instead of separate backfill script. Creates indexes after bulk insert for better performance. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: remove data copy from migration, create empty table only Old data stays in provider_traces, new writes go to provider_trace_metadata when flag is enabled. Full traces are in GCS anyway. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: address PR comments - Remove GCS mention from ProviderTraceMetadata docstring - Move metadata object creation outside session context 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: reads always use full provider_traces table The metadata_only flag should only control writes. Reads always go to the full table to avoid returning ProviderTraceMetadata where ProviderTrace is expected. 
🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: enable metadata-only provider trace writes in prod Add LETTA_TELEMETRY_PROVIDER_TRACE_PG_METADATA_ONLY=true to all Helm values (memgpt-server and lettuce-py, prod and dev). 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
amysguan	69cad47e6a	fix: respect enable_reasoner setting from .af imports instead of falling back to model defaults (#9163 ) Co-authored-by: Amy Guan <amy@letta.com>	2026-01-29 12:44:04 -08:00
Ari Webb	a798cc90c4	fix: openrouter provider (#9166 ) * fix: openrouter provider * just stage publish api * web openapi	2026-01-29 12:44:04 -08:00
Ari Webb	9ce1249738	feat: openrouter byok (#9148 ) * feat: openrouter byok * new client is unnecessary * revert json diffs	2026-01-29 12:44:04 -08:00
Shubham Naik	bb2145c24c	connections (#9113 ) * chore: release code * chore: release code * chore: release code * chore: release code * chore: release code * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: change paths * chore: remote * chore: support multi project chat	2026-01-29 12:44:04 -08:00
Kian Jones	42b1e741dc	fix: prevent duplicate block attachment in sleeptime agents (#9150 ) Check if a block with the same label already exists before attaching to sleeptime agents. This prevents UniqueConstraintViolationError on the (agent_id, block_label) constraint when the same block is attached multiple times due to race conditions. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Ari Webb	04e6d668ec	fix: make it so sync updates model_endpoint info (#9138 )	2026-01-29 12:44:04 -08:00

1809 Commits