Commit Graph

6709 Commits

Author SHA1 Message Date
Sarah Wooders
8729a037b9 fix: handle new openai overflow error format (#7110) 2025-12-17 17:31:02 -08:00
Sarah Wooders
f1bd246e9b feat: use token streaming for anthropic summarization (#7105) 2025-12-17 17:31:02 -08:00
Kevin Lin
857139f907 feat: Set reasonable defaults for max output tokens [LET-6483] (#7084) 2025-12-17 17:31:02 -08:00
jnjpng
00ba2d09f3 refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns (#6751)
* refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns

Complete migration to encrypted-only storage for sensitive fields:

- Remove dual-write to plaintext columns (token, custom_headers,
  authorization_code, access_token, refresh_token, client_secret)
- Read only from _enc columns, not from plaintext fallback
- Remove helper methods (get_token_secret, set_token_secret, etc.)
- Remove Secret.from_db() and Secret.to_dict() methods
- Update tests to verify encrypted-only behavior

After this change, plaintext columns can be set to NULL manually
since they are no longer read from or written to.

* fix test

* rename

* update

* union

* fix test
2025-12-17 17:31:02 -08:00
Kevin Lin
03a41f8e8d chore: Increase LLM streaming timeout [LET-6562] (#7080)
increase
2025-12-17 17:31:02 -08:00
Ari Webb
4878b49fa1 chore: bounds check for assistant message index (#7070) 2025-12-17 17:31:02 -08:00
Sooty
6f48d4bd48 Correct provider name for openai-proxy in LLMConfig (#3097) 2025-12-16 19:37:54 -08:00
cthomas
be53f15ce0 chore: bump v0.16.0 (#3095) 2025-12-15 12:12:23 -08:00
Caren Thomas
c99ff56abc chore: bump version v0.16.0 2025-12-15 12:04:32 -08:00
Sarah Wooders
bd9f3aca9b fix: fix prompt_acknowledgement usage and update summarization prompts (#7012) 2025-12-15 12:03:09 -08:00
Sarah Wooders
812bfd16dd Revert "feat: project_id uniqueness for tools" (#7007)
Revert "feat: project_id uniqueness for tools (#6604)"

This reverts commit 2c4b6397041e2c965493525fc52e056f10d1bdb6.
2025-12-15 12:03:09 -08:00
Sarah Wooders
0c0ba5d03d fix: remove letta-free embeddings from testing (#6870) 2025-12-15 12:03:09 -08:00
Charles Packer
33d39f4643 fix(core): patch usage data tracking for anthropic when context caching is on (#6997) 2025-12-15 12:03:09 -08:00
Sarah Wooders
a731e01e88 fix: use model instead of model_settings (#6834) 2025-12-15 12:03:09 -08:00
Sarah Wooders
a721a00899 feat: add agent_id to search results (#6867) 2025-12-15 12:03:09 -08:00
Kevin Lin
4b9485a484 feat: Add max tokens exceeded to stop reasons [LET-6480] (#6576) 2025-12-15 12:03:09 -08:00
cthomas
efac48e9ea feat: add zai proxy LET-6543 (#6836)
feat: add zai proxy
2025-12-15 12:03:09 -08:00
Kian Jones
bce1749408 fix: run PBKDF2 in thread pool to prevent event loop freeze (#6763)
* fix: run PBKDF2 in thread pool to prevent event loop freeze

Problem: Event loop freezes for 100-500ms during secret decryption, blocking
all HTTP requests and async operations. The diagnostic monitor detected the
main thread stuck in PBKDF2 HMAC SHA256 computation at:
  apps/core/letta/helpers/crypto_utils.py:51 (_derive_key)
  apps/core/letta/schemas/secret.py:161 (get_plaintext)

Root cause: PBKDF2 with 100k iterations is intentionally CPU-intensive for
security, but running it synchronously on the main thread blocks the event loop.

Stack trace showed:
  Thread 1 (Main): PBKDF2HMAC -> SHA256_Final -> sha256_block_data_order_avx2
  Event loop watchdog: Detected freeze at 01:11:44 (request started 01:12:03)

Solution:
1. Run PBKDF2 in ThreadPoolExecutor to avoid blocking event loop
2. Add async versions of encrypt/decrypt methods
3. Add LRU cache for derived keys (deterministic results)
4. Add async get_plaintext_async() method to Secret class

Changes:
- apps/core/letta/helpers/crypto_utils.py:
  - Added ThreadPoolExecutor for crypto operations
  - Added @lru_cache(maxsize=256) to _derive_key_cached()
  - Added _derive_key_async() using loop.run_in_executor()
  - Added encrypt_async() and decrypt_async() methods
  - Added warnings to sync methods about blocking behavior

- apps/core/letta/schemas/secret.py:
  - Added get_plaintext_async() method
  - Added warnings to get_plaintext() about blocking behavior

Benefits:
- Event loop no longer freezes during secret decryption
- HTTP requests continue processing while crypto runs in background
- Derived keys are cached, reducing CPU usage for repeated operations
- Backward compatible - sync methods still work for non-async code

Performance impact:
- Before: 100-500ms event loop block per decryption
- After: 100-500ms in thread pool (non-blocking) + LRU cache hits ~0.1ms

Next steps (follow-up PRs):
- Migrate all async callsites to use get_plaintext_async()
- Add metrics to track sync vs async usage
- Consider reducing PBKDF2 iterations if security allows

* update

* test

---------

Co-authored-by: Letta Bot <jinjpeng@gmail.com>
2025-12-15 12:03:09 -08:00
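The fix described above can be sketched as follows. This is a minimal illustration of the pattern (LRU-cached key derivation dispatched to a thread pool), not the actual letta code: the names `_CRYPTO_EXECUTOR`, `_derive_key_cached`, and `derive_key_async`, the pool size, and the iteration count are assumptions for the example.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Dedicated pool so CPU-bound crypto never runs on the event loop thread.
_CRYPTO_EXECUTOR = ThreadPoolExecutor(max_workers=4)

@lru_cache(maxsize=256)
def _derive_key_cached(password: bytes, salt: bytes) -> bytes:
    # CPU-intensive by design: ~100k PBKDF2-HMAC-SHA256 iterations.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

async def derive_key_async(password: bytes, salt: bytes) -> bytes:
    # Cache hits return almost instantly; misses run in the thread pool,
    # so the event loop keeps serving requests during the 100-500ms derivation.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        _CRYPTO_EXECUTOR, _derive_key_cached, password, salt
    )
```

Because PBKDF2 is deterministic for a given password and salt, caching the derived key is safe and turns repeated decryptions of the same secret into sub-millisecond lookups.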
Ari Webb
c1aa01db6f feat: project_id uniqueness for tools (#6604)
* feat: project_id uniqueness for tools

* prevent double upsert of global tools

* use default project if no header for sdk

* reorder unique constraint for performance

* use separate session for check conflict

* feature flag adding project id header in cloud api

* add my migration after one on main

* remove comment

* stage and publish api

* web set project id just for tools

* includes instead of startswith
2025-12-15 12:03:09 -08:00
cthomas
22b9ed254a feat: skip persisting redundant messages for proxy (#6819) 2025-12-15 12:03:09 -08:00
Sarah Wooders
0634aa13a1 fix: avoid holding sessions open (#6769) 2025-12-15 12:03:09 -08:00
Sarah Wooders
c9ad2fd7c4 chore: move things to debug logging (#6610) 2025-12-15 12:03:09 -08:00
Ari Webb
fecf503ad9 feat: xhigh reasoning for gpt-5.2 (#6735) 2025-12-15 12:03:09 -08:00
cthomas
bffb9064b8 fix: step logging error (#6755) 2025-12-15 12:03:08 -08:00
cthomas
fd8e471b2e chore: improve logging for proxy (#6754) 2025-12-15 12:03:08 -08:00
cthomas
2dac75a223 fix: remove project id before proxying (#6750) 2025-12-15 12:03:08 -08:00
jnjpng
4be813b956 fix: migrate sandbox and agent environment variables to encrypted only (#6623)
* base

* remove unnecessary db migration

* update

* fix

* update

* update

* comments

* fix

* revert

* another

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:03:08 -08:00
cthomas
799ddc9fe8 chore: api sync (#6747) 2025-12-15 12:03:07 -08:00
cthomas
b3561631da feat: create agents with default project for proxy [LET-6488] (#6716)
* feat: create agents with default project for proxy

* make change less invasive
2025-12-15 12:02:53 -08:00
Kian Jones
0a19c4010d chore: bump from 14.1 to 15.2 for compaction settings (#6727)
bump from 14.1 to 15.2 for compaction settings
2025-12-15 12:02:51 -08:00
jnjpng
714c537dc5 chore: change e2b sandbox error logs from debug to warning (#6726)
Update log level for tool execution errors in e2b sandbox from debug
to warning for better visibility when troubleshooting issues.

Co-authored-by: Jin Peng <jinjpeng@users.noreply.github.com>
2025-12-15 12:02:34 -08:00
Sarah Wooders
7ea297231a feat: add compaction_settings to agents (#6625)
* initial commit

* Add database migration for compaction_settings field

This migration adds the compaction_settings column to the agents table
to support customized summarization configuration for each agent.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix

* rename

* update apis

* fix tests

* update web test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
2025-12-15 12:02:34 -08:00
Shubham Naik
4309ecf606 chore: list scheduled messages [LET-6497] (#6690)
* chore: list scheduled messages

* chore: list scheduled messages

* chore: fix type

* chore: fix

* chore: fix

---------

Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
cthomas
1314e19286 feat: update system message for proxy [LET-6490] (#6714)
feat: update system message for proxy
2025-12-15 12:02:34 -08:00
Ari Webb
4d90f37f50 feat: add gpt-5.2 support (#6698) 2025-12-15 12:02:34 -08:00
jnjpng
b658c70063 test: add coverage for provider encryption without LETTA_ENCRYPTION_KEY (#6629)
Add tests to verify that providers work correctly when no encryption key
is configured. The Secret class stores values as plaintext in _enc columns
and retrieves them successfully, but this code path had no test coverage.

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:34 -08:00
Ari Webb
25dccc911e fix: base providers won't break pods still running main (#6631)
* fix: base providers won't break pods still running main

* just stage and publish api
2025-12-15 12:02:34 -08:00
Shubham Naik
67d1c9c135 chore: autogenerate-api (#6699)
Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
Sarah Wooders
a2dfa5af17 fix: reorder summarization (#6606) 2025-12-15 12:02:34 -08:00
jnjpng
17a90538ca fix: exclude common API key prefixes from encryption detection (#6624)
* fix: exclude common API key prefixes from encryption detection

Add a list of known API key prefixes (OpenAI, Anthropic, GitHub, AWS,
Slack, etc.) to prevent is_encrypted() from incorrectly identifying
plaintext credentials as encrypted values.

* update

* test
2025-12-15 12:02:34 -08:00
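A sketch of the prefix-exclusion idea from this commit. The prefix list and the base64 heuristic below are illustrative assumptions; the real `is_encrypted()` in letta may use a different prefix set and ciphertext format.

```python
import base64

# Hypothetical prefix list: well-known plaintext API key formats
# (OpenAI, Anthropic, GitHub, AWS, Slack) that must never be treated as ciphertext.
KNOWN_KEY_PREFIXES = ("sk-ant-", "sk-", "ghp_", "AKIA", "xoxb-")

def is_encrypted(value: str) -> bool:
    # Short-circuit: a recognizable plaintext credential is never ciphertext.
    if value.startswith(KNOWN_KEY_PREFIXES):
        return False
    # Assumed heuristic: encrypted values are base64-encoded blobs of nontrivial length.
    try:
        return len(base64.b64decode(value, validate=True)) > 16
    except Exception:
        return False
```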
Kian Jones
15cede7281 fix: prevent db connection pool exhaustion in multi-agent tool executor (#6619)
Problem: When executing a tool that sends messages to many agents matching
tags, the code used asyncio.gather to process all agents concurrently. Each
agent processing creates database operations (run creation, message storage),
leading to N concurrent database connections.

Example: If 100 agents match the tags, 100 simultaneous database connections
are created, exhausting the connection pool and causing errors.

Root cause: asyncio.gather(*[_process_agent(...) for agent in agents])
creates all coroutines and runs them concurrently, each opening a DB session.

Solution: Process agents sequentially instead of concurrently. While this is
slower, it prevents database connection pool exhaustion. The operation is
still async, so it won't block the event loop.

Changes:
- apps/core/letta/services/tool_executor/multi_agent_tool_executor.py:
  - Replaced asyncio.gather with sequential for loop
  - Added explanatory comment about why sequential processing is needed

Impact: With 100 matching agents:
- Before: 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617 which fixed a similar issue
in file attachment operations.
2025-12-15 12:02:34 -08:00
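The gather-to-sequential change can be sketched like this; `_process_agent` and `send_to_matching_agents` below are hypothetical stand-ins for the per-agent work (run creation, message storage) described in the commit.

```python
import asyncio

async def _process_agent(agent_id: str) -> str:
    # Placeholder for the real per-agent work, which opens a DB session.
    await asyncio.sleep(0)
    return f"sent:{agent_id}"

async def send_to_matching_agents(agent_ids: list[str]) -> list[str]:
    # Before: asyncio.gather(*[_process_agent(a) for a in agent_ids])
    # opened one DB connection per agent at once. A sequential loop keeps
    # at most one connection open at a time while remaining fully async,
    # so the event loop is never blocked.
    results = []
    for agent_id in agent_ids:
        results.append(await _process_agent(agent_id))
    return results
```

A middle ground (not what this commit does) would be bounding concurrency with an `asyncio.Semaphore`, trading a few connections for throughput.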
Kian Jones
fbd89c9360 fix: replace all 'PRODUCTION' references with 'prod' for consistency (#6627)
* fix: replace all 'PRODUCTION' references with 'prod' for consistency

Problem: Codebase had 11 references to 'PRODUCTION' (uppercase) that should
use 'prod' (lowercase) for consistency with the deployment workflows and
environment normalization.

Changes across 8 files:

1. Source files (using settings.environment):
   - letta/functions/function_sets/multi_agent.py
   - letta/services/tool_manager.py
   - letta/services/tool_executor/multi_agent_tool_executor.py
   - letta/services/helpers/agent_manager_helper.py
   All checks changed from: settings.environment == "PRODUCTION"
   To: settings.environment == "prod"

2. OTEL resource configuration:
   - letta/otel/resource.py
     - Updated _normalize_environment_tag() to handle 'prod' directly
     - Removed 'PRODUCTION' -> 'prod' mapping (no longer needed)
     - Updated device.id check from _env != "PRODUCTION" to _env != "prod"

3. Test files:
   - tests/managers/conftest.py
     - Fixture parameter changed from "PRODUCTION" to "prod"
   - tests/managers/test_agent_manager.py (3 occurrences)
   - tests/managers/test_tool_manager.py (2 occurrences)
   All test checks changed to use "prod"

Result: Complete consistency across the codebase:
- All environment checks use "prod" instead of "PRODUCTION"
- Normalization function simplified (no special case for PRODUCTION)
- Tests use correct "prod" value
- Matches deployment workflow configuration from PR #6626

This completes the environment naming standardization effort.

* fix: update settings.py environment description to use 'prod' instead of 'PRODUCTION'

The field description still referenced PRODUCTION as an example value.
Updated to use lowercase 'prod' for consistency with actual usage.

Before: "Application environment (PRODUCTION, DEV, CANARY, etc. - normalized to lowercase for OTEL tags)"
After: "Application environment (prod, dev, canary, etc. - lowercase values used for OTEL tags)"
2025-12-15 12:02:34 -08:00
Kian Jones
08ccc8b399 fix: prevent db connection pool exhaustion in file status checks (#6620)
Problem: When listing files with status checking enabled, the code used
asyncio.gather to check and update status for all files concurrently. Each
status check may update the file in the database (e.g., for timeouts or
embedding completion), leading to N concurrent database connections.

Example: Listing 100 files with status checking creates 100 simultaneous
database update operations, exhausting the connection pool.

Root cause: asyncio.gather(*[check_and_update_file_status(f) for f in files])
processes all files concurrently, each potentially creating DB updates.

Solution: Check and update file status sequentially instead of concurrently.
While this is slower, it prevents database connection pool exhaustion when
listing many files.

Changes:
- apps/core/letta/services/file_manager.py:
  - Replaced asyncio.gather with sequential for loop
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 100 files:
- Before: Up to 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617 and #6619 which fixed
similar issues in file attachment and multi-agent tool execution.
2025-12-15 12:02:34 -08:00
Kian Jones
1a2e0aa8b7 fix: prevent db connection pool exhaustion in MCP server manager (#6622)
Problem: When creating an MCP server with many tools, the code used two
asyncio.gather calls - one for tool creation and one for mapping creation.
Each operation involves database INSERT/UPDATE, leading to 2N concurrent
database connections.

Example: An MCP server with 50 tools creates 50 + 50 = 100 simultaneous
database connections (tools + mappings), severely exhausting the pool.

Root cause:
1. asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
2. asyncio.gather(*[create_mcp_tool_mapping(...) for tool in results])
Both process operations concurrently, each opening a DB session.

Solution: Process tool creation and mapping sequentially in a single loop.
Create each tool, then immediately create its mapping if successful. This:
- Reduces connection count from 2N to 1
- Maintains proper error handling per tool
- Prevents database connection pool exhaustion

Changes:
- apps/core/letta/services/mcp_server_manager.py:
  - Replaced two asyncio.gather calls with single sequential loop
  - Create mapping immediately after each successful tool creation
  - Maintained return_exceptions=True behavior with try/except
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 50 MCP tools:
- Before: 100 concurrent DB connections (50 tools + 50 mappings, pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617, #6619, #6620, and #6621
which fixed similar issues throughout the codebase.
2025-12-15 12:02:34 -08:00
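The two-gathers-into-one-loop refactor reads roughly as follows. `create_tool`, `create_mapping`, and `register_tools` are hypothetical stand-ins for the real `mcp_server_manager` calls; the per-item `try/except` mirrors the `return_exceptions=True` behavior the commit preserves.

```python
import asyncio

async def create_tool(name: str) -> str:
    # Placeholder for the DB INSERT/upsert that creates the tool.
    await asyncio.sleep(0)
    return f"tool:{name}"

async def create_mapping(tool_id: str) -> str:
    # Placeholder for the DB INSERT that creates the server<->tool mapping.
    await asyncio.sleep(0)
    return f"map:{tool_id}"

async def register_tools(names: list[str]) -> list:
    # One DB operation at a time: create each tool, then immediately its
    # mapping, instead of two gather passes of N connections each.
    results = []
    for name in names:
        try:
            tool_id = await create_tool(name)
            results.append(await create_mapping(tool_id))
        except Exception as exc:
            results.append(exc)  # keep per-tool failures non-fatal
    return results
```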
Kian Jones
43aa97b7d2 fix: prevent db connection pool exhaustion in MCP tool creation (#6621)
Problem: When creating an MCP server with many tools, the code used
asyncio.gather to create all tools concurrently. Each tool creation involves
database operations (INSERT with upsert logic), leading to N concurrent
database connections.

Example: An MCP server with 50 tools creates 50 simultaneous database
connections during server creation, exhausting the connection pool.

Root cause: asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
processes all tool creations concurrently, each opening a DB session.

Solution: Create tools sequentially instead of concurrently. While this takes
longer for server creation, it prevents database connection pool exhaustion
and maintains error handling by catching exceptions per tool.

Changes:
- apps/core/letta/services/mcp_manager.py:
  - Replaced asyncio.gather with sequential for loop
  - Maintained return_exceptions=True behavior with try/except
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 50 MCP tools:
- Before: 50 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617, #6619, and #6620 which
fixed similar issues in file operations, multi-agent execution, and file
status checks.
2025-12-15 12:02:34 -08:00
cthomas
0d77b373e6 fix: remove concurrent db writes for file upload (#6617) 2025-12-15 12:02:34 -08:00
jnjpng
3221ed8a14 fix: update base provider to only handle _enc fields (#6591)
* base

* update

* another pass

* fix

* generate

* fix test

* don't set on create

* last fixes

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:34 -08:00
Shubham Naik
99126c6283 feat: add delete scheduled message handler [LET-6496] (#6589)
* feat: add delete scheduled message handler

* chore: scheduled messages

* chore: scheduled messages

* chore: update sources

---------

Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
Sarah Wooders
c8fa77a01f feat: cleanup cancellation code and add more logging (#6588) 2025-12-15 12:02:34 -08:00
Sarah Wooders
70c57c5072 fix: various patches to summarizer (#6597) 2025-12-15 12:02:34 -08:00