Swap the order of @trace_method and @raise_on_invalid_id decorators
across all service managers so that @trace_method is always the first
wrapper applied to the function (positioned directly above the method).
This ensures the ID validation happens before tracing begins, which is
the intended execution order.
Files modified:
- agent_manager.py (23 occurrences)
- archive_manager.py (11 occurrences)
- block_manager.py (7 occurrences)
- file_manager.py (6 occurrences)
- group_manager.py (9 occurrences)
- identity_manager.py (10 occurrences)
- job_manager.py (7 occurrences)
- message_manager.py (2 occurrences)
- provider_manager.py (3 occurrences)
- sandbox_config_manager.py (7 occurrences)
- source_manager.py (5 occurrences)
- step_manager.py (13 occurrences)
* refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns
Complete migration to encrypted-only storage for sensitive fields:
- Remove dual-write to plaintext columns (token, custom_headers,
authorization_code, access_token, refresh_token, client_secret)
- Read only from _enc columns, not from plaintext fallback
- Remove helper methods (get_token_secret, set_token_secret, etc.)
- Remove Secret.from_db() and Secret.to_dict() methods
- Update tests to verify encrypted-only behavior
After this change, plaintext columns can be set to NULL manually
since they are no longer read from or written to.
* fix test
* rename
* update
* union
* fix test
* fix: run PBKDF2 in thread pool to prevent event loop freeze
Problem: Event loop freezes for 100-500ms during secret decryption, blocking
all HTTP requests and async operations. The diagnostic monitor detected the
main thread stuck in PBKDF2 HMAC SHA256 computation at:
apps/core/letta/helpers/crypto_utils.py:51 (_derive_key)
apps/core/letta/schemas/secret.py:161 (get_plaintext)
Root cause: PBKDF2 with 100k iterations is intentionally CPU-intensive for
security, but running it synchronously on the main thread blocks the event loop.
Stack trace showed:
Thread 1 (Main): PBKDF2HMAC -> SHA256_Final -> sha256_block_data_order_avx2
Event loop watchdog: Detected freeze at 01:11:44 (request started 01:12:03)
Solution:
1. Run PBKDF2 in ThreadPoolExecutor to avoid blocking event loop
2. Add async versions of encrypt/decrypt methods
3. Add LRU cache for derived keys (deterministic results)
4. Add async get_plaintext_async() method to Secret class
Changes:
- apps/core/letta/helpers/crypto_utils.py:
- Added ThreadPoolExecutor for crypto operations
- Added @lru_cache(maxsize=256) to _derive_key_cached()
- Added _derive_key_async() using loop.run_in_executor()
- Added encrypt_async() and decrypt_async() methods
- Added warnings to sync methods about blocking behavior
- apps/core/letta/schemas/secret.py:
- Added get_plaintext_async() method
- Added warnings to get_plaintext() about blocking behavior
Benefits:
- Event loop no longer freezes during secret decryption
- HTTP requests continue processing while crypto runs in background
- Derived keys are cached, reducing CPU usage for repeated operations
- Backward compatible - sync methods still work for non-async code
Performance impact:
- Before: 100-500ms event loop block per decryption
- After: 100-500ms in thread pool (non-blocking) + LRU cache hits ~0.1ms
Next steps (follow-up PRs):
- Migrate all async callsites to use get_plaintext_async()
- Add metrics to track sync vs async usage
- Consider reducing PBKDF2 iterations if security allows
* update
* test
---------
Co-authored-by: Letta Bot <jinjpeng@gmail.com>
* feat: project_id uniqueness for tools
* prevent double upsert of global tools
* use default project if no header for sdk
* reorder unique constraint for performance
* use separate session for check conflict
* feature flag adding project id header in cloud api
* add my migration after one on main
* remove comment
* stage and publish api
* web set project id just for tools
* includes instead of startswith
Update log level for tool execution errors in e2b sandbox from debug
to warning for better visibility when troubleshooting issues.
Co-authored-by: Jin Peng <jinjpeng@users.noreply.github.com>
* initial commit
* Add database migration for compaction_settings field
This migration adds the compaction_settings column to the agents table
to support customized summarization configuration for each agent.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix
* rename
* update apis
* fix tests
* update web test
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
Problem: When executing a tool that sends messages to many agents matching
tags, the code used asyncio.gather to process all agents concurrently. Each
agent processing creates database operations (run creation, message storage),
leading to N concurrent database connections.
Example: If 100 agents match the tags, 100 simultaneous database connections
are created, exhausting the connection pool and causing errors.
Root cause: asyncio.gather(*[_process_agent(...) for agent in agents])
creates all coroutines and runs them concurrently, each opening a DB session.
Solution: Process agents sequentially instead of concurrently. While this is
slower, it prevents database connection pool exhaustion. The operation is
still async, so it won't block the event loop.
Changes:
- apps/core/letta/services/tool_executor/multi_agent_tool_executor.py:
- Replaced asyncio.gather with sequential for loop
- Added explanatory comment about why sequential processing is needed
Impact: With 100 matching agents:
- Before: 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)
Note: This follows the same pattern as PR #6617 which fixed a similar issue
in file attachment operations.
* fix: replace all 'PRODUCTION' references with 'prod' for consistency
Problem: Codebase had 11 references to 'PRODUCTION' (uppercase) that should
use 'prod' (lowercase) for consistency with the deployment workflows and
environment normalization.
Changes across 8 files:
1. Source files (using settings.environment):
- letta/functions/function_sets/multi_agent.py
- letta/services/tool_manager.py
- letta/services/tool_executor/multi_agent_tool_executor.py
- letta/services/helpers/agent_manager_helper.py
All checks changed from: settings.environment == "PRODUCTION"
To: settings.environment == "prod"
2. OTEL resource configuration:
- letta/otel/resource.py
- Updated _normalize_environment_tag() to handle 'prod' directly
- Removed 'PRODUCTION' -> 'prod' mapping (no longer needed)
- Updated device.id check from _env != "PRODUCTION" to _env != "prod"
3. Test files:
- tests/managers/conftest.py
- Fixture parameter changed from "PRODUCTION" to "prod"
- tests/managers/test_agent_manager.py (3 occurrences)
- tests/managers/test_tool_manager.py (2 occurrences)
All test checks changed to use "prod"
Result: Complete consistency across the codebase:
- All environment checks use "prod" instead of "PRODUCTION"
- Normalization function simplified (no special case for PRODUCTION)
- Tests use correct "prod" value
- Matches deployment workflow configuration from PR #6626
This completes the environment naming standardization effort.
* fix: update settings.py environment description to use 'prod' instead of 'PRODUCTION'
The field description still referenced PRODUCTION as an example value.
Updated to use lowercase 'prod' for consistency with actual usage.
Before: "Application environment (PRODUCTION, DEV, CANARY, etc. - normalized to lowercase for OTEL tags)"
After: "Application environment (prod, dev, canary, etc. - lowercase values used for OTEL tags)"
Problem: When listing files with status checking enabled, the code used
asyncio.gather to check and update status for all files concurrently. Each
status check may update the file in the database (e.g., for timeouts or
embedding completion), leading to N concurrent database connections.
Example: Listing 100 files with status checking creates 100 simultaneous
database update operations, exhausting the connection pool.
Root cause: asyncio.gather(*[check_and_update_file_status(f) for f in files])
processes all files concurrently, each potentially creating DB updates.
Solution: Check and update file status sequentially instead of concurrently.
While this is slower, it prevents database connection pool exhaustion when
listing many files.
Changes:
- apps/core/letta/services/file_manager.py:
- Replaced asyncio.gather with sequential for loop
- Added explanatory comment about db pool exhaustion prevention
Impact: With 100 files:
- Before: Up to 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)
Note: This follows the same pattern as PR #6617 and #6619 which fixed
similar issues in file attachment and multi-agent tool execution.
Problem: When creating an MCP server with many tools, the code used two
asyncio.gather calls - one for tool creation and one for mapping creation.
Each operation involves database INSERT/UPDATE, leading to 2N concurrent
database connections.
Example: An MCP server with 50 tools creates 50 + 50 = 100 simultaneous
database connections (tools + mappings), severely exhausting the pool.
Root cause:
1. asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
2. asyncio.gather(*[create_mcp_tool_mapping(...) for tool in results])
Both process operations concurrently, each opening a DB session.
Solution: Process tool creation and mapping sequentially in a single loop.
Create each tool, then immediately create its mapping if successful. This:
- Reduces connection count from 2N to 1
- Maintains proper error handling per tool
- Prevents database connection pool exhaustion
Changes:
- apps/core/letta/services/mcp_server_manager.py:
- Replaced two asyncio.gather calls with single sequential loop
- Create mapping immediately after each successful tool creation
- Maintained return_exceptions=True behavior with try/except
- Added explanatory comment about db pool exhaustion prevention
Impact: With 50 MCP tools:
- Before: 100 concurrent DB connections (50 tools + 50 mappings, pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)
Note: This follows the same pattern as PR #6617, #6619, #6620, and #6621
which fixed similar issues throughout the codebase.
Problem: When creating an MCP server with many tools, the code used
asyncio.gather to create all tools concurrently. Each tool creation involves
database operations (INSERT with upsert logic), leading to N concurrent
database connections.
Example: An MCP server with 50 tools creates 50 simultaneous database
connections during server creation, exhausting the connection pool.
Root cause: asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
processes all tool creations concurrently, each opening a DB session.
Solution: Create tools sequentially instead of concurrently. While this takes
longer for server creation, it prevents database connection pool exhaustion
and maintains error handling by catching exceptions per tool.
Changes:
- apps/core/letta/services/mcp_manager.py:
- Replaced asyncio.gather with sequential for loop
- Maintained return_exceptions=True behavior with try/except
- Added explanatory comment about db pool exhaustion prevention
Impact: With 50 MCP tools:
- Before: 50 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)
Note: This follows the same pattern as PR #6617, #6619, and #6620 which
fixed similar issues in file operations, multi-agent execution, and file
status checks.
* Revert "fix test"
This reverts commit 5126815f23cefb4edad3e3bf9e7083209dcc7bf1.
* fix server and better test
* test fix, get api key for base and byok?
* set letta default endpoint
* try to fix timeout for test
* fix for letta api key
* Delete apps/core/tests/sdk_v1/conftest.py
* Update utils.py
* clean up a few issues
* fix filterning on list_llm_models
* soft delete models with provider
* add one more test
* fix ci
* add timeout
* band aid for letta embedding provider
* info instead of error logs when creating models
* fix: combined tool manager improvements - tracing and redundant fetches
This PR combines improvements from #6530 and #6535:
- Add tracer import to enable proper tracing spans
- Improve update check logic to verify actual field changes before updating
- Return current_tool directly when no update is needed (avoids redundant fetch)
- Add structured tracing spans to update_tool_by_id_async for better observability
- Fix decorator order for better error handling (raise_on_invalid_id before trace_method)
- Remove unnecessary tracing spans in create_or_update_tool_async
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* revert: remove tracing spans from update_tool_by_id_async
Remove the tracer span additions from update_tool_by_id_async while keeping
all other improvements (decorator order fix, redundant fetch removal, and
improved update check logic).
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta Bot <noreply@letta.com>
The docstring incorrectly stated that fetch_webpage uses Jina AI reader.
Updated to accurately describe the actual implementation which uses:
1. Exa API (if EXA_API_KEY is available)
2. Trafilatura (fallback)
3. Readability + html2text (final fallback)
🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* fix: clear message history no longer deletes messages
* toast and make it stay for 8 secs
* fix test
---------
Co-authored-by: Ari Webb <ari@letta.com>