Swap the order of @trace_method and @raise_on_invalid_id decorators
across all service managers so that @trace_method is always the first
wrapper applied to the function (positioned directly above the method).
This ensures the ID validation happens before tracing begins, which is
the intended execution order.
Files modified:
- agent_manager.py (23 occurrences)
- archive_manager.py (11 occurrences)
- block_manager.py (7 occurrences)
- file_manager.py (6 occurrences)
- group_manager.py (9 occurrences)
- identity_manager.py (10 occurrences)
- job_manager.py (7 occurrences)
- message_manager.py (2 occurrences)
- provider_manager.py (3 occurrences)
- sandbox_config_manager.py (7 occurrences)
- source_manager.py (5 occurrences)
- step_manager.py (13 occurrences)
* fix: run PBKDF2 in thread pool to prevent event loop freeze
Problem: Event loop freezes for 100-500ms during secret decryption, blocking
all HTTP requests and async operations. The diagnostic monitor detected the
main thread stuck in PBKDF2 HMAC SHA256 computation at:
apps/core/letta/helpers/crypto_utils.py:51 (_derive_key)
apps/core/letta/schemas/secret.py:161 (get_plaintext)
Root cause: PBKDF2 with 100k iterations is intentionally CPU-intensive for
security, but running it synchronously on the main thread blocks the event loop.
Stack trace showed:
Thread 1 (Main): PBKDF2HMAC -> SHA256_Final -> sha256_block_data_order_avx2
Event loop watchdog: Detected freeze at 01:11:44 (request started 01:12:03)
Solution:
1. Run PBKDF2 in ThreadPoolExecutor to avoid blocking event loop
2. Add async versions of encrypt/decrypt methods
3. Add LRU cache for derived keys (deterministic results)
4. Add async get_plaintext_async() method to Secret class
Changes:
- apps/core/letta/helpers/crypto_utils.py:
- Added ThreadPoolExecutor for crypto operations
- Added @lru_cache(maxsize=256) to _derive_key_cached()
- Added _derive_key_async() using loop.run_in_executor()
- Added encrypt_async() and decrypt_async() methods
- Added warnings to sync methods about blocking behavior
- apps/core/letta/schemas/secret.py:
- Added get_plaintext_async() method
- Added warnings to get_plaintext() about blocking behavior
Benefits:
- Event loop no longer freezes during secret decryption
- HTTP requests continue processing while crypto runs in background
- Derived keys are cached, reducing CPU usage for repeated operations
- Backward compatible - sync methods still work for non-async code
Performance impact:
- Before: 100-500ms event loop block per decryption
- After: 100-500ms in thread pool (non-blocking) + LRU cache hits ~0.1ms
Next steps (follow-up PRs):
- Migrate all async callsites to use get_plaintext_async()
- Add metrics to track sync vs async usage
- Consider reducing PBKDF2 iterations if security allows
* update
* test
---------
Co-authored-by: Letta Bot <jinjpeng@gmail.com>
* claude coded first pass
* fix test cases to expect errors instead
* fix this
* let's see how letta-code did
* claude
* fix tests, remove dangling comments, retrofit all managers functions with decorator
* revert to main for these since we are not erroring on invalid tool and block ids
* reorder decorators
* finish refactoring test cases
* reorder agent_manager decorators and fix test tool manager
* add decorator on missing managers
* fix id sources
* remove redundant check
* uses enum now
* move to enum
* remove apps/core and apps/fern
* fix precommit
* add submodule updates in workflows
* submodule
* remove core tests
* update core revision
* Add submodules: true to all GitHub workflows
- Ensure all workflows can access git submodules
- Add submodules support to deployment, test, and CI workflows
- Fix YAML syntax issues in workflow files
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove core-lint
* upgrade core with latest main of oss
---------
Co-authored-by: Claude <noreply@anthropic.com>
* base requirements
* autofix
* Configure ruff for Python linting and formatting
- Set up minimal ruff configuration with basic checks (E, W, F, I)
- Add temporary ignores for common issues during migration
- Configure pre-commit hooks to use ruff with pass_filenames
- This enables gradual migration from black to ruff
* Delete sdj
* autofixed only
* migrate lint action
* more autofixed
* more fixes
* change precommit
* try changing the hook
* try this stuff