Commit Graph

6709 Commits

Author SHA1 Message Date
Sarah Wooders
8729a037b9 fix: handle new openai overflow error format (#7110) 2025-12-17 17:31:02 -08:00
Sarah Wooders
f1bd246e9b feat: use token streaming for anthropic summarization (#7105) 2025-12-17 17:31:02 -08:00
Kevin Lin
857139f907 feat: Set reasonable defaults for max output tokens [LET-6483] (#7084) 2025-12-17 17:31:02 -08:00
jnjpng
00ba2d09f3 refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns (#6751)
* refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns

Complete migration to encrypted-only storage for sensitive fields:

- Remove dual-write to plaintext columns (token, custom_headers,
  authorization_code, access_token, refresh_token, client_secret)
- Read only from _enc columns, not from plaintext fallback
- Remove helper methods (get_token_secret, set_token_secret, etc.)
- Remove Secret.from_db() and Secret.to_dict() methods
- Update tests to verify encrypted-only behavior

After this change, plaintext columns can be set to NULL manually
since they are no longer read from or written to.

* fix test

* rename

* update

* union

* fix test
2025-12-17 17:31:02 -08:00
Kevin Lin
03a41f8e8d chore: Increase LLM streaming timeout [LET-6562] (#7080)
increase
2025-12-17 17:31:02 -08:00
Ari Webb
4878b49fa1 chore: bounds check for assistant message index (#7070) 2025-12-17 17:31:02 -08:00
Sooty
6f48d4bd48 Correct provider name for openai-proxy in LLMConfig (#3097) 2025-12-16 19:37:54 -08:00
cthomas
be53f15ce0 chore: bump v0.16.0 (#3095) 2025-12-15 12:12:23 -08:00
Caren Thomas
c99ff56abc chore: bump version v0.16.0 2025-12-15 12:04:32 -08:00
Sarah Wooders
bd9f3aca9b fix: fix prompt_acknowledgement usage and update summarization prompts (#7012) 2025-12-15 12:03:09 -08:00
Sarah Wooders
812bfd16dd Revert "feat: project_id uniqueness for tools" (#7007)
Revert "feat: project_id uniqueness for tools (#6604)"

This reverts commit 2c4b6397041e2c965493525fc52e056f10d1bdb6.
2025-12-15 12:03:09 -08:00
Sarah Wooders
0c0ba5d03d fix: remove letta-free embeddings from testing (#6870) 2025-12-15 12:03:09 -08:00
Charles Packer
33d39f4643 fix(core): patch usage data tracking for anthropic when context caching is on (#6997) 2025-12-15 12:03:09 -08:00
Sarah Wooders
a731e01e88 fix: use model instead of model_settings (#6834) 2025-12-15 12:03:09 -08:00
Sarah Wooders
a721a00899 feat: add agent_id to search results (#6867) 2025-12-15 12:03:09 -08:00
Kevin Lin
4b9485a484 feat: Add max tokens exceeded to stop reasons [LET-6480] (#6576) 2025-12-15 12:03:09 -08:00
cthomas
efac48e9ea feat: add zai proxy LET-6543 (#6836)
feat: add zai proxy
2025-12-15 12:03:09 -08:00
Kian Jones
bce1749408 fix: run PBKDF2 in thread pool to prevent event loop freeze (#6763)
* fix: run PBKDF2 in thread pool to prevent event loop freeze

Problem: Event loop freezes for 100-500ms during secret decryption, blocking
all HTTP requests and async operations. The diagnostic monitor detected the
main thread stuck in PBKDF2 HMAC SHA256 computation at:
  apps/core/letta/helpers/crypto_utils.py:51 (_derive_key)
  apps/core/letta/schemas/secret.py:161 (get_plaintext)

Root cause: PBKDF2 with 100k iterations is intentionally CPU-intensive for
security, but running it synchronously on the main thread blocks the event loop.

Stack trace showed:
  Thread 1 (Main): PBKDF2HMAC -> SHA256_Final -> sha256_block_data_order_avx2
  Event loop watchdog: Detected freeze at 01:11:44 (request started 01:12:03)

Solution:
1. Run PBKDF2 in ThreadPoolExecutor to avoid blocking event loop
2. Add async versions of encrypt/decrypt methods
3. Add LRU cache for derived keys (deterministic results)
4. Add async get_plaintext_async() method to Secret class

Changes:
- apps/core/letta/helpers/crypto_utils.py:
  - Added ThreadPoolExecutor for crypto operations
  - Added @lru_cache(maxsize=256) to _derive_key_cached()
  - Added _derive_key_async() using loop.run_in_executor()
  - Added encrypt_async() and decrypt_async() methods
  - Added warnings to sync methods about blocking behavior

- apps/core/letta/schemas/secret.py:
  - Added get_plaintext_async() method
  - Added warnings to get_plaintext() about blocking behavior

Benefits:
- Event loop no longer freezes during secret decryption
- HTTP requests continue processing while crypto runs in background
- Derived keys are cached, reducing CPU usage for repeated operations
- Backward compatible - sync methods still work for non-async code

Performance impact:
- Before: 100-500ms event loop block per decryption
- After: 100-500ms in thread pool (non-blocking) + LRU cache hits ~0.1ms

Next steps (follow-up PRs):
- Migrate all async callsites to use get_plaintext_async()
- Add metrics to track sync vs async usage
- Consider reducing PBKDF2 iterations if security allows

* update

* test

---------

Co-authored-by: Letta Bot <jinjpeng@gmail.com>
2025-12-15 12:03:09 -08:00
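The fix described above can be sketched as follows. This is a minimal illustration of the pattern (LRU-cached key derivation dispatched to a thread pool), not the actual letta code: the names `_CRYPTO_EXECUTOR`, `_derive_key_cached`, and `derive_key_async`, the pool size, and the iteration count are assumptions for the example.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Dedicated pool so CPU-bound crypto never runs on the event loop thread.
_CRYPTO_EXECUTOR = ThreadPoolExecutor(max_workers=4)

@lru_cache(maxsize=256)
def _derive_key_cached(password: bytes, salt: bytes) -> bytes:
    # CPU-intensive by design: ~100k PBKDF2-HMAC-SHA256 iterations.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

async def derive_key_async(password: bytes, salt: bytes) -> bytes:
    # Cache hits return almost instantly; misses run in the thread pool,
    # so the event loop keeps serving requests during the 100-500ms derivation.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        _CRYPTO_EXECUTOR, _derive_key_cached, password, salt
    )
```

Because PBKDF2 is deterministic for a given password and salt, caching the derived key is safe and turns repeated decryptions of the same secret into sub-millisecond lookups.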
Ari Webb
c1aa01db6f feat: project_id uniqueness for tools (#6604)
* feat: project_id uniqueness for tools

* prevent double upsert of global tools

* use default project if no header for sdk

* reorder unique constraint for performance

* use separate session for check conflict

* feature flag adding project id header in cloud api

* add my migration after one on main

* remove comment

* stage and publish api

* web set project id just for tools

* includes instead of startswith
2025-12-15 12:03:09 -08:00
cthomas
22b9ed254a feat: skip persisting redundant messages for proxy (#6819) 2025-12-15 12:03:09 -08:00
Sarah Wooders
0634aa13a1 fix: avoid holding sessions open (#6769) 2025-12-15 12:03:09 -08:00
Sarah Wooders
c9ad2fd7c4 chore: move things to debug logging (#6610) 2025-12-15 12:03:09 -08:00
Ari Webb
fecf503ad9 feat: xhigh reasoning for gpt-5.2 (#6735) 2025-12-15 12:03:09 -08:00
cthomas
bffb9064b8 fix: step logging error (#6755) 2025-12-15 12:03:08 -08:00
cthomas
fd8e471b2e chore: improve logging for proxy (#6754) 2025-12-15 12:03:08 -08:00
cthomas
2dac75a223 fix: remove project id before proxying (#6750) 2025-12-15 12:03:08 -08:00
jnjpng
4be813b956 fix: migrate sandbox and agent environment variables to encrypted only (#6623)
* base

* remove unnecessary db migration

* update

* fix

* update

* update

* comments

* fix

* revert

* another

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:03:08 -08:00
cthomas
799ddc9fe8 chore: api sync (#6747) 2025-12-15 12:03:07 -08:00
cthomas
b3561631da feat: create agents with default project for proxy [LET-6488] (#6716)
* feat: create agents with default project for proxy

* make change less invasive
2025-12-15 12:02:53 -08:00
Kian Jones
0a19c4010d chore: bump from 14.1 to 15.2 for compaction settings (#6727)
bump from 14.1 to 15.2 for compaction settings
2025-12-15 12:02:51 -08:00
jnjpng
714c537dc5 chore: change e2b sandbox error logs from debug to warning (#6726)
Update log level for tool execution errors in e2b sandbox from debug
to warning for better visibility when troubleshooting issues.

Co-authored-by: Jin Peng <jinjpeng@users.noreply.github.com>
2025-12-15 12:02:34 -08:00
Sarah Wooders
7ea297231a feat: add compaction_settings to agents (#6625)
* initial commit

* Add database migration for compaction_settings field

This migration adds the compaction_settings column to the agents table
to support customized summarization configuration for each agent.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix

* rename

* update apis

* fix tests

* update web test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
2025-12-15 12:02:34 -08:00
Shubham Naik
4309ecf606 chore: list scheduled messages [LET-6497] (#6690)
* chore: list scheduled messages

* chore: list scheduled messages

* chore: fix type

* chore: fix

* chore: fix

---------

Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
cthomas
1314e19286 feat: update system message for proxy [LET-6490] (#6714)
feat: update system message for proxy
2025-12-15 12:02:34 -08:00
Ari Webb
4d90f37f50 feat: add gpt-5.2 support (#6698) 2025-12-15 12:02:34 -08:00
jnjpng
b658c70063 test: add coverage for provider encryption without LETTA_ENCRYPTION_KEY (#6629)
Add tests to verify that providers work correctly when no encryption key
is configured. The Secret class stores values as plaintext in _enc columns
and retrieves them successfully, but this code path had no test coverage.

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:34 -08:00
Ari Webb
25dccc911e fix: base providers won't break pods still running main (#6631)
* fix: base providers won't break pods still running main

* just stage and publish api
2025-12-15 12:02:34 -08:00
Shubham Naik
67d1c9c135 chore: autogenerate-api (#6699)
Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
Sarah Wooders
a2dfa5af17 fix: reorder summarization (#6606) 2025-12-15 12:02:34 -08:00
jnjpng
17a90538ca fix: exclude common API key prefixes from encryption detection (#6624)
* fix: exclude common API key prefixes from encryption detection

Add a list of known API key prefixes (OpenAI, Anthropic, GitHub, AWS,
Slack, etc.) to prevent is_encrypted() from incorrectly identifying
plaintext credentials as encrypted values.

* update

* test
2025-12-15 12:02:34 -08:00
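A sketch of the prefix-exclusion idea from this commit. The prefix list and the base64 heuristic below are illustrative assumptions; the real `is_encrypted()` in letta may use a different prefix set and ciphertext format.

```python
import base64

# Hypothetical prefix list: well-known plaintext API key formats
# (OpenAI, Anthropic, GitHub, AWS, Slack) that must never be treated as ciphertext.
KNOWN_KEY_PREFIXES = ("sk-ant-", "sk-", "ghp_", "AKIA", "xoxb-")

def is_encrypted(value: str) -> bool:
    # Short-circuit: a recognizable plaintext credential is never ciphertext.
    if value.startswith(KNOWN_KEY_PREFIXES):
        return False
    # Assumed heuristic: encrypted values are base64-encoded blobs of nontrivial length.
    try:
        return len(base64.b64decode(value, validate=True)) > 16
    except Exception:
        return False
```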
Kian Jones
15cede7281 fix: prevent db connection pool exhaustion in multi-agent tool executor (#6619)
Problem: When executing a tool that sends messages to many agents matching
tags, the code used asyncio.gather to process all agents concurrently. Each
agent processing creates database operations (run creation, message storage),
leading to N concurrent database connections.

Example: If 100 agents match the tags, 100 simultaneous database connections
are created, exhausting the connection pool and causing errors.

Root cause: asyncio.gather(*[_process_agent(...) for agent in agents])
creates all coroutines and runs them concurrently, each opening a DB session.

Solution: Process agents sequentially instead of concurrently. While this is
slower, it prevents database connection pool exhaustion. The operation is
still async, so it won't block the event loop.

Changes:
- apps/core/letta/services/tool_executor/multi_agent_tool_executor.py:
  - Replaced asyncio.gather with sequential for loop
  - Added explanatory comment about why sequential processing is needed

Impact: With 100 matching agents:
- Before: 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617 which fixed a similar issue
in file attachment operations.
2025-12-15 12:02:34 -08:00
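The gather-to-sequential change can be sketched like this; `_process_agent` and `send_to_matching_agents` below are hypothetical stand-ins for the per-agent work (run creation, message storage) described in the commit.

```python
import asyncio

async def _process_agent(agent_id: str) -> str:
    # Placeholder for the real per-agent work, which opens a DB session.
    await asyncio.sleep(0)
    return f"sent:{agent_id}"

async def send_to_matching_agents(agent_ids: list[str]) -> list[str]:
    # Before: asyncio.gather(*[_process_agent(a) for a in agent_ids])
    # opened one DB connection per agent at once. A sequential loop keeps
    # at most one connection open at a time while remaining fully async,
    # so the event loop is never blocked.
    results = []
    for agent_id in agent_ids:
        results.append(await _process_agent(agent_id))
    return results
```

A middle ground (not what this commit does) would be bounding concurrency with an `asyncio.Semaphore`, trading a few connections for throughput.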
Kian Jones
fbd89c9360 fix: replace all 'PRODUCTION' references with 'prod' for consistency (#6627)
* fix: replace all 'PRODUCTION' references with 'prod' for consistency

Problem: Codebase had 11 references to 'PRODUCTION' (uppercase) that should
use 'prod' (lowercase) for consistency with the deployment workflows and
environment normalization.

Changes across 8 files:

1. Source files (using settings.environment):
   - letta/functions/function_sets/multi_agent.py
   - letta/services/tool_manager.py
   - letta/services/tool_executor/multi_agent_tool_executor.py
   - letta/services/helpers/agent_manager_helper.py
   All checks changed from: settings.environment == "PRODUCTION"
   To: settings.environment == "prod"

2. OTEL resource configuration:
   - letta/otel/resource.py
     - Updated _normalize_environment_tag() to handle 'prod' directly
     - Removed 'PRODUCTION' -> 'prod' mapping (no longer needed)
     - Updated device.id check from _env != "PRODUCTION" to _env != "prod"

3. Test files:
   - tests/managers/conftest.py
     - Fixture parameter changed from "PRODUCTION" to "prod"
   - tests/managers/test_agent_manager.py (3 occurrences)
   - tests/managers/test_tool_manager.py (2 occurrences)
   All test checks changed to use "prod"

Result: Complete consistency across the codebase:
- All environment checks use "prod" instead of "PRODUCTION"
- Normalization function simplified (no special case for PRODUCTION)
- Tests use correct "prod" value
- Matches deployment workflow configuration from PR #6626

This completes the environment naming standardization effort.

* fix: update settings.py environment description to use 'prod' instead of 'PRODUCTION'

The field description still referenced PRODUCTION as an example value.
Updated to use lowercase 'prod' for consistency with actual usage.

Before: "Application environment (PRODUCTION, DEV, CANARY, etc. - normalized to lowercase for OTEL tags)"
After: "Application environment (prod, dev, canary, etc. - lowercase values used for OTEL tags)"
2025-12-15 12:02:34 -08:00
Kian Jones
08ccc8b399 fix: prevent db connection pool exhaustion in file status checks (#6620)
Problem: When listing files with status checking enabled, the code used
asyncio.gather to check and update status for all files concurrently. Each
status check may update the file in the database (e.g., for timeouts or
embedding completion), leading to N concurrent database connections.

Example: Listing 100 files with status checking creates 100 simultaneous
database update operations, exhausting the connection pool.

Root cause: asyncio.gather(*[check_and_update_file_status(f) for f in files])
processes all files concurrently, each potentially creating DB updates.

Solution: Check and update file status sequentially instead of concurrently.
While this is slower, it prevents database connection pool exhaustion when
listing many files.

Changes:
- apps/core/letta/services/file_manager.py:
  - Replaced asyncio.gather with sequential for loop
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 100 files:
- Before: Up to 100 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617 and #6619 which fixed
similar issues in file attachment and multi-agent tool execution.
2025-12-15 12:02:34 -08:00
Kian Jones
1a2e0aa8b7 fix: prevent db connection pool exhaustion in MCP server manager (#6622)
Problem: When creating an MCP server with many tools, the code used two
asyncio.gather calls - one for tool creation and one for mapping creation.
Each operation involves database INSERT/UPDATE, leading to 2N concurrent
database connections.

Example: An MCP server with 50 tools creates 50 + 50 = 100 simultaneous
database connections (tools + mappings), severely exhausting the pool.

Root cause:
1. asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
2. asyncio.gather(*[create_mcp_tool_mapping(...) for tool in results])
Both process operations concurrently, each opening a DB session.

Solution: Process tool creation and mapping sequentially in a single loop.
Create each tool, then immediately create its mapping if successful. This:
- Reduces connection count from 2N to 1
- Maintains proper error handling per tool
- Prevents database connection pool exhaustion

Changes:
- apps/core/letta/services/mcp_server_manager.py:
  - Replaced two asyncio.gather calls with single sequential loop
  - Create mapping immediately after each successful tool creation
  - Maintained return_exceptions=True behavior with try/except
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 50 MCP tools:
- Before: 100 concurrent DB connections (50 tools + 50 mappings, pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617, #6619, #6620, and #6621
which fixed similar issues throughout the codebase.
2025-12-15 12:02:34 -08:00
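The two-gathers-into-one-loop refactor reads roughly as follows. `create_tool`, `create_mapping`, and `register_tools` are hypothetical stand-ins for the real `mcp_server_manager` calls; the per-item `try/except` mirrors the `return_exceptions=True` behavior the commit preserves.

```python
import asyncio

async def create_tool(name: str) -> str:
    # Placeholder for the DB INSERT/upsert that creates the tool.
    await asyncio.sleep(0)
    return f"tool:{name}"

async def create_mapping(tool_id: str) -> str:
    # Placeholder for the DB INSERT that creates the server<->tool mapping.
    await asyncio.sleep(0)
    return f"map:{tool_id}"

async def register_tools(names: list[str]) -> list:
    # One DB operation at a time: create each tool, then immediately its
    # mapping, instead of two gather passes of N connections each.
    results = []
    for name in names:
        try:
            tool_id = await create_tool(name)
            results.append(await create_mapping(tool_id))
        except Exception as exc:
            results.append(exc)  # keep per-tool failures non-fatal
    return results
```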
Kian Jones
43aa97b7d2 fix: prevent db connection pool exhaustion in MCP tool creation (#6621)
Problem: When creating an MCP server with many tools, the code used
asyncio.gather to create all tools concurrently. Each tool creation involves
database operations (INSERT with upsert logic), leading to N concurrent
database connections.

Example: An MCP server with 50 tools creates 50 simultaneous database
connections during server creation, exhausting the connection pool.

Root cause: asyncio.gather(*[create_mcp_tool_async(...) for tool in tools])
processes all tool creations concurrently, each opening a DB session.

Solution: Create tools sequentially instead of concurrently. While this takes
longer for server creation, it prevents database connection pool exhaustion
and maintains error handling by catching exceptions per tool.

Changes:
- apps/core/letta/services/mcp_manager.py:
  - Replaced asyncio.gather with sequential for loop
  - Maintained return_exceptions=True behavior with try/except
  - Added explanatory comment about db pool exhaustion prevention

Impact: With 50 MCP tools:
- Before: 50 concurrent DB connections (pool exhaustion)
- After: 1 DB connection at a time (no pool exhaustion)

Note: This follows the same pattern as PR #6617, #6619, and #6620 which
fixed similar issues in file operations, multi-agent execution, and file
status checks.
2025-12-15 12:02:34 -08:00
cthomas
0d77b373e6 fix: remove concurrent db writes for file upload (#6617) 2025-12-15 12:02:34 -08:00
jnjpng
3221ed8a14 fix: update base provider to only handle _enc fields (#6591)
* base

* update

* another pass

* fix

* generate

* fix test

* don't set on create

* last fixes

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:34 -08:00
Shubham Naik
99126c6283 feat: add delete scheduled message handler [LET-6496] (#6589)
* feat: add delete scheduled message handler

* chore: scheduled messages

* chore: scheduled messages

* chore: update sources

---------

Co-authored-by: Shubham Naik <shub@memgpt.ai>
2025-12-15 12:02:34 -08:00
Sarah Wooders
c8fa77a01f feat: cleanup cancellation code and add more logging (#6588) 2025-12-15 12:02:34 -08:00
Sarah Wooders
70c57c5072 fix: various patches to summarizer (#6597) 2025-12-15 12:02:34 -08:00