Commit Graph

560 Commits

Author SHA1 Message Date
Ari Webb
dd0e513951 fix: lazy load conversations [LET-7682] (#9629)
fix: lazy load conversations
2026-03-03 18:34:01 -08:00
Sarah Wooders
afbc416972 feat(core): add model/model_settings override fields to conversation create/update (#9607) 2026-02-24 10:55:26 -08:00
Kian Jones
f5c4ab50f4 chore: add ty + pre-commit hook and repeal even more ruff rules (#9504)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import

* remove all ignores, add FastAPI rules and Ruff rules

* add ty and precommit

* ruff stuff

* ty check fixes

* ty check fixes pt 2

* error on invalid
2026-02-24 10:55:11 -08:00
Kian Jones
25d54dd896 chore: enable F821, F401, W293 (#9503)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import
2026-02-24 10:55:08 -08:00
Kian Jones
0d42afa151 fix(core): catch LockNotAvailableError and return 409 instead of 500 (#9359)
Re-apply changes on top of latest main to resolve merge conflicts.

- Add DatabaseLockNotAvailableError custom exception in orm/errors.py
- Catch asyncpg LockNotAvailableError and pgcode 55P03 in _handle_dbapi_error
- Register FastAPI exception handler returning 409 with Retry-After header



🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
71e0a8aab9 fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync (#9342)
* fix(core): use INSERT ON CONFLICT DO NOTHING for provider model sync

Replaces try/except around model.create_async() with pg_insert()
.on_conflict_do_nothing() to prevent UniqueViolationError from being
raised at the asyncpg driver level during concurrent model syncs.
The previous approach caught the exception in Python but ddtrace still
captured it at the driver level, causing Datadog error tracking noise.

Fixes Datadog issue d8dec148-d535-11f0-95eb-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* cleaner impl

* fix

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
4eb27e23b3 fix(core): add deadlock retry logic to ORM write operations (#9352)
Adds automatic retry with exponential backoff for PostgreSQL deadlock
errors (40P01) in all ORM write methods: create_async, update_async,
batch_create_async, hard_delete_async, and bulk_hard_delete_async.

For update_async, column values are snapshotted before the commit
attempt so they can be restored after rollback clears them.

Also adds DatabaseDeadlockError to _handle_dbapi_error as a fallback
when retries are exhausted.

Datadog: https://us5.datadoghq.com/error-tracking/issue/53ccdd7a-f0cc-11f0-8969-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
2026-02-24 10:52:07 -08:00
Kian Jones
d592ec3135 fix: handle DBAPIError wrapping asyncpg DeadlockDetectedError (#9355)
SQLAlchemy wraps asyncpg's DeadlockDetectedError in a DBAPIError,
which was falling through to the generic 500 handler. Now detected
at both the ORM level (_handle_dbapi_error) and FastAPI handler level,
returning 409 with Retry-After header.

Datadog: https://us5.datadoghq.com/error-tracking/issue/2f1dc54c-dab6-11f0-a828-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
21e880907f feat(core): structure memory directory and block labels [LET-7336] (#9309) 2026-02-24 10:52:06 -08:00
Sarah Wooders
e0a23f7039 feat: add usage columns to steps table (#9270)
* feat: add usage columns to steps table

Adds denormalized usage fields to the steps table for easier querying:
- model_handle: The model handle (e.g., "openai/gpt-4o-mini")
- cached_input_tokens: Tokens served from cache
- cache_write_tokens: Tokens written to cache (Anthropic)
- reasoning_tokens: Reasoning/thinking tokens

These fields mirror LettaUsageStatistics and are extracted from the
existing prompt_tokens_details and completion_tokens_details JSON columns.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate OpenAPI specs and SDK for usage columns

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
2026-02-24 10:52:06 -08:00
Kian Jones
c1a02fa180 feat: add metadata-only provider trace storage option (#9155)
* feat: add metadata-only provider trace storage option

Add support for writing provider traces to a lightweight metadata-only
table (~1.5GB) instead of the full table (~725GB) since request/response
JSON is now stored in GCS.

- Add `LETTA_TELEMETRY_PROVIDER_TRACE_PG_METADATA_ONLY` setting
- Create `provider_trace_metadata` table via alembic migration
- Conditionally write to new table when flag is enabled
- Include backfill script for migrating existing data

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API spec and SDK

* fix: use composite PK (created_at, id) for provider_trace_metadata

Aligns with GCS partitioning structure (raw/date=YYYY-MM-DD/{id}.json.gz)
and enables efficient date-range queries via the B-tree index.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ammendments

* fix: add bulk data copy to migration

Copy existing provider_traces metadata in-migration instead of separate
backfill script. Creates indexes after bulk insert for better performance.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: remove data copy from migration, create empty table only

Old data stays in provider_traces, new writes go to provider_trace_metadata
when flag is enabled. Full traces are in GCS anyway.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: address PR comments

- Remove GCS mention from ProviderTraceMetadata docstring
- Move metadata object creation outside session context

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: reads always use full provider_traces table

The metadata_only flag should only control writes. Reads always go to
the full table to avoid returning ProviderTraceMetadata where
ProviderTrace is expected.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: enable metadata-only provider trace writes in prod

Add LETTA_TELEMETRY_PROVIDER_TRACE_PG_METADATA_ONLY=true to all
Helm values (memgpt-server and lettuce-py, prod and dev).

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Kian Jones
0099a95a43 fix(sec): first pass of ensuring actor id is required everywhere (#9126)
first pass of ensuring actor id is required
2026-01-29 12:44:04 -08:00
Kian Jones
e3fb00f970 feat(crouton): add orgId, userId, Compaction_Settings and LLM_Config (#9022)
* LC one shot?

* api changes

* fix summarizer nameerror
2026-01-29 12:44:04 -08:00
Ari Webb
5c06918042 fix: don't need embedding model for self hosted [LET-7009] (#8935)
* fix: don't need embedding model for self hosted

* stage publish api

* passes tests

* add test

* remove unnecessary upgrades

* update revision order db migrations

* add timeout for ci
2026-01-29 12:44:04 -08:00
Ari Webb
4ec6649caf feat: byok provider models in db also (#8317)
* feat: byok provider models in db also

* make tests and sync api

* fix inconsistent state with recreating provider of same name

* fix sync on byok creation

* update revision

* move stripe code for testing purposes

* revert

* add refresh byok models endpoint

* just stage publish api

* add tests

* reorder revision

* add test for name clashes
2026-01-29 12:43:53 -08:00
Kian Jones
2ee28c3264 feat: add telemetry source identifier (#8918)
* add telemetry source

* add source to provider trave
2026-01-19 15:54:44 -08:00
Kian Jones
9418ab9815 feat: add provider trace backend abstraction for multi-backend telemetry (#8814)
* feat: add provider trace backend abstraction for multi-backend telemetry

Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config

Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add protocol version to socket backend records

Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: remove organization_id from ProviderTraceCreate calls

The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* consolidate provider trace

* add clickhouse-connect to fix bug on main lmao

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* consolidate provider trace

* consolidate provider trace bug fix

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:43 -08:00
Sarah Wooders
b888c4c17a feat: allow for conversation-level isolation of blocks (#8684)
* feat: add conversation_id parameter to context endpoint [LET-6989]

Add optional conversation_id query parameter to retrieve_agent_context_window.
When provided, the endpoint uses messages from the specific conversation
instead of the agent's default message_ids.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate SDK after context endpoint update [LET-6989]

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add isolated blocks support for conversations

Allows conversations to have their own copies of specific memory blocks (e.g., todo_list) that override agent defaults, enabling conversation-specific state isolation.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* undo

* update apis

* test

* cleanup

* fix tests

* simplify

* move override logic

* patch

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:39 -08:00
github-actions[bot]
2460b36f97 fix: handle asyncpg QueryCanceledError for statement timeouts (#8241)
The handle_db_timeout decorator only caught SQLAlchemy's TimeoutError
(for pool/connection timeouts) but not asyncpg's QueryCanceledError
which is thrown when PostgreSQL's statement_timeout kills a long-running
query.

This fix:
- Import asyncpg.exceptions.QueryCanceledError
- Update handle_db_timeout decorator to catch QueryCanceledError and wrap
  it in DatabaseTimeoutError
- Update _handle_dbapi_error method to also handle wrapped QueryCanceledError

Fixes #8108

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: datadog-official[bot] <datadog-official[bot]@users.noreply.github.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-19 15:54:38 -08:00
cthomas
ab4ccfca31 feat: add tags support to blocks (#8474)
* feat: add tags support to blocks

* fix: add timestamps and org scoping to blocks_tags

Addresses PR feedback:

1. Migration: Added timestamps (created_at, updated_at), soft delete
   (is_deleted), audit fields (_created_by_id, _last_updated_by_id),
   and organization_id to blocks_tags table for filtering support.
   Follows SQLite baseline pattern (composite PK of block_id+tag, no
   separate id column) to avoid insert failures.

2. ORM: Relationship already correct with lazy="raise" to prevent
   implicit joins and passive_deletes=True for efficient CASCADE deletes.

3. Schema: Changed normalize_tags() from Any to dict for type safety.

4. SQLite: Added blocks_tags to SQLite baseline schema to prevent
   table-not-found errors.

5. Code: Updated all tag row inserts to include organization_id.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add ORM columns and update SQLite baseline for blocks_tags

Fixes test failures (CompileError: Unconsumed column names: organization_id):

1. ORM: Added organization_id, timestamps, audit fields to BlocksTags
   ORM model to match database schema from migrations.

2. SQLite baseline: Added full column set to blocks_tags (organization_id,
   timestamps, audit fields) to match PostgreSQL schema.

3. Test: Added 'tags' to expected Block schema fields.

This ensures SQLite and PostgreSQL have matching schemas and the ORM
can consume all columns that the code inserts.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* revert change to existing alembic migration

* fix: remove passive_deletes and SQLite support for blocks_tags

1. Removed passive_deletes=True from Block.tags relationship to match
   AgentsTags pattern (neither have ondelete CASCADE in DB schema).

2. Removed SQLite branch from _replace_block_pivot_rows_async since
   blocks_tags table is PostgreSQL-only (migration skips SQLite).

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* api sync

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:38 -08:00
cthomas
e964307f6a feat: add lazy=raise for passage-org relationship (#8482) 2026-01-12 10:57:49 -08:00
cthomas
cc975b5d15 fix: reuse db session when loading pending approval (#8363) 2026-01-12 10:57:48 -08:00
cthomas
9b359418d0 feat: add pending approval field on agent state (#8361)
* feat: add pending approval field on agent state

* test failures
2026-01-12 10:57:48 -08:00
cthomas
f87c607115 fix: add encrypted placeholder constant (#8354) 2026-01-12 10:57:48 -08:00
cthomas
a54513c343 feat: move decryption outside db session (#8323)
* feat: move decryption outside db session

* fix pydantic error
2026-01-12 10:57:48 -08:00
Sarah Wooders
87d920782f feat: add conversation and conversation_messages tables for concurrent messaging (#8182) 2026-01-12 10:57:48 -08:00
cthomas
3aaab90b4c feat: use bounded concurrency for decryption (#8296) 2026-01-12 10:57:48 -08:00
cthomas
3b0b2cbee1 fix: unbounded file to pydantic conversion (#8292)
* fix: unbounded file to pydantic conversion

* remove var name
2026-01-12 10:57:48 -08:00
cthomas
9be2ab2c3d fix: skip additional query in stale data error handling (#8171) 2026-01-12 10:57:48 -08:00
cthomas
e904330dde fix: step metrics db timeouts [LET-6697] (#8136)
fix: step metrics db timeouts
2026-01-12 10:57:47 -08:00
cthomas
d489d8837f fix: broken pagination for objects with missing created_at [LET-6699] (#8139)
fix: broken pagination for objects with missing created_at
2026-01-12 10:57:47 -08:00
cthomas
0dd1df306a fix: concurrent block update rollback [LET-6695] (#8133)
fix: concurrent block update rollback
2026-01-12 10:57:47 -08:00
cthomas
b6535b7590 fix: agent-runs relationship load causing timeouts [LET-6694] (#8129)
fix: agent-runs relationship load causing timeouts
2026-01-12 10:57:47 -08:00
Sarah Wooders
acd8dd7bcf feat: make embedding_config optional on agent creation (#7553)
* feat: make embedding_config optional on agent creation

- Remove requirement for embedding_config in agent creation
- Add EmbeddingConfigRequiredError for operations that need embeddings
- Add null checks in sleeptime agent creation, passage insert, archive creation
- Register new error in app.py exception handlers

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: update API schemas for optional embedding_config

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-12 10:57:19 -08:00
Ari Webb
4d8d9757aa feat: add request-id for steps [LET-6587] (#7349)
* feat: add request-id for steps

* order revisions correctly

* stage publish api
2026-01-12 10:57:19 -08:00
Ari Webb
79c2319403 chore: add project constraint on tools db (#7360) 2025-12-17 17:32:27 -08:00
jnjpng
25d75d6528 fix: update aysnc get plaintext callsites (#7069)
* base

* resolve

* fix

* fix
2025-12-17 17:31:02 -08:00
jnjpng
00ba2d09f3 refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns (#6751)
* refactor: migrate mcp_servers and mcp_oauth to encrypted-only columns

Complete migration to encrypted-only storage for sensitive fields:

- Remove dual-write to plaintext columns (token, custom_headers,
  authorization_code, access_token, refresh_token, client_secret)
- Read only from _enc columns, not from plaintext fallback
- Remove helper methods (get_token_secret, set_token_secret, etc.)
- Remove Secret.from_db() and Secret.to_dict() methods
- Update tests to verify encrypted-only behavior

After this change, plaintext columns can be set to NULL manually
since they are no longer read from or written to.

* fix test

* rename

* update

* union

* fix test
2025-12-17 17:31:02 -08:00
Sarah Wooders
812bfd16dd Revert "feat: project_id uniqueness for tools" (#7007)
Revert "feat: project_id uniqueness for tools (#6604)"

This reverts commit 2c4b6397041e2c965493525fc52e056f10d1bdb6.
2025-12-15 12:03:09 -08:00
Kian Jones
bce1749408 fix: run PBKDF2 in thread pool to prevent event loop freeze (#6763)
* fix: run PBKDF2 in thread pool to prevent event loop freeze

Problem: Event loop freezes for 100-500ms during secret decryption, blocking
all HTTP requests and async operations. The diagnostic monitor detected the
main thread stuck in PBKDF2 HMAC SHA256 computation at:
  apps/core/letta/helpers/crypto_utils.py:51 (_derive_key)
  apps/core/letta/schemas/secret.py:161 (get_plaintext)

Root cause: PBKDF2 with 100k iterations is intentionally CPU-intensive for
security, but running it synchronously on the main thread blocks the event loop.

Stack trace showed:
  Thread 1 (Main): PBKDF2HMAC -> SHA256_Final -> sha256_block_data_order_avx2
  Event loop watchdog: Detected freeze at 01:11:44 (request started 01:12:03)

Solution:
1. Run PBKDF2 in ThreadPoolExecutor to avoid blocking event loop
2. Add async versions of encrypt/decrypt methods
3. Add LRU cache for derived keys (deterministic results)
4. Add async get_plaintext_async() method to Secret class

Changes:
- apps/core/letta/helpers/crypto_utils.py:
  - Added ThreadPoolExecutor for crypto operations
  - Added @lru_cache(maxsize=256) to _derive_key_cached()
  - Added _derive_key_async() using loop.run_in_executor()
  - Added encrypt_async() and decrypt_async() methods
  - Added warnings to sync methods about blocking behavior

- apps/core/letta/schemas/secret.py:
  - Added get_plaintext_async() method
  - Added warnings to get_plaintext() about blocking behavior

Benefits:
- Event loop no longer freezes during secret decryption
- HTTP requests continue processing while crypto runs in background
- Derived keys are cached, reducing CPU usage for repeated operations
- Backward compatible - sync methods still work for non-async code

Performance impact:
- Before: 100-500ms event loop block per decryption
- After: 100-500ms in thread pool (non-blocking) + LRU cache hits ~0.1ms

Next steps (follow-up PRs):
- Migrate all async callsites to use get_plaintext_async()
- Add metrics to track sync vs async usage
- Consider reducing PBKDF2 iterations if security allows

* update

* test

---------

Co-authored-by: Letta Bot <jinjpeng@gmail.com>
2025-12-15 12:03:09 -08:00
Ari Webb
c1aa01db6f feat: project_id uniqueness for tools (#6604)
* feat: project_id uniqueness for tools

* prevent double upsert of global tools

* use default project if no header for sdk

* reorder unique constraint for performance

* use separate session for check conflict

* feature flag adding project id header in cloud api

* add my migration after one on main

* remove comment

* stage and publish api

* web set project id just for tools

* includes instead of startswith
2025-12-15 12:03:09 -08:00
jnjpng
4be813b956 fix: migrate sandbox and agent environment variables to encrypted only (#6623)
* base

* remove unnnecessary db migration

* update

* fix

* update

* update

* comments

* fix

* revert

* anotha

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:03:08 -08:00
Sarah Wooders
7ea297231a feat: add compaction_settings to agents (#6625)
* initial commit

* Add database migration for compaction_settings field

This migration adds the compaction_settings column to the agents table
to support customized summarization configuration for each agent.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix

* rename

* update apis

* fix tests

* update web test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
2025-12-15 12:02:34 -08:00
Ari Webb
4092820f3a feat: add project id scoping for tools backend changes (#6529) 2025-12-15 12:02:34 -08:00
Sarah Wooders
4f1fbe45aa feat: add index and concurrency control for tools (fixed alembic) (#6552) 2025-12-15 12:02:34 -08:00
Sarah Wooders
c8c06168e2 Revert "feat: add index and concurrency control for tools " (#6551)
Revert "feat: add index and concurrency control for tools  (#6547)"

This reverts commit f4abf8e061bc2f5e08853b5ce5775a7f8626463a.
2025-12-15 12:02:34 -08:00
Sarah Wooders
a2d3011d84 feat: add index and concurrency control for tools (#6547) 2025-12-15 12:02:34 -08:00
Charles Packer
131891e05f feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449)
* feat: init refactor

* feat: add helper code

* fix: missing file + test

* fix: just state/publish api
2025-12-15 12:02:19 -08:00
cthomas
6a47bf4946 feat: add fk reference for agent groups [LET-6260] (#6313)
feat: add fk reference for agent groups
2025-11-24 19:10:27 -08:00
cthomas
6dd3f6fecd feat: add back ref for mcp server on org (#6269) 2025-11-24 19:10:11 -08:00