Commit Graph

1636 Commits

Author SHA1 Message Date
Sarah Wooders
3fdf2b6c79 chore: deprecate old agent messaging (#9120) 2026-02-24 10:52:06 -08:00
Sarah Wooders
4096b30cd7 feat: log LLM traces to clickhouse (#9111)
* feat: add non-streaming option for conversation messages

- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API schema for ConversationMessageRequest

* feat: add direct ClickHouse storage for raw LLM traces

    Adds ability to store raw LLM request/response payloads directly in ClickHouse,
    bypassing OTEL span attribute size limits. This enables debugging and analytics
    on large LLM payloads (>10MB system prompts, large tool schemas, etc.).

    New files:
    - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper
    - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget)
    - letta/services/llm_raw_trace_reader.py: Reader with query methods
    - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL
    - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL
    - apps/core/clickhouse-init.sql: Local dev initialization

    Modified:
    - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval)
    - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry
    - compose.yaml: Added ClickHouse service for local dev
    - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands

    Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false).
    Uses ZSTD(3) compression for 10-30x reduction on JSON payloads.

    🤖 Generated with [Letta Code](https://letta.com)

    Co-Authored-By: Letta <noreply@letta.com>

* fix: address code review feedback for LLM raw traces

Fixes based on code review feedback:

1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port
   inputs (was defaulting to HTTPS which breaks local dev)

2. Make raw trace writes truly fire-and-forget - use asyncio.create_task()
   instead of awaiting, so JSON serialization doesn't block request path

3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth
   under load. Drops traces with warning if queue is full.

4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop()

5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it

6. Remove unused imports - json import in reader

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing asyncio import and simplify JSON serialization

- Add missing 'import asyncio' that was causing 'name asyncio is not defined' error
- Remove unnecessary clean_double_escapes() function - the JSON is stored correctly,
  the clickhouse-client CLI was just adding extra escaping when displaying
- Update just clickhouse-trace to use Python client for correct JSON output

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: add clickhouse raw trace integration test

* test: simplify clickhouse trace assertions

* refactor: centralize usage parsing and stream error traces

Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: exercise provider usage parsing live

Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: fix usage parsing tests to pass

- Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient
- Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta)
- Add tools=[] for Gemini/Anthropic build_request_data

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: extract_usage_statistics returns LettaUsageStatistics

Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add is_byok and llm_config_json columns to ClickHouse traces

Extend llm_raw_traces table with:
- is_byok (UInt8): Track BYOK vs base provider usage for billing analytics
- llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis

This enables queries like:
- BYOK usage breakdown by provider/model
- Config parameter analysis (temperature, max_tokens, etc.)
- Debugging specific request configurations

* feat: add tests for error traces, llm_config_json, and cache tokens

- Update llm_raw_trace_reader.py to query new columns (is_byok,
  cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json)
- Add test_error_trace_stored_in_clickhouse to verify error fields
- Add test_cache_tokens_stored_for_anthropic to verify cache token storage
- Update existing tests to verify llm_config_json is stored correctly
- Make llm_config required in log_provider_trace_async()
- Simplify provider extraction to use provider_name directly

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add ClickHouse integration tests to CI pipeline

- Add use-clickhouse option to reusable-test-workflow.yml
- Add ClickHouse service container with otel database
- Add schema initialization step using clickhouse-init.sql
- Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.)
- Add separate clickhouse-integration-tests job running
  integration_test_clickhouse_llm_raw_traces.py

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify provider and org_id extraction in raw trace writer

- Use model_endpoint_type.value for provider (not provider_name)
- Simplify org_id to just self.actor.organization_id (actor is always pydantic)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLMRawTraceWriter with _enabled flag

- Check ClickHouse env vars once at init, set _enabled flag
- Early return in write_async/flush_async if not enabled
- Remove ValueError raises (never used)
- Simplify _get_client (no validation needed since already checked)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add LLMRawTraceWriter shutdown to FastAPI lifespan

Properly flush pending traces on graceful shutdown via lifespan
instead of relying only on atexit handler.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add agent_tags column to ClickHouse traces

Store agent tags as Array(String) for filtering/analytics by tag.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* cleanup

* fix(ci): fix ClickHouse schema initialization in CI

- Create database separately before loading SQL file
- Remove CREATE DATABASE from SQL file (handled in CI step)
- Add verification step to confirm table was created
- Use -sf flag for curl to fail on HTTP errors

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLM trace writer with ClickHouse async_insert

- Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop
- Sync cloud DDL schema with clickhouse-init.sql (add missing columns)
- Remove redundant llm_raw_traces_local.ddl
- Remove unused batch_size/flush_interval settings
- Update tests for simplified writer

Key changes:
- async_insert=1, wait_for_async_insert=1 for reliable server-side batching
- Simple per-trace retry with exponential backoff (max 3 retries)
- ~150 lines removed from writer

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: consolidate ClickHouse direct writes into TelemetryManager backend

- Add clickhouse_direct backend to provider_trace_backends
- Remove duplicate ClickHouse write logic from llm_client_base.py
- Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct

The clickhouse_direct backend:
- Converts ProviderTrace to LLMRawTrace
- Extracts usage stats from response JSON
- Writes via LLMRawTraceWriter with async_insert

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: address PR review comments and fix llm_config bug

Review comment fixes:
- Rename clickhouse_direct -> clickhouse_analytics (clearer purpose)
- Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml
- Delete redundant scripts/test_llm_raw_traces.py (use pytest tests)
- Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL)
- Fix socket description leak in telemetry_manager docstring
- Add cloud-only comment to clickhouse-init.sql
- Update justfile to use separate compose file

Bug fix:
- Fix llm_config not being passed to ProviderTrace in telemetry
- Now correctly populates provider, model, is_byok for all LLM calls
- Affects both request_async_with_telemetry and log_provider_trace_async

DDL optimizations:
- Add secondary indexes (bloom_filter for agent_id, model, step_id)
- Add minmax indexes for is_byok, is_error
- Change model and error_type to LowCardinality for faster GROUP BY

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: rename llm_raw_traces -> llm_traces

Address review feedback that "raw" is misleading since we denormalize fields.

Renames:
- Table: llm_raw_traces -> llm_traces
- Schema: LLMRawTrace -> LLMTrace
- Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py
- Setting: store_llm_raw_traces -> store_llm_traces

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update workflow references to llm_traces

Missed renaming table name in CI workflow files.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update clickhouse_direct -> clickhouse_analytics in docstring

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove inaccurate OTEL size limit comments

The 4MB limit is our own truncation logic, not an OTEL protocol limit.
The real benefit is denormalized columns for analytics queries.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove local ClickHouse dev setup (cloud-only feature)

- Delete clickhouse-init.sql and compose.clickhouse.yaml
- Remove local clickhouse just commands
- Update CI to use cloud DDL with MergeTree for testing

clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: restore compose.yaml to match main

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: merge clickhouse_analytics into clickhouse backend

Per review feedback - having two separate backends was confusing.

Now the clickhouse backend:
- Writes to llm_traces table (denormalized for cost analytics)
- Reads from OTEL traces table (will cut over to llm_traces later)

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: correct path to DDL file in CI workflow

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: add provider index to DDL for faster filtering

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: configure telemetry backend in clickhouse tests

Tests need to set telemetry_settings.provider_trace_backends to include
'clickhouse', otherwise traces are routed to default postgres backend.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: set provider_trace_backend field, not property

provider_trace_backends is a computed property, need to set the
underlying provider_trace_backend string field instead.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: error trace test and error_type extraction

- Add TelemetryManager to error trace test so traces get written
- Fix error_type extraction to check top-level before nested error dict

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use provider_trace.id for trace correlation across backends

- Pass provider_trace.id to LLMTrace instead of auto-generating
- Log warning if ID is missing (shouldn't happen, helps debug)
- Fallback to new UUID only if not set

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: trace ID correlation and concurrency issues

- Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse
- Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe)
- Fix Anthropic prompt_tokens to include cached tokens for cost analytics
- Log warning if provider_trace.id is missing

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
2026-02-24 10:52:06 -08:00
jnjpng
e25a0c9cdf feat: update compact endpoint to store summary message (#9215)
* base

* add tests
2026-02-24 10:52:06 -08:00
jnjpng
d28ccc0be6 feat: add summary message and event on compaction (#9144)
* base

* update

* update

* revert formatting

* routes

* legacy

* fix

* review

* update
2026-02-24 10:52:05 -08:00
Ari Webb
7b0b1f2531 fix: warning (#9179)
* fix: warning

* just stage publish api

* note

* api
2026-02-24 10:52:05 -08:00
Kian Jones
00045afc3c chore: add statement timeout log (#9177)
* fix: remove duplicate 'Tag and push latest' step from crouton workflow

* startuplog

* Update apps/core/letta/server/rest_api/app.py
2026-01-29 12:44:04 -08:00
cthomas
d992aa0df4 fix: non-streaming conversation messages endpoint (#9159)
* fix: non-streaming conversation messages endpoint

**Problems:**
1. `AssertionError: run_id is required when enforce_run_id_set is True`
   - Non-streaming path didn't create a run before calling `step()`

2. `ResponseValidationError: Unable to extract tag using discriminator 'message_type'`
   - `response_model=LettaStreamingResponse` but non-streaming returns `LettaResponse`

**Fixes:**
1. Add run creation before calling `step()` (mirrors agents endpoint)
2. Set run_id in Redis for cancellation support
3. Pass `run_id` to `step()`
4. Change `response_model` from `LettaStreamingResponse` to `LettaResponse`
   (streaming returns `StreamingResponse` which bypasses response_model validation)

**Test:**
Added `test_conversation_non_streaming_raw_http` to verify the fix.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* api sync

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
dd7e28fae6 fix: return 400 instead of 500 for image fetch errors (#9157)
**Problem:**
When a user sends a message with an image URL that times out or fails to
fetch, the server returns a 500 Internal Server Error with a generic message.
This is confusing because the user doesn't know what went wrong.

**Root Cause:**
`LettaImageFetchError` was not registered in the exception handlers, so it
bubbled up as an unhandled exception.

**Fix:**
Register `LettaImageFetchError` with the 400 Bad Request handler. Now users
get a clear error message like:
```
Failed to fetch image from https://...: Timeout after 2 attempts
```

This tells users exactly what went wrong so they can retry with a different
image or verify the URL is accessible.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Shubham Naik
bb2145c24c connections (#9113)
* chore: release code

* chore: release code

* chore: release code

* chore: release code

* chore: release code

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: change paths

* chore: remote

* chore: support  multi project chat
2026-01-29 12:44:04 -08:00
Kian Jones
34eed72150 feat: add user id validation (#9128)
* add user id validation

* relax conversation id check to allow default while I'm here

* fix annotation validation

* -api changes
2026-01-29 12:44:04 -08:00
cthomas
5530d541f1 fix: skip default actor init on app start (#9125)
* fix: skip default actor init on app start

* opposite check
2026-01-29 12:44:04 -08:00
Kian Jones
0099a95a43 fix(sec): first pass of ensuring actor id is required everywhere (#9126)
first pass of ensuring actor id is required
2026-01-29 12:44:04 -08:00
github-actions[bot]
62a00cc672 fix: remove deprecation from agent passages endpoints (#9117)
* fix: remove deprecation from agent passages endpoints

The client.agent.passages endpoints (list, create, search, delete) were
incorrectly marked as deprecated. This would break significant amounts
of user code and negatively impact developer experience.

Fixes #9116

Co-authored-by: Ari Webb <AriWebb@users.noreply.github.com>

* stage publish api

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Ari Webb <AriWebb@users.noreply.github.com>
Co-authored-by: Ari Webb <ari@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
2bccd36382 Revert "fix: ensure stop_reason is always set and reduce noisy logs (… (#9086)
Revert "fix: ensure stop_reason is always set and reduce noisy logs (#9046)"

This reverts commit 4241a360579440d2697124ba69061d0e46ecc5e9.

**Problem:**
After the original change, caren-code-agent reported streams hanging
indefinitely. The trace shows ttft (time to first token) succeeds, but
the stream never closes.

**Root Cause (suspected):**
The change modified `is_complete=is_done` to `is_complete=saw_done`,
meaning error events no longer mark the stream as complete immediately.
This may cause timing issues where clients wait for more data before
the finalizer runs.

**Fix:**
Revert to the defensive "belt-and-suspenders" approach that always
appends [DONE]. The noisy logs are preferable to hanging streams.

The original comment noted: "Even if a previous chunk set `complete`,
an extra [DONE] is harmless and ensures SDKs that rely on explicit
[DONE] will exit."

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
github-actions[bot]
194c743223 refactor: rename stream to streaming in ConversationMessageRequest (#9063) 2026-01-29 12:44:04 -08:00
github-actions[bot]
1d1bb29a43 feat: add override_model support for agent file import (#9058) 2026-01-29 12:44:04 -08:00
Sarah Wooders
6c415b27f8 feat: add non-streaming option for conversation messages (#9044)
* feat: add non-streaming option for conversation messages

- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API schema for ConversationMessageRequest

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
2a2e777807 fix: ensure stop_reason is always set and reduce noisy logs (#9046)
fix: consume [DONE] token after error events to prevent forced finalizer append

**Problem:**
Stream finalizer was frequently logging warning and appending forced [DONE]:
```
[Stream Finalizer] Appending forced [DONE] for run=run-xxx (saw_error=True,
saw_done=False, final_stop_reason=llm_api_error)
```

This happened on every error, even though streaming_service.py already yields
[DONE] after all error events.

**Root Cause:**
Line 266: `is_done = saw_done or saw_error` caused loop to break immediately
after seeing error event, BEFORE consuming the [DONE] chunk that follows:

```python
is_done = saw_done or saw_error
await writer.write_chunk(...)
if is_done:  # Breaks on error!
    break
```

Sequence:
1. streaming_service.py yields: `event: error\ndata: {...}\n\n`
2. Redis reader sees error → sets `saw_error=True`
3. Sets `is_done=True` and breaks
4. Never reads next chunk: `data: [DONE]\n\n`
5. Finalizer runs → `saw_done=False` → appends forced [DONE]

**Fix:**
1. Only break when `saw_done=True` (not `saw_error`) → allows consuming [DONE]
2. Only run finalizer when `saw_done=False` → reduces log noise

**Result:**
- [DONE] now consumed naturally from streaming_service.py error handlers
- Finalizer warning only appears when truly needed (fallback cases)
- Cleaner production logs

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
ca40eff7bc fix: ensure stop_reason is always set when marking runs as failed (#9045)
**Problem:**
Production error showed runs being marked as failed with stop_reason=None,
which violates LettaStopReason's Pydantic schema (requires valid enum value).
This caused cascading validation errors that got stored in metadata.

Example error:
```
Run is already in a terminal state failed with stop reason None, but is being
updated with data {'status': 'failed', 'stop_reason': None, 'metadata':
{'error': "1 validation error for LettaStopReason\nstop_reason Input should
be 'end_turn', 'error', ... [type=enum, input_value=None]"}}
```

**Root Causes:**
1. routers/v1/agents.py had 3 exception handlers creating RunUpdate(status=failed)
   without stop_reason
2. Success path assumed result.stop_reason always exists (AttributeError if None)
3. run_manager.py tried to create LettaStopReason(stop_reason=None) when
   refreshing result messages

**Fixes:**
1. Added stop_reason=StopReasonType.error to 3 exception handlers
2. Added defensive None checks before accessing result.stop_reason.stop_reason
3. Added fallback to StopReasonType.error when pydantic_run.stop_reason is None

**Trigger:**
OpenAI BadRequestError for invalid tool schema → exception handlers marked
run as failed without stop_reason → validation error when constructing response

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Sarah Wooders
25e9539a6e feat: add batch passage create and optional search query (#8866) 2026-01-29 12:44:04 -08:00
cthomas
c162de5127 fix: use shared event + .athrow() to properly set stream_was_cancelle… (#9019)
fix: use shared event + .athrow() to properly set stream_was_cancelled flag

**Problem:**
When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained
False because `RunCancelledException` was raised in the consumer code (wrapper),
which closes the generator from outside. This causes Python to skip the
generator's except blocks and jump directly to finally with the wrong flag value.

**Solution:**
1. Shared `asyncio.Event` registry for cross-layer cancellation signaling
2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected
3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise)
4. All streaming interfaces check event in `finally` block to set flag correctly
5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE]

**Changes:**
- streaming_response.py: Event registry + .athrow() injection + graceful handling
- openai_streaming_interface.py: 3 classes check event in finally
- gemini_streaming_interface.py: Check event in finally
- anthropic_*.py: Catch RunCancelledException
- simple_llm_stream_adapter.py: Create & pass event to interfaces
- streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update
- routers/v1/{conversations,runs}.py: Pass event to wrapper
- integration_test_human_in_the_loop.py: New test for approval + cancellation

**Tests:**
- test_tool_call with cancellation (OpenAI models) 
- test_approve_with_cancellation (approval flow + concurrent cancel) 

**Known cosmetic warnings (pre-existing):**
- "Run already in terminal state" - agent loop tries to update after /cancel
- "Stream ended without terminal event" - background streaming timing race

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Ari Webb
5c06918042 fix: don't need embedding model for self hosted [LET-7009] (#8935)
* fix: don't need embedding model for self hosted

* stage publish api

* passes tests

* add test

* remove unnecessary upgrades

* update revision order db migrations

* add timeout for ci
2026-01-29 12:44:04 -08:00
Kian Jones
7133083b81 fix: agent_tags for provider traces (#8989)
* add include tags

* include agent_tags and pass them into the adapter
2026-01-29 12:43:53 -08:00
Charles Packer
2fc592e0b6 feat(core): add image support in tool returns [LET-7140] (#8985)
* feat(core): add image support in tool returns [LET-7140]

Enable tool_return to support both string and ImageContent content parts,
matching the pattern used for user message inputs. This allows tools
executed client-side to return images back to the agent.

Changes:
- Add LettaToolReturnContentUnion type for text/image content parts
- Update ToolReturn schema to accept Union[str, List[content parts]]
- Update converters for each provider:
  - OpenAI Chat Completions: placeholder text for images
  - OpenAI Responses API: full image support
  - Anthropic: full image support with base64
  - Google: placeholder text for images
- Add resolve_tool_return_images() for URL-to-base64 conversion
- Make create_approval_response_message_from_input() async

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): support images in Google tool returns as sibling parts

Following the gemini-cli pattern: images in tool returns are sent as
sibling inlineData parts alongside the functionResponse, rather than
inside it.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test(core): add integration tests for multi-modal tool returns [LET-7140]

Tests verify that:
- Models with image support (Anthropic, OpenAI Responses API) can see
  images in tool returns and identify the secret text
- Models without image support (Chat Completions) get placeholder text
  and cannot see the actual image content
- Tool returns with images persist correctly in the database

Uses secret.png test image containing hidden text "FIREBRAWL" that
models must identify to pass the test.

Also fixes misleading comment about Anthropic only supporting base64
images - they support URLs too, we just pre-resolve for consistency.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify tool return image support implementation

Reduce code verbosity while maintaining all functionality:
- Extract _resolve_url_to_base64() helper in message_helper.py (eliminates duplication)
- Add _get_text_from_part() helper for text extraction
- Add _get_base64_image_data() helper for image data extraction
- Add _tool_return_to_google_parts() to simplify Google implementation
- Add _image_dict_to_data_url() for OpenAI Responses format
- Use walrus operator and list comprehensions where appropriate
- Add integration_test_multi_modal_tool_returns.py to CI workflow

Net change: -120 lines while preserving all features and test coverage.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(tests): improve prompt for multi-modal tool return tests

Make prompts more direct to reduce LLM flakiness:
- Simplify tool description: "Retrieves a secret image with hidden text. Call this function to get the image."
- Change user prompt from verbose request to direct command: "Call the get_secret_image function now."
- Apply to both test methods

This reduces ambiguity and makes tool calling more reliable across different LLM models.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix bugs

* test(core): add google_ai/gemini-2.0-flash-exp to multi-modal tests

Add Gemini model to test coverage for multi-modal tool returns. Google AI already supports images in tool returns via sibling inlineData parts.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(ui): handle multi-modal tool_return type in frontend components

Convert Union<string, LettaToolReturnContentUnion[]> to string for display:
- ViewRunDetails: Convert array to '[Image here]' placeholder
- ToolCallMessageComponent: Convert array to '[Image here]' placeholder

Fixes TypeScript errors in web, desktop-ui, and docker-ui type-checks.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
2026-01-29 12:43:53 -08:00
Ari Webb
4ec6649caf feat: byok provider models in db also (#8317)
* feat: byok provider models in db also

* make tests and sync api

* fix inconsistent state with recreating provider of same name

* fix sync on byok creation

* update revision

* move stripe code for testing purposes

* revert

* add refresh byok models endpoint

* just stage publish api

* add tests

* reorder revision

* add test for name clashes
2026-01-29 12:43:53 -08:00
cthomas
5f58819bbf fix: false positives for token markers (#8942) 2026-01-29 12:43:23 -08:00
Kian Jones
a92e868ee6 feat: centralize telemetry logging at LLM client level (#8815)
* feat: centralize telemetry logging at LLM client level

Moves telemetry logging from individual adapters to LLMClientBase:
- Add TelemetryStreamWrapper for streaming telemetry on stream close
- Add request_async_with_telemetry() for non-streaming requests
- Add stream_async_with_telemetry() for streaming requests
- Add set_telemetry_context() to configure agent_id, run_id, step_id

Updates adapters and agents to use new pattern:
- LettaLLMAdapter now accepts agent_id/run_id in constructor
- Adapters call set_telemetry_context() before LLM requests
- Removes duplicate telemetry logging from adapters
- Enriches traces with agent_id, run_id, call_type metadata

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: accumulate streaming response content for telemetry

TelemetryStreamWrapper now extracts actual response data from chunks:
- Content text (concatenated from deltas)
- Tool calls (id, name, arguments)
- Model name, finish reason, usage stats

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: move streaming telemetry to caller (option 3)

- Remove TelemetryStreamWrapper class
- Add log_provider_trace_async() helper to LLMClientBase
- stream_async_with_telemetry() now just returns raw stream
- Callers log telemetry after processing with rich interface data

Updated callers:
- summarizer.py: logs content + usage after stream processing
- letta_agent.py: logs tool_call, reasoning, model, usage

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: pass agent_id and run_id to parent adapter class

LettaLLMStreamAdapter was not passing agent_id/run_id to parent,
causing "unexpected keyword argument" errors.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:43 -08:00
Kian Jones
9418ab9815 feat: add provider trace backend abstraction for multi-backend telemetry (#8814)
* feat: add provider trace backend abstraction for multi-backend telemetry

Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config

Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add protocol version to socket backend records

Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: remove organization_id from ProviderTraceCreate calls

The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* consolidate provider trace

* add clickhouse-connect to fix bug on main lmao

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* consolidate provider trace

* consolidate provider trace bug fix

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:43 -08:00
Sarah Wooders
5c7bed7743 feat: add conversation_id to export export and compact (#8792) 2026-01-19 15:54:43 -08:00
cthomas
6599aa3b44 feat: populate seq_id for ping messages (#8844)
* feat: populate seq_id for ping messages

* fix import
2026-01-19 15:54:43 -08:00
jnjpng
5017cb1d12 feat: add chatgpt oauth client for codex routing (#8774)
* base

* refresh

* use default model fallback

* patch

* streaming

* generate
2026-01-19 15:54:42 -08:00
Ari Webb
193c0e4b74 feat: add override_model to message endpoints (#8763)
* feat: add override_model to message endpoints

* add tests back

* remove from ci
2026-01-19 15:54:42 -08:00
Kian Jones
d2c3350a7e feat(runs): add run ID filter to runs page (#8726)
feat(core): add run_id filter to internal runs endpoint

Adds the ability to filter runs by a specific run ID in the
internal runs list endpoint.

🐛 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:42 -08:00
cthomas
6b6ca91183 fix: remove letta ping schema override (#8790) 2026-01-19 15:54:41 -08:00
Charles Packer
97f7e95d1d feat: add PATCH route for updating conversation summary (#8322) 2026-01-19 15:54:41 -08:00
Sarah Wooders
f91e77d971 fix: add cancel for conversations to SDK (#8742) 2026-01-19 15:54:41 -08:00
jnjpng
58a5375c19 fix: test sdk client due to message batch route ordering (#8733)
* base

* generate
2026-01-19 15:54:40 -08:00
Sarah Wooders
aabd58628e feat: add conversation cancellation endpoint (#8729) 2026-01-19 15:54:40 -08:00
jnjpng
037c20ae1b feat: query param parity for conversation messages (#8730)
* base

* add tests

* generate
2026-01-19 15:54:40 -08:00
Sarah Wooders
9aac2abdfe chore: deprecate identities/groups APIs and remove from SDK (#8580)
* chore: deprecate identities/groups APIs and remove from SDK

- Mark all /v1/identities/* endpoints as deprecated
- Mark all /v1/groups/* endpoints as deprecated
- Remove identities, groups, and batches resources from stainless.yml
- Batch API remains active but hidden from SDK

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: update autogenerated SDK files

* chore: regenerate SDK and OpenAPI spec

Run `just stage-api` and `just publish-api` to sync generated files.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

* chore: remove schedule API from stainless SDK

Remove schedule subresource from stainless.yml to hide scheduled messages
endpoints from the SDK generation.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
2026-01-19 15:54:40 -08:00
jnjpng
e3e758a8c0 feat: add retrieve message endpoint and to client sdk (#8719)
* base

* generate openapi

* try again

* now
2026-01-19 15:54:40 -08:00
Kian Jones
3eae81cf62 feat: add /v1/runs/{run_id}/trace endpoint for OTEL traces (#8682)
* feat: add /v1/runs/{run_id}/trace endpoint for OTEL traces

- Add new endpoint to retrieve filtered OTEL spans for a run
- Filter to only return UI-relevant spans (agent_step, tool executions, root span, TTFT)
- Skip Postgres writes when ClickHouse is enabled for provider traces
- Add USE_CLICKHOUSE_FOR_PROVIDER_TRACES env var to helm/justfile
- Move typecheck CI to self-hosted runners

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing clickhouse_provider_traces.py

The telemetry_manager.py imports ClickhouseProviderTraceReader from
this module, but the file was not included when splitting the PR.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* autogen

* fix: add trace.retrieve to stainless.yml for SDK generation

Adds the runs.trace.retrieve method mapping so Stainless generates
the useRunsServiceRetrieveTraceForRun hook.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:39 -08:00
Sarah Wooders
9c4f191755 feat: add conversation_id parameter to context endpoint [LET-6989] (#8678)
* feat: add conversation_id parameter to context endpoint [LET-6989]

Add optional conversation_id query parameter to retrieve_agent_context_window.
When provided, the endpoint uses messages from the specific conversation
instead of the agent's default message_ids.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate SDK after context endpoint update [LET-6989]

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:39 -08:00
jnjpng
979062114c chore: fix typo and improve MCP OAuth comments (#8629)
- Fix typo "upate" -> "update" in TODO comments (mcp_manager.py, mcp_server_manager.py)
- Improve comments in OAuth callback handler to explain why MCPOAuthSession
  is used directly (callback is unauthenticated, manager requires actor)
- Clean up variable naming in callback handler

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:38 -08:00
cthomas
ab4ccfca31 feat: add tags support to blocks (#8474)
* feat: add tags support to blocks

* fix: add timestamps and org scoping to blocks_tags

Addresses PR feedback:

1. Migration: Added timestamps (created_at, updated_at), soft delete
   (is_deleted), audit fields (_created_by_id, _last_updated_by_id),
   and organization_id to blocks_tags table for filtering support.
   Follows SQLite baseline pattern (composite PK of block_id+tag, no
   separate id column) to avoid insert failures.

2. ORM: Relationship already correct with lazy="raise" to prevent
   implicit joins and passive_deletes=True for efficient CASCADE deletes.

3. Schema: Changed normalize_tags() from Any to dict for type safety.

4. SQLite: Added blocks_tags to SQLite baseline schema to prevent
   table-not-found errors.

5. Code: Updated all tag row inserts to include organization_id.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add ORM columns and update SQLite baseline for blocks_tags

Fixes test failures (CompileError: Unconsumed column names: organization_id):

1. ORM: Added organization_id, timestamps, audit fields to BlocksTags
   ORM model to match database schema from migrations.

2. SQLite baseline: Added full column set to blocks_tags (organization_id,
   timestamps, audit fields) to match PostgreSQL schema.

3. Test: Added 'tags' to expected Block schema fields.

This ensures SQLite and PostgreSQL have matching schemas and the ORM
can consume all columns that the code inserts.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* revert change to existing alembic migration

* fix: remove passive_deletes and SQLite support for blocks_tags

1. Removed passive_deletes=True from Block.tags relationship to match
   AgentsTags pattern (neither have ondelete CASCADE in DB schema).

2. Removed SQLite branch from _replace_block_pivot_rows_async since
   blocks_tags table is PostgreSQL-only (migration skips SQLite).

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* api sync

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:38 -08:00
jnjpng
c550457b60 feat: static redirect callback for mcp server oauth (#8611)
* base

* base

* more

* final

* remove

* pass
2026-01-19 15:54:38 -08:00
Sarah Wooders
b8a6496acb feat: add runs_metrics table (#5169) 2026-01-19 15:51:30 -08:00
Ari Webb
6d859174c2 feat: make conversations throw http busy to stop race condition [LET-6842] (#8411)
* feat: make conversations throw http busy to stop race condition

* use redis lock instead

* move acquire lock into redis client, integration tests, move lock release into run manager

* fix tests, bug

* conditional import

* remove else

* better release

* run ci

* final reordering lock

* update tests

* wrong naming of lock holder token
2026-01-12 10:57:49 -08:00
cthomas
03a64993cf fix: make file reads async (#8513) 2026-01-12 10:57:49 -08:00
jnjpng
87e939deda feat: add fastmcp v2 client (#8457)
* base

* testing code

* update

* nit
2026-01-12 10:57:49 -08:00