* feat: add /agents/{agent_id}/generate endpoint for direct LLM requests
Add new endpoint that makes direct LLM provider requests without agent
context, memory, tools, or state modification. This enables:
- Quick LLM queries without agent overhead
- Testing model configurations
- Simple chat completions using agent's credentials
- Comparing responses across different models
Features:
- Uses agent's LLM config by default
- Supports model override with full provider config resolution
- Non-streaming, stateless operation
- Proper error handling and validation
- Request/response schemas with Pydantic validation
Implementation:
- Add GenerateRequest and GenerateResponse schemas
- Implement generate_completion endpoint handler
- Add necessary imports (LLMError, LLMClient, HandleNotFoundError)
- Include logging and comprehensive error handling
* fix: improve error handling and fix Message construction
- Fix critical bug: use content=[TextContent(text=...)] instead of text=...
- Add explicit error handling for NoResultFound and HandleNotFoundError
- Add error handling for convert_response_to_chat_completion
- Add structured logging for debugging
- Remove unnecessary .get() calls since Pydantic validates messages
* refactor: extract generate logic to AgentCompletionService
Move the generate endpoint business logic out of the endpoint handler
into a dedicated AgentCompletionService class for better code organization
and separation of concerns.
Changes:
- Create new AgentCompletionService in services/agent_completion_service.py
- Service handles all business logic: agent validation, LLM config resolution,
message conversion, LLM client creation, and request/response processing
- Integrate service with SyncServer initialization
- Refactor generate_completion endpoint to use the service
- Endpoint now only handles HTTP concerns (auth, error mapping)
Benefits:
- Cleaner endpoint code (reduced from ~140 lines to ~25 lines)
- Better separation of concerns (HTTP vs business logic)
- Service logic can be reused or tested independently
- Follows established patterns in the codebase (AgentManager, etc.)
* feat: simplify generate API to accept just prompt text
Simplify the client interface by accepting a simple prompt string instead
of requiring clients to format messages.
Changes:
- Update GenerateRequest schema:
- Replace 'messages' array with simple 'prompt' string
- Add optional 'system_prompt' for context/instructions
- Keep 'override_model' for model selection
- Update AgentCompletionService to format messages automatically:
- Accepts prompt and optional system_prompt
- Constructs message array internally (system + user messages)
- Simpler API surface for clients
- Update endpoint documentation with new simplified examples
- Regenerate OpenAPI spec and TypeScript SDK
Benefits:
- Much simpler client experience - just send text
- No need to understand message formatting
- Still supports system prompts for context
- Cleaner API that matches common use cases
Example (before):
{
"messages": [{"role": "user", "content": "What is 2+2?"}]
}
Example (after):
{
"prompt": "What is 2+2?"
}
* test: add comprehensive integration tests for generate endpoint
Add 9 integration tests covering various scenarios:
Happy path tests:
- test_agent_generate_basic: Basic prompt -> response flow
- test_agent_generate_with_system_prompt: System prompt + user prompt
- test_agent_generate_with_model_override: Override model selection
- test_agent_generate_long_prompt: Handle longer prompts
- test_agent_generate_no_persistence: Verify no messages saved to agent
Error handling tests:
- test_agent_generate_empty_prompt_error: Empty prompt validation (422)
- test_agent_generate_invalid_agent_id: Invalid agent ID (404)
- test_agent_generate_invalid_model_override: Invalid model handle (404)
All tests verify:
- Response structure (content, model, usage)
- Proper status codes for errors
- Usage statistics (tokens, counts)
- No side effects on agent state
Tests follow existing test patterns in test_client.py and use the
letta_client SDK (assuming generate_completion method is auto-generated
from the OpenAPI spec).
* openapi
* refactor: rename AgentCompletionService to AgentGenerateCompletionManager
Rename for better clarity and consistency with codebase naming conventions:
- Rename file: agent_completion_service.py → agent_generate_completion_manager.py
- Rename class: AgentCompletionService → AgentGenerateCompletionManager
- Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager
- Update docstrings: 'Service' → 'Manager'
Changes:
- apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class)
- apps/core/letta/server/server.py (import + initialization)
- apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint)
No functional changes, purely a naming refactor.
* fix: remove invalid Message parameters in generate manager
Remove agent_id=None and user_id=None from Message construction.
The Message model doesn't accept these as None values - only pass
required parameters (role, content).
Fixes validation error:
'Extra inputs are not permitted [type=extra_forbidden, input_value=None]'
This aligns with other Message construction patterns in the codebase
(see tools.py, memory.py examples).
* feat: improve generate endpoint validation and tests
- Add field validator for whitespace-only prompts
- Always include system message (required by Anthropic)
- Use default "You are a helpful assistant." when no system_prompt provided
- Update tests to use direct HTTP calls via httpx
- Fix test issues:
- Use valid agent ID format (agent-{uuid})
- Use available model (openai/gpt-4o-mini)
- Add whitespace validation test
- All 9 integration tests passing
* refactor: extract compact logic to shared function
Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.
Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
- compact_messages(): Core compaction logic
- build_summarizer_llm_config(): LLM config builder for summarization
- CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams
This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.
* chore: clean up verbose comments
* fix: correct CompactionSettings import path
* fix: correct count_tokens import from summarizer_sliding_window
* fix: update test patch path for count_tokens_with_tools
After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.
* fix: update test to use build_summarizer_llm_config from compact.py
The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.
* fix: add early check for system prompt size in compact_messages
Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.
* fix: properly propagate SystemPromptTokenExceededError from compact
The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.
Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.
* revert: remove redundant SystemPromptTokenExceededError handling
The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.
* fix: correctly re-raise SystemPromptTokenExceededError
The inner exception handler was using 'raise e' which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.
Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.
* revert: remove early check and restore raise e to match main behavior
* fix: set should_continue=False and correctly re-raise exception
- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError
Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.
* fix: add SystemPromptTokenExceededError handler for post-step compaction
The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early
This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.
The fix adds the same exception handler pattern used in the retry compaction flow
(line 941-946), ensuring proper state is set before re-raising.
This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
* feat: add ID format validation to agent and user schemas
Reuse existing validator types (ToolId, SourceId, BlockId, MessageId,
IdentityId, UserId) from letta.validators to enforce ID format validation
at the schema level. This ensures malformed IDs are rejected with a 422
validation error instead of causing 500 database errors.
Changes:
- CreateAgent: validate tool_ids, source_ids, folder_ids, block_ids, identity_ids
- UpdateAgent: validate tool_ids, source_ids, folder_ids, block_ids, message_ids, identity_ids
- UserUpdate: validate id
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate API spec and SDK
* fix: override ID validation in AgentSchema for agent file portability
AgentSchema extends CreateAgent but needs to allow arbitrary short IDs
(e.g., tool-0, block-0) for portable agent files. Override the validated
ID fields to use plain List[str] instead of the validated types.
Also fix test_agent.af to use proper UUID-format IDs.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate API spec and SDK
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: revert test_agent.af - short IDs are valid for agent files
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix openapi schema
---------
Co-authored-by: Letta <noreply@letta.com>
* feat: add non-streaming option for conversation messages
- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate API schema for ConversationMessageRequest
* feat: add direct ClickHouse storage for raw LLM traces
Adds ability to store raw LLM request/response payloads directly in ClickHouse,
bypassing OTEL span attribute size limits. This enables debugging and analytics
on large LLM payloads (>10MB system prompts, large tool schemas, etc.).
New files:
- letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper
- letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget)
- letta/services/llm_raw_trace_reader.py: Reader with query methods
- scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL
- scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL
- apps/core/clickhouse-init.sql: Local dev initialization
Modified:
- letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval)
- letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry
- compose.yaml: Added ClickHouse service for local dev
- justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands
Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false).
Uses ZSTD(3) compression for 10-30x reduction on JSON payloads.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: address code review feedback for LLM raw traces
Fixes based on code review feedback:
1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port
inputs (was defaulting to HTTPS which breaks local dev)
2. Make raw trace writes truly fire-and-forget - use asyncio.create_task()
instead of awaiting, so JSON serialization doesn't block request path
3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth
under load. Drops traces with warning if queue is full.
4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop()
5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it
6. Remove unused imports - json import in reader
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: add missing asyncio import and simplify JSON serialization
- Add missing 'import asyncio' that was causing 'name asyncio is not defined' error
- Remove unnecessary clean_double_escapes() function - the JSON is stored correctly,
the clickhouse-client CLI was just adding extra escaping when displaying
- Update just clickhouse-trace to use Python client for correct JSON output
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* test: add clickhouse raw trace integration test
* test: simplify clickhouse trace assertions
* refactor: centralize usage parsing and stream error traces
Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* test: exercise provider usage parsing live
Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* test: fix usage parsing tests to pass
- Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient
- Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta)
- Add tools=[] for Gemini/Anthropic build_request_data
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: extract_usage_statistics returns LettaUsageStatistics
Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat: add is_byok and llm_config_json columns to ClickHouse traces
Extend llm_raw_traces table with:
- is_byok (UInt8): Track BYOK vs base provider usage for billing analytics
- llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis
This enables queries like:
- BYOK usage breakdown by provider/model
- Config parameter analysis (temperature, max_tokens, etc.)
- Debugging specific request configurations
* feat: add tests for error traces, llm_config_json, and cache tokens
- Update llm_raw_trace_reader.py to query new columns (is_byok,
cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json)
- Add test_error_trace_stored_in_clickhouse to verify error fields
- Add test_cache_tokens_stored_for_anthropic to verify cache token storage
- Update existing tests to verify llm_config_json is stored correctly
- Make llm_config required in log_provider_trace_async()
- Simplify provider extraction to use provider_name directly
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* ci: add ClickHouse integration tests to CI pipeline
- Add use-clickhouse option to reusable-test-workflow.yml
- Add ClickHouse service container with otel database
- Add schema initialization step using clickhouse-init.sql
- Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.)
- Add separate clickhouse-integration-tests job running
integration_test_clickhouse_llm_raw_traces.py
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: simplify provider and org_id extraction in raw trace writer
- Use model_endpoint_type.value for provider (not provider_name)
- Simplify org_id to just self.actor.organization_id (actor is always pydantic)
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: simplify LLMRawTraceWriter with _enabled flag
- Check ClickHouse env vars once at init, set _enabled flag
- Early return in write_async/flush_async if not enabled
- Remove ValueError raises (never used)
- Simplify _get_client (no validation needed since already checked)
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: add LLMRawTraceWriter shutdown to FastAPI lifespan
Properly flush pending traces on graceful shutdown via lifespan
instead of relying only on atexit handler.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat: add agent_tags column to ClickHouse traces
Store agent tags as Array(String) for filtering/analytics by tag.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* cleanup
* fix(ci): fix ClickHouse schema initialization in CI
- Create database separately before loading SQL file
- Remove CREATE DATABASE from SQL file (handled in CI step)
- Add verification step to confirm table was created
- Use -sf flag for curl to fail on HTTP errors
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: simplify LLM trace writer with ClickHouse async_insert
- Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop
- Sync cloud DDL schema with clickhouse-init.sql (add missing columns)
- Remove redundant llm_raw_traces_local.ddl
- Remove unused batch_size/flush_interval settings
- Update tests for simplified writer
Key changes:
- async_insert=1, wait_for_async_insert=1 for reliable server-side batching
- Simple per-trace retry with exponential backoff (max 3 retries)
- ~150 lines removed from writer
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: consolidate ClickHouse direct writes into TelemetryManager backend
- Add clickhouse_direct backend to provider_trace_backends
- Remove duplicate ClickHouse write logic from llm_client_base.py
- Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct
The clickhouse_direct backend:
- Converts ProviderTrace to LLMRawTrace
- Extracts usage stats from response JSON
- Writes via LLMRawTraceWriter with async_insert
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: address PR review comments and fix llm_config bug
Review comment fixes:
- Rename clickhouse_direct -> clickhouse_analytics (clearer purpose)
- Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml
- Delete redundant scripts/test_llm_raw_traces.py (use pytest tests)
- Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL)
- Fix socket description leak in telemetry_manager docstring
- Add cloud-only comment to clickhouse-init.sql
- Update justfile to use separate compose file
Bug fix:
- Fix llm_config not being passed to ProviderTrace in telemetry
- Now correctly populates provider, model, is_byok for all LLM calls
- Affects both request_async_with_telemetry and log_provider_trace_async
DDL optimizations:
- Add secondary indexes (bloom_filter for agent_id, model, step_id)
- Add minmax indexes for is_byok, is_error
- Change model and error_type to LowCardinality for faster GROUP BY
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: rename llm_raw_traces -> llm_traces
Address review feedback that "raw" is misleading since we denormalize fields.
Renames:
- Table: llm_raw_traces -> llm_traces
- Schema: LLMRawTrace -> LLMTrace
- Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py
- Setting: store_llm_raw_traces -> store_llm_traces
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: update workflow references to llm_traces
Missed renaming table name in CI workflow files.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: update clickhouse_direct -> clickhouse_analytics in docstring
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: remove inaccurate OTEL size limit comments
The 4MB limit is our own truncation logic, not an OTEL protocol limit.
The real benefit is denormalized columns for analytics queries.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: remove local ClickHouse dev setup (cloud-only feature)
- Delete clickhouse-init.sql and compose.clickhouse.yaml
- Remove local clickhouse just commands
- Update CI to use cloud DDL with MergeTree for testing
clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: restore compose.yaml to match main
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: merge clickhouse_analytics into clickhouse backend
Per review feedback - having two separate backends was confusing.
Now the clickhouse backend:
- Writes to llm_traces table (denormalized for cost analytics)
- Reads from OTEL traces table (will cut over to llm_traces later)
Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: correct path to DDL file in CI workflow
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: add provider index to DDL for faster filtering
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: configure telemetry backend in clickhouse tests
Tests need to set telemetry_settings.provider_trace_backends to include
'clickhouse', otherwise traces are routed to default postgres backend.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: set provider_trace_backend field, not property
provider_trace_backends is a computed property, need to set the
underlying provider_trace_backend string field instead.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: error trace test and error_type extraction
- Add TelemetryManager to error trace test so traces get written
- Fix error_type extraction to check top-level before nested error dict
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: use provider_trace.id for trace correlation across backends
- Pass provider_trace.id to LLMTrace instead of auto-generating
- Log warning if ID is missing (shouldn't happen, helps debug)
- Fallback to new UUID only if not set
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: trace ID correlation and concurrency issues
- Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse
- Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe)
- Fix Anthropic prompt_tokens to include cached tokens for cost analytics
- Log warning if provider_trace.id is missing
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
fix: handle oversized text in embedding requests with recursive chunking
When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.
Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
The deepwiki SSE MCP server is deprecated, so skip this test.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* fix: non-streaming conversation messages endpoint
**Problems:**
1. `AssertionError: run_id is required when enforce_run_id_set is True`
- Non-streaming path didn't create a run before calling `step()`
2. `ResponseValidationError: Unable to extract tag using discriminator 'message_type'`
- `response_model=LettaStreamingResponse` but non-streaming returns `LettaResponse`
**Fixes:**
1. Add run creation before calling `step()` (mirrors agents endpoint)
2. Set run_id in Redis for cancellation support
3. Pass `run_id` to `step()`
4. Change `response_model` from `LettaStreamingResponse` to `LettaResponse`
(streaming returns `StreamingResponse` which bypasses response_model validation)
**Test:**
Added `test_conversation_non_streaming_raw_http` to verify the fix.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* api sync
---------
Co-authored-by: Letta <noreply@letta.com>
fix: load default provider config when summarizer uses different provider
**Problem:**
Summarization failed when agent used one provider (e.g., Google AI) but
summarizer config specified a different provider (e.g., Anthropic):
```python
# Agent LLM config
model_endpoint_type='google_ai', handle='gemini-something/gemini-2.5-pro',
context_window=100000
# Summarizer config
model='anthropic/claude-haiku-4-5-20251001'
# Bug: Resulting summarizer_llm_config mixed Google + Anthropic settings
model='claude-haiku-4-5-20251001', model_endpoint_type='google_ai', # ❌ Wrong endpoint!
context_window=100000 # ❌ Google's context window, not Anthropic's default!
```
This sent Claude requests to Google AI endpoints with incorrect parameters.
**Root Cause:**
`_build_summarizer_llm_config()` always copied the agent's LLM config as base,
then patched model/provider fields. But this kept all provider-specific settings
(endpoint, context_window, etc.) from the wrong provider.
**Fix:**
1. Parse provider_name from summarizer handle
2. Check if it matches agent's model_endpoint_type (or provider_name for custom)
3. **If YES** → Use agent config as base, override model/handle (same provider)
4. **If NO** → Load default config via `provider_manager.get_llm_config_from_handle()` (new provider)
**Example Flow:**
```python
# Agent: google_ai/gemini-2.5-pro
# Summarizer: anthropic/claude-haiku
provider_name = "anthropic" # Parsed from handle
provider_matches = ("anthropic" == "google_ai") # False ❌
# Different provider → load default Anthropic config
base = await provider_manager.get_llm_config_from_handle(
handle="anthropic/claude-haiku",
actor=self.actor
)
# Returns: model_endpoint_type='anthropic', endpoint='https://api.anthropic.com', etc. ✅
```
**Result:**
- Summarizer with different provider gets correct default config
- No more mixing Google endpoints with Anthropic models
- Same-provider summarizers still inherit agent settings efficiently
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
fix: use shared event + .athrow() to properly set stream_was_cancelled flag
**Problem:**
When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained
False because `RunCancelledException` was raised in the consumer code (wrapper),
which closes the generator from outside. This causes Python to skip the
generator's except blocks and jump directly to finally with the wrong flag value.
**Solution:**
1. Shared `asyncio.Event` registry for cross-layer cancellation signaling
2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected
3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise)
4. All streaming interfaces check event in `finally` block to set flag correctly
5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE]
**Changes:**
- streaming_response.py: Event registry + .athrow() injection + graceful handling
- openai_streaming_interface.py: 3 classes check event in finally
- gemini_streaming_interface.py: Check event in finally
- anthropic_*.py: Catch RunCancelledException
- simple_llm_stream_adapter.py: Create & pass event to interfaces
- streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update
- routers/v1/{conversations,runs}.py: Pass event to wrapper
- integration_test_human_in_the_loop.py: New test for approval + cancellation
**Tests:**
- test_tool_call with cancellation (OpenAI models) ✅
- test_approve_with_cancellation (approval flow + concurrent cancel) ✅
**Known cosmetic warnings (pre-existing):**
- "Run already in terminal state" - agent loop tries to update after /cancel
- "Stream ended without terminal event" - background streaming timing race
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* fix: don't need embedding model for self hosted
* stage publish api
* passes tests
* add test
* remove unnecessary upgrades
* update revision order db migrations
* add timeout for ci
* make favorite tag a const
* add favorite:user:{userId} for favorites
* favorite agent upon initial creation
* rename const
* add eslint ignore
* expect favorite tag
* test: add comprehensive provider trace telemetry tests
Add two test files for provider trace telemetry:
1. test_provider_trace.py - Integration tests for:
- Basic agent steps (streaming and non-streaming)
- Tool calls
- Telemetry context fields (agent_id, agent_tags, step_id, run_id)
- Multi-step conversations
- Request/response JSON content
2. test_provider_trace_summarization.py - Unit tests for:
- simple_summary() telemetry context passing
- summarize_all() telemetry pass-through
- summarize_via_sliding_window() telemetry pass-through
- Summarizer class runtime vs constructor telemetry
- LLMClient.set_telemetry_context() method
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* test: add telemetry tests for tool generation, adapters, and agent versions
Add comprehensive unit tests for provider trace telemetry:
- TestToolGenerationTelemetry: Verify /generate-tool endpoint sets
call_type="tool_generation" and has no agent context
- TestLLMClientTelemetryContext: Verify LLMClient.set_telemetry_context
accepts all telemetry fields
- TestAdapterTelemetryAttributes: Verify base adapter and subclasses
(LettaLLMRequestAdapter, LettaLLMStreamAdapter) support telemetry attrs
- TestSummarizerTelemetry: Verify Summarizer stores and passes telemetry
- TestAgentAdapterInstantiation: Verify LettaAgentV2 creates Summarizer
with correct agent_id
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* ci: add provider trace telemetry tests to unit test workflow
Add the new provider trace test files to the CI matrix:
- test_provider_trace_backends.py
- test_provider_trace_summarization.py
- test_provider_trace_agents.py
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: update socket backend test to match new record structure
The socket backend record structure changed - step_id/run_id are now
at top level, and model/usage are nested in request/response objects.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: add step_id to V1 agent telemetry context
Pass step_id to set_telemetry_context in both streaming and non-streaming
paths in LettaAgent (v1). The step_id is available via step_metrics.id
in the non-streaming path and passed explicitly in the streaming path.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* feat: add agent_id, run_id, step_id to summarization provider traces
Summarization LLM calls were missing telemetry context (agent_id,
agent_tags, run_id, step_id), making it impossible to attribute
summarization costs to specific agents or trace them back to the
step that triggered compaction.
Changes:
- Add step_id param to simple_summary() and set_telemetry_context()
- Add agent_id, agent_tags, run_id, step_id to summarize_all() and
summarize_via_sliding_window()
- Update Summarizer class to accept and pass telemetry context
- Update LettaAgentV3.compact() to pass full telemetry context
- Update LettaAgentV2.summarize_conversation_history() with run_id/step_id
- Update LettaAgent (v1) streaming methods with run_id param
- Add run_id/step_id to SummarizeParams for Temporal activities
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: update test mock to accept new summarization params
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* feat(core): add image support in tool returns [LET-7140]
Enable tool_return to support both string and ImageContent content parts,
matching the pattern used for user message inputs. This allows tools
executed client-side to return images back to the agent.
Changes:
- Add LettaToolReturnContentUnion type for text/image content parts
- Update ToolReturn schema to accept Union[str, List[content parts]]
- Update converters for each provider:
- OpenAI Chat Completions: placeholder text for images
- OpenAI Responses API: full image support
- Anthropic: full image support with base64
- Google: placeholder text for images
- Add resolve_tool_return_images() for URL-to-base64 conversion
- Make create_approval_response_message_from_input() async
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): support images in Google tool returns as sibling parts
Following the gemini-cli pattern: images in tool returns are sent as
sibling inlineData parts alongside the functionResponse, rather than
inside it.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* test(core): add integration tests for multi-modal tool returns [LET-7140]
Tests verify that:
- Models with image support (Anthropic, OpenAI Responses API) can see
images in tool returns and identify the secret text
- Models without image support (Chat Completions) get placeholder text
and cannot see the actual image content
- Tool returns with images persist correctly in the database
Uses secret.png test image containing hidden text "FIREBRAWL" that
models must identify to pass the test.
Also fixes misleading comment about Anthropic only supporting base64
images - they support URLs too, we just pre-resolve for consistency.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: simplify tool return image support implementation
Reduce code verbosity while maintaining all functionality:
- Extract _resolve_url_to_base64() helper in message_helper.py (eliminates duplication)
- Add _get_text_from_part() helper for text extraction
- Add _get_base64_image_data() helper for image data extraction
- Add _tool_return_to_google_parts() to simplify Google implementation
- Add _image_dict_to_data_url() for OpenAI Responses format
- Use walrus operator and list comprehensions where appropriate
- Add integration_test_multi_modal_tool_returns.py to CI workflow
Net change: -120 lines while preserving all features and test coverage.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(tests): improve prompt for multi-modal tool return tests
Make prompts more direct to reduce LLM flakiness:
- Simplify tool description: "Retrieves a secret image with hidden text. Call this function to get the image."
- Change user prompt from verbose request to direct command: "Call the get_secret_image function now."
- Apply to both test methods
This reduces ambiguity and makes tool calling more reliable across different LLM models.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix bugs
* test(core): add google_ai/gemini-2.0-flash-exp to multi-modal tests
Add Gemini model to test coverage for multi-modal tool returns. Google AI already supports images in tool returns via sibling inlineData parts.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(ui): handle multi-modal tool_return type in frontend components
Convert Union<string, LettaToolReturnContentUnion[]> to string for display:
- ViewRunDetails: Convert array to '[Image here]' placeholder
- ToolCallMessageComponent: Convert array to '[Image here]' placeholder
Fixes TypeScript errors in web, desktop-ui, and docker-ui type-checks.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
* feat: byok provider models in db also
* make tests and sync api
* fix inconsistent state with recreating provider of same name
* fix sync on byok creation
* update revision
* move stripe code for testing purposes
* revert
* add refresh byok models endpoint
* just stage publish api
* add tests
* reorder revision
* add test for name clashes
* feat: enable bedrock for anthropic models
* parallel tool calls in ade
* attempt add to ci
* update tests
* add env vars
* hardcode region
* get it working
* debugging
* add bedrock extra
* default env var [skip ci]
* run ci
* reasoner model update
* secrets
* clean up log
* clean up
* feat: add provider trace backend abstraction for multi-backend telemetry
Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config
Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend
Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat: add protocol version to socket backend records
Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: remove organization_id from ProviderTraceCreate calls
The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* consolidate provider trace
* add clickhouse-connect to fix bug on main lmao
* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api
* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api
* consolidate provider trace
* consolidate provider trace bug fix
---------
Co-authored-by: Letta <noreply@letta.com>
* feat: add TypeScript tool support for E2B sandbox execution
This change implements TypeScript tool support using the same E2B path as Python tools:
- Add TypeScript execution script generator (typescript_generator.py)
- Modify E2B sandbox to detect TypeScript tools and use language='ts'
- Add npm package installation for TypeScript tool dependencies
- Add validation requiring json_schema for TypeScript tools
- Add comprehensive integration tests for TypeScript tools
TypeScript tools:
- Require explicit json_schema (no docstring parsing)
- Use JSON serialization instead of pickle for results
- Support async functions with top-level await
- Support npm package dependencies via npm_requirements field
Closes#8793
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
* fix: disable AgentState for TypeScript tools & add letta-client injection
Based on Sarah's feedback:
1. AgentState is a legacy Python-only feature, disabled for TS tools
2. Added @letta-ai/letta-client npm package injection for TypeScript
(similar to letta_client for Python)
Changes:
- base.py: Explicitly set inject_agent_state=False for TypeScript tools
- typescript_generator.py: Inject LettaClient initialization code
- e2b_sandbox.py: Auto-install @letta-ai/letta-client for TS tools
- Added tests verifying both behaviors
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-Authored-By: Letta <noreply@letta.com>
* Update core-integration-tests.yml
* fix: convert TypeScript test fixtures to async
The OrganizationManager and UserManager no longer have sync methods,
only async variants. Updated all fixtures to use:
- create_organization_async
- create_actor_async
- create_or_update_tool_async
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: skip Python AST parsing for TypeScript tools in sandbox base
The _init_async method was calling parse_function_arguments (which uses
Python's ast.parse) before checking if the tool was TypeScript, causing
SyntaxError when running TypeScript tools.
Moved the is_typescript_tool() check to happen first, skipping Python
AST parsing entirely for TypeScript tools.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* letta_agent_id
* skip ast parsing for s
* add tool execution test
---------
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>