letta-server

Author	SHA1	Message	Date
cthomas	416ffc7cd7	Add billing context to LLM telemetry traces (#9745 ) * feat: add billing context to LLM telemetry traces Add billing metadata (plan type, cost source, customer ID) to LLM traces in ClickHouse for cost analytics and attribution. Data Flow: - Cloud-API: Extract billing info from subscription in rate limiting, set x-billing-* headers - Core: Parse headers into BillingContext object via dependencies - Adapters: Flow billing_context through all LLM adapters (blocking & streaming) - Agent: Pass billing_context to step() and stream() methods - ClickHouse: Store in billing_plan_type, billing_cost_source, billing_customer_id columns Changes: - Add BillingContext schema to provider_trace.py - Add billing columns to llm_traces ClickHouse table DDL - Update getCustomerSubscription to fetch stripeCustomerId from organization_billing_details - Propagate billing_context through agent step flow, adapters, and streaming service - Update ProviderTrace and LLMTrace to include billing metadata - Regenerate SDK with autogen Production Deployment: Requires env vars: LETTA_PROVIDER_TRACE_BACKEND=clickhouse, LETTA_STORE_LLM_TRACES=true, CLICKHOUSE_* 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add billing_context parameter to agent step methods - Add billing_context to BaseAgent and BaseAgentV2 abstract methods - Update LettaAgent, LettaAgentV2, LettaAgentV3 step methods - Update multi-agent groups: SleeptimeMultiAgentV2, V3, V4 - Fix test_utils.py to include billing header parameters - Import BillingContext in all affected files * fix: add billing_context to stream methods - Add billing_context parameter to BaseAgentV2.stream() - Add billing_context parameter to LettaAgentV2.stream() - LettaAgentV3.stream() already has it from previous commit * fix: exclude billing headers from OpenAPI spec Mark billing headers as internal (include_in_schema=False) so they don't appear in the public API. These are internal headers between cloud-api and core, not part of the public SDK. Regenerated SDK with stage-api - removes 10,650 lines of bloat that was causing OOM during Next.js build. * refactor: return billing context from handleUnifiedRateLimiting instead of mutating req Instead of passing req into handleUnifiedRateLimiting and mutating headers inside it: - Return billing context fields (billingPlanType, billingCostSource, billingCustomerId) from handleUnifiedRateLimiting - Set headers in handleMessageRateLimiting (middleware layer) after getting the result - This fixes step-orchestrator compatibility since it doesn't have a real Express req object * chore: remove extra gencode * p --------- Co-authored-by: Letta <noreply@letta.com>	2026-03-03 18:34:13 -08:00
jnjpng	e8d5922ff9	fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535 ) * fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming When reasoning models (gpt-5.x) exhaust their max_output_tokens budget on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent instead of ResponseCompletedEvent. This was previously unhandled, causing final_response to remain None — which meant get_content() and get_tool_call_objects() returned empty results, silently dropping the partial response. Now ResponseIncompleteEvent is handled identically to ResponseCompletedEvent (extracting partial content, usage stats, and token details), with an additional warning log indicating the incomplete reason. * fix(core): propagate finish_reason for Responses API incomplete events - Guard usage extraction against None usage payload in ResponseIncompleteEvent handler - Add _finish_reason override to LettaLLMAdapter so streaming adapters can explicitly set finish_reason without a chat_completions_response - Map incomplete_details.reason="max_output_tokens" to finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat Completions API convention - This allows the agent loop's _decide_continuation to correctly return stop_reason="max_tokens_exceeded" instead of "end_turn" when the model exhausts its output token budget on reasoning * fix(core): handle empty content parts in incomplete ResponseOutputMessage When a model hits max_output_tokens after starting a ResponseOutputMessage but before producing any content parts, the message has content=[]. This previously raised ValueError("Got 0 content parts, expected 1"). Now it logs a warning and skips the empty message, allowing reasoning-only incomplete responses to be processed cleanly. * fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens Handle content_filter and any future unknown incomplete reasons from the Responses API instead of silently leaving finish_reason as None.	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
jnjpng	e3eafb1977	fix: re-raise LLMError before wrapping with handle_llm_error (#9482 ) LLMError exceptions are already properly formatted errors that should propagate directly. Without this check, they get unnecessarily wrapped by handle_llm_error, losing their original error information.	2026-02-24 10:52:07 -08:00
Kian Jones	b9c4ed3b15	fix: catch contextwindowexceeded error on gemini (#9450 ) * catch contextwindowexceeded error * fix(core): detect Google token limit errors as ContextWindowExceededError Google's error message says "input token count exceeds the maximum number of tokens allowed" which doesn't contain the word "context", so it was falling through to generic LLMBadRequestError instead of ContextWindowExceededError. This means compaction won't auto-trigger. Expands the detection to also match "token count" and "tokens allowed" in addition to the existing "context" keyword. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): add missing message arg to LLMBadRequestError in OpenAI client The generic 400 path in handle_llm_error was constructing LLMBadRequestError without the required message positional arg, causing TypeError in prod during summarization. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add adapters/ test suite to core unit test matrix 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(tests): update adapter error handling test expectations to match actual behavior The streaming adapter's error handling double-wraps errors: the AnthropicStreamingInterface calls handle_llm_error first, then the adapter catches the result and calls handle_llm_error again, which falls through to the base class LLMError. Updated test expectations to match this behavior. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): prevent double-wrapping of LLMError in stream adapter The AnthropicStreamingInterface.process() already transforms raw provider errors into LLMError subtypes via handle_llm_error. The adapter was catching the result and calling handle_llm_error again, which didn't recognize the already-transformed LLMError and wrapped it in a generic LLMError("Unhandled LLM error"). This downgraded specific error types (LLMConnectionError, LLMServerError, etc.) and broke retry logic that matches on specific subtypes. Now the adapter checks if the error is already an LLMError and re-raises it as-is. Tests restored to original correct expectations. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	382e216cbb	fix(core): differentiate BYOK vs base provider in all LLM error details (#9425 ) Add is_byok flag to every LLMError's details dict returned from handle_llm_error across all providers (OpenAI, Anthropic, Google, ChatGPT OAuth). This enables observability into whether errors originate from Letta's production keys or user-provided BYOK keys. The rate limit handler in app.py now returns a more helpful message for BYOK users ("check your provider's rate limits and billing") versus the generic message for base provider rate limits. Datadog issues: - https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kevin Lin	23c94ec6d3	feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240 ) * Add log probabilities support for RL training This enables Letta server to request and return log probabilities from OpenAI-compatible providers (including SGLang) for use in RL training. Changes: - LLMConfig: Add return_logprobs and top_logprobs fields - OpenAIClient: Set logprobs in ChatCompletionRequest when enabled - LettaLLMAdapter: Add logprobs field and extract from response - LettaResponse: Add logprobs field to return log probs to client - LettaRequest: Add return_logprobs/top_logprobs for per-request override - LettaAgentV3: Store and pass logprobs through to response - agents.py: Handle request-level logprobs override Usage: response = client.agents.messages.create( agent_id=agent_id, messages=[...], return_logprobs=True, top_logprobs=5, ) print(response.logprobs) # Per-token log probabilities 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * Add multi-turn token tracking for RL training via SGLang native endpoint - Add TurnTokenData schema to track token IDs and logprobs per turn - Add return_token_ids flag to LettaRequest and LLMConfig - Create SGLangNativeClient for /generate endpoint (returns output_ids) - Create SGLangNativeAdapter that uses native endpoint - Modify LettaAgentV3 to accumulate turns across LLM calls - Include turns in LettaResponse when return_token_ids=True * Fix: Add SGLang native adapter to step() method, not just stream() * Fix: Handle Pydantic Message objects in SGLang native adapter * Fix: Remove api_key reference from LLMConfig (not present) * Fix: Add missing 'created' field to ChatCompletionResponse * Add full tool support to SGLang native adapter - Format tools into prompt in Qwen-style format - Parse tool calls from <tool_call> tags in response - Format tool results as <tool_response> in user messages - Set finish_reason to 'tool_calls' when tools are called * Use tokenizer.apply_chat_template for proper tool formatting - Add tokenizer caching in SGLang native adapter - Use apply_chat_template when tokenizer available - Fall back to manual formatting if not - Convert Letta messages to OpenAI format for tokenizer * Fix: Use func_response instead of tool_return for ToolReturn content * Fix: Get output_token_logprobs from meta_info in SGLang response * Fix: Allow None in output_token_logprobs (SGLang format includes null) * chore: remove unrelated files from logprobs branch 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing call_type param to adapter constructors in letta_agent_v3 The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting adapter creation into conditional blocks. Restores it for all 3 spots (SGLang in step, SimpleLLM in step, SGLang in stream). 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api * fix: update expected LLMConfig fields in schema test for logprobs support 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove rllm provider references 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * just stage-api && just publish-api 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal> Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Kian Jones	f20fdc73d1	fix(core): preserve Gemini thought_signature on function calls in non-streaming path (#9351 ) * fix(core): preserve Gemini thought_signature on function calls in non-streaming path The Google Gemini API requires thought_signature to be echoed back on function call parts in multi-turn conversations. In the non-streaming request path, the signature was only captured for subsequent function calls (else branch) but dropped for the first/only function call (if branch) in convert_response_to_chat_completion. This caused 400 INVALID_ARGUMENT errors on the next turn. Additionally, when no ReasoningContent existed to carry the signature (e.g. Gemini 2.5 Flash with include_thoughts=False), the signature was lost in the adapter layer. Now it falls through to TextContent. Datadog: https://us5.datadoghq.com/error-tracking/issue/17c4b114-d596-11f0-bcd6-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix(core): preserve Gemini thought_signature in non-temporal agent path Carry reasoning_content_signature on TextContent in letta_agent.py at both locations where content falls through from reasoning (same fix already applied to the adapter and temporal activity paths). Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com> 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>	2026-02-24 10:52:07 -08:00
Kian Jones	6f746c5225	fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305 ) * fix: handle Anthropic overloaded_error in streaming interfaces * fix: handle Unicode surrogates in OpenAI requests Sanitize Unicode surrogate pairs before sending requests to OpenAI API. Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause UnicodeEncodeError when encoding to UTF-8. Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in position 326605: surrogates not allowed * fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests Improved sanitize_unicode_surrogates() to explicitly filter out lone surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8. Previous implementation used errors='ignore' which could still fail in edge cases. New approach directly checks Unicode code points and removes any surrogates before data reaches httpx encoding. Also added sanitization to stream_async_responses() method which was missing it. Fixes: 'utf-8' codec can't encode character '\ud83c' in position X: surrogates not allowed	2026-02-24 10:52:06 -08:00
Sarah Wooders	eaf64fb510	fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258 ) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	00aa51927d	fix: add missing call_type to more ProviderTrace callsites (#9266 ) - letta_llm_request_adapter.py - llm_api_tools.py 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	e8a565a384	fix: add missing call_type to stream adapter ProviderTrace (#9259 ) Both letta_llm_stream_adapter and simple_llm_stream_adapter were creating ProviderTrace without call_type, causing "unknown" in ClickHouse analytics. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	cd7e80acc3	fix: add error trace logging to simple_llm_stream_adapter (#9247 ) Log error traces to ClickHouse when streaming requests fail, matching the behavior in letta_llm_stream_adapter. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Sarah Wooders	4096b30cd7	feat: log LLM traces to clickhouse (#9111 ) * feat: add non-streaming option for conversation messages - Add ConversationMessageRequest with stream=True default (backwards compatible) - stream=true (default): SSE streaming via StreamingService - stream=false: JSON response via AgentLoop.load().step() 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: regenerate API schema for ConversationMessageRequest * feat: add direct ClickHouse storage for raw LLM traces Adds ability to store raw LLM request/response payloads directly in ClickHouse, bypassing OTEL span attribute size limits. This enables debugging and analytics on large LLM payloads (>10MB system prompts, large tool schemas, etc.). New files: - letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper - letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget) - letta/services/llm_raw_trace_reader.py: Reader with query methods - scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL - scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL - apps/core/clickhouse-init.sql: Local dev initialization Modified: - letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval) - letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry - compose.yaml: Added ClickHouse service for local dev - justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false). Uses ZSTD(3) compression for 10-30x reduction on JSON payloads. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: address code review feedback for LLM raw traces Fixes based on code review feedback: 1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port inputs (was defaulting to HTTPS which breaks local dev) 2. Make raw trace writes truly fire-and-forget - use asyncio.create_task() instead of awaiting, so JSON serialization doesn't block request path 3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth under load. Drops traces with warning if queue is full. 4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop() 5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it 6. Remove unused imports - json import in reader 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add missing asyncio import and simplify JSON serialization - Add missing 'import asyncio' that was causing 'name asyncio is not defined' error - Remove unnecessary clean_double_escapes() function - the JSON is stored correctly, the clickhouse-client CLI was just adding extra escaping when displaying - Update just clickhouse-trace to use Python client for correct JSON output 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: add clickhouse raw trace integration test * test: simplify clickhouse trace assertions * refactor: centralize usage parsing and stream error traces Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: exercise provider usage parsing live Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * test: fix usage parsing tests to pass - Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient - Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta) - Add tools=[] for Gemini/Anthropic build_request_data 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: extract_usage_statistics returns LettaUsageStatistics Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed. 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add is_byok and llm_config_json columns to ClickHouse traces Extend llm_raw_traces table with: - is_byok (UInt8): Track BYOK vs base provider usage for billing analytics - llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis This enables queries like: - BYOK usage breakdown by provider/model - Config parameter analysis (temperature, max_tokens, etc.) - Debugging specific request configurations * feat: add tests for error traces, llm_config_json, and cache tokens - Update llm_raw_trace_reader.py to query new columns (is_byok, cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json) - Add test_error_trace_stored_in_clickhouse to verify error fields - Add test_cache_tokens_stored_for_anthropic to verify cache token storage - Update existing tests to verify llm_config_json is stored correctly - Make llm_config required in log_provider_trace_async() - Simplify provider extraction to use provider_name directly 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * ci: add ClickHouse integration tests to CI pipeline - Add use-clickhouse option to reusable-test-workflow.yml - Add ClickHouse service container with otel database - Add schema initialization step using clickhouse-init.sql - Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.) - Add separate clickhouse-integration-tests job running integration_test_clickhouse_llm_raw_traces.py 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify provider and org_id extraction in raw trace writer - Use model_endpoint_type.value for provider (not provider_name) - Simplify org_id to just self.actor.organization_id (actor is always pydantic) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLMRawTraceWriter with _enabled flag - Check ClickHouse env vars once at init, set _enabled flag - Early return in write_async/flush_async if not enabled - Remove ValueError raises (never used) - Simplify _get_client (no validation needed since already checked) 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: add LLMRawTraceWriter shutdown to FastAPI lifespan Properly flush pending traces on graceful shutdown via lifespan instead of relying only on atexit handler. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add agent_tags column to ClickHouse traces Store agent tags as Array(String) for filtering/analytics by tag. 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * cleanup * fix(ci): fix ClickHouse schema initialization in CI - Create database separately before loading SQL file - Remove CREATE DATABASE from SQL file (handled in CI step) - Add verification step to confirm table was created - Use -sf flag for curl to fail on HTTP errors 🐾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: simplify LLM trace writer with ClickHouse async_insert - Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop - Sync cloud DDL schema with clickhouse-init.sql (add missing columns) - Remove redundant llm_raw_traces_local.ddl - Remove unused batch_size/flush_interval settings - Update tests for simplified writer Key changes: - async_insert=1, wait_for_async_insert=1 for reliable server-side batching - Simple per-trace retry with exponential backoff (max 3 retries) - ~150 lines removed from writer 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: consolidate ClickHouse direct writes into TelemetryManager backend - Add clickhouse_direct backend to provider_trace_backends - Remove duplicate ClickHouse write logic from llm_client_base.py - Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct The clickhouse_direct backend: - Converts ProviderTrace to LLMRawTrace - Extracts usage stats from response JSON - Writes via LLMRawTraceWriter with async_insert 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: address PR review comments and fix llm_config bug Review comment fixes: - Rename clickhouse_direct -> clickhouse_analytics (clearer purpose) - Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml - Delete redundant scripts/test_llm_raw_traces.py (use pytest tests) - Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL) - Fix socket description leak in telemetry_manager docstring - Add cloud-only comment to clickhouse-init.sql - Update justfile to use separate compose file Bug fix: - Fix llm_config not being passed to ProviderTrace in telemetry - Now correctly populates provider, model, is_byok for all LLM calls - Affects both request_async_with_telemetry and log_provider_trace_async DDL optimizations: - Add secondary indexes (bloom_filter for agent_id, model, step_id) - Add minmax indexes for is_byok, is_error - Change model and error_type to LowCardinality for faster GROUP BY 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: rename llm_raw_traces -> llm_traces Address review feedback that "raw" is misleading since we denormalize fields. Renames: - Table: llm_raw_traces -> llm_traces - Schema: LLMRawTrace -> LLMTrace - Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py - Setting: store_llm_raw_traces -> store_llm_traces 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update workflow references to llm_traces Missed renaming table name in CI workflow files. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: update clickhouse_direct -> clickhouse_analytics in docstring 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove inaccurate OTEL size limit comments The 4MB limit is our own truncation logic, not an OTEL protocol limit. The real benefit is denormalized columns for analytics queries. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: remove local ClickHouse dev setup (cloud-only feature) - Delete clickhouse-init.sql and compose.clickhouse.yaml - Remove local clickhouse just commands - Update CI to use cloud DDL with MergeTree for testing clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: restore compose.yaml to match main 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: merge clickhouse_analytics into clickhouse backend Per review feedback - having two separate backends was confusing. Now the clickhouse backend: - Writes to llm_traces table (denormalized for cost analytics) - Reads from OTEL traces table (will cut over to llm_traces later) Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: correct path to DDL file in CI workflow 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * chore: add provider index to DDL for faster filtering 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: configure telemetry backend in clickhouse tests Tests need to set telemetry_settings.provider_trace_backends to include 'clickhouse', otherwise traces are routed to default postgres backend. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: set provider_trace_backend field, not property provider_trace_backends is a computed property, need to set the underlying provider_trace_backend string field instead. 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: error trace test and error_type extraction - Add TelemetryManager to error trace test so traces get written - Fix error_type extraction to check top-level before nested error dict 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: use provider_trace.id for trace correlation across backends - Pass provider_trace.id to LLMTrace instead of auto-generating - Log warning if ID is missing (shouldn't happen, helps debug) - Fallback to new UUID only if not set 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: trace ID correlation and concurrency issues - Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse - Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe) - Fix Anthropic prompt_tokens to include cached tokens for cost analytics - Log warning if provider_trace.id is missing 🤖 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com> Co-authored-by: Caren Thomas <carenthomas@gmail.com>	2026-02-24 10:52:06 -08:00
Ari Webb	9ce1249738	feat: openrouter byok (#9148 ) * feat: openrouter byok * new client is unnecessary * revert json diffs	2026-01-29 12:44:04 -08:00
Sarah Wooders	b34ad43691	feat: add minimax byok to ui (#9101 ) * fix: patch minimax * feat: add frontend changes for minimax * add logo, fix backend * better check for is minimax * more references fixed for minimax * start revering unnecessary changes * revert backend changes, just ui * fix minimax fully * fix test * add key to deploy action --------- Co-authored-by: Ari Webb <ari@letta.com> Co-authored-by: Ari Webb <arijwebb@gmail.com>	2026-01-29 12:44:04 -08:00
Sarah Wooders	221b4e6279	refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065 ) 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
cthomas	c162de5127	fix: use shared event + .athrow() to properly set stream_was_cancelle… (#9019 ) fix: use shared event + .athrow() to properly set stream_was_cancelled flag Problem: When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained False because `RunCancelledException` was raised in the consumer code (wrapper), which closes the generator from outside. This causes Python to skip the generator's except blocks and jump directly to finally with the wrong flag value. Solution: 1. Shared `asyncio.Event` registry for cross-layer cancellation signaling 2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected 3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise) 4. All streaming interfaces check event in `finally` block to set flag correctly 5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE] Changes: - streaming_response.py: Event registry + .athrow() injection + graceful handling - openai_streaming_interface.py: 3 classes check event in finally - gemini_streaming_interface.py: Check event in finally - anthropic_.py: Catch RunCancelledException - simple_llm_stream_adapter.py: Create & pass event to interfaces - streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update - routers/v1/{conversations,runs}.py: Pass event to wrapper - integration_test_human_in_the_loop.py: New test for approval + cancellation Tests:* - test_tool_call with cancellation (OpenAI models) ✅ - test_approve_with_cancellation (approval flow + concurrent cancel) ✅ Known cosmetic warnings (pre-existing): - "Run already in terminal state" - agent loop tries to update after /cancel - "Stream ended without terminal event" - background streaming timing race 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Kian Jones	e3fb00f970	feat(crouton): add orgId, userId, Compaction_Settings and LLM_Config (#9022 ) * LC one shot? * api changes * fix summarizer nameerror	2026-01-29 12:44:04 -08:00
Kian Jones	81b5d71889	feat: add agents and log error properly (#8914 ) * add agents and log error properly * fix llm stream adapter	2026-01-19 15:54:43 -08:00
Kian Jones	b0dfdd2725	fix commas in justfile helm secret setting and bug with missing metadata (#8874 )	2026-01-19 15:54:43 -08:00
Kian Jones	a92e868ee6	feat: centralize telemetry logging at LLM client level (#8815 ) * feat: centralize telemetry logging at LLM client level Moves telemetry logging from individual adapters to LLMClientBase: - Add TelemetryStreamWrapper for streaming telemetry on stream close - Add request_async_with_telemetry() for non-streaming requests - Add stream_async_with_telemetry() for streaming requests - Add set_telemetry_context() to configure agent_id, run_id, step_id Updates adapters and agents to use new pattern: - LettaLLMAdapter now accepts agent_id/run_id in constructor - Adapters call set_telemetry_context() before LLM requests - Removes duplicate telemetry logging from adapters - Enriches traces with agent_id, run_id, call_type metadata 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: accumulate streaming response content for telemetry TelemetryStreamWrapper now extracts actual response data from chunks: - Content text (concatenated from deltas) - Tool calls (id, name, arguments) - Model name, finish reason, usage stats 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * refactor: move streaming telemetry to caller (option 3) - Remove TelemetryStreamWrapper class - Add log_provider_trace_async() helper to LLMClientBase - stream_async_with_telemetry() now just returns raw stream - Callers log telemetry after processing with rich interface data Updated callers: - summarizer.py: logs content + usage after stream processing - letta_agent.py: logs tool_call, reasoning, model, usage 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: pass agent_id and run_id to parent adapter class LettaLLMStreamAdapter was not passing agent_id/run_id to parent, causing "unexpected keyword argument" errors. 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-19 15:54:43 -08:00
Kian Jones	9418ab9815	feat: add provider trace backend abstraction for multi-backend telemetry (#8814 ) * feat: add provider trace backend abstraction for multi-backend telemetry Introduces a pluggable backend system for provider traces: - Base class with async/sync create and read interfaces - PostgreSQL backend (existing behavior) - ClickHouse backend (via OTEL instrumentation) - Socket backend (writes to Unix socket for crouton sidecar) - Factory for instantiating backends from config Refactors TelemetryManager to use backends with support for: - Multi-backend writes (concurrent via asyncio.gather) - Primary backend for reads (first in config list) - Graceful error handling per backend Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated) Example: "postgres,socket" for dual-write to Postgres and crouton 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add protocol version to socket backend records Adds PROTOCOL_VERSION constant to socket backend: - Included in every telemetry record sent to crouton - Must match ProtocolVersion in apps/crouton/main.go - Enables crouton to detect and reject incompatible messages 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: remove organization_id from ProviderTraceCreate calls The organization_id is now handled via the actor parameter in the telemetry manager, not through ProviderTraceCreate schema. This fixes validation errors after changing ProviderTraceCreate to inherit from BaseProviderTrace which forbids extra fields. 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * consolidate provider trace * add clickhouse-connect to fix bug on main lmao * auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api * auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api * consolidate provider trace * consolidate provider trace bug fix --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-19 15:54:43 -08:00
jnjpng	5017cb1d12	feat: add chatgpt oauth client for codex routing (#8774 ) * base * refresh * use default model fallback * patch * streaming * generate	2026-01-19 15:54:42 -08:00
Sarah Wooders	d5decc2a27	fix: persist streaming errors in run metadata (#8062 )	2026-01-12 10:57:47 -08:00
Ari Webb	0a372b2540	fix: enable zai streaming (#7755 )	2026-01-12 10:57:20 -08:00
Charles Packer	33d39f4643	fix(core): patch usage data tracking for anthropic when context caching is on (#6997 )	2025-12-15 12:03:09 -08:00
Kevin Lin	4b9485a484	feat: Add max tokens exceeded to stop reasons [LET-6480] (#6576 )	2025-12-15 12:03:09 -08:00
Devansh Jain	d1536df6f6	chore: Update deepseek client for v3.2 models (#6556 ) * support for v3.2 models * streaming + context window fix * fix for no assitant text from deepseek	2025-12-15 12:02:34 -08:00
Kian Jones	edeac2c679	fix: fix gemini otel bug and add tracing for tool upsert (#6523 ) add tracing for tool upsert, and fix gemini otel bug	2025-12-15 12:02:33 -08:00
Kian Jones	a38475f23d	fix: safely load span attributes for provider traces (#6508 ) json.dumps on request data. Also remove step and actor since they are already present in the span	2025-12-15 12:02:33 -08:00
Kian Jones	5165d60881	feat: add a new span and log the provider request and response data objects (#6492 ) add a new span and log the provider request and response data objects	2025-12-15 12:02:33 -08:00
Sarah Wooders	91e3dd8b3e	feat: fix new summarizer code and add more tests (#6461 )	2025-12-15 12:02:19 -08:00
Charles Packer	1f7165afc4	fix: patch counting of tokens for anthropic (#6458 ) * fix: patch counting of tokens for anthropic * fix: patch ui to be simpler * fix: patch undercounting bug in anthropic when caching is on	2025-12-15 12:02:19 -08:00
Charles Packer	e67c98eedb	feat: add tests for prompt caching + fix anthropic prompt caching [LET-6373] (#6454 ) * feat: add tests for prompt caching * fix: add cache control breakpoints for anthropic + fix tests * fix: silence logging * fix: patch token counting error * fix: same patch on non-streaming path	2025-12-15 12:02:19 -08:00
Charles Packer	4af6465226	feat(core+web): store raw usage data on streams (and visualize properly in ADE) (#6452 ) * feat(core): store raw usage data on streams * fix(web): various fixes to deal w/ hardcoding against openai	2025-12-15 12:02:19 -08:00
Charles Packer	88a3743cc8	fix(core): distinguish between null and 0 for prompt caching (#6451 ) * fix(core): distinguish between null and 0 for prompt caching * fix: runtime errors * fix: just publish just sgate	2025-12-15 12:02:19 -08:00
Charles Packer	131891e05f	feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449 ) * feat: init refactor * feat: add helper code * fix: missing file + test * fix: just state/publish api	2025-12-15 12:02:19 -08:00
jnjpng	c6df306ccf	fix: upgrade google-genai sdk version and fix gemini 3 streaming (#6437 ) * base * base --------- Co-authored-by: Letta Bot <noreply@letta.com>	2025-12-15 12:02:18 -08:00
Ari Webb	30dab0abb9	fix: handle llm error during streaming [LET-6280] (#6341 ) handle llm error during streaming Co-authored-by: Ari Webb <ari@letta.com>	2025-11-24 19:10:27 -08:00
Charles Packer	18029250d0	fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter (#6044 ) * fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter * fix: also patch poison error in backend by filtering lazily * fix: remap streaming errors (what the fuck) * fix: dedupe tool clals * fix: cleanup, removed try/catch	2025-11-13 15:36:55 -08:00
Matthew Zhou	ff81f4153b	feat: Support parallel tool calling streaming for OpenAI chat completions [LET-4594] (#5865 ) * Finish chat completions parallel tool calling * Undo comments * Add comments * Remove test file	2025-11-13 15:36:14 -08:00
Ari Webb	48cc73175b	feat: parallel tool calling for openai non streaming [LET-4593] (#5773 ) * first hack * clean up * first implementation working * revert package-lock * remove openai test * error throw * typo * Update integration_test_send_message_v2.py * Update integration_test_send_message_v2.py * refine test * Only make changes for openai non streaming * Add tests --------- Co-authored-by: Ari Webb <ari@letta.com> Co-authored-by: Matt Zhou <mattzh1314@gmail.com>	2025-11-13 15:36:14 -08:00
Ari Webb	03e7639e2b	handle llm error on request_async [LET-5403] (#5408 ) handle llm error on request_async Co-authored-by: Ari Webb <ari@letta.com>	2025-10-24 15:11:31 -07:00
Matthew Zhou	25f140bd13	fix: Fix anthropic step parallel tool calling and add tests [LET-5438] (#5379 ) * Fix anthropic step parallel tool calling and add tests * Remove print statements	2025-10-24 15:11:31 -07:00
Matthew Zhou	bb8a7889e0	feat: Add parallel tool call streaming for anthropic [LET-4601] (#5225 ) * wip * Fix parallel tool calling interface * wip * wip adapt using id field * Integrate new multi tool return schemas into parallel tool calling * Remove example script * Reset changes to llm stream adapter since old agent loop should not enable parallel tool calling * Clean up fallback logic for extracting tool calls * Remove redundant check * Simplify logic * Clean up logic in handle ai response * Fix tests * Write anthropic dict conversion to be back compatible * wip * Double write tool call id for legacy reasons * Fix override args failures * Patch for approvals * Revert comments * Remove extraneous prints	2025-10-24 15:11:31 -07:00
Matthew Zhou	7511b0f4fe	feat: Write anthropic streaming interface that supports parallel tool calling [LET-5355] (#5295 ) Write anthropic streaming interface that supports parallel tool calling	2025-10-09 15:25:21 -07:00
cthomas	1d611d92b9	feat: update assistant content parts union (#5115 ) * feat: update assistant content parts union * api sync * just use the base object since updating assistant breaks frontend	2025-10-07 17:50:48 -07:00
cthomas	f7755d837a	feat: add gemini streaming to new agent loop (#5109 ) * feat: add gemini streaming to new agent loop * add google as required dependency * support storing all content parts * remove extra google references	2025-10-07 17:50:48 -07:00

1 2

63 Commits