letta-server

Author	SHA1	Message	Date
jnjpng	e8d5922ff9	fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535 ) * fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming When reasoning models (gpt-5.x) exhaust their max_output_tokens budget on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent instead of ResponseCompletedEvent. This was previously unhandled, causing final_response to remain None — which meant get_content() and get_tool_call_objects() returned empty results, silently dropping the partial response. Now ResponseIncompleteEvent is handled identically to ResponseCompletedEvent (extracting partial content, usage stats, and token details), with an additional warning log indicating the incomplete reason. * fix(core): propagate finish_reason for Responses API incomplete events - Guard usage extraction against None usage payload in ResponseIncompleteEvent handler - Add _finish_reason override to LettaLLMAdapter so streaming adapters can explicitly set finish_reason without a chat_completions_response - Map incomplete_details.reason="max_output_tokens" to finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat Completions API convention - This allows the agent loop's _decide_continuation to correctly return stop_reason="max_tokens_exceeded" instead of "end_turn" when the model exhausts its output token budget on reasoning * fix(core): handle empty content parts in incomplete ResponseOutputMessage When a model hits max_output_tokens after starting a ResponseOutputMessage but before producing any content parts, the message has content=[]. This previously raised ValueError("Got 0 content parts, expected 1"). Now it logs a warning and skips the empty message, allowing reasoning-only incomplete responses to be processed cleanly. * fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens Handle content_filter and any future unknown incomplete reasons from the Responses API instead of silently leaving finish_reason as None.	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
jnjpng	e3eafb1977	fix: re-raise LLMError before wrapping with handle_llm_error (#9482 ) LLMError exceptions are already properly formatted errors that should propagate directly. Without this check, they get unnecessarily wrapped by handle_llm_error, losing their original error information.	2026-02-24 10:52:07 -08:00
Kian Jones	382e216cbb	fix(core): differentiate BYOK vs base provider in all LLM error details (#9425 ) Add is_byok flag to every LLMError's details dict returned from handle_llm_error across all providers (OpenAI, Anthropic, Google, ChatGPT OAuth). This enables observability into whether errors originate from Letta's production keys or user-provided BYOK keys. The rate limit handler in app.py now returns a more helpful message for BYOK users ("check your provider's rate limits and billing") versus the generic message for base provider rate limits. Datadog issues: - https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000 - https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:07 -08:00
Sarah Wooders	eaf64fb510	fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258 ) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	e8a565a384	fix: add missing call_type to stream adapter ProviderTrace (#9259 ) Both letta_llm_stream_adapter and simple_llm_stream_adapter were creating ProviderTrace without call_type, causing "unknown" in ClickHouse analytics. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
cthomas	cd7e80acc3	fix: add error trace logging to simple_llm_stream_adapter (#9247 ) Log error traces to ClickHouse when streaming requests fail, matching the behavior in letta_llm_stream_adapter. 🤖 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-02-24 10:52:06 -08:00
Ari Webb	9ce1249738	feat: openrouter byok (#9148 ) * feat: openrouter byok * new client is unnecessary * revert json diffs	2026-01-29 12:44:04 -08:00
Sarah Wooders	b34ad43691	feat: add minimax byok to ui (#9101 ) * fix: patch minimax * feat: add frontend changes for minimax * add logo, fix backend * better check for is minimax * more references fixed for minimax * start revering unnecessary changes * revert backend changes, just ui * fix minimax fully * fix test * add key to deploy action --------- Co-authored-by: Ari Webb <ari@letta.com> Co-authored-by: Ari Webb <arijwebb@gmail.com>	2026-01-29 12:44:04 -08:00
Sarah Wooders	221b4e6279	refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065 ) 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
cthomas	c162de5127	fix: use shared event + .athrow() to properly set stream_was_cancelle… (#9019 ) fix: use shared event + .athrow() to properly set stream_was_cancelled flag Problem: When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained False because `RunCancelledException` was raised in the consumer code (wrapper), which closes the generator from outside. This causes Python to skip the generator's except blocks and jump directly to finally with the wrong flag value. Solution: 1. Shared `asyncio.Event` registry for cross-layer cancellation signaling 2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected 3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise) 4. All streaming interfaces check event in `finally` block to set flag correctly 5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE] Changes: - streaming_response.py: Event registry + .athrow() injection + graceful handling - openai_streaming_interface.py: 3 classes check event in finally - gemini_streaming_interface.py: Check event in finally - anthropic_.py: Catch RunCancelledException - simple_llm_stream_adapter.py: Create & pass event to interfaces - streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update - routers/v1/{conversations,runs}.py: Pass event to wrapper - integration_test_human_in_the_loop.py: New test for approval + cancellation Tests:* - test_tool_call with cancellation (OpenAI models) ✅ - test_approve_with_cancellation (approval flow + concurrent cancel) ✅ Known cosmetic warnings (pre-existing): - "Run already in terminal state" - agent loop tries to update after /cancel - "Stream ended without terminal event" - background streaming timing race 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Kian Jones	e3fb00f970	feat(crouton): add orgId, userId, Compaction_Settings and LLM_Config (#9022 ) * LC one shot? * api changes * fix summarizer nameerror	2026-01-29 12:44:04 -08:00
Kian Jones	81b5d71889	feat: add agents and log error properly (#8914 ) * add agents and log error properly * fix llm stream adapter	2026-01-19 15:54:43 -08:00
Kian Jones	b0dfdd2725	fix commas in justfile helm secret setting and bug with missing metadata (#8874 )	2026-01-19 15:54:43 -08:00
Kian Jones	9418ab9815	feat: add provider trace backend abstraction for multi-backend telemetry (#8814 ) * feat: add provider trace backend abstraction for multi-backend telemetry Introduces a pluggable backend system for provider traces: - Base class with async/sync create and read interfaces - PostgreSQL backend (existing behavior) - ClickHouse backend (via OTEL instrumentation) - Socket backend (writes to Unix socket for crouton sidecar) - Factory for instantiating backends from config Refactors TelemetryManager to use backends with support for: - Multi-backend writes (concurrent via asyncio.gather) - Primary backend for reads (first in config list) - Graceful error handling per backend Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated) Example: "postgres,socket" for dual-write to Postgres and crouton 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * feat: add protocol version to socket backend records Adds PROTOCOL_VERSION constant to socket backend: - Included in every telemetry record sent to crouton - Must match ProtocolVersion in apps/crouton/main.go - Enables crouton to detect and reject incompatible messages 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * fix: remove organization_id from ProviderTraceCreate calls The organization_id is now handled via the actor parameter in the telemetry manager, not through ProviderTraceCreate schema. This fixes validation errors after changing ProviderTraceCreate to inherit from BaseProviderTrace which forbids extra fields. 🐙 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> * consolidate provider trace * add clickhouse-connect to fix bug on main lmao * auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api * auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api * consolidate provider trace * consolidate provider trace bug fix --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-19 15:54:43 -08:00
jnjpng	5017cb1d12	feat: add chatgpt oauth client for codex routing (#8774 ) * base * refresh * use default model fallback * patch * streaming * generate	2026-01-19 15:54:42 -08:00
Ari Webb	0a372b2540	fix: enable zai streaming (#7755 )	2026-01-12 10:57:20 -08:00
Charles Packer	33d39f4643	fix(core): patch usage data tracking for anthropic when context caching is on (#6997 )	2025-12-15 12:03:09 -08:00
Devansh Jain	d1536df6f6	chore: Update deepseek client for v3.2 models (#6556 ) * support for v3.2 models * streaming + context window fix * fix for no assitant text from deepseek	2025-12-15 12:02:34 -08:00
Kian Jones	edeac2c679	fix: fix gemini otel bug and add tracing for tool upsert (#6523 ) add tracing for tool upsert, and fix gemini otel bug	2025-12-15 12:02:33 -08:00
Kian Jones	a38475f23d	fix: safely load span attributes for provider traces (#6508 ) json.dumps on request data. Also remove step and actor since they are already present in the span	2025-12-15 12:02:33 -08:00
Kian Jones	5165d60881	feat: add a new span and log the provider request and response data objects (#6492 ) add a new span and log the provider request and response data objects	2025-12-15 12:02:33 -08:00
Charles Packer	1f7165afc4	fix: patch counting of tokens for anthropic (#6458 ) * fix: patch counting of tokens for anthropic * fix: patch ui to be simpler * fix: patch undercounting bug in anthropic when caching is on	2025-12-15 12:02:19 -08:00
Charles Packer	e67c98eedb	feat: add tests for prompt caching + fix anthropic prompt caching [LET-6373] (#6454 ) * feat: add tests for prompt caching * fix: add cache control breakpoints for anthropic + fix tests * fix: silence logging * fix: patch token counting error * fix: same patch on non-streaming path	2025-12-15 12:02:19 -08:00
Charles Packer	4af6465226	feat(core+web): store raw usage data on streams (and visualize properly in ADE) (#6452 ) * feat(core): store raw usage data on streams * fix(web): various fixes to deal w/ hardcoding against openai	2025-12-15 12:02:19 -08:00
Charles Packer	88a3743cc8	fix(core): distinguish between null and 0 for prompt caching (#6451 ) * fix(core): distinguish between null and 0 for prompt caching * fix: runtime errors * fix: just publish just sgate	2025-12-15 12:02:19 -08:00
Charles Packer	131891e05f	feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449 ) * feat: init refactor * feat: add helper code * fix: missing file + test * fix: just state/publish api	2025-12-15 12:02:19 -08:00
jnjpng	c6df306ccf	fix: upgrade google-genai sdk version and fix gemini 3 streaming (#6437 ) * base * base --------- Co-authored-by: Letta Bot <noreply@letta.com>	2025-12-15 12:02:18 -08:00
Ari Webb	30dab0abb9	fix: handle llm error during streaming [LET-6280] (#6341 ) handle llm error during streaming Co-authored-by: Ari Webb <ari@letta.com>	2025-11-24 19:10:27 -08:00
Charles Packer	18029250d0	fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter (#6044 ) * fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter * fix: also patch poison error in backend by filtering lazily * fix: remap streaming errors (what the fuck) * fix: dedupe tool clals * fix: cleanup, removed try/catch	2025-11-13 15:36:55 -08:00
Matthew Zhou	ff81f4153b	feat: Support parallel tool calling streaming for OpenAI chat completions [LET-4594] (#5865 ) * Finish chat completions parallel tool calling * Undo comments * Add comments * Remove test file	2025-11-13 15:36:14 -08:00
Ari Webb	48cc73175b	feat: parallel tool calling for openai non streaming [LET-4593] (#5773 ) * first hack * clean up * first implementation working * revert package-lock * remove openai test * error throw * typo * Update integration_test_send_message_v2.py * Update integration_test_send_message_v2.py * refine test * Only make changes for openai non streaming * Add tests --------- Co-authored-by: Ari Webb <ari@letta.com> Co-authored-by: Matt Zhou <mattzh1314@gmail.com>	2025-11-13 15:36:14 -08:00
Matthew Zhou	bb8a7889e0	feat: Add parallel tool call streaming for anthropic [LET-4601] (#5225 ) * wip * Fix parallel tool calling interface * wip * wip adapt using id field * Integrate new multi tool return schemas into parallel tool calling * Remove example script * Reset changes to llm stream adapter since old agent loop should not enable parallel tool calling * Clean up fallback logic for extracting tool calls * Remove redundant check * Simplify logic * Clean up logic in handle ai response * Fix tests * Write anthropic dict conversion to be back compatible * wip * Double write tool call id for legacy reasons * Fix override args failures * Patch for approvals * Revert comments * Remove extraneous prints	2025-10-24 15:11:31 -07:00
Matthew Zhou	7511b0f4fe	feat: Write anthropic streaming interface that supports parallel tool calling [LET-5355] (#5295 ) Write anthropic streaming interface that supports parallel tool calling	2025-10-09 15:25:21 -07:00
cthomas	1d611d92b9	feat: update assistant content parts union (#5115 ) * feat: update assistant content parts union * api sync * just use the base object since updating assistant breaks frontend	2025-10-07 17:50:48 -07:00
cthomas	f7755d837a	feat: add gemini streaming to new agent loop (#5109 ) * feat: add gemini streaming to new agent loop * add google as required dependency * support storing all content parts * remove extra google references	2025-10-07 17:50:48 -07:00
Sarah Wooders	ef07e03ee3	feat: add `run_id` to input messages and `step_id` to messages (#5099 )	2025-10-07 17:50:48 -07:00
cthomas	67f8e46619	feat: add run id to streamed messages (#5037 )	2025-10-07 17:50:47 -07:00
Matthew Zhou	d3c5d0c330	feat: Add missing import for `SimpleOpenAIResponsesStreamingInterface` (#5036 ) Add missing import	2025-10-07 17:50:47 -07:00
cthomas	76d1bc8cbc	feat: move new streaming adapters into own files (#5001 )	2025-10-07 17:50:47 -07:00

40 Commits