Commit Graph

40 Commits

Author SHA1 Message Date
jnjpng
e8d5922ff9 fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535)
* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming

When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.

Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.

* fix(core): propagate finish_reason for Responses API incomplete events

- Guard usage extraction against None usage payload in
  ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
  can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
  finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
  Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
  stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
  exhausts its output token budget on reasoning

* fix(core): handle empty content parts in incomplete ResponseOutputMessage

When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.

* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens

Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
2026-02-24 10:55:11 -08:00
Kian Jones
25d54dd896 chore: enable F821, F401, W293 (#9503)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import
2026-02-24 10:55:08 -08:00
jnjpng
e3eafb1977 fix: re-raise LLMError before wrapping with handle_llm_error (#9482)
LLMError exceptions are already properly formatted errors that should
propagate directly. Without this check, they get unnecessarily wrapped
by handle_llm_error, losing their original error information.
2026-02-24 10:52:07 -08:00
Kian Jones
382e216cbb fix(core): differentiate BYOK vs base provider in all LLM error details (#9425)
Add is_byok flag to every LLMError's details dict returned from
handle_llm_error across all providers (OpenAI, Anthropic, Google,
ChatGPT OAuth). This enables observability into whether errors
originate from Letta's production keys or user-provided BYOK keys.

The rate limit handler in app.py now returns a more helpful message
for BYOK users ("check your provider's rate limits and billing")
versus the generic message for base provider rate limits.

Datadog issues:
- https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
eaf64fb510 fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258)
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
cthomas
e8a565a384 fix: add missing call_type to stream adapter ProviderTrace (#9259)
Both letta_llm_stream_adapter and simple_llm_stream_adapter were
creating ProviderTrace without call_type, causing "unknown" in
ClickHouse analytics.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
cthomas
cd7e80acc3 fix: add error trace logging to simple_llm_stream_adapter (#9247)
Log error traces to ClickHouse when streaming requests fail,
matching the behavior in letta_llm_stream_adapter.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
9ce1249738 feat: openrouter byok (#9148)
* feat: openrouter byok

* new client is unnecessary

* revert json diffs
2026-01-29 12:44:04 -08:00
Sarah Wooders
b34ad43691 feat: add minimax byok to ui (#9101)
* fix: patch minimax

* feat: add frontend changes for minimax

* add logo, fix backend

* better check for is minimax

* more references fixed for minimax

* start revering unnecessary changes

* revert backend changes, just ui

* fix minimax fully

* fix test

* add key to deploy action

---------

Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Ari Webb <arijwebb@gmail.com>
2026-01-29 12:44:04 -08:00
Sarah Wooders
221b4e6279 refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065)
👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
c162de5127 fix: use shared event + .athrow() to properly set stream_was_cancelle… (#9019)
fix: use shared event + .athrow() to properly set stream_was_cancelled flag

**Problem:**
When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained
False because `RunCancelledException` was raised in the consumer code (wrapper),
which closes the generator from outside. This causes Python to skip the
generator's except blocks and jump directly to finally with the wrong flag value.

**Solution:**
1. Shared `asyncio.Event` registry for cross-layer cancellation signaling
2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected
3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise)
4. All streaming interfaces check event in `finally` block to set flag correctly
5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE]

**Changes:**
- streaming_response.py: Event registry + .athrow() injection + graceful handling
- openai_streaming_interface.py: 3 classes check event in finally
- gemini_streaming_interface.py: Check event in finally
- anthropic_*.py: Catch RunCancelledException
- simple_llm_stream_adapter.py: Create & pass event to interfaces
- streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update
- routers/v1/{conversations,runs}.py: Pass event to wrapper
- integration_test_human_in_the_loop.py: New test for approval + cancellation

**Tests:**
- test_tool_call with cancellation (OpenAI models) 
- test_approve_with_cancellation (approval flow + concurrent cancel) 

**Known cosmetic warnings (pre-existing):**
- "Run already in terminal state" - agent loop tries to update after /cancel
- "Stream ended without terminal event" - background streaming timing race

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Kian Jones
e3fb00f970 feat(crouton): add orgId, userId, Compaction_Settings and LLM_Config (#9022)
* LC one shot?

* api changes

* fix summarizer nameerror
2026-01-29 12:44:04 -08:00
Kian Jones
81b5d71889 feat: add agents and log error properly (#8914)
* add agents and log error properly

* fix llm stream adapter
2026-01-19 15:54:43 -08:00
Kian Jones
b0dfdd2725 fix commas in justfile helm secret setting and bug with missing metadata (#8874) 2026-01-19 15:54:43 -08:00
Kian Jones
9418ab9815 feat: add provider trace backend abstraction for multi-backend telemetry (#8814)
* feat: add provider trace backend abstraction for multi-backend telemetry

Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config

Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add protocol version to socket backend records

Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: remove organization_id from ProviderTraceCreate calls

The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* consolidate provider trace

* add clickhouse-connect to fix bug on main lmao

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* consolidate provider trace

* consolidate provider trace bug fix

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:43 -08:00
jnjpng
5017cb1d12 feat: add chatgpt oauth client for codex routing (#8774)
* base

* refresh

* use default model fallback

* patch

* streaming

* generate
2026-01-19 15:54:42 -08:00
Ari Webb
0a372b2540 fix: enable zai streaming (#7755) 2026-01-12 10:57:20 -08:00
Charles Packer
33d39f4643 fix(core): patch usage data tracking for anthropic when context caching is on (#6997) 2025-12-15 12:03:09 -08:00
Devansh Jain
d1536df6f6 chore: Update deepseek client for v3.2 models (#6556)
* support for v3.2 models

* streaming + context window fix

* fix for no assitant text from deepseek
2025-12-15 12:02:34 -08:00
Kian Jones
edeac2c679 fix: fix gemini otel bug and add tracing for tool upsert (#6523)
add tracing for tool upsert, and fix gemini otel bug
2025-12-15 12:02:33 -08:00
Kian Jones
a38475f23d fix: safely load span attributes for provider traces (#6508)
json.dumps on request data. Also remove step and actor since they are already present in the span
2025-12-15 12:02:33 -08:00
Kian Jones
5165d60881 feat: add a new span and log the provider request and response data objects (#6492)
add a new span and log the provider request and response data objects
2025-12-15 12:02:33 -08:00
Charles Packer
1f7165afc4 fix: patch counting of tokens for anthropic (#6458)
* fix: patch counting of tokens for anthropic

* fix: patch ui to be simpler

* fix: patch undercounting bug in anthropic when caching is on
2025-12-15 12:02:19 -08:00
Charles Packer
e67c98eedb feat: add tests for prompt caching + fix anthropic prompt caching [LET-6373] (#6454)
* feat: add tests for prompt caching

* fix: add cache control breakpoints for anthropic + fix tests

* fix: silence logging

* fix: patch token counting error

* fix: same patch on non-streaming path
2025-12-15 12:02:19 -08:00
Charles Packer
4af6465226 feat(core+web): store raw usage data on streams (and visualize properly in ADE) (#6452)
* feat(core): store raw usage data on streams

* fix(web): various fixes to deal w/ hardcoding against openai
2025-12-15 12:02:19 -08:00
Charles Packer
88a3743cc8 fix(core): distinguish between null and 0 for prompt caching (#6451)
* fix(core): distinguish between null and 0 for prompt caching

* fix: runtime errors

* fix: just publish just sgate
2025-12-15 12:02:19 -08:00
Charles Packer
131891e05f feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449)
* feat: init refactor

* feat: add helper code

* fix: missing file + test

* fix: just state/publish api
2025-12-15 12:02:19 -08:00
jnjpng
c6df306ccf fix: upgrade google-genai sdk version and fix gemini 3 streaming (#6437)
* base

* base

---------

Co-authored-by: Letta Bot <noreply@letta.com>
2025-12-15 12:02:18 -08:00
Ari Webb
30dab0abb9 fix: handle llm error during streaming [LET-6280] (#6341)
handle llm error during streaming

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:10:27 -08:00
Charles Packer
18029250d0 fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter (#6044)
* fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter

* fix: also patch poison error in backend by filtering lazily

* fix: remap streaming errors (what the fuck)

* fix: dedupe tool clals

* fix: cleanup, removed try/catch
2025-11-13 15:36:55 -08:00
Matthew Zhou
ff81f4153b feat: Support parallel tool calling streaming for OpenAI chat completions [LET-4594] (#5865)
* Finish chat completions parallel tool calling

* Undo comments

* Add comments

* Remove test file
2025-11-13 15:36:14 -08:00
Ari Webb
48cc73175b feat: parallel tool calling for openai non streaming [LET-4593] (#5773)
* first hack

* clean up

* first implementation working

* revert package-lock

* remove openai test

* error throw

* typo

* Update integration_test_send_message_v2.py

* Update integration_test_send_message_v2.py

* refine test

* Only make changes for openai non streaming

* Add tests

---------

Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Matt Zhou <mattzh1314@gmail.com>
2025-11-13 15:36:14 -08:00
Matthew Zhou
bb8a7889e0 feat: Add parallel tool call streaming for anthropic [LET-4601] (#5225)
* wip

* Fix parallel tool calling interface

* wip

* wip adapt using id field

* Integrate new multi tool return schemas into parallel tool calling

* Remove example script

* Reset changes to llm stream adapter since old agent loop should not enable parallel tool calling

* Clean up fallback logic for extracting tool calls

* Remove redundant check

* Simplify logic

* Clean up logic in handle ai response

* Fix tests

* Write anthropic dict conversion to be back compatible

* wip

* Double write tool call id for legacy reasons

* Fix override args failures

* Patch for approvals

* Revert comments

* Remove extraneous prints
2025-10-24 15:11:31 -07:00
Matthew Zhou
7511b0f4fe feat: Write anthropic streaming interface that supports parallel tool calling [LET-5355] (#5295)
Write anthropic streaming interface that supports parallel tool calling
2025-10-09 15:25:21 -07:00
cthomas
1d611d92b9 feat: update assistant content parts union (#5115)
* feat: update assistant content parts union

* api sync

* just use the base object since updating assistant breaks frontend
2025-10-07 17:50:48 -07:00
cthomas
f7755d837a feat: add gemini streaming to new agent loop (#5109)
* feat: add gemini streaming to new agent loop

* add google as required dependency

* support storing all content parts

* remove extra google references
2025-10-07 17:50:48 -07:00
Sarah Wooders
ef07e03ee3 feat: add run_id to input messages and step_id to messages (#5099) 2025-10-07 17:50:48 -07:00
cthomas
67f8e46619 feat: add run id to streamed messages (#5037) 2025-10-07 17:50:47 -07:00
Matthew Zhou
d3c5d0c330 feat: Add missing import for SimpleOpenAIResponsesStreamingInterface (#5036)
Add missing import
2025-10-07 17:50:47 -07:00
cthomas
76d1bc8cbc feat: move new streaming adapters into own files (#5001) 2025-10-07 17:50:47 -07:00