Commit Graph

99 Commits

cthomas
3d781efd21 fix(core): raise LLMEmptyResponseError for empty Anthropic responses (#9624)
* fix(core): raise LLMEmptyResponseError for empty Anthropic responses

Fixes LET-7679: Opus 4.6 occasionally returns empty responses (no content
and no tool calls), causing silent failures with stop_reason=end_turn.

Changes:
- Add LLMEmptyResponseError class (subclass of LLMServerError)
- Raise error in anthropic_client for empty non-streaming responses
- Raise error in anthropic_streaming_interface for empty streaming responses
- Pass through LLMError instances in handle_llm_error to preserve specific types
- Add test for empty streaming response detection

This allows clients (letta-code) to catch this specific error and implement
retry logic with cache-busting modifications.
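
The diff itself isn't shown here; a minimal sketch of the described pattern, assuming Letta's existing LLMServerError base class and Anthropic's response shape (tool calls arrive as tool_use content blocks, so an empty content list means no text and no tool calls):

```python
class LLMEmptyResponseError(LLMServerError):
    """Raised when the provider returns no content and no tool calls."""


def raise_if_empty(response) -> None:
    # Hypothetical helper: an "empty" response ends with stop_reason=end_turn
    # but carries nothing a client could act on.
    if not response.content:
        raise LLMEmptyResponseError(
            f"Empty response from Anthropic (stop_reason={response.stop_reason})"
        )
```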

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): set invalid_llm_response stop reason for empty responses

Catch LLMEmptyResponseError specifically and set stop_reason to
invalid_llm_response instead of llm_api_error. This allows clients
to distinguish empty responses from transient API errors.
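
A hypothetical call-site sketch of the ordering this relies on: the specific subclass must be caught before the generic LLMError (the `run` object and client call are assumptions).

```python
try:
    response = await client.send(request)       # hypothetical client call
except LLMEmptyResponseError:
    run.stop_reason = "invalid_llm_response"    # empty response: client may retry
    raise
except LLMError:
    run.stop_reason = "llm_api_error"           # transient provider error
    raise
```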

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
jnjpng
a59f24ac87 fix(core): ensure buffered Anthropic tool chunks always include otid (#9516)
fix(core): ensure otid exists when flushing buffered anthropic tool chunks

Anthropic TOOL_USE buffering can emit buffered tool_call/approval chunks on content-block stop before an otid is assigned in the normal inner_thoughts_complete path. Ensure flush-time chunks get a deterministic otid so streaming clients can reliably correlate deltas.
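
The commit doesn't show how the otid is derived; one deterministic scheme, with hypothetical names, would hash stable stream coordinates:

```python
import uuid


def ensure_otid(chunk, run_id: str, chunk_index: int) -> None:
    # uuid5 over (run_id, chunk_index) is stable across retries, so clients
    # can correlate a flush-time chunk with later deltas for the same block.
    if chunk.otid is None:
        chunk.otid = str(uuid.uuid5(uuid.NAMESPACE_OID, f"{run_id}:{chunk_index}"))
```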

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:26 -08:00
Ari Webb
a9a6a5f29d fix: add correct logging (#9603) 2026-02-24 10:55:26 -08:00
cthomas
e2ad8762fe fix: gemini streaming bug (#9555) 2026-02-24 10:55:11 -08:00
jnjpng
e8d5922ff9 fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535)
* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming

When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.

Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.
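
A sketch of the shared handling, assuming the openai-python Responses event types and this interface's final_response attribute:

```python
import logging

logger = logging.getLogger(__name__)


def handle_terminal_event(self, event) -> None:
    # Both ResponseCompletedEvent and ResponseIncompleteEvent carry the
    # (possibly partial) response payload in event.response.
    self.final_response = event.response
    details = getattr(event.response, "incomplete_details", None)
    if details is not None:
        logger.warning(f"Responses API stream ended incomplete: {details.reason}")
```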

* fix(core): propagate finish_reason for Responses API incomplete events

- Guard usage extraction against None usage payload in
  ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
  can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
  finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
  Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
  stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
  exhausts its output token budget on reasoning
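
A minimal sketch of the override described in the bullets above; attribute names other than _finish_reason are assumptions:

```python
class LettaLLMAdapter:
    _finish_reason: str | None = None  # explicit override for streaming adapters

    @property
    def finish_reason(self) -> str | None:
        if self._finish_reason is not None:
            return self._finish_reason
        if getattr(self, "chat_completions_response", None) is not None:
            return self.chat_completions_response.choices[0].finish_reason
        return None
```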

* fix(core): handle empty content parts in incomplete ResponseOutputMessage

When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.
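
A sketch of the guard, assuming a helper around the code that previously raised on unexpected part counts:

```python
import logging

logger = logging.getLogger(__name__)


def single_content_part(message):
    # content=[] means max_output_tokens was hit before any part was emitted;
    # skip instead of raising so reasoning-only responses still process.
    if not message.content:
        logger.warning("Skipping ResponseOutputMessage with no content parts")
        return None
    if len(message.content) != 1:
        raise ValueError(f"Got {len(message.content)} content parts, expected 1")
    return message.content[0]
```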

* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens

Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
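
Putting the three fixes together, a hedged sketch of the reason mapping; "length" follows the Chat Completions convention per the commit, while the fallback value is an assumption:

```python
def map_incomplete_reason(reason: str | None) -> str:
    if reason == "max_output_tokens":
        return "length"        # matches the Chat Completions truncation value
    if reason == "content_filter":
        return "content_filter"
    return "stop"              # unknown future reasons: still explicit, never None
```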
2026-02-24 10:55:11 -08:00
cthomas
3651658ea7 fix: tool call streaming using deprecated field (#9517) 2026-02-24 10:55:11 -08:00
Kian Jones
f5c4ab50f4 chore: add ty + pre-commit hook and repeal even more ruff rules (#9504)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import

* remove all ignores, add FastAPI rules and Ruff rules

* add ty and precommit

* ruff stuff

* ty check fixes

* ty check fixes pt 2

* error on invalid
2026-02-24 10:55:11 -08:00
Kian Jones
25d54dd896 chore: enable F821, F401, W293 (#9503)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import
2026-02-24 10:55:08 -08:00
Kian Jones
4126fdadea fix(core): preserve thought_signature on TextContent in Gemini streaming path (#9461)
get_content() was only setting signature on ReasoningContent items.
When Gemini returns a function call with thought_signature but no
ReasoningContent (e.g. include_thoughts=False), the signature was
stored on self.thinking_signature but never attached to TextContent.
This caused "missing thought_signature in functionCall parts" errors
when the message was echoed back to Gemini on the next turn.
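
A sketch of the fallback described above; attribute names on the streaming interface are assumptions:

```python
def finalize_text_content(self) -> TextContent:
    content = TextContent(text="".join(self.text_parts))
    # With include_thoughts=False there is no ReasoningContent to carry the
    # signature, so attach the buffered one here; Gemini requires it when the
    # message is echoed back alongside functionCall parts on the next turn.
    if self.thinking_signature is not None:
        content.signature = self.thinking_signature
    return content
```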

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
6f746c5225 fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305)
* fix: handle Anthropic overloaded_error in streaming interfaces

* fix: handle Unicode surrogates in OpenAI requests

Sanitize Unicode surrogate pairs before sending requests to OpenAI API.
Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.

Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed

* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests

Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.

Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.

Also added sanitization to stream_async_responses() method which was
missing it.

Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
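
A minimal sketch of the sanitizer as described (the real implementation may differ):

```python
def sanitize_unicode_surrogates(text: str) -> str:
    # Lone surrogates (U+D800-U+DFFF) are valid inside Python str objects but
    # cannot be encoded to UTF-8, which is where httpx was failing.
    return "".join(ch for ch in text if not (0xD800 <= ord(ch) <= 0xDFFF))


assert sanitize_unicode_surrogates("ok\ud83c") == "ok"
```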
2026-02-24 10:52:06 -08:00
Ari Webb
0bbb9c9bc0 feat: add reasoning zai openrouter (#9189)
* feat: add reasoning zai openrouter

* add openrouter reasoning

* stage + publish api

* openrouter reasoning always on

* revert

* fix

* remove reference

* do
2026-02-24 10:52:06 -08:00
Sarah Wooders
221b4e6279 refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065)
👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
cthomas
c162de5127 fix: use shared event + .athrow() to properly set stream_was_cancelled flag (#9019)
fix: use shared event + .athrow() to properly set stream_was_cancelled flag

**Problem:**
When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained
False because `RunCancelledException` was raised in the consumer code (wrapper),
which closes the generator from outside. This causes Python to skip the
generator's except blocks and jump directly to finally with the wrong flag value.

**Solution:**
1. Shared `asyncio.Event` registry for cross-layer cancellation signaling
2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected
3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise)
4. All streaming interfaces check event in `finally` block to set flag correctly
5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE]
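
A condensed sketch of the wrapper side of the pattern listed above, with hypothetical names:

```python
import asyncio


class RunCancelledException(Exception):
    pass


async def cancellation_aware_stream_wrapper(stream, cancel_event: asyncio.Event):
    try:
        async for chunk in stream:
            if cancel_event.is_set():
                # .athrow() resumes the generator at its yield point with the
                # exception raised *inside* it, so its except/finally blocks
                # run and can set stream_was_cancelled correctly.
                await stream.athrow(RunCancelledException())
            yield chunk
    except RunCancelledException:
        yield "data: [DONE]\n\n"  # graceful terminal event for SSE clients
```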

**Changes:**
- streaming_response.py: Event registry + .athrow() injection + graceful handling
- openai_streaming_interface.py: 3 classes check event in finally
- gemini_streaming_interface.py: Check event in finally
- anthropic_*.py: Catch RunCancelledException
- simple_llm_stream_adapter.py: Create & pass event to interfaces
- streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update
- routers/v1/{conversations,runs}.py: Pass event to wrapper
- integration_test_human_in_the_loop.py: New test for approval + cancellation

**Tests:**
- test_tool_call with cancellation (OpenAI models) 
- test_approve_with_cancellation (approval flow + concurrent cancel) 

**Known cosmetic warnings (pre-existing):**
- "Run already in terminal state" - agent loop tries to update after /cancel
- "Stream ended without terminal event" - background streaming timing race

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
Ari Webb
851798d71a fix: step_id is none (#8528) 2026-01-19 15:54:37 -08:00
Kian Jones
82e5d70807 fix: prevent empty reasoning messages in streaming interfaces (#7207)
* fix: prevent empty reasoning messages in streaming interfaces

Prevents empty "Thinking..." indicators from appearing in clients by
filtering out reasoning messages with no content at the source.

Changes:
- Gemini: Don't emit ReasoningMessage when only thought_signature exists
- Gemini: Only emit reasoning content if text is non-empty
- Anthropic: Don't emit ReasoningMessage for BetaSignatureDelta
- Anthropic: Only emit reasoning content if thinking text is non-empty

This fixes the issue where providers send signature metadata before
actual thinking content, causing empty reasoning blocks to appear
in the UI after responses complete.

Affects: Gemini reasoning, Anthropic extended thinking
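
A hedged sketch of the shared guard; delta shapes vary by provider and the helper name is illustrative:

```python
def should_emit_reasoning(thinking_text: str | None) -> bool:
    # Signature-only deltas (e.g. Anthropic's BetaSignatureDelta) carry no
    # thinking text; emitting them renders an empty "Thinking..." block.
    return bool(thinking_text and thinking_text.strip())
```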

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle Anthropic thinking signature correctly

- Only include 'signature' in Anthropic message payload if it is not None (fixes BadRequestError).
- Capture and attach 'signature' to ReasoningMessage in streaming interface.

* fix(anthropic): attach signature to last reasoning message in stream

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-12 10:57:19 -08:00
Sarah Wooders
8729a037b9 fix: handle new openai overflow error format (#7110) 2025-12-17 17:31:02 -08:00
Devansh Jain
d1536df6f6 chore: Update deepseek client for v3.2 models (#6556)
* support for v3.2 models

* streaming + context window fix

* fix for no assistant text from deepseek
2025-12-15 12:02:34 -08:00
Kian Jones
647e271c2a fix: add more logging for stream error (#6490)
* trying out gpt-5.1-codex

* add unit test for message content

* try to support multimodal

* remove ValueError and add logging on stream error

* prevent stream termination from api spec implementation errors

* fix: remove final_response references from non-Responses API interfaces

* fix: add diagnostic attributes to SimpleOpenAIResponsesStreamingInterface

* fix: remove final_response from SimpleOpenAIStreamingInterface (Chat Completions API)
2025-12-15 12:02:33 -08:00
Sarah Wooders
91e3dd8b3e feat: fix new summarizer code and add more tests (#6461) 2025-12-15 12:02:19 -08:00
Charles Packer
1f7165afc4 fix: patch counting of tokens for anthropic (#6458)
* fix: patch counting of tokens for anthropic

* fix: patch ui to be simpler

* fix: patch undercounting bug in anthropic when caching is on
2025-12-15 12:02:19 -08:00
Charles Packer
4af6465226 feat(core+web): store raw usage data on streams (and visualize properly in ADE) (#6452)
* feat(core): store raw usage data on streams

* fix(web): various fixes to deal w/ hardcoding against openai
2025-12-15 12:02:19 -08:00
Charles Packer
88a3743cc8 fix(core): distinguish between null and 0 for prompt caching (#6451)
* fix(core): distinguish between null and 0 for prompt caching

* fix: runtime errors

* fix: just publish just stage
2025-12-15 12:02:19 -08:00
Charles Packer
131891e05f feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449)
* feat: init refactor

* feat: add helper code

* fix: missing file + test

* fix: just stage/publish api
2025-12-15 12:02:19 -08:00
Charles Packer
e142d440d5 fix: patch gemini token counting (#6445)
fix: use usage_metadata.candidates_token_count for counting total tokens
2025-12-15 12:02:18 -08:00
Charles Packer
081a1f6920 fix(core): patch responses api parallel tool calling not returning tool call IDs (#6079)
* fix(core): patch responses api parallel tool calling not returning tool call ids

* fix(core): patch chatcompletions as well

* fix: patch problem with gpt-4.1
2025-11-13 15:36:56 -08:00
Matthew Zhou
72e80395cc fix: Fix gemini streaming interface string growth [LET-6067] (#5975)
* Fix gemini streaming interface

* Add comments
2025-11-13 15:36:55 -08:00
Matthew Zhou
6f57ae829a fix: Reduce string growth for anthropic (#5974)
Reduce string growth for anthropic
2025-11-13 15:36:55 -08:00
Matthew Zhou
a699aca626 fix: Eliminate O(n^2) string growth for OpenAI [LET-6065] (#5973)
Finish
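
This and the two adjacent string-growth commits (Gemini, Anthropic, OpenAI) address the same pattern. The diffs aren't shown; the standard fix for quadratic growth when accumulating streamed deltas is to buffer parts and join once:

```python
class DeltaBuffer:
    """Accumulate streamed text deltas without O(n^2) copying."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def add(self, delta: str) -> None:
        self._parts.append(delta)    # amortized O(1); `s += delta` re-copies s

    def value(self) -> str:
        return "".join(self._parts)  # single O(n) pass when the text is read
```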
2025-11-13 15:36:55 -08:00
Matthew Zhou
7b3cb0224a feat: Add gemini parallel tool call streaming for gemini [LET-6027] (#5913)
* Make changes to gemini streaming interface to support parallel tool calling

* Finish send message integration test

* Add comments
2025-11-13 15:36:39 -08:00
Matthew Zhou
d3ab51b822 feat: Support parallel tool calling streaming for responses OpenAI [LET-5977] (#5867) 2025-11-13 15:36:20 -08:00
Matthew Zhou
ff81f4153b feat: Support parallel tool calling streaming for OpenAI chat completions [LET-4594] (#5865)
* Finish chat completions parallel tool calling

* Undo comments

* Add comments

* Remove test file
2025-11-13 15:36:14 -08:00
Ari Webb
9d5fdc6de7 feat: migrate integration test mcp serverspy to use 1.0 client [LET-5945] (#5814)
* new test first hack, should still break

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-13 15:36:14 -08:00
cthomas
1848df2daa feat: add special approval request otid for openai streaming (#5744)
* feat: add special approval request otid for openai streaming

* fix import
2025-10-24 15:14:39 -07:00
cthomas
c67bdd9c64 fix: special approval message otid for gemini streaming (#5742) 2025-10-24 15:14:39 -07:00
cthomas
6b37ef2cb7 fix: special otid handling for approval request (#5726) 2025-10-24 15:14:31 -07:00
cthomas
4823416af9 feat: default unpack assistant message content [LET-5404] (#5707)
feat: default unpack assistant message content
2025-10-24 15:14:20 -07:00
cthomas
73dcc0d4b7 feat: latest hitl + parallel tool call changes (#5565) 2025-10-24 15:12:49 -07:00
Matthew Zhou
643ec8fe2f fix: Double write tool call deltas [LET-5545] (#5461)
* Double write tool call deltas

* Fix
2025-10-24 15:12:11 -07:00
Kevin Lin
08da1a64bb feat: parse reasoning_content from OAI proxies (eg. vLLM / OpenRouter) (#5372)
* reasoning_content support

* fix

* comment

* fix

* rm comment

---------

Co-authored-by: Charles Packer <packercharles@gmail.com>
2025-10-24 15:11:31 -07:00
Kian Jones
c2e474e03a feat: refactor logs to parse as a single log line each and filter out 404s from sentry (#5242)
* add multiline log auto detect

* implement logger.exception()

* filter out 404

* remove potentially problematic changes
2025-10-24 15:11:31 -07:00
Matthew Zhou
7511b0f4fe feat: Write anthropic streaming interface that supports parallel tool calling [LET-5355] (#5295)
Write anthropic streaming interface that supports parallel tool calling
2025-10-09 15:25:21 -07:00
Matthew Zhou
5593f1450b feat: Double write to ToolCallMessage's new list tool_calls field (#5268)
* Add new tool_calls field to ToolCallMessage

* fern autogen

* Double write to new tool_calls field

* Update straggling instances
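
A hypothetical sketch of the double-write migration pattern: populate the legacy singular field and the new list field together so existing consumers keep working during the transition. The constructor shape is an assumption; only the field names come from the commit.

```python
def build_tool_call_message(calls: list) -> ToolCallMessage:
    return ToolCallMessage(
        tool_call=calls[0] if calls else None,  # legacy single-call field
        tool_calls=calls,                       # new list-valued field
    )
```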
2025-10-09 13:20:52 -07:00
cthomas
cc913df27c feat: add signature to content parts (#5134)
* feat: add signature to content parts

* always base64 encode thought signature

* propagate thought signature back to request
2025-10-07 17:50:49 -07:00
cthomas
93d9ff01c6 feat: add gemini native thinking (#5124)
* feat: add gemini native thinking

* update test

* revert comments
2025-10-07 17:50:49 -07:00
cthomas
3e17b4289a feat: gracefully handle gemini empty content parts (#5116) 2025-10-07 17:50:48 -07:00
cthomas
f7755d837a feat: add gemini streaming to new agent loop (#5109)
* feat: add gemini streaming to new agent loop

* add google as required dependency

* support storing all content parts

* remove extra google references
2025-10-07 17:50:48 -07:00
Sarah Wooders
ef07e03ee3 feat: add run_id to input messages and step_id to messages (#5099) 2025-10-07 17:50:48 -07:00
cthomas
a3545110cf feat: add full responses api support in new agent loop (#5051)
* feat: add full responses api support in new agent loop

* update matrix in workflow

* relax check for reasoning messages for high effort gpt 5

* fix indent

* one more relax
2025-10-07 17:50:48 -07:00
cthomas
67f8e46619 feat: add run id to streamed messages (#5037) 2025-10-07 17:50:47 -07:00
cthomas
f235dfb356 feat: add tool call test for new agent loop (#5034) 2025-10-07 17:50:47 -07:00