* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming
When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.
Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.
* fix(core): propagate finish_reason for Responses API incomplete events
- Guard usage extraction against None usage payload in
ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
exhausts its output token budget on reasoning
* fix(core): handle empty content parts in incomplete ResponseOutputMessage
When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.
* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens
Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
LLMError exceptions are already properly formatted errors that should
propagate directly. Without this check, they get unnecessarily wrapped
by handle_llm_error, losing their original error information.
Both letta_llm_stream_adapter and simple_llm_stream_adapter were
creating ProviderTrace without call_type, causing "unknown" in
ClickHouse analytics.
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
Log error traces to ClickHouse when streaming requests fail,
matching the behavior in letta_llm_stream_adapter.
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
fix: use shared event + .athrow() to properly set stream_was_cancelled flag
**Problem:**
When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained
False because `RunCancelledException` was raised in the consumer code (wrapper),
which closes the generator from outside. This causes Python to skip the
generator's except blocks and jump directly to finally with the wrong flag value.
**Solution:**
1. Shared `asyncio.Event` registry for cross-layer cancellation signaling
2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected
3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise)
4. All streaming interfaces check event in `finally` block to set flag correctly
5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE]
**Changes:**
- streaming_response.py: Event registry + .athrow() injection + graceful handling
- openai_streaming_interface.py: 3 classes check event in finally
- gemini_streaming_interface.py: Check event in finally
- anthropic_*.py: Catch RunCancelledException
- simple_llm_stream_adapter.py: Create & pass event to interfaces
- streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update
- routers/v1/{conversations,runs}.py: Pass event to wrapper
- integration_test_human_in_the_loop.py: New test for approval + cancellation
**Tests:**
- test_tool_call with cancellation (OpenAI models) ✅
- test_approve_with_cancellation (approval flow + concurrent cancel) ✅
**Known cosmetic warnings (pre-existing):**
- "Run already in terminal state" - agent loop tries to update after /cancel
- "Stream ended without terminal event" - background streaming timing race
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* feat: add provider trace backend abstraction for multi-backend telemetry
Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config
Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend
Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat: add protocol version to socket backend records
Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: remove organization_id from ProviderTraceCreate calls
The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* consolidate provider trace
* add clickhouse-connect to fix bug on main lmao
* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api
* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api
* consolidate provider trace
* consolidate provider trace bug fix
---------
Co-authored-by: Letta <noreply@letta.com>
* fix(core): sanitize messages to anthropic in the main path the same way (or similar) to how we do it in the token counter
* fix: also patch poison error in backend by filtering lazily
* fix: remap streaming errors (what the fuck)
* fix: dedupe tool clals
* fix: cleanup, removed try/catch
* first hack
* clean up
* first implementation working
* revert package-lock
* remove openai test
* error throw
* typo
* Update integration_test_send_message_v2.py
* Update integration_test_send_message_v2.py
* refine test
* Only make changes for openai non streaming
* Add tests
---------
Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Matt Zhou <mattzh1314@gmail.com>
* wip
* Fix parallel tool calling interface
* wip
* wip adapt using id field
* Integrate new multi tool return schemas into parallel tool calling
* Remove example script
* Reset changes to llm stream adapter since old agent loop should not enable parallel tool calling
* Clean up fallback logic for extracting tool calls
* Remove redundant check
* Simplify logic
* Clean up logic in handle ai response
* Fix tests
* Write anthropic dict conversion to be back compatible
* wip
* Double write tool call id for legacy reasons
* Fix override args failures
* Patch for approvals
* Revert comments
* Remove extraneous prints
* feat: add gemini streaming to new agent loop
* add google as required dependency
* support storing all content parts
* remove extra google references