LLMError exceptions are already properly formatted and should propagate
directly. Without this check, handle_llm_error wraps them unnecessarily,
losing their original error information.
* Add log probabilities support for RL training
This enables the Letta server to request and return log probabilities from
OpenAI-compatible providers (including SGLang) for use in RL training.
Changes:
- LLMConfig: Add return_logprobs and top_logprobs fields
- OpenAIClient: Set logprobs in ChatCompletionRequest when enabled
- LettaLLMAdapter: Add logprobs field and extract from response
- LettaResponse: Add logprobs field to return log probs to client
- LettaRequest: Add return_logprobs/top_logprobs for per-request override
- LettaAgentV3: Store and pass logprobs through to response
- agents.py: Handle request-level logprobs override
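A minimal sketch of how the new config fields and the request mapping might fit together (the field names come from the changelist above; the Pydantic base, defaults, and helper function are assumptions):
```python
from pydantic import BaseModel, Field

class LLMConfig(BaseModel):
    # Assumed defaults: logprobs stay off unless explicitly requested.
    return_logprobs: bool = Field(default=False, description="Request per-token log probabilities")
    top_logprobs: int | None = Field(default=None, description="Top alternatives to return per token")

def build_logprobs_kwargs(config: LLMConfig) -> dict:
    """Map the config fields onto OpenAI-style ChatCompletion request kwargs."""
    if not config.return_logprobs:
        return {}
    kwargs: dict = {"logprobs": True}
    if config.top_logprobs is not None:
        kwargs["top_logprobs"] = config.top_logprobs
    return kwargs
```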
Usage:
```python
response = client.agents.messages.create(
    agent_id=agent_id,
    messages=[...],
    return_logprobs=True,
    top_logprobs=5,
)
print(response.logprobs)  # Per-token log probabilities
```
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* Add multi-turn token tracking for RL training via SGLang native endpoint
- Add TurnTokenData schema to track token IDs and logprobs per turn
- Add return_token_ids flag to LettaRequest and LLMConfig
- Create SGLangNativeClient for /generate endpoint (returns output_ids)
- Create SGLangNativeAdapter that uses native endpoint
- Modify LettaAgentV3 to accumulate turns across LLM calls
- Include turns in LettaResponse when return_token_ids=True
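A rough sketch of the per-turn shape this adds (TurnTokenData is the name from the bullets above; the exact fields and types are assumptions):
```python
from pydantic import BaseModel

class TurnTokenData(BaseModel):
    """Token IDs and logprobs for one LLM call within a multi-turn step."""
    input_token_ids: list[int] = []
    output_token_ids: list[int] = []
    # Entries may be None: SGLang's logprob output includes nulls (see fix below).
    output_token_logprobs: list[float | None] = []
```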
* Fix: Add SGLang native adapter to step() method, not just stream()
* Fix: Handle Pydantic Message objects in SGLang native adapter
* Fix: Remove api_key reference from LLMConfig (not present)
* Fix: Add missing 'created' field to ChatCompletionResponse
* Add full tool support to SGLang native adapter
- Format tools into prompt in Qwen-style format
- Parse tool calls from <tool_call> tags in response
- Format tool results as <tool_response> in user messages
- Set finish_reason to 'tool_calls' when tools are called
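An illustrative sketch of the tag parsing (the <tool_call> format is from the bullets above; the regex and helper name are assumptions, not the adapter's actual code):
```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract Qwen-style tool calls embedded in the generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the turn
        calls.append({"name": payload.get("name"),
                      "arguments": payload.get("arguments", {})})
    return calls

# finish_reason becomes "tool_calls" whenever this returns a non-empty list.
```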
* Use tokenizer.apply_chat_template for proper tool formatting
- Add tokenizer caching in SGLang native adapter
- Use apply_chat_template when tokenizer available
- Fall back to manual formatting if not
- Convert Letta messages to OpenAI format for tokenizer
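A sketch of the tokenizer-with-fallback flow described above, assuming a recent transformers version with tools support in apply_chat_template; the caching helper and the manual fallback are simplified stand-ins:
```python
from functools import lru_cache

from transformers import AutoTokenizer

@lru_cache(maxsize=8)  # tokenizer caching, as in the first bullet above
def get_tokenizer(model_name: str):
    try:
        return AutoTokenizer.from_pretrained(model_name)
    except Exception:
        return None  # no local tokenizer available: use manual formatting

def manual_format(messages: list[dict], tools: list[dict] | None) -> str:
    # Simplified stand-in for the manual Qwen-style path (tools omitted for brevity).
    turns = "\n".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages)
    return turns + "\n<|im_start|>assistant\n"

def render_prompt(model_name: str, messages: list[dict], tools: list[dict] | None = None) -> str:
    """messages are expected in OpenAI format ({'role': ..., 'content': ...})."""
    tokenizer = get_tokenizer(model_name)
    if tokenizer is not None and getattr(tokenizer, "chat_template", None):
        return tokenizer.apply_chat_template(
            messages, tools=tools, tokenize=False, add_generation_prompt=True
        )
    return manual_format(messages, tools)
```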
* Fix: Use func_response instead of tool_return for ToolReturn content
* Fix: Get output_token_logprobs from meta_info in SGLang response
* Fix: Allow None in output_token_logprobs (SGLang format includes null)
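A sketch of the extraction these two fixes describe (the meta_info and output_token_logprobs keys come from the fixes above; the per-entry tuple layout is an assumption):
```python
def extract_logprobs(response: dict) -> list[float | None]:
    """Pull per-token logprobs out of an SGLang /generate response."""
    entries = (response.get("meta_info") or {}).get("output_token_logprobs") or []
    # Each entry is assumed to be (logprob, token_id, ...); a null logprob
    # is preserved as None rather than dropped or coerced.
    return [entry[0] if entry is not None else None for entry in entries]
```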
* chore: remove unrelated files from logprobs branch
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: add missing call_type param to adapter constructors in letta_agent_v3
The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting
adapter creation into conditional blocks. This restores it in all three spots
(SGLang in step, SimpleLLM in step, SGLang in stream).
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* just stage-api && just publish-api
* fix: update expected LLMConfig fields in schema test for logprobs support
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: remove rllm provider references
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* just stage-api && just publish-api
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal>
Co-authored-by: Letta <noreply@letta.com>
* fix(core): preserve Gemini thought_signature on function calls in non-streaming path
The Google Gemini API requires thought_signature to be echoed back on
function call parts in multi-turn conversations. In the non-streaming
request path, the signature was only captured for subsequent function
calls (else branch) but dropped for the first/only function call (if
branch) in convert_response_to_chat_completion. This caused 400
INVALID_ARGUMENT errors on the next turn.
Additionally, when no ReasoningContent existed to carry the signature
(e.g. Gemini 2.5 Flash with include_thoughts=False), the signature was
lost in the adapter layer. Now it falls through to TextContent.
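A minimal sketch of the intended behavior, using dicts as stand-ins for the Gemini SDK types; the point is that the signature is read on every function-call path, not only the else branch:
```python
def collect_tool_calls(parts: list[dict]) -> list[dict]:
    """Stand-in for the loop in convert_response_to_chat_completion."""
    tool_calls: list[dict] = []
    for part in parts:
        if part.get("function_call") is None:
            continue
        tool_calls.append({
            "function_call": part["function_call"],
            # Read the signature on every path so the first/only call keeps it too.
            "thought_signature": part.get("thought_signature"),
        })
    return tool_calls
```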
Datadog: https://us5.datadoghq.com/error-tracking/issue/17c4b114-d596-11f0-bcd6-da7ad0900000
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): preserve Gemini thought_signature in non-temporal agent path
Carry reasoning_content_signature on TextContent in letta_agent.py
at both locations where content falls through from reasoning (same
fix already applied to the adapter and temporal activity paths).
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
* feat: centralize telemetry logging at LLM client level
Moves telemetry logging from individual adapters to LLMClientBase:
- Add TelemetryStreamWrapper for streaming telemetry on stream close
- Add request_async_with_telemetry() for non-streaming requests
- Add stream_async_with_telemetry() for streaming requests
- Add set_telemetry_context() to configure agent_id, run_id, step_id
Updates adapters and agents to use new pattern:
- LettaLLMAdapter now accepts agent_id/run_id in constructor
- Adapters call set_telemetry_context() before LLM requests
- Removes duplicate telemetry logging from adapters
- Enriches traces with agent_id, run_id, call_type metadata
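A sketch of the new call pattern for the non-streaming path (the method names come from the lists above; the parameter shapes are assumptions):
```python
async def call_llm_step(client, agent_id: str, run_id: str, step_id: str, request):
    # Adapters configure the context once, before the request...
    client.set_telemetry_context(agent_id=agent_id, run_id=run_id, step_id=step_id)
    # ...and the base client logs the provider trace itself.
    return await client.request_async_with_telemetry(request)
```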
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: accumulate streaming response content for telemetry
TelemetryStreamWrapper now extracts actual response data from chunks:
- Content text (concatenated from deltas)
- Tool calls (id, name, arguments)
- Model name, finish reason, usage stats
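A sketch of the accumulation over OpenAI-style stream chunks, with plain dicts standing in for the chunk objects:
```python
def accumulate_chunks(chunks: list[dict]) -> dict:
    """Collect response content, tool calls, and metadata from stream chunks."""
    content: list[str] = []
    tool_calls: dict[int, dict] = {}
    model = finish_reason = usage = None
    for chunk in chunks:
        model = chunk.get("model") or model
        usage = chunk.get("usage") or usage  # usually only on the final chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                content.append(delta["content"])
            for tc in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(tc["index"], {"id": None, "name": "", "arguments": ""})
                slot["id"] = tc.get("id") or slot["id"]
                fn = tc.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
            finish_reason = choice.get("finish_reason") or finish_reason
    return {"content": "".join(content),
            "tool_calls": list(tool_calls.values()),
            "model": model, "finish_reason": finish_reason, "usage": usage}
```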
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* refactor: move streaming telemetry to caller (option 3)
- Remove TelemetryStreamWrapper class
- Add log_provider_trace_async() helper to LLMClientBase
- stream_async_with_telemetry() now just returns raw stream
- Callers log telemetry after processing with rich interface data
Updated callers:
- summarizer.py: logs content + usage after stream processing
- letta_agent.py: logs tool_call, reasoning, model, usage
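A sketch of the caller-side pattern (the two method names are from the bullets above; awaiting the stream helper and the logging keyword arguments are assumptions):
```python
async def run_stream(client, request, agent_id: str, run_id: str):
    # The helper now just returns the raw provider stream.
    stream = await client.stream_async_with_telemetry(request)
    chunks = [chunk async for chunk in stream]
    # After processing, the caller holds rich interface data (content, tool
    # calls, usage) and logs the provider trace via the new base-class helper.
    await client.log_provider_trace_async(request=request, response=chunks,
                                          agent_id=agent_id, run_id=run_id)
    return chunks
```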
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: pass agent_id and run_id to parent adapter class
LettaLLMStreamAdapter was not passing agent_id/run_id to parent,
causing "unexpected keyword argument" errors.
🐙 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>