feat: centralize telemetry logging at LLM client level (#8815)

* feat: centralize telemetry logging at LLM client level

Moves telemetry logging from individual adapters to LLMClientBase:
- Add TelemetryStreamWrapper for streaming telemetry on stream close
- Add request_async_with_telemetry() for non-streaming requests
- Add stream_async_with_telemetry() for streaming requests
- Add set_telemetry_context() to configure agent_id, run_id, step_id
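
A rough sketch of how these base-class helpers fit together (method names are from this change; the stored attributes and the delegation through `request_async` are illustrative assumptions, not the exact implementation):

```python
# Illustrative sketch only -- attribute names and method bodies are assumptions.
class LLMClientBase:
    def set_telemetry_context(
        self,
        telemetry_manager: "TelemetryManager",
        agent_id: str | None = None,
        run_id: str | None = None,
        step_id: str | None = None,
        call_type: str | None = None,
    ) -> None:
        # Store the context once so every subsequent request can attach it.
        self._telemetry_manager = telemetry_manager
        self._telemetry_context = {
            "agent_id": agent_id,
            "run_id": run_id,
            "step_id": step_id,
            "call_type": call_type,
        }

    async def request_async_with_telemetry(self, request_data: dict, llm_config) -> dict:
        # Run the normal non-streaming request, then record the provider trace.
        response_data = await self.request_async(request_data, llm_config)  # provided by concrete clients
        await self.log_provider_trace_async(request_data=request_data, response_json=response_data)
        return response_data
```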

Updates adapters and agents to use new pattern:
- LettaLLMAdapter now accepts agent_id/run_id in constructor
- Adapters call set_telemetry_context() before LLM requests
- Removes duplicate telemetry logging from adapters
- Enriches traces with agent_id, run_id, call_type metadata
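
On the adapter side, the intended call pattern looks roughly like this (a sketch: the method name `invoke_llm`, the constructor shape, and the `"agent_step"` call type are placeholders for illustration):

```python
# Sketch of the adapter-side pattern; constructor and method names are simplified placeholders.
class LettaLLMAdapter:
    def __init__(self, llm_client, agent_id: str | None = None, run_id: str | None = None):
        self.llm_client = llm_client
        self.agent_id = agent_id
        self.run_id = run_id

    async def invoke_llm(self, request_data: dict, llm_config, telemetry_manager):
        # Configure the telemetry context before issuing the LLM request so the
        # client can enrich the provider trace with agent_id/run_id/call_type.
        self.llm_client.set_telemetry_context(
            telemetry_manager=telemetry_manager,
            agent_id=self.agent_id,
            run_id=self.run_id,
            call_type="agent_step",  # placeholder label; the summarizer path uses "summarization"
        )
        return await self.llm_client.request_async_with_telemetry(request_data, llm_config)
```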

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: accumulate streaming response content for telemetry

TelemetryStreamWrapper now extracts actual response data from chunks:
- Content text (concatenated from deltas)
- Tool calls (id, name, arguments)
- Model name, finish reason, usage stats
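
In rough terms the wrapper accumulated the response while the stream was consumed, along these lines (a sketch assuming OpenAI-style delta chunks; the wrapper itself is removed again in the refactor below):

```python
# Sketch of delta accumulation over OpenAI-style streaming chunks (assumed shape).
content_parts: list[str] = []
tool_calls: dict[int, dict] = {}
model = finish_reason = usage = None

async for chunk in stream:
    model = getattr(chunk, "model", None) or model
    usage = getattr(chunk, "usage", None) or usage
    if not chunk.choices:
        continue  # e.g. a final usage-only chunk
    choice = chunk.choices[0]
    finish_reason = choice.finish_reason or finish_reason
    if choice.delta.content:
        content_parts.append(choice.delta.content)
    for tc in choice.delta.tool_calls or []:
        entry = tool_calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
        entry["id"] = tc.id or entry["id"]
        if tc.function:
            entry["name"] = tc.function.name or entry["name"]
            entry["arguments"] += tc.function.arguments or ""

response_json = {
    "content": "".join(content_parts),
    "tool_calls": list(tool_calls.values()),
    "model": model,
    "finish_reason": finish_reason,
    "usage": usage,
}
```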

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: move streaming telemetry to caller (option 3)

- Remove TelemetryStreamWrapper class
- Add log_provider_trace_async() helper to LLMClientBase
- stream_async_with_telemetry() now just returns raw stream
- Callers log telemetry after processing with rich interface data

Updated callers:
- summarizer.py: logs content + usage after stream processing
- letta_agent.py: logs tool_call, reasoning, model, usage
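
The agent-side caller follows the same shape as the summarizer change in the diff below; roughly (a sketch: the `interface` accessor names are placeholders, not the real streaming interface API):

```python
# Caller-side sketch (letta_agent.py style); interface accessors are placeholders.
stream = await llm_client.stream_async_with_telemetry(request_data, llm_config)
async for chunk in interface.process(stream):
    ...  # forward/handle chunks as before; the interface accumulates state

# After the stream is fully processed, log a single provider trace with the
# rich data the interface collected.
await llm_client.log_provider_trace_async(
    request_data=request_data,
    response_json={
        "tool_call": getattr(interface, "tool_call", None),
        "reasoning": getattr(interface, "reasoning", None),
        "model": llm_config.model,
        "usage": getattr(interface, "usage", None),
    },
)
```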

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: pass agent_id and run_id to parent adapter class

LettaLLMStreamAdapter was not passing agent_id/run_id to parent,
causing "unexpected keyword argument" errors.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Kian Jones
2026-01-16 22:23:48 -08:00
committed by Sarah Wooders
parent 9418ab9815
commit a92e868ee6
10 changed files with 216 additions and 19 deletions


@@ -426,11 +426,15 @@ async def simple_summary(
     actor: User,
     include_ack: bool = True,
     prompt: str | None = None,
+    telemetry_manager: "TelemetryManager | None" = None,
+    agent_id: str | None = None,
+    run_id: str | None = None,
 ) -> str:
     """Generate a simple summary from a list of messages.
     Intentionally kept functional due to the simplicity of the prompt.
     """
+    from letta.services.telemetry_manager import TelemetryManager
     # Create an LLMClient from the config
     llm_client = LLMClient.create(
@@ -440,6 +444,15 @@ async def simple_summary(
     )
     assert llm_client is not None
+    # Always set telemetry context - create TelemetryManager if not provided
+    tm = telemetry_manager or TelemetryManager()
+    llm_client.set_telemetry_context(
+        telemetry_manager=tm,
+        agent_id=agent_id,
+        run_id=run_id,
+        call_type="summarization",
+    )
     # Prepare the messages payload to send to the LLM
     system_prompt = prompt or gpt_summarize.SYSTEM
     # Build the initial transcript without clamping to preserve fidelity
@@ -494,13 +507,27 @@ async def simple_summary(
     )
     # AnthropicClient.stream_async sets request_data["stream"] = True internally.
-    stream = await llm_client.stream_async(req_data, summarizer_llm_config)
+    stream = await llm_client.stream_async_with_telemetry(req_data, summarizer_llm_config)
     async for _chunk in interface.process(stream):
         # We don't emit anything; we just want the fully-accumulated content.
         pass
     content_parts = interface.get_content()
     text = "".join(part.text for part in content_parts if isinstance(part, TextContent)).strip()
+    # Log telemetry after stream processing
+    await llm_client.log_provider_trace_async(
+        request_data=req_data,
+        response_json={
+            "content": text,
+            "model": summarizer_llm_config.model,
+            "usage": {
+                "input_tokens": getattr(interface, "input_tokens", None),
+                "output_tokens": getattr(interface, "output_tokens", None),
+            },
+        },
+    )
     if not text:
         logger.warning("No content returned from summarizer (streaming path)")
         raise Exception("Summary failed to generate")
@@ -512,7 +539,7 @@ async def simple_summary(
         summarizer_llm_config.model_endpoint_type,
         summarizer_llm_config.model,
     )
-    response_data = await llm_client.request_async(req_data, summarizer_llm_config)
+    response_data = await llm_client.request_async_with_telemetry(req_data, summarizer_llm_config)
     response = await llm_client.convert_response_to_chat_completion(
         response_data,
         req_messages_obj,