feat: centralize telemetry logging at LLM client level (#8815)

* feat: centralize telemetry logging at LLM client level

Moves telemetry logging from individual adapters to LLMClientBase:
- Add TelemetryStreamWrapper for streaming telemetry on stream close
- Add request_async_with_telemetry() for non-streaming requests
- Add stream_async_with_telemetry() for streaming requests
- Add set_telemetry_context() to configure agent_id, run_id, step_id
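
A rough sketch of how these base-class helpers fit together (method names are from this change; the stored attributes and the delegation through `request_async` are illustrative assumptions, not the exact implementation):

```python
# Illustrative sketch only -- attribute names and method bodies are assumptions.
class LLMClientBase:
    def set_telemetry_context(
        self,
        telemetry_manager: "TelemetryManager",
        agent_id: str | None = None,
        run_id: str | None = None,
        step_id: str | None = None,
        call_type: str | None = None,
    ) -> None:
        # Store the context once so every subsequent request can attach it.
        self._telemetry_manager = telemetry_manager
        self._telemetry_context = {
            "agent_id": agent_id,
            "run_id": run_id,
            "step_id": step_id,
            "call_type": call_type,
        }

    async def request_async_with_telemetry(self, request_data: dict, llm_config) -> dict:
        # Run the normal non-streaming request, then record the provider trace.
        response_data = await self.request_async(request_data, llm_config)  # provided by concrete clients
        await self.log_provider_trace_async(request_data=request_data, response_json=response_data)
        return response_data
```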

Updates adapters and agents to use new pattern:
- LettaLLMAdapter now accepts agent_id/run_id in constructor
- Adapters call set_telemetry_context() before LLM requests
- Removes duplicate telemetry logging from adapters
- Enriches traces with agent_id, run_id, call_type metadata
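
On the adapter side, the intended call pattern looks roughly like this (a sketch: the method name `invoke_llm`, the constructor shape, and the `"agent_step"` call type are placeholders for illustration):

```python
# Sketch of the adapter-side pattern; constructor and method names are simplified placeholders.
class LettaLLMAdapter:
    def __init__(self, llm_client, agent_id: str | None = None, run_id: str | None = None):
        self.llm_client = llm_client
        self.agent_id = agent_id
        self.run_id = run_id

    async def invoke_llm(self, request_data: dict, llm_config, telemetry_manager):
        # Configure the telemetry context before issuing the LLM request so the
        # client can enrich the provider trace with agent_id/run_id/call_type.
        self.llm_client.set_telemetry_context(
            telemetry_manager=telemetry_manager,
            agent_id=self.agent_id,
            run_id=self.run_id,
            call_type="agent_step",  # placeholder label; the summarizer path uses "summarization"
        )
        return await self.llm_client.request_async_with_telemetry(request_data, llm_config)
```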

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: accumulate streaming response content for telemetry

TelemetryStreamWrapper now extracts actual response data from chunks:
- Content text (concatenated from deltas)
- Tool calls (id, name, arguments)
- Model name, finish reason, usage stats
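
In rough terms the wrapper accumulated the response while the stream was consumed, along these lines (a sketch assuming OpenAI-style delta chunks; the wrapper itself is removed again in the refactor below):

```python
# Sketch of delta accumulation over OpenAI-style streaming chunks (assumed shape).
content_parts: list[str] = []
tool_calls: dict[int, dict] = {}
model = finish_reason = usage = None

async for chunk in stream:
    model = getattr(chunk, "model", None) or model
    usage = getattr(chunk, "usage", None) or usage
    if not chunk.choices:
        continue  # e.g. a final usage-only chunk
    choice = chunk.choices[0]
    finish_reason = choice.finish_reason or finish_reason
    if choice.delta.content:
        content_parts.append(choice.delta.content)
    for tc in choice.delta.tool_calls or []:
        entry = tool_calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
        entry["id"] = tc.id or entry["id"]
        if tc.function:
            entry["name"] = tc.function.name or entry["name"]
            entry["arguments"] += tc.function.arguments or ""

response_json = {
    "content": "".join(content_parts),
    "tool_calls": list(tool_calls.values()),
    "model": model,
    "finish_reason": finish_reason,
    "usage": usage,
}
```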

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: move streaming telemetry to caller (option 3)

- Remove TelemetryStreamWrapper class
- Add log_provider_trace_async() helper to LLMClientBase
- stream_async_with_telemetry() now just returns raw stream
- Callers log telemetry after processing with rich interface data

Updated callers:
- summarizer.py: logs content + usage after stream processing
- letta_agent.py: logs tool_call, reasoning, model, usage
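
The agent-side caller follows the same shape as the summarizer change in the diff below; roughly (a sketch: the `interface` accessor names are placeholders, not the real streaming interface API):

```python
# Caller-side sketch (letta_agent.py style); interface accessors are placeholders.
stream = await llm_client.stream_async_with_telemetry(request_data, llm_config)
async for chunk in interface.process(stream):
    ...  # forward/handle chunks as before; the interface accumulates state

# After the stream is fully processed, log a single provider trace with the
# rich data the interface collected.
await llm_client.log_provider_trace_async(
    request_data=request_data,
    response_json={
        "tool_call": getattr(interface, "tool_call", None),
        "reasoning": getattr(interface, "reasoning", None),
        "model": llm_config.model,
        "usage": getattr(interface, "usage", None),
    },
)
```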

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: pass agent_id and run_id to parent adapter class

LettaLLMStreamAdapter was not passing agent_id/run_id to parent,
causing "unexpected keyword argument" errors.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Kian Jones
2026-01-16 22:23:48 -08:00
committed by Sarah Wooders
parent 9418ab9815
commit a92e868ee6
10 changed files with 216 additions and 19 deletions


@@ -426,11 +426,15 @@ async def simple_summary(
     actor: User,
     include_ack: bool = True,
     prompt: str | None = None,
+    telemetry_manager: "TelemetryManager | None" = None,
+    agent_id: str | None = None,
+    run_id: str | None = None,
 ) -> str:
     """Generate a simple summary from a list of messages.
     Intentionally kept functional due to the simplicity of the prompt.
     """
+    from letta.services.telemetry_manager import TelemetryManager
     # Create an LLMClient from the config
     llm_client = LLMClient.create(
@@ -440,6 +444,15 @@ async def simple_summary(
     )
     assert llm_client is not None
+    # Always set telemetry context - create TelemetryManager if not provided
+    tm = telemetry_manager or TelemetryManager()
+    llm_client.set_telemetry_context(
+        telemetry_manager=tm,
+        agent_id=agent_id,
+        run_id=run_id,
+        call_type="summarization",
+    )
     # Prepare the messages payload to send to the LLM
     system_prompt = prompt or gpt_summarize.SYSTEM
     # Build the initial transcript without clamping to preserve fidelity
@@ -494,13 +507,27 @@ async def simple_summary(
     )
     # AnthropicClient.stream_async sets request_data["stream"] = True internally.
-    stream = await llm_client.stream_async(req_data, summarizer_llm_config)
+    stream = await llm_client.stream_async_with_telemetry(req_data, summarizer_llm_config)
     async for _chunk in interface.process(stream):
         # We don't emit anything; we just want the fully-accumulated content.
         pass
     content_parts = interface.get_content()
     text = "".join(part.text for part in content_parts if isinstance(part, TextContent)).strip()
+    # Log telemetry after stream processing
+    await llm_client.log_provider_trace_async(
+        request_data=req_data,
+        response_json={
+            "content": text,
+            "model": summarizer_llm_config.model,
+            "usage": {
+                "input_tokens": getattr(interface, "input_tokens", None),
+                "output_tokens": getattr(interface, "output_tokens", None),
+            },
+        },
+    )
     if not text:
         logger.warning("No content returned from summarizer (streaming path)")
         raise Exception("Summary failed to generate")
@@ -512,7 +539,7 @@ async def simple_summary(
         summarizer_llm_config.model_endpoint_type,
         summarizer_llm_config.model,
     )
-    response_data = await llm_client.request_async(req_data, summarizer_llm_config)
+    response_data = await llm_client.request_async_with_telemetry(req_data, summarizer_llm_config)
     response = await llm_client.convert_response_to_chat_completion(
         response_data,
         req_messages_obj,