Files
letta-server/letta/groups/sleeptime_multi_agent_v4.py
cthomas 416ffc7cd7 Add billing context to LLM telemetry traces (#9745)
* feat: add billing context to LLM telemetry traces

Add billing metadata (plan type, cost source, customer ID) to LLM traces in ClickHouse for cost analytics and attribution.

**Data Flow:**
- Cloud-API: Extract billing info from the subscription during rate limiting and set x-billing-* headers
- Core: Parse headers into BillingContext object via dependencies
- Adapters: Flow billing_context through all LLM adapters (blocking & streaming)
- Agent: Pass billing_context to step() and stream() methods
- ClickHouse: Store in billing_plan_type, billing_cost_source, billing_customer_id columns

**Changes:**
- Add BillingContext schema to provider_trace.py
- Add billing columns to llm_traces ClickHouse table DDL
- Update getCustomerSubscription to fetch stripeCustomerId from organization_billing_details
- Propagate billing_context through agent step flow, adapters, and streaming service
- Update ProviderTrace and LLMTrace to include billing metadata
- Regenerate SDK with autogen

**Production Deployment:**
Requires env vars: LETTA_PROVIDER_TRACE_BACKEND=clickhouse, LETTA_STORE_LLM_TRACES=true, CLICKHOUSE_*

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add billing_context parameter to agent step methods

- Add billing_context to BaseAgent and BaseAgentV2 abstract methods
- Update LettaAgent, LettaAgentV2, LettaAgentV3 step methods
- Update multi-agent groups: SleeptimeMultiAgentV2, V3, V4
- Fix test_utils.py to include billing header parameters
- Import BillingContext in all affected files

* fix: add billing_context to stream methods

- Add billing_context parameter to BaseAgentV2.stream()
- Add billing_context parameter to LettaAgentV2.stream()
- LettaAgentV3.stream() already has it from the previous commit

* fix: exclude billing headers from OpenAPI spec

Mark billing headers as internal (include_in_schema=False) so they don't appear in the public API.
These are internal headers between cloud-api and core, not part of the public SDK.

Regenerated SDK with stage-api - removes 10,650 lines of bloat that was causing OOM during Next.js build.
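In FastAPI, declaring a header parameter with `include_in_schema=False` keeps it out of the generated OpenAPI document, which is what hides these internal headers from the SDK. A dependency-free sketch of the equivalent effect, filtering hypothetical `x-billing-*` header parameters out of a spec dict, looks like:

```python
def strip_internal_headers(spec: dict, prefix: str = "x-billing-") -> dict:
    """Remove header parameters whose name starts with `prefix` from an
    OpenAPI spec dict (a simplified stand-in for include_in_schema=False)."""
    for path_item in spec.get("paths", {}).values():
        for operation in path_item.values():
            params = operation.get("parameters")
            if not params:
                continue
            operation["parameters"] = [
                p
                for p in params
                if not (p.get("in") == "header" and p.get("name", "").lower().startswith(prefix))
            ]
    return spec
```

Filtering at the spec level (rather than deleting the parameters from route signatures) preserves the runtime behavior, so cloud-api can keep sending the headers while the public SDK never sees them.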

* refactor: return billing context from handleUnifiedRateLimiting instead of mutating req

Instead of passing req into handleUnifiedRateLimiting and mutating headers inside it:
- Return billing context fields (billingPlanType, billingCostSource, billingCustomerId) from handleUnifiedRateLimiting
- Set headers in handleMessageRateLimiting (middleware layer) after getting the result
- This fixes step-orchestrator compatibility, since step-orchestrator does not have a real Express req object
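The return-instead-of-mutate pattern described above can be sketched as follows (the real code is TypeScript in cloud-api; the function and field names here are illustrative stand-ins, shown in Python for consistency with the rest of this file):

```python
def handle_unified_rate_limiting(subscription: dict) -> dict:
    """Hypothetical stand-in: derive billing fields from a subscription and
    return them, instead of mutating a request object in place."""
    return {
        "billingPlanType": subscription.get("plan", "free"),
        "billingCostSource": "subscription",
        "billingCustomerId": subscription.get("stripeCustomerId"),
    }


def handle_message_rate_limiting(subscription: dict, headers: dict) -> dict:
    """Middleware layer: take the returned fields and set the headers itself,
    so callers without a real request object (e.g. step-orchestrator) can
    still reuse handle_unified_rate_limiting."""
    billing = handle_unified_rate_limiting(subscription)
    if billing["billingPlanType"]:
        headers["x-billing-plan-type"] = billing["billingPlanType"]
    if billing["billingCustomerId"]:
        headers["x-billing-customer-id"] = billing["billingCustomerId"]
    return headers
```

Keeping the rate-limiting function pure (input subscription, output fields) confines the side effect of header mutation to the one layer that actually owns the response.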

* chore: remove extra gencode


---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:13 -08:00

266 lines
11 KiB
Python

from collections.abc import AsyncGenerator
from datetime import datetime, timezone
from letta.agents.letta_agent_v3 import LettaAgentV3
from letta.constants import DEFAULT_MAX_STEPS
from letta.groups.helpers import stringify_message
from letta.otel.tracing import trace_method
from letta.schemas.agent import AgentState
from letta.schemas.enums import RunStatus
from letta.schemas.group import Group, ManagerType
from letta.schemas.letta_message import MessageType
from letta.schemas.letta_message_content import TextContent
from letta.schemas.letta_request import ClientToolSchema
from letta.schemas.letta_response import LettaResponse
from letta.schemas.letta_stop_reason import StopReasonType
from letta.schemas.message import Message, MessageCreate
from letta.schemas.provider_trace import BillingContext
from letta.schemas.run import Run, RunUpdate
from letta.schemas.user import User
from letta.services.group_manager import GroupManager
from letta.utils import safe_create_task
class SleeptimeMultiAgentV4(LettaAgentV3):
def __init__(
self,
agent_state: AgentState,
actor: User,
group: Group,
):
super().__init__(agent_state, actor)
assert group.manager_type == ManagerType.sleeptime, f"Expected group type to be 'sleeptime', got {group.manager_type}"
self.group = group
self.run_ids = []
# Additional manager classes
self.group_manager = GroupManager()
@trace_method
async def step(
self,
input_messages: list[MessageCreate],
max_steps: int = DEFAULT_MAX_STEPS,
run_id: str | None = None,
use_assistant_message: bool = True,
include_return_message_types: list[MessageType] | None = None,
request_start_timestamp_ns: int | None = None,
conversation_id: str | None = None,
client_tools: list[ClientToolSchema] | None = None,
include_compaction_messages: bool = False,
billing_context: "BillingContext | None" = None,
) -> LettaResponse:
self.run_ids = []
        for message in input_messages:
            message.group_id = self.group.id
response = await super().step(
input_messages=input_messages,
max_steps=max_steps,
run_id=run_id,
use_assistant_message=use_assistant_message,
include_return_message_types=include_return_message_types,
request_start_timestamp_ns=request_start_timestamp_ns,
conversation_id=conversation_id,
client_tools=client_tools,
include_compaction_messages=include_compaction_messages,
billing_context=billing_context,
)
run_ids = await self.run_sleeptime_agents()
response.usage.run_ids = run_ids
return response
@trace_method
async def stream(
self,
input_messages: list[MessageCreate],
max_steps: int = DEFAULT_MAX_STEPS,
stream_tokens: bool = True,
run_id: str | None = None,
use_assistant_message: bool = True,
request_start_timestamp_ns: int | None = None,
include_return_message_types: list[MessageType] | None = None,
conversation_id: str | None = None,
client_tools: list[ClientToolSchema] | None = None,
include_compaction_messages: bool = False,
) -> AsyncGenerator[str, None]:
self.run_ids = []
        for message in input_messages:
            message.group_id = self.group.id
# Perform foreground agent step
try:
async for chunk in super().stream(
input_messages=input_messages,
max_steps=max_steps,
stream_tokens=stream_tokens,
run_id=run_id,
use_assistant_message=use_assistant_message,
include_return_message_types=include_return_message_types,
request_start_timestamp_ns=request_start_timestamp_ns,
conversation_id=conversation_id,
client_tools=client_tools,
include_compaction_messages=include_compaction_messages,
):
yield chunk
finally:
            # For some reason, stream raises GeneratorExit even though the client
            # appears to receive the whole stream. Running the sleeptime agents in
            # a finally block ensures they run despite this.
await self.run_sleeptime_agents()
@trace_method
async def run_sleeptime_agents(self) -> list[str]:
# Get response messages
last_response_messages = self.response_messages
# Update turns counter
turns_counter = None
if self.group.sleeptime_agent_frequency is not None and self.group.sleeptime_agent_frequency > 0:
turns_counter = await self.group_manager.bump_turns_counter_async(group_id=self.group.id, actor=self.actor)
# Perform participant steps
if self.group.sleeptime_agent_frequency is None or (
turns_counter is not None and turns_counter % self.group.sleeptime_agent_frequency == 0
):
# Skip sleeptime processing if no response messages were generated
if not last_response_messages:
self.logger.warning("No response messages generated, skipping sleeptime agent processing")
return self.run_ids
last_processed_message_id = await self.group_manager.get_last_processed_message_id_and_update_async(
group_id=self.group.id, last_processed_message_id=last_response_messages[-1].id, actor=self.actor
)
for sleeptime_agent_id in self.group.agent_ids:
try:
sleeptime_run_id = await self._issue_background_task(
sleeptime_agent_id,
last_response_messages,
last_processed_message_id,
)
self.run_ids.append(sleeptime_run_id)
except Exception as e:
                    # Log individual task failures before re-raising
                    self.logger.warning(f"Sleeptime agent processing failed: {e!s}")
                    raise
return self.run_ids
@trace_method
async def _issue_background_task(
self,
sleeptime_agent_id: str,
response_messages: list[Message],
last_processed_message_id: str,
) -> str:
run = Run(
agent_id=sleeptime_agent_id,
status=RunStatus.created,
metadata={
"run_type": "sleeptime_agent_send_message_async", # is this right?
"agent_id": sleeptime_agent_id,
},
)
run = await self.run_manager.create_run(pydantic_run=run, actor=self.actor)
safe_create_task(
self._participant_agent_step(
foreground_agent_id=self.agent_state.id,
sleeptime_agent_id=sleeptime_agent_id,
response_messages=response_messages,
last_processed_message_id=last_processed_message_id,
run_id=run.id,
),
label=f"participant_agent_step_{sleeptime_agent_id}",
)
return run.id
@trace_method
async def _participant_agent_step(
self,
foreground_agent_id: str,
sleeptime_agent_id: str,
response_messages: list[Message],
last_processed_message_id: str,
run_id: str,
) -> LettaResponse:
try:
# Update run status
run_update = RunUpdate(status=RunStatus.running)
await self.run_manager.update_run_by_id_async(run_id=run_id, update=run_update, actor=self.actor)
# Create conversation transcript
prior_messages = []
if self.group.sleeptime_agent_frequency:
try:
prior_messages = await self.message_manager.list_messages(
agent_id=foreground_agent_id,
actor=self.actor,
after=last_processed_message_id,
before=response_messages[0].id,
)
except Exception:
pass # continue with just latest messages
message_strings = [stringify_message(message) for message in prior_messages + response_messages]
message_strings = [s for s in message_strings if s is not None]
messages_text = "\n".join(message_strings)
message_text = (
"<system-reminder>\n"
"You are a sleeptime agent - a background agent that asynchronously processes conversations after they occur.\n\n"
"IMPORTANT: You are NOT the primary agent. You are reviewing a conversation that already happened between a primary agent and its user:\n"
'- Messages labeled "assistant" are from the primary agent (not you)\n'
'- Messages labeled "user" are from the primary agent\'s user\n\n'
"Your primary role is memory management. Review the conversation and use your memory tools to update any relevant memory blocks with information worth preserving. "
"Check your memory_persona block for any additional instructions or policies.\n"
"</system-reminder>\n\n"
f"Messages:\n{messages_text}"
)
sleeptime_agent_messages = [
MessageCreate(
role="user",
content=[TextContent(text=message_text)],
id=Message.generate_id(),
agent_id=sleeptime_agent_id,
group_id=self.group.id,
)
]
# Load sleeptime agent
sleeptime_agent_state = await self.agent_manager.get_agent_by_id_async(agent_id=sleeptime_agent_id, actor=self.actor)
sleeptime_agent = LettaAgentV3(
agent_state=sleeptime_agent_state,
actor=self.actor,
)
# Perform sleeptime agent step
result = await sleeptime_agent.step(
input_messages=sleeptime_agent_messages,
run_id=run_id,
)
# Update run status
run_update = RunUpdate(
status=RunStatus.completed,
completed_at=datetime.now(timezone.utc).replace(tzinfo=None),
stop_reason=result.stop_reason.stop_reason if result.stop_reason else StopReasonType.end_turn,
metadata={
"result": result.model_dump(mode="json"),
"agent_id": sleeptime_agent_state.id,
},
)
await self.run_manager.update_run_by_id_async(run_id=run_id, update=run_update, actor=self.actor)
return result
except Exception as e:
run_update = RunUpdate(
status=RunStatus.failed,
completed_at=datetime.now(timezone.utc).replace(tzinfo=None),
stop_reason=StopReasonType.error,
metadata={"error": str(e)},
)
await self.run_manager.update_run_by_id_async(run_id=run_id, update=run_update, actor=self.actor)
raise