feat: log LLM traces to clickhouse (#9111)

* feat: add non-streaming option for conversation messages

- Add ConversationMessageRequest with stream=True default (backwards compatible)
- stream=true (default): SSE streaming via StreamingService
- stream=false: JSON response via AgentLoop.load().step()

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API schema for ConversationMessageRequest

* feat: add direct ClickHouse storage for raw LLM traces

Adds ability to store raw LLM request/response payloads directly in ClickHouse, bypassing OTEL span attribute size limits. This enables debugging and analytics on large LLM payloads (>10MB system prompts, large tool schemas, etc.).

New files:
- letta/schemas/llm_raw_trace.py: Pydantic schema with ClickHouse row helper
- letta/services/llm_raw_trace_writer.py: Async batching writer (fire-and-forget)
- letta/services/llm_raw_trace_reader.py: Reader with query methods
- scripts/sql/clickhouse/llm_raw_traces.ddl: Production table DDL
- scripts/sql/clickhouse/llm_raw_traces_local.ddl: Local dev DDL
- apps/core/clickhouse-init.sql: Local dev initialization

Modified:
- letta/settings.py: Added 4 settings (store_llm_raw_traces, ttl, batch_size, flush_interval)
- letta/llm_api/llm_client_base.py: Integration into request_async_with_telemetry
- compose.yaml: Added ClickHouse service for local dev
- justfile: Added clickhouse, clickhouse-cli, clickhouse-traces commands

Feature disabled by default (LETTA_STORE_LLM_RAW_TRACES=false). Uses ZSTD(3) compression for 10-30x reduction on JSON payloads.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: address code review feedback for LLM raw traces

Fixes based on code review feedback:
1. Fix ClickHouse endpoint parsing - default to secure=False for raw host:port inputs (was defaulting to HTTPS which breaks local dev)
2. Make raw trace writes truly fire-and-forget - use asyncio.create_task() instead of awaiting, so JSON serialization doesn't block request path
3. Add bounded queue (maxsize=10000) - prevents unbounded memory growth under load. Drops traces with warning if queue is full.
4. Fix deprecated asyncio usage - get_running_loop() instead of get_event_loop()
5. Add org_id fallback - use _telemetry_org_id if actor doesn't have it
6. Remove unused imports - json import in reader

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing asyncio import and simplify JSON serialization

- Add missing 'import asyncio' that was causing 'name asyncio is not defined' error
- Remove unnecessary clean_double_escapes() function - the JSON is stored correctly, the clickhouse-client CLI was just adding extra escaping when displaying
- Update just clickhouse-trace to use Python client for correct JSON output

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* test: add clickhouse raw trace integration test

* test: simplify clickhouse trace assertions

* refactor: centralize usage parsing and stream error traces

Use per-client usage helpers for raw trace extraction and ensure streaming errors log requests with error metadata.

👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* test: exercise provider usage parsing live

Make live OpenAI/Anthropic/Gemini requests with credential gating and validate Anthropic cache usage mapping when present.

👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* test: fix usage parsing tests to pass

- Use GoogleAIClient with GEMINI_API_KEY instead of GoogleVertexClient
- Update model to gemini-2.0-flash (1.5-flash deprecated in v1beta)
- Add tools=[] for Gemini/Anthropic build_request_data

👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: extract_usage_statistics returns LettaUsageStatistics

Standardize on LettaUsageStatistics as the canonical usage format returned by client helpers. Inline UsageStatistics construction for ChatCompletionResponse where needed.

👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* feat: add is_byok and llm_config_json columns to ClickHouse traces

Extend llm_raw_traces table with:
- is_byok (UInt8): Track BYOK vs base provider usage for billing analytics
- llm_config_json (String, ZSTD): Store full LLM config for debugging and analysis

This enables queries like:
- BYOK usage breakdown by provider/model
- Config parameter analysis (temperature, max_tokens, etc.)
- Debugging specific request configurations

* feat: add tests for error traces, llm_config_json, and cache tokens

- Update llm_raw_trace_reader.py to query new columns (is_byok, cached_input_tokens, cache_write_tokens, reasoning_tokens, llm_config_json)
- Add test_error_trace_stored_in_clickhouse to verify error fields
- Add test_cache_tokens_stored_for_anthropic to verify cache token storage
- Update existing tests to verify llm_config_json is stored correctly
- Make llm_config required in log_provider_trace_async()
- Simplify provider extraction to use provider_name directly

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* ci: add ClickHouse integration tests to CI pipeline

- Add use-clickhouse option to reusable-test-workflow.yml
- Add ClickHouse service container with otel database
- Add schema initialization step using clickhouse-init.sql
- Add ClickHouse env vars (CLICKHOUSE_ENDPOINT, etc.)
- Add separate clickhouse-integration-tests job running integration_test_clickhouse_llm_raw_traces.py

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify provider and org_id extraction in raw trace writer

- Use model_endpoint_type.value for provider (not provider_name)
- Simplify org_id to just self.actor.organization_id (actor is always pydantic)

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLMRawTraceWriter with _enabled flag

- Check ClickHouse env vars once at init, set _enabled flag
- Early return in write_async/flush_async if not enabled
- Remove ValueError raises (never used)
- Simplify _get_client (no validation needed since already checked)

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: add LLMRawTraceWriter shutdown to FastAPI lifespan

Properly flush pending traces on graceful shutdown via lifespan instead of relying only on atexit handler.

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* feat: add agent_tags column to ClickHouse traces

Store agent tags as Array(String) for filtering/analytics by tag.

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* cleanup

* fix(ci): fix ClickHouse schema initialization in CI

- Create database separately before loading SQL file
- Remove CREATE DATABASE from SQL file (handled in CI step)
- Add verification step to confirm table was created
- Use -sf flag for curl to fail on HTTP errors

🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: simplify LLM trace writer with ClickHouse async_insert

- Use ClickHouse async_insert for server-side batching instead of manual queue/flush loop
- Sync cloud DDL schema with clickhouse-init.sql (add missing columns)
- Remove redundant llm_raw_traces_local.ddl
- Remove unused batch_size/flush_interval settings
- Update tests for simplified writer

Key changes:
- async_insert=1, wait_for_async_insert=1 for reliable server-side batching
- Simple per-trace retry with exponential backoff (max 3 retries)
- ~150 lines removed from writer

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: consolidate ClickHouse direct writes into TelemetryManager backend

- Add clickhouse_direct backend to provider_trace_backends
- Remove duplicate ClickHouse write logic from llm_client_base.py
- Configure via LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse_direct

The clickhouse_direct backend:
- Converts ProviderTrace to LLMRawTrace
- Extracts usage stats from response JSON
- Writes via LLMRawTraceWriter with async_insert

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: address PR review comments and fix llm_config bug

Review comment fixes:
- Rename clickhouse_direct -> clickhouse_analytics (clearer purpose)
- Remove ClickHouse from OSS compose.yaml, create separate compose.clickhouse.yaml
- Delete redundant scripts/test_llm_raw_traces.py (use pytest tests)
- Remove unused llm_raw_traces_ttl_days setting (TTL handled in DDL)
- Fix socket description leak in telemetry_manager docstring
- Add cloud-only comment to clickhouse-init.sql
- Update justfile to use separate compose file

Bug fix:
- Fix llm_config not being passed to ProviderTrace in telemetry
- Now correctly populates provider, model, is_byok for all LLM calls
- Affects both request_async_with_telemetry and log_provider_trace_async

DDL optimizations:
- Add secondary indexes (bloom_filter for agent_id, model, step_id)
- Add minmax indexes for is_byok, is_error
- Change model and error_type to LowCardinality for faster GROUP BY

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: rename llm_raw_traces -> llm_traces

Address review feedback that "raw" is misleading since we denormalize fields.

Renames:
- Table: llm_raw_traces -> llm_traces
- Schema: LLMRawTrace -> LLMTrace
- Files: llm_raw_trace_{reader,writer}.py -> llm_trace_{reader,writer}.py
- Setting: store_llm_raw_traces -> store_llm_traces

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: update workflow references to llm_traces

Missed renaming table name in CI workflow files.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: update clickhouse_direct -> clickhouse_analytics in docstring

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* chore: remove inaccurate OTEL size limit comments

The 4MB limit is our own truncation logic, not an OTEL protocol limit. The real benefit is denormalized columns for analytics queries.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* chore: remove local ClickHouse dev setup (cloud-only feature)

- Delete clickhouse-init.sql and compose.clickhouse.yaml
- Remove local clickhouse just commands
- Update CI to use cloud DDL with MergeTree for testing

clickhouse_analytics is a cloud-only feature. For local dev, use postgres backend.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: restore compose.yaml to match main

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* refactor: merge clickhouse_analytics into clickhouse backend

Per review feedback - having two separate backends was confusing.

Now the clickhouse backend:
- Writes to llm_traces table (denormalized for cost analytics)
- Reads from OTEL traces table (will cut over to llm_traces later)

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND=postgres,clickhouse

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: correct path to DDL file in CI workflow

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* chore: add provider index to DDL for faster filtering

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: configure telemetry backend in clickhouse tests

Tests need to set telemetry_settings.provider_trace_backends to include 'clickhouse', otherwise traces are routed to default postgres backend.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: set provider_trace_backend field, not property

provider_trace_backends is a computed property, need to set the underlying provider_trace_backend string field instead.

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: error trace test and error_type extraction

- Add TelemetryManager to error trace test so traces get written
- Fix error_type extraction to check top-level before nested error dict

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: use provider_trace.id for trace correlation across backends

- Pass provider_trace.id to LLMTrace instead of auto-generating
- Log warning if ID is missing (shouldn't happen, helps debug)
- Fallback to new UUID only if not set

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

* fix: trace ID correlation and concurrency issues

- Strip "provider_trace-" prefix from ID for UUID storage in ClickHouse
- Add asyncio.Lock to serialize writes (clickhouse_connect not thread-safe)
- Fix Anthropic prompt_tokens to include cached tokens for cost analytics
- Log warning if provider_trace.id is missing

🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Caren Thomas <carenthomas@gmail.com>
2026-02-02 17:25:45 -08:00
parent 24ea7dbaed
commit 4096b30cd7
13 changed files with 1419 additions and 13 deletions
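
The final commits in the message describe three writer-side behaviours: stripping the "provider_trace-" prefix before storing the ID, serializing writes with an asyncio.Lock, and retrying each trace with exponential backoff. The sketch below only illustrates how those pieces could fit together; it is not the code from this PR. `to_clickhouse_uuid`, `SerializedWriter`, `insert_fn`, and `base_delay` are hypothetical names, and "max 3 retries" is read here as three attempts in total, which is an assumption.

```python
import asyncio
import logging
import uuid
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)

ID_PREFIX = "provider_trace-"  # prefix named in the commit message


def to_clickhouse_uuid(provider_trace_id: Optional[str]) -> str:
    """Return the ID without its prefix; generate a UUID only as a fallback."""
    if not provider_trace_id:
        logger.warning("provider_trace.id is missing; generating a new UUID")
        return str(uuid.uuid4())
    if provider_trace_id.startswith(ID_PREFIX):
        return provider_trace_id[len(ID_PREFIX):]
    return provider_trace_id


class SerializedWriter:
    """Hypothetical writer: one insert at a time, retried with backoff."""

    def __init__(
        self,
        insert_fn: Callable[[dict], Awaitable[None]],  # stand-in for the real insert
        max_attempts: int = 3,
        base_delay: float = 0.1,
    ) -> None:
        self._insert_fn = insert_fn
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._lock = asyncio.Lock()

    async def write(self, row: dict) -> bool:
        """Insert one row; return False if every attempt failed."""
        for attempt in range(self._max_attempts):
            try:
                # Hold the lock only for the insert so concurrent callers
                # never share the underlying client at the same time.
                async with self._lock:
                    await self._insert_fn(row)
                return True
            except Exception as exc:
                logger.warning("insert attempt %d failed: %s", attempt + 1, exc)
                if attempt < self._max_attempts - 1:
                    # Delays grow as base, 2*base, 4*base, ...
                    await asyncio.sleep(self._base_delay * 2**attempt)
        return False
```

The backoff sleep happens outside the lock, so a failing write does not block other pending writes while it waits to retry.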
--- a/tests/integration_test_clickhouse_llm_traces.py
+++ b/tests/integration_test_clickhouse_llm_traces.py
@@ -0,0 +1,350 @@
+"""
+Integration tests for ClickHouse-backed LLM traces.
+
+Validates that:
+1) Agent message requests are stored in ClickHouse (request_json contains the message)
+2) Summarization traces are stored and retrievable by step_id
+3) Error traces are stored with is_error, error_type, and error_message
+4) llm_config_json is properly stored
+5) Cache and usage statistics are stored (cached_input_tokens, cache_write_tokens, reasoning_tokens)
+"""
+
+import asyncio
+import json
+import os
+import time
+import uuid
+
+import pytest
+
+from letta.agents.letta_agent_v3 import LettaAgentV3
+from letta.config import LettaConfig
+from letta.schemas.agent import CreateAgent
+from letta.schemas.embedding_config import EmbeddingConfig
+from letta.schemas.enums import MessageRole
+from letta.schemas.letta_message_content import TextContent
+from letta.schemas.llm_config import LLMConfig
+from letta.schemas.message import Message, MessageCreate
+from letta.schemas.run import Run
+from letta.server.server import SyncServer
+from letta.services.llm_trace_reader import get_llm_trace_reader
+from letta.services.provider_trace_backends import get_provider_trace_backends
+from letta.services.summarizer.summarizer import simple_summary
+from letta.settings import settings, telemetry_settings
+
+
+def _require_clickhouse_env() -> dict[str, str]:
+    endpoint = os.getenv("CLICKHOUSE_ENDPOINT")
+    password = os.getenv("CLICKHOUSE_PASSWORD")
+    if not endpoint or not password:
+        pytest.skip("ClickHouse env vars not set (CLICKHOUSE_ENDPOINT, CLICKHOUSE_PASSWORD)")
+    return {
+        "endpoint": endpoint,
+        "password": password,
+        "username": os.getenv("CLICKHOUSE_USERNAME", "default"),
+        "database": os.getenv("CLICKHOUSE_DATABASE", "otel"),
+    }
+
+
+def _anthropic_llm_config() -> LLMConfig:
+    return LLMConfig(
+        model="claude-3-5-haiku-20241022",
+        model_endpoint_type="anthropic",
+        model_endpoint="https://api.anthropic.com/v1",
+        context_window=200000,
+        max_tokens=2048,
+        put_inner_thoughts_in_kwargs=False,
+        enable_reasoner=False,
+    )
+
+
+@pytest.fixture
+async def server():
+    config = LettaConfig.load()
+    config.save()
+    server = SyncServer(init_with_default_org_and_user=True)
+    await server.init_async()
+    await server.tool_manager.upsert_base_tools_async(actor=server.default_user)
+    yield server
+
+
+@pytest.fixture
+async def actor(server: SyncServer):
+    return server.default_user
+
+
+@pytest.fixture
+def clickhouse_settings():
+    env = _require_clickhouse_env()
+
+    original_values = {
+        "endpoint": settings.clickhouse_endpoint,
+        "username": settings.clickhouse_username,
+        "password": settings.clickhouse_password,
+        "database": settings.clickhouse_database,
+        "store_llm_traces": settings.store_llm_traces,
+        "provider_trace_backend": telemetry_settings.provider_trace_backend,
+    }
+
+    settings.clickhouse_endpoint = env["endpoint"]
+    settings.clickhouse_username = env["username"]
+    settings.clickhouse_password = env["password"]
+    settings.clickhouse_database = env["database"]
+    settings.store_llm_traces = True
+
+    # Configure telemetry to use clickhouse backend (set the underlying field, not the property)
+    telemetry_settings.provider_trace_backend = "clickhouse"
+    # Clear the cached backends so they get recreated with new settings
+    get_provider_trace_backends.cache_clear()
+
+    yield
+
+    settings.clickhouse_endpoint = original_values["endpoint"]
+    settings.clickhouse_username = original_values["username"]
+    settings.clickhouse_password = original_values["password"]
+    settings.clickhouse_database = original_values["database"]
+    settings.store_llm_traces = original_values["store_llm_traces"]
+    telemetry_settings.provider_trace_backend = original_values["provider_trace_backend"]
+    # Clear cache again to restore original backends
+    get_provider_trace_backends.cache_clear()
+
+
+async def _wait_for_raw_trace(step_id: str, organization_id: str, timeout_seconds: int = 30):
+    """Wait for a trace to appear in ClickHouse.
+
+    With async_insert + wait_for_async_insert=1, traces should appear quickly,
+    but we poll to handle any propagation delay.
+    """
+    reader = get_llm_trace_reader()
+    deadline = time.monotonic() + timeout_seconds
+
+    while time.monotonic() < deadline:
+        trace = await reader.get_by_step_id_async(step_id=step_id, organization_id=organization_id)
+        if trace is not None:
+            return trace
+        await asyncio.sleep(0.5)
+
+    raise AssertionError(f"Timed out waiting for raw trace with step_id={step_id}")
+
+
+@pytest.mark.asyncio
+async def test_agent_message_stored_in_clickhouse(server: SyncServer, actor, clickhouse_settings):
+    """Test that agent step traces are stored with all fields including llm_config_json."""
+    message_text = f"ClickHouse trace test {uuid.uuid4()}"
+    llm_config = _anthropic_llm_config()
+
+    agent_state = await server.agent_manager.create_agent_async(
+        CreateAgent(
+            name=f"clickhouse_agent_{uuid.uuid4().hex[:8]}",
+            llm_config=llm_config,
+            embedding_config=EmbeddingConfig.default_config(model_name="letta"),
+        ),
+        actor=actor,
+    )
+
+    agent = LettaAgentV3(agent_state=agent_state, actor=actor)
+    run = await server.run_manager.create_run(
+        Run(agent_id=agent_state.id),
+        actor=actor,
+    )
+    run_id = run.id
+    response = await agent.step(
+        [MessageCreate(role=MessageRole.user, content=[TextContent(text=message_text)])],
+        run_id=run_id,
+    )
+
+    step_id = next(msg.step_id for msg in reversed(response.messages) if msg.step_id)
+    trace = await _wait_for_raw_trace(step_id=step_id, organization_id=actor.organization_id)
+
+    # Basic trace fields
+    assert trace.step_id == step_id
+    assert message_text in trace.request_json
+    assert trace.is_error is False
+    assert trace.error_type is None
+    assert trace.error_message is None
+
+    # Verify llm_config_json is stored and contains expected fields
+    assert trace.llm_config_json, "llm_config_json should not be empty"
+    config_data = json.loads(trace.llm_config_json)
+    assert config_data.get("model") == llm_config.model
+    assert "context_window" in config_data
+    assert "max_tokens" in config_data
+
+    # Token usage should be populated
+    assert trace.prompt_tokens > 0
+    assert trace.completion_tokens >= 0
+    assert trace.total_tokens > 0
+
+
+@pytest.mark.asyncio
+async def test_summary_stored_with_content_and_usage(server: SyncServer, actor, clickhouse_settings):
+    """Test that summarization traces are stored with content, usage, and cache info."""
+    step_id = f"step-{uuid.uuid4()}"
+    llm_config = _anthropic_llm_config()
+    summary_source_messages = [
+        Message(role=MessageRole.system, content=[TextContent(text="System prompt")]),
+        Message(role=MessageRole.user, content=[TextContent(text="User message 1")]),
+        Message(role=MessageRole.assistant, content=[TextContent(text="Assistant response 1")]),
+        Message(role=MessageRole.user, content=[TextContent(text="User message 2")]),
+    ]
+
+    summary_text = await simple_summary(
+        messages=summary_source_messages,
+        llm_config=llm_config,
+        actor=actor,
+        agent_id=f"agent-{uuid.uuid4()}",
+        agent_tags=["test", "clickhouse"],
+        run_id=f"run-{uuid.uuid4()}",
+        step_id=step_id,
+        compaction_settings={"mode": "partial_evict", "message_buffer_limit": 60},
+    )
+
+    trace = await _wait_for_raw_trace(step_id=step_id, organization_id=actor.organization_id)
+
+    # Basic assertions
+    assert trace.step_id == step_id
+    assert trace.call_type == "summarization"
+    assert trace.is_error is False
+
+    # Verify llm_config_json is stored
+    assert trace.llm_config_json, "llm_config_json should not be empty"
+    config_data = json.loads(trace.llm_config_json)
+    assert config_data.get("model") == llm_config.model
+
+    # Verify summary content in response
+    summary_in_response = False
+    try:
+        response_payload = json.loads(trace.response_json)
+        if isinstance(response_payload, dict):
+            if "choices" in response_payload:
+                content = response_payload.get("choices", [{}])[0].get("message", {}).get("content", "")
+                summary_in_response = summary_text.strip() in (content or "")
+            elif "content" in response_payload:
+                summary_in_response = summary_text.strip() in (response_payload.get("content") or "")
+    except Exception:
+        summary_in_response = False
+
+    assert summary_in_response or summary_text in trace.response_json
+
+    # Token usage should be populated
+    assert trace.prompt_tokens > 0
+    assert trace.total_tokens > 0
+
+    # Cache fields may or may not be populated depending on provider response
+    # Just verify they're accessible (not erroring)
+    _ = trace.cached_input_tokens
+    _ = trace.cache_write_tokens
+    _ = trace.reasoning_tokens
+
+
+@pytest.mark.asyncio
+async def test_error_trace_stored_in_clickhouse(server: SyncServer, actor, clickhouse_settings):
+    """Test that error traces are stored with is_error=True and error details."""
+    from letta.llm_api.anthropic_client import AnthropicClient
+
+    step_id = f"step-error-{uuid.uuid4()}"
+
+    # Create a client with invalid config to trigger an error
+    invalid_llm_config = LLMConfig(
+        model="invalid-model-that-does-not-exist",
+        model_endpoint_type="anthropic",
+        model_endpoint="https://api.anthropic.com/v1",
+        context_window=200000,
+        max_tokens=2048,
+    )
+
+    from letta.services.telemetry_manager import TelemetryManager
+
+    client = AnthropicClient()
+    client.set_telemetry_context(
+        telemetry_manager=TelemetryManager(),
+        agent_id=f"agent-{uuid.uuid4()}",
+        run_id=f"run-{uuid.uuid4()}",
+        step_id=step_id,
+        call_type="agent_step",
+        org_id=actor.organization_id,
+    )
+    client.actor = actor
+
+    # Make a request that will fail
+    request_data = {
+        "model": invalid_llm_config.model,
+        "messages": [{"role": "user", "content": "test"}],
+        "max_tokens": 100,
+    }
+
+    try:
+        await client.request_async_with_telemetry(request_data, invalid_llm_config)
+    except Exception:
+        pass  # Expected to fail
+
+    # Wait for the error trace to be written
+    trace = await _wait_for_raw_trace(step_id=step_id, organization_id=actor.organization_id)
+
+    # Verify error fields
+    assert trace.step_id == step_id
+    assert trace.is_error is True
+    assert trace.error_type is not None, "error_type should be set for error traces"
+    assert trace.error_message is not None, "error_message should be set for error traces"
+
+    # Verify llm_config_json is still stored even for errors
+    assert trace.llm_config_json, "llm_config_json should be stored even for error traces"
+    config_data = json.loads(trace.llm_config_json)
+    assert config_data.get("model") == invalid_llm_config.model
+
+
+@pytest.mark.asyncio
+async def test_cache_tokens_stored_for_anthropic(server: SyncServer, actor, clickhouse_settings):
+    """Test that Anthropic cache tokens (cached_input_tokens, cache_write_tokens) are stored.
+
+    Note: This test verifies the fields are properly stored when present in the response.
+    Actual cache token values depend on Anthropic's prompt caching behavior.
+    """
+    message_text = f"Cache test {uuid.uuid4()}"
+    llm_config = _anthropic_llm_config()
+
+    agent_state = await server.agent_manager.create_agent_async(
+        CreateAgent(
+            name=f"cache_test_agent_{uuid.uuid4().hex[:8]}",
+            llm_config=llm_config,
+            embedding_config=EmbeddingConfig.default_config(model_name="letta"),
+        ),
+        actor=actor,
+    )
+
+    agent = LettaAgentV3(agent_state=agent_state, actor=actor)
+    run = await server.run_manager.create_run(
+        Run(agent_id=agent_state.id),
+        actor=actor,
+    )
+
+    # Make two requests - second may benefit from caching
+    response1 = await agent.step(
+        [MessageCreate(role=MessageRole.user, content=[TextContent(text=message_text)])],
+        run_id=run.id,
+    )
+    step_id_1 = next(msg.step_id for msg in reversed(response1.messages) if msg.step_id)
+
+    response2 = await agent.step(
+        [MessageCreate(role=MessageRole.user, content=[TextContent(text="Follow up question")])],
+        run_id=run.id,
+    )
+    step_id_2 = next(msg.step_id for msg in reversed(response2.messages) if msg.step_id)
+
+    # Check traces for both requests
+    trace1 = await _wait_for_raw_trace(step_id=step_id_1, organization_id=actor.organization_id)
+    trace2 = await _wait_for_raw_trace(step_id=step_id_2, organization_id=actor.organization_id)
+
+    # Verify cache fields are accessible (may be None if no caching occurred)
+    # The important thing is they're stored correctly when present
+    for trace in [trace1, trace2]:
+        assert trace.prompt_tokens > 0
+        # Cache fields should be stored (may be None or int)
+        assert trace.cached_input_tokens is None or isinstance(trace.cached_input_tokens, int)
+        assert trace.cache_write_tokens is None or isinstance(trace.cache_write_tokens, int)
+        assert trace.reasoning_tokens is None or isinstance(trace.reasoning_tokens, int)
+
+        # Verify llm_config_json
+        assert trace.llm_config_json
+        config_data = json.loads(trace.llm_config_json)
+        assert config_data.get("model") == llm_config.model
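
The summarization test above checks the stored `response_json` for either a `choices` key or a `content` key. A small helper can make that shape handling explicit. The sketch below assumes the stored payload is the provider's raw response, in either the OpenAI chat-completions shape (`choices[0].message.content`) or the Anthropic Messages shape (`content` as a list of typed blocks). The helper name `extract_response_text` is hypothetical and is not part of this PR.

```python
import json
from typing import Any


def extract_response_text(response_json: str) -> str:
    """Best-effort text extraction; returns "" when the shape is unknown."""
    try:
        payload: Any = json.loads(response_json)
    except (TypeError, ValueError):
        return ""
    if not isinstance(payload, dict):
        return ""

    # OpenAI-style payload: choices[0].message.content is a string.
    choices = payload.get("choices")
    if isinstance(choices, list) and choices and isinstance(choices[0], dict):
        message = choices[0].get("message")
        content = message.get("content") if isinstance(message, dict) else None
        return content if isinstance(content, str) else ""

    # Anthropic-style payload: content is a list of blocks; keep text blocks.
    content = payload.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            block["text"]
            for block in content
            if isinstance(block, dict)
            and block.get("type") == "text"
            and isinstance(block.get("text"), str)
        )
    return ""
```

With a helper like this, the test could assert `summary_text.strip() in extract_response_text(trace.response_json)` and keep the substring check on the raw JSON as a fallback.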