
# Anthropic Usage Statistics

## Response Format

```python
response.usage.input_tokens                  # NON-cached input tokens only
response.usage.output_tokens                 # Output tokens
response.usage.cache_read_input_tokens       # Tokens read from cache
response.usage.cache_creation_input_tokens   # Tokens written to cache
```

## Critical: Token Calculation

Anthropic's `input_tokens` is NOT the total. To get total input tokens:

```python
total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
```

This is different from OpenAI/Gemini, where `prompt_tokens` is already the total.
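
For concreteness, a hypothetical helper implementing the formula above (`usage` is a `response.usage` block from a non-streaming SDK response; the cache fields can be `None` when caching is not in play):

```python
def total_input_tokens(usage) -> int:
    """Sum all three Anthropic input buckets into a true total."""
    # input_tokens excludes cached tokens, so add both cache buckets back in.
    return (
        usage.input_tokens
        + (usage.cache_read_input_tokens or 0)
        + (usage.cache_creation_input_tokens or 0)
    )
```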

## Prefix Caching (Prompt Caching)

Requirements:

- Minimum 1,024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
- Minimum 2,048 tokens for Claude 3.5 Haiku and Claude 3 Haiku
- Requires explicit `cache_control` breakpoints in messages
- TTL: 5 minutes (refreshed on each cache hit)

How to enable: add `cache_control` to message content:

```json
{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "...",
            "cache_control": {"type": "ephemeral"}
        }
    ]
}
```

Beta header required:

```python
betas = ["prompt-caching-2024-07-31"]
```

## Cache Behavior

- `cache_creation_input_tokens`: Tokens that were cached on this request (cache write)
- `cache_read_input_tokens`: Tokens that were read from existing cache (cache hit)
- On the first request: expect `cache_creation_input_tokens` > 0
- On subsequent requests with the same prefix: expect `cache_read_input_tokens` > 0
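
A small illustrative check of those expectations (hypothetical helper, not part of the Letta codebase):

```python
def cache_status(usage) -> str:
    """Classify a response's cache activity from its usage block."""
    if (usage.cache_read_input_tokens or 0) > 0:
        return "hit"    # prefix was served from the cache
    if (usage.cache_creation_input_tokens or 0) > 0:
        return "write"  # prefix was written to the cache on this request
    return "none"       # prompt too short, or no cache_control breakpoint

# First request with a cacheable prefix -> "write";
# an identical prefix re-sent within the TTL -> "hit".
```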

## Streaming

In streaming mode, usage is reported across two event types:

1. `message_start`: initial usage (may include cache info)

    ```python
    event.message.usage.input_tokens
    event.message.usage.output_tokens
    event.message.usage.cache_read_input_tokens
    event.message.usage.cache_creation_input_tokens
    ```

2. `message_delta`: cumulative output tokens

    ```python
    event.usage.output_tokens  # This is CUMULATIVE, not incremental
    ```

Important: per Anthropic's docs, `message_delta` token counts are cumulative, so assign them (don't accumulate); see the sketch below.
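
A minimal sketch of that tracking, assuming the `anthropic` Python SDK's raw event stream via `stream=True` (model and prompt are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

input_tokens = cache_read = cache_creation = output_tokens = 0
for event in stream:
    if event.type == "message_start":
        # Input-side usage arrives once, up front.
        u = event.message.usage
        input_tokens = u.input_tokens
        cache_read = u.cache_read_input_tokens or 0
        cache_creation = u.cache_creation_input_tokens or 0
    elif event.type == "message_delta":
        output_tokens = event.usage.output_tokens  # cumulative: assign, don't add

total_input = input_tokens + cache_read + cache_creation
print(f"input={total_input} output={output_tokens}")
```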

## Letta Implementation

- Client: `letta/llm_api/anthropic_client.py`
- Streaming interfaces:
  - `letta/interfaces/anthropic_streaming_interface.py`
  - `letta/interfaces/anthropic_parallel_tool_call_streaming_interface.py` (tracks cache tokens)
- Extract method: `AnthropicClient.extract_usage_statistics()`
- Cache control: `_add_cache_control_to_system_message()`, `_add_cache_control_to_messages()`