
---
name: llm-provider-usage-statistics
description: Reference guide for token counting and prefix caching across LLM providers (OpenAI, Anthropic, Gemini). Use when debugging token counts or optimizing prefix caching.
---

# LLM Provider Usage Statistics

Reference documentation for how different LLM providers report token usage.

## Quick Reference: Token Counting Semantics

| Provider  | `input_tokens` meaning  | Cache tokens | Must add cache to get total? |
|-----------|-------------------------|--------------|------------------------------|
| OpenAI    | TOTAL (includes cached) | `cached_tokens` is a subset | No |
| Anthropic | NON-cached only         | `cache_read_input_tokens` + `cache_creation_input_tokens` | Yes |
| Gemini    | TOTAL (includes cached) | `cached_content_token_count` is a subset | No |

**Critical difference:** Anthropic's `input_tokens` excludes cached tokens, so you must add them:

```
total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
```
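These rules can be collapsed into one normalization step. A minimal sketch: the helper name `normalize_input_tokens` is ours (not part of any SDK), and the field names follow the table above (`input_tokens` for OpenAI/Anthropic, `prompt_token_count` for Gemini's `usage_metadata`).

```python
def normalize_input_tokens(provider: str, usage: dict) -> int:
    """Return total input tokens, cached included, from a raw usage payload."""
    if provider == "anthropic":
        # Anthropic's input_tokens EXCLUDES cached tokens: add both cache fields.
        return (
            usage.get("input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0)
        )
    if provider == "openai":
        # OpenAI's input_tokens already includes cached_tokens (a subset).
        return usage.get("input_tokens", 0)
    if provider == "gemini":
        # Gemini's prompt_token_count already includes cached_content_token_count.
        return usage.get("prompt_token_count", 0)
    raise ValueError(f"unknown provider: {provider}")
```

Applying the same formula to all three providers would double-count cached tokens for OpenAI and Gemini, which is why the branch is per-provider.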

## Quick Reference: Prefix Caching

| Provider    | Min tokens | How to enable | TTL |
|-------------|------------|---------------|-----|
| OpenAI      | 1,024 | Automatic | ~5-10 min |
| Anthropic   | 1,024 | Requires `cache_control` breakpoints | 5 min |
| Gemini 2.0+ | 1,024 | Automatic (implicit) | Variable |
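Anthropic is the only provider in the table that needs explicit markup. A minimal sketch of a Messages API request body with a `cache_control` breakpoint on the system prompt (the model id and system text are stand-ins, not from this document):

```python
# Everything up to and including the block marked with cache_control
# becomes a cacheable prefix (subject to the 1,024-token minimum).
request_body = {
    "model": "claude-sonnet-4-20250514",  # example model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<long, stable system prompt (>= 1,024 tokens)>",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
```

OpenAI and Gemini 2.0+ need no equivalent markup; their caching kicks in automatically once the minimum prefix length is met.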

## Quick Reference: Reasoning/Thinking Tokens

| Provider  | Field name | Models |
|-----------|------------|--------|
| OpenAI    | `reasoning_tokens` | o1, o3 models |
| Anthropic | N/A (thinking is in content blocks, not usage) | — |
| Gemini    | `thoughts_token_count` | Gemini 2.0 with thinking enabled |
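A minimal extraction sketch for the table above. The helper name `reasoning_tokens` is ours; the assumption (consistent with OpenAI's Chat Completions usage shape) is that `reasoning_tokens` sits under `completion_tokens_details`:

```python
def reasoning_tokens(provider: str, usage: dict) -> int:
    """Return reasoning/thinking tokens from a raw usage payload, or 0."""
    if provider == "openai":
        # Nested under completion_tokens_details in Chat Completions usage.
        details = usage.get("completion_tokens_details") or {}
        return details.get("reasoning_tokens", 0)
    if provider == "gemini":
        return usage.get("thoughts_token_count") or 0
    # Anthropic reports no thinking count in usage: thinking lives in
    # content blocks, so there is nothing to extract here.
    return 0
```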

## Provider Reference Files