letta-server/letta/llm_api at 21765d16c9993c195ef60f6679fbc717042c049d - letta-server

Fimeg/letta-server

Ari Webb 21765d16c9 fix(core): add OpenAI 24h prompt cache retention for supported models (#9509)

* fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)

* fix(core): apply OpenAI prompt cache settings to request payloads

Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prefix prompt cache key with letta tag

Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add integration test

* skip test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>

* fix(core): only set prompt_cache_retention, drop prompt_cache_key

Two issues with the original prompt_cache_key approach:
1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars)
2. Setting an explicit key disrupted OpenAI's default prefix-hash
   routing, dropping cache hit rates from 40-45% to 10-13%
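The 90-character figure in item 1 can be reproduced under two assumptions that the message does not spell out: that the IDs are canonical 36-character UUIDs, and that the count includes the `letta:` prefix added in the earlier commit. The helper name `build_cache_key` below is hypothetical and is not taken from the repository.

```python
import uuid

# Limit cited in the commit message for prompt_cache_key.
MAX_KEY_LEN = 64


def build_cache_key(agent_id: str, conversation_id: str) -> str:
    """Hypothetical reconstruction of the dropped key format."""
    return f"letta:{agent_id}:{conversation_id}"


# Assumed ID shapes: "agent-<uuid>" and "conv-<uuid>" with canonical UUID strings.
agent_id = f"agent-{uuid.uuid4()}"        # 6 + 36 = 42 characters
conversation_id = f"conv-{uuid.uuid4()}"  # 5 + 36 = 41 characters

key = build_cache_key(agent_id, conversation_id)
key_len = len(key)  # 6 ("letta:") + 42 + 1 (":") + 41

print(key_len, key_len > MAX_KEY_LEN)
```

Without the prefix the same format comes to 84 characters, which is still over the limit.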

OpenAI's default routing (hash of first ~256 tokens) already provides
good cache affinity since each agent has a unique system prompt.
We only need prompt_cache_retention="24h" for extended retention.
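A minimal sketch of the resulting behavior, assuming a prefix-based allow-list: the request carries `prompt_cache_retention="24h"` only for supported models on non-OpenRouter paths, and never sets `prompt_cache_key`. The function names, the `is_openrouter` flag, and the prefixes in `EXTENDED_RETENTION_PREFIXES` are illustrative placeholders, not the contents of `openai_client.py`.

```python
from typing import Any

# Placeholder allow-list; the real set of supported models is defined by
# OpenAI's documentation and the repository, not by this sketch.
EXTENDED_RETENTION_PREFIXES = ("gpt-5", "gpt-4.1")


def supports_extended_retention(model: str, is_openrouter: bool) -> bool:
    """Return True when 24h retention should be requested (hypothetical gate)."""
    if is_openrouter:
        # The commit message excludes OpenRouter paths.
        return False
    return model.startswith(EXTENDED_RETENTION_PREFIXES)


def build_request(
    model: str,
    messages: list[dict[str, Any]],
    is_openrouter: bool = False,
) -> dict[str, Any]:
    """Build a request payload without an explicit prompt_cache_key."""
    payload: dict[str, Any] = {"model": model, "messages": messages}
    if supports_extended_retention(model, is_openrouter):
        payload["prompt_cache_retention"] = "24h"
    return payload


supported = build_request("gpt-5.1", [])
unsupported = build_request("gpt-4o-mini", [])
via_openrouter = build_request("gpt-5.1", [], is_openrouter=True)
```

Leaving the key unset keeps the provider's default prefix-based routing in effect, which is the behavior the message credits with the higher hit rate.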

Also fixes:
- Operator precedence bug in _supports_extended_prompt_cache_retention
- Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs)
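The original expression in `_supports_extended_prompt_cache_retention` is not shown here, so the following only illustrates the general class of bug: in Python, `and` binds more tightly than `or`, so an unparenthesized mix of the two can let one branch skip a guard. The model prefixes and the `is_openrouter` flag are assumptions made for the example.

```python
def buggy_gate(model: str, is_openrouter: bool) -> bool:
    # Parsed as: A or (B and not is_openrouter).
    # The OpenRouter guard applies only to the second prefix.
    return model.startswith("gpt-5") or model.startswith("gpt-4.1") and not is_openrouter


def fixed_gate(model: str, is_openrouter: bool) -> bool:
    # Parentheses make the guard apply to every supported prefix.
    return (model.startswith("gpt-5") or model.startswith("gpt-4.1")) and not is_openrouter


# The two versions disagree only when the first prefix matches on an OpenRouter path.
print(buggy_gate("gpt-5.1", True), fixed_gate("gpt-5.1", True))
```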

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Letta <noreply@letta.com>

2026-02-24 10:55:11 -08:00

| Name | Last commit | Date |
| --- | --- | --- |
| `sample_response_jsons` | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| `__init__.py` | merge this (#4759) | 2025-09-17 15:47:40 -07:00 |
| `anthropic_client.py` | chore: add ty + pre-commit hook and repeal even more ruff rules (#9504) | 2026-02-24 10:55:11 -08:00 |
| `anthropic_constants.py` | feat: Add structured outputs for Anthropic (#7495) | 2026-01-12 10:57:19 -08:00 |
| `azure_client.py` | fix(core): differentiate BYOK vs base provider in all LLM error details (#9425) | 2026-02-24 10:52:07 -08:00 |
| `bedrock_client.py` | feat: enable bedrock for anthropic models (#8847) | 2026-01-19 15:54:44 -08:00 |
| `chatgpt_oauth_client.py` | chore: enable F821, F401, W293 (#9503) | 2026-02-24 10:55:08 -08:00 |
| `deepseek_client.py` | fix(core): sanitize Unicode surrogates in all LLM client requests (#9323) | 2026-02-24 10:52:06 -08:00 |
| `error_utils.py` | fix(core): catch bare openai.APIError in handle_llm_error (#9468) | 2026-02-24 10:52:07 -08:00 |
| `google_ai_client.py` | fix(core): use BYOK API keys for Google AI/Vertex LLM requests (#9439) | 2026-02-24 10:52:07 -08:00 |
| `google_constants.py` | fix: max output tokens for gemini 3 models (#7322) | 2025-12-17 17:31:03 -08:00 |
| `google_vertex_client.py` | chore: add ty + pre-commit hook and repeal even more ruff rules (#9504) | 2026-02-24 10:55:11 -08:00 |
| `groq_client.py` | fix(core): sanitize Unicode surrogates in all LLM client requests (#9323) | 2026-02-24 10:52:06 -08:00 |
| `helpers.py` | chore: enable F821, F401, W293 (#9503) | 2026-02-24 10:55:08 -08:00 |
| `llm_api_tools.py` | fix: add LLMCallType enum and ensure call_type is set on all provider traces (#9258) | 2026-02-24 10:52:06 -08:00 |
| `llm_client_base.py` | Add LLM client compaction errors to traces (#9474) | 2026-02-24 10:52:07 -08:00 |
| `llm_client.py` | feat: openrouter byok (#9148) | 2026-01-29 12:44:04 -08:00 |
| `minimax_client.py` | fix(core): sanitize Unicode surrogates in all LLM client requests (#9323) | 2026-02-24 10:52:06 -08:00 |
| `mistral.py` | fix(core): reject empty API keys in Bearer auth headers (#9350) | 2026-02-24 10:52:07 -08:00 |
| `openai_client.py` | fix(core): add OpenAI 24h prompt cache retention for supported models (#9509) | 2026-02-24 10:55:11 -08:00 |
| `openai.py` | chore: add ty + pre-commit hook and repeal even more ruff rules (#9504) | 2026-02-24 10:55:11 -08:00 |
| `sglang_native_client.py` | chore: enable F821, F401, W293 (#9503) | 2026-02-24 10:55:08 -08:00 |
| `together_client.py` | fix(core): sanitize Unicode surrogates in all LLM client requests (#9323) | 2026-02-24 10:52:06 -08:00 |
| `xai_client.py` | fix(core): sanitize Unicode surrogates in all LLM client requests (#9323) | 2026-02-24 10:52:06 -08:00 |
| `zai_client.py` | fix: zai clear empty messages (#9466) | 2026-02-24 10:52:07 -08:00 |