* fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)
* fix(core): apply OpenAI prompt cache settings to request payloads
Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): prefix prompt cache key with letta tag
Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* add integration test
* skip test
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
* fix(core): only set prompt_cache_retention, drop prompt_cache_key
Two issues with the original prompt_cache_key approach:
1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars)
2. Setting an explicit key disrupted OpenAI's default prefix-hash
routing, dropping cache hit rates from 40-45% to 10-13%
OpenAI's default routing (hash of first ~256 tokens) already provides
good cache affinity since each agent has a unique system prompt.
We only need prompt_cache_retention="24h" for extended retention.
Also fixes:
- Operator precedence bug in _supports_extended_prompt_cache_retention
- Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs)
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Letta <noreply@letta.com>