The Pydantic validator `set_model_specific_defaults` was checking
`values.get("max_tokens") is None`, which matched both "field not
provided" and "field explicitly set to null". This meant users could
not disable the max output tokens limit for GPT-5/GPT-4.1 models -
the validator would always override null with a default value during
request deserialization.
Changed to `"max_tokens" not in values` so that an explicit
`max_tokens: null` is preserved while still applying defaults when
the field is omitted entirely.
list_llm_models_async was constructing LLMConfig without max_tokens,
causing the GET /models/ endpoint to return null for max_tokens.
Now calls typed_provider.get_default_max_output_tokens() for both
base and BYOK provider paths, matching get_llm_config_from_handle.
Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
perf(memfs): delta upload — only push new/modified git objects after commit
Instead of re-uploading the entire .git/ directory after every commit,
snapshot file mtimes before the commit and only upload files that are
new or changed. A typical single-block update creates ~5 new objects
(blob, trees, commit, ref) vs re-uploading all ~30.
Full _upload_repo retained for create_repo and other paths that need it.
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
LlmConfig._to_model_settings() for Anthropic built an AnthropicModelSettings
object without passing effort=self.effort, so GET /agents/{id} never returned
the effort field in model_settings even when it was stored on the agent.
The Letta Code CLI derives the reasoning tier displayed in the status bar
from model_settings.effort (canonical source), so the footer always showed
e.g. "Sonnet 4.6" instead of "Sonnet 4.6 (high)" after a model switch.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* fix(core): always create system message even with _init_with_no_messages
When _init_with_no_messages=True (used by agent import flows), the agent
was created with message_ids=None. If subsequent message initialization
failed, this left orphaned agents that crash when context window is
calculated (TypeError on message_ids[1:]).
Now the system message is always generated and persisted, even when
skipping the rest of the initial message sequence. This ensures every
agent has at least message_ids=[system_message_id].
Fixes Datadog issue 773a24ea-eeb3-11f0-8f9f-da7ad0900000
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): clean up placeholder messages during import and add test
Delete placeholder system messages after imported messages are
successfully created (not before), so agents retain their safety-net
system message if import fails. Also adds a test verifying that
_init_with_no_messages=True still produces a valid context window.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): add descriptive error for empty message_ids in get_system_message
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming
When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.
Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.
* fix(core): propagate finish_reason for Responses API incomplete events
- Guard usage extraction against None usage payload in
ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
exhausts its output token budget on reasoning
* fix(core): handle empty content parts in incomplete ResponseOutputMessage
When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.
* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens
Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
* fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)
* fix(core): apply OpenAI prompt cache settings to request payloads
Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): prefix prompt cache key with letta tag
Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* add integration test
* skip test
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
* fix(core): only set prompt_cache_retention, drop prompt_cache_key
Two issues with the original prompt_cache_key approach:
1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars)
2. Setting an explicit key disrupted OpenAI's default prefix-hash
routing, dropping cache hit rates from 40-45% to 10-13%
OpenAI's default routing (hash of first ~256 tokens) already provides
good cache affinity since each agent has a unique system prompt.
We only need prompt_cache_retention="24h" for extended retention.
Also fixes:
- Operator precedence bug in _supports_extended_prompt_cache_retention
- Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs)
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Letta <noreply@letta.com>
Adds a diagnostic log at the streaming chokepoint in LettaAgentV3.stream()
to detect when any LettaMessage chunk is yielded without an id or otid field.
This helps trace the root cause of client-side id/otid inconsistencies.
* auto fixes
* auto fix pt2 and transitive deps and undefined var checking locals()
* manual fixes (ignored or letta-code fixed)
* fix circular import
* remove all ignores, add FastAPI rules and Ruff rules
* add ty and precommit
* ruff stuff
* ty check fixes
* ty check fixes pt 2
* error on invalid
* fix(core): apply OpenAI prompt cache settings to request payloads
Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): prefix prompt cache key with letta tag
Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* add integration test
* skip test
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
502s and upstream connection errors (envoy proxy failures) from ChatGPT
were not being retried. This classifies them as LLMConnectionError (retryable)
in both the streaming and non-streaming paths, and adds retry handling in
the non-streaming HTTPStatusError handler so 502s get the same exponential
backoff treatment as transport-level connection drops.
🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
Avoid failing message-list endpoints when historical send_message tool calls are missing the expected message argument by logging and skipping malformed entries during conversion.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
The "Experience the new ADE" page was outdated and no longer useful.
Root path now redirects to /docs (FastAPI Swagger UI) instead.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
LLMError exceptions are already properly formatted errors that should
propagate directly. Without this check, they get unnecessarily wrapped
by handle_llm_error, losing their original error information.
* debug: log statement_timeout + connection pid on every session checkout
Temporary instrumentation to diagnose why some PlanetScale connections
have statement_timeout=5s while others have 0 (disabled).
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* debug: log statement_timeout on every checkout, not just non-zero
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: rollback implicit transaction from debug query
The SELECT implicitly begins a transaction, causing "A transaction is
already begun" errors for code that calls session.begin() explicitly.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* new prompts for sliding window and all compaction + defaults to corresponding prompt
* regenerate api spec
---------
Co-authored-by: Amy Guan <amy@letta.com>
fix: update system prompt metadata label from "Memory blocks were last modified" to "System prompt last recompiled"
When git-based memory is enabled, there are no memory blocks, so the label
"Memory blocks were last modified" is inaccurate. Changed to
"System prompt last recompiled" which accurately reflects the timestamp meaning.
Fixes#9476🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
* fix(core): catch bare openai.APIError in handle_llm_error fallthrough
openai.APIError raised during streaming (e.g. OpenRouter credit
exhaustion) is not an APIStatusError, so it skipped the catch-all
at the end and fell through to LLMError("Unhandled"). Now bare
APIErrors that aren't context window overflows are mapped to
LLMBadRequestError.
Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion
Adds dedicated error type for insufficient credits/quota across all
providers (OpenAI, Anthropic, Google). Returns HTTP 402 with
BYOK-aware messaging instead of generic 400.
- New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode
- is_insufficient_credits_message() helper detecting credit/quota strings
- All 3 provider clients detect 402 status + credit keywords
- FastAPI handler returns 402 with "your API key" vs generic messaging
- 5 new parametrized tests covering OpenRouter, OpenAI, and negative case
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* feat: add template rollback endpoint [LET-7423]
Adds POST /v1/templates/:template_name/rollback endpoint to restore templates to previous versions.
Key features:
- Rollback to any numbered version (1, 2, 3, etc.) or "latest"
- Auto-saves unsaved changes before rollback to prevent data loss
- Validates input (rejects "current"/"dev" as target versions)
- Preserves entity IDs and relationships across rollback
- Uses project context from X-Project header (no project_id in path)
Implementation includes:
- API contract in templatesContract.ts
- Handler in templatesRouter.ts with comprehensive error handling
- 9 E2E tests covering functionality and edge cases
- Updated stainless.yml for SDK generation
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore test
* fix: add X-Project header to rollback endpoint tests
The rollback endpoint uses project context from X-Project header instead of URL path.
Updated all rollback test calls to include the X-Project header with testProject value.
This follows the no-project-in-path pattern for template endpoints.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* feat: support both URL patterns for rollback endpoint
Added dual URL pattern support for rollback endpoint:
- `/v1/templates/:project_id/:template_name/rollback` (with project in path)
- `/v1/templates/:template_name/rollback` (NoProject, uses X-Project header)
Backend supports both patterns, but Stainless only exposes the cleaner NoProject version for SDKs.
Key changes:
- Fixed "rollback to latest" bug by resolving target version BEFORE auto-saving
- NoProject route is exported first to ensure correct route matching order
- Updated tests to use project_id in path for better compatibility
- All 8 rollback tests passing
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* bump
* bump
* bump
---------
Co-authored-by: Letta <noreply@letta.com>
- Map httpx.ReadError/WriteError/ConnectError to LLMConnectionError in
handle_llm_error so Temporal correctly classifies them as retryable
(previously fell through to generic non-retryable LLMError)
- Add client-level retry with exponential backoff (up to 3 attempts) on
request_async and stream_async for transient transport errors
- Stream retry is guarded by has_yielded flag to avoid corrupting
partial responses already consumed by the caller
get_content() was only setting signature on ReasoningContent items.
When Gemini returns a function call with thought_signature but no
ReasoningContent (e.g. include_thoughts=False), the signature was
stored on self.thinking_signature but never attached to TextContent.
This caused "missing thought_signature in functionCall parts" errors
when the message was echoed back to Gemini on the next turn.
🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* catch contextwindowexceeded error
* fix(core): detect Google token limit errors as ContextWindowExceededError
Google's error message says "input token count exceeds the maximum
number of tokens allowed" which doesn't contain the word "context",
so it was falling through to generic LLMBadRequestError instead of
ContextWindowExceededError. This means compaction won't auto-trigger.
Expands the detection to also match "token count" and "tokens allowed"
in addition to the existing "context" keyword.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): add missing message arg to LLMBadRequestError in OpenAI client
The generic 400 path in handle_llm_error was constructing
LLMBadRequestError without the required message positional arg,
causing TypeError in prod during summarization.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* ci: add adapters/ test suite to core unit test matrix
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(tests): update adapter error handling test expectations to match actual behavior
The streaming adapter's error handling double-wraps errors: the
AnthropicStreamingInterface calls handle_llm_error first, then the
adapter catches the result and calls handle_llm_error again, which
falls through to the base class LLMError. Updated test expectations
to match this behavior.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): prevent double-wrapping of LLMError in stream adapter
The AnthropicStreamingInterface.process() already transforms raw
provider errors into LLMError subtypes via handle_llm_error. The
adapter was catching the result and calling handle_llm_error again,
which didn't recognize the already-transformed LLMError and wrapped
it in a generic LLMError("Unhandled LLM error"). This downgraded
specific error types (LLMConnectionError, LLMServerError, etc.)
and broke retry logic that matches on specific subtypes.
Now the adapter checks if the error is already an LLMError and
re-raises it as-is. Tests restored to original correct expectations.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* test(core): strengthen git-memory system prompt stability integration coverage
Switch git-memory HTTP integration tests to OpenAI model handles and add assertions that system prompt content remains stable after normal turns and direct block value updates until explicit recompilation or reset.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): preserve git-memory formatting and enforce lock conflicts
Preserve existing markdown frontmatter formatting on block updates while still ensuring required metadata fields exist, and make post-push git sync propagate memory-repo lock conflicts as 409 responses. Also enable slash-containing core-memory block labels in route params and add regression coverage.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(memfs): fail closed on memory repo lock contention
Make memfs git commits fail closed when the per-agent Redis lock cannot be acquired, return 409 MEMORY_REPO_BUSY from the memfs files write API, and map that 409 back to core MemoryRepoBusyError so API callers receive consistent busy conflicts.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore(core): minimize git-memory fix scope to memfs lock and frontmatter paths
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: drop unrelated changes and keep memfs-focused scope
Revert branch-only changes that are not required for the memfs lock contention and frontmatter-preservation fix so the PR contains only issue-relevant files.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(memfs): lock push sync path and improve nested sync diagnostics
Serialize memfs push-to-GCS sync with the same per-agent Redis lock key used by API commits, and add targeted post-push nested-block diagnostics plus a focused nested-label sync regression test for _sync_after_push.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
* fix(core): stabilize system prompt refresh and expand git-memory coverage
Only rebuild system prompts on explicit refresh paths so normal turns preserve prefix-cache stability, including git/custom prompt layouts. Add integration coverage for memory filesystem tree structure and recompile/reset system-message updates via message-id retrieval.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): recompile system prompt around compaction and stabilize source tests
Force system prompt refresh before/after compaction in LettaAgentV3 so repaired system+memory state is used and persisted across subsequent turns. Update source-system prompt tests to explicitly recompile before raw preview assertions instead of assuming automatic rebuild timing.
👾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
GoogleAIClient and GoogleVertexClient were hardcoding Letta's managed
credentials for all requests, ignoring user-provided BYOK API keys.
This meant Letta was paying Google API costs for BYOK users.
Add _get_client_async and update _get_client to check BYOK overrides
(via get_byok_overrides / get_byok_overrides_async) before falling back
to managed credentials, matching the pattern used by OpenAIClient and
AnthropicClient.
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>