Commit Graph

7227 Commits

github-actions[bot]
ba67621e1b feat: add conversation deletion endpoint (soft delete) [LET-7286] (#9230)
* feat: add conversation deletion endpoint (soft delete) [LET-7286]

- Add DELETE /conversations/{conversation_id} endpoint
- Filter soft-deleted conversations from list operations
- Add check_is_deleted=True to update/delete operations
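The soft-delete behavior the bullets above describe can be sketched in a few lines (the schema and helper names here are illustrative, not the project's actual models): deletion sets a flag rather than removing the row, double deletes are rejected, and list operations filter flagged rows out.

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    id: str
    is_deleted: bool = False  # soft-delete flag; the row itself is kept

db: dict[str, Conversation] = {"c1": Conversation("c1")}

def delete_conversation(cid: str) -> None:
    conv = db[cid]
    if conv.is_deleted:
        # check_is_deleted=True analogue: reject operations on deleted rows
        raise ValueError("conversation already deleted")
    conv.is_deleted = True

def list_conversations() -> list[str]:
    # Soft-deleted conversations are filtered from list operations
    return [c.id for c in db.values() if not c.is_deleted]

delete_conversation("c1")
assert list_conversations() == []
```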

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add tests, update SDK and stainless for delete conversation

- Add 5 integration tests for DELETE conversation endpoint
- Run stage-api to regenerate OpenAPI spec and SDK
- Add delete method to conversations in stainless.yml

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* test: add manager-level tests for conversation soft delete [LET-7286]

- test_delete_conversation_removes_from_list
- test_delete_conversation_double_delete_raises
- test_update_deleted_conversation_raises
- test_delete_conversation_excluded_from_summary_search

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-02-24 10:55:12 -08:00
jnjpng
9c8589a687 fix: correct ChatGPT OAuth GPT-5 max output token defaults (#9592)
fix: align ChatGPT OAuth GPT-5 max output token defaults

Update ChatGPT OAuth provider defaults so GPT-5 family models report 128k max output tokens based on current OpenAI model docs, avoiding incorrect 16k values in /v1/models responses.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
Shubham Naik
73c824f5d2 feat: make agent_id optional in conversations list endpoint [LET-7612] (#9585)
* feat: make agent_id optional in conversations list endpoint [LET-7612]

Allow listing all conversations without filtering by agent_id.

**Router changes (conversations.py):**
- Changed agent_id from required (`...`) to optional (`None`)
- Updated description to clarify behavior
- Updated docstring to reflect optional filtering

**Manager changes (conversation_manager.py):**
- Updated list_conversations signature: agent_id: str → Optional[str]
- Updated docstring to clarify optional behavior
- Summary search query: conditionally adds agent_id filter only if provided
- Default list logic: passes agent_id (can be None) to list_async

**How it works:**
- Without agent_id: returns all conversations for the user's organization
- With agent_id: returns conversations filtered by that agent
- list_async handles None gracefully via **kwargs pattern

**Use case:**
- Cloud UI can list all user conversations across agents
- Still supports filtering by agent_id when needed
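The conditional-filter pattern described above can be sketched as follows (function and field names are hypothetical stand-ins for the actual manager code): the filter dict only gains an `agent_id` key when one is provided, so the downstream query naturally handles the "list everything" case.

```python
from typing import Optional

def list_conversations(org_id: str, agent_id: Optional[str] = None) -> dict:
    # Always scope to the user's organization
    filters = {"organization_id": org_id}
    if agent_id is not None:
        filters["agent_id"] = agent_id  # add the filter only when requested
    return filters  # stand-in for list_async(**filters)

# Without agent_id: all conversations for the organization
assert "agent_id" not in list_conversations("org-1")
# With agent_id: filtered by that agent
assert list_conversations("org-1", "agent-9")["agent_id"] == "agent-9"
```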

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: update logs

* chore: update logs

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
jnjpng
257b99923b fix: preserve max_tokens on model_settings updates without max_output_tokens (#9591)
When model_settings is sent without max_output_tokens (e.g. only
changing reasoning_effort), the Pydantic default of 4096 was being
applied via _to_legacy_config_params(), silently overwriting the
agent's existing max_tokens.

Use model_fields_set to detect when max_output_tokens was not
explicitly provided and skip overwriting max_tokens in that case.
Only applied to the update path — on create, letting the default
apply is reasonable since there's no pre-existing value.
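The `model_fields_set` pattern above can be illustrated with a toy Pydantic v2 model (field names and the default are illustrative, not the project's actual schema): the default is present on the parsed instance either way, but `model_fields_set` records whether the caller actually sent the field, so an update path can skip overwriting a stored value.

```python
from typing import Optional
from pydantic import BaseModel

class ModelSettings(BaseModel):
    reasoning_effort: Optional[str] = None
    max_output_tokens: int = 4096  # default applied when the field is omitted

stored_max_tokens = 8192  # pre-existing value on the agent

# Caller only changes reasoning_effort; max_output_tokens is omitted
update = ModelSettings.model_validate({"reasoning_effort": "high"})
assert update.max_output_tokens == 4096  # default is on the instance...
if "max_output_tokens" in update.model_fields_set:
    stored_max_tokens = update.max_output_tokens  # explicit -> overwrite
# ...but model_fields_set shows it was never provided, so we keep 8192
assert stored_max_tokens == 8192
```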
2026-02-24 10:55:12 -08:00
cthomas
857c289ed2 fix: handle compact edge case in idempotency check (#9588) 2026-02-24 10:55:12 -08:00
Shubham Naik
34bab3cf9a Shub/let listener mode control (#9584)
* feat: add two-way mode control for listener connections

Enable bidirectional permission mode control between letta-cloud UI and letta-code instances.

**Backend:**
- Added ModeChangeMessage and ModeChangedMessage to WebSocket protocol
- Added sendModeChange endpoint (/v1/listeners/:connectionId/mode)
- listenersRouter publishes mode_change via Redis Pub/Sub
- listenersHandler handles mode_changed acknowledgments from letta-code
- Stores current mode in Redis for UI state sync

**Contract:**
- Added sendModeChange contract with PermissionModeSchema
- 4 modes: default, acceptEdits, plan, bypassPermissions

**Frontend:**
- Extended PermissionMode type to 4 modes (was 2: ask/never)
- PermissionModeSelector now shows all 4 modes with descriptions
- Added disabled prop (grayed out when Cloud orchestrator selected)
- PermissionModeContext.sendModeChangeToDevice() calls API
- AgentMessenger sends mode changes to device on mode/device change
- Updated auto-approval logic (only in Cloud mode, only for bypassPermissions)
- Updated inputMode logic (device handles approvals, not cloud)

**Translations:**
- Updated en.json with 4 mode labels and descriptions
- Removed legacy "askAlways" and "neverAsk" keys

**Mode Behavior:**
- default: Ask permission for each tool
- acceptEdits: Auto-approve file edits only
- plan: Read-only exploration (denies writes)
- bypassPermissions: Auto-approve everything

**Lint Fixes:**
- Removed unused imports and functions from trackingMiddleware.ts

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: store mode in connectionData and show approvals for all modes

**Backend:**
- Fixed Redis WRONGTYPE error - store currentMode inside connectionData object
- Changed const connectionData to let connectionData (needs mutation)
- Updated mode_changed handler to reassign entire connectionData object
- Updated ping handler for consistency (also reassigns connectionData)
- Added currentMode field to ListenerConnectionSchema (optional)

**Frontend:**
- Simplified inputMode logic - always show approval UI when toolCallsToApprove.length > 0
- Removed mode-specific approval filtering (show approvals even in bypass/acceptEdits for visibility)
- Users can see what tools are being auto-approved during execution

**Why:**
- Redis key is a JSON string (via setRedisData), not a hash
- Cannot use hset on string keys - causes WRONGTYPE error
- Must update entire object via setRedisData like ping handler does
- Approval visibility helpful for debugging/understanding agent behavior
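The WRONGTYPE issue above comes down to Redis key types: a key written as a JSON string cannot be updated field-by-field with `HSET`. A minimal sketch of the read-mutate-rewrite pattern (helper names hypothetical, with a dict standing in for Redis):

```python
import json

store: dict[str, str] = {}  # stand-in for Redis string keys

def set_redis_data(key: str, value: dict) -> None:
    # The key holds one JSON string, not a Redis hash
    store[key] = json.dumps(value)

def get_redis_data(key: str) -> dict:
    return json.loads(store[key])

set_redis_data("listener:conn-1", {"deviceId": "dev-1"})

# HSET on a string key would raise WRONGTYPE, so update the whole object:
connection_data = get_redis_data("listener:conn-1")
connection_data["currentMode"] = "acceptEdits"
set_redis_data("listener:conn-1", connection_data)

assert get_redis_data("listener:conn-1")["currentMode"] == "acceptEdits"
```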

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use useMutation hook for sendModeChange instead of direct call

cloudAPI is initialized via initTsrReactQuery, so sendModeChange is a
mutation hook object, not a callable function. Use .useMutation() at the
component level and mutateAsync in the callback.

Co-authored-by: Shubham Naik <4shub@users.noreply.github.com>

* chore: update logs

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Shubham Naik <4shub@users.noreply.github.com>
2026-02-24 10:55:12 -08:00
cthomas
73c9b14fa9 fix: dont throw error if compaction races (#9576) 2026-02-24 10:55:12 -08:00
jnjpng
f10440b49c fix: update Anthropic Haiku test model after 3.5 retirement (#9569)
* fix: migrate Anthropic Haiku test model off retired release

Update Anthropic Haiku references in integration and usage parsing tests to a supported model id so test requests stop failing with 404 model not found errors.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: use canonical Anthropic Haiku handle in tests

Replace dated Anthropic Haiku handle references with the canonical provider handle so handle-based model resolution does not fail in batch and client tests.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
cthomas
ddaf4053f6 test: fix parallel tool call default value (#9572) 2026-02-24 10:55:12 -08:00
amysguan
a101d5980d Fix: load config for summarizer model from defaults instead of agent's config (#9568)
* load default settings instead of loading from agent for summarizer config

* update tests to allow use of get_llm_config_from_handle

* remove nit comment

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:12 -08:00
jnjpng
b29d063ba7 feat: forward letta-code feedback context fields to PostHog (#9567)
* feat: add flattened feedback context fields for PostHog filterability

Add system info (local_time, device_type, cwd), session stats (token counts,
timing), agent info (agent_name, agent_description, model), and account info
(billing_tier) as flat top-level fields to the /v1/metadata/feedback endpoint.

Flat fields allow direct PostHog filtering/breakdown without HogQL.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: regenerate API specs

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:12 -08:00
amysguan
33969d7190 Default to lightweight compaction model instead of agent's model (#9488)
---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:12 -08:00
jnjpng
eb4a0daabd fix: allow explicit null for max_tokens on GPT-5 models (#9562)
The Pydantic validator `set_model_specific_defaults` was checking
`values.get("max_tokens") is None`, which matched both "field not
provided" and "field explicitly set to null". This meant users could
not disable the max output tokens limit for GPT-5/GPT-4.1 models -
the validator would always override null with a default value during
request deserialization.

Changed to `"max_tokens" not in values` so that an explicit
`max_tokens: null` is preserved while still applying defaults when
the field is omitted entirely.
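The distinction above is easy to see with the raw values dict a validator receives (the function and default below are illustrative): under `.get()`, an explicit `null` and an omitted key are indistinguishable, while a membership check separates them.

```python
def resolve_max_tokens(values: dict, default: int = 16384):
    # Buggy check: values.get("max_tokens") is None
    # treats {"max_tokens": None} the same as {} and clobbers the null.
    # Fixed check: apply the default only when the key is truly absent.
    if "max_tokens" not in values:
        return default
    return values["max_tokens"]

assert resolve_max_tokens({}) == 16384                   # omitted -> default
assert resolve_max_tokens({"max_tokens": None}) is None  # explicit null kept
assert resolve_max_tokens({"max_tokens": 2048}) == 2048  # explicit value kept
```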
2026-02-24 10:55:12 -08:00
jnjpng
828c89c76f fix: populate max_tokens when listing LLM models (#9559)
list_llm_models_async was constructing LLMConfig without max_tokens,
causing the GET /models/ endpoint to return null for max_tokens.
Now calls typed_provider.get_default_max_output_tokens() for both
base and BYOK provider paths, matching get_llm_config_from_handle.
2026-02-24 10:55:12 -08:00
Kevin Lin
bd5b5fa9f3 feat(gemini): add 3.1 pro preview support (#9553)
Add 3.1 model metadata for Google AI and update Gemini tests/examples to use the new handle.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
cthomas
e2ad8762fe fix: gemini streaming bug (#9555) 2026-02-24 10:55:11 -08:00
cthomas
8ffc515674 fix: flip parallel_tool_calls setting default (#9541) 2026-02-24 10:55:11 -08:00
cthomas
3cdd64dc24 chore: update keepalive interval 50->20 (#9538)
* chore: update keepalive interval 50->20

* update comment
2026-02-24 10:55:11 -08:00
Kian Jones
8f56527958 perf(memfs): delta upload — only push new git objects after commit (#9548)
perf(memfs): delta upload — only push new/modified git objects after commit

Instead of re-uploading the entire .git/ directory after every commit,
snapshot file mtimes before the commit and only upload files that are
new or changed. A typical single-block update creates ~5 new objects
(blob, trees, commit, ref) vs re-uploading all ~30.

Full _upload_repo retained for create_repo and other paths that need it.
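The mtime-snapshot delta described above can be sketched as (paths and helper names hypothetical, not the actual memfs code): record every file's mtime before the commit, then upload only paths that are new or whose mtime changed.

```python
import os
import tempfile

def snapshot_mtimes(root: str) -> dict[str, float]:
    # Map every file under root to its modification time
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            snap[path] = os.path.getmtime(path)
    return snap

def changed_since(root: str, before: dict[str, float]) -> list[str]:
    # Files that are new (not in the snapshot) or whose mtime changed
    after = snapshot_mtimes(root)
    return [p for p, m in after.items() if before.get(p) != m]

# Usage: snapshot, perform the commit-equivalent writes, then diff
root = tempfile.mkdtemp()
with open(os.path.join(root, "HEAD"), "w") as f:
    f.write("ref: refs/heads/main\n")
before = snapshot_mtimes(root)
with open(os.path.join(root, "new-object"), "w") as f:
    f.write("blob\n")
assert changed_since(root, before) == [os.path.join(root, "new-object")]
```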

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Charles Packer
044241daec fix(core): include effort in AnthropicModelSettings returned by _to_model_settings() (#9543)
LlmConfig._to_model_settings() for Anthropic built an AnthropicModelSettings
object without passing effort=self.effort, so GET /agents/{id} never returned
the effort field in model_settings even when it was stored on the agent.

The Letta Code CLI derives the reasoning tier displayed in the status bar
from model_settings.effort (canonical source), so the footer always showed
e.g. "Sonnet 4.6" instead of "Sonnet 4.6 (high)" after a model switch.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
Kian Jones
e65795b5f1 fix(core): handle None message_ids in context window calculator (#9330)
* fix(core): always create system message even with _init_with_no_messages

When _init_with_no_messages=True (used by agent import flows), the agent
was created with message_ids=None. If subsequent message initialization
failed, this left orphaned agents that crash when context window is
calculated (TypeError on message_ids[1:]).

Now the system message is always generated and persisted, even when
skipping the rest of the initial message sequence. This ensures every
agent has at least message_ids=[system_message_id].

Fixes Datadog issue 773a24ea-eeb3-11f0-8f9f-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): clean up placeholder messages during import and add test

Delete placeholder system messages after imported messages are
successfully created (not before), so agents retain their safety-net
system message if import fails. Also adds a test verifying that
_init_with_no_messages=True still produces a valid context window.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): add descriptive error for empty message_ids in get_system_message

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
jnjpng
e8d5922ff9 fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535)
* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming

When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.

Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.

* fix(core): propagate finish_reason for Responses API incomplete events

- Guard usage extraction against None usage payload in
  ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
  can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
  finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
  Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
  stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
  exhausts its output token budget on reasoning

* fix(core): handle empty content parts in incomplete ResponseOutputMessage

When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.

* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens

Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
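The reason-to-finish_reason mapping the commits above describe can be sketched like this (field names simplified; the fallback for unknown reasons is an assumption of this sketch, not confirmed project behavior):

```python
from typing import Optional

def map_incomplete_reason(reason: Optional[str]) -> str:
    if reason == "max_output_tokens":
        return "length"          # matches the Chat Completions convention
    if reason == "content_filter":
        return "content_filter"
    # Unknown/future reasons still get a finish_reason rather than None;
    # "stop" is an illustrative conservative fallback.
    return "stop"
```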
2026-02-24 10:55:11 -08:00
Ari Webb
5896e5d023 fix: logging for credit verification step (#9514) 2026-02-24 10:55:11 -08:00
cthomas
3651658ea7 fix: tool call streaming using deprecated field (#9517) 2026-02-24 10:55:11 -08:00
Ari Webb
21765d16c9 fix(core): add OpenAI 24h prompt cache retention for supported models (#9509)
* fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)

* fix(core): apply OpenAI prompt cache settings to request payloads

Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prefix prompt cache key with letta tag

Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add integration test

* skip test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>

* fix(core): only set prompt_cache_retention, drop prompt_cache_key

Two issues with the original prompt_cache_key approach:
1. Key exceeded 64-char max (agent-<uuid>:conv-<uuid> = 90 chars)
2. Setting an explicit key disrupted OpenAI's default prefix-hash
   routing, dropping cache hit rates from 40-45% to 10-13%

OpenAI's default routing (hash of first ~256 tokens) already provides
good cache affinity since each agent has a unique system prompt.
We only need prompt_cache_retention="24h" for extended retention.

Also fixes:
- Operator precedence bug in _supports_extended_prompt_cache_retention
- Removes incorrect gpt-5.2-codex exclusion (it IS supported per docs)
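The resulting payload shape can be sketched as follows (the model-prefix table and helper names are illustrative, not the project's actual gating logic): set only `prompt_cache_retention`, never an explicit `prompt_cache_key`, and keep the OpenRouter exclusion and model check properly parenthesized.

```python
EXTENDED_RETENTION_MODELS = ("gpt-5", "gpt-4.1")  # illustrative prefixes

def supports_extended_retention(model: str, is_openrouter: bool) -> bool:
    # Explicit parentheses: mixing `not`/`and` without them was the
    # operator-precedence bug mentioned above.
    return (not is_openrouter) and model.startswith(EXTENDED_RETENTION_MODELS)

def build_payload(model: str, is_openrouter: bool = False) -> dict:
    payload = {"model": model}
    if supports_extended_retention(model, is_openrouter):
        # Only extended retention; default prefix-hash routing keeps
        # cache affinity, so no prompt_cache_key is set.
        payload["prompt_cache_retention"] = "24h"
    return payload

assert "prompt_cache_key" not in build_payload("gpt-5")
assert build_payload("gpt-5")["prompt_cache_retention"] == "24h"
assert "prompt_cache_retention" not in build_payload("gpt-5", is_openrouter=True)
```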

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:11 -08:00
jnjpng
042c9c36af fix(core): add warning log for streaming chunks missing id or otid (#9513)
Adds a diagnostic log at the streaming chokepoint in LettaAgentV3.stream()
to detect when any LettaMessage chunk is yielded without an id or otid field.
This helps trace the root cause of client-side id/otid inconsistencies.
2026-02-24 10:55:11 -08:00
Shubham Naik
3247fa7065 chore: fix favoriting bugs (#9505)
* chore: fix favoriting bugs

* chore: fix favoriting bugs

* chore: fix favoriting bugs
2026-02-24 10:55:11 -08:00
Shubham Naik
4a829123cd Listener mode (#9486)
* feat: listener mode

* feat: listener mode

* feat: listener mode

* feat: listener mode

* feat: listener mode

* feat: listener mode

* chore: merge

* feat: listen mode

* feat: add some key controls

* feat: add some key controls

* chore: hotwire fix for core

* chore: restore connection
2026-02-24 10:55:11 -08:00
Kian Jones
f5c4ab50f4 chore: add ty + pre-commit hook and repeal even more ruff rules (#9504)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import

* remove all ignores, add FastAPI rules and Ruff rules

* add ty and precommit

* ruff stuff

* ty check fixes

* ty check fixes pt 2

* error on invalid
2026-02-24 10:55:11 -08:00
Devansh Jain
39ddda81cc feat: add Anthropic Sonnet 4.6 (#9408) 2026-02-24 10:55:11 -08:00
Kian Jones
25d54dd896 chore: enable F821, F401, W293 (#9503)
* auto fixes

* auto fix pt2 and transitive deps and undefined var checking locals()

* manual fixes (ignored or letta-code fixed)

* fix circular import
2026-02-24 10:55:08 -08:00
Ari Webb
fa70e09963 Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention" (#9502)
Revert "fix(core): add OpenAI prompt cache key and model-gated 24h retention …"

This reverts commit f5bb9c629cb7d45544e90758cdfb899bcef41912.
2026-02-24 10:52:07 -08:00
Sarah Wooders
2bf3314cef fix: import asyncio for parallel tool calls (#9501) 2026-02-24 10:52:07 -08:00
Shubham Naik
20c71523f8 chore: hotwire fix for core (#9500) 2026-02-24 10:52:07 -08:00
Shubham Naik
e66981c7e8 feat: update undertaker to use rate limiter (#9498) 2026-02-24 10:52:07 -08:00
Charles Packer
619e81ed1e fix(core): add OpenAI prompt cache key and model-gated 24h retention (#9492)
* fix(core): apply OpenAI prompt cache settings to request payloads

Set prompt_cache_key using agent and conversation context on both Responses and Chat Completions request builders, and enable 24h retention only for supported OpenAI models while excluding OpenRouter paths.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prefix prompt cache key with letta tag

Add a `letta:` prefix to generated OpenAI prompt_cache_key values so cache-related entries are easier to identify in provider-side logs and diagnostics.

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* add integration test

* skip test

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Ari Webb <ari@letta.com>
2026-02-24 10:52:07 -08:00
jnjpng
5b001a7749 fix: rename ChatGPT server error to ChatGPT API error (#9497)
fix: rename ChatGPT server error to ChatGPT API error in error messages
2026-02-24 10:52:07 -08:00
jnjpng
fbc0bb60d9 fix: retry ChatGPT 502 and upstream connection errors with exponential backoff (#9495)
502s and upstream connection errors (envoy proxy failures) from ChatGPT
were not being retried. This classifies them as LLMConnectionError (retryable)
in both the streaming and non-streaming paths, and adds retry handling in
the non-streaming HTTPStatusError handler so 502s get the same exponential
backoff treatment as transport-level connection drops.
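The retry behavior described above can be sketched generically (a toy loop, not the actual client code): classify 502-class responses as retryable and back off exponentially between attempts.

```python
import time

RETRYABLE_STATUSES = {502, 503, 504}  # upstream/proxy failures

def request_with_backoff(send, max_retries: int = 3, base_delay: float = 0.5):
    """send() returns (status, body); retry retryable statuses with backoff."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body
```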

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Sarah Wooders
26cbdb7b7b fix(core): skip malformed send_message entries in message conversion (#9494)
Avoid failing message-list endpoints when historical send_message tool calls are missing the expected message argument by logging and skipping malformed entries during conversion.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Charles Packer
c32d53f8a3 fix(core): remove old static landing page from Docker image (#9369)
The "Experience the new ADE" page was outdated and no longer useful.
Root path now redirects to /docs (FastAPI Swagger UI) instead.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
amysguan
80a0d1a95f Add LLM client compaction errors to traces (#9474)
* add llm client errors to traces

* update response json for telemetry

* prevent silent failures and properly log errored responses in streaming path

* remove double logging

---------

Co-authored-by: Amy Guan <amy@letta.com>
Co-authored-by: Kian Jones <kian@letta.com>
2026-02-24 10:52:07 -08:00
Shubham Naik
2f76f2c629 Agent page update (#9475)
* feat: agent page updates

* feat: agent page updates

* feat: agent page updates

* feat: agent page updates

* feat: agent page updates

* feat: agent page updates

* chore: fix code

* chore: fix code
2026-02-24 10:52:07 -08:00
jnjpng
e3eafb1977 fix: re-raise LLMError before wrapping with handle_llm_error (#9482)
LLMError exceptions are already properly formatted errors that should
propagate directly. Without this check, they get unnecessarily wrapped
by handle_llm_error, losing their original error information.
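The re-raise guard described above amounts to this pattern (names illustrative): already-typed errors propagate untouched, and only foreign exceptions get wrapped.

```python
class LLMError(Exception):
    pass

def call_with_error_handling(fn):
    try:
        return fn()
    except LLMError:
        raise  # already a well-formed LLM error; propagate as-is
    except Exception as e:
        # Only wrap exceptions that are not already LLMErrors
        raise LLMError(f"wrapped: {e}") from e
```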
2026-02-24 10:52:07 -08:00
Kian Jones
2f0294165c debug: log statement_timeout + connection pid on session checkout (#9472)
* debug: log statement_timeout + connection pid on every session checkout

Temporary instrumentation to diagnose why some PlanetScale connections
have statement_timeout=5s while others have 0 (disabled).

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* debug: log statement_timeout on every checkout, not just non-zero

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: rollback implicit transaction from debug query

The SELECT implicitly begins a transaction, causing "A transaction is
already begun" errors for code that calls session.begin() explicitly.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
0a8a8fda54 feat: add credit verification before agent message endpoints [LET-XXXX] (#9433)
* feat: add credit verification before agent message endpoints

Add credit verification checks to message endpoints to prevent
execution when organizations have insufficient credits.

- Add InsufficientCreditsError exception type
- Add CreditVerificationService that calls step-orchestrator API
- Add credit checks to /agents/{id}/messages endpoints
- Add credit checks to /conversations/{id}/messages endpoint

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* surface error in ade

* do per step instead

* parallel check

* parallel to step

* small fixes

* stage publish api

* fixes

* revert unnecessary frontend changes

* insufficient credits stop reason

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Ari Webb
5faec5632f fix: add m2.5 (#9480)
* fix: add m2.5

* fix test
2026-02-24 10:52:07 -08:00
amysguan
9bec8c64f5 New prompts/defaults for sliding_window and all compaction (#9444)
* new prompts for sliding window and all compaction + defaults to corresponding prompt

* regenerate api spec

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:52:07 -08:00
github-actions[bot]
0b08164cc2 fix: update system prompt metadata label to "System prompt last recompiled" (#9477)
fix: update system prompt metadata label from "Memory blocks were last modified" to "System prompt last recompiled"

When git-based memory is enabled, there are no memory blocks, so the label
"Memory blocks were last modified" is inaccurate. Changed to
"System prompt last recompiled" which accurately reflects the timestamp meaning.

Fixes #9476



🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
f55ff3a822 fix(core): descriptive error for empty message_ids instead of IndexError (#9464)
fix(core): replace IndexError with descriptive error for empty message_ids

An agent with no in-context messages (empty/None message_ids) would crash
with a cryptic IndexError on message_ids[0]. Now raises a clear
LettaError explaining that the system message was not initialized.

Datadog: https://us5.datadoghq.com/error-tracking/issue/6c061c28-0830-11f1-b060-da7ad0900000
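The guard described above can be sketched as (error message and helper name illustrative): validate before indexing so an uninitialized agent fails with an explanation instead of an IndexError.

```python
from typing import List, Optional

class LettaError(Exception):
    pass

def get_system_message_id(message_ids: Optional[List[str]]) -> str:
    if not message_ids:  # covers both None and []
        raise LettaError(
            "Agent has no in-context messages: the system message was "
            "never initialized"
        )
    return message_ids[0]
```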

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
80f34f134d fix(core): catch bare openai.APIError in handle_llm_error (#9468)
* fix(core): catch bare openai.APIError in handle_llm_error fallthrough

openai.APIError raised during streaming (e.g. OpenRouter credit
exhaustion) is not an APIStatusError, so it skipped the catch-all
at the end and fell through to LLMError("Unhandled"). Now bare
APIErrors that aren't context window overflows are mapped to
LLMBadRequestError.

Datadog: https://us5.datadoghq.com/error-tracking/issue/7a2c356c-0849-11f1-be66-da7ad0900000
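The fallthrough fix above hinges on exception-class ordering, sketched here with stand-in classes (the real code uses the openai SDK's hierarchy and richer per-status mapping): because `APIStatusError` subclasses `APIError`, the bare-`APIError` branch must come after the status check, and its absence was what let these errors fall through to the generic case.

```python
class APIError(Exception): ...
class APIStatusError(APIError): ...

class LLMError(Exception): ...
class LLMBadRequestError(LLMError): ...

def handle_llm_error(e: Exception) -> LLMError:
    if isinstance(e, APIStatusError):
        return LLMError(f"status error: {e}")  # real code maps by status code
    if isinstance(e, APIError):
        # The previously missing branch: bare APIErrors (e.g. raised
        # mid-stream) now map to a bad-request error instead of falling
        # through to the generic catch-all below.
        return LLMBadRequestError(str(e))
    return LLMError("Unhandled")
```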

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat(core): add LLMInsufficientCreditsError for BYOK credit exhaustion

Adds dedicated error type for insufficient credits/quota across all
providers (OpenAI, Anthropic, Google). Returns HTTP 402 with
BYOK-aware messaging instead of generic 400.

- New LLMInsufficientCreditsError class and PAYMENT_REQUIRED ErrorCode
- is_insufficient_credits_message() helper detecting credit/quota strings
- All 3 provider clients detect 402 status + credit keywords
- FastAPI handler returns 402 with "your API key" vs generic messaging
- 5 new parametrized tests covering OpenRouter, OpenAI, and negative case

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00