Commit Graph

7282 Commits

Author SHA1 Message Date
Kevin Lin
4e2bf3ecd6 feat: add gpt-5.3-chat-latest model support (#9746)
Add OpenAI's GPT-5.3 Chat model (128K context, 16K output) with pricing
specs, and remove the "chat" keyword filter so chat variants are listed.

🐾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta Code <noreply@letta.com>
2026-03-03 18:34:15 -08:00
Charles Packer
774305d10a feat(web): device-mode refactor to use proper websocket typing (#9740)
* feat(web): device-mode approval clean v2 (functionally faithful, minimal diff)

* fix(web): do not block chat input on version mismatch

* fix(web): prevent stale queue ref from re-rendering dequeued item

* fix(agent-messenger): prevent stale queue rows from reappearing

* fix(typecheck): align messenger queue/control types across apps

* chore(review): address manifest/docs and typing feedback

* test(ui-ade-components): harden ws replay timeout for CI

* chore(api): sync autogenerated openapi artifacts

* test(ui-ade-components): force real timers in ws replay suite

* chore: resolve main conflict in FunctionCallPreview

* test(ui-ade): harden ws replay streaming CI timeout

* test(ui-ade): temporarily skip new device-mode suites for OOM triage
2026-03-03 18:34:15 -08:00
cthomas
416ffc7cd7 Add billing context to LLM telemetry traces (#9745)
* feat: add billing context to LLM telemetry traces

Add billing metadata (plan type, cost source, customer ID) to LLM traces in ClickHouse for cost analytics and attribution.

**Data Flow:**
- Cloud-API: Extract billing info from subscription in rate limiting, set x-billing-* headers
- Core: Parse headers into BillingContext object via dependencies
- Adapters: Flow billing_context through all LLM adapters (blocking & streaming)
- Agent: Pass billing_context to step() and stream() methods
- ClickHouse: Store in billing_plan_type, billing_cost_source, billing_customer_id columns

**Changes:**
- Add BillingContext schema to provider_trace.py
- Add billing columns to llm_traces ClickHouse table DDL
- Update getCustomerSubscription to fetch stripeCustomerId from organization_billing_details
- Propagate billing_context through agent step flow, adapters, and streaming service
- Update ProviderTrace and LLMTrace to include billing metadata
- Regenerate SDK with autogen

**Production Deployment:**
Requires env vars: LETTA_PROVIDER_TRACE_BACKEND=clickhouse, LETTA_STORE_LLM_TRACES=true, CLICKHOUSE_*

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add billing_context parameter to agent step methods

- Add billing_context to BaseAgent and BaseAgentV2 abstract methods
- Update LettaAgent, LettaAgentV2, LettaAgentV3 step methods
- Update multi-agent groups: SleeptimeMultiAgentV2, V3, V4
- Fix test_utils.py to include billing header parameters
- Import BillingContext in all affected files

* fix: add billing_context to stream methods

- Add billing_context parameter to BaseAgentV2.stream()
- Add billing_context parameter to LettaAgentV2.stream()
- LettaAgentV3.stream() already has it from previous commit

* fix: exclude billing headers from OpenAPI spec

Mark billing headers as internal (include_in_schema=False) so they don't appear in the public API.
These are internal headers between cloud-api and core, not part of the public SDK.

Regenerated SDK with stage-api - removes 10,650 lines of bloat that was causing OOM during Next.js build.

* refactor: return billing context from handleUnifiedRateLimiting instead of mutating req

Instead of passing req into handleUnifiedRateLimiting and mutating headers inside it:
- Return billing context fields (billingPlanType, billingCostSource, billingCustomerId) from handleUnifiedRateLimiting
- Set headers in handleMessageRateLimiting (middleware layer) after getting the result
- This fixes step-orchestrator compatibility since it doesn't have a real Express req object

* chore: remove extra gencode

* p

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:13 -08:00
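The header flow described in this commit (cloud-api sets x-billing-* headers, core parses them into a BillingContext) can be sketched in plain Python; the exact header names and fields here are illustrative, not the repo's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BillingContext:
    plan_type: Optional[str] = None
    cost_source: Optional[str] = None
    customer_id: Optional[str] = None

def parse_billing_headers(headers):
    # Missing headers yield None fields rather than an error, since the
    # headers are only set on requests that passed rate limiting.
    return BillingContext(
        plan_type=headers.get("x-billing-plan-type"),
        cost_source=headers.get("x-billing-cost-source"),
        customer_id=headers.get("x-billing-customer-id"),
    )

ctx = parse_billing_headers(
    {"x-billing-plan-type": "pro", "x-billing-customer-id": "cus_123"}
)
assert ctx.plan_type == "pro" and ctx.cost_source is None
```

Returning a context object (rather than mutating the request) is what the later handleUnifiedRateLimiting refactor in this commit moves toward.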
jnjpng
db9e0f42af fix(core): prevent ModelSettings default max_output_tokens from overriding agent config (#9739)
* fix(core): prevent ModelSettings default max_output_tokens from overriding agent config

When a conversation's model_settings were saved, the Pydantic default
of max_output_tokens=4096 was always persisted to the DB even when the
client never specified it. On subsequent messages, this default would
overwrite the agent's max_tokens (typically None) with 4096, silently
capping output.

Two changes:
1. Use model_dump(exclude_unset=True) when persisting model_settings
   to the DB so Pydantic defaults are not saved.
2. Add model_fields_set guards at all callsites that apply
   _to_legacy_config_params() to skip max_tokens when it was not
   explicitly provided by the caller.

Also conditionally set max_output_tokens in the OpenAI Responses API
request builder so None is not sent as null (which some models treat
as a hard 4096 cap).

* nit

* Fix model_settings serialization to preserve provider_type discriminator

Replace blanket exclude_unset=True with targeted removal of only
max_output_tokens when not explicitly set. The previous approach
stripped the provider_type field (a Literal with a default), which
broke discriminated union deserialization when reading back from DB.
2026-03-03 18:34:02 -08:00
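The serialization pitfall this commit fixes can be reproduced in a few lines (a minimal sketch assuming Pydantic v2; the model and field names are illustrative): blanket `exclude_unset=True` also drops the Literal-typed discriminator, while targeted removal via `model_fields_set` keeps it.

```python
from typing import Literal

from pydantic import BaseModel

class OpenAISettings(BaseModel):
    # The Literal default acts as the discriminator for the settings union.
    provider_type: Literal["openai"] = "openai"
    max_output_tokens: int = 4096

settings = OpenAISettings()  # caller never set max_output_tokens

# Blanket exclude_unset drops every defaulted field, including the
# discriminator, so the row read back from the DB can no longer be
# routed through the discriminated union.
assert settings.model_dump(exclude_unset=True) == {}

# Targeted removal: dump everything, then strip only the default that
# must not be persisted.
data = settings.model_dump()
if "max_output_tokens" not in settings.model_fields_set:
    data.pop("max_output_tokens")
assert data == {"provider_type": "openai"}
```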
Ari Webb
8335aa0fa0 fix: add some more logging for interrupts (#9733) 2026-03-03 18:34:02 -08:00
Christina Tong
c8ae02a1fb feat(core): sort agents by updated_at [LET-7771] (#9730)
feat(core): sort agents by last_updated_at
2026-03-03 18:34:02 -08:00
Shubham Naik
c247496027 Clean up server (#9728)
* feat: clean up production

* feat: clean up production

* feat: clean up production

* feat: clean up production

* feat: clean up production

* feat: clean up production

* feat: clean up production

* feat: clean up production
2026-03-03 18:34:02 -08:00
amysguan
8e60b73eee fix: minor change in upsert logic for prompt default (#9729)
minor compaction upsert change
2026-03-03 18:34:02 -08:00
amysguan
c28ba77354 Fix: ADE compaction button compacts current conversation (#9720)
* ADE compaction button compacts current conversation, update conversation endpoint

* update name (summerizer --> summarizer), type fixes

* bug fix for conversation + self_compact_sliding_window

* chore: add French translations for AgentSimulatorOptionsMenu

Add missing French translations for the AgentSimulatorOptionsMenu
section to match en.json changes.

Co-authored-by: Christina Tong <christinatong01@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* retrigger CI

* error typefix

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:02 -08:00
amysguan
7a4188dbda Add compaction settings to ADE (#9667)
* add compaction settings to ADE, add get default prompt for updated mode route

* update patch to auto set prompt on mode change, related ade changes

* reset api and update test

* feat: add compaction configuration translation keys for fr and cn

Add ADE/CompactionConfiguration translation keys to fr.json and cn.json
to match the new keys added in en.json.

Co-authored-by: Christina Tong <christinatong01@users.noreply.github.com>

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* type/translation/etc fixes

* fix typing

* update model selector path w/ change from main

* import mode from sdk

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:02 -08:00
amysguan
612a2ae98b Fix: Change Z.ai context window to account for max_token subtraction (#9710)
Fix the Z.ai context window (effectively [advertised context window] - [max output tokens]) and properly pass max tokens so Z.ai doesn't default to 65k for GLM-5
2026-03-03 18:34:02 -08:00
cthomas
aa66e81a71 feat: add debug logs in telem endpoint (#9723)
* feat: add debug logs in telem endpoint

* api sync

* fix: add debug_log_tail to FeedbackProperty type

Add debug_log_tail field to FeedbackProperty interface in service-analytics
to fix type error when sending debug log data in feedback and telemetry.

Also add e2e tests for feedback and error telemetry with debug_log_tail.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:02 -08:00
Sarah Wooders
a50482e6d3 feat(core): sync skills from SKILL.md into memFS blocks (#9718) 2026-03-03 18:34:02 -08:00
Kevin Lin
a11ba9710c feat(core): increase Gemini timeout to 10 minutes (#9714) 2026-03-03 18:34:02 -08:00
cthomas
ab784e702d feat: support default convo in list and cancel endpoints (#9707)
* feat: support default convo in list and cancel endpoints

* also support in compact endpoint

* api sync
2026-03-03 18:34:02 -08:00
cthomas
39a537a9a5 feat: add default convo support to conversations endpoint (#9706)
* feat: add default convo support to conversations endpoint

* api sync
2026-03-03 18:34:02 -08:00
Shubham Naik
fd4a8e73a5 More stream fixes (#9702)
* chore: more stream fixes

* chore: more stream fixes

* chore: more stream fixes

* chore: next

* chore: next

* chore: next
2026-03-03 18:34:02 -08:00
Ari Webb
673c1220a1 fix: strip properties for fireworks (#9703) 2026-03-03 18:34:02 -08:00
Sarah Wooders
57e7e0e52b feat(core): reserve skills in memfs sync and list top-level skill directory [LET-7710] (#9691) 2026-03-03 18:34:02 -08:00
jnjpng
750b83a2ea fix: update conversation manager tests for auto-generated system message (#9687)
fix: update Anthropic mock to match real SDK's sync list() signature

The real Anthropic SDK's models.list() is a regular (non-async) method
that returns an AsyncPaginator (async-iterable). The mock used async def,
causing `async for model in client.models.list()` to iterate over a
coroutine instead of the page, silently failing with 0 models synced.
2026-03-03 18:34:02 -08:00
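The mock mismatch is easy to reproduce with stdlib asyncio (the classes below are stand-ins, not the real Anthropic SDK types): `async for` needs an object with `__aiter__`, so a mock declared `async def list()` hands it a coroutine and fails.

```python
import asyncio

class Model:
    def __init__(self, model_id):
        self.id = model_id

class AsyncPage:
    """Stand-in for the SDK's async-iterable paginator."""
    def __init__(self, items):
        self._items = list(items)
    def __aiter__(self):
        self._index = 0
        return self
    async def __anext__(self):
        if self._index >= len(self._items):
            raise StopAsyncIteration
        item = self._items[self._index]
        self._index += 1
        return item

class GoodModels:
    def list(self):  # regular method, like the real SDK
        return AsyncPage([Model("a"), Model("b")])

class BadModels:
    async def list(self):  # WRONG: callers get a coroutine, not the page
        return AsyncPage([Model("a")])

async def sync_models(models):
    return [m.id async for m in models.list()]

ids = asyncio.run(sync_models(GoodModels()))
assert ids == ["a", "b"]

try:
    asyncio.run(sync_models(BadModels()))
    raised = False
except TypeError:  # 'async for' requires __aiter__, got a coroutine
    raised = True
assert raised
```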
cthomas
28a66fa9d7 chore: remove stmt timeout debug logging (#9693) 2026-03-03 18:34:02 -08:00
github-actions[bot]
f54ae7c929 feat: render description for non-system files in memory_filesystem tree (#9688)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:02 -08:00
github-actions[bot]
bf80de214d feat: change default context window from 32000 to 128000 (#9673)
* feat: change default context window from 32000 to 128000

Update DEFAULT_CONTEXT_WINDOW and global_max_context_window_limit from
32000 to 128000. Also update all .af (agent files), cypress test
fixtures, and integration tests to use the new default.

Closes #9672

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): update conversation manager tests for auto-created system message

create_conversation now auto-creates a system message at position 0
(from #9508), but the test assertions weren't updated. Adjust expected
message counts and ordering to account for the initial system message.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): fix mock Anthropic models.list() to return async iterable, not coroutine

The real Anthropic SDK's models.list() returns an AsyncPage (with __aiter__)
directly, but the mock used `async def list()` which returns a coroutine.
The code does `async for model in client.models.list()` which needs an
async iterable, not a coroutine. Fix by making list() a regular method.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:01 -08:00
Shubham Naik
357a3ad15b Shub/let 7721 make env permanent [LET-7721] (#9683)
* chore: env permanent

* chore: env permanent

* feat: add persistent environments with hybrid DB + Redis storage [LET-7721]

Implements persistent storage for letta-code listener connections (environments) with hybrid PostgreSQL + Redis architecture:

**Database Layer:**
- Add `environments` table with device tracking, connection metadata, soft deletes
- Store userId/apiKeyOwner, connection history (firstSeenAt, lastSeenAt)
- Unique constraint on (organizationId, deviceId) - one environment per device per org
- Auto-undelete previously deleted environments on reconnect

**API Layer:**
- Update environmentsContract with new fields (id, firstSeenAt, lastSeenAt, metadata)
- Add deleteEnvironment endpoint (soft delete, closes WebSocket if online)
- Add onlineOnly filter to listConnections for efficient online-only queries
- Export ListConnectionsResponse type for proper client typing

**Router Implementation:**
- register(): Create/update DB environment, generate ephemeral connectionId
- listConnections(): Hybrid query strategy (DB-first for all, Redis-first for onlineOnly)
- deleteEnvironment(): Soft delete with Redis Pub/Sub for graceful WebSocket close
- Filter by connectionId in DB using inArray() for onlineOnly performance

**WebSocket Handler:**
- Moved from apps/cloud-api to libs/utils-server for reusability
- Update DB on connect/disconnect only (not heartbeat) - minimal write load
- Store currentPodId and userId/apiKeyOwner on connect
- Clear currentConnectionId/currentPodId on disconnect/error

**Shared Types:**
- Add EnvironmentMetadata interface in libs/types for cross-layer consistency
- Update Redis schema to include currentMode field

**UI Components:**
- Add DeleteDeviceModal with offline-only restriction
- Update DeviceSelector with delete button on hover for offline devices
- Proper cache updates using ListConnectionsResponse type
- Add translations for delete modal

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* docs: update letta remote setup instructions [LET-7721]

Update local setup guide with clearer instructions:
- Remove hardcoded ngrok URL requirement (ngrok generates URL automatically)
- Update env var to use CLOUD_API_ENDPOINT_OVERRIDE
- Add proper API key and base URL format
- Include alternative setup using letta-code repo with bun dev

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: fix env

* fix: lint errors and make migration idempotent [LET-7721]

- Remove unused imports (HiddenOnMobile, VisibleOnMobile, MiddleTruncate)
- Fix type imports (use `import type` for type-only imports)
- Remove non-null assertions in environmentsRouter (use safe null checks + filter)
- Make migration idempotent with IF NOT EXISTS for table, indexes, and constraints
- Use DO $$ block for foreign key constraint (handles duplicate_object exception)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: fix env

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
jnjpng
46971414a4 fix: preserve agent max_tokens when caller doesn't explicitly set it (#9679)
* fix: preserve agent max_tokens when caller doesn't explicitly set it

When updating an agent with convenience fields (model, model_settings)
but without an explicit max_tokens, the server was constructing a fresh
LLMConfig via get_llm_config_from_handle_async. The Pydantic validator
on LLMConfig hardcodes max_tokens=16384 for gpt-5* models, silently
overriding the agent's existing value (e.g. 128000).

This was triggered by reasoning tab-switch in the CLI, which sends
model + model_settings (with reasoning_effort) but no max_tokens.

Now, when request.max_tokens is None we carry forward the agent's
current max_tokens instead of accepting the provider default.

* fix: use correct 128k max_output_tokens defaults for gpt-5.2/5.3

- Update OpenAI provider fallback to return 128000 for gpt-5.2*/5.3*
  models (except -chat variants which are 16k)
- Update LLMConfig Pydantic validator to match
- Update gpt-5.2 default_config factory to use 128000
- Move server-side max_tokens preservation guard into the
  model_settings branch where llm_config is already available
2026-03-03 18:34:01 -08:00
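The carry-forward rule from this commit can be stated as a tiny helper (illustrative, not the repo's actual function):

```python
def resolve_max_tokens(request_max_tokens, agent_max_tokens, provider_default):
    # An explicit request value wins; caller silence carries forward the
    # agent's current value instead of resetting to the provider default.
    if request_max_tokens is not None:
        return request_max_tokens
    if agent_max_tokens is not None:
        return agent_max_tokens
    return provider_default

# A reasoning tab-switch sends no max_tokens: keep the agent's 128000
# rather than falling back to the hardcoded gpt-5* default of 16384.
assert resolve_max_tokens(None, 128000, 16384) == 128000
assert resolve_max_tokens(4096, 128000, 16384) == 4096
assert resolve_max_tokens(None, None, 16384) == 16384
```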
Shubham Naik
5d55d4ccd4 chore: rebuild docs (#9674) 2026-03-03 18:34:01 -08:00
cthomas
1fb355a39a fix: override stop reason for streaming for empty response (#9663) 2026-03-03 18:34:01 -08:00
jnjpng
bd6f2e792c feat: accept recent_chunks in error telemetry schema (#9662)
* feat: accept recent_chunks in error telemetry schema

Add recent_chunks field to ErrorDataSchema (Zod) and
LettaCodeErrorProperty (analytics type) so the server can receive
and forward chunk diagnostics attached to error telemetry events.

* chore: regenerate openapi with recent_chunks field
2026-03-03 18:34:01 -08:00
Ari Webb
dd0e513951 fix: lazy load conversations [LET-7682] (#9629)
fix: lazy load conversations
2026-03-03 18:34:01 -08:00
Charles Packer
1555c338b6 fix(core-tests): update SDK blocks limit expectation to 100k (#9653)
fix(core-tests): align blocks sdk expected limit with 100k default
2026-03-03 18:34:01 -08:00
cthomas
9422b2d993 fix: set otid for all approval request messages (#9655) 2026-03-03 18:34:01 -08:00
cthomas
1448609ecf fix: set otid for summary message (#9654) 2026-03-03 18:34:01 -08:00
cthomas
3d781efd21 fix(core): raise LLMEmptyResponseError for empty Anthropic responses (#9624)
* fix(core): raise LLMEmptyResponseError for empty Anthropic responses

Fixes LET-7679: Opus 4.6 occasionally returns empty responses (no content
and no tool calls), causing silent failures with stop_reason=end_turn.

Changes:
- Add LLMEmptyResponseError class (subclass of LLMServerError)
- Raise error in anthropic_client for empty non-streaming responses
- Raise error in anthropic_streaming_interface for empty streaming responses
- Pass through LLMError instances in handle_llm_error to preserve specific types
- Add test for empty streaming response detection

This allows clients (letta-code) to catch this specific error and implement
retry logic with cache-busting modifications.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): set invalid_llm_response stop reason for empty responses

Catch LLMEmptyResponseError specifically and set stop_reason to
invalid_llm_response instead of llm_api_error. This allows clients
to distinguish empty responses from transient API errors.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
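A minimal sketch of the error hierarchy and check described above (the class names come from the commit; `check_response` is a hypothetical helper, not the repo's actual code path):

```python
class LLMServerError(Exception):
    pass

class LLMEmptyResponseError(LLMServerError):
    """Provider returned no content and no tool calls."""

def check_response(content, tool_calls):
    # Previously an empty response passed through silently with
    # stop_reason=end_turn; raising a dedicated subclass lets clients
    # catch it and retry with a cache-busting modification.
    if not content and not tool_calls:
        raise LLMEmptyResponseError("empty response: no content or tool calls")

check_response("Hello", [])          # fine
check_response("", [{"name": "t"}])  # fine

try:
    check_response("", [])
    caught = False
except LLMEmptyResponseError:
    caught = True
assert caught
assert issubclass(LLMEmptyResponseError, LLMServerError)
```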
github-actions[bot]
86ff216dc9 fix: update tests for CORE_MEMORY_BLOCK_CHAR_LIMIT increase to 100k (#9645)
Tests were failing because they relied on the old default limit of 20,000:

- test_memory.py: "x " * 50000 = 100,000 chars now equals the limit
  instead of exceeding it. Increased to "x " * 60000 (120k chars).

- test_block_manager.py: Block created with default limit (now 100k),
  then 30k char update no longer exceeds it. Set explicit limit=20000
  on the test block to preserve the test intent.

- test_log_context_middleware.py: Removed stale `limit: 20000` from
  dummy frontmatter fixtures to match new serialization behavior.

Related to #9537

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
2026-03-03 18:34:01 -08:00
Kevin Lin
895acb9f4e feat(core): add gpt-5.3-codex model support (#9628)
* feat(core): add gpt-5.3-codex model support

Add OpenAI gpt-5.3-codex model: context window overrides, model pricing
and capabilities, none-reasoning-effort support, and test config.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

---------

Co-authored-by: Letta <noreply@letta.com>
2026-03-03 18:34:01 -08:00
Kian Jones
ddfa922cde fix(core): prevent event loop saturation from ClickHouse and socket trace writes (#9617)
* fix(core): prevent event loop saturation from ClickHouse and socket trace writes

Two issues were causing the event loop watchdog to fire and liveness probes
to fail under load:

1. LLMTraceWriter held an asyncio.Lock across each ClickHouse write, and
   wait_for_async_insert=1 meant each write held that lock for ~1s. Under high
   request volume, N background tasks all queued for the lock simultaneously,
   saturating the event loop with task management overhead. Fix: switch to
   wait_for_async_insert=0 (ClickHouse async_insert handles server-side batching
   — no acknowledgment wait needed) and remove the lock (clickhouse_connect uses
   a thread-safe connection pool). The sync insert still runs in asyncio.to_thread
   so it never blocks the event loop. No traces are dropped.

2. SocketProviderTraceBackend spawned one OS thread per trace with a 60s socket
   timeout. During crouton restarts, threads accumulated blocking on sock.sendall
   for up to 3 minutes each (3 retries x 60s). Fix: reduce socket timeout from
   60s to 5s — the socket is local (Unix socket), so 5s is already generous, and
   fast failure lets retries resolve before threads pile up.

Root cause analysis: event_loop_watchdog.py was detecting saturation (lag >2s)
every ~60s on gke-letta-default-pool-c6915745-fmq6 via thread dumps. The
saturated event loop caused k8s liveness probes to time out, triggering restarts.

* chore(core): sync socket backend with main and document ClickHouse thread safety
2026-03-03 18:34:01 -08:00
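The fix in point 1 (drop the shared asyncio.Lock, keep the sync insert in a worker thread) can be sketched with stdlib asyncio; the blocking insert is simulated with a sleep:

```python
import asyncio
import time

def blocking_insert(row):
    # Stand-in for the synchronous ClickHouse insert; the real client's
    # connection pool is thread-safe, so no asyncio.Lock is needed.
    time.sleep(0.05)
    return row

async def write_trace(row):
    # to_thread keeps the blocking call off the event loop, and without
    # a shared lock the writes overlap instead of queueing.
    return await asyncio.to_thread(blocking_insert, row)

async def main():
    start = time.monotonic()
    rows = await asyncio.gather(*(write_trace(i) for i in range(10)))
    elapsed = time.monotonic() - start
    # Serialized behind a single lock these ten 50ms writes would take
    # at least 0.5s; overlapped they finish in a few inserts' latency.
    assert elapsed < 0.4
    return rows

assert asyncio.run(main()) == list(range(10))
```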
github-actions[bot]
94fc05b6e5 feat: remove limit from git-base memory frontmatter and increase default to 100k (#9537)
- Remove `limit` from YAML frontmatter in `serialize_block()` and
  `merge_frontmatter_with_body()` (deprecated for git-base memory)
- Remove `limit` from `_render_memory_blocks_git()` in-context rendering
- Existing frontmatter with `limit` is automatically cleaned up on next write
- Parsing still accepts `limit` from frontmatter for backward compatibility
- Increase `CORE_MEMORY_BLOCK_CHAR_LIMIT` from 20,000 to 100,000
- Update integration tests to assert `limit` is not in frontmatter

Fixes #9536

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
2026-03-03 18:34:01 -08:00
github-actions[bot]
0020f4b866 feat: recompile system message on new conversation creation (#9508)
* feat: recompile system message on new conversation creation

When a new conversation is created, the system prompt is now recompiled
with the latest memory block values and metadata instead of starting
with no messages. This ensures each conversation captures the current
agent state at creation time.

- Add _initialize_conversation_system_message to ConversationManager
- Compile fresh system message using PromptGenerator during conversation creation
- Add integration tests for the full workflow (modify memory → new conversation
  gets updated system message)
- Update existing test expectations for non-empty conversation messages

Fixes #9507

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

* refactor: deduplicate system message compilation into ConversationManager

Consolidate the duplicate system message compilation logic into a single
shared method `compile_and_save_system_message_for_conversation` on
ConversationManager. This method accepts optional pre-loaded agent_state
and message_manager to avoid redundant DB loads when callers already have
them.

- Renamed _initialize_conversation_system_message → compile_and_save_system_message_for_conversation (public, reusable)
- Added optional agent_state and message_manager params
- Replaced 40-line duplicate in helpers.py with a 7-line call to the shared method
- Method returns the persisted system message for caller use

Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>

---------

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
2026-03-03 18:34:01 -08:00
cthomas
1b2aa98b3e chore: bump version 0.16.5 (#3202) 2026-02-24 11:02:17 -08:00
Caren Thomas
ce54fb1a00 bump version 2026-02-24 10:58:16 -08:00
amysguan
47b0c87ebe Add modes self and self_sliding_window for prompt caching (#9372)
* add self compaction method with proper caching (pass in tools, don't refresh sys prompt beforehand) + sliding fallback

* updated prompts for self compaction

* add tests for self, self_sliding_window modes and w/o refresh messages before compaction

* add cache logging to summarization

* better handling to prevent agent from continuing convo on self modes

* if mode changes via summarize endpoint, will use default prompt for the new mode

---------

Co-authored-by: Amy Guan <amy@letta.com>
2026-02-24 10:55:26 -08:00
Ari Webb
47d55362a4 fix: models need to be paginated (#9621) 2026-02-24 10:55:26 -08:00
cthomas
8ab9d78a23 chore: cleanup (#9602)
* chore: cleanup

* update dependencies
2026-02-24 10:55:26 -08:00
cthomas
db418d99f4 test: remove sonnet 3-7 reference (#9618) 2026-02-24 10:55:26 -08:00
Ari Webb
c325b2b002 feat: add memfs file list and read endpoints to cloud-api [LET-7437] (#9520)
* feat: add memfs file list and read endpoints to cloud-api

* fix ci

* add env var

* tests and refactor memoryFilesRouter

* rename env var

* fix path parameter error

* fix test

* stage publish api

* memfs in helm
2026-02-24 10:55:26 -08:00
jnjpng
5505e9cf4b fix(core): suppress missing-otid warning for compaction events (#9616)
fix(core): skip missing-otid warning for compaction events
2026-02-24 10:55:26 -08:00
Sarah Wooders
d77c2950dc chore: update APIs for conversations (#9614) 2026-02-24 10:55:26 -08:00
Ari Webb
62967bcca0 feat: parallel tool calling minimax provider [LET-7647] (#9613)
* feat: parallel tool calling minimax provider

* stage publish api
2026-02-24 10:55:26 -08:00
jnjpng
a59f24ac87 fix(core): ensure buffered Anthropic tool chunks always include otid (#9516)
fix(core): ensure otid exists when flushing buffered anthropic tool chunks

Anthropic TOOL_USE buffering can emit buffered tool_call/approval chunks on content block stop before otid is assigned in the normal inner_thoughts_complete path. Ensure flush-time chunks get a deterministic otid so streaming clients can reliably correlate deltas.

👾 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:26 -08:00
Shubham Naik
f082fd5061 feat: add order_by and order params to /v1/conversations list endpoin… (#9599)
* feat: add order_by and order params to /v1/conversations list endpoint [LET-7628]

Added sorting support to the conversations list endpoint, matching the pattern from /v1/agents.

**API Changes:**
- Added `order` query param: "asc" or "desc" (default: "desc")
- Added `order_by` query param: "created_at" or "last_run_completion" (default: "created_at")

**Implementation:**

**created_at ordering:**
- Simple ORDER BY on ConversationModel.created_at
- No join required, fast query
- Nulls not applicable (created_at always set)

**last_run_completion ordering:**
- LEFT JOIN with runs table using subquery
- Subquery: MAX(completed_at) grouped by conversation_id
- Uses OUTER JOIN so conversations with no runs are included
- Nulls last ordering (conversations with no runs go to end)
- Index on runs.conversation_id ensures performant join

**Pagination:**
- Cursor-based pagination with `after` parameter
- Handles null values correctly for last_run_completion
- For created_at: simple timestamp comparison
- For last_run_completion: complex null-aware cursor logic

**Performance:**
- Existing index: `ix_runs_conversation_id` on runs table
- Subquery with GROUP BY is efficient for this use case
- OUTER JOIN ensures conversations without runs are included

**Follows agents pattern:**
- Same parameter names (order, order_by)
- Same Literal types and defaults
- Converts "asc"/"desc" to ascending boolean internally

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: order

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:55:26 -08:00
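The nulls-last descending order for last_run_completion can be sketched in plain Python (illustrative rows; the real implementation is SQL with a LEFT JOIN on the runs subquery):

```python
from datetime import datetime, timezone

conversations = [
    {"id": "c1", "last_run_completion": datetime(2026, 1, 2, tzinfo=timezone.utc)},
    {"id": "c2", "last_run_completion": None},  # conversation with no runs
    {"id": "c3", "last_run_completion": datetime(2026, 1, 5, tzinfo=timezone.utc)},
]

def order_desc_nulls_last(rows):
    # The first key pushes null completions to the end; the second sorts
    # the rest newest-first, matching ORDER BY ... DESC NULLS LAST.
    return sorted(
        rows,
        key=lambda r: (
            r["last_run_completion"] is None,
            -r["last_run_completion"].timestamp() if r["last_run_completion"] else 0,
        ),
    )

ordered = [r["id"] for r in order_desc_nulls_last(conversations)]
assert ordered == ["c3", "c1", "c2"]
```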