* feat: add debug logs in telem endpoint
* api sync
* fix: add debug_log_tail to FeedbackProperty type
Add debug_log_tail field to FeedbackProperty interface in service-analytics
to fix type error when sending debug log data in feedback and telemetry.
Also add e2e tests for feedback and error telemetry with debug_log_tail.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
fix: update Anthropic mock to match real SDK's sync list() signature
The real Anthropic SDK's models.list() is a regular (non-async) method
that returns an AsyncPaginator (async-iterable). The mock used async def,
causing `async for model in client.models.list()` to iterate over a
coroutine instead of the page, silently failing with 0 models synced.
* feat: change default context window from 32000 to 128000
Update DEFAULT_CONTEXT_WINDOW and global_max_context_window_limit from
32000 to 128000. Also update all .af (agent files), cypress test
fixtures, and integration tests to use the new default.
Closes #9672
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): update conversation manager tests for auto-created system message
create_conversation now auto-creates a system message at position 0
(from #9508), but the test assertions weren't updated. Adjust expected
message counts and ordering to account for the initial system message.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): fix mock Anthropic models.list() to return async iterable, not coroutine
The real Anthropic SDK's models.list() returns an AsyncPage (with __aiter__)
directly, but the mock used `async def list()` which returns a coroutine.
The code does `async for model in client.models.list()` which needs an
async iterable, not a coroutine. Fix by making list() a regular method.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
* chore: env permanent
* chore: env permanent
* feat: add persistent environments with hybrid DB + Redis storage [LET-7721]
Implements persistent storage for letta-code listener connections (environments) with hybrid PostgreSQL + Redis architecture:
**Database Layer:**
- Add `environments` table with device tracking, connection metadata, soft deletes
- Store userId/apiKeyOwner, connection history (firstSeenAt, lastSeenAt)
- Unique constraint on (organizationId, deviceId) - one environment per device per org
- Auto-undelete previously deleted environments on reconnect
**API Layer:**
- Update environmentsContract with new fields (id, firstSeenAt, lastSeenAt, metadata)
- Add deleteEnvironment endpoint (soft delete, closes WebSocket if online)
- Add onlineOnly filter to listConnections for efficient online-only queries
- Export ListConnectionsResponse type for proper client typing
**Router Implementation:**
- register(): Create/update DB environment, generate ephemeral connectionId
- listConnections(): Hybrid query strategy (DB-first for all, Redis-first for onlineOnly)
- deleteEnvironment(): Soft delete with Redis Pub/Sub for graceful WebSocket close
- Filter by connectionId in DB using inArray() for onlineOnly performance
**WebSocket Handler:**
- Moved from apps/cloud-api to libs/utils-server for reusability
- Update DB on connect/disconnect only (not heartbeat) - minimal write load
- Store currentPodId and userId/apiKeyOwner on connect
- Clear currentConnectionId/currentPodId on disconnect/error
**Shared Types:**
- Add EnvironmentMetadata interface in libs/types for cross-layer consistency
- Update Redis schema to include currentMode field
**UI Components:**
- Add DeleteDeviceModal with offline-only restriction
- Update DeviceSelector with delete button on hover for offline devices
- Proper cache updates using ListConnectionsResponse type
- Add translations for delete modal
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* docs: update letta remote setup instructions [LET-7721]
Update local setup guide with clearer instructions:
- Remove hardcoded ngrok URL requirement (ngrok generates URL automatically)
- Update env var to use CLOUD_API_ENDPOINT_OVERRIDE
- Add proper API key and base URL format
- Include alternative setup using letta-code repo with bun dev
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: fix env
* fix: lint errors and make migration idempotent [LET-7721]
- Remove unused imports (HiddenOnMobile, VisibleOnMobile, MiddleTruncate)
- Fix type imports (use `import type` for type-only imports)
- Remove non-null assertions in environmentsRouter (use safe null checks + filter)
- Make migration idempotent with IF NOT EXISTS for table, indexes, and constraints
- Use DO $$ block for foreign key constraint (handles duplicate_object exception)
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: fix env
---------
Co-authored-by: Letta <noreply@letta.com>
* fix: preserve agent max_tokens when caller doesn't explicitly set it
When updating an agent with convenience fields (model, model_settings)
but without an explicit max_tokens, the server was constructing a fresh
LLMConfig via get_llm_config_from_handle_async. The Pydantic validator
on LLMConfig hardcodes max_tokens=16384 for gpt-5* models, silently
overriding the agent's existing value (e.g. 128000).
This was triggered by reasoning tab-switch in the CLI, which sends
model + model_settings (with reasoning_effort) but no max_tokens.
Now, when request.max_tokens is None we carry forward the agent's
current max_tokens instead of accepting the provider default.
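The carry-forward guard can be sketched as follows (function and field names are assumptions based on this description, not the actual server code):

```python
def resolve_max_tokens(request_max_tokens, current_llm_config, new_llm_config):
    """Prefer an explicit request value; otherwise keep the agent's
    existing max_tokens instead of the provider default baked into the
    freshly built LLMConfig."""
    if request_max_tokens is not None:
        return request_max_tokens
    if current_llm_config is not None and current_llm_config.max_tokens is not None:
        # Carry forward e.g. 128000 rather than the hardcoded 16384
        # the LLMConfig validator applies for gpt-5* models.
        return current_llm_config.max_tokens
    return new_llm_config.max_tokens
```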
* fix: use correct 128k max_output_tokens defaults for gpt-5.2/5.3
- Update OpenAI provider fallback to return 128000 for gpt-5.2*/5.3*
models (except -chat variants which are 16k)
- Update LLMConfig Pydantic validator to match
- Update gpt-5.2 default_config factory to use 128000
- Move server-side max_tokens preservation guard into the
model_settings branch where llm_config is already available
* feat: accept recent_chunks in error telemetry schema
Add recent_chunks field to ErrorDataSchema (Zod) and
LettaCodeErrorProperty (analytics type) so the server can receive
and forward chunk diagnostics attached to error telemetry events.
* chore: regenerate openapi with recent_chunks field
* fix(core): raise LLMEmptyResponseError for empty Anthropic responses
Fixes LET-7679: Opus 4.6 occasionally returns empty responses (no content
and no tool calls), causing silent failures with stop_reason=end_turn.
Changes:
- Add LLMEmptyResponseError class (subclass of LLMServerError)
- Raise error in anthropic_client for empty non-streaming responses
- Raise error in anthropic_streaming_interface for empty streaming responses
- Pass through LLMError instances in handle_llm_error to preserve specific types
- Add test for empty streaming response detection
This allows clients (letta-code) to catch this specific error and implement
retry logic with cache-busting modifications.
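A sketch of the new error type and the emptiness check (class and function shapes are assumed from this message; the real code lives in Letta's errors module and anthropic_client):

```python
class LLMServerError(Exception):
    pass


class LLMEmptyResponseError(LLMServerError):
    """Raised when the provider returns no content and no tool calls."""


def check_anthropic_response(content_blocks, tool_calls):
    # An "empty" response can still carry stop_reason=end_turn, so the
    # absence of both content and tool calls is the actual signal.
    if not content_blocks and not tool_calls:
        raise LLMEmptyResponseError(
            "Anthropic returned an empty response (no content, no tool calls)"
        )
```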
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): set invalid_llm_response stop reason for empty responses
Catch LLMEmptyResponseError specifically and set stop_reason to
invalid_llm_response instead of llm_api_error. This allows clients
to distinguish empty responses from transient API errors.
🤖 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Tests were failing because they relied on the old default limit of 20,000:
- test_memory.py: "x " * 50000 = 100,000 chars now equals the limit
instead of exceeding it. Increased to "x " * 60000 (120k chars).
- test_block_manager.py: Block created with default limit (now 100k),
then 30k char update no longer exceeds it. Set explicit limit=20000
on the test block to preserve the test intent.
- test_log_context_middleware.py: Removed stale `limit: 20000` from
dummy frontmatter fixtures to match new serialization behavior.
Related to #9537
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
* feat(core): add gpt-5.3-codex model support
Add OpenAI gpt-5.3-codex model: context window overrides, model pricing
and capabilities, none-reasoning-effort support, and test config.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* just stage-api && just publish-api
---------
Co-authored-by: Letta <noreply@letta.com>
* fix(core): prevent event loop saturation from ClickHouse and socket trace writes
Two issues were causing the event loop watchdog to fire and liveness probes
to fail under load:
1. LLMTraceWriter held an asyncio.Lock across each ClickHouse write, and
wait_for_async_insert=1 meant each write held that lock for ~1s. Under high
request volume, N background tasks all queued for the lock simultaneously,
saturating the event loop with task management overhead. Fix: switch to
wait_for_async_insert=0 (ClickHouse async_insert handles server-side batching
— no acknowledgment wait needed) and remove the lock (clickhouse_connect uses
a thread-safe connection pool). The sync insert still runs in asyncio.to_thread
so it never blocks the event loop. No traces are dropped.
2. SocketProviderTraceBackend spawned one OS thread per trace with a 60s socket
timeout. During crouton restarts, threads accumulated blocking on sock.sendall
for up to 3 minutes each (3 retries x 60s). Fix: reduce socket timeout from
60s to 5s — the socket is local (Unix socket), so 5s is already generous, and
fast failure lets retries resolve before threads pile up.
Root cause analysis: event_loop_watchdog.py was detecting saturation (lag >2s)
every ~60s on gke-letta-default-pool-c6915745-fmq6 via thread dumps. The
saturated event loop caused k8s liveness probes to time out, triggering restarts.
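The post-fix write path for issue 1 can be sketched like this (class and table names are assumptions; the point is the absence of the asyncio.Lock and the per-insert settings):

```python
import asyncio


class LLMTraceWriter:
    def __init__(self, client):
        # e.g. a clickhouse_connect client; its pool is thread-safe,
        # so no asyncio.Lock is needed around inserts.
        self._client = client

    async def write_trace(self, rows):
        # wait_for_async_insert=0: don't block ~1s waiting for the
        # server to ack the batch; async_insert batches server-side.
        # The blocking insert runs in a worker thread, so the event
        # loop is never held.
        await asyncio.to_thread(
            self._client.insert,
            "llm_traces",
            rows,
            settings={"async_insert": 1, "wait_for_async_insert": 0},
        )
```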
* chore(core): sync socket backend with main and document ClickHouse thread safety
- Remove `limit` from YAML frontmatter in `serialize_block()` and
`merge_frontmatter_with_body()` (deprecated for git-based memory)
- Remove `limit` from `_render_memory_blocks_git()` in-context rendering
- Existing frontmatter with `limit` is automatically cleaned up on next write
- Parsing still accepts `limit` from frontmatter for backward compatibility
- Increase `CORE_MEMORY_BLOCK_CHAR_LIMIT` from 20,000 to 100,000
- Update integration tests to assert `limit` is not in frontmatter
Fixes #9536
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
* feat: recompile system message on new conversation creation
When a new conversation is created, the system prompt is now recompiled
with the latest memory block values and metadata instead of starting
with no messages. This ensures each conversation captures the current
agent state at creation time.
- Add _initialize_conversation_system_message to ConversationManager
- Compile fresh system message using PromptGenerator during conversation creation
- Add integration tests for the full workflow (modify memory → new conversation
gets updated system message)
- Update existing test expectations for non-empty conversation messages
Fixes#9507
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
* refactor: deduplicate system message compilation into ConversationManager
Consolidate the duplicate system message compilation logic into a single
shared method `compile_and_save_system_message_for_conversation` on
ConversationManager. This method accepts optional pre-loaded agent_state
and message_manager to avoid redundant DB loads when callers already have
them.
- Renamed _initialize_conversation_system_message → compile_and_save_system_message_for_conversation (public, reusable)
- Added optional agent_state and message_manager params
- Replaced 40-line duplicate in helpers.py with a 7-line call to the shared method
- Method returns the persisted system message for caller use
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
---------
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
* add self compaction method with proper caching (pass in tools, don't refresh sys prompt beforehand) + sliding fallback
* updated prompts for self compaction
* add tests for self, self_sliding_window modes and w/o refresh messages before compaction
* add cache logging to summarization
* better handling to prevent agent from continuing convo on self modes
* if mode changes via summarize endpoint, will use default prompt for the new mode
---------
Co-authored-by: Amy Guan <amy@letta.com>
* feat: add memfs file list and read endpoints to cloud-api
* fix ci
* add env var
* tests and refactor memoryFilesRouter
* rename env var
* fix path parameter error
* fix test
* stage publish api
* memfs in helm
fix(core): ensure otid exists when flushing buffered anthropic tool chunks
Anthropic TOOL_USE buffering can emit buffered tool_call/approval chunks on content-block stop before an otid is assigned in the normal inner_thoughts_complete path. Ensure flush-time chunks get a deterministic otid so streaming clients can reliably correlate deltas.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* feat: add order_by and order params to /v1/conversations list endpoint [LET-7628]
Added sorting support to the conversations list endpoint, matching the pattern from /v1/agents.
**API Changes:**
- Added `order` query param: "asc" or "desc" (default: "desc")
- Added `order_by` query param: "created_at" or "last_run_completion" (default: "created_at")
**Implementation:**
**created_at ordering:**
- Simple ORDER BY on ConversationModel.created_at
- No join required, fast query
- Nulls not applicable (created_at always set)
**last_run_completion ordering:**
- LEFT JOIN with runs table using subquery
- Subquery: MAX(completed_at) grouped by conversation_id
- Uses OUTER JOIN so conversations with no runs are included
- Nulls last ordering (conversations with no runs go to end)
- Index on runs.conversation_id ensures performant join
**Pagination:**
- Cursor-based pagination with `after` parameter
- Handles null values correctly for last_run_completion
- For created_at: simple timestamp comparison
- For last_run_completion: complex null-aware cursor logic
**Performance:**
- Existing index: `ix_runs_conversation_id` on runs table
- Subquery with GROUP BY is efficient for this use case
- OUTER JOIN ensures conversations without runs are included
**Follows agents pattern:**
- Same parameter names (order, order_by)
- Same Literal types and defaults
- Converts "asc"/"desc" to ascending boolean internally
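The null-aware ordering and cursor semantics above can be illustrated in pure Python (the real implementation is SQL in the manager; names here are for illustration only):

```python
def sort_key_desc(conv):
    # Descending by last_run_completion, nulls last, id as tiebreaker.
    ts = conv["last_run_completion"]
    # (0, -ts, id) sorts timestamped conversations newest-first;
    # (1, id) puts conversations with no runs after all of them.
    return (1, conv["id"]) if ts is None else (0, -ts, conv["id"])


def page_after(convs, after_id=None, limit=2):
    # Cursor pagination: skip everything up to and including `after_id`
    # in the fully ordered sequence, then take `limit` rows.
    ordered = sorted(convs, key=sort_key_desc)
    if after_id is not None:
        idx = next(i for i, c in enumerate(ordered) if c["id"] == after_id)
        ordered = ordered[idx + 1:]
    return ordered[:limit]
```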
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: order
---------
Co-authored-by: Letta <noreply@letta.com>
* feat: add recent_chunks field to feedback endpoint
Accept an optional recent_chunks string in the /v1/metadata/feedback
POST body and forward it to PostHog. Contains the last 100 truncated
streaming chunks as JSON for debugging issues reported via /feedback.
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate openapi specs with recent_chunks field
---------
Co-authored-by: Letta <noreply@letta.com>
fix: align ChatGPT OAuth GPT-5 max output token defaults
Update ChatGPT OAuth provider defaults so GPT-5 family models report 128k max output tokens based on current OpenAI model docs, avoiding incorrect 16k values in /v1/models responses.
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* feat: make agent_id optional in conversations list endpoint [LET-7612]
Allow listing all conversations without filtering by agent_id.
**Router changes (conversations.py):**
- Changed agent_id from required (`...`) to optional (`None`)
- Updated description to clarify behavior
- Updated docstring to reflect optional filtering
**Manager changes (conversation_manager.py):**
- Updated list_conversations signature: agent_id: str → Optional[str]
- Updated docstring to clarify optional behavior
- Summary search query: conditionally adds agent_id filter only if provided
- Default list logic: passes agent_id (can be None) to list_async
**How it works:**
- Without agent_id: returns all conversations for the user's organization
- With agent_id: returns conversations filtered by that agent
- list_async handles None gracefully via **kwargs pattern
**Use case:**
- Cloud UI can list all user conversations across agents
- Still supports filtering by agent_id when needed
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* chore: update logs
* chore: update logs
---------
Co-authored-by: Letta <noreply@letta.com>
When model_settings is sent without max_output_tokens (e.g. only
changing reasoning_effort), the Pydantic default of 4096 was being
applied via _to_legacy_config_params(), silently overwriting the
agent's existing max_tokens.
Use model_fields_set to detect when max_output_tokens was not
explicitly provided and skip overwriting max_tokens in that case.
Only applied to the update path — on create, letting the default
apply is reasonable since there's no pre-existing value.
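A minimal sketch of the model_fields_set check, assuming Pydantic v2 (the model and field set here are simplified stand-ins for the real ModelSettings schema):

```python
from typing import Optional

from pydantic import BaseModel


class ModelSettings(BaseModel):
    reasoning_effort: Optional[str] = None
    max_output_tokens: int = 4096  # the Pydantic default that was leaking through


def apply_update(current_max_tokens, settings):
    # Only honor max_output_tokens when the caller actually sent it;
    # model_fields_set contains only explicitly provided fields.
    if "max_output_tokens" in settings.model_fields_set:
        return settings.max_output_tokens
    return current_max_tokens
```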
* feat: add two-way mode control for listener connections
Enable bidirectional permission mode control between letta-cloud UI and letta-code instances.
**Backend:**
- Added ModeChangeMessage and ModeChangedMessage to WebSocket protocol
- Added sendModeChange endpoint (/v1/listeners/:connectionId/mode)
- listenersRouter publishes mode_change via Redis Pub/Sub
- listenersHandler handles mode_changed acknowledgments from letta-code
- Stores current mode in Redis for UI state sync
**Contract:**
- Added sendModeChange contract with PermissionModeSchema
- 4 modes: default, acceptEdits, plan, bypassPermissions
**Frontend:**
- Extended PermissionMode type to 4 modes (was 2: ask/never)
- PermissionModeSelector now shows all 4 modes with descriptions
- Added disabled prop (grayed out when Cloud orchestrator selected)
- PermissionModeContext.sendModeChangeToDevice() calls API
- AgentMessenger sends mode changes to device on mode/device change
- Updated auto-approval logic (only in Cloud mode, only for bypassPermissions)
- Updated inputMode logic (device handles approvals, not cloud)
**Translations:**
- Updated en.json with 4 mode labels and descriptions
- Removed legacy "askAlways" and "neverAsk" keys
**Mode Behavior:**
- default: Ask permission for each tool
- acceptEdits: Auto-approve file edits only
- plan: Read-only exploration (denies writes)
- bypassPermissions: Auto-approve everything
**Lint Fixes:**
- Removed unused imports and functions from trackingMiddleware.ts
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: store mode in connectionData and show approvals for all modes
**Backend:**
- Fixed Redis WRONGTYPE error - store currentMode inside connectionData object
- Changed const connectionData to let connectionData (needs mutation)
- Updated mode_changed handler to reassign entire connectionData object
- Updated ping handler for consistency (also reassigns connectionData)
- Added currentMode field to ListenerConnectionSchema (optional)
**Frontend:**
- Simplified inputMode logic - always show approval UI when toolCallsToApprove.length > 0
- Removed mode-specific approval filtering (show approvals even in bypass/acceptEdits for visibility)
- Users can see what tools are being auto-approved during execution
**Why:**
- Redis key is a JSON string (via setRedisData), not a hash
- Cannot use hset on string keys - causes WRONGTYPE error
- Must update entire object via setRedisData like ping handler does
- Approval visibility helpful for debugging/understanding agent behavior
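The WRONGTYPE issue and its fix can be illustrated in Python over a plain dict standing in for Redis (the real handler is TypeScript; helper names are assumptions):

```python
import json


def set_redis_data(store, key, obj):
    # The connection record is stored as a JSON string via SET, not as
    # a Redis hash, so field-level HSET on this key raises WRONGTYPE.
    store[key] = json.dumps(obj)


def update_current_mode(store, key, mode):
    # Read-modify-write of the whole object, as the ping handler does:
    # deserialize, mutate, and write the full record back.
    connection_data = json.loads(store[key])
    connection_data["currentMode"] = mode
    set_redis_data(store, key, connection_data)
```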
🐾 Generated with [Letta Code](https://letta.com)
Co-Authored-By: Letta <noreply@letta.com>
* fix: use useMutation hook for sendModeChange instead of direct call
cloudAPI is initialized via initTsrReactQuery, so sendModeChange is a
mutation hook object, not a callable function. Use .useMutation() at the
component level and mutateAsync in the callback.
Co-authored-by: Shubham Naik <4shub@users.noreply.github.com>
* chore: update logs
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Shubham Naik <4shub@users.noreply.github.com>