When a git push completes, the webhook fires immediately, but the GCS upload
may still be in progress. This causes a KeyError when reading commit objects
that have not been uploaded yet.
Add retry with exponential backoff (1s, 2s, 4s) to handle this race.
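A minimal sketch of the retry described above, assuming a synchronous reader. The names `read_with_backoff`, `read`, and `sleep` are hypothetical and not taken from the change itself; only the KeyError and the 1s, 2s, 4s delays come from the message.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def read_with_backoff(
    read: Callable[[], T],
    delays: Sequence[float] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call read(); on KeyError wait 1s, 2s, 4s between attempts, then give up."""
    for delay in delays:
        try:
            return read()
        except KeyError:
            # The object may not be uploaded yet; wait and try again.
            sleep(delay)
    # Final attempt: let KeyError propagate if the object is still missing.
    return read()
```

The `sleep` parameter is injectable so the delays can be checked in tests without waiting.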
🐾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
Multiple OpenAI-compatible LLM clients (Azure, Deepseek, Groq, Together, XAI, ZAI)
and Anthropic-compatible clients (Anthropic, MiniMax, Google Vertex) were overriding
request_async/stream_async without calling sanitize_unicode_surrogates, causing
UnicodeEncodeError when message content contained lone UTF-16 surrogates.
Root cause: Child classes override parent methods but omit the sanitization step that
the base OpenAIClient includes. This allows corrupted Unicode (unpaired surrogates
from malformed emoji) to reach the httpx layer, which rejects it during UTF-8 encoding.
Fix: Import and call sanitize_unicode_surrogates in all overridden request methods.
Also removed duplicate sanitize_unicode_surrogates definition from openai_client.py
that shadowed the canonical implementation in letta.helpers.json_helpers.
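A minimal sketch of the failure mode and the fix pattern described above. The class names, the string-only `strip_lone_surrogates` helper, and the `_send` method are hypothetical stand-ins, not the real clients or the real sanitize_unicode_surrogates.

```python
import asyncio


def strip_lone_surrogates(text: str) -> str:
    """Hypothetical stand-in for the sanitizer (strings only)."""
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


class BaseClient:
    async def _send(self, content: str) -> bytes:
        # Stand-in for the HTTP layer: UTF-8 encoding rejects lone surrogates.
        return content.encode("utf-8")

    async def request_async(self, content: str) -> bytes:
        return await self._send(strip_lone_surrogates(content))


class UnsafeChildClient(BaseClient):
    async def request_async(self, content: str) -> bytes:
        # Bug pattern: the override replaces the parent method and drops sanitization.
        return await self._send(content)


class FixedChildClient(BaseClient):
    async def request_async(self, content: str) -> bytes:
        # Fix pattern: sanitize again inside the override.
        return await self._send(strip_lone_surrogates(content))
```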
Co-Authored-By: Letta <noreply@letta.com>
Issue-ID: 10c0f2e4-f87b-11f0-b91c-da7ad0900000
PR #9309 moved block storage from the blocks/ directory to the memory/ directory.
Update memfs_client.py and memfs_client_base.py to match.
Co-authored-by: Letta <noreply@letta.com>
* fix(core): handle PermissionDeniedError in provider API key validation
Fixed OpenAI PermissionDeniedError being raised as unknown error when
validating provider API keys. The check_api_key methods in OpenAI-based
providers (OpenAI, OpenRouter, Azure, Together) now properly catch and
re-raise PermissionDeniedError as LLMPermissionDeniedError.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): handle Unicode surrogates in OpenAI requests
Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API.
Fixes UnicodeEncodeError when message content contains unpaired surrogates
from corrupted emoji data or malformed Unicode sequences.
Co-Authored-By: Letta <noreply@letta.com>
* try to fix
* revert random stuff
* revert some stuff
---------
Co-authored-by: Letta <noreply@letta.com>
* fix: strip whitespace from API keys in LLM client headers
Fixes httpx.LocalProtocolError when API keys contain leading/trailing whitespace.
Strips whitespace from API keys before using them in HTTP headers across:
- OpenAI client (openai.py)
- Mistral client (mistral.py)
- Anthropic client (anthropic_client.py)
- Anthropic schema provider (schemas/providers/anthropic.py)
- Google AI client (google_ai_client.py)
- Proxy helpers (proxy_helpers.py)
Co-Authored-By: Letta <noreply@letta.com>
* fix: handle McpError gracefully in MCP client execute_tool
Return error as failed result instead of re-raising to avoid Datadog alerts for expected user-facing errors like missing tool arguments.
* fix: strip whitespace from API keys before passing to httpx client
Fixes httpx.LocalProtocolError by stripping leading/trailing whitespace
from API keys before passing them to OpenAI/AsyncOpenAI clients. The
OpenAI client library constructs Authorization headers internally, and
invalid header values (like keys with leading spaces) cause protocol
errors.
Applied fix to:
- azure_client.py (AzureOpenAI/AsyncAzureOpenAI)
- deepseek_client.py (OpenAI/AsyncOpenAI)
- openai_client.py (OpenAI/AsyncOpenAI via kwargs)
- xai_client.py (OpenAI/AsyncOpenAI)
Co-Authored-By: Letta <noreply@letta.com>
* fix: handle JSONDecodeError in OpenAI client requests
Catches json.JSONDecodeError from OpenAI SDK when API returns invalid
JSON (typically HTML error pages from 500-series errors) and converts
to LLMServerError with helpful details.
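A minimal sketch of the conversion described above. It parses the body directly instead of going through the OpenAI SDK, and the `LLMServerError` class and `parse_provider_response` function here are hypothetical stand-ins for illustration.

```python
import json


class LLMServerError(Exception):
    """Hypothetical stand-in for the error type named above."""


def parse_provider_response(body: str) -> dict:
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Keep a short preview so logs show what came back (often an HTML error page).
        preview = body[:200]
        raise LLMServerError(
            f"Provider returned invalid JSON ({exc.msg} at position {exc.pos}): {preview!r}"
        ) from exc
```

Chaining with `from exc` keeps the original decode error available as `__cause__`.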
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): strip API key whitespace at schema level on write/create
Add field_validator to ProviderCreate, ProviderUpdate, and ProviderCheck
schemas to strip whitespace from api_key and access_key fields before
persistence. This ensures keys are clean at the point of entry, preventing
whitespace from being encrypted and stored in the database.
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
* refactor: remove api_key.strip() calls across all LLM clients
Remove redundant .strip() calls on api_key parameters since pydantic models
now handle whitespace trimming at the validation layer. This centralizes
the validation logic and follows DRY principles.
- Updated 13 files across multiple LLM client implementations
- Removed 34 occurrences of api_key.strip()
- Includes: OpenAI, Anthropic, Azure, Google AI, Groq, XAI, DeepSeek, ZAI, Together, Mistral
- Also updated proxy helpers and provider schemas
Co-Authored-By: Letta <noreply@letta.com>
* refactor: remove redundant ternary operators from api_key parameters
Remove `if api_key else None` ternaries since pydantic validation ensures
api_key is either a valid string or None. The ternary was defensive programming
that's now unnecessary with proper model-level validation.
- Simplified 23 occurrences across 7 files
- Cleaner, more concise client initialization code
- No behavioral change since pydantic already handles this
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
* feat: add memfs-py service
* add Terraform for bucket access and secrets v2 access
* feat(memfs): add helm charts, deploy workflow, and bug fixes
- Add dev helm chart (helm/dev/memfs-py/) with CSI secrets pattern
- Update prod helm chart with CSI secrets and correct service account
- Add GitHub Actions deploy workflow
- Change port from 8284 to 8285 to avoid conflict with core's dulwich sidecar
- Fix chunked transfer encoding issue (strip HTTP_TRANSFER_ENCODING header)
- Fix timestamp parsing to handle both ISO and HTTP date formats
- Fix get_head_sha to raise FileNotFoundError on 404
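A minimal sketch of parsing both timestamp formats mentioned in the list above, using only the standard library. The function name `parse_timestamp` and the choice to normalize to UTC are assumptions; the message does not say which field is parsed or how the result is used.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 or HTTP-date string into an aware UTC datetime."""
    try:
        # fromisoformat() does not accept a trailing "Z" before Python 3.11.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        # Fall back to the HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```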
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Kian Jones <kian@letta.com>
Co-authored-by: Letta <noreply@letta.com>
* fix: handle const keyword in google genai tool schemas
* fix: handle pydantic ValidationError in Google GenAI client
Fixes Datadog error tracking issue where pydantic_core.ValidationError
was raised when tool schemas contained unsupported fields (e.g., 'const',
'default', 'additionalProperties').
Changes:
- Add error handling for pydantic ValidationError in request(), request_async(), and stream_async()
- Convert validation errors to LLMBadRequestError with helpful error message
- Deep copy tool parameters before cleaning to avoid modifying shared objects
- Add imports for pydantic_core and copy module
This prevents unhandled exceptions and provides better diagnostics when
tool schemas contain fields not supported by Google AI API.
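A minimal sketch of the deep-copy-then-clean step described above. Dropping the unsupported keys outright is an assumption (the change may rewrite some of them instead), and `clean_schema` is a hypothetical name.

```python
import copy

# Field names taken from the examples in the message above.
UNSUPPORTED_KEYS = {"const", "default", "additionalProperties"}


def clean_schema(schema: dict) -> dict:
    """Return a cleaned copy of a JSON-schema-like dict; the input is left untouched."""

    def _strip(node):
        if isinstance(node, dict):
            for key in list(node):
                if key in UNSUPPORTED_KEYS:
                    del node[key]
                elif key == "properties" and isinstance(node[key], dict):
                    # Children of "properties" are parameter names, not schema keywords.
                    for sub in node[key].values():
                        _strip(sub)
                else:
                    _strip(node[key])
        elif isinstance(node, list):
            for item in node:
                _strip(item)

    cleaned = copy.deepcopy(schema)  # avoid mutating a schema shared with other callers
    _strip(cleaned)
    return cleaned
```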
* fix(core): handle MCP tool schema validation errors gracefully
Catch fastmcp.exceptions.ToolError in execute_mcp_tool endpoint and
convert to LettaInvalidArgumentError (400) instead of letting it
propagate as 500 error. This is an expected user error when tool
arguments don't match the MCP tool's schema.
Fixes Datadog issue 8f2d874a-f8e5-11f0-9b25-da7ad0900000
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): handle ExceptionGroup-wrapped ToolError in MCP executor
When MCP tools fail with validation errors (e.g., missing required parameters),
fastmcp raises ToolError exceptions that may be wrapped in ExceptionGroup by
Python's async TaskGroup. The exception handler now unwraps single-exception
groups before checking if the error should be handled gracefully.
Fixes Calendly API "organization parameter missing" errors being logged to
Datadog instead of returning friendly error messages to users.
Co-Authored-By: Letta <noreply@letta.com>
* fix: handle missing agent in create_conversation to prevent foreign key violation
* Update .gitignore
---------
Co-authored-by: Letta <noreply@letta.com>
* fix: handle Anthropic overloaded_error in streaming interfaces
* fix: handle Unicode surrogates in OpenAI requests
Sanitize lone Unicode surrogates before sending requests to OpenAI API.
Surrogate code points (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.
Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed
* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests
Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.
Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.
Also added sanitization to stream_async_responses() method which was
missing it.
Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
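A minimal sketch of the code-point filter described above, applied recursively to a nested request payload. This is an illustration under the name `strip_lone_surrogates` (hypothetical), not the actual sanitize_unicode_surrogates() implementation.

```python
def strip_lone_surrogates(value):
    """Recursively remove code points U+D800..U+DFFF from strings in nested data."""
    if isinstance(value, str):
        return "".join(ch for ch in value if not 0xD800 <= ord(ch) <= 0xDFFF)
    if isinstance(value, dict):
        return {
            strip_lone_surrogates(key): strip_lone_surrogates(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [strip_lone_surrogates(item) for item in value]
    # Numbers, booleans, None, etc. pass through unchanged.
    return value
```

In a Python str, a well-formed emoji is a single code point above U+FFFF, so it is not affected by this filter; only unpaired surrogate code points are dropped.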
* feat: add /agents/{agent_id}/generate endpoint for direct LLM requests
Add new endpoint that makes direct LLM provider requests without agent
context, memory, tools, or state modification. This enables:
- Quick LLM queries without agent overhead
- Testing model configurations
- Simple chat completions using agent's credentials
- Comparing responses across different models
Features:
- Uses agent's LLM config by default
- Supports model override with full provider config resolution
- Non-streaming, stateless operation
- Proper error handling and validation
- Request/response schemas with Pydantic validation
Implementation:
- Add GenerateRequest and GenerateResponse schemas
- Implement generate_completion endpoint handler
- Add necessary imports (LLMError, LLMClient, HandleNotFoundError)
- Include logging and comprehensive error handling
* fix: improve error handling and fix Message construction
- Fix critical bug: use content=[TextContent(text=...)] instead of text=...
- Add explicit error handling for NoResultFound and HandleNotFoundError
- Add error handling for convert_response_to_chat_completion
- Add structured logging for debugging
- Remove unnecessary .get() calls since Pydantic validates messages
* refactor: extract generate logic to AgentCompletionService
Move the generate endpoint business logic out of the endpoint handler
into a dedicated AgentCompletionService class for better code organization
and separation of concerns.
Changes:
- Create new AgentCompletionService in services/agent_completion_service.py
- Service handles all business logic: agent validation, LLM config resolution,
  message conversion, LLM client creation, and request/response processing
- Integrate service with SyncServer initialization
- Refactor generate_completion endpoint to use the service
- Endpoint now only handles HTTP concerns (auth, error mapping)
Benefits:
- Cleaner endpoint code (reduced from ~140 lines to ~25 lines)
- Better separation of concerns (HTTP vs business logic)
- Service logic can be reused or tested independently
- Follows established patterns in the codebase (AgentManager, etc.)
* feat: simplify generate API to accept just prompt text
Simplify the client interface by accepting a simple prompt string instead
of requiring clients to format messages.
Changes:
- Update GenerateRequest schema:
  - Replace 'messages' array with simple 'prompt' string
  - Add optional 'system_prompt' for context/instructions
  - Keep 'override_model' for model selection
- Update AgentCompletionService to format messages automatically:
  - Accepts prompt and optional system_prompt
  - Constructs message array internally (system + user messages)
  - Simpler API surface for clients
- Update endpoint documentation with new simplified examples
- Regenerate OpenAPI spec and TypeScript SDK
Benefits:
- Much simpler client experience - just send text
- No need to understand message formatting
- Still supports system prompts for context
- Cleaner API that matches common use cases
Example (before):
{
"messages": [{"role": "user", "content": "What is 2+2?"}]
}
Example (after):
{
"prompt": "What is 2+2?"
}
* test: add comprehensive integration tests for generate endpoint
Add 9 integration tests covering various scenarios:
Happy path tests:
- test_agent_generate_basic: Basic prompt -> response flow
- test_agent_generate_with_system_prompt: System prompt + user prompt
- test_agent_generate_with_model_override: Override model selection
- test_agent_generate_long_prompt: Handle longer prompts
- test_agent_generate_no_persistence: Verify no messages saved to agent
Error handling tests:
- test_agent_generate_empty_prompt_error: Empty prompt validation (422)
- test_agent_generate_invalid_agent_id: Invalid agent ID (404)
- test_agent_generate_invalid_model_override: Invalid model handle (404)
All tests verify:
- Response structure (content, model, usage)
- Proper status codes for errors
- Usage statistics (tokens, counts)
- No side effects on agent state
Tests follow existing test patterns in test_client.py and use the
letta_client SDK (assuming generate_completion method is auto-generated
from the OpenAPI spec).
* openapi
* refactor: rename AgentCompletionService to AgentGenerateCompletionManager
Rename for better clarity and consistency with codebase naming conventions:
- Rename file: agent_completion_service.py → agent_generate_completion_manager.py
- Rename class: AgentCompletionService → AgentGenerateCompletionManager
- Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager
- Update docstrings: 'Service' → 'Manager'
Changes:
- apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class)
- apps/core/letta/server/server.py (import + initialization)
- apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint)
No functional changes, purely a naming refactor.
* fix: remove invalid Message parameters in generate manager
Remove agent_id=None and user_id=None from Message construction.
The Message model doesn't accept these as None values - only pass
required parameters (role, content).
Fixes validation error:
'Extra inputs are not permitted [type=extra_forbidden, input_value=None]'
This aligns with other Message construction patterns in the codebase
(see tools.py, memory.py examples).
* feat: improve generate endpoint validation and tests
- Add field validator for whitespace-only prompts
- Always include system message (required by Anthropic)
- Use default "You are a helpful assistant." when no system_prompt provided
- Update tests to use direct HTTP calls via httpx
- Fix test issues:
  - Use valid agent ID format (agent-{uuid})
  - Use available model (openai/gpt-4o-mini)
  - Add whitespace validation test
- All 9 integration tests passing
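A minimal sketch of the prompt handling described in the list above: reject whitespace-only prompts and always emit a system message. Plain dicts are used instead of the real Message and schema classes, and `build_messages` is a hypothetical name.

```python
from typing import Optional

DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def build_messages(prompt: str, system_prompt: Optional[str] = None) -> list:
    """Turn a bare prompt into a system + user message pair."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty or whitespace-only")
    return [
        # A system message is always included; fall back to the default text.
        {"role": "system", "content": system_prompt or DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
```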
* fix(core): derive dulwich org context from user_id fallback
Make git smart HTTP more robust in prod by:
- normalizing/injecting a single X-Organization-Id header in the FastAPI proxy
- keeping dulwich org contextvar set through WSGI iteration
- falling back to resolving org_id from user_id when X-Organization-Id is missing
- adding opt-in debug logs (env LETTA_GIT_HTTP_DEBUG_LOGS or letta_debug query)
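A minimal sketch of the org_id fallback described in the list above. `resolve_org_id` and the `lookup_org_for_user` callable are hypothetical; the latter stands in for whatever user lookup the server performs.

```python
from typing import Callable, Mapping, Optional


def resolve_org_id(
    headers: Mapping[str, str],
    user_id: Optional[str],
    lookup_org_for_user: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer X-Organization-Id; otherwise derive the org from the user."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-organization-id" and value:
            return value
    if user_id is not None:
        return lookup_org_for_user(user_id)
    return None
```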
Co-Authored-By: Letta <noreply@letta.com>
* refactor(core): drop user->org cache in dulwich org fallback
Keep the dulwich org_id fallback simple by resolving org_id from user_id via
UserManager lookup when X-Organization-Id is missing, without maintaining an
in-process cache.
Co-Authored-By: Letta <noreply@letta.com>
* chore(core): make git HTTP debug logging always-on
Remove opt-in toggles for git HTTP debug logs and log proxy + dulwich request
context for every git smart-HTTP request.
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
When exporting an agent with a conversation_id, the export function was
setting agent_state.message_ids from the conversation, but from_agent_state
was ignoring this and fetching messages generically via list_messages.
Now from_agent_state checks if message_ids is set and fetches those specific
messages instead.
Co-authored-by: Letta <noreply@letta.com>
* fix(core): pass org_id to dulwich via header for git HTTP
* fix(core): use actor org id for git HTTP org header
Git smart HTTP proxies were reading `organization_id` from AgentState, which
is not present and caused 500s during clone/push. Use the authenticated
actor's org id while still performing an authorization check on the agent.
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Some providers (Groq, OpenRouter proxied providers) only support string
values for tool_choice ("none", "auto", "required"), not the object
format {"type": "function", "name": "..."}.
When force_tool_call is set, convert to "required" instead of object
format for these providers.
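A minimal sketch of the conversion described above. The function name, the `supports_object_format` flag, and the "auto" default when no tool is forced are assumptions for illustration; the object format is the one quoted in the message.

```python
from typing import Optional, Union


def build_tool_choice(
    force_tool_call: Optional[str],
    supports_object_format: bool,
) -> Union[str, dict]:
    """Pick a tool_choice value the target provider can accept."""
    if force_tool_call is None:
        return "auto"  # assumed default when no tool is forced
    if supports_object_format:
        return {"type": "function", "name": force_tool_call}
    # String-only providers cannot name a specific tool, so require some tool call.
    return "required"
```

Note that "required" is weaker than the object form: it forces a tool call but does not pin which tool is called.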
Co-authored-by: Letta <noreply@letta.com>
* feat: add usage columns to steps table
Adds denormalized usage fields to the steps table for easier querying:
- model_handle: The model handle (e.g., "openai/gpt-4o-mini")
- cached_input_tokens: Tokens served from cache
- cache_write_tokens: Tokens written to cache (Anthropic)
- reasoning_tokens: Reasoning/thinking tokens
These fields mirror LettaUsageStatistics and are extracted from the
existing prompt_tokens_details and completion_tokens_details JSON columns.
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate OpenAPI specs and SDK for usage columns
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Sarah Wooders <sarahwooders@users.noreply.github.com>
Provider traces were being created twice per step:
1. Via `request_async_with_telemetry` / `log_provider_trace_async` in LLMClient
2. Via direct `create_provider_trace_async` calls in LettaAgent
This caused duplicate records in provider_trace_metadata (Postgres) and
llm_traces (ClickHouse) for every agent step.
Changes:
- Remove redundant direct `create_provider_trace_async` calls from letta_agent.py
- Remove no-op `stream_async_with_telemetry` method (was just a pass-through to `stream_async`)
- Update callers to use `stream_async` directly
Co-authored-by: Letta <noreply@letta.com>
Provider traces were being created twice per step:
1. Via `request_async_with_telemetry` / `log_provider_trace_async` in LLMClient
2. Via direct `create_provider_trace_async` calls in LettaAgent
This caused duplicate records in provider_trace_metadata (Postgres) and
llm_traces (ClickHouse) for every agent step.
Removed the redundant direct calls since telemetry is now centralized
in the LLM client layer.
Co-authored-by: Letta <noreply@letta.com>
* feat(core): add git-backed memory repos and block manager
Introduce a GCS-backed git repository per agent as the source of truth for core
memory blocks. Add a GitEnabledBlockManager that writes block updates to git and
syncs values back into Postgres as a cache.
Default newly-created memory repos to the `main` branch.
Co-Authored-By: Letta <noreply@letta.com>
* feat(core): serve memory repos over git smart HTTP
Run dulwich's WSGI HTTPGitApplication on a local sidecar port and proxy
/v1/git/* through FastAPI to support git clone/fetch/push directly against
GCS-backed memory repos.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): create memory repos on demand and stabilize git HTTP
- Ensure MemoryRepoManager creates the git repo on first write (instead of 500ing)
and avoids rewriting history by only auto-creating on FileNotFoundError.
- Simplify dulwich-thread async execution and auto-create empty repos on first
git clone.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): make dulwich optional for CI installs
Guard dulwich imports in the git smart HTTP router so the core server can boot
(and CI tests can run) without installing the memory-repo extra.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): guard git HTTP WSGI init when dulwich missing
Avoid instantiating dulwich's HTTPGitApplication at import time when dulwich
isn't installed (common in CI installs).
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): avoid masking send_message errors in finally
Initialize `result` before the agent loop so error paths (e.g. approval
validation) don't raise UnboundLocalError in the run-tracking finally block.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): stop event loop watchdog on FastAPI shutdown
Ensure the EventLoopWatchdog thread is stopped during FastAPI lifespan
shutdown to avoid daemon threads logging during interpreter teardown (seen in CI
unit tests).
Co-Authored-By: Letta <noreply@letta.com>
* chore(core): remove send_*_message_to_agent from SyncServer
Drop send_message_to_agent and send_group_message_to_agent from SyncServer and
route internal fire-and-forget messaging through send_messages helpers instead.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): backfill git memory repo when tag added
When an agent is updated to include the git-memory-enabled tag, ensure the
git-backed memory repo is created and initialized from the agent's current
blocks. Also support configuring the memory repo object store via
LETTA_OBJECT_STORE_URI.
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): preserve block tags on git-enabled updates
When updating a block for a git-memory-enabled agent, keep block tags in sync
with PostgreSQL (tags are not currently stored in the git repo).
Co-Authored-By: Letta <noreply@letta.com>
* chore(core): remove git-state legacy shims
- Rename optional dependency extra from memory-repo to git-state
- Drop legacy object-store env aliases and unused region config
- Simplify memory repo metadata to a single canonical format
- Remove unused repo-cache invalidation helper
Co-Authored-By: Letta <noreply@letta.com>
* fix(core): keep PR scope for git-backed blocks
- Revert unrelated change in fire-and-forget multi-agent send helper
- Route agent block updates-by-label through injected block manager only when needed
Co-Authored-By: Letta <noreply@letta.com>
---------
Co-authored-by: Letta <noreply@letta.com>
Fixes UnboundLocalError when client disconnects (EndOfStream) during
request processing - the finally block tried to access `result` which
was never assigned.
Co-authored-by: Letta <noreply@letta.com>
Initialize `result` and `run_status` variables before the try block
to prevent UnboundLocalError in the finally block when exceptions
occur early in execution (e.g., during AgentLoop.load()).
Previously, if an exception was raised before `result` was assigned
inside the try block, the finally block would fail when trying to
access `result`, masking the original error with an UnboundLocalError.
This fix ensures the finally block can safely check and use these
variables regardless of when/where an exception occurs.
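A minimal sketch of the pattern described above, with hypothetical names (`run_with_tracking`, `load`, `tracked`). Binding both variables before the try block means the finally block can always read them, so an early failure surfaces as itself instead of as an UnboundLocalError.

```python
def run_with_tracking(load, tracked: list):
    """Run load() and record (status, result) even when load() raises."""
    result = None          # bound up front so `finally` can always read it
    run_status = "failed"
    try:
        result = load()
        run_status = "completed"
        return result
    finally:
        # Runs on success and on failure; both names are always bound here.
        tracked.append((run_status, result))
```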
fix: check for actual error content, not just "error" key presence
OpenAI Responses API returns {"error": null} on success, which
incorrectly triggered is_error=True. Now we check if error_data
is truthy rather than just checking key existence.
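A minimal sketch of the difference between the two checks, using a hypothetical `is_error_response` helper on a plain dict payload.

```python
def is_error_response(payload: dict) -> bool:
    """True only when the payload carries actual error content."""
    # `"error" in payload` is also True for {"error": None}; test the value instead.
    return bool(payload.get("error"))
```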
Co-authored-by: Letta <noreply@letta.com>
Both letta_llm_stream_adapter and simple_llm_stream_adapter were
creating ProviderTrace without call_type, causing "unknown" in
ClickHouse analytics.
Co-authored-by: Letta <noreply@letta.com>
* refactor: extract compact logic to shared function
Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.
Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
  - compact_messages(): Core compaction logic
  - build_summarizer_llm_config(): LLM config builder for summarization
  - CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
  compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams
This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.
* chore: clean up verbose comments
* fix: correct CompactionSettings import path
* fix: correct count_tokens import from summarizer_sliding_window
* fix: update test patch path for count_tokens_with_tools
After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.
* fix: update test to use build_summarizer_llm_config from compact.py
The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.
* fix: add early check for system prompt size in compact_messages
Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.
* fix: properly propagate SystemPromptTokenExceededError from compact
The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.
Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.
* revert: remove redundant SystemPromptTokenExceededError handling
The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.
* fix: correctly re-raise SystemPromptTokenExceededError
The inner exception handler was using 'raise e' which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.
Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.
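A minimal sketch of the 'raise e' versus 'raise' difference described above. The two exception classes are local stand-ins with the same names, and `compact` and `step` are simplified placeholders for the real control flow.

```python
class ContextWindowExceededError(Exception):
    pass


class SystemPromptTokenExceededError(Exception):
    pass


def compact():
    raise SystemPromptTokenExceededError("system prompt too large")


def step(use_bare_raise: bool):
    try:
        raise ContextWindowExceededError("context window exceeded")
    except ContextWindowExceededError as e:
        try:
            compact()
        except SystemPromptTokenExceededError:
            if use_bare_raise:
                raise   # re-raises the exception being handled here (inner one)
            raise e     # `e` is still bound to the outer exception
```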
* revert: remove early check and restore raise e to match main behavior
* fix: set should_continue=False and correctly re-raise exception
- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError
Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.
* fix: add SystemPromptTokenExceededError handler for post-step compaction
The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early
This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.
The fix adds the same exception handler pattern used in the retry compaction flow
(lines 941-946), ensuring proper state is set before re-raising.
This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
* feat: add ID format validation to agent and user schemas
Reuse existing validator types (ToolId, SourceId, BlockId, MessageId,
IdentityId, UserId) from letta.validators to enforce ID format validation
at the schema level. This ensures malformed IDs are rejected with a 422
validation error instead of causing 500 database errors.
Changes:
- CreateAgent: validate tool_ids, source_ids, folder_ids, block_ids, identity_ids
- UpdateAgent: validate tool_ids, source_ids, folder_ids, block_ids, message_ids, identity_ids
- UserUpdate: validate id
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate API spec and SDK
* fix: override ID validation in AgentSchema for agent file portability
AgentSchema extends CreateAgent but needs to allow arbitrary short IDs
(e.g., tool-0, block-0) for portable agent files. Override the validated
ID fields to use plain List[str] instead of the validated types.
Also fix test_agent.af to use proper UUID-format IDs.
Co-Authored-By: Letta <noreply@letta.com>
* chore: regenerate API spec and SDK
Co-Authored-By: Letta <noreply@letta.com>
* fix: revert test_agent.af - short IDs are valid for agent files
Co-Authored-By: Letta <noreply@letta.com>
* fix openapi schema
---------
Co-authored-by: Letta <noreply@letta.com>
Log error traces to ClickHouse when streaming requests fail,
matching the behavior in letta_llm_stream_adapter.
Co-authored-by: Letta <noreply@letta.com>