Commit Graph

107 Commits

Author SHA1 Message Date
Kian Jones
b9c4ed3b15 fix: catch contextwindowexceeded error on gemini (#9450)
* catch contextwindowexceeded error

* fix(core): detect Google token limit errors as ContextWindowExceededError

Google's error message says "input token count exceeds the maximum
number of tokens allowed" which doesn't contain the word "context",
so it was falling through to generic LLMBadRequestError instead of
ContextWindowExceededError. This means compaction won't auto-trigger.

Expands the detection to also match "token count" and "tokens allowed"
in addition to the existing "context" keyword.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): add missing message arg to LLMBadRequestError in OpenAI client

The generic 400 path in handle_llm_error was constructing
LLMBadRequestError without the required message positional arg,
causing TypeError in prod during summarization.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* ci: add adapters/ test suite to core unit test matrix

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(tests): update adapter error handling test expectations to match actual behavior

The streaming adapter's error handling double-wraps errors: the
AnthropicStreamingInterface calls handle_llm_error first, then the
adapter catches the result and calls handle_llm_error again, which
falls through to the base class LLMError. Updated test expectations
to match this behavior.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): prevent double-wrapping of LLMError in stream adapter

The AnthropicStreamingInterface.process() already transforms raw
provider errors into LLMError subtypes via handle_llm_error. The
adapter was catching the result and calling handle_llm_error again,
which didn't recognize the already-transformed LLMError and wrapped
it in a generic LLMError("Unhandled LLM error"). This downgraded
specific error types (LLMConnectionError, LLMServerError, etc.)
and broke retry logic that matches on specific subtypes.

Now the adapter checks if the error is already an LLMError and
re-raises it as-is. Tests restored to original correct expectations.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
411bb63990 fix(core): improve error handling for upstream LLM provider errors (#9423)
Handle HTML error responses from ALB/load balancers in OpenAI client and
add explicit InternalServerError handling for Anthropic upstream issues.

🐛 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
382e216cbb fix(core): differentiate BYOK vs base provider in all LLM error details (#9425)
Add is_byok flag to every LLMError's details dict returned from
handle_llm_error across all providers (OpenAI, Anthropic, Google,
ChatGPT OAuth). This enables observability into whether errors
originate from Letta's production keys or user-provided BYOK keys.

The rate limit handler in app.py now returns a more helpful message
for BYOK users ("check your provider's rate limits and billing")
versus the generic message for base provider rate limits.

Datadog issues:
- https://us5.datadoghq.com/error-tracking/issue/b711c824-f490-11f0-96e4-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/76623036-f4de-11f0-8697-da7ad0900000
- https://us5.datadoghq.com/error-tracking/issue/43e9888a-dfcf-11f0-a645-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kevin Lin
23c94ec6d3 feat: add log probabilities from OpenAI-compatible servers and SGLang native endpoint (#9240)
* Add log probabilities support for RL training

This enables Letta server to request and return log probabilities from
OpenAI-compatible providers (including SGLang) for use in RL training.

Changes:
- LLMConfig: Add return_logprobs and top_logprobs fields
- OpenAIClient: Set logprobs in ChatCompletionRequest when enabled
- LettaLLMAdapter: Add logprobs field and extract from response
- LettaResponse: Add logprobs field to return log probs to client
- LettaRequest: Add return_logprobs/top_logprobs for per-request override
- LettaAgentV3: Store and pass logprobs through to response
- agents.py: Handle request-level logprobs override

Usage:
  response = client.agents.messages.create(
      agent_id=agent_id,
      messages=[...],
      return_logprobs=True,
      top_logprobs=5,
  )
  print(response.logprobs)  # Per-token log probabilities

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* Add multi-turn token tracking for RL training via SGLang native endpoint

- Add TurnTokenData schema to track token IDs and logprobs per turn
- Add return_token_ids flag to LettaRequest and LLMConfig
- Create SGLangNativeClient for /generate endpoint (returns output_ids)
- Create SGLangNativeAdapter that uses native endpoint
- Modify LettaAgentV3 to accumulate turns across LLM calls
- Include turns in LettaResponse when return_token_ids=True

* Fix: Add SGLang native adapter to step() method, not just stream()

* Fix: Handle Pydantic Message objects in SGLang native adapter

* Fix: Remove api_key reference from LLMConfig (not present)

* Fix: Add missing 'created' field to ChatCompletionResponse

* Add full tool support to SGLang native adapter

- Format tools into prompt in Qwen-style format
- Parse tool calls from <tool_call> tags in response
- Format tool results as <tool_response> in user messages
- Set finish_reason to 'tool_calls' when tools are called

* Use tokenizer.apply_chat_template for proper tool formatting

- Add tokenizer caching in SGLang native adapter
- Use apply_chat_template when tokenizer available
- Fall back to manual formatting if not
- Convert Letta messages to OpenAI format for tokenizer

* Fix: Use func_response instead of tool_return for ToolReturn content

* Fix: Get output_token_logprobs from meta_info in SGLang response

* Fix: Allow None in output_token_logprobs (SGLang format includes null)

* chore: remove unrelated files from logprobs branch

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing call_type param to adapter constructors in letta_agent_v3

The SGLang refactor dropped call_type=LLMCallType.agent_step when extracting
adapter creation into conditional blocks. Restores it for all 3 spots (SGLang
in step, SimpleLLM in step, SGLang in stream).

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

* fix: update expected LLMConfig fields in schema test for logprobs support

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* chore: remove rllm provider references

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* just stage-api && just publish-api

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-65-206.ec2.internal>
Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
4c753f3f3c fix: handle non-JSON responses from LLM provider endpoints (#9362)
When an OpenAI/Anthropic-compatible endpoint returns a non-JSON response
(e.g. HTML error page), the SDK's paginated response parser falls back
to returning a raw string. The post-parser then calls
_set_private_attributes() on that string, causing an AttributeError.

Add explicit AttributeError handling around SDK models.list() calls in
provider check_api_key/list_llm_models_async methods, and add type
guards in convert_response_to_chat_completion to reject raw strings
before Pydantic model construction.

Datadog: https://us5.datadoghq.com/error-tracking/issue/59a7a206-00b8-11f1-be73-da7ad0900000

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:07 -08:00
Kian Jones
d48932bdb6 fix(core): sanitize Unicode surrogates in all LLM client requests (#9323)
Multiple OpenAI-compatible LLM clients (Azure, Deepseek, Groq, Together, XAI, ZAI)
and Anthropic-compatible clients (Anthropic, MiniMax, Google Vertex) were overriding
request_async/stream_async without calling sanitize_unicode_surrogates, causing
UnicodeEncodeError when message content contained lone UTF-16 surrogates.

Root cause: Child classes override parent methods but omit the sanitization step that
the base OpenAIClient includes. This allows corrupted Unicode (unpaired surrogates
from malformed emoji) to reach the httpx layer, which rejects it during UTF-8 encoding.

Fix: Import and call sanitize_unicode_surrogates in all overridden request methods.
Also removed duplicate sanitize_unicode_surrogates definition from openai_client.py
that shadowed the canonical implementation in letta.helpers.json_helpers.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

Issue-ID: 10c0f2e4-f87b-11f0-b91c-da7ad0900000
2026-02-24 10:52:06 -08:00
Kian Jones
662ec082cf fix(core): handle MCP errors and API key whitespace (#9306)
* fix: strip whitespace from API keys in LLM client headers

Fixes httpx.LocalProtocolError when API keys contain leading/trailing whitespace.
Strips whitespace from API keys before using them in HTTP headers across:
- OpenAI client (openai.py)
- Mistral client (mistral.py)
- Anthropic client (anthropic_client.py)
- Anthropic schema provider (schemas/providers/anthropic.py)
- Google AI client (google_ai_client.py)
- Proxy helpers (proxy_helpers.py)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle McpError gracefully in MCP client execute_tool

Return error as failed result instead of re-raising to avoid Datadog alerts for expected user-facing errors like missing tool arguments.

* fix: strip whitespace from API keys before passing to httpx client

Fixes httpx.LocalProtocolError by stripping leading/trailing whitespace
from API keys before passing them to OpenAI/AsyncOpenAI clients. The
OpenAI client library constructs Authorization headers internally, and
invalid header values (like keys with leading spaces) cause protocol
errors.

Applied fix to:
- azure_client.py (AzureOpenAI/AsyncAzureOpenAI)
- deepseek_client.py (OpenAI/AsyncOpenAI)
- openai_client.py (OpenAI/AsyncOpenAI via kwargs)
- xai_client.py (OpenAI/AsyncOpenAI)

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle JSONDecodeError in OpenAI client requests

Catches json.JSONDecodeError from OpenAI SDK when API returns invalid
JSON (typically HTML error pages from 500-series errors) and converts
to LLMServerError with helpful details.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): strip API key whitespace at schema level on write/create

Add field_validator to ProviderCreate, ProviderUpdate, and ProviderCheck
schemas to strip whitespace from api_key and access_key fields before
persistence. This ensures keys are clean at the point of entry, preventing
whitespace from being encrypted and stored in the database.

Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>

* refactor: remove api_key.strip() calls across all LLM clients

Remove redundant .strip() calls on api_key parameters since pydantic models
now handle whitespace trimming at the validation layer. This centralizes
the validation logic and follows DRY principles.

- Updated 13 files across multiple LLM client implementations
- Removed 34 occurrences of api_key.strip()
- Includes: OpenAI, Anthropic, Azure, Google AI, Groq, XAI, DeepSeek, ZAI, Together, Mistral
- Also updated proxy helpers and provider schemas

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* refactor: remove redundant ternary operators from api_key parameters

Remove `if api_key else None` ternaries since pydantic validation ensures
api_key is either a valid string or None. The ternary was defensive programming
that's now unnecessary with proper model-level validation.

- Simplified 23 occurrences across 7 files
- Cleaner, more concise client initialization code
- No behavioral change since pydantic already handles this

👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Kian Jones <kianjones9@users.noreply.github.com>
2026-02-24 10:52:06 -08:00
Kian Jones
be60697a62 fix(core): handle protocol errors and foreign key violations (#9308)
* fix(core): handle PermissionDeniedError in provider API key validation

Fixed OpenAI PermissionDeniedError being raised as unknown error when
validating provider API keys. The check_api_key methods in OpenAI-based
providers (OpenAI, OpenRouter, Azure, Together) now properly catch and
re-raise PermissionDeniedError as LLMPermissionDeniedError.

🐛 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle Unicode surrogates in OpenAI requests

Sanitize invalid UTF-16 surrogates before sending requests to OpenAI API.
Fixes UnicodeEncodeError when message content contains unpaired surrogates
from corrupted emoji data or malformed Unicode sequences.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle MCP tool schema validation errors gracefully

Catch fastmcp.exceptions.ToolError in execute_mcp_tool endpoint and
convert to LettaInvalidArgumentError (400) instead of letting it
propagate as 500 error. This is an expected user error when tool
arguments don't match the MCP tool's schema.

Fixes Datadog issue 8f2d874a-f8e5-11f0-9b25-da7ad0900000

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix(core): handle ExceptionGroup-wrapped ToolError in MCP executor

When MCP tools fail with validation errors (e.g., missing required parameters),
fastmcp raises ToolError exceptions that may be wrapped in ExceptionGroup by
Python's async TaskGroup. The exception handler now unwraps single-exception
groups before checking if the error should be handled gracefully.

Fixes Calendly API "organization parameter missing" errors being logged to
Datadog instead of returning friendly error messages to users.

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: handle missing agent in create_conversation to prevent foreign key violation

* Update .gitignore

---------

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Kian Jones
6f746c5225 fix(core): handle Anthropic overloaded errors and Unicode encoding issues (#9305)
* fix: handle Anthropic overloaded_error in streaming interfaces

* fix: handle Unicode surrogates in OpenAI requests

Sanitize Unicode surrogate pairs before sending requests to OpenAI API.
Surrogate pairs (U+D800-U+DFFF) are UTF-16 encoding artifacts that cause
UnicodeEncodeError when encoding to UTF-8.

Fixes Datadog error: 'utf-8' codec can't encode character '\ud83c' in
position 326605: surrogates not allowed

* fix: handle UnicodeEncodeError from lone Unicode surrogates in OpenAI requests

Improved sanitize_unicode_surrogates() to explicitly filter out lone
surrogate characters (U+D800 to U+DFFF) which are invalid in UTF-8.

Previous implementation used errors='ignore' which could still fail in
edge cases. New approach directly checks Unicode code points and removes
any surrogates before data reaches httpx encoding.

Also added sanitization to stream_async_responses() method which was
missing it.

Fixes: 'utf-8' codec can't encode character '\ud83c' in position X:
surrogates not allowed
2026-02-24 10:52:06 -08:00
cthomas
09d7940090 fix: use string tool_choice for Groq and OpenRouter (#9267)
Some providers (Groq, OpenRouter proxied providers) only support string
values for tool_choice ("none", "auto", "required"), not the object
format {"type": "function", "name": "..."}.

When force_tool_call is set, convert to "required" instead of object
format for these providers.

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
170886b8a8 fix: compaction on 413 from openai (#9263) 2026-02-24 10:52:06 -08:00
Ari Webb
0bbb9c9bc0 feat: add reasoning zai openrouter (#9189)
* feat: add reasoning zai openrouter

* add openrouter reasoning

* stage + publish api

* openrouter reasoning always on

* revert

* fix

* remove reference

* do
2026-02-24 10:52:06 -08:00
Kian Jones
01cb00ae10 Revert "fix: truncate oversized text in embedding requests" (#9227)
Revert "fix: truncate oversized text in embedding requests (#9196)"

This reverts commit a9c342087e022519c63d62fb76b72aed8859539b.
2026-02-24 10:52:06 -08:00
Kian Jones
630c147b13 fix: truncate oversized text in embedding requests (#9196)
fix: handle oversized text in embedding requests with recursive chunking

When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.

Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with splits on context length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: Letta <noreply@letta.com>
2026-02-24 10:52:06 -08:00
Ari Webb
501de90d6c fix: fix base openrouter (#9171) 2026-01-29 12:44:04 -08:00
Ari Webb
bcd90859ec fix: byok uses our key, fix that (#9169) 2026-01-29 12:44:04 -08:00
Sarah Wooders
221b4e6279 refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065)
👾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-29 12:44:04 -08:00
jnjpng
a98bc31bf3 fix: refactor enable strict mode for structured output (#8840)
* base

* test
2026-01-19 15:54:42 -08:00
jnjpng
21c70323df fix: respect strict mode for temporal with fallback on client (#8839)
* base

* update
2026-01-19 15:54:42 -08:00
jnjpng
85c242077e feat: strict tool calling setting (#8810)
base
2026-01-19 15:54:42 -08:00
Sarah Wooders
97cdfb4225 Revert "feat: add strict tool calling setting [LET-6902]" (#8720)
Revert "feat: add strict tool calling setting [LET-6902] (#8577)"

This reverts commit 697c9d0dee6af73ec4d5d98780e2ca7632a69173.
2026-01-19 15:54:39 -08:00
Sarah Wooders
bdede5f90c feat: add strict tool calling setting [LET-6902] (#8577) 2026-01-19 15:54:38 -08:00
github-actions[bot]
f2171447a8 fix: handle httpx.ReadError, WriteError, and ConnectError in LLM streaming clients (#8243)
Adds explicit handling for httpx network errors (ReadError, WriteError,
ConnectError) in AnthropicClient, OpenAIClient, and GoogleVertexClient.
These errors can occur during streaming when the connection is unexpectedly
closed while reading/writing data.

Maps these errors to LLMConnectionError for consistent error handling.

Fixes #8221 (and duplicate #8156)

🤖 Generated with [Letta Code](https://letta.com)

Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
2026-01-12 10:57:49 -08:00
github-actions[bot]
76008c61f4 fix: handle httpx.RemoteProtocolError during LLM streaming (#8206) 2026-01-12 10:57:48 -08:00
Sarah Wooders
8729a037b9 fix: handle new openai overflow error format (#7110) 2025-12-17 17:31:02 -08:00
Kevin Lin
4b9485a484 feat: Add max tokens exceeded to stop reasons [LET-6480] (#6576) 2025-12-15 12:03:09 -08:00
Ari Webb
4d90f37f50 feat: add gpt-5.2 support (#6698) 2025-12-15 12:02:34 -08:00
Sarah Wooders
c8fa77a01f feat: cleanup cancellation code and add more logging (#6588) 2025-12-15 12:02:34 -08:00
Kian Jones
d6292b6eb6 fix: bug which causes unrecoverable state if previous message was an image (#6486)
* trying tout gpt-5.1-codex

* add unit test for message content

* try to support multimodal
2025-12-15 12:02:33 -08:00
Sarah Wooders
91e3dd8b3e feat: fix new summarizer code and add more tests (#6461) 2025-12-15 12:02:19 -08:00
Charles Packer
131891e05f feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449)
* feat: init refactor

* feat: add helper code

* fix: missing file + test

* fix: just state/publish api
2025-12-15 12:02:19 -08:00
Ari Webb
89c7ab5f14 feat: structured outputs for openai [LET-6233] (#6363)
* first hack with test

* remove changes integration test

* Delete apps/core/tests/sdk_v1/integration/integration_test_send_message_v2.py

* add test

* remove comment

* stage and publish api

* deprecate base level response_schema

* add param to llm_config test

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-26 14:39:39 -08:00
cthomas
6f810d95d8 feat: add semaphore to limit embeddings creation (#6261) 2025-11-24 19:10:11 -08:00
Ari Webb
7380eaec13 feat: enable gpt5.1 models [LET-6178] (#6175)
* hack at gpt51

* revert package lock

* first hack

* default context window

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:09:33 -08:00
Kian Jones
ddb6f3836e Fix: prevent empty embedding batches from causing memory spikes (#6230)
Root cause: When splitting failed embedding batches, mid=0 for single
items created empty chunks. These empty chunks were then processed,
creating hundreds of no-op tasks that consumed memory.

Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked 531 MB → 4.9 GB
- Pod crashed

Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty

This prevents the asyncio.gather() from creating thousands of empty
coroutines that exhaust memory.
2025-11-24 19:09:33 -08:00
Ari Webb
f19a71dad1 chore: log problematic chunk (#6166)
log problematic chunk

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-24 19:09:32 -08:00
Kian Jones
aafd5696c5 fix: logging and try to handle invalid embeddings (#6145)
logging and try to handkle invalid embeddings
2025-11-13 15:36:56 -08:00
Sarah Wooders
57bb051ea4 feat: add tool return truncation to summarization as a fallback [LET-5970] (#5859) 2025-11-13 15:36:30 -08:00
Ari Webb
48cc73175b feat: parallel tool calling for openai non streaming [LET-4593] (#5773)
* first hack

* clean up

* first implementation working

* revert package-lock

* remove openai test

* error throw

* typo

* Update integration_test_send_message_v2.py

* Update integration_test_send_message_v2.py

* refine test

* Only make changes for openai non streaming

* Add tests

---------

Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Matt Zhou <mattzh1314@gmail.com>
2025-11-13 15:36:14 -08:00
Ari Webb
f3a40a41f5 feat: updated backend to not allow minimal for codex [LET-5883] (#5760)
* updated backend

* add function in openai_client

* remove values before error

* remove test

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-11-13 15:35:34 -08:00
Matthew Zhou
09ba075cfa feat: Modify embedding strategy to first halve the batch size v.s. the batc… [LET-5510] (#5434)
Modify embedding strategy to first halve the batch size v.s. the batch size
2025-10-24 15:12:11 -07:00
Matthew Zhou
0543a60538 chore: Restore chunk size for openai embeddings (#5431)
Restore chunk size
2025-10-24 15:12:11 -07:00
Ari Webb
624c591820 Ari/let 5486 badrequesterror error code 400 error message requested [LET-5486] (#5422)
* letta agent v2 throw exception not error

* warning instead of error or exception

* decrease min_chunk_size

---------

Co-authored-by: Ari Webb <ari@letta.com>
2025-10-24 15:12:11 -07:00
Kevin Lin
08da1a64bb feat: parse reasoning_content from OAI proxies (eg. vLLM / OpenRouter) (#5372)
* reasonig_content support

* fix

* comment

* fix

* rm comment

---------

Co-authored-by: Charles Packer <packercharles@gmail.com>
2025-10-24 15:11:31 -07:00
cthomas
a1e771877d feat: add model name patch for openrouter (#5303)
* feat: add model name patch for openrouter

* add comment
2025-10-09 15:25:21 -07:00
cthomas
f8dce88ce4 feat: support for models that do not allow None content (#5218) 2025-10-07 17:50:50 -07:00
cthomas
a3545110cf feat: add full responses api support in new agent loop (#5051)
* feat: add full responses api support in new agent loop

* update matrix in workflow

* relax check for reasoning messages for high effort gpt 5

* fix indent

* one more relax
2025-10-07 17:50:48 -07:00
Matthew Zhou
df5c997da0 feat: Enable dynamic toggling of tool choice in v3 agent loop for OpenAI [LET-4564] (#5042)
* Add subsequent flag

* Finish integrating constrained/unconstrained toggling on v3 agent loop

* Update tests to run on v3

* Run lint
2025-10-07 17:50:47 -07:00
Sarah Wooders
b5de42fefd fix: patch summarizers for integration_test_send_message.py (#4919)
* fix: integration_test_send_message.py

* patch summarizer

* remove print
2025-10-07 17:50:46 -07:00
Charles Packer
8da15aaf08 fix(core): patch issue where LLM may generate a 'noop' call [PRO-1340] (#4944)
fix(core): patch issue where LLM may generate a 'noop' call
2025-10-07 17:50:46 -07:00