letta-server/letta/server/server.py
jnjpng 46971414a4 fix: preserve agent max_tokens when caller doesn't explicitly set it (#9679)
* fix: preserve agent max_tokens when caller doesn't explicitly set it

When updating an agent with convenience fields (model, model_settings)
but without an explicit max_tokens, the server was constructing a fresh
LLMConfig via get_llm_config_from_handle_async. The Pydantic validator
on LLMConfig hardcodes max_tokens=16384 for gpt-5* models, silently
overriding the agent's existing value (e.g. 128000).

This was triggered by reasoning tab-switch in the CLI, which sends
model + model_settings (with reasoning_effort) but no max_tokens.

Now, when request.max_tokens is None, we carry forward the agent's
current max_tokens instead of accepting the provider default.
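The guard described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual code in letta/server/server.py; the names `build_llm_config`, `provider_default_max_tokens`, and `update_agent_llm_config` are hypothetical stand-ins for the real helpers.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    model: str
    max_tokens: int


def provider_default_max_tokens(model: str) -> int:
    # Stand-in for the Pydantic validator's hardcoded fallback
    # (16384 for gpt-5* models, per the bug description above).
    return 16384


def build_llm_config(model: str, requested_max_tokens: Optional[int]) -> LLMConfig:
    # Mirrors get_llm_config_from_handle_async constructing a fresh config:
    # with no explicit max_tokens, the provider default wins.
    if requested_max_tokens is None:
        requested_max_tokens = provider_default_max_tokens(model)
    return LLMConfig(model=model, max_tokens=requested_max_tokens)


def update_agent_llm_config(
    current: LLMConfig, model: str, requested_max_tokens: Optional[int]
) -> LLMConfig:
    new_config = build_llm_config(model, requested_max_tokens)
    if requested_max_tokens is None:
        # The fix: the caller didn't explicitly set max_tokens, so carry
        # forward the agent's existing value instead of the fresh default.
        new_config = replace(new_config, max_tokens=current.max_tokens)
    return new_config
```

With this guard, a reasoning tab-switch that sends only model + model_settings no longer clobbers an existing 128000 down to 16384, while an explicit max_tokens in the request still takes effect.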

* fix: use correct 128k max_output_tokens defaults for gpt-5.2/5.3

- Update OpenAI provider fallback to return 128000 for gpt-5.2*/5.3*
  models (except -chat variants which are 16k)
- Update LLMConfig Pydantic validator to match
- Update gpt-5.2 default_config factory to use 128000
- Move server-side max_tokens preservation guard into the
  model_settings branch where llm_config is already available
2026-03-03 18:34:01 -08:00
