fix: preserve max_tokens on model_settings updates without max_output_tokens (#9591)
When model_settings is sent without max_output_tokens (e.g. only changing reasoning_effort), the Pydantic default of 4096 was being applied via _to_legacy_config_params(), silently overwriting the agent's existing max_tokens.

Use model_fields_set to detect when max_output_tokens was not explicitly provided and skip overwriting max_tokens in that case. Only applied to the update path — on create, letting the default apply is reasonable since there's no pre-existing value.
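The fix hinges on Pydantic v2's model_fields_set, which records only the fields the caller actually passed, so a field left at its default can be told apart from one explicitly set to that same value. A minimal sketch of that distinction, using a simplified stand-in for the real ModelSettings class (field names mirror the commit; the actual model has more fields):

```python
from typing import Optional

from pydantic import BaseModel


class ModelSettings(BaseModel):
    """Simplified stand-in for the real settings model in the commit."""

    max_output_tokens: int = 4096  # the Pydantic default that was clobbering max_tokens
    reasoning_effort: Optional[str] = None


# Caller only changes reasoning_effort; max_output_tokens falls back to its default.
partial = ModelSettings(reasoning_effort="high")

# Caller explicitly provides max_output_tokens.
explicit = ModelSettings(reasoning_effort="high", max_output_tokens=2048)

# model_fields_set contains only explicitly provided fields,
# even though both instances have a concrete max_output_tokens value.
print("max_output_tokens" in partial.model_fields_set)   # False
print("max_output_tokens" in explicit.model_fields_set)  # True
print(partial.max_output_tokens)                         # 4096 (default, not caller intent)
```

This is why the patch pops max_tokens from the legacy params dict when the field is absent from model_fields_set: the 4096 present on the model is a default, not something the caller asked for.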
@@ -717,6 +717,10 @@ class SyncServer(object):
         agent = await self.agent_manager.get_agent_by_id_async(agent_id=agent_id, actor=actor)
         request.llm_config = agent.llm_config.model_copy()
         update_llm_config_params = request.model_settings._to_legacy_config_params()
+        # Don't clobber max_tokens with the Pydantic default when the caller
+        # didn't explicitly provide max_output_tokens in the request.
+        if "max_output_tokens" not in request.model_settings.model_fields_set:
+            update_llm_config_params.pop("max_tokens", None)
         request.llm_config = request.llm_config.model_copy(update=update_llm_config_params)

         # Copy parallel_tool_calls from request to llm_config if provided