fix: preserve max_tokens on model_settings updates without max_output_tokens (#9591)
When model_settings is sent without max_output_tokens (e.g. only changing reasoning_effort), the Pydantic default of 4096 was being applied via _to_legacy_config_params(), silently overwriting the agent's existing max_tokens.

Use model_fields_set to detect when max_output_tokens was not explicitly provided and skip overwriting max_tokens in that case. Only applied to the update path — on create, letting the default apply is reasonable since there's no pre-existing value.
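The fix hinges on Pydantic v2's model_fields_set, which records only the fields the caller actually passed, so a field left at its default can be told apart from one explicitly set to that same value. A minimal sketch of that distinction, using a simplified stand-in for the real ModelSettings class (field names mirror the commit; the actual model has more fields):

```python
from typing import Optional

from pydantic import BaseModel


class ModelSettings(BaseModel):
    """Simplified stand-in for the real settings model in the commit."""

    max_output_tokens: int = 4096  # the Pydantic default that was clobbering max_tokens
    reasoning_effort: Optional[str] = None


# Caller only changes reasoning_effort; max_output_tokens falls back to its default.
partial = ModelSettings(reasoning_effort="high")

# Caller explicitly provides max_output_tokens.
explicit = ModelSettings(reasoning_effort="high", max_output_tokens=2048)

# model_fields_set contains only explicitly provided fields,
# even though both instances have a concrete max_output_tokens value.
print("max_output_tokens" in partial.model_fields_set)   # False
print("max_output_tokens" in explicit.model_fields_set)  # True
print(partial.max_output_tokens)                         # 4096 (default, not caller intent)
```

This is why the patch pops max_tokens from the legacy params dict when the field is absent from model_fields_set: the 4096 present on the model is a default, not something the caller asked for.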
@@ -717,6 +717,10 @@ class SyncServer(object):
         agent = await self.agent_manager.get_agent_by_id_async(agent_id=agent_id, actor=actor)
         request.llm_config = agent.llm_config.model_copy()
         update_llm_config_params = request.model_settings._to_legacy_config_params()
+        # Don't clobber max_tokens with the Pydantic default when the caller
+        # didn't explicitly provide max_output_tokens in the request.
+        if "max_output_tokens" not in request.model_settings.model_fields_set:
+            update_llm_config_params.pop("max_tokens", None)
         request.llm_config = request.llm_config.model_copy(update=update_llm_config_params)

         # Copy parallel_tool_calls from request to llm_config if provided