refactor: extract compact logic to shared function for temporal (#9249)

* refactor: extract compact logic to shared function

  Extract the compaction logic from LettaAgentV3.compact() into a standalone
  compact_messages() function that can be shared between the agent and temporal
  workflows.

  Changes:
  - Create apps/core/letta/services/summarizer/compact.py with:
    - compact_messages(): Core compaction logic
    - build_summarizer_llm_config(): LLM config builder for summarization
    - CompactResult: Dataclass for compaction results
  - Update LettaAgentV3.compact() to use compact_messages()
  - Update the temporal summarize_conversation_history activity to use
    compact_messages() instead of the old Summarizer class
  - Add use_summary_role parameter to SummarizeParams

  This ensures consistent summarization behavior across different execution
  paths and prevents drift as we improve the implementation.

* chore: clean up verbose comments

* fix: correct CompactionSettings import path

* fix: correct count_tokens import from summarizer_sliding_window

* fix: update test patch path for count_tokens_with_tools

  After extracting compact logic to compact.py, the test was patching the old
  location. Update the patch path to the new module location.

* fix: update test to use build_summarizer_llm_config from compact.py

  The function was moved from LettaAgentV3._build_summarizer_llm_config to
  compact.py as a standalone function.

* fix: add early check for system prompt size in compact_messages

  Check whether the system prompt alone exceeds the context window before
  attempting summarization. The system prompt cannot be compacted, so fail fast
  with SystemPromptTokenExceededError.

* fix: properly propagate SystemPromptTokenExceededError from compact

  The exception handler in _step() was not setting the correct stop_reason for
  SystemPromptTokenExceededError, which caused the finally block to return
  early and swallow the exception. Add special handling to set stop_reason to
  context_window_overflow_in_system_prompt when SystemPromptTokenExceededError
  is caught.

* revert: remove redundant SystemPromptTokenExceededError handling

  The special handling in the outer exception handler is redundant because
  stop_reason is already set in the inner handler at line 943. The actual fix
  for the test was the early check in compact_messages(), not this redundant
  handling.

* fix: correctly re-raise SystemPromptTokenExceededError

  The inner exception handler was using 'raise e', which re-raised the outer
  ContextWindowExceededError instead of the current
  SystemPromptTokenExceededError. Changed to 'raise' to correctly re-raise the
  current exception.

  This bug was pre-existing but masked because
  _check_for_system_prompt_overflow was only called as a fallback. The new
  early check in compact_messages() exposed it.

* revert: remove early check and restore raise e to match main behavior

* fix: set should_continue=False and correctly re-raise exception

  - Add should_continue=False in the SystemPromptTokenExceededError handler
    (matching main's _check_for_system_prompt_overflow behavior)
  - Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError

  Note: test_large_system_prompt_summarization still fails locally but passes
  on main. Need to investigate why the exception isn't propagating correctly
  on the refactored branch.

* fix: add SystemPromptTokenExceededError handler for post-step compaction

  The post-step compaction (line 1066) was missing a
  SystemPromptTokenExceededError exception handler. When compact_messages()
  raised this error, it would be caught by the outer exception handler, which
  would:
  1. Set stop_reason to "error" instead of
     "context_window_overflow_in_system_prompt"
  2. Not set should_continue = False
  3. Get swallowed by the finally block (line 1126), which returns early

  This caused test_large_system_prompt_summarization to fail because the
  exception never propagated to the test.

  The fix adds the same exception handler pattern used in the retry compaction
  flow (lines 941-946), ensuring proper state is set before re-raising.

  This issue only affected the refactored code because on main,
  _check_for_system_prompt_overflow() was an instance method that set
  should_continue/stop_reason BEFORE raising. In the refactor,
  compact_messages() is a standalone function that cannot set instance state,
  so the caller must handle the exception and set the state.
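The `raise` vs `raise e` distinction at the heart of several commits above is easy to get wrong when `except` blocks are nested: inside the inner handler, `raise e` re-raises whatever name `e` is bound to in the enclosing scope, while a bare `raise` re-raises the exception currently being handled. A minimal, runnable sketch of the failure mode, with hypothetical exception classes standing in for Letta's:

    class ContextWindowExceededError(Exception):
        pass


    class SystemPromptTokenExceededError(Exception):
        pass


    def step():
        try:
            raise ContextWindowExceededError("initial overflow")
        except ContextWindowExceededError as e:
            try:
                # Fallback check discovers the system prompt itself is too big.
                raise SystemPromptTokenExceededError("system prompt too large")
            except SystemPromptTokenExceededError:
                # BUG: `raise e` re-raises the OUTER ContextWindowExceededError,
                # discarding the SystemPromptTokenExceededError we just caught.
                # A bare `raise` would re-raise the current (inner) exception.
                raise e


    try:
        step()
    except Exception as exc:
        # Prints ContextWindowExceededError, not SystemPromptTokenExceededError.
        print(type(exc).__name__)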
@@ -59,13 +59,9 @@ from letta.server.rest_api.utils import (
 )
 from letta.services.conversation_manager import ConversationManager
 from letta.services.helpers.tool_parser_helper import runtime_override_tool_json_schema
-from letta.services.summarizer.summarizer_all import summarize_all
+from letta.services.summarizer.compact import compact_messages
 from letta.services.summarizer.summarizer_config import CompactionSettings
-from letta.services.summarizer.summarizer_sliding_window import (
-    count_tokens,
-    count_tokens_with_tools,
-    summarize_via_sliding_window,
-)
+from letta.services.summarizer.summarizer_sliding_window import count_tokens
 from letta.settings import settings, summarizer_settings
 from letta.system import package_function_response, package_summarize_message_no_counts
 from letta.utils import log_telemetry, validate_function_response
@@ -943,10 +939,11 @@ class LettaAgentV3(LettaAgentV2):
 
                         continue
                 except SystemPromptTokenExceededError:
+                    self.should_continue = False
                     self.stop_reason = LettaStopReason(
                         stop_reason=StopReasonType.context_window_overflow_in_system_prompt.value
                     )
-                    raise e
+                    raise
                 except Exception as e:
                     self.stop_reason = LettaStopReason(stop_reason=StopReasonType.error.value)
                     self.logger.error(f"Unknown error occured for summarization run {run_id}: {e}")
@@ -1065,34 +1062,39 @@ class LettaAgentV3(LettaAgentV2):
                                 trigger="post_step_context_check",
                             )
 
-                            summary_message, messages, summary_text = await self.compact(
-                                messages,
-                                trigger_threshold=self.agent_state.llm_config.context_window,
-                                run_id=run_id,
-                                step_id=step_id,
-                                use_summary_role=include_compaction_messages,
-                                trigger="post_step_context_check",
-                                context_tokens_before=context_tokens_before,
-                                messages_count_before=messages_count_before,
-                            )
-                            self.response_messages.append(summary_message)
+                            try:
+                                summary_message, messages, summary_text = await self.compact(
+                                    messages,
+                                    trigger_threshold=self.agent_state.llm_config.context_window,
+                                    run_id=run_id,
+                                    step_id=step_id,
+                                    use_summary_role=include_compaction_messages,
+                                    trigger="post_step_context_check",
+                                    context_tokens_before=context_tokens_before,
+                                    messages_count_before=messages_count_before,
+                                )
+                                self.response_messages.append(summary_message)
 
-                            # Yield summary result message to client
-                            for msg in self._create_summary_result_message(
-                                summary_message=summary_message,
-                                summary_text=summary_text,
-                                step_id=step_id,
-                                run_id=run_id,
-                                include_compaction_messages=include_compaction_messages,
-                            ):
-                                yield msg
+                                # Yield summary result message to client
+                                for msg in self._create_summary_result_message(
+                                    summary_message=summary_message,
+                                    summary_text=summary_text,
+                                    step_id=step_id,
+                                    run_id=run_id,
+                                    include_compaction_messages=include_compaction_messages,
+                                ):
+                                    yield msg
 
-                            await self._checkpoint_messages(
-                                run_id=run_id,
-                                step_id=step_id,
-                                new_messages=[summary_message],
-                                in_context_messages=messages,
-                            )
+                                await self._checkpoint_messages(
+                                    run_id=run_id,
+                                    step_id=step_id,
+                                    new_messages=[summary_message],
+                                    in_context_messages=messages,
+                                )
+                            except SystemPromptTokenExceededError:
+                                self.should_continue = False
+                                self.stop_reason = LettaStopReason(stop_reason=StopReasonType.context_window_overflow_in_system_prompt.value)
+                                raise
 
                 except Exception as e:
                     # NOTE: message persistence does not happen in the case of an exception (rollback to previous state)
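The "swallowed by the finally block" behavior the commit message describes is standard Python semantics: a `return` inside `finally` discards any in-flight exception. A minimal sketch of that failure mode, in a hypothetical shape rather than the actual `_step()` code:

    def step(stop_reason=None):
        try:
            raise RuntimeError("context window overflow in system prompt")
        finally:
            if stop_reason is None:
                # Because the handler never set stop_reason, we hit this early
                # return. A `return` inside `finally` silently discards the
                # in-flight RuntimeError.
                return "done"


    print(step())  # prints "done"; the RuntimeError never propagates

This is why the new handler must set `should_continue` and `stop_reason` before re-raising: the state guards the early return in the `finally` block.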
@@ -1678,256 +1680,29 @@ class LettaAgentV3(LettaAgentV2):
             context_tokens_before: Token count before compaction (for stats).
             messages_count_before: Message count before compaction (for stats).
         """
 
-        # Use the passed-in compaction_settings first, then agent's compaction_settings if set,
-        # otherwise fall back to global defaults based on the agent's model handle.
-        if compaction_settings is not None:
-            summarizer_config = compaction_settings
-        elif self.agent_state.compaction_settings is not None:
-            summarizer_config = self.agent_state.compaction_settings
-        else:
-            # Prefer the new handle field if set, otherwise derive from llm_config
-            if self.agent_state.model is not None:
-                handle = self.agent_state.model
-            else:
-                llm_cfg = self.agent_state.llm_config
-                handle = llm_cfg.handle or f"{llm_cfg.model_endpoint_type}/{llm_cfg.model}"
-
-            summarizer_config = CompactionSettings(model=handle)
-
-        # Build the LLMConfig used for summarization
-        summarizer_llm_config = await self._build_summarizer_llm_config(
-            agent_llm_config=self.agent_state.llm_config,
-            summarizer_config=summarizer_config,
-        )
-
-        summarization_mode_used = summarizer_config.mode
-        if summarizer_config.mode == "all":
-            summary, compacted_messages = await summarize_all(
-                actor=self.actor,
-                llm_config=summarizer_llm_config,
-                summarizer_config=summarizer_config,
-                in_context_messages=messages,
-                agent_id=self.agent_state.id,
-                agent_tags=self.agent_state.tags,
-                run_id=run_id,
-                step_id=step_id,
-            )
-        elif summarizer_config.mode == "sliding_window":
-            try:
-                summary, compacted_messages = await summarize_via_sliding_window(
-                    actor=self.actor,
-                    llm_config=summarizer_llm_config,
-                    summarizer_config=summarizer_config,
-                    in_context_messages=messages,
-                    agent_id=self.agent_state.id,
-                    agent_tags=self.agent_state.tags,
-                    run_id=run_id,
-                    step_id=step_id,
-                )
-            except Exception as e:
-                self.logger.error(f"Sliding window summarization failed with exception: {str(e)}. Falling back to all mode.")
-                summary, compacted_messages = await summarize_all(
-                    actor=self.actor,
-                    llm_config=summarizer_llm_config,
-                    summarizer_config=summarizer_config,
-                    in_context_messages=messages,
-                    agent_id=self.agent_state.id,
-                    agent_tags=self.agent_state.tags,
-                    run_id=run_id,
-                    step_id=step_id,
-                )
-                summarization_mode_used = "all"
-        else:
-            raise ValueError(f"Invalid summarizer mode: {summarizer_config.mode}")
-
-        # update the token count (including tools for accurate comparison with LLM's prompt_tokens)
-        self.context_token_estimate = await count_tokens_with_tools(
-            actor=self.actor,
-            llm_config=self.agent_state.llm_config,
-            messages=compacted_messages,
-            tools=self.agent_state.tools,
-        )
-        self.logger.info(f"Context token estimate after summarization: {self.context_token_estimate}")
-
-        # if the trigger_threshold is provided, we need to make sure that the new token count is below it
-        if trigger_threshold is not None and self.context_token_estimate is not None and self.context_token_estimate >= trigger_threshold:
-            # If even after summarization the context is still at or above
-            # the proactive summarization threshold, treat this as a hard
-            # failure: log loudly and evict all prior conversation state
-            # (keeping only the system message) to avoid getting stuck in
-            # repeated summarization loops.
-            self.logger.error(
-                "Summarization failed to sufficiently reduce context size: "
-                f"post-summarization tokens={self.context_token_estimate}, "
-                f"threshold={trigger_threshold}, context_window={self.context_token_estimate}. "
-                "Evicting all prior messages without a summary to break potential loops.",
-            )
-
-            # if we used the sliding window mode, try to summarize again with the all mode
-            if summarization_mode_used == "sliding_window":
-                # try to summarize again with the all mode
-                summary, compacted_messages = await summarize_all(
-                    actor=self.actor,
-                    llm_config=self.agent_state.llm_config,
-                    summarizer_config=summarizer_config,
-                    in_context_messages=compacted_messages,
-                    agent_id=self.agent_state.id,
-                    agent_tags=self.agent_state.tags,
-                    run_id=run_id,
-                    step_id=step_id,
-                )
-                summarization_mode_used = "all"
-
-                self.context_token_estimate = await count_tokens_with_tools(
-                    actor=self.actor,
-                    llm_config=self.agent_state.llm_config,
-                    messages=compacted_messages,
-                    tools=self.agent_state.tools,
-                )
-
-                # final edge case: the system prompt is the cause of the context overflow (raise error)
-                if self.context_token_estimate is not None and self.context_token_estimate >= trigger_threshold:
-                    await self._check_for_system_prompt_overflow(compacted_messages[0])
-
-                    # raise an error if this is STILL not the problem
-                    # do not throw an error, since we don't want to brick the agent
-                    self.logger.error(
-                        f"Failed to summarize messages after hard eviction and checking the system prompt token estimate: {self.context_token_estimate} > {trigger_threshold}"
-                    )
-                else:
-                    self.logger.info(
-                        f"Summarization fallback succeeded in bringing the context size below the trigger threshold: {self.context_token_estimate} < {trigger_threshold}"
-                    )
-
-        # Build compaction stats if we have the before values
-        # Note: messages_count_after = len(compacted_messages) + 1 because final_messages
-        # will be: [system] + [summary_message] + compacted_messages[1:]
-        compaction_stats = None
-        if trigger and context_tokens_before is not None and messages_count_before is not None:
-            compaction_stats = {
-                "trigger": trigger,
-                "context_tokens_before": context_tokens_before,
-                "context_tokens_after": self.context_token_estimate,
-                "context_window": self.agent_state.llm_config.context_window,
-                "messages_count_before": messages_count_before,
-                "messages_count_after": len(compacted_messages) + 1,
-            }
-
-        # Persist the summary message to DB
-        summary_message_str_packed = package_summarize_message_no_counts(
-            summary=summary,
-            timezone=self.agent_state.timezone,
-            compaction_stats=compaction_stats,
-        )
-
-        if use_summary_role:
-            # New behavior: Create Message directly with role=summary
-            # (bypassing MessageCreate which only accepts user/system/assistant roles)
-            summary_message_obj = Message(
-                role=MessageRole.summary,
-                content=[TextContent(text=summary_message_str_packed)],
-                agent_id=self.agent_state.id,
-                run_id=run_id,
-                step_id=step_id,
-            )
-        else:
-            # Legacy behavior: Use convert_message_creates_to_messages with role=user
-            summary_messages = await convert_message_creates_to_messages(
-                message_creates=[
-                    MessageCreate(
-                        role=MessageRole.user,
-                        content=[TextContent(text=summary_message_str_packed)],
-                    )
-                ],
-                agent_id=self.agent_state.id,
-                timezone=self.agent_state.timezone,
-                # We already packed, don't pack again
-                wrap_user_message=False,
-                wrap_system_message=False,
-                run_id=run_id,
-            )
-            if not len(summary_messages) == 1:
-                self.logger.error(f"Expected only one summary message, got {len(summary_messages)} in {summary_messages}")
-            summary_message_obj = summary_messages[0]
-
-        # final messages: inject summarization message at the beginning
-        final_messages = [compacted_messages[0]] + [summary_message_obj]
-        if len(compacted_messages) > 1:
-            final_messages += compacted_messages[1:]
-
-        return summary_message_obj, final_messages, summary
-
-    async def _build_summarizer_llm_config(
-        self,
-        agent_llm_config: LLMConfig,
-        summarizer_config: CompactionSettings,
-    ) -> LLMConfig:
-        """Derive an LLMConfig for summarization from a model handle.
-
-        This mirrors the agent-creation path: start from the agent's LLMConfig,
-        override provider/model/handle from ``compaction_settings.model``, and
-        then apply any explicit ``compaction_settings.model_settings`` via
-        ``_to_legacy_config_params``.
-        """
-
-        # If no summarizer model handle is provided, fall back to the agent's config
-        if not summarizer_config.model:
-            return agent_llm_config
-
-        try:
-            # Parse provider/model from the handle, falling back to the agent's
-            # provider type when only a model name is given.
-            if "/" in summarizer_config.model:
-                provider_name, model_name = summarizer_config.model.split("/", 1)
-            else:
-                provider_name = agent_llm_config.provider_name
-                model_name = summarizer_config.model
-
-            # Start from the agent's config and override model + provider_name + handle
-            # Check if the summarizer's provider matches the agent's provider
-            # If they match, we can safely use the agent's config as a base
-            # If they don't match, we need to load the default config for the new provider
-            from letta.schemas.enums import ProviderType
-
-            provider_matches = False
-            try:
-                # Check if provider_name is a valid ProviderType that matches agent's endpoint type
-                provider_type = ProviderType(provider_name)
-                provider_matches = provider_type.value == agent_llm_config.model_endpoint_type
-            except ValueError:
-                # provider_name is a custom label - check if it matches agent's provider_name
-                provider_matches = provider_name == agent_llm_config.provider_name
-
-            if provider_matches:
-                # Same provider - use agent's config as base and override model/handle
-                base = agent_llm_config.model_copy()
-                base.model = model_name
-                base.handle = summarizer_config.model
-            else:
-                # Different provider - load default config for this handle
-                from letta.services.provider_manager import ProviderManager
-
-                provider_manager = ProviderManager()
-                try:
-                    base = await provider_manager.get_llm_config_from_handle(
-                        handle=summarizer_config.model,
-                        actor=self.actor,
-                    )
-                except Exception as e:
-                    self.logger.warning(
-                        f"Failed to load LLM config for summarizer handle '{summarizer_config.model}': {e}. "
-                        f"Falling back to agent's LLM config."
-                    )
-                    return agent_llm_config
-
-            # If explicit model_settings are provided for the summarizer, apply
-            # them just like server.create_agent_async does for agents.
-            if summarizer_config.model_settings is not None:
-                update_params = summarizer_config.model_settings._to_legacy_config_params()
-                return base.model_copy(update=update_params)
-
-            return base
-        except Exception:
-            # On any error, do not break the agent – just fall back
-            return agent_llm_config
+        # Determine compaction settings: passed-in > agent's > global defaults
+        effective_compaction_settings = compaction_settings or self.agent_state.compaction_settings
+
+        result = await compact_messages(
+            actor=self.actor,
+            agent_id=self.agent_state.id,
+            agent_llm_config=self.agent_state.llm_config,
+            messages=messages,
+            timezone=self.agent_state.timezone,
+            compaction_settings=effective_compaction_settings,
+            agent_model_handle=self.agent_state.model,
+            agent_tags=self.agent_state.tags,
+            tools=self.agent_state.tools,
+            trigger_threshold=trigger_threshold,
+            run_id=run_id,
+            step_id=step_id,
+            use_summary_role=use_summary_role,
+            trigger=trigger,
+            context_tokens_before=context_tokens_before,
+            messages_count_before=messages_count_before,
+        )
+
+        # Update the agent's context token estimate
+        self.context_token_estimate = result.context_token_estimate
+        return result.summary_message, result.compacted_messages, result.summary_text
letta/services/summarizer/compact.py (new file, 335 lines)
@@ -0,0 +1,335 @@
"""Standalone compaction functions for message summarization."""

from dataclasses import dataclass
from typing import List, Optional

from letta.helpers.message_helper import convert_message_creates_to_messages
from letta.log import get_logger
from letta.otel.tracing import trace_method
from letta.schemas.enums import MessageRole
from letta.schemas.letta_message_content import TextContent
from letta.schemas.llm_config import LLMConfig
from letta.schemas.message import Message, MessageCreate
from letta.schemas.tool import Tool
from letta.schemas.user import User
from letta.services.summarizer.summarizer_all import summarize_all
from letta.services.summarizer.summarizer_config import CompactionSettings
from letta.services.summarizer.summarizer_sliding_window import (
    count_tokens,
    count_tokens_with_tools,
    summarize_via_sliding_window,
)
from letta.system import package_summarize_message_no_counts

logger = get_logger(__name__)


@dataclass
class CompactResult:
    """Result of a compaction operation."""

    summary_message: Message
    compacted_messages: list[Message]
    summary_text: str
    context_token_estimate: Optional[int]


async def build_summarizer_llm_config(
    agent_llm_config: LLMConfig,
    summarizer_config: CompactionSettings,
    actor: User,
) -> LLMConfig:
    """Derive an LLMConfig for summarization from a model handle.

    This mirrors the agent-creation path: start from the agent's LLMConfig,
    override provider/model/handle from ``compaction_settings.model``, and
    then apply any explicit ``compaction_settings.model_settings`` via
    ``_to_legacy_config_params``.

    Args:
        agent_llm_config: The agent's LLM configuration to use as base.
        summarizer_config: Compaction settings with optional model override.
        actor: The user performing the operation.

    Returns:
        LLMConfig configured for summarization.
    """
    # If no summarizer model handle is provided, fall back to the agent's config
    if not summarizer_config.model:
        return agent_llm_config

    try:
        # Parse provider/model from the handle, falling back to the agent's
        # provider type when only a model name is given.
        if "/" in summarizer_config.model:
            provider_name, model_name = summarizer_config.model.split("/", 1)
        else:
            provider_name = agent_llm_config.provider_name
            model_name = summarizer_config.model

        # Start from the agent's config and override model + provider_name + handle
        # Check if the summarizer's provider matches the agent's provider
        # If they match, we can safely use the agent's config as a base
        # If they don't match, we need to load the default config for the new provider
        from letta.schemas.enums import ProviderType

        provider_matches = False
        try:
            # Check if provider_name is a valid ProviderType that matches agent's endpoint type
            provider_type = ProviderType(provider_name)
            provider_matches = provider_type.value == agent_llm_config.model_endpoint_type
        except ValueError:
            # provider_name is a custom label - check if it matches agent's provider_name
            provider_matches = provider_name == agent_llm_config.provider_name

        if provider_matches:
            # Same provider - use agent's config as base and override model/handle
            base = agent_llm_config.model_copy()
            base.model = model_name
            base.handle = summarizer_config.model
        else:
            # Different provider - load default config for this handle
            from letta.services.provider_manager import ProviderManager

            provider_manager = ProviderManager()
            try:
                base = await provider_manager.get_llm_config_from_handle(
                    handle=summarizer_config.model,
                    actor=actor,
                )
            except Exception as e:
                logger.warning(
                    f"Failed to load LLM config for summarizer handle '{summarizer_config.model}': {e}. Falling back to agent's LLM config."
                )
                return agent_llm_config

        # If explicit model_settings are provided for the summarizer, apply
        # them just like server.create_agent_async does for agents.
        if summarizer_config.model_settings is not None:
            update_params = summarizer_config.model_settings._to_legacy_config_params()
            return base.model_copy(update=update_params)

        return base
    except Exception:
        # On any error, do not break the agent – just fall back
        return agent_llm_config


@trace_method
async def compact_messages(
    actor: User,
    agent_id: str,
    agent_llm_config: LLMConfig,
    messages: List[Message],
    timezone: str,
    compaction_settings: Optional[CompactionSettings] = None,
    agent_model_handle: Optional[str] = None,
    agent_tags: Optional[List[str]] = None,
    tools: Optional[List[Tool]] = None,
    trigger_threshold: Optional[int] = None,
    run_id: Optional[str] = None,
    step_id: Optional[str] = None,
    use_summary_role: bool = True,
    trigger: Optional[str] = None,
    context_tokens_before: Optional[int] = None,
    messages_count_before: Optional[int] = None,
) -> CompactResult:
    """Compact in-context messages using summarization.

    Args:
        actor: The user performing the operation.
        agent_id: The agent's ID.
        agent_llm_config: The agent's LLM configuration.
        messages: The in-context messages to compact.
        timezone: The agent's timezone for message formatting.
        compaction_settings: Optional compaction settings override.
        agent_model_handle: The agent's model handle (used if compaction_settings is None).
        agent_tags: The agent's tags for telemetry.
        tools: The agent's tools (for token counting).
        trigger_threshold: If provided, verify context stays below this after compaction.
        run_id: Optional run ID for telemetry.
        step_id: Optional step ID for telemetry.
        use_summary_role: If True, create summary message with role=summary.
        trigger: What triggered the compaction (for stats).
        context_tokens_before: Token count before compaction (for stats).
        messages_count_before: Message count before compaction (for stats).

    Returns:
        CompactResult containing the summary message, compacted messages, summary text,
        and updated context token estimate.
    """
    # Determine compaction settings
    if compaction_settings is not None:
        summarizer_config = compaction_settings
    elif agent_model_handle is not None:
        summarizer_config = CompactionSettings(model=agent_model_handle)
    else:
        # Fall back to deriving from llm_config
        handle = agent_llm_config.handle or f"{agent_llm_config.model_endpoint_type}/{agent_llm_config.model}"
        summarizer_config = CompactionSettings(model=handle)

    # Build the LLMConfig used for summarization
    summarizer_llm_config = await build_summarizer_llm_config(
        agent_llm_config=agent_llm_config,
        summarizer_config=summarizer_config,
        actor=actor,
    )

    summarization_mode_used = summarizer_config.mode
    if summarizer_config.mode == "all":
        summary, compacted_messages = await summarize_all(
            actor=actor,
            llm_config=summarizer_llm_config,
            summarizer_config=summarizer_config,
            in_context_messages=messages,
            agent_id=agent_id,
            agent_tags=agent_tags,
            run_id=run_id,
            step_id=step_id,
        )
    elif summarizer_config.mode == "sliding_window":
        try:
            summary, compacted_messages = await summarize_via_sliding_window(
                actor=actor,
                llm_config=summarizer_llm_config,
                summarizer_config=summarizer_config,
                in_context_messages=messages,
                agent_id=agent_id,
                agent_tags=agent_tags,
                run_id=run_id,
                step_id=step_id,
            )
        except Exception as e:
            logger.error(f"Sliding window summarization failed with exception: {str(e)}. Falling back to all mode.")
            summary, compacted_messages = await summarize_all(
                actor=actor,
                llm_config=summarizer_llm_config,
                summarizer_config=summarizer_config,
                in_context_messages=messages,
                agent_id=agent_id,
                agent_tags=agent_tags,
                run_id=run_id,
                step_id=step_id,
            )
            summarization_mode_used = "all"
    else:
        raise ValueError(f"Invalid summarizer mode: {summarizer_config.mode}")

    # Update the token count (including tools for accurate comparison with LLM's prompt_tokens)
    context_token_estimate = await count_tokens_with_tools(
        actor=actor,
        llm_config=agent_llm_config,
        messages=compacted_messages,
        tools=tools or [],
    )
    logger.info(f"Context token estimate after summarization: {context_token_estimate}")

    # If the trigger_threshold is provided, verify the new token count is below it
    if trigger_threshold is not None and context_token_estimate is not None and context_token_estimate >= trigger_threshold:
        logger.error(
            "Summarization failed to sufficiently reduce context size: "
            f"post-summarization tokens={context_token_estimate}, "
            f"threshold={trigger_threshold}. "
            "Attempting fallback strategies.",
        )

        # If we used the sliding window mode, try to summarize again with the all mode
        if summarization_mode_used == "sliding_window":
            summary, compacted_messages = await summarize_all(
                actor=actor,
                llm_config=agent_llm_config,
                summarizer_config=summarizer_config,
                in_context_messages=compacted_messages,
                agent_id=agent_id,
                agent_tags=agent_tags,
                run_id=run_id,
                step_id=step_id,
            )
            summarization_mode_used = "all"

            context_token_estimate = await count_tokens_with_tools(
                actor=actor,
                llm_config=agent_llm_config,
                messages=compacted_messages,
                tools=tools or [],
            )

        # Final edge case: check if we're still over threshold
        if context_token_estimate is not None and context_token_estimate >= trigger_threshold:
            # Check if system prompt is the cause
            system_prompt_token_estimate = await count_tokens(
                actor=actor,
                llm_config=agent_llm_config,
                messages=[compacted_messages[0]],
            )
            if system_prompt_token_estimate is not None and system_prompt_token_estimate >= agent_llm_config.context_window:
                from letta.errors import SystemPromptTokenExceededError

                raise SystemPromptTokenExceededError(
                    system_prompt_token_estimate=system_prompt_token_estimate,
                    context_window=agent_llm_config.context_window,
                )

            # Log error but don't brick the agent
            logger.error(f"Failed to summarize messages after fallback: {context_token_estimate} > {trigger_threshold}")
        else:
            logger.info(f"Summarization fallback succeeded: {context_token_estimate} < {trigger_threshold}")

    # Build compaction stats if we have the before values
    compaction_stats = None
    if trigger and context_tokens_before is not None and messages_count_before is not None:
        compaction_stats = {
            "trigger": trigger,
            "context_tokens_before": context_tokens_before,
            "context_tokens_after": context_token_estimate,
            "context_window": agent_llm_config.context_window,
            "messages_count_before": messages_count_before,
            "messages_count_after": len(compacted_messages) + 1,
        }

    # Create the summary message
    summary_message_str_packed = package_summarize_message_no_counts(
        summary=summary,
        timezone=timezone,
        compaction_stats=compaction_stats,
    )

    if use_summary_role:
        # New behavior: Create Message directly with role=summary
        summary_message_obj = Message(
            role=MessageRole.summary,
            content=[TextContent(text=summary_message_str_packed)],
            agent_id=agent_id,
            run_id=run_id,
            step_id=step_id,
        )
    else:
        # Legacy behavior: Use convert_message_creates_to_messages with role=user
        summary_messages = await convert_message_creates_to_messages(
            message_creates=[
                MessageCreate(
                    role=MessageRole.user,
                    content=[TextContent(text=summary_message_str_packed)],
                )
            ],
            agent_id=agent_id,
            timezone=timezone,
            wrap_user_message=False,
            wrap_system_message=False,
            run_id=run_id,
        )
        if len(summary_messages) != 1:
            logger.error(f"Expected only one summary message, got {len(summary_messages)}")
        summary_message_obj = summary_messages[0]

    # Build final messages: [system] + [summary] + remaining compacted messages
    final_messages = [compacted_messages[0], summary_message_obj]
    if len(compacted_messages) > 1:
        final_messages += compacted_messages[1:]

    return CompactResult(
        summary_message=summary_message_obj,
        compacted_messages=final_messages,
        summary_text=summary,
        context_token_estimate=context_token_estimate,
    )
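For reference, a hedged sketch of how a caller (the agent or a temporal activity) might consume the shared function above. The keyword arguments follow the signature in the diff; `actor`, `agent_state`, and `in_context_messages` are assumed to exist in the caller's scope, and `run_compaction` is a hypothetical wrapper, not part of the change:

    from letta.services.summarizer.compact import compact_messages


    async def run_compaction(actor, agent_state, in_context_messages):
        result = await compact_messages(
            actor=actor,
            agent_id=agent_state.id,
            agent_llm_config=agent_state.llm_config,
            messages=in_context_messages,
            timezone=agent_state.timezone,
            trigger_threshold=agent_state.llm_config.context_window,
            use_summary_role=True,
        )
        # CompactResult carries everything a caller needs; instance state such
        # as context_token_estimate is updated by the caller, not the function,
        # which is why the caller must also handle SystemPromptTokenExceededError.
        return result.summary_message, result.compacted_messages, result.context_token_estimate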
@@ -1046,10 +1046,10 @@ async def test_v3_summarize_hard_eviction_when_still_over_threshold(
     # summarize_conversation_history to run and then hit the branch where the
     # *post*-summarization token count is still above the proactive
     # summarization threshold. We simulate that by patching the
-    # letta_agent_v3-level count_tokens_with_tools helper to report an extremely large
+    # count_tokens_with_tools helper to report an extremely large
     # token count for the first call (post-summary) and a small count for the
     # second call (after hard eviction).
-    with patch("letta.agents.letta_agent_v3.count_tokens_with_tools") as mock_count_tokens:
+    with patch("letta.services.summarizer.compact.count_tokens_with_tools") as mock_count_tokens:
         # First call: pretend the summarized context is still huge relative to
         # this model's context window so that we always trigger the
         # hard-eviction path. Second call: minimal context (system only) is
@@ -314,12 +314,12 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
     the LLMConfig used for the summarizer request.
     """

-    from letta.agents.letta_agent_v3 import LettaAgentV3
     from letta.schemas.agent import AgentState as PydanticAgentState
     from letta.schemas.enums import AgentType, MessageRole
     from letta.schemas.memory import Memory
     from letta.schemas.message import Message as PydanticMessage
     from letta.schemas.model import OpenAIModelSettings, OpenAIReasoning
+    from letta.services.summarizer.compact import build_summarizer_llm_config

     # Base agent LLM config
     base_llm_config = LLMConfig.default_config("gpt-4o-mini")

@@ -406,16 +406,11 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
         tool_rules=None,
     )

-    # Create a mock agent instance to call the instance method
-    mock_agent = Mock(spec=LettaAgentV3)
-    mock_agent.actor = default_user
-    mock_agent.logger = Mock()
-
-    # Use the instance method to derive summarizer llm_config
-    summarizer_llm_config = await LettaAgentV3._build_summarizer_llm_config(
-        mock_agent,
+    # Use the shared function to derive summarizer llm_config
+    summarizer_llm_config = await build_summarizer_llm_config(
         agent_llm_config=agent_state.llm_config,
         summarizer_config=agent_state.compaction_settings,
+        actor=default_user,
     )

     # Agent model remains the base model