refactor: extract compact logic to shared function for temporal (#9249)

* refactor: extract compact logic to shared function

Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.

Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
  - compact_messages(): Core compaction logic
  - build_summarizer_llm_config(): LLM config builder for summarization
  - CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
  compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams

This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.
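The shape of the refactor can be sketched as follows. The `CompactResult` field names mirror the dataclass added in compact.py; the stub logic is illustrative only, not the real summarizer:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CompactResult:
    # Fields mirror the CompactResult dataclass added in compact.py
    summary_message: str
    compacted_messages: List[str]
    summary_text: str
    context_token_estimate: Optional[int]

def compact_messages(messages: List[str]) -> CompactResult:
    """Standalone function: callable from both the agent and temporal workflows."""
    summary = f"[summary of {len(messages) - 1} messages]"
    return CompactResult(
        summary_message=summary,
        # Keep the system message, inject the summary right after it
        compacted_messages=[messages[0], summary],
        summary_text=summary,
        context_token_estimate=None,
    )

class Agent:
    """The agent method becomes a thin wrapper over the shared function."""

    def compact(self, messages: List[str]):
        result = compact_messages(messages)
        return result.summary_message, result.compacted_messages, result.summary_text
```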

* chore: clean up verbose comments

* fix: correct CompactionSettings import path

* fix: correct count_tokens import from summarizer_sliding_window

* fix: update test patch path for count_tokens_with_tools

After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.
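The underlying gotcha is that `unittest.mock.patch` must target the module where a name is *looked up*, not where it is defined. A minimal reproduction (the module names here are invented for illustration):

```python
import sys
import types
from unittest.mock import patch

# Fake "defining" module, standing in for the summarizer helpers
compact_mod = types.ModuleType("fake_compact")
compact_mod.count_tokens_with_tools = lambda: 1000
sys.modules["fake_compact"] = compact_mod

# Fake "using" module that did `from fake_compact import count_tokens_with_tools`
agent_mod = types.ModuleType("fake_agent")
agent_mod.count_tokens_with_tools = compact_mod.count_tokens_with_tools
sys.modules["fake_agent"] = agent_mod

# Patching the defining module has no effect on the reference agent_mod holds
with patch("fake_compact.count_tokens_with_tools", return_value=5):
    assert agent_mod.count_tokens_with_tools() == 1000

# Patching the lookup site works
with patch("fake_agent.count_tokens_with_tools", return_value=5):
    assert agent_mod.count_tokens_with_tools() == 5
```

After the extraction, `count_tokens_with_tools` is looked up inside compact.py, so that is the module the test must patch.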

* fix: update test to use build_summarizer_llm_config from compact.py

The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.

* fix: add early check for system prompt size in compact_messages

Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.
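A minimal sketch of such a fail-fast guard (the error's constructor signature is simplified from the real class):

```python
class SystemPromptTokenExceededError(Exception):
    """Raised when the system prompt alone cannot fit in the context window."""

    def __init__(self, system_prompt_token_estimate: int, context_window: int):
        super().__init__(
            f"System prompt uses {system_prompt_token_estimate} tokens, "
            f"but the context window is only {context_window}."
        )

def check_system_prompt_fits(system_prompt_tokens: int, context_window: int) -> None:
    # The system prompt cannot be compacted, so fail fast before
    # spending an LLM call on summarization that cannot help.
    if system_prompt_tokens >= context_window:
        raise SystemPromptTokenExceededError(system_prompt_tokens, context_window)
```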

* fix: properly propagate SystemPromptTokenExceededError from compact

The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.

Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.

* revert: remove redundant SystemPromptTokenExceededError handling

The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.

* fix: correctly re-raise SystemPromptTokenExceededError

The inner exception handler was using 'raise e', which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.

Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.
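The difference is easy to reproduce: with nested handlers, `raise e` re-raises whatever `e` is bound to — here the outer exception — while a bare `raise` re-raises the exception currently being handled. `OuterError`/`InnerError` stand in for ContextWindowExceededError/SystemPromptTokenExceededError:

```python
class OuterError(Exception): pass
class InnerError(Exception): pass

def handle(use_bare_raise: bool):
    try:
        raise OuterError("context window exceeded")
    except OuterError as e:
        try:
            raise InnerError("system prompt too large")
        except InnerError:
            if use_bare_raise:
                raise      # re-raises the current InnerError
            raise e        # BUG: re-raises the outer OuterError instead
```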

* revert: remove early check and restore raise e to match main behavior

* fix: set should_continue=False and correctly re-raise exception

- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError

Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.

* fix: add SystemPromptTokenExceededError handler for post-step compaction

The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early

This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.

The fix adds the same exception handler pattern used in the retry compaction flow
(line 941-946), ensuring proper state is set before re-raising.

This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
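The resulting caller-side pattern can be sketched like this (LettaStopReason and the stop-reason value are simplified to plain strings):

```python
class SystemPromptTokenExceededError(Exception):
    pass

def compact_messages_stub():
    # Standalone function: it has no access to agent instance state,
    # so all it can do is raise.
    raise SystemPromptTokenExceededError()

class Agent:
    def __init__(self):
        self.should_continue = True
        self.stop_reason = None

    def post_step_compaction(self):
        try:
            compact_messages_stub()
        except SystemPromptTokenExceededError:
            # The caller sets instance state before re-raising, matching
            # what _check_for_system_prompt_overflow did on main.
            self.should_continue = False
            self.stop_reason = "context_window_overflow_in_system_prompt"
            raise
```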

Author: jnjpng
Date: 2026-02-03 12:34:25 -08:00
Committed by: Caren Thomas
Parent: 74dc6225ba
Commit: f48b60634f
4 changed files with 396 additions and 291 deletions


@@ -59,13 +59,9 @@ from letta.server.rest_api.utils import (
 )
 from letta.services.conversation_manager import ConversationManager
 from letta.services.helpers.tool_parser_helper import runtime_override_tool_json_schema
-from letta.services.summarizer.summarizer_all import summarize_all
+from letta.services.summarizer.compact import compact_messages
 from letta.services.summarizer.summarizer_config import CompactionSettings
-from letta.services.summarizer.summarizer_sliding_window import (
-    count_tokens,
-    count_tokens_with_tools,
-    summarize_via_sliding_window,
-)
+from letta.services.summarizer.summarizer_sliding_window import count_tokens
 from letta.settings import settings, summarizer_settings
 from letta.system import package_function_response, package_summarize_message_no_counts
 from letta.utils import log_telemetry, validate_function_response
@@ -943,10 +939,11 @@ class LettaAgentV3(LettaAgentV2):
                         continue
                     except SystemPromptTokenExceededError:
+                        self.should_continue = False
                         self.stop_reason = LettaStopReason(
                             stop_reason=StopReasonType.context_window_overflow_in_system_prompt.value
                         )
-                        raise e
+                        raise
                     except Exception as e:
                         self.stop_reason = LettaStopReason(stop_reason=StopReasonType.error.value)
                         self.logger.error(f"Unknown error occured for summarization run {run_id}: {e}")
@@ -1065,34 +1062,39 @@ class LettaAgentV3(LettaAgentV2):
                    trigger="post_step_context_check",
                )
-                summary_message, messages, summary_text = await self.compact(
-                    messages,
-                    trigger_threshold=self.agent_state.llm_config.context_window,
-                    run_id=run_id,
-                    step_id=step_id,
-                    use_summary_role=include_compaction_messages,
-                    trigger="post_step_context_check",
-                    context_tokens_before=context_tokens_before,
-                    messages_count_before=messages_count_before,
-                )
-                self.response_messages.append(summary_message)
+                try:
+                    summary_message, messages, summary_text = await self.compact(
+                        messages,
+                        trigger_threshold=self.agent_state.llm_config.context_window,
+                        run_id=run_id,
+                        step_id=step_id,
+                        use_summary_role=include_compaction_messages,
+                        trigger="post_step_context_check",
+                        context_tokens_before=context_tokens_before,
+                        messages_count_before=messages_count_before,
+                    )
+                    self.response_messages.append(summary_message)

-                # Yield summary result message to client
-                for msg in self._create_summary_result_message(
-                    summary_message=summary_message,
-                    summary_text=summary_text,
-                    step_id=step_id,
-                    run_id=run_id,
-                    include_compaction_messages=include_compaction_messages,
-                ):
-                    yield msg
+                    # Yield summary result message to client
+                    for msg in self._create_summary_result_message(
+                        summary_message=summary_message,
+                        summary_text=summary_text,
+                        step_id=step_id,
+                        run_id=run_id,
+                        include_compaction_messages=include_compaction_messages,
+                    ):
+                        yield msg

-                await self._checkpoint_messages(
-                    run_id=run_id,
-                    step_id=step_id,
-                    new_messages=[summary_message],
-                    in_context_messages=messages,
-                )
+                    await self._checkpoint_messages(
+                        run_id=run_id,
+                        step_id=step_id,
+                        new_messages=[summary_message],
+                        in_context_messages=messages,
+                    )
+                except SystemPromptTokenExceededError:
+                    self.should_continue = False
+                    self.stop_reason = LettaStopReason(stop_reason=StopReasonType.context_window_overflow_in_system_prompt.value)
+                    raise
                 except Exception as e:
                     # NOTE: message persistence does not happen in the case of an exception (rollback to previous state)
@@ -1678,256 +1680,29 @@ class LettaAgentV3(LettaAgentV2):
context_tokens_before: Token count before compaction (for stats).
messages_count_before: Message count before compaction (for stats).
"""
# Determine compaction settings: passed-in > agent's > global defaults
effective_compaction_settings = compaction_settings or self.agent_state.compaction_settings
# Use the passed-in compaction_settings first, then agent's compaction_settings if set,
# otherwise fall back to global defaults based on the agent's model handle.
if compaction_settings is not None:
summarizer_config = compaction_settings
elif self.agent_state.compaction_settings is not None:
summarizer_config = self.agent_state.compaction_settings
else:
# Prefer the new handle field if set, otherwise derive from llm_config
if self.agent_state.model is not None:
handle = self.agent_state.model
else:
llm_cfg = self.agent_state.llm_config
handle = llm_cfg.handle or f"{llm_cfg.model_endpoint_type}/{llm_cfg.model}"
summarizer_config = CompactionSettings(model=handle)
# Build the LLMConfig used for summarization
summarizer_llm_config = await self._build_summarizer_llm_config(
agent_llm_config=self.agent_state.llm_config,
summarizer_config=summarizer_config,
)
summarization_mode_used = summarizer_config.mode
if summarizer_config.mode == "all":
summary, compacted_messages = await summarize_all(
actor=self.actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=self.agent_state.id,
agent_tags=self.agent_state.tags,
run_id=run_id,
step_id=step_id,
)
elif summarizer_config.mode == "sliding_window":
try:
summary, compacted_messages = await summarize_via_sliding_window(
actor=self.actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=self.agent_state.id,
agent_tags=self.agent_state.tags,
run_id=run_id,
step_id=step_id,
)
except Exception as e:
self.logger.error(f"Sliding window summarization failed with exception: {str(e)}. Falling back to all mode.")
summary, compacted_messages = await summarize_all(
actor=self.actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=self.agent_state.id,
agent_tags=self.agent_state.tags,
run_id=run_id,
step_id=step_id,
)
summarization_mode_used = "all"
else:
raise ValueError(f"Invalid summarizer mode: {summarizer_config.mode}")
# update the token count (including tools for accurate comparison with LLM's prompt_tokens)
self.context_token_estimate = await count_tokens_with_tools(
result = await compact_messages(
actor=self.actor,
llm_config=self.agent_state.llm_config,
messages=compacted_messages,
tools=self.agent_state.tools,
)
self.logger.info(f"Context token estimate after summarization: {self.context_token_estimate}")
# if the trigger_threshold is provided, we need to make sure that the new token count is below it
if trigger_threshold is not None and self.context_token_estimate is not None and self.context_token_estimate >= trigger_threshold:
# If even after summarization the context is still at or above
# the proactive summarization threshold, treat this as a hard
# failure: log loudly and evict all prior conversation state
# (keeping only the system message) to avoid getting stuck in
# repeated summarization loops.
self.logger.error(
"Summarization failed to sufficiently reduce context size: "
f"post-summarization tokens={self.context_token_estimate}, "
f"threshold={trigger_threshold}, context_window={self.context_token_estimate}. "
"Evicting all prior messages without a summary to break potential loops.",
)
# if we used the sliding window mode, try to summarize again with the all mode
if summarization_mode_used == "sliding_window":
# try to summarize again with the all mode
summary, compacted_messages = await summarize_all(
actor=self.actor,
llm_config=self.agent_state.llm_config,
summarizer_config=summarizer_config,
in_context_messages=compacted_messages,
agent_id=self.agent_state.id,
agent_tags=self.agent_state.tags,
run_id=run_id,
step_id=step_id,
)
summarization_mode_used = "all"
self.context_token_estimate = await count_tokens_with_tools(
actor=self.actor,
llm_config=self.agent_state.llm_config,
messages=compacted_messages,
tools=self.agent_state.tools,
)
# final edge case: the system prompt is the cause of the context overflow (raise error)
if self.context_token_estimate is not None and self.context_token_estimate >= trigger_threshold:
await self._check_for_system_prompt_overflow(compacted_messages[0])
# raise an error if this is STILL not the problem
# do not throw an error, since we don't want to brick the agent
self.logger.error(
f"Failed to summarize messages after hard eviction and checking the system prompt token estimate: {self.context_token_estimate} > {trigger_threshold}"
)
else:
self.logger.info(
f"Summarization fallback succeeded in bringing the context size below the trigger threshold: {self.context_token_estimate} < {trigger_threshold}"
)
# Build compaction stats if we have the before values
# Note: messages_count_after = len(compacted_messages) + 1 because final_messages
# will be: [system] + [summary_message] + compacted_messages[1:]
compaction_stats = None
if trigger and context_tokens_before is not None and messages_count_before is not None:
compaction_stats = {
"trigger": trigger,
"context_tokens_before": context_tokens_before,
"context_tokens_after": self.context_token_estimate,
"context_window": self.agent_state.llm_config.context_window,
"messages_count_before": messages_count_before,
"messages_count_after": len(compacted_messages) + 1,
}
# Persist the summary message to DB
summary_message_str_packed = package_summarize_message_no_counts(
summary=summary,
agent_id=self.agent_state.id,
agent_llm_config=self.agent_state.llm_config,
messages=messages,
timezone=self.agent_state.timezone,
compaction_stats=compaction_stats,
compaction_settings=effective_compaction_settings,
agent_model_handle=self.agent_state.model,
agent_tags=self.agent_state.tags,
tools=self.agent_state.tools,
trigger_threshold=trigger_threshold,
run_id=run_id,
step_id=step_id,
use_summary_role=use_summary_role,
trigger=trigger,
context_tokens_before=context_tokens_before,
messages_count_before=messages_count_before,
)
if use_summary_role:
# New behavior: Create Message directly with role=summary
# (bypassing MessageCreate which only accepts user/system/assistant roles)
summary_message_obj = Message(
role=MessageRole.summary,
content=[TextContent(text=summary_message_str_packed)],
agent_id=self.agent_state.id,
run_id=run_id,
step_id=step_id,
)
else:
# Legacy behavior: Use convert_message_creates_to_messages with role=user
summary_messages = await convert_message_creates_to_messages(
message_creates=[
MessageCreate(
role=MessageRole.user,
content=[TextContent(text=summary_message_str_packed)],
)
],
agent_id=self.agent_state.id,
timezone=self.agent_state.timezone,
# We already packed, don't pack again
wrap_user_message=False,
wrap_system_message=False,
run_id=run_id,
)
if not len(summary_messages) == 1:
self.logger.error(f"Expected only one summary message, got {len(summary_messages)} in {summary_messages}")
summary_message_obj = summary_messages[0]
# Update the agent's context token estimate
self.context_token_estimate = result.context_token_estimate
# final messages: inject summarization message at the beginning
final_messages = [compacted_messages[0]] + [summary_message_obj]
if len(compacted_messages) > 1:
final_messages += compacted_messages[1:]
return summary_message_obj, final_messages, summary
async def _build_summarizer_llm_config(
self,
agent_llm_config: LLMConfig,
summarizer_config: CompactionSettings,
) -> LLMConfig:
"""Derive an LLMConfig for summarization from a model handle.
This mirrors the agent-creation path: start from the agent's LLMConfig,
override provider/model/handle from ``compaction_settings.model``, and
then apply any explicit ``compaction_settings.model_settings`` via
``_to_legacy_config_params``.
"""
# If no summarizer model handle is provided, fall back to the agent's config
if not summarizer_config.model:
return agent_llm_config
try:
# Parse provider/model from the handle, falling back to the agent's
# provider type when only a model name is given.
if "/" in summarizer_config.model:
provider_name, model_name = summarizer_config.model.split("/", 1)
else:
provider_name = agent_llm_config.provider_name
model_name = summarizer_config.model
# Start from the agent's config and override model + provider_name + handle
# Check if the summarizer's provider matches the agent's provider
# If they match, we can safely use the agent's config as a base
# If they don't match, we need to load the default config for the new provider
from letta.schemas.enums import ProviderType
provider_matches = False
try:
# Check if provider_name is a valid ProviderType that matches agent's endpoint type
provider_type = ProviderType(provider_name)
provider_matches = provider_type.value == agent_llm_config.model_endpoint_type
except ValueError:
# provider_name is a custom label - check if it matches agent's provider_name
provider_matches = provider_name == agent_llm_config.provider_name
if provider_matches:
# Same provider - use agent's config as base and override model/handle
base = agent_llm_config.model_copy()
base.model = model_name
base.handle = summarizer_config.model
else:
# Different provider - load default config for this handle
from letta.services.provider_manager import ProviderManager
provider_manager = ProviderManager()
try:
base = await provider_manager.get_llm_config_from_handle(
handle=summarizer_config.model,
actor=self.actor,
)
except Exception as e:
self.logger.warning(
f"Failed to load LLM config for summarizer handle '{summarizer_config.model}': {e}. "
f"Falling back to agent's LLM config."
)
return agent_llm_config
# If explicit model_settings are provided for the summarizer, apply
# them just like server.create_agent_async does for agents.
if summarizer_config.model_settings is not None:
update_params = summarizer_config.model_settings._to_legacy_config_params()
return base.model_copy(update=update_params)
return base
except Exception:
# On any error, do not break the agent just fall back
return agent_llm_config
return result.summary_message, result.compacted_messages, result.summary_text
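The mode dispatch and fallback that moved into compact_messages() follows this shape (stub summarizers standing in for the real ones; the control flow is the point):

```python
import logging

logger = logging.getLogger("compact_sketch")

def summarize_all(messages):
    # Stub: summarize everything, keep only the system message
    return "summary", [messages[0]]

def summarize_via_sliding_window(messages):
    # Stub: simulate a sliding-window failure to exercise the fallback
    raise RuntimeError("window does not fit")

def dispatch(mode, messages):
    mode_used = mode
    if mode == "all":
        summary, kept = summarize_all(messages)
    elif mode == "sliding_window":
        try:
            summary, kept = summarize_via_sliding_window(messages)
        except Exception as e:
            logger.error(f"Sliding window summarization failed: {e}. Falling back to all mode.")
            summary, kept = summarize_all(messages)
            mode_used = "all"
    else:
        raise ValueError(f"Invalid summarizer mode: {mode}")
    # mode_used is tracked so the retry logic upstream knows whether
    # another "all"-mode pass would be redundant
    return summary, kept, mode_used
```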


@@ -0,0 +1,335 @@
"""Standalone compaction functions for message summarization."""
from dataclasses import dataclass
from typing import List, Optional
from letta.helpers.message_helper import convert_message_creates_to_messages
from letta.log import get_logger
from letta.otel.tracing import trace_method
from letta.schemas.enums import MessageRole
from letta.schemas.letta_message_content import TextContent
from letta.schemas.llm_config import LLMConfig
from letta.schemas.message import Message, MessageCreate
from letta.schemas.tool import Tool
from letta.schemas.user import User
from letta.services.summarizer.summarizer_all import summarize_all
from letta.services.summarizer.summarizer_config import CompactionSettings
from letta.services.summarizer.summarizer_sliding_window import (
count_tokens,
count_tokens_with_tools,
summarize_via_sliding_window,
)
from letta.system import package_summarize_message_no_counts
logger = get_logger(__name__)
@dataclass
class CompactResult:
"""Result of a compaction operation."""
summary_message: Message
compacted_messages: list[Message]
summary_text: str
context_token_estimate: Optional[int]
async def build_summarizer_llm_config(
agent_llm_config: LLMConfig,
summarizer_config: CompactionSettings,
actor: User,
) -> LLMConfig:
"""Derive an LLMConfig for summarization from a model handle.
This mirrors the agent-creation path: start from the agent's LLMConfig,
override provider/model/handle from ``compaction_settings.model``, and
then apply any explicit ``compaction_settings.model_settings`` via
``_to_legacy_config_params``.
Args:
agent_llm_config: The agent's LLM configuration to use as base.
summarizer_config: Compaction settings with optional model override.
actor: The user performing the operation.
Returns:
LLMConfig configured for summarization.
"""
# If no summarizer model handle is provided, fall back to the agent's config
if not summarizer_config.model:
return agent_llm_config
try:
# Parse provider/model from the handle, falling back to the agent's
# provider type when only a model name is given.
if "/" in summarizer_config.model:
provider_name, model_name = summarizer_config.model.split("/", 1)
else:
provider_name = agent_llm_config.provider_name
model_name = summarizer_config.model
# Start from the agent's config and override model + provider_name + handle
# Check if the summarizer's provider matches the agent's provider
# If they match, we can safely use the agent's config as a base
# If they don't match, we need to load the default config for the new provider
from letta.schemas.enums import ProviderType
provider_matches = False
try:
# Check if provider_name is a valid ProviderType that matches agent's endpoint type
provider_type = ProviderType(provider_name)
provider_matches = provider_type.value == agent_llm_config.model_endpoint_type
except ValueError:
# provider_name is a custom label - check if it matches agent's provider_name
provider_matches = provider_name == agent_llm_config.provider_name
if provider_matches:
# Same provider - use agent's config as base and override model/handle
base = agent_llm_config.model_copy()
base.model = model_name
base.handle = summarizer_config.model
else:
# Different provider - load default config for this handle
from letta.services.provider_manager import ProviderManager
provider_manager = ProviderManager()
try:
base = await provider_manager.get_llm_config_from_handle(
handle=summarizer_config.model,
actor=actor,
)
except Exception as e:
logger.warning(
f"Failed to load LLM config for summarizer handle '{summarizer_config.model}': {e}. Falling back to agent's LLM config."
)
return agent_llm_config
# If explicit model_settings are provided for the summarizer, apply
# them just like server.create_agent_async does for agents.
if summarizer_config.model_settings is not None:
update_params = summarizer_config.model_settings._to_legacy_config_params()
return base.model_copy(update=update_params)
return base
except Exception:
# On any error, do not break the agent just fall back
return agent_llm_config
@trace_method
async def compact_messages(
actor: User,
agent_id: str,
agent_llm_config: LLMConfig,
messages: List[Message],
timezone: str,
compaction_settings: Optional[CompactionSettings] = None,
agent_model_handle: Optional[str] = None,
agent_tags: Optional[List[str]] = None,
tools: Optional[List[Tool]] = None,
trigger_threshold: Optional[int] = None,
run_id: Optional[str] = None,
step_id: Optional[str] = None,
use_summary_role: bool = True,
trigger: Optional[str] = None,
context_tokens_before: Optional[int] = None,
messages_count_before: Optional[int] = None,
) -> CompactResult:
"""Compact in-context messages using summarization.
Args:
actor: The user performing the operation.
agent_id: The agent's ID.
agent_llm_config: The agent's LLM configuration.
messages: The in-context messages to compact.
timezone: The agent's timezone for message formatting.
compaction_settings: Optional compaction settings override.
agent_model_handle: The agent's model handle (used if compaction_settings is None).
agent_tags: The agent's tags for telemetry.
tools: The agent's tools (for token counting).
trigger_threshold: If provided, verify context stays below this after compaction.
run_id: Optional run ID for telemetry.
step_id: Optional step ID for telemetry.
use_summary_role: If True, create summary message with role=summary.
trigger: What triggered the compaction (for stats).
context_tokens_before: Token count before compaction (for stats).
messages_count_before: Message count before compaction (for stats).
Returns:
CompactResult containing the summary message, compacted messages, summary text,
and updated context token estimate.
"""
# Determine compaction settings
if compaction_settings is not None:
summarizer_config = compaction_settings
elif agent_model_handle is not None:
summarizer_config = CompactionSettings(model=agent_model_handle)
else:
# Fall back to deriving from llm_config
handle = agent_llm_config.handle or f"{agent_llm_config.model_endpoint_type}/{agent_llm_config.model}"
summarizer_config = CompactionSettings(model=handle)
# Build the LLMConfig used for summarization
summarizer_llm_config = await build_summarizer_llm_config(
agent_llm_config=agent_llm_config,
summarizer_config=summarizer_config,
actor=actor,
)
summarization_mode_used = summarizer_config.mode
if summarizer_config.mode == "all":
summary, compacted_messages = await summarize_all(
actor=actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=agent_id,
agent_tags=agent_tags,
run_id=run_id,
step_id=step_id,
)
elif summarizer_config.mode == "sliding_window":
try:
summary, compacted_messages = await summarize_via_sliding_window(
actor=actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=agent_id,
agent_tags=agent_tags,
run_id=run_id,
step_id=step_id,
)
except Exception as e:
logger.error(f"Sliding window summarization failed with exception: {str(e)}. Falling back to all mode.")
summary, compacted_messages = await summarize_all(
actor=actor,
llm_config=summarizer_llm_config,
summarizer_config=summarizer_config,
in_context_messages=messages,
agent_id=agent_id,
agent_tags=agent_tags,
run_id=run_id,
step_id=step_id,
)
summarization_mode_used = "all"
else:
raise ValueError(f"Invalid summarizer mode: {summarizer_config.mode}")
# Update the token count (including tools for accurate comparison with LLM's prompt_tokens)
context_token_estimate = await count_tokens_with_tools(
actor=actor,
llm_config=agent_llm_config,
messages=compacted_messages,
tools=tools or [],
)
logger.info(f"Context token estimate after summarization: {context_token_estimate}")
# If the trigger_threshold is provided, verify the new token count is below it
if trigger_threshold is not None and context_token_estimate is not None and context_token_estimate >= trigger_threshold:
logger.error(
"Summarization failed to sufficiently reduce context size: "
f"post-summarization tokens={context_token_estimate}, "
f"threshold={trigger_threshold}. "
"Attempting fallback strategies.",
)
# If we used the sliding window mode, try to summarize again with the all mode
if summarization_mode_used == "sliding_window":
summary, compacted_messages = await summarize_all(
actor=actor,
llm_config=agent_llm_config,
summarizer_config=summarizer_config,
in_context_messages=compacted_messages,
agent_id=agent_id,
agent_tags=agent_tags,
run_id=run_id,
step_id=step_id,
)
summarization_mode_used = "all"
context_token_estimate = await count_tokens_with_tools(
actor=actor,
llm_config=agent_llm_config,
messages=compacted_messages,
tools=tools or [],
)
# Final edge case: check if we're still over threshold
if context_token_estimate is not None and context_token_estimate >= trigger_threshold:
# Check if system prompt is the cause
system_prompt_token_estimate = await count_tokens(
actor=actor,
llm_config=agent_llm_config,
messages=[compacted_messages[0]],
)
if system_prompt_token_estimate is not None and system_prompt_token_estimate >= agent_llm_config.context_window:
from letta.errors import SystemPromptTokenExceededError
raise SystemPromptTokenExceededError(
system_prompt_token_estimate=system_prompt_token_estimate,
context_window=agent_llm_config.context_window,
)
# Log error but don't brick the agent
logger.error(f"Failed to summarize messages after fallback: {context_token_estimate} > {trigger_threshold}")
else:
logger.info(f"Summarization fallback succeeded: {context_token_estimate} < {trigger_threshold}")
# Build compaction stats if we have the before values
compaction_stats = None
if trigger and context_tokens_before is not None and messages_count_before is not None:
compaction_stats = {
"trigger": trigger,
"context_tokens_before": context_tokens_before,
"context_tokens_after": context_token_estimate,
"context_window": agent_llm_config.context_window,
"messages_count_before": messages_count_before,
"messages_count_after": len(compacted_messages) + 1,
}
# Create the summary message
summary_message_str_packed = package_summarize_message_no_counts(
summary=summary,
timezone=timezone,
compaction_stats=compaction_stats,
)
if use_summary_role:
# New behavior: Create Message directly with role=summary
summary_message_obj = Message(
role=MessageRole.summary,
content=[TextContent(text=summary_message_str_packed)],
agent_id=agent_id,
run_id=run_id,
step_id=step_id,
)
else:
# Legacy behavior: Use convert_message_creates_to_messages with role=user
summary_messages = await convert_message_creates_to_messages(
message_creates=[
MessageCreate(
role=MessageRole.user,
content=[TextContent(text=summary_message_str_packed)],
)
],
agent_id=agent_id,
timezone=timezone,
wrap_user_message=False,
wrap_system_message=False,
run_id=run_id,
)
if len(summary_messages) != 1:
logger.error(f"Expected only one summary message, got {len(summary_messages)}")
summary_message_obj = summary_messages[0]
# Build final messages: [system] + [summary] + remaining compacted messages
final_messages = [compacted_messages[0], summary_message_obj]
if len(compacted_messages) > 1:
final_messages += compacted_messages[1:]
return CompactResult(
summary_message=summary_message_obj,
compacted_messages=final_messages,
summary_text=summary,
context_token_estimate=context_token_estimate,
)
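The handle convention that build_summarizer_llm_config parses — "provider/model", or a bare model name that inherits the agent's provider — reduces to:

```python
from typing import Tuple

def parse_model_handle(handle: str, agent_provider: str) -> Tuple[str, str]:
    # "provider/model" splits once on the first slash; a bare model
    # name falls back to the agent's provider.
    if "/" in handle:
        provider_name, model_name = handle.split("/", 1)
    else:
        provider_name, model_name = agent_provider, handle
    return provider_name, model_name
```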


@@ -1046,10 +1046,10 @@ async def test_v3_summarize_hard_eviction_when_still_over_threshold(
     # summarize_conversation_history to run and then hit the branch where the
     # *post*-summarization token count is still above the proactive
     # summarization threshold. We simulate that by patching the
-    # letta_agent_v3-level count_tokens_with_tools helper to report an extremely large
+    # count_tokens_with_tools helper to report an extremely large
     # token count for the first call (post-summary) and a small count for the
     # second call (after hard eviction).
-    with patch("letta.agents.letta_agent_v3.count_tokens_with_tools") as mock_count_tokens:
+    with patch("letta.services.summarizer.compact.count_tokens_with_tools") as mock_count_tokens:
        # First call: pretend the summarized context is still huge relative to
        # this model's context window so that we always trigger the
        # hard-eviction path. Second call: minimal context (system only) is


@@ -314,12 +314,12 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
     the LLMConfig used for the summarizer request.
     """
-    from letta.agents.letta_agent_v3 import LettaAgentV3
     from letta.schemas.agent import AgentState as PydanticAgentState
     from letta.schemas.enums import AgentType, MessageRole
     from letta.schemas.memory import Memory
     from letta.schemas.message import Message as PydanticMessage
     from letta.schemas.model import OpenAIModelSettings, OpenAIReasoning
+    from letta.services.summarizer.compact import build_summarizer_llm_config

     # Base agent LLM config
     base_llm_config = LLMConfig.default_config("gpt-4o-mini")
@@ -406,16 +406,11 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
         tool_rules=None,
     )

-    # Create a mock agent instance to call the instance method
-    mock_agent = Mock(spec=LettaAgentV3)
-    mock_agent.actor = default_user
-    mock_agent.logger = Mock()
-
-    # Use the instance method to derive summarizer llm_config
-    summarizer_llm_config = await LettaAgentV3._build_summarizer_llm_config(
-        mock_agent,
+    # Use the shared function to derive summarizer llm_config
+    summarizer_llm_config = await build_summarizer_llm_config(
         agent_llm_config=agent_state.llm_config,
         summarizer_config=agent_state.compaction_settings,
+        actor=default_user,
     )

     # Agent model remains the base model