refactor: extract compact logic to shared function for temporal (#9249)

* refactor: extract compact logic to shared function

Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.

Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
  - compact_messages(): Core compaction logic
  - build_summarizer_llm_config(): LLM config builder for summarization
  - CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
  compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams

This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.
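The extracted module's surface can be sketched roughly as below. The names `compact_messages()`, `build_summarizer_llm_config()`, and `CompactResult` come from the commit; the fields, signatures, and behavior here are simplified assumptions, not the actual implementation (which calls an LLM asynchronously):

```python
from dataclasses import dataclass


@dataclass
class CompactResult:
    """Hypothetical result shape; the real dataclass may differ."""
    summary_message: str
    evicted_message_ids: list


def compact_messages(messages, llm_config=None, use_summary_role=False):
    """Standalone compaction sketch shared by the agent and temporal paths.

    Assumed behavior: fold all but the most recent message into a summary
    and report which messages were evicted. The real implementation would
    summarize via the config from build_summarizer_llm_config().
    """
    evicted = messages[:-1]
    return CompactResult(
        summary_message=f"[summary of {len(evicted)} messages]",
        evicted_message_ids=[m["id"] for m in evicted],
    )
```

Because the function is standalone, both `LettaAgentV3.compact()` and the temporal activity can call it without sharing any instance state.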

* chore: clean up verbose comments

* fix: correct CompactionSettings import path

* fix: correct count_tokens import from summarizer_sliding_window

* fix: update test patch path for count_tokens_with_tools

After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.
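The governing rule for `unittest.mock.patch` is to target the name where it is looked up at call time, not where it is defined. A toy illustration with hypothetical in-memory modules (`helpers` and `consumer` stand in for the real summarizer and caller modules):

```python
import sys
import types
from unittest.mock import patch

# "helpers" defines count_tokens; "consumer" binds it at import time,
# simulating `from helpers import count_tokens`.
helpers = types.ModuleType("helpers")
helpers.count_tokens = lambda text: len(text.split())
sys.modules["helpers"] = helpers

consumer = types.ModuleType("consumer")
consumer.count_tokens = helpers.count_tokens
consumer.measure = lambda text: consumer.count_tokens(text)
sys.modules["consumer"] = consumer

# Patching the defining module does not affect the already-bound name...
with patch("helpers.count_tokens", return_value=999):
    assert consumer.measure("a b c") == 3

# ...so the patch target must be the module where the name is used.
with patch("consumer.count_tokens", return_value=999):
    assert consumer.measure("a b c") == 999
```

This is why moving `count_tokens_with_tools` into `compact.py` forced the test's patch target to move with it.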

* fix: update test to use build_summarizer_llm_config from compact.py

The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.

* fix: add early check for system prompt size in compact_messages

Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.
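A sketch of such a fail-fast check. The error name is taken from the commit; the token counter and function shape are simplified stand-ins, not the real code:

```python
class SystemPromptTokenExceededError(Exception):
    """Raised when the system prompt alone cannot fit in the context window."""


def count_tokens(text: str) -> int:
    # Crude stand-in for the real tokenizer: one token per whitespace word.
    return len(text.split())


def check_system_prompt_fits(system_prompt: str, context_window: int) -> int:
    """Fail fast before attempting summarization: the system prompt is never
    compacted, so if it alone exceeds the window, no amount of message
    eviction can help."""
    tokens = count_tokens(system_prompt)
    if tokens >= context_window:
        raise SystemPromptTokenExceededError(
            f"system prompt is {tokens} tokens, context window is {context_window}"
        )
    return tokens
```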

* fix: properly propagate SystemPromptTokenExceededError from compact

The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.

Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.

* revert: remove redundant SystemPromptTokenExceededError handling

The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.

* fix: correctly re-raise SystemPromptTokenExceededError

The inner exception handler was using 'raise e' which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.

Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.
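The distinction is easy to reproduce in isolation: inside a nested `except`, `raise e` re-raises whatever `e` is bound to (here, the outer exception), while a bare `raise` re-raises the exception currently being handled. A minimal reproduction, independent of the Letta code:

```python
class ContextWindowExceededError(Exception):
    pass


class SystemPromptTokenExceededError(Exception):
    pass


def handle(buggy: bool):
    try:
        raise ContextWindowExceededError("outer")
    except ContextWindowExceededError as e:
        try:
            raise SystemPromptTokenExceededError("inner")
        except SystemPromptTokenExceededError:
            if buggy:
                # BUG: `e` is still bound to the *outer* exception, so this
                # re-raises ContextWindowExceededError, not the current error.
                raise e
            # Bare `raise` re-raises the exception currently being handled.
            raise
```

With `buggy=True` the caller sees `ContextWindowExceededError`; with `buggy=False` it correctly sees `SystemPromptTokenExceededError`.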

* revert: remove early check and restore raise e to match main behavior

* fix: set should_continue=False and correctly re-raise exception

- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError

Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.

* fix: add SystemPromptTokenExceededError handler for post-step compaction

The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early

This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.

The fix adds the same exception handler pattern used in the retry compaction flow
(line 941-946), ensuring proper state is set before re-raising.

This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
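That caller-side pattern can be sketched as follows. The attribute names `stop_reason` and `should_continue` and the stop-reason value follow the commit text; the surrounding class is a minimal stand-in, not the actual agent:

```python
class SystemPromptTokenExceededError(Exception):
    pass


class Agent:
    """Minimal stand-in for the agent's per-step state."""

    def __init__(self):
        self.stop_reason = None
        self.should_continue = True

    def run_compaction(self, compact_fn):
        try:
            compact_fn()
        except SystemPromptTokenExceededError:
            # The standalone compact function cannot touch instance state,
            # so the caller records why the step stopped before propagating.
            self.stop_reason = "context_window_overflow_in_system_prompt"
            self.should_continue = False
            raise
```

Every call site of `compact_messages()` needs this handler; omitting it at one site is exactly the post-step bug described above.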
Commit f48b60634f (parent 74dc6225ba), authored by jnjpng on 2026-02-03 12:34:25 -08:00, committed by Caren Thomas.
4 changed files with 396 additions and 291 deletions.


@@ -1046,10 +1046,10 @@ async def test_v3_summarize_hard_eviction_when_still_over_threshold(
     # summarize_conversation_history to run and then hit the branch where the
     # *post*-summarization token count is still above the proactive
     # summarization threshold. We simulate that by patching the
-    # letta_agent_v3-level count_tokens_with_tools helper to report an extremely large
+    # count_tokens_with_tools helper to report an extremely large
     # token count for the first call (post-summary) and a small count for the
     # second call (after hard eviction).
-    with patch("letta.agents.letta_agent_v3.count_tokens_with_tools") as mock_count_tokens:
+    with patch("letta.services.summarizer.compact.count_tokens_with_tools") as mock_count_tokens:
         # First call: pretend the summarized context is still huge relative to
         # this model's context window so that we always trigger the
         # hard-eviction path. Second call: minimal context (system only) is


@@ -314,12 +314,12 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
     the LLMConfig used for the summarizer request.
     """
-    from letta.agents.letta_agent_v3 import LettaAgentV3
     from letta.schemas.agent import AgentState as PydanticAgentState
     from letta.schemas.enums import AgentType, MessageRole
     from letta.schemas.memory import Memory
     from letta.schemas.message import Message as PydanticMessage
     from letta.schemas.model import OpenAIModelSettings, OpenAIReasoning
+    from letta.services.summarizer.compact import build_summarizer_llm_config
 
     # Base agent LLM config
     base_llm_config = LLMConfig.default_config("gpt-4o-mini")
@@ -406,16 +406,11 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
         tool_rules=None,
     )
 
-    # Create a mock agent instance to call the instance method
-    mock_agent = Mock(spec=LettaAgentV3)
-    mock_agent.actor = default_user
-    mock_agent.logger = Mock()
-
-    # Use the instance method to derive summarizer llm_config
-    summarizer_llm_config = await LettaAgentV3._build_summarizer_llm_config(
-        mock_agent,
+    # Use the shared function to derive summarizer llm_config
+    summarizer_llm_config = await build_summarizer_llm_config(
         agent_llm_config=agent_state.llm_config,
         summarizer_config=agent_state.compaction_settings,
+        actor=default_user,
     )
 
     # Agent model remains the base model