refactor: extract compact logic to shared function for temporal (#9249)

* refactor: extract compact logic to shared function

Extract the compaction logic from LettaAgentV3.compact() into a
standalone compact_messages() function that can be shared between
the agent and temporal workflows.

Changes:
- Create apps/core/letta/services/summarizer/compact.py with:
  - compact_messages(): Core compaction logic
  - build_summarizer_llm_config(): LLM config builder for summarization
  - CompactResult: Dataclass for compaction results
- Update LettaAgentV3.compact() to use compact_messages()
- Update temporal summarize_conversation_history activity to use
  compact_messages() instead of the old Summarizer class
- Add use_summary_role parameter to SummarizeParams

This ensures consistent summarization behavior across different
execution paths and prevents drift as we improve the implementation.
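The extracted module's surface can be sketched roughly as below. The names `compact_messages()`, `build_summarizer_llm_config()`, and `CompactResult` come from the commit; the fields, signatures, and behavior here are simplified assumptions, not the actual implementation (which calls an LLM asynchronously):

```python
from dataclasses import dataclass


@dataclass
class CompactResult:
    """Hypothetical result shape; the real dataclass may differ."""
    summary_message: str
    evicted_message_ids: list


def compact_messages(messages, llm_config=None, use_summary_role=False):
    """Standalone compaction sketch shared by the agent and temporal paths.

    Assumed behavior: fold all but the most recent message into a summary
    and report which messages were evicted. The real implementation would
    summarize via the config from build_summarizer_llm_config().
    """
    evicted = messages[:-1]
    return CompactResult(
        summary_message=f"[summary of {len(evicted)} messages]",
        evicted_message_ids=[m["id"] for m in evicted],
    )
```

Because the function is standalone, both `LettaAgentV3.compact()` and the temporal activity can call it without sharing any instance state.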

* chore: clean up verbose comments

* fix: correct CompactionSettings import path

* fix: correct count_tokens import from summarizer_sliding_window

* fix: update test patch path for count_tokens_with_tools

After extracting compact logic to compact.py, the test was patching
the old location. Update the patch path to the new module location.
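The governing rule for `unittest.mock.patch` is to target the name where it is looked up at call time, not where it is defined. A toy illustration with hypothetical in-memory modules (`helpers` and `consumer` stand in for the real summarizer and caller modules):

```python
import sys
import types
from unittest.mock import patch

# "helpers" defines count_tokens; "consumer" binds it at import time,
# simulating `from helpers import count_tokens`.
helpers = types.ModuleType("helpers")
helpers.count_tokens = lambda text: len(text.split())
sys.modules["helpers"] = helpers

consumer = types.ModuleType("consumer")
consumer.count_tokens = helpers.count_tokens
consumer.measure = lambda text: consumer.count_tokens(text)
sys.modules["consumer"] = consumer

# Patching the defining module does not affect the already-bound name...
with patch("helpers.count_tokens", return_value=999):
    assert consumer.measure("a b c") == 3

# ...so the patch target must be the module where the name is used.
with patch("consumer.count_tokens", return_value=999):
    assert consumer.measure("a b c") == 999
```

This is why moving `count_tokens_with_tools` into `compact.py` forced the test's patch target to move with it.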

* fix: update test to use build_summarizer_llm_config from compact.py

The function was moved from LettaAgentV3._build_summarizer_llm_config
to compact.py as a standalone function.

* fix: add early check for system prompt size in compact_messages

Check if the system prompt alone exceeds the context window before
attempting summarization. The system prompt cannot be compacted,
so fail fast with SystemPromptTokenExceededError.
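A sketch of such a fail-fast check. The error name is taken from the commit; the token counter and function shape are simplified stand-ins, not the real code:

```python
class SystemPromptTokenExceededError(Exception):
    """Raised when the system prompt alone cannot fit in the context window."""


def count_tokens(text: str) -> int:
    # Crude stand-in for the real tokenizer: one token per whitespace word.
    return len(text.split())


def check_system_prompt_fits(system_prompt: str, context_window: int) -> int:
    """Fail fast before attempting summarization: the system prompt is never
    compacted, so if it alone exceeds the window, no amount of message
    eviction can help."""
    tokens = count_tokens(system_prompt)
    if tokens >= context_window:
        raise SystemPromptTokenExceededError(
            f"system prompt is {tokens} tokens, context window is {context_window}"
        )
    return tokens
```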

* fix: properly propagate SystemPromptTokenExceededError from compact

The exception handler in _step() was not setting the correct stop_reason
for SystemPromptTokenExceededError, which caused the finally block to
return early and swallow the exception.

Add special handling to set stop_reason to context_window_overflow_in_system_prompt
when SystemPromptTokenExceededError is caught.

* revert: remove redundant SystemPromptTokenExceededError handling

The special handling in the outer exception handler is redundant because
stop_reason is already set in the inner handler at line 943. The actual
fix for the test was the early check in compact_messages(), not this
redundant handling.

* fix: correctly re-raise SystemPromptTokenExceededError

The inner exception handler was using 'raise e' which re-raised the outer
ContextWindowExceededError instead of the current SystemPromptTokenExceededError.

Changed to 'raise' to correctly re-raise the current exception. This bug
was pre-existing but masked because _check_for_system_prompt_overflow was
only called as a fallback. The new early check in compact_messages() exposed it.
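The distinction is easy to reproduce in isolation: inside a nested `except`, `raise e` re-raises whatever `e` is bound to (here, the outer exception), while a bare `raise` re-raises the exception currently being handled. A minimal reproduction, independent of the Letta code:

```python
class ContextWindowExceededError(Exception):
    pass


class SystemPromptTokenExceededError(Exception):
    pass


def handle(buggy: bool):
    try:
        raise ContextWindowExceededError("outer")
    except ContextWindowExceededError as e:
        try:
            raise SystemPromptTokenExceededError("inner")
        except SystemPromptTokenExceededError:
            if buggy:
                # BUG: `e` is still bound to the *outer* exception, so this
                # re-raises ContextWindowExceededError, not the current error.
                raise e
            # Bare `raise` re-raises the exception currently being handled.
            raise
```

With `buggy=True` the caller sees `ContextWindowExceededError`; with `buggy=False` it correctly sees `SystemPromptTokenExceededError`.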

* revert: remove early check and restore raise e to match main behavior

* fix: set should_continue=False and correctly re-raise exception

- Add should_continue=False in SystemPromptTokenExceededError handler (matching main's _check_for_system_prompt_overflow behavior)
- Fix raise e -> raise to correctly propagate SystemPromptTokenExceededError

Note: test_large_system_prompt_summarization still fails locally but passes on main.
Need to investigate why exception isn't propagating correctly on refactored branch.

* fix: add SystemPromptTokenExceededError handler for post-step compaction

The post-step compaction (line 1066) was missing a SystemPromptTokenExceededError
exception handler. When compact_messages() raised this error, it would be caught
by the outer exception handler which would:
1. Set stop_reason to "error" instead of "context_window_overflow_in_system_prompt"
2. Not set should_continue = False
3. Get swallowed by the finally block (line 1126) which returns early

This caused test_large_system_prompt_summarization to fail because the exception
never propagated to the test.

The fix adds the same exception handler pattern used in the retry compaction flow
(line 941-946), ensuring proper state is set before re-raising.

This issue only affected the refactored code because on main, _check_for_system_prompt_overflow()
was an instance method that set should_continue/stop_reason BEFORE raising. In the refactor,
compact_messages() is a standalone function that cannot set instance state, so the caller
must handle the exception and set the state.
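That caller-side pattern can be sketched as follows. The attribute names `stop_reason` and `should_continue` and the stop-reason value follow the commit text; the surrounding class is a minimal stand-in, not the actual agent:

```python
class SystemPromptTokenExceededError(Exception):
    pass


class Agent:
    """Minimal stand-in for the agent's per-step state."""

    def __init__(self):
        self.stop_reason = None
        self.should_continue = True

    def run_compaction(self, compact_fn):
        try:
            compact_fn()
        except SystemPromptTokenExceededError:
            # The standalone compact function cannot touch instance state,
            # so the caller records why the step stopped before propagating.
            self.stop_reason = "context_window_overflow_in_system_prompt"
            self.should_continue = False
            raise
```

Every call site of `compact_messages()` needs this handler; omitting it at one site is exactly the post-step bug described above.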
Commit f48b60634f (parent 74dc6225ba), authored by jnjpng on 2026-02-03 12:34:25 -08:00, committed by Caren Thomas.
4 changed files with 396 additions and 291 deletions.


@@ -1046,10 +1046,10 @@ async def test_v3_summarize_hard_eviction_when_still_over_threshold(
     # summarize_conversation_history to run and then hit the branch where the
     # *post*-summarization token count is still above the proactive
     # summarization threshold. We simulate that by patching the
-    # letta_agent_v3-level count_tokens_with_tools helper to report an extremely large
+    # count_tokens_with_tools helper to report an extremely large
     # token count for the first call (post-summary) and a small count for the
     # second call (after hard eviction).
-    with patch("letta.agents.letta_agent_v3.count_tokens_with_tools") as mock_count_tokens:
+    with patch("letta.services.summarizer.compact.count_tokens_with_tools") as mock_count_tokens:
         # First call: pretend the summarized context is still huge relative to
         # this model's context window so that we always trigger the
         # hard-eviction path. Second call: minimal context (system only) is


@@ -314,12 +314,12 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
     the LLMConfig used for the summarizer request.
     """
-    from letta.agents.letta_agent_v3 import LettaAgentV3
     from letta.schemas.agent import AgentState as PydanticAgentState
     from letta.schemas.enums import AgentType, MessageRole
     from letta.schemas.memory import Memory
     from letta.schemas.message import Message as PydanticMessage
     from letta.schemas.model import OpenAIModelSettings, OpenAIReasoning
+    from letta.services.summarizer.compact import build_summarizer_llm_config
 
     # Base agent LLM config
     base_llm_config = LLMConfig.default_config("gpt-4o-mini")
@@ -406,16 +406,11 @@ async def test_compaction_settings_model_uses_separate_llm_config_for_summarizat
         tool_rules=None,
     )
 
-    # Create a mock agent instance to call the instance method
-    mock_agent = Mock(spec=LettaAgentV3)
-    mock_agent.actor = default_user
-    mock_agent.logger = Mock()
-
-    # Use the instance method to derive summarizer llm_config
-    summarizer_llm_config = await LettaAgentV3._build_summarizer_llm_config(
-        mock_agent,
+    # Use the shared function to derive summarizer llm_config
+    summarizer_llm_config = await build_summarizer_llm_config(
         agent_llm_config=agent_state.llm_config,
         summarizer_config=agent_state.compaction_settings,
+        actor=default_user,
     )
 
     # Agent model remains the base model