fix: orphaned approvals, token inflation, reasoning fields, memfs redis dep
[IN TESTING — self-hosted 0.16.6, Kimi-K2.5 via Synthetic Direct] Independent fixes that landed together on this stack:

- helpers.py — skip PendingApprovalError when the associated run is already cancelled or failed. Stale approvals from interrupted runs were blocking all subsequent messages on that conversation. Now checks run status before raising; falls back to raising on lookup failure (conservative).
- letta_agent_v3.py — use prompt_tokens, not total_tokens, for the context window estimate. total_tokens inflated the estimate by including completion tokens, triggering premature compaction. This was causing context window resets mid-conversation and is the root of the token inflation bug (see #3242).
- openai_client.py (both build_request_data paths) — strip reasoning_content, reasoning_content_signature, redacted_reasoning_content, and omitted_reasoning_content from message history before sending to inference backends. Fireworks and Synthetic Direct reject these fields with 422/400 errors. exclude_none handles None values but not actual text content from previous assistant turns.
- block_manager_git.py — skip the DB write when the block value is unchanged. Reduces unnecessary write amplification on every memfs sync cycle.
- memfs_client_base.py — remove the redis_client= kwarg from GitOperations init. The dependency was removed upstream but the call site wasn't updated.
- Dockerfile / compose files — context window and config updates for the 220k limit.
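The approval-guard logic in helpers.py can be sketched roughly as below. All names here (check_pending_approval, get_run_status, the status strings) are hypothetical stand-ins, not the actual Letta internals:

```python
# Hypothetical sketch of the stale-approval guard described above.
# Run lookup and status values are illustrative, not the real Letta API.

TERMINAL_STATUSES = {"cancelled", "failed"}


class PendingApprovalError(Exception):
    """Raised when a conversation still has an approval awaiting a decision."""


def check_pending_approval(approval, get_run_status):
    """Raise PendingApprovalError only if the approval's run is still live.

    get_run_status(run_id) returns a status string and may raise on
    lookup failure.
    """
    try:
        status = get_run_status(approval["run_id"])
    except Exception:
        # Conservative fallback: if the run state can't be determined,
        # behave as before and block the message.
        raise PendingApprovalError(approval["id"])
    if status in TERMINAL_STATUSES:
        return  # stale approval from an interrupted run; don't block
    raise PendingApprovalError(approval["id"])
```

The key property is that an approval orphaned by a cancelled or failed run no longer wedges the conversation, while lookup failures still err on the side of blocking.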
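The prompt_tokens vs total_tokens point can be illustrated with a minimal sketch (should_compact and its signature are invented for illustration; the real letta_agent_v3.py logic differs):

```python
# Illustrative sketch: compact based on prompt_tokens, not total_tokens.
# The usage dict mimics an OpenAI-style usage object.

def should_compact(usage: dict, context_window: int, threshold: float = 0.9) -> bool:
    """Decide whether to compact based on how full the context window is.

    Only prompt_tokens occupy the next request's context window;
    counting completion_tokens (via total_tokens) inflates the estimate
    and triggers premature compaction.
    """
    return usage["prompt_tokens"] >= threshold * context_window
```

With a 220k window and a 90% threshold, a turn with 180k prompt tokens and 30k completion tokens is fine under prompt_tokens (180k < 198k) but would have tripped compaction under total_tokens (210k >= 198k), which matches the mid-conversation resets described above.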
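The block_manager_git.py change reduces to a compare-before-write. A minimal sketch, with a plain dict standing in for the actual block store (sync_block is a hypothetical name):

```python
# Sketch of the write-amplification fix: skip the DB write when the
# block value is unchanged. The dict store is a stand-in for the real DB.

def sync_block(store: dict, block_id: str, new_value: str) -> bool:
    """Write the block value; return True if written, False if skipped."""
    if store.get(block_id) == new_value:
        return False  # unchanged on this memfs sync cycle; avoid the write
    store[block_id] = new_value
    return True
```

Since memfs syncs every cycle regardless of whether blocks changed, the no-op path is the common case, which is where the write amplification came from.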
@@ -454,6 +454,15 @@ class OpenAIClient(LLMClientBase):
         )

         request_data = data.model_dump(exclude_unset=True, exclude_none=True)
+
+        # Strip reasoning fields (see streaming build_request_data for explanation)
+        _REASONING_FIELDS = ("reasoning_content", "reasoning_content_signature",
+                             "redacted_reasoning_content", "omitted_reasoning_content")
+        if "messages" in request_data:
+            for message in request_data["messages"]:
+                for field in _REASONING_FIELDS:
+                    message.pop(field, None)
+
         return request_data

     @trace_method
@@ -641,6 +650,15 @@ class OpenAIClient(LLMClientBase):
                 tool.function.strict = False
         request_data = data.model_dump(exclude_unset=True, exclude_none=True)

+        # Strip reasoning fields that strict backends (Fireworks/Synthetic) reject.
+        # exclude_none handles fields that are None, but reasoning_content has actual
+        # text from previous assistant turns and must be explicitly removed.
+        _REASONING_FIELDS = ("reasoning_content", "reasoning_content_signature",
+                             "redacted_reasoning_content", "omitted_reasoning_content")
+        if "messages" in request_data:
+            for message in request_data["messages"]:
+                for field in _REASONING_FIELDS:
+                    message.pop(field, None)
+
         # If Ollama
         # if llm_config.handle.startswith("ollama/") and llm_config.enable_reasoner: