fix: orphaned approvals, token inflation, reasoning fields, memfs redis dep
[IN TESTING — self-hosted 0.16.6, Kimi-K2.5 via Synthetic Direct] Independent fixes that landed together on this stack:

- helpers.py — skip PendingApprovalError when the associated run is already cancelled or failed. Stale approvals from interrupted runs were blocking all subsequent messages on that conversation. Now checks run status before raising; falls back to raising on lookup failure (conservative).
- letta_agent_v3.py — use prompt_tokens, not total_tokens, for the context window estimate. total_tokens inflated the estimate by including completion tokens, triggering premature compaction. This was causing context window resets mid-conversation and is the root of the token inflation bug (see #3242).
- openai_client.py (both build_request_data paths) — strip reasoning_content, reasoning_content_signature, redacted_reasoning_content, and omitted_reasoning_content from message history before sending to inference backends. Fireworks and Synthetic Direct reject these fields with 422/400 errors. exclude_none handles None values but not actual text content from previous assistant turns.
- block_manager_git.py — skip the DB write when the block value is unchanged. Reduces unnecessary write amplification on every memfs sync cycle.
- memfs_client_base.py — remove the redis_client= kwarg from GitOperations init. The dependency was removed upstream but the call site wasn't updated.
- Dockerfile / compose files — context window and config updates for the 220k limit.
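The approval-guard logic in helpers.py can be sketched roughly as below. All names here (check_pending_approval, get_run_status, the status strings) are hypothetical stand-ins, not the actual Letta internals:

```python
# Hypothetical sketch of the stale-approval guard described above.
# Run lookup and status values are illustrative, not the real Letta API.

TERMINAL_STATUSES = {"cancelled", "failed"}


class PendingApprovalError(Exception):
    """Raised when a conversation still has an approval awaiting a decision."""


def check_pending_approval(approval, get_run_status):
    """Raise PendingApprovalError only if the approval's run is still live.

    get_run_status(run_id) returns a status string and may raise on
    lookup failure.
    """
    try:
        status = get_run_status(approval["run_id"])
    except Exception:
        # Conservative fallback: if the run state can't be determined,
        # behave as before and block the message.
        raise PendingApprovalError(approval["id"])
    if status in TERMINAL_STATUSES:
        return  # stale approval from an interrupted run; don't block
    raise PendingApprovalError(approval["id"])
```

The key property is that an approval orphaned by a cancelled or failed run no longer wedges the conversation, while lookup failures still err on the side of blocking.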
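The prompt_tokens vs total_tokens point can be illustrated with a minimal sketch (should_compact and its signature are invented for illustration; the real letta_agent_v3.py logic differs):

```python
# Illustrative sketch: compact based on prompt_tokens, not total_tokens.
# The usage dict mimics an OpenAI-style usage object.

def should_compact(usage: dict, context_window: int, threshold: float = 0.9) -> bool:
    """Decide whether to compact based on how full the context window is.

    Only prompt_tokens occupy the next request's context window;
    counting completion_tokens (via total_tokens) inflates the estimate
    and triggers premature compaction.
    """
    return usage["prompt_tokens"] >= threshold * context_window
```

With a 220k window and a 90% threshold, a turn with 180k prompt tokens and 30k completion tokens is fine under prompt_tokens (180k < 198k) but would have tripped compaction under total_tokens (210k >= 198k), which matches the mid-conversation resets described above.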
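The block_manager_git.py change reduces to a compare-before-write. A minimal sketch, with a plain dict standing in for the actual block store (sync_block is a hypothetical name):

```python
# Sketch of the write-amplification fix: skip the DB write when the
# block value is unchanged. The dict store is a stand-in for the real DB.

def sync_block(store: dict, block_id: str, new_value: str) -> bool:
    """Write the block value; return True if written, False if skipped."""
    if store.get(block_id) == new_value:
        return False  # unchanged on this memfs sync cycle; avoid the write
    store[block_id] = new_value
    return True
```

Since memfs syncs every cycle regardless of whether blocks changed, the no-op path is the common case, which is where the write amplification came from.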
@@ -454,6 +454,15 @@ class OpenAIClient(LLMClientBase):
         )

         request_data = data.model_dump(exclude_unset=True, exclude_none=True)
+
+        # Strip reasoning fields (see streaming build_request_data for explanation)
+        _REASONING_FIELDS = ("reasoning_content", "reasoning_content_signature",
+                             "redacted_reasoning_content", "omitted_reasoning_content")
+        if "messages" in request_data:
+            for message in request_data["messages"]:
+                for field in _REASONING_FIELDS:
+                    message.pop(field, None)
+
         return request_data

     @trace_method
@@ -641,6 +650,15 @@ class OpenAIClient(LLMClientBase):
                 tool.function.strict = False
         request_data = data.model_dump(exclude_unset=True, exclude_none=True)

+        # Strip reasoning fields that strict backends (Fireworks/Synthetic) reject.
+        # exclude_none handles fields that are None, but reasoning_content has actual
+        # text from previous assistant turns and must be explicitly removed.
+        _REASONING_FIELDS = ("reasoning_content", "reasoning_content_signature",
+                             "redacted_reasoning_content", "omitted_reasoning_content")
+        if "messages" in request_data:
+            for message in request_data["messages"]:
+                for field in _REASONING_FIELDS:
+                    message.pop(field, None)
+
         # If Ollama
         # if llm_config.handle.startswith("ollama/") and llm_config.enable_reasoner: