fix: update ContextWindowCalculator to parse new system message sections (#9398)

* fix: update ContextWindowCalculator to parse new system message sections

The context window calculator used outdated position-based parsing
that handled only 3 sections (base_instructions, memory_blocks, memory_metadata).
The system message now includes additional sections that were not
tracked:

- <memory_filesystem> (git-enabled agents)
- <tool_usage_rules> (when tool rules configured)
- <directories> (when sources attached)

Changes:
- Add _extract_tag_content() helper for proper XML tag extraction
- Rewrite extract_system_components() to return a Dict with all 6 sections
- Update calculate_context_window() to count tokens for new sections
- Add new fields to ContextWindowOverview schema with backward-compatible defaults
- Add unit tests for the extraction logic
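
To illustrate the change list above, here is a minimal sketch of tag-based extraction. It is not the code from this commit: the function names, the tag tuple, and the rule that everything before the first recognized tag is base_instructions are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Assumed tag names, taken from the sections named in the commit message.
SECTION_TAGS = (
    "memory_blocks",
    "memory_filesystem",
    "tool_usage_rules",
    "directories",
    "memory_metadata",
)


def extract_tag_content(text: str, tag: str) -> Optional[str]:
    """Return the first <tag>...</tag> span (tags included), or None if absent."""
    pattern = rf"<{re.escape(tag)}>.*?</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None


def extract_sections(system_message: str) -> Dict[str, Optional[str]]:
    """Split a system message into six named sections (hypothetical layout)."""
    sections: Dict[str, Optional[str]] = {
        tag: extract_tag_content(system_message, tag) for tag in SECTION_TAGS
    }
    # Assumption: text before the first recognized tag is the base instructions.
    starts = [system_message.find(f"<{tag}>") for tag in SECTION_TAGS]
    starts = [s for s in starts if s != -1]
    cut = min(starts) if starts else len(system_message)
    sections["base_instructions"] = system_message[:cut].strip() or None
    return sections
```

Unlike position-based parsing, a missing optional section simply yields None here and does not shift the sections that follow it.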

* update

* generate

* fix: check attached file in directories section instead of core_memory

Files are rendered inside <directories> tags, not <memory_blocks>.
Update validate_context_window_overview assertions accordingly.
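
The adjusted assertion might look roughly like the sketch below. `OverviewStandIn` and `assert_file_in_directories` are hypothetical names standing in for ContextWindowOverview and the real test helper, whose actual contents are not shown in this commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverviewStandIn:
    """Hypothetical stand-in for ContextWindowOverview (only the fields used here)."""

    core_memory: Optional[str] = None
    directories: Optional[str] = None


def assert_file_in_directories(overview: OverviewStandIn, file_name: str) -> None:
    """Check the attached file against the directories section, not core_memory."""
    assert overview.directories is not None, "expected a directories section"
    assert file_name in overview.directories, f"{file_name} missing from directories"
```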

* fix: address review feedback for context window parser

- Fix git-enabled agents regression: capture bare file blocks
  (e.g. <system/human.md>) rendered after </memory_filesystem> as
  core_memory via new _extract_git_core_memory() method
- Make _extract_top_level_tag robust: scan all occurrences to find
  tag outside container, handling nested-first + top-level-later case
- Document system_prompt tag inconsistency in docstring
- Add TODO to base_agent.py extract_dynamic_section linking to
  ContextWindowCalculator to flag parallel parser tech debt
- Add tests: git-enabled agent parsing, dual-occurrence tag
  extraction, pure text system prompt, git-enabled integration test
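
The "nested-first + top-level-later" case can be sketched as follows. This is an illustrative approximation, not the committed _extract_top_level_tag: the signature and the depth check based on counting container tags are assumptions.

```python
from typing import Optional


def extract_top_level_tag(text: str, tag: str, container: str) -> Optional[str]:
    """Return the first <tag>...</tag> span that lies outside <container>, else None."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    c_open, c_close = f"<{container}>", f"</{container}>"

    pos = text.find(open_tag)
    while pos != -1:
        # The occurrence is nested if a container was opened but not yet closed.
        depth = text.count(c_open, 0, pos) - text.count(c_close, 0, pos)
        if depth <= 0:
            end = text.find(close_tag, pos)
            if end == -1:
                return None  # unterminated tag: treat as absent
            return text[pos : end + len(close_tag)]
        # Nested occurrence: keep scanning for a later top-level one.
        pos = text.find(open_tag, pos + 1)
    return None
```

Stopping at the first occurrence would return the nested copy. Scanning every occurrence lets the top-level section be found even when a block with the same tag name appears earlier inside the container.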
jnjpng
2026-02-10 16:22:54 -08:00
committed by Caren Thomas
parent 7cc1cd3dc0
commit 39b25a0e3c
6 changed files with 906 additions and 86 deletions


@@ -30242,6 +30242,60 @@
"title": "Core Memory",
"description": "The content of the core memory."
},
"num_tokens_memory_filesystem": {
"type": "integer",
"title": "Num Tokens Memory Filesystem",
"description": "The number of tokens in the memory filesystem section (git-enabled agents only).",
"default": 0
},
"memory_filesystem": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Memory Filesystem",
"description": "The content of the memory filesystem section."
},
"num_tokens_tool_usage_rules": {
"type": "integer",
"title": "Num Tokens Tool Usage Rules",
"description": "The number of tokens in the tool usage rules section.",
"default": 0
},
"tool_usage_rules": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Tool Usage Rules",
"description": "The content of the tool usage rules section."
},
"num_tokens_directories": {
"type": "integer",
"title": "Num Tokens Directories",
"description": "The number of tokens in the directories section (attached sources).",
"default": 0
},
"directories": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Directories",
"description": "The content of the directories section."
},
"num_tokens_summary_memory": {
"type": "integer",
"title": "Num Tokens Summary Memory",