docs: rewrite dispatching-coding-agents skill prompt (#1400)

Co-authored-by: Letta Code <noreply@letta.com>
2026-03-16 10:49:15 -07:00
parent 022dafadbc
commit 380c7e1369
1 changed files with 181 additions and 149 deletions
--- a/src/skills/builtin/dispatching-coding-agents/SKILL.md
+++ b/src/skills/builtin/dispatching-coding-agents/SKILL.md
@@ -1,23 +1,140 @@
 ---
 name: dispatching-coding-agents
-description: Dispatch tasks to Claude Code and Codex CLI agents via Bash. Use when you want a second opinion, need to parallelize research across models, or face a hard coding problem that benefits from a stateless agent with frontier reasoning. Covers non-interactive execution, model selection, session resumption, and the history-analyzer subagent for accessing their past sessions.
+description: Dispatch stateless coding agents (Claude Code or Codex) via Bash. Use when you're stuck, need a second opinion, or need parallel research on a hard problem. They have no memory — you must provide all context.
 ---

 # Dispatching Coding Agents

-You can shell out to **Claude Code** (`claude`) and **Codex** (`codex`) as stateless sub-agents via Bash. They have full filesystem and tool access but **zero memory** — you must provide all necessary context in the prompt.
+You can shell out to **Claude Code** (`claude`) and **Codex** (`codex`) as stateless sub-agents via Bash. They have filesystem and tool access (scope depends on sandbox/approval settings) but **zero memory** — every session starts from scratch.

-## Philosophy
+**Default to `run_in_background: true`** on the Bash call so you can keep working while they run. Check results later with `TaskOutput`. Don't sit idle waiting for a subagent.

-You are the experienced manager with persistent memory. Claude Code and Codex are high-intellect but stateless — reborn fresh every invocation. Your job:
+## The Core Mental Model

-1. **Provide context** — include relevant file paths, architecture context, and constraints from your memory
-2. **Be specific** — tell them exactly what to investigate or implement, and what files to look at
-3. **Run async when possible** — use `run_in_background: true` on Bash calls to avoid blocking
-4. **Learn from results** — track which models/agents perform better on which tasks, and update memory
-5. **Mine their history** — use the `history-analyzer` subagent to access past Claude Code and Codex sessions
+Claude Code and Codex are highly optimized coding agents, but are re-born with each new session. Think of them like a brilliant intern that showed up today. Provide them with the right instructions and context to help them succeed and avoid having to re-learn things that you've learned.

-## Non-Interactive Execution
+You are the experienced manager with persistent memory of the user's preferences, the codebase, past decisions, and hard-won lessons. **Give them context, not a plan.** They won't know anything you don't tell them:
+
+- **Specific task**: Be precise about what you need — not "look into the auth system" but "trace the request flow from the messages endpoint through to the LLM call, cite files and line numbers."
+- **File paths and architecture**: Tell them exactly where to look and how pieces connect. They will wander aimlessly without this.
+- **Preferences and constraints**: Code style, error handling patterns, things the user has corrected you on. Save them from making mistakes you already learned from.
+- **What you've already tried**: If you're dispatching because you're stuck, this prevents them from rediscovering your dead ends.
+
+If a subagent needs clarification or asks a question, respond in the same session (see Session Resumption below) — don't start a new session or you'll lose the conversation context.
+
+## When to Dispatch (and When Not To)
+
+### Dispatch for:
+- **Hard debugging** — you've been looping on a problem and need fresh eyes
+- **Second opinions** — you want validation before a risky change
+- **Parallel research** — investigate multiple hypotheses simultaneously
+- **Large-scope investigation** — tracing a flow across many files in an unfamiliar area
+- **Code review** — have another agent review your diff or plan
+
+### Don't dispatch for:
+- Simple file reads, greps, or small edits — faster to do yourself
+- Anything that takes less than ~3 minutes of direct work
+- Tasks where you already know exactly what to do
+- When context transfer would take longer than just doing the task
+
+## Choosing an Agent and Model
+
+Different agents have different strengths. Track what works in your memory over time — your own observations are more valuable than these defaults.
+
+### Categories
+
+**Codex:**
+- `gpt-5.3-codex` — Frontier reasoning. Best for the hardest debugging and complex tasks.
+  - Strengths: Best reasoning, excellent at debugging, best option for the hardest tasks
+  - Weaknesses: Slow with long trajectories, compactions can destroy trajectories
+- `gpt-5.4` — Latest frontier model. Fast and general-purpose.
+  - Strengths: Easier for humans to understand, general-purpose, faster
+  - Weaknesses: More likely to make silly errors than gpt-5.3-codex
+
+**Claude Code:**
+- `opus` — Excellent writer. Best for docs, refactors, open-ended tasks, and vague instructions.
+  - Strengths: Excellent writer, understands vague instructions, excellent for coding but also general-purpose
+  - Weaknesses: Tends to generate "slop", writing excessive quantities of code unnecessarily. Can hang on large repos.
+
+### Cost and speed tradeoffs
+- Frontier models (`gpt-5.3-codex`, Opus) are slower and more expensive — use for tasks that justify it
+- Fast models (`gpt-5.4`) are good for quick checks and simple tasks
+- Use `--max-budget-usd N` (Claude Code) to cap spend on exploratory tasks
+
+### Known quirks
+- **Claude Code can hang on large repos** with unrestricted tools — consider `--allowedTools "Read Grep Glob"` (no Bash) and shorter timeouts for research tasks
+- **Codex compactions can destroy long trajectories** — for very long tasks, prefer multiple shorter sessions over one marathon
+- **Opus tends to over-generate** — produces more code than necessary. Good for exploration, verify before applying.
+
+## Prompting Subagents
+
+### Prompt template
+```
+TASK: [one-sentence summary]
+
+CONTEXT:
+- Repo: [path]
+- Key files: [list specific files and what they contain]
+- Architecture: [brief relevant context]
+
+WHAT TO DO:
+[what you need done — be precise, but let them figure out the approach]
+
+CONSTRAINTS:
+- [any preferences, patterns to follow, things to avoid]
+- [what you've already tried, if dispatching because stuck]
+
+OUTPUT:
+[what you want back — a diff, a list of files, a root cause analysis, etc.]
+```
+
+### What makes a good prompt
+- **Be specific about files** — "look at `src/agent/message.ts` lines 40-80" not "look at the message handling code"
+- **State the output format** — "return a bullet list of findings" vs. leaving it open-ended
+- **Include constraints** — if the user prefers certain patterns, say so explicitly
+- **Provide what you've tried** — when dispatching because you're stuck, this prevents them from repeating your dead ends
+
+## Dispatch Patterns
+
+### Parallel research — multiple perspectives
+Run Claude Code and Codex simultaneously on the same question via separate Bash calls in a single message (use `run_in_background: true`). Compare results for higher confidence.
+
+### Background dispatch — keep working while they run
+Use `run_in_background: true` on the Bash call to dispatch async. Continue your own work, then check results with `TaskOutput` when ready.
+
+### Deep investigation — frontier models
+For hard problems, use the strongest available models:
+```bash
+codex exec "YOUR PROMPT" -m gpt-5.3-codex --full-auto -C /path/to/repo
+```
+
+### Code review — cross-agent validation
+Have one agent write code or create a plan, then dispatch another to review:
+```bash
+# Codex has a native review command:
+codex review --uncommitted    # Review all local changes
+codex exec review "Focus on error handling and edge cases" -m gpt-5.4 --full-auto
+
+# Claude Code — pass the diff inline:
+claude -p "Review the following diff for correctness, edge cases, and missed error handling:\n\n$(git diff)" \
+  --model opus --dangerously-skip-permissions
+```
+
+### Get outside feedback on your work
+Write your plan or analysis to a file, then ask a subagent to critique it:
+```bash
+claude -p "Read /tmp/my-plan.md and critique it. What am I missing? What could go wrong?" \
+  --model opus --dangerously-skip-permissions -C /path/to/repo
+```
+
+## Handling Failures
+
+- **Timeout**: If an agent times out (especially Claude Code on large repos), try: (1) a shorter, more focused prompt, (2) restricting tools with `--allowedTools`, (3) switching to Codex which handles large repos better
+- **Garbage output**: If results are incoherent, the prompt was probably too vague. Rewrite with more specific file paths and clearer instructions.
+- **Session errors**: Claude Code can hit "stale approval from interrupted session" — `--dangerously-skip-permissions` prevents this. If Codex errors, start a fresh `exec` session.
+- **Compaction mid-task**: If a Codex session runs long enough to compact, it may lose earlier context. Break long tasks into smaller sequential sessions.
+
+## CLI Reference

 ### Claude Code

@@ -25,168 +142,83 @@ You are the experienced manager with persistent memory. Claude Code and Codex ar
 claude -p "YOUR PROMPT" --model MODEL --dangerously-skip-permissions
 ```

- `-p` / `--print`: non-interactive mode, prints response and exits
- `--dangerously-skip-permissions`: use in trusted repos to skip approval prompts. Without this, killed/timed-out sessions can leave stale approval state that blocks future runs with "stale approval from interrupted session" errors.
- `--model MODEL`: alias (`sonnet`, `opus`) or full name (`claude-sonnet-4-6`)
- `--effort LEVEL`: `low`, `medium`, `high` — controls reasoning depth
- `--append-system-prompt "..."`: inject additional system instructions
- `--allowedTools "Bash Edit Read"`: restrict available tools
- `--max-budget-usd N`: cap spend for the invocation
- `-C DIR`: set working directory
-
-Example — research task with Opus:
-```bash
-claude -p "Trace the request flow from POST /agents/{id}/messages through to the LLM call. Cite files and line numbers." \
-  --model opus --dangerously-skip-permissions -C /path/to/repo
-```
+| Flag | Purpose |
+|------|---------|
+| `-p` / `--print` | Non-interactive mode, prints response and exits |
+| `--dangerously-skip-permissions` | Skip approval prompts (prevents stale approval errors on timeout) |
+| `--model MODEL` | Alias (`sonnet`, `opus`) or full name (`claude-sonnet-4-6`) |
+| `--effort LEVEL` | `low`, `medium`, `high` — controls reasoning depth |
+| `--append-system-prompt "..."` | Inject additional system instructions |
+| `--allowedTools "Bash Edit Read"` | Restrict available tools |
+| `--max-budget-usd N` | Cap spend for the invocation |
+| `-C DIR` | Set working directory |
+| `--output-format json` | Structured output with `session_id`, `cost_usd`, `duration_ms` |

 ### Codex

 ```bash
-codex exec "YOUR PROMPT" -m codex-5.3 --full-auto
+codex exec "YOUR PROMPT" -m gpt-5.3-codex --full-auto
 ```

- `exec`: non-interactive mode
- `-m MODEL`: prefer `codex-5.3` (frontier), also `gpt-5.2`, `o3`
- `--full-auto`: auto-approve commands in sandbox (equivalent to `-a on-request --sandbox workspace-write`)
- `-C DIR`: set working directory
- `--search`: enable web search tool
+| Flag | Purpose |
+|------|---------|
+| `exec` | Non-interactive mode |
+| `-m MODEL` | `gpt-5.3-codex` (frontier), `gpt-5.4` (fast), `gpt-5.3-codex-spark` (ultra-fast), `gpt-5.2-codex`, `gpt-5.2` |
+| `--full-auto` | Auto-approve all commands in sandbox |
+| `-C DIR` | Set working directory |
+| `--search` | Enable web search tool |
+| `review` | Native code review — `codex review --uncommitted` or `codex exec review "prompt"` |

-Example — research task:
-```bash
-codex exec "Find all places where system prompt is recompiled. Cite files and line numbers." \
-  -m codex-5.3 --full-auto -C /path/to/repo
-```
+## Session Management

-## Session Resumption
+Both CLIs persist full session data (tool calls, reasoning, files read) to disk. The Bash output you see is just the final summary — the local session file is much richer.

-Both CLIs persist sessions to disk. Use resumption to continue a line of investigation.
+### Session storage paths

-### Claude Code
+**Claude Code:** `~/.claude/projects/<encoded-path>/<session-id>.jsonl`
+- `<encoded-path>` = working directory with `/` replaced by `-` (e.g. `/Users/foo/repos/bar` becomes `-Users-foo-repos-bar`)
+- Use `--output-format json` to get the `session_id` in the response

-```bash
-# Resume by session ID
-claude -r SESSION_ID -p "Follow up: now check if..."
+**Codex:** `~/.codex/sessions/<year>/<month>/<day>/rollout-*-<session-id>.jsonl`
+- Session ID is printed in output header: `session id: <uuid>`
+- Extract with: `grep "^session id:" output | awk '{print $3}'`

-# Continue most recent session in current directory
-claude -c -p "Also check..."
+### Resuming sessions

-# Fork a session (new ID, keeps history)
-claude -r SESSION_ID --fork-session -p "Try a different approach..."
-```
-
-### Codex
-
-```bash
-# Resume by session ID (interactive)
-codex resume SESSION_ID "Follow up prompt"
-
-# Resume most recent session
-codex resume --last "Follow up prompt"
-
-# Fork a session (new ID, keeps history)
-codex fork SESSION_ID "Try a different approach"
-codex fork --last "Try a different approach"
-```
-
-Note: Codex `resume` and `fork` launch interactive sessions, not non-interactive `exec`. For non-interactive follow-ups with Codex, start a fresh `exec` and include relevant context from the previous session in the prompt.
-
-## Capturing Session IDs
-
-When you dispatch a task, capture the session ID so you can access the full session history later. The Bash output you get back is just the final summary — the full session (intermediate tool calls, files read, reasoning) is stored locally and contains much richer data.
-
-### Claude Code
-
-Use `--output-format json` to get structured output including the session ID:
-```bash
-claude -p "YOUR PROMPT" --model opus --dangerously-skip-permissions --output-format json 2>&1
-```
-The JSON response includes `session_id`, `cost_usd`, `duration_ms`, `num_turns`, and `result`.
-
-Session files are stored at:
-```
-~/.claude/projects/<encoded-path>/<session-id>.jsonl
-```
-Where `<encoded-path>` is the working directory with `/` replaced by `-` (e.g. `/Users/foo/repos/bar` → `-Users-foo-repos-bar`).
-
-### Codex
-
-Codex prints the session ID in its output header:
-```
-session id: 019c9b76-fff4-7f40-a895-a58daa3c74c6
-```
-Extract it with: `grep "^session id:" output | awk '{print $3}'`
-
-Session files are stored at:
-```
-~/.codex/sessions/<year>/<month>/<day>/rollout-*.jsonl
-```
-
-## Session History
-
-Both CLIs persist full session data (tool calls, reasoning, files read) locally. This is richer than the summarized output you get back in Bash.
-
-### Where sessions are stored
+Use session resumption to continue a line of investigation without re-providing all context:

 **Claude Code:**
+```bash
+claude -r SESSION_ID -p "Follow up: now check if..."    # Resume by ID
+claude -c -p "Also check..."                             # Continue most recent
+claude -r SESSION_ID --fork-session -p "Try differently" # Fork (new ID, keeps history)
 ```
-~/.claude/projects/<encoded-path>/<session-id>.jsonl
-```
-Where `<encoded-path>` is the working directory with `/` replaced by `-` (e.g. `/Users/foo/repos/bar` → `-Users-foo-repos-bar`). Use `--output-format json` to get the `session_id` in structured output.

 **Codex:**
-```
-~/.codex/sessions/<year>/<month>/<day>/rollout-*-<session-id>.jsonl
-```
-The session ID is printed in the output header: `session id: <uuid>`.
-
-### When to analyze sessions
-
-**Don't** run history-analyzer after every dispatch — the reflection agent already captures insights from your conversation naturally, and single-session analysis tends to produce overly detailed memory that's better represented by the code itself.
-
-**Do** use `history-analyzer` for its intended purpose: **bulk migration** when bootstrapping memory from months of accumulated Claude Code/Codex history (e.g. during `/init`). For that, see the `migrating-from-codex-and-claude-code` skill.
-
-Session files are useful for:
- **Resuming** a line of investigation (see Session Resumption above)
- **Reviewing** what an agent actually did (read the JSONL directly)
- **Bulk migration** during `/init` when you have no existing memory
-
-## Dispatch Patterns
-
-### Parallel research — get multiple perspectives
-
-Run Claude Code and Codex simultaneously on the same question via separate Bash calls in a single message. Compare results for higher confidence.
-
-### Deep investigation — use frontier models
-
-For hard problems, use the strongest available models:
- Codex: `-m codex-5.3` (preferred — strong reasoning, good with large repos)
- Claude Code: `--model opus`
-
-### Code review — cross-agent validation
-
-Have one agent write code, then dispatch the other to review it:
 ```bash
-claude -p "Review the changes in this diff for correctness and edge cases: $(git diff)" --model opus
+codex exec resume SESSION_ID "Follow up prompt"  # Resume by ID (non-interactive)
+codex exec resume --last "Follow up prompt"      # Resume most recent (non-interactive)
+codex resume SESSION_ID "Follow up prompt"       # Resume by ID (interactive)
+codex resume --last "Follow up prompt"           # Resume most recent (interactive)
+codex fork SESSION_ID "Try a different approach" # Fork session (interactive)
 ```

-### Scoped implementation — sandboxed changes
+Note: `codex exec resume` works non-interactively. `codex resume` and `codex fork` are interactive only.

-Use Codex with `--full-auto` or Claude Code with `--dangerously-skip-permissions` (in trusted repos only) for autonomous implementation tasks. Always review their changes via `git diff` before committing.
+### When to analyze past sessions
+
+**Don't** run `history-analyzer` after every dispatch — your reflection agent already captures insights naturally, and single-session analysis produces overly detailed notes.
+
+**Do** use `history-analyzer` for **bulk migration** when bootstrapping memory from months of accumulated history (e.g. during `/init`). See the `migrating-from-codex-and-claude-code` skill.
+
+Direct uses for session files:
+- **Resume** an investigation (see above)
+- **Review** what an agent actually did (read the JSONL file directly)
+- **Bulk migration** when setting up a new agent

 ## Timeouts

-Set appropriate Bash timeouts for these calls — they can take a while:
- Research/analysis: `timeout: 300000` (5 min)
+Set Bash timeouts appropriate to the task:
+- Quick checks / reviews: `timeout: 120000` (2 min)
+- Research / analysis: `timeout: 300000` (5 min)
 - Implementation: `timeout: 600000` (10 min)
-
-## Strengths & Weaknesses (update as you learn)
-
-Track observations about model/agent performance in memory. Initial heuristics:
-
-| Agent | Strengths | Weaknesses |
-|-------|-----------|------------|
-| Codex (Codex-5.3) | Frontier reasoning, handles large repos well, reliable with --full-auto | Most expensive |
-| Codex (GPT-5.2) | Strong reasoning, good code search | Slightly less capable than 5.3 |
-| Claude Code (Sonnet) | Fast, actionable output with concrete code samples | Less thorough on edge cases |
-| Claude Code (Opus) | Deep analysis, nuanced reasoning | Can hang on large repos with tool use, needs --dangerously-skip-permissions |