commit 00382055c63bfad8eb450ca4d9e93213ba01b6cd Author: Ani (Annie Tunturi) Date: Wed Mar 18 10:30:20 2026 -0400 Initial commit: Community ADE foundation - Project structure: docs/, src/, tests/, proto/ - Research synthesis: Letta vs commercial ADEs - Architecture: Redis Streams queue design - Phase 1 orchestration design - Execution plan and project state tracking - Working subagent system (manager.ts fixes) This is the foundation for a Community ADE built on Letta's stateful agent architecture with git-native MemFS. πŸ‘Ύ Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta Code diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..06cef22 --- /dev/null +++ b/.gitignore @@ -0,0 +1,46 @@ +# Dependencies +node_modules/ +package-lock.json +yarn.lock +pnpm-lock.yaml + +# Build outputs +dist/ +build/ +*.tsbuildinfo + +# Environment +.env +.env.local +.env.*.local + +# IDE +.vscode/ +.idea/ +*.swp +*.swo +*~ + +# OS +.DS_Store +Thumbs.db + +# Logs +logs/ +*.log +npm-debug.log* +yarn-debug.log* +yarn-error.log* + +# Testing +coverage/ +.nyc_output/ + +# Redis +dump.rdb +*.rdb + +# Temporary +tmp/ +temp/ +*.tmp diff --git a/README.md b/README.md new file mode 100644 index 0000000..777b771 --- /dev/null +++ b/README.md @@ -0,0 +1,57 @@ +# Community ADE (Agentic Development Environment) + +A community-driven, open-source agentic development environment built on Letta's stateful agent architecture. 
+ +## Vision + +Build an open-source ADE that combines: +- **Stateful agents** with hierarchical memory (Letta's unique strength) +- **Git-native persistence** with MemFS versioning +- **Persistent task queues** for durable subagent execution +- **Web dashboard** for real-time monitoring and control +- **Computer Use** integration for browser automation + +## Differentiation + +Unlike commercial alternatives (Warp, Intent), Community ADE is: +- **Open source** and self-hostable +- **Stateful by design** - agents remember across sessions +- **Model agnostic** - use any OpenAI-compatible API +- **Git-native** - version control for agent memory + +## Project Structure + +``` +β”œβ”€β”€ src/ # Queue implementation and worker pool +β”œβ”€β”€ tests/ # Test suite +β”œβ”€β”€ docs/ # Architecture and design documents +β”œβ”€β”€ proto/ # Prototypes and experiments +└── README.md # This file +``` + +## Documentation + +- [Project State](docs/community-ade-project-state.md) - Current status and active subagents +- [Phase 1 Design](docs/ade-phase1-orchestration-design.md) - Task queue architecture +- [Redis Queue Design](docs/ade-redis-queue-design.md) - Detailed Redis implementation spec +- [Research Synthesis](docs/community-ade-research-synthesis-2026-03-18.md) - Competitive analysis + +## Phase 1: Orchestration Layer (In Progress) + +Goals: +1. βœ… Research and design complete +2. πŸ”„ Redis task queue implementation +3. ⏳ Worker pool with heartbeat +4. ⏳ Integration with Letta Task tool + +## Quick Start + +Coming soon - queue prototype implementation. + +## License + +MIT - Community contribution welcome. 
+ +--- + +*Project orchestrated by Ani, with research and design by specialized subagents.* diff --git a/docs/ade-phase1-execution-plan.md b/docs/ade-phase1-execution-plan.md new file mode 100644 index 0000000..194b840 --- /dev/null +++ b/docs/ade-phase1-execution-plan.md @@ -0,0 +1,525 @@ +# Phase 1 Execution Plan: Orchestration Layer + +**Date:** March 18, 2026 +**Status:** Ready for Implementation +**Estimated Duration:** 6 weeks +**Owner:** TBD + +--- + +## Overview + +This document provides actionable implementation guidance for Phase 1 of the Community ADE, based on synthesized research from commercial tools (Intent, Warp) and open-source alternatives (Aider, Cline, Agno). + +--- + +## Key Research Insights + +### 1. Letta's Competitive Position + +**βœ… Strongest Open-Source Position:** +- No competitor combines: stateful agents + hierarchical memory + git-native persistence + subagent orchestration +- Aider has git integration but no agent memory +- Cline is session-based with no persistence +- Agno lacks Letta's memory architecture + +**⚠️ Commercial Tools Lead in UX:** +- Warp: Terminal-native with rich context (@file, images) +- Intent: Specification-driven development +- Both have web dashboards; Letta needs one + +### 2. Technical Pattern Validation + +**Redis + Workers (Selected for Phase 1):** +- βœ… Proven pattern (Celery uses Redis under hood) +- βœ… Simpler than Temporal for our use case +- βœ… More control over data model +- ⚠️ Temporal deferred to Phase 2 evaluation + +**React + FastAPI (Selected for Phase 2):** +- βœ… Industry standard +- βœ… shadcn/ui provides accessible components +- βœ… TanStack Query for caching/real-time sync + +--- + +## Phase 1 Scope + +### Goals +1. Replace in-process Task execution with persistent queue +2. Ensure tasks survive agent restarts +3. Support 5+ concurrent workers +4. 
Maintain backward compatibility + +### Out of Scope (Phase 2+) +- Web dashboard (Phase 2) +- Temporal workflows (Phase 2 evaluation) +- GitHub integration (Phase 3) +- Computer Use (Phase 4) + +--- + +## Implementation Breakdown + +### Week 1: In-Memory Prototype + +**Deliverables:** +- [ ] `TaskQueue` class with asyncio.Queue +- [ ] Task dataclass with all fields +- [ ] Worker process skeleton +- [ ] Basic enqueue/dequeue/complete/fail operations + +**Testing:** +```python +# Test: Task survives worker crash +# Test: Concurrent task execution +# Test: Priority ordering +``` + +**Code Structure:** +``` +letta_ade/ +β”œβ”€β”€ __init__.py +β”œβ”€β”€ queue/ +β”‚ β”œβ”€β”€ __init__.py +β”‚ β”œβ”€β”€ models.py # Task dataclass, enums +β”‚ β”œβ”€β”€ memory_queue.py # Week 1 implementation +β”‚ └── base.py # Abstract base class +└── worker/ + β”œβ”€β”€ __init__.py + └── runner.py # Worker process logic +``` + +### Week 2: Redis Integration + +**Deliverables:** +- [ ] Redis connection manager +- [ ] Task serialization (JSON/pickle) +- [ ] Atomic dequeue with WATCH/MULTI/EXEC +- [ ] Status tracking (Sets per status) + +**Redis Schema:** +```redis +# Task storage +HSET task:{uuid} field value ... 
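+#
+# For example, a freshly enqueued task might carry (an illustrative subset of
+# the Task fields defined later in this plan; {uuid}/{user_id} are placeholders):
+#   HSET task:{uuid} status pending priority 100 retry_count 0 max_retries 3 user_id {user_id}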
+ +# Priority queue (pending) +ZADD queue:pending {priority} {task_id} + +# Running tasks +ZADD queue:running {started_at} {task_id} + +# Status index +SADD status:pending {task_id} +SADD status:running {task_id} +SADD status:completed {task_id} +SADD status:failed {task_id} + +# User index +SADD user:{user_id}:tasks {task_id} +``` + +**Dependencies:** +```toml +[dependencies] +redis = { version = "^5.0", extras = ["hiredis"] } +``` + +### Week 3-4: Worker Pool + Heartbeat + +**Deliverables:** +- [ ] Multiple worker processes +- [ ] Worker heartbeat (every 30s) +- [ ] Stall detection (2x heartbeat timeout) +- [ ] Graceful shutdown handling +- [ ] Worker capacity management + +**Worker Logic:** +```python +async def worker_loop(agent_id: UUID, queue: TaskQueue): + while running: + # Send heartbeat + await queue.heartbeat(agent_id) + + # Try to get task (5s timeout) + task = await queue.dequeue(agent_id, timeout_ms=5000) + + if task: + # Spawn subagent process + proc = await asyncio.create_subprocess_exec( + "letta", "run-agent", + f"--task-id={task.id}", + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE + ) + + # Wait for completion + stdout, stderr = await proc.communicate() + + # Update queue + if proc.returncode == 0: + await queue.complete(task.id, parse_result(stdout)) + else: + await queue.fail(task.id, stderr.decode()) + + # Brief pause to prevent tight loop + await asyncio.sleep(0.1) +``` + +**Stall Recovery (Cron job):** +```python +async def recover_stalled_tasks(queue: TaskQueue, max_age: timedelta): + """Requeue tasks from crashed workers.""" + stalled = await queue.find_stalled(max_age) + for task_id in stalled: + await queue.requeue(task_id) +``` + +### Week 5: API Layer + +**Deliverables:** +- [ ] FastAPI application structure +- [ ] REST endpoints (CRUD for tasks) +- [ ] WebSocket endpoint for real-time updates +- [ ] Authentication middleware + +**REST Endpoints:** +```python +@app.post("/tasks") +async def create_task(task: 
TaskCreate) -> TaskResponse: + """Enqueue a new task.""" + task_id = await queue.enqueue(task) + return TaskResponse(task_id=task_id, status="pending") + +@app.get("/tasks/{task_id}") +async def get_task(task_id: UUID) -> Task: + """Get task status and result.""" + return await queue.get(task_id) + +@app.get("/tasks") +async def list_tasks( + user_id: str, + status: Optional[TaskStatus] = None +) -> List[TaskSummary]: + """List tasks with optional filtering.""" + return await queue.list_by_user(user_id, status) + +@app.post("/tasks/{task_id}/cancel") +async def cancel_task(task_id: UUID): + """Cancel a pending or running task.""" + await queue.cancel(task_id) + +@app.post("/tasks/{task_id}/retry") +async def retry_task(task_id: UUID): + """Retry a failed task.""" + await queue.retry(task_id) +``` + +**WebSocket:** +```python +@app.websocket("/ws") +async def websocket_endpoint(websocket: WebSocket): + await websocket.accept() + + # Subscribe to Redis pub/sub for updates + pubsub = redis.pubsub() + pubsub.subscribe("task_updates") + + async for message in pubsub.listen(): + if message["type"] == "message": + await websocket.send_json(message["data"]) +``` + +### Week 6: Task Tool Integration + +**Deliverables:** +- [ ] Modify existing Task tool to use queue +- [ ] `persist` flag for backward compatibility +- [ ] Polling support for task completion +- [ ] Migration guide for existing code + +**Modified Task Tool:** +```python +class TaskTool: + async def run( + self, + prompt: str, + subagent_type: str, + # ... existing args ... + persist: bool = False, # NEW + priority: int = 100, # NEW + wait: bool = False, # NEW + timeout: int = 300, # NEW + ) -> TaskResult: + + if persist: + # Enqueue and optionally wait + task_id = await self.queue.enqueue(...) 
+ + if wait: + # Poll for completion + result = await self._wait_for_task(task_id, timeout) + return result + else: + # Return immediately with task_id + return TaskResult(task_id=task_id, status="pending") + else: + # Legacy immediate execution + return await self._execute_immediately(...) +``` + +--- + +## Technical Specifications + +### Task Data Model + +```python +@dataclass +class Task: + id: UUID + subagent_type: str + prompt: str + system_prompt: Optional[str] + model: Optional[str] + + # State + status: TaskStatus + priority: int = 100 + created_at: datetime + started_at: Optional[datetime] + completed_at: Optional[datetime] + + # Execution + agent_id: Optional[UUID] + retry_count: int = 0 + max_retries: int = 3 + + # Results + result: Optional[dict] + error: Optional[str] + exit_code: Optional[int] + + # Metadata + tags: List[str] + user_id: str + parent_task: Optional[UUID] + + # Cost tracking (NEW) + input_tokens: int = 0 + output_tokens: int = 0 + estimated_cost: float = 0.0 +``` + +### Retry Logic + +```python +async def retry_with_backoff(task: Task) -> bool: + if task.retry_count >= task.max_retries: + return False # Permanent failure + + # Exponential backoff: 2^retry_count seconds + delay = min(2 ** task.retry_count, 300) # Cap at 5 min + + await asyncio.sleep(delay) + task.retry_count += 1 + + # Re-enqueue with same priority + await queue.enqueue(task, priority=task.priority) + return True +``` + +### Error Classification + +| Error | Retry? 
| Action | +|-------|--------|--------| +| Subagent crash | Yes | Requeue with backoff | +| Syntax error | No | Fail immediately | +| API rate limit | Yes | Exponential backoff | +| Out of memory | No | Alert admin, fail | +| Redis connection | Yes | Reconnect, retry | +| Timeout | Yes | Retry with longer timeout | + +--- + +## Testing Strategy + +### Unit Tests +```python +# test_queue.py +def test_enqueue_creates_pending_task(): +def test_dequeue_removes_from_pending(): +def test_complete_moves_to_completed(): +def test_fail_triggers_retry(): +def test_max_retries_exceeded(): +def test_cancel_stops_running_task(): +``` + +### Integration Tests +```python +# test_worker.py +async def test_worker_processes_task(): +async def test_worker_handles_failure(): +async def test_worker_heartbeat(): +async def test_stall_recovery(): +``` + +### Durability Tests +```python +# test_durability.py +async def test_tasks_survive_restart(): + """Enqueue tasks, restart Redis, verify tasks persist.""" + +async def test_worker_crash_recovery(): + """Kill worker mid-task, verify task requeued.""" + +async def test_concurrent_workers(): + """5 workers, 20 tasks, verify all complete.""" +``` + +--- + +## Dependencies + +### Required +```toml +redis = { version = "^5.0", extras = ["hiredis"] } +fastapi = "^0.115" +websockets = "^13.0" +pydantic = "^2.0" +``` + +### Development +```toml +pytest = "^8.0" +pytest-asyncio = "^0.24" +httpx = "^0.27" # For FastAPI test client +``` + +### Infrastructure +- Redis 7.0+ (local or cloud) +- Python 3.11+ + +--- + +## Migration Guide + +### For Existing Task Tool Users + +**Before:** +```python +result = await task_tool.run( + prompt="Create a React component", + subagent_type="coder" +) # Blocks until complete +``` + +**After (backward compatible):** +```python +# Same behavior (immediate execution) +result = await task_tool.run( + prompt="Create a React component", + subagent_type="coder", + persist=False # default +) +``` + +**New (persistent):** 
+```python +# Fire-and-forget +task_id = await task_tool.run( + prompt="Create a React component", + subagent_type="coder", + persist=True +) + +# Wait for completion +result = await task_tool.run( + prompt="Create a React component", + subagent_type="coder", + persist=True, + wait=True, + timeout=600 +) +``` + +--- + +## Success Criteria + +| Metric | Target | Measurement | +|--------|--------|-------------| +| Task durability | 100% | Tasks never lost on restart | +| Throughput | 10 tasks/min | With 3 workers | +| Latency | <100ms | Enqueue β†’ pending | +| Recovery time | <60s | Worker crash β†’ requeue | +| API uptime | 99.9% | Health check endpoint | +| Backward compat | 100% | Existing tests pass | + +--- + +## Risk Mitigation + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| Redis complexity | Low | Medium | Start with simple ops | +| Worker pool bugs | Medium | High | Extensive testing | +| Performance issues | Low | Medium | Load testing Week 5 | +| Migration breakage | Low | High | Full test suite | + +--- + +## Handoff to Phase 2 + +**Phase 2 Prereqs:** +- [ ] All Phase 1 success criteria met +- [ ] API documentation complete +- [ ] WebSocket tested with simple client +- [ ] Cost tracking working + +**Phase 2 Inputs:** +- Task queue API (REST + WebSocket) +- Task data model +- Worker management API +- Redis schema + +--- + +## Appendix: Quick Reference + +### Redis Commands Cheat Sheet + +```bash +# Start Redis +docker run -d -p 6379:6379 redis:7-alpine + +# Monitor +redis-cli monitor + +# Inspect keys +redis-cli KEYS "task:*" +redis-cli HGETALL task:abc-123 + +# Clear queue +redis-cli FLUSHDB +``` + +### Development Commands + +```bash +# Start worker +python -m letta_ade.worker.runner --agent-id worker-1 + +# Start API +uvicorn letta_ade.api:app --reload + +# Run tests +pytest tests/ -v --tb=short + +# Integration test +pytest tests/integration/ -v +``` + +--- + +*Ready for implementation. Questions? 
See community-ade-research-synthesis-2026-03-18.md for full context.* diff --git a/docs/ade-phase1-orchestration-design.md b/docs/ade-phase1-orchestration-design.md new file mode 100644 index 0000000..78a935f --- /dev/null +++ b/docs/ade-phase1-orchestration-design.md @@ -0,0 +1,307 @@ +# Phase 1: Orchestration Layer Design + +**Date:** March 18, 2026 +**Architect:** Researcher subagent +**Goal:** Design persistent task queue system for Community ADE + +--- + +## 1. Core Data Model + +```python +@dataclass +class Task: + id: UUID # Unique task identifier + subagent_type: str # "researcher", "coder", etc. + prompt: str # User prompt to subagent + system_prompt: Optional[str] # Override default system prompt + model: Optional[str] # Override default model + + # State tracking + status: TaskStatus # pending/running/completed/failed/cancelled + priority: int = 100 # Lower = higher priority + created_at: datetime + started_at: Optional[datetime] + completed_at: Optional[datetime] + + # Execution tracking + agent_id: Optional[UUID] # Assigned worker agent + retry_count: int = 0 + max_retries: int = 3 + + # Results + result: Optional[dict] # Success result + error: Optional[str] # Failure message + exit_code: Optional[int] # Subprocess exit code + + # Metadata + tags: List[str] # For filtering/grouping + user_id: str # Task owner + parent_task: Optional[UUID] # For task chains +``` + +### TaskStatus Enum +```python +class TaskStatus(Enum): + PENDING = "pending" # Waiting for worker + RUNNING = "running" # Assigned to worker + COMPLETED = "completed" # Success + FAILED = "failed" # Permanent failure (max retries) + CANCELLED = "cancelled" # User cancelled + STALLED = "stalled" # Worker crashed, needs recovery +``` + +--- + +## 2. 
State Machine + +``` + +-----------+ + | PENDING | + +-----+-----+ + | dequeue() + v ++--------+ +-------------+ +-----------+ +| FAILED |<--------+ RUNNING +-------->| COMPLETED | ++--------+ fail() +------+------+ success +-----------+ + ^ max | | + | retries | | + +------------------+ | cancel() + retry() v + +-----------+ + | CANCELLED | + +-----------+ + ^ + | stall detected + +----------+ + | STALLED | + +----------+ +``` + +### Transitions +- `PENDING β†’ RUNNING`: Worker dequeues task +- `RUNNING β†’ COMPLETED`: Subagent succeeds +- `RUNNING β†’ FAILED`: Subagent fails, max retries reached +- `RUNNING β†’ STALLED`: Worker heartbeat timeout +- `STALLED β†’ RUNNING`: Reassigned to new worker +- `FAILED β†’ RUNNING`: Manual retry triggered +- Any β†’ CANCELLED: User cancellation + +--- + +## 3. Redis Data Structures + +| Purpose | Structure | Key Pattern | +|---------|-----------|-------------| +| Task payload | Hash | `task:{task_id}` | +| Pending queue | Sorted Set (by priority) | `queue:pending` | +| Running set | Set | `queue:running` | +| Worker registry | Hash | `worker:{agent_id}` | +| Status index | Set per status | `status:{status}` | +| User tasks | Set | `user:{user_id}:tasks` | + +### Example Redis Operations + +```redis +# Enqueue (pending) +ZADD queue:pending {priority} {task_id} +HSET task:{task_id} status pending created_at {timestamp} ... 
+SADD status:pending {task_id} + +# Dequeue (atomic) +WATCH queue:pending +task_id = ZPOPMIN queue:pending +MULTI + ZADD queue:running {now} {task_id} + HSET task:{task_id} status running agent_id {worker} started_at {now} + SMOVE status:pending status:running {task_id} +EXEC + +# Complete +ZREM queue:running {task_id} +SADD status:completed {task_id} +HSET task:{task_id} status completed result {...} completed_at {now} + +# Fail with retry +HINCRBY task:{task_id} retry_count 1 +ZADD queue:pending {priority} {task_id} # Re-enqueue +SMOVE status:running status:pending {task_id} +HSET task:{task_id} status pending error {...} + +# Stall recovery (cron job) +SMEMBERS queue:running +# For each task where worker heartbeat > threshold: +ZREM queue:running {task_id} +SADD status:stalled {task_id} +ZADD queue:pending {priority} {task_id} +``` + +--- + +## 4. Key API Methods + +```python +class TaskQueue: + # Core operations + async def enqueue(task: Task) -> UUID + async def dequeue(worker_id: UUID, timeout_ms: int = 5000) -> Optional[Task] + async def complete(task_id: UUID, result: dict) -> None + async def fail(task_id: UUID, error: str, retryable: bool = True) -> None + async def cancel(task_id: UUID) -> None + + # Management + async def retry(task_id: UUID) -> None # Manual retry + async def requeue_stalled(max_age_ms: int = 60000) -> int # Recover crashed + async def get_status(task_id: UUID) -> TaskStatus + async def list_by_user(user_id: str, status: Optional[str]) -> List[TaskSummary] + + # Worker management + async def register_worker(agent_id: UUID, capacity: int) -> None + async def heartbeat(agent_id: UUID) -> None + async def unregister_worker(agent_id: UUID, reason: str) -> None +``` + +--- + +## 5. 
Integration with Existing Task Tool + +### Current Flow +``` +Task tool β†’ immediate subprocess spawn β†’ wait β†’ return result +``` + +### New Flow (with persistence) +``` +Task tool β†’ enqueue() β†’ return task_id (immediate) + ↓ +Background worker β†’ dequeue() β†’ spawn subprocess β†’ complete()/fail() + ↓ +Caller polls/gets notification when task completes +``` + +### Changes to Task Tool Schema +```python +class TaskTool: + async def run( + self, + prompt: str, + subagent_type: str, + # ... existing args ... + persist: bool = False, # NEW: enqueue instead of immediate run + priority: int = 100, # NEW + tags: Optional[List[str]] = None # NEW + ) -> TaskResult: + if persist: + task_id = await self.queue.enqueue(...) + return TaskResult(task_id=task_id, status="pending") + else: + # Legacy: immediate execution + ... +``` + +### Worker Agent Integration + +**Worker subscribes to queue:** +```python +async def worker_loop(agent_id: UUID): + while running: + task = await queue.dequeue(agent_id, timeout_ms=5000) + if task: + # Spawn subprocess + proc = await asyncio.create_subprocess_exec( + "letta", "run-agent", f"--task-id={task.id}", + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE + ) + + # Monitor and wait + stdout, stderr = await proc.communicate() + + # Update queue based on result + if proc.returncode == 0: + await queue.complete(task.id, parse_result(stdout)) + else: + await queue.fail(task.id, stderr.decode(), retryable=True) +``` + +--- + +## 6. 
Implementation Phases + +### Phase 1a: In-Memory Prototype (Week 1) +- Python `asyncio.Queue` for pending tasks +- In-memory dict for task storage +- Single worker process +- No Redis dependency + +### Phase 1b: Redis Integration (Week 2) +- Replace queue with Redis +- Add task persistence +- Implement retry logic +- Add stall recovery + +### Phase 1c: Worker Pool (Week 3-4) +- Multiple worker processes +- Worker heartbeat monitoring +- Task assignment logic +- Graceful shutdown handling + +### Phase 1d: API & CLI (Week 5-6) +- REST API for task management +- CLI commands for queue inspection +- Task status dashboard endpoint +- Webhook notifications + +### Phase 1e: Integration (Week 7-8) +- Modify Task tool to use queue +- Add persistence flag +- Maintain backward compatibility +- Migration path for existing code + +--- + +## 7. Retry Logic with Exponential Backoff + +```python +async def retry_with_backoff(task_id: UUID): + task = await queue.get(task_id) + + if task.retry_count >= task.max_retries: + await queue.fail(task_id, "Max retries exceeded", retryable=False) + return + + # Exponential backoff: 2^retry_count seconds + delay = min(2 ** task.retry_count, 300) # Cap at 5 minutes + + await asyncio.sleep(delay) + + # Re-enqueue with same priority + await queue.enqueue(task, priority=task.priority) +``` + +--- + +## 8. Error Handling Strategy + +| Error Type | Retry? | Action | +|------------|--------|--------| +| Subagent crash | Yes | Increment retry, requeue | +| Syntax error in code | No | Fail immediately | +| Timeout | Yes | Retry with longer timeout | +| API rate limit | Yes | Retry with exponential backoff | +| Out of memory | No | Fail, alert admin | +| Redis connection lost | Yes | Reconnect, retry operation | + +--- + +## Next Steps + +1. **Implement in-memory prototype** (Week 1) +2. **Add Redis persistence** (Week 2) +3. **Build worker pool** (Week 3-4) +4. **Integrate with Task tool** (Week 7-8) +5. 
**Write tests for queue durability** (ongoing) + +--- + +*Design by Researcher subagent, March 18, 2026* diff --git a/docs/ade-redis-queue-design.md b/docs/ade-redis-queue-design.md new file mode 100644 index 0000000..7ac9fce --- /dev/null +++ b/docs/ade-redis-queue-design.md @@ -0,0 +1,835 @@ +# Redis Task Queue Architecture for Letta Community ADE + +## Executive Summary + +This document outlines the architecture for replacing the in-memory `QueueRuntime` with a Redis-backed persistent task queue. The design prioritizes durability, horizontal scalability, and reliable task execution while maintaining compatibility with the existing Task tool and subagent spawning workflows. + +**Key Decisions:** +- Use **Redis Streams** (not Sorted Sets) for the primary task queue to leverage consumer groups and at-least-once delivery guarantees +- Hybrid approach: Streams for queue semantics, Sorted Sets for scheduling/delays, Hashes for task state +- Stateless workers with heartbeat-based liveness detection +- Exponential backoff with jitter for retry logic + +--- + +## 1. Redis Data Structures + +### 1.1 Primary Queue: Redis Stream + +``` +Key: ade:queue:tasks +Type: Stream +Purpose: Main task ingestion and distribution +``` + +**Why Streams over Sorted Sets?** + +| Feature | Sorted Sets | Redis Streams | +|---------|-------------|---------------| +| Ordering | Score-based (can have ties) | Strict temporal (millisecond ID) | +| Consumer Groups | Manual implementation | Built-in XREADGROUP | +| Delivery Semantics | At-most-once (easy) / At-least-once (complex) | At-least-once with ACK | +| Pending Tracking | Manual | Built-in XPENDING | +| Claim/Retry | Custom Lua scripts | Built-in XCLAIM/XAUTOCLAIM | +| Message Visibility | Immediate to all | Consumer-group isolated | + +Streams provide the exact semantics needed for reliable task processing without custom Lua scripting. 
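+
+Taken together, the intended life of a single task through the stream can be sketched as the following command sequence. This is an orientation sketch only: `{taskId}`, `{json}`, `{priority}`, `{workerId}`, and `{entryId}` are placeholders, and the real calls are issued by the worker and orchestrator code in sections 3-4.
+
+```
+# Producer: persist task state, then enqueue into the stream
+HSET ade:task:{taskId} status pending attemptCount 0 ...
+XADD ade:queue:tasks * taskId {taskId} payload {json} priority {priority}
+
+# Worker: claim the next undelivered entry for this consumer group
+XREADGROUP GROUP ade-workers {workerId} COUNT 1 BLOCK 5000 STREAMS ade:queue:tasks >
+
+# Worker: acknowledge on success (removes the entry from the pending list)
+XACK ade:queue:tasks ade-workers {entryId}
+
+# Orchestrator: inspect and reclaim entries stuck with a dead consumer
+XPENDING ade:queue:tasks ade-workers
+XAUTOCLAIM ade:queue:tasks ade-workers orchestrator 300000 0
+```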
+
+**Stream Entries:**
+```
+XADD ade:queue:tasks * taskId {taskId} payload {json} priority {priority}
+```
+
+### 1.2 Delayed Tasks: Sorted Set
+
+```
+Key: ade:queue:delayed
+Type: Sorted Set (ZSET)
+Score: scheduled execution timestamp (ms)
+Member: taskId
+```
+
+Used for:
+- Tasks with explicit `runAfter` timestamps
+- Retry scheduling with exponential backoff
+- Rate-limited task release
+
+### 1.3 Task State Storage: Redis Hash
+
+```
+Key: ade:task:{taskId}
+Type: Hash
+Fields:
+  - id: string (UUID v4)
+  - status: pending|running|completed|failed|cancelled
+  - payload: JSON (task arguments)
+  - createdAt: timestamp (ms)
+  - startedAt: timestamp (ms)
+  - completedAt: timestamp (ms)
+  - workerId: string (nullable)
+  - attemptCount: integer
+  - maxAttempts: integer (default: 3)
+  - error: string (last error message)
+  - result: JSON (completed task result)
+  - parentTaskId: string (nullable, for task chains)
+  - subagentId: string (link to subagent state)
+  - priority: integer (0-9, default 5)
+  - kind: message|task_notification|approval_result|overlay_action
+TTL: 7 days (configurable cleanup for completed/failed tasks)
+```
+
+### 1.4 Worker Registry: Redis Hash + Sorted Set
+
+```
+Key: ade:workers:active
+Type: Hash
+Fields per worker:
+  - {workerId}: JSON { hostname, pid, startedAt, lastHeartbeat, version }
+
+Key: ade:workers:heartbeat
+Type: Sorted Set
+Score: last heartbeat timestamp
+Member: workerId
+```
+
+### 1.5 Consumer Group State
+
+```
+Stream Consumer Group: ade:queue:tasks
+Group Name: ade-workers
+Consumer Name: {workerId} (unique per process)
+```
+
+Redis Streams automatically track:
+- Pending messages per consumer (XPENDING)
+- Delivery count per message
+- Idle time since last read
+
+---
+
+## 2. 
Task Entity Schema + +### 2.1 TypeScript Interface + +```typescript +// src/queue/redis/types.ts + +export type TaskStatus = + | "pending" // Enqueued, not yet claimed + | "running" // Claimed by worker, processing + | "completed" // Successfully finished + | "failed" // Exhausted all retries + | "cancelled"; // Explicitly cancelled + +export type TaskKind = + | "message" + | "task_notification" + | "approval_result" + | "overlay_action"; + +export interface TaskPayload { + // Task identification + id: string; // UUID v4 + kind: TaskKind; + + // Execution context + agentId?: string; + conversationId?: string; + clientMessageId?: string; + + // Content (varies by kind) + content?: unknown; // For "message" kind + text?: string; // For notification/approval/overlay + + // Subagent execution params (for task_notification) + subagentType?: string; + prompt?: string; + model?: string; + existingAgentId?: string; + existingConversationId?: string; + maxTurns?: number; + + // Scheduling + priority: number; // 0-9, lower = higher priority + runAfter?: number; // Timestamp ms (for delayed tasks) + + // Retry configuration + maxAttempts: number; + backoffMultiplier: number; // Default: 2 + maxBackoffMs: number; // Default: 300000 (5 min) + + // Metadata + enqueuedAt: number; + source: "user" | "system" | "hook"; +} + +export interface TaskState extends TaskPayload { + status: TaskStatus; + workerId?: string; + attemptCount: number; + startedAt?: number; + completedAt?: number; + error?: string; + result?: unknown; + + // Coalescing support (from QueueRuntime) + isCoalescable: boolean; + scopeKey?: string; // For grouping coalescable items +} +``` + +### 2.2 State Transitions + +``` + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ PENDING │◄──────────────────┐ + β”‚ (queued) β”‚ β”‚ + β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚ + β”‚ claim β”‚ retry + β–Ό β”‚ (with delay) + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”‚ 
RUNNING β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ β”‚ (claimed) β”‚ fail (retryable) + β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ +complete β”‚ β”‚ fail (final) + β”‚ β–Ό + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + └────────►│ COMPLETED β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β” + β”‚ FAILED β”‚ + β”‚ (exhausted)β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## 3. Worker Pool Registration and Heartbeat + +### 3.1 Worker Lifecycle + +```typescript +// src/queue/redis/worker.ts + +class TaskWorker { + private workerId: string; + private redis: RedisClient; + private isRunning: boolean = false; + private heartbeatInterval?: NodeJS.Timeout; + private claimInterval?: NodeJS.Timeout; + + // Config + private readonly HEARTBEAT_INTERVAL_MS = 5000; + private readonly HEARTBEAT_TIMEOUT_MS = 30000; + private readonly CLAIM_BATCH_SIZE = 10; + private readonly PROCESSING_TIMEOUT_MS = 300000; // 5 min + + async start(): Promise { + this.workerId = generateWorkerId(); // {hostname}:{pid}:{uuid} + + // Register in worker registry + await this.redis.hSet("ade:workers:active", this.workerId, JSON.stringify({ + hostname: os.hostname(), + pid: process.pid, + startedAt: Date.now(), + lastHeartbeat: Date.now(), + version: process.env.npm_package_version || "unknown" + })); + + // Create consumer in stream group (idempotent) + try { + await this.redis.xGroupCreate("ade:queue:tasks", "ade-workers", "$", { + MKSTREAM: true + }); + } catch (err) { + // Group already exists - ignore + } + + this.isRunning = true; + this.startHeartbeat(); + this.startClaimLoop(); + } + + async stop(): Promise { + this.isRunning = false; + clearInterval(this.heartbeatInterval); + clearInterval(this.claimInterval); + + // Release pending tasks back to queue + await this.releasePendingTasks(); + + // Deregister + await this.redis.hDel("ade:workers:active", this.workerId); + await 
this.redis.zRem("ade:workers:heartbeat", this.workerId);
+  }
+
+  private startHeartbeat(): void {
+    this.heartbeatInterval = setInterval(async () => {
+      await this.redis.zAdd("ade:workers:heartbeat", {
+        score: Date.now(),
+        value: this.workerId
+      });
+      // Refresh this worker's registry entry with the latest heartbeat
+      const raw = await this.redis.hGet("ade:workers:active", this.workerId);
+      const info = raw ? JSON.parse(raw) : {};
+      await this.redis.hSet("ade:workers:active", this.workerId, JSON.stringify({
+        ...info,
+        lastHeartbeat: Date.now()
+      }));
+    }, this.HEARTBEAT_INTERVAL_MS);
+  }
+}
+```
+
+### 3.2 Dead Worker Detection
+
+```typescript
+// src/queue/redis/orchestrator.ts (singleton, per-deployment)
+
+class QueueOrchestrator {
+  async detectAndReclaimDeadWorkerTasks(): Promise<number> {
+    const now = Date.now();
+    const cutoff = now - this.HEARTBEAT_TIMEOUT_MS;
+
+    // Find dead workers
+    const deadWorkers = await this.redis.zRangeByScore(
+      "ade:workers:heartbeat",
+      "-inf",
+      cutoff
+    );
+
+    let reclaimedCount = 0;
+
+    for (const workerId of deadWorkers) {
+      // Find pending tasks for this worker using XPENDING
+      const pending = await this.redis.xPendingRange(
+        "ade:queue:tasks",
+        "ade-workers",
+        "-",
+        "+",
+        this.CLAIM_BATCH_SIZE
+      );
+
+      for (const item of pending) {
+        if (item.consumer === workerId && item.idle > this.PROCESSING_TIMEOUT_MS) {
+          // Use XAUTOCLAIM to atomically claim the entry under the
+          // orchestrator's consumer name; the claimed message is not
+          // processed here because we immediately ACK and reschedule it
+          await this.redis.xAutoClaim(
+            "ade:queue:tasks",
+            "ade-workers",
+            "orchestrator", // consumer name for cleanup
+            this.PROCESSING_TIMEOUT_MS,
+            item.id,
+            { COUNT: 1 }
+          );
+
+          // Release back to pending by ACKing (removes from pending list)
+          // The orchestrator will re-add to delayed queue for retry
+          await this.redis.xAck("ade:queue:tasks", "ade-workers", item.id);
+          await this.scheduleRetry(item.id);
+          reclaimedCount++;
+        }
+      }
+
+      // Clean up dead worker registration
+      await this.redis.hDel("ade:workers:active", workerId);
+      await this.redis.zRem("ade:workers:heartbeat", workerId);
+    }
+
+    return reclaimedCount;
+  }
+}
+```
+
+---
+
+## 4. 
Retry Logic with Exponential Backoff
+
+### 4.1 Backoff Calculation
+
+```typescript
+// src/queue/redis/retry.ts
+
+interface RetryConfig {
+  attempt: number;       // 0-indexed (0 = first retry)
+  baseDelayMs: number;   // Default: 1000
+  multiplier: number;    // Default: 2
+  maxDelayMs: number;    // Default: 300000 (5 min)
+  jitterFactor: number;  // Default: 0.1 (10% randomization)
+}
+
+function calculateRetryDelay(config: RetryConfig): number {
+  // Exponential backoff: base * (multiplier ^ attempt)
+  const exponentialDelay = config.baseDelayMs *
+    Math.pow(config.multiplier, config.attempt);
+
+  // Cap at max
+  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
+
+  // Add jitter to prevent thundering herd: Β±jitterFactor
+  const jitter = cappedDelay * config.jitterFactor * (Math.random() * 2 - 1);
+
+  return Math.floor(cappedDelay + jitter);
+}
+
+// Examples with defaults:
+// Attempt 0 (first retry): ~1000ms Β±100ms
+// Attempt 1: ~2000ms Β±200ms
+// Attempt 2: ~4000ms Β±400ms
+// Attempt 3: ~8000ms Β±800ms
+// Attempt 4: ~16000ms Β±1600ms
+// ...up to max 300000ms (5 min)
+```
+
+### 4.2 Retry Flow
+
+```typescript
+async function handleTaskFailure(
+  taskId: string,
+  error: Error,
+  workerId: string
+): Promise<void> {
+  const taskKey = `ade:task:${taskId}`;
+  const task = await redis.hGetAll(taskKey);
+
+  const attemptCount = parseInt(task.attemptCount, 10) + 1;
+  const maxAttempts = parseInt(task.maxAttempts, 10);
+
+  if (attemptCount >= maxAttempts) {
+    // Final failure - mark as failed
+    await redis.hSet(taskKey, {
+      status: "failed",
+      error: error.message,
+      completedAt: Date.now(),
+      attemptCount: attemptCount.toString()
+    });
+
+    // Publish failure event for observers
+    await redis.publish("ade:events:task-failed", JSON.stringify({
+      taskId,
+      error: error.message,
+      totalAttempts: attemptCount
+    }));
+
+    // ACK to remove from pending
+    await redis.xAck("ade:queue:tasks", "ade-workers", taskId);
+  } else {
+    // Schedule retry
+    const delay = 
calculateRetryDelay({
+      attempt: attemptCount,
+      baseDelayMs: 1000,
+      multiplier: 2,
+      maxDelayMs: 300000,
+      jitterFactor: 0.1
+    });
+
+    const runAfter = Date.now() + delay;
+
+    // Update task state
+    await redis.hSet(taskKey, {
+      status: "pending",
+      attemptCount: attemptCount.toString(),
+      error: error.message,
+      workerId: "" // Clear worker assignment
+    });
+
+    // Add to delayed queue
+    await redis.zAdd("ade:queue:delayed", {
+      score: runAfter,
+      value: taskId
+    });
+
+    // ACK to remove from stream pending
+    await redis.xAck("ade:queue:tasks", "ade-workers", taskId);
+  }
+}
+```
+
+### 4.3 Delayed Task Promoter
+
+```typescript
+// Runs periodically (every 1 second) to move due tasks from delayed set to stream
+
+async function promoteDelayedTasks(): Promise<number> {
+  const now = Date.now();
+
+  // Fetch due tasks. Note: ZRANGEBYSCORE + ZREM is not atomic - wrap in a
+  // Lua script or MULTI/EXEC if multiple promoters may run concurrently.
+  const dueTasks = await redis.zRangeByScore(
+    "ade:queue:delayed",
+    "-inf",
+    now,
+    { LIMIT: { offset: 0, count: 100 } }
+  );
+
+  if (dueTasks.length === 0) return 0;
+
+  // Remove from delayed queue
+  await redis.zRem("ade:queue:delayed", dueTasks);
+
+  // Re-add to stream for processing
+  for (const taskId of dueTasks) {
+    const task = await redis.hGetAll(`ade:task:${taskId}`);
+    await redis.xAdd("ade:queue:tasks", "*", {
+      taskId,
+      payload: task.payload,
+      priority: task.priority
+    });
+  }
+
+  return dueTasks.length;
+}
+```
+
+---
+
+## 5. Integration with Existing Task.ts
+
+### 5.1 Adapter Pattern
+
+```typescript
+// src/queue/redis/adapter.ts
+
+import { QueueRuntime, QueueItem, DequeuedBatch, QueueBlockedReason, QueueRuntimeOptions } from "../queueRuntime";
+import { RedisQueue } from "./queue";
+
+/**
+ * Redis-backed implementation of QueueRuntime interface.
+ * Allows drop-in replacement of in-memory queue. 
+ */
+export class RedisQueueAdapter implements QueueRuntime {
+  private redisQueue: RedisQueue;
+  private localBatchBuffer: Map<string, QueueItem[]> = new Map();
+  private workerId: string = generateWorkerId(); // consumer identity for claims
+
+  constructor(redisUrl: string, options?: QueueRuntimeOptions) {
+    this.redisQueue = new RedisQueue(redisUrl, {
+      ...options,
+      onTaskCompleted: this.handleTaskCompleted.bind(this),
+      onTaskFailed: this.handleTaskFailed.bind(this)
+    });
+  }
+
+  async enqueue(input: Omit<QueueItem, "id" | "enqueuedAt">): Promise<QueueItem | null> {
+    // Map QueueItem to TaskPayload
+    const taskId = generateUUID();
+    const enqueuedAt = Date.now();
+
+    const payload: TaskPayload = {
+      id: taskId,
+      kind: input.kind,
+      agentId: input.agentId,
+      conversationId: input.conversationId,
+      clientMessageId: input.clientMessageId,
+      text: (input as any).text,
+      content: (input as any).content,
+      priority: 5, // Default priority
+      maxAttempts: 3,
+      backoffMultiplier: 2,
+      maxBackoffMs: 300000,
+      enqueuedAt,
+      source: "user",
+      isCoalescable: isCoalescable(input.kind)
+    };
+
+    const success = await this.redisQueue.enqueue(payload);
+    if (!success) return null;
+
+    return {
+      ...input,
+      id: taskId,
+      enqueuedAt
+    } as QueueItem;
+  }
+
+  async tryDequeue(blockedReason: QueueBlockedReason | null): Promise<DequeuedBatch | null> {
+    if (blockedReason !== null) {
+      // Emit blocked event if needed (preserving QueueRuntime behavior)
+      return null;
+    }
+
+    // Claim batch from Redis
+    const batch = await this.redisQueue.claimBatch({
+      consumerId: this.workerId,
+      batchSize: this.getCoalescingBatchSize(),
+      coalescingWindowMs: 50 // Small window for coalescing
+    });
+
+    if (!batch || batch.length === 0) return null;
+
+    // Map back to QueueItem format
+    const items: QueueItem[] = batch.map(task => this.mapTaskToQueueItem(task));
+
+    return {
+      batchId: generateBatchId(),
+      items,
+      mergedCount: items.length,
+      queueLenAfter: await this.redisQueue.getQueueLength()
+    };
+  }
+
+  // ... 
other QueueRuntime methods
+}
+```
+
+### 5.2 Task.ts Integration Points
+
+**Current Flow (Task.ts line 403+):**
+```typescript
+// Background task spawning
+const { taskId, outputFile, subagentId } = spawnBackgroundSubagentTask({
+  subagentType: subagent_type,
+  prompt,
+  description,
+  model,
+  toolCallId,
+  existingAgentId: args.agent_id,
+  existingConversationId: args.conversation_id,
+  maxTurns: args.max_turns,
+});
+```
+
+**Proposed Redis Integration:**
+```typescript
+// New: Redis-backed task queue integration
+interface TaskQueueEnqueueOptions {
+  subagentType: string;
+  prompt: string;
+  description: string;
+  model?: string;
+  toolCallId?: string;
+  existingAgentId?: string;
+  existingConversationId?: string;
+  maxTurns?: number;
+  priority?: number;
+  runInBackground?: boolean;
+}
+
+// In Task.ts - replace spawnBackgroundSubagentTask with:
+export async function enqueueSubagentTask(
+  args: TaskQueueEnqueueOptions,
+  queue: RedisQueue
+): Promise<{ taskId: string; outputFile: string; subagentId: string }> {
+  const taskId = generateTaskId();
+  const subagentId = generateSubagentId();
+
+  // Register in subagent state store (for UI)
+  registerSubagent(subagentId, args.subagentType, args.description, args.toolCallId, true);
+
+  const outputFile = createBackgroundOutputFile(taskId);
+
+  // Create task payload
+  const payload: TaskPayload = {
+    id: taskId,
+    kind: "task_notification",
+    subagentType: args.subagentType,
+    prompt: args.prompt,
+    description: args.description,
+    model: args.model,
+    existingAgentId: args.existingAgentId,
+    existingConversationId: args.existingConversationId,
+    maxTurns: args.maxTurns,
+    subagentId,
+    outputFile,
+    priority: args.priority ?? 
5,
+    maxAttempts: 3,
+    backoffMultiplier: 2,
+    maxBackoffMs: 300000,
+    enqueuedAt: Date.now(),
+    source: "user",
+    isCoalescable: false // Task notifications are not coalescable
+  };
+
+  // Enqueue to Redis
+  await queue.enqueue(payload);
+
+  return { taskId, outputFile, subagentId };
+}
+```
+
+### 5.3 Worker Implementation for Subagents
+
+```typescript
+// src/queue/redis/subagent-worker.ts
+
+class SubagentTaskWorker extends TaskWorker {
+  protected async processTask(task: TaskState): Promise<void> {
+    // Update subagent state to "running"
+    updateSubagent(task.subagentId!, { status: "running" });
+
+    try {
+      // Execute subagent (existing manager.ts logic)
+      const result = await spawnSubagent(
+        task.subagentType!,
+        task.prompt!,
+        task.model,
+        task.subagentId!,
+        undefined, // signal - handled via task cancellation
+        task.existingAgentId,
+        task.existingConversationId,
+        task.maxTurns
+      );
+
+      // Write transcript
+      writeTaskTranscriptResult(task.outputFile!, result, "");
+
+      // Complete subagent state
+      completeSubagent(task.subagentId!, {
+        success: result.success,
+        error: result.error,
+        totalTokens: result.totalTokens
+      });
+
+      // Send notification if not silent
+      if (!task.silent) {
+        const notification = formatTaskNotification({
+          taskId: task.id,
+          status: result.success ? "completed" : "failed",
+          summary: `Agent "${task.description}" ${result.success ? "completed" : "failed"}`,
+          result: result.success ? result.report : result.error,
+          outputFile: task.outputFile!
+        });
+
+        // Add to message queue for parent agent
+        addToMessageQueue({
+          kind: "task_notification",
+          text: notification
+        });
+      }
+
+      // Mark task completed
+      await this.completeTask(task.id, result);
+
+    } catch (error) {
+      const errorMessage = error instanceof Error ? 
error.message : String(error);
+
+      // Update subagent state
+      completeSubagent(task.subagentId!, { success: false, error: errorMessage });
+
+      // Fail task (triggers retry logic)
+      await this.failTask(task.id, new Error(errorMessage));
+    }
+  }
+}
+```
+
+---
+
+## 6. Operational Considerations
+
+### 6.1 Redis Configuration
+
+```yaml
+# Recommended Redis config for task queue
+maxmemory: 1gb
+maxmemory-policy: volatile-lru # Evict only keys with TTLs (expired task records);
+                               # allkeys-lru could evict live queue/stream data
+
+# Persistence (for durability)
+appendonly: yes
+appendfsync: everysec
+
+# Stream trimming (prevent unbounded growth)
+# Set via XTRIM or MAXLEN on XADD
+```
+
+### 6.2 Key Patterns and Cleanup
+
+| Key Pattern | Type | TTL | Cleanup Strategy |
+|-------------|------|-----|------------------|
+| `ade:queue:tasks` | Stream | - | XTRIM by MAXLEN (keep 100k) |
+| `ade:queue:delayed` | ZSET | - | Processed by promoter |
+| `ade:task:{id}` | Hash | 7 days | Expire completed/failed |
+| `ade:workers:active` | Hash | - | On worker deregistration |
+| `ade:workers:heartbeat` | ZSET | - | On worker timeout |
+
+### 6.3 Monitoring Metrics
+
+```typescript
+// Metrics to expose via Prometheus/StatsD
+interface QueueMetrics {
+  // Queue depth
+  "ade_queue_pending_total": number;   // XPENDING count
+  "ade_queue_delayed_total": number;   // ZCARD ade:queue:delayed
+  "ade_queue_stream_length": number;   // XLEN ade:queue:tasks
+
+  // Throughput
+  "ade_tasks_enqueued_rate": number;   // XADD rate
+  "ade_tasks_completed_rate": number;  // Completion rate
+  "ade_tasks_failed_rate": number;     // Failure rate
+
+  // Worker health
+  "ade_workers_active_total": number;  // HLEN ade:workers:active
+  "ade_workers_dead_total": number;    // Detected dead workers
+
+  // Processing
+  "ade_task_duration_ms": Histogram;   // Time from claim to complete
+  "ade_task_wait_ms": Histogram;       // Time from enqueue to claim
+  "ade_task_attempts": Histogram;      // Distribution of retry counts
+}
+```
+
+### 6.4 Failure Modes
+
+| Scenario | Handling | 
+|----------|----------|
+| Redis unavailable | Tasks fail immediately; caller responsible for retry |
+| Worker crash | Tasks reclaimed via heartbeat timeout (30s) |
+| Poison message | Max retries (3) then moved to DLQ |
+| Slow task | Processing timeout (5 min) triggers requeue |
+| Duplicate task | Unique task IDs (UUID) let workers detect and skip already-processed tasks |
+
+---
+
+## 7. Migration Strategy
+
+### Phase 1: Dual-Write (Week 1)
+- Implement RedisQueueAdapter
+- Write to both in-memory and Redis queues
+- Read from in-memory only (Redis for validation)
+
+### Phase 2: Shadow Mode (Week 2)
+- Read from both queues
+- Compare results, log discrepancies
+- Fix any edge cases
+
+### Phase 3: Cutover (Week 3)
+- Switch reads to Redis
+- Keep in-memory as fallback
+- Monitor for 1 week
+
+### Phase 4: Cleanup (Week 4)
+- Remove in-memory queue code
+- Full Redis dependency
+
+---
+
+## 8. Implementation Checklist
+
+- [ ] Redis client configuration (ioredis or node-redis)
+- [ ] Task entity schema and serialization
+- [ ] Stream consumer group setup
+- [ ] Worker registration and heartbeat
+- [ ] Task claim and processing loop
+- [ ] Retry logic with exponential backoff
+- [ ] Delayed task promotion
+- [ ] Dead worker detection and reclamation
+- [ ] QueueRuntime adapter implementation
+- [ ] Task.ts integration
+- [ ] Subagent state synchronization
+- [ ] Metrics and monitoring
+- [ ] Error handling and DLQ
+- [ ] Tests (unit, integration, load)
+- [ ] Documentation
+
+---
+
+## 9. 
Appendix: Redis Commands Reference + +| Operation | Command | Complexity | +|-----------|---------|------------| +| Enqueue task | `XADD` | O(1) | +| Claim tasks | `XREADGROUP` | O(N) N=count | +| Ack completion | `XACK` | O(1) | +| Get pending | `XPENDING` | O(1) | +| Claim pending | `XCLAIM` / `XAUTOCLAIM` | O(log N) | +| Delay task | `ZADD` delayed | O(log N) | +| Promote delayed | `ZRANGEBYSCORE` + `ZREM` + `XADD` | O(log N + M) | +| Register worker | `HSET` + `ZADD` | O(1) | +| Heartbeat | `ZADD` | O(log N) | +| Detect dead | `ZRANGEBYSCORE` | O(log N + M) | diff --git a/docs/ade-research.md b/docs/ade-research.md new file mode 100644 index 0000000..64693d6 --- /dev/null +++ b/docs/ade-research.md @@ -0,0 +1,257 @@ +# Agent Development Environment (ADE) Research + +**Date:** March 17, 2026 +**Purpose:** Compare existing ADE solutions to inform Letta Community ADE development + +--- + +## Executive Summary + +The ADE category emerged in 2025 as agentic AI proved too complex for traditional IDE/CLI tooling. Three primary architectures exist: + +1. **Letta ADE** - Memory-first, context window transparency, multi-model +2. **Intent (Augment)** - Spec-driven with coordinator/specialist/verifier pattern +3. **Warp Oz** - Terminal-native with cloud orchestration + +Each approaches multi-agent orchestration differently, offering distinct tradeoffs for community implementation. + +--- + +## 1. Letta ADE (Our Foundation) + +### Core Philosophy +> "Designing great agents is all about designing great context windows" + +Letta ADE makes the opaque world of context windows and agent reasoning **visible and manageable**. 
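To make the "editable memory block" idea concrete, here is a minimal sketch of in-context memory blocks with enforced character limits, edited through append/replace operations. The class and method names are illustrative assumptions, not Letta's actual API.

```typescript
// Hypothetical sketch: labeled memory blocks with character budgets.
// Edits that would overflow a block are rejected, forcing the agent to
// summarize or archive instead. Illustrative only - not Letta's API.

interface MemoryBlock {
  label: string;   // e.g. "persona", "human"
  value: string;
  limit: number;   // character budget for this block
}

class CoreMemorySketch {
  private blocks = new Map<string, MemoryBlock>();

  addBlock(label: string, limit: number): void {
    this.blocks.set(label, { label, value: "", limit });
  }

  // Append text to a block; reject edits that would exceed the limit
  append(label: string, text: string): boolean {
    const block = this.blocks.get(label);
    if (!block || block.value.length + text.length > block.limit) return false;
    block.value += text;
    return true;
  }

  replace(label: string, oldText: string, newText: string): boolean {
    const block = this.blocks.get(label);
    if (!block || !block.value.includes(oldText)) return false;
    const next = block.value.replace(oldText, newText);
    if (next.length > block.limit) return false;
    block.value = next;
    return true;
  }

  // Rendered verbatim into the context window on every turn
  render(): string {
    return [...this.blocks.values()]
      .map(b => `<${b.label}>\n${b.value}\n</${b.label}>`)
      .join("\n");
  }
}
```

The rejected-edit path is the key design point: a hard per-block budget is what keeps the context window bounded and inspectable.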
+ +### Key Features + +| Feature | Implementation | +|---------|---------------| +| **State & Memory** | Stateful agents that learn from interactions vs stateless LLMs | +| **Context Management** | Editable memory blocks, tools, system prompts with character limits | +| **Memory Architecture** | Core Memory (in-context blocks) + Archival/Recall Memory (vector DB) | +| **Transparent Reasoning** | All agents must show their work - reasoning separated from user communication | +| **Tool Integration** | 7,000+ tools via Composio, custom Python tool editor | +| **Production Modes** | Simple/Interactive/Debug modes for different use cases | + +### Architecture Highlights +- **Core Memory**: Editable in-context blocks (`core_memory_append`, `core_memory_replace`) +- **Archival Memory**: Vector database for free-form storage (`archival_memory_insert`, `archival_memory_search`) +- **Recall Memory**: Automatic conversation history tracking (`conversation_search`) +- **Context Pruning**: Recursive summarization + message pruning to manage window size + +### Strengths +βœ… Memory-first design (MemGPT heritage) +βœ… Transparent reasoning by design +βœ… Context window controls +βœ… Real-time tool execution in ADE +βœ… Production deployment ready + +### Letta Code CLI Features +- Client-side tool execution (Bash, Read, Write execute locally) +- Streaming API with background mode for long operations +- Conversations API for parallel sessions with shared memory +- Subagent spawning via Task tool +- Memory-first coding with persistent context + +--- + +## 2. Intent by Augment Code + +### Core Philosophy +> "Spec-Driven Development puts the spec at the center of your workflow" + +Intent uses **living specifications** that update as agents work, preventing the "outdated PRD" problem. 
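The core of the living-spec idea can be sketched as a tiny reconciliation step: each spec item carries a status that is checked against what agents have actually implemented, so drift is detected mechanically instead of by re-reading a stale PRD. This is an illustrative sketch of the concept under assumed names, not Augment's actual data model.

```typescript
// Illustrative "living spec" sketch: reconcile spec status against the set
// of requirement IDs that agents have actually implemented.

type SpecStatus = "planned" | "implemented" | "drifted" | "verified";

interface SpecItem {
  id: string;
  requirement: string;
  status: SpecStatus;
}

// Mark newly implemented items, and flag items the spec believed done
// but that are no longer present in the code.
function reconcileSpec(spec: SpecItem[], implementedIds: Set<string>): SpecItem[] {
  return spec.map(item => {
    if (item.status === "verified") return item; // verified items are frozen
    const done = implementedIds.has(item.id);
    if (done && item.status === "planned") return { ...item, status: "implemented" };
    if (!done && item.status === "implemented") return { ...item, status: "drifted" };
    return item;
  });
}
```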
+ +### Key Features + +| Feature | Implementation | +|---------|---------------| +| **Spec-Driven** | Living spec as source of truth - updates as code changes | +| **Coordinator Pattern** | Coordinator β†’ Specialists β†’ Verifier pipeline | +| **Parallel Work** | Isolated git worktrees for concurrent agent execution | +| **Specialist Agents** | Investigate, Implement, Verify, Critique, Debug, Code Review | +| **BYOA** | Bring Your Own Agent (Claude Code, Codex, OpenCode supported) | +| **Context Engine** | Semantic dependency analysis across 400,000+ files | + +### Architecture: Coordinator/Specialist/Verifier + +``` +Coordinator Agent + ↓ analyzes codebase, drafts spec, generates tasks +Specialist Agents (parallel in isolated worktrees) + ↓ execute scoped tasks +Verifier Agent + ↓ validates against spec before merge +Changes Tab + ↓ human review, merge/stage/create PR +``` + +### Specialist Roles +- **Investigate** - Explore codebase, assess feasibility +- **Implement** - Execute implementation plans +- **Verify** - Check implementations match specs +- **Critique** - Review specs for feasibility +- **Debug** - Analyze and fix issues +- **Code Review** - Automated reviews with severity + +### Unique Features +- **Git Worktree Isolation**: Each agent runs in independent working directory +- **WARP.md**: Compatible with agents.md, claude.md for agent behavior +- **Context Engine**: Call-graph and dependency-chain understanding +- **Verifier Agent**: Catches misalignment before human review + +### Compliance +- SOC 2 Type II (zero deviations, Coalfire audited) +- ISO/IEC 42001 (AI governance certification) +- Customer-Managed Encryption Keys (CMEK) +- Air-gapped deployment options + +### Strengths +βœ… Living specs prevent drift +βœ… Verifier catches misalignment +βœ… Enterprise compliance (dual certification) +βœ… BYOA prevents lock-in +βœ… Context Engine handles massive codebases + +--- + +## 3. 
Warp Oz (Terminal-Native ADE) + +### Core Philosophy +> "Break out of your shell" - Terminal as the primary surface for agentic development + +Warp reimagines the terminal as an agent platform with **Oz orchestration**. + +### Key Features + +| Feature | Implementation | +|---------|---------------| +| **Full Terminal Use** | Agents can run interactive CLI apps (REPLs, debuggers, top) | +| **Cloud Agents** | Background agents on Warp infrastructure or self-hosted | +| **Local Agents** | Real-time interactive coding in Warp terminal | +| **Auto-Tracking** | Every agent produces link + audit trail | +| **Multi-Model** | Mixed-model approach with fallback chains | +| **Skills** | Reusable instructions (compatible with Claude Code, Codex) | + +### Architecture: Oz Platform + +**Local Mode:** +- Run directly in Warp app +- Real-time, interactive assistance +- Multi-step planning, debugging, fixing + +**Cloud Mode:** +- Run on Warp infrastructure (or self-hosted) +- Scheduled agents (cron-like) +- Event triggers (Slack, GitHub, webhooks) +- Parallel execution across repos + +### Oz Capabilities +- **Environments**: Docker containers + git repos + startup commands +- **Session Sharing**: Links to track and steer agents +- **Artifacts**: PRs, branches, plans automatically tracked +- **Skills**: Any Skill can become an agent automation +- **API/SDK/CLI**: Fully programmable agent stack + +### Unique Features +- **Multi-Repo Changes**: One agent can work across repos +- **Computer Use**: Visual verification via screenshots +- **Agent Session Sharing**: Hop into any running agent +- **Cloud Mode**: Background automation with full visibility + +### Performance Claims +- Terminal-Bench: #1 ranked (52% β†’ 61.2%) +- SWE-bench Verified: 71% +- 60%+ merged PRs created by Oz +- 700K+ active developers + +### Security +- SOC 2 Type 2 certified +- Contractual Zero Data Retention (ZDR) with Anthropic, OpenAI, Fireworks, Google +- Configurable permissions (Never/Always allow/Prompt/Let 
agent decide)
+- Agent Profiles (Prod mode/YOLO mode)
+
+### Strengths
+βœ… Full terminal control (unique in market)
+βœ… Cloud agent infrastructure
+βœ… Multi-repo changes
+βœ… Contractual ZDR across all providers
+βœ… Terminal-native workflow
+
+---
+
+## 4. Feature Comparison Matrix
+
+| Feature | Letta ADE | Intent | Warp Oz |
+|---------|-----------|--------|---------|
+| **Orchestration Model** | Memory-driven | Coordinator/Specialist/Verifier | Local + Cloud agents |
+| **Core Abstraction** | Context windows + Memory | Living specs + Git worktrees | Terminal + Environments |
+| **Multi-Agent** | Subagents via Task | Parallel specialists | Cloud agent pool |
+| **Isolation** | Memory blocks | Git worktrees | Docker environments |
+| **Context Strategy** | Hierarchical memory | Semantic Context Engine | Codebase indexing + MCP |
+| **Verification** | Tool return validation | Verifier agent | Human-in-the-loop |
+| **BYOA** | Open source, BYOK | Claude/Codex/OpenCode | Multi-model, BYOK |
+| **Compliance** | SOC 2 | SOC 2 + ISO 42001 | SOC 2 + ZDR |
+| **Scale** | β€” | 400K+ file codebases | 700K+ developers, Terminal-Bench #1 |
+| **Unique** | Memory-first | Spec-driven | Terminal-native |
+
+---
+
+## 5. 
Community ADE Recommendations
+
+Based on this research, here's what a **Letta Community ADE** should prioritize:
+
+### Phase 1: Foundation (Letta Already Has)
+- βœ… Memory-first architecture (Core/Archival/Recall)
+- βœ… Context window transparency
+- βœ… Subagent spawning (Task tool)
+- βœ… Real-time tool execution
+- βœ… Multi-model support
+
+### Phase 2: Enhanced Orchestration (From Intent)
+- **Git Worktree Isolation**: Execute subagents in isolated branches
+- **Coordinator Pattern**: Formal coordinator/specialist/verifier roles
+- **Approval Queue Enhancement**: Structured task delegation
+- **Spec Tracking**: Document what was planned vs executed
+
+### Phase 3: Scale Features (From Warp)
+- **Cloud Agent Mode**: Background agents with session tracking
+- **Multi-Repo Support**: Cross-repository changes
+- **Skills System**: Reusable agent instructions
+- **Session Sharing**: Links to share agent runs
+
+### Phase 4: Advanced Features
+- **Verification Layer**: Automated spec compliance checking
+- **Context Engine**: Semantic dependency analysis
+- **Scheduling**: Recurring agent tasks
+- **Event Triggers**: React to GitHub/Slack events
+
+---
+
+## 6. Key Implementation Insights
+
+### From Intent: Spec-Driven Works
+The "living spec" concept prevents the most common agent failure mode: drift between intent and implementation. Letta's memory blocks could serve this purpose with explicit "plan" vs "execution" blocks.
+
+### From Warp: Terminal is Underrated
+Full terminal control enables agents to use the same tools developers use (REPLs, debuggers, etc.). Letta Code's Bash tool already supports this, but could be enhanced with "terminal session" preservation.
+
+### From Letta: Memory is Differentiating
+Neither Intent nor Warp has Letta's tiered memory architecture. This is a unique strength to build upon - memory as the coordination layer, not just context.
+
+---
+
+## 7. Sources
+
+1. 
[Letta ADE Blog](https://www.letta.com/blog/introducing-the-agent-development-environment) +2. [Letta ADE Docs](https://docs.letta.com/guides/ade/overview/) +3. [Intent by Augment](https://www.augmentcode.com/product/intent) +4. [Intent ADE Guide](https://www.augmentcode.com/guides/what-is-an-agentic-development-environment) +5. [Warp Oz Platform](https://www.warp.dev/oz) +6. [Warp Oz Launch](https://www.warp.dev/blog/oz-orchestration-platform-cloud-agents) + +--- + +*Generated by Ani (Letta agent) - March 17, 2026* diff --git a/docs/community-ade-project-state.md b/docs/community-ade-project-state.md new file mode 100644 index 0000000..65ec1eb --- /dev/null +++ b/docs/community-ade-project-state.md @@ -0,0 +1,97 @@ +# Community ADE Project - State Management + +**Project:** Letta Community Agentic Development Environment +**Orchestrator:** Ani (Annie Tunturi) +**Created:** March 18, 2026 +**Status:** Phase 1 - Orchestration Layer + +--- + +## Active Subagents + +| Subagent | Type | Status | Assigned Task | Output Location | +|----------|------|--------|---------------|-----------------| +| explorer-1 | explore | PENDING | Codebase exploration - task queue patterns | /tmp/ade-explorer-1/ | +| architect-1 | feature-architect | PENDING | Design Redis queue integration | /tmp/ade-architect-1/ | +| researcher-1 | researcher | COMPLETED | ADE competitive analysis | docs/community-ade-research-synthesis-2026-03-18.md | + +--- + +## Document Registry + +### Research Documents +- [x] `community-ade-research-2026-03-18.md` - Initial research +- [x] `ade-phase1-orchestration-design.md` - Phase 1 technical design +- [x] `community-ade-research-synthesis-2026-03-18.md` - Web research synthesis +- [x] `ade-phase1-execution-plan.md` - 6-week execution plan + +### Design Documents +- [x] `ade-redis-queue-design.md` - Redis queue architecture (COMPLETED by researcher-2) +- [ ] `ade-task-queue-spec.md` - Detailed task queue specification (IN PROGRESS) +- [ ] 
`ade-worker-pool-design.md` - Worker pool architecture (PENDING) +- [ ] `ade-dashboard-wireframes.md` - Dashboard UI design (PENDING) + +### Implementation +- [ ] `ade-queue-prototype/` - In-memory prototype (NOT STARTED) +- [ ] `ade-redis-queue/` - Redis-backed implementation (NOT STARTED) +- [ ] `ade-worker-process/` - Worker daemon (NOT STARTED) + +--- + +## Current Phase: Phase 1 - Orchestration Layer + +### Goals +1. Build persistent task queue system +2. Implement worker pool for subagent execution +3. Add retry logic with exponential backoff +4. Integrate with existing Task tool + +### Decisions Made +- Use Redis (not Celery) for direct control +- In-memory prototype first, then Redis +- Worker pool with heartbeat monitoring +- Defer Temporal to Phase 2 evaluation + +### Open Questions +- Should we use Redis Streams or Sorted Sets? +- Worker count: Fixed or dynamic? +- Task priority levels: Simple (high/normal) or granular? + +--- + +## Subagent Work Queue + +### Ready to Assign +1. **Explore task queue patterns in codebase** + - Type: explore + - Focus: Find existing queue/spawning code + - Output: File locations and patterns + +2. **Design Redis queue architecture** + - Type: architect + - Focus: Data models, operations, integration points + - Output: Architecture spec document + +3. **Research Playwright Computer Use** + - Type: researcher + - Focus: Browser automation for agentic coding + - Output: Integration approach + +### Blocked +- None currently + +### Completed +- [x] ADE competitive analysis (researcher-1) + +--- + +## State Updates Log + +**2026-03-18 09:23 EDT** - Project initiated, research documents created +**2026-03-18 10:01 EDT** - Attempting to spawn parallel subagents +**2026-03-18 02:03 EDT** - explorer-1 completed: Found Task.ts (line 403), manager.ts (spawnSubagent at line 883), in-memory QueueRuntime class. No Redis currently exists. +**2026-03-18 02:07 EDT** - researcher-2 completed: Redis queue architecture design. 
Key decisions: Redis Streams (consumer groups), Hash per task, 5s worker heartbeat, exponential backoff with jitter, adapter pattern integration. + +--- + +*This file is maintained by Ani. Update when subagents report progress.* diff --git a/docs/community-ade-research-2026-03-18.md b/docs/community-ade-research-2026-03-18.md new file mode 100644 index 0000000..7b21c7a --- /dev/null +++ b/docs/community-ade-research-2026-03-18.md @@ -0,0 +1,245 @@ +# Community ADE Research - Letta vs Commercial Alternatives + +**Date:** March 18, 2026 +**Researcher:** Ani (researcher subagent) +**Goal:** Analyze Letta ADE capabilities and gaps vs Intent, Warp, and other commercial alternatives + +--- + +## Executive Summary + +Letta has a **solid foundation** for an ADE (Agentic Development Environment) with best-in-class memory management and subagent orchestration. The gaps are primarily in **orchestration polish, web UI, and team collaboration features** rather than core agent capabilities. + +**Estimated effort to full community ADE:** 6-9 months for a small team + +--- + +## 1. 
Current Letta ADE Capabilities + +### βœ… Subagent Orchestration (MATURE) +- Built-in approval queues with ACCEPT/REJECT flow +- Multi-agent fan-out (parallel subagent spawning) +- Result aggregation and error handling +- Agent lifecycle management (create/deploy/destroy) +- Conversation threading for complex workflows + +### βœ… Git Integration (UNIQUE STRENGTH) +- MemFS with git versioning +- Worktree isolation for parallel execution +- Automatic checkpointing +- Branch/merge semantics for agent memory +- Diff-based memory updates + +### βœ… CLI Experience (MATURE) +- Full headless mode (`--headless`) +- JSON streaming output (`--output-format stream-json`) +- System prompt customization (`--system-custom`) +- Tool filtering (`--tools`, `--allowedTools`) +- Session persistence + +### βœ… Memory System (BEST-IN-CLASS) +- Hierarchical memory blocks +- Resident/On-Demand archival tiers +- Clear-immune sacred memory +- Memory consolidation (Aster background agent) +- Git-backed durability + +### βœ… Skills System (SOLID) +- Portable, reusable tool packages +- MCP (Model Context Protocol) integration +- Custom skill development +- Tool registry and discovery + +### βœ… Model Agnostic +- Works with any OpenAI-compatible API +- Synthetic API integration +- Local Ollama support +- Multi-provider fallback + +--- + +## 2. 
Gaps vs Commercial ADEs + +| Feature | Letta Status | Intent | Warp | Notes | +|---------|-------------|--------|------|-------| +| **Web Dashboard** | ❌ Missing | βœ… Full UI | βœ… Full UI | CLI-only currently | +| **Triggers/Schedules** | ❌ Missing | βœ… GitHub webhooks | βœ… Built-in | No automated triggers | +| **GitHub Integration** | ⚠️ Manual | βœ… Native PR reviews | βœ… PR comments | No native GitHub app | +| **Persistent Task Queue** | ⚠️ In-process | βœ… Temporal/Cadence | βœ… Durable | Tasks lost on restart | +| **Spec-Driven Dev** | ⚠️ Basic | βœ… Full PRD support | βœ… Constitution | No formal spec layer | +| **Team Collaboration** | ❌ Missing | βœ… Shared workspaces | βœ… Org features | Single-user focus | +| **Observability** | ⚠️ Logs only | βœ… Full traces | βœ… Metrics | No dashboard metrics | +| **RBAC/Permissions** | ⚠️ Tool-level only | βœ… Role-based | βœ… Enterprise auth | No user roles | + +--- + +## 3. Community ADE Implementation Roadmap + +### Phase 1: Orchestration Layer (6-8 weeks) +**Goal:** Persistent, durable task execution + +**Components:** +- Task queue (Redis/RabbitMQ) +- Durable execution (Temporal.io or Cadence) +- Retry logic with exponential backoff +- Task state persistence +- Failure recovery + +**Key Decisions:** +- Use existing Letta subagent system +- Add queue abstraction layer +- Maintain CLI compatibility + +### Phase 2: Web Dashboard (4-6 weeks) +**Goal:** Real-time visibility and control + +**Features:** +- Agent grid view (running/pending/failed) +- Real-time logs (WebSocket streaming) +- Approval queue UI +- Memory browser +- Task history + +**Tech Stack:** +- React + Vite +- FastAPI backend +- WebSocket for real-time updates +- PostgreSQL for metadata + +### Phase 3: Integration Ecosystem (3-4 weeks) +**Goal:** Connect to external tools + +**Integrations:** +- GitHub App (PR reviews, issue comments) +- Slack/Discord notifications +- Webhook triggers +- CI/CD pipeline hooks +- Linear/Jira ticket creation + +### 
Phase 4: Specification Layer (8-10 weeks) +**Goal:** PRD-driven development + +**Features:** +- Executable specifications (Zod schemas) +- Constitution enforcement +- Architectural guardrails +- Test-first enforcement +- Spec versioning + +**Example Workflow:** +```yaml +specification: + prd: "./docs/prd.md" + constitution: + - "library-first: prefer existing packages" + - "test-first: all code must have tests" + - "simplicity: minimize abstraction layers" + +execution: + generate_plan: true + auto_implement: false + review_checkpoints: true +``` + +### Phase 5: Team Collaboration (4-6 weeks) +**Goal:** Multi-user, organization support + +**Features:** +- Shared agent registry +- Organization memory +- Agent permissions/roles +- Session sharing +- Comment/annotation system + +--- + +## 4. Technical Architecture Recommendations + +### Recommended Stack + +| Layer | Technology | Rationale | +|-------|------------|-----------| +| **Orchestration** | Temporal.io | Durable execution, retries, observability | +| **Queue** | Redis | Reliable, fast, pub/sub support | +| **API** | FastAPI | Async native, easy WebSockets | +| **Dashboard** | React + Vite | Modern, good DX, fast builds | +| **Database** | PostgreSQL | ACID, JSON support, mature | +| **Memory** | Git + Letta MemFS | Existing infrastructure | +| **Auth** | OAuth 2.0 / SSO | Enterprise compatibility | + +### Integration Points + +```typescript +// Letta API Integration +interface LettaCloudConfig { + lettaBaseUrl: string; + apiKey: string; + + // Cloud features + triggers: WebhookConfig[]; + schedules: ScheduleConfig[]; + + // Team features + organizationId: string; + sharedAgents: string[]; +} + +// Specification Layer +interface SpecificationConfig { + prdPath: string; + constitution: string[]; + guardrails: GuardrailRule[]; +} +``` + +--- + +## 5. Letta's Unique Advantages + +1. **Stateful by Design**: Unlike stateless alternatives, Letta's MemFS provides true persistence +2. 
**Git-Native**: Version control for agent memory is unique to Letta +3. **Model Agnostic**: Not locked into single provider +4. **Open Source**: Full transparency, community extensible +5. **CLI-First**: Developers love the terminal experience + +--- + +## 6. Critical Path to MVP + +**Minimum Viable Community ADE:** + +1. βœ… **Already have**: Subagent orchestration, memory system, CLI +2. πŸ”„ **Need soon**: Persistent task queue, basic dashboard +3. πŸ“‹ **Next phase**: GitHub integration, triggers +4. πŸ“‹ **Future**: Full spec layer, team features + +**Priorities:** +1. Fix Task tool reliability (persistent queue) +2. Build minimal dashboard (agent status, approvals) +3. Add GitHub webhook support +4. Implement basic spec validation + +--- + +## 7. Conclusion + +### Letta's Position +- **Best memory system** in the open-source ADE space +- **Solid foundation** for enterprise-grade ADE +- **Unique git-native approach** provides durability others lack +- **Gaps are solvable** with focused engineering effort + +### Recommendation +**Build the community ADE on Letta**. The core architecture is superior to many commercial alternatives. The missing pieces (dashboard, queue durability, integrations) are well-understood engineering problems. + +**Start with:** +1. Persistent task queue (fixes current Task reliability issues) +2. Minimal web dashboard (real-time visibility) +3. 
GitHub webhook integration (proves external integration model) + +**Total effort to MVP:** 3-4 months focused work +**Total effort to full ADE:** 6-9 months + +--- + +*Research conducted by Ani (researcher subagent) on March 18, 2026* diff --git a/docs/community-ade-research-synthesis-2026-03-18.md b/docs/community-ade-research-synthesis-2026-03-18.md new file mode 100644 index 0000000..e6c19a2 --- /dev/null +++ b/docs/community-ade-research-synthesis-2026-03-18.md @@ -0,0 +1,601 @@ +# Community ADE Research Synthesis + +**Date:** March 18, 2026 +**Researcher:** Research Agent +**Goal:** Synthesize web research findings with existing documents and recommend technical next steps + +--- + +## Executive Summary + +Based on web research and analysis of existing documents, the Letta community ADE opportunity is **stronger than initially assessed**. The research confirms: + +1. **Letta's core architecture is genuinely differentiated** - No open-source competitor combines stateful agents, hierarchical memory, and git-native persistence +2. **Commercial ADEs (Warp, Intent) validate the market** but leave gaps Letta can fill +3. **Technical patterns are well-established** - Redis queues, Temporal workflows, FastAPI + React dashboards are proven patterns +4. **Community momentum exists** - Multiple open-source tools (Aider, Cline, Agno) show appetite for agentic development tools + +**Updated recommendation:** The 6-9 month estimate is realistic. The critical path is **orchestration layer + dashboard MVP**. + +--- + +## 1. 
Letta Position Analysis (Updated) + +### Current Strengths Confirmed + +From [docs.letta.com](https://docs.letta.com) and [github.com/letta-ai/letta](https://github.com/letta-ai/letta): + +| Feature | Letta | Competitive Landscape | +|---------|-------|----------------------| +| **Stateful Agents** | βœ… Core design | ❌ Most tools (Cline, Aider) are stateless | +| **Memory Blocks** | βœ… Hierarchical, tiered | ❌ Simple conversation history only | +| **Git-Native** | βœ… MemFS with versioning | ⚠️ Aider has git commits, no MemFS | +| **Model Agnostic** | βœ… Any OpenAI-compatible | ⚠️ Many lock to specific providers | +| **Subagents** | βœ… Built-in orchestration | ⚠️ Limited or external | +| **Skills/MCP** | βœ… Portable tool packages | βœ… Emerging standard | + +### Key Differentiator: Stateful + Memory + +Letta's **memory-first architecture** is unique in the open-source space: +- **Agno** ([docs.agno.com](https://docs.agno.com)): Agent framework, but no persistent memory architecture +- **Cline** ([github.com/cline/cline](https://github.com/cline/cline)): VS Code extension, session-only state +- **Aider** ([github.com/Aider-AI/aider](https://github.com/Aider-AI/aider)): Git-aware but no agent memory + +This positions Letta as the **only open-source option** for long-running, learning agents that persist knowledge across sessions. + +--- + +## 2. Commercial ADE Analysis + +### Intent (intent.dev) + +**Core Value Proposition:** Specification-Driven Development (SDD) + +From [intent.dev](https://intent.dev): +- **The Specification as the Asset**: PRD is source of truth +- **15-Minute Workflows**: Automated documentation generation +- **Architectural Guardrails**: "Development Constitution" for enforcing principles +- **Total Pivotability**: Change spec β†’ regenerate implementation + +**Key Insight for Letta:** +Intent focuses on the *specification layer* - the "what" before the "how". This is Phase 4 in our roadmap. 
Intent's approach validates that **executable specifications are valuable**, but they don't provide the underlying agent infrastructure. Letta could integrate Intent-style spec validation without building the spec layer from scratch. + +**Gap Letta Can Fill:** +- Intent appears to be a platform/service; Letta is open-source and self-hostable +- Intent doesn't mention stateful/memory-enabled agents +- No evidence of subagent orchestration + +### Warp (warp.dev) + +**Core Value Proposition:** Terminal-native ADE with "Oz" agent + +From [warp.dev](https://warp.dev): +- **Multi-model by default**: OpenAI, Anthropic, Google models +- **Full Terminal Use**: Interactive terminal commands +- **Computer Use**: Browser automation for verification +- **MCP Support**: Linear, Figma, Slack, Sentry integration +- **WARP.md**: Configuration files compatible with agents.md, claude.md + +**Key Features:** +``` +- IDE + CLI hybrid experience +- Agent code review interface (line-level comments) +- Universal Input: @file, image upload, URL attachment +- Snapshot/restore for workspace exploration +- Enterprise: SSO, audit trails, VPC support +``` + +**Key Insight for Letta:** +Warp validates the **terminal-native workflow** but extends it with: +1. **Rich prompt context** (@mentions, images, URLs) +2. **Code review UI** in terminal (Letta could add approval queue UI) +3. **MCP ecosystem** (Letta already supports skills, should expand MCP) +4. **Workspace snapshots** (Letta's git worktrees provide similar isolation) + +**Critical Gap Warp Leaves:** +- No mention of durable execution or task persistence +- Appears to be single-session focused +- No subagent orchestration (single agent model) +- No persistent memory across sessions + +### Antigravity + +**Status:** No functional website found. May be early-stage or rebranded. + +--- + +## 3. 
Open-Source Community Landscape + +### Aider (Most Direct Comparison) + +From [github.com/Aider-AI/aider](https://github.com/Aider-AI/aider): + +**Strengths:** +- Multi-file editing with diff view +- Automatic git commits with sensible messages +- Repository mapping for large codebases +- Voice support, image input +- Works with almost any LLM (including local) +- Lint/test integration (auto-fix) +- Strong community (28k+ GitHub stars) + +**Architecture:** +- Git-repo centered (not stateful agents) +- Edits files directly (no MemFS) +- No subagent orchestration +- Session-based (no persistent memory) + +**Lessons for Letta:** +- Aider's git integration patterns are excellent (auto-commit, sensible messages) +- Codebase mapping is crucial for large projects +- Diff-based editing is user-friendly +- Multi-model support is table stakes + +### Cline (VS Code Extension) + +From [github.com/cline/cline](https://github.com/cline/cline): + +**Strengths:** +- Human-in-the-loop GUI (approve every change) +- VS Code native integration +- Computer Use (browser automation) +- MCP support for custom tools +- Workspace snapshots and restore +- Token/cost tracking +- Terminal integration in editor + +**Architecture:** +- Extension-based (VS Code only) +- Session-based state +- No subagent orchestration +- No persistent memory + +**Lessons for Letta:** +- Human-in-the-loop approval is important for trust +- Workspace snapshots enable safe exploration +- Cost transparency (token tracking) is valuable +- Browser automation (Computer Use) is becoming standard + +### Agno + +From [docs.agno.com](https://docs.agno.com): + +**Positioning:** "AgentOS" - platform for building agents + +**Observations:** +- Focus on production deployment patterns +- Less mature than Letta in documentation +- No clear differentiation on memory/state + +--- + +## 4. 
Technical Pattern Research
+
+### 4.1 Task Queues: Redis + Python
+
+From [redis.io/docs](https://redis.io/docs/latest/develop/clients/redis-py/):
+
+**Redis-py patterns for Letta:**
+```python
+import redis
+
+# Connection with auto-decode
+r = redis.Redis(host='localhost', port=6379, decode_responses=True)
+
+# Hash for task storage (matches our design)
+r.hset(f'task:{task_id}', mapping={
+    'status': 'pending',
+    'prompt': prompt,
+    'subagent_type': subagent_type,
+    'created_at': timestamp,
+})
+
+# Sorted Set for priority queue
+r.zadd('queue:pending', {task_id: priority})
+
+# Atomic dequeue with WATCH/MULTI/EXEC
+def dequeue():
+    with r.pipeline() as pipe:
+        while True:
+            try:
+                pipe.watch('queue:pending')  # optimistic lock on the queue
+                entries = pipe.zrange('queue:pending', 0, 0)
+                if not entries:
+                    pipe.unwatch()
+                    return None
+                pipe.multi()
+                pipe.zrem('queue:pending', entries[0])
+                pipe.execute()  # raises WatchError if the queue changed
+                return entries[0]
+            except redis.WatchError:
+                continue  # another worker won the race; retry
+```
+
+**Key Insights:**
+- `decode_responses=True` eliminates manual byte decoding
+- `hset` with `mapping=` is clean for task storage
+- Redis transactions (WATCH/MULTI/EXEC) provide atomic, race-free dequeue operations
+- Consider `redis[hiredis]` for performance
+
+**Alternative: Celery**
+
+From [docs.celeryq.dev](https://docs.celeryq.dev):
+
+Celery provides:
+- Distributed task processing
+- Real-time processing + scheduling
+- Worker management
+- Built-in retry logic
+
+**Recommendation:** For Phase 1, use **raw Redis** (not Celery):
+- Celery adds an abstraction layer that may conflict with Letta's specific needs
+- Our task queue has unique requirements (subagent spawning, git worktrees)
+- Raw Redis gives more control over the data model
+- Can migrate to Celery later if needed
+
+### 4.2 Durable Execution: Temporal
+
+From [temporal.io](https://temporal.io):
+
+**Core Value:** "Write code as if failure doesn't exist"
+
+**Key Concepts:**
+- **Workflows**: Durable, fault-tolerant business logic
+- **Activities**: Retryable, failure-prone operations
+- **State persistence**: Automatic checkpointing
+- **Signals**: External events to running workflows
+
+**Temporal for Letta ADE:**
+```python
+# Potential Workflow structure (sketch; assumes the temporalio Python SDK)
+from datetime import timedelta
+
+from temporalio import workflow
+from temporalio.common import RetryPolicy
+
+
+@workflow.defn
+class SubagentWorkflow:
+    @workflow.run
+    async def run(self, task_id: str):
+        # Activity: Spawn subagent (may fail)
+        result = await 
workflow.execute_activity( + spawn_subagent, + task_id, + start_to_close_timeout=timedelta(minutes=5), + retry_policy=RetryPolicy(maximum_attempts=3) + ) + + # Activity: Wait for completion (long-running) + final_result = await workflow.execute_activity( + poll_subagent_completion, + result.agent_id, + start_to_close_timeout=timedelta(hours=1), + ) + + return final_result +``` + +**Decision Matrix:** + +| Approach | Complexity | Durability | Letta Fit | +|----------|-----------|------------|-----------| +| Raw Redis + Workers | Low | Medium | βœ… Good fit | +| Temporal | Medium | High | ⚠️ Overkill? | +| Celery | Low | Medium | ⚠️ Too abstract | + +**Recommendation:** +- **Phase 1**: Use Redis + custom workers (existing design) +- **Phase 2**: Evaluate Temporal for complex multi-step workflows +- Temporal shines for **long-running, multi-step** workflows with human-in-the-loop +- Letta's subagent tasks are relatively simple (spawn β†’ run β†’ complete) + +### 4.3 Web Dashboard: React + FastAPI Patterns + +From [ui.shadcn.com](https://ui.shadcn.com): + +**Shadcn/ui** provides: +- Unstyled, accessible components +- Tailwind CSS integration +- Customizable design system +- Modern React patterns + +**Recommended Dashboard Stack:** + +``` +Frontend: +- React + Vite (existing plan) +- shadcn/ui for components +- TanStack Query for data fetching +- WebSocket client for real-time updates + +Backend: +- FastAPI (async native) +- WebSocket support built-in +- Redis for pub/sub +- PostgreSQL for metadata +``` + +**Dashboard Features (Prioritized):** + +| Priority | Feature | Tech | +|----------|---------|------| +| P0 | Agent grid view | React + TanStack Query | +| P0 | Real-time logs | WebSocket | +| P1 | Approval queue UI | shadcn Dialog + Table | +| P1 | Task history | TanStack Query + Pagination | +| P2 | Memory browser | Tree view component | +| P2 | Metrics dashboard | Recharts or Tremor | + +--- + +## 5. 
Updated Gap Analysis
+
+### Revised Comparison Table
+
+| Feature | Letta | Intent | Warp | Aider | Cline |
+|---------|-------|--------|------|-------|-------|
+| **Web Dashboard** | ❌ | βœ… | βœ… | ❌ | ❌ |
+| **Persistent Tasks** | ⚠️ | βœ… | ⚠️ | ❌ | ❌ |
+| **Stateful Agents** | βœ… | ⚠️ | ❌ | ❌ | ❌ |
+| **Subagent Orchestration** | βœ… | ❌ | ❌ | ❌ | ❌ |
+| **Git-Native Memory** | βœ… | ❌ | ❌ | ⚠️ | ❌ |
+| **MCP/Skills** | βœ… | ❌ | βœ… | ❌ | βœ… |
+| **Approval Queues** | βœ… CLI | βœ… | βœ… | ❌ | βœ… |
+| **Multi-Model** | βœ… | ? | βœ… | βœ… | βœ… |
+| **Computer Use** | ❌ | ? | βœ… | ❌ | βœ… |
+| **Spec-Driven Dev** | ⚠️ | βœ… | ❌ | ❌ | ❌ |
+
+**Key Insights:**
+1. **Letta leads in agent infrastructure** (state, memory, orchestration)
+2. **Commercial tools lead in UX** (dashboards, IDE integration)
+3. **No competitor has Letta's core combination** (stateful + memory + git + subagents)
+4. **Computer Use** is becoming standard (Warp, Cline have it)
+
+---
+
+## 6. Critical Gaps Identified
+
+### High Priority (MVP Blockers)
+
+1. **Computer Use / Browser Automation**
+   - Warp and Cline both offer browser automation
+   - Letta agents should be able to launch browsers, click, screenshot
+   - Enables web testing, visual verification
+
+2. **Rich Context in Prompts**
+   - Warp's "Universal Input" (@file, images, URLs)
+   - Currently requires manual file reading
+   - Should support: `@path/to/file`, `@https://example.com`, drag-drop images
+
+3. **Workspace Snapshots**
+   - Cline's snapshot/restore feature
+   - Letta's git worktrees provide the foundation
+   - Need UI for "save point" and "restore to point"
+
+### Medium Priority (Competitive Parity)
+
+4. **Cost Tracking**
+   - Cline shows token usage and cost per request
+   - Important for agent transparency
+
+5. **Voice Input Flow**
+   - Warp integrates with Wispr
+   - Nice-to-have, not MVP critical
+
+6. **MCP Ecosystem Expansion**
+   - Expand skills to full MCP server support
+   - Community MCP marketplace
+
+---
+
+## 7. 
Revised Implementation Recommendations
+
+### Phase 1: Orchestration Layer (Refined)
+
+**Duration:** 6 weeks (was 6-8)
+
+**Changes from original design:**
+1. **Week 1-2**: In-memory prototype (unchanged)
+2. **Week 3-4**: Redis integration + worker pool (unchanged)
+3. **Week 5**: API endpoints (REST + WebSocket)
+4. **Week 6**: Task tool integration + testing
+
+**Additions:**
+- WebSocket endpoint for real-time task updates
+- Cost tracking (token counting)
+- Rich context parsing (@file, URLs)
+
+**Deferred:**
+- Temporal integration (evaluate in Phase 2)
+- Advanced retry policies (basic exponential backoff sufficient)
+
+### Phase 2: Dashboard MVP (Refined)
+
+**Duration:** 4 weeks
+
+**Week 1**: Project setup + Agent grid view
+- Vite + React + shadcn/ui setup
+- TanStack Query integration
+- Basic agent status display
+
+**Week 2**: Real-time features
+- WebSocket connection
+- Live log streaming
+- Status updates
+
+**Week 3**: Approval queue UI
+- Pending approvals list
+- Accept/Reject buttons
+- Comment/feedback input
+
+**Week 4**: Task history + polish
+- Task list with filtering
+- Detail view
+- Error display
+
+**Additions based on research:**
+- Cost display (per task, total)
+- Workspace snapshot indicator
+- @mention support in prompts
+
+### Phase 3: Integration Ecosystem (Unchanged)
+
+**Priority order:**
+1. GitHub App (highest - matches Aider/Warp)
+2. Slack notifications
+3. Linear/Jira (MCP-based)
+4. 
Webhook triggers + +### Phase 4: Computer Use (NEW PHASE) + +**Duration:** 4 weeks + +**Rationale:** Computer Use is becoming table stakes (Warp, Cline have it) + +**Scope:** +- Browser automation (Playwright integration) +- Screenshot capture +- Click/type/scroll actions +- Visual verification workflows + +**Integration:** +- New skill: `computer_use` +- Subagent can launch browser +- Screenshots stored in MemFS + +### Phase 5: Specification Layer (Refined) + +**Duration:** 6 weeks (was 8-10) + +**Scope reduction:** +- Start with PRD validation (Zod schemas) +- Basic constitution enforcement (regex + AST rules) +- No full natural language spec parsing yet + +**Deferred:** +- Full spec regeneration (Intent-level functionality) +- Architectural diagram generation + +### Phase 6: Team Collaboration (Unchanged) + +--- + +## 8. Technical Stack Recommendations (Updated) + +### Orchestration Layer + +| Component | Original | Updated | Rationale | +|-----------|----------|---------|-----------| +| Queue | Redis | Redis βœ… | Proven, matches research | +| Durable Execution | Temporal | Redis + Workers | Temporal overkill for Phase 1 | +| Workers | Python asyncio | Python asyncio βœ… | Good fit | +| API | FastAPI | FastAPI βœ… | Async native, WebSocket support | + +### Dashboard + +| Component | Original | Updated | Rationale | +|-----------|----------|---------|-----------| +| Framework | React | React βœ… | Standard | +| Build Tool | Vite | Vite βœ… | Fast, modern | +| UI Library | - | shadcn/ui | Accessible, customizable | +| Styling | - | Tailwind CSS | Standard with shadcn | +| Data Fetching | - | TanStack Query | Caching, real-time sync | +| Charts | - | Tremor/Recharts | Dashboard metrics | + +### Additional Components + +| Component | Recommendation | +|-----------|----------------| +| Browser Automation | Playwright | +| Rich Context Parsing | Custom parser (@file, URL regex) | +| Cost Tracking | Token counting in subagent wrapper | +| WebSocket | FastAPI native + 
Redis pub/sub | + +--- + +## 9. Risks and Mitigations + +### Identified Risks + +| Risk | Impact | Mitigation | +|------|--------|------------| +| Web search unavailable for research | Medium | Use fetch_webpage for known URLs | +| Temporal overengineering | High | Defer to Phase 2 evaluation | +| Dashboard scope creep | High | Strict MVP definition (4 weeks) | +| Computer Use complexity | Medium | Use Playwright, limit scope | +| Competition pace | Medium | Focus on Letta differentiators | + +### Competitive Response + +**If Warp releases open-source:** +- Warp is terminal + IDE hybrid; Letta is agent infrastructure +- Different target users (Warp = developers, Letta = agent builders) +- Letta's stateful/memory approach still differentiated + +**If Intent releases spec layer as open standard:** +- Letta could adopt Intent spec format +- Focus on execution infrastructure +- Potential collaboration opportunity + +--- + +## 10. Next Steps (Prioritized) + +### Immediate (This Week) + +1. **Decision:** Confirm Redis-only vs Temporal evaluation +2. **Prototype:** Build in-memory task queue (Week 1) +3. **Research:** Playwright integration for Computer Use +4. **Design:** Dashboard wireframes (shadcn components) + +### Week 2-3 + +5. **Implement:** Redis integration +6. **Test:** Worker pool with 3+ concurrent workers +7. **API:** REST endpoints for task management + +### Week 4-6 + +8. **Integrate:** Modify Task tool to use queue +9. **WebSocket:** Real-time updates endpoint +10. **Dashboard:** Start React project setup + +### Success Metrics + +| Metric | Target | +|--------|--------| +| Task durability | 0% loss on restart | +| Worker concurrency | 5+ parallel tasks | +| Dashboard load time | <2 seconds | +| Approval latency | <1 second from event | + +--- + +## 11. Conclusion + +### Updated Assessment + +**Letta's position is stronger than initially thought:** + +1. 
**No open-source competitor** has the combination of: + - Stateful agents with hierarchical memory + - Git-native persistence + - Subagent orchestration + - Model agnostic design + +2. **Commercial tools validate the market** but focus on different layers: + - Intent: Specification layer (Letta can integrate) + - Warp: Terminal UX (Letta can offer alternative) + +3. **Technical patterns are well-understood**: + - Redis queues: Proven, simple + - Temporal: Powerful but may be overkill + - React + FastAPI: Standard, well-supported + +### Final Recommendation + +**Proceed with Phase 1 (Orchestration) immediately.** + +The research confirms: +- The orchestration layer design is sound +- Redis is the right choice for Phase 1 +- The dashboard stack (React + shadcn + TanStack Query) is industry standard +- Competitive pressure is real but Letta has unique advantages + +**Revised Timeline:** +- Phase 1 (Orchestration): 6 weeks +- Phase 2 (Dashboard): 4 weeks +- Phase 3 (Integrations): 4 weeks +- Phase 4 (Computer Use): 4 weeks +- Phase 5 (Specifications): 6 weeks +- Phase 6 (Team): 4 weeks + +**Total to full ADE:** 7 months (was 6-9) + +**MVP (Phases 1-2):** 10 weeks (was 3-4 months) + +--- + +*Research synthesis conducted on March 18, 2026* + +*Sources: Letta docs, Intent.dev, Warp.dev, Temporal.io, Redis docs, Celery docs, GitHub (Aider, Cline)*