# Matrix-Letta Bridge Design Plan

## Current State

The bridge is **dumb transport**. It:
- Connects to Matrix (E2EE)
- Receives messages from rooms
- Sends to ONE Letta agent per room
- Receives response
- Displays to user

That's it.

**What it DOES NOT do (yet)**:
- Show tool execution progress
- Parse anything but `assistant_message` from Letta responses
- Handle multi-agent orchestration (that's Letta's job)
- Store task state (that's Letta's job)
- Provide confirmations (that's Letta's job, if ever)

---

## What Letta Handles

| Feature | Owner |
|---------|-------|
| Multi-agent coordination | ✅ Letta |
| Task persistence | ✅ Letta |
| Context switching | ✅ Letta |
| Tool execution | ✅ Letta |
| Agent routing logic | ✅ Letta |
| State resumption | ✅ Letta |

**Bridge role**: Just get the message there and back.

---

## What We Want to Add to Bridge

**Single focus**: Tool execution visibility via emojis.

Letta runs tools (search, read mail, etc.), but the bridge ignores all of that. The user sees silence, then a result appears. We want to **surface what's happening** through reactions.

---

## The Problem

Current Letta response parsing:

```python
# bridge-e2ee.py:1149 - CURRENT CODE
for msg in data.get("messages", []):
    if msg.get("message_type") == "assistant_message":  # ❌ ONLY captures text
        content = msg.get("content", "")
        if content:
            assistant_messages.append(content)

# Everything else is dropped:
# - tool_call (ignored)
# - reasoning_step (ignored)
# - tool_result (ignored)
# - error_message (ignored)
```

**User experience**:
```
User: "Curate my unread emails"
(silence for 30 seconds)
Ani: "Found 3 actionable emails..."
```

**Desired experience**:
```
User: "Curate my unread emails"
Ani: 🧠 Working...
     🔍📖📋 (reactions appear as tools run)
Ani: "Found 3 actionable emails..."
```

---

## Solution: Tool Indication via Reactions

### Simple Implementation

```python
# 1. Parse ALL message types
for msg in data.get("messages", []):
    msg_type = msg.get("message_type", "unknown")

    if msg_type == "assistant_message":
        # Capture text response (skip empty content, as the current code does)
        content = msg.get("content", "")
        if content:
            assistant_messages.append(content)

    elif msg_type == "tool_call":
        # Post progress message with emoji
        tool_name = msg.get("tool_name")
        emoji = get_emoji_for_tool(tool_name)
        await self.post_progress_message(room_id, f"🧠 Working... {emoji}")

# 2. Delete/edit progress when response arrives
# (assumption: the collected replies are joined into one final message)
final_response = "\n".join(assistant_messages)
await self.finalize_progress(room_id, final_response)
```

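The parsing step can be tried in isolation before any Matrix code is involved. This is a minimal sketch, assuming the message type names used in this plan (`tool_call`, `tool_result`) and a top-level `tool_name` field; neither is confirmed until the V0 logging in the roadmap shows what Letta really returns. `split_letta_messages` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any


def split_letta_messages(data: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Separate assistant text from tool-call names in one Letta response."""
    texts: list[str] = []
    tool_names: list[str] = []
    for msg in data.get("messages", []):
        msg_type = msg.get("message_type", "unknown")
        if msg_type == "assistant_message":
            content = msg.get("content", "")
            if content:
                texts.append(content)
        elif msg_type == "tool_call":
            # Assumption: the tool name is a top-level "tool_name" field.
            tool_names.append(msg.get("tool_name") or "default")
        # Any other type (tool_result, reasoning_step, ...) is skipped here.
    return texts, tool_names


# Hand-written sample payload, not captured from a real Letta call.
sample = {
    "messages": [
        {"message_type": "tool_call", "tool_name": "list_emails"},
        {"message_type": "tool_result", "content": "3 unread"},
        {"message_type": "assistant_message", "content": "Found 3 actionable emails..."},
    ]
}
texts, tools = split_letta_messages(sample)
```

Keeping the parser a pure function means it can be unit-tested without a Matrix client, and the progress posting can be layered on top of its return value.
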
### Tool → Emoji Mapping

```python
EMOJI_MAP = {
    # Read operations
    "read_mail": "📖",
    "read_file": "📖",
    "retrieve_memory": "📖",
    "get_calendar": "📖",

    # Write operations
    "send_email": "✍️",
    "save_note": "✍️",
    "write_file": "✍️",

    # Search operations
    "search_web": "🔍",
    "google_search": "🔍",
    "web_search": "🔍",

    # Compute/Process
    "calculate": "🔧",
    "process_image": "🔧",
    "analyze": "🔧",

    # List/Browse
    "list_emails": "📋",
    "list_files": "📋",
    "list": "📋",

    # Default
    "default": "⚙️",
    "error": "⚠️"
}
```

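The plan calls `get_emoji_for_tool()` but does not define it. One possible shape, as a sketch: an exact lookup in the map, then a keyword fallback for tools that are not listed, then the `"default"` entry. The keyword fallback (`KEYWORD_HINTS`) is an assumption added for illustration, and the map below is only a subset of the one above.

```python
from __future__ import annotations

# Subset of the EMOJI_MAP above, repeated so the example runs on its own.
EMOJI_MAP = {
    "read_mail": "📖",
    "send_email": "✍️",
    "search_web": "🔍",
    "list_emails": "📋",
    "default": "⚙️",
}

# Hypothetical fallback: match a keyword anywhere in an unlisted tool name.
KEYWORD_HINTS = (
    ("search", EMOJI_MAP["search_web"]),
    ("read", EMOJI_MAP["read_mail"]),
    ("list", EMOJI_MAP["list_emails"]),
    ("send", EMOJI_MAP["send_email"]),
)


def get_emoji_for_tool(tool_name: str | None) -> str:
    """Return the emoji for a tool, falling back to the default gear."""
    if not tool_name:
        return EMOJI_MAP["default"]
    name = tool_name.lower()
    if name in EMOJI_MAP:
        return EMOJI_MAP[name]
    for keyword, emoji in KEYWORD_HINTS:
        if keyword in name:
            return emoji
    return EMOJI_MAP["default"]
```

With this shape, a new tool such as a hypothetical `search_archive` gets the search emoji without a map edit, and anything unrecognized still produces a visible reaction.
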
---

## Implementation Approach

### Option A: Separate Progress Message

```
[User]: "Curate my emails"

[Ani]: 🧠 Working...
       🔍 (reaction added after search_web)
       📖 (reaction added after read_mail)
       ✍️ (reaction added after save_note)

[Ani]: "Found 3 actionable emails..."
```

### Option B: Inline Update (Edit)

```
[User]: "Curate my emails"

[Ani]: 🧠 Working...
       [edit] 🔍 Searching...
       [edit] 📖 Reading emails...
       [edit] "Found 3 actionable emails..."
```

**Recommendation**: Option A (more compatible with E2EE)

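At the protocol level the two options differ only in the relation attached to the event. In the Matrix client-server spec, a reaction is an `m.reaction` event related to its target with `rel_type: m.annotation`, while an edit is a new `m.room.message` with `rel_type: m.replace` and an `m.new_content` payload. The sketch below builds only the content dictionaries; the function names are hypothetical, and how the bridge sends them is left out.

```python
def reaction_content(target_event_id: str, emoji: str) -> dict:
    """Content for an m.reaction event (Option A)."""
    return {
        "m.relates_to": {
            "rel_type": "m.annotation",
            "event_id": target_event_id,
            "key": emoji,
        }
    }


def edit_content(target_event_id: str, new_body: str) -> dict:
    """Content for an m.room.message event that edits an earlier one (Option B)."""
    return {
        "msgtype": "m.text",
        # Fallback body for clients that do not understand edits.
        "body": f"* {new_body}",
        "m.new_content": {"msgtype": "m.text", "body": new_body},
        "m.relates_to": {"rel_type": "m.replace", "event_id": target_event_id},
    }


# "$progress" stands in for the event ID of the "Working..." message.
reaction = reaction_content("$progress", "🔍")
edit = edit_content("$progress", "📖 Reading emails...")
```

Option A needs one small event per tool and never touches the original message; Option B rewrites the message each time, so the final answer replaces the progress text in place.
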
---

## Files to Modify

| File | Change |
|------|--------|
| `bridge-e2ee.py` | `send_to_letta()` - parse all message types |
| `bridge-e2ee.py` | Add `tool_categories` and `EMOJI_MAP` |
| `bridge-e2ee.py` | Add `post_progress_message()` and `finalize_progress()` |
| `bridge-e2ee.py` | Optional: store tool executions in DB for audit |

---

## Minimal Implementation (First Pass)

```python
# In send_to_letta(), after parsing all messages:
messages = data.get("messages", [])

# Check if any non-assistant messages exist (tools, reasoning, etc.)
non_assistant = [m for m in messages if m.get("message_type") != "assistant_message"]

if non_assistant:
    # Log what the agent did (no Matrix progress message yet)
    log.info(f"[Letta] Agent is working: {[m.get('message_type') for m in non_assistant]}")

# For now, just log it. Emojis require the bridge to post to Matrix during the Letta call,
# which means we need to handle a streaming or callback pattern.
```

**Reality check**: The current `send_to_letta()` blocks until Letta finishes. To show progress DURING execution, we'd need:
- Streaming responses from Letta, OR
- A callback/event pattern, OR
- A polling mechanism

**Simplest first step**: Just log which tools were executed. We can add the Matrix progress message in V2.

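The streaming and callback options from the reality check can be combined: consume messages as they arrive and invoke a callback for each tool call, so the bridge can react before the final text exists. The sketch below uses a fake async generator in place of a real streaming Letta client, which this plan has not yet investigated; `fake_letta_stream`, `consume`, and `record_tool` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def fake_letta_stream() -> AsyncIterator[dict]:
    """Stand-in for a streaming Letta response (hand-written messages)."""
    for msg in (
        {"message_type": "tool_call", "tool_name": "list_emails"},
        {"message_type": "tool_call", "tool_name": "read_mail"},
        {"message_type": "assistant_message", "content": "Found 3 actionable emails..."},
    ):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield msg


async def consume(
    stream: AsyncIterator[dict],
    on_tool: Callable[[str], Awaitable[None]],
) -> str:
    """Call on_tool as each tool call arrives; return the final text."""
    parts: list[str] = []
    async for msg in stream:
        msg_type = msg.get("message_type")
        if msg_type == "tool_call":
            await on_tool(msg.get("tool_name", "default"))
        elif msg_type == "assistant_message":
            parts.append(msg.get("content", ""))
    return "\n".join(parts)


seen: list[str] = []


async def record_tool(name: str) -> None:
    # In the bridge this would add a reaction; here it only records the name.
    seen.append(name)


final_text = asyncio.run(consume(fake_letta_stream(), record_tool))
```

In the bridge, `record_tool` would be replaced by something that adds a reaction to the progress message, which is why the callback is awaited rather than called synchronously.
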
---

## Simplified Roadmap

| Phase | What | Effort |
|-------|------|--------|
| V0 | Log all message types from Letta (debug) | 30m |
| V1 | Parse tool_call, log tool names | 1h |
| V2 | Post "Working..." message when tools detected | 1h |
| V3 | Add emoji reactions per tool type | 1h |
| V4 | Delete progress message on completion | 30m |

**Total**: ~4 hours for basic tool visibility

---

## What We're NOT Adding

For now, NO:
- ✅/❌ confirmations (Letta's domain if ever needed)
- Multi-agent routing in bridge (Letta's domain)
- Task persistence in bridge (Letta's domain)
- Context switching in bridge (Letta's domain)

The bridge stays dumb. We just add a little peek into what Letta's doing.

---

## Notes on Letta Multi-Agent

If Letta ever has multiple agents conversing in one room, the bridge just:
1. Receives messages
2. Sends to room
3. Tags sender (if Letta provides agent identity)

Example:
```
[Curator]: "Found 3 emails worth reviewing."
[Ani]: "Thanks, can you summarize them?"
[Curator]: "Sure: 1. ..."
```

The bridge just passes messages through. Agent identity comes from Letta (in a `source` or similar field in the message).

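Tagging the sender could be as small as the sketch below. It assumes the identity arrives in a `source` field, which the note above only guesses at, and falls back to a default name when the field is missing; `tag_sender` is a hypothetical helper.

```python
def tag_sender(msg: dict, default_name: str = "Ani") -> str:
    """Prefix a message with the agent name, if Letta provided one."""
    # Assumption: the agent identity lives in msg["source"].
    agent = msg.get("source") or default_name
    return f"[{agent}]: {msg.get('content', '')}"


tagged = tag_sender({"source": "Curator", "content": "Found 3 emails worth reviewing."})
untagged = tag_sender({"content": "Thanks, can you summarize them?"})
```
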
---

## Summary

**Bridge role**: Transport + visibility

- Current transport: ✅ Working
- Current visibility: ❌ Blind (only sees final text)
- Add visibility: Tool indication via reactions
- Complexity: Low (just parse and display)
- Multi-agent: Not our problem (Letta's domain)

---

# MAJOR REFACTOR PLAN: mautrix.util.formatter Migration

## Problem Statement

**Current Issues (Patches Maintained):**
1. **Manual markdown** - ~100 lines of regex patches for false code blocks
2. **Manual color syntax** - `{red|text}` → `<font color="..." data-mx-color="...">`
3. **Manual spoiler syntax** - `||text||` → `<span data-mx-spoiler>`
4. **Manual HTML→text** - `html.unescape()` and `strip_tags()`
5. **Emoji shortcode** - custom `normalize_emoji_shortcodes()` + `emoji.emojize()`

**Maintenance Burden:**
- Each patch edge case → new regex
- False code block detection keeps breaking
- Color palette manually maintained (`MATRIX_COLORS`)
- No built-in mention/@user or room pill handling

**Solution:** Migrate to `mautrix.util.formatter` (native Matrix formatting)

---

## API Analysis (from mautrix-python docs)

```python
from mautrix.util.formatter import parse_html, MarkdownString, MatrixParser
from mautrix.types import EventType

# HTML → Plain Text (ASYNC)
plain_text = await parse_html(html_input)
# Returns: "Hello world!\n• Item 1\n• Item 2"

# Markdown → HTML (SYNC)
markdown = MarkdownString("**Bold** and ||spoiler||")
html_output = markdown.format(EventType.ROOM_MESSAGE)
# Returns: '<strong>Bold</strong> and <span data-mx-spoiler>spoiler</span>'

# Mentions and pills (ASYNC)
parser = MatrixParser()
formatted = await parser.parse("Hello @user:example.com")
# Returns: MarkdownString with proper Matrix pills
formatted.html  # <a href="https://matrix.to/#/@user:example.com">@user:example.com</a>
```

**Key Finding:**
- `parse_html()` is **async** (coroutine)
- `MarkdownString.format()` is **sync** but requires an `EntityType` argument
- `MatrixParser.parse()` is **async**

**Reality check**: The snippet passes `EventType.ROOM_MESSAGE` to `format()`, but the finding says it takes an `EntityType` (the enum the Benefits list cites as `EntityType.COLOR` and `EntityType.USER_MENTION`). One of the two is wrong, and the return values shown in the comments are unverified. If `format()` turns out to wrap text in Markdown for a single entity rather than render a whole Markdown message to HTML, the Markdown → HTML step probably belongs to `mautrix.util.markdown.render()` instead. Confirm against the installed mautrix version before starting Phase 1.

---

## Impact Analysis

### Functions Requiring Async Conversion

| Function | Current | New | Impact |
|----------|---------|-----|--------|
| `format_html()` | sync | async | **HIGH** - called everywhere |
| `send_message()` | sync | async | **MEDIUM** - many call sites |
| `on_message()` | async | async | **LOW** - already async |
| `on_image()` | async | async | **LOW** - already async |
| `on_audio()` | async | async | **LOW** - already async |
| `process_queue()` | async | async | **LOW** - already async |

### Call Sites Count

```
format_html() called in:
- send_message() → 20+ callers
- Various message handlers → scattered throughout

All send_message() callers must be updated to await the result.
```

**Estimated:** ~25 call site updates needed

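Rather than estimating by eye, the call sites that still lack an `await` can be listed with the standard-library `ast` module. This is a rough sketch, not part of the bridge: `find_unawaited_calls` is a hypothetical helper, and it flags any call to the target names that is not the direct operand of `await`, so a call wrapped in something like `asyncio.create_task(...)` would show up as a false positive.

```python
from __future__ import annotations

import ast

TARGETS = frozenset({"format_html", "send_message"})


def find_unawaited_calls(source: str, targets: frozenset[str] = TARGETS) -> list[tuple[int, str]]:
    """Return (line, name) for calls to targets that are not directly awaited."""
    tree = ast.parse(source)
    # Collect the call nodes that sit directly under an `await`.
    awaited = {id(node.value) for node in ast.walk(tree) if isinstance(node, ast.Await)}
    hits: list[tuple[int, str]] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call) or id(node) in awaited:
            continue
        func = node.func
        # Plain name (format_html) or attribute access (self.send_message).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in targets:
            hits.append((node.lineno, name))
    return sorted(hits)


sample = '''
async def on_message(self, room_id, text):
    plain, html_body = format_html(text)
    await self.send_message(room_id, text)
'''
hits = find_unawaited_calls(sample)
```

Running it over `bridge-e2ee.py` after Phase 2 (by passing the file contents as `source`) should return an empty list once every call site has been updated.
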
---

## Refactor Implementation Plan

### Phase 1: Infrastructure (2h)

**1.1 Update imports**
```python
from mautrix.util.formatter import parse_html, MarkdownString, MatrixParser
from mautrix.types import EventType
```

**1.2 Create async format_html()**
```python
async def format_html(text: str) -> tuple[str, str]:
    """
    Format text using mautrix native formatter.

    Args:
        text: Response from Letta (markdown or HTML)

    Returns:
        (plain_text, html_body) tuple
    """
    try:
        # Strip whitespace
        text = text.strip()

        # Convert emoji shortcodes (keep existing behavior)
        text = normalize_emoji_shortcodes(text)
        text = emoji.emojize(text, language='en')

        # HTML path → parse to plain (ASYNC)
        if text.startswith('<') and '>' in text:
            # mautrix handles HTML parsing with Matrix extensions
            # Note: our {color|text} syntax needs pre-processing
            html_body = _apply_color_syntax(text)
            plain = await parse_html(html_body)
            return plain, html_body

        # Markdown path → use MarkdownString (SYNC)
        # Pre-process {color|text} first - MarkdownString doesn't handle this
        md = MarkdownString(_apply_color_syntax(text))

        # Format to HTML (SYNC)
        # NOTE: the Key Finding above says format() takes an EntityType; confirm this argument
        html_body = md.format(EventType.ROOM_MESSAGE)

        # Generate plain text (ASYNC)
        plain = await parse_html(html_body)

        return plain, html_body

    except Exception as e:
        # Fall back to the emoji-converted text for both bodies
        log.warning(f"HTML formatting failed: {e}")
        fallback = emoji.emojize(text, language='en')
        return fallback, fallback
```

**1.3 Color syntax helper**
```python
import re


def _apply_color_syntax(text: str) -> str:
    """Convert {color|text} to HTML <font> tags."""
    def replace_color(match):
        color = match.group(1)
        content = match.group(2)

        # Resolve color name (unknown names and hex values pass through unchanged)
        hex_color = MATRIX_COLORS.get(color, color)

        # Emit the same <font color="..." data-mx-color="..."> markup as today
        return f'<font color="{hex_color}" data-mx-color="{hex_color}">{content}</font>'

    return re.sub(r'\{([a-zA-Z0-9_#]+)\|([^}]+)\}', replace_color, text)
```

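The helper above inserts the captured text into HTML verbatim, so a `<` or `&` inside `{red|...}` ends up as raw markup. The variant below escapes the content first. It is a sketch for comparison, not a replacement: `apply_color_syntax_escaped` is a hypothetical name, the palette is a one-entry stand-in for `MATRIX_COLORS`, and whether escaping belongs here depends on whether the content is allowed to carry its own markup.

```python
import html
import re

MATRIX_COLORS = {"red": "#ff0000"}  # hypothetical subset of the bridge's palette

COLOR_PATTERN = re.compile(r"\{([a-zA-Z0-9_#]+)\|([^}]+)\}")


def apply_color_syntax_escaped(text: str) -> str:
    """Convert {color|text} to <font> tags, escaping the inner text."""
    def replace_color(match: re.Match) -> str:
        color = match.group(1)
        hex_color = MATRIX_COLORS.get(color, color)
        content = html.escape(match.group(2))
        return f'<font color="{hex_color}" data-mx-color="{hex_color}">{content}</font>'

    return COLOR_PATTERN.sub(replace_color, text)


named = apply_color_syntax_escaped("{red|hot}")
escaped = apply_color_syntax_escaped("{#00ff00|a<b}")
untouched = apply_color_syntax_escaped("no braces here")
```

These three inputs also make reasonable regression cases for test case 5 in Phase 4: a named color, a raw hex color, and text without the syntax.
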
---

### Phase 2: Update Call Sites (1h)

**2.1 Update send_message()**
```python
# BEFORE
def format_html(text: str) -> tuple[str, str]:  # SYNC
    ...

async def send_message(self, room_id: RoomID, text: str) -> str | None:
    plain_text, html_body = format_html(text)  # SYNC call
    ...

# AFTER
async def format_html(text: str) -> tuple[str, str]:  # ASYNC
    ...

async def send_message(self, room_id: RoomID, text: str) -> str | None:
    plain_text, html_body = await format_html(text)  # AWAIT
    ...
```

**2.2 Scattered callers**
- `on_message()` → already async, just add await
- `on_image()` → already async, just add await
- `on_audio()` → already async, just add await
- `process_queue()` → already async, just add await
- Status handlers → already async, just add await

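The failure mode to watch for in this phase is a call site that was missed. Calling an `async def` without `await` does not run it; it returns a coroutine object, and unpacking that into `plain_text, html_body` raises a `TypeError`. The sketch below shows the difference with a stand-in `format_html` (the body is a placeholder, not the real formatter).

```python
import asyncio
import inspect


async def format_html(text: str) -> tuple[str, str]:
    # Placeholder body standing in for the real formatter.
    return text, f"<p>{text}</p>"


async def caller_ok() -> tuple[str, str]:
    # Correct: the coroutine is awaited and yields the tuple.
    return await format_html("hi")


# Missed call site: no await, so this is a coroutine object, not a tuple.
forgotten = format_html("hi")
is_coroutine = inspect.iscoroutine(forgotten)
forgotten.close()  # close it explicitly to avoid a "never awaited" warning

result = asyncio.run(caller_ok())
```
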
---

### Phase 3: Remove Dead Code (30m)

**Delete these functions (now handled by mautrix):**

```python
# ~~apply_matrix_extensions()~~ - mautrix handles spoilers natively
# ~~enhance_html()~~ - mautrix generates proper HTML
# ~~apply_chromatophores()~~ - could keep if desired, optional
# ~~False code block detection~~ - mautrix shouldn't have this bug
# ~~normalize_emoji_shortcodes()~~ - mautrix may handle, keep if needed
```

**Delete MATRIX_COLORS** (or move to config):
- mautrix handles hex colors directly
- Named colors can stay if needed, but resolve to hex first

---

### Phase 4: Testing (1h)

**Test Cases:**
1. Bold/italic markdown → `<strong>`/`<em>` tags
2. Code blocks → proper `<pre><code>`
3. Links → `<a href="...">`
4. Spoilers `||text||` → `<span data-mx-spoiler>`
5. Colors `{red|text}` → `<font color="..." data-mx-color="...">`
6. Emoji shortcodes `:heart:` → ❤️
7. Mentions `@user:server` → Matrix pills (if using MatrixParser)
8. Room pills `#room:server` → Matrix pills (if using MatrixParser)

**Regression Tests:**
- Existing messages render correctly
- Tool reactions still work
- TTS still works
- Queue still works

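The test cases above lend themselves to a table-driven check that works against any async formatter. The harness below is a sketch: `run_cases` and the `CASES` table are hypothetical, and `fake_format_html` is a deliberately incomplete stand-in so the example runs without mautrix installed. In the real suite the migrated `format_html()` would be passed in instead.

```python
import asyncio
from typing import Awaitable, Callable

Formatter = Callable[[str], Awaitable[tuple[str, str]]]

# (input, fragment expected somewhere in the HTML body)
CASES = [
    ("**bold**", "<strong>bold</strong>"),
    ("||secret||", "data-mx-spoiler"),
    ("{red|hot}", "data-mx-color"),
]


async def run_cases(format_html: Formatter, cases=CASES) -> list[str]:
    """Return the inputs whose HTML output lacks the expected fragment."""
    failures: list[str] = []
    for source, expected_fragment in cases:
        _plain, html_body = await format_html(source)
        if expected_fragment not in html_body:
            failures.append(source)
    return failures


async def fake_format_html(text: str) -> tuple[str, str]:
    # Stand-in that only understands bold, so the other cases fail on purpose.
    return text, text.replace("**bold**", "<strong>bold</strong>")


failures = asyncio.run(run_cases(fake_format_html))
```

Checking for a fragment rather than the exact HTML keeps the cases stable if the formatter changes attribute order or adds wrapping tags.
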
---

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Async conversion bugs | HIGH | Work on copy `ani_e2ee_bridge.py`, keep original |
| Color syntax breaking | MEDIUM | Keep `_apply_color_syntax()` as adapter |
| Spoiler syntax change | MEDIUM | Test spoilers: `\|\|text\|\|` still works |
| False code blocks returning | LOW | mautrix shouldn't have this bug, but monitor |

---

## Rollback Plan

If refactor fails:

1. Original `bridge-e2ee.py` is preserved intact
2. `ani_e2ee_bridge.py` is the experimental branch
3. Rename/revert as needed:
   ```bash
   # Promote the experimental copy (back up the original first)
   mv bridge-e2ee.py bridge-e2ee.backup.py
   cp ani_e2ee_bridge.py bridge-e2ee.py

   # OR roll back to the original
   cp bridge-e2ee.backup.py bridge-e2ee.py
   ```

---

## Estimated Total Effort

| Phase | Time |
|-------|------|
| Phase 1: Infrastructure | 2h |
| Phase 2: Update Call Sites | 1h |
| Phase 3: Remove Dead Code | 30m |
| Phase 4: Testing | 1h |
| **Total** | **~4.5h** |

---

## Benefits After Migration

- **~100 fewer lines** of manual patch code
- **Native spoiler support** - no regex
- **Native color formatting** via EntityType.COLOR
- **Built-in mention/@user support** (EntityType.USER_MENTION)
- **Built-in room pill support** (EntityType.ROOM_MENTION)
- **Better emoji handling** - less custom code
- **Future-proof** - mautrix evolves, we get updates
- **Less maintenance** - fewer edge cases to patch