Root cause: When a failed embedding batch containing a single item was
split, mid = len // 2 evaluated to 0, which produced an empty chunk.
These empty chunks were then processed, creating hundreds of no-op
tasks that consumed memory.
Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked 531 MB → 4.9 GB
- Pod crashed
Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty
This prevents asyncio.gather() from being handed hundreds of empty
coroutines that exhaust memory.
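A minimal sketch of the guarded split and the empty-chunk filter described above. The names split_chunk, embed, and embed_chunks are hypothetical and are not taken from the codebase; embed is a stand-in for the real embedding call.

```python
import asyncio


def split_chunk(chunk):
    """Split a failed batch in two, never returning an empty slice."""
    # max(1, ...) keeps mid from being 0 when the batch has a single item.
    mid = max(1, len(chunk) // 2)
    halves = [chunk[:mid], chunk[mid:]]
    # Drop any half that is still empty (for example the tail of a 1-item batch).
    return [half for half in halves if half]


async def embed(chunk):
    # Hypothetical stand-in for the real embedding request.
    return len(chunk)


async def embed_chunks(chunks):
    # Skip empty chunks before creating tasks, and return early if none remain.
    non_empty = [chunk for chunk in chunks if chunk]
    if not non_empty:
        return []
    return await asyncio.gather(*(embed(chunk) for chunk in non_empty))
```

With this guard, a single-item batch yields one chunk containing that item instead of one empty and one full chunk.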
* Revert "Revert "feat: make attach/detach routes return None if version is 1.0 [LET-5844]" (#6201)"
This reverts commit bb0d10725f5889306de61e1758f061d6c1041c52.
* fix type checking
* revert
* return state for blocks and sources
* func signatures
* create memgpt_agent for cloud-e2e-tests
* Revert "create memgpt_agent for cloud-e2e-tests"
This reverts commit f279e5897b0942b1006a5f8527713dd801064c63.
* fix
---------
Co-authored-by: Ari Webb <ari@letta.com>
* Fix event loop blocking in NLTK downloads and Azure model listing
Found via watchdog detecting 61.6s hang during file upload.
**Root causes:**
1. NLTK punkt_tab downloads blocking during file processing
2. Azure model listing using sync requests.get() in async context
**Fixes:**
1. Pre-download NLTK data at Docker build time
2. Async fallback download at startup if the build-time download failed
3. Move Azure model fetch to thread pool with asyncio.to_thread()
**Impact:**
- Eliminates 60+ second event loop hangs
- Startup: instant if the data is baked into the image, ~60s async download otherwise
- Requests: never block, all I/O offloaded to threads
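The thread-pool fix can be illustrated with a small sketch. It assumes Python 3.9+ for asyncio.to_thread(); fetch_models_blocking is a hypothetical stand-in for the synchronous requests.get() call, and the ticker exists only to show that the event loop keeps running while the blocking call is in flight.

```python
import asyncio
import contextlib
import time


def fetch_models_blocking():
    # Hypothetical stand-in for a synchronous HTTP call such as requests.get().
    time.sleep(0.2)
    return ["model-a", "model-b"]


async def list_models():
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fetch_models_blocking)


async def demo():
    ticks = 0

    async def ticker():
        # Counts how often the loop gets control while the fetch is running.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    ticker_task = asyncio.create_task(ticker())
    models = await list_models()
    ticker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await ticker_task
    return models, ticks
```

Calling fetch_models_blocking() directly inside the coroutine would leave ticks at 0, because the loop could not run the ticker until the call returned.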
* Fix Docker build: ensure /root/nltk_data exists even if download fails
- Create directory before download attempt
- Add verification step to confirm download success
- Directory always exists so COPY won't fail in runtime stage
* Fix: use venv python for NLTK download in Docker build
The builder stage installs NLTK in /app/.venv but we were using
system python which doesn't have NLTK. Now using venv python so
download actually works.
* Use uv run for NLTK download (more idiomatic)
uv run automatically uses the synced venv, cleaner than hardcoding
the venv path.
Add stack trace dumping to watchdog
When a hang is detected, now logs:
- Full stack trace of all threads
- Exact file:line of blocking code
- Function names in the call chain
This shows you WHY the event loop is blocked, not just that it's blocked.
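One way to collect this information using only the standard library is sketched below. The function name dump_all_thread_stacks is hypothetical; the actual watchdog may format or log the output differently.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks():
    """Return a text dump of the current stack of every running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(thread_id, 'unknown')} ({thread_id}):")
        # format_stack yields file, line number, and function name per frame.
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```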
* Add lightweight event loop watchdog monitoring
- Thread-based watchdog detects event loop hangs >15s
- Runs independently, won't interfere with normal operation
- Disabled in test environments
- Minimal overhead, just heartbeat checks every 5s
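A rough sketch of how a thread-based heartbeat watchdog of this kind can work. The class name, attributes, and the choice to record hangs in a list are assumptions for illustration; only the 15s threshold and 5s check interval come from the description above.

```python
import asyncio
import threading
import time


class EventLoopWatchdog:
    """Hypothetical watchdog: a thread checks a heartbeat set by the event loop."""

    def __init__(self, threshold=15.0, check_interval=5.0):
        self.threshold = threshold
        self.check_interval = check_interval
        self.hangs = []  # stall durations (seconds) observed by the watcher
        self._last_beat = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._task = None

    async def _heartbeat(self):
        # Runs on the event loop; stops updating whenever the loop is blocked.
        while True:
            self._last_beat = time.monotonic()
            await asyncio.sleep(self.check_interval / 2)

    def _watch(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.check_interval):
            stalled = time.monotonic() - self._last_beat
            if stalled > self.threshold:
                self.hangs.append(stalled)

    def start(self):
        # Must be called from inside a running event loop.
        self._last_beat = time.monotonic()
        self._task = asyncio.get_running_loop().create_task(self._heartbeat())
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._task is not None:
            self._task.cancel()
        self._thread.join()
```

Because the watcher runs in its own thread, it can still observe the stale heartbeat while the event loop itself is stuck.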
* actually test it
* Add test script to validate watchdog detects hangs
Run with: uv run python test_watchdog_hang.py
Tests:
- Normal operation (no false positives)
- Short blocks under threshold (no alerts)
- Long blocks over threshold (correctly alerts)
Change the Fern docs custom domain from docs.letta.com to docs-legacy.letta.com.
Next steps:
- Contact Fern to receive new CNAME and TXT records
- Update DNS records for docs-legacy subdomain
- Wait for SSL provisioning
👾 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
* add memory tracking to core
* move to asyncio from threading.Thread
* remove threading.Thread entirely
* delay decorator monitoring initialization until after event loop is registered
* context manager to decorator
* add psutil