Root cause: When splitting a failed embedding batch, a single-item batch
gave mid = len // 2 = 0, so the split produced an empty chunk. These empty
chunks were then processed, creating hundreds of no-op tasks that consumed
memory.
Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked 531 MB → 4.9 GB
- Pod crashed
Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty
This prevents asyncio.gather() from being handed thousands of empty
coroutines that exhaust memory.
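A minimal sketch of the three fixes above, for illustration only. The names
split_batch, embed_all, embed_fn, and max_rounds are hypothetical and are not
taken from the actual code; the real retry loop may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, List

Batch = List[str]


def split_batch(batch: Batch) -> List[Batch]:
    """Split a failed batch in two without ever returning an empty chunk."""
    # Fix 2: len(batch) // 2 is 0 for a single item, so clamp mid to 1.
    mid = max(1, len(batch) // 2)
    # Fix 1: drop any empty slice (a single-item batch leaves batch[mid:] empty).
    return [chunk for chunk in (batch[:mid], batch[mid:]) if chunk]


async def embed_all(
    batches: List[Batch],
    embed_fn: Callable[[Batch], Awaitable[List[int]]],
    max_rounds: int = 10,
) -> List[int]:
    """Embed batches, splitting failed ones in half on each retry round."""
    results: List[int] = []
    pending = [b for b in batches if b]  # Fix 1: never schedule empty batches.
    for _ in range(max_rounds):
        if not pending:
            break  # Fix 3: nothing left to schedule, stop early.
        outcomes = await asyncio.gather(
            *(embed_fn(b) for b in pending), return_exceptions=True
        )
        retry: List[Batch] = []
        for batch, outcome in zip(pending, outcomes):
            if isinstance(outcome, BaseException):
                if len(batch) == 1:
                    raise outcome  # A single item cannot be split further.
                retry.extend(split_batch(batch))
            else:
                results.extend(outcome)
        pending = retry
    if pending:
        raise RuntimeError("embedding still failing after max_rounds")
    return results
```

With this shape, a single-item batch that fails surfaces its error instead of
being split into an empty chunk plus itself, so the number of scheduled tasks
can only shrink toward the number of items.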
* fix(core): sanitize messages sent to Anthropic in the main path the same way (or nearly so) as in the token counter
* fix: also patch poison error in backend by filtering lazily
* fix: remap streaming errors
* fix: dedupe tool calls
* fix: cleanup, removed try/catch
* no bare raise in anthropic client. Also filter out messages with empty content
* use client logic
* replace empty/whitespace with period
* cursor bugbot
* first hack
* clean up
* first implementation working
* revert package-lock
* remove openai test
* error throw
* typo
* Update integration_test_send_message_v2.py
* refine test
* Only make changes for openai non streaming
* Add tests
---------
Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Matt Zhou <mattzh1314@gmail.com>
* refactor not to use warnings.warn
* temp circular import fix, maybe unnecessary/bad
* fix Deprecation warning
* fix deprecation warning and mcp thing?
* revert changes to mcp server test
* fix deprecation warning
* feat: add full responses api support in new agent loop
* update matrix in workflow
* relax check for reasoning messages for high effort gpt 5
* fix indent
* one more relax
* fix: patch failing summarizer tests for anthropic claude 3.5
* fix: carveout for gemini-2.5-flash because it doesn't do the send_message tool call
* fix: deprecate old gemini test now that model is unavailable
* fix: patch flash flakiness
* fix: relax the gemini 2.5 flash test
* fix: relax again
* fix: another flash fix
* fix: relax gpt-4o-mini
* fix: swap 4o-mini for 4.1
* fix: drop 4o-mini
* fix: change the backend logic for the reasoning toggle to be different for v1 vs old agent types, also change the frontend to disable toggling unless it's a toggle-able model
* fix: add more logic to when the reasoning switch is enabled or disabled
* fix: safety net in the backend to set config when undefined
* fix: disable prefix fill, it's a bug
* feat: squash rebase of OSS PR
* fix: revert changes that weren't on manual rebase
* fix: caught another one
* fix: disable force
* chore: drop print
* fix: just stage-api && just publish-api
* fix: make agent_type consistently an arg in the client
* fix: patch multi-modal support
* chore: put in todo stub
* fix: disable hardcoding for tests
* fix: patch validate agent sync (#4882)
patch validate agent sync
* fix: strip bad merge diff
* fix: revert unrelated diff
* fix: react_v2 naming -> letta_v1 naming
* fix: strip bad merge
---------
Co-authored-by: Kevin Lin <klin5061@gmail.com>