fix: handle oversized text in embedding requests with recursive chunking
When message text exceeds the embedding model's context length, recursively
split it until all chunks can be embedded successfully.
Changes:
- `tpuf_client.py`: Add `_split_text_in_half()` helper for recursive splitting
- `tpuf_client.py`: Add `_generate_embeddings_with_chunking()` that retries
  with split halves on context-length errors
- `tpuf_client.py`: Store `message_id` and `chunk_index` columns in Turbopuffer
- `tpuf_client.py`: Deduplicate query results by `message_id`
- `tpuf_client.py`: Use `LettaInvalidArgumentError` instead of `ValueError`
- `tpuf_client.py`: Move LLMClient import to top of file
- `openai_client.py`: Remove fixed truncation (chunking handles this now)
- Add tests for `_split_text_in_half` and chunked query deduplication
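
A minimal synchronous sketch of the recursive split-and-retry idea described above. It is not the actual `tpuf_client.py` code: `ContextLengthError`, `split_text_in_half`, `embed_with_chunking`, and the `embed_fn` callback are hypothetical names for illustration, and the real helpers may be async and may split on different boundaries.

```python
from typing import Callable, List, Tuple


class ContextLengthError(Exception):
    """Hypothetical stand-in for the provider's context-length error."""


def split_text_in_half(text: str) -> Tuple[str, str]:
    # Split at the character midpoint; both halves are non-empty when len(text) >= 2.
    mid = len(text) // 2
    return text[:mid], text[mid:]


def embed_with_chunking(
    text: str, embed_fn: Callable[[str], List[float]]
) -> List[Tuple[str, List[float]]]:
    """Return (chunk, embedding) pairs, splitting recursively on context-length errors."""
    try:
        return [(text, embed_fn(text))]
    except ContextLengthError:
        if len(text) < 2:
            # Nothing left to split; surface the original error.
            raise
        left, right = split_text_in_half(text)
        return embed_with_chunking(left, embed_fn) + embed_with_chunking(right, embed_fn)
```

Each returned pair keeps its position in the list, which is the kind of ordering a `chunk_index` column could record.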
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: Letta <noreply@letta.com>
Adds explicit handling for httpx network errors (ReadError, WriteError,
ConnectError) in AnthropicClient, OpenAIClient, and GoogleVertexClient.
These errors can occur during streaming when the connection is unexpectedly
closed while reading/writing data.
Maps these errors to LLMConnectionError for consistent error handling.
Fixes #8221 (and duplicate #8156)
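
A rough sketch of the error mapping described above, not the clients' actual code. The `LLMConnectionError` class here is a local stand-in for the project's exception, and `map_network_error` is a hypothetical helper name. The built-in `ConnectionError` is included only so the sketch still runs where httpx is not installed.

```python
class LLMConnectionError(Exception):
    """Local stand-in for the project's LLMConnectionError (assumption)."""


# ConnectionError is an assumption added so the sketch works without httpx.
NETWORK_ERRORS = (ConnectionError,)
try:
    import httpx

    NETWORK_ERRORS += (httpx.ReadError, httpx.WriteError, httpx.ConnectError)
except ImportError:
    pass


def map_network_error(exc: Exception) -> Exception:
    """Map low-level network errors to LLMConnectionError; pass others through."""
    if isinstance(exc, NETWORK_ERRORS):
        mapped = LLMConnectionError(f"{type(exc).__name__}: {exc}")
        mapped.__cause__ = exc  # keep the original error for debugging
        return mapped
    return exc
```

A client's error handler could call `raise map_network_error(e)` inside its `except` block so that callers only need to catch one exception type.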
🤖 Generated with [Letta Code](https://letta.com)
Co-authored-by: letta-code <248085862+letta-code@users.noreply.github.com>
Co-authored-by: Letta <noreply@letta.com>
Co-authored-by: Kian Jones <11655409+kianjones9@users.noreply.github.com>
* first hack with test
* remove changes integration test
* Delete apps/core/tests/sdk_v1/integration/integration_test_send_message_v2.py
* add test
* remove comment
* stage and publish api
* deprecate base level response_schema
* add param to llm_config test
---------
Co-authored-by: Ari Webb <ari@letta.com>
Root cause: When splitting a failed embedding batch, a single-item batch
produced mid=0, which created an empty chunk. These empty chunks were then
processed, creating hundreds of no-op tasks that consumed memory.
Crash pattern from logs:
- 600+ 'batch_size=0' embedding tasks created
- Memory spiked from 531 MB to 4.9 GB
- Pod crashed
Fixes:
1. Skip empty chunks before creating tasks
2. Guard chunk splits to prevent empty slices (mid = max(1, len//2))
3. Break early if all chunks are empty
This prevents asyncio.gather() from creating thousands of empty
coroutines that exhaust memory.
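
A small sketch of the split guard from fixes 1 and 2, assuming a hypothetical `split_batch` helper (the real function and its surrounding retry loop are not shown in this message):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_batch(batch: Sequence[T]) -> List[List[T]]:
    """Split a failed batch in two, never returning an empty chunk."""
    # max(1, ...) keeps mid from being 0 when the batch has a single item.
    mid = max(1, len(batch) // 2)
    halves = [list(batch[:mid]), list(batch[mid:])]
    # Drop empty chunks so no 'batch_size=0' task is ever created.
    return [chunk for chunk in halves if chunk]
```

With this guard a single-item batch comes back as one chunk and an empty batch yields no chunks, so the caller can break early when the result is empty. The caller still needs its own limit on retrying a single item that keeps failing.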
* first hack
* clean up
* first implementation working
* revert package-lock
* remove openai test
* error throw
* typo
* Update integration_test_send_message_v2.py
* Update integration_test_send_message_v2.py
* refine test
* Only make changes for openai non streaming
* Add tests
---------
Co-authored-by: Ari Webb <ari@letta.com>
Co-authored-by: Matt Zhou <mattzh1314@gmail.com>
* feat: add full responses api support in new agent loop
* update matrix in workflow
* relax check for reasoning messages for high effort gpt 5
* fix indent
* one more relax
* feat: squash rebase of OSS PR
* fix: revert changes that weren't on manual rebase
* fix: caught another one
* fix: disable force
* chore: drop print
* fix: just stage-api && just publish-api
* fix: make agent_type consistently an arg in the client
* fix: patch multi-modal support
* chore: put in todo stub
* fix: disable hardcoding for tests
* fix: patch validate agent sync (#4882)
patch validate agent sync
* fix: strip bad merge diff
* fix: revert unrelated diff
* fix: react_v2 naming -> letta_v1 naming
* fix: strip bad merge
---------
Co-authored-by: Kevin Lin <klin5061@gmail.com>
* remove apps/core and apps/fern
* fix precommit
* add submodule updates in workflows
* submodule
* remove core tests
* update core revision
* Add submodules: true to all GitHub workflows
- Ensure all workflows can access git submodules
- Add submodules support to deployment, test, and CI workflows
- Fix YAML syntax issues in workflow files
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove core-lint
* upgrade core with latest main of oss
---------
Co-authored-by: Claude <noreply@anthropic.com>
* base requirements
* autofix
* Configure ruff for Python linting and formatting
- Set up minimal ruff configuration with basic checks (E, W, F, I)
- Add temporary ignores for common issues during migration
- Configure pre-commit hooks to use ruff with pass_filenames
- This enables gradual migration from black to ruff
* Delete sdj
* autofixed only
* migrate lint action
* more autofixed
* more fixes
* change precommit
* try changing the hook
* try this stuff