letta-server

Author	SHA1	Message	Date
jnjpng	e8d5922ff9	fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535 ) * fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming When reasoning models (gpt-5.x) exhaust their max_output_tokens budget on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent instead of ResponseCompletedEvent. This was previously unhandled, causing final_response to remain None, which meant get_content() and get_tool_call_objects() returned empty results, silently dropping the partial response. Now ResponseIncompleteEvent is handled identically to ResponseCompletedEvent (extracting partial content, usage stats, and token details), with an additional warning log indicating the incomplete reason. * fix(core): propagate finish_reason for Responses API incomplete events - Guard usage extraction against None usage payload in ResponseIncompleteEvent handler - Add _finish_reason override to LettaLLMAdapter so streaming adapters can explicitly set finish_reason without a chat_completions_response - Map incomplete_details.reason="max_output_tokens" to finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat Completions API convention - This allows the agent loop's _decide_continuation to correctly return stop_reason="max_tokens_exceeded" instead of "end_turn" when the model exhausts its output token budget on reasoning * fix(core): handle empty content parts in incomplete ResponseOutputMessage When a model hits max_output_tokens after starting a ResponseOutputMessage but before producing any content parts, the message has content=[]. This previously raised ValueError("Got 0 content parts, expected 1"). Now it logs a warning and skips the empty message, allowing reasoning-only incomplete responses to be processed cleanly. * fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens Handle content_filter and any future unknown incomplete reasons from the Responses API instead of silently leaving finish_reason as None.	2026-02-24 10:55:11 -08:00
cthomas	3651658ea7	fix: tool call streaming using deprecated field (#9517 )	2026-02-24 10:55:11 -08:00
Kian Jones	f5c4ab50f4	chore: add ty + pre-commit hook and repeal even more ruff rules (#9504 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import * remove all ignores, add FastAPI rules and Ruff rules * add ty and precommit * ruff stuff * ty check fixes * ty check fixes pt 2 * error on invalid	2026-02-24 10:55:11 -08:00
Kian Jones	25d54dd896	chore: enable F821, F401, W293 (#9503 ) * auto fixes * auto fix pt2 and transitive deps and undefined var checking locals() * manual fixes (ignored or letta-code fixed) * fix circular import	2026-02-24 10:55:08 -08:00
Ari Webb	0bbb9c9bc0	feat: add reasoning zai openrouter (#9189 ) * feat: add reasoning zai openrouter * add openrouter reasoning * stage + publish api * openrouter reasoning always on * revert * fix * remove reference * do	2026-02-24 10:52:06 -08:00
Sarah Wooders	221b4e6279	refactor: add extract_usage_statistics returning LettaUsageStatistics (#9065 ) 👾 Generated with [Letta Code](https://letta.com) Co-Authored-By: Letta <noreply@letta.com> --------- Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
cthomas	c162de5127	fix: use shared event + .athrow() to properly set stream_was_cancelle… (#9019 ) fix: use shared event + .athrow() to properly set stream_was_cancelled flag Problem: When a run is cancelled via /cancel endpoint, `stream_was_cancelled` remained False because `RunCancelledException` was raised in the consumer code (wrapper), which closes the generator from outside. This causes Python to skip the generator's except blocks and jump directly to finally with the wrong flag value. Solution: 1. Shared `asyncio.Event` registry for cross-layer cancellation signaling 2. `cancellation_aware_stream_wrapper` sets the event when cancellation detected 3. Wrapper uses `.athrow()` to inject exception INTO generator (not consumer-side raise) 4. All streaming interfaces check event in `finally` block to set flag correctly 5. `streaming_service.py` handles `RunCancelledException` gracefully, yields [DONE] Changes: - streaming_response.py: Event registry + .athrow() injection + graceful handling - openai_streaming_interface.py: 3 classes check event in finally - gemini_streaming_interface.py: Check event in finally - anthropic_.py: Catch RunCancelledException - simple_llm_stream_adapter.py: Create & pass event to interfaces - streaming_service.py: Handle RunCancelledException, yield [DONE], skip double-update - routers/v1/{conversations,runs}.py: Pass event to wrapper - integration_test_human_in_the_loop.py: New test for approval + cancellation Tests:* - test_tool_call with cancellation (OpenAI models) ✅ - test_approve_with_cancellation (approval flow + concurrent cancel) ✅ Known cosmetic warnings (pre-existing): - "Run already in terminal state" - agent loop tries to update after /cancel - "Stream ended without terminal event" - background streaming timing race 👾 Generated with [Letta Code](https://letta.com) Co-authored-by: Letta <noreply@letta.com>	2026-01-29 12:44:04 -08:00
Sarah Wooders	8729a037b9	fix: handle new openai overflow error format (#7110 )	2025-12-17 17:31:02 -08:00
Devansh Jain	d1536df6f6	chore: Update deepseek client for v3.2 models (#6556 ) * support for v3.2 models * streaming + context window fix * fix for no assistant text from deepseek	2025-12-15 12:02:34 -08:00
Kian Jones	647e271c2a	fix: add more logging for stream error (#6490 ) * trying out gpt-5.1-codex * add unit test for message content * try to support multimodal * remove ValueError and add logging on stream error * prevent stream termination from api spec implementation errors * fix: remove final_response references from non-Responses API interfaces * fix: add diagnostic attributes to SimpleOpenAIResponsesStreamingInterface * fix: remove final_response from SimpleOpenAIStreamingInterface (Chat Completions API)	2025-12-15 12:02:33 -08:00
Sarah Wooders	91e3dd8b3e	feat: fix new summarizer code and add more tests (#6461 )	2025-12-15 12:02:19 -08:00
Charles Packer	4af6465226	feat(core+web): store raw usage data on streams (and visualize properly in ADE) (#6452 ) * feat(core): store raw usage data on streams * fix(web): various fixes to deal w/ hardcoding against openai	2025-12-15 12:02:19 -08:00
Charles Packer	88a3743cc8	fix(core): distinguish between null and 0 for prompt caching (#6451 ) * fix(core): distinguish between null and 0 for prompt caching * fix: runtime errors * fix: just publish just sgate	2025-12-15 12:02:19 -08:00
Charles Packer	131891e05f	feat: add tracking of advanced usage data (eg caching) [LET-6372] (#6449 ) * feat: init refactor * feat: add helper code * fix: missing file + test * fix: just state/publish api	2025-12-15 12:02:19 -08:00
Charles Packer	081a1f6920	fix(core): patch responses api parallel tool calling not returning tool call IDs (#6079 ) * fix(core): patch responses api parallel tool calling not returning tool call ids * fix(core): patch chatcompletions as well * fix: patch problem with gpt-4.1	2025-11-13 15:36:56 -08:00
Matthew Zhou	a699aca626	fix: Eliminate O(n^2) string growth for OpenAI [LET-6065] (#5973 ) Finish	2025-11-13 15:36:55 -08:00
Matthew Zhou	d3ab51b822	feat: Support parallel tool calling streaming for responses OpenAI [LET-5977] (#5867 )	2025-11-13 15:36:20 -08:00
Matthew Zhou	ff81f4153b	feat: Support parallel tool calling streaming for OpenAI chat completions [LET-4594] (#5865 ) * Finish chat completions parallel tool calling * Undo comments * Add comments * Remove test file	2025-11-13 15:36:14 -08:00
cthomas	1848df2daa	feat: add special approval request otid for openai streaming (#5744 ) * feat: add special approval request otid for openai streaming * fix import	2025-10-24 15:14:39 -07:00
cthomas	4823416af9	feat: default unpack assistant message content [LET-5404] (#5707 ) feat: default unpack assistant message content	2025-10-24 15:14:20 -07:00
Kevin Lin	08da1a64bb	feat: parse `reasoning_content` from OAI proxies (eg. vLLM / OpenRouter) (#5372 ) * reasoning_content support * fix * comment * fix * rm comment --------- Co-authored-by: Charles Packer <packercharles@gmail.com>	2025-10-24 15:11:31 -07:00
Kian Jones	c2e474e03a	feat: refactor logs to parse as a single log line each and filter out 404s from sentry (#5242 ) * add multiline log auto detect * implement logger.exception() * filter out 404 * remove potentially problematic changes	2025-10-24 15:11:31 -07:00
Matthew Zhou	5593f1450b	feat: Double write to `ToolCallMessage`'s new list `tool_calls` field (#5268 ) * Add new tool_calls field to ToolCallMessage * fern autogen * Double write to new tool_calls field * Update straggling instances	2025-10-09 13:20:52 -07:00
Sarah Wooders	ef07e03ee3	feat: add `run_id` to input messages and `step_id` to messages (#5099 )	2025-10-07 17:50:48 -07:00
cthomas	a3545110cf	feat: add full responses api support in new agent loop (#5051 ) * feat: add full responses api support in new agent loop * update matrix in workflow * relax check for reasoning messages for high effort gpt 5 * fix indent * one more relax	2025-10-07 17:50:48 -07:00
cthomas	67f8e46619	feat: add run id to streamed messages (#5037 )	2025-10-07 17:50:47 -07:00
Charles Packer	a4041879a4	feat: add new agent loop (squash rebase of OSS PR) (#4815 ) * feat: squash rebase of OSS PR * fix: revert changes that weren't on manual rebase * fix: caught another one * fix: disable force * chore: drop print * fix: just stage-api && just publish-api * fix: make agent_type consistently an arg in the client * fix: patch multi-modal support * chore: put in todo stub * fix: disable hardcoding for tests * fix: patch validate agent sync (#4882) patch validate agent sync * fix: strip bad merge diff * fix: revert unrelated diff * fix: react_v2 naming -> letta_v1 naming * fix: strip bad merge --------- Co-authored-by: Kevin Lin <klin5061@gmail.com>	2025-10-07 17:50:45 -07:00
Kian Jones	b8e9a80d93	merge this (#4759 ) * wait I forgot to commit locally * cp the entire core directory and then rm the .git subdir	2025-09-17 15:47:40 -07:00
Kian Jones	22f70ca07c	chore: officially migrate to submodule (#4502 ) * remove apps/core and apps/fern * fix precommit * add submodule updates in workflows * submodule * remove core tests * update core revision * Add submodules: true to all GitHub workflows - Ensure all workflows can access git submodules - Add submodules support to deployment, test, and CI workflows - Fix YAML syntax issues in workflow files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * remove core-lint * upgrade core with latest main of oss --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 12:45:53 -07:00
cthomas	cb7296c81d	fix: approval request for streaming (#4445 ) * fix: approval request for streaming * fix: claude code attempt, unit test passing (add on to #4445) (#4448) * fix: claude code attempt, unit test passing * chore: update locks to 0.1.314 from 0.1.312 * chore: just stage-api && just publish-api * chore: drop dead poetry lock --------- Co-authored-by: Charles Packer <packercharles@gmail.com>	2025-09-05 17:43:21 -07:00
Charles Packer	264171f327	fix: patch streaming hidden reasoning event [LET-4167] (#4367 ) * fix: patch streaming hidden reasoning event * fix: patch reasoning_effort not getting passed to openai	2025-09-02 16:21:18 -07:00
Charles Packer	9d49eff204	fix: patch the streaming issue in the openai client for when inner_thoughts_in_kwargs is off [LET-4146] (#4350 ) fix: patch the streaming issue in the openai client for when inner_thoughts_in_kwargs is off	2025-09-02 12:44:25 -07:00
cthomas	1edcc13778	feat: support filtering out messages when converting to openai dict (#4337 ) * feat: support filtering out messages when converting to openai dict * fix imports	2025-09-01 12:48:45 -07:00
Kian Jones	fecf6decfb	chore: migrate to ruff (#4305 ) * base requirements * autofix * Configure ruff for Python linting and formatting - Set up minimal ruff configuration with basic checks (E, W, F, I) - Add temporary ignores for common issues during migration - Configure pre-commit hooks to use ruff with pass_filenames - This enables gradual migration from black to ruff * Delete sdj * autofixed only * migrate lint action * more autofixed * more fixes * change precommit * try changing the hook * try this stuff	2025-08-29 11:11:19 -07:00
cthomas	c8b370466e	fix: duplicate message stream error (#3834 )	2025-08-11 14:27:35 -07:00
cthomas	db41f01ac2	feat: continue stream processing on client cancel (#3796 )	2025-08-07 13:17:36 -07:00
Andy Li	ca6f474c4e	feat: track metrics for runs in db	2025-08-06 15:46:50 -07:00
cthomas	7d33254f5f	feat: log stream cancellation to sentry (#3759 )	2025-08-05 16:07:30 -07:00
jnjpng	6b082f0447	fix: manually count tokens for streaming lmstudio models Co-authored-by: Jin Peng <jinjpeng@Jins-MacBook-Pro.local> Co-authored-by: Charles Packer <packercharles@gmail.com>	2025-07-29 18:12:42 -07:00
Andy Li	33c1f26ab6	feat: support for agent loop job cancelation (#2837 )	2025-07-02 14:31:16 -07:00
Kevin Lin	868294533c	feat: add omitted reasoning to streaming openai reasoning (#2846 ) Co-authored-by: Charles Packer <packercharles@gmail.com>	2025-06-24 18:47:38 -07:00
Sarah Wooders	5fa52a2c38	fix: avoid calling `model_dump` on stop reason messages twice (#2811 )	2025-06-13 18:25:35 -07:00
cthomas	1405464a1c	feat: send stop reason in letta APIs (#2789 )	2025-06-13 16:04:48 -07:00
Andy Li	33bfd14017	fix: metric tracking (#2785 )	2025-06-13 13:53:10 -07:00
cthomas	605a1f410c	feat: consolidate logic for finish tokens (#2779 )	2025-06-12 15:24:06 -07:00
Kevin Lin	58c4448235	fix: patch reasoning models (#2703 ) Co-authored-by: Charles Packer <packercharles@gmail.com>	2025-06-11 17:20:04 -07:00
Andy Li	d2252f2953	feat: otel metrics and expanded collecting (#2647 ) (passed tests in last run)	2025-06-05 17:20:14 -07:00
cthomas	22c66da7bc	fix: add temp hack to gracefully handle parallel tool calling (#2654 )	2025-06-05 14:43:46 -07:00
Kevin Lin	0d6907c8cf	fix: set openai streaming interface `letta_message_id` (#2648 ) Co-authored-by: Caren Thomas <carenthomas@gmail.com>	2025-06-05 12:26:01 -07:00
cthomas	a8f394d675	feat: populate tool call name and id in when token streaming (#2639 )	2025-06-04 17:06:44 -07:00
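
The cancellation fix in c162de5127 turns on a Python detail: an exception raised in the consumer never reaches the async generator's `except` blocks, while `.athrow()` delivers it at the suspended `yield`. The following is a minimal sketch of that behaviour only, not the code from the commit. `RunCancelledException` here is a local stand-in for the class named in the message, and `stream`, `consumer_side_raise`, and `athrow_injection` are hypothetical names chosen for illustration.

```python
import asyncio


class RunCancelledException(Exception):
    """Local stand-in for the exception named in the commit message."""


async def stream(state: dict):
    """Hypothetical streaming generator that records how it was shut down."""
    try:
        for i in range(3):
            yield i
    except RunCancelledException:
        # Only reached when the exception is thrown INTO the generator.
        state["stream_was_cancelled"] = True
        raise
    finally:
        state["finalized"] = True


async def consumer_side_raise() -> dict:
    """Raise in the consumer, then close the generator from outside."""
    state = {"stream_was_cancelled": False, "finalized": False}
    gen = stream(state)
    try:
        async for _ in gen:
            raise RunCancelledException()  # raised outside the generator
    except RunCancelledException:
        pass
    # aclose() throws GeneratorExit at the yield, so the except block above
    # is skipped and only the finally block runs.
    await gen.aclose()
    return state


async def athrow_injection() -> dict:
    """Inject the exception at the generator's suspended yield."""
    state = {"stream_was_cancelled": False, "finalized": False}
    gen = stream(state)
    await gen.__anext__()  # advance to the first yield
    try:
        await gen.athrow(RunCancelledException())
    except RunCancelledException:
        pass  # the generator re-raised after setting the flag
    return state


if __name__ == "__main__":
    print(asyncio.run(consumer_side_raise()))
    print(asyncio.run(athrow_injection()))
```

In the first case the flag stays False even though the run was cancelled, which matches the problem the commit describes. The commit additionally uses a shared `asyncio.Event` checked in `finally`; that part is not modelled here.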


59 Commits
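
The finish_reason mapping described in e8d5922ff9 can be sketched as a small lookup. Only the `max_output_tokens` to `length` pair is stated in the commit message. The `content_filter` value, the pass-through for unknown reasons, and the `"unknown"` placeholder for a missing reason are assumptions made for this illustration, as are the names `INCOMPLETE_REASON_TO_FINISH_REASON` and `map_incomplete_reason`.

```python
from typing import Optional

# Hypothetical lookup table. Only the first entry comes from the commit message.
INCOMPLETE_REASON_TO_FINISH_REASON = {
    "max_output_tokens": "length",
    # Assumption: mirror the Chat Completions finish_reason of the same name.
    "content_filter": "content_filter",
}


def map_incomplete_reason(reason: Optional[str]) -> str:
    """Return a non-None finish_reason for an incomplete response.

    Unknown reasons are passed through unchanged (an assumption), so the
    caller never sees None for an incomplete response.
    """
    if reason is None:
        return "unknown"  # assumption: placeholder when no reason is reported
    return INCOMPLETE_REASON_TO_FINISH_REASON.get(reason, reason)


if __name__ == "__main__":
    for r in ("max_output_tokens", "content_filter", "some_future_reason", None):
        print(r, "->", map_incomplete_reason(r))
```

Returning a string in every branch reflects the stated goal of the last bullet in that commit: finish_reason should not silently remain None for any incomplete reason.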