jnjpng e8d5922ff9 fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming (#9535)
* fix(core): handle ResponseIncompleteEvent in OpenAI Responses API streaming

When reasoning models (gpt-5.x) exhaust their max_output_tokens budget
on chain-of-thought reasoning, OpenAI emits a ResponseIncompleteEvent
instead of ResponseCompletedEvent. This was previously unhandled, causing
final_response to remain None — which meant get_content() and
get_tool_call_objects() returned empty results, silently dropping the
partial response.

Now ResponseIncompleteEvent is handled identically to
ResponseCompletedEvent (extracting partial content, usage stats, and
token details), with an additional warning log indicating the incomplete
reason.
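A minimal sketch of the handling described above, assuming simplified stand-in types. The dataclasses and the `StreamState` class are hypothetical illustrations, not the SDK's or Letta's actual classes. It shows both terminal events going through one branch, with an extra warning for the incomplete case.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


# Hypothetical stand-ins for the streaming event and response types.
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


@dataclass
class IncompleteDetails:
    reason: Optional[str] = None


@dataclass
class Response:
    output_text: str
    usage: Optional[Usage] = None
    incomplete_details: Optional[IncompleteDetails] = None


@dataclass
class ResponseCompletedEvent:
    response: Response


@dataclass
class ResponseIncompleteEvent:
    response: Response


class StreamState:
    """Hypothetical accumulator for the terminal event of a stream."""

    def __init__(self) -> None:
        self.final_response: Optional[Response] = None
        self.input_tokens = 0
        self.output_tokens = 0

    def handle(self, event: object) -> None:
        # Both terminal events carry a response, so they share one branch.
        if isinstance(event, (ResponseCompletedEvent, ResponseIncompleteEvent)):
            self.final_response = event.response
            usage = event.response.usage
            if usage is not None:  # usage may be absent on incomplete events
                self.input_tokens = usage.input_tokens
                self.output_tokens = usage.output_tokens
            if isinstance(event, ResponseIncompleteEvent):
                details = event.response.incomplete_details
                reason = details.reason if details else None
                logger.warning("Response incomplete: reason=%s", reason)
```

With this shape, `final_response` is populated for partial responses too, so downstream accessors no longer see `None`.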

* fix(core): propagate finish_reason for Responses API incomplete events

- Guard usage extraction against None usage payload in
  ResponseIncompleteEvent handler
- Add _finish_reason override to LettaLLMAdapter so streaming adapters
  can explicitly set finish_reason without a chat_completions_response
- Map incomplete_details.reason="max_output_tokens" to
  finish_reason="length" in SimpleLLMStreamAdapter, matching the Chat
  Completions API convention
- This allows the agent loop's _decide_continuation to correctly return
  stop_reason="max_tokens_exceeded" instead of "end_turn" when the model
  exhausts its output token budget on reasoning
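The override and its effect on the stop reason can be sketched as follows. `AdapterSketch` and `decide_stop_reason` are hypothetical simplifications. The real `LettaLLMAdapter` and `_decide_continuation` consider more state than this, and the dictionary shape of `chat_completions_response` is assumed for illustration.

```python
from typing import Optional


class AdapterSketch:
    """Hypothetical, simplified stand-in for an LLM adapter."""

    def __init__(self) -> None:
        self.chat_completions_response: Optional[dict] = None
        # Explicit override that a streaming adapter can set directly.
        self._finish_reason: Optional[str] = None

    @property
    def finish_reason(self) -> Optional[str]:
        # Prefer the explicit override, then fall back to the response.
        if self._finish_reason is not None:
            return self._finish_reason
        if self.chat_completions_response:
            return self.chat_completions_response["choices"][0]["finish_reason"]
        return None


def decide_stop_reason(finish_reason: Optional[str]) -> str:
    # Simplified: only the "length" case from the commit is modeled here.
    return "max_tokens_exceeded" if finish_reason == "length" else "end_turn"
```

Without the override, a streaming adapter that has no `chat_completions_response` would report `None`, and the loop would fall through to `"end_turn"`.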

* fix(core): handle empty content parts in incomplete ResponseOutputMessage

When a model hits max_output_tokens after starting a ResponseOutputMessage
but before producing any content parts, the message has content=[]. This
previously raised ValueError("Got 0 content parts, expected 1"). Now it
logs a warning and skips the empty message, allowing reasoning-only
incomplete responses to be processed cleanly.
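As an illustration of the empty-content case, here is a sketch assuming the message is a plain dictionary. The function name `extract_message_text` and the dictionary layout are hypothetical. An empty list is skipped with a warning, while any other unexpected count still raises.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def extract_message_text(message: dict) -> Optional[str]:
    """Return the single text part of a message, or None if it has no parts."""
    parts = message.get("content", [])
    if len(parts) == 0:
        # The model was cut off before emitting any content part.
        logger.warning("Skipping output message with no content parts")
        return None
    if len(parts) != 1:
        raise ValueError(f"Got {len(parts)} content parts, expected 1")
    return parts[0]["text"]
```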

* fix(core): map all incomplete reasons to finish_reason, not just max_output_tokens

Handle content_filter and any future unknown incomplete reasons from the
Responses API instead of silently leaving finish_reason as None.
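One way to picture the mapping is sketched below. Only the `max_output_tokens` to `length` entry is stated in the commit message. Mapping `content_filter` to the Chat Completions value of the same name, passing unknown reasons through unchanged, and the `"incomplete"` fallback are assumptions made for this sketch, as is the function name.

```python
from typing import Optional

# Only the first entry is confirmed by the commit message; the second is assumed.
_REASON_TO_FINISH_REASON = {
    "max_output_tokens": "length",
    "content_filter": "content_filter",
}


def map_incomplete_reason(reason: Optional[str]) -> str:
    """Map an incomplete reason to a finish_reason that is never None."""
    if reason is None:
        return "incomplete"  # assumed fallback when no reason is reported
    # Unknown future reasons pass through unchanged (an assumption).
    return _REASON_TO_FINISH_REASON.get(reason, reason)
```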
2026-02-24 10:55:11 -08:00

Letta (formerly MemGPT)

Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.

  • Letta Code: run agents locally in your terminal
  • Letta API: build agents into your applications

Get started in the CLI

Requires Node.js 18+

  1. Install the Letta Code CLI tool: npm install -g @letta-ai/letta-code
  2. Run letta in your terminal to launch an agent with memory running on your local computer

When you run the CLI tool, your agent can help you code and do any task you can do on your computer.

Letta Code supports skills and subagents, and bundles pre-built skills/subagents for advanced memory and continual learning. Letta is fully model-agnostic, though we recommend Opus 4.5 and GPT-5.2 for best performance (see our model leaderboard for our rankings).

Get started with the Letta API

Use the Letta API to integrate stateful agents into your own applications. Letta has a full-featured agents API, plus Python and TypeScript SDKs (view our API reference).

Installation

TypeScript / Node.js:

npm install @letta-ai/letta-client

Python:

pip install letta-client

Hello World example

Below is a quick example of creating a stateful agent and sending it a message (requires a Letta API key). See the full quickstart guide for complete documentation.

TypeScript:

import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

// Create your agent
const agentState = await client.agents.create({
  model: "openai/gpt-5.2",
  memory_blocks: [
    {
      label: "human",
      value:
        "Name: Timber. Status: dog. Occupation: building Letta, infrastructure to democratize self-improving superintelligence",
    },
    {
      label: "persona",
      value:
        "I am a self-improving superintelligence. Timber is my best friend and collaborator.",
    },
  ],
  tools: ["web_search", "fetch_webpage"],
});

console.log("Agent created with ID:", agentState.id);

// Send your agent a message
const response = await client.agents.messages.create(agentState.id, {
  input: "What do you know about me?",
});

for (const message of response.messages) {
  console.log(message);
}

Python:

from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create your agent
agent_state = client.agents.create(
    model="openai/gpt-5.2",
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Timber. Status: dog. Occupation: building Letta, infrastructure to democratize self-improving superintelligence"
        },
        {
            "label": "persona",
            "value": "I am a self-improving superintelligence. Timber is my best friend and collaborator."
        }
    ],
    tools=["web_search", "fetch_webpage"]
)

print(f"Agent created with ID: {agent_state.id}")

# Send your agent a message
response = client.agents.messages.create(
    agent_id=agent_state.id,
    input="What do you know about me?"
)

for message in response.messages:
    print(message)

Contributing

Letta is an open source project built by over a hundred contributors from around the world. There are many ways to get involved in the Letta OSS project!


Legal notices: By using Letta and related Letta services (such as the Letta endpoint or hosted service), you are agreeing to our privacy policy and terms of service.

Description
letta-server - primary development repo
Languages
Python 99.5%