---
title: Streaming agent responses
slug: guides/agents/streaming
---
Messages from the **Letta server** can be **streamed** to the client.
If you're building a UI on the Letta API, enabling streaming allows your UI to update in real-time as the agent generates a response to an input message.
<Warning>
When working with agents that execute long-running operations (e.g., complex tool calls, extensive searches, or code execution), you may encounter timeouts with the message routes.
See our [tips on handling long-running tasks](/guides/agents/long-running) for more info.
</Warning>
## Quick Start

Letta supports two streaming modes: **step streaming** (default) and **token streaming**.

To enable streaming, use the [`/v1/agents/{agent_id}/messages/stream`](/api-reference/agents/messages/stream) endpoint instead of `/messages`:
<CodeGroup>
```python title="python"
# Step streaming (default) - returns complete messages
stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}]
)
for chunk in stream:
    print(chunk)  # Complete message objects

# Token streaming - returns partial chunks for real-time UX
stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
    stream_tokens=True  # Enable token streaming
)
for chunk in stream:
    print(chunk)  # Partial content chunks
```

```typescript title="typescript"
import { LettaClient } from '@letta-ai/letta-client';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

// Step streaming (default) - returns complete messages
const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Hello!" }]
});
for await (const chunk of stream) {
  console.log(chunk); // Complete message objects
}

// Token streaming - returns partial chunks for real-time UX
const tokenStream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
  streamTokens: true // Enable token streaming
});
for await (const chunk of tokenStream) {
  console.log(chunk); // Partial content chunks
}
```
</CodeGroup>
## Streaming Modes Comparison

| Aspect | Step Streaming (default) | Token Streaming |
|--------|-------------------------|-----------------|
| **What you get** | Complete messages after each step | Partial chunks as tokens generate |
| **When to use** | Simple implementation | ChatGPT-like real-time UX |
| **Reassembly needed** | No | Yes (by message ID) |
| **Message IDs** | Unique per message | Same ID across chunks |
| **Content format** | Full text in each message | Incremental text pieces |
| **Enable with** | Default behavior | `stream_tokens: true` |
## Understanding Message Flow

### Message Types and Flow Patterns

The messages you receive depend on your agent's configuration:

**With reasoning enabled (default):**
- Simple response: `reasoning_message` → `assistant_message`
- With tool use: `reasoning_message` → `tool_call_message` → `tool_return_message` → `reasoning_message` → `assistant_message`

**With reasoning disabled (`reasoning=false`):**
- Simple response: `assistant_message`
- With tool use: `tool_call_message` → `tool_return_message` → `assistant_message`
### Message Type Reference

- **`reasoning_message`**: Agent's internal thinking process (only when `reasoning=true`)
- **`assistant_message`**: The actual response shown to the user
- **`tool_call_message`**: Request to execute a tool
- **`tool_return_message`**: Result from tool execution
- **`stop_reason`**: Indicates end of response (`end_turn`)
- **`usage_statistics`**: Token usage and step count metrics
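As a rough sketch, a client can dispatch on `message_type` to decide how to render each event. The `render` helper below is hypothetical (not part of the SDK), treats events as plain dicts, and only labels the tool events since their payload fields are not detailed here:

```python
# Hypothetical sketch: map one streamed event (as a dict) to a display
# string. Field names for reasoning/assistant/stop/usage events follow
# the reference above; tool events are only labeled, not unpacked.

def render(chunk: dict) -> str:
    """Return a display string for a streamed event, or '' if unknown."""
    handlers = {
        "reasoning_message": lambda c: f"Thinking: {c['reasoning']}",
        "assistant_message": lambda c: f"Agent: {c['content']}",
        "tool_call_message": lambda c: "[tool call requested]",
        "tool_return_message": lambda c: "[tool result received]",
        "stop_reason": lambda c: f"[done: {c['stop_reason']}]",
        "usage_statistics": lambda c: f"[tokens: {c['total_tokens']}]",
    }
    handler = handlers.get(chunk.get("message_type"))
    return handler(chunk) if handler else ""
```

A table of lambdas keeps the per-type logic in one place, so adding support for a new message type is a one-line change.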
### Controlling Reasoning Messages

```python
# With reasoning (default) - includes reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    # reasoning=True is the default
)

# Without reasoning - no reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    reasoning=False  # Disable reasoning messages
)
```
## Step Streaming (Default)

Step streaming delivers **complete messages** after each agent step completes. This is the default behavior when you use the streaming endpoint.

### How It Works

1. The agent processes your request through steps (reasoning, tool calls, generating responses)
2. After each step completes, you receive a complete `LettaMessage` via SSE
3. Each message can be processed immediately without reassembly

### Example
<CodeGroup>
```python title="python"
stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What's 2+2?"}]
)

for chunk in stream:
    if hasattr(chunk, 'message_type'):
        if chunk.message_type == 'reasoning_message':
            print(f"Thinking: {chunk.reasoning}")
        elif chunk.message_type == 'assistant_message':
            print(f"Response: {chunk.content}")
```

```typescript title="typescript"
import { LettaClient } from '@letta-ai/letta-client';
import type { LettaMessage } from '@letta-ai/letta-client/api/types';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "What's 2+2?" }]
});

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.messageType === 'reasoning_message') {
    console.log(`Thinking: ${(chunk as any).reasoning}`);
  } else if (chunk.messageType === 'assistant_message') {
    console.log(`Response: ${(chunk as any).content}`);
  }
}
```

```bash title="curl"
curl -N --request POST \
  --url https://api.letta.com/v1/agents/$AGENT_ID/messages/stream \
  --header "Authorization: Bearer $LETTA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'

# For self-hosted: replace https://api.letta.com with http://localhost:8283
```
</CodeGroup>
### Example Output

```
data: {"id":"msg-123","message_type":"reasoning_message","reasoning":"User is asking a simple math question."}

data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}

data: {"message_type":"stop_reason","stop_reason":"end_turn"}

data: {"message_type":"usage_statistics","completion_tokens":50,"total_tokens":2821}

data: [DONE]
```
## Token Streaming

Token streaming provides **partial content chunks** as they're generated by the LLM, enabling a ChatGPT-like experience where text appears character by character.

### How It Works

1. Set `stream_tokens: true` in your request
2. Receive multiple chunks with the **same message ID**
3. Each chunk contains a piece of the content
4. The client must accumulate chunks by ID to rebuild complete messages

### Example with Reassembly
<CodeGroup>
```python title="python"
# Token streaming with reassembly
message_accumulators = {}

stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream_tokens=True
)

for chunk in stream:
    if hasattr(chunk, 'id') and hasattr(chunk, 'message_type'):
        msg_id = chunk.id
        msg_type = chunk.message_type

        # Initialize accumulator for new messages
        if msg_id not in message_accumulators:
            message_accumulators[msg_id] = {
                'type': msg_type,
                'content': ''
            }

        # Extract this chunk's piece of content
        delta = ''
        if msg_type == 'reasoning_message':
            delta = chunk.reasoning
        elif msg_type == 'assistant_message':
            delta = chunk.content

        # Accumulate, then print only the new piece so text
        # appears incrementally without repeating earlier chunks
        message_accumulators[msg_id]['content'] += delta
        print(delta, end='', flush=True)
```

```typescript title="typescript"
import { LettaClient } from '@letta-ai/letta-client';
import type { LettaMessage } from '@letta-ai/letta-client/api/types';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

// Token streaming with reassembly
interface MessageAccumulator {
  type: string;
  content: string;
}

const messageAccumulators = new Map<string, MessageAccumulator>();

const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Tell me a joke" }],
  streamTokens: true // Note: camelCase
});

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.id && chunk.messageType) {
    const msgId = chunk.id;
    const msgType = chunk.messageType;

    // Initialize accumulator for new messages
    if (!messageAccumulators.has(msgId)) {
      messageAccumulators.set(msgId, { type: msgType, content: '' });
    }

    const acc = messageAccumulators.get(msgId)!;

    // Extract this chunk's piece of content
    let delta = '';
    if (msgType === 'reasoning_message') {
      delta = (chunk as any).reasoning || '';
    } else if (msgType === 'assistant_message') {
      delta = (chunk as any).content || '';
    }

    // Only accumulate if the type matches (in case types share IDs);
    // write only the new piece so text appears incrementally
    if (acc.type === msgType) {
      acc.content += delta;
      process.stdout.write(delta);
    }
  }
}
```

```bash title="curl"
curl -N --request POST \
  --url https://api.letta.com/v1/agents/$AGENT_ID/messages/stream \
  --header "Authorization: Bearer $LETTA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream_tokens": true
  }'
```
</CodeGroup>
### Example Output

```
# Same ID across chunks of the same message
data: {"id":"msg-abc","message_type":"assistant_message","content":"Why"}

data: {"id":"msg-abc","message_type":"assistant_message","content":" did"}

data: {"id":"msg-abc","message_type":"assistant_message","content":" the"}

data: {"id":"msg-abc","message_type":"assistant_message","content":" scarecrow"}

data: {"id":"msg-abc","message_type":"assistant_message","content":" win"}

# ... more chunks with same ID

data: [DONE]
```
## Implementation Tips

### Universal Handling Pattern

The accumulator pattern shown above works for **both** streaming modes:
- **Step streaming**: Each message is complete (single chunk per ID)
- **Token streaming**: Multiple chunks per ID need accumulation

This means you can write your client code once to handle both cases.
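The mode-agnostic idea can be reduced to a small sketch: concatenate content by message ID, which is a no-op for step streaming (one chunk per ID) and a true reassembly for token streaming. The `accumulate` helper below is illustrative only and works on plain dicts rather than SDK objects:

```python
# Sketch: rebuild complete messages from either streaming mode by
# concatenating pieces per message ID, preserving arrival order.
# Events without an ID (stop_reason, usage_statistics) are skipped.

def accumulate(chunks: list[dict]) -> list[str]:
    """Return the final text of each message, in first-seen order."""
    messages: dict[str, str] = {}
    order: list[str] = []
    for chunk in chunks:
        msg_id = chunk.get("id")
        if msg_id is None:
            continue  # stop_reason / usage_statistics carry no ID
        piece = chunk.get("content") or chunk.get("reasoning") or ""
        if msg_id not in messages:
            messages[msg_id] = ""
            order.append(msg_id)
        messages[msg_id] += piece
    return [messages[i] for i in order]

# Step streaming: one complete chunk per ID -> passes through unchanged
# Token streaming: many chunks with the same ID -> joined back together
print(accumulate([{"id": "a", "content": "Why"},
                  {"id": "a", "content": " did"}]))  # ['Why did']
```

Keeping first-seen order matters when reasoning and assistant messages interleave, since a plain dict of ID to text would otherwise lose the sequence the agent produced them in.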
### SSE Format Notes

All streaming responses follow the Server-Sent Events (SSE) format:
- Each event starts with `data: ` followed by JSON
- The stream ends with `data: [DONE]`
- Empty lines separate events

Learn more about the SSE format [here](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events).
### Handling Different LLM Providers

If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work: the server falls back to step streaming automatically when token streaming isn't available.