---
title: Streaming agent responses
slug: guides/agents/streaming
---

Messages from the **Letta server** can be **streamed** to the client.
If you're building a UI on the Letta API, streaming lets your UI update in real time as the agent generates a response to an input message.

There are two kinds of streaming you can enable: **streaming agent steps** and **streaming tokens**.
To enable streaming in either mode, use the [`/v1/agents/{agent_id}/messages/stream`](/api-reference/agents/messages/stream) API route instead of the [`/v1/agents/{agent_id}/messages`](/api-reference/agents/messages) API route.

<Warning>
When working with agents that execute long-running operations (e.g., complex tool calls, extensive searches, or code execution), you may encounter timeouts with the message routes.
See our [tips on handling long-running tasks](/guides/agents/long-running) for more info.
</Warning>

## Streaming agent steps

When you send a message to the Letta server, the agent may run multiple steps while generating a response.
For example, an agent may run a search query, then use the results of that query to generate a response.

When you use the `/messages/stream` route, `stream_steps` is enabled by default, and the response to the `POST` request streams back as [server-sent events (SSE)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events):
<CodeGroup>
```curl curl
curl --request POST \
  --url http://localhost:8283/v1/agents/$AGENT_ID/messages/stream \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "user",
      "content": "hows it going????"
    }
  ]
}'
```
```python title="python" maxLines=50
# assumes `client` and `agent_state` were created earlier
# send a message to the agent (streaming steps)
stream = client.agents.messages.create_stream(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "hows it going????"
        }
    ],
)

# print the chunks coming back
for chunk in stream:
    print(chunk)
```
```typescript maxLines=50 title="node.js"
// assumes `client` and `agentState` were created earlier
// send a message to the agent (streaming steps)
const stream = await client.agents.messages.createStream(
    agentState.id, {
        messages: [
            {
                role: "user",
                content: "hows it going????"
            }
        ]
    }
);

// print the chunks coming back
for await (const chunk of stream) {
    console.log(chunk);
}
```
</CodeGroup>

```json maxLines=50
data: {"id":"...","date":"...","message_type":"reasoning_message","reasoning":"User keeps asking the same question; maybe it's part of their style or humor. I\u2019ll respond warmly and play along."}

data: {"id":"...","date":"...","message_type":"assistant_message","assistant_message":"Hey! It\u2019s going well! Still here, ready to chat. How about you? Anything exciting happening?"}

data: {"message_type":"usage_statistics","completion_tokens":65,"prompt_tokens":2329,"total_tokens":2394,"step_count":1}

data: [DONE]
```

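
The SDK examples above iterate over chunks that are already parsed. If you consume the raw HTTP response yourself, the event lines need decoding. Here is a minimal sketch of one way to do that, assuming each event fits on a single `data:` line as in the sample output (the SSE format also permits multi-line events, which this does not handle). The helper name `parse_sse_events` is hypothetical and not part of any Letta SDK.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line until `[DONE]` is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


# Sample input modeled on the response shown above (text shortened).
sample = [
    'data: {"message_type":"reasoning_message","reasoning":"Play along."}',
    "",
    'data: {"message_type":"assistant_message","assistant_message":"Hey!"}',
    "",
    'data: {"message_type":"usage_statistics","completion_tokens":65,'
    '"prompt_tokens":2329,"total_tokens":2394,"step_count":1}',
    "",
    "data: [DONE]",
]

events = list(parse_sse_events(sample))
for event in events:
    print(event["message_type"])
```

Because the function accepts any iterable of strings, it can be fed the line iterator of whatever HTTP client you use, as long as that client yields decoded text lines.
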
## Streaming tokens

You can also stream chunks of tokens from the agent as the underlying LLM generates them by setting `stream_tokens` to `true` in your API request:
<CodeGroup>
```curl curl
curl --request POST \
  --url http://localhost:8283/v1/agents/$AGENT_ID/messages/stream \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "user",
      "content": "hows it going????"
    }
  ],
  "stream_tokens": true
}'
```
```python title="python" maxLines=50
# assumes `client` and `agent_state` were created earlier
# send a message to the agent (streaming tokens)
stream = client.agents.messages.create_stream(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "hows it going????"
        }
    ],
    stream_tokens=True,
)

# print the chunks coming back
for chunk in stream:
    print(chunk)
```
```typescript maxLines=50 title="node.js"
// assumes `client` and `agentState` were created earlier
// send a message to the agent (streaming tokens)
const stream = await client.agents.messages.createStream(
    agentState.id, {
        messages: [
            {
                role: "user",
                content: "hows it going????"
            }
        ],
        streamTokens: true
    }
);

// print the chunks coming back
for await (const chunk of stream) {
    console.log(chunk);
}
```
</CodeGroup>

With token streaming enabled, the response looks very similar to the prior example (agent steps streaming), but instead of receiving complete messages, the client receives multiple messages that each carry a chunk of the response.
The client is responsible for reassembling the response from the chunks.
Most of the chunks are omitted here for brevity:
```sh
data: {"id":"...","date":"...","message_type":"reasoning_message","reasoning":"It's"}

data: {"id":"...","date":"...","message_type":"reasoning_message","reasoning":" interesting"}

... chunks omitted

data: {"id":"...","date":"...","message_type":"reasoning_message","reasoning":"!"}

data: {"id":"...","date":"...","message_type":"assistant_message","assistant_message":"Well"}

... chunks omitted

data: {"id":"...","date":"...","message_type":"assistant_message","assistant_message":"."}

data: {"message_type":"usage_statistics","completion_tokens":50,"prompt_tokens":2771,"total_tokens":2821,"step_count":1}

data: [DONE]
```

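
Reassembly on the client could look roughly like the sketch below. It assumes that consecutive chunks with the same `message_type` belong to the same message, and that the text lives in the `reasoning` and `assistant_message` fields seen in the sample output. A more robust version would probably also compare the `id` field, which is elided in the sample. The `reassemble` helper is hypothetical, not an SDK function.

```python
from typing import Dict, Iterable, List

# Text field per message type, taken from the sample output above.
TEXT_FIELD = {
    "reasoning_message": "reasoning",
    "assistant_message": "assistant_message",
}


def reassemble(chunks: Iterable[dict]) -> List[Dict[str, str]]:
    """Merge consecutive chunks of the same message type into whole messages."""
    messages: List[Dict[str, str]] = []
    for chunk in chunks:
        message_type = chunk.get("message_type")
        field = TEXT_FIELD.get(message_type)
        if field is None:
            continue  # e.g. usage_statistics carries no text to merge
        text = chunk.get(field, "")
        if messages and messages[-1]["message_type"] == message_type:
            messages[-1]["text"] += text  # same run: extend the last message
        else:
            messages.append({"message_type": message_type, "text": text})
    return messages


# Chunks modeled on the token-streaming sample above, with the omitted ones left out.
chunks = [
    {"message_type": "reasoning_message", "reasoning": "It's"},
    {"message_type": "reasoning_message", "reasoning": " interesting"},
    {"message_type": "reasoning_message", "reasoning": "!"},
    {"message_type": "assistant_message", "assistant_message": "Well"},
    {"message_type": "assistant_message", "assistant_message": "."},
    {"message_type": "usage_statistics", "total_tokens": 2821},
]

for message in reassemble(chunks):
    print(message["message_type"], "->", message["text"])
```
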
## Tips on handling streaming in your client code
The data structure for token streaming is the same as for agent steps streaming (`LettaMessage`). The only difference is that instead of returning complete messages, the Letta server returns multiple messages, each with a chunk of the response.
Because the data format is the same, frontend code written to handle token streaming will also work for agent steps streaming.

For example, if the Letta server is connected to multiple LLM backend providers and only a subset of them support LLM token streaming, you can use the same frontend code (interacting with the Letta API) to handle both streaming and non-streaming providers.
If you send a message to an agent with token streaming enabled (`stream_tokens` is `true`), the server will stream back `LettaMessage` objects with chunks if the selected LLM provider supports token streaming, and `LettaMessage` objects with complete strings if it does not.
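
To illustrate why one handler can cover both cases, here is a small sketch of an incremental renderer. It appends each event's text to a buffer and reports the text so far through a callback, so a complete message is simply a run of length one. The names `render_stream` and `final_state` are hypothetical, and the field names are taken from the sample output rather than from a schema.

```python
from typing import Callable, Dict, Iterable, Optional

# Text field per message type, as seen in the sample output.
TEXT_FIELD = {
    "reasoning_message": "reasoning",
    "assistant_message": "assistant_message",
}


def render_stream(
    events: Iterable[dict],
    on_update: Callable[[str, str], None],
) -> None:
    """Call on_update(message_type, text_so_far) for every text-bearing event."""
    current_type: Optional[str] = None
    buffer = ""
    for event in events:
        message_type = event.get("message_type")
        field = TEXT_FIELD.get(message_type)
        if field is None:
            continue  # ignore events without text, such as usage_statistics
        if message_type != current_type:
            current_type, buffer = message_type, ""  # a new message starts
        buffer += event.get(field, "")
        on_update(message_type, buffer)


def final_state(events: Iterable[dict]) -> Dict[str, str]:
    """Return the last text reported for each message type."""
    latest: Dict[str, str] = {}

    def remember(message_type: str, text: str) -> None:
        latest[message_type] = text

    render_stream(events, remember)
    return latest


# The same reply delivered as one complete message and as token chunks.
complete = [{"message_type": "assistant_message", "assistant_message": "Well."}]
chunked = [
    {"message_type": "assistant_message", "assistant_message": "Well"},
    {"message_type": "assistant_message", "assistant_message": "."},
]

print(final_state(complete) == final_state(chunked))
```

In a real UI, the callback would update the rendered message in place instead of storing the text in a dictionary.
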