OpenAI-Compatible API

LettaBot exposes an OpenAI-compatible API so you can point any OpenAI SDK or tool at your LettaBot server and interact with your agents directly.

Endpoints

| Method | Path                   | Description                                             |
|--------|------------------------|---------------------------------------------------------|
| GET    | /v1/models             | List available agents (as "models")                     |
| POST   | /v1/chat/completions   | Send a message and get a response (sync or streaming)   |

Both endpoints run on the same API server as the rest of LettaBot (default port 8080, configurable via server.api.port).

Authentication

All requests require an API key, passed as either:

  • Authorization: Bearer <key>
  • X-Api-Key: <key>

The API key is auto-generated on first run and saved to lettabot-api.json, or set via the LETTABOT_API_KEY environment variable. This is the same key used by /api/v1/chat and /api/v1/chat/async.
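
If your client isn't built on an OpenAI SDK, you can pass the key yourself with either header. A minimal sketch using Python's requests library (the URL, key, and agent name are placeholders):

import requests

# Either header works; X-Api-Key is shown here, Bearer in the SDK examples below.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"X-Api-Key": "YOUR_API_KEY"},
    json={
        "model": "lettabot",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])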

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_API_KEY",
)

# Sync
response = client.chat.completions.create(
    model="lettabot",  # your agent name
    messages=[{"role": "user", "content": "What's on my todo list?"}],
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "What's on my todo list?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

Node / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "YOUR_API_KEY",
});

// Sync
const response = await client.chat.completions.create({
  model: "lettabot",
  messages: [{ role: "user", content: "What's on my todo list?" }],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: "lettabot",
  messages: [{ role: "user", content: "What's on my todo list?" }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content) process.stdout.write(delta.content);
}

curl

Sync:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "lettabot",
    "messages": [{"role": "user", "content": "What is on my todo list?"}]
  }'

Streaming:

curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "lettabot",
    "messages": [{"role": "user", "content": "What is on my todo list?"}],
    "stream": true
  }'

Model Mapping

The model field maps to your agent names. Use GET /v1/models to list them:

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "object": "list",
  "data": [
    { "id": "lettabot", "object": "model", "created": 1740000000, "owned_by": "lettabot" },
    { "id": "helper-bot", "object": "model", "created": 1740000000, "owned_by": "lettabot" }
  ]
}

If you omit model in a chat request, the first configured agent is used.
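
The same listing is available through the OpenAI SDKs. A short Python sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="YOUR_API_KEY")

# Each "model" is an agent name you can pass as model= in chat requests.
for model in client.models.list():
    print(model.id)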

Streaming

When stream: true, responses arrive as Server-Sent Events (SSE). The stream includes:

  • Content deltas -- incremental text from the assistant
  • Tool call deltas -- tool invocations with name and arguments
  • Finish chunk -- finish_reason: "stop"
  • [DONE] sentinel -- end of stream

Internal events (reasoning, tool results) are filtered and not included in the stream.
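
To observe tool calls as well as text, check both fields on each delta. A sketch building on the Python client from the Quick Start (the tool-call handling shown is illustrative):

stream = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "What's on my todo list?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.tool_calls:
        for call in delta.tool_calls:
            # Tool call deltas carry the tool name and (possibly partial) arguments.
            if call.function and call.function.name:
                print(f"\n[calling {call.function.name}]", flush=True)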

Supported Parameters

| Parameter   | Supported | Notes                                               |
|-------------|-----------|-----------------------------------------------------|
| model       | Yes       | Maps to agent name                                  |
| messages    | Yes       | Only the last user message is extracted (see below) |
| stream      | Yes       | true for SSE streaming, false/omitted for sync      |
| temperature | Ignored   | Accepted but has no effect                          |
| max_tokens  | Ignored   | Accepted but has no effect                          |
| tools       | Ignored   | Agent tools are configured server-side              |
| top_p       | Ignored   | Accepted but has no effect                          |
| All others  | Ignored   | Silently accepted for compatibility                 |
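
Requests written for other OpenAI backends therefore work unchanged; unsupported parameters are accepted and dropped. For example, using the Python client from the Quick Start:

# temperature, max_tokens, and top_p are accepted for compatibility but ignored.
response = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "Summarize my day"}],
    temperature=0.2,  # no effect
    max_tokens=256,   # no effect
)
print(response.choices[0].message.content)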

How Messages Are Handled

The OpenAI API lets you send a full conversation in the messages array. LettaBot handles this differently:

  • Only the last user message is extracted and sent to the agent
  • Multi-turn context is managed by Letta's built-in memory and conversation history, not by the messages array
  • System messages, assistant messages, and tool messages in the array are ignored

This means you don't need to manage conversation history client-side -- the agent remembers everything on its own.
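
A multi-turn client loop therefore just sends the newest message each time, with no running transcript. A sketch with the Python client from the Quick Start:

# Each turn sends only the new user message; the agent's own memory
# supplies the conversational context.
for question in ["Add 'buy milk' to my todo list", "What's on it now?"]:
    reply = client.chat.completions.create(
        model="lettabot",
        messages=[{"role": "user", "content": question}],
    )
    print(reply.choices[0].message.content)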

Limitations

  • usage is always null -- token counts are not tracked
  • No multi-turn passthrough -- only the last user message is used (see above)
  • Tool definitions ignored -- tools are configured on the agent, not per-request
  • Reasoning events filtered -- the agent's internal reasoning is not exposed in the stream

Use with Open WebUI

Since the endpoint is OpenAI-compatible, you can connect it to Open WebUI or any other OpenAI-compatible frontend. Point the frontend at http://localhost:8080/v1 with your API key.