OpenAI-Compatible API

LettaBot exposes an OpenAI-compatible API so you can point any OpenAI SDK or tool at your LettaBot server and interact with your agents directly.

Endpoints

| Method | Path                   | Description                                             |
|--------|------------------------|---------------------------------------------------------|
| GET    | /v1/models             | List available agents (as "models")                     |
| POST   | /v1/chat/completions   | Send a message and get a response (sync or streaming)   |

Both endpoints run on the same API server as the rest of LettaBot (default port 8080, configurable via server.api.port).

Authentication

All requests require an API key, passed as either:

  • Authorization: Bearer <key>
  • X-Api-Key: <key>

The API key is auto-generated on first run and saved to lettabot-api.json, or set via the LETTABOT_API_KEY environment variable. This is the same key used by /api/v1/chat and /api/v1/chat/async.
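
If your client isn't built on an OpenAI SDK, you can pass the key yourself with either header. A minimal sketch using Python's requests library (the URL, key, and agent name are placeholders):

import requests

# Either header works; X-Api-Key is shown here, Bearer in the SDK examples below.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"X-Api-Key": "YOUR_API_KEY"},
    json={
        "model": "lettabot",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])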

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_API_KEY",
)

# Sync
response = client.chat.completions.create(
    model="lettabot",  # your agent name
    messages=[{"role": "user", "content": "What's on my todo list?"}],
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "What's on my todo list?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

Node / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "YOUR_API_KEY",
});

// Sync
const response = await client.chat.completions.create({
  model: "lettabot",
  messages: [{ role: "user", content: "What's on my todo list?" }],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: "lettabot",
  messages: [{ role: "user", content: "What's on my todo list?" }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content) process.stdout.write(delta.content);
}

curl

Sync:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "lettabot",
    "messages": [{"role": "user", "content": "What is on my todo list?"}]
  }'

Streaming:

curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "lettabot",
    "messages": [{"role": "user", "content": "What is on my todo list?"}],
    "stream": true
  }'

Model Mapping

The model field maps to your agent names. Use GET /v1/models to list them:

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "object": "list",
  "data": [
    { "id": "lettabot", "object": "model", "created": 1740000000, "owned_by": "lettabot" },
    { "id": "helper-bot", "object": "model", "created": 1740000000, "owned_by": "lettabot" }
  ]
}

If you omit model in a chat request, the first configured agent is used.
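
The same listing is available through the OpenAI SDKs. A short Python sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="YOUR_API_KEY")

# Each "model" is an agent name you can pass as model= in chat requests.
for model in client.models.list():
    print(model.id)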

Streaming

When stream: true, responses arrive as Server-Sent Events (SSE). The stream includes:

  • Content deltas -- incremental text from the assistant
  • Tool call deltas -- tool invocations with name and arguments
  • Finish chunk -- finish_reason: "stop"
  • [DONE] sentinel -- end of stream

Internal events (reasoning, tool results) are filtered and not included in the stream.
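
To observe tool calls as well as text, check both fields on each delta. A sketch building on the Python client from the Quick Start (the tool-call handling shown is illustrative):

stream = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "What's on my todo list?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.tool_calls:
        for call in delta.tool_calls:
            # Tool call deltas carry the tool name and (possibly partial) arguments.
            if call.function and call.function.name:
                print(f"\n[calling {call.function.name}]", flush=True)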

Supported Parameters

| Parameter   | Supported | Notes                                               |
|-------------|-----------|-----------------------------------------------------|
| model       | Yes       | Maps to agent name                                  |
| messages    | Yes       | Only the last user message is extracted (see below) |
| stream      | Yes       | true for SSE streaming, false/omitted for sync      |
| temperature | Ignored   | Accepted but has no effect                          |
| max_tokens  | Ignored   | Accepted but has no effect                          |
| tools       | Ignored   | Agent tools are configured server-side              |
| top_p       | Ignored   | Accepted but has no effect                          |
| All others  | Ignored   | Silently accepted for compatibility                 |
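
Requests written for other OpenAI backends therefore work unchanged; unsupported parameters are accepted and dropped. For example, using the Python client from the Quick Start:

# temperature, max_tokens, and top_p are accepted for compatibility but ignored.
response = client.chat.completions.create(
    model="lettabot",
    messages=[{"role": "user", "content": "Summarize my day"}],
    temperature=0.2,  # no effect
    max_tokens=256,   # no effect
)
print(response.choices[0].message.content)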

How Messages Are Handled

The OpenAI API lets you send a full conversation in the messages array. LettaBot handles this differently:

  • Only the last user message is extracted and sent to the agent
  • Multi-turn context is managed by Letta's built-in memory and conversation history, not by the messages array
  • System messages, assistant messages, and tool messages in the array are ignored

This means you don't need to manage conversation history client-side -- the agent remembers everything on its own.
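
A multi-turn client loop therefore just sends the newest message each time, with no running transcript. A sketch with the Python client from the Quick Start:

# Each turn sends only the new user message; the agent's own memory
# supplies the conversational context.
for question in ["Add 'buy milk' to my todo list", "What's on it now?"]:
    reply = client.chat.completions.create(
        model="lettabot",
        messages=[{"role": "user", "content": question}],
    )
    print(reply.choices[0].message.content)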

Limitations

  • usage is always null -- token counts are not tracked
  • No multi-turn passthrough -- only the last user message is used (see above)
  • Tool definitions ignored -- tools are configured on the agent, not per-request
  • Reasoning events filtered -- the agent's internal reasoning is not exposed in the stream

Use with Open WebUI

Since the endpoint is OpenAI-compatible, you can connect it to Open WebUI or any other OpenAI-compatible frontend. Point the frontend at http://localhost:8080/v1 with your API key.