* feat: add /agents/{agent_id}/generate endpoint for direct LLM requests
Add new endpoint that makes direct LLM provider requests without agent
context, memory, tools, or state modification. This enables:
- Quick LLM queries without agent overhead
- Testing model configurations
- Simple chat completions using agent's credentials
- Comparing responses across different models
Features:
- Uses agent's LLM config by default
- Supports model override with full provider config resolution
- Non-streaming, stateless operation
- Proper error handling and validation
- Request/response schemas with Pydantic validation
Implementation:
- Add GenerateRequest and GenerateResponse schemas
- Implement generate_completion endpoint handler
- Add necessary imports (LLMError, LLMClient, HandleNotFoundError)
- Include logging and comprehensive error handling
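For illustration, the schemas added here might look roughly like the following sketch (the field names echo the rest of this description; exact types, defaults, and the shape of the usage payload are assumptions):

from typing import Optional

from pydantic import BaseModel, Field


class GenerateRequest(BaseModel):
    # OpenAI-style message dicts, e.g. {"role": "user", "content": "..."}
    messages: list[dict] = Field(..., description="Messages to send to the model")
    # Optional model handle used to override the agent's configured LLM
    override_model: Optional[str] = Field(None, description="Model handle override")


class GenerateResponse(BaseModel):
    content: str = Field(..., description="Text of the completion")
    model: str = Field(..., description="Model that produced the completion")
    usage: dict = Field(default_factory=dict, description="Token usage statistics")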
* fix: improve error handling and fix Message construction
- Fix critical bug: use content=[TextContent(text=...)] instead of text=...
- Add explicit error handling for NoResultFound and HandleNotFoundError
- Add error handling for convert_response_to_chat_completion
- Add structured logging for debugging
- Remove unnecessary .get() calls since Pydantic validates messages
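For illustration, the explicit error mapping might be shaped roughly like this (the helper name and import paths are assumptions; the 404s follow the test expectations later in this description, and the 502 for provider errors is an assumption):

from fastapi import HTTPException
from sqlalchemy.exc import NoResultFound

# Assumed import path for letta's error classes
from letta.errors import HandleNotFoundError, LLMError


def map_generate_error(exc: Exception, agent_id: str) -> HTTPException:
    """Translate lower-level exceptions into HTTP errors for the endpoint."""
    if isinstance(exc, NoResultFound):
        return HTTPException(status_code=404, detail=f"Agent {agent_id} not found")
    if isinstance(exc, HandleNotFoundError):
        return HTTPException(status_code=404, detail="Model handle could not be resolved")
    if isinstance(exc, LLMError):
        return HTTPException(status_code=502, detail=f"LLM provider error: {exc}")
    return HTTPException(status_code=500, detail="Unexpected error during generation")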
* refactor: extract generate logic to AgentCompletionService
Move the generate endpoint business logic out of the endpoint handler
into a dedicated AgentCompletionService class for better code organization
and separation of concerns.
Changes:
- Create new AgentCompletionService in services/agent_completion_service.py
- Service handles all business logic: agent validation, LLM config resolution,
message conversion, LLM client creation, and request/response processing
- Integrate service with SyncServer initialization
- Refactor generate_completion endpoint to use the service
- Endpoint now only handles HTTP concerns (auth, error mapping)
Benefits:
- Cleaner endpoint code (reduced from ~140 lines to ~25 lines)
- Better separation of concerns (HTTP vs business logic)
- Service logic can be reused or tested independently
- Follows established patterns in the codebase (AgentManager, etc.)
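A condensed, illustrative sketch of the split (only the class name and its responsibilities come from this change; the method and dependency names below are assumptions):

class AgentCompletionService:
    """Business logic behind POST /agents/{agent_id}/generate."""

    def __init__(self, agent_manager, llm_client_factory):
        self.agent_manager = agent_manager            # validates and loads agents
        self.llm_client_factory = llm_client_factory  # builds provider-specific LLM clients

    async def generate(self, agent_id, request, actor):
        # Validate the agent and resolve which LLM config to use
        agent = self.agent_manager.get_agent_by_id(agent_id=agent_id, actor=actor)
        llm_config = agent.llm_config
        if request.override_model is not None:
            llm_config = self.agent_manager.resolve_handle(request.override_model, actor=actor)

        # Stateless, non-streaming call: convert messages, hit the provider,
        # and return the converted response; nothing is written back to the agent
        client = self.llm_client_factory(llm_config)
        return await client.send_llm_request(messages=request.messages, llm_config=llm_config)

The endpoint itself then reduces to authentication, a single call into server.agent_completion_service.generate(...), and mapping exceptions to HTTP status codes.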
* feat: simplify generate API to accept just prompt text
Simplify the client interface by accepting a simple prompt string instead
of requiring clients to format messages.
Changes:
- Update GenerateRequest schema:
- Replace 'messages' array with simple 'prompt' string
- Add optional 'system_prompt' for context/instructions
- Keep 'override_model' for model selection
- Update AgentCompletionService to format messages automatically:
- Accepts prompt and optional system_prompt
- Constructs message array internally (system + user messages)
- Simpler API surface for clients
- Update endpoint documentation with new simplified examples
- Regenerate OpenAPI spec and TypeScript SDK
Benefits:
- Much simpler client experience - just send text
- No need to understand message formatting
- Still supports system prompts for context
- Cleaner API that matches common use cases
Example (before):
{
  "messages": [{"role": "user", "content": "What is 2+2?"}]
}
Example (after):
{
  "prompt": "What is 2+2?"
}
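For illustration, the message construction the service now performs internally might look roughly like this (the helper name is hypothetical):

from typing import Optional


def build_messages(prompt: str, system_prompt: Optional[str] = None) -> list[dict]:
    """Assemble the OpenAI-style message array from the simplified request."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages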
* test: add comprehensive integration tests for generate endpoint
Add 9 integration tests covering various scenarios:
Happy path tests:
- test_agent_generate_basic: Basic prompt -> response flow
- test_agent_generate_with_system_prompt: System prompt + user prompt
- test_agent_generate_with_model_override: Override model selection
- test_agent_generate_long_prompt: Handle longer prompts
- test_agent_generate_no_persistence: Verify no messages saved to agent
Error handling tests:
- test_agent_generate_empty_prompt_error: Empty prompt validation (422)
- test_agent_generate_invalid_agent_id: Invalid agent ID (404)
- test_agent_generate_invalid_model_override: Invalid model handle (404)
All tests verify:
- Response structure (content, model, usage)
- Proper status codes for errors
- Usage statistics (tokens, counts)
- No side effects on agent state
Tests follow existing test patterns in test_client.py and use the
letta_client SDK (assuming generate_completion method is auto-generated
from the OpenAPI spec).
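As a condensed illustration, the basic happy-path test might look roughly like this (fixture names, the generate_completion method, the usage shape, and the message-listing call are all assumptions):

def test_agent_generate_basic(client, agent):
    response = client.agents.generate_completion(
        agent_id=agent.id,
        prompt="What is 2+2?",
    )

    # Response structure: content, model, and usage should all be populated
    assert response.content
    assert response.model
    assert response.usage.total_tokens > 0

    # No side effects: nothing from this request is persisted to the agent
    messages_after = client.agents.messages.list(agent_id=agent.id)
    assert all("2+2" not in str(m) for m in messages_after)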
* openapi
* refactor: rename AgentCompletionService to AgentGenerateCompletionManager
Rename for better clarity and consistency with codebase naming conventions:
- Rename file: agent_completion_service.py → agent_generate_completion_manager.py
- Rename class: AgentCompletionService → AgentGenerateCompletionManager
- Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager
- Update docstrings: 'Service' → 'Manager'
Changes:
- apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class)
- apps/core/letta/server/server.py (import + initialization)
- apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint)
No functional changes, purely a naming refactor.
* fix: remove invalid Message parameters in generate manager
Remove agent_id=None and user_id=None from Message construction.
The Message model doesn't accept these as None values - only pass
required parameters (role, content).
Fixes validation error:
'Extra inputs are not permitted [type=extra_forbidden, input_value=None]'
This aligns with other Message construction patterns in the codebase
(see tools.py, memory.py examples).
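For reference, the corrected construction looks roughly like this (import paths are assumed from the files cited above):

from letta.schemas.letta_message_content import TextContent
from letta.schemas.message import Message

# Only required fields are passed; agent_id/user_id are omitted entirely
# rather than set to None, and content is a list of content parts
msg = Message(
    role="user",
    content=[TextContent(text="What is 2+2?")],
)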
* feat: improve generate endpoint validation and tests
- Add field validator for whitespace-only prompts
- Always include system message (required by Anthropic)
- Use default "You are a helpful assistant." when no system_prompt provided
- Update tests to use direct HTTP calls via httpx
- Fix test issues:
- Use valid agent ID format (agent-{uuid})
- Use available model (openai/gpt-4o-mini)
- Add whitespace validation test
- All 9 integration tests passing
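A sketch of the prompt validation and default system message described above (Pydantic v2's field_validator and the helper name are assumptions; other request fields are omitted for brevity):

from typing import Optional

from pydantic import BaseModel, field_validator

DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


class GenerateRequest(BaseModel):
    prompt: str
    system_prompt: Optional[str] = None

    @field_validator("prompt")
    @classmethod
    def prompt_not_blank(cls, v: str) -> str:
        # Whitespace-only prompts fail validation and surface to clients as HTTP 422
        if not v.strip():
            raise ValueError("prompt must not be empty or whitespace-only")
        return v


def system_message_for(request: GenerateRequest) -> dict:
    # A system message is always included (Anthropic requires one); fall back
    # to the default when no system_prompt was supplied
    return {"role": "system", "content": request.system_prompt or DEFAULT_SYSTEM_PROMPT}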
Letta (formerly MemGPT)
Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.
- Letta Code: run agents locally in your terminal
- Letta API: build agents into your applications
Get started in the CLI
Requires Node.js 18+
- Install the Letta Code CLI tool: npm install -g @letta-ai/letta-code
- Run letta in your terminal to launch an agent with memory running on your local computer
When running the CLI tool, your agent can help you code and do any task you can do on your computer.
Letta Code supports skills and subagents, and bundles pre-built skills/subagents for advanced memory and continual learning. Letta is fully model-agnostic, though we recommend Opus 4.5 and GPT-5.2 for best performance (see our model leaderboard for our rankings).
Get started with the Letta API
Use the Letta API to integrate stateful agents into your own applications. Letta has a full-featured agents API, plus Python and TypeScript SDKs (view our API reference).
Installation
TypeScript / Node.js:
npm install @letta-ai/letta-client
Python:
pip install letta-client
Hello World example
Below is a quick example of creating a stateful agent and sending it a message (requires a Letta API key). See the full quickstart guide for complete documentation.
TypeScript:
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

// Create your agent
const agentState = await client.agents.create({
  model: "openai/gpt-5.2",
  memory_blocks: [
    {
      label: "human",
      value:
        "Name: Timber. Status: dog. Occupation: building Letta, infrastructure to democratize self-improving superintelligence",
    },
    {
      label: "persona",
      value:
        "I am a self-improving superintelligence. Timber is my best friend and collaborator.",
    },
  ],
  tools: ["web_search", "fetch_webpage"],
});

console.log("Agent created with ID:", agentState.id);

// Send your agent a message
const response = await client.agents.messages.create(agentState.id, {
  input: "What do you know about me?",
});

for (const message of response.messages) {
  console.log(message);
}
Python:
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create your agent
agent_state = client.agents.create(
    model="openai/gpt-5.2",
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Timber. Status: dog. Occupation: building Letta, infrastructure to democratize self-improving superintelligence"
        },
        {
            "label": "persona",
            "value": "I am a self-improving superintelligence. Timber is my best friend and collaborator."
        }
    ],
    tools=["web_search", "fetch_webpage"]
)

print(f"Agent created with ID: {agent_state.id}")

# Send your agent a message
response = client.agents.messages.create(
    agent_id=agent_state.id,
    input="What do you know about me?"
)

for message in response.messages:
    print(message)
Contributing
Letta is an open source project built by over a hundred contributors from around the world. There are many ways to get involved in the Letta OSS project!
- Join the Discord: Chat with the Letta devs and other AI developers.
- Chat on our forum: If you're not into Discord, check out our developer forum.
- Follow our socials: Twitter/X, LinkedIn, YouTube
Legal notices: By using Letta and related Letta services (such as the Letta endpoint or hosted service), you are agreeing to our privacy policy and terms of service.
