feat: add /agents/{agent_id}/generate endpoint for direct LLM requests (#9272)

* feat: add /agents/{agent_id}/generate endpoint for direct LLM requests

Add new endpoint that makes direct LLM provider requests without agent
context, memory, tools, or state modification. This enables:
- Quick LLM queries without agent overhead
- Testing model configurations
- Simple chat completions using agent's credentials
- Comparing responses across different models

Features:
- Uses agent's LLM config by default
- Supports model override with full provider config resolution
- Non-streaming, stateless operation
- Proper error handling and validation
- Request/response schemas with Pydantic validation

Implementation:
- Add GenerateRequest and GenerateResponse schemas
- Implement generate_completion endpoint handler (request flow sketched below)
- Add necessary imports (LLMError, LLMClient, HandleNotFoundError)
- Include logging and comprehensive error handling
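
In outline, the new handler's request flow is roughly the following (a condensed, illustrative sketch, not the exact handler; direct_generate is a hypothetical name, but every LLMClient call it makes appears verbatim in the diff below):

from typing import Optional

from letta.llm_api.llm_client import LLMClient


async def direct_generate(server, agent, actor, messages, override_model: Optional[str] = None):
    # Resolve the LLM config: the agent's own config, or the full config behind an override handle
    llm_config = agent.llm_config
    if override_model:
        llm_config = await server.get_llm_config_from_handle_async(actor=actor, handle=override_model)

    # One-shot, stateless request: no tools, no history, nothing persisted
    llm_client = LLMClient.create(provider_type=llm_config.model_endpoint_type, actor=actor)
    request_data = llm_client.build_request_data(
        agent_type=agent.agent_type,
        messages=messages,
        llm_config=llm_config,
        tools=None,
        force_tool_call=None,
    )
    response_data = await llm_client.request_async(request_data, llm_config)
    chat_completion = await llm_client.convert_response_to_chat_completion(response_data, messages, llm_config)
    usage = llm_client.extract_usage_statistics(response_data, llm_config)
    return chat_completion, usage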

* fix: improve error handling and fix Message construction

- Fix critical bug: use content=[TextContent(text=...)] instead of text=...
- Add explicit error handling for NoResultFound and HandleNotFoundError
- Add error handling for convert_response_to_chat_completion
- Add structured logging for debugging
- Remove unnecessary .get() calls since Pydantic validates messages

* refactor: extract generate logic to AgentCompletionService

Move the generate endpoint business logic out of the endpoint handler
into a dedicated AgentCompletionService class for better code organization
and separation of concerns.

Changes:
- Create new AgentCompletionService in services/agent_completion_service.py
- Service handles all business logic: agent validation, LLM config resolution,
  message conversion, LLM client creation, and request/response processing
- Integrate service with SyncServer initialization
- Refactor generate_completion endpoint to use the service
- Endpoint now only handles HTTP concerns (auth, error mapping)

Benefits:
- Cleaner endpoint code (reduced from ~140 lines to ~25 lines)
- Better separation of concerns (HTTP vs business logic)
- Service logic can be reused or tested independently
- Follows established patterns in the codebase (AgentManager, etc.)

* feat: simplify generate API to accept just prompt text

Simplify the client interface by accepting a simple prompt string instead
of requiring clients to format messages.

Changes:
- Update GenerateRequest schema:
  - Replace 'messages' array with simple 'prompt' string
  - Add optional 'system_prompt' for context/instructions
  - Keep 'override_model' for model selection
- Update AgentCompletionService to format messages automatically:
  - Accepts prompt and optional system_prompt
  - Constructs message array internally (system + user messages)
  - Simpler API surface for clients
- Update endpoint documentation with new simplified examples
- Regenerate OpenAPI spec and TypeScript SDK

Benefits:
- Much simpler client experience - just send text
- No need to understand message formatting
- Still supports system prompts for context
- Cleaner API that matches common use cases

Example (before):
{
  "messages": [{"role": "user", "content": "What is 2+2?"}]
}

Example (after):
{
  "prompt": "What is 2+2?"
}
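
End to end, a call to the simplified endpoint might look like this (illustrative sketch only: the base URL, agent ID, and response values are placeholders, not part of this change):

import httpx

# Placeholder server URL and agent ID, for illustration only
BASE_URL = "http://localhost:8283"
AGENT_ID = "agent-123e4567-e89b-42d3-8456-426614174000"

response = httpx.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/generate",
    json={
        "prompt": "What is 2+2?",
        "system_prompt": "You are a concise math tutor.",  # optional
        # "override_model": "openai/gpt-4o-mini",          # optional model override
    },
    timeout=30.0,
)
response.raise_for_status()
data = response.json()

# data follows the GenerateResponse shape; values here are made up:
# {"content": "4", "model": "gpt-4o-mini", "usage": {"prompt_tokens": 25, "completion_tokens": 3, "total_tokens": 28}}
print(data["content"])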

* test: add comprehensive integration tests for generate endpoint

Add 9 integration tests covering various scenarios:

Happy path tests:
- test_agent_generate_basic: Basic prompt -> response flow
- test_agent_generate_with_system_prompt: System prompt + user prompt
- test_agent_generate_with_model_override: Override model selection
- test_agent_generate_long_prompt: Handle longer prompts
- test_agent_generate_no_persistence: Verify no messages saved to agent

Error handling tests:
- test_agent_generate_empty_prompt_error: Empty prompt validation (422)
- test_agent_generate_invalid_agent_id: Invalid agent ID (404)
- test_agent_generate_invalid_model_override: Invalid model handle (404)

All tests verify:
- Response structure (content, model, usage)
- Proper status codes for errors
- Usage statistics (tokens, counts)
- No side effects on agent state

Tests follow existing test patterns in test_client.py and use the
letta_client SDK (assuming generate_completion method is auto-generated
from the OpenAPI spec).

* openapi

* refactor: rename AgentCompletionService to AgentGenerateCompletionManager

Rename for better clarity and consistency with codebase naming conventions:
- Rename file: agent_completion_service.py → agent_generate_completion_manager.py
- Rename class: AgentCompletionService → AgentGenerateCompletionManager
- Rename attribute: server.agent_completion_service → server.agent_generate_completion_manager
- Update docstrings: 'Service' → 'Manager'

Changes:
- apps/core/letta/services/agent_generate_completion_manager.py (renamed + updated class)
- apps/core/letta/server/server.py (import + initialization)
- apps/core/letta/server/rest_api/routers/v1/agents.py (usage in endpoint)

No functional changes, purely a naming refactor.

* fix: remove invalid Message parameters in generate manager

Remove agent_id=None and user_id=None from Message construction.
The Message model doesn't accept these as None values - only pass
required parameters (role, content).

Fixes validation error:
  'Extra inputs are not permitted [type=extra_forbidden, input_value=None]'

This aligns with other Message construction patterns in the codebase
(see tools.py, memory.py examples).
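
Taken together with the earlier TextContent fix, the resulting construction pattern looks roughly like this (illustrative sketch; only the parameters this commit relies on are shown):

from letta.schemas.enums import MessageRole
from letta.schemas.letta_message_content import TextContent
from letta.schemas.message import Message

# Earlier, broken attempts passed text=... and agent_id=None / user_id=None,
# which Pydantic rejects as extra inputs.
# Fixed form: only role and content, with content as a list of TextContent parts.
user_message = Message(
    role=MessageRole.user,
    content=[TextContent(text="What is 2+2?")],
)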

* feat: improve generate endpoint validation and tests

- Add field validator for whitespace-only prompts
- Always include system message (required by Anthropic)
- Use default "You are a helpful assistant." when no system_prompt provided
- Update tests to use direct HTTP calls via httpx
- Fix test issues:
  - Use valid agent ID format (agent-{uuid})
  - Use available model (openai/gpt-4o-mini)
  - Add whitespace validation test
- All 9 integration tests passing

jnjpng authored 2026-02-04 21:33:49 -08:00
committed by Caren Thomas
parent 597c7f3e3e
commit ff69c6a32e
5 changed files with 606 additions and 1 deletion


@@ -8003,6 +8003,62 @@
}
}
},
"/v1/agents/{agent_id}/generate": {
"post": {
"tags": ["agents"],
"summary": "Generate Completion",
"description": "Generate a completion directly from the LLM provider using the agent's configuration.\n\nThis endpoint makes a direct request to the LLM provider without any agent processing:\n- No memory or context retrieval\n- No tool calling\n- No message persistence\n- No agent state modification\n\nSimply provide a prompt, and the endpoint formats it as a user message.\nOptionally include a system_prompt for context/instructions.\n\nThe agent's LLM configuration (model, credentials, settings) is used by default.\nUse override_model to switch to a different model/provider while still using\nthe organization's configured providers.\n\nExample use cases:\n- Quick LLM queries without agent overhead\n- Testing different models with the same prompt\n- Simple chat completions using agent's credentials\n- Comparing model outputs on identical prompts",
"operationId": "generate_completion",
"parameters": [
{
"name": "agent_id",
"in": "path",
"required": true,
"schema": {
"type": "string",
"minLength": 42,
"maxLength": 42,
"pattern": "^agent-[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
"description": "The ID of the agent in the format 'agent-<uuid4>'",
"examples": ["agent-123e4567-e89b-42d3-8456-426614174000"],
"title": "Agent Id"
},
"description": "The ID of the agent in the format 'agent-<uuid4>'"
}
],
"requestBody": {
"required": true,
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/GenerateRequest"
}
}
}
},
"responses": {
"200": {
"description": "Successful generation",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/GenerateResponse"
}
}
}
},
"404": {
"description": "Agent not found"
},
"422": {
"description": "Invalid request parameters"
},
"502": {
"description": "LLM provider error"
}
}
}
},
"/v1/agents/messages/search": {
"post": {
"tags": ["agents"],
@@ -33584,6 +33640,66 @@
"type": "object",
"title": "GeminiThinkingConfig"
},
"GenerateRequest": {
"properties": {
"prompt": {
"type": "string",
"minLength": 1,
"title": "Prompt",
"description": "The prompt/message to send to the LLM"
},
"system_prompt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "System Prompt",
"description": "Optional system prompt to prepend to the conversation"
},
"override_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Override Model",
"description": "Model handle to use instead of agent's default (e.g., 'openai/gpt-4', 'anthropic/claude-3-5-sonnet')"
}
},
"type": "object",
"required": ["prompt"],
"title": "GenerateRequest",
"description": "Request for direct LLM generation without agent processing."
},
"GenerateResponse": {
"properties": {
"content": {
"type": "string",
"title": "Content",
"description": "The LLM's response text"
},
"model": {
"type": "string",
"title": "Model",
"description": "The model that generated this response"
},
"usage": {
"$ref": "#/components/schemas/LettaUsageStatistics",
"description": "Token usage statistics"
}
},
"type": "object",
"required": ["content", "model", "usage"],
"title": "GenerateResponse",
"description": "Response from direct LLM generation."
},
"GenerateToolInput": {
"properties": {
"tool_name": {


@@ -8,7 +8,7 @@ from fastapi import APIRouter, Body, Depends, File, Form, Header, HTTPException,
from fastapi.responses import JSONResponse
from marshmallow import ValidationError
from orjson import orjson
from pydantic import BaseModel, ConfigDict, Field
from pydantic import BaseModel, ConfigDict, Field, field_validator
from sqlalchemy.exc import IntegrityError, OperationalError
from starlette.responses import Response, StreamingResponse
@@ -24,11 +24,14 @@ from letta.errors import (
    AgentExportProcessingError,
    AgentFileImportError,
    AgentNotFoundForExportError,
    HandleNotFoundError,
    LLMError,
    NoActiveRunsToCancelError,
    PendingApprovalError,
)
from letta.groups.sleeptime_multi_agent_v4 import SleeptimeMultiAgentV4
from letta.helpers.datetime_helpers import get_utc_time, get_utc_timestamp_ns
from letta.llm_api.llm_client import LLMClient
from letta.log import get_logger
from letta.orm.errors import NoResultFound
from letta.otel.context import get_ctx_attributes
@@ -59,6 +62,7 @@ from letta.schemas.run import Run as PydanticRun, RunUpdate
from letta.schemas.source import BaseSource, Source
from letta.schemas.tool import BaseTool, Tool
from letta.schemas.tool_execution_result import ToolExecutionResult
from letta.schemas.usage import LettaUsageStatistics
from letta.schemas.user import User
from letta.serialize_schemas.pydantic_agent_schema import AgentSchema
from letta.server.rest_api.dependencies import HeaderParams, get_headers, get_letta_server
@@ -79,6 +83,43 @@ router = APIRouter(prefix="/agents", tags=["agents"])
logger = get_logger(__name__)


# Schemas for direct LLM generation endpoint
class GenerateRequest(BaseModel):
    """Request for direct LLM generation without agent processing."""

    prompt: str = Field(
        ...,
        description="The prompt/message to send to the LLM",
        min_length=1,
    )
    system_prompt: Optional[str] = Field(
        None,
        description="Optional system prompt to prepend to the conversation",
    )
    override_model: Optional[str] = Field(
        None,
        description="Model handle to use instead of agent's default (e.g., 'openai/gpt-4', 'anthropic/claude-3-5-sonnet')",
    )

    @field_validator("prompt")
    @classmethod
    def validate_prompt_not_empty(cls, v: str) -> str:
        """Ensure prompt is not empty or whitespace-only."""
        if not v or not v.strip():
            raise ValueError("prompt cannot be empty or whitespace-only")
        return v


class GenerateResponse(BaseModel):
    """Response from direct LLM generation."""

    content: str = Field(..., description="The LLM's response text")
    model: str = Field(..., description="The model that generated this response")
    usage: LettaUsageStatistics = Field(..., description="Token usage statistics")


@router.get("/", response_model=list[AgentState], operation_id="list_agents")
async def list_agents(
    name: str | None = Query(None, description="Name of the agent"),
@@ -1785,6 +1826,75 @@ async def cancel_message(
    return results


@router.post(
    "/{agent_id}/generate",
    response_model=GenerateResponse,
    operation_id="generate_completion",
    responses={
        200: {"description": "Successful generation"},
        404: {"description": "Agent not found"},
        422: {"description": "Invalid request parameters"},
        502: {"description": "LLM provider error"},
    },
)
async def generate_completion(
    agent_id: AgentId,
    server: SyncServer = Depends(get_letta_server),
    request: GenerateRequest = Body(...),
    headers: HeaderParams = Depends(get_headers),
) -> GenerateResponse:
    """
    Generate a completion directly from the LLM provider using the agent's configuration.

    This endpoint makes a direct request to the LLM provider without any agent processing:
    - No memory or context retrieval
    - No tool calling
    - No message persistence
    - No agent state modification

    Simply provide a prompt, and the endpoint formats it as a user message.
    Optionally include a system_prompt for context/instructions.

    The agent's LLM configuration (model, credentials, settings) is used by default.
    Use override_model to switch to a different model/provider while still using
    the organization's configured providers.

    Example use cases:
    - Quick LLM queries without agent overhead
    - Testing different models with the same prompt
    - Simple chat completions using agent's credentials
    - Comparing model outputs on identical prompts
    """
    # Get actor for permissions
    actor = await server.user_manager.get_actor_or_default_async(actor_id=headers.actor_id)

    # Call the manager to generate the completion
    try:
        service_response = await server.agent_generate_completion_manager.generate_completion_with_agent_config_async(
            agent_id=str(agent_id),
            prompt=request.prompt,
            system_prompt=request.system_prompt,
            actor=actor,
            override_model=request.override_model,
        )
    except NoResultFound:
        raise HTTPException(status_code=404, detail=f"Agent with ID {agent_id} not found")
    except HandleNotFoundError:
        raise HTTPException(status_code=404, detail=f"Model '{request.override_model}' not found or not accessible")
    except LLMError as e:
        raise HTTPException(status_code=502, detail=f"LLM provider error: {str(e)}")
    except Exception as e:
        logger.error(f"Failed to process LLM response: {str(e)}")
        raise HTTPException(status_code=502, detail=f"Failed to process LLM response: {str(e)}")

    # Convert service response to API response model
    return GenerateResponse(
        content=service_response.content,
        model=service_response.model,
        usage=service_response.usage,
    )


@router.post("/messages/search", response_model=List[MessageSearchResult], operation_id="search_messages")
async def search_messages(
    request: MessageSearchRequest = Body(...),


@@ -191,6 +191,11 @@ class SyncServer(object):
        self.file_agent_manager = FileAgentManager()
        self.file_manager = FileManager()

        # Import and initialize the agent generate completion manager
        from letta.services.agent_generate_completion_manager import AgentGenerateCompletionManager

        self.agent_generate_completion_manager = AgentGenerateCompletionManager(server=self)

        self.agent_serialization_manager = AgentSerializationManager(
            agent_manager=self.agent_manager,
            tool_manager=self.tool_manager,


@@ -0,0 +1,168 @@
"""Manager for handling direct LLM completions using agent configuration."""
from typing import TYPE_CHECKING, Optional
from letta.errors import HandleNotFoundError, LLMError
from letta.llm_api.llm_client import LLMClient
from letta.log import get_logger
from letta.orm.errors import NoResultFound
from letta.schemas.enums import MessageRole
from letta.schemas.letta_message_content import TextContent
from letta.schemas.message import Message
from letta.schemas.usage import LettaUsageStatistics
if TYPE_CHECKING:
from letta.orm import User
from letta.schemas.llm_config import LLMConfig
from letta.server.server import SyncServer
logger = get_logger(__name__)
class GenerateResponse:
"""Response from direct LLM generation."""
def __init__(self, content: str, model: str, usage: LettaUsageStatistics):
self.content = content
self.model = model
self.usage = usage
class AgentGenerateCompletionManager:
"""Manager for handling direct LLM completions using agent configuration."""
def __init__(self, server: "SyncServer"):
"""
Initialize the agent generate completion manager.
Args:
server: The SyncServer instance for accessing managers
"""
self.server = server
self.agent_manager = server.agent_manager
self.provider_manager = server.provider_manager
async def generate_completion_with_agent_config_async(
self,
agent_id: str,
prompt: str,
actor: "User",
system_prompt: Optional[str] = None,
override_model: Optional[str] = None,
) -> GenerateResponse:
"""
Generate a completion directly from the LLM provider using the agent's configuration.
This method makes a direct request to the LLM provider without any agent processing:
- No memory or context retrieval
- No tool calling
- No message persistence
- No agent state modification
Args:
agent_id: The agent ID whose configuration to use
prompt: The prompt/message to send to the LLM
actor: The user making the request
system_prompt: Optional system prompt to prepend to the conversation
override_model: Optional model handle to override the agent's default
(e.g., 'openai/gpt-4', 'anthropic/claude-3-5-sonnet')
Returns:
GenerateResponse with content, model, and usage statistics
Raises:
NoResultFound: If agent not found
HandleNotFoundError: If override_model is invalid
LLMError: If LLM provider error occurs
"""
# 1. Validate agent exists and user has access
agent = await self.agent_manager.get_agent_by_id_async(
agent_id,
actor,
include_relationships=[],
)
# 2. Get LLM config (with optional override)
llm_config: "LLMConfig" = agent.llm_config
if override_model:
# Get full LLM config for the override model
# This ensures we get the right provider, endpoint, credentials, etc.
llm_config = await self.server.get_llm_config_from_handle_async(
actor=actor,
handle=override_model,
)
logger.info(
f"Generating completion for agent {agent_id}",
extra={
"agent_id": str(agent_id),
"override_model": override_model,
"prompt_length": len(prompt),
"has_system_prompt": system_prompt is not None,
"model": llm_config.model,
},
)
# 3. Build messages from prompt and optional system_prompt
letta_messages = []
# Always add a system message (required by some providers like Anthropic)
# Use provided system_prompt or minimal default (empty strings not allowed with cache_control)
letta_messages.append(
Message(
role=MessageRole.system,
content=[TextContent(text=system_prompt if system_prompt else "You are a helpful assistant.")],
)
)
# Add user prompt
letta_messages.append(
Message(
role=MessageRole.user,
content=[TextContent(text=prompt)],
)
)
# 4. Create LLM client for the provider
llm_client = LLMClient.create(
provider_type=llm_config.model_endpoint_type,
actor=actor,
)
if llm_client is None:
raise LLMError(f"Unsupported provider type: {llm_config.model_endpoint_type}")
# 5. Build request data (no tools, no function calling)
request_data = llm_client.build_request_data(
agent_type=agent.agent_type,
messages=letta_messages,
llm_config=llm_config,
tools=None, # No tools for direct generation
force_tool_call=None,
)
# 6. Make direct LLM request
response_data = await llm_client.request_async(request_data, llm_config)
# 7. Convert to standard chat completion format
chat_completion = await llm_client.convert_response_to_chat_completion(
response_data,
letta_messages,
llm_config,
)
# 8. Extract response content
content = ""
if chat_completion.choices and len(chat_completion.choices) > 0:
message = chat_completion.choices[0].message
content = message.content or ""
# 9. Extract usage statistics
usage = llm_client.extract_usage_statistics(response_data, llm_config)
# 10. Build and return response
return GenerateResponse(
content=content,
model=llm_config.model,
usage=usage,
)


@@ -4,6 +4,7 @@ import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
import httpx
import pytest
from dotenv import load_dotenv
from letta_client import APIError, Letta
@@ -895,3 +896,208 @@ def test_attach_sleeptime_block(client: Letta):
    # cleanup
    client.agents.delete(agent.id)


# --------------------------------------------------------------------------------------------------------------------
# Agent Generate Endpoint Tests
# --------------------------------------------------------------------------------------------------------------------


def test_agent_generate_basic(client: Letta, agent: AgentState):
    """Test basic generate endpoint with simple prompt."""
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={"prompt": "What is 2+2?"},
        timeout=30.0,
    )

    # Verify successful response
    assert response.status_code == 200, f"Expected 200, got {response.status_code}: {response.text}"
    response_data = response.json()

    # Verify response structure
    assert response_data is not None
    assert "content" in response_data
    assert "model" in response_data
    assert "usage" in response_data

    # Verify content is returned
    assert response_data["content"] is not None
    assert len(response_data["content"]) > 0
    assert isinstance(response_data["content"], str)

    # Verify model is set
    assert response_data["model"] is not None
    assert isinstance(response_data["model"], str)

    # Verify usage statistics
    assert response_data["usage"] is not None
    assert response_data["usage"]["total_tokens"] > 0
    assert response_data["usage"]["prompt_tokens"] > 0
    assert response_data["usage"]["completion_tokens"] > 0


def test_agent_generate_with_system_prompt(client: Letta, agent: AgentState):
    """Test generate endpoint with system prompt."""
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={
            "prompt": "What is your role?",
            "system_prompt": "You are a helpful math tutor who always responds with exactly 5 words.",
        },
        timeout=30.0,
    )

    # Verify successful response
    assert response.status_code == 200, f"Expected 200, got {response.status_code}: {response.text}"
    response_data = response.json()

    # Verify response
    assert response_data is not None
    assert response_data["content"] is not None
    assert len(response_data["content"]) > 0

    # Verify usage includes system prompt tokens
    assert response_data["usage"]["prompt_tokens"] > 10  # Should include system prompt tokens


def test_agent_generate_with_model_override(client: Letta, agent: AgentState):
    """Test generate endpoint with model override."""
    # Get the agent's current model
    original_model = agent.llm_config.model

    # Use OpenAI model (more likely to be available in test environment)
    override_model_handle = "openai/gpt-4o-mini"

    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={
            "prompt": "Say hello",
            "override_model": override_model_handle,
        },
        timeout=30.0,
    )

    # Verify successful response
    assert response.status_code == 200, f"Expected 200, got {response.status_code}: {response.text}"
    response_data = response.json()

    # Verify response
    assert response_data is not None
    assert response_data["content"] is not None

    # Verify the override model was used (model name should be different from original)
    # Note: The actual model name in response might be the full model name, not the handle
    assert response_data["model"] is not None


def test_agent_generate_empty_prompt_error(client: Letta, agent: AgentState):
    """Test that empty prompt returns validation error."""
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={"prompt": ""},  # Empty prompt should fail validation
        timeout=30.0,
    )

    # Verify it's a validation error (422)
    assert response.status_code == 422, f"Expected 422, got {response.status_code}: {response.text}"


def test_agent_generate_whitespace_prompt_error(client: Letta, agent: AgentState):
    """Test that whitespace-only prompt returns validation error."""
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={"prompt": " \n\t "},  # Whitespace-only prompt should fail validation
        timeout=30.0,
    )

    # Verify it's a validation error (422)
    assert response.status_code == 422, f"Expected 422, got {response.status_code}: {response.text}"


def test_agent_generate_invalid_agent_id(client: Letta):
    """Test that invalid agent ID returns 404."""
    # Use properly formatted agent ID that doesn't exist
    fake_agent_id = "agent-00000000-0000-4000-8000-000000000000"

    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{fake_agent_id}/generate",
        json={"prompt": "Hello"},
        timeout=30.0,
    )

    # Verify it's a not found error (404)
    assert response.status_code == 404, f"Expected 404, got {response.status_code}: {response.text}"
    assert "not found" in response.text.lower()


def test_agent_generate_invalid_model_override(client: Letta, agent: AgentState):
    """Test that invalid model override returns 404."""
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={
            "prompt": "Hello",
            "override_model": "invalid/model-that-does-not-exist",
        },
        timeout=30.0,
    )

    # Verify it's a not found error (404)
    assert response.status_code == 404, f"Expected 404, got {response.status_code}: {response.text}"
    assert "not found" in response.text.lower() or "not accessible" in response.text.lower()


def test_agent_generate_long_prompt(client: Letta, agent: AgentState):
    """Test generate endpoint with a longer prompt."""
    # Create a longer prompt
    long_prompt = " ".join(["This is a test sentence."] * 50)

    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={"prompt": long_prompt},
        timeout=30.0,
    )

    # Verify successful response
    assert response.status_code == 200, f"Expected 200, got {response.status_code}: {response.text}"
    response_data = response.json()

    # Verify response
    assert response_data is not None
    assert response_data["content"] is not None

    # Verify token usage reflects the longer prompt
    assert response_data["usage"]["prompt_tokens"] > 100  # Should have substantial prompt tokens


def test_agent_generate_no_persistence(client: Letta, agent: AgentState):
    """Test that generate endpoint does not persist messages to agent."""
    # Get initial message count
    initial_messages = client.agents.messages.list(agent_id=agent.id).items
    initial_count = len(initial_messages)

    # Make a generate request
    response = httpx.post(
        f"{client._client._base_url}/v1/agents/{agent.id}/generate",
        json={"prompt": "This should not be saved to agent memory"},
        timeout=30.0,
    )

    # Verify successful response
    assert response.status_code == 200, f"Expected 200, got {response.status_code}: {response.text}"
    response_data = response.json()

    # Verify response was generated
    assert response_data is not None
    assert response_data["content"] is not None

    # Verify no new messages were added to the agent
    final_messages = client.agents.messages.list(agent_id=agent.id).items
    final_count = len(final_messages)
    assert final_count == initial_count, "Generate endpoint should not persist messages"