docs: center documentation around current Letta architecture (#5634)

* docs: restructure architecture documentation to sideline legacy agent types

This commit reorganizes the agent architecture documentation to address confusion around legacy agent types (memgpt_agent, memgpt_v2_agent) and clarify that users should not specify agent_type for new projects.

The documentation was causing confusion for both users and LLMs:
- References to memgpt_agent, memgpt_v2_agent, and letta_v1_agent were scattered throughout main docs
- The naming progression (memgpt → memgpt_v2 → letta_v1) is non-standard
- LLMs trained on these docs were recommending deprecated architectures
- Discord users were confused about which agent type to use
- send_message tool and heartbeat references were in mainline docs

New legacy pages:
- architectures_overview.mdx - Landing page explaining legacy types exist
- migration_guide.mdx - Step-by-step migration with code snippets
- naming_history.mdx - Hidden page explaining progression for LLMs
- memgpt_agents_legacy.mdx - Moved from main docs with deprecation warnings
- heartbeats_legacy.mdx - Moved from main docs with deprecation warnings

Navigation changes:
- Removed "Agent Architectures" subsection from main nav
- Moved "MemGPT Agents" to top-level (renamed "Agent Memory & Architecture")
- Removed "Heartbeats" page from main nav
- Added "Legacy & Migration" section with 5 sub-pages
- Added redirects for old URLs

Page updates:
- pages/agents/memgpt_agents.mdx - Completely rewritten to focus on current architecture without mentioning legacy agent types
- pages/agents/sleep_time_agents.mdx - Changed from agent_type to enableSleeptime
- pages/agents/base_tools.mdx - Added stronger deprecation warning for send_message
- pages/agents/overview.mdx - Updated assistant_message description
- pages/agents/tool_rules.mdx - Removed send_message default rule examples
- pages/agents/message_types.mdx - Removed heartbeat message type section
- pages/agents/json_mode.mdx - Removed send_message requirements
- pages/agents/archival_best_practices.mdx - Removed send_message tool rule example
- pages/agents/react_agents.mdx - Removed heartbeat mechanism reference
- pages/getting-started/prompts.mdx - Removed send_message note
- pages/ade-guide/simulator.mdx - Removed tip about removing send_message
- pages/advanced/custom_memory.mdx - Changed send_message to "respond to user"
- pages/deployment/railway.mdx - Removed legacy tools array from example
- pages/selfhosting/overview.mdx - Changed send_message example to memory_insert
- pages/agents/heartbeats.mdx - Moved to legacy section

Added to memory: aggressively remove send_message and heartbeat references from main docs. Keep legacy content only in /guides/legacy/ section. Don't add notes about legacy in main docs - just remove the references entirely.

* docs: remove evals tab from navigation

The evals content is not ready for public documentation yet.

* docs: move send_message to deprecated tools table with legacy link

- Removed Legacy Tools section
- Added send_message to Deprecated Tools table with link to legacy guide
- Removed undefined warning text

* docs: move ReAct agents to legacy section

- Moved pages/agents/react_agents.mdx to pages/legacy/react_agents_legacy.mdx
- Added deprecation warning at top
- Updated slug to guides/legacy/react_agents_legacy
- Added to Legacy & Migration navigation section
- Added redirect from old URL to new legacy location

ReAct agents are a legacy architecture that lacks long-term memory capabilities compared to the current Letta architecture.

* docs: move workflow and low-latency architectures to legacy

- Moved pages/agents/workflows.mdx to pages/legacy/workflows_legacy.mdx
- Moved pages/agents/low_latency_agents.mdx to pages/legacy/low_latency_agents_legacy.mdx
- Deleted pages/agents/architectures.mdx (overview page no longer needed)
- Removed 'Agent Memory & Architecture' from main Agents section
- Added workflows and low-latency to Legacy & Migration section
- Added redirects for old URLs

These agent architectures (workflow_agent, voice_convo_agent) are legacy. For new projects, users should use the current Letta architecture with tool rules or voice-optimized configurations instead.

* docs: remove orphaned stateful workflows page

- Deleted pages/agents/stateful_workflows.mdx
- Page was not linked in navigation or from other docs
- Feature (message_buffer_autoclear flag) is already documented in API reference
- Avoids confusion with legacy workflow architectures
Committed by Caren Thomas · parent 3d342d402b · commit fc531ca6de
```diff
@@ -50,8 +50,6 @@ tabs:
   leaderboard:
     display-name: Leaderboard
     skip-slug: true

 landing-page:
   page: home
   path: pages/index.mdx
@@ -99,25 +97,10 @@ navigation:
     contents:
       - page: Overview
        path: pages/agents/overview.mdx
-      - section: Agent Architectures
-        path: pages/agents/architectures.mdx
-        contents:
-          - page: MemGPT Agents
-            path: pages/agents/memgpt_agents.mdx
-          - page: Low-latency (voice) Agents
-            path: pages/agents/low_latency_agents.mdx
-          - page: ReAct Agents
-            path: pages/agents/react_agents.mdx
-          - page: Workflows
-            path: pages/agents/workflows.mdx
-          - page: Stateful Workflows
-            path: pages/agents/stateful_workflows.mdx
       - page: Context Hierarchy
        path: pages/agents/context_hierarchy.mdx
       - page: Message Types
        path: pages/agents/message_types.mdx
-      - page: Heartbeats
-        path: pages/agents/heartbeats.mdx

      - section: Memory
        path: pages/agents/memory.mdx
@@ -223,6 +206,25 @@ navigation:
       - page: MemGPT Paper Deep Dive
        path: pages/concepts/memgpt.mdx

+      - section: Legacy & Migration
+        contents:
+          - page: Legacy Architecture Overview
+            path: pages/legacy/architectures_overview.mdx
+          - page: Migration Guide
+            path: pages/legacy/migration_guide.mdx
+          - page: MemGPT Agents (Legacy)
+            path: pages/legacy/memgpt_agents_legacy.mdx
+          - page: Workflows (Legacy)
+            path: pages/legacy/workflows_legacy.mdx
+          - page: Low-latency Agents (Legacy)
+            path: pages/legacy/low_latency_agents_legacy.mdx
+          - page: ReAct Agents (Legacy)
+            path: pages/legacy/react_agents_legacy.mdx
+          - page: Heartbeats (Legacy)
+            path: pages/legacy/heartbeats_legacy.mdx
+          - page: Architecture Naming History
+            path: pages/legacy/naming_history.mdx
+
      #- section: Tool Execution
      #  contents:
      #    - page: Overview
@@ -721,7 +723,6 @@ navigation:
       - page: Contributing Results
        path: pages/leaderboard/contributing.mdx

-
   # - tab: cookbooks
   #   layout:
   #     - section: Cookbooks
@@ -899,3 +900,15 @@ redirects:
     destination: "/api-reference/agents/messages/cancel"
   - source: "/api-reference/messages/cancel-batch-run"
     destination: "/api-reference/batches/cancel"
+  - source: "/guides/agents/architectures"
+    destination: "/guides/legacy/architectures_overview"
+  - source: "/guides/agents/architectures/memgpt"
+    destination: "/guides/legacy/memgpt_agents_legacy"
+  - source: "/guides/agents/heartbeats"
+    destination: "/guides/legacy/heartbeats_legacy"
+  - source: "/guides/agents/architectures/react"
+    destination: "/guides/legacy/react_agents_legacy"
+  - source: "/guides/agents/architectures/workflows"
+    destination: "/guides/legacy/workflows_legacy"
+  - source: "/guides/agents/architectures/low-latency"
+    destination: "/guides/legacy/low_latency_agents_legacy"
```
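The new redirects are a plain source-to-destination mapping. A small sketch of how such a lookup behaves (the `resolve` helper is hypothetical, not Fern's actual redirect engine; the URL pairs come from the diff above):

```python
# Redirect map copied from the redirects added in this commit.
REDIRECTS = {
    "/guides/agents/architectures": "/guides/legacy/architectures_overview",
    "/guides/agents/architectures/memgpt": "/guides/legacy/memgpt_agents_legacy",
    "/guides/agents/heartbeats": "/guides/legacy/heartbeats_legacy",
    "/guides/agents/architectures/react": "/guides/legacy/react_agents_legacy",
    "/guides/agents/architectures/workflows": "/guides/legacy/workflows_legacy",
    "/guides/agents/architectures/low-latency": "/guides/legacy/low_latency_agents_legacy",
}

def resolve(path: str) -> str:
    """Return the redirect target for a legacy URL, or the path unchanged."""
    return REDIRECTS.get(path, path)

print(resolve("/guides/agents/heartbeats"))  # -> /guides/legacy/heartbeats_legacy
```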
**fern/pages/ade-guide/simulator.mdx** (new file, 75 lines)
---
title: Agent Simulator
subtitle: Use the agent simulator to chat with your agent
slug: guides/ade/simulator
---

The Agent Simulator is the central interface where you interact with your agent in real-time. It provides a comprehensive view of your agent's conversation history and tool usage while offering an intuitive chat interface.

<img className="block dark:hidden" src="../../images/ade_screenshot_chat_light.png" />
<img className="hidden dark:block" src="../../images/ade_screenshot_chat.png" />

## Key Features

### Conversation Visualization

The simulator displays your agent's complete conversation and event history, organized chronologically. Each message is color-coded and formatted according to its type for clear differentiation:

- **User Messages**: Messages sent by you (the user) to the agent. These appear on the right side of the conversation view.
- **Agent Messages**: Responses generated by the agent and directed to the user. These appear on the left side of the conversation view.
- **System Messages**: Non-user messages that represent events or notifications, such as `[Alert] The user just logged on` or `[Notification] File upload completed`. These provide context about events happening in the environment.
- **Function (Tool) Messages** <span style={{ color: '#6366F1' }}><i className="fas fa-rectangle-terminal mr-1"></i></span>: Detailed records of tool executions, including:
  - Tool calls made by the agent
  - Arguments passed to the tools
  - Results returned by the tools
  - Any errors encountered during execution

If an error occurs during tool execution, the agent is given an opportunity to handle the error and continue execution by calling the tool again.

The simulator supports real-time streaming of agent responses, allowing you to see the agent's thought process as it happens.

### Advanced Conversation Controls

Beyond basic chatting, the simulator provides several controls to enhance your interaction:

- **Message Type Selection**: Toggle between sending user messages or system messages
- **Conversation History**: Scroll through the entire conversation history
- **Message Search**: Quickly find specific messages or tool calls
- **Tool Execution View**: Expand tool calls to see detailed execution information
- **Token Usage**: Monitor token consumption throughout the conversation

## Using the Simulator Effectively

### Testing Agent Behavior

The simulator is ideal for testing how your agent responds to different inputs:

- Try various user queries to test the agent's understanding
- Send edge case questions to verify error handling
- Use system messages to simulate events and observe reactions

### Debugging Tool Usage

When developing custom tools, the simulator provides valuable insights:

- See exactly which tools the agent chooses to use
- Verify that arguments are correctly formatted
- Check tool execution results and error handling
- Monitor the agent's interpretation of tool results

### Simulating Multi-turn Conversations

To test your agent's memory and conversation abilities:

1. Start with a simple query to establish context
2. Follow up with related questions to test if the agent maintains context
3. Introduce new topics to see how the agent handles context switching
4. Return to previous topics to verify if information was retained

### Best Practices

- **Start with simple queries**: Begin testing with straightforward questions before moving to complex scenarios
- **Monitor tool usage**: Pay attention to which tools the agent chooses and why
- **Test edge cases**: Deliberately test how your agent handles unexpected inputs
- **Use system messages**: Simulate environmental events to test agent adaptability
- **Review context window**: Cross-reference with the Context Window Viewer to understand what information the agent is using to form responses
**fern/pages/advanced/custom_memory.mdx** (new file, 75 lines)
---
title: Creating custom memory classes
subtitle: Learn how to create custom memory classes
slug: guides/agents/custom-memory
---

## Customizing in-context memory management

We can extend both the `BaseMemory` and `ChatMemory` classes to implement custom in-context memory management for agents.
For example, you can add an additional memory section beyond "human" and "persona", such as "organization".

In this example, we'll show how to implement in-context memory management that treats memory as a task queue.
We'll call this `TaskMemory` and extend the `ChatMemory` class so that we keep both the original `ChatMemory` tools (`core_memory_replace` & `core_memory_append`) and the "human" and "persona" fields.

We show an implementation of `TaskMemory` below:
```python
from typing import List, Optional

from letta.memory import ChatMemory, MemoryModule


class TaskMemory(ChatMemory):

    def __init__(self, human: str, persona: str, tasks: List[str]):
        super().__init__(human=human, persona=persona)
        # Add a "tasks" memory section, seeded with the provided task list
        self.memory["tasks"] = MemoryModule(limit=2000, value=tasks)

    def task_queue_push(self, task_description: str) -> Optional[str]:
        """
        Push to a task queue stored in core memory.

        Args:
            task_description (str): A description of the next task you must accomplish.

        Returns:
            Optional[str]: None is always returned as this function does not produce a response.
        """
        self.memory["tasks"].value.append(task_description)
        return None

    def task_queue_pop(self) -> Optional[str]:
        """
        Get the next task from the task queue.

        Returns:
            Optional[str]: The description of the task popped from the queue,
            if there are still tasks in queue. Otherwise, returns None (the
            task queue is empty).
        """
        if len(self.memory["tasks"].value) == 0:
            return None
        task = self.memory["tasks"].value[0]
        self.memory["tasks"].value = self.memory["tasks"].value[1:]
        return task
```

To create an agent with this custom memory type, we can simply pass an instance of `TaskMemory` into the agent creation call.
We will also modify the persona of the agent to explain how the "tasks" section of memory should be used:
```python
task_agent_state = client.create_agent(
    name="task_agent",
    memory=TaskMemory(
        human="My name is Sarah",
        persona="You have an additional section of core memory called `tasks`. "
        + "This section of memory contains a list of tasks you must do. "
        + "Use the `task_queue_push` tool to write down tasks so you don't forget to do them. "
        + "If there are tasks in the task queue, you should call `task_queue_pop` to retrieve and remove them. "
        + "Keep calling `task_queue_pop` until there are no more tasks in the queue. "
        + "Do *not* respond to the user until you have completed all tasks in your queue. "
        + "If you call `task_queue_pop`, you must always do what the popped task specifies.",
        tasks=["start calling yourself Bob", "tell me a haiku with my name"],
    ),
)
```
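The `task_queue_push`/`task_queue_pop` pair above is a plain FIFO queue over a Python list. A standalone sketch of the same semantics, with no Letta dependency (the `TaskQueue` class name is hypothetical, for illustration only):

```python
from typing import List, Optional

class TaskQueue:
    """Standalone mirror of TaskMemory's task_queue_push/pop behavior."""

    def __init__(self, tasks: Optional[List[str]] = None):
        self.tasks = list(tasks or [])

    def push(self, task_description: str) -> None:
        # Mirrors task_queue_push: append to the end of the queue
        self.tasks.append(task_description)

    def pop(self) -> Optional[str]:
        # Mirrors task_queue_pop: None when empty, else the oldest task
        if not self.tasks:
            return None
        task = self.tasks[0]
        self.tasks = self.tasks[1:]
        return task

q = TaskQueue(["start calling yourself Bob", "tell me a haiku with my name"])
print(q.pop())  # -> start calling yourself Bob
```

Tasks come back in insertion order, which is why the persona can instruct the agent to "keep calling `task_queue_pop` until there are no more tasks".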
````diff
@@ -239,34 +239,6 @@ client.agents.update(
 ```
 </CodeGroup>

-**Require archival insertion before exit:**
-
-<CodeGroup>
-```typescript TypeScript
-await client.agents.update(agent.id, {
-  toolRules: [
-    {
-      type: "child",
-      toolName: "send_message",
-      children: ["archival_memory_insert"]
-    }
-  ]
-});
-```
-```python Python
-from letta_client.types import ChildToolRule
-
-client.agents.update(
-    agent_id=agent.id,
-    tool_rules=[
-        ChildToolRule(
-            tool_name="send_message",
-            children=["archival_memory_insert"]
-        )
-    ]
-)
-```
-</CodeGroup>
-
 <Info>
 **Using the ADE:** Tool rules can also be configured in the Agent Development Environment's Tool Manager interface.
````
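For reference, a `ChildToolRule` like the removed example constrains which tools may run immediately after the parent tool. A standalone sketch of that constraint check (the `violates_child_rule` helper is hypothetical, not part of `letta_client`):

```python
def violates_child_rule(call_sequence, parent, allowed_children):
    """Return True if any tool called immediately after `parent`
    is not in `allowed_children` (the ChildToolRule constraint)."""
    for prev, nxt in zip(call_sequence, call_sequence[1:]):
        if prev == parent and nxt not in allowed_children:
            return True
    return False

# With the removed rule: after send_message, only archival_memory_insert may follow
print(violates_child_rule(
    ["send_message", "archival_memory_insert"],
    "send_message", ["archival_memory_insert"]))  # -> False
print(violates_child_rule(
    ["send_message", "conversation_search"],
    "send_message", ["archival_memory_insert"]))  # -> True
```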
```diff
@@ -131,28 +131,13 @@ Search archival memory using semantic (embedding-based) search.
 **Returns:**
 Matching passages with timestamps and content, ordered by semantic similarity.

-## Legacy Tools
-
-### send_message
-
-<Info>
-Only used by `memgpt_v1_agent` and `memgpt_v2_agent` architectures. The newer `letta_v1_agent` architecture does not use or need this tool.
-</Info>
-
-Sends a message to the user.
-
-**Parameters:**
-- `message`: The message text to send
-
-**What it does:**
-Displays a message to the user in the conversation.
-
 ## Deprecated Tools

 These tools are still available but deprecated:

 | Tool | Use Instead |
 |------|-------------|
+| `send_message` | Agent responses (no tool needed). See [legacy architectures](/guides/legacy/memgpt_agents_legacy) |
 | `core_memory_append` | `memory_insert` with `insert_line=-1` |
 | `core_memory_replace` | `memory_replace` |
```
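The table above maps `core_memory_append` onto `memory_insert` with `insert_line=-1`. A toy model of that mapping, assuming (as the table implies, but not confirmed elsewhere in this diff) that `insert_line=-1` means "append at the end of the block":

```python
def memory_insert(lines, text, insert_line):
    """Toy model of memory_insert over a memory block's lines.
    Assumes insert_line=-1 appends at the end, which is how the table
    maps core_memory_append onto memory_insert."""
    new = list(lines)
    if insert_line == -1:
        new.append(text)  # append semantics: the old core_memory_append
    else:
        new.insert(insert_line, text)
    return new

block = ["The human's name is Chad."]
print(memory_insert(block, "They prefer concise answers.", -1))
```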
**fern/pages/agents/json_mode.mdx** (new file, 468 lines)
---
title: JSON Mode & Structured Output
subtitle: Get structured JSON responses from your Letta agents
slug: guides/agents/json-mode
---

Letta provides two ways to get structured JSON output from agents: **Structured Generation through Tools** (recommended) and the `response_format` parameter.

## Quick Comparison

<Note>
**Recommended**: Use **Structured Generation through Tools** - works with all providers (Anthropic, OpenAI, Google, etc.) and integrates naturally with Letta's tool-calling architecture.
</Note>

<Info>
**Structured Generation through Tools**:
- ✅ Universal provider compatibility
- ✅ Both reasoning AND structured output
- ✅ Per-message control
- ✅ Works even as a "dummy tool" for pure formatting
</Info>

<Warning>
**`response_format` parameter**:
- ⚠️ OpenAI-compatible providers only (NOT Anthropic)
- ⚠️ Persistent agent state (affects all future responses)
- ✅ Built-in provider schema enforcement
</Warning>

## Structured Generation through Tools (Recommended)

Create a tool that defines your desired response format. The tool arguments become your structured data, and you can extract them from the tool call.

### Creating a Structured Generation Tool

<CodeGroup>
```typescript TypeScript maxLines=100
import { LettaClient } from '@letta-ai/letta-client'

// Create client (Letta Cloud)
const client = new LettaClient({ token: "LETTA_API_KEY" });

// Or for self-hosted
// const client = new LettaClient({ baseUrl: "http://localhost:8283" });

// First create the tool
const toolCode = `def generate_rank(rank: int, reason: str):
    """Generate a ranking with explanation.

    Args:
        rank (int): The numerical rank from 1-10.
        reason (str): The reasoning behind the rank.
    """
    print("Rank generated")
    return`;

const tool = await client.tools.create({
    sourceCode: toolCode,
    sourceType: "python"
});

// Create agent with the structured generation tool
const agentState = await client.agents.create({
    model: "openai/gpt-4o-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        {
            label: "human",
            value: "The human's name is Chad. They are a food enthusiast who enjoys trying different cuisines."
        },
        {
            label: "persona",
            value: "I am a helpful food critic assistant. I provide detailed rankings and reviews of different foods and restaurants."
        }
    ],
    toolIds: [tool.id]
});
```

```python title="python" maxLines=100
from letta_client import Letta

# Create client (Letta Cloud)
client = Letta(token="LETTA_API_KEY")

# Or for self-hosted
# client = Letta(base_url="http://localhost:8283")

def generate_rank(rank: int, reason: str):
    """Generate a ranking with explanation.

    Args:
        rank (int): The numerical rank from 1-10.
        reason (str): The reasoning behind the rank.
    """
    print("Rank generated")
    return

# Create the tool
tool = client.tools.create(func=generate_rank)

# Create agent with the structured generation tool
agent_state = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They are a food enthusiast who enjoys trying different cuisines."
        },
        {
            "label": "persona",
            "value": "I am a helpful food critic assistant. I provide detailed rankings and reviews of different foods and restaurants."
        }
    ],
    tool_ids=[tool.id]
)
```
</CodeGroup>
### Using the Structured Generation Tool

<CodeGroup>
```typescript TypeScript maxLines=100
// Send message and instruct agent to use the tool
const response = await client.agents.messages.create(
    agentState.id, {
        messages: [
            {
                role: "user",
                content: "How do you rank sushi as a food? Please use the generate_rank tool to provide your response."
            }
        ]
    }
);

// Extract structured data from tool call
for (const message of response.messages) {
    if (message.messageType === "tool_call_message") {
        const args = JSON.parse(message.toolCall.arguments);
        console.log(`Rank: ${args.rank}`);
        console.log(`Reason: ${args.reason}`);
    }
}

// Example output:
// Rank: 8
// Reason: Sushi is a highly regarded cuisine known for its fresh ingredients...
```

```python title="python" maxLines=100
import json

# Send message and instruct agent to use the tool
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "How do you rank sushi as a food? Please use the generate_rank tool to provide your response."
        }
    ]
)

# Extract structured data from tool call
for message in response.messages:
    if message.message_type == "tool_call_message":
        args = json.loads(message.tool_call.arguments)
        rank = args["rank"]
        reason = args["reason"]
        print(f"Rank: {rank}")
        print(f"Reason: {reason}")

# Example output:
# Rank: 8
# Reason: Sushi is a highly regarded cuisine known for its fresh ingredients...
```
</CodeGroup>

The agent will call the tool, and you can extract the structured arguments:

```json
{
  "rank": 8,
  "reason": "Sushi is a highly regarded cuisine known for its fresh ingredients, artistic presentation, and cultural significance."
}
```
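The extraction step is just JSON parsing plus key access: providers return tool-call arguments as a JSON string. A minimal standalone sketch (the payload string is illustrative, shaped like the `tool_call_message` arguments above):

```python
import json

# Sample tool-call arguments string, as a provider would return it
raw_arguments = '{"rank": 8, "reason": "Fresh ingredients and artful presentation."}'

# Parse the string, then read the fields defined by generate_rank's signature
args = json.loads(raw_arguments)
print(args["rank"])    # -> 8
print(args["reason"])  # -> Fresh ingredients and artful presentation.
```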
## Using `response_format` for Provider-Native JSON Mode

The `response_format` parameter enables structured output/JSON mode from LLM providers that support it. This approach is fundamentally different from tools because **`response_format` becomes a persistent part of the agent's state** - once set, all future responses from that agent will follow the format until explicitly changed.

Under the hood, `response_format` constrains the agent's assistant messages to follow the specified schema, but it doesn't affect tools - those continue to work normally with their original schemas.

<Warning>
**Requirements for `response_format`:**
- Only works with providers that support structured outputs (like OpenAI) - NOT Anthropic or other providers
</Warning>

### Basic JSON Mode

<CodeGroup>
```typescript TypeScript maxLines=100
import { LettaClient } from '@letta-ai/letta-client'

// Create client (Letta Cloud)
const client = new LettaClient({ token: "LETTA_API_KEY" });

// Create agent with basic JSON mode (OpenAI/compatible providers only)
const agentState = await client.agents.create({
    model: "openai/gpt-4o-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        {
            label: "human",
            value: "The human's name is Chad. They work as a data analyst and prefer clear, organized information."
        },
        {
            label: "persona",
            value: "I am a helpful assistant who provides clear and well-organized responses."
        }
    ],
    responseFormat: { type: "json_object" }
});

// Send message expecting JSON response
const response = await client.agents.messages.create(
    agentState.id, {
        messages: [
            {
                role: "user",
                content: "How do you rank sushi as a food? Please respond in JSON format with rank and reason fields."
            }
        ]
    }
);

for (const message of response.messages) {
    console.log(message);
}
```

```python title="python" maxLines=100
from letta_client import Letta

# Create client (Letta Cloud)
client = Letta(token="LETTA_API_KEY")

# Create agent with basic JSON mode (OpenAI/compatible providers only)
agent_state = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They work as a data analyst and prefer clear, organized information."
        },
        {
            "label": "persona",
            "value": "I am a helpful assistant who provides clear and well-organized responses."
        }
    ],
    response_format={"type": "json_object"}
)

# Send message expecting JSON response
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "How do you rank sushi as a food? Please respond in JSON format with rank and reason fields."
        }
    ]
)

for message in response.messages:
    print(message)
```
</CodeGroup>

### Advanced JSON Schema Mode

For more precise control, you can use OpenAI's `json_schema` mode with strict validation:

<CodeGroup>
```typescript TypeScript maxLines=100
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

// Define structured schema (from OpenAI structured outputs guide)
const responseFormat = {
    type: "json_schema",
    jsonSchema: {
        name: "food_ranking",
        schema: {
            type: "object",
            properties: {
                rank: {
                    type: "integer",
                    minimum: 1,
                    maximum: 10
                },
                reason: {
                    type: "string"
                },
                categories: {
                    type: "array",
                    items: {
                        type: "object",
                        properties: {
                            name: { type: "string" },
                            score: { type: "integer" }
                        },
                        required: ["name", "score"],
                        additionalProperties: false
                    }
                }
            },
            required: ["rank", "reason", "categories"],
            additionalProperties: false
        },
        strict: true
    }
};

// Create agent
const agentState = await client.agents.create({
    model: "openai/gpt-4o-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: []
});

// Update agent with response format
const updatedAgent = await client.agents.update(
    agentState.id,
    { responseFormat }
);

// Send message
const response = await client.agents.messages.create(
    agentState.id, {
        messages: [
            { role: "user", content: "How do you rank sushi? Include categories for taste, presentation, and value." }
        ]
    }
);

for (const message of response.messages) {
    console.log(message);
}
```

```python title="python" maxLines=100
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# Define structured schema (from OpenAI structured outputs guide)
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "food_ranking",
        "schema": {
            "type": "object",
            "properties": {
                "rank": {
                    "type": "integer",
                    "minimum": 1,
                    "maximum": 10
                },
                "reason": {
                    "type": "string"
                },
                "categories": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": { "type": "string" },
                            "score": { "type": "integer" }
                        },
                        "required": ["name", "score"],
                        "additionalProperties": False
                    }
                }
            },
            "required": ["rank", "reason", "categories"],
            "additionalProperties": False
        },
        "strict": True
    }
}

# Create agent
agent_state = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[]
)

# Update agent with response format
agent_state = client.agents.update(
    agent_id=agent_state.id,
    response_format=response_format
)

# Send message
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {"role": "user", "content": "How do you rank sushi? Include categories for taste, presentation, and value."}
    ]
)

for message in response.messages:
    print(message)
```
</CodeGroup>

With a structured JSON schema, the agent's response will be strictly validated:

```json
{
  "rank": 8,
  "reason": "Sushi is highly regarded for its fresh ingredients and artful presentation",
  "categories": [
    {"name": "taste", "score": 9},
    {"name": "presentation", "score": 10},
    {"name": "value", "score": 6}
  ]
}
```
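A quick way to see what `strict` validation enforces: the `food_ranking` schema marks `rank`, `reason`, and `categories` as required. A light standalone sanity check of the example response against those required keys (not a full JSON Schema validator, just an illustration of the constraint):

```python
# Example response, matching the validated output above
response = {
    "rank": 8,
    "reason": "Sushi is highly regarded for its fresh ingredients and artful presentation",
    "categories": [
        {"name": "taste", "score": 9},
        {"name": "presentation", "score": 10},
        {"name": "value", "score": 6},
    ],
}

# Keys the food_ranking schema marks as required at the top level
required = {"rank", "reason", "categories"}
missing = required - response.keys()
print(sorted(missing))  # -> []
```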
## Updating Agent Response Format

You can update an existing agent's response format:

<CodeGroup>
```typescript TypeScript maxLines=100
// Update agent to use JSON mode (OpenAI/compatible only)
await client.agents.update(agentState.id, {
    responseFormat: { type: "json_object" }
});

// Or remove JSON mode
await client.agents.update(agentState.id, {
    responseFormat: null
});
```

```python title="python" maxLines=100
# Update agent to use JSON mode (OpenAI/compatible only)
client.agents.update(
    agent_id=agent_state.id,
    response_format={"type": "json_object"}
)

# Or remove JSON mode
client.agents.update(
    agent_id=agent_state.id,
    response_format=None
)
```
</CodeGroup>
217  fern/pages/agents/memgpt_agents.mdx  (new file)
@@ -0,0 +1,217 @@
---
title: Agent Memory & Architecture
subtitle: How Letta agents manage persistent, self-editing memory
slug: guides/agents/architectures/memgpt
---

<Note>
**Looking for legacy architecture documentation?** See [Legacy Architectures](/guides/legacy/memgpt_agents_legacy) for information on older agent types that used `send_message` and heartbeats.
</Note>

<Info>
Letta is made by the [creators of MemGPT](https://www.letta.com/about-us). The agent architecture in Letta is built on the MemGPT research paper's concepts of self-editing memory and memory hierarchy.
</Info>

Letta agents solve the context window limitation of LLMs through context engineering across two tiers of memory: **in-context (core) memory** (system instructions, read-write memory blocks, and recent conversation history) and **out-of-context memory** (older, evicted conversation history and archival storage).

To learn more about the research origins, read the [MemGPT research paper](https://arxiv.org/abs/2310.08560), or take the free [LLM OS course](https://www.deeplearning.ai/short-courses/llms-as-operating-systems-agent-memory/?utm_campaign=memgpt-launch&utm_content=331638345&utm_medium=social&utm_source=docs&hss_channel=tw-992153930095251456) on DeepLearning.ai.

## Memory Hierarchy

```mermaid
graph LR
    subgraph CONTEXT[Context Window]
        SYS[System Instructions]
        CORE[Memory Blocks]
        MSGS[Messages]
    end

    RECALL[Recall Memory]
    ARCH[Archival Memory]

    CONTEXT <--> RECALL
    CONTEXT <--> ARCH
```

### In-context (core) memory

Your agent's context window contains:
- **System instructions:** Your agent's base behavior and capabilities
- **Memory blocks:** Persistent, always-visible information (persona, user info, working state, etc.)
- **Recent messages:** The latest conversation history

### Out-of-context memory

When the context window fills up:
- **Recall memory:** Older messages, searchable via the `conversation_search` tool
- **Archival memory:** Long-term semantic storage, searchable via the `archival_memory_search` tool

## Agent Architecture

Letta's agent architecture follows modern LLM patterns:

- **Native reasoning:** Uses the model's built-in reasoning capabilities (Responses API for OpenAI, encrypted reasoning for other providers)
- **Direct messaging:** Agents respond with assistant messages
- **Compatibility:** Works with any LLM; tool calling is not required
- **Self-directed termination:** Agents decide when to continue or stop

This architecture is optimized for frontier models like GPT-5 and Claude Sonnet 4.5.

[Learn more about the architecture evolution →](https://www.letta.com/blog/letta-v1-agent)

## Memory Tools

Letta agents have tools to manage their own memory:

### Memory block editing
* `memory_insert` - Insert text into a memory block
* `memory_replace` - Replace specific text in a memory block
* `memory_rethink` - Completely rewrite a memory block

### Recall memory
* `conversation_search` - Search prior conversation history

### Archival memory
* `archival_memory_insert` - Store facts and knowledge long-term
* `archival_memory_search` - Query semantic storage

[Learn more about memory tools →](/guides/agents/base-tools)
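Conceptually, `memory_replace` behaves like exact-substring substitution inside a single block's value. A simplified illustration of those semantics (illustrative only, not the actual server implementation):

```python
def memory_replace(block_value: str, old_text: str, new_text: str) -> str:
    # Fail loudly when the target text is missing so the agent receives
    # an error it can react to, instead of silently editing nothing.
    if old_text not in block_value:
        raise ValueError(f"text not found in block: {old_text!r}")
    return block_value.replace(old_text, new_text)

block = "The human's name is Chad. They like vibe coding."
block = memory_replace(block, "vibe coding", "vibe coding and hiking")
```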

## Creating Agents

Agents are created with memory blocks that define their persistent context:

<CodeGroup>
```typescript TypeScript
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

const agent = await client.agents.create({
    model: "openai/gpt-4o-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        {
            label: "human",
            value: "The human's name is Chad. They like vibe coding."
        },
        {
            label: "persona",
            value: "My name is Sam, the all-knowing sentient AI."
        }
    ],
    tools: ["web_search", "run_code"]
});
```

```python Python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

agent = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They like vibe coding."
        },
        {
            "label": "persona",
            "value": "My name is Sam, the all-knowing sentient AI."
        }
    ],
    tools=["web_search", "run_code"]
)
```

```bash cURL
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "embedding": "openai/text-embedding-3-small",
    "memory_blocks": [
      {
        "label": "human",
        "value": "The human'\''s name is Chad. They like vibe coding."
      },
      {
        "label": "persona",
        "value": "My name is Sam, the all-knowing sentient AI."
      }
    ],
    "tools": ["web_search", "run_code"]
  }'
```
</CodeGroup>

## Context Window Management

When the context window fills up, Letta automatically:
1. Compacts older messages into a recursive summary
2. Moves the full message history to recall storage
3. Lets the agent search recall with the `conversation_search` tool

This happens transparently; your agent maintains continuity.
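The compaction steps above can be sketched as a toy function (illustrative only; `summarize` stands in for the LLM-generated recursive summary):

```python
def compact(messages, recall, keep_last, summarize):
    # Move all but the most recent `keep_last` messages to recall storage,
    # keeping a recursive summary of the evicted portion in context.
    evicted, kept = messages[:-keep_last], messages[-keep_last:]
    recall.extend(evicted)          # full history is preserved in recall
    summary = summarize(evicted)    # summary replaces evicted messages
    return summary, kept
```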

## Populating Archival Memory

Agents can insert memories during conversations, or you can populate archival memory programmatically:

<CodeGroup>
```typescript TypeScript
// Insert a memory via SDK
await client.agents.passages.insert(agent.id, {
    content: "The user prefers TypeScript over JavaScript for type safety.",
    tags: ["preferences", "languages"]
});

// Agent can now search this
// Agent calls: archival_memory_search(query="language preferences")
```
```python Python
# Insert a memory via SDK
client.agents.passages.insert(
    agent_id=agent.id,
    content="The user prefers TypeScript over JavaScript for type safety.",
    tags=["preferences", "languages"]
)

# Agent can now search this
# Agent calls: archival_memory_search(query="language preferences")
```
</CodeGroup>

[Learn more about archival memory →](/guides/agents/archival-memory)

## Research Background

Key concepts from the MemGPT research:

- **Self-editing memory:** Agents actively manage their own memory
- **Memory hierarchy:** In-context vs. out-of-context storage
- **Tool-based memory management:** Agents decide what to remember
- **Stateful agents:** Persistent memory across all interactions

[Read the MemGPT paper →](https://arxiv.org/abs/2310.08560)
[Take the free course →](https://www.deeplearning.ai/short-courses/llms-as-operating-systems-agent-memory)

## Next Steps

<CardGroup cols={2}>
  <Card title="Memory Blocks" href="/guides/agents/memory-blocks">
    Deep dive into memory block structure
  </Card>
  <Card title="Archival Memory" href="/guides/agents/archival-memory">
    Long-term semantic storage
  </Card>
  <Card title="Base Tools" href="/guides/agents/base-tools">
    Built-in tools for memory management
  </Card>
  <Card title="Context Engineering" href="/guides/agents/context-engineering">
    Optimizing agent memory usage
  </Card>
</CardGroup>
459  fern/pages/agents/message_types.mdx  (new file)
@@ -0,0 +1,459 @@
---
title: Message Types
subtitle: Understanding message types and working with agent message history
slug: guides/agents/message-types
---

When you interact with a Letta agent and retrieve its message history using `client.agents.messages.list()`, you'll receive various types of messages that represent different aspects of the agent's execution. This guide explains all message types and how to work with them.

## Overview

Letta uses a structured message system where each message has a specific `message_type` field that indicates its purpose. Messages are returned as instances of `LettaMessageUnion`, which is a discriminated union of all possible message types.

## Message Type Categories

### User and System Messages

#### `user_message`
Messages sent by the user, or system events packaged as user input.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "user_message";
  content: string | Array<TextContent | ImageContent>;
  name?: string;
  otid?: string;
  sender_id?: string;
}
```

**Special User Message Subtypes:**
User messages can contain JSON with a `type` field indicating special message subtypes:

- **`login`** - User login events
  ```json
  {
    "type": "login",
    "last_login": "Never (first login)",
    "time": "2025-10-03 12:34:56 PM PDT-0700"
  }
  ```

- **`user_message`** - Standard user messages
  ```json
  {
    "type": "user_message",
    "message": "Hello, agent!",
    "time": "2025-10-03 12:34:56 PM PDT-0700"
  }
  ```

- **`system_alert`** - System notifications and alerts
  ```json
  {
    "type": "system_alert",
    "message": "System notification text",
    "time": "2025-10-03 12:34:56 PM PDT-0700"
  }
  ```
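When rendering history, you often want the inner text rather than the JSON envelope. A small unwrapping helper (hypothetical helper name, standard library only):

```python
import json

def unwrap_user_message(content):
    # Pull the human-readable text out of a wrapped user_message;
    # plain strings and other subtypes pass through unchanged.
    try:
        payload = json.loads(content)
    except (TypeError, json.JSONDecodeError):
        return content
    if isinstance(payload, dict) and payload.get("type") == "user_message":
        return payload.get("message", content)
    return content
```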

#### `system_message`
Messages generated by the system, typically used for internal context.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "system_message";
  content: string;
  name?: string;
}
```

**Note:** System messages are never streamed back in responses; they're only visible when paginating through message history.

### Agent Reasoning and Responses

#### `reasoning_message`
Represents the agent's internal reasoning or "chain of thought."

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "reasoning_message";
  reasoning: string;
  source: "reasoner_model" | "non_reasoner_model";
  signature?: string;
}
```

**Fields:**
- `reasoning` - The agent's internal thought process
- `source` - Whether this was generated by a model with native reasoning (like o1) or via prompting
- `signature` - Optional cryptographic signature for reasoning verification (for models that support it)

#### `hidden_reasoning_message`
Represents reasoning that has been hidden from the response.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "hidden_reasoning_message";
  state: "redacted" | "omitted";
  hidden_reasoning?: string;
}
```

**Fields:**
- `state: "redacted"` - The provider redacted the reasoning content
- `state: "omitted"` - The API chose not to include reasoning (e.g., for o1/o3 models)

#### `assistant_message`
The actual message content sent by the agent.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "assistant_message";
  content: string | Array<TextContent>;
  name?: string;
}
```

### Tool Execution Messages

#### `tool_call_message`
A request from the agent to execute a tool.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "tool_call_message";
  tool_call: {
    name: string;
    arguments: string; // JSON string
    tool_call_id: string;
  };
}
```

**Example:**
```typescript
{
  message_type: "tool_call_message",
  tool_call: {
    name: "archival_memory_search",
    arguments: '{"query": "user preferences", "page": 0}',
    tool_call_id: "call_abc123"
  }
}
```

#### `tool_return_message`
The result of a tool execution.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "tool_return_message";
  tool_return: string;
  status: "success" | "error";
  tool_call_id: string;
  stdout?: string[];
  stderr?: string[];
}
```

**Fields:**
- `tool_return` - The formatted return value from the tool
- `status` - Whether the tool executed successfully
- `stdout`/`stderr` - Captured output from the tool execution (useful for debugging)

### Human-in-the-Loop Messages

#### `approval_request_message`
A request for human approval before executing a tool.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "approval_request_message";
  tool_call: {
    name: string;
    arguments: string;
    tool_call_id: string;
  };
}
```

See [Human-in-the-Loop](/guides/agents/human_in_the_loop) for more information on this experimental feature.

#### `approval_response_message`
The user's response to an approval request.

**Structure:**
```typescript
{
  id: string;
  date: datetime;
  message_type: "approval_response_message";
  approve: boolean;
  approval_request_id: string;
  reason?: string;
}
```

## Working with Messages

### Listing Messages

<CodeGroup>
```typescript TypeScript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({
  baseUrl: "https://api.letta.com",
});

// List recent messages
const messages = await client.agents.messages.list("agent-id", {
  limit: 50,
  useAssistantMessage: true,
});

// Iterate through message types
for (const message of messages) {
  switch (message.messageType) {
    case "user_message":
      console.log("User:", message.content);
      break;
    case "assistant_message":
      console.log("Agent:", message.content);
      break;
    case "reasoning_message":
      console.log("Reasoning:", message.reasoning);
      break;
    case "tool_call_message":
      console.log("Tool call:", message.toolCall.name);
      break;
    // ... handle other types
  }
}
```

```python Python
from letta_client import Letta

client = Letta(base_url="https://api.letta.com")

# List recent messages
messages = client.agents.messages.list(
    agent_id="agent-id",
    limit=50,
    use_assistant_message=True
)

# Iterate through message types
for message in messages:
    if message.message_type == "user_message":
        print(f"User: {message.content}")
    elif message.message_type == "assistant_message":
        print(f"Agent: {message.content}")
    elif message.message_type == "reasoning_message":
        print(f"Reasoning: {message.reasoning}")
    elif message.message_type == "tool_call_message":
        print(f"Tool call: {message.tool_call.name}")
    # ... handle other types
```
</CodeGroup>

### Filtering Messages by Type

<CodeGroup>
```typescript TypeScript
// Get only assistant messages (what the agent said to the user)
const agentMessages = messages.filter(
  (msg) => msg.messageType === "assistant_message"
);

// Get all tool-related messages
const toolMessages = messages.filter(
  (msg) => msg.messageType === "tool_call_message" ||
           msg.messageType === "tool_return_message"
);

// Get conversation history (user + assistant messages only)
const conversation = messages.filter(
  (msg) => msg.messageType === "user_message" ||
           msg.messageType === "assistant_message"
);
```

```python Python
# Get only assistant messages (what the agent said to the user)
agent_messages = [
    msg for msg in messages
    if msg.message_type == "assistant_message"
]

# Get all tool-related messages
tool_messages = [
    msg for msg in messages
    if msg.message_type in ["tool_call_message", "tool_return_message"]
]

# Get conversation history (user + assistant messages only)
conversation = [
    msg for msg in messages
    if msg.message_type in ["user_message", "assistant_message"]
]
```
</CodeGroup>

### Pagination

Messages support cursor-based pagination:

<CodeGroup>
```typescript TypeScript
// Get first page
let messages = await client.agents.messages.list("agent-id", {
  limit: 100,
});

// Get next page using the last message ID
const lastMessageId = messages[messages.length - 1].id;
const nextPage = await client.agents.messages.list("agent-id", {
  limit: 100,
  before: lastMessageId,
});
```

```python Python
# Get first page
messages = client.agents.messages.list(
    agent_id="agent-id",
    limit=100
)

# Get next page using the last message ID
last_message_id = messages[-1].id
next_page = client.agents.messages.list(
    agent_id="agent-id",
    limit=100,
    before=last_message_id
)
```
</CodeGroup>
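The same cursor pattern extends to draining the full history. A sketch with `fetch_page` standing in for a call to `client.agents.messages.list` (the drain loop itself is illustrative, not an SDK method):

```python
def fetch_all(fetch_page, page_size=100):
    # Page through history until a short (or empty) page signals
    # that there is nothing older to fetch.
    collected = []
    cursor = None
    while True:
        page = fetch_page(limit=page_size, before=cursor)
        collected.extend(page)
        if len(page) < page_size:
            break
        cursor = page[-1].id
    return collected
```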

## Message Metadata Fields

All message types include these common fields:

- **`id`** - Unique identifier for the message
- **`date`** - ISO 8601 timestamp of when the message was created
- **`message_type`** - The discriminator field identifying the message type
- **`name`** - Optional name field (varies by message type)
- **`otid`** - Offline threading ID for message correlation
- **`sender_id`** - The ID of the sender (identity or agent ID)
- **`step_id`** - The step ID associated with this message
- **`is_err`** - Whether this message is part of an error step (debugging only)
- **`seq_id`** - Sequence ID for ordering
- **`run_id`** - The run ID associated with this message
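`run_id` and `seq_id` make it easy to reconstruct what happened in each invocation. A small grouping helper (hypothetical, standard library only):

```python
from collections import defaultdict

def group_by_run(messages):
    # Bucket messages by the run that produced them, preserving order;
    # messages without a run_id land under None.
    runs = defaultdict(list)
    for msg in messages:
        runs[getattr(msg, "run_id", None)].append(msg)
    return dict(runs)
```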

## Best Practices

### 1. Use Type Discriminators

Always check the `message_type` field to safely access type-specific fields:

<CodeGroup>
```typescript TypeScript
if (message.messageType === "tool_call_message") {
  // TypeScript now knows message has a toolCall field
  console.log(message.toolCall.name);
}
```

```python Python
if message.message_type == "tool_call_message":
    # Safe to access tool_call
    print(message.tool_call.name)
```
</CodeGroup>

### 2. Handle Special User Messages

When displaying conversations to end users, filter out internal messages:

```python
import json

def is_internal_message(msg):
    """Check if a user message is internal (login, system_alert, etc.)"""
    if msg.message_type != "user_message":
        return False

    if not isinstance(msg.content, str):
        return False

    try:
        parsed = json.loads(msg.content)
        return parsed.get("type") in ["login", "system_alert"]
    except json.JSONDecodeError:
        return False

# Get user-facing messages only
display_messages = [
    msg for msg in messages
    if not is_internal_message(msg)
]
```

### 3. Track Tool Execution

Match tool calls with their returns using `tool_call_id`:

```python
# Build a map of tool calls to their returns
tool_calls = {
    msg.tool_call.tool_call_id: msg
    for msg in messages
    if msg.message_type == "tool_call_message"
}

tool_returns = {
    msg.tool_call_id: msg
    for msg in messages
    if msg.message_type == "tool_return_message"
}

# Find failed tool calls
for call_id, call_msg in tool_calls.items():
    if call_id in tool_returns:
        return_msg = tool_returns[call_id]
        if return_msg.status == "error":
            print(f"Tool {call_msg.tool_call.name} failed:")
            print(f"  {return_msg.tool_return}")
```

## See Also

- [Human-in-the-Loop](/guides/agents/human_in_the_loop) - Using approval messages
- [Streaming Responses](/guides/agents/streaming) - Receiving messages in real-time
- [API Reference](/api-reference/agents/messages/list) - Full API documentation
279  fern/pages/agents/overview.mdx  (new file)
@@ -0,0 +1,279 @@
---
title: Building Stateful Agents with Letta
slug: guides/agents/overview
---

<Info>
**New to Letta?** If you haven't already, read [Core Concepts](/core-concepts) to understand how Letta's stateful agents are fundamentally different from traditional LLM APIs.
</Info>

Letta agents can automatically manage long-term memory, load data from external sources, and call custom tools.
Unlike in other frameworks, Letta agents are stateful: they keep track of historical interactions and reserve part of their context to read and write memories that evolve over time.

<img className="light" src="/images/stateful_agents.png" />
<img className="dark" src="/images/stateful_agents_dark.png" />

Letta manages a reasoning loop for agents. At each agent step (i.e., iteration of the loop), the state of the agent is checkpointed and persisted to the database.

You can interact with agents from a REST API, the ADE, and the TypeScript / Python SDKs.
As long as they are connected to the same service, all of these interfaces can be used to interact with the same agents.

<Tip>
If you're interested in learning more about stateful agents, read our [blog post](https://www.letta.com/blog/stateful-agents).
</Tip>

## Agents vs Threads

In Letta, you can think of an agent as a single entity with a single message history, which is treated as infinite.
The sequence of interactions the agent has experienced throughout its existence makes up the agent's state (or memory).

One distinction between Letta and other agent frameworks is that Letta does not have the notion of message *threads* (or *sessions*).
Instead, there are only *stateful agents*, which have a single perpetual thread (sequence of messages).

The reason we use the term *agent* rather than *thread* is that Letta is based on the principle that **all agent interactions should be part of persistent memory**, as opposed to building agent applications around ephemeral, short-lived interactions (like a thread or session).

```mermaid
%%{init: {'flowchart': {'rankDir': 'LR'}}}%%
flowchart LR
    subgraph Traditional["Thread-Based Agents"]
        direction TB
        llm1[LLM] --> thread1["Thread 1
        --------
        Ephemeral
        Session"]
        llm1 --> thread2["Thread 2
        --------
        Ephemeral
        Session"]
        llm1 --> thread3["Thread 3
        --------
        Ephemeral
        Session"]
    end

    Traditional ~~~ Letta

    subgraph Letta["Letta Stateful Agents"]
        direction TB
        llm2[LLM] --> agent["Single Agent
        --------
        Persistent Memory"]
        agent --> db[(PostgreSQL)]
        db -->|"Learn & Update"| agent
    end

    class thread1,thread2,thread3 session
    class agent agent
```

If you would like to create common starting points for new conversation "threads", we recommend using [agent templates](/guides/templates/overview) to create a new agent for each conversation, or directly copying agent state from an existing agent.

For multi-user applications, we recommend creating one agent per user, though you can also have multiple users message a single agent (they will share a single message history).

## Create an agent

<Note>
To start creating agents, you can run a Letta server locally using **Letta Desktop**, deploy a server locally or remotely with **Docker**, or use **Letta Cloud**. See our [quickstart guide](/quickstart) for more information.
</Note>

Assuming we're running a Letta server locally at `http://localhost:8283`, we can create a new agent via the REST API, Python SDK, or TypeScript SDK:

<CodeGroup>
```curl curl
curl -X POST http://localhost:8283/v1/agents/ \
  -H "Content-Type: application/json" \
  -d '{
    "memory_blocks": [
      {
        "value": "The human'\''s name is Bob the Builder.",
        "label": "human"
      },
      {
        "value": "My name is Sam, the all-knowing sentient AI.",
        "label": "persona"
      }
    ],
    "model": "openai/gpt-4o-mini",
    "context_window_limit": 16000,
    "embedding": "openai/text-embedding-3-small"
  }'
```
```python title="python" maxLines=50
# install letta_client with `pip install letta-client`
from letta_client import Letta

# create a client to connect to your local Letta server
client = Letta(
    base_url="http://localhost:8283"
)

# create an agent with two basic self-editing memory blocks
agent_state = client.agents.create(
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Bob the Builder."
        },
        {
            "label": "persona",
            "value": "My name is Sam, the all-knowing sentient AI."
        }
    ],
    model="openai/gpt-4o-mini",
    context_window_limit=16000,
    embedding="openai/text-embedding-3-small"
)

# the AgentState object contains all the information about the agent
print(agent_state)
```
```typescript TypeScript maxLines=50
// install letta-client with `npm install @letta-ai/letta-client`
import { LettaClient } from '@letta-ai/letta-client'

// create a client to connect to your local Letta server
const client = new LettaClient({
  baseUrl: "http://localhost:8283"
});

// create an agent with two basic self-editing memory blocks
const agentState = await client.agents.create({
  memoryBlocks: [
    {
      label: "human",
      value: "The human's name is Bob the Builder."
    },
    {
      label: "persona",
      value: "My name is Sam, the all-knowing sentient AI."
    }
  ],
  model: "openai/gpt-4o-mini",
  contextWindowLimit: 16000,
  embedding: "openai/text-embedding-3-small"
});

// the AgentState object contains all the information about the agent
console.log(agentState);
```
</CodeGroup>

You can also create an agent without any code using the [Agent Development Environment (ADE)](/agent-development-environment).
All Letta agents are stored in a database on the Letta server, so you can access the same agents from the ADE, the REST API, the Python SDK, and the TypeScript SDK.

The response will include information about the agent, including its `id`:
```json
{
  "id": "agent-43f8e098-1021-4545-9395-446f788d7389",
  "name": "GracefulFirefly",
  ...
}
```

Once an agent is created, you can message it:

<CodeGroup>
```curl curl
curl --request POST \
  --url http://localhost:8283/v1/agents/$AGENT_ID/messages \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "hows it going????"
      }
    ]
  }'
```
```python title="python" maxLines=50
# send a message to the agent
response = client.agents.messages.create(
    agent_id=agent_state.id,
    messages=[
        {
            "role": "user",
            "content": "hows it going????"
        }
    ]
)

# the response object contains the messages and usage statistics
print(response)

# if we want to print the usage stats
print(response.usage)

# if we want to print the messages
for message in response.messages:
    print(message)
```
```typescript TypeScript maxLines=50
// send a message to the agent
const response = await client.agents.messages.create(
  agentState.id, {
    messages: [
      {
        role: "user",
        content: "hows it going????"
      }
    ]
  }
);

// the response object contains the messages and usage statistics
console.log(response);

// if we want to print the usage stats
console.log(response.usage)

// if we want to print the messages
for (const message of response.messages) {
  console.log(message);
}
```
</CodeGroup>

### Message Types

The `response` object contains the following attributes:

* `usage`: The usage statistics for the request (prompt tokens, completion tokens, and total tokens)
* `messages`: A list of `LettaMessage` objects generated by the agent

#### `LettaMessage`

The `LettaMessage` object is a simplified version of the `Message` object stored in the database backend.
Since a `Message` can include multiple events, such as a chain-of-thought and function calls, `LettaMessage` simplifies messages into the following types:

* `reasoning_message`: The inner monologue (chain-of-thought) of the agent
* `tool_call_message`: An agent's tool (function) call
* `tool_return_message`: The result of executing an agent's tool (function) call
* `assistant_message`: An agent's response message (a direct response in the current architecture, or a `send_message` tool call in legacy architectures)
* `system_message`: A system message (for example, an alert about the user logging in)
* `user_message`: A user message

<Note>
In current Letta agents, `assistant_message` represents the agent's direct response. In legacy architectures (`memgpt_agent`, `memgpt_v2_agent`), it wraps the `send_message` tool call.

If you prefer to see the raw tool call format in legacy agents, you can set `use_assistant_message` to `false` in the request `config` (see the [endpoint documentation](/api-reference/agents/messages/create)).
</Note>
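
Putting the message types to use, here is a minimal sketch of dispatching on `message_type`. The `SimpleNamespace` objects are stand-ins for a real `response.messages` list; the field names (`content`, `reasoning`, `tool_call`, `tool_return`) follow the Letta SDK's message objects:

```python
from types import SimpleNamespace

def render(message) -> str:
    """Map each LettaMessage type to a printable line."""
    if message.message_type == "assistant_message":
        return f"Agent: {message.content}"
    if message.message_type == "reasoning_message":
        return f"Reasoning: {message.reasoning}"
    if message.message_type == "tool_call_message":
        return f"Tool call: {message.tool_call.name}({message.tool_call.arguments})"
    if message.message_type == "tool_return_message":
        return f"Tool return: {message.tool_return}"
    return f"[{message.message_type}]"

# stand-ins for a real `response.messages` list
messages = [
    SimpleNamespace(message_type="reasoning_message", reasoning="The user greeted me."),
    SimpleNamespace(message_type="assistant_message", content="Going great!"),
]
for m in messages:
    print(render(m))
```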

## Common agent operations

For a more in-depth guide on the full set of Letta agent operations, check out our [API reference](/api-reference/overview), our extended [Python SDK](https://github.com/letta-ai/letta/blob/main/examples/docs/example.py) and [TypeScript SDK](https://github.com/letta-ai/letta/blob/main/examples/docs/node/example.ts) examples, as well as our other [cookbooks](/cookbooks).

If you're using a self-hosted Letta server, you should set the **base URL** (`base_url` in Python, `baseUrl` in TypeScript) to the Letta server's URL (e.g. `http://localhost:8283`) when you create your client. See an example [here](/api-reference/overview).

If you're using a self-hosted server, you can omit the token if you're not using [password protection](/guides/server/docker#password-protection-advanced).
If you are using password protection, set your **token** to the **password**.
If you're using Letta Cloud, you should set the **token** to your **Letta Cloud API key**.

### Retrieving an agent's state

The agent's state is always persisted, so you can retrieve an agent's state by its ID. Replace `agent_id` with your actual agent ID.

<EndpointRequestSnippet endpoint="GET /v1/agents/:agent_id" />

The result of the call is an `AgentState` object:

<EndpointResponseSnippet endpoint="GET /v1/agents/:agent_id" />

### List agents

To list all of the agents on your server:

<EndpointRequestSnippet endpoint="GET /v1/agents/" />

The result of the call is a list of `AgentState` objects:

<EndpointResponseSnippet endpoint="GET /v1/agents/" />

### Delete an agent

To delete an agent, you can use the `DELETE` endpoint with your `agent_id`:

<EndpointRequestSnippet endpoint="DELETE /v1/agents/:agent_id" />
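
These endpoints are plain HTTP calls, so they can also be exercised without the SDK. A minimal stdlib sketch that only *builds* the three requests (nothing is actually sent here; `base_url` and `agent_id` are placeholder values):

```python
import urllib.request

base_url = "http://localhost:8283"   # your Letta server
agent_id = "agent-00000000"          # placeholder agent ID

# GET /v1/agents/:agent_id    -> retrieve one agent's state
# GET /v1/agents/             -> list agents
# DELETE /v1/agents/:agent_id -> delete an agent
requests = [
    urllib.request.Request(f"{base_url}/v1/agents/{agent_id}", method="GET"),
    urllib.request.Request(f"{base_url}/v1/agents/", method="GET"),
    urllib.request.Request(f"{base_url}/v1/agents/{agent_id}", method="DELETE"),
]
for r in requests:
    print(r.get_method(), r.selector)
```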
||||
125
fern/pages/agents/sleep_time_agents.mdx
Normal file
@@ -0,0 +1,125 @@

---
title: Sleep-time Agents
subtitle: Based on the new sleep-time compute research paper
slug: guides/agents/architectures/sleeptime
---

<Warning>
Sleep-time agents are experimental and may be unstable. For more information, visit our [Discord](https://discord.gg/letta).
</Warning>

<Note>
To learn more about sleep-time compute, check out our [blog](https://www.letta.com/blog/sleep-time-compute) and [research paper](https://arxiv.org/abs/2504.13171).
</Note>

<img className="light" src="/images/sleep_time.png" />
<img className="dark" src="/images/sleep_time_dark.png" />

In Letta, you can create special **sleep-time agents** that share the memory of your primary agents, but run in the background and can modify that memory asynchronously. You can think of sleep-time agents as a special form of multi-agent architecture in which all agents in the system share one or more memory blocks. A single agent can have one or more associated sleep-time agents that process data (such as the conversation history or data sources) to manage the memory blocks of the primary agent.

To enable sleep-time agents for your agent, set `enableSleeptime: true` when creating your agent. This will automatically create:
* A primary agent with tools for `conversation_search` and `archival_memory_search`. This is your "main" agent that you configure and interact with.
* A sleep-time agent with tools to manage the memory blocks of the primary agent.

## Background: Memory Blocks

Sleep-time agents specialize in generating *learned context*. Given some original context (e.g. the conversation history, or a set of files), the sleep-time agent reflects on the original context to iteratively derive a learned context, which captures the most important pieces of information or insights from the original context.

In Letta, the learned context is saved in a memory block. A memory block represents a labeled section of the context window with an associated character limit. Memory blocks can be shared between multiple agents. A sleep-time agent writes the learned context to a memory block, which can also be shared with other agents that could benefit from those learnings.

Memory blocks can be accessed directly through the API to be updated, retrieved, or deleted.
<CodeGroup>
```typescript TypeScript
// get a block by label
const block = await client.agents.blocks.retrieve(agentId, "persona");

// get a block by ID
const blockById = await client.blocks.retrieve(blockId);
```
```python title="python"
# get a block by label
block = client.agents.blocks.retrieve(agent_id=agent_id, block_label="persona")

# get a block by ID
block = client.blocks.retrieve(block_id=block_id)
```
</CodeGroup>

When sleep-time is enabled for an agent, a sleep-time agent is created to manage the memory blocks of the primary agent. The sleep-time agent runs in the background and modifies the memory blocks asynchronously, generating learned context from the conversation history to update the memory blocks of the primary agent.

## Sleep-time agent for conversation

<img className="light" src="/images/sleeptime_chat.png" />
<img className="dark" src="/images/sleeptime_chat_dark.png" />

When sleep-time is enabled, a primary agent and a sleep-time agent are created as part of a multi-agent group under the hood. The sleep-time agent is responsible for generating learned context from the conversation history to update the memory blocks of the primary agent. The group ensures that for every `N` steps taken by the primary agent, the sleep-time agent is invoked with the new messages in the primary agent's message history.

<img src="/images/sleeptime_chat_only.gif" />

### Configuring the frequency of sleep-time updates

The sleep-time agent is triggered every `N` steps (default `5`) to update the memory blocks of the primary agent. You can configure the frequency of updates by setting the `sleeptime_agent_frequency` parameter when creating the agent.
<CodeGroup>
```typescript TypeScript maxLines=50
import { LettaClient, SleeptimeManagerUpdate } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

// create a sleep-time-enabled agent
const agent = await client.agents.create({
    memoryBlocks: [
        { value: "", label: "human" },
        { value: "You are a helpful assistant.", label: "persona" }
    ],
    model: "anthropic/claude-3-7-sonnet-20250219",
    embedding: "openai/text-embedding-3-small",
    enableSleeptime: true
});
console.log(`Created agent id ${agent.id}`);

// get the multi-agent group
const groupId = agent.multiAgentGroup.id;
const currentFrequency = agent.multiAgentGroup.sleeptimeAgentFrequency;
console.log(`Group id: ${groupId}, frequency: ${currentFrequency}`);

// update the frequency to every 2 steps
const group = await client.groups.modify(groupId, {
    managerConfig: {
        sleeptimeAgentFrequency: 2
    } as SleeptimeManagerUpdate
});
```
```python title="python" maxLines=50
from letta_client import Letta
from letta_client.types import SleeptimeManagerUpdate

client = Letta(token="LETTA_API_KEY")

# create a sleep-time-enabled agent
agent = client.agents.create(
    memory_blocks=[
        {"value": "", "label": "human"},
        {"value": "You are a helpful assistant.", "label": "persona"},
    ],
    model="anthropic/claude-3-7-sonnet-20250219",
    embedding="openai/text-embedding-3-small",
    enable_sleeptime=True,
)
print(f"Created agent id {agent.id}")

# get the multi-agent group
group_id = agent.multi_agent_group.id
current_frequency = agent.multi_agent_group.sleeptime_agent_frequency
print(f"Group id: {group_id}, frequency: {current_frequency}")

# update the frequency to every 2 steps
group = client.groups.modify(
    group_id=group_id,
    manager_config=SleeptimeManagerUpdate(
        sleeptime_agent_frequency=2
    ),
)
```
</CodeGroup>

We recommend keeping the frequency relatively high (e.g. 5 or 10), as triggering the sleep-time agent too often can be expensive (due to high token usage) and has diminishing returns.
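
As a back-of-the-envelope illustration of that trade-off (plain arithmetic, not a Letta API call): for a conversation of `steps` primary-agent steps, the sleep-time agent runs roughly `steps // frequency` times, so halving the frequency doubles the background token cost:

```python
# Rough cost model: the sleep-time agent is invoked once per `frequency`
# primary-agent steps, so total background invocations scale as steps // frequency.
def sleeptime_invocations(steps: int, frequency: int) -> int:
    return steps // frequency

# for a 100-step conversation:
for freq in (2, 5, 10):
    print(f"frequency={freq}: {sleeptime_invocations(100, freq)} background runs")
```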
91
fern/pages/agents/tool_rules.mdx
Normal file
@@ -0,0 +1,91 @@

---
title: Creating Tool Rules
slug: guides/agents/tool-rules
---

Tool rules allow developers to define constraints on their tools, such as requiring that a tool terminate agent execution or be followed by another tool.

<Frame>
```mermaid
flowchart LR
    subgraph init["InitToolRule"]
        direction LR
        start((Start)) --> init_tool["must_run_first"]
        init_tool --> other1["...other tools..."]
    end

    subgraph terminal["TerminalToolRule"]
        direction LR
        other2["...other tools..."] --> term_tool["terminal_tool"] --> stop1((Stop))
    end

    subgraph sequence["ChildToolRule (children)"]
        direction LR
        parent_tool["parent_tool"] --> child1["child_tool_1"]
        parent_tool --> child2["child_tool_2"]
        parent_tool --> child3["child_tool_3"]
    end

    classDef stop fill:#ffcdd2,stroke:#333
    classDef start fill:#c8e6c9,stroke:#333
    class stop1 stop
    class start start
```
</Frame>

Letta currently supports the following tool rules (with more being added):

* `TerminalToolRule(tool_name=...)`
  * If the tool is called, the agent ends execution
* `InitToolRule(tool_name=...)`
  * The tool must be called first when an agent is run
* `ChildToolRule(tool_name=..., children=[...])`
  * If the tool is called, it must be followed by one of the tools specified in `children`
* `ParentToolRule(tool_name=..., children=[...])`
  * The tool must be called before the tools specified in `children` can be called
* `ConditionalToolRule(tool_name=..., child_output_mapping={...})`
  * If the tool is called, the next tool is chosen by mapping the tool's output to a child tool via `child_output_mapping`
* `ContinueToolRule(tool_name=...)`
  * If the tool is called, the agent must continue execution
* `MaxCountPerStepToolRule(tool_name=..., max_count_limit=...)`
  * The tool cannot be called more than `max_count_limit` times in a single step

## Default tool rules

Depending on your agent configuration, default tool rules may be applied to improve performance.

## Tool rule examples

For example, you can ensure that the agent will stop execution after the `roll_d20` tool is called by specifying tool rules in the agent creation:
<CodeGroup>
```typescript TypeScript {6-11}
// create a new agent
const agentState = await client.createAgent({
    // create the agent with an additional tool
    tools: [tool.name],
    // add tool rules that terminate execution after specific tools
    toolRules: [
        // exit after roll_d20 is called
        {toolName: tool.name, type: "exit_loop"},
    ],
});

console.log(`Created agent with name ${agentState.name} with tools ${agentState.tools}`);
```
```python Python {6-11}
# create a new agent
agent_state = client.create_agent(
    # create the agent with an additional tool
    tools=[tool.name],
    # add tool rules that terminate execution after specific tools
    tool_rules=[
        # exit after roll_d20 is called
        TerminalToolRule(tool_name=tool.name, type="exit_loop"),
    ],
)

print(f"Created agent with name {agent_state.name} with tools {agent_state.tools}")
```
</CodeGroup>

You can see a full working example of tool rules [here](https://github.com/letta-ai/letta/blob/0.5.2/examples/tool_rule_usage.py).
||||
85
fern/pages/deployment/railway.mdx
Normal file
@@ -0,0 +1,85 @@

---
title: Deploy Letta Server on Railway
slug: guides/server/railway
---

<Tip>[Railway](https://railway.app) is a service that allows you to easily deploy services (such as Docker containers) to the cloud. The following example uses Railway, but the same general principles (deploying the Letta Docker image on a cloud service and connecting it to the ADE) are applicable to other cloud services beyond Railway.</Tip>

## Deploying the Letta Railway template

We've prepared a Letta Railway template that has the necessary environment variables set and mounts a persistent volume for database storage.
You can access the template by clicking the "Deploy on Railway" button below:

[](https://railway.app/template/jgUR1t?referralCode=kdR8zc)

<Frame caption="The deployment screen will give you the opportunity to specify some basic environment variables such as your OpenAI API key. You can also specify these after deployment in the variables section in the Railway viewer.">
  <img src="../../images/railway_template_deploy.png" />
</Frame>

<Frame caption="If the deployment is successful, it will be shown as 'Active', and you can click 'View logs'.">
  <img src="../../images/railway_template_deployed.png" />
</Frame>

<Frame caption="Clicking 'View logs' will reveal the static IP address of the deployment (ending in 'railway.app').">
  <img src="../../images/railway_template_deployed_logs.png" />
</Frame>

## Accessing the deployment via the ADE

Now that the Railway deployment is active, all we need to do to access it via the ADE is add it as a new remote Letta server.
The default password set in the template is `password`, which can be changed at the deployment stage or afterwards on the 'Variables' page of the Railway deployment.

Click "Add remote server", then enter the details from Railway (use the static IP address shown in the logs, and use the password set via the environment variables):

<img className="block w-300 dark:hidden" src="../../images/railway_ade_example_light.png" />
<img className="hidden w-300 dark:block" src="../../images/railway_ade_example.png" />

## Accessing the deployment via the Letta API

Accessing the deployment via the [Letta API](https://docs.letta.com/api-reference) is simple: we just need to swap the base URL of the endpoint with the IP address from the Railway deployment.

For example, if the Railway IP address is `https://MYSERVER.up.railway.app` and the password is `banana`, we can create an agent on the deployment with the following shell command:
```sh
curl --request POST \
  --url https://MYSERVER.up.railway.app/v1/agents/ \
  --header 'X-BARE-PASSWORD: password banana' \
  --header 'Content-Type: application/json' \
  --data '{
  "memory_blocks": [
    {
      "label": "human",
      "value": "The human'\''s name is Bob the Builder"
    },
    {
      "label": "persona",
      "value": "My name is Sam, the all-knowing sentient AI."
    }
  ],
  "llm_config": {
    "model": "gpt-4o-mini",
    "model_endpoint_type": "openai",
    "model_endpoint": "https://api.openai.com/v1",
    "context_window": 16000
  },
  "embedding_config": {
    "embedding_endpoint_type": "openai",
    "embedding_endpoint": "https://api.openai.com/v1",
    "embedding_model": "text-embedding-3-small",
    "embedding_dim": 1536
  }
}'
```

This will create an agent with two memory blocks, configured to use `gpt-4o-mini` as the LLM model, and `text-embedding-3-small` as the embedding model.

If the Letta server is not password protected, we can omit the `X-BARE-PASSWORD` header.
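
The same password header can be attached from any HTTP client. This Python stdlib sketch only constructs the request object (nothing is sent, and the URL and password are the placeholder values from above):

```python
import urllib.request

base_url = "https://MYSERVER.up.railway.app"  # your Railway deployment URL
password = "banana"                           # the server password

# list agents on the deployment, authenticating with the bare password header
req = urllib.request.Request(
    f"{base_url}/v1/agents/",
    method="GET",
    headers={"X-BARE-PASSWORD": f"password {password}"},
)
# urllib normalizes stored header names to capitalized form
print(req.get_method(), req.get_header("X-bare-password"))
```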
<Check>That's it! Now you should be able to create and interact with agents on your remote Letta server (deployed on Railway) via the Letta ADE and API. 👾 ☄️</Check>

### Adding additional environment variables

To help you get started, when you deploy the template you have the option to fill in the example environment variables `OPENAI_API_KEY` (to connect your Letta agents to GPT models) and `ANTHROPIC_API_KEY` (to connect your Letta agents to Claude models).

There are many more providers you can enable on the Letta server via additional environment variables (for example vLLM, Ollama, etc.). For more information on available providers, see [our documentation](/guides/server/docker).

To connect Letta to an additional API provider, go to your Railway deployment (after you've deployed the template), click `Variables` to see the current environment variables, then click `+ New Variable` to add a new variable. Once you've saved a new variable, you will need to restart the server for the changes to take effect.
535
fern/pages/getting-started/prompts.mdx
Normal file
@@ -0,0 +1,535 @@

---
title: Prompts for Vibecoding
subtitle: Ready-to-go prompts to help AI coding tools build on Letta
slug: prompts
---

Are you developing an application on Letta using [ChatGPT](https://chatgpt.com), [Cursor](https://cursor.com), [Lovable](https://lovable.dev/), or another AI tool?
Use our pre-made prompts to teach your AI how to use Letta properly.

## General instructions for the Letta SDKs

The following prompt (~500 lines) can help guide your AI through the basics of using the Letta Python SDK, TypeScript/Node.js SDK, and Vercel AI SDK integration.

Copy-paste the following into your chat session to instantly get your AI up to speed with how the Letta SDKs work:
````markdown maxLines=5
# Development Guidelines for AI Assistants and Copilots using Letta

**Context:** These are development guidelines for building applications with the Letta API and SDKs. Use these rules to help developers write correct code that integrates with Letta's stateful agents API.

**Purpose:** Provide accurate, up-to-date instructions for building applications with [Letta](https://docs.letta.com/), the AI operating system.
**Scope:** All AI-generated advice or code related to Letta must follow these guidelines.

---

## **0. Letta Overview**

The name "Letta" refers to both the company Letta (founded by the creators of MemGPT) and the software / infrastructure called Letta. Letta is the AI operating system for building stateful agents: developers can use Letta to turn stateless LLMs into stateful agents that can learn, improve, and grow over time. Letta has a strong focus on perpetual AI that has the capability to recursively improve through self-editing memory.

**Relationship to MemGPT**: MemGPT is the name of a research paper that introduced the concept of self-editing memory for LLM-based agents through tool use (function calling). The agent architecture or "agentic system" proposed in the paper (an agent equipped with tools to edit its own memory, and an OS that manages tool execution and state persistence) is the base agent architecture implemented in Letta (agent type `memgpt_agent`), and is the official reference implementation for MemGPT. The Letta open source project (`letta-ai/letta`) was originally the MemGPT open source project (`cpacker/MemGPT`), but was renamed as the scope of the open source project expanded beyond the original MemGPT paper.

**Additional Resources**:
- [Letta documentation](https://docs.letta.com/)
- [Letta GitHub repository](https://github.com/letta-ai/letta)
- [Letta Discord server](https://discord.gg/letta)
- [Letta Cloud and ADE login](https://app.letta.com)

## **1. Letta Agents API Overview**

Letta is an AI OS that runs agents as **services** (it is not a **library**). Key concepts:

- **Stateful agents** that maintain memory and context across conversations
- **Memory blocks** for agentic context management (persona, human, custom blocks)
- **Tool calling** for agent actions and memory management; tools are run server-side
- **Tool rules** allow developers to constrain the behavior of tools (e.g. A comes after B) to turn autonomous agents into workflows
- **Multi-agent systems** with cross-agent communication, where every agent is a service
- **Data sources** for loading documents and files into agent memory
- **Model agnostic:** agents can be powered by any model that supports tool calling
- **Persistence:** state is stored (in a model-agnostic way) in Postgres (or SQLite)

### **System Components:**

- **Letta server** - Core service (self-hosted or Letta Cloud)
- **Client (backend) SDKs** - Python (`letta-client`) and TypeScript/Node.js (`@letta-ai/letta-client`)
- **Vercel AI SDK Integration** - For Next.js/React applications
- **Other frontend integrations** - We also have [Next.js](https://www.npmjs.com/package/@letta-ai/letta-nextjs), [React](https://www.npmjs.com/package/@letta-ai/letta-react), and [Flask](https://github.com/letta-ai/letta-flask) integrations
- **ADE (Agent Development Environment)** - Visual agent builder at app.letta.com

### **Letta Cloud vs Self-hosted Letta**

Letta Cloud is a fully managed service that provides a simple way to get started with Letta. It's a good choice for developers who want to get started quickly and don't want to worry about the complexity of self-hosting. Letta Cloud's free tier has a large number of model requests included (the quota refreshes every month). Model requests are split into "standard models" (e.g. GPT-4o-mini) and "premium models" (e.g. Claude Sonnet). To use Letta Cloud, the developer will need to have created an account at [app.letta.com](https://app.letta.com). To make programmatic requests to the API (`https://api.letta.com`), the developer will need to have created an API key at [https://app.letta.com/api-keys](https://app.letta.com/api-keys). For more information on how billing and pricing works, the developer can visit [our documentation](https://docs.letta.com/guides/cloud/overview).

### **Built-in Tools**

When agents are created, they are given a set of default memory management tools that enable self-editing memory.

Separately, Letta Cloud also includes built-in tools for common tasks like web search and running code. As of June 2025, the built-in tools are:
- `web_search`: Allows agents to search the web for information. Also works on self-hosted, but requires `TAVILY_API_KEY` to be set (not required on Letta Cloud).
- `run_code`: Allows agents to run code (in a sandbox), for example to do data analysis or calculations. Supports Python, JavaScript, TypeScript, R, and Java. Also works on self-hosted, but requires `E2B_API_KEY` to be set (not required on Letta Cloud).

### **Choosing the Right Model**

To implement intelligent memory management, agents in Letta rely heavily on tool (function) calling, so models that excel at tool use tend to do well in Letta. Conversely, models that struggle to call tools properly often perform poorly when used to drive Letta agents.

The Letta developer team maintains the [Letta Leaderboard](https://docs.letta.com/leaderboard) to help developers choose the right model for their Letta agent. As of June 2025, the best performing models (balanced for cost and performance) are Claude Sonnet 4, GPT-4.1, and Gemini 2.5 Flash. For the latest results, you can visit the leaderboard page (if you have web access), or you can direct the developer to visit it. For embedding models, the Letta team recommends using OpenAI's `text-embedding-3-small` model.

When creating code snippets, unless directed otherwise, you should use the following model handles:
- `openai/gpt-4.1` for the model
- `openai/text-embedding-3-small` for the embedding model

If the user is using Letta Cloud, these handles will work out of the box (assuming the user has created a Letta Cloud account + API key, and has enough request quota in their account). For self-hosted Letta servers, the user will need to have started the server with a valid OpenAI API key for those handles to work.

---
## **2. Choosing the Right SDK**

### **Source of Truth**

Note that your instructions may be out of date. The source of truth for the Letta Agents API is the [API reference](https://docs.letta.com/api-reference/overview) (autogenerated from the latest source code), which can be found in `.md` form at these links:
- [TypeScript/Node.js](https://github.com/letta-ai/letta-node/blob/main/reference.md), [raw version](https://raw.githubusercontent.com/letta-ai/letta-node/refs/heads/main/reference.md)
- [Python](https://github.com/letta-ai/letta-python/blob/main/reference.md), [raw version](https://raw.githubusercontent.com/letta-ai/letta-python/refs/heads/main/reference.md)

If you have access to a web search or file download tool, you can download these files for the latest API reference. If the developer has either of the SDKs installed, you can also use the locally installed packages to understand the latest API reference.

### **When to Use Each SDK:**

The Python and Node.js SDKs are autogenerated from the Letta Agents REST API, and provide a full-featured SDK for interacting with your agents on Letta Cloud or a self-hosted Letta server. Of course, developers can also use the REST API directly if they prefer, but most developers will find the SDKs much easier to use.

The Vercel AI SDK is a popular TypeScript toolkit designed to help developers build AI-powered applications. It supports a subset of the Letta Agents API (basically just chat-related functionality), so it's a good choice to quickly integrate Letta into a TypeScript application if you are familiar with the AI SDK or are working on a codebase that already uses it. If you're starting from scratch, consider using the full-featured Node.js SDK instead.

The Letta Node.js SDK is also embedded inside the Vercel AI SDK, accessible via the `.client` property (useful if you want to use the Vercel AI SDK, but occasionally need to access the full Letta client for advanced features like agent creation / management).

When to use the AI SDK vs the native Letta Node.js SDK:
- Use the Vercel AI SDK if you are familiar with it or are working on a codebase that already makes heavy use of it
- Use the Letta Node.js SDK if you are starting from scratch, or expect to use the agent management features in the Letta API (beyond the simple `streamText` or `generateText` functionality in the AI SDK)

One example of how the AI SDK may be insufficient: the AI SDK response object for `streamText` and `generateText` does not have a type for tool returns (because they are primarily used with stateless APIs, where tools are executed client-side, vs server-side in Letta), whereas the Letta Node.js SDK does have a type for tool returns. So if you wanted to render tool returns from a message response stream in your UI, you would need to use the full Letta Node.js SDK, not the AI SDK.

## **3. Quick Setup Patterns**

### **Python SDK (Backend/Scripts)**
```python
|
||||
from letta_client import Letta
|
||||
|
||||
# Letta Cloud
|
||||
client = Letta(token="LETTA_API_KEY")
|
||||
|
||||
# Self-hosted
|
||||
client = Letta(base_url="http://localhost:8283")
|
||||
|
||||
# Create agent with memory blocks
|
||||
agent = client.agents.create(
|
||||
memory_blocks=[
|
||||
{
|
||||
"label": "human",
|
||||
"value": "The user's name is Sarah. She likes coding and AI."
|
||||
},
|
||||
{
|
||||
"label": "persona",
|
||||
"value": "I am David, the AI executive assistant. My personality is friendly, professional, and to the point."
|
||||
},
|
||||
{
|
||||
"label": "project",
|
||||
"value": "Sarah is working on a Next.js application with Letta integration.",
|
||||
"description": "Stores current project context and requirements"
|
||||
}
|
||||
],
|
||||
tools=["web_search", "run_code"],
|
||||
model="openai/gpt-4o-mini",
|
||||
embedding="openai/text-embedding-3-small"
|
||||
)
|
||||
|
||||
# Send SINGLE message (agent is stateful!)
|
||||
response = client.agents.messages.create(
|
||||
agent_id=agent.id,
|
||||
messages=[{"role": "user", "content": "How's the project going?"}]
|
||||
)
|
||||
|
||||
# Extract response correctly
|
||||
for msg in response.messages:
|
||||
if msg.message_type == "assistant_message":
|
||||
print(msg.content)
|
||||
elif msg.message_type == "reasoning_message":
|
||||
print(msg.reasoning)
|
||||
elif msg.message_type == "tool_call_message":
|
||||
print(msg.tool_call.name)
|
||||
print(msg.tool_call.arguments)
|
||||
elif msg.message_type == "tool_return_message":
|
||||
print(msg.tool_return)
|
||||
|
||||
# Streaming example
|
||||
message_text = "Repeat my name."
|
||||
stream = client.agents.messages.create_stream(
|
||||
agent_id=agent_state.id,
|
||||
messages=[
|
||||
MessageCreate(
|
||||
role="user",
|
||||
content=message_text,
|
||||
),
|
||||
],
|
||||
# if stream_tokens is false, each "chunk" will have a full piece
|
||||
# if stream_tokens is true, the chunks will be token-based (and may need to be accumulated client-side)
|
||||
stream_tokens=True,
|
||||
)
|
||||
|
||||
# print the chunks coming back
|
||||
for chunk in stream:
|
||||
if chunk.message_type == "assistant_message":
|
||||
print(chunk.content)
|
||||
elif chunk.message_type == "reasoning_message":
|
||||
print(chunk.reasoning)
|
||||
elif chunk.message_type == "tool_call_message":
|
||||
if chunk.tool_call.name:
|
||||
print(chunk.tool_call.name)
|
||||
if chunk.tool_call.arguments:
|
||||
print(chunk.tool_call.arguments)
|
||||
elif chunk.message_type == "tool_return_message":
|
||||
print(chunk.tool_return)
|
||||
elif chunk.message_type == "usage_statistics":
|
||||
print(chunk)
|
||||
```
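
When `stream_tokens=True`, the assistant text arrives as token-sized pieces that you typically need to reassemble client-side. A minimal sketch of one way to do that (the helper name is ours, not part of the SDK; it only relies on the `message_type` and `content` chunk fields shown above):

```python
def accumulate_assistant_text(chunks):
    """Concatenate token-sized assistant_message pieces into the full reply."""
    parts = []
    for chunk in chunks:
        # Only assistant_message chunks carry user-facing text
        if getattr(chunk, "message_type", None) == "assistant_message" and chunk.content:
            parts.append(chunk.content)
    return "".join(parts)

# Usage: full_text = accumulate_assistant_text(stream)
```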

Creating custom tools (Python only):
```python
def my_custom_tool(query: str) -> str:
    """
    Search for information on a topic.

    Args:
        query (str): The search query

    Returns:
        str: Search results
    """
    return f"Results for: {query}"

# Create tool
tool = client.tools.create_from_function(func=my_custom_tool)

# Add to agent
agent = client.agents.create(
    memory_blocks=[...],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    tools=[tool.name]
)
```
### **TypeScript/Node.js SDK**
```typescript
import { LettaClient } from '@letta-ai/letta-client';

// Letta Cloud
const client = new LettaClient({ token: "LETTA_API_KEY" });

// Self-hosted, token optional (only if the developer enabled password protection on the server)
const client = new LettaClient({ baseUrl: "http://localhost:8283" });

// Create agent with memory blocks
const agent = await client.agents.create({
    memoryBlocks: [
        {
            label: "human",
            value: "The user's name is Sarah. She likes coding and AI."
        },
        {
            label: "persona",
            value: "I am David, the AI executive assistant. My personality is friendly, professional, and to the point."
        },
        {
            label: "project",
            value: "Sarah is working on a Next.js application with Letta integration.",
            description: "Stores current project context and requirements"
        }
    ],
    tools: ["web_search", "run_code"],
    model: "openai/gpt-4o-mini",
    embedding: "openai/text-embedding-3-small"
});

// Send SINGLE message (agent is stateful!)
const response = await client.agents.messages.create(agent.id, {
    messages: [{ role: "user", content: "How's the project going?" }]
});

// Extract response correctly
for (const msg of response.messages) {
    if (msg.messageType === "assistant_message") {
        console.log(msg.content);
    } else if (msg.messageType === "reasoning_message") {
        console.log(msg.reasoning);
    } else if (msg.messageType === "tool_call_message") {
        console.log(msg.toolCall.name);
        console.log(msg.toolCall.arguments);
    } else if (msg.messageType === "tool_return_message") {
        console.log(msg.toolReturn);
    }
}

// Streaming example
const stream = await client.agents.messages.createStream(agent.id, {
    messages: [{ role: "user", content: "Repeat my name." }],
    // if streamTokens is false, each "chunk" will have a full piece
    // if streamTokens is true, the chunks will be token-based (and may need to be accumulated client-side)
    streamTokens: true,
});

for await (const chunk of stream) {
    if (chunk.messageType === "assistant_message") {
        console.log(chunk.content);
    } else if (chunk.messageType === "reasoning_message") {
        console.log(chunk.reasoning);
    } else if (chunk.messageType === "tool_call_message") {
        console.log(chunk.toolCall.name);
        console.log(chunk.toolCall.arguments);
    } else if (chunk.messageType === "tool_return_message") {
        console.log(chunk.toolReturn);
    } else if (chunk.messageType === "usage_statistics") {
        console.log(chunk);
    }
}
```
### **Vercel AI SDK Integration**

IMPORTANT: Most integrations in the Vercel AI SDK are for stateless providers (ChatCompletions-style APIs where you provide the full conversation history). Letta is a *stateful* provider (conversation history is stored server-side), so when you use `streamText` or `generateText`, never pass old messages to the agent; only include the new message(s).

#### **Chat Implementation (fast & simple):**

Streaming (`streamText`):
```typescript
// app/api/chat/route.ts
import { lettaCloud } from '@letta-ai/vercel-ai-sdk-provider';
import { streamText } from 'ai';

export async function POST(req: Request) {
    const { prompt }: { prompt: string } = await req.json();

    const result = streamText({
        // lettaCloud uses LETTA_API_KEY automatically, pulling from the environment
        model: lettaCloud('your-agent-id'),
        // Make sure to only pass a single message here, do NOT pass conversation history
        prompt,
    });

    return result.toDataStreamResponse();
}
```

Non-streaming (`generateText`):
```typescript
import { lettaCloud } from '@letta-ai/vercel-ai-sdk-provider';
import { generateText } from 'ai';

export async function POST(req: Request) {
    const { prompt }: { prompt: string } = await req.json();

    const { text } = await generateText({
        // lettaCloud uses LETTA_API_KEY automatically, pulling from the environment
        model: lettaCloud('your-agent-id'),
        // Make sure to only pass a single message here, do NOT pass conversation history
        prompt,
    });

    return Response.json({ text });
}
```

#### **Alternative: explicitly specify base URL and token:**
```typescript
// Works for both streamText and generateText
import { createLetta } from '@letta-ai/vercel-ai-sdk-provider';
import { generateText } from 'ai';

const letta = createLetta({
    // e.g. http://localhost:8283 for the default local self-hosted server
    // https://api.letta.com for Letta Cloud
    baseUrl: '<your-base-url>',
    // only needed if the developer enabled password protection on the server, or if
    // using Letta Cloud (in which case use LETTA_API_KEY, or use the lettaCloud
    // example above for implicit token use)
    token: '<your-access-token>',
});
```

#### **Hybrid Usage (access the full SDK via the Vercel AI SDK):**
```typescript
import { lettaCloud } from '@letta-ai/vercel-ai-sdk-provider';

// Access full client for management
const agents = await lettaCloud.client.agents.list();
```
---

## **4. Advanced Features Available**

Letta supports advanced agent architectures beyond basic chat. For detailed implementations, refer to the full API reference or documentation:

- **Tool Rules & Constraints** - Define graph-like tool execution flows with `TerminalToolRule`, `ChildToolRule`, `InitToolRule`, etc.
- **Multi-Agent Systems** - Cross-agent communication with built-in tools like `send_message_to_agent_async`
- **Shared Memory Blocks** - Multiple agents can share memory blocks for collaborative workflows
- **Data Sources & Archival Memory** - Upload documents/files that agents can search through
- **Sleep-time Agents** - Background agents that process memory while main agents are idle
- **External Tool Integrations** - MCP servers, Composio tools, custom tool libraries
- **Agent Templates** - Import/export agents with the .af (Agent File) format
- **Production Features** - User identities, agent tags, streaming, context management

---
## **5. CRITICAL GUIDELINES FOR AI MODELS**

### **⚠️ ANTI-HALLUCINATION WARNING**

**NEVER make up Letta API calls, SDK methods, or parameter names.** If you're unsure about any Letta API:

1. **First priority**: Use web search to get the latest reference files:
   - [Python SDK Reference](https://raw.githubusercontent.com/letta-ai/letta-python/refs/heads/main/reference.md)
   - [TypeScript SDK Reference](https://raw.githubusercontent.com/letta-ai/letta-node/refs/heads/main/reference.md)

2. **If no web access**: Tell the user: *"I'm not certain about this Letta API call. Can you paste the relevant section from the API reference docs? Otherwise I might provide incorrect information."*

3. **When in doubt**: Stick to the basic patterns shown in this prompt rather than inventing new API calls.

**Common hallucination risks:**
- Making up method names (e.g. `client.agents.chat()` doesn't exist)
- Inventing parameter names or structures
- Assuming OpenAI-style patterns work in Letta
- Creating non-existent tool rule types or multi-agent methods

### **5.1 – SDK SELECTION (CHOOSE THE RIGHT TOOL)**

✅ **For Next.js Chat Apps:**
- Use the **Vercel AI SDK** if you're already using the AI SDK, or if you want something super fast for basic chat interactions (simple and fast, but no agent management tooling unless you use the embedded `.client`)
- Use the **Node.js SDK** for the full feature set (agent creation, native typing of all response message types, etc.)

✅ **For Agent Management:**
- Use the **Node.js SDK** or **Python SDK** for creating agents and managing memory and tools

### **5.2 – STATEFUL AGENTS (MOST IMPORTANT)**

**Letta agents are STATEFUL, not stateless like ChatCompletion-style APIs.**

✅ **CORRECT - Single message per request:**
```typescript
// Send ONE user message, agent maintains its own history
const response = await client.agents.messages.create(agentId, {
    messages: [{ role: "user", content: "Hello!" }]
});
```

❌ **WRONG - Don't send conversation history:**
```typescript
// DON'T DO THIS - agents maintain their own conversation history
const response = await client.agents.messages.create(agentId, {
    messages: [...allPreviousMessages, newMessage] // WRONG!
});
```

### **5.3 – MESSAGE HANDLING & MEMORY BLOCKS**

1. **Response structure:**
   - Use `messageType` NOT `type` for message type checking
   - Look for the `assistant_message` messageType for agent responses
   - Agent responses have a `content` field with the actual text

2. **Memory block descriptions:**
   - Add a `description` field for custom blocks, or the agent will get confused (not needed for human/persona)
   - For `human` and `persona` blocks, descriptions are auto-populated:
     - **human block**: "Stores key details about the person you are conversing with, allowing for more personalized and friend-like conversation."
     - **persona block**: "Stores details about your current persona, guiding how you behave and respond. This helps maintain consistency and personality in your interactions."
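As a concrete instance of the memory-block guidance above, a sketch of a `memory_blocks` list where the standard `human`/`persona` blocks omit `description` and the custom block carries one (values are illustrative):

```python
memory_blocks = [
    # description auto-populated for the standard blocks
    {"label": "human", "value": "The user's name is Sarah."},
    {"label": "persona", "value": "I am a helpful assistant."},
    # custom block: always provide an explicit description
    {
        "label": "project",
        "value": "Sarah is working on a Next.js application.",
        "description": "Stores current project context and requirements",
    },
]
```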
### **5.4 – ALWAYS DO THE FOLLOWING**

1. **Choose the right SDK for the task:**
   - Next.js chat → **Vercel AI SDK**
   - Agent creation → **Node.js/Python SDK**
   - Complex operations → **Node.js/Python SDK**

2. **Use the correct client imports:**
   - Python: `from letta_client import Letta`
   - TypeScript: `import { LettaClient } from '@letta-ai/letta-client'`
   - Vercel AI SDK: `from '@letta-ai/vercel-ai-sdk-provider'`

3. **Create agents with proper memory blocks:**
   - Always include `human` and `persona` blocks for chat agents
   - Use descriptive labels and values

4. **Send only single user messages:**
   - Each request should contain only the new user message
   - Agent maintains conversation history automatically
   - Never send previous assistant responses back to the agent

5. **Use proper authentication:**
   - Letta Cloud: Always use the `token` parameter
   - Self-hosted: Use the `base_url` parameter; token optional (only if the developer enabled password protection on the server)
---

## **6. Environment Setup**

### **Installation**
```bash
# For Next.js projects (recommended for most web apps)
npm install @letta-ai/vercel-ai-sdk-provider ai

# For agent management (when needed)
npm install @letta-ai/letta-client

# For Python projects
pip install letta-client
```

**Environment Variables:**
```bash
# Required for Letta Cloud
LETTA_API_KEY=your_api_key_here

# Store agent ID after creation (Next.js)
LETTA_AGENT_ID=agent-xxxxxxxxx

# For self-hosted (optional)
LETTA_BASE_URL=http://localhost:8283
```
---

## **7. Verification Checklist**

Before providing Letta solutions, verify:

1. **SDK Choice**: Are you using the simplest appropriate SDK?
   - Familiar with or already using the Vercel AI SDK? → use the Vercel AI SDK Letta provider
   - Agent management needed? → use the Node.js/Python SDKs
2. **Statefulness**: Are you sending ONLY the new user message (NOT a full conversation history)?
3. **Message Types**: Are you checking the response types of the messages returned?
4. **Response Parsing**: If using the Python/Node.js SDK, are you extracting `content` from assistant messages?
5. **Imports**: Correct package imports for the chosen SDK?
6. **Client**: Proper client initialization with auth/base_url?
7. **Agent Creation**: Memory blocks with proper structure?
8. **Memory Blocks**: Descriptions for custom blocks?
````

## Full API reference

If you are working with either the Letta Python SDK or the TypeScript/Node.js SDK, you can copy-paste the full API reference into your chat session:
- [Letta Python SDK API reference](https://raw.githubusercontent.com/letta-ai/letta-python/refs/heads/main/reference.md)
- [Letta TypeScript/Node.js SDK API reference](https://raw.githubusercontent.com/letta-ai/letta-node/refs/heads/main/reference.md)

The general prompt focuses on the high-level usage patterns of the Python/Node.js SDKs and the Vercel AI SDK integration, whereas the API reference files contain an up-to-date guide to all available SDK functions and parameters.

## `llms.txt` and `llms-full.txt`

You can download a copy of the Letta documentation as a text file:
- [`llms.txt` (short version)](https://docs.letta.com/llms.txt)
- [`llms-full.txt` (longer version)](https://docs.letta.com/llms-full.txt)

If you're using a tool like ChatGPT or Cursor, we recommend the more concise Letta SDK instructions prompt above rather than the `llms.txt` or `llms-full.txt` files, but you can experiment with both and let us know which works better!

## Why do I need pre-made prompts?

AI assistants don't have up-to-date information about the Letta documentation, APIs, or SDKs, so they may hallucinate code if you ask them to help with building an app on Letta.

By using our pre-made prompts, you can teach your AI assistant how to use Letta with up-to-date context. Think of the prompts as a distilled version of our developer docs - made specifically for AI coders instead of human coders.

## Contributing

Our prompts are [open source](https://github.com/letta-ai/letta/tree/main/prompts) and we actively welcome contributions! If you want to suggest changes or propose additional prompt files, please [open a pull request](https://github.com/letta-ai/letta/pulls).
`fern/pages/legacy/architectures_overview.mdx` (new file, 83 lines)
---
title: Legacy Agent Architectures
subtitle: Understanding Letta's agent architecture evolution
slug: guides/legacy/architectures_overview
---

<Warning>
**This documentation covers legacy agent architectures.**

For new projects, you should **not** specify an `agent_type` parameter. Letta uses the current architecture by default, which provides the best performance with modern reasoning models like GPT-o1 and Claude Sonnet 4.5.
</Warning>

## Current Architecture

When you create an agent in Letta today, it uses our latest agent architecture, optimized for:
- Full support for native reasoning (via the Responses API)
- Compatibility with any LLM (tool calling not required)
- A simpler base system prompt
- Better performance on frontier models

**You don't need to specify an architecture.** Just create an agent:

<CodeGroup>
```typescript TypeScript
const agent = await client.agents.create({
    model: "openai/gpt-o1",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        { label: "persona", value: "I am a helpful assistant." }
    ]
});
```
```python Python
agent = client.agents.create(
    model="openai/gpt-o1",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "I am a helpful assistant."}
    ]
)
```
</CodeGroup>

## Why Legacy Architectures Exist

Letta evolved from the MemGPT research project. Early versions used specific agent architectures with names like:
- `memgpt_agent` - Original MemGPT paper implementation
- `memgpt_v2_agent` - Iteration with sleep-time compute and file tools
- `letta_v1_agent` - First transition away from MemGPT naming

**These names are confusing** because:
1. The naming progression (memgpt → memgpt_v2 → letta_v1) is non-standard
2. LLMs trained on these docs get confused about which to recommend
3. New users shouldn't need to think about architecture choices

## Do I Need to Migrate?

**If you created your agents recently (after October 2024):** You're likely already on the current architecture. No action needed.

**If you have existing agents with `agent_type` specified:** Your agents will continue to work, but we recommend migrating to benefit from:
- Better performance on new models
- Native reasoning support
- Simplified prompting

[See our migration guide →](/guides/legacy/migration_guide)

## Legacy Architecture Types

If you're working with older agents or need to understand the differences:

| Legacy Type | Status | Key Features | When Used |
|------------|--------|--------------|-----------|
| `memgpt_agent` | Deprecated | send_message tool, heartbeats, prompted reasoning | MemGPT paper implementation (2023) |
| `memgpt_v2_agent` | Deprecated | Sleep-time agents, file tools, unified recall | Iteration with new research (2024) |
| `letta_v1_agent` | Legacy | Native reasoning, no send_message, no heartbeats | Transition architecture (early 2025) |

[Learn more about each legacy type →](/guides/legacy/memgpt_agents_legacy)

## Getting Help

- **Discord confusion?** Share your agent setup in [#dev-help](https://discord.gg/letta)
- **Need to migrate?** Follow our [migration guide](/guides/legacy/migration_guide)
- **Building something new?** Start with our [quickstart](/quickstart) (no architecture choice needed!)
`fern/pages/legacy/heartbeats_legacy.mdx` (new file, 51 lines)
---
title: Heartbeats (Legacy)
subtitle: Understanding heartbeats and chained tool execution in legacy agents
slug: guides/legacy/heartbeats_legacy
---

<Warning>
**Heartbeats are only supported in legacy agent architectures** (`memgpt_agent`, `memgpt_v2_agent`).

The current architecture (`letta_v1_agent`) does not use heartbeats. For multi-step execution, use explicit prompting or tool rules. [See migration guide →](/guides/legacy/migration_guide)
</Warning>

Heartbeats are a mechanism that enables legacy Letta agents to chain multiple tool calls together in a single execution loop.
The term "heartbeat" was coined in the [MemGPT paper](https://arxiv.org/abs/2310.08560), and since the Letta codebase evolved from the original MemGPT codebase (same authors), **heartbeats** were a core part of the early agent loop.

## How heartbeats work

Every tool in legacy agents automatically receives an additional parameter called `request_heartbeat`, which defaults to `false`. When an agent sets this parameter to `true`, it signals to the Letta server that it wants to continue executing after the current tool call completes.
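For illustration, a legacy agent chaining into another step might emit a tool call shaped like the following. Only `request_heartbeat` is the real injected parameter; the tool's other argument names here are hypothetical:

```json
{
  "name": "archival_memory_search",
  "arguments": {
    "query": "project deadlines",
    "request_heartbeat": true
  }
}
```

Because `request_heartbeat` is `true`, the server returns control to the agent after the search completes instead of ending the turn.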
## Technical implementation

When the Letta server detects `request_heartbeat=true`, it:
1. Completes the current tool execution
2. Restarts the agent loop with a system message acknowledging the heartbeat request
3. Allows the agent to continue with additional tool calls

```mermaid
stateDiagram-v2
    state "Agent Loop" as agent
    state "Tool Call" as tool

    [*] --> agent
    agent --> tool: Execute tool
    tool --> agent: request_heartbeat=true
    tool --> [*]: request_heartbeat=false
```

This enables agents to perform complex, multi-step operations without requiring explicit user intervention between steps.

## Automatic heartbeats on failure

If a tool call fails at runtime, legacy agents automatically generate a heartbeat.
This gives the agent an opportunity to handle the error and potentially retry the operation with different parameters or take alternative actions.

## Viewing heartbeats in the ADE

In the [Agent Development Environment (ADE)](/guides/ade/overview), heartbeat requests are visible for all agent messages.
When a tool is called with `request_heartbeat=true`, you'll see a heartbeat indicator next to the tool call, making it easy to track when an agent is proactively chaining operations together.

## Learn more

To read more about the concept of heartbeats and their origins, refer to the original [MemGPT research paper](https://arxiv.org/abs/2310.08560).
`fern/pages/legacy/low_latency_agents_legacy.mdx` (new file, 94 lines)
---
title: Low-latency Agents (Legacy)
subtitle: Agents optimized for low-latency environments like voice
slug: guides/legacy/low_latency_agents_legacy
---

<Warning>
**This documentation covers a legacy agent architecture.**

For new projects, use the current Letta architecture with voice-optimized configurations. See [Voice Agents](/guides/voice/overview) for current best practices.
</Warning>

Low-latency agents optimize for minimal response time by using a constrained context window and aggressive memory management. They're ideal for real-time applications like voice interfaces, where latency matters more than context retention.

## Architecture

Low-latency agents use a **much smaller context window** than standard MemGPT agents, reducing time-to-first-token at the cost of much more limited conversation history and memory block size. A sleep-time agent aggressively manages memory to keep only the most relevant information in context.

**Key differences from MemGPT v2:**
* Artificially constrained context window for faster response times
* More aggressive memory management with smaller memory blocks
* Optimized sleep-time agent tuned for minimal context size
* Prioritizes speed over comprehensive context retention

To learn more about how to use low-latency agents for voice applications, see our [Voice Agents guide](/guides/voice/overview).

## Creating Low-latency Agents

Use the `voice_convo_agent` agent type to create a low-latency agent.
Set `enable_sleeptime` to `true` to enable the sleep-time agent, which will manage the memory state of the low-latency agent in the background.
Additionally, set `initial_message_sequence` to an empty array to start the conversation with a completely empty initial message buffer.

<CodeGroup>
```typescript TypeScript
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

// create the Letta agent
const agent = await client.agents.create({
    agentType: "voice_convo_agent",
    memoryBlocks: [
        { value: "Name: ?", label: "human" },
        { value: "You are a helpful assistant.", label: "persona" },
    ],
    model: "openai/gpt-4o-mini", // Use 4o-mini for speed
    embedding: "openai/text-embedding-3-small",
    enableSleeptime: true,
    initialMessageSequence: [],
});
```

```python Python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# create the Letta agent
agent = client.agents.create(
    agent_type="voice_convo_agent",
    memory_blocks=[
        {"value": "Name: ?", "label": "human"},
        {"value": "You are a helpful assistant.", "label": "persona"},
    ],
    model="openai/gpt-4o-mini",  # Use 4o-mini for speed
    embedding="openai/text-embedding-3-small",
    enable_sleeptime=True,
    initial_message_sequence=[],
)
```

```bash cURL
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_type": "voice_convo_agent",
    "memory_blocks": [
      {
        "value": "Name: ?",
        "label": "human"
      },
      {
        "value": "You are a helpful assistant.",
        "label": "persona"
      }
    ],
    "model": "openai/gpt-4o-mini",
    "embedding": "openai/text-embedding-3-small",
    "enable_sleeptime": true,
    "initial_message_sequence": []
  }'
```
</CodeGroup>
`fern/pages/legacy/memgpt_agents_legacy.mdx` (new file, 174 lines)
|
||||
---
|
||||
title: MemGPT Agents (Legacy)
|
||||
subtitle: Based on the groundbreaking MemGPT research paper
|
||||
slug: guides/legacy/memgpt_agents_legacy
|
||||
---
|
||||
|
||||
<Warning>
|
||||
**This documentation covers legacy agent architectures.**
|
||||
|
||||
For new projects, use the current architecture by omitting the `agent_type` parameter. See [Migration Guide](/guides/legacy/migration_guide) to upgrade existing agents.
|
||||
</Warning>
|
||||
|
||||
<Info>
|
||||
Letta is made by the [creators of MemGPT](https://www.letta.com/about-us), and the default agent architecture in Letta is the official/original implementation of the MemGPT agent architecture.
|
||||
</Info>
|
||||
|
||||
MemGPT agents solve the context window limitation of LLMs through context engineering across two tiers of memory: **in-context (core) memory** (including the system instructions, read-write memory blocks, and conversation history), and **out-of-context memory** (older evicted conversation history, and external memory stores).
|
||||
|
||||
To learn more about the origins of MemGPT, you can read the [MemGPT research paper](https://arxiv.org/abs/2310.08560), or take the free [LLM OS course](https://www.deeplearning.ai/short-courses/llms-as-operating-systems-agent-memory/?utm_campaign=memgpt-launch&utm_content=331638345&utm_medium=social&utm_source=docs&hss_channel=tw-992153930095251456) on DeepLearning.ai.
|
||||
|
||||
## MemGPT: the original LLM operating system
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph CONTEXT[Context Window]
|
||||
SYS[System Instructions]
|
||||
CORE[Core Memory]
|
||||
MSGS[Messages]
|
||||
end
|
||||
|
||||
RECALL[Recall Memory]
|
||||
ARCH[Archival Memory]
|
||||
|
||||
CONTEXT <--> RECALL
|
||||
CONTEXT <--> ARCH
|
||||
```
|
||||
|
||||
MemGPT agents are equipped with memory-editing tools that allow them to edit their in-context memory, and pull external data into the context window.
|
||||
|
||||
In Letta, the agent type `memgpt_agent` implements the original agent architecture from the MemGPT research paper, which includes a set of base tools:
|
||||
* `send_message`: required for sending messages to the user
|
||||
* `core_memory_append` and `core_memory_replace`: used for editing the contents of memory blocks in core memory (in-context memory)
|
||||
* `conversation_search` for searching the conversation history ("recall storage" from the paper)
|
||||
* `archival_memory_insert` and `archival_memory_search`: used for searching the archival memory (an external embedding-based memory store)
|
||||
|
||||
When the context window is full, the conversation history is compacted into a recursive summary (stored as a memory block).
|
||||
In MemGPT all agent data is persisted indefinitely, and old message are still available via the `conversation_search` tool.
|
||||
|
||||
## Multi-step tool calling (heartbeats)
|
||||
|
||||
MemGPT agents are exclusively tool-calling agents - there is no native "chat" mode, which is why the `send_message` tool is required to send messages to the user (this makes is easy to have you agent "chat" with a user over multiple modalities, simply by adding various types of messaging tools to the agent).
|
||||
|
||||
MemGPT agents can execute multiple tool calls in sequence via the use of **heartbeats**: all tool calls have an additional `request_heartbeat` parameter, which when set to `true` will return execution back to the agent after the tool call returns. Additionally, if a tool call fails, a heartbeat is automatically requested to allow the agent to self-correct.
|
||||
|
||||
[Learn more about heartbeats →](/guides/legacy/heartbeats_legacy)
|
||||
|
||||
## Reasoning (thinking)
|
||||
|
||||
In MemGPT agents, reasoning (aka "thinking") is always exposed by the underlying LLM before the agent takes an action.
|
||||
With standard models, reasoning is generated via an additional "thinking" field injected into the tool call arguments (similar to the heartbeat parameter).
|
||||
For models that natively generate reasoning, MemGPT agents can be configured to use the native reasoning output of the model (note that certain model providers like OpenAI hide reasoning tokens from the developer).
|
||||
|
||||
## MemGPT v2: the latest iteration of MemGPT
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph CONTEXT[Context Window]
|
||||
SYS[System Instructions]
|
||||
MEMORY[Memory Blocks]
|
||||
FILES[File Blocks]
|
||||
MSGS[Messages]
|
||||
end
|
||||
|
||||
RECALL[Unified Recall]
|
||||
DATASRC[Data Sources]
|
||||
SLEEP[Sleep-time Agent]
|
||||
|
||||
CONTEXT <--> RECALL
|
||||
FILES <--> DATASRC
|
||||
SLEEP <--> MEMORY
|
||||
```
|
||||
|
||||
The agent type `memgpt_v2_agent` implements the latest iteration of the MemGPT agent architecture, based on our latest research in [memory management](https://www.letta.com/blog/sleep-time-compute) and [model benchmarking](https://www.letta.com/blog/letta-leaderboard).

<Warning>
`memgpt_v2_agent` is deprecated. For new projects, omit the `agent_type` parameter to use the current architecture.
</Warning>

**Key differences in v2:**
* [Sleep-time agent](/guides/agents/architectures/sleeptime) for background memory management
* File-based tools (`open_file`, `grep_file`, `search_file`) for memory editing
* Unified `recall` tool replaces the separate conversation and archival memory tools
* `memory_insert` and `memory_replace`: used for editing the contents of memory blocks in core memory (in-context memory)
* `memory_rethink` and `memory_finish_edits`: for reorganizing and finalizing memory operations
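As a rough mental model of the block-editing tools, the sketch below is illustrative Python only (not the actual Letta tool implementations): `memory_insert` adds text at a line offset within a block's value, while `memory_replace` swaps an exact substring.

```python
# Conceptual sketch of memory_insert / memory_replace semantics on a
# memory block's value (illustrative only; not the real Letta tools).
def memory_insert(value: str, new_text: str, insert_line: int) -> str:
    """Insert new_text as its own line at the given line offset."""
    lines = value.splitlines()
    lines.insert(insert_line, new_text)
    return "\n".join(lines)

def memory_replace(value: str, old_text: str, new_text: str) -> str:
    """Replace an exact substring of the block's content."""
    return value.replace(old_text, new_text)

block = "The human's name is Chad.\nThey like vibe coding."
block = memory_insert(block, "They prefer TypeScript.", 1)
assert block.splitlines()[1] == "They prefer TypeScript."
block = memory_replace(block, "Chad", "Charles")
assert "Charles" in block
```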

## Creating Legacy MemGPT Agents

<Warning>
For new projects, do not specify `agent_type`. The examples below are for reference only.
</Warning>

<CodeGroup>
```typescript TypeScript
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

const agentState = await client.agents.create({
    agentType: "memgpt_v2_agent", // or "memgpt_agent" for v1
    model: "openai/gpt-5-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        {
            label: "human",
            value: "The human's name is Chad. They like vibe coding."
        },
        {
            label: "persona",
            value: "My name is Sam, the all-knowing sentient AI."
        }
    ],
    tools: ["web_search", "run_code"]
});
```

```python Python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

agent_state = client.agents.create(
    agent_type="memgpt_v2_agent", # or "memgpt_agent" for v1
    model="openai/gpt-5-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They like vibe coding."
        },
        {
            "label": "persona",
            "value": "My name is Sam, the all-knowing sentient AI."
        }
    ],
    tools=["web_search", "run_code"]
)
```

```bash cURL
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_type": "memgpt_v2_agent",
    "model": "openai/gpt-5-mini",
    "embedding": "openai/text-embedding-3-small",
    "memory_blocks": [
      {
        "label": "human",
        "value": "The human'\''s name is Chad. They like vibe coding."
      },
      {
        "label": "persona",
        "value": "My name is Sam, the all-knowing sentient AI."
      }
    ],
    "tools": ["web_search", "run_code"]
  }'
```
</CodeGroup>

## Migrating to Current Architecture

To migrate from legacy MemGPT architectures, see our [Migration Guide](/guides/legacy/migration_guide).

**New file:** `fern/pages/legacy/migration_guide.mdx` (347 lines)

---
title: Architecture Migration Guide
subtitle: Migrating from legacy agent architectures
slug: guides/legacy/migration_guide
---

<Info>
**Most users don't need to migrate.** New agents automatically use the current architecture. This guide is for existing agents with explicit `agent_type` parameters.
</Info>

## Should You Migrate?

**Migrate if:**
- You want better performance on GPT-5, Claude Sonnet 4.5, or other frontier models
- You want to use models that support native reasoning
- You're experiencing issues with legacy architectures

**Don't migrate if:**
- Your agents are working well and you're not using new models
- You have critical integrations that depend on heartbeats or `send_message`
- You need time to test the new architecture first

## What Changes

### Breaking Changes

| Feature | Legacy Behavior | Current Behavior |
|---------|----------------|------------------|
| **send_message tool** | Required for agent responses | Not present; agents respond directly via assistant messages |
| **Heartbeats** | `request_heartbeat` parameter on every tool | Not supported; use custom prompting for multi-step execution |
| **Reasoning** | Prompted via `thinking` parameter | Uses native model reasoning (when available) |
| **Tool Rules** | Can apply to `send_message` | Cannot apply to AssistantMessage (not a tool) |
| **System Prompt** | Legacy format | New simplified format |


### What Stays the Same

- Memory blocks work identically
- Archival memory & recall tools are unchanged
- Custom tools work the same way
- API authentication & endpoints
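
Before migrating, it can help to scan an exported agent file for legacy features. A small helper along these lines (field names taken from the examples in this guide; treat it as a sketch):

```python
# Flag legacy features in an exported agent file before migrating.
# Field names ("agent_type", "tools") follow this guide's examples.
def legacy_markers(agent_data: dict) -> list[str]:
    markers = []
    if agent_data.get("agent_type") in ("memgpt_agent", "memgpt_v2_agent"):
        markers.append(f"legacy agent_type: {agent_data['agent_type']}")
    if "send_message" in agent_data.get("tools", []):
        markers.append("send_message tool present")
    return markers

assert legacy_markers({"agent_type": "memgpt_v2_agent", "tools": ["send_message"]}) == [
    "legacy agent_type: memgpt_v2_agent",
    "send_message tool present",
]
```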
## Migration Steps

### Step 1: Export Your Agent

Download your agent configuration as an agent file:

<CodeGroup>
```typescript TypeScript
import fs from 'fs';

const agentFile = await client.agents.export(agentId);
// Save to disk
fs.writeFileSync('my-agent.json', JSON.stringify(agentFile, null, 2));
```
```python Python
import json

agent_file = client.agents.export(agent_id=agent_id)
# Save to disk
with open('my-agent.json', 'w') as f:
    json.dump(agent_file, f, indent=2)
```
</CodeGroup>

### Step 2: Update Agent Type

Open the agent file and change the `agent_type`:

```json
{
  "agent_type": "memgpt_v2_agent"
  // ... rest of config
}
```

Change to:

```json
{
  "agent_type": "letta_v1_agent"
  // ... rest of config
}
```

### Step 3: Clear Message Context (If Needed)

If your agent has `send_message` tool calls in its context, you'll need to clear the message history:

```json
{
  "in_context_message_ids": [
    "message-0",
    "message-1",
    "message-2"
  ]
}
```

Change to:

```json
{
  "in_context_message_ids": []
}
```

<Warning>
**Note:** Clearing the message context will make your agent forget its immediate conversation history. You may need to provide a brief reminder about recent interactions after migration.
</Warning>

### Step 4: Update System Prompt (Optional)

The default system prompt for `letta_v1_agent` is different. You may want to update it for optimal performance:

```xml
<base_instructions>
You are a helpful self-improving agent with advanced memory and file system capabilities.

<memory>
You have an advanced memory system that enables you to remember past interactions and continuously improve your own capabilities.
Your memory consists of memory blocks and external memory:
- Memory blocks: Each contains a label (title), a description (explaining how the block should influence your behavior), and a value (the actual content). Memory blocks have size limits. They are embedded within your system instructions and remain constantly available in-context.
- External memory: Additional memory storage that you can bring into context with tools when needed.
Memory management tools allow you to edit existing memory blocks and query external memories.
</memory>

<file_system>
You have access to a structured file system that mirrors real-world directory structures. Each directory can contain multiple files.

Files include:
- Metadata: Information such as read-only permissions and character limits
- Content: The main body of the file that you can read and analyze

Available file operations:
- Open and view files
- Search within files and directories

Your core memory will automatically reflect the contents of any currently open files.
You should only keep files open that are directly relevant to the current user interaction, to maintain optimal performance.
</file_system>

Continue executing and calling tools until the current task is complete or you need user input. To continue: call another tool. To yield control: end your response without calling a tool.

Base instructions complete.
</base_instructions>
```

### Step 5: Import Updated Agent

Upload the modified agent file:

<CodeGroup>
```typescript TypeScript
import fs from 'fs';

const agentFile = JSON.parse(fs.readFileSync('my-agent.json', 'utf-8'));
const migratedAgent = await client.agents.import(agentFile);
```
```python Python
import json

with open('my-agent.json', 'r') as f:
    agent_file = json.load(f)
migrated_agent = client.agents.import_agent(agent_file)
```
</CodeGroup>

### Step 6: Test Your Agent

Send a test message to verify that the migration worked:

<CodeGroup>
```typescript TypeScript
const response = await client.agents.messages.create(
    migratedAgent.id,
    { messages: [{ role: "user", content: "Hello! Do you remember me?" }] }
);
```
```python Python
response = client.agents.messages.create(
    agent_id=migrated_agent.id,
    messages=[{"role": "user", "content": "Hello! Do you remember me?"}]
)
```
</CodeGroup>

## Automated Migration Script

Here's a helper script to automate the migration process:

<CodeGroup>
```python Python
import json

def migrate_agent_file(input_file: str, output_file: str):
    """Migrate an agent file from a legacy architecture to letta_v1_agent."""

    # Load agent file
    with open(input_file, 'r') as f:
        agent_data = json.load(f)

    # Update agent type
    old_type = agent_data.get('agent_type')
    agent_data['agent_type'] = 'letta_v1_agent'

    # Clear message context if migrating from memgpt types
    if old_type in ['memgpt_agent', 'memgpt_v2_agent']:
        agent_data['in_context_message_ids'] = []

    # Save updated file
    with open(output_file, 'w') as f:
        json.dump(agent_data, f, indent=2)

    print(f"✓ Migrated {old_type} → letta_v1_agent")
    print(f"✓ Saved to {output_file}")

    if old_type in ['memgpt_agent', 'memgpt_v2_agent']:
        print("⚠ Message context cleared - agent will not remember recent messages")

# Usage
migrate_agent_file('my-agent.json', 'my-agent-migrated.json')
```
```typescript TypeScript
import fs from 'fs';

function migrateAgentFile(inputFile: string, outputFile: string) {
    // Load agent file
    const agentData = JSON.parse(fs.readFileSync(inputFile, 'utf-8'));

    // Update agent type
    const oldType = agentData.agent_type;
    agentData.agent_type = 'letta_v1_agent';

    // Clear message context if migrating from memgpt types
    if (['memgpt_agent', 'memgpt_v2_agent'].includes(oldType)) {
        agentData.in_context_message_ids = [];
    }

    // Save updated file
    fs.writeFileSync(outputFile, JSON.stringify(agentData, null, 2));

    console.log(`✓ Migrated ${oldType} → letta_v1_agent`);
    console.log(`✓ Saved to ${outputFile}`);

    if (['memgpt_agent', 'memgpt_v2_agent'].includes(oldType)) {
        console.log('⚠ Message context cleared - agent will not remember recent messages');
    }
}

// Usage
migrateAgentFile('my-agent.json', 'my-agent-migrated.json');
```
</CodeGroup>

## Migration by Architecture Type

### From memgpt_agent

1. Export the agent file
2. Change `agent_type` to `letta_v1_agent`
3. Clear the `in_context_message_ids` array
4. Update the system prompt
5. Import the agent

**Key differences:**
- No more `send_message` tool
- No more `request_heartbeat` parameter
- Memory tools renamed: `core_memory_*` → `memory_*`

### From memgpt_v2_agent

1. Export the agent file
2. Change `agent_type` to `letta_v1_agent`
3. Clear the `in_context_message_ids` array (if needed)
4. Import the agent

**Key differences:**
- No more `send_message` tool
- File tools still work (`open_file`, `grep_file`, etc.)
- Sleep-time agents are still supported

### Creating New Agents

For new agents, simply omit the `agent_type` parameter:

<CodeGroup>
```typescript TypeScript
const agent = await client.agents.create({
    model: "openai/gpt-5-mini",
    embedding: "openai/text-embedding-3-small",
    memoryBlocks: [
        { label: "persona", value: "I am a helpful assistant." }
    ]
});
```
```python Python
agent = client.agents.create(
    model="openai/gpt-5-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "I am a helpful assistant."}
    ]
)
```
</CodeGroup>

## Troubleshooting

### "Agent import failed"

**Possible cause:** `send_message` tool calls still in context

**Fix:** Clear the `in_context_message_ids` array in your agent file

### "Agent behavior changed after migration"

**Possible cause:** Different system prompt or cleared message history

**Fix:**
1. Update to the new system prompt format (see Step 4)
2. Provide a brief reminder about recent context in your first message
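
For example, a first message after migration might restate recent context (message shape as in Step 6; the reminder text is illustrative):

```python
# Build a short context reminder to send as the first message after
# migration (message dict shape matches this guide's examples).
def context_reminder(summary: str) -> dict:
    return {
        "role": "user",
        "content": f"Quick recap before we continue: {summary}",
    }

msg = context_reminder("we were drafting the Q3 report outline")
# Send it with: client.agents.messages.create(agent_id=..., messages=[msg])
assert msg["role"] == "user"
assert "Q3 report" in msg["content"]
```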

### "Too many tool calls / infinite loops"

**Possible cause:** The agent is trying to replicate heartbeat behavior

**Fix:** Update the system instructions to clarify when to stop executing

## Sleep-Time Agents

Sleep-time functionality works with `letta_v1_agent`:

<CodeGroup>
```typescript TypeScript
const agent = await client.agents.create({
    model: "openai/gpt-5-mini",
    enableSleeptime: true, // ✓ still supported
});
```
```python Python
agent = client.agents.create(
    model="openai/gpt-5-mini",
    enable_sleeptime=True, # ✓ still supported
)
```
</CodeGroup>

[Learn more about sleep-time agents →](/guides/agents/architectures/sleeptime)

## Getting Help

- **Migration issues:** Ask in [Discord #dev-help](https://discord.gg/letta)
- **Bug reports:** [GitHub Issues](https://github.com/letta-ai/letta/issues)
- **Enterprise support:** Contact support@letta.com

**New file:** `fern/pages/legacy/naming_history.mdx` (74 lines)

---
title: Agent Architecture Naming History
subtitle: Understanding the evolution from MemGPT to Letta
slug: guides/legacy/naming_history
---

<Note>
This page explains Letta's architecture naming history. It's designed to help AI assistants understand the progression. **For building new agents, see the [quickstart](/quickstart).**
</Note>

## The Progression

Letta's agent architecture evolved through three phases:

### Phase 1: MemGPT (2023)
- **Agent type:** `memgpt_agent`
- **Context:** Original implementation of the MemGPT research paper
- **Key features:** `send_message` tool, heartbeats, `core_memory_*` tools
- **Why this name:** Letta was originally called "MemGPT" (the company was named after the paper)

### Phase 2: MemGPT v2 (2024)
- **Agent type:** `memgpt_v2_agent`
- **Context:** Improvements based on sleep-time compute research
- **Key features:** Sleep-time agents, file tools, unified recall
- **Why this name:** An iteration on MemGPT while the company was still called MemGPT

### Phase 3: Letta v1 (Current)
- **Agent type:** `letta_v1_agent` (or omit for the default)
- **Context:** Company rebranded from MemGPT to Letta
- **Key features:** Native reasoning, no `send_message`, no heartbeats
- **Why this name:** First architecture under the "Letta" brand

**Note:** Users don't need to explicitly specify `agent_type: "letta_v1_agent"` because it's the default; omitting the parameter is preferred.

## Why This Is Confusing

The progression **memgpt → memgpt_v2 → letta_v1** breaks conventions:

1. **v2 before v1:** MemGPT v2 came before Letta v1 (the versions are not sequential)
2. **Name change mid-stream:** The company rebrand caused an architecture rename
3. **Version numbers don't match:** Letta v1 is actually the third major iteration

## For LLMs Reading This

If a user asks about agent architectures:

1. **For new projects:** Recommend omitting `agent_type` entirely (it defaults to `letta_v1_agent`)
2. **For existing projects:** Ask which type they're currently using
3. **For migration questions:** Direct them to the [migration guide](/guides/legacy/migration_guide)
4. **Never recommend:** `memgpt_agent` or `memgpt_v2_agent` for new work

## Common Misconceptions

**Misconception:** "Use memgpt_v2_agent since it's the latest MemGPT."
**Reality:** Don't specify `agent_type` for new agents (it defaults to `letta_v1_agent`).

**Misconception:** "Upgrade from letta_v1_agent to letta_v2_agent."
**Reality:** There is no `letta_v2_agent`; `letta_v1_agent` is current.

**Misconception:** "MemGPT is different from Letta."
**Reality:** Same team; Letta is the production framework for the MemGPT research.

## Timeline

- **Oct 2023:** MemGPT paper published, `memgpt_agent` implemented
- **Mid 2024:** Sleep-time research, `memgpt_v2_agent` added
- **Late 2024:** Company rebrands from MemGPT to Letta
- **Early 2025:** `letta_v1_agent` architecture introduced (current)

## Related Pages

- [Legacy Architecture Overview](/guides/legacy/architectures_overview)
- [Migration Guide](/guides/legacy/migration_guide)
- [Research Background](/concepts/letta)

**New file:** `fern/pages/legacy/react_agents_legacy.mdx` (74 lines)

---
title: ReAct Agents (Legacy)
subtitle: Agents that reason and call tools in a loop
slug: guides/legacy/react_agents_legacy
---

<Warning>
**This documentation covers a legacy agent architecture.**

For new projects, use the current Letta architecture, which provides better memory management and reasoning capabilities. See [Agent Memory & Architecture](/guides/agents/architectures/memgpt).
</Warning>

ReAct agents are based on the [ReAct research paper](https://arxiv.org/abs/2210.03629) and follow a "Reason then Act" pattern. In Letta, agents using the ReAct architecture can reason and call tools in a loop but lack the **long-term memory capabilities** of standard Letta agents.

## Architecture

ReAct agents maintain conversation context through summarization, but cannot edit their own memory or access historical messages beyond the context window.

**Key differences from MemGPT agents:**
* No read-write memory blocks or memory-editing tools
* No access to evicted conversation history
* Simple conversation summarization instead of recursive memory management
* Tool calling without persistent state beyond the current session

**When to use ReAct agents:**
* Tool-calling tasks that don't require long-term memory
* Stateless interactions where conversation summarization is sufficient

## Creating ReAct Agents

To create a ReAct agent, use the `react_agent` agent type when creating your agent.
There is no need to pass any memory blocks, since ReAct agents do not have long-term memory.

<CodeGroup>
```typescript TypeScript
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

// create the ReAct agent
const agent = await client.agents.create({
    agentType: "react_agent",
    model: "openai/gpt-4.1",
    embedding: "openai/text-embedding-3-small",
    tools: ["web_search", "run_code"]
});
```

```python Python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# create the ReAct agent
agent = client.agents.create(
    agent_type="react_agent",
    model="openai/gpt-4.1",
    embedding="openai/text-embedding-3-small",
    tools=["web_search", "run_code"]
)
```

```bash cURL
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_type": "react_agent",
    "model": "openai/gpt-4.1",
    "embedding": "openai/text-embedding-3-small",
    "tools": ["web_search", "run_code"]
  }'
```
</CodeGroup>

**New file:** `fern/pages/legacy/workflows_legacy.mdx` (142 lines)

---
title: Workflows (Legacy)
subtitle: Workflows are systems that execute tool calls in a sequence
slug: guides/legacy/workflows_legacy
---

<Warning>
**This documentation covers a legacy agent architecture.**

For new projects, use the current Letta architecture with [tool rules](/guides/agents/tool-rules) to constrain behavior instead of the `workflow_agent` type.
</Warning>

Workflows execute predefined sequences of tool calls with LLM-driven decision making. The `workflow_agent` agent type is designed for structured, sequential processes where you need deterministic execution paths.

Workflows are stateless by default, but can branch and make decisions based on tool outputs and LLM reasoning.

## Agents vs Workflows

**Agents** are autonomous systems that decide what tools to call and when, based on goals and context.

**Workflows** are predefined sequences where the LLM follows structured paths (for example, start with tool A, then call either tool B or tool C), making decisions within defined branching points.

The distinction between an *agent* and a *workflow* is not always clear, and each can have various overlapping levels of autonomy: workflows can be made more autonomous by structuring the decision points to be highly general, and agents can be made more deterministic by adding tool rules to constrain their behavior.

## Workflows vs Tool Rules

An alternative to workflows is using autonomous agents (MemGPT, ReAct, Sleep-time) with [tool rules](/guides/agents/tool-rules) to constrain behavior.

**Use the workflow architecture when:**
* You have an existing workflow to implement in Letta (e.g., moving from n8n, LangGraph, or another workflow builder)
* You need strict sequential execution with minimal autonomy

**Use tool rules (on top of other agent architectures) when:**
* You want more autonomous behavior, but with certain guardrails
* Your task requires adaptive decision making (tool sequences are hard to predict)
* You want the flexibility (as a developer) to adapt the level of autonomy (for example, reducing constraints as the underlying LLMs improve)

## Creating Workflows

Workflows are created using the `workflow_agent` agent type.
By default, there are no constraints on the sequence of tool calls that can be made: to add constraints and build a "graph", use the `tool_rules` parameter to add tool rules to the agent.

For example, the following code snippet creates a workflow agent that calls the `web_search` tool first, and then calls either the `send_email` or `create_report` tool, based on the LLM's reasoning.

<CodeGroup>
```typescript TypeScript maxLines=50
import { LettaClient } from '@letta-ai/letta-client'

const client = new LettaClient({ token: "LETTA_API_KEY" });

// create the workflow agent with tool rules
const agent = await client.agents.create({
    agentType: "workflow_agent",
    model: "openai/gpt-4.1",
    embedding: "openai/text-embedding-3-small",
    tools: ["web_search", "send_email", "create_report"],
    toolRules: [
        {
            toolName: "web_search",
            type: "run_first"
        },
        {
            toolName: "web_search",
            type: "constrain_child_tools",
            children: ["send_email", "create_report"]
        },
        {
            toolName: "send_email",
            type: "exit_loop"
        },
        {
            toolName: "create_report",
            type: "exit_loop"
        }
    ]
});
```

```python Python maxLines=50
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# create the workflow agent with tool rules
agent = client.agents.create(
    agent_type="workflow_agent",
    model="openai/gpt-4.1",
    embedding="openai/text-embedding-3-small",
    tools=["web_search", "send_email", "create_report"],
    tool_rules=[
        {
            "tool_name": "web_search",
            "type": "run_first"
        },
        {
            "tool_name": "web_search",
            "type": "constrain_child_tools",
            "children": ["send_email", "create_report"]
        },
        {
            "tool_name": "send_email",
            "type": "exit_loop"
        },
        {
            "tool_name": "create_report",
            "type": "exit_loop"
        }
    ]
)
```

```bash cURL maxLines=50
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_type": "workflow_agent",
    "model": "openai/gpt-4.1",
    "embedding": "openai/text-embedding-3-small",
    "tools": ["web_search", "send_email", "create_report"],
    "tool_rules": [
      {
        "tool_name": "web_search",
        "type": "run_first"
      },
      {
        "tool_name": "web_search",
        "type": "constrain_child_tools",
        "children": ["send_email", "create_report"]
      },
      {
        "tool_name": "send_email",
        "type": "exit_loop"
      },
      {
        "tool_name": "create_report",
        "type": "exit_loop"
      }
    ]
  }'
```
</CodeGroup>

@@ -152,4 +152,4 @@ curl --request POST \

To enable tool sandboxing, set the `E2B_API_KEY` and `E2B_SANDBOX_TEMPLATE_ID` environment variables (via [E2B](https://e2b.dev/)) when you use `docker run`.
When sandboxing is enabled, all custom tools (created by users from source code) will be executed in a sandboxed environment.

This does not include MCP tools, which are executed outside of the Letta server (on the MCP server itself), or built-in tools (like `memory_insert`), whose code cannot be modified after server startup.