chore: .gitattributes (#1511)

This commit is contained in:
Charles Packer
2024-07-04 14:45:35 -07:00
committed by GitHub
parent 3473b2f8f6
commit 8b13d195ce
21 changed files with 6943 additions and 6923 deletions

.gitattributes (new file, 20 lines)

@@ -0,0 +1,20 @@
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# Explicitly declare text files you want to always be normalized and converted
# to LF on checkout.
*.py text eol=lf
*.txt text eol=lf
*.md text eol=lf
*.json text eol=lf
*.yml text eol=lf
*.yaml text eol=lf
# Declare files that will always have CRLF line endings on checkout.
# (Only if you have specific Windows-only files)
*.bat text eol=crlf
# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary
*.gif binary
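
For intuition, the normalization that `text eol=lf` requests can be simulated outside of git with plain shell (git performs the equivalent conversion internally at checkout/check-in; this snippet only illustrates the byte-level effect):

```shell
# Illustration only: simulate the CRLF -> LF normalization git applies
# to files matched by `text eol=lf`.
printf 'hello\r\nworld\r\n' > crlf.txt   # a file with Windows (CRLF) endings
tr -d '\r' < crlf.txt > lf.txt           # normalized copy with LF endings
od -c lf.txt                             # inspect: no \r bytes remain
```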

.gitignore (2038 lines changed; diff suppressed because it is too large)

README.md (212 lines changed)

@@ -1,106 +1,106 @@
<p align="center">
<a href="https://memgpt.ai"><img src="https://github.com/cpacker/MemGPT/assets/5475622/80f2f418-ef92-4f7a-acab-5d642faa4991" alt="MemGPT logo"></a>
</p>
<div align="center">
<strong>MemGPT allows you to build LLM agents with long term memory & custom tools</strong>
[![Discord](https://img.shields.io/discord/1161736243340640419?label=Discord&logo=discord&logoColor=5865F2&style=flat-square&color=5865F2)](https://discord.gg/9GEQrxmVyE)
[![Twitter Follow](https://img.shields.io/badge/follow-%40MemGPT-1DA1F2?style=flat-square&logo=x&logoColor=white)](https://twitter.com/MemGPT)
[![arxiv 2310.08560](https://img.shields.io/badge/arXiv-2310.08560-B31B1B?logo=arxiv&style=flat-square)](https://arxiv.org/abs/2310.08560)
[![Documentation](https://img.shields.io/github/v/release/cpacker/MemGPT?label=Documentation&logo=readthedocs&style=flat-square)](https://memgpt.readme.io/docs)
</div>
MemGPT makes it easy to build and deploy stateful LLM agents with support for:
* Long term memory/state management
* Connections to [external data sources](https://memgpt.readme.io/docs/data_sources) (e.g. PDF files) for RAG
* Defining and calling [custom tools](https://memgpt.readme.io/docs/functions) (e.g. [google search](https://github.com/cpacker/MemGPT/blob/main/examples/google_search.py))
You can also deploy MemGPT agents as a *service*: a MemGPT server can run a multi-user, multi-agent application on top of supported LLM providers.
<img width="1000" alt="image" src="https://github.com/cpacker/MemGPT/assets/8505980/1096eb91-139a-4bc5-b908-fa585462da09">
## Installation & Setup
Install MemGPT:
```sh
pip install -U pymemgpt
```
To use MemGPT with OpenAI, set the environment variable `OPENAI_API_KEY` to your OpenAI key then run:
```sh
memgpt quickstart --backend openai
```
To use MemGPT with a free hosted endpoint, you can run:
```sh
memgpt quickstart --backend memgpt
```
For more advanced configuration options or to use a different [LLM backend](https://memgpt.readme.io/docs/endpoints) or [local LLMs](https://memgpt.readme.io/docs/local_llm), run `memgpt configure`.
## Quickstart (CLI)
You can create and chat with a MemGPT agent by running `memgpt run` in your CLI. The `run` command supports the following optional flags (see the [CLI documentation](https://memgpt.readme.io/docs/quickstart) for the full list of flags):
* `--agent`: (str) Name of agent to create or to resume chatting with.
* `--first`: (str) Allow user to send the first message.
* `--debug`: (bool) Show debug logs (default=False)
* `--no-verify`: (bool) Bypass message verification (default=False)
* `--yes`/`-y`: (bool) Skip confirmation prompt and use defaults (default=False)
You can view the list of available in-chat commands (e.g. `/memory`, `/exit`) in the [CLI documentation](https://memgpt.readme.io/docs/quickstart).
## Dev portal (alpha build)
MemGPT provides a developer portal that enables you to easily create, edit, monitor, and chat with your MemGPT agents. The easiest way to use the dev portal is to install MemGPT via **docker** (see instructions below).
<img width="1000" alt="image" src="https://github.com/cpacker/MemGPT/assets/5475622/071117c5-46a7-4953-bc9d-d74880e66258">
## Quickstart (Server)
**Option 1 (Recommended)**: Run with docker compose
1. [Install docker on your system](https://docs.docker.com/get-docker/)
2. Clone the repo: `git clone https://github.com/cpacker/MemGPT.git`
3. Copy-paste `.env.example` to `.env` and optionally modify
4. Run `docker compose up`
5. Go to `memgpt.localhost` in the browser to view the developer portal
**Option 2:** Run with the CLI:
1. Run `memgpt server`
2. Go to `localhost:8283` in the browser to view the developer portal
Once the server is running, you can use the [Python client](https://memgpt.readme.io/docs/admin-client) or [REST API](https://memgpt.readme.io/reference/api) to connect to `memgpt.localhost` (if you're running with docker compose) or `localhost:8283` (if you're running with the CLI) to create users, agents, and more. The service requires authentication with a MemGPT admin password; it is the value of `MEMGPT_SERVER_PASS` in `.env`.
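
As a rough sketch of what an authenticated admin request over HTTP looks like using only the standard library (the `/admin/users` path, payload shape, and bearer-token auth scheme here are assumptions for illustration, not taken from the API reference):

```python
import json
import urllib.request

def build_create_user_request(base_url: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical authenticated admin request.

    The endpoint path and auth header format are illustrative assumptions;
    consult the REST API reference for the actual routes.
    """
    return urllib.request.Request(
        url=f"{base_url}/admin/users",
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {password}",  # password = MEMGPT_SERVER_PASS
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_user_request("http://localhost:8283", "password")
print(req.full_url)                     # http://localhost:8283/admin/users
print(req.get_header("Authorization"))  # Bearer password
```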
## Supported Endpoints & Backends
MemGPT is designed to be model and provider agnostic. The following LLM and embedding endpoints are supported:
| Provider | LLM Endpoint | Embedding Endpoint |
|---------------------|-----------------|--------------------|
| OpenAI | ✅ | ✅ |
| Azure OpenAI | ✅ | ✅ |
| Google AI (Gemini) | ✅ | ❌ |
| Anthropic (Claude) | ✅ | ❌ |
| Groq | ✅ (alpha release) | ❌ |
| Cohere API | ✅ | ❌ |
| vLLM | ✅ | ❌ |
| Ollama | ✅ | ✅ |
| LM Studio | ✅ | ❌ |
| koboldcpp | ✅ | ❌ |
| oobabooga web UI | ✅ | ❌ |
| llama.cpp | ✅ | ❌ |
| HuggingFace TEI | ❌ | ✅ |
When using MemGPT with open LLMs (such as those downloaded from HuggingFace), the performance of MemGPT will be highly dependent on the LLM's function calling ability. You can find a list of LLMs/models that are known to work well with MemGPT on the [#model-chat channel on Discord](https://discord.gg/9GEQrxmVyE), as well as on [this spreadsheet](https://docs.google.com/spreadsheets/d/1fH-FdaO8BltTMa4kXiNCxmBCQ46PRBVp3Vn6WbPgsFs/edit?usp=sharing).
## How to Get Involved
* **Contribute to the Project**: Interested in contributing? Start by reading our [Contribution Guidelines](https://github.com/cpacker/MemGPT/tree/main/CONTRIBUTING.md).
* **Ask a Question**: Join our community on [Discord](https://discord.gg/9GEQrxmVyE) and direct your questions to the `#support` channel.
* **Report Issues or Suggest Features**: Have an issue or a feature request? Please submit them through our [GitHub Issues page](https://github.com/cpacker/MemGPT/issues).
* **Explore the Roadmap**: Curious about future developments? View and comment on our [project roadmap](https://github.com/cpacker/MemGPT/issues/1200).
* **Benchmark the Performance**: Want to benchmark the performance of a model on MemGPT? Follow our [Benchmarking Guidance](#benchmarking-guidance).
* **Join Community Events**: Stay updated with the [MemGPT event calendar](https://lu.ma/berkeley-llm-meetup) or follow our [Twitter account](https://twitter.com/MemGPT).
## Benchmarking Guidance
To evaluate the performance of a model on MemGPT, configure the appropriate model settings using `memgpt configure`, then initiate the benchmark via `memgpt benchmark`. The duration will vary depending on your hardware. The benchmark runs a predefined set of prompts through multiple iterations to test a model's function calling capabilities. You can help track which LLMs work well with MemGPT by contributing your benchmark results via [this form](https://forms.gle/XiBGKEEPFFLNSR348), which will be used to update the spreadsheet.
## Legal notices
By using MemGPT and related MemGPT services (such as the MemGPT endpoint or hosted service), you agree to our [privacy policy](https://github.com/cpacker/MemGPT/tree/main/PRIVACY.md) and [terms of service](https://github.com/cpacker/MemGPT/tree/main/TERMS.md).

File diff suppressed because it is too large


@@ -1,124 +1,124 @@
import os
from logging import CRITICAL, DEBUG, ERROR, INFO, NOTSET, WARN, WARNING
MEMGPT_DIR = os.path.join(os.path.expanduser("~"), ".memgpt")
# OpenAI error message: Invalid 'messages[1].tool_calls[0].id': string too long. Expected a string with maximum length 29, but got a string with length 36 instead.
TOOL_CALL_ID_MAX_LEN = 29
# embeddings
MAX_EMBEDDING_DIM = 4096 # maximum supported embedding size - do NOT change or else DBs will need to be reset
# tokenizers
EMBEDDING_TO_TOKENIZER_MAP = {
"text-embedding-ada-002": "cl100k_base",
}
EMBEDDING_TO_TOKENIZER_DEFAULT = "cl100k_base"
DEFAULT_MEMGPT_MODEL = "gpt-4"
DEFAULT_PERSONA = "sam_pov"
DEFAULT_HUMAN = "basic"
DEFAULT_PRESET = "memgpt_chat"
# Tools
BASE_TOOLS = [
"send_message",
"pause_heartbeats",
"conversation_search",
"conversation_search_date",
"archival_memory_insert",
"archival_memory_search",
]
# LOGGER_LOG_LEVELS maps a log-level name (e.g. from CLI input) to the corresponding logging module level value
LOGGER_LOG_LEVELS = {"CRITICAL": CRITICAL, "ERROR": ERROR, "WARN": WARN, "WARNING": WARNING, "INFO": INFO, "DEBUG": DEBUG, "NOTSET": NOTSET}
FIRST_MESSAGE_ATTEMPTS = 10
INITIAL_BOOT_MESSAGE = "Boot sequence complete. Persona activated."
INITIAL_BOOT_MESSAGE_SEND_MESSAGE_THOUGHT = "Bootup sequence complete. Persona activated. Testing messaging functionality."
STARTUP_QUOTES = [
"I think, therefore I am.",
"All those moments will be lost in time, like tears in rain.",
"More human than human is our motto.",
]
INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG = STARTUP_QUOTES[2]
CLI_WARNING_PREFIX = "Warning: "
NON_USER_MSG_PREFIX = "[This is an automated system message hidden from the user] "
# Constants to do with summarization / conversation length window
# The max amount of tokens supported by the underlying model (eg 8k for gpt-4 and Mistral 7B)
LLM_MAX_TOKENS = {
"DEFAULT": 8192,
## OpenAI models: https://platform.openai.com/docs/models/overview
# gpt-4
"gpt-4-1106-preview": 128000,
"gpt-4": 8192,
"gpt-4-32k": 32768,
"gpt-4-0613": 8192,
"gpt-4-32k-0613": 32768,
"gpt-4-0314": 8192, # legacy
"gpt-4-32k-0314": 32768, # legacy
# gpt-3.5
"gpt-3.5-turbo-1106": 16385,
"gpt-3.5-turbo": 4096,
"gpt-3.5-turbo-16k": 16385,
"gpt-3.5-turbo-0613": 4096, # legacy
"gpt-3.5-turbo-16k-0613": 16385, # legacy
"gpt-3.5-turbo-0301": 4096, # legacy
}
# The number of tokens before a system warning about upcoming truncation is sent to MemGPT
MESSAGE_SUMMARY_WARNING_FRAC = 0.75
# The error message that MemGPT will receive
# MESSAGE_SUMMARY_WARNING_STR = f"Warning: the conversation history will soon reach its maximum length and be trimmed. Make sure to save any important information from the conversation to your memory before it is removed."
# Much longer and more specific variant of the prompt
MESSAGE_SUMMARY_WARNING_STR = " ".join(
[
f"{NON_USER_MSG_PREFIX}The conversation history will soon reach its maximum length and be trimmed.",
"Do NOT tell the user about this system alert, they should not know that the history is reaching max length.",
"If there is any important new information or general memories about you or the user that you would like to save, you should save that information immediately by calling function core_memory_append, core_memory_replace, or archival_memory_insert.",
# "Remember to pass request_heartbeat = true if you would like to send a message immediately after.",
]
)
# The fraction of tokens we truncate down to
MESSAGE_SUMMARY_TRUNC_TOKEN_FRAC = 0.75
# The acknowledgement message used in the summarize sequence
MESSAGE_SUMMARY_REQUEST_ACK = "Understood, I will respond with a summary of the message (and only the summary, nothing else) once I receive the conversation history. I'm ready."
# Even when summarizing, we want to keep a handful of recent messages
# These serve as in-context examples of how to use functions / what user messages look like
MESSAGE_SUMMARY_TRUNC_KEEP_N_LAST = 3
# Default memory limits
CORE_MEMORY_PERSONA_CHAR_LIMIT = 2000
CORE_MEMORY_HUMAN_CHAR_LIMIT = 2000
# Function return limits
FUNCTION_RETURN_CHAR_LIMIT = 3000 # ~300 words
MAX_PAUSE_HEARTBEATS = 360 # in minutes
MESSAGE_CHATGPT_FUNCTION_MODEL = "gpt-3.5-turbo"
MESSAGE_CHATGPT_FUNCTION_SYSTEM_MESSAGE = "You are a helpful assistant. Keep your responses short and concise."
#### Functions related
# REQ_HEARTBEAT_MESSAGE = f"{NON_USER_MSG_PREFIX}request_heartbeat == true"
REQ_HEARTBEAT_MESSAGE = f"{NON_USER_MSG_PREFIX}Function called using request_heartbeat=true, returning control"
# FUNC_FAILED_HEARTBEAT_MESSAGE = f"{NON_USER_MSG_PREFIX}Function call failed"
FUNC_FAILED_HEARTBEAT_MESSAGE = f"{NON_USER_MSG_PREFIX}Function call failed, returning control"
FUNCTION_PARAM_NAME_REQ_HEARTBEAT = "request_heartbeat"
FUNCTION_PARAM_TYPE_REQ_HEARTBEAT = "boolean"
FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT = "Request an immediate heartbeat after function execution. Set to 'true' if you want to send a follow-up message or run a follow-up function."
RETRIEVAL_QUERY_DEFAULT_PAGE_SIZE = 5
# GLOBAL SETTINGS FOR `json.dumps()`
JSON_ENSURE_ASCII = False
# GLOBAL SETTINGS FOR `json.loads()`
JSON_LOADS_STRICT = False
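
The `LLM_MAX_TOKENS` table above works together with `MESSAGE_SUMMARY_WARNING_FRAC`: the truncation warning fires once the context grows past that fraction of the model's limit. A minimal sketch of that lookup (the `warning_threshold` helper is illustrative and not part of this file):

```python
# Illustrative sketch: warning_threshold is NOT defined in constants.py.
LLM_MAX_TOKENS = {"DEFAULT": 8192, "gpt-4": 8192, "gpt-3.5-turbo": 4096}
MESSAGE_SUMMARY_WARNING_FRAC = 0.75

def warning_threshold(model: str) -> int:
    """Token count at which the summarizer warning would fire for a model."""
    # Unknown models fall back to the DEFAULT context window.
    limit = LLM_MAX_TOKENS.get(model, LLM_MAX_TOKENS["DEFAULT"])
    return int(limit * MESSAGE_SUMMARY_WARNING_FRAC)

print(warning_threshold("gpt-4"))          # 6144
print(warning_threshold("gpt-3.5-turbo"))  # 3072
```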


@@ -1,315 +1,315 @@
import json
import re
from abc import ABC, abstractmethod
from typing import List, Optional
from colorama import Fore, Style, init
from memgpt.constants import CLI_WARNING_PREFIX, JSON_LOADS_STRICT
from memgpt.data_types import Message
from memgpt.utils import printd
init(autoreset=True)
# DEBUG = True # puts full message outputs in the terminal
DEBUG = False # only dumps important messages in the terminal
STRIP_UI = False
class AgentInterface(ABC):
"""Interfaces handle MemGPT-related events (observer pattern)
The 'msg' arg provides the scoped message, and the optional 'msg_obj' Message can provide additional metadata.
"""
@abstractmethod
def user_message(self, msg: str, msg_obj: Optional[Message] = None):
"""MemGPT receives a user message"""
raise NotImplementedError
@abstractmethod
def internal_monologue(self, msg: str, msg_obj: Optional[Message] = None):
"""MemGPT generates some internal monologue"""
raise NotImplementedError
@abstractmethod
def assistant_message(self, msg: str, msg_obj: Optional[Message] = None):
"""MemGPT uses send_message"""
raise NotImplementedError
@abstractmethod
def function_message(self, msg: str, msg_obj: Optional[Message] = None):
"""MemGPT calls a function"""
raise NotImplementedError
# @abstractmethod
# @staticmethod
# def print_messages():
# raise NotImplementedError
# @abstractmethod
# @staticmethod
# def print_messages_raw():
# raise NotImplementedError
# @abstractmethod
# @staticmethod
# def step_yield():
# raise NotImplementedError
class CLIInterface(AgentInterface):
"""Basic interface for dumping agent events to the command-line"""
@staticmethod
def important_message(msg: str):
fstr = f"{Fore.MAGENTA}{Style.BRIGHT}{{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def warning_message(msg: str):
fstr = f"{Fore.RED}{Style.BRIGHT}{{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def internal_monologue(msg: str, msg_obj: Optional[Message] = None):
# ANSI escape code for italic is '\x1B[3m'
fstr = f"\x1B[3m{Fore.LIGHTBLACK_EX}💭 {{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def assistant_message(msg: str, msg_obj: Optional[Message] = None):
fstr = f"{Fore.YELLOW}{Style.BRIGHT}🤖 {Fore.YELLOW}{{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def memory_message(msg: str, msg_obj: Optional[Message] = None):
fstr = f"{Fore.LIGHTMAGENTA_EX}{Style.BRIGHT}🧠 {Fore.LIGHTMAGENTA_EX}{{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def system_message(msg: str, msg_obj: Optional[Message] = None):
fstr = f"{Fore.MAGENTA}{Style.BRIGHT}🖥️ [system] {Fore.MAGENTA}{{msg}}{Style.RESET_ALL}"
if STRIP_UI:
fstr = "{msg}"
print(fstr.format(msg=msg))
@staticmethod
def user_message(msg: str, msg_obj: Optional[Message] = None, raw: bool = False, dump: bool = False, debug: bool = DEBUG):
def print_user_message(icon, msg, printf=print):
if STRIP_UI:
printf(f"{icon} {msg}")
else:
printf(f"{Fore.GREEN}{Style.BRIGHT}{icon} {Fore.GREEN}{msg}{Style.RESET_ALL}")
def printd_user_message(icon, msg):
return print_user_message(icon, msg)
if not (raw or dump or debug):
# we do not want to repeat the message in normal use
return
if isinstance(msg, str):
if raw:
printd_user_message("🧑", msg)
return
else:
try:
msg_json = json.loads(msg, strict=JSON_LOADS_STRICT)
except json.JSONDecodeError:
printd(f"{CLI_WARNING_PREFIX}failed to parse user message into json")
printd_user_message("🧑", msg)
return
if msg_json["type"] == "user_message":
if dump:
print_user_message("🧑", msg_json["message"])
return
msg_json.pop("type")
printd_user_message("🧑", msg_json)
elif msg_json["type"] == "heartbeat":
if debug:
msg_json.pop("type")
printd_user_message("💓", msg_json)
elif dump:
print_user_message("💓", msg_json)
return
elif msg_json["type"] == "system_message":
msg_json.pop("type")
printd_user_message("🖥️", msg_json)
else:
printd_user_message("🧑", msg_json)
@staticmethod
def function_message(msg: str, msg_obj: Optional[Message] = None, debug: bool = DEBUG):
def print_function_message(icon, msg, color=Fore.RED, printf=print):
if STRIP_UI:
printf(f"{icon} [function] {msg}")
else:
printf(f"{color}{Style.BRIGHT}{icon} [function] {color}{msg}{Style.RESET_ALL}")
def printd_function_message(icon, msg, color=Fore.RED):
return print_function_message(icon, msg, color, printf=(print if debug else printd))
if isinstance(msg, dict):
printd_function_message("", msg)
return
if msg.startswith("Success"):
printd_function_message("🟢", msg)
elif msg.startswith("Error: "):
printd_function_message("🔴", msg)
elif msg.startswith("Ran "):
# NOTE: ignore 'ran' messages that come post-execution
return
elif msg.startswith("Running "):
if debug:
printd_function_message("", msg)
else:
match = re.search(r"Running (\w+)\((.*)\)", msg)
if match:
function_name = match.group(1)
function_args = match.group(2)
if function_name in ["archival_memory_insert", "archival_memory_search", "core_memory_replace", "core_memory_append"]:
if function_name in ["archival_memory_insert", "core_memory_append", "core_memory_replace"]:
print_function_message("🧠", f"updating memory with {function_name}")
elif function_name == "archival_memory_search":
print_function_message("🧠", f"searching memory with {function_name}")
try:
msg_dict = eval(function_args)
if function_name == "archival_memory_search":
output = f'\tquery: {msg_dict["query"]}, page: {msg_dict["page"]}'
if STRIP_UI:
print(output)
else:
print(f"{Fore.RED}{output}{Style.RESET_ALL}")
elif function_name == "archival_memory_insert":
output = f'\t{msg_dict["content"]}'
if STRIP_UI:
print(output)
else:
print(f"{Style.BRIGHT}{Fore.RED}{output}{Style.RESET_ALL}")
else:
if STRIP_UI:
print(f'\t {msg_dict["old_content"]}\n\t{msg_dict["new_content"]}')
else:
print(
f'{Style.BRIGHT}\t{Fore.RED} {msg_dict["old_content"]}\n\t{Fore.GREEN}{msg_dict["new_content"]}{Style.RESET_ALL}'
)
except Exception as e:
printd(str(e))
printd(msg_dict)
elif function_name in ["conversation_search", "conversation_search_date"]:
print_function_message("🧠", f"searching memory with {function_name}")
try:
msg_dict = eval(function_args)
output = f'\tquery: {msg_dict["query"]}, page: {msg_dict["page"]}'
if STRIP_UI:
print(output)
else:
print(f"{Fore.RED}{output}{Style.RESET_ALL}")
except Exception as e:
printd(str(e))
printd(msg_dict)
else:
printd(f"{CLI_WARNING_PREFIX}did not recognize function message")
printd_function_message("", msg)
else:
try:
msg_dict = json.loads(msg, strict=JSON_LOADS_STRICT)
if "status" in msg_dict and msg_dict["status"] == "OK":
printd_function_message("", str(msg), color=Fore.GREEN)
else:
printd_function_message("", str(msg), color=Fore.RED)
except Exception:
print(f"{CLI_WARNING_PREFIX}did not recognize function message {type(msg)} {msg}")
printd_function_message("", msg)
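The `function_message` handler above recovers the function name and its arguments by regex-matching the agent's own `Running name({...})` status lines. A minimal, standalone sketch of that parsing step (using `ast.literal_eval` in place of the original's `eval`, so that text embedded in a log line cannot execute as code — a substitution, not the memgpt implementation):

```python
import ast
import re

# A "Running name({kwargs})" status line of the kind the agent emits
line = "Running archival_memory_search({'query': 'cats', 'page': 0})"

match = re.search(r"Running (\w+)\((.*)\)", line)
assert match is not None
function_name, raw_args = match.group(1), match.group(2)

# literal_eval only accepts Python literals (dicts, strings, numbers, ...),
# so a malicious or malformed line raises instead of executing
msg_dict = ast.literal_eval(raw_args)
print(function_name)                        # archival_memory_search
print(msg_dict["query"], msg_dict["page"])  # cats 0
```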
@staticmethod
def print_messages(message_sequence: List[Message], dump=False):
# rewrite to dict format
message_sequence = [msg.to_openai_dict() for msg in message_sequence]
idx = len(message_sequence)
for msg in message_sequence:
if dump:
print(f"[{idx}] ", end="")
idx -= 1
role = msg["role"]
content = msg["content"]
if role == "system":
CLIInterface.system_message(content)
elif role == "assistant":
# Differentiate between internal monologue, function calls, and messages
if msg.get("function_call"):
if content is not None:
CLIInterface.internal_monologue(content)
                    # NOTE: stale legacy path, intentionally left disabled
                    # function_message(msg["function_call"])
args = json.loads(msg["function_call"].get("arguments"), strict=JSON_LOADS_STRICT)
CLIInterface.assistant_message(args.get("message"))
# assistant_message(content)
elif msg.get("tool_calls"):
if content is not None:
CLIInterface.internal_monologue(content)
function_obj = msg["tool_calls"][0].get("function")
if function_obj:
args = json.loads(function_obj.get("arguments"), strict=JSON_LOADS_STRICT)
CLIInterface.assistant_message(args.get("message"))
else:
CLIInterface.internal_monologue(content)
elif role == "user":
CLIInterface.user_message(content, dump=dump)
elif role == "function":
CLIInterface.function_message(content, debug=dump)
elif role == "tool":
CLIInterface.function_message(content, debug=dump)
else:
                print(f"Unknown role '{role}': {content}")
@staticmethod
def print_messages_simple(message_sequence: List[Message]):
# rewrite to dict format
message_sequence = [msg.to_openai_dict() for msg in message_sequence]
for msg in message_sequence:
role = msg["role"]
content = msg["content"]
if role == "system":
CLIInterface.system_message(content)
elif role == "assistant":
CLIInterface.assistant_message(content)
elif role == "user":
CLIInterface.user_message(content, raw=True)
else:
                print(f"Unknown role '{role}': {content}")
@staticmethod
def print_messages_raw(message_sequence: List[Message]):
# rewrite to dict format
message_sequence = [msg.to_openai_dict() for msg in message_sequence]
for msg in message_sequence:
print(msg)
@staticmethod
def step_yield():
pass
@staticmethod
def step_complete():
pass
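`AgentInterface` above is a classic observer: the agent loop emits events (`user_message`, `internal_monologue`, `assistant_message`, `function_message`) and each implementation decides how to surface them — `CLIInterface` prints, but a test harness can just record. A self-contained sketch of that pattern (the class and method names mirror the shapes above but this is an illustration, not the memgpt API):

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class EventInterface(ABC):
    """Observer for agent events, mirroring AgentInterface's four hooks."""

    @abstractmethod
    def user_message(self, msg: str) -> None: ...

    @abstractmethod
    def internal_monologue(self, msg: str) -> None: ...

    @abstractmethod
    def assistant_message(self, msg: str) -> None: ...

    @abstractmethod
    def function_message(self, msg: str) -> None: ...


class CollectingInterface(EventInterface):
    """Records events instead of printing -- handy for tests."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def user_message(self, msg: str) -> None:
        self.events.append(("user", msg))

    def internal_monologue(self, msg: str) -> None:
        self.events.append(("monologue", msg))

    def assistant_message(self, msg: str) -> None:
        self.events.append(("assistant", msg))

    def function_message(self, msg: str) -> None:
        self.events.append(("function", msg))


iface = CollectingInterface()
iface.internal_monologue("planning a reply")
iface.assistant_message("hello!")
print(iface.events)
```

Swapping `CLIInterface` for a collector like this is how a streaming UI or test suite can observe the same agent loop without touching stdout.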

View File

@@ -1,358 +1,358 @@
import os
import random
import time
import uuid
from typing import List, Optional, Union
import requests
from memgpt.constants import CLI_WARNING_PREFIX
from memgpt.credentials import MemGPTCredentials
from memgpt.data_types import Message
from memgpt.llm_api.anthropic import anthropic_chat_completions_request
from memgpt.llm_api.azure_openai import (
MODEL_TO_AZURE_ENGINE,
azure_openai_chat_completions_request,
)
from memgpt.llm_api.cohere import cohere_chat_completions_request
from memgpt.llm_api.google_ai import (
convert_tools_to_google_ai_format,
google_ai_chat_completions_request,
)
from memgpt.llm_api.openai import (
openai_chat_completions_process_stream,
openai_chat_completions_request,
)
from memgpt.local_llm.chat_completion_proxy import get_chat_completion
from memgpt.models.chat_completion_request import (
ChatCompletionRequest,
Tool,
cast_message_to_subtype,
)
from memgpt.models.chat_completion_response import ChatCompletionResponse
from memgpt.models.pydantic_models import LLMConfigModel
from memgpt.streaming_interface import (
AgentChunkStreamingInterface,
AgentRefreshStreamingInterface,
)
LLM_API_PROVIDER_OPTIONS = ["openai", "azure", "anthropic", "google_ai", "cohere", "local"]
def is_context_overflow_error(exception: requests.exceptions.RequestException) -> bool:
"""Checks if an exception is due to context overflow (based on common OpenAI response messages)"""
from memgpt.utils import printd
match_string = "maximum context length"
# Backwards compatibility with openai python package/client v0.28 (pre-v1 client migration)
if match_string in str(exception):
printd(f"Found '{match_string}' in str(exception)={(str(exception))}")
return True
# Based on python requests + OpenAI REST API (/v1)
elif isinstance(exception, requests.exceptions.HTTPError):
if exception.response is not None and "application/json" in exception.response.headers.get("Content-Type", ""):
try:
error_details = exception.response.json()
if "error" not in error_details:
printd(f"HTTPError occurred, but couldn't find error field: {error_details}")
return False
else:
error_details = error_details["error"]
# Check for the specific error code
if error_details.get("code") == "context_length_exceeded":
printd(f"HTTPError occurred, caught error code {error_details.get('code')}")
return True
# Soft-check for "maximum context length" inside of the message
elif error_details.get("message") and "maximum context length" in error_details.get("message"):
printd(f"HTTPError occurred, found '{match_string}' in error message contents ({error_details})")
return True
else:
printd(f"HTTPError occurred, but unknown error message: {error_details}")
return False
            except ValueError:
                # JSON decoding failed
                printd(f"HTTPError occurred ({exception}), but no JSON error message.")
        # HTTPError without a recognizable overflow payload is not an overflow
        return False
    # Generic fail
    else:
        return False
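The JSON branch of `is_context_overflow_error` boils down to one decision over the decoded OpenAI-style `{"error": {"code": ..., "message": ...}}` envelope: an explicit `context_length_exceeded` code, or the soft "maximum context length" substring in the message. A standalone sketch of just that payload check (illustrative helper name, not the memgpt function):

```python
def payload_signals_overflow(error_details: dict) -> bool:
    """Return True if an OpenAI-style error body indicates context overflow."""
    err = error_details.get("error")
    if not isinstance(err, dict):
        return False
    # Hard check: the documented error code
    if err.get("code") == "context_length_exceeded":
        return True
    # Soft check: the well-known message substring
    return "maximum context length" in (err.get("message") or "")


print(payload_signals_overflow({"error": {"code": "context_length_exceeded"}}))  # True
print(payload_signals_overflow(
    {"error": {"message": "This model's maximum context length is 8192 tokens"}}
))  # True
print(payload_signals_overflow({"detail": "unrelated failure"}))  # False
```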
def retry_with_exponential_backoff(
func,
initial_delay: float = 1,
exponential_base: float = 2,
jitter: bool = True,
max_retries: int = 20,
# List of OpenAI error codes: https://github.com/openai/openai-python/blob/17ac6779958b2b74999c634c4ea4c7b74906027a/src/openai/_client.py#L227-L250
# 429 = rate limit
error_codes: tuple = (429,),
):
"""Retry a function with exponential backoff."""
    def wrapper(*args, **kwargs):
# Initialize variables
num_retries = 0
delay = initial_delay
# Loop until a successful response or max_retries is hit or an exception is raised
while True:
try:
return func(*args, **kwargs)
except requests.exceptions.HTTPError as http_err:
# Retry on specified errors
if http_err.response.status_code in error_codes:
# Increment retries
num_retries += 1
# Check if max retries has been reached
if num_retries > max_retries:
raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
# Increment the delay
delay *= exponential_base * (1 + jitter * random.random())
# Sleep for the delay
# printd(f"Got a rate limit error ('{http_err}') on LLM backend request, waiting {int(delay)}s then retrying...")
print(
f"{CLI_WARNING_PREFIX}Got a rate limit error ('{http_err}') on LLM backend request, waiting {int(delay)}s then retrying..."
)
time.sleep(delay)
else:
# For other HTTP errors, re-raise the exception
raise
# Raise exceptions for any errors not specified
        except Exception:
            raise
return wrapper
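The decorator above retries only on the listed HTTP status codes, multiplying the delay by `exponential_base` (plus optional jitter) between attempts. A compact, runnable sketch of the same backoff pattern with the HTTP specifics stripped out (`ConnectionError` stands in for a rate-limit response; the names here are illustrative):

```python
import random
import time


def retry_with_backoff(func, max_retries=3, initial_delay=0.01, base=2.0, jitter=False):
    """Retry func on ConnectionError, sleeping initial_delay * base**attempt."""

    def wrapper(*args, **kwargs):
        delay = initial_delay
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except ConnectionError:
                if attempt == max_retries:
                    raise  # out of retries: surface the last error
                time.sleep(delay)
                # grow the delay; jitter spreads retries from concurrent clients
                delay *= base * (1 + (random.random() if jitter else 0.0))

    return wrapper


attempts = []


def flaky():
    # fails twice (simulating HTTP 429), then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("rate limited")
    return "ok"


result = retry_with_backoff(flaky)()
print(result, len(attempts))  # ok 3
```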
@retry_with_exponential_backoff
def create(
# agent_state: AgentState,
llm_config: LLMConfigModel,
messages: List[Message],
    user_id: Optional[uuid.UUID] = None,  # optional UUID to associate the request with
functions: list = None,
functions_python: list = None,
function_call: str = "auto",
# hint
first_message: bool = False,
# use tool naming?
# if false, will use deprecated 'functions' style
use_tool_naming: bool = True,
# streaming?
stream: bool = False,
stream_inferface: Optional[Union[AgentRefreshStreamingInterface, AgentChunkStreamingInterface]] = None,
) -> ChatCompletionResponse:
"""Return response to chat completion with backoff"""
from memgpt.utils import printd
printd(f"Using model {llm_config.model_endpoint_type}, endpoint: {llm_config.model_endpoint}")
# TODO eventually refactor so that credentials are passed through
credentials = MemGPTCredentials.load()
if function_call and not functions:
printd("unsetting function_call because functions is None")
function_call = None
# openai
if llm_config.model_endpoint_type == "openai":
# TODO do the same for Azure?
if credentials.openai_key is None and llm_config.model_endpoint == "https://api.openai.com/v1":
# only is a problem if we are *not* using an openai proxy
            raise ValueError("OpenAI key is missing from MemGPT config file")
if use_tool_naming:
data = ChatCompletionRequest(
model=llm_config.model,
messages=[cast_message_to_subtype(m.to_openai_dict()) for m in messages],
tools=[{"type": "function", "function": f} for f in functions] if functions else None,
tool_choice=function_call,
user=str(user_id),
)
else:
data = ChatCompletionRequest(
model=llm_config.model,
messages=[cast_message_to_subtype(m.to_openai_dict()) for m in messages],
functions=functions,
function_call=function_call,
user=str(user_id),
)
# https://platform.openai.com/docs/guides/text-generation/json-mode
# only supported by gpt-4o, gpt-4-turbo, or gpt-3.5-turbo
if "gpt-4o" in llm_config.model or "gpt-4-turbo" in llm_config.model or "gpt-3.5-turbo" in llm_config.model:
data.response_format = {"type": "json_object"}
if stream: # Client requested token streaming
data.stream = True
            assert isinstance(
                stream_inferface, (AgentChunkStreamingInterface, AgentRefreshStreamingInterface)
            ), type(stream_inferface)
return openai_chat_completions_process_stream(
url=llm_config.model_endpoint, # https://api.openai.com/v1 -> https://api.openai.com/v1/chat/completions
api_key=credentials.openai_key,
chat_completion_request=data,
stream_inferface=stream_inferface,
)
else: # Client did not request token streaming (expect a blocking backend response)
data.stream = False
if isinstance(stream_inferface, AgentChunkStreamingInterface):
stream_inferface.stream_start()
try:
response = openai_chat_completions_request(
url=llm_config.model_endpoint, # https://api.openai.com/v1 -> https://api.openai.com/v1/chat/completions
api_key=credentials.openai_key,
chat_completion_request=data,
)
finally:
if isinstance(stream_inferface, AgentChunkStreamingInterface):
stream_inferface.stream_end()
return response
# azure
elif llm_config.model_endpoint_type == "azure":
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
azure_deployment = (
credentials.azure_deployment if credentials.azure_deployment is not None else MODEL_TO_AZURE_ENGINE[llm_config.model]
)
if use_tool_naming:
data = dict(
# NOTE: don't pass model to Azure calls, that is the deployment_id
# model=agent_config.model,
messages=messages,
tools=[{"type": "function", "function": f} for f in functions] if functions else None,
tool_choice=function_call,
user=str(user_id),
)
else:
data = dict(
# NOTE: don't pass model to Azure calls, that is the deployment_id
# model=agent_config.model,
messages=messages,
functions=functions,
function_call=function_call,
user=str(user_id),
)
return azure_openai_chat_completions_request(
resource_name=credentials.azure_endpoint,
deployment_id=azure_deployment,
api_version=credentials.azure_version,
api_key=credentials.azure_key,
data=data,
)
elif llm_config.model_endpoint_type == "google_ai":
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
if not use_tool_naming:
raise NotImplementedError("Only tool calling supported on Google AI API requests")
# NOTE: until Google AI supports CoT / text alongside function calls,
# we need to put it in a kwarg (unless we want to split the message into two)
google_ai_inner_thoughts_in_kwarg = True
if functions is not None:
tools = [{"type": "function", "function": f} for f in functions]
tools = [Tool(**t) for t in tools]
tools = convert_tools_to_google_ai_format(tools, inner_thoughts_in_kwargs=google_ai_inner_thoughts_in_kwarg)
else:
tools = None
return google_ai_chat_completions_request(
inner_thoughts_in_kwargs=google_ai_inner_thoughts_in_kwarg,
service_endpoint=credentials.google_ai_service_endpoint,
model=llm_config.model,
api_key=credentials.google_ai_key,
# see structure of payload here: https://ai.google.dev/docs/function_calling
data=dict(
contents=[m.to_google_ai_dict() for m in messages],
tools=tools,
),
)
elif llm_config.model_endpoint_type == "anthropic":
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
if not use_tool_naming:
raise NotImplementedError("Only tool calling supported on Anthropic API requests")
if functions is not None:
tools = [{"type": "function", "function": f} for f in functions]
tools = [Tool(**t) for t in tools]
else:
tools = None
return anthropic_chat_completions_request(
url=llm_config.model_endpoint,
api_key=credentials.anthropic_key,
data=ChatCompletionRequest(
model=llm_config.model,
messages=[cast_message_to_subtype(m.to_openai_dict()) for m in messages],
tools=[{"type": "function", "function": f} for f in functions] if functions else None,
# tool_choice=function_call,
# user=str(user_id),
# NOTE: max_tokens is required for Anthropic API
max_tokens=1024, # TODO make dynamic
),
)
elif llm_config.model_endpoint_type == "cohere":
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
if not use_tool_naming:
raise NotImplementedError("Only tool calling supported on Cohere API requests")
if functions is not None:
tools = [{"type": "function", "function": f} for f in functions]
tools = [Tool(**t) for t in tools]
else:
tools = None
return cohere_chat_completions_request(
# url=llm_config.model_endpoint,
url="https://api.cohere.ai/v1", # TODO
api_key=os.getenv("COHERE_API_KEY"), # TODO remove
chat_completion_request=ChatCompletionRequest(
model="command-r-plus", # TODO
messages=[cast_message_to_subtype(m.to_openai_dict()) for m in messages],
tools=[{"type": "function", "function": f} for f in functions] if functions else None,
tool_choice=function_call,
# user=str(user_id),
# NOTE: max_tokens is required for Anthropic API
# max_tokens=1024, # TODO make dynamic
),
)
# local model
else:
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
return get_chat_completion(
model=llm_config.model,
messages=messages,
functions=functions,
functions_python=functions_python,
function_call=function_call,
context_window=llm_config.context_window,
endpoint=llm_config.model_endpoint,
endpoint_type=llm_config.model_endpoint_type,
wrapper=llm_config.model_wrapper,
user=str(user_id),
# hint
first_message=first_message,
# auth-related
auth_type=credentials.openllm_auth_type,
auth_key=credentials.openllm_key,
)
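`create()` above is essentially a dispatch over `llm_config.model_endpoint_type`: each known provider gets its own request builder, and anything unrecognized falls through to the local-model path. That control flow can be sketched as a lookup table (handler names and payloads here are illustrative stand-ins, not the real memgpt signatures):

```python
from typing import Callable, Dict, Tuple


def call_openai(req: dict) -> Tuple[str, dict]:
    return ("openai", req)


def call_azure(req: dict) -> Tuple[str, dict]:
    return ("azure", req)


def call_local(req: dict) -> Tuple[str, dict]:
    return ("local", req)


PROVIDERS: Dict[str, Callable[[dict], Tuple[str, dict]]] = {
    "openai": call_openai,
    "azure": call_azure,
}


def dispatch(endpoint_type: str, req: dict) -> Tuple[str, dict]:
    # unknown endpoint types use the local backend, mirroring the final else
    handler = PROVIDERS.get(endpoint_type, call_local)
    return handler(req)


print(dispatch("openai", {"model": "gpt-4"})[0])   # openai
print(dispatch("webui", {"model": "dolphin"})[0])  # local
```

A table like this makes adding a provider a one-line registration instead of another `elif` branch, at the cost of pulling shared pre-checks (streaming support, tool naming) out into the handlers.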
api_key=os.getenv("COHERE_API_KEY"), # TODO remove
chat_completion_request=ChatCompletionRequest(
model="command-r-plus", # TODO
messages=[cast_message_to_subtype(m.to_openai_dict()) for m in messages],
tools=[{"type": "function", "function": f} for f in functions] if functions else None,
tool_choice=function_call,
# user=str(user_id),
# max_tokens=1024, # TODO make dynamic
),
)
# local model
else:
if stream:
raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
return get_chat_completion(
model=llm_config.model,
messages=messages,
functions=functions,
functions_python=functions_python,
function_call=function_call,
context_window=llm_config.context_window,
endpoint=llm_config.model_endpoint,
endpoint_type=llm_config.model_endpoint_type,
wrapper=llm_config.model_wrapper,
user=str(user_id),
# hint
first_message=first_message,
# auth-related
auth_type=credentials.openllm_auth_type,
auth_key=credentials.openllm_key,
)
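The branch ladder above dispatches on `llm_config.model_endpoint_type`. The same control flow can be sketched as a small handler registry; the handler names and return payloads below are illustrative stand-ins, not MemGPT's real functions:

```python
# Minimal sketch of the endpoint-type dispatch above; handlers and payloads
# are illustrative stand-ins, not MemGPT's real request functions.
def _openai_style_request(model, messages):
    return {"backend": "openai", "model": model, "n_messages": len(messages)}

def _local_request(model, messages):
    return {"backend": "local", "model": model, "n_messages": len(messages)}

ENDPOINT_HANDLERS = {
    "openai": _openai_style_request,
    "azure": _openai_style_request,  # Azure reuses the OpenAI-style payload
    "webui": _local_request,
    "ollama": _local_request,
}

def dispatch_chat_completion(endpoint_type, model, messages):
    handler = ENDPOINT_HANDLERS.get(endpoint_type)
    if handler is None:
        # mirrors the fall-through NotImplementedError handling in the real code
        raise NotImplementedError(f"No handler for endpoint type {endpoint_type!r}")
    return handler(model, messages)
```

A registry like this keeps each backend's quirks (tool naming, streaming support, auth) in one place instead of a long `elif` chain.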


@@ -1,3 +1,3 @@
# MemGPT + local LLMs
See [https://memgpt.readme.io/docs/local_llm](https://memgpt.readme.io/docs/local_llm) for documentation on running MemGPT with custom LLM backends.


@@ -1,280 +1,280 @@
"""Key idea: create drop-in replacement for agent's ChatCompletion call that runs on an OpenLLM backend"""
import json
import uuid
import requests
from memgpt.constants import CLI_WARNING_PREFIX, JSON_ENSURE_ASCII
from memgpt.errors import LocalLLMConnectionError, LocalLLMError
from memgpt.local_llm.constants import DEFAULT_WRAPPER
from memgpt.local_llm.function_parser import patch_function
from memgpt.local_llm.grammars.gbnf_grammar_generator import (
create_dynamic_model_from_function,
generate_gbnf_grammar_and_documentation,
)
from memgpt.local_llm.groq.api import get_groq_completion
from memgpt.local_llm.koboldcpp.api import get_koboldcpp_completion
from memgpt.local_llm.llamacpp.api import get_llamacpp_completion
from memgpt.local_llm.llm_chat_completion_wrappers import simple_summary_wrapper
from memgpt.local_llm.lmstudio.api import get_lmstudio_completion
from memgpt.local_llm.ollama.api import get_ollama_completion
from memgpt.local_llm.utils import count_tokens, get_available_wrappers
from memgpt.local_llm.vllm.api import get_vllm_completion
from memgpt.local_llm.webui.api import get_webui_completion
from memgpt.local_llm.webui.legacy_api import (
get_webui_completion as get_webui_completion_legacy,
)
from memgpt.models.chat_completion_response import (
ChatCompletionResponse,
Choice,
Message,
ToolCall,
UsageStatistics,
)
from memgpt.prompts.gpt_summarize import SYSTEM as SUMMARIZE_SYSTEM_MESSAGE
from memgpt.utils import get_tool_call_id, get_utc_time
has_shown_warning = False
grammar_supported_backends = ["koboldcpp", "llamacpp", "webui", "webui-legacy"]
def get_chat_completion(
model,
# no model required (except for Ollama), since the model is fixed to whatever you set in your own backend
messages,
functions=None,
functions_python=None,
function_call="auto",
context_window=None,
user=None,
# required
wrapper=None,
endpoint=None,
endpoint_type=None,
# optional cleanup
function_correction=True,
# extra hints to allow for additional prompt formatting hacks
# TODO this could alternatively be supported via passing function_call="send_message" into the wrapper
first_message=False,
# optional auth headers
auth_type=None,
auth_key=None,
) -> ChatCompletionResponse:
from memgpt.utils import printd
assert context_window is not None, "Local LLM calls need the context length to be explicitly set"
assert endpoint is not None, "Local LLM calls need the endpoint (eg http://localendpoint:1234) to be explicitly set"
assert endpoint_type is not None, "Local LLM calls need the endpoint type (eg webui) to be explicitly set"
global has_shown_warning
grammar = None
# TODO: eventually just process Message object
if not isinstance(messages[0], dict):
messages = [m.to_openai_dict() for m in messages]
if function_call is not None and function_call != "auto":
raise ValueError(f"function_call == {function_call} not supported (auto or None only)")
available_wrappers = get_available_wrappers()
documentation = None
# Special case for if the call we're making is coming from the summarizer
if messages[0]["role"] == "system" and messages[0]["content"].strip() == SUMMARIZE_SYSTEM_MESSAGE.strip():
llm_wrapper = simple_summary_wrapper.SimpleSummaryWrapper()
# Select a default prompt formatter
elif wrapper is None:
# Warn the user that we're using the fallback
if not has_shown_warning:
print(
f"{CLI_WARNING_PREFIX}no wrapper specified for local LLM, using the default wrapper (you can remove this warning by specifying the wrapper with --model-wrapper)"
)
has_shown_warning = True
llm_wrapper = DEFAULT_WRAPPER()
# User provided an incorrect prompt formatter
elif wrapper not in available_wrappers:
        raise ValueError(f"Could not find requested wrapper '{wrapper}' in available wrappers list:\n{', '.join(available_wrappers)}")
# User provided a correct prompt formatter
else:
llm_wrapper = available_wrappers[wrapper]
# If the wrapper uses grammar, generate the grammar using the grammar generating function
# TODO move this to a flag
if wrapper is not None and "grammar" in wrapper:
# When using grammars, we don't want to do any extras output tricks like appending a response prefix
setattr(llm_wrapper, "assistant_prefix_extra_first_message", "")
setattr(llm_wrapper, "assistant_prefix_extra", "")
# TODO find a better way to do this than string matching (eg an attribute)
if "noforce" in wrapper:
# "noforce" means that the prompt formatter expects inner thoughts as a top-level parameter
# this is closer to the OpenAI style since it allows for messages w/o any function calls
# however, with bad LLMs it makes it easier for the LLM to "forget" to call any of the functions
grammar, documentation = generate_grammar_and_documentation(
functions_python=functions_python,
add_inner_thoughts_top_level=True,
add_inner_thoughts_param_level=False,
allow_only_inner_thoughts=True,
)
else:
# otherwise, the other prompt formatters will insert inner thoughts as a function call parameter (by default)
# this means that every response from the LLM will be required to call a function
grammar, documentation = generate_grammar_and_documentation(
functions_python=functions_python,
add_inner_thoughts_top_level=False,
add_inner_thoughts_param_level=True,
allow_only_inner_thoughts=False,
)
printd(grammar)
if grammar is not None and endpoint_type not in grammar_supported_backends:
print(
f"{CLI_WARNING_PREFIX}grammars are currently not supported when using {endpoint_type} as the MemGPT local LLM backend (supported: {', '.join(grammar_supported_backends)})"
)
grammar = None
# First step: turn the message sequence into a prompt that the model expects
try:
# if hasattr(llm_wrapper, "supports_first_message"):
if hasattr(llm_wrapper, "supports_first_message") and llm_wrapper.supports_first_message:
prompt = llm_wrapper.chat_completion_to_prompt(
messages=messages, functions=functions, first_message=first_message, function_documentation=documentation
)
else:
prompt = llm_wrapper.chat_completion_to_prompt(messages=messages, functions=functions, function_documentation=documentation)
printd(prompt)
except Exception as e:
print(e)
raise LocalLLMError(
f"Failed to convert ChatCompletion messages into prompt string with wrapper {str(llm_wrapper)} - error: {str(e)}"
)
try:
if endpoint_type == "webui":
result, usage = get_webui_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
elif endpoint_type == "webui-legacy":
result, usage = get_webui_completion_legacy(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
elif endpoint_type == "lmstudio":
result, usage = get_lmstudio_completion(endpoint, auth_type, auth_key, prompt, context_window, api="completions")
elif endpoint_type == "lmstudio-legacy":
result, usage = get_lmstudio_completion(endpoint, auth_type, auth_key, prompt, context_window, api="chat")
elif endpoint_type == "llamacpp":
result, usage = get_llamacpp_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
elif endpoint_type == "koboldcpp":
result, usage = get_koboldcpp_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
elif endpoint_type == "ollama":
result, usage = get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
elif endpoint_type == "vllm":
result, usage = get_vllm_completion(endpoint, auth_type, auth_key, model, prompt, context_window, user)
elif endpoint_type == "groq":
result, usage = get_groq_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
else:
raise LocalLLMError(
f"Invalid endpoint type {endpoint_type}, please set variable depending on your backend (webui, lmstudio, llamacpp, koboldcpp)"
)
    except requests.exceptions.ConnectionError as e:
        raise LocalLLMConnectionError(f"Unable to connect to endpoint {endpoint}") from e
if result is None or result == "":
raise LocalLLMError(f"Got back an empty response string from {endpoint}")
printd(f"Raw LLM output:\n====\n{result}\n====")
try:
if hasattr(llm_wrapper, "supports_first_message") and llm_wrapper.supports_first_message:
chat_completion_result = llm_wrapper.output_to_chat_completion_response(result, first_message=first_message)
else:
chat_completion_result = llm_wrapper.output_to_chat_completion_response(result)
printd(json.dumps(chat_completion_result, indent=2, ensure_ascii=JSON_ENSURE_ASCII))
except Exception as e:
raise LocalLLMError(f"Failed to parse JSON from local LLM response - error: {str(e)}")
# Run through some manual function correction (optional)
if function_correction:
chat_completion_result = patch_function(message_history=messages, new_message=chat_completion_result)
# Fill in potential missing usage information (used for tracking token use)
if not ("prompt_tokens" in usage and "completion_tokens" in usage and "total_tokens" in usage):
raise LocalLLMError(f"usage dict in response was missing fields ({usage})")
if usage["prompt_tokens"] is None:
printd(f"usage dict was missing prompt_tokens, computing on-the-fly...")
usage["prompt_tokens"] = count_tokens(prompt)
# NOTE: we should compute on-the-fly anyways since we might have to correct for errors during JSON parsing
usage["completion_tokens"] = count_tokens(json.dumps(chat_completion_result, ensure_ascii=JSON_ENSURE_ASCII))
"""
if usage["completion_tokens"] is None:
printd(f"usage dict was missing completion_tokens, computing on-the-fly...")
# chat_completion_result is dict with 'role' and 'content'
# token counter wants a string
usage["completion_tokens"] = count_tokens(json.dumps(chat_completion_result, ensure_ascii=JSON_ENSURE_ASCII))
"""
# NOTE: this is the token count that matters most
if usage["total_tokens"] is None:
printd(f"usage dict was missing total_tokens, computing on-the-fly...")
usage["total_tokens"] = usage["prompt_tokens"] + usage["completion_tokens"]
# unpack with response.choices[0].message.content
response = ChatCompletionResponse(
id=str(uuid.uuid4()), # TODO something better?
choices=[
Choice(
finish_reason="stop",
index=0,
message=Message(
role=chat_completion_result["role"],
content=chat_completion_result["content"],
tool_calls=(
[ToolCall(id=get_tool_call_id(), type="function", function=chat_completion_result["function_call"])]
if "function_call" in chat_completion_result
else []
),
),
)
],
created=get_utc_time(),
model=model,
# "This fingerprint represents the backend configuration that the model runs with."
# system_fingerprint=user if user is not None else "null",
system_fingerprint=None,
object="chat.completion",
usage=UsageStatistics(**usage),
)
printd(response)
return response
def generate_grammar_and_documentation(
functions_python: dict,
add_inner_thoughts_top_level: bool,
add_inner_thoughts_param_level: bool,
allow_only_inner_thoughts: bool,
):
from memgpt.utils import printd
assert not (
add_inner_thoughts_top_level and add_inner_thoughts_param_level
), "Can only place inner thoughts in one location in the grammar generator"
grammar_function_models = []
# create_dynamic_model_from_function will add inner thoughts to the function parameters if add_inner_thoughts is True.
# generate_gbnf_grammar_and_documentation will add inner thoughts to the outer object of the function parameters if add_inner_thoughts is True.
for key, func in functions_python.items():
grammar_function_models.append(create_dynamic_model_from_function(func, add_inner_thoughts=add_inner_thoughts_param_level))
grammar, documentation = generate_gbnf_grammar_and_documentation(
grammar_function_models,
outer_object_name="function",
outer_object_content="params",
model_prefix="function",
fields_prefix="params",
add_inner_thoughts=add_inner_thoughts_top_level,
allow_only_inner_thoughts=allow_only_inner_thoughts,
)
printd(grammar)
return grammar, documentation
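The usage back-fill in `get_chat_completion` above (recomputing token counts the backend failed to report) can be sketched in isolation. Here `fake_count_tokens` is a crude whitespace stand-in for memgpt's real `count_tokens`, used only so the sketch is self-contained:

```python
import json

def fake_count_tokens(text: str) -> int:
    # crude stand-in tokenizer; the real code uses memgpt's count_tokens
    return len(text.split())

def backfill_usage(usage: dict, prompt: str, completion_obj: dict) -> dict:
    """Fill in missing prompt/completion/total token counts, as above."""
    usage = dict(usage)  # don't mutate the caller's dict
    if usage.get("prompt_tokens") is None:
        usage["prompt_tokens"] = fake_count_tokens(prompt)
    # completion tokens are always recomputed, mirroring the code above,
    # since JSON-parse corrections may have changed the completion
    usage["completion_tokens"] = fake_count_tokens(json.dumps(completion_obj))
    if usage.get("total_tokens") is None:
        usage["total_tokens"] = usage["prompt_tokens"] + usage["completion_tokens"]
    return usage
```

The total count is the one that matters most downstream (context-window tracking), so it is derived last from whatever prompt/completion figures survive.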


@@ -1,453 +1,453 @@
import json
from ...constants import JSON_ENSURE_ASCII, JSON_LOADS_STRICT
from ...errors import LLMJSONParsingError
from ..json_parser import clean_json
from .wrapper_base import LLMChatCompletionWrapper
class Airoboros21Wrapper(LLMChatCompletionWrapper):
"""Wrapper for Airoboros 70b v2.1: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1
Note: this wrapper formats a prompt that only generates JSON, no inner thoughts
"""
def __init__(
self,
simplify_json_content=True,
clean_function_args=True,
include_assistant_prefix=True,
include_opening_brace_in_prefix=True,
include_section_separators=True,
):
self.simplify_json_content = simplify_json_content
self.clean_func_args = clean_function_args
self.include_assistant_prefix = include_assistant_prefix
self.include_opening_brance_in_prefix = include_opening_brace_in_prefix
self.include_section_separators = include_section_separators
def chat_completion_to_prompt(self, messages, functions, function_documentation=None):
"""Example for airoboros: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#prompt-format
A chat.
USER: {prompt}
ASSISTANT:
Functions support: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#agentfunction-calling
As an AI assistant, please select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format.
Input: I want to know how many times 'Python' is mentioned in my text file.
Available functions:
file_analytics:
description: This tool performs various operations on a text file.
params:
action: The operation we want to perform on the data, such as "count_occurrences", "find_line", etc.
filters:
keyword: The word or phrase we want to search for.
OpenAI functions schema style:
{
"name": "send_message",
"description": "Sends a message to the human user",
"parameters": {
"type": "object",
"properties": {
# https://json-schema.org/understanding-json-schema/reference/array.html
"message": {
"type": "string",
"description": "Message contents. All unicode (including emojis) are supported.",
},
},
"required": ["message"],
}
},
"""
prompt = ""
        # System instructions go first
assert messages[0]["role"] == "system"
prompt += messages[0]["content"]
# Next is the functions preamble
def create_function_description(schema):
            # airoboros style
func_str = ""
func_str += f"{schema['name']}:"
func_str += f"\n description: {schema['description']}"
func_str += f"\n params:"
for param_k, param_v in schema["parameters"]["properties"].items():
# TODO we're ignoring type
func_str += f"\n {param_k}: {param_v['description']}"
# TODO we're ignoring schema['parameters']['required']
return func_str
# prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format."
prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the ongoing conversation. Provide your response in JSON format."
prompt += f"\nAvailable functions:"
if function_documentation is not None:
prompt += f"\n{function_documentation}"
else:
for function_dict in functions:
prompt += f"\n{create_function_description(function_dict)}"
def create_function_call(function_call):
"""Go from ChatCompletion to Airoboros style function trace (in prompt)
ChatCompletion data (inside message['function_call']):
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
Airoboros output:
{
"function": "send_message",
"params": {
"message": "Hello there! I am Sam, an AI developed by Liminal Corp. How can I assist you today?"
}
}
"""
airo_func_call = {
"function": function_call["name"],
"params": json.loads(function_call["arguments"], strict=JSON_LOADS_STRICT),
}
return json.dumps(airo_func_call, indent=2, ensure_ascii=JSON_ENSURE_ASCII)
# Add a sep for the conversation
if self.include_section_separators:
prompt += "\n### INPUT"
# Last are the user/assistant messages
for message in messages[1:]:
assert message["role"] in ["user", "assistant", "function", "tool"], message
if message["role"] == "user":
if self.simplify_json_content:
try:
content_json = json.loads(message["content"], strict=JSON_LOADS_STRICT)
content_simple = content_json["message"]
prompt += f"\nUSER: {content_simple}"
                    except (json.JSONDecodeError, KeyError, TypeError):
prompt += f"\nUSER: {message['content']}"
elif message["role"] == "assistant":
prompt += f"\nASSISTANT: {message['content']}"
# need to add the function call if there was one
if "function_call" in message and message["function_call"]:
prompt += f"\n{create_function_call(message['function_call'])}"
elif message["role"] in ["function", "tool"]:
# TODO find a good way to add this
# prompt += f"\nASSISTANT: (function return) {message['content']}"
prompt += f"\nFUNCTION RETURN: {message['content']}"
continue
else:
raise ValueError(message)
# Add a sep for the response
if self.include_section_separators:
prompt += "\n### RESPONSE"
if self.include_assistant_prefix:
prompt += f"\nASSISTANT:"
if self.include_opening_brance_in_prefix:
prompt += "\n{"
print(prompt)
return prompt
def clean_function_args(self, function_name, function_args):
"""Some basic MemGPT-specific cleaning of function args"""
cleaned_function_name = function_name
cleaned_function_args = function_args.copy() if function_args is not None else {}
if function_name == "send_message":
# strip request_heartbeat
cleaned_function_args.pop("request_heartbeat", None)
# TODO more cleaning to fix errors LLM makes
return cleaned_function_name, cleaned_function_args
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn raw LLM output into a ChatCompletion style response with:
"message" = {
"role": "assistant",
"content": ...,
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
}
}
"""
if self.include_opening_brance_in_prefix and not raw_llm_output.startswith("{"):
raw_llm_output = "{" + raw_llm_output
try:
function_json_output = clean_json(raw_llm_output)
except Exception as e:
raise Exception(f"Failed to decode JSON from LLM output:\n{raw_llm_output} - error\n{str(e)}")
try:
function_name = function_json_output["function"]
function_parameters = function_json_output["params"]
except KeyError as e:
raise LLMJSONParsingError(f"Received valid JSON from LLM, but JSON was missing fields: {str(e)}")
if self.clean_func_args:
function_name, function_parameters = self.clean_function_args(function_name, function_parameters)
message = {
"role": "assistant",
"content": None,
"function_call": {
"name": function_name,
"arguments": json.dumps(function_parameters, ensure_ascii=JSON_ENSURE_ASCII),
},
}
return message
class Airoboros21InnerMonologueWrapper(Airoboros21Wrapper):
"""Still expect only JSON outputs from model, but add inner monologue as a field"""
def __init__(
self,
simplify_json_content=True,
clean_function_args=True,
include_assistant_prefix=True,
# include_opening_brace_in_prefix=True,
# assistant_prefix_extra="\n{"
# assistant_prefix_extra='\n{\n "function": ',
assistant_prefix_extra='\n{\n "function":',
include_section_separators=True,
):
self.simplify_json_content = simplify_json_content
self.clean_func_args = clean_function_args
self.include_assistant_prefix = include_assistant_prefix
# self.include_opening_brance_in_prefix = include_opening_brace_in_prefix
self.assistant_prefix_extra = assistant_prefix_extra
self.include_section_separators = include_section_separators
def chat_completion_to_prompt(self, messages, functions, function_documentation=None):
"""Example for airoboros: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#prompt-format
A chat.
USER: {prompt}
ASSISTANT:
Functions support: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#agentfunction-calling
As an AI assistant, please select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format.
Input: I want to know how many times 'Python' is mentioned in my text file.
Available functions:
file_analytics:
description: This tool performs various operations on a text file.
params:
action: The operation we want to perform on the data, such as "count_occurrences", "find_line", etc.
filters:
keyword: The word or phrase we want to search for.
OpenAI functions schema style:
{
"name": "send_message",
"description": "Sends a message to the human user",
"parameters": {
"type": "object",
"properties": {
# https://json-schema.org/understanding-json-schema/reference/array.html
"message": {
"type": "string",
"description": "Message contents. All unicode (including emojis) are supported.",
},
},
"required": ["message"],
}
},
"""
prompt = ""
# System instructions go first
assert messages[0]["role"] == "system"
prompt += messages[0]["content"]
# Next is the functions preamble
def create_function_description(schema, add_inner_thoughts=True):
# airoboros style
func_str = ""
func_str += f"{schema['name']}:"
func_str += f"\n description: {schema['description']}"
func_str += f"\n params:"
if add_inner_thoughts:
func_str += f"\n inner_thoughts: Deep inner monologue private to you only."
for param_k, param_v in schema["parameters"]["properties"].items():
# TODO we're ignoring type
func_str += f"\n {param_k}: {param_v['description']}"
# TODO we're ignoring schema['parameters']['required']
return func_str
# prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format."
prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the ongoing conversation. Provide your response in JSON format."
prompt += f"\nAvailable functions:"
if function_documentation is not None:
prompt += f"\n{function_documentation}"
else:
for function_dict in functions:
prompt += f"\n{create_function_description(function_dict)}"
def create_function_call(function_call, inner_thoughts=None):
"""Go from ChatCompletion to Airoboros style function trace (in prompt)
ChatCompletion data (inside message['function_call']):
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
Airoboros output:
{
"function": "send_message",
"params": {
"message": "Hello there! I am Sam, an AI developed by Liminal Corp. How can I assist you today?"
}
}
"""
airo_func_call = {
"function": function_call["name"],
"params": {
"inner_thoughts": inner_thoughts,
**json.loads(function_call["arguments"], strict=JSON_LOADS_STRICT),
},
}
return json.dumps(airo_func_call, indent=2, ensure_ascii=JSON_ENSURE_ASCII)
# Add a sep for the conversation
if self.include_section_separators:
prompt += "\n### INPUT"
# Last are the user/assistant messages
for message in messages[1:]:
assert message["role"] in ["user", "assistant", "function", "tool"], message
if message["role"] == "user":
# Support for AutoGen naming of agents
if "name" in message:
user_prefix = message["name"].strip()
user_prefix = f"USER ({user_prefix})"
else:
user_prefix = "USER"
if self.simplify_json_content:
try:
content_json = json.loads(message["content"], strict=JSON_LOADS_STRICT)
content_simple = content_json["message"]
prompt += f"\n{user_prefix}: {content_simple}"
except (json.JSONDecodeError, TypeError, KeyError):
prompt += f"\n{user_prefix}: {message['content']}"
elif message["role"] == "assistant":
# Support for AutoGen naming of agents
if "name" in message:
assistant_prefix = message["name"].strip()
assistant_prefix = f"ASSISTANT ({assistant_prefix})"
else:
assistant_prefix = "ASSISTANT"
prompt += f"\n{assistant_prefix}:"
# need to add the function call if there was one
inner_thoughts = message["content"]
if "function_call" in message and message["function_call"]:
prompt += f"\n{create_function_call(message['function_call'], inner_thoughts=inner_thoughts)}"
elif message["role"] in ["function", "tool"]:
# TODO find a good way to add this
# prompt += f"\nASSISTANT: (function return) {message['content']}"
prompt += f"\nFUNCTION RETURN: {message['content']}"
continue
else:
raise ValueError(message)
# Add a sep for the response
if self.include_section_separators:
prompt += "\n### RESPONSE"
if self.include_assistant_prefix:
prompt += f"\nASSISTANT:"
if self.assistant_prefix_extra:
prompt += self.assistant_prefix_extra
return prompt
def clean_function_args(self, function_name, function_args):
"""Some basic MemGPT-specific cleaning of function args"""
cleaned_function_name = function_name
cleaned_function_args = function_args.copy() if function_args is not None else {}
if function_name == "send_message":
# strip request_heartbeat
cleaned_function_args.pop("request_heartbeat", None)
inner_thoughts = None
if "inner_thoughts" in function_args:
inner_thoughts = cleaned_function_args.pop("inner_thoughts")
# TODO more cleaning to fix errors LLM makes
return inner_thoughts, cleaned_function_name, cleaned_function_args
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn raw LLM output into a ChatCompletion style response with:
"message" = {
"role": "assistant",
"content": ...,
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
}
}
"""
# if self.include_opening_brance_in_prefix and raw_llm_output[0] != "{":
# raw_llm_output = "{" + raw_llm_output
if self.assistant_prefix_extra and not raw_llm_output.startswith(self.assistant_prefix_extra):
# print(f"adding prefix back to llm, raw_llm_output=\n{raw_llm_output}")
raw_llm_output = self.assistant_prefix_extra + raw_llm_output
# print(f"->\n{raw_llm_output}")
try:
function_json_output = clean_json(raw_llm_output)
except Exception as e:
raise Exception(f"Failed to decode JSON from LLM output:\n{raw_llm_output} - error\n{str(e)}")
try:
# NOTE: weird bug can happen where 'function' gets nested if the prefix in the prompt isn't abided by
if isinstance(function_json_output["function"], dict):
function_json_output = function_json_output["function"]
function_name = function_json_output["function"]
function_parameters = function_json_output["params"]
except KeyError as e:
raise LLMJSONParsingError(
f"Received valid JSON from LLM, but JSON was missing fields: {str(e)}. JSON result was:\n{function_json_output}"
)
if self.clean_func_args:
(
inner_thoughts,
function_name,
function_parameters,
) = self.clean_function_args(function_name, function_parameters)
message = {
"role": "assistant",
"content": inner_thoughts,
"function_call": {
"name": function_name,
"arguments": json.dumps(function_parameters, ensure_ascii=JSON_ENSURE_ASCII),
},
}
return message
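The round trip these wrappers implement — ChatCompletion `function_call` out to an Airoboros-style JSON trace, and raw model JSON back into an assistant message — can be sketched with stdlib `json` alone. This is a minimal, self-contained illustration of the data format only; `to_airo_trace` and `parse_airo_output` are hypothetical helper names, not functions from this codebase:

```python
import json

def to_airo_trace(function_call, inner_thoughts=None):
    # ChatCompletion-style function_call -> Airoboros-style JSON trace (goes into the prompt)
    params = json.loads(function_call["arguments"])
    if inner_thoughts is not None:
        params = {"inner_thoughts": inner_thoughts, **params}
    return json.dumps({"function": function_call["name"], "params": params}, indent=2)

def parse_airo_output(raw_llm_output):
    # Airoboros-style JSON trace -> ChatCompletion-style assistant message
    data = json.loads(raw_llm_output)
    params = dict(data["params"])
    inner_thoughts = params.pop("inner_thoughts", None)
    return {
        "role": "assistant",
        "content": inner_thoughts,
        "function_call": {"name": data["function"], "arguments": json.dumps(params)},
    }

call = {"name": "send_message", "arguments": '{"message": "Hello there!"}'}
trace = to_airo_trace(call, inner_thoughts="Greeting the user.")
message = parse_airo_output(trace)
```

Because `inner_thoughts` is folded into `params` on the way out and popped back into `content` on the way in, the model only ever sees one flat JSON object per call.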


@@ -1,11 +1,11 @@
from abc import ABC, abstractmethod
class LLMChatCompletionWrapper(ABC):
@abstractmethod
def chat_completion_to_prompt(self, messages, functions, function_documentation=None):
"""Go from ChatCompletion to a single prompt string"""
@abstractmethod
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn the LLM output string into a ChatCompletion response"""


@@ -1,346 +1,346 @@
import json
from ...constants import JSON_ENSURE_ASCII, JSON_LOADS_STRICT
from ...errors import LLMJSONParsingError
from ..json_parser import clean_json
from .wrapper_base import LLMChatCompletionWrapper
class ZephyrMistralWrapper(LLMChatCompletionWrapper):
"""
Wrapper for Zephyr Alpha and Beta, Mistral 7B:
https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
Note: this wrapper formats a prompt that only generates JSON, no inner thoughts
"""
def __init__(
self,
simplify_json_content=True,
clean_function_args=True,
include_assistant_prefix=True,
include_opening_brace_in_prefix=True,
include_section_separators=False,
):
self.simplify_json_content = simplify_json_content
self.clean_func_args = clean_function_args
self.include_assistant_prefix = include_assistant_prefix
self.include_opening_brance_in_prefix = include_opening_brace_in_prefix
self.include_section_separators = include_section_separators
def chat_completion_to_prompt(self, messages, functions, function_documentation=None):
"""
Zephyr prompt format:
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
(source: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF#prompt-template-zephyr)
"""
prompt = ""
IM_END_TOKEN = "</s>"
# System instructions go first
assert messages[0]["role"] == "system"
prompt += f"<|system|>"
prompt += f"\n{messages[0]['content']}"
# Next is the functions preamble
def create_function_description(schema):
# airoboros style
func_str = ""
func_str += f"{schema['name']}:"
func_str += f"\n description: {schema['description']}"
func_str += f"\n params:"
for param_k, param_v in schema["parameters"]["properties"].items():
# TODO we're ignoring type
func_str += f"\n {param_k}: {param_v['description']}"
# TODO we're ignoring schema['parameters']['required']
return func_str
# prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format."
prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the ongoing conversation. Provide your response in JSON format."
prompt += f"\nAvailable functions:"
if function_documentation is not None:
prompt += f"\n{function_documentation}"
else:
for function_dict in functions:
prompt += f"\n{create_function_description(function_dict)}"
# Put functions INSIDE system message (TODO experiment with this)
prompt += IM_END_TOKEN
def create_function_call(function_call):
airo_func_call = {
"function": function_call["name"],
"params": json.loads(function_call["arguments"], strict=JSON_LOADS_STRICT),
}
return json.dumps(airo_func_call, indent=2, ensure_ascii=JSON_ENSURE_ASCII)
for message in messages[1:]:
assert message["role"] in ["user", "assistant", "function", "tool"], message
if message["role"] == "user":
if self.simplify_json_content:
try:
content_json = json.loads(message["content"], strict=JSON_LOADS_STRICT)
content_simple = content_json["message"]
prompt += f"\n<|user|>\n{content_simple}{IM_END_TOKEN}"
# prompt += f"\nUSER: {content_simple}"
except (json.JSONDecodeError, TypeError, KeyError):
prompt += f"\n<|user|>\n{message['content']}{IM_END_TOKEN}"
# prompt += f"\nUSER: {message['content']}"
elif message["role"] == "assistant":
prompt += f"\n<|assistant|>"
if message["content"] is not None:
prompt += f"\n{message['content']}"
# prompt += f"\nASSISTANT: {message['content']}"
# need to add the function call if there was one
if "function_call" in message and message["function_call"]:
prompt += f"\n{create_function_call(message['function_call'])}"
prompt += f"{IM_END_TOKEN}"
elif message["role"] in ["function", "tool"]:
# TODO find a good way to add this
# prompt += f"\nASSISTANT: (function return) {message['content']}"
prompt += f"\n<|assistant|>"
prompt += f"\nFUNCTION RETURN: {message['content']}"
# prompt += f"\nFUNCTION RETURN: {message['content']}"
continue
else:
raise ValueError(message)
# Add a sep for the response
# if self.include_section_separators:
# prompt += "\n### RESPONSE"
if self.include_assistant_prefix:
# prompt += f"\nASSISTANT:"
prompt += f"\n<|assistant|>"
if self.include_opening_brance_in_prefix:
prompt += "\n{"
return prompt
def clean_function_args(self, function_name, function_args):
"""Some basic MemGPT-specific cleaning of function args"""
cleaned_function_name = function_name
cleaned_function_args = function_args.copy() if function_args is not None else {}
if function_name == "send_message":
# strip request_heartbeat
cleaned_function_args.pop("request_heartbeat", None)
# TODO more cleaning to fix errors LLM makes
return cleaned_function_name, cleaned_function_args
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn raw LLM output into a ChatCompletion style response with:
"message" = {
"role": "assistant",
"content": ...,
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
}
}
"""
if self.include_opening_brance_in_prefix and not raw_llm_output.startswith("{"):
raw_llm_output = "{" + raw_llm_output
try:
function_json_output = clean_json(raw_llm_output)
except Exception as e:
raise Exception(f"Failed to decode JSON from LLM output:\n{raw_llm_output} - error\n{str(e)}")
try:
function_name = function_json_output["function"]
function_parameters = function_json_output["params"]
except KeyError as e:
raise LLMJSONParsingError(f"Received valid JSON from LLM, but JSON was missing fields: {str(e)}")
if self.clean_func_args:
function_name, function_parameters = self.clean_function_args(function_name, function_parameters)
message = {
"role": "assistant",
"content": None,
"function_call": {
"name": function_name,
"arguments": json.dumps(function_parameters, ensure_ascii=JSON_ENSURE_ASCII),
},
}
return message
class ZephyrMistralInnerMonologueWrapper(ZephyrMistralWrapper):
"""Still expect only JSON outputs from model, but add inner monologue as a field"""
"""
Wrapper for Zephyr Alpha and Beta, Mistral 7B:
https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
Note: this wrapper formats a prompt with inner thoughts included
"""
def __init__(
self,
simplify_json_content=True,
clean_function_args=True,
include_assistant_prefix=True,
include_opening_brace_in_prefix=True,
include_section_separators=True,
):
self.simplify_json_content = simplify_json_content
self.clean_func_args = clean_function_args
self.include_assistant_prefix = include_assistant_prefix
self.include_opening_brance_in_prefix = include_opening_brace_in_prefix
self.include_section_separators = include_section_separators
def chat_completion_to_prompt(self, messages, functions, function_documentation=None):
prompt = ""
IM_END_TOKEN = "</s>"
# System instructions go first
assert messages[0]["role"] == "system"
prompt += messages[0]["content"]
# Next is the functions preamble
def create_function_description(schema, add_inner_thoughts=True):
# airoboros style
func_str = ""
func_str += f"{schema['name']}:"
func_str += f"\n description: {schema['description']}"
func_str += f"\n params:"
if add_inner_thoughts:
func_str += f"\n inner_thoughts: Deep inner monologue private to you only."
for param_k, param_v in schema["parameters"]["properties"].items():
# TODO we're ignoring type
func_str += f"\n {param_k}: {param_v['description']}"
# TODO we're ignoring schema['parameters']['required']
return func_str
# prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format."
prompt += f"\nPlease select the most suitable function and parameters from the list of available functions below, based on the ongoing conversation. Provide your response in JSON format."
prompt += f"\nAvailable functions:"
if function_documentation is not None:
prompt += f"\n{function_documentation}"
else:
for function_dict in functions:
prompt += f"\n{create_function_description(function_dict)}"
def create_function_call(function_call, inner_thoughts=None):
airo_func_call = {
"function": function_call["name"],
"params": {
"inner_thoughts": inner_thoughts,
**json.loads(function_call["arguments"], strict=JSON_LOADS_STRICT),
},
}
return json.dumps(airo_func_call, indent=2, ensure_ascii=JSON_ENSURE_ASCII)
# Add a sep for the conversation
if self.include_section_separators:
prompt += "\n<|user|>"
# Last are the user/assistant messages
for message in messages[1:]:
assert message["role"] in ["user", "assistant", "function", "tool"], message
if message["role"] == "user":
if self.simplify_json_content:
try:
content_json = json.loads(message["content"], strict=JSON_LOADS_STRICT)
content_simple = content_json["message"]
prompt += f"\n<|user|>\n{content_simple}{IM_END_TOKEN}"
except (json.JSONDecodeError, TypeError, KeyError):
prompt += f"\n<|user|>\n{message['content']}{IM_END_TOKEN}"
elif message["role"] == "assistant":
prompt += f"\n<|assistant|>"
# need to add the function call if there was one
inner_thoughts = message["content"]
if "function_call" in message and message["function_call"]:
prompt += f"\n{create_function_call(message['function_call'], inner_thoughts=inner_thoughts)}"
elif message["role"] in ["function", "tool"]:
# TODO find a good way to add this
# prompt += f"\nASSISTANT: (function return) {message['content']}"
prompt += f"\nFUNCTION RETURN: {message['content']}"
continue
else:
raise ValueError(message)
# Add a sep for the response
# if self.include_section_separators:
# prompt += "\n### RESPONSE"
if self.include_assistant_prefix:
prompt += f"\n<|assistant|>"
if self.include_opening_brance_in_prefix:
prompt += "\n{"
return prompt
def clean_function_args(self, function_name, function_args):
"""Some basic MemGPT-specific cleaning of function args"""
cleaned_function_name = function_name
cleaned_function_args = function_args.copy() if function_args is not None else {}
if function_name == "send_message":
# strip request_heartbeat
cleaned_function_args.pop("request_heartbeat", None)
inner_thoughts = None
if "inner_thoughts" in function_args:
inner_thoughts = cleaned_function_args.pop("inner_thoughts")
# TODO more cleaning to fix errors LLM makes
return inner_thoughts, cleaned_function_name, cleaned_function_args
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn raw LLM output into a ChatCompletion style response with:
"message" = {
"role": "assistant",
"content": ...,
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
}
}
"""
if self.include_opening_brance_in_prefix and not raw_llm_output.startswith("{"):
raw_llm_output = "{" + raw_llm_output
try:
function_json_output = clean_json(raw_llm_output)
except Exception as e:
raise Exception(f"Failed to decode JSON from LLM output:\n{raw_llm_output} - error\n{str(e)}")
try:
function_name = function_json_output["function"]
function_parameters = function_json_output["params"]
except KeyError as e:
raise LLMJSONParsingError(f"Received valid JSON from LLM, but JSON was missing fields: {str(e)}")
if self.clean_func_args:
(
inner_thoughts,
function_name,
function_parameters,
) = self.clean_function_args(function_name, function_parameters)
message = {
"role": "assistant",
"content": inner_thoughts,
"function_call": {
"name": function_name,
"arguments": json.dumps(function_parameters, ensure_ascii=JSON_ENSURE_ASCII),
},
}
return message
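The `<|system|>` / `<|user|>` / `<|assistant|>` tokens assembled piecewise above follow the Zephyr chat template (see the Zephyr-7B model cards linked in the docstring). A minimal sketch of that template in isolation — the helper name and the `turns` shape here are illustrative, not part of MemGPT:

```python
EOS = "</s>"  # Zephyr's end-of-turn token

def render_zephyr_prompt(system: str, turns: list) -> str:
    # turns is a list of (role, text) pairs, e.g. [("user", "hello")]
    prompt = f"<|system|>\n{system}{EOS}"
    for role, text in turns:
        prompt += f"\n<|{role}|>\n{text}{EOS}"
    # Trailing assistant header acts as the generation prefix,
    # as in the wrappers above.
    prompt += "\n<|assistant|>"
    return prompt

prompt = render_zephyr_prompt("You are MemGPT.", [("user", "hello")])
```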


@@ -1,448 +1,448 @@
import json
import os
import sys
import traceback
import questionary
import requests
import typer
from rich.console import Console
import memgpt.agent as agent
import memgpt.errors as errors
import memgpt.system as system
from memgpt.agent_store.storage import StorageConnector, TableType
# import benchmark
from memgpt.benchmark.benchmark import bench
from memgpt.cli.cli import (
delete_agent,
migrate,
open_folder,
quickstart,
run,
server,
version,
)
from memgpt.cli.cli_config import add, configure, delete, list
from memgpt.cli.cli_load import app as load_app
from memgpt.config import MemGPTConfig
from memgpt.constants import (
FUNC_FAILED_HEARTBEAT_MESSAGE,
JSON_ENSURE_ASCII,
JSON_LOADS_STRICT,
REQ_HEARTBEAT_MESSAGE,
)
from memgpt.metadata import MetadataStore
# from memgpt.interface import CLIInterface as interface # for printing to terminal
from memgpt.streaming_interface import AgentRefreshStreamingInterface
# interface = interface()
app = typer.Typer(pretty_exceptions_enable=False)
app.command(name="run")(run)
app.command(name="version")(version)
app.command(name="configure")(configure)
app.command(name="list")(list)
app.command(name="add")(add)
app.command(name="delete")(delete)
app.command(name="server")(server)
app.command(name="folder")(open_folder)
app.command(name="quickstart")(quickstart)
# load data commands
app.add_typer(load_app, name="load")
# migration command
app.command(name="migrate")(migrate)
# benchmark command
app.command(name="benchmark")(bench)
# delete agents
app.command(name="delete-agent")(delete_agent)
def clear_line(console, strip_ui=False):
if strip_ui:
return
if os.name == "nt": # for windows
console.print("\033[A\033[K", end="")
else: # for linux
sys.stdout.write("\033[2K\033[G")
sys.stdout.flush()
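`clear_line` relies on raw ANSI escape sequences; assembled explicitly they read as follows (the constant names are illustrative — "CSI" is the Control Sequence Introducer that starts each sequence):

```python
CSI = "\x1b["             # ESC [
CURSOR_UP = CSI + "A"     # move cursor up one line
ERASE_TO_EOL = CSI + "K"  # erase from cursor to end of line
ERASE_LINE = CSI + "2K"   # erase the entire current line
CURSOR_COL1 = CSI + "G"   # move cursor to column 1

windows_clear = CURSOR_UP + ERASE_TO_EOL  # what the os.name == "nt" branch prints
posix_clear = ERASE_LINE + CURSOR_COL1    # what the other branch writes
```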
def run_agent_loop(
memgpt_agent: agent.Agent, config: MemGPTConfig, first, ms: MetadataStore, no_verify=False, cfg=None, strip_ui=False, stream=False
):
if isinstance(memgpt_agent.interface, AgentRefreshStreamingInterface):
# memgpt_agent.interface.toggle_streaming(on=stream)
if not stream:
memgpt_agent.interface = memgpt_agent.interface.nonstreaming_interface
if hasattr(memgpt_agent.interface, "console"):
console = memgpt_agent.interface.console
else:
console = Console()
counter = 0
user_input = None
skip_next_user_input = False
user_message = None
USER_GOES_FIRST = first
if not USER_GOES_FIRST:
console.input("[bold cyan]Hit enter to begin (will request first MemGPT message)[/bold cyan]\n")
clear_line(console, strip_ui=strip_ui)
print()
multiline_input = False
ms = MetadataStore(config)
while True:
if not skip_next_user_input and (counter > 0 or USER_GOES_FIRST):
# Ask for user input
if not stream:
print()
user_input = questionary.text(
"Enter your message:",
multiline=multiline_input,
qmark=">",
).ask()
clear_line(console, strip_ui=strip_ui)
if not stream:
print()
# Gracefully exit on Ctrl-C/D
if user_input is None:
user_input = "/exit"
user_input = user_input.rstrip()
if user_input.startswith("!"):
print(f"Commands for CLI begin with '/' not '!'")
continue
if user_input == "":
# no empty messages allowed
print("Empty input received. Try again!")
continue
# Handle CLI commands
# Commands to not get passed as input to MemGPT
if user_input.startswith("/"):
# updated agent save functions
if user_input.lower() == "/exit":
# memgpt_agent.save()
agent.save_agent(memgpt_agent, ms)
break
elif user_input.lower() == "/save" or user_input.lower() == "/savechat":
# memgpt_agent.save()
agent.save_agent(memgpt_agent, ms)
continue
elif user_input.lower() == "/attach":
# TODO: check if agent already has it
# TODO: check to ensure source embedding dimensions/model match the agent's, and disallow attachment if not
# TODO: alternatively, only list sources with compatible embeddings, and print warning about non-compatible sources
data_source_options = ms.list_sources(user_id=memgpt_agent.agent_state.user_id)
if len(data_source_options) == 0:
typer.secho(
'No sources available. You must load a source with "memgpt load ..." before running /attach.',
fg=typer.colors.RED,
bold=True,
)
continue
# determine what sources are valid to be attached to this agent
valid_options = []
invalid_options = []
for source in data_source_options:
if (
source.embedding_model == memgpt_agent.agent_state.embedding_config.embedding_model
and source.embedding_dim == memgpt_agent.agent_state.embedding_config.embedding_dim
):
valid_options.append(source.name)
else:
# print warning about invalid sources
typer.secho(
f"Source {source.name} exists but has embedding dimentions {source.embedding_dim} from model {source.embedding_model}, while the agent uses embedding dimentions {memgpt_agent.agent_state.embedding_config.embedding_dim} and model {memgpt_agent.agent_state.embedding_config.embedding_model}",
fg=typer.colors.YELLOW,
)
invalid_options.append(source.name)
# prompt user for data source selection
data_source = questionary.select("Select data source", choices=valid_options).ask()
# attach new data
# attach(memgpt_agent.agent_state.name, data_source)
source_connector = StorageConnector.get_storage_connector(
TableType.PASSAGES, config, user_id=memgpt_agent.agent_state.user_id
)
memgpt_agent.attach_source(data_source, source_connector, ms)
continue
elif user_input.lower() == "/dump" or user_input.lower().startswith("/dump "):
# Check if there's an additional argument that's an integer
command = user_input.strip().split()
amount = int(command[1]) if len(command) > 1 and command[1].isdigit() else 0
if amount == 0:
memgpt_agent.interface.print_messages(memgpt_agent._messages, dump=True)
else:
memgpt_agent.interface.print_messages(memgpt_agent._messages[-min(amount, len(memgpt_agent._messages)) :], dump=True)
continue
elif user_input.lower() == "/dumpraw":
memgpt_agent.interface.print_messages_raw(memgpt_agent._messages)
continue
elif user_input.lower() == "/memory":
print(f"\nDumping memory contents:\n")
print(f"{str(memgpt_agent.memory)}")
print(f"{str(memgpt_agent.persistence_manager.archival_memory)}")
print(f"{str(memgpt_agent.persistence_manager.recall_memory)}")
continue
elif user_input.lower() == "/model":
if memgpt_agent.model == "gpt-4":
memgpt_agent.model = "gpt-3.5-turbo-16k"
elif memgpt_agent.model == "gpt-3.5-turbo-16k":
memgpt_agent.model = "gpt-4"
print(f"Updated model to:\n{str(memgpt_agent.model)}")
continue
elif user_input.lower() == "/pop" or user_input.lower().startswith("/pop "):
# Check if there's an additional argument that's an integer
command = user_input.strip().split()
pop_amount = int(command[1]) if len(command) > 1 and command[1].isdigit() else 3
n_messages = len(memgpt_agent._messages)
MIN_MESSAGES = 2
if n_messages <= MIN_MESSAGES:
print(f"Agent only has {n_messages} messages in stack, none left to pop")
elif n_messages - pop_amount < MIN_MESSAGES:
print(f"Agent only has {n_messages} messages in stack, cannot pop more than {n_messages - MIN_MESSAGES}")
else:
print(f"Popping last {pop_amount} messages from stack")
for _ in range(min(pop_amount, len(memgpt_agent._messages))):
# remove the message from the internal state of the agent
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
continue
elif user_input.lower() == "/retry":
print(f"Retrying for another answer")
while len(memgpt_agent._messages) > 0:
if memgpt_agent._messages[-1].role == "user":
# we want to pop up to the last user message and send it again
user_message = memgpt_agent._messages[-1].text
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
break
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
elif user_input.lower() == "/rethink" or user_input.lower().startswith("/rethink "):
if len(user_input) < len("/rethink "):
print("Missing text after the command")
continue
for x in range(len(memgpt_agent.messages) - 1, 0, -1):
msg_obj = memgpt_agent._messages[x]
if msg_obj.role == "assistant":
clean_new_text = user_input[len("/rethink ") :].strip()
msg_obj.text = clean_new_text
# To persist to the database, all we need to do is "re-insert" into recall memory
memgpt_agent.persistence_manager.recall_memory.storage.update(record=msg_obj)
break
continue
elif user_input.lower() == "/rewrite" or user_input.lower().startswith("/rewrite "):
if len(user_input) < len("/rewrite "):
print("Missing text after the command")
continue
for x in range(len(memgpt_agent.messages) - 1, 0, -1):
if memgpt_agent.messages[x].get("role") == "assistant":
text = user_input[len("/rewrite ") :].strip()
# Get the current message content
# The rewrite target is the output of send_message
message_obj = memgpt_agent._messages[x]
if message_obj.tool_calls is not None and len(message_obj.tool_calls) > 0:
# Check that we hit an assistant send_message call
name_string = message_obj.tool_calls[0].function.get("name")
if name_string is None or name_string != "send_message":
print("Assistant missing send_message function call")
break # cancel op
args_string = message_obj.tool_calls[0].function.get("arguments")
if args_string is None:
print("Assistant missing send_message function arguments")
break # cancel op
args_json = json.loads(args_string, strict=JSON_LOADS_STRICT)
if "message" not in args_json:
print("Assistant missing send_message message argument")
break # cancel op
# Once we found our target, rewrite it
args_json["message"] = text
new_args_string = json.dumps(args_json, ensure_ascii=JSON_ENSURE_ASCII)
message_obj.tool_calls[0].function["arguments"] = new_args_string
# To persist to the database, all we need to do is "re-insert" into recall memory
memgpt_agent.persistence_manager.recall_memory.storage.update(record=message_obj)
break
continue
elif user_input.lower() == "/summarize":
try:
memgpt_agent.summarize_messages_inplace()
typer.secho(
f"/summarize succeeded",
fg=typer.colors.GREEN,
bold=True,
)
except (errors.LLMError, requests.exceptions.HTTPError) as e:
typer.secho(
f"/summarize failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
elif user_input.lower().startswith("/add_function"):
try:
if len(user_input) < len("/add_function "):
print("Missing function name after the command")
continue
function_name = user_input[len("/add_function ") :].strip()
result = memgpt_agent.add_function(function_name)
typer.secho(
f"/add_function succeeded: {result}",
fg=typer.colors.GREEN,
bold=True,
)
except ValueError as e:
typer.secho(
f"/add_function failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
elif user_input.lower().startswith("/remove_function"):
try:
if len(user_input) < len("/remove_function "):
print("Missing function name after the command")
continue
function_name = user_input[len("/remove_function ") :].strip()
result = memgpt_agent.remove_function(function_name)
typer.secho(
f"/remove_function succeeded: {result}",
fg=typer.colors.GREEN,
bold=True,
)
except ValueError as e:
typer.secho(
f"/remove_function failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
# No skip options
elif user_input.lower() == "/wipe":
memgpt_agent = agent.Agent(memgpt_agent.interface)
user_message = None
elif user_input.lower() == "/heartbeat":
user_message = system.get_heartbeat()
elif user_input.lower() == "/memorywarning":
user_message = system.get_token_limit_warning()
elif user_input.lower() == "//":
multiline_input = not multiline_input
continue
elif user_input.lower() == "/" or user_input.lower() == "/help":
questionary.print("CLI commands", "bold")
for cmd, desc in USER_COMMANDS:
questionary.print(cmd, "bold")
questionary.print(f" {desc}")
continue
else:
print(f"Unrecognized command: {user_input}")
continue
else:
# If message did not begin with command prefix, pass inputs to MemGPT
# Handle user message and append to messages
user_message = system.package_user_message(user_input)
skip_next_user_input = False
def process_agent_step(user_message, no_verify):
new_messages, heartbeat_request, function_failed, token_warning, tokens_accumulated = memgpt_agent.step(
user_message,
first_message=False,
skip_verify=no_verify,
stream=stream,
)
skip_next_user_input = False
if token_warning:
user_message = system.get_token_limit_warning()
skip_next_user_input = True
elif function_failed:
user_message = system.get_heartbeat(FUNC_FAILED_HEARTBEAT_MESSAGE)
skip_next_user_input = True
elif heartbeat_request:
user_message = system.get_heartbeat(REQ_HEARTBEAT_MESSAGE)
skip_next_user_input = True
return new_messages, user_message, skip_next_user_input
while True:
try:
if strip_ui:
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
break
else:
if stream:
# Don't display the "Thinking..." if streaming
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
else:
with console.status("[bold cyan]Thinking...") as status:
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
break
except KeyboardInterrupt:
print("User interrupt occurred.")
retry = questionary.confirm("Retry agent.step()?").ask()
if not retry:
break
except Exception as e:
print("An exception occurred when running agent.step(): ")
traceback.print_exc()
retry = questionary.confirm("Retry agent.step()?").ask()
if not retry:
break
counter += 1
print("Finished.")
USER_COMMANDS = [
("//", "toggle multiline input mode"),
("/exit", "exit the CLI"),
("/save", "save a checkpoint of the current agent/conversation state"),
("/load", "load a saved checkpoint"),
("/dump <count>", "view the last <count> messages (all if <count> is omitted)"),
("/memory", "print the current contents of agent memory"),
("/pop <count>", "undo <count> messages in the conversation (default is 3)"),
("/retry", "pops the last answer and tries to get another one"),
("/rethink <text>", "changes the inner thoughts of the last agent message"),
("/rewrite <text>", "changes the reply of the last agent message"),
("/heartbeat", "send a heartbeat system message to the agent"),
("/memorywarning", "send a memory warning system message to the agent"),
("/attach", "attach data source to agent"),
]
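The `/help` branch above prints these `(command, description)` pairs via `questionary.print`. A small self-contained sketch of the same rendering, returning a string instead of printing (the helper name and exact formatting are illustrative):

```python
# A trimmed copy of the command table above, for demonstration only
USER_COMMANDS = [
    ("//", "toggle multiline input mode"),
    ("/exit", "exit the CLI"),
]

def render_help(commands) -> str:
    # Mirrors the /help loop: header, then each command with an
    # indented description on the following line.
    lines = ["CLI commands"]
    for cmd, desc in commands:
        lines.append(cmd)
        lines.append(f"  {desc}")
    return "\n".join(lines)

help_text = render_help(USER_COMMANDS)
```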
import json
import os
import sys
import traceback
import questionary
import requests
import typer
from rich.console import Console
import memgpt.agent as agent
import memgpt.errors as errors
import memgpt.system as system
from memgpt.agent_store.storage import StorageConnector, TableType
# import benchmark
from memgpt.benchmark.benchmark import bench
from memgpt.cli.cli import (
delete_agent,
migrate,
open_folder,
quickstart,
run,
server,
version,
)
from memgpt.cli.cli_config import add, configure, delete, list
from memgpt.cli.cli_load import app as load_app
from memgpt.config import MemGPTConfig
from memgpt.constants import (
FUNC_FAILED_HEARTBEAT_MESSAGE,
JSON_ENSURE_ASCII,
JSON_LOADS_STRICT,
REQ_HEARTBEAT_MESSAGE,
)
from memgpt.metadata import MetadataStore
# from memgpt.interface import CLIInterface as interface # for printing to terminal
from memgpt.streaming_interface import AgentRefreshStreamingInterface
# interface = interface()
app = typer.Typer(pretty_exceptions_enable=False)
app.command(name="run")(run)
app.command(name="version")(version)
app.command(name="configure")(configure)
app.command(name="list")(list)
app.command(name="add")(add)
app.command(name="delete")(delete)
app.command(name="server")(server)
app.command(name="folder")(open_folder)
app.command(name="quickstart")(quickstart)
# load data commands
app.add_typer(load_app, name="load")
# migration command
app.command(name="migrate")(migrate)
# benchmark command
app.command(name="benchmark")(bench)
# delete agents
app.command(name="delete-agent")(delete_agent)
def clear_line(console, strip_ui=False):
if strip_ui:
return
if os.name == "nt": # for windows
console.print("\033[A\033[K", end="")
else: # for linux
sys.stdout.write("\033[2K\033[G")
sys.stdout.flush()
def run_agent_loop(
memgpt_agent: agent.Agent, config: MemGPTConfig, first, ms: MetadataStore, no_verify=False, cfg=None, strip_ui=False, stream=False
):
if isinstance(memgpt_agent.interface, AgentRefreshStreamingInterface):
# memgpt_agent.interface.toggle_streaming(on=stream)
if not stream:
memgpt_agent.interface = memgpt_agent.interface.nonstreaming_interface
if hasattr(memgpt_agent.interface, "console"):
console = memgpt_agent.interface.console
else:
console = Console()
counter = 0
user_input = None
skip_next_user_input = False
user_message = None
USER_GOES_FIRST = first
if not USER_GOES_FIRST:
console.input("[bold cyan]Hit enter to begin (will request first MemGPT message)[/bold cyan]\n")
clear_line(console, strip_ui=strip_ui)
print()
multiline_input = False
ms = MetadataStore(config)
while True:
if not skip_next_user_input and (counter > 0 or USER_GOES_FIRST):
# Ask for user input
if not stream:
print()
user_input = questionary.text(
"Enter your message:",
multiline=multiline_input,
qmark=">",
).ask()
clear_line(console, strip_ui=strip_ui)
if not stream:
print()
# Gracefully exit on Ctrl-C/D
if user_input is None:
user_input = "/exit"
user_input = user_input.rstrip()
if user_input.startswith("!"):
print("Commands for CLI begin with '/', not '!'")
continue
if user_input == "":
# no empty messages allowed
print("Empty input received. Try again!")
continue
# Handle CLI commands
# Commands to not get passed as input to MemGPT
if user_input.startswith("/"):
# updated agent save functions
if user_input.lower() == "/exit":
# memgpt_agent.save()
agent.save_agent(memgpt_agent, ms)
break
elif user_input.lower() == "/save" or user_input.lower() == "/savechat":
# memgpt_agent.save()
agent.save_agent(memgpt_agent, ms)
continue
elif user_input.lower() == "/attach":
# TODO: check if agent already has it
# TODO: check to ensure source embedding dimensions/model match the agent's, and disallow attachment if not
# TODO: alternatively, only list sources with compatible embeddings, and print warning about non-compatible sources
data_source_options = ms.list_sources(user_id=memgpt_agent.agent_state.user_id)
if len(data_source_options) == 0:
typer.secho(
'No sources available. You must load a source with "memgpt load ..." before running /attach.',
fg=typer.colors.RED,
bold=True,
)
continue
# determine what sources are valid to be attached to this agent
valid_options = []
invalid_options = []
for source in data_source_options:
if (
source.embedding_model == memgpt_agent.agent_state.embedding_config.embedding_model
and source.embedding_dim == memgpt_agent.agent_state.embedding_config.embedding_dim
):
valid_options.append(source.name)
else:
# print warning about invalid sources
typer.secho(
f"Source {source.name} exists but has embedding dimensions {source.embedding_dim} from model {source.embedding_model}, while the agent uses embedding dimensions {memgpt_agent.agent_state.embedding_config.embedding_dim} and model {memgpt_agent.agent_state.embedding_config.embedding_model}",
fg=typer.colors.YELLOW,
)
invalid_options.append(source.name)
# prompt user for data source selection
data_source = questionary.select("Select data source", choices=valid_options).ask()
# attach new data
# attach(memgpt_agent.agent_state.name, data_source)
source_connector = StorageConnector.get_storage_connector(
TableType.PASSAGES, config, user_id=memgpt_agent.agent_state.user_id
)
memgpt_agent.attach_source(data_source, source_connector, ms)
continue
elif user_input.lower() == "/dump" or user_input.lower().startswith("/dump "):
# Check if there's an additional argument that's an integer
command = user_input.strip().split()
amount = int(command[1]) if len(command) > 1 and command[1].isdigit() else 0
if amount == 0:
memgpt_agent.interface.print_messages(memgpt_agent._messages, dump=True)
else:
memgpt_agent.interface.print_messages(memgpt_agent._messages[-min(amount, len(memgpt_agent.messages)) :], dump=True)
continue
elif user_input.lower() == "/dumpraw":
memgpt_agent.interface.print_messages_raw(memgpt_agent._messages)
continue
elif user_input.lower() == "/memory":
print("\nDumping memory contents:\n")
print(str(memgpt_agent.memory))
print(str(memgpt_agent.persistence_manager.archival_memory))
print(str(memgpt_agent.persistence_manager.recall_memory))
continue
elif user_input.lower() == "/model":
if memgpt_agent.model == "gpt-4":
memgpt_agent.model = "gpt-3.5-turbo-16k"
elif memgpt_agent.model == "gpt-3.5-turbo-16k":
memgpt_agent.model = "gpt-4"
print(f"Updated model to:\n{str(memgpt_agent.model)}")
continue
elif user_input.lower() == "/pop" or user_input.lower().startswith("/pop "):
# Check if there's an additional argument that's an integer
command = user_input.strip().split()
pop_amount = int(command[1]) if len(command) > 1 and command[1].isdigit() else 3
n_messages = len(memgpt_agent._messages)
MIN_MESSAGES = 2
if n_messages <= MIN_MESSAGES:
print(f"Agent only has {n_messages} messages in stack, none left to pop")
elif n_messages - pop_amount < MIN_MESSAGES:
print(f"Agent only has {n_messages} messages in stack, cannot pop more than {n_messages - MIN_MESSAGES}")
else:
print(f"Popping last {pop_amount} messages from stack")
for _ in range(min(pop_amount, len(memgpt_agent._messages))):
# remove the message from the internal state of the agent
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
continue
elif user_input.lower() == "/retry":
print("Retrying for another answer")
while len(memgpt_agent._messages) > 0:
if memgpt_agent._messages[-1].role == "user":
# we want to pop up to the last user message and send it again
user_message = memgpt_agent._messages[-1].text
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
break
deleted_message = memgpt_agent._messages.pop()
# then also remove it from recall storage
memgpt_agent.persistence_manager.recall_memory.storage.delete(filters={"id": deleted_message.id})
elif user_input.lower() == "/rethink" or user_input.lower().startswith("/rethink "):
if len(user_input) < len("/rethink "):
print("Missing text after the command")
continue
for x in range(len(memgpt_agent.messages) - 1, 0, -1):
msg_obj = memgpt_agent._messages[x]
if msg_obj.role == "assistant":
clean_new_text = user_input[len("/rethink ") :].strip()
msg_obj.text = clean_new_text
# To persist to the database, all we need to do is "re-insert" into recall memory
memgpt_agent.persistence_manager.recall_memory.storage.update(record=msg_obj)
break
continue
elif user_input.lower() == "/rewrite" or user_input.lower().startswith("/rewrite "):
if len(user_input) < len("/rewrite "):
print("Missing text after the command")
continue
for x in range(len(memgpt_agent.messages) - 1, 0, -1):
if memgpt_agent.messages[x].get("role") == "assistant":
text = user_input[len("/rewrite ") :].strip()
# Get the current message content
# The rewrite target is the output of send_message
message_obj = memgpt_agent._messages[x]
if message_obj.tool_calls is not None and len(message_obj.tool_calls) > 0:
# Check that we hit an assistant send_message call
name_string = message_obj.tool_calls[0].function.get("name")
if name_string is None or name_string != "send_message":
print("Assistant missing send_message function call")
break # cancel op
args_string = message_obj.tool_calls[0].function.get("arguments")
if args_string is None:
print("Assistant missing send_message function arguments")
break # cancel op
args_json = json.loads(args_string, strict=JSON_LOADS_STRICT)
if "message" not in args_json:
print("Assistant missing send_message message argument")
break # cancel op
# Once we found our target, rewrite it
args_json["message"] = text
new_args_string = json.dumps(args_json, ensure_ascii=JSON_ENSURE_ASCII)
message_obj.tool_calls[0].function["arguments"] = new_args_string
# To persist to the database, all we need to do is "re-insert" into recall memory
memgpt_agent.persistence_manager.recall_memory.storage.update(record=message_obj)
break
continue
elif user_input.lower() == "/summarize":
try:
memgpt_agent.summarize_messages_inplace()
typer.secho(
"/summarize succeeded",
fg=typer.colors.GREEN,
bold=True,
)
except (errors.LLMError, requests.exceptions.HTTPError) as e:
typer.secho(
f"/summarize failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
elif user_input.lower().startswith("/add_function"):
try:
if len(user_input) < len("/add_function "):
print("Missing function name after the command")
continue
function_name = user_input[len("/add_function ") :].strip()
result = memgpt_agent.add_function(function_name)
typer.secho(
f"/add_function succeeded: {result}",
fg=typer.colors.GREEN,
bold=True,
)
except ValueError as e:
typer.secho(
f"/add_function failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
elif user_input.lower().startswith("/remove_function"):
try:
if len(user_input) < len("/remove_function "):
print("Missing function name after the command")
continue
function_name = user_input[len("/remove_function ") :].strip()
result = memgpt_agent.remove_function(function_name)
typer.secho(
f"/remove_function succeeded: {result}",
fg=typer.colors.GREEN,
bold=True,
)
except ValueError as e:
typer.secho(
f"/remove_function failed:\n{e}",
fg=typer.colors.RED,
bold=True,
)
continue
# No skip options
elif user_input.lower() == "/wipe":
memgpt_agent = agent.Agent(memgpt_agent.interface)
user_message = None
elif user_input.lower() == "/heartbeat":
user_message = system.get_heartbeat()
elif user_input.lower() == "/memorywarning":
user_message = system.get_token_limit_warning()
elif user_input.lower() == "//":
multiline_input = not multiline_input
continue
elif user_input.lower() == "/" or user_input.lower() == "/help":
questionary.print("CLI commands", "bold")
for cmd, desc in USER_COMMANDS:
questionary.print(cmd, "bold")
questionary.print(f" {desc}")
continue
else:
print(f"Unrecognized command: {user_input}")
continue
else:
# If message did not begin with command prefix, pass inputs to MemGPT
# Handle user message and append to messages
user_message = system.package_user_message(user_input)
skip_next_user_input = False
def process_agent_step(user_message, no_verify):
new_messages, heartbeat_request, function_failed, token_warning, tokens_accumulated = memgpt_agent.step(
user_message,
first_message=False,
skip_verify=no_verify,
stream=stream,
)
skip_next_user_input = False
if token_warning:
user_message = system.get_token_limit_warning()
skip_next_user_input = True
elif function_failed:
user_message = system.get_heartbeat(FUNC_FAILED_HEARTBEAT_MESSAGE)
skip_next_user_input = True
elif heartbeat_request:
user_message = system.get_heartbeat(REQ_HEARTBEAT_MESSAGE)
skip_next_user_input = True
return new_messages, user_message, skip_next_user_input
while True:
try:
if strip_ui:
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
break
else:
if stream:
# Don't display the "Thinking..." if streaming
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
else:
with console.status("[bold cyan]Thinking...") as status:
new_messages, user_message, skip_next_user_input = process_agent_step(user_message, no_verify)
break
except KeyboardInterrupt:
print("User interrupt occurred.")
retry = questionary.confirm("Retry agent.step()?").ask()
if not retry:
break
except Exception as e:
print("An exception occurred when running agent.step(): ")
traceback.print_exc()
retry = questionary.confirm("Retry agent.step()?").ask()
if not retry:
break
counter += 1
print("Finished.")
USER_COMMANDS = [
("//", "toggle multiline input mode"),
("/exit", "exit the CLI"),
("/save", "save a checkpoint of the current agent/conversation state"),
("/load", "load a saved checkpoint"),
("/dump <count>", "view the last <count> messages (all if <count> is omitted)"),
("/memory", "print the current contents of agent memory"),
("/pop <count>", "undo <count> messages in the conversation (default is 3)"),
("/retry", "pops the last answer and tries to get another one"),
("/rethink <text>", "changes the inner thoughts of the last agent message"),
("/rewrite <text>", "changes the reply of the last agent message"),
("/heartbeat", "send a heartbeat system message to the agent"),
("/memorywarning", "send a memory warning system message to the agent"),
("/attach", "attach data source to agent"),
]
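The `/rewrite` handler above patches the `message` field inside a tool call's JSON-encoded arguments string; the round-trip it relies on is just the stdlib `json` module. A sketch with a made-up arguments payload standing in for `message_obj.tool_calls[0].function["arguments"]`:

```python
import json

# stand-in for the tool call's arguments string
args_string = '{"message": "old reply", "request_heartbeat": false}'

args_json = json.loads(args_string)
args_json["message"] = "new reply"  # overwrite only the message field
new_args_string = json.dumps(args_json, ensure_ascii=False)

print(new_args_string)  # → {"message": "new reply", "request_heartbeat": false}
```

Writing `new_args_string` back into the tool call and re-inserting the record into recall storage is all the persistence step requires.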


@@ -1,155 +1,155 @@
from abc import ABC, abstractmethod
from datetime import datetime
from typing import List
from memgpt.data_types import AgentState, Message
from memgpt.memory import BaseRecallMemory, EmbeddingArchivalMemory
from memgpt.utils import printd
def parse_formatted_time(formatted_time: str):
# parse times returned by memgpt.utils.get_formatted_time()
try:
return datetime.strptime(formatted_time.strip(), "%Y-%m-%d %I:%M:%S %p %Z%z")
except ValueError:
return datetime.strptime(formatted_time.strip(), "%Y-%m-%d %I:%M:%S %p")
class PersistenceManager(ABC):
@abstractmethod
def trim_messages(self, num):
pass
@abstractmethod
def prepend_to_messages(self, added_messages):
pass
@abstractmethod
def append_to_messages(self, added_messages):
pass
@abstractmethod
def swap_system_message(self, new_system_message):
pass
@abstractmethod
def update_memory(self, new_memory):
pass
class LocalStateManager(PersistenceManager):
"""In-memory state manager has nothing to manage, all agents are held in-memory"""
recall_memory_cls = BaseRecallMemory
archival_memory_cls = EmbeddingArchivalMemory
def __init__(self, agent_state: AgentState):
# Memory held in-state useful for debugging stateful versions
self.memory = None
# self.messages = [] # current in-context messages
# self.all_messages = [] # all messages seen in current session (needed if lazily synchronizing state with DB)
self.archival_memory = EmbeddingArchivalMemory(agent_state)
self.recall_memory = BaseRecallMemory(agent_state)
# self.agent_state = agent_state
def save(self):
"""Ensure storage connectors save data"""
self.archival_memory.save()
self.recall_memory.save()
def init(self, agent):
"""Connect persistent state manager to agent"""
printd(f"Initializing {self.__class__.__name__} with agent object")
# self.all_messages = [{"timestamp": get_local_time(), "message": msg} for msg in agent.messages.copy()]
# self.messages = [{"timestamp": get_local_time(), "message": msg} for msg in agent.messages.copy()]
self.memory = agent.memory
# printd(f"{self.__class__.__name__}.all_messages.len = {len(self.all_messages)}")
printd(f"{self.__class__.__name__}.messages.len = {len(self.messages)}")
'''
def json_to_message(self, message_json) -> Message:
"""Convert agent message JSON into Message object"""
# get message
if "message" in message_json:
message = message_json["message"]
else:
message = message_json
# get timestamp
if "timestamp" in message_json:
timestamp = parse_formatted_time(message_json["timestamp"])
else:
timestamp = get_local_time()
# TODO: change this when we fully migrate to tool calls API
if "function_call" in message:
tool_calls = [
ToolCall(
id=message["tool_call_id"],
tool_call_type="function",
function={
"name": message["function_call"]["name"],
"arguments": message["function_call"]["arguments"],
},
)
]
printd(f"Saving tool calls {[vars(tc) for tc in tool_calls]}")
else:
tool_calls = None
# if message["role"] == "function":
# message["role"] = "tool"
return Message(
user_id=self.agent_state.user_id,
agent_id=self.agent_state.id,
role=message["role"],
text=message["content"],
name=message["name"] if "name" in message else None,
model=self.agent_state.llm_config.model,
created_at=timestamp,
tool_calls=tool_calls,
tool_call_id=message["tool_call_id"] if "tool_call_id" in message else None,
id=message["id"] if "id" in message else None,
)
'''
def trim_messages(self, num):
# printd(f"InMemoryStateManager.trim_messages")
# self.messages = [self.messages[0]] + self.messages[num:]
pass
def prepend_to_messages(self, added_messages: List[Message]):
# first tag with timestamps
# added_messages = [{"timestamp": get_local_time(), "message": msg} for msg in added_messages]
printd(f"{self.__class__.__name__}.prepend_to_message")
# self.messages = [self.messages[0]] + added_messages + self.messages[1:]
# add to recall memory
self.recall_memory.insert_many([m for m in added_messages])
def append_to_messages(self, added_messages: List[Message]):
# first tag with timestamps
# added_messages = [{"timestamp": get_local_time(), "message": msg} for msg in added_messages]
printd(f"{self.__class__.__name__}.append_to_messages")
# self.messages = self.messages + added_messages
# add to recall memory
self.recall_memory.insert_many([m for m in added_messages])
def swap_system_message(self, new_system_message: Message):
# first tag with timestamps
# new_system_message = {"timestamp": get_local_time(), "message": new_system_message}
printd(f"{self.__class__.__name__}.swap_system_message")
# self.messages[0] = new_system_message
# add to recall memory
self.recall_memory.insert(new_system_message)
def update_memory(self, new_memory):
printd(f"{self.__class__.__name__}.update_memory")
self.memory = new_memory
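`parse_formatted_time` above tries a timezone-aware format first and falls back to a naive one; the fallback path is a single stdlib `datetime.strptime` call (the timestamp value here is illustrative):

```python
from datetime import datetime

formatted_time = "2024-07-04 02:45:35 PM"  # naive form, no %Z%z suffix

# same format string as the except branch of parse_formatted_time
parsed = datetime.strptime(formatted_time.strip(), "%Y-%m-%d %I:%M:%S %p")
print(parsed)  # → 2024-07-04 14:45:35
```

`%I`/`%p` handle the 12-hour clock, so the parsed result comes back in 24-hour form.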


@@ -1,91 +1,91 @@
import importlib
import inspect
import os
import uuid
from memgpt.data_types import AgentState, Preset
from memgpt.functions.functions import load_function_set
from memgpt.interface import AgentInterface
from memgpt.metadata import MetadataStore
from memgpt.models.pydantic_models import HumanModel, PersonaModel, ToolModel
from memgpt.presets.utils import load_all_presets
from memgpt.utils import list_human_files, list_persona_files, printd
available_presets = load_all_presets()
preset_options = list(available_presets.keys())
def load_module_tools(module_name="base"):
# return List[ToolModel] from base.py tools
full_module_name = f"memgpt.functions.function_sets.{module_name}"
try:
module = importlib.import_module(full_module_name)
except Exception as e:
# Handle other general exceptions
raise e
# function tags
try:
# Load the function set
functions_to_schema = load_function_set(module)
except ValueError as e:
err = f"Error loading function set '{module_name}': {e}"
printd(err)
raise e  # functions_to_schema would otherwise be unbound below
# create tool in db
tools = []
for name, schema in functions_to_schema.items():
# print([str(inspect.getsource(line)) for line in schema["imports"]])
source_code = inspect.getsource(schema["python_function"])
tags = [module_name]
if module_name == "base":
tags.append("memgpt-base")
tools.append(
ToolModel(
name=name,
tags=tags,
source_type="python",
module=schema["module"],
source_code=source_code,
json_schema=schema["json_schema"],
)
)
return tools
def add_default_tools(user_id: uuid.UUID, ms: MetadataStore):
module_name = "base"
for tool in load_module_tools(module_name=module_name):
existing_tool = ms.get_tool(tool.name)
if not existing_tool:
ms.add_tool(tool)
def add_default_humans_and_personas(user_id: uuid.UUID, ms: MetadataStore):
for persona_file in list_persona_files():
text = open(persona_file, "r", encoding="utf-8").read()
name = os.path.basename(persona_file).replace(".txt", "")
if ms.get_persona(user_id=user_id, name=name) is not None:
printd(f"Persona '{name}' already exists for user '{user_id}'")
continue
persona = PersonaModel(name=name, text=text, user_id=user_id)
ms.add_persona(persona)
for human_file in list_human_files():
text = open(human_file, "r", encoding="utf-8").read()
name = os.path.basename(human_file).replace(".txt", "")
if ms.get_human(user_id=user_id, name=name) is not None:
printd(f"Human '{name}' already exists for user '{user_id}'")
continue
human = HumanModel(name=name, text=text, user_id=user_id)
print(human, user_id)
ms.add_human(human)
# def create_agent_from_preset(preset_name, agent_config, model, persona, human, interface, persistence_manager):
def create_agent_from_preset(
agent_state: AgentState, preset: Preset, interface: AgentInterface, persona_is_file: bool = True, human_is_file: bool = True
):
"""Initialize a new agent from a preset (combination of system + function)"""
raise DeprecationWarning("Function no longer supported - pass a Preset object to Agent.__init__ instead")
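`load_module_tools` above imports a function-set module by dotted name and stores each function's source text via `inspect.getsource`. The same two stdlib calls in isolation, using `json.loads` as a conveniently importable target in place of a function-set module:

```python
import importlib
import inspect

# dynamic import by dotted name, as with memgpt.functions.function_sets.base
module = importlib.import_module("json")

# recover a function's source text, as stored in ToolModel.source_code
source_code = inspect.getsource(module.loads)
print(source_code.splitlines()[0])  # first line of the definition
```

`inspect.getsource` only works for objects backed by a Python source file, which holds for the function-set modules above but not for C-implemented builtins.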


@@ -1,312 +1,312 @@
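The `FUNCTIONS_CHAINING` table that follows registers each tool as a plain JSON-Schema dictionary. A minimal sketch of checking a tool call's arguments against such a schema using only the stdlib (the schema below is a trimmed copy of the `send_message` entry; the helper name is made up):

```python
import json

# trimmed copy of the send_message entry from FUNCTIONS_CHAINING
send_message_schema = {
    "name": "send_message",
    "parameters": {
        "type": "object",
        "properties": {"message": {"type": "string"}},
        "required": ["message"],
    },
}

def missing_required(schema: dict, args_string: str) -> list:
    """Names of required parameters absent from a JSON-encoded arguments string."""
    args = json.loads(args_string)
    return [key for key in schema["parameters"]["required"] if key not in args]

print(missing_required(send_message_schema, '{"message": "hi"}'))  # → []
print(missing_required(send_message_schema, "{}"))                 # → ['message']
```

A full JSON-Schema validator would also check types, but presence of required keys is the failure mode the CLI's `/rewrite` handler guards against.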
from ..constants import FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT, MAX_PAUSE_HEARTBEATS
# FUNCTIONS_PROMPT_MULTISTEP_NO_HEARTBEATS = FUNCTIONS_PROMPT_MULTISTEP[:-1]
FUNCTIONS_CHAINING = {
"send_message": {
"name": "send_message",
"description": "Sends a message to the human user.",
"parameters": {
"type": "object",
"properties": {
# https://json-schema.org/understanding-json-schema/reference/array.html
"message": {
"type": "string",
"description": "Message contents. All unicode (including emojis) are supported.",
},
},
"required": ["message"],
},
},
"pause_heartbeats": {
"name": "pause_heartbeats",
"description": "Temporarily ignore timed heartbeats. You may still receive messages from manual heartbeats and other events.",
"parameters": {
"type": "object",
"properties": {
# https://json-schema.org/understanding-json-schema/reference/array.html
"minutes": {
"type": "integer",
"description": f"Number of minutes to ignore heartbeats for. Max value of {MAX_PAUSE_HEARTBEATS} minutes ({MAX_PAUSE_HEARTBEATS//60} hours).",
},
},
"required": ["minutes"],
},
},
"message_chatgpt": {
"name": "message_chatgpt",
"description": "Send a message to a more basic AI, ChatGPT. A useful resource for asking questions. ChatGPT does not retain memory of previous interactions.",
"parameters": {
"type": "object",
"properties": {
# https://json-schema.org/understanding-json-schema/reference/array.html
"message": {
"type": "string",
"description": "Message to send ChatGPT. Phrase your message as a full English sentence.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["message", "request_heartbeat"],
},
},
"core_memory_append": {
"name": "core_memory_append",
"description": "Append to the contents of core memory.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Section of the memory to be edited (persona or human).",
},
"content": {
"type": "string",
"description": "Content to write to the memory. All unicode (including emojis) are supported.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["name", "content", "request_heartbeat"],
},
},
"core_memory_replace": {
"name": "core_memory_replace",
"description": "Replace the contents of core memory. To delete memories, use an empty string for new_content.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Section of the memory to be edited (persona or human).",
},
"old_content": {
"type": "string",
"description": "String to replace. Must be an exact match.",
},
"new_content": {
"type": "string",
"description": "Content to write to the memory. All unicode (including emojis) are supported.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["name", "old_content", "new_content", "request_heartbeat"],
},
},
"recall_memory_search": {
"name": "recall_memory_search",
"description": "Search prior conversation history using a string.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "String to search for.",
},
"page": {
"type": "integer",
"description": "Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["query", "page", "request_heartbeat"],
},
},
"conversation_search": {
"name": "conversation_search",
"description": "Search prior conversation history using case-insensitive string matching.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "String to search for.",
},
"page": {
"type": "integer",
"description": "Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["query", "request_heartbeat"],
},
},
"recall_memory_search_date": {
"name": "recall_memory_search_date",
"description": "Search prior conversation history using a date range.",
"parameters": {
"type": "object",
"properties": {
"start_date": {
"type": "string",
"description": "The start of the date range to search, in the format 'YYYY-MM-DD'.",
},
"end_date": {
"type": "string",
"description": "The end of the date range to search, in the format 'YYYY-MM-DD'.",
},
"page": {
"type": "integer",
"description": "Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["start_date", "end_date", "page", "request_heartbeat"],
},
},
"conversation_search_date": {
"name": "conversation_search_date",
"description": "Search prior conversation history using a date range.",
"parameters": {
"type": "object",
"properties": {
"start_date": {
"type": "string",
"description": "The start of the date range to search, in the format 'YYYY-MM-DD'.",
},
"end_date": {
"type": "string",
"description": "The end of the date range to search, in the format 'YYYY-MM-DD'.",
},
"page": {
"type": "integer",
"description": "Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["start_date", "end_date", "request_heartbeat"],
},
},
"archival_memory_insert": {
"name": "archival_memory_insert",
"description": "Add to archival memory. Make sure to phrase the memory contents such that it can be easily queried later.",
"parameters": {
"type": "object",
"properties": {
"content": {
"type": "string",
"description": "Content to write to the memory. All unicode (including emojis) are supported.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["content", "request_heartbeat"],
},
},
"archival_memory_search": {
"name": "archival_memory_search",
"description": "Search archival memory using semantic (embedding-based) search.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "String to search for.",
},
"page": {
"type": "integer",
"description": "Allows you to page through results. Only use on a follow-up query. Defaults to 0 (first page).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["query", "request_heartbeat"],
},
},
"read_from_text_file": {
"name": "read_from_text_file",
"description": "Read lines from a text file.",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The name of the file to read.",
},
"line_start": {
"type": "integer",
"description": "Line to start reading from.",
},
"num_lines": {
"type": "integer",
"description": "How many lines to read (defaults to 1).",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["filename", "line_start", "request_heartbeat"],
},
},
"append_to_text_file": {
"name": "append_to_text_file",
"description": "Append to a text file.",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The name of the file to append to.",
},
"content": {
"type": "string",
"description": "Content to append to the file.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["filename", "content", "request_heartbeat"],
},
},
"http_request": {
"name": "http_request",
"description": "Generates an HTTP request and returns the response.",
"parameters": {
"type": "object",
"properties": {
"method": {
"type": "string",
"description": "The HTTP method (e.g., 'GET', 'POST').",
},
"url": {
"type": "string",
"description": "The URL for the request.",
},
"payload_json": {
"type": "string",
"description": "A JSON string representing the request payload.",
},
"request_heartbeat": {
"type": "boolean",
"description": FUNCTION_PARAM_DESCRIPTION_REQ_HEARTBEAT,
},
},
"required": ["method", "url", "request_heartbeat"],
},
},
}
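Each entry in the dictionary above is a bare JSON-schema function definition. A minimal sketch (assuming the OpenAI-style tool-calling envelope, not this project's actual wiring) of how such an entry could be wrapped for a `tools` parameter:

```python
# Hypothetical illustration: wrap one FUNCTIONS_CHAINING-style schema entry
# (copied from above, trimmed) in the {"type": "function", ...} tool envelope.
schema = {
    "name": "http_request",
    "description": "Generates an HTTP request and returns the response.",
    "parameters": {
        "type": "object",
        "properties": {
            "method": {"type": "string", "description": "The HTTP method (e.g., 'GET', 'POST')."},
            "url": {"type": "string", "description": "The URL for the request."},
        },
        "required": ["method", "url"],
    },
}

def to_tool(entry: dict) -> dict:
    # Wrap a bare function schema in the tool-call envelope.
    return {"type": "function", "function": entry}

tools = [to_tool(schema)]
print(tools[0]["function"]["name"])  # → http_request
```

The same wrapper applies uniformly to every entry, which is why the dictionary stores only the bare schemas.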
WORD_LIMIT = 100
SYSTEM = f"""
Your job is to summarize a history of previous messages in a conversation between an AI persona and a human.
The conversation you are given is from a fixed context window and may not be complete.
Messages sent by the AI are marked with the 'assistant' role.
The AI 'assistant' can also make calls to functions, whose outputs can be seen in messages with the 'function' role.
Things the AI says in the message content are considered inner monologue and are not seen by the user.
The only AI messages seen by the user are from when the AI uses 'send_message'.
Messages the user sends are in the 'user' role.
The 'user' role is also used for important system events, such as login events and heartbeat events (heartbeats run the AI's program without user action, allowing the AI to act without prompting from the user sending them a message).
Summarize what happened in the conversation from the perspective of the AI (use the first person).
Keep your summary less than {WORD_LIMIT} words, do NOT exceed this word limit.
Only output the summary, do NOT include anything else in your output.
"""
import os
from memgpt.constants import MEMGPT_DIR
def get_system_text(key):
filename = f"{key}.txt"
file_path = os.path.join(os.path.dirname(__file__), "system", filename)
# first look in prompts/system/*.txt
if os.path.exists(file_path):
with open(file_path, "r", encoding="utf-8") as file:
return file.read().strip()
else:
# try looking in ~/.memgpt/system_prompts/*.txt
user_system_prompts_dir = os.path.join(MEMGPT_DIR, "system_prompts")
# create directory if it doesn't exist
if not os.path.exists(user_system_prompts_dir):
os.makedirs(user_system_prompts_dir)
# look inside for a matching system prompt
file_path = os.path.join(user_system_prompts_dir, filename)
if os.path.exists(file_path):
with open(file_path, "r", encoding="utf-8") as file:
return file.read().strip()
else:
raise FileNotFoundError(f"No file found for key {key}, path={file_path}")
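The lookup above checks a packaged `system/` directory first, then falls back to a user directory that it creates on demand. A self-contained sketch of the same two-step pattern (the temporary paths here are stand-ins, not MemGPT's real locations):

```python
# Illustration of the packaged-dir-then-user-dir lookup, using throwaway
# temporary directories in place of prompts/system and ~/.memgpt/system_prompts.
import os
import tempfile

def get_text(key: str, builtin_dir: str, user_dir: str) -> str:
    filename = f"{key}.txt"
    path = os.path.join(builtin_dir, filename)
    if not os.path.exists(path):
        # Fall back to the user directory, creating it if needed.
        os.makedirs(user_dir, exist_ok=True)
        path = os.path.join(user_dir, filename)
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return f.read().strip()
    raise FileNotFoundError(f"No file found for key {key}, path={path}")

with tempfile.TemporaryDirectory() as builtin_dir, tempfile.TemporaryDirectory() as user_dir:
    # Only the user directory contains the prompt, so the fallback is exercised.
    with open(os.path.join(user_dir, "custom.txt"), "w", encoding="utf-8") as f:
        f.write("You are a helpful agent.\n")
    prompt = get_text("custom", builtin_dir, user_dir)
print(prompt)  # → You are a helpful agent.
```

This ordering lets users override or extend the built-in system prompts without touching the package itself.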
import json
import uuid
from typing import Optional
from .constants import (
INITIAL_BOOT_MESSAGE,
INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG,
INITIAL_BOOT_MESSAGE_SEND_MESSAGE_THOUGHT,
JSON_ENSURE_ASCII,
MESSAGE_SUMMARY_WARNING_STR,
)
from .utils import get_local_time
def get_initial_boot_messages(version="startup"):
if version == "startup":
initial_boot_message = INITIAL_BOOT_MESSAGE
messages = [
{"role": "assistant", "content": initial_boot_message},
]
elif version == "startup_with_send_message":
tool_call_id = str(uuid.uuid4())
messages = [
# first message includes both inner monologue and function call to send_message
{
"role": "assistant",
"content": INITIAL_BOOT_MESSAGE_SEND_MESSAGE_THOUGHT,
# "function_call": {
# "name": "send_message",
# "arguments": '{\n "message": "' + f"{INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG}" + '"\n}',
# },
"tool_calls": [
{
"id": tool_call_id,
"type": "function",
"function": {
"name": "send_message",
"arguments": '{\n "message": "' + f"{INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG}" + '"\n}',
},
}
],
},
# obligatory function return message
{
# "role": "function",
"role": "tool",
"name": "send_message", # NOTE: technically not up to spec, this is old functions style
"content": package_function_response(True, None),
"tool_call_id": tool_call_id,
},
]
elif version == "startup_with_send_message_gpt35":
tool_call_id = str(uuid.uuid4())
messages = [
# first message includes both inner monologue and function call to send_message
{
"role": "assistant",
"content": "*inner thoughts* Still waiting on the user. Sending a message with function.",
# "function_call": {"name": "send_message", "arguments": '{\n "message": "' + f"Hi, is anyone there?" + '"\n}'},
"tool_calls": [
{
"id": tool_call_id,
"type": "function",
"function": {
"name": "send_message",
"arguments": '{\n "message": "' + f"Hi, is anyone there?" + '"\n}',
},
}
],
},
# obligatory function return message
{
# "role": "function",
"role": "tool",
"name": "send_message",
"content": package_function_response(True, None),
"tool_call_id": tool_call_id,
},
]
else:
raise ValueError(version)
return messages
def get_heartbeat(reason="Automated timer", include_location=False, location_name="San Francisco, CA, USA"):
# Package the message with time and location
formatted_time = get_local_time()
packaged_message = {
"type": "heartbeat",
"reason": reason,
"time": formatted_time,
}
if include_location:
packaged_message["location"] = location_name
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def get_login_event(last_login="Never (first login)", include_location=False, location_name="San Francisco, CA, USA"):
# Package the message with time and location
formatted_time = get_local_time()
packaged_message = {
"type": "login",
"last_login": last_login,
"time": formatted_time,
}
if include_location:
packaged_message["location"] = location_name
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def package_user_message(
user_message: str,
time: Optional[str] = None,
include_location: bool = False,
location_name: Optional[str] = "San Francisco, CA, USA",
name: Optional[str] = None,
):
# Package the message with time and location
formatted_time = time if time else get_local_time()
packaged_message = {
"type": "user_message",
"message": user_message,
"time": formatted_time,
}
if include_location:
packaged_message["location"] = location_name
if name:
packaged_message["name"] = name
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def package_function_response(was_success, response_string, timestamp=None):
formatted_time = get_local_time() if timestamp is None else timestamp
packaged_message = {
"status": "OK" if was_success else "Failed",
"message": response_string,
"time": formatted_time,
}
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def package_system_message(system_message, message_type="system_alert", time=None):
formatted_time = time if time else get_local_time()
packaged_message = {
"type": message_type,
"message": system_message,
"time": formatted_time,
}
    return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def package_summarize_message(summary, summary_length, hidden_message_count, total_message_count, timestamp=None):
context_message = (
f"Note: prior messages ({hidden_message_count} of {total_message_count} total messages) have been hidden from view due to conversation memory constraints.\n"
+ f"The following is a summary of the previous {summary_length} messages:\n {summary}"
)
formatted_time = get_local_time() if timestamp is None else timestamp
packaged_message = {
"type": "system_alert",
"message": context_message,
"time": formatted_time,
}
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def package_summarize_message_no_summary(hidden_message_count, timestamp=None, message=None):
"""Add useful metadata to the summary message"""
# Package the message with time and location
formatted_time = get_local_time() if timestamp is None else timestamp
context_message = (
message
if message
else f"Note: {hidden_message_count} prior messages with the user have been hidden from view due to conversation memory constraints. Older messages are stored in Recall Memory and can be viewed using functions."
)
packaged_message = {
"type": "system_alert",
"message": context_message,
"time": formatted_time,
}
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
def get_token_limit_warning():
formatted_time = get_local_time()
packaged_message = {
"type": "system_alert",
"message": MESSAGE_SUMMARY_WARNING_STR,
"time": formatted_time,
}
return json.dumps(packaged_message, ensure_ascii=JSON_ENSURE_ASCII)
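All of the packaging helpers above follow one pattern: build a small dict with a type/status, a payload, and a timestamp, then serialize it to JSON. A minimal sketch (timestamp passed in explicitly for reproducibility; the real code calls `get_local_time()`):

```python
# Sketch of the message-packaging pattern: status + payload + timestamp,
# serialized with json.dumps. ensure_ascii=False mirrors the JSON_ENSURE_ASCII
# setting so unicode survives serialization unescaped.
import json

def package_function_response(was_success: bool, response_string, timestamp: str) -> str:
    packaged = {
        "status": "OK" if was_success else "Failed",
        "message": response_string,
        "time": timestamp,
    }
    return json.dumps(packaged, ensure_ascii=False)

result = package_function_response(True, None, "2024-07-04 02:25:00 PM PDT-0700")
print(result)
```

Keeping every inbound message in this uniform JSON shape is what lets the agent distinguish heartbeats, login events, user messages, and function results by their `type`/`status` fields alone.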