Update documentation [local LLMs, presets] (#453)
* updated local LLM documentation
* updated CLI flags to be consistent with documentation
* added preset documentation
* updated tests to use new arg
@@ -14,6 +14,7 @@ The `memgpt run` command supports the following optional flags (if set, will ove
+* `--yes`/`-y`: (bool) Skip confirmation prompt and use defaults (default=False)
 
 You can override the parameters you set with `memgpt configure` with the following additional flags specific to local LLMs:
 
 * `--model-wrapper`: (str) Model wrapper used by backend (e.g. `airoboros_xxx`)
 * `--model-endpoint-type`: (str) Model endpoint backend type (e.g. lmstudio, ollama)
 * `--model-endpoint`: (str) Model endpoint url (e.g. `localhost:5000`)
@@ -10,8 +10,16 @@ For example, if we downloaded the model `dolphin-2.2.1-mistral-7b.Q6_K.gguf` and
 ./koboldcpp.py ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf --contextsize 8192
 ```
 
-In your terminal where you're running MemGPT, run:
-```sh
-export OPENAI_API_BASE=http://localhost:5001
-export BACKEND_TYPE=koboldcpp
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at koboldcpp:
+```
+# if you are running koboldcpp locally, the default IP address + port will be http://localhost:5001
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): koboldcpp
+? Enter default endpoint: http://localhost:5001
+...
 ```
+
+If you have an existing agent that you want to move to the koboldcpp backend, add extra flags to `memgpt run`:
+```sh
+memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001
+```
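Before pointing MemGPT at a local endpoint such as `http://localhost:5001`, it can be worth checking that the server is actually listening. A minimal sketch (not part of MemGPT; the helper name is hypothetical):

```python
import socket
from urllib.parse import urlparse

def endpoint_is_up(url: str, timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections at the endpoint URL."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        # A successful connect is enough; we close immediately.
        socket.create_connection((parsed.hostname, port), timeout=timeout).close()
        return True
    except OSError:
        return False
```

If this returns `False`, double-check that koboldcpp was started and that the port matches.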
@@ -10,8 +10,16 @@ For example, if we downloaded the model `dolphin-2.2.1-mistral-7b.Q6_K.gguf` and
 ./server -m ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf -c 8000
 ```
 
-In your terminal where you're running MemGPT, run:
-```sh
-export OPENAI_API_BASE=http://localhost:8080
-export BACKEND_TYPE=llamacpp
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at llama.cpp:
+```
+# if you are running llama.cpp locally, the default IP address + port will be http://localhost:8080
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): llamacpp
+? Enter default endpoint: http://localhost:8080
+...
 ```
+
+If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to `memgpt run`:
+```sh
+memgpt run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080
+```
@@ -13,10 +13,16 @@
 3. Click "Start server"
 4. Copy the IP address + port that your server is running on (in the example screenshot, the address is `http://localhost:1234`)
 
-In your terminal where you're running MemGPT, run:
-
-```sh
-# if you used a different port in LM Studio, change 1234 to the actual port
-export OPENAI_API_BASE=http://localhost:1234
-export BACKEND_TYPE=lmstudio
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at LM Studio:
+```
+# if you are running LM Studio locally, the default IP address + port will be http://localhost:1234
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): lmstudio
+? Enter default endpoint: http://localhost:1234
+...
 ```
+
+If you have an existing agent that you want to move to the LM Studio backend, add extra flags to `memgpt run`:
+```sh
+memgpt run --agent your_agent --model-endpoint-type lmstudio --model-endpoint http://localhost:1234
+```
 
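LM Studio serves an OpenAI-compatible HTTP API on the configured port. As a rough sketch of the kind of request a client makes against it (the `/v1/chat/completions` path and payload shape follow the OpenAI chat-completions convention; this is not MemGPT's actual client code):

```python
import json
from urllib.request import Request, urlopen

def chat_completion(base_url: str, messages: list[dict]) -> dict:
    """POST a chat request to an OpenAI-compatible server (e.g. LM Studio on :1234)."""
    payload = json.dumps({"messages": messages}).encode("utf-8")
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `chat_completion("http://localhost:1234", [{"role": "user", "content": "hi"}])` should return a JSON body with a `choices` list when the LM Studio server is up.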
@@ -15,30 +15,59 @@ pip install 'pymemgpt[local]'
 ### Quick overview
 
 1. Put your own LLM behind a web server API (e.g. [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui))
-2. Set `OPENAI_API_BASE=YOUR_API_IP_ADDRESS` and `BACKEND_TYPE=webui`
+2. Run `memgpt configure` and when prompted select your backend/endpoint type and endpoint address (a default will be provided but you may have to override it)
 
-For example, if we are running web UI (which defaults to port 5000) on the same computer as MemGPT, we would do the following:
-```sh
-# set this to the backend we're using, eg 'webui', 'lmstudio', 'llamacpp', 'koboldcpp'
-export BACKEND_TYPE=webui
-# set this to the base address of llm web server
-export OPENAI_API_BASE=http://127.0.0.1:5000
+For example, if we are running web UI (which defaults to port 5000) on the same computer as MemGPT, running `memgpt configure` would look like this:
+```
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): webui
+? Enter default endpoint: http://localhost:5000
+? Select default model wrapper (recommended: airoboros-l2-70b-2.1): airoboros-l2-70b-2.1
+? Select your model's context window (for Mistral 7B models, this is probably 8k / 8192): 8192
+? Select embedding provider: local
+? Select default preset: memgpt_chat
+? Select default persona: sam_pov
+? Select default human: cs_phd
+? Select storage backend for archival data: local
+Saving config to /home/user/.memgpt/config
 ```
 
-Now when we run MemGPT, it will use the LLM running on the local web server.
+Now when we do `memgpt run`, it will use the LLM running on the local web server.
 
+If you want to change the local LLM settings of an existing agent, you can pass flags to `memgpt run`:
+```sh
+# --model-wrapper will override the wrapper
+# --model-endpoint will override the endpoint address
+# --model-endpoint-type will override the backend type
+
+# if we were previously using "agent_11" with web UI, and now want to use lmstudio, we can do:
+memgpt run --agent agent_11 --model-endpoint http://localhost:1234 --model-endpoint-type lmstudio
+```
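The configure flow above ends with the answers being saved to `~/.memgpt/config`. Assuming an INI-style layout (the section and key names below are hypothetical illustrations; the real on-disk format is an implementation detail of MemGPT), reading such a file back might look like:

```python
import configparser

# Hypothetical example config; the real ~/.memgpt/config keys may differ.
raw = """
[model]
model_endpoint_type = webui
model_endpoint = http://localhost:5000
model_wrapper = airoboros-l2-70b-2.1
context_window = 8192
"""

config = configparser.ConfigParser()
config.read_string(raw)
endpoint = config.get("model", "model_endpoint")       # "http://localhost:5000"
context_window = config.getint("model", "context_window")  # 8192
```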
 ### Selecting a model wrapper
 
-When you use local LLMs, `model` no longer specifies the LLM model that is run (you determine that yourself by loading a model in your backend interface). Instead, `model` refers to the _wrapper_ that is used to parse data sent to and from the LLM backend.
+When you use local LLMs, you can specify a **model wrapper** that changes how the LLM input text is formatted before it is passed to your LLM.
 
-You can change the wrapper used with the `--model` flag. For example:
+You can change the wrapper used with the `--model-wrapper` flag:
 ```sh
-memgpt run --model airoboros-l2-70b-2.1
+memgpt run --model-wrapper airoboros-l2-70b-2.1
 ```
 
 The default wrapper is `airoboros-l2-70b-2.1-grammar` if you are using a backend that supports grammar-based sampling, and `airoboros-l2-70b-2.1` otherwise.
+You can see the full selection of model wrappers by running `memgpt configure`:
+```
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): webui
+? Enter default endpoint: http://localhost:5000
+? Select default model wrapper (recommended: airoboros-l2-70b-2.1): (Use arrow keys)
+» airoboros-l2-70b-2.1
+  airoboros-l2-70b-2.1-grammar
+  dolphin-2.1-mistral-7b
+  dolphin-2.1-mistral-7b-grammar
+  zephyr-7B
+  zephyr-7B-grammar
+```
 
-Note: the wrapper name does **not** have to match the model name. For example, the `dolphin-2.1-mistral-7b` model works better with the `airoboros-l2-70b-2.1` wrapper than the `dolphin-2.1-mistral-7b` wrapper. The model you load inside your LLM backend (e.g. LM Studio) determines what model is actually run, the `--model` flag just determines how the prompt is formatted before it is passed to the LLM backend.
+Note: the wrapper name does **not** have to match the model name. For example, the `dolphin-2.1-mistral-7b` model works better with the `airoboros-l2-70b-2.1` wrapper than the `dolphin-2.1-mistral-7b` wrapper. The model you load inside your LLM backend (e.g. LM Studio) determines what model is actually run, the `--model-wrapper` flag just determines how the prompt is formatted before it is passed to the LLM backend.
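A model wrapper is essentially a prompt formatter: it turns the chat history into the plain-text template the underlying model was fine-tuned on. A simplified illustration (this is not MemGPT's actual wrapper code; the `USER:`/`ASSISTANT:` template below is just an Airoboros-style example):

```python
def format_airoboros_style(system: str, messages: list[dict]) -> str:
    """Flatten a chat into a single USER:/ASSISTANT: style prompt string."""
    lines = [system]
    for msg in messages:
        role = "USER" if msg["role"] == "user" else "ASSISTANT"
        lines.append(f"{role}: {msg['content']}")
    lines.append("ASSISTANT:")  # cue the model to continue as the assistant
    return "\n".join(lines)

prompt = format_airoboros_style(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Hello!"}],
)
```

Swapping the wrapper changes only this formatting step, not which model weights are loaded by the backend.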
 ### Grammars
 
@@ -46,6 +75,8 @@ Grammar-based sampling can help improve the performance of MemGPT when using loc
 
+To use grammar-based sampling, make sure you're using a backend that supports it (web UI, llama.cpp, or koboldcpp), then specify one of the wrappers that implements grammars, e.g. `airoboros-l2-70b-2.1-grammar`.
+
+Note that even though grammar-based sampling can reduce the mistakes your LLM makes, it can also make your model inference significantly slower.
 
 ### Supported backends
 
 Currently, MemGPT supports the following backends:
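For backends like llama.cpp, grammars are typically expressed in GBNF. As a minimal illustrative fragment (not one of MemGPT's shipped grammars), a grammar that constrains the model's output to a double-quoted lowercase string looks like:

```
root ::= "\"" [a-z]* "\""
```

A full function-calling grammar would constrain the output to valid JSON with the expected keys, which is why it reduces malformed function calls at the cost of slower sampling.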
@@ -28,12 +28,20 @@ removing any unused layers
 success
 ```
 
-In your terminal where you're running MemGPT, run:
-```sh
-# By default, Ollama runs an API server on port 11434
-export OPENAI_API_BASE=http://localhost:11434
-export BACKEND_TYPE=ollama
-
-# Make sure to add the tag!
-export OLLAMA_MODEL=dolphin2.2-mistral:7b-q6_K
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at Ollama:
+```
+# if you are running Ollama locally, the default IP address + port will be http://localhost:11434
+# IMPORTANT: with Ollama, there is an extra required "model name" field
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): ollama
+? Enter default endpoint: http://localhost:11434
+? Enter default model name (required for Ollama, see: https://memgpt.readthedocs.io/en/latest/ollama): dolphin2.2-mistral:7b-q6_K
+...
 ```
+
+If you have an existing agent that you want to move to the Ollama backend, add extra flags to `memgpt run`:
+```sh
+# use --model to switch Ollama models (always include the full Ollama model name with the tag)
+# use --model-wrapper to switch model wrappers
+memgpt run --agent your_agent --model dolphin2.2-mistral:7b-q6_K --model-endpoint-type ollama --model-endpoint http://localhost:11434
+```
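Ollama model names must include the tag (e.g. `dolphin2.2-mistral:7b-q6_K`, not just `dolphin2.2-mistral`). A quick sanity check one could run before passing a name to `--model` (a hypothetical helper, not part of MemGPT):

```python
def has_ollama_tag(model_name: str) -> bool:
    """True if the model name includes an explicit tag, i.e. 'name:tag'."""
    name, sep, tag = model_name.partition(":")
    return bool(name) and bool(sep) and bool(tag)

assert has_ollama_tag("dolphin2.2-mistral:7b-q6_K")
assert not has_ollama_tag("dolphin2.2-mistral")  # missing tag
```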
docs/presets.md (new file, 21 lines)
@@ -0,0 +1,21 @@
+## Creating new MemGPT presets
+
+MemGPT **presets** are a combination of default settings, including a system prompt and a function set. For example, the `memgpt_docs` preset uses a system prompt that is tuned for document analysis, while the default `memgpt_chat` preset is tuned for general chatting purposes.
+
+You can create your own presets by creating a `.yaml` file in the `~/.memgpt/presets` directory. If you want to use a new custom system prompt in your preset, you can create a `.txt` file in the `~/.memgpt/system_prompts` directory.
+
+For example, if I create a new system prompt and place it in `~/.memgpt/system_prompts/custom_prompt.txt`, I can then create a preset that uses this system prompt by creating a new file `~/.memgpt/presets/custom_preset.yaml`:
+```yaml
+system_prompt: "custom_prompt"
+functions:
+  - "send_message"
+  - "pause_heartbeats"
+  - "core_memory_append"
+  - "core_memory_replace"
+  - "conversation_search"
+  - "conversation_search_date"
+  - "archival_memory_insert"
+  - "archival_memory_search"
+```
+
+This preset uses the same base function set as the default presets. You can see the example presets provided [here](https://github.com/cpacker/MemGPT/tree/main/memgpt/presets/examples), and you can see example system prompts [here](https://github.com/cpacker/MemGPT/tree/main/memgpt/prompts/system).
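When hand-writing a preset YAML, a typo in a function name would only surface at load time. A small validation sketch (the base function set is copied from the example preset in the docs; the validator itself is hypothetical, not MemGPT code):

```python
BASE_FUNCTIONS = {
    "send_message", "pause_heartbeats",
    "core_memory_append", "core_memory_replace",
    "conversation_search", "conversation_search_date",
    "archival_memory_insert", "archival_memory_search",
}

def validate_preset(preset: dict) -> list[str]:
    """Return any function names in the preset that are not in the base set."""
    return [f for f in preset.get("functions", []) if f not in BASE_FUNCTIONS]

# A preset with a typo in one function name:
preset = {"system_prompt": "custom_prompt",
          "functions": ["send_message", "archival_memory_serch"]}
unknown = validate_preset(preset)  # ["archival_memory_serch"] — typo caught
```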
@@ -16,12 +16,18 @@ For the purposes of this example, we're going to serve (host) the LLMs using [oo
 4. If the model was loaded successfully, you should be able to access it via the API (if local, this is probably on port `5000`)
 5. Assuming steps 1-4 went correctly, the LLM is now properly hosted on a port you can point MemGPT to!
 
-In your terminal where you're running MemGPT, run:
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at web UI:
+```
+# if you are running web UI locally, the default IP address + port will be http://localhost:5000
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): webui
+? Enter default endpoint: http://localhost:5000
+...
+```
+
+If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
 ```sh
-# if you are running web UI locally, the default port will be 5000
-export OPENAI_API_BASE=http://127.0.0.1:5000
-export BACKEND_TYPE=webui
+memgpt run --agent your_agent --model-endpoint-type webui --model-endpoint http://localhost:5000
 ```
 
 Text gen web UI exposes a lot of parameters that can dramatically change LLM outputs, to change these you can modify the [web UI settings file](https://github.com/cpacker/MemGPT/blob/main/memgpt/local_llm/webui/settings.py).
@@ -42,14 +42,12 @@ def run(
     model_wrapper: str = typer.Option(None, help="Specify the LLM model wrapper"),
     model_endpoint: str = typer.Option(None, help="Specify the LLM model endpoint"),
     model_endpoint_type: str = typer.Option(None, help="Specify the LLM model endpoint type"),
-    context_window: int = typer.Option(
-        None, "--context_window", help="The context window of the LLM you are using (e.g. 8k for most Mistral 7B variants)"
-    ),
+    context_window: int = typer.Option(None, help="The context window of the LLM you are using (e.g. 8k for most Mistral 7B variants)"),
     # other
     first: bool = typer.Option(False, "--first", help="Use --first to send the first message in the sequence"),
-    strip_ui: bool = typer.Option(False, "--strip_ui", help="Remove all the bells and whistles in CLI output (helpful for testing)"),
+    strip_ui: bool = typer.Option(False, help="Remove all the bells and whistles in CLI output (helpful for testing)"),
     debug: bool = typer.Option(False, "--debug", help="Use --debug to enable debugging output"),
-    no_verify: bool = typer.Option(False, "--no_verify", help="Bypass message verification"),
+    no_verify: bool = typer.Option(False, help="Bypass message verification"),
+    yes: bool = typer.Option(False, "-y", help="Skip confirmation prompt and use defaults"),
 ):
     """Start chatting with an MemGPT agent
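The CLI change above leans on Typer/Click's default flag naming: when no explicit flag string is passed to `typer.Option`, the parameter name is converted to a flag by replacing underscores with hyphens, so `strip_ui` becomes `--strip-ui`. That is why the explicit `"--strip_ui"` and `"--no_verify"` overrides were dropped and the tests switched to the hyphenated flags. A sketch of the naming rule in plain Python:

```python
def default_flag_name(param_name: str) -> str:
    # Click/Typer derive the CLI flag from the Python parameter name,
    # swapping underscores for hyphens.
    return "--" + param_name.replace("_", "-")

flags = [default_flag_name(p) for p in ("strip_ui", "no_verify", "context_window")]
# flags == ["--strip-ui", "--no-verify", "--context-window"]
```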
@@ -23,6 +23,9 @@ nav:
     - 'koboldcpp': koboldcpp.md
     - 'ollama': ollama.md
     - 'Troubleshooting': local_llm_faq.md
+  - 'Customizing MemGPT':
+    - 'Creating new MemGPT presets': presets.md
+    - 'Giving MemGPT more tools': functions.md
   - 'Integrations':
     - 'Autogen': autogen.md
   - 'Advanced':
@@ -14,7 +14,7 @@ def test_configure_memgpt():
 
 def test_save_load():
     # configure_memgpt()  # rely on configure running first^
-    child = pexpect.spawn("memgpt run --agent test_save_load --first --strip_ui")
+    child = pexpect.spawn("memgpt run --agent test_save_load --first --strip-ui")
 
     child.expect("Enter your message:", timeout=TIMEOUT)
     child.sendline()
@@ -30,7 +30,7 @@ def test_save_load():
     assert child.isalive() is False, "CLI should have terminated."
     assert child.exitstatus == 0, "CLI did not exit cleanly."
 
-    child = pexpect.spawn("memgpt run --agent test_save_load --first --strip_ui")
+    child = pexpect.spawn("memgpt run --agent test_save_load --first --strip-ui")
     child.expect("Using existing agent test_save_load", timeout=TIMEOUT)
     child.expect("Enter your message:", timeout=TIMEOUT)
     child.sendline("/exit")
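The pexpect tests above drive the real `memgpt` binary. As an illustration of the same spawn-and-check pattern without requiring `memgpt` or `pexpect` to be installed, here is a stand-in using `subprocess` (the command and expected text are placeholders, not actual MemGPT output):

```python
import subprocess

# Stand-in for: pexpect.spawn("memgpt run --agent ... --strip-ui")
# followed by child.expect(...) and exit-status checks.
proc = subprocess.run(
    ["echo", "Enter your message:"],
    capture_output=True,
    text=True,
)
assert proc.returncode == 0, "CLI should exit cleanly."
assert "Enter your message:" in proc.stdout
```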