# Using MemGPT with local LLMs
!!! warning "MemGPT + local LLM failure cases"

    When using open LLMs with MemGPT, **the main failure case will be your LLM outputting a string that cannot be understood by MemGPT**. MemGPT uses function calling to manage memory (e.g. `edit_core_memory(...)`) and to interact with the user (e.g. `send_message(...)`), so your LLM needs to generate outputs that can be parsed into MemGPT function calls.
Make sure to check the [local LLM troubleshooting page](../local_llm_faq) to see common issues before raising a new issue or posting on Discord.
## Quick overview
- Put your own LLM behind a web server API (e.g. oobabooga web UI)
- Set `OPENAI_API_BASE=YOUR_API_IP_ADDRESS` and `BACKEND_TYPE=webui`
For example, if we are running web UI (which defaults to port 5000) on the same computer as MemGPT, we would do the following:
```sh
# set this to the backend we're using, e.g. 'webui', 'lmstudio', 'llamacpp', 'koboldcpp'
export BACKEND_TYPE=webui

# set this to the base address of the LLM web server
export OPENAI_API_BASE=http://127.0.0.1:5000
```
Now when we run MemGPT, it will use the LLM running on the local web server.
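If MemGPT cannot reach the server, double-check the address before anything else. As an optional sanity check (a sketch; the exact API routes vary by backend, so any HTTP response here, even a 404, just confirms the address is reachable), you can hit the base address with `curl`:

```sh
# any HTTP response (even a 404) means the web server is up at this address
curl -i http://127.0.0.1:5000
```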
## Selecting a model wrapper
When you use local LLMs, `model` no longer specifies which LLM is actually run (you determine that yourself by loading a model in your backend interface). Instead, `model` refers to the wrapper that is used to parse data sent to and from the LLM backend.
You can change the wrapper used with the `--model` flag. For example, the following command selects the `airoboros-l2-70b-2.1` wrapper:

```sh
memgpt run --model airoboros-l2-70b-2.1
```
The default wrapper is `airoboros-l2-70b-2.1-grammar` if you are using a backend that supports grammar-based sampling, and `airoboros-l2-70b-2.1` otherwise.
Note: the wrapper name does not have to match the model name. For example, the `dolphin-2.1-mistral-7b` model works better with the `airoboros-l2-70b-2.1` wrapper than with the `dolphin-2.1-mistral-7b` wrapper. The model you load inside your LLM backend (e.g. LM Studio) determines which model is actually run; the `--model` flag only determines how the prompt is formatted before it is passed to the LLM backend.
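As a concrete illustration of the note above, you might load a Dolphin Mistral model inside LM Studio while still selecting the airoboros wrapper on the MemGPT side (a sketch; the port below is an assumption, use whatever address LM Studio's local server actually reports):

```sh
# dolphin-2.1-mistral-7b is loaded and served inside LM Studio
export BACKEND_TYPE=lmstudio
export OPENAI_API_BASE=http://127.0.0.1:1234  # assumed port, check LM Studio's server panel

# --model only selects the prompt wrapper, not the model being run
memgpt run --model airoboros-l2-70b-2.1
```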
## Grammars
Grammar-based sampling can help improve the performance of MemGPT when using local LLMs. Grammar-based sampling works by restricting the outputs of an LLM to a "grammar", for example, the MemGPT JSON function call grammar. Without grammar-based sampling, it is common to encounter JSON-related errors when using local LLMs with MemGPT.
To use grammar-based sampling, make sure you're using a backend that supports it (web UI, llama.cpp, or koboldcpp), then specify one of the wrappers that implements grammars, e.g. `airoboros-l2-70b-2.1-grammar`.
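For example, with web UI running on port 5000 as in the earlier example, you could select the grammar-enabled wrapper explicitly (it is already the default for grammar-capable backends, so this just makes the choice visible):

```sh
export BACKEND_TYPE=webui
export OPENAI_API_BASE=http://127.0.0.1:5000

# explicitly select the grammar-enabled wrapper
memgpt run --model airoboros-l2-70b-2.1-grammar
```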
## Supported backends
Currently, MemGPT supports the following backends:
- oobabooga web UI (Mac, Windows, Linux) (✔️ supports grammars)
- LM Studio (Mac, Windows) (❌ does not support grammars)
- koboldcpp (Mac, Windows, Linux) (✔️ supports grammars)
- llama.cpp (Mac, Windows, Linux) (✔️ supports grammars)
If you would like us to support a new backend, feel free to open an issue or pull request on the MemGPT GitHub page!
## Which model should I use?
If you are experimenting with MemGPT and local LLMs for the first time, we recommend you try the Dolphin Mistral finetune (e.g. `ehartford/dolphin-2.2.1-mistral-7b` or a quantized variant such as `dolphin-2.2.1-mistral-7b.Q6_K.gguf`), and use the default `airoboros` wrapper.
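Concretely, a reasonable first setup might look like the following sketch (assuming you load the quantized GGUF into web UI on port 5000, as in the examples above):

```sh
# load dolphin-2.2.1-mistral-7b.Q6_K.gguf in your backend (e.g. web UI), then:
export BACKEND_TYPE=webui
export OPENAI_API_BASE=http://127.0.0.1:5000

# no --model flag needed: the default airoboros wrapper is used
memgpt run
```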
Generating MemGPT-compatible outputs is a harder task for an LLM than regular text output. For this reason we strongly advise against using models below Q5 quantization: as the model gets worse, the number of errors you encounter while using MemGPT will dramatically increase (MemGPT will fail to send messages properly, edit memory properly, etc.).
Check out our local LLM GitHub discussion and the MemGPT Discord server for more advice on model selection and help with local LLM troubleshooting.