added vLLM doc page since we support it (#545)
* added vLLM doc page since we support it
* capitalization
* updated documentation
docs/koboldcpp.md

@@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
 ...
 ```
 
-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the koboldcpp backend, add extra flags to `memgpt run`:
 ```sh
 memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001
 ```
docs/llamacpp.md

@@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
 ...
 ```
 
-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to `memgpt run`:
 ```sh
 memgpt run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080
 ```
docs/lmstudio.md

@@ -22,7 +22,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
 ...
 ```
 
-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the LM Studio backend, add extra flags to `memgpt run`:
 ```sh
 memgpt run --agent your_agent --model-endpoint-type lmstudio --model-endpoint http://localhost:1234
 ```
docs/ollama.md

@@ -37,7 +37,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
 ...
 ```
 
-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the Ollama backend, add extra flags to `memgpt run`:
 ```sh
 # use --model to switch Ollama models (always include the full Ollama model name with the tag)
 # use --model-wrapper to switch model wrappers
docs/vllm.md (new file, 26 lines)

@@ -0,0 +1,26 @@
+1. Download + install [vLLM](https://docs.vllm.ai/en/latest/getting_started/installation.html) and the model you want to test with
+2. Launch a vLLM **OpenAI-compatible** API server using [the official vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)
+
+For example, if we want to use the model `dolphin-2.2.1-mistral-7b` from [HuggingFace](https://huggingface.co/ehartford/dolphin-2.2.1-mistral-7b), we would run:
+```sh
+python -m vllm.entrypoints.openai.api_server \
+--model ehartford/dolphin-2.2.1-mistral-7b
+```
+
+vLLM will automatically download the model (if it's not already downloaded) and store it in your [HuggingFace cache directory](https://huggingface.co/docs/datasets/cache).
+
+In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at vLLM:
+```
+# if you are running vLLM locally, the default IP address + port will be http://localhost:8000
+? Select LLM inference provider: local
+? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm
+? Enter default endpoint: http://localhost:8000
+? Enter HuggingFace model tag (e.g. ehartford/dolphin-2.2.1-mistral-7b): ehartford/dolphin-2.2.1-mistral-7b
+...
+```
+
+If you have an existing agent that you want to move to the vLLM backend, add extra flags to `memgpt run`:
+```sh
+memgpt run --agent your_agent --model-endpoint-type vllm --model-endpoint http://localhost:8000
+```
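The vllm.md page above relies on vLLM's server speaking the standard OpenAI chat completions API, which is why `memgpt configure` only needs an endpoint URL and a HuggingFace model tag. As a rough sketch of what travels over that wire (the endpoint URL, model tag, and prompt below are illustrative, taken from the example in the page; the request itself is left commented out since it needs a running server):

```python
import json
import urllib.request

VLLM_ENDPOINT = "http://localhost:8000"  # vLLM's default when run locally
MODEL_TAG = "ehartford/dolphin-2.2.1-mistral-7b"  # HuggingFace model tag from the example

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload, the shape accepted
    by vLLM's OpenAI-compatible server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request(MODEL_TAG, "Hello!")
print(json.dumps(payload, indent=2))

# Uncomment to POST against a running vLLM server:
# req = urllib.request.Request(
#     VLLM_ENDPOINT + "/v1/chat/completions",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the protocol is OpenAI-compatible, the same payload shape works against LM Studio and other OpenAI-style proxies mentioned in these docs; only the endpoint URL changes.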
mkdocs.yml

@@ -23,8 +23,9 @@ nav:
 # - 'oobabooga web UI (on RunPod)': webui_runpod.md
 - 'LM Studio': lmstudio.md
 - 'llama.cpp': llamacpp.md
-- 'koboldcpp': koboldcpp.md
-- 'ollama': ollama.md
+- 'KoboldCpp': koboldcpp.md
+- 'Ollama': ollama.md
+- 'vLLM': vllm.md
 - 'Troubleshooting': local_llm_faq.md
 - 'Customizing MemGPT':
 - 'Creating new MemGPT presets': presets.md