From 5d3f2f1a12c31af14c927207142006cc488c1932 Mon Sep 17 00:00:00 2001 From: Charles Packer Date: Fri, 1 Dec 2023 11:27:24 -0800 Subject: [PATCH] added vLLM doc page since we support it (#545) * added vLLM doc page since we support it * capitalization * updated documentation --- docs/koboldcpp.md | 2 +- docs/llamacpp.md | 2 +- docs/lmstudio.md | 2 +- docs/ollama.md | 2 +- docs/vllm.md | 26 ++++++++++++++++++++++++++ mkdocs.yml | 5 +++-- 6 files changed, 33 insertions(+), 6 deletions(-) create mode 100644 docs/vllm.md diff --git a/docs/koboldcpp.md b/docs/koboldcpp.md index ee39c041..b74c8442 100644 --- a/docs/koboldcpp.md +++ b/docs/koboldcpp.md @@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the ... ``` -If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`: +If you have an existing agent that you want to move to the koboldcpp backend, add extra flags to `memgpt run`: ```sh memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001 ``` \ No newline at end of file diff --git a/docs/llamacpp.md b/docs/llamacpp.md index 55420619..f3bb03fc 100644 --- a/docs/llamacpp.md +++ b/docs/llamacpp.md @@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the ... ``` -If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`: +If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to `memgpt run`: ```sh memgpt run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080 ``` \ No newline at end of file diff --git a/docs/lmstudio.md b/docs/lmstudio.md index 754f8244..2314aec3 100644 --- a/docs/lmstudio.md +++ b/docs/lmstudio.md @@ -22,7 +22,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the ... 
``` -If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`: +If you have an existing agent that you want to move to the LM Studio backend, add extra flags to `memgpt run`: ```sh memgpt run --agent your_agent --model-endpoint-type lmstudio --model-endpoint http://localhost:1234 ``` diff --git a/docs/ollama.md b/docs/ollama.md index 87314fc4..c201553b 100644 --- a/docs/ollama.md +++ b/docs/ollama.md @@ -37,7 +37,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the ... ``` -If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`: +If you have an existing agent that you want to move to the Ollama backend, add extra flags to `memgpt run`: ```sh # use --model to switch Ollama models (always include the full Ollama model name with the tag) # use --model-wrapper to switch model wrappers diff --git a/docs/vllm.md b/docs/vllm.md new file mode 100644 index 00000000..3721e527 --- /dev/null +++ b/docs/vllm.md @@ -0,0 +1,26 @@ +1. Download + install [vLLM](https://docs.vllm.ai/en/latest/getting_started/installation.html) and the model you want to test with +2. Launch a vLLM **OpenAI-compatible** API server using [the official vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) + + +For example, if we want to use the model `dolphin-2.2.1-mistral-7b` from [HuggingFace](https://huggingface.co/ehartford/dolphin-2.2.1-mistral-7b), we would run: +```sh +python -m vllm.entrypoints.openai.api_server \ +--model ehartford/dolphin-2.2.1-mistral-7b +``` + +vLLM will automatically download the model (if it's not already downloaded) and store it in your [HuggingFace cache directory](https://huggingface.co/docs/datasets/cache). 
+ +In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at vLLM: ``` # if you are running vLLM locally, the default IP address + port will be http://localhost:8000 ? Select LLM inference provider: local ? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm ? Enter default endpoint: http://localhost:8000 ? Enter HuggingFace model tag (e.g. ehartford/dolphin-2.2.1-mistral-7b): ehartford/dolphin-2.2.1-mistral-7b ... ``` + +If you have an existing agent that you want to move to the vLLM backend, add extra flags to `memgpt run`: ```sh memgpt run --agent your_agent --model-endpoint-type vllm --model-endpoint http://localhost:8000 ``` \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 623fcf01..dc14db31 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -23,8 +23,9 @@ nav: # - 'oobabooga web UI (on RunPod)': webui_runpod.md - 'LM Studio': lmstudio.md - 'llama.cpp': llamacpp.md - - 'koboldcpp': koboldcpp.md - - 'ollama': ollama.md + - 'KoboldCpp': koboldcpp.md + - 'Ollama': ollama.md + - 'vLLM': vllm.md - 'Troubleshooting': local_llm_faq.md - 'Customizing MemGPT': - 'Creating new MemGPT presets': presets.md
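
The doc page added above points MemGPT at vLLM's OpenAI-compatible API server. As a rough illustration of the wire format that implies, here is a minimal sketch that builds a request body in the OpenAI completions shape; the endpoint URL and HuggingFace model tag are taken from the doc, but the exact fields MemGPT sends are an assumption based on the OpenAI completions schema, not confirmed by this patch:

```python
import json

# Default vLLM OpenAI-compatible endpoint from the doc above.
ENDPOINT = "http://localhost:8000/v1/completions"
# HuggingFace model tag used in the doc's example.
MODEL = "ehartford/dolphin-2.2.1-mistral-7b"

def build_completion_request(prompt: str, max_tokens: int = 256) -> str:
    """Serialize an OpenAI-completions-style request body as JSON.

    This only constructs the payload for inspection; it does not contact
    a server. Field names follow the OpenAI completions schema that
    vLLM's api_server accepts.
    """
    body = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,  # illustrative value, not from the doc
    }
    return json.dumps(body)

payload = build_completion_request("Hello, vLLM!")
print(payload)
```

Posting this payload to the endpoint (e.g. with `curl -d @- -H 'Content-Type: application/json' http://localhost:8000/v1/completions`) is one way to sanity-check the server before running `memgpt configure`.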