letta-server/docs/vllm.md at 79b078eeffaf2d18d373682ea3d50eaa4de05f80

Sarah Wooders 8ae1e64987 chore: migrate package name to letta (#1775)

Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Shubham Naik <shubham.naik10@gmail.com>
Co-authored-by: Shubham Naik <shub@memgpt.ai>

2024-09-23 09:15:18 -07:00

| title | excerpt | category |
| --- | --- | --- |
| vLLM | Setting up Letta with vLLM | 6580da9a40bb410016b8b0c3 |

1. Download and install vLLM.
2. Launch a vLLM OpenAI-compatible API server, following the official vLLM documentation.

For example, if we want to use the model dolphin-2.2.1-mistral-7b from HuggingFace, we would run:

    python -m vllm.entrypoints.openai.api_server \
        --model ehartford/dolphin-2.2.1-mistral-7b

vLLM will automatically download the model (if it's not already downloaded) and store it in your HuggingFace cache directory.
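To check whether the model is already in the cache, the sketch below computes the folder where it would be expected. It assumes the default `huggingface_hub` cache layout (`~/.cache/huggingface/hub`, overridable through the `HF_HOME` or `HF_HUB_CACHE` environment variables) and ignores other overrides such as `XDG_CACHE_HOME`. The helper names `hf_hub_cache_dir` and `model_cache_dir` are hypothetical and are not part of Letta or vLLM.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None):
    """Return the directory where the Hugging Face Hub cache is expected to live."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE takes precedence when it is set.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the cache is the "hub" folder under HF_HOME,
    # assumed here to default to ~/.cache/huggingface.
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


def model_cache_dir(repo_id, env=None):
    """Return the expected cache folder for a model repo such as 'org/name'."""
    # The hub cache names model folders "models--<org>--<name>".
    return hf_hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    path = model_cache_dir("ehartford/dolphin-2.2.1-mistral-7b")
    print(path, "(exists)" if path.exists() else "(not downloaded yet)")
```

If the folder is missing, vLLM downloads the model on first launch, so startup can take noticeably longer.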

In the terminal where you're running Letta, run `letta configure` and set Letta's default backend to vLLM:

    # if you are running vLLM locally, the default IP address + port will be http://localhost:8000
    ? Select LLM inference provider: local
    ? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm
    ? Enter default endpoint: http://localhost:8000
    ? Enter HuggingFace model tag (e.g. ehartford/dolphin-2.2.1-mistral-7b): ehartford/dolphin-2.2.1-mistral-7b
    ...
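Before entering the endpoint, it can help to confirm that the server responds and lists the model you expect. The sketch below queries the `/v1/models` route that OpenAI-compatible servers expose, using only the Python standard library. It assumes the server reports the model under the same name that was passed to `--model`. The function names (`models_url`, `served_model_ids`, `check_server`) are hypothetical helpers and are not part of Letta or vLLM.

```python
import json
import urllib.request


def models_url(endpoint):
    """Build the model-listing URL for an OpenAI-compatible server."""
    return endpoint.rstrip("/") + "/v1/models"


def served_model_ids(payload):
    """Extract model ids from the JSON body of a /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def check_server(endpoint, model, timeout=5.0):
    """Return True if the server at `endpoint` lists `model` as served."""
    with urllib.request.urlopen(models_url(endpoint), timeout=timeout) as resp:
        payload = json.load(resp)
    return model in served_model_ids(payload)


if __name__ == "__main__":
    try:
        ok = check_server("http://localhost:8000", "ehartford/dolphin-2.2.1-mistral-7b")
        print("model is being served" if ok else "server is up, but the model is not listed")
    except (OSError, ValueError) as exc:
        # Connection failures raise OSError subclasses; a non-JSON body raises ValueError.
        print(f"could not verify the server: {exc}")
```

A connection error here usually means the server is still starting or is listening on a different host or port than the one you plan to give Letta.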

If you have an existing agent that you want to move to the vLLM backend, pass the endpoint type, endpoint, and model as flags to `letta run`:

    letta run --agent your_agent --model-endpoint-type vllm --model-endpoint http://localhost:8000 --model ehartford/dolphin-2.2.1-mistral-7b
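If you need to move several agents, the same invocation can be assembled in a script. The sketch below builds the argument list using only the flags shown above and prints the resulting command instead of running it. `letta_run_command` is a hypothetical helper and is not part of the Letta CLI.

```python
import shlex


def letta_run_command(agent, endpoint, model, endpoint_type="vllm"):
    """Return the argument list for the `letta run` invocation shown above."""
    return [
        "letta", "run",
        "--agent", agent,
        "--model-endpoint-type", endpoint_type,
        "--model-endpoint", endpoint,
        "--model", model,
    ]


if __name__ == "__main__":
    cmd = letta_run_command(
        "your_agent",
        "http://localhost:8000",
        "ehartford/dolphin-2.2.1-mistral-7b",
    )
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell-quoting problems if an agent name contains spaces.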
