---
title: vLLM
slug: guides/server/providers/vllm
---

To use Letta with vLLM, set the environment variable `VLLM_API_BASE` to point to your vLLM ChatCompletions server.

## Setting up vLLM

1. Download and install [vLLM](https://docs.vllm.ai/en/latest/getting_started/installation.html)
2. Launch a vLLM **OpenAI-compatible** API server by following [the official vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)

For example, to serve the model `dolphin-2.2.1-mistral-7b` from [HuggingFace](https://huggingface.co/ehartford/dolphin-2.2.1-mistral-7b), we would run:

```sh
python -m vllm.entrypoints.openai.api_server \
    --model ehartford/dolphin-2.2.1-mistral-7b
```

vLLM will automatically download the model (if it isn't already downloaded) and store it in your [HuggingFace cache directory](https://huggingface.co/docs/datasets/cache).

## Enabling vLLM as a provider

To enable the vLLM provider, set the `VLLM_API_BASE` environment variable. When it is set, Letta will use the LLM and embedding models available on your vLLM server.

### Using the `docker run` server with vLLM

**macOS/Windows:** Since vLLM is running on the host network, you will need to use `host.docker.internal` instead of `localhost` to reach the vLLM server from inside the container:

```bash
# replace `~/.letta/.persist/pgdata` with wherever you want to store your agent data
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e VLLM_API_BASE="http://host.docker.internal:8000" \
  letta/letta:latest
```

**Linux:** Use `--network host` and `localhost`:

```bash
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  --network host \
  -e VLLM_API_BASE="http://localhost:8000" \
  letta/letta:latest
```

### Using `letta run` and `letta server` with vLLM

To chat with an agent, run:

```bash
export VLLM_API_BASE="http://localhost:8000"
letta run
```

To run the Letta server, run:

```bash
export VLLM_API_BASE="http://localhost:8000"
letta server
```

To select the model used by the server, use the dropdown in the ADE or specify an `LLMConfig` object in the Python SDK.
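
As a rough illustration, configuring an agent against the vLLM backend from the Python SDK might look like the sketch below. This is a minimal sketch, not a definitive recipe: the client import, the `LLMConfig` field names, the `"vllm"` endpoint type, and the context window value are assumptions that may differ across Letta SDK versions, so check the SDK reference for your installed version.

```python
# Minimal sketch -- assumes the `letta` Python SDK exposes create_client()
# and an LLMConfig schema; field names and the "vllm" endpoint type are
# assumptions, verify against your SDK version's documentation.
from letta import create_client
from letta.schemas.llm_config import LLMConfig

client = create_client()  # connects to the local Letta server (port 8283)

agent = client.create_agent(
    name="vllm-agent",
    llm_config=LLMConfig(
        model="ehartford/dolphin-2.2.1-mistral-7b",  # model served by vLLM
        model_endpoint_type="vllm",                  # assumed provider tag
        model_endpoint="http://localhost:8000",      # matches VLLM_API_BASE
        context_window=8192,                         # adjust to your model
    ),
)
```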