letta-server/docs/lmstudio.md
Charles Packer 5d3f2f1a12 added vLLM doc page since we support it (#545)
* added vLLM doc page since we support it

* capitalization

* updated documentation
2023-12-01 11:27:24 -08:00


!!! warning "Important LM Studio settings"

    **Context length**: Make sure that "context length" (`n_ctx`) is set to the maximum context length of the model you're using (e.g. 8000 for Mistral 7B variants). The setting is under "Model initialization" in the "Server Model Settings" panel on the right-hand side.

    **Automatic Prompt Formatting = OFF**: If you see "Automatic Prompt Formatting" inside LM Studio's "Server Options" panel (on the left side), turn it **OFF**. Leaving it **ON** will break MemGPT.

    **Context Overflow Policy = Stop at limit**: If you see "Context Overflow Policy" inside LM Studio's "Tools" panel on the right side (below "Server Model Settings"), set it to **Stop at limit**. The default setting "Keep the system prompt ... truncate middle" will break MemGPT.

[Image: example screenshot of the LM Studio local inference server]

  1. Download LM Studio and the model you want to test with
  2. Go to the "local inference server" tab, load the model and configure your settings (make sure to set the context length to something reasonable like 8k!)
  3. Click "Start server"
  4. Copy the address and port that your server is running on (in the example screenshot, the address is `http://localhost:1234`)
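
Before pointing MemGPT at the server, it can help to confirm that the address from step 4 actually responds. The following is a minimal sketch, not part of MemGPT or LM Studio: it assumes the LM Studio server exposes an OpenAI-compatible `/v1/models` route at the address you copied, and the helper names `models_url` and `server_is_up` are hypothetical.

```python
import json
import urllib.request


def models_url(endpoint: str) -> str:
    """Build the model-listing URL (assumed OpenAI-compatible route)."""
    return endpoint.rstrip("/") + "/v1/models"


def server_is_up(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers the model-listing request with JSON."""
    try:
        with urllib.request.urlopen(models_url(endpoint), timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return False


if __name__ == "__main__":
    endpoint = "http://localhost:1234"  # the address copied in step 4
    print(endpoint, "reachable:", server_is_up(endpoint))
```

If this reports that the server is not reachable, check that "Start server" was clicked and that the port matches the one shown in LM Studio.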

In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at LM Studio:

    # if you are running LM Studio locally, the default IP address + port will be http://localhost:1234
    ? Select LLM inference provider: local
    ? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): lmstudio
    ? Enter default endpoint: http://localhost:1234
    ...

If you have an existing agent that you want to move to the LM Studio backend, add extra flags to `memgpt run`:

    memgpt run --agent your_agent --model-endpoint-type lmstudio --model-endpoint http://localhost:1234
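
If you launch agents from a script rather than by hand, the same command can be assembled programmatically. This is only a sketch: it uses the flags shown in the command above and nothing else, and the helper name `build_run_command` is hypothetical.

```python
import shlex


def build_run_command(agent: str, endpoint: str) -> list[str]:
    """Assemble the `memgpt run` invocation for an existing agent on LM Studio."""
    return [
        "memgpt", "run",
        "--agent", agent,
        "--model-endpoint-type", "lmstudio",
        "--model-endpoint", endpoint,
    ]


if __name__ == "__main__":
    cmd = build_run_command("your_agent", "http://localhost:1234")
    # Print the shell-quoted command so it can be inspected before running it.
    print(shlex.join(cmd))
```

To start the agent from Python, the resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell-quoting problems if the agent name contains spaces.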