letta-server/docs/llamacpp.md at 9d0da9549b098efee22bf05f5d498326fba54832

Sarah Wooders 8ae1e64987 chore: migrate package name to letta (#1775)

Co-authored-by: Charles Packer <packercharles@gmail.com>
Co-authored-by: Shubham Naik <shubham.naik10@gmail.com>
Co-authored-by: Shubham Naik <shub@memgpt.ai>

2024-09-23 09:15:18 -07:00

title	excerpt	category
llama.cpp	Setting up Letta with llama.cpp	6580da9a40bb410016b8b0c3

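Download and install llama.cpp, along with the model you want to test with.
In your terminal, run ./server -m <MODEL> -c <CONTEXT_LENGTH>, where <MODEL> is the path to the model file and <CONTEXT_LENGTH> is the model's context length.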

For example, if we downloaded the model dolphin-2.2.1-mistral-7b.Q6_K.gguf and put it inside ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/, we would run:

# using `-c 8000` because Dolphin Mistral 7B has a context length of 8000
# the default port is 8080; you can change this with `--port`
./server -m ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf -c 8000
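
Before pointing Letta at the server, it can help to confirm that something is actually listening on the endpoint. The snippet below is a minimal sketch, not part of Letta or llama.cpp: the helper name `llamacpp_is_up` is made up for illustration, and it assumes your llama.cpp build exposes a `GET /health` route (older builds may not, in which case this returns False even though the server is running).

```python
import http.client
import urllib.error
import urllib.request


def llamacpp_is_up(endpoint: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if <endpoint>/health answers with HTTP 200.

    Hypothetical helper; assumes the llama.cpp server exposes GET /health.
    """
    url = endpoint.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or malformed URL.
        return False
```

Calling `llamacpp_is_up()` with no arguments checks the default http://localhost:8080; pass a different endpoint if you started the server with `--port`.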

In the terminal where you're running Letta, run letta configure to point Letta's default backend at llama.cpp:

# if you are running llama.cpp locally, the default IP address + port will be http://localhost:8080
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): llamacpp
? Enter default endpoint: http://localhost:8080
...

If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to letta run:

letta run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080
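
If you have several existing agents to move, the same flags can be generated per agent. This is only a sketch under stated assumptions: `letta_run_command` is a hypothetical helper, the agent names are placeholders, and it only prints the commands (using the flags shown above) rather than running them.

```python
import shlex


def letta_run_command(agent: str, endpoint: str = "http://localhost:8080") -> list[str]:
    """Build the `letta run` argument list for one agent (hypothetical helper)."""
    return [
        "letta", "run",
        "--agent", agent,
        "--model-endpoint-type", "llamacpp",
        "--model-endpoint", endpoint,
    ]


# Placeholder agent names; replace with your own.
for name in ["your_agent", "another_agent"]:
    print(shlex.join(letta_run_command(name)))
```

Each printed line can be pasted into a terminal. Because `letta run` starts an interactive session, the sketch deliberately stops short of launching the commands itself.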
