---
title: llama.cpp
excerpt: Setting up MemGPT with llama.cpp
category: 6580da9a40bb410016b8b0c3
---
1. Download and install [llama.cpp](https://github.com/ggerganov/llama.cpp), then download the model you want to test with
2. In your terminal, run `./server -m <MODEL> -c <CONTEXT_LENGTH>`

For example, if we downloaded the model `dolphin-2.2.1-mistral-7b.Q6_K.gguf` and put it inside `~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/`, we would run:
```sh
# using `-c 8000` because Dolphin Mistral 7B has a context length of 8000
# the default port is 8080; you can change it with `--port`
./server -m ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf -c 8000
```
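If port 8080 is already in use, the same command can be parameterized. The sketch below is only an illustration: the variable names (`MODEL`, `CONTEXT_LENGTH`, `PORT`, `ENDPOINT`) and the port `8081` are assumptions, not part of llama.cpp or MemGPT. It prints the endpoint you would later give to MemGPT and starts the server only if `./server` exists in the current directory.
```sh
# Illustrative values; adjust them to your own setup.
MODEL="$HOME/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf"
CONTEXT_LENGTH=8000
PORT=8081   # assumed free port; the default is 8080

# The endpoint MemGPT should point at when the server uses this port
ENDPOINT="http://localhost:${PORT}"
echo "MemGPT endpoint: ${ENDPOINT}"

# Start the server only if the binary is in the current directory
if [ -x ./server ]; then
    ./server -m "$MODEL" -c "$CONTEXT_LENGTH" --port "$PORT"
else
    echo "./server not found; run this from your llama.cpp directory" >&2
fi
```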
In the terminal where you run MemGPT, run `memgpt configure` to point MemGPT's default backend at llama.cpp:
```text
# if you are running llama.cpp locally, the default address and port will be http://localhost:8080
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): llamacpp
? Enter default endpoint: http://localhost:8080
...
```
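Before starting an agent, it can help to confirm that something is actually listening at the endpoint you entered. The following is a minimal sketch that assumes `curl` is installed. The helper name `endpoint_reachable` is hypothetical, and the check only confirms that an HTTP server answers at the URL, not that a model is loaded.
```sh
# Hypothetical helper: succeeds if an HTTP server answers at the given URL.
endpoint_reachable() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Assumes the default llama.cpp address used in the configuration above
ENDPOINT="http://localhost:8080"
if endpoint_reachable "$ENDPOINT"; then
    echo "Server reachable at ${ENDPOINT}"
else
    echo "Nothing answering at ${ENDPOINT}; is ./server running?"
fi
```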
If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to `memgpt run`:
```sh
memgpt run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080
```