letta-server/docs/koboldcpp.md at c0bd66c957f4fbbdf427bd72368a0380bae2df20


Owen Easter 796c33f1a6 docs: linting, syntax, formatting & spelling fixes for all files (#761 )


Co-authored-by: Charles Packer <packercharles@gmail.com>

2024-01-02 10:31:50 -08:00

1.3 KiB


| title | excerpt | category |
| --- | --- | --- |
| koboldcpp | Setting up MemGPT with koboldcpp | 6580da9a40bb410016b8b0c3 |

1. Download and install koboldcpp and the model you want to test with.
2. In your terminal, run `./koboldcpp.py <MODEL> --contextsize <CONTEXT_LENGTH>`

For example, if we downloaded the model `dolphin-2.2.1-mistral-7b.Q6_K.gguf` and put it inside `~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/`, we would run:

```sh
# using `--contextsize 8192` because Dolphin Mistral 7B has a context length of 8000 (koboldcpp only accepts specific values, and 8192 is the closest)
# the default port is 5001
./koboldcpp.py ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf --contextsize 8192
```
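Because koboldcpp only accepts certain `--contextsize` values, you may need to round your model's native context length to one of them. The `closest_contextsize` function below is a hypothetical helper sketching that rounding. The list of accepted sizes in it is an assumption for illustration; check the `--contextsize` entry in `./koboldcpp.py --help` for the values your version accepts.

```sh
# ASSUMPTION: illustrative list of accepted --contextsize values;
# your koboldcpp version may accept a different set.
KOBOLDCPP_CONTEXT_SIZES="512 1024 2048 3072 4096 6144 8192 12288 16384 24576 32768"

# Hypothetical helper: print the accepted context size closest to a model's
# native context length (ties go to the smaller size, since the list is ascending).
closest_contextsize() {
    local target size diff best best_diff
    target=$1
    best=""
    best_diff=""
    # Unquoted on purpose so the list splits into separate sizes
    for size in $KOBOLDCPP_CONTEXT_SIZES; do
        if [ "$size" -ge "$target" ]; then
            diff=$((size - target))
        else
            diff=$((target - size))
        fi
        # Strictly smaller distance only, so the earlier (smaller) size wins ties
        if [ -z "$best_diff" ] || [ "$diff" -lt "$best_diff" ]; then
            best=$size
            best_diff=$diff
        fi
    done
    echo "$best"
}
```

For the Dolphin Mistral example, `closest_contextsize 8000` prints `8192`, so the launch command could be written as `./koboldcpp.py <MODEL> --contextsize "$(closest_contextsize 8000)"`.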

In the terminal where you run MemGPT, run `memgpt configure` to point MemGPT's default backend at koboldcpp:

```text
# if you are running koboldcpp locally, the default IP address + port will be http://localhost:5001
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): koboldcpp
? Enter default endpoint: http://localhost:5001
...
```
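Before entering an endpoint in `memgpt configure`, it can help to confirm that a koboldcpp server is actually answering there. This is a minimal sketch, assuming `curl` is installed and that koboldcpp serves the KoboldAI-compatible `GET /api/v1/model` route; `koboldcpp_reachable` is a hypothetical helper name, not part of MemGPT or koboldcpp.

```sh
# Hypothetical helper: succeed only if a server answers at <endpoint>/api/v1/model.
# ASSUMPTION: koboldcpp serves the KoboldAI-compatible GET /api/v1/model route.
koboldcpp_reachable() {
    local endpoint
    endpoint=${1:-http://localhost:5001}
    endpoint=${endpoint%/}   # drop one trailing slash so the path joins cleanly
    # -s: no progress output, -f: non-zero exit on HTTP errors,
    # --max-time: give up instead of hanging on an unresponsive host
    curl -sf --max-time 5 "$endpoint/api/v1/model" > /dev/null
}
```

For example, `koboldcpp_reachable http://localhost:5001 && memgpt configure` only starts the configuration prompts once the server responds.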

If you have an existing agent that you want to move to the koboldcpp backend, add these extra flags to `memgpt run`:

```sh
memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001
```
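If you move agents to koboldcpp often, a small shell wrapper saves retyping the flags. The `memgpt_run_koboldcpp` function below is hypothetical (not part of MemGPT) and passes only the `memgpt run` flags shown above, defaulting the endpoint to `http://localhost:5001`. Setting `DRY_RUN=1` prints the command instead of running it.

```sh
# Hypothetical wrapper, not part of MemGPT: runs `memgpt run` with the
# koboldcpp flags from this page. Set DRY_RUN=1 to print the command instead.
memgpt_run_koboldcpp() {
    local agent endpoint
    agent=${1:-}
    endpoint=${2:-http://localhost:5001}
    if [ -z "$agent" ]; then
        echo "usage: memgpt_run_koboldcpp <agent> [endpoint]" >&2
        return 1
    fi
    # Collect the command in the positional parameters so it is written once
    set -- memgpt run --agent "$agent" --model-endpoint-type koboldcpp --model-endpoint "$endpoint"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: `memgpt_run_koboldcpp your_agent`, or `memgpt_run_koboldcpp your_agent http://<HOST>:5001` when koboldcpp runs on another machine.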
