| title | excerpt | category |
|---|---|---|
| koboldcpp | Setting up MemGPT with koboldcpp | 6580da9a40bb410016b8b0c3 |
- Download and install koboldcpp and the model you want to test with
- In your terminal, run:
./koboldcpp.py <MODEL> --contextsize <CONTEXT_LENGTH>
For example, if we downloaded the model `dolphin-2.2.1-mistral-7b.Q6_K.gguf` and put it inside `~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/`, we would run:
# using `--contextsize 8192` because Dolphin Mistral 7B has a context length of 8000 and koboldcpp only accepts specific context sizes (8192 is the closest)
# the default port is 5001
./koboldcpp.py ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf --contextsize 8192
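The rounding from 8000 to 8192 described in the comment above can be sketched as a small shell helper. This is only an illustration: `pick_context_size` is a hypothetical function, not part of koboldcpp, and the list of sizes passed to it is an assumption (check `./koboldcpp.py --help` for the values your version accepts). It picks the smallest allowed size that is at least as large as the model's context length.

```shell
# Hypothetical helper (not part of koboldcpp): print the smallest allowed
# context size that still fits the model's context length.
# Usage: pick_context_size <model_context_length> <allowed_size>...
pick_context_size() {
  wanted=$1
  shift
  best=""
  for size in "$@"; do
    # keep sizes that are large enough, preferring the smallest of those
    if [ "$size" -ge "$wanted" ] && { [ -z "$best" ] || [ "$size" -lt "$best" ]; }; then
      best=$size
    fi
  done
  if [ -z "$best" ]; then
    echo "no allowed size fits $wanted" >&2
    return 1
  fi
  echo "$best"
}

# The sizes listed here are an assumed example, not koboldcpp's actual list.
pick_context_size 8000 2048 4096 8192 16384
```

The result can then be passed to `--contextsize` when starting koboldcpp.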
In the terminal where you run MemGPT, run `memgpt configure` to point MemGPT's default backend at koboldcpp:
# if you are running koboldcpp locally, the default IP address + port will be http://localhost:5001
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): koboldcpp
? Enter default endpoint: http://localhost:5001
...
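Before pointing MemGPT at the endpoint, it can help to confirm that something is actually listening there. The following is a minimal sketch, assuming `curl` is installed and that koboldcpp answers plain HTTP requests at its base URL; `check_endpoint` is a hypothetical helper name, and the URL is the default from the steps above.

```shell
# Hypothetical helper: succeed if something answers HTTP at the given URL.
check_endpoint() {
  url=$1
  # -s: quiet, -f: treat HTTP errors as failure,
  # --max-time: give up instead of hanging if nothing is listening
  curl -sf --max-time 5 -o /dev/null "$url"
}

if check_endpoint "http://localhost:5001"; then
  echo "koboldcpp endpoint is reachable"
else
  echo "nothing answered at http://localhost:5001; is koboldcpp running?" >&2
fi
```

If koboldcpp runs on another machine or port, substitute that address here and in the `memgpt configure` prompt.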
If you have an existing agent that you want to move to the koboldcpp backend, add extra flags to `memgpt run`:
memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001