letta-server/docs/koboldcpp.md at e5add4e430b6f73ad2db8399dc3ffd41912305c0

Charles Packer caba2f468c Create docs pages (#328 )

Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>

2023-11-06 12:38:49 -08:00

MemGPT + koboldcpp

Download and install koboldcpp, along with the model you want to test with
In your terminal, run ./koboldcpp.py <MODEL> --contextsize <CONTEXT_LENGTH>

For example, if we downloaded the model dolphin-2.2.1-mistral-7b.Q6_K.gguf and put it inside ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/, we would run:

# using `--contextsize 8192` because Dolphin Mistral 7B has a context length of 8000 (koboldcpp only accepts specific sizes, and 8192 is the closest)
# the default port is 5001
./koboldcpp.py ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf --contextsize 8192

In your terminal where you're running MemGPT, run:

export OPENAI_API_BASE=http://localhost:5001
export BACKEND_TYPE=koboldcpp
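
If you run koboldcpp on a port other than the default, the two exports above can be wrapped in a small shell function. This is only a convenience sketch: `use_koboldcpp` is a hypothetical helper name, not something shipped with MemGPT or koboldcpp, and it assumes koboldcpp is listening on localhost.

```shell
# Hypothetical helper (not part of MemGPT or koboldcpp):
# point MemGPT at a local koboldcpp server.
use_koboldcpp() {
  # First argument is the port; fall back to 5001, the default port noted above.
  _kobold_port="${1:-5001}"
  export OPENAI_API_BASE="http://localhost:${_kobold_port}"
  export BACKEND_TYPE=koboldcpp
}

# Use the default port and show the resulting settings.
use_koboldcpp
echo "OPENAI_API_BASE=$OPENAI_API_BASE"
echo "BACKEND_TYPE=$BACKEND_TYPE"
```

Calling `use_koboldcpp 5002` instead would set `OPENAI_API_BASE` to `http://localhost:5002`. The variables only apply to the current shell session, so run the function in the same terminal where you start MemGPT.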
