letta-server/docs/local_llm_faq.md at 515d9d0f629afa6dabcd801c4e6475ca3c877512

Files

Charles Packer f8b99b562f feat: Migrate docs (#646 )

* updated docs for readme

* Update index.md

* Update index.md

* added header

* broken link

* sync heading sizes

* fix various broken rel links

* Update index.md

* added webp

* Update index.md

* strip mkdocs/rtk files

* replaced readthedocs references with readme

2023-12-18 20:29:24 -08:00

3.7 KiB

Raw Blame History

title, excerpt, category

title	excerpt	category
Troubleshooting	FAQ for MemGPT + custom LLM backends	6580da9a40bb410016b8b0c3

Problems getting MemGPT + local LLMs set up

"Unable to connect to host ...", "API call got non-200 response code"

This error happens when MemGPT tries to run the LLM on the remote server you specified, but the server isn't working as expected.

For example, this error can happen when you have a typo in your endpoint (notice the duplicate /v1 in the URL):

Exception: API call got non-200 response code (code=400, msg={"error": {"message": "Missing required input", "code": 400, "type": "InvalidRequestError", "param": "context"}}) for address: http://localhost:5001/v1/api/v1/generate. Make sure that the web UI server is running and reachable at http://localhost:5001/v1/api/v1/generate.

Correcting the endpoint from http://localhost:5001/v1 to http://localhost:5001 (no /v1 suffix) fixes the example error.

Common errors while running MemGPT with local LLMs

"Warning: no wrapper specified for local LLM, using the default wrapper"

You can ignore this warning.

This warning means that you did not specify a specific wrapper using the --model-wrapper flag, so MemGPT is using the default wrapper. If you would like to silence this warning, specify a wrapper with --model-wrapper or during memgpt configure.

"Failed to parse JSON from local LLM response"

This error occurs when the LLM you're using outputs a string that cannot be parsed into a MemGPT function call. This is basically an LLM output error - the LLM was not able to properly follow MemGPT instructions and generate a MemGPT-compatible function call string.

You can reduce the frequency of these errors by using better models, and wrappers with grammar-based sampling. For example, moving from a 2B model to a 70B model, or moving from a quantized model to the full unquantized version of the same model.

If you use really small models (< 7B) or heavily quantized models (< Q5), you are likely to run into many MemGPT LLM output errors. Try using the recommended models first before experimenting with your own custom models.

Many JSON-related output errors can be fixed by using a wrapper that uses grammars (required a grammar-enabled backend). See instructions about grammars here.

For example, let's look at the following error:

Failed to parse JSON from local LLM response - error: Failed to decode JSON from LLM output:
{
  "function": "send_message",
  "params": {
    "inner_thoughts": "Oops, I got their name wrong! I should apologize and correct myself.",
    "message": "Sorry about that! I assumed you were Chad. Welcome, Brad! "
 <|> - error
JSONDecodeError.init() missing 2 required positional arguments: 'doc' and 'pos'

In this example, the error is saying that the local LLM output the following string:

{
  "function": "send_message",
  "params": {
    "inner_thoughts": "Oops, I got their name wrong! I should apologize and correct myself.",
    "message": "Sorry about that! I assumed you were Chad. Welcome, Brad! "
 <|>

This string is not correct JSON - it is missing closing brackets and has a stray "<|>". Correct JSON would look like this:

{
  "function": "send_message",
  "params": {
    "inner_thoughts": "Oops, I got their name wrong! I should apologize and correct myself.",
    "message": "Sorry about that! I assumed you were Chad. Welcome, Brad! "
  }
}

"Got back an empty response string from ..."

MemGPT asked the server to run the LLM, but got back an empty response. Double-check that your server is running properly and has context length set correctly (it should be set to 8k if using Mistral 7B models).

3.7 KiB Raw Blame History