* add gpu runners and prod memory_repos
* add lmstudio and vllm in model_settings
* fix llm_configs, rename a variable in the reusable workflow, and change perms for memory_repos to admin in tf
* fix: update self-hosted provider tests to use SDK 1.0 and v2 tests
  - Update letta-client from ==0.1.324 to >=1.0.0
  - Switch ollama/vllm/lmstudio tests to integration_test_send_message_v2.py

  🤖 Generated with [Letta Code](https://letta.com)
  Co-Authored-By: Letta <noreply@letta.com>
* fix: use openai provider_type for self-hosted model settings
  ollama/vllm/lmstudio are not valid provider_type values in the SDK model_settings schema; they use OpenAI-compatible APIs, so provider_type should be openai. The provider routing is determined by the handle prefix.
* fix: use openai_compat_base_url for ollama/vllm/lmstudio providers
  When reconstructing LLMConfig from a model handle lookup, use the provider's openai_compat_base_url (which includes /v1) instead of the raw base_url. This fixes 404 errors when calling ollama/vllm/lmstudio, since the OpenAI client expects the /v1/chat/completions endpoint.
* fix: enable redis for ollama/vllm/lmstudio tests
  Background streaming tests require Redis. Add use-redis: true to the self-hosted provider test workflows.
* add memfs-py in prod bucket access
* change ollama
* change packer model defaults
* self-hosted provider support
* disable reasoner to match the number of messages in the test case, enable parallel tool calls, and pass embedding configs
* remove reasoning setting not supported for ollama
* add qwen3 to extra assistant message case
* lower temp
* prep for lmstudio and vllm
* use lmstudio_openai client
* skip parallel tool calls on the CPU-run lmstudio provider
* revert downgrade since it's so slow already
* add required flags for tool call parsing etc.
* change tool call parser from hermes to qwen3_xml
* qwen3_xmlk -> qwen3_coder
* upgrade vllm to latest container
* revert to hermes (incompatible with parallel tool calls?) and skip vllm tests on parallel tool calls
* install uv redis extra
* remove lmstudio

---------

Co-authored-by: Letta <noreply@letta.com>
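The two provider fixes above (openai as the provider_type, plus an openai_compat_base_url that includes /v1) can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the names `SELF_HOSTED_PREFIXES`, `split_handle`, and `openai_compat_base_url`'s exact signature are assumptions.

```python
# Hypothetical sketch of handle-prefix routing for self-hosted providers.
# Handles like "ollama/qwen3:8b" name the backend in their prefix; the
# provider_type stays "openai" because these backends expose
# OpenAI-compatible APIs.

# Assumed set of self-hosted, OpenAI-compatible backends.
SELF_HOSTED_PREFIXES = {"ollama", "vllm", "lmstudio"}


def split_handle(handle: str) -> tuple[str, str]:
    """Split 'ollama/qwen3:8b' into ('ollama', 'qwen3:8b')."""
    provider, _, model = handle.partition("/")
    return provider, model


def openai_compat_base_url(raw_base_url: str) -> str:
    """Append /v1 so the OpenAI client reaches /v1/chat/completions."""
    return raw_base_url.rstrip("/") + "/v1"


provider, model = split_handle("ollama/qwen3:8b")
assert provider in SELF_HOSTED_PREFIXES
# Using the raw base_url here (no /v1) is exactly what produced the 404s.
base = openai_compat_base_url("http://localhost:11434")
```

The point of the second helper is that the OpenAI client joins paths under the base URL, so a base URL without the /v1 segment yields requests to /chat/completions, which these servers do not serve.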
```json
{
  "handle": "ollama/qwen3:8b",
  "model_settings": {
    "provider_type": "openai",
    "temperature": 0.7,
    "max_output_tokens": 4096,
    "parallel_tool_calls": true
  }
}
```
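A payload like the one above can be consumed with nothing but the standard library. This is a sketch under the assumption that the backend is identified by the handle prefix while provider_type remains "openai" (per the commits above); it is not the SDK's actual parsing code.

```python
import json

# The config above, inlined as a string for a self-contained example;
# in practice it would come from a file or an API response.
payload = """
{
  "handle": "ollama/qwen3:8b",
  "model_settings": {
    "provider_type": "openai",
    "temperature": 0.7,
    "max_output_tokens": 4096,
    "parallel_tool_calls": true
  }
}
"""

cfg = json.loads(payload)

# The actual backend ("ollama") lives in the handle prefix...
backend = cfg["handle"].split("/", 1)[0]

# ...while provider_type is "openai" because the backend speaks the
# OpenAI-compatible API.
settings = cfg["model_settings"]
```

Note that `parallel_tool_calls` is a plain JSON boolean, so `settings["parallel_tool_calls"]` deserializes to a Python `True` with no extra handling.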