letta-server/paper_experiments/doc_qa_task/2_run_eval.sh
Andy Li ff718d8c40 feat: uv migration (#3493)
* uv migration

smaller runners, freeze test runs, remove dev, ruff, hatchling, preview,
poetry, generates wheel, installs wheel, docker

* fix tests and dependency groups

* test fixes

* test fixing and main

* resolve merge conflict

* dev + test dependency group

* Test

* trigger CI

* trigger CI

* add debugging info

* trigger CI

* uv for reusable and sdk preview

* resolve mc and reformat black

* staged-api

* mypy

* fix fern

* prod Dockerfile

* model sweep, and project.toml and uvlock

* --group test -> --extra dev

* remove redundant --extra dev and rename tests to dev

* sdk backwards compat install sqlite

* install sqlite group for sdk-backwards-compat

* install uv on gh runner for cloud-api-integration-tests

* stage+publish

* pytest asyncio

* bug causing pytest package to get removed

* try to fix async event loop issues

* migrate to --with google-cloud-secret-manager

---------

Co-authored-by: Kian Jones <kian@letta.com>
2025-08-26 18:11:09 -07:00

19 lines
648 B
Bash

docs=(1 5 10 20 50 100 200 700)
models=("gpt-4-0613" "gpt-3.5-turbo-1106" "gpt-4-1106-preview")

# Run the LLM judge on the letta results for each model
for model in "${models[@]}"; do
    uv run python icml_experiments/doc_qa_task/llm_judge_doc_qa.py --file "results/doc_qa_results_model_${model}.json"
done

# Run the LLM judge on the baseline results for each model and document count
for model in "${models[@]}"; do
    for doc in "${docs[@]}"; do
        echo "Running for model $model with $doc docs..."
        uv run python icml_experiments/doc_qa_task/llm_judge_doc_qa.py --file "results/doc_qa_baseline_model_${model}_num_docs_${doc}.json" --baseline
    done
done