Co-authored-by: Charles Packer <packercharles@gmail.com> Co-authored-by: Shubham Naik <shubham.naik10@gmail.com> Co-authored-by: Shubham Naik <shub@memgpt.ai>
## Nested K/V (`nested_kv_task`)
This task runs K/V lookups on synthetic data. You can run it with `icml_experiments/nested_kv_task/run.sh`.
## Document Q/A (`doc_qa_task`)
This task runs question answering on a set of embedded Wikipedia passages.

### Setup
You need a running PostgreSQL database and an OpenAI account to run this experiment. Set your environment variables:
```
export PGVECTOR_TEST_DB_URL=postgresql+pg8000://{username}:{password}@localhost:8888/{db}
export OPENAI_API_KEY={key}
```
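
The URL above is in SQLAlchemy's `dialect+driver://` form, so the pieces that `psql` needs later (host, port, user, database) can be read straight from it. The following is a minimal sketch, assuming Python 3 and only the standard library. The helper names `describe_db_url` and `missing_env_vars` are hypothetical and not part of the experiment scripts:

```python
import os
from urllib.parse import urlsplit

# Variable names taken from the export commands above.
REQUIRED_VARS = ("PGVECTOR_TEST_DB_URL", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def describe_db_url(url):
    """Split a SQLAlchemy-style database URL into its connection pieces.

    The password is deliberately left out so the result is safe to print.
    """
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,              # e.g. "postgresql+pg8000"
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

Calling `missing_env_vars()` before running the scripts gives an early, readable failure instead of a connection error partway through, and `describe_db_url(os.environ["PGVECTOR_TEST_DB_URL"])` shows which host, port, and database the experiment will use.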
## Download data
Download the Wikipedia embeddings with:
```
huggingface-cli download nlpkevinl/wikipedia_openai_embeddings --repo-type dataset
```
## Loading embeddings
Run the script `./0_load_embeddings.sh`.
This step will take a while. You can check the loading progress by connecting with `psql` (using the port from your database URL, `8888` above) and counting the rows loaded so far:
```
> psql -h localhost -p 8888 -U {username} -d {db}
> SELECT COUNT(*) FROM letta_passages;
```
Once completed, there will be ~19 million rows in the database.
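
The row count can be turned into a rough progress figure. The following is a minimal sketch, assuming Python 3 with `psql` on the `PATH` and a password supplied through `PGPASSWORD` or `~/.pgpass`. The function names are hypothetical, and the 19 million total is only the approximate figure quoted above, so the resulting fraction is an estimate:

```python
import subprocess

# Approximate final row count quoted above; not an exact figure.
EXPECTED_ROWS = 19_000_000


def count_command(host, port, username, db):
    """Build a psql invocation that prints only the row count."""
    return [
        "psql", "-h", host, "-p", str(port), "-U", username, "-d", db,
        "-t", "-A",  # tuples only, unaligned output: just the number
        "-c", "SELECT COUNT(*) FROM letta_passages;",
    ]


def load_progress(rows, expected=EXPECTED_ROWS):
    """Return the estimated fraction loaded, clamped to the range [0, 1]."""
    return max(0.0, min(1.0, rows / expected))


def current_row_count(host, port, username, db):
    """Run psql once and parse the count (requires database access)."""
    result = subprocess.run(
        count_command(host, port, username, db),
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

For example, `load_progress(current_row_count("localhost", 8888, username, db))` returns a value near `1.0` once loading has finished.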
### Creating an index
To avoid extremely slow queries, you need to create an index:
```
CREATE INDEX ON letta_passages USING hnsw (embedding vector_l2_ops);
```
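You can check whether the index was created successfully with the query below. The sample output comes from an index built with `vector_cosine_ops` and explicit `m` and `ef_construction` settings, so your `indexdef` will differ if you ran the command above as written:
```
> SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'letta_passages';

letta_passages_embedding_idx | CREATE INDEX letta_passages_embedding_idx ON public.letta_passages USING hnsw (embedding vector_cosine_ops) WITH (m='24', ef_construction='100')
```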
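
In pgvector, an index is only used by queries whose distance operator matches the index's operator class, so the choice between `vector_l2_ops` and `vector_cosine_ops` matters. The following is a minimal sketch, assuming Python 3, of a hypothetical helper that builds either form of the statement. It is not part of the experiment scripts, and which operator the Document Q/A queries use is not stated here, so which variant to run is left open:

```python
# A subset of pgvector operator classes and the query operator each serves.
OPCLASS_TO_OPERATOR = {
    "vector_l2_ops": "<->",      # Euclidean (L2) distance
    "vector_cosine_ops": "<=>",  # cosine distance
    "vector_ip_ops": "<#>",      # negative inner product
}


def hnsw_index_sql(table, column, opclass, m=None, ef_construction=None):
    """Build a CREATE INDEX statement for a pgvector HNSW index.

    `table` and `column` are interpolated as-is, so pass only trusted
    identifiers. Omitting `m` or `ef_construction` leaves pgvector's
    defaults in place.
    """
    if opclass not in OPCLASS_TO_OPERATOR:
        raise ValueError(f"unknown operator class: {opclass}")
    sql = f"CREATE INDEX ON {table} USING hnsw ({column} {opclass})"
    options = []
    if m is not None:
        options.append(f"m = {int(m)}")
    if ef_construction is not None:
        options.append(f"ef_construction = {int(ef_construction)}")
    if options:
        sql += " WITH (" + ", ".join(options) + ")"
    return sql + ";"
```

Calling `hnsw_index_sql("letta_passages", "embedding", "vector_l2_ops")` reproduces the command given earlier, while passing `"vector_cosine_ops"` with `m=24` and `ef_construction=100` yields a statement matching the sample `indexdef` output.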
## Running Document Q/A
Run the script `./1_run_docqa.sh {model_name} {n_docs} {letta/model_name}`.
## Evaluation
Run the script `./2_run_eval.sh`.