diff --git a/fern/docs.yml b/fern/docs.yml index 99e88504..d0f4dc19 100644 --- a/fern/docs.yml +++ b/fern/docs.yml @@ -707,6 +707,14 @@ navigation: contents: - page: Building customer-specific relationship agents path: pages/tutorials/customer-specific-agents.mdx + - section: RAG + contents: + - page: RAG Overview + path: pages/cookbooks/rag-overview.mdx + - page: Connect a Vector DB to Letta + path: pages/cookbooks/rag-simple.mdx + - page: Agentic RAG + path: pages/cookbooks/rag-agentic.mdx - tab: leaderboard layout: diff --git a/fern/images/chroma-keys.png b/fern/images/chroma-keys.png new file mode 100644 index 00000000..04623565 Binary files /dev/null and b/fern/images/chroma-keys.png differ diff --git a/fern/images/chroma-new-project.png b/fern/images/chroma-new-project.png new file mode 100644 index 00000000..d9d0cb16 Binary files /dev/null and b/fern/images/chroma-new-project.png differ diff --git a/fern/images/connection-string-mongodb.png b/fern/images/connection-string-mongodb.png new file mode 100644 index 00000000..89d585c4 Binary files /dev/null and b/fern/images/connection-string-mongodb.png differ diff --git a/fern/images/create-cluster-mongodb.png b/fern/images/create-cluster-mongodb.png new file mode 100644 index 00000000..7fd77a1d Binary files /dev/null and b/fern/images/create-cluster-mongodb.png differ diff --git a/fern/images/hf-token.png b/fern/images/hf-token.png new file mode 100644 index 00000000..a3330fd5 Binary files /dev/null and b/fern/images/hf-token.png differ diff --git a/fern/images/ip-config-mongodb.png b/fern/images/ip-config-mongodb.png new file mode 100644 index 00000000..618ab9ba Binary files /dev/null and b/fern/images/ip-config-mongodb.png differ diff --git a/fern/images/letta-api-key-nav.png b/fern/images/letta-api-key-nav.png new file mode 100644 index 00000000..ca10d6c4 Binary files /dev/null and b/fern/images/letta-api-key-nav.png differ diff --git a/fern/images/letta-dep-config.png b/fern/images/letta-dep-config.png new 
file mode 100644 index 00000000..aa86c5ee Binary files /dev/null and b/fern/images/letta-dep-config.png differ diff --git a/fern/images/letta-tool-config.png b/fern/images/letta-tool-config.png new file mode 100644 index 00000000..1ab9a264 Binary files /dev/null and b/fern/images/letta-tool-config.png differ diff --git a/fern/images/qdrant-connection-details.png b/fern/images/qdrant-connection-details.png new file mode 100644 index 00000000..18bc8f36 Binary files /dev/null and b/fern/images/qdrant-connection-details.png differ diff --git a/fern/images/qdrant-create-cluster.png b/fern/images/qdrant-create-cluster.png new file mode 100644 index 00000000..0695b2d8 Binary files /dev/null and b/fern/images/qdrant-create-cluster.png differ diff --git a/fern/images/stateless-agent-ui.png b/fern/images/stateless-agent-ui.png new file mode 100644 index 00000000..2428d03f Binary files /dev/null and b/fern/images/stateless-agent-ui.png differ diff --git a/fern/pages/cookbooks/rag-agentic.mdx b/fern/pages/cookbooks/rag-agentic.mdx new file mode 100644 index 00000000..c45bba77 --- /dev/null +++ b/fern/pages/cookbooks/rag-agentic.mdx @@ -0,0 +1,1540 @@ +--- +title: Agentic RAG with Letta +subtitle: Empower your agent with custom search tools for autonomous retrieval +slug: guides/rag/agentic +--- + +In the Agentic RAG approach, we delegate the retrieval process to the agent itself. Instead of your application deciding what to search for, we provide the agent with a custom tool that allows it to query your vector database directly. This makes the agent more autonomous and your client-side code much simpler. + +By the end of this tutorial, you'll have a research assistant that autonomously decides when to search your vector database and what queries to use. 
+ +## Prerequisites + +To follow along, you need free accounts for: + +- **[Letta](https://www.letta.com)** - To access the agent development platform +- **[Hugging Face](https://huggingface.co/)** - For generating embeddings (MongoDB and Qdrant users only) +- **One of the following vector databases:** + - **[ChromaDB Cloud](https://www.trychroma.com/)** for a hosted vector database + - **[MongoDB Atlas](https://www.mongodb.com/cloud/atlas/register)** for vector search with MongoDB + - **[Qdrant Cloud](https://cloud.qdrant.io/)** for a high-performance vector database + +You will also need Python 3.8+ or Node.js v18+ and a code editor. + + +**MongoDB and Qdrant users:** This guide uses Hugging Face's Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta's sandbox environment. + + +## Getting Your API Keys + +We'll need API keys for Letta and your chosen vector database. + + + + + + If you don't have one, sign up for a free account at [letta.com](https://www.letta.com). + + + Once logged in, click on **API keys** in the sidebar. + ![Letta API Key Navigation](/images/letta-api-key-nav.png) + + + Click **+ Create API key**, give it a descriptive name, and click **Confirm**. Copy the key and save it somewhere safe. + + + + + + + + + + Sign up for a free account on the [ChromaDB Cloud website](https://www.trychroma.com/). + + + From your dashboard, create a new database. + ![ChromaDB New Project](/images/chroma-new-project.png) + + + In your project settings, you'll find your **API Key**, **Tenant**, **Database**, and **Host URL**. We'll need all of these for our scripts. + ![ChromaDB Keys](/images/chroma-keys.png) + + + + + + + + Sign up for a free account at [mongodb.com/cloud/atlas/register](https://www.mongodb.com/cloud/atlas/register). + + + Click **Build a Cluster** and select the free tier (M0). Choose your preferred cloud provider and region and click **Create deployment**. 
+ ![Create MongoDB Cluster](/images/create-cluster-mongodb.png) + + + Next, set up connection security. + 1. Create a database user, then click **Choose a connection method**. + 2. Choose **Drivers** to connect to your application, and choose Python as the driver. + 3. Copy the **entire** connection string, including the query parameters at the end. It will look like this: + + ``` + mongodb+srv://:@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0 + ``` + + + Make sure to replace the password placeholder with your actual database user password. Keep all the query parameters (`?retryWrites=true&w=majority&appName=Cluster0`); they are required for proper connection configuration. + + + ![MongoDB Connection String](/images/connection-string-mongodb.png) + + + By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect. + + 1. Navigate to **Database and Network Access** in the left sidebar. + 2. Click **Add IP Address**. + 3. For local development and testing, select **Allow Access From Anywhere**. This will add the IP address `0.0.0.0/0`. + 4. Click **Confirm**. + + ![MongoDB IP Configuration](/images/ip-config-mongodb.png) + + + For a production environment, you would replace `0.0.0.0/0` with a secure list of static IP addresses provided by your hosting service (e.g., Letta). + + + + + + + + + Sign up for a free account at [cloud.qdrant.io](https://cloud.qdrant.io/). + + + From your dashboard, click **Clusters** and then **+ Create**. Select the free tier and choose your preferred region. + + ![Create Qdrant Cluster](/images/qdrant-create-cluster.png) + + + Once your cluster is created, click on it to view details. + + Copy the following: + + 1. **API Key** + 2. **Cluster URL** + + ![Qdrant Connection Details](/images/qdrant-connection-details.png) + + + + + + + + + + Sign up for a free account at [huggingface.co](https://huggingface.co/join). + + + Click the profile icon in the top right. 
Navigate to **Settings** > **Access Tokens** (or go directly to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)). + + + Click **New token**, give it a name (e.g., "Letta RAG Demo"), select the **Read** role, and click **Create token**. Copy the token and save it securely. + ![Hugging Face Token](/images/hf-token.png) + + + + +The Hugging Face free tier includes 30,000 API requests per month, which is more than enough for development and testing. + + + + +Once you have these credentials, create a `.env` file in your project directory. Add the credentials for your chosen database: + + + +```bash +LETTA_API_KEY="..." +CHROMA_API_KEY="..." +CHROMA_TENANT="..." +CHROMA_DATABASE="..." +``` + + +```bash +LETTA_API_KEY="..." +MONGODB_URI="mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0" +MONGODB_DB_NAME="rag_demo" +HF_API_KEY="..." +``` + + +```bash +LETTA_API_KEY="..." +QDRANT_URL="https://xxxxx.cloud.qdrant.io" +QDRANT_API_KEY="..." +HF_API_KEY="..." +``` + + + +## Step 1: Set Up the Vector Database + +First, we need to populate your chosen vector database with the content of the research papers. We'll use two papers for this demo: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762) and ["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805). 
+ +Before we begin, let's set up our development environment: + + +```bash title="Python" +# Create a Python virtual environment to keep dependencies isolated +python -m venv venv +source venv/bin/activate # On Windows, use: venv\Scripts\activate +``` + +```bash title="TypeScript" +# Create a new Node.js project +npm init -y + +# Create tsconfig.json for TypeScript configuration +cat > tsconfig.json << 'EOF' +{ + "compilerOptions": { + "target": "ES2020", + "module": "ESNext", + "moduleResolution": "node", + "esModuleInterop": true, + "skipLibCheck": true, + "strict": true + } +} +EOF +``` + + +TypeScript users must also add the following field to `package.json` to enable ES modules: + +```json +"type": "module" +``` + +Download the research papers using curl with the `-L` flag to follow redirects: + +```bash +curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf +curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf +``` + +Verify the PDFs downloaded correctly: + +```bash +file 1706.03762.pdf 1810.04805.pdf +``` + +You should see output indicating these are PDF documents, not HTML files. 
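If the `file` command is not available (for example, on Windows), a few lines of Python can perform a similar check. The sketch below is not part of the original tutorial, and the helper name `looks_like_pdf` is hypothetical. It assumes that a well-formed PDF begins with the bytes `%PDF-`, whereas a failed download typically saves an HTML error page instead.

```python
from pathlib import Path

# Assumption: well-formed PDF files start with this header.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header bytes.

    Hypothetical helper for illustration only; it inspects the header and
    does not validate the rest of the document.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    # File names match the curl commands above.
    for name in ["1706.03762.pdf", "1810.04805.pdf"]:
        if looks_like_pdf(name):
            print(f"{name}: looks like a PDF")
        else:
            print(f"{name}: missing or not a PDF; try downloading again with curl -L")
```

A header check like this is cheap and catches the common case where a redirect page was saved in place of the paper, but it says nothing about whether `pypdf` can extract text from the file.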
+ +Install the necessary packages for your chosen database: + + + + +```txt title="Python" +# requirements.txt +letta-client +chromadb +pypdf +python-dotenv +``` + +```bash title="TypeScript" +# chromadb, @chroma-core/default-embed, and pdf-ts are imported by setup.ts +npm install @letta-ai/letta-client chromadb @chroma-core/default-embed pdf-ts dotenv +npm install --save-dev typescript @types/node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + + + +```txt title="Python" +# requirements.txt +letta-client +pymongo +pypdf +python-dotenv +requests +certifi +dnspython +``` + +```bash title="TypeScript" +# mongodb, pdf-ts, and node-fetch are imported by setup.ts +npm install @letta-ai/letta-client mongodb pdf-ts node-fetch dotenv +npm install --save-dev typescript @types/node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + + + +```txt title="Python" +# requirements.txt +letta-client +qdrant-client +pypdf +python-dotenv +requests +``` + +```bash title="TypeScript" +# @qdrant/js-client-rest, pdf-ts, and node-fetch are imported by setup.ts +npm install @letta-ai/letta-client @qdrant/js-client-rest pdf-ts node-fetch dotenv +npm install --save-dev typescript @types/node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + + +Now create a `setup.py` or `setup.ts` file to load the PDFs, split them into page-level chunks, and ingest them into your database: + + + + +```python title="Python" +import os +import chromadb +import pypdf +from dotenv import load_dotenv + +load_dotenv() + +def main(): + # Connect to ChromaDB Cloud + client = chromadb.CloudClient( + tenant=os.getenv("CHROMA_TENANT"), + database=os.getenv("CHROMA_DATABASE"), + api_key=os.getenv("CHROMA_API_KEY") + ) + + # Create or get the collection + collection = client.get_or_create_collection("rag_collection") + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + collection.add( + ids=[f"{pdf_file}-{i}"], + documents=[page.extract_text()] + ) + + print("\nIngestion complete!") + print(f"Total documents in collection: {collection.count()}") + +if __name__ 
== "__main__": + main() +``` + +```typescript title="TypeScript" +import { CloudClient } from 'chromadb'; +import { DefaultEmbeddingFunction } from '@chroma-core/default-embed'; +import * as dotenv from 'dotenv'; +import * as path from 'path'; +import * as fs from 'fs'; +import { pdfToPages } from 'pdf-ts'; + +dotenv.config(); + +async function main() { + // Connect to ChromaDB Cloud + const client = new CloudClient({ + apiKey: process.env.CHROMA_API_KEY || '', + tenant: process.env.CHROMA_TENANT || '', + database: process.env.CHROMA_DATABASE || '' + }); + + // Create embedding function + const embedder = new DefaultEmbeddingFunction(); + + // Create or get the collection + const collection = await client.getOrCreateCollection({ + name: 'rag_collection', + embeddingFunction: embedder + }); + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + + for (const pdfFile of pdfFiles) { + console.log(`Ingesting ${pdfFile}...`); + const pdfPath = path.join(__dirname, pdfFile); + const dataBuffer = fs.readFileSync(pdfPath); + + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text.trim(); + if (text) { + await collection.add({ + ids: [`${pdfFile}-${i}`], + documents: [text] + }); + } + } + } + + console.log('\nIngestion complete!'); + const count = await collection.count(); + console.log(`Total documents in collection: ${count}`); +} + +main().catch(console.error); +``` + + + + + +```python title="Python" +import os +import pymongo +import pypdf +import requests +import certifi +from dotenv import load_dotenv + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {api_key}"} + + response = requests.post(API_URL, headers=headers, json={"inputs": [text], "options": {"wait_for_model": True}}) + + if response.status_code == 
200: + return response.json()[0] + else: + raise Exception(f"HF API error: {response.status_code} - {response.text}") + +def main(): + hf_api_key = os.getenv("HF_API_KEY") + mongodb_uri = os.getenv("MONGODB_URI") + db_name = os.getenv("MONGODB_DB_NAME") + + if not all([hf_api_key, mongodb_uri, db_name]): + print("Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file") + return + + # Connect to MongoDB Atlas using certifi + client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where()) + db = client[db_name] + collection = db["rag_collection"] + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + text = page.extract_text() + if not text: # Skip empty pages + continue + + # Generate embedding using Hugging Face + print(f" Processing page {i+1}...") + try: + embedding = get_embedding(text, hf_api_key) + collection.insert_one({ + "_id": f"{pdf_file}-{i}", + "text": text, + "embedding": embedding, + "source": pdf_file, + "page": i + }) + except Exception as e: + print(f" Could not process page {i+1}: {e}") + + + print("\nIngestion complete!") + print(f"Total documents in collection: {collection.count_documents({})}") + + # Print instructions for creating the vector search index (a manual step in Atlas) + print("\nNext: Go to your MongoDB Atlas dashboard and create a search index named 'vector_index'") + print("Use the following JSON definition:") + print('''{ + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] +}''') + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { MongoClient } from 'mongodb'; +import * as dotenv from 'dotenv'; +import { pdfToPages } from 'pdf-ts'; +import * as fs from 'fs'; +import fetch from 'node-fetch'; + +dotenv.config(); + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = 
"https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const response = await fetch(API_URL, { + method: 'POST', + headers: headers, + body: JSON.stringify({ + inputs: [text], + options: { wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const errorText = await response.text(); + throw new Error(`HF API error: ${response.status} - ${errorText}`); + } +} + +async function main() { + const hfApiKey = process.env.HF_API_KEY || ''; + const mongoUri = process.env.MONGODB_URI || ''; + const dbName = process.env.MONGODB_DB_NAME || ''; + + if (!hfApiKey || !mongoUri || !dbName) { + console.error('Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file'); + return; + } + + // Connect to MongoDB Atlas + const client = new MongoClient(mongoUri); + + try { + await client.connect(); + console.log('Connected to MongoDB Atlas'); + + const db = client.db(dbName); + const collection = db.collection('rag_collection'); + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + + for (const pdfFile of pdfFiles) { + console.log(`Ingesting ${pdfFile}...`); + + const dataBuffer = fs.readFileSync(pdfFile); + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text; + + if (!text || text.trim().length === 0) { + continue; // Skip empty pages + } + + // Generate embedding using Hugging Face + console.log(` Processing page ${i + 1}...`); + try { + const embedding = await getEmbedding(text, hfApiKey); + + await collection.insertOne({ + _id: `${pdfFile}-${i}`, + text: text, + embedding: embedding, + source: pdfFile, + page: i + }); + } catch (error) { + console.log(` Could not process page ${i + 1}: ${error}`); + } + } + } + + const docCount = await collection.countDocuments({}); + 
console.log('\nIngestion complete!'); + console.log(`Total documents in collection: ${docCount}`); + + console.log('\nNext: Go to your MongoDB Atlas dashboard and create a search index named "vector_index"'); + console.log(JSON.stringify({ + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] + }, null, 2)); + + } catch (error) { + console.error('Error:', error); + } finally { + await client.close(); + } +} + +main(); +``` + + + + + +```python title="Python" +import os +import pypdf +import requests +from dotenv import load_dotenv +from qdrant_client import QdrantClient +from qdrant_client.models import Distance, VectorParams, PointStruct + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {api_key}"} + + response = requests.post(API_URL, headers=headers, json={"inputs": text, "options": {"wait_for_model": True}}) + + if response.status_code == 200: + return response.json() + else: + raise Exception(f"HF API error: {response.status_code} - {response.text}") + +def main(): + hf_api_key = os.getenv("HF_API_KEY") + + if not hf_api_key: + print("Error: HF_API_KEY not found in .env file") + return + + # Connect to Qdrant Cloud + client = QdrantClient( + url=os.getenv("QDRANT_URL"), + api_key=os.getenv("QDRANT_API_KEY") + ) + + # Create collection + collection_name = "rag_collection" + + # Check if collection exists, if not create it + collections = client.get_collections().collections + if collection_name not in [c.name for c in collections]: + client.create_collection( + collection_name=collection_name, + vectors_config=VectorParams(size=384, distance=Distance.COSINE) + ) + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + point_id = 0 + + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = 
pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + text = page.extract_text() + if not text: # Skip empty pages + continue + + # Generate embedding using Hugging Face + print(f" Processing page {i+1}...") + embedding = get_embedding(text, hf_api_key) + + client.upsert( + collection_name=collection_name, + points=[ + PointStruct( + id=point_id, + vector=embedding, + payload={"text": text, "source": pdf_file, "page": i} + ) + ] + ) + point_id += 1 + + print("\nIngestion complete!") + collection_info = client.get_collection(collection_name) + print(f"Total documents in collection: {collection_info.points_count}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { QdrantClient } from '@qdrant/js-client-rest'; +import { pdfToPages } from 'pdf-ts'; +import dotenv from 'dotenv'; +import fetch from 'node-fetch'; +import * as fs from 'fs'; + +dotenv.config(); + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + + const response = await fetch(API_URL, { + method: 'POST', + headers: { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }, + body: JSON.stringify({ + inputs: [text], + options: { wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const error = await response.text(); + throw new Error(`HuggingFace API error: ${response.status} - ${error}`); + } +} + +async function main() { + const hfApiKey = process.env.HF_API_KEY || ''; + + if (!hfApiKey) { + console.error('Error: HF_API_KEY not found in .env file'); + return; + } + + // Connect to Qdrant Cloud + const client = new QdrantClient({ + url: process.env.QDRANT_URL || '', + apiKey: process.env.QDRANT_API_KEY || '' + }); + + const collectionName = 'rag_collection'; + + // Check if collection exists, if not create it + const collections = await client.getCollections(); + const 
collectionExists = collections.collections.some(c => c.name === collectionName); + + if (!collectionExists) { + console.log('Creating collection...'); + await client.createCollection(collectionName, { + vectors: { + size: 384, + distance: 'Cosine' + } + }); + } + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + let pointId = 0; + + for (const pdfFile of pdfFiles) { + console.log(`\nIngesting ${pdfFile}...`); + const dataBuffer = fs.readFileSync(pdfFile); + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text; + + console.log(` Processing page ${i + 1}...`); + const embedding = await getEmbedding(text, hfApiKey); + + await client.upsert(collectionName, { + wait: true, + points: [ + { + id: pointId, + vector: embedding, + payload: { + text: text, + source: pdfFile, + page: i + } + } + ] + }); + pointId++; + } + } + + console.log('\nIngestion complete!'); + const collectionInfo = await client.getCollection(collectionName); + console.log(`Total documents in collection: ${collectionInfo.points_count}`); +} + +main().catch(console.error); +``` + + + + +Run the script from your terminal: + + +```bash title="Python" +python setup.py +``` + +```bash title="TypeScript" +npx tsx setup.ts +``` + + +If you are using MongoDB Atlas, you must manually create a vector search index by following the steps below. + + + +**MongoDB Atlas users:** The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow these steps carefully. + + + + + Log in to your [MongoDB Atlas dashboard](https://cloud.mongodb.com/), navigate to your cluster, and click on the **"Atlas Search"** tab. + + + Click **"Create Search Index"**, then choose **"JSON Editor"** (not "Visual Editor"). 
+ + + - Database: Select **`rag_demo`** (or whatever you set as `MONGODB_DB_NAME`) + - Collection: Select **`rag_collection`** + + + - Index Name: Enter **`vector_index`** (this exact name is required by the code) + - Paste this JSON definition: + ```json + { + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] + } + ``` + **Note:** 384 dimensions is for Hugging Face's `BAAI/bge-small-en-v1.5` model. + + + Click **"Create Search Index"**. The index will take a few minutes to build. Wait until the status shows as **"Active"** before proceeding. + + + + +Your vector database is now populated with research paper content and ready to query. + +## Step 2: Create a Custom Search Tool + +A Letta tool is a Python function that your agent can call. We'll create a function that searches your vector database and returns the results. Letta handles the complexities of exposing this function to the agent securely. + + +**TypeScript users:** Letta tools execute in Python, even when called from TypeScript. Create a `tools.ts` file that exports the Python code as a string constant, which you'll use in Step 3 to create the tool. + + +Create a new file named `tools.py` (Python) or `tools.ts` (TypeScript) with the appropriate implementation for your database: + + + + +```python title="Python" +def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant document found. + """ + import chromadb + import os + + # ChromaDB Cloud Client + # This tool code is executed on the Letta server. It expects the ChromaDB + # credentials to be passed as environment variables. 
+ api_key = os.getenv("CHROMA_API_KEY") + tenant = os.getenv("CHROMA_TENANT") + database = os.getenv("CHROMA_DATABASE") + + if not all([api_key, tenant, database]): + raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.") + + client = chromadb.CloudClient( + tenant=tenant, + database=database, + api_key=api_key + ) + + collection = client.get_or_create_collection("rag_collection") + + try: + results = collection.query( + query_texts=[query_text], + n_results=n_results + ) + + document = results['documents'][0][0] + return document + except Exception as e: + return f"Tool failed with error: {e}" +``` + +```typescript title="TypeScript" +/** + * This file contains the Python tool code as a string. + * Letta tools execute in Python, so we define the Python source code here. + */ + +export const searchResearchPapersToolCode = `def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant document found. + """ + import chromadb + import os + + # ChromaDB Cloud Client + # This tool code is executed on the Letta server. It expects the ChromaDB + # credentials to be passed as environment variables. 
+ api_key = os.getenv("CHROMA_API_KEY") + tenant = os.getenv("CHROMA_TENANT") + database = os.getenv("CHROMA_DATABASE") + + if not all([api_key, tenant, database]): + raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.") + + client = chromadb.CloudClient( + tenant=tenant, + database=database, + api_key=api_key + ) + + collection = client.get_or_create_collection("rag_collection") + + try: + results = collection.query( + query_texts=[query_text], + n_results=n_results + ) + + document = results['documents'][0][0] + return document + except Exception as e: + return f"Tool failed with error: {e}" +`; +``` + + + + + +```python title="Python" +import os + +def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query using Hugging Face embeddings. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant documents found. 
+ """ + import requests + import pymongo + import certifi + + try: + n_results = int(n_results) + except (ValueError, TypeError): + n_results = 1 + + mongodb_uri = os.getenv("MONGODB_URI") + db_name = os.getenv("MONGODB_DB_NAME") + hf_api_key = os.getenv("HF_API_KEY") + + if not all([mongodb_uri, db_name, hf_api_key]): + raise ValueError("MONGODB_URI, MONGODB_DB_NAME, and HF_API_KEY must be set as environment variables.") + + # --- Hugging Face API Call --- + try: + response = requests.post( + "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5", + headers={"Authorization": f"Bearer {hf_api_key}"}, + json={"inputs": [query_text], "options": {"wait_for_model": True}}, + timeout=30 + ) + response.raise_for_status() + query_embedding = response.json()[0] + except requests.exceptions.RequestException as e: + return f"Hugging Face API request failed: {e}" + + # --- MongoDB Atlas Connection & Search --- + try: + client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where(), serverSelectionTimeoutMS=30000) + collection = client[db_name]["rag_collection"] + pipeline = [ + { + "$vectorSearch": { + "index": "vector_index", + "path": "embedding", + "queryVector": query_embedding, + "numCandidates": 100, + "limit": n_results + } + }, + { + "$project": { + "text": 1, + "source": 1, + "page": 1, + "score": {"$meta": "vectorSearchScore"} + } + } + ] + results = list(collection.aggregate(pipeline)) + except pymongo.errors.PyMongoError as e: + return f"MongoDB operation failed: {e}" + + # --- Final Processing --- + documents = [doc.get("text", "") for doc in results] + return "\n\n".join(documents) if documents else "No results found." +``` + +```typescript title="TypeScript" +/** + * This file contains the Python tool code as a string. + * Letta tools execute in Python, so we define the Python source code here. 
+ */ + +export const searchResearchPapersToolCode = `import os + +def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query using Hugging Face embeddings. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant documents found. + """ + import requests + import pymongo + import certifi + + try: + n_results = int(n_results) + except (ValueError, TypeError): + n_results = 1 + + mongodb_uri = os.getenv("MONGODB_URI") + db_name = os.getenv("MONGODB_DB_NAME") + hf_api_key = os.getenv("HF_API_KEY") + + if not all([mongodb_uri, db_name, hf_api_key]): + raise ValueError("MONGODB_URI, MONGODB_DB_NAME, and HF_API_KEY must be set as environment variables.") + + # --- Hugging Face API Call --- + try: + response = requests.post( + "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5", + headers={"Authorization": f"Bearer {hf_api_key}"}, + json={"inputs": [query_text], "options": {"wait_for_model": True}}, + timeout=30 + ) + response.raise_for_status() + query_embedding = response.json()[0] + except requests.exceptions.RequestException as e: + return f"Hugging Face API request failed: {e}" + + # --- MongoDB Atlas Connection & Search --- + try: + client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where(), serverSelectionTimeoutMS=30000) + collection = client[db_name]["rag_collection"] + pipeline = [ + { + "$vectorSearch": { + "index": "vector_index", + "path": "embedding", + "queryVector": query_embedding, + "numCandidates": 100, + "limit": n_results + } + }, + { + "$project": { + "text": 1, + "source": 1, + "page": 1, + "score": {"$meta": "vectorSearchScore"} + } + } + ] + results = list(collection.aggregate(pipeline)) + except pymongo.errors.PyMongoError as e: + return f"MongoDB operation failed: {e}" + + # --- Final Processing --- + documents = [doc.get("text", "") for doc in results] + 
return "\\n\\n".join(documents) if documents else "No results found." +`; +``` + + + + + +```python title="Python" +def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query using Hugging Face embeddings. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant documents found. + """ + import os + import requests + from qdrant_client import QdrantClient + + # Qdrant Cloud Client + url = os.getenv("QDRANT_URL") + api_key = os.getenv("QDRANT_API_KEY") + hf_api_key = os.getenv("HF_API_KEY") + + if not all([url, api_key, hf_api_key]): + raise ValueError("QDRANT_URL, QDRANT_API_KEY, and HF_API_KEY must be set as environment variables.") + + # Connect to Qdrant + client = QdrantClient(url=url, api_key=api_key) + + try: + # Generate embedding using Hugging Face + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {hf_api_key}"} + response = requests.post(API_URL, headers=headers, json={"inputs": query_text, "options": {"wait_for_model": True}}) + + if response.status_code != 200: + return f"HF API error: {response.status_code}" + + query_embedding = response.json() + + # Search Qdrant + results = client.query_points( + collection_name="rag_collection", + query=query_embedding, + limit=n_results + ) + + documents = [hit.payload["text"] for hit in results.points] + return "\n\n".join(documents) if documents else "No results found." + except Exception as e: + return f"Tool failed with error: {e}" +``` + +```typescript title="TypeScript" +/** + * This file contains the Python tool code as a string. + * Letta tools execute in Python, so we define the Python source code here. 
+ */ + +export const searchResearchPapersToolCode = `def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query using Hugging Face embeddings. + + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + + Returns: + str: The most relevant documents found. + """ + import os + import requests + from qdrant_client import QdrantClient + + # Qdrant Cloud Client + url = os.getenv("QDRANT_URL") + api_key = os.getenv("QDRANT_API_KEY") + hf_api_key = os.getenv("HF_API_KEY") + + if not all([url, api_key, hf_api_key]): + raise ValueError("QDRANT_URL, QDRANT_API_KEY, and HF_API_KEY must be set as environment variables.") + + # Connect to Qdrant + client = QdrantClient(url=url, api_key=api_key) + + try: + # Generate embedding using Hugging Face + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {hf_api_key}"} + response = requests.post(API_URL, headers=headers, json={"inputs": query_text, "options": {"wait_for_model": True}}) + + if response.status_code != 200: + return f"HF API error: {response.status_code}" + + query_embedding = response.json() + + # Search Qdrant + results = client.query_points( + collection_name="rag_collection", + query=query_embedding, + limit=n_results + ) + + documents = [hit.payload["text"] for hit in results.points] + return "\\n\\n".join(documents) if documents else "No results found." + except Exception as e: + return f"Tool failed with error: {e}" +`; +``` + + + + +This function takes a query, connects to your database, retrieves the most relevant documents, and returns them as a single string. + +## Step 3: Configure an Agentic Research Assistant + +Next, we'll create a new agent. This agent will have a specific persona that instructs it on how to behave and, most importantly, it will be equipped with our new search tool. 
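Before registering the tool, a quick local check can catch a missing docstring or type hint. The sketch below assumes, as the docstring style in Step 2 suggests, that the tool's description and argument documentation are derived from the function's signature and docstring. `check_tool_function` is a hypothetical helper written for this guide, not part of the Letta SDK.

```python
import inspect
from typing import Callable, List


def check_tool_function(func: Callable) -> List[str]:
    """Return a list of problems; an empty list means the function looks ready to register."""
    problems = []
    doc = inspect.getdoc(func) or ""
    if not doc:
        problems.append("missing docstring")

    signature = inspect.signature(func)
    for name, param in signature.parameters.items():
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"parameter '{name}' has no type hint")
        # Crude substring check: it only confirms the name appears somewhere in the docstring
        if doc and name not in doc:
            problems.append(f"parameter '{name}' is not described in the docstring")

    if signature.return_annotation is inspect.Signature.empty:
        problems.append("missing return type hint")
    return problems


if __name__ == "__main__":
    # Placeholder tool with the same signature as the one from Step 2 (illustrative only)
    def example_tool(query_text: str, n_results: int = 1) -> str:
        """
        Searches a collection.

        Args:
            query_text (str): The text to search for.
            n_results (int): The number of results to return.
        """
        return query_text

    print(check_tool_function(example_tool) or "looks ready")
```

In the Python project you could run the same check against the real function with `from tools import search_research_papers`.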
+ +Create a file named `create_agentic_agent.py` (Python) or `create_agentic_agent.ts` (TypeScript): + + +```python title="Python" +import os +from letta_client import Letta +from dotenv import load_dotenv +from tools import search_research_papers + +load_dotenv() + +# Initialize the Letta client +client = Letta(token=os.getenv("LETTA_API_KEY")) + +# Create a tool from our Python function +search_tool = client.tools.create_from_function(func=search_research_papers) + +# Define the agent's persona +persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. Then, answer the user's question based on the information returned by the tool.""" + +# Create the agent with the tool attached +agent = client.agents.create( + name="Agentic RAG Assistant", + description="A smart agent that can search a vector database to answer questions.", + memory_blocks=[ + { + "label": "persona", + "value": persona + } + ], + tools=[search_tool.name] +) + +print(f"Agent '{agent.name}' created with ID: {agent.id}") +``` + +```typescript title="TypeScript" +import { LettaClient } from '@letta-ai/letta-client'; +import * as dotenv from 'dotenv'; +import { searchResearchPapersToolCode } from './tools.js'; + +dotenv.config(); + +async function main() { + // Initialize the Letta client + const client = new LettaClient({ + token: process.env.LETTA_API_KEY || '' + }); + + // Create the tool from the Python code imported from tools.ts + const searchTool = await client.tools.create({ + sourceCode: searchResearchPapersToolCode, + sourceType: 'python' + }); + + console.log(`Tool '${searchTool.name}' created with ID: ${searchTool.id}`); + + // Define the agent's persona + const persona = `You are a world-class research assistant. 
Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the \`search_research_papers\` tool to find relevant information. Then, answer the user's question based on the information returned by the tool.`; + + // Create the agent with the tool attached + const agent = await client.agents.create({ + name: 'Agentic RAG Assistant', + description: 'A smart agent that can search a vector database to answer questions.', + memoryBlocks: [ + { + label: 'persona', + value: persona + } + ], + toolIds: [searchTool.id] + }); + + console.log(`Agent '${agent.name}' created with ID: ${agent.id}`); +} + +main().catch(console.error); +``` + + + +**TypeScript users:** Notice how the TypeScript version imports `searchResearchPapersToolCode` from `tools.ts` (the file you created in Step 2). This keeps the code organized, just like the Python version imports from `tools.py`. + + + +Run this script once to create the agent in your Letta project: + + +```bash title="Python" +python create_agentic_agent.py +``` + +```bash title="TypeScript" +npx tsx create_agentic_agent.ts +``` + + +### Configure Tool Dependencies and Environment Variables + +For the tool to work within Letta's environment, we need to configure its dependencies and environment variables through the Letta dashboard. + + + + Navigate to your Letta dashboard and find the "Agentic RAG Assistant" agent you just created. + + + Click on your agent to open the Agent Development Environment (ADE). + + + In the ADE, select **Tools** from the sidebar, find and click on the `search_research_papers` tool, then click on the **Dependencies** tab. 
+ + Add the following dependencies based on your database: + + + + ```txt + chromadb + ``` + + + + ```txt + pymongo + requests + certifi + dnspython + ``` + + + + ```txt + qdrant-client + requests + ``` + + + + ![Letta Dependencies Configuration](/images/letta-dep-config.png) + + + In the same tool configuration, navigate to **Simulator** > **Environment**. + + Add the following environment variables with their corresponding values from your `.env` file: + + + + ```txt + CHROMA_API_KEY + CHROMA_TENANT + CHROMA_DATABASE + ``` + + + + ```txt + MONGODB_URI + MONGODB_DB_NAME + HF_API_KEY + ``` + + + + ```txt + QDRANT_URL + QDRANT_API_KEY + HF_API_KEY + ``` + + + + Make sure to click the upload button next to each environment variable so that the agent is updated with it. + + ![Letta Tool Configuration](/images/letta-tool-config.png) + + + +Now, when the agent calls this tool, Letta's execution environment will install the required dependencies and will have the credentials it needs to connect to your database. + +## Step 4: Let the Agent Lead the Conversation + +With the agentic setup, our client-side code becomes much simpler. We no longer need to retrieve context ourselves; we just send the user's raw question to the agent and let it handle the rest. 
+ +Create the `agentic_rag.py` or `agentic_rag.ts` script: + + +```python title="Python" +import os +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize client +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) + +AGENT_ID = "your-agentic-agent-id" # Replace with your new agent ID + +def main(): + while True: + user_query = input("\nAsk a question about the research papers (or type 'exit' to quit): ") + if user_query.strip().lower() in ['exit', 'quit']: + break + + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": user_query}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"\nAgent: {message.content}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { LettaClient } from '@letta-ai/letta-client'; +import * as dotenv from 'dotenv'; +import * as readline from 'readline'; + +dotenv.config(); + +const AGENT_ID = 'your-agentic-agent-id'; // Replace with your new agent ID + +async function main() { + // Initialize client + const lettaClient = new LettaClient({ + token: process.env.LETTA_API_KEY || '' + }); + + const rl = readline.createInterface({ + input: process.stdin, + output: process.stdout + }); + + const askQuestion = (query: string): Promise<string> => { + return new Promise((resolve) => { + rl.question(query, resolve); + }); + }; + + while (true) { + const userQuery = await askQuestion('\nAsk a question about the research papers (or type "exit" to quit): '); + + if (userQuery.toLowerCase() === 'exit' || userQuery.toLowerCase() === 'quit') { + rl.close(); + break; + } + + const response = await lettaClient.agents.messages.create(AGENT_ID, { + messages: [{ role: 'user', content: userQuery }] + }); + + for (const message of response.messages) { + if (message.messageType === 'assistant_message') { + console.log(`\nAgent: ${(message as any).content}`); + } + } + } +} + +main().catch(console.error); +``` + + + 
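The loop in the script above that prints assistant messages can also be factored into a small helper, which makes the filtering easy to test without calling the API. This is a minimal sketch: `assistant_replies` is a hypothetical helper name, and the stand-in objects only imitate the `message_type` and `content` attributes that the script reads from `response.messages`.

```python
from types import SimpleNamespace
from typing import Iterable, List


def assistant_replies(messages: Iterable[object]) -> List[str]:
    """Collect the text of every assistant message, skipping all other message types."""
    replies = []
    for message in messages:
        if getattr(message, "message_type", None) == "assistant_message":
            replies.append(str(getattr(message, "content", "")))
    return replies


if __name__ == "__main__":
    # Stand-in objects with made-up content (not real API output)
    fake_messages = [
        SimpleNamespace(message_type="reasoning_message", content="I should search first."),
        SimpleNamespace(message_type="assistant_message", content="Here is what the papers say."),
    ]
    for reply in assistant_replies(fake_messages):
        print(f"Agent: {reply}")
```

In the real script, the call would be `assistant_replies(response.messages)`.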
+Replace `your-agentic-agent-id` with the ID of the new agent you just created. + + +When you run this script, the agent receives the question, understands from its persona that it needs to search for information, calls the `search_research_papers` tool, gets the context, and then formulates an answer. All the RAG logic is handled by the agent, not your application. + +## Next Steps + +Now that you've integrated Agentic RAG with Letta, you can expand on this foundation: + + + + Learn how to manage retrieval on the client-side for complete control. + + + Explore creating more advanced custom tools for your agents. + + diff --git a/fern/pages/cookbooks/rag-overview.mdx b/fern/pages/cookbooks/rag-overview.mdx new file mode 100644 index 00000000..86944606 --- /dev/null +++ b/fern/pages/cookbooks/rag-overview.mdx @@ -0,0 +1,59 @@ +--- +title: RAG with Letta +subtitle: Connect your custom RAG pipeline to Letta agents +slug: guides/rag/overview +--- + +If you have an existing Retrieval-Augmented Generation (RAG) pipeline, you can connect it to your Letta agents. While Letta provides built-in features like archival memory, you can integrate your own RAG pipeline just as you would with any LLM API. This gives you full control over your data and retrieval methods. + +## What is RAG? + +Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant information from external data sources before generating an answer. Instead of relying on the model's training data, a RAG system: + +1. Takes a user query. +2. Searches a vector database for relevant documents. +3. Includes those documents in the LLM's context. +4. Generates an informed response based on the retrieved information. + +## Choosing Your RAG Approach + +Letta supports two approaches for integrating RAG, depending on how much control you want over the retrieval process. 
+ +| Aspect | Simple RAG | Agentic RAG | +|--------|------------|-------------| +| **Who Controls Retrieval** | Your application controls when retrieval happens and what the retrieval query is. | The agent decides when to retrieve and what query to use. | +| **Context Inclusion** | You can always include retrieval results in the context. | Retrieval happens only when the agent determines it's needed. | +| **Latency** | Lower – typically single-hop, as the agent doesn't need to do a tool call. | Higher – requires tool calls for retrieval. | +| **Client Code** | More complex, as it handles retrieval logic. | Simpler, as it just sends the user query. | +| **Customization** | You have full control via your retrieval function. | You have full control via your custom tool definition. | + +Both approaches work with any vector database. Our tutorials include examples for **ChromaDB**, **MongoDB Atlas**, and **Qdrant**. + +## Next Steps + +Ready to integrate RAG with your Letta agents? + + + + Learn how to manage retrieval on the client-side and inject context directly into your agent's messages. + + + Learn how to empower your agent with custom search tools for autonomous retrieval. + + + +## Additional Resources + +- [Custom Tools](/guides/agents/custom-tools) - Learn more about creating custom tools for your agents. +- [Memory Management](/guides/agents/memory) - Understand how Letta's built-in memory works. +- [Agent Development Environment](/guides/ade) - Configure and test your agents in the web interface. diff --git a/fern/pages/cookbooks/rag-simple.mdx b/fern/pages/cookbooks/rag-simple.mdx new file mode 100644 index 00000000..daeaf751 --- /dev/null +++ b/fern/pages/cookbooks/rag-simple.mdx @@ -0,0 +1,1511 @@ +--- +title: Simple RAG with Letta +subtitle: Manage retrieval on the client-side and inject context into your agent +slug: guides/rag/simple +--- + +In the Simple RAG approach, your application manages the retrieval process. 
You query your vector database, retrieve the relevant documents, and include them directly in the message you send to your Letta agent. + +By the end of this tutorial, you'll have a research assistant that uses your vector database to answer questions about scientific papers. + +## Prerequisites + +To follow along, you need free accounts for: + +- **[Letta](https://www.letta.com)** - To access the agent development platform +- **[Hugging Face](https://huggingface.co/)** - For generating embeddings (MongoDB and Qdrant users only) +- **One of the following vector databases:** + - **[ChromaDB Cloud](https://www.trychroma.com/)** for a hosted vector database + - **[MongoDB Atlas](https://www.mongodb.com/cloud/atlas/register)** for vector search with MongoDB + - **[Qdrant Cloud](https://cloud.qdrant.io/)** for a high-performance vector database + +You will also need Python 3.8+ or Node.js v18+ and a code editor. + + +**MongoDB and Qdrant users:** This guide uses Hugging Face's Inference API for generating embeddings. This approach keeps the tool code lightweight enough to run in Letta's sandbox environment. + + +## Getting Your API Keys + +We'll need API keys for Letta and your chosen vector database. + + + + + + If you don't have one, sign up for a free account at [letta.com](https://www.letta.com). + + + Once logged in, click on **API keys** in the sidebar. + ![Letta API Key Navigation](/images/letta-api-key-nav.png) + + + Click **+ Create API key**, give it a descriptive name, and click **Confirm**. Copy the key and save it somewhere safe. + + + + + + + + Sign up for a free account on the [ChromaDB Cloud website](https://www.trychroma.com/). + + + From your dashboard, create a new database. + ![ChromaDB New Project](/images/chroma-new-project.png) + + + In your project settings, you'll find your **API Key**, **Tenant**, **Database**, and **Host URL**. We'll need all of these for our scripts. 
+ ![ChromaDB Keys](/images/chroma-keys.png) + + + + + + + + Sign up for a free account at [mongodb.com/cloud/atlas/register](https://www.mongodb.com/cloud/atlas/register). + + + Click **Build a Cluster** and select the free tier (M0). Choose your preferred cloud provider and region, then click **Create deployment**. + ![Create MongoDB Cluster](/images/create-cluster-mongodb.png) + + + Next, set up connection security. + 1. Create a database user, then click **Choose a connection method**. + 2. Choose **Drivers** as the connection method and select Python as the driver. + 3. Copy the **entire** connection string, including the query parameters at the end. It will look like this: + + ``` + mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0 + ``` + + + Make sure to replace `<password>` with your actual database user password. Keep all the query parameters (`?retryWrites=true&w=majority&appName=Cluster0`); they are required for proper connection configuration. + + ![MongoDB Connection String](/images/connection-string-mongodb.png) + + + By default, MongoDB Atlas blocks all outside connections. You must grant access to the services that need to connect. + + 1. Navigate to **Database and Network Access** in the left sidebar. + 2. Click **Add IP Address**. + 3. For local development and testing, select **Allow Access From Anywhere**. This will add the IP address `0.0.0.0/0`. + 4. Click **Confirm**. + + ![MongoDB IP Configuration](/images/ip-config-mongodb.png) + + + For a production environment, you would replace `0.0.0.0/0` with a secure list of static IP addresses provided by your hosting service (e.g., Letta). + + + + + + + + + Sign up for a free account at [cloud.qdrant.io](https://cloud.qdrant.io/). + + + From your dashboard, click **Clusters** and then **+ Create**. Select the free tier and choose your preferred region. 
+ + ![Create Qdrant Cluster](/images/qdrant-create-cluster.png) + + + Once your cluster is created, click on it to view details. + + Copy the following: + + 1. **API Key** + 2. **Cluster URL** + + ![Qdrant Connection Details](/images/qdrant-connection-details.png) + + + + + + + + Sign up for a free account at [huggingface.co](https://huggingface.co/join). + + + Click the profile icon in the top right. Navigate to **Settings** > **Access Tokens** (or go directly to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)). + + + Click **New token**, give it a name (e.g., "Letta RAG Demo"), select **Read** role, and click **Create token**. Copy the token and save it securely. + ![Hugging Face Token](/images/hf-token.png) + + + + +The free tier includes 30,000 API requests per month, which is more than enough for development and testing. + + + + +Once you have these credentials, create a `.env` file in your project directory. Add the credentials for your chosen database: + + + +```bash +LETTA_API_KEY="..." +CHROMA_API_KEY="..." +CHROMA_TENANT="..." +CHROMA_DATABASE="..." +``` + + +```bash +LETTA_API_KEY="..." +MONGODB_URI="mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0" +MONGODB_DB_NAME="rag_demo" +HF_API_KEY="..." +``` + + +```bash +LETTA_API_KEY="..." +QDRANT_URL="https://xxxxx.cloud.qdrant.io" +QDRANT_API_KEY="..." +HF_API_KEY="..." +``` + + + +## Step 1: Set Up the Vector Database + +First, we need to populate your chosen vector database with the content of the research papers. We'll use two papers for this demo: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762) and ["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805). 
+ +Before we begin, let's create a virtual environment to keep our dependencies isolated: + + + +Before we begin, let's create a Python virtual environment to keep our dependencies isolated: + +```bash +python -m venv venv +source venv/bin/activate # On Windows, use: venv\Scripts\activate +``` + + +Before we begin, let's create a new Node.js project: + +```bash +npm init -y +``` + +This will create a `package.json` file for you. + +Next, create a `tsconfig.json` file for TypeScript configuration: + +```json +{ + "compilerOptions": { + "target": "ES2020", + "module": "ESNext", + "moduleResolution": "node", + "esModuleInterop": true, + "skipLibCheck": true, + "strict": true + } +} +``` + +Update your `package.json` to use ES modules by adding this line: + +```json +"type": "module" +``` + + + +Download the research papers using curl with the `-L` flag to follow redirects: + +``` +curl -L -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf +curl -L -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf +``` + +Verify the PDFs downloaded correctly: + +``` +file 1706.03762.pdf 1810.04805.pdf +``` + +You should see output indicating these are PDF documents, not HTML files. 
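On systems without the `file` command (for example, a default Windows shell), a few lines of Python can make the same check. This is a minimal sketch: `looks_like_pdf` is a hypothetical helper name, and it relies on the fact that PDF files normally begin with the bytes `%PDF-`, whereas a saved HTML error page does not.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # header bytes expected at the start of a PDF file


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header bytes."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    for name in ["1706.03762.pdf", "1810.04805.pdf"]:
        status = "OK" if looks_like_pdf(name) else "missing or not a PDF"
        print(f"{name}: {status}")
```

If either file is reported as missing or not a PDF, re-run the `curl` command for it before continuing.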
+ +Install the necessary packages for your chosen database: + + + + +```txt title="Python" +# requirements.txt +letta-client +chromadb +pypdf +python-dotenv +``` + +```bash title="TypeScript" +npm install @letta-ai/letta-client chromadb @chroma-core/default-embed dotenv pdf-ts +npm install --save-dev typescript @types/node ts-node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + + +```txt title="Python" +# requirements.txt +letta-client +pymongo +pypdf +python-dotenv +requests +certifi +dnspython +``` + +```bash title="TypeScript" +npm install @letta-ai/letta-client mongodb dotenv pdf-ts node-fetch +npm install --save-dev typescript @types/node ts-node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + + +```txt title="Python" +# requirements.txt +letta-client +qdrant-client +pypdf +python-dotenv +requests +``` + +```bash title="TypeScript" +npm install @letta-ai/letta-client @qdrant/js-client-rest dotenv node-fetch pdf-ts +npm install --save-dev typescript @types/node ts-node tsx +``` + + +For Python, install with: +```bash +pip install -r requirements.txt +``` + + + +Now create a `setup.py` or `setup.ts` file to load the PDFs, split them into chunks, and ingest them into your database: + + + + +```python title="Python" +import os +import chromadb +import pypdf +from dotenv import load_dotenv + +load_dotenv() + +def main(): + # Connect to ChromaDB Cloud + client = chromadb.CloudClient( + tenant=os.getenv("CHROMA_TENANT"), + database=os.getenv("CHROMA_DATABASE"), + api_key=os.getenv("CHROMA_API_KEY") + ) + + # Create or get the collection + collection = client.get_or_create_collection("rag_collection") + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + text = page.extract_text() + if text: + collection.add( + 
ids=[f"{pdf_file}-{i}"], + documents=[text] + ) + + print("\nIngestion complete!") + print(f"Total documents in collection: {collection.count()}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { CloudClient } from 'chromadb'; +import { DefaultEmbeddingFunction } from '@chroma-core/default-embed'; +import * as dotenv from 'dotenv'; +import * as path from 'path'; +import * as fs from 'fs'; +import { pdfToPages } from 'pdf-ts'; + +dotenv.config(); + +async function main() { + // Connect to ChromaDB Cloud + const client = new CloudClient({ + apiKey: process.env.CHROMA_API_KEY || '', + tenant: process.env.CHROMA_TENANT || '', + database: process.env.CHROMA_DATABASE || '' + }); + + // Create embedding function + const embedder = new DefaultEmbeddingFunction(); + + // Create or get the collection + const collection = await client.getOrCreateCollection({ + name: 'rag_collection', + embeddingFunction: embedder + }); + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + + for (const pdfFile of pdfFiles) { + console.log(`Ingesting ${pdfFile}...`); + const pdfPath = path.join(__dirname, pdfFile); + const dataBuffer = fs.readFileSync(pdfPath); + + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text.trim(); + if (text) { + await collection.add({ + ids: [`${pdfFile}-${i}`], + documents: [text] + }); + } + } + } + + console.log('\nIngestion complete!'); + const count = await collection.count(); + console.log(`Total documents in collection: ${count}`); +} + +main().catch(console.error); +``` + + + + + +```python title="Python" +import os +import pymongo +import pypdf +import requests +import certifi +from dotenv import load_dotenv + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": 
f"Bearer {api_key}"} + + response = requests.post(API_URL, headers=headers, json={"inputs": [text], "options": {"wait_for_model": True}}) + + if response.status_code == 200: + return response.json()[0] + else: + raise Exception(f"HF API error: {response.status_code} - {response.text}") + +def main(): + hf_api_key = os.getenv("HF_API_KEY") + mongodb_uri = os.getenv("MONGODB_URI") + db_name = os.getenv("MONGODB_DB_NAME") + + if not all([hf_api_key, mongodb_uri, db_name]): + print("Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file") + return + + # Connect to MongoDB Atlas using certifi + client = pymongo.MongoClient(mongodb_uri, tlsCAFile=certifi.where()) + db = client[db_name] + collection = db["rag_collection"] + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + text = page.extract_text() + if not text: # Skip empty pages + continue + + # Generate embedding using Hugging Face + print(f" Processing page {i+1}...") + try: + embedding = get_embedding(text, hf_api_key) + collection.insert_one({ + "_id": f"{pdf_file}-{i}", + "text": text, + "embedding": embedding, + "source": pdf_file, + "page": i + }) + except Exception as e: + print(f" Could not process page {i+1}: {e}") + + + print("\nIngestion complete!") + print(f"Total documents in collection: {collection.count_documents({})}") + + # Create vector search index + print("\nNext: Go to your MongoDB Atlas dashboard and create a search index named 'vector_index'") + print('''{ + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] +}''') + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { MongoClient } from 'mongodb'; +import * as dotenv from 'dotenv'; +import { pdfToPages } from 'pdf-ts'; +import * as fs from 'fs'; +import fetch from 
'node-fetch'; + +dotenv.config(); + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const response = await fetch(API_URL, { + method: 'POST', + headers: headers, + body: JSON.stringify({ + inputs: [text], + options: { wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const errorText = await response.text(); + throw new Error(`HF API error: ${response.status} - ${errorText}`); + } +} + +async function main() { + const hfApiKey = process.env.HF_API_KEY || ''; + const mongoUri = process.env.MONGODB_URI || ''; + const dbName = process.env.MONGODB_DB_NAME || ''; + + if (!hfApiKey || !mongoUri || !dbName) { + console.error('Error: Ensure HF_API_KEY, MONGODB_URI, and MONGODB_DB_NAME are in .env file'); + return; + } + + // Connect to MongoDB Atlas + const client = new MongoClient(mongoUri); + + try { + await client.connect(); + console.log('Connected to MongoDB Atlas'); + + const db = client.db(dbName); + const collection = db.collection('rag_collection'); + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + + for (const pdfFile of pdfFiles) { + console.log(`Ingesting ${pdfFile}...`); + + const dataBuffer = fs.readFileSync(pdfFile); + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text; + + if (!text || text.trim().length === 0) { + continue; // Skip empty pages + } + + // Generate embedding using Hugging Face + console.log(` Processing page ${i + 1}...`); + try { + const embedding = await getEmbedding(text, hfApiKey); + + await collection.insertOne({ + _id: `${pdfFile}-${i}`, + text: text, + embedding: embedding, + source: pdfFile, + page: i + }); + } catch (error) { + console.log(` 
Could not process page ${i + 1}: ${error}`); + } + } + } + + const docCount = await collection.countDocuments({}); + console.log('\nIngestion complete!'); + console.log(`Total documents in collection: ${docCount}`); + + console.log('\nNext: Go to your MongoDB Atlas dashboard and create a search index named "vector_index"'); + console.log(JSON.stringify({ + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] + }, null, 2)); + + } catch (error) { + console.error('Error:', error); + } finally { + await client.close(); + } +} + +main(); +``` + + + + + +```python title="Python" +import os +import pypdf +import requests +from dotenv import load_dotenv +from qdrant_client import QdrantClient +from qdrant_client.models import Distance, VectorParams, PointStruct + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {api_key}"} + + response = requests.post(API_URL, headers=headers, json={"inputs": text, "options": {"wait_for_model": True}}) + + if response.status_code == 200: + return response.json() + else: + raise Exception(f"HF API error: {response.status_code} - {response.text}") + +def main(): + hf_api_key = os.getenv("HF_API_KEY") + + if not hf_api_key: + print("Error: HF_API_KEY not found in .env file") + return + + # Connect to Qdrant Cloud + client = QdrantClient( + url=os.getenv("QDRANT_URL"), + api_key=os.getenv("QDRANT_API_KEY") + ) + + # Create collection + collection_name = "rag_collection" + + # Check if collection exists, if not create it + collections = client.get_collections().collections + if collection_name not in [c.name for c in collections]: + client.create_collection( + collection_name=collection_name, + vectors_config=VectorParams(size=384, distance=Distance.COSINE) + ) + + # Ingest PDFs + pdf_files = 
["1706.03762.pdf", "1810.04805.pdf"] + point_id = 0 + + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + text = page.extract_text() + if not text: # Skip empty pages + continue + + # Generate embedding using Hugging Face + print(f" Processing page {i+1}...") + embedding = get_embedding(text, hf_api_key) + + client.upsert( + collection_name=collection_name, + points=[ + PointStruct( + id=point_id, + vector=embedding, + payload={"text": text, "source": pdf_file, "page": i} + ) + ] + ) + point_id += 1 + + print("\nIngestion complete!") + collection_info = client.get_collection(collection_name) + print(f"Total documents in collection: {collection_info.points_count}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { QdrantClient } from '@qdrant/js-client-rest'; +import { pdfToPages } from 'pdf-ts'; +import dotenv from 'dotenv'; +import fetch from 'node-fetch'; +import * as fs from 'fs'; + +dotenv.config(); + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + + const response = await fetch(API_URL, { + method: 'POST', + headers: { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }, + body: JSON.stringify({ + inputs: [text], + options: { wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const error = await response.text(); + throw new Error(`HuggingFace API error: ${response.status} - ${error}`); + } +} + +async function main() { + const hfApiKey = process.env.HF_API_KEY || ''; + + if (!hfApiKey) { + console.error('Error: HF_API_KEY not found in .env file'); + return; + } + + // Connect to Qdrant Cloud + const client = new QdrantClient({ + url: process.env.QDRANT_URL || '', + apiKey: process.env.QDRANT_API_KEY || '' + }); + + const collectionName = 
'rag_collection'; + + // Check if collection exists, if not create it + const collections = await client.getCollections(); + const collectionExists = collections.collections.some(c => c.name === collectionName); + + if (!collectionExists) { + console.log('Creating collection...'); + await client.createCollection(collectionName, { + vectors: { + size: 384, + distance: 'Cosine' + } + }); + } + + // Ingest PDFs + const pdfFiles = ['1706.03762.pdf', '1810.04805.pdf']; + let pointId = 0; + + for (const pdfFile of pdfFiles) { + console.log(`\nIngesting ${pdfFile}...`); + const dataBuffer = fs.readFileSync(pdfFile); + const pages = await pdfToPages(dataBuffer); + + for (let i = 0; i < pages.length; i++) { + const text = pages[i].text; + + console.log(` Processing page ${i + 1}...`); + const embedding = await getEmbedding(text, hfApiKey); + + await client.upsert(collectionName, { + wait: true, + points: [ + { + id: pointId, + vector: embedding, + payload: { + text: text, + source: pdfFile, + page: i + } + } + ] + }); + pointId++; + } + } + + console.log('\nIngestion complete!'); + const collectionInfo = await client.getCollection(collectionName); + console.log(`Total documents in collection: ${collectionInfo.points_count}`); +} + +main().catch(console.error); +``` + + + + +Run the script from your terminal: + + + +```bash +python setup.py +``` + + +```bash +npx tsx setup.ts +``` + + + +If you are using MongoDB Atlas, you must manually create a vector search index by following the steps below. + + + +**MongoDB Atlas users:** The setup script ingests your data, but MongoDB Atlas requires you to manually create a vector search index before queries will work. Follow these steps carefully. + + + + + Log in to your [MongoDB Atlas dashboard](https://cloud.mongodb.com/), and click on **"Search & Vector Search"** in the sidebar. + + + Click **"Create Search Index"**, choose Vector Search. 
+ + + - Database: Select **`rag_demo`** (or whatever you set as `MONGODB_DB_NAME`) + - Collection: Select **`rag_collection`** + + + - Index Name: Enter **`vector_index`** (this exact name is required by the code) + - Choose **"JSON Editor"** (not "Visual Editor"). Click **Next** + - Paste this JSON definition: + ```json + { + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + } + ] + } + ``` + **Note:** 384 dimensions is for Hugging Face's `BAAI/bge-small-en-v1.5` model. + + + Click **Next**, then click **"Create Search Index"**. The index will take a few minutes to build. Wait until the status shows as **"Active"** before proceeding. + + + + +Your vector database is now populated with research paper content and ready to query. + +## Step 2: Create a Simple Letta Agent + +For the Simple RAG approach, the Letta agent doesn't need any special tools or complex instructions. Its only job is to answer a question based on the context we provide. We can create this agent programmatically using the Letta SDK. + +Create a file named `create_agent.py` or `create_agent.ts`: + + +```python +import os +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize the Letta client +client = Letta(token=os.getenv("LETTA_API_KEY")) + +# Create the agent +agent = client.agents.create( + name="Simple RAG Agent", + description="This agent answers questions based on provided context. It has no tools or special memory.", + memory_blocks=[ + { + "label": "persona", + "value": "You are a helpful research assistant. Answer the user's question based *only* on the context provided." 
+ } + ] +) + +print(f"Agent '{agent.name}' created with ID: {agent.id}") +``` + + +```typescript +import { LettaClient } from '@letta-ai/letta-client'; +import * as dotenv from 'dotenv'; + +dotenv.config(); + +async function main() { + // Initialize the Letta client + const client = new LettaClient({ + token: process.env.LETTA_API_KEY || '' + }); + + // Create the agent + const agent = await client.agents.create({ + name: 'Simple RAG Agent', + description: 'This agent answers questions based on provided context. It has no tools or special memory.', + memoryBlocks: [ + { + label: 'persona', + value: 'You are a helpful research assistant. Answer the user\'s question based *only* on the context provided.' + } + ] + }); + + console.log(`Agent '${agent.name}' created with ID: ${agent.id}`); +} + +main().catch(console.error); +``` + + + +Run this script once to create the agent in your Letta project. + + +```bash title="Python" +python create_agent.py +``` + +```bash title="TypeScript" +npx tsx create_agent.ts +``` + + +![Stateless Agent in Letta UI](/images/stateless-agent-ui.png) + +## Step 3: Query, Format, and Ask + +Now we'll write the main script, `simple_rag.py` or `simple_rag.ts`, that ties everything together. This script will: + +1. Take a user's question. +2. Query your vector database to find the most relevant document chunks. +3. Construct a detailed prompt that includes both the user's question and the retrieved context. +4. Send this combined prompt to our Simple Letta agent and print the response. 
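Steps 3 and 4 depend on how the retrieved chunks are folded into a single prompt. A minimal sketch of that step, independent of any vector database, might look like the following. `build_prompt` and its `separator` parameter are illustrative names introduced here, not part of the Letta SDK; the template text mirrors the one used in the scripts.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], separator: str = "\n\n") -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # Drop empty chunks (for example, pages where text extraction returned nothing).
    cleaned = [chunk.strip() for chunk in chunks if chunk and chunk.strip()]
    context = separator.join(cleaned)
    return (
        "Context from research paper:\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


if __name__ == "__main__":
    demo_chunks = [
        "Transformers rely on self-attention.",
        "",
        "BERT is pre-trained bidirectionally.",
    ]
    print(build_prompt("What is self-attention?", demo_chunks))
```

Keeping prompt assembly in one small function makes it easy to change the template or filter chunks without touching the retrieval or agent code.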
+ + + + +```python title="Python" +import os +import chromadb +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize clients +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) +chroma_client = chromadb.CloudClient( + tenant=os.getenv("CHROMA_TENANT"), + database=os.getenv("CHROMA_DATABASE"), + api_key=os.getenv("CHROMA_API_KEY") +) + +AGENT_ID = "your-agent-id" # Replace with your agent ID + +def main(): + while True: + question = input("\nAsk a question about the research papers: ") + if question.lower() in ['exit', 'quit']: + break + + # 1. Query ChromaDB + collection = chroma_client.get_collection("rag_collection") + results = collection.query(query_texts=[question], n_results=3) + context = "\n".join(results["documents"][0]) + + # 2. Construct the prompt + prompt = f'''Context from research paper: +{context} + +Question: {question} + +Answer:''' + + # 3. Send to Letta Agent + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": prompt}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"\nAgent: {message.content}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { LettaClient } from '@letta-ai/letta-client'; +import { CloudClient } from 'chromadb'; +import { DefaultEmbeddingFunction } from '@chroma-core/default-embed'; +import * as dotenv from 'dotenv'; +import * as readline from 'readline'; + +dotenv.config(); + +const AGENT_ID = 'your-agent-id'; // Replace with your agent ID + +// Initialize clients +const lettaClient = new LettaClient({ + token: process.env.LETTA_API_KEY || '' +}); + +const chromaClient = new CloudClient({ + apiKey: process.env.CHROMA_API_KEY || '', + tenant: process.env.CHROMA_TENANT || '', + database: process.env.CHROMA_DATABASE || '' +}); + +async function main() { + const embedder = new DefaultEmbeddingFunction(); + const collection = await 
chromaClient.getCollection({ + name: 'rag_collection', + embeddingFunction: embedder + }); + + const rl = readline.createInterface({ + input: process.stdin, + output: process.stdout + }); + + const askQuestion = () => { + rl.question('\nAsk a question about the research papers (or type "exit" to quit): ', async (question) => { + if (question.toLowerCase() === 'exit' || question.toLowerCase() === 'quit') { + rl.close(); + return; + } + + // 1. Query ChromaDB + const results = await collection.query({ + queryTexts: [question], + nResults: 3 + }); + + const context = results.documents[0].join('\n'); + + // 2. Construct the prompt + const prompt = `Context from research paper: +${context} + +Question: ${question} + +Answer:`; + + // 3. Send to Letta Agent + const response = await lettaClient.agents.messages.create(AGENT_ID, { + messages: [{ role: 'user', content: prompt }] + }); + + for (const message of response.messages) { + if (message.messageType === 'assistant_message') { + console.log(`\nAgent: ${(message as any).content}`); + } + } + + askQuestion(); + }); + }; + + askQuestion(); +} + +main().catch(console.error); +``` + + + + + +```python title="Python" +import os +import pymongo +import requests +import certifi +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {api_key}"} + response = requests.post(API_URL, headers=headers, json={"inputs": [text], "options": {"wait_for_model": True}}) + + if response.status_code == 200: + return response.json()[0] + else: + raise Exception(f"HuggingFace API error: {response.status_code} - {response.text}") + +# Initialize clients +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) +mongo_client = pymongo.MongoClient(os.getenv("MONGODB_URI"), tlsCAFile=certifi.where()) +db = 
mongo_client[os.getenv("MONGODB_DB_NAME")] +collection = db["rag_collection"] +hf_api_key = os.getenv("HF_API_KEY") + +AGENT_ID = "your-agent-id" # Replace with your agent ID + +def main(): + while True: + question = input("\nAsk a question about the research papers: ") + if question.lower() in ['exit', 'quit']: + break + + # 1. Query MongoDB Atlas Vector Search + query_embedding = get_embedding(question, hf_api_key) + + results = collection.aggregate([ + { + "$vectorSearch": { + "index": "vector_index", + "path": "embedding", + "queryVector": query_embedding, + "numCandidates": 100, + "limit": 3 + } + }, + { + "$project": { + "text": 1, + "source": 1, + "page": 1, + "score": {"$meta": "vectorSearchScore"} + } + } + ]) + + contexts = [doc.get("text", "") for doc in results] + context = "\n\n".join(contexts) + + # 2. Construct the prompt + prompt = f'''Context from research paper: +{context} + +Question: {question} + +Answer:''' + + # 3. Send to Letta Agent + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": prompt}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"\nAgent: {message.content}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { LettaClient } from '@letta-ai/letta-client'; +import { MongoClient } from 'mongodb'; +import * as dotenv from 'dotenv'; +import * as readline from 'readline'; +import fetch from 'node-fetch'; + +dotenv.config(); + +const AGENT_ID = 'your-agent-id'; // Replace with your agent ID + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const response = await fetch(API_URL, { + method: 'POST', + headers: headers, + body: JSON.stringify({ + inputs: [text], + options: { 
wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const errorText = await response.text(); + throw new Error(`HuggingFace API error: ${response.status} - ${errorText}`); + } +} + +async function main() { + const lettaApiKey = process.env.LETTA_API_KEY || ''; + const mongoUri = process.env.MONGODB_URI || ''; + const dbName = process.env.MONGODB_DB_NAME || ''; + const hfApiKey = process.env.HF_API_KEY || ''; + + if (!lettaApiKey || !mongoUri || !dbName || !hfApiKey) { + console.error('Error: Ensure LETTA_API_KEY, MONGODB_URI, MONGODB_DB_NAME, and HF_API_KEY are in .env file'); + return; + } + + // Initialize clients + const lettaClient = new LettaClient({ + token: lettaApiKey + }); + + const mongoClient = new MongoClient(mongoUri); + await mongoClient.connect(); + console.log('Connected to MongoDB Atlas\n'); + + const db = mongoClient.db(dbName); + const collection = db.collection('rag_collection'); + + const rl = readline.createInterface({ + input: process.stdin, + output: process.stdout + }); + + const askQuestion = () => { + rl.question('\nAsk a question about the research papers (or type "exit" to quit): ', async (question) => { + if (question.toLowerCase() === 'exit' || question.toLowerCase() === 'quit') { + await mongoClient.close(); + rl.close(); + return; + } + + try { + // 1. Query MongoDB Atlas Vector Search + const queryEmbedding = await getEmbedding(question, hfApiKey); + + const results = collection.aggregate([ + { + $vectorSearch: { + index: 'vector_index', + path: 'embedding', + queryVector: queryEmbedding, + numCandidates: 100, + limit: 3 + } + }, + { + $project: { + text: 1, + source: 1, + page: 1, + score: { $meta: 'vectorSearchScore' } + } + } + ]); + + const docs = await results.toArray(); + const contexts = docs.map(doc => doc.text || ''); + const context = contexts.join('\n\n'); + + // 2. 
Construct the prompt + const prompt = `Context from research paper: +${context} + +Question: ${question} + +Answer:`; + + // 3. Send to Letta Agent + const response = await lettaClient.agents.messages.create(AGENT_ID, { + messages: [{ role: 'user', content: prompt }] + }); + + for (const message of response.messages) { + if (message.messageType === 'assistant_message') { + console.log(`\nAgent: ${(message as any).content}`); + } + } + + } catch (error) { + console.error('Error:', error); + } + + askQuestion(); + }); + }; + + askQuestion(); +} + +main().catch(console.error); +``` + + + + + +```python title="Python" +import os +import requests +from letta_client import Letta +from dotenv import load_dotenv +from qdrant_client import QdrantClient + +load_dotenv() + +def get_embedding(text, api_key): + """Get embedding from Hugging Face Inference API""" + API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5" + headers = {"Authorization": f"Bearer {api_key}"} + response = requests.post(API_URL, headers=headers, json={"inputs": text, "options": {"wait_for_model": True}}) + + if response.status_code == 200: + return response.json() + else: + raise Exception(f"HuggingFace API error: {response.status_code} - {response.text}") + +# Initialize clients +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) +qdrant_client = QdrantClient( + url=os.getenv("QDRANT_URL"), + api_key=os.getenv("QDRANT_API_KEY") +) +hf_api_key = os.getenv("HF_API_KEY") + +AGENT_ID = "your-agent-id" # Replace with your agent ID + +def main(): + while True: + question = input("\nAsk a question about the research papers: ") + if question.lower() in ['exit', 'quit']: + break + + # 1. Query Qdrant + query_embedding = get_embedding(question, hf_api_key) + + results = qdrant_client.query_points( + collection_name="rag_collection", + query=query_embedding, + limit=3 + ) + + contexts = [hit.payload["text"] for hit in results.points] + context = "\n".join(contexts) + + # 2. 
Construct the prompt + prompt = f'''Context from research paper: +{context} + +Question: {question} + +Answer:''' + + # 3. Send to Letta Agent + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": prompt}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"\nAgent: {message.content}") + +if __name__ == "__main__": + main() +``` + +```typescript title="TypeScript" +import { QdrantClient } from '@qdrant/js-client-rest'; +import { LettaClient } from '@letta-ai/letta-client'; +import dotenv from 'dotenv'; +import fetch from 'node-fetch'; +import * as readline from 'readline'; + +dotenv.config(); + +async function getEmbedding(text: string, apiKey: string): Promise<number[]> { + const API_URL = "https://api-inference.huggingface.co/models/BAAI/bge-small-en-v1.5"; + + const response = await fetch(API_URL, { + method: 'POST', + headers: { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }, + body: JSON.stringify({ + inputs: [text], + options: { wait_for_model: true } + }) + }); + + if (response.ok) { + const result: any = await response.json(); + return result[0]; + } else { + const error = await response.text(); + throw new Error(`HuggingFace API error: ${response.status} - ${error}`); + } +} + +async function main() { + // Initialize clients + const lettaClient = new LettaClient({ + token: process.env.LETTA_API_KEY || '' + }); + + const qdrantClient = new QdrantClient({ + url: process.env.QDRANT_URL || '', + apiKey: process.env.QDRANT_API_KEY || '' + }); + + const hfApiKey = process.env.HF_API_KEY || ''; + const AGENT_ID = 'your-agent-id'; // Replace with your agent ID + + const rl = readline.createInterface({ + input: process.stdin, + output: process.stdout + }); + + const askQuestion = (query: string): Promise<string> => { + return new Promise((resolve) => { + rl.question(query, resolve); + }); + }; + + while (true) { + const question = await 
askQuestion('\nAsk a question about the research papers (or type "exit" to quit): '); + + if (question.toLowerCase() === 'exit' || question.toLowerCase() === 'quit') { + rl.close(); + break; + } + + // 1. Query Qdrant + const queryEmbedding = await getEmbedding(question, hfApiKey); + + const results = await qdrantClient.query( + 'rag_collection', + { + query: queryEmbedding, + limit: 3, + with_payload: true + } + ); + + const contexts = results.points.map((hit: any) => hit.payload.text); + const context = contexts.join('\n'); + + // 2. Construct the prompt + const prompt = `Context from research paper: +${context} + +Question: ${question} + +Answer:`; + + // 3. Send to Letta Agent + const response = await lettaClient.agents.messages.create(AGENT_ID, { + messages: [{ role: 'user', content: prompt }] + }); + + for (const message of response.messages) { + if (message.messageType === 'assistant_message') { + console.log(`\nAgent: ${(message as any).content}`); + } + } + } +} + +main().catch(console.error); +``` + + + + + +Replace `your-agent-id` with the actual ID of the agent you created in the previous step. + + +When you run this script, your application performs the retrieval, and the Letta agent provides the answer based on the context it receives. This gives you full control over the data pipeline. + +## Next Steps + +Now that you've integrated Simple RAG with Letta, you can explore more advanced integration patterns: + + + + Learn how to empower your agent with custom search tools for autonomous retrieval. + + + Explore creating more advanced custom tools for your agents. + + diff --git a/fern/pages/tutorials/letta-rag.mdx b/fern/pages/tutorials/letta-rag.mdx new file mode 100644 index 00000000..d4b9445b --- /dev/null +++ b/fern/pages/tutorials/letta-rag.mdx @@ -0,0 +1,471 @@ +--- +title: Connect Your Custom RAG Pipeline to a Letta Agent +subtitle: A step-by-step guide to integrating external vector databases with Letta Cloud. 
+slug: cookbooks/custom-rag-integration +--- + +You've built a powerful Retrieval-Augmented Generation (RAG) pipeline with its own vector database, but now you want to connect it to an intelligent agent. This guide is for developers who want to integrate their existing RAG stack with Letta, giving them full control over their data while leveraging Letta's advanced agentic capabilities. + +By the end of this tutorial, we'll build a research assistant that uses a ChromaDB Cloud database to answer questions about scientific papers. We will explore two distinct methods for achieving this. + +### What You'll Learn + +- **Standard RAG:** How to manage retrieval on your client and inject context directly into the agent's prompt. This gives you maximum control over the data the agent sees. +- **Agentic RAG:** How to empower your agent with a custom tool, allowing it to decide when and what to search in your vector database. This creates a more autonomous and flexible agent. + +## Prerequisites + +To follow along, you need free accounts for the following platforms: + +- **[Letta](https://www.letta.com):** To access the agent development platform +- **[ChromaDB Cloud](https://www.trychroma.com/):** To host our vector database + +You will also need Python 3.8+ and a code editor. + +### Getting Your API Keys + +We'll need two API keys for this tutorial. + + + + + + If you don't have one, sign up for a free account at [letta.com](https://www.letta.com). + + + Once logged in, click on **API keys** in the sidebar. + ![Letta API Key Navigation](/images/letta-api-key-nav.png) + + + Click **+ Create API key**, give it a descriptive name, and click **Confirm**. Copy the key and save it somewhere safe. + + + + + + + + Sign up for a free account on the [ChromaDB Cloud website](https://www.trychroma.com/). + + + From your dashboard, create a new database. 
+ ![ChromaDB New Project](/images/chroma-new-project.png) + + + In your project settings, you will find your **API Key** and **Host URL**. We'll need both of these for our scripts. + ![ChromaDB Keys](/images/chroma-keys.png) + + + + + +Once you have these keys, create a `.env` file in your project directory and add them like this: + +``` +LETTA_API_KEY="..." +CHROMA_API_KEY="..." +CHROMA_TENANT="..." +CHROMA_DATABASE="..." +``` + +## Part 1: Standard RAG — Full Control on the Client-Side + +In the standard RAG approach, our application takes the lead. It fetches the relevant information from our ChromaDB database and then passes this context, along with our query, to a simple Letta agent. This method is direct, transparent, and keeps all the retrieval logic in our client application. + +### Step 1: Set Up the Cloud Vector Database + +First, we need to populate our ChromaDB Cloud database with the content of the research papers. We'll use two papers for this demo: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762) and ["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805). + +Before we begin, let's create a Python virtual environment to keep our dependencies isolated: + +```shell +python -m venv venv +source venv/bin/activate # On Windows, use: venv\Scripts\activate +``` + +Download the research papers we'll be using: + +```shell +curl -o 1706.03762.pdf https://arxiv.org/pdf/1706.03762.pdf +curl -o 1810.04805.pdf https://arxiv.org/pdf/1810.04805.pdf +``` + +Now, create a `requirements.txt` file with the necessary Python libraries: + +``` +letta-client +chromadb +pypdf +python-dotenv +``` + +Install them using pip: + +```shell +pip install -r requirements.txt +``` + +Now, create a `setup.py` file. This script will load the PDFs, split them into manageable chunks, and ingest them into a ChromaDB collection named `rag_collection`. 
+ +```python +import os +import chromadb +import pypdf +from dotenv import load_dotenv + +load_dotenv() + +def main(): + # Connect to ChromaDB Cloud + client = chromadb.CloudClient( + tenant=os.getenv("CHROMA_TENANT"), + database=os.getenv("CHROMA_DATABASE"), + api_key=os.getenv("CHROMA_API_KEY") + ) + + # Create or get the collection + collection = client.get_or_create_collection("rag_collection") + + # Ingest PDFs + pdf_files = ["1706.03762.pdf", "1810.04805.pdf"] + for pdf_file in pdf_files: + print(f"Ingesting {pdf_file}...") + reader = pypdf.PdfReader(pdf_file) + for i, page in enumerate(reader.pages): + collection.add( + ids=[f"{pdf_file}-{i}"], + documents=[page.extract_text()] + ) + + print("\nIngestion complete!") + print(f"Total documents in collection: {collection.count()}") + +if __name__ == "__main__": + main() + +``` + +Run the script from your terminal: + +```shell +python setup.py +``` + +This script connects to your ChromaDB Cloud instance, creates a collection, and adds the text content of each page from the PDFs as a separate document. Your vector database is now ready. + +### Step 2: Create a "Stateless" Letta Agent + +For the standard RAG approach, the Letta agent doesn't need any special tools or complex instructions. Its only job is to answer a question based on the context we provide. We can create this agent programmatically using the Letta SDK. + +Create a file named `create_agent.py`: + +```python +import os +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize the Letta client +client = Letta(token=os.getenv("LETTA_API_KEY")) + +# Create the agent +agent = client.agents.create( + name="Stateless RAG Agent", + description="This agent answers questions based on provided context. It has no tools or special memory.", + memory_blocks=[ + { + "label": "persona", + "value": "You are a helpful research assistant. Answer the user's question based *only* on the context provided." 
+ } + ] +) + +print(f"Agent '{agent.name}' created with ID: {agent.id}") + +``` + +Run this script once to create the agent in your Letta project. + +```shell +python create_agent.py +``` + +![Stateless Agent in Letta UI](/images/stateless-agent-ui.png) + +### Step 3: Query, Format, and Ask + +Now we'll write the main script, `standard_rag.py`, that ties everything together. This script will: + +1. Take a user's question. +2. Query the `rag_collection` collection in ChromaDB to find the most relevant document chunks. +3. Construct a detailed prompt that includes both the user's question and the retrieved context. +4. Send this combined prompt to our stateless Letta agent and print the response. + +```python +import os +import chromadb +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize clients +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) +chroma_client = chromadb.CloudClient( + tenant=os.getenv("CHROMA_TENANT"), + database=os.getenv("CHROMA_DATABASE"), + api_key=os.getenv("CHROMA_API_KEY") +) + +AGENT_ID = "your-stateless-agent-id" # Replace with your agent ID + +def main(): + while True: + question = input("Ask a question about the research papers: ") + if question.lower() in ['exit', 'quit']: + break + + # 1. Query ChromaDB + collection = chroma_client.get_collection("rag_collection") + results = collection.query(query_texts=[question], n_results=3) + context = "\n".join(results["documents"][0]) + + # 2. Construct the prompt + prompt = f'''Context from research paper: +{context} +Question: {question} +Answer:''' + + # 3. 
Send to Letta Agent + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": prompt}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"Agent: {message.content}") + +if __name__ == "__main__": + main() + +``` + + +Replace `your-stateless-agent-id` with the actual ID of the agent you created in the previous step. + + +When you run this script, your application performs the retrieval, and the Letta agent simply provides the answer based on the context it receives. This gives you full control over the data pipeline. + +## Part 2: Agentic RAG — Empowering Your Agent with Tools + +In the agentic RAG approach, we delegate the retrieval process to the agent itself. Instead of our application deciding what to search for, we provide the agent with a custom tool that allows it to query our ChromaDB database directly. This makes the agent more autonomous and our client-side code much simpler. + +### Step 4: Create a Custom Search Tool + +A Letta tool is essentially a Python function that your agent can call. We'll create a function that searches our ChromaDB collection and returns the results. Letta handles the complexities of exposing this function to the agent securely. + +Create a new file named `tools.py`: + +```python +import chromadb +import os + +def search_research_papers(query_text: str, n_results: int = 1) -> str: + """ + Searches the research paper collection for a given query. + Args: + query_text (str): The text to search for. + n_results (int): The number of results to return. + Returns: + str: The most relevant document found. + """ + # ChromaDB Cloud Client + # This tool code is executed on the Letta server. It expects the ChromaDB + # credentials to be passed as environment variables. 
+ api_key = os.getenv("CHROMA_API_KEY") + tenant = os.getenv("CHROMA_TENANT") + database = os.getenv("CHROMA_DATABASE") + + if not all([api_key, tenant, database]): + # If run locally without the env vars, this will fail early. + # When run by the agent, these will be provided by the tool execution environment. + raise ValueError("CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE must be set as environment variables.") + + client = chromadb.CloudClient( + tenant=tenant, + database=database, + api_key=api_key + ) + + collection = client.get_or_create_collection("rag_collection") + + try: + results = collection.query( + query_texts=[query_text], + n_results=n_results + ) + + # Join every returned document so that n_results > 1 is honored + return "\n\n".join(results['documents'][0]) + except Exception as e: + return f"Tool failed with error: {e}" + +``` + +This function, `search_research_papers`, takes a query, connects to our database, retrieves the `n_results` most relevant documents (one by default), and returns them as a single string. + +### Step 5: Configure a "Smart" Research Agent + +Next, we'll create a new, more advanced agent. This agent will have a specific persona that instructs it on how to behave and, most importantly, it will be equipped with our new search tool. + +Create a file named `create_agentic_agent.py`: + +```python +import os +from letta_client import Letta +from dotenv import load_dotenv +from tools import search_research_papers + +load_dotenv() + +# Initialize the Letta client +client = Letta(token=os.getenv("LETTA_API_KEY")) + +# Create a tool from our Python function +search_tool = client.tools.create_from_function(func=search_research_papers) + +# Define the agent's persona +persona = """You are a world-class research assistant. Your goal is to answer questions accurately by searching through a database of research papers. When a user asks a question, first use the `search_research_papers` tool to find relevant information. 
Then, answer the user's question based on the information returned by the tool.""" + +# Create the agent with the tool attached +agent = client.agents.create( + name="Agentic RAG Assistant", + description="A smart agent that can search a vector database to answer questions.", + memory_blocks=[ + { + "label": "persona", + "value": persona + } + ], + tools=[search_tool.name] +) + +print(f"Agent '{agent.name}' created with ID: {agent.id}") + +``` + +Run this script to create the agent: + +```shell +python create_agentic_agent.py +``` + +#### Configure Tool Dependencies and Environment Variables + +For the tool to work within Letta's environment, we need to configure its dependencies and environment variables through the Letta dashboard. + + + + Navigate to your Letta dashboard and find the "Agentic RAG Assistant" agent you just created. + + + Click on your agent to open the Agent Development Environment (ADE). + + + - In the ADE, select **Tools** from the sidebar + - Find and click on the `search_research_papers` tool + - Click on the **Dependencies** tab + - Add `chromadb` as a dependency + + ![Letta Dependencies Configuration](/images/letta-dep-config.png) + + + - In the same tool configuration, navigate to **Simulator** > **Environment** + - Add the following environment variables with their corresponding values from your `.env` file: + - `CHROMA_API_KEY` + - `CHROMA_TENANT` + - `CHROMA_DATABASE` + + ![Letta Tool Configuration](/images/letta-tool-config.png) + + + +Now, when the agent calls this tool, Letta's execution environment will know to install `chromadb` and will have access to the necessary credentials to connect to your database. + +### Step 6: Let the Agent Lead the Conversation + +With the agentic setup, our client-side code becomes incredibly simple. We no longer need to worry about retrieving context; we just send the user's raw question to the agent and let it handle the rest. 
+ +Create the `agentic_rag.py` script: + +```python +import os +from letta_client import Letta +from dotenv import load_dotenv + +load_dotenv() + +# Initialize client +letta_client = Letta(token=os.getenv("LETTA_API_KEY")) + +AGENT_ID = "your-agentic-agent-id" # Replace with your new agent ID + +def main(): + while True: + user_query = input("Ask a question about the research papers: ") + if user_query.lower() in ['exit', 'quit']: + break + + response = letta_client.agents.messages.create( + agent_id=AGENT_ID, + messages=[{"role": "user", "content": user_query}] + ) + + for message in response.messages: + if message.message_type == 'assistant_message': + print(f"Agent: {message.content}") + +if __name__ == "__main__": + main() + +``` + + +Replace `your-agentic-agent-id` with the ID of the new agent you just created. + + +When you run this script, the agent receives the question, understands from its persona that it needs to search for information, calls the `search_research_papers` tool, gets the context, and then formulates an answer. All the RAG logic is handled by the agent, not your application. + +## Which Approach Is Right for You? + +We've explored two powerful methods for connecting a custom RAG pipeline to a Letta agent. The best choice depends on your specific needs. + +- **Use Standard RAG when...** + - You want to maintain complete, fine-grained control over the retrieval process. + - Your retrieval logic is complex and better handled by your application code. + - You want to keep your agent as simple as possible and minimize its autonomy. + +- **Use Agentic RAG when...** + - You want to build a more autonomous agent that can handle complex, multi-step queries. + - You prefer simpler, cleaner client-side code. + - You want the agent to decide *when* and *what* to search for, leading to more dynamic conversations. + +## What's Next? + +Now that you've integrated a custom RAG pipeline, you can expand on this foundation. 
Here are a few ideas: + + + +Swap out ChromaDB for other providers like Weaviate, Pinecone, or a database you already have in production. The core logic remains the same: create a tool that queries your database and equip your agent with it. + + + +Create tools that not only read from your database but also write new information to it. This would allow your agent to learn from its interactions and update its own knowledge base over time. + + + +Expand your RAG pipeline to include more documents, web pages, or other sources of information. The more comprehensive your data source, the more capable your agent will become. + +
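
As a rough illustration of the write-capable tool idea above, here is a minimal sketch. It is not part of this tutorial's code: `save_research_note` and `FakeCollection` are hypothetical names, and the function takes the collection as a parameter only so the example can run without a database. In a real Letta tool you would build the Chroma client inside the function, the same way `search_research_papers` does. The sketch assumes the collection exposes Chroma's `add(ids=..., documents=...)` method.

```python
import uuid


def save_research_note(collection, note_text: str) -> str:
    """Store a new note in a vector collection and report what happened.

    `collection` is assumed to expose Chroma's `add(ids=..., documents=...)`.
    """
    note_text = note_text.strip()
    if not note_text:
        return "Nothing saved: the note was empty."

    # Each record needs a unique ID; a random UUID avoids collisions.
    note_id = f"note-{uuid.uuid4()}"
    try:
        collection.add(ids=[note_id], documents=[note_text])
    except Exception as e:
        # Mirror the search tool: hand the error back to the agent as text.
        return f"Tool failed with error: {e}"
    return f"Saved note with ID {note_id}"


class FakeCollection:
    """Hypothetical in-memory stand-in so the sketch runs without ChromaDB."""

    def __init__(self):
        self.records = {}

    def add(self, ids, documents):
        for record_id, document in zip(ids, documents):
            self.records[record_id] = document


if __name__ == "__main__":
    demo = FakeCollection()
    print(save_research_note(demo, "Transformers scale well with data."))
    print(f"Records stored: {len(demo.records)}")
```

Returning a short status string, rather than raising, keeps the behavior consistent with the search tool, so the agent can read the outcome and decide what to do next.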