# CLI Commands

The **letta-evals** command-line interface lets you run evaluations, validate configurations, and inspect available components.

<Note>
**Quick overview:**
- **`run`** - Execute an evaluation suite (most common)
- **`validate`** - Check suite configuration without running
- **`list-extractors`** - Show available extractors
- **`list-graders`** - Show available grader functions
- **Exit codes** - 0 for pass, 1 for fail (perfect for CI/CD)
</Note>

**Typical workflow:**
1. Validate your suite: `letta-evals validate suite.yaml`
2. Run evaluation: `letta-evals run suite.yaml --output results/`
3. Check exit code: `echo $?` (0 = passed, 1 = failed)

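The three workflow steps can be chained in one shell function. This is a minimal sketch, not part of letta-evals: it assumes `letta-evals validate` exits non-zero when validation fails, and `LETTA_EVALS_BIN` is a hypothetical override variable introduced here only so the function can be pointed at a different executable or a test stub.

```bash
# Sketch: validate a suite, run it, and report the gate result.
# LETTA_EVALS_BIN is a hypothetical override, not a letta-evals feature.
validate_then_run() {
  local suite="$1" out_dir="$2"
  local bin="${LETTA_EVALS_BIN:-letta-evals}"

  # Step 1: stop early if the configuration is invalid
  # (assumes validate exits non-zero on failure).
  if ! "$bin" validate "$suite"; then
    echo "validation failed: $suite" >&2
    return 1
  fi

  # Steps 2 and 3: run the evaluation and branch on its exit code.
  if "$bin" run "$suite" --output "$out_dir"; then
    echo "passed"
  else
    echo "failed"
    return 1
  fi
}
```

A call such as `validate_then_run suite.yaml results/` would then return 0 only when both steps succeed.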
## run

Run an evaluation suite.

```bash
letta-evals run <suite.yaml> [options]
```

### Arguments

- `suite.yaml`: Path to the suite configuration file (required)

### Options

#### --output, -o
Save results to a directory.

```bash
letta-evals run suite.yaml --output results/
```

Creates:
- `results/header.json`: Evaluation metadata
- `results/summary.json`: Aggregate metrics and configuration
- `results/results.jsonl`: Per-sample results (one JSON object per line)

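Because `results.jsonl` holds one record per line, the number of evaluated samples can be checked without parsing any JSON. The helper below is a small sketch that relies only on that line-per-record layout; the function name is made up for illustration.

```bash
# Count per-sample records in a JSONL results file.
# Relies only on the one-record-per-line layout; blank lines are skipped.
count_results() {
  # grep -c prints the number of matching lines; '[^[:space:]]' matches
  # any line that contains at least one non-whitespace character.
  grep -c '[^[:space:]]' "$1"
}
```

For example, `count_results results/results.jsonl` prints the record count. Note that `grep -c` exits with status 1 when the count is zero.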
#### --quiet, -q
Quiet mode: show only the pass/fail result.

```bash
letta-evals run suite.yaml --quiet
```

Output:
```
✓ PASSED
```

#### --max-concurrent
Maximum number of samples evaluated concurrently. **Default**: 15

```bash
letta-evals run suite.yaml --max-concurrent 10
```

Higher values speed up evaluation but use more resources.

#### --api-key
Letta API key (overrides the `LETTA_API_KEY` environment variable).

```bash
letta-evals run suite.yaml --api-key your-key
```

#### --base-url
Letta server base URL (overrides the suite config and the `LETTA_BASE_URL` environment variable).

```bash
letta-evals run suite.yaml --base-url http://localhost:8283
```

#### --project-id
Letta project ID for cloud deployments.

```bash
letta-evals run suite.yaml --project-id proj_abc123
```

#### --cached, -c
Path to cached results (JSONL). Re-grades the stored trajectories without re-running the agent.

```bash
letta-evals run suite.yaml --cached previous_results.jsonl
```

Use this to test different graders on the same agent trajectories.

#### --num-runs
Run the evaluation multiple times to measure consistency. **Default**: 1

```bash
letta-evals run suite.yaml --num-runs 10
```

**Output with multiple runs:**
- Each run creates a separate `run_N/` directory with individual results
- An `aggregate_stats.json` file contains statistics across all runs (mean, standard deviation, pass rate)

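After a multi-run evaluation it can be useful to confirm that every run produced its directory. The sketch below counts `run_N/` directories; it assumes they sit directly inside the directory passed to `--output`, which the text above does not state explicitly, so adjust the path if your layout differs.

```bash
# Count run_N/ directories under an output directory.
# Assumption: run_N/ directories are direct children of the --output directory.
count_runs() {
  local out_dir="$1" count=0 dir
  for dir in "$out_dir"/run_*/; do
    # When nothing matches, the glob stays literal, so test for a directory.
    if [ -d "$dir" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

With `--num-runs 10`, a check such as `[ "$(count_runs results)" -eq 10 ]` would flag a missing run.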
### Examples

Basic run:
```bash
letta-evals run suite.yaml  # Run evaluation, show results in terminal
```

Save results:
```bash
letta-evals run suite.yaml --output evaluation-results/  # Save to directory
```

Letta Cloud:
```bash
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key "$LETTA_API_KEY" \
  --project-id proj_abc123
```

Quiet CI mode:
```bash
# Branch directly on the exit status (0 = passed, 1 = failed or error)
if letta-evals run suite.yaml --quiet; then
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1
fi
```

### Exit Codes

- `0`: Evaluation passed (gate criteria met)
- `1`: Evaluation failed (gate criteria not met or error)

## validate

Validate a suite configuration without running it.

```bash
letta-evals validate <suite.yaml>
```

Checks:
- YAML syntax is valid
- Required fields are present
- Paths exist
- Configuration is consistent
- Grader/extractor combinations are valid

Output on success:
```
✓ Suite configuration is valid
```

Output on error:
```
✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function
```

## list-extractors

List all available extractors.

```bash
letta-evals list-extractors
```

Output:
```
Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)
```

## list-graders

List all available grader functions.

```bash
letta-evals list-graders
```

Output:
```
Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content
```

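The listings are plain text, so a script can check that a given extractor or grader is available before running a suite. The filter below is a sketch that assumes the `name - description` layout shown in the sample output; `has_component` is a name chosen for illustration.

```bash
# Succeed if the listing on stdin contains an entry named "$1".
# Assumes each entry line starts with the component name as its first field.
has_component() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, `letta-evals list-graders | has_component exact_match` exits with status 0 when the grader is listed and 1 otherwise.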
## help

Show help information.

```bash
letta-evals --help
```

Show help for a specific command:

```bash
letta-evals run --help
letta-evals validate --help
```

## Environment Variables

### LETTA_API_KEY
API key for Letta authentication.

```bash
export LETTA_API_KEY=your-key-here
```

### LETTA_BASE_URL
Letta server base URL.

```bash
export LETTA_BASE_URL=http://localhost:8283
```

### LETTA_PROJECT_ID
Letta project ID (for cloud).

```bash
export LETTA_PROJECT_ID=proj_abc123
```

### OPENAI_API_KEY
OpenAI API key (for rubric graders).

```bash
export OPENAI_API_KEY=your-openai-key
```

## Configuration Priority

Configuration values are resolved in this order (highest to lowest priority):

1. CLI arguments (`--api-key`, `--base-url`, `--project-id`)
2. Suite YAML configuration
3. Environment variables

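The precedence amounts to "first non-empty value wins". The function below illustrates that rule for a single setting; it is a sketch of the documented order, not the resolution code letta-evals itself uses, and `resolve_setting` is a hypothetical name.

```bash
# Print the first non-empty argument, mirroring the documented order:
# CLI argument, then suite YAML value, then environment variable.
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # nothing was set at any level
}
```

Called as `resolve_setting "$cli_value" "$suite_value" "$LETTA_BASE_URL"`, it prints the CLI value when one was given, falls back to the suite value, and uses the environment variable only when both are empty.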
## Using in CI/CD

### GitHub Actions

```yaml
name: Run Evals
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: pip install letta-evals

      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/

      - name: Upload results
        if: always()  # upload results even when the evaluation step fails
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: results/
```

### GitLab CI

```yaml
evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    when: always  # keep results even when the evaluation fails
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY
```

## Debugging

### Common Issues

<Warning>
**"Agent file not found"**

```bash
# Check file exists relative to suite YAML location
ls -la path/to/agent.af
```
</Warning>

<Warning>
**"Connection refused"**

```bash
# Verify Letta server is running
curl http://localhost:8283/v1/health
```
</Warning>

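If the server is still starting up, a single health check can fail even though the server will be reachable moments later. A generic retry helper, sketched below, repeats any command until it succeeds; `retry` is an illustrative name and not part of letta-evals.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 10 2 curl -fsS http://localhost:8283/v1/health` polls the health endpoint up to ten times, two seconds apart, before giving up.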
<Warning>
**"Invalid API key"**

```bash
# Check environment variable is set
echo $LETTA_API_KEY
```
</Warning>

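In shared terminals or CI logs you may prefer to confirm that a key is set without printing its value. The helper below is a sketch of one way to do that; `require_env` is a hypothetical name, and it uses `eval` for the indirect lookup, so call it only with variable names you control.

```bash
# Report whether a variable is set and non-empty, without printing its value.
require_env() {
  local name="$1" value
  # Indirect lookup via eval; only pass trusted variable names.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  echo "$name is set (${#value} characters)"
}
```

`require_env LETTA_API_KEY` then reports only the variable's presence and length.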
## Next Steps

- [Understanding Results](/evals/results-metrics/understanding-results) - Interpreting evaluation output
- [Suite YAML Reference](/evals/configuration/suite-yaml-reference) - Complete configuration options
- [Getting Started](/evals/get-started/getting-started) - Complete tutorial with examples