Files
letta-server/fern/pages/evals/cli/commands.mdx
2025-10-24 15:13:47 -07:00

343 lines
6.8 KiB
Plaintext

# CLI Commands
The **letta-evals** command-line interface lets you run evaluations, validate configurations, and inspect available components.
<Note>
**Quick overview:**
- **`run`** - Execute an evaluation suite (most common)
- **`validate`** - Check suite configuration without running
- **`list-extractors`** - Show available extractors
- **`list-graders`** - Show available grader functions
- **Exit codes** - 0 for pass, 1 for fail (perfect for CI/CD)
</Note>
**Typical workflow:**
1. Validate your suite: `letta-evals validate suite.yaml`
2. Run evaluation: `letta-evals run suite.yaml --output results/`
3. Check exit code: `echo $?` (0 = passed, 1 = failed)
## run
Run an evaluation suite.
```bash
letta-evals run <suite.yaml> [options]
```
### Arguments
- `suite.yaml`: Path to the suite configuration file (required)
### Options
#### --output, -o
Save results to a directory.
```bash
letta-evals run suite.yaml --output results/
```
Creates:
- `results/header.json`: Evaluation metadata
- `results/summary.json`: Aggregate metrics and configuration
- `results/results.jsonl`: Per-sample results (one JSON per line)
#### --quiet, -q
Quiet mode - only show pass/fail result.
```bash
letta-evals run suite.yaml --quiet
```
Output:
```
✓ PASSED
```
#### --max-concurrent
Maximum concurrent sample evaluations. **Default**: 15
```bash
letta-evals run suite.yaml --max-concurrent 10
```
Higher values = faster evaluation but more resource usage.
#### --api-key
Letta API key (overrides LETTA_API_KEY environment variable).
```bash
letta-evals run suite.yaml --api-key your-key
```
#### --base-url
Letta server base URL (overrides suite config and environment variable).
```bash
letta-evals run suite.yaml --base-url http://localhost:8283
```
#### --project-id
Letta project ID for cloud deployments.
```bash
letta-evals run suite.yaml --project-id proj_abc123
```
#### --cached, -c
Path to cached results (JSONL) for re-grading trajectories without re-running the agent.
```bash
letta-evals run suite.yaml --cached previous_results.jsonl
```
Use this to test different graders on the same agent trajectories.
#### --num-runs
Run the evaluation multiple times to measure consistency. **Default**: 1
```bash
letta-evals run suite.yaml --num-runs 10
```
**Output with multiple runs:**
- Each run creates a separate `run_N/` directory with individual results
- An `aggregate_stats.json` file contains statistics across all runs (mean, standard deviation, pass rate)
### Examples
Basic run:
```bash
letta-evals run suite.yaml # Run evaluation, show results in terminal
```
Save results:
```bash
letta-evals run suite.yaml --output evaluation-results/ # Save to directory
```
Letta Cloud:
```bash
letta-evals run suite.yaml \
--base-url https://api.letta.com \
--api-key $LETTA_API_KEY \
--project-id proj_abc123
```
Quiet CI mode:
```bash
letta-evals run suite.yaml --quiet
if [ $? -eq 0 ]; then
echo "Evaluation passed"
else
echo "Evaluation failed"
exit 1
fi
```
### Exit Codes
- `0`: Evaluation passed (gate criteria met)
- `1`: Evaluation failed (gate criteria not met or error)
## validate
Validate a suite configuration without running it.
```bash
letta-evals validate <suite.yaml>
```
Checks:
- YAML syntax is valid
- Required fields are present
- Paths exist
- Configuration is consistent
- Grader/extractor combinations are valid
Output on success:
```
✓ Suite configuration is valid
```
Output on error:
```
✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function
```
## list-extractors
List all available extractors.
```bash
letta-evals list-extractors
```
Output:
```
Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)
```
## list-graders
List all available grader functions.
```bash
letta-evals list-graders
```
Output:
```
Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content
```
## help
Show help information.
```bash
letta-evals --help
```
Show help for a specific command:
```bash
letta-evals run --help
letta-evals validate --help
```
## Environment Variables
### LETTA_API_KEY
API key for Letta authentication.
```bash
export LETTA_API_KEY=your-key-here
```
### LETTA_BASE_URL
Letta server base URL.
```bash
export LETTA_BASE_URL=http://localhost:8283
```
### LETTA_PROJECT_ID
Letta project ID (for cloud).
```bash
export LETTA_PROJECT_ID=proj_abc123
```
### OPENAI_API_KEY
OpenAI API key (for rubric graders).
```bash
export OPENAI_API_KEY=your-openai-key
```
## Configuration Priority
Configuration values are resolved in this order (highest to lowest priority):
1. CLI arguments (`--api-key`, `--base-url`, `--project-id`)
2. Suite YAML configuration
3. Environment variables
## Using in CI/CD
### GitHub Actions
```yaml
name: Run Evals
on: [push]
jobs:
evaluate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Install dependencies
run: pip install letta-evals
- name: Run evaluation
env:
LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
letta-evals run suite.yaml --quiet --output results/
- name: Upload results
uses: actions/upload-artifact@v2
with:
name: eval-results
path: results/
```
### GitLab CI
```yaml
evaluate:
script:
- pip install letta-evals
- letta-evals run suite.yaml --quiet --output results/
artifacts:
paths:
- results/
variables:
LETTA_API_KEY: $LETTA_API_KEY
OPENAI_API_KEY: $OPENAI_API_KEY
```
## Debugging
### Common Issues
<Warning>
**"Agent file not found"**
```bash
# Check file exists relative to suite YAML location
ls -la path/to/agent.af
```
</Warning>
<Warning>
**"Connection refused"**
```bash
# Verify Letta server is running
curl http://localhost:8283/v1/health
```
</Warning>
<Warning>
**"Invalid API key"**
```bash
# Check environment variable is set
echo $LETTA_API_KEY
```
</Warning>
## Next Steps
- [Understanding Results](/evals/results-metrics/understanding-results) - Interpreting evaluation output
- [Suite YAML Reference](/evals/configuration/suite-yaml-reference) - Complete configuration options
- [Getting Started](/evals/get-started/getting-started) - Complete tutorial with examples