feat: add voice memo responses via TTS (#394)
@@ -11,6 +11,7 @@ Send a message to the most recent chat, or target a specific channel/chat.
lettabot-message send --text "Hello from a background task"
lettabot-message send --text "Hello" --channel slack --chat C123456
lettabot-message send --file /tmp/report.pdf --text "Report attached" --channel discord --chat 123456789
lettabot-message send --file /tmp/voice.ogg --voice  # Send as native voice note
```

## lettabot-react
@@ -675,6 +675,36 @@ transcription:
  model: whisper-1  # Default
```

## Text-to-Speech (TTS) Configuration
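
Configures voice memo generation for the `<voice>` directive. With `tts` set up, the agent can reply with voice notes, which render as native voice bubbles on Telegram and WhatsApp (Discord and Slack receive a playable audio attachment instead):
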
```yaml
tts:
  provider: elevenlabs           # "elevenlabs" (default) or "openai"
  apiKey: sk_475a...             # Provider API key
  voiceId: 21m00Tcm4TlvDq8ikWAM  # Voice selection (see below)
  model: eleven_multilingual_v2  # Optional model override
```

**ElevenLabs** (default):

- `voiceId` is an ElevenLabs voice ID. Default: `21m00Tcm4TlvDq8ikWAM` (Rachel). Browse voices at [elevenlabs.io/voice-library](https://elevenlabs.io/voice-library).
- `model` defaults to `eleven_multilingual_v2`.

**OpenAI**:

- `voiceId` is one of: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`. Default: `alloy`.
- `model` defaults to `tts-1`. Use `tts-1-hd` for higher quality.

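The provider settings above map fairly directly onto each vendor's HTTP API. The sketch below is illustrative only, not LettaBot's implementation: `TtsConfig`, `TtsRequest`, and `buildTtsRequest` are hypothetical names, and it assumes ElevenLabs' `POST /v1/text-to-speech/{voice_id}` endpoint (key in the `xi-api-key` header) and OpenAI's `POST /v1/audio/speech` endpoint (bearer token). It only builds the request; sending it with `fetch` or another client is left to the caller.

```typescript
// Hypothetical names throughout; this is not LettaBot's internal API.
type TtsProvider = "elevenlabs" | "openai";

// A fully resolved `tts` block (defaults already applied).
interface TtsConfig {
  provider: TtsProvider;
  apiKey: string;
  voiceId: string;
  model: string;
}

// A plain description of an HTTP POST, ready to hand to fetch() or any client.
interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildTtsRequest(config: TtsConfig, text: string): TtsRequest {
  if (config.provider === "elevenlabs") {
    return {
      // ElevenLabs: the voice ID is a path segment; the key goes in xi-api-key.
      url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(config.voiceId)}`,
      headers: {
        "xi-api-key": config.apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: config.model }),
    };
  }
  // OpenAI: model and voice travel in the JSON body; auth is a bearer token.
  return {
    url: "https://api.openai.com/v1/audio/speech",
    headers: {
      Authorization: `Bearer ${config.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: config.model, input: text, voice: config.voiceId }),
  };
}

// Example: the request for an OpenAI-backed voice reply.
const example = buildTtsRequest(
  { provider: "openai", apiKey: "sk-test", voiceId: "alloy", model: "tts-1" },
  "Hey, here's a quick voice reply!",
);
console.log(example.url); // https://api.openai.com/v1/audio/speech
```

Keeping request construction separate from the network call makes the provider switch testable without real API keys.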

The agent triggers TTS by emitting a `<voice>` directive inside the `<actions>` block of its response:

```xml
<actions>
  <voice>Hey, here's a quick voice reply!</voice>
</actions>
```

The `lettabot-tts` CLI tool is also available for background tasks such as heartbeats and cron jobs.

## Attachments Configuration

```yaml
@@ -807,5 +837,11 @@ Environment variables override config file values:
| `LOG_LEVEL` | `server.logLevel` (fatal/error/warn/info/debug/trace). Overrides config. |
| `LETTABOT_LOG_LEVEL` | Alias for `LOG_LEVEL` |
| `LOG_FORMAT` | Set to `json` for structured JSON output (recommended for Railway/Docker) |
| `TTS_PROVIDER` | TTS backend: `elevenlabs` (default) or `openai` |
| `ELEVENLABS_API_KEY` | API key for ElevenLabs TTS |
| `ELEVENLABS_VOICE_ID` | ElevenLabs voice ID (default: `21m00Tcm4TlvDq8ikWAM` / Rachel) |
| `ELEVENLABS_MODEL_ID` | ElevenLabs model (default: `eleven_multilingual_v2`) |
| `OPENAI_TTS_VOICE` | OpenAI TTS voice (default: `alloy`) |
| `OPENAI_TTS_MODEL` | OpenAI TTS model (default: `tts-1`) |

See [SKILL.md](../SKILL.md) for the complete environment variable reference.

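A minimal sketch of the precedence described above (environment variable, then `lettabot.yaml`, then the documented default). `TtsFileConfig`, `resolveTts`, and the other names are hypothetical. The environment is passed in as a plain object so the function stays pure; in Node you would pass `process.env`. Because the table lists no OpenAI TTS key variable, the sketch takes the OpenAI key only from the config file.

```typescript
// Hypothetical names; a sketch of the documented precedence, not LettaBot's loader.
type TtsProvider = "elevenlabs" | "openai";

// The `tts` block as read from lettabot.yaml; every key is optional.
interface TtsFileConfig {
  provider?: string;
  apiKey?: string;
  voiceId?: string;
  model?: string;
}

interface ResolvedTts {
  provider: TtsProvider;
  apiKey?: string;
  voiceId: string;
  model: string;
}

type Env = Record<string, string | undefined>;

// Documented defaults per provider.
const DEFAULTS: Record<TtsProvider, { voiceId: string; model: string }> = {
  elevenlabs: { voiceId: "21m00Tcm4TlvDq8ikWAM", model: "eleven_multilingual_v2" },
  openai: { voiceId: "alloy", model: "tts-1" },
};

function isProvider(value: string): value is TtsProvider {
  return value === "elevenlabs" || value === "openai";
}

// Precedence for each field: environment variable, then file value, then default.
function resolveTts(file: TtsFileConfig, env: Env): ResolvedTts {
  const provider = env.TTS_PROVIDER ?? file.provider ?? "elevenlabs";
  if (!isProvider(provider)) {
    throw new Error(`Unknown TTS provider: ${provider}`);
  }
  if (provider === "elevenlabs") {
    return {
      provider,
      apiKey: env.ELEVENLABS_API_KEY ?? file.apiKey,
      voiceId: env.ELEVENLABS_VOICE_ID ?? file.voiceId ?? DEFAULTS.elevenlabs.voiceId,
      model: env.ELEVENLABS_MODEL_ID ?? file.model ?? DEFAULTS.elevenlabs.model,
    };
  }
  return {
    provider,
    // No OpenAI key variable is listed in the table, so only the file value is used.
    apiKey: file.apiKey,
    voiceId: env.OPENAI_TTS_VOICE ?? file.voiceId ?? DEFAULTS.openai.voiceId,
    model: env.OPENAI_TTS_MODEL ?? file.model ?? DEFAULTS.openai.model,
  };
}
```

Failing fast on an unknown provider name surfaces typos in `TTS_PROVIDER` at startup rather than on the first `<voice>` reply.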
@@ -48,13 +48,14 @@ Sends a file or image to the same channel/chat as the triggering message.

```xml
<send-file path="/tmp/report.pdf" caption="Report attached" />
<send-file path="/tmp/photo.png" kind="image" caption="Look!" />
<send-file path="/tmp/voice.ogg" kind="audio" cleanup="true" />
<send-file path="/tmp/temp-export.csv" cleanup="true" />
```

**Attributes:**

- `path` / `file` (required) -- Local file path on the LettaBot server
- `caption` / `text` (optional) -- Caption text for the file
- `kind` (optional) -- `image`, `file`, or `audio` (defaults to auto-detect based on extension). Audio files (.ogg, .opus, .mp3, .m4a, .wav, .aac, .flac) are auto-detected as `audio`.
- `cleanup` (optional) -- `true` to delete the file after sending (default: false)

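The extension-based detection for `kind` can be sketched as below. The audio list comes from the attribute description above; the image extension list and the `resolveKind` / `extensionOf` helpers are assumptions for illustration, not LettaBot's code.

```typescript
type SendKind = "image" | "file" | "audio";

// Audio extensions as listed in the `kind` attribute description.
const AUDIO_EXTENSIONS = new Set([".ogg", ".opus", ".mp3", ".m4a", ".wav", ".aac", ".flac"]);
// Assumed image extensions; the docs do not enumerate these.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);

// Lowercase extension including the dot, or "" when there is none (e.g. ".hidden").
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// An explicit kind attribute wins; otherwise infer it from the extension.
function resolveKind(path: string, explicitKind?: SendKind): SendKind {
  if (explicitKind) return explicitKind;
  const ext = extensionOf(path);
  if (AUDIO_EXTENSIONS.has(ext)) return "audio";
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  return "file";
}

console.log(resolveKind("/tmp/voice.ogg")); // audio
```

Matching on a lowercased extension keeps `VOICE.OGG` and `voice.ogg` equivalent; anything unrecognized safely degrades to a plain file.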

**Security:**
@@ -63,6 +64,22 @@ Sends a file or image to the same channel/chat as the triggering message.
- File size is limited to `sendFileMaxSize` (default: 50MB).
- The `cleanup` attribute only works when `sendFileCleanup: true` is set in the agent's features config (disabled by default).

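The two rules above can be expressed as a single pre-send check. This is a sketch under assumptions: `checkSendFile`, `SendFileFeatures`, and `SendFileDecision` are hypothetical names, and the 50MB default is interpreted here as 50 MiB (50 * 1024 * 1024 bytes).

```typescript
// Hypothetical subset of the agent's features config.
interface SendFileFeatures {
  sendFileMaxSize?: number; // bytes
  sendFileCleanup?: boolean;
}

interface SendFileDecision {
  allowed: boolean;
  deleteAfterSend: boolean;
  reason?: string;
}

// Assumption: the documented 50MB default is read as 50 MiB.
const DEFAULT_MAX_SIZE = 50 * 1024 * 1024;

function checkSendFile(
  sizeBytes: number,
  cleanupRequested: boolean,
  features: SendFileFeatures,
): SendFileDecision {
  const maxSize = features.sendFileMaxSize ?? DEFAULT_MAX_SIZE;
  if (sizeBytes > maxSize) {
    return { allowed: false, deleteAfterSend: false, reason: `file is ${sizeBytes} bytes, limit is ${maxSize}` };
  }
  // cleanup="true" is honored only when the operator enabled sendFileCleanup.
  const deleteAfterSend = cleanupRequested && features.sendFileCleanup === true;
  return { allowed: true, deleteAfterSend };
}

// <send-file path="/tmp/voice.ogg" kind="audio" cleanup="true" /> with default features:
console.log(checkSendFile(120_000, true, {})); // { allowed: true, deleteAfterSend: false }
```

Gating deletion on an operator-level flag means a model-generated `cleanup="true"` alone can never delete files on the server.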

### `<voice>`

Generates speech from text via TTS and sends it as a native voice note. No tool call is needed.

```xml
<voice>Hey, here's a quick voice reply!</voice>
```

The text content is sent to the configured TTS provider (see [TTS Configuration](./configuration.md#text-to-speech-tts-configuration)), converted to audio, and delivered as a voice note. The generated audio is cleaned up automatically after sending.

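The text-to-audio-to-voice-note flow, including cleanup, might look roughly like this. `VoiceDeps` and `deliverVoice` are hypothetical: the provider call, temp-file handling, and channel delivery are injected so the control flow stands on its own. Skipping empty text is a choice made in this sketch; the `finally` block is what guarantees the audio is removed even when sending fails.

```typescript
// Hypothetical collaborators; real module boundaries in LettaBot may differ.
interface VoiceDeps {
  synthesize(text: string): Promise<Uint8Array>;    // configured TTS provider
  writeTempFile(data: Uint8Array): Promise<string>; // returns the temp file path
  sendVoiceNote(path: string): Promise<void>;       // channel-specific delivery
  deleteFile(path: string): Promise<void>;
}

// Text -> audio -> voice note; the temp file is removed whether or not sending succeeds.
async function deliverVoice(text: string, deps: VoiceDeps): Promise<void> {
  const trimmed = text.trim();
  if (trimmed === "") return; // an empty <voice> has nothing to synthesize
  const audio = await deps.synthesize(trimmed);
  const path = await deps.writeTempFile(audio);
  try {
    await deps.sendVoiceNote(path);
  } finally {
    await deps.deleteFile(path);
  }
}
```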

- Requires `tts` to be configured in `lettabot.yaml`
- Renders as native voice bubbles on Telegram and WhatsApp
- Discord and Slack receive a playable audio attachment
- On Telegram, falls back to sending an audio file if voice messages are restricted by Telegram Premium privacy settings
- Can be combined with text: any text after the `</actions>` block is sent as a normal message alongside the voice note

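Splitting a reply into voice text and the trailing normal message could be done with a small parser like the one below. This is a regex-based sketch (`parseVoiceReply` and `ParsedReply` are hypothetical names), not LettaBot's directive parser: it ignores other directives, does not handle escaping, and drops any text that appears before `<actions>`.

```typescript
// Hypothetical helper; a regex sketch, not LettaBot's directive parser.
interface ParsedReply {
  voiceTexts: string[]; // non-empty <voice> bodies found inside <actions>
  trailingText: string; // text after </actions>, delivered as a normal message
}

function parseVoiceReply(reply: string): ParsedReply {
  const actions = /<actions>([\s\S]*?)<\/actions>/.exec(reply);
  if (actions === null) {
    // No actions block: the whole reply is plain text.
    return { voiceTexts: [], trailingText: reply.trim() };
  }
  const voiceTexts: string[] = [];
  const voiceRe = /<voice>([\s\S]*?)<\/voice>/g;
  const body = actions[1] ?? "";
  let match: RegExpExecArray | null;
  while ((match = voiceRe.exec(body)) !== null) {
    const text = (match[1] ?? "").trim();
    if (text !== "") voiceTexts.push(text); // skip empty directives
  }
  // Text before <actions> is ignored in this sketch.
  const trailingText = reply.slice(actions.index + actions[0].length).trim();
  return { voiceTexts, trailingText };
}

const parsed = parseVoiceReply(
  "<actions>\n  <voice>Hey, here's a quick voice reply!</voice>\n</actions>\nDetails in the doc.",
);
console.log(parsed.voiceTexts[0]); // Hey, here's a quick voice reply!
console.log(parsed.trailingText); // Details in the doc.
```

The non-greedy `[\s\S]*?` lets a single `<actions>` block carry several `<voice>` directives without one match swallowing the next.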
### `<no-reply/>`

Suppresses response delivery entirely. The agent's text is discarded.
@@ -88,13 +105,13 @@ Backslash-escaped quotes (common when LLMs generate XML inside a JSON context) a

## Channel Support

| Channel | `addReaction` | `send-file` | `kind="audio"` | Notes |
|-----------|:---:|:---:|:---:|-------|
| Telegram | Yes | Yes | Voice note (`sendVoice`) | Falls back to `sendAudio` if voice messages are restricted by Telegram Premium privacy settings. |
| Slack | Yes | Yes | Audio attachment | Reactions use Slack emoji names (`:thumbsup:` style). |
| Discord | Yes | Yes | Audio attachment | Custom server emoji not yet supported. |
| WhatsApp | No | Yes | Voice note (PTT) | Sent with `ptt: true` for native voice bubble. |
| Signal | No | No | No | Directive skipped with a warning. |

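The `kind="audio"` column, including Telegram's fallback, can be sketched as a dispatch function. `AudioTransport`, `sendAudioFile`, and `isVoiceForbidden` are hypothetical names, and detecting the Premium restriction through an error message containing `VOICE_MESSAGES_FORBIDDEN` is an assumption about how the Telegram Bot API reports it.

```typescript
type Channel = "telegram" | "slack" | "discord" | "whatsapp" | "signal";

// Hypothetical transport functions; the real adapters live in each channel integration.
interface AudioTransport {
  sendVoice(path: string): Promise<void>;      // Telegram sendVoice / WhatsApp PTT
  sendAudio(path: string): Promise<void>;      // Telegram sendAudio fallback
  sendAttachment(path: string): Promise<void>; // Slack / Discord file upload
}

type AudioResult = "voice" | "audio" | "attachment" | "unsupported";

// Assumption: the Premium privacy restriction surfaces as an error whose
// message contains VOICE_MESSAGES_FORBIDDEN.
function isVoiceForbidden(err: unknown): boolean {
  return err instanceof Error && err.message.includes("VOICE_MESSAGES_FORBIDDEN");
}

async function sendAudioFile(channel: Channel, path: string, t: AudioTransport): Promise<AudioResult> {
  switch (channel) {
    case "telegram":
      try {
        await t.sendVoice(path);
        return "voice";
      } catch (err) {
        if (!isVoiceForbidden(err)) throw err; // unrelated failures still propagate
        await t.sendAudio(path); // plain audio file instead of a voice bubble
        return "audio";
      }
    case "whatsapp":
      await t.sendVoice(path); // delivered as PTT for a native voice bubble
      return "voice";
    case "slack":
    case "discord":
      await t.sendAttachment(path);
      return "attachment";
    case "signal":
      return "unsupported"; // caller logs a warning, per the table
  }
}
```

Only the specific privacy error triggers the fallback, so network or auth failures are not silently downgraded to an audio upload.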

When a channel doesn't implement `addReaction`, the directive is skipped and only a warning is logged. A missing reaction never blocks message delivery.

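Because `addReaction` is optional per channel, the skip-and-warn behavior amounts to a guard before the call. A minimal sketch with a hypothetical `ChannelAdapter` shape and `applyReaction` helper; the `warn` callback defaults to `console.warn` but could be swapped for the bot's logger.

```typescript
// Hypothetical adapter shape: addReaction is optional because not every channel supports it.
interface ChannelAdapter {
  name: string;
  addReaction?(messageId: string, emoji: string): Promise<void>;
}

// Applies a reaction when supported; otherwise logs a warning and returns false
// so the caller can continue delivering the message.
async function applyReaction(
  adapter: ChannelAdapter,
  messageId: string,
  emoji: string,
  warn: (msg: string) => void = console.warn,
): Promise<boolean> {
  if (!adapter.addReaction) {
    warn(`${adapter.name}: addReaction not supported, skipping reaction`);
    return false;
  }
  await adapter.addReaction(messageId, emoji);
  return true;
}
```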