Add voice message transcription support (#54)

* Add voice message transcription support (all channels)

Adds OpenAI Whisper transcription for voice messages across all channels:
- Telegram: ctx.message.voice
- WhatsApp: audioMessage via downloadMediaMessage
- Signal: audio attachments from local files
- Slack: audio files via url_private_download
- Discord: audio attachments

Voice messages are sent to the agent as "[Voice message]: <transcript>"

Configuration (config takes priority over env):
- lettabot.yaml: transcription.apiKey, transcription.model
- Env: OPENAI_API_KEY, TRANSCRIPTION_MODEL
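
The precedence above (yaml config over environment variables) can be sketched as follows. This is an illustrative helper, not the actual loader; the `whisper-1` default is an assumption:

```typescript
// Hypothetical sketch of the documented precedence: values from
// lettabot.yaml win over environment variables.
interface TranscriptionConfig {
  apiKey?: string;
  model?: string;
}

function resolveTranscription(
  config: TranscriptionConfig | undefined,
  env: Record<string, string | undefined>
): { apiKey?: string; model: string } {
  return {
    // transcription.apiKey takes priority over OPENAI_API_KEY
    apiKey: config?.apiKey ?? env.OPENAI_API_KEY,
    // transcription.model over TRANSCRIPTION_MODEL; 'whisper-1' is an assumed default
    model: config?.model ?? env.TRANSCRIPTION_MODEL ?? 'whisper-1',
  };
}
```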

Closes #47

Written by Cameron ◯ Letta Code

"The best interface is no interface - just talk."

* Add voice message documentation to README

- Add Voice Messages to features list
- Add configuration section for transcription
- Document supported channels

* Notify users when voice transcription is not configured

Instead of silently ignoring voice messages, send a helpful message
linking to the documentation.

* feat: upgrade to letta-code-sdk main + fix Signal voice transcription

- Switch from published SDK (v0.0.3) to local main branch (file:../letta-code-sdk)
- Update bot.ts for new SDK API: createSession(agentId?, options) signature
- Add conversationId tracking to store for proper conversation persistence
- Fix Signal voice transcription: read attachments from ~/.local/share/signal-cli/attachments/
- Fix Telegram markdown ESM issue: make markdownToTelegramV2 async with dynamic import
- Add transcription config to lettabot.yaml
- Add extensive debug logging for queue and session processing

Signal voice messages are now properly transcribed and sent to the agent.
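
The Signal fix resolves attachment ids to files on disk. A minimal sketch of that path lookup, assuming signal-cli's default data directory (the helper name is illustrative):

```typescript
import { homedir } from 'node:os';
import { posix } from 'node:path';

// Hypothetical helper mirroring the fix: signal-cli stores received
// attachments by id under ~/.local/share/signal-cli/attachments/.
function signalAttachmentPath(
  attachmentId: string,
  home: string = homedir()
): string {
  return posix.join(home, '.local', 'share', 'signal-cli', 'attachments', attachmentId);
}
```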

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update Signal CLI message sender to use daemon JSON-RPC API

- Switch from signal-cli-rest-api to signal-cli daemon (port 8090)
- Use JSON-RPC send method instead of REST /v2/send
- Support group IDs with group: prefix
- Handle 201 responses and empty bodies correctly
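
The JSON-RPC request shape described above can be sketched like this. Field names follow signal-cli's JSON-RPC conventions but should be treated as assumptions, not the exact wire format:

```typescript
// Hypothetical sketch of a JSON-RPC "send" request for the signal-cli
// daemon, with the "group:" prefix convention from this change.
function buildSendRequest(recipient: string, message: string, id = 1) {
  const params: Record<string, unknown> = { message };
  if (recipient.startsWith('group:')) {
    // Strip the prefix and address the group directly
    params.groupId = recipient.slice('group:'.length);
  } else {
    params.recipient = [recipient];
  }
  return { jsonrpc: '2.0', method: 'send', params, id };
}
```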

* Add placeholder for untranscribed voice messages on Signal

If a voice-only message arrives and transcription fails or is disabled,
forward a placeholder so the user knows the message was received.
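
A minimal sketch of the fallback: when a voice-only message yields no transcript, forward a marker instead of dropping it. Function and placeholder wording are illustrative:

```typescript
// Hypothetical sketch: combine any text content with either the
// transcript or a placeholder for untranscribed voice messages.
function voiceContent(text: string, transcript: string | null): string {
  const voice = transcript
    ? `[Voice message]: ${transcript}`
    : '[Voice message: transcription unavailable]';
  return text ? `${text}\n${voice}` : voice;
}
```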

---------

Co-authored-by: Letta <noreply@letta.com>
Authored by Cameron, committed by GitHub, 2026-02-01 20:07:57 -08:00
parent cf1f03e0c7, commit 053763bf89
16 changed files with 511 additions and 773 deletions


@@ -132,9 +132,34 @@ Ask the bot owner to approve with:
     this.client.on('messageCreate', async (message) => {
       if (message.author?.bot) return;
-      const content = (message.content || '').trim();
+      let content = (message.content || '').trim();
       const userId = message.author?.id;
       if (!userId) return;
+      // Handle audio attachments
+      const audioAttachment = message.attachments.find(a => a.contentType?.startsWith('audio/'));
+      if (audioAttachment?.url) {
+        try {
+          const { loadConfig } = await import('../config/index.js');
+          const config = loadConfig();
+          if (!config.transcription?.apiKey && !process.env.OPENAI_API_KEY) {
+            await message.reply('Voice messages require OpenAI API key for transcription. See: https://github.com/letta-ai/lettabot#voice-messages');
+          } else {
+            // Download audio
+            const response = await fetch(audioAttachment.url);
+            const buffer = Buffer.from(await response.arrayBuffer());
+            const { transcribeAudio } = await import('../transcription/index.js');
+            const ext = audioAttachment.contentType?.split('/')[1] || 'mp3';
+            const transcript = await transcribeAudio(buffer, audioAttachment.name || `audio.${ext}`);
+            console.log(`[Discord] Transcribed audio: "${transcript.slice(0, 50)}..."`);
+            content = (content ? content + '\n' : '') + `[Voice message]: ${transcript}`;
+          }
+        } catch (error) {
+          console.error('[Discord] Error transcribing audio:', error);
+        }
+      }
       const access = await this.checkAccess(userId);
       if (access === 'blocked') {