Add voice message transcription support (#54)

* Add voice message transcription support (all channels)

Adds OpenAI Whisper transcription for voice messages across all channels:
- Telegram: ctx.message.voice
- WhatsApp: audioMessage via downloadMediaMessage
- Signal: audio attachments from local files
- Slack: audio files via url_private_download
- Discord: audio attachments

Voice messages are sent to the agent as "[Voice message]: <transcript>"

Configuration (config takes priority over env):
- lettabot.yaml: transcription.apiKey, transcription.model
- Env: OPENAI_API_KEY, TRANSCRIPTION_MODEL
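
The precedence above (yaml config over environment variables) can be sketched as follows. This is an illustrative helper, not the actual loader; the `whisper-1` default is an assumption:

```typescript
// Hypothetical sketch of the documented precedence: values from
// lettabot.yaml win over environment variables.
interface TranscriptionConfig {
  apiKey?: string;
  model?: string;
}

function resolveTranscription(
  config: TranscriptionConfig | undefined,
  env: Record<string, string | undefined>
): { apiKey?: string; model: string } {
  return {
    // transcription.apiKey takes priority over OPENAI_API_KEY
    apiKey: config?.apiKey ?? env.OPENAI_API_KEY,
    // transcription.model over TRANSCRIPTION_MODEL; 'whisper-1' is an assumed default
    model: config?.model ?? env.TRANSCRIPTION_MODEL ?? 'whisper-1',
  };
}
```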

Closes #47

Written by Cameron ◯ Letta Code

"The best interface is no interface - just talk."

* Add voice message documentation to README

- Add Voice Messages to features list
- Add configuration section for transcription
- Document supported channels

* Notify users when voice transcription is not configured

Instead of silently ignoring voice messages, send a helpful message
linking to the documentation.

* feat: upgrade to letta-code-sdk main + fix Signal voice transcription

- Switch from published SDK (v0.0.3) to local main branch (file:../letta-code-sdk)
- Update bot.ts for new SDK API: createSession(agentId?, options) signature
- Add conversationId tracking to store for proper conversation persistence
- Fix Signal voice transcription: read attachments from ~/.local/share/signal-cli/attachments/
- Fix Telegram markdown ESM issue: make markdownToTelegramV2 async with dynamic import
- Add transcription config to lettabot.yaml
- Add extensive debug logging for queue and session processing

Signal voice messages are now properly transcribed and sent to the agent.
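
The Signal fix resolves attachment ids to files on disk. A minimal sketch of that path lookup, assuming signal-cli's default data directory (the helper name is illustrative):

```typescript
import { homedir } from 'node:os';
import { posix } from 'node:path';

// Hypothetical helper mirroring the fix: signal-cli stores received
// attachments by id under ~/.local/share/signal-cli/attachments/.
function signalAttachmentPath(
  attachmentId: string,
  home: string = homedir()
): string {
  return posix.join(home, '.local', 'share', 'signal-cli', 'attachments', attachmentId);
}
```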

🐾 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: update Signal CLI message sender to use daemon JSON-RPC API

- Switch from signal-cli-rest-api to signal-cli daemon (port 8090)
- Use JSON-RPC send method instead of REST /v2/send
- Support group IDs with group: prefix
- Handle 201 responses and empty bodies correctly
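
The JSON-RPC request shape described above can be sketched like this. Field names follow signal-cli's JSON-RPC conventions but should be treated as assumptions, not the exact wire format:

```typescript
// Hypothetical sketch of a JSON-RPC "send" request for the signal-cli
// daemon, with the "group:" prefix convention from this change.
function buildSendRequest(recipient: string, message: string, id = 1) {
  const params: Record<string, unknown> = { message };
  if (recipient.startsWith('group:')) {
    // Strip the prefix and address the group directly
    params.groupId = recipient.slice('group:'.length);
  } else {
    params.recipient = [recipient];
  }
  return { jsonrpc: '2.0', method: 'send', params, id };
}
```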

* Add placeholder for untranscribed voice messages on Signal

If a voice-only message arrives and transcription fails or is disabled,
forward a placeholder so the user knows the message was received.
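
A minimal sketch of the fallback: when a voice-only message yields no transcript, forward a marker instead of dropping it. Function and placeholder wording are illustrative:

```typescript
// Hypothetical sketch: combine any text content with either the
// transcript or a placeholder for untranscribed voice messages.
function voiceContent(text: string, transcript: string | null): string {
  const voice = transcript
    ? `[Voice message]: ${transcript}`
    : '[Voice message: transcription unavailable]';
  return text ? `${text}\n${voice}` : voice;
}
```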

---------

Co-authored-by: Letta <noreply@letta.com>
Authored by Cameron, committed by GitHub, 2026-02-01 20:07:57 -08:00
parent cf1f03e0c7, commit 053763bf89
16 changed files with 511 additions and 773 deletions


@@ -132,9 +132,34 @@ Ask the bot owner to approve with:
     this.client.on('messageCreate', async (message) => {
       if (message.author?.bot) return;
-      const content = (message.content || '').trim();
+      let content = (message.content || '').trim();
       const userId = message.author?.id;
       if (!userId) return;
+      // Handle audio attachments
+      const audioAttachment = message.attachments.find(a => a.contentType?.startsWith('audio/'));
+      if (audioAttachment?.url) {
+        try {
+          const { loadConfig } = await import('../config/index.js');
+          const config = loadConfig();
+          if (!config.transcription?.apiKey && !process.env.OPENAI_API_KEY) {
+            await message.reply('Voice messages require OpenAI API key for transcription. See: https://github.com/letta-ai/lettabot#voice-messages');
+          } else {
+            // Download audio
+            const response = await fetch(audioAttachment.url);
+            const buffer = Buffer.from(await response.arrayBuffer());
+            const { transcribeAudio } = await import('../transcription/index.js');
+            const ext = audioAttachment.contentType?.split('/')[1] || 'mp3';
+            const transcript = await transcribeAudio(buffer, audioAttachment.name || `audio.${ext}`);
+            console.log(`[Discord] Transcribed audio: "${transcript.slice(0, 50)}..."`);
+            content = (content ? content + '\n' : '') + `[Voice message]: ${transcript}`;
+          }
+        } catch (error) {
+          console.error('[Discord] Error transcribing audio:', error);
+        }
+      }
       const access = await this.checkAccess(userId);
       if (access === 'blocked') {