Discord voice messages not transcribed (audio pipeline not triggered) #30034

@xandorklein

Bug / Feature Gap

Environment: OpenClaw latest (npm), macOS, Discord channel with bot API

Audio config:

{
  "provider": "openai",
  "model": "gpt-4o-mini-transcribe"
}

whisper-cli is configured as a fallback. This setup works perfectly for Telegram voice notes.

Problem: Discord voice messages reach the agent, but the audio transcription pipeline is never triggered, and no audio or transcription log entries appear. The agent sees only an attachment reference and cannot access or transcribe the audio content.

Expected: Discord voice messages (OGG/Opus attachments on a message whose flags field has bit 13 set, i.e. flags & (1 << 13), aka IS_VOICE_MESSAGE) should be detected as audio and routed through tools.media.audio.models for transcription, same as Telegram voice notes.
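
For illustration, the detection step could look roughly like the sketch below. It only assumes the raw Discord message payload exposes a numeric flags field; the MessageFlagsOnly type and the isVoiceMessage name are hypothetical and not taken from the OpenClaw codebase.

```typescript
// Bit 13 of the message-level `flags` bitfield (IS_VOICE_MESSAGE in Discord's docs).
const IS_VOICE_MESSAGE = 1 << 13;

// Hypothetical minimal shape: only the field this check needs.
interface MessageFlagsOnly {
  flags?: number;
}

// Returns true when the voice-message bit is set; a missing `flags` counts as 0.
function isVoiceMessage(message: MessageFlagsOnly): boolean {
  return ((message.flags ?? 0) & IS_VOICE_MESSAGE) !== 0;
}
```

A message for which this returns true would then be handed to the same audio path that Telegram voice notes already use.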

Evidence from logs:

  • Telegram voice message at same time: transcribed successfully via OpenAI, 🎤 Heard: prefix in message
  • Discord voice message at same time: no transcription logs, agent responds "Can't hear audio"
  • Audio file saved to ~/.openclaw/media/inbound/ for Telegram but not for Discord

Workaround: Send voice messages via Telegram instead of Discord, or type messages out.

Discord voice message format: Since 2023, Discord has supported voice messages as regular messages with the IS_VOICE_MESSAGE flag (bit 13, 1 << 13) set in the message's flags field and a single audio attachment with content_type audio/ogg. The attachment also carries a waveform field and duration_secs.
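
A rough sketch of picking the audio attachment out of such a payload, using the attachment fields named above. The interface and function names are hypothetical, and the field list is trimmed to what matters here; this is not how OpenClaw currently handles attachments.

```typescript
// Trimmed, hypothetical view of a Discord attachment object.
interface AttachmentLike {
  url: string;
  filename: string;
  content_type?: string;   // e.g. "audio/ogg" for voice messages
  duration_secs?: number;  // present on voice-message attachments
  waveform?: string;       // base64-encoded preview of the audio waveform
}

// Returns the first attachment that looks like voice-message audio:
// an audio/* content type plus a duration, or undefined if none matches.
function findVoiceAttachment(
  attachments: AttachmentLike[],
): AttachmentLike | undefined {
  return attachments.find(
    (a) =>
      (a.content_type ?? "").startsWith("audio/") &&
      a.duration_secs !== undefined,
  );
}
```

The url of the returned attachment is what the transcription step would need to download, presumably into ~/.openclaw/media/inbound/ like the Telegram files.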
