[Bug]: Telegram Voice Messages Not Transcribed #17101

@gitmeatarru

Summary

Telegram voice messages (.ogg with Opus codec) are received by OpenClaw but are not automatically transcribed. The agent receives them as raw audio file attachments (<media:audio>) instead of transcribed text, even though tools.media.audio.enabled is set to true and valid transcription models are configured.

Steps to reproduce

  1. Configure OpenClaw with Telegram bot integration
  2. Enable audio transcription:
    openclaw config set tools.media.audio.enabled true
    openclaw config set tools.media.audio.models "[{'provider':'openai','model':'whisper-1'}]"
  3. Restart the gateway: openclaw gateway restart
  4. Send a voice message to the Telegram bot
  5. Observe the agent's input context

Attempted configurations:

  • openai/whisper-1 (with and without HTTP proxy for regional block bypass)
  • google-antigravity/gemini-3-flash
  • openrouter/openai/gpt-4o-audio-preview
  • openrouter/google/gemini-pro-1.5

All configurations showed the same behavior: no transcription occurred.

Expected behavior

When a voice message is sent via Telegram:

  1. OpenClaw should detect the audio file as a transcription candidate
  2. The configured transcription model should process the audio
  3. The agent should receive the transcribed text as part of the user message
  4. The original audio file may optionally be attached as context
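
A minimal sketch of steps 3 and 4 above, assuming a hypothetical `buildAgentInput` helper. The type, the function name, and the `[transcript]` label are illustrative and are not taken from OpenClaw's code. It only shows the shape of input the agent would be expected to see once a transcript exists, next to the placeholder-only input it gets today.

```typescript
// Hypothetical type for illustration only; not OpenClaw's real data model.
interface InboundAudio {
  path: string;     // local path of the downloaded voice note
  mimeType: string; // e.g. "audio/ogg; codecs=opus"
}

// Builds the text the agent would see for one voice message.
// `transcript` is null when no transcription ran (the behavior reported here).
function buildAgentInput(
  audio: InboundAudio,
  transcript: string | null,
  keepAttachment: boolean = true,
): string {
  const lines: string[] = [];
  if (keepAttachment || transcript === null) {
    // Step 4: the original file may optionally stay attached as context.
    lines.push(`[media attached: ${audio.path} (${audio.mimeType})]`);
  }
  if (transcript === null) {
    // Current behavior: only the placeholder reaches the agent.
    lines.push("<media:audio>");
  } else {
    // Step 3: the spoken content arrives as part of the user message.
    lines.push(`[transcript] ${transcript.trim()}`);
  }
  return lines.join("\n");
}
```

With a transcript present, the agent can act on the spoken content directly. With `null`, the output matches the placeholder-only input described under "Actual behavior".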

Actual behavior

  • The agent receives the audio file path and metadata:
    [media attached: C:\Users\user\.openclaw\media\inbound\file_0---xxxxx.ogg (audio/ogg; codecs=opus)]
    <media:audio>
    
  • No transcription attempt is logged
  • The agent has no access to the spoken content
  • Gateway logs show normal message processing with no errors related to audio handling

Relevant log snippet:

2026-02-15T11:54:40.026Z debug agent/embedded embedded run start: runId=b5f52dd1... messageChannel=telegram
2026-02-15T11:54:46.252Z debug agent/embedded embedded run agent end

No audio processing or transcription API calls are logged.

OpenClaw version

2026.2.13 (203b5bd)

Operating system

Windows 11

Install method

npm global

Additional information

Operating system

Windows 11 (Build 26100), x64
Node.js: v22.14.0
Shell: PowerShell

Additional context

  • Telegram provider is working correctly for text messages
  • The audio files are successfully downloaded to media/inbound/
  • File format: audio/ogg; codecs=opus (standard Telegram voice message format)
  • The issue persists across multiple model providers and configurations
  • Manual transcription via external tools (curl + OpenAI API) works with the same audio files when using an HTTP proxy

Possible cause

The Telegram channel integration may not be flagging voice messages for transcription, or the audio MIME type (audio/ogg; codecs=opus) may not be recognized as a transcription-eligible format by the media processing pipeline.
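
To illustrate the second hypothesis, here is a minimal sketch of how an exact-match MIME check would reject Telegram voice notes, and how stripping parameters first would accept them. The allow-list and all function names are assumptions made for illustration. OpenClaw's actual eligibility logic is not shown in this report and may differ.

```typescript
// Hypothetical allow-list; the real set of eligible types is not known here.
const TRANSCRIBABLE_TYPES: readonly string[] = [
  "audio/ogg",
  "audio/mpeg",
  "audio/wav",
  "audio/mp4",
];

// Naive check: compares the full MIME string, parameters included.
function isTranscribableNaive(mimeType: string): boolean {
  return TRANSCRIBABLE_TYPES.indexOf(mimeType) !== -1;
}

// Drops parameters such as "; codecs=opus" and normalizes case.
function baseMimeType(mimeType: string): string {
  const semicolon = mimeType.indexOf(";");
  const base = semicolon === -1 ? mimeType : mimeType.slice(0, semicolon);
  return base.trim().toLowerCase();
}

// Parameter-tolerant check: matches on the base type only.
function isTranscribable(mimeType: string): boolean {
  return TRANSCRIBABLE_TYPES.indexOf(baseMimeType(mimeType)) !== -1;
}
```

Under these assumptions, `isTranscribableNaive("audio/ogg; codecs=opus")` returns `false` while `isTranscribable("audio/ogg; codecs=opus")` returns `true`. A check of the first kind would skip transcription without logging an error, which is the behavior reported here.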

Workaround

None currently available. Users must type text messages instead of using voice.
