Description
Summary
Telegram voice messages (.ogg with Opus codec) are received by OpenClaw but are not automatically transcribed. The agent receives them as raw audio file attachments (<media:audio>) instead of transcribed text, despite tools.media.audio.enabled being set to true and valid transcription models configured.
Steps to reproduce
- Configure OpenClaw with Telegram bot integration
- Enable audio transcription:
openclaw config set tools.media.audio.enabled true
openclaw config set tools.media.audio.models "[{'provider':'openai','model':'whisper-1'}]"
- Restart the gateway:
openclaw gateway restart
- Send a voice message to the Telegram bot
- Observe the agent's input context
Attempted configurations:
- openai/whisper-1 (with and without an HTTP proxy to bypass a regional block)
- google-antigravity/gemini-3-flash
- openrouter/openai/gpt-4o-audio-preview
- openrouter/google/gemini-pro-1.5
All configurations showed the same behavior: no transcription occurred.
Expected behavior
When a voice message is sent via Telegram:
- OpenClaw should detect the audio file as a transcription candidate
- The configured transcription model should process the audio
- The agent should receive the transcribed text as part of the user message
- The original audio file may optionally be attached as context
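The expected flow above can be sketched roughly as follows. This is only an illustration of the behavior I am asking for, not OpenClaw's actual code: `InboundMedia`, `build_user_message`, and the `transcribe` callback are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InboundMedia:
    """Hypothetical stand-in for a downloaded Telegram attachment."""
    path: str
    mime_type: str


def build_user_message(
    text: str,
    media: Optional[InboundMedia],
    transcribe: Callable[[str], str],
    attach_original: bool = True,
) -> str:
    """Return the message text the agent would ideally receive."""
    # Non-audio messages pass through unchanged.
    if media is None or not media.mime_type.lower().startswith("audio/"):
        return text

    # Audio is a transcription candidate: run the configured model on it.
    transcript = transcribe(media.path)
    parts = [part for part in (text, transcript) if part]

    # Optionally keep a reference to the original file as context.
    if attach_original:
        parts.append(f"[media attached: {media.path} ({media.mime_type})]")
    return "\n".join(parts)
```

With a voice message, the first line of the result would be the transcript rather than only the `[media attached: ...]` marker that the agent sees today.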
Actual behavior
- The agent receives the audio file path and metadata:
[media attached: C:\Users\user\.openclaw\media\inbound\file_0---xxxxx.ogg (audio/ogg; codecs=opus)] <media:audio>
- No transcription attempt is logged
- The agent has no access to the spoken content
- Gateway logs show normal message processing with no errors related to audio handling
Relevant log snippet:
2026-02-15T11:54:40.026Z debug agent/embedded embedded run start: runId=b5f52dd1... messageChannel=telegram
2026-02-15T11:54:46.252Z debug agent/embedded embedded run agent end
No audio processing or transcription API calls are logged.
OpenClaw version
2026.2.13 (203b5bd)
Operating system
Windows 11
Install method
npm global
Logs, screenshots, and evidence
Impact and severity
No response
Additional information
Operating system
Windows 10 (Build 26100), x64
Node.js: v22.14.0
Shell: PowerShell
Additional context
- Telegram provider is working correctly for text messages
- The audio files are successfully downloaded to media/inbound/
- File format: audio/ogg; codecs=opus (standard Telegram voice message format)
- The issue persists across multiple model providers and configurations
- Manual transcription via external tools (curl + OpenAI API) works with the same audio files when using an HTTP proxy
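For reference, the manual check in the last point can be reproduced with something like the sketch below. It is a Python equivalent of the curl call, not the exact command I ran. It posts the file to the OpenAI `/v1/audio/transcriptions` endpoint with `model=whisper-1`; the helper names, the proxy URL, and the API key handling are placeholders.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_request(audio_bytes: bytes, filename: str, api_key: str,
                  model: str = "whisper-1") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and the audio file."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: audio/ogg",
        b"",
        audio_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def transcribe(path: str, api_key: str, proxy_url: Optional[str] = None) -> str:
    """Send the file for transcription, optionally through an HTTP proxy."""
    with open(path, "rb") as fh:
        request = build_request(fh.read(), os.path.basename(path), api_key)
    handlers = []
    if proxy_url:
        # Placeholder proxy; needed in my case because of the regional block.
        handlers.append(urllib.request.ProxyHandler({"https": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(request, timeout=60) as response:
        return json.loads(response.read())["text"]
```

Calling `transcribe(...)` on one of the files from media/inbound/ with a valid key returns the spoken text, so the files themselves appear to be fine.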
Possible cause
The Telegram channel integration may not be flagging voice messages for transcription, or the audio MIME type (audio/ogg; codecs=opus) may not be recognized as a transcription-eligible format by the media processing pipeline.
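To illustrate the second hypothesis: if the pipeline compares the full MIME string against a list of supported types, the `codecs=opus` parameter alone would make the check fail. The snippet below is a guess at the failure mode, not OpenClaw's real code. The `TRANSCRIBABLE` set and both function names are hypothetical.

```python
# Hypothetical allowlist of transcription-eligible base types.
TRANSCRIBABLE = {"audio/ogg", "audio/mpeg", "audio/wav", "audio/mp4"}


def exact_match(mime: str) -> bool:
    """Naive check: compares the whole string, parameters included."""
    return mime in TRANSCRIBABLE


def normalized_match(mime: str) -> bool:
    """Drops parameters such as '; codecs=opus' before comparing."""
    base = mime.split(";", 1)[0].strip().lower()
    return base in TRANSCRIBABLE


voice = "audio/ogg; codecs=opus"
print(exact_match(voice))       # False
print(normalized_match(voice))  # True
```

If something like the exact comparison is in place, stripping the parameters before the lookup would be enough for Telegram voice messages to qualify.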
Workaround
None currently available. Users must type text messages instead of using voice.