
fix(discord): resample audio to 48kHz for voice messages#32298

Merged
steipete merged 1 commit into openclaw:main from kevinWangSheng:fix/discord-voice-24khz
Mar 3, 2026

@kevinWangSheng
Contributor

Fixes #32293: Discord voice message plays at ~0.5x speed with 24kHz TTS source

When TTS providers (like mlx-audio Qwen3-TTS) output audio at 24kHz, Discord voice messages play at half speed because Discord expects 48kHz.

This fix adds explicit sample rate conversion to 48kHz in the ensureOggOpus function, ensuring voice messages always play at correct speed regardless of the input audio's sample rate.

@aisle-research-bot

aisle-research-bot bot commented Mar 2, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

| # | Severity | Title |
|---|----------|-------|
| 1 | 🔵 Low | Unbounded ffmpeg processing in voice message conversion can cause resource-exhaustion (DoS) |

1. 🔵 Unbounded ffmpeg processing in voice message conversion can cause resource-exhaustion (DoS)

| Property | Value |
|----------|-------|
| Severity | Low |
| CWE | CWE-400 |
| Location | src/discord/voice-message.ts:190-201 |

Description

ensureOggOpus() invokes ffmpeg on attacker-controlled audio content without any execution timeout or duration/size limits.

Impact:

  • A crafted/very long audio file (even if relatively small on disk due to low bitrate) can trigger very long transcoding time and high CPU usage.
  • Because execFile is used without a timeout, ffmpeg can run indefinitely (or for an attacker-chosen duration), tying up worker capacity.

Vulnerable code:

await execFileAsync("ffmpeg", [
  "-y",
  "-i",
  filePath,
  "-ar",
  "48000",
  "-c:a",
  "libopus",
  "-b:a",
  "64k",
  outputPath,
]);

Recommendation

Add hard limits around transcoding:

  1. Enforce a maximum input duration (and/or reject unusually long audio early):
const duration = await getAudioDuration(filePath);
if (duration > 120) throw new Error("Voice messages must be <= 120s");
  2. Apply a timeout to the ffmpeg invocation (either by passing timeout to execFile, or by switching to the existing runCommandWithTimeout() helper):
import { runCommandWithTimeout } from "../process/exec.js";

await runCommandWithTimeout([
  "ffmpeg",
  "-y",
  "-i", filePath,
  "-t", "120",           // cap work
  "-ar", "48000",
  "-c:a", "libopus",
  "-b:a", "64k",
  outputPath,
], { timeoutMs: 30_000 });
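
The getAudioDuration() helper used in step 1 is not shown in this PR. A minimal sketch of what it could look like, assuming ffprobe is on PATH: probeDurationSeconds, parseDurationSeconds, isWithinLimit and the 10-second probe budget are hypothetical names and values chosen for illustration, and the 120-second limit is simply the figure from the recommendation.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumed limit, taken from the 120 s figure in the recommendation.
const MAX_VOICE_SECONDS = 120;

// Parse the single value ffprobe prints for `format=duration`.
// Returns null when the output is empty, "N/A", or not a positive finite number.
export function parseDurationSeconds(stdout: string): number | null {
  const value = Number.parseFloat(stdout.trim());
  return Number.isFinite(value) && value > 0 ? value : null;
}

// Hypothetical stand-in for getAudioDuration(); not the project's helper.
export async function probeDurationSeconds(filePath: string): Promise<number | null> {
  const { stdout } = await execFileAsync(
    "ffprobe",
    ["-v", "error", "-show_entries", "format=duration", "-of", "csv=p=0", filePath],
    { timeout: 10_000 }, // assumed probe budget, in milliseconds
  );
  return parseDurationSeconds(stdout);
}

// Treat an unknown duration as "not within the limit" so the caller can reject early.
export function isWithinLimit(seconds: number | null): boolean {
  return seconds !== null && seconds <= MAX_VOICE_SECONDS;
}
```

Rejecting files whose duration cannot be determined is a design choice in this sketch; a caller could instead fall back to the -t cap on the ffmpeg side.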

Optionally also reduce worst-case probing costs:

  • Use -analyzeduration / -probesize caps
  • Disable non-audio streams with -vn -sn -dn
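
Putting the optional flags together with the capped command above, the argument list might be built as follows. This is a sketch only: buildOpusArgs, TranscodeLimits and the default values are assumptions for illustration and are not part of the PR. -analyzeduration takes microseconds and -probesize takes bytes, and both are input options, so they are placed before -i.

```typescript
// Hypothetical limits; the numbers are illustrative, not taken from the PR.
interface TranscodeLimits {
  maxSeconds: number; // cap on encoded output duration (-t)
  analyzeDurationUs: number; // -analyzeduration, in microseconds
  probeSizeBytes: number; // -probesize, in bytes
}

const DEFAULT_LIMITS: TranscodeLimits = {
  maxSeconds: 120,
  analyzeDurationUs: 2_000_000,
  probeSizeBytes: 1_000_000,
};

// Build the ffmpeg argument list for a 48 kHz Opus transcode with probing caps.
export function buildOpusArgs(
  inputPath: string,
  outputPath: string,
  limits: TranscodeLimits = DEFAULT_LIMITS,
): string[] {
  return [
    "-y",
    // Input options: must appear before the -i they apply to.
    "-analyzeduration", String(limits.analyzeDurationUs),
    "-probesize", String(limits.probeSizeBytes),
    "-i", inputPath,
    // Output options: drop video, subtitle and data streams.
    "-vn", "-sn", "-dn",
    "-t", String(limits.maxSeconds),
    "-ar", "48000",
    "-c:a", "libopus",
    "-b:a", "64k",
    outputPath,
  ];
}
```

The resulting array can be passed to execFile with a timeout option, or prefixed with "ffmpeg" and handed to runCommandWithTimeout() as in the recommendation.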

Analyzed PR: #32298 at commit 09ec684

Last updated on: 2026-03-02T23:47:23Z

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR fixes half-speed Discord voice message playback (issue #32293) by adding -ar 48000 to the ffmpeg command in ensureOggOpus, ensuring audio is resampled to 48kHz before Opus encoding. The fix is correct and sufficient for the most common case (non-OGG or OGG-Vorbis inputs).

Key concern:

  • The early-return path for files already in OGG/Opus format (lines 162–181) returns the file as-is without checking its declared sample rate. A 24kHz OGG/Opus file — which some TTS providers can produce natively — would bypass the new resampling step entirely and still play at half speed in Discord. The ffprobe call at that path should be extended to also verify the sample rate is 48kHz, falling through to full conversion if it is not.

Confidence Score: 3/5

  • Safe to merge with the caveat that the fix is incomplete for 24kHz OGG/Opus inputs.
  • The added -ar 48000 flag correctly fixes the reported issue for the majority of TTS inputs (e.g., WAV/MP3/OGG-Vorbis at 24kHz). However, the OGG/Opus early-return path still skips resampling regardless of the file's actual sample rate, meaning the same playback bug can still occur when the input is already OGG/Opus-encoded at a non-48kHz rate. The fix is not fully complete.
  • src/discord/voice-message.ts — the OGG/Opus early-return block (lines 162–181) needs a sample-rate check.

Last reviewed commit: 09ec684

Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Additional Comments (1)

src/discord/voice-message.ts
Early-return bypasses 48kHz resampling for existing OGG/Opus files

The fix correctly adds -ar 48000 to the conversion path, but the early return for files that are already OGG/Opus (lines 176-178) still returns the file unchanged, without verifying its sample rate. If a TTS provider outputs OGG/Opus natively at 24kHz, this path will skip resampling and Discord will still play the audio at half speed, because the OGG header declares 24kHz and Discord honours that declared rate.

To be fully consistent with the fix's intent, the OGG/Opus fast-path should also verify the stream's sample rate with ffprobe and fall through to conversion when it isn't 48kHz:

if (ext === ".ogg") {
  try {
    const { stdout: codecOut } = await execFileAsync("ffprobe", [
      "-v", "error",
      "-select_streams", "a:0",
      "-show_entries", "stream=codec_name,sample_rate",
      "-of", "csv=p=0",
      filePath,
    ]);
    const [codec, sampleRate] = codecOut.trim().toLowerCase().split(",");
    if (codec === "opus" && sampleRate === "48000") {
      return { path: filePath, cleanup: false };
    }
    // else: fall through to conversion (wrong codec or wrong sample rate)
  } catch {
    // If probe fails, convert anyway
  }
}

Without this change the bug described in the linked issue can still surface whenever the input is already in OGG/Opus container but encoded at a sample rate other than 48kHz.
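
The snippet above splits ffprobe's CSV output by position, which relies on codec_name being printed before sample_rate. A variant that avoids depending on field order is to request JSON (-of json instead of -of csv=p=0) and read the fields by name. A minimal sketch of the parsing side, where isOpusAt48k and ProbedStream are hypothetical names introduced here and not part of the PR:

```typescript
// Shape of the fields requested via -show_entries stream=codec_name,sample_rate.
// ffprobe's JSON writer emits sample_rate as a string.
interface ProbedStream {
  codec_name?: string;
  sample_rate?: string;
}

// Returns true only when the first probed stream is Opus at 48000 Hz.
// Any parse failure or missing field returns false, so the caller falls
// through to the full conversion path.
export function isOpusAt48k(probeJson: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(probeJson);
  } catch {
    return false;
  }
  const streams = (parsed as { streams?: ProbedStream[] } | null)?.streams;
  const first = Array.isArray(streams) ? streams[0] : undefined;
  return (
    first?.codec_name?.toLowerCase() === "opus" &&
    Number(first?.sample_rate) === 48000
  );
}
```

With this helper, the fast path would return the file unchanged only when isOpusAt48k(stdout) is true, and convert in every other case, including a failed probe.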


@steipete steipete merged commit 924d9e3 into openclaw:main Mar 3, 2026
30 checks passed
@steipete
Contributor

steipete commented Mar 3, 2026

Landed.

  • Gate: pnpm vitest run src/discord/voice/command.test.ts src/discord/send.sends-basic-channel-messages.test.ts src/discord/send.components.test.ts
  • Merge commit: 924d9e3

Thanks @kevinWangSheng!

dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026
AytuncYildizli pushed a commit to AytuncYildizli/openclaw that referenced this pull request Mar 4, 2026
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 19, 2026
(cherry picked from commit 924d9e3)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 19, 2026
(cherry picked from commit 924d9e3)
ephb pushed a commit to ephb-bot/openclaw that referenced this pull request Mar 19, 2026
lukeg826 pushed a commit to lukeg826/openclaw that referenced this pull request Mar 26, 2026
Labels

channel: discord Channel integration: discord size: XS

Development

Successfully merging this pull request may close these issues.

Discord voice message plays at ~0.5x speed with 24kHz TTS source (mlx-audio Qwen3-TTS)

2 participants