fix: validate Edge TTS output file is non-empty before reporting success #43273

Open
howardpen9 wants to merge 1 commit into openclaw:main from howardpen9:fix/edge-tts-empty-audio-validation

Conversation

@howardpen9

Summary

  • Add a file-size check after tts.ttsPromise() resolves in edgeTTS() — if the output is 0 bytes, throw an error so the provider-fallback loop in textToSpeech() tries the next provider instead of delivering an empty audio file
  • Add unit tests for the empty-file and valid-file cases

Root Cause

node-edge-tts's ttsPromise() creates a WriteStream, connects to Bing's TTS WebSocket, and resolves the promise when it receives turn.end. However, if the service sends turn.end without any preceding audio frames (e.g. due to rate-limiting, unsupported voice/format, or transient errors), the promise still resolves successfully — leaving a 0-byte file on disk.

textToSpeech() then treats this as a successful result and delivers the empty file to the channel (e.g. Telegram voice message with no audio).
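
The failure mode can be reproduced without the real service. The sketch below is not node-edge-tts code: `fakeTtsPromise` and `sizeAfterTts` are hypothetical stand-ins that open a `WriteStream`, write whatever frames arrive, and resolve on a simulated `turn.end`. With no frames, the promise still resolves and the file exists with a size of 0.

```typescript
import { createWriteStream, mkdtempSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for a TTS call that resolves on "turn.end"
// whether or not any audio frames were received.
function fakeTtsPromise(outputPath: string, frames: Buffer[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const stream = createWriteStream(outputPath);
    stream.on("error", reject);
    for (const frame of frames) {
      stream.write(frame);
    }
    // Simulated "turn.end": end the stream and resolve once it has finished.
    stream.end(() => resolve());
  });
}

// Runs the fake TTS call against a fresh temp file and reports its size.
async function sizeAfterTts(frames: Buffer[]): Promise<number> {
  const dir = mkdtempSync(join(tmpdir(), "tts-demo-"));
  const outputPath = join(dir, "out.mp3");
  await fakeTtsPromise(outputPath, frames);
  return statSync(outputPath).size;
}
```

Calling `sizeAfterTts([])` resolves without error even though no audio was written, which is the case the size check is meant to catch.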

Fix

A single statSync check after ttsPromise resolves:

const { size } = statSync(outputPath);
if (size === 0) {
  throw new Error("Edge TTS produced empty audio file");
}

This allows the existing provider-fallback mechanism to kick in and try OpenAI/ElevenLabs TTS instead.
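
As an illustration of why throwing is enough, here is a minimal sketch of a provider-fallback loop. It is an assumption about the general shape of such a loop, not the actual `textToSpeech()` implementation; the `Provider` type and `synthesizeWithFallback` name are hypothetical.

```typescript
// Hypothetical provider shape: returns the path of the generated audio file.
type Provider = {
  name: string;
  synthesize: (text: string) => Promise<string>;
};

// Try each provider in order; a thrown error moves on to the next one.
async function synthesizeWithFallback(
  providers: Provider[],
  text: string,
): Promise<{ provider: string; path: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const path = await provider.synthesize(text);
      return { provider: provider.name, path };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All TTS providers failed: ${errors.join("; ")}`);
}
```

In a loop of this shape, a provider that returns normally with an empty file is indistinguishable from success, whereas one that throws hands control to the next provider.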

Test plan

  • New test: edgeTTS throws when output file is 0 bytes
  • New test: edgeTTS succeeds when output file has content
  • All 26 existing TTS tests pass unchanged

Closes #43229

The `edgeTTS()` function calls `tts.ttsPromise()` from node-edge-tts,
which resolves successfully even when the Bing TTS service sends
`turn.end` without any audio frames. This results in a 0-byte MP3 file
being returned as a successful TTS result.

Add a `statSync` check after `ttsPromise` resolves: if the output file
is 0 bytes, throw so the provider-fallback loop in `textToSpeech()` can
try the next provider (OpenAI / ElevenLabs) instead of delivering an
empty audio file.

Closes openclaw#43229
@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a real bug where node-edge-tts's ttsPromise could resolve successfully while producing a 0-byte output file (e.g. when the Bing TTS WebSocket sends turn.end without any preceding audio frames). The fix adds a statSync size check immediately after ttsPromise resolves and throws if the file is empty, which correctly feeds into the existing provider-fallback loop in textToSpeech so OpenAI or ElevenLabs TTS can be tried instead.

  • The core fix in tts-core.ts is minimal and correct; the new error is properly caught by the outer try/catch in tts.ts for both the primary and fallback Edge output-format attempts.
  • Two new unit tests in edge-tts-validation.test.ts cover the empty-file and non-empty-file cases; mock placement (before the dynamic import) is correct.
  • Minor concern: statSync is synchronous and will throw ENOENT with a confusing error message if ttsPromise ever resolves without creating the output file, rather than the intended "Edge TTS produced empty audio file" message. Wrapping the call in a try/catch (or switching to the async stat) would make this more robust.

Confidence Score: 4/5

  • This PR is safe to merge; it fixes a real silent-failure bug with a minimal, well-tested change.
  • The fix correctly addresses the root cause and integrates cleanly with the existing fallback mechanism. Tests are present and cover the new code paths. The only concern is a minor robustness gap where statSync could throw ENOENT (instead of the cleaner custom error) if the output file was never created, and the use of a synchronous FS call inside an async function. Neither issue affects correctness of the happy path or the fallback behavior.
  • No files require special attention.

Last reviewed commit: 29cac6a

Comment on lines +677 to +680
const { size } = statSync(outputPath);
if (size === 0) {
  throw new Error("Edge TTS produced empty audio file");
}

Use async stat and guard against missing file

statSync blocks the event loop. While this is a minor concern after an async network call, it is still worth using the async variant to stay consistent with the surrounding async function.

More importantly, if ttsPromise somehow resolves without ever creating the file (e.g. a future node-edge-tts version that skips creating the WriteStream on certain error paths), statSync will throw an ENOENT error whose message will surface in the provider-fallback error log instead of the clearer "Edge TTS produced empty audio file" message.

Suggested change
    - const { size } = statSync(outputPath);
    - if (size === 0) {
    -   throw new Error("Edge TTS produced empty audio file");
    - }
    + let size = 0;
    + try {
    +   ({ size } = statSync(outputPath));
    + } catch {
    +   // File was never created; treat the same as an empty file.
    + }
    + if (size === 0) {
    +   throw new Error("Edge TTS produced empty audio file");
    + }

Alternatively, switch to import { stat } from "node:fs/promises" and await stat(outputPath) with the same try/catch pattern.
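
A minimal sketch of that async variant follows. The helper name `assertNonEmptyAudio` and the `demo` function are hypothetical, and one detail differs from the suggestion above: only `ENOENT` is folded into the "empty" case, while other filesystem errors are rethrown. That narrowing is an assumption about what is desirable, not something the PR specifies.

```typescript
import { mkdtemp, stat, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: resolves with the file size, or throws the same
// error for a missing file as for a 0-byte file.
async function assertNonEmptyAudio(outputPath: string): Promise<number> {
  let size = 0;
  try {
    ({ size } = await stat(outputPath));
  } catch (err) {
    // Assumption: a missing file counts as empty; anything else propagates.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
      throw err;
    }
  }
  if (size === 0) {
    throw new Error("Edge TTS produced empty audio file");
  }
  return size;
}

// Usage sketch: exercise the missing, empty, and non-empty cases.
async function demo(): Promise<string[]> {
  const dir = await mkdtemp(join(tmpdir(), "tts-stat-"));
  const cases: Array<[string, Buffer | null]> = [
    ["missing.mp3", null],
    ["empty.mp3", Buffer.alloc(0)],
    ["ok.mp3", Buffer.from("ID3")],
  ];
  const results: string[] = [];
  for (const [name, content] of cases) {
    const path = join(dir, name);
    if (content !== null) {
      await writeFile(path, content);
    }
    try {
      results.push(`${name}: ${await assertNonEmptyAudio(path)} bytes`);
    } catch (err) {
      results.push(`${name}: ${(err as Error).message}`);
    }
  }
  return results;
}
```

With this shape, the fallback log shows the same message for a missing file and a 0-byte file, and the event loop is not blocked during the check.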



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29cac6a5e2


Comment on lines +677 to +679
const { size } = statSync(outputPath);
if (size === 0) {
  throw new Error("Edge TTS produced empty audio file");

[P1] Wait for file flush before rejecting zero-byte Edge output

This synchronous statSync check can misclassify successful Edge TTS calls as failures if ttsPromise() resolves before the underlying write stream has fully flushed to disk, which is a realistic timing for async createWriteStream writes. In that case size is still 0 (or the file is not yet finalized), so we throw and trigger provider fallback, or a hard failure in Edge-only setups, even though valid audio is written moments later. The check needs a short readiness wait/retry, or a completion signal tied to file flush, instead of a single immediate stat.
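
One way the suggested wait/retry could look is sketched below. Whether node-edge-tts can actually resolve before its stream is flushed is the reviewer's claim and is not confirmed in this PR; the helper name `waitForNonEmptyFile`, the attempt count, and the interval are all hypothetical choices.

```typescript
import { mkdtemp, stat, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { setTimeout as delay } from "node:timers/promises";

// Hypothetical helper: poll the file a few times before declaring it empty.
async function waitForNonEmptyFile(
  path: string,
  attempts = 5,
  intervalMs = 50,
): Promise<number> {
  for (let i = 0; i < attempts; i++) {
    try {
      const { size } = await stat(path);
      if (size > 0) {
        return size;
      }
    } catch (err) {
      // A file that does not exist yet is retried like an empty one.
      if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
        throw err;
      }
    }
    if (i < attempts - 1) {
      await delay(intervalMs);
    }
  }
  throw new Error("Edge TTS produced empty audio file");
}

// Usage sketch: audio lands on disk shortly after the check starts.
async function demoLateWrite(): Promise<number> {
  const dir = await mkdtemp(join(tmpdir(), "tts-flush-"));
  const path = join(dir, "late.mp3");
  const lateWrite = delay(60).then(() => writeFile(path, Buffer.from("audio")));
  const size = await waitForNonEmptyFile(path, 10, 50);
  await lateWrite;
  return size;
}
```

The trade-off is latency: a genuinely empty result now takes `attempts × intervalMs` to fail over to the next provider, so the retry budget should stay small.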


Development

Successfully merging this pull request may close these issues.

[Bug] TTS tool generates empty (0-byte) MP3 files
