fix(talk): prevent double TTS playback when system voice times out by hongsw · Pull Request #53511 · openclaw/openclaw

hongsw · 2026-03-24T07:32:06Z

Summary

- Fix Talk Mode playing every assistant reply twice when using a non-ElevenLabs TTS provider
- Fix CJK (Korean, Japanese, Chinese) system voice watchdog cutting off speech mid-sentence

Changes

1. Prevent double TTS playback (`TalkModeRuntime.swift`)

Split the ElevenLabs and system-voice error paths. Previously `playAssistant()` unconditionally retried `playSystemVoice()` in its catch block, even when system voice itself threw the error. Now system voice failures are logged without retrying; only ElevenLabs failures fall back to system voice.
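The split can be illustrated with a small synchronous sketch. It is not the code from `TalkModeRuntime.swift`: the class, its counters, and the failure flags are hypothetical stand-ins, and the real functions are `async`. Only the control flow is meant to match the description above.

```swift
// Hypothetical error type and class; only the function names playAssistant,
// playElevenLabs, and playSystemVoice come from the PR description.
struct PlaybackError: Error {}

final class TalkPlaybackSketch {
    var elevenLabsAttempts = 0
    var systemVoiceAttempts = 0
    var elevenLabsShouldFail = false
    var systemVoiceShouldFail = false

    func playElevenLabs() throws {
        elevenLabsAttempts += 1
        if elevenLabsShouldFail { throw PlaybackError() }
    }

    func playSystemVoice() throws {
        systemVoiceAttempts += 1
        if systemVoiceShouldFail { throw PlaybackError() }
    }

    func playAssistant(apiKey: String?, voiceId: String?) {
        if let key = apiKey, !key.isEmpty, voiceId != nil {
            do {
                try playElevenLabs()
            } catch {
                // ElevenLabs failed: fall back to system voice exactly once.
                do {
                    try playSystemVoice()
                } catch {
                    print("system voice fallback failed: \(error)")
                }
            }
        } else {
            do {
                try playSystemVoice()
            } catch {
                // System voice was the primary path: log only, never retry,
                // so the reply cannot be spoken a second time.
                print("system voice failed: \(error)")
            }
        }
    }
}
```

With this shape, a system-voice failure leads to one playback attempt, while an ElevenLabs failure leads to one ElevenLabs attempt followed by one system-voice attempt.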

2. Language-aware watchdog timeout (`TalkSystemSpeechSynthesizer.swift` + `TalkModeRuntime.swift`)

Two fixes:

a) Per-language timing estimates: The watchdog timer now uses per-language estimates based on Pellegrino et al. (2019) syllable-per-second research, adjusted for TTS synthesis speed:

| Language | Per-char estimate | Min timeout | Watchdog for 50 chars (estimate × 3) |
|---|---|---|---|
| Korean | 0.25s | 10s | 37.5s |
| Chinese | 0.28s | 10s | 42.0s |
| Japanese | 0.20s | 10s | 30.0s |
| English | 0.08s | 3s | 12.0s |

A 3x safety margin is applied so the watchdog only fires on genuine hangs where didFinish never arrives — not during normal speech.
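As a worked example of the arithmetic in the table, here is a minimal sketch. The function and type names are hypothetical, the prefix matching on the language code is an assumption, and so is the choice to apply the minimum as a floor on the margined estimate; the PR description does not say exactly how the actual implementation combines the minimum with the margin.

```swift
// Hypothetical profile type; the numbers are taken from the table above.
struct TimingProfile {
    let secondsPerCharacter: Double
    let minimumSeconds: Double
}

// Assumption: languages are matched by the prefix of a BCP-47 style code.
func timingProfile(forLanguage language: String?) -> TimingProfile {
    let code = (language ?? "en").lowercased()
    if code.hasPrefix("ko") {
        return TimingProfile(secondsPerCharacter: 0.25, minimumSeconds: 10)
    }
    if code.hasPrefix("zh") {
        return TimingProfile(secondsPerCharacter: 0.28, minimumSeconds: 10)
    }
    if code.hasPrefix("ja") {
        return TimingProfile(secondsPerCharacter: 0.20, minimumSeconds: 10)
    }
    // Everything else uses the English row.
    return TimingProfile(secondsPerCharacter: 0.08, minimumSeconds: 3)
}

// Assumption: the minimum acts as a floor on (characters × per-char × 3).
func estimatedWatchdogTimeout(text: String, language: String?) -> Double {
    let profile = timingProfile(forLanguage: language)
    let estimate = Double(text.count) * profile.secondsPerCharacter
    return max(estimate * 3.0, profile.minimumSeconds)
}
```

For 50 Korean characters this gives 50 × 0.25 × 3 = 37.5 seconds, matching the last column of the table; a very short reply falls back to the per-language minimum.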

b) App locale fallback: `playSystemVoice` was passing `nil` as the language (the ElevenLabs directive field), which caused `TalkSystemSpeechSynthesizer` to default to English timing (0.08s/char). It now falls back to the app's voice wake locale (e.g. `ko-KR`) so that the correct language-specific timing is used.

Closes #53510

Test plan

- Talk Mode with `talk.provider = system`: response plays once (not twice)
- Korean long response (50+ chars): plays to completion without timeout
- English short response: still completes quickly
- Verified watchdog uses correct language (not defaulting to English)
- Pre-commit hooks pass (`pnpm check`)

🤖 Generated with Claude Code

greptile-apps · 2026-03-24T07:35:47Z

Greptile Summary

This PR fixes a double-TTS-playback bug in Talk Mode that affected non-ElevenLabs providers. When TalkSystemSpeechSynthesizer timed out mid-speech, the catch block in playAssistant unconditionally retried playSystemVoice, playing the response a second time. The fix introduces a useElevenLabs flag so the fallback path is only taken when ElevenLabs was the active provider.

Two complementary improvements ship alongside the bug fix:

- `synthTimeoutSeconds` (already computed from text length) is now forwarded to `TalkSystemSpeechSynthesizer.speak`, so the dynamically-scaled timeout actually reaches the synthesiser and prevents spurious watchdog fires on long responses.
- `reloadConfig` enforces a 2000 ms minimum silence window for CJK locales (ko, ja, zh), which avoids premature silence detection in those languages.

Key observations:

- The `useElevenLabs` flag correctly mirrors the original `if let apiKey` / `let voiceId` guard; behavior is equivalent for all existing provider configurations.
- Force-unwrapping `input.apiKey!` and `input.voiceId!` on line 457 is safe because `useElevenLabs` is only true when both are non-nil, but the pattern is non-idiomatic Swift; a `guard let` or optional-chaining form would be clearer.
- The CJK minimum silence override (`max(configuredSilenceMs, 2000)`) silently clamps an explicit user setting; this is a UX tradeoff, not a bug.
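The clamp mentioned in the last observation can be sketched as follows. The function name and the locale-prefix check are hypothetical; only the expression `max(configuredSilenceMs, 2000)` and the locale list (ko, ja, zh) come from the review summary.

```swift
// Hypothetical helper illustrating the CJK silence floor described above.
func effectiveSilenceWindowMs(configuredSilenceMs: Int, localeID: String) -> Int {
    let cjkPrefixes = ["ko", "ja", "zh"]
    let normalized = localeID.lowercased()
    let isCJK = cjkPrefixes.contains { normalized.hasPrefix($0) }
    // For CJK locales a smaller explicit setting is raised to 2000 ms;
    // larger settings and non-CJK locales pass through unchanged.
    return isCJK ? max(configuredSilenceMs, 2000) : configuredSilenceMs
}
```

This makes the tradeoff visible: a user who configures 700 ms with a `ko-KR` locale effectively gets 2000 ms, without any indication that the setting was overridden.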

Confidence Score: 5/5

Safe to merge — targeted fix for a well-understood regression with no behavioral changes to unaffected code paths.
The primary bug fix is straightforward and directly addresses the root cause described in the PR. The useElevenLabs flag faithfully replicates the original conditional guard, the synthTimeoutSeconds forwarding removes the underlying trigger for spurious catch-block retries, and the CJK silence floor is an isolated additive change. No existing code paths are altered for ElevenLabs users. The only nits are non-idiomatic force-unwraps that are statically safe.
No files require special attention.

greptile-apps · 2026-03-24T07:35:52Z

apps/macos/Sources/OpenClaw/TalkModeRuntime.swift

+        let useElevenLabs = input.apiKey != nil && !input.apiKey!.isEmpty && input.voiceId != nil
        do {
-            if let apiKey = input.apiKey, !apiKey.isEmpty, let voiceId = input.voiceId {
-                try await self.playElevenLabs(input: input, apiKey: apiKey, voiceId: voiceId)
+            if useElevenLabs {
+                try await self.playElevenLabs(input: input, apiKey: input.apiKey!, voiceId: input.voiceId!)


Non-idiomatic force-unwraps inside `useElevenLabs` guard

`input.apiKey!` on the same line as the `!= nil` check, and both `input.apiKey!` / `input.voiceId!` on line 457, are safe because `useElevenLabs` can only be `true` when both are non-nil, but the pattern is non-idiomatic Swift and slightly obscures the intent.

A `guard let` at the call site would be more conventional and removes the forced unwraps entirely:

Suggested change

-        let useElevenLabs = input.apiKey != nil && !input.apiKey!.isEmpty && input.voiceId != nil
+        let useElevenLabs = !(input.apiKey?.isEmpty ?? true) && input.voiceId != nil

And in the `if useElevenLabs` branch on lines 456–457, you could either shadow-bind there or just rely on the already-safe `!` given the guard. Either way, a small refactor would make the intent clearer for future readers.

Addressed — the current revision uses if let apiKey = input.apiKey, !apiKey.isEmpty, let voiceId = input.voiceId to bind safely without force-unwraps. No useElevenLabs boolean or ! operators remain.
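For readers unfamiliar with the pattern, the optional-binding form mentioned in the reply can be shown in isolation. This is a sketch: the `PlaybackInput` struct and the `elevenLabsCredentials(from:)` helper are hypothetical and exist only to make the binding testable on its own.

```swift
// Hypothetical input type with the two optional fields discussed in the review.
struct PlaybackInput {
    let apiKey: String?
    let voiceId: String?
}

// Returns bound, non-optional credentials only when ElevenLabs is usable.
// No force-unwraps are needed because `if let` binds both values.
func elevenLabsCredentials(from input: PlaybackInput) -> (apiKey: String, voiceId: String)? {
    if let apiKey = input.apiKey, !apiKey.isEmpty, let voiceId = input.voiceId {
        return (apiKey: apiKey, voiceId: voiceId)
    }
    return nil
}
```

A missing key, an empty key, or a missing voice ID all yield `nil`, which is the same condition the `useElevenLabs` boolean expressed, but the bound values can be passed on directly.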

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f91bbef82e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T07:36:17Z

apps/macos/Sources/OpenClaw/TalkModeRuntime.swift

        try await TalkSystemSpeechSynthesizer.shared.speak(
            text: input.cleanedText,
-            language: input.language)
+            language: input.language,
+            timeout: input.synthTimeoutSeconds)


Remove unsupported timeout arg from system TTS call

TalkSystemSpeechSynthesizer.shared.speak is called with timeout: here, but the OpenClawKit API only defines speak(text:language:onStart:) (see apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift). In macOS builds this produces an extra argument 'timeout' compile error, so the talk-mode fix cannot ship until this call matches the current method signature (or the API is updated in the same commit).

Addressed — the timeout: parameter has been removed from the playSystemVoice call. The watchdog timeout is now handled entirely inside TalkSystemSpeechSynthesizer using language-aware per-character estimates. No compile error.

fabianwilliams

This appears to be a subset of #53553 — the two bug fix commits here (031fea0, 64768f0) are identical in both PRs. Are you planning to close this one in favor of #53553, or would you prefer to land the bug fixes separately here and rebase #53553 on top?

hongsw · 2026-03-24T12:47:53Z

Good catch — yes, we'll land the bug fixes here separately first, then rebase #53553 on top so the feature commits (system sounds, right Shift interrupt, CJK silence timeout) stack cleanly.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 436812dcde

chatgpt-codex-connector · 2026-03-26T22:16:22Z

apps/macos/Sources/OpenClaw/TalkModeRuntime.swift

+        let ttsLanguage = input.language ?? appLocale
        try await TalkSystemSpeechSynthesizer.shared.speak(
            text: input.cleanedText,
-            language: input.language)
+            language: ttsLanguage)


Preserve default system voice when no directive language is set

Passing voiceWakeLocaleID as language here changes system-TTS behavior, not just watchdog timing: replies without a lang directive no longer use the macOS default voice and instead force the recognition locale. In setups where wake-word recognition locale differs from preferred speech voice, this makes Talk mode speak in the wrong language/voice for every system-provider reply. This regression comes from using locale fallback in the synthesis language: argument rather than only for timeout estimation.
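One way to read this suggestion is to keep two separate values: the language handed to the synthesizer and the language used only for timeout estimation. The sketch below is an assumption about how that separation could look; the struct and function names are hypothetical and do not come from the PR.

```swift
// Hypothetical plan type separating the two uses of "language".
struct SystemSpeechPlan: Equatable {
    // Passed to the synthesizer; nil keeps the macOS default voice.
    let synthesisLanguage: String?
    // Used only to pick a per-language watchdog timing profile.
    let timingLanguage: String
}

func makeSystemSpeechPlan(directiveLanguage: String?, appLocale: String) -> SystemSpeechPlan {
    SystemSpeechPlan(
        synthesisLanguage: directiveLanguage,
        timingLanguage: directiveLanguage ?? appLocale)
}
```

Under this split, a reply without a lang directive would still be spoken with the default voice, while the watchdog would still use the wake locale's timing.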

When using a non-ElevenLabs TTS provider, playAssistant() would unconditionally fall back to playSystemVoice() on any error, even when system voice itself threw the error. This caused the response to play twice: once during the initial attempt and again in the catch block.

Split the ElevenLabs and system-voice paths so that system voice failures are logged without retrying. ElevenLabs failures still fall back to system voice as before.

Closes openclaw#53510

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

CJK languages have different speech rates per character. Using a flat 0.08s/char estimate caused premature watchdog timeout for Korean and Chinese text.

Based on Pellegrino et al. (2011) syllable-per-second research (https://www.science.org/doi/10.1126/sciadv.aaw2594), adjusted for TTS synthesis speed:

- Korean: 0.25s/char (1 syllable block per char, 5.96 SPS)
- Chinese: 0.28s/char (1 syllable per hanzi, tonal, 5.18 SPS)
- Japanese: 0.20s/char (kana/kanji blended average, 7.84 SPS)
- English: 0.08s/char (avg ~5 chars/syllable, 6.19 SPS)

Apply 3x safety margin on top, so the watchdog only fires on genuine hangs, not during normal speech.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Update Pellegrino et al. reference from 2011 to 2019 (matching the DOI), and align per-char timing values in comments to match actual code values (Japanese 0.20, Korean 0.25, Chinese 0.28).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2fda4b70ab

chatgpt-codex-connector · 2026-03-26T22:25:32Z

apps/shared/OpenClawKit/Tests/OpenClawKitTests/TalkSystemSpeechSynthesizerTests.swift

+final class TalkSystemSpeechSynthesizerTests: XCTestCase {
+    func testWatchdogTimeoutDefaultsToLatinProfile() {
+        let timeout = TalkSystemSpeechSynthesizer.watchdogTimeoutSeconds(
+            text: String(repeating: "a", count: 100),


Make watchdog timeout tests actor-safe

TalkSystemSpeechSynthesizer is @MainActor, so watchdogTimeoutSeconds is also main-actor isolated, but this synchronous XCTest method calls it from a nonisolated context. With Swift 6 strict concurrency enabled for OpenClawKitTests, this produces an actor-isolation compile error (call to main actor-isolated static method ... in a synchronous nonisolated context), so the new test target can fail to build until the call site is @MainActor (or the helper is marked nonisolated).
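The second option named in the comment, marking the helper `nonisolated`, can be sketched in isolation. The class below is a hypothetical stand-in for `TalkSystemSpeechSynthesizer`, and the function body only covers the default (English) profile from the PR description; it is not the actual implementation.

```swift
// Hypothetical stand-in for a @MainActor synthesizer type.
@MainActor
final class SynthesizerSketch {
    // The helper is a pure function of its arguments and touches no actor
    // state, so it can opt out of main-actor isolation.
    nonisolated static func watchdogTimeoutSeconds(text: String, language: String?) -> Double {
        // Assumption: default profile only (0.08 s/char, 3 s minimum, 3x margin).
        let secondsPerCharacter = 0.08
        let minimumSeconds = 3.0
        return max(Double(text.count) * secondsPerCharacter * 3.0, minimumSeconds)
    }
}

// A synchronous, nonisolated caller, similar in shape to a plain XCTest method.
func timeoutFromNonisolatedContext() -> Double {
    SynthesizerSketch.watchdogTimeoutSeconds(
        text: String(repeating: "a", count: 100),
        language: nil)
}
```

The alternative is to leave the helper isolated and annotate the test methods (or the test class) with `@MainActor`, which keeps the production API unchanged at the cost of running those tests on the main actor.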

grp06 · 2026-03-26T22:38:04Z

Merged via squash.

Prepared head SHA: 864d556fa66d6d1021e081b3694aa68e225839b7
Merge commit: 138a92373b6b7768010e07a3ec80f1ca462660ab

Thanks @hongsw!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 864d556fa6

chatgpt-codex-connector · 2026-03-26T22:41:35Z

apps/macos/Sources/OpenClaw/TalkModeRuntime.swift

+        let appLocale = await MainActor.run { AppStateStore.shared.voiceWakeLocaleID }
+        let ttsLanguage = input.language ?? appLocale


Keep system voice fallback from forcing a locale voice

playSystemVoice now always supplies input.language ?? AppStateStore.shared.voiceWakeLocaleID to TalkSystemSpeechSynthesizer.speak, and speak sets utterance.voice whenever language is non-nil. In the common case where no lang directive is present, this changes behavior from “use macOS default voice” to “force wake-locale voice”, so users with different recognition and preferred TTS locales will hear the wrong voice/language on every system-voice reply; the locale fallback should be used for timeout estimation only, not passed as synthesis language.

chatgpt-codex-connector · 2026-03-26T22:41:35Z

apps/shared/OpenClawKit/Tests/OpenClawKitTests/TalkSystemSpeechSynthesizerTests.swift

+        let timeout = TalkSystemSpeechSynthesizer.watchdogTimeoutSeconds(
+            text: String(repeating: "a", count: 100),
+            language: nil)


Mark watchdog timeout tests as main-actor isolated

TalkSystemSpeechSynthesizer is @MainActor, so watchdogTimeoutSeconds is main-actor isolated; these synchronous XCTest methods call it directly from a nonisolated context. With StrictConcurrency enabled in apps/shared/OpenClawKit/Package.swift, this becomes a Swift 6 actor-isolation compile error and can fail the OpenClawKitTests target build until the tests (or helper API) are explicitly main-actor safe.

@grp06

…penclaw#53511) Merged via squash. Prepared head SHA: 864d556 Co-authored-by: hongsw <[email protected]> Co-authored-by: grp06 <[email protected]> Reviewed-by: @grp06

openclaw-barnacle bot added app: macos App: macos size: XS labels Mar 24, 2026

greptile-apps bot reviewed Mar 24, 2026

chatgpt-codex-connector bot reviewed Mar 24, 2026

hongsw force-pushed the fix/talk-double-playback branch 2 times, most recently from 1f41d36 to 21ee726 Compare March 24, 2026 08:02

openclaw-barnacle bot added size: S and removed size: XS labels Mar 24, 2026

hongsw force-pushed the fix/talk-double-playback branch from 21ee726 to 64768f0 Compare March 24, 2026 08:14

fabianwilliams reviewed Mar 24, 2026

hongsw mentioned this pull request Mar 24, 2026

fix(talk): Talk Mode TTS improvements for CJK languages #53553

Open

14 tasks

grp06 self-assigned this Mar 26, 2026

grp06 force-pushed the fix/talk-double-playback branch from 22a364b to 436812d Compare March 26, 2026 22:11

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: add talk regression coverage openclaw#53511 thanks @hongsw

31aa173

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: move changelog entry openclaw#53511 thanks @hongsw

436812d

chatgpt-codex-connector bot reviewed Mar 26, 2026

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: add talk regression coverage openclaw#53511 thanks @hongsw

8b56fc6

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: move changelog entry openclaw#53511 thanks @hongsw

4312643

grp06 force-pushed the fix/talk-double-playback branch from 436812d to 4312643 Compare March 26, 2026 22:18

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: add talk regression coverage openclaw#53511 thanks @hongsw

33c43f4

grp06 added a commit to hongsw/openclaw that referenced this pull request Mar 26, 2026

fix: move changelog entry openclaw#53511 thanks @hongsw

2fda4b7

grp06 force-pushed the fix/talk-double-playback branch from 4312643 to 2fda4b7 Compare March 26, 2026 22:19

hongsw and others added 5 commits March 26, 2026 15:22

fix: add talk regression coverage openclaw#53511 thanks @hongsw

bca75fd

fix: move changelog entry openclaw#53511 thanks @hongsw

864d556

chatgpt-codex-connector bot reviewed Mar 26, 2026

grp06 force-pushed the fix/talk-double-playback branch from 2fda4b7 to 864d556 Compare March 26, 2026 22:36

grp06 merged commit 138a923 into openclaw:main Mar 26, 2026
24 of 27 checks passed

chatgpt-codex-connector bot reviewed Mar 26, 2026
