fix(talk): Talk Mode TTS improvements for CJK languages by hongsw · Pull Request #53553 · openclaw/openclaw

hongsw · 2026-03-24T08:48:51Z

Depends on #53511 — the first two bug fix commits here are shared with that PR. Please merge #53511 first, then this PR can be rebased to include only the feature commits.

Summary

A collection of Talk Mode improvements focused on CJK language support and user experience:

Fix Talk Mode playing every assistant reply twice when using a non-ElevenLabs TTS provider
Fix CJK (Korean, Japanese, Chinese) system voice watchdog cutting off speech mid-sentence
Add audible phase-transition feedback (system sounds) with mute toggle in settings
Add Right Option key to interrupt speech, with toggle in settings
Increase silence detection timeout for CJK locales
Disable Push-to-Talk while Talk Mode is active to prevent key conflict
Remove non-idiomatic force-unwraps and add CJK silence clamp logging in reloadConfig
Correct Pellegrino et al. citation year (2011 → 2019) and align comment values with code

Changes

1. Prevent double TTS playback (`TalkModeRuntime.swift`)

Split the ElevenLabs and system-voice error paths. Previously playAssistant() unconditionally retried playSystemVoice() in its catch block — even when system voice itself threw the error.

2. Language-aware watchdog timeout (`TalkSystemSpeechSynthesizer.swift`)

Per-language timing estimates based on Pellegrino et al. (2019) with 3x safety margin:

Language	Per-char	Min timeout	50 chars × 3x → watchdog
Korean	0.25s	10s	37.5s
Chinese	0.28s	10s	42.0s
Japanese	0.20s	10s	30.0s
English	0.08s	3s	12.0s

Also fixes playSystemVoice passing nil language, which caused the watchdog to always use English timing.

3. Phase-transition system sounds (`TalkModeController.swift`)

Distinct audible feedback for each Talk Mode phase:

Thinking: Tink
Speaking: Pop
Listening (after interrupt): Bottle
Listening (after thinking): Submarine

Users can disable sounds via Settings → Voice Wake → "Play phase-transition sounds" checkbox.

4. Right Option key interrupt (`TalkSpeechInterruptMonitor.swift`)

New TalkSpeechInterruptMonitor — a dedicated global key monitor (independent of Push-to-Talk) that listens for right Option (keyCode 61) to stop the current response. Talk Mode returns to listening for the next conversation.

Users can disable via Settings → Voice Wake → "Press Right Option to stop speech" checkbox.

Push-to-Talk is automatically disabled while Talk Mode is active (both use the right Option key). A hint is shown in Settings when this applies.

5. CJK silence detection (`TalkModeRuntime.swift`)

Enforce minimum 2000ms silence window for CJK locales (vs default 1500ms) to avoid premature transcript submission mid-phrase. Logs when the clamp is applied.

6. Config cleanup (`TalkModeRuntime.swift`)

Replace force-unwraps (cfg.voiceId!, cfg.modelId!) with safe flatMap unwrapping
Log when CJK locale clamps silence timeout so the override is observable in diagnostics

7. Citation fix (`TalkSystemSpeechSynthesizer.swift`)

Update Pellegrino et al. reference from 2011 to 2019 (matching the DOI)
Align per-char timing values in comments to match actual code values

Settings UI

Closes #53510

Tip: Upgrade to Premium system voices for a more natural Talk Mode experience

Talk Mode uses the macOS system voice by default (when ElevenLabs is not configured). Apple provides Enhanced and Premium quality voices that sound significantly more natural than the default.

To upgrade:

Open System Settings → Accessibility → Spoken Content (or search for "Spoken Content")
Click System Voice → Manage Voices...
Download a Premium voice for your language (e.g. Siri voices marked as Premium)
Set it as your system voice

Premium voices use neural TTS and produce much more natural-sounding speech in Talk Mode conversations.

Test plan

🤖 Generated with Claude Code

greptile-apps · 2026-03-24T08:51:51Z

Greptile Summary

This PR delivers a well-scoped set of Talk Mode improvements: it eliminates double TTS playback when using system voice, fixes the CJK watchdog cutting off speech mid-sentence, adds audible phase-transition sounds, introduces a right Shift interrupt key, and widens the silence-detection window for CJK locales.

Double playback fix (TalkModeRuntime.swift): The ElevenLabs and system-voice code paths are now cleanly separated, so a system-voice failure no longer re-invokes playSystemVoice from the outer catch block.
Language-aware watchdog (TalkSystemSpeechSynthesizer.swift): Per-language per-char timing with a 3× safety multiplier correctly handles CJK character density. A minor documentation inconsistency exists: the inline citation reads "Pellegrino et al. (2011)" while the linked DOI and PR description reference the 2019 paper; the computed per-char values listed in the comments (~0.22, ~0.25, ~0.19) also don't match the values used in code (0.25, 0.28, 0.20) — not a bug, but could confuse future maintainers.
nil language fix (TalkModeRuntime.swift): playSystemVoice now passes appLocale as a fallback when input.language is nil, so the watchdog uses the correct per-language timing instead of always defaulting to English.
TalkSpeechInterruptMonitor: New singleton global key monitor is sound. Threading is safe because all mutations are dispatched to the main queue and NSEvent callbacks run on the main thread.
CJK silence window: Enforcing a 2000 ms minimum only for CJK locales is a targeted, safe change.

Confidence Score: 5/5

Safe to merge — no logic errors, security issues, or regressions identified.
All five changes are targeted fixes with clear motivations. The double-playback bug is definitively resolved by restructuring the code paths. The CJK watchdog and silence-window changes are conservative (larger timeouts, higher minimum silence). The new interrupt monitor is properly guarded by phase checks and main-thread dispatch. The only open item is a P2 documentation inconsistency in the citation year and inline per-char values, which does not affect runtime behavior.
No files require special attention.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift
Line: 57-63

Comment:
**Citation year and comment values don't match the code**

Two inconsistencies in this comment block that could mislead future maintainers:

1. The citation says **"Pellegrino et al. (2011)"** but the linked DOI (`sciadv.aaw2594`) points to the 2019 paper. The PR description also says 2019. The year in the comment should be corrected to `(2019)`.

2. The per-char values documented in the comments don't match what's actually used in the code:

| Language | Comment says | Code uses |
|----------|-------------|-----------|
| Korean   | ~0.22s/char | 0.25s     |
| Chinese  | ~0.25s/char | 0.28s     |
| Japanese | ~0.19s/char | 0.20s     |
| English  | ~0.08s/char | 0.08s ✓  |

The code values appear to include an additional safety buffer on top of the 1.3× TTS adjustment, but this isn't explained. Consider updating the comment values to match, or noting the extra adjustment factor so the math is traceable.

```suggestion
        // Speech rates based on Pellegrino et al. (2019) syllable-per-second data,
        // adjusted ~1.3x slower for TTS synthesis vs natural speech,
        // then rounded up ~10–15% as an additional empirical safety buffer:
        // https://www.science.org/doi/10.1126/sciadv.aaw2594
        //   Japanese: 7.84 SPS → ~0.19s/char → 0.20s used
        //   Korean:   5.96 SPS → ~0.22s/char → 0.25s used
        //   Chinese:  5.18 SPS → ~0.25s/char → 0.28s used
        //   English:  6.19 SPS → ~0.08s/char → 0.08s used
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "feat(talk): increase silence detection t..." | Re-trigger Greptile}

greptile-apps · 2026-03-24T08:51:55Z

apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift

+        // Speech rates based on Pellegrino et al. (2011) syllable-per-second data,
+        // adjusted ~1.3x slower for TTS synthesis vs natural speech:
+        // https://www.science.org/doi/10.1126/sciadv.aaw2594
+        //   Japanese: 7.84 SPS → ~0.19s/char (mixed kana/kanji avg ~1.5 mora/char)
+        //   Korean:   5.96 SPS → ~0.22s/char (1 char = 1 syllable)
+        //   Chinese:  5.18 SPS → ~0.25s/char (1 char = 1 syllable)
+        //   English:  6.19 SPS → ~0.08s/char (avg ~5 chars/syllable)


Citation year and comment values don't match the code

Two inconsistencies in this comment block that could mislead future maintainers:

The citation says "Pellegrino et al. (2011)" but the linked DOI (sciadv.aaw2594) points to the 2019 paper. The PR description also says 2019. The year in the comment should be corrected to (2019).

The per-char values documented in the comments don't match what's actually used in the code:

Language Comment says Code uses

Korean ~0.22s/char 0.25s

Chinese ~0.25s/char 0.28s

Japanese ~0.19s/char 0.20s

English ~0.08s/char 0.08s ✓

The code values appear to include an additional safety buffer on top of the 1.3× TTS adjustment, but this isn't explained. Consider updating the comment values to match, or noting the extra adjustment factor so the math is traceable.

Suggested change

// Speech rates based on Pellegrino et al. (2011) syllable-per-second data,

// adjusted ~1.3x slower for TTS synthesis vs natural speech:

// https://www.science.org/doi/10.1126/sciadv.aaw2594

// Japanese: 7.84 SPS → ~0.19s/char (mixed kana/kanji avg ~1.5 mora/char)

// Korean: 5.96 SPS → ~0.22s/char (1 char = 1 syllable)

// Chinese: 5.18 SPS → ~0.25s/char (1 char = 1 syllable)

// English: 6.19 SPS → ~0.08s/char (avg ~5 chars/syllable)

// Speech rates based on Pellegrino et al. (2019) syllable-per-second data,

// adjusted ~1.3x slower for TTS synthesis vs natural speech,

// then rounded up ~10–15% as an additional empirical safety buffer:

// https://www.science.org/doi/10.1126/sciadv.aaw2594

// Japanese: 7.84 SPS → ~0.19s/char → 0.20s used

// Korean: 5.96 SPS → ~0.22s/char → 0.25s used

// Chinese: 5.18 SPS → ~0.25s/char → 0.28s used

// English: 6.19 SPS → ~0.08s/char → 0.08s used

Prompt To Fix With AI

This is a comment left during a code review. Path: apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift Line: 57-63 Comment: **Citation year and comment values don't match the code** Two inconsistencies in this comment block that could mislead future maintainers: 1. The citation says **"Pellegrino et al. (2011)"** but the linked DOI (`sciadv.aaw2594`) points to the 2019 paper. The PR description also says 2019. The year in the comment should be corrected to `(2019)`. 2. The per-char values documented in the comments don't match what's actually used in the code: | Language | Comment says | Code uses | |----------|-------------|-----------| | Korean | ~0.22s/char | 0.25s | | Chinese | ~0.25s/char | 0.28s | | Japanese | ~0.19s/char | 0.20s | | English | ~0.08s/char | 0.08s ✓ | The code values appear to include an additional safety buffer on top of the 1.3× TTS adjustment, but this isn't explained. Consider updating the comment values to match, or noting the extra adjustment factor so the math is traceable. ```suggestion // Speech rates based on Pellegrino et al. (2019) syllable-per-second data, // adjusted ~1.3x slower for TTS synthesis vs natural speech, // then rounded up ~10–15% as an additional empirical safety buffer: // https://www.science.org/doi/10.1126/sciadv.aaw2594 // Japanese: 7.84 SPS → ~0.19s/char → 0.20s used // Korean: 5.96 SPS → ~0.22s/char → 0.25s used // Chinese: 5.18 SPS → ~0.25s/char → 0.28s used // English: 6.19 SPS → ~0.08s/char → 0.08s used ``` How can I resolve this? If you propose a fix, please make it concise.

Addressed in 26eee68c10 — year updated to 2019, comment values aligned with code.

fabianwilliams

Nice PR — well-structured commits and solid research backing the CJK timing values.

A few things:

1. Citation year + comment values mismatch (TalkSystemSpeechSynthesizer.swift)
The inline comment references "Pellegrino et al. (2011)" but the linked DOI (sciadv.aaw2594) is the 2019 paper — the PR description also says 2019. Additionally, the per-char values in the comments (~0.22, ~0.25, ~0.19) don't match what the code actually uses (0.25, 0.28, 0.20). Could you update the year to 2019 and either align the comment values to the code or note the extra safety buffer so the math is traceable?

2. Phase-transition sounds — any way to mute?
The system sounds (Tink, Pop, Bottle, Submarine) are a nice UX touch, but if someone is using Talk Mode in a quiet environment or a meeting, surprise audio could be unwelcome. Worth considering a preference toggle or respecting the system "Play sound effects" setting — not a blocker, just something to think about.

3. macOS CI failure
The macOS check failed, but it's a runner timeout/cancellation after all 212 test files passed (1846 tests). Not related to this PR's changes.

Overall this looks good — clean fixes for real problems. Thanks!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26eee68c10

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T13:11:06Z

apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift

+        let estimatedSeconds = max(minSeconds, min(300.0, Double(trimmed.count) * perCharSeconds))
+        let watchdogTimeout = estimatedSeconds * 3.0


Cap watchdog timeout after applying safety multiplier

The new timeout calculation multiplies an already-capped estimate (min(300.0, ...)) by 3.0, which raises the hard watchdog ceiling to 900 seconds. In the failure path this watchdog is the only recovery mechanism, so a stuck AVSpeechSynthesizer can now keep Talk Mode in the speaking state for up to 15 minutes before control returns, a large regression from the previous bounded timeout behavior.

Useful? React with 👍 / 👎.

The watchdog timeout is already capped via min(300.0, ...) on line 80. The 3x multiplier on a 300s base gives 900s max, which is intentional for very long TTS — the watchdog is a hang guard, not a performance limit.

hongsw · 2026-03-24T13:46:07Z

hongsw · 2026-03-24T13:52:06Z

@fabianwilliams Thanks for the thorough review! All three items addressed:

1. Citation year + comment values — Fixed in 26eee68c10. Year updated to 2019, comment values now match code (Japanese 0.20, Korean 0.25, Chinese 0.28).

2. Phase-transition sounds mute — Added in 026e6b12d1. New "Play phase-transition sounds" checkbox in Settings → Voice Wake. Defaults to ON.

3. macOS CI failure — Confirmed unrelated, no action needed.

Additionally, changed the interrupt key from right Shift to right Option (f2f94b71) and added a toggle for that too ("Press Right Option to stop speech").

Screenshot of the new settings: #53553 (comment)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2f94b715b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T13:56:43Z

apps/macos/Sources/OpenClaw/TalkSpeechInterruptMonitor.swift

+
+    private func handleFlags(keyCode: UInt16, modifierFlags: NSEvent.ModifierFlags) {
+        // Right Option key down (keyCode 61).
+        guard keyCode == 61, modifierFlags.contains(.option) else { return }


Avoid reusing Right Option for interrupt hotkey

Binding interrupt to keyCode == 61 here collides with the existing push-to-talk hotkey (VoicePushToTalkHotkey also uses Right Option), so users who enable both features will trigger push-to-talk capture when they only meant to stop TTS. In that configuration, a simple tap to interrupt can fire extra push-to-talk side effects (chimes/overlay/mic session churn) while Talk Mode is transitioning, which is a user-visible regression.

Useful? React with 👍 / 👎.

Good catch — we addressed this in 5acda769f4. Talk Mode now disables Push-to-Talk while it is active (TalkModeController.setEnabled calls VoicePushToTalkHotkey.shared.setEnabled(false)), so the two handlers never run simultaneously. PTT is restored when Talk Mode is turned off. We also added a UI hint in the settings panel explaining this behavior.

hongsw · 2026-03-25T06:16:47Z

@fabianwilliams ;) All integration tests passed on device — Talk Mode end-to-end verified:

Double TTS playback fix confirmed (system voice plays once)
CJK watchdog timeout works correctly (Korean 50+ char responses play to completion)
Phase-transition sounds play/mute toggle works
Right Option interrupt stops speech and returns to listening
Push-to-Talk correctly pauses while Talk Mode is active, resumes when Talk Mode is off
Settings UI shows all new toggles and PTT paused hint
App deployed to /Applications/OpenClaw.app and restarts cleanly from icon

This PR is ready for final review and merge (after #53511 lands first).

fabianwilliams

All three review items confirmed addressed:

Citation year — Pellegrino et al. 2019, comment values match code (Japanese 0.20, Korean 0.25, Chinese 0.28). ✅
Phase sounds mute toggle — talkPhaseSoundsEnabled persisted via UserDefaults, guarded in playPhaseSound. Clean. ✅
CJK silence timeout — isCJKLocale check with 2000ms floor clamp + info log when clamped. Exactly right. ✅

The error handling restructure in playAssistant is a nice improvement too — ElevenLabs failure now properly falls through to system voice without double-triggering.

Integration tests passing on device per the author's report. LGTM — approve.

fabianwilliams · 2026-03-26T23:44:00Z

CI note: the 2 failing checks (telegram extension, macOS) are unrelated to this PR's Talk Mode changes. The telegram test failure is in the extension test runner, not TTS code. The macOS check was cancelled/timed out. All relevant tests (node, Windows, channels, build) pass. Ready to merge.

Play a short system sound on phase transitions to give the user audible feedback: - thinking: Tink - speaking: Pop - listening (after speech interrupted): Bottle - listening (after thinking): Submarine Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add TalkSpeechInterruptMonitor — a dedicated global key monitor that listens for right Shift (keyCode 60) to interrupt Talk Mode speech. Independent of Push-to-Talk, so it works even when PTT is disabled. Stops only the current response; the next conversation cycle continues normally via sendAndSpeak's resumeListeningIfNeeded flow. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Korean, Japanese, and Chinese speakers need longer pauses between phrases. When the app locale is CJK, enforce a minimum 2000ms silence window (vs the default 1500ms) to avoid premature transcript submission. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…nfig Replace non-idiomatic force-unwraps (cfg.voiceId!, cfg.modelId!) with safe flatMap unwrapping, and add an info log when CJK locale clamps the silence timeout so the override is observable in diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add a "Play phase-transition sounds" checkbox to Voice Wake settings. When disabled, Talk Mode phase transitions (Tink/Pop/Bottle/Submarine) are silent. Defaults to enabled to preserve existing behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add a "Press Right Option to stop speech" checkbox to Voice Wake settings. Also change the interrupt key from right Shift to right Option (keyCode 61) to avoid conflicts with typing. Defaults to enabled to preserve existing behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Talk Mode and Push-to-Talk both use the right Option key (keyCode 61). Disable PTT when Talk Mode is enabled to prevent conflicting handlers, and restore PTT when Talk Mode is disabled. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Display a footnote under the Push-to-Talk toggle when both PTT and Talk Mode are enabled, explaining that PTT is paused while Talk Mode is active and resumes when Talk Mode is turned off. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5c2ff1715

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-27T02:29:03Z

apps/macos/Sources/OpenClaw/TalkModeController.swift

+        let pttEnabled = !enabled && AppStateStore.shared.voicePushToTalkEnabled
+        VoicePushToTalkHotkey.shared.setEnabled(pttEnabled)


Keep Push-to-Talk disabled while Talk Mode is active

This disables Push-to-Talk only when setEnabled(_:) is called for Talk Mode transitions, but it does not enforce that invariant afterward; if Talk Mode is already on and the user enables Push-to-Talk from settings, MenuContentView.onChange(of: voicePushToTalkEnabled) re-enables the hotkey without checking talkEnabled, so Right Option again drives both handlers and reintroduces the interrupt/PTT conflict this change is trying to prevent. Please gate hotkey enablement on both toggles everywhere (or centralize the decision) so Talk Mode always wins while active.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6ed92d808

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-27T02:46:33Z

apps/macos/Sources/OpenClaw/TalkModeController.swift

+        let pttEnabled = !enabled && AppStateStore.shared.voicePushToTalkEnabled
+        VoicePushToTalkHotkey.shared.setEnabled(pttEnabled)


End active push-to-talk session before disabling hotkey

This newly disables Push-to-Talk whenever Talk Mode is enabled, but if the user turns on Talk Mode while a PTT hold is already active (Right Option still down), VoicePushToTalkHotkey.stopMonitoring() clears its state and removes monitors without running the endAction, so VoicePushToTalk.shared.end() is never called and the capture session can remain open (mic/overlay stuck). Please ensure an active PTT session is finalized before disabling the hotkey monitor.

Useful? React with 👍 / 👎.

openclaw-barnacle bot added app: macos App: macos size: S labels Mar 24, 2026

greptile-apps bot reviewed Mar 24, 2026

View reviewed changes

fabianwilliams self-assigned this Mar 24, 2026

fabianwilliams reviewed Mar 24, 2026

View reviewed changes

fabianwilliams mentioned this pull request Mar 24, 2026

fix(talk): prevent double TTS playback when system voice times out #53511

Merged

5 tasks

hongsw changed the title ~~fix(talk): Talk Mode TTS improvements for CJK languages~~ [WIP] fix(talk): Talk Mode TTS improvements for CJK languages Mar 24, 2026

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

openclaw-barnacle bot added size: M and removed size: S labels Mar 24, 2026

hongsw changed the title ~~[WIP] fix(talk): Talk Mode TTS improvements for CJK languages~~ fix(talk): Talk Mode TTS improvements for CJK languages Mar 24, 2026

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

fabianwilliams approved these changes Mar 26, 2026

View reviewed changes

BradGroux mentioned this pull request Mar 26, 2026

WORKING: All Microsoft Issues and PRs #49126

Draft

hongsw and others added 8 commits March 27, 2026 11:22

hongsw force-pushed the fork/talk-mode-full branch from 740ed3d to a5c2ff1 Compare March 27, 2026 02:23

openclaw-barnacle bot added size: S and removed size: M labels Mar 27, 2026

chatgpt-codex-connector bot reviewed Mar 27, 2026

View reviewed changes

openclaw-barnacle bot added docs Improvements or additions to documentation and removed docs Improvements or additions to documentation labels Mar 27, 2026

hongsw force-pushed the fork/talk-mode-full branch from b919b03 to a5c2ff1 Compare March 27, 2026 02:45

chatgpt-codex-connector bot reviewed Mar 27, 2026

View reviewed changes

odysseus0 mentioned this pull request Mar 29, 2026

[Tracking] CJK & internationalization quality — memory, tokenization, TTS, docs #57264

Closed

Language	Comment says	Code uses
Korean	~0.22s/char	0.25s
Chinese	~0.25s/char	0.28s
Japanese	~0.19s/char	0.20s
English	~0.08s/char	0.08s ✓

		let estimatedSeconds = max(minSeconds, min(300.0, Double(trimmed.count) * perCharSeconds))
		let watchdogTimeout = estimatedSeconds * 3.0

		let pttEnabled = !enabled && AppStateStore.shared.voicePushToTalkEnabled
		VoicePushToTalkHotkey.shared.setEnabled(pttEnabled)

Uh oh!

Conversation

hongsw commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

1. Prevent double TTS playback (TalkModeRuntime.swift)

2. Language-aware watchdog timeout (TalkSystemSpeechSynthesizer.swift)

3. Phase-transition system sounds (TalkModeController.swift)

4. Right Option key interrupt (TalkSpeechInterruptMonitor.swift)

5. CJK silence detection (TalkModeRuntime.swift)

6. Config cleanup (TalkModeRuntime.swift)

7. Citation fix (TalkSystemSpeechSynthesizer.swift)

Settings UI

Tip: Upgrade to Premium system voices for a more natural Talk Mode experience

Test plan

Uh oh!

greptile-apps bot commented Mar 24, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

greptile-apps bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

hongsw Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

fabianwilliams left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

hongsw Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

hongsw commented Mar 24, 2026

Uh oh!

hongsw commented Mar 24, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

hongsw Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

hongsw commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fabianwilliams left a comment

Choose a reason for hiding this comment

Uh oh!

fabianwilliams commented Mar 26, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

hongsw commented Mar 24, 2026 •

edited

Loading

1. Prevent double TTS playback (`TalkModeRuntime.swift`)

2. Language-aware watchdog timeout (`TalkSystemSpeechSynthesizer.swift`)

3. Phase-transition system sounds (`TalkModeController.swift`)

4. Right Option key interrupt (`TalkSpeechInterruptMonitor.swift`)

5. CJK silence detection (`TalkModeRuntime.swift`)

6. Config cleanup (`TalkModeRuntime.swift`)

7. Citation fix (`TalkSystemSpeechSynthesizer.swift`)

hongsw commented Mar 25, 2026 •

edited

Loading