fix(talk): Talk Mode TTS improvements for CJK languages #53553

Open
hongsw wants to merge 8 commits into openclaw:main from hongsw:fork/talk-mode-full
@hongsw (Contributor) commented Mar 24, 2026

Depends on #53511 — the first two bug fix commits here are shared with that PR. Please merge #53511 first, then this PR can be rebased to include only the feature commits.

Summary

A collection of Talk Mode improvements focused on CJK language support and user experience:

  • Fix Talk Mode playing every assistant reply twice when using a non-ElevenLabs TTS provider
  • Fix CJK (Korean, Japanese, Chinese) system voice watchdog cutting off speech mid-sentence
  • Add audible phase-transition feedback (system sounds) with mute toggle in settings
  • Add Right Option key to interrupt speech, with toggle in settings
  • Increase silence detection timeout for CJK locales
  • Disable Push-to-Talk while Talk Mode is active to prevent key conflict
  • Remove non-idiomatic force-unwraps and add CJK silence clamp logging in reloadConfig
  • Correct Pellegrino et al. citation year (2011 → 2019) and align comment values with code

Changes

1. Prevent double TTS playback (TalkModeRuntime.swift)

Split the ElevenLabs and system-voice error paths. Previously playAssistant() unconditionally retried playSystemVoice() in its catch block — even when system voice itself threw the error.
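
A minimal sketch of the split described above, assuming hypothetical provider stubs (`PlaybackSketch`, `TTSProvider`, and the failure flags are illustrative names, not the PR's actual types). It shows only the control flow: an ElevenLabs failure falls back to system voice once, while a system-voice failure is never retried.

```swift
import Foundation

// Hypothetical stand-ins for the real provider calls; names are illustrative only.
enum TTSProvider { case elevenLabs, system }

struct PlaybackError: Error {}

final class PlaybackSketch {
    private(set) var systemVoiceCalls = 0
    var elevenLabsFails = false
    var systemVoiceFails = false

    func playElevenLabs(_ text: String) throws {
        if elevenLabsFails { throw PlaybackError() }
    }

    func playSystemVoice(_ text: String) throws {
        systemVoiceCalls += 1
        if systemVoiceFails { throw PlaybackError() }
    }

    /// Only an ElevenLabs failure falls back to system voice.
    /// A system-voice failure is swallowed here and never retried.
    func playAssistant(_ text: String, provider: TTSProvider) {
        switch provider {
        case .elevenLabs:
            do {
                try playElevenLabs(text)
            } catch {
                _ = try? playSystemVoice(text) // single fallback attempt
            }
        case .system:
            do {
                try playSystemVoice(text)
            } catch {
                // Give up; an unconditional retry here is what replayed the reply.
            }
        }
    }
}
```

With this shape, a throwing system voice is invoked exactly once per reply, regardless of which provider was selected first.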

2. Language-aware watchdog timeout (TalkSystemSpeechSynthesizer.swift)

Per-language timing estimates based on Pellegrino et al. (2019) with 3x safety margin:

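| Language | Per-char | Min timeout | 50 chars × 3x → watchdog |
|----------|----------|-------------|--------------------------|
| Korean   | 0.25s    | 10s         | 37.5s                    |
| Chinese  | 0.28s    | 10s         | 42.0s                    |
| Japanese | 0.20s    | 10s         | 30.0s                    |
| English  | 0.08s    | 3s          | 12.0s                    |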

Also fixes playSystemVoice passing nil language, which caused the watchdog to always use English timing.
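
The table arithmetic can be sketched as follows. The per-char and minimum values come from the table, and the formula mirrors the two lines quoted later in review (`max(minSeconds, min(300.0, ...))` then `× 3.0`). The helper names, the language-tag parsing, and the fallback to English timing for unlisted languages are assumptions for illustration.

```swift
import Foundation

/// Per-language timing; values are the ones listed in the table above.
struct SpeechTiming {
    let perCharSeconds: Double
    let minSeconds: Double
}

/// Hypothetical helper: extracts "ko" from tags like "ko-KR" or "ko_KR".
/// A nil tag falls back to English, which is the behavior the nil-language bug exposed.
func primaryLanguage(of tag: String?) -> String {
    guard let tag = tag,
          let first = tag.split(whereSeparator: { $0 == "-" || $0 == "_" }).first
    else { return "en" }
    return first.lowercased()
}

func speechTiming(forLanguage tag: String?) -> SpeechTiming {
    switch primaryLanguage(of: tag) {
    case "ko": return SpeechTiming(perCharSeconds: 0.25, minSeconds: 10)
    case "zh": return SpeechTiming(perCharSeconds: 0.28, minSeconds: 10)
    case "ja": return SpeechTiming(perCharSeconds: 0.20, minSeconds: 10)
    default:   return SpeechTiming(perCharSeconds: 0.08, minSeconds: 3) // assumed default
    }
}

/// Estimated speaking time (floored at the minimum, capped at 300s), times a 3x margin.
func watchdogTimeout(for text: String, language: String?) -> Double {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    let timing = speechTiming(forLanguage: language)
    let estimated = max(timing.minSeconds,
                        min(300.0, Double(trimmed.count) * timing.perCharSeconds))
    return estimated * 3.0
}
```

Passing `nil` for a 50-character Korean string yields the English row of the table (about 12s) instead of the Korean row (37.5s), which is why forwarding the locale matters.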

3. Phase-transition system sounds (TalkModeController.swift)

Distinct audible feedback for each Talk Mode phase:

  • Thinking: Tink
  • Speaking: Pop
  • Listening (after interrupt): Bottle
  • Listening (after thinking): Submarine

Users can disable sounds via Settings → Voice Wake → "Play phase-transition sounds" checkbox.
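
As a rough illustration of the mapping and the mute toggle, the sketch below uses a hypothetical `TalkPhase` enum and a pure lookup function. Treating a speaking → listening transition as "after interrupt" is an assumption; the real controller may track interruption explicitly.

```swift
import Foundation

// Hypothetical phase model; the real controller's phase type may differ.
enum TalkPhase { case listening, thinking, speaking }

/// Returns the system sound name for a transition, or nil when nothing should play.
func phaseSoundName(from old: TalkPhase, to new: TalkPhase, soundsEnabled: Bool) -> String? {
    // The settings toggle mutes every transition; same-phase updates are silent too.
    guard soundsEnabled, old != new else { return nil }
    switch (old, new) {
    case (_, .thinking):          return "Tink"
    case (_, .speaking):          return "Pop"
    case (.speaking, .listening): return "Bottle"    // assumed: speech was interrupted
    case (.thinking, .listening): return "Submarine"
    default:                      return nil
    }
}
```

On macOS the returned name could be handed to `NSSound(named:)` for playback; keeping the lookup free of AppKit makes the toggle behavior easy to unit-test.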

4. Right Option key interrupt (TalkSpeechInterruptMonitor.swift)

New TalkSpeechInterruptMonitor — a dedicated global key monitor (independent of Push-to-Talk) that listens for right Option (keyCode 61) to stop the current response. Talk Mode returns to listening for the next conversation.

Users can disable via Settings → Voice Wake → "Press Right Option to stop speech" checkbox.

Push-to-Talk is automatically disabled while Talk Mode is active (both use the right Option key). A hint is shown in Settings when this applies.
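
The interrupt decision can be sketched as a pure predicate. The keyCode value (61) is from the description above; the function name and parameters are hypothetical, and the real monitor receives an `NSEvent` from a global `.flagsChanged` monitor rather than plain values.

```swift
import Foundation

let rightOptionKeyCode: UInt16 = 61

/// Decides whether a modifier-key event should stop the current response.
/// - Parameters:
///   - optionIsDown: whether the Option modifier flag is set (key down, not key up).
///   - isSpeaking: whether Talk Mode is currently in its speaking phase.
///   - interruptEnabled: the "Press Right Option to stop speech" setting.
func shouldInterruptSpeech(keyCode: UInt16,
                           optionIsDown: Bool,
                           isSpeaking: Bool,
                           interruptEnabled: Bool) -> Bool {
    guard interruptEnabled, isSpeaking else { return false }
    return keyCode == rightOptionKeyCode && optionIsDown
}
```

Checking the modifier flag as well as the keyCode distinguishes the key-down half of a `flagsChanged` pair from the key-up half, so releasing the key does not trigger a second interrupt.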

5. CJK silence detection (TalkModeRuntime.swift)

Enforce minimum 2000ms silence window for CJK locales (vs default 1500ms) to avoid premature transcript submission mid-phrase. Logs when the clamp is applied.
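
A minimal sketch of the clamp, assuming a hypothetical `isCJKLocale` that matches on the primary language subtag and a `print` in place of the app's logger. The 2000ms floor is the value stated above.

```swift
import Foundation

/// Hypothetical check: true for Korean, Japanese, and Chinese locale identifiers.
/// Matching on the first subtag avoids false positives such as "kok_IN".
func isCJKLocale(_ identifier: String) -> Bool {
    let primary = identifier
        .split(whereSeparator: { $0 == "-" || $0 == "_" })
        .first
        .map { $0.lowercased() } ?? ""
    return ["ko", "ja", "zh"].contains(primary)
}

/// Raises the configured silence window to the CJK floor when needed.
func effectiveSilenceWindowMs(configured: Int, localeIdentifier: String) -> Int {
    let cjkFloorMs = 2000
    guard isCJKLocale(localeIdentifier), configured < cjkFloorMs else {
        return configured
    }
    // Stand-in for the diagnostic log emitted when the clamp is applied.
    print("talk: silence window clamped \(configured)ms -> \(cjkFloorMs)ms for \(localeIdentifier)")
    return cjkFloorMs
}
```

The clamp is a floor only: a user who configures a window longer than 2000ms keeps that value, and non-CJK locales are never touched.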

6. Config cleanup (TalkModeRuntime.swift)

  • Replace force-unwraps (cfg.voiceId!, cfg.modelId!) with safe flatMap unwrapping
  • Log when CJK locale clamps silence timeout so the override is observable in diagnostics
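
For readers unfamiliar with the pattern, here is one way `flatMap` can replace a force-unwrap on an optional string. The `normalized` helper and its trim-and-drop-empty behavior are assumptions; the PR only states that `cfg.voiceId!` and `cfg.modelId!` were replaced with safe `flatMap` unwrapping.

```swift
import Foundation

/// Hypothetical helper: returns a trimmed, non-empty string or nil.
/// Unlike `value!`, this never traps when the value is missing.
func normalized(_ value: String?) -> String? {
    value.flatMap { raw -> String? in
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        return trimmed.isEmpty ? nil : trimmed
    }
}
```

A call site could then read `let voiceId = normalized(cfg.voiceId) ?? fallbackVoiceId`, where `fallbackVoiceId` is whatever default the caller already has.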

7. Citation fix (TalkSystemSpeechSynthesizer.swift)

  • Update Pellegrino et al. reference from 2011 to 2019 (matching the DOI)
  • Align per-char timing values in comments to match actual code values

Settings UI

Voice Wake settings with new toggles

Closes #53510


Tip: Upgrade to Premium system voices for a more natural Talk Mode experience

Talk Mode uses the macOS system voice by default (when ElevenLabs is not configured). Apple provides Enhanced and Premium quality voices that sound significantly more natural than the default.

To upgrade:

  1. Open System Settings → Accessibility → Spoken Content (or search for "Spoken Content")
  2. Click System Voice → Manage Voices...
  3. Download a Premium voice for your language (e.g. Siri voices marked as Premium)
  4. Set it as your system voice

Premium voices use neural TTS and produce much more natural-sounding speech in Talk Mode conversations.


Test plan

  • System voice response plays once (not twice)
  • Korean 50+ char response plays to completion
  • Phase sounds play on transitions
  • Phase sounds muted when toggle is off
  • Right Option stops speech, next conversation works normally
  • Right Option works when other apps are focused
  • Right Option interrupt disabled when toggle is off
  • Push-to-Talk paused while Talk Mode is active
  • Push-to-Talk resumes when Talk Mode is turned off
  • CJK silence detection waits long enough before sending
  • English behavior unchanged
  • Pre-commit hooks pass (pnpm check)
  • pnpm test — 9450 passed, 2 failed (pre-existing vitest-scoped-config.test.ts, unrelated)
  • Integration test on device with Talk Mode end-to-end

🤖 Generated with Claude Code

@greptile-apps bot (Contributor) commented Mar 24, 2026

Greptile Summary

This PR delivers a well-scoped set of Talk Mode improvements: it eliminates double TTS playback when using system voice, fixes the CJK watchdog cutting off speech mid-sentence, adds audible phase-transition sounds, introduces a right Shift interrupt key, and widens the silence-detection window for CJK locales.

  • Double playback fix (TalkModeRuntime.swift): The ElevenLabs and system-voice code paths are now cleanly separated, so a system-voice failure no longer re-invokes playSystemVoice from the outer catch block.
  • Language-aware watchdog (TalkSystemSpeechSynthesizer.swift): Per-language per-char timing with a 3× safety multiplier correctly handles CJK character density. A minor documentation inconsistency exists: the inline citation reads "Pellegrino et al. (2011)" while the linked DOI and PR description reference the 2019 paper; the computed per-char values listed in the comments (~0.22, ~0.25, ~0.19) also don't match the values used in code (0.25, 0.28, 0.20) — not a bug, but could confuse future maintainers.
  • nil language fix (TalkModeRuntime.swift): playSystemVoice now passes appLocale as a fallback when input.language is nil, so the watchdog uses the correct per-language timing instead of always defaulting to English.
  • TalkSpeechInterruptMonitor: New singleton global key monitor is sound. Threading is safe because all mutations are dispatched to the main queue and NSEvent callbacks run on the main thread.
  • CJK silence window: Enforcing a 2000 ms minimum only for CJK locales is a targeted, safe change.

Confidence Score: 5/5

  • Safe to merge — no logic errors, security issues, or regressions identified.
  • All five changes are targeted fixes with clear motivations. The double-playback bug is definitively resolved by restructuring the code paths. The CJK watchdog and silence-window changes are conservative (larger timeouts, higher minimum silence). The new interrupt monitor is properly guarded by phase checks and main-thread dispatch. The only open item is a P2 documentation inconsistency in the citation year and inline per-char values, which does not affect runtime behavior.
  • No files require special attention.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift
Line: 57-63

Comment:
**Citation year and comment values don't match the code**

Two inconsistencies in this comment block that could mislead future maintainers:

1. The citation says **"Pellegrino et al. (2011)"** but the linked DOI (`sciadv.aaw2594`) points to the 2019 paper. The PR description also says 2019. The year in the comment should be corrected to `(2019)`.

2. The per-char values documented in the comments don't match what's actually used in the code:

| Language | Comment says | Code uses |
|----------|-------------|-----------|
| Korean   | ~0.22s/char | 0.25s     |
| Chinese  | ~0.25s/char | 0.28s     |
| Japanese | ~0.19s/char | 0.20s     |
| English  | ~0.08s/char | 0.08s ✓  |

The code values appear to include an additional safety buffer on top of the 1.3× TTS adjustment, but this isn't explained. Consider updating the comment values to match, or noting the extra adjustment factor so the math is traceable.

```suggestion
        // Speech rates based on Pellegrino et al. (2019) syllable-per-second data,
        // adjusted ~1.3x slower for TTS synthesis vs natural speech,
        // then rounded up ~10–15% as an additional empirical safety buffer:
        // https://www.science.org/doi/10.1126/sciadv.aaw2594
        //   Japanese: 7.84 SPS → ~0.19s/char → 0.20s used
        //   Korean:   5.96 SPS → ~0.22s/char → 0.25s used
        //   Chinese:  5.18 SPS → ~0.25s/char → 0.28s used
        //   English:  6.19 SPS → ~0.08s/char → 0.08s used
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(talk): increase silence detection t..."

Comment on lines +57 to +63
// Speech rates based on Pellegrino et al. (2011) syllable-per-second data,
// adjusted ~1.3x slower for TTS synthesis vs natural speech:
// https://www.science.org/doi/10.1126/sciadv.aaw2594
// Japanese: 7.84 SPS → ~0.19s/char (mixed kana/kanji avg ~1.5 mora/char)
// Korean: 5.96 SPS → ~0.22s/char (1 char = 1 syllable)
// Chinese: 5.18 SPS → ~0.25s/char (1 char = 1 syllable)
// English: 6.19 SPS → ~0.08s/char (avg ~5 chars/syllable)

P2 Citation year and comment values don't match the code

Two inconsistencies in this comment block that could mislead future maintainers:

  1. The citation says "Pellegrino et al. (2011)" but the linked DOI (sciadv.aaw2594) points to the 2019 paper. The PR description also says 2019. The year in the comment should be corrected to (2019).

  2. The per-char values documented in the comments don't match what's actually used in the code:

| Language | Comment says | Code uses |
|----------|-------------|-----------|
| Korean   | ~0.22s/char | 0.25s     |
| Chinese  | ~0.25s/char | 0.28s     |
| Japanese | ~0.19s/char | 0.20s     |
| English  | ~0.08s/char | 0.08s ✓  |

The code values appear to include an additional safety buffer on top of the 1.3× TTS adjustment, but this isn't explained. Consider updating the comment values to match, or noting the extra adjustment factor so the math is traceable.

Suggested change
- // Speech rates based on Pellegrino et al. (2011) syllable-per-second data,
- // adjusted ~1.3x slower for TTS synthesis vs natural speech:
- // https://www.science.org/doi/10.1126/sciadv.aaw2594
- // Japanese: 7.84 SPS → ~0.19s/char (mixed kana/kanji avg ~1.5 mora/char)
- // Korean: 5.96 SPS → ~0.22s/char (1 char = 1 syllable)
- // Chinese: 5.18 SPS → ~0.25s/char (1 char = 1 syllable)
- // English: 6.19 SPS → ~0.08s/char (avg ~5 chars/syllable)
+ // Speech rates based on Pellegrino et al. (2019) syllable-per-second data,
+ // adjusted ~1.3x slower for TTS synthesis vs natural speech,
+ // then rounded up ~10–15% as an additional empirical safety buffer:
+ // https://www.science.org/doi/10.1126/sciadv.aaw2594
+ // Japanese: 7.84 SPS → ~0.19s/char → 0.20s used
+ // Korean: 5.96 SPS → ~0.22s/char → 0.25s used
+ // Chinese: 5.18 SPS → ~0.25s/char → 0.28s used
+ // English: 6.19 SPS → ~0.08s/char → 0.08s used
@hongsw (Author) replied:

Addressed in 26eee68c10 — year updated to 2019, comment values aligned with code.

@fabianwilliams self-assigned this Mar 24, 2026

@fabianwilliams (Contributor) left a comment:

Nice PR — well-structured commits and solid research backing the CJK timing values.

A few things:

1. Citation year + comment values mismatch (TalkSystemSpeechSynthesizer.swift)
The inline comment references "Pellegrino et al. (2011)" but the linked DOI (sciadv.aaw2594) is the 2019 paper — the PR description also says 2019. Additionally, the per-char values in the comments (~0.22, ~0.25, ~0.19) don't match what the code actually uses (0.25, 0.28, 0.20). Could you update the year to 2019 and either align the comment values to the code or note the extra safety buffer so the math is traceable?

2. Phase-transition sounds — any way to mute?
The system sounds (Tink, Pop, Bottle, Submarine) are a nice UX touch, but if someone is using Talk Mode in a quiet environment or a meeting, surprise audio could be unwelcome. Worth considering a preference toggle or respecting the system "Play sound effects" setting — not a blocker, just something to think about.

3. macOS CI failure
The macOS check failed, but it's a runner timeout/cancellation after all 212 test files passed (1846 tests). Not related to this PR's changes.

Overall this looks good — clean fixes for real problems. Thanks!

@hongsw changed the title from "fix(talk): Talk Mode TTS improvements for CJK languages" to "[WIP] fix(talk): Talk Mode TTS improvements for CJK languages" Mar 24, 2026

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26eee68c10

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +80 to +81
let estimatedSeconds = max(minSeconds, min(300.0, Double(trimmed.count) * perCharSeconds))
let watchdogTimeout = estimatedSeconds * 3.0

P2 Cap watchdog timeout after applying safety multiplier

The new timeout calculation multiplies an already-capped estimate (min(300.0, ...)) by 3.0, which raises the hard watchdog ceiling to 900 seconds. In the failure path this watchdog is the only recovery mechanism, so a stuck AVSpeechSynthesizer can now keep Talk Mode in the speaking state for up to 15 minutes before control returns, a large regression from the previous bounded timeout behavior.

@hongsw (Author) replied:

The watchdog timeout is already capped via min(300.0, ...) on line 80. The 3x multiplier on a 300s base gives 900s max, which is intentional for very long TTS — the watchdog is a hang guard, not a performance limit.
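
To make the two positions in this thread concrete, the sketch below compares capping before the multiplier (the formula quoted in the review) with capping after it (the reviewer's alternative). Function names and the `hardCeiling` parameter are hypothetical; the PR does not specify a post-multiplier ceiling.

```swift
import Foundation

/// Formula as quoted in the review: cap the estimate at 300s, then multiply by 3.
/// The effective ceiling is therefore 900s.
func timeoutCapBeforeMultiplier(charCount: Int, perChar: Double, minSeconds: Double) -> Double {
    let estimated = max(minSeconds, min(300.0, Double(charCount) * perChar))
    return estimated * 3.0
}

/// Alternative raised by the reviewer: apply the cap after the multiplier.
/// `hardCeiling` is a hypothetical parameter, not a value taken from the PR.
func timeoutCapAfterMultiplier(charCount: Int, perChar: Double,
                               minSeconds: Double, hardCeiling: Double) -> Double {
    let estimated = max(minSeconds, Double(charCount) * perChar)
    return min(hardCeiling, estimated * 3.0)
}
```

For short replies the two agree (50 Korean characters give 37.5s either way); they diverge only for very long text, where the first form can reach 900s and the second stops at whatever ceiling is chosen.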

@hongsw (Author) commented Mar 24, 2026

[image]

@hongsw changed the title from "[WIP] fix(talk): Talk Mode TTS improvements for CJK languages" to "fix(talk): Talk Mode TTS improvements for CJK languages" Mar 24, 2026

@hongsw (Author) commented Mar 24, 2026

@fabianwilliams Thanks for the thorough review! All three items addressed:

1. Citation year + comment values — Fixed in 26eee68c10. Year updated to 2019, comment values now match code (Japanese 0.20, Korean 0.25, Chinese 0.28).

2. Phase-transition sounds mute — Added in 026e6b12d1. New "Play phase-transition sounds" checkbox in Settings → Voice Wake. Defaults to ON.

3. macOS CI failure — Confirmed unrelated, no action needed.

Additionally, changed the interrupt key from right Shift to right Option (f2f94b71) and added a toggle for that too ("Press Right Option to stop speech").

Screenshot of the new settings: #53553 (comment)

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2f94b715b


private func handleFlags(keyCode: UInt16, modifierFlags: NSEvent.ModifierFlags) {
    // Right Option key down (keyCode 61).
    guard keyCode == 61, modifierFlags.contains(.option) else { return }

P2 Avoid reusing Right Option for interrupt hotkey

Binding interrupt to keyCode == 61 here collides with the existing push-to-talk hotkey (VoicePushToTalkHotkey also uses Right Option), so users who enable both features will trigger push-to-talk capture when they only meant to stop TTS. In that configuration, a simple tap to interrupt can fire extra push-to-talk side effects (chimes/overlay/mic session churn) while Talk Mode is transitioning, which is a user-visible regression.

@hongsw (Author) replied:

Good catch — we addressed this in 5acda769f4. Talk Mode now disables Push-to-Talk while it is active (TalkModeController.setEnabled calls VoicePushToTalkHotkey.shared.setEnabled(false)), so the two handlers never run simultaneously. PTT is restored when Talk Mode is turned off. We also added a UI hint in the settings panel explaining this behavior.

@hongsw (Author) commented Mar 25, 2026

@fabianwilliams ;) All integration tests passed on device — Talk Mode end-to-end verified:

  • Double TTS playback fix confirmed (system voice plays once)
  • CJK watchdog timeout works correctly (Korean 50+ char responses play to completion)
  • Phase-transition sounds play/mute toggle works
  • Right Option interrupt stops speech and returns to listening
  • Push-to-Talk correctly pauses while Talk Mode is active, resumes when Talk Mode is off
  • Settings UI shows all new toggles and PTT paused hint
  • App deployed to /Applications/OpenClaw.app and restarts cleanly from icon

This PR is ready for final review and merge (after #53511 lands first).

@fabianwilliams (Contributor) left a comment:

All three review items confirmed addressed:

  1. Citation year: Pellegrino et al. 2019, comment values match code (Japanese 0.20, Korean 0.25, Chinese 0.28). ✅
  2. Phase sounds mute toggle: talkPhaseSoundsEnabled persisted via UserDefaults, guarded in playPhaseSound. Clean. ✅
  3. CJK silence timeout: isCJKLocale check with 2000ms floor clamp + info log when clamped. Exactly right. ✅

The error handling restructure in playAssistant is a nice improvement too — ElevenLabs failure now properly falls through to system voice without double-triggering.

Integration tests passing on device per the author's report. LGTM — approve.

@fabianwilliams (Contributor) commented:

CI note: the 2 failing checks (telegram extension, macOS) are unrelated to this PR's Talk Mode changes. The telegram test failure is in the extension test runner, not TTS code. The macOS check was cancelled/timed out. All relevant tests (node, Windows, channels, build) pass. Ready to merge.

hongsw and others added 8 commits March 27, 2026 11:22

Play a short system sound on phase transitions to give the user
audible feedback:
- thinking: Tink
- speaking: Pop
- listening (after speech interrupted): Bottle
- listening (after thinking): Submarine

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add TalkSpeechInterruptMonitor — a dedicated global key monitor that
listens for right Shift (keyCode 60) to interrupt Talk Mode speech.
Independent of Push-to-Talk, so it works even when PTT is disabled.

Stops only the current response; the next conversation cycle
continues normally via sendAndSpeak's resumeListeningIfNeeded flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Korean, Japanese, and Chinese speakers need longer pauses between
phrases. When the app locale is CJK, enforce a minimum 2000ms
silence window (vs the default 1500ms) to avoid premature
transcript submission.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…nfig

Replace non-idiomatic force-unwraps (cfg.voiceId!, cfg.modelId!) with
safe flatMap unwrapping, and add an info log when CJK locale clamps the
silence timeout so the override is observable in diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add a "Play phase-transition sounds" checkbox to Voice Wake settings.
When disabled, Talk Mode phase transitions (Tink/Pop/Bottle/Submarine)
are silent. Defaults to enabled to preserve existing behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Add a "Press Right Option to stop speech" checkbox to Voice Wake
settings. Also change the interrupt key from right Shift to right
Option (keyCode 61) to avoid conflicts with typing.
Defaults to enabled to preserve existing behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Talk Mode and Push-to-Talk both use the right Option key (keyCode 61).
Disable PTT when Talk Mode is enabled to prevent conflicting handlers,
and restore PTT when Talk Mode is disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Display a footnote under the Push-to-Talk toggle when both PTT and
Talk Mode are enabled, explaining that PTT is paused while Talk Mode
is active and resumes when Talk Mode is turned off.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@hongsw force-pushed the fork/talk-mode-full branch from 740ed3d to a5c2ff1 March 27, 2026 02:23

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5c2ff1715

Comment on lines +23 to +24
let pttEnabled = !enabled && AppStateStore.shared.voicePushToTalkEnabled
VoicePushToTalkHotkey.shared.setEnabled(pttEnabled)

P1 Keep Push-to-Talk disabled while Talk Mode is active

This disables Push-to-Talk only when setEnabled(_:) is called for Talk Mode transitions; it does not enforce that invariant afterward. If Talk Mode is already on and the user enables Push-to-Talk from settings, MenuContentView.onChange(of: voicePushToTalkEnabled) re-enables the hotkey without checking talkEnabled. Right Option then drives both handlers again, reintroducing the interrupt/PTT conflict this change is trying to prevent. Please gate hotkey enablement on both toggles everywhere (or centralize the decision) so Talk Mode always wins while active.
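
One way to centralize the decision, sketched under the assumption that both call sites (the Talk Mode toggle and the Push-to-Talk settings handler) can route through a single helper. The function name is hypothetical; the expression matches the `!enabled && voicePushToTalkEnabled` logic quoted above.

```swift
import Foundation

/// Single source of truth for whether the Push-to-Talk hotkey may be active.
/// Talk Mode always wins: while it is on, the user preference is ignored.
func shouldEnablePushToTalkHotkey(talkModeEnabled: Bool, pushToTalkPreference: Bool) -> Bool {
    pushToTalkPreference && !talkModeEnabled
}
```

If every path that calls `VoicePushToTalkHotkey.shared.setEnabled(_:)` passes the result of this function, toggling Push-to-Talk in settings while Talk Mode is active can no longer re-arm the hotkey.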

@openclaw-barnacle bot added and removed the docs label Mar 27, 2026

@hongsw force-pushed the fork/talk-mode-full branch from b919b03 to a5c2ff1 March 27, 2026 02:45

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6ed92d808

Comment on lines +23 to +24
let pttEnabled = !enabled && AppStateStore.shared.voicePushToTalkEnabled
VoicePushToTalkHotkey.shared.setEnabled(pttEnabled)

P2 End active push-to-talk session before disabling hotkey

This newly disables Push-to-Talk whenever Talk Mode is enabled. However, if the user turns on Talk Mode while a PTT hold is already active (Right Option still down), VoicePushToTalkHotkey.stopMonitoring() clears its state and removes monitors without running the endAction. As a result, VoicePushToTalk.shared.end() is never called and the capture session can remain open (mic/overlay stuck). Please ensure an active PTT session is finalized before disabling the hotkey monitor.
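
The requested ordering can be illustrated with a small stand-in class. `HotkeySketch` and its members are hypothetical and model only the state machine: if a hold is in progress when the hotkey is disabled, the end action runs before monitoring stops.

```swift
import Foundation

/// Hypothetical model of a hold-to-talk hotkey with an explicit end action.
final class HotkeySketch {
    private(set) var isMonitoring = false
    private(set) var sessionActive = false
    private let endAction: () -> Void

    init(endAction: @escaping () -> Void) {
        self.endAction = endAction
    }

    /// Simulates the hotkey being pressed; starts a session only while monitoring.
    func keyDown() {
        if isMonitoring { sessionActive = true }
    }

    /// Disabling finalizes any in-flight session first, so capture cannot stay open.
    func setEnabled(_ enabled: Bool) {
        if !enabled && sessionActive {
            sessionActive = false
            endAction()
        }
        isMonitoring = enabled
    }
}
```

Because the session flag is cleared before the end action runs, a second `setEnabled(false)` call does not fire the end action again.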

Development

Successfully merging this pull request may close these issues.

[Bug]: Mac app Talk Mode plays every reply twice (duplicate TTS audio)
