feat(ios): background voice mode with improved TTS handling by zeulewan · Pull Request #17319 · openclaw/openclaw

zeulewan · 2026-02-15T16:57:49Z

Summary

Problem: Talk Mode stops working when the iOS app is backgrounded because iOS suspends the audio engine
Why it matters: Users want hands-free voice interaction without keeping the app foregrounded (e.g. driving, walking)
What changed:
- 557ece6 Add Background Voice toggle and basic background audio session
- a9c7a8b Keep app alive in background with silent audio keepalive, add thinking chime
- c4be86e Add toggle to disable ElevenLabs voice directive hint
- 2a440fa Keep AVAudioEngine running continuously, pause/resume recognition instead of full teardown
- 2e9ca02 Add VoIP background mode, speaker bleed detection, push speech queue, audio route change handling, startup chime, settings label cleanup
What did NOT change: Gateway-side logic, non-voice features, Android/macOS apps

Change Type (select all)

Feature

Scope (select all touched areas)

UI / DX

Linked Issue/PR

None

User-visible / Behavior Changes

New "Background Listening" toggle in Talk Mode settings (keeps voice mode active when backgrounded)
New "Voice Directive Hint" toggle to disable ElevenLabs voice switching instructions in the Talk Mode prompt
"Show Talk Button" renamed to "Overlay Button" with description text
Startup chime plays when voice mode begins listening
Silence window reduced from 0.9s to 0.6s for faster response
Speaker bleed detection prevents false interrupts when not using headphones
Background listening works with both headphones and speaker output
Background listening toggle can be enabled mid-conversation

Current limitations

Speech interruption works with headphones/Bluetooth but is disabled on speaker, where the mic picks up TTS output and falsely triggers interrupts. Threshold-based gating was attempted, but speaker bleed levels are too close to actual speech levels to distinguish the two reliably. Users can still tap the orb to stop playback.
Switching audio devices mid-conversation (e.g. connecting/disconnecting headphones) has not been verified to work.
Using onboard TTS and STT drains battery. Background listening toggle is off by default and the description warns about battery usage.
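
The speaker limitation above amounts to gating interrupt-on-speech on the current output route. A minimal sketch of that decision follows, assuming a hypothetical `OutputPort` enum that stands in for the port types iOS reports through `AVAudioSession.currentRoute.outputs`. The function name and the choice to treat the built-in receiver as non-isolated are assumptions for illustration, not the PR's code.

```swift
// Hypothetical stand-in for AVAudioSession.Port values such as
// .builtInSpeaker, .headphones, .bluetoothA2DP (assumption, not the PR's type).
enum OutputPort {
    case builtInSpeaker, builtInReceiver, headphones
    case bluetoothA2DP, bluetoothHFP, bluetoothLE
}

/// Returns true only when every active output is acoustically isolated
/// from the microphone, so TTS playback cannot bleed into speech detection.
func allowsSpeechInterrupt(outputs: [OutputPort]) -> Bool {
    // No known route: stay conservative and keep interrupts off.
    guard !outputs.isEmpty else { return false }
    return outputs.allSatisfy { port in
        switch port {
        case .headphones, .bluetoothA2DP, .bluetoothHFP, .bluetoothLE:
            return true
        case .builtInSpeaker, .builtInReceiver:
            return false
        }
    }
}
```

With this shape, a route change handler would only need to re-evaluate the function and flip a single flag; tapping the orb remains the manual fallback on speaker.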

Security Impact (required)

New permissions/capabilities? Yes - added voip UIBackgroundMode
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
VoIP background mode keeps the app alive in the background. This is standard iOS API usage for voice apps. Battery impact is disclosed to users in the settings toggle description.

Repro + Verification

Environment

OS: iOS 26.2.1
Device: iPhone 16
Runtime: Xcode 17C52, Swift 6.0

Steps

Enable Talk Mode in settings
Enable "Background Listening"
Start a voice conversation, then background the app
Speak - the assistant should still respond

Expected

Voice mode stays active in background

Actual

Voice mode stays active in background

Evidence

Xcode Debug build succeeded on physical device (iphoneos, arm64)
App deployed and launched on iPhone 16

Human Verification (required)

Verified: Xcode build + deploy to iPhone 16 (iOS 26.2.1), app launches, settings UI shows new toggles
Verified: ~10 minutes of background voice mode usage
Edge cases checked: Build with no signing issues, no compiler warnings

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

How to disable/revert: Revert this PR. Background voice is additive, no existing behavior changes if reverted.

Risks and Mitigations

Risk: Battery drain from background audio keepalive
- Mitigation: Toggle is off by default, description warns about battery usage
Risk: False speech interrupts from speaker bleed on non-headphone output
- Mitigation: Interrupt-on-speech disabled entirely on speaker, only enabled with isolated audio output (headphones/Bluetooth). Users can tap the orb to stop playback manually.

AI-assisted: This PR was developed with Claude Code. Tested via Xcode build + deploy to physical iPhone 16. All code changes reviewed and understood.

Greptile Summary

Implements iOS background voice mode by keeping AVAudioEngine running continuously when backgrounded. The PR adds background audio session keepalive, speaker bleed detection, push speech queue handling, audio route change observers, and UI toggles.

Key changes:

New voip UIBackgroundMode enables background execution
pauseRecognitionOnly() / resumeRecognitionOnly() keep engine alive while pausing/resuming speech recognition
Speaker bleed detection disables interrupt-on-speech when using speaker output (headphones work normally)
Silent audio loop as secondary keepalive mechanism
Reduced silence window from 0.9s → 0.6s for faster response
Added startup/thinking chimes for audio feedback
New settings toggles: "Background Listening" and "Voice Directive Hint"

Issues found:

Behavioral change in NodeAppModel.swift:1074: changing speak ?? true to speak == true breaks backward compatibility when speak is nil (previously spoke by default, now won't speak)
Potential crash in TalkModeManager.swift:703-718: keeping audio tap installed after calling recognitionRequest?.endAudio() may cause unsafe append() calls on a stale request
Code duplication between startRecognition and resumeRecognitionOnly (40-line audio tap setup block)
Silent audio player (volume 0.0) may not satisfy iOS background audio requirements if AVAudioEngine stops

The background keepalive approach is sound (continuous AVAudioEngine + VoIP mode), but the two issues above need attention before merge.

Confidence Score: 3/5

This PR has sound architecture but contains two logic issues that need fixing before merge
Score reflects two blocking issues: (1) behavioral breaking change in NodeAppModel.swift:1074 where speak ?? true → speak == true changes nil handling, and (2) potential crash in TalkModeManager.swift:703-718 where audio tap continues calling append() on an ended recognition request. The core background keepalive design is solid, but these issues must be resolved. Additional style improvements (code deduplication, error handling consistency) are recommended but non-blocking.
Pay close attention to apps/ios/Sources/Model/NodeAppModel.swift (backward compatibility break) and apps/ios/Sources/Voice/TalkModeManager.swift (tap lifecycle safety)

Last reviewed commit: 2e9ca02


…app is backgrounded

- Add 'talk.background.enabled' preference toggle in Settings > Features
- Modify TalkModeManager.suspendForBackground() to accept keepActive parameter that preserves the audio session and recognition when enabled
- Update NodeAppModel.setScenePhase() to check the preference and skip suspension when Background Voice is on
- Update resumeAfterBackground() to skip restart when talk was kept active
- Leverages existing UIBackgroundModes=audio entitlement to keep process alive

When enabled, the audio session, speech recognition, and gateway chat subscription remain active in the background, allowing continuous voice conversation while browsing other apps.

…ive + add thinking chime

Background voice was failing after the first response because stopRecognition() kills the audio engine, causing iOS to suspend the app before it can restart.

Fix: When backgroundKeepAlive is enabled, play a silent audio loop (generated in-memory WAV) that keeps the UIBackgroundModes=audio session alive between speech recognition and TTS cycles.

Also adds a subtle thinking chime (880Hz A5 note, 150ms fade-out) that plays when the assistant starts processing a transcript, giving audible feedback during the silence between speaking and the response.

Changes:
- Add backgroundKeepAlive flag and silent AVAudioPlayer loop
- Start keepalive on background entry, stop on foreground return or Talk stop
- Add playThinkingSound() called at start of processTranscript()
- Generate audio programmatically (no asset files needed)
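
Generating the chime programmatically, as this commit message describes, could look roughly like the sketch below. It builds an 880 Hz tone with a linear fade-out and wraps it in a 16-bit mono WAV container held in memory. The function names, the 44.1 kHz sample rate, the 0.3 amplitude, and the linear fade shape are all assumptions; the commit message only gives the frequency and the 150 ms duration.

```swift
import Foundation

/// Generates mono 16-bit PCM samples for a sine tone with a linear fade-out.
/// Defaults follow the commit message (880 Hz, 150 ms); the rest is assumed.
func chimeSamples(frequency: Double = 880.0,
                  duration: Double = 0.150,
                  sampleRate: Double = 44_100.0,
                  amplitude: Double = 0.3) -> [Int16] {
    let count = Int((duration * sampleRate).rounded())
    return (0..<count).map { i in
        let t = Double(i) / sampleRate
        let fade = 1.0 - Double(i) / Double(count)  // linear ramp from 1 toward 0
        let value = sin(2.0 * Double.pi * frequency * t) * amplitude * fade
        return Int16(value * Double(Int16.max))
    }
}

/// Wraps PCM samples in a minimal 44-byte RIFF/WAVE header (PCM, mono, 16-bit).
func wavData(samples: [Int16], sampleRate: UInt32 = 44_100) -> Data {
    var data = Data()
    func appendLE<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    let byteCount = UInt32(samples.count * 2)
    data.append(contentsOf: Array("RIFF".utf8)); appendLE(36 + byteCount)
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8)); appendLE(UInt32(16))
    appendLE(UInt16(1))        // audio format: PCM
    appendLE(UInt16(1))        // channels: mono
    appendLE(sampleRate)
    appendLE(sampleRate * 2)   // byte rate = sampleRate * channels * 2 bytes
    appendLE(UInt16(2))        // block align
    appendLE(UInt16(16))       // bits per sample
    data.append(contentsOf: Array("data".utf8)); appendLE(byteCount)
    for sample in samples { appendLE(sample) }
    return data
}
```

The resulting `Data` is the kind of buffer one would hand to `AVAudioPlayer(data:)`. A silent keepalive loop could reuse `wavData` with an all-zero sample array.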

…alk Mode prompt

Adds a 'Voice Directive Hint' toggle in Settings > Features that controls whether the ElevenLabs voice switching instruction is included in the Talk Mode prompt. Disabling it saves tokens when voice switching is not needed.

Changes:
- TalkPromptBuilder.build() now accepts includeVoiceDirectiveHint parameter
- Add talk.voiceDirectiveHint.enabled AppStorage preference (default: true)
- Add toggle + description in SettingsTab
- TalkModeManager.buildPrompt() reads the preference
- Add unit tests for the new parameter

PROBLEM ANALYSIS:
- Talk Mode died when app was backgrounded despite UIBackgroundModes=audio
- Root cause: processTranscript() called stopRecognition() which killed AVAudioEngine
- iOS suspended the app when engine stopped/restarted between speech cycles
- Silent AVAudioPlayer keepalive wasn't sufficient to maintain background execution

RESEARCH FINDINGS:
- iOS has strict background execution limits (Apple DTS Quinn Eskimo post)
- No general-purpose continuous background execution allowed
- BUT audio background mode IS designed for continuous audio apps like VoIP
- Key insight: Must keep audio engine running continuously, never stop it

SOLUTION IMPLEMENTED:
1. **Added VoIP background mode** to project.yml (audio + voip)
2. **Keep AVAudioEngine running continuously** when backgroundKeepAlive=true
3. **New pauseRecognitionOnly()/resumeRecognitionOnly() methods**:
   - pauseRecognitionOnly(): Stops speech recognition but keeps engine running
   - resumeRecognitionOnly(): Restarts recognition on already-running engine
4. **Updated processTranscript() lifecycle**:
   - In background mode: Use pauseRecognitionOnly() instead of stopRecognition()
   - After TTS: Use resumeRecognitionOnly() instead of full restart
5. **Improved background keepalive logging** for debugging
6. **Enhanced resumeAfterBackground()** to handle engine state properly

TECHNICAL APPROACH:
- processTranscript() → pauseRecognitionOnly() when backgroundKeepAlive=true
- After speaking → resumeRecognitionOnly() when backgroundKeepAlive=true
- playAssistant() → resumeRecognitionOnly() for interrupt handling
- Never stop AVAudioEngine while in background voice mode
- VoIP + audio background modes for maximum background execution time

This follows iOS best practices for VoIP apps like Discord/WhatsApp that need continuous background audio processing. The engine stays alive, preventing iOS from suspending the app between speech recognition cycles.

Tested approach based on Apple's background execution guidelines and real-world patterns from successful VoIP applications.
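
The pause/resume lifecycle in this commit message can be modeled as a small state machine. The sketch below is a simplified illustration, not the PR's implementation: `VoiceSession` and its method names are hypothetical, and the two booleans stand in for the real AVAudioEngine and speech recognizer state.

```swift
/// Hypothetical model of the Talk Mode lifecycle (illustration only).
struct VoiceSession {
    let backgroundKeepAlive: Bool
    var engineRunning = false
    var recognizing = false

    init(backgroundKeepAlive: Bool) {
        self.backgroundKeepAlive = backgroundKeepAlive
    }

    /// Start listening: engine and recognition both come up.
    mutating func startListening() {
        engineRunning = true
        recognizing = true
    }

    /// A transcript is handed off for processing.
    mutating func beginProcessing() {
        recognizing = false
        if !backgroundKeepAlive {
            // Full teardown path (stopRecognition in the PR's terms):
            // the engine stops, which is the window where iOS may suspend the app.
            engineRunning = false
        }
        // With keepalive on, only recognition pauses; the engine keeps running.
    }

    /// TTS playback has finished; listening resumes.
    mutating func finishSpeaking() {
        engineRunning = true   // restarts after a teardown, no-op when kept alive
        recognizing = true
    }
}
```

The invariant worth checking in the real code is the one this model makes explicit: with `backgroundKeepAlive` on, no transition ever sets `engineRunning` to false.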

Add VoIP background mode, speaker bleed detection with grace period, push message speech queue, multi-turn follow-up polling, audio route change handling, startup chime, and config reload throttling. Rename settings labels for clarity.

greptile-apps

7 files reviewed, 7 comments


greptile-apps · 2026-02-15T17:01:11Z

apps/ios/Sources/Voice/TalkModeManager.swift

+            let tapDiagnostics = AudioTapDiagnostics(label: "talk") { [weak self] level in
+                guard let self else { return }
+                Task { @MainActor in
+                    let raw = max(0, min(Double(level) * 10.0, 1.0))
+                    let next = (self.micLevel * 0.80) + (raw * 0.20)
+                    self.micLevel = next
+
+                    if self.isListening, !self.isSpeaking, !self.noiseFloorReady {
+                        self.noiseFloorSamples.append(raw)
+                        if self.noiseFloorSamples.count >= 22 {
+                            let sorted = self.noiseFloorSamples.sorted()
+                            let take = max(6, sorted.count / 2)
+                            let slice = sorted.prefix(take)
+                            let avg = slice.reduce(0.0, +) / Double(slice.count)
+                            self.noiseFloor = avg
+                            self.noiseFloorReady = true
+                            self.noiseFloorSamples.removeAll(keepingCapacity: true)
+                            let threshold = min(0.35, max(0.12, avg + 0.10))
+                            GatewayDiagnostics.log(
+                                "talk audio: noiseFloor=\(String(format: "%.3f", avg)) threshold=\(String(format: "%.3f", threshold))")
+                        }
+                    }
+
+                    // Track speaker bleed baseline during TTS for interrupt gating.
+                    if self.isSpeechOutputActive {
+                        self.ttsAudioBaseline = (self.ttsAudioBaseline * 0.92) + (raw * 0.08)
+                    }
+
+                    let threshold: Double = if let floor = self.noiseFloor, self.noiseFloorReady {
+                        min(0.35, max(0.12, floor + 0.10))
+                    } else {
+                        0.18
+                    }
+                    if raw >= threshold {
+                        self.lastAudioActivity = Date()
+                    }
+                }
+            }


Code duplication: entire audio tap setup logic duplicated in resumeRecognitionOnly

This 40-line block (lines 765-802) is nearly identical to the tap setup in startRecognition (lines 520-560). Consider extracting to a shared method like setupAudioTap(request:) to reduce maintenance burden and prevent drift.
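
One way to reduce the duplication flagged here is to pull the pure arithmetic out of the tap closure so both call sites share it. The sketch below extracts only the noise floor estimate and threshold clamp that appear in the diff. The function names are hypothetical, and this is not the `setupAudioTap(request:)` refactor itself, only the part that can be isolated without touching AVFoundation.

```swift
/// Estimates the noise floor as the mean of the quietest samples:
/// the lower half of the sorted window, but at least 6 samples,
/// mirroring `max(6, sorted.count / 2)` in the diff.
func estimateNoiseFloor(_ samples: [Double]) -> Double? {
    guard !samples.isEmpty else { return nil }
    let sorted = samples.sorted()
    let take = min(sorted.count, max(6, sorted.count / 2))
    let slice = sorted.prefix(take)
    return slice.reduce(0.0, +) / Double(slice.count)
}

/// Activity threshold: noise floor plus 0.10, clamped to 0.12...0.35,
/// with 0.18 as the fallback before the floor is known.
func speechThreshold(noiseFloor: Double?) -> Double {
    guard let floor = noiseFloor else { return 0.18 }
    return min(0.35, max(0.12, floor + 0.10))
}
```

Both `startRecognition` and `resumeRecognitionOnly` could then call `speechThreshold(noiseFloor:)` instead of repeating the clamp expression, which also makes the constants unit-testable.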


greptile-apps · 2026-02-15T17:01:12Z

apps/ios/Sources/Voice/TalkModeManager.swift

+            // Remove the old tap (kept alive during pause) before installing the new one
+            if self.inputTapInstalled {
+                input.removeTap(onBus: 0)
+                self.inputTapInstalled = false
+            }


Verify tap state sync across engine restart paths

In pauseRecognitionOnly (line 703-719), the tap is kept installed. Here in resumeRecognitionOnly, you remove it before re-installing. If the audio engine stops/restarts outside this flow, the tap state could become inconsistent with the inputTapInstalled flag.


greptile-apps · 2026-02-15T17:01:13Z

apps/ios/Sources/Voice/TalkModeManager.swift

+    private func pauseRecognitionOnly() {
+        self.recognitionTask?.cancel()
+        self.recognitionTask = nil
+        self.recognitionRequest?.endAudio()
+        self.recognitionRequest = nil
+        self.micLevel = 0
+        self.lastAudioActivity = nil
+        self.noiseFloorSamples.removeAll(keepingCapacity: true)
+        self.noiseFloor = nil
+        self.noiseFloorReady = false
+        // CRITICAL: Do NOT remove the input tap or stop the audioEngine in background mode.
+        // Keeping the tap installed means audio continues flowing through the engine,
+        // which iOS recognizes as active audio work and won't suspend us.
+        // The tap's captured request reference has had endAudio() called, so append() is a no-op.
+        self.speechRecognizer = nil
+        self.logger.info("paused recognition only, keeping engine + tap running for background mode (tapInstalled=\(self.inputTapInstalled))")


Tap continues processing with orphaned request after endAudio()

After calling recognitionRequest?.endAudio() on line 706, the tap's captured request reference is stale. The comment (line 716) says "append() is a no-op" but this isn't guaranteed - appending to an ended request could cause crashes or undefined behavior. The tap callback in makeAudioTapAppendCallback (line 862) calls request.append() unconditionally, which may not be safe after endAudio().
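
A possible mitigation is to stop relying on `append()` being harmless after `endAudio()` and instead gate the call explicitly. The sketch below shows a small thread-safe gate that the tap callback could consult. `AppendGate` and its methods are hypothetical names, and holding a lock on the audio thread is a trade-off that would need measuring in the real app.

```swift
import Foundation

/// Thread-safe switch checked by the audio tap before it forwards a buffer.
/// Hypothetical helper, not part of the PR.
final class AppendGate: @unchecked Sendable {
    private let lock = NSLock()
    private var isOpen = true

    /// Close before ending the recognition request so no later append gets through.
    func close() {
        lock.lock(); defer { lock.unlock() }
        isOpen = false
    }

    /// Reopen once a fresh request has been installed.
    func reopen() {
        lock.lock(); defer { lock.unlock() }
        isOpen = true
    }

    /// Runs `body` only while the gate is open; returns whether it ran.
    @discardableResult
    func ifOpen(_ body: () -> Void) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard isOpen else { return false }
        body()
        return true
    }
}
```

Under this scheme the tap callback would wrap its call as `gate.ifOpen { request.append(buffer) }`, and `pauseRecognitionOnly()` would call `gate.close()` before `endAudio()`. The tap stays installed and audio keeps flowing through the engine, but nothing reaches the ended request.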


greptile-apps · 2026-02-15T17:01:21Z

apps/ios/Sources/Voice/TalkModeManager.swift

+            self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
+                guard let self else { return }
+                if let error {
+                    let msg = error.localizedDescription
+                    GatewayDiagnostics.log("talk speech: error=\(msg)")
+                    if !self.isSpeaking {
+                        if msg.localizedCaseInsensitiveContains("no speech detected") {
+                            self.statusText = self.isEnabled ? "Listening" : "Speech error: \(msg)"
+                        } else {
+                            self.statusText = "Speech error: \(msg)"
+                        }
+                    }
+                    self.logger.debug("speech recognition error: \(msg, privacy: .public)")
+                    if self.captureMode == .continuous, self.isEnabled, !self.isSpeaking {
+                        Task { @MainActor [weak self] in
+                            await self?.restartRecognitionAfterError()
+                        }
+                    }
+                }
+                guard let result else { return }
+                let transcript = result.bestTranscription.formattedString
+                if !result.isFinal, !self.loggedPartialThisCycle {
+                    let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
+                    if !trimmed.isEmpty {
+                        self.loggedPartialThisCycle = true
+                        GatewayDiagnostics.log("talk speech: partial chars=\(trimmed.count)")


Error handling inconsistency: captureMode != .idle check missing in resumeRecognitionOnly

In the main startRecognition error handler (lines 600-627), there's a check for captureMode != .idle before treating errors as recoverable. The resumeRecognitionOnly version (lines 818-843) lacks this check on line 823, meaning it may treat intentional stops differently.


greptile-apps · 2026-02-15T17:01:22Z

apps/ios/Sources/Model/NodeAppModel.swift

        }

-        if params.speak ?? true {
+        if params.speak == true {


Semantic change: speak ?? true → speak == true

Before: params.speak ?? true (spoke by default when nil)
After: params.speak == true (only speaks when explicitly true)

This changes behavior when speak is nil - previously would speak, now won't. If the API contract expects nil to mean "speak by default", this breaks backward compatibility.
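
The difference is easy to reproduce in isolation. In the sketch below, `SpeakParams` is a hypothetical stand-in for the real parameter type; only the two expressions are taken from the diff.

```swift
/// Hypothetical stand-in for the request parameters; only `speak` matters here.
struct SpeakParams {
    var speak: Bool?
}

/// Old behavior: a missing value defaults to speaking.
func shouldSpeakBefore(_ params: SpeakParams) -> Bool {
    params.speak ?? true
}

/// New behavior: speaks only when the value is explicitly true.
func shouldSpeakAfter(_ params: SpeakParams) -> Bool {
    params.speak == true
}
```

The two functions agree for `true` and `false` and differ only for `nil`, so the change matters exactly for callers that omit the field.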


greptile-apps · 2026-02-15T17:01:24Z

Additional Comments (2)

apps/ios/Sources/Voice/TalkModeManager.swift
Silent audio player loop with volume 0.0 may not satisfy background audio requirement

Setting player.volume = 0.0 (line 1218) creates a completely silent loop. iOS may detect this as fake background activity and suspend the app anyway. Apple's background audio guidelines expect audible content. The comment acknowledges the AVAudioEngine should be primary keepalive, but if the engine stops, this silent loop may not be sufficient.


apps/ios/Sources/Voice/TalkModeManager.swift
Missing .allowBluetooth before this PR

The PR adds .allowBluetooth and .allowBluetoothA2DP (lines 2095-2096) but only .allowBluetoothHFP existed before. This means Bluetooth audio didn't work properly before this change. This is correct, but worth highlighting as a behavior change that may affect users.


mbelinky · 2026-02-16T18:20:02Z

Closing as superseded by the merged slice PRs from this branch's intent:
- #18250 (Voice Directive Hint)
- #18261 (Background Listening core)
- #18265 (Talk hardening: route-based barge-in gating)

Attribution follow-ups were also applied in changelog where requested.

mbelinky · 2026-02-16T18:20:03Z

Superseded by #18250, #18261, and #18265.

zeulewan added 5 commits February 15, 2026 11:57

greptile-apps bot reviewed Feb 15, 2026


zeulewan marked this pull request as draft February 15, 2026 17:04

openclaw-barnacle bot added the app: ios and size: L labels Feb 15, 2026

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f Compare February 15, 2026 18:46

This was referenced Feb 16, 2026

feat(ios): add background listening core toggle #18261

Merged

feat(ios): add Talk voice directive hint toggle #18250

Merged

fix(ios): gate talk barge-in on isolated audio routes #18265

Merged

mbelinky closed this Feb 16, 2026

zeulewan wants to merge 5 commits into openclaw:main from zeulewan:feature/background-voice-mode
