Skip to content

feat(ios): background voice mode with improved TTS handling#17319

Closed
zeulewan wants to merge 5 commits intoopenclaw:mainfrom
zeulewan:feature/background-voice-mode
Closed

feat(ios): background voice mode with improved TTS handling#17319
zeulewan wants to merge 5 commits intoopenclaw:mainfrom
zeulewan:feature/background-voice-mode

Conversation

@zeulewan
Copy link

@zeulewan zeulewan commented Feb 15, 2026

Summary

  • Problem: Talk Mode stops working when the iOS app is backgrounded because iOS suspends the audio engine
  • Why it matters: Users want hands-free voice interaction without keeping the app foregrounded (e.g. driving, walking)
  • What changed:
    • 557ece6 Add Background Voice toggle and basic background audio session
    • a9c7a8b Keep app alive in background with silent audio keepalive, add thinking chime
    • c4be86e Add toggle to disable ElevenLabs voice directive hint
    • 2a440fa Keep AVAudioEngine running continuously, pause/resume recognition instead of full teardown
    • 2e9ca02 Add VoIP background mode, speaker bleed detection, push speech queue, audio route change handling, startup chime, settings label cleanup
  • What did NOT change: Gateway-side logic, non-voice features, Android/macOS apps

Change Type (select all)

  • Feature

Scope (select all touched areas)

  • UI / DX

Linked Issue/PR

None

User-visible / Behavior Changes

  • New "Background Listening" toggle in Talk Mode settings (keeps voice mode active when backgrounded)
  • New "Voice Directive Hint" toggle to disable ElevenLabs voice switching instructions in the Talk Mode prompt
  • "Show Talk Button" renamed to "Overlay Button" with description text
  • Startup chime plays when voice mode begins listening
  • Silence window reduced from 0.9s to 0.6s for faster response
  • Speaker bleed detection prevents false interrupts when not using headphones
  • Background listening works with both headphones and speaker output
  • Background listening toggle can be enabled mid-conversation

Current limitations

  • Speech interruption works with headphones/Bluetooth but is disabled on speaker. On speaker, the mic picks up TTS output and falsely triggers interrupts. Threshold-based gating was attempted but the speaker bleed levels are too close to actual speech levels to reliably distinguish the two. Interrupt-on-speech is disabled on speaker for now. Users can still tap the orb to stop playback.
  • Switching audio devices mid-conversation (e.g. connecting/disconnecting headphones) has not been verified to work.
  • Using onboard TTS and STT drains battery. Background listening toggle is off by default and the description warns about battery usage.

Security Impact (required)

  • New permissions/capabilities? Yes - added voip UIBackgroundMode
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • VoIP background mode keeps the app alive in the background. This is standard iOS API usage for voice apps. Battery impact is disclosed to users in the settings toggle description.

Repro + Verification

Environment

  • OS: iOS 26.2.1
  • Device: iPhone 16
  • Runtime: Xcode 17C52, Swift 6.0

Steps

  1. Enable Talk Mode in settings
  2. Enable "Background Listening"
  3. Start a voice conversation, then background the app
  4. Speak - the assistant should still respond

Expected

  • Voice mode stays active in background

Actual

  • Voice mode stays active in background

Evidence

  • Xcode Debug build succeeded on physical device (iphoneos, arm64)
  • App deployed and launched on iPhone 16

Human Verification (required)

  • Verified: Xcode build + deploy to iPhone 16 (iOS 26.2.1), app launches, settings UI shows new toggles
  • Verified: ~10 minutes of background voice mode usage
  • Edge cases checked: Build with no signing issues, no compiler warnings

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert: Revert this PR. Background voice is additive, no existing behavior changes if reverted.

Risks and Mitigations

  • Risk: Battery drain from background audio keepalive
    • Mitigation: Toggle is off by default, description warns about battery usage
  • Risk: False speech interrupts from speaker bleed on non-headphone output
    • Mitigation: Interrupt-on-speech disabled entirely on speaker, only enabled with isolated audio output (headphones/Bluetooth). Users can tap the orb to stop playback manually.

AI-assisted: This PR was developed with Claude Code. Tested via Xcode build + deploy to physical iPhone 16. All code changes reviewed and understood.

Greptile Summary

Implements iOS background voice mode by keeping AVAudioEngine running continuously when backgrounded. The PR adds background audio session keepalive, speaker bleed detection, push speech queue handling, audio route change observers, and UI toggles.

Key changes:

  • New voip UIBackgroundMode enables background execution
  • pauseRecognitionOnly() / resumeRecognitionOnly() keep engine alive while pausing/resuming speech recognition
  • Speaker bleed detection disables interrupt-on-speech when using speaker output (headphones work normally)
  • Silent audio loop as secondary keepalive mechanism
  • Reduced silence window from 0.9s → 0.6s for faster response
  • Added startup/thinking chimes for audio feedback
  • New settings toggles: "Background Listening" and "Voice Directive Hint"

Issues found:

  • Behavioral change in NodeAppModel.swift:1074: changing speak ?? true to speak == true breaks backward compatibility when speak is nil (previously spoke by default, now won't speak)
  • Potential crash in TalkModeManager.swift:703-718: keeping audio tap installed after calling recognitionRequest?.endAudio() may cause unsafe append() calls on a stale request
  • Code duplication between startRecognition and resumeRecognitionOnly (40-line audio tap setup block)
  • Silent audio player (volume 0.0) may not satisfy iOS background audio requirements if AVAudioEngine stops

The background keepalive approach is sound (continuous AVAudioEngine + VoIP mode), but the two issues above need attention before merge.

Confidence Score: 3/5

  • This PR has sound architecture but contains two logic issues that need fixing before merge
  • Score reflects two blocking issues: (1) behavioral breaking change in NodeAppModel.swift:1074 where speak ?? truespeak == true changes nil handling, and (2) potential crash in TalkModeManager.swift:703-718 where audio tap continues calling append() on an ended recognition request. The core background keepalive design is solid, but these issues must be resolved. Additional style improvements (code deduplication, error handling consistency) are recommended but non-blocking.
  • Pay close attention to apps/ios/Sources/Model/NodeAppModel.swift (backward compatibility break) and apps/ios/Sources/Voice/TalkModeManager.swift (tap lifecycle safety)

Last reviewed commit: 2e9ca02

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

…app is backgrounded

- Add 'talk.background.enabled' preference toggle in Settings > Features
- Modify TalkModeManager.suspendForBackground() to accept keepActive parameter
  that preserves the audio session and recognition when enabled
- Update NodeAppModel.setScenePhase() to check the preference and skip
  suspension when Background Voice is on
- Update resumeAfterBackground() to skip restart when talk was kept active
- Leverages existing UIBackgroundModes=audio entitlement to keep process alive

When enabled, the audio session, speech recognition, and gateway chat subscription
remain active in the background, allowing continuous voice conversation while
browsing other apps.
…ive + add thinking chime

Background voice was failing after the first response because stopRecognition()
kills the audio engine, causing iOS to suspend the app before it can restart.

Fix: When backgroundKeepAlive is enabled, play a silent audio loop (generated
in-memory WAV) that keeps the UIBackgroundModes=audio session alive between
speech recognition and TTS cycles.

Also adds a subtle thinking chime (880Hz A5 note, 150ms fade-out) that plays
when the assistant starts processing a transcript, giving audible feedback
during the silence between speaking and the response.

Changes:
- Add backgroundKeepAlive flag and silent AVAudioPlayer loop
- Start keepalive on background entry, stop on foreground return or Talk stop
- Add playThinkingSound() called at start of processTranscript()
- Generate audio programmatically (no asset files needed)
…alk Mode prompt

Adds a 'Voice Directive Hint' toggle in Settings > Features that controls
whether the ElevenLabs voice switching instruction is included in the Talk
Mode prompt. Disabling it saves tokens when voice switching is not needed.

Changes:
- TalkPromptBuilder.build() now accepts includeVoiceDirectiveHint parameter
- Add talk.voiceDirectiveHint.enabled AppStorage preference (default: true)
- Add toggle + description in SettingsTab
- TalkModeManager.buildPrompt() reads the preference
- Add unit tests for the new parameter
PROBLEM ANALYSIS:
- Talk Mode died when app was backgrounded despite UIBackgroundModes=audio
- Root cause: processTranscript() called stopRecognition() which killed AVAudioEngine
- iOS suspended the app when engine stopped/restarted between speech cycles
- Silent AVAudioPlayer keepalive wasn't sufficient to maintain background execution

RESEARCH FINDINGS:
- iOS has strict background execution limits (Apple DTS Quinn Eskimo post)
- No general-purpose continuous background execution allowed
- BUT audio background mode IS designed for continuous audio apps like VoIP
- Key insight: Must keep audio engine running continuously, never stop it

SOLUTION IMPLEMENTED:
1. **Added VoIP background mode** to project.yml (audio + voip)
2. **Keep AVAudioEngine running continuously** when backgroundKeepAlive=true
3. **New pauseRecognitionOnly()/resumeRecognitionOnly() methods**:
   - pauseRecognitionOnly(): Stops speech recognition but keeps engine running
   - resumeRecognitionOnly(): Restarts recognition on already-running engine
4. **Updated processTranscript() lifecycle**:
   - In background mode: Use pauseRecognitionOnly() instead of stopRecognition()
   - After TTS: Use resumeRecognitionOnly() instead of full restart
5. **Improved background keepalive logging** for debugging
6. **Enhanced resumeAfterBackground()** to handle engine state properly

TECHNICAL APPROACH:
- processTranscript() → pauseRecognitionOnly() when backgroundKeepAlive=true
- After speaking → resumeRecognitionOnly() when backgroundKeepAlive=true
- playAssistant() → resumeRecognitionOnly() for interrupt handling
- Never stop AVAudioEngine while in background voice mode
- VoIP + audio background modes for maximum background execution time

This follows iOS best practices for VoIP apps like Discord/WhatsApp that need
continuous background audio processing. The engine stays alive, preventing
iOS from suspending the app between speech recognition cycles.

Tested approach based on Apple's background execution guidelines and
real-world patterns from successful VoIP applications.
Add VoIP background mode, speaker bleed detection with grace period,
push message speech queue, multi-turn follow-up polling, audio route
change handling, startup chime, and config reload throttling. Rename
settings labels for clarity.
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 7 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +765 to +802
let tapDiagnostics = AudioTapDiagnostics(label: "talk") { [weak self] level in
guard let self else { return }
Task { @MainActor in
let raw = max(0, min(Double(level) * 10.0, 1.0))
let next = (self.micLevel * 0.80) + (raw * 0.20)
self.micLevel = next

if self.isListening, !self.isSpeaking, !self.noiseFloorReady {
self.noiseFloorSamples.append(raw)
if self.noiseFloorSamples.count >= 22 {
let sorted = self.noiseFloorSamples.sorted()
let take = max(6, sorted.count / 2)
let slice = sorted.prefix(take)
let avg = slice.reduce(0.0, +) / Double(slice.count)
self.noiseFloor = avg
self.noiseFloorReady = true
self.noiseFloorSamples.removeAll(keepingCapacity: true)
let threshold = min(0.35, max(0.12, avg + 0.10))
GatewayDiagnostics.log(
"talk audio: noiseFloor=\(String(format: "%.3f", avg)) threshold=\(String(format: "%.3f", threshold))")
}
}

// Track speaker bleed baseline during TTS for interrupt gating.
if self.isSpeechOutputActive {
self.ttsAudioBaseline = (self.ttsAudioBaseline * 0.92) + (raw * 0.08)
}

let threshold: Double = if let floor = self.noiseFloor, self.noiseFloorReady {
min(0.35, max(0.12, floor + 0.10))
} else {
0.18
}
if raw >= threshold {
self.lastAudioActivity = Date()
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication: entire audio tap setup logic duplicated in resumeRecognitionOnly

This 40-line block (lines 765-802) is nearly identical to the tap setup in startRecognition (lines 520-560). Consider extracting to a shared method like setupAudioTap(request:) to reduce maintenance burden and prevent drift.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 765:802

Comment:
Code duplication: entire audio tap setup logic duplicated in `resumeRecognitionOnly`

This 40-line block (lines 765-802) is nearly identical to the tap setup in `startRecognition` (lines 520-560). Consider extracting to a shared method like `setupAudioTap(request:)` to reduce maintenance burden and prevent drift.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +758 to +762
// Remove the old tap (kept alive during pause) before installing the new one
if self.inputTapInstalled {
input.removeTap(onBus: 0)
self.inputTapInstalled = false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify tap state sync across engine restart paths

In pauseRecognitionOnly (line 703-719), the tap is kept installed. Here in resumeRecognitionOnly, you remove it before re-installing. If the audio engine stops/restarts outside this flow, the tap state could become inconsistent with the inputTapInstalled flag.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 758:762

Comment:
Verify tap state sync across engine restart paths

In `pauseRecognitionOnly` (line 703-719), the tap is kept installed. Here in `resumeRecognitionOnly`, you remove it before re-installing. If the audio engine stops/restarts outside this flow, the tap state could become inconsistent with the `inputTapInstalled` flag.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +703 to +718
private func pauseRecognitionOnly() {
self.recognitionTask?.cancel()
self.recognitionTask = nil
self.recognitionRequest?.endAudio()
self.recognitionRequest = nil
self.micLevel = 0
self.lastAudioActivity = nil
self.noiseFloorSamples.removeAll(keepingCapacity: true)
self.noiseFloor = nil
self.noiseFloorReady = false
// CRITICAL: Do NOT remove the input tap or stop the audioEngine in background mode.
// Keeping the tap installed means audio continues flowing through the engine,
// which iOS recognizes as active audio work and won't suspend us.
// The tap's captured request reference has had endAudio() called, so append() is a no-op.
self.speechRecognizer = nil
self.logger.info("paused recognition only, keeping engine + tap running for background mode (tapInstalled=\(self.inputTapInstalled))")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tap continues processing with orphaned request after endAudio()

After calling recognitionRequest?.endAudio() on line 706, the tap's captured request reference is stale. The comment (line 716) says "append() is a no-op" but this isn't guaranteed - appending to an ended request could cause crashes or undefined behavior. The tap callback in makeAudioTapAppendCallback (line 862) calls request.append() unconditionally, which may not be safe after endAudio().

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 703:718

Comment:
Tap continues processing with orphaned request after `endAudio()`

After calling `recognitionRequest?.endAudio()` on line 706, the tap's captured `request` reference is stale. The comment (line 716) says "append() is a no-op" but this isn't guaranteed - appending to an ended request could cause crashes or undefined behavior. The tap callback in `makeAudioTapAppendCallback` (line 862) calls `request.append()` unconditionally, which may not be safe after `endAudio()`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +818 to +843
self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
guard let self else { return }
if let error {
let msg = error.localizedDescription
GatewayDiagnostics.log("talk speech: error=\(msg)")
if !self.isSpeaking {
if msg.localizedCaseInsensitiveContains("no speech detected") {
self.statusText = self.isEnabled ? "Listening" : "Speech error: \(msg)"
} else {
self.statusText = "Speech error: \(msg)"
}
}
self.logger.debug("speech recognition error: \(msg, privacy: .public)")
if self.captureMode == .continuous, self.isEnabled, !self.isSpeaking {
Task { @MainActor [weak self] in
await self?.restartRecognitionAfterError()
}
}
}
guard let result else { return }
let transcript = result.bestTranscription.formattedString
if !result.isFinal, !self.loggedPartialThisCycle {
let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
if !trimmed.isEmpty {
self.loggedPartialThisCycle = true
GatewayDiagnostics.log("talk speech: partial chars=\(trimmed.count)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling inconsistency: captureMode != .idle check missing in resumeRecognitionOnly

In the main startRecognition error handler (lines 600-627), there's a check for captureMode != .idle before treating errors as recoverable. The resumeRecognitionOnly version (lines 818-843) lacks this check on line 823, meaning it may treat intentional stops differently.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 818:843

Comment:
Error handling inconsistency: `captureMode != .idle` check missing in `resumeRecognitionOnly`

In the main `startRecognition` error handler (lines 600-627), there's a check for `captureMode != .idle` before treating errors as recoverable. The `resumeRecognitionOnly` version (lines 818-843) lacks this check on line 823, meaning it may treat intentional stops differently.

How can I resolve this? If you propose a fix, please make it concise.

}

if params.speak ?? true {
if params.speak == true {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantic change: speak ?? truespeak == true

Before: params.speak ?? true (spoke by default when nil)
After: params.speak == true (only speaks when explicitly true)

This changes behavior when speak is nil - previously would speak, now won't. If the API contract expects nil to mean "speak by default", this breaks backward compatibility.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Model/NodeAppModel.swift
Line: 1074:1074

Comment:
Semantic change: `speak ?? true``speak == true`

Before: `params.speak ?? true` (spoke by default when nil)
After: `params.speak == true` (only speaks when explicitly true)

This changes behavior when `speak` is `nil` - previously would speak, now won't. If the API contract expects `nil` to mean "speak by default", this breaks backward compatibility.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 15, 2026

Additional Comments (2)

apps/ios/Sources/Voice/TalkModeManager.swift
Silent audio player loop with volume 0.0 may not satisfy background audio requirement

Setting player.volume = 0.0 (line 1218) creates a completely silent loop. iOS may detect this as fake background activity and suspend the app anyway. Apple's background audio guidelines expect audible content. The comment acknowledges the AVAudioEngine should be primary keepalive, but if the engine stops, this silent loop may not be sufficient.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 1195:1226

Comment:
Silent audio player loop with volume 0.0 may not satisfy background audio requirement

Setting `player.volume = 0.0` (line 1218) creates a completely silent loop. iOS may detect this as fake background activity and suspend the app anyway. Apple's background audio guidelines expect audible content. The comment acknowledges the AVAudioEngine should be primary keepalive, but if the engine stops, this silent loop may not be sufficient.

How can I resolve this? If you propose a fix, please make it concise.

apps/ios/Sources/Voice/TalkModeManager.swift
Missing .allowBluetooth before this PR

The PR adds .allowBluetooth and .allowBluetoothA2DP (lines 2095-2096) but only .allowBluetoothHFP existed before. This means Bluetooth audio didn't work properly before this change. This is correct, but worth highlighting as a behavior change that may affect users.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 2092:2098

Comment:
Missing `.allowBluetooth` before this PR

The PR adds `.allowBluetooth` and `.allowBluetoothA2DP` (lines 2095-2096) but only `.allowBluetoothHFP` existed before. This means Bluetooth audio didn't work properly before this change. This is correct, but worth highlighting as a behavior change that may affect users.

How can I resolve this? If you propose a fix, please make it concise.

@mbelinky
Copy link
Contributor

Closing as superseded by the merged slice PRs from this branch's intent:\n- #18250 (Voice Directive Hint)\n- #18261 (Background Listening core)\n- #18265 (Talk hardening: route-based barge-in gating)\n\nAttribution follow-ups were also applied in changelog where requested.

@mbelinky
Copy link
Contributor

Superseded by #18250, #18261, and #18265.

@mbelinky mbelinky closed this Feb 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments