iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885) by mbelinky · Pull Request #33032 · openclaw/openclaw

mbelinky · 2026-03-03T13:10:39Z

Part 4/5 of iOS security stack.

Includes #30885 on top of Part 3.

Scope

Auto-fallback ElevenLabs TTS output from PCM to MP3 when PCM is rejected
Persist fallback choice for session after first PCM failure
Re-probe PCM availability on config reload

Files

apps/ios/Sources/Voice/TalkModeManager.swift

Stack

Base: Part 3/5 (stack/ios-sec-03-security-guards)
Next PR: Part 5/5 (stack/ios-sec-05-docs)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ad8bb8630

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T13:14:42Z

apps/ios/Sources/Voice/TalkModeManager.swift

+        let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(
+            requestedOutputFormat ?? self.effectiveDefaultOutputFormat)


Rebuild incremental context after marking PCM unavailable

This captures effectiveDefaultOutputFormat into IncrementalSpeechContext.outputFormat only once, but incremental playback reuses the same context while the directive is unchanged. When speakIncrementalSegment later sets pcmFormatUnavailable = true after a PCM failure, subsequent incremental segments still use the stale PCM format and keep incurring failed PCM attempts before MP3 retry. In long streamed replies, this defeats the intended session-level fallback and adds repeated latency/API calls.

Useful? React with 👍 / 👎.

greptile-apps · 2026-03-03T13:15:15Z

Greptile Summary

This PR adds a session-level PCM→MP3 fallback mechanism for ElevenLabs TTS: after the first PCM rejection (e.g., 403 subscription_required), a pcmFormatUnavailable flag is set so all subsequent requests use mp3_44100_128 directly, avoiding repeated failed PCM attempts. The flag is re-probed on config reload. The approach integrates cleanly with the existing dual-player architecture and the effectiveDefaultOutputFormat computed property is a clean abstraction.

However, two issues affect the implementation:

Incremental segment caching limitation: Once pcmFormatUnavailable is set, the cached IncrementalSpeechContext (which stores the PCM output format) is not invalidated. updateIncrementalContextIfNeeded skips a rebuild when the directive is unchanged, so every remaining segment in the same streaming response still requests PCM, fails, and retries — partially defeating the stated "persist fallback within session" goal. The optimization only takes hold at the next assistant turn.
Config reload reset timing: pcmFormatUnavailable is cleared before the network request completes. A failed reload will still reset the flag, causing an unnecessary PCM retry on the immediately following TTS call.

Confidence Score: 2/5

While TTS will always succeed via fallback, the implementation has two gaps that affect the stated goal of persisting fallback within a session.
The fallback mechanism is functionally correct for inter-turn scenarios, but the cached incremental context means segments within the same streaming response still retry PCM before falling back — a meaningful limitation. Additionally, the reload reset timing could trigger unnecessary PCM retries after failed reloads. Neither issue causes data corruption or security concerns, but both reduce the efficiency of the fallback and contradict the stated goal. The author should address at least the incremental context caching issue to fully realize the intended behavior.
apps/ios/Sources/Voice/TalkModeManager.swift — specifically the interaction between speakIncrementalSegment, updateIncrementalContextIfNeeded, and the pcmFormatUnavailable flag; and the timing of the reset in reloadConfig.

_{Last reviewed commit: 4ad8bb8}

greptile-apps

_{1 file reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-03-03T13:15:19Z

apps/ios/Sources/Voice/TalkModeManager.swift

            var playback = await self.pcmPlayer.play(stream: stream, sampleRate: sampleRate)
            if !playback.finished, playback.interruptedAt == nil {
                self.logger.warning("pcm playback failed; retrying mp3")
+                self.pcmFormatUnavailable = true
                self.lastPlaybackWasPCM = false


The pcmFormatUnavailable flag is set here when PCM playback fails, but the cached IncrementalSpeechContext is not invalidated. When updateIncrementalContextIfNeeded is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving incrementalSpeechContext.outputFormat set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate.

The optimization only takes effect at the next assistant turn when a new context is built via buildIncrementalSpeechContext, which correctly uses effectiveDefaultOutputFormat.

To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions:

self.pcmFormatUnavailable = true self.incrementalSpeechContext = nil // force rebuild on next segment self.lastPlaybackWasPCM = false

This approach ensures fallback is applied immediately to remaining segments in the current response.

Prompt To Fix With AI

This is a comment left during a code review. Path: apps/ios/Sources/Voice/TalkModeManager.swift Line: 1597-1601 Comment: The `pcmFormatUnavailable` flag is set here when PCM playback fails, but the cached `IncrementalSpeechContext` is not invalidated. When `updateIncrementalContextIfNeeded` is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving `incrementalSpeechContext.outputFormat` set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate. The optimization only takes effect at the *next* assistant turn when a new context is built via `buildIncrementalSpeechContext`, which correctly uses `effectiveDefaultOutputFormat`. To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions: ```swift self.pcmFormatUnavailable = true self.incrementalSpeechContext = nil // force rebuild on next segment self.lastPlaybackWasPCM = false ``` This approach ensures fallback is applied immediately to remaining segments in the current response. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-03T13:15:20Z

apps/ios/Sources/Voice/TalkModeManager.swift

    func reloadConfig() async {
        guard let gateway else { return }
+        self.pcmFormatUnavailable = false
        do {


pcmFormatUnavailable is reset to false before the network request completes. If gateway.request throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback.

Consider resetting the flag only after a successful response:

func reloadConfig() async { guard let gateway else { return } do { let res = try await gateway.request( method: "talk.config", paramsJSON: "{\"includeSecrets\":true}", timeoutSeconds: 8 ) // Reset here after confirming successful response self.pcmFormatUnavailable = false // ... rest of processing } catch { // Flag remains set if reload fails } }

If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help.

Prompt To Fix With AI

This is a comment left during a code review. Path: apps/ios/Sources/Voice/TalkModeManager.swift Line: 1936-1939 Comment: `pcmFormatUnavailable` is reset to `false` before the network request completes. If `gateway.request` throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback. Consider resetting the flag only after a successful response: ```swift func reloadConfig() async { guard let gateway else { return } do { let res = try await gateway.request( method: "talk.config", paramsJSON: "{\"includeSecrets\":true}", timeoutSeconds: 8 ) // Reset here after confirming successful response self.pcmFormatUnavailable = false // ... rest of processing } catch { // Flag remains set if reload fails } } ``` If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7d43afdcd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T15:24:46Z

apps/ios/Sources/Voice/TalkModeManager.swift

                let desiredOutputFormat = (directive?.outputFormat ?? self.defaultOutputFormat)?
                    .trimmingCharacters(in: .whitespacesAndNewlines)
                let requestedOutputFormat = (desiredOutputFormat?.isEmpty == false) ? desiredOutputFormat : nil
-                let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(requestedOutputFormat ?? "pcm_44100")
+                let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(
+                    requestedOutputFormat ?? self.effectiveDefaultOutputFormat)


Apply session fallback when default output format is PCM

This still gives precedence to directive?.outputFormat ?? self.defaultOutputFormat, so once pcmFormatUnavailable is set, the MP3 fallback is skipped whenever config has an explicit outputFormat like pcm_44100 (loaded into defaultOutputFormat in reloadConfig). In that setup, each new utterance retries PCM, fails, and then retries MP3 again, which defeats the intended session-level fallback and adds repeated latency/API calls.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72b9e1eccb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T15:29:51Z

apps/ios/Sources/Voice/TalkModeManager.swift


    func reloadConfig() async {
        guard let gateway else { return }
+        self.pcmFormatUnavailable = false


Preserve PCM fallback across turns when reloading config

Resetting pcmFormatUnavailable at the start of reloadConfig clears the session-level fallback on every config refresh, and processTranscript calls reloadConfig() before each assistant reply. In environments where ElevenLabs rejects PCM, this causes every turn to retry PCM and incur the same failed request + MP3 retry latency instead of persisting the fallback for the active session. Reset should happen only after a successful re-probe/config change, not unconditionally on each reload call.

Useful? React with 👍 / 👎.

…mbelinky

…mbelinky

…mbelinky

…mbelinky

…mbelinky

…mbelinky

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87f34e305d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T16:18:11Z

apps/ios/Sources/Voice/TalkModeManager.swift

+        AsyncThrowingStream { continuation in
+            let task = Task {
+                do {
+                    for try await chunk in stream {
+                        continuation.yield(chunk)


Preserve backpressure when wrapping PCM streams

monitorStreamFailures forwards rawStream through a new AsyncThrowingStream driven by a separate Task, which uses the default unbounded buffering policy. That removes backpressure between playback and network reads, so on long responses the producer can enqueue audio chunks much faster than the PCM player consumes them, causing avoidable memory growth (and potential jetsam on lower-memory devices). Keep producer/consumer coupled or use a bounded buffering policy when proxying chunks.

Useful? React with 👍 / 👎.

openclaw-barnacle bot added the app: ios App: ios label Mar 3, 2026

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

greptile-apps bot reviewed Mar 3, 2026

View reviewed changes

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 64ce1a1 to 0477d46 Compare March 3, 2026 15:19

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 4ad8bb8 to c7d43af Compare March 3, 2026 15:19

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 0477d46 to 7504ea2 Compare March 3, 2026 15:25

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from c7d43af to 72b9e1e Compare March 3, 2026 15:26

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

openclaw-barnacle bot added size: S maintainer Maintainer-authored PR labels Mar 3, 2026

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 7504ea2 to 136ec0d Compare March 3, 2026 15:43

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 72b9e1e to 172033e Compare March 3, 2026 15:43

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 136ec0d to 53cfcb4 Compare March 3, 2026 15:56

mbelinky added a commit that referenced this pull request Mar 3, 2026

chore(changelog): add iOS TTS fallback entry for 33032 #33032 thanks @…

a330f48

…mbelinky

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 172033e to a330f48 Compare March 3, 2026 15:56

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 53cfcb4 to 961dfc0 Compare March 3, 2026 16:10

mbelinky added a commit that referenced this pull request Mar 3, 2026

chore(changelog): add iOS TTS fallback entry for 33032 #33032 thanks @…

478be3e

…mbelinky

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from a330f48 to 478be3e Compare March 3, 2026 16:10

mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 961dfc0 to 84864c4 Compare March 3, 2026 16:13

mbelinky added a commit that referenced this pull request Mar 3, 2026

chore(changelog): add iOS TTS fallback entry for 33032 #33032 thanks @…

87f34e3

…mbelinky

mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 478be3e to 87f34e3 Compare March 3, 2026 16:13

mbelinky changed the base branch from stack/ios-sec-03-security-guards to main March 3, 2026 16:17

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885)#33032

iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885)#33032
mbelinky merged 4 commits intomainfrom
stack/ios-sec-04-tts-fallback

mbelinky commented Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

greptile-apps bot commented Mar 3, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Mar 3, 2026

Uh oh!

greptile-apps bot Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

		let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(
		requestedOutputFormat ?? self.effectiveDefaultOutputFormat)

Uh oh!

Conversation

mbelinky commented Mar 3, 2026

Scope

Files

Stack

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Mar 3, 2026

Greptile Summary

Confidence Score: 2/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!