Skip to content

iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885)#33032

Merged
mbelinky merged 4 commits intomainfrom
stack/ios-sec-04-tts-fallback
Mar 3, 2026
Merged

iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885)#33032
mbelinky merged 4 commits intomainfrom
stack/ios-sec-04-tts-fallback

Conversation

@mbelinky
Copy link
Copy Markdown
Contributor

@mbelinky mbelinky commented Mar 3, 2026

Part 4/5 of iOS security stack.

Includes #30885 on top of Part 3.

Scope

  • Auto-fallback ElevenLabs TTS output from PCM to MP3 when PCM is rejected
  • Persist fallback choice for session after first PCM failure
  • Re-probe PCM availability on config reload

Files

  • apps/ios/Sources/Voice/TalkModeManager.swift

Stack

  • Base: Part 3/5 (stack/ios-sec-03-security-guards)
  • Next PR: Part 5/5 (stack/ios-sec-05-docs)

@openclaw-barnacle openclaw-barnacle bot added the app: ios App: ios label Mar 3, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ad8bb8630

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1488 to +1489
let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(
requestedOutputFormat ?? self.effectiveDefaultOutputFormat)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Rebuild incremental context after marking PCM unavailable

This captures effectiveDefaultOutputFormat into IncrementalSpeechContext.outputFormat only once, but incremental playback reuses the same context while the directive is unchanged. When speakIncrementalSegment later sets pcmFormatUnavailable = true after a PCM failure, subsequent incremental segments still use the stale PCM format and keep incurring failed PCM attempts before MP3 retry. In long streamed replies, this defeats the intended session-level fallback and adds repeated latency/API calls.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR adds a session-level PCM→MP3 fallback mechanism for ElevenLabs TTS: after the first PCM rejection (e.g., 403 subscription_required), a pcmFormatUnavailable flag is set so all subsequent requests use mp3_44100_128 directly, avoiding repeated failed PCM attempts. The flag is re-probed on config reload. The approach integrates cleanly with the existing dual-player architecture and the effectiveDefaultOutputFormat computed property is a clean abstraction.

However, two issues affect the implementation:

  1. Incremental segment caching limitation: Once pcmFormatUnavailable is set, the cached IncrementalSpeechContext (which stores the PCM output format) is not invalidated. updateIncrementalContextIfNeeded skips a rebuild when the directive is unchanged, so every remaining segment in the same streaming response still requests PCM, fails, and retries — partially defeating the stated "persist fallback within session" goal. The optimization only takes hold at the next assistant turn.

  2. Config reload reset timing: pcmFormatUnavailable is cleared before the network request completes. A failed reload will still reset the flag, causing an unnecessary PCM retry on the immediately following TTS call.

Confidence Score: 2/5

  • While TTS will always succeed via fallback, the implementation has two gaps that affect the stated goal of persisting fallback within a session.
  • The fallback mechanism is functionally correct for inter-turn scenarios, but the cached incremental context means segments within the same streaming response still retry PCM before falling back — a meaningful limitation. Additionally, the reload reset timing could trigger unnecessary PCM retries after failed reloads. Neither issue causes data corruption or security concerns, but both reduce the efficiency of the fallback and contradict the stated goal. The author should address at least the incremental context caching issue to fully realize the intended behavior.
  • apps/ios/Sources/Voice/TalkModeManager.swift — specifically the interaction between speakIncrementalSegment, updateIncrementalContextIfNeeded, and the pcmFormatUnavailable flag; and the timing of the reset in reloadConfig.

Last reviewed commit: 4ad8bb8

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 1597 to 1601
var playback = await self.pcmPlayer.play(stream: stream, sampleRate: sampleRate)
if !playback.finished, playback.interruptedAt == nil {
self.logger.warning("pcm playback failed; retrying mp3")
self.pcmFormatUnavailable = true
self.lastPlaybackWasPCM = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pcmFormatUnavailable flag is set here when PCM playback fails, but the cached IncrementalSpeechContext is not invalidated. When updateIncrementalContextIfNeeded is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving incrementalSpeechContext.outputFormat set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate.

The optimization only takes effect at the next assistant turn when a new context is built via buildIncrementalSpeechContext, which correctly uses effectiveDefaultOutputFormat.

To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions:

self.pcmFormatUnavailable = true
self.incrementalSpeechContext = nil   // force rebuild on next segment
self.lastPlaybackWasPCM = false

This approach ensures fallback is applied immediately to remaining segments in the current response.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 1597-1601

Comment:
The `pcmFormatUnavailable` flag is set here when PCM playback fails, but the cached `IncrementalSpeechContext` is not invalidated. When `updateIncrementalContextIfNeeded` is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving `incrementalSpeechContext.outputFormat` set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate.

The optimization only takes effect at the *next* assistant turn when a new context is built via `buildIncrementalSpeechContext`, which correctly uses `effectiveDefaultOutputFormat`.

To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions:

```swift
self.pcmFormatUnavailable = true
self.incrementalSpeechContext = nil   // force rebuild on next segment
self.lastPlaybackWasPCM = false
```

This approach ensures fallback is applied immediately to remaining segments in the current response.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 1936 to 1939
func reloadConfig() async {
guard let gateway else { return }
self.pcmFormatUnavailable = false
do {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pcmFormatUnavailable is reset to false before the network request completes. If gateway.request throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback.

Consider resetting the flag only after a successful response:

func reloadConfig() async {
    guard let gateway else { return }
    do {
        let res = try await gateway.request(
            method: "talk.config",
            paramsJSON: "{\"includeSecrets\":true}",
            timeoutSeconds: 8
        )
        // Reset here after confirming successful response
        self.pcmFormatUnavailable = false
        // ... rest of processing
    } catch {
        // Flag remains set if reload fails
    }
}

If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 1936-1939

Comment:
`pcmFormatUnavailable` is reset to `false` before the network request completes. If `gateway.request` throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback.

Consider resetting the flag only after a successful response:

```swift
func reloadConfig() async {
    guard let gateway else { return }
    do {
        let res = try await gateway.request(
            method: "talk.config",
            paramsJSON: "{\"includeSecrets\":true}",
            timeoutSeconds: 8
        )
        // Reset here after confirming successful response
        self.pcmFormatUnavailable = false
        // ... rest of processing
    } catch {
        // Flag remains set if reload fails
    }
}
```

If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help.

How can I resolve this? If you propose a fix, please make it concise.

@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 64ce1a1 to 0477d46 Compare March 3, 2026 15:19
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 4ad8bb8 to c7d43af Compare March 3, 2026 15:19
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7d43afdcd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 1027 to +1031
let desiredOutputFormat = (directive?.outputFormat ?? self.defaultOutputFormat)?
.trimmingCharacters(in: .whitespacesAndNewlines)
let requestedOutputFormat = (desiredOutputFormat?.isEmpty == false) ? desiredOutputFormat : nil
let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(requestedOutputFormat ?? "pcm_44100")
let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(
requestedOutputFormat ?? self.effectiveDefaultOutputFormat)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply session fallback when default output format is PCM

This still gives precedence to directive?.outputFormat ?? self.defaultOutputFormat, so once pcmFormatUnavailable is set, the MP3 fallback is skipped whenever config has an explicit outputFormat like pcm_44100 (loaded into defaultOutputFormat in reloadConfig). In that setup, each new utterance retries PCM, fails, and then retries MP3 again, which defeats the intended session-level fallback and adds repeated latency/API calls.

Useful? React with 👍 / 👎.

@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 0477d46 to 7504ea2 Compare March 3, 2026 15:25
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from c7d43af to 72b9e1e Compare March 3, 2026 15:26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72b9e1eccb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


func reloadConfig() async {
guard let gateway else { return }
self.pcmFormatUnavailable = false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve PCM fallback across turns when reloading config

Resetting pcmFormatUnavailable at the start of reloadConfig clears the session-level fallback on every config refresh, and processTranscript calls reloadConfig() before each assistant reply. In environments where ElevenLabs rejects PCM, this causes every turn to retry PCM and incur the same failed request + MP3 retry latency instead of persisting the fallback for the active session. Reset should happen only after a successful re-probe/config change, not unconditionally on each reload call.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle bot added size: S maintainer Maintainer-authored PR labels Mar 3, 2026
@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 7504ea2 to 136ec0d Compare March 3, 2026 15:43
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 72b9e1e to 172033e Compare March 3, 2026 15:43
@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 136ec0d to 53cfcb4 Compare March 3, 2026 15:56
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 172033e to a330f48 Compare March 3, 2026 15:56
@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 53cfcb4 to 961dfc0 Compare March 3, 2026 16:10
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from a330f48 to 478be3e Compare March 3, 2026 16:10
@mbelinky mbelinky force-pushed the stack/ios-sec-03-security-guards branch from 961dfc0 to 84864c4 Compare March 3, 2026 16:13
@mbelinky mbelinky force-pushed the stack/ios-sec-04-tts-fallback branch from 478be3e to 87f34e3 Compare March 3, 2026 16:13
@mbelinky mbelinky changed the base branch from stack/ios-sec-03-security-guards to main March 3, 2026 16:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87f34e305d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1574 to +1578
AsyncThrowingStream { continuation in
let task = Task {
do {
for try await chunk in stream {
continuation.yield(chunk)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve backpressure when wrapping PCM streams

monitorStreamFailures forwards rawStream through a new AsyncThrowingStream driven by a separate Task, which uses the default unbounded buffering policy. That removes backpressure between playback and network reads, so on long responses the producer can enqueue audio chunks much faster than the PCM player consumes them, causing avoidable memory growth (and potential jetsam on lower-memory devices). Keep producer/consumer coupled or use a bounded buffering policy when proxying chunks.

Useful? React with 👍 / 👎.

@openclaw-barnacle