iOS Security Stack 4/5: TTS PCM->MP3 Fallback (#30885)#33032
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ad8bb8630
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let outputFormat = ElevenLabsTTSClient.validatedOutputFormat( | ||
| requestedOutputFormat ?? self.effectiveDefaultOutputFormat) |
There was a problem hiding this comment.
Rebuild incremental context after marking PCM unavailable
This captures effectiveDefaultOutputFormat into IncrementalSpeechContext.outputFormat only once, but incremental playback reuses the same context while the directive is unchanged. When speakIncrementalSegment later sets pcmFormatUnavailable = true after a PCM failure, subsequent incremental segments still use the stale PCM format and keep incurring failed PCM attempts before MP3 retry. In long streamed replies, this defeats the intended session-level fallback and adds repeated latency/API calls.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR adds a session-level PCM→MP3 fallback mechanism for ElevenLabs TTS: after the first PCM rejection (e.g., 403 However, two issues affect the implementation:
Confidence Score: 2/5
Last reviewed commit: 4ad8bb8 |
| var playback = await self.pcmPlayer.play(stream: stream, sampleRate: sampleRate) | ||
| if !playback.finished, playback.interruptedAt == nil { | ||
| self.logger.warning("pcm playback failed; retrying mp3") | ||
| self.pcmFormatUnavailable = true | ||
| self.lastPlaybackWasPCM = false |
There was a problem hiding this comment.
The pcmFormatUnavailable flag is set here when PCM playback fails, but the cached IncrementalSpeechContext is not invalidated. When updateIncrementalContextIfNeeded is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving incrementalSpeechContext.outputFormat set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate.
The optimization only takes effect at the next assistant turn when a new context is built via buildIncrementalSpeechContext, which correctly uses effectiveDefaultOutputFormat.
To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions:
self.pcmFormatUnavailable = true
self.incrementalSpeechContext = nil // force rebuild on next segment
self.lastPlaybackWasPCM = falseThis approach ensures fallback is applied immediately to remaining segments in the current response.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 1597-1601
Comment:
The `pcmFormatUnavailable` flag is set here when PCM playback fails, but the cached `IncrementalSpeechContext` is not invalidated. When `updateIncrementalContextIfNeeded` is called for the next segment in the same streaming response, it returns early if the directive hasn't changed (line 1468), leaving `incrementalSpeechContext.outputFormat` set to PCM. This means each remaining segment still requests PCM, fails, and retries — the behavior this PR aims to eliminate.
The optimization only takes effect at the *next* assistant turn when a new context is built via `buildIncrementalSpeechContext`, which correctly uses `effectiveDefaultOutputFormat`.
To fully honor the "persist fallback choice for session" goal for mid-stream segments, consider invalidating the cached context when the flag transitions:
```swift
self.pcmFormatUnavailable = true
self.incrementalSpeechContext = nil // force rebuild on next segment
self.lastPlaybackWasPCM = false
```
This approach ensures fallback is applied immediately to remaining segments in the current response.
How can I resolve this? If you propose a fix, please make it concise.| func reloadConfig() async { | ||
| guard let gateway else { return } | ||
| self.pcmFormatUnavailable = false | ||
| do { |
There was a problem hiding this comment.
pcmFormatUnavailable is reset to false before the network request completes. If gateway.request throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback.
Consider resetting the flag only after a successful response:
func reloadConfig() async {
guard let gateway else { return }
do {
let res = try await gateway.request(
method: "talk.config",
paramsJSON: "{\"includeSecrets\":true}",
timeoutSeconds: 8
)
// Reset here after confirming successful response
self.pcmFormatUnavailable = false
// ... rest of processing
} catch {
// Flag remains set if reload fails
}
}If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/ios/Sources/Voice/TalkModeManager.swift
Line: 1936-1939
Comment:
`pcmFormatUnavailable` is reset to `false` before the network request completes. If `gateway.request` throws (network error, timeout, etc.), the flag is already cleared. The next TTS call will then attempt PCM again even though the subscription status has not actually been re-confirmed, potentially causing an unnecessary PCM round-trip and immediate fallback.
Consider resetting the flag only after a successful response:
```swift
func reloadConfig() async {
guard let gateway else { return }
do {
let res = try await gateway.request(
method: "talk.config",
paramsJSON: "{\"includeSecrets\":true}",
timeoutSeconds: 8
)
// Reset here after confirming successful response
self.pcmFormatUnavailable = false
// ... rest of processing
} catch {
// Flag remains set if reload fails
}
}
```
If re-probing on every reload attempt (including failures) is intentional, a comment explaining the rationale would help.
How can I resolve this? If you propose a fix, please make it concise.64ce1a1 to
0477d46
Compare
4ad8bb8 to
c7d43af
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7d43afdcd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let desiredOutputFormat = (directive?.outputFormat ?? self.defaultOutputFormat)? | ||
| .trimmingCharacters(in: .whitespacesAndNewlines) | ||
| let requestedOutputFormat = (desiredOutputFormat?.isEmpty == false) ? desiredOutputFormat : nil | ||
| let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(requestedOutputFormat ?? "pcm_44100") | ||
| let outputFormat = ElevenLabsTTSClient.validatedOutputFormat( | ||
| requestedOutputFormat ?? self.effectiveDefaultOutputFormat) |
There was a problem hiding this comment.
Apply session fallback when default output format is PCM
This still gives precedence to directive?.outputFormat ?? self.defaultOutputFormat, so once pcmFormatUnavailable is set, the MP3 fallback is skipped whenever config has an explicit outputFormat like pcm_44100 (loaded into defaultOutputFormat in reloadConfig). In that setup, each new utterance retries PCM, fails, and then retries MP3 again, which defeats the intended session-level fallback and adds repeated latency/API calls.
Useful? React with 👍 / 👎.
0477d46 to
7504ea2
Compare
c7d43af to
72b9e1e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 72b9e1eccb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| func reloadConfig() async { | ||
| guard let gateway else { return } | ||
| self.pcmFormatUnavailable = false |
There was a problem hiding this comment.
Preserve PCM fallback across turns when reloading config
Resetting pcmFormatUnavailable at the start of reloadConfig clears the session-level fallback on every config refresh, and processTranscript calls reloadConfig() before each assistant reply. In environments where ElevenLabs rejects PCM, this causes every turn to retry PCM and incur the same failed request + MP3 retry latency instead of persisting the fallback for the active session. Reset should happen only after a successful re-probe/config change, not unconditionally on each reload call.
Useful? React with 👍 / 👎.
7504ea2 to
136ec0d
Compare
72b9e1e to
172033e
Compare
136ec0d to
53cfcb4
Compare
172033e to
a330f48
Compare
53cfcb4 to
961dfc0
Compare
a330f48 to
478be3e
Compare
961dfc0 to
84864c4
Compare
478be3e to
87f34e3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87f34e305d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| AsyncThrowingStream { continuation in | ||
| let task = Task { | ||
| do { | ||
| for try await chunk in stream { | ||
| continuation.yield(chunk) |
There was a problem hiding this comment.
Preserve backpressure when wrapping PCM streams
monitorStreamFailures forwards rawStream through a new AsyncThrowingStream driven by a separate Task, which uses the default unbounded buffering policy. That removes backpressure between playback and network reads, so on long responses the producer can enqueue audio chunks much faster than the PCM player consumes them, causing avoidable memory growth (and potential jetsam on lower-memory devices). Keep producer/consumer coupled or use a bounded buffering policy when proxying chunks.
Useful? React with 👍 / 👎.
Part 4/5 of iOS security stack.
Includes #30885 on top of Part 3.
Scope
Files
Stack
stack/ios-sec-03-security-guards)stack/ios-sec-05-docs)