
fix(ios): auto-fallback from PCM to MP3 for ElevenLabs TTS #30885

Closed
leepokai wants to merge 2 commits into openclaw:main from leepokai:fix/ios-tts-default-mp3

Conversation

@leepokai leepokai commented Mar 1, 2026

Summary

  • Keep pcm_44100 as the default TTS output format (better quality/lower latency for Pro users)
  • When the ElevenLabs API rejects PCM with HTTP 403 (subscription_required), automatically fall back to mp3_44100_128 and set pcmFormatUnavailable to skip PCM for all subsequent requests in the session
  • The flag resets on reloadConfig() (reconnect/new session) so it re-probes the API
  • Standardize MP3 fallback format from mp3_44100 to mp3_44100_128 for consistent bitrate
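
The behavior in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the PR's actual code: `FormatFallbackSession`, `SynthesisError`, and `synthesize(_:)` are hypothetical names, and the real change detects the failure during streamed playback rather than from a thrown error. Only `pcmFormatUnavailable`, `effectiveDefaultOutputFormat`, `reloadConfig()`, and the format strings come from this PR's discussion.

```swift
// Hypothetical error type standing in for an API rejection.
enum SynthesisError: Error {
    case http(status: Int)
}

// Hypothetical session object; the real logic lives in TalkModeManager.
final class FormatFallbackSession {
    static let pcmFormat = "pcm_44100"
    static let mp3Format = "mp3_44100_128"

    // Set once PCM has been rejected; cleared by reloadConfig().
    private(set) var pcmFormatUnavailable = false

    // PCM by default, MP3 once a rejection has been remembered.
    var effectiveDefaultOutputFormat: String {
        pcmFormatUnavailable ? Self.mp3Format : Self.pcmFormat
    }

    // Calls `request` with the current default format. If a PCM request is
    // rejected with HTTP 403, remembers that and retries once with MP3.
    func synthesize(_ request: (String) throws -> [UInt8]) throws -> [UInt8] {
        let format = effectiveDefaultOutputFormat
        do {
            return try request(format)
        } catch SynthesisError.http(let status) where status == 403 && format == Self.pcmFormat {
            pcmFormatUnavailable = true
            return try request(Self.mp3Format)
        }
    }

    // Reconnect or new session: clear the flag so PCM is probed again.
    func reloadConfig() {
        pcmFormatUnavailable = false
    }
}
```

With this shape, only the first rejected segment costs two API calls; later segments go straight to MP3 until `reloadConfig()` runs.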

Root cause

pcm_44100 output format requires an ElevenLabs Pro tier subscription. Users on free or starter plans receive a silent HTTP 403 error, causing TTS to fail with no audio output. The existing PCM-to-MP3 retry path fired on every segment without remembering the failure, wasting an API call each time.

Test plan

  • ElevenLabs Pro user: default PCM format works, no fallback triggered
  • ElevenLabs free/starter user: first TTS segment retries with MP3, all subsequent segments use MP3 directly (no double API calls)
  • After gateway reconnect, the format preference resets and re-probes PCM
  • Explicit outputFormat in talk config overrides the default (both PCM and MP3)
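
The format selection exercised by the test plan above could be expressed as a small pure function. This is a sketch under assumptions: `resolveOutputFormat(explicit:pcmFormatUnavailable:)` is a hypothetical helper, not a function from the PR, and it assumes an explicit `outputFormat` always wins over the session default.

```swift
// Hypothetical helper: picks the output format for the next TTS request.
// An explicit, non-empty format from talk config takes precedence;
// otherwise PCM is used unless it was rejected earlier in the session.
func resolveOutputFormat(explicit: String?, pcmFormatUnavailable: Bool) -> String {
    if let explicit = explicit, !explicit.isEmpty {
        return explicit
    }
    return pcmFormatUnavailable ? "mp3_44100_128" : "pcm_44100"
}
```

Keeping the decision in one pure function would make each test-plan case checkable without a network call.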

🤖 Generated with Claude Code

The default output format pcm_44100 requires an ElevenLabs Pro tier
subscription. Users on free or starter plans get a silent 403 failure
and hear no audio.

Instead of hardcoding mp3, keep pcm_44100 as the default (better
quality for Pro users) but remember the failure: when a PCM request
is rejected, set pcmFormatUnavailable and use mp3_44100_128 for all
subsequent requests in the session. The flag resets on config reload
so it re-probes after reconnection.

Also standardize the MP3 fallback format from mp3_44100 to
mp3_44100_128 for consistent bitrate.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbc26ef9f3

self.gatewayTalkDefaultModelId = nil
self.gatewayTalkApiKeyConfigured = false
self.gatewayTalkConfigLoaded = false
self.pcmFormatUnavailable = false

P2 Badge Reset PCM fallback state on successful config reloads

pcmFormatUnavailable is only cleared in the catch branch of reloadConfig(), so a normal reconnect/config refresh leaves the flag stuck once it has been set. After a single PCM failure, subsequent sessions in the same app process will keep defaulting to MP3 instead of re-probing PCM, which contradicts the intended reconnect behavior and causes avoidable quality/latency degradation for users who can use PCM.
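
One way to address this, sketched with hypothetical names (`TalkConfigState`, `configLoaded`, and the `fetch` closure are illustrative, not the app's real types): clear the flag before the fetch is attempted, so the success and failure paths both re-probe PCM.

```swift
// Hypothetical stand-in for the part of TalkModeManager that reloads config.
final class TalkConfigState {
    var pcmFormatUnavailable = false
    var configLoaded = false

    // `fetch` is an assumed closure representing the gateway config request.
    func reloadConfig(fetch: () throws -> Void) {
        // Reset up front, not only in the catch branch, so a successful
        // reload also clears a PCM failure remembered from the last session.
        pcmFormatUnavailable = false
        do {
            try fetch()
            configLoaded = true
        } catch {
            configLoaded = false
        }
    }
}
```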


var playback = await self.pcmPlayer.play(stream: stream, sampleRate: sampleRate)
if !playback.finished, playback.interruptedAt == nil {
self.logger.warning("pcm playback failed; retrying mp3")
self.pcmFormatUnavailable = true

P2 Badge Invalidate incremental context after marking PCM unavailable

When incremental playback hits a PCM failure, this sets pcmFormatUnavailable but does not refresh the cached IncrementalSpeechContext; updateIncrementalContextIfNeeded() reuses that context while the directive is unchanged, so context.outputFormat can remain PCM for later segments. In cases where prefetched MP3 audio is unavailable, speakIncrementalSegment will still issue a new PCM synthesis request and then retry MP3, so the new “skip PCM after first failure” optimization is not consistently applied in incremental mode.
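
A minimal sketch of the invalidation this comment asks for, assuming the cached context is keyed by the directive. `SpeechContextSketch`, `IncrementalPlannerSketch`, and `markPCMUnavailable()` are hypothetical; the real `IncrementalSpeechContext` and `updateIncrementalContextIfNeeded()` are not shown in this thread and may be structured differently.

```swift
// Hypothetical, reduced version of a cached incremental speech context.
struct SpeechContextSketch {
    let directiveKey: String
    let outputFormat: String
}

final class IncrementalPlannerSketch {
    private var cached: SpeechContextSketch?
    private(set) var pcmFormatUnavailable = false

    // Marking PCM unavailable also drops the cached context, so the next
    // lookup rebuilds it with the MP3 format even if the directive is unchanged.
    func markPCMUnavailable() {
        pcmFormatUnavailable = true
        cached = nil
    }

    // Reuses the cached context while the directive is unchanged.
    func context(forDirective key: String) -> SpeechContextSketch {
        if let cached = cached, cached.directiveKey == key {
            return cached
        }
        let format = pcmFormatUnavailable ? "mp3_44100_128" : "pcm_44100"
        let fresh = SpeechContextSketch(directiveKey: key, outputFormat: format)
        cached = fresh
        return fresh
    }
}
```

Without the `cached = nil` line, the second lookup for the same directive would keep returning the PCM context, which is the inconsistency described above.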


greptile-apps bot commented Mar 1, 2026

Greptile Summary

This PR implements a session-level fallback from PCM to MP3 for ElevenLabs TTS when the API rejects the PCM format (typically due to subscription tier limitations). The implementation correctly adds a pcmFormatUnavailable flag and uses it via effectiveDefaultOutputFormat to avoid repeated PCM attempts after initial failure. MP3 format is also standardized to mp3_44100_128 for consistent bitrate.

Critical issue found: The flag is only reset in the catch block of reloadConfig() (line 2006), meaning it will NOT reset on successful reconnection. This breaks the intended re-probe behavior described in the PR description. The flag should be reset at the start of reloadConfig() to re-probe PCM on every config reload.

Confidence Score: 2/5

  • Critical logic bug prevents the re-probe behavior from working as designed
  • Because the flag is reset only in the error path, PCM will not be retried after a successful reconnection, which breaks the core behavior described in the PR. The rest of the implementation is sound.
  • apps/ios/Sources/Voice/TalkModeManager.swift - flag reset logic needs correction

Last reviewed commit: fbc26ef

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

self.gatewayTalkDefaultModelId = nil
self.gatewayTalkApiKeyConfigured = false
self.gatewayTalkConfigLoaded = false
self.pcmFormatUnavailable = false

The flag is only reset in the catch block when config loading fails. It won't reset on successful reconnect, breaking the re-probe behavior described in the PR. Move this to the beginning of reloadConfig() (before line 1935) so it resets on every config reload.

Suggested change (apps/ios/Sources/Voice/TalkModeManager.swift, line 2006), replacing `self.pcmFormatUnavailable = false`:

```suggestion
        guard let gateway else { return }
        self.pcmFormatUnavailable = false
        do {
```

mbelinky added a commit that referenced this pull request Mar 3, 2026
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: f77e3d7
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

mbelinky commented Mar 3, 2026

Closing as superseded by merged PR #33032 (merge commit bf70610), which lands this PCM->MP3 fallback behavior (with additional guardrails from stack review). Thanks @leepokai for the original implementation.

@mbelinky mbelinky closed this Mar 3, 2026
dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
…law#33032)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: f77e3d7
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky