fix: route voice param correctly for CustomVoice TTS models #467

Merged
jundot merged 1 commit into jundot:main from ethannortharc:fix/tts-customvoice-routing
Mar 30, 2026
ethannortharc (Contributor) commented Mar 29, 2026

Summary

  • Fix voice parameter routing in TTSEngine.synthesize() that caused HTTP 500 for CustomVoice models (e.g. Qwen3-TTS-12Hz-1.7B-CustomVoice)
  • The previous logic checked for instruct first, misrouting the speaker name when a model accepts both voice and instruct parameters
  • Now prioritizes voice when present; falls back to instruct only for VoiceDesign-only models
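The priority swap described above can be sketched as follows. This is a minimal illustration, not the code in omlx/engine/tts.py: the helper names `route_old` and `route_new` are hypothetical, and `gen_params` is assumed to be the set of parameter names accepted by the model's generate().

```python
def route_old(gen_params, voice):
    """Previous behaviour (sketch): 'instruct' is checked first."""
    kwargs = {}
    if "instruct" in gen_params:
        # A speaker name such as "vivian" ends up in the wrong kwarg here.
        kwargs["instruct"] = voice
    elif "voice" in gen_params:
        kwargs["voice"] = voice
    return kwargs


def route_new(gen_params, voice):
    """Behaviour after this PR (sketch): 'voice' is checked first."""
    kwargs = {}
    if "voice" in gen_params:
        kwargs["voice"] = voice
    elif "instruct" in gen_params:
        kwargs["instruct"] = voice
    return kwargs


# A CustomVoice-style model accepts both parameters.
custom_voice_params = {"text", "voice", "instruct"}
print(route_old(custom_voice_params, "vivian"))  # {'instruct': 'vivian'}
print(route_new(custom_voice_params, "vivian"))  # {'voice': 'vivian'}
```

With the old ordering the speaker name never reaches the voice kwarg, which matches the "voice was missing" error reported in the linked issue.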

Fixes #461

Changes

  • omlx/engine/tts.py: Swap parameter priority — check voice before instruct
  • tests/test_audio_tts.py: Add TestTTSVoiceRouting with 3 test cases covering CustomVoice, VoiceDesign, and voice-only models
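For readers who want a feel for the three routing cases, here is a rough sketch of tests in that spirit. It is not the contents of TestTTSVoiceRouting: the fake model classes, `build_kwargs`, and the test names are all hypothetical, and reading the signature with `inspect.signature` is an assumption about how the engine discovers the accepted parameters.

```python
import inspect


class FakeCustomVoice:
    # Accepts both parameters, like the CustomVoice model in this PR.
    def generate(self, text, voice=None, instruct=None):
        return {"voice": voice, "instruct": instruct}


class FakeVoiceDesignOnly:
    # Hypothetical model whose generate() has no 'voice' parameter.
    def generate(self, text, instruct=None):
        return {"instruct": instruct}


class FakeVoiceOnly:
    # Hypothetical model whose generate() has no 'instruct' parameter.
    def generate(self, text, voice=None):
        return {"voice": voice}


def build_kwargs(model, voice):
    """Voice-first routing (sketch of the behaviour after this PR)."""
    gen_params = inspect.signature(model.generate).parameters
    if "voice" in gen_params:
        return {"voice": voice}
    if "instruct" in gen_params:
        return {"instruct": voice}
    return {}


def test_custom_voice_routes_to_voice():
    assert build_kwargs(FakeCustomVoice(), "vivian") == {"voice": "vivian"}


def test_voice_design_only_routes_to_instruct():
    assert build_kwargs(FakeVoiceDesignOnly(), "a calm narrator") == {
        "instruct": "a calm narrator"
    }


def test_voice_only_routes_to_voice():
    assert build_kwargs(FakeVoiceOnly(), "af_heart") == {"voice": "af_heart"}
```

These functions follow pytest's naming convention, so pytest would collect them if they were saved in a test module.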

Test plan

  • Unit tests pass (pytest tests/test_audio_tts.py — 15 passed)
  • Manual verification with Qwen3-TTS-12Hz-1.7B-CustomVoice-bf16: 7 speaker/text combinations tested (6 distinct speakers, vivian with two languages):
    • vivian (Chinese text) — 200 OK, valid WAV
    • aiden (Japanese text) — 200 OK, valid WAV
    • serena (English text) — 200 OK, valid WAV
    • ryan (English text) — 200 OK, valid WAV
    • eric (English text) — 200 OK, valid WAV
    • dylan (Chinese text) — 200 OK, valid WAV
    • vivian (English text) — 200 OK, valid WAV
  • Qwen3-TTS-12Hz-1.7B-Base-bf16 and Qwen3-TTS-12Hz-0.6B-Base-bf16 also verified working with voice param
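A manual check like the one above could be scripted roughly as below. This is only a sketch: the JSON field names `model` and `input` are assumed to follow the usual OpenAI-style speech request shape, both helper names are hypothetical, and no request is actually sent. The "valid WAV" check uses the standard library `wave` module.

```python
import io
import json
import wave


def build_speech_payload(model, text, voice):
    """Build a JSON body for POST /v1/audio/speech (field names assumed)."""
    return json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")


def is_valid_wav(data):
    """Return True if the bytes parse as a WAV file with at least one frame."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError):
        return False


payload = build_speech_payload(
    "Qwen3-TTS-12Hz-1.7B-CustomVoice-bf16", "Hello there.", "serena"
)
print(json.loads(payload)["voice"])  # serena
```

The response body from the server would then be passed to `is_valid_wav` after confirming the status code is 200.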

…dels

CustomVoice models (e.g. Qwen3-TTS-12Hz-1.7B-CustomVoice) accept both
'voice' and 'instruct' in generate(). The previous logic checked for
'instruct' first, causing the speaker name to be misrouted and the
model to raise a 500 error claiming 'voice' was missing.

Prioritize the 'voice' parameter when present; fall back to 'instruct'
only for VoiceDesign-only models that lack a 'voice' parameter.

Fixes jundot#461

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
jundot (Owner) commented Mar 30, 2026

Thanks for the fix, this was a solid root cause analysis.

I found one thing after merging though. All Qwen3-TTS variants (Base, CustomVoice, VoiceDesign) share the same generate(voice=, instruct=) signature, so checking for "voice" in gen_params is always true for every Qwen3-TTS model. This means the elif "instruct" branch never runs for VoiceDesign either.

In practice, if someone sends voice to a VoiceDesign model, it goes to the voice kwarg (which VoiceDesign ignores) and instruct stays empty, so the model raises a 500.
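The problem can be reproduced in miniature. The sketch below uses a hypothetical stand-in class rather than the real Qwen3-TTS models; the `ValueError` it raises is an assumption standing in for whatever error the real VoiceDesign model produces when instruct is empty.

```python
import inspect


class FakeQwen3TTS:
    """Stand-in: every variant shares generate(voice=, instruct=)."""

    def __init__(self, variant):
        self.variant = variant

    def generate(self, text, voice=None, instruct=None):
        if self.variant == "VoiceDesign" and not instruct:
            # Assumed failure mode; the server surfaces it as HTTP 500.
            raise ValueError("VoiceDesign needs a voice description in 'instruct'")
        return {"variant": self.variant, "voice": voice, "instruct": instruct}


def chosen_branch(model):
    """Report which branch the voice-first check takes for this model."""
    params = inspect.signature(model.generate).parameters
    if "voice" in params:
        return "voice"
    elif "instruct" in params:
        return "instruct"
    return None


# The signature is identical, so every variant takes the 'voice' branch.
for variant in ("Base", "CustomVoice", "VoiceDesign"):
    print(variant, chosen_branch(FakeQwen3TTS(variant)))

# A description sent as 'voice' never reaches 'instruct'.
design = FakeQwen3TTS("VoiceDesign")
try:
    design.generate("hello", voice="a calm narrator")
except ValueError as exc:
    print("VoiceDesign failed:", exc)
```

Because the signature carries no information about which variant is loaded, inspecting it cannot distinguish a speaker name from a voice description.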

I added two follow-up commits on main:

  1. Changed AudioSpeechRequest.voice default from "default" to None. No model recognized "default" as a valid speaker anyway, and None lets models use their own defaults (Kokoro defaults to "af_heart", etc).

  2. Added an instructions field to AudioSpeechRequest. This maps directly to the instruct kwarg in generate(), so VoiceDesign users can pass voice descriptions through instructions instead of voice. It also means CustomVoice can now receive both a speaker name (voice) and a voice description (instructions) at the same time.
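A rough sketch of the request shape after these two commits, for illustration only. The real AudioSpeechRequest is not shown in this thread, so the dataclass below is a stand-in: the `model` and `input` fields, the `to_generate_kwargs` helper, and the choice to omit unset values rather than pass None are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioSpeechRequest:
    model: str                           # assumed field
    input: str                           # assumed field
    voice: Optional[str] = None          # default was "default" before the follow-up
    instructions: Optional[str] = None   # maps to the instruct kwarg in generate()


def to_generate_kwargs(req):
    """Hypothetical helper: forward only the fields the client actually set."""
    kwargs = {}
    if req.voice is not None:
        kwargs["voice"] = req.voice
    if req.instructions is not None:
        kwargs["instruct"] = req.instructions
    return kwargs


# VoiceDesign: description goes through 'instructions', not 'voice'.
design = AudioSpeechRequest("voice-design-model", "Hi", instructions="a calm narrator")
print(to_generate_kwargs(design))   # {'instruct': 'a calm narrator'}

# CustomVoice: speaker name and description can be sent together.
custom = AudioSpeechRequest("custom-voice-model", "Hi", voice="vivian", instructions="cheerful")
print(to_generate_kwargs(custom))   # {'voice': 'vivian', 'instruct': 'cheerful'}

# Neither set: the model falls back to its own default voice.
print(to_generate_kwargs(AudioSpeechRequest("kokoro-model", "Hi")))  # {}
```

The model names in the usage lines are placeholders. The point is that the client states its intent explicitly, so the server no longer has to guess from the generate() signature.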

ethannortharc (Contributor, Author) commented
@jundot Thanks for the follow-up — that's a much cleaner solution. I hadn't considered that all Qwen3-TTS variants share the same signature, so the voice-first check would break VoiceDesign. Adding a separate instructions field makes the intent explicit at the API level instead of trying to infer it from the model signature. Good call on the None default too.

Development

Successfully merging this pull request may close these issues.

/v1/audio/speech returns 500 for Qwen3-TTS CustomVoice even when voice is provided
