fix: route voice param correctly for CustomVoice TTS models #467
jundot merged 1 commit into jundot:main
…dels
CustomVoice models (e.g. Qwen3-TTS-12Hz-1.7B-CustomVoice) accept both 'voice' and 'instruct' in generate(). The previous logic checked for 'instruct' first, causing the speaker name to be misrouted and the model to raise a 500 error claiming 'voice' was missing.
Prioritize the 'voice' parameter when present; fall back to 'instruct' only for VoiceDesign-only models that lack a 'voice' parameter.
Fixes jundot#461
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
747df95 to 31d2a67
Thanks for the fix; this was a solid root cause analysis. I found one thing after merging though.
All Qwen3-TTS variants (Base, CustomVoice, VoiceDesign) share the same … In practice, if someone sends … I added two follow-up commits on main:
@jundot Thanks for the follow-up; that's a much cleaner solution. I hadn't considered that all Qwen3-TTS variants share the same signature, so the …
Summary
- Fixes a bug in `TTSEngine.synthesize()` that caused HTTP 500 for CustomVoice models (e.g. `Qwen3-TTS-12Hz-1.7B-CustomVoice`)
- The previous logic checked for `instruct` first, misrouting the speaker name when a model accepts both `voice` and `instruct` parameters
- The fix prioritizes `voice` when present; falls back to `instruct` only for VoiceDesign-only models

Fixes #461
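The priority order described above can be illustrated with a minimal sketch. It is not the code in `omlx/engine/tts.py`: the helper name `route_speaker_kwarg`, the stand-in model classes, and the use of `inspect.signature` to detect accepted parameters are all assumptions made for illustration.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def route_speaker_kwarg(
    generate: Callable[..., Any], voice: Optional[str]
) -> Dict[str, str]:
    """Choose the keyword under which the speaker name is passed to generate().

    Hypothetical helper: 'voice' is checked first, and 'instruct' is used
    only when generate() does not accept 'voice'.
    """
    if not voice:
        return {}
    params = inspect.signature(generate).parameters
    if "voice" in params:
        return {"voice": voice}
    if "instruct" in params:
        return {"instruct": voice}
    return {}


# Stand-in models (assumptions, not the real Qwen3-TTS classes).
class CustomVoiceModel:
    def generate(self, text, voice=None, instruct=None):
        # Mimics the reported failure mode: the speaker must arrive as 'voice'.
        if voice is None:
            raise ValueError("'voice' is required")
        return f"{voice}:{text}"


class VoiceDesignModel:
    def generate(self, text, instruct=None):
        return f"{instruct}:{text}"


class VoiceOnlyModel:
    def generate(self, text, voice=None):
        return f"{voice}:{text}"


model = CustomVoiceModel()
kwargs = route_speaker_kwarg(model.generate, "vivian")
audio = model.generate("hello", **kwargs)  # speaker is routed to 'voice'
```

Checking `instruct` first would have returned `{"instruct": "vivian"}` for `CustomVoiceModel`, which is the misrouting the summary describes.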
Changes
- `omlx/engine/tts.py`: Swap parameter priority; check `voice` before `instruct`
- `tests/test_audio_tts.py`: Add `TestTTSVoiceRouting` with 3 test cases covering CustomVoice, VoiceDesign, and voice-only models
Test plan
- `pytest tests/test_audio_tts.py` (15 passed)
- `Qwen3-TTS-12Hz-1.7B-CustomVoice-bf16`: tested all 7 available speakers:
  - `vivian` (Chinese text): 200 OK, valid WAV
  - `aiden` (Japanese text): 200 OK, valid WAV
  - `serena` (English text): 200 OK, valid WAV
  - `ryan` (English text): 200 OK, valid WAV
  - `eric` (English text): 200 OK, valid WAV
  - `dylan` (Chinese text): 200 OK, valid WAV
  - `vivian` (English text): 200 OK, valid WAV
- `Qwen3-TTS-12Hz-1.7B-Base-bf16` and `Qwen3-TTS-12Hz-0.6B-Base-bf16` also verified working with voice param
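The test plan reports "valid WAV" for each response but does not say how validity was checked. One possible check, sketched here with Python's standard `wave` module, is to confirm that the response body parses as a WAV file and contains at least one frame. The helper names and the synthetic sample data (mono, 16-bit, 24 kHz) are assumptions for illustration only.

```python
import io
import wave


def is_valid_wav(data: bytes) -> bool:
    """Return True if data parses as a PCM WAV file with at least one frame.

    Note: the stdlib wave module only reads PCM-encoded WAV data, so other
    encodings would be reported as invalid by this check.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError):
        return False


def make_silent_wav(n_frames: int = 2400, sample_rate: int = 24000) -> bytes:
    """Build a small in-memory WAV of silence to stand in for a TTS response."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono (assumed)
        wav.setsampwidth(2)       # 16-bit samples (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


sample = make_silent_wav()
ok = is_valid_wav(sample)
```

In a manual test, the bytes of the HTTP response body would take the place of `sample`.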