feat(stt): custom vocabulary biasing for all speech models #451
Merged
HenryNdubuaku merged 1 commit into cactus-compute:main on Mar 7, 2026
Signed-off-by: ammesatyajit <[email protected]>
8c3c543 to 4a7993b
Part of #396.
Adds custom vocabulary biasing for all speech models (Whisper, Moonshine).
This PR supersedes #436, which added the bias infrastructure to `WhisperModel` only. Based on the review, the implementation has been moved to the base `Model` class, so it generalises to all speech models.

Changes
- `cactus/engine/engine.h`: added `vocab_bias_` field and `set_vocab_bias()` to the base `Model` class so all speech models inherit it automatically
- `cactus/engine/engine_model.cpp`: merge `vocab_bias_` into the `tool_constrainer` bias map inside `Model::decode()` before passing to `gb->sample()`
- `cactus/ffi/cactus_transcribe.cpp`: parse `custom_vocabulary` and `vocabulary_boost` from `options_json`, tokenize each word, call `set_vocab_bias()`
- `cactus/ffi/cactus_stream.cpp`: same parsing for the streaming path
- `tests/test_stt.cpp`: added `test_vocab_bias_base_class`, which verifies the full chain: JSON parsing → tokenization → bias map → decode
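
For readers who want a concrete picture of the merge step listed above, here is a minimal sketch. It is not the code from `engine_model.cpp`: the `BiasMap` alias, the `merge_bias` helper, and the choice to sum biases when a token appears in both maps are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical alias: token id -> additive logit bias.
using BiasMap = std::unordered_map<uint32_t, float>;

// Hypothetical helper, not the PR's actual code.
// Combines the constrainer bias with the vocabulary bias.
// Assumption: when a token appears in both maps, the biases are summed.
BiasMap merge_bias(const BiasMap& constrainer_bias, const BiasMap& vocab_bias) {
    BiasMap merged = constrainer_bias;  // start from the existing constrainer bias
    for (const auto& entry : vocab_bias) {
        // operator[] value-initializes a missing entry to 0.0f before adding.
        merged[entry.first] += entry.second;
    }
    return merged;
}
```

The merged map would then be what gets handed to the sampler. Whether the real implementation sums or overrides on collision is not stated in this description.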
How it works
When `custom_vocabulary` is passed in `options_json`, the FFI layer tokenizes each word and builds a `token_id → boost` map. Inside `Model::decode()` this map is merged with the existing tool constrainer bias and passed to `gb->sample()`.

Boost values are clamped to [0, 20] to prevent degenerate outputs.
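
The map-building step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the FFI code itself: `build_vocab_bias`, `TokenizeFn`, and `BiasMap` are hypothetical names, the tokenizer is passed in as a callback instead of using the engine's real tokenizer, and every token of a word is assumed to receive the same boost.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias: token id -> boost.
using BiasMap = std::unordered_map<uint32_t, float>;
// Hypothetical tokenizer callback standing in for the engine's tokenizer.
using TokenizeFn = std::function<std::vector<uint32_t>(const std::string&)>;

// Hypothetical helper: tokenize each vocabulary word and assign the
// clamped boost to every resulting token id.
BiasMap build_vocab_bias(const std::vector<std::string>& words,
                         float boost,
                         const TokenizeFn& tokenize) {
    // Clamp to [0, 20], the range stated in the PR description.
    const float clamped = std::min(20.0f, std::max(0.0f, boost));
    BiasMap bias;
    for (const auto& word : words) {
        for (uint32_t token_id : tokenize(word)) {
            bias[token_id] = clamped;  // tokens shared by several words get one entry
        }
    }
    return bias;
}
```

Passing the tokenizer as a callback keeps the sketch independent of any particular model; in the PR, the resulting map is what `set_vocab_bias()` receives.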
Testing
All stt tests pass. Debug output from cactus_transcribe.cpp confirmed the full chain works: