feat(stt): WhisperModel logit bias for custom vocabulary #436
Closed
vyomshah05 wants to merge 1 commit into cactus-compute:main from
Collaborator
@vyomshah05 thanks so much, we wanna make sure the solution generalises to all speech models, not just Whisper. @ammesatyajit will dive into this, as he's leading the efforts at Cactus. The modifications should go here: cactus/cactus/engine/engine_model.cpp, line 215 in 38632d3.
- Add vocab_bias_ field and set_vocab_bias() to base Model class in engine.h
- Apply vocab bias in Model::decode() by merging with tool_constrainer bias
- Parse custom_vocabulary and vocabulary_boost from options_json in cactus_transcribe.cpp and cactus_stream.cpp
- Tokenize vocabulary words and build token->boost map at decode time
- Add test_vocab_bias_base_class to test_stt.cpp verifying full chain

Signed-off-by: vyomshah05 <[email protected]>
2581f46 to daa4f90
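The tokenize-and-merge steps listed in the commit message above could look roughly like the sketch below (C++17). It is illustrative only and not the code from this PR: `TokenBias`, `TokenizeFn`, `build_vocab_bias`, and `merge_bias` are hypothetical names, the tokenizer is passed in as a plain callback, and summing overlapping boosts when merging with the constrainer bias is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias: token id -> additive logit boost.
using TokenBias = std::unordered_map<uint32_t, float>;

// Hypothetical tokenizer callback: maps one word to its token ids.
using TokenizeFn = std::function<std::vector<uint32_t>(const std::string&)>;

// Tokenize every vocabulary word and give each resulting token the same boost.
TokenBias build_vocab_bias(const std::vector<std::string>& words,
                           float boost,
                           const TokenizeFn& tokenize) {
    TokenBias bias;
    for (const auto& word : words) {
        for (uint32_t id : tokenize(word)) {
            // emplace leaves an existing entry untouched, so a token shared
            // by several words is not boosted twice.
            bias.emplace(id, boost);
        }
    }
    return bias;
}

// Merge the vocabulary bias into an existing bias map (for example one coming
// from a constrainer). Assumption: overlapping entries are summed.
TokenBias merge_bias(const TokenBias& existing, const TokenBias& vocab) {
    TokenBias merged = existing;
    for (const auto& [id, boost] : vocab) {
        merged[id] += boost;  // operator[] value-initializes missing ids to 0
    }
    return merged;
}
```

Keeping the tokenizer behind a callback means the same helper would work for any speech model, which is the generalisation the review comment asks for.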
What
Adds `vocab_bias_` infrastructure to `WhisperModel` in `model.h` and implements the bias application inside `decode_with_audio`.

Why
Part of the custom vocabulary / hotword biasing feature. Closes #396.

When a caller sets a token → boost map via `set_vocab_bias()`, the decode loop executes the graph first, converts the logits to FP32, applies the clamped boost values, then samples on CPU. When no bias is set, the original `gb->sample()` fast path is completely unchanged.

Changes
- `cactus/models/model.h`: added `vocab_bias_` field and `set_vocab_bias()` public method to `WhisperModel`
- `decode_with_audio`: split sample+execute into a fast path (no bias) and a manual sampling path (with bias), reusing the existing FP32 conversion pattern
- `tests/test_stt.cpp`: added three new tests:
  - `vocab_bias_transcription`: verifies no crash and valid output when `custom_vocabulary` is passed through `options_json`
  - `vocab_bias_affects_output`: runs the same audio with and without extreme bias and logs both outputs for comparison
  - `vocab_bias_direct`: bypasses FFI entirely, calls `set_vocab_bias()` directly on the model to prove the engine path works in isolation
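The manual sampling path described above (FP32 logits, clamped boosts, CPU sampling) can be sketched as follows in C++17. This is a minimal illustration under assumptions, not the PR's implementation: `kMaxBoost`, `apply_vocab_bias`, `sample_greedy`, and `sample_with_bias` are hypothetical names, the clamp limit of 10 is made up, and greedy argmax stands in for whatever sampler the engine actually uses. In the real code the no-bias case stays on `gb->sample()`; here it simply skips the bias step.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <unordered_map>
#include <vector>

// Hypothetical clamp limit for a single boost value.
constexpr float kMaxBoost = 10.0f;

// Add each clamped boost to its token's FP32 logit, in place.
// Token ids outside the vocabulary are ignored.
void apply_vocab_bias(std::vector<float>& logits,
                      const std::unordered_map<uint32_t, float>& bias) {
    for (const auto& [id, boost] : bias) {
        if (id >= logits.size()) continue;
        logits[id] += std::clamp(boost, -kMaxBoost, kMaxBoost);
    }
}

// Greedy argmax, used here as a stand-in for CPU sampling.
// Assumes logits is non-empty.
uint32_t sample_greedy(const std::vector<float>& logits) {
    auto best = std::max_element(logits.begin(), logits.end());
    return static_cast<uint32_t>(std::distance(logits.begin(), best));
}

// One decode step: bias the logits only when a bias map is set.
// logits is taken by value so the caller's buffer is left untouched.
uint32_t sample_with_bias(std::vector<float> logits,
                          const std::unordered_map<uint32_t, float>& bias) {
    if (!bias.empty()) {
        apply_vocab_bias(logits, bias);
    }
    return sample_greedy(logits);
}
```

Clamping before the addition bounds how far a single entry can move a logit, so an extreme boost such as the one used in `vocab_bias_affects_output` cannot produce non-finite values.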
Not in scope for this PR
- `custom_vocabulary` JSON → `set_vocab_bias()`

Testing
`cli/cactus test --only stt` passes. No regression on existing transcription.

Note: `vocab_bias_affects_output` currently logs identical outputs because the FFI JSON parsing layer is not merged yet; `vocab_bias_` is only populated when called directly via `set_vocab_bias()`. The test is structured to catch regressions once the FFI wiring lands.