
feat(stt): WhisperModel logit bias for custom vocabulary#436

Closed
vyomshah05 wants to merge 1 commit into cactus-compute:main from
vyomshah05:feat/whisper-vocab-bias

Conversation

@vyomshah05
Contributor

What

Adds vocab_bias_ infrastructure to WhisperModel in model.h and
implements the bias application inside decode_with_audio.

Why

Part of the custom vocabulary / hotword biasing feature. Closes #396.

When a caller sets a token → boost map via set_vocab_bias(), the decode
loop executes the graph first, converts logits to FP32, applies the
clamped boost values, then samples on CPU. When no bias is set, the
original gb->sample() fast path is completely unchanged.
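The biased path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name apply_vocab_bias, the constant kMaxBoost, and the clamp range are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Assumed clamp range; the PR only says boosts are "clamped".
constexpr float kMaxBoost = 10.0f;

// Hypothetical sketch: add a clamped per-token boost to FP32 logits
// after graph execution, before CPU sampling.
void apply_vocab_bias(std::vector<float>& logits,
                      const std::unordered_map<uint32_t, float>& vocab_bias) {
    for (const auto& [token, boost] : vocab_bias) {
        if (token < logits.size()) {
            // Clamp so an extreme boost cannot fully dominate sampling.
            logits[token] += std::clamp(boost, -kMaxBoost, kMaxBoost);
        }
    }
}
```

An empty map would leave the logits untouched, which is why the no-bias case can skip this step entirely and keep the gb->sample() fast path.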

Changes

  • cactus/models/model.h: added vocab_bias_ field and set_vocab_bias()
    public method to WhisperModel
  • decode_with_audio: split the execute+sample step into a fast path (no
    bias) and a manual sampling path (with bias), reusing the existing FP32
    conversion pattern
  • tests/test_stt.cpp: added three new tests:
    • vocab_bias_transcription — verifies no crash and valid output when
      custom_vocabulary is passed through options_json
    • vocab_bias_affects_output — runs the same audio with and without an
      extreme bias and logs both outputs for comparison
    • vocab_bias_direct — bypasses FFI entirely, calls set_vocab_bias()
      directly on the model to prove the engine path works in isolation
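The model.h additions described above might look roughly like this. A sketch only: the field and method names follow the PR description, but the surrounding class members, the base class, and the has_vocab_bias() helper are assumptions.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of the WhisperModel additions in cactus/models/model.h.
class WhisperModel /* : public Model */ {
public:
    // An empty map means "no bias": decode_with_audio keeps the fast path.
    void set_vocab_bias(std::unordered_map<uint32_t, float> bias) {
        vocab_bias_ = std::move(bias);
    }

    // Illustrative helper (not named in the PR) for choosing the decode path.
    bool has_vocab_bias() const { return !vocab_bias_.empty(); }

private:
    std::unordered_map<uint32_t, float> vocab_bias_;  // token id -> logit boost
};
```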

Not in scope for this PR

  • MoonshineModel
  • FFI/options_json parsing (needed to connect custom_vocabulary JSON → set_vocab_bias())
  • SDK docs

Testing

cli/cactus test --only stt passes. No regression on existing transcription.

Note: vocab_bias_affects_output currently logs identical outputs because
the FFI JSON parsing layer is not merged yet — vocab_bias_ is only
populated when called directly via set_vocab_bias(). The test is
structured to catch regressions once the FFI wiring lands.

@HenryNdubuaku
Collaborator

HenryNdubuaku commented Feb 24, 2026

@vyomshah05 thanks so much. We want to make sure the solution generalises to all speech models, not just Whisper. @ammesatyajit will dive into this as he's leading the effort at Cactus. The modifications should be here:

uint32_t Model::decode(const std::vector<uint32_t>& tokens, float temperature, float top_p,

- Add vocab_bias_ field and set_vocab_bias() to base Model class in engine.h
- Apply vocab bias in Model::decode() by merging with tool_constrainer bias
- Parse custom_vocabulary and vocabulary_boost from options_json in
  cactus_transcribe.cpp and cactus_stream.cpp
- Tokenize vocabulary words and build token->boost map at decode time
- Add test_vocab_bias_base_class to test_stt.cpp verifying full chain
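The decode-time step in the list above (tokenize vocabulary words, build a token -> boost map) could be wired up roughly as follows. A minimal sketch: build_vocab_bias is a hypothetical name, and the tokenizer is injected as a std::function, whereas in the engine it would be the model's own tokenizer.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed tokenizer interface for this sketch.
using Tokenize = std::function<std::vector<uint32_t>(const std::string&)>;

// Hypothetical sketch: tokenize each custom vocabulary word and
// accumulate a token -> boost map for Model::decode() to apply.
std::unordered_map<uint32_t, float> build_vocab_bias(
        const std::vector<std::string>& words, float boost,
        const Tokenize& tokenize) {
    std::unordered_map<uint32_t, float> bias;
    for (const auto& word : words) {
        for (uint32_t tok : tokenize(word)) {
            // Keep the strongest boost when words share a token.
            auto [it, inserted] = bias.emplace(tok, boost);
            if (!inserted && boost > it->second) it->second = boost;
        }
    }
    return bias;
}
```

Merging with the tool_constrainer bias could then be a plain additive combine over the two maps before sampling.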

Signed-off-by: vyomshah05 <[email protected]>


Development

Successfully merging this pull request may close these issues.

Add custom vocabulary / hotword biasing for transcription

2 participants