
Add dynamic max_tokens estimation based on audio length in cactus_transcribe #499

Merged
HenryNdubuaku merged 1 commit into main from transcribe_dynamic_max_token_estimation
Mar 5, 2026
Conversation

@jakmro (Collaborator) commented Mar 5, 2026

Copilot AI review requested due to automatic review settings March 5, 2026 15:58
Copilot AI (Contributor) left a comment

Pull request overview

Adds an automatic max_tokens default for cactus_transcribe requests by estimating a token budget from the input audio duration, so transcription of longer audio no longer stops prematurely at the previous fixed default.

Changes:

  • Detect absence of "max_tokens" in options_json and compute an estimated token budget from audio_samples duration.
  • Apply different tokens/sec heuristics depending on whether the model is Parakeet vs non-Parakeet.


Comment on lines +195 to +200:

```cpp
if (opts.find("\"max_tokens\"") == std::string::npos) {
    const float audio_length_sec = static_cast<float>(audio_samples.size()) / static_cast<float>(WHISPER_SAMPLE_RATE);
    const float tps = is_parakeet ? 30.0f : 20.0f;
    const size_t estimated = static_cast<size_t>(audio_length_sec * tps);
    max_tokens = std::max<size_t>(estimated, 100);
}
```

Copilot AI commented Mar 5, 2026:

The dynamic default reintroduces hard-coded constants (100, and the chosen tokens/sec values) instead of building on the existing defaults coming from parse_options_json. Since max_tokens is already initialized to its default before this block, consider using the current max_tokens value as the floor (and possibly moving the tps values to config/constants) so the default behavior stays consistent if parse_options_json defaults change later.
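The floor-from-default idea in this comment could be sketched as below. The helper signature and the constant names (`kParakeetTokensPerSec`, `kDefaultTokensPerSec`) are illustrative, not from the PR; the point is that the floor comes from whatever `parse_options_json` already set rather than a literal `100`:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative named constants for the tokens/sec heuristics, per the
// suggestion to move them out of the inline branch.
constexpr float kParakeetTokensPerSec = 30.0f;
constexpr float kDefaultTokensPerSec  = 20.0f;

// Sketch: estimate a token budget from audio duration, but use the
// max_tokens value already initialized by parse_options_json as the floor
// instead of a hard-coded 100, so the two defaults cannot drift apart.
size_t estimate_max_tokens(size_t num_samples, int sample_rate,
                           bool is_parakeet, size_t default_max_tokens) {
    const float audio_length_sec =
        static_cast<float>(num_samples) / static_cast<float>(sample_rate);
    const float tps = is_parakeet ? kParakeetTokensPerSec
                                  : kDefaultTokensPerSec;
    const size_t estimated = static_cast<size_t>(audio_length_sec * tps);
    return std::max(estimated, default_max_tokens);
}
```

With this shape, a later change to the `parse_options_json` default automatically carries through to the dynamic estimate.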

Comment on lines +195 to +200:

```cpp
if (opts.find("\"max_tokens\"") == std::string::npos) {
    const float audio_length_sec = static_cast<float>(audio_samples.size()) / static_cast<float>(WHISPER_SAMPLE_RATE);
    const float tps = is_parakeet ? 30.0f : 20.0f;
    const size_t estimated = static_cast<size_t>(audio_length_sec * tps);
    max_tokens = std::max<size_t>(estimated, 100);
}
```

Copilot AI commented Mar 5, 2026:

max_tokens now scales with total audio length with no explicit upper bound when the caller doesn’t provide max_tokens. For very long audio inputs this can substantially increase decode work across many chunks (previously the default 100 would stop early), which is a potential resource/abuse risk for any service exposing this FFI. Consider clamping the estimated value to a reasonable configured maximum (or a maximum derived from chunk count/max_tps) to keep worst-case runtime bounded by policy.
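The clamp suggested here might look like the sketch below. The ceiling constant is a placeholder value chosen for illustration, not a figure from the PR or the cactus codebase:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative policy ceiling; in practice this would come from config or
// be derived from chunk count, as the comment suggests.
constexpr size_t kMaxTokensCeiling = 8192;

// Sketch: bound the audio-derived estimate between the existing floor and
// a configured ceiling, so a very long input cannot inflate decode work
// without limit.
size_t clamp_estimated_tokens(size_t estimated, size_t floor_tokens) {
    return std::clamp(estimated, floor_tokens, kMaxTokensCeiling);
}
```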

@HenryNdubuaku HenryNdubuaku merged commit bc268ad into main Mar 5, 2026
8 of 10 checks passed


3 participants