Add dynamic max_tokens estimation based on audio length in cactus_transcribe by jakmro · Pull Request #499 · cactus-compute/cactus

jakmro · 2026-03-05T15:58:35Z

Signed-off-by: jakmro

Copilot

Pull request overview

Adds an automatic max_tokens default for cactus_transcribe requests by estimating a token budget from the input audio duration, so that transcription of longer audio no longer stops prematurely at the previous fixed default.

Changes:

Detect the absence of "max_tokens" in options_json and compute an estimated token budget from the duration of audio_samples.
Apply different tokens/sec heuristics depending on whether the model is Parakeet vs non-Parakeet.


Copilot · 2026-03-05T16:01:42Z

cactus/ffi/cactus_transcribe.cpp

+        if (opts.find("\"max_tokens\"") == std::string::npos) {
+            const float audio_length_sec = static_cast<float>(audio_samples.size()) / static_cast<float>(WHISPER_SAMPLE_RATE);
+            const float tps = is_parakeet ? 30.0f : 20.0f;
+            const size_t estimated = static_cast<size_t>(audio_length_sec * tps);
+            max_tokens = std::max<size_t>(estimated, 100);
+        }


The dynamic default reintroduces hard-coded constants (100, and the chosen tokens/sec values) instead of building on the existing defaults coming from parse_options_json. Since max_tokens is already initialized to its default before this block, consider using the current max_tokens value as the floor (and possibly moving the tps values to config/constants) so the default behavior stays consistent if parse_options_json defaults change later.

Copilot · 2026-03-05T16:01:43Z

cactus/ffi/cactus_transcribe.cpp



max_tokens now scales with total audio length with no explicit upper bound when the caller doesn’t provide max_tokens. For very long audio inputs this can substantially increase decode work across many chunks (previously the default 100 would stop early), which is a potential resource/abuse risk for any service exposing this FFI. Consider clamping the estimated value to a reasonable configured maximum (or a maximum derived from chunk count/max_tps) to keep worst-case runtime bounded by policy.

Add dynamic max_tokens estimation based on audio length in cactus_transcribe

8baf8d2

Copilot AI review requested due to automatic review settings March 5, 2026 15:58

Copilot started reviewing on behalf of jakmro March 5, 2026 15:59

Copilot AI reviewed Mar 5, 2026


HenryNdubuaku merged commit bc268ad into main Mar 5, 2026
8 of 10 checks passed

HenryNdubuaku merged 1 commit into main from transcribe_dynamic_max_token_estimation
3 participants