[Feature]: Improve effort → thinking_budget mapping granularity for models that exhaust their budget #13844

@EurFelux

Description


Issue Checklist

  • I understand that issues are for reporting problems and requesting features, not for off-topic comments, and I will provide as much detail as possible to help resolve the issue.
  • I have checked the pinned issues and searched through the existing open issues, closed issues, and discussions and did not find a similar suggestion.
  • I have provided a short and descriptive title so that developers can quickly understand the issue when browsing the issue list, rather than vague titles like "A suggestion" or "Stuck."
  • The latest version of Cherry Studio does not include the feature I am suggesting.

Platform

macOS

Version

v1.8.4

Is your feature request related to an existing issue?

Related to #13831.

While attempting to fix the issue where effort was not correctly converted to thinking_budget, I discovered that the current effort → thinking_budget mapping strategy itself is fundamentally flawed for certain models.

Experiment: Using Qwen3.5-397B-A17B with low effort, effort was correctly converted to a thinking_budget of 4096 tokens. However, this made the model think significantly longer than when no thinking_budget was passed at all:

| Scenario | Prompt | Thinking Time |
| --- | --- | --- |
| No thinking_budget (default) | "Hi" | ~11s |
| thinking_budget: 4096 (low effort) | "Hi" | ~49s |

This reveals that the model tends to exhaust whatever thinking budget it receives, regardless of the actual complexity of the prompt. Even 4096 tokens (which is only ~5% of Qwen3.5's ~80k max thinking budget) is far too much for a trivial prompt like "Hi" — a budget of 100 would already be excessive in this case.

Desired Solution

We need a more nuanced approach to effort → thinking_budget mapping that accounts for model-specific behavior differences. Some potential directions:

  1. Model-family-specific scaling: Different model families (Qwen, Claude, Gemini, etc.) may need fundamentally different mapping curves. Qwen3.5 clearly needs much more aggressive reduction at lower effort levels compared to Claude models.

  2. Non-linear mapping: Instead of the current linear interpolation between min and max, consider exponential or logarithmic curves that provide finer granularity at the lower end.

  3. Provider-level overrides: Allow THINKING_TOKEN_MAP entries to optionally specify custom effort ratios or mapping functions per model family.

  4. Consider not sending thinking_budget at low effort: For models that behave better without an explicit thinking budget, the "low" effort setting could simply omit the parameter rather than sending a small value that paradoxically increases thinking.

The ideal solution would be for models to dynamically decide thinking intensity based on prompt complexity, but since that's a model-level behavior we can't control from the client side, we need smarter client-side heuristics.
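To make directions 1, 2, and 4 above concrete, here is a minimal sketch of what a model-family-aware, non-linear mapping could look like. Names such as `FamilyCurve`, `FAMILY_CURVES`, and `mapEffortToBudget` are hypothetical and illustrative, not Cherry Studio's actual API; the exponents and thresholds are placeholder values, and the 81920-token cap is an assumed stand-in for Qwen3.5's ~80k maximum:

```typescript
type Effort = 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';

// Current ratios, as listed in "Additional Information" below.
const EFFORT_RATIO: Record<Effort, number> = {
  minimal: 0.01, low: 0.05, medium: 0.5, high: 0.95, xhigh: 1.0,
};

// Hypothetical per-family tuning: an exponent > 1 compresses the low end of
// the curve, and a floor below which the parameter is omitted entirely.
interface FamilyCurve {
  exponent: number;        // 1 = keep the current linear behavior
  omitBelowRatio?: number; // omit thinking_budget when the effective ratio falls below this
}

const FAMILY_CURVES: Record<string, FamilyCurve> = {
  qwen: { exponent: 3, omitBelowRatio: 0.01 }, // aggressive reduction at low effort
  claude: { exponent: 1 },                     // linear mapping unchanged
};

// Returns a token budget, or undefined to signal "don't send thinking_budget".
function mapEffortToBudget(
  family: string,
  effort: Effort,
  maxTokens: number
): number | undefined {
  const curve = FAMILY_CURVES[family] ?? { exponent: 1 };
  const ratio = Math.pow(EFFORT_RATIO[effort], curve.exponent);
  if (curve.omitBelowRatio !== undefined && ratio < curve.omitBelowRatio) {
    return undefined; // direction 4: low effort drops the parameter entirely
  }
  return Math.round(ratio * maxTokens);
}
```

With these placeholder numbers, `mapEffortToBudget('qwen', 'low', 81920)` would omit the parameter altogether (0.05³ ≈ 0.000125 is below the floor), while `'claude'` with the same effort still yields 4096, matching the current behavior.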

Alternative Solutions

  • Per-model opt-out: Add a flag in model configuration to disable thinking_budget passthrough entirely, letting the model use its default behavior.
  • User-configurable thinking budget: Expose the raw thinking_budget value as an advanced setting, allowing power users to fine-tune it manually.
  • Adaptive approach: Start with a very low budget and increase it if the model indicates truncated reasoning (though this would require multiple API calls).
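The first two alternatives could share one configuration shape. This is a hypothetical sketch; the field names `sendThinkingBudget` and `thinkingBudgetOverride` are illustrative, not Cherry Studio's actual model-config schema:

```typescript
// Hypothetical per-model thinking configuration.
interface ThinkingConfig {
  sendThinkingBudget: boolean;     // per-model opt-out of thinking_budget passthrough
  thinkingBudgetOverride?: number; // advanced setting: manual value wins over effort mapping
}

// Resolve the final value to send, if any; undefined means "omit the parameter
// and let the model use its default behavior".
function resolveBudget(cfg: ThinkingConfig, effortBudget: number): number | undefined {
  if (!cfg.sendThinkingBudget) return undefined;
  return cfg.thinkingBudgetOverride ?? effortBudget;
}
```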

Additional Information

The current EFFORT_RATIO mapping:

  • minimal: 0.01
  • low: 0.05
  • medium: 0.5
  • high: 0.95
  • xhigh: 1.0

For Qwen3.5 with a max of ~80k tokens, even minimal (0.01) would yield ~800 tokens, which may still cause overthinking for simple prompts. The root issue is that some models interpret any explicit thinking budget as a signal to think at least that much, rather than treating it as a maximum.
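Assuming the budget is derived as ratio × max (which reproduces the observed 4096-token result for low effort against an assumed 81920-token cap, ~80k), the full set of resulting budgets would be:

```typescript
// Current effort ratios from above.
const EFFORT_RATIO = {
  minimal: 0.01, low: 0.05, medium: 0.5, high: 0.95, xhigh: 1.0,
} as const;

// Assumed ~80k max thinking budget for Qwen3.5; 81920 reproduces the
// observed 4096 tokens at low effort.
const QWEN_MAX = 81920;

// Budget per effort level under the assumed ratio × max mapping.
const budgets = Object.fromEntries(
  Object.entries(EFFORT_RATIO).map(([effort, ratio]) => [effort, Math.round(ratio * QWEN_MAX)])
);
// minimal → 819, low → 4096, medium → 40960, high → 77824, xhigh → 81920
```

Even the minimal tier lands around 819 tokens, roughly 8× the ~100 tokens that would already be excessive for a prompt like "Hi".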

Code reference: src/renderer/src/aiCore/utils/reasoning.ts — getReasoningEffort() and the EFFORT_RATIO / findTokenLimit() mechanisms.
