
research(llm): Gemini thinking_level parameter support for flash-thinking models #1652

@bug-ops

Description

Research Finding

Gemini's latest API supports a thinking_level parameter for flash-thinking models (gemini-2.5-flash-thinking, etc.) that controls the depth of internal reasoning:

  • "none" — no thinking (fastest, cheapest)
  • "low" / "medium" / "high" — progressive reasoning depth
  • "auto" — model decides based on prompt complexity

This is analogous to Claude's extended thinking (budget_tokens) and OpenAI's reasoning_effort ("low" / "medium" / "high").
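To see how the three vendor knobs could sit behind one abstraction, here is a minimal sketch of a unified level enum with per-provider mappings. The enum name, the OpenAI clamping, and the Claude budget numbers are illustrative assumptions, not values from any provider's docs:

```rust
/// Unified reasoning-depth setting, mirroring Gemini's thinking_level strings.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThinkingLevel { None, Low, Medium, High, Auto }

impl ThinkingLevel {
    /// Gemini: pass the level through as its lowercase string.
    fn as_gemini(self) -> &'static str {
        match self {
            ThinkingLevel::None => "none",
            ThinkingLevel::Low => "low",
            ThinkingLevel::Medium => "medium",
            ThinkingLevel::High => "high",
            ThinkingLevel::Auto => "auto",
        }
    }

    /// OpenAI: reasoning_effort has no "none"/"auto", so clamp (assumed mapping).
    fn as_openai_effort(self) -> &'static str {
        match self {
            ThinkingLevel::None | ThinkingLevel::Low => "low",
            ThinkingLevel::Medium | ThinkingLevel::Auto => "medium",
            ThinkingLevel::High => "high",
        }
    }

    /// Claude: budget_tokens; the numbers are illustrative placeholders.
    fn as_claude_budget(self) -> Option<u32> {
        match self {
            ThinkingLevel::None => Option::None,
            ThinkingLevel::Low => Some(1024),
            ThinkingLevel::Medium | ThinkingLevel::Auto => Some(4096),
            ThinkingLevel::High => Some(16384),
        }
    }
}

fn main() {
    assert_eq!(ThinkingLevel::Auto.as_gemini(), "auto");
    assert_eq!(ThinkingLevel::High.as_openai_effort(), "high");
    assert_eq!(ThinkingLevel::None.as_claude_budget(), Option::None);
}
```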

Applicability

The Gemini provider (Phase 3, PR #1635) is currently being built. thinking_level should be added to the Gemini configuration, or exposed as a provider hint alongside the existing parameters.

Design

```toml
[llm.gemini]
model = "gemini-2.5-flash-thinking"
thinking_level = "auto"  # none | low | medium | high | auto
```
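Validating that config value could be done with a FromStr impl, so a typo fails at load time rather than at the API. A sketch using only std; the type name and error shape are assumptions:

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ThinkingLevel { None, Low, Medium, High, Auto }

impl FromStr for ThinkingLevel {
    type Err = String;

    /// Accept exactly the five strings allowed in the TOML config.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(ThinkingLevel::None),
            "low" => Ok(ThinkingLevel::Low),
            "medium" => Ok(ThinkingLevel::Medium),
            "high" => Ok(ThinkingLevel::High),
            "auto" => Ok(ThinkingLevel::Auto),
            other => Err(format!("invalid thinking_level: {other:?}")),
        }
    }
}

fn main() {
    assert_eq!("auto".parse::<ThinkingLevel>(), Ok(ThinkingLevel::Auto));
    assert!("max".parse::<ThinkingLevel>().is_err());
}
```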

In GeminiRequest / GenerationConfig:

```rust
#[serde(skip_serializing_if = "Option::is_none")]
thinking_level: Option<ThinkingLevel>,
```

Only serialize the field when the model name contains "thinking" or when the level is explicitly set, to avoid API errors on non-thinking models.
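That gate could be a small helper on the request builder. A sketch; the function name and the `explicit` flag (distinguishing a user-set value from an inherited default) are assumptions:

```rust
/// Decide whether thinking_level should be serialized into GenerationConfig.
/// `level_set`: the user configured a level at all.
/// `explicit`: it was set explicitly (e.g. in [llm.gemini]) rather than
/// inherited from a default.
fn include_thinking_level(model: &str, level_set: bool, explicit: bool) -> bool {
    // Never emit the field if no level was configured.
    if !level_set {
        return false;
    }
    // Emit for thinking-capable models, or honor an explicit user override.
    model.contains("thinking") || explicit
}

fn main() {
    // Thinking model with a configured level: serialize.
    assert!(include_thinking_level("gemini-2.5-flash-thinking", true, false));
    // Non-thinking model, level only from defaults: skip to avoid API errors.
    assert!(!include_thinking_level("gemini-2.0-flash", true, false));
    // Non-thinking model but explicitly set by the user: pass through.
    assert!(include_thinking_level("gemini-2.0-flash", true, true));
}
```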

Source

Research session 2026-03-13. Gemini API docs (ai.google.dev/gemini-api/docs/thinking).

Priority

Medium — applicable once Gemini Phase 4+ is implemented (streaming, tool use). Can be deferred until after Phase 3 stabilization.

Labels: enhancement (New feature or request), llm (zeph-llm crate (Ollama, Claude)), research (Research-driven improvement)
