
Conversation

@hannesrudolph (Collaborator) commented Dec 10, 2025

Summary

Implements proper interleaved thinking support for DeepSeek V3's thinking mode (deepseek-reasoner model) following the DeepSeek API documentation at https://api-docs.deepseek.com/guides/thinking_mode

This PR got the evals to 85.6% by extending the run time from 5 to 10 minutes.

[screenshot: eval results]

Changes

  • Add thinking: { type: "enabled" } parameter for deepseek-reasoner model
  • Handle streaming reasoning_content from the DeepSeek API and yield it as reasoning chunks
  • Add tool call conversion (Anthropic tool_use → OpenAI tool_calls format) for thinking mode
  • Add tool result conversion (Anthropic tool_result → OpenAI tool messages); a rough conversion sketch follows this list
  • Extract reasoning from content blocks for API continuations (Task stores reasoning as content blocks)
  • Add getReasoningContent() method to accumulate and expose reasoning content
  • Add comprehensive tests for interleaved thinking mode (6 new tests in deepseek.spec.ts)
  • Add tests for tool call support (7 new tests in r1-format.spec.ts)
  • Update the default temperature from 0.0 to 0.3
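
To make the two conversion bullets concrete, here is a rough TypeScript sketch of the shape of that mapping. It is illustrative only (not the PR's actual convertToR1Format() code) and assumes the @anthropic-ai/sdk and openai packages for the types:

import type Anthropic from "@anthropic-ai/sdk"
import type OpenAI from "openai"

// Sketch only: a run of Anthropic tool_use blocks becomes a single assistant
// message carrying OpenAI-style tool_calls; each Anthropic tool_result block
// becomes a "tool" role message tied back to its call via tool_call_id.
function toolUseBlocksToOpenAI(
    blocks: Anthropic.Messages.ToolUseBlockParam[],
): OpenAI.Chat.ChatCompletionAssistantMessageParam {
    return {
        role: "assistant",
        content: null,
        tool_calls: blocks.map((block) => ({
            id: block.id,
            type: "function" as const,
            function: { name: block.name, arguments: JSON.stringify(block.input) },
        })),
    }
}

function toolResultBlockToOpenAI(
    block: Anthropic.Messages.ToolResultBlockParam,
): OpenAI.Chat.ChatCompletionToolMessageParam {
    return {
        role: "tool",
        tool_call_id: block.tool_use_id,
        content: typeof block.content === "string" ? block.content : JSON.stringify(block.content ?? []),
    }
}

The real convertToR1Format() also handles plain text and reasoning_content alongside these (see the r1-format.spec.ts additions).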

How It Works

Per DeepSeek's API documentation:

  1. Thinking mode enabled via thinking: { type: "enabled" } for deepseek-reasoner model
  2. During streaming: reasoning_content is yielded as reasoning chunks and accumulated (a minimal sketch follows this list)
  3. Tool call flow: The model can think → call tools → receive results → continue thinking
  4. Within a turn: reasoning_content is preserved and passed back to the API for continuation
  5. Between turns: Reasoning content is cleared (new user question starts fresh)
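
A minimal TypeScript sketch of steps 1, 2 and 5 (illustrative only, not the PR's deepseek.ts code; it assumes the openai npm SDK pointed at DeepSeek's OpenAI-compatible endpoint, and thinking / reasoning_content are DeepSeek extensions outside the SDK's types, hence the casts):

import OpenAI from "openai"

// Stream a deepseek-reasoner completion with thinking enabled and accumulate
// reasoning_content separately from the regular answer text.
async function streamWithThinking(client: OpenAI, messages: OpenAI.Chat.ChatCompletionMessageParam[]) {
    const stream = await client.chat.completions.create({
        model: "deepseek-reasoner",
        messages,
        stream: true,
        temperature: 0.3,
        ...({ thinking: { type: "enabled" } } as any), // DeepSeek-specific parameter
    })

    let reasoning = ""
    let text = ""
    for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta as any
        if (delta?.reasoning_content) reasoning += delta.reasoning_content // yielded as reasoning chunks
        if (delta?.content) text += delta.content
    }

    // `reasoning` is what a getReasoningContent()-style accessor would expose; it is
    // reused for continuations within the same turn and dropped on a new user turn.
    return { reasoning, text }
}

The client would be constructed against DeepSeek's endpoint, e.g. new OpenAI({ baseURL: "https://api.deepseek.com", apiKey: process.env.DEEPSEEK_API_KEY }).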

Important

Implements interleaved thinking mode for DeepSeek reasoner, handling reasoning content and tool calls, with updated message conversion and comprehensive tests.

  • Behavior:
    • Implements interleaved thinking mode for deepseek-reasoner model in deepseek.ts.
    • Adds thinking: { type: "enabled" } parameter for deepseek-reasoner.
    • Handles streaming reasoning_content and tool calls in createMessage() (a continuation sketch follows this list).
  • Conversion:
    • Updates convertToR1Format() to handle reasoning_content and tool calls.
    • Converts Anthropic tool_use to OpenAI tool_calls and tool_result to tool messages.
  • Tests:
    • Adds tests for interleaved thinking mode in deepseek.spec.ts.
    • Adds tests for tool call support in r1-format.spec.ts.
  • Misc:
    • Updates default temperature to 0.3 in deepseek.ts.
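
For the Behavior bullets, a rough sketch of the within-turn continuation loop (again illustrative, not the actual createMessage() implementation; it assumes the openai SDK, a caller-supplied execute callback for running tools, and, as stated above, that the assistant message's reasoning_content is echoed back while tool calls are being resolved):

import OpenAI from "openai"

async function runTurn(
    client: OpenAI,
    messages: any[], // conversation so far; `any` because reasoning_content is a DeepSeek extension
    tools: OpenAI.Chat.ChatCompletionTool[],
    execute: (name: string, argsJson: string) => Promise<string>, // caller-supplied tool runner
) {
    while (true) {
        const completion = await client.chat.completions.create({
            model: "deepseek-reasoner",
            messages,
            tools,
            ...({ thinking: { type: "enabled" } } as any),
        })
        const choice = completion.choices[0]
        const message = choice.message as any

        if (choice.finish_reason !== "tool_calls") {
            return message.content // turn finished; reasoning is not carried into the next user turn
        }

        // Keep reasoning_content on the assistant message so the model can keep
        // thinking after it sees the tool results (within the same turn).
        messages.push({
            role: "assistant",
            content: message.content,
            reasoning_content: message.reasoning_content,
            tool_calls: message.tool_calls,
        })
        for (const toolCall of message.tool_calls ?? []) {
            messages.push({
                role: "tool",
                tool_call_id: toolCall.id,
                content: await execute(toolCall.function.name, toolCall.function.arguments),
            })
        }
    }
}

This mirrors the same think → call tools → observe results → continue loop as the Python example further down this thread, with reasoning_content carried along.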

This description was created by Ellipsis for 31def3a.

dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files) and Enhancement (New feature or request) labels on Dec 10, 2025
roomote bot (Contributor) commented Dec 10, 2025

See task on Roo Cloud

Re-review complete for commit range b84b5b5..31def3a. No new issues found.


Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label on Dec 10, 2025
@bozoweed

This is also needed for kimi-k2-thinking.

Here are some more details on that technique:

import json

from openai import OpenAI

# Your tool implementation
def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}

# Tool schema definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information. Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city"
                }
            }
        }
    }
}]

# Map tool names to their implementations
tool_map = {
    "get_weather": get_weather
}

def tool_call_with_client(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "What's the weather like in Beijing today? Use the tool to check."}
    ]
    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        completion = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            tools=tools,          # tool list defined above
            tool_choice="auto"
        )
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                tool_function = tool_map[tool_call_name]
                tool_result = tool_function(**tool_call_arguments)
                print("tool_result:", tool_result)
                
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result)
                })
    print("-" * 100)
    print(choice.message.content)

@hannesrudolph marked this pull request as draft on December 11, 2025 at 19:05
@hannesrudolph moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap on Dec 12, 2025
@hannesrudolph added the PR - Draft / In Progress label and removed the Issue/PR - Triage label on Dec 12, 2025
@hannesrudolph force-pushed the feat/deepseek-interleaved-thinking branch 2 times, most recently from 9b50c0e to 5b34edb on December 15, 2025 at 18:03
@hannesrudolph marked this pull request as ready for review on December 15, 2025 at 19:24
@hannesrudolph moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap on Dec 15, 2025
roomote bot (Contributor) commented Dec 15, 2025

Follow along on Roo Cloud

Review in progress.

  • DeepSeekHandler: ensure Azure AI Inference requests still pass the required request path option (OPENAI_AZURE_AI_INFERENCE_PATH) when using deepseek via .services.ai.azure.com endpoints.

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

mini2s added a commit to zgsm-ai/costrict that referenced this pull request Dec 18, 2025